
Being a data miner is much more prestigious nowadays, since the HANDBOOK by Dr. ROBERT NISBET et al. was released. Do you agree?

I started reading the Handbook of Statistical Analysis & Data Mining Applications by R. Nisbet, J. Elder, G. Miner and many guest authors just a few days ago. So this is not intended as a review; it is my first impression, and I would like to hear your opinions.

I had been looking forward to this book for a long time, and after a first brief pass through the handbook I can say it fulfills my expectations completely. There are several good books on the market focused on modern statistics sensu lato, but this is the first truly friendly, step-by-step tutorial for beginners. Not only that: it covers many very diverse topics and applications, providing clear evidence of the crucial importance, today and in the future, of modern statistics in virtually any field. Moreover, many tutorials reach a professional level, revealing useful hints that are traditionally the “secret know-how of experienced data miners and companies”. Several reasons why, in my opinion, this book is a great undertaking follow. I am looking forward to your opinions.

1. About half of the pages are devoted to step-by-step tutorials (many other case studies and examples are on the attached DVD). Data and example outputs from statistical software are provided.
2. Virtually every field is covered (bioinformatics, risk management, CRM, scoring, finance modeling, industry and quality control, psychology, fraud detection, churn analysis, health…)
3. SAS-EM, SPSS-Clementine (Modeler) and STATISTICA Data Miner are used to work through the examples. The opportunity to compare the three most common data mining packages thoroughly in one book is outstanding.
4. Most of the tutorials are carried out in STATISTICA Data Miner (G. Miner's affiliation is certainly part of the reason). It is a great day for StatSoft, since this provides evidence that STATISTICA Data Miner can handle a variety of very complex projects at least as well as its more expensive competitors.
5. StatSoft released a 90-day trial of SDM (on the enclosed DVD) for readers of the handbook. This is good promotion for StatSoft, since trials of the SPSS and SAS data miners are very hard to obtain.
6. The book is not only a tutorial for “black box” modeling. Its many statistical considerations, comparisons of the performance of diverse methods, discussions of the purpose of data preparation and imputation, etc. build an “analytic bridge” between classical statistics and data mining. This removes much of the distrust and uncertainty surrounding data mining, which is still sometimes perceived as manipulation of reality and “data fishing”.

I will be going through the handbook thoroughly, and I believe I will soon be able to add more specific findings. In the meantime, I look forward to yours…

Jiri Kubasek, StatSoft CR


Replies to This Discussion

Hi Jiri,

It has been a great resource as a learning/teaching aid (data sets, 90-day trial, overview of available data mining software). We had it validated by a few top B-schools and eminent academicians in India and received good feedback. As a result, we have acquired and migrated a few academic customers in India.

We really appreciate this and look forward to further initiatives that build on this brand/concept visibility in the near future.

Biswajit - StatSoft India.

Hi Biswajit,

This is Kiran from India - Is there an Indian version of the book that we can buy at a lower price?
Is there a way I can get a trial version of Statistica Data Miner/Text Miner in India? Will I be able to buy a version at a discounted price?

No. Were this book to manage such a feat, it would have to be addressed to a far broader, less technical audience and be full of examples of what data mining is and can do for us. To date, there are hundreds of programmers, maybe even thousands, for every specifically titled data miner in any organization.
Management rarely values analysis in crises, and the research group is always the first one dismissed from start-ups when revenue is late!

Data mining is a huge mystery to almost everyone who isn't a practitioner. Management likes certainty, even in their risk models. And we all know how well that works out, don't we? David Li managed it with his work on risk reduction on Wall Street, which ended up giving us all the credit default swap and at least a 25% hit on our collective portfolios, along with a multibillion-dollar federal bailout of immoral and incompetent business practices.

No, data mining isn't as prestigious as it is mystifying, as always. Last decade it was data miners. In Silicon Valley, if you are a data miner and not a trained machine learning student, forget it. SV loves machine learning now for the same reason it loved data mining in the late '90s/early '00s. They hope it will be their magic formula, and when it's not, they lose interest quickly!

Has machine learning paid off yet for anyone? Has anyone in SV managed to promote their long tail from a 0.2% response rate to a 20% response rate? No. Did we find any terrorists this way? Foil any terrorist plots? The word is "NO" there, too. Every "terror" bust that stuck came from good old-fashioned police work (snitches and surveillance), while the ones that ended up tied in a legal Gordian knot originated with "suspicious behavior detected online". Data mining for terrorists was a total bust as a national effort.

Which didn't exactly help "Data Mining Prestige" a whole lot either, did it?

I know this sounds bitter, but it is true. The world hates statistics since they always see it as a way to "never say you are certain".

Which is true.

The problem is not statistics or data mining being "too hard to understand"; it is that the USA's "positive thinking" influence on the rest of the world has managed to convince everyone that "anything is possible" if you try hard enough. That is a nice mantra for teaching kindergartners, but it has led us as adults to unsustainable technologies out the wazoo.

Read about that phenomenon here; Barbara Ehrenreich has touched on something we see in our own industry: an inability to deal with uncertainty in a responsible fashion!

"Data mining is a huge mystery to almost everyone who isn't a practitioner."

I definitely agree. Even practitioners are surprised from time to time by black-box effects that cannot be understood because, e.g., the model allows only restricted insight or the cause is beyond the scope of the data miner (worse-than-failure errors in data collection).

The First Certainty Principle: C ~ 1/K; certainty is inversely proportional to knowledge.
A person who really understands data and analysis will understand all the pitfalls and limitations, and hence be constantly caveating what they say. Somebody who is simple, straightforward, and 100% certain usually has no idea what they are talking about.

This principle is (in my limited experience) indeed hard to explain to non-techies.

@the book: I considered buying it, but frankly the use of commercial software for the examples is a huge obstacle (a 90-day trial, and then what?). I do not expect the authors to re-code all the algorithms, but it would be nice if they could use one of the widely used free data mining plugin frameworks.

Dear Timothy,
thank you very much for your response. I spent some time with a dictionary to learn new vocabulary :))).
I translated almost everything except "wazoo". Sorry for my poor English, but what does it mean?

There are many interesting and true ideas in your post, for example the USA's exaggerated lust for positivity. I have many mixed feelings about Americans too (I am not going to describe them here, since I work for an American company :))).

On the other hand, I am afraid we have different definitions of data mining in mind. In my view, machine learning is one component of the data mining philosophy. I understand data mining in a very broad sense, as do the authors of the handbook I praised here.

Did you read this book? Do you really think this book is not for a broad audience? And what do you think about the tutorials? There are many real examples and data sets. On the other hand, I agree: many tutorials are too schematic and finished too quickly, leaving out many analyses for which the data look promising. Moreover, many screenshots are ugly.

Despite all this, I still believe this book does a very good job for a wide range of analysts and data miners.

Although the "inevitability of uncertainty" is still hard for many people to accept (not only in the USA), things are getting better, I believe :). For example, several days ago a 20-minute documentary about data mining was broadcast on Czech TV. That is a good sign. Another one: IBM is recognizing the power of modern statistics, so they acquired SPSS... What do you think?

It means "we have plenty". "Up the wazoo" means we have so much of something that it is coming out of our collective "wazoos". It's an indelicate reference.

I didn't read the book, but I have read so many that they all seem the same. With all due respect to the author, who likely worked hard on it and is a great data miner himself, the fact is that specialization has been a human commodity since long before the apprentice/journeyman/master system came along in your neck of the woods in 1500 or so! Expecting "the many" to be as good at something that currently only "the few" have mastered, or even care about doing right, is unrealistic.

IBM bought "my" software, LikeMinds, in 2000 and promptly lost all aspects of its particular analytical abilities in order to chain it to their personalization rules engine, then buried it deep inside WebSphere Portal Server. Intelligent Miner disappeared into the DB2 suite of "analytics", and if anyone expects to see SPSS alive in 10 years, raise your hand! ;D After IBM got ahold of LikeMinds, our 30 ms return turned into 15 minutes in the IBM Java layer in WebSphere Developers Suite. Yikes!

So, I am a great fan of data mining and human behavior prediction, despite how fickle and unpredictable we all are to boot. I trained in it so I can see it done right. I predict that if "amateurs" use this tool, they'll only get amateurish results!

thanks for your reply. I think this site can be quite good for discussions like this!

Thanks for the positive impressions. Those were many of our goals for the book.
One clarification: how book owners can download free trials of SAS Enterprise Miner and SPSS Clementine is explained on page xxii.

