# AnalyticBridge

A Data Science Central Community

In my opinion, there is none. If you graduated with a degree in stats, you call it computational statistics. If you graduated with a degree in computer science, you call it data mining. In both cases, it's about processing large data sets using statistical techniques. Do you have a different opinion?


### Replies to This Discussion

Re the idea of meaninglessness: Blind signal processing is the analysis of data in which one doesn't know what components are there, or what they mean. In contrast, recognition techniques such as speech recognition and pattern recognition are used when one is searching for particular features.
I always thought there was a huge difference, but now I'm not so sure. I am assisting with a random forest implementation right now. It's a bit off-piste for me, although ironically, a long time ago I was involved in early data mining experiments, which were not as methodologically sound as such experiments are today. Anyway, implementing this random forest application makes clear to me that, not only from the technology perspective but also in terms of the math and the visualisation potential, data mining and statistical computing are asymptotic, to use an odd metaphor. There are other current trends pointing that way, particularly in 'semantic integration' and optimised search spaces, in my view, for what it's worth.
I would say statistical computing is a confirmatory technique and data mining is an explainratory technique.
Did you mean exploratory?
I feel "data mining" is nothing but "data analytics" aided by "computational statistics". You need both to actually mine for knowledge. For instance, market basket analysis is a type of data analytics that requires computational statistics, such as probability and regression measures, to produce knowledge.
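To make the market basket point concrete, here is a minimal sketch of the probability measures usually attached to an association rule (support, confidence, lift). The transactions and the function name `rule_metrics` are invented for illustration; they are not from any particular tool.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if antecedent in t)
    count_c = sum(1 for t in transactions if consequent in t)
    count_both = sum(
        1 for t in transactions if antecedent in t and consequent in t
    )
    support = count_both / n                      # P(A and C)
    confidence = count_both / count_a if count_a else 0.0  # P(C | A)
    # Lift compares P(C | A) with the baseline P(C); 1.0 means independence.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


support, confidence, lift = rule_metrics(transactions, "bread", "milk")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.4f}")
```

On this toy data the lift for bread -> milk comes out slightly below 1, so the "pattern" is no stronger than chance, which is exactly the kind of thing the statistics part is there to tell you.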
Statistics has a strong set of principles based on the axioms of probability. It is a subject where the random sample is the starting point. The basic difference between statistics and data mining is in the way data is generated for a study. When you have a problem you propose to study, you define the associated population, choose a method of sampling, draw the sample, and then use the different statistical methods of computing to infer. In laboratory experiments, data is generated using the principles of experimental design, such as randomization, replication and local control, and appropriate statistical computing is then used to infer. In data mining, by contrast, the data is already available; you try to identify patterns and use them for your requirement. In fact, data mining has been termed 'dirty statistics'. What is required is to use the pattern to propose experiments and generate new data; that then leads back to statistical computing.
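The sampling workflow described above (define a population, draw a random sample, infer) can be sketched roughly as follows. The population is simulated, and the function name `sample_mean_ci` and the normal-approximation interval with z = 1.96 are assumptions made for this illustration, not a prescribed method.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated measurements, invented for illustration.
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]


def sample_mean_ci(population, n, rng, z=1.96):
    """Draw a simple random sample and return (mean, (low, high)).

    The interval is a normal-approximation confidence interval for the
    population mean; z = 1.96 corresponds to roughly 95% coverage.
    """
    sample = rng.sample(population, n)  # sampling without replacement
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, (mean - z * standard_error, mean + z * standard_error)


mean, (low, high) = sample_mean_ci(population, 200, rng)
print(f"estimated mean {mean:.2f}, approx. 95% CI ({low:.2f}, {high:.2f})")
```

The inference is only justified because the sample was drawn at random from a defined population; with data that was simply "available", the same arithmetic runs but the interval loses its meaning.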
Strange, my answer disappeared; here it is again:
Data mining is everything that statistics is not, conceptually and practically.
1. Data mining's role is to define hypotheses, to be tested later by statistics.
2. DM is algorithms, while statistics is a set of mathematical formulas.
3. DM is made for exploring actual data; statistics is made for a controlled "lab" environment.
4. Statistics needs knowledge to be provided. DM generates it.
5. DM, as an exploration tool, does not necessarily require the target to be stated at the start. Statistics requires a target function. The same goes for the interrelations among variables: DM does not assume them upfront, while statistics requires them to be clearly defined.
6. Statistics (if the required conditions are satisfied) produces an optimum; DM doesn't.
7. DM knows how to handle complexity, incomplete data, dynamics, and an unknown population mix. Statistics is much more restricted in these respects.
8. In addition, GT type of data mining can observe rare correlations (such as irregularities, mutations, etc.). Statistics is blind by definition to any effect that is not known a priori... That is why data mining was invented.

Edith
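One way to picture the "DM proposes, statistics tests" division of labour in points 1 and 5 is the sketch below: scan all variable pairs on one half of the data with no target stated upfront, then test the single pair that was found on a held-out half. Everything here is an assumption for illustration: the simulated data, the helper names `explore` and `confirm`, and the choice of Pearson correlation with a permutation test.

```python
import random
import statistics
from itertools import combinations


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def explore(rows, names):
    """Mining step: scan every column pair and return the one with largest |r|."""
    cols = {name: [row[i] for row in rows] for i, name in enumerate(names)}
    return max(
        combinations(names, 2),
        key=lambda pair: abs(pearson(cols[pair[0]], cols[pair[1]])),
    )


def confirm(rows, names, pair, rng, n_perm=1000):
    """Testing step: permutation p-value for the chosen pair on held-out rows."""
    i, j = names.index(pair[0]), names.index(pair[1])
    xs = [row[i] for row in rows]
    ys = [row[j] for row in rows]
    observed = abs(pearson(xs, ys))
    shuffled = ys[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: column "b" depends on "a"; "c" and "d" are pure noise.
rng = random.Random(42)
names = ["a", "b", "c", "d"]
rows = []
for _ in range(400):
    a = rng.gauss(0.0, 1.0)
    rows.append((a, a + rng.gauss(0.0, 0.5), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))

explore_rows, holdout_rows = rows[:200], rows[200:]
pair = explore(explore_rows, names)                    # hypothesis generated
p_value = confirm(holdout_rows, names, pair, rng)      # hypothesis tested
print(pair, p_value)
```

Testing on rows that the exploration step never saw matters: a pair picked because it looked strongest will tend to look strong again on the same rows, so the p-value would be flattering if both steps shared the data.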

Not exactly: to get internal statistics you first need to process the texts, parsing them to obtain phrases by clause, for sentences and paragraphs. Only after that do you calculate weights, where a weight refers to the frequency with which a context phrase occurs in relation to other context phrases.

Therefore, statistics is one of two components and the result.
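A very rough sketch of that two-step idea (parse first, weigh second) might look like this. The splitting rules, the decision to treat each clause as one "phrase", and the names `split_clauses` and `phrase_weights` are all simplifying assumptions for illustration; a real parser would be far more involved.

```python
import re
from collections import Counter


def split_clauses(text):
    """Step 1 (parsing): paragraphs -> sentences -> clauses, normalised to lowercase."""
    clauses = []
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in re.split(r"[.!?]+", paragraph):
            for clause in re.split(r"[,;:]", sentence):
                phrase = " ".join(clause.lower().split())
                if phrase:  # skip empty fragments left by the splits
                    clauses.append(phrase)
    return clauses


def phrase_weights(text):
    """Step 2 (statistics): frequency of each phrase relative to all phrases."""
    counts = Counter(split_clauses(text))
    total = sum(counts.values())
    return {phrase: count / total for phrase, count in counts.items()}


# Hypothetical sample text, invented for illustration only.
sample_text = (
    "Data mining finds patterns, statistics tests them. "
    "Data mining finds patterns; analysts act.\n\n"
    "Statistics tests them."
)
weights = phrase_weights(sample_text)
for phrase, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{weight:.2f}  {phrase}")
```

The weights only exist after the parsing has been done, which is the sense in which the statistics comes second.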

Hello Ilya.

Could it be that you are talking about text mining? I was referring to the alphanumeric type of data mining. Anyway, the data mining task is the same in both, i.e. to search for hidden patterns of behavior or underlying connections. In other words, DM means searching for new hypotheses and should come BEFORE the statistical test. Therefore DM uses different methods than statistics does.

Edith