
Earthquake prediction through sunspots, part II: common data mining mistakes!

While writing the last post, I wondered how long it would take my followers to notice the mistakes I had introduced in the experiments.
Let's start the treasure hunt!
1. Don't blindly trust your data: it is often not homogeneous.

A good data miner must always check the dataset! You should always ask yourself whether the data have been produced in a consistent way.

I love the bubble chart because it is a really nice way to plot 3D data in 2D!
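One simple way to act on this advice is to test whether a series keeps the same level over time. The sketch below is a minimal illustration, not the check used in the original experiments: it splits a series at a chosen index and computes Welch's t statistic for the shift in mean. The function names, the split point and the threshold of 3 standard errors are all hypothetical choices, and the counts are synthetic.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(before, after):
    """Welch's t statistic for the difference in mean between two samples."""
    se = sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / se


def looks_inhomogeneous(series, split, threshold=3.0):
    """Hypothetical rule of thumb: flag the series if the mean before and
    after index `split` differs by more than `threshold` standard errors."""
    t = welch_t(series[:split], series[split:])
    return abs(t) > threshold, t


# Synthetic yearly counts (made up for illustration): the level jumps halfway
# through, as it might if the way events were recorded had changed.
counts = [10, 12, 11, 9, 10, 25, 27, 24, 26, 28]
flagged, t = looks_inhomogeneous(counts, split=5)
print(flagged, round(t, 1))
```

A large |t| only says that the two periods differ; it does not say why, so the next step is still to find out how the data were produced.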

2. Sampling the data: are you sampling your data correctly?

3. Don't rely on good results on the training set.
This is perhaps the worst joke I played on you in the last post :) I showed you very good results obtained with the support vector regression model.
The left graph shows the training set (in blue, the number of quakes per year; in red, the forecasting model).
The graph on the right shows the behavior of the forecasting model over a time range the system had never seen before. There the mean error is +/-17 quakes per year.
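The evaluation behind this point can be sketched as follows: split the series in time order, fit on the earlier part only, and report the error on both parts. This is a minimal sketch under stated assumptions: the least-squares line in `fit_line` is only a stand-in for the support vector regressor, the `fit(xs, ys) -> predict` interface and every function name are hypothetical, mean absolute error is assumed as the error measure, and the data are synthetic.

```python
from statistics import mean


def chronological_split(xs, ys, n_train):
    """Split paired series in time order: every test point is later than
    every training point, so the model never sees the future."""
    return (xs[:n_train], ys[:n_train]), (xs[n_train:], ys[n_train:])


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for any regressor."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def train_vs_test_mae(fit, xs, ys, n_train):
    """Return (training MAE, held-out MAE) for the model built by `fit`."""
    (tx, ty), (vx, vy) = chronological_split(xs, ys, n_train)
    predict = fit(tx, ty)
    train_mae = mean_absolute_error(ty, [predict(x) for x in tx])
    test_mae = mean_absolute_error(vy, [predict(x) for x in vx])
    return train_mae, test_mae


# Synthetic example: a perfect fit on the first 7 points says nothing
# about how the model behaves on the 3 later points.
xs = list(range(10))
ys = [0, 1, 2, 3, 4, 5, 6, 7, 20, 20]
print(train_vs_test_mae(fit_line, xs, ys, n_train=7))
```

Only the second number tells you something about forecasting; the first one only tells you how well the model remembers what it was shown.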

Just to get a better feel for how good the regressor is, I smoothed the data with a median filter.
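A running median replaces each point with the median of its neighbours, which removes isolated spikes while preserving steps in the series. The post does not say which window was used, so the sketch below assumes an odd window (5 by default) and truncates the window at the edges of the series; both are illustrative choices, and `median_filter` is a hypothetical name.

```python
from statistics import median


def median_filter(values, window=5):
    """Running median over an odd-sized window centred on each point.
    Near the edges the window is truncated to the points that exist."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


# The isolated spike (100) disappears from the output.
print(median_filter([1, 2, 100, 3, 4], window=3))
```

SciPy offers a ready-made equivalent (`scipy.signal.medfilt`), although it pads the edges with zeros instead of truncating the window.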

4. You found a good regressor, so the phenomenon has been explained: FALSE.

You can find some kind of "link" between totally independent phenomena, but that link is just a relation between input and output: nothing more, nothing less.
As you know, this is not the place for theorems, but let me give you a rule of thumb:
"The dependency among variables is inversely proportional to the complexity of the regressor."
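To see this rule of thumb at work, here is a sketch that deliberately fits two series generated independently of each other. The regressor is a 1-nearest-neighbour lookup, chosen only because it is as flexible as a regressor can get (it memorises the training set); it is not the model from the post, and the data are seeded random numbers, not sunspot or quake counts. On the training points it reproduces the output exactly, so its training error is zero, while the held-out error shows that there was nothing to learn.

```python
import random
from statistics import mean


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour regressor: predict the y of the closest training x.
    It can memorise any training set, however unrelated x and y are."""
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


rng = random.Random(0)
# Two synthetic series drawn independently: x has no influence on y.
xs = [rng.uniform(0, 200) for _ in range(60)]
ys = [rng.gauss(20, 5) for _ in range(60)]

predict = fit_nearest_neighbour(xs[:40], ys[:40])
train_mae = mean(abs(y - predict(x)) for x, y in zip(xs[:40], ys[:40]))
test_mae = mean(abs(y - predict(x)) for x, y in zip(xs[40:], ys[40:]))
print(f"training MAE: {train_mae:.2f}, held-out MAE: {test_mae:.2f}")
```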
