A Data Science Central Community
While I was writing the last post, I was wondering how long it would take my followers to notice the mistakes I introduced in the experiments.
Let's start the treasure hunt!
1. Don't always trust your data: often they are not homogeneous.
A good data miner must always check the dataset! You should always ask yourself whether the data have been produced in a consistent way.
I love the bubble chart because it is a really nice way to plot 3D data in 2D!
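One quick way to ask whether the data were produced in a consistent way is to look at the smallest magnitude recorded in each period: if that floor changes a lot over time, the recording method probably changed too. The snippet below is only a minimal sketch of that idea. The function name `min_magnitude_by_period` and the toy records are made up for illustration; they are not the dataset or the check used in the experiments.

```python
def min_magnitude_by_period(records, period=10):
    """Return {period_start_year: smallest magnitude recorded in that period}.

    `records` is an iterable of (year, magnitude) pairs.
    """
    lowest = {}
    for year, magnitude in records:
        start = (year // period) * period  # first year of the period
        if start not in lowest or magnitude < lowest[start]:
            lowest[start] = magnitude
    return dict(sorted(lowest.items()))


# Toy, made-up records (year, magnitude), for illustration only.
toy_records = [
    (1901, 6.8), (1905, 7.1), (1908, 6.5),
    (1952, 5.2), (1957, 6.0),
    (2003, 2.9), (2006, 4.4), (2009, 3.1),
]

floors = min_magnitude_by_period(toy_records, period=50)
print(floors)
```

In this toy example the floor drops from one period to the next, which is the kind of pattern that should make you suspect the dataset is not homogeneous: small events may simply not have been recorded in the older periods.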
2. Sampling the data: are you sampling your data correctly?
The left graph shows the training set (in blue the number of quakes per year, in red the forecasting model).
The graph on the right shows the behavior of the forecasting model over a time range never seen before by the system. The mean error is +/-17 quakes per year.
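The evaluation described here (fit on one time range, test on a later range the model has never seen, report the mean error) can be sketched as follows. This is a minimal sketch under stated assumptions: the "mean error" is taken to be a mean absolute error, the yearly counts are made up, and the "model" is just the training mean standing in for the real regressor, which is not shown here. The names `chronological_split` and `mean_absolute_error` are illustrative.

```python
def chronological_split(series, split_year):
    """Split {year: count} into (train, test) lists of (year, count) pairs.

    Everything before `split_year` is training data; the rest is test data,
    so the test range is never seen during fitting.
    """
    ordered = sorted(series.items())
    train = [(y, c) for y, c in ordered if y < split_year]
    test = [(y, c) for y, c in ordered if y >= split_year]
    return train, test


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy, made-up yearly counts, for illustration only.
toy_counts = {2000: 100, 2001: 120, 2002: 110, 2003: 130, 2004: 90}

train, test = chronological_split(toy_counts, split_year=2003)

# Stand-in "model": always predict the training mean (an assumption,
# not the regressor from the post).
baseline = sum(c for _, c in train) / len(train)

actual = [c for _, c in test]
mae = mean_absolute_error(actual, [baseline] * len(actual))
print(f"mean absolute error: +/-{mae:.1f} quakes per year")
```

Splitting by year rather than at random matters for a time series: a random split lets the model see years on both sides of every test point, which tends to make the error look better than it would be on genuinely future data.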
Just to get a better feeling for how good the regressor is, I smoothed the data with a median filter.
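For readers who have not used one, a median filter replaces each point with the median of its neighbours, which removes isolated spikes while keeping the overall trend. Below is a minimal sketch, not the exact filter used for the plots: the window size, the way the edges are handled (the window simply shrinks), and the toy series are all assumptions.

```python
from statistics import median


def median_filter(values, window=3):
    """Smooth `values` with a sliding median of odd width `window`.

    Near the edges the window shrinks to the available points.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


# Toy, made-up series with one spike, for illustration only.
toy_series = [10, 12, 90, 11, 13, 12]
smoothed = median_filter(toy_series, window=3)
print(smoothed)
```

Note how the spike of 90 disappears from the smoothed series. That is useful for reading a trend, but it also means the smoothed plot can make a regressor look better than it is on the raw data.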
4. You found a good regressor, so the phenomenon has been explained: FALSE