
The End of Theory: The Data Deluge Makes the Scientific Method Obsolete | Wired

Here's my rebuttal to this article published in Wired in 2008.

Vincent's rebuttal:

A lot can be done with black-box pattern detection, where patterns are found but not understood. For many applications (e.g. high frequency trading) that is fine, as long as your algorithm works. In other contexts (e.g. root cause analysis for cancer eradication), deeper investigation is needed for higher success. And in all contexts, identifying and weighting the true factors that explain the cause usually allows for better forecasts, especially if good model selection, model fitting and cross-validation are performed (see the sketch below). But if advanced modeling requires paying a statistician's salary for 12 months, the ROI may turn negative, and black-box brute force performs better, ROI-wise.

In both cases, whether you care about causes or not, it is still science - indeed, it is data science - and it includes an analysis to figure out when and whether deeper statistical science is really required. In all cases, it still involves cross-validation and design of experiments. Only the theoretical statistical modeling aspect can be ignored. Other aspects, such as scalability and speed, must also be considered, and that is science too: data and computer science.
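This is not Vincent's code, just a minimal sketch of what "model selection, model fitting and cross-validation" can look like in practice, assuming scikit-learn and a hypothetical tabular dataset X, y (synthetic data stands in here):

```python
# Minimal sketch: comparing an interpretable model against a black-box model
# with the same cross-validation check, on stand-in synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; in a real application X, y come from the problem domain.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),          # interpretable "true factor" model
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),  # black-box baseline
}

# 5-fold cross-validation: the same check applies whether or not
# you care about the underlying causes.
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Whether the interpretable model or the black box wins this comparison is exactly the kind of ROI question the rebuttal describes.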


Comment by Stephen L R Ellison on May 28, 2013 at 10:46am

Dang, but there's more wrong with it than that.

For a start, the 'data deluge' is terribly domain specific. We aren't going to get huge clinical trial datasets because a) it's too expensive and b) it'd be unethical to use more subjects than absolutely necessary. Those considerations rule out 'big data' approaches for a huge fraction of research.

Second, if you want a prediction, you'll need a theory about how the future depends on the past. No theory implies no prediction. It may be a simpler theory if it has strong backing from observation, but it's still a theory.

And third, the scientific method is about finding underlying causes or models for the way the world works; I sincerely doubt that the basic process of checking one's theory using data is going to die just because we have more data.

 

Comment by Swamp Wizard on May 18, 2013 at 1:41pm

Doesn't data science involve a heavy dose of statistics?

Comment by David Robinson on May 14, 2013 at 3:42pm

When I saw the subject header I was going to jump in here and start some rant in response. Clearly I was mistaken, but I still have some issue with the idea that simply running a black-box analysis package to identify patterns constitutes 'data science'. There has to be enough foundation in statistics to recognize when either your black box isn't working any more or something has gone awry in your data.
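To make that last point concrete, here is a rough, illustrative sketch (not part of the comment) of one way to notice that "something has gone awry in your data", assuming SciPy is available; the feature values and the p-value threshold are made-up examples:

```python
# Sketch: compare the distribution of incoming data against the data the model
# was trained on, so a silent failure of the black box does not go unnoticed.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)  # feature values seen at training time
new_feature = rng.normal(loc=0.5, scale=1.0, size=1000)    # same feature in new incoming data

# Kolmogorov-Smirnov test: a small p-value suggests the new data no longer
# looks like the training data, and the model's learned patterns may no longer hold.
stat, p_value = ks_2samp(train_feature, new_feature)
if p_value < 0.01:
    print(f"Possible data drift detected (KS statistic {stat:.3f}, p={p_value:.4f})")
else:
    print("No strong evidence of drift on this feature")
```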
