Over 100 years ago, the great science fiction writer H. G. Wells was credited with saying, "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read or write." This statement is probably truer today than ever, as Big Data and Analytics are paraded through every aspect of life, business, government, and social media experience. Statistical thinking is the bedrock of data science, and statistics is a core methodology for many disciplines, including experimental science, operations research, decision sciences, and marketing research. Yet many appear to have forgotten this (or maybe have let it "slip their mind"); see the recent article by the American Statistical Association (ASA) President, Dr. Marie Davidian: "Aren't We Data Science?" As we read it, we should also remember that Data Science includes several core methodologies (disciplines): machine learning (data mining), visualization, data management (including data structures, indexing, modeling, taxonomies), applied mathematics, semantics (ontologies), and application-specific discipline science, as well as the original core "data science" of statistics!
Consequently, it is wise for us to avoid the pitfalls that await us if we ignore the tenets and truths of statistics. Some of these "truisms" include:
Read more about these specific examples in the full article "Statistical Truisms in the Age of Big Data" at http://www.statisticsviews.com/details/feature/4911381/Statistical-...
Comment
This is true. It's not hard to foresee that statistics will one day lead us to live the stories told by I. Asimov. :)
@Vincent, thanks for the links to your two articles. You have really clarified the issues here. As a small counter-example, I should say that there are two well-known statisticians in my department at GMU who have called themselves Data Scientists for decades, and yet they are very respectful of the "new data science" and have graciously welcomed the invasion by this astronomer into their territory. It is an excellent, positive working relationship, which I appreciate every day, in which statisticians and "modern" data scientists work side by side effortlessly and productively. Your articles clearly suggest that not all circumstances are as productive or as enlightened as mine, and consequently we still have a ways to go in this big data revolution.
I think the problem is two-fold:
1) Statisticians have not been involved in the big data revolution. Some have written books such as applied data science, but it's just a repackaging of very old stuff, and has nothing to do with data science. Read my article on fake data science, at http://www.analyticbridge.com/profiles/blogs/fake-data-science
2) Methodologies that work for big data sets, as big data was defined back in 2005 (20 million rows would qualify back then), fail miserably on post-2010 big data (terabytes). Read my article on the curse of big data, at http://www.analyticbridge.com/profiles/blogs/the-curse-of-big-data
As a result, people think that data science is just statistics, with a new name. They are totally wrong on two points: they confuse data science and fake data science, and they confuse big data 2005 and big data 2013.