A Data Science Central Community
BIG DATA and TRUE PREDICTIVE ANALYTICS are indeed turning the world 180 degrees (upside down, so to speak).
But is BIG DATA the "answer to everything"? I find that "QUALITY DATA" is what is most important. Sometimes, from a BIG DATA set, we can extract one or more smaller "QUALITY SUB-DATASETS" that are extremely valuable. But other times, and maybe most of the time, these "QUALITY DATASETS" are found lurking in other places, like research labs... Have any others had a similar experience?
I strongly agree. The point is that information is not just a bunch of data. Dive blindly into the haystack and all you get is a bunch of hay. Stack it however you want, it is still hay. What is the first thing you learn as a statistician? Define your objective!! Sometimes you can explore data and discover possibilities, but those need to be tested in a scientific way.
I've worked with engineers most of my career, and their definition of statistics is some kind of computation. Or even a blind belief, where data is collected to support that belief and data that counters it is discarded. Similar problem with management. OMG, last week's sales were down. What will we do? What will we do? Ever heard of randomness? Who has read The Black Swan? Data analysis can tell you about past behaviors, but it cannot forecast unless the underlying factors remain the same. Change is always there!!
An example from a long time ago: polio and hot weather were correlated. Sure they were. But did hot weather cause polio? That's the kind of conclusion that digging in the haystack of data can easily result in. There must be a substantive objective in order to use data and create information. Someone determined that a virus caused polio, then ran an experiment to create supporting evidence. All those correlations, regressions, clusterings and trees don't by themselves result in new knowledge. They can describe what is going on. That adds to human knowledge, but it can fail to get to actual causes without a testable hypothesis, that is, an objective. Cross validation is part of it. But it is only part of it.
At the same time, a statistician is not necessarily familiar with the problems and perils of 'big data'. These are separate fields that can be combined in one data scientist, or brought together when skilled people in allied fields collaborate.
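For anyone who wants to see the correlation-versus-cause point in numbers, here is a minimal sketch using only simulated, made-up data (nothing here is real polio or weather data). It assumes a hypothetical hidden factor that drives two measurements, `x` and `y`, which never influence each other. All names and constants are illustrative assumptions. The sketch compares the correlation seen in the raw data, the correlation once the hidden factor is held fixed, and the correlation when `x` is assigned at random, as it would be in an experiment.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 2000

# Hypothetical hidden factor (1 = present, 0 = absent); invented for illustration.
hidden = [rng.randint(0, 1) for _ in range(N)]

# Both measurements depend on the hidden factor, never on each other.
x = [15 + 10 * h + rng.gauss(0, 3) for h in hidden]
y = [5 + 4 * h + rng.gauss(0, 1) for h in hidden]

# 1) Digging in the haystack: x and y look strongly related.
r_observed = pearson(x, y)

# 2) Hold the hidden factor fixed: the relationship should mostly vanish.
r_within = {}
for level in (0, 1):
    xs = [a for a, h in zip(x, hidden) if h == level]
    ys = [b for b, h in zip(y, hidden) if h == level]
    r_within[level] = pearson(xs, ys)

# 3) A simulated experiment: we assign x ourselves at random,
#    while y is still generated from the hidden factor only.
x_assigned = [rng.uniform(10, 30) for _ in range(N)]
y_after = [5 + 4 * h + rng.gauss(0, 1) for h in hidden]
r_experiment = pearson(x_assigned, y_after)

print("observed correlation:      ", round(r_observed, 3))
print("within hidden factor = 0:  ", round(r_within[0], 3))
print("within hidden factor = 1:  ", round(r_within[1], 3))
print("after random assignment:   ", round(r_experiment, 3))
```

In this toy setup the raw data shows a strong correlation even though `x` has no effect on `y` by construction. Only the stratified view and the randomized assignment reveal that, which is the role a testable hypothesis and an experiment play in the argument above.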