A Data Science Central Community
What separates Big Data from merely huge data? I have seen the term Big Data abused by database and BI vendors. Their usage lacks the spirit of the idea. Merely calculating faster or storing more should not qualify as Big Data thinking. That is a purely technological improvement, just like having a faster CPU. Would you call an Intel i7 CPU a change of era from an i5? Here are my observations and principles for what makes data Big Data:
1. Collect the impossible
With the emergence of e-government, more data is being digitized for common use. We should be bolder in collecting what seems trivial, using the latest technologies to capture emotion, movement and other things that used to exist only in the qualitative world. Sentiment has now become a data type through the use of language dictionaries and text mining technologies. Relationships have become identifiable through social network analysis.
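To illustrate how a language dictionary turns sentiment into a data type, here is a minimal sketch of dictionary-based scoring. The word list `LEXICON` and the function `sentiment_score` are hypothetical names made up for this example; a real sentiment dictionary would be far larger and would need to handle negation and context.

```python
import re

# Hypothetical toy dictionary: each word carries a sentiment weight.
LEXICON = {"good": 1, "great": 2, "happy": 1, "bad": -1, "terrible": -2, "angry": -1}

def sentiment_score(text):
    """Sum the dictionary weights of the words in the text; unknown words count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)

# Free-text comments become a numeric column that can sit next to other data.
comments = ["The service was great", "Terrible queue, I am angry"]
scores = [sentiment_score(comment) for comment in comments]
```

Each comment now has a number attached to it, which is what allows sentiment to be stored and analyzed like any other field.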
2. Bridge the unknown using prediction
Where there is a gap in the available data, in my experience we can still bridge it using predictive models. If demographic census data is available only every three years, why not use interpolation to simulate the momentum in between, given the patterns observed?
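As a rough sketch of the idea, the snippet below fills the years between triennial census counts using straight-line interpolation. Linear interpolation is only the simplest possible assumption; a model that follows observed patterns more closely could replace it. The function name `interpolate_yearly` and the population figures are invented for illustration.

```python
def interpolate_yearly(observations):
    """Fill every year between known observations by linear interpolation.

    observations maps a year to a measured value, e.g. a census count.
    """
    years = sorted(observations)
    filled = {}
    for start, end in zip(years, years[1:]):
        # Constant yearly change between two consecutive observations.
        step = (observations[end] - observations[start]) / (end - start)
        for year in range(start, end):
            filled[year] = observations[start] + step * (year - start)
    filled[years[-1]] = observations[years[-1]]
    return filled

# Hypothetical census counts taken every three years.
census = {2010: 1000.0, 2013: 1300.0, 2016: 1450.0}
yearly = interpolate_yearly(census)
```

The census years keep their measured values, and the years in between receive estimates that can be replaced whenever actual data becomes available.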
3. Emulate people's thinking process
In a nutshell, a predictive model simply copies a human's thinking process and the factors that person considers. The difference is that a machine can only catch what has been digitized. The gap between human and machine is narrowing. Most models simply handle more data points than a normal human being could, assuming a perfectly rational person were doing the job.
4. Harmonize common data points as databases
Suppose that three times out of five, the people who perform the same stock valuation consider the same factor. Then why not put it in the model as a column? In that case, you have to build a regular routine to collect the data and store it properly in your database.
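One way to apply the three-out-of-five rule is to count how often each factor appears across analysts' valuations and promote the frequent ones to database columns. The sketch below assumes each valuation is recorded as a list of factor names; `factors_to_promote`, the 0.6 threshold and the sample factors are all hypothetical.

```python
from collections import Counter

def factors_to_promote(analyses, threshold=0.6):
    """Return the factors used in at least `threshold` of the analyses, sorted by name."""
    # set() ensures a factor is counted once per analysis, even if listed twice.
    counts = Counter(factor for factors in analyses for factor in set(factors))
    total = len(analyses)
    return sorted(f for f, count in counts.items() if count / total >= threshold)

# Hypothetical factor lists from five analysts valuing the same stock.
analyses = [
    ["earnings", "debt"],
    ["earnings", "debt", "weather"],
    ["earnings"],
    ["earnings", "debt"],
    ["dividend"],
]
columns = factors_to_promote(analyses)
```

Here "earnings" (four of five) and "debt" (three of five) qualify as columns, while the factors used only once do not.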
5. Predict one-level lower behavior
One vendor-side consultant asked me how to predict a manufacturing company's sales demand. I told him: predict the demand of the manufacturer's customers. There is no better predictor than the fundamental driver of your customers' behavior. With more data being supplied, we can do a better job.
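A very simplified sketch of predicting one level lower: instead of extrapolating the manufacturer's own sales, forecast each customer's orders from that customer's expected growth and add them up. The field names `last_orders` and `expected_growth`, and all the figures, are assumptions made for this example; in practice each customer's forecast would come from its own model.

```python
def forecast_manufacturer_demand(customers):
    """Sum the expected orders of every customer.

    Each customer's expected orders are its last orders scaled by the
    growth forecast for that customer's own market.
    """
    return sum(c["last_orders"] * (1 + c["expected_growth"]) for c in customers)

# Hypothetical customers of the manufacturer.
customers = [
    {"name": "Retailer A", "last_orders": 100, "expected_growth": 0.10},
    {"name": "Retailer B", "last_orders": 200, "expected_growth": -0.05},
]
demand = forecast_manufacturer_demand(customers)
```

The manufacturer's forecast then moves with the drivers of its customers' behavior rather than with its own past sales alone.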
I do agree with the notion that half of the jobs can be computerized (Link). According to The Economist, the real benefits of big data are yet to be unveiled: they will come only after the existing Big Data-illiterate workforce has been replaced, which will take a generation to materialize.