
Big Data thinking - What makes Big Data different?

What separates Big Data from merely huge data? I have seen database and BI vendors abuse the term Big Data, and their usage lacks its spirit. Merely calculating faster or storing more should not qualify as Big Data thinking. That is purely a technological breakthrough, just like having a faster CPU. Would you call an Intel Core i7 a new era compared with an i5? Here are my observations and principles of Big Data thinking:

1. Collect the impossible

With the emergence of e-government, more data are being digitized for common use. We should be bolder in collecting what seems trivial, using the latest technologies to capture emotion, movement and other things that used to exist only in the qualitative world. Sentiment has now become a data type, thanks to language dictionaries and text mining technologies. Relationships become identifiable through social network analysis.
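To make "sentiment becomes a data type" concrete, here is a minimal sketch of dictionary-based sentiment scoring. The lexicon, its weights, the one-word negation rule and the sample comments are all hypothetical assumptions for illustration; a real project would use a curated sentiment dictionary and a proper tokenizer.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight.
LEXICON = {
    "good": 1.0, "great": 2.0, "happy": 1.5,
    "bad": -1.0, "terrible": -2.0, "angry": -1.5,
}
# Words that flip the polarity of the word directly after them.
NEGATORS = {"not", "never", "no"}


def sentiment_score(text: str) -> float:
    """Sum the lexicon weights of the words in text, flipping the sign
    of any word that directly follows a negator (e.g. "not good")."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    return score


comments = [
    "The new service is great and staff were happy to help",
    "Queue was terrible, not good at all",
]
for c in comments:
    print(f"{sentiment_score(c):+.1f}  {c}")
```

Each free-text comment becomes a number that can sit in a database column next to conventional structured data.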

2. Bridge the unknown using prediction

If there is a gap in the available data, in my experience we can still bridge it using predictive models. For example, if demographic data come only from a triennial census, why not use interpolation to simulate the trend between census years based on the observed pattern?
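A minimal sketch of the idea, assuming the simplest possible model: straight-line interpolation between census years. The census years and population figures below are made up for illustration, and a real study might prefer a growth-rate or spline model that better captures momentum.

```python
from bisect import bisect_right


def interpolate(years, values, target_year):
    """Linearly interpolate a value for target_year from observations
    taken in the given years (years must be sorted ascending)."""
    if not years[0] <= target_year <= years[-1]:
        # This sketch only fills gaps; it does not extrapolate beyond the data.
        raise ValueError("target_year is outside the observed range")
    i = bisect_right(years, target_year)
    if i == len(years):
        # target_year equals the last observed year.
        return float(values[-1])
    y0, y1 = years[i - 1], years[i]
    v0, v1 = values[i - 1], values[i]
    # Move from v0 towards v1 in proportion to the elapsed years.
    return v0 + (v1 - v0) * (target_year - y0) / (y1 - y0)


# Hypothetical triennial census counts for one district.
census_years = [2005, 2008, 2011]
population = [100_000, 106_000, 109_000]

for year in range(2005, 2012):
    print(year, round(interpolate(census_years, population, year)))
```

The in-between years (2006, 2007, 2009, 2010) are estimates, not observations, so it is worth flagging them as such wherever they are stored.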

3. Emulate people's thinking process

In a nutshell, a predictive model simply copies a human's thinking process and the factors that person considers. The difference is that a machine can only capture what has been digitized. The gap between human and machine is narrowing: most models simply handle more data points than a normal human being could, assuming a perfectly rational person were doing the job.

4. Harmonize common data points as databases

Suppose that three times out of five, the people doing the same stock valuation consider the same factor. Then why not put that factor into the model as a column? Doing so means you have to build a regular routine to collect the data and store it properly in your database.
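The rule of thumb above can be sketched as a frequency count: record which factors each analyst considered, and promote any factor used in at least three out of five valuations to a database column. The factor names and analyst records are hypothetical, and the 3/5 threshold is simply the ratio from the text used as a default parameter.

```python
from collections import Counter
from fractions import Fraction


def factors_to_promote(valuations, threshold=Fraction(3, 5)):
    """Return, sorted by name, the factors that appear in at least
    `threshold` of the valuations. Each valuation is the set of
    factors one analyst considered."""
    counts = Counter(f for factors in valuations for f in set(factors))
    n = len(valuations)
    return sorted(f for f, c in counts.items() if Fraction(c, n) >= threshold)


# Hypothetical records of five analysts valuing the same stock.
valuations = [
    {"pe_ratio", "revenue_growth", "debt_ratio"},
    {"pe_ratio", "revenue_growth", "management_quality"},
    {"pe_ratio", "dividend_yield"},
    {"revenue_growth", "debt_ratio", "pe_ratio"},
    {"debt_ratio", "market_share"},
]

print(factors_to_promote(valuations))
```

Exact fractions are used so that "three out of five" compares exactly rather than through floating-point rounding.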

5. Predict one-level lower behavior

A vendor-side consultant once asked me how to predict a manufacturing company's sales demand. I told him: predict the demand of the manufacturer's customers. There is no better predictor than rooting out the fundamental drivers of your customers' behavior. Now that such data can be supplied, we can do a better job.
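A minimal sketch of the one-level-lower idea, assuming a component manufacturer whose customers each build a product that uses a fixed number of its components. The customer names, forecasts and usage ratios are hypothetical; in practice each customer's forecast would come from a model of that customer's own demand drivers.

```python
# Hypothetical customers of a component manufacturer. For each one:
#   forecast_units      = predicted sales of the customer's own product next quarter
#   components_per_unit = how many of our components go into one unit of that product
customers = {
    "customer_a": {"forecast_units": 12_000, "components_per_unit": 4},
    "customer_b": {"forecast_units": 5_000, "components_per_unit": 2},
    "customer_c": {"forecast_units": 800, "components_per_unit": 10},
}


def derived_demand(customers):
    """Translate each customer's own forecast into demand for our components."""
    return {
        name: c["forecast_units"] * c["components_per_unit"]
        for name, c in customers.items()
    }


breakdown = derived_demand(customers)
total = sum(breakdown.values())
for name, units in breakdown.items():
    print(f"{name}: {units} components")
print(f"total: {total} components")
```

Because the forecast is built per customer, a change in one customer's outlook shows up directly in the manufacturer's number instead of being hidden inside its aggregate sales history.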

I agree with the notion that half of all jobs can be computerized (Link). According to The Economist, the real benefits of big data have yet to be unveiled. They will come only after the existing Big Data-illiterate workforce has been replaced, which will take a generation.


Comment by Redouane F. on April 17, 2014 at 11:02am

I like your point number 3. I would like to add this.

The brain is the center of the nervous system, able to exert control over all other organs in the body. Its neural tissue is layered and folded in a way that maximizes surface area, which houses over 100 billion nerve cells. By exchanging electrical and chemical signals among themselves, these specialized cells, called neurons, can recognize a face from 150 feet away, hold a lively conversation and remember over 70 years’ worth of memories, ready for access at a moment’s notice.

Those tasks sound simple to do, but cognitively speaking, they’re highly complex. In fact, not even the world’s most powerful machines can keep up with the mental capacities of a rambunctious four-year-old.

Comment by Jeffrey Ng on April 16, 2014 at 4:27am

You are right. The goal is to derive benefits for the future. How data are processed and fed to the modeler is also important. From my observation, the main obstacle is mindset. For example, the ETL person can spend 10 percent more time setting up a more flexible ETL process instead of writing a less flexible but easier script. It is simply the quality that matters.

Comment by Redouane F. on April 15, 2014 at 9:51am

Today, rather than looking at data to assess what occurred in the past, organizations need to think in terms of continuous flows and processes. “Streaming analytics allows you to process data during an event to improve the outcome,” notes Tom Deutsch, program director for big data technologies and applied analytics at IBM. This capability is becoming increasingly important in fields such as health care. At Toronto’s Hospital for Sick Children, for example, machine learning algorithms are able to discover patterns that anticipate infections in premature babies before they occur.

© 2017 AnalyticBridge is a subsidiary and dedicated channel of Data Science Central LLC