Many interesting questions came up in the NIST Definitions and Taxonomy Big Data group meeting today. Brilliant minds are hard at work to stabilize the language around Big Data, but some fundamental questions have been posed that the marketplace seems to believe we have already solved.
- How do we differentiate Big Data from traditional large-scale data such as sensor feeds, credit card processing, and financial transactions? What makes it different? One noted professional taxonomist asserted that a basic differentiator may lie in the variability and variety of the data.
- Has the data lifecycle changed with Big Data? The subgroup lead, Nancy Grady, made a compelling argument that the position of storage in the workstream may be of interest. She pointed out that traditional decision support transforms and stores the data prior to analysis, whereas the Big Data paradigm frequently stores data raw and applies structure later (schema on read).
- Should there be an obsolescence characteristic attached to data definitions? Ubiquitous sensors (the Internet of Things) may produce disposable data with immediate obsolescence, whereas climate monitoring sensors may only provide value at a future date.
- Data cleanliness may be less important for Big Data than it is in traditional BI.
- Are there certain enablers of Big Data, such as (perhaps) cloud computing, that should be assumed in planning?
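To make the lifecycle question a little more concrete, here is a minimal Python sketch of the contrast Nancy Grady described: transforming before storage versus storing raw and applying structure at read time. Everything in it is an assumption for illustration only. The sample records, the field names (`temp_c`, `temp_f`, `ts`), and the functions `schema_on_write`, `store_raw`, and `read_temperatures` are hypothetical and do not come from the working group.

```python
import json

# Hypothetical raw sensor records, kept exactly as they arrived.
RAW_LINES = [
    '{"sensor": "t-1", "temp_c": "21.5", "ts": "2013-07-01T12:00:00"}',
    '{"sensor": "t-2", "temp_f": 70.7, "ts": "2013-07-01T12:00:05"}',
    '{"sensor": "t-3", "humidity": 40}',
]


def schema_on_write(lines):
    """Traditional path: transform and validate BEFORE storing.

    Records that do not fit the fixed schema are dropped at load time,
    so their raw form is no longer available to later analyses.
    """
    store = []
    for line in lines:
        rec = json.loads(line)
        if "temp_c" not in rec or "ts" not in rec:
            continue  # does not match the schema; rejected
        store.append({
            "sensor": rec["sensor"],
            "temp_c": float(rec["temp_c"]),
            "ts": rec["ts"],
        })
    return store


def store_raw(lines):
    """Big Data path, step 1: keep every record untouched."""
    return list(lines)


def read_temperatures(raw_store):
    """Big Data path, step 2: apply structure only when a question is asked.

    This particular reader also understands Fahrenheit readings; a different
    question could interpret the same raw store in a different way.
    """
    for line in raw_store:
        rec = json.loads(line)
        if "temp_c" in rec:
            yield rec["sensor"], float(rec["temp_c"])
        elif "temp_f" in rec:
            celsius = (float(rec["temp_f"]) - 32) * 5 / 9
            yield rec["sensor"], round(celsius, 1)


if __name__ == "__main__":
    print(len(schema_on_write(RAW_LINES)), "record(s) survive schema on write")
    raw = store_raw(RAW_LINES)
    print(len(raw), "record(s) kept raw")
    print(list(read_temperatures(raw)))
```

In this toy setup the schema-on-write store keeps only the one record that fits the fixed schema, while the raw store keeps all three and lets each later question decide how to interpret them. That trade-off is also where the data cleanliness question above starts to matter.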
At this point it is obvious that there is no consensus on these questions, but what do we, as a community of practitioners, think?
The working group meetings are highly compelling, and I encourage anyone who wishes to become involved to visit the group site: http://bigdatawg.nist.gov/