A Data Science Central Community
When we devote so much time and energy to talking about Big Data, are we neglecting the important things that you can do with Small Data?
Maybe, but... probably not.
Looking beyond the Big Data hype helps us to capture real value from advanced analytics on data, big and small.
The drumbeat of Big Data dialogue in social media, in the press, and everywhere else merely highlights the important roles that data and analytics now play in all sectors. While we read, think, and dream about Big Data, we also realize that the "big" in Big Data refers to more than just data volume. We know all about Big Data velocity, variety, value, and more. For example, independent of data volume, you can have lots of variety in your data, you can have very tight real-time (data velocity) constraints, and you can derive huge value from your data assets. So, I believe that the deluge of Big Data discussions is not actually diverting our attention from small data; instead, it is democratizing data assets, causing us to give more attention to how we can learn from data, both big and small.
So, how do we "learn from data"?
In the field of Machine Learning, algorithms are usually categorized as Supervised, Semi-Supervised, or Unsupervised. The first two of these usually require historical training data to build and improve classification and predictive models. It is fair to say that, in most cases, the bigger the training data set, the better (more complete, accurate, and robust) our predictive analytics models will be.
The third category of Machine Learning (Unsupervised Learning) is essentially the purest form of Data Mining (in my opinion): it is data-driven, evidence-based, unfettered by models or preconceived notions regarding the patterns in the data. It is used to discover the patterns, anomalies, categories, correlations, and features in the data, both BIG and SMALL. This is true knowledge discovery from data. One article (Shabalin et al. 2009) described it this way: "unsupervised exploratory analysis plays an important role in the study of large, high-dimensional datasets that arise in a variety of applications". When performed with rigorous systematic scientific methodology (as opposed to random "fishing expeditions"), the data mining application of Unsupervised Machine Learning algorithms becomes "powerful Jedi" Data Science.
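To make the small-data point concrete, here is a minimal sketch of one unsupervised technique, k-means clustering (a form of category discovery), run on a toy data set of only six 2-D points. The function name `kmeans`, the data values, and the fixed random seed are illustrative assumptions, not anything from the article. Note that no labels or training set are supplied: the algorithm discovers the two groups from the data alone.

```python
import random
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of 2-D tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                xs, ys = zip(*members)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    # Final labels: index of the nearest centroid for each point.
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels


# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The label numbers themselves are arbitrary (which group is "0" depends on initialization); what matters is that the first three points share one label and the last three share the other. In practice you would likely reach for a library implementation such as scikit-learn's `KMeans` rather than hand-rolling it.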
"Discovery from Data" and "Learning from Data" certainly become more effective when the data set is big, but most types of unsupervised learning work just fine with relatively small data. There are four broad categories of unsupervised learning that you can apply to your data (big and small). These are:
Yes, of course, knowledge discovery from data (i.e., Learning From Data) is FUN, especially if it is unsupervised!
You can see more discussion of unsupervised machine learning from small data (specifically for time series data) in the article "Hello World, I'm Learning From Data!", published at http://www.bigdatarepublic.com/author.asp?section_id=3146&doc_i....
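As a small-data illustration of anomaly discovery in a time series, here is a minimal sketch using the median-absolute-deviation "modified z-score" (Iglewicz and Hoaglin) with its commonly used cutoff of 3.5. This is a technique swapped in for illustration, not the method from the linked article; it assumes point anomalies only (it ignores temporal order), and the function name `mad_outliers` and the `readings` values are hypothetical.

```python
from statistics import median


def mad_outliers(series, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold."""
    med = median(series)
    # Median absolute deviation: a spread estimate that a few outliers cannot inflate.
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        return []  # no spread, so the modified z-score is undefined
    return [i for i, x in enumerate(series) if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical sensor readings: ten samples with one obvious spike.
readings = [10.1, 10.3, 9.9, 10.0, 10.2, 25.7, 10.1, 9.8, 10.0, 10.2]
print(mad_outliers(readings))
```

The median and MAD are used instead of the mean and standard deviation because, with only a handful of points, a single spike can inflate the standard deviation enough to hide itself.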
Follow Kirk on Twitter at @KirkDBorne
Comments
@paul, with regard to "rigorous systematic scientific methodology", I am talking about: (1) hypothesis generation, (2) experimental design, (3) experiment & testing, (4) data collection & analysis, (5) discovery & inference from data, (6) hypothesis refinement, (7) go back to step #1. This is the scientific method, or (more accurately) the scientific cycle. Following this formal process helps to prevent (often biased, subjective) non-statistical sampling and haphazard exploration of the data. Data Science to me should follow the scientific method: define a problem, design an experiment, do the experiment, learn from it, make data-informed inferences and decisions, and then build on that knowledge and experience.
Can you elaborate on your "rigorous systematic scientific methodology"?
When performed with rigorous systematic scientific methodology (as opposed to random "fishing expeditions"), the data mining application of Unsupervised Machine Learning algorithms becomes "powerful Jedi" Data Science.
Thanks!
© 2021 TechTarget, Inc.