A Data Science Central Community
Machine data is a separate analytics discipline because the focus is on 'data in motion', not 'data at rest'.
Smart grids, smart sensors, smart switches and ad hoc networks (military, space and commercial) continuously stream data, in a push model, to federated or single data acquisition systems that act as both filters and funnels to an analytics solution. (See the uploaded document, Cloud Service Smart Grid Solution.pdf.)
The current generation of data storage and relational database solutions is not designed to process these streams in real time or to handle the growing volume of unstructured data types.
That's why hardware and OS virtualization has emerged as a cloud solution.
That's why EMC, Teradata, Oracle and IBM are embracing Hadoop, the open-source framework modeled on Google's MapReduce and Google File System papers.
That's why in-memory analytics is increasingly being used to handle SQL-like queries on data in transit.
That's why columnar and massively parallel analytic appliances (Netezza, AsterData, Vertica...) are growing rapidly.
To see the future, check out ucirrus.com, which elegantly processes SQL queries and data visualization over hundreds of thousands of data points per second, by far the fastest I know of. They proved it for call record analysis in telecoms and, for the same reasons, replaced Oracle at eBay.
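To make "SQL-like queries on data in transit" concrete, here is a minimal Python sketch of an in-memory sliding-window aggregate. It is not how any of the products named above work; the class name SlidingWindowAverage, the (timestamp, sensor_id, value) record shape and the assumption that timestamps arrive in non-decreasing order are all mine, for illustration only.

```python
from collections import defaultdict, deque


class SlidingWindowAverage:
    """Hypothetical helper: per-sensor average over the last N seconds.

    Only readings inside the window are held in memory; nothing is
    written to a database. Assumes timestamps never go backwards.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()             # (timestamp, sensor_id, value), oldest first
        self.sums = defaultdict(float)    # running sum per sensor
        self.counts = defaultdict(int)    # running count per sensor

    def push(self, timestamp, sensor_id, value):
        """Accept one reading and drop anything that fell out of the window."""
        self.events.append((timestamp, sensor_id, value))
        self.sums[sensor_id] += value
        self.counts[sensor_id] += 1
        self._evict(timestamp)

    def _evict(self, now):
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_id, old_value = self.events.popleft()
            self.sums[old_id] -= old_value
            self.counts[old_id] -= 1
            if self.counts[old_id] == 0:
                del self.sums[old_id]
                del self.counts[old_id]

    def averages(self):
        """Roughly: SELECT sensor_id, AVG(value) GROUP BY sensor_id, over the window."""
        return {sid: self.sums[sid] / self.counts[sid] for sid in self.counts}


window = SlidingWindowAverage(window_seconds=10)
for ts, sid, val in [(0, "a", 1.0), (5, "a", 3.0), (6, "b", 10.0), (12, "a", 5.0)]:
    window.push(ts, sid, val)
print(window.averages())
```

The point of the design is that the query answer is maintained incrementally as each reading arrives, so the cost per reading stays small no matter how much data has already streamed past.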
But machine learning implies that a data modeling and inference process can, with minimal human input once the learning cycle is complete, steadily improve its results over time.
This requires a continuous data forensics process running in parallel. If input data formats change or certain data types exhibit volatility (sparse data, wrong values), the machine may not be "smart" enough to detect and warn, let alone adapt and modify its algorithms.
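A rough Python sketch of what such a forensics check running alongside the model could look like. Everything here is an assumption for illustration: the StreamMonitor name, the dict-shaped records with a "value" field, the valid range and the thresholds. A real deployment would need checks tuned to its own data.

```python
from collections import deque


class StreamMonitor:
    """Hypothetical watchdog that flags format changes and bad values."""

    def __init__(self, expected_fields, valid_range, window=100, max_bad_ratio=0.2):
        self.expected_fields = set(expected_fields)
        self.low, self.high = valid_range
        self.recent_bad = deque(maxlen=window)   # 1 = flagged record, 0 = clean
        self.max_bad_ratio = max_bad_ratio

    def check(self, record):
        """Return a list of warnings for one record (empty list means clean)."""
        warnings = []
        fields = set(record)
        if fields != self.expected_fields:
            missing = sorted(self.expected_fields - fields)
            extra = sorted(fields - self.expected_fields)
            warnings.append(f"format change: missing={missing} extra={extra}")

        value = record.get("value")
        if value is None:
            warnings.append("sparse data: value is missing")
        elif not isinstance(value, (int, float)) or not (self.low <= value <= self.high):
            warnings.append(f"wrong value: {value!r}")

        # Track how many of the most recent records were flagged.
        self.recent_bad.append(1 if warnings else 0)
        window_full = len(self.recent_bad) == self.recent_bad.maxlen
        ratio = sum(self.recent_bad) / len(self.recent_bad)
        if window_full and ratio > self.max_bad_ratio:
            warnings.append(f"volatility: {ratio:.0%} of recent records were flagged")
        return warnings


monitor = StreamMonitor(["sensor_id", "value"], valid_range=(0.0, 100.0),
                        window=4, max_bad_ratio=0.5)
for rec in [{"sensor_id": "a", "value": 20.0},
            {"sensor_id": "a", "value": 999.0},
            {"sensor_id": "a"},
            {"sensor_id": "a", "value": -5}]:
    print(monitor.check(rec))
```

This only detects and warns. Adapting the learning algorithm in response is the harder part, and the sketch makes no attempt at it.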
Machine learning works well for industrial processes (geospatial, robotics) but is weak for human-behavior processes (fraud detection, building ontologies from human speech).
Really, Big Data just means rapidly growing, semi-structured, multi-modal, multi-point transactional data.
You don't have to store it in a database in relational form and then apply BI tools for analysis. Exabytes of raw data end up as gigabytes once analyzed, reduced and archived.
Google knows that with its crawled data. So should BI users.
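As a small illustration of reducing data instead of storing it, here is a Python sketch that keeps a fixed-size summary of a stream using Welford's online algorithm for mean and variance. The RunningSummary class and the sample readings are hypothetical. The idea is simply that the summary stays the same size however many raw values pass through it.

```python
import math


class RunningSummary:
    """Hypothetical fixed-size summary of an unbounded stream of numbers."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0                 # sum of squared deviations from the mean
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, value):
        """Fold one value into the summary (Welford's update); the value is then discarded."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


summary = RunningSummary()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    summary.add(reading)
print(summary.count, summary.mean, summary.variance, summary.minimum, summary.maximum)
```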