A Data Science Central Community
BigML provides APIs to solve large-scale machine learning problems such as supervised classification. I found an example of its use in the book Data Science at the Command Line, pages 153-156, which comes with source code (classifying wines as red or white).
Does it work with large data sets (clustering 10 MM observations)? Is it free? Can the API return streaming data?
Full Disclosure: I work for BigML.
BigML offers a state of the art Machine Learning platform that is highly intuitive, programmable and scalable.
Does it work with large data sets (clustering 10 MM observations)?
BigML can handle large datasets, including this type of clustering task. In addition to the default multi-tenant version hosted on BigML.com, we offer Private Deployments (https://bigml.com/private-deployments) in the form of either a single-tenant Virtual Private Cloud or an on-premises installation. System performance can be scaled linearly across many more cores in our Private Deployments, as all of our algorithms are built in a distributed fashion.
Is it free?
BigML offers a full-featured free version for anyone who signs up with an email address at BigML.com. The only limitation on this plan is the size of the data you can analyze, which is capped at 16MB. For those interested in larger datasets, we have subscription plans that start at $30/mo. and go up to $7,500/mo.; the top plan lets you analyze 32GB with up to 64 parallel tasks.
Alternatively, we offer Private Deployments as described here: https://bigml.com/private-deployments
Can the API return streaming data?
BigML provides real-time predictions through its API, exportable models that can be reified in any programming language, and a PredictServer that uses BigML's Node.js bindings to support throughputs of hundreds of thousands of predictions per second. It also enables sophisticated scoring mechanisms to identify in real time whether a model is still valid (i.e., whether covariate shift has occurred) or whether a data point is anomalous and the model is not competent to provide a prediction. The whole process of online learning can be fully implemented on top of BigML.
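For readers unfamiliar with covariate shift, here is a minimal sketch of the general idea in plain Python. It is not BigML's scoring mechanism or API; it swaps in a generic two-sample Kolmogorov-Smirnov statistic on a single numeric feature, and the names `ks_statistic` and `looks_shifted` as well as the 0.3 threshold are assumptions chosen for illustration.

```python
from bisect import bisect_right


def ks_statistic(train_values, live_values):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    if not train_values or not live_values:
        raise ValueError("both samples must be non-empty")
    train_sorted = sorted(train_values)
    live_sorted = sorted(live_values)
    n, m = len(train_sorted), len(live_sorted)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the largest gap.
    for x in train_sorted + live_sorted:
        cdf_train = bisect_right(train_sorted, x) / n
        cdf_live = bisect_right(live_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_train - cdf_live))
    return max_gap


def looks_shifted(train_values, live_values, threshold=0.3):
    """Flag a feature whose live distribution has drifted from training.

    The threshold is a hypothetical cut-off, not a recommended value.
    """
    return ks_statistic(train_values, live_values) > threshold
```

A statistic near 0 means the incoming values for that feature look like the training data, while a value near 1 means the two samples barely overlap, which suggests the model may be scoring data it was not trained for. In practice you would run a check like this per feature on a sliding window of recent inputs and tune the threshold to your sample sizes.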