A Data Science Central Community
Machine Learning is being hailed as “Next Generation Analytics”. Machine Learning tasks can be roughly classified into data acquisition, data processing, modeling, visualization, and deployment of results.
For data acquisition, Python is a leader. As a simple, general-purpose scripting language, it has libraries and Application Programming Interfaces (APIs) that let it connect with a variety of data sources: Excel/CSV/text files, databases, Hadoop file systems, etc. More often than not we need to scrape data from the web, deal with XML and JSON data, and parse that information. Python handles all of that with ease and is well ahead of its competitors in this space.
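As a minimal sketch of the parsing side, the snippet below uses only Python's standard library (csv, json and xml.etree.ElementTree) to turn three differently shaped inputs into one uniform list of records. The sample payloads and the id/name/score schema are hypothetical; fetching pages from the web (for instance with urllib.request) is left out so the example runs offline.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample payloads standing in for a file, an API response and a web feed.
CSV_TEXT = "id,name,score\n1,alice,0.9\n2,bob,0.7\n"
JSON_TEXT = '{"records": [{"id": 3, "name": "carol", "score": 0.8}]}'
XML_TEXT = """<records>
  <record id="4"><name>dave</name><score>0.6</score></record>
</records>"""


def from_csv(text):
    """Parse CSV text into a list of dicts, converting column types."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(r["id"]), "name": r["name"], "score": float(r["score"])}
        for r in reader
    ]


def from_json(text):
    """Parse a JSON document with a top-level 'records' list."""
    return [dict(r) for r in json.loads(text)["records"]]


def from_xml(text):
    """Parse <record> elements: the id is an attribute, the rest are child elements."""
    root = ET.fromstring(text)
    return [
        {
            "id": int(rec.get("id")),
            "name": rec.findtext("name"),
            "score": float(rec.findtext("score")),
        }
        for rec in root.iter("record")
    ]


# Combine all three sources into one uniform list of records.
records = from_csv(CSV_TEXT) + from_json(JSON_TEXT) + from_xml(XML_TEXT)
for r in records:
    print(r)
```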
For data processing, packages like SciPy, NumPy, pandas and SFrame enable us to scale up to gigabytes of data and process it on commodity hardware. With a few simple function calls, we can reshape data into forms more amenable to further processing.
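To illustrate the kind of reshaping meant here, a small sketch with pandas (assumed to be installed): it pivots hypothetical long-format sales data into a wide table, melts it back, and computes a per-store total. The column names and values are made up for illustration; real gigabyte-scale workloads would also need attention to dtypes and memory use.

```python
import pandas as pd

# Hypothetical long-format data: one row per (store, month) observation.
long_df = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "sales": [100, 120, 90, 95],
})

# Wide format: one row per store, one column per month.
wide_df = long_df.pivot(index="store", columns="month", values="sales")
print(wide_df)

# Back to long format, e.g. for a model that expects one observation per row.
# rename_axis drops the "month" label pandas attached to the column index.
back_to_long = (
    wide_df.rename_axis(columns=None)
    .reset_index()
    .melt(id_vars="store", var_name="month", value_name="sales")
)
print(back_to_long)

# A simple aggregation: total sales per store.
totals = long_df.groupby("store")["sales"].sum()
print(totals)
```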
For modeling, scikit-learn and GraphLab let even sophisticated algorithms be implemented in a few lines of code, and it is easy to tweak parameters so that the implementation suits one’s needs.
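A sketch of the "few lines of code" point, using scikit-learn only (GraphLab is not shown) and assuming it is installed: it trains a random forest on the iris dataset that ships with scikit-learn, then changes one hyperparameter with set_params and refits. The parameter values are illustrative, not tuned.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# The iris dataset is bundled with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameters are plain keyword arguments; these values are illustrative.
model = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out split
print(f"Test accuracy (max_depth=3): {accuracy:.3f}")

# Tweaking a parameter and refitting: set_params and fit both return the estimator.
model.set_params(max_depth=None).fit(X_train, y_train)
deeper_accuracy = model.score(X_test, y_test)
print(f"Test accuracy (max_depth=None): {deeper_accuracy:.3f}")
```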
For visualization, the matplotlib library can be used to build attractive visualizations, from simple plots to 3D charts.
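A minimal matplotlib sketch, assuming matplotlib and NumPy are installed: it draws a 2D line plot next to a 3D surface of a Gaussian bump and renders the figure to PNG bytes in memory. The non-interactive Agg backend is chosen so the example runs without a display; in a notebook or desktop session you would typically call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older matplotlib)

# Sample a 2D Gaussian bump on a grid.
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = np.exp(-(X**2 + Y**2) / 2)

fig = plt.figure(figsize=(10, 4))

# Left panel: an ordinary line plot of the slice through the surface at y = 0.
ax1 = fig.add_subplot(1, 2, 1)
ax1.plot(x, np.exp(-x**2 / 2), label="slice at y = 0")
ax1.set_xlabel("x")
ax1.set_ylabel("z")
ax1.legend()

# Right panel: the full surface as a 3D chart.
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
ax2.plot_surface(X, Y, Z, cmap="viridis")
ax2.set_title("3D surface")

# Render to PNG in memory instead of writing a file.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
print(len(png_bytes), "bytes of PNG")
```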
For deploying results, Python is again a leader. Being a general-purpose programming language, it integrates readily with other systems when results need to be pushed downstream. Real-time dashboards with interactive visuals can be built with little effort using Python's widget and dashboard libraries. With Kivy, a Python framework for cross-platform apps, we can build mobile applications with ML embedded. We can also export results as standalone static files (HTML, CSV, etc.) if need be.
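For the static-export case, a standard-library sketch: it serializes hypothetical model predictions (the customer ids and churn probabilities are invented) both as CSV and as a minimal standalone HTML page. Libraries such as pandas offer shortcuts like DataFrame.to_csv and DataFrame.to_html; the hand-rolled version here simply keeps the example dependency-free.

```python
import csv
import html
import io

# Hypothetical model output: (customer id, predicted churn probability).
predictions = [("c001", 0.82), ("c002", 0.15), ("c003", 0.47)]


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["customer_id", "churn_probability"])
    writer.writerows(rows)
    return buf.getvalue()


def to_html(rows):
    """Render rows as a minimal standalone HTML page containing one table."""
    body = "\n".join(
        f"<tr><td>{html.escape(cid)}</td><td>{p:.2f}</td></tr>" for cid, p in rows
    )
    return (
        "<!DOCTYPE html>\n<html><body>\n<table>\n"
        "<tr><th>customer_id</th><th>churn_probability</th></tr>\n"
        f"{body}\n</table>\n</body></html>\n"
    )


csv_text = to_csv(predictions)
html_text = to_html(predictions)
# In practice these strings would be written to files, e.g. report.csv and report.html.
print(csv_text)
```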
In the corporate world, SAS is still the dominant tool for statistical analysis. But with the advent of open source software like R and Python, that is changing fast. Startups want cost-effective infrastructure, and even big enterprises are slowly switching to open source solutions as license costs weigh them down. All in all, it is reasonable to expect that Python will emerge as one of the primary technologies for implementing ML.
This article has been contributed by a machine learning and big data enthusiast.