After using R heavily for analytics projects, believing it was the best language for data scientists, I recently had the chance to pick up Python. R can be cumbersome when interfacing with other languages or with the web, for example through OAuth. That was my motivation: use Python to get text from the web and then process it in R, which I felt was the "best" tool for the job.

However, Python surprised me, not only with its web interfacing abilities but also with its analytical features. At many points it got me thinking: why was I still using R when Python could do it so much more elegantly?

So here are some areas where I found Python really useful. In a way, this is my answer to the "Python vs R" question:

1. Interfaces - As I mentioned before, Python has far more interfaces and wrappers than R. (For example, Apache Spark has a direct Python interface, while with R you need to configure a wrapper named SparkR.) In some cases, though, R does very well; Jeff Gentry's twitteR package, for instance, is excellent.

2. Handling Large Data - This is a problem most R programmers run into, because R holds the data it works on in memory, and sooner or later everyone ends up talking about RAM. One option is H2O; I didn't find it very easy to use, though it is much easier than the typical Big Data frameworks. Python not only offers more interfaces to big data, but also more ways to read data, including reading a CSV file line by line instead of loading it all at once. That makes it possible to build impressive algorithms such as the one Google built for CTR prediction (see Google's whitepaper).

3. The code - I remember people talking about R's learning curve. Python's syntax is so readable that it almost feels like running algorithm descriptions or pseudocode. Warnings are somewhat limited, though, and you can of course still write infinite loops. Python's built-in data types are more primitive, and you soon miss something like R's data frame. This is where the Python package "pandas" comes in: it gives you R-like (sometimes better) flexibility with data frames. One thing I didn't like about Python was its package installation. You can't easily install packages through any of the IDEs; instead there is a tool named "pip" that you run from the command line. Also, unlike R, Python is split between two major versions: v2.7.x, where most current packages run, and v3.4, where many packages do not run yet but which the community wants everyone to adopt eventually.

4. The models - Finally, the models are really what we praise R for: random forests, gradient boosting, GLMs and GAMs. These were available in R first, but in Python the package "scikit-learn" covers most of them, GAMs being a notable gap (I haven't explored it exhaustively, but all the models I typically use are available). It is also much easier to change Python source code if you want to build custom models on top of existing ones. What amazed me was Python's visualization: it is as good as R's, and you can also choose a different plotting backend, such as Qt.
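To make point 2 concrete, here is a minimal sketch of streaming a CSV file row by row with Python's standard csv module while training a simple click-through-rate model on the fly. It is an illustration under assumptions, not Google's algorithm: it uses plain stochastic gradient descent on a logistic regression with hashed features, whereas Google's published ad-click work uses a more refined FTRL-Proximal update. The column names (site, device, click), the file name clicks.csv, the bucket count and the learning rate are all hypothetical.

```python
import csv
import io
import math
import zlib

N_BUCKETS = 2 ** 18  # size of the hashed weight vector (assumed, tunable value)

def hashed_indices(row):
    # Hashing trick: map each "column=value" pair into a fixed-size index space,
    # so memory stays constant however many distinct values the file contains.
    return [zlib.crc32(f"{col}={val}".encode()) % N_BUCKETS for col, val in row.items()]

def predict(weights, row):
    # Logistic function of the summed weights of the active features (clipped to avoid overflow).
    z = sum(weights[i] for i in hashed_indices(row))
    z = max(min(z, 35.0), -35.0)
    return 1.0 / (1.0 + math.exp(-z))

def train_online(lines, label_col="click", lr=0.1):
    weights = [0.0] * N_BUCKETS
    # csv.DictReader yields one row at a time, so the whole file is never held in memory.
    for row in csv.DictReader(lines):
        y = int(row.pop(label_col))
        p = predict(weights, row)
        # For log loss, the gradient w.r.t. each active binary feature's weight is (p - y).
        for i in hashed_indices(row):
            weights[i] -= lr * (p - y)
    return weights

# In-memory stand-in for a large file; with a real file you would pass
# open("clicks.csv", newline="") instead (hypothetical file name).
data = io.StringIO("site,device,click\n" + "a,mobile,1\nb,mobile,0\n" * 50)
weights = train_online(data)
print(predict(weights, {"site": "a", "device": "mobile"}))
print(predict(weights, {"site": "b", "device": "mobile"}))
```

Because memory use is fixed by N_BUCKETS rather than by the size of the input, the same loop can in principle run over a file much larger than RAM.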
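For point 3, here is a small sketch of the kind of R-like data frame work pandas makes easy. The data and column names are made up for illustration, and the R calls in the comments are only rough equivalents.

```python
import pandas as pd

# A small, hypothetical data set built in memory so the example is self-contained.
df = pd.DataFrame({
    "region": ["north", "south", "north", "east", "south"],
    "sales": [120, 80, 95, 60, 130],
})

# Roughly R's subset(df, sales > 90)
high = df[df["sales"] > 90]

# Roughly R's aggregate(sales ~ region, data = df, FUN = mean)
mean_by_region = df.groupby("region")["sales"].mean()

print(high)
print(mean_by_region)
```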
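For point 4, here is a minimal sketch of fitting several of the models mentioned with scikit-learn on synthetic data. The data set and parameter values are assumptions chosen for illustration. Note that scikit-learn's LogisticRegression applies L2 regularization by default, so it is a penalized binomial GLM rather than an exact match for R's glm(), and there is no GAM estimator in scikit-learn, so GAMs are left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    # L2-penalized by default, i.e. a regularized binomial GLM
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # same fit() call for every estimator
    scores[name] = model.score(X_test, y_test)    # mean accuracy on held-out data
    print(f"{name}: test accuracy {scores[name]:.3f}")
```

The main design point is the shared interface: every estimator exposes fit, predict and score, so swapping one model for another is usually a one-line change.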

Overall, I now use Python where I find R cumbersome, and I have also started using it wherever it's convenient. I'm new to Python, so this post covers mostly the good things I found about it. I'll possibly write about the shortcomings in future posts as I explore further.

Comment by Amogh Borkar on March 10, 2015 at 1:08am

@Myles Gartland - Among the IDEs, I found Python(x,y) great for moving from R. It has an option to run IPython instead of the console and can be configured to be a bit like RStudio, though not as good.

While R can be called from Python through rpy2/rmagic, I think that using Rserve instead to access R externally would make a formidable analytics platform. Has anyone tried this?

Comment by Myles Gartland on March 9, 2015 at 11:30am

I have had the same experience. I started with, and love(d), R, but now I'm seeing the benefits of Python. That said, I am not a programmer per se, so I still need to get more comfortable with some of the ways Python works.

But the toolbox of pandas, scikit-learn and Matplotlib is great (although I think I still like R graphics better). I also love the IPython Notebook experience.

Of course, R has packages that just can't be beat and are either not implemented in Python (yet) or are more difficult to use in Python.

So my utopia is something like rmagic, where I can use both inside an IPython notebook.


© 2017 AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC