A Data Science Central Community
To answer this question correctly, you need to ask yourself what job you want to aim for. A data scientist can aim for three different jobs. For lack of better words (or my lack of knowledge of those words!), let me classify them as
1. Analysts, 2. Consultants, 3. Engineers
1. Analysts: These are the guys who do the same job repeatedly (statistical analysis in clinical trials, target marketing in banks etc.). In India, I see that quite a few companies doing outsourced analytics also fall in this category. I noticed that they get data in a standard form, use the same model to analyze it and use the same charts to visualize it. The variance from project to project is very little.
You need to be a master of one or two modules of one tool (like SAS or SPSS) for this. Any online video, an installed version of the software and some data are good enough to get you started. You also do not need an in-depth understanding of the underlying science.
Your organization itself will have a lot of inertia against trying anything new. I really had a tough time convincing a bank to try decision trees (they had been doing logistic regression for 20 years) as late as 2010! The manager asked why I was bringing in new things when the old ones were working fine :-)
Also, when I talked to his team about logistic regression, I realized that they did not understand the underlying mathematics or science well enough. But it was not a major deterrent for that specific job. They were doing fine.
Beware, these are the low-end jobs in data science. Choose this path if and only if you are OK with routine and not so difficult work.
2. Consultants: These are the McKinsey, Deloitte, Booz Allen Hamilton kind of guys. I also see them in the dedicated analytics groups of large insurance and tech companies. They work on the different problems that their clients are facing and provide the needed guidance and consulting.
You need a very good aptitude for understanding and communicating business problems at a high level (sort of MBAish skills). You need to be very good with a few algorithms (standard ones like trees, nearest neighbors, regression, naive Bayes). If you position yourself as a data scientist and not a business consultant, you also need a working knowledge of more advanced algorithms (support vector machines, belief networks, neural nets etc.). I strongly recommend learning one language hands-on to implement these (R, SAS, SPSS…). In fact, nowadays I am teaching R/Shiny to my students so they can quickly put up interactive demos. I also strongly recommend a visualization tool (ggplot in R, Tableau or QlikView).
I also emphasize understanding the underlying mathematics intuitively. You should be able to play and experiment with the algorithms and not just use them. Problem-solving and logical skills are very important.
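To give a feel for what "play and experiment" can mean, here is a minimal sketch of logistic regression fitted by plain gradient descent, written without any library so that every step is visible. The data (hours of practice against pass/fail), the function names `sigmoid` and `fit_logistic`, and the learning rate and epoch count are all made up for illustration; they are assumptions, not anything from a real project.

```python
import math


def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent.

    lr and epochs are arbitrary illustrative defaults.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss with respect to (w * x + b)
            # is simply (predicted probability - actual label).
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        # Step against the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: hours of practice and whether the person passed.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
print("weight:", w, "bias:", b)
print("p(pass | 1 hour): ", sigmoid(w * 1.0 + b))
print("p(pass | 3.5 hours):", sigmoid(w * 3.5 + b))
```

Try changing the learning rate, the number of epochs or a few of the labels and watch how the weight and the predicted probabilities move. That kind of tinkering is what builds the intuition I am talking about.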
3. Engineers: These are the product guys. Google, Amazon, Facebook and scores of start-ups need data guys who can code and build products.
You need to be very good at SQL and one language (my favorite is Python, but Java etc. is fine). Nowadays, NoSQL skills (MongoDB, Cassandra, HBase etc.) and Hive/Pig kind of big data scripting skills are also very useful. You need to be very good with machine learning algorithms, efficient software engineering and standard coding and development procedures. You will most likely work on technology, and hence business and consulting skills are not as important as in the previous role.
Interestingly, in all three of the above, an intuitive understanding of the algorithms is good enough and you do not need really deep math (I know I am scandalizing the purists here!).
If your goal is to teach and do research in data science, you need the skills mentioned in either 2 (if you want to teach in a business school) or 3 (if you want to teach in a CS school). In addition, you must be extremely good at advanced undergraduate mathematics (calculus, linear algebra and coordinate geometry). Designing newer algorithms and mathematics becomes very important here. For various topics related to data science, see http://beyond.insofe.edu.in/
So, to sum it up, the skills you need to hone depend on the specific interests you want to pursue as a data scientist. Realize that data science is very broad and hence may lead to different professions. You pick what you love and tune yourself for that.
Great article. In short, I can say Data Science = Statistics + Computer Science