
Is there any book about popular programming languages (Python, Java, Hadoop, SQL, R, etc.) that every data scientist should know? I'm talking about a 500-page book that has about 100 pages per language, presented in a very concise way, and that also discusses how these languages interact.

Is anybody interested in writing such a book? I think the potential for revenue is high. So far, multi-language books focus on specific topics and specific programming-language environments, e.g.

  • XML, SOAP, WSDL, Perl, HTML for API development or programming
  • PHP, MySQL, JavaScript and CSS for web development

The only book I found that is truly a book on multiple languages is "Handbook of Programming Languages" (4 volumes, 1998 so it's quite old now) and it only had two reviews on Amazon - one of them was very negative.

Do programming language books need to be like natural language books - focusing on just one language? I understand that a book about how to learn Spanish, French, German and Russian might not be successful, but what about how to learn Unix, Excel, R, SQL, Python, Java and Hadoop?

PS: I plan to write 10 pages on programming languages in my training manual to become a data scientist, but it will be very concise and most likely point to external references. This booklet (once written) is not an answer to the problem discussed here.


Replies to This Discussion

I really like^2 this approach of educating and teaching through focused documentation rather than books with hundreds of pages that add no value. Those books are the root cause of our "drowning in information and starving for knowledge".


Parag Kulkarni

Hi Vincent,

I believe that focusing on a small number of languages is more reasonable from a methodological standpoint. For an inexperienced person, switching between languages is very difficult because of syntax differences, etc.

Data analysis usually requires data processing, which needs skills in several different languages. But there is one more problem: if a person programs in a lot of languages, not many companies would like to hire that person, and that, I believe, is another reason why there are so few multi-language programming books.

The only way to teach someone to develop in multiple languages simultaneously would be problem-driven development, where the goal is clear and the best language or method is chosen for each problem in the assignment.

I believe it can be written in 10-15 pages if the problem is simple, but no more than 5 languages should be used. Although this method can produce shallow knowledge or confusion, it can also make the user adaptable to a dynamic development environment.


I taught Computer Science and Software Engineering for over 10 years and have been doing statistical software for 20 additional years -- I have graduate degrees in both areas. I don't see how it is going to be possible to stuff enough information on multiple programming languages into 500 pages. It is hard enough introducing a single programming language in 500 pages. Describing a language like C just in BNF is going to take many pages. I think the task is equivalent to writing a single 500-page statistics book that covers a similar number of topics -- think about the level of coverage that one would be allowed if one needed to stuff basic probability and statistics, regression analysis, experimental design, Bayesian statistics, sampling theory, ... into a single volume.


On another topic -- I've seen production software written by people with not enough programming knowledge and/or experience.  I've also seen statistical analysis done and the results used by people with only elementary statistical knowledge.  Both are VERY SCARY!


Hi Mike,

I think it depends on the student. Maybe I am an exception (at least I guess less than 20% of "learners" are like me - in which case your argument makes perfect sense), but for a student like me, here is what works:

  • Very concise training manuals
  • Focus on the 10% of the stuff used 90% of the time: I have no interest in having a huge book with all the Unix commands: just tell me the 10 most useful ones, and give me a link to external, comprehensive references
  • Do I really need to learn the details of quick sort and 50 other similar algorithms? No, I'd rather read the principles summarized in one page, have a short illustration, and read some modern stuff such as web crawler optimization, taxonomy creation, plagiarism detection, statistical scoring
  • I'm not interested in the first 200 pages of most statistical books: chapters on univariate distributions, expectation, limit theorems, Markov chains etc. Instead one link pointing to references is enough for me, but I'd be happy to have 5 pages that mention the top 100 statistical distributions and when they are used, and where you can find details about their generating functions etc.
  • The scratch course on time series that I attended at Cambridge had a syllabus with only 15 pages, but it contained, in easy-to-read English, material that would be covered in a 10-book encyclopedia here in the US. I learned more from this syllabus than from any other time series book.

But I understand many students need a different experience to succeed. The typical training would bore me to death and scare me away from mathematics, statistics, computer science, programming languages (I learned Perl and SQL by myself without even studying in a book, but rather by playing with actual code and reverse engineering on real computers). It is even more true today, since the material has not been updated (except in a few programs) since 1980.

Hope this helps,


Another example of translating from one tool to another: how to do SQL's CREATE TABLE in Pandas (Python):
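
The example originally linked here is not available, so what follows is only a rough sketch of the idea, not the original author's code. It assumes an in-memory SQLite database as a stand-in for a SQL server; the table name `users`, its columns and the two sample rows are made up for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows, used on both the SQL side and the pandas side.
rows = [(1, "Ann", 9.5), (2, "Bob", 7.0)]

# SQL side: CREATE TABLE declares column names and types before any data exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, score REAL)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

# pandas side: the DataFrame is the table. Column names come from `columns`,
# and astype() pins the numeric types much like the SQL type declarations do.
users = pd.DataFrame(rows, columns=["id", "name", "score"]).astype(
    {"id": "int64", "score": "float64"}
)

# Going from one tool to the other:
from_sql = pd.read_sql_query("SELECT * FROM users", conn)  # SQL table -> DataFrame
users.to_sql("users_copy", conn, index=False)              # DataFrame -> new SQL table
copy_count = conn.execute("SELECT COUNT(*) FROM users_copy").fetchone()[0]

print(users.dtypes)
print(from_sql)
```

The main difference in habit is that SQL separates defining the table from filling it, while in pandas the two usually happen in one step, and `to_sql` issues the CREATE TABLE for you.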


