A Data Science Central Community
Is there a book about popular programming languages (Python, Java, Hadoop, SQL, R, etc.) that every data scientist should know? I'm talking about a 500-page book that has about 100 pages per language, presented in a very concise way, and that also discusses how these languages interact.
Anybody interested in writing such a book? I think the potential for revenue is high. So far, multi-language books focus on specific topics and specific programming-language environments, e.g.
The only book I found that is truly a book on multiple languages is "Handbook of Programming Languages" (4 volumes, 1998 so it's quite old now) and it only had two reviews on Amazon - one of them was very negative.
Do programming language books need to be like natural language books - focusing on just one language? I understand that a book about how to learn Spanish, French, German and Russian might not be successful, but what about how to learn Unix, Excel, R, SQL, Python, Java and Hadoop?
PS: I plan to write 10 pages on programming languages in my training manual on becoming a data scientist, but it will be very concise and will most likely point to external references. This booklet (once written) is not an answer to the problem discussed here.
I really like^2 this approach to education: teaching through focused documentation rather than hundreds of pages of books that add no value. This is the root cause of our "drowning in information and starving for knowledge".
I believe that focusing on a small number of languages is the more reasonable approach, methodologically. For an inexperienced person, switching between languages is very difficult because of syntax differences, etc.
Data analysis usually requires data processing, which calls for skills in several different languages. But there is one more problem: if a person programs in a lot of languages, not many companies would want to hire that person. I believe that is another reason why there are so few multi-language programming books.
The only way to teach someone to develop in multiple languages simultaneously would be problem-driven development, where the goal is clear and each problem is solved with the best language/method for that assignment.
I believe it can be written in 10-15 pages if the problem is simple, but no more than 5 languages should be used. Although this method can produce shallow knowledge or confusion, it can also make the learner adaptable to a dynamic development environment.
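To make the problem-driven idea concrete, here is a minimal sketch of one small problem split across two languages: SQL does the set-based aggregation, and Python does a follow-up statistic. The `orders` table, its columns, and the sample rows are all made up for illustration; they are assumptions, not something from this thread.

```python
import sqlite3
import statistics

# Hypothetical sample data: an in-memory SQLite table of orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("ann", 30.0), ("bob", 5.0), ("cat", 50.0), ("cat", 70.0)],
)

# Step 1 (SQL): total spend per customer, a natural fit for GROUP BY.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

# Step 2 (Python): median of those totals, using the standard library.
median_total = statistics.median(total for _, total in totals)

print(totals)
print(median_total)
```

The point is only the division of labor: each step uses whichever language expresses it most directly, which is roughly what a problem-driven, multi-language chapter would have to teach.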
I taught Computer Science and Software Engineering for over 10 years and have been doing statistical software for 20 additional years -- I have graduate degrees in both areas. I don't see how it is going to be possible to stuff enough information on multiple programming languages into 500 pages. It is hard enough introducing a single programming language in 500 pages. Describing a language like C just in BNF is going to take many pages. I think the task is equivalent to writing a single 500-page statistics book that covers a similar number of topics -- think about the level of coverage that one would be allowed if one needed to stuff basic probability and statistics, regression analysis, experimental design, Bayesian statistics, sampling theory, ... into a single volume.
On another topic -- I've seen production software written by people with not enough programming knowledge and/or experience. I've also seen statistical analysis done and the results used by people with only elementary statistical knowledge. Both are VERY SCARY!
I think it depends on the student. Maybe I am an exception (at least I guess less than 20% of "learners" are like me - in which case your argument makes perfect sense), but for a student like me, here is what works:
But I understand many students need a different experience to succeed. The typical training would bore me to death and scare me away from mathematics, statistics, computer science, programming languages (I learned Perl and SQL by myself without even studying from a book, but rather by playing with actual code and reverse engineering on real computers). It is even more true today, since material has not been updated (except in a few programs) since 1980.
Hope this helps,
Another example of translating from one tool to another: how to do SQL's CREATE TABLE in Pandas (Python):
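The original snippet did not survive here, so what follows is only a rough sketch of the idea, not the poster's code. It assumes a hypothetical `customers` table with `id`, `name` and `spend` columns, and uses Python's built-in `sqlite3` module for the SQL side so the two versions can be compared.

```python
import sqlite3
import pandas as pd

# SQL side: CREATE TABLE declares column names and types up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", 120.5), (2, "Bob", 80.0)],
)

# pandas side, empty table: one typed, zero-length Series per column.
empty = pd.DataFrame({
    "id": pd.Series(dtype="int64"),
    "name": pd.Series(dtype="object"),
    "spend": pd.Series(dtype="float64"),
})

# pandas side, table with rows: build the DataFrame directly from the data.
customers = pd.DataFrame(
    {"id": [1, 2], "name": ["Ann", "Bob"], "spend": [120.5, 80.0]}
)

# Reading the SQL table back into pandas gives the same columns and rows.
from_sql = pd.read_sql_query("SELECT * FROM customers", conn)

print(empty.dtypes)
print(customers)
print(from_sql)
```

One difference worth keeping in mind: a DataFrame has no schema enforcement the way a SQL table does, so the dtypes above are a starting point and can change when later operations add or replace data.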