Save 45% on all books mentioned in this mailer. Just enter wm051514 in the Promotional Code box when you check out. Applies to these selected eBooks, pBooks, and MEAPs. Expires May 21. Only at

Practical Data Science with R

Nina Zumel and John Mount
Foreword by Jim Porzak

March 2014 | 416 pages 
ISBN: 9781617291562 


Practical Data Science with R lives up to its name. It explains basic principles without the theoretical mumbo-jumbo and jumps right to the real use cases you'll face as you collect, curate, and analyze the data crucial to the success of your business. You'll apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support.


Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science tasks without a lot of academic theory or advanced mathematics.

Practical Data Science with R shows you how to apply the R programming language and useful statistical techniques to everyday business situations. Using examples from marketing, business intelligence, and decision support, it shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels.

This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed.


  • Data science for the business professional
  • Statistical analysis using the R language
  • Project lifecycle, from planning to delivery
  • Numerous instantly familiar use cases
  • Keys to effective data presentations


Nina Zumel and John Mount are cofounders of a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at

Part 1 Introduction to data science

Chapter 1 The data science process
The roles in a data science project
Stages of a data science project
Setting expectations
Chapter 2 Loading data into R
Working with data from files
Working with relational databases
Chapter 3 Exploring data
Using summary statistics to spot problems
Spotting problems using graphics and visualization
Chapter 4 Managing data
Cleaning data
Sampling for modeling and validation

Part 2 Modeling methods

Chapter 5 Choosing and evaluating models
Mapping problems to machine learning tasks
Evaluating models
Validating models
Chapter 6 Memorization methods
KDD and KDD Cup 2009
Building single-variable models
Building models using many variables
Chapter 7 Linear and logistic regression
Using linear regression
Using logistic regression
Chapter 8 Unsupervised methods
Cluster analysis
Association rules
Chapter 9 Exploring advanced methods
Using bagging and random forests to reduce training variance
Using generalized additive models (GAMs) to learn non-monotone relationships
Using kernel methods to increase data separation
Using SVMs to model complicated decision boundaries

Part 3 Delivering results

Chapter 10 Documentation and deployment
The buzz dataset
Using knitr to produce milestone documentation
Using comments and version control for running documentation
Deploying models
Chapter 11 Producing effective presentations
Presenting your results to the project sponsor
Presenting your model to end users
Presenting your work to other data scientists

appendix A Working with R and other tools 
appendix B Important statistical concepts 
appendix C More tools and ideas worth exploring 

Replies to This Discussion

Note to readers about the paper edition of this book. The plots and graphs are in greyscale. I wrote to the authors about this and they were kind enough to respond:

I paid for and have both the PDF and paper editions. Overall, I think the authors do a good job with the book. The content is helpful, interesting, and decently written. The PDF edition is beautiful and engaging, with charts and graphs that are easy to read, but it's an electronic format.

The big miss with the book is the presentation of the plots and graphs in the paper edition. Because there is no color, the experience is taxing and dull. The paper edition hinders cognition rather than enabling it. I haven't picked up the paper edition since my first experience with it.

