
The Data Science eBook is free for Analyticbridge members. Click here to get your PDF copy. Membership is free.

Updated content as of June 6, 2012. See below for an outdated HTML version; the PDF version is of much higher quality, proofread, and contains the most recent content. Click here to preview the 2nd Edition. To download a copy, you must be a member. Membership is free. Click here to sign up and check all the benefits of membership.


Part I - Data Science Recipes

  1. New random number generator: simple, strong and fast
  2. Lifetime value of an e-mail blast: much longer than you think
  3. Two great ideas to create a much better search engine
  4. Identifying the number of clusters: finally a solution
  5. Online advertising: a solution to optimize ad relevancy
  6. Example of architecture for AaaS (Analytics as a Service)
  7. Why and how to build a data dictionary for big data sets
  8. Hidden decision trees: a modern scoring methodology
  9. Scorecards: Logistic, Ridge and Logic Regression
  10. Iterative Algorithm for Linear Regression
  11. Approximate Solutions to Linear Regression Problems
  12. Theorems for Traders
  13. Preserving metric and score consistency over time and across clients
  14. Advertising: reach and frequency mathematical formulas
  15. Real Life Example of Text Mining to Detect Fraudulent Buyers
  16. Discount optimization problem in retail analytics
  17. Sales forecasts: how to improve accuracy while simplifying models?
  18. How could Amazon increase sales by redefining relevancy?
  19. How to build simple, accurate, data-driven, model-free confidence i...
  20. Comprehensive list of Excel errors, inaccuracies and use of non-sta...
  21. 10+ Great Metrics and Strategies for Email Campaign Optimization
  22. 10+ Great Metrics and Strategies for Fraud Detection
  23. Case Study: Four different ways to solve a data science problem
  24. Case Study: Email marketing -  analytic tips to boost performance b...
  25. Optimize keyword campaigns on Google in 7 days: an 11-step procedure
  26. How do you estimate the proportion of bogus accounts on Facebook?

Part II - Data Science Discussions

  1. Statisticians Have Large Role to Play in Web Analytics (AMSTAT inte...
  2. Future of Web Analytics: Interview with Dr. Vincent Granville
  3. Connecting with the Social Analytics Experts
  4. Interesting note and questions on mathematical patents
  5. Big data versus smart data: who will win?
  6. Creativity vs. Analytics: Are These Two Skills Incompatible?
  7. Barriers to hiring analytic people
  8. Salary report for selected analytical job titles
  9. Are we detail-oriented or do we think "big picture", or both?
  10. Why you should stay away from the stock market
  11. Gartner Executive Programs' Worldwide Survey of More Than 2,300 CIOs
  12. 4.4 Million New IT Jobs Globally to Support Big Data By 2015
  13. One Third of Organizations Plan to Use Cloud Offerings to Augment BI Capabilities
  14. Twenty Questions about Big Data and Data Sciences
  15. Interview with Drew Rockwell, CEO of Lavastorm
  16. Can we use data science to measure distances to stars?
  17. Eighteen questions about real time analytics
  18. Can any data structure be represented by one-dimensional arrays?
  19. Data visualization: example of a great, interactive chart
  20. Data science jobs not requiring human interactions
  21. Featured Data Scientist: Vincent Granville, Analytic Entrepreneur
  22. Healthcare fraud detection still uses cave-man data mining techniques
  23. Why are spam detection algorithms so terrible?
  24. What is a Data Scientist?
  25. Twenty-seven types of data scientists: where do you fit?

Part III - Data Science Resources

  1. Vincent’s list
  2. History of 24 analytic companies over the last 30 years
  3. Fifteen great data science articles from influential news outlets
  4. List of publicly traded analytic companies
  5. Thirty unusual applications of data sciences, analytics and big data
  6. 50 unusual ways analytics are used to make our lives better
  7. Berkeley course on Data Science


While the book is not yet finished, we wanted to share with you the 123 pages (as of June 6, 2012) that we have written so far. The reasons are as follows:

  • We want your feedback about the style and content.
  • We want to attract sponsors and affiliates to submit contributions. 

Affiliates submit a 2-3 page article in exchange for making the book available for download on their website. Contributing to the book will drive traffic to your blog or website, as clickable links are available throughout the book and can be added to your article as well. We are looking for authors to submit contributions to Part I or Part II.

So far, most articles are from Vincent Granville or Analyticbridge staff for the following reasons:

  • We own copyright for our own articles
  • We don't have to spend much time selecting and reviewing our own articles
  • We know that the links associated with our contributions are permanent, making our book more robust

However, we would love to include contributions from external authors in Parts I and II, and contributions from sponsors (e.g., vendors) and affiliates in Parts I, II and III. So feel free to send articles for possible inclusion. You can check and download the "book in progress" by clicking on the link below:

About the book:

Our Data Science e-Book provides recipes, intriguing discussions and resources for data scientists and executives or decision makers. You don't need an advanced degree to understand the concepts. Most of the material is written in simple English; however, it offers simple, better and patentable solutions to many modern business problems, especially about how to leverage big data.

Emphasis is on providing high-level information that executives can easily understand, while being detailed enough that data scientists can easily implement our proposed solutions. Unlike most other data science books, we do not favor any specific analytic method or any particular programming language: we stay one level above practical implementations. But we do provide recommendations about which methods to use when necessary.

Most of the material is original and can be used to develop better systems, derive patents or write scientific articles. We also provide several rules of thumb and details about the craftsmanship used to avoid traditional pitfalls when working with data sets. The book also contains interviews with analytic leaders, and material about what should be included in a business analytics curriculum, or about how to efficiently optimize a search to fill an analytic position.

Among the more technical contributions, you will find notes on

  • How to determine the number of clusters
  • How to implement a system to detect plagiarism
  • How to build an ad relevancy algorithm
  • What is a data dictionary, and how to use it
  • Tutorial on how to design successful stock trading strategies
  • New fast and efficient random number generator
  • How to detect patterns vs. randomness

The book has three parts:

  • Part I: Data science recipes
  • Part II: Data science discussions
  • Part III: Data science resources

Parts I and II mostly consist of the best Analyticbridge posts by Dr. Vincent Granville, founder of Analyticbridge. Part III consists of sponsored vendor contributions as well as contributions by organizations (affiliates offering software, conferences, training, books, etc.) who make our free e-book available for download on their website. To become a sponsor or affiliate, please contact us at [email protected]






Replies to This Discussion

Also, I believe that

  • This is the first book about data science
  • This is the first analytic book with content mostly coming from a social network
  • This is the first free e-book generating revenue via sponsors (vendor contributions) and where marketing is both internal and via contributors offering the book for download on their website

Indeed, we believe that this is the new way to sell and market a book. In many ways, it is the exact opposite of what traditional publishers still do as of today: selling the book for a fee, not having sponsors, and having very expensive marketing strategies. Eliminate all of this by proceeding as follows:

This new book publishing model has two components:

  1. Identify your own best posts on your network, and publish them in PDF format with tons of clickable links to your website. 
  2. Include a "Resources" section from vendors (AKA sponsors). These are the people who will pay you money, but you should also offer free contributions (from "affiliates") in exchange for having your book available for download on affiliate websites. 

Make the book available for free, use your network to market it.

This creates an explosive mix: you generate traffic very fast, at no cost, and generate revenue both directly from the book (via the sponsors) and indirectly for your network (website) through increased traffic and thus increased ad revenue.

(Note: we think that one day, we'll make a paper copy of our e-book - but the original version will be digital) 


I read ABbook5.pdf expecting to see a data mining version of "Numerical Recipes". I am a big fan of the "Numerical Recipes" books, as they provide accessible introductions to very complex and rich topics.

However, the description you offer here indicates that this is a different type of book: a new kind coming from a social network. So having properly reset my expectations, here is my feedback:

- As different problems have different scope, the recipes also differ widely in scope and applicability. As a service to the reader, if some editorial control could be applied to the descriptions to state and unify that scope and applicability, the recipes would become much better. Some supporting links, e.g., to Wikipedia, would help as well.

- The three parts are as different as they come, so I would suggest that you truly separate them into three distinct books. I do not see a reason to combine them in the same book, nor would I use the book in that way. By separating them you gain clarity and agility, and I would say that, from an operational/advertising/sponsor point of view, you gain more revenue and opportunity.


- I love the idea of creating a new publishing format, although it doesn't seem to go as far as Wikipedia, Q&A-driven platforms like StackExchange or Quora, or a marketplace like Spiceworks. I am excited to see if you can make this eBook work, as it sits between Wikipedia and StackExchange.


- Take a look at Spiceworks as a model: right now I don't believe that the world of analytics is big enough to support something like Spiceworks for Analytics, but as big data and self-service BI are becoming essential elements in the arsenal of modern business the class of knowledge workers that deal with analytics will steadily grow. IMHO, fundamentally, Spiceworks works because IT professionals have a big say in the budgets they need to operate. Analytics professionals have a much smaller impact on budgets in the traditional enterprise, although they do sit at the strategy table in the analytics startups that we all love. If that trend continues, the Spiceworks model could work.

I have uploaded a new version (56 pages, 22 contributions, updated bio). If you cannot download it, refresh your browser, then try again. The URL is the same. Many new articles will be added in the next 2 weeks.

Just curious (even after reading about all the benefits of PDF publishing): would it be a good idea to do an iBook too? The iPad-owner demographic is not small either, so it might be a good place to publish. I played with iBooks Author a little, and it seems interesting due to its ability to add interactive elements. 

The fact that the book is published in PDF format offers new advertising opportunities: clickable banner ads. That would be the first time that pay-per-click or pay-per-lead advertising is sold in an e-book, and used to finance the cost of publishing the book.

While others comment on the contents, may I add suggestions on style? 

After reading the articles, it seems to me that a short, magazine-style layout might suit. Here is one template we can probably use:

PDFs render beautiful electronic versions of paper books or magazines. Still, they are optimized to be read on paper (or maybe on a huge monitor while sitting at your desk).
However, electronic documents are best consumed on a lightweight e-reader or tablet. Since these are smaller devices, the text should reflow to the margins of the particular e-reader so that a large font still fits on the screen. PDF is not appropriate for this use. EPUB and other proprietary e-reader formats can reflow and are better suited for this purpose.

Thanks for all the comments. I really need to work on the format, and maybe hire someone since I don't have much time available (otherwise I would have used TeX rather than Word as the word processor), also to proofread, optimize navigation, and add structure to the book, an index, etc.

The nice thing about PDF is that it has clickable links to the actual discussions, so you can access the most up-to-date versions of the discussions on AB, with all the fresh comments. But PDF takes lots of bandwidth and storage. Also, the page size is too large. But it looks very nice when printed :-)

Excellent reference, and thank you for making it available. My company is caught right at Part II, Section 6. We are a very creative software company focused on analytics of big data in real time; however, we lack the analytic know-how. Based on the discussions we have had, we think our technology is interesting because it offers true real-time results (getting answers as they happen), but turning technology into a product, and understanding what is important in a product, is very difficult. Your work here has really inspired me and has helped me quite a bit.


I love the ebook, and I want to make an addition to item A7 -- "Why and how to build a data dictionary for big datasets". 

I believe that you have described a data profiler and not a data dictionary. That being said, I delivered a paper on my SAS dataset profiler at the inaugural conference for IFSUG. The profiler produces a complete and accurate profile with only two passes of the dataset. Every value of every "reportable" column is listed and summarized with statistics for all other columns in the dataset. You have to see it to really understand what it does, and luckily, I created a visual poster to show the reports and how to use them to analyze any dataset. 

The IFSUG presentation is here:

Look for my paper: Using Dictionary Tables to Explore SAS® Datasets (SAS code 1, SAS code 2)

Paper, poster, and SAS code are available.

I will also present this paper at SESUG 2012 in Durham, NC -- near the mothership for SAS Institute.


This is an amazing book. I look forward to the finished product.

Please consider including the following analytics companies in your lists:

MarketShare -

Quantivo -

More to come.

thank you,


I have only skimmed through it so far, but it's a great job. Thanks a lot.



© 2021 TechTarget, Inc.
