The Data Science eBook is free for Analyticbridge members. Click here to get your PDF copy. Membership is free.
Updated content as of June 6, 2012. See below for an outdated HTML version; the PDF version is of much higher quality, proofread and with the most recent content. Click here to preview the 2nd Edition. To download a copy, you must be a member. Membership is free. Click here to sign up and check all the benefits of membership.
Part I - Data Science Recipes
Part II - Data Science Discussions
Part III - Data Science Resources
While the book is not yet finished, we wanted to share with you the 123 pages (as of June 6, 2012) that we have written so far. The reasons are as follows:
Affiliates submit a 2-3 page article in exchange for making the book available for download on their website. Contributing to the book will drive traffic to your blog or website, as clickable links are available throughout the book and can be added to your article as well. We are looking for authors to submit contributions to Part I or Part II.
So far, most articles are from Vincent Granville or Analyticbridge staff for the following reasons:
However, we would love to include contributions from external authors in Part I and II, and contributions from sponsors (e.g. vendors) and affiliates in Part I, II and III. So feel free to send articles for possible inclusion. You can check and download the "book in progress" by clicking on the link below:
About the book:
Our Data Science e-Book provides recipes, intriguing discussions and resources for data scientists and executives or decision makers. You don't need an advanced degree to understand the concepts. Most of the material is written in simple English, yet it offers simple, better and patentable solutions to many modern business problems, especially on how to leverage big data.
Emphasis is on providing high-level information that executives can easily understand, while being detailed enough so that data scientists can easily implement our proposed solutions. Unlike most other data science books, we do not favor any specific analytic method nor any particular programming language: we stay one level above practical implementations. But we do provide recommendations about which methods to use when necessary.
Most of the material is original, and can be used to develop better systems, derive patents or write scientific articles. We also provide several rules of thumb and details about the craftsmanship used to avoid traditional pitfalls when working with data sets. The book also contains interviews with analytic leaders, and material about what should be included in a business analytics curriculum, or about how to efficiently optimize a search to fill an analytic position.
Among the more technical contributions, you will find notes on
The book has three parts:
Parts I and II mostly consist of the best Analyticbridge posts by Dr. Vincent Granville, founder of Analyticbridge. Part III consists of sponsored vendor contributions as well as contributions by organizations (affiliates offering software, conferences, training, books, etc.) that make our free e-book available for download on their web site. To become a sponsor or affiliate, please contact us at [email protected]
Also, I believe that
Indeed, we believe that this is the new way to sell and market a book. In many ways, it is the exact opposite of what traditional publishers still do as of today: selling the book for a fee, not having sponsors, and having very expensive marketing strategies. Eliminate all of this by proceeding as follows:
This new book publishing model has two components:
Make the book available for free, use your network to market it.
This creates an explosive mix: you generate traffic very fast, at no cost, and you generate revenue both directly from the book (via the sponsors) and indirectly through your network (web site), thanks to increased traffic and thus increased ad revenue.
(Note: we think that one day, we'll make a paper copy of our e-book - but the original version will be digital)
I read ABbook5.pdf expecting to see a data mining version of "Numerical Recipes". I am a big fan of the "Numerical Recipes" books, as they provide accessible introductions to very complex and rich topics.
However, the description you offer here indicates that this is a different type of book: a new kind coming from a social network. So having properly reset my expectations, here is my feedback:
- as different problems have different scope, the recipes also widely differ in scope and applicability. As a service to the reader, if there can be some editorial control applied to the descriptions to add/unify that scope and applicability, the recipes will become much much better. Some supporting links like Wikipedia would help as well.
- the three parts are as different as they come, so I would suggest that you truly separate the parts into their own three distinct books. I do not see a reason to combine them in the same book, nor would I use the book in that way. By separating them you gain clarity and agility, and I would say from an operational/advertising/sponsor point of view, you gain more revenue and opportunity.
- love the idea to create a new publishing format, although it doesn't seem to go as far as Wikipedia, or platforms like the StackExchange or Quora (Q&A driven sites), or a marketplace like Spiceworks. I am excited to see if you can make this eBook work as it sits between Wikipedia and StackExchange.
- Take a look at Spiceworks as a model: right now I don't believe that the world of analytics is big enough to support something like Spiceworks for Analytics, but as big data and self-service BI are becoming essential elements in the arsenal of modern business the class of knowledge workers that deal with analytics will steadily grow. IMHO, fundamentally, Spiceworks works because IT professionals have a big say in the budgets they need to operate. Analytics professionals have a much smaller impact on budgets in the traditional enterprise, although they do sit at the strategy table in the analytics startups that we all love. If that trend continues, the Spiceworks model could work.
Just curious (even after reading about all the benefits of PDF publishing): would it be a good idea to do an iBook too? The population of iPad owners is not small, so it might be a good place to publish. I played with iBooks Author a little, and it seems interesting because of its ability to add interactive elements.
While others point out the contents, may I add suggestions on style?
After reading the articles it seems to me that short magazine style layout might suit. Here is one template we can probably use:
Thanks for all the comments. I really need to work on the format, and maybe hire someone, since I don't have much time available (otherwise I would have used TeX rather than Word as a word processor). I also need to proofread, optimize navigation, and add structure to the book, an index, etc.
The nice thing with PDF is that it has clickable links to the actual discussions, so you can access the most up-to-date versions of the discussions on AB, with all the fresh comments. But PDF takes lots of bandwidth and storage. Also the page size is too large. But it looks very nice when printed :-)
Excellent reference, and thank you for making it available. My company is caught right at Part II Section 6. We are a very creative software company that is focused on analytics of big data in real time; however, we lack the analytic know-how. Based on discussions we have had, we think our technology is interesting because it offers true real-time analytics (getting answers as they happen), but turning technology into a product and understanding what is important in a product is very difficult. Your work here has really inspired me and has helped me quite a bit.
I love the ebook, and I want to make an addition to item A7 -- "Why and how to build a data dictionary for big datasets".
I believe that you have described a data profiler and not a data dictionary. That being said, I delivered a paper on my SAS dataset profiler at the inaugural conference for IFSUG. The profiler produces a complete and accurate profile with only two passes of the dataset. Every value of every "reportable" column is listed and summarized with statistics for all other columns in the dataset. You have to see it to really understand what it does, and luckily, I created a visual poster to show the reports and how to use them to analyze any dataset.
The IFSUG presentation is here: http://www.ifsug.org/2012-proceedings
Paper, poster, and SAS code are available.
I will also present this paper at SESUG 2012 in Durham, NC -- near the mothership for SAS Institute.
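For readers who want a concrete picture of the profiling idea described above (list every value of each "reportable" column and summarize the other columns for it), here is a minimal Python sketch. It is not the SAS profiler from the IFSUG paper and makes no attempt to reproduce its two-pass design; the function name `profile`, the column names, and the sample rows are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def profile(rows, reportable, numeric):
    """For each value of each reportable column, count the matching rows
    and average every numeric column over those rows.

    Illustrative sketch only: names and structure are assumptions,
    not the method used by the SAS profiler mentioned above.
    """
    # groups[(column, value)] -> list of rows having that value in that column
    groups = defaultdict(list)
    for row in rows:
        for col in reportable:
            groups[(col, row[col])].append(row)

    report = {}
    for (col, value), members in groups.items():
        stats = {"count": len(members)}
        for num in numeric:
            if num != col:  # do not summarize a column against itself
                stats["mean_" + num] = mean(r[num] for r in members)
        report[(col, value)] = stats
    return report


# Hypothetical sample data
rows = [
    {"region": "east", "plan": "basic", "spend": 10.0},
    {"region": "east", "plan": "pro", "spend": 30.0},
    {"region": "west", "plan": "basic", "spend": 20.0},
]

report = profile(rows, reportable=["region", "plan"], numeric=["spend"])
for key in sorted(report):
    print(key, report[key])
```

Each entry in `report` is keyed by a (column, value) pair, so the output reads like one line per distinct value, with its row count and the average of the numeric columns. A real profiler would also handle missing values, high-cardinality columns, and more statistics than a mean.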
I have only skimmed through it so far, but it's a great job. Thanks a lot.