For the most recent updates, visit this page (check the Discussion section).
Our textbook is now published, new data sets and tutorials have been added, and the data science cheat sheet will soon be available in its final format.
So what does this mean for you, and how do you get started?
First, we remind you that this is still a program for self-learners, presenting original, core, modern, applied, useful, pioneering data science material not found in traditional programs. It is basically free (for now), apart from the cost of purchasing our book. The version presented here is on-demand and DIY (do-it-yourself), and may not be suitable for everyone. We are working with partners to offer more traditional training for students who need a well-structured learning environment, face-to-face or frequent email interactions with professors, more support to learn, or an official diploma at the end.
So the text below applies only to the online, on-demand, DIY version.
How to get started?
Read the steps required to complete the program (see below), and if you are still interested, proceed to step #1.
The program in 7 steps
- Email us at [email protected] to show your interest. Include links to your Data Science Central and LinkedIn profiles in your email. Two years of college training in computer science, statistical science, operations research, machine learning, mathematics, data engineering or another analytic domain (or equivalent business experience requiring analytic practice and acumen) is deemed necessary to succeed. If too many candidates apply, priority will be given to students or professionals meeting this criterion.
- After reviewing your profile, we will contact you within 7 days: either you are accepted immediately, or, if demand is too high, you may be accepted as soon as a spot becomes available. "Accepted" simply means that we have the resources (time) to review the material you submit, such as your project and the data science test that you will receive from us.
- Obtain a copy of our book, the book addendum, and our data science cheat sheet when published (for now, study the literature mentioned in this document, since the data science cheat sheet is not yet published). It is important to install the right environment on your laptop (for instance, Cygwin / Perl or Python, and R) or to use a server where these tools are already installed. Get familiar with basic UNIX commands and operators, regular expressions, file processing, FTP, and Excel or other summarization / visualization / EDA tools (EDA = exploratory data analysis). More on this in the cheat sheet (coming soon), but hopefully you know this stuff already.
- Select your data science project. We listed a number of potential data sets and projects to work on (as part of the apprenticeship) in a previous update. At this stage, only two projects are approved (more will be approved later, so keep checking this page every month): (1) the jackknife regression project, which involves simulated data (and offers a chance to win $1,000), and (2) detecting reputable big data, data science and analytics digital publishers that accept RSS feeds (click here for details).
- Complete one of these real-life projects and email us your solution. You will then receive comments from us (possibly asking for a revision), along with a questionnaire (a data science test) to assess your general knowledge and business acumen. Answers to the technical questions can be found in our training material. The questions are open-ended and different for each candidate.
- Once approved, publish the following three items: (1) the solution to your project (including methodology, data and results), (2) your answers to the questions we emailed to you (both technical and non-technical), and (3) how your project could be scaled (architecture details, distributed computing, faster algorithms, better crawling, and potential problems/bottlenecks if working with big data; automation and maintenance; please provide ample details customized to your project, written in your own words, not just a brief generic overview). Publish these three items in a single blog post on Data Science Central, with the subject line "My Data Science Apprenticeship Project". You have six months, from the time you are accepted, to publish your results.
- Great contributions will be featured (Dr. Vincent Granville reviews the submissions), and you can then expect to hear from hiring managers and other professionals interested in your talents. You will also benefit from crowdsourcing: advice and comments provided by our readers. Authors of featured projects will earn our data science certification.
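As a practical aid for the environment setup described in step 3, here is a minimal sketch of how you might check that the command-line tools you plan to use are available on your machine. The tool list is only an assumption based on the examples named in that step (Perl, Python, R, basic UNIX commands, FTP), and `check_tools` is a hypothetical helper name, not part of the program material. Adjust the list to your own setup.

```python
import shutil

# Hypothetical tool list, based on the examples named in step 3.
# Executable names vary by system (for instance "python" vs "python3").
TOOLS = ["perl", "python3", "Rscript", "grep", "sort", "ftp"]


def check_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in check_tools(TOOLS).items():
        # Print the resolved path, or a clear marker when the tool is missing.
        print(f"{name:10s} {path if path else 'NOT FOUND'}")
```

Any tool reported as missing can be installed locally or, as the step suggests, you can work on a server where it is already available.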