A Data Science Central Community

I’m attending Rachel Schutt’s Columbia University Data Science course on Wednesdays this semester and I’m planning to blog the class. Here’s what happened yesterday at the first meeting.

**Syllabus**

Rachel started by going through the syllabus. Here were her main points:

- The prerequisites for this class are: linear algebra, basic statistics, and some programming.
- The goals of this class are to learn what data scientists do, and to learn to do some of those things.
- Rachel will teach for a couple weeks, then we will have guest lectures.
- The profiles of those speakers vary considerably, as do their backgrounds. Yet they are all data scientists.
- We will be resourceful with readings: part of being a data scientist is realizing lots of stuff isn’t written down yet.
- There will be 6-10 homework assignments, due every two weeks or so.
- The final project will be an internal Kaggle competition. This will be a team project.
- There will also be an in-class final.
- We’ll use R and Python, mostly R, and support will be mainly for R. Download RStudio.
- If you’re only interested in learning Hadoop and working with huge data, take Bill Howe’s Coursera course. We will get to big data, but not until the last part of the course.

**The current landscape of data science**

So, what is data science? Is data science new? Is it real? What is it?

This is an ongoing discussion, but Michael Driscoll’s answer is pretty good:

> Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.
>
> But data science is not merely hacking, because when hackers finish debugging their Bash one-liners and Pig scripts, few care about non-Euclidean distance metrics.
>
> And data science is not merely statistics, because when statisticians finish theorizing the perfect model, few could read a ^A delimited file into R if their job depended on it.
>
> Data science is the civil engineering of data. Its acolytes possess a practical knowledge of tools & materials, coupled with a theoretical understanding of what’s possible.
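A "^A delimited file" uses the Ctrl-A control character (byte 0x01) as its field separator. This convention is common in Hadoop tooling; for example, it is Hive's default field terminator for text tables. The character is chosen because it almost never appears inside real data. Driscoll's jab is about R, but here is a minimal sketch in Python, the course's other language, using only the standard-library `csv` module. The function name and the sample records are hypothetical and included only for illustration.

```python
import csv
import io

# Ctrl-A (byte 0x01, displayed as ^A in many terminals) used as a field separator.
CTRL_A = "\x01"

def read_ctrl_a_delimited(text):
    """Parse ^A-delimited text into a list of rows (each a list of strings)."""
    reader = csv.reader(io.StringIO(text), delimiter=CTRL_A)
    return [row for row in reader]

# Hypothetical sample data: two records with three fields each.
sample = "alice\x0134\x01NYC\nbob\x0127\x01SF\n"
rows = read_ctrl_a_delimited(sample)
print(rows)  # [['alice', '34', 'NYC'], ['bob', '27', 'SF']]
```

Note that commas inside a field are left alone, which is the whole appeal of an unusual delimiter.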

Driscoll also refers to Drew Conway’s Venn diagram of data science from 2010.

We may also want to look at Nathan Yau’s “sexy skills of data geeks” from his “Rise of the Data...” post:

- Statistics – traditional analysis you’re used to thinking about
- Data Munging – parsing, scraping, and formatting data
- Visualization – graphs, tools, etc.
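To make the first two items concrete, here is a toy Python sketch. It assumes a hypothetical input of "name,age" strings with inconsistent spacing, inconsistent casing, and a missing value. The munging step parses and normalizes the records, and only then does a simple statistic get computed. The `munge` helper and the data are illustrative inventions, not anything from the course.

```python
import statistics

# Hypothetical raw input: inconsistent spacing and casing, plus one missing age.
raw_lines = [
    "  Alice , 34 ",
    "BOB,27",
    "carol,  ",   # missing age
    "Dave , 41",
]

def munge(lines):
    """Parse 'name,age' strings into (Name, int age) tuples, dropping rows with no age."""
    records = []
    for line in lines:
        name, _, age = line.partition(",")
        name, age = name.strip().title(), age.strip()
        if not age:  # skip incomplete records rather than guessing a value
            continue
        records.append((name, int(age)))
    return records

clean = munge(raw_lines)
ages = [age for _, age in clean]
print(clean)  # [('Alice', 34), ('Bob', 27), ('Dave', 41)]
print(statistics.mean(ages))  # mean age of the complete records
```

Dropping the incomplete row is just one choice. Whether to drop, impute, or flag missing values is itself a statistical decision, which is partly why the two skills tend to travel together.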

But wait, is data science a bag of tricks? Or is it just the logical extension of other fields like statistics and machine learning?
