A Data Science Central Community
I’m going to keep this tutorial light on math, because the goal is just to give a general understanding.
The idea of Monte Carlo methods is this: generate random samples of some random variable of interest, then use those samples to compute the values you’re interested in.
I know, super broad. The truth is Monte Carlo has a ton of different applications. It’s…Continue
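To make the idea concrete, here is a minimal sketch in Python. It is not taken from the original post. The random variable (the sum of two fair dice), the function name `simulate_two_dice`, and the sample count are all assumptions chosen for illustration. It draws random samples, then uses them to estimate a mean and a probability.

```python
import random


def simulate_two_dice(n_samples, seed=0):
    """Estimate the mean of the sum of two dice and P(sum >= 10) by sampling.

    Hypothetical helper for illustration; the seed only makes runs repeatable.
    """
    rng = random.Random(seed)
    total = 0
    at_least_ten = 0
    for _ in range(n_samples):
        # One random sample of the variable of interest.
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        total += roll
        if roll >= 10:
            at_least_ten += 1
    # Use the samples to compute the values we care about.
    mean_estimate = total / n_samples
    prob_estimate = at_least_ten / n_samples
    return mean_estimate, prob_estimate


mean_estimate, prob_estimate = simulate_two_dice(100_000)
print(mean_estimate, prob_estimate)
```

The exact answers here are 7 and 1/6, so the estimates should land close to those, and they get closer as the number of samples grows.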
Linear regression is one of the first things you should try if you’re modeling a linear relationship (actually, non-linear relationships too!). It’s fairly simple, and probably the first thing to learn when tackling machine learning.
At first, linear regression shows up just as a simple equation for a line. In machine learning, the weights are usually represented by a vector θ (in statistics they’re often represented…Continue
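As a rough illustration of the "equation for a line" view, here is a small Python sketch that fits a line with a single feature using ordinary least squares. The names `fit_line` and `predict`, and storing θ as a two-element list `[theta0, theta1]`, are assumptions made for this example rather than anything from the original post.

```python
def fit_line(xs, ys):
    """Fit y = theta0 + theta1 * x by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    theta0 = mean_y - theta1 * mean_x
    return [theta0, theta1]


def predict(theta, x):
    """Evaluate the fitted line at x."""
    return theta[0] + theta[1] * x


theta = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(theta, predict(theta, 4))
```

With more than one feature the same idea carries over, except θ gets one weight per feature and the fit is usually done with linear algebra or gradient descent instead of these two formulas.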
It’s important to know what goes on inside a machine learning algorithm. But it’s hard. There is some pretty intense math happening, much of which is linear algebra. When I took Andrew Ng’s course on machine learning, I found the hardest part was the linear…Continue
Added by Alex Woods on July 10, 2015 at 10:30pm — No Comments
Random Forest is a machine learning algorithm used for classification, regression, and feature selection. It's an ensemble technique, meaning it combines the outputs of many models built with a weaker technique in order to get a stronger result.
The weaker technique in this case is a decision tree. Decision trees work by splitting and re-splitting the data by…Continue
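The ensemble idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not a real Random Forest: each "tree" is a one-split decision stump, there is no random feature selection, and all function names (`fit_stump`, `fit_forest`, and so on) are hypothetical. It only shows weak models trained on bootstrap samples being combined by majority vote.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0]
            errors = sum(l != left_lab for l in left)
            errors += sum(l != right_lab for l in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    if best is None:
        # No usable split (all rows identical): always predict the majority.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_lab, right_lab = stump
    return left_lab if row[f] <= t else right_lab


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Combine the weak models by majority vote."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


rows = [[1], [2], [3], [10], [11], [12]]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0]), predict_forest(forest, [13]))
```

Any single stump can be thrown off by an unlucky bootstrap sample, but the vote across many of them tends to be more stable, which is the point of the ensemble.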
Added by Alex Woods on July 4, 2015 at 8:30am — No Comments
When you're cleaning up data, you usually end up using the same 5-8 functions a ton of times, and then a few more once or twice. Here are the 5-8 functions I find myself using again and again.
Here is a quick overview:
names() - returns the column names of a dataset…Continue
Added by Alex Woods on July 4, 2015 at 8:00am — No Comments