A Data Science Central Community
I’m going to keep this tutorial light on math, because the goal is just to give a general understanding.
The idea behind Monte Carlo methods is this: generate random samples of some random variable of interest, then use those samples to compute the quantities you care about.
I know, super broad. The truth is that Monte Carlo has a ton of different applications. It's used in product design, to simulate variability in manufacturing. It's used in physics, biology and chemistry, to do a whole host of things that I only partially understand. It can be used in AI for games, for example the Chinese game Go. And finally, it's used in finance to price options and other financial derivatives. In short, it's used everywhere.
The methods we use today originated in the Manhattan Project, as a way to simulate how far neutrons would travel through various materials. Sampling-based ideas had been around for a while, but they took off during the making of the atomic bomb and have since spread to many other fields.
The big advantage of Monte Carlo methods is that they let you build randomness and real-world complexity directly into a model. They are also flexible: if you decide a random variable should follow a different distribution, you only change how you draw samples, not the rest of the analysis. The justification for a Monte Carlo method lies in the law of large numbers. I'll elaborate in the first example.
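Stated a little more precisely (a standard result, added here for reference): if X_1, ..., X_n are independent draws of a random variable X with finite mean, their average converges to E[X] with probability 1 as n grows, and if X also has finite variance σ², the standard deviation of that average is σ/√n.

```latex
\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{\text{a.s.}} \mathbb{E}[X] \quad (n \to \infty),
\qquad
\operatorname{SD}\!\left(\hat{\mu}_n\right) = \frac{\sigma}{\sqrt{n}}
```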
The examples I give are considered simple Monte Carlo. In this kind of problem we want to know the expected value of some random variable. We generate many independent samples of that random variable and take their average. The random variable will usually follow some known probability distribution that we can sample from.
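A generic version of simple Monte Carlo fits in a few lines of Python. This is a sketch, not code from the original post: `simple_monte_carlo`, its parameters, and the Normal example are illustrative choices. It estimates E[g(X)] by averaging g over independent draws of X.

```python
import random


def simple_monte_carlo(sample, g, n, seed=None):
    """Estimate E[g(X)] by averaging g over n independent draws of X.

    `sample(rng)` must return one draw of X using the given random.Random instance.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += g(sample(rng))
    return total / n


if __name__ == "__main__":
    # Example: X ~ Normal(0, 1), so E[X^2] = Var(X) = 1 exactly.
    estimate = simple_monte_carlo(lambda rng: rng.gauss(0.0, 1.0), lambda x: x * x, n=100_000, seed=1)
    print(estimate)  # close to 1.0; the exact digits depend on the seed
```

Passing the sampler in as a function is what makes swapping distributions cheap: only `sample` changes, the averaging stays the same.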
We can use something called the random darts method, a Monte Carlo simulation, to estimate pi. Here is my R code for this example.
The logic goes as follows:
1. Inscribe a circle in a square, so that the side length of the square equals the diameter of the circle. With radius r, the circle has area pi × r² and the square has area (2r)² = 4r², so the ratio of circle area to square area is pi/4.
2. If we can estimate this ratio, we can estimate pi by multiplying the ratio by 4.
3. To estimate the ratio, sample points uniformly at random in the square and calculate the proportion that land inside the circle. So I just count the points inside the circle, divide by the total number of points, and multiply by 4.
4. As the number of points increases, the estimate tends to get closer to pi.
This is a very simple example of a Monte Carlo method at work.
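The original post links R code for this example; what follows is a separate minimal Python sketch of the same darts idea, not the author's code. It assumes a circle of radius 1 centered at the origin inside the square [-1, 1] × [-1, 1], and `estimate_pi` is an illustrative name.

```python
import random


def estimate_pi(n_points, seed=None):
    """Random darts: throw n_points uniformly into the square [-1, 1] x [-1, 1]
    and count how many land inside the inscribed unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # the point lies inside (or on) the circle
            inside += 1
    # Circle area / square area = pi * 1^2 / 2^2 = pi / 4, so scale the proportion by 4.
    return 4.0 * inside / n_points


if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        print(n, estimate_pi(n, seed=42))
```

Fixing the seed makes a run reproducible; leaving it as `None` gives a different estimate each time, which is a quick way to see the run-to-run variability.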
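Here is a more useful example. We can simulate traffic using the Nagel–Schreckenberg model. In this model we have a road made up of cells, a speed limit v_max, and a certain number of cars. At each time step we go through the cars and update each car's velocity v using the following four rules, applying them to all cars in parallel:
1. Acceleration: if v < v_max, increase v by 1.
2. Slowing down: if the number of empty cells between the car and the car ahead is less than v, reduce v to that number.
3. Randomization: with some probability p, reduce v by 1 (if v > 0).
4. Car motion: move the car forward v cells.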
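The four standard rules (accelerate, brake for the car ahead, randomly slow down, move) can be sketched in Python as below. This is an illustration, not the author's code: it assumes a circular road so cars leaving one end re-enter at the other, and `nasch_step`, `average_speed`, and all default parameter values are hypothetical choices. Because rule 3 is random, averaging the result over several seeds is itself a small Monte Carlo estimate.

```python
import random


def nasch_step(positions, velocities, length, v_max, p_slow, rng):
    """One parallel Nagel-Schreckenberg update on a ring road of `length` cells.

    Car (i + 1) % n must be the car directly ahead of car i. Cars never
    overtake, so this cyclic order is preserved from step to step.
    """
    n = len(positions)
    new_v = []
    for i in range(n):
        # Empty cells between this car and the one ahead, wrapping around the ring.
        gap = (positions[(i + 1) % n] - positions[i] - 1) % length
        v = min(velocities[i] + 1, v_max)   # 1. accelerate toward the speed limit
        v = min(v, gap)                     # 2. slow down to avoid the car ahead
        if v > 0 and rng.random() < p_slow:
            v -= 1                          # 3. random slowdown
        new_v.append(v)
    # 4. move every car forward by its new velocity
    new_pos = [(x + v) % length for x, v in zip(positions, new_v)]
    return new_pos, new_v


def average_speed(length=100, n_cars=30, v_max=5, p_slow=0.3, steps=500, seed=None):
    """Mean speed over all cars and time steps for one random run."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(length), n_cars))  # distinct starting cells
    velocities = [0] * n_cars
    total = 0.0
    for _ in range(steps):
        positions, velocities = nasch_step(positions, velocities, length, v_max, p_slow, rng)
        total += sum(velocities) / n_cars
    return total / steps


if __name__ == "__main__":
    runs = [average_speed(seed=s) for s in range(20)]
    print(sum(runs) / len(runs))  # Monte Carlo estimate of the mean speed
```

Raising `n_cars` (the density) or `p_slow` is a simple way to watch jams form and dissolve without any accident in the model.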
This model is simple, but it does a pretty good job of simulating traffic behavior. It doesn't deal with accidents or bad drivers; its purpose is to capture those times when traffic jams just appear and vanish without any apparent reason. More sophisticated models exist, but many of them build on this one.
The first big challenge for Monte Carlo is generating independent samples from whatever distribution you're dealing with. This is harder than you might think. In my code I just called R's and Python's built-in random functions, but sampling can get much more sophisticated, and that is a lot of what you will read about in more academic sources.
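One classic way to turn uniform random numbers into samples from another distribution is inverse transform sampling: if U is uniform on (0, 1) and F is a CDF with an invertible closed form, then F⁻¹(U) has distribution F. The sketch below applies it to an exponential distribution; the technique and the `sample_exponential` name are my illustration, not something the original post uses.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one Exponential(rate) value by inverse transform sampling.

    The CDF is F(x) = 1 - exp(-rate * x), so F^-1(u) = -ln(1 - u) / rate.
    """
    u = rng.random()  # u is in [0, 1), so 1 - u is in (0, 1] and the log is defined
    return -math.log(1.0 - u) / rate


if __name__ == "__main__":
    rng = random.Random(7)
    draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
    print(sum(draws) / len(draws))  # the theoretical mean is 1 / rate = 0.5
```

For this particular distribution Python's standard library already offers `random.expovariate`; the hand-written version just shows the idea, which stops working this simply once the CDF has no convenient inverse.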
Here is a link on how R's built-in uniform random number generator works.
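Another problem is how slowly the error converges. Notice in the pi example how the error more or less stopped decreasing after a while. This is expected: Monte Carlo error shrinks only in proportion to 1/√n, so each additional correct digit takes roughly 100 times more samples. Because computing is cheap, most Monte Carlo applications compensate by simply using very large samples.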
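To see the 1/√n behavior directly, the sketch below repeats the darts estimate many times at each sample size and reports the root-mean-square error. Averaging over repeats is my assumption, used to separate the typical error from the luck of a single run; `rms_error` and the chosen sizes are illustrative.

```python
import math
import random


def estimate_pi(n_points, rng):
    """Darts estimate of pi using points drawn uniformly from [-1, 1] x [-1, 1]."""
    inside = sum(1 for _ in range(n_points)
                 if rng.uniform(-1.0, 1.0) ** 2 + rng.uniform(-1.0, 1.0) ** 2 <= 1.0)
    return 4.0 * inside / n_points


def rms_error(n_points, repeats, seed=0):
    """Root-mean-square error of the darts estimate over independent repeats."""
    rng = random.Random(seed)
    squared = [(estimate_pi(n_points, rng) - math.pi) ** 2 for _ in range(repeats)]
    return math.sqrt(sum(squared) / repeats)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 50_000):
        err = rms_error(n, repeats=30)
        # If error ~ C / sqrt(n), then err * sqrt(n) should stay roughly constant.
        print(f"n={n:>6}  rms error={err:.5f}  err*sqrt(n)={err * math.sqrt(n):.3f}")
```

For the darts method the constant is 4·√(p(1 − p)) with p = pi/4, about 1.64, so the last column should hover near that value while the error column drops only by about √10 ≈ 3.2 for each tenfold increase in n.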
Monte Carlo methods are an awesome topic to explore, and I hope this post popularizes them even a little bit more (outside of finance and physics, that is).