A Data Science Central Community
I’m going to keep this tutorial light on math, because the goal is just to give a general understanding.
The idea of Monte Carlo methods is this: generate random samples of some random variable of interest, then use those samples to compute the values you care about.
I know, super broad. The truth is Monte Carlo has a ton of different applications. It’s used in product design, to simulate variability in manufacturing. It’s used in physics, biology, and chemistry to do a whole host of things that I only partially understand. It can be used in AI for games, for example the Chinese game Go. And in finance, it’s used to price options and other financial derivatives [1]. In short, it’s used everywhere.
The methods we use today originated in the Manhattan Project, as a way to simulate the distance neutrons would travel through various materials [1]. Ideas based on sampling had been around for a while, but they took off during the making of the atomic bomb, and have since spread to many other fields.
The big advantage of Monte Carlo methods is that they inject randomness and real-world complexity into the model. They are also easy to adjust, for example by swapping in a different probability distribution for the random variable you are considering. The justification for a Monte Carlo method lies in the law of large numbers: the average of many independent samples converges to the expected value. I’ll elaborate in the first example.
The examples I give are considered simple Monte Carlo. In this kind of problem we want to know the expected value of some random variable, so we generate many samples of that random variable and take their average.
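As a concrete sketch (my own minimal Python illustration, not from the original post), here is simple Monte Carlo used to estimate an expected value, E[e^U] for U uniform on (0, 1), whose true value is e - 1:

```python
import math
import random

def mc_expectation(f, n, rng):
    """Simple Monte Carlo: average f over n uniform(0, 1) draws."""
    return sum(f(rng.random()) for _ in range(n)) / n

rng = random.Random(0)
estimate = mc_expectation(math.exp, 100_000, rng)
# By the law of large numbers, the estimate approaches e - 1 (about 1.718)
# as n grows.
```

The function name `mc_expectation` is just for illustration; the pattern—sample, evaluate, average—is the whole method.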
We can use something called the random darts method, a Monte Carlo simulation, to estimate pi. Here is my R code for this example.
The logic goes as follows—
If we inscribe a circle in a square, where the side length of the square equals the diameter of the circle, we can easily calculate the ratio of the circle’s area to the square’s area: pi*r^2 / (2r)^2 = pi/4.
Now if we could estimate this value, we would be able to estimate pi.
We can estimate this ratio by randomly sampling points in the square and calculating the proportion that fall inside the circle. Multiplying that proportion by 4 gives an estimate of pi.
As the number of points increases, our estimate gets closer and closer to pi.
This is a very simple example of a Monte Carlo method at work.
Here is a more useful example. We can simulate traffic using the Nagel–Schreckenberg model. In this model, we have a road made up of cells (spaces) with a speed limit, and a certain number of cars. At each time step we iterate through the cars and update each one’s velocity v according to the following four rules:
1. Acceleration: if v is below the speed limit, increase v by 1.
2. Slowing down: if the gap to the car ahead is smaller than v, reduce v to the size of the gap.
3. Randomization: with some fixed probability, reduce v by 1 (if v > 0).
4. Movement: advance the car v cells.
This model is simple, but it does a pretty good job of simulating traffic behavior. It doesn’t deal with accidents or bad drivers; its purpose is to capture those times when traffic jams just appear and vanish without any apparent reason. More sophisticated models exist, but many of them are based on this one.
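A minimal Python sketch of the four rules (the function and parameter names are my own, not from the original code; the road is treated as a loop):

```python
import random

def nasch_step(positions, velocities, road_length, vmax=5, p=0.3, rng=random):
    """One parallel update of the Nagel-Schreckenberg model on a circular road."""
    n = len(positions)
    order = sorted(range(n), key=lambda i: positions[i])
    new_pos, new_vel = positions[:], velocities[:]
    for idx, i in enumerate(order):
        ahead = order[(idx + 1) % n]  # next car around the loop
        gap = (positions[ahead] - positions[i] - 1) % road_length
        v = min(velocities[i] + 1, vmax)    # rule 1: accelerate
        v = min(v, gap)                     # rule 2: don't hit the car ahead
        if v > 0 and rng.random() < p:      # rule 3: random slowdown
            v -= 1
        new_vel[i] = v
        new_pos[i] = (positions[i] + v) % road_length  # rule 4: move
    return new_pos, new_vel

# 10 cars evenly spaced on a 100-cell loop, starting at rest
rng = random.Random(42)
pos, vel = list(range(0, 100, 10)), [0] * 10
for _ in range(50):
    pos, vel = nasch_step(pos, vel, 100, rng=rng)
```

Because rule 2 caps each car’s speed at the gap ahead, no two cars can ever occupy the same cell, which is easy to check after a run.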
View my code for getting the simulated data here, and for visualizing it in Processing here.
The first big challenge for Monte Carlo is how to come up with independent samples from whatever distribution you’re dealing with. This is harder than you might think. In my code I just called R or Python’s built-in random functions, but sampling can become much more sophisticated. That is a lot of what you will read about in more academic sources.
Here is a link on how R’s built in uniform sampling distribution works.
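One classic trick for going beyond built-in uniforms is inverse-transform sampling: if U is uniform on (0, 1) and F is a distribution’s CDF, then F^{-1}(U) follows that distribution. A small Python sketch for the exponential distribution (my own illustration, not from the post):

```python
import math
import random

def sample_exponential(rate, n, rng):
    """Inverse-transform sampling: -ln(1 - U) / rate is Exponential(rate)."""
    # 1 - rng.random() lies in (0, 1], so the log is always defined
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

rng = random.Random(7)
samples = sample_exponential(2.0, 100_000, rng)
mean = sum(samples) / len(samples)  # should be near 1 / rate = 0.5
```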
Another problem is getting the error to converge. Notice in the pi example how the error eventually stops shrinking quickly: Monte Carlo error decreases at a rate of O(1/sqrt(n)), so each extra digit of accuracy costs roughly 100 times more samples. Most applications compensate by simply using very large samples, which cheap computing makes practical.
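The slow convergence has a simple source: the estimator’s standard error. In the darts example each point lands inside the circle with probability p = pi/4, so the estimate 4*(hits/n) has standard error 4*sqrt(p*(1-p)/n). This back-of-the-envelope Python check shows the error shrinking by only a factor of 10 for every 100x more samples:

```python
import math

p = math.pi / 4  # chance a random dart lands inside the circle
# standard error of the pi estimate at several sample sizes
errors = {n: 4 * math.sqrt(p * (1 - p) / n) for n in (10**2, 10**4, 10**6)}
# n = 100 gives roughly 0.16; n = 1,000,000 gives roughly 0.0016
```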
Monte Carlo methods are an awesome topic to explore, and I hope this post popularizes them even a little bit more (outside of finance and physics, that is).
Sources
1. Wikipedia
2. Art Owen’s textbook on the subject. My favorite resource so far.
3. Kevin Murphy’s textbook.
View the original post, and others from the author here.
Comment
Alex, what I have in mind is to create synthetic numbers that have fast/efficient series expansions (like Pi) or basic recurrence formulas to generate the digits (like SQRT{n} for small n), to generate/reproduce the randomness that these famous numbers exhibit. My hope is to come up with trillions of trillions of trillions of synthetic numbers by modifying parameters associated with the nice or exploitable mathematical formulas available for famous numbers, hoping that my synthetic numbers will be as random as the famous numbers, when it comes to the distribution of digits or decimals.
Alex, It has to do with low discrepancy sets in small dimensions. See https://en.wikipedia.org/wiki/Quasi-Monte_Carlo_method
They show how to mix pseudo and quasi to generate sets that perform well in higher dimensions.
A great book on Monte Carlo methods and random number generation in general is
Random Number Generation and Monte Carlo Methods, by Gentle
http://www.amazon.com/Random-Generation-Methods-Statistics-Computin...
Enjoy! :*)
Vincent - Very cool. It seems like any of the doubts I had about it you addressed with the tests (in your article). Definitely enjoyed that article.
John - Great suggestion! I tried it; it worked. Why?
I also wrote a bit about random generators, see here. My interest is in using decimals of some irrational numbers (Pi has an easy, fast formula to generate decimals, I'm working on a formula for SQRT{2}), as well as synthetic numbers designed to simulate randomness, and period-free.
The R randtoolbox package has a nice set of quasi-random sequence generators; if you substitute, say, the "halton" function for "runif" in your code, you will see that the error converges far more quickly. For small dimensions (say, up to 5), quasi-random sequences make much better point generators in practice.
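To illustrate the same idea outside R (randtoolbox is an R package), here is a hand-rolled 2-D Halton sequence in Python applied to the pi example; at n = 10,000 its error is typically an order of magnitude smaller than with pseudo-random points:

```python
import math

def van_der_corput(i, base):
    """Radical inverse of integer i (i >= 1) in the given base."""
    q, denom = 0.0, 1
    while i > 0:
        denom *= base
        i, rem = divmod(i, base)
        q += rem / denom
    return q

def halton_pi(n):
    """Quasi-Monte Carlo pi estimate using 2-D Halton points (bases 2 and 3)."""
    # points fill the unit square; the fraction inside the quarter
    # circle approximates pi/4
    inside = sum(
        1
        for i in range(1, n + 1)
        if van_der_corput(i, 2) ** 2 + van_der_corput(i, 3) ** 2 <= 1
    )
    return 4 * inside / n

pi_qmc = halton_pi(10_000)
```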