Daily Dose Of Geekiness, Or

Simulations 101

A simulation is a statistical model used to make informed predictions. Since simulations seem to have a mystical aura these days, due to all this climate stuff, it’s in everybody’s best interest to understand that no, they aren’t magic, and yes, they’re actually quite simple.

You need four things. You need data, from which you can create a model. You need simulated input data to feed the model, which will produce simulated output data. Because the output data are simulated, you need to run the simulation model repeatedly (each run is called an iteration), and the more iterations you run, the more reliable the output data are. Because multiple iterations give you multiple sets of output data, they are interpreted statistically to produce a single output.

Sound complicated? Well, simulations can be nightmarishly complex, but the general concept is actually pretty simple.

Let’s say you are a new freshman at Some State University, and your finances are tight. You did not purchase a parking sticker at registration because you are mathematically savvy and you wanted to determine whether it would be cheaper to buy the sticker or pay parking tickets. Let’s say you know that there is a 30% chance that if you park illegally in the lot outside your classroom building, you will get a ticket (we’ll ignore how you’d get that information). A parking sticker would cost $140 per semester, and each parking ticket would set you back $25. There are 15 weeks in the semester, and after looking at your schedule, you have determined that you would have to park in seven different lots every week (that’s 105 times a semester, and each time, you have a 30% chance of being ticketed).

The 30% chance of being ticketed is the probability you have extracted from the data (again, for the purposes of this, we’ll ignore how you got it). Let me show you how simple this is.

Imagine a roulette wheel with 100 pockets, numbered 1 through 100. Get a piece of paper and a pencil, and write Ticket Y and Ticket N on it. Spin the roulette wheel, toss the ball onto it, and wait for it to land in a pocket. If the number of the pocket is 1-30, make a hash mark under Ticket Y; otherwise, mark Ticket N. Now, because you are going to do this 105 times throughout the semester, repeat this process 105 times.
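If you would rather not spin a wheel 105 times, one iteration can be sketched in a few lines of Python. This is only a minimal sketch of the pencil-and-paper procedure above; the function names `spin_wheel` and `run_iteration` are made up for illustration, and the fixed seed is an assumption added so the run can be repeated.

```python
import random

TICKET_POCKETS = 30    # pockets 1-30 count as "ticket" (the 30% chance)
PARKING_EVENTS = 105   # 7 lots a week for 15 weeks


def spin_wheel(rng):
    """Return a pocket number from 1 to 100, like one spin of the wheel."""
    return rng.randint(1, 100)


def run_iteration(rng):
    """Simulate one semester and return the (Ticket Y, Ticket N) tallies."""
    ticket_y = 0
    ticket_n = 0
    for _ in range(PARKING_EVENTS):
        if spin_wheel(rng) <= TICKET_POCKETS:
            ticket_y += 1   # hash mark under Ticket Y
        else:
            ticket_n += 1   # hash mark under Ticket N
    return ticket_y, ticket_n


rng = random.Random(42)  # assumed seed, only so the example is repeatable
ticket_y, ticket_n = run_iteration(rng)
print(f"Ticket Y: {ticket_y}, Ticket N: {ticket_n}")
```

The two tallies always add up to 105, just like the hash marks on the paper.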

You have just completed one iteration of the simulation. The more iterations you do, the more reliable your results will be, so do 99 more iterations. (By the way, do you see why these are known as Monte Carlo simulations?)

When you have finished all 100 iterations, average the Y and N hashes across all of the iterations (we do other things too, like look at the standard error and so forth, but that’s for another time). Now take the average number under Y, multiply it by $25, and compare the result to the cost of a parking sticker.
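The whole procedure, all 100 iterations plus the averaging and the comparison, might look something like this in Python. Treat it as a sketch rather than the one right way to do it: the names `tickets_in_one_semester` and `simulate` are invented here, and `random.random() < 0.30` stands in for the roulette wheel.

```python
import random
from statistics import mean

TICKET_PROBABILITY = 0.30  # chance of a ticket each time you park
PARKING_EVENTS = 105       # 7 lots a week for 15 weeks
ITERATIONS = 100           # simulated semesters
TICKET_COST = 25           # dollars per ticket
STICKER_COST = 140         # dollars per semester


def tickets_in_one_semester(rng):
    """One iteration: count the tickets across 105 parking events."""
    return sum(rng.random() < TICKET_PROBABILITY for _ in range(PARKING_EVENTS))


def simulate(iterations, seed=None):
    """Run many iterations and return the average number of tickets."""
    rng = random.Random(seed)
    counts = [tickets_in_one_semester(rng) for _ in range(iterations)]
    return mean(counts)


average_tickets = simulate(ITERATIONS, seed=42)  # seed is an assumption, for repeatability
ticket_bill = average_tickets * TICKET_COST
cheaper = "sticker" if STICKER_COST < ticket_bill else "tickets"

print(f"Average tickets per semester: {average_tickets:.1f}")
print(f"Average ticket bill: ${ticket_bill:.2f} vs. sticker: ${STICKER_COST}")
print(f"Cheaper option: {cheaper}")
```

Change the seed, or drop it entirely, and the average will wobble a little from run to run. That wobble is exactly what running more iterations shrinks.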

The spins of the roulette wheel are the simulated input data. They aren’t real, because you’re not really parking in those lots. But each spin produces a random number, and since you know that the probability of getting a ticket is 0.3, you can determine, based on the simulated data, whether you get ticketed or not. So you can create a simulation model to determine whether it will be cheaper to buy a sticker or pay the parking tickets.

Okay, sure, you can look at the probability and the rest of the data and figure out that it’s going to be cheaper to buy the sticker. But that was merely a very simple model meant only to explain exactly what a simulation is. A simulation can be as complex as we need it to be. For example, weather affects the chance of being ticketed (meter maids don’t like being out in the rain and snow any more than you do). So if the weather is bad, the probability of being ticketed decreases. Again, as long as we know the probabilities, we can easily create a simulation. Also, lots are policed more at the beginnings of semesters (to catch the new students) and in the final two weeks (studying for and taking those final exams). If you have the probabilities, you can create the simulation. Staffing is tight and lots are policed in shifts throughout the week, so the probability of being ticketed in a particular lot depends on the day of the week and the time. But again, as long as you have the data and can extract the probabilities, you can create the simulation.
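To give a feel for how one of these complications slots in, here is a rough Python sketch that adds the weather. Be warned that every weather number in it is invented for illustration (a 25% chance of bad weather, a 35% ticket chance in good weather, and 15% in bad), chosen only so that the overall chance still works out to 30%. The function names are hypothetical too.

```python
import random
from statistics import mean

# All three probabilities below are made up for this example.
P_BAD_WEATHER = 0.25          # assumed chance that the weather is bad
P_TICKET_GOOD_WEATHER = 0.35  # assumed ticket chance in good weather
P_TICKET_BAD_WEATHER = 0.15   # assumed ticket chance in bad weather

PARKING_EVENTS = 105
ITERATIONS = 100
TICKET_COST = 25
STICKER_COST = 140


def ticket_probability(bad_weather):
    """Pick the ticket probability that matches the weather."""
    return P_TICKET_BAD_WEATHER if bad_weather else P_TICKET_GOOD_WEATHER


def tickets_in_one_semester(rng):
    """One iteration: two random draws per parking event."""
    tickets = 0
    for _ in range(PARKING_EVENTS):
        bad_weather = rng.random() < P_BAD_WEATHER            # first draw: the weather
        if rng.random() < ticket_probability(bad_weather):    # second draw: the ticket
            tickets += 1
    return tickets


rng = random.Random(42)  # assumed seed, for repeatability
average_tickets = mean(tickets_in_one_semester(rng) for _ in range(ITERATIONS))
print(f"Average tickets per semester: {average_tickets:.1f}")
print(f"Average ticket bill: ${average_tickets * TICKET_COST:.2f} vs. sticker: ${STICKER_COST}")
```

The day of the week, the time of day, and the point in the semester would each go in the same way: one more lookup in `ticket_probability`, provided you have the probabilities to fill it.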

This is what we call a manual simulation, where we use a raw probability to calculate the outcome. For all but the simplest problems, it isn’t very sophisticated. But we can use software packages (the @Risk add-in for Excel, for example) that use the distribution of past data, instead of probabilities extracted from it, to create highly sophisticated simulation models.
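The general idea of drawing from the past data itself, rather than from a single probability boiled out of it, can be sketched in plain Python. To be clear, this is not how @Risk works internally; it is only a simple resampling illustration. The list `past_weekly_tickets` is entirely hypothetical, standing in for records of how many tickets a student collected in each of ten past weeks.

```python
import random
from statistics import mean

# Hypothetical past observations: tickets received in each of ten past weeks.
# These numbers are invented purely for illustration.
past_weekly_tickets = [2, 1, 3, 2, 0, 4, 2, 3, 1, 2]

WEEKS_PER_SEMESTER = 15
ITERATIONS = 100
TICKET_COST = 25


def tickets_in_one_semester(rng):
    """One iteration: draw 15 weeks, with replacement, from the past data."""
    return sum(rng.choices(past_weekly_tickets, k=WEEKS_PER_SEMESTER))


rng = random.Random(42)  # assumed seed, for repeatability
average_tickets = mean(tickets_in_one_semester(rng) for _ in range(ITERATIONS))
print(f"Average tickets per semester: {average_tickets:.1f}")
print(f"Average ticket bill: ${average_tickets * TICKET_COST:.2f}")
```

Nothing here assumes a 30% chance. Whatever pattern is in the past data, lucky weeks and unlucky ones alike, carries straight through to the simulated semesters.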

A simulation is only as good as the input data and the model. If you got, say, the probability of being ticketed wrong, your simulation output would give you an incorrect prediction. Likewise, if you set up your model wrong and got one of the calculations incorrect, you would get an incorrect prediction. Keep that in mind as you read about what this or that simulation predicts.
