Topics in Performance Evaluation – Exercise 9

Topics in Performance Evaluation

Exercise 9 – Simulation Accuracy

In this exercise we will simulate rain drops and use this to calculate pi. As we know what the answer should be, we can use this to gauge the accuracy of our simulation procedure.

Background

Consider rain falling on a square patch of land (with each side equal to one unit of length). We can record the exact place where each drop fell, and find its distance from the bottom-left corner of the square. If this distance is smaller than 1, then the drop has fallen within the quarter of the unit circle that is centered at that corner. If a lot of drops fall randomly and uniformly in the square, counting those that are in the circle and those that are out of it enables us to approximate the area of the circle. Using the formula for the area of a circle (pi r²), this allows us to approximate the value of pi.

The way to evaluate the accuracy of a simulation when the answer is not known in advance is to compute confidence intervals.

The simulation typically samples some random variable, and it is easy to compute the mean of the samples. The question is how close this is to the mean of the underlying population. By looking at the variance of the samples, we can give a probabilistic answer to this question. It goes as follows.

If a set of random variables x₁, x₂, ... x_n are independent and normally distributed, then the mean xbar is distributed according to the t distribution. This means that Pr(|xbar - m| < h) = 1-a <=> h = t_n-1,1-a/2 S_xbar Where m is the real mean, S_xbar is the standard deviation of the distribution of means, and t_n-1,1-a/2 is obtained from tables. For large samples, when n > 30, the t distribution can be approximated by the normal distribution z_1-a/2.
The variance of the distribution of means can be approximated by dividing the variance of the samples themselves by n. Therefore the standard deviation has to be divided by sqrt(n). Note, however, that this depends on the fact that the x_i are independent.
Even if the samples come from some unknown distribution the above can be used, because the mean is the sum of many samples. Therefore, by the central limit theorem, the distribution of the means is normal (assuming that the original distribution has finite variance). However, n should be large enough, e.g. n > 100.

The above is explained more fully in Jain section 13.2.

The accuracy is usually taken to be the relative size of the confidence interval, that is, the size of the confidence interval divided by the mean. In symbols, if the confidence interval is xbar +- h, the accuracy is h / xbar. For example, a result of 250 +- 2.5 has an accuracy of 1%.

Getting back to rain and pi, simulating x drops gives one sample estimate of "fraction of drops in circle" and hence pi. We want to find how many drops we need to get a specific accuracy (as the number of drops grows, the confidence interval becomes smaller, and we do this until it is small enough for the desired accuracy). For this we need n independent samples. We get them by using the batch means approach: do n samples of x drops each, and apply the above procedure. Continue to increase n until the desired accuracy is achieved. The batch means method is explained in Jain section 25.5.

Assignment

Write a simulation of rain to find pi. The simulation should calculate confidence intervals, and terminate when a specified accuracy is achieved. The confidence level is an input parameter; you only need to handle the values 90%, 95%, and 99%. The accuracy is also an input parameter; values of 1%, 5%, and 10% of the mean are common, but others are also possible.

The output of the program will include

The result: the estimate of pi, the target accuracy, and the significance (e.g., "this is within 5% of the true value with a confidence level of 95%").
A check whether this is indeed true (based on the fact that we know that pi = 3.1415927).
The length of the simulation needed to get this result, where length is measured in rain drops.

Run your program 9 times, for the 9 combinations of confidence intervals and accuracies mentioned above.

Submit

Use the Moodle system to submit a report on your work, in pdf format, with the following data.

Your names, logins, and IDs
The results of the 9 runs
Answer the question: do we need to worry about an initial transient period in this simulation?
The source code for the actual simulation and calculating confidence intervals, without the mechanics of doing different runs (should be short).

Submission deadline is Sunday, 1 June 2014, just before class, because there is no class on Tuesday, and there's another ex coming up.

Please do the exercise in pairs.

To the course home page