Experimental Methods in Computer Science – Exercise 3

Exercise 3 – Measuring Trap Overhead

Goals

Understand the factors affecting a microbenchmark
Measure the overhead of a trap and get reliable results

Background

A basic characterization of a computer system is often done using microbenchmarks: short programs designed to measure a single well-defined feature of the system. For example, the lmbench benchmark suite includes programs to measure:

memory latency and bandwidth
inter-process communication bandwidth
I/O bandwidth
signal handling overhead
process creation overhead
...and a few other things

To obtain precise results, the measurements are typically repeated multiple times and averaged. However, repetition carries its own risks. A longer measurement is more likely to be disturbed by interrupts, which adds noise. Repetition can also trigger unknown compiler or hardware optimizations that make us measure the wrong thing (for example, when trying to measure memory latency we may end up measuring cache latency instead, because repeated accesses to the same addresses hit in the cache). It is therefore necessary to design the benchmarks carefully, and to verify that they measure the right thing.

The specific measurement we will perform in this exercise is the overhead involved in trapping into the kernel and returning to user mode. We will measure this using three system calls that are expected to do very little work once in the kernel:

Closing a file descriptor that was not open, e.g. close(13).
Obtaining the process ID with getpid().
Writing one word to /dev/null. For this, first use open("/dev/null", O_WRONLY) to get a file descriptor fd, define an integer variable x, and then measure the time to perform write(fd, &x, sizeof(x)).

Assignment

The assignment has 3 parts.

Part one is to characterize gettimeofday(), which we will use to measure time. Write a short program that calls gettimeofday() repeatedly, and then look at the intervals between the values obtained. What can you say about its accuracy, precision, and resolution? Think about how to write the best program for this purpose, and repeat the measurement if you come up with new ideas.

Part two is to study the effect of the loop structure used to perform a measurement. Specifically, you should compare measurements obtained using averages of 1, 10, 100, 1000, 10000, 100000, and 1000000 repetitions of the close(13) system call. Think about how to account for the loop overhead. Consider using loop unrolling, and repeating each measurement more than once. It is crucial to take compiler optimizations into account here: use gcc -S to generate assembly code, and look at it to verify that the compiler neither optimized your measurement away nor added spurious instructions (do this just for the main measurement code, not the whole program, to make it easier to identify what is going on).

Part three is to compare the three system calls listed above. Based on your data from part two, decide on a measurement scheme (i.e. how many measurements to conduct and how to structure the loops), and then measure all three system calls.

Submit

Submit a single pdf file that contains all the following information:

Your names, logins, and IDs
A short explanation of what you did and why, organized as answers to the following questions:
1. On what machine did you run your tests? Give the machine name, CPU type, and clock rate (use hostname and see /proc/cpuinfo).
2. What were your considerations in writing the program to characterize gettimeofday()?
3. Your results pertaining to gettimeofday: what can you say about its accuracy, precision, and resolution?
4. Did you encounter any problems with compiler optimizations? What did you do to avoid them?
5. What were the results obtained for the trap overhead using different numbers of repetitions? If you used several loop structures, provide all the results in an organized manner (but don't just create lots of meaningless combinations and cause clutter).
6. What measurement scheme do you think produces the best results, and why?
7. Your results for the trap overhead as measured by the three different system calls. If these results do not agree with each other:
  1. Try to explain the discrepancy.
  2. Which one do you trust the most as an estimate of the trap overhead, and why?
Any relevant data or graphs that you want to use to illustrate or present your results. When creating graphs, remember to label the axes, use a legend, etc. as needed.
The program used to characterize gettimeofday() and the program you used to measure the three system calls (include the code listing at the end of the report; don't submit the code itself).

The submission deadline is Monday morning, 7/3/11, so that I can give feedback in class on Tuesday.

Please do the exercise in pairs. Remember, this is to allow you to collaborate and discuss how to get the best solution. Benchmarking is tricky, so thinking together can be a big help.
