Experimental Methods in Computer Science – Exercise 4

Experimental Methods in Computer Science

Exercise 4 – Modeling Memory Copy Performance

Goals

Background

Modern microprocessors have a multi-level memory hierarchy. Usually there are two levels of caches between the processor and the main memory. Data is moved between the levels according to usage: When a datum is first used, it is inserted into the caches. When space is needed to serve other data, older data is evicted.

To make things even more interesting, not all the memory is directly accessible to the processor. The processor works with virtual addresses, and these have to be translated to physical addresses using the page table. In order to work faster, the most useful part of the page table is cached in another, special cache, called the TLB. Thus only memory that is mapped using the TLB is directly accessible; memory whose mapping does not appear in the TLB will first suffer a TLB miss, and the access can only be tried after the TLB is loaded with the required translation.

Assignment

Measure the time needed to copy an area of memory using memcpy as a function of the size of the area copied. As usual, you probably need to make multiple copy operations, and find the average time needed. And there may be various problems that cause you to measure something else instead of what you intended.

The major problem is that if you just copy the same area multiple times, the first time will take much more time, because the copied memory has to be brought into the cache. And subsequent copies will be quicker, because it is already in the cache. But then you are not measuring the time to copy in memory, but the time to copy in the cache. So you need to think about how to avoid cache effects, or alternatively, how to characterize and measure the cache effects correctly.

Then try to understand behavior of the results. Look at the results both in terms of time needed to copy a certain area, and as the memory bandwidth achieved (bandwidth is bytes transfered per unit time). Try to find a formula that gives the dependence of time or bandwidth on the size of the area copied.

As in all other exercises, there's some freedom here. If you find yourself focusing on one aspect of the above that's fine. The main thing is to convince me that you tackled the problem and figured out what is going on. In other words, it is more important to understand and explain your own measurements than to do the measurements I would have done.

Submit

Submit a single pdf file that contains all the following information:

  1. Your names, logins, and IDs
  2. Specifics of where you ran the measurements: what machine, and sizes of caches and memory as far as you can find them.
  3. Your measurement results, in a graphical format. Think about how to best show what you got. Don't forget to label the axes etc.
  4. A short explanation of what the results show. If you see different bandwidths for different data sizes try to explain what is going on. In particular, does this relate to the machine's memory hierarchy?
  5. Your formula for time or bandwidth achieved as a function of the size of memory copied. State whether this is universal or only applicable to certain situations, and if so, under what conditions it is applicable.
  6. The program you wrote to perform the measurements.
Submission deadline is Monday morning, 4/5/09, so I can give feedback in class on Tuesday.

Please do the exercise in pairs.

To the course home page