Modern microprocessors have a multi-level memory hierarchy. Usually there are two or even three levels of caches between the processor and the main memory. Data is moved between the levels according to usage: When a datum is first used, it is inserted into the caches. When space is needed to serve other data, older data is evicted.
In order to benefit from spatial locality, data is moved between memory levels in chunks called cache lines. Each cache line contains a certain number of words (or bytes). Bringing in a cache line is an expensive operation, but once it's there, you can access its elements for a very low cost. For example, accessing only one element from each cache line takes approximately the same time as accessing all of them.
The caches themselves have different sizes: the L1 cache is the smallest and fastest; the L2 cache is bigger, but takes more time to access. Caches may also differ in their levels of associativity.
To make things even more interesting, not all the memory is directly accessible to the processor. The processor works with virtual addresses, and these have to be translated to physical addresses using the page table. To make translation faster, the most useful part of the page table is cached in another, special cache, called the TLB. Thus only memory whose mapping appears in the TLB is directly accessible; accessing memory whose mapping does not appear there will first suffer a TLB miss, and the access can only proceed after the TLB has been loaded with the required translation. Luckily, the TLB should not be relevant in this exercise.
In this exercise we want to write a microbenchmark that provides data regarding the size of the cache line and the sizes of the different caches.
First, measure the time needed to access the elements of a large array ("large" means really large, so that you can be sure it doesn't fit in the caches, e.g. 64MB). Measure the total time when accessing every cell, every other cell, every fourth cell, and so on, up to every 1024th cell (all strides are powers of two). Draw a graph of the results. Are the results as expected? What can you learn from this about the cache line size?
Now measure the time per access when traversing arrays of different sizes. Start with a size of 1KB, and go up to 64MB in powers of two. You do not need to touch all the elements of each array -- it is enough to touch one from each cache line, based on the previous measurement. Draw a graph of the results. Are the results as expected? What can you learn from this about the cache sizes?
When you design your measurements, think about the following:
Submit a single pdf file that contains all the following information:
Please do the exercise in pairs.