Experimental Methods in Computer Science – Exercise 11

Experimental Methods in Computer Science

Exercise 11 – Complexity of Sorting

Goals

Verify that the complexity of sorting is indeed O(n log n)
...or not
See how this may depend on inputs

Background

One of the most often repeated results in algorithm analysis is that the complexity of sorting is O(n log n). But this is based on an abstract model of computation, e.g. a random-access machine, or even on only counting comparison operations and ignoring all else. In this exercise we'll try to see what happens in reality.

Assignment

In general, you'll just measure the time to sort various inputs.

First, you need to select a sorting program. You can use one supplied in a library (e.g. qsort from libc), or even the Unix sort utility. Alternatively, you can write your own. This has the advantage that you can explore the effect of various details, e.g. how the pivot is selected in quicksort, but it is not a requirement.
Measure the time to sort n numbers, for different n. Use n random numbers as the input. Start from small ns and go up to very large ns, selecting additional values along the way. Part of your assignment is to set the limit; try to explore as much as possible, because after all the O(n log n) claim is an asymptotic result. You are allowed to place a reasonable limit on the runtime (e.g. the total time for all the measurements done in this exercise need not be more than a few hours).
Make this a solid measurement: take multiple measurements for each n (i.e. with different random seeds), look for outliers, and calculate means and standard deviations. Or better yet, calculate confidence intervals. Note that in presenting the results, you need to be very specific about what exactly you did.
Note that one of the problems you may encounter is cache effects. Can you make reliable measurements that characterize a cold-cache situation as opposed to a warm-cache situation? Which do you actually want to measure?
Check the effect of different inputs, and specifically the degree to which the input is pre-sorted. At a minimum, compare random input, sorted input, and reverse-sorted input. If you want you can also try to check partially sorted input; but then you need to specify exactly how you generate partially sorted input, and how you quantify the degree to which it is sorted.

Submit

Submit a single pdf file that contains all the following information:

Your names, logins, and IDs.
Specification of what sorting program you used. If you wrote your own, attach the source code.
Results of the experiments, detailing the effect of n and of the input. You should provide the raw results (possibly in an appendix), and an analysis that portrays your results, and specifically, reaches some conclusion as to whether the complexity is O(n log n). This should probably include a graph; think about how to make your point most clearly. In particular, you should be able to differentiate between O(n), O(n log n), O(n²), etc.

Submission deadline is Monday morning, 23/5/11, so I can give feedback in class on Tuesday.

Please do the exercise in pairs.

To the course home page