Experimental Methods in Computer Science
Exercise 11 – Complexity of Sorting
Goals
- Verify that the complexity of sorting is indeed O(n log n)
- ...or not
- See how this may depend on inputs
Background
One of the most often repeated results in algorithm analysis is that the
complexity of sorting is O(n log n).
But this is based on an abstract model of computation, e.g. a random-access
machine, or even on only counting comparison operations and ignoring all else.
In this exercise we'll try to see what happens in reality.
Assignment
In general, you'll just measure the time to sort various inputs.
- First, you need to select a sorting program.
You can use one supplied in a library (e.g. qsort from libc), or even the Unix
sort utility.
Alternatively, you can write your own.
This has the advantage that you can explore the effect of various details,
e.g. how the pivot is selected in quicksort, but it is not a requirement.
- Measure the time to sort n numbers, for different n.
Use n random numbers as the input.
Start from small ns and go up to very large ns, selecting
additional values along the way.
Part of your assignment is to set the limit; try to explore as much as
possible, because after all the O(n log n) claim is an asymptotic result.
You are allowed to place a reasonable limit on the runtime
(e.g. the total time for all the measurements done in this exercise need not
be more than a few hours).
Make this a solid measurement:
take multiple measurements for each n (i.e. with different random
seeds), look for outliers, and calculate means and standard deviations.
Or better yet, calculate confidence intervals.
Note that in presenting the results, you need to be very specific about what
exactly you did.
Note that one of the problems you may encounter is cache effects.
Can you make reliable measurements that characterize a cold-cache situation as
opposed to a warm-cache situation?
Which do you actually want to measure?
- Check the effect of different inputs, and specifically the degree to which
the input is pre-sorted.
At a minimum, compare random input, sorted input, and reverse-sorted input.
If you want you can also try to check partially sorted input; but then you need
to specify exactly how you generate partially sorted input, and how you
quantify the degree to which it is sorted.
Submit
Submit
a single pdf file that contains all the following information:
- Your names, logins, and IDs.
- Specification of what sorting program you used.
If you wrote your own, attach the source code.
- Results of the experiments, detailing the effect of n and of the
input.
You should provide the raw results (possibly in an appendix), and an analysis
that portrays your results, and specifically, reaches some conclusion as to
whether the complexity is O(n log n).
This should probably include a graph; think about how to make your point most
clearly.
In particular, you should be able to differentiate between O(n), O(n log n),
O(n2), etc.
Submission deadline is
Monday morning, 23/5/11,
so I can give feedback in class on Tuesday.
Please do the exercise in pairs.
To the course home page