Experimental Methods in Computer Science – Exercise 2

Exercise 2 – How to Display Data

Goals

Assignment

In Exercise 1 we only practiced the mechanics of creating a graph. Here the main point is the data: what to show and how to show it.

Think about how to present the following data set. The data comes from an experiment that compares the performance of four schedulers (called "EASY", "maui", "flex", and "cons"). Several metrics are used, including average response time (in seconds) and bounded slowdown (slowdown is the ratio of a job's response time to its actual running time, so it is a unitless number). The performance data is given month by month, using workload data from the KTH SP2 parallel supercomputer from September 1996 to August 1997.
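To make the slowdown metric concrete, here is a minimal Python sketch. The plain `slowdown` function follows the definition above. The exercise does not say what "bounded" means, so `bounded_slowdown` is an assumption: it uses one commonly seen form, in which the running time is clamped from below by a threshold and the result is never allowed to drop under 1. The function names and the 10-second threshold are illustrative only.

```python
def slowdown(response_time: float, run_time: float) -> float:
    """Ratio of response time to actual running time (unitless)."""
    return response_time / run_time


def bounded_slowdown(response_time: float, run_time: float,
                     threshold: float = 10.0) -> float:
    """Slowdown with the running time clamped from below by `threshold`.

    ASSUMPTION: the exercise does not define "bounded"; this is one
    commonly used form, and the 10-second threshold is illustrative.
    """
    return max(response_time / max(run_time, threshold), 1.0)


# A job that waited 90 s and then ran for 10 s has a response time of 100 s.
print(slowdown(100.0, 10.0))          # 10.0
# A 1 s job that waited 99 s: plain slowdown explodes, bounded does not.
print(slowdown(100.0, 1.0))           # 100.0
print(bounded_slowdown(100.0, 1.0))   # 10.0
```

The point of the bound is that very short jobs would otherwise dominate an average, which is worth keeping in mind when deciding how to scale an axis.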

The data is given in the form of 4 files, one for each scheduler: EASY, maui, flex, and cons. Each file contains a line for each month, with the following fields:

Obviously this is a lot of data and therefore hard to digest, so some sort of graphical representation should be used. Your job is to decide what graphical representation to use, and then to actually create the graph. You can also use a combination of several graphs if you wish, but consider this carefully, as too many graphs may create clutter and obscure the focus you are trying to achieve.

In making the decision, you should focus on what question you want the data to answer; choose the representation that seems best for this need. Hint: we are comparing four schedulers. We are using different metrics, and considering different months separately, in order to obtain different perspectives on this comparison. Regrettably, the results are inconclusive: in certain months, and using certain metrics, one scheduler may be better than another, but the opposite may happen with other metrics or in other months. This is what makes the data messy, and why making it understandable is a challenge. Try to find a representation that (1) shows the important differences in performance, and (2) helps reveal whether these differences are correlated with other factors.
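Before drawing anything, it can help to reshape the numbers so that the comparison itself is what gets plotted. The sketch below is one possible preprocessing step, not a prescribed solution: it expresses each scheduler's monthly value relative to a baseline scheduler, and counts how many months each scheduler "wins". It assumes a metric where lower is better (as with response time and slowdown). The function names and the toy numbers are made up for illustration and are not taken from the exercise data.

```python
def relative_to_baseline(values, baseline="EASY"):
    """Divide each scheduler's monthly values by the baseline's values.

    `values` maps scheduler name -> list of one metric, one entry per month.
    Returns the same shape; 1.0 means "same as baseline" for that month.
    """
    base = values[baseline]
    return {
        name: [v / b for v, b in zip(series, base)]
        for name, series in values.items()
    }


def wins_per_scheduler(values):
    """Count the months in which each scheduler has the lowest value."""
    wins = {name: 0 for name in values}
    n_months = len(next(iter(values.values())))
    for m in range(n_months):
        best = min(values, key=lambda name: values[name][m])
        wins[best] += 1
    return wins


# Placeholder numbers for three months; NOT taken from the exercise data.
toy = {
    "EASY": [100.0, 200.0, 400.0],
    "maui": [50.0, 300.0, 400.0],
    "flex": [80.0, 100.0, 500.0],
    "cons": [120.0, 250.0, 300.0],
}
print(relative_to_baseline(toy)["maui"])  # [0.5, 1.5, 1.0]
print(wins_per_scheduler(toy))
```

Plotting ratios instead of raw values puts all months on a common scale, so a month with an unusually heavy load does not visually swamp the others. Whether that is the right choice depends on the question you want the graph to answer.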

Another important decision is what data to use. You do not necessarily need to present all the available data. If some of the data items repeat each other and do not provide anything new or insightful, they need not be shown (you will probably only find this out after showing them; don't be afraid to remove things you have already done). Some of the data may be just boring or meaningless. And some of the data may be suspect for some reason, and should be deleted altogether.
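One rough way to check whether two data items "repeat each other" is to compute their correlation across the months. The sketch below flags pairs of columns whose Pearson correlation is close to +1 or -1. The 0.95 cutoff, the function names, and the column names are arbitrary illustrative choices, and the toy numbers are not from the exercise data. A high correlation is only a hint that one of the two columns may be dropped; looking at the plot is still the real test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.95):
    """Return pairs of column names with |correlation| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                pairs.append((a, b))
    return pairs


# Placeholder columns (hypothetical names); NOT taken from the exercise data.
toy_columns = {
    "metric_a": [1.0, 2.0, 3.0, 4.0],
    "metric_b": [2.0, 4.0, 6.0, 8.0],   # exactly 2 * metric_a
    "metric_c": [4.0, 1.0, 3.0, 2.0],
}
print(redundant_pairs(toy_columns))  # [('metric_a', 'metric_b')]
```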

Note that you are not expected to understand the details of the scheduling and the metrics. Thus any reasonable decisions you make will be accepted. The focus is on creating clear graphics that allow the major features of the data to be viewed.

Submit

Submit a report on your work as a PDF file. Do not send me a Microsoft Word (.doc) file. The report should include:

  1. Your names, logins, and IDs
  2. A short explanation of what you did, including
    1. What was the goal? What did you want to see?
    2. What graphical representation did you choose? How does it support the above goal?
  3. The output plot(s). If you have multiple plots, try to compose them together.

The submission deadline is Monday, 30/3/09, so that I can give feedback in class on Tuesday.

Please do the exercise in pairs.
