Experimental Methods in Computer Science – Exercise 1

Exercise 1 – Simple Display of Data

Goals

Background

Experiments often produce many numbers. Thus it is very important to choose exactly what data to show, and to choose the best way to show it. This typically means using a graphical display.

The simplest tool for many people is Excel, and you may use it. Be warned, however, that Excel has poor defaults for many things, and excuses like "this is how Excel did it" will not be accepted. Typical problems are the use of line plots where X-Y plots are needed (a line plot spaces the X values evenly as categories instead of placing them by numeric value) and limited support for logarithmic axes. The default color scheme is another, although it was improved in the latest version.

It may be worth your time to learn to use some other graphics package. Commonly used free packages include gnuplot and ploticus. R is a full statistical analysis environment with good graphics capabilities. Or you could just use MATLAB.

In this exercise and in all future exercises strong emphasis will be placed on graphical excellence. This means you should take care of the following:

  1. Scales should be appropriate to show the data clearly, without misleading the reader. If relevant, consider using logarithmic scales or axis breaks.
  2. The axes should be labeled, with the units included in square brackets. The only exception regarding units is when you are plotting a pure number, e.g. a count.
  3. The available space should be used efficiently (that is, avoid situations where the graph occupies only a small part of the plotting area, unless you have a good reason related to the story that the graph is trying to convey).
  4. Colors and shapes should be used intelligently to make connections as appropriate. Avoid situations where a line is plotted in light yellow on a white background, or where lines are hard to distinguish from each other.
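
As a rough illustration of these four points, here is a minimal sketch in Python with matplotlib. The choice of tool, the variable names, the axis quantities, and all the numbers are made up for this example; it is one possible way to apply the guidelines, not a required approach.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen so the script also runs without a display
import matplotlib.pyplot as plt

# Hypothetical placeholder values, invented purely for illustration.
input_sizes = [64, 512, 4096, 32768, 262144]           # bytes
run_times = {
    "method A": [0.8, 1.1, 3.5, 21.0, 160.0],          # milliseconds
    "method B": [1.5, 1.7, 2.9, 12.0, 85.0],           # milliseconds
}
# Distinct dark colors and different markers keep the series apart (point 4).
styles = {"method A": ("tab:blue", "o"), "method B": ("tab:red", "s")}


def make_figure():
    fig, ax = plt.subplots(figsize=(6, 4))
    for name, values in run_times.items():
        color, marker = styles[name]
        ax.plot(input_sizes, values, color=color, marker=marker, label=name)
    # The values span several orders of magnitude, so log scales are used (point 1).
    ax.set_xscale("log")
    ax.set_yscale("log")
    # Axis labels with units in square brackets (point 2).
    ax.set_xlabel("Input size [bytes]")
    ax.set_ylabel("Run time [ms]")
    ax.legend(loc="upper left")
    # Trim unused margins so the data fills the plotting area (point 3).
    fig.tight_layout()
    return fig, ax


fig, ax = make_figure()
```

Calling fig.savefig with a file name ending in .pdf would then write the figure out as a PDF file.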

Assignment

The data is a pair of measurements of how network bandwidth depends on the size of the messages that are being transmitted. Two such measurements have been performed, and the results are available as dataset 1 and dataset 2. The format is simply two columns: in each row, the first number is the message size in bytes, and the second is the achieved bandwidth in Mb/s.
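
If you work in a scripting language, reading this format takes only a few lines. The sketch below uses Python and assumes that the two columns are separated by whitespace and that message sizes are written as integers; the function name and the sample values are hypothetical and are not taken from the actual data sets.

```python
from typing import List, Tuple


def parse_measurements(text: str) -> List[Tuple[int, float]]:
    """Parse rows of 'message_size_bytes bandwidth_mbps' into a list of pairs."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and anything that is not a two-column row
        size_bytes = int(fields[0])        # first column: message size in bytes
        bandwidth_mbps = float(fields[1])  # second column: bandwidth in Mb/s
        rows.append((size_bytes, bandwidth_mbps))
    return rows


# Made-up sample in the described format, for illustration only.
sample = """\
1024 12.5
2048 23.0
"""
measurements = parse_measurements(sample)
print(measurements)
```

To read a real file, pass the file's contents (for example, the result of open(path).read()) to the function instead of the sample string.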

Your task is simply to show both these data sets together. Think about what graphical form to use. Don't forget to label the axes, provide a legend or annotations, etc.

Submit

Use Moodle to submit a report on your work, in PDF. Note that I request PDF; do not send me a Microsoft Word (.doc) file. The report should include:

  1. Your names, logins, and IDs
  2. Your rationale for drawing the graph as you did
  3. The resulting graph(s)

The submission deadline is Friday, 14/3/14.
