Experimental Methods in Computer Science – Exercise 1

Experimental Methods in Computer Science

Exercise 1 – Data Exploration and Simple Graphs

Goals

Background

Experiments often produce many numbers. The big question is what these numbers mean. Thus a basic activity is exploratory data analysis: looking at the data in many different ways and trying to figure out what the data tells us. The main technique for data exploration is creating graphical displays.

Once we (think we) know what the data has to say, we need to display it in a way that will enable others to see the light too. Thus it is very important to choose exactly what data to show, and to choose the best way to show it. Again this typically means using a graphical display.

There are many options for graphs:

Moreover, the above can be used repeatedly, combined with each other, etc.

In addition, there is the technical issue of preparing graphs. As you will need to plot graphs for this and future exercises, you should select and learn to use some plotting tool. Options include

Assignment

In this exercise we do the first step, namely explore the data. Given the following data about the Titanic disaster, try to decide what is interesting about it.

The data is given in the linked file as a simple text table. This shows the breakdown of passengers into 1st class, 2nd class, 3rd class, and crew on one hand, and into men, women, and children on the other. For each combination, two numbers are given: the total number (e.g. there were 93 women in 2nd class) and how many of them survived the disaster (in this case, 80).

Your task is to look at this data set. This means you should probably create several simple graphs that portray the data in different ways. A good idea may be to limit each graph to one particular aspect of the data. Try to be comprehensive and look at all aspects of the data, because initially you don't know what will turn out to be interesting. Think about distributions (how many in each class), absolute numbers (how many survived), relative numbers (what fraction survived), correlations (were there more women in higher classes?), etc. If some of the graphs turn out to be boring, that's perfectly OK. At this stage we are using the graphs to learn about the data, and not all the things we check will turn out to be important or interesting.
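As a rough illustration of the "distributions" and "relative numbers" views mentioned above, here is a minimal Python sketch. It assumes the table has already been read into a dictionary keyed by (class, group). The names `Counts`, `survival_fraction`, `totals_by_class`, and `text_bar` are hypothetical helpers invented for this sketch, not part of the exercise. Only the 2nd-class women entry (93 total, 80 survived) is taken from the text; all other entries must come from the linked file.

```python
from typing import Dict, Tuple

# (passenger class, group) -> (total, survived); this layout is an assumption
Counts = Dict[Tuple[str, str], Tuple[int, int]]


def survival_fraction(counts: Counts) -> Dict[Tuple[str, str], float]:
    """Relative numbers: fraction of each (class, group) cell that survived."""
    return {
        key: survived / total
        for key, (total, survived) in counts.items()
        if total > 0  # skip empty cells to avoid dividing by zero
    }


def totals_by_class(counts: Counts) -> Dict[str, int]:
    """Distribution: number of people in each class, summed over groups."""
    result: Dict[str, int] = {}
    for (cls, _group), (total, _survived) in counts.items():
        result[cls] = result.get(cls, 0) + total
    return result


def text_bar(label: str, fraction: float, width: int = 40) -> str:
    """Crude text bar for a fraction in [0, 1], for a quick look at the data."""
    filled = round(fraction * width)
    return f"{label:<12} {'#' * filled}{'.' * (width - filled)} {fraction:.0%}"


# Only this entry is quoted in the exercise text; fill in the rest from the file.
counts: Counts = {("2nd", "women"): (93, 80)}

fractions = survival_fraction(counts)
for (cls, group), frac in fractions.items():
    print(text_bar(f"{cls} {group}", frac))
```

The text bars are only a stand-in for a quick look. For the report, the same dictionaries can be passed to whichever plotting tool you chose, with the axes labeled.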

Based on your exploration, you will hopefully reach some conclusions about what is in fact interesting in the data. This can be used as a guideline for preparing a “final” graphical display that shows all the important things in a concise and elegant way. The main output of this exercise is a set of such guidelines, explaining what the final graphic should show. It does not include the final graphic itself, or even ideas about how to show the data.

In your graphs, even if they are only for personal use, don't forget to label the axes, provide a legend or annotations, etc.

Submit

Use Moodle to submit a report on your work as a PDF file. Do not send me a Microsoft Word (.doc or .docx) file. The report should include:

  1. Your names, logins, and IDs
  2. A short explanation of what you did, including the graph(s). This is expected to be a list of ideas of the form "we wanted to see X", each followed by a simple graph showing X.
  3. Your findings: suggested guidelines about what is interesting in the data and what should indeed be shown.

Submission deadline is Monday, 21/2/11, so I can give feedback in class on Tuesday.

Please do the exercise in pairs. Use this to collaborate and discuss how to get the best solution, not to divide the work between you ("I do this course, you do that course").
