Experimental Methods in Computer Science – Exercise 9

Exercise 9 – Mapping the Internet with DIMES

Goals

Background

The Internet's topology can be studied at several levels of granularity. The two main ones are the router level and the AS level, where AS stands for "autonomous system": the networks, each run by a single organization, that make up the Internet and provide its infrastructure.

The DIMES project collects data about the Internet's topology using nearly 20,000 user agents, each of which maps what it sees from its own vantage point. This data is transmitted to the center in Tel Aviv, where it is combined to produce a listing of all the observed connections between ASes. Recently, the project has also added data about the connections between cities and countries. This is the data we'll use.

Assignment

  1. Get data.

    We'll use the "CityEdges" files, which have been collected since 2007. They are available for download from the DIMES public data repository. To save you trouble, we have downloaded three sample files, from April 2007, April 2008, and February 2009. These are available at ~exp/www/CitiEdges*. Each file contains one line per edge between two cities, with the following comma-separated fields:

    1. Source City
    2. Destination City
    3. Source Country
    4. Destination Country
    5. Source Latitude
    6. Source Longitude
    7. Destination Latitude
    8. Destination Longitude
    9. Date Of Discovery
    10. Date Of Validation
    11. Number Of IPs

    As usual, start by familiarizing yourself with the data, and checking whether it looks OK or there is something suspicious about it.
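
    A first look at the data might start from something like the sketch below. It assumes that the files have no header line and that the 11 fields appear in exactly the order listed above; the field names, the sanity checks, and the sample lines (including their date format) are made up for illustration and are not taken from the real files.

```python
import csv
import io
from collections import namedtuple

# Field names are our own labels for the 11 columns listed above.
Edge = namedtuple("Edge", [
    "src_city", "dst_city", "src_country", "dst_country",
    "src_lat", "src_lon", "dst_lat", "dst_lon",
    "discovered", "validated", "num_ips",
])

def read_edges(lines):
    """Yield (edge, problems) for each comma-separated line."""
    for row in csv.reader(lines):
        if len(row) != len(Edge._fields):
            yield None, ["expected 11 fields, got %d" % len(row)]
            continue
        edge = Edge(*(field.strip() for field in row))
        problems = []
        try:
            lats = [float(edge.src_lat), float(edge.dst_lat)]
            lons = [float(edge.src_lon), float(edge.dst_lon)]
            if any(abs(v) > 90 for v in lats) or any(abs(v) > 180 for v in lons):
                problems.append("coordinate out of range")
        except ValueError:
            problems.append("non-numeric coordinate")
        if not edge.src_country or not edge.dst_country:
            problems.append("missing country")
        yield edge, problems

# Made-up sample lines (hypothetical values and date format).
sample = io.StringIO(
    "Haifa,Tel Aviv,Israel,Israel,32.8,35.0,32.1,34.8,2007-04-01,2007-04-02,3\n"
    "Tel Aviv,London,Israel,United Kingdom,32.1,34.8,51.5,-0.1,2007-04-01,2007-04-03,5\n"
    "Nowhere,London,,United Kingdom,132.1,34.8,51.5,-0.1,2007-04-01,2007-04-03,1\n"
    "broken,line\n"
)
results = list(read_edges(sample))
for edge, problems in results:
    print(problems or "ok", edge)
```

    To run this on a real file, pass an open file object to read_edges instead of the in-memory sample. Counting how often each kind of problem occurs is one quick way to decide whether something in the data looks suspicious.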

  2. Analyze.

    Our goal is to characterize the connectivity of different countries, and how it changes with time. To do this, write a short script to extract the following data:

    1. The number of internal links within each country, i.e. links between cities within that same country.
    2. The number of external links from each country, i.e. links from a city in this country to a city in another country.
    3. The number of distinct neighboring countries for each country, i.e. with how many different countries is it connected.
    4. The dominant city in each country, i.e. the one with the most links.

    You are invited to look for other interesting things as well. We are just beginning to research this data, and are wide open to suggestions...
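
    One possible shape for such a script is sketched below. It is not a reference solution. It assumes that each input line counts as one link, that a link between two countries counts as an external link for both of them, and that a city's link count includes links where it appears at either end; you may well decide on different definitions. The function name and the sample edges are hypothetical.

```python
from collections import Counter, defaultdict

def country_stats(edges):
    """edges: iterable of (src_city, dst_city, src_country, dst_country)."""
    internal = Counter()
    external = Counter()
    neighbors = defaultdict(set)
    city_links = defaultdict(Counter)   # country -> city -> link count
    for src_city, dst_city, src_country, dst_country in edges:
        if src_country == dst_country:
            internal[src_country] += 1
        else:
            # Assumption: an inter-country link counts for both endpoints.
            external[src_country] += 1
            external[dst_country] += 1
            neighbors[src_country].add(dst_country)
            neighbors[dst_country].add(src_country)
        # Assumption: a link counts toward the cities at both of its ends.
        city_links[src_country][src_city] += 1
        city_links[dst_country][dst_city] += 1
    stats = {}
    for country, cities in city_links.items():
        stats[country] = {
            "internal": internal[country],
            "external": external[country],
            "neighbors": len(neighbors[country]),
            "dominant_city": cities.most_common(1)[0][0],
        }
    return stats

# Made-up edges for illustration only.
edges = [
    ("Haifa", "Tel Aviv", "Israel", "Israel"),
    ("Tel Aviv", "Jerusalem", "Israel", "Israel"),
    ("Tel Aviv", "London", "Israel", "United Kingdom"),
    ("London", "Paris", "United Kingdom", "France"),
]
stats = country_stats(edges)
for country in sorted(stats):
    print(country, stats[country])
```

    Note that if the same pair of cities appears in more than one line (for example once in each direction), this sketch counts it more than once. Whether that is what you want is part of checking the data.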

  3. Compare.

    Run your script on the three suggested data files (or more if you wish, but then you need to select them carefully; use Scott's slides showing the variability of the data in different months as a guideline). Use the results to find the countries with the highest figures and the biggest changes over time in each of the three metrics (internal links, external links, and neighbors). Also look specifically at "interesting" countries, including the US, Russia, China, India, Brazil, and two more of your own choice. For the set of countries you identify, also consider the dominant city: is it a major city you know about? If not, is it near one of the country's major cities (use the geographical coordinates)? Does the dominant city change with time?
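
    Two small helpers may be useful for this step; both are sketches under stated assumptions rather than required parts of the solution. The first ranks countries by the absolute change in one metric between two snapshots, treating a country missing from a snapshot as having a count of 0. The second uses the haversine formula, with the Earth approximated as a sphere of radius 6371 km, to estimate how far a dominant city is from a known major city. The function names, the sample counts, and the approximate coordinates are all hypothetical.

```python
import math

def biggest_changes(before, after, top=3):
    """Rank countries by absolute change in one metric between two snapshots."""
    countries = set(before) | set(after)
    # Assumption: a country absent from a snapshot has a count of 0.
    deltas = {c: after.get(c, 0) - before.get(c, 0) for c in countries}
    return sorted(deltas.items(), key=lambda kv: (-abs(kv[1]), kv[0]))[:top]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Made-up external-link counts for two snapshots.
links_2007 = {"A": 10, "B": 5, "C": 7}
links_2008 = {"A": 12, "B": 20, "D": 4}
changes = biggest_changes(links_2007, links_2008)
print(changes)

# Approximate coordinates of Tel Aviv and Jerusalem, for illustration.
distance = haversine_km(32.08, 34.78, 31.77, 35.21)
print("%.0f km" % distance)
```

    Ranking by absolute change favors countries that already have many links. Ranking by relative change instead would highlight small countries that grew quickly, so it may be worth looking at both.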

Submit

Submit a single PDF file that contains all of the following information:

  1. Your names, logins, and IDs.
  2. The results of the analysis as described above.
  3. Any relevant output plots you may have generated.
  4. The programs/scripts you used to analyze the data.

Submission deadline is Monday morning, 8/6/09.

Please do the exercise in pairs.
