The Internet's topology can be studied at several levels of granularity. The two main ones are the router level and the AS level, where AS stands for "autonomous system": the organizations that make up the Internet and provide its infrastructure.
The DIMES project collects data about the Internet's topology using nearly 20,000 user agents, each of which maps what it sees from its vantage point. This data is transmitted to the center at Tel-Aviv and combined to produce a listing of all the observed connections between ASes. Recently, the project has also added data about the connections between cities and countries. This is the data we'll use.
Assignment
We'll use the "CityEdges" files, which have been collected since 2007. They are available for download from the DIMES Public data repository. To save you trouble, we have downloaded three sample files, from April 2007, April 2008, and February 2009. These are available at ~exp/www/CitiEdges*. Each file contains one line per edge between two cities, with the following comma-separated fields:
Our goal is to characterize the connectivity of different countries, and how it changes with time. To do this, write a short script to extract the following data:
Run your script on the three suggested data files (or more if you wish, but then you need to select them carefully; use Scott's slides showing the variability of the data in different months as a guideline). Use the results to find the countries with the highest figures and the biggest changes over time in each of the three metrics (internal links, external links, and neighbors). Also look specifically at "interesting" countries, including the US, Russia, China, India, Brazil, and two more of your own choice. For the set of countries you identify, also consider the dominant city: is it a major city you know about? If not, is it near one of the country's major cities (use the geographical coordinates)? Does the dominant city change with time?
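As a starting point, here is a minimal sketch of how the three metrics and the "is it near a major city" check might be computed. It is not a reference solution. It assumes that each edge has already been parsed into a pair of country names (the function `country_metrics` and this input shape are hypothetical; how you get there depends on the field layout of the files), that an internal link is an edge whose two cities are in the same country, that an external link is an edge whose cities are in different countries, and that a neighbor is a distinct country at the other end of an external link. Distances use the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math
from collections import defaultdict


def country_metrics(edges):
    """Tally per-country metrics from (country_a, country_b) pairs.

    Assumption: one pair per city-to-city edge, already extracted from a
    data file. Repeated edges are counted as many times as they appear.
    Returns {country: (internal_links, external_links, neighbor_count)}.
    """
    internal = defaultdict(int)
    external = defaultdict(int)
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            # Both endpoint cities are in the same country.
            internal[a] += 1
        else:
            # The edge crosses a border: it counts for both countries.
            external[a] += 1
            external[b] += 1
            neighbors[a].add(b)
            neighbors[b].add(a)
    countries = set(internal) | set(external)
    return {c: (internal[c], external[c], len(neighbors[c])) for c in countries}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


if __name__ == "__main__":
    # Made-up edges for illustration only, not real DIMES data.
    sample = [("US", "US"), ("US", "CA"), ("US", "MX"), ("CA", "US")]
    for country, counts in sorted(country_metrics(sample).items()):
        print(country, counts)
```

Running the same tally on each file and comparing the resulting dictionaries gives the change over time. A small haversine distance between the dominant city and a known major city suggests the two belong to the same metropolitan area.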
Submit
Submit a single PDF file that contains all of the following information:
Please do the exercise in pairs.