This log contains eleven months' worth of accounting records from the 512-node IBM SP2 located at the Cornell Theory Center (CTC). Apparently, only 338 nodes were used for batch jobs in the log. Scheduling on this machine was performed by EASY and LoadLeveler. For more information about CTC, see http://www.tc.cornell.edu/.

The workload log from the CTC SP2 was graciously provided by Dan Dwyer (dwyer@tc.cornell.edu) from the Cornell Theory Center, a high-performance computing center at Cornell University, Ithaca, New York, USA. The information below was provided by Steve Hotovy. If you use this log in your work, please use a similar acknowledgment. Also, please send a notice of your work to cal@tc.cornell.edu.

In addition to the production log from July 1996 to May 1997, an early log covering 75,944 jobs from June 1995 to April 1996 is also available. This is the log used by Hotovy in his analysis of the evolution of the workload soon after the machine was installed ([hotovy96]). During this period only LoadLeveler was used.
Downloads:
  File CTC-SP2-1996-3.swf
  File CTC-SP2-1996-3.1-cln.swf (cleaned)
  File CTC-SP2-1995-2.swf (early log)
System Environment

Of the 512 nodes in the system, 430 are dedicated to running batch jobs (but see the usage notes below). The remaining nodes are used for interactive jobs, I/O nodes, special projects, and system testing. The log pertains to the batch partition.

The CTC SP2 is heterogeneous in the sense that not all 512 nodes are identical. The actual configurations of the 430 nodes in the batch partition are as follows:
Update (3 June 2013): The link given above for data about the system is no longer available, but a snapshot from 1997 is available on the Internet Archive. In particular, this includes a page specifying the details of the SP system's configuration. This indicates that the system was divided into several distinct pools that were scheduled in different ways. Specifically, pool 4 was scheduled by EASY-LL, and included 21 racks of 16 thin nodes each, plus 27 nodes from additional racks. Given that 16x21=336, this may be the actual partition and size that gave rise to this log. This also matches the usage data shown below. If the nodes from the other racks are included, the size is 363, but then the typical usage level is only 0.93. (Thanks to Dan Tsafrir for digging this up.)
This file contains one line per completed job, with the following whitespace-separated fields:
The difference between conversion 3 (reflected in CTC-SP2-1996-3.swf) and conversion 2 (CTC-SP2-1996-2.swf) is only in the assumed size of the machine: in conversion 3 it is set to 338.
The differences between conversion 2 (reflected in CTC-SP2-1996-2.swf) and conversion 1 (CTC-SP2-1996-1.swf) are:
The converted early log is available as CTC-SP2-1995-2.swf. The conversion from the original format to SWF was done subject to the following.
The original log contains a flurry of activity by one user which may not be representative of normal usage. This has been removed in the cleaned version of the log, and it is recommended that this version be used. The cleaned log is available as CTC-SP2-1996-3.1-cln.swf.
A flurry is a burst of very high activity by a single user. In this case, it involved 2080 jobs. The filter used to remove it was

  user=135 and job>47420 and job<50308

Note that the filter was applied to the original log, and unfiltered jobs remain untouched. As a result, job numbering in the cleaned log is not consecutive.
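For those working with the converted log, the following is a minimal Python sketch of an equivalent filter applied to the SWF version. It assumes the standard SWF layout, in which the job number is field 1 and the user ID is field 12, and that header comment lines start with ';'; the file names are merely illustrative.

    # Minimal sketch of removing the flurry from the SWF version of the
    # log (assumes standard SWF layout: job number in field 1, user ID
    # in field 12; header comment lines start with ';').
    def remove_flurry(in_path, out_path, user=135, lo=47420, hi=50308):
        with open(in_path) as src, open(out_path, "w") as dst:
            for line in src:
                fields = line.split()
                if line.startswith(";") or not fields:
                    dst.write(line)        # keep header comments and blanks
                    continue
                job, uid = int(fields[0]), int(fields[11])
                # Drop jobs matching: user=135 and job>47420 and job<50308
                if uid == user and lo < job < hi:
                    continue
                dst.write(line)            # unfiltered jobs remain untouched

    remove_flurry("CTC-SP2-1996-3.swf", "CTC-SP2-1996-3.1-cln.swf")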
Further information on flurries and the justification for removing them can be found in:
This is the utilization graph when assuming 430 nodes, showing that the utilization has a pronounced upper limit of 0.78, and implying that the actual partition size is smaller.
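For reference, this kind of utilization figure can be recomputed from the log itself. Below is a minimal Python sketch, assuming the standard SWF layout (submit time, wait time, run time, and allocated processors in fields 2 through 5): it counts node-seconds consumed over the span of the log for a given assumed machine size.

    # Minimal sketch: overall utilization of an SWF log for an assumed
    # machine size (standard SWF layout assumed: submit time, wait time,
    # run time, and allocated processors in fields 2-5).
    def utilization(path, machine_size):
        used = 0.0                          # node-seconds consumed
        start, end = float("inf"), 0.0
        with open(path) as f:
            for line in f:
                fields = line.split()
                if line.startswith(";") or not fields:
                    continue                # skip header comments and blanks
                submit, wait, run, procs = (float(fields[1]), float(fields[2]),
                                            float(fields[3]), float(fields[4]))
                if run <= 0 or procs <= 0:  # skip cancelled/invalid jobs
                    continue
                begin = submit + wait
                used += run * procs
                start = min(start, begin)
                end = max(end, begin + run)
        return used / ((end - start) * machine_size)

    # For example, compare the assumed sizes discussed above:
    # utilization("CTC-SP2-1996-3.swf", 430) vs. utilization("CTC-SP2-1996-3.swf", 338)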