This log contains some three years' worth of accounting records from the Sandia Ross cluster. This is phase III of the CPlant project. It was installed in 2000 and comprised 48 scalable units from Compaq, each with 32 nodes. Note, however, that this size was probably reduced later. This implies that the load in different parts of the log may be quite different, making it unsuitable for bulk usage in simulations (see usage notes). The workload log from the Sandia Ross cluster was graciously provided by Jon Stearley (jrstear@sandia.gov). If you use this log in your work, please use a similar acknowledgment.
Downloads:
System Environment
The Sandia CPlant project was a relatively early large-scale cluster intended to replace an MPP. The log available here is from phase III of the project. Initially this was composed of 48 cabinets. Each cabinet had 32 Compaq DS10L servers, for a total of 1536 servers. Of these, 1524 were used to run parallel jobs. However, it seems that this number was later reduced. Each node had a 466 MHz 21264 (EV6) Alpha microprocessor and 256 MB ECC SDRAM. Each cabinet also had a service node used for management. The nodes were connected by a Myrinet gigabit network. Each cabinet also had an Ethernet. System software included a parallel job launcher called yod, a compute-node daemon process called PCT on each node, and a system-wide compute node allocator called bebopd, which works with PBS.
However, using an SWF parser in conjunction with a general converter module the following anomalies were observed:
The original log contains quite a few flurries of activity by three
users, which may not be representative of normal usage.
These have been removed in the cleaned version of the log, and it is
recommended that this version be used.
The cleaned log is available as Sandia-Ross-2001-1.1-cln.swf
A flurry is a burst of very high activity by a single user. The filters used to remove the three flurries that were identified are
user=38 and (job>5843 and job<9472) or (job>34042 and job<36017) (2593 jobs)
user=84 and (job>10178 and job<24056) or (job>25398 and job<29185) (10600 jobs)
user=175 and job>50166 and job<70468 (14280 jobs)
In total, 27473 jobs were removed. Note that the filters were applied to the original log, and unfiltered jobs remain untouched. As a result, in the cleaned log job numbering is not consecutive.
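The filters above can be sketched in Python as follows. This is a minimal illustration, not the script used to produce the cleaned log. It assumes the standard SWF layout (whitespace-separated fields, job number in field 1, user ID in field 12, header and comment lines starting with ";"), and it assumes that each user condition applies to both of that user's job ranges. The names FLURRIES, is_flurry_job and clean_swf_lines are hypothetical.

```python
# Flurry filters: user ID -> list of (low, high) job-number ranges.
# Bounds are exclusive, matching the "job>low and job<high" filters above.
# Assumption: the user condition applies to every range listed for that user.
FLURRIES = {
    38: [(5843, 9472), (34042, 36017)],
    84: [(10178, 24056), (25398, 29185)],
    175: [(50166, 70468)],
}


def is_flurry_job(job, user):
    """Return True if this job falls inside one of the user's flurry ranges."""
    return any(low < job < high for low, high in FLURRIES.get(user, []))


def clean_swf_lines(lines):
    """Yield the lines of an SWF log with flurry jobs dropped.

    Header/comment lines (starting with ';') and blank lines pass through
    unchanged. Assumes SWF field order: field 1 = job number, field 12 = user ID.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            yield line
            continue
        fields = stripped.split()
        job, user = int(fields[0]), int(fields[11])
        if not is_flurry_job(job, user):
            yield line
```

Applied to the lines of the original log, clean_swf_lines leaves surviving jobs untouched, so job numbers in the output are not consecutive, as noted above.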
Further information on flurries and the justification for removing them can be found in:
File Sandia-Ross-2001-1.1-cln.swf