This log contains two years' worth of accounting records for the 416-node Intel Paragon located at the San Diego Supercomputer Center (SDSC). For more information about SDSC, see http://www.sdsc.edu/. The original logs and some more information (including a log of downtime) are also available directly from SDSC. For historical reasons, the log is divided into two parts (one per year).
These extensive logs contain information about the number of nodes; the submit, start, and end times; the CPU time used; the NQS queues used and their limits; and the user. There is no information about the application being run.
The workload logs from the SDSC Paragon were graciously provided by Reagan Moore (moore@sdsc.edu) and Allen Downey (downey@sdsc.edu), who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.
You can also reference the following: Downloads:
The 416 nodes were divided into the following partitions:
Nodes | Partition |
48 | Interactive partition |
352 | Compute partition (for parallel jobs via NQS) |
6 | Service partition (logins etc.) |
10 | I/O partition (parallel file system) |
Batch jobs were handled by NQS, which was configured with a large number of queues. Queue names are parsed as follows:
q [f] nodes len
The optional `f' indicates the use of fat nodes, with 32MB of memory. 256 of the nodes in the compute partition are configured this way. The other nodes have 16MB of memory. The len code gives the time limit:
s | Short: 1 hour |
m | Medium: 4 hours |
l | Long: 12 hours |
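The naming scheme above can be sketched as a small parser. This is only an illustration: the concrete spelling assumed here (a literal `q`, an optional `f`, a decimal node count, and one length letter, with no separators, as in the hypothetical names `q64l` and `qf32m`) is an assumption, and the names `Queue` and `parse_queue` are not part of the log.

```python
import re
from typing import NamedTuple

# Time limits in hours for the length codes s, m, and l.
LIMIT_HOURS = {"s": 1, "m": 4, "l": 12}

# ASSUMPTION: names are written without separators, e.g. "q64l" or "qf32m".
QUEUE_RE = re.compile(r"^q(f?)(\d+)([sml])$")


class Queue(NamedTuple):
    fat: bool          # True if the queue uses fat (32MB) nodes
    nodes: int         # node count encoded in the queue name
    limit_hours: int   # runtime limit implied by the length code
    memory_mb: int     # memory per node: 32MB for fat nodes, else 16MB


def parse_queue(name: str) -> Queue:
    """Split a queue name into its components; raise ValueError if malformed."""
    m = QUEUE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized queue name: {name!r}")
    fat = m.group(1) == "f"
    return Queue(fat, int(m.group(2)), LIMIT_HOURS[m.group(3)], 32 if fat else 16)
```

For example, `parse_queue("qf32m")` describes a 32-node queue on fat nodes with a 4-hour limit. Queue names in the actual logs should be checked against this pattern before relying on it.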
The scheduling algorithm used on this system is described in detail in
the following paper:
Michael Wan, Reagan Moore, George Kremenek, and Ken Steube,
"A Batch Scheduler for the Intel
Paragon with a Non-Contiguous Node Allocation Algorithm".
In Job Scheduling Strategies for Parallel Processing,
Dror G. Feitelson, and Larry Rudolph (Eds.), Springer-Verlag,
pp. 48-64, 1996, Lect. Notes Comput. Sci. vol. 1162.
The differences between conversion 3 (reflected in SDSC-Par-1995-3.swf and SDSC-Par-1996-3.swf) and conversion 2 (SDSC-Par-1995-2.swf and SDSC-Par-1996-2.swf) are
The first anomaly is a set of 16 jobs that is executed every day at around 3:45 AM. These are probably automatic jobs that perform some system administration function. They were removed using the following filter:
(user=2 or user=5) and (submit_hour=3 or submit_hour=4)
In 1995, 6604 jobs were thus removed. In 1996, 5301 were removed.
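The filter above can be expressed as a simple predicate. The following is a minimal sketch, not the procedure actually used to produce the cleaned logs. It assumes that submit times are given in seconds relative to the start of the log; the parameter `start_second_of_day` (the local time of day, in seconds, at which the log starts) is hypothetical, and daylight saving time is ignored.

```python
def submit_hour(submit: int, start_second_of_day: int = 0) -> int:
    """Hour of day (0-23) of a submit time given in seconds since log start.

    ASSUMPTION: start_second_of_day is the local time of day, in seconds,
    at which the log begins; daylight saving shifts are not handled.
    """
    return ((start_second_of_day + submit) // 3600) % 24


def is_daily_system_job(user: int, hour: int) -> bool:
    """(user=2 or user=5) and (submit_hour=3 or submit_hour=4)"""
    return user in (2, 5) and hour in (3, 4)
```

A job submitted by user 2 at 3:45 AM matches the predicate and would be dropped, while the same user's jobs at other hours are kept.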
The second anomaly is flurries of very high activity by individual users. Four flurries of different magnitudes occurred in 1995. The filters used to remove them were
user=61 and job>17495 and job<25398 (2005 jobs)
user=62 and job>17441 and job<26321 (2139 jobs)
user=66 and job>55419 and job<69428 (8678 jobs)
user=92 and job>69856 and job<76815 (3476 jobs)
In addition, a single flurry occurred in 1996. This was removed by the following filter
user=23 and job>3035 and job<4677 (1283 jobs)
Note that the filters were applied to the original logs, and unfiltered jobs remain untouched. As a result, in the filtered logs job numbering is not consecutive.
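As an illustration, the flurry filters can be applied to a log in standard workload format (SWF), where each job is one whitespace-separated line, field 1 is the job number, field 12 is the user ID, and lines starting with `;` are header comments. This is a sketch under those assumptions; the names `in_flurry` and `clean` are hypothetical, and the job numbers must be those of the original, unfiltered numbering.

```python
from typing import Iterable, Iterator, List, Tuple

# (user, lower job bound, upper job bound); both bounds are exclusive,
# matching the "job>lo and job<hi" form of the filters.
Flurry = Tuple[int, int, int]

FLURRIES_1995: List[Flurry] = [
    (61, 17495, 25398),
    (62, 17441, 26321),
    (66, 55419, 69428),
    (92, 69856, 76815),
]
FLURRIES_1996: List[Flurry] = [(23, 3035, 4677)]


def in_flurry(user: int, job: int, flurries: List[Flurry]) -> bool:
    """True if the job belongs to one of the listed flurries."""
    return any(user == u and lo < job < hi for u, lo, hi in flurries)


def clean(lines: Iterable[str], flurries: List[Flurry]) -> Iterator[str]:
    """Yield SWF lines unchanged, skipping jobs that match a flurry filter."""
    for line in lines:
        # Pass blank lines and ';' header comments through untouched.
        if not line.strip() or line.lstrip().startswith(";"):
            yield line
            continue
        fields = line.split()
        job, user = int(fields[0]), int(fields[11])
        if not in_flurry(user, job, flurries):
            yield line
```

Because surviving lines are passed through unchanged, job numbers are not renumbered, which is consistent with the non-consecutive numbering in the filtered logs.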
Further information on flurries and the justification for removing them can be found in:
File SDSC-Par-1995-3.1-cln.swf
File SDSC-Par-1996-3.swf
File SDSC-Par-1996-3.1-cln.swf