This log contains nearly 4 months' worth of accounting records for the 256-node Cray T3D located at the Lawrence Livermore National Lab (LLNL). For more information about this installation, see URL http://www.llnl.gov/sccd/.
This log is unique in that the scheduler supported a coarse-grained version of gang scheduling, whereby jobs could be preempted and swapped out to make room for other jobs. This activity is known as rolling out and rolling in. The log contains information about each separate roll of each job. Note, however, that there are some inconsistencies in the log, where a roll of one job (that does not terminate) is subsequently followed by a roll of another job with the same process ID. This affects 1456 jobs, which is 6.4% of the jobs in the log.
The log is available in two formats: the original format, with information about each roll, and a condensed format, where all rolls of each job have been summed up. The log contains information about the start time, resource usage, user, and job for each execution slot of each parallel job.
The workload log from the LLNL Cray T3D was graciously provided by Moe Jette (jette AT llnl.gov), who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.
Jobs are submitted either interactively or via NQS. The following queues and resource usage limits were used:
Queue | Time limit | Processor limit |
interactive | 2 hours | 32 procs |
pe32 | 4 hours | 32 procs |
pe64 | 4 hours | 64 procs |
pe128 | 4 hours | 128 procs |
pe256 | 4 hours | 256 procs |
pe64_long | 40 hours | 64 procs |
pe128_short | 15 minutes | 128 procs |
pe256_short | 15 minutes | 256 procs |
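When analyzing the log, it can be handy to have the queue limits listed above available as data, for example to check whether a recorded job is consistent with the queue it claims to come from. The following is a minimal sketch in Python; the names `QueueLimit`, `QUEUE_LIMITS`, `fits_queue`, and `eligible_queues` are hypothetical helpers introduced here for illustration, and the code does not reflect how NQS itself enforced the limits.

```python
from typing import List, NamedTuple


class QueueLimit(NamedTuple):
    max_hours: float  # time limit, in hours
    max_procs: int    # processor limit


# Limits transcribed from the queue list above (15 minutes = 0.25 hours).
QUEUE_LIMITS = {
    "interactive": QueueLimit(2, 32),
    "pe32": QueueLimit(4, 32),
    "pe64": QueueLimit(4, 64),
    "pe128": QueueLimit(4, 128),
    "pe256": QueueLimit(4, 256),
    "pe64_long": QueueLimit(40, 64),
    "pe128_short": QueueLimit(0.25, 128),
    "pe256_short": QueueLimit(0.25, 256),
}


def fits_queue(queue: str, hours: float, procs: int) -> bool:
    """Return True if a job of this size stays within the queue's limits."""
    limit = QUEUE_LIMITS[queue]
    return hours <= limit.max_hours and procs <= limit.max_procs


def eligible_queues(hours: float, procs: int) -> List[str]:
    """List every queue whose limits would accommodate the job."""
    return [name for name in QUEUE_LIMITS if fits_queue(name, hours, procs)]
```

For instance, `eligible_queues(10, 64)` returns only `pe64_long`, since that is the only queue with a time limit above 4 hours.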
The gang scheduler is responsible for dispatching the interactive jobs and those selected for execution by NQS, if their combined resource needs exceed those physically available. This is done by issuing instructions to the underlying operating system to roll out currently running jobs, and roll in other jobs in their place.
More information about the gang scheduler is available on-line at URL
http://www.llnl.gov/sccd/lc/gang/
And in the following paper:
Dror G. Feitelson and Morris A. Jette,
Improved utilization and responsiveness with gang scheduling,
In Job Scheduling Strategies for Parallel Processing,
D. G. Feitelson and L. Rudolph (Eds.), Springer-Verlag, 1997,
Lect. Notes Comput. Sci. vol. 1291, pp. 238-261.
This file contains one line per roll (execution slot) of each job, with the following white-space separated fields:
R | Rolled out |
K | Killed (includes time limit exceeded) |
- | Ran to completion (exited normally) |
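The three status codes listed above can be decoded with a small lookup table. The sketch below is illustrative only: the helper names are hypothetical, and treating `K` and `-` as marking the last roll of a job is an assumption drawn from the code descriptions, not something the log format guarantees (recall the inconsistencies noted earlier).

```python
from collections import Counter
from typing import Iterable

# Meanings transcribed from the status code list above.
STATUS_MEANING = {
    "R": "rolled out",
    "K": "killed (includes time limit exceeded)",
    "-": "ran to completion (exited normally)",
}


def describe_status(code: str) -> str:
    """Translate a status code into its description."""
    return STATUS_MEANING.get(code, f"unknown status {code!r}")


def is_final_roll(code: str) -> bool:
    """Assumption: a roll ending in K or - is the job's last roll,
    whereas R means the job was swapped out and may be rolled in again."""
    return code in ("K", "-")


def count_statuses(codes: Iterable[str]) -> Counter:
    """Tally how often each status code occurs in a sequence of rolls."""
    return Counter(codes)
```

A job that was rolled out twice before finishing would show the sequence `R`, `R`, `-`, and `count_statuses` on that sequence reports two `R` entries and one `-` entry.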
The differences between conversion 2 (reflected in LLNL-T3D-1996-2.swf) and conversion 1 (LLNL-T3D-1996-1.swf) are
Graphs of utilization are not given, because we do not maintain correct data about individual rolls. As a result, we do not know how many processors were actually in use at each time instant.