Parallel Workloads Archive: LLNL Cray T3D

The Lawrence Livermore National Lab (LLNL) T3D log

System:	256-node Cray T3D
Duration:	June through September 1996
Jobs:	22,779 (represented by 40,591 rolls)

This log contains nearly 4 months' worth of accounting records for the 256-node Cray T3D located at the Lawrence Livermore National Lab (LLNL). For more information about this installation, see URL http://www.llnl.gov/sccd/.

This log is unique in that the scheduler used supported a coarse-grained version of gang scheduling, whereby jobs could be preempted and swapped out to make room for other jobs. This activity is known as “rolling out” and “rolling in”. The log contains information about each separate “roll” of each job. Note, however, that the log contains some inconsistencies, in which a roll of one job that does not terminate is followed by a roll of another job with the same process ID. This happens 1456 times, affecting 6.4% of the jobs in the log.

The log is available in two formats: the original format with information about each roll, and a condensed format where all rolls of each job have been summed up.

The log contains information about the start time, resource usage, user, and job, for each execution slot of each parallel job.

The workload log from the LLNL Cray T3D was graciously provided by Moe Jette (jette AT llnl.gov), who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.

Downloads:

LLNL-T3D-1996-0	0.6 MB gz	original log with full data
LLNL-T3D-1996-0j2	0.2 MB gz	original log with job summaries
LLNL-T3D-1996-2.swf	0.2 MB gz	converted log
LLNL-T3D-1996-1.swf	0.2 MB gz	OLD VERSION of converted log (replaced 30 Nov 2011)


Papers Using this Log:

This log was used in the following papers:
[feitelson97a] [feitelson98b] [talby99b] [kavas01] [bender05] [talby07] [iosup08]

System Environment

The Cray T3D has a 3-dimensional torus topology. There are also 4 synchronization circuits structured as a tree. Each of the 128 nodes has two DEC Alpha 21064 processors, each with 64MB of memory. The minimal allocation appears to be one node (2 processors). Nodes can only be allocated in certain groups, which are always powers of two. Rolling out jobs (swapping) is backed by a 48GB file system.

Jobs are submitted either interactively or via NQS. The following queues and resource usage limits were used:

interactive	2 hours	32 procs
pe32	4 hours	32 procs
pe64	4 hours	64 procs
pe128	4 hours	128 procs
pe256	4 hours	256 procs
pe64_long	40 hours	64 procs
pe128_short	15 minutes	128 procs
pe256_short	15 minutes	256 procs
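The queue limits above can be written down as a small lookup table. The following is a minimal sketch, not part of the archive's tooling; the helper name `admissible_queues` is hypothetical, and the check only compares a request against the listed time and processor limits.

```python
# Queue limits transcribed from the list above:
# queue name -> (maximum runtime in seconds, maximum number of processors).
QUEUE_LIMITS = {
    "interactive": (2 * 3600, 32),
    "pe32": (4 * 3600, 32),
    "pe64": (4 * 3600, 64),
    "pe128": (4 * 3600, 128),
    "pe256": (4 * 3600, 256),
    "pe64_long": (40 * 3600, 64),
    "pe128_short": (15 * 60, 128),
    "pe256_short": (15 * 60, 256),
}


def admissible_queues(procs, runtime_seconds):
    """Return the queues whose limits cover the request (hypothetical helper)."""
    return [
        name
        for name, (max_seconds, max_procs) in QUEUE_LIMITS.items()
        if procs <= max_procs and runtime_seconds <= max_seconds
    ]
```

For example, a 64-processor request of 10 hours fits only `pe64_long` under these limits.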

The gang scheduler is responsible for dispatching the interactive jobs and those selected for execution by NQS, if their combined resource needs exceed those physically available. This is done by issuing instructions to the underlying operating system to roll out currently running jobs, and roll in other jobs in their place.

More information about the gang scheduler is available on-line at URL http://www.llnl.gov/sccd/lc/gang/ and in the following paper:
Dror G. Feitelson and Morris A. Jette, “Improved utilization and responsiveness with gang scheduling”, In Job Scheduling Strategies for Parallel Processing, D. G. Feitelson and L. Rudolph (Eds.), Springer-Verlag, 1997, Lect. Notes Comput. Sci. vol. 1291, pp. 238-261.

Original Log Format

The original log is available as LLNL-T3D-1996-0.

This file contains one line per roll (execution slot) of each job, with the following white-space separated fields:

Start date (date execution actually begins, after job initiation or roll in). Year is not given (it is 1996).
Start time (time execution actually begins, after job initiation or roll in).
Process ID (consistent across all rolls).
Partition ID (essentially, a session ID on the machine, can change with roll).
User name (sanitized to preserve anonymity).
Number of processors.
Execution time in seconds (for this roll/time slot). To compute the total execution time for a job, sum this field over all the entries with the same process ID.
Resource use (processor count times execution time).
Exit code -- one of the following:
R	Rolled out
K	Killed (includes time limit exceeded)
-	Ran to completion (exited normally)
Application name (also sanitized).
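The field list above might be read with a parser along the following lines. This is a sketch under stated assumptions: each of the ten fields is taken to be a single whitespace-free token, the date and time are kept as opaque strings because their exact layout is not specified here, and the names `RollRecord`, `parse_roll_line`, and `total_exec_seconds` are illustrative only.

```python
from collections import defaultdict, namedtuple

# Field names are illustrative labels for the ten fields listed above.
RollRecord = namedtuple(
    "RollRecord",
    "start_date start_time pid partition_id user procs "
    "exec_seconds resource_use exit_code app",
)

VALID_EXIT_CODES = {"R", "K", "-"}


def parse_roll_line(line):
    """Parse one roll record (assumes exactly ten whitespace-separated tokens)."""
    parts = line.split()
    if len(parts) != 10:
        raise ValueError(f"expected 10 fields, got {len(parts)}")
    date, time, pid, partition, user, procs, secs, resource, code, app = parts
    if code not in VALID_EXIT_CODES:
        raise ValueError(f"unexpected exit code {code!r}")
    # Date, time, and IDs stay as strings; numeric fields are converted.
    return RollRecord(date, time, pid, partition, user,
                      int(procs), float(secs), float(resource), code, app)


def total_exec_seconds(records):
    """Sum execution time over all rolls sharing a process ID."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec.pid] += rec.exec_seconds
    return dict(totals)
```

Because process IDs are sometimes reused inconsistently in this log, a plain sum per process ID can merge rolls of different jobs; it should be treated as a first approximation.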

Jobs Log Format

The jobs summary log is available as LLNL-T3D-1996-0j2. This file contains one line per job, with the following white-space separated fields:

Start date of first roll (without the year, which is 1996).
Start time of first roll.
Process ID.
User name (sanitized to preserve anonymity).
Number of processors.
Execution time in seconds, summed over all rolls of this job.
Number of rolls for this job.
Exit code: K or - as above, or S to indicate strange behavior.
Application name (also sanitized).

Strange behavior means that the same process ID was reused inconsistently, so there was an R record with certain parameters, but the next record had different parameters (i.e. different user, processors, or application). When this happened it was assumed that the first job ended in some strange way, and this is the job with the S status. The new record was taken to indicate the start of another job that happened to have the same process ID.
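The condensing rule described above can be sketched as follows. This is not the archive's actual script: the `Roll` tuple and `condense_rolls` function are hypothetical, the rolls are assumed to arrive in log order, and jobs still open when the input ends are marked S here as an additional assumption.

```python
from collections import namedtuple

# Minimal roll record holding only the fields the condensing step needs.
Roll = namedtuple(
    "Roll", "start_date start_time pid user procs exec_seconds exit_code app"
)


def condense_rolls(rolls):
    """Merge per-roll records into per-job summaries, in order of job end."""
    open_jobs = {}  # process ID -> summary of the job currently using that ID
    finished = []

    def close(pid, status):
        job = open_jobs.pop(pid)
        job["status"] = status
        finished.append(job)

    for r in rolls:
        job = open_jobs.get(r.pid)
        # Same process ID but different user, size, or application:
        # the earlier job is taken to have ended strangely (status S).
        if job is not None and (job["user"], job["procs"], job["app"]) != (
            r.user, r.procs, r.app
        ):
            close(r.pid, "S")
            job = None
        if job is None:
            job = {
                "start_date": r.start_date, "start_time": r.start_time,
                "pid": r.pid, "user": r.user, "procs": r.procs, "app": r.app,
                "exec_seconds": 0.0, "rolls": 0,
            }
            open_jobs[r.pid] = job
        job["exec_seconds"] += r.exec_seconds
        job["rolls"] += 1
        if r.exit_code in ("K", "-"):  # terminating roll
            close(r.pid, r.exit_code)

    # Assumption: jobs whose last roll is R at the end of the input get S.
    for pid in list(open_jobs):
        close(pid, "S")
    return finished
```

A reuse of a process ID by a job with the same user, processor count, and application cannot be detected by this rule, so such rolls would be merged into one job.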

Conversion Notes

The converted log is available as LLNL-T3D-1996-2.swf. The conversion from the jobs summary format to SWF was essentially straightforward, and was done subject to the following.

As submit times are not given, start times were used and wait times were given as -1.
The conversion loses the following data, which is not represented in the SWF file:
- The division of each job into rolls. (Actually this can be done using status codes 2, 3, and 4, but was not done.)
The following anomalies were identified in the conversion:
- As noted above, in 1456 jobs the last roll did not have a termination status. These jobs are assumed to have failed.

The conversion was done by a log-specific parser in conjunction with a more general converter module.

The differences between conversion 2 (reflected in LLNL-T3D-1996-2.swf) and conversion 1 (LLNL-T3D-1996-1.swf) are:

In the original conversion the "strange" jobs were deleted. In the new one they are retained. There are 1456 such jobs.
In the original conversion wait times were listed as 0, and run times as -1. The recorded runtimes were then listed in the requested runtime field. In the new conversion a more common approach is taken, where wait times are -1, runtimes are as given, and requested runtimes are -1.
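To illustrate how the conventions of conversion 2 show up when reading the file, here is a minimal sketch of an SWF reader. It assumes the standard SWF layout of 18 whitespace-separated numeric fields per job with header and comment lines starting with `;`. The field labels, `parse_swf`, and `start_time` are illustrative names, not part of any official tool.

```python
# Labels for the 18 SWF fields, in file order (names are illustrative).
SWF_FIELDS = (
    "job_number", "submit_time", "wait_time", "run_time", "allocated_procs",
    "avg_cpu_time", "used_memory", "requested_procs", "requested_time",
    "requested_memory", "status", "user_id", "group_id", "executable_id",
    "queue_id", "partition_id", "preceding_job", "think_time",
)


def parse_swf(lines):
    """Parse SWF job lines into dicts, skipping blank and ';' comment lines."""
    jobs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        values = line.split()
        if len(values) != len(SWF_FIELDS):
            raise ValueError(f"expected {len(SWF_FIELDS)} fields, got {len(values)}")
        jobs.append(dict(zip(SWF_FIELDS, (float(v) for v in values))))
    return jobs


def start_time(job):
    """Start time of a job; a wait time of -1 means 'unknown'.

    In this log the submit field already holds the start time and the wait
    time is -1, so the unknown wait is treated as zero rather than added.
    """
    wait = job["wait_time"]
    return job["submit_time"] + (wait if wait >= 0 else 0.0)
```

With a file object, `parse_swf(open(path))` would work the same way, since the function only iterates over lines. Code that computes waiting-time statistics should skip this log, or filter on `wait_time >= 0`, because no real wait times are available.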

Usage Notes

There are no known problems with this log.

The Log in Graphics

LLNL-T3D-1996-2.swf

[Graphs: weekly cycle; daily cycle; burstiness and active users; job size and runtime histograms; job size vs. runtime scatterplot]

Graphs for utilization are not given, because we do not maintain correct data about individual rolls. As a result, we do not know how many processors were actually in use at each time instant.
