This log contains two years' worth of accounting records produced by the DJM software running on the 1024-node CM-5 at Los Alamos National Lab (LANL). For more information about LANL, see URL http://www.lanl.gov/. The log contains detailed information about resource requests and use, including memory. It also contains data on the user, executable, and project, as well as submit, start, and end times. Jobs on the CM-5 use a power-of-two number of nodes, according to a fixed partitioning. Gang scheduling is used, especially on smaller partitions, but jobs can also run in dedicated mode. Because gang scheduling is used, runtime information may be inaccurate; see the usage notes.

The log is available in two formats. One is the set of original daily log files created by DJM (the job management software on the CM-5), which also include details on the operation of DJM itself and on various special cases (such as re-running an application after a failure, or forcing it to run immediately). The other is a condensed form with one line per job, containing only the conventional timing and resource usage information.

The workload log from the LANL CM-5 was graciously provided by Curt Canada, who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.
Downloads:
The original DJM log files contain multi-line entries for each event that took place. Examples of events include job submittal, job start, and job termination. The format is largely self-explanatory.
These files were parsed by a special Perl script to produce the condensed format.
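As an illustration of working with the condensed format, the fixed power-of-two partitioning can be checked directly against the one-line-per-job file. The following is a minimal sketch, not part of the archive's tooling; it assumes the condensed file follows the Standard Workload Format (SWF), with comment lines starting with ';' and the allocated node count in field 5, where -1 marks an unknown value. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def is_power_of_two(n: int) -> bool:
    # A positive integer is a power of two iff it has exactly one set bit.
    return n > 0 and n & (n - 1) == 0


def partition_sizes(lines: Iterable[str]) -> Counter:
    """Count jobs by allocated node count (assumed to be SWF field 5).

    Comment lines (starting with ';'), blank lines, and jobs whose
    allocation is unknown (-1) are skipped.
    """
    sizes: Counter = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            continue
        nodes = int(stripped.split()[4])  # field 5 is index 4
        if nodes > 0:
            sizes[nodes] += 1
    return sizes
```

For example, `odd = [n for n in partition_sizes(open("LANL-CM5-1994-4.swf")) if not is_power_of_two(n)]` lists any allocation sizes that do not fit the partitioning; an empty list is what the description above would lead one to expect.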
The differences between conversion 4 (reflected in LANL-CM5-1994-4.swf) and conversion 3 (LANL-CM5-1994-3.swf) are
The original log contains several flurries of very high activity by individual users, which may not be representative of normal usage. These were removed in the cleaned version, and it is recommended that this version be used. The cleaned log is available as LANL-CM5-1994-4.1-cln.swf.
A flurry is a burst of very high activity by a single user. The filters used to remove the three flurries that were identified are:
user=50 and job>24438 and job<64543 (33452 jobs)
user=31 and job>64586 and job<115041 (34307 jobs)
user=38 and job>178584 and job<192711 (11568 jobs)
In total, 79327 jobs were removed. Note that the filters were applied to the original log, and unfiltered jobs remain untouched. As a result, job numbering in the filtered log is not consecutive.
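The three flurry filters can be expressed as a simple per-line predicate over the condensed log. This is a minimal sketch under stated assumptions, not the script actually used to produce LANL-CM5-1994-4.1-cln.swf: it assumes the input is an SWF file in which field 1 is the job number and field 12 is the user ID, and it passes comment lines through unchanged (so header comments are not updated). The names `FLURRIES`, `is_flurry_job`, and `clean_swf` are hypothetical.

```python
from typing import Iterable, Iterator

# Flurry filters as (user ID, lower job bound, upper job bound).
# Both bounds are exclusive, matching "job>X and job<Y".
FLURRIES = [
    (50, 24438, 64543),
    (31, 64586, 115041),
    (38, 178584, 192711),
]


def is_flurry_job(job: int, user: int) -> bool:
    """Return True if this job/user pair matches one of the flurry filters."""
    return any(user == u and lo < job < hi for u, lo, hi in FLURRIES)


def clean_swf(lines: Iterable[str]) -> Iterator[str]:
    """Yield SWF lines, dropping jobs that match a flurry filter.

    Comment lines (starting with ';') and blank lines are passed through.
    Job numbers are left untouched, so the output numbering has gaps.
    """
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            yield line
            continue
        fields = stripped.split()
        job, user = int(fields[0]), int(fields[11])  # SWF fields 1 and 12
        if not is_flurry_job(job, user):
            yield line
```

Usage, assuming the conversion-4 file is in the working directory: `with open("LANL-CM5-1994-4.swf") as src, open("cleaned.swf", "w") as dst: dst.writelines(clean_swf(src))`. Counting the dropped lines should reproduce the per-filter totals listed above if the assumptions about field positions hold.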
Further information on flurries and the justification for removing them can be found in:
File LANL-CM5-1994-4.1-cln.swf