Parallel Workloads Archive: KTH SP2

The Swedish Royal Institute of Technology (KTH) IBM SP2 log

System:	100-node IBM SP2
Duration:	October 1996 through August 1997
Jobs:	28,490

This log contains eleven months' worth of accounting records from the 100-node IBM SP2 at the Swedish Royal Institute of Technology (KTH) in Stockholm. For more information about this installation, see http://www.pdc.kth.se

Note that the first couple of weeks of the log exhibit somewhat reduced utilization. This could indicate that the system's configuration was different during this period. However, the effect is modest and its duration relatively short. The cleaned version of the log removes much of the problem.

The workload log from the KTH SP2 was graciously provided by Lars Malinowsky (lama@pdc.kth.se), who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.

Downloads:

KTH-SP2-1996-0	0.74 MB gz	original log
KTH-SP2-1996-2.swf	0.41 MB gz	converted log
KTH-SP2-1996-2.1-cln.swf	0.41 MB gz	cleaned log -- RECOMMENDED, see usage notes
KTH-SP2-1996-1.swf	0.40 MB gz	OLD VERSION of converted log (replaced 1 Aug 2006)


A cleaned version of this log is available and recommended; see the usage notes.

Papers Using this Log:

This log was used in the following papers: [feitelson98] [talby99a] [talby99b] [zotkin99] [cirne00] [mualem01] [feitelson01] [cirne01b] [streit02] [srinivasan02] [lawson02] [lublin03] [shmueli03] [ernemann03] [feitelson03a] [song04] [streit04] [feitelson04b] [feitelson05c] [feitelson05d] [talby05] [tsafrir05b] [shmueli05] [zilber05] [tsafrir06b] [shmueli06] [franke06] [iosup06] [ranjan06] [tsafrir07a] [feitelson07a] [talby07] [shmueli07] [ranjan08] [iosup08] [feitelson08] [shmueli09] [feitelson09] [folling09] [minh09] [thebe09] [tsafrir10] [sodan11] [lindsay12] [liux12] [krakov12] [zakay12] [klusacek12] [deng13] [zakay13] [liang13] [krakov13] [rajbhandary13] [sheikhalishahi14] [zakay14] [zakay14b] [feitelson14] [liu15] [lucarelli17] [soysal19]

System Environment

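The 100 nodes in the batch pool are divided into the types T, W, and Z listed below. The U code refers to nodes on another machine, which are not counted among the 100: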

number	code	type
88	T	thin2
10	W	wide
2	Z	wide with more memory
64	U	another remote machine (experimental)

Over the period covered, the actual number of available nodes fluctuated because PEs were set aside for reserved, interactive, or course use, as PIOFS servers, or for upgrades, service, etc.

The system imposes limits on job run times, and these were changed a couple of times during the period in which the log was recorded. The limits in effect were as follows.

Prior to June 6, 1997:
- Weekdays between 0700 and 1600: limit of 4h.
- Weekday nights: limit of 15h.
- Weekends: limit of 60h.
However, no job could run across 0700 or 1600 on weekdays (synchronization points).

June 6 to July 15, 1997:
Same as the above, but the restriction due to the synchronization points was removed. For example, a job could start at 0600 on a weekday, and it would have to terminate by 1100, so as not to violate the 4-hour rule that came into effect at 0700.

Starting July 15, 1997:
Same as the above, but the 4-hour restriction during weekdays applies only to 64 T nodes and 2 W nodes. The rest of the nodes allow a 15-hour limit even during weekdays. In addition, a `fallback' mechanism was activated; for example, a job that requests a T node might get a node of any of the types T, W, or Z.

Log Format

The original log is available as KTH-SP2-1996-0.

This file contains one line per completed job with the following white-space separated fields:

usr: username.
cac: accounting group; this was enforced towards the end of the log period.
jid: job ID with embedded submit date and time
req: requested nodes, possibly designating desired types, e.g. 72T8W.
tstart: date and time when all nodes were available.
tstop: date and time when the last node was returned (jobs may deallocate individual nodes).
npe: total number of CPUs (should match the sum of different types from req).
treq: wall time requested (used by EASY for backfilling).
uwall: used wall time (first node deallocation minus last node allocation).
reqcpu: requested CPU-time (npe x treq).
ucpu: used CPU-time (the sum of for how long each PE was allocated).
twait: difference between job start and when job entered the FIFO queue.
status: jobs that were released automatically by the system, e.g. because they exceeded their requested time, are marked by "autorel". Jobs that terminated normally do not have anything in this field.
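
As an illustration of the req field, the following is a minimal Python sketch that splits a request such as 72T8W into per-type counts and compares the total against npe. The grammar is an assumption inferred from the single example above (repeated count-plus-letter groups); a bare count with no letter is assumed to mean that no node type was specified. The names parse_req and matches_npe are hypothetical and are not part of any archive tool.

```python
import re
from collections import Counter

# Assumed grammar: repeated <count><type-letter> groups, e.g. "72T8W".
# A count without a letter is stored under the empty-string key.
_GROUP = re.compile(r"(\d+)([A-Za-z]?)")


def parse_req(req: str) -> Counter:
    """Return the number of requested nodes per type code."""
    counts = Counter()
    pos = 0
    while pos < len(req):
        m = _GROUP.match(req, pos)
        if m is None:
            raise ValueError(f"unrecognized req field: {req!r}")
        counts[m.group(2).upper()] += int(m.group(1))
        pos = m.end()
    if not counts:
        raise ValueError("empty req field")
    return counts


def matches_npe(req: str, npe: int) -> bool:
    """Check whether the per-type counts in req add up to npe."""
    return sum(parse_req(req).values()) == npe


if __name__ == "__main__":
    print(dict(parse_req("72T8W")))
    print(matches_npe("72T8W", 80))
```

A mismatch between req and npe is not necessarily a parsing error, since some jobs in this log received a different number of processors than they requested.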

Elapsed and aggregate times are reported in a unique format, with the hours and minutes separated by the letter `h'. For example, 4h is 4 hours, 0h02 is 2 minutes, and 84h25 is 84 hours and 25 minutes (about 3.5 days).
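
The time format described above can be parsed along the following lines. This is a sketch based only on the three examples given (4h, 0h02, 84h25); the function name parse_hm is hypothetical, and accepting a one-digit minutes part is an assumption.

```python
import re
from datetime import timedelta

# Hours, the letter 'h', then an optional minutes part.
_HM = re.compile(r"(\d+)h(\d{1,2})?")


def parse_hm(text: str) -> timedelta:
    """Convert a string such as '84h25' into a timedelta."""
    m = _HM.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    hours = int(m.group(1))
    minutes = int(m.group(2) or 0)  # a missing minutes part means zero
    return timedelta(hours=hours, minutes=minutes)


if __name__ == "__main__":
    for sample in ("4h", "0h02", "84h25"):
        print(sample, parse_hm(sample))
```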

Note that uwall is not the same as the run time usually reported in other logs. A better match to common practice is to calculate tstop - tstart, the time from when all nodes became available and the job started running until the last node was returned.

Finally, the system administrators report that they sometimes pushed jobs through the FIFO by giving them artificially low `enter-fifo' times. For such jobs the value of the twait field is not reliable.

Conversion Notes

The converted log is available as KTH-SP2-1996-2.swf. The conversion from the original format to SWF was done subject to the following.

This log does not include the job submittal time. However, this can be calculated as tstart - twait, and this was indeed done in the conversion.
The above calculation does not work for jobs that got artificial `enter FIFO' times to increase their priority. Fortunately, a version of the submit time is also encoded in the job ID. Typically, a job cannot have waited for longer than its job ID indicates. So the submit time is actually calculated as
submit = max{ tstart - twait, jobID.submit }
This correction was applied 46 times.
The option to request U nodes (from another machine) was only used in 15 jobs, of which 11 requested one such node. In 3 cases the total number of nodes was more than 100. In any case, U nodes were deleted from the job's size, only leaving nodes used on this machine.
The conversion loses the following data, which cannot be represented in SWF:
- Node type requested
- "Overlap time" during which all the job's nodes were allocated
The following anomalies were identified in the conversion:
- 219 jobs got more processors than they requested.
- 475 jobs got more runtime than they requested. In 64 cases the extra runtime was larger than 1 minute.
- One job (job 27313) was recorded as having requested and used 0 processors, but terminated successfully. This is a job that used only one U node. It was removed from the log.
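
The submit-time rule given in the notes above can be sketched in Python as follows. This is an illustration of the formula only, not the archive's converter; the function name and the sample timestamps are made up, and the three inputs are assumed to have already been parsed from the log into datetime and timedelta values.

```python
from datetime import datetime, timedelta


def submit_time(tstart: datetime, twait: timedelta,
                jobid_submit: datetime) -> datetime:
    """submit = max{ tstart - twait, jobID.submit }"""
    return max(tstart - twait, jobid_submit)


if __name__ == "__main__":
    start = datetime(1997, 3, 10, 12, 0)    # hypothetical values
    stamp = datetime(1997, 3, 10, 9, 0)     # time encoded in the job ID

    # Ordinary job: twait is consistent with the job ID, so it is used.
    print(submit_time(start, timedelta(hours=1), stamp))

    # Job with an artificial enter-FIFO time: twait is too long, so the
    # time from the job ID bounds the result.
    print(submit_time(start, timedelta(hours=30), stamp))
```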

The conversion was done by a log-specific parser in conjunction with a more general converter module.

The differences between conversion 2 (reflected in KTH-SP2-1996-2.swf) and conversion 1 (KTH-SP2-1996-1.swf) are:

In conversion 1, the U nodes were left in, so in 3 jobs the size was bigger than the machine size. They were removed in conversion 2.
In conversion 1, all timestamps were off by one or two hours, partly due to mishandling of daylight saving time. This was corrected in conversion 2.
In conversion 1, jobs that requested nodes without specifying any node type were erroneously recorded as having requested 0 nodes (however, the number of used nodes was correct). This was corrected in conversion 2.

Usage Notes

The log has a cleaned version available as KTH-SP2-1996-2.1-cln.swf. It is recommended that this version be used.

The cleaning consisted of removing the first 14 jobs, as they seem to represent activity from long before the actual logging started.
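
For readers who want to apply the same kind of filter to the converted log themselves, the following Python sketch drops the first n job records from a sequence of SWF lines. It is not the script used to produce the cleaned file. It relies only on the SWF convention that header comment lines begin with a semicolon, and it does not renumber jobs or adjust any header fields; the function name is hypothetical.

```python
from typing import Iterable, Iterator


def drop_first_jobs(lines: Iterable[str], n: int = 14) -> Iterator[str]:
    """Yield SWF lines, skipping the first n job records."""
    seen = 0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            yield line  # header comments and blank lines pass through
            continue
        seen += 1
        if seen > n:
            yield line


if __name__ == "__main__":
    sample = ["; Version: 2", "1 0 5 100", "2 10 0 200", "3 20 1 300"]
    for out in drop_first_jobs(sample, n=2):
        print(out)
```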

The Log in Graphics

File KTH-SP2-1996-2.swf

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance
