Parallel Workloads Archive: KTH SP2

The Swedish Royal Institute of Technology (KTH) IBM SP2 log

System: 100-node IBM SP2
Duration: October 1996 thru August 1997
Jobs: 28,490

This log contains eleven months' worth of accounting records from the 100-node IBM SP2 at the Swedish Royal Institute of Technology (KTH) in Stockholm. For more information about this installation, see http://www.pdc.kth.se

Note that the first couple of weeks of the log exhibit somewhat reduced utilization. This could indicate that the system's configuration was different during this period. However, the effect is modest and its duration relatively short. The cleaned version of the log disposes of much of the problem.

The workload log from the KTH SP2 was graciously provided by Lars Malinowsky (lama@pdc.kth.se), who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.

Downloads:

KTH-SP2-1996-0 0.74 MB gz original log
KTH-SP2-1996-2.swf 0.41 MB gz converted log
KTH-SP2-1996-2.1-cln.swf 0.41 MB gz cleaned log -- RECOMMENDED, see usage notes
KTH-SP2-1996-1.swf 0.40 MB gz OLD VERSION of converted log (replaced 1 Aug 2006)

A cleaned version of this log is available and recommended; see the usage notes below.

Papers Using this Log:

This log was used in the following papers: [feitelson98] [talby99a] [talby99b] [zotkin99] [cirne00] [mualem01] [feitelson01] [cirne01b] [streit02] [srinivasan02] [lawson02] [lublin03] [shmueli03] [ernemann03] [feitelson03a] [song04] [streit04] [feitelson04b] [feitelson05c] [feitelson05d] [talby05] [tsafrir05b] [shmueli05] [zilber05] [tsafrir06b] [shmueli06] [franke06] [iosup06] [ranjan06] [tsafrir07a] [feitelson07a] [talby07] [shmueli07] [ranjan08] [iosup08] [feitelson08] [shmueli09] [feitelson09] [folling09] [minh09] [thebe09] [tsafrir10] [sodan11] [lindsay12] [liux12] [krakov12] [zakay12] [klusacek12] [deng13] [zakay13] [liang13] [krakov13] [rajbhandary13] [sheikhalishahi14] [zakay14] [zakay14b] [feitelson14] [liu15] [lucarelli17] [soysal19]

System Environment

The 100 nodes in the batch pool are divided into different types:
number  code  type
88      T     thin2
10      W     wide
2       Z     wide with more memory
64      U     another remote machine (experimental)

Over the period of time covered the actual number of nodes available has fluctuated due to PEs being set aside for reserved/interactive/course use, as PIOFS-servers, upgrades, service, etc.

The system imposes limits on job run times, and these were changed a couple of times during the period in which the log was recorded. The limits in effect were as follows.

Prior to June 6, 1997:
Weekdays between 0700 and 1600: limit of 4h.
Weekday nights: limit of 15h.
Weekends: limit of 60h.
However, no job could run across 0700 or 1600 on weekdays (synchronization points).

June 6 to July 15, 1997:
Same as the above, but the restriction due to the synchronization points was removed. For example, a job could start at 0600 on a weekday, and it would have to terminate by 1100, so as not to violate the 4-hour rule that came into effect at 0700.

Starting July 15, 1997:
Same as the above, but the 4-hour restriction during weekdays applies only to 64T and 2W nodes. The rest of the nodes allow a 15-hour limit even during weekdays. In addition, a `fallback' mechanism was activated; for example, if you request a T node you might get a node of any of the types T, W, or Z.

Log Format

The original log is available as KTH-SP2-1996-0.

This file contains one line per completed job with the following white-space separated fields:

Elapsed and aggregate times are reported in a unique format, with the hours and minutes separated by the letter `h'. For example, 4h is 4 hours, 0h02 is 2 minutes, and 84h25 is 84 hours and 25 minutes (about 3.5 days).
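
As an illustration of the time format described above, the following is a minimal Python sketch that converts such a value to minutes. The function name parse_duration_minutes is hypothetical and is not part of the archive's own conversion tools. The sketch assumes that the minutes part, when present, is one or two digits with a value below 60; the original log may contain variants not covered here.

```python
import re

# Hours, the letter 'h', then an optional minutes part (assumed 1-2 digits).
_DURATION = re.compile(r"(\d+)h(\d{1,2})?")


def parse_duration_minutes(text):
    """Convert a value such as '4h', '0h02', or '84h25' to whole minutes."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not in <hours>h<minutes> form: {text!r}")
    hours = int(match.group(1))
    # A missing minutes part (as in '4h') is taken to mean zero minutes.
    minutes = int(match.group(2)) if match.group(2) else 0
    if minutes >= 60:
        raise ValueError(f"minutes out of range: {text!r}")
    return hours * 60 + minutes


if __name__ == "__main__":
    for value in ("4h", "0h02", "84h25"):
        print(value, parse_duration_minutes(value))
```

Returning whole minutes keeps the result exact, since the format carries no seconds; multiply by 60 if seconds are needed for comparison with other logs.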

Note that uwall is not the same as the run time usually reported in other logs. A better match to common practice is to calculate tstop - tstart, the time from when all nodes became available and the job started running until the last node was returned.

Finally, the system administrators report that sometimes they have pushed jobs through the FIFO by giving them artificially low `enter-fifo' times. Thus the value of the wait field will be bogus.

Conversion Notes

The converted log is available as KTH-SP2-1996-2.swf. The conversion from the original format to SWF was done subject to the following. The conversion was done by a log-specific parser in conjunction with a more general converter module.

The differences between conversion 2 (reflected in KTH-SP2-1996-2.swf) and conversion 1 (KTH-SP2-1996-1.swf) are

Usage Notes

The log has a cleaned version available as KTH-SP2-1996-2.1-cln.swf. It is recommended that this version be used.

The cleaning consisted of removing the first 14 jobs, as they seem to represent activity from long before the actual logging started.
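
The cleaning step described above can be approximated as follows. This is only a sketch, not the procedure actually used to produce the cleaned file: it assumes the usual SWF layout in which header comment lines begin with `;` and every other non-empty line is one job record, and the function name drop_first_jobs is hypothetical. The published cleaned file may differ in other respects (for example in its header comments), so it remains the recommended version.

```python
def drop_first_jobs(lines, n=14):
    """Yield SWF lines, skipping the first n job records.

    Header comment lines (starting with ';') and blank lines pass through.
    """
    skipped = 0
    for line in lines:
        stripped = line.strip()
        is_job = bool(stripped) and not stripped.startswith(";")
        if is_job and skipped < n:
            skipped += 1
            continue
        yield line


if __name__ == "__main__":
    import sys

    # Usage (file names are up to the caller):
    #   python drop_first_jobs.py < KTH-SP2-1996-2.swf > trimmed.swf
    sys.stdout.writelines(drop_first_jobs(sys.stdin))
```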

The Log in Graphics

File KTH-SP2-1996-2.swf

Available graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance.

