Parallel Workloads Archive: SDSC Blue Horizon

The San Diego Supercomputer Center (SDSC) Blue Horizon log

System: 144-node IBM SP, with 8 processors per node
Duration: Apr 2000 thru Jan 2003
Jobs: 250,440

An extensive log, starting when the machine was just installed, and then covering more than two years of production use. It contains information on the requested and used nodes and time, CPU time, submit, wait and run times, and user.

The workload log from the SDSC Blue Horizon was graciously provided by Travis Earheart and Nancy Wilkins-Diehr, who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.

Downloads:

SDSC-BLUE-2000-0 6.0 MB gz original log
SDSC-BLUE-2000-4.swf 3.9 MB gz converted log
SDSC-BLUE-2000-4.2-cln.swf 3.8 MB gz cleaned log -- RECOMMENDED, see usage notes
SDSC-BLUE-2000-2.swf 3.9 MB gz OLD VERSION of converted log (replaced 1 Aug 2006)
SDSC-BLUE-2000-2.1-cln.swf 3.8 MB gz OLD VERSION of cleaned log (replaced 1 Aug 2006)
SDSC-BLUE-2000-3.swf 3.9 MB gz OLD VERSION of converted log (replaced 1 Dec 2011)
SDSC-BLUE-2000-3.1-cln.swf 3.8 MB gz OLD VERSION of cleaned log (replaced 1 Dec 2011)
SDSC-BLUE-2000-4.1-cln.swf 3.8 MB gz OLD VERSION of cleaned log (replaced 27 Jan 2015)

Papers Using this Log:

This log was used in the following papers: [feitelson04b] [feitelson05c] [feitelson05d] [talby05] [tsafrir05b] [sabin05] [zilber05] [feitelson06a] [tsafrir06a] [tsafrir06b] [ranjan06] [tsafrir07a] [feitelson07a] [lee07] [tsafrir07b] [talby07] [shmueli07] [esbaugh07] [ranjan08] [iosup08] [feitelson08] [shmueli09] [feitelson09] [guim09] [pascual09] [aida09] [tsafrir10] [sodan10] [sodan11] [vandenbossche11] [sheikhalishahi11] [lindsay12] [utrera12] [sheikhalishahi12] [kubert12] [niu12] [krakov12] [kumar12] [zakay12] [klusacek12] [etinski12] [ababneh12] [zakay13] [liang13] [chen13] [yang13] [rajbhandary13] [sheikhalishahi14] [cao14] [kumar14] [zakay14] [zakay14b] [feitelson14] [lucarelli17] [carastans17] [ntakpe17] [wang18] [hai20]

System Environment

The total machine size is 144 nodes. Each is an 8-way SMP with a crossbar connecting the processors to a shared memory. These nodes are for batch use, with jobs submitted using LoadLeveler. The data available here comes from LoadLeveler.

The log also contains interactive jobs up to July 2002. At about that time an additional 15 nodes were acquired for interactive use, e.g. development. These nodes have only Ethernet communication, employ timesharing scheduling, and reportedly have only 4 processors each. They are handled by a separate instance of LoadLeveler, and their workload is not available here.

The scheduler used on the machine is called Catalina. This was developed at SDSC, and is similar to other batch schedulers. It uses a priority queue, performs backfilling, and supports reservations.

Jobs are submitted to a set of queues. The main ones are:

Name         Time limit  Node limit
interactive  2hr         8
express      2hr         8
high         36hr        --
normal       36hr        --
low          --          --

According to on-line documentation, towards the end of 2001 the limits were different:

Name         Time limit  Node limit
interactive  2hr         --
express      2hr         8
high         18hr        31
normal       18hr        31
low          18hr        31
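
As a rough illustration of how the late-2001 limits could be applied when analysing the log, the sketch below encodes that table and checks a job request against it. The names QUEUE_LIMITS_LATE_2001 and within_limits are hypothetical, and treating "--" as "no limit listed" is an assumption, not something the documentation states.

```python
# Late-2001 queue limits as listed above.
# Assumption: "--" in the table means no limit is listed, encoded here as None.
QUEUE_LIMITS_LATE_2001 = {
    "interactive": {"hours": 2, "nodes": None},
    "express": {"hours": 2, "nodes": 8},
    "high": {"hours": 18, "nodes": 31},
    "normal": {"hours": 18, "nodes": 31},
    "low": {"hours": 18, "nodes": 31},
}


def within_limits(queue, hours, nodes, limits=QUEUE_LIMITS_LATE_2001):
    """Return True if a request of `hours` and `nodes` fits the queue's listed limits."""
    lim = limits[queue]
    time_ok = lim["hours"] is None or hours <= lim["hours"]
    nodes_ok = lim["nodes"] is None or nodes <= lim["nodes"]
    return time_ok and nodes_ok
```

For example, within_limits("express", 1.5, 8) holds, while within_limits("normal", 24, 16) does not, because 24 hours exceeds the 18-hour limit.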

For more information see the NPACI user guide.

Log Format

The original log is available as SDSC-BLUE-2000-0. This was originally provided as three separate yearly files, which have been concatenated to produce this file.

The data contains one line per job with the following white-space separated fields:

Conversion Notes

The converted log is available as SDSC-BLUE-2000-4.swf. The conversion from the original format to SWF was done subject to the following. The conversion was done by a log-specific parser in conjunction with a more general converter module.

The differences between conversion 4 (reflected in SDSC-BLUE-2000-4.swf) and conversion 3 (SDSC-BLUE-2000-3.swf) are mainly due to new logic to handle inconsistent times. For example, when negative wait times were encountered, conversion 3 moved the submit time back, whereas conversion 4 leaves the submit time unchanged (effectively shifting the start and end times instead).
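
The two ways of handling a negative wait time can be sketched as follows. This is only one plausible reading of the description above, not the converter's actual code: the function names are hypothetical, and the assumption is that each job carries absolute submit, start and end times in seconds.

```python
def fix_conversion3(submit, start, end):
    """Conversion-3 style: move the submit time back so the wait is not negative."""
    if start < submit:
        submit = start
    return submit, start, end


def fix_conversion4(submit, start, end):
    """Conversion-4 style: keep the submit time and shift start and end together."""
    if start < submit:
        shift = submit - start  # size of the negative wait
        start += shift
        end += shift
    return submit, start, end
```

In both variants the wait time becomes zero and the runtime (end minus start) is preserved; they differ only in which timestamps are treated as trustworthy.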

The differences between conversion 3 (reflected in SDSC-BLUE-2000-3.swf) and conversion 2 (SDSC-BLUE-2000-2.swf) are

The differences between conversion 2 (reflected in SDSC-BLUE-2000-2.swf) and conversion 1 (SDSC-BLUE-2000-1.swf) are

Usage Notes

The original log contains several flurries of very high activity by individual users, which may not be representative of normal usage. These were removed in the cleaned version, and it is recommended that this version be used. In addition, the first 8 jobs were removed.
The cleaned log is available as SDSC-BLUE-2000-4.2-cln.swf.

A flurry is a burst of very high activity by a single user. The filters used to remove the three flurries that were identified are

user=68 and job>57 and job<565 (477 jobs)
user=342 and job>88201 and job<91149 (1468 jobs)
user=269 and job>200424 and job<217011 (5181 jobs)

Removing the first 8 jobs was added in the second cleaned version, as they seem to represent activity from long before the actual logging started. Note that the filters were applied to the original log, and unfiltered jobs remain untouched. As a result, in the filtered logs job numbering is not consecutive and does not start from 1.
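
For readers who want to reproduce something like the cleaned log from the converted one, the filters above can be sketched as below. This is a minimal sketch under stated assumptions, not the script used by the archive: the names FLURRIES, is_removed and clean_swf are hypothetical, "job" and "user" are assumed to be SWF fields 1 (job number) and 12 (user ID), and "the first 8 jobs" is assumed to mean job numbers 1 through 8.

```python
# Flurry filters from the usage notes: (user, job_low, job_high), both bounds exclusive.
FLURRIES = [
    (68, 57, 565),
    (342, 88201, 91149),
    (269, 200424, 217011),
]
FIRST_JOBS_REMOVED = 8  # assumption: "the first 8 jobs" are job numbers 1..8


def is_removed(job_id, user_id):
    """Return True if the job would be dropped by the filters listed above."""
    if job_id <= FIRST_JOBS_REMOVED:
        return True
    return any(user_id == u and lo < job_id < hi for u, lo, hi in FLURRIES)


def clean_swf(lines):
    """Yield the SWF lines that survive the filters; header comments pass through."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(";"):
            yield line  # SWF header comments start with ';'
            continue
        fields = stripped.split()
        job_id = int(fields[0])    # SWF field 1: job number
        user_id = int(fields[11])  # SWF field 12: user ID
        if not is_removed(job_id, user_id):
            yield line
```

Surviving lines are passed through unchanged, so job numbers keep their original values, consistent with the note that numbering in the filtered logs is not consecutive.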

Further information on flurries and the justification for removing them can be found in:

The Log in Graphics

File SDSC-BLUE-2000-0 (before conversion)

bad daily cycle

File SDSC-BLUE-2000-4.swf

weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance

File SDSC-BLUE-2000-4.2-cln.swf (cleaned)

weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance

