Parallel Workloads Archive: SDSC Paragon

The San Diego Supercomputer Center (SDSC) Paragon

System: 416-node Intel Paragon
Duration: January 1995 through December 1996
Jobs: 76872 in 1995, 38723 in 1996

This log contains two years' worth of accounting records for the 416-node Intel Paragon located at the San Diego Supercomputer Center (SDSC). For more information about SDSC, see http://www.sdsc.edu/. The original logs and some more information (including a log of downtime!) are also available directly from SDSC.

For historical reasons, the log is divided into two parts (one per year). These extensive logs contain information about the number of nodes; the submit, start, and end times; the CPU time used; the NQS queues used and their limits; and the user. There is no information about the application being run.

The workload logs from the SDSC Paragon were graciously provided by Reagan Moore (moore@sdsc.edu) and Allen Downey (downey@sdsc.edu), who also helped with background information and interpretation. If you use this log in your work, please use a similar acknowledgment.

You can also reference the following:
K. Windisch, V. Lo, R. Moore, D. Feitelson, and B. Nitzberg, "A comparison of workload traces from two production parallel machines". In 6th Symp. Frontiers Massively Parallel Comput., pp. 319-326, Oct 1996.
This paper compares the Paragon '95 workload with the NASA Ames iPSC workload.

Downloads:

SDSC-Par-1995-0 1.4 MB gz original log
SDSC-Par-1995-3.swf 1 MB gz converted log
SDSC-Par-1995-3.1-cln.swf 0.8 MB gz cleaned log -- RECOMMENDED, see usage notes
SDSC-Par-1996-0 0.8 MB gz original log
SDSC-Par-1996-3.swf 0.6 MB gz converted log
SDSC-Par-1996-3.1-cln.swf 0.5 MB gz cleaned log -- RECOMMENDED, see usage notes
SDSC-Par-1995-1.swf 0.4 MB gz OLD VERSION of converted log (replaced 1 Aug 2006)
SDSC-Par-1995-1.1-cln.swf 0.4 MB gz OLD VERSION of cleaned log (replaced 1 Aug 2006)
SDSC-Par-1996-1.swf 0.4 MB gz OLD VERSION of converted log (replaced 1 Aug 2006)
SDSC-Par-1996-1.1-cln.swf 0.4 MB gz OLD VERSION of cleaned log (replaced 1 Aug 2006)
SDSC-Par-1995-2.swf 1 MB gz OLD VERSION of converted log (replaced 29 Nov 2011)
SDSC-Par-1995-2.1-cln.swf 0.8 MB gz OLD VERSION of cleaned log (replaced 29 Nov 2011)
SDSC-Par-1996-2.swf 0.6 MB gz OLD VERSION of converted log (replaced 29 Nov 2011)
SDSC-Par-1996-2.1-cln.swf 0.5 MB gz OLD VERSION of cleaned log (replaced 29 Nov 2011)
SDSC-Par-1995-down 7 KB gz downtime log
SDSC-Par-1996-down 5 KB gz downtime log

Papers Using these Logs:

These two logs (or partial early versions of them) were used in the following papers: [feitelson96b] [downey97a] [downey97c] [downey98b] [feitelson98b] [smith98] [downey99] [talby99b] [mualem01] [feitelson01] [lawson02] [ernemann03] [lublin03] [song04] [feitelson04b] [feitelson05c] [zilber05] [brevik06] [feitelson06a] [tsafrir06a] [franke06] [ranjan06] [talby07] [feitelson07a] [shmueli07] [liy07] [ranjan08] [iosup08] [feitelson08] [shmueli09] [feitelson09] [minh09] [thebe09] [sodan11] [yuan11] [krakov12] [kumar12] [ababneh12] [zakay14] [feitelson14]

System Environment

The Paragon is a mesh with processing nodes based on the Intel i860 processor. The SDSC system has 416 nodes, divided thus:
   48   Interactive partition
  352   Compute partition (for parallel jobs via NQS)
    6   Service partition (logins etc.)
   10   I/O partition (parallel file system)
The compute partition, in turn, is divided into 64 nodes reserved for short jobs, and 288 for long jobs.

Batch jobs were handled by NQS, which was configured with a large number of queues. Queue names are parsed as follows:

  q [f] nodes len
The optional `f' indicates the use of fat nodes, with 32MB of memory. 256 nodes in the compute partition are thus configured. Other nodes have 16MB of memory.
`Nodes' is the limit on the number of nodes used. The queues are configured with powers of two.
`Len' is the limit on the wallclock time, and is one of the following:
  s   Short: 1 hour
  m   Medium: 4 hours
  l   Long: 12 hours
There are also two low-priority standby queues.
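
The naming scheme above can be decoded mechanically. The following is a minimal sketch, assuming the parts are concatenated without separators (for example a hypothetical name such as `qf64l`); the exact spelling of queue names in the logs, and the names of the two standby queues, are not given here, so anything that does not match the pattern is reported as unparsed. The names `Queue` and `parse_queue` are illustrative only.

```python
import re
from typing import NamedTuple, Optional

# Wallclock limits (hours) for the three length classes listed above.
LEN_LIMIT_HOURS = {"s": 1, "m": 4, "l": 12}

# ASSUMPTION: parts are concatenated with no separators, e.g. "qf64l".
QUEUE_RE = re.compile(r"q(f?)(\d+)([sml])")


class Queue(NamedTuple):
    fat: bool        # True if the queue uses fat (32MB) nodes
    memory_mb: int   # per-node memory: 32 for fat nodes, 16 otherwise
    max_nodes: int   # limit on the number of nodes
    max_hours: int   # limit on the wallclock time


def parse_queue(name: str) -> Optional[Queue]:
    """Decode a 'q [f] nodes len' queue name; return None if it does not fit."""
    m = QUEUE_RE.fullmatch(name)
    if m is None:
        return None  # e.g. a standby queue, whose naming is not described
    fat_flag, nodes, length = m.groups()
    fat = fat_flag == "f"
    return Queue(fat, 32 if fat else 16, int(nodes), LEN_LIMIT_HOURS[length])
```

Returning `None` rather than raising keeps the standby queues, whose names are not documented here, from aborting a pass over the log.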

The scheduling algorithm used on this system is described in detail in the following paper:
Michael Wan, Reagan Moore, George Kremenek, and Ken Steube, "A Batch Scheduler for the Intel Paragon with a Non-Contiguous Node Allocation Algorithm". In Job Scheduling Strategies for Parallel Processing, Dror G. Feitelson, and Larry Rudolph (Eds.), Springer-Verlag, pp. 48-64, 1996, Lect. Notes Comput. Sci. vol. 1162.

Log Format

The original log files are available as SDSC-Par-1995-0 and SDSC-Par-1996-0. These files contain one line per completed job, with white-space separated fields.

The user ID field is sanitized to preserve privacy. However, the two logs were sanitized independently. As a result, it is not clear whether user IDs are consistent from 1995 to 1996. This prevents the concatenation of the logs into a single two-year log.

Conversion Notes

The converted logs are available as SDSC-Par-1995-3.swf and SDSC-Par-1996-3.swf. The conversion from the original format to the standard workload format was done by a log-specific parser in conjunction with a more general converter module.
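
As a rough illustration of working with the converted files, the sketch below reads standard workload format lines: header comment lines start with `;`, and each job line holds 18 whitespace-separated numeric fields, with -1 marking missing values. The short field labels in `SWF_FIELDS` and the function name `read_swf` are this example's own choices, not names defined by the format.

```python
from typing import Dict, Iterable, Iterator, Union

Number = Union[int, float]

# The 18 fields of a standard workload format job line, in order.
# The labels are shorthand chosen for this sketch.
SWF_FIELDS = [
    "job", "submit", "wait", "run", "procs", "cpu_time", "mem",
    "req_procs", "req_time", "req_mem", "status", "user", "group",
    "app", "queue", "partition", "prev_job", "think_time",
]


def _num(text: str) -> Number:
    """Parse a field as int when possible, otherwise as float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def read_swf(lines: Iterable[str]) -> Iterator[Dict[str, Number]]:
    """Yield one dict per job line, skipping blank and ';' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        values = line.split()
        if len(values) != len(SWF_FIELDS):
            raise ValueError(f"expected {len(SWF_FIELDS)} fields, got {len(values)}")
        yield dict(zip(SWF_FIELDS, map(_num, values)))
```

After decompressing a download, the jobs could be iterated with `read_swf(open(path))`, where `path` is wherever the file was saved.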

The differences between conversion 3 (reflected in SDSC-Par-1995-3.swf and SDSC-Par-1996-3.swf) and conversion 2 (SDSC-Par-1995-2.swf and SDSC-Par-1996-2.swf) are

The differences between conversion 2 (reflected in SDSC-Par-1995-2.swf and SDSC-Par-1996-2.swf) and conversion 1 (SDSC-Par-1995-1.swf and SDSC-Par-1996-1.swf) are

Regarding the OLD files SDSC-Par-1995-1.swf and SDSC-Par-1996-1.swf, on Dec 16, 2004 the following changes were made:

Usage Notes

These logs contain two types of anomalies that are not representative of normal usage. These have been removed in the cleaned versions of the logs, and it is recommended that these versions be used.
The cleaned logs are available as SDSC-Par-1995-3.1-cln.swf and SDSC-Par-1996-3.1-cln.swf.

The first anomaly is a set of 16 jobs that is executed every day at around 3:45 AM. These are probably automatic jobs that perform some system administration function. They were removed using the following filter:

(user=2 or user=5) and (submit_hour=3 or submit_hour=4)
In 1995, 6604 jobs were thus removed. In 1996, 5301 were removed.
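
The filter above translates directly into a predicate. This is a minimal sketch, not the archive's own tooling: it assumes the caller has already turned each job's submit time into a local wall-clock `datetime` (in the converted logs, submit times are stored as seconds from the start of the log, so that conversion depends on the log's start time and time zone). The function name is hypothetical.

```python
from datetime import datetime


def is_daily_system_job(user: int, submitted: datetime) -> bool:
    """(user=2 or user=5) and (submit_hour=3 or submit_hour=4)."""
    # ASSUMPTION: `submitted` is in the machine's local time.
    return user in (2, 5) and submitted.hour in (3, 4)
```

A job for which this returns `True` would be dropped when producing a cleaned log.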

The second anomaly is flurries of very high activity by individual users. Four flurries of different magnitudes occurred in 1995. The filters used to remove them were:

user=61 and job>17495 and job<25398 (2005 jobs)
user=62 and job>17441 and job<26321 (2139 jobs)
user=66 and job>55419 and job<69428 (8678 jobs)
user=92 and job>69856 and job<76815 (3476 jobs)
In addition, a single flurry occurred in 1996. This was removed by the following filter:
user=23 and job>3035 and job<4677 (1283 jobs)
Note that the filters were applied to the original logs, and unfiltered jobs remain untouched. As a result, in the filtered logs job numbering is not consecutive.
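
The flurry filters listed above can be written as a small lookup table. The sketch below is illustrative: the table layout and the names `FLURRIES`, `in_flurry`, and `drop_flurries` are assumptions of this example. The bounds are strict, as in the filters, and job numbers are left untouched, so the surviving jobs keep their original, non-consecutive numbering.

```python
from typing import Dict, Iterable, List, Tuple

# (user, job lower bound, job upper bound); both bounds are exclusive,
# matching "job>lo and job<hi" in the filters quoted above.
FLURRIES: Dict[int, List[Tuple[int, int, int]]] = {
    1995: [
        (61, 17495, 25398),
        (62, 17441, 26321),
        (66, 55419, 69428),
        (92, 69856, 76815),
    ],
    1996: [
        (23, 3035, 4677),
    ],
}


def in_flurry(year: int, user: int, job: int) -> bool:
    """True if the job matches one of that year's flurry filters."""
    return any(user == u and lo < job < hi for u, lo, hi in FLURRIES[year])


def drop_flurries(year: int, jobs: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep (job, user) pairs outside every flurry; job numbers are not renumbered."""
    return [(job, user) for job, user in jobs if not in_flurry(year, user, job)]
```

Because each filter names a single user, other users' jobs inside the same job-number range are kept.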

Further information on flurries and the justification for removing them can be found in:

The Log in Graphics

File SDSC-Par-1995-3.swf

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance

File SDSC-Par-1995-3.1-cln.swf

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance

File SDSC-Par-1996-3.swf

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance

File SDSC-Par-1996-3.1-cln.swf

Graphs: weekly cycle, daily cycle, burstiness and active users, job size and runtime histograms, job size vs. runtime scatterplot, utilization, offered load, performance

