Parallel Workloads Archive: Jann et al '97

The Jann et al 1997 Model

This workload model is a statistical model of the workload observed on a 322-node partition of the CTC SP2 from June 25, 1996 to September 12, 1996 (which corresponds to the first part of the CTC SP2 log available at this web site). During this period, 17440 jobs were executed.

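The model is based on finding Hyper-Erlang distributions of common order that match the first three moments of the observed distributions. Such distributions are characterized by 4 parameters: the probability p of selecting the first of two branches (the second is selected with probability 1-p), the rates λ1 and λ2 of the exponential stages in the first and second branch, and the number of stages n, which is common to both branches.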

Note that this distribution is a generalization of the exponential (one branch, one stage), hyper-exponential (two branches, one stage), and Erlang (one branch, n stages) distributions.
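To make the structure concrete, here is a minimal Python sketch of sampling from a two-branch Hyper-Erlang distribution of common order. It is not the archive's code. The names `sample_hyper_erlang` and `raw_moment`, the symbols `p`, `lam1`, `lam2`, `n`, and the numeric values in the usage lines are all assumptions chosen for illustration. A branch is selected with probability `p` (otherwise the other branch), and the sample is the sum of `n` exponential stages with that branch's rate.

```python
import random


def sample_hyper_erlang(p, lam1, lam2, n, rng=random):
    """Draw one sample from a two-branch Hyper-Erlang distribution of common order n.

    p    -- probability of taking the first branch (assumed name)
    lam1 -- rate of each exponential stage in the first branch
    lam2 -- rate of each exponential stage in the second branch
    n    -- number of stages, identical in both branches
    """
    lam = lam1 if rng.random() < p else lam2
    # An Erlang variate with n stages is the sum of n independent exponentials.
    return sum(rng.expovariate(lam) for _ in range(n))


def raw_moment(p, lam1, lam2, n, k):
    """k-th raw moment: n(n+1)...(n+k-1) * (p / lam1**k + (1 - p) / lam2**k)."""
    rising = 1
    for i in range(k):
        rising *= n + i
    return rising * (p / lam1 ** k + (1 - p) / lam2 ** k)


if __name__ == "__main__":
    rng = random.Random(1)
    # Arbitrary illustrative parameters, not taken from the paper.
    p, lam1, lam2, n = 0.3, 1.0, 4.0, 3
    samples = [sample_hyper_erlang(p, lam1, lam2, n, rng) for _ in range(20000)]
    print("empirical mean:", sum(samples) / len(samples))
    print("analytic mean: ", raw_moment(p, lam1, lam2, n, 1))
```

Setting `n = 1` gives a hyper-exponential, and setting `p = 1` gives a plain Erlang (or an exponential when `n = 1` as well). Fitting the model amounts to choosing the four parameters so that `raw_moment` for k = 1, 2, 3 matches the observed moments; the fitting procedure itself is not shown here.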

As the characteristics of jobs with different degrees of parallelism differ, the full range of degrees of parallelism is first divided into subranges, based on powers of two or on multiples of five. A separate model of the interarrival times and of the service times (runtimes) is found for each range. With 10 such ranges, this gives 20 distributions of 4 parameters each, for a total of 80 parameters. The models for the different ranges are rather different: most required only a single stage, but some required as many as 7 to obtain the desired accuracy.
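The per-range lookup can be sketched as follows, assuming for illustration that subranges are bounded by powers of two (sizes 1, 2, 3-4, 5-8, and so on). The exact boundaries used in the paper may differ, and the parameter table below holds placeholder numbers, not the published values. The names `PLACEHOLDER_RUNTIME_PARAMS`, `subrange_upper_bound`, and `sample_runtime` are hypothetical.

```python
import random

# Placeholder (p, lam1, lam2, n) tuples keyed by the inclusive upper bound of
# each power-of-two subrange. These are NOT the values from the paper.
PLACEHOLDER_RUNTIME_PARAMS = {
    1: (0.5, 1.0e-2, 1.0e-4, 1),
    2: (0.5, 1.0e-2, 1.0e-4, 1),
    4: (0.4, 5.0e-3, 1.0e-4, 1),
    8: (0.4, 5.0e-3, 2.0e-4, 2),
    16: (0.3, 2.0e-3, 2.0e-4, 2),
}


def subrange_upper_bound(size):
    """Smallest power of two >= size, so 3-4 map to 4, 5-8 map to 8, etc."""
    if size < 1:
        raise ValueError("job size must be at least 1")
    return 1 << (size - 1).bit_length()


def sample_runtime(size, params, rng=random):
    """Sample a runtime for a job of the given size from its subrange's model."""
    p, lam1, lam2, n = params[subrange_upper_bound(size)]
    lam = lam1 if rng.random() < p else lam2
    # Sum of n exponential stages with the chosen branch's rate.
    return sum(rng.expovariate(lam) for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    for size in (1, 3, 7, 16):
        bound = subrange_upper_bound(size)
        print(size, "-> subrange up to", bound, "runtime", sample_runtime(size, PLACEHOLDER_RUNTIME_PARAMS, rng))
```

A second table of the same shape would hold the interarrival-time parameters for each subrange, so that each subrange generates its own stream of arrivals.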

Tables with all the parameter values are available in the paper describing this model. They are also hardcoded into the extended version of the code available here. The original sample code only handles one subrange of sizes.

The general framework of this model was later re-used for the workload on the ASCI Blue machine. The only difference is in the values of the various parameters, which are available in another paper.

