The Potsdam Institute for Climate Impact Research (PIK)
IBM iDataPlex Cluster log

System:   320-node IBM iDataPlex Cluster
Duration: April 2009 thru July 2012
Jobs:     742,964

This log contains more than three years' worth of accounting records from the
320-node IBM iDataPlex cluster at the Potsdam Institute for Climate
Impact Research (PIK) in Germany.
For more information about this installation, see URL
http://www.pik-potsdam.de/services/it/hpc
 
The log starts from when the machine was installed and first put into
use, so the initial part may not include representative workload data.
 
The workload log from the PIK IPLEX was graciously provided
by Ciaron Linstead (linstead@pik-potsdam.de), 
who also helped with background information and interpretation.
If you use this log in your work, please use a similar acknowledgment.
 
Downloads:
There is no cleaned version of this log yet.
System Environment
The 320 nodes in the cluster are each configured with 2 processors
that have 4 cores each, for a total of 8 cores per node, and 2560
cores in the whole system.
Each node also has 32 GB of memory, which is shared by the 8 cores.
The nodes are interconnected by an Infiniband DDR interconnect.
They are divided into two network domains of 160 nodes each.
Jobs cannot span both domains.
The maximal job size observed used 128 nodes (1024 cores).
The system has 800 TB total disk space.
Scheduling is performed using LoadLeveler with the backfilling
scheduler option.
Nodes may be shared by several jobs or allocated exclusively to a
single job.
Reasons for requesting non-shared access are either using all the
cores (that is, running 8 tasks on the node) or using all the physical
memory (so a single task that needs lots of memory will get the whole
node, and leave 7 cores idle).
The system does not run more than one user process per core.
Of the 32 GB of memory on each node, 6 GB are reserved for the
operating system, leaving 26 GB for user processes (28672000 KB).
If 8 processes are run on the node, each can get 3500 MB
(3584000 KB).
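
The core and memory figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: the constant names are chosen here for illustration, and the numbers are copied from the text.

```python
# Figures taken from the system description above.
NODES = 320
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 4
USER_MEMORY_KB_PER_NODE = 28_672_000  # memory left for user processes

cores_per_node = PROCESSORS_PER_NODE * CORES_PER_PROCESSOR
total_cores = NODES * cores_per_node

# With one process per core, the user memory is split evenly.
per_process_kb = USER_MEMORY_KB_PER_NODE // cores_per_node
per_process_mb = per_process_kb / 1024

print(cores_per_node, total_cores)     # cores per node, cores in the system
print(per_process_kb, per_process_mb)  # per-process share in KB and MB
```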
Log Format
The original log is a set of files, one per month, generated by the
LoadLeveler llsummary -l command.
The files contain a multi-line stanza for each job, where each line is
a field:value pair.
Jobs may further be composed of several steps, which are in effect a
sequence of jobs that are executed one after the other.
If a job includes multiple steps, there will be a separate stanza
describing each one.
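
A minimal sketch of reading such stanzas is shown below. It assumes that each stanza begins with a delimiter line starting with `=====`; that prefix, the function name `parse_stanzas`, and the sample values are assumptions made for illustration, not a description of the exact llsummary output.

```python
from typing import Dict, Iterator


def parse_stanzas(text: str, delimiter_prefix: str = "=====") -> Iterator[Dict[str, str]]:
    """Yield one dict per stanza of `field: value` lines.

    The delimiter prefix is an assumption; adjust it to the real files.
    """
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if line.lstrip().startswith(delimiter_prefix):
            # A delimiter closes the previous stanza, if there was one.
            if current:
                yield current
            current = {}
            continue
        # Split on the first colon only, since values such as times
        # contain colons themselves.
        field, sep, value = line.partition(":")
        if sep and field.strip():
            current[field.strip()] = value.strip()
    if current:
        yield current


# Hypothetical sample with two steps of one job.
sample = """\
===== Job Step node1.1.0 =====
Status: Completed
Queue Date: 2009-04-02 12:00:00 CEST
===== Job Step node1.1.1 =====
Status: Removed
"""
steps = list(parse_stanzas(sample))
print(len(steps), steps[0]["Queue Date"])
```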
Conversion Notes
The converted log is available as PIK-IPLEX-2009-1.swf.
The conversion from the original format to SWF was done subject to the following.
-  User and group were identified using the submitting user ID and
     group ID, not the owner and Unix group.
-  Submit time was based on the queue date field.
-  Start time was based on the start time field, not the dispatch
     time field.
-  All date/time fields are given in a human-readable format that
     includes the timezone and a daylight saving time indication.
     These were converted to UTC using this information.
-  Job status was based on the Status field and the Completion Code
     field, with the following mapping:
     
      
     Status      SWF job status
     Completed   1 if code=0
                 0 if code not 0
     Removed     5
     Not Run     5
     Idle        deleted
 Idle means that the job had not been handled yet when this log
     was recorded.
     It is then expected to appear again in the next monthly log, this
     time with its full data, so the first instance can be ignored.
     There were 176 such jobs.
-  If a job had multiple steps, they were assumed to depend on each
     other, and this was recorded in the last two fields of the SWF.
     The think time was calculated as the difference between the end time
     of one step and the start time of the next step.
-  The number of processors was set to 1 if a step was identified as
     Serial.
     If it was parallel, the Min Processors field was used
     (in all but one case this was also equal to the Max Processors
     field).
-  The Class field was used as a proxy for the queue.
-  The requested runtime was taken from the hard limit on wallclock
     time, not the soft limit or the CPU limit.
-  The CPU time was taken from the Step Total Time field (which is
     the sum of the Step User Time field and the Step System Time field).
-  The memory allocated was taken from the Step Real Memory field.
     This is given in GB, so it was multiplied by 1024*1024=1048576 to
     obtain KB as specified by the standard format.
     It was then divided by the number of processors used to obtain KB
     per process.
-  The Cmd field was used to identify the application being run.
     Note, however, that this may be a script and not the actual
     application.
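
Several of the rules in this list can be sketched in code. The following is an illustration only, not the parser that was actually used: the function names, the timestamp layout `YYYY-MM-DD HH:MM:SS ZONE`, and the zone abbreviations CET/CEST are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed zone abbreviations for a site in Germany (CEST = daylight saving).
UTC_OFFSET_HOURS = {"CET": 1, "CEST": 2}


def to_utc_epoch(stamp: str) -> int:
    """Convert a local timestamp (hypothetical layout) to UTC epoch seconds."""
    date_part, time_part, zone = stamp.split()
    local = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    offset = timezone(timedelta(hours=UTC_OFFSET_HOURS[zone]))
    return int(local.replace(tzinfo=offset).timestamp())


def swf_status(status: str, completion_code: int) -> Optional[int]:
    """Map Status and Completion Code to the SWF status; None means dropped."""
    if status == "Completed":
        return 1 if completion_code == 0 else 0
    if status in ("Removed", "Not Run"):
        return 5
    if status == "Idle":
        return None  # the job reappears in the next monthly log
    raise ValueError(f"unexpected status: {status}")


def processors(is_serial: bool, min_processors: int) -> int:
    """Serial steps use 1 processor; parallel steps use Min Processors."""
    return 1 if is_serial else min_processors


def memory_kb_per_process(step_real_memory_gb: float, procs: int) -> float:
    """Convert Step Real Memory from GB to KB per process."""
    return step_real_memory_gb * 1024 * 1024 / procs


def think_time(previous_end: int, next_start: int) -> int:
    """Think time between consecutive steps of one job, in seconds."""
    return next_start - previous_end


print(swf_status("Completed", 0), memory_kb_per_process(8, 8))
```

Returning `None` for Idle records keeps the "deleted" case distinct from the numeric SWF status codes.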
The conversion loses the following data, which cannot be represented in
the SWF:
-  The list of host nodes that were allocated to each job, and by
     implication how many nodes were used.
-  Whether or not the nodes were sharable or requested for exclusive use.
-  Data about minor and major page faults.
 
The following anomalies were identified in the conversion:
-  2 jobs were recorded as using more processors than the machine size;
     this was changed to the machine size.
-  In one job the number of processors used was missing.
-  Negative runtimes occurred 3520 times; these were changed to 0.
     In 77 cases the negative runtimes were larger than 1 min,
     and in 6 cases they were larger than 1 hr 5 min (these were changed to -1).
-  27251 jobs had undefined start times.
     Of these jobs, 16541 had "failed" status and 10710 had "success" status.
     In 3218 cases the undefined start times were replaced by the submit time.
     In 10129 cases the start times and runtimes were approximated
     using the CPU time.
-  Negative wait times occurred 333 times; they were changed to 0.
     In 20 cases the negative wait times were larger than 1 min, and
     in 25 cases they were larger than 1 hr 5 min (these were changed to -1).
-  21616 jobs had an average CPU time higher than their runtime.
     In 2632 cases the extra CPU time was larger than 1 minute.
-  98955 jobs were recorded as using 0 CPU time; this was changed to -1.
     Of these jobs, 30764 had "failed" status and 68191 had "success" status.
-  In 7 jobs the queue (class) was missing.
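
The value corrections listed above can be sketched as follows. This reflects one reading of the list, namely that negative runtimes and wait times of more than 1 hr 5 min were set to -1 (the SWF marker for an unknown value) and smaller negative values were set to 0. The function and constant names are illustrative and not taken from the actual converter.

```python
MACHINE_SIZE = 2560                 # cores in the whole system
LARGE_NEGATIVE_SECONDS = 65 * 60    # 1 hr 5 min threshold from the list above


def clean_interval(seconds: int) -> int:
    """Clean a runtime or wait time, in seconds."""
    if seconds >= 0:
        return seconds
    # Large negative values are treated as unknown, small ones as zero.
    return -1 if -seconds > LARGE_NEGATIVE_SECONDS else 0


def clean_processors(procs: int) -> int:
    """Cap the processor count at the machine size."""
    return min(procs, MACHINE_SIZE)


def clean_cpu_time(cpu_seconds: int) -> int:
    """A recorded CPU time of 0 is replaced by -1 (unknown)."""
    return -1 if cpu_seconds == 0 else cpu_seconds


print(clean_interval(-30), clean_interval(-2 * 3600), clean_processors(4096))
```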
 
The conversion was done by a log-specific parser in conjunction with a
more general converter module.
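
For readers who want to inspect the converted file, a minimal reader might look like the sketch below. It assumes the usual SWF layout of 18 whitespace-separated fields per job, with header and comment lines starting with `;`. The short field names and the sample record are chosen here for illustration only.

```python
from typing import Dict, Iterable, Iterator

# Field order follows the SWF definition; the short names are illustrative.
SWF_FIELDS = [
    "job", "submit", "wait", "runtime", "procs", "cpu_time", "memory",
    "req_procs", "req_time", "req_memory", "status", "user", "group",
    "executable", "queue", "partition", "preceding_job", "think_time",
]


def read_swf(lines: Iterable[str]) -> Iterator[Dict[str, float]]:
    """Yield one record per job line, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        values = line.split()
        if len(values) != len(SWF_FIELDS):
            raise ValueError(f"expected {len(SWF_FIELDS)} fields: {line!r}")
        yield dict(zip(SWF_FIELDS, (float(v) for v in values)))


# Hypothetical two-line sample: one comment and one job record.
sample = """\
; Version: 2
1 0 10 3600 8 3500 1024 8 7200 -1 1 3 2 5 1 -1 -1 -1
"""
records = list(read_swf(sample.splitlines()))
print(len(records), records[0]["runtime"], records[0]["status"])
```

With the real file, `read_swf(open("PIK-IPLEX-2009-1.swf"))` would be the equivalent call, assuming the file has been downloaded to the working directory.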
The Log in Graphics
File PIK-IPLEX-2009-1.swf
 
 
 
 
 
 
 
 