This log contains several months' worth of accounting records from
MetaCentrum, the national grid of the Czech Republic.
This grid is composed of 14 clusters (called nodes), each with several
multiprocessor machines, for a total of 806 processors.
The MetaCentrum workload log was graciously provided by
Czech National Grid Infrastructure MetaCentrum.
If you use this log in your work, please use a similar
acknowledgment.
It was made available via the web page of Dalibor Klusacek.
Data about failures and maintenance is also available.
MetaCentrum is composed of 14 Linux clusters, with different
configurations, as follows:
Cluster   Processor          Nodes   Total CPUs
   0      Itanium2 1.5GHz       8            8
   1      Opteron 2.2GHz       16           16
   2      Xeon 3.2GHz          10           10
   3      Opteron 2.6GHz        5           80
   4      AthlonMP 1.6GHz      16           32
   5      Xeon 2.4GHz          32           64
   6      Xeon 2.7GHz          36          148
   7      Xeon 3.1GHz          35           70
   8      Opteron 1.6GHz       10           20
   9      Opteron 2.4GHz        3            6
  10      Opteron 2.0GHz       23           92
  11      Xeon 3.0GHz          19          152
  12      Xeon 2.7GHz           8           64
  13      Xeon 2.3GHz          11           44
Jobs could run on processors from more than one cluster.
While relatively rare, this did happen for 586 jobs in the log.
Scheduling is done with PBSpro, employing a system of 11 queues as
follows:
Queue   Priority   Time limit (hr)
 q1        62          720
 q2        70          720
 q3        50           24
 q4        60            2
 q5        80           24
 q6        65          720
 q7        70          720
 q8        70            4
 q9        70          720
 q10       99          720
 q11       65          720
Importantly, data about failures and other special circumstances is
provided together with the log.
This is considered important for reliable evaluations, and in fact is
the main point of the paper that introduced this log.
The original log is available as METACENTRUM-2009-0.
This file contains one line per completed job, with the following
tab-separated fields:
 1. Job ID
 2. User
 3. Queue
 4. Number of processors used
 5. Number of grid clusters used (originally called nodes)
 6. Properties required by the application (given as a list of
    property numbers)
 7. Memory used (KB)
 8. Arrival time (UTC timestamp)
 9. Start time (UTC timestamp)
10. End time (UTC timestamp)
11. Duration (seconds)
12. Exit status
13. List of assigned processors (space separated)
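As an illustration, one line in this format could be parsed as follows.
This is a minimal sketch: the dictionary keys are taken from the field
list above, and the separator within the properties field is assumed to
be whitespace.

```python
# Sketch of parsing one line of the original MetaCentrum log format.
# Field order follows the list above; key names are illustrative.
def parse_job(line):
    fields = line.rstrip("\n").split("\t")
    return {
        "job_id": fields[0],
        "user": fields[1],
        "queue": fields[2],
        "num_processors": int(fields[3]),
        "num_clusters": int(fields[4]),
        "properties": fields[5].split(),   # separator assumed
        "memory_kb": int(fields[6]),
        "arrival": int(fields[7]),
        "start": int(fields[8]),
        "end": int(fields[9]),
        "duration": int(fields[10]),
        "exit_status": int(fields[11]),
        "processors": fields[12].split(),  # space-separated CPU list
    }
```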
Conversion Notes
The converted log is available as METACENTRUM-2009-2.swf.
The conversion from the original format to SWF was done subject to the
following:
The status 0 was taken to mean success, and was converted to 1.
All other status values were converted to 0.
1118 jobs were recorded as using 0 memory; this was changed to -1.
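The two mappings above can be sketched as follows; this is a minimal
illustration of the stated rules (SWF uses 1 for success, 0 for
failure, and -1 for missing data).

```python
def swf_status(exit_status):
    # Original status 0 means success -> SWF status 1;
    # all other status values -> 0.
    return 1 if exit_status == 0 else 0

def swf_memory(memory_kb):
    # Jobs recorded as using 0 memory are marked unknown (-1) in the SWF.
    return -1 if memory_kb == 0 else memory_kb
```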
The conversion loses the following data, which cannot be
represented in the SWF:
The number of clusters used by each job, as given
in the used clusters field.
The precise list of processors allocated to the job, and
which clusters they belong to.
The properties required by the application.
Note that the meaning of the properties is unknown; they are
simply listed as p1, p2, p3, etc.
The following anomalies were identified in the conversion:
Of the 1118 jobs recorded as using 0 memory (changed to -1, as noted
above), 8 had "success" status.
All the jobs in the log passed the following two sanity checks:
the duration was equal to the difference between the start and
end times, and the length of the list of assigned processors was
equal to the number of assigned processors.
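These two checks can be expressed directly on the parsed fields.
A minimal sketch, with illustrative parameter names:

```python
def passes_sanity_checks(start, end, duration, num_processors, processor_list):
    # Check 1: the recorded duration matches the start-to-end interval.
    # Check 2: the processor list length matches the processor count.
    return (duration == end - start and
            len(processor_list) == num_processors)
```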
The difference between the first conversion (reflected in
METACENTRUM-2009-1.swf) and the second conversion (reflected in
METACENTRUM-2009-2.swf) is that in the first conversion cluster data
was not recovered, whereas in the second conversion it was extracted
using the CPU IDs specified for each job.
For jobs that use more than one cluster, only the first one is noted.
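The exact encoding of the CPU IDs is not documented here, so the
following is only a hypothetical sketch: assuming CPUs are numbered
globally, cluster by cluster, in the order of the configuration table
above, a CPU ID could be mapped to its cluster by a search over the
cumulative cluster sizes.

```python
import bisect

# Total CPUs per cluster 0..13, from the configuration table above.
CPUS_PER_CLUSTER = [8, 16, 10, 80, 32, 64, 148, 70, 20, 6, 92, 152, 64, 44]

# Cumulative upper bounds: cluster i covers CPU IDs [bounds[i-1], bounds[i]).
_bounds = []
_total = 0
for n in CPUS_PER_CLUSTER:
    _total += n
    _bounds.append(_total)

def cluster_of(cpu_id):
    # Hypothetical mapping: first cluster whose ID range contains cpu_id.
    return bisect.bisect_right(_bounds, cpu_id)
```

Under this assumption, a multi-cluster job would be assigned the
cluster of the first CPU in its list.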
Flurries seem to exist but have not been cleaned yet.
The log contains all the jobs that terminated in the logging period.
Some of these jobs are extremely long, as the maximal runtime allowed
on this system is 30 days.
Thus some of the logged jobs may have started up to 30 days before the
start of the logging period.
As a result the initial portion of the log is extremely sparse.
This effect also occurs (to a lesser degree) towards the end of the
log: extremely long jobs that run in this period do not appear,
because they did not terminate by the end of the logging period.