The University of Luxembourg Gaia Cluster Log
System: University of Luxembourg Gaia Cluster
Duration: May to August 2014
Jobs: 51,987
This log contains three months' worth of data from the Gaia
cluster at the University of Luxembourg.
The cluster is used mainly by biologists working on large-data problems
and by engineers running physical simulations.
The workload data includes CPU and memory usage; I/O activity is
provided in a separate file, as I/O is not accommodated by the
standard workload format.
The workload log from the Gaia cluster system was graciously provided
by Joseph Emeras (joseph.emeras@gmail.com).
If you use this log in your work, please use a similar acknowledgment.
Downloads:
(May need to click with right mouse button to save to disk)
System Environment
The Gaia cluster is one of the 4 clusters operated by the ULHPC
(University of Luxembourg HPC Center).
Initially released in 2011, Gaia is now a heterogeneous cluster that
has been upgraded several times.
It currently features 151 nodes, manufactured by Bull and Dell, with a
total of 2004 cores.
20 of the nodes feature NVidia Tesla-class GPGPU accelerators.
Full details about its configuration and history are available from
the University of Luxembourg site.
The scheduler used is OAR (oar.imag.fr).
Log Format
The log is available directly in SWF.
It is based on accounting data collected by the scheduler.
In addition, a companion
log with I/O data is available.
For each job, it lists the total amount of data read and written by
all the processes of this job.
The job ID field is the same as in the SWF files, to enable merging the data.
Conversion Notes
No information is available about problems encountered during the
conversion process.
Nevertheless, an SWF parser
(customized for this log) was used in conjunction with a general
converter module to check the file.
The following anomalies were observed and in some cases corrected:
- In 99 jobs runtime was missing and approximated using CPU time.
- In 28 additional "failed" jobs both runtime and CPU time were missing.
- 2,880 jobs were recorded as using 0 CPU time; this was changed to -1.
Of these, 155 had "failed" status, but 2,626 had "success" status.
- 1,464 jobs were recorded as using 0 memory; this was changed to -1.
Of these, 64 had "failed" status, but 1,400 had "success" status.
- 1,500 jobs got more runtime than they requested.
In 285 cases the extra runtime was larger than 1 min.
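The corrections listed above amount to a cleanup pass over the SWF records. The following is a minimal sketch of such a pass, assuming the standard SWF field order (0-based: 3 = run time, 5 = average CPU time used, 6 = used memory); it is not the converter actually used for this log.

```python
# Sketch of the anomaly corrections described above, applied to one
# SWF record (a list of string fields). Field positions follow the
# standard SWF layout (an assumption about this log's converter):
#   3 = run time, 5 = average CPU time used, 6 = used memory.
# -1 is the SWF convention for "unknown".

def clean_record(fields):
    """Return a corrected copy of one SWF record's numeric fields."""
    f = [float(x) for x in fields]
    if f[3] <= 0 and f[5] > 0:
        f[3] = f[5]   # missing runtime approximated using CPU time
    if f[5] == 0:
        f[5] = -1     # zero CPU time treated as unknown
    if f[6] == 0:
        f[6] = -1     # zero memory usage treated as unknown
    return f
```

Note that a record can trigger more than one rule, which is consistent with the overlapping counts reported above (e.g. jobs with both zero CPU time and zero memory).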
Due to the heterogeneity of the cluster it is not clear that all jobs
received the same level of service.
This may affect their wait times and maybe also the activity patterns
of certain users.
Activity in the first 4-5 days is very low and probably reflects
remnants of activity from before logging actually started.
There appears to be a flurry by user 8 towards the end of the log;
this has not been cleaned yet.
The Log in Graphics
File UniLu-Gaia-2014-2.swf