These one-year logs, presented and thoroughly analyzed in [li04], are fundamentally different from the rest of the logs in the archive, in that they record workloads produced by parallel and distributed computing research communities (located at five different universities in the Netherlands), rather than the workload of "regular" production machine users. This fact is reflected in the very low utilization exhibited in the logs. DAS2 stands for "Distributed ASCI Supercomputer-2". ASCI stands for "Advanced School for Computing and Imaging in the Netherlands".
DAS2 is essentially a grid composed of five clusters, and therefore co-allocation (running a single job on two or more remote clusters) is possible and actually used. Unfortunately, there is no record of co-allocation, that is, of whether two (or more) seemingly distinct jobs running in two (or more) clusters actually compose a single co-allocated job. For this reason, we have chosen not to merge the five traces into one (as this would be misleading). The original logs contain most of the data specified in the SWF, and therefore the converted SWF version loses no data (see details below). Further information about DAS2 is available at http://www.cs.vu.nl/das2.
The origin of the five DAS2 traces is:
# | Cluster Name | Location | CPUs | Jobs |
1 | fs0 | Vrije Univ. Amsterdam | 144 | 225,711 |
2 | fs1 | Leiden Univ. | 64 | 40,315 |
3 | fs2 | Univ. of Amsterdam | 64 | 66,429 |
4 | fs3 | Delft Univ. of Technology | 64 | 66,737 |
5 | fs4 | Utrecht Univ. | 64 | 33,795 |
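The "very low utilization" noted above can be checked per cluster once the CPU counts in this table are known. The following is a minimal sketch, assuming utilization is defined as consumed processor-seconds divided by available processor-seconds; the function name, the job tuples, and the one-hour duration are made up for illustration and are not taken from the logs.

```python
def utilization(jobs, total_cpus, log_duration):
    """Fraction of available processor-seconds consumed by jobs.

    jobs: iterable of (run_time_seconds, allocated_processors) pairs.
    Entries with unknown values (-1, as SWF marks missing data) are skipped.
    """
    used = sum(run * procs for run, procs in jobs if run > 0 and procs > 0)
    return used / (total_cpus * log_duration)


# Made-up toy workload on a 64-CPU cluster observed for one hour.
toy_jobs = [(600, 4), (1800, 16), (-1, -1), (300, 2)]
toy_util = utilization(toy_jobs, total_cpus=64, log_duration=3600)
```

For a real trace, `total_cpus` would be the CPUs value of the cluster in the table above and `log_duration` the time span covered by the log.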
The workload logs from DAS2 were graciously provided by the authors of [li04]:
Author | From | Email |
Hui Li | Leiden Univ. | hli AT liacs.nl |
David Groep | National Institute for Nuclear High Energy Physics, The Netherlands | davidg AT nikhef.nl |
Lex Wolters | Leiden Univ. | llexx AT liacs.nl |
# | Cluster Name | File | Size |
1 | fs0 | DAS2-fs0-2003-1.swf | 2.2M |
2 | fs1 | DAS2-fs1-2003-1.swf | 376K |
3 | fs2 | DAS2-fs2-2003-1.swf | 641K |
4 | fs3 | DAS2-fs3-2003-1.swf | 575K |
5 | fs4 | DAS2-fs4-2003-1.swf | 311K |
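The files listed above can be read with a few lines of code. The sketch below assumes the standard SWF layout: comment and header lines start with `;`, and each job record holds 18 whitespace-separated numeric fields. The short field names and the two sample records are made up for illustration.

```python
from collections import Counter

# The 18 fields of an SWF job record, in order (short names are our own).
SWF_FIELDS = [
    "job", "submit", "wait", "run", "procs", "cpu_time", "mem",
    "req_procs", "req_time", "req_mem", "status", "user", "group",
    "executable", "queue", "partition", "prev_job", "think_time",
]


def to_number(text):
    """Return an int when the value is integral, otherwise a float."""
    value = float(text)
    return int(value) if value.is_integer() else value


def parse_swf(lines):
    """Yield one dict per job record, skipping blank and ';' comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        values = line.split()
        if len(values) != len(SWF_FIELDS):
            raise ValueError(f"expected {len(SWF_FIELDS)} fields: {line!r}")
        yield dict(zip(SWF_FIELDS, map(to_number, values)))


# Made-up sample: one completed job and one job cancelled before it started.
sample = """\
; Version: 2
; MaxProcs: 64
1 0 5 120 4 -1 -1 4 900 -1 1 3 1 -1 1 -1 -1 -1
2 30 45 -1 -1 -1 -1 -1 -1 -1 5 -1 -1 -1 -1 -1 -1 -1
""".splitlines()

records = list(parse_swf(sample))
status_counts = Counter(job["status"] for job in records)
```

To process a real log, pass an open file instead of `sample`, for example `parse_swf(open("DAS2-fs1-2003-1.swf"))`.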
The content of the original gzipped tar file DAS2-2003-0.tgz
Record fields structure of *.trace files
Record fields structure of *.cancelled files
The issue of cancelled jobs
The *.trace files contain only some of the cancelled jobs: those that were cancelled by users after they were started. The *.cancelled files contain these started jobs as well as jobs that were cancelled by users before they were started. Consequently, the JobID field (of OpenPBS), as described above, may be used as a cross-reference between an fsN.trace file and the associated fsN.cancelled file. The number of jobs reported in [li04] is smaller than the number of jobs found in the *.swf files because the data of fsN.trace and fsN.cancelled has been merged. Unfortunately, the only information available about jobs that were cancelled before being started is their submission time and the "lag" time (which is assigned to the "wait" SWF field); all other fields of such jobs are set to -1. There are inconsistencies between fsN.trace and the associated fsN.cancelled: for example, jobs with OpenPBS-Status=0 (successful completion) in the former sometimes appear in the latter. We have chosen to use fsN.cancelled as the definitive criterion for determining whether a job was cancelled, so any job that appears in fsN.cancelled has SWF-status=5 (cancelled), even if the data in fsN.trace indicates otherwise.
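The merge rule described above can be sketched as follows. This is only an illustration of the stated policy, not the conversion script actually used: the dictionary layouts, key names, and sample JobIDs are hypothetical, and the trace records are assumed to carry an already converted SWF status.

```python
SWF_CANCELLED = 5  # SWF status code for a cancelled job


def merge_records(trace, cancelled):
    """Combine the records of one cluster, keyed by OpenPBS JobID.

    trace:     {job_id: {"submit": ..., "wait": ..., "run": ..., "status": ...}}
    cancelled: {job_id: {"submit": ..., "lag": ...}}
    Both layouts are hypothetical stand-ins for the real file formats.
    """
    merged = {job_id: dict(rec) for job_id, rec in trace.items()}
    for job_id, rec in cancelled.items():
        if job_id in merged:
            # fsN.cancelled is definitive: override whatever fsN.trace says.
            merged[job_id]["status"] = SWF_CANCELLED
        else:
            # Cancelled before starting: only submit time and lag are known,
            # so the lag becomes the SWF wait time and the rest is -1.
            merged[job_id] = {
                "submit": rec["submit"],
                "wait": rec["lag"],
                "run": -1,
                "status": SWF_CANCELLED,
            }
    return merged


# Made-up example: job 102 looks successful in the trace but is also listed
# as cancelled; job 103 appears only in the cancelled file.
trace = {
    "101": {"submit": 0, "wait": 5, "run": 120, "status": 1},
    "102": {"submit": 10, "wait": 2, "run": 60, "status": 1},
}
cancelled = {
    "102": {"submit": 10, "lag": 2},
    "103": {"submit": 50, "lag": 20},
}
merged = merge_records(trace, cancelled)
```

In this toy example the merged result holds three jobs although the trace holds two, which mirrors why the *.swf files contain more jobs than reported in [li04].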
Errors in the original files (as reported in the swf/*.err files).
File DAS2-fs1-2003-1.swf
File DAS2-fs2-2003-1.swf
File DAS2-fs3-2003-1.swf
File DAS2-fs4-2003-1.swf