################################################################################
1) The header section of the SWF file:
################################################################################
; Version: 2.2
; Computer: 3GHz Pentium-IV Xeon Linux Cluster
; Installation: LPC (Laboratoire de Physique Corpusculaire)
; Installation: Part of the LCG (Large Hadron Collider Computing Grid project)
; Information: http://www.cs.huji.ac.il/labs/parallel/workload/l_lpc.html
; Information: JSSPP'05 - Workload Analysis of a Cluster in a Grid Environment
; Acknowledge: Emmanuel Medernach - medernac AT clermont.in2p3.fr
;
; Conversion: Dan Tsafrir - dants AT cs.huji.ac.il
; Conversion: Automatically generated by 'pbs2swf.pl'
; Conversion: See: http://www.cs.huji.ac.il/labs/parallel/workload/pbs2swf/
;
; UnixStartTime: 1091615532
; TimeZone: 3600 # DEPRECATED; use TimeZoneString + tzset()
; TimeZoneString: Europe/Paris
; StartTime: Wed Aug 04 12:32:12 CEST 2004
; EndTime:   Wed May 11 16:07:34 CEST 2005
;
; MaxJobs: 244821
; MaxRecords: 244821
; MaxNodes: 70 (dual)
; MaxProcs: 140
; MaxRuntime: 259200
; AllowOveruse: False
; Preemption: No
;
; Queues: Queues enforce a runtime limit on the jobs that populate them.
; Queues: See URL in 'Information' for details.
; Queues: -----------------
; Queues: SWF-ID  PBS-name
; Queues: -----------------
; Queue:       1  test
; Queue:       2  short
; Queue:       3  long
; Queue:       4  day
; Queue:       5  infinite
; Queue:       6  batch
; MaxQueues: 6
;
; Partitions: One small partition, later replaced by two disjoint partitions.
; Partitions: See URL in 'Information' for details.
; Partitions: -----------------
; Partitions: SWF-ID  PBS-name
; Partitions: -----------------
; Partition:       1  clrglop195.in2p3.fr
; Partition:       2  clrce01.in2p3.fr
; Partition:       3  clrce02.in2p3.fr
; MaxPartitions: 3
;
; Note: The 'mem_used' column relates to: quantity=per_job type=virtual
; Note: The 'cpu_used' column relates to: quantity=per_process
; Note: The 'mem_req' column relates to: quantity=per_job type=physical
; Note: proc_used=0 means the job was canceled before it started to run
; Note: Jobs are always serial.
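As the header itself notes, the integer TimeZone field (3600) is deprecated in favor of TimeZoneString, presumably because a single fixed offset cannot express daylight-saving changes such as the CEST timestamps above. A minimal Python sketch (an illustration only, not part of the archive's tooling) of how UnixStartTime and TimeZoneString recover the human-readable StartTime:

import os
import time

# Values copied from the SWF header above.
UNIX_START_TIME = 1091615532
TIME_ZONE_STRING = "Europe/Paris"

# Interpret the epoch timestamp in the trace's own time zone.
# (time.tzset() exists only on Unix-like systems.)
os.environ["TZ"] = TIME_ZONE_STRING
time.tzset()

print(time.strftime("%a %b %d %H:%M:%S %Z %Y", time.localtime(UNIX_START_TIME)))
# -> Wed Aug 04 12:32:12 CEST 2004, matching the StartTime header line
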
################################################################################
2) GENERAL:
################################################################################
jobs                  : 244821
started               : 230448
canceled before start : 14373
mem_req               : type=physical quantity=per_job
mem_used              : type=virtual  quantity=per_job
cpu_used              : quantity=per_process

################################################################################
3) PBS FIELDS USED FOR CONVERSION (jobs_count[pbsfield]):
################################################################################
SWF FIELD  | CONVERT# | PBS FIELD
submit     | 230448   | 230448[ctime]
wait       | 230448   | 230448[start-ctime]
runtime    | 230448   | 352[etime-ctime] + 230096[resources_used.walltime]
proc_used  | 230448   | 8[Resource_List.nodes] + 205461[Resource_List.nodect] + 24979[default=1]
cpu_used   | 230448   | 230096[resources_used.cput] + 352[failures]
mem_used   | 230448   | 230096[resources_used.vmem] + 352[failures]
proc_req   | 230448   | 8[Resource_List.nodes] + 205461[Resource_List.nodect] + 24979[default=1]
cpu_req    | 230448   | 230448[Resource_List.walltime]
mem_req    | 230448   | 480[Resource_List.mem] + 229968[failures]
status     | 230448   | 230448[Exit_status]
uid        | 230448   | 230448[user]
gid        | 230448   | 230448[group]
executable | 230448   | 230448[jobname]   // default=-1 overwrites all values
queue      | 244821   | 244821[queue]
partition  | 244821   | 244821[partition]
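Several of the mappings above turn PBS duration strings (HH:MM:SS, e.g. Resource_List.walltime=72:00:00) and memory strings (e.g. resources_used.vmem=31920kb), both visible in the example record shown in Section 5 below, into the integer seconds and kilobytes used by the SWF columns. A minimal sketch of those two conversions, written in Python purely for illustration (the actual pbs2swf.pl code is a Perl script and is not reproduced here):

def pbs_duration_to_seconds(value: str) -> int:
    """Convert a PBS 'HH:MM:SS' duration (e.g. '72:00:00') to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def pbs_memory_to_kb(value: str) -> int:
    """Convert a PBS memory string (e.g. '31920kb') to kilobytes."""
    units = {"kb": 1, "mb": 1024, "gb": 1024 * 1024}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * factor
    return int(value)  # assumption: a bare number is already in KB

# Values taken from the example record in Section 5:
assert pbs_duration_to_seconds("72:00:00") == 259200   # Resource_List.walltime  -> cpu_req
assert pbs_duration_to_seconds("00:52:51") == 3171     # resources_used.cput     -> cpu_used
assert pbs_memory_to_kb("31920kb") == 31920            # resources_used.vmem     -> mem_used
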
################################################################################
4) 5 MOST POPULAR VALUES (before applying 'overwrite' defaults):
################################################################################
SWF FIELD  | 5 MOST POPULAR VALUES
wait       | 3[20%] 1[16%] 4[14%] 2[11%] 5[ 8%]                      (sum=68%)
runtime    | 5[ 7%] 6[ 7%] -1[ 6%] 7[ 5%] 2[ 4%]                     (sum=28%)
proc_used  | 1[94%] 0[ 6%]                                           (sum=100%)
cpu_used   | 3[23%] 2[23%] 0[13%] -1[ 6%] 30[ 2%]                    (sum=68%)
mem_used   | 0[47%] -1[ 6%] 4888[ 3%] 4896[ 2%] 4880[ 1%]            (sum=58%)
proc_req   | 1[100%]                                                 (sum=100%)
cpu_req    | 7200[29%] 259200[20%] 900[19%] 86400[13%] 129600[ 8%]   (sum=89%)
mem_req    | -1[100%] 819200[ 0%]                                    (sum=100%)
status     | 1[89%] 5[ 7%] 0[ 4%]                                    (sum=100%)
uid        | 1[17%] 27[13%] 2[10%] 24[10%] 16[ 9%]                   (sum=59%)
gid        | 2[44%] 1[39%] -1[ 6%] 3[ 4%] 5[ 3%]                     (sum=96%)
executable | 1[88%] 6[ 6%] -1[ 6%] 5[ 0%] 3[ 0%]                     (sum=100%)
queue      | 2[36%] 5[21%] 1[20%] 3[14%] 4[ 8%]                      (sum=99%)
partition  | 2[49%] 3[29%] 1[22%]                                    (sum=100%)

################################################################################
5) PROBLEMS SUMMARY:
################################################################################
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PROBLEM:     FLOW_ARRIVAL
OUTCOME:     problem in flow of job's records; lost job
RECORDS#:    1
DESCRIPTION: job's first record <> Q (1st Q indicates arrival)
EXAMPLES:
  1: 11/29/2004 09:12:49;E;290.clrglop195.in2p3.fr;user=biomed002 group=biomed jobname=STDIN queue=infinite ctime=1101456561 qtime=1101456561 etime=1101456561 start=1101456686 exec_host=clrwn26.in2p3.fr/1 Resource_List.cput=48:00:00 Resource_List.neednodes=clrwn26.in2p3.fr Resource_List.nodect=1 Resource_List.nodes=1 Resource_List.walltime=72:00:00 session=20919 end=1101715969 Exit_status=271 resources_used.cput=00:52:51 resources_used.mem=18132kb resources_used.vmem=31920kb resources_used.walltime=72:01:21 [ type=E ]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PROBLEM:     FLOW_UNFINISHED
OUTCOME:     informative; lost job
RECORDS#:    51
DESCRIPTION: PBS-log does not include an ending record of the job
EXAMPLES:
   1: 78875.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
   2: 78876.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
   3: 78877.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
   4: 78878.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
   5: 78879.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
   6: 78880.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
   7: 78881.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
   8: 78882.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
   9: 78883.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
  10: 78884.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
  11: 78885.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
  12: 78886.clrce01.in2p3.fr      [ last state = CANCELED_RUNNING ]
  13: 108444.clrce01.in2p3.fr     [ last state = CANCELED_RUNNING ]
  14: 122240.clrce01.in2p3.fr     [ last state = WAITING ]
  15: 122241.clrce01.in2p3.fr     [ last state = WAITING ]
  16: 5393.clrce02.in2p3.fr       [ last state = CANCELED_RUNNING ]
  17: 5395.clrce02.in2p3.fr       [ last state = CANCELED_RUNNING ]
  18: 49097.clrce02.in2p3.fr      [ last state = CANCELED_RUNNING ]
  19: 49923.clrce02.in2p3.fr      [ last state = RUNNING ]
  20: 49924.clrce02.in2p3.fr      [ last state = RUNNING ]
  21: 49925.clrce02.in2p3.fr      [ last state = RUNNING ]
  22: 49926.clrce02.in2p3.fr      [ last state = RUNNING ]
  23: 49927.clrce02.in2p3.fr      [ last state = RUNNING ]
  24: 49929.clrce02.in2p3.fr      [ last state = RUNNING ]
  25: 49930.clrce02.in2p3.fr      [ last state = RUNNING ]
  26: 49931.clrce02.in2p3.fr      [ last state = RUNNING ]
  27: 36331.clrglop195.in2p3.fr   [ last state = WAITING ]
  28: 36332.clrglop195.in2p3.fr   [ last state = WAITING ]
  29: 36337.clrglop195.in2p3.fr   [ last state = WAITING ]
  30: 36340.clrglop195.in2p3.fr   [ last state = WAITING ]
  31: 36341.clrglop195.in2p3.fr   [ last state = WAITING ]
  32: 36343.clrglop195.in2p3.fr   [ last state = WAITING ]
  33: 36344.clrglop195.in2p3.fr   [ last state = WAITING ]
  34: 36346.clrglop195.in2p3.fr   [ last state = WAITING ]
  35: 36349.clrglop195.in2p3.fr   [ last state = WAITING ]
  36: 36350.clrglop195.in2p3.fr   [ last state = WAITING ]
  37: 36351.clrglop195.in2p3.fr   [ last state = WAITING ]
  38: 36353.clrglop195.in2p3.fr   [ last state = WAITING ]
  39: 36354.clrglop195.in2p3.fr   [ last state = WAITING ]
  40: 36356.clrglop195.in2p3.fr   [ last state = WAITING ]
  41: 36360.clrglop195.in2p3.fr   [ last state = WAITING ]
  42: 36363.clrglop195.in2p3.fr   [ last state = WAITING ]
  43: 36364.clrglop195.in2p3.fr   [ last state = WAITING ]
  44: 36366.clrglop195.in2p3.fr   [ last state = WAITING ]
  45: 36395.clrglop195.in2p3.fr   [ last state = RUNNING ]
  46: 36406.clrglop195.in2p3.fr   [ last state = WAITING ]
  47: 36583.clrglop195.in2p3.fr   [ last state = RUNNING ]
  48: 36588.clrglop195.in2p3.fr   [ last state = WAITING ]
  49: 38074.clrglop195.in2p3.fr   [ last state = WAITING ]
  50: 38081.clrglop195.in2p3.fr   [ last state = WAITING ]
  51: 38082.clrglop195.in2p3.fr   [ last state = WAITING ]
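Both problem classes above concern the flow of a job's records through the PBS accounting log: a job is lost if its first record is not a Q (arrival) record, or if the log never contains an ending (E) record for it. A minimal sketch of just these two checks, assuming records have already been grouped per job ID and reduced to their one-letter record types; the actual pbs2swf.pl state machine (which also tracks states such as CANCELED_RUNNING) is more elaborate:

from typing import Dict, List

def flow_problems(records_by_job: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each problematic job ID to the flow problem it exhibits.

    records_by_job: job ID -> chronologically ordered record types,
    e.g. {"290.clrglop195.in2p3.fr": ["E"]} for the FLOW_ARRIVAL example.
    """
    problems = {}
    for job_id, types in records_by_job.items():
        if not types or types[0] != "Q":   # first record must be Q (arrival)
            problems[job_id] = "FLOW_ARRIVAL"
        elif "E" not in types:             # no ending record found in the log
            problems[job_id] = "FLOW_UNFINISHED"
    return problems

# The two kinds of lost jobs reported above (record sequences are illustrative):
print(flow_problems({
    "290.clrglop195.in2p3.fr": ["E"],        # first record is E, not Q
    "78875.clrce01.in2p3.fr":  ["Q", "S"],   # queued and started, never ended
}))
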
################################################################################
6) ANONYMIZATION TABLES (before applying 'overwrite' defaults):
################################################################################
---------- anonymizing 3 'partition' values ----------
clrglop195.in2p3.fr ->  1 [ 53828 = 22.0% of the jobs]
clrce01.in2p3.fr    ->  2 [120123 = 49.1% of the jobs]
clrce02.in2p3.fr    ->  3 [ 70870 = 28.9% of the jobs]

---------- anonymizing 6 'queue' values ----------
test     ->  1 [ 48062 = 19.6% of the jobs]
short    ->  2 [ 88557 = 36.2% of the jobs]
long     ->  3 [ 34372 = 14.0% of the jobs]
day      ->  4 [ 19764 =  8.1% of the jobs]
infinite ->  5 [ 52553 = 21.5% of the jobs]
batch    ->  6 [  1513 =  0.6% of the jobs]

---------- anonymizing 10 'gid' values ----------
dteam    ->  1 [ 94474 = 38.6% of the jobs]
dteam005 ->  1 [ 94474 = 38.6% of the jobs]
biomed   ->  2 [108348 = 44.3% of the jobs]
biomgrid ->  2 [108348 = 44.3% of the jobs]
lhcb     ->  3 [  9709 =  4.0% of the jobs]
alice    ->  4 [    38 =  0.0% of the jobs]
cms      ->  5 [  8564 =  3.5% of the jobs]
atlas    ->  6 [  7979 =  3.3% of the jobs]
sixt001  ->  7 [     4 =  0.0% of the jobs]
dzero    ->  8 [  1332 =  0.5% of the jobs]

---------- anonymizing 56 'uid' values ----------
dteam001  ->  1 [ 42363 = 17.3% of the jobs]
biomed001 ->  2 [ 24073 =  9.8% of the jobs]
dteam002  ->  3 [  5351 =  2.2% of the jobs]
dteam004  ->  4 [   599 =  0.2% of the jobs]
lhcbsgm   ->  5 [    12 =  0.0% of the jobs]
dteam005  ->  6 [  2100 =  0.9% of the jobs]
lhcb001   ->  7 [  5762 =  2.4% of the jobs]
alice001  ->  8 [     2 =  0.0% of the jobs]
dteam003  ->  9 [  1091 =  0.4% of the jobs]
dteam006  -> 10 [   305 =  0.1% of the jobs]
dteam007  -> 11 [   142 =  0.1% of the jobs]
cms001    -> 12 [    81 =  0.0% of the jobs]
lhcb002   -> 13 [  2335 =  1.0% of the jobs]
atlas001  -> 14 [   734 =  0.3% of the jobs]
atlas002  -> 15 [   889 =  0.4% of the jobs]
biomed002 -> 16 [ 22996 =  9.4% of the jobs]
biomed003 -> 17 [  7996 =  3.3% of the jobs]
biomed004 -> 18 [  5682 =  2.3% of the jobs]
biomgrid  -> 19 [ 16006 =  6.5% of the jobs]
lhcb003   -> 20 [  1600 =  0.7% of the jobs]
atlassgm  -> 21 [  3052 =  1.2% of the jobs]
dteam008  -> 22 [   729 =  0.3% of the jobs]
dteam009  -> 23 [   618 =  0.3% of the jobs]
biomed005 -> 24 [ 23273 =  9.5% of the jobs]
dteam010  -> 25 [   123 =  0.1% of the jobs]
dteam037  -> 26 [     1 =  0.0% of the jobs]
dteam042  -> 27 [ 32651 = 13.3% of the jobs]
biomed006 -> 28 [   193 =  0.1% of the jobs]
biomed007 -> 29 [    19 =  0.0% of the jobs]
sixt001   -> 30 [     4 =  0.0% of the jobs]
dteamsgm  -> 31 [    14 =  0.0% of the jobs]
dteam011  -> 32 [    73 =  0.0% of the jobs]
cms002    -> 33 [     9 =  0.0% of the jobs]
cms003    -> 34 [     1 =  0.0% of the jobs]
dteam015  -> 35 [  6927 =  2.8% of the jobs]
biomed015 -> 36 [  8109 =  3.3% of the jobs]
biomed008 -> 37 [     1 =  0.0% of the jobs]
alicesgm  -> 38 [     1 =  0.0% of the jobs]
cmssgm    -> 39 [     8 =  0.0% of the jobs]
dteam012  -> 40 [   138 =  0.1% of the jobs]
dteam013  -> 41 [     9 =  0.0% of the jobs]
dzerosgm  -> 42 [    46 =  0.0% of the jobs]
dzero001  -> 43 [  1270 =  0.5% of the jobs]
atlas003  -> 44 [   718 =  0.3% of the jobs]
dzero002  -> 45 [    16 =  0.0% of the jobs]
dteam050  -> 46 [   424 =  0.2% of the jobs]
aliprod   -> 47 [    35 =  0.0% of the jobs]
atlas004  -> 48 [  1786 =  0.7% of the jobs]
dteam014  -> 49 [     4 =  0.0% of the jobs]
atlas005  -> 50 [   451 =  0.2% of the jobs]
atlas006  -> 51 [   210 =  0.1% of the jobs]
atlas007  -> 52 [   138 =  0.1% of the jobs]
atlas008  -> 53 [     1 =  0.0% of the jobs]
dteam017  -> 54 [   808 =  0.3% of the jobs]
cms050    -> 55 [  8465 =  3.5% of the jobs]
dteam040  -> 56 [     4 =  0.0% of the jobs]

---------- anonymizing 12 'executable' values ----------
STDIN       ->  1 [215125 = 87.9% of the jobs]
toto        ->  2 [     2 =  0.0% of the jobs]
essai       ->  3 [    42 =  0.0% of the jobs]
toto.job    ->  4 [     1 =  0.0% of the jobs]
essai.job   ->  5 [   217 =  0.1% of the jobs]
test.job    ->  6 [ 15023 =  6.1% of the jobs]
test1.job   ->  7 [     2 =  0.0% of the jobs]
testjob.scr ->  8 [     7 =  0.0% of the jobs]
test.sh     ->  9 [     8 =  0.0% of the jobs]
job         -> 10 [    18 =  0.0% of the jobs]
job.sh      -> 11 [     2 =  0.0% of the jobs]
date        -> 12 [     1 =  0.0% of the jobs]
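The tables above suggest a simple anonymization scheme: each distinct raw value is replaced by a small integer, apparently assigned in order of first appearance, and some raw names share one SWF ID (e.g. both dteam and dteam005 map to gid 1). A minimal sketch of such a mapping, in Python for illustration; the ID-assignment order and the alias merging are assumptions about how these tables were produced, not documented behavior of pbs2swf.pl:

def anonymize(values, aliases=None):
    """Replace raw strings by integer IDs assigned in order of first appearance.

    aliases: optional dict mapping a raw value to the raw value whose ID it
    should share (e.g. {"dteam005": "dteam", "biomgrid": "biomed"}), mirroring
    the gid table above; this merging is an assumption for illustration.
    """
    aliases = aliases or {}
    ids = {}
    out = []
    for value in values:
        key = aliases.get(value, value)
        if key not in ids:
            ids[key] = len(ids) + 1   # SWF IDs start at 1
        out.append(ids[key])
    return out

# Example: queue names in order of first appearance yield the IDs shown above.
print(anonymize(["test", "short", "test", "long", "day", "infinite", "batch"]))
# [1, 2, 1, 3, 4, 5, 6]
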
################################################################################
7) OVERWRITING VALUES (possibly reported above as frequent/anonymized):
################################################################################
 1) overwriting field 'executable' with value '-1' [applied to all jobs]
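Because of this overwrite, the 'executable' anonymization table in Section 6 is informative only: in the released SWF file the executable column holds -1 for every job. A minimal Python sketch of reading the converted trace, assuming the standard 18-field SWF record layout (job number, submit, wait, runtime, allocated procs, average CPU time used, used memory, requested procs, requested time, requested memory, status, uid, gid, executable, queue, partition, preceding job, think time); the file name below is a placeholder, not the archive's actual file name:

def read_swf(path):
    """Yield one list of numeric fields per job, skipping ';' header comments."""
    with open(path) as swf:
        for line in swf:
            line = line.strip()
            if not line or line.startswith(";"):
                continue                  # SWF header/comment line
            yield [float(x) for x in line.split()]

# Hypothetical usage on this trace:
# for job in read_swf("l_lpc.swf"):
#     assert job[13] == -1                # field 14 (executable) overwritten with -1
#     assert job[4] in (0, 1)             # serial jobs; 0 = canceled before start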