The file pbs2swf.tgz contains all the files that compose the pbs2swf utility. These are:
# | file | description |
1 | pbs2swf.pl | The conversion script. Uses the modules below. |
2 | ConversionConfig.pm | Global configuration variables ("constants") that are used throughout. |
3 | ConversionLog.pm | Everything related to creating the conversion summary report. |
4 | ParseArgv.pm | Parses the command-line arguments of pbs2swf.pl and sets ConversionConfig.pm accordingly. |
5 | ParsePBS.pm | Performs the actual parsing of the PBS logs. |
6 | PrintSWF.pm | Prints the data parsed by ParsePBS.pm in SWF format. |
zcat LPC-EGEE-2004-0old.pbs.gz LPC-EGEE-2004-0ce1.pbs.gz LPC-EGEE-2004-0ce2.pbs.gz | pbs2swf.pl \
    \
    --output=l_lpc \
    \
    --proc_used=1,started \
    --proc_req=1,all \
    --executable=-1,all,overwrite \
    \
    --mem_req.type=physical \
    \
    --anonymize.partition=clrglop195.in2p3.fr:1 \
    --anonymize.partition=clrce01.in2p3.fr:2 \
    --anonymize.partition=clrce02.in2p3.fr:3 \
    \
    --anonymize.queue=test:1 \
    --anonymize.queue=short:2 \
    --anonymize.queue=long:3 \
    --anonymize.queue=day:4 \
    --anonymize.queue=infinite:5 \
    --anonymize.queue=batch:6 \
    \
    --anonymize.gid=dteam:1 \
    --anonymize.gid=dteam005:1 \
    --anonymize.gid=biomed:2 \
    --anonymize.gid=biomgrid:2 \
    \
    --Computer="3GHz Pentium-IV Xeon Linux Cluster" \
    --Installation="LPC (Laboratoire de Physique Corpusculaire)" \
    --Installation="Part of the LCG (Large hadron collider Computing Grid project)" \
    --Information="http://www.cs.huji.ac.il/labs/parallel/workload/l_lpc.html" \
    --Information="JSSPP'05 - Workload Analysis of a Cluster in a Grid Environment" \
    --Acknowledge="Emmanuel Medernach - medernac AT clermont.in2p3.fr" \
    --Conversion="Dan Tsafrir - dants AT cs.huji.ac.il" \
    --MaxNodes="70 (dual)" \
    --MaxProcs=140 \
    --TimeZoneString="Europe/Paris" \
    --MaxRuntime=259200 \
    --AllowOveruse=False \
    --Queues="Queues enforce a runtime limit on the jobs that populate them." \
    --Queues="See URL in 'Information' for details." \
    --Partitions="One small partition, later replaced by two disjoint partitions." \
    --Partitions="See URL in 'Information' for details." \
    --Note="Jobs are always serial."
option flag | meaning | details |
--proc_used=1,started --proc_req=1,all |
Number of requested (all jobs) and used (started jobs) processors is set to 1. | The size of all the jobs in the LPC log is 1. However, some PBS records are missing this data. We therefore decide that the number of requested processors (proc_req) of all the jobs is 1. The same holds for used processors (proc_used), but only for jobs that actually started to run. Jobs that were canceled before this point are always assigned proc_used=0 by the pbs2swf.pl script. |
--executable=-1,all,overwrite |
Set the executable of all jobs in the SWF version to be undefined (-1). | This data is actually available for all the started jobs (hence we overwrite it), but it is meaningless, because it specifies the names of the PBS submittal scripts rather than the names of actual applications. And so, almost 88% of the jobs specify "STDIN" as their executable name. Another 6% specify "test.job", and another 6% are jobs canceled before they started, so their executable name is missing from the PBS log altogether. This leaves only tens of jobs, which usually also have names like "test1.job", "job.sh", etc. |
--mem_req.type=physical |
SWF data regarding requested memory is associated with physical (rather than virtual) memory. | By default, pbs2swf.pl prefers extracting data from the PBS log that is associated with virtual memory. However, no such data is available in the LPC log, whereas some data specifying requested physical memory is in fact available (but only for 480 jobs). |
--anonymize.partition=* |
Explicitly associate PBS partitions with SWF codes that reflect the chronological order in which they were defined. | For example, the earliest partition is the 'old' one (clrglop195.in2p3.fr), and so it is set to be partition number 1. If SWF codes were not assigned explicitly, they would have been assigned arbitrarily by pbs2swf.pl. |
--anonymize.queue=* |
Explicitly associate PBS queues with SWF codes such that the bigger the code, the longer the jobs that may populate the queue. | For example, the 'test' queue has the smallest limit on the requested runtime of the jobs that may populate it, and so it is set to be queue number 1. |
--anonymize.gid=* |
Unite PBS groups that appear different but are actually the same. | For example, PBS jobs associated with groups 'dteam' and 'dteam005' actually originate from the same group (which is indeed collectively referred to as 'dteam' in [medernach05]). And so, they are both explicitly assigned the same SWF group code, 1. |
Others | Some predefined SWF header fields. | Including only those that pbs2swf.pl cannot compute by itself (those that can be computed may not be given as command-line options). |
The synopsis of pbs2swf.pl is:
./pbs2swf.pl <options> [PBS log(s)]
If no PBS files are given, pbs2swf.pl will attempt to read the PBS log(s) from stdin. The pbs2swf.pl script generates two files; see the --output option below.
The following is a description of all the available options ('mandatory' options must appear in the command line; 'multiple' options may appear more than once in the command line):
category | option flag | mandatory | multiple | default | details |
Defaults to SWF fields | wait |
Synopsis of the values associated with these flags:
<default_value>,<all|started|canceled>[,overwrite]
There is an option for each SWF field that can possibly be affected by the user's choice. For example, --proc_req=10,all means that all jobs for which the requested-processors value was unobtainable from the PBS log will be assigned 10 in this field. If --proc_req=10,all,overwrite is given, then 10 will be assigned as the requested-processors value of all the jobs, regardless of whether the associated data exists in the original PBS logs or not. Finally, if --proc_req=10,canceled is given, then only jobs that were canceled before they started are affected (note that once again, 'overwrite' is optional). Similarly, using 'started' will only affect jobs that actually started. Time fields are specified in seconds, and memory fields are specified in KB. |
proc_req | ||||||
cpu_req | ||||||
mem_req | ||||||
uid | ||||||
gid | ||||||
executable | ||||||
queue | ||||||
partition | ||||||
runtime |
The synopsis of the values associated with these flags is:
<default_value>,<started>[,overwrite]
This is similar in every respect to the group of flags explained above, but these fields only have meaning for jobs that actually started. And so, --runtime=3600,started means that all started jobs missing runtime data in the original PBS logs will be assigned a 1-hour runtime. |
proc_used | ||||||
cpu_used | ||||||
mem_used | ||||||
Attributes of SWF fields | mem_used.type | virtual | The values associated with 'type' attributes are either 'physical' or 'virtual'. The values associated with 'quantity' attributes are either 'per_job' or 'per_process' (the resource was consumed by a single process; to know the amount of resource consumed by the entire job one must multiply this by the job's size). For example, --mem_used.quantity=per_job means that the associated SWF column specifies the aggregate amount of memory used by all the processes composing the job. | |||
mem_req.type | virtual | |||||
mem_used.quantity | per_job | |||||
mem_req.quantity | per_job | |||||
cpu_used.quantity | per_process | |||||
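The <default_value>,<scope>[,overwrite] semantics described above can be sketched as follows. This is a minimal Python illustration of the rule only; the job representation and function name are hypothetical and not part of pbs2swf.pl (which is written in Perl):

```python
# Sketch of the <default>,<scope>[,overwrite] semantics (hypothetical names).

def apply_default(job, field, default, scope, overwrite=False):
    """Assign `default` to job[field] per the flag semantics.

    scope: 'all' affects every job, 'started' only jobs that began
    execution, 'canceled' only jobs canceled before starting.
    Without 'overwrite', data already present in the PBS log is kept.
    """
    in_scope = (
        scope == "all"
        or (scope == "started" and job["started"])
        or (scope == "canceled" and not job["started"])
    )
    if in_scope and (overwrite or job.get(field) is None):
        job[field] = default
    return job

# --proc_req=10,all : fill in missing values only
job = {"started": True, "proc_req": None}
apply_default(job, "proc_req", 10, "all")                  # proc_req becomes 10

# --proc_req=10,all,overwrite : replace even existing values
job2 = {"started": True, "proc_req": 4}
apply_default(job2, "proc_req", 10, "all", overwrite=True)  # proc_req becomes 10
```

This mirrors why the example conversion uses --proc_used=1,started together with --proc_req=1,all: the 'started' scope leaves canceled jobs at proc_used=0.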
Anonymizing PBS values | anonymize.uid | √ |
Synopsis of the values associated with these flags:
<PBS_value>:<SWF_code>
The pbs2swf.pl script arbitrarily replaces every PBS string representing a user/group/executable/queue/partition with an SWF code (but does so consistently; that is, once a PBS value is associated with an arbitrary SWF code, that code will always be used to represent this PBS value). These options give the converter control over how the anonymization is actually performed (which codes are used for which PBS names). And so, for example, --anonymize.queue=short:1 means that the PBS 'short' queue will be represented by 1 in the resulting SWF file. See the pbs2swf - Example to understand why these options can be useful. |
anonymize.gid | √ | |||||
anonymize.executable | √ | |||||
anonymize.queue | √ | |||||
anonymize.partition | √ | |||||
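The consistent-but-arbitrary code assignment just described, combined with explicit --anonymize.* presets, might look like the following Python sketch. This is an assumed reconstruction of the behavior, not the actual Perl implementation:

```python
# Sketch of consistent anonymization: explicit --anonymize.* presets win;
# previously unseen PBS values get the smallest unused positive code.

def make_anonymizer(presets=None):
    mapping = dict(presets or {})          # e.g. {"dteam": 1, "dteam005": 1}

    def code_for(pbs_value):
        if pbs_value not in mapping:
            used = set(mapping.values())
            code = 1
            while code in used:            # smallest unused positive code
                code += 1
            mapping[pbs_value] = code
        return mapping[pbs_value]          # same value -> same code, always

    return code_for

# Presets from the example conversion: unite 'dteam'/'dteam005' and
# 'biomed'/'biomgrid' under shared codes.
gid = make_anonymizer({"dteam": 1, "dteam005": 1, "biomed": 2, "biomgrid": 2})
gid("dteam")     # -> 1 (preset)
gid("atlas")     # -> 3 (first unused code; 'atlas' is a made-up group name)
gid("atlas")     # -> 3 again (consistent)
```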
Predefined SWF header fields | Computer | √ | √ | <short machine description> e.g. "P-III Linux cluster" | ||
Installation | √ | √ | <location and name of machine> e.g. "SDSC - Blue Horizon" | |||
Information | √ | √ | <where to find additional info> usually a URL and possibly a paper reference | |||
Acknowledge | √ | √ | <name+email of supplier of PBS data> | |||
Conversion | √ | √ | <name+email of converter and possibly additional conversion info> | |||
TimeZoneString | √ | <verbal time zone of PBS log> a file which is (usually) found in /usr/share/zoneinfo/, e.g. US/Alaska | ||||
MaxNodes | √ | <int> [comment] e.g. "72 (dual CPU)" | ||||
MaxProcs | √ | <int> [comment] e.g. "144" | ||||
MaxRuntime | <seconds> administrative max allowed runtime | |||||
MaxMemory | <KB> administrative max allowed memory | |||||
AllowOveruse | <bool> can jobs use more resource(s) than requested? | |||||
Queues | √ | <verbal information about queues> | ||||
Partitions | √ | <verbal information about partitions> | ||||
Note | √ | <any important note> | ||||
Other | output | √ | <prefix name for the result file e.g., "l_sdsc_sp2"> In this example, pbs2swf.pl generates both l_sdsc_sp2.swf (the actual conversion), and l_sdsc_sp2.conversion.txt (reporting various statistics and problems). Here's an example of a conversion summary file. | |||
help | print a help message | |||||
debug | <0|1> 1 means a 19th field will be added to the resulting SWF file, holding the original PBS ID of the job |
record type | meaning |
A | Job was aborted by server |
B | Beginning of reservation period |
C | Job was check-pointed and held |
D | Job was deleted by request (record contains requestor=user@host) |
E | Job ended (terminated execution) (record contains all the data needed for SWF) |
F | Resources reservation period finished |
K | Scheduler/server requested removal of reservation |
Q | Job entered a queue; record for each move between queues; record contains queue=name |
R | Job was rerun |
S | Job execution started |
flow | count | example PBS job ID | comment |
QSE | 223838 | 16443.clrglop195.in2p3.fr | "normal" jobs [5] |
QQQSE | 1 | 10610.clrce02.in2p3.fr | |
QSDE | 2361 | 43671.clrce01.in2p3.fr | started jobs that were canceled (reached [5] through [4]) |
QSD...DE | 7 | 33836.clrce01.in2p3.fr | |
QD | 14362 | 33885.clrglop195.in2p3.fr | jobs that were canceled before they were started ([1]->[3]) |
QDD | 4 | 43569.clrce02.in2p3.fr | |
QA...A | 7 | 49921.clrce02.in2p3.fr | |
Q | 25 | 36406.clrglop195.in2p3.fr | jobs for which there's no E record because it occurred after the available PBS log ends |
QS | 10 | 49927.clrce02.in2p3.fr | |
QSD | 16 | 78876.clrce01.in2p3.fr | |
QSEQSE | 2106 | 835.clrglop195.in2p3.fr | ID wraparound (as explained above); in QSEE, part of the log is also missing between the two E's (another event that accompanied the wraparound) |
QSEQSDE | 14 | 2202.clrglop195.in2p3.fr | |
QSEE | 1 | 290.clrglop195.in2p3.fr | |
QSRD | 2 | 5393.clrce02.in2p3.fr | missing E record (unknown why) |
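The flow strings tabulated above are obtained by concatenating, per job ID, the record types in the order they appear in the log. A minimal Python sketch (the function is illustrative, not part of the utility):

```python
# Sketch: compute per-job "flow" strings (e.g. QSE, QD) from a PBS log.

def job_flows(records):
    """records: iterable of (job_id, record_type) pairs in log order."""
    flows = {}
    for job_id, rtype in records:
        flows[job_id] = flows.get(job_id, "") + rtype
    return flows

# A "normal" job (QSE) followed by a job canceled before starting (QD):
log = [("16443.clrglop195.in2p3.fr", t) for t in "QSE"] + \
      [("33885.clrglop195.in2p3.fr", t) for t in "QD"]
job_flows(log)["16443.clrglop195.in2p3.fr"]   # -> 'QSE'
```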
date_time;record_type;id_string;message_text
where:
date_time | mm/dd/yyyy hh:mm:ss |
id_string | Either the job identifier (job ID) or a PBS reservation identifier (seems irrelevant to SWF). In the LPC log, this may look like so: 468.clrce01.in2p3.fr (the server's name is the "partition" and the serial number is the PBS job ID within the partition). |
record_type | The record type as described above. |
message_text | The content depends on the record_type. The message text format is blank separated keyword=value fields (the relevant pairs are discussed below). |
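A record in this format can be parsed as sketched below. The sample line is constructed for illustration (its field values are not from the actual log), and the parsing follows only the description above:

```python
# Sketch: parse one PBS accounting record — a semicolon-separated header
# followed by blank-separated keyword=value pairs in the message text.

def parse_record(line):
    date_time, rtype, id_string, message = line.split(";", 3)
    fields = {}
    for token in message.split():
        if "=" in token:
            key, value = token.split("=", 1)   # values may contain '='
            fields[key] = value
    return {"date_time": date_time, "type": rtype,
            "id": id_string, "fields": fields}

# Constructed example line (values are illustrative):
rec = parse_record(
    "05/12/2004 10:21:05;E;468.clrce01.in2p3.fr;"
    "user=grid001 group=dteam queue=short Exit_status=0")
rec["fields"]["queue"]   # -> 'short'
```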
SWF field | PBS field(s) [listed in preference order] | conversion details |
job | - | Jobs are sorted by arrival order (major) and partition SWF code (minor). If jobs are still "equal" (arrived simultaneously at the same partition), we use the serial number from the PBS ID (see above) to break the tie. Jobs are then assigned IDs in order: 1,2,3,... |
submit | ctime | By 10.12.5 of the PBS administration guide: 'ctime' is the "Time in seconds [since the epoch] when a job was created (first submitted)." |
wait | start-ctime | By 10.12.5 of the PBS administration guide: 'start' is the "Time in seconds [since the epoch] when the job execution started". |
runtime | resources_used.walltime, etime-ctime | By 11.20.1 of the PBS administration guide: "Use the walltime attribute rather than wall time calculated by subtracting the job start time from end time. The walltime resource attribute does not accumulate when a job is suspended for any reason". However, some records are missing 'resources_used.walltime' (in the LPC log these are the 352 jobs with Exit_status=-4, i.e. jobs that died on a signal). In this case, for lack of a better alternative, the runtime is computed by subtracting the job's 'ctime' from its 'etime'. |
proc_used | resources_used.ncpus, resources_used.nodect, resources_used.nodes, Resource_List.ncpus, Resource_List.nodect, Resource_List.nodes | |
cpu_used | resources_used.pcput, resources_used.cput | By 4.8 of the PBS administration guide: 'pcput' is the "per_process maximum CPU time (i.e. for any single process in the job)" and 'cput' is the "Maximum aggregated CPU time required by all processes in job". We can convert between 'pcput' and 'cput' by dividing/multiplying by the number of processors used, if available. Note, however, that if this is not available, and one PBS job uses 'pcput' while another uses 'cput', then one of these data fields will be lost in the conversion (as we cannot mix them in one SWF column). |
mem_used | resources_used.pvmem, resources_used.vmem, resources_used.pmem, resources_used.mem | By 4.8 of the PBS administration guide. |
proc_req | Resource_List.ncpus, Resource_List.nodect, Resource_List.nodes | See conversion specification of 'proc_used'. |
cpu_req | Resource_List.walltime | See conversion specification of 'runtime'. |
mem_req | Resource_List.pvmem, Resource_List.vmem, Resource_List.pmem, Resource_List.mem | See conversion specification of 'mem_used'. |
status | Exit_status | By 10.12.5 of the PBS administration guide: "The exit status of the job: If the value is less than 10000 (decimal) it is the exit value of the top level process of the job, typically the shell. If the value is greater than 10000, the top process exited on a signal whose number is given by subtracting 10000 from the exit value." We assume that the meaning of exit status values is the same as that of the shell. Thus, we consider Exit_status=0 as normal termination (SWF status=1) and other values as indicating failure (SWF status=0, unless the job was canceled, as indicated by a D/A record, in which case the SWF status is 5). |
uid | user | By 10.12.5 of the PBS administration guide: user=username is "The user name under which the job executed." |
gid | group | By 10.12.5 of the PBS administration guide: group=groupname is "The group name under which the job executed." |
executable | jobname | By 10.12.5 of the PBS administration guide: jobname=job_name is "The name of the job." |
queue | queue | By 10.12.5 of the PBS administration guide: queue=queue_name is "The name of the queue in which the job executed." |
partition | id_string | A job ID string may look like this: 123.par.cnn.com. The server in this string serves as the partition identifier. |
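The Exit_status-to-SWF-status rule in the table above can be sketched as follows (a hypothetical Python helper, mirroring the stated mapping only):

```python
# Sketch of the Exit_status -> SWF status mapping described above:
# 0 means completed (1), cancellation (D/A record) means 5, anything
# else means failed (0).

def swf_status(exit_status, canceled=False):
    if canceled:
        return 5          # canceled, as indicated by a D or A record
    if exit_status == 0:
        return 1          # normal termination
    return 0              # failure (incl. exit on a signal: value > 10000)

swf_status(0)                    # -> 1
swf_status(10011)                # -> 0 (died on signal 11)
swf_status(271, canceled=True)   # -> 5
```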
[[hours:]minutes:]seconds[.milliseconds]
So, for example, '3600' is interpreted as seconds, but times may also be specified like '10:15:30' (seconds = 10*3600 + 15*60 + 30 = 36930).
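Parsing this time format reduces to folding the colon-separated components in base 60, as this Python sketch shows (the helper name is hypothetical):

```python
# Sketch: convert PBS's [[hours:]minutes:]seconds[.milliseconds] to seconds.

def pbs_time_to_seconds(text):
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)   # fold components in base 60
    return seconds

pbs_time_to_seconds("3600")       # -> 3600.0
pbs_time_to_seconds("10:15:30")   # -> 36930.0
```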
integer[suffix]
(e.g. 500kb, 12mb) where:
none | bytes |
b|w | bytes or words |
kb|kw | kilo bytes or words (1024) |
mb|mw | mega bytes or words (1024^2 = 1,048,576) |
gb|gw | giga bytes or words (1024^3 = 1,073,741,824) |
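Converting these sizes to the KB units used by the SWF memory fields can be sketched as below. Note the word size assumed here (8 bytes) is an assumption for illustration only; PBS treats the word size as machine-dependent:

```python
# Sketch: convert PBS's integer[suffix] size format to KB.
WORD_BYTES = 8  # assumption; the actual word size is machine-dependent

SCALE = {"": 1, "b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3,
         "w": WORD_BYTES, "kw": 1024 * WORD_BYTES,
         "mw": 1024**2 * WORD_BYTES, "gw": 1024**3 * WORD_BYTES}

def pbs_size_to_kb(text):
    num = "".join(ch for ch in text if ch.isdigit())   # leading integer
    suffix = text[len(num):].lower()                   # optional suffix
    return int(num) * SCALE[suffix] / 1024.0

pbs_size_to_kb("500kb")   # -> 500.0
pbs_size_to_kb("12mb")    # -> 12288.0
```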