Reads job descriptions from one or more submit files and enters
them into the JugMaster database. To read from the standard input,
use the file name "-".
Submit files that you write may allow or require certain parameters to be specified on the command line; if the user forgets to provide them, jug_submit reports what is missing in relatively clear terms.
Jobs are always submitted in batches. Even if there is only one job, it must be submitted as part of a "batch" containing just that one job.
The following example assigns some parameters of a batch and then
associates a few jobs with it.
batch
    name = "test"
    software = "/cms/jug_sw/cmsim.tgz"
    storage_class = "cms"
    output_path = "cmsim"

    job
        input_files = "/cms/data/cmkin/test_001.ntpl"
    job
        input_files = "/cms/data/cmkin/test_002.ntpl"
    job
        input_files = "/cms/data/cmkin/test_003.ntpl"
This example assumes that wherever the workers run, they will have
access to the filesystem /cms where the software and data are stored.

Note also that in this example, it is assumed that a class of storage
workers has been configured to store output files belonging to a
storage class named "cms". These storage workers could, for example,
be configured with a base output path of /cms/data, so that the full
output path for these jobs would become /cms/data/cmsim.
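The path composition just described can be sketched in a few lines. This is only an illustration of the rule, not Jug's actual code; the function name and sample values are hypothetical:

```python
import posixpath

def storage_path(base_path, batch_output_path, filename):
    """Compose the final storage location of an output file.

    Sketch of the rule described above: the storage worker's base
    path, plus the batch's output path, plus the file name.
    """
    return posixpath.join(base_path, batch_output_path, filename)

# A "cms" storage worker with base path /cms/data and a batch with
# output_path "cmsim" would store test_001.fz under
# /cms/data/cmsim/test_001.fz.
```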
Once you submit the batch, you still need to run some workers to
process it; see Running Workers. More examples may be found in the
More Examples section below, and more specific notes on syntax are
described in the sections that follow.
If you do not specify a run_command, one will be searched for in the
software package commands list. If no run_command can be found there,
the job fails with an error.
The advantage of staging in files in stage_in_command instead of in
run_command is that JugWorker can stage in files in advance, while the
previous job is still running, saving time in case of network delays.
If you do not specify anything, then a stage_in_command will be
searched for in the software package commands list.
Files placed in the output/ directory are automatically staged out for
you by the storage system. If you do wish to directly stage out files
as part of the job, the advantage of doing it in stage_out_command
instead of in run_command is that JugWorker can run the next job while
the stage-out is happening, saving time in case of network delays. If
you do not specify anything, then a stage_out_command will be searched
for in the software package commands list.
A polling_command is useful when run_command starts some external
process, such as submitting a job to a batch system. Instead of
polling for completion in the run command itself, you can check for
completion in polling_command. The advantage is that if the worker is
managing a large number of jobs simultaneously (max_post_running > 1),
you can still limit the number of polling commands running at the same
time (max_polling = 1). If you do not specify anything, then a
polling_command will be searched for in the software package commands
list.
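Putting these hooks together, a batch might look like the sketch below. The command names here are hypothetical; Jug searches the software package commands list for any command you omit:

```
batch
    name = "external_example"
    software = "/cms/jug_sw/tools.tgz"      #hypothetical package
    run_command = "submit_to_batch"         #starts the external process
    polling_command = "check_batch_done"    #run later to test for completion
    stage_out_command = "copy_results"      #overlaps stage-out with the next job
```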
environment =
    "INPUT_FILE=file_001.ntpl"
    "OUTPUT_FILE=file_001.fz"
If set to true, jobs will be interrupted when the worker is
interrupted. Currently, the only mechanism for interrupting either
jobs or the worker is SIGTERM. Making jobs interruptible is useful,
for example, when running under Condor, since your jobs can then shut
down gracefully when they are preempted. See also
--packup_interrupted_jobs. If your jobs are not interruptible, then
when JugWorker is shutting down, it will simply wait until they
finish. Some batch systems, such as Condor, may deliver a hard kill
(SIGKILL) if preempted jobs do not finish in time.
The default is 0.03, or about 1 failure per 30 seconds. The intention
is to prevent a runaway error condition from bogging down the system
by generating a huge rate of failures. Set this to None to disable
throttling altogether.
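The throttling rule can be illustrated with a small sketch. This assumes a simple minimum-interval rule (at most one admitted failure per 1/rate seconds); it is an illustration of the idea, not Jug's actual implementation:

```python
class FailureThrottle:
    """Throttle failure handling to at most `rate` failures per second.

    Sketch of the idea behind the default of 0.03 (roughly one failure
    per 30 seconds); not Jug's actual code.
    """
    def __init__(self, rate):
        self.rate = rate          # max failures per second; None disables throttling
        self.last_failure = None  # time of the previously admitted failure

    def admit(self, now):
        """Return True if a failure occurring at time `now` may be processed."""
        if self.rate is None:     # throttling disabled
            return True
        if self.last_failure is None or now - self.last_failure >= 1.0 / self.rate:
            self.last_failure = now
            return True
        return False
```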
mirror_jobs_when_idle is enabled. In both cases, it would only ever
happen if there is no other work to do.
If set to true, extra copies of jobs will be assigned to run on
workers that have no other work to do. The maximum number of extra
copies may be configured with max_job_mirrors.
The name of each file in the job's output directory is added to the
batch output path to produce the final "storage name" of the file.
The full path to the file is the storage server's base path plus the
storage name of the file. The output path may contain references to
special environment variables, for example:
"/cms/data/cmkin/${JUG_BATCH_ID}/${JUG_JOB_ID}".
Jobs place their output files in an output directory in their runtime
working directory. Of course, if a job handles storage of its output
files itself, then you do not need to specify a storage class. The
actual storage process is handled by a storage worker.
For example, "*.ntpl" would match all output files from the parent
that end in ".ntpl". By default, each matched file is copied into the
child job's working directory with the same basename as the original.
You can override this and build the new file name out of arbitrary
portions of the original. See File Attributes and URLs.
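The glob matching can be pictured with a short sketch; fnmatch-style wildcard semantics are assumed here, and the file names are made up:

```python
import fnmatch

# Hypothetical output files registered by a parent job.
parent_outputs = ["evt_001.ntpl", "evt_001.log", "evt_002.ntpl"]

# input_files = "*.ntpl" in the child batch would select only the ntuples.
matched = [f for f in parent_outputs if fnmatch.fnmatch(f, "*.ntpl")]
```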
Jug submit file syntax has many similarities to Python. Why not simply use plain Python instead of inventing a specialized syntax? The short answer is that you get superior error reporting and vastly condensed syntax. Fortunately, you can still import and use Python modules, and there is also a pure Python API (though this is currently undocumented).
Comments in the job description begin with # and continue to the end
of the line.
Attributes of a batch or job are assigned in an indented block. The indentation may consist of spaces or tabs as long as it is consistent within a given block. In place of indentation, braces may be used.
Some attributes (like software or input_files) may consist of a list
of values. Lists are formed by a sequence of comma-delimited (or
newline-delimited) values. The list must either be enclosed in braces
or indented:
input_files = {"/cms/data/file_001.ntpl","/cms/data/file_002.ntpl",\
               "/cms/data/file_003.ntpl"}
or
input_files =
    "/cms/data/file_001.ntpl"
    "/cms/data/file_002.ntpl"
    "/cms/data/file_003.ntpl"
Key-value lists are lists with named items. The environment attribute,
for example, is a list of key=value pairs. It may be specified either
as a list of "key=value" strings or as a list of key = value
assignments. The following two examples are equivalent:
#environment as a list of strings containing key-values
environment =
    "DATASET=l104_qcd"
    "JOB_NUM=$JUG_JOB_ID"
or
#environment as key-value assignments
environment =
    DATASET = "l104_qcd"
    JOB_NUM = "$JUG_JOB_ID"
If a key is assigned the value None, it is removed from the list.
Nested lists are expanded into a single list to form the final
environment setting, so you may easily compose the full environment
list by dropping in other sub-lists.
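For example, nested lists and None removal make it easy to build an environment from a shared base. The list name and its contents below are hypothetical, chosen only for illustration:

```
base_env =
    SITE = "uwhep"
    SCRATCH = "/tmp"

batch
    name = "compose_env"
    environment =
        base_env             #sub-list is expanded into the final list
        DATASET = "l104_qcd"
        SCRATCH = None       #removes SCRATCH from the final environment
```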
You may extract values from a list or key-value list using the []
operator or with get(key, default=None).
Facetious example:
sw_urls =
    CMKIN_4_3_1 = "/afs/hep.wisc.edu/cms/sw/dar/CMKIN_4_3_1"
    OSCAR_3_6_5 = "/afs/hep.wisc.edu/cms/sw/dar/OSCAR_3_6_5"

batch
    software =
        sw_urls["CMKIN_4_3_1"]     #fails if key is undefined
        sw_urls.get("OSCAR_3_6_5") #returns None if key is undefined
You may define your own variables in the submit file, either at the
top level or as part of a particular batch or job. Variables may be
assigned any datatype (strings, numbers, lists, and hashes). Example:
dataset = "l104_qcd"

batch
    name = dataset + ".cmkin"

batch
    name = dataset + ".oscar"
    var my_parent_name = dataset + ".cmkin"
    parent
        name = my_parent_name
For variables defined at the top level (dataset in the example above),
the var keyword is optional. Inside of a batch or other object, var is
required to make it clear that you are defining a new variable instead
of assigning a value to an existing one.
As illustrated in this example, you may also use variables in expressions, such as the concatenation of two strings.
Parameters are variables that may be assigned externally, either on
the command-line to jug_submit
or when a script gets included
into another one.
Parameters are defined with the param keyword. In the following
example, dataset is a required parameter, while events_per_job is
optional, since it has a default value.
param dataset
param events_per_job = 100
If called directly with jug_submit, a script containing the above
lines might be invoked like this:
jug_submit cmkin.jug dataset=l104_qcd events_per_job=10
NOTE: all parameters passed from the command line are treated as
strings by default, so if you need to perform numeric operations on
them, you will need to convert them to numbers using num().
If a user does not provide a value for a parameter and the
parameter has no default value, a standard error message will be
displayed. To provide more help to users, you can specify an example
value. For example:
param orca_package eg("ORCA_7_6_1")
You may also specify a longer description of the parameter.
param orca_package eg("ORCA_7_6_1","ORCA software directory name")
When inside of a batch or other object definition, variable references are first matched within the scope of the object and then within the global scope of the file. You may force the global scope as in the example:
param execution_group

batch
    name = "test"
    execution_group = global.execution_group
There are a number of built-in functions that you may use in expressions within the submit file.
eval("1+1") evaluates to 2.
If test is true, this function returns true_result. Otherwise, it
returns false_result. Only the needed result is evaluated.
For example: batch("l104_qcd_80_120.oscar").environment["OUTPUT_COLLECTION"]
Expressions may use python-style mathematical and logical operators: +, -, *, /, <, >, ==, !=, and, or, and not.
The if statement may be used to conditionally execute portions of a
submit script. Example:
param parent_name = None

batch
    if parent_name:
        parent
            name = parent_name
            group_size = 0
The if statement may be followed by one or more elif statements and a
final else statement.
if condition:
...
elif condition:
...
else:
...
You may include other submit files using the include keyword. The
files are searched for in the same directory and in all directories
listed in the environment variable JUG_SUBMIT (colon-delimited
entries).
The following example demonstrates how to include another script,
setting parameters in the process. The parameters are simply treated
as a hash list, with any nested hashes expanded into a single list.
include "some_script.jug"
dataset = "l104_qcd"
events_per_job = 10
It is also possible to include another script and then reference
variables within it. You do this with the as keyword. Example:

include "site_config.jug" as site

batch
    storage_class = site.storage_class
    ...

Importing Python Modules

You may import python modules into a Jug submit file using the
following syntax:

import module-name [as alias]

You may then access variables and call functions of the module,
just like you would in python code.

URLs

When input files or software packages are specified, an absolute
path or a URL may be given. In either case, you should ensure that
the worker will be able to access this path or URL at runtime.
JugWorker supports a number of protocols by default, and the set of
protocols it supports may be extended.

You may also use host file name URLs. These are URLs of the form
"hfn:machine.domain.name:/path/to/file". In order to make these files
accessible, you will need to register a file server that can provide
access to the files. Note that the "hostname" is usually, but not
necessarily, the network name of some machine. It could also be some
arbitrarily chosen name, like "cms_data", which is associated with the
fileserver that you register.

File Attributes

Files (i.e. URLs) may be referenced in a number of places in job
descriptions. The context of the reference generally associates some
sort of default assumptions about how the file should be handled. For
example, files listed in the software list are assumed to be packages
that should be installed once per worker and shared by all other jobs
that need them, but files listed in input_files are simply copied into
the job's working directory without any caching.

You can override the way a file is treated by setting one or more
attributes of the file. These are specified inside braces before the
filename.

The following example places an input file in a subdirectory instead
of the default behavior of simply putting it directly in the job's
working directory.

input_files = "{name=input/file_001.ntpl}/cms/data/file_001.ntpl"

When there are multiple attribute assignments, these are separated by
semicolons or ampersands. Special characters in attribute names or
values (such as semicolons) must be escaped using form-url-encoding
syntax: the special character is represented by a percent sign
followed by the hexadecimal ASCII number of the character. For
example, ';' is %3B, '&' is %26, and '=' is %3D.

JugWorker will run the install script if there is one in the unpacked
package directory.

If software is always pre-installed, you can simply give the
installation path as the URL part of the software entry. For example:
pre_installed_search_path=${MY_PACKAGE_PATH}.

If a relative path is given in run_wrapper, it will be resolved
relative to the directory where the software package is installed.

If no timeout is specified for download_command, the default is half
an hour.

Use handle_urls to specify the command that should be used to invoke a
file-downloading software package. The first argument to the command
will be the URL and the second argument is the local filename to store
it in (see download_command).

In store_info, the character * is replaced with the name of the data
file. For example, 'store_info=*.info' would store the information for
file 'X' in 'X.info'. If you specify only * as the name, then the file
data will not be stored; only the file info will be stored. The
information is stored in 'key = value' format, one line per item. The
attributes stored in the file are name, size, md5 checksum, and url.

If set to no, the file data will not be downloaded or installed. This
may be useful in combination with store_info if you only want
information about the file and do not want the file itself.

Special Environment Variables

There are a few special environment variables supplied to jobs. These
may also be referenced in the values of other environment variables or
in the output_path attribute. To reference them, simply insert
${VAR_NAME} in the string where you want the value to be substituted.
You may also use the syntax $VAR_NAME when the variable name happens
to be delimited from the surrounding text by spaces or punctuation. If
you do not want a $ expression to be evaluated, simply insert $$.

The following variables may be referenced but are not automatically
assigned to an environment variable.

In addition to simple substitutions of these variables into other
environment values, you may use them in expressions. For example,
${JUG_SEED*250} will be evaluated as the seed number multiplied
by 250.

More Examples

The following examples may be helpful in showing the sort of thing
that can be done with jug_submit. For information about how to
actually run the jobs, see JugWorker.

Example 1

The following example reads the submit file cmkin.jug and creates 100
jobs, each with a different random seed.

jug_submit cmkin.jug dataset=qcd seed=12000 jobs=100 events_per_job=250

Here is an example submit file:

#################################################
#cmkin.jug: submit file for stage 1 simulation

#load general configuration information
include "site_config.jug" as site

param dataset
param seed
param jobs
param events_per_job = 100
param cmkin_package = "CMKIN_1_2_0"
param cmkin_exe = "kine_make_ntpl.exe"
param cmkin_storage_class = site.storage_class

batch
    name = dataset + ".cmkin"
    source = site.site_name
    seed_low = seed
    seed_high = seed + num(jobs) - 1
    software =
        site.cms_sw_path + "/" + cmkin_package
        site.cms_sw_path + "/scripts"
    run_command = "run_cmkin"    #the script to run
    input_files =
        site.cms_db + "/" + dataset + "/cmkin.cards"
    environment =
        CMKIN_EXE = cmkin_exe         #stage 1 simulator to run from the script
        CMKIN_RUN_SEED = "$JUG_SEED"  #the random seed
        CMKIN_EVENTS = events_per_job #number of events to generate
        NTPL_NAME = "$JUG_BATCH_NAME.$JUG_SEED.ntpl"  #output datafile name
        LOG_NAME = "$JUG_BATCH_NAME.$JUG_JOB_ID.log"  #log file name
    storage_class = cmkin_storage_class
#################################################

The file site_config.jug is used to provide some general configuration
parameters in this example. It might look like this:

#################################################
#site_config.jug: site configuration file

site_name = "uwhep"
cms_sw_path = "/afs/hep.wisc.edu/cms/sw"
cms_db = "/afs/hep.wisc.edu/cms/cms_db"
storage_class = site_name
#################################################

Example 2

The following example creates a child batch that receives input files
from the previous example.

#################################################
#oscar.jug: submit file for stage 2 simulation

#load general configuration information
include "site_config.jug" as site

param dataset
param oscar_package = "OSCAR_2_4_5"
param oscar_exe = "oscar"
param oscar_storage_class = site.storage_class

batch
    name = dataset + ".oscar"
    source = site.site_name
    parent
        name = dataset + ".cmkin"
        input_files = "*.ntpl"
    software =
        site.cms_sw_path + "/" + oscar_package
        site.cms_sw_path + "/scripts"
    run_command = "run_oscar"    #the script to run
    input_files =
        site.cms_db + "/" + dataset + "/oscar.cards"
    environment =
        OSCAR_EXE = oscar_exe    #stage 2 simulator to run from the script
        LOG_NAME = "$JUG_BATCH_NAME.$JUG_JOB_ID.log"  #the log file
    storage_class = oscar_storage_class
#################################################

A few words about file handling are in order. JugWorker looks for
output files in the output directory contained in the job's working
directory. Anything there (including sub-directories) gets registered
as output and will be stored by the storage handler.

Files from the first batch ending in .ntpl become input files to the
second batch of jobs. By default, this means that the input file is
copied into the working directory of the second job.

How the files are stored and retrieved depends on the storage handler
that is being used. A simple storage handler may store the files on
disk and allow them to be retrieved via http from the Jug storage
worker itself. If there are multiple storage workers, the files may
even be scattered across multiple hosts. The database keeps track of
the URL of each file so that the file can be read by the child job.

Example 3

This example combines the previous two examples by creating a submit
file that invokes both the parent and child batches at once.

#################################################
#cmkin_to_oscar.jug: submit file for stage 1-2 simulation

param dataset
param seed
param jobs
param events_per_job = 100
param cmkin_package = "CMKIN_1_2_0"
param cmkin_exe = "kine_make_ntpl.exe"
param oscar_package = "OSCAR_2_4_5"
param oscar_exe = "oscar"

include "cmkin.jug"
    dataset = dataset
    seed = seed
    jobs = jobs
    events_per_job = events_per_job
    cmkin_package = cmkin_package
    cmkin_exe = cmkin_exe

include "oscar.jug"
    dataset = dataset
    oscar_package = oscar_package
    oscar_exe = oscar_exe
#################################################