jug_submit

Usage: jug_submit submit_files [param=value ...]

Reads job descriptions from one or more submit files and enters them into the JugMaster database. To read from the standard input, use the file name -.

Submit files that you write may allow or require certain parameters to be specified on the command line. If a required parameter is not provided, jug_submit reports exactly which parameters are missing.

Submit File Syntax

Jobs are always submitted in batches. Even if there is only one job, it must be submitted as part of a "batch" containing just the one job.

Example 1

The following example assigns some parameters of a batch and then associates a few jobs with it.

batch
    name = "test"
    software = "/cms/jug_sw/cmsim.tgz"
    storage_class = "cms"
    output_path = "cmsim"
job
    input_files = "/cms/data/cmkin/test_001.ntpl"
job
    input_files = "/cms/data/cmkin/test_002.ntpl"
job
    input_files = "/cms/data/cmkin/test_003.ntpl"

This example assumes that wherever the workers run, they will have access to the filesystem /cms where the software and data are stored.

Note also that in this example, it is assumed that a class of storage workers have been configured to store output files belonging to a storage class named "cms". These storage workers could, for example, be configured with a base output path of /cms/data, so that the full output path for these jobs would become /cms/data/cmsim.

Once you submit the batch, you still need to run some workers to process it. See Running Workers.

More examples may be found in the More Examples section below.

More specific notes on syntax are described under General Notes on Syntax below.

Attributes

batch
name
Label for this batch. Should be unique across all batches.
run_command
Command to run. If you do not specify anything, then a run_command will be searched for in the software package commands list. If no run_command can be found there, then the job fails with an error.
stage_in_command
Command to run to stage-in files. This is normally not necessary, since any input files specified in the jug submit file are automatically staged in for you. If you do wish to directly stage in files as part of the job, the advantage of doing it in stage_in_command instead of in run_command is that JugWorker can stage in files in advance, while the previous job is still running, saving time in case of network delays. If you do not specify anything, then a stage_in_command will be searched for in the software package commands list.
stage_out_command
Command to run to stage-out files. This is normally not necessary since any output files placed in the job's output/ directory are automatically staged out for you by the storage system. If you do wish to directly stage out files as part of the job, the advantage of doing it in stage_out_command instead of in run_command is that JugWorker can run the next job while the stage-out is happening, saving time in case of network delays. If you do not specify anything, then a stage_out_command will be searched for in the software package commands list.
polling_command
Command to run to periodically check if the job is finished. This may be useful if run_command starts some external process, such as submitting a job to a batch system. Instead of polling for completion in run_command itself, you can check for completion in polling_command. The advantage is that if the worker is managing a large number of jobs simultaneously (max_post_running>1), you can still limit the number of polling commands running at the same time (max_polling=1). If you do not specify anything, then a polling_command will be searched for in the software package commands list.
environment
One or more environment assignments that should be made before running the job. Example:
    environment =
        "INPUT_FILE=file_001.ntpl"
        "OUTPUT_FILE=file_001.fz"
    
execution_class
Execution of jobs by JugWorker is controlled by the execution class of the job. Workers are configured to select jobs from one or more execution classes, so by controlling what types of workers run under what circumstances, you control when and where jobs belonging to a given execution class may run.
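For example (the class name "short_jobs" here is purely hypothetical; it must match a class that some of your workers are configured to select):

batch
    name = "test"
    execution_class = "short_jobs"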
input_files
Files to copy into the working directory of the job. By default, archives are not unpacked; otherwise, this is the same as workware.
interruptible=false
If set to true, jobs will be interrupted when the worker is interrupted. Currently, the only mechanism for interruption of either jobs or the worker is via SIGTERM. Making interruptible jobs is useful, for example, when running under Condor, since you can have your jobs gracefully shut down when they are preempted. See also --packup_interrupted_jobs. If your jobs are not interruptible, then when JugWorker is shutting down, it will simply wait until they finish. Some batch systems, such as Condor, may deliver a hard-kill (SIGKILL) if preempted jobs do not finish in time.
job_runtime
A rough estimate of how long each job will take to run, in seconds. This is used to determine if a job might be taking too long, in which case, when there is nothing else to do, the same job may be assigned to a second worker, letting them race to the finish. See max_job_mirrors.
job_startup_time
Amount of time (in seconds) from the moment the job starts running that it should be considered to be in a "starting up" phase. During this time, the application is assumed to be doing a significant amount of disk I/O so it would be best not to pre-stage files for another job. The end effect is that a job that is starting up will be counted in the "pre running" as well as the "running" state for purposes of JugWorker queue management. The default startup time is 1 hour.
max_failure_frequency
Maximum number of failed run attempts per second before the assignment of new runs is throttled. The default is 0.03, or about one failure every 30 seconds. The intention is to prevent a runaway error condition from bogging down the system with a huge rate of failures. Set this to None to disable throttling altogether.
max_job_mirrors=5
This is the maximum number of extra copies of the same job that may be assigned to run simultaneously. This may happen when a running job has taken longer than expected (see job_runtime). It may also happen if mirror_jobs_when_idle is enabled. In both cases, extra copies are only assigned when there is no other work to do.
mirror_jobs_when_idle=false
When this is enabled (assigned to true), extra copies of jobs will be assigned to run on workers that have no other work to do. The maximum number of extra copies may be configured with max_job_mirrors.
output_path
Path within the storage area where output files will be stored. The path of a file within the job's output directory is appended to the batch output path to produce the final "storage name" of the file. The full path to the file is the storage server's base path plus the storage name. The output path may contain references to special environment variables, for example: "/cms/data/cmkin/${JUG_BATCH_ID}/${JUG_JOB_ID}".
rank
A floating point number representing the relative share of resources this batch should get when there are other competing batches. The default is 1.0.
seed_low
Starting value for the seed range in a seed batch. The jobs for a seed batch are generated automatically, with each job receiving a different seed number from the specified range. The number is stored in the job's environment as JUG_SEED.
seed_high
Final value for the seed range in a seed batch. For example, seed_low=1 and seed_high=100 would produce 100 jobs, including both JUG_SEED=1 and JUG_SEED=100.
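A minimal seed batch might therefore look like the following sketch (the batch name and the environment variable consumed by the application are placeholders):

batch
    name = "seed_test"
    seed_low  = 1
    seed_high = 100
    environment =
        RUN_SEED = "$JUG_SEED"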
software
One or more software packages required by the job. See a description of the simple package format for details. If not pre-installed, software packages are installed once per worker and shared between all jobs that need them. Each entry should be a URL or path that will be accessible to the worker from wherever it runs. See URLs and File Attributes for additional options.
source
Name of the system from which this batch was created. For example, if this is an assignment from CERN's RefDB, then this could be "RefDB".
source_batch_id
If this batch was generated as an assignment from some other system, this may be assigned to the unique identifier for this batch in the other system.
storage_class
Name of a storage class to handle storage of output files. Jobs are expected to place output files into an output directory in their runtime working directory. Of course, if the job handles storage of its output files itself, then you do not need to specify a storage class. The actual storage process is handled by a storage worker.
workware
This is just like software, but these packages are installed into the job's runtime working directory. This means that the package file is downloaded once by the worker, but then it must be unpacked for each job that runs.
job
batch
Name of the batch this job is associated with. By default, the job belongs to the batch preceding it in the submit file. You may also nest the job description inside the batch description by indenting it so that it falls within the batch attribute assignment block.
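For example, the following two sketches associate a job with the batch named "test" in equivalent ways, first by naming the batch explicitly and then by nesting:

batch
    name = "test"
job
    batch = "test"
    input_files = "/cms/data/cmkin/test_001.ntpl"

or

batch
    name = "test"
    job
        input_files = "/cms/data/cmkin/test_001.ntpl"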
workware
Additional workware required by this job. This is treated the same as batch workware, but the packages are not cached, since it is assumed to be specific to just this job.
input_files
One or more files to copy into the job's runtime working directory. By default, the files are installed with the same basename, but you can change the name or have them copied into a sub-directory. Each entry should be a URL or file path that will be accessible to the worker from wherever it runs. See File Attributes.
environment
One or more additional environment variable assignments for this job.
output_path
Additional path to add on to the batch output_path when storing output files from this job.
parent
A child batch may have one or more parents. Jobs in the child batch are created automatically in an N-to-1 map between parent jobs and child jobs.
name
The name of the parent batch. If the child batch definition is nested inside of another batch, then that outer batch is the default parent batch.
input_files
One or more input file patterns to match against the output files of the parent job. For example, "*.ntpl" would match all output files from the parent that end in ".ntpl". By default, the file is copied into the child job's working directory with the same basename as the original. You can override this and can build the new file name out of arbitrary portions of the original. See File Attributes and URLs.
group_size
Number of parent jobs per child job. The default is 1. Setting this to 0 means that no parent jobs are mapped to child jobs, but the entire parent batch must be finished before the child batch can begin.
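For example, a sketch of a child batch that simply waits for its entire parent batch to finish before any of its own jobs run (both batch names are placeholders):

batch
    name = "merge"
    parent
        name = "simulate"
        group_size = 0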

General Notes on Syntax

Jug submit file syntax has many similarities to Python. Why not simply use plain Python instead of inventing a specialized syntax? The short answer is superior error reporting and a vastly more condensed syntax. Fortunately, you can still import and use Python modules, and there is also a pure Python API (though this is currently undocumented).

Comments in the job description begin with # and go to the end of the line.

Attributes of a batch or job are assigned in an indented block. The indentation may consist of spaces or tabs as long as it is consistent within a given block. In place of indentation, braces may be used.

Lists

Some attributes (like software or input_files) may consist of a list of values. Lists are formed by a sequence of comma-delimited (or newline-delimited) values. The list must either be enclosed in braces or indented:

input_files = {"/cms/data/file_001.ntpl","/cms/data/file_002.ntpl",\
               "/cms/data/file_003.ntpl"}

or

input_files =
    "/cms/data/file_001.ntpl"
    "/cms/data/file_002.ntpl"
    "/cms/data/file_003.ntpl"

Key-Value Lists

Key-value lists are lists with named items.

The environment attribute is a list of key=value pairs. This may either be specified as a list of "key=value" strings, or a list of key = value assignments.

The following two examples are equivalent:

#environment as list of strings containing key-values
environment =
   "DATASET=l104_qcd"
   "JOB_NUM=$JUG_JOB_ID"

or

#environment as key-value assignments
environment =
   DATASET = "l104_qcd"
   JOB_NUM = "$JUG_JOB_ID"

If a key is assigned the value None, it is removed from the list. Nested lists are expanded into a single list to form the final environment setting, so you may easily compose the full environment list by dropping in other sub-lists.
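As a sketch of such composition, the user-defined variable common_env below is dropped into the environment list and expanded in place (the variable name and values are placeholders):

common_env =
    DATASET = "l104_qcd"
    SITE    = "uwhep"

batch
    name = "test"
    environment =
        common_env
        JOB_NUM = "$JUG_JOB_ID"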

You may extract values from a list or key-value list using the [] operator or with get(key,default=None). Facetious example:

sw_urls =
  CMKIN_4_3_1 = "/afs/hep.wisc.edu/cms/sw/dar/CMKIN_4_3_1"
  OSCAR_3_6_5 = "/afs/hep.wisc.edu/cms/sw/dar/OSCAR_3_6_5"

batch
  software =
    sw_urls["CMKIN_4_3_1"]     #fails if key is undefined
    sw_urls.get("OSCAR_3_6_5") #returns None if key is undefined

Variables

You may define your own variables in the submit file, either at the top level, or as part of particular batch or job. Variables may be assigned to any datatype (strings, numbers, lists, and hashes). Example:

dataset = "l104_qcd"

batch
   name = dataset + ".cmkin"
batch
   name = dataset + ".oscar"
   var my_parent_name = dataset + ".cmkin"

   parent
      name = my_parent_name

For variables defined at the top level (dataset in the example above), the var keyword is optional. Inside of a batch or other object, var is required to make it clear that you are defining a new variable instead of assigning a value to an existing one.

As illustrated in this example, you may also use variables in expressions, such as the concatenation of two strings together.

Parameters

Parameters are variables that may be assigned externally, either on the command-line to jug_submit or when a script gets included into another one.

Parameters are defined with the param keyword. In the following example, dataset is a required parameter, while events_per_job is optional, since it has a default value.

param dataset
param events_per_job = 100

If called directly with jug_submit, a script containing the above lines might be invoked like this:

jug_submit cmkin.jug dataset=l104_qcd events_per_job=10

NOTE: all parameters passed from the command-line are treated as strings by default, so if you need to perform numeric operations on them, you will need to convert them to numbers using num().
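For example, assuming seed and jobs arrive as strings from the command line, a sketch of computing a seed range:

param seed
param jobs

batch
    name = "test"
    seed_low  = num(seed)
    seed_high = num(seed) + num(jobs) - 1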

If a user does not provide a value for a parameter and the parameter has no default value, a standard error message will be displayed. To provide more help to users, you can specify an example value. For example:

param orca_package  eg("ORCA_7_6_1")

You may also specify a longer description of the parameter.

param orca_package  eg("ORCA_7_6_1","ORCA software directory name")

Scope Resolution

When inside of a batch or other object definition, variable references are first matched within the scope of the object and then within the global scope of the file. You may force the global scope as in the example:

param execution_group
batch
  name = "test"
  execution_group = global.execution_group

Built-in Functions

There are a number of built-in functions that you may use in expressions within the submit file.

num(X)
Returns the argument converted to a numeric datatype.
int(X)
Returns the argument converted to a numeric datatype with any fractional part truncated.
str(X)
Returns the argument converted to a string datatype.
eval(X)
Parses the string argument and evaluates it as an expression. For example, eval("1+1") evaluates to 2.
iff(test,true_result,false_result)
If test is true, this function returns true_result. Otherwise it returns false_result. Only the needed result is evaluated.
this_file()
Returns the full path to the file that is being parsed.
dirname(path)
Returns the path part of the given path.
basename(path)
Returns the file name at the end of the specified path.
get_ext(path)
Returns the filename extension. For example, get_ext("name.ext") yields ".ext".
strip_ext(path)
Returns everything but the filename extension. For example, strip_ext("name.ext") yields "name".
set_ext(path,ext)
Replaces the filename extension. For example, set_ext("name.ext",".new") yields "name.new".
lower(X)
Returns a string with all characters converted to lowercase.
upper(X)
Returns a string with all characters converted to uppercase.
new_seed(seed_range,seed_bank="default")
Returns a new starting seed value from the named "seed bank," which is tracked in the database. The next seed_range seeds are reserved, so that a subsequent call to new_seed for the same seed bank will return a number beyond the end of the previously allocated range.
batch(name)
Returns a batch object with the specified name. This is useful for accessing properties of existing batches. For example, a child batch might need to access values stored in the environment of a parent batch. You could do that with an expression such as batch("l104_qcd_80_120.oscar").environment["OUTPUT_COLLECTION"].
batch_exists(name)
Returns true if the given batch exists, either in the database or pending creation from the current jug submit file.
getenv(var,[default_value])
Returns the value of the specified environment variable. If no default value is specified, it is an error if the environment variable is not defined.
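As a sketch combining a few of the path functions above (the card file name is a placeholder):

batch
    name = strip_ext(basename(this_file()))   #e.g. "cmkin" for a file named cmkin.jug
    input_files =
        dirname(this_file()) + "/cmkin.cards"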

Operators

Expressions may use python-style mathematical and logical operators: +, -, *, /, <, >, ==, !=, and, or, and not.

Flow Control

The if statement may be used to conditionally execute portions of a submit script. Example:

param parent_name = None
batch
  if parent_name:
    parent
      name = parent_name
      group_size = 0

The if statement may be followed by one or more elif statements and a final else statement.

if condition:
  ...
elif condition:
  ...
else:
  ...

Including Scripts

You may include other submit files using the include keyword. The files are searched for in the same directory and in all directories listed in the environment variable JUG_SUBMIT (colon-delimited entries).

The following example demonstrates how to include another script, setting parameters in the process. The parameters are simply treated as a hash list, with any nested hashes expanded into a single list.

include "some_script.jug"
   dataset        = "l104_qcd"
   events_per_job = 10

It is also possible to include another script and then reference variables within it. You do this with the as keyword. Example:

include "site_config.jug" as site

batch
   storage_class = site.storage_class
   ...

Importing Python Modules

You may import python modules into a Jug submit file using the following syntax:

import module-name [as alias]

You may then access variables and call functions of the module, just like you would in python code.

URLs

When input files or software packages are specified, an absolute path or a URL may be given. In either case, you should ensure that the worker will be able to access this path or URL at runtime.

The protocols supported by JugWorker may be extended. By default, JugWorker supports the following protocols:

You may also use host file name URLs. These are URLs of the form "hfn:machine.domain.name:/path/to/file". In order to make these files accessible, you will need to register a file server that can provide access to the files. Note that the "hostname" is usually, but not necessarily, the network name of some machine. It could also be some arbitrarily chosen name, like "cms_data" which is associated with the fileserver that you register.

File Attributes

Files (i.e. URLs) may be referenced in a number of places in job descriptions. The context of the reference generally associates some sort of default assumptions about how the file should be handled.

For example, files listed in the software list are assumed to be packages that should be installed once per worker and shared by all other jobs that need them, but files listed in input_files are simply copied into the job's working directory without any caching.

You can override the way a file is treated by setting one or more attributes of the file. These are specified inside braces before the filename.

Example

The following example places an input file in a subdirectory instead of the default behavior of simply putting it directly in the job's working directory.

input_files = "{name=input/file_001.ntpl}/cms/data/file_001.ntpl"

When there are multiple attribute assignments, they are separated by semicolons or ampersands. Special characters in attribute names or values (such as semicolons) must be escaped using form URL-encoding syntax: the special character is represented by a percent sign followed by the hexadecimal ASCII code of the character. For example, ';' is %3B, '&' is %26, and '=' is %3D.
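For example, the following sketch combines two attributes (both described below) to rename an input file and disable caching of the download:

input_files = "{name=input/file_001.ntpl;cache=no}/cms/data/file_001.ntpl"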

type
Possible values are "sw" and "rundata". Type "sw" packages are installed in the worker's software cache. Type "rundata" files or packages are installed in the job's working directory.
name
The name (and path) to use when installing the file. For parent-child input file patterns, this may include references to parenthesized portions of the matched file name. $1 refers to the first parenthesized group that was matched, $2 refers to the second, and so on. To get the whole name, use $0.
cache
Possible values are "yes" and "no". For batch workware and input_files, this is set to "yes" by default. In all other cases it is "no" by default. Note that caching only applies to the downloaded file or package. It does not affect how and where the file is installed. So for software which is installed in space that is shared from one job to the next, no caching of the downloaded package file is necessary.
unpack
Possible values are "yes" and "no". Software and batchware are unpacked by default; input files are not. Unpacking basically means untarring/unzipping, and running the install script if there is one in the unpacked package directory.
pre_installed_search_path
List of paths to search for an existing installation of this software. If there are multiple paths to search, they should be delimited by '|'. References to the environment of JugWorker may be made. For example: pre_installed_search_path=${MY_PACKAGE_PATH}.

If software is always pre-installed, you can simply give the installation path as the URL part of the software entry.

not_required
The specified command will be run and the exit status will determine whether this package will be downloaded and installed. A zero exit status causes the package to be skipped; a non-zero exit status causes the package to be installed. The program that does the test would typically be installed as part of another software package, which must therefore be listed before this one.
optional_input
Setting this attribute to 1 in an input file pattern will prevent it from being treated as an error when a job is created and the parent job has no output files matching the specified input file pattern.
run_wrapper
The specified command will be used as a "job wrapper" during the run stage. The command line arguments to the wrapper will be the job executable and arguments. If a relative path is specified for the run_wrapper, it will be resolved relative to the directory where the software package is installed.
md5
When an MD5 checksum is provided, the downloaded file is checked against this value. This is automatically done for storage transfers and when files are transmitted between parent-child jobs.
handle_urls
Any URLs matching the specified regular expression will be handled by this software package. Example: "{download_command=dccp;handle_urls=dcap:}http://path/to/dccp.tgz"
read_timeout
If no data is read for the specified amount of time (in seconds), the download will be aborted. If nothing is specified, the default is 10 minutes. For files downloaded via download_command, the default is half an hour.
download_command
This is used in conjunction with handle_urls to specify the command that should be used to invoke a file downloading software package. The first argument to the command will be the url and the second argument is the local filename to store it in.
download_command_read_timeout
Like read_timeout, but this applies, not to this package itself but to files being downloaded by this package (via download_command).
store_url_only
The file data will not be downloaded. Instead, the URL of the file that would have been downloaded is written into a local file with the same name as what would have been created. This may be useful for child jobs that need to know the location but do not need a local copy of the data from a parent job.
store_info
Information about the file is stored in another file. The name of the info file is whatever you specify as the value of this parameter, with any * replaced with the name of the data file. For example, 'store_info=*.info' would store the information for file 'X' in 'X.info'. If you specify only * as the name, then the file data will not be stored; only the file info will be stored. The information is stored in 'key = value' format, one line per item. The attributes stored in the file are name, size, md5 checksum, and url.
store_contents
If this is set to no, the file data will not be downloaded or installed. This may be useful in combination with store_info if you only want information about the file and do not want the file itself.
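For example, a sketch of a parent input file pattern for a child batch that wants only metadata about the parent's .ntpl output files, not the data itself (the parent batch name is a placeholder):

parent
    name = "simulate"
    input_files = "{store_info=*.info;store_contents=no}*.ntpl"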

Special Environment Variables

There are a few special environment variables supplied to jobs. These may also be referenced in the values of other environment variables or in the output_path attribute. To reference them, simply insert ${VAR_NAME} in the string where you want the value to be substituted. You may also use the syntax $VAR_NAME when the variable name happens to be delimited from the surrounding text by spaces or punctuation. If you do not want a $ expression to be evaluated, simply insert $$.

JUG_BATCH_ID
The unique ID number assigned by JugMaster to this batch.
JUG_JOB_ID
The unique ID number assigned by JugMaster to this job.
JUG_SEED
The seed number assigned to this job by JugMaster from the range you specified.

The following variables may be referenced but are not automatically assigned to an environment variable.

JUG_BATCH_NAME
The name of this batch, assigned by the user.

In addition to simple substitutions of these variables into other environment values, you may use them in expressions. For example, ${JUG_SEED*250} will be evaluated as the seed number multiplied by 250.
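For example, a sketch that derives a per-job value from the seed (the variable name EVENT_OFFSET is a placeholder):

environment =
    EVENT_OFFSET = "${JUG_SEED*250}"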

More Examples

The following examples may be helpful in showing the sort of thing that can be done with jug_submit. For information about how to actually run the jobs, see JugWorker.

Example 1

The following example reads the submit file cmkin.jug and creates 100 jobs, each with a different random seed.

jug_submit cmkin.jug dataset=qcd seed=12000 jobs=100 events_per_job=250

Here is an example submit file:

#################################################
#cmkin.jug: submit file for stage 1 simulation

#load general configuration information
include "site_config.jug" as site

param dataset
param seed
param jobs
param events_per_job = 100
param cmkin_package  = "CMKIN_1_2_0"
param cmkin_exe      = "kine_make_ntpl.exe"
param cmkin_storage_class = site.storage_class

batch
   name   = dataset + ".cmkin"
   source = site.site_name

   seed_low  = seed
   seed_high = seed + num(jobs)-1

   software =
      site.cms_sw_path + "/" + cmkin_package
      site.cms_sw_path + "/scripts"

   run_command = "run_cmkin"          #the script to run

   input_files =
      site.cms_db + "/" + dataset + "/cmkin.cards"

   environment =
      CMKIN_EXE      = cmkin_exe      #stage 1 simulator to run from the script
      CMKIN_RUN_SEED = "$JUG_SEED"    #the random seed
      CMKIN_EVENTS   = events_per_job #number of events to generate
      NTPL_NAME      = "$JUG_BATCH_NAME.$JUG_SEED.ntpl"  #output datafile name
      LOG_NAME       = "$JUG_BATCH_NAME.$JUG_JOB_ID.log" #log file name

   storage_class = cmkin_storage_class
#################################################

The file site_config.jug is used to provide some general configuration parameters in this example. It might look like this:

#################################################
#site_config.jug: site configuration file

site_name = "uwhep"

cms_sw_path = "/afs/hep.wisc.edu/cms/sw"

cms_db = "/afs/hep.wisc.edu/cms/cms_db"

storage_class = site_name
#################################################

Example 2

The following example creates a child batch that receives input files from the previous example.

#################################################
#oscar.jug: submit file for stage 2 simulation

#load general configuration information
include "site_config.jug" as site

param dataset
param oscar_package  = "OSCAR_2_4_5"
param oscar_exe      = "oscar"
param oscar_storage_class = site.storage_class

batch
   name   = dataset + ".oscar"
   source = site.site_name

   parent
      name = dataset + ".cmkin"
      input_files = "*.ntpl"

   software =
      site.cms_sw_path + "/" + oscar_package
      site.cms_sw_path + "/scripts"

   run_command = "run_oscar"    #the script to run

   input_files =
      site.cms_db + "/" + dataset + "/oscar.cards"

   environment =
      OSCAR_EXE  = oscar_exe    #stage 2 simulator to run from the script
      LOG_NAME   = "$JUG_BATCH_NAME.$JUG_JOB_ID.log" #the log file

   storage_class = oscar_storage_class
#################################################

A few words about file handling are in order. JugWorker looks for output files in the output directory contained in the job's working directory. Anything there (including sub-directories) gets registered as output and will be stored by the storage handler.

Files from the first batch ending in .ntpl become input files to the second batch of jobs. By default, this means that the input file is copied into the working directory of the second job.

How the files are stored and retrieved depends on the storage handler that is being used. A simple storage handler may store the files on disk and allow them to be retrieved via http from the Jug storage worker itself. If there are multiple storage workers, the files may even be scattered across multiple hosts. The database keeps track of the URL of each file so that the file can be read by the child job.

Example 3

This example combines the previous two examples by creating a submit file that invokes both the parent and child batches at once.

#################################################
#cmkin_to_oscar.jug: submit file for stage 1-2 simulation

param dataset
param seed
param jobs
param events_per_job = 100
param cmkin_package  = "CMKIN_1_2_0"
param cmkin_exe      = "kine_make_ntpl.exe"
param oscar_package  = "OSCAR_2_4_5"
param oscar_exe      = "oscar"

include "cmkin.jug"
   dataset        = dataset
   seed           = seed
   jobs           = jobs
   events_per_job = events_per_job
   cmkin_package  = cmkin_package
   cmkin_exe      = cmkin_exe

include "oscar.jug"
   dataset       = dataset
   oscar_package = oscar_package
   oscar_exe     = oscar_exe
#################################################