The JUG worker is a generic "agent" that asks the master for work, obtains necessary software/data, executes jobs, and sends back the results.
The files downloaded (aka staged-in) for a particular job fall into two basic types: software and rundata. Software is installed once and potentially shared by multiple jobs, while rundata is copied into the work space for each job individually (though it may still be cached to avoid downloading it every time).
Files may be downloaded by the worker individually, but software is usually installed in the form of a package, from which one or more files are extracted. For a description of the simple packaging format, see Packaging Software.
The JUG worker considers all jobs to consist of three main stages: stage-in, run, and stage-out. To make optimal use of system resources, jobs are "juggled" so that the stage-in of the next job happens while the current job is running (and similarly for stage-out).
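The overlap between stages can be illustrated with a small sketch. This is not the worker's actual implementation; it only assumes one thread per stage connected by queues, and the names `stage_loop` and `juggle` are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage to shut down


def stage_loop(name, handler, inbox, outbox, log):
    """Take jobs from inbox one at a time, process them, and pass them on."""
    while True:
        job = inbox.get()
        if job is _STOP:
            if outbox is not None:
                outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        handler(job)
        log.append((name, job))  # list.append is atomic in CPython
        if outbox is not None:
            outbox.put(job)


def juggle(jobs, stage_in, run, stage_out):
    """Push jobs through stage-in, run, and stage-out, one thread per stage."""
    log = []
    q_in, q_run, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage_loop,
                         args=("stage-in", stage_in, q_in, q_run, log)),
        threading.Thread(target=stage_loop,
                         args=("run", run, q_run, q_out, log)),
        threading.Thread(target=stage_loop,
                         args=("stage-out", stage_out, q_out, None, log)),
    ]
    for t in threads:
        t.start()
    for job in jobs:
        q_in.put(job)
    q_in.put(_STOP)
    for t in threads:
        t.join()
    return log
```

Because each stage has its own thread, the stage-in thread can start on the next job as soon as it has handed the current one to the run thread. Each job still passes through the three stages in order.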
The specific steps taken by the worker to run a job are listed below:
1. Run the stage_in_command, if there is one.
2. Run the run_command.
3. Run the polling_command until jug_polling.vars indicates that the job is finished. This is used with queue management software.
4. Run the stage_out_command, if there is one.
5. Hand off the files in the output directory for storage. Wait for these to be stored by a storage worker before "committing" the job, which marks it as done.
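The sequence of steps can be sketched roughly as follows. This is an illustration only, under several assumptions: the `Job` class, `execute_job`, and the `is_finished`, `store_output`, and `commit` callbacks are hypothetical names; `is_finished` stands in for inspecting jug_polling.vars, whose format is not described here; and the polling step is treated as optional.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

# Placeholder command that does nothing; stands in for real job commands.
NOOP = [sys.executable, "-c", "pass"]


@dataclass
class Job:
    """Hypothetical job record holding the commands named in the text."""
    run_command: List[str]
    stage_in_command: Optional[List[str]] = None
    polling_command: Optional[List[str]] = None
    stage_out_command: Optional[List[str]] = None


def execute_job(job: Job,
                is_finished: Callable[[], bool],
                store_output: Callable[[], None],
                commit: Callable[[], None],
                poll_interval: float = 1.0) -> List[str]:
    """Run one job and return the names of the steps that were performed."""
    steps = []
    if job.stage_in_command:
        subprocess.run(job.stage_in_command, check=True)
        steps.append("stage_in")
    subprocess.run(job.run_command, check=True)
    steps.append("run")
    if job.polling_command:
        while True:
            subprocess.run(job.polling_command, check=True)
            steps.append("poll")
            if is_finished():  # assumed stand-in for reading jug_polling.vars
                break
            time.sleep(poll_interval)
    if job.stage_out_command:
        subprocess.run(job.stage_out_command, check=True)
        steps.append("stage_out")
    store_output()  # assumed to block until the output has been stored
    steps.append("store")
    commit()        # only now is the job marked as done
    steps.append("commit")
    return steps
```

Committing only after storage has been confirmed means that a failure during stage-out or storage leaves the job uncommitted rather than marked as done with missing output.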
In the default configuration for job execution workers, only one job runs at a time, only one is staged in at a time, and only one is staged out at a time. Any of these stages may be configured to have multiple threads. That might be useful, for example, if the jobs being run by the worker are actually being submitted into some other processing queue. It may also be advantageous to give workers greater autonomy in the face of network and service outages by letting them pre-stage-in N jobs or queue up the stage-out of N jobs instead of just 1.
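The effect of these settings can be sketched with per-stage thread counts and a bounded queue between stage-in and run. This is a minimal illustration, not the worker's real configuration interface: `start_stage`, `run_pipeline`, `threads`, and `prestage` are hypothetical names chosen for the example.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a stage's threads to shut down


def start_stage(handler, inbox, outbox, threads=1):
    """Start `threads` workers that move jobs from inbox to outbox."""
    def loop():
        while True:
            job = inbox.get()
            if job is _STOP:
                inbox.put(_STOP)  # re-post so sibling threads also stop
                return
            outbox.put(handler(job))  # blocks while a bounded outbox is full

    workers = [threading.Thread(target=loop) for _ in range(threads)]
    for w in workers:
        w.start()
    return workers


def run_pipeline(jobs, stage_in, run, stage_out, threads=(1, 1, 1), prestage=1):
    """Run jobs through three stages with configurable thread counts."""
    to_stage_in = queue.Queue()
    # At most `prestage` staged-in jobs wait here; each stage-in thread may
    # additionally hold one finished job while blocked on a full queue.
    to_run = queue.Queue(maxsize=prestage)
    to_stage_out = queue.Queue()
    done = queue.Queue()
    stages = [
        (stage_in, to_stage_in, to_run, threads[0]),
        (run, to_run, to_stage_out, threads[1]),
        (stage_out, to_stage_out, done, threads[2]),
    ]
    groups = [start_stage(*stage) for stage in stages]
    for job in jobs:
        to_stage_in.put(job)
    to_stage_in.put(_STOP)
    for group, (_, _, outbox, _) in zip(groups, stages):
        for w in group:
            w.join()
        outbox.put(_STOP)  # this stage has drained; stop the next one
    results = []
    while True:
        item = done.get()
        if item is _STOP:
            break
        results.append(item)
    return results
```

With the defaults, each stage handles one job at a time. Raising `prestage` lets stage-in work further ahead of the running job, which is the kind of buffering that could help a worker ride out a short network or service outage. The queue feeding stage-out is left unbounded here for simplicity.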
The worker is configured using jug_worker_setup and instantiated with jug_make_worker or by the action of a queue manager.