A queue worker works like a normal worker (i.e. JugWorker), but it runs queue jobs instead of job execution jobs. When there is work to do, the queue worker creates workers and submits them to a batch system. Jug comes with a queue management package that supports Condor, and additional packages may be easily added.
A queue worker class for running under Condor should have already been set up for you by default (see condor submission package). If you need to modify it or add new queue classes, you may do so.
Each queue worker belongs to a worker class, which is registered in the master database via jug_worker_setup. This class of workers selects jobs from one or more queue classes, where the "job" is to launch a worker of some other type. This is best illustrated by example, because the relationship between the various classes of work and classes of worker can be a little confusing at first.
Suppose you have batches of jobs that all have approximately the
same execution requirements, and which, therefore, have all been
created with their execution class set to the same thing, say,
"MonteCarlo". To run these jobs, you need to have a worker class that
has its job_selector set to match "MonteCarlo" and its worker type set
to "Execution". The name of the worker class is arbitrary; it could be
"MonteCarlo" or anything else, as long as the job_selector matches the
batch execution_class.
Once this class of workers has been defined (using jug_worker_setup),
you could simply create a worker of this type using jug_make_worker and
then you could directly run the worker or submit it to a batch system.
If you directly submit workers to a batch system, you probably will
let them retire when there is no more work to do (the default
behavior), so if new work is submitted to the Jug database and the
workers have already retired, you will need to submit more workers.
To automate that process, you use a queue worker. The queue worker
will automatically submit new job execution workers when there is work
for them to do.
To get this working, you simply need a queue worker class configured
via jug_worker_setup, and you need its job_selector to match the
queue_class that is configured for the workers that run the
"MonteCarlo" jobs. For example, the "MonteCarlo" workers could be
configured with queue_class="condor" and the queue worker could be
configured with job_selector="condor". Any special options (such as
memory requirements) that Condor needs to know about may be configured
in the queue_options of the "MonteCarlo" worker class.
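The relationship between the two worker classes can be sketched in a few lines of Python. This is only an illustrative model, not Jug's own data structures: the WorkerClass record and the launchers_for helper are hypothetical names, the fields simply mirror the settings described above, and the selector is compared by plain equality for simplicity.

```python
from dataclasses import dataclass


@dataclass
class WorkerClass:
    """Hypothetical record mirroring the worker class settings described above."""
    name: str
    worker_type: str         # "Execution" or "Queue"
    job_selector: str        # what workers of this class pick up
    queue_class: str = ""    # which queue workers may launch this class
    queue_options: str = ""  # opaque options handed to the batch system


def launchers_for(target, worker_classes):
    """Names of Queue worker classes whose job_selector equals target.queue_class."""
    return [
        w.name
        for w in worker_classes
        if w.worker_type == "Queue" and w.job_selector == target.queue_class
    ]


# Execution workers that run batches whose execution_class is "MonteCarlo".
monte_carlo = WorkerClass(
    name="MonteCarlo",
    worker_type="Execution",
    job_selector="MonteCarlo",
    queue_class="condor",
    queue_options="(placeholder for Condor-specific options)",
)

# Queue workers that launch any worker class with queue_class="condor".
condor_queue = WorkerClass(
    name="condor_queue",
    worker_type="Queue",
    job_selector="condor",
)

print(launchers_for(monte_carlo, [monte_carlo, condor_queue]))  # ['condor_queue']
```

The point of the model is that two separate matches are involved: the execution worker's job_selector against the batch execution_class, and the queue worker's job_selector against the execution worker's queue_class.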
Once you have the queue worker class configured, you can create
a queue worker (using jug_make_worker) and run it. For
scalability and reliability, you may even run multiple instances
of it at the same time. It is even possible to have one type of
queue worker create queue workers of another type, which in turn
create workers of yet another type.
Since job_selector may contain a comma-separated list
or even wildcards, there is a lot of flexibility in how you configure
things. You may have multiple types of queue workers that compete
with each other for the same classes of workers to run. You may also
have multiple types of workers that compete for the same classes of
batch jobs to run. For example, suppose you have access to a number
of clusters on a computing grid and suppose jobs that run at a given
site need to have some site-specific information injected into their
environment, so they know where to find things. If the method of job
submission is uniform across all the sites (e.g. Condor-G), you could
have a single queue worker class that selects from multiple job
execution worker classes, each representing a different execution
site. The site-specific information can be placed in the execution
worker's environment settings (with jug_worker_setup) and the information
about where to submit the jobs can be placed in its queue_options.
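To make the selector flexibility concrete, here is a minimal Python sketch of how a comma-separated selector with wildcards could be matched against class names. It assumes shell-style wildcards as implemented by Python's fnmatch module; the actual matching rules used by Jug may differ. The selector_matches function and the class names are hypothetical.

```python
from fnmatch import fnmatchcase


def selector_matches(job_selector, class_name):
    """True if class_name matches any pattern in a comma-separated selector.

    Assumption: patterns use shell-style wildcards (*, ?); Jug's own
    wildcard syntax is not specified here and may be different.
    """
    patterns = [p.strip() for p in job_selector.split(",") if p.strip()]
    return any(fnmatchcase(class_name, p) for p in patterns)


# Hypothetical queue_class values, one per execution site plus a local one.
queue_classes = ["condorg_site_a", "condorg_site_b", "local"]

# A single queue worker class selecting every Condor-G site with a wildcard.
print([c for c in queue_classes if selector_matches("condorg_*", c)])
# ['condorg_site_a', 'condorg_site_b']

# An explicit comma-separated list selecting two specific classes.
print([c for c in queue_classes if selector_matches("condorg_site_a, local", c)])
# ['condorg_site_a', 'local']
```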
To configure a worker class that selects jobs from a given queue class, use the tool jug_worker_setup. Example:
jug_worker_setup add \
    --worker_class=condor_queue \
    --worker_type=Queue \
    --job_selector=queue_class \
    --software=$JUG_SYS/sw_packages/submit_to_condor
Once you have created a queue worker class, you need to run queue workers of that class. Create a queue worker by doing something like this:
jug_make_worker --worker_class=condor_queue
Then you can start the worker by executing the Python script that
is generated or by using jug_daemon.
To write your own queue software, use
$JUG_SYS/sw_packages/submit_to_condor as a starting point. The
information you need will come in through the environment. You can
either run the job and wait for it to finish all in one script (the
run_command), or you can launch the job and periodically poll its
status using polling_command.
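The difference between the two styles can be illustrated with a generic Python sketch. It does not use Jug's interface: the function names are hypothetical, the "job" is a short stand-in subprocess rather than a real batch submission, and the environment variables that Jug passes in are not shown because their names are not given here.

```python
import subprocess
import sys
import time

# Stand-in for a real batch job: a child process that sleeps briefly.
JOB = [sys.executable, "-c", "import time; time.sleep(0.2)"]


def run_and_wait(cmd):
    """Style 1: launch the job and block until it finishes; return its exit code."""
    return subprocess.run(cmd).returncode


def launch(cmd):
    """Style 2, step 1: launch the job and return immediately with a handle."""
    return subprocess.Popen(cmd)


def poll_until_done(proc, interval=0.05, timeout=10.0):
    """Style 2, step 2: check the job periodically until it exits or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = proc.poll()  # None while the job is still running
        if code is not None:
            return code
        time.sleep(interval)
    proc.kill()
    proc.wait()
    raise TimeoutError("job did not finish in time")


if __name__ == "__main__":
    print("run and wait:", run_and_wait(JOB))
    print("launch and poll:", poll_until_done(launch(JOB)))
```

Running and waiting is simpler but ties up the script for the lifetime of the job; launching and polling lets the script return quickly and check on the job later.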