Jug Queue Workers

A queue worker is the same as a normal worker (i.e. a JugWorker), but it runs queue jobs rather than ordinary job execution jobs. When there is work to do, the queue worker creates workers and submits them to a batch system. Jug comes with a queue management package that supports Condor, and additional packages may easily be added.

Setup

A queue worker class for running under Condor should already have been set up for you by default (see the Condor submission package). If you need to modify it or add new queue classes, you may do so.

Each queue worker belongs to a worker class, which is registered in the master database via jug_worker_setup. This class of workers selects jobs from one or more queue classes, where the "job" is to launch a worker of some other type. This is best illustrated by example, because the relationship between the various classes of work and classes of worker can be a little confusing at first.

Suppose you have batches of jobs that all have approximately the same execution requirements and which, therefore, have all been created with their execution_class set to the same value, say "MonteCarlo". To run these jobs, you need a worker class that has its job_selector set to match "MonteCarlo" and its worker type set to "Execution". The name of the worker class is arbitrary; it could be "MonteCarlo" or anything else, as long as the job_selector matches the batch execution_class.
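
As a sketch, such a class might be registered like this (the flag names follow the jug_worker_setup example shown later in this section; a --software option pointing at the package that actually runs the jobs would normally be included as well, but is omitted here):

jug_worker_setup add \
  --worker_class=MonteCarlo \
  --worker_type=Execution \
  --job_selector=MonteCarlo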

Once this class of workers has been defined (using jug_worker_setup), you could simply create a worker of this type using jug_make_worker and then run the worker directly or submit it to a batch system. If you submit workers to a batch system yourself, you will probably let them retire when there is no more work to do (the default behavior), so if new work arrives in the Jug database after the workers have retired, you will need to submit more workers. To automate that process, you use a queue worker. The queue worker automatically submits new job execution workers whenever there is work for them to do.
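
For example, mirroring the jug_make_worker invocation shown near the end of this section (the worker class name comes from the sketch above):

jug_make_worker --worker_class=MonteCarlo

The generated worker script can then be run by hand or submitted to the batch system.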

To get this working, you need a queue worker class configured via jug_worker_setup, with its job_selector set to match the queue_class configured for the workers that run the "MonteCarlo" jobs. For example, the "MonteCarlo" workers could be configured with queue_class="condor" and the queue worker with job_selector="condor". Any special options (such as memory requirements) that Condor needs to know about may be configured in the "MonteCarlo" worker class's queue_options.
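
Extending the earlier "MonteCarlo" sketch, the full registration might then look like this (the --queue_class and --queue_options flag spellings, and the request_memory value, are assumptions made for illustration; check jug_worker_setup's help for the real ones):

# --queue_class and --queue_options below are assumed flag spellings
jug_worker_setup add \
  --worker_class=MonteCarlo \
  --worker_type=Execution \
  --job_selector=MonteCarlo \
  --queue_class=condor \
  --queue_options="request_memory = 2048"

A queue worker class with job_selector set to "condor" (see the example below) would then pick up the batch-submission jobs for these workers.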

Once you have the queue worker class configured, you can create a queue worker (using jug_make_worker) and run it. For scalability and reliability, you may even run multiple instances of it at the same time. It is even possible to have one type of queue worker create queue workers of another type, which in turn create workers of yet another type.

Since job_selector may contain a comma-separated list or even wildcards, there is a lot of flexibility in how you configure things. You may have multiple types of queue workers that compete with each other for the same classes of workers to run. You may also have multiple types of workers that compete for the same classes of batch jobs to run. For example, suppose you have access to a number of clusters on a computing grid, and suppose jobs that run at a given site need some site-specific information injected into their environment so they know where to find things. If the method of job submission is uniform across all the sites (e.g. Condor-G), you could have a single queue worker class that selects from multiple job execution worker classes, each representing a different execution site. The site-specific information can be placed in the execution worker's environment settings (with jug_worker_setup), and the information about where to submit the jobs can be placed in its queue_options.
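
For instance, if the two sites' execution worker classes were given queue_class values of site_a_condor and site_b_condor (illustrative names), a single queue worker class could select both with a comma-separated job_selector:

jug_worker_setup add \
  --worker_class=grid_queue \
  --worker_type=Queue \
  --job_selector="site_a_condor,site_b_condor" \
  --software=$JUG_SYS/sw_packages/submit_to_condor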

To configure a worker class that selects jobs from a given queue class, use the tool jug_worker_setup. Example:

jug_worker_setup add \
  --worker_class=condor_queue \
  --worker_type=Queue \
  --job_selector=queue_class \
  --software=$JUG_SYS/sw_packages/submit_to_condor

Once you have created a queue worker class, you need to run the queue workers. Create a queue worker by doing something like this:

jug_make_worker --worker_class=condor_queue

Then you can start the worker by executing the Python script that is generated or by using jug_daemon.

To write your own queue software, use $JUG_SYS/sw_packages/submit_to_condor as a starting point. The information you need will come in through the environment. You can either run the job and wait for it to finish all in one script (the run_command), or you can launch the job and periodically poll its status using polling_command.
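
As a rough illustration, a run_command-style script might look like the sketch below, assuming the worker script path and batch options arrive in environment variables (JUG_WORKER_SCRIPT and JUG_QUEUE_OPTIONS are hypothetical names used only for this sketch; take the real variable names from submit_to_condor):

#!/bin/sh
# Sketch of a run_command: submit the generated worker script to Condor
# and block until it finishes.  JUG_WORKER_SCRIPT and JUG_QUEUE_OPTIONS
# are hypothetical variable names; take the real ones from the
# submit_to_condor package.

cat > worker.submit <<EOF
executable = $JUG_WORKER_SCRIPT
universe   = vanilla
log        = worker.log
$JUG_QUEUE_OPTIONS
queue
EOF

condor_submit worker.submit
condor_wait worker.log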