Jug Storage Workers

A storage worker is an ordinary worker (a JugWorker) that runs storage jobs. A storage job can be as simple as moving a file into a directory. Like the software for any other job run by JugWorker, the storage software can be installed on the fly from a package, so Jug can interface with whatever storage system you want.

Setup

Each storage worker belongs to a worker class, which is registered in the master database via jug_worker_setup. A worker class selects jobs from one or more storage classes. This lets the output of one set of job batches be handled one way while the output of another set is handled differently (e.g. on different machines or with different storage software).

To configure a worker class that selects jobs from a given storage class, use the tool jug_worker_setup. Example:

jug_worker_setup add \
  --worker_class=store_to_disk \
  --worker_type=Storage \
  --job_selector=storage_class \
  --software=$JUG_SYS/sw_packages/store_to_disk \
  --base_output_path=/data/output

Once you have created the worker class, you need to run storage workers for it. Create a storage worker with a command like this:

jug_make_worker --worker_class=store_to_disk

Then start the worker either by running the generated Python script or by using jug_daemon.

To write your own storage software, use $JUG_SYS/sw_packages/store_to_disk as a starting point. The software receives the information it needs through environment variables, and it must write the location of the stored file to a file named jug_storage.log.
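As a rough illustration of that contract (inputs from the environment, result location written to jug_storage.log), here is a minimal Python sketch. The variable names JUG_OUTPUT_FILE, JUG_BASE_OUTPUT_PATH and JUG_HOST, and the one-line "host:/path" log format, are assumptions made for this example only; consult the store_to_disk package for the names and format Jug actually uses.

```python
import shutil
from pathlib import Path


def store_file(env, log_name="jug_storage.log"):
    """Move one output file into the storage area and log where it went.

    `env` is any mapping of environment variables, e.g. os.environ.
    """
    # Hypothetical variable names, not taken from Jug itself.
    src = Path(env["JUG_OUTPUT_FILE"])
    dest_dir = Path(env["JUG_BASE_OUTPUT_PATH"])
    host = env.get("JUG_HOST", "localhost")

    # Create the storage directory if needed, then move the file into it.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.move(str(src), str(dest))

    # Assumed log format: one "host:/absolute/path" entry per line.
    with open(log_name, "a") as log:
        log.write(f"{host}:{dest.resolve()}\n")
    return dest
```

A real storage script would call store_file(os.environ) and exit with a non-zero status if the move fails.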

By default, the storage worker also provides remote access to files in the output area, provided the storage log contains a host-file-name (HFN) path to the file. This is convenient, for example, with parent/child jobs, where the output of one job becomes the input of another: the HFN for the file is automatically translated into a URL on the storage worker. To use a different file server, have the storage log specify some other URL for the file.
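The HFN-to-URL translation can be pictured with the following sketch. It is not Jug's implementation: it assumes an HFN of the form "host:/absolute/path", assumes the worker serves files relative to its base output path, and uses a made-up worker URL. The function name hfn_to_url is likewise hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def hfn_to_url(hfn, base_output_path, worker_url):
    """Map an assumed 'host:/abs/path' HFN to a URL under worker_url."""
    _host, sep, path = hfn.partition(":")
    if not sep or not path.startswith("/"):
        raise ValueError(f"not an HFN of the assumed form: {hfn!r}")

    # relative_to raises ValueError if the file lies outside the output area,
    # so only files under base_output_path are ever exposed.
    relative = PurePosixPath(path).relative_to(PurePosixPath(base_output_path))

    # Percent-encode unusual characters but keep '/' as the path separator.
    return worker_url.rstrip("/") + "/" + quote(relative.as_posix())
```

For instance, with a base output path of /data/output and a hypothetical worker URL of http://node1:8080/, the HFN node1:/data/output/batch1/result.dat would map to http://node1:8080/batch1/result.dat.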