cmsjug version 1.0

This is a collection of scripts used to run Monte-Carlo simulations of
the CMS experiment.  The jobs are all managed by JugMaster, from event
generation (cmkin) to simulation (oscar) to digitization (orca) to DST
production.  JugMaster provides job management features such as
drill-down error analysis, adjustable workload caps on different job
classes, multiple points of automated submission to the batch
system(s), dynamically expandable datasets (by simply increasing the
random seed range), pipelined workflow (co-scheduling data flow and cpu
usage), robust batch-aware storage services, and more.

You will need a working installation of JugMaster (v1.2 or higher):

  http://www.hep.wisc.edu/~dan/jug/

The following files must be configured for your site:

  setup.sh
  jug_include/site.config

Once you have done this, you may install the CMSJug web monitor:

  cgi/CMSJugCGI.py --install /path/to/cgi-bin/CMSJug.cgi

The configuration that we use at UW-HEP has the XML pool catalogs and
software readable from all worker nodes via AFS.  Data files reside in
dCache.  You could survive with the pool, software, and data files in
NFS as well.

Even better would be to remove the need for a shared filesystem
altogether.  Jug supports on-the-fly download and installation of
software tarballs, so this should not be hard to set up for all file
access except for the pool catalog, which needs to be handled
differently, since it is updated in some steps.

At UW-HEP, all updates to the pool catalog and metadata files are done
by a "pool update" service running on a single machine, where metadata
attachment and dataset initialization occur.  (This service is simply a
specific class of JugWorker specified in site.config.)  This guarantees
that only one process is modifying pool catalogs and metadata at any
time.  It also allows us to use a host-based AFS IP ACL to restrict
updates to the pool catalog area.  No other jobs need write access to
the pool area.

The software installation may be created from DAR or xcmsi.  Either
way, you will need to turn the installation into a jug software package
by simply adding a sub-directory named "package" containing a file
named "setup.sh" which initializes the environment for the package.
(A sketch of such a setup.sh appears below, after the submission
example.)

To create a batch of jobs, you do something like this:

  source cmsjug/setup.sh
  cd submit/template

  #The following should all be on one command line (newlines are only
  #for easy reading).  Since this gets awkward, see my_dataset.jug for
  #an example of how to do the same thing from within a submit file.

  jug_submit cmkin_to_digi_no_pu.jug
    start_seed=1230000
    events_per_job=500
    jobs=125
    dataset="test"
    cmkin_package="CMKIN_2_0_1"
    cmkin_rcfile="`pwd`/test.cards"
    cmkin_exe="kine_make_ntpl_pyt6220.exe"
    oscar_rcfile="`pwd`/oscar365.rc"
    oscar_package="OSCAR_3_6_5"
    oscar_name="oscar365"
    geometry_package="cms133"
    pool_template="MBmsel2_oscar365"
    digi_rcfile="`pwd`/digi871_no_pu.rc"
    orca_package="ORCA_8_7_1"
    digi_name="digi871_no_pu"

For that example to work, you'll need to install cmkin, oscar, and
orca.  You'll also need to create test.cards for use by cmkin.

If you haven't already done so, you may need to use jug_worker_setup to
configure Jug workers for the different execution and storage classes
you configured in site.config.  If you don't configure the needed
worker classes, jug_submit will give you an error message telling you
what is missing.
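Since the jug_submit invocation above really has to be issued as one
long command line, one convenient option (not part of cmsjug itself) is
to wrap it in a small shell script with backslash continuations.  The
following sketch simply repackages the exact example above; the script
name is made up, and the parameter values are the same placeholders:

  #!/bin/bash
  # submit_test_dataset.sh -- hypothetical wrapper around the jug_submit
  # example above.  Adjust the values for your own dataset.
  source cmsjug/setup.sh
  cd submit/template || exit 1

  jug_submit cmkin_to_digi_no_pu.jug \
      start_seed=1230000 \
      events_per_job=500 \
      jobs=125 \
      dataset="test" \
      cmkin_package="CMKIN_2_0_1" \
      cmkin_rcfile="`pwd`/test.cards" \
      cmkin_exe="kine_make_ntpl_pyt6220.exe" \
      oscar_rcfile="`pwd`/oscar365.rc" \
      oscar_package="OSCAR_3_6_5" \
      oscar_name="oscar365" \
      geometry_package="cms133" \
      pool_template="MBmsel2_oscar365" \
      digi_rcfile="`pwd`/digi871_no_pu.rc" \
      orca_package="ORCA_8_7_1" \
      digi_name="digi871_no_pu"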
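Similarly, here is a minimal sketch of the package/setup.sh file
mentioned earlier, which turns a DAR or xcmsi installation into a jug
software package.  The installation path and the "envsetup.sh" file
name are placeholders only; whatever environment initialization your
particular installation requires goes here.

  # package/setup.sh -- hypothetical sketch; the real contents depend on
  # how your DAR or xcmsi installation initializes its environment.

  # Path to the installation that this package wraps (placeholder).
  PKG_DIR=/afs/example.org/cms/sw/ORCA_8_7_1

  # Source the environment script shipped with the installation, if any
  # (the name "envsetup.sh" is illustrative, not prescribed by cmsjug).
  if [ -f "$PKG_DIR/envsetup.sh" ]; then
      . "$PKG_DIR/envsetup.sh"
  fi

  # Make the package's executables and libraries visible to jobs.
  PATH="$PKG_DIR/bin:$PATH"; export PATH
  LD_LIBRARY_PATH="$PKG_DIR/lib:$LD_LIBRARY_PATH"; export LD_LIBRARY_PATH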
The classes of Jug workers that you will need to configure are:

  storage        -- stores output
  pool           -- modifies the pool catalog and metadata
  execution      -- runs any mostly-cpu job (cmkin, oscar, digi with no pileup)
  execution.digi -- runs digitization with pileup (heavy I/O)
  execution.dst  -- runs DST jobs (heavy I/O)

An example storage setup would be something like this:

  jug_worker_setup add \
    --worker_class=uwhep.storage \
    --worker_type=Storage --job_selector=uwhep \
    --base_output_path=/pnfs/hep.wisc.edu/uwhep1/output \
    --software="
      /afs/hep.wisc.edu/cms/sw/dar/jug_sw_packages/dccp.tgz
      /afs/hep.wisc.edu/cms/sw/dar/jug_sw_packages/store_to_disk.tgz" \
    --environment="
      CP_COMMAND=dccp -d 2
      BASE_URL=dcap://cms-dcache.hep.wisc.edu:22125"

The pool worker is a single worker that runs in whatever environment is
necessary to be able to update the pool files.  The execution_class of
this worker should be the same as the pool execution_class configured
in site.config.  Example:

  jug_worker_setup \
    --worker_class=uwhep.pool \
    --worker_type=Execution \
    --job_selector=uwhep.pool \
    --runtime_options="
      no_failure_cleanup=1
      stay_alive=1
      max_queue_depth=1"

For the rest of the execution classes, you could simply configure a
single worker class that is submitted to a batch system such as Condor.
Example:

  jug_worker_setup \
    --worker_class=uwhep.exec \
    --worker_type=Execution \
    --job_selector=uwhep,uwhep.digi,uwhep.dst --queue_class=condor \
    --queue_options="
      minMemory=250
      minDisk=3000"

If you wish to limit the number of simultaneously running jobs of any
particular class, you can do so with jug_workload_constraint.  For
example:

  jug_workload_constraint add \
    --name=uwhep.digi \
    --job_selector=uwhep.digi \
    --max_assigned=100

--
Dan Bradley
http://www.hep.wisc.edu/~dan/