cmsjug version 1.0

This is a collection of scripts used to run Monte-Carlo simulations of
the CMS experiment.  The jobs are all managed by JugMaster, from event
generation (cmkin) to simulation (oscar) to digitization (orca) to DST
production.  JugMaster provides job management features such as
drill-down error analysis, adjustable workload caps on different job
classes, multiple points of automated submission to the batch
system(s), dynamically expandable datasets (by simply increasing the
random seed range), pipelined workflow (co-scheduling data flow and cpu
usage), robust batch-aware storage services, and more.

You will need a working installation of JugMaster (v1.2 or higher):

  http://www.hep.wisc.edu/~dan/jug/

The following files must be configured for your site:

  setup.sh
  jug_include/site.config

Once you have done this, you may install the CMSJug web monitor:

  cgi/CMSJugCGI.py --install /path/to/cgi-bin/CMSJug.cgi

The configuration that we use at UW-HEP has the XML pool catalogs and
software readable from all worker nodes via AFS.  Data files reside in
dCache.  You could survive with the pool, software, and data files in
NFS as well.

Even better would be to remove the need for a shared filesystem
altogether.  Jug supports on-the-fly download and installation of
software tarballs, so this should not be hard to set up for all file
access except for the pool catalog, which needs to be handled
differently, since it is updated in some steps.

At UW-HEP, all updates to the pool catalog and metadata files are done
by a "pool update" service running on a single machine, where metadata
attachment and dataset initialization occur.  (This service is simply a
specific class of JugWorker specified in site.config.)  This guarantees
that only one process is modifying pool catalogs and metadata at any
time.  It also allows us to use a host-based AFS IP ACL to restrict
updates to the pool catalog area.  No other jobs need write access to
the pool area.

The software installation may be created from DAR or xcmsi.  Either
way, you will need to turn the installation into a jug software package
by simply adding a sub-directory named "package" containing a file
named "setup.sh" which initializes the environment for the package.
(A sketch of such a setup.sh appears below, after the submission
example.)

To create a batch of jobs, you do something like this:

  source cmsjug/setup.sh
  cd submit/template

  #The following should all be on one command line (newlines are only
  #for easy reading).  Since this gets awkward, see my_dataset.jug for
  #an example of how to do the same thing from within a submit file.

  jug_submit cmkin_to_digi_no_pu.jug
    start_seed=1230000
    events_per_job=500
    jobs=125
    dataset="test"
    cmkin_package="CMKIN_2_0_1"
    cmkin_rcfile="`pwd`/test.cards"
    cmkin_exe="kine_make_ntpl_pyt6220.exe"
    oscar_rcfile="`pwd`/oscar365.rc"
    oscar_package="OSCAR_3_6_5"
    oscar_name="oscar365"
    geometry_package="cms133"
    pool_template="MBmsel2_oscar365"
    digi_rcfile="`pwd`/digi871_no_pu.rc"
    orca_package="ORCA_8_7_1"
    digi_name="digi871_no_pu"

For that example to work, you'll need to install cmkin, oscar, and
orca.  You'll also need to create test.cards for use by cmkin.

If you haven't already done so, you may need to use jug_worker_setup to
configure Jug workers for the different execution and storage classes
you configured in site.config.  If you don't configure the needed
worker classes, jug_submit will give you an error message telling you
what is missing.
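Since the jug_submit invocation above really has to be issued as one
long command line, one convenient option (not part of cmsjug itself) is
to wrap it in a small shell script with backslash continuations.  The
following sketch simply repackages the exact example above; the script
name is made up, and the parameter values are the same placeholders:

  #!/bin/bash
  # submit_test_dataset.sh -- hypothetical wrapper around the jug_submit
  # example above.  Adjust the values for your own dataset.
  source cmsjug/setup.sh
  cd submit/template || exit 1

  jug_submit cmkin_to_digi_no_pu.jug \
      start_seed=1230000 \
      events_per_job=500 \
      jobs=125 \
      dataset="test" \
      cmkin_package="CMKIN_2_0_1" \
      cmkin_rcfile="`pwd`/test.cards" \
      cmkin_exe="kine_make_ntpl_pyt6220.exe" \
      oscar_rcfile="`pwd`/oscar365.rc" \
      oscar_package="OSCAR_3_6_5" \
      oscar_name="oscar365" \
      geometry_package="cms133" \
      pool_template="MBmsel2_oscar365" \
      digi_rcfile="`pwd`/digi871_no_pu.rc" \
      orca_package="ORCA_8_7_1" \
      digi_name="digi871_no_pu"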
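Similarly, here is a minimal sketch of the package/setup.sh file
mentioned earlier, which turns a DAR or xcmsi installation into a jug
software package.  The installation path and the "envsetup.sh" file
name are placeholders only; whatever environment initialization your
particular installation requires goes here.

  # package/setup.sh -- hypothetical sketch; the real contents depend on
  # how your DAR or xcmsi installation initializes its environment.

  # Path to the installation that this package wraps (placeholder).
  PKG_DIR=/afs/example.org/cms/sw/ORCA_8_7_1

  # Source the environment script shipped with the installation, if any
  # (the name "envsetup.sh" is illustrative, not prescribed by cmsjug).
  if [ -f "$PKG_DIR/envsetup.sh" ]; then
      . "$PKG_DIR/envsetup.sh"
  fi

  # Make the package's executables and libraries visible to jobs.
  PATH="$PKG_DIR/bin:$PATH"; export PATH
  LD_LIBRARY_PATH="$PKG_DIR/lib:$LD_LIBRARY_PATH"; export LD_LIBRARY_PATH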
The classes of Jug workers that you will need to configure are:

  storage        -- stores output
  pool           -- modifies the pool catalog and metadata
  execution      -- runs any mostly-cpu job (cmkin, oscar, digi with no pileup)
  execution.digi -- runs digitization with pileup (heavy I/O)
  execution.dst  -- runs DST jobs (heavy I/O)

An example storage setup would be something like this:

  jug_worker_setup add \
    --worker_class=uwhep.storage \
    --worker_type=Storage --job_selector=uwhep \
    --base_output_path=/pnfs/hep.wisc.edu/uwhep1/output \
    --software="
      /afs/hep.wisc.edu/cms/sw/dar/jug_sw_packages/dccp.tgz
      /afs/hep.wisc.edu/cms/sw/dar/jug_sw_packages/store_to_disk.tgz" \
    --environment="
      CP_COMMAND=dccp -d 2
      BASE_URL=dcap://cms-dcache.hep.wisc.edu:22125"

The pool worker is a single worker that runs in whatever environment is
necessary to be able to update the pool files.  The execution_class of
this worker should be the same as the pool execution_class configured
in site.config.  Example:

  jug_worker_setup \
    --worker_class=uwhep.pool \
    --worker_type=Execution \
    --job_selector=uwhep.pool \
    --runtime_options="
      no_failure_cleanup=1
      stay_alive=1
      max_queue_depth=1"

For the rest of the execution classes, you could simply configure a
single worker class that is submitted to a batch system such as Condor.
Example:

  jug_worker_setup \
    --worker_class=uwhep.exec \
    --worker_type=Execution \
    --job_selector=uwhep,uwhep.digi,uwhep.dst --queue_class=condor \
    --queue_options="
      minMemory=250
      minDisk=3000"

If you wish to limit the number of simultaneously running jobs of any
particular class, you can do so with jug_workload_constraint.  For
example:

  jug_workload_constraint add \
    --name=uwhep.digi \
    --job_selector=uwhep.digi \
    --max_assigned=100

--
Dan Bradley
http://www.hep.wisc.edu/~dan/