Author: Dan Bradley
Last Updated: 2007-11-19
This is a collection of useful patches that I happen to be aware of for condor.pm, the Globus jobmanager for Condor. This file is found in $OSG_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/.
The most famous condor.pm hack in OSG is NFSLite, developed by Terrence Martin at UCSD. It is a relatively small patch that turns on Condor file transfer mode in order to reduce use of the NFS server. The standard input/output, user proxy, and files in the job's GRAM scratch directory are copied to/from the job's temporary scratch directory on the worker node.
NFSLite is currently available as a VDT package. There is further documentation here.
Why would you want to have a wrapper script start the user job? One reason is to have the environment variable OSG_WN_TMP set equal to the value of _CONDOR_SCRATCH_DIR. Then when the job runs, it can do its scratch work in the temporary directory created by Condor for the job. The advantage of this is that Condor automatically cleans up the contents of this directory if the job leaves anything behind.
One way to achieve this is to configure the worker nodes with a USER_JOB_WRAPPER. However, at Wisconsin, we flock OSG jobs to several Condor pools, and we don't want to make OSG-specific modifications to the configuration of these other nodes if at all possible. Therefore, this small hack to condor.pm runs a wrapper script that requires no configuration or file installation on the worker nodes.
The wrapper script itself is here
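For readers without access to the linked script, the idea can be sketched roughly as follows. This is not the author's script: the function name set_osg_wn_tmp is made up for illustration, and the sketch assumes the wrapper receives the real job command line as its arguments.

```sh
#!/bin/sh
# Hypothetical wrapper sketch (not the script linked above).
# Assumed usage: wrapper.sh <job executable> [job arguments...]

# Point OSG_WN_TMP at the per-job scratch directory that Condor creates
# (and cleans up afterwards). If _CONDOR_SCRATCH_DIR is not set, for
# example when the script is run by hand, leave the environment untouched.
set_osg_wn_tmp() {
    if [ -n "${_CONDOR_SCRATCH_DIR:-}" ]; then
        OSG_WN_TMP="$_CONDOR_SCRATCH_DIR"
        export OSG_WN_TMP
    fi
}

set_osg_wn_tmp

# Replace the wrapper process with the real job, so that signals and the
# exit status belong to the job itself. Do nothing if no command was given.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

Using exec rather than running the job as a child keeps the wrapper out of the process tree, so Condor sees the job's own exit code.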
The modification to condor.pm should be inserted in the section of condor.pm where the Condor submit file is being created. I put it after the line that sets X509UserProxy.
Having OSG VO information in the job ClassAd is useful in a number of ways. For example, you can write machine RANK expressions that favor some OSG VOs over others.
Here are the lines to add to condor.pm. I put them after the line that sets X509UserProxy.
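As an illustration of the RANK use case mentioned above, a machine-side configuration might look something like the following. This is only a sketch: OSG_VO is a hypothetical attribute name standing in for whatever attribute the patched condor.pm actually inserts, and the VO names and weights are arbitrary example values.

```
# Machine-side Condor configuration (sketch).
# OSG_VO is a hypothetical job ClassAd attribute name; substitute the
# attribute that your patched condor.pm actually adds to the job.
# =?= yields FALSE rather than UNDEFINED for jobs that lack the attribute,
# so such jobs simply get a rank of 0.
RANK = ((TARGET.OSG_VO =?= "cms") * 10) + (TARGET.OSG_VO =?= "atlas")
```

With this expression a machine prefers jobs from the first VO, then jobs from the second, then everything else.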
By default, the Globus jobmanager for Condor inserts requirements that keep jobs from running on both 32-bit and 64-bit systems, so each job is limited to one of the two. A simple modification will allow jobs to run on both types (assuming all jobs are actually 32-bit and all 64-bit systems have the necessary 32-bit compatibility libraries).
The modification is to comment out the following line: