You must first find out the path to your files in UW-HEP dCache.
To do a bulk copy of all of the files in a dataset, it is easier to use dccp from the command line. To do this, log into the server where you have some scratch space to store the files (e.g. sesame). You may need to go through login.hep.wisc.edu if you are outside the hep.wisc.edu domain. Next, find the directory containing your files in /pnfs/hep.wisc.edu/.../dataset. (You will need to be on a computer with /pnfs/hep.wisc.edu mounted to do that.) Then copy the files like this:
source /afs/hep.wisc.edu/cms/cmsprod/setup.sh
dccp_many -r /pnfs/hep.wisc.edu/.../dataset /data/mydata/
dccp_many is simply a script that calls
dccp for each file in a list of files, since dccp itself
will only copy one file at a time.
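The per-file behavior described above can be approximated with a short loop. This is only a sketch and not the actual dccp_many script: the function name copy_each and the COPY_CMD variable are hypothetical, and it assumes a list file holding one source path per line. COPY_CMD defaults to dccp and can be set to echo to preview the commands without copying anything.

```shell
#!/bin/bash
# Sketch only: NOT the real dccp_many script.
# copy_each and COPY_CMD are hypothetical names used for illustration.

# Usage: copy_each LIST_FILE DEST_DIR
# Runs one copy command per path listed in LIST_FILE (one path per line).
copy_each() {
  local list="$1" dest="$2" path
  # Read the list on file descriptor 3 so the copy command keeps its own stdin.
  # The "|| [ -n ... ]" part handles a last line with no trailing newline.
  while IFS= read -r path <&3 || [ -n "$path" ]; do
    # Skip blank lines in the list.
    [ -n "$path" ] || continue
    # Fall back to dccp when COPY_CMD is unset; stop at the first failure.
    "${COPY_CMD:-dccp}" "$path" "$dest/" || return 1
  done 3< "$list"
}
```

Running it with COPY_CMD=echo first is a cheap way to check that the list and destination are what you expect.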
Yes. All data files are globally readable through dCache via several protocols: dcap, http, gsiftp, srm.
If you want to get a local copy of data files from UW-HEP, you just need a list of filenames. Once you have that, you may copy them. For example, to get the files via http, assuming you have the /pnfs paths stored in a file named "filenames":
# run bash if you are using some other shell
cd /local/path/to/store/files
for file in `cat filenames`; do
  wget http://cmsdcap.hep.wisc.edu:2244$file
done
You can also retrieve the files via the dcap protocol. For example:
dccp dcap://cmsdcap.hep.wisc.edu:22125/pnfs/hep.wisc.edu/cmsprod/test_file .
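Both examples above build a URL by putting a protocol, host, and port in front of the /pnfs path. A minimal sketch of that step follows. The helper name pnfs_url is hypothetical (it is not a site-provided tool); the host and port numbers are simply the ones quoted in the http and dcap examples.

```shell
#!/bin/bash
# Sketch only: pnfs_url is a hypothetical helper, not a site-provided tool.
# Host and ports are the ones quoted in the http and dcap examples.
DCACHE_HOST="cmsdcap.hep.wisc.edu"

# Usage: pnfs_url PROTOCOL /pnfs/path
# Prints the URL for the given protocol (http or dcap).
pnfs_url() {
  local proto="$1" path="$2"
  case "$proto" in
    http) printf 'http://%s:2244%s\n' "$DCACHE_HOST" "$path" ;;
    dcap) printf 'dcap://%s:22125%s\n' "$DCACHE_HOST" "$path" ;;
    *)    echo "unsupported protocol: $proto" >&2; return 1 ;;
  esac
}
```

The output can be handed directly to wget (for http) or dccp (for dcap).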
Yet another way is to use the SRM protocol. Example:
ls. However, it is
usually the case that your unix user id does not match the unix user
id that owns the files in dCache, so you have to go through an
additional step in order to do other operations, such as rm, mv,
mkdir, rmdir. Ideally, these operations could be done via SRM
commands. However, the current version of SRM does not allow this.
Therefore, the best solution is to submit file management commands
through the grid. Your grid credentials should map you to the same
user who owns the files in dCache, so you should be able to do any
namespace operations (such as rm, mv, mkdir, etc.).
For interactive file management, the simplest solution is to run an xterm
on the grid gatekeeper. Here is a script that does this for you:
An xterm should eventually pop up on your screen and you should find that
you are at a shell prompt as the user that your grid credentials are mapped
to. You can then cd into the pnfs directory where you want
to work and start running commands.
If you get errors about not being able to open the display, then you need to make sure your ssh client is forwarding an X session when you log in. For example, on a Mac running OS X, you need to do something like the following:
ssh -X user@login.hep.wisc.edu
grid-proxy-init
grid-xterm-uwhep
Access to some files seems to take forever. How can I tell if dCache is functioning properly?
Normally, our dCache service functions very well. However, we are still working on improving the service to overcome occasional difficulties. You can see the list of active transfers in the dCache server: cms-dcache.hep.wisc.edu:2288. In the far right column, you can see if dCache is having trouble accessing the file if it says "Staging" or "No Mover Found" instead of showing a transfer speed. These messages are expected for short periods of time in a heavily loaded system, but they should go away after a few minutes.
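If you save the transfer list as plain text, a small filter can pick out the rows showing either of the two trouble messages named above. This is a minimal sketch: the function name stuck_transfers is hypothetical, and it assumes one transfer per line, which may not match the actual page layout.

```shell
#!/bin/bash
# Sketch only: stuck_transfers is a hypothetical helper.
# Assumes the transfer list was saved as text, one transfer per line.

# Usage: stuck_transfers < saved_transfer_list.txt
# Prints lines containing either trouble message.
# Exit status is non-zero when no line matches.
stuck_transfers() {
  grep -E 'Staging|No Mover Found'
}
```

Rows that keep appearing in this output after a few minutes are the ones worth reporting.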
You can also test the ability to access individual files using dccp or any of the other file transfer mechanisms. See Copying Data Files.
The files being transferred are reported by pnfsid. If you need to
find out what the filename is, you may use the following command if
you are in the CMS AFS group and on any machine with
/pnfs/hep.wisc.edu mounted.
source /cms/cmsprod/setup.sh #initialize your environment
dcache_pnfs_pathfinder pnfsid
The files in dCache may also be accessed through xrootd, which offers very high throughput when this is required (e.g. serving pileup data for digitization at high luminosity).
If you know the /pnfs path to a file, you can have root read from
the file directly by prepending
dcap://cmsdcap.hep.wisc.edu:22125 to the file name that
you give to root.
Files in dCache larger than 2GB appear in /pnfs with size 1. This is due to
a limitation of the NFS protocol. To see the real size, you can use
srm-get-metadata. Example:
farmoutRandomSeedJobs jobName nEvents nEventsPerJob /path/to/CMSSW /path/to/configTemplate
There is an example configuration template here. Use the --help option to see all of the options.
This script will run cmsRun on root files in a directory or directory tree. By default, it runs on all root files in a directory in your /pnfs area, using the jobName that you specify to find the files. However, you can direct it to an alternate path and tell it to exclude root files with names matching a pattern that you specify.
For full options to the script, use the -h option.
Here is a brief synopsis:
farmoutAnalysisJobs [options] jobName /path/to/CMSSW /path/to/configTemplate
There is an example configuration template here.
mergeFiles [options] output_file.root input_directory(s)
Use mergeFiles -h for a full list of options.
You can submit your job from your working directory in AFS, but it is preferable to submit from a local disk, such as /scratch/username. If you don't explicitly provide the names of the input/output directories, then your submit directory will be assumed for all input/output operations involving relative paths.
If your output files are being written into AFS, you must make the directory writable by any process running on the machines where condor runs. This should only be done if absolutely necessary. AFS performance may suffer if hundreds of condor jobs all pound on it at the same time. This is also dangerous from a security standpoint, so do not do this on directories containing executables etc. Certainly do not do it on your home directory.
If you really must write to AFS from your Condor jobs, here is how
you must prepare the AFS directory:
mkdir /path/to/data
fs setacl -dir /path/to/data -acl condor-hosts rlidkw
If you are not using AFS to write output, you must enable Condor's file-transfer mechanism as in the example below.
For full details on how to submit jobs to Condor, see the Condor Manual or the Quick Start. Here is a simple example of a submit description file that you could use to submit a job from one of the login machines at the Wisconsin Tier-2:
Executable = /path/to/your/executable (ex: cmsRun)
Arguments = arg1 arg2 ...
GetEnv = true
Universe = Vanilla
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = inputfile1 inputfile2 ...
output = job.out
error = job.err
log = job.log
notification = never
on_exit_remove = (ExitBySignal == FALSE && ExitStatus == 0)
ImageSize = 900000
+DiskUsage = 2000000
Requirements = TARGET.HasAfs =?= True
Queue
(What is the meaning of the above variables and what do they do? See Explanation.)
Condor uses a fair sharing algorithm to distribute resources. Users who claim lots of resources gradually get less priority, so that others do not get starved for resources. In special cases, we may need to adjust priorities in order to get important work done on schedule.
Since your jobs may run anywhere on the Madison campus Condor grid,
your jobs may also be landing in "unfriendly" territory where they are
likely to be preempted after a short amount of time. If your job
needs a minimum of X time in order to get anything done and you don't
want to have it try to run on resources that can't guarantee that
amount of uninterrupted time, then you can specify this in the
requirements expression. Example:
requirements = (TARGET.MaxJobRetirementTime >= X)
where X is the number of seconds of runtime that your
job requires. Just be careful not to set this too high or you may not
find any matching resources. A reasonable value is one or two days.
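Since X is given in seconds, it is easy to slip by a factor when thinking in days. The sketch below converts a number of days to seconds and prints the corresponding requirements line. The function name retirement_requirement is hypothetical and is used only for illustration.

```shell
#!/bin/bash
# Sketch only: retirement_requirement is a hypothetical helper.

# Usage: retirement_requirement DAYS
# Prints a requirements line with DAYS converted to seconds.
retirement_requirement() {
  local days="$1"
  # 24 hours * 60 minutes * 60 seconds per day.
  local seconds=$(( days * 24 * 60 * 60 ))
  printf 'requirements = (TARGET.MaxJobRetirementTime >= %d)\n' "$seconds"
}
```

One day corresponds to 86400 seconds and two days to 172800 seconds.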
You can use condor_q -analyze
Jobs submitted to Condor at the Wisconsin Tier-2 may run on
resources distributed across the campus grid. It can take a few
minutes for the Condor negotiator to come around to your newly
submitted job and try finding a machine to run it on. If no machines
are immediately available, the job waits in the idle state ('I' in the
condor_q output).
To see how many machines could possibly run your job, you can use the following command:
condor_q -pool glow.cs.wisc.edu -analyze <jobid>
If your job requirements do not match very many machines, you can try to analyze the requirements:
condor_q -pool glow.cs.wisc.edu -better-analyze <jobid>
It may happen that your urgent jobs have no problem matching the requirements of lots of machines, but they are still idle due to machines being busy with other jobs. In this case, let us know and we can see if a priority adjustment would help.
WARNING: File /afs/hep.wisc.edu/user/blah/blah.out is not writable by condor.
WARNING: File /afs/hep.wisc.edu/user/blah/blah.error is not writable by condor.
The above indicates that the directory "blah" doesn't have write permission for condor-hosts. You really should avoid submitting jobs from AFS if at all possible. If you really must submit from AFS, see the recipe for setting up the ACLs on the AFS directory here.
Fermilab uses kerberos 5 to authenticate users. The default ssh
client at the Wisconsin Tier-2 is only able to handle kerberos 4.
However, a kerberos 5 enabled version of the ssh client is provided.
Example:
kinit fnal-username@FNAL.GOV
ssh-krb5 -2 fnal-username@cmsuaf.fnal.gov
Once you get connected, you will find that you have no AFS token or
other kerberos credential at Fermilab. If you run the kinit
above with the -f option, your credential will be forwarded
when connecting to some Fermilab computers. On others, however,
the login attempt hangs, so rather than using a
forwardable kerberos ticket, you may just have to authenticate again
(but this time from Fermilab):
kinit fnal-username@FNAL.GOV
Email: tier2-support@hep.wisc.edu