Managing the DAQ

The DAQ program is called Phoenix. It typically writes one 4.6 MB file per day, containing the profiles for one event, and writes a smaller amount of summary data for the profiles of all the day's events into the database.

It also writes a logfile named logfile.'dateformat1'_'dateformat2', where:

  • dateformat1 is "seconds in epoch", typically something like "1330958633"
  • dateformat2 is "yearmonthdayhourminutesecond", typically something like "20120305154353"

So a typical logfile is named something like "logfile.1330958633.20120305154353".
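As a sketch of how such a name breaks down, the shell function below splits a logfile name into its two date fields. The name parse_logfile_name is made up for illustration and is not part of Phoenix. The pattern above is written with an underscore while the example uses a dot, so the sketch accepts either separator rather than assuming one.

```sh
#!/bin/sh
# Hypothetical helper, not part of Phoenix: split a logfile name into
# "seconds in epoch" and "yearmonthdayhourminutesecond".
parse_logfile_name() {
    local name="${1##*/}"            # drop any leading directory
    local rest="${name#logfile.}"    # drop the fixed "logfile." prefix
    [ "$rest" != "$name" ] || return 1       # prefix was missing
    local epoch="${rest%%[._]*}"     # text before the first "." or "_"
    local stamp="${rest#*[._]}"      # text after the first "." or "_"
    [ "$epoch" != "$rest" ] || return 1      # separator was missing
    # Both fields must be all digits; the second must be 14 digits long.
    case "$epoch" in ''|*[!0-9]*) return 1 ;; esac
    case "$stamp" in *[!0-9]*) return 1 ;; esac
    [ "${#stamp}" -eq 14 ] || return 1
    echo "$epoch $stamp"
}

parse_logfile_name "logfile.1330958633.20120305154353"
# prints: 1330958633 20120305154353
```

In this example the two fields describe the same instant: 1330958633 is 14:43:53 UTC on 5 March 2012, and the second field reads 15:43:53, one hour ahead.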

The DAQ, if unattended, can and will use up its file quota. From time to time you should copy off the .root event files. The log files are useful for debugging problems, but are not otherwise used.

Starting the DAQ

  1. slogin -Y cmsusr1
  2. slogin -Y dcspcs2g19-36.cms
  3. sudo -u emudcops bash
  4. cd ~emudcops
  5. cd DCOPS_READOUT
  6. ./runPhoenix

You must be on the sudoers list for emudcops. So far this only includes James Bellinger, Xiaofeng Yang, and I think Oleg Prokofiev.

The DAQ takes about 15-20 minutes to initialize the system. It starts in the IDLE state, and you must tell it to RUN in one of three ways:

  1. Start it from the DCS console

OR

  1. source ~jnb/bypass/RUN

OR

  1. source ~jnb/bypass/setup.com
  2. ~jnb/bypass/bypass2
  3. RUN
  4. ZZZZ or whatever you please to exit the program

Starting DIM DNS

After a reboot!

The dim dns service does not come back automatically after a reboot.

  1. slogin -Y cmsusr1
  2. slogin -Y dcspcs2g19-36.cms
  3. sudo -u emudcops bash
  4. cd ~emudcops
  5. cd dim
  6. source setup.com
  7. ./linux/dns >& dns.log &

Management

The bypass2 program allows much more detailed control of the DAQ than DCS.

  • AREYOUALIVE Check that the DAQ is still responding
  • HALT PAUSE and re-initialize the program; same as STOP
  • STOP PAUSE and re-initialize the program; same as HALT
  • ABORT ABORT the current readout, PAUSE and re-initialize the program
  • STANDBY PAUSE the program, same as PAUSE
  • PAUSE PAUSE the program. Does not try to stop the current event
  • RUNONCE RUN for one readout event and then PAUSE; same as RUN ONCE
  • RUN ONCE RUN for one readout event and then PAUSE; same as RUNONCE
  • RUN Begin reading events. An event cycle takes about 20 minutes
  • RESET re-initialize the program; only good if PAUSE'd
  • SAVEPROFILES Write a root file for every event
  • DROPPROFILES Do not write a root file for every event. Dailies are still written
  • SETDCOPSOFF Mask the readout of the specified problematic DCOPS
  • SETDCOPSON Allow the readout of the specified DCOPS
  • THUMBNAILS Print histograms of the 4 profiles for the specified DCOPS. The next events will automatically print new ones. This is a very fast way of checking suspicious readouts without stopping the DAQ to run PickOne.

    From time to time (at least once a month) copy the .root profiles to some offline location and delete them from the online area. I have been putting these in /afs/cern.ch/cms/CAF/CMSALCA/ALCA_MUONALIGN/HWAlignment/Endcap/profileData
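The monthly copy-and-delete can be sketched as a small shell function. Everything about it is an assumption made for illustration: the name archive_root_files, the 30-day default age (chosen only because the text says "at least once a month"), and the source directory, which you must supply because the location of the online area is not given here. Each original is deleted only after its copy compares equal.

```sh
#!/bin/sh
# Hypothetical helper: copy old .root files from SRC to DEST, then delete
# each original only if the copy is byte-for-byte identical.
# Usage: archive_root_files SRC DEST [MIN_AGE_DAYS]
archive_root_files() {
    local src="$1" dest="$2" min_age_days="${3:-30}"
    mkdir -p "$dest" || return 1
    # -maxdepth 1: look only at files directly in SRC.
    # -mtime +N:   last modified more than N days ago, so files still
    #              being written today are left alone.
    find "$src" -maxdepth 1 -type f -name '*.root' \
         -mtime +"$min_age_days" -exec sh -c '
        dest="$1"; shift
        for f in "$@"; do
            if cp -p "$f" "$dest/" && cmp -s "$f" "$dest/${f##*/}"; then
                rm -f "$f"
            else
                echo "copy failed, keeping $f" >&2
            fi
        done
    ' sh "$dest" {} +
}

# Example call (the first path is a placeholder, not the real online area):
#   archive_root_files /path/to/online/profiles \
#       /afs/cern.ch/cms/CAF/CMSALCA/ALCA_MUONALIGN/HWAlignment/Endcap/profileData
```

Verifying with cmp before deleting costs one extra read per file, which seems a fair price given that the online copy is the only one until the transfer succeeds.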

    On each new event the bypass2 program writes a summary of the failures it knows about for all the SLMs and transfer lines. 0 is good. It has no useful information until it has received an event, so don't panic if you see lots of problems on startup. Each SLM or transfer line has 2 lasers, so there are two sets of numbers for each SLM.

    For example, if the MAB low voltages are off, the middle 4 columns in each of the XFRn blocks of numbers will be full of 6's. This does not diagnose bad profiles, merely readout issues.

Fixing Problems

    If you do not get any response from the DCS or the bypass2, the dim dns service is probably not running. See the instructions at Starting DIM DNS above.

    Check to see if the DAQ is active:

    1. source ~jnb/bypass/setup.com
    2. ~jnb/bypass/bypass2
    3. AREYOUALIVE
    It should return a status statement like:
    DAQ update at 1330962681	05-Mar-12 04.51.21 PM	 status=1=> current=1 target=0
    
    For your purposes the interesting part is the date. If it is the current date, then the DAQ is not hung, and will respond to PAUSE or other commands.
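The freshness check described above can be sketched in shell, assuming you have saved the line that AREYOUALIVE printed (for example by pasting it from the terminal). The function names and the one-hour default threshold are assumptions made for illustration and are not part of bypass2; the text only says that a current date means the DAQ is not hung.

```sh
#!/bin/sh
# Hypothetical helpers: judge how old a "DAQ update at ..." line is.

# Print the age in seconds of the status line in $1.
# $2 is "now" in epoch seconds (defaults to the current time).
daq_update_age() {
    local line="$1" now="${2:-$(date +%s)}" stamp
    # Pull out the epoch seconds that follow "DAQ update at";
    # if several lines match, use the last one.
    stamp=$(printf '%s\n' "$line" |
        sed -n 's/.*DAQ update at \([0-9][0-9]*\).*/\1/p' | tail -n 1)
    [ -n "$stamp" ] || return 1      # no status line found
    echo $(( $now - $stamp ))
}

# Succeed if the status line in $1 is at most $2 seconds old
# (default 3600, an arbitrary choice). $3 is an optional "now".
daq_is_fresh() {
    local age
    age=$(daq_update_age "$1" "$3") || return 1
    [ "$age" -le "${2:-3600}" ]
}

# Example:
#   daq_is_fresh "DAQ update at 1330962681 ..." || echo "DAQ may be hung"
```

Comparing the epoch field avoids having to parse the "05-Mar-12 04.51.21 PM" form, which carries the same instant in local time.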

    The terminal server is not 100% reliable, and sometimes a port will lock up. No further readouts on that port are possible until the terminal server has been rebooted. To recover, PAUSE the DAQ, wait until the event is finished reading out, and then kill the DAQ and reboot the terminal server:

    1. source ~jnb/bypass/setup.com
    2. ~jnb/bypass/bypass2
    3. PAUSE
    4. wait until the program starts writing out lines like "Loop 1330959204 05-Mar-12 03.53.24.000000 PM"
    5. ZZZZ or whatever you please to exit the program
    6. slogin -Y dcspcs2g19-36.cms
    7. sudo -u emudcops bash
    8. cd ~emudcops
    9. cd DCOPS_READOUT
    10. ./KILL.com
    11. ps auxww | grep emudcops
    12. run ./KILL.com again if there are still hung subprocesses
    13. telnet -l root alignterminal01.cms (or 02 if that's the offending one)
    14. password DCOPS
    15. reboot
    16. Wait 2-3 minutes for the terminal server to reboot. You can improve the shining hour by rebooting the other one as well, just for safety's sake.
    17. ./runPhoenix
    18. ~jnb/bypass/bypass2
    19. wait until the program starts writing out lines like "Loop 1330959204 05-Mar-12 03.53.24.000000 PM"
    20. RUN
    21. ZZZZ or whatever you please to exit the program