
One error in the DAQ is
commandRetrieveProfile: input fail/eof at data line 6
It is rare enough to be hard to catch, but here is one instance. This is the last line read, the one it failed on:
cRP: 0042;0070TT04A;0041;^M
The correct format is something like 0042;0041;0041;0043;^M, with 4 blocks of hex data, each terminated by a semicolon, and a newline at the end. Notice that in 0042;0070TT04A;0041; a command (70TT) has been interposed inside the body of the data. This appeared at line 6, which should be safely inside the stream of data from the DCOPS. This is clearly not due to running out of data; it is a readout glitch that overwrites part of the output stream. Perhaps this calls for a slightly longer delay after the wakeup TT commands?
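The format check described above can be sketched in a few lines of Python. This is only an illustration, not the DAQ code: the names is_valid_profile_line and find_embedded_command are hypothetical, and it assumes that the ^M shown in the log is a carriage return and that every field is exactly 4 hex digits, as in the example line.

```python
import re

# Assumed format, taken from the example line in the notes:
# four 4-digit hex fields, each ending in ';', then an optional carriage return.
LINE_RE = re.compile(r"(?:[0-9A-Fa-f]{4};){4}\r?")

def is_valid_profile_line(line: str) -> bool:
    """True if the whole line matches the assumed 4-field format."""
    return LINE_RE.fullmatch(line) is not None

def find_embedded_command(line: str) -> int:
    """Return the offset of a stray 'TT' in the line, or -1 if there is none."""
    return line.find("TT")

good = "0042;0041;0041;0043;\r"
bad = "0042;0070TT04A;0041;\r"
print(is_valid_profile_line(good), is_valid_profile_line(bad))
print(find_embedded_command(bad))
```

A check like this only flags the bad line; it cannot say what the overwritten data was.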
The other error (somewhat more frequent) is
commandRetrieveProfile: input fail/eof at data line 2047
This means that the data is somehow only 2047 lines long instead of 2048. The last line is
cRP: <052> (or 94, 72, 59, 47, or 45 in the samples I have). This is a little harder to interpret: is the first line lost, or the last one, or is one of the lines along the way missing a newline?
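One way to test the "missing newline" possibility, assuming the raw lines of a failed read were saved, is to look for a line that carries more than 4 semicolon-terminated fields. The helper below is a hypothetical sketch (find_merged_lines is not part of the DAQ code). It would not tell a lost first line from a lost last line.

```python
def find_merged_lines(lines, fields_per_line=4):
    """Return indices of lines holding more fields than expected.

    A line with extra fields is a candidate for two lines that were
    joined because a line terminator went missing.
    """
    suspects = []
    for i, line in enumerate(lines):
        # Each field ends in ';', so counting semicolons counts fields.
        n_fields = line.strip("\r\n").count(";")
        if n_fields > fields_per_line:
            suspects.append(i)
    return suspects

sample = [
    "0042;0041;0041;0043;\r",
    "0042;0041;0041;0043;0044;0045;0046;0047;\r",  # two lines run together
]
print(find_merged_lines(sample))
```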
Sometimes I also get a bad temperature: 44, 41, 42, 48 (or, of course, 999). Four times this happened just before a failed read (the 2047-line kind). Does this suggest that the temperature is being read from the wrong place?
Updated: 30-July
New failure, of the same class as before but less clear:
commandRetrieveProfile: input fail/eof at data line 4
cRP: 0041;004A;00h<93>¶^F^NFî¶Ö<98>8f<98>41;0048;0044;0047;^M
The bracketed quantities above are non-printable characters. The same sort of thing occurs later:
commandRetrieveProfile: input fail/eof at data line 6
cRP: 0048;42<86>^UÕ^F^NFζ^F^FFnnÖ<98>^XfÌ48;0049;0047;0043;^M
There were 24 leftover telnet processes from 23 and 24-July: something didn't clean up properly then. I don't know if this had any bearing, though I doubt it did.
Update: 30-July 11:40
Found a smoking gun. We have some corruption of the telnet reply. I can't safely untangle this. No robust software solution... I could interpolate...
0049;0046;0044;003D;^M
0048;0045;0043;003^E3&ªQyÌ49;0046;0044;003D;^M
0049;0046;0043;003D;^M
Update: 31-July
I increased the delay after the last TT from 0.5 to 0.8 seconds. It seems to have had little effect, but the statistics are small. Of the 7 failures, 3 are "unbracketed", in the sense that the noise either fills a line of data completely or leaves me unable to find semicolons on each side of it. That makes it difficult to determine how much data was lost. I still get some command data mixed in, despite the longer wait:
commandRetrieveProfile: input fail/eof at data line 5
cRP: 0040;0049;0035TT048;^M
Since this occurs in the first few lines, which take very little time (less than 1/100 second, ideally), I don't think the delay after the TT is actually doing anything. The fact that so much looks like pure garbage suggests noise. So far it seems to be random.
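The bracketed/unbracketed distinction could be applied automatically along the following lines. This is one possible reading of "unbracketed", not a definition taken from the DAQ code. The function classify_line is hypothetical, and it assumes that fields are 4 hex digits and that damage counts as bracketed when the line still starts with a good field and still ends with a semicolon.

```python
import re

HEX_FIELD = re.compile(r"[0-9A-Fa-f]{4}")

def classify_line(line: str, fields_per_line: int = 4) -> str:
    """Classify a data line as 'clean', 'bracketed', or 'unbracketed'."""
    body = line.rstrip("\r\n")
    # No closing semicolon: the damage runs off the end of the line.
    if not body.endswith(";"):
        return "unbracketed"
    fields = body[:-1].split(";")
    good = [HEX_FIELD.fullmatch(f) is not None for f in fields]
    if all(good) and len(fields) == fields_per_line:
        return "clean"
    # A good first field means there is a semicolon before the damage,
    # and the check above found one after it. A wrong field count with
    # all fields good also lands here.
    if good[0]:
        return "bracketed"
    return "unbracketed"

print(classify_line("0049;0046;0044;003D;\r"))
print(classify_line("0040;0049;0035TT048;\r"))
```

Only in the bracketed case is there a chance of estimating how many fields were lost, by comparing the number of surviving good fields with the expected 4 per line.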
Updated: I count 9 failures in 100 read cycles, where there are 180 DCOPS reads per cycle. That's 1 failure in 2000 reads, or 0.05%.
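For reference, the arithmetic behind that rate, using only the numbers quoted above (the function name failure_rate is just for illustration):

```python
def failure_rate(failures: int, cycles: int, reads_per_cycle: int):
    """Return (total reads, failures per read)."""
    total_reads = cycles * reads_per_cycle
    return total_reads, failures / total_reads

total, rate = failure_rate(failures=9, cycles=100, reads_per_cycle=180)
print(f"{total} reads, 1 failure in {total // 9}, rate {rate:.2%}")
```

With only 9 failures counted, the statistical uncertainty on this rate is large.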
Modified 31-July-2008 at 22:09
http://hep.physics.wisc.edu/~jnb/cms/29Jul2008