2004 CDF E-Log -- Owl shift. Sun Feb 29, 2004
| SciCo | DAQ Ace | Monitoring Ace | CO | (Operations Manager) |
| S.Miscetti | A.Lister | I.Vollrath | B.Mohr | JJ Schmidt |
Start of Shift Notes:  Store 3251 still running at 1.56*10^31
normal data taking in progress.
Sun Feb 29 01:16:46
Run 179480
ACTIVE: L2 DTO b0fcal01: shepherded once, no luck, halted run, shepherded again then was OK. - alison X 2080
Sun Feb 29 01:20:15



- plots (23:15-01:00) --ian
Sun Feb 29 01:47:20
Run 179480
ACTIVE: b0dap84 ConsumerError CER_SVXMON_HALT_RECOVER_RUN_ERROR
Stuck cell S/B1/W4/L4/C7-13
HRR solved the problem - alison X 2080
Sun Feb 29 01:51:13
pinkie in imon (related to above entry by alison)
- ian
-- Sun Feb 29 02:17:27 comment by...ian -- informed si pager carrier about this ... current is remaining steady at 1025
Sun Feb 29 01:56:45
Side note:
When entering comments in the e-log straight from run control
(parameters ->Add Comment to e-log and Run Database)
if you try to copy-paste (i.e. select area, middle click) from either a popup window or from
the error display, you get only one line "anadi x2080".
Despite Anadi being a very nice person I don't think she is the cause for all the problems we have
:-)
Any ideas? I am sure this worked last week...
I have just done a ktickets in order to see the shepherding messages
(which is what you have to do to get the window back to "normal", no need to restart RC or anything,
it just stops displaying messages when kticket has expired, Bill B informed me of this the other
day)
but I don't think that's the cause of this problem... but you never know...
Middle clicking in the web browser works fine...
- alison
Sun Feb 29 03:07:10



- plots (01:00-03:00) --ian
Sun Feb 29 05:06:19



- plots (03:00-05:00) --ian
Sun Feb 29 05:46:34
Around 4:40 am the monitoring processes got stuck while
the DAQ ace was reporting a problem
detected by PROCMON. While trying to restart the
calib_consumers we realized that node b0dap83 was
not responding. We went to check the situation
on the third floor and found nodes b0dap79-84 off.
We paged DAQ experts and then the online sys.administrators.
After discussing with them we realized that the
5 nodes were connected to a normal plug which
had faulted, and we just reset it.
Now the machines are back. DAQ Ace is following
the recovery procedure.
- S.Miscetti
-- Sun Feb 29 09:28:16 comment by...jj -- I have sent email to Dervin asking that this rack be
checked for power problems.
Sun Feb 29 06:29:06
Startup of the Proxies and the Consumer State Manager:
Following the instructions on the web did not work.
The processes simply didn't start.
Jane Nachtman (DAQ expert) looked into the logfiles for the Calib Consumer proxy and found that the
rtserver still thought it was connected (i.e. the SmartSockets connections were not closed when the
computers crashed, almost as bad as OPC communication to be used at LHC). Restarting the rtserver
closed all the "open ended" connections and we were able to restart almost all the processes:
Calib Consumer Proxy
Soft EVD Proxy
ResMgr Proxy
and Consumer Monitor Proxy
The only one that did not start was Consumer State Manager.
The error was:
INFORMATIONAL: Product 'cdfsoftb0' (with qualifiers ''), has no 5.3.0pre1d version (or may not
exist)
nice: StateManager: No such file or directory
tcgetattr: Inappropriate ioctl for device
This is because in the startup script there is a line which does a:
setup cdfsoftb0 5.3.0pre1d.
This release was found to no longer exist.
Modifying the script to be simply
setup cdfsoftb0
(which uses the default release, currently thought to be 5.3.1pre4d) seems to allow the script to
run properly.
So Consumer State Manager now running fine.
Jane will e-mail Kaori to check the status of the release and check the script with her again.
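A hard-coded release pin like the one above could be spotted ahead of time by scanning the startup scripts for "setup <product> <version>" lines. The following is only a minimal sketch: the function name and the example script body are hypothetical, only the two setup lines are taken from this entry, and real setup invocations with extra options or qualifiers are not handled.

```python
import re

# Matches lines of the form "setup <product> [<version>]", as quoted in the entry above.
# Simplification (assumption): no options or qualifiers on the line.
SETUP_LINE = re.compile(r"^\s*setup\s+(\S+)(?:\s+(\S+))?\s*$")


def find_pinned_setups(script_text):
    """Return (line_number, product, version) for every setup line that pins a version."""
    pinned = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        match = SETUP_LINE.match(line)
        if match and match.group(2) is not None:
            pinned.append((number, match.group(1), match.group(2)))
    return pinned


# Hypothetical script body; only the two setup lines come from the log entry.
example_script = """#!/bin/sh
setup cdfsoftb0 5.3.0pre1d
setup cdfsoftb0
"""

print(find_pinned_setups(example_script))
```

Only the pinned line is reported; the bare "setup cdfsoftb0" line, which picks up the default release, is left alone.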
One further comment:
The Expert and the ACE DAQ webpages don't agree on which computer one has to log on to in order to
restart the Consumer State Manager:
ACE: says connect to b0dap32
Expert: says b0dap84
The expert page was most up to date and seemed to work.
Please could someone change the ACE DAQ webpage and also remove the part about restarting DBbroker
Proxy which is no longer used (has been removed from the ProcMon GUI but not from the webpage yet).
Thanks.
- alison (in a verbose mode today)
Sun Feb 29 06:40:26
MCR called to advise that the store will be kept
approximately up to 9.30 am. D0 has a request for
1 hour controlled access.
- S.Miscetti
Sun Feb 29 06:53:31



- plots (05:00-06:30) --ian
Sun Feb 29 06:55:52
More notes on the CPU power failure. I last checked the consumer displays ~10 min before the
failure, after which the displays froze. At that time, all of the plots looked OK. After recovery,
I had no trouble restarting all Mons and Displays. I waited about an hour to check the plots again
to collect statistics, and the plots appear to be OK still.
One more note: When I ran "startDisplay.sh All," the locations of the TrigMon and BeamMon displays
were reversed. Running their respective individual scripts gave me the same problem (i.e. the
BeamMon display starts when I try to start the TrigMon display). This is probably due to an error in
the scripts.
- B. Mohr
Sun Feb 29 07:54:30



- plots (06:30-07:30) --ian
Sun Feb 29 07:55:41
| Run Number | Data Type | Physics Table | Begin Time | End Time | Live Time | L1 Accepts | L2 Accepts | L3 Accepts | Live Lumi, nb-1 | GR | SC | RC |
| 179480 x2BD18 | BEAM | PHYSICS_2_03 [1,431,435] | 21:39:09 | | 08:28:40 | 358,627,714 | 4,524,697 | 1,023,383 | 432.136 | | | 1 |
| Totals | | | | 07:55:01 | 08:28:40 | 358,627,714 | 4,524,697 | 1,023,383 | 432.136 | | | |
- End of Shift Report
Sun Feb 29 08:02:10
Shift Summary: Store 3261 still in progress.
Luminosity degraded from 1.56E31 to 1.21E31.
- Quiet data taking.
- At 4:40 am we lost the connection with b0dap79-b0dap84 nodes.
We investigated and found the nodes had the power OFF. After discussing
with the experts we restored the power.
- All consumer processes were completely restarted around 6:00.
- Silicon:
- At 7:30 we got a trigger inhibit from SVX. It looks very similar to the one from last night:
IFIX sets the inhibit but does not provide the location of the problematic Caen crate. Experts
investigating.
MCR called to advise that the store will be kept up to 9:30 am.
Plan is to continue running and then follow JJ's plan as reported in the eve-shift log.
End of Shift Numbers
CDF Run II
Runs 179480
Delivered Luminosity 393.2
Acquired Luminosity 360.4
Efficiency 91.7
- S.Miscetti
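The efficiency in the End of Shift Numbers is just the ratio of acquired to delivered luminosity. A quick check of that arithmetic, as a sketch (the function name is illustrative; both inputs are assumed to be in the same units, which the log does not state):

```python
def efficiency_percent(acquired, delivered):
    """Data-taking efficiency in percent: acquired / delivered luminosity."""
    if delivered <= 0:
        raise ValueError("delivered luminosity must be positive")
    return 100.0 * acquired / delivered


# Values from the End of Shift Numbers above.
print(round(efficiency_percent(360.4, 393.2), 1))  # 91.7
```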
Sun Feb 29 08:02:37
had svx hv inhibit but no alarms. low voltage bar of svx on hv summary was gone. inhibit
status told us it was SVX B4W3L0. luckily expert was here to relate this to a crate number ... crate
was hockerized successfully.
- ian :: (run 179480)