2004 CDF E-Log -- Owl shift. Fri Feb 27, 2004
| SciCo | DAQ Ace | Monitoring Ace | CO | (Operations Manager) |
| Guram Chlachidze | Alison Lister | Catalin Ciobanu | Oleg Poukhov | J.J.Schmidt |
Start of Shift Notes:  Taking cosmics. Tevatron in stacking.
Fri Feb 27 00:18:02
TOF heartbeat. SMACS crashed. We restarted it. - Catalin
Fri Feb 27 01:12:21
b0clc00 had a "U" error in the VxWorks monitor window.
Shepherding the crate fixed this.
- Alison/Catalin
Fri Feb 27 04:20:00
Still waiting for beam, so we are taking cosmics - Guram
Fri Feb 27 04:48:43
Pinky (B2 W7 L4) stable throughout the shift
- Catalin
Fri Feb 27 05:11:35
Run 179416
ACTIVE:
Attention !!!. CER_SVXMON_HALT_RECOVER_RUN_ERROR !!!
Stuck Cellid I/B0/W0/L0/C4-7 .
auto-HRR solved the problem - Alison x2080
Fri Feb 27 05:27:19
Run 179416
Recover transition state: FrontEnd Crate Error Condition from:
VRB_SVX_02
RXPT error - Alison x2080
Fri Feb 27 05:32:09
b0svx02 (VRB_SVX_02 RXPT error)
VxWorks login: vxworks
Password:
b0svx02-> i
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tExcTask excTask 7bfe708 0 PEND 29a164 7bfe630 0 0
tNetTask netTask 7bdd088 1 PEND 280360 7bdcfd8 0 0
tPortmapd portmapd 7bd8308 1 PEND 280360 7bd8198 16 0
tRlogOutTasrlogOutTask 781f238 2 PEND 280360 781f0b0 0 0
tRlogInTaskrlogInTask 781dca8 2 PEND 280360 781da80 0 0
tShell shell 7aa11f8 5 READY 283d24 7aa01c0 1c0001 0
tRlogind rlogind 7bd9898 15 PEND 280360 7bd94d8 0 0
tAioIoTask1aioIoTask 7bee678 50 PEND 2807d4 7bee5d0 0 0
tAioIoTask0aioIoTask 7be7470 50 PEND 2807d4 7be73c8 0 0
tAioWait aioWaitTask 7bf5880 51 PEND 280360 7bf5728 0 0
t1 VISIONserver 7bbf1f0 100 PEND 280360 7bbf0c0 0 0
t2 ROBINserver 7bba1c8 100 PEND 280360 7bba088 0 0
t3 httpDaemon 7bb51a0 100 PEND 280360 7bb4f88 0 0
Messenger FER_messenge 7a70a30 200 SUSPEND 282aa0 7a70770 3d0004 0
Readout FER_readOutV 777e030 201 READY 287110 777df38 3d0002 0
rtlm_main rtlm_main 7bb0178 220 PEND 280360 7bb0038 0 0
Mon_III FER_monitorI 7b5ceb0 220 READY 2832d8 7b5ce10 1c0001 0
rtlm_sessiortlm_session 7b783c8 225 PEND 280360 7b781e8 0 0
tLogTask logTask 7bfbd90 250 READY 29a164 7bfbcc8 0 0
value = 0 = 0x0
b0svx02-> tt Messenger
2889e8 vxTaskEntry +60 : FER_messenger ()
7b300b4 FER_messenger +2cc: 7b0eb28 ()
7b0eb28 FER_smartInitMS+1db0: 7b2e628 ()
7b2e628 FER_errorSender+22ec: _MLencodedMessageCreate ()
7b1b304 _MLencodedMessageCreate+88 : free ()
25a37c free +1c : memPartFree ()
259fbc memPartFree +144: taskSuspend ()
value = 0 = 0x0
b0svx02->
- Catalin/Alison
-- Fri Feb 27 05:44:14 comment by...W.Badgett -- | The stack trace above points to a recently fixed
merlin package bug. Note that the main DAQ crates
all boot off of the bug-fixed version, while the b0svx##
crates boot from an old (very old?) version in
Steve Nahn's private disk area. Therefore, I'm guessing
the bug is still present in Steve Nahn's version.
I advise Steve Nahn to recompile and/or relink his
private version of fer package. It would also be
good for these crates to boot from a public area, but
that doesn't seem likely to happen.
|
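The task listing above can also be checked mechanically for tasks that have left the PEND/READY states. The following is a minimal sketch, not an existing CDF or VxWorks tool: the names `TASK_ROWS` and `parse_task_rows` are made up for illustration, only a few rows of the listing are copied in, and ERRNO is assumed to be printed in hexadecimal (which values like 3d0004 suggest). Because NAME and ENTRY run together when NAME fills its column, the sketch reads the seven fixed columns from the right and keeps NAME/ENTRY as one label.

```python
# Sketch: scan VxWorks `i` output for tasks that are not PEND or READY.
# TASK_ROWS and parse_task_rows are illustrative names, not an existing tool.

TASK_ROWS = """\
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tRlogOutTasrlogOutTask 781f238 2 PEND 280360 781f0b0 0 0
tShell shell 7aa11f8 5 READY 283d24 7aa01c0 1c0001 0
Messenger FER_messenge 7a70a30 200 SUSPEND 282aa0 7a70770 3d0004 0
Readout FER_readOutV 777e030 201 READY 287110 777df38 3d0002 0
value = 0 = 0x0
"""


def parse_task_rows(text):
    """Return one dict per task row, reading the fixed columns from the right."""
    tasks = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # blank lines and "value = 0 = 0x0"
        tid, pri, status, pc, sp, errno, delay = fields[-7:]
        if not pri.isdigit():
            continue  # column header and the dashed rule
        tasks.append({
            # NAME and ENTRY may be fused, so they stay together as one label.
            "label": " ".join(fields[:-7]),
            "tid": tid,
            "priority": int(pri),
            "status": status,
            "errno": int(errno, 16),  # assumption: ERRNO is hexadecimal
        })
    return tasks


tasks = parse_task_rows(TASK_ROWS)
suspended = [t["label"] for t in tasks if t["status"] == "SUSPEND"]
print(suspended)
```

On the rows copied here, the only suspended task is the Messenger task, which is the one the `tt Messenger` trace was taken from.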
-- Fri Feb 27 09:17:30 comment by...SCN -- Here is the record of reboots of b0svx02 for the last 6 months or so:
Wed Nov 26 10:05:10
Wed Nov 26 10:17:48
Thu Nov 27 08:34:10
Thu Nov 27 08:36:46
Thu Nov 27 08:49:36
Thu Nov 27 09:08:32
Thu Nov 27 09:12:01
Sun Nov 30 04:31:35
Thu Jan 08 06:10:24
Thu Jan 08 06:13:01
Thu Jan 08 06:46:35
Mon Jan 12 15:38:33
Wed Jan 21 20:09:32
Sat Jan 24 15:34:14
Sat Jan 24 15:37:46
Wed Feb 25 08:50:36
Wed Feb 25 08:52:40
Fri Feb 27 05:33:27
Fri Feb 27 05:39:11
Fri Feb 27 07:46:45
In addition, if you browse the Silicon log you find that until Feb 27 all of the reboots were done on purpose by humans, either to troubleshoot the FFO resonance detector hardware, add some L1A counting, swap a VRB, etc. None of them until this morning was due to a crash in the software, and you can see that between Jan 24 and Feb 25 there were no reboots at all.
This raises the question: what changed today? (The point of keeping a private version is that it doesn't get changed underneath you, unlike the public "frozen" version, which has changed 3 times in the last 20 days or so.)
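The quiet stretches in the reboot record above can be picked out with a short script. This is only a sketch under stated assumptions: the timestamps carry no year, so the year is inferred by testing which of 2003 or 2004 gives the listed weekday, and the names `parse_stamp` and `long_gaps` are invented for this example. Month abbreviations are assumed to parse in an English locale.

```python
# Sketch: find long reboot-free intervals in the b0svx02 reboot record.
# Years are not in the record; they are inferred from the weekday (assumption).
from datetime import datetime

REBOOTS = """\
Wed Nov 26 10:05:10
Wed Nov 26 10:17:48
Thu Nov 27 08:34:10
Thu Nov 27 08:36:46
Thu Nov 27 08:49:36
Thu Nov 27 09:08:32
Thu Nov 27 09:12:01
Sun Nov 30 04:31:35
Thu Jan 08 06:10:24
Thu Jan 08 06:13:01
Thu Jan 08 06:46:35
Mon Jan 12 15:38:33
Wed Jan 21 20:09:32
Sat Jan 24 15:34:14
Sat Jan 24 15:37:46
Wed Feb 25 08:50:36
Wed Feb 25 08:52:40
Fri Feb 27 05:33:27
Fri Feb 27 05:39:11
Fri Feb 27 07:46:45
"""

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def parse_stamp(stamp, years=(2003, 2004)):
    """Parse 'Wed Nov 26 10:05:10', choosing the year that matches the weekday."""
    day_name, rest = stamp.split(" ", 1)
    for year in years:
        moment = datetime.strptime(f"{year} {rest}", "%Y %b %d %H:%M:%S")
        if DAYS[moment.weekday()] == day_name:
            return moment
    raise ValueError(f"no candidate year matches weekday in {stamp!r}")


def long_gaps(times, min_days):
    """Return (start, end, gap) for consecutive reboots at least min_days apart."""
    pairs = zip(times, times[1:])
    return [(a, b, b - a) for a, b in pairs if (b - a).days >= min_days]


times = [parse_stamp(line) for line in REBOOTS.splitlines()]
gaps = long_gaps(times, min_days=30)
for start, end, gap in gaps:
    print(start.date(), "->", end.date(), gap)
```

With these assumptions the record shows two intervals of 30 days or more without a reboot: Nov 30 to Jan 08, and the Jan 24 to Feb 25 interval mentioned above.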
Fri Feb 27 05:37:50
Run 179416
Terminated at 2004.02.27 05:37:35 - RunControl
Fri Feb 27 05:38:29
Run 179416
TERMINATE: Ended run as problem with b0svx02 - Alison x2080
Fri Feb 27 05:46:25
Run 179417
Activated at 2004.02.27 05:45:53 - RunControl
Fri Feb 27 05:46:45
Run 179417
ACTIVATE: Restarted the run - Alison x2080
Fri Feb 27 05:47:51
Working on crate b0imu00 and b0imu01 to diagnose their
problems running with TDC DSP version V45.
- W.Badgett
Fri Feb 27 07:45:09
Run 179417
Terminated at 2004.02.27 07:45:02 - RunControl
Fri Feb 27 07:45:22
Run 179417
TERMINATE: End run to put IMU crates back in - Alison x2080
Fri Feb 27 07:51:33
Run 179435
Activated at 2004.02.27 07:51:18 - RunControl
Fri Feb 27 07:51:48
Run 179435
ACTIVATE: Restarted run with IMU back in partition - Alison x2080
Fri Feb 27 07:52:34
Run 179435
Terminated at 2004.02.27 07:52:27 - RunControl
Fri Feb 27 07:52:45
Run 179435
TERMINATE: IMU01 had DTO problems - Alison x2080
Fri Feb 27 07:54:35
Doing some parasitic testing of b0imu00 and
b0imu01 in the main partition while they are
running DSP version V45.
- W.Badgett
Fri Feb 27 07:56:08
| Run Number | Data Type | Physics Table | Begin Time | End Time | Live Time | L1 Accepts | L2 Accepts | L3 Accepts | Live Lumi, nb-1 | GR | SC | RC |
| Totals | | | | 07:55:03 | :: | | | | | | | |
- End of Shift Report
Fri Feb 27 07:56:08
b0svx02 again had problems, this time during the HRR recover transition.
Software-rebooted the crate; this time it seemed to work!
- alison
Fri Feb 27 07:56:45
Run 179436
Activated at 2004.02.27 07:56:25 - RunControl
Fri Feb 27 07:56:46
Run 179436
ACTIVATE: Restarted without IMU01 - Alison x2080
Fri Feb 27 07:57:48
Shift Summary: No beam all shift following a quench during the day shift.
Tevatron people working to fix consequences of the bad Pirani gauge.
- restarted run due to b0svx02 error
- Bill is working on crate b0imu00 and b0imu01 to diagnose their
problems running with TDC DSP version V45.
End of Shift Numbers
|
CDF Run II
Runs
Delivered Luminosity 0
Acquired Luminosity 0
Efficiency 100
|
- Guram
Fri Feb 27 08:08:25
Tried several different readout settings, and imu00
seemed to like its default settings: TdcReadoutMode =
LocalAggressive; DmaChain=false; SpyMode=false
In this case, there was the lowest rate
of header word errors.
- W.Badgett :: (run 179436)
Fri Feb 27 08:09:12
Run 179436
Terminated at 2004.02.27 08:08:44 - RunControl
Fri Feb 27 08:09:13
Run 179436
TERMINATE: End run to put imu01 back in the partition. - Alison x2080
Fri Feb 27 08:13:23
Run 179437
Activated at 2004.02.27 08:13:07 - RunControl
Fri Feb 27 08:23:16
Run 179437
ACTIVE: HRRed CER_SVXMON_HALT_RECOVER_RUN_ERROR:
Stuck Cellid S/B1/W5/L4/C7-13 . - Vadim x2080
Fri Feb 27 08:25:51
Run 179437
Terminated at 2004.02.27 08:25:33 - RunControl
Fri Feb 27 08:25:52
Run 179437
TERMINATE: stop the run to change l3proxy version - Vadim x2080
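The RunControl entries in this log are enough to work out how long each run lasted. Below is a minimal sketch that pulls the "Activated at" and "Terminated at" stamps out of a copy of those entries; the names `RUNCONTROL_LINES` and `run_durations` are illustrative only, and run 179416 has no duration because its activation is not recorded in this shift's log.

```python
# Sketch: compute run durations from the RunControl lines of this shift.
# RUNCONTROL_LINES and run_durations are illustrative names (assumptions).
import re
from datetime import datetime

RUNCONTROL_LINES = """\
Run 179416
Terminated at 2004.02.27 05:37:35 - RunControl
Run 179417
Activated at 2004.02.27 05:45:53 - RunControl
Run 179417
Terminated at 2004.02.27 07:45:02 - RunControl
Run 179435
Activated at 2004.02.27 07:51:18 - RunControl
Run 179435
Terminated at 2004.02.27 07:52:27 - RunControl
Run 179436
Activated at 2004.02.27 07:56:25 - RunControl
Run 179436
Terminated at 2004.02.27 08:08:44 - RunControl
Run 179437
Activated at 2004.02.27 08:13:07 - RunControl
Run 179437
Terminated at 2004.02.27 08:25:33 - RunControl
"""

EVENT = re.compile(
    r"Run (\d+)\n(Activated|Terminated) at "
    r"(\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2})"
)


def run_durations(text):
    """Map run number -> duration for runs with both an activation and a termination."""
    events = {}
    for run, kind, stamp in EVENT.findall(text):
        when = datetime.strptime(stamp, "%Y.%m.%d %H:%M:%S")
        events.setdefault(int(run), {})[kind] = when
    return {
        run: seen["Terminated"] - seen["Activated"]
        for run, seen in events.items()
        if "Activated" in seen and "Terminated" in seen
    }


durations = run_durations(RUNCONTROL_LINES)
for run, length in sorted(durations.items()):
    print(run, length)
```

Run 179435, which was ended because of the IMU01 DTO problems, comes out at just over a minute, while run 179417 ran for nearly two hours.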