2004 CDF E-Log -- Owl shift. Sat Mar 6, 2004
SciCo: Rick Field
DAQ Ace: V. Khotilovich
Monitoring Ace: N. Miladinovic
CO: A. Loginov
Operations Manager: Mary Convery


Start of Shift Notes:  

Store 3275 continuing 
Taking data

Sat Mar 6 00:17:51
L2 Decision Timeout.[RXPT]: 
b0l2de00:SpyAlpha:12:16:07 AM-> 
CListMon: Dumping CLIST data: 
Word       upper 32 bits  lower 32 bits 
 0: 0x80338401	0x20000011  
 1: 0x00338401	0x00000011  
 2: 0x00355c01	0x20010023  
 3: 0x80355c01	0x00010023  
 4: 0x87b55c01	0x20010023  
 5: 0x07b55c01	0x00010023  
ClistMon: done.
 - Vadim :: (run 179683)
Sat Mar 6 00:30:41 Still having problems with ACNET consoles. At Steve Hahn's suggestion, asked MCR to reboot vaxes for CNS46PC and CNS51PC. - convery
Sat Mar 6 00:45:52 Steve Hahn called. We are having a problem saving pictures with ACNET. This is a new problem. Steve called MCR and they are looking into the problem (they do not have the problem). They will call after they investigate. - Rick Field
Sat Mar 6 01:32:44 CPR tripped [North West 0 - 5]. Recovered. - natasha
Sat Mar 6 01:35:18
acnet test plotting works - only the VMS file system default path got reset.
 - rainer
-- Sat Mar 6 02:11:21 comment by...rainer --  

fond memories came up when the xv capture dialog offered something like USR$USERB:[BLA.BLUBB.BLOBB], which 'older folks' (compared to aces) immediately recognize as a valid VMS path. for some reason, the default path USERB:[PIC.CAPTURE.CDF] (USERB undoubtedly a LOGICAL for USR$USERB) got erased in the xv capture dialog. that path is mapped to http://adcon.fnal.gov/userb/pic/capture/cdf/, which the aces use to import the files from that file system into the elog.

so if it happens again, take heart and click your way through the tree - the "." are the "/" in UNIX, the brackets are ... well, forget about the brackets, and USERB: is something like C:\ in MSDOS and /dev/hda in unix. be happy you can click - on the file system that would have been something like set/def [.capture.cdf]

The VAX is dead - long live the VAX
-- Sat Mar 6 02:12:47 comment by...rainer --  umpf - set def of course, not set/def - I am already unix brainwashed.
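
The path translation rainer describes can be sketched in a few lines of Python. This is only an illustration, not anything ACNET or xv provides: the function name `vms_to_unix` is hypothetical, and it assumes the logical name simply becomes a lowercase top-level directory, as the USERB:[PIC.CAPTURE.CDF] to /userb/pic/capture/cdf/ mapping quoted above suggests.

```python
import re

# Hypothetical helper: optional "DEVICE:" prefix followed by "[DIR.SUBDIR...]".
_VMS_DIR_SPEC = re.compile(r"(?:([A-Za-z0-9$_]+):)?\[([A-Za-z0-9$_.]*)\]")


def vms_to_unix(vms_path: str) -> str:
    """Translate a VMS directory spec into a UNIX-style path.

    Assumption: the device (logical name) maps to a lowercase top-level
    directory, and each "." inside the brackets becomes a "/".
    """
    match = _VMS_DIR_SPEC.fullmatch(vms_path.strip())
    if match is None:
        raise ValueError(f"not a VMS directory spec: {vms_path!r}")
    device, directories = match.groups()
    # A leading "." with no device means "relative to the current default".
    relative = device is None and directories.startswith(".")
    parts = [device] if device else []
    parts += [d for d in directories.split(".") if d]
    path = "/".join(p.lower() for p in parts) + "/"
    return path if relative else "/" + path


print(vms_to_unix("USERB:[PIC.CAPTURE.CDF]"))
print(vms_to_unix("[.capture.cdf]"))
```

Prefixing the first result with http://adcon.fnal.gov gives the capture URL quoted in the comment above; the second call shows the relative form used with set def.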


Sat Mar 6 01:45:35 The CO just saw from the YMON Trigger Bits (L1 & L3) plots that YMON was set for COSMICS. He reset it for PHYSICS. I am confused as to why it was set for COSMICS... it means that the previous CO was not monitoring the PHYSICS??? - Rick Field :: (run 179683)
-- Sat Mar 6 17:38:24 comment by...G. Mitselmakher (previous CO) --  The first physics run in our shift started just 10 min before the end of the shift, so we did not have any physics runs to monitor. I am positive about switching to physics, although a mistake is always a possibility. I did not have statistics to go through the checklist. G. Mitselmakher
Sat Mar 6 01:49:05
 - natasha
Sat Mar 6 02:04:21 HRR just recovered an L2 decision timeout. Looking into the errlog it's hard to say why it happened (halt occurred at 1:59:35) - Vadim :: (run 179683)
Sat Mar 6 02:05:17 Had an HRR because of CER_SVXMON_HALT_RECOVER_RUN_ERROR: Stuck Cellid I/B0/W0/L0/C8-11 .  - Vadim :: (run 179683)
Sat Mar 6 02:13:54 pcal01 got a done timeout. Had to shepherd it. - Vadim :: (run 179683)
Sat Mar 6 03:09:01
EVB error: 
SCPU_TRACER_EVENT_ID Error !!!  
 Hardware EVB has detected a problem with data in  
 SCPU b0eb15 (forwarded by FER crate CCAL_15).  
 Datataking is paused: events are not being processed.  
 AUTO HRR will be issued. 
 - Vadim :: (run 179683)
Sat Mar 6 03:21:49
Reformatter error caused busy timeout in svx06: 

b0l3pcom1.fnal.gov:main:3:16:53 AM->Host b0eb19.fnal.gov, task tRec_0 
SCPU-P1-E-VrbHeader: Dump of header words for event 3816257 from VRB in slot 16: 
0x5816590f 0x5a0e5b09 0x5c0f5d03 0xa20ba10b 0xa00b7ff3 0xc1c10000 0xe27473fd 0x8d0b00d0 
(MLE) b0l3pcom2.fnal.gov:main:3:16:54 AM->Error on L3 node b0l3241 (partition 1) Sat Mar 6 03:16:53 2004 l3_node  

  in refoInt_reformatProc (l3_refevt.c:627) 
         L3_REFORMAT_ERROR (Parsed Info):  
         RAWREF Error - BNK-SI-DUPL - with code 37 -- START 
         Error count: 1 of 1 
         Reject rate: 0.012 percent 
          (see Error Handler log file for full message) 
1 crate/s: b0svx06(16),  busy.[RXPT]
 - Vadim :: (run 179683)
Sat Mar 6 03:45:47 pcal05 confessed to having FERML_HEAP_CORRUPT and asked me to shepherd her. I duly did it. - Vadim :: (run 179683)
Sat Mar 6 03:46:15 We're having the same problem w/ ACNET that we had earlier today. Rainer figured out that VMS path got reset. He also gave instructions in case it happens again. The problem is -- the system is painfully slow. I'm trying to find my way through the file system, but acnet seems to be thinking for many minutes about every click [the cursor is yellow and I cannot do anything other than wait]. So effectively I'm stuck, or rather acnet is stuck, I'll keep working on it once it lets me... By the way, we didn't reset acnet, or anything for that matter, someone should look at why this is happening. - natasha
Sat Mar 6 03:50:51 We are still having ACNET problems and we cannot put pictures in the E-log. We can, of course, still monitor the system so we will wait until 8 am when Steve Hahn arrives (he is the day Sci-Co) to work on the problem. - Rick Field
Sat Mar 6 05:10:47 Abort gaps are rising steadily but sharply -- within the last hour the B0PAGC went from around 4.5 kHz to around 6 kHz. This is of course well within accepted limits. - natasha
-- Sat Mar 6 05:11:44 comment by...natasha --  i guess i meant steadily but steeply...
Sat Mar 6 05:19:22 Pink cell in IMon - S_AVDD_B5w10L2. Current is slightly too high [835, good is up to 8300]. I unmarked it and it turned pink again. Unmarked it again and did HRR [though the current seems to have started going down before the second unmark]. It appears to be fully back to normal now. - natasha
Sat Mar 6 05:49:48 Run 179683 ACTIVE: Got an auto HRR because of this: SCPU_TRACER_EVENT_ID Error !!! Hardware EVB has detected a problem with data in SCPU b0eb11 ( FER crate : NOT_AVAILABLE).  - Vadim X2080
Sat Mar 6 06:03:51 We had 2 solenoid trip alarms [the 'twilight zone' music]. The cryo guy says we were running with magnetic field slightly too high. We didn't notice any problems, and after the conversation with the cryo tech the magnetic field was 13703 Gauss, which is the normal value. The cryo tech said not to worry. The previous shift had the same alarm. - natasha
Sat Mar 6 06:06:33 We got two trips of the solenoid alarm. I called CRYO and they said that the B-field was slightly too high. Everything seems fine now. - Rick Field
Sat Mar 6 06:13:27 CPR trip -- this time NE 18-23. Recovered. - natasha
Sat Mar 6 06:19:36 Run 179683 ACTIVE: Auto HRRed this one: SCPU_BAD_CHANNEL_COUNTS Error !!! Hardware EVB has detected a problem with data in SCPU b0eb22.  - Vadim X2080
Sat Mar 6 07:09:16
Got L2 decision timeout: 

b0l2de00:SpyAlpha:7:07:57 AM-> 
CListMon: Dumping CLIST data: 
Word       upper 32 bits  lower 32 bits 
 0: 0x80169c01	0x20000013  
 1: 0x00169c01	0x00000013  
 2: 0x0014a401	0x2002001f  
 3: 0x8014a401	0x0002001f  
 4: 0x8094a404	0x2002003d  
 5: 0x0094a404	0x0002003d  
 6: 0x0391d404	0x20170027  
 7: 0x8391d404	0x00170027  
ClistMon: done. 
L2 Decision Timeout.[RXPT]
 - Vadim :: (run 179683)
Sat Mar 6 07:55:24
Run Number Data Type Physics Table Begin Time End Time Live Time L1 Accepts L2 Accepts L3 Accepts Live Lumi, nb-1 GR SC RC
179683 x2BDE3 BEAM PHYSICS_2_02 [2,424,431] 23:51:25 07:30:13 382,520,155 8,215,835 1,641,504 1162.392 1
Totals 07:55:02 07:30:13 382,520,155 8,215,835 1,641,504 1162.392
 - End of Shift Report
Sat Mar 6 08:00:13 Shift Summary:
Store 3275 Initial Lum =  67 

Run 179683  Current Lum = 32 

** L2 Problems ** 
Burkard and Ted worked on muon fibre splitting 
and things seem to be working now. No more L2 Pulsar errors  
in TrigMon. 

** ACNET Problem ** 
We were having a problem saving pictures with ACNET. Steve Hahn  
called MCR and they looked into the problem. MCR called and talked  
with Natasa (ACE).  Rainer figured out that a VMS path got reset.  
He also gave instructions in case it happens again... However, it 
has happened again and the instructions were not sufficient to 
fix the problem.  I decided to wait until 8am when Steve Hahn 
will arrive for his day "Sci-Co" shift.  Steve is now here and 
is working on it. 

** Lots of Good Data **

End of Shift Numbers
CDF Run II

Runs                   179683
Delivered Luminosity   1228.2 nb-1  
Acquired Luminosity    1143.9 nb-1  
Efficiency             93.1

 - Rick Field
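
The efficiency figure in the end-of-shift numbers is consistent with acquired luminosity divided by delivered luminosity. A minimal check in Python, assuming that definition (the log itself does not state the formula, and the function name `efficiency_percent` is only illustrative):

```python
def efficiency_percent(acquired_nb: float, delivered_nb: float) -> float:
    """Data-taking efficiency in percent.

    Assumption: efficiency = acquired / delivered luminosity, both in nb-1.
    """
    if delivered_nb <= 0:
        raise ValueError("delivered luminosity must be positive")
    return 100.0 * acquired_nb / delivered_nb


# Numbers from the end-of-shift summary for run 179683.
print(f"{efficiency_percent(1143.9, 1228.2):.1f}")  # prints 93.1
```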