2004 CDF E-Log -- Day shift. Mon Mar 1, 2004
SciCo DAQ Ace Monitoring Ace CO (Operations Manager)
Rainer W. Susana C. Tom S. Franco S. Mary C.


Start of Shift Notes:  

Store #3263 colliding, Inst Lum 19.9e30, Stack 138, stacking around 5m/h   
Run 179505 in progress.   
COT in compromised setup (HV SL12 off, SL345 reduced gain)   

Trigger table is PHYSICS_2_03 [1,431,435]   
Plan: - take data 
      - take data 
      - take data 

Mon Mar 1 08:57:36
 - (hourlies) Tom
-- Mon Mar 1 09:00:18 comment by...Rainer --  The funny structure is seen in several proton abort-gap-sensitive variables, so we conclude the effect is real. See also elog entry here
-- Mon Mar 1 09:00:25 comment by...Tom --  
For Abort Gap:
substituted T:E0LABT and C:B0MSC3
for C:B0AAGC and C:B0ABSM

as requested by SCICO.

Mon Mar 1 09:03:14
SCPU_BAD_VRB_BYTE_COUNT error: the hardware event builder detected
a problem with SCPU b0eb11. Automatic HRR issued.

(MLE) b0l3pcom1.fnal.gov:main:8:58:59 AM->Host b0eb11.fnal.gov, task tRec_0 
SCPU-P1-E-VrbHeader: Dump of header words for event 13670316 from VRB in slot 10: 
0x00000000 0x00002c10 0x00390039 0x03580814 0x08a80a20 0x011c01c8 0x01f8021c 0x06b40000 

(MLE) b0l3pcom1.fnal.gov:main:8:58:59 AM->Host b0eb11.fnal.gov, task tRec_0 
SCPU-P1-E-VrbHeader: Dump of header words for event 13670316 from VRB in slot 12: 
0x001c017e 0x53564444 0x00080003 0x00000000 0x00000003 0x00000003 0x00000000 0x00000002 

(MLE) b0dap73.fnal.gov:Thread-53:8:59:10 AM->Requested Halt-Recover-Run issued [errmon] 

(MLE) b0svx06:Messenger:8:58:56 AM->Silicon Timeout:BUSY- Slots:  08:fa00 10:fa20 12:fa40 16:f800 18:f820 20:f840


(RC)   8:59:13 Halt -> HALTED 
(RC)   8:59:19 Recover -> RECOVERED 
(RC)   8:59:22 Run -> ACTIVE 
 - Susana.
Mon Mar 1 09:15:44
Some mismatch in muon trigger data vs simulation
 - Franco
-- Mon Mar 1 09:40:35 comment by...Franco --  
This is correct. See this link: http://www-cdfonline.fnal.gov/cgi-bin/fixlist.pl

Mon Mar 1 09:48:24 We had a trigger inhibit for the components TRIP:SVX06 and SVXHV. We halted the run and recovered from the trip using the PS GUI; the trip was in SVX B4 W3. There were no messages in the Global CDF Alarm GUI. The inhibit cleared, and we recovered and resumed the run. After this, we paused the run because of heartbeat alarms in the shower max detectors and in the muon system. There were no trigger inhibit signals for this HV problem. SciCo confirmed it is only a monitoring problem, so we resumed the run. - Susana.
Mon Mar 1 09:59:57
MUON3 PC lost connection, so CMX, BMU and CES, CPR, CCR showed heartbeat alarms. Followed the procedure to reboot the PC.


Shut down HVMON and IFIX, then the PC.
Log in using the password written in the binder.
IFIX and HVMON will start automatically.
Press the START button on the HVMON control panel.
It takes roughly 5-10 minutes for everything to come back up.
The red X's should go away.
 - Tom
Mon Mar 1 10:15:35
 - (hourlies) Tom
-- Mon Mar 1 10:16:33 comment by...Tom --  
Again made the above abort gap plot substitutions:

T:E0LABT
C:B0MSC3

Mon Mar 1 10:56:19
IFIX Global Alarm and HV summary pages are responding very slowly, at times freezing and then starting again. Looking into this.
 - Tom
-- Mon Mar 1 11:02:52 comment by...Tom --  The little spinning X that tells me things are OK keeps freezing and starting again, a lot more than I'm used to. Might need to restart IFIX.
-- Mon Mar 1 11:06:08 comment by...Tom --  
Paging Expert.

-- Mon Mar 1 11:10:04 comment by...rainer --  JC responded; he is coming in in about 15 min to have a look.
-- Mon Mar 1 11:45:59 comment by...Tom --  
JC restarted IFIX on VNODE 1 and had me monitor the HV Summary and Global Alarms on VNODE 2. All seems fine now.

Mon Mar 1 11:10:53 Notes from Run Coordinator elog:

There was a periodic rise in abort gap losses during the last two stores caused by either the RF or the longitudinal damper. There may be a temperature effect at F0 causing the problem with the warmer weather.
Tevatron experts would like a 4-hour end-of-store study and would like to increase the helix to 115% from the beginning of the next store.  - convery
-- Mon Mar 1 11:17:34 comment by...convery --  11:14:46- The Plan: End of Store studies begin at 2200 and last until 0200 Tuesday. Experimental access between stores, estimated duration 1-2 hours. Shot setup follows once the accesses are complete; the helix will be at 115% from the beginning of the store. Pbar can have a shift of studies this afternoon and evening if they desire. Pbar shoots to the Recycler after the next shot; they need 50E10. - JPM
-- Mon Mar 1 11:18:03 comment by...convery --  We requested the access to do COT gas plumbing work in order to reverse the gas flow. The COT would be off for the following store so that any oxygen introduced into the system can be purged.


Mon Mar 1 11:11:17
 - (hourlies) Tom
Mon Mar 1 11:13:03
Automatic HRR was issued because of a Consumer Error: 
CER_SVXMON_HALT_RECOVER_RUN_ERROR ! 
The stuck cellid was S/B1/W5/L4/C7-13 
 - Susana.
-- Mon Mar 1 11:23:15 comment by...Susana. --  
The same situation at 11:18

-- Mon Mar 1 12:33:38 comment by...Susana. --  
The same situation at 12:25 

Mon Mar 1 11:22:40
Level 2 decision timeout error; automatic HRR issued. It was not related to the out-of-sync problem, so I have not rebooted the alpha.
 - Susana
Mon Mar 1 11:40:42

JJ rebooted CNS51PC, and while it was coming back up it produced the following message:

Messenger Service: Message from CNS51PC to CNS51PC. This computer does not have the Patchlnk installed or there is a communication error with the Patchlnk server.

I talked with Jim Smedinghoff about this; apparently, AD is deploying a client-server setup which automatically serves and installs Windows patches. On "his" (ACnet?) machines, this is not working correctly and gives this message. Windows patches are being served by other means, and this message is not serious.

Also, JJ couldn't find the TeV Array Display to start up the bunch-by-bunch display. I had the same confusion initially; it's a Netscape alias icon at the very top of both screens. Started up just fine.

 - Steve Hahn
Mon Mar 1 12:08:57
Icicle heartbeat not received since 11:57; I have restarted it.
 - Susana.
Mon Mar 1 12:13:22
 - (hourlies) Tom
Mon Mar 1 12:17:43 Steve Nahn and Marcel Stanitzki are working on the 2nd floor Silicon DAQ teststand to implement monitoring in anticipation of the UDPS test. They will need an END RUN -> CONFIG -> ACTIVATE transition OK'd by the ops manager. - Rainer
-- Mon Mar 1 12:59:05 comment by...Rainer --  Monitoring installed. 3% efficiency drop for the day shift. Sniff... but well invested, mind you. Now we only need that trigger table; silicon is ready for the UDPS test.
Mon Mar 1 12:32:57
L2 decision timeout ERROR. Automatic HRR issued and run recovered.
 - Susana.
Mon Mar 1 12:40:57 Run 179505 Terminated at 2004.03.01 12:40:25 - RunControl
Mon Mar 1 12:41:20 Run 179505 TERMINATE: AAA_CURRENT PHYSICS_2_03[1,431,435] - Susana x2080
-- Mon Mar 1 12:42:37 comment by...Susan. --  
Run terminated to perform a silicon test in cold start,
as requested by the Silicon Expert and/or SciCo.

Mon Mar 1 12:45:41 Run 179506 Activated at 2004.03.01 12:45:03 - RunControl
Mon Mar 1 12:47:24 I have changed some parameters in the .tcl for YMon in order to re-include the Online DQM Expert System. The CO killed and restarted YMon. For the moment it works fine. - Olga :: (run 179505)
Mon Mar 1 12:47:52 Run 179506 ACTIVATE: AAA_CURRENT PHYSICS_2_03[1,431,435] COT masked off. - Susana x2080
Mon Mar 1 13:03:44
 - (hourlies) Tom