2004 CDF E-Log -- Day shift. Sat Mar 6, 2004
SciCo: Steve Hahn
DAQ Ace: Jan Ehlers
Monitoring Ace: Susana Cabrera
CO: Serguei Bourov
Operations Manager: Mary Convery


Start of Shift Notes:  

Take good COT data; maximize efficiency.

Sat Mar 6 08:51:22

Looking into ACnet problems with the USERB disk, where ACnet GIFs are copied. Opening a DECterm on CNS51PC, directly issuing "set def userb:[000000]" and then doing a directory gives a "remote node unreachable" error. I have talked with MCR, noting that both our consoles (CNS46, dedicated, and CNS51, not dedicated) have the same problem, and suggesting that it may be a network problem. They are going to call the experts and get back to us.

 - Steve Hahn :: (run 179683)
Sat Mar 6 09:13:25
L2 Decision Timeout:

L1Mon: saw 210 L1 DMA transfers, expect 1 (buffer number 0)
L1Mon: Dumping data for 1 word.
Word       upper 32 bits  lower 32 bits
   0: 0x00000000	0x00200000
   1: 0x00000000	0x40300804
   2: 0x0083b477	0xa8948879
...
  418: 0x0083b477	0xa8948879
  419: 0x8083b477	0x88948879
L1Mon: done.
 - Jan :: (run 179683)
Sat Mar 6 09:17:32
 - Susana.
-- Sat Mar 6 09:18:45 comment by...Susana. --  
Hourly plots have been captured with ALT+PRINT SCREEN and saved with Paint Shop Pro

Sat Mar 6 09:42:04

Note to aces:

How to copy ACnet screens on the ACnet consoles without using the ACnet command "XV capture" or "Save to GIF" in the Utilities menu? Answer: use screen capture instead! In fact, this method may be faster!

  1. Click on the window you want to copy so it is active (blue border).
  2. Hold down the ALT key, then press the "Print Screen" key (in the upper row of function keys next to F12). This copies the active window to the clipboard.
  3. Go to Start -> Programs -> Paint Shop Pro 6 -> Paint Shop Pro 6 to open Paint Shop Pro. You might check first whether Paint Shop Pro is already open in the taskbar.
  4. Under the EDIT menu, click "Paste as New Image". This copies the clipboard into Paint Shop Pro.
  5. Under the FILE menu, click "Save as...", select "CompuServe Graphics Interchange (*.gif)" in the "Save as Type" pulldown menu, and then save to an area on the local computer which you'll remember (like the Desktop, for example).
  6. Now, when you "Add Graphics" in the logbook, simply select the file you just created using the "Browse..." button. You might as well delete the files on the local computer when you are done since "Add Graphics" makes a copy of the graphic file on the logbook computer.
  7. Note that this method does not use the http://adcon.fnal.gov/userb web page at all.

 - Steve Hahn :: (run 179683)
Sat Mar 6 09:45:58 Test to see whether logbook entries work using the Konqueror web browser. - Steve Hahn :: (run 179683)
-- Sat Mar 6 09:46:52 comment by...Steve Hahn --  And it does! Nice to use a modern browser!
-- Sat Mar 6 11:28:31 comment by...Dale Stentz --  Konqueror works fine (better than Netscape in some regards).

However, recall that one should never use the elog's "advanced edit" feature with Konqueror. Using it will cause havoc and mess up all of the elog features (and might also upset some people as a result).


Sat Mar 6 09:50:23
Run Coordinator elog:
09:36:32- The noise problem D0 has been experiencing on their detector has gone away. After a combination of a temperature increase to the cooling water, lower humidity in the hall, and mechanically reconnecting part of the toroid, the noise was gone. It may come back, of course. So, for now, the shutdown start date is back to Monday, March 15. Accelerator Division is developing a contingency plan to start the shutdown on Thursday, March 10 if the noise returns in the next day or two. A final decision on the shutdown date will be made on Monday (3/8) at 1600. - JPM
 - convery
-- Sat Mar 6 09:50:45 comment by...convery --  09:39:09- The general plan is to hold this store until late morning or afternoon tomorrow, depending on stacking progress. Dr. X can have 30 minutes at the end of the store for TEL studies. Studies can have up to 20% of the TLG after the stack reaches 100E10. - JPM
Sat Mar 6 09:53:18

Talked with Jim Smedinghoff about the ACnet problems. Apparently this is an ACnet-wide DECnet problem; we are not the only nodes affected. The network router, cns55, does not see some nodes, including ours, even after multiple reboots. In fact, after the most recent reboot one of our consoles, CNS51, did reappear, and the directory structures now work correctly. However, Jim is not confident it won't drop out again at any moment. He is investigating but has no estimate of how long this will take. We should continue to use the screen capture method I outlined above for the indefinite future.

 - Steve Hahn :: (run 179683)
Sat Mar 6 09:59:05
SCPU_TRACER_EVENT_ID Error: 

Hardware EVB has detected a problem with data quality in  
SCPU b0eb15 (forwarded by FER crate WCAL_03).
 - Jan :: (run 179683)
Sat Mar 6 10:02:05
L2 Decision Timeout: 

L1Mon: saw 210 L1 DMA transfers, expect 1 (buffer number 0) 
L1Mon: Dumping data for  1 word. 
Word       upper 32 bits  lower 32 bits 
   0: 0x00000000	0x00008000  
   1: 0x00000000	0x40600012  
   2: 0x0083b477	0xa8948879  
... 
  418: 0x0083b477	0xa8948879  
  419: 0x8083b477	0x88948879  
L1Mon: done.
 - Jan :: (run 179683)
Sat Mar 6 10:11:34
 - Susana.
Sat Mar 6 10:15:49
SCPU_TRACER_EVENT_ID Error: 

Host b0eb11.fnal.gov, task tRec_0 
SCPU-P1-E-TracerEventId: Event 10245271, crate 4, channel 8 has either bad Tracer ID or bad markers
around Tracer word 


Hardware EVB has detected a problem with data quality in  
SCPU b0eb11 (forwarded by FER crate CCAL_03)
 - Jan :: (run 179683)
Sat Mar 6 10:27:20
busy timeout B0SVX06: 

SCPU-P1-E-VrbHeader: Dump of header words for event 10450920 from VRB in slot 12: 
0x7ff3c1c1 0x7ff3c1c1 0xe04418fe 0x8d2400f3 0x8c248b24 0x8a248924 0x25032619 0x270431f3 
1 crate/s: b0svx06(16),  busy.[RXPT]
 - Jan :: (run 179683)
Sat Mar 6 10:48:07 Jim Smedinghoff called back and said the DECnet router problems are solved. I leave it to the aces to decide whether they prefer the new ACnet copying method. - Steve Hahn :: (run 179683)
Sat Mar 6 10:52:47

A couple of problems found by our CO:

Many triggers that use tracking are showing rates that are too low. Presumably this is because Charles recently changed the reference rates to account for SL2 being masked on, but we are now running without SL2 masked on. Kaori has sent a message to Charles Plager and Kevin Pitts.

COT SLs 2, 3, 4, 6, and 8 all show up as red in the 1-D occupancy plots. Checked with Bob Wagner; these are caused by known single channel gain problems throughout the chamber.

 - Steve Hahn :: (run 179683)
Sat Mar 6 11:06:54
 - Susana.
Sat Mar 6 11:51:41
Here is a plot of the solenoid NMR over the last 30 days.

As far as I can tell, there was only one other glitch, on February 28 (there is always an excursion when the solenoid is ramped up from being off). Thus, the fact that we had two incidents during the evening and owl shifts is worrisome.

I talked with Steve Gordon (the system process tech) about this. Apparently, there have been various problems with this NMR readout, but not in recent times (since last November?). I'll talk to Bob Sanders about this next week.

 - Steve Hahn
Sat Mar 6 11:57:38
for information: 

PCAL02 throws a burst of reformatter errors at once, at a low reject rate of ~0.005%
(without stopping the DAQ).
 - Jan :: (run 179683)
Sat Mar 6 12:03:00 Due to a scripting error (on my part), I had updated only the L1 trigger rates. I have now updated the L1, L2, and L3 rates. This will take effect when either a new run starts or XMon is restarted. This applies to the two run ranges where the COT was in the new standard compromised setup (runs 179103-179132 and 179463-179505).  - Charles Plager
Sat Mar 6 12:08:41
SCPU_TRACER_EVENT_ID Error: 

Hardware EVB has detected a problem with data quality in  
SCPU b0eb12 (forwarded by FER crate XFT_FINDER_02) 

Host b0eb12.fnal.gov, task tRec_0 
SCPU-P1-E-VrbHeader: Dump of header words for event 11822182 from VRB in slot 10 
SCPU-P1-E-TracerEventId: Event 11822184, crate 100, channel 3 has either bad Tracer ID or bad
markers around Tracer word
 - Jan :: (run 179683)
Sat Mar 6 12:18:09
 - Susana
Sat Mar 6 12:20:31
COT HV alarm for a few seconds in CDF GLOBAL ALARM,
related to COT temperatures. It cleared so quickly that I could not
identify the origin.
 - Susana.
Sat Mar 6 12:39:51 XMon was restarted at SciCo's request. - Serguei :: (run 179683)
Sat Mar 6 12:58:16
CER_SVXMON_HALT_RECOVER_RUN_ERROR !!!  

Stuck Cellid S/B1/W5/L4/C7-13 .  
AUTO HRR will be issued   
 - Jan :: (run 179683)
Sat Mar 6 13:12:20
The updated XMon L1 Trigger X-sections
 - Serguei
Sat Mar 6 13:14:03
XMon #6
 - Serguei
-- Sat Mar 6 13:54:05 comment by...Steve Hahn --  

Talked with Charles about current trigger rates after restarting XMon (since his last message). He was aware of them, and thinks with the current state of the COT we should be OK. He said we should worry if we see any particular triggers causing high dead time, but a check of the trigger display shows no culprits.


Sat Mar 6 13:17:14
 - Susana.
Sat Mar 6 13:51:52
CER_SVXMON_HALT_RECOVER_RUN_ERROR: 

Stuck Cellid S/B5/W6/L4/C7-13 .  
AUTO HRR will be issued  
 - Jan :: (run 179683)
Sat Mar 6 14:20:42
information: 

Every now and then, low-rate (~0.005%) REFORMATTER errors from FIB00:

RAWREF Error - VRB-DLINCO - with code 18 -- START
 - Jan :: (run 179683)
Sat Mar 6 14:35:39
 - Susana.
Sat Mar 6 15:12:03
Host b0eb12.fnal.gov, task tRec_0 
SCPU-P1-E-CantResetVrb: Reset of VRB in slot 14 failed 

Press the button on the front panel OF THE VRB IN SLOT 14 ,  
*NOT* the crate CPU, and WAIT AT LEAST 10 SECONDS.  
Note the button is recessed and will require a pen or  
paperclip to press. When pressed, lights will flash.  
If no lights flash, it wasn't pressed 

<- DONE, WORKED
 - Jan :: (run 179683)
Sat Mar 6 15:13:17
FERML_HIGH_DEADTIME occurred (no consequence)
 - Jan :: (run 179683)
Sat Mar 6 15:29:38
 - Susana.
Sat Mar 6 15:34:00
Date        Time      BLM          Dose
2004.03.06  15:29:13  W Inner BLM  594.31 RADS
2004.03.06  15:29:13  W Outer BLM   33.71 RADS
2004.03.06  15:29:13  E Inner BLM  111.40 RADS
2004.03.06  15:29:13  E Outer BLM  594.31 RADS
Integrated dosage
 - Susana.
-- Sat Mar 6 15:35:14 comment by...Susana --  
Date        Time      BLM          Dose
2004.03.06  15:34:13  W Inner BLM  597.32 RADS
2004.03.06  15:34:13  W Outer BLM   33.71 RADS
2004.03.06  15:34:13  E Inner BLM  111.40 RADS
2004.03.06  15:34:13  E Outer BLM  597.32 RADS
Integrated dosage
Sat Mar 6 15:55:30
Run Number Data Type Physics Table Begin Time End Time Live Time L1 Accepts L2 Accepts L3 Accepts Live Lumi, nb-1 GR SC RC
179683 x2BDE3 BEAM PHYSICS_2_02 [2,424,431] 23:51:25 15:03:45 791,507,714 14,646,722 3,039,894 1899.218 1
Totals 15:55:03 15:03:45 791,507,714 14,646,722 3,039,894 1899.218
End of Shift Report

Luminosity summary
Begin shift or beam at 08:00:59
End shift or beam at 15:59:05
Delivered luminosity: 768.63 nb-1
Acquired luminosity: 728.73 nb-1

Totals
Date:2004.03.06
Shift:day
Delivered luminosity: 768.6 nb-1
Acquired luminosity: 728.7 nb-1
Efficiency: 94.8%
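
The efficiency line is consistent with acquired luminosity divided by delivered luminosity (728.73 / 768.63). A minimal Python sketch of that check, assuming this is how the report script computes it; the function name shift_efficiency is hypothetical and not part of the actual report script:

```python
def shift_efficiency(delivered_nb: float, acquired_nb: float) -> float:
    """Return data-taking efficiency in percent (assumed: acquired / delivered)."""
    if delivered_nb <= 0:
        raise ValueError("delivered luminosity must be positive")
    return 100.0 * acquired_nb / delivered_nb


# Day shift, Sat Mar 6 2004: values in nb-1 from the luminosity summary above
eff = shift_efficiency(768.63, 728.73)
print(f"Efficiency: {eff:.1f}%")  # prints: Efficiency: 94.8%
```
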

Plot not available


Store 3275 Initial Lum =  6.7e31 @ 2304 03/05/04

Run 179683  Start shift Lum = 3.3e31

Very smooth running.

Problems copying ACnet plots to the USERB disk were
found to be a DECnet router problem (we were not the
only ones affected).  Jim Smedinghoff fixed the problem
at about 1000.

Many trigger rates in XMon which use tracking are
marked as too low.  Presumably, this is because Charles
yesterday changed the reference rates to reflect SL2
being masked on, but we are no longer doing this.
Charles and Kevin Pitts have been notified.

Charles wrote back that he had updated the L1, L2, and
L3 trigger rates for the current configuration.
However, after restarting XMon we do not see much
improvement, other than several triggers now being
marked invalid (grey).

Talked with Charles: under the current compromised COT
conditions, he said we should worry if we see any
triggers causing high dead time.  We see no such
problems.

Our one downtime: 9 minutes due to a VRB error in
b0eb12, slot 14, at 1459.  The standard front-panel
reset worked.

-- Sat Mar 6 19:33:08 comment by...Steve Hahn --  But then see how I managed to make all hell break loose for the next shift
Sat Mar 6 15:55:31
HVAC alarm in CDF GLOBAL ALARMS. 
PDT-CH Collision hall differential pressure, current value:
0.158 H2O, hi limit is 0.13
 - Susana.