2004 CDF E-Log -- Eve shift. Thu Mar 4, 2004
SciCo DAQ Ace Monitoring Ace CO (Operations Manager)
Stephan Lammel Ian Vollrath Christopher Marino Diego Cauz Mary Convery


Start of Shift Notes:  

Store 3273 (inst lumi 2.68E31) and run 179643 (PHYSICS_2_02) 
in progress

Thu Mar 4 16:38:16
Date        Time      BLM      Dose
2004.03.04  16:34:05  W Inner  263.19 RADS
2004.03.04  16:34:05  W Outer    0.00 RADS
2004.03.04  16:34:05  E Inner    3.03 RADS
2004.03.04  16:34:05  E Outer  263.19 RADS
Integrated dosage  - Christopher
Thu Mar 4 16:47:05
 - Christopher
Thu Mar 4 16:59:55
Question for the IMU/BMU people: is there any reason why all the 
East layers have channel n. 71 dead and all the 
West layers have channels n. 72, 142, 143 dead?
 - diego :: (run 179643)
-- Fri Mar 5 09:20:06 comment by...Camille Ginsburg --  
Dear Diego,
     Thanks for the careful check of the data.
BMU-E-71 and BMU-W-72 are un-instrumented.  They
are at the very top of the detector, where there
was no space.  BMU-W-142 and BMU-W-143 have
the pre-amplifiers unplugged because these stacks
were oscillating.  (probably only one is actually
oscillating, but these two stacks are jumpered together)
We will repair them at the next access opportunity
(presumably March 15).


Thu Mar 4 17:30:19
Trigmon Slide n. 2 L2 TriggerMonitor 2.2.3.2 
Ncluster: TL2D vs. TC2D 
shows an error rate slightly over the limit: 0.12% vs. 0.10% 
I'm keeping an eye on it.
 - diego :: (run 179643)
-- Thu Mar 4 17:32:57 comment by...diego --  
oops! I goofed. The limit is 1% not 0.1%

Thu Mar 4 17:35:48
got silicon resonance error: 

(MLE) b0dap73.fnal.gov:Thread-28:5:31:51 PM->FrontEnd Error: VRB_SVX_02 
CT: 2004.03.04 17:32:03 
32'3" 1 crate/s: b0svx02(112),  in error.
[RXPT] b0svx02:Messenger:5:31:59 PM->SRC Fatal Error: Sl 5 Resonance Detected
 --> 
 Additional Information: 

 Attention !!!. FERML_SRC_FATALITY ERROR !!! 

 SRC Fatal Error from b0svx02: Sl 5 Resonance Detected 


hrr worked.
 - ian :: (run 179643)
Thu Mar 4 17:50:15
got error: 

(MLE) b0dap73.fnal.gov:Thread-28:5:43:21 PM->  Level 2 Decision Timeout 

(MLE) b0dap73.fnal.gov:Thread-28:5:43:21 PM->Requested Halt-Recover-Run issued [errmon] 
(MLE) b0l2de00:SpyAlpha:5:43:23 PM-> 
L1Mon: saw 210 L1 DMA transfers, expect 1 (buffer number 1) 
L1Mon: Dumping data for  1 word. 

hrr worked.
 - ian :: (run 179643)
-- Thu Mar 4 17:51:12 comment by...ian --  
again at 17:49

Thu Mar 4 17:52:37
 - Christopher
-- Thu Mar 4 17:53:49 comment by...Christopher --  Hourly Plots: Note LOSTP spike at 5:30
Thu Mar 4 17:56:18
is this red plot something we should worry about?
 - diego
-- Thu Mar 4 19:25:23 comment by...ps --  no
Thu Mar 4 18:05:50
got error: 

Attention!!!. SCPU_TRACER_EVENT_ID Error !!! 
 Hardware EVB has detected a problem with data quality in 
 SCPU b0eb15 (forwarded by FER crate WCAL_04). 

hrr worked.
 - ian :: (run 179643)
Thu Mar 4 18:43:05
 - Christopher
Thu Mar 4 18:44:09
got error: 

(MLE) b0l2de00:SpyAlpha:6:40:13 PM-> 
L1Mon: saw 210 L1 DMA transfers, expect 1 (buffer number 0) 
L1Mon: Dumping data for  1 word. 

followed by: 

(MLE) b0l3pcom1.fnal.gov:main:6:40:31 PM->Host b0eb20.fnal.gov, task tRec_0 
SCPU-P1-E-VrbHeader: Dump of header words for event 5299331 from VRB in slot 16: 

hrr worked.
 - ian :: (run 179643)
Thu Mar 4 19:04:30 Process System GAS alarm has tripped twice in last 3 minutes
Ethane Heater Box N2Purge DP under Shed Alarms is to blame - Christopher
-- Thu Mar 4 19:10:33 comment by...Christopher --  Process Systems Report: Alarm is a recurring weather related problem
Thu Mar 4 19:10:55 Reiner spotted yellow (no data) boxes in SVXmon SVX chip status map display; ask to page Silicon; Florencia called back and will come in to check things out - Stephan :: (run 179643)
-- Thu Mar 4 19:17:37 comment by...rainer --  this looks like a GLINK issue - try an HRR to get it into shape.
-- Thu Mar 4 22:38:50 comment by...rainer --   VRB problem in slot #11 b0svx02
  • HRR did not help to get VRB back in shape
  • intended to reset the VRB and re-start the run
  • due to communication failure with shift crew, reset VRB while ending of run was still in progress
  • the run was aborted and event builder cleanup attempted, which seems to have failed - as a result L3 appeared to be completely broken
  • reset of VRB and crate CPU fixed the problem and brought data back from SB2W0/1
  • SVT was not affected, see here
  • strange thing was that SVXMon was frozen and only got re-started at 6:30pm. normally, the absence of SVXMon should have raised an alert in run control. last confirmed sign of life was 2:37pm.
  • sometime between 2pm and 3 pm, the VRB started to flake out.
  • two wedges out is, I believe, a run quality criterion for marking SVX BAD, if I remember correctly ...
  • details see silicon elog
    -- Thu Mar 4 22:56:06 comment by...rainer --  pricetag: 125min (15 min VRB reset)
    Thu Mar 4 19:42:42
    bunch of clist errors, e.g: 
    
    
    (MLE) b0l2de00:SpyAlpha:7:39:45 PM-> 
    ClistMon: Error on pass 0, cluster 0: 2 != 1 
    
    (MLE) b0l2de00:SpyAlpha:7:39:45 PM-> 
    ClistMon: Error on pass 2, cluster 0: 2 != 1 
    
    (MLE) b0l2de00:SpyAlpha:7:39:46 PM-> 
    ClistMon: Error on pass 2, cluster 1: 2 != 1 
    
    +lots more 
    
    hrr worked.
     - ian :: (run 179643)
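A burst of ClistMon messages like the one above can be tallied per (pass, cluster) pair. A minimal Python sketch, assuming the message format is exactly as quoted in this entry; the pattern and variable names are illustrative and not part of any CDF monitoring tool, and no meaning is assigned to the "2 != 1" values:

```python
import re
from collections import Counter

# The three messages quoted in the entry above (the "+lots more" are not logged).
LOG = """\
(MLE) b0l2de00:SpyAlpha:7:39:45 PM->
ClistMon: Error on pass 0, cluster 0: 2 != 1
(MLE) b0l2de00:SpyAlpha:7:39:45 PM->
ClistMon: Error on pass 2, cluster 0: 2 != 1
(MLE) b0l2de00:SpyAlpha:7:39:46 PM->
ClistMon: Error on pass 2, cluster 1: 2 != 1
"""

# Hypothetical pattern matching the quoted message layout.
PATTERN = re.compile(r"ClistMon: Error on pass (\d+), cluster (\d+): (\d+) != (\d+)")

# Count how many errors were reported for each (pass, cluster) pair.
counts = Counter(
    (int(pass_no), int(cluster))
    for pass_no, cluster, _left, _right in PATTERN.findall(LOG)
)

for (pass_no, cluster), n in sorted(counts.items()):
    print(f"pass {pass_no}, cluster {cluster}: {n} error(s)")
```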
    Thu Mar 4 19:46:03
     - Christopher
    -- Thu Mar 4 19:46:44 comment by...Christopher --  Hourly Plots: Another LOSTP spike around 6:30
    Thu Mar 4 20:06:13
    got error: 
    
    (MLE) b0xft00:Messenger:8:03:16 PM->Runtime Error 500, Event 6052890: Bunch counter mismatch,
    mismatch count = 1 
    
    (MLE) b0l3pcom1.fnal.gov:main:8:04:07 PM->Host b0eb19.fnal.gov, task tRec_0 
    SCPU-P1-E-VrbHeader: Dump of header words for event 6062536 from VRB in slot 20: 
    0x009e0198 0x01900018 0x0086008a 0x0052004e 0xe2a038fc 0x831b00f3 0x5406551f 0x56ef5726 
    (MLE) b0dap73.fnal.gov:Thread-28:8:04:11 PM->Requested Halt-Recover-Run issued [errmon] 
    (MLE) b0svx06:Messenger:8:04:07 PM->Silicon Timeout:BUSY- Slots:  08:fa00 10:fa20 12:fa40 16:f800
    18:f820 20:f840 
    
    
    hrr worked.
     - ian :: (run 179643)
    Thu Mar 4 20:12:56 Run 179643 Terminated at 2004.03.04 20:11:58 - RunControl
    Thu Mar 4 20:12:57 Run 179643 TERMINATE: run ended due to silicon problems - Ian x2080
    Thu Mar 4 20:36:39 problem with level 3 converter node 1: stuck after VRB reset during run termination; page level 3; called back working with Ian - Stephan
    Thu Mar 4 20:39:25
    inserting SVX occupancy, on Reiner's request
     - diego
    -- Thu Mar 4 21:08:32 comment by...diego --  I meant SVT occupancy
    Thu Mar 4 20:44:40
     - Christopher
    Thu Mar 4 21:00:46 Run 179643 RUNSTATUS:
    Marked Bad, explanation:
    SVX readout problem (bulkhead 2 wedge 0 and 1) for unknown length of time at end of run
    
     - cdfscico
    Thu Mar 4 21:35:14 Run 179644 Activated at 2004.03.04 21:35:04 - RunControl
    Thu Mar 4 21:36:18 Run 179644 ACTIVATE: PHYSICS_2_02[2,424,431] after L3 work - Ian x2080
    Thu Mar 4 21:36:56
     - Christopher
    Thu Mar 4 21:38:13
    About Level3 page  
    
    When I came in, Level3 was completely broken. 
    To bring it back to life I had to reboot converters 1 and 8  
    
    and later Scanner manager b0eb10 and scpu25 
    
    Now Level3 seems to be fine  
     - Arkadiy
    -- Thu Mar 4 22:44:18 comment by...rainer --  No doubt there was some relationship with action reported here
    Thu Mar 4 21:49:40 COT alarm, but no trip -- in alarm log reads TEMP_ALR
    Everything green now - Christopher
    -- Thu Mar 4 21:55:05 comment by...Christopher --  Morris (just happened to enter) confirms COT looks OK
    Note any recurrence
    Thu Mar 4 22:14:47
    the CMP part B looks a bit out of shape
     - diego
    -- Fri Mar 5 08:40:45 comment by...Lucio Cerrito --  the plot has low statistics; part Bottom is a lot more shielded than the rest so it accumulates hits a lot slower than the others.
    Thu Mar 4 22:22:44
    DIRAC/preFRED not always =1. Is it all right?
     - diego
    Thu Mar 4 22:41:40 more CAEN madness

    several ladders showing pink in IMON - readback reveals daunting voltages on low voltage power supply. corrupted readback - hockerized crate and now ok. Details see silicon elog  - rainer
    -- Thu Mar 4 22:54:04 comment by...rainer --  pricetag: 15min.


    Thu Mar 4 23:05:34
     - Christopher
    Thu Mar 4 23:06:39
    We have put a DQMon monitor on b0dap51 for a 
    Validation Period by the CO. Please report any 
    comments, complaints, or observations to mmp@fnal.gov. 
    
        In case you may want to restart it: 
    
          * xhost + b0dap30 
          * login on b0dap30 as cdfdaq 
          * setup -r /data1/mmp/fer fer 
          * java rc.mon.DQMon & 
    
    I explained to the present CO how to use it. Please pass 
    the instructions on to the next shift.
     - Mario
    Thu Mar 4 23:44:29
     - Christopher
    Thu Mar 4 23:46:36
    Date        Time      BLM      Dose
    2004.03.04  23:44:03  W Inner  521.69 RADS
    2004.03.04  23:44:03  W Outer    0.00 RADS
    2004.03.04  23:44:03  E Inner    3.03 RADS
    2004.03.04  23:44:03  E Outer  447.26 RADS
    Integrated dosage  - Christopher
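The two BLM readings in this log (16:34:05 and 23:44:03) can be compared to estimate the dose accumulated over the shift. A minimal Python sketch, assuming the logged values are cumulative integrated doses so that a simple difference is meaningful; the function and variable names are illustrative and not part of any CDF tool:

```python
# Integrated BLM doses in rads, copied from the two readings in this log.
start = {"W Inner": 263.19, "W Outer": 0.00, "E Inner": 3.03, "E Outer": 263.19}  # 16:34:05
end = {"W Inner": 521.69, "W Outer": 0.00, "E Inner": 3.03, "E Outer": 447.26}    # 23:44:03


def dose_increase(before, after):
    """Return the per-monitor increase in integrated dose, rounded to 0.01 rad."""
    return {name: round(after[name] - before[name], 2) for name in before}


shift_dose = dose_increase(start, end)
for name, delta in shift_dose.items():
    print(f"{name:8s} +{delta:.2f} rads")
```

With these values only W Inner and E Outer change over the shift; W Outer and E Inner read the same at both times.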
    Thu Mar 4 23:56:12
    Run Number Data Type Physics Table Begin Time End Time Live Time L1 Accepts L2 Accepts L3 Accepts Live Lumi, nb-1 GR SC RC
    179643 x2BDBB BEAM PHYSICS_2_02 [2,424,431] 12:08:03 20:11:58 07:24:30 397,695,812 6,122,026 1,325,284 728.122 1 1 1
    179644 x2BDBC BEAM PHYSICS_2_02 [2,424,431] 21:35:04 02:16:28 95,216,684 1,330,472 324,667 151.650 1
    Totals 23:55:02 09:40:59 492,912,496 7,452,498 1,649,951 879.772
     - End of Shift Report
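The Totals row can be cross-checked against the two run rows. A minimal Python sketch, assuming the column assignment read off the header (Live Time, L1/L2/L3 Accepts, Live Lumi) and assuming the x2BDBB-style field is the run number in hexadecimal; all names are illustrative:

```python
from datetime import timedelta


def hms(text):
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Values copied from the two run rows of the end-of-shift table.
runs = {
    179643: {"hex": "2BDBB", "live": "07:24:30", "l1": 397_695_812,
             "l2": 6_122_026, "l3": 1_325_284, "lumi_nb": 728.122},
    179644: {"hex": "2BDBC", "live": "02:16:28", "l1": 95_216_684,
             "l2": 1_330_472, "l3": 324_667, "lumi_nb": 151.650},
}

# Assumption: the 'x' field is the run number written in hexadecimal.
hex_matches = all(int(row["hex"], 16) == run for run, row in runs.items())

# Column sums to compare with the Totals row.
live_sum = sum((hms(row["live"]) for row in runs.values()), timedelta())
l1_sum = sum(row["l1"] for row in runs.values())
l2_sum = sum(row["l2"] for row in runs.values())
l3_sum = sum(row["l3"] for row in runs.values())
lumi_sum = round(sum(row["lumi_nb"] for row in runs.values()), 3)

print(hex_matches, live_sum, l1_sum, l2_sum, l3_sum, lumi_sum)
```

The accept counts and live luminosity reproduce the Totals row. The summed live time comes out as 9:40:58, one second short of the logged 09:40:59, which may be rounding of the per-run live times.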
    Fri Mar 5 00:00:33 Shift Summary:
    Store 3273 (inst lumi 1.73E31) and run 179644 in progress
    
    
       - COT in slightly-degraded mode, only SL 4 and 5 at reduced gain; XFT three miss for SL 4
    
    
       - running old-default low lumi table PHYSICS_2_02 
       - SVX readout error for two wedges, fixed by Florencia/Reiner 
       - level 3 hung up after VRB reset during run termination, 
         fixed by Arkadiy 
    
    Plan is to continue with PHYSICS_2_02[2,424,431] 
       - calibration and cosmics between stores 
       - DQMon running on CO screens, please report experience

    End of Shift Numbers
    CDF Run II

    Runs                   179643, 179644
    Delivered Luminosity   0.619 pb^-1  
    Acquired Luminosity    0.476 pb^-1  
    Efficiency             77.0
    
    
     - Stephan
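The Efficiency figure is presumably acquired over delivered luminosity, expressed in percent. A minimal Python sketch of that arithmetic, under that assumption and using the rounded values quoted in the End of Shift Numbers; the variable names are illustrative:

```python
# Luminosities in pb^-1 as quoted in the End of Shift Numbers.
delivered_pb = 0.619
acquired_pb = 0.476

# Assumption: efficiency = acquired / delivered, in percent.
efficiency = 100.0 * acquired_pb / delivered_pb

print(f"Efficiency: {efficiency:.1f}%")
```

From the rounded inputs this gives about 76.9, slightly below the logged 77.0; the difference is consistent with the log being computed from unrounded luminosities.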