2004 CDF E-Log -- Eve shift. Sun Feb 29, 2004
SciCo           DAQ Ace     Monitoring Ace  CO          (Operations Manager)
Stephan Lammel  Jan Ehlers  Anadi Canepa    Diego Cauz  JJ Schmidt


Start of Shift Notes:  

Shot 3263 (inst lumi 4.0E31) run 179504 (Silicon in, COT in degraded config) in progress

Sun Feb 29 16:22:17
do TOF people know why this guy has turned red?
 - diego
Sun Feb 29 16:37:40
at the end/beginning of last/this shift: 
-REFORMATTER ERROR from FIB_00 
-BUSY TIMEOUT from SVX06 
-SCPU_BAD_VRB_BYTE_COUNT error forwarded by CCAL07 
-some L2 DECISION TIMEOUTs 
 - Jan :: (run 179504)
Sun Feb 29 16:39:05
 - Anadi
-- Sun Feb 29 16:39:27 comment by...Anadi --  Hourly Plots: 15:30-16:30
Sun Feb 29 16:44:49 Run 179504 Terminated at 2004.02.29 16:44:36 - RunControl
Sun Feb 29 16:45:43 Run 179504 TERMINATE: lumi dropped below 40E30 - Jan x2080
Sun Feb 29 16:49:01 Run 179505 Activated at 2004.02.29 16:48:33 - RunControl
Sun Feb 29 16:49:50 Run 179505 ACTIVATE: low lumi trigger table PHYSICS_2_03[1,431,435] - Jan x2080
Sun Feb 29 16:52:00
pipeline out of synch in 130 silicon readout chips
 - Jan :: (run 179505)
Sun Feb 29 17:18:06
DONE TIMEOUT for B0PCAL04 -> had to shepherd
 - Jan :: (run 179505)
Sun Feb 29 17:21:22 Per JJ's request, I have corrected the runSummary luminosity information in the run database for store 3256, as the b0clc00 crate was rebooted twice during this store, resetting the integrated luminosity counters. This store starts at run 179358; the corrections affect runs between 179364 and 179385. The CLC crate was rebooted during the process of loading new TDC DSP code. - W.Badgett
Sun Feb 29 17:36:23
twice in five minutes L2 DECISION TIMEOUT with the following long message in the error display:



(MLE) b0l2de00:SpyAlpha:5:31:04 PM-> 
L1Mon: saw 210 L1 DMA transfers, expect 1 (buffer number 0) 
L1Mon: Dumping data for  1 word. 
Word       upper 32 bits  lower 32 bits 
   0: 0x00010000	0x00020000  
   1: 0x00000000	0x00300006  
   2: 0x00030000	0x00030000  
   3: 0x00030000	0x00030000  

...... 

  417: 0x00000000	0x00000000  
  418: 0x00000000	0x00000000  
  419: 0x00000000	0x00000000  
L1Mon: done.
 - Jan :: (run 179505)
Sun Feb 29 17:37:23
 - Anadi
-- Sun Feb 29 17:38:54 comment by...Anadi --  
Hourly Plots: 16:30-17:30

Sun Feb 29 17:37:42 spike in B0PBSM (proton abort gap loss) at 17:12; called MCR, they say they see nothing unusual - Stephan
Sun Feb 29 18:12:19
We have a Silicon trip: B3W8L0. We halt the run and recover the trip. For details see the Si expert entry.
 - Anadi
-- Sun Feb 29 18:38:37 comment by...Anadi --   Si expert link
Sun Feb 29 18:21:05
just for information: stuck cell ID S/B2/W7/L1/C0-2 
as usual auto HRR is issued
 - Jan :: (run 179505)
Sun Feb 29 18:36:25 Stage0 showed PROBLEM, so I notified Alexey about it. - diego :: (run 179504)
Sun Feb 29 19:02:47
 - Anadi
-- Sun Feb 29 19:03:21 comment by...Anadi --  Hourly Plots: 17:30-19:00
Sun Feb 29 19:03:14
L2 decision timeout  
again with the following error message: 

(MLE) b0l2de00:SpyAlpha:7:01:03 PM-> 
L1Mon: saw 210 L1 DMA transfers, expect 1 (buffer number 0) 
L1Mon: Dumping data for  1 word. 
Word       upper 32 bits  lower 32 bits 
   0: 0x00010000	0x00020000  
   1: 0x00000000	0x40901020  
   2: 0x00030000	0x00030000  
   3: 0x00030000	0x00030000  

.... 

  418: 0x00000000	0x00000000  
  419: 0x00000000	0x00000000  
L1Mon: done.
 - Jan :: (run 179505)
Sun Feb 29 20:14:15
 - Anadi
-- Sun Feb 29 20:30:41 comment by...Anadi --  Hourly Plots: 19:00-20:00 Proton losses (B0PAGC) slightly increasing in time but less than 4.5 kHz. Spike in proton losses (B0PBSM) of almost 35 kHz
Sun Feb 29 20:23:58
L2 Decision Timeout after a lot of CLIST errors
 - Jan :: (run 179505)
Sun Feb 29 20:32:41 Efficiency for store almost over 90%. Good job shifters! - jj
Sun Feb 29 20:36:14 We observe that the bias current for B4W1L2 has been slightly increasing for the past two hours. The current is 101.0, which exceeds the caution value by 0.5 and the good value by 1. We unmarked the cell twice without paging; there is no error in SVXMon. Several automatic HRRs have been issued, so we didn't issue a manual one. We unmark it for the third time. If it comes back pink we'll page the Si expert.  - Anadi and Jan
Sun Feb 29 20:38:56
history plot of B4W1L2
 - Anadi & Jan
Sun Feb 29 20:49:31
a HEAP CORRUPT occurred for crate B0COT18 
-> shepherd -> fine
 - Jan :: (run 179505)
Sun Feb 29 20:55:45
REFORMATTER ERROR summary of the last 90 minutes: 

5 times from FIB_00
 - Jan :: (run 179505)
Sun Feb 29 21:00:24 Bias current in B4W1L2 still increasing in time. We unmark the cell for the 4th time and issue a manual HRR. Cell pink again. We page the Si expert.  - Anadi
-- Sun Feb 29 21:02:24 comment by...Anadi --  Pete replied immediately; he is coming in.
-- Sun Feb 29 22:10:37 comment by...Anadi --  Pete fixed the problem. For more details, go to: history
Sun Feb 29 21:23:38
L2 Decision Timeout: 

(MLE) b0l2de00:SpyAlpha:9:21:09 PM-> 
Error: Startload is still low. ModDone bits 0x4007fdff for event 238458431 

(MLE) b0l2de00:SpyAlpha:9:21:09 PM-> 
TIMEOUT: Mod done not set for slot(s) 12 (SVTList)  
 Raw: (0x0007fdff, cnt 7) 
XTRP: (BC = 127, Buffer no = 1)
 - Jan :: (run 179505)
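
The raw ModDone word in the message above can be decoded with a short sketch that lists which bits are not set. This is only an illustration: the helper name and the assumed field width of 19 bits (the highest set bit in the raw word is bit 18) are assumptions, and the mapping from bit position to crate slot is not given in the log, which simply reports slot 12 (SVTList).

```python
def clear_bits(mask: int, width: int) -> list:
    """Return the positions (0 = least significant) of bits that are 0 in mask."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]

RAW_MOD_DONE = 0x0007FDFF  # "Raw" value quoted in the error message
ASSUMED_WIDTH = 19         # assumption: only the low 19 bits carry ModDone flags

missing = clear_bits(RAW_MOD_DONE, ASSUMED_WIDTH)
print(missing)  # positions whose "done" flag was not set
```

Under that width assumption exactly one flag is missing, which fits the single slot named in the timeout message.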
Sun Feb 29 21:32:47
two REFORMATTER errors occurred: 

once FIB02 and 
once PART 1 SCPU 8 <- unfortunately no more information to localize
 - Jan :: (run 179505)
Sun Feb 29 22:15:07
 - Anadi
-- Sun Feb 29 22:19:19 comment by...Anadi --  Hourly Plots: 20:00-22:00 Spike in B0ABSM similar to the proton loss spike we had before, which was not associated with anything unusual. We don't call MCR. Proton losses (B0PAGC) are slightly increasing in time: from 4.2 to 5 kHz in the last 2 hours.
Sun Feb 29 22:32:24 We have a hardware trigger inhibit followed by the software one for CPR NW. We recover the trip by setting it to Standby and then ON.  - Anadi
Sun Feb 29 22:35:30
REFORMATTER error from FIB_00
 - Jan :: (run 179505)
-- Sun Feb 29 22:41:37 comment by...rainer --  FYI - reformatter errors of the form
(MLE) b0l3pcom2.fnal.gov:main:10:24:02 PM->Error on L3 node b0l3052 (partition 1) Sun Feb 29 22:24:02 2004 l3_node 

  in refoInt_reformatProc (l3_refevt.c:610)
  @L3_REFORMAT_ERROR

RAWREF Error - VRB-DLINCO - with code 18 -- START
 Line - 1111 in: src/library/rr_rawevt.c
 Buffer position: 23277
 Error count: 1 of 1
 Reject rate: 0.010 percent

 SVX data len, VRB total and links incons.
 Total Len   Header Len  Data Length
  0x00000838  0x00000020  0x00000918

 Error location code: 0x01005590


 Crate Id?: 0x00000010

 Error during scanning MINI structure!
 Last Unit prob. worked on (all counts from 0):
  SCPU: 5
  VRB : 5
  LINK: 9
  MINI: 0

 Buffer (23257 until posi+2=23279)
 0x2914f313  0x06160915  0x03021aa4  0x05040a03  0x1aa21aa3  0x021e1aa1 
 0x0620091f  0x092c012b  0x1aa0052d  0x0a7b087a  0x037f067c  0x0000c1c1 
 0x02003808  0x10044110  0x10040110  0x7e01ba01  0xcc002401  0x6400b400 
 0x18010000  0xdc00c200  0xfe79a0e0  0xf3001a83  0x0b060705 


 NOTE: Event rejected!

RAWREF Error - VRB-DLINCO - with code 18 -- FINISHED: L<  893

are relatively harmless as long as their rate is << 1%; these are probably bit errors in the ATM ring or something like that. If the rate approaches 1%, then one needs to worry: page the silicon expert in that case. The same applies if reformatter errors other than VRB-DLINCO are thrown at an appreciable (i.e. not << 1%) rate.

The reformatter file can be viewed in the new run summary.
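
The rule of thumb in this comment can be sketched as a small check. This is only an illustration: the function name, the returned strings, and the numeric cutoff standing in for "<< 1%" (0.1% here) are assumptions, not values from the log.

```python
# Assumed stand-in for "<< 1%"; the comment above gives no exact number.
NEGLIGIBLE_PCT = 0.1

def reformatter_action(error_kind: str, reject_rate_pct: float) -> str:
    """Suggest an action for a reformatter error seen at a given reject rate (percent)."""
    if reject_rate_pct >= NEGLIGIBLE_PCT:
        # Not "<< 1%": the comment says to page the silicon expert, for any error kind.
        return "page silicon expert"
    if error_kind == "VRB-DLINCO":
        return "relatively harmless"
    # The comment does not say what to do with other kinds at a negligible rate.
    return "no guidance in note"

print(reformatter_action("VRB-DLINCO", 0.010))  # reject rate from the dump above
```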


Sun Feb 29 23:09:57
-problem with VRB in slot 20 

SCPU-P1-E-BadVrbByteCount: Event 5747665 total byte count is lt 32, gt 65528, or not divisible by 8 for VRB in slot 20. 

Count = 49601. -->   

-connected with REFORMATTER error in FIB_00 (at the same time) 
 - Jan & Anadi :: (run 179505)
-- Sun Feb 29 23:14:07 comment by...Jan --  
It's the same REFORMATTER error which Rainer described minutes ago! (rate 0.008%)
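
The condition quoted in the BadVrbByteCount message (lt 32, gt 65528, or not divisible by 8) can be written out as a short check. The function name is hypothetical; only the three limits and the reported count come from the message.

```python
def vrb_byte_count_ok(total_bytes: int) -> bool:
    """True if the byte count is within [32, 65528] and a multiple of 8."""
    return 32 <= total_bytes <= 65528 and total_bytes % 8 == 0

reported_count = 49601  # "Count = 49601" from the error message
print(vrb_byte_count_ok(reported_count), reported_count % 8)
```

The reported count lies inside the allowed range, so it is the divisibility condition that fails here.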

Sun Feb 29 23:18:44
 - Anadi
-- Sun Feb 29 23:19:05 comment by...Anadi --  Hourly Plots: 22:00-23:00
Sun Feb 29 23:56:12
Run Number     Data Type  Physics Table                     Begin Time  End Time  Live Time  L1 Accepts   L2 Accepts  L3 Accepts  Live Lumi, nb-1  GR  SC  RC
179504 x2BD30  BEAM       PHYSICS_HIGHLUM_2_03 [1,432,436]  13:21:42    16:44:36  03:10:15   65,644,340   3,038,687   482,843     490.034          1   1   1
179505 x2BD31  BEAM       PHYSICS_2_03 [1,431,435]          16:48:33              06:33:34   381,314,511  6,467,170   1,203,155   755.012          1
Totals                                                                  23:55:03  09:43:49   446,958,851  9,505,857   1,685,998   1245.046
 - End of Shift Report
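
The Totals row of the report can be cross-checked by summing the two run rows. The sketch below does this with per-run values copied from the table; the helper names are illustrative only.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def seconds_to_hms(total: int) -> str:
    """Convert seconds back to an HH:MM:SS string."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# Per-run values for runs 179504 and 179505, in that order.
live_times = ["03:10:15", "06:33:34"]
l1_accepts = [65_644_340, 381_314_511]
l2_accepts = [3_038_687, 6_467_170]
l3_accepts = [482_843, 1_203_155]
live_lumi_nb = [490.034, 755.012]

total_live = seconds_to_hms(sum(hms_to_seconds(t) for t in live_times))
print(total_live, sum(l1_accepts), sum(l2_accepts), sum(l3_accepts),
      round(sum(live_lumi_nb), 3))
```

The summed live time, accept counts, and live luminosity agree with the Totals row.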
Sun Feb 29 23:59:32 Shift Summary:
Store 3263 in progress; luminosity dropped from 4.0E31 to 2.7E31 

   - took data with new high lum table PHYSICS_HIGHLUM_2_03[1,432,436] 
   - switched to new low lum table PHYSICS_2_03[1,431,435] 
   - Silicon, CPR trips, nothing major 

Plan is to continue data taking 
   - Ultra Prescale test at the end of the store

End of Shift Numbers
CDF Run II

Runs                   179504, 179505
Delivered Luminosity   0.936 pb-1  
Acquired Luminosity    0.860 pb-1  
Efficiency             91.9%

 - Stephan
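
The efficiency figure in the End of Shift Numbers is consistent with the ratio of acquired to delivered luminosity. A minimal check, assuming that this ratio is how the figure is defined (variable names are illustrative):

```python
delivered_pb = 0.936  # Delivered Luminosity, pb-1
acquired_pb = 0.860   # Acquired Luminosity, pb-1

# Assumed definition: efficiency = acquired / delivered, in percent.
efficiency_pct = 100.0 * acquired_pb / delivered_pb
print(f"{efficiency_pct:.1f}")
```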