2004 CDF E-Log -- Day shift. Fri Feb 27, 2004
SciCo: Beate H./Rainer W.
DAQ Ace: Vadim Khotilovic
Monitoring Ace: Natasa Miladinovic
CO: Franco Semeria
Operations Manager: JJ Schmidt


Start of Shift Notes:  

Tevatron dry squeeze under way. Waiting for shot setup. 
Da Plan:  
- taking cosmics until shot setup 
- silicon SRC diagnostics. 

Fri Feb 27 08:32:25 Unmarked S_VBIAS_B0W4L0 in IMon. Let's see if it stays green. - natasha
Fri Feb 27 08:39:01 call Guillelmo if shot setup. L3 currently in a funky state. - Beate/Rainer
Fri Feb 27 08:41:07 I have integrated a few nodes in the farm (all processor nodes are on-line now) and I am testing the l3proxy on pcom1.
If you have problems with l3proxy (please follow this order):

1) Try to restart it using the Ace Control Panel
2) If not possible, call me
3) If no answer (rare), page L3 - Guillelmo
-- Fri Feb 27 08:41:38 comment by...Guillelmo --  
Call me when shot setup starts! (do NOT page L3)


Fri Feb 27 08:42:23 Bill working IMU DSP code. See earlier entry.  - Beate/Rainer
Fri Feb 27 08:49:05 Run 179439 Activated at 2004.02.27 08:48:58 - RunControl
Fri Feb 27 08:49:43 Run 179439 ACTIVATE: COSMICS [12,391,403] - Vadim x2080
Fri Feb 27 09:04:28 PSM alarm: 1RR18C (IMU W muon TDCs) turned red. Channel 2 was out of range (-1.20V or so, when it should have been -5.20V), but after 2 minutes everything seems fine.  - natasha
Fri Feb 27 09:05:57 VNODE1 gives the following message: "You have not been backed up since Thu Feb19 2004. Contact your backup administrator for more info.". Will send an email to Mark Knapp. - natasha
Fri Feb 27 09:10:18 Run 179439 Terminated at 2004.02.27 09:09:57 - RunControl
Fri Feb 27 09:12:26 Run 179439 TERMINATE: End run to put in crate imu00 - Vadim x2080
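The length of run 179439 can be read off its two RunControl stamps. A minimal sketch, assuming the `YYYY.MM.DD HH:MM:SS` stamp format seen in those lines; the helper name `run_duration` is made up for illustration and is not part of RunControl.

```python
from datetime import datetime, timedelta

# Assumed stamp format, taken from the RunControl lines in this log
STAMP_FORMAT = "%Y.%m.%d %H:%M:%S"

def run_duration(activated: str, terminated: str) -> timedelta:
    """Return the time between a run's Activated and Terminated stamps."""
    start = datetime.strptime(activated, STAMP_FORMAT)
    end = datetime.strptime(terminated, STAMP_FORMAT)
    return end - start

# Run 179439, stamps copied from the RunControl entries
print(run_duration("2004.02.27 08:48:58", "2004.02.27 09:09:57"))  # 0:20:59
```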
Fri Feb 27 09:15:39 Run 179444 Activated at 2004.02.27 09:14:42 - RunControl
Fri Feb 27 09:15:40 Run 179444 ACTIVATE: COSMICS [12,391,403] - Vadim x2080
Fri Feb 27 09:32:24 Run 179444 Terminated at 2004.02.27 09:32:02 - RunControl
Fri Feb 27 09:32:52 Run 179444 TERMINATE: will put the system to standby - Vadim x2080
Fri Feb 27 09:33:21
MCR calls - they will inject beam. end the run and put detectors to standby.
 - Beate/Rainer
Fri Feb 27 09:47:56 Same PSM alarm as 40 minutes ago... (9:04 entry) - natasha
-- Fri Feb 27 10:02:11 comment by...Rainer --  Bill Badgett gets the blame. He was working on the crate in question.
Fri Feb 27 09:52:10 RC reported missing Icicle heartbeat. The Icicle looked like it was running, but I restarted it anyway. - Vadim
-- Fri Feb 27 09:57:06 comment by...Vadim --  heartbeat came back
Fri Feb 27 09:54:09 TOF HV wouldn't go down. There was some problem w/ Smacs (not exactly hung, but not responding properly). Also there was the "not backed up" message on the TOF PC. - natasha
-- Fri Feb 27 09:57:12 comment by...Rainer --  TOF expert arrived. working on it.
-- Fri Feb 27 10:11:24 comment by...Rainer --  expert ok's to have MCR inject beam. called them and told them it's fine to proceed.
Fri Feb 27 09:54:49
FYI:  I am still working on crate IMU_00 and IMU_01 --  
you will see them turn red sometimes in vxmon & the  
PSMON of iFix might complain about them when I cycle  
their power. 

Testing TDC DSP code V45. 
 - W.Badgett
Fri Feb 27 09:58:48 Again TOF Heartbeat -- the TOF PC has a pop-up window saying "The remote server machine does not exist or is unavailable. The client Program not found." I believe this was the same message given when the first TOF heartbeat went off, so Smacs crashed. I restarted it fine, but the problem persists. The expert is here looking at it. - natasha
Fri Feb 27 10:04:02 the search option for the e-logs now uses more convenient pull-down menus -- also, the temp file area was changed to notebooks/temp...send me email if there are problems or suggestions... - R. Vidal
Fri Feb 27 10:07:09 b0svt06 reported with heap corruption in vxworks. CPU output reports:
 
adword addr 80001c, opened as geo32:/slot=13 
status = 5 
readword addr 80001c, opened as geo32:/slot=13 
status = 5 
readword addr 80001c, opened as geo32:/slot=13 
status = 5 
writeword addr 80001c, opened as geo32:/slot=13 
status = 5 
SVTMON -> Filling Histograms    free 119601224 max 101253152 in 2569 blocks: 14920 us 
SVTMON -> Sending Status msg  free 119601224 max 101253152 in 2569 blocks: 23269 us 
SVTMON -> >> completed loop n. 25378  << Reading board regs    free 119631944 max 101253152 in 2560 blocks: 147 us 
SVTMON -> Reading spy buff      free 119631944 max 101253152 in 2560 blocks: 4805 us 
readword addr 400018, opened as geo32:/slot=13 

 - Beate/Rainer
-- Fri Feb 27 10:11:41 comment by...natasha --  Vadim shepherded it and now it appears fine in VxWorks monitor.
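To follow the heap numbers in the SVTMON lines of the b0svt06 dump over time, they can be pulled out with a regular expression. This is only a sketch: the field meanings (`free` bytes, largest free block `max`, block count, step time in microseconds) are guesses from the printout, and `HeapSample` and `parse_svtmon` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

class HeapSample(NamedTuple):
    step: str        # text after "SVTMON -> ", e.g. "Filling Histograms"
    free: int        # assumed: free bytes
    max_block: int   # assumed: largest free block in bytes
    blocks: int      # assumed: number of free blocks
    elapsed_us: int  # time reported for the step, in microseconds

PATTERN = re.compile(
    r"SVTMON -> (?P<step>.*?)\s+free (?P<free>\d+) max (?P<max>\d+) "
    r"in (?P<blocks>\d+) blocks: (?P<us>\d+) us"
)

def parse_svtmon(line: str) -> Optional[HeapSample]:
    """Parse one SVTMON status line; return None for any other line."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return HeapSample(m["step"], int(m["free"]), int(m["max"]),
                      int(m["blocks"]), int(m["us"]))

line = "SVTMON -> Filling Histograms    free 119601224 max 101253152 in 2569 blocks: 14920 us"
print(parse_svtmon(line))
```

Lines such as `status = 5` do not match and come back as `None`, so the whole CPU output can be fed through line by line.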
Fri Feb 27 10:16:03 Run 179448 ACTIVATE: COSMICS_NOTRACKS [6,392,403] - Vadim x2080
Fri Feb 27 10:23:02 Run 179448 TERMINATE: will torture L2 - Vadim x2080
Fri Feb 27 10:37:23 Run 179449 TERMINATE: l3 is red: the last heartbeat was 6 min ago - Vadim x2080
-- Fri Feb 27 10:43:19 comment by...Vadim --  b0eb19 was not responding. RC asked to clean up EVB. Cleaned up L3 farm and EVB
Fri Feb 27 10:45:40 After a voice message "Tevatron Abort" we got a Clock Error (already three of them!): rc.vxmon.ClockStatus@8c3c1d Status: 0x18000 Fatal Error PCC_CSR_6_SYNC_TIMING_ERROR PCC_CSR_7_SYNC_MISSING_ERROR Time: : 1077900027 TimeNanoSeconds: 0 SeqCSR0: 0xfd27 SeqCSR1: 0x4 PccDelay: 0xffb0 PccCSR: 0xffc3 - Vadim
-- Fri Feb 27 10:47:34 comment by...ronmoore --  The Tevatron Low-Level RF processor was rebooted. That is the likely source of the clock errors. Check with MCR.
-- Fri Feb 27 10:55:28 comment by...Vadim --  JJ cleared errors on the clock crate
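The `Time:` field in the clock error looks like Unix epoch seconds (an assumption; the message itself does not say). Under that assumption, and taking control-room time as Central Standard Time (UTC-6, no daylight saving in February), a quick conversion puts the error a few minutes before the 10:45:40 entry:

```python
from datetime import datetime, timedelta, timezone

# Assumption: log times are Central Standard Time, UTC-6
CST = timezone(timedelta(hours=-6), "CST")

def clock_error_time(epoch_seconds: int) -> datetime:
    """Convert the 'Time' field (assumed Unix epoch seconds) to CST."""
    return datetime.fromtimestamp(epoch_seconds, tz=CST)

print(clock_error_time(1077900027).isoformat())  # 2004-02-27T10:40:27-06:00
```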
Fri Feb 27 10:58:25 pcal11 was not responding. shepherded it. - Vadim
Fri Feb 27 11:03:27 L3 became red during partitioning again. But now all the eb's are responding. Calling the experts - Vadim
-- Fri Feb 27 11:16:35 comment by...Guillelmo --  
Did you call me or did you page L3??????????

-- Fri Feb 27 11:23:41 comment by...Vadim --  I thought it was a problem not related to l3proxy, and we called the pager. Cleaning up the EVB again helped.
-- Fri Feb 27 17:15:34 comment by...Nuno --  Clarifications about this pager: (i) what was done was to cleanup L3 a second time, not EVB; (ii)the red color in the partitions state bars on the L3 display indicated the problem was precisely with l3proxy.
Fri Feb 27 11:19:45 Run 179453 ACTIVATE: L2_TORTURE[15,390,406] - Vadim x2080
Fri Feb 27 11:23:34 We have two proton bunches in the machine... - Beate
-- Fri Feb 27 11:37:45 comment by...m mattson --  
I'm surprised that Ron Moore has not set up an automated script, to explain that these are not two proton bunches. :)
-- Fri Feb 27 11:58:06 comment by...ronmoore --  Yes, it is a pet peeve of mine, but I would not go to that extreme. Yet. (Perhaps the SciCos should also read my Machine Session in the Ace Training presentation.) One can also compare the TevDC (total beam intensity) and TevPR (proton intensity in the 36 RF buckets used for coalesced protons in HEP stores)...with uncoalesced protons, TevDC is almost an order of magnitude bigger than TevPR! Also notice that when uncoalesced protons are at 980 GeV, it may appear that there are two proton bunches AND 1 pbar bunch! The uncoalesced protons also span a bucket normally occupied by a coalesced pbar bunch at 980 GeV.
-- Fri Feb 27 12:05:44 comment by...ronmoore --  The TevDC and TevPR quantities I mentioned can be found on the channel 13 a.k.a. "notify" display.
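The TevDC/TevPR comparison can be written down as a simple ratio check. This is a rough sketch only: the threshold of 5 is an arbitrary stand-in for "almost an order of magnitude", the function name is hypothetical, and the example readings are invented, not taken from the notify display.

```python
def looks_uncoalesced(tev_dc: float, tev_pr: float,
                      ratio_threshold: float = 5.0) -> bool:
    """Guess whether the beam is uncoalesced from TevDC and TevPR.

    tev_dc: total beam intensity
    tev_pr: proton intensity in the 36 RF buckets used for coalesced protons
    ratio_threshold: assumed cut on TevDC/TevPR (not an official number)
    """
    if tev_pr <= 0:
        # No intensity in the coalesced buckets: any beam present is outside them
        return tev_dc > 0
    return tev_dc / tev_pr >= ratio_threshold

# Invented readings in the same (arbitrary) units, for illustration only
print(looks_uncoalesced(9.0, 1.0))    # True: TevDC far above TevPR
print(looks_uncoalesced(1.05, 1.0))   # False: the two nearly agree
```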
Fri Feb 27 11:26:43 during the partitioning the crate b0clc00 was not responding. had to shepherd it - Vadim :: (run 179453)
Fri Feb 27 11:31:14
I have finished working on IMU_00 and IMU_01 crates;  they  
have been put back to their original configuration  
with the old TDC DSP code V37.  Have added them  
back into the main partition to verify they are  
OK.   A cosmic run should be taken to make sure  
no noisy channels or other problems were induced in the  
power cycling. 
 - W.Badgett :: (run 179454)
Fri Feb 27 11:33:59 Run 179454 ACTIVATE: L2_TORTURE [15,390,406] - Vadim x2080
Fri Feb 27 11:44:09
Conclusion from testing TDC DSP V45 on b0imu crates:   
The value of the TDC header as read from the TDC's  
Static RAM seems to be OK.   However, the header as read  
from the TDC hit FIFO (last word in the FIFO for V45) does  
not have the correct data at a significant rate  (1 - 10%)  
when operating at high trigger/readout rates. 

Sometimes the module ID is correct, sometimes not.   Invariably  
the other fields of the header are not correct when a mismatch  
is encountered.   

However, when running with a MVME 2401 in place of the  
usual MVME 2301 crate processor, the problem vanishes  
completely.   Both CPUs use the same Tundra/Universe  
VME interface, but they have different versions of the chip. 
My hypothesis is that the older 2301 does not prohibit  
a read-ahead on a block transfer when it reaches a 256 byte  
block boundary, whereas the 2401 does do this.   This feature is  
needed in order to use the dual-access FIFO readout mode  
of DSP V45.   Note that the COT crates all have MVME 2401  
processors, and thus do not encounter this problem. 

stop the read-ahead feature at the final byte.
 - W.Badgett
-- Fri Feb 27 11:48:11 comment by...WB --  
Also note that crate b0imu01 has DTACK filters 
installed on Trace and MVME extender board.   b0imu00 
has neither DTACK filter.   The 2301/2401 effect was 
seen in both crates.

-- Fri Feb 27 12:14:59 comment by...ps --  also b0imu01 has rev f's and 00 'd's if you care
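For reference, the address arithmetic behind the 256 byte boundary in the hypothesis can be sketched as below. It only shows where a block read would have to stop if nothing may be read across a boundary; it does not model what the Universe chip on either CPU actually does, and the function name and example addresses are made up.

```python
BLOCK = 256  # boundary size in bytes named in the hypothesis

def split_at_boundaries(start: int, nbytes: int, block: int = BLOCK):
    """Split [start, start + nbytes) into pieces that never cross a boundary.

    Returns a list of (address, length) pairs.
    """
    pieces = []
    addr = start
    end = start + nbytes
    while addr < end:
        next_boundary = (addr // block + 1) * block
        stop = min(end, next_boundary)
        pieces.append((addr, stop - addr))
        addr = stop
    return pieces

# A 48-byte read starting 16 bytes below a boundary is cut in two
print(split_at_boundaries(0xF0, 0x30))  # [(240, 16), (256, 32)]
```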
Fri Feb 27 11:50:44
clc00 was not responding again. tried to shepherd it - didn't succeed. tried to vxlogin and
reboot didn't help - got stuck in the process. tried to reset without shepherding and then to
shepherd - helped.  


after several minutes got: 
b0dap58.fnal.gov:Thread-154:11:46:41 AM->Task receive/clcreceive SUSPENDED on node b0clc00
 - Vadim :: (run 179454)
-- Fri Feb 27 11:55:18 comment by...rainer --  paged CLC expert.
-- Fri Feb 27 11:56:07 comment by...Vadim --  vxlogin and reboot doesn't do anything - it's stuck. calling the CLC expert
-- Fri Feb 27 11:59:12 comment by...rainer --  sasha responded. he is coming in.
-- Fri Feb 27 12:48:02 comment by...rainer --  Frank Chlebana reset the crate, and now it works.
-- Fri Feb 27 12:57:23 comment by...Ming --  
Yesterday we replaced the cpu in the b0clc00 crate with a new and faster cpu. However we found
out that we need to recompile one of the programs (clcreceive) that runs on that cpu. We don't know
if the suspension of the clcreceive program is related to the change
of the cpu, and the recompiling of the code (as this program had
experienced suspension with the previous cpu). We decided to leave the situation as it is and see if
this program suspends again.

-- Fri Feb 27 13:04:40 comment by...rainer --  Frank and Ray Culbertson decided to switch the CPUs in the clc crate.
Fri Feb 27 12:03:06 from MCR elog:
 
11:38:18-  TeV is approaching a lowbeta squeeze, and has uncoalesced beam at 980GeV 
 - Rainer
-- Fri Feb 27 13:11:06 comment by...rainer --  12:32:48 comment by...bS -- ...some orbit smoothing was done and we are heading to the proton injection porch, from a wet squeeze, for another round.
Fri Feb 27 12:35:32 Mario working on DAQMon. - Rainer
Fri Feb 27 12:54:33 Run 179454 TERMINATE: experts will swap clc00's CPU - Vadim x2080
Fri Feb 27 13:00:57 Run 179455 ACTIVATE: L2_TORTURE [15,390,406] without clc00 crate - Vadim x2080
Fri Feb 27 13:00:58 Run 179455 TERMINATE: will add clc00  - Vadim x2080
Fri Feb 27 13:05:49 Run 179456 ACTIVATE: L2_TORTURE [15,390,409]  - Vadim x2080
Fri Feb 27 13:06:09 Run 179456 TERMINATE: hmm.. it looks like I didn't actually add clc00 - Vadim x2080
Fri Feb 27 13:12:42 Run 179457 ACTIVATE: L2_TORTURE [15,390,409] with clc00 processor swapped - Vadim x2080
Fri Feb 27 13:13:28 The new 2400 in the b0clc00 crate was swapped with a second 2400. We will watch for problems, the old 2300 can be swapped back quickly if needed.  - Ray Culbertson, Frank Chlebana
Fri Feb 27 13:35:15 Run 179457 TERMINATE: will run without cal and cal trigger crates - Vadim x2080
Fri Feb 27 13:35:16 Bill Badgett borrows some crates for tests. - Rainer
Fri Feb 27 13:39:45 Run 179458 ACTIVATE: L2_TORTURE [15,390,409] without cal and trigger cal crates - Vadim x2080
-- Fri Feb 27 15:49:44 comment by...Rainer --   SVX trips and cooling alarm

several ladders in SVX trip off at 13:52, SVX ladders go into high current DVDD state. A cooling alarm ensues which kills power to SB4. silicon SPL kills the power to rest of SVX. experts investigating. See silicon elog.


Fri Feb 27 15:00:08 we are officially in shot setup.  - rainer
-- Fri Feb 27 15:56:13 comment by...rainer --  MCR elog: 15:34:56- Loading Final Protons.
Fri Feb 27 15:04:31 Run 179458 TERMINATE: Scanner Manager SM_SCPU_TIMEOUT Error: HRR didn't help. Will cleanup EVB. - Vadim x2080
Fri Feb 27 15:27:55
Quad angles for last 2 months
 - JJ
-- Fri Feb 27 15:28:54 comment by...JJ --  
Although this plot is of some interest, it was mainly put
here to see if I would reproduce problems that others were
having putting graphics in eLog.

Fri Feb 27 15:54:56 Shift Summary:
- TOF HV stayed up after MCR informed us they want to put
  beam in the machine - needed expert intervention to get it to OFF.  
  apparently some ifix <-> tof PC communication problem. 
- SRC guys working on silicon. new SVX SRC is in. major cooling trip likely related to SRC work.
  still ongoing investigation 
- CLC cpu needed expert intervention (crate reset and CPU swap) 
- we are in shot setup. protons going in. 
 - rainer
Fri Feb 27 15:59:55 Loading Final Protons - Stephan Lammel/Robert Harris
Fri Feb 27 16:01:44 Shift Summary:
See above
 - St. Lammel/R. Harris
Fri Feb 27 16:01:45
SVRAD totals after final protons have been loaded  
Date Time BLM Dose  
2004.02.27 15:54:42 W Inner BLM 0.00 RADS  
2004.02.27 15:54:42 W Outer BLM 0.00 RADS  
2004.02.27 15:54:42 E Inner BLM 0.00 RADS  
2004.02.27 15:54:42 E Outer BLM 0.34 RADS  
Integrated dosage 
 - Anadi & Natasha