2004 CDF E-Log -- Eve shift. Fri Mar 5, 2004
SciCo:               Rei Tanaka / Stephan Lammel
DAQ Ace:             Andrew Ivanov
Monitoring Ace:      Simon Sabik
CO:                  Guenakh Mitselmakher / Diego Cauz
Operations Manager:  Mary Convery


Start of Shift Notes:  

Still in access for 5-8 hours (D0).

Fri Mar 5 16:35:58 D0 needs a couple more hours for their access.  - Rei Tanaka
Fri Mar 5 16:38:28 Tested TOF loading of HV settings from file. Looks OK! - bauerg
Fri Mar 5 17:01:19 Run 179672 Activated at 2004.03.05 17:00:43 - RunControl
Fri Mar 5 17:01:20 Run 179672 ACTIVATE: COSMICS[12,391,403] - Andrew X2080
Fri Mar 5 17:06:51 About the Level3 page:
It looks like the experimental trigger table tests from the previous shift made the Level3 control program l3_node crash.
When l3_node crashes it dumps core into a file named core.xxx. After they tried a few times, almost all nodes of the Level3 farm had a 100% full /cdf partition.
That's why Level3 did not work after they returned to the default trigger table.
There is a Linux config file, /etc/sysctl.conf, that controls how programs dump core.
I checked it and it was fine: kernel.core_uses_pid = 0, which tells the kernel to name the dump plain "core" (no PID suffix).
What remains unclear is why l3_node still dumps core.xxx, while other Level3 programs dump just a "core" file.
One possible reason is that the variable is overwritten somewhere in the Level3 code, but this cannot be, because for the change to take effect the machine has to be rebooted.
The other possible reason is that the core file is named differently depending on whether the program is single-threaded or multithreaded.
To fix Level3 I cleaned up the core files on the whole farm. Level3 now seems to be fine.  - Arkadiy
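A minimal Python sketch of the core-file naming question above, plus a helper to list core files before cleaning them up. It assumes the farm nodes ran a Linux kernel older than 2.6.28 with the default core name pattern (no %p); per the core(5) man page, such kernels always append the PID when the crashing process shares its memory with another task (as multithreaded programs did under LinuxThreads), regardless of core_uses_pid. That would be consistent with the second explanation above. The function names and the core/core.<pid> filter are hypothetical illustrations, not Level3 tools; the value the running kernel actually uses can be read from /proc/sys/kernel/core_uses_pid.

```python
import os
import re

# Matches "core" and "core.<digits>" (e.g. "core.12345"); hypothetical filter.
CORE_NAME = re.compile(r"core(\.\d+)?")


def expected_core_name(pid: int, core_uses_pid: int, shares_mm: bool) -> str:
    """Core file name on an older (pre-2.6.28) kernel with the default pattern.

    The PID suffix is added if kernel.core_uses_pid is nonzero OR the process
    shares its memory with another task (e.g. a LinuxThreads multithreaded
    program), which could explain core.xxx files from l3_node.
    """
    if core_uses_pid or shares_mm:
        return f"core.{pid}"
    return "core"


def read_core_uses_pid(path: str = "/proc/sys/kernel/core_uses_pid") -> int:
    """Return the live kernel setting (may differ from /etc/sysctl.conf)."""
    with open(path) as f:
        return int(f.read().strip())


def find_core_files(directory: str) -> list[tuple[str, int]]:
    """List (path, size in bytes) of core files directly inside `directory`."""
    hits = []
    for entry in os.scandir(directory):  # non-recursive scan
        if entry.is_file(follow_symlinks=False) and CORE_NAME.fullmatch(entry.name):
            hits.append((entry.path, entry.stat().st_size))
    return sorted(hits)
```

For example, running find_core_files("/cdf") on each node would show which dumps are filling the partition before anything is deleted.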
Fri Mar 5 17:44:17 from the Run Coordinator elog:
16:35:33- D0 estimates that their access will end around 1900. They can have more time if they think they have identified the problem and can fix it. After the access, we will recover and go into shot setup. Pbar studies can continue until shot setup if the studier has the stamina. - JPM
 - convery
Fri Mar 5 18:16:46 TOF channel trip. It was not possible to reset it from IFIX; it said "system is busy". I reset it from the TOF PC. It is back up. - Simon
Fri Mar 5 18:39:11 Run 179672 Terminated at 2004.03.05 18:38:44 - RunControl
Fri Mar 5 18:39:19 Run 179672 TERMINATE: end the run for a clean restart - Andrew X2080
Fri Mar 5 18:40:39 Cryo shifter Jim H. came up. As noted in the Cryo&Gas e-log, rack 2RR23C has a single fan failure: the bottom fan is off. Jim says this is not urgent; it should be reported at the 8 a.m. meeting on Monday and repaired then. - Rei Tanaka
Fri Mar 5 18:43:44 Run 179673 ACTIVATE: another trial - Andrew X2080
Fri Mar 5 18:46:15 TOF channel tripped again. Channel 308 (same as last time). Ramped it back on from IFIX. - Simon
Fri Mar 5 19:10:31 Run 179674 ACTIVATE: ISL frontend setting checkout  - 6880 rsw
Fri Mar 5 19:11:03 Called MCR to ask the status. D0 has finished their access. They will first check that everything is OK at D0; then shot setup will follow. - Rei Tanaka
Fri Mar 5 19:11:31 Run 179674 TERMINATE: end ISL FE setting test run - 6880 rsw
Fri Mar 5 19:13:11 Run 179675 ACTIVATE: try one more time  - 6880 rsw
Fri Mar 5 19:15:25 Run 179675 TERMINATE: end test run for ISL - 6880 rsw
Fri Mar 5 19:20:56 Run 179676 ACTIVATE: test run ISL with svxmon enabled in partition 0 - 6880 rsw
Fri Mar 5 19:28:41 Run 179673 Terminated at 2004.03.05 19:28:36 - RunControl
Fri Mar 5 19:29:42 Run 179673 TERMINATE: stop for a clean start - Andrew X2080
Fri Mar 5 19:31:37 Run 179677 Activated at 2004.03.05 19:30:59 - RunControl
Fri Mar 5 19:31:52 Run 179677 ACTIVATE: start new cosmic run for pulsar fiber splitting (MUON) - Andrew X2080
Fri Mar 5 19:33:10 Run 179677 Terminated at 2004.03.05 19:33:01 - RunControl
Fri Mar 5 19:33:45 Run 179677 TERMINATE: end cosmics - Andrew X2080
Fri Mar 5 19:37:31 Run 179676 TERMINATE: end test run - 6880 rsw
Fri Mar 5 19:41:36 Run 179678 Activated at 2004.03.05 19:41:15 - RunControl
Fri Mar 5 19:42:21 Run 179678 ACTIVATE: new run for the muon fiber splitting - Andrew X2080
-- Fri Mar 5 20:05:53 comment by...Andrew --  
It is a cosmics run.

Fri Mar 5 19:46:21 ICICLE heartbeat on IFIX. Restarted it. - Simon
Fri Mar 5 20:16:02
D0 said they disengaged part of their toroid; now the magnet is at full power and they no longer see the noise in the calorimeter. So they seem to have fixed the problem!
 - nigel and kaori
Fri Mar 5 20:43:46 MCR called us. Shot setup will start in 5-10 minutes. - Rei Tanaka
-- Fri Mar 5 20:54:05 comment by...Rei Tanaka --  MCR called us again. Shot setup started.
Fri Mar 5 20:55:14 Run 179678 Terminated at 2004.03.05 20:54:27 - RunControl
Fri Mar 5 20:55:15 Run 179678 TERMINATE: end run - Andrew X2080
Fri Mar 5 21:00:50 Run 179679 Activated at 2004.03.05 20:56:53 - RunControl
Fri Mar 5 21:00:51 Run 179679 ACTIVATE: new run for MUON fiber splitting, all fibers split, accumulate statistics - Andrew X2080
Fri Mar 5 21:05:57 CMX trip. SW channels 9 to 12. Ramped back up. - Simon
-- Fri Mar 5 21:10:23 comment by...Simon --  ALL HV on standby for shot setup.
Fri Mar 5 21:21:43 Run 179679 Terminated at 2004.03.05 21:21:33 - RunControl
Fri Mar 5 21:22:23 Run 179679 TERMINATE: switching to SHOTSETUP - Andrew X2080
Fri Mar 5 21:33:18 Run 179680 ACTIVATE: SHOTSETUP run + accumulating statistics for fiber splitting test - Andrew X2080
Fri Mar 5 21:49:32
Integrated dosage:
Date        Time      BLM          Dose
2004.03.05  21:49:07  W Inner BLM  0.00 RADS
2004.03.05  21:49:07  W Outer BLM  0.00 RADS
2004.03.05  21:49:07  E Inner BLM  0.00 RADS
2004.03.05  21:49:07  E Outer BLM  0.00 RADS
 - Simon
Fri Mar 5 21:53:33
For Guenakh:
if stopping the consumers with ^C fails, you have to
log into the machine the consumer is running on.
If the login also fails, try:
> kdestroy 
> kticket 
> ssh -l cdfdaq  
 - diego
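The recovery sequence above can be scripted. A minimal Python sketch, assuming only that the three commands exist on the console PATH; the host name stays a parameter because the note does not give one, and recovery_steps/run_steps are hypothetical helpers. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def recovery_steps(host: str) -> list[list[str]]:
    """Commands from the note: new Kerberos ticket, then log in as cdfdaq."""
    return [
        ["kdestroy"],                   # discard the old Kerberos credentials
        ["kticket"],                    # obtain a fresh ticket (site wrapper)
        ["ssh", "-l", "cdfdaq", host],  # log in to the consumer's machine
    ]


def run_steps(steps: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False.

    With dry_run=False, stops at the first failing command
    (check=True raises CalledProcessError). Returns the printed lines.
    """
    printed = []
    for argv in steps:
        line = shlex.join(argv)
        print("$", line)
        printed.append(line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return printed
```

Running the steps in order matters: ssh is only retried after a fresh ticket has been obtained.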
Fri Mar 5 21:54:43
Tonight, for the first time, we got the fiber splitting for the muon path working (we are still waiting for more statistics to test the robustness of the splitting). We tried in a systematic way to understand the problems encountered in the past. To make a long story short, the problems were mostly due to one bad fiber (from the splitter to the L2 muon Pulsar input in the L2 decision crate). The details can be found in the Pulsar e-log. We plan to leave this splitting setup in for the upcoming store to test its robustness. PLEASE watch the L2 Pulsar plots in TrigMon; if there is any problem, please page the Pulsar experts: 218-9486 (Burkard), 630-544-7530 (Burkard cell), 630-357-1530 (Burkard home), 630-988-9986 (Ted cell), and 630-357-9986 (Ted home). Please also tell the crews of the upcoming shifts about this new fiber splitter setup for the L2 muon path.
 - Burkard and Ted
-- Sat Mar 6 01:47:58 comment by...Burkard and Ted --  The splitting didn't work with beam (see the later entry in this shift's e-log): three channels failed. The splitters have been removed.
Fri Mar 5 22:26:11 Run 179680 TERMINATE: switch to test alpha table - Andrew X2080
Fri Mar 5 22:39:14 Run 179681 ACTIVATE: TEST_ALPHA_CLUSTERING_NOSPIKES[4,436,403] - Andrew X2080
Fri Mar 5 22:43:03 Run 179681 TERMINATE: end test - Andrew X2080
Fri Mar 5 23:04:21 Run 179682 Activated at 2004.03.05 23:04:10 - RunControl
Fri Mar 5 23:05:35 Run 179682 ACTIVATE: PHYSICS_2_02[2,424,431] - Andrew X2080
Fri Mar 5 23:09:29
At L = 67E30 we have ~30% deadtime with the default PHYSICS_2_02 table:
L1 14 kHz 
L2 390 Hz 
L3  70 Hz
 - convery :: (run 179682)
-- Fri Mar 5 23:22:57 comment by...Taka --  
L3 seems to have room. The problem might be the L2 rate.  

-- Fri Mar 5 23:36:07 comment by...Taka --  By "problem" I mean the 30% deadtime.
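A quick arithmetic sketch of the rates quoted above: the rejection factor between consecutive trigger levels is the ratio of their accept rates. This shows how much each level cuts, but it does not by itself give the deadtime, which depends on per-level processing and buffering that the entry does not quote. The helper name is illustrative.

```python
def rejection_factors(rates_hz: dict[str, float]) -> dict[str, float]:
    """Ratio of accept rates between consecutive trigger levels (dict order)."""
    levels = list(rates_hz)
    return {
        f"{lower}->{upper}": rates_hz[lower] / rates_hz[upper]
        for lower, upper in zip(levels, levels[1:])
    }


# Accept rates at L = 67E30 from the entry above (run 179682).
rates = {"L1": 14_000.0, "L2": 390.0, "L3": 70.0}
for step, factor in rejection_factors(rates).items():
    print(f"{step}: x{factor:.1f}")  # L1->L2: x35.9, then L2->L3: x5.6
```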
Fri Mar 5 23:11:21
 
-------------------------------------------------------- 
Mar.05, 2004	Store #3275   Time	p	pbar 
-------------------------------------------------------- 
Previous store dumped.        11:12     

Shot setup       started      20:54             169E10 
proton injection started      21:50    (call Ops Manager) 
pbars loading    started      22:13    (page SVX Primary) 
pbars loading    finished     22:38       E10      E10 
Scraping end                   
	 Lumi = E30                 E30 
MCR called us at              22:56    (scraping finished)  
CLC HV up                     23:00    (call MCR) 
         C:B0ILUM             68.30 E30 
         C:LOSTP              11 kHz  
         C:LOSTPB             0.6 kHz   
         TevPR                8438 10**9   
         TevPB                1201 10**9 
H.V. (except Si) ON           23:02 
H.V. Silicon     ON           23:05   
Physics RUN# start            179682 PHYSICS_2_02[2,424,431]  
	 Lumi = E30           67.03 E30 
-------------------------------------------------------- 
 - Rei Tanaka
-- Sat Mar 6 00:11:49 comment by...convery --  FYI from the run coordinator elog:
19:31:06- Shot Strategy: Protons 270-280E9 per bunch at 150 GeV in the Tevatron, pbars as per guidelines. No extra or "aggressive" cooling on this shot, we want to make sure smaller pbars aren't skewing the luminosity lifetime. After the shot, stack to 40E10 for pbar shots to the Recycler. - JPM
Fri Mar 5 23:11:57 The new SVT firmware looks OK; at least nothing harmful so far. - Taka Maruyama
Fri Mar 5 23:22:30 The CO found a "too many errors" message in the TrigMon L2 Pulsar plots. We also got one trigger timeout error. Paged the expert, who is coming in.  - Rei Tanaka
Fri Mar 5 23:25:12 Problem with ACNET: I can make the plots, but I can't save the files. Mary is calling Steve. - Simon
Fri Mar 5 23:26:23 ISL05 and L00 tripped. Trigger inhibit is set. - Ace
-- Fri Mar 5 23:28:55 comment by...Shift crew --  Precisely: TRIP:ISL05, IFIX:ISL HV and IFIX:L00 were set.
-- Fri Mar 5 23:44:48 comment by...Andrew --  
CAEN crate #12 power supplies disappeared from the Silicon PS GUI
window. I went downstairs and reset the crate. After that we powered
the silicon back on and had a few DVDD trips during the transition.
Recovered from the trips and everything is back to normal.

Fri Mar 5 23:26:27
L2 Pulsar Errors, 10 min into the run...
 - G. Mitselmakher
-- Fri Mar 5 23:38:55 comment by...Ace --  The Pulsar experts are here and are working on this problem.
-- Sat Mar 6 00:02:13 comment by...Burkard and Ted --  
This error was due to the fact that we split the muon fibers
for a test run with beam (see the entry earlier in this shift). The
splitting worked in cosmics and shot setup runs, but it
doesn't work with beam. We have now removed all fiber splitting for the
L2 muon path, put the system back to the default settings,
and started a new run. Now there are no more errors. We will
stay here for a while to make sure. The splitting was done
mostly with old-type SVX/SVT fiber splitters (we only have a
few of the newer, better-performing splitters on hand at this point).
Since the L2 muon trigger is still in tagging mode, this run
should not be marked as bad simply due to this problem.

Fri Mar 5 23:39:38 Got a Solenoid Trip alarm at 23:30; it went away 10 s later. Called Cryo; he says it was caused by exceeding the current limit. - Rei Tanaka
Fri Mar 5 23:43:06 Both CPUs of L3 node #12 are pink. - Ace
Fri Mar 5 23:48:30 Run 179682 Terminated at 2004.03.05 23:48:15 - RunControl
Fri Mar 5 23:49:36 Run 179682 TERMINATE: end the run and start a new one without fiber splitting - Andrew X2080
Fri Mar 5 23:51:42 Run 179683 Activated at 2004.03.05 23:51:25 - RunControl
Fri Mar 5 23:51:52 Run 179683 ACTIVATE: PHYSICS_2_02[2,424,431] - Andrew X2080
Fri Mar 5 23:55:48
Run Number       Data Type  Physics Table             Begin Time  End Time  Live Time  L1 Accepts  L2 Accepts  L3 Accepts  Live Lumi (nb-1)  GR  SC  RC
179682 (x2BDE2)  BEAM       PHYSICS_2_02 [2,424,431]  23:04:10    23:48:15  00:17:28   19,796,653  533,233     92,122      68.289            1   1
Totals                                                            23:55:02  00:17:28   19,796,653  533,233     92,122      68.289
 - End of Shift Report
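For reference, the live fraction of run 179682 follows from the summary's begin, end and live times. A small Python sketch with illustrative helper names; it adds a day when the end time is earlier than the begin time, so runs crossing midnight are also handled.

```python
from datetime import timedelta


def hms(text: str) -> timedelta:
    """Parse 'HH:MM:SS' into a timedelta."""
    h, m, s = (int(part) for part in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


def live_fraction(begin: str, end: str, live: str) -> float:
    """Live time divided by wall-clock run length; handles a midnight crossing."""
    elapsed = hms(end) - hms(begin)
    if elapsed < timedelta(0):
        elapsed += timedelta(days=1)  # run ended after midnight
    return hms(live) / elapsed        # timedelta / timedelta -> float


# Run 179682: 23:04:10 to 23:48:15 (2645 s), live time 00:17:28 (1048 s).
print(f"{live_fraction('23:04:10', '23:48:15', '00:17:28'):.1%}")  # 39.6%
```

This is a whole-run average, so it folds in every dead period during the run, not just trigger deadtime at a given luminosity.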
Fri Mar 5 23:57:22 Shift Summary:
- D0 made an 8-hour access to investigate noise in their calorimeter and muon
  systems coming from sparking in their toroid. They disengaged the toroid jumper,
  and the problem has been fixed.
- Smooth shot setup followed. 

Work done:  
- TOF HV worked on by Gerry Bauer.  
- L2 people, Burkard and Ted, worked on the muon fiber splitting. 
  Watch the TrigMon for the L2 Pulsar plots. 

Known problems: 
- Water leak in the roof, which cannot be fixed until Monday. 
- Rack 2RR23C has a single fan failure: the bottom fan is off. 
  To be repaired on Monday. 
- One Solenoid Trip alarm due to current limit excess. 
- ACNET cannot save files. 
- L2 Pulsar errors in TrigMon. Experts are working.  

Plan: 
- Take good quality data!
 - Rei Tanaka
Sat Mar 6 00:00:54 Luminosity summary
Begin shift or beam at 22:57:46
End shift or beam at 23:58:04
Delivered luminosity: 230.79 nb-1
Acquired luminosity: 83.54 nb-1

Totals
Date:2004.03.05
Shift:eve
Delivered luminosity: 230.8 nb-1
Acquired luminosity: 83.5 nb-1
Efficiency: 36.2%


This script has been called 3500 times since Aug 16th 2003  - Rei Tanaka
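The efficiency in the luminosity summary is the acquired luminosity as a percentage of the delivered luminosity. A one-line check in Python (the function name is illustrative):

```python
def efficiency_percent(delivered_nb: float, acquired_nb: float) -> float:
    """Acquired / delivered luminosity, in percent (both in nb-1)."""
    if delivered_nb <= 0:
        raise ValueError("delivered luminosity must be positive")
    return 100.0 * acquired_nb / delivered_nb


print(f"{efficiency_percent(230.79, 83.54):.1f}%")  # eve shift, Mar 5 2004: 36.2%
```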


Sat Mar 6 00:01:09
Got PCAL_02 timeout. Shepherded the crate and continued running.
 - Andrew :: (run 179683)
Sat Mar 6 00:05:32
After we removed the L2 muon fiber splitters, no more errors in L2 Pulsar TrigMon plots.
 - Burkard