2004 CDF E-Log -- Day shift. Fri Feb 27, 2004
SciCo: Beate H./Rainer W.
DAQ Ace: Vadim Khotilovic
Monitoring Ace: Natasa Miladinovic
CO: Franco Semeria
Operations Manager: JJ Schmidt
Start of Shift Notes:  Tevatron dry squeeze under way. Waiting for shot setup.
Da Plan:
- taking cosmics until shot setup
- silicon SRC diagnostics.
Fri Feb 27 08:32:25
Unmarked S_VBIAS_B0W4L0 in IMon. Let's see if it stays green. - natasha
Fri Feb 27 08:39:01
Call Guillelmo if shot setup starts. L3 currently in a funky state. - Beate/Rainer
Fri Feb 27 08:41:07
I have integrated a few nodes in the farm (all processor nodes are on-line now) and I am testing the l3proxy on pcom1.
If you have problems with l3proxy (please follow this order):
1) Try to restart it using the Ace Control Panel
2) If not possible, call me
3) If no answer (rare), page L3 - Guillelmo
-- Fri Feb 27 08:41:38 comment by...Guillelmo -- Call me when shot setup starts! (do NOT page L3)
Fri Feb 27 08:42:23
Bill is working on IMU DSP code. See earlier entry. - Beate/Rainer
Fri Feb 27 08:49:05
Run 179439
Activated at 2004.02.27 08:48:58 - RunControl
Fri Feb 27 08:49:43
Run 179439
ACTIVATE: COSMICS [12,391,403] - Vadim x2080
Fri Feb 27 09:04:28
PSM alarm: 1RR18C (IMU W muon TDCs) turned red. Channel 2 was out of range (-1.20V or so, when it should have been -5.20V), but after 2 minutes everything seems fine. - natasha
Fri Feb 27 09:05:57
VNODE1 gives the following message: "You have not been backed up since Thu Feb19 2004. Contact your backup administrator for more info.". Will send an email to Mark Knapp. - natasha
Fri Feb 27 09:10:18
Run 179439
Terminated at 2004.02.27 09:09:57 - RunControl
Fri Feb 27 09:12:26
Run 179439
TERMINATE: End run to put in crate imu00 - Vadim x2080
Fri Feb 27 09:15:39
Run 179444
Activated at 2004.02.27 09:14:42 - RunControl
Fri Feb 27 09:15:40
Run 179444
ACTIVATE: COSMICS [12,391,403] - Vadim x2080
Fri Feb 27 09:32:24
Run 179444
Terminated at 2004.02.27 09:32:02 - RunControl
Fri Feb 27 09:32:52
Run 179444
TERMINATE: will put the system to standby - Vadim x2080
Fri Feb 27 09:33:21
MCR calls - they will inject beam. end the run and put detectors to standby.
- Beate/Rainer
Fri Feb 27 09:47:56
Same PSM alarm as 40 minutes ago... (9:04 entry) - natasha
-- Fri Feb 27 10:02:11 comment by...Rainer -- Bill Badgett gets the blame. He was working on the crate in question.
Fri Feb 27 09:52:10
RC reported a missing Icicle heartbeat. The Icicle looked like it was running, but I restarted it anyway. - Vadim
-- Fri Feb 27 09:57:06 comment by...Vadim -- heartbeat came back
Fri Feb 27 09:54:09
TOF HV wouldn't go down. There was some problem w/ Smacs (not exactly hung, but not responding properly). Also there was the "not backed up" message on the TOF PC. - natasha
-- Fri Feb 27 09:57:12 comment by...Rainer -- TOF expert arrived. working on it.
-- Fri Feb 27 10:11:24 comment by...Rainer -- expert ok's to have MCR inject beam. called them and told them it's fine to proceed.
Fri Feb 27 09:54:49
FYI: I am still working on crate IMU_00 and IMU_01 --
you will see them turn red sometimes in vxmon & the
PSMON of iFix might complain about them when I cycle
their power.
Testing TDC DSP code V45.
- W.Badgett
Fri Feb 27 09:58:48
Again TOF Heartbeat -- the TOF PC has a pop-up window saying "The remote server machine does not exist or is unavailable. The client Program not found." I believe this was the same message given when the first TOF heartbeat went off, so Smacs crashed. I restarted it fine, but the problem persists. The expert is here looking at it. - natasha
Fri Feb 27 10:04:02
the search option for the e-logs now uses more convenient pull-down menus -- also, the temp file area was changed to notebooks/temp...send me email if there are problems or suggestions... - R. Vidal
Fri Feb 27 10:07:09
b0svt06 reported with heap corruption in vxworks. CPU output
reports:
adword addr 80001c, opened as geo32:/slot=13
status = 5
readword addr 80001c, opened as geo32:/slot=13
status = 5
readword addr 80001c, opened as geo32:/slot=13
status = 5
writeword addr 80001c, opened as geo32:/slot=13
status = 5
SVTMON -> Filling Histograms free 119601224 max 101253152 in 2569 blocks: 14920 us
SVTMON -> Sending Status msg free 119601224 max 101253152 in 2569 blocks: 23269 us
SVTMON -> >> completed loop n. 25378 << Reading board regs free 119631944 max 101253152 in 2560 blocks: 147 us
SVTMON -> Reading spy buff free 119631944 max 101253152 in 2560 blocks: 4805 us
readword addr 400018, opened as geo32:/slot=13
- Beate/Rainer
-- Fri Feb 27 10:11:41 comment by...natasha -- Vadim shepherded it and now it appears fine in VxWorks monitor.
Fri Feb 27 10:16:03
Run 179448
ACTIVATE: COSMICS_NOTRACKS [6,392,403] - Vadim x2080
Fri Feb 27 10:23:02
Run 179448
TERMINATE: will torture L2 - Vadim x2080
Fri Feb 27 10:37:23
Run 179449
TERMINATE: l3 is red: the last heartbeat was 6 min ago - Vadim x2080
-- Fri Feb 27 10:43:19 comment by...Vadim -- b0eb19 was not responding. RC asked to clean up EVB. Cleaned up L3 farm and EVB
Fri Feb 27 10:45:40
After a voice message "Tevatron Abort" we got a Clock Error (already three of them!):
rc.vxmon.ClockStatus@8c3c1d
Status: 0x18000
Fatal Error
PCC_CSR_6_SYNC_TIMING_ERROR
PCC_CSR_7_SYNC_MISSING_ERROR
Time: : 1077900027
TimeNanoSeconds: 0
SeqCSR0: 0xfd27
SeqCSR1: 0x4
PccDelay: 0xffb0
PccCSR: 0xffc3 - Vadim
-- Fri Feb 27 10:47:34 comment by...ronmoore -- The Tevatron Low-Level RF processor was rebooted. That is the
likely source of the clock errors. Check with MCR.
-- Fri Feb 27 10:55:28 comment by...Vadim -- JJ cleared errors on the clock crate
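The `Time` field in the clock status dump above looks like a Unix epoch timestamp. A minimal Python sketch, assuming that interpretation and a fixed UTC-6 offset for Central Standard Time (the helper name `clock_error_time` is hypothetical, not part of any CDF tool), converts it to local wall-clock time for comparison with the e-log entries:

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time at the lab on this date is CST, a fixed UTC-6 offset.
CST = timezone(timedelta(hours=-6), "CST")

def clock_error_time(epoch_seconds: int) -> datetime:
    """Interpret the Time field as seconds since the Unix epoch (assumed)."""
    return datetime.fromtimestamp(epoch_seconds, tz=CST)

# Value copied from the "Time:" line of the ClockStatus dump.
print(clock_error_time(1077900027).isoformat())
```

Under these assumptions the value corresponds to 10:40:27 local time on Feb 27, 2004, about five minutes before the entry was written.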
Fri Feb 27 10:58:25
pcal11 was not responding. shepherded it. - Vadim
Fri Feb 27 11:03:27
L3 became red during partitioning again. But now all the eb's are responding. Calling the experts - Vadim
-- Fri Feb 27 11:16:35 comment by...Guillelmo -- Did you call me or did you page L3??????????
-- Fri Feb 27 11:23:41 comment by...Vadim -- I thought the problem was not l3proxy related, so we called the pager. Cleaning up the EVB again helped.
-- Fri Feb 27 17:15:34 comment by...Nuno -- Clarifications about this pager:
(i) what was done was to cleanup L3 a second time, not EVB;
(ii) the red color in the partitions state bars on the L3 display indicated the problem was precisely with l3proxy.
Fri Feb 27 11:19:45
Run 179453
ACTIVATE: L2_TORTURE[15,390,406] - Vadim x2080
Fri Feb 27 11:23:34
We have two proton bunches in the machine... - Beate
-- Fri Feb 27 11:37:45 comment by...m mattson --
I'm surprised that Ron Moore has not set up an automated script, to
explain that these are not two proton bunches. :)
-- Fri Feb 27 11:58:06 comment by...ronmoore -- Yes, it is a pet peeve of mine, but I would not go to that extreme.
Yet. (Perhaps the SciCos should also read my Machine Session in the Ace Training presentation.)
One can also compare TevDC (total beam intensity) and TevPR (proton
intensity in the 36 RF buckets used for coalesced protons in HEP
stores): with uncoalesced protons, TevDC is almost an order of
magnitude bigger than TevPR!
Also notice that when uncoalesced protons are at 980 GeV, it may
appear that there are two proton bunches AND 1 pbar bunch! The
uncoalesced protons also span a bucket normally occupied by a
coalesced pbar bunch at 980 GeV.
-- Fri Feb 27 12:05:44 comment by...ronmoore -- The TevDC and TevPR quantities I mentioned can be found on the channel 13 a.k.a. "notify" display.
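The TevDC/TevPR comparison described above can be written as a simple ratio check. This is only a sketch: the function name, the threshold of 5, and the sample readings are illustrative assumptions, not values taken from the notify display or from any accelerator tool.

```python
def looks_uncoalesced(tev_dc: float, tev_pr: float,
                      ratio_threshold: float = 5.0) -> bool:
    """Guess whether the beam is uncoalesced from two intensity readings.

    tev_dc: total beam intensity (TevDC), arbitrary units.
    tev_pr: intensity in the 36 coalesced-proton RF buckets (TevPR),
            same units as tev_dc.
    ratio_threshold: hypothetical cut; the entry above only says TevDC is
            "almost an order of magnitude" larger for uncoalesced beam.
    """
    if tev_dc <= 0:
        return False  # no beam at all
    if tev_pr <= 0:
        return True   # beam present, but none in the coalesced buckets
    return tev_dc / tev_pr >= ratio_threshold

# Illustrative readings only (not real data).
print(looks_uncoalesced(900.0, 100.0))  # large ratio
print(looks_uncoalesced(105.0, 100.0))  # readings roughly agree
```

A ratio near 1 suggests most of the beam sits in the coalesced buckets, while a large ratio suggests the apparent "bunches" are uncoalesced protons.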
Fri Feb 27 11:26:43
during the partitioning the crate b0clc00 was not responding. had to shepherd it - Vadim :: (run 179453)
Fri Feb 27 11:31:14
I have finished working on IMU_00 and IMU_01 crates; they
have been put back to their original configuration
with the old TDC DSP code V37. Have added them
back into the main partition to verify they are
OK. A cosmic run should be taken to make sure
no noisy channels or other problems were induced in the
power cycling.
- W.Badgett :: (run 179454)
Fri Feb 27 11:33:59
Run 179454
ACTIVATE: L2_TORTURE [15,390,406] - Vadim x2080
Fri Feb 27 11:44:09
Conclusion from testing TDC DSP V45 on b0imu crates:
The value of the TDC header as read from the TDC's
Static RAM seems to be OK. However, the header as read
from the TDC hit FIFO (last word in the FIFO for V45) does
not have the correct data at a significant rate (1 - 10%)
when operating at high trigger/readout rates.
Sometimes the module ID is correct, sometimes not. Invariably
the other fields of the header are not correct when a mismatch
is encountered.
However, when running with a MVME 2401 in place of the
usual MVME 2301 crate processor, the problem vanishes
completely. Both CPUs use the same Tundra/Universe
VME interface, but they have different versions of the chip.
My hypothesis is that the older 2301 does not prohibit
a read-ahead on a block transfer when it reaches a 256 byte
block boundary, whereas the 2401 does do this. This feature is
needed in order to use the dual-access FIFO readout mode
of DSP V45. Note that the COT crates all have MVME 2401
processors, and thus do not encounter this problem.
stop the read-ahead feature at the final byte.
- W.Badgett
-- Fri Feb 27 11:48:11 comment by...WB -- Also note that crate b0imu01 has DTACK filters
installed on Trace and MVME extender board. b0imu00
has neither DTACK filter. The 2301/2401 effect was
seen in both crates.
-- Fri Feb 27 12:14:59 comment by...ps -- also b0imu01 has rev f's and 00 'd's if you care
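To make the 256 byte block boundary in the hypothesis above concrete, here is a small sketch of the address arithmetic only. It does not model the Tundra/Universe chip or the TDC FIFO; the function names and the example address are hypothetical.

```python
BLOCK = 256  # bytes; the boundary size named in the entry above

def crosses_boundary(start: int, nbytes: int, block: int = BLOCK) -> bool:
    """True if a transfer of nbytes starting at start spans a block boundary."""
    if nbytes <= 0:
        return False
    return start // block != (start + nbytes - 1) // block

def split_at_boundaries(start: int, nbytes: int, block: int = BLOCK):
    """Split a transfer into (address, length) chunks that never cross a boundary."""
    chunks = []
    addr, remaining = start, nbytes
    while remaining > 0:
        room = block - (addr % block)  # bytes left before the next boundary
        n = min(room, remaining)
        chunks.append((addr, n))
        addr += n
        remaining -= n
    return chunks

# Hypothetical example: a 32-byte read starting 16 bytes below a boundary.
print(crosses_boundary(0x100F0, 32))
print([(hex(a), n) for a, n in split_at_boundaries(0x100F0, 32)])
```

If the hypothesis holds, a read-ahead that continues past the end of the first chunk is where extra words would be pulled from the FIFO.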
Fri Feb 27 11:50:44
clc00 was not responding again. tried to shepherd it - didn't succeed. tried to vxlogin and
reboot didn't help - got stuck in the process. tried to reset without shepherding and then to
shepherd - helped.
after several minutes got:
b0dap58.fnal.gov:Thread-154:11:46:41 AM->Task receive/clcreceive SUSPENDED on node b0clc00
- Vadim :: (run 179454)
-- Fri Feb 27 11:55:18 comment by...rainer -- paged CLC expert.
-- Fri Feb 27 11:56:07 comment by...Vadim -- vxlogin and reboot doesn't do anything - it's stuck. calling the CLC expert
-- Fri Feb 27 11:59:12 comment by...rainer -- sasha responded. he is coming in.
-- Fri Feb 27 12:48:02 comment by...rainer -- Frank Chlebana reset the crate, and now it works.
-- Fri Feb 27 12:57:23 comment by...Ming -- Yesterday we replaced the cpu in the b0clc00 crate with a new and faster cpu. However, we found
out that we need to recompile one of the programs (clcreceive) that runs on that cpu. We don't know
if the suspension of the clcreceive program is related to the change
of the cpu and the recompiling of the code (as this program had
experienced suspension with the previous cpu). We decided to leave the situation as it is and see if
this program suspends again.
-- Fri Feb 27 13:04:40 comment by...rainer -- Frank and Ray Culbertson decided to switch the CPUs in the clc crate.
Fri Feb 27 12:03:06
from MCR elog:
11:38:18- TeV is approaching a lowbeta squeeze, and has uncoalesced beam at 980GeV
- Rainer
-- Fri Feb 27 13:11:06 comment by...rainer -- 12:32:48 comment by...bS -- ...some orbit smoothing was done and we are heading to the proton injection porch, from a wet squeeze, for another round.
Fri Feb 27 12:35:32
Mario working on DAQMon. - Rainer
Fri Feb 27 12:54:33
Run 179454
TERMINATE: experts will swap clc00's CPU - Vadim x2080
Fri Feb 27 13:00:57
Run 179455
ACTIVATE: L2_TORTURE [15,390,406] without clc00 crate - Vadim x2080
Fri Feb 27 13:00:58
Run 179455
TERMINATE: will add clc00 - Vadim x2080
Fri Feb 27 13:05:49
Run 179456
ACTIVATE: L2_TORTURE [15,390,409] - Vadim x2080
Fri Feb 27 13:06:09
Run 179456
TERMINATE: hmm.. it looks like I didn't actually add clc00 - Vadim x2080
Fri Feb 27 13:12:42
Run 179457
ACTIVATE: L2_TORTURE [15,390,409] with clc00 processor swapped - Vadim x2080
Fri Feb 27 13:13:28
The new 2400 in the b0clc00 crate was swapped with a second 2400.
We will watch for problems; the old 2300 can be swapped back
quickly if needed. - Ray Culbertson, Frank Chlebana
Fri Feb 27 13:35:15
Run 179457
TERMINATE: will run without cal and cal trigger crates - Vadim x2080
Fri Feb 27 13:35:16
Bill Badgett borrows some crates for tests. - Rainer
Fri Feb 27 13:39:45
Run 179458
ACTIVATE: L2_TORTURE [15,390,409] without cal and trigger cal crates - Vadim x2080
-- Fri Feb 27 15:49:44 comment by...Rainer -- SVX trips and cooling alarm:
several ladders in SVX trip off at 13:52; SVX ladders go into high current DVDD state. A cooling alarm ensues, which kills power to SB4. Silicon SPL kills the power to the rest of SVX. Experts investigating. See silicon elog.
Fri Feb 27 15:00:08
we are officially in shot setup. - rainer
-- Fri Feb 27 15:56:13 comment by...rainer -- MCR elog: 15:34:56- Loading Final Protons.
Fri Feb 27 15:04:31
Run 179458
TERMINATE: Scanner Manager SM_SCPU_TIMEOUT Error: HRR didn't help. Will cleanup EVB. - Vadim x2080
Fri Feb 27 15:27:55
[Figure: Quad angles for last 2 months]
- JJ
-- Fri Feb 27 15:28:54 comment by...JJ -- Although this plot is of some interest, it was mainly put
here to see if I would reproduce problems that others were
having putting graphics in eLog.
Fri Feb 27 15:54:56
Shift Summary: - TOF HV stayed up after MCR informed us they want to put
beam in the machine - needed expert intervention to get it to OFF.
apparently some ifix <-> tof PC communication problem.
- SRC guys working on silicon. new SVX SRC is in. major cooling trip likely related to SRC work.
still ongoing investigation
- CLC cpu needed expert intervention (crate reset and CPU swap)
- we are in shot setup. protons going in.
- rainer
Fri Feb 27 15:59:55
Loading Final Protons - Stephan Lammel/Robert Harris
Fri Feb 27 16:01:44
Shift Summary: See above
- St. Lammel/R. Harris
Fri Feb 27 16:01:45
SVRAD totals after final protons have been loaded (integrated dosage)
Date        Time      BLM          Dose
2004.02.27  15:54:42  W Inner BLM  0.00 RADS
2004.02.27  15:54:42  W Outer BLM  0.00 RADS
2004.02.27  15:54:42  E Inner BLM  0.00 RADS
2004.02.27  15:54:42  E Outer BLM  0.34 RADS
- Anadi & Natasha