2004 CDF E-Log -- Owl shift. Sat Feb 28, 2004
| SciCo | DAQ Ace | Monitoring Ace | CO | (Operations Manager) |
| S.Miscetti/Guram | Ian Vollrath | Alison Lister | Brian Mohr | jj Schmidt |
Start of Shift Notes:  Trying to put Silicon In DAQ.
Test new trigger table PHYSICS_2_03[1,431,435]
Sat Feb 28 00:07:05
Run 179468
Activated at 2004.02.28 00:06:27 - RunControl
Sat Feb 28 00:07:06
Run 179468
ACTIVATE: PHYSICS_2_03[1,431,435] with silicon - Ian x2080
Sat Feb 28 01:00:59
getting L1 done timeouts every ~3-5min on average. however, none yet resulting from e120 ...
which they tell me is a good thing. also have had a few (~4) reformatter errors.
- ian :: (run 179468)
Sat Feb 28 01:07:48
done timeout from b0cmx01
hrr recovered
- ian :: (run 179468)
Sat Feb 28 01:23:54
Run 179468
Terminated at 2004.02.28 01:23:06 - RunControl
Sat Feb 28 01:23:56
Run 179468
TERMINATE: run ended for silicon work - Ian x2080
Sat Feb 28 01:29:34
We have taken a first look at the new trigger table,
comparing run 179468 with the previous run at low-lum
(run 179467). From Xmon, we see that the global rates of
L1, L2, L3 look reasonable. L2 processing looks better.
We decided to keep the table until 8 in the morning.
- S.Miscetti
Sat Feb 28 01:34:52
COT cooling alarm, which went away on its own after a couple of minutes.
"Cryo guy" said this is a "normal" behaviour when the Silicon is put into a run later than the rest.
(guess this has been seen often before but thought I would put it in e-log anyway for
information)
- alison
Sat Feb 28 01:46:15
Run 179469
Activated at 2004.02.28 01:44:50 - RunControl
Sat Feb 28 01:46:16
Run 179469
ACTIVATE: PHYSICS_2_03[1,431,435] - Ian x2080
Sat Feb 28 01:46:17
Run 179469
Terminated at 2004.02.28 01:46:10 - RunControl
Sat Feb 28 01:46:42
Run 179469
TERMINATE: bad bad bad - Ian x2080
Sat Feb 28 01:46:59
- alison (2 hour plots)
Sat Feb 28 01:57:21
Run 179468
RUNSTATUS: Marked Bad, explanation:
L3T too many reformatter errors (5%)
- cdfscico
Sat Feb 28 02:06:05
Run 179470
TERMINATE: bad bad - Ian x2080
Sat Feb 28 02:09:33
- alison
Sat Feb 28 02:17:01
Run 179471
Activated at 2004.02.28 02:16:34 - RunControl
Sat Feb 28 02:21:06
Run 179471
Terminated at 2004.02.28 02:19:26 - RunControl
Sat Feb 28 02:21:07
Run 179471
ACTIVATE: bad bad - Ian x2080
Sat Feb 28 02:21:08
Run 179471
TERMINATE: bad bad - Ian x2080
Sat Feb 28 02:31:39
Run 179472
Activated at 2004.02.28 02:30:45 - RunControl
Sat Feb 28 02:32:26
Run 179472
ACTIVATE: PHYSICS_2_03[1,431,435] after some si fixes - Ian x2080
Sat Feb 28 02:46:44
about 80% of L3 data flow states are in an "error state". however, have no errors and running
~smoothly. maybe this is a result of the numerous reformatter errors we have been having.
- ian :: (run 179472)
Sat Feb 28 02:49:57
Update on Silicon Massacre
yesterday, we had attempted to upgrade the SRC firmware - we are still running with old firmware in the SVX SRC, for which there is no spare.
we knew the old firmware was plagued with L2 w/o L1 fatal errors, out of which we knew we could HRR out.
lester downloaded some firmware with additional diagnostics into the b0svx06/ISL SRC.
checkouts w/ silicon appeared normal - with silicon, the SVX got into a dramatic state which kicked SVX into a high current state - so high that the detector temperature increased and the temperature alarm tripped off.
we decided to undo all of the changes and revert to the old situation - old SRC in SVX, latest firmware w/o diagnostics in ISL as before.
unfortunately, we are now plagued by numerous L1 DONE timeouts from various ladders.
we suspected corrupted GLINK senders and power cycled FIB crates in the collision hall. to no avail.
we suspected corrupted GLINK receivers and power cycled VRB crates. to no avail.
almost by accident, we swapped one VRB. that seems to have helped at least individual sources of DONE timeout.
unfortunately, there are instantaneous spurts of reformatter errors, likely connected to the DONE TO, which result in the runs being marked bad.
we have currently no idea how this could come about - the fact that swapping VRBs seems to help is almost ridiculous.
nonetheless we swapped 3 times successfully, and we had to steal one VRB from the EVB upgrade crate because there is no way to get into FCC for after hours, and we burned the two spares we have available in the Si office.
all this was discussed with Ops and ok'ed.
one EVB cleanup was my fault - I forgot to put a cable back in. - Rainer, for Sal, Marcel, Pete and Lester and Steve consulting.
-- Sat Feb 28 02:50:48 comment by...rainer -- ps.: to be continued ... tomorrow we'll try to brainstorm with the experts.
-- Sat Feb 28 02:51:24 comment by...rainer -- See silicon elog
-- Sat Feb 28 03:33:30 comment by...ian -- in the past hour or so have had L1 done timeouts:
e480: 1
e200: 7
e420: 12
none from e120.
reformatter error rate for this run so far is: 1.78% and decreasing.
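For reference, a minimal sketch tallying the counts in the comment above. The 60-minute window is an assumption standing in for "the past hour or so", and the variable names are illustrative only.

```python
# L1 done timeout counts per source, copied from the comment above.
# e120 is entered as 0 because the comment says "none from e120".
timeouts = {"e480": 1, "e200": 7, "e420": 12, "e120": 0}

# Assumption: "the past hour or so" is treated as exactly 60 minutes.
window_min = 60

total = sum(timeouts.values())
mean_interval_min = window_min / total

# Source with the largest number of timeouts and its share of the total.
worst = max(timeouts, key=timeouts.get)
worst_share = timeouts[worst] / total

print(f"total timeouts: {total}")
print(f"mean interval: {mean_interval_min:.1f} min")
print(f"worst source: {worst} ({worst_share:.0%} of all)")
```

Under that assumption this gives 20 timeouts, about one every 3 minutes, with e420 accounting for 60% of them. That is in line with the ~3-5 min spacing noted at 01:00:59.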
Sat Feb 28 03:09:51
- alison
Sat Feb 28 04:06:11
- alison
Sat Feb 28 05:05:53
- alison
Sat Feb 28 06:05:38
- alison
Sat Feb 28 06:27:04
- alison
-- Sat Feb 28 06:28:36 comment by...alison -- All of B1W3 went down to zero at the same time around 6:05.
These are only the AVDD plots.
-- Sat Feb 28 07:14:15 comment by...rainer -- another fit of CAEN madness - fixed by hockerization. for details see silicon elog.
Sat Feb 28 07:10:36
- alison
Sat Feb 28 07:14:17
PSM alarm: 1RR18D (CMX-Muon crate) Channel 0 was slightly high (just over 6V). Alarm went away on
its own.
- alison
Sat Feb 28 07:15:19
got trigger inhibit from IFIX: SVX HV. no signs of any trips, alarms, etc. low voltage bar of
SVX HV on IFIX had disappeared. brought SVX HV to standby. contacted expert. brought SVX HV back up
at expert's request. same problem - i.e. trigger inhibit without anything else. brought SVX HV back
down at expert's request. rainer called ... came back in. turns out there was an inhibit and some
power supply problems. some crates had to be "hockerized"
- ian :: (run 179472)
-- Sat Feb 28 07:17:07 comment by...ian -- note that the above is summary of what's been going on for the past while (got first inhibit at
~5:45am)
-- Sat Feb 28 07:17:34 comment by...ian -- we are back running as of 7:12am
Sat Feb 28 07:46:59
CPR trip: sections 0 through 5 West.
"ON" for that section recovered the trip.
- alison
Sat Feb 28 07:48:05
got error:
46'34" 1 crate/s: b0svx06(16), in error.[RXPT]b0svx06:Messenger:7:46:23 AM->SRC Fatal Error:Sl 5
Too Many L1A 2 L1A to Buff
-->
Additional Information:
Attention !!!. FERML_SRC_FATALITY ERROR !!!
SRC Fatal Error from b0svx06: Sl 5 Too Many L1A 2 L1A to Buff
hrr worked
- ian :: (run 179472)
-- Sat Feb 28 07:51:57 comment by...ian -- again
Sat Feb 28 07:56:09
| Run Number | Data Type | Physics Table | Begin Time | End Time | Live Time | L1 Accepts | L2 Accepts | L3 Accepts | Live Lumi, nb^-1 | GR | SC | RC |
| 179468 x2BD0C | BEAM | PHYSICS_2_03 [1,431,435] | 00:06:27 | 01:23:06 | 00:59:48 | 56,227,513 | 1,047,617 | 180,634 | 135.953 | 0 | 1 | 1 |
| 179472 x2BD10 | BEAM | PHYSICS_2_03 [1,431,435] | 02:30:45 | | 03:05:45 | 180,917,362 | 3,043,852 | 554,559 | 354.951 | | | 1 |
| Totals | | | | 07:55:02 | 04:05:34 | 237,144,875 | 4,091,469 | 735,193 | 490.904 | | | |
- End of Shift Report
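A quick cross-check of the Totals row, as a minimal sketch. The per-run numbers are copied from the run table in this report; the dictionary layout and the helper `hms` are illustrative names, not part of any CDF tool.

```python
from datetime import timedelta

def hms(text):
    """Convert an 'HH:MM:SS' string into a timedelta."""
    h, m, s = (int(x) for x in text.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)

# Per-run values copied from the run table in this report.
runs = {
    179468: {"live": "00:59:48", "l1": 56_227_513, "l2": 1_047_617,
             "l3": 180_634, "lumi": 135.953},
    179472: {"live": "03:05:45", "l1": 180_917_362, "l2": 3_043_852,
             "l3": 554_559, "lumi": 354.951},
}
# Totals row as reported.
totals = {"live": "04:05:34", "l1": 237_144_875, "l2": 4_091_469,
          "l3": 735_193, "lumi": 490.904}

# Sum the accept counters and the live luminosity over both runs.
sums = {k: sum(r[k] for r in runs.values()) for k in ("l1", "l2", "l3")}
lumi_sum = sum(r["lumi"] for r in runs.values())

# Sum the live times and compare with the reported total, in seconds.
live_sum = sum((hms(r["live"]) for r in runs.values()), timedelta())
live_diff_s = abs((hms(totals["live"]) - live_sum).total_seconds())

for key in ("l1", "l2", "l3"):
    print(key, sums[key] == totals[key])
print("lumi", abs(lumi_sum - totals["lumi"]) < 1e-6)
print("live time difference (s):", live_diff_s)
```

The accept counts and live luminosity add up exactly. The summed live time differs from the reported total by one second, which may simply be rounding in the per-run values.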
Sat Feb 28 08:01:42
Shift Summary: We started the shift with Store #3261 in at an
instantaneous luminosity of
3.9*10^31.
The previous shift had just started the new run (179468) with the low
luminosity trigger table PHYSICS_2_03[1,431,435]. Silicon experts were
swapping boards in order to reduce the Silicon timeouts. We compared
XMON with the previous run and it did not look too different, so we ran with
the new table.
There were a lot of problems while swapping Si boards. At 2AM Rainer completed
the fix and we started a run with a reasonable number of L3 reformatter
errors ( <1% ). We still have a 4% DeadTime due to L1 Done. The situation
is unclear and this morning the Si experts will keep working on this.
We also had at least 1 hour of downtime due to a Si trigger Inhibit from
IFIX although there were no signs of any trips. Rainer came back in
and discovered some power supply problems. This is now fixed.
SI experts at work.
End of Shift Numbers
|
CDF Run II
Runs
Delivered Luminosity 941.5
Acquired Luminosity 498.0
Efficiency 52.9
|
- S.Miscetti
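The efficiency figure in the End of Shift Numbers can be reproduced from the two luminosity values, as a minimal sketch. It assumes both luminosities are quoted in the same units and that efficiency is acquired over delivered, expressed in percent; the variable names are illustrative.

```python
# Values copied from the End of Shift Numbers above.
delivered = 941.5  # Delivered Luminosity
acquired = 498.0   # Acquired Luminosity

# Assumption: efficiency = acquired / delivered, in percent.
efficiency_pct = 100.0 * acquired / delivered

print(f"Efficiency: {efficiency_pct:.1f}")
```

This agrees with the quoted 52.9 to one decimal place.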
Sat Feb 28 08:18:49
Run 179472
ACTIVE: Got "L3_REF_ERROR_HIGH_RATE Error: L3 Instantaneous error rate is 1.3838321 per cent."
and then "FERML_SRC_FATALITY ERROR !!! SRC Fatal Error from b0svx06: Sl 5 L2A w/o L1A" - Vadim x2080
Sat Feb 28 08:23:48
(entry outside this shift's time range) - Susana.
Sat Feb 28 08:38:50
Run 179472
ACTIVE: Fatal one again: first "FrontEnd Crate Error Condition from: VRB_ISL_06", then RXPT Error, and "FERML_SRC_FATALITY ERROR !!! SRC Fatal Error from b0svx06: Sl 5 Too Many L1A 2 L1A to Buff" - Vadim x2080