Installs:
0 installs 0 decommissions
Replaced drives:
0 9940B 0 9940A 0 LTO1/LTO2 0 other
Replaced servers/movers/fileserver nodes:
0 server 0 mover 0 fileservers
0 Robot hardware maintenance service calls
1 Server/Mover/Fileserver maintenance (repairs/parts replacement)
Investigations/Interventions:
18 mover 0 server 6 tape drive 0 fileserver 12 tape 0 file 0 library
tape operations
39 tapes clobbered/recycled
0 tapes labeled/entered/removed
1 tape MIRs fixed
0 tape MIRs not fixed
0 tape drive firmware updates
5 tapes cloned
0 quota increase requests serviced
0 raid disk replacements/interventions
3 enstore service requests
0 new muon pages
0 off-hour calls/interventions
120 flip-tab
Weekend:
------------------------------------------------------------------------
Monday 10/23
home/enstore/enstore/sbin/pageDcache srm fndca]
Had to clear DLT problems
dasadmin dismount -d DG4B & enstore vol --clear JL7006
dasadmin dismount -d DG4A & enstore vol --clear JL7088
dasadmin dismount -d DG4D & enstore vol --clear JL7018
2006-10-23 16:57:27.860154 system_inhibit[0] none
2006-10-23 15:50:34.776666 system_inhibit[0] none
2006-10-23 11:12:05.284226 system_inhibit[0] none
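The three dismount-and-clear pairs above follow one pattern, which can be sketched as a dry run. This is a minimal sketch, not the procedure actually used: it only builds the two command lines per drive/volume pair from the forms shown in the log and prints them. The helper names clear_commands and dry_run are hypothetical, and nothing is executed.

```python
# Drive/volume pairs taken from the log entries above.
PAIRS = [("DG4B", "JL7006"), ("DG4A", "JL7088"), ("DG4D", "JL7018")]


def clear_commands(drive, volume):
    """Return the two command lines (as argv lists) for one pair.

    Hypothetical helper: it mirrors the command forms in the log and
    does not run anything.
    """
    return [
        ["dasadmin", "dismount", "-d", drive],
        ["enstore", "vol", "--clear", volume],
    ]


def dry_run(pairs):
    """Build printable command lines for every pair, in order."""
    lines = []
    for drive, volume in pairs:
        for argv in clear_commands(drive, volume):
            lines.append(" ".join(argv))
    return lines


if __name__ == "__main__":
    for line in dry_run(PAIRS):
        print(line)
```

Keeping each command as an argv list means the same structure could later be handed to a process launcher without shell quoting, but that step is deliberately left out here.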
Killed these two stuck stken cronjobs.
pageDcacheKftp
pageDcacheGridftp
Tried to restart srm to re-engage the pagedcacheSRM cronjob.
/etc/rc.d/init.d/dcache-boot start fifo srm1
'(CD-9940B, exp-db) is approaching its quota limit (591/620)'}
library   storage_group  requested  authorized  quota  allocated
CD-9940B  exp-db         1000       600         640    603        *
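The quota alarm text above carries the library, storage group, and a used/limit pair. As a rough sketch of how much headroom that leaves, the snippet below parses an alarm string of that shape. The regular expression and the function name parse_quota_alarm are assumptions based only on the one alarm quoted in this log, not on Enstore's actual alarm format.

```python
import re

# Pattern for the alarm text as it appears in this log (an assumption):
# "(library, storage_group) is approaching its quota limit (used/limit)"
ALARM_RE = re.compile(
    r"\((?P<library>[^,]+),\s*(?P<group>[^)]+)\)"
    r" is approaching its quota limit"
    r" \((?P<used>\d+)/(?P<limit>\d+)\)"
)


def parse_quota_alarm(text):
    """Return a dict describing the alarm, or None if it does not match."""
    m = ALARM_RE.search(text)
    if m is None:
        return None
    used = int(m.group("used"))
    limit = int(m.group("limit"))
    return {
        "library": m.group("library").strip(),
        "storage_group": m.group("group").strip(),
        "used": used,
        "limit": limit,
        "remaining": limit - used,   # volumes left before the limit
        "fraction": used / limit,    # share of the limit already used
    }


if __name__ == "__main__":
    alarm = "'(CD-9940B, exp-db) is approaching its quota limit (591/620)'}"
    info = parse_quota_alarm(alarm)
    print(info["library"], info["storage_group"], info["remaining"])
```

For the alarm quoted above this leaves 29 volumes before the limit.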
Stken; Alarms not collapsing fixed on Thursday.
D0En;
Testing volume PRL230 more on Thursday.
volume PRL230 READ_ERROR @ 63, 74 & 83
Notified CMS_t1 of CRC dcache mismatch alarms:
------------------------------------------------------------------------
Tuesday 10/24
StkEn;
Cleared JL7018 & drive
2006-10-24 14:04:32.
2006-10-24 13:35:44.
2006-10-24 10:57:01.
Drive stkenmvr20a: service called; drive tested and released.
SL code set to disallow clobber label.
9940B tape drive 0,1,10,3 is getting many read failures [Fwd: #63840@STC (24x7)]
Notified CMS_t1 of cmsstor49 CRC dcache mismatch alarms.
VO7923 was cloned. It had 2028 mounts.
Sent mail out reporting cronjob failure.
/home/enstore/SL8500-VOLUMES.html line 1: SL8500 command not found
HelpDesk ticket 87389 Jobs stalled writing to dcache
Restarted the door: dcache-boot start fifo doorG00
D0En;
HelpDesk ticket 87455
d0en enstore file system corrupt error (20061023); cleared ghost file.
node d0enmvr26a crashed and was rebooted. D0enmvr26a may have other issues.
------------------------------------------------------------------------
Wednesday 10/25
2006-Oct-25 14:08:27 Today's 9310 robot flip_tab work: 6 groups locked, 0 groups failed.
Cleared JL7018 once
2006-10-25 17:39:00.702599 system_inhibit[0] none
First noticed the DBT10MV CRC errors
9940 quota for backups was increased from 260 to 280. They are now 254/280
VO5159 was cloned. It had 2214 mounts.
Drive DBT24MV & volume VO5153 ftt.FTTError
D0En;
HelpDesk ticket 87482
d0enmvr26a was dead and blocking transfers from node d0olw. I freed the volume
and left the PC offline.
Drive D21GLTO & volume PRX441L1 Memory usage 10.1 approaches a limit 10
Rebooted the PC
New trixes;
rpm -qa --qf "%-25{name}\t%{version}\t%{summary}\n" | sort -n
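The query format above prints one package per line as name (padded to 25 characters), version, and summary, separated by tabs. A minimal sketch for turning captured output of that shape into records follows. The function name parse_rpm_listing is hypothetical, and the sample lines in the demo are made up for illustration rather than taken from any real host.

```python
def parse_rpm_listing(text):
    """Parse lines of the form 'name<TAB>version<TAB>summary'."""
    packages = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first two tabs only, so a summary containing
        # a tab is kept whole.
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip anything that does not match the format
        name, version, summary = (p.strip() for p in parts)
        packages.append(
            {"name": name, "version": version, "summary": summary}
        )
    return packages


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = (
        "examplepkg               \t1.0\tAn example package\n"
        "otherpkg                 \t2.3.4\tAnother example\n"
    )
    for pkg in parse_rpm_listing(sample):
        print(pkg["name"], pkg["version"])
```

Stripping each field removes the padding that the %-25{name} width specifier adds, so names compare cleanly across hosts.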
------------------------------------------------------------------------
Thursday 10/26
No additional drives offline this morning.
Checked libraries for stalled processes.
STKen:
VO4467 was cloned. It had 3211 mounts.
Cleared JL7018
2006-10-26 12:16:17.561844 system_inhibit[0] none
2006-10-26 10:07:44.036669 system_inhibit[0] none
Tested and released drive DBT10MV, volumes VO5153 & VO8363 CRC errors
But then got another one. How does this drive find these tapes?
Stken dcache pool offline. w-stkendca10a-3
Ticket #: 87592 Enstore volumes recycled for exp-db
2006-Oct-26 16:04:16 INQSRV TIMEDOUT
Enstore_Up_Down Ticket #: 87604
2006-Oct-26 16:27:05 Enstore_Up_Down ; Ticket Generated ; YES
(RedBall/STK Enstore)
D0en;
HelpDesk ticket 87555
d0enmvr40a or tape PRJ410 is experiencing very low drive write rates.
All ftt tests on the drive are successful. DBT40MV seems to have a number of
errors on the Failed Transfers page.
#63865@STC (24x7) While checking LSM logs Clarence saw some odd errors
on a number of volumes. He replaced drive 9940B40
PRL230 is undergoing another cloning attempt. We are trying to use a 9940B
drive to read the 9940A tape and copy it to a 9940B tape.
CDFen:
IA0212 has been replaced
IA0213 The tape was physically replaced, deleted from enstore and remained
a 9940A. The tape was never used. George deleted it again and re-added
it as a 9940B.
Checked that COMPLETE_FILE_LISTING_cdf is updating!
root cron (quickcheck) on cdfensrv3.fnal.gov is active too long
------------------------------------------------------------------------
Friday 10/27
StkEn;
9940B21 woke up dead; enstore log reported, Memory 12.1 approaching
limit of 10. Will restart the mover. The mover did not restart.
Enstore was started.
9940B40 dead Memory usage 10.1 approaching limit of 10.
Will restart the mover. Cannot restart mover; there is a lock file.
Enstore was started.
Restarted dcache pools.
w-stkendca10a-3 92 unknown
w-stkendca11a-3 6 unknown
D0En;
Ticket #: 87606 d0en enstore file system corrupt error (20061026); removed Casper all_1_0000227046_079.raw
Ticket #: 87621 d0en enstore file system corrupt error (20061027)
------------------------------------------------------------------------
Tape aid tickets.
Ticket #: 87474 write protect 3 tapes (flip tabs) in d0en 9310 tape library
Ticket 87474 Has Been Resolved.
Ticket #: 87475 write protect 155 tapes (flip tabs) in cdfen 9310 tape library
Still pending.
Ticket #: 87476 write protect 117 tapes (flip tabs) in stken 9310 tape library
Ticket 87476 Has Been Resolved.
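The tape counts in the three flip-tab tickets above can be tallied by status. The sketch below hard-codes the ticket data from this log. The tuple layout and the function name tally are illustrative choices, not part of any ticketing tool.

```python
# (ticket, library, tapes, status) as recorded in the tickets above.
TICKETS = [
    (87474, "d0en", 3, "resolved"),
    (87475, "cdfen", 155, "pending"),
    (87476, "stken", 117, "resolved"),
]


def tally(tickets):
    """Sum the number of tapes per ticket status."""
    totals = {}
    for _ticket, _library, tapes, status in tickets:
        totals[status] = totals.get(status, 0) + tapes
    return totals


if __name__ == "__main__":
    print(tally(TICKETS))  # {'resolved': 120, 'pending': 155}
```

The resolved total of 120 agrees with the "120 flip-tab" figure in the summary at the top of this report, assuming that figure counts only the resolved tickets.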
------------------------------------------------------------------------
STKen volumes active too long
Volume Count Movers
Tapes marked readonly due to write errors
Volume Count Movers
D0en volumes active too long
Volume Count Movers
Tapes marked readonly due to write errors
Volume Count Movers
PRJ399 1 d0enmvr40a
PRJ406 1 d0enmvr40a
PRJ410 1 d0enmvr40a
CDFen volumes active too long
Volume Count Movers
IA3671 2 cdfenmvr16a
IA9283 1 cdfenmvr23a
Total 3
Tapes marked readonly due to write errors
Volume Count Movers