Installs:
0 installs  0 decommissions
Replaced drives:
0 9940B  0 9940A  0 LTO1/LTO2  0 other
Replaced servers/movers/fileserver nodes:
0 server  0 mover  0 fileservers
0 Robot hardware maintenance service calls
1 Server/Mover/Fileserver maintenance (repairs/parts replacement)
Investigations/Interventions:
18 mover  0 server   6 tape drive  0 fileserver  12 tape  0 file  0 library
Tape operations:
39 tapes clobbered/recycled
0 tapes labeled/entered/removed
1 tape MIRs fixed
0 tape MIRs not fixed
0 tape drive firmware updates
5 tapes cloned
0 quota increase requests serviced
0 raid disk replacements/interventions
3 enstore service requests
0 new muon pages
0 off-hour calls/interventions
120 tapes write-protected (flip tabs)

Weekend:
------------------------------------------------------------------------

Monday 10/23

home/enstore/enstore/sbin/pageDcache srm fndca]

Had to clear DLT problems
dasadmin dismount -d DG4B & enstore vol --clear JL7006
dasadmin dismount -d DG4A & enstore vol --clear JL7088

dasadmin dismount -d DG4D & enstore vol --clear JL7018
2006-10-23 16:57:27.860154   system_inhibit[0]    none
2006-10-23 15:50:34.776666   system_inhibit[0]    none
2006-10-23 11:12:05.284226   system_inhibit[0]    none

Killed these two stuck stken cronjobs.
pageDcacheKftp
pageDcacheGridftp

Tried to restart srm to re-engage the pagedcacheSRM cron job:
/etc/rc.d/init.d/dcache-boot start fifo srm1

Alarm: (CD-9940B, exp-db) is approaching its quota limit (591/620)
library  storage_group  requested  authorized  quota  allocated
CD-9940B         exp-db       1000         600    640        603  *

Stken; alarms not collapsing, fixed on Thursday.

D0En;
Testing volume PRL230; more on Thursday.
volume PRL230 READ_ERROR @ 63, 74 & 83

Notified CMS_t1 of dcache CRC mismatch alarms.

------------------------------------------------------------------------

Tuesday 10/24

StkEn;
Cleared JL7018 & drive
2006-10-24 14:04:32.
2006-10-24 13:35:44.
2006-10-24 10:57:01.

Drive stkenmvr20a: service called; drive tested and released.
SL code set to disallow clobber label.
9940B tape drive 0,1,10,3 is getting many read failures [Fwd: #63840@STC (24x7)]

Notified CMS_t1 of cmsstor49 dcache CRC mismatch alarms.

VO7923 cloned. It had 2028 mounts.

Sent mail reporting a cron job failure:
/home/enstore/SL8500-VOLUMES.html line 1: SL8500 command not found

HelpDesk ticket 87389: jobs stalled writing to dcache.
Restarted the door: dcache-boot start fifo doorG00

D0En;
HelpDesk ticket 87455
d0en enstore file system corrupt error (20061023); cleared ghost file.
Node d0enmvr26a crashed and was rebooted; it may have other issues.

------------------------------------------------------------------------

Wednesday 10/25
2006-Oct-25 14:08:27 Today's 9310 robot flip_tab work: 6 groups locked, 0 groups failed.

Cleared JL7018 once
2006-10-25 17:39:00.702599   system_inhibit[0]    none

First noticed the DBT10MV CRC errors

The 9940 quota for backups was increased from 260 to 280; usage is now 254/280.

VO5159 was cloned. It had 2214 mounts.

Drive DBT24MV & volume VO5153 ftt.FTTError

D0En;
HelpDesk ticket 87482
d0enmvr26a dead and blocking transfers from node d0olw; I freed the volume
and left the PC offline.

Drive D21GLTO & volume PRX441L1: memory usage 10.1 approaching the limit of 10.
Rebooted the PC

New tricks:
rpm -qa --qf "%-25{name}\t%{version}\t%{summary}\n" | sort -n
------------------------------------------------------------------------

Thursday 10/26

No additional drives offline this morning.

Checked libraries for stalled processes.

STKen:
VO4467 was cloned.  It had 3211 mounts.

Cleared JL7018
2006-10-26 12:16:17.561844   system_inhibit[0]    none
2006-10-26 10:07:44.036669   system_inhibit[0]    none

Tested and released drive DBT10MV (volumes VO5153 & VO8363 had CRC errors),
but then got another one. How does this drive find these tapes?

Stken dcache pool w-stkendca10a-3 offline.
Ticket #: 87592 Enstore volumes recycled for exp-db

2006-Oct-26 16:04:16 INQSRV TIMEDOUT

Enstore_Up_Down  Ticket #: 87604
2006-Oct-26 16:27:05 Enstore_Up_Down ; Ticket Generated ; YES
(RedBall/STK Enstore)


D0en;
HelpDesk ticket 87555
d0enmvr40a or tape PRJ410 experiencing very low drive write rates.
All ftt tests of the drive are successful. DBT40MV seems to have a number
of errors on the Failed Transfers page.
#63865@STC (24x7): while checking LSM logs, Clarence saw some odd errors
on a number of volumes. He replaced drive 9940B40.

PRL230 is undergoing another cloning attempt.  We are trying to use a 9940B
drive to read the 9940A tape and copy it to a 9940B tape.

CDFen:
IA0212 has been replaced

IA0213: the tape was physically replaced and deleted from enstore, but remained
a 9940A. The tape was never used. George deleted it again and re-added
it as a 9940B.

Checked that COMPLETE_FILE_LISTING_cdf is updating!

root cron (quickcheck) on cdfensrv3.fnal.gov is active too long

------------------------------------------------------------------------

Friday 10/27
StkEn;
9940B21 woke up dead; the enstore log reported memory usage 12.1 approaching
the limit of 10. Will restart the mover. The mover did not restart.
Enstore was started.

9940B40 dead; memory usage 10.1 approaching the limit of 10.
Will restart the mover. Cannot restart the mover; there is a lock file.
Enstore was started.

Restarted dcache pools:
w-stkendca10a-3 92 unknown
w-stkendca11a-3 6 unknown


D0En;
Ticket #: 87606 d0en enstore file system corrupt error (20061026) removed Casper all_1_0000227046_079.raw
Ticket #: 87621 d0en enstore file system corrupt error (20061027)

------------------------------------------------------------------------
Tape aid tickets.
Ticket #: 87474 write protect 3 tapes (flip tabs) in d0en 9310 tape library
Ticket 87474 Has Been Resolved.
Ticket #: 87475 write protect 155 tapes (flip tabs) in cdfen 9310 tape library
Still pending.
Ticket #: 87476 write protect 117 tapes (flip tabs) in stken 9310 tape library
Ticket 87476 Has Been Resolved.

------------------------------------------------------------------------

STKen volumes active too long
Volume    Count   Movers

Tapes marked readonly due to write errors
Volume    Count   Movers

D0en volumes active too long
Volume    Count   Movers

Tapes marked readonly due to write errors
Volume    Count   Movers
PRJ399        1   d0enmvr40a
PRJ406        1   d0enmvr40a
PRJ410        1   d0enmvr40a

CDFen volumes active too long
Volume    Count   Movers
IA3671        2   cdfenmvr16a
IA9283        1   cdfenmvr23a
Total         3

Tapes marked readonly due to write errors
Volume    Count   Movers




