Primary Report for Week of Mar 7, 2005
**************************************


Summary
-------
Pages:
 0 off hours
 1 work hour (d0lib-archive exceeded quota, ticket 55988)

Remedy tickets:
 2 resolved (51869,53575)
 2 reassigned (55919,54734)

Servers:
 1 server investigation (d0ensrv3 lost raid)

Movers:
 1 installed (stkenmvr27a)

Tapes:
 2 volumes cloned (VO7113, IA8157)

Robots:
 1 hardware service call (D41BLTO, D41DLTO)
 1 software update service call (fntt2)

Investigations:
 2 LTO-2 dismount failed
 2 tapes entered upside down (VO7502, VO7781)
 2 tapes dismount failed (9994) (PRS741, VO5839)
 1 DLT mount failed
--
 7


Report
------
* 3/7 (Monday)

D0 -  Volume PRO874L1 dismount failed bad in drive D41BLTO (d0enmvr53a).
      This looks like the weak spring problem; service call placed.

      Exported d0ensrv1:/dzero and stkensrv1:/patriot to d0cs561-d0cs600.

STK - Volume VO7113 cloned (too many mounts).

      pageDcacheCmsGridftp hung since yesterday; killed process tree,
      but there is still a problem.

CDF - Service call placed to upgrade ACSLS on fntt2 on Wednesday morning.

      30 volumes were recycled, but of those, 19 were still write locked.
      Ran flip_tab to unlock them.

      Removed a used-up cleaning tape from silo 0,1. Tried to move a new one
      from 0,0 to 0,1 but 0,1 was already full again. Ejected the 30 volumes
      that had been recycled to write permit them; since most of them were
      in silo 0,0 I was able to move two cleaning tapes from 0,0 to 0,1 while
      they were ejected.

* 3/8 (Tuesday)

D0 -  Volume PRO877L1 ejected but not dismounted from drive D21DLTO; volume
      noaccess, drive offline. This tape was caught in a double whammy by a
      touch sensor error on a different tape; the robot arm was too busy to
      dismount this tape. Cleared volume, brought drive online.

STK - Mover 9940B27 installed.

      Paged because d0lib-archive exceeded 9940 quota; switched library tags
      to CD-9940B.

      Volume JL4592 mount failed bad in drive DG4DDLT; volume noaccess, drive
      offline. The robot had a touch sensor error on this mount; while volume
      was noaccess and drive offline, I successfully mounted the tape. Cleared
      volume, brought drive online.

CDF - Volume IA8157 cloned; file 54 is bad.

      Volumes IA0212 and IA0213 both have > 5000 mounts, but are used for
      pageDcache, so they will not be cloned. Should they be retired, or
      replaced? (pageDcache is currently writing to IA5387.)
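
      The selection rule implied above (clone when mounts exceed 5000,
      unless the volume is reserved for pageDcache) could be sketched in
      Python as follows. This is only an illustration: clone_candidates,
      MOUNT_LIMIT and the mount counts are made up for the example, and
      the 5000 limit is assumed from the "> 5000 mounts" remark.

```python
MOUNT_LIMIT = 5000  # assumption: taken from the "> 5000 mounts" remark


def clone_candidates(mounts, exempt=frozenset(), limit=MOUNT_LIMIT):
    """Return sorted volsers with more than `limit` mounts that are not exempt.

    Hypothetical helper, not an enstore function.
    mounts: dict mapping volser -> mount count
    exempt: volsers that must not be cloned (e.g. pageDcache volumes)
    """
    return sorted(v for v, n in mounts.items() if n > limit and v not in exempt)


# Mount counts below are placeholders, not values from the report.
example_mounts = {"IA0212": 5200, "IA0213": 5100, "IA5387": 40}
page_dcache_volumes = {"IA0212", "IA0213"}

to_clone = clone_candidates(example_mounts, exempt=page_dcache_volumes)
```

      With the pageDcache volumes exempted, nothing is selected; without
      the exemption both IA0212 and IA0213 would be.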

* 3/9 (Wednesday)

D0 -  ADIC replaced drives D41BLTO and D41DLTO.

      At 15:55, d0ensrv3 began to have SCSI errors from the RAID array.
      psqlBackup (on d0ensrv1) and inventory.py (on d0ensrv3) were failing
      because /diskc was unavailable. The RAID box didn't show any problems,
      but I power cycled it anyway. That didn't help, so I rebooted d0ensrv3.
      Both /diska and /diskc returned.

      There are two things that I noted were different about this RAID from
      the other servers:
	 o RAID is connected at SCSI B port, instead of A
	   - broken connector on A port
	 o rate is 40 MB/s, instead of 160
	   - probably limited by the PC built-in SCSI adapter

CDF - STK installed ACSLS 7 on fntt2.

      Cron job STKDrvBusy on cdfensrv4 was stuck in rsh because fntt2
      was down. Killed process tree.
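
      A job like STKDrvBusy can be kept from hanging on a dead host by
      putting a time limit on the remote command. The following is a
      minimal Python sketch, not the actual cron job: run_with_timeout
      and the 60-second default are assumptions made for illustration;
      only subprocess.run and its timeout argument come from the
      standard library.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=60.0):
    """Run cmd (a list of arguments); return its exit code, or None on timeout.

    Hypothetical helper, not part of enstore. subprocess.run() kills the
    direct child when the timeout expires, so the caller is not left waiting.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode


if __name__ == "__main__":
    # Stand-in for a remote command that never answers: sleep for 5 seconds
    # but allow only half a second.
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(sleeper, timeout_s=0.5))
```

      Note that only the direct child is killed on timeout; anything it
      spawned itself would still need to be cleaned up separately.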

* 3/10 (Thursday)

D0 -  Volume PRS741 dismount failed (9994) in drive 9940B37; volume
      noaccess, drive offline. Volume was unloaded but not dismounted.
      Manual dismount succeeded. Volume cleared, drive brought online.

STK - Volume VO7502 mount failed in drive 9940B26 "incompatible media
      type"; volume noaccess, drive in error state. Problem is indeed
      the volume (see next); restarted mover.

      Volumes VO7502 and VO7781 have media type "3480" in the robot,
      instead of "STK2P". In enstore, the first is a 9940B, the second,
      a 9940. Ejected them and found they had been entered upside down.
      Interestingly, the robot read the volser correctly, only got the
      media type wrong.
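
      Since the wrong media type was the only symptom, a comparison of
      the robot's media type against the expected one would catch tapes
      entered this way. A rough Python sketch, assuming the robot's view
      is already available as a dict of volser to media type (the
      function name and the VO0001 entry are made up for illustration):

```python
def find_media_mismatches(robot_media, expected="STK2P"):
    """Return sorted volsers whose robot-reported media type is not `expected`.

    Hypothetical helper; robot_media maps volser -> media type as the
    robot reports it.
    """
    return sorted(v for v, m in robot_media.items() if m != expected)


# VO7502 and VO7781 are the two volumes from this report; VO0001 is an
# invented, correctly entered volume added for contrast.
robot_view = {"VO7502": "3480", "VO7781": "3480", "VO0001": "STK2P"}

suspects = find_media_mismatches(robot_view)
```

      Here suspects contains VO7502 and VO7781, the two tapes that had to
      be ejected and re-entered.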

      Volume VO5839 dismount failed (9994) in drive 994052; volume
      noaccess, drive offline. Volume was unloaded but not dismounted.
      Manual dismount succeeded. Volume cleared, drive brought online.

* 3/11 (Friday)

* 3/12 (Saturday)

* 3/13 (Sunday)