Primary Report for the Week of April 4, 2005
	*********************************************



Summary
-------
Pages:
 0 off hours
 0 work hours

Remedy tickets:
 6 assigned (57043, 57062, 57099, 57140, 57154, 57160)
 7 resolved (56760, 56834, 56949, 56965, 57043, 57062, 57140)
 1 reassigned (56946)

Server investigations:
 RAID array (d0ensrv3)
 info server (stkensrv0)

Mover investigations:
 1 unresponsive PC (d0enmvr55a)

Tapes:
 31 volumes recycled (1 for SDSS, 30 for CMS)
 2 volumes cloned (VO5871, VO7445)
 1 TOC repaired (PRO829L1)

Drives:

Robots:
 1 scheduled maintenance (ADIC)

Investigations:


Report
------
* 4/4 (Monday)

D0 - PC d0enmvr55a was hung (pingable, but could not log in). Rebooted.

* 4/5 (Tuesday)

D0 -  AML/2 scheduled downtime.

STK - Recycled 1 volume for SDSS.

      Michael Z. asked us to attempt to read JL5013 manually; SDSS
      expected 17 files on this tape but sees only 11.

      Continued a dialog with an SDSS user who keeps trying to read
      tapes we don't have and then complains about how long it takes.

      The info server appeared to be hung for at least 10 minutes: there
      was a "server died" alarm, and the process was consuming 20% of
      memory. It then cleared up.

      CDF reached its quota of 90 tapes; I increased it to 100.

      Requested the removal of stkendca4a and stkendca5a from the NGOP
      monitoring configuration.

* 4/6 (Wednesday)

All - Notified all users about the network outage tomorrow morning, and
      queued an at job on each system to pause the libraries at 5:30 am.

STK - Liz BG had questions about how tape drives are shared among users.

CDF - Received "Warning: day old CDF data logging filesets" emails. Saw
      nothing unusual; no writes pending. CDF was writing to 2 drives
      and reading from 14; there were 324 pending reads.

* 4/7 (Thursday)

All - All libraries successfully paused; network outage occurred without
      incident; all libraries unlocked.

D0 -  RAID box on d0ensrv3 failed. Wayne and George power cycled
      the RAID and rebooted the server. Syslog shows 2478 errors from
      /diska and 517 from /diskc, starting yesterday; none since the
      reboot. A similar event occurred on Mar 9 (RAID power cycled,
      server rebooted). No errors were logged between the two events.
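      Per-filesystem error counts like these can be tallied from syslog
      with a short script. A minimal sketch, assuming each relevant
      syslog line contains the word "error" and names the affected mount
      point (e.g. /diska); the helper name and sample lines are
      hypothetical, not the actual d0ensrv3 log format.

```python
import re
from collections import Counter

# Hypothetical pattern: mount points of the form /disk<letter>, as in /diska, /diskc.
FS_PATTERN = re.compile(r"(/disk[a-z])\b")

def count_errors_by_fs(lines):
    """Count lines mentioning 'error', grouped by /disk<letter> mount point."""
    counts = Counter()
    for line in lines:
        if "error" not in line.lower():
            continue
        match = FS_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical sample lines; real syslog entries will differ.
    sample = [
        "Apr  6 10:01:02 d0ensrv3 kernel: I/O error on /diska",
        "Apr  6 10:01:03 d0ensrv3 kernel: I/O error on /diskc",
        "Apr  6 10:01:04 d0ensrv3 kernel: I/O error on /diska",
        "Apr  7 09:00:00 d0ensrv3 kernel: /diska mounted",
    ]
    for fs, n in sorted(count_errors_by_fs(sample).items()):
        print(fs, n)
```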

      It was determined that LTO-2 drives can be forced to rewrite
      broken TOCs by manually appending EOFs at End-Of-Data. Volume
      PRO829L1 was repaired this way.

      29 volumes were referenced in uncleared alarms. Verified that all
      were home (i.e., not stuck in a drive or missing); none are
      currently noaccess or notallowed. Breakdown:
	2 LTO2 volumes set readonly because they were write protected
		PRO899L1
		PRO908L1
	2 LTO2 volumes, 1 9940B volume readonly because of write errors
		PRO897L1
		PRO911L1
		PRS866
	2 LTO2 volumes, 2 9940A volumes "too long in seek"
		PRO899L1
		PRO829L1
		PRL611
		PRM268
	2 clusters of 9940B volumes "too long in state mount/dismount wait"
		6 volumes at 2005-Apr-05 15:47:38
		6 volumes at 2005-Apr-06 14:31:33
	1 cluster of 9940B volumes "too long in state active"
		3 volumes at 2005-Apr-06 12:29:07

STK - Server stkensrv0 was rebooted to switch its network interface,
      to see whether a different network driver alleviates the problems
      that cause it to lose time and sometimes hang just after 6 pm
      daily. There have been no "time reset" syslog messages in the
      last 24 hrs.

      Volume VO5871 cloned (more than 2000 mounts).

      36 volumes were referenced in uncleared alarms. Verified that all
      were home (i.e., not stuck in a drive or missing); two are
      currently noaccess, none are notallowed. Breakdown:
	2 noaccess volumes
		JL5013
		VO7784
	1 9940B volume, 4 9940 volumes readonly because of write errors
		VO7503
		VO7750
		VO7784
		VO7927
		VO7928
	30 9940B volumes "too long in state active"
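      The STK categories above add up to 37 entries, but VO7784 appears
      both as noaccess and as readonly, which gives the 36 distinct
      volumes reported. A minimal sketch of that check, assuming the 30
      "too long in state active" volumes (which are not listed by name)
      do not overlap the named ones; the function name is hypothetical.

```python
# Named volumes from the STK breakdown above.
NOACCESS = {"JL5013", "VO7784"}
READONLY_WRITE_ERRORS = {"VO7503", "VO7750", "VO7784", "VO7927", "VO7928"}
# Assumption: the unnamed "too long in state active" volumes are distinct
# from the named volumes, so they contribute only a count.
UNNAMED_ACTIVE_COUNT = 30

def tally_alarm_volumes():
    """Return (category entries, overlapping names, distinct volumes)."""
    entries = len(NOACCESS) + len(READONLY_WRITE_ERRORS) + UNNAMED_ACTIVE_COUNT
    overlap = NOACCESS & READONLY_WRITE_ERRORS
    distinct = len(NOACCESS | READONLY_WRITE_ERRORS) + UNNAMED_ACTIVE_COUNT
    return entries, overlap, distinct

if __name__ == "__main__":
    entries, overlap, distinct = tally_alarm_volumes()
    print("category entries:", entries)
    print("overlap:", sorted(overlap))
    print("distinct volumes:", distinct)
```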

CDF - The robot is reporting spent cleaning cartridges:
	2005-04-06 22:11:20     0    Cleaning cartridge CLN536 is spent.
	2005-04-07 00:13:48     0    Cleaning cartridge CLN548 is spent.
      even though it appears to be continuing to use them:

	2005-04-07 12:47:17        Cleaning Cartridge Status
	 Identifier  Home Location    Max Usage  Current Usage
	 CLN536        0, 0, 7, 2,10  100        2
	 CLN540        0, 0,10, 6, 6  100        0
	 CLN541        0, 0, 8, 6, 5  100        0
	 CLN548        0, 1, 7, 3,19  100        20
	 CLN549        0, 1,11,11,22  100        0
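      One way to check the robot's "spent" messages against the status
      report is to compare Current Usage with Max Usage for each
      cartridge. A minimal sketch that parses the rows shown above (the
      helper names are hypothetical): by these counters CLN536 has 98
      uses left and CLN548 has 80, so neither should be spent.

```python
# Rows copied from the 2005-04-07 12:47:17 status report above.
STATUS_REPORT = """\
 CLN536        0, 0, 7, 2,10  100        2
 CLN540        0, 0,10, 6, 6  100        0
 CLN541        0, 0, 8, 6, 5  100        0
 CLN548        0, 1, 7, 3,19  100        20
 CLN549        0, 1,11,11,22  100        0
"""

def parse_status(report):
    """Return {identifier: (max_usage, current_usage)} from report rows."""
    usage = {}
    for line in report.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("CLN"):
            continue
        # The home location contains embedded spaces, so take Max Usage
        # and Current Usage as the last two whitespace-separated fields.
        usage[fields[0]] = (int(fields[-2]), int(fields[-1]))
    return usage

def remaining_uses(usage):
    """Uses left per cartridge; zero or less means the cartridge is spent."""
    return {cid: mx - cur for cid, (mx, cur) in usage.items()}

if __name__ == "__main__":
    left = remaining_uses(parse_status(STATUS_REPORT))
    for cid in sorted(left):
        flag = " (spent)" if left[cid] <= 0 else ""
        print(f"{cid}: {left[cid]} uses left{flag}")
```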

* 4/8 (Friday)

STK - 2 volumes being cloned (VO7446, VO6809)