Storage Services Administration (SSA) Group

Enstore Library Manager In A "Broken" State


Overview:

An enstore library manager can enter the "broken" state when the tape robot reports its status to enstore as offline, or when there are multiple consecutive errors in enstore. You will not see this status if the library managers are drained. Before the error occurs, enstore is running normally and the status of the library managers is unlocked.

Procedure:

Library Manager In A Broken State

Symptoms:
  1. There is pending work and no work actually running.
  2. The enstore/status_enstore_system.html page shows the state of the library managers as being broken.
  3. The command enstore lib --status library_manager_name on any of the servers displays broken as the status.
  4. The enstore ball on the SAAG page is RED.
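
Symptom 3 can be checked from a script. The following is a minimal sketch, not part of the official procedure. It runs only the status command quoted above and assumes that the word "broken" appears somewhere in its output; the exact output format is not documented on this page, so the substring check is an assumption. The function names are hypothetical.

```python
import subprocess


def status_command(library_manager):
    """Build the status command quoted in symptom 3."""
    return ["enstore", "lib", "--status", library_manager]


def looks_broken(status_output):
    """Return True if the status text mentions 'broken' (case-insensitive).

    Assumption: the output format is not documented here, so this is
    only a substring check, not a real parser.
    """
    return "broken" in status_output.lower()


def check_library_manager(library_manager):
    """Run the status command and report whether its output looks broken."""
    result = subprocess.run(
        status_command(library_manager),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is reported by the caller, not raised
    )
    return looks_broken(result.stdout)
```

Calling check_library_manager with a library manager name on an enstore server would return True when the status output mentions broken. The name to pass is site specific.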
Solution:
  1. Make sure the tape robot and all components are functioning and are online.
  2. Check the AML/2 log or the STK Diagnostic window on FNTT.
  3. Check the Enstore Log to see if the problem was caused in enstore.
  4. Example Enstore log entries:

    10:40:18 stkensrv4.fnal.gov 031976 enstore I STKMC   FINISHED mount VO1979 0,0,10,0  returned ('ok', 0, 'mount VO1979 0,0,10,0 => 0,Mount: VO1979 mounted on   0, 0,10, 0')
    10:40:19 stkenmvr1a.fnal.gov 013865 root I EA11MV  MSG_TYPE=MC_LOAD_DONE  mounted VO1979
    10:40:19 stkensrv4.fnal.gov 017897 enstore I STKMC   query drive 0,0,10,6 => 0,0,0,10,6 online          in use      VO1218     9840
    10:40:25 stkensrv0.fnal.gov 002092 enstore A VOLSRV  MSG_TYPE=ALARM  {'label': 'VO1979', 'root_error': 'NOACCESS', 'severity': 3}
    10:40:25 stkensrv0.fnal.gov 002092 enstore I VOLSRV  VO1979 system inhibit set to NOACCESS
    10:40:25 stkensrv0.fnal.gov 002092 enstore I VOLSRV  pause library_managers for stk.media_changer media_changerare paused due to too many volumes set to NOACCESS
    10:40:25 stkenmvr1a.fnal.gov 013865 root A EA11MV  MSG_TYPE=ALARM  {'root_error': 'volume VO1979 already labeled VO1979', 'severity': 1}
    10:40:25 stkenmvr1a.fnal.gov 013865 root E EA11MV  marking VO1979 noaccess
    10:40:25 stkenmvr1a.fnal.gov 013865 root E EA11MV  transfer failed WRITE_VOL1_WRONG volume VO1979 already labeled VO1979 volume=VO1979 location=0
    10:40:25 sdssdp7.fnal.gov 002662 jhendry W ENCP  transfer file EXfer error: ('fd_xfer write error', 32, 'Broken pipe', 2662)
    10:40:25 sdssdp7.fnal.gov 002662 jhendry E ENCP  INFILE=/opdb/d2/spool/products/acacia-THROUGH-lss.IRIX-ONLY.Part01.20011010.tar OUTFILE=/pnfs/sdss/products2/acacia-THROUGH-lss.IRIX-ONLY.Part01.20011010.tar FILESIZE=279490560 LABEL=VO1979 LOCATION= DRIVE=stkenmvr1a:/dev/rmt/tps0d1n DRIVE_SN=3310000195 TRANSFER_TIME=6.47 SEEK_TIME=0.00 MOUNT_TIME=14.24 QWAIT_TIME=17.37 TIME2NOW=0.00 STATUS=TOO MANY RETRIES ('WRITE_VOL1_WRONG', 'volume VO1979 already labeled VO1979')
    10:40:25 sdssdp7.fnal.gov 002662 jhendry I ENCP  Error after transferring 0 bytes in 1 files in 169.174152017 sec.  Overall rate = 0 MB/sec.  Drive rate = 0 MB/sec.  Network rate = 0 MB/sec.  Exit status = 1.
    10:40:25 stkensrv4.fnal.gov 020246 enstore I EAGLBM  mover_error updated suspect volume list for VO1979
    10:40:29 stkensrv4.fnal.gov 017897 enstore I STKMC   dismount VOLUME 0,0,10,6 force => 0,Dismount: Forced dismount of VO1218 from   0, 0,10, 6
    10:40:29 stkensrv4.fnal.gov 031976 enstore I STKMC   FINISHED dismount VO1218 0,0,10,6  returned ('ok', 0, 'dismount VOLUME 0,0,10,6 force => 0,Dismount: Forced dismount of VO1218 from   0, 0,10, 6')
    10:40:29 stkensrv0.fnal.gov 002092 enstore E VOLSRV  library_managers for stk.media_changer media_changerare paused due to too many volumes set to NOACCESS
    10:40:29 stkensrv4.fnal.gov 020246 enstore A EAGLBM  MSG_TYPE=ALARM  {'root_error': 'LM eagle.library_manager goes to BROKEN state', 'severity': 1}
    10:40:31 stkenmvr1a.fnal.gov 013865 root I EA11MV  dismounting VO1979
    10:40:31 stkensrv4.fnal.gov 031976 enstore I STKMC   REQUESTED dismount VO1979 0,0,10,0
    10:40:32 stkensrv4.fnal.gov 017906 enstore I STKMC   query drive 0,0,10,0 => 0,0,0,10,0 online          in use      VO1979     9840
    10:40:41 stkensrv4.fnal.gov 017906 enstore I STKMC   dismount VOLUME 0,0,10,0 force => 0,Dismount: Forced dismount of VO1979 from   0, 0,10, 0
    10:40:42 stkensrv4.fnal.gov 031976 enstore I STKMC   FINISHED dismount VO1979 0,0,10,0  returned ('ok', 0, 'dismount VOLUME 0,0,10,0 force => 0,Dismount: Forced dismount of VO1979 from   0, 0,10, 0')
    10:40:42 stkensrv2.fnal.gov 020694 enstore A Enstore_Up_Down  MSG_TYPE=ALARM  {'Reason': "['eagle.library_manager down']", 'severity': 4, 'root_error': 'ENSTORE BALL IS RED'}
    
    
    

    In the example above, VO1979 was the fourth tape in a row on which an attempt to write a volume serial number failed because a label already existed on the tape. This happened because the tape was defined to enstore without the "VOL1OK" option. The repeated failures caused the library manager to go into a "broken" state. The root cause of the problem must be fixed before the library manager can be reset. Once it has been fixed, reset the library manager as follows:

  5. Log on to an enstore server as user enstore.
  6. For each library manager that is in a broken state, enter the following commands:
    1. enstore vol --reset-lib=library_manager_name
    2. enstore lib --stop_draining library_manager_name
    3. enstore lib --status library_manager_name
  7. The status command should return unlocked. If it does not, go back to step 1.
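
The reset commands in step 6 can be wrapped in a small helper when several library managers are broken. This is a sketch under stated assumptions: it uses only the three commands listed above, the function names and the dry_run parameter are hypothetical, and it must be run as user enstore on an enstore server. By default it only prints the commands so they can be reviewed first.

```python
import subprocess


def reset_commands(library_manager):
    """Return the three commands from step 6, in order."""
    return [
        ["enstore", "vol", "--reset-lib=" + library_manager],
        ["enstore", "lib", "--stop_draining", library_manager],
        ["enstore", "lib", "--status", library_manager],
    ]


def reset_library_manager(library_manager, dry_run=True):
    """Run the reset sequence for one library manager.

    With dry_run=True (the default) the commands are only printed.
    Otherwise each command is executed and the output of the final
    status command is returned for the operator to inspect.
    """
    last_output = None
    for cmd in reset_commands(library_manager):
        if dry_run:
            print(" ".join(cmd))
            continue
        # check=True stops the sequence if a command exits non-zero.
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        last_output = result.stdout
    return last_output
```

The helper does not decide whether the reset succeeded. The operator still reads the returned status text and confirms it shows unlocked, as step 7 requires.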

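When checking the Enstore log for the cause (step 3), the relevant entries can be picked out of a long excerpt with a short script. The sketch below assumes the whitespace-separated layout seen in the example log (time, host, process id, user, severity letter, server name, message); that layout is inferred from the sample, not from a format specification. It lists volumes whose system inhibit was set to NOACCESS and alarms reporting a BROKEN state. The function names are hypothetical.

```python
import re


def parse_line(line):
    """Split one log line into fields, or return None if it does not fit.

    Assumed layout (inferred from the example log):
    time host pid user severity server message
    """
    parts = line.split(None, 6)
    if len(parts) < 7:
        return None
    time, host, pid, user, severity, server, message = parts
    return {
        "time": time,
        "host": host,
        "pid": pid,
        "user": user,
        "severity": severity,
        "server": server,
        "message": message,
    }


def summarize(lines):
    """Return (noaccess_volumes, broken_alarms) found in the log lines."""
    noaccess_volumes = []
    broken_alarms = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        match = re.search(r"(\S+) system inhibit set to NOACCESS", entry["message"])
        if match:
            noaccess_volumes.append(match.group(1))
        # Severity letter 'A' marks alarm entries in the example log.
        if entry["severity"] == "A" and "BROKEN state" in entry["message"]:
            broken_alarms.append((entry["time"], entry["server"]))
    return noaccess_volumes, broken_alarms
```

Several NOACCESS volumes shortly before a BROKEN alarm point to the kind of repeated tape error shown in the example, which has to be fixed before the reset.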

The URL for this page is http://www-isd.fnal.gov/ISA/Trouble_LibBroken.html
Questions or comments? Send mail to:
ssa-group@fnal.gov