Live Drain-off of Write Pools Procedure
DRAFT: How to drain off write pools on a node that is to be removed from dCache for hardware maintenance... and do so on a live dCache system without downtime. THIS IS AN ADVANCED PROCEDURE with a potential for data loss if not done correctly.
FNDCA Dcache Write Pool Drain-off Procedure
v0.999  16 March 2007  RDK

[TODO] Move to word processor or HTML. Exploit fonts/bold face to make clear
       which text is printed at the terminal and which text is user-entered
       responses.
[TODO] Address TODO comments embedded in the procedure.
[MOST RECENT CHANGE] Minor, in logfile references.

================================================================================
Description
-----------

If you would like to remove a write pool host from dCache for hardware
maintenance, then as a precaution all "precious" files should be drained off
to Enstore. In other words, all files not yet on tape should be written to
tape to guard against file loss should a file system be wiped by the hardware
maintenance. After the pools are removed from the dCache configuration and the
precious files are drained off, the pools can be turned off and the host node
released for maintenance.

================================================================================
Overview
--------

DRAIN-OFF

1) Remove the pools from the dCache configuration to prevent new files from
   appearing in these write pools.

2) Trigger the sending of all precious files in these write pools to tape. On
   FNDCA, there is a four hour accumulation time per storage group before
   files are written to tape, requiring this trigger step. Most other dCaches
   have the default zero accumulation time configured. The "trigger" action is
   to set the accumulation time to zero.

3) Watch the monitoring and check the pools to detect when, and be sure that,
   all precious files are written to tape.

4) Turn the write pool (dCache software) off and then on again, as a check
   that there are no files left unwritten to tape. If any files from these
   pools are sent to Enstore, then these will need to be investigated.

5) Turn the write pool (dCache software) off. Release the hardware for
   maintenance work.

--------------------------------------------------------------------------------

RECOVERY OF NORMAL CONDITIONS (after maintenance)

R1) Confirm that the dCache installation is still in place in the file system.
    If not, then it must be re-installed, which may be a trivial task if it is
    only the pool that is lost.

R2) The write pools should start up automatically with the host node reboot.
    If not, then start the write pools manually. This will re-establish the
    accumulation time settings in either case.

R3) Replace the pools in the dCache configuration.

R4) Watch the monitoring to be sure the pools are working properly at some
    level (depending on time available).

================================================================================
Detailed Procedure
------------------

DRAIN-OFF

1) Remove the pools from the dCache configuration to prevent new files from
   appearing in these write pools.

1.1) Login as user "enstore" to fndca.fnal.gov

1.2) $ cd dcache-deploy/config

1.3) Make a copy of the file PoolManager.conf and edit the copy to comment out
     the assignment of the pools on the node in question to a pool group.
     After doing this for the pools on w-stkendca10a, one would see:

       # psu addto pgroup writePools w-stkendca10a-1
       # psu addto pgroup writePools w-stkendca10a-2
       # psu addto pgroup writePools w-stkendca10a-3

     where the # character is interpreted by dCache as a comment line. Save
     the file. Diff the copy against the original PoolManager.conf: BE SURE
     NOT TO LEAVE ANY STRAY TYPOS IN OR REMOVE ANYTHING FROM THIS FILE UNLESS
     YOU REALLY KNOW WHAT YOU ARE DOING! If only these 3 lines show in the
     diff, then copy the changed file over the original.
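     A minimal sketch of that copy/edit/diff cycle (the ".drain" suffix on the
     working copy is an arbitrary illustrative choice, not a dCache
     convention):

       $ cd ~enstore/dcache-deploy/config
       $ cp PoolManager.conf PoolManager.conf.drain    # working copy
       $ vi PoolManager.conf.drain                     # comment out the three "psu addto" lines
       $ diff PoolManager.conf PoolManager.conf.drain  # expect ONLY the three lines to differ
       $ cp PoolManager.conf.drain PoolManager.conf    # install only if the diff is clean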
1.4) Cause the PoolManager component of dCache to re-read its configuration
     file. This step carries some risk that all other client requests may be
     affected if the PoolManager becomes confused by typos or missing
     information from the file. If you have any questions about this step,
     feel free to ask a dCache developer to lend a pair of eyes to be sure it
     is done as stated 8^).

     Still as user "enstore" on fndca, launch the "dcache" command interpreter
     and execute some commands with:

       $ dcache slm-enstore-52108
       > exit
       Shell Exit (code=0;msg=(0) )
       Back to .. mode
       .. > set dest PoolManager@dCacheDomain
       PoolManager@dCacheDomain > help
       say <arg-0>
       [LOTS MORE OUTPUT. OK, YOU ARE TALKING TO THE POOLMANAGER NOW.]
       PoolManager@dCacheDomain > reload -yes
       PoolManager@dCacheDomain > exit
       Back to .. mode
       .. > exit
       Connection to fndca3a.fnal.gov closed.

1.5) Login as user "enstore" to fndcam.fnal.gov; cd dcache-log; tail -f
     LOG-[today] to be sure there are no "unit not found" or similar errors
     coming from the PoolManager (symptoms of its data structures becoming
     confused). [TODO] Add example of bad log output.

     If there are signs of PoolManager confusion, then contact a dCache
     developer to do an immediate restart of the dCacheDomain. dCache is
     practically unusable in this state, so one must recover it ASAP.
     PROBLEM REACTION in more detail [TODO]

2) Trigger the sending of all precious files in these write pools to tape. On
   FNDCA, there is a four hour accumulation time per storage group before
   files are written to tape, requiring this trigger step. Most other dCaches
   have the default zero accumulation time configured. The "trigger" action is
   to set the accumulation time to zero.

2.0) The setup you will be changing is initially defined in the files with
     names like "stkendca10a.write-pool-3.setup".

2.1) Change the pool configuration for each of these write pools (one at a
     time) so that it does not wait until an accumulation trigger goes off,
     but writes all precious files to tape now. Still as user "enstore" on
     fndca, launch the "dcache" command interpreter and execute some commands
     with:

       $ dcache slm-enstore-52108
       > exit
       Shell Exit (code=0;msg=(0) )
       Back to .. mode
       .. > set dest w-stkendca10a-1@w-stkendca10a-1Domain
       w-stkendca10a-1@w-stkendca10a-1Domain > help
       mover remove <jobId>
       [LOTS MORE OUTPUT. OK, YOU ARE TALKING TO POOL w-stkendca10a-1 NOW.]
       w-stkendca10a-1@w-stkendca10a-1Domain > queue define class enstore * -expire=0
       w-stkendca10a-1@w-stkendca10a-1Domain > exit
       Back to .. mode
       .. > exit
       Connection to fndca3a.fnal.gov closed.

2.2) Check: was step 2.1 done for each of the pools on this node? (A sketch
     for generating the per-pool command sequence follows below.)
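     To avoid typos when repeating step 2.1 for each pool on the node, one can
     generate the exact command sequence with a small shell loop and paste its
     output into the "dcache" interpreter. A minimal sketch, assuming the
     three pools on w-stkendca10a; adjust names as needed:

       $ for i in 1 2 3; do
       >   echo "set dest w-stkendca10a-$i@w-stkendca10a-${i}Domain"
       >   echo "queue define class enstore * -expire=0"
       >   echo "exit"
       > done

     The loop only prints text; it does not talk to dCache itself, so there is
     no risk in running it.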
3) Watch the monitoring and check the pools to detect when, and be sure that,
   all precious files are written to tape.

3.1) First you want to see if the triggers worked. Check the Pool Usage web
     page to see if there is any precious space in these pools to begin with.
     If so, are any stores showing in the Pool Request Queues page for these
     pools after several minutes? (Be patient; there is a big time lag in the
     monitoring.)

3.2) When the count of active stores for these pools goes back to zero and the
     precious space in each pool is zero, then all precious files are written
     to tape. We are paranoid though, so we will test this several ways.

3.3) Grep for "precious" in the pool control state files. The state of each
     dCache file is stored in an auxiliary file co-located (almost) with the
     data file. For the pool w-stkendca10a-1 (write pool, node stkendca10a,
     first instance), as user "enstore" on w-stkendca10a:

       [enstore@stkendca10a enstore]$ cd dcache-deploy/config/
       [enstore@stkendca10a config]$ cat w-stkendca10a-1.poollist
       w-stkendca10a-1 /diska/write-pool-1 [AND MORE STUFF...]
                       ^^^^^^^^^^^^^^^^^^^
       This tells you where to find the pool in the file system.

     The data is in the "data" sub-directory, the auxiliary files in
     "control":

       [enstore@stkendca10a config]$ cd /diska/write-pool-1
       [enstore@stkendca10a write-pool-1]$ cd control
       -bash-2.05b# grep precious 0*
       00300000000000000173AD58:precious
       00300000000000000173AE30:precious
       00300000000000000173AF00:precious
       00300000000000000173AFA0:precious
       00300000000000000173B2F0:precious
       00300000000000000173FE98:precious
       00300000000000000173FF40:precious
       00300000000000000173FFD8:precious
       0030000000000000017400F8:precious

     This is an example of precious files STILL IN CACHE! BAD! The grep should
     find no matches if all precious files have been written to tape.

3.4) Check: was step 3.3 done for each of the pools on this node? (A sketch
     that loops the grep over all pools follows below.)
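     A possible loop for step 3.3/3.4, reading each pool's base directory from
     its poollist file (assumes the poollist format shown above: pool name
     first, base directory second). Run as user "enstore" on the pool node;
     each pool should report a count of 0:

       $ cd ~enstore/dcache-deploy/config
       $ for p in w-stkendca10a-1 w-stkendca10a-2 w-stkendca10a-3; do
       >   dir=`awk '{print $2; exit}' $p.poollist`             # base directory of the pool
       >   echo "$p ($dir):"
       >   grep -l precious $dir/control/0* 2>/dev/null | wc -l # count of files still precious
       > done

     On very full pools (tens of thousands of files) the glob may exceed the
     shell's argument limit; fall back to find/xargs in that case. This is
     only a convenience wrapper around step 3.3, not a replacement for the
     check_precious script in step 3.5.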
3.5) Use the script "check_precious" to interrogate the dCache system to see
     if there are precious or confused-state files in these pools. As user
     "root" on the pool node in question:

       -bash-2.05b# sh check_precious -all
       Thu Dec 21 08:52:04 CST 2006 Scanning for precious files on stkendca10a,
       Skipping files dated "Feb 30 " "Feb 30 ", also setting find ctime="", Excluding "" files.
       Thu Dec 21 08:52:04 CST 2006 w-stkendca10a-1 /diska/write-pool-1
       Thu Dec 21 08:52:06 CST 2006 Find of precious files produced 0 files to check
       Thu Dec 21 08:52:06 CST 2006 w-stkendca10a-2 /diskb/write-pool-2
       Thu Dec 21 08:52:08 CST 2006 Find of precious files produced 0 files to check
       Thu Dec 21 08:52:08 CST 2006 w-stkendca10a-3 /diskc/write-pool-3
       Thu Dec 21 08:52:11 CST 2006 Find of precious files produced 0 files to check
       Thu Dec 21 08:52:11 CST 2006 Finished on stkendca10a

     THIS IS GOOD OUTPUT. No precious files to check and no warnings about
     files in a confused state. The "-all" switch eliminates the default 1 day
     age requirement on precious files that normally prevents files just
     arrived in cache from being "checked". Here, for this procedure, we want
     to consider all precious files, young and old.

     PROBLEM REACTION: Precious files still in cache after many hours.

     - Is there a backlog of tape stores in Enstore? This drain-off can take
       hours and proceed at a very uneven pace. Check the Enstore web pages to
       see if tape drives are readily available, for instance.

     - There may be "stale precious" files, which are in fact on tape but have
       been left in a precious state. Long story. Files in this state require
       some investigation to be absolutely sure each is on tape. Files that
       are on tape can be manually moved from "precious" to "cached" state.
       Proceed with care!

       a) For each file still precious, check the Layer 1 and Layer 4
          information in PNFS to be sure it is consistent with the file being
          on tape. These layers must exist and contain a tape label.
          [TODO] Better way for admins?

          a.1) Translate the PNFSID (eg. 0030000000000000016C9FC0) to a PNFS
               path: cat the SI file. Pick out the <directory> and <filename>.
          a.2) cd <directory>
          a.3) cat '.(use)(4)(<filename>)'
               This file is the Layer 4 information and must exist and contain
               reasonable values for the first five and last three lines
               (fields).

          Example:

            -bash-2.05b# cat SI-0030000000000000016C9FC0
            [BINARY DATA DUMP... pick out the pathname (edited version below)]
            pathto /pnfs/fnal.gov/usr/exp-db/daily/d0-offline/d0oflump/2006/12-December/12/channel02/D0OFLUMP_t608955673_s3690_p31 xpt
            -bash-2.05b# cd /pnfs/fnal.gov/usr/exp-db/daily/d0-offline/d0oflump/2006/12-December/12/channel02
            -bash-2.05b# cat '.(use)(4)(D0OFLUMP_t608955673_s3690_p31)'
            VOB415
            0000_000000000_0000024
            2019328000
            daily-d0-offline
            /pnfs/fnal.gov/usr/exp-db/daily/d0-offline/d0oflump/2006/12-December/12/channel02/D0OFLUMP_t608955673_s3690_p31
            0030000000000000016C9FC0
            CDMS116593937200000
            stkenmvr10a:/dev/rmt/tps0d0n:479000018392
            3835795016

            [THIS IS GOOD INFO. The file exists and the Tape label = VOB415,
            Tape Drive = stkenmvr10a:/dev/rmt/tps0d0n:479000018392]

       b) The "check_precious" script will provide instructions on how to
          change the file state from "precious" to "cached".

     - When in doubt at this point, consult a dCache developer or an Enstore
       developer (to see if the file is on tape). Better to be careful than to
       risk data loss.

4) Turn the write pool (dCache software) off and then on again, as a check
   that there are no files left unwritten to tape. If any files from these
   pools are sent to Enstore, then these will need to be investigated.

4.1) As user "root" on the pool node in question [NOT NOT NOT fndca.fnal.gov!]

       -bash-2.05b# ps -auxwww | grep real-encp | grep -v grep | wc -l

     (The "grep -v grep" keeps the grep process itself out of the count.) If
     this does NOT print "0" (zero), then there are still some tape writes
     going to Enstore, and they may be in a bad or stuck state. Consult a
     dCache developer.

4.2) As user "root" on the pool node in question [NOT NOT NOT fndca.fnal.gov!]

       -bash-2.05b# /etc/rc.d/init.d/dcache-boot stop

     Then, wait a minute or two.

       -bash-2.05b# /etc/rc.d/init.d/dcache-boot start

     Then, wait a minute or two.

4.3) Repeat at least steps 3.2 and 4.1 to be sure no new file stores are
     initiated by this. This is another check that all files have been flushed
     to tape.

5) Turn the write pool (dCache software) off. Release the hardware for
   maintenance work.

5.1) As user "root" on the pool node in question [NOT NOT NOT fndca.fnal.gov!]

       -bash-2.05b# /etc/rc.d/init.d/dcache-boot stop

     Then, wait a minute or two.

5.2) Check one last time that there are no stray JVMs or real-encps running:

       -bash-2.05b# ps -auxwww | grep real-encp | grep -v grep | wc -l
       -bash-2.05b# ps -auxwww | grep bin/java | grep -v grep | wc -l

     If these both print "0" (zero), then the node can be released for
     hardware maintenance. (A small wrapper sketch follows below.)
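     The two checks in 5.2 can be wrapped in a small script so the release
     decision is a single yes/no. A minimal sketch (script wording is
     illustrative); run as user "root" on the pool node:

       #!/bin/sh
       # Pre-release check: the node is quiet only if both counts are zero.
       # "grep -v grep" keeps each grep process out of its own count.
       encp=`ps auxwww | grep real-encp | grep -v grep | wc -l`
       java=`ps auxwww | grep bin/java  | grep -v grep | wc -l`
       echo "real-encp processes: $encp   dcache JVMs: $java"
       if [ $encp -eq 0 -a $java -eq 0 ]; then
           echo "OK: node can be released for hardware maintenance"
       else
           echo "NOT ready: dCache processes still running"
       fi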
--------------------------------------------------------------------------------

RECOVERY OF NORMAL CONDITIONS (after maintenance)

R1) Confirm that the dCache installation is still in place in the file system.
    If not, then it must be re-installed, which may be a trivial task if it is
    only the pool that is lost.

R1.1) Find out the configured base directory for a pool. Confirm the directory
      structure is intact. If a pool called write-pool-N is on /diska, then
      the following directories should exist, or be created, owned by root
      with open permissions [TODO] could perms be reduced to 755 here? :

        /diska/write-pool-N
        /diska/write-pool-N/data
        /diska/write-pool-N/control

      To do this for w-stkendca10a-1 as an example, as user "root" on
      stkendca10a, do:

        -bash-2.05b# cd ~enstore/dcache-deploy/config
        -bash-2.05b# cat w-stkendca10a-1.poollist
        w-stkendca10a-1 /diska/write-pool-1 [AND MORE STUFF...]
                        ^^^^^^^^^^^^^^^^^^^
        This tells you the name of the base directory of the pool in the
        file system.

      Let's go there and see what directories exist:

        -bash-2.05b# cd /diska
        -bash-2.05b# ls -alF
        total 4
        drwxr-xr-x   3 root root     25 Feb  3  2006 ./
        drwxr-xr-x  26 root root   4096 Dec 21 10:40 ../
        drwxrwxrwx   4 root root     62 Dec 21 12:51 write-pool-1/

      So far so good. The base directory exists. Underneath the base dir,
      control and data are there...

        -bash-2.05b# ls -alF write-pool-1/
        total 368
        drwxrwxrwx   4 root root     62 Dec 21 12:51 ./
        drwxr-xr-x   3 root root     25 Feb  3  2006 ../
        drwxrwxrwx   2 root root 196608 Dec 21 11:05 control/
        drwxrwxrwx   2 root root  94208 Dec 21 11:05 data/
        -rw-r--r--   1 root root      0 Dec 21 12:51 RepositoryOk
        lrwxrwxrwx   1 root root     65 Feb  3  2006 setup -> /home/enstore/dcache-deploy/config/stkendca10a.write-pool-1.setup

R1.2) There needs to be a symbolic link called "setup" in the base directory
      of the pool that points back into the configuration area at a file
      called something like <pool name>.setup. In the last listing of (R1.1),
      we see that sym-link. If it were missing, one would create it.

R1.3) Check: were steps R1.1 and R1.2 done for each of the pools on this node?

R2) The write pools should start up automatically with the host node reboot.
    If not, then start the write pools manually. This will re-establish the
    accumulation time settings in either case.

R2.1) After several minutes, check http://fndca3a.fnal.gov:2288/cellInfo to be
      sure the cells for these pools are listed with a creation time. If they
      show up, then (R2) is done; proceed to (R3). If they do not show up,
      then proceed with (R2.2).

R2.2) Check whether the pool JVMs are running. As user "root" on the pool node
      in question [NOT NOT NOT fndca.fnal.gov!]

        -bash-2.05b# ps -auxwww | grep bin/java | grep -v grep | wc -l

      If the count printed out matches the number of pools that are supposed
      to be running, then the pools are in fact running (monitoring can
      sometimes be slow to show newly started pools). If the count does not
      match, then proceed with (R2.3).

R2.3) If not all of the pools started up with a reboot (or there was no reboot
      of the system), then as user "root" on the pool node in question
      [NOT NOT NOT fndca.fnal.gov!]

        -bash-2.05b# /etc/rc.d/init.d/dcache-boot stop
        -bash-2.05b# /etc/rc.d/init.d/dcache-boot start

      The "stop" here covers all bases, in case some pools started but others
      did not, and in case there are some logging fifos still intact, which
      have been known to block successful start-up. Repeat step (R2) until
      success, or contact a dCache developer for assistance.

R3) Replace the pools in the dCache configuration.

R3.1) Make a copy of the file PoolManager.conf and edit the copy to UN-comment
      the assignment of the pools on the node in question to a pool group.
      After doing this for the pools on w-stkendca10a, one would see:

        psu addto pgroup writePools w-stkendca10a-1
        psu addto pgroup writePools w-stkendca10a-2
        psu addto pgroup writePools w-stkendca10a-3

      where the # character that is removed was interpreted by dCache as a
      comment marker. Save the file. Diff the copy against the original
      PoolManager.conf: BE SURE NOT TO LEAVE ANY STRAY TYPOS IN OR REMOVE
      ANYTHING FROM THIS FILE UNLESS YOU REALLY KNOW WHAT YOU ARE DOING! If
      only these 3 lines show in the diff, then copy the changed file over the
      original. (A sed-based sketch of this edit follows below.)
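      As in step 1.3, a working copy plus a diff keeps this edit safe. If you
      prefer sed to hand-editing, something like the following strips the
      comment marker from exactly these three lines (a sketch; the ".restore"
      suffix is arbitrary, and older sed versions without -i require
      redirecting to a temporary file instead):

        $ cd ~enstore/dcache-deploy/config
        $ cp PoolManager.conf PoolManager.conf.restore
        $ sed -i 's/^# \(psu addto pgroup writePools w-stkendca10a-\)/\1/' PoolManager.conf.restore
        $ diff PoolManager.conf PoolManager.conf.restore  # expect ONLY the three lines to differ
        $ cp PoolManager.conf.restore PoolManager.conf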
R3.2) Cause the PoolManager component of dCache to re-read its configuration
      file. This step carries some risk that all other client requests may be
      affected if the PoolManager becomes confused by typos or missing
      information from the file. If you have any questions about this step,
      feel free to ask a dCache developer to lend a pair of eyes to be sure it
      is done as stated 8^). Still as user "enstore" on fndca, launch the
      "dcache" command interpreter and execute some commands with:

        $ dcache slm-enstore-52108
        > exit
        Shell Exit (code=0;msg=(0) )
        Back to .. mode
        .. > set dest PoolManager@dCacheDomain
        PoolManager@dCacheDomain > help
        say <arg-0>
        [LOTS MORE OUTPUT. OK, YOU ARE TALKING TO THE POOLMANAGER NOW.]
        PoolManager@dCacheDomain > reload -yes
        PoolManager@dCacheDomain > exit
        Back to .. mode
        .. > exit
        Connection to fndca3a.fnal.gov closed.

R4) Watch the monitoring to be sure the pools are working properly at some
    level (depending on time available).

R4.1) http://fndca3a.fnal.gov:2288/cellInfo
      This page should show the affected pool cells alive with a reasonable
      (less than one second) ping time. Allow a few refreshes before assuming
      a long ping time indicates a problem. There can be very infrequent long
      pings if a cell is busy, or a transient ping timing issue.

R4.2) http://fndca3a.fnal.gov:2288/queueInfo
      This page would ideally show the pools being used... some active movers,
      restores, or stores. Of course, if there are no clients using files in
      these pools, then you will not get this positive feedback, and that is
      OK. If the system is busy, all other pools are busy, and these pools are
      not, then there is a problem, and one should consider contacting a
      dCache developer for a second pair of eyes and perhaps some suggestions.

R4.3) http://fndca3a.fnal.gov:2288/billing
      This page would ideally show some successful movers, restores, or stores
      for the pool(s) in question. It should definitely not show any errors.
      Billing errors on this page can be due to harmless timeouts, but such
      timeouts take hours before occurring. Errors on a newly started pool are
      almost always due to a serious file, metadata, or software problem.

R4.4) Grep the logfile for the log messages from the re-introduced pool(s).
      Login as user "enstore" to the pool node hosting the pool in question.
      For example, if the pool is w-stkendca10a-2, which is on stkendca10a:

        [enstore@stkendca10a ~]$ cd dcache-log
        [enstore@stkendca10a dcache-log]$ tail -f w-stkendca10a-2Domain.log

      - On FNDCA, the logfiles are now located on the same host as the dCache
        component that is writing to them.

      [TODO] Example of "good" log output from pool; how long to wait for
      inventory to be completed (can take minutes in pools with 30k+ files).
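      When scanning the pool log in R4.4, a grep for common trouble markers
      can be quicker than reading the whole tail. A minimal sketch; the error
      strings are illustrative, not an exhaustive list of what a pool may log:

        [enstore@stkendca10a dcache-log]$ grep -iE 'error|exception|unit not found' w-stkendca10a-2Domain.log | tail -20

      An empty result is the hoped-for outcome; anything it prints deserves a
      look before declaring the pool healthy.

================================================================================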