Live Drain-off of Write Pools Procedure
DRAFT: How to drain off write pools on a node that is to be removed from dCache for hardware maintenance... and do so on a live dCache system without downtime. THIS IS AN ADVANCED PROCEDURE with a potential for data loss if not done correctly.
FNDCA Dcache Write Pool Drain-off Procedure
v0.999  16 March 2007  RDK

[TODO] Move to word processor or HTML. Exploit fonts/bold face to make clear
       which text is printed at the terminal and which text is user-entered
       responses.
[TODO] Address TODO comments embedded in the procedure.
[MOST RECENT CHANGE] Minor, in logfile references.

================================================================================
Description
-----------

If you would like to remove a write pool host from dCache for hardware
maintenance, then as a precaution all "precious" files should be drained off
to Enstore. In other words, all files not yet on tape should be written to
tape to guard against file loss should a file system be wiped by the hardware
maintenance. After the pools are removed from the dCache configuration and the
precious files are drained off, the pools can be turned off and the host node
released for maintenance.

================================================================================
Overview
--------

DRAIN-OFF

1) Remove the pools from the dCache configuration to prevent new files from
   appearing in these write pools.

2) Trigger the sending of all precious files in these write pools to tape. On
   FNDCA, there is a four hour accumulation time per storage group before
   files are written to tape, requiring this trigger step. Most other dCaches
   have the default zero accumulation time configured. The "trigger" action is
   to set the accumulation time to zero.

3) Watch the monitoring and check the pools to detect when, and be sure that,
   all precious files are written to tape.

4) Turn the write pool (dCache software) off and then on again, as a check
   that there are no files left unwritten to tape. If any files from these
   pools are sent to Enstore, then these will need to be investigated.

5) Turn the write pool (dCache software) off. Release the hardware for
   maintenance work.

--------------------------------------------------------------------------------

RECOVERY OF NORMAL CONDITIONS (after maintenance)

R1) Confirm that the dCache installation is still in place in the file system.
    If not, then it must be re-installed, which may be a trivial task if it is
    only the pool that is lost.

R2) The write pools should start up automatically with the host node reboot.
    If not, then start the write pools manually. This will re-establish the
    accumulation time settings in either case.

R3) Replace the pools in the dCache configuration.

R4) Watch the monitoring to be sure the pools are working properly at some
    level (depending on time available).

================================================================================
Detailed Procedure
------------------

DRAIN-OFF

1) Remove the pools from the dCache configuration to prevent new files from
   appearing in these write pools.

1.1) Login as user "enstore" to fndca.fnal.gov

1.2) $ cd dcache-deploy/config

1.3) Make a copy of the file PoolManager.conf and edit the copy to comment out
     the assignment of the pools on the node in question to a pool group.
     After doing this for the pools on w-stkendca10a, one would see:

       # psu addto pgroup writePools w-stkendca10a-1
       # psu addto pgroup writePools w-stkendca10a-2
       # psu addto pgroup writePools w-stkendca10a-3

     where the # character is interpreted by dCache as a comment line. Save
     the file. Diff the copy against the original PoolManager.conf: BE SURE
     NOT TO LEAVE ANY STRAY TYPOS IN OR REMOVE ANYTHING FROM THIS FILE UNLESS
     YOU REALLY KNOW WHAT YOU ARE DOING! If only these 3 lines show in the
     diff, then copy the changed file over the original.
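     A minimal sketch of that copy/edit/diff cycle (the ".drain" suffix on the
     working copy is an arbitrary illustrative choice, not a dCache
     convention):

       $ cd ~enstore/dcache-deploy/config
       $ cp PoolManager.conf PoolManager.conf.drain    # working copy
       $ vi PoolManager.conf.drain                     # comment out the three "psu addto" lines
       $ diff PoolManager.conf PoolManager.conf.drain  # expect ONLY the three lines to differ
       $ cp PoolManager.conf.drain PoolManager.conf    # install only if the diff is clean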
1.4) Cause the PoolManager component of dCache to re-read its configuration
     file. This step carries some risk that all other client requests may be
     affected if the PoolManager becomes confused by typos or missing
     information from the file. If you have any questions about this step,
     feel free to ask a dCache developer to lend a pair of eyes to be sure it
     is done as stated 8^).

     Still as user "enstore" on fndca, launch the "dcache" command interpreter
     and execute some commands with:

       $ dcache slm-enstore-52108
       > exit
       Shell Exit (code=0;msg=(0) )
       Back to .. mode
       .. > set dest PoolManager@dCacheDomain
       PoolManager@dCacheDomain > help
       say <arg-0>
       [LOTS MORE OUTPUT. OK, YOU ARE TALKING TO THE POOLMANAGER NOW.]
       PoolManager@dCacheDomain > reload -yes
       PoolManager@dCacheDomain > exit
       Back to .. mode
       .. > exit
       Connection to fndca3a.fnal.gov closed.

1.5) Login as user "enstore" to fndcam.fnal.gov; cd dcache-log; tail -f
     LOG-[today] to be sure there are no "unit not found" or similar errors
     coming from the PoolManager (symptoms of its data structures becoming
     confused). [TODO] Add example of bad log output.

     If there are signs of PoolManager confusion, then contact a dCache
     developer to do an immediate restart of the dCacheDomain. dCache is
     practically unusable in this state, so one must recover it ASAP.
     PROBLEM REACTION in more detail [TODO]

2) Trigger the sending of all precious files in these write pools to tape. On
   FNDCA, there is a four hour accumulation time per storage group before
   files are written to tape, requiring this trigger step. Most other dCaches
   have the default zero accumulation time configured. The "trigger" action is
   to set the accumulation time to zero.

2.0) The setup you will be changing is initially defined in the files with
     names like "stkendca10a.write-pool-3.setup".

2.1) Change the pool configuration for each of these write pools (one at a
     time) so that it does not wait until an accumulation trigger goes off,
     but writes all precious files to tape now. Still as user "enstore" on
     fndca, launch the "dcache" command interpreter and execute some commands
     with:

       $ dcache slm-enstore-52108
       > exit
       Shell Exit (code=0;msg=(0) )
       Back to .. mode
       .. > set dest w-stkendca10a-1@w-stkendca10a-1Domain
       w-stkendca10a-1@w-stkendca10a-1Domain > help
       mover remove <jobId>
       [LOTS MORE OUTPUT. OK, YOU ARE TALKING TO POOL w-stkendca10a-1 NOW.]
       w-stkendca10a-1@w-stkendca10a-1Domain > queue define class enstore * -expire=0
       w-stkendca10a-1@w-stkendca10a-1Domain > exit
       Back to .. mode
       .. > exit
       Connection to fndca3a.fnal.gov closed.

2.2) Check: was step 2.1 done for each of the pools on this node? (A sketch
     for generating the per-pool command sequence follows below.)
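     To avoid typos when repeating step 2.1 for each pool on the node, one can
     generate the exact command sequence with a small shell loop and paste its
     output into the "dcache" interpreter. A minimal sketch, assuming the
     three pools on w-stkendca10a; adjust names as needed:

       $ for i in 1 2 3; do
       >   echo "set dest w-stkendca10a-$i@w-stkendca10a-${i}Domain"
       >   echo "queue define class enstore * -expire=0"
       >   echo "exit"
       > done

     The loop only prints text; it does not talk to dCache itself, so there is
     no risk in running it.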
3) Watch the monitoring and check the pools to detect when, and be sure that,
   all precious files are written to tape.

3.1) First you want to see if the triggers worked. Check the Pool Usage web
     page to see if there is any precious space in these pools to begin with.
     If so, are any stores showing in the Pool Request Queues page for these
     pools after several minutes? (Be patient; there is a big time lag in the
     monitoring.)

3.2) When the count of active stores for these pools goes back to zero and the
     precious space in each pool is zero, then all precious files are written
     to tape. We are paranoid though, so we will test this several ways.

3.3) Grep for "precious" in the pool control state files. The state of each
     dCache file is stored in an auxiliary file co-located (almost) with the
     data file. For the pool w-stkendca10a-1 (write pool, node stkendca10a,
     first instance), as user "enstore" on w-stkendca10a:

       [enstore@stkendca10a enstore]$ cd dcache-deploy/config/
       [enstore@stkendca10a config]$ cat w-stkendca10a-1.poollist
       w-stkendca10a-1 /diska/write-pool-1 [AND MORE STUFF...]
                       ^^^^^^^^^^^^^^^^^^^
       This tells you where to find the pool in the file system.

     The data is in the "data" sub-directory, the auxiliary files in
     "control":

       [enstore@stkendca10a config]$ cd /diska/write-pool-1
       [enstore@stkendca10a write-pool-1]$ cd control
       -bash-2.05b# grep precious 0*
       00300000000000000173AD58:precious
       00300000000000000173AE30:precious
       00300000000000000173AF00:precious
       00300000000000000173AFA0:precious
       00300000000000000173B2F0:precious
       00300000000000000173FE98:precious
       00300000000000000173FF40:precious
       00300000000000000173FFD8:precious
       0030000000000000017400F8:precious

     This is an example of precious files STILL IN CACHE! BAD! The grep should
     find no matches if all precious files have been written to tape.

3.4) Check: was step 3.3 done for each of the pools on this node? (A sketch
     that loops the grep over all pools follows below.)
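     A possible loop for step 3.3/3.4, reading each pool's base directory from
     its poollist file (assumes the poollist format shown above: pool name
     first, base directory second). Run as user "enstore" on the pool node;
     each pool should report a count of 0:

       $ cd ~enstore/dcache-deploy/config
       $ for p in w-stkendca10a-1 w-stkendca10a-2 w-stkendca10a-3; do
       >   dir=`awk '{print $2; exit}' $p.poollist`             # base directory of the pool
       >   echo "$p ($dir):"
       >   grep -l precious $dir/control/0* 2>/dev/null | wc -l # count of files still precious
       > done

     On very full pools (tens of thousands of files) the glob may exceed the
     shell's argument limit; fall back to find/xargs in that case. This is
     only a convenience wrapper around step 3.3, not a replacement for the
     check_precious script in step 3.5.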
3.5) Use the script "check_precious" to interrogate the dCache system to see
     if there are precious or confused-state files in these pools. As user
     "root" on the pool node in question:

       -bash-2.05b# sh check_precious -all
       Thu Dec 21 08:52:04 CST 2006 Scanning for precious files on stkendca10a,
       Skipping files dated "Feb 30 " "Feb 30 ", also setting find ctime="", Excluding "" files.
       Thu Dec 21 08:52:04 CST 2006 w-stkendca10a-1 /diska/write-pool-1
       Thu Dec 21 08:52:06 CST 2006 Find of precious files produced 0 files to check
       Thu Dec 21 08:52:06 CST 2006 w-stkendca10a-2 /diskb/write-pool-2
       Thu Dec 21 08:52:08 CST 2006 Find of precious files produced 0 files to check
       Thu Dec 21 08:52:08 CST 2006 w-stkendca10a-3 /diskc/write-pool-3
       Thu Dec 21 08:52:11 CST 2006 Find of precious files produced 0 files to check
       Thu Dec 21 08:52:11 CST 2006 Finished on stkendca10a

     THIS IS GOOD OUTPUT. No precious files to check and no warnings about
     files in a confused state. The "-all" switch eliminates the default 1 day
     age requirement on precious files that normally prevents files just
     arrived in cache from being "checked". Here, for this procedure, we want
     to consider all precious files, young and old.

     PROBLEM REACTION: Precious files still in cache after many hours.

     - Is there a backlog of tape stores in Enstore? This drain-off can take
       hours and proceed at a very uneven pace. Check the Enstore web pages to
       see if tape drives are readily available, for instance.

     - There may be "stale precious" files, which are in fact on tape but have
       been left in a precious state. Long story. Files in this state require
       some investigation to be absolutely sure each is on tape. Files that
       are on tape can be manually moved from "precious" to "cached" state.
       Proceed with care!

       a) For each file still precious, check the Layer 1 and Layer 4
          information in PNFS to be sure it is consistent with the file being
          on tape. These layers must exist and contain a tape label.
          [TODO] Better way for admins?

          a.1) Translate the PNFSID (eg. 0030000000000000016C9FC0) to a PNFS
               path: cat the SI file. Pick out the <directory> and <filename>.
          a.2) cd <directory>
          a.3) cat '.(use)(4)(<filename>)'
               This file is the Layer 4 information and must exist and contain
               reasonable values for the first five and last three lines
               (fields).

          Example:

            -bash-2.05b# cat SI-0030000000000000016C9FC0
            [BINARY DATA DUMP... pick out the pathname (edited version below)]
            pathto /pnfs/fnal.gov/usr/exp-db/daily/d0-offline/d0oflump/2006/12-December/12/channel02/D0OFLUMP_t608955673_s3690_p31 xpt
            -bash-2.05b# cd /pnfs/fnal.gov/usr/exp-db/daily/d0-offline/d0oflump/2006/12-December/12/channel02
            -bash-2.05b# cat '.(use)(4)(D0OFLUMP_t608955673_s3690_p31)'
            VOB415
            0000_000000000_0000024
            2019328000
            daily-d0-offline
            /pnfs/fnal.gov/usr/exp-db/daily/d0-offline/d0oflump/2006/12-December/12/channel02/D0OFLUMP_t608955673_s3690_p31
            0030000000000000016C9FC0
            CDMS116593937200000
            stkenmvr10a:/dev/rmt/tps0d0n:479000018392
            3835795016

            [THIS IS GOOD INFO. The file exists and the Tape label = VOB415,
            Tape Drive = stkenmvr10a:/dev/rmt/tps0d0n:479000018392]

       b) The "check_precious" script will provide instructions on how to
          change the file state from "precious" to "cached".

     - When in doubt at this point, consult a dCache developer or an Enstore
       developer (to see if the file is on tape). Better to be careful than to
       risk data loss.

4) Turn the write pool (dCache software) off and then on again, as a check
   that there are no files left unwritten to tape. If any files from these
   pools are sent to Enstore, then these will need to be investigated.

4.1) As user "root" on the pool node in question [NOT NOT NOT fndca.fnal.gov!]

       -bash-2.05b# ps -auxwww | grep real-encp | grep -v grep | wc -l

     (The "grep -v grep" keeps the grep process itself out of the count.) If
     this does NOT print "0" (zero), then there are still some tape writes
     going to Enstore, and they may be in a bad or stuck state. Consult a
     dCache developer.

4.2) As user "root" on the pool node in question [NOT NOT NOT fndca.fnal.gov!]

       -bash-2.05b# /etc/rc.d/init.d/dcache-boot stop

     Then, wait a minute or two.

       -bash-2.05b# /etc/rc.d/init.d/dcache-boot start

     Then, wait a minute or two.

4.3) Repeat at least steps 3.2 and 4.1 to be sure no new file stores are
     initiated by this. This is another check that all files have been flushed
     to tape.

5) Turn the write pool (dCache software) off. Release the hardware for
   maintenance work.

5.1) As user "root" on the pool node in question [NOT NOT NOT fndca.fnal.gov!]

       -bash-2.05b# /etc/rc.d/init.d/dcache-boot stop

     Then, wait a minute or two.

5.2) Check one last time that there are no stray JVMs or real-encps running:

       -bash-2.05b# ps -auxwww | grep real-encp | grep -v grep | wc -l
       -bash-2.05b# ps -auxwww | grep bin/java | grep -v grep | wc -l

     If these both print "0" (zero), then the node can be released for
     hardware maintenance. (A small wrapper sketch follows below.)
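     The two checks in 5.2 can be wrapped in a small script so the release
     decision is a single yes/no. A minimal sketch (script wording is
     illustrative); run as user "root" on the pool node:

       #!/bin/sh
       # Pre-release check: the node is quiet only if both counts are zero.
       # "grep -v grep" keeps each grep process out of its own count.
       encp=`ps auxwww | grep real-encp | grep -v grep | wc -l`
       java=`ps auxwww | grep bin/java  | grep -v grep | wc -l`
       echo "real-encp processes: $encp   dcache JVMs: $java"
       if [ $encp -eq 0 -a $java -eq 0 ]; then
           echo "OK: node can be released for hardware maintenance"
       else
           echo "NOT ready: dCache processes still running"
       fi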
--------------------------------------------------------------------------------

RECOVERY OF NORMAL CONDITIONS (after maintenance)

R1) Confirm that the dCache installation is still in place in the file system.
    If not, then it must be re-installed, which may be a trivial task if it is
    only the pool that is lost.

R1.1) Find out the configured base directory for a pool. Confirm the directory
      structure is intact. If a pool called write-pool-N is on /diska, then
      the following directories should exist, or be created, owned by root
      with open permissions [TODO] could perms be reduced to 755 here? :

        /diska/write-pool-N
        /diska/write-pool-N/data
        /diska/write-pool-N/control

      To do this for w-stkendca10a-1 as an example, as user "root" on
      stkendca10a, do:

        -bash-2.05b# cd ~enstore/dcache-deploy/config
        -bash-2.05b# cat w-stkendca10a-1.poollist
        w-stkendca10a-1 /diska/write-pool-1 [AND MORE STUFF...]
                        ^^^^^^^^^^^^^^^^^^^
        This tells you the name of the base directory of the pool in the
        file system.

      Let's go there and see what directories exist:

        -bash-2.05b# cd /diska
        -bash-2.05b# ls -alF
        total 4
        drwxr-xr-x   3 root root     25 Feb  3  2006 ./
        drwxr-xr-x  26 root root   4096 Dec 21 10:40 ../
        drwxrwxrwx   4 root root     62 Dec 21 12:51 write-pool-1/

      So far so good. The base directory exists. Underneath the base dir,
      control and data are there...

        -bash-2.05b# ls -alF write-pool-1/
        total 368
        drwxrwxrwx   4 root root     62 Dec 21 12:51 ./
        drwxr-xr-x   3 root root     25 Feb  3  2006 ../
        drwxrwxrwx   2 root root 196608 Dec 21 11:05 control/
        drwxrwxrwx   2 root root  94208 Dec 21 11:05 data/
        -rw-r--r--   1 root root      0 Dec 21 12:51 RepositoryOk
        lrwxrwxrwx   1 root root     65 Feb  3  2006 setup -> /home/enstore/dcache-deploy/config/stkendca10a.write-pool-1.setup

R1.2) There needs to be a symbolic link called "setup" in the base directory
      of the pool that points back into the configuration area at a file
      called something like <pool name>.setup. In the last listing of (R1.1),
      we see that sym-link. If it were missing, one would create it.

R1.3) Check: were steps R1.1 and R1.2 done for each of the pools on this node?

R2) The write pools should start up automatically with the host node reboot.
    If not, then start the write pools manually. This will re-establish the
    accumulation time settings in either case.

R2.1) After several minutes, check http://fndca3a.fnal.gov:2288/cellInfo to be
      sure the cells for these pools are listed with a creation time. If they
      show up, then (R2) is done; proceed to (R3). If they do not show up,
      then proceed with (R2.2).

R2.2) Check whether the pool JVMs are running. As user "root" on the pool node
      in question [NOT NOT NOT fndca.fnal.gov!]

        -bash-2.05b# ps -auxwww | grep bin/java | grep -v grep | wc -l

      If the count printed out matches the number of pools that are supposed
      to be running, then the pools are in fact running (monitoring can
      sometimes be slow to show newly started pools). If the count does not
      match, then proceed with (R2.3).

R2.3) If not all of the pools started up with a reboot (or there was no reboot
      of the system), then as user "root" on the pool node in question
      [NOT NOT NOT fndca.fnal.gov!]

        -bash-2.05b# /etc/rc.d/init.d/dcache-boot stop
        -bash-2.05b# /etc/rc.d/init.d/dcache-boot start

      The "stop" here covers all bases, in case some pools started but others
      did not, and in case there are some logging fifos still intact, which
      have been known to block successful start-up. Repeat step (R2) until
      success, or contact a dCache developer for assistance.

R3) Replace the pools in the dCache configuration.

R3.1) Make a copy of the file PoolManager.conf and edit the copy to UN-comment
      the assignment of the pools on the node in question to a pool group.
      After doing this for the pools on w-stkendca10a, one would see:

        psu addto pgroup writePools w-stkendca10a-1
        psu addto pgroup writePools w-stkendca10a-2
        psu addto pgroup writePools w-stkendca10a-3

      where the # character that is removed was interpreted by dCache as a
      comment marker. Save the file. Diff the copy against the original
      PoolManager.conf: BE SURE NOT TO LEAVE ANY STRAY TYPOS IN OR REMOVE
      ANYTHING FROM THIS FILE UNLESS YOU REALLY KNOW WHAT YOU ARE DOING! If
      only these 3 lines show in the diff, then copy the changed file over the
      original. (A sed-based sketch of this edit follows below.)
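      As in step 1.3, a working copy plus a diff keeps this edit safe. If you
      prefer sed to hand-editing, something like the following strips the
      comment marker from exactly these three lines (a sketch; the ".restore"
      suffix is arbitrary, and older sed versions without -i require
      redirecting to a temporary file instead):

        $ cd ~enstore/dcache-deploy/config
        $ cp PoolManager.conf PoolManager.conf.restore
        $ sed -i 's/^# \(psu addto pgroup writePools w-stkendca10a-\)/\1/' PoolManager.conf.restore
        $ diff PoolManager.conf PoolManager.conf.restore  # expect ONLY the three lines to differ
        $ cp PoolManager.conf.restore PoolManager.conf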
R3.2) Cause the PoolManager component of dCache to re-read its configuration
      file. This step carries some risk that all other client requests may be
      affected if the PoolManager becomes confused by typos or missing
      information from the file. If you have any questions about this step,
      feel free to ask a dCache developer to lend a pair of eyes to be sure it
      is done as stated 8^). Still as user "enstore" on fndca, launch the
      "dcache" command interpreter and execute some commands with:

        $ dcache slm-enstore-52108
        > exit
        Shell Exit (code=0;msg=(0) )
        Back to .. mode
        .. > set dest PoolManager@dCacheDomain
        PoolManager@dCacheDomain > help
        say <arg-0>
        [LOTS MORE OUTPUT. OK, YOU ARE TALKING TO THE POOLMANAGER NOW.]
        PoolManager@dCacheDomain > reload -yes
        PoolManager@dCacheDomain > exit
        Back to .. mode
        .. > exit
        Connection to fndca3a.fnal.gov closed.

R4) Watch the monitoring to be sure the pools are working properly at some
    level (depending on time available).

R4.1) http://fndca3a.fnal.gov:2288/cellInfo
      This page should show the affected pool cells alive with a reasonable
      (less than one second) ping time. Allow a few refreshes before assuming
      a long ping time indicates a problem. There can be very infrequent long
      pings if a cell is busy, or a transient ping timing issue.

R4.2) http://fndca3a.fnal.gov:2288/queueInfo
      This page would ideally show the pools being used... some active movers,
      restores, or stores. Of course, if there are no clients using files in
      these pools, then you will not get this positive feedback, and that is
      OK. If the system is busy, all other pools are busy, and these pools are
      not, then there is a problem, and one should consider contacting a
      dCache developer for a second pair of eyes and perhaps some suggestions.

R4.3) http://fndca3a.fnal.gov:2288/billing
      This page would ideally show some successful movers, restores, or stores
      for the pool(s) in question. It should definitely not show any errors.
      Billing errors on this page can be due to harmless timeouts, but such
      timeouts take hours before occurring. Errors on a newly started pool are
      almost always due to a serious file, metadata, or software problem.

R4.4) Grep the logfile for the log messages from the re-introduced pool(s).
      Login as user "enstore" to the pool node hosting the pool in question.
      For example, if the pool is w-stkendca10a-2, which is on stkendca10a:

        [enstore@stkendca10a ~]$ cd dcache-log
        [enstore@stkendca10a dcache-log]$ tail -f w-stkendca10a-2Domain.log

      - On FNDCA, the logfiles are now located on the same host as the dCache
        component that is writing to them.

      [TODO] Example of "good" log output from pool; how long to wait for
      inventory to be completed (can take minutes in pools with 30k+ files).
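      When scanning the pool log in R4.4, a grep for common trouble markers
      can be quicker than reading the whole tail. A minimal sketch; the error
      strings are illustrative, not an exhaustive list of what a pool may log:

        [enstore@stkendca10a dcache-log]$ grep -iE 'error|exception|unit not found' w-stkendca10a-2Domain.log | tail -20

      An empty result is the hoped-for outcome; anything it prints deserves a
      look before declaring the pool healthy.

================================================================================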