d0_kback v_05.1

Changes from previous:
   - modified to allow backup to Mammoth tape
   - removed a couple silly bugs

-----------------------------------------------------------------

How to back up your UNIX project disk at FNAL.

Using Computing Division's ocs and fmb packages, I have written a set
of scripts to let you back up your disk drives with a few keystrokes
each week. The data from your disk drive(s) will be copied to tapes
you place in the permanent tape vault at the Feynman Computing Center.

This document shows you how to make regular backups using my scripts.
It seems like quite a lot, but the first nine steps are one-time only.
After you are set up, you only type a one-line command any time you
want to make backups. I emphasize: steps 0-9 below are only done once
in your life; after that, you just do step 10 when you feel like it.

Here's the initial setup:

0) If you can acquire tapes that are already in the FCC tape vault,
   then do so and skip to step (4). Otherwise, you will need to vault
   your own tapes with steps (1) through (3).

1) Acquire some "exabyte" tapes, either 8mm or Mammoth, and some
   official tape label names (called VSN's). As an example, Lee
   Lueking has taken on the responsibility of assigning tape names
   for all of DZero. If you are a member of DZero, send your request
   to lueking@fnal.gov with the following info:

     i)   Reason for request (Monte Carlo, data storage, etc)
     ii)  Maximum number of tapes needed (please let him know if the
          number of tapes may increase in the near future).
     iii) Name/email/phone

2) Label each tape on the edge with the VID, then label the top of
   the tape with the following info:

     TAPE VID
     experiment name, group name
     your name
     your phone
     your email

3) Fill out a vault request form (available at FCC or I have an old
   one scanned in at http://www-d0.fnal.gov/~jkrane/images/vsn.ps ).
   To add your tapes to the FCC vault, go to the back side of the FCC
   building.
   Looking from the parking lot, on the right side of the building,
   proceed straight through the two sets of doors, straight across
   the room, into the hall, and go to the window on your left.
   (You'll feel like you're ordering take-out...) Just pick up the
   phone if nobody is there.

4a) Create a file ".rhosts" if you don't have one in your home area
   already. Do "man .rhosts" at the command line for more info. This
   file will (among other things) let you do backups of disks that
   are mounted on one machine but are accessible to all machines;
   this step makes the NFS mounting transparent to you. You will need
   this file on each machine that has its own drive system. (At
   DZero, you want to put the file in your home areas on d0cha,
   d0chb, and d02ka.)

4b) If you are in the Fermilab "strengthened realm", create a file
   ".k5login" if you don't have one in your home area already. In the
   file, put the following:

     jkrane@PILOT.FNAL.GOV
     jkrane/cron/d0cha.fnal.gov@PILOT.FNAL.GOV
     jkrane/cron/d0chb.fnal.gov@PILOT.FNAL.GOV
     jkrane/cron/d0mino.fnal.gov@PILOT.FNAL.GOV

   ...but using your username in place of "jkrane". This will give
   your batch jobs the authority to rsh in the strengthened realm.

   At Fermilab, we need both 4a) and 4b).

5) Create a directory (on d0cha or d0chb or d0mino or ???) from which
   you wish to operate the scripts. Add the line "setup kback" to
   your .login file (also type it at the command line the first
   time).

6) From your backup directory, create a "logs" subdirectory, then
   copy the input file to your area:

     mkdir logs
     cp $KBACK_DIR/x_backup.input .

7) Edit x_backup.input to reflect YOUR information instead of the
   example. I hope this is self-explanatory if you look at the file.

   You need LSF batch privileges to run the script in its current
   form. (All I mean is: you need to be able to submit batch jobs to
   a queue.) If you don't have LSF privs, you should request them
   from d0-admin@fnal.gov.

Congratulations, you are now ready to use the utility!
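
Steps 5 and 6 can be collected into one small helper. What follows is
only a sketch, written in POSIX sh rather than the csh that the kback
scripts themselves use. The function name kback_init_dir is made up
for illustration and is not part of kback, and the sketch assumes you
have already done "setup kback" so that KBACK_DIR is defined.

```shell
# kback_init_dir DIR
# Hypothetical helper (not part of kback) covering steps 5 and 6:
# create DIR with a "logs" subdirectory and copy in the example
# x_backup.input, unless an edited copy is already there.
kback_init_dir() {
    dir=$1
    mkdir -p "$dir/logs" || return 1

    # KBACK_DIR is only defined after "setup kback".
    if [ -z "${KBACK_DIR:-}" ] || [ ! -f "$KBACK_DIR/x_backup.input" ]; then
        echo "x_backup.input not found; run 'setup kback' first" >&2
        return 1
    fi

    if [ -f "$dir/x_backup.input" ]; then
        # Never clobber an input file you may already have edited.
        echo "$dir/x_backup.input already exists; leaving it alone"
    else
        cp "$KBACK_DIR/x_backup.input" "$dir/"
    fi
}
```

For example, "kback_init_dir $HOME/backup" (the path is just a
placeholder) would leave you ready to edit x_backup.input in step 7.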

8) When you run with brand new tapes, you must label them first. Go
   to your backup directory and type "klabel" to assign the internal
   ID of each tape to match its external VSN. This is automated and
   extracts the info from your x_backup.input. It will take several
   hours to cycle through several tapes. Log files will appear in
   your log directory, or you can monitor the batch jobs with
   "bjobs", "bpeek", etc.

9) Type the following and provide your kerberos password. Make sure
   you are on a secure node when you do this!

     kcroninit

   (If this command fails, you may need to "setup kcroninit" and then
   try again. You should only need to do this once for every machine
   on which you run kback.)

That takes care of all the one-time setup! Congrats.

When you want to back up the drives that you listed in the input
file, you need only do the following.

10) In your backup directory, type

     kback 1

   The "1" argument backs up all drives on your "sequence 1" tapes.
   A "2" would give you sequence 2. You can add sequence 3 or 4 lists
   to x_backup.input if you wish.

   If you want to redo a single tape's backup (because of an odd
   failure, for instance), type

     kback x prqy99

   The "x" says to back up only one tape. The "prqy99" is an example
   of a tape VSN. Replace it with YOUR VSN when you do this command!
   The script will search through the input file and put the proper
   data on that tape.

If you forget how to do the arguments to kback (or krecover or
klabel) then type the command with no arguments to get a brief help
message. (Also, if you forgot to kcroninit, the script *should*
automatically prompt you.)

The scripts put a file called .lastfull.d0cha (or chb, etc) in the
top of each main directory they back up. This indicates to everyone
the date of the last full backup. If you don't have privilege to
write to a directory you back up, you'll get an error message to that
effect in the log file, but the backup will still succeed.

The scripts will send you mail as they finish.
Always check the logfiles for funny errors. The scripts attempt to
self-diagnose, but somebody needs to look at the file.

If you want to monitor the progress of your jobs, try the commands
"bjobs" (get a list of your batch jobs, running and pending),
"bpeek n" (look at output of batch job number "n"), and "ocs_tape"
(list tape drives available on this machine and who is using them).

Questions? Send me email.

Want to look at the scripts yourself? After you "setup kback", look
in $KBACK_DIR .

Have some suggested changes? Definitely let me know.

Consider subscribing yourself to the "kback_users" mailing list on
fnal.fnal.gov. This is just like any other FNAL mailing list; find
more info at http://www-email.fnal.gov/list-use.asp . The list is
completely open, so you can subscribe, unsubscribe, or send email to
the list at your own discretion.

- John Krane
  jkrane@fnal.gov

-------------------------------------------------------------

Version history

v1.0  Official release, scripts will
        - double-check tape label is correct one (we don't want to
          overwrite somebody else's tapes!)
        - check that tape is write-enabled
        - check for any other tape-load failure
        - copy the requested files from disk to tape
      The scripts will not
        - check to make sure the requested files will *fit* on the
          tape
        - spill over to another tape
        - submit themselves based on what day it is (write your own
          "cron" job if you want)

v2.0  Many little fixes to handle exceptional cases. Can now back up
      chb drives from cha, 2ka, and vice versa. Removed "_err" files,
      now error logs are in one main log file. Can now handle >10
      tapes per set (thanks V.Sorin). Diagnoses tape drive failure.

v3.0  Backup and recover commands are echoed to screen first, then
      executed; this helps with debugging fmb. Hacked fmb_recover to
      avoid the "fixup" command if 'x_recover_hack=true' in the input
      file. Removed bug in krecover that was introduced in v2.

v4.0  Added compatibility with kerberos. Spruced code slightly.

v4.1  Found krecover had a kerberos problem: fixed.

v4.2  Learned that setting up kerberos inside kback is a no-no. Found
      a wiser way to detect kerberos-enabled systems. Also tried to
      avoid sending df errors to screen.

v5.0  Added capability to handle exabyte Mammoth tapes. These have
      18 to 60 GB holding capacity, compared to only 5.5 GB for 8mm.
      A new switch resides at the end of x_backup.input to enable
      Mammoth tape backups.

v5.1  Fixed several "if" statements that had poor spacing in places
      where csh is sensitive to spacing.

Future plans: add option for partial backups. Computing Division
says fmb will be dropped, so the kback scripts may become obsolete
before this is implemented.

-------------------------------------------------------------

Common problems

1) "rsh error": Your .cshrc or .login files echo text to the screen.
   This will cause rsh commands to fail. Remove these "echo" commands
   from the files or put them in if-then blocks as in the man pages
   for rsh.

2) "Cannot open #blah#": Emacs won't let you read files with the #
   on them. Fmb doesn't seem to ignore those files, despite having a
   mechanism to exclude them in principle. This error is harmless, so
   please ignore it.

3) "No space left on device": You are trying to put too much stuff
   on one tape. Divide the disk into two sets and put it on several
   tapes. (Note: if you get this error while the scripts are looking
   for the tape label, then you have forgotten to label the tapes in
   the first place! Run klabel first, then try again when all tapes
   have a label.)

4) "/bin/sh: /mydirectory: permission denied": Somebody has set
   their directory so the group can't look in there... including
   your batch job. Have them "chmod a=rwx" or something similar and
   try again.
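
Since the scripts do not check that your files will fit on a tape,
you can head off problem 3 with a rough check of your own before you
submit. The sketch below is an assumption-laden illustration in POSIX
sh, not part of kback: the function name fits_on_tape is made up, it
compares the "du" disk usage of a directory with a capacity you
supply in GB (5.5 for 8mm, 18 to 60 for Mammoth, per the v5.0 note),
and it knows nothing about any overhead fmb adds on tape, so leave
yourself some margin.

```shell
# fits_on_tape DIR CAPACITY_GB
# Hypothetical helper (not part of kback). Returns success when the
# disk usage of DIR, as reported by du, is no larger than CAPACITY_GB.
fits_on_tape() {
    dir=$1
    [ -d "$dir" ] || return 2

    # Convert the capacity from GB to KB; awk handles values like 5.5.
    cap_kb=$(awk -v gb="$2" 'BEGIN { printf "%d", gb * 1024 * 1024 }')

    # du -sk prints the total usage of DIR in units of 1024 bytes.
    used_kb=$(du -sk "$dir" | awk '{ print $1 }')

    [ "$used_kb" -le "$cap_kb" ]
}
```

For example, with /mydirectory standing in for one of the areas
listed in your x_backup.input:

  fits_on_tape /mydirectory 5.5 || echo "too big for one 8mm tape"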