A) Scope
- purpose of this software
- The purpose of this software is to move SDSS imaging data stored in the standard directory structure to the STK Enstore tape robot system and to restore it to client machines.
- back up all files, selected fields, or selected file types of a run
- Assume that the set of files for the run/rerun is complete (i.e., it passes the status check for each fileType you plan to back up).
- target environments
- STK Enstore system
- sdssdp2/3 (DEC Alpha OSF1 boxes) (required)
- fsgi03 (required)
- collaborators' machines that they need to get data to (desired, not required)
B) Packaging
- The product for storing data to the robot will be part of the dp product.
- The product for restoring data from the robot will be part of the dp product.
C) Design
- user interface (user documentation)
- copyToEnstore run rerun {fileTypes all } {fields all}
- copies the files off of the production machines and onto the enstore robot.
- prepRestoreFromEnstore run rerun {fileTypes all} {fields all}
- finds space on dp2/3 for the files you will be restoring from the robot
- restoreFromEnstore run rerun {fileTypes all} {fields all} {rootDir /data/dp3.b/data}
- copies files from the enstore robot back to rootDir
- If any of the required directories does not exist, exit with an error message.
- fileTypes defaults to all. Below is a shorthand list of what all includes. If you do not want all, you can enter a Tcl list of file types instead. framesOutputs and sscOutputs are shortcuts for the file types listed under them below.
- sscOutputs
- scFang
- otherSsc
- astromOutputs
- pspOutputs
- framesOutputs
- fpFieldStats
- fpObjcs
- fpBin
- fpM
- fpAtlas
- fpDiag
- fpPlots
- fpC
- nfcalibOutputs
- ssc
- photo
- logs
- fields can be all, or a set of fields designated by the user, either listed one at a time separated by spaces or, for an inclusive range, given as the first and last field separated by a dash (e.g., 11 12 15-20).
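The real enFieldsExpand is Tcl in dp/etc/enstore.tcl. The Python below is only a sketch of the field-parsing rule just described. It assumes the user's specification arrives as one whitespace-separated string. The names en_fields_expand and all_fields are hypothetical and are not the Tcl interface.

```python
def en_fields_expand(spec, all_fields=None):
    """Expand a field specification such as "11 12 15-20" into a sorted list.

    Hypothetical sketch: 'all' is resolved against all_fields, which the
    caller must supply (the real command would look at the run's data).
    """
    tokens = spec.split()
    if tokens == ["all"]:
        if all_fields is None:
            raise ValueError("'all' needs the list of fields in the run")
        return sorted(set(all_fields))
    fields = set()
    for token in tokens:
        if "-" in token:
            # Inclusive range: first and last field separated by a dash.
            first, last = (int(part) for part in token.split("-", 1))
            if first > last:
                raise ValueError(f"bad field range {token!r}")
            fields.update(range(first, last + 1))
        else:
            fields.add(int(token))
    # Duplicates are dropped and fields are returned in numerical order.
    return sorted(fields)
```

For example, en_fields_expand("11 12 15-18") gives [11, 12, 15, 16, 17, 18].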
- file names of the code
- dp/etc/enstore.tcl
- list of each script with a brief description of its purpose
- Internal Commands
- enGlobalsSet
- sets the global variables for the copy
- used to distinguish between a real copy and a test case
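The sketch below shows one way enGlobalsSet could separate a real copy from a test. It is written in Python, with the hypothetical name en_globals_set. The defaults come from the globals in the pseudocode, and /pnfs/sdss/testing comes from the testing notes. The rule that the second-copy tree is always the primary tree plus "_2c" is an assumption. It matches the production paths and keeps a test run from writing its second copy into /pnfs/sdss/imaging_2c.

```python
DEFAULTS = {"enDirName": "/pnfs/sdss/imaging", "enFileFamilyRoot": "sdss"}


def en_globals_set(en_dir_name=None, en_file_family_root=None):
    """Return the globals for a copy; pass overrides only when testing."""
    settings = dict(DEFAULTS)
    if en_dir_name is not None:
        settings["enDirName"] = en_dir_name
    if en_file_family_root is not None:
        settings["enFileFamilyRoot"] = en_file_family_root
    # Assumption: the second-copy tree is the primary tree + "_2c",
    # as with /pnfs/sdss/imaging and /pnfs/sdss/imaging_2c.
    settings["enDirName_2c"] = settings["enDirName"] + "_2c"
    return settings
```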
- enFieldsExpand fields
- expands the field specification the user gives it into an explicit list of every field
- returns a tcl list of the fields in numerical order
- used by both copyToEnstore and restoreFromEnstore
- enFileTypeExpand run rerun fileTypes
- if the user gives all, framesOutputs, or sscOutputs, expands it into the full Tcl list of individual file types
- returns a tcl list of the file types in the order that they will be put onto tape
- this should make retrieval faster when we get the files back off, because we won't have to fast-forward or rewind the tape
- used by both copyToEnstore and restoreFromEnstore
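Below is a Python sketch of what enFileTypeExpand does: it expands shortcuts and returns the types in tape order. The indentation of the file-type list above was lost, so the grouping here is an assumption. sscOutputs is taken to cover scFang and otherSsc, framesOutputs is taken to cover the fp* types, and tape order is taken to be the order of that list. The run and rerun arguments of the real command are omitted. en_file_type_expand is a hypothetical name.

```python
# Assumed grouping read from the file-type list above (original nesting lost).
SHORTCUTS = {
    "sscOutputs": ["scFang", "otherSsc"],
    "framesOutputs": ["fpFieldStats", "fpObjcs", "fpBin", "fpM",
                      "fpAtlas", "fpDiag", "fpPlots", "fpC"],
}
# Top-level types, in the order they are assumed to be written to tape.
ALL_TYPES = ["sscOutputs", "astromOutputs", "pspOutputs", "framesOutputs",
             "nfcalibOutputs", "ssc", "photo", "logs"]


def _flatten(names):
    """Replace each shortcut by the individual file types it stands for."""
    out = []
    for name in names:
        out.extend(SHORTCUTS.get(name, [name]))
    return out


TAPE_ORDER = _flatten(ALL_TYPES)


def en_file_type_expand(file_types):
    """Expand 'all' and shortcuts; return individual types in tape order."""
    requested = ALL_TYPES if file_types == ["all"] else file_types
    wanted = set(_flatten(requested))
    unknown = wanted - set(TAPE_ORDER)
    if unknown:
        raise ValueError(f"unknown file types: {sorted(unknown)}")
    # Ordering by TAPE_ORDER (not by the user's order) keeps reads sequential.
    return [t for t in TAPE_ORDER if t in wanted]
```

Sorting by a fixed tape order instead of the user's order is what allows a later restore to read the tape front to back.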
- enParamFileMake run rerun fileTypes fields paramFileName direction
- makes a parameter file in /tmp listing all the files to be transferred
- stores this file in /tmp/enstore_$pid.par
- returns 0 if successful
- used by both copyToEnstore and restoreFromEnstore
- enAddToParam paramFile inDir outDir outDir_2c fileName direction
- checks that the file exists and adds it to the parameter file
- used by enParamFileMake to avoid repeating code
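The following is a Python sketch of the parameter file built by enParamFileMake and enAddToParam. For a store, it writes the full list of "source destination" pairs for the primary tree and then the same list with second-copy destinations. The one-pair-per-line, space-separated format is an assumption, and so is the choice to treat a missing source as an error (the scope assumes the run is complete). So is the simplification that the caller passes paths relative to the run/rerun root rather than run, rerun, fileTypes, and fields. All names are hypothetical.

```python
import os


def _en_add_to_param(lines, source, dest):
    """Like enAddToParam: check that the source exists, then record the pair."""
    if not os.path.exists(source):
        raise FileNotFoundError(source)
    lines.append(f"{source} {dest}")


def en_param_file_make(rel_paths, in_dir, out_dir, out_dir_2c, direction,
                       param_file=None):
    """Write 'source destination' lines; direction is 'store' or 'restore'."""
    if param_file is None:
        param_file = f"/tmp/enstore_{os.getpid()}.par"
    lines = []
    if direction == "store":
        # Whole list for the primary copy, then again for the second copy.
        for dest_root in (out_dir, out_dir_2c):
            for rel in rel_paths:
                _en_add_to_param(lines, os.path.join(in_dir, rel),
                                 os.path.join(dest_root, rel))
    elif direction == "restore":
        # Assumption: restores read from the primary tree only.
        for rel in rel_paths:
            _en_add_to_param(lines, os.path.join(out_dir, rel),
                             os.path.join(in_dir, rel))
    else:
        raise ValueError(f"unknown direction {direction!r}")
    with open(param_file, "w") as fh:
        fh.write("".join(line + "\n" for line in lines))
    return 0
```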
- enRunRerunDirMake run rerun
- checks to see if the run/rerun dir is on the robot yet and if it is not makes a run/rerun dir for both copies
- assigns the file family for the run
- calls enFileFamilyToUseGet
- returns 0 if successful or error message if it was not able to make the run/rerun or family
- used by copyToEnstore
- enFileFamilyToUseGet
- returns the next file family to use
- used by copyToEnstore (enRunRerunDirMake)
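The Python below sketches one possible rule for enFileFamilyToUseGet. It keeps filling the highest-numbered sdssiXXX family until that family reaches a tape limit, then starts the next one. It returns the primary family name together with the matching sdssiXXX_2c name, and enRunRerunDirMake would assign the two names to the two trees. Several details are assumptions: the tape counts are supplied by the caller (they can be read from /pnfs/sdss/volmap), XXX is a zero-padded three-digit number starting at 001, and the default limit is 21 tapes, taken from the resolved issues (the overview says roughly 20).

```python
import re


def en_file_family_to_use_get(tapes_per_family, max_tapes=21, prefix="sdssi"):
    """Choose (primary_family, second_copy_family) for a new run/rerun.

    tapes_per_family maps family names such as "sdssi001" to the number
    of tapes already written to them. "_2c" families are ignored here.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    numbered = []
    for name, ntapes in tapes_per_family.items():
        match = pattern.match(name)
        if match:  # skips "sdssi001_2c" and unrelated names
            numbered.append((int(match.group(1)), ntapes))
    if not numbered:
        number = 1
    else:
        number, ntapes = max(numbered)  # highest-numbered family
        if ntapes >= max_tapes:
            number += 1  # current family is full; start a new one
    family = f"{prefix}{number:03d}"
    return family, family + "_2c"
```

This rule checks only whether the current family is already full, so a large run can push a family slightly past the limit.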
- enSubDirMake run rerun fileTypes
- figures out what subdirectories to make on the destination side
- checks to see if directories are there and if they are not there makes them.
- used by copyToEnstore
- open question: do we need this, or can the directories be created recursively?
- enOneFileCp source destination
- stores one file onto the enstore system
- destination will be the directory structure on the robot
- returns 0 if successful and exits if it is not
- used by copyToEnstore and restoreFromEnstore
- enFileFamilyQuery fileFamily verbose
- tells me which files are in a file family
- used by restoreFromEnstore
- should have a verbosity level
- 0 - just the run/reruns in the family
- 1 - run/rerun + fileTypes
- 2 - run/rerun + list of all files in the dirs
- enFileFamilyDir fileFamily verbose
- formatted output of enFileFamilyQuery
- used by restoreFromEnstore
- should have a verbosity level
- 0 - just the run/reruns in the family
- 1 - run/rerun + fileTypes
- 2 - run/rerun + list of all files in the dirs
- enFileFamilyFromRun run rerun
- returns the file family a run is in
- enRunRerunValidate run rerun {enCopyNumber 1}
- verifies that we can read the files back off of the robot
- returns 0 if data are O.K. and 1 if there was a failure
- enFileFamilyVerifyOnRobot fileFamily
- verifies that the fileFamily you want to get is resident on the robot.
- returns 0 if it is and 1 if it is not
- enFileFamilyList
- returns all file families in a tcl list
- enFileFamilyLoad fileFamily
- sends request to load file family into robot
- enFileFamilyUnload fileFamily
- sends request to have file family unloaded from the robot
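enFileFamilyList, and the tape counts needed when choosing a family, can both be read from the volmap tree described in the resolved issues. That tree has one entry per family under /pnfs/sdss/volmap and one entry per tape under each family. This Python sketch assumes those entries can be listed with ordinary directory calls on the mounted /pnfs area. en_tapes_per_family is a hypothetical helper.

```python
import os


def en_file_family_list(volmap="/pnfs/sdss/volmap"):
    """Return all file family names (one subdirectory per family)."""
    return sorted(name for name in os.listdir(volmap)
                  if os.path.isdir(os.path.join(volmap, name)))


def en_tapes_per_family(volmap="/pnfs/sdss/volmap"):
    """Map each family to the number of tape entries listed under it."""
    return {family: len(os.listdir(os.path.join(volmap, family)))
            for family in en_file_family_list(volmap)}
```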
- overview of the logic
- copyToEnstore
- Use enFieldsExpand to determine required fields to act on.
- Use enFileTypeExpand to determine file types to act on.
- Use enParamFileMake to determine a list of all files to be copied from the local disk to the Enstore robot.
- The list should include only the requested fields and file types.
- The list should contain two entries for each file.
- The complete path for the file on the local disk.
- The complete path for the file on the Enstore robot.
- The complete list of all files to back up with destinations should be followed by the same list with slightly different destinations for the second copy on the robot.
- Use enRunRerunDirMake to make a run/rerun directory structure on the Enstore robot, if one doesn't already exist.
- Use enSubDirMake to make the appropriate subdirectories on the Enstore robot.
- Note that the creation of this run/rerun directory requires the assignment of a file family name to the run/rerun directory.
- Use enFileFamilyToUseGet to determine which file family to assign this run/rerun to.
- A file family should consist of roughly 20 tapes.
- For each file in the parameter file, use enOneFileCp to copy the file to the Enstore robot.
- A check is first made to determine whether the file already exists on the Enstore robot.
- If the file is already on the Enstore robot, the checksums of the file on local disk and the copy on the Enstore robot should be compared.
- If this test fails, a warning message should be printed.
- If the file is NOT already on the Enstore robot, copy the file from the local disk with a check to be certain the copy was successful.
- If the file does NOT copy successfully, exit with an error.
- restoreFromEnstore
- Use enFieldsExpand to determine required fields to act on.
- Use enFileTypeExpand to determine file types to act on.
- Use enParamFileMake to determine a list of all files to be copied from the Enstore robot to the local disk.
- The list should include only the requested fields and file types.
- The list should contain two entries for each file.
- The complete path for the file on the local disk.
- The complete path for the file on the Enstore robot.
- For each file in the parameter file, use enOneFileCp to copy the file from the Enstore robot to the local disk.
- A check is first made to determine whether the file already exists on the local disk.
- If the file is already on the local disk, the checksums of the file on local disk and the copy on the Enstore robot are compared.
- If this test fails, recopy the file to the local disk.
- If the file is NOT already on the local disk, copy the file from the Enstore robot with a check to be certain the copy was successful.
- If the file does NOT copy successfully, exit with an error.
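The two per-file policies above differ only in what happens when an existing destination does not match the source. A store warns, and a restore recopies. This Python sketch implements both over a parameter file of "source destination" lines. The copy function defaults to shutil.copyfile as a stand-in for the real encp transfer done by enOneFileCp. MD5 is an assumed checksum, not necessarily the one Enstore keeps. The sketch also assumes both paths are readable as ordinary files, which holds for local disks but not for file data under /pnfs, so on the robot the comparison would have to go through Enstore. en_copy_from_param_file is a hypothetical name.

```python
import hashlib
import os
import shutil
import warnings


def _checksum(path, chunk=1 << 20):
    """MD5 of a file, read in chunks (assumed checksum, see lead-in)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def en_copy_from_param_file(param_file, direction, copy=shutil.copyfile):
    """Copy every pair in param_file; direction is 'store' or 'restore'."""
    with open(param_file) as fh:
        pairs = [line.split() for line in fh if line.strip()]
    for source, dest in pairs:
        if os.path.exists(dest):
            if _checksum(source) == _checksum(dest):
                continue  # already there and identical
            if direction == "store":
                warnings.warn(f"checksum mismatch for {dest}")
                continue  # never overwrite a file on the robot
            os.remove(dest)  # restore: discard the bad local copy
        copy(source, dest)
        # Size check after each transfer; a failure stops the whole run.
        if os.path.getsize(source) != os.path.getsize(dest):
            raise RuntimeError(f"copy of {source} to {dest} failed")
    return 0
```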
make a second copy of all the files in /pnfs/sdss/imaging_2c for data integrity
intermediate parameter file:
- /tmp/enstore_$pid.par
directory structure of input
- standard structure on dp2/3
directory structure of output
- The imaging and imaging_2c trees must be in different file families so that the two copies of each file end up on different volumes. Then, if one of our tapes goes bad, it can be removed without losing any data: the corresponding backup tape in imaging_2c would be copied, and the new volume would be put in the primary imaging directory.
- /pnfs/sdss/imaging
- file family names sdssiXXX
- /pnfs/sdss/imaging_2c
- file family names sdssiXXX_2c
directory structure of intermediate files
Error handling
- return codes
- on all high level commands 0 if successful and 1 if not
- after a file has been copied to the robot, check that its size on the robot matches its size on the input machine. If the sizes match, signal success. If they do not, mv the file to filename.bad and try copying again.
- When copying files off of enstore, also check that the input and output file sizes match, and if the copy did not work, recopy the file.
- recovery processes
- submit the same command that you submitted to start the first copy.
- If there is a failure, the pcmd debug file lists debug information about the file transfer to help determine why it failed.
Pseudocode and Code:
- globals
- enDirName = /pnfs/sdss/imaging
- enDirName_2c = /pnfs/sdss/imaging_2c
- enFileFamilyRoot = sdss
- copyToEnstore
- is Enstore working?
- if not, exit with an error
- enFileTypeExpand fileTypes
- enFieldsExpand fields
- enParamFileMake run rerun fieldList fileList
- enRunRerunDirMake
- enFileFamilyToUseGet
- begin looping through the param file
- enStoreOneFile filename
- exit with an error if there is an error, otherwise continue the loop
- if copyToEnstore is successful, say Success and remove the file in /tmp
- restoreFromEnstore
- is Enstore running?
- if not, exit with an error
- enFileFamilyVerifyOnRobot
- if the file family is not on the robot, exit with a message
- enFileTypeExpand
- enFieldsExpand
- enParamFileMake
- start looping through the param file
- enRestoreOneFile filename enDirName
- if there is an error, exit with a message
- enRestoreOneFile filename enDirName_2c
- if restoreFromEnstore is successful, return with a message and remove the file from /tmp
D) Testing
- regression tests included in the product
- test to ensure that the size of the input file equals the size of the output file.
- before declaring a run successfully backed up, make one last pass through the input list of files and make sure they are all there.
- list of unit tests
- integration test
- How can someone else test that the product works?
- use of enDirName and enFileFamilyRoot as globals to ensure we are writing to and from where we want to be writing during the testing phase
- in normal use, leave these unset; to test, set enDirName to /pnfs/sdss/testing
E) Issues list
- questions that need investigation
- What is the real limit on the number of files in a directory?
- What is the best way to group things into families?
- Is it reasonable to have collaborators remote-mount the tape robot to download data, or will they pull the data they need onto fsgi03 and then transfer it?
- is there a policy on mounting off site machines?
- If they can be mounted, have people tried to work through firewalls etc.?
- What should we do if 2 people start backups at the same time?
- do we have a setup command that will also allow us to use the checkVolume command and other higher level commands?
- How big is one tape?
- how do we handle reruns?
- For example, we won't want to require 1241/0 and 1241/1 to be in the same file family, so that family sizes stay reasonable.
- list of resolved issues (with answers)
- encp returns a status of OK if the operation completed successfully. Does this check that the file sizes are the same?
- Yes. If the operation was not successful, nothing recoverable is left on the tape, so we don't have to worry about moving a failed encp to .bad.
- note: it may be more efficient to use the Pnfs Volume maps to arrange the files for getting them back out of enstore. see sec 1.1.5 of the documentation, I believe that the files will be on the tape in the same order that I copied them, but I will have to save a few runs and then look through the volmap to verify this.
- volmap does show the order that the files were saved onto tape so we will probably want to use it to make our file for getting things off of enstore.
- to setup enstore
- setup -q stken encp
- this needs to be made part of the dp dependencies to make enstore work.
- What is the overhead of copying files over one at a time, or 100 at a time?
- This robot was designed so that files could be moved onto it in real time from experiments, so it supports both random access to files and streaming sequential access to successive files on tape.
- by adding --delayed_dismount to the encp command you can give the library manager a hint that there is more work coming for the volume and it should not be dismounted "too quickly" after transfer is completed.
- How do we get a list of File Families?
- ls -l /pnfs/sdss/volmap
- This gives you a list of family names. If you then do:
ls -l /pnfs/sdss/volmap/$family_name
you will get a list of the tapes that are in that family. If you do:
ls -l /pnfs/sdss/volmap/$family_name/$tape_label
it will tell you what is on that tape.
- How many pigeon holes do we have?
- Bakken said we could reasonably expect at least 400.
- Email from Bakken states:
- Please note that there is not a requirement that an entire file family be in or out of the robot at any given time.
- What is a reasonable family size?
- The tapes go into and out of the robot in groups of 21, so 21 may be the best number of tapes for a family. I talked to Rick, and he said that entering one full group plus a few extras makes more sense than opening up the robot to take out or put in a smaller number of tapes.
- A second point in favor of 21 tapes: in normal backup mode a full night's data takes 7 tapes, so a family would hold about 3 long runs.
- make 2 copies of the same file name at the same time to keep data integrity
- Is there something special we need to have setup to use the pcmd commands?
- enstore pnfs command
- Is there something special we need to have setup to use the ecmd commands?
- enstore pnfs command
- Do you specify file family with mkdir or with each copy?
- the file family is being specified after mkdir by using the following:
- cd $directory (the directory you are putting files into)
- enstore pnfs -- file_family $fileFamilyName
- What happens if you copy a file over twice?
- It can't happen; enstore gives an error that the file is already there.
- How can I find out the number of tapes written to a file family at a time?
- ls /pnfs/sdss/volmap/$filefamily...
- to see the code that does it look in enFileFamilyToUseGet