(or at least how to look like you know what you're doing)
Table of Contents
I. Welcome to the D-Zero Collaboration!
A) What do we do at D-Zero ?
B) Why are 8mm tapes so special?
1) The importance of backups (how we hedge our bets)
II. Tapes, tapes, and more tapes
A) A nice table of tape prefixes, creation platforms, data types, etc.
1) reference to a document that you should read
B) Tapes will be naughty sometimes
C) How naughty is the tape?
1) brief summary of what tape problems may occur
D) How you might find out about such tapes
1) Pick_Events
2) A User's Complaint
3) The importance of attending meetings and presenting yourself
III. The Data Aide (a.k.a. "deception aide")
A) Your Mission
1) Keep the Users "using"
2) Fix what needs fixing
3) Ultimately, "automation"
4) The fair warning
IV. Getting the Big Show started
A) Labeling Tapes
B) Initializing tapes (at d0)
1) Using the robot in the control room area
a) who to talk to about getting the password for the on-line cluster
b) how to initialize tapes
C) Vaulting tapes at the Feynman Computing Center
1) Where am I? (directions to FCC)
2) Do what? for which prefix?
a) prefix and data type breakdown
3) While you are here...
a) notes on copy-protecting tapes at the FCC vault
4) Scream from the rooftop
a) who to let know that you are the current tape-minion
5) The importance of being CHEAP!
a) Suggestions on how to delegate responsibility
6) If you like UNIX...
a) another fair warning
A) An example of a problem tape
1) In detail, how to remedy the problem
a) commands to use
b) how to use them
c) where to find the commands (see also "Appendix of Commands")
d) where to document your work
e) other helpful hints
A) What is the FATMEN catalogue?
1) Why should I care?
B) Using FATMEN to your advantage
1) Examples of...
a) finding backups
b) doing it yourself (what to do if your utilities do not work)
c) another fair warning
VII. Production Data Base - FATMEN's SeQueL
A) What is SQL?
1) Why should I care?
B) Using SQL to your advantage
1) Examples of...
a) finding a creation date
b) also finding backups
c) good estimate of what files should appear on a tape
d) etc.
C) More fair warnings
VIII. Appendix of command and documentation areas
A) COPYMAN
1) commands
a) corresponding examples
B) PROMAN
1) commands
a) corresponding examples
A) Be CREATIVE!
1) making improvements to existing utilities
2) starting from scratch
B) Friendly reminder of Ultimate Goal
( recent collaboration photo)
I.A What do we do at D-Zero ?
D-Zero is the name of the detector we insert into the main ring here at Fermilab to study what happens when a proton and an anti-proton collide. To find out more about your fellow collaborators (we are not really as mean as the photo above would lead you to believe), click on the "D-Zero" marker above. Try to avoid looking around too much; chances are that you will become rather confused. You will learn to better appreciate the WWW as you continue through this document. Bear in mind that throughout this document I am mainly referring to commands and utilities aimed at those more familiar with VAX clusters. If you are more inclined to use UNIX, refer to the VMS to UNIX Translation Table in the man pages for UNIX.
I.B Why are 8mm tapes so special?
We collect a large amount of data and store it in various ways, mainly on 8mm tape. For safety and efficiency, we keep our precious 8mm tapes stored at the Feynman Computing Center, where many users can access our data at any time and from virtually anywhere in the world. Because we keep such a large number of tapes, it is inevitable that some of them will have problems (roughly one third of one percent). That is where you come into the collaborative effort. You, as the data aide, are in charge of seeing that all available data is usable. Bad data is clearly not usable. Not all data can be fixed; some of it needs to be hidden from the general public to ensure that analysis at D0 runs smoothly. In other cases you may have the luxury of finding that a backup copy of some data already exists, which eases your workload considerably. The small portion of our total data set mentioned above will become your priority. Follow the instructions in the following manual pages and you should find this position not so overwhelming.
(original artist's conception of the Feynman Computing Center, a.k.a. FCC)
As mentioned before, many different problems may occur with tapes we write to at D0. These may range from unmountable tapes to those that are physically damaged. In this chapter, we will look at some fundamental problems for tapes, with some brief explanations for corrective action. More detailed explanations will follow later. This chapter is more for reference than concrete instruction.
II.A A nice table of tape prefixes, creation platforms, data types, etc.
Meanings for some abbreviations...
RAW : unprocessed data from the detector
STA : "STAndard" - contains standard reconstructed data
DST : "Data Summary Tape" - contains important data
MDST : "Micro - DST" - contains most important data
Prefix | Data Type | Platform | Creation Process | Vaulting Procedure |
OM, OMA, OMB, OMC, WM | RAW | VAX/VMS | Online Data Taking Cluster | Not your responsibility |
ON, ONA, ONB | STA | SGI/UNIX | Production Farm | Add to blanks file fnsfe or fnsfg |
WNE, WNG, WNF (WNF is not used anymore) | DST, MDST | VAX/VMS | D0fs - FATMEN backup and Backup DST's | Add to vault |
INA (mediatype 6) | DST | VAX/VMS | Backup DST's | Add to vault |
YW, YWA, YWB | STA | VAX/VMS | D0fs - PROMAN STA processing | Add to vault |
ONX | DST (fixed only) | SGI/UNIX | Production Farm | Add to blanks file fnsfe or fnsfg or fnsfd |
YWF, YWG | DST | VAX/VMS | D0fs - PROMAN DST processing | Add to vault |
YY | STA | CRIMSON/UNIX | CRIMSON STA streaming | Add to blanks file fnstfy or fnstfz |
UU | DST/MDST | VAX/VMS | Pick_events backup | Add to vault |
VGL | RAW | VAX/VMS | RAW data for Monte Carlo production | Add to vault |
DZ | DST, MDST | ? | DLT backups of DST's and MDST's | Not your responsibility |
The above table is remarkably similar to the one in Diagnosing and Managing Bad Tapes at the Fermilab D0 Experiment by Josh Norten. Josh held this same position before I came along. It is STRONGLY recommended that you find a copy of this document. The best place to look for a copy is Dorota Genser's old D0 office on DAB5. I will be referring to Josh's document many times, for it has examples that I was unable to put into this document due to some HTML restrictions.
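If you would rather look a prefix up than scan the table, the table can be restated as a small lookup. The following is only a sketch in Python (not one of the existing D0 utilities): the name describe_tape is made up for illustration, and it assumes that a tape label is simply the prefix letters followed by a sequence number, as in ONX101.

```python
# Rows restated from the prefix table: (prefixes, data type, platform, vaulting).
# describe_tape is a hypothetical helper, not an existing COPYMAN/PROMAN utility.
ROWS = [
    (("OM", "OMA", "OMB", "OMC", "WM"), "RAW", "VAX/VMS", "Not your responsibility"),
    (("ON", "ONA", "ONB"), "STA", "SGI/UNIX", "Add to blanks file fnsfe or fnsfg"),
    (("WNE", "WNG", "WNF"), "DST, MDST", "VAX/VMS", "Add to vault"),  # WNF no longer used
    (("INA",), "DST", "VAX/VMS", "Add to vault"),
    (("YW", "YWA", "YWB"), "STA", "VAX/VMS", "Add to vault"),
    (("ONX",), "DST (fixed only)", "SGI/UNIX",
     "Add to blanks file fnsfe or fnsfg or fnsfd"),
    (("YWF", "YWG"), "DST", "VAX/VMS", "Add to vault"),
    (("YY",), "STA", "CRIMSON/UNIX", "Add to blanks file fnstfy or fnstfz"),
    (("UU",), "DST/MDST", "VAX/VMS", "Add to vault"),
    (("VGL",), "RAW", "VAX/VMS", "Add to vault"),
    (("DZ",), "DST, MDST", "?", "Not your responsibility"),
]

# Flatten the rows so that every prefix points at its own row.
PREFIX_TABLE = {
    prefix: {"data_type": dtype, "platform": platform, "vaulting": vaulting}
    for prefixes, dtype, platform, vaulting in ROWS
    for prefix in prefixes
}


def describe_tape(label):
    """Return the table row for a label such as 'ONX101'.

    Assumes the label is prefix letters followed by digits.
    """
    prefix = label.strip().upper().rstrip("0123456789")
    if prefix not in PREFIX_TABLE:
        raise KeyError("unknown tape prefix: %r" % prefix)
    row = dict(PREFIX_TABLE[prefix])
    row["prefix"] = prefix
    return row


if __name__ == "__main__":
    print(describe_tape("ONX101"))
```

A prefix that is not in the table raises an error instead of guessing, which is the safer behavior when deciding what to write on a vault form.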
II.B Tapes will be naughty sometimes
Many different problems can (and will) occur with D0 tapes. The root of the problems is essentially the same. All D0 data files are (or ought to be) listed in the FATMEN catalog (to be described later). If a tape is, for some reason, unusable, the FATMEN catalog will still refer users to the bad tape. To remedy this, many utilities have been designed to help you "hide" bad files from the general public. No files are EVER to be deleted. There is always the possibility of recovery later (the tape may merely have been misbehaving).
II.C How naughty is the tape?
This section will include brief summaries of what can go wrong with a D0 tape. Reasonable (and more detailed) solutions on how to remedy such problems will occur later in this document. This is merely to make you familiar with what you may come across while you are here.
II.C.1 Mount errors
Mount errors occur for one simple reason. In order for a tape to be mounted properly on a drive, the drive must be able to read the internal labels assigned to the tape. If a tape's internal labels cannot be read, the tape is deemed unmountable and the user will not be able to access the data on it. Usually a tape becomes unmountable through overwriting, where someone writes new data to the tape and makes any previously existing data on it useless. There are some easy remedies for this, to be discussed later. If a tape becomes physically damaged (usually from overuse), its internal labels will also be unreadable to the drive, making it unmountable. In addition, mount errors result from bad tape drives and from attempts to mount tapes written in compressed mode on drives that cannot read compressed tapes.
It's up to you to verify the cause of any mounting problems, and I STRONGLY recommend reading through chapter 2, section 1, of Josh's document for tape dump samples. I would have included similar documentation here, but was unable to do so. I also recommend reading through chapter 3, section 1, of Josh's document in order to get a brief idea of how to dump a tape. I plan to give a more fundamental example later on in an effort to make tape dumping more trivial. In addition, there is a utility called "tape_check" which will be very useful to you in verifying an unmountable tape. More documentation about this utility will come later.
II.C.2 Read errors
Another problem arises when a file on a tape cannot be read. More than likely, this will occur on tapes of borderline quality (somewhat damaged, but not obviously so). There is a utility called "tape_in" which will help significantly in determining whether a file is unreadable. Better analysis will come later.
II.C.3 List errors
Yet another problem that can happen to tapes is the list error. This is quite possibly the naughtiest of tape problems, although it is not really the tape's fault. A list error occurs when a user requests a file from a tape using the FATMEN catalog and the requested file is not where it should be on the tape. This error arises strictly from overwriting of tapes. When the production farm writes to a tape, the files on the tape are then documented in the FATMEN catalog (a large database, like our own file "library"). Sometimes the farm makes mistakes that are not fully understood. Sometimes the farm will accidentally overwrite part of a tape but not change the FATMEN catalog. Other times the farm will write a file to a tape, get confused for some unknown reason, and write the same file again at the next sequence number on the tape without changing the catalog. As you can see, this problem has no obvious solution. Therefore, I wrote a utility called "fm_check" that ought to aid in solving the list error problem. This utility is not perfect, but it is a good start. A VERY detailed description will come later.
II.C.4 Tape end errors
A tape end error is very similar to the list error. FATMEN thinks that more files should be on a tape than actually exist. Most tapes with list errors will have a corresponding tape end error. In other cases, however, blatant overwriting is to blame. If someone overwrites a tape (puts in new data, a different volume label, or the same volume label with a different creation date) but does not make any changes to FATMEN, the chances of a tape end error arising are very good. FATMEN may think that a tape should have 147 files when in reality it has only 1. This is usually the case for tape end errors where there is no corresponding list error. The "fm_check" utility that I wrote should be of some help with tape end errors that have corresponding list errors. If a tape end error arises where a tape fails very prematurely (on the first file), I would recommend another procedure, which I will describe later.
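To make the difference between a list error and a tape end error concrete, here is a rough sketch in Python. It is NOT fm_check and does not talk to FATMEN or to a tape drive; it simply assumes you already have two lists of file names in sequence-number order, one from the catalog and one from the tape itself. The function name and the file names are invented for illustration.

```python
def compare_catalog_to_tape(catalog_files, tape_files):
    """Compare what the catalog expects with what is actually on the tape.

    Both arguments are lists of file names ordered by sequence number
    (position 1 is the first file on the tape).
    """
    # A list error: the file at some sequence number is not the one the
    # catalog says should be there.
    list_error_positions = [
        position
        for position, (expected, found) in enumerate(
            zip(catalog_files, tape_files), start=1)
        if expected != found
    ]
    # A tape end error: the catalog expects more files than the tape holds.
    missing_at_end = max(0, len(catalog_files) - len(tape_files))
    return {
        "list_error_positions": list_error_positions,
        "tape_end_error": missing_at_end > 0,
        "files_missing_at_end": missing_at_end,
    }


if __name__ == "__main__":
    # The farm wrote file_b twice, so the catalog and the tape disagree
    # from sequence number 3 on, and the tape runs out one file early.
    catalog = ["file_a", "file_b", "file_c", "file_d"]
    tape = ["file_a", "file_b", "file_b"]
    print(compare_catalog_to_tape(catalog, tape))
```

The overwritten-tape case described above (147 files in the catalog, 1 on the tape) shows up here as a tape end error with no list error, provided the one surviving file matches the first catalog entry.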
II.C.5 Physically damaged tapes
Tapes are prone to become damaged. This usually comes from overuse. The more frequently a tape gets used, the more likely that damage can occur. Usually you will receive notification about this problem from operators at FCC. If this does happen, you need to physically remove the tape from the FCC vault (I'll tell you how to do that later) and mark its files bad in FATMEN (also to be described later). There is not much else to do about physically damaged tapes. If a backup for the tape already exists, FATMEN should automatically direct the user to it. If not, you may want to alert the production farm and have them "reprocess" the data. I'll go into more detail later. If RAW data from the detector becomes damaged, you are simply out of luck.
II.D How you might find out about such tapes
There are several different ways to find out about bad (or potentially bad) tapes. In this section, I would like to give a brief overview of how you might start looking for bad tapes.
II.D.1 Pick Events
This utility will quite possibly become a good friend of yours. Pick_events will do a lot of search work for you and send you an alarm alerting you to potentially bad tapes. A very nice feature of pick_events is that when it finds a potentially bad tape, it will put the tape out of rotation (hide it from the general public) until you have the opportunity to look at the tape. Pick_events tests a tape (picked at random or in sequence, I am not sure which) for various problems. Pick_events will also test the same tape on a number of drives (usually 4 or 5), and if the same problem keeps happening on all drives tested, it "sends" a message to you. No, pick_events is not so sophisticated that it sends messages directly to your mailbox, but its "messages" are easy to find.
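The "same problem on every drive" rule can be pictured with a few lines of Python. This is only my reading of the behavior described above, not the actual pick_events code; the drive names and the function name are made up.

```python
def problem_to_report(results_by_drive):
    """Return the error to report, or None if the tape should not be flagged.

    results_by_drive maps a drive name to the error seen on that drive,
    or to None if the tape behaved on that drive.
    """
    if not results_by_drive:
        return None
    outcomes = set(results_by_drive.values())
    # Flag the tape only if every drive saw the same error.
    if len(outcomes) == 1 and None not in outcomes:
        return outcomes.pop()
    return None


if __name__ == "__main__":
    # Hypothetical drive names; the same mount error on all four drives.
    print(problem_to_report({
        "drive1": "mount error", "drive2": "mount error",
        "drive3": "mount error", "drive4": "mount error",
    }))
```

A tape that fails on only one drive is not flagged, which points at the drive rather than the tape.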
Before I go any further into pick_events, it might be a good idea to show you how to log into the realms of COPYMAN or PROMAN...
I am assuming that you already have a FNALD0 account and are at D0. If you do, log in normally.
Now, to get into COPYMAN, running on the D0FSA node, do the following...
D0(some node)> rlogin d0fsa/user=copyman
You may be asked for the password. I cannot give that out over the web, so ask Lee Lueking or Alan Jonckheere to give it to you. If you would like to avoid having to type in the password every time that you log in to COPYMAN, ask Lee or Alan to add you to the .rhosts file for COPYMAN. A similar .rhosts file exists for PROMAN.
The node D0FSA is a D0FS node that can run just about any COPYMAN or PROMAN utility. Other nodes with similar privileges do exist. They are D0FSE, D0FSI, and D0FSU. Many other nodes on D0FS are available for processing as COPYMAN or PROMAN. To find some of the other nodes, do the following...
COPYMAN:D0FSA> show cluster
The above command will help you determine other nodes that you can log onto that may not be as crowded as A, E, I, or U. Also, note that the above "rlogin" command is for VAX usage. If you are under a unix account, the "rlogin" command will look more like...
D0(some node)> rlogin -l copyman d0fsa
If you are ever planning on working off-site, it will be a good idea to memorize the COPYMAN and PROMAN passwords. Off-site you may have to enter a command like
off_site> telnet d0fsa.fnal.gov
Then simply supply the username (COPYMAN or PROMAN) and the corresponding password when prompted and you are in.
Now, if you are in COPYMAN (or PROMAN) do the following to find out if pick_events has found any suspicious tapes for you...
COPYMAN:D0FSA> dir/size/date pick_events$root:[new.error]*.error_mount_e*
This will let you know of any new mount errors that have yet to be looked into. The format for a mount error may look like the following...
Directory PICK_EVENTS$ROOT:[NEW.ERROR]
WNE464.ERROR_MOUNT_ERROR;1
COPYMAN:D0FSA>
For looking into other types of errors that pick_events can find for you, try the following...
COPYMAN:D0FSA> dir/size/date pick_events$root:[new.error]*.error_read_err*
to find read errors, or...
COPYMAN:D0FSA> dir/size/date pick_events$root:[new.error]*.list_error
to find list errors, or...
COPYMAN:D0FSA> dir/size/date pick_events$root:[new.error]*.error_tape_end
to find tape end errors. Typing out the "error" file will tell you which drives the tape was tested on by pick_events. What exactly to do with these errors will come later. For now, let us continue with other ways that you might find out about problem tapes.
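If you save the output of the directory commands above to a file, a short script can sort the suspicious tapes by error type. This is a sketch in Python rather than DCL, and it is not part of pick_events. Only the mount error file name (WNE464.ERROR_MOUNT_ERROR;1) is shown above; the other file types are inferred from the wildcard patterns, so treat them as assumptions.

```python
import re

# File-type stems inferred from the wildcard patterns used in the
# directory commands; only the mount error name is confirmed by the
# sample listing.
ERROR_KINDS = [
    ("ERROR_MOUNT_E", "mount error"),
    ("ERROR_READ_ERR", "read error"),
    ("LIST_ERROR", "list error"),
    ("ERROR_TAPE_END", "tape end error"),
]

# VMS file specification without device or directory: NAME.TYPE;VERSION
FILE_SPEC = re.compile(r"([A-Z0-9]+)\.([A-Z0-9_$]+)(?:;(\d+))?")


def classify(file_name):
    """Return (tape label, error kind) for one file name, or None."""
    match = FILE_SPEC.fullmatch(file_name.strip().upper())
    if match is None:
        return None
    tape, file_type, _version = match.groups()
    for stem, kind in ERROR_KINDS:
        if file_type.startswith(stem):
            return tape, kind
    return None


def suspicious_tapes(listing):
    """Pick the error files out of saved directory output."""
    found = []
    for line in listing.splitlines():
        words = line.split()
        if not words:
            continue
        # With /size/date the file name is the first column.
        result = classify(words[0])
        if result is not None:
            found.append(result)
    return found


if __name__ == "__main__":
    sample = """Directory PICK_EVENTS$ROOT:[NEW.ERROR]
WNE464.ERROR_MOUNT_ERROR;1
COPYMAN:D0FSA>"""
    print(suspicious_tapes(sample))
```

Header lines and prompts are skipped because they do not look like a file specification, so the saved output does not need to be cleaned up first.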
II.D.2 A User's complaint
I am going to assume that you already know how to use some form of mail utility, so let me give you some addresses of people that you should write to very soon. These are d0farm@fnal and helpdesk@fnal. The farm at d0 and the help-desk at FCC both need to know that you are the new data aide. Simply write them and tell them who you are, how long you plan to be with d0, and the best way to reach you should any problems arise. This will serve many purposes. First, you will find out about damaged tapes fast (from FCC). Second, if the farm has any problems with tapes, they can find you easily without having to ask around "Who is the new data aide?" Other users in the general D0 population may hunt you down to let you know that they have been experiencing problems with a certain tape. Advertise yourself: let as many people as possible know who you are and that they should come to you if they need any help.
II.D.3 The importance of meeting attendance and presenting yourself
At this time, the members of the production staff meet every other Tuesday at 9:30 A.M. in the "d0ghouse" meeting room of one of the portakamps. The purpose of these meetings is to discuss any important issues, or at least to cover what has happened within the last 2 weeks. It is a good idea to find out if these meetings still take place. Ask Lee Lueking or someone like him about the meetings. If he is unavailable, Harry Melanson may know. Go to the meetings; your work is quite dependent upon ideas exchanged there. You may also find out about other tapes that may have problems from those who attend these meetings. Also, it is a good idea to provide a short presentation (like a brief transparency) of what tapes you found to be bad in the last 2 weeks and what you did with them. The people at the production meetings need to know these details, so keeping REALLY detailed records is a must! I'll go into more detail later on some tips for keeping records.
( remember, you are expendable! )
Now that you have covered some very basic skills, it is time to look over your mission as a Data Aide at D0.
III.A Your Mission
The data aide will play many roles during their time at D0. Basically, keep users of D0's data happy and using our data at their will. You will oversee that any problems with access to data be taken care of in a timely fashion. You will also see to it that anyone who needs tapes to write to (production farm, special project personnel, etc.) have tapes at their disposal. You will also keep meticulous records of your work so that if anyone should question the whereabouts of data, they can find out immediately. Your mission is to keep accessible data at a high quality standard and at the same time, attempt to "automate" your job. I quoted the word "automate" because it is impractical to think that you can truly automate this position. Some human decision and action will have to occur. If D0 can make a computer do as much of the work as possible, it will be much cheaper than a human. Minimizing human interference is your ultimate goal by (with time and practice) making this job as easy as "the push of a button", as Lee Lueking would like to see it.
I know this may seem like a nasty task to pursue, but someone has to do it. I did it for a year, so can you.
At this time I feel the need to warn you that you are a luxury for D0. You will not have the opportunity to take this position on permanently. Hard work will keep you here for a while, but not forever. Try your best to avoid premature termination. You are not here for financial gain, but more for experience that will hopefully help you prosper in later pursuits. Eventually, you will terminate your position at D0. Keeping you on staff costs D0 an appreciable amount of money that could be spent elsewhere. Use your time here wisely and it should pay off in the long run.
The term "deception aide" is a joke. The data aide basically hides unwanted things (much like a hit-man for the mafia) from the public at large. You also lead a somewhat dangerous life. Any malicious mistakes can get you into SERIOUS trouble. So I am going to repeat myself: NEVER delete any files from the FATMEN catalog unless absolutely necessary!
( The data aide has many skills to master )
Before getting started, now may be a good time to mention PROMAN's tape_report.com. This command runs every 3 days and lets you know which tape series need to be updated (more tapes added to the vault). Look into Chapter VIII (VIII.B.3.a.1 to be more specific) for more details on how to get this mail sent to you. And now, on with the show!
IV.A Labeling tapes
Labeling tapes is very much a necessity for the success of D0.
IV.A.1 - O.K. , where do I get these tapes?
A reserve of tapes for D0's use is kept in Lee Lueking's office (DAB5) in a tall filing cabinet. The top drawer of this filing cabinet contains some of Lee's files. The bottom 4 drawers should have some tapes in them. If the cabinet is locked, the key should be in the top drawer of Lee Lueking's desk. If there are no tapes present, you will need to go to the Site 38 warehouse to pick up some more.
To get tapes, you need 2 things. First, have your d0 budget code changed to "3" to make unlimited purchases at the warehouse. Second, you will need an order form to take to the warehouse. Copies of such forms are located in a binder entitled "UU Tape.txt + Tape Copy" on the rack next to Lee's desk. You only need to get tapes, not additional tape cases.
To get to Site 38, leave DAB and take Swenson Rd. back to Eola Road (stop sign). Turn left and go to Batavia Rd. (stop sign). Turn left, and take the left hand fork (Road D) to Road B (first intersection past the stop sign at CDF). Make a right hand turn onto Road B and take it back to Receiving Road (1st intersection past the stop sign) and make a right hand turn. Follow Receiving Road back to Site 38. It should be the third building on your left hand side.
Once at Site 38, go into the main entrance and ask if your order can be filled. If your order cannot be filled there, try the big stockroom towards the back of the building. Show them your order form and they should fill it for you. Note that Site 38 personnel will only take goods to the main door. They WILL NOT load your car for you. If you cannot load your own car, bring someone with you to help. Tapes CANNOT be delivered to D0. Tapes are very sensitive to heat, cold, and (worst) humidity. Site 38 staff refer to tapes as "sensitive items" and WILL NOT deliver them for you. Do not keep these tapes in your car overnight. Take them straight back to DAB and place them into Lee's filing cabinet.
IV.A.2 - O.K. , how about some labels?
Excess tape labels should be in Lee's office on the desk opposite Lee's (with an X-Terminal on it). More labels may also be in the VAX lab of Northern Illinois University's Physics Department (should you be in the area). If you need more labels printed, the procedure is really quite simple. Send mail to helpdesk@fnal and put "A note for Linda Blomberg" in the subject area. Leave a detailed message for Linda, letting her know which prefix (like ONX), which series of labels to make (like 001 - 200), and whether you need top labels, side labels, or both (usually both). She will process your request at her convenience. Keep in mind that she only works Monday through Wednesday, so if you need labels during the current week it is a very good idea to leave her mail by Tuesday afternoon. If Linda is not there, any other operator at FCC should be able to fill a request; Linda is just the fastest at it. Be sure to include in your mail message that you would like a reply when the task is complete. This usually makes the operators work faster.
IV.A.3 - How to Label Tapes
Put the big label on the big side of the tape, the small label on the small side of the tape. Be sure to take tapes out of the box and out of their plastic. Use the same plastic case that the tape came in for vaulting.
IV.B - Initializing tapes (at D0)
Now would be a good time to describe initializing tapes. If you refer back to Chapter II. Tapes, tapes, and more tapes, you will see the table of tape prefixes. Notice that some tape prefixes have UNIX as their creation platform and some have VAX/VMS. Tapes with a UNIX platform do not need any internal labels before vaulting, just external ones. VAX platform tapes, however, are not that easily dealt with: you need to "initialize" the tape before vaulting it. This will require you to venture downstairs to the control room area.
IV.B.1 Using the Robot in the Control Room Area
If you go down to the first floor and enter the Control Room Area, you will notice an eating/meeting area on your right. If you continue just beyond that and look to your right you will notice a terminal labeled D0HS15. Next to D0HS15 is the robot "stacker". The stacker is capable of initializing 10 tapes at a time. Load 10 tapes into the stacker, from the lowest sequence number on the bottom (like ONX101) to the highest sequence number on top (like ONX110). Next, close the door to the stacker and let the robot load the first tape into the drive. If successful, return to your terminal. If not, try reloading the tapes.
Once you have returned to your terminal, you will need to get access to the password for the online cluster, specifically for the username "BACKUP_MGR". To obtain this, contact either Tacy Joffe-Minor or Tracy Taylor Thomas. To find out how to reach them, try using the "whod0" utility by typing the following from any FNALD0 node...
D0(some node)> whod0 acy
This should give you some names, E-Mail addresses, and phone numbers. Send mail to Tacy and Tracy letting them know that you are the new data aide and that you need to know the password for username "BACKUP_MGR". One of these 2 people should be able to give you the password. Chances are they will write you back asking you to call them, and they will give you the password then. Write the password down in a safe place. Now, to get onto the on-line cluster and initialize some tapes, do the following...
D0(some node)> set host d0hs15
Enter the username BACKUP_MGR and then the password at the appropriate prompts. Then...
D0HS15> @preinit_tape2
Preinit_tape2 is a command file that will ask you some questions. I will go through an example with you for initializing ONX101 through ONX110.
1st tape label (6 char): ONX101
***** Prefix not equal to "OMB"? *****
***** Are you sure???(yes/[no]): y
Init [10] tapes? 10
Initing tape label ONX101 <date>
*****Is that correct? ([y]/n): n
The program will then execute and give you some confirmation when it is done, like...
Normal Exit with 110 tapes done at <date>
If the tape drive ever gets a tape caught in it, pull the tape out and try again. It is a good idea to keep the drive clean. Use a cleaning tape after about every 50 tapes initialized. If you have more than 10 tapes to initialize, pull the preceding 10 tapes out of the stacker, load it with the next 10 tapes, and repeat the above steps from "@preinit_tape2". It is not necessary to log off of the online cluster after every 10 tapes initialized, but do log off when you are done with the stacker with a command like...
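When you have a long series to initialize, it helps to write out the batches before heading downstairs. The following Python sketch is my own planning aid, not part of preinit_tape2. It assumes that a label is always 6 characters (the prompt above says "6 char"), so the sequence number is zero-padded to fill whatever the prefix leaves; the function names are made up.

```python
def make_labels(prefix, first, last):
    """Build 6-character labels, e.g. ('ONX', 101, 110) -> ONX101..ONX110."""
    width = 6 - len(prefix)
    if width < 1:
        raise ValueError("prefix leaves no room for a sequence number")
    labels = []
    for number in range(first, last + 1):
        digits = str(number).zfill(width)
        if len(digits) > width:
            raise ValueError("sequence number %d does not fit" % number)
        labels.append(prefix.upper() + digits)
    return labels


def stacker_plan(labels, batch_size=10, clean_every=50):
    """Split labels into stacker loads and mark when to clean the drive."""
    plan = []
    for start in range(0, len(labels), batch_size):
        batch = labels[start:start + batch_size]
        tapes_done = start + len(batch)
        plan.append({
            # Lowest sequence number goes on the bottom of the stacker.
            "load_bottom_to_top": batch,
            # Answer to the "1st tape label" prompt for this load.
            "first_label": batch[0],
            # Run a cleaning tape after about every 50 tapes.
            "clean_after": tapes_done % clean_every == 0,
        })
    return plan


if __name__ == "__main__":
    for load in stacker_plan(make_labels("ONX", 101, 150)):
        print(load["first_label"], len(load["load_bottom_to_top"]),
              "clean drive afterwards" if load["clean_after"] else "")
```

For ONX101 through ONX150 this gives five loads of 10 tapes, with the cleaning tape due after the fifth load.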
D0HS15> logoff
This will return you to your previous node. Most commands at D0 have been shortened to 2 or 3 characters. For instance, instead of typing...
D0HS15> logoff
You could have just as effectively typed...
D0HS15> lo
You will have to practice with DCL commands for a while to get used to saving keystrokes with abbreviations. Or try doing the following...
D0(some node)> show sym lo
LO*GOFF == "LOGOFF/FULL"
If you ever are in question as to what a symbol means, try the "show sym" command or "show log" command for logicals.
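The asterisk in LO*GOFF marks the shortest abbreviation that DCL will accept for that symbol: anything from "LO" up to the full "LOGOFF" works. If that rule is hard to picture, here is a small Python sketch of it. It only imitates the matching rule as I understand it; it is not how DCL is implemented, and the function name is made up.

```python
def matches_symbol(definition, typed):
    """Check whether typed text is an accepted form of an abbreviated symbol.

    definition is the symbol as shown by "show sym", e.g. 'LO*GOFF':
    the part before the asterisk is required, the rest is optional.
    """
    required, star, optional = definition.upper().partition("*")
    full_name = required + optional
    typed = typed.upper()
    if not star:
        # No asterisk: the symbol must be typed in full.
        return typed == full_name
    return len(typed) >= len(required) and full_name.startswith(typed)


if __name__ == "__main__":
    for attempt in ("l", "lo", "log", "logoff", "logout"):
        print(attempt, matches_symbol("LO*GOFF", attempt))
```

So "lo", "log", and "logoff" all run the same symbol, while "l" is too short and "logout" is a different word altogether.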
Now that you know how to initialize tapes, let's go over how to vault them at FCC.
IV.C - Vaulting tapes at the Feynman Computing Center
The FCC is where D0 stores its tapes. However, you can't just drop the tapes off and expect the operators to know what you want. There are some vaulting procedures to follow to make the operators work for you. But first it might be a good idea to know where to take the tapes that you have just labeled (and perhaps initialized).
IV.C.1 - Where am I? (directions to FCC)
The Feynman Computing Center is very easy to find. Leave DAB and take Swenson Road back to Eola Road. Turn left and take Eola to Batavia Road. Turn left and take left hand fork (Road D) to Road B (first intersection past stop sign at CDF). Turn right onto Road B and the FCC will be off to your left. Turn into the parking lot. The entrance to the building that you want is at the far end of the parking lot (1- West). Go straight into the building through to the 3rd set of double doors. The FCC I/O counter is there as well as the entrance to the vault.
IV.C.2 - Do What? For Which Prefix?
Across from the I/O counter is a rack with some forms. Look for the form entitled "Tape Vault Activity Form". Take a minute to look over this form. Vaulting tapes is virtually painless. Fill out as much as you can of the top of the form. If you look back to Chapter II. Tapes, tapes, and more tapes I have included a handy table that will tell you which instructions (last column of table) to include in the "User Comments" field. Then supply the tape prefix, as well as the sequence of tapes that you wish to vault. Unless you are removing bad tapes from the vault, always check the "addition" box. I doubt that you will ever need to use the "No Vault" option. If you are still unsure of what procedure to follow for which series of tapes, write d0farm@fnal and someone there should be able to help you or at least recommend who is responsible for a certain tape prefix. Removing tapes is very much the same procedure, just provide an explanation of why you need to remove a tape ( damaged, need to make backup copy, etc. ).
IV.C.3 - While you are here...
While you are at the FCC, it is a very good idea to copy-protect tapes. Copy-protecting tapes on a regular basis (about once per week) gives us some peace of mind. As I have mentioned before, list-error and tape-end error tapes are really nasty and happen all too often. These problems almost always come from overwritten tapes. Copy-protecting a tape makes it virtually impossible to overwrite. Copy-protecting tapes is VERY easy. If you take a good look at an 8mm tape, you will notice that on the narrow side of the tape there is a small red tab at the side of a "window". If you take a pen (I prefer the mini-screwdriver of my Swiss-army knife) and close the window (cover the window so that all you see is red), the tape is copy-protected. To keep anyone from putting the tape back into write mode, place an official Fermilab "PROTECTED" sticker over the window. If anyone intentionally removes this sticker and overwrites data on a tape, they will get fired. Now, you don't want to go crazy and copy-protect every tape that you see in the vault. Some tapes in the vault are empty and still need to be written to. So, I wrote a utility in COPYMAN called copylist_maker.com. More on how this utility works is in Chapter VIII.
IV.C.3 - Scream from the rooftop
It would be a good idea to let certain people know that you are the current D0 data aide. Basically, any FCC vault operators or supervisors is a good place to start. If you are friendly to them, they will be nice to you in return. Jack MacNerland is another person at FCC with whom you should get aquainted. His office is on 1W of FCC and he can do you a favor if you ask nicely. Stop by his office. Introduce yourself as the current D0 data aide and let him know that you would like to have your name added to the list of those who have access to the vault. He will ask you for your Fermilab I.D., so be sure to bring it with you. Once you are on his list you can use your Fermilab I.D. to enter the vault area without having to pester the operators. It might also be a good idea to introduce yourself to Wayne Baisley, the D0FS manager. His office is on the 2nd floor of FCC west. Wayne is very helpful for certain applications (more details later) and he likes to have people visit him. Closer to D0, any farm shifter (like Ki Suk Hahn, Suyong Choi, Heidi Schellman, etc.) are good people to let know who you are. Find Stan Krzywdzinski (current FATMEN manager) and let him know that if he needs any help, you are available. Nobuaki Oshima is also somene to find. Should you require a UNIX account, Mike Diesburg would be the best person to contact. Also, anyone else that you may encounter at the production meetings.
IV.C.4 - The Importance of Being CHEAP!
Although it may seem that D0, or Fermilab as a whole, boasts a large budget, it can only go so far. It is a good idea to be cheap. If you are here on behalf of another university, like Northern Illinois University for example, you may want to ask around and see if some less-expensive student help is available to help you out with some of the more menial chores. Allow me to pose an example with tape labeling and initializing. While school was in regular session, I had the luxury of having student employees (from the NIU physics office) at my disposal for labeling and initializing tapes. Not because I was lazy; they were just cheaper labor and did not mind picking up the extra hours for their paychecks. I would make sure that the VAX lab for the physics department was well stocked with tape labels and tapes, and send a request for tapes to be labeled during the next week. I would send a request on a Thursday, and pick the tapes up the following Friday (8 days later). It was a nice system and left me more time to do more important things. Since I lived near the University, it was never a problem to make sure that they had plenty of tapes and labels, or to pick completed tapes up. If you think that there is ever a task that someone else can do just as easily and more cheaply, by all means, ASK! There is no penalty for asking; the worst that someone can do is say "No, thank you".
IV.C.5 - If you like UNIX...
There is always a good chance that you are already more familiar with the UNIX platform than with VAX. If you are more UNIX oriented, chances are that you are going to hate this manual. Just about every utility that has already been set up at your disposal is programmed in DCL ( Digital Command Language, like BASIC for a VAX system) or FORTRAN. If you need to convert all commands and files from VAX to UNIX, there are some pages on the WWW that should help you. Refer to the UNIX at Fermilab on-line manual for VMS to UNIX migration if you should feel the need to convert from VMS to UNIX. Personally, I am not very fond of UNIX. VMS seems closer to the English language and I use it as a crutch. Just wanted to offer fair warning. Now, let's get into some real work!
( you are bound to find some surprising tape problems )
Now that you have had a chance to get some of the basics down, it might be best to go through a brief example of what a bad tape may look like and how to treat such a tape.
V.A An example of a problem tape
This section will go into some detail on how I may have handled a certain problem tape. Note that many different things can go wrong with tapes, and each problem has its own solution. This section is for reference; chapter 8 has more detail as to which commands to use for specific tape problems.
And now on with the show!
First thing that you are probably going to want to do is log on as COPYMAN. If at D0, try the following...
D0(some node)> rlogin d0fsis/user=copyman
D0FSIS is just a node that I like to use because no one else seems to use it so it is always available. Now you may want to open another window ( I usually use 5 DECTerms at any given moment ). Now that you have another window do the following....
D0(some node)> rlogin d0fse/user=copyman
D0FSE is another node that I like to use as copyman. It may not always be available, so try A, I, or U instead. Now that you have 2 windows as copyman, try the following...
CHECKING PICK_EVENTS FOR NEW ERRORS...
D0FSIS> dir/size/date pick_events$root:[new.error]*.error_mount_error
This should give you an output to your terminal that might look like...
Directory PICK_EVENTS$ROOT:[NEW.ERROR]
WNE464.ERROR_MOUNT_ERROR;1
1 6-AUG-1996 21:57:57.49
Now try the following...
D0FSIS> cd tc_area
D0FSIS> dir *check*.list
The "cd" command is the same as "set default", and "tc_area" is a symbol for the directory "USR$ROOT21:[COPYMAN.TAPE_CHECK]". The second command lists every ".list" file in that directory whose name contains "check". Your output may look like ...
Directory USR$ROOT21:[COPYMAN.TAPE_CHECK]
CHECK_1996_09_05.LIST;1
The list will contain various tape Visual Serial Numbers (a.k.a. - VSN's) that were checked on the date implied in the filename. You may now want to make a similar list...
D0FSIS> edit check_<today's date>.list
Now you will be in the editor editing a new file. Now type in " WNE464 ", then press the " control " and " Z "keys simultaneously to exit the editor. If you have a list of several tapes that you would like to check, be sure to assign each tape its own line ( by hitting the carriage return after each tape entry ) in the list. Now type the following...
D0FSIS> tape_check check_<today's date>.list
"tape_check" is a command that will try to mount each tape on a random drive and keep a log file in the directory USR$ROOT21:[COPYMAN.TAPE_CHECK.LOG], or "tc_log" for short. If you should ever want to execute a utility interactively, insert the "@" before the command name and you can watch the command as it executes. Otherwise, not using the "@" will submit most jobs to a batch queue, essentially freeing up your current window to perform more tasks. The appendices to come later will go into more detail on some of these matters, but for the purpose of this example, just trust me.
When tape_check is finished it will give you a "notification" like the following...
Job CHECK_<date> completed <time>
When this notification arrives you may then check the log file by entering the following...
CHECKING THE LOG FILES OF YOUR WORK
D0FSIS> edit/read tc_log:<date>.log
or...
D0FSIS> type/page tc_log:<date>.log
This will let you look at the log file for the tape_check job. Should you choose the first option, be sure to enter "control" and "z" in order to exit the editor when you are finished. For a more brief summary of what happened try the following...
D0FSIS> search/exact tc_area:tc_tested_tapes.list "<date>"
The "/exact" qualifier tells the search command to match the string exactly as you typed it, upper and lower case included, so enter the date exactly as it appears in the file ( hyphens and all ). Most commands in DCL have qualifiers that can make execution of commands more specific, and therefore more efficient ( less leftover garbage to weed through ). If you think that a DCL command can be executed more effectively, refer to the "VMS User's Manual" for greater detail. "tc_area" is a symbol for "USR$ROOT21:[COPYMAN.TAPE_CHECK]" and don't worry, I will cover more symbols that you should know later. "tc_tested_tapes.list" is a rather large file that contains brief summaries of what has happened to various tapes over time. Should you notice an output like the following...
WNE464 - mount OK : < tape drive > : < date > < time >
then the tape is mountable and you can proceed to do the following...
D0FSIS> delete PICK_EVENTS$ROOT:[NEW.ERROR]WNE464.ERROR_MOUNT_ERROR;1
Deleting the error message in Pick_Events will put the potentially bad tape back into rotation. However, should the result of the search command show a "mount ERROR" message, you have more work to do. Tape_check will automatically dump a tape that has any errors during mounting. To get a closer look at the dump try the following...
D0FSIS> cd tc_dump
D0FSIS> dir WNE464.dump
The first command will change your default directory to that of USR$ROOT21:[COPYMAN.TAPE_CHECK.DUMP]. The second command will let you know if any files of the specified type exist in that directory. Now, try typing out the dump file to have a look at what is wrong. Try and compare the results of your dump with those of Josh's to find out what the problem ( if any ) is. After looking at the dump file, it is a good idea to move the dump file to a more specific directory. In this case, try the following...
D0FSIS> ren tc_dump:wne464.dump [copyman.tape_check.dump.wne]wne464.dump
"ren" is short for the "rename" command. The directory that you sent the dump file to should contain only dumps for tapes with the prefix "WNE". Moving files to more specific directories whenever possible actually makes book-keeping easier, in the sense that no one directory will become cluttered with too many files. If you should ever feel the need to create a new directory (See VMS Users Manual) to make your book-keeping easier, DO SO! You should try to keep your records as neat and orderly as possible. Let's just say that for the sake of example, your dump looked remarkably similar to that of Josh's document, section 2.1.1-B ( NO Volume 1 label). You might be wondering, "Whatever shall I do now?". The procedure is actually quite painless. Tapes without a VOL1 label are not mountable and are therefore useless. You then have to mark the tape's files bad in FATMEN.
MARKING FILES BAD IN FATMEN
There is a directory in the realm of COPYMAN where you will find some useful reference lists. The directory is called USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA] or del_mgmt for short. Set your default directory to del_mgmt. Now list the contents of the directory. You will notice 3 lists of significant importance: BAD_FILES.LIST, BAD_TAPES.LIST, and SCSI_TAPE_PROB.LIST. BAD_FILES.LIST contains a listing of tape files that have been marked bad. BAD_TAPES.LIST contains a listing of tapes where the ENTIRE tape is bad. SCSI_TAPE_PROB.LIST contains a list of tapes whose problems were something other than a SCSI bus reset. In this case ( a tape with no VOL1 label ), we would have to mark the tape bad not only in FATMEN, but also in BAD_TAPES.LIST and SCSI_TAPE_PROB.LIST. This makes for the start of some good record keeping. More in-depth record keeping will be described later in this chapter. I merely wanted to make you familiar with some lists that you will be updating frequently.
To mark files bad in FATMEN, you need to do 2 things. First, make sure that you are logged on as COPYMAN on either D0FSA, E, I, or U. The utility that you will use relies on PDB, which in turn uses Rdb ( DEC's Relational Database ), and we are only licensed to run it on these nodes. Next you will need to set your default to USR$ROOT21:[COPYMAN.TAPE_DEL.UTIL] or del_util for short. There is a utility that has already been created for you called ALL_PREFIX_TAPE_DEL.COM . To use this utility for this example tape enter the following...
D0FSA> @all_prefix WNE464
This utility will extract what files FATMEN thinks should be on the tape and create a list in a format to be sent to a FATMEN server to mark with "mediatype 13". Marking a file with mediatype 13 will not delete a file from FATMEN. It will merely make the file "invisible" to other users. I'll go into more detail about this later. For now, watch all_prefix_tape_del.com execute and wait for this finishing prompt...
Files to delete: _<date>.list_del
Now type the following to send the list over to the FATMEN server...
D0FSA> @remove _<date>.list_del
KEEPING RECORDS FOR YOUR OWN REFERENCE
You should receive some confirmation that the file list was sent over to FATMEN. Now there is more work to be done. This is where the record keeping begins. Type the following...
D0FSA> ren del_util:_<date>.list_del del_del:wne464_<date>.list_del_sent
D0FSA> ren del_util:_<date>.list_all del_all:wne464_<date>.list_all
This will rename 2 types of lists to make record keeping easier. First, list_del is a copy of what you sent to the FATMEN server. Second, list_all is a more descriptive listing of the task that you performed. Next set your default to del_mgmt and type the following...
D0FSIS> edit bad_tapes.list
This will take you into the editor mode and you can update this file to include the tape that you have just marked bad in FATMEN. Please try to keep this list in alphabetical order. Use the guidelines at the top of the list to fill in the appropriate fields with the proper information. Then type "control" and "z" to exit the editor. Now type...
D0FSIS> edit scsi_tape_prob.list
Now perform similar book-keeping for this list. It may not be so obvious where to find things like the drive, node, or creation date for a tape. For this information, you will have to enter the production data base (PDB) using a data base language called SQL. To do that, first you need to be logged onto some D0FS node as PROMAN. Then type...
D0FSE> d0setup prod_db
D0FSE> sql
SQL> set transaction read only wait;
Using the last command is the most important. Almost any command that you will use in SQL must be followed by a ";" at the end. ALWAYS start an SQL session as read-only, you DO NOT want to make any changes to PDB. Now try the following...
SQL> show tables
Now try to look at the table for processed tapes...
SQL> show table proc_tape
After looking over this table try entering the following...
SQL> sel * from proc_tape where vis_tape_label='WNE464';
Anytime that you ask SQL to search for a string containing alphabetical characters, you must include single quotes on either side of the search string. Any alphabetical letters must be capitalized. If only numbers are to be searched for, you may include the single quotes, but it is not necessary. You should be able to extract the drive, node, and creation date from here for your list. If you should see a weird creation date like "NOV-15-1858", don't worry about it. That is VMS's default creation date if one is not supplied during initial staging of the tape. Just put a "?" into the creation date field of scsi_tape_prob.list . Or you can always "guess" from the timestamp of any of the dsn filenames on the tape. Take for example, a dsn filename like...
ALL_086836_34.X_STA01REU1215_ALL00_NONEX00_5011314
The timestamp assigned to this file is 5011314, or 2 P.M. (14th hour) of Jan 13 (13th day of first month) of 1995 (5th year).
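If you find yourself decoding a lot of these timestamps, the arithmetic above is easy to hand to a short script. What follows is only a sketch in Python ( not one of the COPYMAN utilities ); the function name is made up for illustration, and it assumes that the single year digit always belongs to the 1990s, which is my reading of the example above and nothing more.

```python
from datetime import datetime

def dsn_timestamp(dsn, decade=1990):
    """Decode the trailing YMMDDHH stamp of a dsn filename.

    Hypothetical helper. The decade is an assumption: the stamp only
    carries the last digit of the year.
    """
    stamp = dsn.rsplit("_", 1)[-1]          # text after the last underscore
    if len(stamp) != 7 or not stamp.isdigit():
        raise ValueError(f"unexpected timestamp field: {stamp!r}")
    year = decade + int(stamp[0])           # 5 -> 1995
    month = int(stamp[1:3])                 # 01 -> January
    day = int(stamp[3:5])                   # 13 -> 13th day
    hour = int(stamp[5:7])                  # 14 -> 2 P.M.
    return datetime(year, month, day, hour)

when = dsn_timestamp("ALL_086836_34.X_STA01REU1215_ALL00_NONEX00_5011314")
print(when.strftime("%d-%b-%Y %H:00").upper())
```

A stamp that does not have exactly seven digits is rejected rather than guessed at, since there is no safe way to tell which digit is missing.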
When done entering what you need into scsi_tape_prob.list, enter " control " and " z " to exit the editor. Now, this is only the beginning of record keeping. When I started this position, Dorota Genser thought that it was a good idea to create a directory reserved for documentation of your work. This directory ( and its corresponding sub-directories ) is in the realm of COPYMAN as a subdirectory of del_mgmt.
There are 2 main directories for tape and file documentation. The first is USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.TAPE_DOC]. The second is USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.FILE_DOC]. Getting to these directories is very easy, type the following as COPYMAN...
D0FSIS> cd del_mgmt
D0FSIS> down tape_doc
or ...
D0FSIS> cd del_mgmt
D0FSIS> down file_doc
Now, you can look through the subdirectories of these 2 directories by entering a command like..
D0FSIS> dir .dir
I STRONGLY recommend that you continue going further into the subdirectories until you find a .txt file like USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.TAPE_DOC.WNE]WNE717.TXT .
Read through some of the tape_doc files for entire tapes marked bad and some of the file_doc files for individual files marked bad to look for trends in the problems with the tapes and how they were remedied. You do not have to follow the format that I use for documentation, you may develop your own. However, it might be a good idea to keep your documentation in these directories for the future reference of not only yourself, but those that will hold this position after you.
Other advice that I might offer is to look into FATMEN ( covered in next chapter ) and see if a backup exists of any files that you may have marked bad in FATMEN. Also, if you should be notified ( by mail, etc. ) that a tape has been damaged and is useless, you may want to go over to the FCC and remove the tape. This procedure is quite similar to adding tapes. If you need help filling out the removal form, ask an available operator for assistance. Now that you have a REALLY basic knowledge of how to treat a bad tape, let us continue to find out more about your mentor, FATMEN.
( the data aide is a protector of FATMEN )
VI.A - What is the FATMEN catalogue?
The FATMEN catalogue is a database that contains various information about all data files produced by experiments at Fermilab. We at Fermilab borrowed the FATMEN concept from our good friends at CERN ( also home to the WWW ) in Switzerland. FATMEN stands for File And Tape Management: Experimental Needs, which is why it is called FATMEN instead of FATMAN. There are many ways to manipulate the FATMEN catalogue. In this section, I will only cover some very trivial and basic procedures that you may have to use at some point in time. If you need to do some work in FATMEN with greater detail, kindly refer to CERN Program Library Long Writeups Q123 . The CERN document covers FATMEN in every detail; here I will also show you a "trick" or two.
VI.A.1 - Why Should I Care?
FATMEN is a good place to find out some info about a suspicious file. Unfortunately, no one has really set up too many utilities to manipulate FATMEN to YOUR advantage. So, most work that you do in FATMEN you may have to do by hand, or write your own utilities ( to be covered later ). For now, let's look through some examples of what you may want to consider using FATMEN for.
VI.B - Using FATMEN to Your Advantage
Here we will go over some examples of how to use FATMEN. First, it might be nice to know how to get into FATMEN. The easiest way to access FATMEN is to be logged on as PROMAN and from that window simply type the following...
D0FSI> setup_exe:FM.EXE
Or you may use the easier method to get into FATMEN...
D0FSI> fm
D0FSI is just one node that you may use to access FATMEN, others include D0FSA, D0FSE, or D0FSU. Once you are in FATMEN, you may want to test a few very basic things. But first, let me show you how to leave FATMEN...
FM> end
and then type...
FM> exit
or you can best simplify leaving FATMEN with...
FM> quit
I find using the " end " then "exit " sequence to be much safer, but you make your own decision. If you are still in FATMEN, try entering the following command...
FM> set/mediatype -1
It is a good idea to enter the above command at the beginning of any FATMEN session. This command will clear all mediatypes letting you look at all mediatypes stored in FATMEN ( including mediatype 13, which would normally be " invisible " to you ). Now, let us get into some examples...
VI.B.1 - Examples of ...
VI.B.1.A - Finding a Backup Copy of a File
Let's say that you have a file in question that perhaps is on a tape of marginal quality and you are not sure whether or not to make a backup copy ( to be described later ). Take the following dsn filename for example...
ALL_087446_52.X_DST01REU1215_ALL00_NONEX00_5010819
This is a " DST " file and should have a backup copy, but you want to make certain. The procedure is really quite simple. First make sure that you are logged on as PROMAN in at least one window. Now, type the following...
D0FSI> cd [proman.disk_maint]
D0FSI> edit junk.txt
This will take you to the directory USR$ROOT21:[PROMAN.DISK_MAINT] and enter you into the editor editing a new file junk.txt ( use any name you like ). Simply enter the dsn filename(s) that you would like to look at (give each dsn filename its own line in this .txt file) and then exit the editing session with the usual " control " and " z ". Now try the following...
D0FSI> @dsn_to_gen.com junk.txt
DSN_TO_GEN.COM will take a list of dsn filenames and translate them to a generic filename (or close to it ) that FATMEN can recognize. The output will be a file called junk.gen ( in this example ). Now you can do the following...
D0FSI> ty junk.gen
Here you will see how dsn_to_gen.com translated the filename that you entered and made a generic filename. Take some time to look at the differences and similarities between a dsn filename and a generic one. About the only thing missing from a FATMEN generic filename is the time-stamp at the end of the dsn filename. If you use FATMEN often enough you should be able to create the generic filename yourself without needing to use dsn_to_gen.com all of the time. Now you are ready to have a look at what FATMEN has to say about the generic filename that you are questioning. So, do the following...
D0FSI> fm
FM> set/mediatype -1
FM> ls -a //FNAL/D0/CLDA/93P1E18/DST01REU1215/ALLALL00NONEX00/*/R087446P52
The command " ls " is a command to list some information about a file. The "-a " qualifier is to let FATMEN know that you would like all information available about a file. FATMEN does have online help, to get to it, simply type "help" and go from there. If you look at the results of the FATMEN search, you will notice that FATMEN has come up with 2 entries for that filename. One on tape DZ0047, fsn 221. The other on WNE653, fsn 102. WNE653 is the 8mm copy, DZ0047 is a DLT backup copy. Since there exists a backup copy in FATMEN of the file, you need not worry too much about going through the trouble of making an 8mm backup copy for this file. FATMEN should direct the user to the correct location for the DLT copy should the user need to access this file.
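For the curious, the translation that dsn_to_gen.com performed in the example above can be imitated in a few lines of Python. Treat this strictly as a sketch: the rule it applies is inferred only from the dsn and generic filenames shown in this manual, the function name is made up, and the real utility may well handle cases that this one does not. The run-period directory ( 93P1E18 here ) is passed in by hand, and the seventh directory is left as the wildcard "*", just as dsn_to_gen.com leaves it.

```python
def dsn_to_generic(dsn, run_dir, root="//FNAL/D0/CLDA"):
    """Build a partial FATMEN generic filename from a dsn filename.

    Hypothetical imitation of dsn_to_gen.com. The field layout is an
    assumption read off the examples in this manual:
        <stream>_<run>_<part>.X_<style>_<field4>_<field5>_<timestamp>
    """
    name, sep, rest = dsn.partition(".X_")
    if not sep:
        raise ValueError(f"no '.X_' separator in {dsn!r}")
    stream, run, part = name.split("_")
    style, field4, field5, _timestamp = rest.split("_")
    # The timestamp is dropped; the 7th directory stays a wildcard.
    return f"{root}/{run_dir}/{style}/{stream}{field4}{field5}/*/R{run}P{part}"

generic = dsn_to_generic(
    "ALL_087446_52.X_DST01REU1215_ALL00_NONEX00_5010819", "93P1E18"
)
print(generic)
```

The printed name is the same string that was handed to "ls -a" above, which is about all the checking this sketch has had.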
If you would like to learn more about how the FATMEN catalog applies to D0, have a look at the following document...
D0$FATMEN$ROOT:[DOCS]FATMEN_GUIDE.TXT
This document will more clearly describe what each part of a generic filename means.
Every once in a while, your utilities may not work for you ( for some strange, misunderstood reason ). Should this occur, and you need to make some alterations to FATMEN, it would be best if you knew some more about how to manipulate FATMEN when the need arises.
VI.B.1.B - Doing it yourself ( when your utilities don't work )
Let's look back at the example from chapter 5 when an unmountable tape had to have its files marked bad in FATMEN. At that time you used the " all_prefix " command to aid you in making the list to send to the FATMEN server to mark some files with mediatype 13. Now, suppose that all_prefix did not feel like working for some reason and you still needed to mark some files bad in FATMEN. The best thing to do first is to try using the utility again; perhaps some parameters were entered improperly. If that does not work and you have tried most everything else, you have to do it by hand. I will freely admit that this is very tedious, but it sometimes becomes necessary. In this case you may want to try the following. First, find Stan Krzywdzinski ( current FATMEN manager ) and let him know that you need to make a ".kumac" file of a list of files to be marked bad in FATMEN. He should then give you the format of how the list should appear. I'll give a brief overview here with the example, but Stan would probably best describe it to you in person. We will start with one dsn file, that being ALL_086836_11.X_DST01REU1215_ALL00_NONEX00_501907 . Now you must try to convert it into a generic filename. Using dsn_to_gen.com will only give you a partial generic filename like //FNAL/D0/CLDA/93P1E18/DST01REU1215/ALLALL00NONEX00/*/R086836P11 . The actual generic filename will look more like //FNAL/D0/CLDA/93P1E18/DST01REU1215/ALLALL00NONEX00/08680/R086836P11. You need the full generic filename in order to make an accurate .kumac file. You will also need to know the " Key Serial Number " or KSN for short. To obtain what you need to know for making a .kumac file, try the following in PROMAN...
D0FSI> fm
FM> set/mediatype -1
FM> ls -bn //FNAL/D0/CLDA/93P1E18/DST01REU1215/ALLALL00NONEX00/*/R086836P11
All of the information that you will need should be available after executing the above commands. First, you will have the corrected directory ( without the * ). Second, the KSN, and last, the VID ( essentially the same as VSN ). You will need these parts to make a .kumac file. Now, go to the directory in which you want to create your .kumac file. Del_mgmt in the realm of COPYMAN might be a nice place to work from. Start a new file in the editor with a name like junk.kumac. The correct format for "moving" a file in FATMEN is ...
mv <generic filename> _
<generic filename> <ksn> _
<location code> <data representation> <mediatype> <VSN>
<VID> <FSN> <dsn> <hostname>
If we only wanted to make a .kumac file for moving the generic filename above so that it was changed to mediatype 13, and no other changes were to be made, the format might look like...
mv //FNAL/D0/CLDA/93P1E18/DST01REU1215/ALLALL00NONEX00/08680/R086836P11 _
//FNAL/D0/CLDA/93P1E18/DST01REU1215/ALLALL00NONEX00/08680/R086836P11 1019 _
! ! 13 WNE464 WNE464
Any unnecessary elements can be substituted with an " ! ". There is no need for trailing exclamation points. If you look at the VMS User's Manual you should be able to find some hints on how to make compiling a list by hand somewhat easier by using the correct commands in the editor ( like "learn" and "repeat" ). Once you have compiled your list, you have 2 options. You can simply send the list to Stan and let him know that you want the changes made in FATMEN, or you could do this yourself ( with proper authority ) by entering commands in PROMAN like...
D0FSI> fm
FM> exec usr$root21:[copyman.tape_del.mgmt_area]junk.kumac
The .kumac file that you just created actually has a command ( or series of commands ) built into it. " mv " is a command for " move ". " exec " tells FATMEN to execute the file that you specify, essentially executing a series of commands more quickly than doing it by hand. If you are ever unsure of what to do in FATMEN, ask Stan or refer to the on-line FATMEN manual referenced above. Now that you have a brief idea of what you might find in FATMEN, it is probably best to offer another fair warning.
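Typing these " mv " entries by hand for a whole tape gets old quickly. As a rough sketch ( in Python, not DCL, and not an existing COPYMAN utility ), the three-line entry from the example above could be generated like this. The function name is made up, and the layout simply copies the example entry for marking a file with mediatype 13; check the format with Stan before sending anything to FATMEN.

```python
def kumac_mv_entry(generic, ksn, vsn, vid=None, mediatype=13):
    """Format one 'mv' entry for a .kumac file.

    Hypothetical helper that reproduces the example entry in this
    chapter: same generic filename twice, the KSN, '!' placeholders
    for location code and data representation, then the mediatype,
    VSN and VID. Trailing fields are omitted, as in the example.
    """
    if vid is None:
        vid = vsn                      # the VID is essentially the VSN
    lines = [
        f"mv {generic} _",             # '_' continues the command
        f"{generic} {ksn} _",
        f"! ! {mediatype} {vsn} {vid}",
    ]
    return "\n".join(lines)

entry = kumac_mv_entry(
    "//FNAL/D0/CLDA/93P1E18/DST01REU1215/ALLALL00NONEX00/08680/R086836P11",
    ksn=1019,
    vsn="WNE464",
)
print(entry)
```

Looping this over the full generic filenames and KSNs taken from " ls -bn " would give you the body of junk.kumac, one entry per file.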
VI.B.1.C - Another Fair Warning
You will not always find what you are looking for in FATMEN no matter how hard you try. Sometimes entries just are not made in FATMEN or are made incorrectly. Do not be surprised to find some strange things in FATMEN like the following example for tape with VSN WNG004.
Let's say that I tried to copy ( read ) tape WNG004 with COPYMAN's tape_in utility ( to be described later ) and saw that the 107th ( last ) file on the tape resulted in a " parity error ". My first instinct might be that the tape is slightly ( physically ) damaged in that particular tape position. So I might try to mark that file bad in FATMEN with no success. Then I would try to look in FATMEN and see what it says about this file with a command like the following...
FM> ls -bn //FNAL/D0/CLDA/93P1E18/DST01REU1219/ALLALL00NONEX00/*/r090434p05
And FATMEN told me that this file should actually reside at WNG016, FSN 001. If I then tried to read tape WNG016, I would find that the file does exist at WNG016, FSN 001 and was read without any errors. This happens every once in a while. Sometimes on the production farm, a file will be read in from the spooler ( input or output, I am not certain ) to be put onto a tape, but there is not enough room on the tape to fit this file. The spooler will then hold the file in the queue until it can find a place to fit it ( another tape ). This is nothing to be alarmed about and no changes need to be made to the catalogue. It's just a pest to your job. Also, you may find that with list errors, sometimes FATMEN thinks that a filename should reside on a different tape, or FATMEN has no listing for the file at all. Why things like this occur in the catalogue is not clearly understood by me. I am told that this problem is maintained by other people and is not my concern. So, there is no need for it to be of any concern to you.
VI.B.1.D - Some advice to follow while in FATMEN
While in FATMEN, it may be unclear of where to look for certain files. The FATMEN catalog has a directory structure that must be followed with attention to detail. Most files that you will be looking for in FATMEN will start with //FNAL/D0/CLDA. This translates as collider data from the D0 Detector at Fermilab. Monte Carlo data taken for the D0 collaboration will have the directory extension CLMC instead of CLDA. Some practical advice to take while in FATMEN would be to use the "ld" ( list directory ) command to see some options of subdirectories and then the "cd" ( change directory ) command to move your way around through the catalog. Once you have changed to a new directory, you may type "ld" again to see more subdirectories of the directory that you just changed to. Probably the most perplexing directory of the 7 directories within a filename would be the fourth one. This directory is dependent upon the run number during which the data was taken. A breakdown of which directory to use for which run number is as follows...
Lowest Run Number     FATMEN Directory     Highest Run Number
        0          <      92p1e18       <       72250
    72250          <      93p1e18       <       93887
    93887          <      95p1e18       <       94817
    94817          <      95p1e06       <       95398
    95398          <      95p1e18
So, for DSN filename ALL_087446_52.X_DST01REU1215_ALL00_NONEX00_5010819 , the run number is 87446 and the file is therefore found in the 93p1e18 directory in FATMEN. Now might be a good time to look into the Production Data Base ( PDB ) and see what it has to offer to help you in your pursuits here at D0.
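The run number breakdown above is just a range lookup, so here is a small Python sketch of it ( a made-up helper, not something that exists in COPYMAN or PROMAN ). The directory names and boundaries are copied straight from the breakdown. One assumption to be aware of: the breakdown does not say which directory a run sitting exactly on a boundary belongs to, so this sketch arbitrarily files it under the lower range.

```python
from bisect import bisect_left

# Upper run-number bound of each range, and the directory for that range.
# The last directory has no upper bound in the breakdown.
_UPPER_BOUNDS = [72250, 93887, 94817, 95398]
_DIRECTORIES = ["92p1e18", "93p1e18", "95p1e18", "95p1e06", "95p1e18"]

def fatmen_run_directory(run):
    """Return the fourth FATMEN directory for a run number.

    Assumption: a run exactly equal to a boundary value is placed in
    the lower range, since the breakdown leaves that case undefined.
    """
    if run <= 0:
        raise ValueError("run number must be positive")
    return _DIRECTORIES[bisect_left(_UPPER_BOUNDS, run)]

run_dir = fatmen_run_directory(87446)
print(run_dir)
```

Remember that the catalog examples in this chapter write the directory in capitals ( 93P1E18 ), so you may want to upper-case the result before pasting it into a generic filename.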
( Production Data Base, although far more eloquent than FATMEN, can be recalcitrant at times )
VII.A - What is SQL?
SQL stands for Structured Query Language. I really don't know much about its development, it is merely a language that we use to manipulate our Production Data Base ( PDB ) here at D0.
VII.A.1 - Why should I care?
Being able to properly manipulate PDB with SQL will enhance your ability to find information about data. Your main concern is making certain that FATMEN is as correct as possible. Sometimes you may have a questionable file or files about which FATMEN has no information ( even if you use the " set/mediatype -1 " trick ). No one is really sure why this happens, but it does. It's your job to deal with it and overcome these obstacles to help maintain FATMEN properly. In this chapter, I wish to go over some fundamentals of using PDB with some examples that should help you.
VII.B - Using SQL to your advantage
SQL can be a very handy tool to know ( many people use it in the private sector ) and is not to be taken lightly. In this section, I will go over some routine examples. Should you need to delve deeper into PDB, you have a few options. First, ask around. Chances are that a lot of people here know how to manipulate PDB better than you. Stan Krzywdzinski is probably your best bet. He is the current FATMEN manager and is quite familiar with how to manipulate PDB. Second, if you need really detailed information about SQL, refer to the VAX Rdb/VMS Guide to Using SQL/Services manual. A copy of this reference should be in Stan's office. Third, once you are in SQL, you may use the on-line help manual simply by typing "help". Now that we have covered how to get more information on how to use SQL, it might be handy to know how to enter PDB. It's really quite painless. From nodes like D0FSA, D0FSE, D0FSI, or D0FSU, make certain that you are logged on as PROMAN and type the following commands...
D0FSI> d0setup prod_db
D0FSI> sql
SQL> set transaction read only wait;
The last command is quite possibly the most important. It will set your entire session in SQL as read only, therefore eliminating any possibility of you altering the database by accident. Once you are in PDB, leaving is really quite simple: press the " control " and " z " keys simultaneously and you will be returned to your original node as PROMAN. Now that you know how to get in and out of PDB, let's go over some things that you might do while there.
VII.B.1 - Examples of...
a) Finding a creation date
As I have said in prior chapters, you have to keep meticulous records of your work. Let's say for example that you found that a certain tape ( ON0001 ) is compressed. Not that a compressed tape should alarm you; it is just a good idea to keep a record of which tapes you have found to be compressed so that if a question about the tape should come up again, you have a quick reference instead of going through the trouble of testing it over again. In COPYMAN's realm, the del_mgmt directory to be specific, there exists a file called SCSI_TAPE_PROB.LIST that contains info about tapes with problems other than SCSI bus resets ( more details later ). If you have determined that a tape is compressed, you must enter some information about the tape into this list. You can get much of this info within COPYMAN ( tape VSN, data type, and problem ) but some other necessary information is not so obvious. Yes, you can probably guess the creation date by looking at the timestamp of a sample file from the tape, and you can probably get the number of files on a tape by executing del_util:all_prefix_tape_del.com for the tape. But there are some faster and easier ways to get this information. That leads us to PDB. You still require the number of files on the tape, the creation date, the drive and node created on, and the data type. Now, try entering PDB again and we'll go over some commands to find the remainder of the necessary information...
SQL> show tables
This command will give you a list of tables from which you can extract information, provided you give the proper parameters. Now try...
SQL> show table proc_tape
This command will show you which parameters you can use within the table. The "column names " are like types of files within a directory. Tape_id is like a .txt file whereas node_name might be like a .list file. The table is like a directory and you will want to search through its contents to find any information that you might need. The commands that you will enter in SQL are somewhat similar to DCL commands for searching a directory. I guess the basic format for a search in SQL would be...
SQL> sel all <column name(s)> from <table> where <column name>='<input>';
An example command would be like...
SQL> sel * from proc_tape where vis_tape_label='ON0001';
This command will give you all corresponding matches from every column name ( due to the wildcard "*" ) in the table ( proc_tape ) for a VSN ( vis_tape_label ) of ON0001. Now you have the drive ( a.k.a. tape unit name ), node, and creation date for this tape. You still need the number of files on the tape and the data style. To get that you might want to enter a command like...
SQL> show table sta
SQL> sel all filename from sta where tape_id=13583;
The number of "rows selected" will tell you how many files should be on the tape, and you can take the data style from any filename, in this case " STA 1213 ". You probably have several questions right now.
The first question ought to be, "How did you know to look into table STA for this info?" Simple, I guessed. I could have looked in many places, but I was most likely going to find it in STA or DST.
Second, "Why are some entries in COPYMAN's del_mgmt:scsi_prob_tapes.list preceded by 'CLDA' instead of 'RAW', 'STA', or 'DST' for the data style column?" Good question, I don't know. Almost all data that you are going to be researching will be of the form "CLDA" or " collider data ". This is real data from the D0 detector. Very rarely will you ever have to worry about "CLMC" or " collider Monte Carlo " data generated by a computer simulation. It is best to remark upon the reconstruction style rather than the collider type. Note that no records are kept in PDB for CLMC data. If you should run across some questionable CLMC data, you should probably make an additional note in COPYMAN's del_mgmt:scsi_prob_tapes.list, so that you don't confuse it with CLDA. But CLDA should be the default data style and you don't have to waste time including it.
Third, "Why did you follow some commands with a ";" but not others?" Another good question. The ";" should be used when selecting some information or setting a transaction. The ";" is not necessary for " showing ", like showing a table. I didn't write the code, I don't know why this is, just deal with it.
Fourth, "Why did you include single quotes around your input query for some commands and not others?" SQL needs the single quotes to match exactly any input containing a character that is not a number. If your input is entirely numeric, you may leave the quotes off, but you do not have to. I usually just always use the single quotes so that I don't have to think too deeply about whether or not to use them. Develop your own style, I'm merely letting you know your options.
VII.B.1.b - also finding backups
Every once in a while, you may encounter a tape that is clearly bad ( overwritten, damaged ) and none of its files are usable. In this case, it may be a good idea to find out if a backup copy exists. Most DST tapes ought to have a backup copy on DLT format. Some other tapes will also have a backup of some form or other. When you are curious whether a backup exists for a tape, you may want to check FATMEN first by listing a sample file from the bad tape. If more than one entry is listed for the filename, there is a good chance that a backup copy exists. If this search is inconclusive, you will have to go to PDB. If you wanted to know if a backup existed for a tape like WNE795, you may want to try the following...
SQL> sel * from proc_tape where vis_tape_label='WNE795';
You will probably be most interested in the tape_id. Take this info and...
SQL> sel * from dup_tape_copy where tape_id=18284;
Dup_tape_copy is a table for files that were copied from tape to tape ( usually 8mm to DLT ). This command will give you a large list of files that were on WNE795 and were later copied over to another tape. You will need to take a file_id from this list to find out the VSN of the DLT copy. Try the following...
SQL> sel all tape_id from dst where file_id=187567;
SQL> sel all vis_tape_label from proc_tape where tape_id=18246;
Now you at least know that a backup copy exists, DZ0062 ( Most DLT backup tapes will start with the prefix " DZ "). Eventually you will have to figure out how to update FATMEN to forward a user to the DLT copy. Talk to Stan Krzywdzinski about this matter as needed. There are several other ways that you could have found the same information, this was just one example. When a problem does arise, take some time to develop your own style for manipulating PDB. Play around with the various tables and their corresponding column names, and you will quickly see what PDB can do for you. This may seem intimidating at first, but you will catch on.
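The lookup chain above ( tape label to tape_id, tape_id to file_id, file_id to the copy's tape_id, and back to a label ) can be sketched in Python. This is only an illustration under loud assumptions: it uses an in-memory sqlite3 database as a stand-in for PDB, the tables hold just the rows quoted in this section, the helper names ( build_toy_pdb, find_backup_label ) are hypothetical, and the real tables have more columns and may return several rows per step.

```python
import sqlite3


def build_toy_pdb():
    """Create an in-memory stand-in for PDB with only the rows quoted in the text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE proc_tape (tape_id INTEGER, vis_tape_label TEXT);
        CREATE TABLE dup_tape_copy (tape_id INTEGER, file_id INTEGER);
        CREATE TABLE dst (file_id INTEGER, tape_id INTEGER);
        INSERT INTO proc_tape VALUES (18284, 'WNE795'), (18246, 'DZ0062');
        INSERT INTO dup_tape_copy VALUES (18284, 187567);
        INSERT INTO dst VALUES (187567, 18246);
        """
    )
    return conn


def _first_value(conn, sql, arg):
    """Run a one-parameter query and return the first column of the first row, or None."""
    row = conn.execute(sql, (arg,)).fetchone()
    return None if row is None else row[0]


def find_backup_label(conn, bad_label):
    """Follow proc_tape -> dup_tape_copy -> dst -> proc_tape for one tape label."""
    tape_id = _first_value(
        conn, "SELECT tape_id FROM proc_tape WHERE vis_tape_label = ?", bad_label
    )
    if tape_id is None:
        return None
    # Any one file copied off the bad tape is enough to locate the copy.
    file_id = _first_value(
        conn, "SELECT file_id FROM dup_tape_copy WHERE tape_id = ?", tape_id
    )
    if file_id is None:
        return None
    copy_tape_id = _first_value(
        conn, "SELECT tape_id FROM dst WHERE file_id = ?", file_id
    )
    if copy_tape_id is None:
        return None
    return _first_value(
        conn, "SELECT vis_tape_label FROM proc_tape WHERE tape_id = ?", copy_tape_id
    )


if __name__ == "__main__":
    print(find_backup_label(build_toy_pdb(), "WNE795"))
```

Returning None at each step mirrors what happens at the SQL prompt when a query selects no rows: the chain stops and you have to try another table.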
VII.B.1.c - good estimate of what files should appear on a tape
Let's say that you have a tape that you suspect may be overwritten and want a quick approximation of what files should appear on the tape. PDB can help you with a situation like this. Use commands similar to the above example to find an approximation of how many files should appear on YY1884.
SQL> sel all tape_id from proc_tape where vis_tape_label='YY1884';
SQL> sel all filename from strm_sta where tape_id=10861;
And you will see that 43 files should appear on this tape, when in fact only 1 does ( physically ). Clearly this tape was overwritten and you should follow the measures described in later chapters.
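The comparison described above ( what PDB says should be on the tape versus what is physically there ) can be sketched as a small Python helper. It assumes you have already collected the two lists of filenames by hand, for example from the SQL output and from a tape directory; the function name is hypothetical, and names are upper-cased on the assumption that case does not matter for these filenames.

```python
def compare_tape_contents(expected, found):
    """Return filenames missing from the tape and filenames PDB did not predict."""
    expected_set = {name.strip().upper() for name in expected if name.strip()}
    found_set = {name.strip().upper() for name in found if name.strip()}
    return {
        "missing": sorted(expected_set - found_set),
        "unexpected": sorted(found_set - expected_set),
    }
```

A long " missing " list with only one or two files found is the pattern you would expect from an overwritten tape like the one in this example.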
VII.B.1.d - etc.
PDB can help you in many ways that could not be described in this manual. You will simply have to play around with it for a while, and when you discover a new way to retrieve some information, take careful notes on it. This way you can refer back to your notes when a similar problem arises.
VII.C - More fair warnings
At this time I feel the need to warn you of some limits to PDB. Yes, it can help you a lot, but only if you really know how to use it properly. The following will be an example of how to handle a really odd situation.
Recently, Nobuaki Oshima sent me some mail saying that Heidi Schellman was having a hard time locating 8mm copies of certain files. She felt certain that only DLT copies existed because that was all that she could find in FATMEN. Nobu passed this problem onto me because I had helped him with some PDB problems before. Let's take an example of how I found an 8mm copy of the file ALL_086816_01.X_DST01RTU1215_ALL00_NONEX00_5020121 .
First thing that I tried was a command like the following...
SQL> sel * from dst where fm_filename='R086816P01';
This gave me 3 tape_id's that I could have chosen from. But I wanted the tape_id that had an exact match with the above DSN filename. In that case I did the following...
SQL> sel all vis_Tape_label from proc_tape where tape_id=18284;
This led me to tape WNE795. I felt content for the most part at this point, but wanted to investigate further. I then tried to copy this tape using the tape_in utility ( to be discussed later ) in COPYMAN's realm to see if this file really did physically reside on the tape. As it turns out, it did, and I was happy with my finding. So I wrote Nobu and told him what I had found. He wrote back saying that the file I had found was really of little interest. If you look at the filename, you will notice a reconstruction type called "RTU". This is a test reconstruction, and how it ended up on an 8mm tape is a mystery. Nobu then sent me on another quest: to find an 8mm copy of the file with a more standard reconstruction type like "REU". So my search started again. I entered the first command above and it only led me to the DLT tape DZ0046. At first I felt like I might have a real problem on my hands. But I tried something different, something that proved worthwhile. I started from scratch with a command like the following...
SQL> sel * from raw_file where fm_filename='R086816P01';
This gave me a raw_file_id of 138238. I then looked into where this file may have gone after that with a command like...
SQL> sel * from raw_file_sta where input_file_id=138238;
This gave me 2 options for output_file_id's. I pursued both by using similar commands in the sta_dst table which gave me even more file_ids. Last I checked all of these file_ids with a command like...
SQL> sel * from dup_Tape_copy where file_id=139278;
Then lastly I checked proc_tape with the tape_id that this command gave and got the correct tape VSN of WNE657. I then performed another copy job using the tape_in utility ( to be discussed later ) and found that this file ( with "REU" reconstruction ) did physically appear on the tape. I sent my results to Nobu and Heidi and they were pleased. It is still not clear whether or not to change PDB to make such files easier to find or to bother changing FATMEN if we have a perfectly usable DLT copy to refer users to. But the point remains that PDB can find out a plethora of information for you, if you know how to use it.
By this time you are probably wondering what a lot of these " utilities " are that I have mentioned, or what the best way to keep records is. In that case, let's move on to the next chapter...
( Nothing to fear, this chapter will make your life much easier)
This chapter will help you through all of the whys and wherefores of both COPYMAN and PROMAN realms. As I have mentioned before, many utilities have already been designed and are at your disposal. Both COPYMAN and PROMAN host a variety of utilities to aid you in your job. With each utility described here, there will be a brief explanation of how and why to use them. Also, I will include some good places to look for log files and where to keep your own records. Good luck!
COPYMAN is the fellow to use for diagnosing and managing bad tapes and/or bad files. This section exists in an outline format and starts with a main directory, then works its way through subdirectories, telling you of anything significant along the way.
D0FS::USR$ROOT21:[COPYMAN]
This is the main directory of COPYMAN. In general, there is a lot of junk here. Probably the only thing of significance is the .rhosts file. The first time that you log on as COPYMAN, you will probably need the password. To avoid needing it afterwards, edit the .rhosts file and add the entry that lets you log on without the password. You will need to enter the node that you will be logging in from, and your username. If you may be logging on as COPYMAN from more than one node, you will have to enter each node individually. This is just some preliminary bookkeeping that will save you some time later. Now let's go into some subdirectories ( and their subdirectories, etc. ) of [COPYMAN] .
VIII.A.1 - D0FS::USR$ROOT21:[COPYMAN.COPYMAN]
This directory is of particular interest to you. It contains some very useful utilities for tape copying jobs. To get to this directory ( assuming you are already logged on as COPYMAN ) simply enter...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd cm_area
Once you are in, try looking at some of the command files...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> dir *.com
There are only a couple of commands that you will be most interested in.
VIII.A.1.a - Commands in cm_area
VIII.A.1.a.1 - TAPE_IN.COM
This command will aid you in tape-copying jobs. It will copy a tape to disk if you give the right parameters. IMPORTANT: using " nl: " as the disk name will merely create a directory of the tape's content and store the directory in cm_log:<tape VSN>_in.log ( to be described later ). Here is how it works...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> tape_in
This utility will ask you a series of questions. Fill in the blanks ( when necessary ) and don't forget to enter " nl: " for the disk name at the appropriate prompt for almost all of your tape copying jobs. Also, when asked to enter a node, use a node that has tape drives that you are certain are working. Right now, D0TSUS$MKA400 ( you only need enter D0TSUS ) is probably the best working drive that you have access to at FCC. In the past, you were supposed to be able to use D0RSES$MKA300. This drive was supposed to be reserved exclusively for your use and was a very good working drive. Recently, this drive was taken away and replaced with a really crummy drive. You can still use D0RSES$MKA300, but you may get frustrating results. Shop around for drives that seem to work decently and keep a record somewhere.
VIII.A.1.a.2 - TAPE_OUT.COM
This command will let you copy a tape to another tape ( useful for making backup copies ). It is written on a similar premise to that of tape_in of making user inquiries before executing. Terry Heuring uses this command more frequently than you ever will. If you have any questions, he would be a good person to speak with. Dorota Genser wrote this utility, so she would be the best person to speak with, but she does not work for D0 anymore and you may experience difficulty reaching her.
VIII.A.1.b - Subdirectories of cm_area
VIII.A.1.b.1 - D0FS::USR$ROOT21:[COPYMAN.COPYMAN.LOG]
To get to this directory, simply type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd cm_log
This directory contains log files of tape-copying jobs ( using tape_in ). The format for these log files is <tape VSN>_in.log. If you have completed a tape_in job, look at the log file in this area for any errors corresponding to any particular files. Errors may have to be looked into more closely or acted upon by another utility ( like del_util:all_prefix, del_util:bad_file_take.com, del_util:remove.com, or perhaps [proman.disk_maint]fm_check.com ; all to be described later ) depending upon the circumstances involved.
VIII.A.1.c. - Useful lists in cm_area
VIII.A.1.c.1 - OUTPUT_TAPES.LIST*
These are a series of lists that give some information about 8mm backup copies that were made. If you ever want to know whether or not an 8mm backup copy of a certain 8mm tape was made, this is the place to look. Let's say you wanted to know if an 8mm backup copy for tape WNF108 existed. You would enter a command like...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> sear output*.list*;* WNF108
VIII.A.2 - D0FS::USR$ROOT21:[COPYMAN.TAPE_CHECK]
Another directory of interest. It contains utilities for tape testing and for creating lists of tapes to be copy-protected, subdirectories of tape dumps, lists of tapes to be tested, lists of tapes that have been tested, etc. To get to it simply enter...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd tc_area
Once you are here, try to perform a directory. Notice that there is a lot of stuff here. Don't worry, there are only a few items of real concern. First, some utilities that you should know about.
VIII.A.2.a - Commands in tc_area
VIII.A.2.a.1 - TAPE_CHECK.COM
This utility is very effective with handling mount_errors. Its premise is really simple. If you give it an input list ( see [COPYMAN.TAPE_CHECK.LIST] in section B ) of tapes that are potentially unmountable, the utility will attempt to mount each tape. Records of this job are kept in tc_area:tc_tested_tapes.list ( see section C ) . It is very simple to use, simply enter...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> tape_check tc_list:check_<date>.list
VIII.A.2.a.2 - COPYLIST_MAKER.COM
This is a utility that I wrote to keep up with copy-protecting tapes. As I have mentioned before, frequent copy-protecting will greatly reduce the chance of tapes becoming overwritten. While this will not totally solve the problem of overwriting, this will save you a lot of time in the long-run. Copylist_maker.com should run once per week and send you a list of tapes to be protected. In order to receive the mail, you need to be on the " .dis " list ( see section C ). Every once in a great while, the fnald0 cluster will run into problems and need to re-start itself. However, it will not always know to restart copylist_maker.com. Should you notice that copylist_maker.com is not submitted to run, simply type the following command...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> @start_copylist_maker.com
and copylist_maker.com should run again.
VIII.A.2.a.3 - PROTECT2.COM
You will use this command after you have completed the task of copy-protecting the tapes that copylist_maker.com sent you mail about. The mail that gets sent to you should give you advice on how to execute this command. However, there is always the possibility that copylist_maker.com will ask you to copy-protect some tapes that are not in the vault area when you arrive. Should this occur, you will want to make a note of which tapes could not be found in the vault and try again next time. I like to use a highlighter pen to mark those tapes which I could not find. When I am " done ", I edit the <date>_prot2.list list and delete all entries which I was unable to find. Then rename the corrected list as recommended in the mail that you receive, and execute protect2.com .
VIII.A.2.b - Subdirectories of tc_area
VIII.A.2.b.1 - [COPYMAN.TAPE_CHECK.DUMP]
This directory contains output files from tape dump jobs. To get to this directory, simply type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd tc_dump
This directory also has many subdirectories, most for dumps for a specific prefix. For example, if you were to now type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> down wne
Your default directory would be set to that of tape dumps for tapes beginning with the prefix " WNE ". If a tape is marked as " MOUNT ERROR " in tc_area:tc_tested_tapes.list from a tape_check job, it will be sent only to tc_dump. After you evaluate the dump ( see Josh's manual ), it is a good idea to rename the dump file to its appropriate subdirectory.
VIII.A.2.b.2 - [COPYMAN.TAPE_CHECK.LIST]
This directory contains numerous lists of tapes that were potentially unmountable ( mount_error ) and sent to tape_check to be test-mounted. The preferred format for naming one of these lists is ...
check_<date>.list
Type out one of the files in this directory and you will notice that when making multiple entries to a list, it is best to give each tape VSN its own line. When I first started, I made the mistake of separating entries with commas. No one told me not to, so I tried, failed, and learned from my mistakes. I only mention this to save you some aggravation.
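If you ever end up with a comma-separated list anyway, a few lines of Python can turn it into the one-VSN-per-line form described above. This is a minimal sketch and not part of the COPYMAN utilities: the function name is hypothetical, and upper-casing the VSNs is my own assumption based on how they are written throughout this manual.

```python
import re


def one_vsn_per_line(text):
    """Split on commas and whitespace, then return the VSNs one per line."""
    vsns = [token.upper() for token in re.split(r"[,\s]+", text) if token]
    return "\n".join(vsns) + ("\n" if vsns else "")
```

For example, passing "wne795, WNF108,ON0001" gives back three lines, one VSN each, ready to be saved as a check_<date>.list file.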
VIII.A.2.b.3 - [COPYMAN.TAPE_CHECK.LOG]
This directory contains log files from tape_check jobs. If a tape shows a mount_error and the dump was unsuccessful, you may want to check the log file to see what happened. Sometimes the operators at FCC will write back saying that the " tape is damaged " or the " tape is not in vault " or something to give you a clue of what to do next. To get to this directory simply type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd tc_log
VIII.A.2.c - Useful lists in tc_area
Before looking at the following lists, are you in the tc_area? If you are not sure, enter the following command...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> show def
If the response is not USR$ROOT21:[COPYMAN.TAPE_CHECK] , change your default directory to tc_area by...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd tc_area
and proceed.
VIII.A.2.c.1 - TC_TESTED_TAPES.LIST
This list contains many entries with information about tapes that have been tested with tape_check, or tapes that have been copy-protected. For tapes that were checked by tape_check, their format will look like ...
<tape VSN> - mount <OK or ERROR> : <node> <drive> : <time>
For tapes that were copy protected, their format will look like ...
<tape VSN> - protected : : <time>
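The two entry formats above can be read with a short Python sketch, which may help if you want to pull out every tape with a mount ERROR. Treat it as an illustration only: the function name is hypothetical, it assumes the node and drive fields contain no colons, and the sample timestamps in the usage note are made up because this manual does not show a real entry.

```python
def parse_tested_entry(line):
    """Split one tc_tested_tapes.list style entry into its fields.

    Raises ValueError if the line does not contain the two ':' separators.
    """
    # Split on the first two colons only, so colons inside the time survive.
    head, where, when = (part.strip() for part in line.split(":", 2))
    vsn, _, status = head.partition(" - ")
    node, _, drive = where.partition(" ")
    return {
        "vsn": vsn.strip(),
        "status": status.strip(),
        "node": node or None,           # empty for "protected" entries
        "drive": drive.strip() or None,  # empty for "protected" entries
        "time": when,
    }
```

A line such as "WNE795 - mount ERROR : D0TSUS MKA400 : 6-OCT-1996 10:15:00" comes back with status "mount ERROR", while a " protected " entry comes back with node and drive set to None.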
VIII.A.2.c.2 - *PROT2.LIST
This list, as mentioned before, will be sent to protect2.com to mark a list of tapes as being copy-protected. Let's use the example of 6-OCT-1996_PROT2.LIST . Once I was finished in the vault, I went into the editor to remove the entries that I could not find there. Once that was complete, I performed the following...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> purge/noconfirm *prot2.list
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> ren 6-OCT-1996_PROT2.LIST prot2.list
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> @protect2.com prot2.list
And protect2.com did what it does best, marking the proper tape VSN's as being protected in tc_tested_tapes.list. This may not seem really important to you, but if you do not mark a tape that you copy-protected as " protected " in tc_tested_tapes.list, copylist_maker.com will ask you to copy-protect it again next week. So you might end up searching for tapes in the vault that you already protected and otherwise waste time. Just a fair warning.
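The hand-editing step above ( deleting the tapes you could not find in the vault before running protect2.com ) amounts to a simple filter, sketched here in Python. It assumes the prot2 list holds one tape VSN per line, which this manual does not actually state, and the function name is hypothetical.

```python
def drop_missing(listed, not_found):
    """Return the listed VSNs minus the ones that could not be found in the vault."""
    missing = {vsn.strip().upper() for vsn in not_found}
    return [
        vsn.strip()
        for vsn in listed
        if vsn.strip() and vsn.strip().upper() not in missing
    ]
```

Whatever remains is the set of tapes you really did copy-protect, so only those get marked as " protected ".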
VIII.A.2.c.3 - COPYLIST_MAKER.DIS
You will DEFINITELY want to put yourself onto this list. If you do not, you will not be able to receive the mailed list that copylist_maker.com worked so hard to make. It's easy to add yourself to this list...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> edit copylist_maker.dis
and enter your preferred E-mail address. Press " control " and " z " to exit and you should be receiving mail from copylist_maker.com in the near future.
VIII.A.3 - D0FS::USR$ROOT21:[COPYMAN.TAPE_DEL]
This directory is the main area for managing bad tape files, with several subdirectories ( and subdirectories of those subdirectories, etc. ) and can be challenging to find your way around in. I'm going to go against the format that I used above and start with each subdirectory and any important information that they may contain. You'll figure it out!
To get to D0FS::USR$ROOT21:[COPYMAN.TAPE_DEL] , you need only type the following...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd del_area
Once you are here, you may want to try and view the number of subdirectories involved. To do that, enter the following...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> dir .dir
You will notice several ( 9 ) subdirectories, but you are most interested in 5 of them. These are...
USR$ROOT21:[COPYMAN.TAPE_DEL.LIST_ALL]
USR$ROOT21:[COPYMAN.TAPE_DEL.LIST_DEL]
USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA]
USR$ROOT21:[COPYMAN.TAPE_DEL.TAPE_MOVED]
and
USR$ROOT21:[COPYMAN.TAPE_DEL.UTIL]
VIII.A.3.a - Subdirectories of del_area
VIII.A.3.a.1 - USR$ROOT21:[COPYMAN.TAPE_DEL.LIST_ALL]
This subdirectory contains lists of tape files in tapes_moved format ( see section 3.a.4 USR$ROOT21:[COPYMAN.TAPE_DEL.TAPE_MOVED] ) as well as the corresponding generic FATMEN filename. These lists are the result of the execution of del_util:ALL_PREFIX_TAPE_DEL.COM . The format for the names of these lists should look like < tape VSN(s) >_<date>.list_all. To get to this directory simply type ...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd del_all
VIII.A.3.a.2 - USR$ROOT21:[COPYMAN.TAPE_DEL.LIST_DEL]
This subdirectory also contains output lists created from del_util:ALL_PREFIX_TAPE_DEL.COM. These lists are somewhat different from those of *.list_all. They contain FATMEN filenames that have already been sent to the FM_RM_SERVER to be marked bad ( mediatype 13 ). The format for the names of these lists should look like < tape VSN(s) >_<date>.list_del_sent . To get to this directory simply type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd del_del
VIII.A.3.a.3 - USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA]
This subdirectory also has many of its own subdirectories ( to come later ). However, within [COPYMAN.TAPE_DEL.MGMT_AREA] there are many lists that you should be interested in. First, to get to this area simply type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd del_mgmt
VIII.A.3.a.3.a - IMPORTANT lists in del_mgmt
VIII.A.3.a.3.a.1 - BAD_FILES.LIST
This list contains entries of bad files from tapes that are only partially bad. These entries should also have been marked bad in FATMEN. The format for each entry should look like...
VSN FSN Filename < date > reason in FATMEN?
Most information to be supplied to each entry can be found from the TAPE_IN log file ( cm_log:<VSN>_in.log ). If the filename has been sent to the FM_RM_SERVER ( to be marked bad ) simply place an " n " into the " in FM " column. You may also add your initials after the " in FM " column, but it is not necessary.
VIII.A.3.a.3.a.2 - BAD_TAPES.LIST
This list contains entries of tapes that are entirely bad for some reason other than UNIX platform problems. These entries should also have been marked bad in FATMEN. The format for each entry should look like...
VSN < creation date > in vault? in FM? problem File type del list
A good number of tapes in this list are physically damaged. If you have removed the bad tape from the vault simply place an " n " into the " in vault " column. The " del list " column can be filled with the " date " assigned to the .list_all or .list_del files given to you by del_util:all_prefix_tape_del.com .
VIII.A.3.a.3.a.3 - SCSI_TAPE_PROB.LIST
This list contains entries of bad tapes that are bad due to SCSI bus resets on the UNIX production farm. Entries should have a format like...
VSN < creation drive > < creation node > error < data > <# files> prep
The creation drive and node are to be taken from PDB ( as mentioned before ). The prep list is only for tapes marked bad in FATMEN. Not all tapes on this list are marked bad in FATMEN. If you find a tape that was written in compressed mode, make a note of it here.
VIII.A.3.a.3.b - Subdirectories of del_mgmt
VIII.A.3.a.3.b.1 - USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.FILE_DOC]
This subdirectory acts as a main directory to more subdirectories that hold documentation files. The documentation files " under " file_doc record how bad files from partially bad tapes were accounted for. These bad files break down into 4 main categories ( and 4 additional subdirectories ). These being...
VIII.A.3.a.3.b.1.a - USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.FILE_DOC.LIST_ERROR]
This subdirectory acts as a main directory to more subdirectories for sorting .txt document files into categories by tape prefix ( same principle as tc_dump's subdirectories ). In the subdirectories, you will find documentation for files whose info was changed in FATMEN by using the FM_CHECK utility in PROMAN ( to be described later ). The best way to find your way around all of these subdirectories is by using 2 utilities written by Krzysztof Genser called " up " and " down ". Here's an example of how they work. Let's say you just marked some files bad in FATMEN and in del_mgmt:bad_files.list. You now want to make a documentation file clearly stating all actions performed on the bad files and you want to write this into the ON subdirectory ( assuming that the bad files were on an ON series tape ) of USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.FILE_DOC.LIST_ERROR] . Starting at del_mgmt, do the following...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> down file_doc
then...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> down list_error
and then...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> down wne
WHOOPS! I don't want to be in the WNE subdirectory, I want to be in the ON subdirectory. It's simple to fix, simply enter...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> up
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> down on
This is an easy way to move around through directories without logical names assigned to them ( like " tc_area " , " del_mgmt ", etc. ). For some reason, this is not used in PROMAN ( to be described later ), but I suppose that it could be if you really wanted it so.
VIII.A.3.a.3.b.1.b - USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.FILE_DOC.OPINCOMPL]
This subdirectory also has many subdirectories categorized by tape prefix. These subdirectories are for files with unrecoverable read_errors that you will mostly find out about from pick_events. Opincompl is short for " operation is incomplete ". This is an error that you may find in the log files from the tape_in utility. This error is probably ( I am not absolutely certain ) caused by overwriting of tape files. Of course, copy-protecting tapes on a regular basis should reduce these occurrences.
VIII.A.3.a.3.b.1.c - USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.FILE_DOC.PARITY]
This subdirectory also has many subdirectories categorized by tape prefix. These subdirectories are also for files with unrecoverable read_errors that you will mostly find out about from pick_events. When you notice a parity error in the log file of the tape_in utility, it is usually a good clue that the tape is slightly physically damaged in that particular tape position. The tape as a whole is fine, except for one or more files. There really are not very many ways to fix this problem except hoping that a backup copy was made before the tape became damaged. Should you have a tape of marginal ( borderline ) quality, check first in FATMEN to see if a backup DLT copy has been made. Next check to see if an 8mm backup copy exists in cm_area:output_tapes.list* . If neither exists, you can try to use the utility cm_area:tape_out to recover the tape. For this to work though, you need to make certain that you can completely copy the tape ( on a drive of exceptional quality ) without any errors. Then submit the tape_out job. Otherwise, mark the files bad in FATMEN and keep documentation in this area.
VIII.A.3.a.3.b.1.d - USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.FILE_DOC.TAPE_END]
This subdirectory also contains more subdirectories categorized by tape prefix. These subdirectories are for documentation of when a tape_end error arises. The tape_in utility and especially the fm_check utility ( in PROMAN, to be described later ), should help with the management of this problem. Tape_end errors originate from overwritten tapes. FATMEN thinks that more files should exist on the tape than are actually there. You try to salvage the remaining ( if any ) files on the tape and mark the rest bad in FATMEN.
If it is not clear what action to perform on questionable files, LOOK AT THE .TXT FILES IN THE FILE_DOC AREA for some advice. Within, I have clearly stated what is wrong, why, and what action I took under which circumstances. THESE ARE FOR YOUR REFERENCE, NOT ONLY TO INITIALLY LEARN, BUT TO KEEP REFERRING TO LATER!
VIII.A.3.a.3.b.2 - USR$ROOT21:[COPYMAN.TAPE_DEL.MGMT_AREA.TAPE_DOC]
This subdirectory of del_mgmt has many subdirectories categorized by tape prefix. These subdirectories are for keeping documentation of entire tapes marked bad in FATMEN.
READ THROUGH SOME EXAMPLES OF TAPE_DOC .TXT FILES FOR REFERENCE! You should notice some " trends " with bad tapes and files and how to handle them properly. Also note that I keep hard copies of these .txt files in a binder " just in case ". You may continue to do so if you wish, but I would recommend splitting up these documents into perhaps 2 binders, one for file_doc, the other for tape_doc.
VIII.A.3.a.4 - USR$ROOT21:[COPYMAN.TAPE_DEL.UTIL]
This is a particularly important subdirectory of USR$ROOT21:[COPYMAN.TAPE_DEL] . Within, you will find many utilities to aid you in preparing lists of files to be marked bad in FATMEN. To get to this directory, simply type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> cd del_util
Once you have del_util as your default directory, try typing...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> dir *.com
You will notice many commands. Only a few are of any real importance to you.
VIII.A.3.a.4.a - Commands within USR$ROOT21:[COPYMAN.TAPE_DEL.UTIL]
VIII.A.3.a.4.a.1 - ALL_PREFIX_TAPE_DEL.COM
This particular command is very handy if you are going to send a list of files to FATMEN from a tape that is ENTIRELY bad. It is simple to use. Simply type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> all_prefix < tape VSN >
You can type in more than one VSN ( like VSN1, VSN2, ... ), but executing this command on one tape at a time makes for better book-keeping ( in my opinion ). One drawback to this command is that it does not automatically create a log file of the job. Should you need a log file, you will need to submit this command ( you may use a similar format for many, many other commands if you need to customize the output of the command ) with a format like the following...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)>submit/keep/noprint/notify-
_RLOGIN::D0FS(<some d0fs node>::COPYMAN)>/queue=<some d0fs node>_sys$batch-
_RLOGIN::D0FS(<some d0fs node>::COPYMAN)>/log=<your logfile name>.log-
_RLOGIN::D0FS(<some d0fs node>::COPYMAN)>/parameter=(<tape VSN>) -
_RLOGIN::D0FS(<some d0fs node>::COPYMAN)>all_prefix_tape_del.com
NOTE: The "submit" command can be very useful if you do not wish to view the command's execution interactively. This will save on CPU time and you can use the window that you executed the command from to perform other tasks. I recommend looking at the VMS User's Manual for more details on the "submit" command. Now back to all_prefix...
All_prefix will create 2 lists for your disposal. The first is *.LIST_ALL. The "all" list contains the filenames of bad tape files and the corresponding FATMEN generic filenames to be marked bad in FATMEN. The second is *.LIST_DEL. The "del" list contains only the FATMEN generic filenames to be sent over to the FM_RM_SERVER ( see Josh's manual, or for the purpose of this document - REMOVE.COM ).
Once all_prefix is done executing, there are some other things to do. First, execute the REMOVE.COM command ( more details later ) with the *.LIST_DEL list as the parameter. Next, rename the *.list_all and *.list_del files to filenames you specify ( like del_all:wng023.list_all ) in the del_all and del_del subdirectories respectively. Once all_prefix has completed, try something like the following...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> rename del_util:_607211123.list_del -
_RLOGIN::D0FS(<some d0fs node>::COPYMAN)>del_del:WNG023_607211123.list_del_sent
Of course you don't have to use tape VSN WNG023, it is merely for example. Use whatever strategy you find helpful for naming files for your book-keeping purposes.
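The renaming convention in the example above can be sketched in a few lines of Python. This is only an illustration of the book-keeping pattern, not one of the existing utilities: the function name sent_list_name is made up here, and it assumes the all_prefix output name starts with an underscore and ends in .list_del, as in the example.

```python
def sent_list_name(vsn: str, list_del_name: str) -> str:
    """Build a book-keeping name such as WNG023_607211123.list_del_sent.

    Assumption: the input looks like the all_prefix output shown above,
    e.g. "_607211123.list_del". The function name is hypothetical.
    """
    vsn = vsn.strip().upper()          # tape VSNs are written in capitals
    base = list_del_name.strip()
    if not base.lower().endswith(".list_del"):
        raise ValueError("expected a *.list_del filename: " + base)
    if not base.startswith("_"):       # keep a separator after the VSN
        base = "_" + base
    return vsn + base + "_sent"


print(sent_list_name("wng023", "_607211123.list_del"))
```

The "_sent" suffix simply records that the list has already gone to the FM_RM_SERVER, which is the same idea as the rename shown above.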
VIII.A.3.a.4.a.2 - BAD_FILE_TAKE.COM
This command will take an input list of filenames of BAD tape files in tapes-moved format ( see VIII.A.3.a.5 for more details ) and create a list of FATMEN generic filenames to be marked bad ( mediatype 13 ) in FATMEN. To execute this utility simply type...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> @bad_file_take.com-
_RLOGIN::D0FS(<some d0fs node>::COPYMAN)>USR$ROOT21:[COPYMAN.TAPE_DEL.TAPE_MOVED]-
_RLOGIN::D0FS(<some d0fs node>::COPYMAN)> <file extension>_error.list
VIII.A.3.a.4.a.3 - REMOVE.COM
This command will take an input list of FATMEN generic filenames and send them to the FM_RM_SERVER to be marked bad ( mediatype 13 ). It is easy to use, once you have created a list of FATMEN filenames to be marked bad in FATMEN ( either by all_prefix, bad_file_take, or sadly, by hand ) type something similar to the following...
RLOGIN::D0FS(<some d0fs node>::COPYMAN)> @remove <your filename>.list_del
and the FM_RM_SERVER should do the rest. You may want to check FATMEN every once in a while to see that the job was done, but more often than not, it will be done in a reasonable amount of time. A good practice is to keep the names of your *.list_del lists as short as possible, something like _607211123.list_del. You can always rename the *.list_del file later to something that makes your book-keeping easier.
Josh's manual does mention some other utilities that I have really never used ( never had a need to ). I will mention them, but not give an example. If you feel that you can benefit from these other utilities, you may want to see if Josh has any better documentation outside of his manual...
VIII.A.3.a.4.a.4 - PDB_TAKE_BAD.COM
Apparently, if you give this command a list of good tapes, this procedure creates a list of corresponding FATMEN filenames to be marked bad.
VIII.A.3.a.4.a.5 - PDB_TAPE_CHECK.COM
Apparently, if you give this command a list of tape prefixes, it will create a list of files on a tape with the output being in tapes-moved format.
VIII.A.3.a.4.a.6 - TAPE_DEL.FOR
Apparently, this command will output a list of FATMEN filenames if you provide a list of tape files in tapes-moved format. Bear in mind that this command was written in FORTRAN and must be executed differently from a .COM file. Refer to the VMS User's Manual for more details.
VIII.A.3.a.5 - USR$ROOT21:[COPYMAN.TAPE_DEL.TAPE_MOVED]
This subdirectory contains lists of tape files in tapes-moved format. I broke down the categories to match the errors involved ( makes for good book-keeping ). You will notice that I started most of these lists with "LEX". Lex is merely my nickname and has no other real significance on the filenames. Try to use this area for creating lists of tape files in tapes-moved format rather than cluttering another directory ( like del_util ). Tapes moved format ( also mentioned in Josh's manual ) looks like...
VSN Filesequence # (A.K.A. - FSN ) DSN Filename
Example> WN8196 1 ALL_077790_43.X_STA01REU1210_ALL00_NONEX00_4051214
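If you ever handle tapes-moved lists outside of the existing utilities, a line in this format can be split into its three fields as in the Python sketch below. It assumes the fields are separated by whitespace and that the DSN filename itself contains no spaces, which matches the example above but is not guaranteed by anything in Josh's manual. The names TapeFile and parse_tapes_moved are made up for illustration.

```python
from typing import NamedTuple


class TapeFile(NamedTuple):
    vsn: str   # tape VSN, e.g. WN8196
    fsn: int   # file sequence number on the tape
    dsn: str   # DSN filename


def parse_tapes_moved(line: str) -> TapeFile:
    """Split one tapes-moved line into VSN, FSN and DSN filename.

    Assumption: exactly three whitespace-separated fields per line.
    """
    parts = line.split()
    if len(parts) != 3:
        raise ValueError("not a tapes-moved line: " + repr(line))
    vsn, fsn, dsn = parts
    return TapeFile(vsn.upper(), int(fsn), dsn)


entry = parse_tapes_moved(
    "WN8196 1 ALL_077790_43.X_STA01REU1210_ALL00_NONEX00_4051214"
)
print(entry.vsn, entry.fsn, entry.dsn)
```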
Josh does go into some detail in his manual about the FM_RM_SERVER. It makes for nice reading, but I have never used any of the information within. Take some time to read it, I think that it is more for your reference than for practical use.
PROMAN is another fellow to use for other data management purposes. Within his realm are some rather handy utilities that will help keep you on task. Logging on as PROMAN is quite similar to logging on as COPYMAN ( See V.A ) except enter "proman" as the username as opposed to "copyman". This section will also be in a loose outline format and will have notes about anything of particular importance along the way. Good Luck!
D0FS::USR$ROOT21:[PROMAN]
This is the main directory for proman and if you execute a command like...
(some d0fs node) > dir .dir
you will notice about 46 subdirectories attached to just this one directory. Many, many people at D0 utilize PROMAN for a variety of reasons, but you only need to know about a few areas of real interest. While you are logged in as PROMAN, I may as well remind you of some things that you can do in PROMAN's realm that you could not do in COPYMAN's.
ACCESSING THE FATMEN CATALOGUE:
getting into the FATMEN catalogue is easy when logged in as PROMAN. Simply type...
(some d0fs node, like D0FSI, D0FSA, D0FSE, or D0FSU) > fm
And to exit the catalogue, simply type...
FM> end
FM> exit
ACCESSING PRODUCTION DATA BASE:
To enter PDB simply type the following commands while logged on as PROMAN...
(some d0fs node, like D0FSI, D0FSA, D0FSE, or D0FSU) > d0setup prod_db
(some d0fs node, like D0FSI, D0FSA, D0FSE, or D0FSU) > sql
SQL> set transaction read only wait;
This may seem repetitive from what I had mentioned before, but I figured that there was no harm in mentioning it again.
Now, let's see more of what we can find in PROMAN's realm.
VIII.B.1 - USR$ROOT21:[PROMAN.DISK_MAINT]
Originally, I think that this area was more or less designed to aid in maintaining files on disk. It also seemed like a good place to keep a utility that I wrote called "FM_CHECK".
VIII.B.1.a - useful commands in [proman.disk_maint]
VIII.B.1.a.1 - DSN_TO_GEN.COM
This command will try to create a list of FATMEN generic filenames ( or at least close to it ) from an input list of DSN filenames.
If you have an entire tape's worth of DSN filenames that you want converted to generic filenames, and a log file of both tape_in and tape_dir jobs exists in COPYMAN's realm, simply execute [proman.alex]textmaker.com ( to be described later ) for the tape VSN in question and you should appreciate the results. Otherwise, create a list of DSN filenames ( the first parameter for the command ) by hand and execute ( @ ) this command to get generic FATMEN filenames.
VIII.B.1.a.2 - FM_CHECK.COM
This utility I wrote in an attempt to aid us in the problem of list_error files. If it really does work as well as I hope that it does, it should aid in the management of tape_end error files also. Here I will give a brief summary of how to use fm_check, and what improvements you might consider trying to create on your own. Unfortunately, I am leaving here sooner than I had originally anticipated and am not going to be able to perfect this utility.
The premise is really quite simple, fm_check will extract filenames as they currently ( or most recently ) exist on a tape, and try to see where FATMEN thinks these same filenames should exist. When a discrepancy arises, fm_check will create lists to aid in making appropriate changes to the FATMEN catalogue. Let's take an example of WNE653.
Tape WNE653 was brought to my attention by pick_events in the new error area as a list error tape. FATMEN thought that some files were out of place on this tape. As it turns out, the 5th filename was an exact match with the 6th filename on the tape. For some reason, the spooler got confused and wrote the same filename to the tape twice and did not make note in the FATMEN catalogue that the 6th file was the same as the 5th file. Therefore, if anyone tried to access any files beyond the 5th file using FATMEN as a reference, they were directed to the wrong file. So I executed fm_check to create a couple of lists that would aid in detecting the problem and fixing it. Here is an example of how to execute FM_CHECK...
(some d0fs node) > cd [proman.disk_maint]
(some d0fs node) > submit/keep/noprint/notify/log=[proman.disk_maint]wne653.log -
_(some d0fs node) > /queue=d0fsix_batch/parameter=(WNE653) fm_check.com
IMPORTANT: Before executing fm_check.com, certain accommodations must be made. First, make sure that a log file of a tape_in job for the tape that you wish to execute fm_check on exists in COPYMAN's cm_log area. Second, be certain that a log file of a tape_dir job for the same tape exists in COPYMAN's td_log area. Also, be sure that you enter the tape VSN as a parameter and that it is in CAPITALS.
I really did not have the time to perfect this utility, and you may consider developing a simpler submission command like SUB_FM_CHECK.COM that asks the user for the tape VSN, executes tape_in (actually, copytape_in) for the tape, then tape_dir, and does not care whether or not the tape VSN is in CAPS, etc. Some other changes need to be made, but I will discuss them along with the output lists that FM_CHECK creates.
D0FSIX_BATCH is an ALPHA queue that will complete these jobs MUCH faster than a D0FS queue. Try to avoid using the alpha queues except when necessary.
Last and possibly most important, ONLY SUBMIT FM_CHECK FOR ONE TAPE AT A TIME. I did not code this utility to accept more than one job at a time ( unlike cm_area:tape_in, which can do up to 5 tapes at a time ). When I did need to submit fm_check for more than one tape, I used the "/after=" qualifier of the "submit" command ( see VMS User's Manual ) and spaced the jobs out accordingly. FM_CHECK takes ~ 4 minutes to perform all actions for 1 DSN filename. Therefore, a tape with 15 files on it would take FM_CHECK about an hour ( 15 x 4 = 60 minutes ) to execute on an ALPHA queue. Normal batch queues take much longer. So space apart your jobs accordingly.
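The spacing arithmetic above is easy to get wrong when several tapes are waiting, so here is a small Python sketch of it. It only uses the rough figure of 4 minutes per DSN filename on an ALPHA queue quoted above. The 10 minute safety margin and the function names are my own assumptions, and the sketch does not submit anything. It only works out start times that you could then give to "/after=".

```python
from datetime import datetime, timedelta

MINUTES_PER_FILE = 4  # rough ALPHA-queue figure quoted above


def estimate_minutes(n_files: int) -> int:
    """Estimated FM_CHECK run time for one tape, in minutes."""
    return n_files * MINUTES_PER_FILE


def stagger_start_times(first_start, files_per_tape, margin_minutes=10):
    """Return one start time per tape so that jobs never overlap.

    files_per_tape is a list with the number of files on each tape.
    margin_minutes is an assumed safety gap, not a measured value.
    """
    starts = []
    t = first_start
    for n_files in files_per_tape:
        starts.append(t)
        t = t + timedelta(minutes=estimate_minutes(n_files) + margin_minutes)
    return starts


for start in stagger_start_times(datetime(1996, 10, 25, 9, 0), [15, 5, 20]):
    print(start.strftime("%H:%M"))
```

On a normal batch queue the per-file figure would have to be raised, since the 4 minutes only applies to the ALPHA queue.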
VIII.B.1.a.2.a - output lists of FM_CHECK.COM and what to do with them.
VIII.B.1.a.2.a.1 - the FM_CHKD*.LIST
This is the most readable of the lists that FM_CHECK creates. It tells you what the FATMEN generic filename is, its Key Serial Number, its File Sequence Number as it appears on the tape, the FSN that FATMEN thinks that this file should have, the tape VSN, the mediatype, and the DSN filenames as they appear on the tape and in FATMEN. Nice to look at for pinpointing where a problem occurs. Format for the list's filename should look like FM_CHKD_<tape VSN>.LIST .
VIII.B.1.a.2.a.2 - the *.KUMAC file
This file looks like a list but is really a series of commands to manipulate FATMEN. When fm_check is done, send this file over to Stan Krzywdzinski and ask him to execute it for you. This should make any necessary changes in FATMEN. You may also want to talk to Stan to see if there is a way that you can have more direct access to FATMEN in order to make these changes yourself. Right now, you do not have the authority to do so, but Stan may consider letting you take care of these chores for him. The format for the *.kumac's name should look like MV_<tape VSN>.KUMAC
VIII.B.1.a.2.a.3 - The PDB*.LIST
This file should have a VERY basic listing that merely has the DSN filename as it appears on the tape, the tape VSN, the FSN as it appears on the tape, and the FSN as FATMEN thinks it should appear. Note that this listing is made only for filenames that have a discrepancy between reality and FATMEN. Stan and I were supposed to work on a command in FORTRAN to help with making similar changes in PDB that we were making in FATMEN. You should really consider helping him out with this. When done, you may send him lists of this type to make the appropriate changes in PDB. The format for the names of these lists should look like PDB_<tape VSN>.list
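To make the idea of a discrepancy between reality and FATMEN concrete, here is a Python sketch of the comparison, modeled on the WNE653 story above ( the 5th filename written twice, so everything after it sits one position later than the catalogue expects ). This is not the code of FM_CHECK. The function name, the input shapes and the short placeholder filenames are all assumptions made for illustration.

```python
def fsn_discrepancies(vsn, tape_listing, catalogue):
    """List files whose position on tape disagrees with the catalogue.

    tape_listing: list of (fsn, dsn) pairs as read from the tape.
    catalogue:    dict mapping dsn -> fsn as the catalogue believes.
    Returns rows of (dsn, vsn, fsn_on_tape, fsn_in_catalogue).
    Files missing from the catalogue are skipped in this sketch.
    """
    positions = {}
    for fsn, dsn in tape_listing:
        positions.setdefault(dsn, []).append(fsn)

    rows = []
    for dsn, tape_fsns in positions.items():
        cat_fsn = catalogue.get(dsn)
        if cat_fsn is None or cat_fsn in tape_fsns:
            continue  # unknown file, or at least one copy is where expected
        rows.append((dsn, vsn, tape_fsns[0], cat_fsn))
    return rows


# Placeholder filenames A..G stand in for real DSN filenames.
tape = [(1, "A"), (2, "B"), (3, "C"), (4, "D"),
        (5, "E"), (6, "E"), (7, "F"), (8, "G")]
fatmen = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7}
for row in fsn_discrepancies("WNE653", tape, fatmen):
    print(*row)
```

The rows come out in the same column order as the PDB list described above: DSN filename, tape VSN, FSN on the tape, FSN according to the catalogue.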
VIII.B.1.a.2.a.4 - the *.LOG file
If submitted properly, you should have a log file of any fm_check jobs performed that you can refer to should a problem arise later. You may want to incorporate the creation of a log file into the SUB_FM_CHECK.COM command, should you ever write one.
VIII.B.2 - USR$ROOT21:[PROMAN.ALEX]
This is where I did most of the development for FM_CHECK. The only thing of any interest in this area is TEXTMAKER.COM. Let's say that you want a list of both DSN filenames and generic FATMEN filenames from a tape that you have already performed a tape_in and tape_dir job for in COPYMAN's realm. Simply execute commands like the following...
(some d0fs node) > cd usr$root21:[proman.alex]
(some d0fs node) > @textmaker.com < tape VSN >
This command (TEXTMAKER.COM) is executed during the execution of usr$root21:[proman.disk_maint]fm_check.com.
VIII.B.3 - USR$ROOT21:[PROMAN.TAPE_REPORT]
This area has some important lists and has a utility that will aid you in finding some work to do ( namely, labeling & vaulting tapes ).
VIII.B.3.a - Useful commands in usr$root21:[proman.tape_report]
VIII.B.3.a.1 - TAPE_REPORT.COM
Tape_report was written by Lee Lueking to help keep track of current tape writing and vaulting. Its output is in the form of mail that you should receive about every 3 days. The output has 2 rows, one for vaulted tapes ( marked with a " v " ), and the other for used tapes ( marked with a " u " ). You will want to receive this mail ( to be mentioned later ) and when you see a tape series with fewer than 100 excess tapes ( vaulted - used < 100 ), you will want to label and vault more tapes ( and sometimes initialize, see table 2.1 ). If a new series begins, either by ending a previous series or just starting a brand new series, you will want to change the " prefix list " within the tape report command file. Simply edit tape_report.com and delete or add any prefixes as necessary.
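The rule of thumb above ( vaulted - used < 100 ) can be written down as a tiny Python check. This is not part of tape_report. The function name is made up and the counts in the example are invented numbers purely to show the arithmetic, so take the real counts from the mail you receive.

```python
def series_needing_tapes(counts, threshold=100):
    """Return the prefixes whose excess (vaulted - used) is below threshold.

    counts maps a tape prefix to a (vaulted, used) pair.
    """
    return sorted(
        prefix
        for prefix, (vaulted, used) in counts.items()
        if vaulted - used < threshold
    )


# Invented example counts, NOT real tape_report output.
example = {"WNG": (1200, 1150), "WNE": (900, 700)}
print(series_needing_tapes(example))
```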
Sometimes D0FS will run into some problems and have to restart itself. Tape_report usually suffers by not being able to restart itself every 3 days.
VIII.B.3.a.2 - START_TAPE_REPORT.COM
This command will help you restart tape_report if it cannot start itself again. If you suspect that tape_report failed to restart itself or was just acting weird period, you can check its status if you are logged on as PROMAN by entering...
(some d0fs node) > show entry "TapeReport"
Note that this query is case sensitive and should you receive an error message like " no such entry ", execute a command to restart the tape_report process by entering the following...
(some d0fs node) > @start_tape_report.com
and tape_report should run again. By now you should be curious about how to receive tape_report's output. It's really quite trivial.
VIII.B.3.b - Useful lists, etc. in USR$ROOT21:[PROMAN.TAPE_REPORT]
VIII.B.3.b.1 - TAPE_REPORT.DIS
You will need to be on this distribution list if you want tape_report's output sent to you. To get onto the .dis list, simply edit this file and add your preferred E-mail address ( similar to tc_area:copylist_maker.dis in COPYMAN ).
VIII.B.3.b.2 - TAPE_REPORT.LOG
This is the log file for the most recent tape_report job. Should any problems arise during the execution of tape_report, this is your best place to look to see if any changes should be made to help make tape_report run more effectively.
VIII.B.3.b.3 - EIGHTMM.DAT
This file is a copy of the tapes flat-file at FCC. There are plans of turning this into a database, but no one knows for sure when that will happen. Eightmm.dat updates every time tape_report runs successfully. If you are ever curious if a certain tape VSN exists in the vault catalog, simply enter a command like...
(some d0fs node) > cd [proman.tape_report]
(some d0fs node) > sear eightmm.dat <tape VSN >
And if you get an error like " no strings match ", then chances are that the tape VSN you entered is not in the vault. Tape_report will look through this file to find the most recently vaulted tape of a certain series at the FCC vault. COPYMAN's tc_area:copylist_maker.com also accesses this file to extract FCC catalog numbers to send to output.
VIII.B.4 - D0FS_USR$DISK21:[PROMAN.COMPARE]
This area I am not very familiar with, but I will try and tell you what I do know. Compare is a command that helps maintain the D0FS disk. If files that are supposed to be on disk are not there, compare will let you know and you should be able to stage files from tape to disk and make any deletions from the disk as needed. Until recently, Bryan Lauer of Iowa State University was taking care of this matter and then it was supposed to be passed onto me. Now it is your turn to try and figure out what should be happening in this area.
NOTE: Rick Passaglia wrote the original COMPARE software and you should try to reach him to find any documentation about how to effectively use compare. In this section, I will go over what notes I do have from Bryan Lauer.
VIII.B.4.a - Commands in D0FS_USR$DISK21:[PROMAN.COMPARE]
VIII.B.4.a.1 - COMPARE.COM, etc.
Before you get started, it is recommended that you log onto D0FSI as both user FMD0 and as PROMAN ( 2 separate windows ).
Now, if you are logged onto d0fsi as fmd0, type in the following commands...
RLOGIN::D0FS(D0FSI::FMD0)> cd d0fs_usr$disk21:[proman.compare]
RLOGIN::D0FS(D0FSI::FMD0)> dir *.com
You should notice about 10 different command files in this area. Compare.com basically executes all of the other command files in this area.
VIII.B.4.a.2 - Lists in D0FS_USR$DISK21:[PROMAN.COMPARE]
Once per week compare.com will compile some lists that you will be interested in. To have a look at these while you are logged onto d0fsi as fmd0, type in the following commands...
RLOGIN::D0FS(D0FSI::FMD0)> cd d0fs_usr$disk21:[proman.compare]
RLOGIN::D0FS(D0FSI::FMD0)> dir/since=18-OCT-1996
I am merely using the date "18-OCT-1996" for reference. It is one week ago from the time that I am writing this, so for your purposes, use the date that is one week ago from "now". You should now notice about 15 files that have a date-stamp at the beginning of each filename ( like 961019 ). There are only a few files that you are most interested in ( to the best of my knowledge ). These are...
D0FS_USR$DISK21:[PROMAN.COMPARE]*_DST_FAT.LIST_SPOOL_8MM
D0FS_USR$DISK21:[PROMAN.COMPARE]*_DST_DSK.LIST_DELETE
D0FS_USR$DISK21:[PROMAN.COMPARE]*_DST_FAT.LIST_RM
and
D0FS_USR$DISK21:[PROMAN.COMPARE]*_DST_FAT.LIST_SPOOL_DLT
The first file will have a list of events that must be spooled off of a tape and back onto disk. The format of this list will look like ( with a corresponding example )...
Tape VSN FSN DSN Filename Mediatype
WNE202 68 ALL_051977M01.X_MDS01REU1119_ALL00_TEJXX20_4072022 5
These files will need to be staged back to disk. To do this you must first be logged on as PROMAN on D0FSI and do the following...
D0FSI> set def spli_area
D0FSI> dir *.list
D0FSI> ty/p spool_in_project.list
The fourth item on this list is spooler 8mmrec, you will use this spooler to spool tape files back onto disk. Now, type the following ( still in PROMAN's window )...
D0FSI> show sym spli*
Notice the symbol "SPLI_TAPE_ADD". This command will enable you to spool 8mm tape files back onto disk. The following is an example of how to spool files from tape WNE202 back onto disk...
D0FSI> spli_tape_add wne202 d0fs_usr$disk21:[proman.compare]-
_D0FSI> 961019_dst_fat.list_spool_8mm 8mmrec
If there are files from more than one tape to be spooled back onto disk in this file, you must enter a new command for each tape, replacing the old tape VSN with the new ( ex: replace "WNE202" with "WNG028" in the above example ) VSN. The command will also ask you to confirm what you want to spool and simply type in "y" when prompted.
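Since one command is needed per tape, it helps to know which VSNs appear in the spool list before you start typing. The Python sketch below reads lines in the format shown above ( VSN, FSN, DSN filename, mediatype ) and prints one spli_tape_add command per distinct tape. It only builds the command text and does not run anything. The function name is made up, the second example line is an invented placeholder, and lines that do not look like list entries are simply skipped.

```python
def spool_commands(list_lines, list_file, spooler="8mmrec"):
    """Build one spli_tape_add command per distinct tape VSN.

    list_lines: lines of a *_dst_fat.list_spool_8mm style file.
    list_file:  the list filename to pass to spli_tape_add.
    """
    vsns = []
    for line in list_lines:
        fields = line.split()
        # Expect: VSN, FSN, DSN filename, mediatype. Skip anything else.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        vsn = fields[0].upper()
        if vsn not in vsns:
            vsns.append(vsn)
    return [
        "spli_tape_add {} {} {}".format(vsn.lower(), list_file, spooler)
        for vsn in vsns
    ]


lines = [
    "WNE202 68 ALL_051977M01.X_MDS01REU1119_ALL00_TEJXX20_4072022 5",
    "WNG028 3 PLACEHOLDER_DSN_FILENAME 5",  # invented line for illustration
]
name = "d0fs_usr$disk21:[proman.compare]961019_dst_fat.list_spool_8mm"
for command in spool_commands(lines, name):
    print(command)
```

Passing spooler="dltrec" gives the corresponding commands for a DLT list.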
The second file of interest ( *dst_dsk.list_delete ) is really a command file to delete any disk files located in *dst_dsk.list_excess. Its execution is really quite simple,
D0FSI> @D0FS_USR$DISK21:[PROMAN.COMPARE]961019_DST_DSK.LIST_DELETE
The third file of interest ( *dst_fat.list_rm ) has never had any entries as long as I have looked over this chore. If you do see anything here, my guess is that you may have to send this list over to Stan Krzywdzinski and ask him to "exec" it in FATMEN.
The fourth file of interest ( *dst_fat.list_spool_dlt ) is similar to the 8mm list to be spooled. Treat this list in a very similar fashion except use " dltrec " as the spooler instead of " 8mmrec ".
Compare.com will execute itself once per week and if for some reason it ever does stop, go into editor mode for the command and near the end you will notice a "submit" command that should help you start it back up again.
This is really about all that I know or have guessed about this matter. I STRONGLY recommend contacting Rick Passaglia in order to try and find some documentation about how to use these commands/files.
VIII.B.5 - USR$DISK21:[PROMAN.SPOOL_IN]
This directory has the command for spooling tapes onto disk as described above in VIII.B.4.a.2 . There are many other commands in this area, but SPOOL_IN_ADDTAPE.COM is really the only command that you need to be aware of.
VIII.B.6 - FATMEN_ROOT:[FMD0.TAPEMAN]
Although this area is not a part of the PROMAN realm, it is probably best if you know about it anyway. To get to this directory while you are logged on as PROMAN, simply type in the following...
(some d0fs node) > cd tm_area
This area contains lists that Copyman's tc_area:copylist_maker.com uses to extract info about a certain prefix.
VIII.B.7 - FATMEN_ROOT:[FMD0.TAPEMAN.LOG]
Another directory that Copyman's tc_area:copylist_maker.com uses to extract info about a certain prefix. This directory and the one above are most easily accessed as PROMAN, therefore I am mentioning them here instead of in the COPYMAN section.
Hopefully by now you will have a fairly good idea of what is expected of you. There is really only one more chapter following this appendix to give some friendly advice. I really did not accomplish all that I wanted to while here at D0. Therefore the next chapter should give you some hints as to where to start on making your job even easier.
( Why yes, there is still work to be done )
Although I may have been here a long time, that does not mean that I accomplished all that I had really wanted to. Shortly after I arrived at D0, Lee Lueking requested that I make my job " as easy as the push of a button ". Sure, I may have written some utilities to make my job a little easier, but by no means am I anywhere near the completion of the task that Lee had assigned to me. Now, this duty falls into your hands. This chapter will cover some ideas that I have had, but was never able to act upon. Also, there will be some advice should you choose to start doing some programming while here at D0. Good Luck!
IX.A - Some Ideas That I Had...
There really is no format to this section, just some ideas that I had while here at D0 that I will put past you to think about...
E-mail is your friend. If you write any utilities that take a long time to compile and don't want to forget to check the output later after the utility has finished, insert a "mail" command somewhere near the end of the command code. This will remind you to act upon what the computer worked so hard to put together for you. An example would be PROMAN's tape_report or COPYMAN's copylist_maker.com.
The ability to properly delegate responsibility is the key to any successful executive. In other words, have someone that makes even less money than you do as much of your work as possible. Example; labeling and/or copy-protecting tapes. It won't kill you to do it yourself, but having someone else do it for you is quite a luxury.
Mounting tapes more often than necessary will lead to premature damage. Try to keep your tape mounts down to a minimum. Tapes can only take a finite number of reading/writing sessions before they fall apart. If you notice a tape is of marginal quality, make a backup AS SOON AS POSSIBLE!
Try writing a utility that will check pick_events for you about once/week and send you anything that it finds that is new ( within the last week ). Some tricky "directory" commands and "mail" commands should help you out.
Try writing a utility that will act on the output of above by category. In other words, extract VSN's from the output of pulling a directory from the mount_errors in pick_events and write these VSN's to a list and submit it to copyman's tc_area:tape_check.com. Then have it E-Mail you the results when done ( "mount ERROR" or "mount OK")
If that utility works, write a utility to mail you the filenames of any dump files that resulted from a "mount ERROR" above so that you may check mount errors faster than before.
Try similar utilities for other errors in pick_events except after searching for read, list, or tape_end errors. Have the utility submit jobs to batch to perform both tape_in and tape_dir jobs for each tape found in pick_events ( for read, list or tape_end errors only ).
If this works, write another utility that will automatically send potential list_error or tape_end error tapes to FM_CHECK and send you the results later. One potential problem to consider is the tape_end error tape that fails after the first file. Such tapes have more than likely been re-initialized. They should be dumped so that you can check the creation date on the tape and compare it to PDB. A discrepancy = a re-initialized tape that should be marked bad in FATMEN. Copy-protecting tapes when you ought to should reduce this occurrence.
For read error tapes, consider a utility that will search the log file of the tape_in jobs for the word "ERROR", extract the DSN filename attached to the error, and create a tapes-moved list ready to send to FATMEN. HOWEVER, I do not recommend a utility that will automatically send these lists to be marked bad in FATMEN without your approval. It may be best to send yourself the tapes-moved lists and look through the corresponding tape_in log files to see that error files are genuinely un-readable.
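As a starting point for the read error idea, here is a Python sketch of the search-and-extract step. BE CAREFUL: the layout of a tape_in log line is NOT documented here, so the pattern below ( the word ERROR, then "file", a sequence number and a filename ) is purely hypothetical and must be replaced after looking at real log files. The function and pattern names are made up too. In keeping with the warning above, the sketch only builds candidate tapes-moved lines for you to review and sends nothing to FATMEN.

```python
import re

# HYPOTHETICAL log layout: "... ERROR ... file <FSN> <DSN filename>".
# Adjust this pattern to match real tape_in log files.
ERROR_LINE = re.compile(r"ERROR.*?file\s+(?P<fsn>\d+)\s+(?P<dsn>\S+)")


def read_error_candidates(vsn, log_lines, pattern=ERROR_LINE):
    """Return candidate tapes-moved lines (VSN FSN DSN) for review."""
    rows = []
    for line in log_lines:
        if "ERROR" not in line:
            continue  # only lines containing the word ERROR are of interest
        match = pattern.search(line)
        if match:
            rows.append("{} {} {}".format(
                vsn.upper(), int(match.group("fsn")), match.group("dsn")))
    return rows


# Invented log lines, for illustration only.
log = [
    "file 1 GOOD.X_1 copied OK",
    "ERROR reading file 2 BAD.X_2",
    "file 3 GOOD.X_3 copied OK",
]
for row in read_error_candidates("wne653", log):
    print(row)
```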
If any command files that have already been set up for you can use any improvement - DO IT! This will save you some aggravation in the long run.
If you want to write a utility from scratch and can ever "borrow" some ideas from someone else's utility - DO IT! Copyman's copylist_maker.com is based very heavily upon Proman's tape_report. This is a collaboration, simply ask the creator of the command that you want to borrow from if you may use their utility as reference, and should they say "yes" ( I've never had anyone say "No" ), use at your will. A little research will save you oodles of time. Einstein wasted 3 years in front of a chalkboard trying to come up with the Lorentz transformations on his own because he was too full of pride to go to the library and do some research!
You may want to start a list in Copyman's del_mgmt area for list errors exclusively that you made changes in FATMEN for. A good format to follow would be like...
VSN FSN ( on tape ) FSN ( as thought by FATMEN ) DSN filename FATMEN changed (y/n)
Execute the command backup_list_maker.com about once every few months or so. This command resides in Copyman's tc_area. Simply type @tc_area:backup_list_maker.com while logged on as copyman and you should receive some mail when it finishes. If there are no tapes on the output list, don't do anything. If something does come up, follow the instructions at the top of the mail that you receive.
While logged on as copyman, do a command like...
D0FS:COPYMAN> dir [copyman...]*.com*
...to help find any really nice commands that maybe were not mentioned here. Try something similar for proman. Search for FORTRAN code too by *.for instead of *.com .
PDB is a very tricky database to find your way around in, so I recommend that you take a decent amount of time to get to know it. Just about everything that you should want to know about data produced at D0 should be in PDB, if you know where to look. Do not be afraid to ask Stan Krzywdzinski for some advice about where to look for records in FATMEN or PDB.
Try your best to make certain that tapes are copy protected every week. If you cannot do it, try to find someone else who can. This really saves the experiment more money in the long run than you might think!
Above all - BE CHEAP!
BEST OF LUCK TO YOU. IT IS MY SINCEREST HOPE THAT THIS MANUAL HAS SERVED (AND WILL SERVE) YOU WELL. GOOD LUCK AND TAKE CARE OF YOURSELF!
Regards,
Alexander F. Burkert