qmanager.doc: Description of the unix based TRANSP queue manager.
==================================================================

Contents:

  0. Introduction.
  1. Common File System.
  2. The Run Queue.
  3. User Interface.
     a) Command Interface
     b) GUI
  4. Configuration Database.
  5. The Queue Server.
  6. The Compute Servers.
  7. The "master daemon".

Appendices:

  A. Automated Code Maintenance
  B. Troubleshooting Hints
  C. Miscellaneous Notes
  D. Possible Future Development

0. Introduction.
=================

The `qmanager' system provides an efficient mechanism for sharing N TRANSP
compute server machines amongst M TRANSP users.  The qmanager is a
fundamental component of a unix based multi-user TRANSP run production
system.  Its use in a single user TRANSP code development system is
optional, but may prove convenient.

The system can be thought of as consisting of:

  * Users: create input data, generate run requests, examine results.

  * Queue Server: gathers and processes user run requests, sending runs in
    an orderly way to compute servers as these become available.

  * Compute Servers: the machines where the TRANSP runs are actually
    carried out.

A more detailed list of qmanager system components follows:

1. Common File System: the queue server, compute servers, and user
   machines need to share a common file system, with consistent naming of
   directories, in order for the communications between machines required
   for orderly queueing of runs to function properly.

2. The Run Queue: A directory, writable by all users, where run queue
   requests are posted.  The TRANSP system software also uses this
   directory to post changes to the status of runs, as these occur.  Runs
   are generally executed in the order queued, subject to availability of
   compute servers.  Users can jump the queue by asserting a high priority
   for their runs, but they must post an explanation of their action.  Run
   ownership is tracked, but beyond this security is minimal, as an honor
   system is presumed to exist amongst the TRANSP users.

3. User Interface and user machines: A set of user interface commands
   allows users to view the status of all jobs in the TRANSP queue, to add
   runs to the queue, to remove runs from the queue, to examine partial
   output from an executing run, and to terminate an executing run.  Users
   can use "staging directories" on their own machines, provided the queue
   server can have read access to the staging directory (to copy out the
   namelist when the time comes to execute the user's run).  Typically,
   each user sets up namelists and input data in a staging directory
   chosen by the user.  The user posts a run request when the input data
   is fully prepared, by means of qmanager's `enqueue' command.  All user
   interface functions can also be accessed via the Tcl/Tk based GUI
   application `xlauncher'.  To find xlauncher, add $CODESYSDIR/wish to
   the user's PATH environment variable.

4. Configuration Database: A set of data files which characterize the
   available TRANSP compute server machines.  The database contains such
   information as a ranking of the compute servers by speed, the number of
   simultaneously executing TRANSP runs allowed on each machine, a list of
   machines that are off-line e.g. due to hardware trouble, and access
   control data by which certain machines can be reserved to certain
   users.  The configuration database is maintained by a "TRANSP system
   administrator" and is not normally directly accessible to users.
5. The Queue Server: One machine (which may or may not also be one of the
   compute servers), with access to the run queue and the configuration
   database, serves the queue by actually dispatching runs and related
   requests to the compute server machines.  The queue server runs out of
   the `master daemon' job which is regularly and frequently scheduled
   (i.e. once every five minutes).

6. The Compute Servers: The machines where TRANSP jobs are actually
   executed.  The compute servers execute scripts which look for run
   requests and launch runs.  The run control scripts provide for the
   generation of files containing data which allow the progress of runs to
   be monitored.

7. The Master Daemon: A control script that must run on every machine that
   is to function as a queue server and/or a compute server and/or a build
   server that carries out code maintenance functions.  Traditionally,
   master.daemon is scheduled once every 5 minutes as a `cron' job.  The
   master.daemon job will automatically restart TRANSP jobs interrupted by
   a system crash or scheduled downtime.  Systems administration can rely
   on this and so reboot TRANSP machines more or less at will without
   concern about losing jobs.

Each of these components will be described in detail.

1. Common File System.
=======================

A common file system needs to exist in order for the queue manager
software to operate correctly in the task of carrying user requests from
user machines to the TRANSP compute server machines.  Although the TRANSP
system's directory structure is suggestive of ways to set up this common
file system, the actual task of producing the common file system
functionality will be up to the systems administration of the site where
TRANSP is installed for production use.  At PPPL, we use NFS for directory
sharing, but there are other options.

For those familiar with the standard set of environment variables used to
identify standard TRANSP system subdirectories, the following table may
help in the task of understanding which directories should be machine
local, and which directories should be shared, in an installed TRANSP
production system.  There are actually three categories to be considered:

  local        -- local to the machine or workstation in question
  shared       -- common directory shared by all machines
  architecture -- for binaries: either local to each machine, or,
                  preferably, shared amongst binary compatible machines.

  environment   translation
  variable      or link               description               category
  -------------------------------------------------------------------------
  TRANSPROOT    root directory        root directory            local
  TMPDIR        $TRANSPROOT/tmp       temporary files           local
  REQUESTDIR    $TRANSPROOT/request   server requests           local
  DBGDIR        $TRANSPROOT/debug     debug work area           local
  WORKDIR       $TRANSPROOT/work      root, work directories    local
  ......        $TRANSPROOT/daemon    daemon scripts dir.       shared
  DATADIR       $TRANSPROOT/data      input data root dir.      shared
  LOGDIR        $TRANSPROOT/log       log files root dir.       shared
  RESULTDIR     $TRANSPROOT/result    results data root dir.    shared
  CONFIGDIR     $TRANSPROOT/config    configuration data        shared
  CODESYSDIR    $TRANSPROOT/codesys   source code root dir.     shared
  QSHARE        $TRANSPROOT/qshare    user requests and run     shared
                                      state files [QSHARE is
                                      writable by all TRANSP
                                      users]
  LOCAL         $TRANSPROOT/local     code binaries             architecture
  SIGTABLDIR    $TRANSPROOT/sigtabl   atomic physics tables     architecture

Generally, the shared directories appear as links to NFS exported
filesystems on all machines except perhaps those machines to which the
shared directories are local.
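For illustration only, here is a minimal sketch of how a compute server
might be set up so that the shared directories appear as links under a
local $TRANSPROOT.  The NFS server name, mount point, and all paths below
are hypothetical; the actual layout is a site decision:

  #  Hypothetical sketch only -- not part of the distribution.
  #  (as root) mount the shared tree exported by the file server:
  #     mount nfshost:/export/pshare /mount/transp0/pshare
  #
  #  (as the TRANSP production account, csh syntax) link the shared and
  #  architecture directories into this machine's local $TRANSPROOT:
  setenv TRANSPROOT /usr/local/transp
  cd $TRANSPROOT
  mkdir tmp request debug work                     # "local" directories
  foreach d (daemon data log result config codesys qshare)
     ln -s /mount/transp0/pshare/$d $d             # "shared" directories
  end
  ln -s /mount/transp0/pshare/local_alpha  local   # "architecture" dirs
  ln -s /mount/transp0/pshare/sigtabl_alpha sigtabl

Whether symbolic links, automounter maps, or direct NFS mounts are used is
up to the site; the only requirement is that the resulting directory names
are consistent on all participating machines.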
The TRANSP queue server and compute servers must have all of the above
defined, with read/write access.  At PPPL we have invented a username
`pshare', under which all TRANSP production servers operate.  User
accounts need read access to all directories in the "shared" and
"architecture" categories.  In addition, write access to $QSHARE and to
$DATADIR subdirectories will be needed, and each user needs a writable
$TMPDIR, though it should probably be defined as something other than a
subdirectory of $TRANSPROOT.

The following paragraphs give a brief description of the various TRANSP
system directories.

The TRANSP source code, including shell scripts for the queue manager, is
installed under a root directory identified by the environment variable
$CODESYSDIR.  Users wishing to examine the TRANSP source code will need
read access.  The local configuration of the TRANSP production system (the
configuration database) will be defined by a directory indicated by the
environment variable $CONFIGDIR.  With these environment variables, the
user will be able to see:

  $CODESYSDIR/source...     (the TRANSP source code)
  $CODESYSDIR/qcsh          (queue manager shell scripts)
  $CONFIGDIR/TOKAMAK.DAT    (file containing list of known tokamaks)

and other files and directories.  (The $CODESYSDIR notation will be used
in this document; $CODESYSDIR indicates "the value of the environment
variable CODESYSDIR").

The queue server machine will use the "configuration database"
($CONFIGDIR) which defines the compute servers and their TRANSPROOTs.  The
configuration database will also specify the machines responsible for once
per night runs of make files to keep the various sets of code binaries
up-to-date under the various instances of $LOCAL (at least one per
architecture).

The TRANSP code writes its output to subdirectories of a root directory
for the TRANSP output directory tree, referred to as $RESULTDIR.  Users
have read access to $RESULTDIR so that they can see the results of
completed TRANSP runs -- not just their own runs but all users' runs.  The
subdirectories of $RESULTDIR are named with strings consisting of a known
tokamak id, a dot, and a two digit shot year code, for example:

  $RESULTDIR/D3D.97    (analysis of D3D 1997 shots)
  $RESULTDIR/TFTR.88   (analysis of TFTR 1988 shots).

If the production system is heavily used, it is likely that automated
archival and retrieval of TRANSP results will be needed (note that this
implies a need to keep track of all existing runids; see the description
of the ENQUEUE_EXIST procedure below, under the user interface section).

If the Ufiles system is used for providing input data to TRANSP, the input
data for TRANSP runs should be visible to all users under a common, shared
root directory $DATADIR.  (If MDS+ is used instead of Ufiles, data will be
accessed via a call to the MDS+ server, and the $DATADIR tree is no longer
needed.  However, the user must still supply the input data, and that is
the hard part!).  Under $DATADIR there should exist a subdirectory for
each tokamak, e.g.:

  $DATADIR/D3D     (root directory for D3D TRANSP input data)
  $DATADIR/TFTR    (root directory for TFTR TRANSP input data).

Under these roots the organization of the data into further subdirectories
is site-determined.  (Here are some strategies that have been used: a
separate subdirectory for each shot, a separate subdirectory for each
related group of shots, a separate subdirectory for each worker involved
in data preparation).
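As an illustration only, a site using a per-shot organization might end up
with a tree like the following (the shot numbers and grouping names are
hypothetical):

  $DATADIR/D3D/87937/          (Ufiles for D3D shot 87937)
  $DATADIR/D3D/90117/          (Ufiles for D3D shot 90117)
  $DATADIR/TFTR/sawtooth88/    (Ufiles for a related group of 1988 TFTR
                                shots)

The run's namelist then points at the chosen subdirectory via INPUTDIR, as
described next.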
The TRANSP namelist (TRDAT section) will set the INPUTDIR character
variable to the appropriate subdirectory under $DATADIR/<tokamak>.  If
INPUTDIR is defaulted, the tokamak root subdirectory itself is expected to
contain the Ufiles.  $DATADIR needs to be writable by all users, so that
users may put their TRANSP Ufile input data in place.  Subdirectories
might be owned by individual users.  Other organizations of input data
(not using $DATADIR) are also possible.

TRANSP executable binaries are stored in what TRANSP itself knows as
$LOCAL/exe where $LOCAL has a different value depending on the machine
architecture (i.e. one value of $LOCAL for DEC UNIX machines, another for
SUN, and another for HP).  Users will need to be able to find the TRANSP
executable binaries directory in order to be able to run useful programs
such as qmonitor and trdat.  The user should be able to see the correct
binaries from whichever machine he/she is using.  The user's PATH
environment variable should be set accordingly, to find these programs.

The log file output of TRANSP runs is stored in the directory
$LOGDIR/runlog where $LOGDIR points to the shared root directory for log
files, which should be readable by all TRANSP users.

Users will use "staging directories" of their own choosing for preparation
of TRANSP namelists.  These working directories could be chosen to be
subdirectories of $DATADIR, although normally only one user should be
using any given staging directory.  The user's staging directory must be
readable by the queue server machine, as the queue server will need to
copy files out of the staging directory when it is time to submit the
user's run.

The work directories of the TRANSP compute server must be visible and
writable from the queue server machine, as the queue server will need to
copy in the namelist files and write a file instructing the compute server
to carry out the user's run.  The run's temporary files will also be
written in the compute server's work directories, in the course of normal
run execution.

And, finally, the queue server, compute server, and users will need to
share write access to the directory indicated by the value of the
environment variable $QSHARE where the run queue itself is stored, as a
set of files.  $QSHARE will contain files with information on the state of
runs, i.e. queued, running, aborted, or successfully completed, as
described in more detail in the next section.

The following sequence summarizes use of this filesystem as a series of
steps for creation of a TRANSP run under the shared queue manager system:

1. User prepares input data.  Ufiles are placed in a selected subdirectory
   of $DATADIR.  The user places the namelist in a "staging directory".

2. From the staging directory, the user executes a command to enqueue a
   TRANSP run.  This causes a file to be written in $QSHARE which will be
   interpreted by the queue server as a run request.  The file includes
   the path back to the user's staging directory, a date-time stamp, a
   priority specification, and optional run parameters.  A file is also
   written giving the ownership of the run.

3. The queue server processes run requests found in $QSHARE, forwarding
   the requests (and copying files) to compute servers as these become
   available.  The run state changes from "queued" to "submitted".  When
   the compute server succeeds in starting a run, its state changes to
   "running".

4. The compute server carries out the run.
   When the run job terminates, the compute server modifies the state file
   in $QSHARE to indicate normal successful completion, or abnormal
   termination, of the run.  On normal successful completion, the run
   output files are copied to the appropriate $RESULTDIR subdirectory, the
   log file is copied to the $LOGDIR/runlog directory, and the compute
   server's workspace is cleaned up, readying the server for the next run.
   In the case of an abnormal termination, all files are left in place to
   allow expert diagnosis and debugging of the failure of automatic
   processing.  When a run terminates (successfully or with problems),
   Email is sent to the run's owner.

5. Within 24 hours of successful completion, the queue server removes the
   residual run files from $QSHARE.

6. The user can monitor run progress with the `qmonitor' program.

2. The Run Queue.
==================

The shared run queue is implemented as a collection of files in the shared
writable directory $QSHARE.  The general form of filenames in $QSHARE is:

  <runid>_<tok>.<type>

where:

  <runid> is a TRANSP runid, such as 94388A07,
  <tok>   is a valid tokamak id, such as D3D or TFTR, and
  <type>  indicates the filetype; each filetype has a particular meaning
          to the queue manager software.

The supported filetypes are:

  .owner    -- specifies the owner of the run.
               contains: owner name

  .queue    -- indicates a requested run, and contains details thereon.
               contains: staging directory path
                         date-time queued
                         priority
                         additional parameters (for queue server or
                         compute server).

  .submit   -- run has been submitted to a compute server, but is not yet
               executing.
               contains: (same as .active file)

  .active   -- run is currently executing, compute server is specified.
               contains: compute server name
                         date-time started

  .stopped  -- run has been stopped (by user or by abnormal termination),
               file contents specify the compute server.
               contains: compute server name
                         date-time halted
                         "user" or "system"; "user" if user requested halt

  .success  -- run has completed successfully.
               contains: compute server name
                         date-time completed

  .look     -- request for an advance peek at output of an incomplete run.
               contains: name of person requesting the peek.
                         dates and processing information

  .halt     -- request to (prematurely) halt a currently executing run.
               contents: dates and processing information

  .archive  -- request to (prematurely) halt and archive an executing run.
               contents: dates and processing information

  .cancel   -- request to (prematurely) halt and remove an executing run.
               contents: dates and processing information

The queue server can submit a run to a compute server, changing the run's
state from queued to active.  The compute server can post here that a run
has halted abnormally, or completed successfully.  The queue server will
eventually remove files associated with a successfully completed run from
the $QSHARE directory, so that $QSHARE will mostly contain information
about requested or currently active runs.  The files themselves contain
small amounts of data, as needed.  For example, the .queue file contains
the path to the user's staging directory, the date, time and priority
setting associated with the run request, and additional optional
parameters.
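For illustration, a listing of $QSHARE at some instant might look roughly
like the following (the runids, tokamak ids, and states are hypothetical):

  % ls $QSHARE
  12345A07_D3D.owner    12345A07_D3D.queue     (a run waiting in the queue)
  88123B02_TFTR.owner   88123B02_TFTR.active   (a run now executing)
  90117A01_D3D.owner    90117A01_D3D.success   (completed; to be cleaned up
                                                by the queue server within
                                                about a day)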
3. User Interface.
===================

Creating a TRANSP run requires completion of three tasks:

  * creation of a namelist
  * creation of input data (Ufiles or MDS+)
  * submission of a run request.

The user creates the namelist in a work directory which is accessible to
TRANSP's queue server.  The namelist contains information specifying MDS+
data input, or pointing to the directory containing the Ufiles.  The
Ufiles directory must be visible from all TRANSP compute servers.  The
user might choose to create the namelist in the same directory with the
Ufiles.

Creation of namelist and Ufiles input is covered elsewhere in the TRANSP
documentation.  What the user needs to be aware of for purposes of using
the queue manager is that (1) the namelist file needs to be placed in a
directory accessible to the TRANSP queue server machine, and (2) the Ufile
data needs to be in a directory accessible to any TRANSP compute server
machine.  These requirements are covered in the section on the shared
filesystem, above.  Operating within these parameters, the user can use
the `xtranspin' program as a graphical interface to the namelist; the
`trdat' program can be used to check the namelist for errors, and to
re-examine the input data.

The following section describes a command interface which allows users to
access the necessary functionality.  A Tcl/Tk GUI application `xlauncher'
can also be used; it is described in the subsequent section.

Command Oriented User Interface:
--------------------------------

The queue manager's user interface is concerned with the submission of an
actual TRANSP run request, once the TRANSP namelist and input data are
ready.  The interface commands are summarized as follows:

notation:
  <runid>    -- run id string, e.g. 94276A16
  [args...]  -- optional arguments (explained below).

command summary:

  qmonitor -- show the status of the TRANSP run queue and/or list the
       available TRANSP compute servers.  This is an interactive program.
       Arguments are interpreted as "type-ahead" input, so that
       non-interactive forms such as "qmonitor rq" and "qmonitor lq" can
       be used.

  enqueue <runid> [args...] -- enqueue a TRANSP run, specifying the
       tokamak-shot-year string and giving hand entered comments on the
       run.
       -- user's current working directory must contain the run's
          namelist (but see notes on ENQDIR environment variable, below).

  requeue <runid> [args...] -- requeue a TRANSP run (tokamak-shot-year and
       run comments have already been given and are not to be changed).
       -- `requeue <runid> lrs' relinks and restarts an aborted run;
          `requeue <runid> rs' restarts an aborted run (no relink); this
          usually only causes the run's abort state to recur, but the
          "lrs" option can be useful if a code repair has been done.
       -- `requeue' commands must be issued from the same working
          directory from which the original `enqueue' was issued
          (but see notes on ENQDIR environment variable, below).

  dequeue <runid> -- remove a run from the queue (before it starts
       executing).

  tr_look <runid> -- produce a scratch set of run output, allowing a peek
       at run output prior to normal completion of the run.  The scratch
       output dataset is placed in the directory $RESULTDIR/scratch.  The
       tr_look command can be issued by a non-owner of <runid>.

The following commands are available for error recovery, and should be
used with great care, as they are irreversible.  These can only be used by
<runid>'s owner:

  tr_cancel <runid> -- delete a run from the queue (after it has started
       executing).  This is usually done to correct a mistake; the run may
       have crashed.  The run is halted, if necessary, and all
       run-specific files are removed from the queue server and the
       compute server; the compute server is freed for processing the next
       queued run request.

  tr_archive <runid> -- archive a run, after execution has started but
       before normal completion.
       This is usually done to archive a partial run that crashed before
       reaching the end of the simulation.  If issued for a run that is
       still executing, the run is halted first.  Run files are removed
       from the compute server, freeing it to process the next queued run
       request.

  tr_halt <runid> -- force an abnormal termination of a run, but do not
       archive or delete its files.  The run files continue to take up
       space on the compute server, which may prevent that server from
       being available for a subsequent run.  The run can be restarted
       later.

Many commands result in processing that can lead to the generation of
Email.  The user receives Email whenever any of the following events
occur:

  -> a user's run completes successfully.
  -> a user's run terminates abnormally.
  -> a user's tr_look request is completed.
  -> a user's tr_cancel, tr_archive, or tr_halt request is completed.

If a user request "vanishes without a trace", this indicates a problem
(see the appendix on troubleshooting).

The `enqueue' command is interactive -- the user will be expected to
supply a tokamak-shot-year code identifying the run's destination output
directory (e.g. "D3D.97" for D3D runs based on 1997 shots).  Also, the
user will be placed into an editor session in which comments describing
the run may be entered.  Environment variables affect the `enqueue'
interaction, as described below.

The `enqueue' and `requeue' commands accept additional arguments, which
will be processed by the enqueue/requeue script, or passed on to the queue
manager.  Any arguments passed to the queue manager but not processed
there are passed on to the TRANSP job itself.

Arguments processed by enqueue/requeue:
---------------------------------------

  priority <n>   (n a number between 1 and 8).

     This asserts a priority number for the job being enqueued.  Jobs with
     higher priority numbers are processed first.  However, to request a
     priority greater than the default (priority = 5), the user must
     supply comments justifying the high priority request; these comments
     are separate from the comments describing the run itself, and they
     are readable by other users using an option of the `qmonitor'
     program.

Arguments processed by the queue manager:
-----------------------------------------

  <compute server name>
  top <m>        (m a number between 1 and the number of compute servers)

     The user can request the job to be queued to the named compute server
     (only), when that machine becomes available.  Alternatively, the user
     may request that only one of the "m top (i.e. fastest) machines" be
     used to service the run.  These options restrict the choice of
     compute server, but do not otherwise give the run any priority
     advantage in the queueing process.

Arguments processed by the TRANSP run script:
---------------------------------------------

  Although there are several arguments understood by the TRANSP run
  script, these are not normally set by the user.  The only option that is
  sometimes used is:

  lrs   (load and restart)

     which can sometimes be used to restart an aborted run, after a bugfix
     has been installed into the TRANSP source code.  The name of the
     compute server on which the aborted run resides (visible with
     qmonitor) must also be given.

Examples:

  enqueue 12345A07 priority 7 top 2
     -- enqueue the run at raised priority, to run only on one of the two
        fastest compute servers.  The user will be required to give
        comments justifying the heightened run priority.

  requeue 12345A07
     -- requeue the run at normal priority, eligible for any compute
        server.
  requeue 12345A07 lrs
     -- requeue the run for load and restart on the same machine where the
        run was originally started.

Environment Variables.
----------------------

Environment variables need to be set in order to allow the user to access
the queue manager, and to customize its behaviour.

To access the queue manager user interface, the user's PATH needs to be
modified.  The interface consists mostly of executable shell scripts.  It
is made available by placing the TRANSP directory $CODESYSDIR/qcsh in the
user's PATH environment variable.  If the user's PATH includes
$CODESYSDIR/csh, the qcsh directory should come first, so as not to access
a more primitive `enqueue' command implemented for standalone TRANSP
running without a queue manager.

In order for the queue manager to function properly, all users need to be
able to access a shared writable directory, as indicated by the
environment variable QSHARE, i.e.

  QSHARE = <path to the shared writable queue directory>.

As this resource must be shared by all TRANSP users at a given site, it
might be appropriate to set this up at the system level.

In order for the qmonitor program to function properly, it needs to be
able to find the queue manager configuration database.  This should be set
in the environment variable CONFIGDIR, i.e.

  CONFIGDIR = <path to the configuration database directory>.

So, the user sees the same files under $CONFIGDIR as the TRANSP compute
servers and queue server.

The following is useful for preventing a situation where a user or
multiple users create two TRANSP runs with the same runid:  It is
generally advisable to set up a site-specific procedure for determining,
given <runid> and <tokamak.yy>, whether the named <runid> has already been
used.  The idea is to protect the user from inadvertently reusing a
<runid> which another user might already have chosen for a run of his own.
The default procedure simply looks in the appropriate $RESULTDIR
subdirectory for the named run.  This would not be sufficient at a site
where runs are tracked and archived off-line.  In this case, the site
needs to invent a procedure which accepts the arguments <runid>
<tokamak.yy> and returns a status code TRUE (1) if the run already exists,
and FALSE (0) otherwise.  The name of the procedure is set in the user
environment variable ENQUEUE_EXIST.  For example, if the shared executable
script /usr/trsys/shared/dupcheck were set up for this purpose, the
environment variable

  ENQUEUE_EXIST = /usr/trsys/shared/dupcheck

would be set.  Then, if the user performed an `enqueue 12345A99' and
specified results output subdirectory (tokamak-shot-year) "D3D.97", then
the `dupcheck' script would be activated with arguments "12345A99 D3D.97".

If the user always processes TRANSP runs on the same tokamak device, the
environment variable ENQUEUE_TOKYY can be set:

  ENQUEUE_TOKYY = <tokamak>      (e.g. "D3D")

or even

  ENQUEUE_TOKYY = <tokamak.yy>   (e.g. "D3D.97")

which serves as a hint to the enqueue script.  If this environment
variable is not set, the enqueue script will use (a) a prior enqueue-ing
of the same runid, or (b) the current work directory string, by looking
for subdirectory names containing a valid tokamak-id.  The valid
tokamak-id strings are as given in the first column of
$CONFIGDIR/TOKAMAK.DAT.

Users are expected to use a staging directory (or multiple staging
directories) for preparation of namelists.  The `enqueue' and `requeue'
commands normally have to be issued with the user's current working
directory already set to this staging directory.  However, if the user
defines the environment variable

  ENQDIR = <path to the user's staging directory>

Then, as a convenience, `enqueue' and `requeue' will cd to this directory
first.
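To illustrate, a user's csh startup file might contain lines like the
following.  The directory paths shown are hypothetical and site-specific;
only the variable names themselves belong to the qmanager interface
described above:

  # Hypothetical user setup (csh); substitute your site's actual paths.
  setenv CODESYSDIR /mount/transp0/pshare/codesys
  setenv CONFIGDIR  /mount/transp0/pshare/config
  setenv QSHARE     /mount/transp0/pshare/qshare
  set path = ($CODESYSDIR/qcsh $CODESYSDIR/wish $path)  # qcsh ahead of csh
  setenv ENQUEUE_TOKYY D3D.97            # optional hint for enqueue
  setenv ENQDIR /u/harry/transp_staging  # optional: single staging dir

Only the PATH, QSHARE, and CONFIGDIR settings are required; the remaining
variables merely customize the behaviour of `enqueue' as described in this
section.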
However, if the user employs multiple staging directories then he/she
should not define ENQDIR, and must cd to the desired staging directory,
containing the run's namelist, prior to giving an `enqueue' or `requeue'
command.

Finally, the user can define which editor (e.g. "emacs" or "vi") to use
when comments are required.  The enqueue script follows the following
procedure:

  (1) if the ENQUEUE_EDITOR environment variable is defined, use this as
      the editor command, otherwise
  (2) if the EDITOR environment variable is defined, use this, otherwise
  (3) use "vi".

Graphical User Interface:
-------------------------

(New 29 Jan 1998).  A new directory, $CODESYSDIR/wish, has been added to
the TRANSP source code distribution.  This contains an executable Tcl/Tk
script, `xlauncher', which provides a GUI that allows users point and
click access to the queue manager commands just described.

A simple setup procedure is required to make effective use of `xlauncher':

  1) Add $CODESYSDIR/wish to the PATH environment variable, creating the
     xlauncher command.

  2) Identify the paths to one or more staging directories to be used for
     preparation of TRANSP runs.  The user should create at least one
     staging directory, or perhaps one staging directory "per project".

  3) Run `xlauncher' and use the options under the `definitions' menu to
     enter the necessary information on staging directories.  This will
     create commands under the `applications' menu which will take the
     user to the staging directory and then allow the user to select runs,
     modify namelists, examine data, enqueue runs, monitor runs, terminate
     runs, etc.

The GUI is built on top of the command interface, uses the same shell
scripts, and provides the same functionality as the command interface, but
with a point and click control interface.

4. Configuration Database.
===========================

An interactive program, `configdb', can be used by the TRANSP "system
administrator" (not an ordinary user) to modify the contents of the TRANSP
system's configuration database, which contains such information as:

  * list of known tokamak-ids.

  * a list of compute servers in descending order of speed, giving each
    server's TRANSPROOT, run-capacity (# of simultaneously executing
    runs), and disk-capacity (# of simultaneously executing runs + aborted
    runs).

  * a list of compute servers that are off-line temporarily due to
    hardware trouble.

  * a list of access restrictions placed on compute servers.

  * the queue server node, where the queue manager runs.  Also the queue
    server runs TRANSP's "makefile generator" once per 24 hour period.

  * the list of nodes where nightly "builds" are run, exactly one node for
    each instance of LOCAL, as described in the section on the common
    filesystem.  The generated makefiles are executed, causing software to
    be recompiled and reloaded in response to source code changes.

The `configdb' program is meant to be self-explanatory.  However, it is
probably worth making some comments about compute server access
restriction options.

Compute servers can be reserved to an OR list of users or tokamak-id's, by
giving a list of names.  For example, reserving node HYDRA to names "jojo
D3D" allows runs by user jojo **or** any D3D run to execute on HYDRA.
Alternatively, certain users or tokamak-id's can be denied access to a
compute server.  The reservation string "~TFTR ~cpuhog" for compute server
USCWS3 states that anyone except "cpuhog" can make a TRANSP run on server
USCWS3, as long as it is not a TFTR run.
Finally, these "reservation" and "denial" clauses can be mixed: the string
"D3D PBXM ~cpuhog" would require the run to be either a D3D or PBXM run,
and it would prohibit any access by user "cpuhog".  Some caution should be
exercised, since combinations may exist which end up denying access to all
users.

5. The Queue Server.
=====================

The `qserver' script and `qmanager' program carry out the servicing of the
run queues.  The qmanager program analyzes the queue and generates a
script which will copy the necessary information to the compute server
request area.  The copied files represent requests to the compute servers;
the following types of requests are possible:

  $REQUESTDIR/<runid>_<tok>.queue   -- execute a TRANSP run
  $REQUESTDIR/<runid>_<tok>.look    -- "trlook" a TRANSP run, i.e. put in
                                       $RESULTDIR/scratch an output
                                       dataset for a TRANSP run that has
                                       not yet completed execution
  $REQUESTDIR/<runid>_<tok>.halt    -- halt (abort) a TRANSP run
  $REQUESTDIR/<runid>_<tok>.archive -- abort and archive a TRANSP run
  $REQUESTDIR/<runid>_<tok>.cancel  -- abort and delete a TRANSP run

The compute server script `cserver' processes these requests; from there,
the compute server takes over (see the next section).

The queue server will also schedule nightly code maintenance jobs.  By
default, these jobs run at 1 am every night.  To select a different hour,
define the BUILDHOUR environment variable at login time, e.g.

  setenv BUILDHOUR 4

to specify that code maintenance jobs should run at 4 am.  Code
maintenance "build" jobs are controlled from the `bserver' script.  For
further details see appendix A.

The queue server also carries out "cleanup" actions, to remove from
$QSHARE information on runs that completed on the previous day or earlier.
The $LOGDIR directory tree is also subject to "cleanup" actions.  By
default, run log files that are older than 30 days are deleted.  To change
the number of days, set the environment variable LOGFILE_RESIDENCY, e.g.

  setenv LOGFILE_RESIDENCY 14

to specify a policy to remove files older than 14 days.

The scripts `qserver', `bserver', and `cserver' will not run automatically
unless the "master daemon" script is scheduled for the system cron daemon.
See section 7, below.

6. Compute Servers.
====================

The compute servers have responsibility for actual execution of user
requests: normal run requests, tr_look requests, tr_archive requests,
tr_halt requests, and tr_cancel requests.  These services are implemented
by periodic execution of the `cserver' script under the "master daemon".
The `cserver' script should *only* be run out of master.daemon, which
takes responsibility for preventing multiple simultaneous executions of
the script (which would have unpredictable consequences).

The `cserver' script goes to considerable lengths to notify users by Email
if problems are detected.  For example, if cserver tries to start a queued
run which contains an error in the namelist or input data such that the
run cannot be started, cserver will dequeue the run and send an Email to
the user.  Similarly, cserver will notify the user if it detects
contradictory requests, e.g. a simultaneous tr_archive and tr_cancel
request on a given run (the tr_archive request wins).  If a user request
"disappears without a trace" (i.e. it is not executed and no Email is
generated) this indicates some kind of system problem.  See the appendix
on troubleshooting.

7. The "master daemon".
========================

The queue server, compute server, and build server scripts are all run out
of the shared script $TRANSPROOT/daemon/master.daemon.  This script
contains "locking mechanisms" to prevent multiple parallel execution of
scripts that could e.g. lead to redundant launchings of a TRANSP run.  At
the same time, the script also checks against "hanging locks" that can be
created on the occasion of system crashes.  These mechanisms are thought
to be reasonably reliable, as they have been in use in the PPPL TRANSP run
production system for several years.  The master.daemon also runs a
"reboot daemon" which will restart TRANSP runs interrupted by a system
crash.
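The locking code itself lives in the distributed master.daemon script.
Purely as an illustration of the general technique (and not the actual
implementation), a cron-driven sh script can serialize itself and recover
from stale locks along these lines; the lock path and details below are
hypothetical:

  #!/bin/sh
  # Illustrative sketch only -- NOT the actual master.daemon source.
  LOCK=$TRANSPROOT/tmp/daemon.lock       # hypothetical lock location

  if mkdir "$LOCK" 2>/dev/null; then
      echo $$ > "$LOCK/pid"              # lock acquired; record our pid
  else
      pid=`cat "$LOCK/pid" 2>/dev/null`
      if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
          exit 0                         # a previous instance still runs
      fi
      rm -rf "$LOCK"                     # stale "hanging lock" from a crash
      mkdir "$LOCK" || exit 1
      echo $$ > "$LOCK/pid"
  fi

  # ... qserver / cserver / bserver work would go here ...

  rm -rf "$LOCK"                         # release the lock on normal exit

master.daemon applies this kind of protection around the qserver, cserver,
and bserver invocations described below.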
Use the `crontab' command to schedule master.daemon for regular execution.
This should be done on *all* machines being used as TRANSP servers.  The
`crontab' command creates crontab table entries; each table entry contains
scheduling information and an sh command specifying the scheduled action.
Here is a typical master.daemon crontab entry at PPPL:

  0,5,10,15,20,25,30,35,40,45,50,55 * * * * /mount/transp0/pshare/daemon/master.daemon >/mount/transp0/pshare/work/log/master.cron.log

This schedules master.daemon to run once every 5 minutes.

The more important scripts executed out of the master.daemon are:

  restart_check      -- restart TRANSP runs (i.e. after a system crash).
  cserver interrupt  -- carry out tr_look, tr_halt, etc.
  bserver setup      -- misc. build server related functions.
  bserver execute    -- run makefile generator; run makefiles.
  cserver start      -- launch transp runs.
  qserver            -- (queue server node only) service the queue.

Note: master.daemon is launched by the system cron daemon, out of `sh'.
The transp account's .login script is **not** automatically source'd by
cron.  Therefore, master.daemon supports various ways to define its
environment.  The proper method of definition will vary from site to site.
For further information, see the first few lines of the
$TRANSPROOT/daemon/master.daemon script, or seek help (Email
dmccune@pppl.gov).

A. Automated Code Maintenance.
===============================

Automated code maintenance eases the installation burden, if local code
changes are made.  Basically, the code's makefiles get checked,
regenerated if necessary, and executed once every 24 hours.  This means
that if a change is installed in the common source code, but built in and
tested only on compute server `A', then the change will also be built in
on all other compute servers within 24 hours.  The queue server will run
the makefile generator and then request makefile execution on each "build
server" defined in the configuration database.  Each build server has
responsibility for a separate set of code binaries.

B. Troubleshooting Hints.
==========================

The basic method of troubleshooting amounts to examination of log files.
Message output from scripts is routed to log files as follows:

general:

  $LOGDIR/<node>_error.log
     ... "unusual" messages from the build server or compute server
         scripts

compute server (general):

  $LOGDIR/<node>_master.log
     ... record of each TRANSP job launched on <node>.

  $LOGDIR/<node>_reboot.log
     ... record of runs restarted (i.e. after reboot) on <node>.

  $LOGDIR/<node>_archive.log
     ... tr_archive requests processed on <node>.

  $LOGDIR/<node>_cancel.log
     ... tr_cancel requests processed on <node>.

  $LOGDIR/<node>_halt.log
     ... tr_halt requests processed on <node>.  This includes "implied"
         requests due to tr_archive or tr_cancel.

  $LOGDIR/<node>_trlook.log
     ... tr_look requests processed on <node>.  Output file translator job
         messages are included here.

compute server (specific runs):

  $LOGDIR/runlog/<runid>_trdat.log
     ... logfile of preprocessing (trdat, link) for run <runid>.

  $LOGDIR/runlog/<runid>_tr.log
     ... logfile of TRANSP execution and post-processing for run <runid>.
  (note that the queue server removes these files after 30 days, or after
  the number of days indicated in the environment variable
  LOGFILE_RESIDENCY).

build server:

  $LOGDIR/<node>_chktok.log
     ... log of jobs to check work subdirectories against list of
         tokamaks.

  $LOGDIR/<arch>/<node>_checkmake.log
     ... log of checkmake job on most recent instance of <arch>.
         checkmake looks for changes in the source code and reruns the
         makefile generator on all components containing changes.

  $LOGDIR/<arch>/<node>_build.log
     ... log of build job on most recent instance of <arch>, on the
         indicated <node> which matches the <node> on which checkmake was
         run.

  $LOGDIR/<arch>/<node>_build2.log
     ... log of build job on most recent instance of <arch>, on the
         indicated <node> which is different from the <node> on which
         checkmake was run.

Errors detected during build jobs are compressed out of the log files
(using the errfilter program) and sent as Email to the address indicated
in the file $CONFIGDIR/csh_mail.address, or, if this file is not found,
the address used is the output of the `whoami` command on the build
server.

queue server:

  $LOGDIR/qserver.log
     ... "unusual" messages from the queue server.

  $LOGDIR/qlog/<month>.log
     ... contains summary qmanager output for the indicated month (most
         recent year only).  A date-time stamp is output each time the
         qmanager program is run.  If in addition the qmanager decides on
         action (i.e. to submit a run) or detects an abnormal condition
         (e.g. a compute server is inaccessible), these actions or
         conditions are reported.

  $LOGDIR/qlog/thisweek/<day>.log
  $LOGDIR/qlog/lastweek/<day>.log
     ... more detailed qmanager outputs on a day by day basis; only the
         current week and preceding week's output are retained.

Suggestions for improving the logging system?  Email dmccune@pppl.gov
(dmc 19 Dec 1997).

C. Miscellaneous Notes...
==========================

1. The qcsh scripts assume an ability to send Email to users, using only
   the output of the `whoami` command as an address.  It is up to local
   system administration to assure that such Email gets sent to the user
   in an appropriate way.  That is, if the queue server wants to send user
   "harry" an Email, this should not arrive in a local Email file on the
   queue server machine, which "harry" will never check.  Email should be
   routed to a central mail server and forwarded in a sensible way.

2. The TRANSP compute server machines also need to be able to send mail to
   users knowing only their username.  The qcsh scripts record the output
   of the `whoami` command in the ".owner" file for each run; this data is
   subsequently used by the TRANSP queue server and compute server
   machines.  Email will be sent in case of error, and when the TRANSP run
   is completed.

3. Email will also be sent when an error occurs during execution of a
   TRANSP build script.  This Email is sent to the address contained in
   the file $CONFIGDIR/csh_mail.address.  If this file is not found, the
   output of `whoami` in the TRANSP account (the account running the
   master.daemon) is used for the address.

D. Possible Future Development.
================================

The queue management system creates a superstructure that allows for
further automation of TRANSP related tasks.  An example of a task that
could be automated (but has not been as of 19 Dec 1997) is the regular
scheduling of machines to be "on-line" or "off-line" for TRANSP batch
processing, depending on the time-of-day and/or day-of-the-week.  Jobs
executing when the machine goes off-line could be suspended, to be
restarted when the machine goes back on-line.

Additional automation ideas are welcome.  Send Email to dmccune@pppl.gov.