MOP Technical Design Note 0001

Anzar Afaq, Shafqat Aziz, Peter Couvares, Alan DeSmet, and Greg Graham

1. Introduction

MOP is a component that adds value to the CMS Distributed Processing Environment (DPE) by packaging production processing jobs into submit packages in DAGMAN format and by handling submission of those packages to DAGMAN using Condor-G as a back end. MOP thus uses the DAGMAN format of the submit files to impose explicit structure on CMS DPE jobs, uses the recovery mechanisms of DAGMAN to add fault tolerance to those jobs, and uses Condor-G as a submission agent to geographically dispersed compute resources running Globus job managers.

2. Existing Architecture

MOP consists of a single main Python file, mop_submitter.py, together with several smaller, unremarkable helper modules. These define the classes and functions needed to package existing CMS DPE jobs into DAGMAN submit files. MOP imposes a simple series of steps on each job: (1) Stage-in, (2) Run, (3) Stage-out, and (4) Cleanup. Each step is represented as a node in a small, four-node linear DAG. (This DAG structure will be referred to as the "micro-DAG.")

MOP uses wrapper scripts to implement each step. The wrapper scripts are constructed from templates at invocation time, stored locally at the MOP-master site, and referenced in the DAGMAN submit files. At submit time, the scripts are sent to one of many possible MOP-slave sites, selected by a command-line argument. Before the actual submit, MOP uses a local database at the MOP-master site to resolve any site dependencies in the MOP wrapper scripts; these are resolved by setting environment variables in the DAGMAN submit files. If many MOP jobs are defined at the same time, the jobs are saved in a directory at the MOP-master site and then grouped together into a "group-DAG" (or macro-DAG) before submission. In this way, MOP itself needs to be installed only at the "master" site; jobs coming out of the master site's Condor-G gateway are Globus jobs, so the remote sites only need to be running Globus job managers.
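To make the micro-DAG concrete, the DAGMAN input file for one MOP job might look roughly like the sketch below. The node and submit-file names are illustrative only (the actual names generated by mop_submitter.py are not specified in this note), and the RETRY lines are included simply to indicate where DAGMAN's recovery mechanisms attach. Each .sub file would be a Condor-G submit description that runs the corresponding wrapper script.

    # micro-DAG for a single MOP job (illustrative names)
    JOB  StageIn   stagein.sub
    JOB  Run       run.sub
    JOB  StageOut  stageout.sub
    JOB  Cleanup   cleanup.sub
    PARENT StageIn  CHILD Run
    PARENT Run      CHILD StageOut
    PARENT StageOut CHILD Cleanup
    # DAGMAN-level retries are one source of the fault tolerance noted above
    RETRY StageIn  3
    RETRY StageOut 3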
3. Existing Interface

Front-end: The existing MOP front-end interface accepts the name of a locally resident executable and, optionally, a list of locally resident helper files. The command line also accepts a specification of the remote site. Optionally, the front-end accepts a publish option, which forces the creation of a fifth DAG node in the micro-DAG to perform a GDMP publish. The command line also accepts an option to specify the group directory, which is where the macro-DAG will be constructed.

Back-end: MOP produces a structure of scripts locally and a DAGMAN submit script. MOP maintains a local database of remote site configurations at the MOP-master site. MOP uses globus-url-copy to fetch input data and helper files from the MOP-master site, and it uses globus-url-copy to stage output data back to the master site. globus-url-copy has been known to hang, so instances of this command are wrapped in the fault tolerant shell (ftsh).
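As an illustration of the ftsh wrapping, a fragment of a stage-in wrapper might look like the sketch below. The host names, paths, and retry count are hypothetical; the actual wrapper templates are not reproduced in this note. ftsh retries the whole try-block if any command inside it fails, and it also offers time-limited forms of try, which are useful against transfers that hang.

    #!/usr/bin/ftsh
    # Fetch one input file from the MOP-master site, retrying on failure.
    # (Hypothetical host names, paths, and retry count.)
    try 5 times
       globus-url-copy gsiftp://mop-master.example.org/jobs/job001/input.tar file:///scratch/mop/job001/input.tar
    end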
4. Proposed Changes to Architecture and Interface

A. MOP currently performs three tasks as one monolithic operation: packaging CMS DPE jobs in DAGMAN format, resolving site-local configurations, and submitting jobs. We propose to factor MOP into distinct components for these three tasks.

B. MOP currently makes simple micro-DAGs suitable for single steps in CMS Monte Carlo production. We propose to alter the creation of DAGs to accept multiple steps in CMS production. (We call these "mini-DAGs.") A mini-DAG is a simple DAG of micro-DAGs. The micro-DAG is generally used to process one input data source through one step of CMS Monte Carlo production. While one can stuff the Run node of the micro-DAG with multiple steps, this may not be the best way to optimize the job. Rather, we would like MOP to support the possibility of modelling multi-step jobs as a DAG of micro-DAGs: hence the mini-DAG. The macro-DAG then becomes a group of mini-DAGs. At this point we make the simplifying assumption that a mini-DAG will execute on one host. We would like to leave open the possibility that the micro-DAGs may migrate to more than one host in the future, but this means that DAGMAN must be able to determine whether or not a file needs to be transferred. A sketch of a two-step mini-DAG appears below.

C. The data-movement machinery remains an open question: globus-url-copy, DAP, third-party transfer, replica catalogues, planners?
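As a concrete sketch of item B, a two-step mini-DAG could be written as a single DAGMAN input file in which each production step contributes one micro-DAG and a single dependency edge joins the two. All names below are invented for illustration, and the choice of join point (here, step 2 begins once step 1's output has been staged out) is only one possibility; under the simplifying assumption above, both micro-DAGs would be directed to the same host, so the intermediate stage-out/stage-in pair could eventually be reduced to a local check.

    # mini-DAG: two chained micro-DAGs in one DAGMAN input file (illustrative)
    JOB  Step1_StageIn   step1_stagein.sub
    JOB  Step1_Run       step1_run.sub
    JOB  Step1_StageOut  step1_stageout.sub
    JOB  Step1_Cleanup   step1_cleanup.sub
    PARENT Step1_StageIn  CHILD Step1_Run
    PARENT Step1_Run      CHILD Step1_StageOut
    PARENT Step1_StageOut CHILD Step1_Cleanup

    JOB  Step2_StageIn   step2_stagein.sub
    JOB  Step2_Run       step2_run.sub
    JOB  Step2_StageOut  step2_stageout.sub
    JOB  Step2_Cleanup   step2_cleanup.sub
    PARENT Step2_StageIn  CHILD Step2_Run
    PARENT Step2_Run      CHILD Step2_StageOut
    PARENT Step2_StageOut CHILD Step2_Cleanup

    # Join the micro-DAGs: step 2 begins once step 1's output is staged out
    PARENT Step1_StageOut CHILD Step2_StageIn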
5. Use Cases for MOP

Use Case: Job Submission

Identifier: UC#jobsubmit
Goals in Context: Send Job to Grid Computing Resources
Actors: User
Triggers: Decision to submit job
Included Use Cases: Specify program; Dataset Access
Specialised Use Cases: (none listed)
Pre-conditions: User or Program logged into Grid; needed Datasets available on network.
Post-conditions: Program is run. Any files specified as "valuable output" are available for further use or retrieval.
Basic Flow:
  1. User specifies job information:
     - Environment needed (hardware and software; can be any);
     - Any Grid input dataset needed[2];
     - Any local input files needed;
     - The program to be executed;
     - Any output files which should not be deleted;
     - Optionally, job attributes in the form of key=value pairs to be set in the job catalogue.
     EXTENSION POINTS: (steer submission), (resource estimation), (environment modification).
  2. User submits the description to the job submission command.
  3. The job catalogue is updated.
  4. The job executes.
  5. Upon completion, the system optionally notifies the user.
Devious Flow(s): Grid input files not found; local input files not found; program not found or not executable; output files do not exist after the program ends; no matching resources; user does not have sufficient permission or resources; user program crashes.
Importance and Frequency: Basic job. High frequency and importance.
Additional Requirements: (none listed)

Use Case: Job output access or retrieval

Identifier: UC#joboutaccess
Goals in Context: Retrieve output of a job
Actors: User
Triggers: Need to monitor job progress
Included Use Cases: Job submission; Grid login
Specialised Use Cases: (none listed)
Pre-conditions: User logged into Grid; user knows the identifier of a running job.
Post-conditions: Information on the files produced in the job space local to the execution site is retrieved. Selected files are retrieved or browsed.
Basic Flow:
  1. User specifies the job identifier.
  2. User submits a query to list the content of the job's local storage.
  3. System returns a list of files in the job's local storage.
  4. User submits a query to retrieve one or more of these files.
  5. System returns the files on local user storage (in this case meaning on the computer from which the user executes this use case).
Devious Flow(s): No job is associated with the identifier; user has no right to access the information.
Importance and Frequency: High frequency and importance.
Additional Requirements: (none listed)

Use Case: Error recovery for failed production jobs

Identifier: UC#jobrecov
Goals in Context: Stop a production that is known to fail
Actors: User, Production manager
Triggers: Failure of a job in a production for reasons that will lead the whole production to fail.
Included Use Cases: Job submission
Specialised Use Cases: (none listed)
Pre-conditions: A production job has been submitted according to the corresponding use case and one of the subjobs fails.
Post-conditions: (none listed)
Basic Flow:
  1. The middleware sends information about crashed or aborted jobs along with diagnostic information.
  2. The production manager can take the following actions:
     a) Delete the entire production job;
     b) Delete the subset of this job that has been submitted to the faulty site;
     c) Migrate the subset submitted to the faulty site, resubmitting at a different site.
  3. The system releases the resources freed (if any) by the preceding action.
Devious Flow(s): (none listed)
Importance and Frequency: (none listed)
Additional Requirements: Actions b) and c) require associating an Id with a list of jobs, e.g. those submitted to one site, so that global operations can be performed. Given this Id, the job management is implemented by other use cases. It must be possible to associate an error report with a (high-level) production job, i.e. with the set of subjobs.

Use Case: Job control

Identifier: UC#jobcont
Goals in Context: Perform management or control functions on a job
Actors: User, production manager
Triggers: Need to change the current status of a job
Included Use Cases: (none listed)
Specialised Use Cases: (none listed)
Pre-conditions: A job has been submitted; user has enough privileges to perform the specified actions.
Post-conditions: Specified action is performed.
Basic Flow:
  1. The user specifies the action and an identifier for a job or a set of jobs.
  2. The user submits the request to the system.
  3. The system performs the action.
  4. The system returns a message with the result of the action and any additional information.
Devious Flow(s): Invalid job identifier; operation not recognised; user is not the owner or has insufficient rights.
Importance and Frequency: High importance. High frequency; people make lots of mistakes.
Additional Requirements:
  The following actions must be possible for jobs in queues:
    - Cancel the job;
    - Change the job's priority;
    - Reroute the job to a different queue compatible with its parameters;
    - Hold a job in the queue;
    - Resume a job that has been held.
  The following actions are desirable for jobs in queues:
    - Reroute the job to a specific Computing Element;
    - Change (some of) the parameters with which it has been submitted.
  The following actions must be possible for a running job:
    - Kill the job, retrieving the local files;
    - Kill the job without retrieving the local files;
    - Resubmit the job (if the job is running this may mean killing the current running instance);
    - Suspend a job;
    - Resume a job.
  The following actions would be nice to have for a running job:
    - Checkpoint a job;
    - Move a job to another location.
  This also applies to composite jobs, and we need a way to refer to individual component jobs. Composite jobs also have unique job ids, and there is a tool to provide information on subjobs.

Use Case: STEER Job Submission

Identifier: UC#jobsteer
Goals in Context: Send Job to Constrained Grid Computing Resource
Actors: User; Submission Daemon; User Program
Triggers: Desire to direct jobs to specific sites
Included Use Cases: UC#jobsubmit
Specialised Use Cases: (none listed)
Pre-conditions: User or Program logged into Grid; preconditions for UC#jobsubmit.
Post-conditions: Behaviour of basic job submission is influenced at extension point (steer submission).
Basic Flow:
  1. User specifies steering criteria. The following are acceptable:
     - Specific computing server ID;
     - Proximity to a specific storage server;
     - Select from the available computing servers only those for which a user-specified string is found in the run-time environment published by the computing server;
     - Alternate resource broker (experiment-provided);
     - Plug-in cost function module for the WMS broker (if a plug-in is provided).
  2. System confirms reception of valid criteria.
Devious Flow(s): Criteria format not recognized; alternate resource broker not responding; cost function plug-in module crashes.
Importance and Frequency: Unknown importance. Expect moderate frequency: users who wish to use a "known" computing resource or do not trust the resource broker.
Additional Requirements: (none listed)

Use Case: Job Resource Estimation

Identifier: UC#jobresest
Goals in Context: Provide estimate of resources needed for job; assist resource broker
Actors: User; Submission Daemon; User Program
Triggers: Expectation of significant resource consumption
Included Use Cases: UC#jobsubmit
Specialised Use Cases: (none listed)
Pre-conditions: User or Program logged into Grid; preconditions for UC#jobsubmit.
Post-conditions: Resource broker has estimates of the various resources needed to execute basic job submission; influences UC#jobsubmit at extension point (resource estimation).
Basic Flow:
  1. User provides resource estimates. The following are acceptable:
     - Estimated CPU time;
     - Estimated memory usage;
     - Upper limit on CPU time needed;
     - Local disk space (scratch) needed.
  2. System confirms reception of a valid resource estimate description.
Devious Flow(s): Estimate format not recognized.
Importance and Frequency: Moderate importance. Expect moderate frequency; likely used for most production job submissions.
Additional Requirements: (none listed)

Use Case: Job Environment Modification

Identifier: UC#jobenvspec
Goals in Context: Modify or add to job execution environment; supply needed environment variables
Actors: User; Submission Daemon; User Program
Triggers: User program needs specific environment variables
Included Use Cases: UC#jobsubmit
Specialised Use Cases: (none listed)
Pre-conditions: User or Program logged into Grid; preconditions for UC#jobsubmit.
Post-conditions: The executing program on the Grid will have access to the variables via a standard variant of the "getenv" system; influences UC#jobsubmit at extension point (environment modification).
Basic Flow:
  1. User provides a list of environment variables and their values.
Devious Flow(s): Specification format not recognized.
Importance and Frequency: High importance. Expect high frequency.
Additional Requirements: (none listed)
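The environment-modification and resource-estimation use cases map naturally onto the mechanism MOP already uses for site dependencies, namely settings in the Condor-G submit descriptions referenced by the DAG nodes. The sketch below is hypothetical (variable names, RSL attributes, and file names are invented) and is only meant to show where such settings could be carried; it is not taken from MOP's actual templates.

    # Hypothetical Condor-G submit description for the Run node of a micro-DAG
    universe        = globus
    globusscheduler = gatekeeper.example.org/jobmanager-pbs
    executable      = run_wrapper.sh
    # user-requested and site-local variables (UC#jobenvspec)
    environment     = CMS_PATH=/opt/cms;MOP_SCRATCH=/scratch/mop
    # resource estimates forwarded to the remote job manager as RSL (UC#jobresest)
    globusrsl       = (maxWallTime=720)(maxMemory=512)
    output          = run.out
    error           = run.err
    log             = run.log
    queue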
Use Case: Job Splitting

Identifier: UC#jobsplit
Goals in Context: Distribute Jobs over multiple CPUs or sites
Actors: User; Submission Daemon; User Program
Triggers: Program consumes enough resources to make splitting advantageous
Included Use Cases: UC#jobsubmit
Specialised Use Cases: (none listed)
Pre-conditions: User or Program logged into Grid; preconditions for UC#jobsubmit.
Post-conditions: Same as for UC#jobsubmit, except that the job should run faster than on a single CPU. Output should be identical[3] to that produced by running the program on a single worker node. The job identifier returned should be recognized by other Grid tools as attached to multiple running instances of the program.
Basic Flow: To be decided.
Devious Flow(s): (none listed)
Importance and Frequency: High importance. Distribution is one of the most important motivations for HEP on the Grid.
Additional Requirements: Notes: Most of what this use case originally said was already discussed under use case UC#analysis1, where job input specification via selection criteria was discussed. However, in that use case the question of job partitioning was not discussed. Job partitioning should be a separate use case, since input specification (UC#analysis1) is logically different from job partitioning. One real point of contact is that a selection-criterion input probably results in events from many different logical files, which makes partitioning perhaps more attractive from a cost perspective. An alternative is to make it an extension of basic job submission. Suggest leaving this use case for discussion in round 2.

Use Case: Production Job

Identifier: UC#jobprod
Goals in Context: Produce large quantity of official data product
Actors: Production Manager; Submission Daemon
Triggers: Official decision on maturity of analysis program, conditions, calibrations
Included Use Cases: UC#dstran
Specialised Use Cases: (none listed)
Pre-conditions: Preconditions for UC#dstran.
Post-conditions: Output dataset(s) physically located on at least one Grid storage element; output dataset(s) registered in the DS metadata catalogue. The job identifier returned should refer to the entire production job (it is expected that job splitting will occur, so there will be multiple instances of the program).
Basic Flow: To be decided. It is not obvious how a production job differs from dstran plus job splitting.
Devious Flow(s): See included use cases.
Importance and Frequency: High importance. Frequency several times per year per VO.
Additional Requirements: Notes: need to integrate software publishing; UC#jobprod will likely use registered versions of the software, specifying "Logical Program Names" analogous to Logical File Names. The output, if file-based, should be generated such that a) all output file names are unique, b) output names may be constructed according to an experiment-defined recipe, and c) some sort of listing or database containing the complete list of output LDNs is available.

6. References

A. MOP Proposal, Jim Amundson.
B. Fault Tolerant Shell (ftsh), Doug Thain.
C. DAP.