Edited by AR 20.07.2002

##############################################################################
#  Tools to run GEANT3 simulation production as a Virtual Data Derivation   #
##############################################################################

You can find here a tar file with a complete ATLAS GEANT3 simulation
production toolkit. This setup is able to run on any Linux system without
any additional software (you will, however, need input data).

All production done with this script should first be registered in the NOVA
VDC database. Go to

   http://atlassw1.phy.bnl.gov/NOVA/VDC/phpMyAdmin/index.php3

(standard ATLAS web password) and click on the Dataset icon in the upper
left corner to see the existing dataset descriptions. If you need other
datasets, please contact nevski@bnl.gov or vaniachine@anl.gov to register a
new entry. When the production parameters are taken from the NOVA VDC, we
guarantee the correctness of the simulated data.

You have to perform the following steps to get the production system running:

step 1: Installation
=======

 - Create a production directory which will keep all codes needed to run the
   program (you will need a minimum of 120 MB there).

 - Create a run directory (the one from which you will submit jobs). We
   recommend that it be different from the production directory, to keep the
   generated data separated from the original production codes.

 - Un-tar the distribution tar-file into the production directory, directly
   from CERN (if you have afs) or from your local copy. You can also
   download the tar-file from http://www.usatlas.bnl.gov/~nevski/localcache

 - Copy the example production script "atlsim_prod.job" from the production
   directory into the run directory.

This step may look like:

   cd somewhere
   mkdir prod run
   tar xfvz /afs/cern.ch/user/n/nevski/public/adist/3.2.1.tz -C prod
   cp prod/atlsim_prod.job run/atlsim_prod.job

step 2: Customizing
=======

Now you have to customize your copy of the production script
"atlsim_prod.job" in your "run" directory. Leave the original script as it
is - it can be used as a test to generate 50 short (2 events per job) test
jobs with known input (Higgs to 4e) and output. We recommend that you create
a separate copy for each new dataset you simulate.

1) Select a dataset from the existing database entries, for example:

   - dataset simul_001000 is a test sample. The input for this dataset is
     distributed in the same tar-file as the toolkit, so you can run a few
     test jobs with it immediately after step 1 is done.

2) Describe your storage layout in the environment variables (an
   illustrative sketch is given after this list):

   - PRODDIR (mandatory): directory where you have the atlsim production
     kit installed, i.e. "somewhere/prod" as used in step 1 of the example
     above.

   - RUNDIR (mandatory): directory which contains this customized
     production script and from which you will submit batch jobs, i.e.
     "somewhere/run" as used in step 1 of the example above.

   - INPUTDIR (optional): directory (tree) where you keep the input EVGEN
     files, or links pointing to their real places. If undefined, RUNDIR
     will be searched for existing input files. If a requested input file
     is not found, the script will try to stage in a new input; when
     INPUTDIR is undefined, this staging is done in the job working
     directory.

   - LOGDIR (optional): where you want to keep the production log files; by
     default they will be created in the RUNDIR.

   - JOBDIR (optional): where you want to keep the simulation output files
     (2 MB per event is needed on average). If undefined, the job working
     directory will be used to keep the zebra and histogram outputs.
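
   For illustration only, the storage layout of a customized script might
   look like the sketch below. It is written in POSIX-shell syntax with
   hypothetical sub-directory names; only PRODDIR and RUNDIR are mandatory,
   and the exact assignment syntax should follow whatever shell
   atlsim_prod.job actually uses:

      # hypothetical storage layout, reusing the step 1 example paths
      export PRODDIR=somewhere/prod    # production kit location (mandatory)
      export RUNDIR=somewhere/run      # job submission directory (mandatory)
      export INPUTDIR=$RUNDIR/evgen    # input EVGEN files or links (optional)
      export LOGDIR=$RUNDIR/log        # production log files (optional)
      export JOBDIR=$RUNDIR/data       # zebra and histogram outputs (optional)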
3) In addition, if your site has a mass storage system (CASTOR, HPSS,
   etc.), you can activate the post-production output archiving by also
   defining:

   - STORE:  mass storage for output files to be copied using RFIO
   - SPARE:  reserve area for data saving in case of STORE failure or
     absence
   - HPSSIN: mass storage accessible using the stage_in script (stagein at
     CERN)

More examples are given as comments in the original atlsim_prod.job.

step 3: Testing
=======

If you want to run a test job interactively, type:

   cd run
   emacs atlsim_prod.job   (replace "PRODDIR `pwd`" by "PRODDIR yourpath/prod")
   atlsim_prod.job

This will create the dataset "simul_001000", which points to the input data
file "dc1.001000.evgen.0001.test.pythia_100h_4e.zebra" in the directory
"pythia_100h_4e", which is provided with the same distribution. This file
contains Higgs events with a mass of 100 GeV decaying into four electrons,
in GENZ format, generated with the PYTHIA event generator.

To start a batch job at CERN on LSF:

   bsub -q 8nh atlsim_prod.job

To monitor the job use "bjobs" or "xlsf".

To start a batch job at Lyon on BQS:

   qsub -l t=138:00:00,M=256MB,hpss,platform=betaLINUX,scratch=2000mb \
        -V -q T atlsim_prod.job

To monitor the job use "qjob" and "qcat".

If the test job runs normally, you will find the logfile in

   simul_001000/atlas.0001.log

and the simulated events and histograms in

   simul_001000/atlas.0001.zebra
   simul_001000/atlas.0001.his

step 4: Running production
=======

The only action required is to put the correct dataset name into your
production script and to submit the batch jobs (a submission sketch is
given at the end of this file). The correctness of the production is
checked by inspecting the logfiles in $LOGDIR, by testing the histograms in
$STORE/dataset/his, and by reconstructing the events in
$STORE/dataset/zebra.

# A few comments

 - For the moment we support only the 3.2.1 distribution, so there is no
   need to bother with the signature parameter.

 - If you let us add your site description to the NOVA VDC database, we can
   provide more support for you in the future. Otherwise you can always use
   the "default" site parameter.

##############################################################################
For all questions please contact Pavel Nevski (nevski@bnl.gov)   15-July-2002
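
Appendix: an illustrative mass-submission sketch for step 4. It assumes LSF
at CERN as in step 3, and it assumes that atlsim_prod.job assigns partition
numbers to the jobs by itself (as the 50-job test sample of step 2
suggests); the job count and queue name are placeholders to adapt to your
site:

   # submit 50 identical production jobs to the LSF 8nh queue
   i=1
   while [ $i -le 50 ]; do
      bsub -q 8nh atlsim_prod.job
      i=`expr $i + 1`
   done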