Index


Use Case Analysis

This section enumerates the use cases for analysis, divided into those currently supported and those that are reasonable but not yet available. Each use case can be broken into components; these components are listed below and linked to instructions on how to execute them.



Supported Use Cases


Unsupported Use Cases


Use Case Examples



Reduction of Use Cases to their Components

Table of Contents

Read

1.1 From SAM cache
1.1.1 From local SAM cache
1.1.2 From remote SAM cache or enstore (via the local SAM cache)
1.2 Local non-SAM transfer, with file unknown to SAM
1.2.1 From local disk
1.2.2 From scratch dCache
1.3 Local non-SAM transfer, with file known to SAM
1.3.1 From local disk
1.3.2 From scratch dCache

Write

2.1 To SAM
2.1.1 Write to Enstore (FNAL)
2.1.2 Write to permanent disk location
2.2 Non-SAM transfer giving SAM the location information
2.2.1 Write to scratch dCache (with file location declared to SAM)
2.2.2 Write to local disk (with file location declared to SAM)
2.3 Non-SAM transfer, with no info to SAM
2.3.1 Write to scratch dCache
2.3.2 Write to local disk

1 Read

1.1 From SAM cache
1.1.1 From local SAM cache
1.1.2 From remote SAM cache
1.2 Local non-SAM transfer, with file unknown to SAM
1.2.1 From local disk
1.2.2 From scratch dCache
1.3 Local non-SAM transfer, with file known to SAM
1.3.1 From local disk
1.3.2 From scratch dCache


2 Write

2.1 To SAM
2.1.1 Write to Enstore (FNAL)
2.1.2 Write to permanent disk location

2.2 Non-SAM transfer giving SAM the location information

For this case, you can obtain the locations and names of the files, and use that information to build a file list that you put into your tcl file yourself and read from disk. These are the necessary steps:
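A hedged sketch of this procedure in Python (the file paths and the FILE_LIST variable name are illustrative assumptions, not CDF conventions; in practice the locations would come from SAM):

```python
# Illustrative sketch only: the paths below are made up; in practice you
# would obtain the file locations and names from SAM first.
files = [
    "/cdf/scratch/mydata/myfile-031106-1910.root",
    "/cdf/scratch/mydata/myfile-031106-1911.root",
]

def tcl_file_list(paths):
    """Build the lines of a tcl 'set FILE_LIST' block for a framework job."""
    lines = ["set FILE_LIST {"]
    lines += ["    %s" % p for p in paths]
    lines.append("}")
    return lines

# Write the fragment so the job can read the list from disk.
with open("input_files.tcl", "w") as f:
    f.write("\n".join(tcl_file_list(files)) + "\n")
```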


2.2.1 Write to scratch dCache (with file location declared to SAM)

2.2.2 Write to local disk (with file location declared to SAM)
2.3 Non-SAM transfer, with no info to SAM

2.3.1 Write to scratch dCache

2.3.2 Write to local disk


Basic Test Exercise for Storing Files to Tape on the FNAL CDFEN Robot

Let's get started with a test. The goal is to store a one-event file from fcdfdata016.fnal.gov.

Here is what to do:

Once this works, you can try to store from your remote location.

Note that this is a TEST area we are writing to. The proper way is to use procedures that handle automatic creation of directories in pnfs space, so that the --dest argument above becomes a dummy argument that tells the code to pick the location itself.

The file stored in this test cannot be read back. For complete testing of writing and reading, refer to the instructions below.



Metadata HowTo: Automated Generation of Metadata for CDF Storage

The easiest way to store a file for CDF is to use a script from the DHMods package. It calls the sam store commands described in the basic store test, but it also harvests all the metadata from the file and your environment, so that you don't have to write a metadata file like the one described in the basic metadata creation instructions or the advanced description of adding metadata.

The command is made available by checking out the DHMods package from the cdfsoft environment. A help option is provided:

/DHMods/bin/samStoreCdfFile --help
which gives
Store regular CDF files in SAM
Usage: samStoreCdfFile 
 possible options are:
  --help                   - this message
  --file=<filename>        - full name of the file to be stored - mandatory
  --dataset=<dataset>      - CDF dataset assigned to the file - mandatory
  --pnfs=<destination>     - file destination - mandatory
  --station=<station>      - local SAM station, may be set via
                             $SAM_STATION - mandatory
  --host=<hostname>        - hostname for the local SAM station, may be
                             set via $SAM_HOST_NAME - mandatory
  --description=<text>     - any description of the file
  --html=<url>             - reference to the description of data
  --storeoptions=<options> - auxiliary options to be forwarded to the
                             "sam store" command
  --rename                 - rename file according CDF convention
  -v                       - verbose output
  -t

This will store a file according to the dataset id you provide. To obtain the destination, you need to run a separate script, where-to-store.sh.



MetaData HowTo: Basic Version

When you store a file with SAM, you store some metadata with it in order to find it later. A filename must be unique across all places and times in SAM, so if you choose "test" for your filename, you are likely to get an error. Therefore choose a filename that is unique -- usually by adding the unix time stamp and your station name or location. You should never search for files using filename wildcards: that is what the metadata are for.

The minimalist version of a metadata file contains the program name, version, number of events, time produced, your name, where it was produced, the type of run, the group, the stream, and some descriptive text that may be the same for your private dataset. You can also choose your own "cdf dataset" name -- but beware of choosing something that someone else might also pick, or you will mix up your files with theirs. (This will be made impossible in a few months, but right now it is a problem.) Putting your Kerberos principal as the first part of any cdf dataset name is therefore a good idea. You can also include a reference to a web page where additional information on the dataset is kept. Finally, if you are storing Monte Carlo files, you should NOT choose the cdf dataset at the time of file storage, but rather at the time of the Monte Carlo Request.
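A minimal sketch of these naming conventions (the principal, station, and dataset stem below are placeholder values, not real accounts or datasets):

```python
import time

# Placeholder values: substitute your own Kerberos principal and SAM station.
principal = "stdenis"
station = "nglas05"

# A unique filename: a descriptive stem, the station name, and the unix
# time stamp, so no one else can collide with it.
filename = "mytest-1ev-%s-%d.root" % (station, int(time.time()))

# A cdf dataset name with the principal as its first part, to avoid
# colliding with a name someone else might also pick.
dataset = "%s-ttbar-study" % principal
```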

An example is shown here; while it says Monte Carlo and Generator, this is only a temporary kludge that you must use for all files -- use it even for a real data file or ntuple. Also, the parameters chosen are for Monte Carlo; they should properly be declared in the Monte Carlo Request. They are left here to illustrate that the parameters can be anything.

from import_classes import *

appfamily=AppFamily('generator', '1.00', 'generator')
filename = 'rs-1ev-test-031106-1910.root'
t = SAMMCFile(filename,Events(1, 2, 2),
"generated",
appfamily,
"01/21/2003 10:59:09",
"01/21/2003 11:20:08",
18,
{
'Global':
{ 'ProducedByName':'mrenna',
  'OriginName':'fermilab',
  'Phase':'unspecified',
  'FacilityName':'fixed-target-farm',
  'ProducedForName':'mrenna',
  'RunType':'Monte Carlo',
  'GroupName':'cdf',
  'Stream':'m', 
  'Description':'test mc',
},
'CDF':
{ 'DataSet':'stink2',
  'html':'http://cepa.fnal.gov/personal/mrenna/',
},
'Generated' :{
'AppFamily':'generator',
'FirstEvent':'1',
'AppVersion':'1.00',
'LastEvent':'2',
'NumRecords':'2',
'AppName':'generator',
'TotalEvents':'2',
'RunNumber':54321,}
}
)
For a Monte Carlo file, the request system is working, so use the following form. Note the use of "requestid" in the "Global" category:
from import_classes import *

appfamily=AppFamily('generator', '1.00', 'generator')
filename = 'rs-1ev-test-031106-1910.root'
t = SAMMCFile(filename,Events(1, 2, 2),
"generated",
appfamily,
"01/21/2003 10:59:09",
"01/21/2003 11:20:08",
18,
{
'Global':
{ 'ProducedByName':'mrenna',
  'OriginName':'fermilab',
  'Phase':'unspecified',
  'FacilityName':'fixed-target-farm',
  'ProducedForName':'mrenna',
  'RunType':'Monte Carlo',
  'GroupName':'cdf',
  'Stream':'m', 
  'Description':'test mc',
  'requestid':'27'
},
'CDF':
{ 'DataSet':'stink2',
  'html':'http://cepa.fnal.gov/personal/mrenna/',
},
'Pythia':
{ 'cdfrelease':'testpycdfrelease',
      'collider':'testpycollider', 
    'comments':'testpycomments',
    'decaytable':'testpydecaytable',
    'energy':'testpyenergy', 
    'et_jet_cut':'testpyet_jet_cut', 
    'fact_scale':'testpyfact_scale', 
    'lamqcd5':'testpylamqcd5', 
    'numrecords':'testpynumrecords', 
    'partons':'testpypartons',
    'pdf':'testpypdf', 
    'physicsprocess':'testpyphysicsprocess',
    'picobarns':'testpypicobarns',    
    'qcd_order':'testpyqcd_order',    
    'qcd_power':'testpyqcd_power',    
    'qed_order':'testpyqed_order',    
    'qed_power':'testpyqed_power',    
    'ranseed1':'testpyranseed1',     
    'ranseed2':'testpyranseed2',     
    'renorm_scale':'testpyrenorm_scale', 
    'runnumber':'testpyrunnumber',  
    'useevtgen':'testpyuseevtgen',    
    'useqq':'testpyuseqq',
    'validated':'testpyvalidated',
    'version':'testpyversion',      
    'webpage':'testpywebpage',
},
'Herwig' :
{  'cdfrelease':'testhercdfrelease',
   'collider':'testhercollider',      
    'comments':'testhercomments',      
    'decaytable':'testherdecaytable',    
    'energy':'testherenergy',      
    'et_jet_cut':'testheret_jet_cut',    
    'fact_scale':'testherfact_scale',    
    'lamqcd5':'testherlamqcd5',       
    'numrecords':'testhernumrecords',    
    'partons':'testherpartons',       
    'pdf':'testherpdf',           
    'physicsprocess':'testherphysicsprocess',
    'picobarns':'testherpicobarns',     
    'qcd_order':'testherqcd_order',     
    'qcd_power':'testherqcd_power',     
    'qed_order':'testherqed_order',    
    'qed_power':'testherqed_power',     
    'ranseed1':'testherranseed1',      
    'ranseed2':'testherranseed2',      
    'renorm_scale':'testherrenorm_scale',  
    'runnumber':'testherrunnumber',     
    'validated':'testhervalidated',
    'version':'testherversion',       
    'webpage':'testherwebpage',        
},
'Alpgen' :{
    'collider':'testalpcollider',
    'comments':'testalpcomments',      
    'dr_jj_cut':'testalpdr_jj_cut',     
    'dr_lj_cut':'testalpdr_lj_cut',     
    'energy':'testalpenergy',        
    'et_jet_cut':'testalpet_jet_cut',    
    'et_lep_cut':'testalpet_lep_cut',    
    'fact_scale':'testalpfact_scale',    
    'lamqcd5':'testalplamqcd5',        
    'll_mass_cut':'testalpll_mass_cut',    
    'numrecords':'testalpnumrecords',     
    'partons':'testalppartons',        
    'pdf':'testalppdf',           
    'physicsprocess':'testalpphysicsprocess', 
    'picobarns':'testalppicobarns',      
    'qcd_order':'testalpqcd_order',      
    'qcd_power':'testalpqcd_power',      
    'qed_order':'testalpqed_order',      
    'qed_power':'testalpqed_power',      
    'ranseed1':'testalpranseed1',       
    'ranseed2':'testalpranseed2',      
    'renorm_scale':'testalprenorm_scale',   
    'runnumber':'testalprunnumber',      
    'validated':'testalpvalidated',
    'version':'testalpversion',        
    'webpage':'testalpwebpage',        
    'weight':'testalpweight',      
    },
'Madgraph' :{
    'collider':'testmadcollider',         
    'comments':'testmadcomments',         
    'dr_jj_cut':'testmaddr_jj_cut',        
    'dr_lj_cut':'testmaddr_lj_cut',        
    'energy':'testmadenergy',         
    'et_jet_cut':'testmadet_jet_cut',       
    'et_lep_cut':'testmadet_lep_cut',       
    'fact_scale':'testmadfact_scale',       
    'lamqcd5':'testmadlamqcd5',          
    'll_mass_cut':'testmadll_mass_cut',      
    'numrecords':'testmadnumrecords',       
    'partons':'testmadpartons',          
    'pdf':'testmadpdf',              
    'physicsprocess':'testmadphysicsprocess',   
    'picobarns':'testmadpicobarns',        
    'qcd_order':'testmadqcd_order',        
    'qcd_power':'testmadqcd_power',        
    'qed_order':'testmadqed_order',        
    'qed_power':'testmadqed_power',       
    'ranseed1':'testmadranseed1',        
    'ranseed2':'testmadranseed2',         
    'renorm_scale':'testmadrenorm_scale',     
    'runnumber':'testmadrunnumber',
    'validated':'testmadvalidated',
    'version':'testmadversion',
    'webpage':'testmadwebpage',          
    'weight':'testmadweight',                
},
'Generated' :{
'AppFamily':'generator',
'FirstEvent':'1',
'AppVersion':'1.00',
'LastEvent':'2',
'NumRecords':'2',
'AppName':'generator',
'TotalEvents':'2',
'RunNumber':54321,}
}
)

When you have stored metadata for a file, you can retrieve it with the command:

sam get metadata --file=<myfile>
as illustrated for the more complex example below.



MetaData HowTo: Expanded Version

The more expanded version of a metadata file is shown below. Parameters come in categories and types; the "type" is essentially what you would call the parameter name.

You can see all the parameter types by the following sam command:

sam translate constraints --dim="help"
which returns
Dimensions can be used to query files based on the SAM meta-data,
Run Config data, or MCRun parameters.
This style of help shows all the available dimensions.
Or, you can view a more limited subset of dimensions by using the type option:
   --dim=help --type=<typeName>
Where <typeName> is one of the following:
  alpgen, cdf, cdfsim, datafile, datasetdef, dfc, herwig, madgraph, mc, mcrun, pythia, run
Following the instructions above, we issue the following command for the typeName "cdf":
[stdenis@nglas05 ~]$ sam translate constraints --dim="help" --type=cdf
and we obtain


To use dimension queries, specify dimensions and constraints combined
with and/or/minus operators, as in these examples:
  --dim='pythia.topmass 75 and simulated.numrecords > 404'
  --rpn='pythia.topmass 75 simulated.numrecords > 404 and'
  --dim='pythia.topmass 50-100 or global.originname nixhef'
  --dim='(data_tier digitized and appl_name d0reco and version preco03.07.00) \
         minus generated.decay > 12'

Available dimensions (not case sensitive):

CDF.DATASET : cdf file catalog dataset
CDF.FILESET : cdf file catalog fileset
CDF.HTML : web page with further information about this file

The syntax then maps into the metadata storage as indicated in the example below. This example is a very complex and somewhat nonsensical one that demonstrates a number of parameters that are available.
from import_classes import *

appfamily=AppFamily('generator', '1.00', 'generator')
filename = 'rs-1ev-test-031106-1910.root'
t = SAMMCFile(filename,Events(1, 2, 2),
"generated",
appfamily,
"01/21/2003 10:59:09",
"01/21/2003 11:20:08",
18,
{
'Global':
{ 'ProducedByName':'mrenna',
  'OriginName':'fermilab',
  'Phase':'unspecified',
  'FacilityName':'fixed-target-farm',
  'ProducedForName':'mrenna',
  'RunType':'Monte Carlo',
  'GroupName':'cdf',
  'Stream':'m', 
  'Description':'test mc',
},
'CDF':
{ 'DataSet':'stink2',
  'html':'http://cepa.fnal.gov/personal/mrenna/',
},
'Pythia':
{ 'cdfrelease':'testpycdfrelease',
      'collider':'testpycollider', 
    'comments':'testpycomments',
    'decaytable':'testpydecaytable',
    'energy':'testpyenergy', 
    'et_jet_cut':'testpyet_jet_cut', 
    'fact_scale':'testpyfact_scale', 
    'lamqcd5':'testpylamqcd5', 
    'numrecords':'testpynumrecords', 
    'partons':'testpypartons',
    'pdf':'testpypdf', 
    'physicsprocess':'testpyphysicsprocess',
    'picobarns':'testpypicobarns',    
    'qcd_order':'testpyqcd_order',    
    'qcd_power':'testpyqcd_power',    
    'qed_order':'testpyqed_order',    
    'qed_power':'testpyqed_power',    
    'ranseed1':'testpyranseed1',     
    'ranseed2':'testpyranseed2',     
    'renorm_scale':'testpyrenorm_scale', 
    'runnumber':'testpyrunnumber',  
    'useevtgen':'testpyuseevtgen',    
    'useqq':'testpyuseqq',
    'validated':'testpyvalidated',
    'version':'testpyversion',      
    'webpage':'testpywebpage',
},
'Herwig' :
{  'cdfrelease':'testhercdfrelease',
   'collider':'testhercollider',      
    'comments':'testhercomments',      
    'decaytable':'testherdecaytable',    
    'energy':'testherenergy',      
    'et_jet_cut':'testheret_jet_cut',    
    'fact_scale':'testherfact_scale',    
    'lamqcd5':'testherlamqcd5',       
    'numrecords':'testhernumrecords',    
    'partons':'testherpartons',       
    'pdf':'testherpdf',           
    'physicsprocess':'testherphysicsprocess',
    'picobarns':'testherpicobarns',     
    'qcd_order':'testherqcd_order',     
    'qcd_power':'testherqcd_power',     
    'qed_order':'testherqed_order',    
    'qed_power':'testherqed_power',     
    'ranseed1':'testherranseed1',      
    'ranseed2':'testherranseed2',      
    'renorm_scale':'testherrenorm_scale',  
    'runnumber':'testherrunnumber',     
    'validated':'testhervalidated',
    'version':'testherversion',       
    'webpage':'testherwebpage',        
},
'Alpgen' :{
    'collider':'testalpcollider',
    'comments':'testalpcomments',      
    'dr_jj_cut':'testalpdr_jj_cut',     
    'dr_lj_cut':'testalpdr_lj_cut',     
    'energy':'testalpenergy',        
    'et_jet_cut':'testalpet_jet_cut',    
    'et_lep_cut':'testalpet_lep_cut',    
    'fact_scale':'testalpfact_scale',    
    'lamqcd5':'testalplamqcd5',        
    'll_mass_cut':'testalpll_mass_cut',    
    'numrecords':'testalpnumrecords',     
    'partons':'testalppartons',        
    'pdf':'testalppdf',           
    'physicsprocess':'testalpphysicsprocess', 
    'picobarns':'testalppicobarns',      
    'qcd_order':'testalpqcd_order',      
    'qcd_power':'testalpqcd_power',      
    'qed_order':'testalpqed_order',      
    'qed_power':'testalpqed_power',      
    'ranseed1':'testalpranseed1',       
    'ranseed2':'testalpranseed2',      
    'renorm_scale':'testalprenorm_scale',   
    'runnumber':'testalprunnumber',      
    'validated':'testalpvalidated',
    'version':'testalpversion',        
    'webpage':'testalpwebpage',        
    'weight':'testalpweight',      
    },
'Madgraph' :{
    'collider':'testmadcollider',         
    'comments':'testmadcomments',         
    'dr_jj_cut':'testmaddr_jj_cut',        
    'dr_lj_cut':'testmaddr_lj_cut',        
    'energy':'testmadenergy',         
    'et_jet_cut':'testmadet_jet_cut',       
    'et_lep_cut':'testmadet_lep_cut',       
    'fact_scale':'testmadfact_scale',       
    'lamqcd5':'testmadlamqcd5',          
    'll_mass_cut':'testmadll_mass_cut',      
    'numrecords':'testmadnumrecords',       
    'partons':'testmadpartons',          
    'pdf':'testmadpdf',              
    'physicsprocess':'testmadphysicsprocess',   
    'picobarns':'testmadpicobarns',        
    'qcd_order':'testmadqcd_order',        
    'qcd_power':'testmadqcd_power',        
    'qed_order':'testmadqed_order',        
    'qed_power':'testmadqed_power',       
    'ranseed1':'testmadranseed1',        
    'ranseed2':'testmadranseed2',         
    'renorm_scale':'testmadrenorm_scale',     
    'runnumber':'testmadrunnumber',
    'validated':'testmadvalidated',
    'version':'testmadversion',
    'webpage':'testmadwebpage',          
    'weight':'testmadweight',                
},
'Generated' :{
'AppFamily':'generator',
'FirstEvent':'1',
'AppVersion':'1.00',
'LastEvent':'2',
'NumRecords':'2',
'AppName':'generator',
'TotalEvents':'2',
'RunNumber':54321,}
}
)

The metadata for a file stored in this way can then be examined with the command:

sam get metadata --file='rs-1ev-test-031106-1910.root'

which returns

             File Type:  SAMMC Data File
             File Name:  rs-1ev-test-031106-1910.root
               File ID:  2318128
             File Size:  307446 [B]
              CRC Data:  525925219L [adler 32 crc type]
       File Start Time:  01/21/2003 10:59:09
         File End Time:  01/21/2003 11:20:08
       Physical Stream:  m
      File Format Info:  unknown file format
           First Event:  1
            Last Event:  2
          Total Events:  2
    Application Family:  generator
      Application Name:  generator
   Application Version:  1.00
     Import Process ID:  0
             Node Name:  fcdfdata016.fnal.gov
            Work Group:  cdf
             User Name:  sam
          Produced For:  mrenna
           Produced By:  mrenna
       Origin Location:  fermilab
       Origin Facility:  fixed-target-farm
       Physics Channel:  
           Description:  test mc
              MC Phase:  unspecified
            Run Number:  54321
              Run Type:  monte carlo
        Run Start Time:  01/21/2003 10:59:09
          Run End Time:  01/21/2003 11:20:08
       Run Description:  test mc
         Run CM Energy:  0.0
          Parent Files:  []
                 Split:  0
                 Merge:  0
                   Key:  collider = testalpcollider (Category: alpgen)
                   Key:  comments = testalpcomments (Category: alpgen)
                   Key:  dr_jj_cut = testalpdr_jj_cut (Category: alpgen)
                   Key:  dr_lj_cut = testalpdr_lj_cut (Category: alpgen)
                   Key:  energy = testalpenergy (Category: alpgen)
                   Key:  et_jet_cut = testalpet_jet_cut (Category: alpgen)
                   Key:  et_lep_cut = testalpet_lep_cut (Category: alpgen)
                   Key:  fact_scale = testalpfact_scale (Category: alpgen)
                   Key:  lamqcd5 = testalplamqcd5 (Category: alpgen)
                   Key:  ll_mass_cut = testalpll_mass_cut (Category: alpgen)
                   Key:  numrecords = testalpnumrecords (Category: alpgen)
                   Key:  partons = testalppartons (Category: alpgen)
                   Key:  pdf = testalppdf (Category: alpgen)
                   Key:  physicsprocess = testalpphysicsprocess (Category: alpgen)
                   Key:  picobarns = testalppicobarns (Category: alpgen)
                   Key:  qcd_order = testalpqcd_order (Category: alpgen)
                   Key:  qcd_power = testalpqcd_power (Category: alpgen)
                   Key:  qed_order = testalpqed_order (Category: alpgen)
                   Key:  qed_power = testalpqed_power (Category: alpgen)
                   Key:  ranseed1 = testalpranseed1 (Category: alpgen)
                   Key:  ranseed2 = testalpranseed2 (Category: alpgen)
                   Key:  renorm_scale = testalprenorm_scale (Category: alpgen)
                   Key:  runnumber = testalprunnumber (Category: alpgen)
                   Key:  validated = testalpvalidated (Category: alpgen)
                   Key:  version = testalpversion (Category: alpgen)
                   Key:  webpage = testalpwebpage (Category: alpgen)
                   Key:  weight = testalpweight (Category: alpgen)
                   Key:  dataset = stink2 (Category: cdf)
                   Key:  html = http://cepa.fnal.gov/personal/mrenna/ (Category: cdf)
                   Key:  appfamily = generator (Category: generated)
                   Key:  appname = generator (Category: generated)
                   Key:  appversion = 1.00 (Category: generated)
                   Key:  firstevent = 1 (Category: generated)
                   Key:  lastevent = 2 (Category: generated)
                   Key:  numrecords = 2 (Category: generated)
                   Key:  runnumber = 54321 (Category: generated)
                   Key:  totalevents = 2 (Category: generated)
                   Key:  facilityname = fixed-target-farm (Category: global)
                   Key:  groupname = cdf (Category: global)
                   Key:  originname = fermilab (Category: global)
                   Key:  phase = unspecified (Category: global)
                   Key:  producedbyname = mrenna (Category: global)
                   Key:  producedforname = mrenna (Category: global)
                   Key:  cdfrelease = testhercdfrelease (Category: herwig)
                   Key:  collider = testhercollider (Category: herwig)
                   Key:  comments = testhercomments (Category: herwig)
                   Key:  decaytable = testherdecaytable (Category: herwig)
                   Key:  energy = testherenergy (Category: herwig)
                   Key:  et_jet_cut = testheret_jet_cut (Category: herwig)
                   Key:  fact_scale = testherfact_scale (Category: herwig)
                   Key:  lamqcd5 = testherlamqcd5 (Category: herwig)
                   Key:  numrecords = testhernumrecords (Category: herwig)
                   Key:  partons = testherpartons (Category: herwig)
                   Key:  pdf = testherpdf (Category: herwig)
                   Key:  physicsprocess = testherphysicsprocess (Category: herwig)
                   Key:  picobarns = testherpicobarns (Category: herwig)
                   Key:  qcd_order = testherqcd_order (Category: herwig)
                   Key:  qcd_power = testherqcd_power (Category: herwig)
                   Key:  qed_order = testherqed_order (Category: herwig)
                   Key:  qed_power = testherqed_power (Category: herwig)
                   Key:  ranseed1 = testherranseed1 (Category: herwig)
                   Key:  ranseed2 = testherranseed2 (Category: herwig)
                   Key:  renorm_scale = testherrenorm_scale (Category: herwig)
                   Key:  runnumber = testherrunnumber (Category: herwig)
                   Key:  validated = testhervalidated (Category: herwig)
                   Key:  version = testherversion (Category: herwig)
                   Key:  webpage = testherwebpage (Category: herwig)
                   Key:  collider = testmadcollider (Category: madgraph)
                   Key:  comments = testmadcomments (Category: madgraph)
                   Key:  dr_jj_cut = testmaddr_jj_cut (Category: madgraph)
                   Key:  dr_lj_cut = testmaddr_lj_cut (Category: madgraph)
                   Key:  energy = testmadenergy (Category: madgraph)
                   Key:  et_jet_cut = testmadet_jet_cut (Category: madgraph)
                   Key:  et_lep_cut = testmadet_lep_cut (Category: madgraph)
                   Key:  fact_scale = testmadfact_scale (Category: madgraph)
                   Key:  lamqcd5 = testmadlamqcd5 (Category: madgraph)
                   Key:  ll_mass_cut = testmadll_mass_cut (Category: madgraph)
                   Key:  numrecords = testmadnumrecords (Category: madgraph)
                   Key:  partons = testmadpartons (Category: madgraph)
                   Key:  pdf = testmadpdf (Category: madgraph)
                   Key:  physicsprocess = testmadphysicsprocess (Category: madgraph)
                   Key:  picobarns = testmadpicobarns (Category: madgraph)
                   Key:  qcd_order = testmadqcd_order (Category: madgraph)
                   Key:  qcd_power = testmadqcd_power (Category: madgraph)
                   Key:  qed_order = testmadqed_order (Category: madgraph)
                   Key:  qed_power = testmadqed_power (Category: madgraph)
                   Key:  ranseed1 = testmadranseed1 (Category: madgraph)
                   Key:  ranseed2 = testmadranseed2 (Category: madgraph)
                   Key:  renorm_scale = testmadrenorm_scale (Category: madgraph)
                   Key:  runnumber = testmadrunnumber (Category: madgraph)
                   Key:  validated = testmadvalidated (Category: madgraph)
                   Key:  version = testmadversion (Category: madgraph)
                   Key:  webpage = testmadwebpage (Category: madgraph)
                   Key:  weight = testmadweight (Category: madgraph)
                   Key:  cdfrelease = testpycdfrelease (Category: pythia)
                   Key:  collider = testpycollider (Category: pythia)
                   Key:  comments = testpycomments (Category: pythia)
                   Key:  decaytable = testpydecaytable (Category: pythia)
                   Key:  energy = testpyenergy (Category: pythia)
                   Key:  et_jet_cut = testpyet_jet_cut (Category: pythia)
                   Key:  fact_scale = testpyfact_scale (Category: pythia)
                   Key:  lamqcd5 = testpylamqcd5 (Category: pythia)
                   Key:  numrecords = testpynumrecords (Category: pythia)
                   Key:  partons = testpypartons (Category: pythia)
                   Key:  pdf = testpypdf (Category: pythia)
                   Key:  physicsprocess = testpyphysicsprocess (Category: pythia)
                   Key:  picobarns = testpypicobarns (Category: pythia)
                   Key:  qcd_order = testpyqcd_order (Category: pythia)
                   Key:  qcd_power = testpyqcd_power (Category: pythia)
                   Key:  qed_order = testpyqed_order (Category: pythia)
                   Key:  qed_power = testpyqed_power (Category: pythia)
                   Key:  ranseed1 = testpyranseed1 (Category: pythia)
                   Key:  ranseed2 = testpyranseed2 (Category: pythia)
                   Key:  renorm_scale = testpyrenorm_scale (Category: pythia)
                   Key:  runnumber = testpyrunnumber (Category: pythia)
                   Key:  useevtgen = testpyuseevtgen (Category: pythia)
                   Key:  useqq = testpyuseqq (Category: pythia)
                   Key:  validated = testpyvalidated (Category: pythia)
                   Key:  version = testpyversion (Category: pythia)
                   Key:  webpage = testpywebpage (Category: pythia)
            Request ID:  0
             Data Tier:  generated



Sam Translate: Getting lists of files based on constraints on parameters

The most useful form of sam translate constraints is to ask for a cdf dataset that you formed. For the example in the "complex" store above, the cdf dataset name is "stink2". Hence we use:
[stdenis@nglas05 ~]$ sam translate constraints --dim="cdf.dataset stink2"
to obtain:
Files:
  sm-store-test1.root
  rs-1ev-test-031106-1810.root
  rs-1ev-test-031106-1840.root
  rs-1ev-test-031106-1844.root
  rs-1ev-test-031106-1844-1.root
  rs-1ev-test-031106-1845.root
  rs-1ev-test-031106-1845-1.root
  rs-1ev-test-031106-1847.root
  rs-1ev-test-031106-1855.root
  rs-1ev-test-031106-1910.root
  rs-1ev-test-031114-2251.root

File Count:  11
Average File Size:  300
Total File Size:  3300
Total Event Count:  22

For Monte Carlo, you do not use the dataset in the dimension query; you use the request id. This is because you will have searched for the Monte Carlo based on some description and found some samples that are interesting. Each sample carries a request id, and when you want to use a sample, you ask for its files based on that id. The request id is guaranteed to be unique. For example, if the request given by requestId=27 is the one you want, the sam translate constraints command is as follows:

sam translate constraints --dim="GLOBAL.REQUESTID 27"
yielding
Files:
  samTueMay25133949CDT2004.root
  samTueMay25145356CDT2004.root

File Count:  2
Average File Size:  300
Total File Size:  600
Total Event Count:  4



Making a Monte Carlo Request

A Monte Carlo Request is made by creating a python file that looks much like the one used for file storage. The file contains the parameters of the request. Here is an example:
from SamUserApiImportClasses import *
datatier='reconstructed'
appfamily=AppFamily('generator', '1.00', 'generator')
dict={
        'cdf':{
                'dataset':'stink2',
                'html':'http://cepa.fnal.gov/personal/mrenna/',
                },
        'Pythia':
        { 'cdfrelease':'testpycdfrelease',
          'collider':'testpycollider', 
          'comments':'testpycomments',
          'decaytable':'testpydecaytable',
          'energy':'testpyenergy', 
          'et_jet_cut':'testpyet_jet_cut', 
          'fact_scale':'testpyfact_scale', 
          'lamqcd5':'testpylamqcd5', 
          'numrecords':'testpynumrecords', 
          'partons':'testpypartons',
          'pdf':'testpypdf', 
          'physicsprocess':'testpyphysicsprocess',
          'picobarns':'testpypicobarns',    
          'qcd_order':'testpyqcd_order',    
          'qcd_power':'testpyqcd_power',    
          'qed_order':'testpyqed_order',    
          'qed_power':'testpyqed_power',    
          'ranseed1':'testpyranseed1',     
          'ranseed2':'testpyranseed2',     
          'renorm_scale':'testpyrenorm_scale', 
          'runnumber':'testpyrunnumber',  
          'useevtgen':'testpyuseevtgen',    
          'useqq':'testpyuseqq',
          'validated':'testpyvalidated',
          'version':'testpyversion',      
          'webpage':'testpywebpage',
          },
        'Herwig' :
        {  'cdfrelease':'testhercdfrelease',
           'collider':'testhercollider',      
           'comments':'testhercomments',      
           'decaytable':'testherdecaytable',    
           'energy':'testherenergy',      
           'et_jet_cut':'testheret_jet_cut',    
           'fact_scale':'testherfact_scale',    
           'lamqcd5':'testherlamqcd5',       
           'numrecords':'testhernumrecords',    
           'partons':'testherpartons',       
           'pdf':'testherpdf',           
           'physicsprocess':'testherphysicsprocess',
           'picobarns':'testherpicobarns',     
           'qcd_order':'testherqcd_order',     
           'qcd_power':'testherqcd_power',     
           'qed_order':'testherqed_order',    
           'qed_power':'testherqed_power',     
           'ranseed1':'testherranseed1',      
           'ranseed2':'testherranseed2',      
           'renorm_scale':'testherrenorm_scale',  
           'runnumber':'testherrunnumber',     
           'validated':'testhervalidated',
           'version':'testherversion',       
           'webpage':'testherwebpage',        
           },
        'Alpgen' : {
    'collider':'testalpcollider',
    'comments':'testalpcomments',      
    'dr_jj_cut':'testalpdr_jj_cut',     
    'dr_lj_cut':'testalpdr_lj_cut',     
    'energy':'testalpenergy',        
    'et_jet_cut':'testalpet_jet_cut',    
    'et_lep_cut':'testalpet_lep_cut',    
    'fact_scale':'testalpfact_scale',    
    'lamqcd5':'testalplamqcd5',        
    'll_mass_cut':'testalpll_mass_cut',    
    'numrecords':'testalpnumrecords',     
    'partons':'testalppartons',        
    'pdf':'testalppdf',           
    'physicsprocess':'testalpphysicsprocess', 
    'picobarns':'testalppicobarns',      
    'qcd_order':'testalpqcd_order',      
    'qcd_power':'testalpqcd_power',      
    'qed_order':'testalpqed_order',      
    'qed_power':'testalpqed_power',      
    'ranseed1':'testalpranseed1',       
    'ranseed2':'testalpranseed2',      
    'renorm_scale':'testalprenorm_scale',   
    'runnumber':'testalprunnumber',      
    'validated':'testalpvalidated',
    'version':'testalpversion',        
    'webpage':'testalpwebpage',        
    'weight':'testalpweight',      
    },
'Madgraph' :{
    'collider':'testmadcollider',         
    'comments':'testmadcomments',         
    'dr_jj_cut':'testmaddr_jj_cut',        
    'dr_lj_cut':'testmaddr_lj_cut',        
    'energy':'testmadenergy',         
    'et_jet_cut':'testmadet_jet_cut',       
    'et_lep_cut':'testmadet_lep_cut',       
    'fact_scale':'testmadfact_scale',       
    'lamqcd5':'testmadlamqcd5',          
    'll_mass_cut':'testmadll_mass_cut',      
    'numrecords':'testmadnumrecords',       
    'partons':'testmadpartons',          
    'pdf':'testmadpdf',              
    'physicsprocess':'testmadphysicsprocess',   
    'picobarns':'testmadpicobarns',        
    'qcd_order':'testmadqcd_order',        
    'qcd_power':'testmadqcd_power',        
    'qed_order':'testmadqed_order',        
    'qed_power':'testmadqed_power',       
    'ranseed1':'testmadranseed1',        
    'ranseed2':'testmadranseed2',         
    'renorm_scale':'testmadrenorm_scale',     
    'runnumber':'testmadrunnumber',
    'validated':'testmadvalidated',
    'version':'testmadversion',
    'webpage':'testmadwebpage',          
    'weight':'testmadweight',                
},
        'Global':{
                'phase':'undefined',
                'stream':'notstreamed',
                'description':'junk',
                'producedforname':'stdenis',
                'runtype':'monte carlo',
                'groupname':'test',
                },
        'Generated':{
                'useevtgen':'on',
                'collisionenergy':'1960.0',
                'decay':'tauola',
                'generator':'pythia',
                'pdflibfunc':'CTEQ5',
                },
       
        }


This request is obviously nonsense but demonstrates the generators and parameters available. In a real case, one of these would be chosen with sensible values.
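A trimmed request file of the same shape can be sanity-checked before submission. This is only a sketch: the idea that 'Global' and 'Generated' sections plus one generator section are required is an assumption drawn from this example, not a documented SAM rule.

```python
# Hypothetical sanity check for a request dictionary shaped like the one
# above. Which sections are actually mandatory is an assumption here.
GENERATORS = {'Pythia', 'Herwig', 'Alpgen', 'Madgraph'}

def check_request(d):
    missing = {'Global', 'Generated'} - set(d)
    if missing:
        raise ValueError('missing sections: %s' % sorted(missing))
    if not GENERATORS & set(d):
        raise ValueError('no generator section found')
    return True

# A trimmed-down request in the same shape as the full example:
req = {'Global': {'runtype': 'monte carlo', 'groupname': 'test'},
       'Generated': {'generator': 'pythia'},
       'Pythia': {'version': 'testpyversion'}}
print(check_request(req))   # -> True
```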

The request is then sent by the following command:

sam create request --dictfile=req3.py --group=test --numEvents=1

and one obtains a request id:
RequestId 32
This request id is then entered in the file metadata under the global parameter category. The request can be queried:
sam get request details --requestId=32
This returns all the info:
Request Detail ID:21
Family: generator
Application Name: generator
Version: 1.00
Request Detail Status: okay
Proj Snap ID: 0
Request ID: 32
Installation ID: 0
Param Type: runtype Value: monte carlo Category: global  Data_tier: reconstructed  Description:
Param Type: stream Value: notstreamed Category: global  Data_tier: reconstructed  Description:
Param Type: description Value: junk Category: global  Data_tier: reconstructed  Description:
Param Type: producedforname Value: stdenis Category: global  Data_tier: reconstructed  Description:
Param Type: groupname Value: test Category: global  Data_tier: reconstructed  Description:
Param Type: useevtgen Value: on Category: generated  Data_tier: reconstructed  Description:
Param Type: collisionenergy Value: 1960.0 Category: generated  Data_tier: reconstructed  Description:
Param Type: dataset Value: stink2 Category: cdf  Data_tier: reconstructed  Description:
Param Type: html Value: http://cepa.fnal.gov/personal/mrenna/ Category: cdf  Data_tier: reconstructed  Description:
Param Type: dr_lj_cut Value: testmaddr_lj_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qed_order Value: testmadqed_order Category: madgraph  Data_tier: reconstructed  Description:
Param Type: pdf Value: testmadpdf Category: madgraph  Data_tier: reconstructed  Description:
Param Type: webpage Value: testmadwebpage Category: madgraph  Data_tier: reconstructed  Description:
Param Type: et_jet_cut Value: testmadet_jet_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: version Value: testmadversion Category: madgraph  Data_tier: reconstructed  Description:
Param Type: energy Value: testmadenergy Category: madgraph  Data_tier: reconstructed  Description:
Param Type: comments Value: testmadcomments Category: madgraph  Data_tier: reconstructed  Description:
Param Type: numrecords Value: testmadnumrecords Category: madgraph  Data_tier: reconstructed  Description:
Param Type: ranseed2 Value: testmadranseed2 Category: madgraph  Data_tier: reconstructed  Description:
Param Type: ranseed1 Value: testmadranseed1 Category: madgraph  Data_tier: reconstructed  Description:
Param Type: ll_mass_cut Value: testmadll_mass_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: collider Value: testmadcollider Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qcd_order Value: testmadqcd_order Category: madgraph  Data_tier: reconstructed  Description:
Param Type: renorm_scale Value: testmadrenorm_scale Category: madgraph  Data_tier: reconstructed  Description:
Param Type: validated Value: testmadvalidated Category: madgraph  Data_tier: reconstructed  Description:
Param Type: dr_jj_cut Value: testmaddr_jj_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: picobarns Value: testmadpicobarns Category: madgraph  Data_tier: reconstructed  Description:
Param Type: fact_scale Value: testmadfact_scale Category: madgraph  Data_tier: reconstructed  Description:
Param Type: lamqcd5 Value: testmadlamqcd5 Category: madgraph  Data_tier: reconstructed  Description:
Param Type: et_lep_cut Value: testmadet_lep_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: runnumber Value: testmadrunnumber Category: madgraph  Data_tier: reconstructed  Description:
Param Type: physicsprocess Value: testmadphysicsprocess Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qcd_power Value: testmadqcd_power Category: madgraph  Data_tier: reconstructed  Description:
Param Type: weight Value: testmadweight Category: madgraph  Data_tier: reconstructed  Description:
Param Type: partons Value: testmadpartons Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qed_power Value: testmadqed_power Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qed_order Value: testpyqed_order Category: pythia  Data_tier: reconstructed  Description:
Param Type: pdf Value: testpypdf Category: pythia  Data_tier: reconstructed  Description:
Param Type: et_jet_cut Value: testpyet_jet_cut Category: pythia  Data_tier: reconstructed  Description:
Param Type: version Value: testpyversion Category: pythia  Data_tier: reconstructed  Description:
Param Type: comments Value: testpycomments Category: pythia  Data_tier: reconstructed  Description:
Param Type: numrecords Value: testpynumrecords Category: pythia  Data_tier: reconstructed  Description:
Param Type: cdfrelease Value: testpycdfrelease Category: pythia  Data_tier: reconstructed  Description:
Param Type: ranseed1 Value: testpyranseed1 Category: pythia  Data_tier: reconstructed  Description:
Param Type: collider Value: testpycollider Category: pythia  Data_tier: reconstructed  Description:
Param Type: decaytable Value: testpydecaytable Category: pythia  Data_tier: reconstructed  Description:
Param Type: ranseed2 Value: testpyranseed2 Category: pythia  Data_tier: reconstructed  Description:
Param Type: qcd_order Value: testpyqcd_order Category: pythia  Data_tier: reconstructed  Description:
Param Type: useevtgen Value: testpyuseevtgen Category: pythia  Data_tier: reconstructed  Description:
Param Type: renorm_scale Value: testpyrenorm_scale Category: pythia  Data_tier: reconstructed  Description:
Param Type: validated Value: testpyvalidated Category: pythia  Data_tier: reconstructed  Description:
Param Type: partons Value: testpypartons Category: pythia  Data_tier: reconstructed  Description:
Param Type: picobarns Value: testpypicobarns Category: pythia  Data_tier: reconstructed  Description:
Param Type: fact_scale Value: testpyfact_scale Category: pythia  Data_tier: reconstructed  Description:
Param Type: lamqcd5 Value: testpylamqcd5 Category: pythia  Data_tier: reconstructed  Description:
Param Type: energy Value: testpyenergy Category: pythia  Data_tier: reconstructed  Description:
Param Type: runnumber Value: testpyrunnumber Category: pythia  Data_tier: reconstructed  Description:
Param Type: webpage Value: testpywebpage Category: pythia  Data_tier: reconstructed  Description:
Param Type: qcd_power Value: testpyqcd_power Category: pythia  Data_tier: reconstructed  Description:
Param Type: physicsprocess Value: testpyphysicsprocess Category: pythia  Data_tier: reconstructed  Description:
Param Type: useqq Value: testpyuseqq Category: pythia  Data_tier: reconstructed  Description:
Param Type: qed_power Value: testpyqed_power Category: pythia  Data_tier: reconstructed  Description:
Param Type: dr_lj_cut Value: testalpdr_lj_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qed_order Value: testalpqed_order Category: alpgen  Data_tier: reconstructed  Description:
Param Type: pdf Value: testalppdf Category: alpgen  Data_tier: reconstructed  Description:
Param Type: webpage Value: testalpwebpage Category: alpgen  Data_tier: reconstructed  Description:
Param Type: et_jet_cut Value: testalpet_jet_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: version Value: testalpversion Category: alpgen  Data_tier: reconstructed  Description:
Param Type: energy Value: testalpenergy Category: alpgen  Data_tier: reconstructed  Description:
Param Type: comments Value: testalpcomments Category: alpgen  Data_tier: reconstructed  Description:
Param Type: numrecords Value: testalpnumrecords Category: alpgen  Data_tier: reconstructed  Description:
Param Type: ranseed2 Value: testalpranseed2 Category: alpgen  Data_tier: reconstructed  Description:
Param Type: ranseed1 Value: testalpranseed1 Category: alpgen  Data_tier: reconstructed  Description:
Param Type: ll_mass_cut Value: testalpll_mass_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: collider Value: testalpcollider Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qcd_order Value: testalpqcd_order Category: alpgen  Data_tier: reconstructed  Description:
Param Type: renorm_scale Value: testalprenorm_scale Category: alpgen  Data_tier: reconstructed  Description:
Param Type: validated Value: testalpvalidated Category: alpgen  Data_tier: reconstructed  Description:
Param Type: dr_jj_cut Value: testalpdr_jj_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: picobarns Value: testalppicobarns Category: alpgen  Data_tier: reconstructed  Description:
Param Type: fact_scale Value: testalpfact_scale Category: alpgen  Data_tier: reconstructed  Description:
Param Type: lamqcd5 Value: testalplamqcd5 Category: alpgen  Data_tier: reconstructed  Description:
Param Type: et_lep_cut Value: testalpet_lep_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: runnumber Value: testalprunnumber Category: alpgen  Data_tier: reconstructed  Description:
Param Type: physicsprocess Value: testalpphysicsprocess Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qcd_power Value: testalpqcd_power Category: alpgen  Data_tier: reconstructed  Description:
Param Type: weight Value: testalpweight Category: alpgen  Data_tier: reconstructed  Description:
Param Type: partons Value: testalppartons Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qed_power Value: testalpqed_power Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qed_order Value: testherqed_order Category: herwig  Data_tier: reconstructed  Description:
Param Type: pdf Value: testherpdf Category: herwig  Data_tier: reconstructed  Description:
Param Type: et_jet_cut Value: testheret_jet_cut Category: herwig  Data_tier: reconstructed  Description:
Param Type: version Value: testherversion Category: herwig  Data_tier: reconstructed  Description:
Param Type: comments Value: testhercomments Category: herwig  Data_tier: reconstructed  Description:
Param Type: numrecords Value: testhernumrecords Category: herwig  Data_tier: reconstructed  Description:
Param Type: cdfrelease Value: testhercdfrelease Category: herwig  Data_tier: reconstructed  Description:
Param Type: ranseed1 Value: testherranseed1 Category: herwig  Data_tier: reconstructed  Description:
Param Type: collider Value: testhercollider Category: herwig  Data_tier: reconstructed  Description:
Param Type: decaytable Value: testherdecaytable Category: herwig  Data_tier: reconstructed  Description:
Param Type: ranseed2 Value: testherranseed2 Category: herwig  Data_tier: reconstructed  Description:
Param Type: qcd_order Value: testherqcd_order Category: herwig  Data_tier: reconstructed  Description:
Param Type: renorm_scale Value: testherrenorm_scale Category: herwig  Data_tier: reconstructed  Description:
Param Type: validated Value: testhervalidated Category: herwig  Data_tier: reconstructed  Description:
Param Type: partons Value: testherpartons Category: herwig  Data_tier: reconstructed  Description:
Param Type: picobarns Value: testherpicobarns Category: herwig  Data_tier: reconstructed  Description:
Param Type: fact_scale Value: testherfact_scale Category: herwig  Data_tier: reconstructed  Description:
Param Type: lamqcd5 Value: testherlamqcd5 Category: herwig  Data_tier: reconstructed  Description:
Param Type: energy Value: testherenergy Category: herwig  Data_tier: reconstructed  Description:
Param Type: runnumber Value: testherrunnumber Category: herwig  Data_tier: reconstructed  Description:
Param Type: webpage Value: testherwebpage Category: herwig  Data_tier: reconstructed  Description:
Param Type: qcd_power Value: testherqcd_power Category: herwig  Data_tier: reconstructed  Description:
Param Type: physicsprocess Value: testherphysicsprocess Category: herwig  Data_tier: reconstructed  Description:
Param Type: qed_power Value: testherqed_power Category: herwig  Data_tier: reconstructed  Description:
Param Type: phase Value: undefined Category: global  Data_tier: reconstructed  Description:
Param Type: generator Value: pythia Category: generated  Data_tier: reconstructed  Description:
Param Type: decay Value: tauola Category: generated  Data_tier: reconstructed  Description:
Param Type: pdflibfunc Value: CTEQ5 Category: generated  Data_tier: reconstructed  Description:
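Each parameter line of this output has a fixed shape, so it can be parsed back into per-category dictionaries. A sketch (the line format is inferred from the listing above):

```python
import re

# Each parameter line of `sam get request details` looks like:
#   Param Type: <type> Value: <value> Category: <cat>  Data_tier: <tier>  Description:
# The pattern below is inferred from the example output above.
PARAM_RE = re.compile(
    r'Param Type: (?P<type>\S+) Value: (?P<value>.*?) '
    r'Category: (?P<cat>\S+)\s+Data_tier: (?P<tier>\S+)')

sample = (
    'Param Type: runtype Value: monte carlo Category: global  '
    'Data_tier: reconstructed  Description:\n'
    'Param Type: collisionenergy Value: 1960.0 Category: generated  '
    'Data_tier: reconstructed  Description:')

# Group parameters by category for easy lookup.
params = {}
for m in PARAM_RE.finditer(sample):
    params.setdefault(m.group('cat'), {})[m.group('type')] = m.group('value')

print(params['global']['runtype'])   # -> monte carlo
```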

Requests can be listed:
sam list requests --requestIdGt=0

giving
[1, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[2, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[3, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[4, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[5, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[6, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[7, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[8, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[9, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[10, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[11, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[12, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[13, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[14, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[15, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[16, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[17, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']
[18, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']
[19, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']
[20, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[21, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[22, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[23, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[24, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[25, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[26, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[27, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[28, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[29, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[30, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[31, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[32, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
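Since each row of sam list requests is printed as a Python list literal, the output is easy to post-process. A sketch; the field positions (id first, user fifth) are inferred from the listing above and may not match the real schema:

```python
import ast

# Rows from `sam list requests` as shown above; field positions are
# inferred from the example output: 0=request id, 4=user.
sample = """\
[16, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[17, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']
[18, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']"""

rows = [ast.literal_eval(line) for line in sample.splitlines()]
mine = [r[0] for r in rows if r[4] == 'lauri']   # request ids for one user
print(mine)   # -> [17, 18]
```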
This functionality will soon be available on web pages, just as it is in D0. It is also possible to use
sam modify request ...
to modify the number of events, the status of the request, and so on.

top


Sam Locate

You can find every location of a file using the sam locate command. We can use one of the files that came from the sam translate example to show how this works:
[stdenis@nglas05 ~]$ sam locate sm-store-test1.root
which yields
['/pnfs/cdfen/filesets/SM/SMTest,ia3937']
This is dull -- just the location in enstore, along with the "enstore cookie" (don't ask).

Here is a more interesting one (a common file used in all tests, so it gets around):

[stdenis@nglas05 ~]$ sam locate gb01defd.0001exo0
yielding
['/pnfs/cdfen/filesets/GI/GI05/GI0500/GI0500.0,ia3638', 
'ncdf68.fnal.gov:/scratch/sam/cache1/boo', 
'nglas09.fnal.gov:/cdf/scratch/sam/cache/cdfyale/prd/boo', 
'lf7.ph.gla.ac.uk:/localhome/sam/cache1/boo', 
'tuhept.phy.tufts.edu:/home/sam/cache1/boo', 
'cdf3.uchicago.edu:/cdf/data3a/boo', 
'nglas07.fnal.gov:/cdf/scratch/sam/pro/boo', 
'nglas08.fnal.gov:/cdf/scratch/sam/prd/boo', 
'testwulf.hpcc.ttu.edu:/home/sam/cache1/prd/boo',
'matrix.physics.ox.ac.uk:/eweak/disk1/sam/boo', 
'cdfg.ph.gla.ac.uk:/data3/sam/prd/boo', 
'cdf001.ucsd.edu:/cdf/data01/cdf001/cdf-sam/cache/boo', 
'nglas03.fnal.gov:/data3/sam/prd/boo', 
'nglas10.fnal.gov:/data/nglas10/a/sam/prd/boo',
'nglas05.fnal.gov:/data3/sam/pro/boo',
'nglas04.fnal.gov:/data3/sam/pro/boo',
'nglas06.fnal.gov:/data1/sam/prd/boo', 
'fcdfdata016.fnal.gov:/data1/cdf-sam/cache/cdf-scotgrid-2/prd/boo', 
'pccdf2.ts.infn.it:/cdf3/sam_cache/boo', 
'dcap://cdfdca-door01:dcap://cdfdca.fnal.gov:25125/pnfs/fnal.gov/usr//cdfen/filesets/GI/GI05/GI0500/GI0500.0',
'dcap://cdfdca-door03:dcap://cdfdca.fnal.gov:25137/pnfs/fnal.gov/usr//cdfen/filesets/GI/GI05/GI0500/GI0500.0',
'dcap://cdfdca-door02:dcap://cdfdca.fnal.gov:25136/pnfs/fnal.gov/usr//cdfen/filesets/GI/GI05/GI0500/GI0500.0',
'dcap://cdfdca-door04:dcap://cdfdca.fnal.gov:25138/pnfs/fnal.gov/usr//cdfen/filesets/GI/GI05/GI0500/GI0500.0',
'cdfsam.cnaf.infn.it:/cdf/data/data001/SAM-100GB-cache/boo']
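Since sam locate also prints a Python list literal, a script can split the replica list into tape (PNFS), dCache-door (dcap://) and station-cache entries. A sketch using a few of the locations shown above:

```python
import ast

# Output of `sam locate <file>` is a single Python list literal; this is a
# shortened version of the listing above.
locations = ast.literal_eval(
    "['/pnfs/cdfen/filesets/GI/GI05/GI0500/GI0500.0,ia3638', "
    "'ncdf68.fnal.gov:/scratch/sam/cache1/boo', "
    "'dcap://cdfdca-door01:dcap://cdfdca.fnal.gov:25125/pnfs/fnal.gov/usr/"
    "/cdfen/filesets/GI/GI05/GI0500/GI0500.0']")

# Classify each entry by its prefix; the ',ia...' suffix on the PNFS path
# is the enstore cookie mentioned above.
tape   = [l for l in locations if l.startswith('/pnfs/')]
dcache = [l for l in locations if l.startswith('dcap://')]
caches = [l for l in locations if l not in tape and l not in dcache]

print(len(tape), len(dcache), len(caches))   # -> 1 1 1
```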

top


Making a SAM Dataset

To make a SAM dataset, you can either use the command-line tools or the dataset definition GUI. Files are imagined to live in a multidimensional space with various parameters as possible axes; you put constraints on the space to carve out the files you wish to have. A definition is usually made by first checking that you get a sensible list of files with the sam translate command, and then using the sam create dataset definition and sam create dataset commands. If you are confused by the multiple uses of the word "dataset", you should read the section on "Datasets Explained".

The syntax of sam translate has been shown elsewhere. It can then be used to define a dataset as follows:

sam create dataset definition --defname=stink2  --group=test \
    --defdesc='test mc param store with mrenna' \
    --dim='cdf.dataset stink2'
A definition can be examined with
sam translate constraints --dim="DATASET_DEF_NAME stink2"
This gives:
[stdenis@nglas05 ~]$ sam translate constraints --dim="DATASET_DEF_NAME stink2"
Files:
  sm-store-test1.root
  rs-1ev-test-031106-1810.root
  rs-1ev-test-031106-1840.root
  rs-1ev-test-031106-1844.root
  rs-1ev-test-031106-1844-1.root
  rs-1ev-test-031106-1845.root
  rs-1ev-test-031106-1845-1.root
  rs-1ev-test-031106-1847.root
  rs-1ev-test-031106-1855.root
  rs-1ev-test-031106-1910.root

File Count:  10
Average File Size:  300
Total File Size:  3000
Total Event Count:  20


If the result is not what was expected, then the definition can be redone and the previous definition will be overwritten.

After you are satisfied, you can either run a project or use:

sam create dataset --defname=stink2
This ensures that your definition cannot be changed.

top


Datasets Explained: SAM Datasets, CDF DataSets, Datasets, Project Snapshots

Unfortunately, the word "dataset" is heavily overloaded in the data handling world. Furthermore, in the deep dark history of SAM, a change was made in syntax, so several different words refer to the same thing.

First, a cdf dataset corresponds to the more modern (especially in Grid) concept of a "data collection": a group of files that are common in their properties. For our implementation of SAM in CDF, we have maintained this as a parameter, although more sophisticated ways of handling it are being hammered out.

A SAM dataset definition corresponds to a selection of files meeting some criteria set by a variety of parameters that describe them based on the declarations made in the metadata.

A very simple way through the morass is to use the parameter cdf.dataset in defining a sam dataset and that is the end of the story. Using more sophisticated combinations of parameters requires care that one has specified the collection of files uniquely. Tools exist to allow you to examine files you care to inspect, but this is indeed a complex operation.

Once a dataset definition is made, it can be used to specify the files to be delivered to a project. When that delivery has been done, SAM keeps a permanent record of the project that was run, and it is always possible to go back and find out what files were used. This is called a "dataset" or a "project snapshot" within the context of SAM.

When a dataset definition is made, it is possible to immediately take a snapshot of the files that satisfy the requirements specified in the definition with "sam create dataset". Once this is done the definition is frozen. This is useful if you want to make sure that your definition is not modified - by someone else!

top


SAM Datasets: Figuring out what parameters are defined for each file, their values and getting access to metadata

Information on parameters and metadata for datasets is found in Randy's browser under the "sam" reports, and shows up in the pulldown menus as:

Here is a description of how to use them and what they do.

  • SAM:File Parameter Names by Project (aka dataset)

    One is presented with two fields to fill:

    One will want to fill the first with a favorite project definition name. This example uses jbot0h.

    The second is not useful until one has done the query or knows something about one's snapshots. Snapshots and datasets are described in this document.

    If one submits the request after giving a dataset definition, one obtains the following (the example for jbot0h is instructive). For every snapshot, one sees what parameters there are and how many different values they take across all the files. One does NOT see all the files with the parameter next to each one; that would be information overload at this point. The columns obtained are:

    Here is a more detailed description of each column

  • SAM:File Parameter Values by Project (aka dataset)

    In this case, one is presented with the same fields to fill in as before. This time, for the example, jbot0h is chosen and the latest snapshot (28) is entered.

    The resulting report may be found at this URL, where the following columns are returned:

    The only difference from the report above is that the value is given. Where the occurrences are the same for all files, only the single value appears.

    This effectively gives you a way to recover the values of parameters for a dataset definition, and inasmuch as we match sam dataset definitions in cdf to cdf datasets, this gives the parameters defining a dataset.

    Also, for some parameters that carry meaning, Randy has hyperlinked them to more detailed information. Clicking on jbot0h in this example gives a page of all the files that exist in that dataset, details on the location of every file in every cache, and a basic dump of all metadata for each file. Files are listed 125 at a time by default.

    top


    SAM Datasets: Cleaning up after some jobs crash

    Here's how to make a "cleanup" dataset definition:
    sam translate constraints --dim="__set__
    old_dataset_definition_name  minus
    (project_name your_old_project_name and consumed_status consumed
                                                and consumer your_name)"
    
    Some points to note:

    For example, for the dataset jbot0h where the sam project was run at some point, go to the log file to get the project name and then use

    sam translate constraints --dim="__set__ jbot0h minus
    (project_name stdenis_cdf-sam_jbot0h_1078085002.52 and 
    consumed_status consumed and
    consumer stdenis)" 
    
    to see the files missed. Remember that sam translate constraints just lists the files. If all looks good, replace "sam translate constraints" with "sam create dataset definition", add "--group=test --defname=your_new_def_name", and don't forget to do the "sam create dataset" as well, as described above.
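The cleanup dimension string can be assembled programmatically rather than typed by hand. A minimal sketch, using the example project and consumer names from above:

```python
def cleanup_dim(defname, project, consumer):
    """Build the __set__ dimension string for a 'cleanup' dataset:
    files in the old definition minus those already consumed by the
    named project/consumer."""
    return ('__set__ %s minus (project_name %s and '
            'consumed_status consumed and consumer %s)'
            % (defname, project, consumer))

dim = cleanup_dim('jbot0h', 'stdenis_cdf-sam_jbot0h_1078085002.52', 'stdenis')
# Pass the result as:  sam translate constraints --dim="<dim>"
print(dim)
```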

    If you want further checks after you create a dataset, you can look at the parameters and metadata if you follow these instructions.

    top

    File Availability Status and Dimension Queries

    What does file availability status mean? It means that there is AT LEAST ONE accessible location for the file:

    If there are one or more locations, and at least one of them is considered "good", then the file is available.

    What was it used for? It was used because:

    Users were confused when they'd do "translate constraints" and see a list of N files, then run their project and be delivered only M files (M<N), because the constraints they provided did not account for "files with locations" (aka 'file_availability_status'='available').

    In the past, this column was set to 'available' if the file ever received a location (but it was not maintained after that point).

    Another file status value is FILE_CONTENT_STATUS, which is inherent to a file (including any and all replicas) and is a global judgement that the integrity of the file itself is good or bad. This is not a reflection of the quality of the physics contents.

    FILE_CONTENT_STATUS is a function of the file itself.

    FILE_AVAILABILITY_STATUS is a function of all valid sam_locations which might contain copies of the file; the file is available if at least one replica is accessible.
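The two statuses boil down to a simple rule; here is an illustrative sketch of the availability logic (the function and its inputs are hypothetical, not the SAM schema):

```python
def file_available(replica_statuses):
    """FILE_AVAILABILITY_STATUS sketch: a file is 'available' if at least
    one of its replica locations is considered good/accessible.
    `replica_statuses` is a list of booleans, one per location."""
    return 'available' if any(replica_statuses) else 'unavailable'

# One good replica among several bad ones is enough:
print(file_available([False, False, True]))   # -> available
print(file_available([]))                     # -> unavailable (no locations)
```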

    top

    Useful Binary list for SAM/CAF installations

    top

    Adding a Permanent Disk for Storage

    [sam@nglas08 sam]$ samadmin add data disk --fullpath=nglas08.fnal.gov:/data/nglas08/b/sam/perm --size=3G
    Data disk nglas08.fnal.gov:/data/nglas08/b/sam/perm has been registered, id = 269, type = disk
    
    [sam@nglas08 perm]$ samadmin add disk location --fullpath=nglas08.fnal.gov:/data/nglas08/b/sam/perm/perm
    Disk location nglas08.fnal.gov:/data/nglas08/b/sam/perm/perm has been registered: id = 109539, type = 'disk'
    
    
    Note the subtle bug: the fullpath is longer for the disk location than for the data disk!
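Given that trap, it is worth verifying that the disk-location fullpath really extends the data-disk fullpath before registering it. A sketch using the paths from the example above:

```python
import posixpath

data_disk = 'nglas08.fnal.gov:/data/nglas08/b/sam/perm'
disk_loc  = 'nglas08.fnal.gov:/data/nglas08/b/sam/perm/perm'

# The location registered with `samadmin add disk location` should extend
# the data-disk path by at least one extra component.
assert disk_loc.startswith(data_disk + '/'), 'location must live under the data disk'
print(posixpath.relpath(disk_loc, data_disk))   # -> perm
```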

    top

    How to add disk for sam cache returning URL's for DCACHE, HPSS, or AFS

    
    The problem: How to get SAM to recognize a cache based on AFS
    (Posed by Liz Buckley)
    --------------------------------------------------------------
    Think of the current implementation of SAM and dCache. The station is
    configured with SAM cache that is actually dcache doors. When the user
    runs a SAM job the sam_get_next_file returns them a URL to the file's
    location in dcache. No data is moved by SAM. It is up to the user
    application to open the file, at which point it will physically be staged
    into dCache if it isn't there already. SAM does place an entry in its
    database showing that the file now has a dcache location, but it has no
    ownership of the files; they are not "owned" by user SAM.
    
    Now think of AFS. I have a big pool of AFS space with files that are
    already there - sort of like dCache. The files are also in PNFS and will
    be declared to SAM with PNFS locations like any other file. I would
    like to tell the SAM station that this is its "cache". When a user runs a
    SAM job I would like get_next_file to return the AFS path. No data should
    be moved by SAM, like with dCache, the user application will open the file
    because it has AFS available on the node it is running on. SAM doesn't own
    the files in this case either.
    
    Conceptually these two cases seem to me rather similar. Clearly it is
    possible to make this work for dCache disk, so why can't something similar
    be implemented for AFS disk?
    
    
    The Answer
    From Andrew
    ------------------------------
    
    SAM and dCache in its present form exists as a proof of concept that SAM
    can manage and broker different storage elements. The adapter I
    developed to map PNFS location to dCache URL is not universal and
    probably won't even work outside of Fermilab.
    
    Yet there is an ongoing effort to introduce the missing pieces in the
    context of the SAM - SRM integration project.  One of the key concepts
    being developed in SAM at the moment is the borderline between
    protocols that manage files and protocols that access them.  As an
    illustration, think of ways files can be put onto a regular disk using
    ftp or similar but accessed via a mount point on an AFS disk (which may
    have nothing to do with the local filesystem path).
    
    With all that said, here is a kludge:
    The thing you are talking about exists already. It took me quite a while
    to realize it (I had to stop writing email and think a bit).
    The current HPSS adapter works the same way you want the AFS adapter to
    work. It does not add anything extra to an existing location, and it will
    use the existing set of non-cache locations to map to a pseudo cache.
    
    What you need to do.
    
    1) Create a node that is named like rfio://<anything here>.
    2) Add that node to the station: sam add disk --mount=<node name from
    step 1)>:<path that is common to all AFS locations (important)>
    3) Add --constrain-delivery=<path that is common to all AFS
    locations>::<node name from step 1)>
    4) Add --pmaster-arg=--consumption-map=\.\*::<node name from step 1)>
    5) Add --prefer-loc=<path that is common to all AFS locations>
    
    6) Run a stager anywhere. Give it the argument --node-name=<node name from
    step 1)>
    
    7) Run a project. See if the locations delivered are what you expect.
    8) Report any issues.
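
    The steps above can be collected into a small shell sketch. The node
    name and AFS path below are hypothetical placeholders, and the way a
    stager or station is actually launched is site specific, so only the
    arguments named in the steps are assembled here:

```shell
# Sketch of the AFS pseudo-cache setup (steps 1-6 above).
# NODE and AFSPATH are hypothetical placeholders -- substitute your own.
NODE="rfio://afs-pseudo-cache"              # step 1: node named like rfio://<anything>
AFSPATH="/afs/fnal.gov/files/data"          # path common to all AFS locations

ADD_DISK="sam add disk --mount=${NODE}:${AFSPATH}"                             # step 2
STATION_ARGS="--constrain-delivery=${AFSPATH}::${NODE}"                        # step 3
STATION_ARGS="${STATION_ARGS} --pmaster-arg=--consumption-map=\.\*::${NODE}"   # step 4
STATION_ARGS="${STATION_ARGS} --prefer-loc=${AFSPATH}"                         # step 5
STAGER_ARG="--node-name=${NODE}"                                               # step 6: pass to the stager

printf '%s\n' "$ADD_DISK" "$STATION_ARGS" "$STAGER_ARG"
```

    The echoed strings can be pasted into the station and stager start-up
    commands used at your site.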
    
    
    Here is an example for dCache on CDF:

    top

    Breaking up CDF Datasets, or How to Handle Tens of Thousands of Files

    1. Make a dataset definition based on the run range you wish (or any other
    metadata constraint)
    
    Example:  We want 3 run ranges since we need 3 sets of tcl to
    handle things according to rumors we heard.
    
    100045 1 GOOD    for            runs < 149717
    150045 1 GOOD        subsequent runs <= 158635
    160045 1 GOOD        after this
    
    
    sam translate constraints --dim="__SET__ hbhd0c and run_number < 149717"
    sam translate constraints --dim="__SET__ hbhd0c and run_number >= 149717 and
    run_number <= 158635 "
    sam translate constraints --dim="__SET__ hbhd0c and run_number > 158635"
    
    For the first range we get:
    File Count:         1287
    
    For the second:
    File Count:         8141
    
    For the third:
    File Count:         20017
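
    The three dimension strings follow one pattern, so they can be
    generated from the boundary run numbers (149717 and 158635 from the
    text); the dataset name is the hbhd0c example used above:

```shell
# Build the three run-range dimension strings for a dataset split
# at two boundary runs.  DATASET, LO and HI come from the example above.
DATASET="hbhd0c"
LO=149717
HI=158635
DIM1="__SET__ ${DATASET} and run_number < ${LO}"
DIM2="__SET__ ${DATASET} and run_number >= ${LO} and run_number <= ${HI}"
DIM3="__SET__ ${DATASET} and run_number > ${HI}"
printf '%s\n' "$DIM1" "$DIM2" "$DIM3"
```

    Each string is then passed as the --dim argument of
    sam translate constraints.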
    
    2. Calculate your strategy for processing.
    Constraints:
    a. We want our output ntuple to be about 1 GB for efficient storage to tape.
    
       Computation
           The output ntuple from 1 input file is 50 MB, so 100 files give 5 GB. This is probably ok.
    
    b. Make sure you are not running a batch job for more than a day or two.
    
       Computation
           100 files in a segment take 10 hours.  We get 10 segments at once, so we can
           easily process 1000 files, and we will probably get a couple of rounds of
           these.  Let's try 3000 files and 30 segments per job.
    
    Conclusions:
    a. We can use the first dataset definition as it is.
    
    b. We need to divide the second definition into 3 pieces.
    
    c. We need to divide the third definition into 7 pieces.
    
    Therefore we will have 11 jobs to submit, processing a total of
    29445 files and producing 1472.25 GB = 1.47225 TB of output.
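
    A quick way to check the bookkeeping above (the per-range file counts
    are from step 1; the 3000-file chunk size and 50 MB-per-file output
    estimate are from step 2):

```shell
# Verify the job count and output size for the three run ranges.
CHUNK=3000            # files per submitted job
OUT_MB_PER_FILE=50    # estimated ntuple size per input file
JOBS=0
TOTAL=0
for N in 1287 8141 20017; do
    JOBS=$(( JOBS + (N + CHUNK - 1) / CHUNK ))   # ceiling division
    TOTAL=$(( TOTAL + N ))
done
TOTAL_MB=$(( TOTAL * OUT_MB_PER_FILE ))
echo "$JOBS jobs, $TOTAL files, $TOTAL_MB MB"    # 11 jobs, 29445 files, 1472250 MB
```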
    
    
    
    3. Create the first dataset definition. Let's call it bussey-hbhd0c-pass01-01
    sam create dataset definition  \
    --dim="__SET__ hbhd0c and run_number < 149717 " \
    --definitionName="bussey-hbhd0c-pass01-01" \
    --defdesc="hbhd0c run<149717"  \
    --group=test
    
    Make a snapshot: This takes a definition, applies its constraints to the
    metadata of the current set of files, and creates a list that is stored
    in the database.  This is called a snapshot, and once it has been taken,
    the dataset definition cannot be modified.
    
    sam take snapshot  \
    --definitionName="bussey-hbhd0c-pass01-01" \
    --group=test
    
    We get back the snapshot id 9063.
    
    This definition does not need division, so we don't need to do anything more.
    
    
    4. Now create the other two dataset definitions and take snapshots
    
    a. Create the definitions:
    
    sam create dataset definition \
    --definitionName="bussey-hbhd0c-pass01-02" \
    --defdesc="hbhd0c 149717 <= Run <= 158635" \
    --dim="__SET__ hbhd0c and run_number >= 149717 and run_number <= 158635" \
    --group=test
    
    
    sam create dataset definition \
    --definitionName="bussey-hbhd0c-pass01-03" \
    --defdesc="hbhd0c run> 158635" \
    --dim="__SET__ hbhd0c and run_number > 158635" \
    --group=test 
    
    
    b. Take snapshots:
    
    sam take snapshot \
    --definitionName="bussey-hbhd0c-pass01-02" \
    --group=test
    
    Snapshot has been taken, snapshotId = 9064
    
    sam take snapshot \
    --definitionName="bussey-hbhd0c-pass01-03" \
    --group=test
    
    Snapshot has been taken, snapshotId = 9065
    
    
    c. Now divide the second dataset into 3 pieces.
    
    i. First check the file lists:
    sam translate constraints \
    --dim="dataset_def_name bussey-hbhd0c-pass01-02 and snapshot_version 1 and snapshot_file_number 1-3000"
    
    sam translate constraints \
    --dim="dataset_def_name bussey-hbhd0c-pass01-02 and snapshot_version 1 and snapshot_file_number 3001-6000"
    
    sam translate constraints \
    --dim="dataset_def_name bussey-hbhd0c-pass01-02 and snapshot_version 1 and snapshot_file_number 6001-8141"
    
    ii. Now make these into dataset definitions
    sam create dataset definition \
     --definitionName=bussey-hbhd0c-pass01-02-set01 \
     --dim="dataset_def_name bussey-hbhd0c-pass01-02 and snapshot_version 1 and snapshot_file_number 1-3000" \
     --defdesc="hbhd0c 149717 <= Run <= 158635 files 1-3000" \
     --group=test
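
    The snapshot_file_number ranges above follow a simple pattern; a
    sketch that generates them for any file count and chunk size (8141
    files and chunks of 3000, as in this example, with the last range
    capped at the file count):

```shell
# Generate the snapshot_file_number ranges that split a snapshot
# into pieces of CHUNK files each.
N_FILES=8141
CHUNK=3000
RANGES=""
LO=1
while [ "$LO" -le "$N_FILES" ]; do
    HI=$(( LO + CHUNK - 1 ))
    if [ "$HI" -gt "$N_FILES" ]; then HI=$N_FILES; fi
    RANGES="$RANGES ${LO}-${HI}"
    LO=$(( HI + 1 ))
done
RANGES="${RANGES# }"      # strip the leading space
echo "$RANGES"            # 1-3000 3001-6000 6001-8141
```

    Each range goes into the --dim string of one sub-definition, as in
    the sam create dataset definition command above.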
    
    Create the -set02 and -set03 definitions the same way with the other
    two snapshot_file_number ranges, then divide the third dataset into
    7 pieces analogously.
    

    top

    Define a Dataset that combines others

    
    sam translate constraints --dim="__SET__ jbot0h or __SET__ jbot1h"
    
    will give the files in either dataset (the union).  Warning: the operator
    is case sensitive, so OR does not work.  For example,
    sam translate constraints --dim="__SET__ jbot0h" gives 690 files,
    the same with jbot1h gives 41, so the "or" gives 731.
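
    The "or" dimension returns the union of the two datasets' file lists,
    so for datasets that share no files the counts simply add. A quick
    illustration with made-up file names:

```shell
# Union of two disjoint file lists: 690 + 41 = 731 distinct names.
UNION=$( { seq -f "jbot0h_file_%g" 1 690; seq -f "jbot1h_file_%g" 1 41; } | sort -u | wc -l )
UNION=$(( UNION ))    # normalize any whitespace wc may print
echo "$UNION"         # 731
```

    If the two datasets overlapped, the union count would be smaller than
    the sum of the individual counts.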

    top

    Storing Datasets in samV6

    Go to this site for a good example of how to store files with SAM v6.

    top