Index


Use Case Analysis

This section enumerates the use cases for analysis, divided into those currently supported and those that are reasonable but not yet available. Each use case can be broken into components; these components are listed below and linked to instructions on how to execute them.



Supported Use Cases


Unsupported Use Cases


Use Case Examples



Reduction of Use Cases to their Components

Table of Contents

Read

1.1 From SAM cache
1.1.1 From local SAM cache
1.1.2 From remote SAM cache or enstore (via the local SAM cache)
1.2 Local non-SAM transfer, with file unknown to SAM
1.2.1 From local disk
1.2.2 From scratch dCache
1.3 Local non-SAM transfer, with file known to SAM
1.3.1 From local disk
1.3.2 From scratch dCache

Write

2.1 To SAM
2.1.1 Write to Enstore (FNAL)
2.1.2 Write to permanent disk location
2.2 Non-SAM transfer giving SAM the location information
2.2.1 Write to scratch dCache (with file location declared to SAM)
2.2.2 Write to local disk (with file location declared to SAM)
2.3 Non-SAM transfer, with no info to SAM
2.3.1 Write to scratch dCache
2.3.2 Write to local disk

1 Read

1.1 From SAM cache
1.1.1 From local SAM cache
1.1.2 From remote SAM cache
1.2 Local non-SAM transfer, with file unknown to SAM
1.2.1 From local disk
1.2.2 From scratch dCache
1.3 Local non-SAM transfer, with file known to SAM
1.3.1 From local disk
1.3.2 From scratch dCache


2 Write

2.1 To SAM
2.1.1 Write to Enstore (FNAL)
2.1.2 Write to permanent disk location

2.2 Non-SAM transfer giving SAM the location information

For this case, you can obtain the locations and names of the files, and use that information to build a file list that you put into your tcl file yourself and read from disk. These are the necessary steps:
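A hedged sketch of this procedure in Python (the file paths and the FILE_LIST variable name are illustrative assumptions, not CDF conventions; in practice the locations would come from SAM):

```python
# Illustrative sketch only: the paths below are made up; in practice you
# would obtain the file locations and names from SAM first.
files = [
    "/cdf/scratch/mydata/myfile-031106-1910.root",
    "/cdf/scratch/mydata/myfile-031106-1911.root",
]

def tcl_file_list(paths):
    """Build the lines of a tcl 'set FILE_LIST' block for a framework job."""
    lines = ["set FILE_LIST {"]
    lines += ["    %s" % p for p in paths]
    lines.append("}")
    return lines

# Write the fragment so the job can read the list from disk.
with open("input_files.tcl", "w") as f:
    f.write("\n".join(tcl_file_list(files)) + "\n")
```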


2.2.1 Write to scratch dCache (with file location declared to SAM)

2.2.2 Write to local disk (with file location declared to SAM)
2.3 Non-SAM transfer, with no info to SAM

2.3.1 Write to scratch dCache

2.3.2 Write to local disk


Basic Test Exercise for Storing Files to Tape on the FNAL CDFEN Robot

Let's get started with a test. The goal is to store a one-event file from fcdfdata016.fnal.gov.

Here is what to do:

Once this works, you can try to store from your remote location.

Note that this is a TEST area we are writing to. The proper way is to use procedures that handle automatic creation of directories in pnfs space, so that the --dest argument above becomes a dummy argument that tells the code to pick the location itself.

The file stored in this test cannot be read back. For complete testing of writing and reading, refer to the instructions below.



Metadata HowTo: Automated Generation of Metadata for CDF Storage

The easiest way to store a file for CDF is to use a script from the DHMods package. It calls the sam store commands described in the basic store test, but it also harvests all the metadata from the file and your environment, so that you don't have to write a metadata file like the one described in the basic metadata creation instructions or the advanced description of adding metadata.

The command is made available by checking out the DHMods package from the cdfsoft environment. A help option is provided:

/DHMods/bin/samStoreCdfFile --help
which gives
Store regular CDF files in SAM
Usage: samStoreCdfFile 
 possible options are:
  --help                   - this message
  --file=<filename>        - full name of the file to be stored - mandatory
  --dataset=<dataset>      - CDF dataset assigned to the file - mandatory
  --pnfs=<destination>     - file destination - mandatory
  --station=<station>      - local SAM station, may be set via
                             $SAM_STATION - mandatory
  --host=<hostname>        - hostname for the local SAM station, may be
                             set via $SAM_HOST_NAME - mandatory
  --description=<text>     - any description of the file
  --html=<url>             - reference to the description of data
  --storeoptions=<options> - auxiliary options to be forwarded to the
                             "sam store" command
  --rename                 - rename file according CDF convention
  -v                       - verbose output
  -t

This will store a file according to the dataset id you provide. To obtain the destination, you need to run a separate script, where-to-store.sh.



MetaData HowTo: Basic Version

When you store a file with SAM, you store some metadata with it in order to find it later. A filename must be unique across all places and times in SAM, so if you choose "test" for your filename, you are likely to get an error. Therefore choose a filename that is unique -- usually by adding the unix time stamp and your station name or location. You should never search for files using filename wildcards: that is what the metadata are for.

The minimalist version of a metadata file contains the program name, version, number of events, time produced, your name, where it was produced, the type of run, the group, the stream, and some descriptive text that may be the same for your private dataset. You can also choose your own "cdf dataset" name -- but beware of choosing something that someone else might also pick, or you will mix up your files with theirs. (This will be made impossible in a few months, but right now it is a problem.) Putting your Kerberos principal as the first part of any cdf dataset name is therefore a good idea. You can also include a reference to a web page where additional information on the dataset is kept. Finally, if you are storing Monte Carlo files, you should NOT choose the cdf dataset at the time of file storage, but rather at the time of the Monte Carlo Request.
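A minimal sketch of these naming conventions (the principal, station, and dataset stem below are placeholder values, not real accounts or datasets):

```python
import time

# Placeholder values: substitute your own Kerberos principal and SAM station.
principal = "stdenis"
station = "nglas05"

# A unique filename: a descriptive stem, the station name, and the unix
# time stamp, so no one else can collide with it.
filename = "mytest-1ev-%s-%d.root" % (station, int(time.time()))

# A cdf dataset name with the principal as its first part, to avoid
# colliding with a name someone else might also pick.
dataset = "%s-ttbar-study" % principal
```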

An example is shown here; while it says Monte Carlo and Generator, this is only a temporary kludge that you must use for all files -- use it even for a real data file or ntuple. Also, the parameters chosen are for Monte Carlo; they should properly be declared in the Monte Carlo Request. They are left here to illustrate that the parameters can be anything.

from import_classes import *

appfamily=AppFamily('generator', '1.00', 'generator')
filename = 'rs-1ev-test-031106-1910.root'
t = SAMMCFile(filename,Events(1, 2, 2),
"generated",
appfamily,
"01/21/2003 10:59:09",
"01/21/2003 11:20:08",
18,
{
'Global':
{ 'ProducedByName':'mrenna',
  'OriginName':'fermilab',
  'Phase':'unspecified',
  'FacilityName':'fixed-target-farm',
  'ProducedForName':'mrenna',
  'RunType':'Monte Carlo',
  'GroupName':'cdf',
  'Stream':'m', 
  'Description':'test mc',
},
'CDF':
{ 'DataSet':'stink2',
  'html':'http://cepa.fnal.gov/personal/mrenna/',
},
'Generated' :{
'AppFamily':'generator',
'FirstEvent':'1',
'AppVersion':'1.00',
'LastEvent':'2',
'NumRecords':'2',
'AppName':'generator',
'TotalEvents':'2',
'RunNumber':54321,}
}
)
For a Monte Carlo file, the request system is working, so use the following form. Note the use of "requestid" in the "Global" category:
from import_classes import *

appfamily=AppFamily('generator', '1.00', 'generator')
filename = 'rs-1ev-test-031106-1910.root'
t = SAMMCFile(filename,Events(1, 2, 2),
"generated",
appfamily,
"01/21/2003 10:59:09",
"01/21/2003 11:20:08",
18,
{
'Global':
{ 'ProducedByName':'mrenna',
  'OriginName':'fermilab',
  'Phase':'unspecified',
  'FacilityName':'fixed-target-farm',
  'ProducedForName':'mrenna',
  'RunType':'Monte Carlo',
  'GroupName':'cdf',
  'Stream':'m', 
  'Description':'test mc',
  'requestid':'27'
},
'CDF':
{ 'DataSet':'stink2',
  'html':'http://cepa.fnal.gov/personal/mrenna/',
},
'Pythia':
{ 'cdfrelease':'testpycdfrelease',
      'collider':'testpycollider', 
    'comments':'testpycomments',
    'decaytable':'testpydecaytable',
    'energy':'testpyenergy', 
    'et_jet_cut':'testpyet_jet_cut', 
    'fact_scale':'testpyfact_scale', 
    'lamqcd5':'testpylamqcd5', 
    'numrecords':'testpynumrecords', 
    'partons':'testpypartons',
    'pdf':'testpypdf', 
    'physicsprocess':'testpyphysicsprocess',
    'picobarns':'testpypicobarns',    
    'qcd_order':'testpyqcd_order',    
    'qcd_power':'testpyqcd_power',    
    'qed_order':'testpyqed_order',    
    'qed_power':'testpyqed_power',    
    'ranseed1':'testpyranseed1',     
    'ranseed2':'testpyranseed2',     
    'renorm_scale':'testpyrenorm_scale', 
    'runnumber':'testpyrunnumber',  
    'useevtgen':'testpyuseevtgen',    
    'useqq':'testpyuseqq',
    'validated':'testpyvalidated',
    'version':'testpyversion',      
    'webpage':'testpywebpage',
},
'Herwig' :
{  'cdfrelease':'testhercdfrelease',
   'collider':'testhercollider',      
    'comments':'testhercomments',      
    'decaytable':'testherdecaytable',    
    'energy':'testherenergy',      
    'et_jet_cut':'testheret_jet_cut',    
    'fact_scale':'testherfact_scale',    
    'lamqcd5':'testherlamqcd5',       
    'numrecords':'testhernumrecords',    
    'partons':'testherpartons',       
    'pdf':'testherpdf',           
    'physicsprocess':'testherphysicsprocess',
    'picobarns':'testherpicobarns',     
    'qcd_order':'testherqcd_order',     
    'qcd_power':'testherqcd_power',     
    'qed_order':'testherqed_order',    
    'qed_power':'testherqed_power',     
    'ranseed1':'testherranseed1',      
    'ranseed2':'testherranseed2',      
    'renorm_scale':'testherrenorm_scale',  
    'runnumber':'testherrunnumber',     
    'validated':'testhervalidated',
    'version':'testherversion',       
    'webpage':'testherwebpage',        
},
'Alpgen' :{
    'collider':'testalpcollider',
    'comments':'testalpcomments',      
    'dr_jj_cut':'testalpdr_jj_cut',     
    'dr_lj_cut':'testalpdr_lj_cut',     
    'energy':'testalpenergy',        
    'et_jet_cut':'testalpet_jet_cut',    
    'et_lep_cut':'testalpet_lep_cut',    
    'fact_scale':'testalpfact_scale',    
    'lamqcd5':'testalplamqcd5',        
    'll_mass_cut':'testalpll_mass_cut',    
    'numrecords':'testalpnumrecords',     
    'partons':'testalppartons',        
    'pdf':'testalppdf',           
    'physicsprocess':'testalpphysicsprocess', 
    'picobarns':'testalppicobarns',      
    'qcd_order':'testalpqcd_order',      
    'qcd_power':'testalpqcd_power',      
    'qed_order':'testalpqed_order',      
    'qed_power':'testalpqed_power',      
    'ranseed1':'testalpranseed1',       
    'ranseed2':'testalpranseed2',      
    'renorm_scale':'testalprenorm_scale',   
    'runnumber':'testalprunnumber',      
    'validated':'testalpvalidated',
    'version':'testalpversion',        
    'webpage':'testalpwebpage',        
    'weight':'testalpweight',      
    },
'Madgraph' :{
    'collider':'testmadcollider',         
    'comments':'testmadcomments',         
    'dr_jj_cut':'testmaddr_jj_cut',        
    'dr_lj_cut':'testmaddr_lj_cut',        
    'energy':'testmadenergy',         
    'et_jet_cut':'testmadet_jet_cut',       
    'et_lep_cut':'testmadet_lep_cut',       
    'fact_scale':'testmadfact_scale',       
    'lamqcd5':'testmadlamqcd5',          
    'll_mass_cut':'testmadll_mass_cut',      
    'numrecords':'testmadnumrecords',       
    'partons':'testmadpartons',          
    'pdf':'testmadpdf',              
    'physicsprocess':'testmadphysicsprocess',   
    'picobarns':'testmadpicobarns',        
    'qcd_order':'testmadqcd_order',        
    'qcd_power':'testmadqcd_power',        
    'qed_order':'testmadqed_order',        
    'qed_power':'testmadqed_power',       
    'ranseed1':'testmadranseed1',        
    'ranseed2':'testmadranseed2',         
    'renorm_scale':'testmadrenorm_scale',     
    'runnumber':'testmadrunnumber',
    'validated':'testmadvalidated',
    'version':'testmadversion',
    'webpage':'testmadwebpage',          
    'weight':'testmadweight',                
},
'Generated' :{
'AppFamily':'generator',
'FirstEvent':'1',
'AppVersion':'1.00',
'LastEvent':'2',
'NumRecords':'2',
'AppName':'generator',
'TotalEvents':'2',
'RunNumber':54321,}
}
)

When you have stored metadata for a file, you can retrieve it with the command:

sam get metadata --file=<myfile>
as illustrated for the more complex example below.



MetaData HowTo: Expanded Version

The more expanded version of a metadata file is shown below. Parameters come in categories and types; the "type" is essentially what you would call the parameter name.

You can see all the parameter types by the following sam command:

sam translate constraints --dim="help"
which returns
Dimensions can be used to query files based on the SAM meta-data,
Run Config data, or MCRun parameters.
This style of help shows all the available dimensions.
Or, you can view a more limited subset of dimensions by using the type option:
   --dim=help --type=<typeName>
Where <typeName> is one of the following:
  alpgen, cdf, cdfsim, datafile, datasetdef, dfc, herwig, madgraph, mc, mcrun, pythia, run
Following the instructions above, we issue the following command for the typeName "cdf":
[stdenis@nglas05 ~]$ sam translate constraints --dim="help" --type=cdf
and we obtain


To use dimension queries, specify dimensions and constraints combined
with and/or/minus operators, as in these examples:
  --dim='pythia.topmass 75 and simulated.numrecords > 404'
  --rpn='pythia.topmass 75 simulated.numrecords > 404 and'
  --dim='pythia.topmass 50-100 or global.originname nixhef'
  --dim='(data_tier digitized and appl_name d0reco and version preco03.07.00) \
         minus generated.decay > 12'

Available dimensions (not case sensitive):

CDF.DATASET : cdf file catalog dataset
CDF.FILESET : cdf file catalog fileset
CDF.HTML : web page with further information about this file

The syntax then maps into the metadata storage as indicated in the example below. This example is a very complex and somewhat nonsensical one that demonstrates a number of parameters that are available.
from import_classes import *

appfamily=AppFamily('generator', '1.00', 'generator')
filename = 'rs-1ev-test-031106-1910.root'
t = SAMMCFile(filename,Events(1, 2, 2),
"generated",
appfamily,
"01/21/2003 10:59:09",
"01/21/2003 11:20:08",
18,
{
'Global':
{ 'ProducedByName':'mrenna',
  'OriginName':'fermilab',
  'Phase':'unspecified',
  'FacilityName':'fixed-target-farm',
  'ProducedForName':'mrenna',
  'RunType':'Monte Carlo',
  'GroupName':'cdf',
  'Stream':'m', 
  'Description':'test mc',
},
'CDF':
{ 'DataSet':'stink2',
  'html':'http://cepa.fnal.gov/personal/mrenna/',
},
'Pythia':
{ 'cdfrelease':'testpycdfrelease',
      'collider':'testpycollider', 
    'comments':'testpycomments',
    'decaytable':'testpydecaytable',
    'energy':'testpyenergy', 
    'et_jet_cut':'testpyet_jet_cut', 
    'fact_scale':'testpyfact_scale', 
    'lamqcd5':'testpylamqcd5', 
    'numrecords':'testpynumrecords', 
    'partons':'testpypartons',
    'pdf':'testpypdf', 
    'physicsprocess':'testpyphysicsprocess',
    'picobarns':'testpypicobarns',    
    'qcd_order':'testpyqcd_order',    
    'qcd_power':'testpyqcd_power',    
    'qed_order':'testpyqed_order',    
    'qed_power':'testpyqed_power',    
    'ranseed1':'testpyranseed1',     
    'ranseed2':'testpyranseed2',     
    'renorm_scale':'testpyrenorm_scale', 
    'runnumber':'testpyrunnumber',  
    'useevtgen':'testpyuseevtgen',    
    'useqq':'testpyuseqq',
    'validated':'testpyvalidated',
    'version':'testpyversion',      
    'webpage':'testpywebpage',
},
'Herwig' :
{  'cdfrelease':'testhercdfrelease',
   'collider':'testhercollider',      
    'comments':'testhercomments',      
    'decaytable':'testherdecaytable',    
    'energy':'testherenergy',      
    'et_jet_cut':'testheret_jet_cut',    
    'fact_scale':'testherfact_scale',    
    'lamqcd5':'testherlamqcd5',       
    'numrecords':'testhernumrecords',    
    'partons':'testherpartons',       
    'pdf':'testherpdf',           
    'physicsprocess':'testherphysicsprocess',
    'picobarns':'testherpicobarns',     
    'qcd_order':'testherqcd_order',     
    'qcd_power':'testherqcd_power',     
    'qed_order':'testherqed_order',    
    'qed_power':'testherqed_power',     
    'ranseed1':'testherranseed1',      
    'ranseed2':'testherranseed2',      
    'renorm_scale':'testherrenorm_scale',  
    'runnumber':'testherrunnumber',     
    'validated':'testhervalidated',
    'version':'testherversion',       
    'webpage':'testherwebpage',        
},
'Alpgen' :{
    'collider':'testalpcollider',
    'comments':'testalpcomments',      
    'dr_jj_cut':'testalpdr_jj_cut',     
    'dr_lj_cut':'testalpdr_lj_cut',     
    'energy':'testalpenergy',        
    'et_jet_cut':'testalpet_jet_cut',    
    'et_lep_cut':'testalpet_lep_cut',    
    'fact_scale':'testalpfact_scale',    
    'lamqcd5':'testalplamqcd5',        
    'll_mass_cut':'testalpll_mass_cut',    
    'numrecords':'testalpnumrecords',     
    'partons':'testalppartons',        
    'pdf':'testalppdf',           
    'physicsprocess':'testalpphysicsprocess', 
    'picobarns':'testalppicobarns',      
    'qcd_order':'testalpqcd_order',      
    'qcd_power':'testalpqcd_power',      
    'qed_order':'testalpqed_order',      
    'qed_power':'testalpqed_power',      
    'ranseed1':'testalpranseed1',       
    'ranseed2':'testalpranseed2',      
    'renorm_scale':'testalprenorm_scale',   
    'runnumber':'testalprunnumber',      
    'validated':'testalpvalidated',
    'version':'testalpversion',        
    'webpage':'testalpwebpage',        
    'weight':'testalpweight',      
    },
'Madgraph' :{
    'collider':'testmadcollider',         
    'comments':'testmadcomments',         
    'dr_jj_cut':'testmaddr_jj_cut',        
    'dr_lj_cut':'testmaddr_lj_cut',        
    'energy':'testmadenergy',         
    'et_jet_cut':'testmadet_jet_cut',       
    'et_lep_cut':'testmadet_lep_cut',       
    'fact_scale':'testmadfact_scale',       
    'lamqcd5':'testmadlamqcd5',          
    'll_mass_cut':'testmadll_mass_cut',      
    'numrecords':'testmadnumrecords',       
    'partons':'testmadpartons',          
    'pdf':'testmadpdf',              
    'physicsprocess':'testmadphysicsprocess',   
    'picobarns':'testmadpicobarns',        
    'qcd_order':'testmadqcd_order',        
    'qcd_power':'testmadqcd_power',        
    'qed_order':'testmadqed_order',        
    'qed_power':'testmadqed_power',       
    'ranseed1':'testmadranseed1',        
    'ranseed2':'testmadranseed2',         
    'renorm_scale':'testmadrenorm_scale',     
    'runnumber':'testmadrunnumber',
    'validated':'testmadvalidated',
    'version':'testmadversion',
    'webpage':'testmadwebpage',          
    'weight':'testmadweight',                
},
'Generated' :{
'AppFamily':'generator',
'FirstEvent':'1',
'AppVersion':'1.00',
'LastEvent':'2',
'NumRecords':'2',
'AppName':'generator',
'TotalEvents':'2',
'RunNumber':54321,}
}
)

The metadata for a file stored in this way can then be examined with the command:

sam get metadata --file='rs-1ev-test-031106-1910.root'

which returns

             File Type:  SAMMC Data File
             File Name:  rs-1ev-test-031106-1910.root
               File ID:  2318128
             File Size:  307446 [B]
              CRC Data:  525925219L [adler 32 crc type]
       File Start Time:  01/21/2003 10:59:09
         File End Time:  01/21/2003 11:20:08
       Physical Stream:  m
      File Format Info:  unknown file format
           First Event:  1
            Last Event:  2
          Total Events:  2
    Application Family:  generator
      Application Name:  generator
   Application Version:  1.00
     Import Process ID:  0
             Node Name:  fcdfdata016.fnal.gov
            Work Group:  cdf
             User Name:  sam
          Produced For:  mrenna
           Produced By:  mrenna
       Origin Location:  fermilab
       Origin Facility:  fixed-target-farm
       Physics Channel:  
           Description:  test mc
              MC Phase:  unspecified
            Run Number:  54321
              Run Type:  monte carlo
        Run Start Time:  01/21/2003 10:59:09
          Run End Time:  01/21/2003 11:20:08
       Run Description:  test mc
         Run CM Energy:  0.0
          Parent Files:  []
                 Split:  0
                 Merge:  0
                   Key:  collider = testalpcollider (Category: alpgen)
                   Key:  comments = testalpcomments (Category: alpgen)
                   Key:  dr_jj_cut = testalpdr_jj_cut (Category: alpgen)
                   Key:  dr_lj_cut = testalpdr_lj_cut (Category: alpgen)
                   Key:  energy = testalpenergy (Category: alpgen)
                   Key:  et_jet_cut = testalpet_jet_cut (Category: alpgen)
                   Key:  et_lep_cut = testalpet_lep_cut (Category: alpgen)
                   Key:  fact_scale = testalpfact_scale (Category: alpgen)
                   Key:  lamqcd5 = testalplamqcd5 (Category: alpgen)
                   Key:  ll_mass_cut = testalpll_mass_cut (Category: alpgen)
                   Key:  numrecords = testalpnumrecords (Category: alpgen)
                   Key:  partons = testalppartons (Category: alpgen)
                   Key:  pdf = testalppdf (Category: alpgen)
                   Key:  physicsprocess = testalpphysicsprocess (Category: alpgen)
                   Key:  picobarns = testalppicobarns (Category: alpgen)
                   Key:  qcd_order = testalpqcd_order (Category: alpgen)
                   Key:  qcd_power = testalpqcd_power (Category: alpgen)
                   Key:  qed_order = testalpqed_order (Category: alpgen)
                   Key:  qed_power = testalpqed_power (Category: alpgen)
                   Key:  ranseed1 = testalpranseed1 (Category: alpgen)
                   Key:  ranseed2 = testalpranseed2 (Category: alpgen)
                   Key:  renorm_scale = testalprenorm_scale (Category: alpgen)
                   Key:  runnumber = testalprunnumber (Category: alpgen)
                   Key:  validated = testalpvalidated (Category: alpgen)
                   Key:  version = testalpversion (Category: alpgen)
                   Key:  webpage = testalpwebpage (Category: alpgen)
                   Key:  weight = testalpweight (Category: alpgen)
                   Key:  dataset = stink2 (Category: cdf)
                   Key:  html = http://cepa.fnal.gov/personal/mrenna/ (Category: cdf)
                   Key:  appfamily = generator (Category: generated)
                   Key:  appname = generator (Category: generated)
                   Key:  appversion = 1.00 (Category: generated)
                   Key:  firstevent = 1 (Category: generated)
                   Key:  lastevent = 2 (Category: generated)
                   Key:  numrecords = 2 (Category: generated)
                   Key:  runnumber = 54321 (Category: generated)
                   Key:  totalevents = 2 (Category: generated)
                   Key:  facilityname = fixed-target-farm (Category: global)
                   Key:  groupname = cdf (Category: global)
                   Key:  originname = fermilab (Category: global)
                   Key:  phase = unspecified (Category: global)
                   Key:  producedbyname = mrenna (Category: global)
                   Key:  producedforname = mrenna (Category: global)
                   Key:  cdfrelease = testhercdfrelease (Category: herwig)
                   Key:  collider = testhercollider (Category: herwig)
                   Key:  comments = testhercomments (Category: herwig)
                   Key:  decaytable = testherdecaytable (Category: herwig)
                   Key:  energy = testherenergy (Category: herwig)
                   Key:  et_jet_cut = testheret_jet_cut (Category: herwig)
                   Key:  fact_scale = testherfact_scale (Category: herwig)
                   Key:  lamqcd5 = testherlamqcd5 (Category: herwig)
                   Key:  numrecords = testhernumrecords (Category: herwig)
                   Key:  partons = testherpartons (Category: herwig)
                   Key:  pdf = testherpdf (Category: herwig)
                   Key:  physicsprocess = testherphysicsprocess (Category: herwig)
                   Key:  picobarns = testherpicobarns (Category: herwig)
                   Key:  qcd_order = testherqcd_order (Category: herwig)
                   Key:  qcd_power = testherqcd_power (Category: herwig)
                   Key:  qed_order = testherqed_order (Category: herwig)
                   Key:  qed_power = testherqed_power (Category: herwig)
                   Key:  ranseed1 = testherranseed1 (Category: herwig)
                   Key:  ranseed2 = testherranseed2 (Category: herwig)
                   Key:  renorm_scale = testherrenorm_scale (Category: herwig)
                   Key:  runnumber = testherrunnumber (Category: herwig)
                   Key:  validated = testhervalidated (Category: herwig)
                   Key:  version = testherversion (Category: herwig)
                   Key:  webpage = testherwebpage (Category: herwig)
                   Key:  collider = testmadcollider (Category: madgraph)
                   Key:  comments = testmadcomments (Category: madgraph)
                   Key:  dr_jj_cut = testmaddr_jj_cut (Category: madgraph)
                   Key:  dr_lj_cut = testmaddr_lj_cut (Category: madgraph)
                   Key:  energy = testmadenergy (Category: madgraph)
                   Key:  et_jet_cut = testmadet_jet_cut (Category: madgraph)
                   Key:  et_lep_cut = testmadet_lep_cut (Category: madgraph)
                   Key:  fact_scale = testmadfact_scale (Category: madgraph)
                   Key:  lamqcd5 = testmadlamqcd5 (Category: madgraph)
                   Key:  ll_mass_cut = testmadll_mass_cut (Category: madgraph)
                   Key:  numrecords = testmadnumrecords (Category: madgraph)
                   Key:  partons = testmadpartons (Category: madgraph)
                   Key:  pdf = testmadpdf (Category: madgraph)
                   Key:  physicsprocess = testmadphysicsprocess (Category: madgraph)
                   Key:  picobarns = testmadpicobarns (Category: madgraph)
                   Key:  qcd_order = testmadqcd_order (Category: madgraph)
                   Key:  qcd_power = testmadqcd_power (Category: madgraph)
                   Key:  qed_order = testmadqed_order (Category: madgraph)
                   Key:  qed_power = testmadqed_power (Category: madgraph)
                   Key:  ranseed1 = testmadranseed1 (Category: madgraph)
                   Key:  ranseed2 = testmadranseed2 (Category: madgraph)
                   Key:  renorm_scale = testmadrenorm_scale (Category: madgraph)
                   Key:  runnumber = testmadrunnumber (Category: madgraph)
                   Key:  validated = testmadvalidated (Category: madgraph)
                   Key:  version = testmadversion (Category: madgraph)
                   Key:  webpage = testmadwebpage (Category: madgraph)
                   Key:  weight = testmadweight (Category: madgraph)
                   Key:  cdfrelease = testpycdfrelease (Category: pythia)
                   Key:  collider = testpycollider (Category: pythia)
                   Key:  comments = testpycomments (Category: pythia)
                   Key:  decaytable = testpydecaytable (Category: pythia)
                   Key:  energy = testpyenergy (Category: pythia)
                   Key:  et_jet_cut = testpyet_jet_cut (Category: pythia)
                   Key:  fact_scale = testpyfact_scale (Category: pythia)
                   Key:  lamqcd5 = testpylamqcd5 (Category: pythia)
                   Key:  numrecords = testpynumrecords (Category: pythia)
                   Key:  partons = testpypartons (Category: pythia)
                   Key:  pdf = testpypdf (Category: pythia)
                   Key:  physicsprocess = testpyphysicsprocess (Category: pythia)
                   Key:  picobarns = testpypicobarns (Category: pythia)
                   Key:  qcd_order = testpyqcd_order (Category: pythia)
                   Key:  qcd_power = testpyqcd_power (Category: pythia)
                   Key:  qed_order = testpyqed_order (Category: pythia)
                   Key:  qed_power = testpyqed_power (Category: pythia)
                   Key:  ranseed1 = testpyranseed1 (Category: pythia)
                   Key:  ranseed2 = testpyranseed2 (Category: pythia)
                   Key:  renorm_scale = testpyrenorm_scale (Category: pythia)
                   Key:  runnumber = testpyrunnumber (Category: pythia)
                   Key:  useevtgen = testpyuseevtgen (Category: pythia)
                   Key:  useqq = testpyuseqq (Category: pythia)
                   Key:  validated = testpyvalidated (Category: pythia)
                   Key:  version = testpyversion (Category: pythia)
                   Key:  webpage = testpywebpage (Category: pythia)
            Request ID:  0
             Data Tier:  generated



Sam Translate: Getting lists of files based on constraints on parameters

The most useful form of sam translate constraints is to ask for a cdf dataset that you formed. For the example in the "complex" store above, the cdf dataset name is "stink2". Hence we use:
[stdenis@nglas05 ~]$ sam translate constraints --dim="cdf.dataset stink2"
to obtain:
Files:
  sm-store-test1.root
  rs-1ev-test-031106-1810.root
  rs-1ev-test-031106-1840.root
  rs-1ev-test-031106-1844.root
  rs-1ev-test-031106-1844-1.root
  rs-1ev-test-031106-1845.root
  rs-1ev-test-031106-1845-1.root
  rs-1ev-test-031106-1847.root
  rs-1ev-test-031106-1855.root
  rs-1ev-test-031106-1910.root
  rs-1ev-test-031114-2251.root

File Count:  11
Average File Size:  300
Total File Size:  3300
Total Event Count:  22

For Monte Carlo, you do not use the dataset in the dimension query; you use the request id. This is because you will have searched for the Monte Carlo based on some description and found some samples that are interesting. Each sample carries a request id, and when you want to use a sample, you ask for its files based on that id. The request id is guaranteed to be unique. For example, if the request given by requestId=27 is the one you want, the sam translate constraints command is as follows:

sam translate constraints --dim="GLOBAL.REQUESTID 27"
yielding
Files:
  samTueMay25133949CDT2004.root
  samTueMay25145356CDT2004.root

File Count:  2
Average File Size:  300
Total File Size:  600
Total Event Count:  4



Making a Monte Carlo Request

A Monte Carlo Request is made by creating a python file that looks much like the one used for file storage. The file contains the parameters of the request. Here is an example:
from SamUserApiImportClasses import *
datatier='reconstructed'
appfamily=AppFamily('generator', '1.00', 'generator')
dict={
        'cdf':{
                'dataset':'stink2',
                'html':'http://cepa.fnal.gov/personal/mrenna/',
                },
        'Pythia':
        { 'cdfrelease':'testpycdfrelease',
          'collider':'testpycollider', 
          'comments':'testpycomments',
          'decaytable':'testpydecaytable',
          'energy':'testpyenergy', 
          'et_jet_cut':'testpyet_jet_cut', 
          'fact_scale':'testpyfact_scale', 
          'lamqcd5':'testpylamqcd5', 
          'numrecords':'testpynumrecords', 
          'partons':'testpypartons',
          'pdf':'testpypdf', 
          'physicsprocess':'testpyphysicsprocess',
          'picobarns':'testpypicobarns',    
          'qcd_order':'testpyqcd_order',    
          'qcd_power':'testpyqcd_power',    
          'qed_order':'testpyqed_order',    
          'qed_power':'testpyqed_power',    
          'ranseed1':'testpyranseed1',     
          'ranseed2':'testpyranseed2',     
          'renorm_scale':'testpyrenorm_scale', 
          'runnumber':'testpyrunnumber',  
          'useevtgen':'testpyuseevtgen',    
          'useqq':'testpyuseqq',
          'validated':'testpyvalidated',
          'version':'testpyversion',      
          'webpage':'testpywebpage',
          },
        'Herwig' :
        {  'cdfrelease':'testhercdfrelease',
           'collider':'testhercollider',      
           'comments':'testhercomments',      
           'decaytable':'testherdecaytable',    
           'energy':'testherenergy',      
           'et_jet_cut':'testheret_jet_cut',    
           'fact_scale':'testherfact_scale',    
           'lamqcd5':'testherlamqcd5',       
           'numrecords':'testhernumrecords',    
           'partons':'testherpartons',       
           'pdf':'testherpdf',           
           'physicsprocess':'testherphysicsprocess',
           'picobarns':'testherpicobarns',     
           'qcd_order':'testherqcd_order',     
           'qcd_power':'testherqcd_power',     
           'qed_order':'testherqed_order',    
           'qed_power':'testherqed_power',     
           'ranseed1':'testherranseed1',      
           'ranseed2':'testherranseed2',      
           'renorm_scale':'testherrenorm_scale',  
           'runnumber':'testherrunnumber',     
           'validated':'testhervalidated',
           'version':'testherversion',       
           'webpage':'testherwebpage',        
           },
        'Alpgen' : {
    'collider':'testalpcollider',
    'comments':'testalpcomments',      
    'dr_jj_cut':'testalpdr_jj_cut',     
    'dr_lj_cut':'testalpdr_lj_cut',     
    'energy':'testalpenergy',        
    'et_jet_cut':'testalpet_jet_cut',    
    'et_lep_cut':'testalpet_lep_cut',    
    'fact_scale':'testalpfact_scale',    
    'lamqcd5':'testalplamqcd5',        
    'll_mass_cut':'testalpll_mass_cut',    
    'numrecords':'testalpnumrecords',     
    'partons':'testalppartons',        
    'pdf':'testalppdf',           
    'physicsprocess':'testalpphysicsprocess', 
    'picobarns':'testalppicobarns',      
    'qcd_order':'testalpqcd_order',      
    'qcd_power':'testalpqcd_power',      
    'qed_order':'testalpqed_order',      
    'qed_power':'testalpqed_power',      
    'ranseed1':'testalpranseed1',       
    'ranseed2':'testalpranseed2',      
    'renorm_scale':'testalprenorm_scale',   
    'runnumber':'testalprunnumber',      
    'validated':'testalpvalidated',
    'version':'testalpversion',        
    'webpage':'testalpwebpage',        
    'weight':'testalpweight',      
    },
'Madgraph' :{
    'collider':'testmadcollider',         
    'comments':'testmadcomments',         
    'dr_jj_cut':'testmaddr_jj_cut',        
    'dr_lj_cut':'testmaddr_lj_cut',        
    'energy':'testmadenergy',         
    'et_jet_cut':'testmadet_jet_cut',       
    'et_lep_cut':'testmadet_lep_cut',       
    'fact_scale':'testmadfact_scale',       
    'lamqcd5':'testmadlamqcd5',          
    'll_mass_cut':'testmadll_mass_cut',      
    'numrecords':'testmadnumrecords',       
    'partons':'testmadpartons',          
    'pdf':'testmadpdf',              
    'physicsprocess':'testmadphysicsprocess',   
    'picobarns':'testmadpicobarns',        
    'qcd_order':'testmadqcd_order',        
    'qcd_power':'testmadqcd_power',        
    'qed_order':'testmadqed_order',        
    'qed_power':'testmadqed_power',       
    'ranseed1':'testmadranseed1',        
    'ranseed2':'testmadranseed2',         
    'renorm_scale':'testmadrenorm_scale',     
    'runnumber':'testmadrunnumber',
    'validated':'testmadvalidated',
    'version':'testmadversion',
    'webpage':'testmadwebpage',          
    'weight':'testmadweight',                
},
        'Global':{
                'phase':'undefined',
                'stream':'notstreamed',
                'description':'junk',
                'producedforname':'stdenis',
                'runtype':'monte carlo',
                'groupname':'test',
                },
        'Generated':{
                'useevtgen':'on',
                'collisionenergy':'1960.0',
                'decay':'tauola',
                'generator':'pythia',
                'pdflibfunc':'CTEQ5',
                },
       
        }


This request is obviously nonsense but demonstrates the generators and parameters available. In a real case, one of these would be chosen with sensible values.
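A trimmed request file of the same shape can be sanity-checked before submission. This is only a sketch: the idea that 'Global' and 'Generated' sections plus one generator section are required is an assumption drawn from this example, not a documented SAM rule.

```python
# Hypothetical sanity check for a request dictionary shaped like the one
# above. Which sections are actually mandatory is an assumption here.
GENERATORS = {'Pythia', 'Herwig', 'Alpgen', 'Madgraph'}

def check_request(d):
    missing = {'Global', 'Generated'} - set(d)
    if missing:
        raise ValueError('missing sections: %s' % sorted(missing))
    if not GENERATORS & set(d):
        raise ValueError('no generator section found')
    return True

# A trimmed-down request in the same shape as the full example:
req = {'Global': {'runtype': 'monte carlo', 'groupname': 'test'},
       'Generated': {'generator': 'pythia'},
       'Pythia': {'version': 'testpyversion'}}
print(check_request(req))   # -> True
```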

The request is then sent by the following command:

sam create request --dictfile=req3.py --group=test --numEvents=1

and one obtains a request id:
RequestId 32
This request id is then entered in the file metadata under the global parameter category. The request can be queried:
sam get request details --requestId=32
This returns all the info:
Request Detail ID:21
Family: generator
Application Name: generator
Version: 1.00
Request Detail Status: okay
Proj Snap ID: 0
Request ID: 32
Installation ID: 0
Param Type: runtype Value: monte carlo Category: global  Data_tier: reconstructed  Description:
Param Type: stream Value: notstreamed Category: global  Data_tier: reconstructed  Description:
Param Type: description Value: junk Category: global  Data_tier: reconstructed  Description:
Param Type: producedforname Value: stdenis Category: global  Data_tier: reconstructed  Description:
Param Type: groupname Value: test Category: global  Data_tier: reconstructed  Description:
Param Type: useevtgen Value: on Category: generated  Data_tier: reconstructed  Description:
Param Type: collisionenergy Value: 1960.0 Category: generated  Data_tier: reconstructed  Description:
Param Type: dataset Value: stink2 Category: cdf  Data_tier: reconstructed  Description:
Param Type: html Value: http://cepa.fnal.gov/personal/mrenna/ Category: cdf  Data_tier: reconstructed  Description:
Param Type: dr_lj_cut Value: testmaddr_lj_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qed_order Value: testmadqed_order Category: madgraph  Data_tier: reconstructed  Description:
Param Type: pdf Value: testmadpdf Category: madgraph  Data_tier: reconstructed  Description:
Param Type: webpage Value: testmadwebpage Category: madgraph  Data_tier: reconstructed  Description:
Param Type: et_jet_cut Value: testmadet_jet_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: version Value: testmadversion Category: madgraph  Data_tier: reconstructed  Description:
Param Type: energy Value: testmadenergy Category: madgraph  Data_tier: reconstructed  Description:
Param Type: comments Value: testmadcomments Category: madgraph  Data_tier: reconstructed  Description:
Param Type: numrecords Value: testmadnumrecords Category: madgraph  Data_tier: reconstructed  Description:
Param Type: ranseed2 Value: testmadranseed2 Category: madgraph  Data_tier: reconstructed  Description:
Param Type: ranseed1 Value: testmadranseed1 Category: madgraph  Data_tier: reconstructed  Description:
Param Type: ll_mass_cut Value: testmadll_mass_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: collider Value: testmadcollider Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qcd_order Value: testmadqcd_order Category: madgraph  Data_tier: reconstructed  Description:
Param Type: renorm_scale Value: testmadrenorm_scale Category: madgraph  Data_tier: reconstructed  Description:
Param Type: validated Value: testmadvalidated Category: madgraph  Data_tier: reconstructed  Description:
Param Type: dr_jj_cut Value: testmaddr_jj_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: picobarns Value: testmadpicobarns Category: madgraph  Data_tier: reconstructed  Description:
Param Type: fact_scale Value: testmadfact_scale Category: madgraph  Data_tier: reconstructed  Description:
Param Type: lamqcd5 Value: testmadlamqcd5 Category: madgraph  Data_tier: reconstructed  Description:
Param Type: et_lep_cut Value: testmadet_lep_cut Category: madgraph  Data_tier: reconstructed  Description:
Param Type: runnumber Value: testmadrunnumber Category: madgraph  Data_tier: reconstructed  Description:
Param Type: physicsprocess Value: testmadphysicsprocess Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qcd_power Value: testmadqcd_power Category: madgraph  Data_tier: reconstructed  Description:
Param Type: weight Value: testmadweight Category: madgraph  Data_tier: reconstructed  Description:
Param Type: partons Value: testmadpartons Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qed_power Value: testmadqed_power Category: madgraph  Data_tier: reconstructed  Description:
Param Type: qed_order Value: testpyqed_order Category: pythia  Data_tier: reconstructed  Description:
Param Type: pdf Value: testpypdf Category: pythia  Data_tier: reconstructed  Description:
Param Type: et_jet_cut Value: testpyet_jet_cut Category: pythia  Data_tier: reconstructed  Description:
Param Type: version Value: testpyversion Category: pythia  Data_tier: reconstructed  Description:
Param Type: comments Value: testpycomments Category: pythia  Data_tier: reconstructed  Description:
Param Type: numrecords Value: testpynumrecords Category: pythia  Data_tier: reconstructed  Description:
Param Type: cdfrelease Value: testpycdfrelease Category: pythia  Data_tier: reconstructed  Description:
Param Type: ranseed1 Value: testpyranseed1 Category: pythia  Data_tier: reconstructed  Description:
Param Type: collider Value: testpycollider Category: pythia  Data_tier: reconstructed  Description:
Param Type: decaytable Value: testpydecaytable Category: pythia  Data_tier: reconstructed  Description:
Param Type: ranseed2 Value: testpyranseed2 Category: pythia  Data_tier: reconstructed  Description:
Param Type: qcd_order Value: testpyqcd_order Category: pythia  Data_tier: reconstructed  Description:
Param Type: useevtgen Value: testpyuseevtgen Category: pythia  Data_tier: reconstructed  Description:
Param Type: renorm_scale Value: testpyrenorm_scale Category: pythia  Data_tier: reconstructed  Description:
Param Type: validated Value: testpyvalidated Category: pythia  Data_tier: reconstructed  Description:
Param Type: partons Value: testpypartons Category: pythia  Data_tier: reconstructed  Description:
Param Type: picobarns Value: testpypicobarns Category: pythia  Data_tier: reconstructed  Description:
Param Type: fact_scale Value: testpyfact_scale Category: pythia  Data_tier: reconstructed  Description:
Param Type: lamqcd5 Value: testpylamqcd5 Category: pythia  Data_tier: reconstructed  Description:
Param Type: energy Value: testpyenergy Category: pythia  Data_tier: reconstructed  Description:
Param Type: runnumber Value: testpyrunnumber Category: pythia  Data_tier: reconstructed  Description:
Param Type: webpage Value: testpywebpage Category: pythia  Data_tier: reconstructed  Description:
Param Type: qcd_power Value: testpyqcd_power Category: pythia  Data_tier: reconstructed  Description:
Param Type: physicsprocess Value: testpyphysicsprocess Category: pythia  Data_tier: reconstructed  Description:
Param Type: useqq Value: testpyuseqq Category: pythia  Data_tier: reconstructed  Description:
Param Type: qed_power Value: testpyqed_power Category: pythia  Data_tier: reconstructed  Description:
Param Type: dr_lj_cut Value: testalpdr_lj_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qed_order Value: testalpqed_order Category: alpgen  Data_tier: reconstructed  Description:
Param Type: pdf Value: testalppdf Category: alpgen  Data_tier: reconstructed  Description:
Param Type: webpage Value: testalpwebpage Category: alpgen  Data_tier: reconstructed  Description:
Param Type: et_jet_cut Value: testalpet_jet_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: version Value: testalpversion Category: alpgen  Data_tier: reconstructed  Description:
Param Type: energy Value: testalpenergy Category: alpgen  Data_tier: reconstructed  Description:
Param Type: comments Value: testalpcomments Category: alpgen  Data_tier: reconstructed  Description:
Param Type: numrecords Value: testalpnumrecords Category: alpgen  Data_tier: reconstructed  Description:
Param Type: ranseed2 Value: testalpranseed2 Category: alpgen  Data_tier: reconstructed  Description:
Param Type: ranseed1 Value: testalpranseed1 Category: alpgen  Data_tier: reconstructed  Description:
Param Type: ll_mass_cut Value: testalpll_mass_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: collider Value: testalpcollider Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qcd_order Value: testalpqcd_order Category: alpgen  Data_tier: reconstructed  Description:
Param Type: renorm_scale Value: testalprenorm_scale Category: alpgen  Data_tier: reconstructed  Description:
Param Type: validated Value: testalpvalidated Category: alpgen  Data_tier: reconstructed  Description:
Param Type: dr_jj_cut Value: testalpdr_jj_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: picobarns Value: testalppicobarns Category: alpgen  Data_tier: reconstructed  Description:
Param Type: fact_scale Value: testalpfact_scale Category: alpgen  Data_tier: reconstructed  Description:
Param Type: lamqcd5 Value: testalplamqcd5 Category: alpgen  Data_tier: reconstructed  Description:
Param Type: et_lep_cut Value: testalpet_lep_cut Category: alpgen  Data_tier: reconstructed  Description:
Param Type: runnumber Value: testalprunnumber Category: alpgen  Data_tier: reconstructed  Description:
Param Type: physicsprocess Value: testalpphysicsprocess Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qcd_power Value: testalpqcd_power Category: alpgen  Data_tier: reconstructed  Description:
Param Type: weight Value: testalpweight Category: alpgen  Data_tier: reconstructed  Description:
Param Type: partons Value: testalppartons Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qed_power Value: testalpqed_power Category: alpgen  Data_tier: reconstructed  Description:
Param Type: qed_order Value: testherqed_order Category: herwig  Data_tier: reconstructed  Description:
Param Type: pdf Value: testherpdf Category: herwig  Data_tier: reconstructed  Description:
Param Type: et_jet_cut Value: testheret_jet_cut Category: herwig  Data_tier: reconstructed  Description:
Param Type: version Value: testherversion Category: herwig  Data_tier: reconstructed  Description:
Param Type: comments Value: testhercomments Category: herwig  Data_tier: reconstructed  Description:
Param Type: numrecords Value: testhernumrecords Category: herwig  Data_tier: reconstructed  Description:
Param Type: cdfrelease Value: testhercdfrelease Category: herwig  Data_tier: reconstructed  Description:
Param Type: ranseed1 Value: testherranseed1 Category: herwig  Data_tier: reconstructed  Description:
Param Type: collider Value: testhercollider Category: herwig  Data_tier: reconstructed  Description:
Param Type: decaytable Value: testherdecaytable Category: herwig  Data_tier: reconstructed  Description:
Param Type: ranseed2 Value: testherranseed2 Category: herwig  Data_tier: reconstructed  Description:
Param Type: qcd_order Value: testherqcd_order Category: herwig  Data_tier: reconstructed  Description:
Param Type: renorm_scale Value: testherrenorm_scale Category: herwig  Data_tier: reconstructed  Description:
Param Type: validated Value: testhervalidated Category: herwig  Data_tier: reconstructed  Description:
Param Type: partons Value: testherpartons Category: herwig  Data_tier: reconstructed  Description:
Param Type: picobarns Value: testherpicobarns Category: herwig  Data_tier: reconstructed  Description:
Param Type: fact_scale Value: testherfact_scale Category: herwig  Data_tier: reconstructed  Description:
Param Type: lamqcd5 Value: testherlamqcd5 Category: herwig  Data_tier: reconstructed  Description:
Param Type: energy Value: testherenergy Category: herwig  Data_tier: reconstructed  Description:
Param Type: runnumber Value: testherrunnumber Category: herwig  Data_tier: reconstructed  Description:
Param Type: webpage Value: testherwebpage Category: herwig  Data_tier: reconstructed  Description:
Param Type: qcd_power Value: testherqcd_power Category: herwig  Data_tier: reconstructed  Description:
Param Type: physicsprocess Value: testherphysicsprocess Category: herwig  Data_tier: reconstructed  Description:
Param Type: qed_power Value: testherqed_power Category: herwig  Data_tier: reconstructed  Description:
Param Type: phase Value: undefined Category: global  Data_tier: reconstructed  Description:
Param Type: generator Value: pythia Category: generated  Data_tier: reconstructed  Description:
Param Type: decay Value: tauola Category: generated  Data_tier: reconstructed  Description:
Param Type: pdflibfunc Value: CTEQ5 Category: generated  Data_tier: reconstructed  Description:
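Each parameter line of this output has a fixed shape, so it can be parsed back into per-category dictionaries. A sketch (the line format is inferred from the listing above):

```python
import re

# Each parameter line of `sam get request details` looks like:
#   Param Type: <type> Value: <value> Category: <cat>  Data_tier: <tier>  Description:
# The pattern below is inferred from the example output above.
PARAM_RE = re.compile(
    r'Param Type: (?P<type>\S+) Value: (?P<value>.*?) '
    r'Category: (?P<cat>\S+)\s+Data_tier: (?P<tier>\S+)')

sample = (
    'Param Type: runtype Value: monte carlo Category: global  '
    'Data_tier: reconstructed  Description:\n'
    'Param Type: collisionenergy Value: 1960.0 Category: generated  '
    'Data_tier: reconstructed  Description:')

# Group parameters by category for easy lookup.
params = {}
for m in PARAM_RE.finditer(sample):
    params.setdefault(m.group('cat'), {})[m.group('type')] = m.group('value')

print(params['global']['runtype'])   # -> monte carlo
```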

Requests can be listed:
sam list requests --requestIdGt=0

giving
[1, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[2, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[3, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[4, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[5, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[6, 'new', 'mc-request', 1, 'sam', 'test', 10000, 'none']
[7, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[8, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[9, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[10, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[11, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[12, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[13, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[14, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[15, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[16, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[17, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']
[18, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']
[19, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']
[20, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[21, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[22, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[23, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[24, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[25, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[26, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[27, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[28, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[29, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[30, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[31, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[32, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
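Since each row of sam list requests is printed as a Python list literal, the output is easy to post-process. A sketch; the field positions (id first, user fifth) are inferred from the listing above and may not match the real schema:

```python
import ast

# Rows from `sam list requests` as shown above; field positions are
# inferred from the example output: 0=request id, 4=user.
sample = """\
[16, 'new', 'mc-request', 1, 'sam', 'test', 1, 'none']
[17, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']
[18, 'new', 'mc-request', 1, 'lauri', 'test', 1, 'none']"""

rows = [ast.literal_eval(line) for line in sample.splitlines()]
mine = [r[0] for r in rows if r[4] == 'lauri']   # request ids for one user
print(mine)   # -> [17, 18]
```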
This functionality will soon be available on web pages, just as it is in D0. It is also possible to use
sam modify request ...
to modify the number of events, the status of the request, and so on.

top


Sam Locate

You can find every location of a file using the sam locate command. We can use one of the files that came from the sam translate example to show how this works:
[stdenis@nglas05 ~]$ sam locate sm-store-test1.root
which yields
['/pnfs/cdfen/filesets/SM/SMTest,ia3937']
This is dull -- just the location in enstore, along with the "enstore cookie" (don't ask).

Here is a more interesting one (a common file used in all tests, so it gets around):

[stdenis@nglas05 ~]$ sam locate gb01defd.0001exo0
yielding
['/pnfs/cdfen/filesets/GI/GI05/GI0500/GI0500.0,ia3638', 
'ncdf68.fnal.gov:/scratch/sam/cache1/boo', 
'nglas09.fnal.gov:/cdf/scratch/sam/cache/cdfyale/prd/boo', 
'lf7.ph.gla.ac.uk:/localhome/sam/cache1/boo', 
'tuhept.phy.tufts.edu:/home/sam/cache1/boo', 
'cdf3.uchicago.edu:/cdf/data3a/boo', 
'nglas07.fnal.gov:/cdf/scratch/sam/pro/boo', 
'nglas08.fnal.gov:/cdf/scratch/sam/prd/boo', 
'testwulf.hpcc.ttu.edu:/home/sam/cache1/prd/boo',
'matrix.physics.ox.ac.uk:/eweak/disk1/sam/boo', 
'cdfg.ph.gla.ac.uk:/data3/sam/prd/boo', 
'cdf001.ucsd.edu:/cdf/data01/cdf001/cdf-sam/cache/boo', 
'nglas03.fnal.gov:/data3/sam/prd/boo', 
'nglas10.fnal.gov:/data/nglas10/a/sam/prd/boo',
'nglas05.fnal.gov:/data3/sam/pro/boo',
'nglas04.fnal.gov:/data3/sam/pro/boo',
'nglas06.fnal.gov:/data1/sam/prd/boo', 
'fcdfdata016.fnal.gov:/data1/cdf-sam/cache/cdf-scotgrid-2/prd/boo', 
'pccdf2.ts.infn.it:/cdf3/sam_cache/boo', 
'dcap://cdfdca-door01:dcap://cdfdca.fnal.gov:25125/pnfs/fnal.gov/usr//cdfen/filesets/GI/GI05/GI0500/GI0500.0',
'dcap://cdfdca-door03:dcap://cdfdca.fnal.gov:25137/pnfs/fnal.gov/usr//cdfen/filesets/GI/GI05/GI0500/GI0500.0',
'dcap://cdfdca-door02:dcap://cdfdca.fnal.gov:25136/pnfs/fnal.gov/usr//cdfen/filesets/GI/GI05/GI0500/GI0500.0',
'dcap://cdfdca-door04:dcap://cdfdca.fnal.gov:25138/pnfs/fnal.gov/usr//cdfen/filesets/GI/GI05/GI0500/GI0500.0',
'cdfsam.cnaf.infn.it:/cdf/data/data001/SAM-100GB-cache/boo']
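Since sam locate also prints a Python list literal, a script can split the replica list into tape (PNFS), dCache-door (dcap://) and station-cache entries. A sketch using a few of the locations shown above:

```python
import ast

# Output of `sam locate <file>` is a single Python list literal; this is a
# shortened version of the listing above.
locations = ast.literal_eval(
    "['/pnfs/cdfen/filesets/GI/GI05/GI0500/GI0500.0,ia3638', "
    "'ncdf68.fnal.gov:/scratch/sam/cache1/boo', "
    "'dcap://cdfdca-door01:dcap://cdfdca.fnal.gov:25125/pnfs/fnal.gov/usr/"
    "/cdfen/filesets/GI/GI05/GI0500/GI0500.0']")

# Classify each entry by its prefix; the ',ia...' suffix on the PNFS path
# is the enstore cookie mentioned above.
tape   = [l for l in locations if l.startswith('/pnfs/')]
dcache = [l for l in locations if l.startswith('dcap://')]
caches = [l for l in locations if l not in tape and l not in dcache]

print(len(tape), len(dcache), len(caches))   # -> 1 1 1
```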

top


Making a SAM Dataset

To make a SAM dataset, you can either use the command-line tools or the dataset definition GUI. Files are imagined to live in a multidimensional space with various parameters as possible axes; you put constraints on the space to carve out the files you wish to have. A definition is usually made by first checking that you get a sensible list of files with the sam translate command, and then using the sam create dataset definition and sam create dataset commands. If you are confused by the multiple uses of the word "dataset", you should read the section on "Datasets Explained".

The syntax of sam translate has been shown elsewhere. It can then be used to define a dataset as follows:

sam create dataset definition --defname=stink2  --group=test \
    --defdesc='test mc param store with mrenna' \
    --dim='cdf.dataset stink2'
A definition can be examined with
sam translate constraints --dim="DATASET_DEF_NAME stink2"
This gives:
[stdenis@nglas05 ~]$ sam translate constraints --dim="DATASET_DEF_NAME stink2"
Files:
  sm-store-test1.root
  rs-1ev-test-031106-1810.root
  rs-1ev-test-031106-1840.root
  rs-1ev-test-031106-1844.root
  rs-1ev-test-031106-1844-1.root
  rs-1ev-test-031106-1845.root
  rs-1ev-test-031106-1845-1.root
  rs-1ev-test-031106-1847.root
  rs-1ev-test-031106-1855.root
  rs-1ev-test-031106-1910.root

File Count:  10
Average File Size:  300
Total File Size:  3000
Total Event Count:  20


If the result is not what was expected, then the definition can be redone and the previous definition will be overwritten.

After you are satisfied, you can either run a project or use:

sam create dataset --defname=stink2
This ensures that your definition cannot be changed.

top


Datasets Explained: SAM Datasets, CDF DataSets, Datasets, Project Snapshots

Unfortunately, the word "dataset" is heavily overloaded in the data handling world. Furthermore, in the deep dark history of SAM, a change was made in syntax, so several different words refer to the same thing.

First, a cdf dataset corresponds to the more modern (especially in Grid) concept of a "data collection": a group of files that are common in their properties. For our implementation of SAM in CDF, we have maintained this as a parameter, although more sophisticated ways of handling it are being hammered out.

A SAM dataset definition corresponds to a selection of files meeting some criteria set by a variety of parameters that describe them based on the declarations made in the metadata.

A very simple way through the morass is to use the parameter cdf.dataset in defining a sam dataset and that is the end of the story. Using more sophisticated combinations of parameters requires care that one has specified the collection of files uniquely. Tools exist to allow you to examine files you care to inspect, but this is indeed a complex operation.

Once a dataset definition is made, it can be used to specify the files to be delivered to a project. When that delivery has been done, SAM keeps a permanent record of the project that was run, and it is always possible to go back and find out what files were used. This is called a "dataset" or a "project snapshot" within the context of SAM.

When a dataset definition is made, it is possible to immediately take a snapshot of the files that satisfy the requirements specified in the definition with "sam create dataset". Once this is done the definition is frozen. This is useful if you want to make sure that your definition is not modified - by someone else!

top


SAM Datasets: Figuring out what parameters are defined for each file, their values and getting access to metadata

Information on parameters and metadata for datasets is found in Randy's browser under the "sam" reports, and shows up in the pulldown menus as:

Here is a description of how to use them and what they do.

  • SAM:File Parameter Names by Project (aka dataset)

    One is presented with two fields to fill:

    One will want to fill the first with a favorite project definition name. This example uses jbot0h.

    The second is not useful until one has done the query or knows something about one's snapshots. Snapshots and datasets are described in this document.

    If one submits the request after giving a dataset definition, one obtains the following (the example for jbot0h is instructive). For every snapshot, one sees what parameters there are and how many different values they take across all the files. One does NOT see all the files with the parameter next to each one; that would be information overload at this point. The columns obtained are:

    Here is a more detailed description of each column

  • SAM:File Parameter Values by Project (aka dataset)

    In this case, one is presented with the same fields to fill in as before. This time, for the example, jbot0h is chosen and the latest snapshot (28) is entered.

    The resulting report may be found at this URL, where the following columns are returned:

    The only difference from the report above is that the value is given. Where the occurrences are the same for all files, only the single value appears.

    This effectively gives you a way to recover the values of parameters for a dataset definition, and inasmuch as we match sam dataset definitions in cdf to cdf datasets, this gives the parameters defining a dataset.

    Also, for some parameters that carry meaning, Randy has hyperlinked them to more detailed information. Clicking on jbot0h in this example gives a page of all the files that exist in that dataset, details on the location of every file in every cache, and a basic dump of all metadata for each file. Files are listed 125 at a time by default.

    top


    SAM Datasets: Cleaning up after some jobs crash

    Here's how to make a "cleanup" dataset definition:
    sam translate constraints --dim="__set__
    old_dataset_definition_name  minus
    (project_name your_old_project_name and consumed_status consumed
                                                and consumer your_name)"
    
    Some points to note:

    For example, for the dataset jbot0h where the sam project was run at some point, go to the log file to get the project name and then use

    sam translate constraints --dim="__set__ jbot0h minus
    (project_name stdenis_cdf-sam_jbot0h_1078085002.52 and 
    consumed_status consumed and
    consumer stdenis)" 
    
    to see the files missed. Remember that sam translate constraints just lists the files. If all looks good, replace "sam translate constraints" with "sam create dataset definition", add "--group=test --defname=your_new_def_name", and don't forget to do the "sam create dataset" as well, as described above.
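The cleanup dimension string can be assembled programmatically rather than typed by hand. A minimal sketch, using the example project and consumer names from above:

```python
def cleanup_dim(defname, project, consumer):
    """Build the __set__ dimension string for a 'cleanup' dataset:
    files in the old definition minus those already consumed by the
    named project/consumer."""
    return ('__set__ %s minus (project_name %s and '
            'consumed_status consumed and consumer %s)'
            % (defname, project, consumer))

dim = cleanup_dim('jbot0h', 'stdenis_cdf-sam_jbot0h_1078085002.52', 'stdenis')
# Pass the result as:  sam translate constraints --dim="<dim>"
print(dim)
```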

    If you want further checks after you create a dataset, you can look at the parameters and metadata if you follow these instructions.

    top

    File Availability Status and Dimension Queries

    What does file availability status mean? It means that there is AT LEAST ONE accessible location for the file:

    If there are one or more locations, and at least one of them is considered "good", then the file is available.

    What was it used for? It was used because:

    Users were confused when they'd do "translate constraints" and see a list of N files, then run their project and be delivered only M files (M<N), because the constraints they provided did not account for "files with locations" (aka 'file_availability_status'='available').

    In the past, this column was set to 'available' if the file ever received a location (but it was not maintained after that point).

    Another file status value is FILE_CONTENT_STATUS, which is inherent to a file (including any and all replicas) and is a global judgement that the integrity of the file itself is good or bad. This is not a reflection of the quality of the physics contents.

    FILE_CONTENT_STATUS is a function of the file itself.

    FILE_AVAILABILITY_STATUS is a function of all valid sam_locations which might contain copies of the file; the file is available if at least one replica is accessible.
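The two statuses boil down to a simple rule; here is an illustrative sketch of the availability logic (the function and its inputs are hypothetical, not the SAM schema):

```python
def file_available(replica_statuses):
    """FILE_AVAILABILITY_STATUS sketch: a file is 'available' if at least
    one of its replica locations is considered good/accessible.
    `replica_statuses` is a list of booleans, one per location."""
    return 'available' if any(replica_statuses) else 'unavailable'

# One good replica among several bad ones is enough:
print(file_available([False, False, True]))   # -> available
print(file_available([]))                     # -> unavailable (no locations)
```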

    top

    Useful Binary list for SAM/CAF installations

    top

    Adding a Permanent Disk for Storage

    [sam@nglas08 sam]$ samadmin add data disk --fullpath=nglas08.fnal.gov:/data/nglas08/b/sam/perm --size=3G
    Data disk nglas08.fnal.gov:/data/nglas08/b/sam/perm has been registered, id = 269, type = disk
    
    [sam@nglas08 perm]$ samadmin add disk location --fullpath=nglas08.fnal.gov:/data/nglas08/b/sam/perm/perm
    Disk location nglas08.fnal.gov:/data/nglas08/b/sam/perm/perm has been registered: id = 109539, type = 'disk'
    
    
    Note the subtle bug: the fullpath is longer for the disk location than for the data disk!
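Given that trap, it is worth verifying that the disk-location fullpath really extends the data-disk fullpath before registering it. A sketch using the paths from the example above:

```python
import posixpath

data_disk = 'nglas08.fnal.gov:/data/nglas08/b/sam/perm'
disk_loc  = 'nglas08.fnal.gov:/data/nglas08/b/sam/perm/perm'

# The location registered with `samadmin add disk location` should extend
# the data-disk path by at least one extra component.
assert disk_loc.startswith(data_disk + '/'), 'location must live under the data disk'
print(posixpath.relpath(disk_loc, data_disk))   # -> perm
```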

    top

    How to add disk for sam cache returning URL's for DCACHE, HPSS, or AFS

    
    The problem: How to get SAM to recognize a cache based on AFS
    (Posed by Liz Buckley)
    --------------------------------------------------------------
    Think of the current implementation of SAM and dCache. The station is
    configured with SAM cache that is actually dcache doors. When the user
    runs a SAM job the sam_get_next_file returns them a URL to the file's
    location in dcache. No data is moved by SAM. It is up to the user
    application to open the file, at which point it will physically be staged
    into dCache if it isn't there already. SAM does place an entry in its
    database showing that the file now has a dcache location, but it has no
    ownership of the files; they are not "owned" by user SAM.
    
    Now think of AFS. I have a big pool of AFS space with files that are
    already there - sort of like dCache. The files are also in PNFS and will
    be declared to SAM with PNFS locations like any other file. I would
    like to tell the SAM station that this is its "cache". When a user runs a
    SAM job I would like get_next_file to return the AFS path. No data should
    be moved by SAM, like with dCache, the user application will open the file
    because it has AFS available on the node it is running on. SAM doesn't own
    the files in this case either.
    
    Conceptually these two cases seem to me rather similar. Clearly it is
    possible to make this work for dCache disk, so why can't something similar
    be implemented for AFS disk?
    
    
    The Answer
    From Andrew
    ------------------------------
    
    SAM and dCache in its present form exists as a proof of concept that SAM
    can manage and broker different storage elements. The adapter I
    developed to map PNFS location to dCache URL is not universal and
    probably won't even work outside of Fermilab.
    
    Yet there is an ongoing effort to introduce the missing pieces in the
    context of the SAM - SRM integration project.  One of the key concepts
    being developed in SAM at the moment is the borderline between
    protocols that manage files and protocols that access them.  As an
    illustration, think of ways files can be put onto a regular disk using
    ftp or similar but accessed via a mount point on an AFS disk (which may
    have nothing to do with the local filesystem path).
    
    With all that said, here is a kludge:
    The thing you are talking about exists already. It took me quite a while
    to realize it (I had to stop writing email and think a bit).
    The current HPSS adapter works the same way you want the AFS adapter to
    work. It does not add anything extra to an existing location, and it will
    use the existing set of non-cache locations to map to a pseudo cache.
    
    What you need to do.
    
    1) Create a node that is named like rfio://<anything here>.
    2) Add that node to the station: sam add disk --mount=<node name from
    step 1)>:<path that is common to all AFS locations (important)>
    3) Add --constrain-delivery=<path that is common to all AFS
    locations>::<node name from step 1)>
    4) Add --pmaster-arg=--consumption-map=\.\*::<node name from step 1)>
    5) Add --prefer-loc=<path that is common to all AFS locations>
    
    6) Run a stager anywhere. Give it the argument --node-name=<node name from
    step 1)>
    
    7) Run a project. See if the locations delivered are what you expect.
    8) Report any issues.
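
    The steps above can be collected into a small shell sketch. The node
    name and AFS path below are hypothetical placeholders, and the way a
    stager or station is actually launched is site specific, so only the
    arguments named in the steps are assembled here:

```shell
# Sketch of the AFS pseudo-cache setup (steps 1-6 above).
# NODE and AFSPATH are hypothetical placeholders -- substitute your own.
NODE="rfio://afs-pseudo-cache"              # step 1: node named like rfio://<anything>
AFSPATH="/afs/fnal.gov/files/data"          # path common to all AFS locations

ADD_DISK="sam add disk --mount=${NODE}:${AFSPATH}"                             # step 2
STATION_ARGS="--constrain-delivery=${AFSPATH}::${NODE}"                        # step 3
STATION_ARGS="${STATION_ARGS} --pmaster-arg=--consumption-map=\.\*::${NODE}"   # step 4
STATION_ARGS="${STATION_ARGS} --prefer-loc=${AFSPATH}"                         # step 5
STAGER_ARG="--node-name=${NODE}"                                               # step 6: pass to the stager

printf '%s\n' "$ADD_DISK" "$STATION_ARGS" "$STAGER_ARG"
```

    The echoed strings can be pasted into the station and stager start-up
    commands used at your site.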
    
    
    Here is an example for dCache on CDF:

    top

    Breaking up CDF Datasets, or How to Handle Tens of Thousands of Files

    1. Make a dataset definition based on the run range you wish (or any other
    metadata constraint)
    
    Example:  We want 3 run ranges since we need 3 sets of tcl to
    handle things according to rumors we heard.
    
    100045 1 GOOD    for            runs < 149717
    150045 1 GOOD        subsequent runs <= 158635
    160045 1 GOOD        after this
    
    
    sam translate constraints --dim="__SET__ hbhd0c and run_number < 149717"
    sam translate constraints --dim="__SET__ hbhd0c and run_number >= 149717 and
    run_number <= 158635 "
    sam translate constraints --dim="__SET__ hbhd0c and run_number > 158635"
    
    For the first range we get:
    File Count:         1287
    
    For the second:
    File Count:         8141
    
    For the third:
    File Count:         20017
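
    The three dimension strings follow one pattern, so they can be
    generated from the boundary run numbers (149717 and 158635 from the
    text); the dataset name is the hbhd0c example used above:

```shell
# Build the three run-range dimension strings for a dataset split
# at two boundary runs.  DATASET, LO and HI come from the example above.
DATASET="hbhd0c"
LO=149717
HI=158635
DIM1="__SET__ ${DATASET} and run_number < ${LO}"
DIM2="__SET__ ${DATASET} and run_number >= ${LO} and run_number <= ${HI}"
DIM3="__SET__ ${DATASET} and run_number > ${HI}"
printf '%s\n' "$DIM1" "$DIM2" "$DIM3"
```

    Each string is then passed as the --dim argument of
    sam translate constraints.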
    
    2. Calculate your strategy for processing.
    Constraints:
    a. We want our output ntuple to be about 1 GB for efficient storage to tape.
    
       Computation
           The output ntuple from 1 input file is 50 MB, so 100 files give 5 GB. This is probably ok.
    
    b. Make sure you are not running a batch job for more than a day or two.
    
       Computation
           100 files in a segment take 10 hours.  We get 10 segments at once, so we can
           easily process 1000 files, and we will probably get a couple of rounds of
           these.  Let's try 3000 files and 30 segments per job.
    
    Conclusions:
    a. We can use the first dataset definition as it is.
    
    b. We need to divide the second definition into 3 pieces.
    
    c. We need to divide the third definition into 7 pieces.
    
    Therefore we will have 11 jobs to submit, processing a total of
    29445 files and producing 1472.25 GB = 1.47225 TB of output.
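
    A quick way to check the bookkeeping above (the per-range file counts
    are from step 1; the 3000-file chunk size and 50 MB-per-file output
    estimate are from step 2):

```shell
# Verify the job count and output size for the three run ranges.
CHUNK=3000            # files per submitted job
OUT_MB_PER_FILE=50    # estimated ntuple size per input file
JOBS=0
TOTAL=0
for N in 1287 8141 20017; do
    JOBS=$(( JOBS + (N + CHUNK - 1) / CHUNK ))   # ceiling division
    TOTAL=$(( TOTAL + N ))
done
TOTAL_MB=$(( TOTAL * OUT_MB_PER_FILE ))
echo "$JOBS jobs, $TOTAL files, $TOTAL_MB MB"    # 11 jobs, 29445 files, 1472250 MB
```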
    
    
    
    3. Create the first dataset definition. Let's call it bussey-hbhd0c-pass01-01
    sam create dataset definition  \
    --dim="__SET__ hbhd0c and run_number < 149717 " \
    --definitionName="bussey-hbhd0c-pass01-01" \
    --defdesc="hbhd0c run<149717"  \
    --group=test
    
    Make a snapshot: This takes a definition, applies its constraints to the
    metadata of the current set of files, and creates a list that is stored
    in the database.  This is called a snapshot, and once it has been taken,
    the dataset definition cannot be modified.
    
    sam take snapshot  \
    --definitionName="bussey-hbhd0c-pass01-01" \
    --group=test
    
    We get back the snapshot id 9063.
    
    This definition does not need division, so we don't need to do anything more.
    
    
    4. Now create the other two dataset definitions and take snapshots
    
    a. Create the definitions:
    
    sam create dataset definition \
    --definitionName="bussey-hbhd0c-pass01-02" \
    --defdesc="hbhd0c 149717 <= Run <= 158635" \
    --dim="__SET__ hbhd0c and run_number >= 149717 and run_number <= 158635" \
    --group=test
    
    
    sam create dataset definition \
    --definitionName="bussey-hbhd0c-pass01-03" \
    --defdesc="hbhd0c run> 158635" \
    --dim="__SET__ hbhd0c and run_number > 158635" \
    --group=test 
    
    
    b. Take snapshots:
    
    sam take snapshot \
    --definitionName="bussey-hbhd0c-pass01-02" \
    --group=test
    
    Snapshot has been taken, snapshotId = 9064
    
    sam take snapshot \
    --definitionName="bussey-hbhd0c-pass01-03" \
    --group=test
    
    Snapshot has been taken, snapshotId = 9065
    
    
    c. Now divide the second dataset into 3 pieces.
    
    i. First check the file lists:
    sam translate constraints \
    --dim="dataset_def_name bussey-hbhd0c-pass01-02 and snapshot_version 1 and snapshot_file_number 1-3000"
    
    sam translate constraints \
    --dim="dataset_def_name bussey-hbhd0c-pass01-02 and snapshot_version 1 and snapshot_file_number 3001-6000"
    
    sam translate constraints \
    --dim="dataset_def_name bussey-hbhd0c-pass01-02 and snapshot_version 1 and snapshot_file_number 6001-8141"
    
    ii. Now make these into dataset definitions
    sam create dataset definition \
     --definitionName=bussey-hbhd0c-pass01-02-set01 \
     --dim="dataset_def_name bussey-hbhd0c-pass01-02 and snapshot_version 1 and snapshot_file_number 1-3000" \
     --defdesc="hbhd0c 149717 <= Run <= 158635 files 1-3000" \
     --group=test
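
    The snapshot_file_number ranges above follow a simple pattern; a
    sketch that generates them for any file count and chunk size (8141
    files and chunks of 3000, as in this example, with the last range
    capped at the file count):

```shell
# Generate the snapshot_file_number ranges that split a snapshot
# into pieces of CHUNK files each.
N_FILES=8141
CHUNK=3000
RANGES=""
LO=1
while [ "$LO" -le "$N_FILES" ]; do
    HI=$(( LO + CHUNK - 1 ))
    if [ "$HI" -gt "$N_FILES" ]; then HI=$N_FILES; fi
    RANGES="$RANGES ${LO}-${HI}"
    LO=$(( HI + 1 ))
done
RANGES="${RANGES# }"      # strip the leading space
echo "$RANGES"            # 1-3000 3001-6000 6001-8141
```

    Each range goes into the --dim string of one sub-definition, as in
    the sam create dataset definition command above.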
    
    Create the -set02 and -set03 definitions the same way with the other
    two snapshot_file_number ranges, then divide the third dataset into
    7 pieces analogously.
    

    top

    Define a Dataset that combines others

    
    sam translate constraints --dim="__SET__ jbot0h or __SET__ jbot1h"
    
    will give the files in either dataset (the union).  Warning: the operator
    is case sensitive, so OR does not work.  For example,
    sam translate constraints --dim="__SET__ jbot0h" gives 690 files,
    the same with jbot1h gives 41, so the "or" gives 731.
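
    The "or" dimension returns the union of the two datasets' file lists,
    so for datasets that share no files the counts simply add. A quick
    illustration with made-up file names:

```shell
# Union of two disjoint file lists: 690 + 41 = 731 distinct names.
UNION=$( { seq -f "jbot0h_file_%g" 1 690; seq -f "jbot1h_file_%g" 1 41; } | sort -u | wc -l )
UNION=$(( UNION ))    # normalize any whitespace wc may print
echo "$UNION"         # 731
```

    If the two datasets overlapped, the union count would be smaller than
    the sum of the individual counts.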

    top

    Storing Datasets in samV6

    Go to this site for a good example of how to store files with SAM v6.

    top