History for Storing Files at D0

Before starting

  Storing files into SAM normally involves putting them into the tape system (enstore). This restricts the sort of files you should store: tape is most efficient in both space usage and access time when the files are as large as possible. Aim to make the files you store as big as you can, preferably at least 1GB. Smaller files should be merged together before being put on tape.
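
  As a rough illustration of the size guideline above, the following sketch splits a list of candidate files into those large enough to store directly and those that should be merged first. The 1GB threshold comes from the text; the helper name 'split_by_size' and the reading of 1GB as 10**9 bytes are assumptions made for this example.

```python
import os

# Assumption: "1GB" is read as 10**9 bytes; use 2**30 if you prefer.
MIN_PREFERRED_SIZE = 10**9


def split_by_size(paths, threshold=MIN_PREFERRED_SIZE, getsize=os.path.getsize):
    """Return (large_enough, too_small) lists of paths, split at threshold bytes."""
    large_enough, too_small = [], []
    for path in paths:
        if getsize(path) >= threshold:
            large_enough.append(path)
        else:
            too_small.append(path)
    return large_enough, too_small
```

  Files returned in the second list are candidates for merging before they go to tape. The getsize argument exists only so that the function can be tried without real files.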

SAM Metadata

  Every file stored in SAM is described by metadata which contains information about the contents of the file. When you store a file into SAM you must provide the metadata for each file. Metadata may be created for you by the tools you use to create the files, or in some cases you may need to write it yourself. You can view the metadata for any file already in SAM with the command 'sam get metadata --file=filename'.

File types

  Each file has a particular file type associated with it. The type of the file determines what other metadata fields are required for that file. You can view the requirements for a file type with the command 'sam describe metadata requirements --fileType=filetype'.

  nonPhysicsGeneric -- for archived log files, executable tar files, and similar. The required fields are fileName, fileSize, and group.

  physicsGeneric -- for physics data files that don't fit in any other category. In addition to the requirements for nonPhysicsGeneric, it also requires dataTier.

  importedSimulated -- for MC files which are imported (rather than being the results of running a SAM processing job), such as the MC generator output. In addition to physicsGeneric, requires applicationFamily, firstEvent, lastEvent, and eventCount.

  derivedSimulated -- for MC files which are the result of a SAM processing job. In addition to physicsGeneric, requires eventCount, firstEvent, lastEvent, processId, and parents.

  derivedDetector -- for files which are the result of processing real data files through a SAM processing job. In addition to physicsGeneric, requires eventCount, firstEvent, lastEvent, processId, parents, runDescriptorList, and datastream.

  In addition to the required attributes the metadata may include any of the other attributes if they are relevant to the file.
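
  The requirements listed above can be written down as a small lookup table, which is handy for a quick local check before running the real validation. This is only a sketch transcribed from the descriptions on this page; the names 'REQUIRED_FIELDS' and 'missing_fields' are made up for the example, and the authoritative list is still what 'sam describe metadata requirements' reports.

```python
# Required fields per file type, transcribed from the descriptions above.
_NON_PHYSICS = {"fileName", "fileSize", "group"}
_PHYSICS = _NON_PHYSICS | {"dataTier"}

REQUIRED_FIELDS = {
    "nonPhysicsGeneric": _NON_PHYSICS,
    "physicsGeneric": _PHYSICS,
    "importedSimulated": _PHYSICS
    | {"applicationFamily", "firstEvent", "lastEvent", "eventCount"},
    "derivedSimulated": _PHYSICS
    | {"eventCount", "firstEvent", "lastEvent", "processId", "parents"},
    "derivedDetector": _PHYSICS
    | {"eventCount", "firstEvent", "lastEvent", "processId", "parents",
       "runDescriptorList", "datastream"},
}


def missing_fields(metadata):
    """Return the sorted required field names absent from a metadata dict."""
    file_type = metadata.get("fileType")
    if file_type not in REQUIRED_FIELDS:
        raise ValueError("unknown fileType: %r" % (file_type,))
    return sorted(REQUIRED_FIELDS[file_type] - set(metadata))
```

  Here metadata is a plain dictionary with the same keys as the description file, not a SAM metadata object.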

Data tiers

  The data tier describes what general class of data the file being stored belongs to, for example, raw, generated, thumbnail, root. You can use 'sam get registered data tiers' to get a list of all the valid tiers. For storing personal (as opposed to production) files you should use one of the bygroup tiers (such as root-tree-bygroup) and set the group attribute to the group you are storing the data for. This makes it easier to avoid confusion between your files and official production ones.
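
  The advice for personal files can be checked with a few lines of code. The sketch below assumes that bygroup tiers can be recognised by a '-bygroup' suffix, as in root-tree-bygroup; that naming rule is a guess based on the one example given here, and 'check_personal_tier' is a hypothetical helper. The list from 'sam get registered data tiers' remains the reference.

```python
def check_personal_tier(metadata):
    """Return a list of problems with the tier/group choice for a personal file."""
    problems = []
    tier = metadata.get("dataTier", "")
    # Assumption: bygroup tiers end in "-bygroup", as in 'root-tree-bygroup'.
    if not tier.endswith("-bygroup"):
        problems.append("dataTier %r is not a bygroup tier" % (tier,))
    # The group attribute must name the group the data is stored for.
    if not metadata.get("group"):
        problems.append("group is not set")
    return problems
```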

Creating the metadata

  For simple file stores the metadata for a file must be provided in the form of a "python":http://www.python.org file (beware: leading whitespace is significant in python code). When a job is run using !SamManager it automatically creates metadata for the job output files. You may be able to use this as is, or you may have to customise it to better describe the files (remember, the more descriptive the metadata you provide, the easier it is to get the files you want, and only those files, back out of SAM). The metadata can be validated with the command 'sam verify metadata --descriptionFile=metadata_file.py'. Examples of metadata are at the end of this page.
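
  If you need to write the metadata yourself, generating the file from a script avoids whitespace mistakes. The following is a minimal sketch that writes text in the same layout as the examples on this page; 'Raw' and 'render_description' are hypothetical helpers, not part of SAM, and the result should still be checked with 'sam verify metadata'.

```python
class Raw(object):
    """Wrap a string that should be written out verbatim rather than quoted."""

    def __init__(self, expression):
        self.expression = expression


def render_description(class_name, fields):
    """Return description-file text for the given class name and field list.

    fields is a list of (name, value) pairs. Plain values are written with
    repr(); Raw values are written as-is, e.g. Raw("SamSize('31.84MB')").
    """
    lines = [
        "from SamFile.SamDataFile import *",
        "",
        "TheFile = %s({" % class_name,
    ]
    for name, value in fields:
        text = value.expression if isinstance(value, Raw) else repr(value)
        lines.append("    %r : %s," % (name, text))
    lines.append("})")
    return "\n".join(lines) + "\n"
```

  For example, render_description("NonPhysicsGenericFile", [("fileName", "an_example_logfile"), ("fileSize", Raw("SamSize('31.84MB')")), ("group", "dzero")]) returns text that can be written to metadata_file.py.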

  It is possible to declare a file to SAM without actually storing it (for example, unmerged files which are needed to track file parentage, but are not retained permanently). To do this, use the 'sam declare file' command.

Storing the file

  On clued0 only certain nodes are able to store files into the tape system. Currently it is possible on jetsam-clued0 and flotsam-clued0. It is better to avoid running the copies over NFS as this has poor performance and appears to give a greater chance of corrupting the file. Instead, copy the files onto the node's /work disk, store the file and then delete the local copy.
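
  The copy, store, delete sequence described above might be scripted along these lines. This is a sketch only: 'store_from_local_copy' is a hypothetical helper, and the store argument is a function you supply that performs the actual store and returns only once it has finished (for example by running 'sam store' with '--waitForCompletion'), since the local copy is deleted as soon as it returns.

```python
import os
import shutil


def store_from_local_copy(source, work_dir, store):
    """Copy source into work_dir, call store(local_path), then delete the copy.

    store is a caller-supplied function that performs the actual store. The
    local copy is removed whether or not the store succeeds; the original
    source file is left untouched.
    """
    local_path = os.path.join(work_dir, os.path.basename(source))
    shutil.copy2(source, local_path)
    try:
        return store(local_path)
    finally:
        os.remove(local_path)
```

  Here work_dir would be a directory on the node's /work disk.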

  To store the file, log into the appropriate node and run the command::

    sam store --sourceFile=path/to/data/datafile --descriptionFile=metadata_file.py

  If you want it to wait until the store has completed, add the flag '--waitForCompletion'. Otherwise you can check on the current status with the command 'sam get file transfer request status'. A pending store can be cancelled with 'sam request file transfer'. If a store fails or is cancelled, you can retry it with::

    sam store --sourceFile=path/to/data/datafile --resubmit
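
  When stores are driven from a script, it can help to build the command line in one place. The sketch below uses only the options shown above; the function name 'sam_store_command' is made up for this example, and it leaves out '--waitForCompletion' on a resubmit because this page does not say whether the two can be combined.

```python
def sam_store_command(source_file, description_file=None, wait=False, resubmit=False):
    """Return the 'sam store' argument list for the cases described above."""
    if resubmit:
        # Retrying a failed or cancelled store: no description file is given.
        return ["sam", "store", "--sourceFile=%s" % source_file, "--resubmit"]
    if description_file is None:
        raise ValueError("a description file is needed for a new store")
    command = [
        "sam", "store",
        "--sourceFile=%s" % source_file,
        "--descriptionFile=%s" % description_file,
    ]
    if wait:
        # Block until the store has completed.
        command.append("--waitForCompletion")
    return command
```

  The returned list can be passed to subprocess.check_call on a node that is able to store files.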

v5 metadata

  Many releases and tools at D0 still produce the SAM v5 metadata format. Metadata files in this format begin like::

    from import_classes import *
    TheFile = ProcessedFile(

  These cannot be stored using the default sam commands. If the metadata is the output from a !SamManager job you can convert it to the new format with::

    setup d0sam_utils
    convert_sam_metadata --group=mygroup v5metadata_file.py

  (This does not work properly for MC files produced by mcrunjob.)

  Alternatively you can still use the old file store method on clued0 by doing::

    setup sam -o -q prd
    sam store --station=clued0-v5 ...
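
  To decide which of the routes above applies to a given file, a script can look at how the metadata file begins. This is a minimal sketch based only on the v5 opening line shown in this section; 'looks_like_v5_metadata' is a hypothetical helper, and the idea of skipping leading blank and comment lines is an assumption.

```python
def looks_like_v5_metadata(text):
    """Return True if the first non-blank, non-comment line is the v5 import."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # v5 files begin with "from import_classes import *".
        return stripped.split() == ["from", "import_classes", "import", "*"]
    return False
```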

Advanced techniques

  Both the metadata objects and the sam store functionality are exposed as part of the sam API. You can write python programs which create or modify the metadata and store the file all in one operation. See the "API interface documentation":http://d0db-prd.fnal.gov/sam_user_api/sam_API.html

Example metadata

  Example nonPhysicsGeneric metadata::

    from SamFile.SamDataFile import *

    TheFile = NonPhysicsGenericFile({
              'fileName' : 'an_example_logfile',
              'fileType' : 'nonPhysicsGeneric',
              'fileSize' : SamSize('31.84MB'),
     'fileContentStatus' : 'good',
                 'group' : 'dzero',
    })

  Example derivedDetector metadata::

    from SamFile.SamDataFile import *

    TheFile = DerivedDetectorFile({
             'fileName' : 'some_CAF_file.root',
             'fileType' : 'derivedDetector',
             'fileSize' : SamSize('1007.09MB'),
    'fileContentStatus' : 'good',
           'eventCount' : 29218,
             'dataTier' : 'root-tree-bygroup',
           'firstEvent' : 74820835,
            'lastEvent' : 15700280,             
            'startTime' : SamTime(1140985720.0),
              'endTime' : SamTime(1140996049.0),
            'processId' : 8921387,
                'group' : 'dzero',
              'parents' : ['CSskim-MUinclusive-20060208-140843-2018302_p17.09.03', 'CSskim-MUinclusive-20060208-153319-2019474_p17.09.03'],
           'datastream' : 'all',
    'lumBlockRangeList' : [LumBlockRange(3495926, 4172312)],
    'runDescriptorList' : [RunDescriptor(runType='physics data taking', runNumber=195737), RunDescriptor(runType='physics data taking', runNumber=208918)],
    })

