Reference: Data Access: CASTOR

Last modified: Wed Jan 14 07:11:08 GMT 2009
Nick West
Return to home page

Overview

For an introduction to GRID data access, including file ownership issues, please see Tutorial: Accessing Storage Elements.

CASTOR (or strictly CASTOR2, as it is a complete re-write of CERN's original CASTOR) is the standard SE protocol for RAL. It is a hierarchical storage management (HSM) system developed at CERN for files which may be migrated between front-end disk and back-end tape storage. CASTOR provides a UNIX-like directory hierarchy of file names. This directory structure can be accessed using rfio (Remote File Input/Output) protocols, either at the command level or, for C programs, via function calls. Our service at RAL has an SRM interface which makes it GRID accessible.

Concepts

Storage Classes

Storage Classes are a generic SE concept, much broader than just CASTOR. Basically there are 4 storage classes
disk[0|1]tape[0|1] (sometimes abbreviated to d[0|1]t[0|1])
defined as follows:-

  Storage Class   Characteristics
  disk0tape0      Volatile. Not retained on disk or tape.
  disk1tape0      Retained on disk, not written to tape.
  disk0tape1      Not retained on disk but written to tape.
  disk1tape1      Retained on disk and written to tape.
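As a toy illustration (my own, not a CASTOR API), the four classes reduce to a pair of retention flags:

```python
# Toy illustration (not a CASTOR API): each generic storage class is just
# a pair of retention flags - copy kept on disk, copy written to tape.
STORAGE_CLASSES = {
    "disk0tape0": (False, False),  # volatile: retained nowhere
    "disk1tape0": (True,  False),
    "disk0tape1": (False, True),
    "disk1tape1": (True,  True),
}

def has_permanent_copy(storage_class):
    """True if at least one retained copy (disk or tape) exists."""
    on_disk, on_tape = STORAGE_CLASSES[storage_class]
    return on_disk or on_tape

print(has_permanent_copy("disk0tape0"))  # False - data may simply vanish
print(has_permanent_copy("disk0tape1"))  # True - the tape copy survives
```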

Although it's possible to talk about Storage Classes in a CASTOR context, it isn't necessary; everything can be described in terms of Service Classes and File Classes.

As with all tape-based systems, small files are not stored efficiently, so CASTOR should only be used for storing relatively large files (>10MB).

Garbage Collection

Garbage collection is a poor name for the process (though it sticks) because the data is certainly not garbage - it's usually precious and you want to keep it. It is merely cleared off disk to make room for something else, and you still have it on tape.

Service Classes

A stager service class (svcclass) ties together disk and tape resources and the policies for how those resources should be used. Unlike a File Class, a Service Class isn't permanently associated with a file; instead it is associated with a request. It's possible to have a file rolled out onto tape with one Service Class and then returned to disk in a different Service Class. This provides a way to move files between Disk Pools, but MINOS only have one, which they exclusively own, so it's not clear to me that we would ever want to do this.

There isn't currently a way to list what Service Classes are available to MINOS or get a detailed description of a specific class beyond identifying the associated Disk Pool and disk server using the stager_qry command.

For MINOS, the RAL CASTOR service class is minosTape. In principle there is also genTape, which is shared with other experiments, but it is not clear to me what advantage there would be in using it. Both are of Storage Class disk0tape1.

You may hear the term Space Token. Essentially this is the SRM equivalent of a service class.

File Classes

File Classes are attached to paths in the nameserver and determine whether or not data goes to tape. They provide a way to fine-tune tape migration within a single Service Class.

With the nameserver command nslistclass you can list File Classes, and with the nschclass command you can change the File Class assigned to a file or directory.

In the RAL CASTOR SE, MINOS currently has 2 file classes:-

  ID  Name         Equivalent Storage Class
  18  minos-tape1  disk0tape1
  19  minos-tape0  disk0tape0 (i.e. volatile!)

Disk Pools

A Disk Pool is a collection of disks that collectively are bound to one or more Service Classes.

The stager_qry command can be used to get statistics (the -s option) about either a Service Class or a Disk Pool:-

  stager_qry -s -S minosTape  # Select a Service Class

gives:-

  POOL minosTape        CAPACITY 8.18T      FREE   3.41T(41%)  RESERVED       0( 0%)

while

  stager_qry -s -d minosTape  # Select a Disk Pool

  POOL minosTape        CAPACITY 8.18T      FREE   3.41T(41%)  RESERVED       0( 0%)
    DiskServer gdss336.gridpp.rl.ac.uk DISKSERVER_PRODUCTION   CAPACITY 8.18T      FREE   3.41T(41%)  RESERVED       0( 0%)
       FileSystems                       STATUS                  CAPACITY   FREE          RESERVED       GCBOUNDS
       /exportstage/castor1/             FILESYSTEM_PRODUCTION   2.73T        1.11T(40%)        0( 0%)   0.20, 0.30
       /exportstage/castor2/             FILESYSTEM_PRODUCTION   2.73T        1.13T(41%)        0( 0%)   0.20, 0.30
       /exportstage/castor3/             FILESYSTEM_PRODUCTION   2.72T        1.16T(42%)        0( 0%)   0.20, 0.30

We only have one pool and it has the same name as the Service Class.

The command lists the associated Disk Pool with its capacity and GC bounds (Garbage Collection is triggered when less than the lower limit (0.20) of the space is free and continues until the upper limit (0.30) is free). The default policy is oldest first, with some extra weight given to bigger files.
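A sketch of the GC bounds logic, using the 0.20/0.30 GCBOUNDS shown above (my reading of the trigger/target behaviour, not CASTOR code):

```python
def gc_needed(capacity_tb, free_tb, lower=0.20, upper=0.30):
    """Sketch of GC bounds: collection starts when the free fraction drops
    below the lower bound and frees files until the upper bound is reached.
    Returns the space (TB) the collector would aim to free; 0.0 if idle.
    The 0.20/0.30 defaults are the GCBOUNDS shown by stager_qry above."""
    if free_tb / capacity_tb >= lower:
        return 0.0                         # above the trigger: GC stays idle
    return upper * capacity_tb - free_tb   # space to free to reach upper bound

print(gc_needed(2.73, 1.11))             # 0.0 - ~40% free, no collection
print(round(gc_needed(2.73, 0.40), 3))   # below 20% free: collect ~0.419 TB
```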

Environment Variables

The following Environment Variables are used by the stager_qry, rf* and ns* commands

  Variable            Meaning                 Example Value
  STAGE_SVCCLASS      Service Class           minosTape
  STAGE_HOST          The CASTOR stager host  genstager.ads.rl.ac.uk
  RFIO_USE_CASTOR_V2  Select CASTOR 2         YES (always)
  RFIO_TRACE          Debug RFIO              3
  STAGE_TRACE         Debug the stager        3
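When driving these commands from a script, the same settings can be sketched in Python (values from the table above; in an interactive shell you would simply export them):

```python
import os

# Environment typically set before invoking stager_qry / rf* / ns* commands.
# Values are those quoted in the table above.
env = dict(os.environ,
           STAGE_SVCCLASS="minosTape",           # Service Class to use
           STAGE_HOST="genstager.ads.rl.ac.uk",  # the CASTOR stager host
           RFIO_USE_CASTOR_V2="YES")             # always YES for CASTOR2
# e.g. subprocess.run(["stager_qry", "-s", "-S", "minosTape"], env=env)
print(env["STAGE_SVCCLASS"])
```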

rf* (rfio) and ns* (CASTOR name server) Commands

To check on the client version of CASTOR:-
  castor -v

gives e.g.

  2.1.2.9

There are two families of commands that can be used to access CASTOR locally at RAL: the rf* (rfio) commands and the ns* (CASTOR name server) commands. For remote access to CASTOR see CASTOR Tutorial: Remote Access.

There is heavy overlap between the two sets, for example (nsrm, rfrm) and (nsls, rfdir). Where a choice is possible the recommendation is to use the rf* commands, with the exception of nsls, which has better functionality than rfdir.

These commands don't use GRID certificates; access control is all down to UNIX permissions. When files are created they will be owned by the user running on the UI. If they are subsequently accessed via some GRID service the username will normally be different, but so long as it belongs to the same group, i.e. 'minos', group attributes can be used to control access.

Caution: Commands like rfdir and nsls work with information about the logical file in the nameserver; the file itself may only be on tape. Use the Stager Commands to see if a file is really on disk.

nsls
    List local directory.
      -l       long
      -T       list tape residence, i.e. on tape
      -d       list directory (cf. ls -d)
      --class  list File Class ID
    e.g. nsls --class -l /castor/ads.rl.ac.uk/prod/minos/test

rfdir
    List local or remote directory.
    e.g. rfdir /castor/ads.rl.ac.uk/prod/minos/test

rfrename
    Rename file and/or path to it.
    e.g. rfrename /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root \
                  /castor/ads.rl.ac.uk/prod/minos/test/F00035853_0022.mdaq.root

rfchmod
    Change file permissions.
    e.g. rfchmod 0775 /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root

rfrm
    Remove file. See Caution below.
      -r   recursively (be careful!)

rfmkdir
    Create directory.
      -p   create parent directories if required
    e.g. rfmkdir /castor/ads.rl.ac.uk/prod/minos/test/nwest

rfcp
    Copy file into/out of CASTOR.
    e.g. rfcp /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root /tmp

rfcat
    List file.
    e.g. rfcat /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root > /tmp

nslistclass
    List File Classes.
      --id x    specify class ID
      --name x  specify class name
    e.g. nslistclass --id 18
         nslistclass --name minos-tape1

nschclass class path   (class = ID or name)
    Assign File Class to file or directory. When assigned to a directory it
    applies recursively to all its contents unless overridden by another
    assignment.
    e.g. nschclass 19 /castor/ads.rl.ac.uk/prod/minos/test

Caution
Currently, the command rfrm (and the equivalent nsrm) removes the file from the NS (name server) but not from the stager disk. If the file is in the stager it has to be removed using the Stager Command stager_rm -M before using rfrm. As of CASTOR version 2.1.7 the stager clean-up should be automatic, so it should no longer be necessary to use stager_rm -M.

Stager and Tape Commands

The stager manages the CASTOR disk cache. If the file is not on disk the stager will recall the file from tape, an operation that can take anything from several minutes up to hours. Any RFIO or ROOT command will block until the file is available on disk. The stager_qry command can be used to see if the file is already on disk, and stager_get can be used to pre-stage it.

stager_qry -M hsmfile
    Query file status. Will give an error if the file is not on disk.
    Useful to see if a file really is on disk.
    e.g. stager_qry -M /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root

stager_rm -M hsmfile
    Mark the disk copy of a file as suitable for deletion.
    Won't affect the tape version if it exists.
    e.g. stager_rm -M /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root

stager_get -M hsmfile
    Stage file to disk.
    e.g. stager_get -M /castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root

stager_qry -s -d disk_pool
    Query disk pool status.
    e.g. stager_qry -s -d minosTape

vmgrlisttape
    List CASTOR tapes.
    e.g. vmgrlisttape -P minos                 - summary of all tapes
         vmgrlisttape -P minos -V CS1017 -x    - extended listing of an individual tape

File status is one of:-

  STAGEIN   the  file  is  being  recalled  from tape or being internally
            disk-to-disk copied from another diskpool.

  STAGED    the file has been successfully staged from  tape  and  it  is
            available  on  disk, or the file is only available on disk if
            its associated nbTapeCopies = 0, see  the  nschclass(1castor)
            command.

  STAGEOUT  the file is being staged from client.

  CANBEMIGR the  file  is  on  disk and any transfer from client has been
            completed. The migrator will take it for tape migration.

  INVALID   the file has no valid diskcopy/tapecopy, or  a  failure  happened 
            concerning the file.
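As a rough sketch (my reading of the status list above, not CASTOR code), these statuses can be mapped to whether an open would block on a tape recall:

```python
# Sketch only: statuses under which the data is already on disk, so an
# RFIO or ROOT open will not block on a tape recall.
ON_DISK = {"STAGED", "STAGEOUT", "CANBEMIGR"}

def will_block(status):
    """True if opening the file would wait for a recall; raises for INVALID."""
    if status == "INVALID":
        raise ValueError("no valid disk or tape copy")
    return status not in ON_DISK   # STAGEIN: recall/disk copy in progress

print(will_block("STAGED"))   # False - already on disk
print(will_block("STAGEIN"))  # True - open would block until recall completes
```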

Capacity

At RAL, you can see our capacity (both total and remaining) using the stager_qry command:-
   stager_qry -s -S minosTape 

gives:-

  POOL minosTape        CAPACITY 8.18T      FREE   3.41T(41%)  RESERVED       0( 0%)
You can also check on the backend tape capacity:-
  vmgrlisttape -P minos

gives:-

  CS1003   CS1003 STK_RAL1 500GC    aul minos            453.38GB 20080805 RDONLY
  CS1017   CS1017 STK_RAL1 500GC    al  minos            334.27GB 20080819 RDONLY
  CS1018   CS1018 STK_RAL1 500GC    al  minos            348.43GB 20080820 RDONLY
  CS1019   CS1019 STK_RAL1 500GC    al  minos            472.95GB 20080804 RDONLY
...
Away from RAL you can use lcg-info:-
  lcg-info --vo minos.vo.gridpp.ac.uk --list-se --query 'SEName=RAL-LCG2:srm' --attrs AvailableSpace,UsedSpace,Path
which produces:-
- SE: srm-minos.gridpp.rl.ac.uk
  - AvailableSpace      3747000000
  - UsedSpace           5251000000
  - Path                /castor/ads.rl.ac.uk/prod/minos/tape
Note that the numbers only roughly match those obtained using stager_qry.
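The rough match can be checked by hand. Assuming lcg-info's AvailableSpace is in kB and stager_qry's T suffix means binary terabytes (TiB) - both assumptions on my part - the figures line up:

```python
# Assumption (mine): lcg-info's AvailableSpace is in kB, and stager_qry's
# "T" suffix means binary terabytes (TiB). Under that reading:
available_kb = 3747000000                    # AvailableSpace from lcg-info
free_tib = available_kb * 1000.0 / 1024**4   # kB -> bytes -> TiB
print(round(free_tib, 2))                    # 3.41, matching FREE 3.41T above
```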

You can really go to town and use ldapsearch. There are two commands. For the data about the service:-

ldapsearch -x -h lcgbdii02.gridpp.rl.ac.uk -p 2170 -b "mds-vo-name=local, o=grid" \
    '(&(objectClass=GlueSE)(GlueSEUniqueID=srm-minos.gridpp.rl.ac.uk))'
For the data about the capacity and access:-
ldapsearch -x -h lcgbdii02.gridpp.rl.ac.uk -p 2170 -b "mds-vo-name=local, o=grid" \
    '(&(objectClass=GlueSA)(GlueSALocalID=minos.vo.gridpp.ac.uk:minosTape))'

TRFIOFile

ROOT's TRFIOFile class offers the possibility to read and write files via an rfiod server. At the RAL UI the following works:
"rfio://genstager.ads.rl.ac.uk/?path=/castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root"
It might also be possible to use TCastorFile. One difference between the two is that TRFIOFile talks to a daemon called 'rfiod', part of the CASTOR package, while TCastorFile uses the 'rootd' daemon delivered with ROOT. I am not sure TCastorFile is properly supported at RAL; also, in March 2007, trying to use it left the file in a broken state. Don't try to use TCastorFile yet!
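For scripting, the URL form above can be assembled with a small hypothetical helper (my own; the host and path layout are just those quoted above):

```python
# Hypothetical helper (mine, not part of ROOT or CASTOR): build the rfio
# URL form quoted above, for passing to ROOT's TFile::Open / TRFIOFile.
def rfio_url(path, host="genstager.ads.rl.ac.uk"):
    return "rfio://%s/?path=%s" % (host, path)

print(rfio_url("/castor/ads.rl.ac.uk/prod/minos/test/nwest/F00035853_0022.mdaq.root"))
```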

Stress Test

In March 2007 I did a stress test running N batch jobs on RAL T1 simultaneously loading 11GB of flux files from
  /castor/ads.rl.ac.uk/prod/minos/test/gnumiv19/fluka05_le010z185i
finding the mean elapsed time to complete the load:-
   1 jobs:  Mean time =  1372.0 +-  0.0 Failures:  0 Mean time per flux set = 1372.0
   2 jobs:  Mean time =  2023.5 +- 10.5 Failures:  0 Mean time per flux set = 1011.8
   4 jobs:  Mean time =  2910.0 +- 34.5 Failures:  1 Mean time per flux set =  727.5
  10 jobs:  Mean time =  5479.5 +- 83.1 Failures:  5 Mean time per flux set =  548.0
  20 jobs:  Mean time =  8734.5 +-120.8 Failures: 14 Mean time per flux set =  436.7
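The "Mean time per flux set" column is just the mean elapsed time divided by the number of simultaneous jobs:

```python
# Reproduce the "Mean time per flux set" column from the stress test:
# mean elapsed time divided by the number of simultaneous jobs.
mean_times = {1: 1372.0, 2: 2023.5, 4: 2910.0, 10: 5479.5, 20: 8734.5}
per_set = {n: t / n for n, t in mean_times.items()}
print(per_set[1])   # 1372.0 - a single job
print(per_set[2])   # 1011.75 - quoted as 1011.8 in the table
```

Per-set time keeps falling as jobs are added, so aggregate throughput improves, but sub-linearly, and the failure count grows with load.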
Conclusions

On the last point, dealing with error recovery, an interesting point was raised on the CASTOR support email list by Tim Folkes:-
  As Glen said, castor generates a checksum when the data is written to
  tape, stored in the database, and checked when the file is read off
  tape.

  I'm not sure if there is any checking done as it is written/read
  to/from disk.

  Not easy to find a file's checksum.  No option on nsls, but there is
  on nslisttape.  So it's a two-stage query -

  [root@castor200 rtcopy]# nsls -lT /castor/ads.rl.ac.uk/test/gtf/test003
  - 1   1 CS1109       3 0000200f           2147483648  99 
  /castor/ads.rl.ac.uk/test/gtf/test003
  [root@castor200 rtcopy]# nslisttape -V CS1109 --checksum | grep test003
  - 1   1 CS1109       3 0000200f           2147483648  99         adler32 
  7cb3c451 /castor/ads.rl.ac.uk/test/gtf/test003

Dave Newbold: 

    The secret incantation nsls -T --checksum seems to do something.
