CHAPTER 1 Introduction

File I/O

A variable can be obtained from a file or collection of files, or can be generated as the result of a computation. Files can be in any of the self-describing formats netCDF, HDF, GrADS/GRIB (GRIB with a GrADS control file), or PCMDI DRS. (Depending on your local installation, HDF and DRS may or may not be enabled.) For instance, to read data from file sample.nc into variable u:

>>> import cdms

>>> f = cdms.open('sample.nc')

>>> u = f('u')

Data can be read by index or by world coordinate values. The following reads the n-th timepoint of u (the syntax slice(i,j) refers to indices k such that i <= k < j):

>>> u0 = f('u',time=slice(n,n+1))

and this reads u at time 366.0:

>>> u1 = f('u',time=366.)
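
The difference between the two selection styles can be sketched in plain Python, without CDMS. The axis values and both helper functions below are hypothetical, and the coordinate lookup accepts exact matches only, which is a simplification:

```python
# Hypothetical time axis: the coordinate value stored at each index.
time_axis = [0.0, 183.0, 366.0]

def select_by_index(values, index_slice):
    """Return the values whose indices k satisfy start <= k < stop."""
    return values[index_slice]

def select_by_coordinate(values, coordinate):
    """Return a one-element list holding the point at the given coordinate."""
    k = values.index(coordinate)   # raises ValueError if no exact match
    return values[k:k + 1]

n = 1
by_index = select_by_index(time_axis, slice(n, n + 1))   # the n-th timepoint
by_coord = select_by_coordinate(time_axis, 366.0)        # the point at time 366.0
print(by_index)
print(by_coord)
```

Both calls return a single timepoint; only the way that timepoint is named differs.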

A variable can be written to a file with the write function:

>>> g = cdms.open('sample2.nc','w')

>>> g.write(u)

<Variable: u, file: sample2.nc, shape: (1, 16, 32)>

>>> g.close()

Attributes

As mentioned above, variables can have associated attributes. In fact, nearly all CDMS objects can have associated attributes, which are accessed using the Python dot notation:

>>> u.units='m/s'

>>> print u.units

m/s

Attribute values can be strings, scalars, or 1-D Numeric arrays.

When a variable is written to a file, not all the attributes are written. Some attributes, called internal attributes, are used for bookkeeping, and are not intended to be part of the external file representation of the variable. In contrast, external attributes are written to an output file along with the variable. By default, when an attribute is set, it is treated as external. To see the list of external attribute names:

>>> print u.attributes.keys()

['datatype', 'name', 'missing_value', 'units']

The Python dir function lists the internal attribute names:

>>> dir(u)

['_MaskedArray__data', '_MaskedArray__fill_value', ..., 'id', 'parent']

In general, internal attributes should not be modified directly. One exception is the id attribute, the name of the variable. It is used in plotting and I/O, and can be set directly.
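
One way to picture the split between internal and external attributes is a toy class that routes ordinary attribute assignments into an attributes dictionary. The class SketchVariable is hypothetical and is not how CDMS is implemented; it only mirrors the behavior described above:

```python
class SketchVariable(object):
    """Toy stand-in for a variable; NOT the CDMS implementation."""

    def __init__(self, name):
        # Write through __dict__ so these count as internal bookkeeping.
        self.__dict__['id'] = name
        self.__dict__['attributes'] = {}

    def __setattr__(self, key, value):
        if key == 'id':
            self.__dict__['id'] = value      # internal, but settable directly
        else:
            self.attributes[key] = value     # treated as external by default

    def __getattr__(self, key):
        # Called only when normal lookup fails, i.e. for external attributes.
        try:
            return self.__dict__['attributes'][key]
        except KeyError:
            raise AttributeError(key)

u = SketchVariable('u')
u.units = 'm/s'
u.id = 'u_wind'
external_names = sorted(u.attributes.keys())
print(external_names)
```

In this sketch only units ends up in the external list, while id stays out of it even though it was set with the same dot notation.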

Masked values

Variables can have an optional mask which represents a portion of data that is missing. If present, the mask of a variable is an array of ones and zeros, of the same shape as the data array. A mask value of one indicates that the corresponding data array element is missing or invalid.

Arithmetic operations in CDMS take missing data into account. The same is true of the functions defined in the cdms.MV module. For example:

>>> from cdms import MV

>>> a = MV.array([1,2,3]) # Create array a, with no mask

>>> b = MV.array([4,5,6]) # Same for b

>>> a+b

variable_13

array([5,7,9,])

>>> a[1]=MV.masked # Mask the second value of a

>>> a.mask() # View the mask

[0,1,0,]

>>> a+b # The sum is masked also

variable_14

array(

data = [5,0,9,],

mask = [0,1,0,],

fill_value=[0,]

)
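
The rule behind this result can be sketched in plain Python: an element of the sum is masked whenever either operand is masked. The function masked_add is hypothetical and works on plain lists rather than MV arrays:

```python
def masked_add(data_a, mask_a, data_b, mask_b, fill=0):
    """Add element-wise; a result element is masked if either input is masked."""
    data, mask = [], []
    for x, mx, y, my in zip(data_a, mask_a, data_b, mask_b):
        m = 1 if (mx or my) else 0
        mask.append(m)
        # Masked positions carry the fill value instead of a sum.
        data.append(fill if m else x + y)
    return data, mask

total, total_mask = masked_add([1, 2, 3], [0, 1, 0], [4, 5, 6], [0, 0, 0])
print(total)
print(total_mask)
```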

When data is read from a file, the result variable is masked if the file variable has a missing_value attribute. The mask is set to one for those elements equal to the missing value, zero elsewhere. If no such attribute is present in the file, the result variable is not masked.

When a variable with masked values is written to a file, data values with a corresponding mask value of one are set to the value of the variable's missing_value attribute. The data and missing_value attribute are then written to the file.
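
The read and write rules can be sketched as two small functions on plain lists. The function names and the value 1.0e20 are assumptions chosen for illustration, not part of CDMS:

```python
def mask_on_read(raw, missing_value):
    """Mask is 1 where the stored value equals missing_value, else 0."""
    return [1 if x == missing_value else 0 for x in raw]

def fill_on_write(data, mask, missing_value):
    """Replace masked elements by missing_value before writing."""
    return [missing_value if m else x for x, m in zip(data, mask)]

MISSING = 1.0e20                  # hypothetical missing_value attribute
stored = [2.5, MISSING, 4.0]      # values as they might sit in a file
mask = mask_on_read(stored, MISSING)
# Whatever value sits under the mask in memory is replaced on write.
written = fill_on_write([2.5, -999.0, 4.0], mask, MISSING)
print(mask)
print(written)
```

Reading and then writing is a round trip in this sketch: the written values equal the stored ones.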

Masking is covered in "MV module". Also see the documentation on the Python Numeric and MA modules, on which cdms.MV is based, at http://numpy.sourceforge.net.

File Variables

A variable can be obtained from a file, from a collection of files, or as the result of a computation. Correspondingly, there are three types of variables in CDMS:

A file variable is a variable associated with a single data file. Setting or referencing a file variable generates I/O operations.
A dataset variable is a variable associated with a collection of files. Reference to a dataset variable reads data, possibly from multiple files. At the time of writing, dataset variables are read-only.
A transient variable is not associated with a file or dataset. The examples to this point illustrate this type of variable. Transient variables result from a computation or I/O operation.

A typical use of file variables is to query information about variables in a file without actually reading their data. A file variable is obtained by applying the index operator [] to a file with the name of the variable, or by calling the getVariable function. Note that obtaining a file variable does not actually read the data array:

>>> f = cdms.open('sample.nc','r+')

>>> u = f.getVariable('u') # or u=f['u']

>>> u.shape

(3, 16, 32)

File variables are also useful for fine-grained I/O. They behave like transient variables, but operations on them also affect the associated file. Specifically:

slicing a file variable reads data,
setting a slice writes data,
referencing an attribute reads the attribute,
setting an attribute writes the attribute,
and calling a file variable like a function reads data associated with the variable:

>>> f = cdms.open('sample.nc','r+') # Open read/write

>>> uvar = f['u'] # Note square brackets

>>> uvar.shape

(3, 16, 32)

>>> u0 = uvar[0] # Reads data from sample.nc

>>> u0.shape

(16, 32)

>>> uvar[1]=u0 # Writes data to sample.nc

>>> uvar.units # Reads the attribute

'm/s'

>>> uvar.units='meters/second' # Writes the attribute

>>> u24 = uvar(time=24.0) # Reads data

>>> f.close() # Save changes to sample.nc (I/O may be buffered)
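
The behavior listed above resembles a proxy object that holds no data of its own. The following sketch uses two hypothetical classes and an in-memory dictionary in place of a real file, and counts the simulated reads and writes:

```python
class SketchFile(object):
    """In-memory stand-in for a data file; counts reads and writes."""
    def __init__(self, arrays):
        self.arrays = arrays
        self.reads = 0
        self.writes = 0

class SketchFileVariable(object):
    """Holds no data itself; every slice goes to the backing file."""
    def __init__(self, backing, name):
        self.backing = backing
        self.name = name

    def __getitem__(self, index):
        self.backing.reads += 1
        return self.backing.arrays[self.name][index]

    def __setitem__(self, index, value):
        self.backing.writes += 1
        self.backing.arrays[self.name][index] = value

f = SketchFile({'u': [[1, 2], [3, 4], [5, 6]]})
uvar = SketchFileVariable(f, 'u')   # obtaining the variable reads nothing
reads_before = f.reads
u0 = uvar[0]                        # slicing triggers one read
uvar[1] = u0                        # setting a slice triggers one write
print(f.arrays['u'])
```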

In an interactive application, the type of variable can be determined simply by printing the variable:

>>> rlsf # Transient variable

rls

array(

array (4,48,96) , type = f, has 18432 elements)

>>> rlsg # Dataset variable

<Variable: rls, dataset: mri_perturb, shape: (4, 46, 72)>

>>> prc # File variable

<Variable: prc, file: testnc.nc, shape: (16, 32, 64)>

Note that the data values themselves are not printed. For transient variables, the data is printed only if the number of elements in the array is less than the print limit. The limit can be raised with the function MV.set_print_limit to force the data to be printed:

>>> smallvar.size() # Number of elements

20

>>> MV.get_print_limit() # Current limit

300

>>> smallvar

small variable

array(

[[ 0., 1., 2., 3.,]

[ 4., 5., 6., 7.,]

[ 8., 9., 10., 11.,]

[ 12., 13., 14., 15.,]

[ 16., 17., 18., 19.,]])

>>> largevar.size()

400

>>> largevar

large variable

array(

array (20,20) , type = d, has 400 elements)

>>> MV.set_print_limit(500) # Reset the print limit

>>> largevar

large variable

array(

[[ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19.,]

... ])
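
The print-limit rule can be sketched as a function that chooses between a full listing and a summary. The function sketch_repr and its output strings are hypothetical; they only imitate the rule that data is shown when the element count is below the limit:

```python
def sketch_repr(values, shape, print_limit=300):
    """Show the data only when the element count is below print_limit."""
    size = 1
    for n in shape:
        size *= n
    if size < print_limit:
        return 'array(%r)' % (values,)
    dims = ','.join(str(n) for n in shape)
    return 'array (%s), has %d elements' % (dims, size)

small = sketch_repr(list(range(20)), (5, 4))                # below the limit
large = sketch_repr(list(range(400)), (20, 20))             # summary only
large_shown = sketch_repr(list(range(400)), (20, 20), print_limit=500)
print(large)
```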

The datatype of the variable is determined with the typecode function:

>>> x.typecode()

'd'

Dataset Variables

The third type of variable, a dataset variable, is associated with a dataset, a collection of files that is treated as a single file. A dataset is created with the cdscan utility. This generates an ASCII metafile that describes how the files are organized, and what metadata is contained in the files. In a climate simulation application, a dataset usually represents the data generated by one run of a general circulation or coupled ocean-atmosphere model.

For example, suppose data for variables u and v are stored in six files: u_2000.nc, u_2001.nc, u_2002.nc, v_2000.nc, v_2001.nc, and v_2002.nc. A metafile can be generated with the command:

% cdscan -x cdsample.xml [uv]*.nc
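
The shell expands the pattern [uv]*.nc before cdscan runs. To check which names such a pattern selects, Python's fnmatch module applies the same wildcard rules; the last two names in the list are hypothetical extras added to show what is excluded:

```python
import fnmatch

names = ['u_2000.nc', 'u_2001.nc', 'u_2002.nc',
         'v_2000.nc', 'v_2001.nc', 'v_2002.nc',
         'w_2000.nc', 'readme.txt']   # last two are hypothetical extras

# [uv] matches a single leading 'u' or 'v'; * matches the rest up to '.nc'.
matched = sorted(fnmatch.filter(names, '[uv]*.nc'))
print(matched)
```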

The metafile cdsample.xml is then used like an ordinary data file:

>>> f = cdms.open('cdsample.xml')

>>> u = f('u')

>>> u.shape

(3, 16, 32)
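
Conceptually, reading a dataset variable joins the pieces held in the individual files. The sketch below assumes each yearly file holds one timepoint on a 16x32 grid, which is consistent with the shape shown above but is not stated in the example; the function names are hypothetical:

```python
def read_dataset_variable(file_contents, file_order):
    """Concatenate the time slices of each file, in order, into one list."""
    combined = []
    for name in file_order:
        combined.extend(file_contents[name])
    return combined

def one_timepoint(value):
    """Build a hypothetical file payload: one timepoint on a 16x32 grid."""
    return [[[value] * 32 for _ in range(16)]]

contents = {
    'u_2000.nc': one_timepoint(0.0),
    'u_2001.nc': one_timepoint(1.0),
    'u_2002.nc': one_timepoint(2.0),
}
u = read_dataset_variable(contents, sorted(contents))
shape = (len(u), len(u[0]), len(u[0][0]))
print(shape)
```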

Grids and Regridding

Latitude-longitude grids are used for regridding variables. A grid encapsulates:

latitude, longitude coordinates
grid cell boundaries
area weights
data ordering

For example, to regrid variable u to a 96x192 Gaussian grid:

>>> u = f('u')

>>> u.shape

(3, 16, 32)

>>> t63_grid = cdms.createGaussianGrid(96)

>>> u63 = u.regrid(t63_grid)

>>> u63.shape

(3, 96, 192)
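
The shape change can be illustrated with a deliberately crude resampling in plain Python: each target cell copies the source cell that its index falls into. This is a stand-in chosen for brevity, not the method used by the CDMS regridder, and regrid_nearest is a hypothetical name:

```python
def regrid_nearest(field, nlat_out, nlon_out):
    """Resample a 2-D lat-lon list of lists by copying the enclosing source cell."""
    nlat_in, nlon_in = len(field), len(field[0])
    return [[field[(i * nlat_in) // nlat_out][(j * nlon_in) // nlon_out]
             for j in range(nlon_out)]
            for i in range(nlat_out)]

# A hypothetical variable of shape (3, 16, 32): time, latitude, longitude.
u = [[[float(t)] * 32 for _ in range(16)] for t in range(3)]
# Regridding acts on the latitude-longitude slab of each timepoint.
u63 = [regrid_nearest(slab, 96, 192) for slab in u]
shape = (len(u63), len(u63[0]), len(u63[0][0]))
print(shape)
```

The time dimension is untouched; only the two horizontal dimensions take the size of the target grid.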

To regrid a variable uold to the same grid as variable vnew:

>>> uold.shape

(3, 16, 32)

>>> vnew.shape

(3, 96, 192)

>>> t63_grid = vnew.getGrid() # Obtain the grid for vnew

>>> u63 = uold.regrid(t63_grid)

>>> u63.shape

(3, 96, 192)

Regridding is discussed in "Regridding Data".

Time types

CDMS provides extensive support for time values in the cdtime module. cdtime also defines a set of calendars, specifying the number of days in a given month.

Two time types are available: relative time and component time. Relative time is time relative to a fixed base time. It consists of:

a units string, of the form "units since basetime", and
a floating-point value

For example, the time "28.0 days since 1996-1-1" has value=28.0 and units="days since 1996-1-1". To create a relative time type:

>>> import cdtime

>>> rt = cdtime.reltime(28.0, "days since 1996-1-1")

>>> rt

28.00 days since 1996-1-1

>>> rt.value

28.0

>>> rt.units

'days since 1996-1-1'
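
To see what a relative time denotes, the units string can be split into a unit and a base date. The sketch below uses the standard datetime module rather than cdtime, handles only days, and assumes the ordinary Gregorian calendar; both function names are hypothetical:

```python
import datetime

def parse_units(units):
    """Split 'days since 1996-1-1' into ('days', date(1996, 1, 1))."""
    unit, since, base = units.split(None, 2)
    if since != 'since':
        raise ValueError('expected "<unit> since <basetime>"')
    year, month, day = [int(part) for part in base.split('-')]
    return unit, datetime.date(year, month, day)

def reltime_to_date(value, units):
    """Convert a relative time in days to a date (fractions of a day are dropped)."""
    unit, base = parse_units(units)
    if unit != 'days':
        raise ValueError('this sketch handles only days')
    return base + datetime.timedelta(days=value)

d = reltime_to_date(28.0, 'days since 1996-1-1')
print(d)
```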

A component time consists of the integer fields year, month, day, hour, minute, and the floating-point field second. For example:

>>> ct = cdtime.comptime(1996,2,28,12,10,30)

>>> ct

1996-2-28 12:10:30.0

>>> ct.year

1996

>>> ct.month

2

The conversion functions tocomp and torel convert between the two representations. For instance, suppose that the time axis of a variable is represented in units "days since 1979". To find the coordinate value corresponding to January 1, 1990:

>>> ct = cdtime.comptime(1990,1)

>>> rt = ct.torel("days since 1979")

>>> rt.value

4018.0
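
The value 4018.0 can be cross-checked with the standard datetime module, assuming the ordinary Gregorian calendar and a base time of January 1, 1979. The helper days_since is hypothetical and is not part of cdtime:

```python
import datetime

def days_since(year, month, day, base_year, base_month=1, base_day=1):
    """Number of days from the base date to the given date, as a float."""
    target = datetime.date(year, month, day)
    base = datetime.date(base_year, base_month, base_day)
    return float((target - base).days)

# Eleven years, three of them leap years (1980, 1984, 1988).
value = days_since(1990, 1, 1, 1979)
print(value)
```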

Time values can be used to specify intervals of time to read. The syntax time=(c1,c2) specifies that data should be read for times t such that c1<=t<=c2:

>>> c1 = cdtime.comptime(1990,1)

>>> c2 = cdtime.comptime(1991,1)

>>> ua = f['ua']

>>> ua.shape

(480, 17, 73, 144)

>>> x = ua.subRegion(time=(c1,c2))

>>> x.shape

(12, 17, 73, 144)
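
The closed-interval rule c1<=t<=c2 can be sketched on a hypothetical time axis. The sketch assumes 480 monthly timepoints dated at mid-month and starting in January 1979; neither assumption comes from the example above, but with mid-month dates the interval selects 12 points, matching the shape shown:

```python
import datetime

# Assumption: 480 monthly timepoints, dated mid-month, starting January 1979.
time_axis = [datetime.date(1979 + k // 12, k % 12 + 1, 15) for k in range(480)]

def select_closed_interval(axis, c1, c2):
    """Return the indices k with c1 <= axis[k] <= c2 (both ends included)."""
    return [k for k, t in enumerate(axis) if c1 <= t <= c2]

c1 = datetime.date(1990, 1, 1)
c2 = datetime.date(1991, 1, 1)
indices = select_closed_interval(time_axis, c1, c2)
print(len(indices))
```

If the timepoints were instead dated on the first of each month, the same interval would also include January 1991, because both endpoints are inclusive.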

or string representations can be used:

>>> x = ua.subRegion(time=('1990-1','1991-1'))

Time types are described in "cdtime Module".

Plotting data

Data read via the CDMS Python interface can be plotted using the vcs module. This module, part of the Climate Data Analysis Tool (CDAT), is documented in the VCS reference manual. The vcs module provides access to the functionality of the VCS visualization program.

To generate a plot:

Initialize a canvas with the vcs init routine.
Plot the data using the canvas plot routine.

For example:

>>> import cdms, vcs

>>> f = cdms.open('sample.nc')

>>> f['time'][:] # Print the time coordinates

[ 0., 6., 12., 18., 24., 30., 36., 42., 48., 54., 60., 66., 72., 78., 84., 90.,]

>>> precip = f('prc', time=24.0) # Read precip data

>>> precip.shape

(1, 32, 64)

>>> w = vcs.init() # Initialize a canvas

'Template' is currently set to P_default.

Graphics method 'Boxfill' is currently set to Gfb_default.

>>> w.plot(precip) # Generate a plot

(generates a boxfill plot)

By default a boxfill plot of the lat-lon slice is produced. Since variable precip includes information on time, latitude, and longitude, the continental outlines and time information are also plotted.

The plot routine has a number of options for producing different types of plots, such as isofill and x-y plots. See "Plotting CDMS data in Python" for details.