Data Management and Documentation Plan

Purpose

This document describes data stream documentation requirements and the standard formatting and naming protocols, for both data users and the infrastructure staff who produce the data.

Definitions

Baseline Change Request (BCR):
Used by the ARM Infrastructure as a process to provide configuration control and for formally requesting and documenting changes within the ARM Infrastructure.
Data Object Description (DOD):
The basic information, definitions, and metadata required to process "raw" measurement data into netCDF files. The DOD becomes the header of the ARM netCDF files.
Data Stream:
A time sequenced series of like data files.
Metadata:
Often described as "information or data about the data." Typically refers to information about primary data, which is usually numerical, or information describing aspects of the primary data. Such information could include instrument site information, environmental conditions under which the data were acquired, and any other data needed to understand the primary data.
Near-Real Time:
In ARM text, "near-real time" means "with a few hours' delay."
Quality Assured Data:
Typically the final form of data to be submitted to the ARM data system. This includes data stream description documentation, fully calibrated data in commonly used geophysical units, quality flagged data files and all ancillary data (metadata) needed by a future user of the data stream to make full sense of it.
Quality Measurement Experiment (QME):
The regular intercomparison of two or more data sets intended to understand the individual data streams either as functions of the performance of an instrument or the accuracy of a model prediction.
Value-Added Product (VAP):
A new data stream generated by applying an algorithm or other transform to existing data.

Data Documentation Requirements

For all new data streams, measurements, VAPs, QMEs, and data reprocessing, several steps are required before approval as an addition to the ARM baseline:

Data Formatting and Naming Protocols

File Type/Format

NetCDF is the preferred data format because it supports efficient data storage and reliable/robust documentation of the data structure. More information about netCDF is available at http://www.unidata.ucar.edu/packages/netcdf/faq.html. ASCII and HDF formats are used for some "External Data Products." When using ASCII, a description of the file structure and its proposed documentation should be reviewed and approved by the External Data Center (XDC) and/or Archive data managers. HDF is the standard for most satellite data. More information about HDF is available at http://hdf.ncsa.uiuc.edu.

File Naming Conventions

Raw Data

Raw data files shall be named according to the following naming convention:
(sss)(inst)(Fn).00.YYYYMMDD.hhmmss.raw.(xxxx.zzz)

where:

sss
is the site identifier (e.g., sgp, twp, nsa)
inst
is the instrument basename (e.g., mwr, wsi, mpl)
Fn
is the facility designation (e.g., C1, E13, B4)
xxxx.zzz
is the original raw data file name produced on the instrument

An example raw data file name is:
nsamwrC1.00.20021109.140000.raw.20_20021109_140000.dat.

This file is from the North Slope of Alaska Barrow site. It contains raw microwave radiometer data for November 9, 2002, for the hour beginning 140000. Most raw instrument data are collected hourly resulting in 24 raw data files per day. These files are bundled into daily tar files before archival. Tar bundles shall be named according to the following naming convention:
(sss)(inst)(Fn).00.YYYYMMDD.000000.raw.(zzz).tar

where:

sss
is the site identifier (e.g., sgp, twp, nsa)
inst
is the instrument basename (e.g., mwr, wsi, mpl)
Fn
is the facility designation (e.g., C1, E13, B4)
zzz
is the extension from the original raw data file name, usually the format of the file or an instrument serial number.

The example raw file shown above will be archived in a tar bundle named
nsamwrC1.00.20021109.000000.raw.dat.tar.
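
The mapping from an hourly raw file name to its daily tar bundle name can be sketched in Python. This is only an illustration of the two conventions above, not an ARM tool: the function name tar_bundle_name and the regular expression are hypothetical, and the pattern assumes a three-letter lowercase site identifier, a lowercase alphanumeric instrument basename, and a facility designation made of one uppercase letter followed by digits.

```python
import re

# Pattern for (sss)(inst)(Fn).00.YYYYMMDD.hhmmss.raw.(xxxx.zzz).
# Assumed character classes: 3 lowercase letters for the site, lowercase
# alphanumerics for the instrument, one uppercase letter plus digits for
# the facility. These are inferred from the examples, not from a spec.
RAW_NAME = re.compile(
    r"^(?P<site>[a-z]{3})"
    r"(?P<inst>[a-z0-9]+?)"
    r"(?P<facility>[A-Z]\d+)"
    r"\.00\.(?P<date>\d{8})\.(?P<time>\d{6})"
    r"\.raw\.(?P<original>.+)$"
)


def tar_bundle_name(raw_name: str) -> str:
    """Return the daily tar bundle name for one raw data file name."""
    match = RAW_NAME.match(raw_name)
    if match is None:
        raise ValueError(f"not a raw data file name: {raw_name!r}")
    original = match["original"]
    if "." not in original:
        # The convention needs an extension (zzz) from the original name.
        raise ValueError(f"original file name has no extension: {original!r}")
    extension = original.rsplit(".", 1)[1]
    prefix = f"{match['site']}{match['inst']}{match['facility']}"
    # The bundle keeps the date but resets the time field to 000000.
    return f"{prefix}.00.{match['date']}.000000.raw.{extension}.tar"


print(tar_bundle_name("nsamwrC1.00.20021109.140000.raw.20_20021109_140000.dat"))
# prints nsamwrC1.00.20021109.000000.raw.dat.tar
```

All 24 hourly files for one day and one original extension map to the same bundle name, so the result could serve as a grouping key when bundling.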

Guidelines for Original Raw File Naming

When possible, the original file name produced on the instrument or instrument data system should contain adequate information to determine the origin of the file including:

Under the constraints of 8.3 file names (an eight-character base name and a three-character extension), it is probably not possible to include all this information. In these instances, it is important to include adequate header information inside the file to permit the user to determine the source/origin of the data and provide a reference date (including year) and time.

Data names are case sensitive. xxxxxx.DAT and xxxxxx.dat may be interpreted as two different names by ingests and bundling routines. Instruments should be consistent in the way the original file names are assigned, including case.
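
A simple check for names that differ only by case is sketched below in Python. The helper find_case_collisions is hypothetical and is not part of any ARM ingest or bundling routine; it only illustrates how inconsistent case could be detected before files are processed.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group file names that are identical except for letter case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    # Keep only the groups that contain more than one distinct spelling.
    return [sorted(variants) for _, variants in sorted(groups.items())
            if len(variants) > 1]


print(find_case_collisions(["xxxxxx.DAT", "xxxxxx.dat", "other.dat"]))
# prints [['xxxxxx.DAT', 'xxxxxx.dat']]
```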

Processed Data

ARM netCDF files shall be named according to the following naming convention:
(sss)(nn)(inst)(qqq)(Fn).(ln).YYYYMMDD.hhmmss.cdf.

where:

sss
is the site identifier (e.g., sgp, twp, nsa)
nn
is the data integration period in minutes (e.g., 1, 5, 15, 30, 1440)
inst
is the instrument basename (e.g., mwr, wsi, mpl)
qqq
is an optional qualifier that distinguishes these data from other data sets produced by the same instrument
Fn
is the facility designation (e.g., C1, E13, B4)
ln
is the data level (e.g., a0, a1, b1, c1)

An example netCDF data file name is depicted below:

Diagram of a data filename

The sgp5mwravgB4.c1.20040706.020415.cdf file contains 5-minute averaged microwave radiometer data from the Southern Great Plains Vici site for July 6, 2004. The data level is "c1", indicating that the data were derived or calculated via Value-Added Processing (see Data Levels).
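
A processed file name can be split into its fields with a short Python sketch. The names parse_processed_name and PROCESSED_NAME are hypothetical. Two assumptions go beyond the text above: the integration period is treated as optional, and the instrument basename is separated from the optional qualifier by matching against a caller-supplied list of known basenames, because the convention alone does not mark where one ends and the other begins.

```python
import re
from datetime import datetime

# Pattern for (sss)(nn)(inst)(qqq)(Fn).(ln).YYYYMMDD.hhmmss.<ext>.
# Assumption: nn may be absent; inst and qqq are captured together as "name".
PROCESSED_NAME = re.compile(
    r"^(?P<site>[a-z]{3})"
    r"(?P<period>\d*)"
    r"(?P<name>[a-z][a-z0-9]*?)"
    r"(?P<facility>[A-Z]\d+)"
    r"\.(?P<level>[0-9a-z]\d)"
    r"\.(?P<date>\d{8})\.(?P<time>\d{6})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_processed_name(file_name, known_instruments=("mwr", "wsi", "mpl")):
    """Split a processed data file name into a dictionary of fields."""
    match = PROCESSED_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a processed data file name: {file_name!r}")
    name = match["name"]
    inst, qualifier = name, ""
    # Try longer basenames first so a short one does not shadow a long one.
    for base in sorted(known_instruments, key=len, reverse=True):
        if name.startswith(base):
            inst, qualifier = base, name[len(base):]
            break
    return {
        "site": match["site"],
        "integration_minutes": int(match["period"]) if match["period"] else None,
        "inst": inst,
        "qualifier": qualifier,
        "facility": match["facility"],
        "level": match["level"],
        "start": datetime.strptime(match["date"] + match["time"], "%Y%m%d%H%M%S"),
        "ext": match["ext"],
    }


fields = parse_processed_name("sgp5mwravgB4.c1.20040706.020415.cdf")
print(fields["inst"], fields["qualifier"], fields["level"], fields["start"])
# prints mwr avg c1 2004-07-06 02:04:15
```

If the basename is not in the supplied list, the sketch leaves the whole string in "inst" and returns an empty qualifier rather than guessing.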

Other Data Formats

Processed ARM data may be stored in a format other than netCDF. The basic naming convention for processed files will not change, but the final extension will change accordingly:

asc
ASCII data format
hdf
HDF data format (limited to satellite data)
png
PNG data format (standard ARM image format)
mng
MNG data format (standard ARM movie format)

Other data formats (e.g., GIF, JPG) may also exist, but are not recommended for future development.

Data Levels

Data levels are based on the "level of processing," with the lowest level of data designated as raw or "00" data. Each subsequent data level has minimum requirements, and a data level is not increased until ALL requirements of that level, as well as the requirements of all data levels below it, have been met.

00
raw data - primary raw data stream collected directly from instrument
01
raw data - redundant data stream or sneakernet data
a0
converted to netCDF
a1
calibration factors applied and converted to geophysical units
a2... to a9
further processing on a1 level data that does not merit b1 classification
b1
QC checks applied to measurements
b2... to b9
further processing on b1 level data that does not merit c1 classification
c0
intermediate value-added data product; this data level is always used as input to a higher level "VAP"
c1
derived or calculated value-added data product (VAP) using one or more measured or modeled data (a0 to c1) as input
c2... to c9
further processing applied to a "c1" level data stream
s1
summary file consisting of a subset of the parent .c1 file with simplified QC and known 'bad' values set to missing

Notes:

  1. Not every data level need be produced for each instrument data set. For example, if conversion to netCDF, calibration, and conversion to engineering units are applied in a single processing step, no "a0" data product would be produced.
  2. Data level .cN is restricted to data derived or calculated through value-added processing.
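
The data level table can be expressed as a small lookup, sketched here in Python. The function names describe_level and is_value_added are hypothetical, and the descriptions are copied from the table above; nothing here reflects an actual ARM library.

```python
import re

# Levels with their own entry in the table above.
FIXED_LEVELS = {
    "00": "raw data - primary raw data stream collected directly from instrument",
    "01": "raw data - redundant data stream or sneakernet data",
    "a0": "converted to netCDF",
    "a1": "calibration factors applied and converted to geophysical units",
    "b1": "QC checks applied to measurements",
    "c0": "intermediate value-added data product",
    "c1": "derived or calculated value-added data product (VAP)",
    "s1": "summary file consisting of a subset of the parent .c1 file",
}

# Levels a2-a9, b2-b9, and c2-c9 share one description per letter.
RANGE_LEVELS = {
    "a": "further processing on a1 level data that does not merit b1 classification",
    "b": "further processing on b1 level data that does not merit c1 classification",
    "c": 'further processing applied to a "c1" level data stream',
}


def describe_level(code: str) -> str:
    """Return the table description for a two-character data level code."""
    if code in FIXED_LEVELS:
        return FIXED_LEVELS[code]
    match = re.fullmatch(r"([abc])([2-9])", code)
    if match:
        return RANGE_LEVELS[match.group(1)]
    raise ValueError(f"unknown data level: {code!r}")


def is_value_added(code: str) -> bool:
    """True for .cN levels, which Note 2 restricts to value-added processing."""
    return re.fullmatch(r"c[0-9]", code) is not None


print(describe_level("a5"))
# prints further processing on a1 level data that does not merit b1 classification
print(is_value_added("c1"), is_value_added("b1"))
# prints True False
```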

Graphic Data Formats

For formatted documents and graphics-rich documents, PDF file type is standard. For photographs, drawings, sketches, and data plots, PNG file type is standard. For movies, MPG file type is standard.

File Duration

To control the number of small files and to help facilitate the use of ARM data, the suggested file period is 24 hours. Very large data sets may be routinely split into two or more netCDF files per day to increase usability. Infrequently, daily data files may be split into two files when the global header information changes as a result of a maintenance action (e.g., instrument serial number or calibration change).

Measurement Metadata and Standard Measurement Names

A scientifically relevant "measurement description" is a structured description of a data stream; the description addresses why the data stream exists. Data streams also contain other information that is important in understanding or interpreting the data stream but is not considered significant for naming purposes. Examples include global information, such as location; calibration procedural information; and QC checks and flags. If relevant, other instrument details can be included:

Data Quality Reports

Data Quality Reports (DQRs) document events that result in altered data quality. As such, DQRs do not capture maintenance or calibration activity unless the data stream is affected negatively (e.g., lowering an instrumented tower for preventive maintenance results in a DQR documenting the time the data were impacted). DQRs may be submitted by any source with knowledge of the quality of the data stream: an instrument mentor, site operations, or a data user. Data users are encouraged to submit DQRs that describe the usefulness or limitations of the data for scientific analysis. When submitting DQRs, specific data streams, measurements, and time periods need to be stated explicitly. The standard tool for submitting DQRs is available at http://www.db.arm.gov/PIFCARDQR/entry.