root/cf-conventions/trunk/docbooksrc/description-of-the-data.xml

Revision 92, 24.5 kB (checked in by mlaker1, 2 months ago)

Final touches related to Ticket #17.

1 <chapter>
2   <title>
3     Description of the Data
4   </title>
5
6   <para>
7     The attributes described in this section are used to
8     provide a description of the content and the units
9     of measurement for each variable. We continue to
10     support the use of the
11     <varname>units</varname>
12     and
13     <varname>long_name</varname> attributes
14     as defined in COARDS. We extend COARDS by adding the
15     optional
16     <varname>standard_name</varname>
17     attribute which is used to provide
18     unique identifiers for variables. This is important for
19     data exchange since one cannot necessarily identify a
20     particular variable based on the name assigned to it by
21     the institution that provided the data.
22   </para>
23  
24   <para>
25     The
26     <varname>standard_name</varname>
27     attribute can
28     be used to identify variables that contain coordinate
29     data. But since it is an optional attribute, applications
30     that implement these standards must continue to be
31     able to identify coordinate types based on the COARDS
32     conventions.
33   </para>
34
35   <section id="units">
36     <title>Units</title>
37     <para>
38                 The <varname>units</varname> attribute is required for all variables
39         that represent dimensional quantities (except for boundary variables
40         defined in <xref linkend="cell-boundaries"/> and climatology variables
41         defined in  <xref linkend="climatological-statistics"/>). The value of
42         the <varname>units</varname> attribute is a string that can be
43         recognized by UNIDATA's Udunits package <biblioref linkend="udunits"/>,
44         with a few exceptions that are given below.
45         The <ulink url="http://www.unidata.ucar.edu/software/udunits/">Udunits package</ulink> includes a file
46         <filename>udunits.dat</filename>,
47         which lists its supported unit names. Note that case is significant in the <varname>units</varname> strings.
48     </para>
49
50     <para>
51                 The COARDS convention prohibits the unit
52         <constant>degrees</constant> altogether, but this unit is not
53         forbidden by the CF convention because it may in fact be appropriate
54         for a variable containing, say, solar zenith angle. The unit
55         <constant>degrees</constant> is also allowed on coordinate variables
56         such as the latitude and longitude coordinates of a transformed grid.
57         In this case the coordinate values are not true latitudes and
58         longitudes, which must always be identified using the more specific
59         forms of <constant>degrees</constant> as described in
60         <xref linkend="latitude-coordinate"/> and <xref linkend="longitude-coordinate"/>.
61     </para>
62
63     <para>
64       Units are not required for dimensionless quantities. A variable with no <varname>units</varname> attribute is assumed to be dimensionless. However, a <varname>units</varname> attribute specifying a dimensionless unit may optionally be included. The Udunits package defines a few dimensionless units, such as <constant>percent</constant>, but lacks commonly used units such as ppm (parts per million). This convention does not support the addition of new dimensionless units that are not Udunits-compatible. The conforming unit for quantities that represent fractions, or parts of a whole, is "1". The conforming unit for parts per million is "1e-6". Descriptive information about dimensionless quantities, such as sea-ice concentration, cloud fraction, probability, etc., should be given in the <varname>long_name</varname> or <varname>standard_name</varname> attributes (see below) rather than in the <varname>units</varname> attribute.
65     </para>
66
67     <para>
68                 The units <constant>level</constant>, <constant>layer</constant>, and <constant>sigma_level</constant> are allowed for dimensionless vertical coordinates to maintain backwards compatibility with COARDS. These units are not compatible with Udunits and are deprecated by this standard because conventions for more precisely identifying dimensionless vertical coordinates are introduced (see <xref linkend="dimensionless-vertical-coordinate"/>).
69     </para>
70
71     <para>
72       The Udunits syntax that allows scale factors and offsets to be applied to
73       a unit is not supported by this standard. The application of any scale
74       factors or offsets to data should be indicated by the
75       <varname>scale_factor</varname> and <varname>add_offset</varname>
76       attributes. Use of these attributes for data packing,
77       which is their most important application,
78       is discussed in detail in <xref linkend="packed-data"/>.
79     </para>
80
81     <para>
82       Udunits recognizes the following prefixes and their abbreviations.
83       <table id="table-supported-units" frame="all"><title>Supported Units</title>
84         <tgroup cols="7" align="left" colsep="1" rowsep="1">
85           <thead>
86             <row>
87               <entry>Factor</entry>
88               <entry>Prefix</entry>
89               <entry>Abbreviation</entry>
90               <entry></entry>
91               <entry>Factor</entry>
92               <entry>Prefix</entry>
93               <entry>Abbreviation</entry>
94             </row>
95           </thead>
96           <tbody>
97             <row>
98               <entry>1e1</entry>
99               <entry>deca,deka</entry>
100               <entry>da</entry>
101               <entry></entry>
102               <entry>1e-1</entry>
103               <entry>deci</entry>
104               <entry>d</entry>
105             </row>
106             <row>
107               <entry>1e2</entry>
108               <entry>hecto</entry>
109               <entry>h</entry>
110               <entry></entry>
111               <entry>1e-2</entry>
112               <entry>centi</entry>
113               <entry>c</entry>
114             </row>
115             <row>
116               <entry>1e3</entry>
117               <entry>kilo</entry>
118               <entry>k</entry>
119               <entry></entry>
120               <entry>1e-3</entry>
121               <entry>milli</entry>
122               <entry>m</entry>
123             </row>
124             <row>
125               <entry>1e6</entry>
126               <entry>mega</entry>
127               <entry>M</entry>
128               <entry></entry>
129               <entry>1e-6</entry>
130               <entry>micro</entry>
131               <entry>u</entry>
132             </row>
133             <row>
134               <entry>1e9</entry>
135               <entry>giga</entry>
136               <entry>G</entry>
137               <entry></entry>
138               <entry>1e-9</entry>
139               <entry>nano</entry>
140               <entry>n</entry>
141             </row>
142             <row>
143               <entry>1e12</entry>
144               <entry>tera</entry>
145               <entry>T</entry>
146               <entry></entry>
147               <entry>1e-12</entry>
148               <entry>pico</entry>
149               <entry>p</entry>
150             </row>
151             <row>
152               <entry>1e15</entry>
153               <entry>peta</entry>
154               <entry>P</entry>
155               <entry></entry>
156               <entry>1e-15</entry>
157               <entry>femto</entry>
158               <entry>f</entry>
159             </row>
160             <row>
161               <entry>1e18</entry>
162               <entry>exa</entry>
163               <entry>E</entry>
164               <entry></entry>
165               <entry>1e-18</entry>
166               <entry>atto</entry>
167               <entry>a</entry>
168             </row>
169             <row>
170               <entry>1e21</entry>
171               <entry>zetta</entry>
172               <entry>Z</entry>
173               <entry></entry>
174               <entry>1e-21</entry>
175               <entry>zepto</entry>
176               <entry>z</entry>
177             </row>
178             <row>
179               <entry>1e24</entry>
180               <entry>yotta</entry>
181               <entry>Y</entry>
182               <entry></entry>
183               <entry>1e-24</entry>
184               <entry>yocto</entry>
185               <entry>y</entry>
186             </row>
187           </tbody>
188         </tgroup>
189       </table>
190     </para>
191
192   </section>
193   <section id="long-name">
194     <title>Long Name</title>
195     <para>
196       The <varname>long_name</varname> attribute is defined by the NUG to contain a long descriptive name which may, for example, be used for labeling plots. For backwards compatibility with COARDS this attribute is optional. But it is highly recommended that either this or the <varname>standard_name</varname> attribute defined in the next section be provided to make the file self-describing. If a variable has no <varname>long_name</varname> attribute then an application may use, as a default, the <varname>standard_name</varname> if it exists, or the variable name itself.
197     </para>
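    <para>
      The fallback order described above can be sketched in Python. This is a minimal illustration, not part of the convention: the function name <varname>plot_label</varname> and the use of a plain dictionary to hold a variable's attributes are assumptions made for the example.
    </para>
    <programlisting language="python"><![CDATA[
```python
def plot_label(var_name, attrs):
    """Pick a display label for a variable.

    Order of preference: long_name, then standard_name, then the
    variable name itself. `attrs` is a hypothetical dict of the
    variable's attributes.
    """
    for key in ("long_name", "standard_name"):
        value = attrs.get(key)
        if value:  # skip missing or empty attributes
            return value
    return var_name


print(plot_label("psl", {"standard_name": "air_pressure_at_sea_level"}))
```
]]></programlisting>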
198   </section>
199
200   <section id="standard-name">
201     <title>Standard Name</title>
202     <para>
203       A fundamental requirement for exchange of scientific data is the ability to describe precisely the physical quantities being represented. To some extent this is the role of the <varname>long_name</varname> attribute as defined in the NUG. However, usage of <varname>long_name</varname> is completely ad-hoc. For some applications it would be desirable to have a more definitive description of the quantity, which would allow users of data from different sources to determine whether quantities were in fact comparable. For this reason an optional mechanism for uniquely associating each variable with a standard name is provided.
204     </para>
205    
206     <para>
207       A standard name is associated with a variable via the attribute <varname>standard_name</varname>, which takes a string value composed of a standard name optionally followed by one or more blanks and a standard name modifier (a string value from <xref linkend="standard-name-modifiers"/>).
208     </para>
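    <para>
      As a rough sketch of how an application might split such an attribute value into its two parts, consider the Python function below. The function name <varname>split_standard_name</varname> is hypothetical, and the sketch does not check the name or the modifier against the standard name table or the list of permitted modifiers.
    </para>
    <programlisting language="python"><![CDATA[
```python
def split_standard_name(value):
    """Split a standard_name attribute into (name, modifier).

    The modifier is None when the attribute holds only a standard name.
    Neither part is validated against the published tables here.
    """
    parts = value.split()  # splits on one or more blanks
    if not parts or len(parts) > 2:
        raise ValueError("expected '<name>' or '<name> <modifier>': %r" % value)
    name = parts[0]
    modifier = parts[1] if len(parts) == 2 else None
    return name, modifier


print(split_standard_name("specific_humidity standard_error"))
```
]]></programlisting>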
209
210     <para>
211       The set of permissible standard names is contained in the standard name table. The table entry for each standard name contains the following:
212     </para>
213
214     <variablelist>
215       <varlistentry>
216         <term>standard name</term>
217         <listitem>
218           <para>
219             The name used to identify the physical quantity. A standard name contains no whitespace and is case sensitive.
220           </para>
221         </listitem>
222       </varlistentry>
223       <varlistentry>
224         <term>canonical units</term>
225         <listitem>
226           <para>
227                   Representative units of the physical quantity. Unless it is dimensionless, a variable with a <varname>standard_name</varname> attribute must have units which are physically equivalent (not necessarily identical) to the canonical units, possibly modified by an operation specified by either the standard name modifier (see below and <xref linkend="standard-name-modifiers"/>) or by the <varname>cell_methods</varname> attribute (see <xref linkend="cell-methods"/> and <xref linkend="appendix-cell-methods"/>).
228           </para>
229         </listitem>
230       </varlistentry>
231       <varlistentry>
232         <term>description</term>
233         <listitem>
234           <para>
235             The description is meant to clarify the qualifiers of the fundamental quantities such as which surface a quantity is defined on or what the flux sign conventions are. We don't attempt to provide precise definitions of fundamental physical quantities (e.g., temperature) which may be found in the literature.
236           </para>
237         </listitem>
238       </varlistentry>
239     </variablelist>   
240
241     <para>
242       When appropriate, the table entry also contains the corresponding GRIB parameter code(s) (from ECMWF and NCEP) and AMIP identifiers.
243     </para>
244
245     <para>
246         The standard name table is located at
247         <ulink url="http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/current/cf-standard-name-table.xml">http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/current/cf-standard-name-table.xml</ulink>
248         , written in compliance with the XML format, as described in
249         <xref linkend="standard-name-table-format"/>.
250         Knowledge of the XML format is only necessary for application
251         writers who plan to directly access the table. A formatted text
252         version of the table is provided at
253         <ulink url="http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/current/standard-name-table">http://cf-pcmdi.llnl.gov/documents/cf-standard-names/standard-name-table/current/standard-name-table</ulink>
254         , and this table may be consulted in order to find the standard
255         name that should be assigned to a variable. Some standard names
256         (e.g. <varname>region</varname> and <varname>area_type</varname>)
257         are used to indicate quantities which are permitted to take only
258         certain standard values. This is indicated in the definition of the
259         quantity in the standard name table, accompanied by a list or a link
260         to a list of the permitted values.
261     </para>
262
263     <para>
264                         Standard names by themselves are not always sufficient to describe a quantity. For example, a variable may contain data to which spatial or temporal operations have been applied. Or the data may represent an uncertainty in the measurement of a quantity. These quantity attributes are expressed as modifiers of the standard name. Modifications due to common statistical operations are expressed via the <varname>cell_methods</varname> attribute (see <xref linkend="cell-methods"/> and <xref linkend="appendix-cell-methods"/>). Other types of quantity modifiers are expressed using the optional modifier part of the <varname>standard_name</varname> attribute. The permissible values of these modifiers are given in <xref linkend="standard-name-modifiers"/>.
265     </para>
266    
267     <example>
268     <title>Use of <varname>standard_name</varname></title>
269       <programlisting>
270 float psl(lat,lon) ;
271   psl:long_name = "mean sea level pressure" ;
272   psl:units = "hPa" ;
273   psl:standard_name = "air_pressure_at_sea_level" ;
274       </programlisting>
275         <para>
276           The description in the standard name table entry for <varname>air_pressure_at_sea_level</varname> clarifies that "sea level" refers to the mean sea level, which is close to the geoid in sea areas.
277         </para>
278     </example>
279
280     <para>
281       Here are lists of equivalences between the CF standard names and the standard names from the
282       <ulink url="http://cf-pcmdi.llnl.gov/documents/cf-standard-names/ecmwf-grib-mapping">ECMWF GRIB tables</ulink>,  the
283       <ulink url="http://cf-pcmdi.llnl.gov/documents/cf-standard-names/ncep-grib-code-cf-standard-name-mapping">NCEP GRIB tables</ulink>, and the
284       <ulink url="http://cf-pcmdi.llnl.gov/documents/cf-standard-names/pcmdi-name-cf-standard-name-mapping">PCMDI tables</ulink>.
285     </para>
286   </section>
287
288   <section id="ancillary-data">
289     <title>Ancillary Data</title>
290     <para>
291                 When one data variable provides metadata about the individual values of another data variable it may be desirable to express this association by providing a link between the variables. For example, instrument data may have associated measures of uncertainty. The attribute <varname>ancillary_variables</varname> is used to express these types of relationships. It is a string attribute whose value is a blank separated list of variable names. The nature of the relationship between variables associated via <varname>ancillary_variables</varname> must be determined by other attributes. The variables listed by the <varname>ancillary_variables</varname> attribute will often have the standard name of the variable which points to them including a modifier (<xref linkend="standard-name-modifiers"/>) to indicate the relationship.
292     </para>
293
294     <example><title>Instrument data</title>
295       <programlisting>
296   float q(time) ;
297     q:standard_name = "specific_humidity" ;
298     q:units = "g/g" ;
299     q:ancillary_variables = "q_error_limit q_detection_limit" ;
300   float q_error_limit(time) ;
301     q_error_limit:standard_name = "specific_humidity standard_error" ;
302     q_error_limit:units = "g/g" ;
303   float q_detection_limit(time) ;
304     q_detection_limit:standard_name = "specific_humidity detection_minimum" ;
305     q_detection_limit:units = "g/g" ;
306       </programlisting>
307     </example>
308   </section>
309
310   <section id="flags">
311     <title>Flags</title>
312     <para>
313       The attributes <varname>flag_values</varname><emphasis role="newtext">,
314       <varname>flag_masks</varname></emphasis> and <varname>flag_meanings</varname>
315       are intended to make variables that contain flag values self describing.
316       <emphasis role="newtext">Status codes and Boolean (binary) condition flags
317       may be expressed with different combinations of <varname>flag_values</varname>
318       and <varname>flag_masks</varname> attribute definitions.</emphasis>     
319     </para>
320     <para>
321       <emphasis role="newtext">The <varname>flag_values</varname> and <varname>flag_meanings</varname>
322       attributes describe a status flag consisting of mutually exclusive coded values.</emphasis>
323       The <varname>flag_values</varname> attribute is the same type as the variable to which
324       it is attached, and contains a list of the possible flag values.
325       The <varname>flag_meanings</varname> attribute is a string whose value is a blank
326       separated list of descriptive words or phrases, one for each flag value.
327       If multi-word phrases are used to describe the flag values, then the words within
328       a phrase should be connected with underscores. <emphasis role="newtext">The following example illustrates
329       the use of flag values to express a speed quality with an enumerated status code.</emphasis>
330     </para>
331     <example><title>A flag variable<emphasis role="newtext">, using <varname>flag_values</varname></emphasis></title>
332       <programlisting>
333   byte current_speed_qc(time, depth, lat, lon) ;
334     current_speed_qc:long_name = "Current Speed Quality" ;
335     <emphasis role="newtext">current_speed_qc:standard_name = "sea_water_speed status_flag" ;</emphasis>
336     current_speed_qc:_FillValue = -128b ;
337     current_speed_qc:valid_range = <emphasis role="newtext">0b, 2b</emphasis><emphasis role="deletedtext">-127b, 127b</emphasis> ;
338     current_speed_qc:flag_values = 0b, 1b, 2b ;
339     current_speed_qc:flag_meanings = "quality_good sensor_nonfunctional
340                                       outside_valid_range" ;
341       </programlisting>
342     </example>
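    <para>
      A client might decode such a status flag along the following lines. This Python sketch is illustrative only: the function name <varname>decode_flag_value</varname> is an assumption, and the attribute values are copied by hand from the example rather than read from a file.
    </para>
    <programlisting language="python"><![CDATA[
```python
def decode_flag_value(value, flag_values, flag_meanings):
    """Return the meaning of a mutually exclusive status code, or None."""
    meanings = flag_meanings.split()  # blank separated list
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings differ in length")
    return dict(zip(flag_values, meanings)).get(value)


# Attribute values taken from the current_speed_qc example.
FLAG_VALUES = [0, 1, 2]
FLAG_MEANINGS = "quality_good sensor_nonfunctional outside_valid_range"

print(decode_flag_value(1, FLAG_VALUES, FLAG_MEANINGS))  # sensor_nonfunctional
```
]]></programlisting>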
343     <para>
344       <emphasis role="newtext">The <varname>flag_masks</varname> and <varname>flag_meanings</varname>
345       attributes describe a number of independent Boolean conditions using bit field notation by setting
346       unique bits in each <varname>flag_masks</varname> value.  The <varname>flag_masks</varname> attribute
347       is the same type as the variable to which it is attached, and contains a list of values matching unique
348       bit fields.  The <varname>flag_meanings</varname> attribute is defined as above, one for each
349       <varname>flag_masks</varname> value.  A flagged condition is identified by performing a bitwise AND
350       of the variable value and each <varname>flag_masks</varname> value; a non-zero result indicates a
351       <varname>true</varname> condition.  Thus, any or all of the flagged conditions may be <varname>true</varname>,
352       depending on the variable bit settings. The following example illustrates the use of <varname>flag_masks</varname>
353       to express six sensor status conditions.</emphasis>
354     </para>
355     <example>
356       <title><emphasis role="newtext">A flag variable, using <varname>flag_masks</varname></emphasis></title>
357       <programlisting><emphasis role="newtext">
358   byte sensor_status_qc(time, depth, lat, lon) ;
359     sensor_status_qc:long_name = "Sensor Status" ;
360     sensor_status_qc:_FillValue = 0b ;
361     sensor_status_qc:valid_range = 1b, 63b ;
362     sensor_status_qc:flag_masks = 1b, 2b, 4b, 8b, 16b, 32b ;
363     sensor_status_qc:flag_meanings = "low_battery processor_fault
364                                       memory_fault disk_fault
365                                       software_fault
366                                       maintenance_required" ;</emphasis>
367       </programlisting>
368     </example>
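    <para>
      The bitwise AND test can be sketched in Python as follows. The function name <varname>decode_flag_masks</varname> is hypothetical, and the attribute values are copied by hand from the example; a real client would read them from the variable's attributes.
    </para>
    <programlisting language="python"><![CDATA[
```python
def decode_flag_masks(value, flag_masks, flag_meanings):
    """Return the meanings of all Boolean conditions set in `value`."""
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_masks):
        raise ValueError("flag_masks and flag_meanings differ in length")
    # A non-zero AND result means the condition is true.
    return [m for mask, m in zip(flag_masks, meanings) if value & mask]


# Attribute values taken from the sensor_status_qc example.
FLAG_MASKS = [1, 2, 4, 8, 16, 32]
FLAG_MEANINGS = ("low_battery processor_fault memory_fault disk_fault "
                 "software_fault maintenance_required")

# 5 is binary 000101, so bits 0 and 2 are set.
print(decode_flag_masks(5, FLAG_MASKS, FLAG_MEANINGS))
# ['low_battery', 'memory_fault']
```
]]></programlisting>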
369     <para>
370       <emphasis role="newtext">The <varname>flag_masks</varname>, <varname>flag_values</varname> and
371       <varname>flag_meanings</varname> attributes, used together, describe a blend of independent Boolean
372       conditions and enumerated status codes.  The <varname>flag_masks</varname> and <varname>flag_values</varname>
373       attributes are both the same type as the variable to which they are attached.  A flagged condition
374       is identified by a bitwise AND of the variable value and each <varname>flag_masks</varname> value;
375       a result that matches the <varname>flag_values</varname> value indicates a <varname>true</varname>
376       condition.  Repeated <varname>flag_masks</varname> define a bit field mask that identifies a number
377       of status conditions with different <varname>flag_values</varname>.  The <varname>flag_meanings</varname>
378       attribute is defined as above, one for each <varname>flag_masks</varname> bit field and
379       <varname>flag_values</varname> definition.  Each <varname>flag_values</varname> and
380       <varname>flag_masks</varname> value must coincide with a <varname>flag_meanings</varname> value. 
381       The following example illustrates the use of <varname>flag_masks</varname> and <varname>flag_values</varname>
382       to express two sensor status conditions and one enumerated status code.</emphasis>
383     </para>
384     <example>
385       <title><emphasis role="newtext">A flag variable, using <varname>flag_masks</varname> and <varname>flag_values</varname></emphasis></title>
386       <programlisting><emphasis role="newtext">
387   byte sensor_status_qc(time, depth, lat, lon) ;
388     sensor_status_qc:long_name = "Sensor Status" ;
389     sensor_status_qc:_FillValue = 0b ;
390     sensor_status_qc:valid_range = 1b, 15b ;
391     sensor_status_qc:flag_masks = 1b, 2b, 12b, 12b, 12b ;
392     sensor_status_qc:flag_values = 1b, 2b, 4b, 8b, 12b ;
393     sensor_status_qc:flag_meanings =
394          "low_battery
395           hardware_fault
396           offline_mode calibration_mode maintenance_mode" ;</emphasis>
397       </programlisting>
398     </example>
399     <para>
400       <emphasis role="newtext">In this case, mutually exclusive values are blended with Boolean values
401       to maximize use of the available bits in a flag value.  The table below represents the four binary
402       digits (bits) expressed by the <varname>sensor_status_qc</varname> variable in the previous
403       example.</emphasis>
404     </para>
405     <para>
406       <emphasis role="newtext">Bit 0 and Bit 1 are Boolean values indicating a low battery condition
407       and a hardware fault, respectively. The next two bits (Bit 2 and Bit 3) express an enumeration
408       indicating abnormal sensor operating modes.  Thus, if Bit 0 is set, the battery is low and if
409       Bit 1 is set, there is a hardware fault, independent of the current sensor operating mode.</emphasis>
410     </para>
411     <table frame="all"><title><emphasis role="newtext">Flag Variable Bits (from Example)</emphasis></title>
412       <tgroup cols="4" align="left" colsep="1" rowsep="1">
413         <colspec colwidth="50pt"/>
414         <colspec colwidth="50pt"/>
415         <colspec colwidth="50pt"/>
416         <colspec colwidth="50pt"/>
417         <thead>
418           <row>
419             <entry><emphasis role="newtext">Bit 3 (MSB)</emphasis></entry>
420             <entry><emphasis role="newtext">Bit 2</emphasis></entry>
421             <entry><emphasis role="newtext">Bit 1</emphasis></entry>
422             <entry><emphasis role="newtext">Bit 0 (LSB)</emphasis></entry>
423           </row>
424         </thead>
425         <tbody>
426           <row>
427             <entry></entry>
428             <entry></entry>
429             <entry><emphasis role="newtext">H/W Fault</emphasis></entry>
430             <entry><emphasis role="newtext">Low Batt</emphasis></entry>
431           </row>
432         </tbody>
433       </tgroup>
434     </table>
435     <para>
436       <emphasis role="newtext">The remaining bits (Bit 2 and Bit 3) are decoded as follows:</emphasis>
437     </para>
438     <table frame="all"><title><emphasis role="newtext">Flag Variable Bit 2 and Bit 3 (from Example)</emphasis></title>
439       <tgroup cols="3" align="left" colsep="1" rowsep="1">
440         <colspec colwidth="50pt"/>
441         <colspec colwidth="50pt"/>
442         <colspec colwidth="100pt"/>
443         <thead>
444           <row>
445             <entry><emphasis role="newtext">Bit 3</emphasis></entry>
446             <entry><emphasis role="newtext">Bit 2</emphasis></entry>
447             <entry><emphasis role="newtext">Mode</emphasis></entry>
448           </row>
449         </thead>
450         <tbody>
451           <row>
452             <entry><emphasis role="newtext">0</emphasis></entry>
453             <entry><emphasis role="newtext">1</emphasis></entry>
454             <entry><emphasis role="newtext">offline_mode</emphasis></entry>
455           </row>
456           <row>
457             <entry><emphasis role="newtext">1</emphasis></entry>
458             <entry><emphasis role="newtext">0</emphasis></entry>
459             <entry><emphasis role="newtext">calibration_mode</emphasis></entry>
460           </row>
461           <row>
462             <entry><emphasis role="newtext">1</emphasis></entry>
463             <entry><emphasis role="newtext">1</emphasis></entry>
464             <entry><emphasis role="newtext">maintenance_mode</emphasis></entry>
465           </row>
466         </tbody>
467       </tgroup>
468     </table>
469     <para>
470       <emphasis role="newtext">The "12b" flag mask is repeated in the <varname>sensor_status_qc</varname>
471       <varname>flag_masks</varname> definition to explicitly declare the recommended bit field masks to
472       repeatedly AND with the variable value while searching for matching enumerated values. An application
473       determines if any of the conditions declared in the <varname>flag_meanings</varname> list are
474       <varname>true</varname> by simply iterating through each of the <varname>flag_masks</varname> and
475       AND'ing them with the variable. When a result is equal to the corresponding <varname>flag_values</varname>
476       element, that condition is <varname>true</varname>. The repeated <varname>flag_masks</varname> enable
477       a simple mechanism for clients to detect all possible conditions.</emphasis>
478     </para>
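    <para>
      The iteration described above might look like the following Python sketch. The function name <varname>decode_flags</varname> is an assumption made for illustration, and the attribute values are copied by hand from the combined <varname>flag_masks</varname> and <varname>flag_values</varname> example.
    </para>
    <programlisting language="python"><![CDATA[
```python
def decode_flags(value, flag_masks, flag_values, flag_meanings):
    """Return the meanings of every condition that is true for `value`.

    A condition is true when (value AND mask) equals the corresponding
    flag_values element.
    """
    meanings = flag_meanings.split()
    if not (len(flag_masks) == len(flag_values) == len(meanings)):
        raise ValueError("flag attributes differ in length")
    return [
        meaning
        for mask, expected, meaning in zip(flag_masks, flag_values, meanings)
        if (value & mask) == expected
    ]


# Attribute values taken from the sensor_status_qc example.
FLAG_MASKS = [1, 2, 12, 12, 12]
FLAG_VALUES = [1, 2, 4, 8, 12]
FLAG_MEANINGS = ("low_battery hardware_fault "
                 "offline_mode calibration_mode maintenance_mode")

# 9 is binary 1001: Bit 0 set (low battery), Bits 3-2 = 10 (calibration).
print(decode_flags(9, FLAG_MASKS, FLAG_VALUES, FLAG_MEANINGS))
# ['low_battery', 'calibration_mode']
```
]]></programlisting>
    <para>
      Because the mask 12 appears once per enumerated mode, a single pass over the three lists is enough; the client needs no separate knowledge of which bits form the enumeration.
    </para>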
479   </section>
480 </chapter>
481
482
483