Updated by L. A. Hook, T. W. Beaty, S. Santhana-Vannan, L. Baskaran, and R. B. Cook. June 2007.
Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A.
At the request of several field researchers, investigators, GIS and image specialists, and data managers, the following guidelines have been prepared for the data management practices that data collectors should follow to improve the usability of their data sets. This guidance is provided for those who perform environmental measurements, although many of the practices may be useful for other data collection and archiving activities.
We assembled what we feel are the most important practices that researchers could implement to make their data sets ready to share with other researchers. These practices could be performed at any time during the preparation of the data set, but we suggest that researchers consider them before measurements are taken. The order of the practices is not necessarily sequential, as a researcher could provide draft data set metadata before any measurements are taken.
The seven best practices are:
File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as project acronym, study title, location, investigator, year(s) of study, data type, version number, and file type. The file name should be provided in the documentation (described in Sect. 7) and in the first line of the header rows in the file itself.
Clear, descriptive, and unique file names may be important later when your data file is combined in a directory or FTP site with your own data files or with the data files of other investigators. Avoid using file names such as mydata.dat or 1998.dat.
File names should be constructed for easy management by various data systems. Names should contain only numbers, letters, dashes, and underscores -- no spaces or special characters. Also, in general, lower-case names are less software and platform dependent and are preferred. When choosing a file name, check for any database management limitations on the use of special characters and file name length. For practical reasons of legibility and usability, file names should not be more than 64 characters in length and if well constructed could be considerably less.
You may want to use similar logic when designing directory structures and names. Also, the data set title (see Sect. 6) should be similar to the data file name(s).
Version Number: Including a data file creation date or version number enables data users to quickly determine which data they are using if an update to the data set is released (e.g., *_v1.csv, *_r1.csv, or *_20070227.csv).
File Type or Extensions: Use *.txt, *.csv generally for tabular data. Section 2 addresses formats and extensions for image data files.
File Compression: Use *.zip, *.gz, or *.tar file extensions, as appropriate for the compression software. The individual files may be compressed for space conservation or several files may be aggregated and then compressed as one file of reduced size. When multiple files are compressed together, the same file naming guidelines apply to the compressed collection of files.
Example Data File Names:
c130_a792_20000916.csv
(From data set SAFARI 2000 C-130 Aerosol and Meteorological Data, Dry Season 2000)
WBW_veg_inventory_all_20050304.csv
(From data set Walker Branch Watershed Vegetation Inventory, 1967-1997)
bigfoot_agro_2000_gpp.zip
(From bigfoot_agro_2000_gpp.zip, data set BigFoot GPP Surfaces for North and South American Sites, 2000-2004)
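The naming rules above can be checked mechanically before files are submitted. The following is a minimal sketch, not an archive requirement: the function name check_file_name and the regular expression are illustrative assumptions that encode only the character, length, and lower-case guidance given in this section.

```python
import re

# Letters, numbers, dashes, and underscores, followed by one or more
# dot-separated extensions (e.g. .csv or .tar.gz). The pattern is an
# illustrative assumption, not an official rule.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)+$")
MAX_LENGTH = 64  # suggested upper limit from the guidance above


def check_file_name(name):
    """Return a list of problems found in a data file name (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 64 characters")
    if not NAME_PATTERN.match(name):
        problems.append("contains spaces or special characters, or lacks an extension")
    if name != name.lower():
        problems.append("contains upper-case letters (lower case is preferred)")
    return problems


for candidate in [
    "c130_a792_20000916.csv",
    "WBW_veg_inventory_all_20050304.csv",
    "my data (final).dat",
]:
    print(candidate, check_file_name(candidate))
```

Because lower case is a preference rather than a rule, the upper-case message is best read as a warning; the Walker Branch example name is otherwise acceptable.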
In choosing a file format, data collectors should select a consistent format that can be read well into the future and is independent of changes in applications.
Tabular Data:
Using ASCII file formats is the best way to ensure that measurement data are readable in the future.
At the top of the file, include several header rows.
Within the ASCII file, follow these guidelines.
In the data set documentation, specifically add the following data file information.
Image (Raster) Data:
Some field researchers may generate Image (Raster) data sets. Below are some guidelines/recommendations for archiving these types of data files.
Suggested Non-Proprietary File Formats: (Listed in order of our preference. See file extension reference, Appendix A.)
If you cannot use any of the above formats, another option is to use any non-proprietary public domain data format. Whatever file format you use, be sure to thoroughly document the format and follow the suggested guidelines.
Guidelines for documenting image data files:
Proprietary Software Data Formats:
Data that are provided in a proprietary software format must include documentation of the software specifications (i.e., Software Package, Version, Vendor, and native platform). The archive data center will use this information to convert to a non-proprietary format for the archive.
All file types that constitute the complete geographic data format documentation must be provided for the specific software package. For example:
idrisi software images -- provide the *.rdc and the *.rst files [http://www.clarklabs.org/]
Image (Vector) Data:
Below are suggested vector file formats. These are mostly proprietary data formats; please be sure to document the Software Package, Version, Vendor, and native platform.
Also make sure that the vectors are properly geo-referenced and that the geometry type (Point, Line, Polygon, Multipoint, etc.) is specified.
File Extension Reference Table
A table of common file extensions and their generally accepted formats is provided in Appendix A.
On-line Resources:
In order for others to use your data, they must fully understand the contents of the data set, including the parameter names, units of measure, formats, and definitions of coded values. Provide the English language translation of any data values and descriptors that are in another language (e.g., coded fields, variable classes, and GIS coverage attributes).
Parameter Name: The parameters reported in the data set need to have names that describe the contents. The documentation should contain a full description of the parameter. Use commonly accepted parameter names, for example, Temp for temperature, Precip for precipitation, and Lat and Long for latitude and longitude. See the online references in the Bibliography for additional examples. Also, be sure to use consistent capitalization (not temp, Temp, and TEMP) and use only letters, numerals, and underscores in the parameter name.
Units: The units of reported parameters need to be explicitly stated in the data file and in the documentation. We recommend SI units but recognize that each discipline may have its own commonly used units of measure. The critical aspect here is that the units be defined in the documentation so that others understand what is reported.
Formats: Within each data set, choose a format for each parameter, explain the format in the documentation, and use that format throughout the data set. Consistent formats are particularly important for dates, times, and spatial coordinates. For numeric parameters, if the number of decimal places should be preserved to indicate significant digits, then explicitly define the format such that users may take precautions to ensure that significant figures are not lost or gained during data transformations.
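The point about significant figures can be illustrated with a short sketch. It assumes a hypothetical value written as 0.670 in a data file and shows how the trailing zero is lost when the text is read as a binary floating-point number, and two ways to keep it.

```python
from decimal import Decimal

reported = "0.670"  # hypothetical value as written in the data file

as_float = float(reported)      # binary floating point; formatting is not retained
as_decimal = Decimal(reported)  # decimal type keeps the digits as written

print(str(as_float))      # the trailing zero is dropped
print(str(as_decimal))    # the trailing zero is kept
print(f"{as_float:.3f}")  # an explicit, documented format restores three decimals
```

Documenting the format (here, three decimal places) is what allows a later user to choose the explicit format in the last line.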
We recommend the following formats for common parameters:
Dates: yyyy-mm-dd or yyyymmdd, e.g., January 2, 1997 is 19970102.
Time: Use 24-hour notation (13:30 hrs instead of 1:30 p.m. and 04:30 instead of 4:30 a.m.). Report in both local time and Coordinated Universal Time (UTC). Include local time zone in a separate field. As appropriate, both the begin time and end time should be reported in both local and UTC time. Because UTC and local time may be on different days, we suggest that dates be given for each time reported. Applicable date and time standards are listed in Appendix B.
Spatial Coordinates: Spatial coordinates should be recorded in decimal degrees format to at least 4 (preferably 5 or 6) digits past the decimal point. Provide latitude and longitude with south latitude and west longitude recorded as negative values, e.g., 80° 30' 00" W longitude is -80.5000. Make sure that all location information in a file uses the same coordinate system, including coordinate type, datum, and spheroid. Document all three of these characteristics (e.g., Lat/Long decimal degrees, NAD83 (North American Datum of 1983), WGS84 (World Geodetic System of 1984)). Mixing coordinate systems [e.g., NAD83 and NAD27 (North American Datum of 1927)] will cause errors in any geographic analysis of the data. Applicable spatial coordinate standards are listed in Appendix C.
Elevation: Provide elevation in meters. Include detailed information on the vertical datum used (e.g., North American Vertical Datum 1988 (NAVD 1988) or Australian Height Datum (AHD)). Additional information on vertical datums is included in Appendix D.
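The recommended date, time, and coordinate formats can be produced with standard library calls. The sketch below is illustrative only: the helper names dms_to_decimal and time_fields, the field names in the returned dictionary, and the fixed UTC-5 offset labeled EST are assumptions made for the example.

```python
from datetime import date, datetime, timedelta, timezone


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def time_fields(local_dt):
    """Return local and UTC date/time fields for a time-zone-aware datetime."""
    utc_dt = local_dt.astimezone(timezone.utc)
    return {
        "date_local": local_dt.strftime("%Y-%m-%d"),
        "time_local": local_dt.strftime("%H:%M"),  # 24-hour notation
        "time_zone": local_dt.tzname(),            # kept in a separate field
        "date_utc": utc_dt.strftime("%Y-%m-%d"),
        "time_utc": utc_dt.strftime("%H:%M"),
    }


# Assumed local zone for the example: a fixed offset of UTC-5 labeled "EST".
est = timezone(timedelta(hours=-5), "EST")

date_text = date(1997, 1, 2).strftime("%Y%m%d")
fields = time_fields(datetime(1997, 1, 2, 21, 30, tzinfo=est))
longitude_text = f"{dms_to_decimal(80, 30, 0, 'W'):.4f}"

print(date_text)
print(fields)
print(longitude_text)
```

In this example the local evening measurement falls on the following day in UTC, which is why a date is reported alongside each time.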
Coded Fields:
Coded fields, as opposed to free text fields, often have standardized lists of predefined values from which the data provider may choose. Two good examples are U.S. state abbreviations and postal zip codes. Data collectors may establish their own coded fields with defined values to be consistently used across several data files. The use of consistent sampling site designations is a good application. Coded fields are more efficient for storage and retrieval of data than free text fields.
Guidance for two specific coded fields commonly used in environmental data files:
Data Flag or Qualifying Values: A separate field with specified values may be used to provide additional information about the measured data value including, for example, quality considerations, reasons for missing values, or indicating replicated samples. Codes should not be parameter specific but should be consistent across parameters and data files. Definitions of flag codes should be included in the accompanying data set documentation.
Example documentation of Data Quality Flag values:
Flag Value | Description
V0 | Valid value
V1 | Valid value but comprised wholly or partially of below detection limit data
V2 | Valid estimated value
V3 | Valid interpolated value
V4 | Valid value despite failing to meet some QC or statistical criteria
V5 | Valid value but qualified because of possible contamination (e.g., pollution source, laboratory contamination source)
V6 | Valid value but qualified due to non-standard sampling conditions (e.g., instrument malfunction, sample handling)
V7 | Valid value but set equal to the detection limit (DL) because the measured value was below the DL
M1 | Missing value because no value is available
M2 | Missing value because invalidated by data originator
H1 | Historical data that have not been assessed or validated
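A documented flag list also makes it possible to check that a data file uses only defined codes. The sketch below copies the flag codes from the example above; the function name undefined_flags and the sample flag column are hypothetical.

```python
from collections import Counter

# Flag codes and descriptions copied from the example documentation above.
FLAG_DEFINITIONS = {
    "V0": "Valid value",
    "V1": "Valid value but comprised wholly or partially of below detection limit data",
    "V2": "Valid estimated value",
    "V3": "Valid interpolated value",
    "V4": "Valid value despite failing to meet some QC or statistical criteria",
    "V5": "Valid value but qualified because of possible contamination",
    "V6": "Valid value but qualified due to non-standard sampling conditions",
    "V7": "Valid value but set equal to the detection limit (DL)",
    "M1": "Missing value because no value is available",
    "M2": "Missing value because invalidated by data originator",
    "H1": "Historical data that have not been assessed or validated",
}


def undefined_flags(flags):
    """Return the sorted flag codes that do not appear in the documentation."""
    return sorted(set(flags) - set(FLAG_DEFINITIONS))


# Hypothetical flag column read from a data file.
flag_column = ["V0", "V0", "M1", "V3", "X9", "V0"]

print(undefined_flags(flag_column))  # codes that need a definition or a correction
print(Counter(flag_column))          # how often each code is used
```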
Units: While data collectors can generally agree on the units for reporting measured parameters, the exact syntax of the units designation varies widely among programs, projects, scientific communities, and investigators (if standardized at all). If a shorthand notation is reported in the data file, the complete units should be spelled out in the documentation so that others can understand and interpret your representation of subscripts, superscripts, area, time intervals, etc.
Missing Values: Use a specified extreme value not likely to ever be confused with a measured value (e.g., -9999). Consistently use the same notation for each missing value in the data file.
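A consistently used missing-value code is easy for a data user to handle, as the following sketch suggests. The file contents, column names, and the function read_values are hypothetical; the only element taken from the guidance is the use of a single extreme code such as -9999.

```python
import csv
import io

MISSING = "-9999"  # missing-value code documented for this hypothetical file

sample = io.StringIO(
    "SITE,DATE,TEMP\n"
    "a1,19970102,12.5\n"
    "a1,19970103,-9999\n"
    "a2,19970102,11.0\n"
)


def read_values(handle, column, missing=MISSING):
    """Return one column as floats, with the missing-value code mapped to None."""
    values = []
    for row in csv.DictReader(handle):
        text = row[column].strip()
        values.append(None if text == missing else float(text))
    return values


temps = read_values(sample, "TEMP")
measured = [t for t in temps if t is not None]
mean_temp = sum(measured) / len(measured)  # the code never enters the average

print(temps, mean_temp)
```

If the same file mixed several notations (for example -9999, NA, and blank cells), each would need its own handling, which is the reason for using one notation throughout.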
Typical Parameter Documentation:
The following text describes the parameters in a data set; this type of description should be included in the data set documentation.
Data File Contents: (kt_tree_data.csv) The files are in comma-delimited ASCII format, with the first line listing the data set, author, and date. The data records follow and are described in the table below. A value of -9.99 indicates no data.
Column | Description | Units/Format
SITE | k=Kataba forest, p=Pandamatenga, m=Near Maun, e=HOORC/MPG Maun tower, o=Okwa river crossing, t=Tshane, skukuza=Skukuza Flux Tower | text
SPECIES | Scientific name up to 25 characters | text
DATE | Date of measurement | yyyymmdd
BA | Woody plant basal area | m2/ha
SEBA | Standard error of BA | m2/ha
DENSITY | Woody plant density (number of trees per hectare) | number/ha
SEDEN | Standard error of DENSITY (n=42 for KT, n=49 for Skukuza) | number/ha
STEMS | Number of stems per hectare (/ha) | number/ha
HEIGHT | Basal area-weighted average height | m2/ha
WOOD | Aboveground woody plant wood dry biomass | kg/ha
LEAF | Aboveground woody plant leaf dry biomass | kg/ha
LAI | Leaf Area Index calculated by allometry | m2/m2
[ Adapted from Scholes, R. J. 2005. SAFARI 2000 Woody Vegetation Characteristics of Kalahari and Skukuza Sites. Data set. Available on-line [http://daac.ornl.gov/] from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. ]
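A parameter table like the one above can double as a checklist for the data file itself. The sketch below compares a file's column names with the documented names and applies the letters, numerals, and underscores rule for parameter names; the function compare_header and the sample header (with one deliberately mis-capitalized name) are hypothetical.

```python
import re

# Column names taken from the kt_tree_data.csv description above.
DOCUMENTED = [
    "SITE", "SPECIES", "DATE", "BA", "SEBA", "DENSITY",
    "SEDEN", "STEMS", "HEIGHT", "WOOD", "LEAF", "LAI",
]

VALID_NAME = re.compile(r"^[A-Za-z0-9_]+$")  # letters, numerals, underscores only


def compare_header(header, documented=DOCUMENTED):
    """Compare a file's column names with the documented parameter names."""
    return {
        "undocumented": [c for c in header if c not in documented],
        "absent": [c for c in documented if c not in header],
        "bad_names": [c for c in header if not VALID_NAME.match(c)],
    }


# Hypothetical header line with inconsistent capitalization of one name.
header = "SITE,SPECIES,DATE,BA,SEBA,Density,SEDEN,STEMS,HEIGHT,WOOD,LEAF,LAI".split(",")
report = compare_header(header)
print(report)
```

The comparison is case-sensitive on purpose, so that Density and DENSITY are reported as a mismatch rather than silently treated as the same parameter.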
Data File Contents: (NARSTO_EPA_SS_HOUSTON_FRASER_ORG_SPEC_24HR_V1.txt)
COLUMN NAME | NAME TYPE | CAS IDENTIFIER | UNITS | FORMAT TYPE | FORMAT FOR DISPLAY | MISSING CODE | OBSERVATION TYPE | SAMPLE PREPARATION | BLANK CORRECTION
Site ID: standard | Variable | None | None | Char | 12 | None | Supplementary data | Not applicable | Not applicable
Date start: local time | Variable | None | yyyy/mm/dd | Date | 10 | None | Supplementary data | Not applicable | Not applicable
Time start: local time | Variable | None | hh:mm | Time | 5 | None | Supplementary data | Not applicable | Not applicable
Date end: local time | Variable | None | yyyy/mm/dd | Date | 10 | None | Supplementary data | Not applicable | Not applicable
Time end: local time | Variable | None | hh:mm | Time | 5 | None | Supplementary data | Not applicable | Not applicable
Time zone: local | Variable | None | None | Char | 3 | None | Supplementary data | Not applicable | Not applicable
Date start: UTC | Variable | None | yyyy/mm/dd | Date | 10 | None | Supplementary data | Not applicable | Not applicable
Time start: UTC | Variable | None | hh:mm | Time | 5 | None | Supplementary data | Not applicable | Not applicable
Date end: UTC | Variable | None | yyyy/mm/dd | Date | 10 | None | Supplementary data | Not applicable | Not applicable
Time end: UTC | Variable | None | hh:mm | Time | 5 | None | Supplementary data | Not applicable | Not applicable
Fluoranthene | Variable | C206-44-0 | ng/m3 (nanogram per cubic meter) | Decimal | 8.2 | -999.99 | Particles | Organic extraction | Blank corrected
Fluoranthene | Flag | C206-44-0 | None | Char | 2 | None | Particles | Organic extraction | Blank corrected
Pyrene | Variable | C129-00-0 | ng/m3 (nanogram per cubic meter) | Decimal | 8.2 | -999.99 | Particles | Organic extraction | Blank corrected
Pyrene | Flag | C129-00-0 | None | Char | 2 | None | Particles | Organic extraction | Blank corrected
Benz[a]anthracene | Variable | C56-55-3 | ng/m3 (nanogram per cubic meter) | Decimal | 8.2 | -999.99 | Particles | Organic extraction | Blank corrected
Benz[a]anthracene | Flag | C56-55-3 | None | Char | 2 | None | Particles | Organic extraction | Blank corrected
[ Adapted from Fraser, Matthew. 2003. NARSTO EPA_SS_HOUSTON TEXAQS2000 PM2.5 Organic Speciation Data. Available on-line (http://eosweb.larc.nasa.gov/PRODOCS/narsto/table_narsto.html) at the Langley DAAC, Hampton, Virginia, U.S.A. ]
We recommend that you organize the data within a file in one of two ways. Whichever style you use, be sure to place each observation in a separate line (row). Most often each row in a file represents a complete record, and the columns represent all the parameters that make up the record. This arrangement is similar to a spreadsheet or matrix. For example:
SAFARI 2000 Plant and Soil C and N Isotopes, Southern Africa, 1995-2000
SITE,COUNTRY,LAT,LONG,DATE,START_DEPTH,END_DEPTH,CHARACTERISTICS,C,N,d13C,d15N
units,none,decimal degrees,decimal degrees,yyyy/mm/dd,cm,cm,none,percent,percent,per mil,per mil
USGS-1,Botswana,-21.62,27.37,1999/07/12,5,20,Hardveld,0.67,0.052,-17,8.9
USGS-2,Botswana,-21.07,27.42,1999/07/12,5,20,Hardveld,0.68,0.063,-18.3,8
USGS-3,Botswana,-20.72,26.83,1999/07/12,5,20,Hardveld,0.94,0.087,-17,6.8
USGS-4,Botswana,-20.52,26.41,1999/07/12,5,20,Hardveld,0.53,0.04,-19.9,5.5
USGS-5,Botswana,-20.55,26.15,1999/07/12,5,20,Lacustrine,2.11,0.162,-15.2,5.9
...
USGS-30,Botswana,-19.81,23.63,1999/07/18,5,20,Alluvium,0.67,0.063,-19.2,11.8
USGS-31,Botswana,-20.62,22.74,1999/07/18,5,20,Hardveld,0.23,0.014,-16.8,16.2
USGS-32,Botswana,-21.06,22.4,1999/07/18,5,20,Hardveld,0.39,0.028,-20.9,9.5
USGS-33,Botswana,-22.01,21.37,1999/07/19,5,20,Sandveld,0.19,0.01,-17.9,9.1
USGS-34,Botswana,-22.99,22.18,1999/07/19,5,20,Sandveld,0.16,0.006,-19.7,8.7
USGS-35,Botswana,-23.7,22.8,1999/07/19,5,20,Sandveld,0.37,0.019,-20.7,15.2
[ From: Aranibar, J. N., and S. A. Macko. 2005. SAFARI 2000 Plant and Soil C and N Isotopes, Southern Africa, 1995-2000. Data set. Available on-line [http://daac.ornl.gov/] from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A. ]
If you use a coded value or abbreviation for a site or station, be sure to provide a definition, including spatial coordinates, in the documentation.
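A file laid out with a title line, a row of column names, and a row of units can be read with a few lines of code, which is part of what makes the row-per-record arrangement convenient. The sketch below uses the first two records of the SAFARI 2000 example; the function name read_with_header_rows is an illustrative assumption.

```python
import csv
import io

text = """SAFARI 2000 Plant and Soil C and N Isotopes, Southern Africa, 1995-2000
SITE,COUNTRY,LAT,LONG,DATE,START_DEPTH,END_DEPTH,CHARACTERISTICS,C,N,d13C,d15N
units,none,decimal degrees,decimal degrees,yyyy/mm/dd,cm,cm,none,percent,percent,per mil,per mil
USGS-1,Botswana,-21.62,27.37,1999/07/12,5,20,Hardveld,0.67,0.052,-17,8.9
USGS-2,Botswana,-21.07,27.42,1999/07/12,5,20,Hardveld,0.68,0.063,-18.3,8
"""


def read_with_header_rows(handle):
    """Return (title, units by column, records) from a file laid out as above."""
    # The title contains commas, so it is read as plain text, not as CSV.
    title = handle.readline().strip()
    reader = csv.reader(handle)
    names = next(reader)                    # second line: column names
    units = dict(zip(names, next(reader)))  # third line: units row
    records = [dict(zip(names, row)) for row in reader if row]
    return title, units, records


title, units, records = read_with_header_rows(io.StringIO(text))
print(title)
print(units["LAT"], units["d13C"])
print(records[0]["SITE"], float(records[0]["LAT"]))
```

Note that the first cell of the units row holds the label "units" rather than a unit for SITE, so a reader of the file needs the documentation to know how many header rows to expect.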
A second arrangement may be more efficient when most records do not have measurements for most parameters, that is, a very sparse matrix of data, with many missing values. In this arrangement, one column is used to define the parameter and another column is used for the value of the parameter. Other columns may be used for site, date, treatment, units of measure, etc. For example:
Coast redwood NPP data from Humboldt Redwoods State Park, California, USA; Busing & Fujimori, June 2005
Old stand plot study at Bull Creek with bole diameter measurements at 1.7 m aboveground in 1972 and 2001
Orig_sort_order | Parameter | Measurement_Type | Value | Units | Species | Sequoia_sp_grav | Equation
1 | Latitude | Site Characteristics | 40.35 | decimal degree | Not applicable | -999.9 | Not applicable
2 | Longitude | Site Characteristics | -123 | decimal degree | Not applicable | -999.9 | Not applicable
3 | Terrain | Site Characteristics | Alluvial flat | Not applicable | Not applicable | -999.9 | Not applicable
4 | Slope | Site Characteristics | 0 | degree | Not applicable | -999.9 | Not applicable
5 | Elevation (above mean sea level) | Site Characteristics | 80 | m (meter) | Not applicable | -999.9 | Not applicable
6 | Total site area | Site Characteristics | 1.44 | ha (hectare) | Not applicable | -999.9 | Not applicable
7 | Density | Density | 380 | stems/ha (stems per hectare) | All species | -999.9 | Not applicable
8 | Basal area | Area | 330 | m2/ha (square meter per hectare) | All species | -999.9 | Not applicable
9 | Basal area | Area | 329 | m2/ha (square meter per hectare) | Sequoia | -999.9 | Not applicable
...
123 | Total tree ANPP | ANPP | 581-697 | g/m2/yr (gram per square meter per year) | All species | 0.33 | eq. 2 estimates
124 | Total tree ANPP | ANPP | 669-802 | g/m2/yr (gram per square meter per year) | All species | 0.38 | eq. 2 estimates
Sequoia_sp_grav: *Specific gravity, 0.33 mg/cm3, see WE Westman & RH Whittaker, 1975, J. Ecol. for details.
Sequoia_sp_grav: ^Specific gravity, 0.38 mg/cm3, from DW Green et al., 1999, USDA Forest Service FPL-GTR-113.
Method: **Calculations & allometric equations described by RT Busing & T Fujimori, 2005, Plant Ecol.
Notes: ***Range of values results from min. & max. estimation ratios of WE Westman & RH Whittaker, 1975, J. Ecol.
From: Busing, R. T., and T. Fujimori. 2005. NPP Temperate Forest: Humboldt Redwoods State Park, California, U.S.A., 1972-2001. Data set. Available on-line [http://www.daac.ornl.gov] from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A.
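The relationship between the two arrangements can be sketched in code. The example below converts a small, sparse row-per-record table into the parameter/value arrangement, writing a row only where a measurement exists. All site names, values, units, and the -9999 missing-value code are hypothetical, as is the function name to_long.

```python
MISSING = "-9999"  # hypothetical missing-value code

# Hypothetical sparse records: most parameters are missing at each site.
wide = [
    {"SITE": "s1", "DATE": "20050601", "LAI": "4.2", "BA": "-9999", "HEIGHT": "-9999"},
    {"SITE": "s2", "DATE": "20050601", "LAI": "-9999", "BA": "33.0", "HEIGHT": "-9999"},
]
UNITS = {"LAI": "m2/m2", "BA": "m2/ha", "HEIGHT": "m"}
KEYS = ("SITE", "DATE")


def to_long(records, keys=KEYS, units=UNITS, missing=MISSING):
    """Emit one row per measured value; missing values are simply not written."""
    rows = []
    for record in records:
        for parameter, unit in units.items():
            value = record[parameter]
            if value != missing:
                row = {k: record[k] for k in keys}
                row.update({"Parameter": parameter, "Value": value, "Units": unit})
                rows.append(row)
    return rows


long_rows = to_long(wide)
for row in long_rows:
    print(row)
```

Six cells in the row-per-record table, four of them missing, become two rows in the second arrangement, which is the efficiency the text describes for very sparse data.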
Keep Similar Information Together
An important issue with data organization is the number of records in each file (file size). Several factors determine the optimal number of records in a file, and there are no hard and fast rules. In general, keep a set of similar measurements together (e.g., same investigator, methods, time basis, and instruments) in one data file. Please do not break up your data into many small files, e.g., by month or by site if you are working with several months or sites. Instead, make month or site a parameter and have all the data in one large file. Researchers who later use your relatively large data file won't have to process many small files individually. There is an upper limit to the size of files, though. Large files (on the order of several tens of thousands of records, or several tens of megabytes) do become unwieldy and may be too large for some applications. For example, Excel 2003 supports only 65,536 rows and 256 columns of data. (Excel 2007 may eliminate these limitations.) Large tabular data files may need to be broken into logical smaller files.
Organization by Data Type
If you are collecting many observations of several different types of measurements at a site (e.g., leaf area index and above- and belowground biomass), place each type of measurement in a separate data file. For each data file, use similar data organization, parameter formats, and common site names, so that users understand the interrelationships between data files.
Data types collected on different time bases (e.g., per hour, per day, per year) might be handled more efficiently in separate files.
Alternatively, if relatively few observations are made at a site for a suite of parameters, then all data could be placed in one file. Thorough data set documentation would be needed.
In addition to scientific quality assurance (QA), we suggest that you perform basic data QA on the data files. These checks complement the Tabular and Image file preparation guidance provided in Section 2.
Tabular Data
Image Vector and Raster Data
For GIS image/vector files, ensure that the projection parameters have been accurately given. Additional information such as data type, scale, corner coordinates, missing data value, size of image, number of bands, and endian type should also be checked for accuracy.
A checklist with suggested reviews for spatial data file attributes and accompanying documentation is included in Appendix E.
We recommend that data set titles be as descriptive as possible. When giving titles to your data sets and associated documentation, please be aware that these data sets may be accessed many years in the future by people who will be unaware of the details of the project.
Data set titles should contain the type of data and other information such as the date range, the location, and the instruments used. If your data set is part of a larger field project, you may want to add that name, too (e.g., SAFARI 2000 or LBA-ECO). In addition, we recommend that the length of the title be restricted to 80 characters (spaces included) to be compatible with other clearinghouses of ecological and global change data collections. Names should contain only numbers, letters, dashes, underscores and spaces -- no special characters. The data set title should be similar to the name(s) of the data file(s) in the data set (see Sect. 1). A given data set might contain only one data file or many thousands of data files.
Some bad titles:
Characteristics:
The documentation accompanying your data set should be written for a user 20 years into the future. Therefore you should consider what that investigator needs to know to use your data. Write the document for a user who is unfamiliar with your project, sites, methods, or observations.
To ensure that documentation can be read well into the future, it must be in a stable non-proprietary format. If figures, maps, equations, or pictures need to be included, use a non-proprietary document format such as html (hypertext markup language). Images, figures, and pictures may be included as individual gif (graphics interchange format) or jpg (Joint Photographic Experts Group) files. Converting documents to a stable proprietary format, such as Adobe pdf (portable document format) files, is also a good choice.
The documentation should be in a separate file that is identified in the data file. The name of the documentation file should be similar to the name of the data set.
The data set documentation should provide the following information:
Documentation can never be too complete.
Kanciruk, P., R.J. Olson, and R.A. McCord. 1986. Quality Control in Research Databases: The US Environmental Protection Agency National Surface Water Survey Experience. In: W.K. Michener (ed.). Research Data Management in the Ecological Sciences. The Belle W. Baruch Library in Marine Science, No. 16, 193-207.
Michener, W. K., J. W. Brunt, J. Helly, T. B. Kirchner, and S. G. Stafford. 1997. Non-Geospatial Metadata for Ecology. Ecological Applications. 7:330-342.
Michener, W.K. and J.W. Brunt (ed.). 2000. Ecological Data: Design, Management and Processing, Methods in Ecology, Blackwell Science. 180p.
Michener, W. K. 2006. Meta-information concepts for ecological data management. Ecological Informatics. 1:3-7.
Cook, Robert B, Richard J. Olson, Paul Kanciruk, and Leslie A. Hook. 2001. Best Practices for Preparing Ecological Data Sets to Share and Archive. Bulletin of the Ecological Society of America, Vol. 82, No. 2, April 2001.
Funding high-throughput data sharing. Nature Biotechnology 22:1179-1183. doi:10.1038/nbt0904-1179.
USGS. 2000. Metadata in plain language. Available on-line at: http://geology.usgs.gov/tools/metadata/tools/doc/ctc/
Christensen, S. W. and L. A. Hook. 2007. NARSTO Data and Metadata Reporting Guidance. Provides reference tables of chemical, physical, and metadata variable names for atmospheric measurements. Available on-line at: http://cdiac.ornl.gov/programs/NARSTO/
U.S. EPA. 2007. Environmental Protection Agency Substance Registry System (SRS). SRS provides information on substances and organisms and how they are represented in the EPA information systems. Available on-line at: http://www.epa.gov/srs/
Olsen, L.M., G. Major, K. Shein, J. Scialdone, R. Vogel, S. Leicester, H. Weir, S. Ritz, T. Stevens, M. Meaux, C.Solomon, R. Bilodeau, M. Holland, T. Northcutt, and R. A. Restrepo. 2007. NASA/Global Change Master Directory (GCMD) Earth Science Keywords. Version 6.0.0.0.0. Available on-line at: http://gcmd.gsfc.nasa.gov/Resources/valids/archives/keyword_list.html
1Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725. This work was sponsored by the U.S. National Aeronautics and Space Administration, Earth Science Data and Information Systems Project.
Appendix A
Suggested image and GIS data file formats suitable for long-term archiving.
File Extension Reference Table
File Extension | File Format Description
.asc | ASCII Text
.avl | ESRI ArcView 3.x legend file
.bil | band interleaved by line raster image file
.bip | band interleaved by pixel raster image file
.blw | world file for .bil image file
.bmp | Windows OS/2 Bitmap Graphics
.bpw | world file for .bip/.bmp image file
.bqw | world file for .bsq image file
.bsq | band sequential raster image file
.csv | comma-separated values
.dbf | ARCVIEW shape file dbase tabular data file
.doc | Microsoft Word document
.e00 | ESRI Arc/Info export file
.evf | ENVI vector file
.flt | Binary Floating Point file - Similar to ASCII grid file with Float data values
.hdf | HDF is a physical file format for storing scientific data. It features a collection of tools for writing, manipulating, viewing, and analyzing data across diverse computing platforms. HDF-EOS supports three geospatial data types (grid, point, and swath), providing uniform access to diverse data types in a geospatial context. The HDF-EOS software library allows a user to query or subset the contents of a file by earth coordinates and time (if there is a spatial dimension in the data). Tools that process standard HDF files will also read HDF-EOS files; however, standard HDF library calls cannot access geolocation data, time data, and product metadata as easily as with HDF-EOS library calls. http://www.hdfeos.org/index.php
.hdr | ENVI software image file data format documentation
.img | Raster Image (Many format types)
.gif | Graphics Interchange Format
.gz | file compressed using the Gnu GZIP algorithm. Compressed files will typically have the .gz appended to the original file extension.
.jpg (.jpeg) | Joint Photographic Experts Group raster image
.nc | NetCDF (network Common Data Form) [ http://www.unidata.ucar.edu/software/netcdf/ ]
.png | Portable Network Graphic raster image ( http://www.libpng.org/pub/png/ )
.ppt | Microsoft PowerPoint presentation
.pdf | Adobe Acrobat document
.prj | ARCVIEW shape file projection information, which is a text file that you can read.
.rdc | idrisi software image file data format documentation
.rrd | IMAGINE software image file data format documentation
.rst | idrisi software image file data format documentation
.sbn | ARCVIEW shape file spatial index for read-write shapefiles
.sbx | ARCVIEW shape file spatial index for read-write shapefiles
.shp | ARCVIEW shape file feature geometry
.shx | ARCVIEW shape file lookup index
.tfw | TIF world file of projection information
.tif (.tiff) | Tagged Image File Format raster image
.tiff | GeoTIFF -- Geographic tagged image file format [ http://www.remotesensing.org/geotiff/geotiff.html ]
.txt | Text file
Appendix B
Applicable date and time standards suitable for long-term archiving of environmental data.
Applicable Date and Time Standards
The ISO 8601 international standard date notation is YYYY-MM-DD: ISO 8601 uses the 24-hour clock system that is used by most of the world.
References
Appendix C
Applicable spatial coordinate standards suitable for long-term archiving of image and GIS data.
Applicable Spatial Coordinate Standards
Global Positioning System derived coordinates may use additional reference datum:
Applicable Standards
Appendix D
Additional information for reporting elevations.
Additional Information on Vertical Datum
Vertical Datum
Vertical datums are a considerable challenge for cartographers in the marine world. Ultimately all datasets should refer all depths to WGS84 Datum (or equivalent) to create a seamless database. This is relatively straightforward for land data as geoidal models can be used to derive the separation between local land datum and a global reference surface. However, Chart Datum, to which all soundings are referred, is not a coherent surface. It is certainly not easy to model. (www.hydrographicsociety.org/PDF/Journal-113-Article2.pdf)
The National Geodetic Survey (NGS) develops and maintains the current national geodetic vertical datum, NAVD 88. In addition, NGS provides the relationships between past and current geodetic vertical datums, e.g., NGVD 29 and NAVD 88. However, another part of our parent organization, NOS (National Ocean Service), is the Center for Operational Oceanographic Products and Services (CO-OPS). CO-OPS publishes tidal bench mark information and the relationship between NAVD 88 and various water level/tidal datums (e.g., Mean Lower Low Water, Mean High Water, Mean Tide Level, etc.). (http://www.ngs.noaa.gov/faq.shtml)
Appendix E
Following is a checklist for the quality assurance of image vector and raster data with suggested reviews for data file attributes and accompanying documentation.
For GIS image/vector files, ensure that the projection parameters have been accurately given. Additional information such as data type, scale, corner coordinates, missing data value, size of image, number of bands, and endian type should also be checked for accuracy.
Checklist for Image Vector and Raster Data File Characteristics:
Checklist for Image Vector and Raster Data File Content: