Evaluating Carbon Model Data Needs and Potential "Fits" with ARM Data Products

modified from Sarmiento and Gruber 2002

Before carbon modelers can "borrow" ARM data for their own purposes, they need a general understanding of both the conceptual philosophy of the ARM experimental design and the hierarchical terminology used with ARM datasets.

Spatial Organization and Sampling Design for ARM Data

Data generated by ARM represent a single geographic point rather than a geographic line or area. Consistent with its mission to support and improve climate modeling, all ARM points at a single site are meant to represent a single cell of a General Circulation Model (GCM). Most GCMs simulate the atmosphere as a series of "single columns" with vertical structure located above a cell location on the earth's surface. Although the atmosphere is dynamic, the ARM instruments always sample the same conceptual column, which is how many popular GCMs represent the atmosphere. At the SGP site, the ARM sampling design places instruments both within the "core" of the conceptual single cell (central, intermediate, and extended facilities) and in areas just outside the boundaries of the GCM cell (boundary facilities).

The geographic locations sampled by ARM are structured in a two-level hierarchy:

Sites - large-scale "single column" locations in the Southern Great Plains (SGP), Tropical Western Pacific (TWP), and North Slope of Alaska (NSA). The TWP and NSA sites are conceptually ONLY central facilities.
Facilities - similar instrument installations at different geographic locations within a site. The SGP site has central, extended, boundary, and intermediate facilities. The TWP site has installations on Manus and Nauru Islands. A third site for operational staging is now implemented at Darwin, Australia. The NSA site includes installations at Barrow and Atqasuk.

The following table shows the major dimensions of the ARM data structure.

Dimension	Number of values	Explanation
Site	3 values	The major field locations operated by ARM: Southern Great Plains (SGP), North Slope of Alaska (NSA), Tropical Western Pacific (TWP)
Facility	~30 values for SGP in 5 categories	Central, Extended, Boundary, Intermediate, External
Time (day, month, or year)	Currently spans 10 years and will continue long-term	Many values possible depending on the time resolution chosen
Data stream	~2000 values	Can be partitioned by data levels or facility; can be partitioned by source (> 100 types of instruments and algorithms)

ARM data are stored and archived in machine-independent binary NetCDF format.
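As a rough illustration of how the Site and Data stream dimensions show up in practice, the sketch below splits a data stream name such as sgp1smos.a0 (used later on this page) into a site code, a source part, and a data level. The layout <site><source>[.<level>] is an assumption inferred from the examples on this page, not an ARM specification, and the names StreamName and parse_stream_name are hypothetical.

```python
from typing import NamedTuple

# Site codes named in the text; any other prefix is rejected.
KNOWN_SITES = {
    "sgp": "Southern Great Plains",
    "twp": "Tropical Western Pacific",
    "nsa": "North Slope of Alaska",
}


class StreamName(NamedTuple):
    site: str    # three-letter site code, e.g. "sgp"
    source: str  # instrument or algorithm part, e.g. "1smos"
    level: str   # data level suffix, e.g. "a0" ("" if absent)


def parse_stream_name(name: str) -> StreamName:
    """Split a name such as 'sgp1smos.a0' into site, source, and level.

    ASSUMPTION: the layout <site><source>[.<level>] is inferred from the
    examples in this document, not taken from an ARM specification.
    """
    base, _, level = name.partition(".")
    site, source = base[:3].lower(), base[3:]
    if site not in KNOWN_SITES:
        raise ValueError(f"unknown site code in {name!r}")
    return StreamName(site, source, level)
```

Grouping file names by the parsed site or level is one simple way to partition a large collection of data streams before any NetCDF file is opened.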

The SGP site may be of particular interest to carbon modelers, due to the large number of facilities and instrumentation, and the long history of data available there. Soils and vegetation vary across SGP facilities, and the nominative land cover for SGP has been characterized via remote sensing. Soil texture and Land Use/Land Cover maps for Oklahoma and Kansas are also available, as are Landsat and MTI satellite views of the SGP. We have prepared a special page on Soil Measurements for Carbon Science at ARM facilities.

Recognizing ARM Data Uses in Carbon Models

ARM data have the potential to improve estimates of carbon cycle fluxes and pools affected by exchanges between the atmosphere and the land surface, especially vegetation, soil, and detritus. In addition to standard meteorological forcing variables, ARM measures many aspects of the radiation budget as solar energy passes through the atmosphere to reach the land surface, including longwave and shortwave direct and diffuse radiation, and latent and sensible heat. All of these measurements reflect parameters which affect vegetation and decomposition.

Use of ARM Measurements as Forcing or Driver Variables

Perhaps the most obvious potential use for ARM data is to provide measured rather than estimated meteorological data which can be used to "drive" carbon cycle simulations. Most carbon simulations require weather data, which they use as "forcing" functions to control photosynthesis, respiration, decomposition, and other important carbon fluxes. Daily or hourly meteorological inputs are often needed, although the internal calculation timesteps used in these models may be even faster.

The following table shows a number of ARM data types that are clearly well-suited for use as driver or forcing variables in carbon models.

Measurement (units)	Summaries Needed	ARM data source :: measurement
Air Temperature (°C)	min, max, mean, std within the period	1 minute surface meteorology data (sgp1smos.a0) :: temp
Precipitation (mm)	total within the period	1 minute surface meteorology data (sgp1smos.a0) :: precip
Vapor Pressure (kPa)	min, max, average	1 minute surface meteorology data (sgp1smos.a0) :: vap_pres
Wind Speed (m/s)	min, max, average	1 minute surface meteorology data (sgp1smos.a0) :: wspd
Solar radiation (total incoming shortwave = direct + diffuse) (W/m2)	Hourly: average of W/m2. Daily: average of W/m2; total of J/m2. Monthly: average of daily J/m2; total of J/m2.	Solar and Infrared Radiation Station (sgpsirs and sgpsiros) :: short_direct_normal + down_short_diffuse_hemisp or down_short_hemisp
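The summaries listed in the table can be computed from high-frequency samples with a few lines of code. The following is a minimal sketch, assuming the samples for one day have already been read into plain Python lists, are evenly spaced, and contain no gaps; the function names are hypothetical and are not part of any ARM tool.

```python
import statistics

SECONDS_PER_DAY = 86_400


def summarize_temperature(samples_c):
    """Return min, max, mean, and sample standard deviation (deg C)."""
    return {
        "min": min(samples_c),
        "max": max(samples_c),
        "mean": statistics.fmean(samples_c),
        # stdev needs at least two samples; report 0.0 for a single value.
        "std": statistics.stdev(samples_c) if len(samples_c) > 1 else 0.0,
    }


def total_precipitation(samples_mm):
    """Total precipitation (mm) within the period."""
    return sum(samples_mm)


def daily_shortwave(samples_w_m2):
    """Return (daily mean flux in W/m2, daily total energy in J/m2).

    ASSUMPTION: samples are evenly spaced and cover the full 24 hours,
    so the total is the mean flux multiplied by the seconds in a day.
    """
    mean_flux = statistics.fmean(samples_w_m2)
    return mean_flux, mean_flux * SECONDS_PER_DAY
```

A daily mean flux of 100 W/m2, for example, corresponds to a daily total of 8.64 MJ/m2, since one watt is one joule per second.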

One complication with using ARM meteorological data as forcing functions for carbon models is that ARM facilities do not duplicate basic surface meteorological observations (SMOS) when a meteorological station operated by another agency (the Oklahoma MESONET or the Kansas MESONET) is located within 20 km. The Oklahoma and Kansas MESONET networks are subscription services and require a paid account to obtain data. Carbon modelers wishing to use ARM data at these locations would ordinarily have to subscribe, then obtain and merge the appropriate data from these external sources to build a complete matrix of driver measurements. As part of this project, we have prepared and are distributing statistical summaries of such merged data products, which are ready for immediate use as meteorological driver variables in carbon simulations.

The table below shows ARM facilities which do not measure surface meteorology, and the closest MESONET site which must be substituted:

ARM Facility without SMOS	Closest MESONET	Distance (km)
EF2	Hesston, KS	< 10
EF10	COPA	19.7
EF12	FORA	0.1
EF16	SEIL	16.7
EF18	OKMU	13.0
EF19	ELRE	2.0
EF22	BESS	9.1
EF26	NINN	11.3

The closest OK MESONET site to EF2 is NEWK, which is 161.4 km away in OK.
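The substitution table above amounts to a simple lookup when assembling driver data. The sketch below encodes it as a dictionary; the function name surface_met_source is hypothetical, and the code assumes that any facility not listed in the table measures surface meteorology itself.

```python
# Substitutions copied from the table above: facility -> (station, km).
# The distance for EF2 is given only as "< 10", so it is stored as None.
MESONET_SUBSTITUTE = {
    "EF2": ("Hesston, KS", None),
    "EF10": ("COPA", 19.7),
    "EF12": ("FORA", 0.1),
    "EF16": ("SEIL", 16.7),
    "EF18": ("OKMU", 13.0),
    "EF19": ("ELRE", 2.0),
    "EF22": ("BESS", 9.1),
    "EF26": ("NINN", 11.3),
}


def surface_met_source(facility):
    """Return ('MESONET', station) or ('ARM', facility) for a facility.

    ASSUMPTION: facilities missing from the table have their own SMOS.
    """
    if facility in MESONET_SUBSTITUTE:
        station, _distance_km = MESONET_SUBSTITUTE[facility]
        return ("MESONET", station)
    return ("ARM", facility)
```

For example, surface_met_source("EF12") returns ("MESONET", "FORA"), signalling that temperature, precipitation, and wind for that facility must come from the external station.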

Use of ARM Measurements for Validation and Comparison

ARM also measures a number of parameters which are commonly estimated internally by carbon models. A less-obvious but more interesting possibility is the use of ARM data as an external validation for internal algorithms present in carbon models. Rather than being used as input or driver data, ARM data are used in this capacity as a standard against which algorithms simulating important intermediate processes can be independently tested, evaluated, and tuned. Our research group has expertise with such validation of models with observed data.

Several candidates for such validation and comparison exist within the ARM data collection:

Latent heat, as measured by the Energy-Balance Bowen Ratio (EBBR) instrumentation

Latent heat is the energy transported away from the surface by evaporating water, sensible heat is the energy transported from the land surface by air movement (convection), and soil/water heat is the heat that goes into changing the temperature of the soil or of the water standing on the land surface. The difference between net radiation and soil/water heat is the energy transported upward by sensible heat and latent heat. The Bowen ratio is the ratio of sensible heat to latent heat. The energy budget can then be written in terms of the Bowen ratio to solve directly for latent heat. The latent heat can be used to obtain estimates of evapotranspiration, a value computed internally in most carbon cycle models.

Soil moisture and temperature, which are directly measured through a soil depth profile by the ARM Soil Water and Temperature System (SWATS)

It is usually necessary for models to estimate depth profiles of soil moisture and temperature, and these estimates can be compared to ARM measured values to gauge the accuracy of this portion of the simulations. Soil temperature and moisture data sets designed and prepared for such model validation are available.

Radiation components, derived from ARM radiation measurements

It may be possible to partition ARM radiation measurements into visible/near-infrared and direct/diffuse components, including photosynthetically active radiation (PAR) and photon flux density.
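The energy-budget reasoning in the latent heat item above can be written compactly. The symbols are our own notation rather than ARM variable names: R_n is net radiation, G is soil/water heat, H is sensible heat, LE is latent heat (all in W/m2), and beta is the Bowen ratio. The last line converts latent heat to an evapotranspiration rate ET (kg m^-2 s^-1) using the latent heat of vaporization lambda, roughly 2.45 MJ/kg near 20 C.

```latex
\begin{align}
R_n - G &= H + LE \\
\beta &= \frac{H}{LE} \\
LE &= \frac{R_n - G}{1 + \beta}, \qquad H = \beta \, LE \\
ET &= \frac{LE}{\lambda}
\end{align}
```

Substituting H = beta LE into the first line gives the third, which is why measuring the Bowen ratio together with net radiation and soil heat is enough to recover latent heat.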

Use of ARM Measurements to Reduce Simulation "Spin-up" Time

Many carbon simulations must be run for a prolonged period prior to the actual period for which output results are desired. Such model "spin-up" is required to allow all compartments and fluxes to come to relative equilibrium before the desired simulation interval is reached. Without a sufficiently long spin-up period, initial outputs may contain strange transient values or show unusual trajectories until equilibrium is initially achieved.

Spin-up can consume significant time, effort, and computational resources. Often it is soil compartments which are the slowest to come into equilibrium. Depth profiles of soil temperature and soil moisture are difficult to initially set, and, because of their slow response, may be among the last to equilibrate. The ARM Archive contains measurements of soil moisture and temperature through a depth profile, as part of the Soil Water and Temperature System (SWATS) instruments. It may ultimately be possible to reduce or eliminate simulation spin-up time by using ARM SWATS measurements to initialize soil water and temperature profiles more accurately.
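One way such an initialization might work is to interpolate a measured depth profile onto the depths of the model's soil layers. The sketch below uses plain linear interpolation; the function name and the depths in the usage note are illustrative assumptions and do not reflect the actual SWATS sensor depths or any particular model's layer scheme.

```python
def interpolate_profile(obs_depths_cm, obs_values, layer_depths_cm):
    """Linearly interpolate an observed depth profile onto model layers.

    obs_depths_cm must be sorted in ascending order. Layers shallower or
    deeper than the observed range take the nearest observed value, so
    nothing is extrapolated beyond the measurements.
    """
    if not obs_depths_cm or len(obs_depths_cm) != len(obs_values):
        raise ValueError("need matching, non-empty depth and value lists")
    result = []
    for depth in layer_depths_cm:
        if depth <= obs_depths_cm[0]:
            result.append(obs_values[0])
            continue
        if depth >= obs_depths_cm[-1]:
            result.append(obs_values[-1])
            continue
        # Find the pair of observations that brackets this layer depth.
        for i in range(1, len(obs_depths_cm)):
            if depth <= obs_depths_cm[i]:
                d0, d1 = obs_depths_cm[i - 1], obs_depths_cm[i]
                v0, v1 = obs_values[i - 1], obs_values[i]
                frac = (depth - d0) / (d1 - d0)
                result.append(v0 + frac * (v1 - v0))
                break
    return result
```

With hypothetical soil temperatures of 20, 18, and 16 C at 5, 15, and 25 cm, a model layer centered at 10 cm would start at 19 C instead of an arbitrary default.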

Review of Frequently Used Global Carbon Models

To guide the identification of ARM data suitable for use with carbon models (as either drivers or comparison), we reviewed several popular models for input and comparison data needs. The carbon simulation models described below simulate carbon, nutrient (primarily nitrogen), and water cycles in terrestrial ecosystems. Environmental variables influence processes such as photosynthesis, decomposition, microbial nutrient transformations, and water fluxes. Predictions include net primary productivity (NPP), net nitrogen mineralization, evapotranspiration fluxes, and storage of carbon and nitrogen in vegetation and in soil.

Most carbon models can be run in equilibrium or transient mode. When run in equilibrium mode, the models assume that all carbon and nitrogen fluxes are balanced and that annual carbon and nitrogen inputs to the ecosystem are equal to annual losses. When the models are run in transient mode, annual net ecosystem production (NEP) changes in response to transient climate and atmospheric CO₂ concentration. In the absence of disturbances, annual NEP (equals NPP minus heterotrophic respiration) represents the net annual CO₂ flux between the atmosphere and the terrestrial biosphere. Terrestrial ecosystems act as a sink of atmospheric CO₂ when NEP is positive and act as a source of CO₂ when NEP is negative.
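The sign convention for NEP described above can be stated in a few lines. This is only a sketch of the bookkeeping, with hypothetical function names; the units are whatever the model reports, provided both terms use the same ones.

```python
def net_ecosystem_production(npp, heterotrophic_respiration):
    """NEP = NPP minus heterotrophic respiration (same units for both)."""
    return npp - heterotrophic_respiration


def carbon_role(nep):
    """Classify an ecosystem as a sink or source of atmospheric CO2."""
    if nep > 0:
        return "sink"
    if nep < 0:
        return "source"
    return "neutral"  # fluxes balanced, as assumed in equilibrium mode
```

An ecosystem with NPP of 500 and heterotrophic respiration of 450 (in the same units) has an NEP of 50 and is therefore classified as a sink.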

Many of the more advanced carbon models also have two spatial modes: point-based, or single location mode, and area-based mode, which produces a continuous raster grid of values. Although we are starting our investigations using carbon models to simulate single spatial points, we have plans to move to spatial grid simulation mode as this project progresses. Input data for models running in grid mode must themselves be in a spatial grid format, necessitating interpolation between actual measurement locations.
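To make the interpolation step concrete, the sketch below fills a grid from point measurements using inverse-distance weighting (IDW). IDW is chosen here only because it is simple to show; it is an assumption for illustration, not necessarily the interpolation method this project will adopt, and the function names and coordinates are hypothetical.

```python
import math


def idw_interpolate(stations, x, y, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) station tuples.

    Each station is weighted by 1 / distance**power. A query point that
    coincides with a station returns that station's value directly.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total


def idw_grid(stations, xs, ys, power=2.0):
    """Return a raster (list of rows, one per y) of interpolated values."""
    return [[idw_interpolate(stations, x, y, power) for x in xs] for y in ys]
```

A grid cell midway between two stations reporting 10 and 20 receives the value 15, because both stations are equally distant and so carry equal weight.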

Carbon Models Reviewed

Biome-BGC
The Biome-BGC (BioGeochemical Cycles) model simulates NPP for multiple biomes. Because NPP is computed as the difference between simulated GPP and autotrophic respiration, environmental controls operate on both the process of photosynthesis and respiration. Although nitrogen dynamics have been added, Biome-BGC relies primarily on the hydrologic cycle and how water availability controls C uptake and storage. The response of NPP to elevated CO₂ is determined mainly by changes in transpiration associated with reduced leaf conductance, rather than feedbacks from nutrient cycling. Biome-BGC has a daily time step and no explicit spatial scale. The model has an intermediate number of vegetation (4) and litter/soil (3) pools.

Century
The CENTURY model simulates carbon, nutrient, and water dynamics for different types of ecosystems. CENTURY includes a soil organic matter/decomposition sub-model, a water budget sub-model, two plant production sub-models (grassland and forest), and functions for scheduling events. The model computes flows of carbon, nitrogen, and, optionally, phosphorus and sulfur through model compartments. The four elements have identical organic matter structure, but they differ in inorganic compounds. Carbon uptake in CENTURY is controlled primarily by nitrogen availability. A grassland/crop sub-model and a forest production sub-model assume that the monthly maximum plant production is controlled by moisture and temperature, and that maximum plant production rates depend on the availability of nutrients. The CENTURY model uses a monthly time step. This model has the finest partitioning of soil/litter (15) and vegetation (8) pools.

TEM
The Terrestrial Ecosystem Model (TEM) is a process-based ecosystem model that describes carbon and nitrogen dynamics of plants and soils for terrestrial ecosystems. This model simulates limitation of GPP by multiple factors. Because plant respiration is explicitly modeled, NPP is simulated as the difference between GPP and plant respiration. TEM explicitly simulates nitrogen mineralization and immobilization dynamics. However, TEM does not consider the influence of vapor pressure deficit on stomatal conductance or photosynthesis. TEM uses spatially referenced information on climate, elevation, soils, vegetation, and water availability, as well as soil- and vegetation-specific parameters, to make monthly estimates of carbon and nitrogen fluxes and pool sizes. The response of NPP to elevated CO₂ in TEM is governed by the effect of nitrogen availability on carbon uptake and storage. TEM operates on a monthly time step. TEM is viewed as a global model with a spatial resolution of 0.5 degrees latitude/longitude. This model uses relatively few compartments, with only one carbon pool each for vegetation and soil/litter (two for nitrogen).

PnET
The PnET models provide a nested set of modular approaches to simulating the carbon, water, and nitrogen dynamics of forest ecosystems. The different versions of PnET are modular and build out from simplest to most complex. Algorithms such as photosynthesis are identical among model versions. PnET-Day uses foliar mass, specific leaf weight, foliar N concentration, temperature, and radiation flux to predict daily gross and net photosynthesis of whole forest canopies. PnET-II adds carbon allocation and respiration terms, as well as a full water balance, to predict NPP, transpiration, and runoff. An empirical soil respiration term allows prediction of total ecosystem carbon balance under ambient conditions. This version is used to predict the combined effects of climate change and increased atmospheric CO₂ on these processes. PnET-CN adds compartments for woody biomass and soil organic matter, as well as algorithms for biomass turnover and litter and soil decomposition, to allow calculation of complete carbon and nitrogen cycles. The original PnET uses a monthly timestep. PnET-Day uses a daily timestep. The PnET models do not have an explicit spatial scale, but they are viewed as regional.

LoTEC
LoTEC is a mechanistic soil-plant-atmosphere model of ecosystem carbon storage and CO₂ and H₂O flux. Canopy photosynthesis is described by a "Bigleaf" implementation of either a C3 or C4 biochemical model of photosynthesis combined with a sub-model of stomatal conductance. Maintenance respiration for four plant compartments is a function of tissue nitrogen concentration and temperature, while growth respiration is proportional to the change in compartment size. Canopy photosynthesis and maintenance respiration are calculated hourly. Carbon allocation, growth, and growth respiration are calculated daily. Litter and soil carbon dynamics are simulated with a monthly time step. The spatial scale of the model is a half-degree grid cell. This model uses the empirical Miami model, including a factor that represents the response to changing CO₂, as a basis for estimating steady-state NPP instead of the Farquhar model or other process-based models. Because rubisco-limited photosynthesis is not simulated, use of LoTEC is best justified when light is the limiting factor. A complete run generally requires three phases of simulation: spin-up, historical, and future.

SiBD
The Simple Biosphere 2 (SiB2) model can simulate local and regional scale land-surface energy, momentum and mass fluxes using observed forcing ("off-line" mode), or serve as the land surface component of a General Circulation Model (GCM). The strength of SiB2 is its vegetation modeling, with dynamic treatment of LAI based on remote-sensed imagery. Meteorology driver data are typically provided at 30-min. intervals. Model results can be provided with a high (seconds) or low (monthly) temporal resolution. The spatial scale of each model simulation is a local canopy, but global simulations can be made by providing separate inputs for each location in a grid. The SiBD model was developed for integration with GCMs. SiBD allows dynamic vegetation to be simulated driven by satellite-derived global data of vegetation phenology. The soil hydrological parameterization has been modified to give more-reliable calculations of inter-layer exchange within the soil profile. SiBD simulates gradual changes in surface temperature and reflectance as the amount of snow varies.

Input Requirements for Selected Carbon Models

The input requirements for reviewed carbon models are directly compared in the table below.

Input Requirements and Characteristics of Frequently-Used Carbon Models

	Ecosystem Model
Model Inputs	Century	Biome-BGC	TEM	PnET	LoTEC	SiB2D
Climate drivers	monthly	daily	monthly	monthly / daily	daily / hourly	hourly / 30-min.
Avg. mean air temperature			X	X	X	X
Avg. maximum air temperature	X	X		X
Avg. minimum air temperature	X	X		X
Total precipitation	X	X	X	X	X	Convective and large-scale precip
Relative humidity		X
Dewpoint temperature						X
Vapor pressure		daylight average VP deficit			X
Solar radiation		average	X	average	X	Total
Photosynthetically active radiation		X				Calculated internally
Longwave radiation						X
Daylength (sunrise to sunset)		X				X
Wind speed						X
Atmospheric inputs
Nitrogen	X
CO₂	X	X	X	X	X	X
O₂						X
Soil properties
# Soil/litter compartments	15	3	1 C, 2 N	1
Texture %sand/silt/clay	X		X			X
Depth	X	X				X
Slope						X
Water holding capacity		X		X		X
Soil porosity						X
Initial soil nitrogen	X	Derived				X
Initial soil carbon	X	Derived
Initial snowpack	X			X		X
Vegetation properties
Biome/Type		X			X	X
Leaf area index (LAI)					daily for a year	monthly mean
# Vegetation compartments	8	4	1 C, 2 N	5
Life form	X	X	X			monthly
Nitrogen	X		X	X
Other nutrients (e.g., P, S, Lignin)	Optional
Canopy properties		X				Dynamic
Rooting depth	X	X				monthly
Precipitation interception		X		X
Photosynthetic light response				X		X
Water-carbon trade-off	Yes		None	Vapor pressure deficit		Farquhar RuBP model
Site properties
Elevation	X	X	X
Latitude	reference only	X	X	reference only		reference only
Longitude	reference only		X	reference only		reference only
Shortwave albedo		useful				monthly

Parameters listed in bold represent summaries or measurements that cannot be directly obtained from the ARM data collection.

Many of the data needed can be summarized or directly obtained from the ARM data collection. Other data have been recorded as metadata during the implementation or operation of the Sites. Some needed input measurements associated with vegetation or land use are not contained in the ARM archive.

The Problem of Gaps in Data Measurement Streams

Most models do not contain logic to handle missing data. Data products designed for use as input to models must be complete (i.e., have a value for every parameter for every time step), yet every practical dataset contains times when measurements were unable to be recorded. Another phase of this project is dedicated to the development and evaluation of intelligent gap-filling techniques to fill these measurement gaps.
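As a baseline against which more intelligent techniques can be judged, the simplest gap-filler interpolates linearly between the nearest valid measurements. The sketch below is only that baseline, not the method under development in this project; it assumes evenly spaced time steps with missing values marked as None, and the function name is hypothetical.

```python
def fill_gaps_linear(values):
    """Return a copy of values with None entries filled.

    Interior gaps are linearly interpolated between the nearest valid
    neighbours. Leading and trailing gaps copy the nearest valid value.
    ASSUMPTION: time steps are evenly spaced.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid measurements")
    filled = list(values)
    # Edge gaps: no neighbour on one side, so hold the nearest value.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: interpolate between each pair of valid neighbours.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled
```

Linear filling is reasonable for short gaps in smoothly varying quantities such as air temperature, but it is a poor choice for intermittent quantities such as precipitation, which is one reason more careful techniques are needed.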

Collaborating with Carbon Modelers to Assess Their Needs

In order to better understand their needs, we are working closely with leading carbon modelers. We have retained Dr. Niall Hanan, at Colorado State University and NREL, to give us feedback as we develop products designed to enhance applicability of ARM data to carbon modeling. We are also in contact with Dr. Ian Baker, a postdoctoral fellow working with Dr. Scott Denning at CSU. Dr. Peter Thornton, now working at NCAR, has given us advice on MT-CLIM, one of the synthetic weather generators used in our Make-a-Difference experiment.

Margaret Torn, a member of the Lawrence Berkeley National Laboratory Geochemistry Department Staff, is the acting Program Leader for our sister project, "Design and Implementation of Carbon Measurements at the ARM Southern Great Plains Site", which is part of the LBNL Global Change and Carbon Biogeochemistry Program. Dr. Marc Fischer assists with this LBNL project, and helps to coordinate it with our ORNL efforts. We have also been in contact with Dr. Joseph Berry, who assists with several fixed and portable CO₂ eddy flux towers operating at an ARM CART site as part of the LBNL project. Dr. Tony King and Dr. Mac Post are carbon modelers here at the Environmental Science Division (and creators of the LoTEC model) with whom we consult regularly.

In July 2002, one of us (Hargrove) attended a workshop on the MODIS sensor, organized by Dr. Steve Running at the University of Montana-Missoula (home of the Biome-BGC model). Hargrove presented a poster and gave a brief interview (Quicktime movie). Already available on the Terra and Aqua satellite platforms, MODIS data will provide for the first time global 1 km² estimates of LAI, fPAR, PSN and NPP as often as once a day. These products are certain to become central data in carbon models of the future.

Presently, carbon modelers are involved in intensive efforts to validate these derived MODIS products. One such validation effort, Bigfoot, is being led by Dr. David P. Turner, at Oregon State, Dr. S. Tom Gower, at Wisconsin-Madison, and Dr. Warren B. Cohen, at the PNW Research Station. Bigfoot is a NASA-funded project which is comparing MODIS products to intensive ground measurements taken at nine 5 km x 5 km sites, each centered on a CO₂ eddy flux tower. The MODIS/ground comparison has been completed for four of the sites.

We have suggested the possibility of establishing an NACP intensive within the ARM CART area. While this makes scientific sense, the cross-agency nature of such a collaboration might prove challenging. Nevertheless, the increased role of ARM in national carbon monitoring would make the location of an NACP intensive at the ARM CART a clear choice. We prepared a favorable review in strong support of the original whitepaper suggesting this possibility.

Such participation (and the ground sampling that would be involved) would help with another future goal of our project, which is to spatially interpolate ARM "spot" measurements into continuous spatial coverage grid maps. Such spatially continuous maps of input variables will be necessary to supply carbon models not running in simple point mode with the requisite input data. Preparing spatial data products which modelers can easily use to run carbon models in this spatially explicit area mode will be another way to attract carbon modelers to use the data held in the ARM archive.

We are also considering the possibility of hosting or sponsoring a National Workshop for carbon modelers interested in using data from the ARM archive. This idea, still in the planning stages, would be to gather researchers interested in carbon modeling, present results from the MAD framework, and sponsor "hands-on" laboratories to help carbon modelers work with ARM data.

William W. Hargrove (hnw@fire.esd.ornl.gov)
Last Modified: Fri Jul 2 12:24:37 EDT 2004
