
Balaji's journal

 

what is a common modeling architecture (CMA)?

This was written following a discussion on the NUOPC Common Modeling Architecture Committee.

This is a short write-up attempting to clarify some issues around the term "common modeling architecture". It can mean a variety of things, and it's important to know which one we're talking about, as each implies a different development model. In particular, this has implications for who does what, and who pays for it.

In some circles this is taken to mean:

  • a commitment to component-based design: where components can be created, stepped forward, and destroyed; and have state variables where they connect to other components. This is the sense in which the term is used in ESMF or CCA.
  • in addition, it implies a commitment to a standard component interface (SCI), where we agree to adopt ESMF as a standard for expressing the external interface to a component.

But there are two other steps before this results in a coupled model.

If we have a pool of N components that we wish to connect together, there are N!/(2!*(N-2)!) = N(N-1)/2 ways of hooking pairs of them together; with 10 components, for example, that's 45 possible pairwise connections. So model assembly is still quite far away... You probably don't expect anyone to have the biosphere model send its data to the ionosphere model, which sends it to the river model: but the "CMA" defined so far won't prevent that. We could agree to define this as the CMA and stop here.

NEMS (and FMS, for that matter) takes it a step further, and says that we expect the components to fit into these defined "slots", atmos-dyn, atmos-phy, etc, as well as adhering to the ESMF SCI. The relationship between the slots is also defined: e.g atmos-dyn and atmos-phy must be siblings. This we could call the "common physical architecture" (CPA).

Signing up to participate in a CPA means taking on some development costs (by the component developer). For instance, GFDL-FMS defines atmos-phy as a child of atmos-dyn; if we want our atmos-phy to play in NEMS (and we do!) we have to take on the cost of recoding our model to support atmos-phy as a sibling to atmos-dyn.

If we don't sign up to the CPA but only to the SCI, then that development cost is passed on to the site that is doing the assembly.

So with a CPA we've created a single graph of the N components. There's still a step to be done before we have an application, which is to decide what physical fields are shared by each edge of the graph. That's clearly a science problem and outside the scope of the CMA committee.

So in summary:

  • a CMA can be an SCI or a CPA.
  • If it's a CPA, the component developer is signing up to deliver a component that fits in a certain "slot" of the CPA.
  • If it's an SCI, then the costs are passed on to the site that's assembling the components into an application.

Which model does NUOPC wish to follow?

 

Courant meeting on prediction and filters

AEW predictability, Ryan Torn: look up Thorncroft et al 2007, Berry et al 2008, Berry and Thorncroft 2005. (Gareth Berry).

10 Sep 2006 AMMA test case. WRF at 36 km with parameterized convection.

Suranjana Saha's NCEP webpage has maps of rawinsonde and surface measurements.

 

How much disk space do we need?

A survey of disk and archive usage from various experiments. The ultimate goal is to estimate future disk and archive needs. Specifically:

  • at what rate will we archive data?
  • how much disk is needed so that experiments can be post-processed without touching /archive (assuming that a filesystem like ptmp is available...)

Data produced by /home/vb/bin/frearchrpt.csh.

  • length is length of experiment in years
  • Y/D is years/day from the restart directory, thus including all delays due to user, machine, queue wait, etc.
  • Other fields are archive size in GB.
  • The ts directory contains duplicate data, due to binning the same time series in 5yr/20yr/100yr chunks. The pp/ts min column lists the ideal size if we eliminated all duplicate data. In some cases, there's more than one level of duplication (i.e more than 2 copies of some time chunks).
  • The a (atm), o (ocean), i (ice), c (atmospheric chemistry) and b (biology, i.e topaz) subscripts split the sizes by component; unlisted components are negligible.
  • There is a significant chunk of data in ts/seasonal; does anyone use this, or can we eliminate it?
  • What's the likely disk high water mark for an experiment? I suggest we estimate it as a 100-year post-processing chunk, i.e all the history, ts and av data for a run scaled to 100 years. The archive high water mark should be the sum of history, ts (min) and av for the whole run.
  • The total disk (i.e ptmp) needed: take some mix of runs from this table — the extreme cases would be 66 ESM2G runs or 11 CM2.4 runs (7920 PEs) — and see if we could store all the 100-year post-processing associated with it on ptmp. Based on this data, we need about 1 PB of disk to operate the post-processing entirely without touching /archive.
  • This total disk space could be reduced by 5X if we used the 20-yr chunk as the largest chunk needing to be disk-resident at one time, and the 100-yr chunks as a rare occurrence. And a further 4X if we used the 5-yr chunk as the basis.
Model | NPES | Length (yr) | Y/D | restart (GB) | history (GB) | pp/av (GB) | pp/ts (GB) | pp/ts min (GB) | directory
CM2.1U | 180 | 300 | 4.96 | 135 | 6268 | 1166 (926o+213a) | 20354 (16172o+3765a) | 8322 (5725o+2449a) | /arch4/ccsp/ipcc_ar4/CM2.1U_Control-1990_E1
CM2G | 120 | 311 | 6.13 | 194 | 2838 | 709 (646o+36a) | 14649 (12175o+2086a) | 6891 (5044o+1688a) | /arch7/aja/fre/omsk_2008_03/gm_topo_ogrp/CM2G_gm_topo
ESM2G | 120 | 68 | 3.48 | 114 | 2148 | 808 (476b+268o+55a) | 10148 (7330b+1862o+826a) | 5314 (3812b+968o+462a) | /arch8/jgj/fre/perth/ESM2G/oct282008/ESM2G_1860_mica_bergs_prescribed_co2
CM2.4 | 720 | 236 | 1.33 | 7027 | 24014 | 6005 (5530o) | 102340 (87836o+8670i+5640a) | 57734 (46052o+5958i+4769a) | /arch6/fjz/CM2.4/CM2.4SQ_A_Control-1990_D1
CM3Q | 264 | 186 | 8.26 | 237 | 5542 | 243 (218o+19a) | 12629 (9987o+2344a) | 6356 (4597o+1632a) | /arch2/wh/cm3/CM3Q_Control-1990_B5
C180 | 216 | 27 | 0.5 | 92 | 882 | 14 | 315 | 256 | /arch3/miz2/GCM/omsk/c180_hv1_amip1
CM2M | 110 | 110 | 5.34 | 101 | 1920 | 702 (601o+89a) | 8350 (6536o+1108a+681i) | 4963 (3583o+773a+587i) | /arch2/bls/rts/perth_2008_10/CM2.1_ESM/CM2M_09nov2008_smaxp005
C48L48 | 96 | 22 | 1.6 | 516 | 1557 | 293 (281c+12a) | 1477 (1263a+211c) | 1450 (1235a+211c) | /arch4/lwh/fms/perth/c48L48_am3p5_chem-gamma2
 

METAFOR

  • version mismatch: "re-versioning" tool a useful addition.
  • things to say about GO-ESSP/ESG/CMIP5
  • broad acceptance of native grids
  • broad acceptance that AR4 questionnaire can be replaced by something that can harvest more structured information
  • example of volcano input datasets: controlled vocabulary for keys/values?
  • CONCIM (dt=1yr) to evolve more slowly than APPCIM (dt=1m)
 

WGCM-12

  • RCM issues, asked for IOPs of interest ahead of time.
  • raised issue of vertical coordinates, seems to be general acceptance of conversion software!
 

WGCM-12

  • PMIP experiments must be done with same model (same resolution) as CMIP, so that we can directly compare past and future climate sensitivity. Perhaps if you run PMIP, you must run CMIP with that model?
  • what about CFMIP? Should this be a component? Can we get cloud feedbacks any other way? Implement the simulator is a high priority. Need to talk about this at WGNE.
  • IPSL: 2.5x1.5A 20; higher-res ocn for AOGCM than ESM; diurnal cycle of convection from Lafore, Guichard, &c! interactive aerosols. 80 NEC PEs for 1 year budgeted for all expts.
  • CNRM: Arpege+NEMO: indirect effect (first only); completed 1000-yr control, etc, for ENSEMBLES, also LGM run; CM4LR 2.8A+2O; CM4HR (1.4A+1O; perhaps 1A+0.5O); can run 250yr/mo with LR.
  • MPI: ECHAM5+JSBACH veg + MPIOM + HAMOCC. already doing dec pred with assimilation, limited skill; already doing "ENSEMBLES stream 2" with LR ESM: important for us to be doing ENSEMBLES! ESM for AOGCM expts as well; atm res between T85-T159, 0.5L80 for ocn; perhaps no interactive aerosols; expts start spring 2009, new machine.
  • CCCma: CanESM1, perhaps also to be used as AOGCM; similar to AR4 model. pretty good carbon cycle; interactive O3 from strat version, to be used in CCMVal also. improved phys/chem but probably same res as before. CHFP (coupled historical forecast project): T63, maybe T128, 1 or 0.5 ocn. runs start late 2009.
  • CMCC-INGV: dec pred via EU COMBINE and ENSEMBLES; US collaboration, is that Rosati? ECHAM5 T159+0.5 NEMO; PICNTL completed with C-model, all via ENSEMBLES again! T30+2deg OPA. carbon equilibration acceleration, how?
  • CSIRO: T63+0.94; full aerosol chemistry; regional downscaling model CCAM (conformal cubic); ACCESS model: N96 atm + OM3 res, ACCESS model probably not available for AR5; will use mk3.5.
  • MIROC: chem offline in AOGCM, runs start first half 2009: long term t42+1deg; near-term HR T213+1/6 ocn + aerosols; 10-member ensembles, also embedded regional ocean?; MR T85; HR timeslice 20km + 1km regional
  • MRI/JMA: MRI-ESM; 20 km HR near-term integrations (time-slice and coupled)
  • Hadley: HADCM3 N48+1.26 yes, HiGEM N144+0.33 likely, HadGEM3 N216+0.25 possible; start early 2009. full veg + chem. Upgrading to IBM early next year, model performance unknown.
  • Hadley HiGEM for dec prediction. see above, 128p 0.75y/day.
  • CCSM: prognostic aerosols; coupled chemistry. 2009 Ultra-HR 0.2atm X 0.1 ocn this year at ORNL.
  • GFDL
  • GMAO; dec pred only, full o3/aerosols. 1x1L72 atm 0.5 ocn 10-member ensemble + twice the res.
  • KMA: using HadGEM2
  • LASG (China): regional model CREM coupled to POM. No ESM. coupled aerosol-chem available.
  • CMA (China): NCAR flux coupler coupling MOM4+SIS? T42 atm + MOM4 OM3 grid. with chem. carbon cycle+aero+chem by end of 2008.
  • DKC: AOGCM only, T159-0.5, no interactive aerosols.
  • NorClim: HAMOCC+MICOM+CCSM for everything else; 1.9 atm X 1 ocn.
  • EC-Earth: ECMWF IFS extended to climate ranges (but key developers are KNMI etc, ECMWF doesn't actually do runs: seems to be a way to reassure people that ECMWF isn't invading their turf). T159L62+1deg ORCA. Only dec pred.
  • Discussion: standardized datasets for prescribed o3... what about n2? oxidants?
 

WGCM-12

  • got only half of discussion on decadal expts, most of discussion focused on misinterpretation (as "forecast")

long-term expts: Karl

AOGCMs

  • 500+ yr PIcontrol, prescribed concentrations
  • C20C, prescribed conc
  • AMIP
  • SRES replaced by RCPs labeled by approx rad forcing at 2100: 8.5 (next highest), 6 (high), 4.5 (mandatory), 2.7 (very high); extend to 2300 (v high for 4.5, recommended for other RCPs)

carbon cycle models

CMIP3: model yrs 500-8400, median 2200. CMIP5: 1215+1125+640. Data available to public Dec 2010.

sandrine bony: CFMIP-2

  • additional diagnostic: COSP obs simulator: cfmip.net software download? online calc means less output
  • additional idealized runs
  • additional process diagnostics
  • present our CFMIP plans at WGNE

veronika eyring: ccmval

  • ccmval-generated o3 timeseries may be available
  • time/space resolution to be decided, possibly dynamically-referenced coordinates (e.g height wrt tropopause)

pascale braconnot: PMIP

  • coordinated expts "highly recommended" by CMIP?

filippo giorgi: RCMs

  • examples; RegCM, PRECIS, RSM, WRF
  • coupled RCMs, 2-way nesting
  • 10 km typical, NH CRMs growing
  • projects: NARCCAP, ENSEMBLES, PRUDENCE, AMMA, AFRMIP, ARCMIP, AIACC
  • 6-hourly fields requirement for subset of runs (as in NARCCAP)

smg: WGOMD

  • general acceptance of native grid data

karoly: IDAG

  • discussion on volcanic/solar forcing datasets; there will likely be more than one (~3-4) data source; need to document.
 

FRE: ppmake

  • moved to fms/make directory, alias to ppmake.

Order matters: if the same target pattern can be built by several rules with different prerequisites, the first rule whose prerequisites exist or can be made is the one used. I've used this to make a clean implementation of HSM functionality (a minimal sketch follows the list of rules below). The rules are:

  • HSM directories are called %.d, and are defined relative to the root directories on work, ptmp and arch.
  • a directory in work looks for itself in ptmp and copies it.
  • a directory in ptmp looks for itself on arch and copies it.
  • if %.d is not found on arch, look for %.cpio instead, at the same point in the tree relative to archroot.
  • we'd eventually like to transition out of cpio to tar. We can do this by implementing a third fallback rule: if %.cpio is not found, look for %.tar.
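A minimal sketch of these rules as ordered GNU Make pattern rules follows; the root paths, the plain cp, and the tar/cpio options are illustrative placeholders, not the actual FRE implementation:

# HSM lookup sketch: for a given %.d, make tries these rules in order and
# uses the first one whose prerequisite exists or can itself be made.
ARCHROOT = /archive/$(USER)/expt
PTMPROOT = /ptmp/$(USER)/expt
WORKROOT = /work/$(USER)/expt

# a directory in work looks for itself in ptmp
$(WORKROOT)/%.d: $(PTMPROOT)/%.d
	cp -rp $< $(@D)

# a directory in ptmp looks for itself on arch...
$(PTMPROOT)/%.d: $(ARCHROOT)/%.d
	cp -rp $< $(@D)

# ...failing that, for a cpio at the same point in the tree...
$(PTMPROOT)/%.d: $(ARCHROOT)/%.cpio
	mkdir -p $@ && cd $@ && cpio -i -I $<

# ...and eventually, as a third fallback, for a tar
$(PTMPROOT)/%.d: $(ARCHROOT)/%.tar
	mkdir -p $@ && tar -xf $< -C $@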

The reverse traffic is a function of policy.

  • directories on work get copied to ptmp.
  • directories on ptmp get copied, cpioed or tarred to arch depending on the policy, perhaps even dependent on file size or user choice.
 

FRE: Continuing development of frepp.mk

Maintenance issues:

  • rename frepp.mk and put it somewhere else, not in the scripts/postProcess directory.
  • rename the mkpp alias to ppmake or something... I frequently find myself typing mkdir when I mean mkpp.

Current problems:

  • parallel make is a problem: several of the lower-level scripts appear to create intermediate files without unique names. For instance timavg.csh creates a date-stamped namelist input file using the date command with one-second resolution: if two timavg instances are launched in the same second, there's a name conflict. I haven't looked into plevel, split_ncvars or list_ncvars so far. (A possible mitigation is sketched after this list.)
  • deletion of intermediate files is a problem: should I declare all files in the work directory to be .PRECIOUS? i.e

   .PRECIOUS: $(WORKROOT)/%
   
  • how to do a cheap LOCALCP on the same filesystem? I tried ln and ln -s; both still make make try to remake...
  • the LOCALCP is necessary because everything needs to end up in pp/*, and I'm still struggling with the WORKDIR... how do you run everything in some directory, and then have it end up in pp/*?
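One possible mitigation for the same-second name clashes (a sketch only: the target and prerequisite patterns and some_tool are placeholders standing in for timavg.csh and friends, whose real command lines are not reproduced here) is to give each recipe invocation a private scratch directory, keyed on the target name and the shell's PID, and run the tool from there so any files it drops in the current directory cannot collide:

# $$$$ reaches the shell as $$, i.e. the PID of this recipe's shell, which is
# unique even for two jobs launched within the same second.
$(PTMPROOT)/pp/%.avg.nc: $(WORKROOT)/history/%.nc
	mkdir -p $(@D)/tmp.$(@F).$$$$ && \
	cd $(@D)/tmp.$(@F).$$$$ && \
	some_tool $< $@ && \
	cd $(@D) && rm -rf tmp.$(@F).$$$$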
 

FRE: design of refrepp

The current frepp runs years in sequence... the 1981 job submits 1982 and so on. If you hit a "chunk" boundary, that job is longer, as it does the chunk stuff.

Working with the example of a 5-year chunk, let's assume 1983 failed but 1981, 82, 84 and 85 are available.

The most efficient implementation of refrepp would detect the 1983 hole, resubmit a frepp -t 1983 (which by default processes one year), then submit frepp -t 1985 to complete the chunk. (This will repeat the one-year processing for 1985, but that can't be helped, at least in the current frepp.)

1983 should finish running before you launch 1985... this is best done by using qsub -hold_jid.

If 1981 and 1983 are both missing, they can both be launched concurrently.

The refrepp script needs to:

  • detect the frepp configuration: you can do this by interpreting the FRE XML file; by letting freppcheck interpret the FRE file and looking at the resulting directory tree; or by calling tree on the pp directory (this assumes you've got the right directory structure there).
  • detect the holes: you can let freppcheck do this, or you can tree the output directory and do it yourself.

Implementation: you can either:

  • produce a frepp command that creates the output, or
  • look in the scripts/postProcess directory and rerun the scripts you find there (assumes the scripts exist). This option is probably safer, since the assumption of refrepp is that some scripts ran and failed.

The refrepp script will continue to work with only year boundaries: it looks like smaller run segments will have to wait for the new frepp (as in the frepp.mk approach).
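For contrast, in the frepp.mk world this kind of hole-filling falls out of make's ordinary dependency handling. A sketch, with purely illustrative target names and placeholder scripts for the per-year and per-chunk work:

YEARS = 1981 1982 1983 1984 1985

# one stamp file per post-processed year; $* is the year
year.%.done:
	./postprocess_one_year $*
	touch $@

# the 5-year chunk depends on all of its years: if only 1983's stamp is
# missing, make redoes 1983 and then the chunk step; "make -j" runs any
# independent missing years (e.g 1981 and 1983) concurrently.
chunk.1981-1985.done: $(patsubst %,year.%.done,$(YEARS))
	./postprocess_chunk $(YEARS)
	touch $@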

 

FRE: frepp.mk state issues

How does frepp.mk know the state of play?

  • Availability of history files in $(ARCHROOT)/history tells where the run is.
  • Availability of files in $(ARCHROOT)/pp tells what's been post-processed. (Delete files here to recreate them).
  • history cpio file can contain multiple dates (currently cpio files are made per year, not per segment). Unpack to recover.
  • variables in a file are recovered by running list_ncvars.csh.
  • components in the archive file can be got from FRE XML or by extracting (tar tvf or cpio -itI).
  • other input files (e.g GRIDSPEC) from FRE. (For other input files, how do we know when to issue a dmget? We can force dmget for any file under $(ARCHROOT) but what about outside?)

Propose that frepp.mk creates the pp directory tree in PTMPROOT and WORKROOT when unpacking files, and creates a Makefile in each. This Makefile will include a state.mk file that contains variables you need, e.g list of variables in a file.
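A sketch of what such a generated per-directory Makefile might look like (the file names, variable names and the path to the shared rules are illustrative only, not an agreed format):

# Generated by frepp.mk while unpacking; one of these would sit in each pp/*
# directory under PTMPROOT and WORKROOT.

# state.mk is written at unpack time and records what make cannot cheaply
# rediscover later, e.g.
#   dates        = 19810101 19820101
#   comps        = atmos land
#   atmos_2dvars = t_surf precip
include state.mk

# default goal: everything this directory is responsible for (built by the
# pattern rules in the shared fragment included below)
all: $(foreach d,$(dates),$(foreach c,$(comps),$(d).$(c).nc))

# shared pattern rules and tool definitions live in one central fragment
# (placeholder path)
include /home/fms/make/frepp.mk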

 

FRE: frepp.mk

The first attempts at building frepp.mk have been quite enlightening.

Indispensable is the GNU Make manual:-).

Here's a first cut:

# HSM.mk: makefile fragment for extracting files from archive files,
# using three-level storage model

#use csh, with no user .cshrc
SHELL = /bin/csh -f

# Directory trees on home, archive, ptmp and work follow an identical
# "experiment tree" as defined by EXPT
ARCH = /archive/$(USER)
PTMP = /work/$(USER)
WORK = $(TMPDIR)
EXPT = fms/omsk_vb/c48_am3p3
GRIDSPEC = /archive/bw/fms/init/cubed_sphere/mosaic/C48.M45.tripolar.mosaic.cpio
ifeq ($(suffix $(GRIDSPEC)),.cpio)
# give this a non-empty value: ifdef treats an empty variable as undefined
USES_MOSAIC_GRIDSPEC = 1
else
#write clause for no-mosaic case
endif


# define roots
ARCHROOT = $(ARCH)/$(EXPT)
HOMEROOT = $(HOME)/$(EXPT)
PTMPROOT = $(PTMP)/$(EXPT)
WORKROOT = $(WORK)/$(EXPT)

# list of available history files
histories = $(shell ls $(ARCHROOT)/history)
filelists = $(patsubst %.cpio,%.filelist,$(histories))
histdates = $(patsubst %.nc.cpio,%,$(histories))

# commands
TIMER = /usr/bin/time -f	# timer for various commands
DMGET = dmget			# retrieve from deep storage
FASTCP = /usr/cluster/bin/cxfscp -dprv # fast copy
CP = /bin/cp -p
FREGRID = /home/z1l/bin/tools_20071219/fregrid
FREGRID_FLAGS = --interp_method 'conserve_order2' --nlon 144 --nlat 90
FMSBIN = /home/fms/local/ia64/v10
NCVARS = $(FMSBIN)/list_ncvars.csh
TIMAVG = $(FMSBIN)/timavg.csh -mb
PLEVEL = $(FMSBIN)/plevel.sh
# not sure about .PRECIOUS, this will not get deleted if you C-c!
.PRECIOUS: $(PTMPROOT)/history/%.filelist

.SUFFIXES: .cpio .tar .filelist .complist .varslist
.PHONY: list setup
list:
	@echo $(ARCHROOT)/history contains:
	@echo $(histdates)
# make setup first
setup: $(PTMPROOT)/history $(WORKROOT)/history $(PTMPROOT)/pp $(WORKROOT)/pp
$(PTMPROOT)/history $(WORKROOT)/history $(PTMPROOT)/pp $(WORKROOT)/pp:
	mkdir -p $@

# vpath %.cpio     $(ARCHROOT)/history
# vpath %.tar      $(ARCHROOT)/history
# vpath %.filelist $(PTMPROOT)/history
#everything in WORKROOT must come from PTMPROOT
$(WORKROOT)/%: $(PTMPROOT)/%
	$(FASTCP) $< $(@D)
$(PTMPROOT)/history/%.filelist: $(ARCHROOT)/history/%.nc.cpio
	$(DMGET) $<
	cpio -it -I $< > $@
	mkdir -p $(basename $@)
#how to make the same directory in WORKROOT?
#	echo $(patsubst $(PTMPROOT),$(WORKROOT),$(basename $@))

# unpacking a history cpio file: unpack to directory ptmp/%
define unpack_cpio
$(PTMPROOT)/history/$(1).complist: $(ARCHROOT)/history/$(1).nc.cpio $(PTMPROOT)/history/$(1).filelist
	cd $(PTMPROOT)/history/$(1) && \
	cpio -iv -I $$<
#the following depends on our filenaming conventions
	echo dates = `ls $(PTMPROOT)/history/$(1) |cut -f1 -d. |sort -u` >  $$@
	echo comps = `ls $(PTMPROOT)/history/$(1) |cut -f2 -d. |sort -u` >> $$@
	echo tiles = `ls $(PTMPROOT)/history/$(1) |cut -f3 -d. |sort -u |grep -v '^nc'` >> $$@
endef
$(foreach date, $(histdates), $(eval $(call unpack_cpio,$(date))))

#gridspec
$(PTMPROOT)/gridspec: $(GRIDSPEC)
ifeq ($(suffix $(GRIDSPEC)),.cpio)
	$(DMGET) $<		# how do I know $< is on archive?
	mkdir -p $@
	cd $@; cpio -iv -I $<; echo Source $(GRIDSPEC) > README
else
#write clause for no-mosaic case
endif

%.nc.2dvars: %.nc
#list of variables in file with a horizontal domain
	$(NCVARS) -st23 $< > $@

#regrid an atmos file
# ifdef USES_MOSAIC_GRIDSPEC
# $(shell echo here)
define fregrid_atmos
$(WORKROOT)/history/$(1)/$(2).$(3).nc: $(WORKROOT)/history/$(1)/$(2).$(3).tile1.nc \
                                       $(WORKROOT)/history/$(1)/$(2).$(3).tile2.nc \
                                       $(WORKROOT)/history/$(1)/$(2).$(3).tile3.nc \
                                       $(WORKROOT)/history/$(1)/$(2).$(3).tile4.nc \
                                       $(WORKROOT)/history/$(1)/$(2).$(3).tile5.nc \
                                       $(WORKROOT)/history/$(1)/$(2).$(3).tile6.nc \
                                       $(WORKROOT)/history/$(1)/$(2).$(3).tile1.nc.2dvars \
                                       $(WORKROOT)/gridspec
	$(FREGRID) $(FREGRID_FLAGS) --input_mosaic $(WORKROOT)/gridspec/atmos_mosaic.nc --input_file $$@ \
          --scalar_field `cat $(WORKROOT)/history/$(1)/$(2).$(3).tile1.nc.2dvars`
endef
define atmos_loop
include $(PTMPROOT)/history/$(1).complist
$(foreach comp, $(comps), $(foreach d, $(dates), $(eval $(call fregrid_atmos,$(1),$(d),$(comp)))))
endef
$(foreach date, $(histdates), $(eval $(call atmos_loop,$(date))))
# endif				# USES_MOSAIC_GRIDSPEC

Some notes:

  • frepp.mk gets the state of the run by looking at the history directory (which will later extend to the pp directory).
  • Certain things (e.g GRIDSPEC, list of pp components) can be obtained from the XML. Certain other "state" issues (e.g list of variables) have to be extracted from the history files. Since we cannot assume that unpacked cpio files are available, we have to set some variables as the result of a make. The best approach might be to create all the Makefiles while unpacking.
  • we have restricted work to happen in a stringent directory hierarchy anchored at the various *ROOT directories. This makes the Makefile hard to read? Having a local Makefile in each pp/* directory might be better.
 

Computers in weather and climate

 

FMS: Designated I/O processors

We've toyed on and off with the idea of having I/O streamed from a subset of PEs: currently you can either do single-threaded I/O from a pelist, or have every PE do its own I/O; there's nothing in between.

Back in October 2007, I wrote a design doc for more flexible parallel I/O which has yet to be implemented, mainly because on the platforms we ourselves run, this hasn't become the rate-limiter.

However, some of our colleagues (specifically, MOM users in Australia on a NEC platform) have very fast arithmetic, and thus are becoming I/O bound. Tim Pugh proposes a somewhat different design. We are currently evaluating both.

 

FMS diag_manager feature requests

Regional output on mosaics

This is a request from Steve Klein:

Is it possible to instrument diag_manager so that output can be restricted to a subset of grid tiles? The application would be something where regional output at high time-resolution is desired: it would make sense to reduce data volumes by only doing this on the subset of grid tiles in a mosaic (e.g perhaps only one cube face) that are needed to span the region in question.

CF standard names

We've already implemented a new character(len=*), intent(in) :: standard_name argument to register_diag_field. The domain expert is supposed to consult the CF standard name table and match our diagnostic variables against a standard name.

Bruce Wyman suggests that rather than modify register_diag_field calls, it would be better if diag_manager_init read a text file that's part of the checkout, say cf_standard_name.txt which contains a lookup table from "GFDL variable name" to "CF standard name".

I don't see why that's any easier... it seems to me that either way the relevant domain expert has to go through all her register_diag_field calls, find the standard_name, and add it somewhere: either in the source code, or by submitting it for inclusion in cf_standard_name.txt. But if that's what people prefer, we can go with it.

 

FMS: f90tohtml

Niki found this neat tool called f90tohtml that marks up f90 source code in HTML, with cross-ref links (caller, callee, etc). Try this cobweb page.

It looks like f90tohtml hasn't been updated in a while... perhaps we should try to extend its capabilities? Ideas include

  • mark up variables with a title tag that gives their definition, and a link to their declaration line.
  • figure out how to CVSify it, so you regenerate it for each checkout.
  • perhaps link to the model dev database?
 

FMS: enabling concurrent ensembles

FMS provides support for running ensembles (specifically, the Ensemble Adjustment Kalman Filter of Zhang et al 2005) where each ensemble member runs concurrently on an independent set of processors. The concurrency requirement ensures that each ensemble member has its own address space, and can modify anything within it, including module global variables, without affecting any other ensemble member. This is in contrast to the position Max Suarez calls the "moral high ground", where components must be written so it is always possible to create as many instances as you like of any component within a single address space. Be that as it may, we are sticking to the current method developed by Matt, Shaoqing and myself, where ensembles can only be run concurrently in separate address spaces.

The ensemble code is now mature to the point where we're planning to make it part of the FMS infrastructure. Bringing it into FMS is proposed for the perth release: this also means that henceforth the RTS will include a regression test in ensemble mode.

Niki has provided a good summary of the current code status. The basic change to the plumbing in coupler_main is straightforward. Some of the more extensive changes have to do with I/O.

The basic design of I/O relies on the fact that each ensemble member has its own pelist, which is then assigned a name (e.g ENS01, ENS02, ...); any file foo.nc which is private to an ensemble member then gets automatically renamed foo.ENS01.nc, etc, by the I/O layer. The I/O code is currently instructed to add this string by the argument append_pelist_name, which, when TRUE, tells the I/O layer to change the names.

The code changes appear extensive because this append_pelist_name argument is passed up and down many deep call chains. I propose instead replacing this with an internal logical variable append_pelist_name in mpp, which can be altered by the user:

logical, private :: append_pelist_name = .FALSE.  ! module variable inside mpp

subroutine mpp_append_pelist_name()
  append_pelist_name = .TRUE.
end subroutine mpp_append_pelist_name

... so that it gets set to TRUE in ensemble mode but is FALSE most of the time.

With these changes, I think the ensemble code becomes vastly simplified: please review the code changes in this light.

The code is scheduled to be in the perth release (June 2008).

 

FMS: enabling the river code on mosaics

Review of current parallel river model

The first version of a parallel river model in LM2 (see this paper section) was based on a basin decomposition. An entire river basin network was maintained on a single processor. The amount of computational work in a basin is proportional to the number of cells in the basin. So the work was partitioned among processors by sorting the basins in descending order of size, then going down the sorted sequence and assigning each basin to the PE that currently had the least amount of work. (See routine sort_basins.)

This non-traditional decomposition had some limitations. One is that the largest basin limits the scalability of this approach: for the 30-minute grid of Vorosmarty et al (2000) this limited the scalability to about 30 PEs. The second is that since the code uses a non-standard decomposition rather than the domain2D type, we aren't able to use traditional distributed I/O from diag_manager. Instead we gather the data onto one processor and pretend to diag_manager that this is indeed undistributed data. Even the gather onto one processor cannot use the standard gather routine mpp_global_field, thus we have a number of strange uses of mpp_sum to effect a gather:

wrk2 = 0.0                                          ! global-sized work array, zero everywhere...
wrk2(isc_lnd:iec_lnd,jsc_lnd:jec_lnd) = runoff(:,:) &
     *gland_mask(isc_lnd:iec_lnd,jsc_lnd:jec_lnd)   ! ...except on this PE's own window
call mpp_sum(wrk2, nlon_lnd*nlat_lnd)               ! elementwise sum across PEs: every PE now holds the full global field

before calling diag_manager, horiz_interp and so on.

Motivation to rewrite

The old method worked for CM2-class models where the performance of the river code wasn't an issue: it was too small a fraction of the total model time. We were also working under the assumption that regridding a network grid was somewhat insurmountable: we were wedded to the Voro 30-minute grid.

We are currently revisiting this issue for various reasons.

  • The atmosphere is moving to a cubed-sphere grid. In the FMS implementation of this grid the cube faces are on different processors: thus a "basin decomposition" is not possible (basins may cross face boundaries). For algorithmic reasons we may want to keep the land grid identical to the atmosphere grid.
  • We are getting more confident (see Fekete et al (2001)) that it's possible to scale up river networks to any target grid in a reasonable amount of time. For algorithmic reasons we may want to keep the river grid the same as the land grid. Generation of river network grids is a project of Chris Milly and Krista Dunne.
  • We are now expecting the river to have a complete biogeochemistry and carry many tracers. The performance assumptions of CM2 may no longer apply to CM3 and beyond.

Given these considerations, we are proposing a new river parallel infrastructure that will address many of these concerns: a framework where

  • atmosphere, land and river grids are allowed to be independent;
  • when the grids do in fact coincide, the code will make the appropriate optimizations;
  • scalability is not limited by catchment size.

Current proposal

The current architecture has the river physics performed within an update_river step, which is surrounded by calls that interpolate and transfer data between land and river grids.

update_river calls river_physics_step, which actually performs the physics:

travelnow = maxtravel
do travelnow = maxtravel, 1, -1
   if( ncells(travelnow).EQ.0 )cycle
   call mpp_clock_begin(physicsclock)
   call river_physics_step (River, petravel, travelnow )
   call mpp_clock_end(physicsclock)
enddo

maxtravel is 72 on the Voro 30-min grid (that's the Nile, I think...) but most basins have a shorter length. ncells detects whether there is any work to do at the current travel step travelnow.

Within river_physics_step, most of the action is local, on the point (i,j). The exception is this code section, where you send your outflow to the inflow at the downstream point:

idest=i+di(River%tocell(i,j))
jdest=j+dj(River%tocell(i,j))
if (idest.eq.0           ) idest=River%nlon
if (idest.eq.River%nlon+1) idest=1
if (jdest.eq.0 .or. jdest.eq.River%nlat+1) then
    write (*,*) 'WARNING: JDEST=', jdest
endif
River%inflow(idest,jdest) = &
     River%inflow(idest,jdest) + River%outflow(i,j)

We propose to rebuild the code as follows.

  • river calls mpp_define_domains and gets a halo-less mosaic decomposition. (If it is inheriting the land grid, it can just use Land%domain).
  • When idest and jdest reach the domain edge, they send their outflow into a buffer (if jdest==jec the outflow gets copied into e_buf; if idest==iec and jdest==jec, it goes to ne_buf). Once the buffers are full, we call mpp_send (non-blocking) to send the buffers to the appropriate destination.
  • at the beginning of the i,j loop, river_physics_step looks in River%fromcell to see if it's expecting an inflow from another PE: if so, it calls mpp_recv (blocking) and receives the inflows. (River%fromcell is not currently in the River data structure, but it's trivial to compute and define).
  • Since the communication is point-to-point and not a halo update, it is only invoked when necessary, and the amount of communication is small. If this turns out to be a problem (unlikely) it's not hard to make this communication overlap with computation.
  • The "gather using mpp_sum" can be eliminated, we can directly apply horiz_interp and diag_manager calls to the distributed River fields with the River%domain argument.
  • The code is set up to call horiz_interp between land and river. In the new code this is optimized so that this call is bypassed with simple copies when in fact the river and land grids are the same.

Testing

We propose to test this using the current river_solo test case that uses the Voro 30-min grid. This grid will be cast as a mosaic with two domains, and should yield the same answer as the original code to within roundoff. Estimated completion: 1 March 2008.

Other code comments

  • I find the current structure, where new tracers have been added afterwards (using a _c suffix), to be cumbersome. Could we, for instance, combine River%storage(:,:) and River%storage_c(:,:,:) into a single array River%storage(:,:,:) whose third dimension has one extra entry, with the first slice holding the original %storage field?
  • The current structure has led to possible typos: e.g in river_physics_step (line 160) I see a line like
River%storage_c(i,j,:) = River%storage_c(i,j,:)       &
     + (River%inflow_c(i,j,:) - River%outflow_c(i,j,:)   &
     + River%infloc_c(i,j,:) ) * River%dt_slow

but there's no corresponding line for River%storage(i,j).

 

FMS: ifort problem?

What is a linker supposed to do if it finds multiple instances of a main (i.e program entry point) in its list of objects and libraries?

On the Origin, this was a fatal error: the linker ld would fail with an error listing all the available instances of main.

We have inadvertently set up our cubed-sphere based compilations to have multiple main programs. In particular, the directory atmos_cubed_sphere/tools/pp contains files cub2cub_driver.F90 and cub2latlon_driver.F90 that both have entry points.

Nonetheless, ifort compiled successfully with no error! And the resulting executable works correctly.

The linker command line from our Makefiles gives us something like:

ifort libcoupler.a libatmos_dyn.a libatmos_phys.a libice.a libland.a libocean.a libfms.a \
  -Wl,-V,-M -lnetcdf -lmpi -lsma -o fms_c48_am3p3.x

It turns out that cub2cub_driver.o and cub2latlon_driver.o are in libatmos_dyn.a whereas coupler_main.o is in libcoupler.a ... purely by virtue of the fact that our Makefile puts the coupler library first on the command line, we get a successful link!

This is clearly fragile and unacceptable. And after scouring the ld manpage (for strings like main, entry, program, ...) I cannot even find a flag to control how ifort should respond.

Two things:

  • this should be taken up with Intel.
  • we should modify our codes or checkouts so that these other entry points never get linked; a possible check is sketched below.
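A possible form for that check (a sketch only: it assumes ifort emits the Fortran program entry point as the symbol MAIN__, which is the usual convention, and the library list is illustrative):

# fail if any library other than libcoupler.a defines a Fortran main program
LIBS = libatmos_dyn.a libatmos_phys.a libice.a libland.a libocean.a libfms.a

.PHONY: check_main
check_main:
	@for lib in $(LIBS); do \
	  if nm $$lib 2>/dev/null | grep -q ' T MAIN__'; then \
	    echo "ERROR: $$lib contains a program entry point"; exit 1; \
	  fi; \
	done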

created by v. balaji (balaji@princeton.edu) in emacs using emacs-muse mode.
last modified: 23 December 2008