Concerns about workshop goals

By @jeremy Date 2007-05-21 10:45

Hi Stephane (and others),

Please, feel free to complain -- the more dialog the better as far as I'm
concerned. Not comfortable when the group gets too quiet ;-)

Let me expand
on our "agency" goals a bit, which will perhaps clear up our perspective.

In our perfect world, we'd have a consolidated IOP algorithm to be put into
operational satellite processing, accompanied by a rigorous uncertainty
budget for this algorithm. Currently, a number of approaches exist, but
when you "tear them apart", most are very similar -- differing only by
basis vector parameterization (S, aph*, etc.) or inversion technique. We
have no interest in an algorithm "shoot-out" -- rather, we'd like to get
a group together (yourselves) to study these parameterizations/inversions
to determine their sensitivities in an operational satellite processing
environment. We know each algorithm has been verified using in situ
data (specific manuscripts and the IOCCG report) and some satellite data --
but to our knowledge, most have yet to be rigorously vetted in the
satellite environment (e.g., how/where does the alg perform in l2 space
versus l3 space, does spectral resolution matter (we lose a green band in
VIIRS), what happens when input Lwn are imperfect?) -- nor, has satellite
inversion failure remediation been robustly explored. In other words, we'd
like to better understand why some algs "blow up" when globally applied, and
to determine if/how such events can be avoided. All of you have worked
on this in some capacity, we just want to "get the band back together".

With regards to implementation in msl12 -- well, we need the algs to work
in this processing environment -- otherwise, no operational satellite
products from NASA, right? Our goal is to ensure that msl12 faithfully
replicates the original implementation, however, differences will arise
(we'd like to know where/why) and we'll have to iterate with each PI.
Ultimately, we hope to build a version of msl12 with a generic IOP
algorithm that allows command-line parameterization of S, aph*, etc. and the
inversion method. Not there yet, but it's coming. This will take some
work and much thought as to implementation. As you mention, changing the
inversion (for example) changes the answer -- which is another reason why
we're asking the questions we're asking (why? do we have strong
recommendations as a community? etc.)

I know I didn't answer all of your questions/concerns. We know we'll need
a series of statistical metrics to evaluate the "infinite" series of
approaches, but this is a work in progress and suggestions are welcome.
Again, we don't want to foster an environment of algorithm competition --
one person's approach versus another's -- but, rather work towards
better understanding of how/where everything works (or doesn't) when applied to
satellite Lwn.

Keep the dialog coming!

By tsm Date 2007-05-21 11:05

Greetings all,

One thing that was lost at the OCBAM meeting a few years ago, in my opinion,
was the fact that the semi-analytic algorithms under consideration at the
time were evaluated by comparing the 3 basic output variables - Chl, ag0,
and bbp0 - to the measured in situ values.   However, other output variables
exist with these algorithms - namely, the other IOP results such as aph.
How do you weight the evaluation of multiple output fields produced from a
single algorithm, and is it useful to look at the full spectral distribution
of these fields?   As a hypothetical example, is it 'bettter' to minimize
error in the chlorophyll product over the bbp or some other IOP product?   I
really don't know...

This gets even more complicated when you start to evaluate the sensitivities
of these fields within the satellite data at the different levels.

I'm just thinking out loud...

Tim

By zplee Date 2007-05-21 11:25

Hi All:

Apparently Stephane started this discussion/conversation ahead of the
October scheduel, that is great!

To add to the IOP dilemma, another issue is, unlike Chl, there is wavelength
variation. An algorithm may perform well at one wavelength, but does not
mean the same for other wavelengths. So, in my mind, the first thing is to
define the "standard" IOP products, mean IOPs at what wavelengths
will/should be produced. From there, to determine different 'operational'
algorithms for different products. If one is good for total a, then uses
that for total a; if one is good for aph, then uses that for aph. It is not
necessary to have one algorithm for all products. The best (and eventually
it will be) is to have a package (mix/match of existing and new ones to form
the 'best' package) for the overall IOP-products objective.

That is my two cents ...

Cheers!
zhongping

By EmmanuelBoss Date 2007-05-21 12:06

Dear all,

Here are my 2c'.

Most differences between current IOP algorithm are cosmetic. The fundamental
approach in all is very similar:

1.    Define a relationship between Rrs and IOPs (explicit, implicit, 2
vs. 1 term).

2.    Choose spectral shape for IOP (may use Rrs in input as in QAA or
not).

3.    Invert to obtain best fit (linear, nonlinear, Look-Up-Table).

The subtle differences which we should discuss are whether certain choices
among these algorithms (e.g. in brackets above) are clearly superior to
others in a broad category of tests including:

1.    Matchup with datasets not used in tuning.

2.    Computing speed.

3.    Generation of uncertainties.

4.    Ability to work in complex waters.

5.    Dealing with inelastic scattering.

among others.

I don't think we should have a pissing contest among existing algorithm but
rather evaluate what are the advantages of each in order to suggest a set of
recommendation for yet-to-be invented more optimal algorithms.

Don't forget that the MEASURED IOPs against which we test the algorithms
themselves have their own uncertainties:

For a_phi from filter-pads - the only method for scattering correction is a
constant offset removal possibly resulting in an overestimate in the blue
wavelength. For b_bp - we use a smooth curve even though we know that in
reality it is not smooth when algae are present. I will leave a_cdm problems
alone for now.

Cheers,

   Emmanuel

By paul.lyon Date 2007-05-21 12:15

Jeremy, and others:

I understand Stephane's concerns. I appreciate what you are trying to
do also. One of the issues I think also exists is that there are trade
offs for each technique. An algorithm that is designed to work well
globally can be appropriate for large scale IOP studies, however, a
regionally tuned algorithm may be more appropriate for in depth small
scale process studies. Also, the number of bands used and parameters
retrieved should be driven by the goals of the study at hand (someone
interested in total absorption doesn't need an algorithm that retrieves
all the different phytoplankton pigment groups).

I think it will be very difficult to find a on size best fits all
algorithm to have as a default and on the other side, very difficult to
code up all the options that can be used... I have 4 major versions of
just the linear matrix inversion method that I use that are coded in
ways that are best suited for different tasks. Remember the issues
users had with the multitude of Chl algorithms available on MODIS? It
will be very difficult to include all the methods for that reason alone.

I do think it is good to get together and have a discussion of the
strengths and weaknesses of the different methods. The IOCCG report
number 5 did a pretty good job of describing several algorithms. Maybe
as a group we can expand on the information published there.

Paul

By tjsm Date 2008-05-22 08:02

... and now to continue the conversation on the forum ...

Following on from Ping's comment yesterday:

"So, in my mind, the first thing is to define the "standard" IOP products, mean IOPs at what wavelengths will/should be produced. From there, to determine different 'operational' algorithms for different products. If one is good for total a, then uses that for total a; if one is good for aph, then uses that for aph. It is not necessary to have one algorithm for all products. The best (and eventually it will be) is to have a package (mix/match of existing and new ones to form the 'best' package) for the overall IOP-products objective."

I want to agree with this in a pragmatic sense - but intellectually I am not so sure. A number of the semi-analytical algorithms rely on first determining the primary IOPs (i.e. total a and total bb) and then partitioning them according to empirical field or laboratory data (or sweeping generalisations and extrapolations). Theoretically, this type of algorithm could reproduce excellent retrievals of aph and ady - but poor retrievals of total a and bb. It may give us the right answer in the pragmatic sense but it will not lead to any further understanding as to why it does. Following on from Malik's excellent "ambiguity" paper last year perhaps better understanding is worth more than pragmatism? By keeping algorithms intact it also allows us to identify sources of error and propagate them through the equations better (perhaps).

Following on from Emmanuel's point about in-situ measurements having error: this is something we really must be mindful of. Looking at V1 of the NOMAD database I am sure that the bb retrievals have been made at a single wavelength and then extrapolated to the other wavelengths. Measurements aren't always measurements - there is always a degree of modelling in them, especially when using instruments such as the ac9 / bb6 etc.

That is my tuppence worth (we don't have cents here!)

Tim

By EmmanuelBoss Date 2008-05-22 16:21

Tim,
the issue of ambiguity in Rrs inversion was also treated by:

Uniqueness in Remote Sensing of the Inherent Optical Properties of Ocean Water
Michael Sydor, Richard W. Gould, Robert A. Arnone, Vladimir I. Haltrin, and Wesley Goode
Applied Optics, 2004, Vol. 43, Issue 10, pp. 2156-2162

Worth the read.

All the best,
Emmanuel

By stephane Date 2008-05-22 16:51

Tim is right about the bb data in NOMAD v1. I think a fit to the in situ data was used rather than the actual data. I think this is documented somewhere. Jeremy, do you confirm ? Will you use the same approach for NOMAD v2 ?

Stéphane

By @jeremy Date 2008-05-22 17:09

Confirmed! The main NOMAD v1 dataset presents bb at 20 wavelengths. At the time, most field instruments were only reporting up to 6 channels. As such, the field data were used to "extrapolate" to the other wavelengths. For those who are interested, the original (measured) bb data and a document describing the IOP data preparation and QC are available online via the main NOMAD Web page under "IOP Processing Documentation". The document describes how the bb slopes / extrapolation / etc. were executed. Specifically, check out:

http://seabass.gsfc.nasa.gov/data/werdell_nomad_iop_qc.pdf
http://seabass.gsfc.nasa.gov/data/nomad_bbeval_v1.3_2005262.txt

Ultimately, both the fit and measured data are available in NOMAD, but (as you mention) only the fit data are provided in the main data file. I plan to do the same for v2, unless consensus suggests otherwise.

By tsm Date 2008-05-22 17:20

Hi,
A thought on this...aggregating separate IOP's from different algorithms to
represent a 'best of' may not be entirely compatible within the whole
AOP-IOP system.   That is, IOPs from different algorithms mixed and combined
could likely lead to AOPs being different from the original Rrs spectra when
using the IOPs in a forward model.   I think this should be tested out, but
we've seen here at UNH that combining IOPs from different models 'can' lead
to some pretty weird relationships...

American Tim

By Hirata Date 2008-05-23 09:26

Dear all,

I think that it is better for us to have bb data (actually any data) as accurate as possible, especially when the data are used to evaluate IOP algorithms. Also we may want to remember that most (or all?) bb data in NOMAD are taken by a commercial instrument(s) which actually "estimates" bb from scatterance at one or a few angles by using a model. In any instrument, there is almost always a certain "model" wthin the instrument to produce "measurement", just like satellite ocean colour we are discussing now). If we fit the "estimated bb" over a certain spectral range as in NOMAD v1, we are getting further and further away from "true" bb (the fitting process does not necessarily pick up even the original measements at some wavelengths, or does it?). As far as original measurements are available, I think that the original measurements should be used. Emmanuel also warned that the bb spectra is not always smooth as a commmon model shows, probably due to effects of absorbing particles. Although using the original data may reduce a number of the data matched-up with other IOPs & AOPs (in terms of wavelength), we may want to admit it just as a fact?

Another idea is that we may use "both" the fitted and measured data in IOP model comparison. But in this case, we need to clearly state which results are obtained from which "data" (i.e. fitted or measured). In this way, we can evaluate a "potential" of an IOP model which may be used to estimate bb at wavelengths not supported in validation data (= measurements). My point here is that the use of both type of data allows us to separate the "potential" from "actual performance" of an IOP model.

In any case, we might need original measurements?

Takafumi

By @jeremy Date 2008-05-23 12:50

Good morning (on the US east coast, anyway), all.

Couple of thoughts for the conversation regarding field data. Both the measured and fit bb (and ad and ag, for that matter) are available via the NOMAD web site -- as is a document that describes how the measured data were fit. I'd suggest that a few of you take a look at these data (as time permits) to evaluate their utility. Believe me, you don't have to convince us that some data have quality issues. But, they were rigorously screened and, hopefully, most errors are systematic. Ultimately, we'd *love* to have more eyes on the product, so this exercise would be really useful from our perspective. I'd be interested in hearing more about your thoughts on fit versus measured data for this activity.

FYI, after fitting, the modeled and measured data were visually inspected in tandem to confirm coherence. In general, almost everything in NOMAD was visually inspected -- one reason why it takes so long to generate the data set. Doesn't mean the data are perfect, just means we've aimed for consistency and searched for obvious errors.

I'm glad we're having this discussion, as the quality of the field data -- and the general lack of assigned uncertainties for these products -- is very important (for the workshop and otherwise). Keep in mind, however, that statistical comparisons with field data may only emerge as a minor metric for our workshop analyses. First, this has been done thoroughly in the past. Second, personally, I'm more interested in using these data to better understand algorithm sensitivities -- that is, how do the comparisons change (relative to themselves) as input parameters (Rrs) or constants (aph*, S, etc.) are modified. Finally, don't forget that we'll be using satellite data (match-ups, level-2 and level-3 files) as well. All I'm saying is that "all of our eggs aren't in the in situ data basket" --- which, hopefully, provides everyone with some comfort. Now, I know the satellite Lwn aren't perfect either -- but, to make any progress towards our specific goals, we're going to have to take a leap of faith that they're good enough for the Workshop (as a starting point, anyway). I don't envision having sufficient time to delve deeply into atmos. correction and normalization problems -- all good topics and fodder for a follow up activity.

Happy Friday.

J