Multiple Regression, Discriminant Analysis Predictions
of Mar-Apr-May 1997 Rainfall in Northeast Brazil
contributed by Andrew Colman(1), Mike Davey(1),
Mike Harrison(2), Tony Evans(2) and Ruth Evans(2)
(1) Ocean Applications Branch (2) NWP Division
UK Meteorological Office, Bracknell, United Kingdom
Seasonal rainfall in the North Nordeste in northeast Brazil occurs mainly from February to May,
with heaviest amounts in March and April. Experimental forecasts of North Nordeste rainfall at 1
and 0 month leads are issued using November-January and January-February predictor data,
respectively. Two predictors found to deliver substantial forecast skill are (1) the 30°N-30°S
portion of the third covariance-based EOF of Atlantic SST for all seasons, and (2) the first EOF
of Pacific SST for Dec-Jan-Feb. Both of these EOF patterns are shown in the March 1993 issue
of this Bulletin. The Atlantic EOF pattern reflects the SST anomaly immediately off the North
Nordeste east coast and the large scale north-south SST gradient structure, while the Pacific EOF
pattern serves mainly as an index of the ENSO situation. The amplitude time series of each of
these predictors are used to predict North Nordeste rainfall both with multiple regression (giving
a point forecast) and discriminant analysis (giving probabilities for each of five climatologically
equiprobable [for 1951-1980] rainfall amount categories).
Details about the EOF analyses, the physical relevance of the predictors, and the two forecasting
methods are given in Ward and Folland (1991). Multiple regression develops optimal weights for
each predictor in order that the resulting linear equation minimizes squared errors between
forecasts and corresponding observations over the training periods (1913-95, 1946-95). In
discriminant analysis, categories of rainfall amount are defined, and, given values of the
predictors, probabilities of each of the rainfall categories are determined using Bayes' theorem.
Less linear constraint is imposed here than in multiple regression, as the probabilities do not
necessarily change smoothly as a function of category.
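
The Bayes' theorem step described above can be sketched as follows. This is a minimal illustration, not the Ward and Folland (1991) implementation: it assumes a single predictor, Gaussian class-conditional densities with a shared standard deviation, and hypothetical category means. The function names are also illustrative. The scheme described in this article uses two predictors.

```python
import math

CATEGORIES = ["very dry", "dry", "average", "wet", "very wet"]

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def category_posteriors(x, class_means, shared_sd, priors):
    """Bayes' theorem: P(category | x) is proportional to P(x | category) * P(category)."""
    weighted = [p * gaussian_pdf(x, m, shared_sd)
                for m, p in zip(class_means, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical mean predictor value in each rainfall category (assumption,
# chosen so that larger predictor values go with drier categories).
class_means = [1.0, 0.5, 0.0, -0.5, -1.0]
# Climatologically equiprobable categories, as in the text.
priors = [0.2] * 5

posteriors = category_posteriors(0.8, class_means, 1.0, priors)
for name, prob in zip(CATEGORIES, posteriors):
    print(f"{name:9s} {prob:.2f}")
```

Because each category receives its own probability, the output need not vary smoothly from one category to the next, which is the sense in which the method is less linearly constrained than regression.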
Forecasts are made for three separate North Nordeste rainfall predictands: Nobre (for Feb-May),
Hastenrath (Mar-Apr) and Fortaleza/Quixeramobim (FQ) (Mar-May). These are illustrated in Fig.
1. Each of these forecasts is done using both multiple regression and discriminant analysis. The
forecasts presented here are only for the two predictands whose periods begin in March, making
for a long-lead forecast: Hastenrath and FQ. The Hastenrath rainfall area occupies a central
portion of the north Nordeste, while FQ is the rainfall averaged over the two stations, one of
which is in the Hastenrath area.
If the amplitudes of the predictor EOFs are changing rapidly during the Nov-Jan period, values
from Dec-Jan or only January may be used as predictors if the more recent SST anomalies are
expected to persist. In early March updated forecasts for the predictand periods are issued, using
SST data through February. While this is not a long-lead forecast, it is presented here following
the 1-month lead forecast for comparison.
To estimate forecast skill, multiple regression and discriminant analysis hindcasts for FQ based on
the SST for Nov-Jan were made for the 1971-92 period using data from 1913-70, and for the
1981-92 period using data from 1913-80. Because the eigenvector patterns were computed from
1901-80 data, which overlap the 1971-80 verification years, the first experiment is slightly
dependent, while the second is completely independent. The
discriminant analysis forecast skill was assessed by comparing the observed category with the
most likely category according to the hindcasts over the period 1971-92 (Table 1: note that the
quints are defined by 1951-80 observations), while the point estimate rainfall amounts predicted
by multiple linear regression were correlated with observed values. The resulting correlation for
the 1971-92 experiment is 0.715 with a bias of +0.09 standard deviations and a root mean squared
error (RMSE) of 0.62 standard deviations. For the totally independent (including the eigenvector
pattern) period of 1981-92, the correlation is 0.662, with bias of +0.07 and RMSE of 0.70
standard deviations. While skill in this second experiment is somewhat lower, it is shown in Ward and Folland
(1991) that independence of the eigenvector patterns is not nearly as critical to estimation of
independent forecast skill as independence of the periods used for statistical model development
and for forecast testing.
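
The three verification statistics quoted above (correlation, bias and RMSE) can be computed as in the sketch below. The sign convention for bias (forecast minus observed) is an assumption, since the text does not state it, and the sample values are illustrative only. They are not the hindcast data.

```python
import math

def bias(forecasts, observations):
    """Mean error, forecast minus observed (sign convention assumed)."""
    n = len(forecasts)
    return sum(f - o for f, o in zip(forecasts, observations)) / n

def rmse(forecasts, observations):
    """Root mean squared error between forecasts and observations."""
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)

def correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Illustrative standardized anomalies (not data from this article).
sample_forecasts = [0.5, -0.5, 1.0, 0.0]
sample_observed = [0.0, -1.0, 1.0, 0.0]

print(bias(sample_forecasts, sample_observed))
print(rmse(sample_forecasts, sample_observed))
print(correlation(sample_forecasts, sample_observed))
```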
                        Observed
                Q1   Q2   Q3   Q4   Q5
          Q1     4    2    0    2    0
          Q2     0    0    0    0    0
Hindcast  Q3     0    1    1    0    0
          Q4     0    0    0    0    0
          Q5     1    2    0    3    6
Table 1. Hindcasts (i.e. forecasts for already observed times, but with model derived without
target years) of FQ rainfall index for 1971-92, using linear discriminant analysis. The Q's are
quintiles (Q1=very dry, Q5=very wet).
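
As a reading aid for Table 1, the sketch below counts the hindcasts whose most likely category matched the observed category, and those that missed by at most one category. The "within one category" measure and the function name are additions for illustration and are not used elsewhere in this article.

```python
# Table 1 contingency counts: rows are hindcast Q1..Q5, columns are observed Q1..Q5.
table1 = [
    [4, 2, 0, 2, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 2, 0, 3, 6],
]

def summarize(table):
    """Return (total cases, exact hits, cases within one category)."""
    n = len(table)
    total = sum(sum(row) for row in table)
    exact = sum(table[i][i] for i in range(n))
    within_one = sum(table[i][j]
                     for i in range(n) for j in range(n)
                     if abs(i - j) <= 1)
    return total, exact, within_one

total, exact, within_one = summarize(table1)
print(total, exact, within_one)  # 22 11 17
```

With five equiprobable categories, a forecast with no skill would be expected to score about 20% exact hits, or roughly 4.4 of the 22 cases.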
Experimental real-time forecasts for FQ using the methods discussed here have been made for
each rainfall season since 1987. The forecasters combine the forecasts from discriminant analysis
and multiple regression to determine the official forecast category. The forecast-observation
correspondence from 1987 to 1996 is very good for the preliminary forecast (hit rate 6.5 out of
10), and slightly worse for the updated forecasts (4.5 out of 10). (Over a large number of cases
the updated forecasts would be expected to have more skill.) Table 2 shows the record of
real-time forecasts for 1987-96. It is clear that the FQ rainfall index is fairly skillfully predicted
from the two SST EOFs--in fact, more skillfully than almost any variable in the extratropical
Pacific/North American region at any time of the year. The error in the 1996 forecast may be
related to a sharp change in Atlantic SST through the forecast season (see the predictor time
series in Fig. 2).
Year:          87   88   89   90   91   92   93   94   95   96
Prelim fcst     1    4    5    2    4  1.5    2    5    4  2.5
Updated fcst    1    5    5    3    4    2    2    4  4.5  3.5
Observed        1    4    5    2    4  1.5    1  4.5    5    5
Table 2. Verification of experimental real time forecasts of NE Brazil rainfall (predictions of
March-May rainfall at FQ). 1=very dry, ..., 5=very wet. The number of correct categorical
forecasts out of 10 (hit rate) is 6.5 for the preliminary (1 month lead) forecast, and 4.5 for the
updated (zero lead) forecast.
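
The hit rates quoted for Table 2 can be recovered with the sketch below. The scoring rule is an assumption: a full point for an exact category match and half a point when forecast and observation differ by half a category. The article does not state the rule, but this one reproduces the quoted 6.5 and 4.5.

```python
# Table 2 values for 1987-96 (1 = very dry, ..., 5 = very wet).
prelim = [1, 4, 5, 2, 4, 1.5, 2, 5, 4, 2.5]
updated = [1, 5, 5, 3, 4, 2, 2, 4, 4.5, 3.5]
observed = [1, 4, 5, 2, 4, 1.5, 1, 4.5, 5, 5]

def hit_score(forecasts, observations):
    """Sum of 1 per exact match and 0.5 per half-category miss (assumed rule)."""
    score = 0.0
    for f, o in zip(forecasts, observations):
        diff = abs(f - o)
        if diff == 0:
            score += 1.0
        elif diff == 0.5:
            score += 0.5
    return score

print(hit_score(prelim, observed))   # 6.5
print(hit_score(updated, observed))  # 4.5
```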
1997 Forecast
Figures 2 and 3 show the monthly time series of the Atlantic and Pacific SST anomaly predictors
used in the regression and discriminant analysis prediction models. The Atlantic predictor is
influenced by large scale north-south SST anomaly differences, and its value has increased rapidly
over the previous 6 months to a high positive January value. The Pacific predictor value is near
zero, with little change over the previous 12 months.
Atlantic: There are negative (cool) SST anomalies in the south east tropical Atlantic and warm
anomalies north of the equator and in the south west subtropical Atlantic. This pattern historically
favors drier conditions in NE Brazil. There are also warm anomalies south of the equator in the
west Atlantic, close to NE Brazil, which favor wet conditions in NE Brazil.
Pacific: There are weak La Niña conditions with colder than average SST in the equatorial east
Pacific, and weak warm anomalies further west and south.
Example multiple regression equations for the 1-month lead forecast for the Hastenrath (for
Mar-Apr) and FQ (for Mar-May) rainfall indices (standardized rainfall anomaly units), based on
1946-1995 data, are:
Hastenrath = 0.12 - 0.65A - 0.05P
FQ = 0.17 - 0.84A - 0.06P
where the EOF predictor time coefficients (A=Atlantic EOF, P=Pacific EOF) are not
standardized. (The Atlantic series varies between about -2.5 and 1.4 between 1981 and 1995,
while the Pacific series varies between about -4.0 and 9.7).
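
The two example equations can be evaluated directly, as in the sketch below. The predictor values passed in are hypothetical and serve only to show the sign of the response; the function names are illustrative. A single equation will not reproduce the forecasts issued in this article, which average predictions from two training periods.

```python
def hastenrath_index(a, p):
    """Example 1946-1995 regression for the Hastenrath (Mar-Apr) index."""
    return 0.12 - 0.65 * a - 0.05 * p

def fq_index(a, p):
    """Example 1946-1995 regression for the FQ (Mar-May) index."""
    return 0.17 - 0.84 * a - 0.06 * p

# Hypothetical predictor amplitudes: positive Atlantic EOF, neutral Pacific EOF.
a_value, p_value = 1.0, 0.0

print(round(hastenrath_index(a_value, p_value), 2))  # -0.53
print(round(fq_index(a_value, p_value), 2))          # -0.67
```

Both coefficients on A are negative, so a positive Atlantic amplitude lowers the predicted rainfall index, and the much smaller coefficients on P show that the Atlantic predictor dominates these equations.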
For the 1-month lead linear regression predictions, we calculate the average of predictions made
using training periods 1913-95 and 1946-95, and SST anomalies for November, December and
January. The result is:
predictand    forecast   quint   quint range     stand. error
Hastenrath     -0.50       2     -.65 to -.16        0.57
FQ             -0.65       2     -.70 to -.16        0.65
The Hastenrath and FQ forecasts are both near the lower end of the quint 2 (dry) range. The
multiple regression forecast for the Nobre (Feb-May) predictand is for quint 1 (very dry). The
standard errors (in standard deviation units) associated with these forecasts express the inherent
uncertainty based on the training period statistics.
The discriminant analysis produces the following probabilities that the Hastenrath and FQ indices
will be in each of the quintiles for the 1-month lead time:
              very dry    dry    average    wet    very wet
Hastenrath      .33       .38      .19      .08      .02
FQ              .45       .21      .17      .15      .01
The discriminant analysis prediction for Hastenrath favors dry conditions, while that for FQ
suggests very dry conditions.
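
The reading of these probabilities can be made explicit with a short sketch that picks the most likely category and sums the probability of the two driest categories. The helper names are illustrative. The FQ probabilities as printed sum to 0.99, presumably through rounding.

```python
CATEGORIES = ["very dry", "dry", "average", "wet", "very wet"]

# Discriminant analysis probabilities for the 1-month lead forecast.
probabilities = {
    "Hastenrath": [0.33, 0.38, 0.19, 0.08, 0.02],
    "FQ": [0.45, 0.21, 0.17, 0.15, 0.01],
}

def most_likely(probs):
    """Name of the category with the highest probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CATEGORIES[best]

def prob_dry_or_very_dry(probs):
    """Combined probability of the two driest categories."""
    return probs[0] + probs[1]

for name, probs in probabilities.items():
    print(name, most_likely(probs), round(prob_dry_or_very_dry(probs), 2))
```

For both indices the two driest categories together carry roughly two thirds of the probability or more (0.71 for Hastenrath, 0.66 for FQ).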
Based on the statistical results alone, our best estimate forecast is for dry conditions for the
Hastenrath and FQ rainfall indices.
As in 1994-96, dynamical predictions of Northeast Brazil rainfall were also made using a version
of the UKMO climate atmospheric general circulation model (see article by Harrison et al., this
issue). The AGCM showed high skill in simulating interannual variability of Northeast Brazil
rainfall when forced with observed SST.
For 1997 the AGCM results consistently predict a southward displacement of the intertropical
convergence zone in the west Atlantic from its climatological location, leading to above average
rainfall in NE Brazil (see Harrison et al. for details).
In summary, the statistical and dynamical methods give very different predictions for 1997. The
regression predictions and the discriminant analysis probability predictions are for dry or very dry
conditions for the FQ, Hastenrath and Nobre rainfall regions, as a consequence of the strong
positive value of the Atlantic predictor. However, the AGCM predictions consistently indicate
rainfall well above average in NE Brazil, which may be due to the warm SST anomalies near NE
Brazil. The contrast between the statistical and dynamical predictions may be due to differing
effects of large-scale and local SST anomalies in the tropical Atlantic.
The large differences between the statistical and dynamical predictions indicate considerable
uncertainty in the seasonal rainfall forecast for 1997. Confidence in an overall forecast is very
low, and no specific overall best-estimate forecast has been issued.
UPDATED (ZERO-LEAD) FORECASTS, INCLUDING FEBRUARY PREDICTOR DATA
While zero-lead forecasts are not encouraged for this Bulletin, they appear here as auxiliary
information accompanying long-lead forecasts for the same targets. In February the Atlantic
predictor value remained strongly positive while the Pacific predictor remained near zero (see
Figs. 2, 3). The zero-lead statistical forecasts indicate quint 1 (very dry) conditions, drier than the
1-month lead forecast.
Ward, M.N. and C.K. Folland, 1991: Prediction of seasonal rainfall in the North Nordeste of
Brazil using eigenvectors of sea surface temperature. Int. J. Climatol., 11, 711-743.
Fig. 1. Locations of the stations used in the Hastenrath rainfall time series, and the Fortaleza and
Quixeramobim stations. The Nobre rainfall time series is based on stations throughout the
bounded region indicated.
Fig. 2. Amplitude time series for the Atlantic eigenvector for January 1990 to February 1997.
Positive values (e.g. SST anomalies warm in north tropical Atlantic, cool in south tropical
Atlantic) are associated with drier conditions.
Fig. 3. Amplitude time series for the Pacific eigenvector for January 1990 to February 1997.
Positive values (e.g. SST anomalies warm in the central-east equatorial Pacific, cool in the
north-west and south-west Pacific) are associated with drier conditions.