Multiple Regression, Discriminant Analysis Predictions of Mar-Apr-May 1997 Rainfall in Northeast Brazil

contributed by Andrew Colman¹, Mike Davey¹, Mike Harrison², Tony Evans² and Ruth Evans²

¹Ocean Applications Branch; ²NWP Division

UK Meteorological Office, Bracknell, United Kingdom

Seasonal rainfall in the North Nordeste in northeast Brazil falls mainly from February to May, with the heaviest amounts in March and April. Experimental forecasts of North Nordeste rainfall are issued at 1-month and zero leads, using November-January and January-February predictor data, respectively. Two predictors have been found to deliver substantial forecast skill: (1) the 30°N-30°S portion of the third covariance-based empirical orthogonal function (EOF) of Atlantic sea surface temperature (SST) for all seasons, and (2) the first EOF of Pacific SST for Dec-Jan-Feb. Both EOF patterns are shown in the March 1993 issue of this Bulletin. The Atlantic EOF pattern reflects the SST anomaly immediately off the North Nordeste east coast and the large-scale north-south SST gradient structure, while the Pacific EOF pattern serves mainly as an index of the ENSO state. The amplitude time series of these two predictors are used to predict North Nordeste rainfall with both multiple regression, which gives a point forecast, and discriminant analysis, which gives probabilities for each of five rainfall amount categories that were climatologically equiprobable over 1951-1980.
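The amplitude time series of an EOF predictor is the projection of each month's SST anomaly field onto the EOF pattern. The following is a minimal sketch, assuming a small hypothetical anomaly matrix (rows are months, columns are grid points) and using power iteration on the covariance matrix to find the leading EOF. The third EOF used for the Atlantic predictor would additionally require removing the first two modes, which is omitted here. The sign of an EOF is arbitrary, so a computed amplitude series may have the opposite sign convention to Figs. 2 and 3.

```python
import math

def centre(x):
    """Remove each grid point's time mean so the columns are anomalies."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in x]

def covariance(x):
    """Covariance matrix (space x space) of centred data x (time x space)."""
    n, m = len(x), len(x[0])
    return [[sum(x[t][i] * x[t][j] for t in range(n)) / (n - 1) for j in range(m)]
            for i in range(m)]

def leading_eof(c, iterations=500):
    """Unit-length leading eigenvector of a symmetric matrix c by power iteration."""
    v = [1.0] * len(c)
    for _ in range(iterations):
        w = [sum(c[i][j] * v[j] for j in range(len(v))) for i in range(len(c))]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v

def amplitudes(x, eof):
    """Amplitude time series: project each month's anomaly field onto the EOF."""
    return [sum(a * e for a, e in zip(row, eof)) for row in x]

# Hypothetical SST anomalies: 6 months x 3 grid points (not real observations).
sst = [
    [1.0, 0.8, -0.5],
    [0.2, 0.1, 0.0],
    [-0.9, -0.7, 0.4],
    [0.5, 0.6, -0.3],
    [-0.4, -0.5, 0.2],
    [-0.4, -0.3, 0.2],
]
anom = centre(sst)
eof1 = leading_eof(covariance(anom))
pc1 = amplitudes(anom, eof1)
```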

Details of the EOF analyses, the physical relevance of the predictors, and the two forecasting methods are given in Ward and Folland (1991). Multiple regression finds the predictor weights for which the resulting linear equation minimizes the squared errors between forecasts and the corresponding observations over the training periods (1913-95 and 1946-95). In discriminant analysis, categories of rainfall amount are defined and, given the values of the predictors, the probability of each rainfall category is determined using Bayes' theorem. This imposes less linear constraint than multiple regression, because the probabilities need not change smoothly from one category to the next.
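The Bayes' theorem step can be sketched for two predictors with linear discriminant analysis. Each rainfall category is given a Gaussian density for the (A, P) amplitudes, with a covariance matrix shared across categories. This density is multiplied by the category's prior probability, and the results are normalized. This is a minimal sketch, and the Ward and Folland (1991) implementation may differ in detail. The training points are hypothetical, and the equal priors of 0.2 reflect the five climatologically equiprobable categories.

```python
import math

def mean_vec(points):
    """Mean (A, P) of a list of predictor pairs."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def pooled_covariance(groups):
    """Pooled within-category 2x2 covariance of the (A, P) pairs."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for pts in groups.values():
        m = mean_vec(pts)
        for p in pts:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        dof += len(pts) - 1
    return [[v / dof for v in row] for row in s]

def lda_probabilities(groups, priors, x):
    """Posterior P(category | x) from Bayes' theorem with Gaussian densities
    sharing one covariance matrix (so their normalizing constants cancel)."""
    c = pooled_covariance(groups)
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    inv = [[c[1][1] / det, -c[0][1] / det],
           [-c[1][0] / det, c[0][0] / det]]
    log_post = {}
    for k, pts in groups.items():
        m = mean_vec(pts)
        d = [x[0] - m[0], x[1] - m[1]]
        mahal = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
        log_post[k] = math.log(priors[k]) - 0.5 * mahal
    top = max(log_post.values())  # subtract the maximum for numerical stability
    w = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(w.values())
    return {k: wk / total for k, wk in w.items()}

# Hypothetical training years: (A, P) amplitudes grouped by observed rainfall quintile.
groups = {
    "very dry": [(1.2, 2.0), (0.8, 4.0), (1.5, 1.0)],
    "dry": [(0.6, 1.0), (0.3, 3.0), (0.9, -1.0)],
    "average": [(0.0, 0.0), (-0.3, 2.0), (0.2, -2.0)],
    "wet": [(-0.6, -1.0), (-0.9, 1.0), (-0.4, -3.0)],
    "very wet": [(-1.3, -2.0), (-1.0, 0.0), (-1.6, -4.0)],
}
priors = {k: 0.2 for k in groups}  # five equiprobable categories
probs = lda_probabilities(groups, priors, (1.0, 0.0))
```

With a positive Atlantic amplitude and a zero Pacific amplitude, nearly all of the probability goes to the two driest categories in this toy example. A quadratic discriminant, with a separate covariance matrix for each category, is the usual alternative when the category spreads differ.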

Forecasts are made for three North Nordeste rainfall predictands: Nobre (Feb-May), Hastenrath (Mar-Apr) and Fortaleza/Quixeramobim (FQ) (Mar-May). These are illustrated in Fig. 1. Each predictand is forecast with both multiple regression and discriminant analysis. The forecasts presented here are only for Hastenrath and FQ, the two predictands whose periods begin in March, so that forecasts made from Nov-Jan data have a 1-month lead. The Hastenrath rainfall area occupies a central portion of the North Nordeste. FQ is the rainfall averaged over the Fortaleza and Quixeramobim stations, one of which lies in the Hastenrath area.

If the amplitudes of the predictor EOFs change rapidly during Nov-Jan, and the more recent SST anomalies are expected to persist, values from Dec-Jan or from January alone may be used as predictors instead. In early March, updated forecasts for the predictand periods are issued using SST data through February. Although these are not long-lead forecasts, they are presented here after the 1-month lead forecasts for comparison.

To estimate forecast skill, multiple regression and discriminant analysis hindcasts for FQ based on Nov-Jan SST were made for 1971-92 using data from 1913-70, and for 1981-92 using data from 1913-80. The eigenvector patterns were computed from 1901-80 data. As a result, the first experiment depends slightly on its verification period, because 1971-80 overlaps the EOF period, while the second is completely independent. Discriminant analysis skill was assessed by comparing the observed category with the most likely hindcast category over 1971-92 (Table 1; note that the quints, or quintiles, are defined by 1951-80 observations). The point rainfall amounts predicted by multiple linear regression were correlated with the observed values. For the 1971-92 experiment, the correlation is 0.715, with a bias of +0.09 standard deviations and a root mean squared error (RMSE) of 0.62 standard deviations. For the 1981-92 period, which is fully independent (including of the eigenvector patterns), the correlation is 0.662, with a bias of +0.07 and an RMSE of 0.70 standard deviations. Although these results are somewhat weaker, Ward and Folland (1991) show that independence of the eigenvector patterns matters much less for estimating independent forecast skill than independence of the periods used for model development and forecast testing.
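The three regression verification statistics quoted above can be computed from paired hindcasts and observations in standardized units. This minimal sketch assumes that bias is defined as mean(hindcast) - mean(observed). That is a common convention, but the text does not state it. The five value pairs are hypothetical.

```python
import math

def verify(fcst, obs):
    """Return (correlation, bias, RMSE) for paired forecasts and observations."""
    n = len(fcst)
    mf = sum(fcst) / n
    mo = sum(obs) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(fcst, obs))
    sf = math.sqrt(sum((f - mf) ** 2 for f in fcst))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    corr = cov / (sf * so)
    bias = mf - mo  # assumed convention: positive means forecasts too wet on average
    rmse = math.sqrt(sum((f - o) ** 2 for f, o in zip(fcst, obs)) / n)
    return corr, bias, rmse

# Hypothetical standardized hindcasts and observations.
fcst = [0.5, -0.2, 1.0, -0.8, 0.1]
obs = [0.4, -0.5, 1.2, -1.0, 0.3]
corr, bias, rmse = verify(fcst, obs)
```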

Hindcast \ Observed   Q1   Q2   Q3   Q4   Q5
Q1                     4    2    0    2    0
Q2                     0    0    0    0    0
Q3                     0    1    1    0    0
Q4                     0    0    0    0    0
Q5                     1    2    0    3    6

Table 1. Hindcasts (i.e., forecasts for already observed years, made with a model derived without the target years) of the FQ rainfall index for 1971-92, using linear discriminant analysis. Rows give the hindcast quintile and columns give the observed quintile (Q1 = very dry, Q5 = very wet).
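The categorical hit rate for Table 1 is the fraction of years that fall on the diagonal of the hindcast-by-observed table. The following short sketch uses the Table 1 counts directly:

```python
# Table 1 counts: rows are hindcast quintiles Q1..Q5, columns observed Q1..Q5.
table1 = [
    [4, 2, 0, 2, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 2, 0, 3, 6],
]

def hit_rate(table):
    """Return (hits, total, fraction) where the hindcast category equals the observed one."""
    total = sum(sum(row) for row in table)
    hits = sum(table[i][i] for i in range(len(table)))
    return hits, total, hits / total

hits, total, rate = hit_rate(table1)  # 11 hits out of 22 years, rate 0.5
```

For comparison, if the five categories were exactly equiprobable over 1971-92, random guessing would be expected to score about 0.2. Because the quintiles are defined from 1951-80, that baseline is only approximate.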

Experimental real-time forecasts for FQ using these methods have been made for each rainfall season since 1987. The forecasters combine the discriminant analysis and multiple regression forecasts to determine the official forecast category. From 1987 to 1996, the forecast-observation correspondence is very good for the preliminary forecasts (hit rate 6.5 out of 10) and worse for the updated forecasts (4.5 out of 10). Over a large number of cases, however, the updated forecasts would be expected to have more skill. Table 2 shows the record of real-time forecasts for 1987-96. The record indicates that the FQ rainfall index is predicted fairly skillfully from the two SST EOFs. In fact, it is predicted better than almost any variable in the extratropical Pacific/North American region at any time of year. The error in the 1996 forecast may be related to a sharp change in Atlantic SST during the forecast season (see the predictor time series in Fig. 2).

Year          87   88   89   90   91   92   93   94   95   96
Prelim fcst    1    4    5    2    4  1.5    2    5    4  2.5
Updated fcst   1    5    5    3    4    2    2    4  4.5  3.5
Observed       1    4    5    2    4  1.5    1  4.5    5    5

Table 2. Verification of experimental real-time forecasts of NE Brazil rainfall (predictions of March-May rainfall at FQ). 1 = very dry, ..., 5 = very wet. The number of correct categorical forecasts out of 10 (hit rate) is 6.5 for the preliminary (1-month lead) forecasts and 4.5 for the updated (zero-lead) forecasts.
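The 6.5 and 4.5 hit rates in Table 2 can be reproduced with a simple half-credit scoring rule. The text does not state this rule, so it is an assumption here. Under this rule, a half-category value such as 1.5 lies on the boundary between two categories. An exact match scores 1, a difference of half a category scores 0.5, and anything else scores 0.

```python
# Table 2 category values for 1987-96 (1 = very dry ... 5 = very wet).
prelim = [1, 4, 5, 2, 4, 1.5, 2, 5, 4, 2.5]
updated = [1, 5, 5, 3, 4, 2, 2, 4, 4.5, 3.5]
observed = [1, 4, 5, 2, 4, 1.5, 1, 4.5, 5, 5]

def score(fcst, obs):
    """Assumed half-credit rule: 1 for a match, 0.5 for a half-category miss, else 0."""
    diff = abs(fcst - obs)
    if diff == 0:
        return 1.0
    if diff == 0.5:
        return 0.5
    return 0.0

def hits(forecasts, observations):
    """Total score over all years."""
    return sum(score(f, o) for f, o in zip(forecasts, observations))

print(hits(prelim, observed), hits(updated, observed))  # 6.5 4.5
```

With five equiprobable categories, chance alone would be expected to score about 2 out of 10.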

1997 Forecast

Figures 2 and 3 show the monthly time series of the Atlantic and Pacific SST anomaly predictors used in the regression and discriminant analysis prediction models. The Atlantic predictor reflects large-scale north-south SST anomaly differences. Its value has increased rapidly over the previous 6 months to a large positive value in January. The Pacific predictor is near zero and has changed little over the previous 12 months.

Atlantic: There are negative (cool) SST anomalies in the southeast tropical Atlantic and warm anomalies north of the equator and in the southwest subtropical Atlantic. This pattern has historically favored drier conditions in NE Brazil. However, there are also warm anomalies south of the equator in the west Atlantic, close to NE Brazil, which favor wet conditions there.

Pacific: There are weak La Niña conditions, with colder-than-average SST in the equatorial east Pacific and weak warm anomalies farther west and south.

Example multiple regression equations for the 1-month lead forecast for the Hastenrath (for Mar-Apr) and FQ (for Mar-May) rainfall indices (standardized rainfall anomaly units), based on 1946-1995 data, are:

Hastenrath = 0.12 - 0.65A - 0.05P

FQ = 0.17 - 0.84A - 0.06P

where A and P are the time coefficients of the Atlantic and Pacific EOF predictors, respectively, which are not standardized. (Between 1981 and 1995, the Atlantic series ranges from about -2.5 to 1.4 and the Pacific series from about -4.0 to 9.7.)
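Both steps behind these equations can be sketched in a few lines. The first is fitting the weights by ordinary least squares, which means solving the normal equations. The second is applying the fitted weights to predictor values. The training series are hypothetical and are built from known weights so that the fit can be checked. The inputs A = 1.0 and P = 0.0 are illustrative, not the observed 1997 amplitudes. The operational forecasts below also average two training periods, so these numbers are not meant to reproduce them.

```python
def solve(m, v):
    """Solve the square system m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def fit_regression(a_amp, p_amp, rain):
    """Least-squares (b0, wA, wP) for rain = b0 + wA*A + wP*P via the normal equations."""
    rows = [(1.0, a, p) for a, p in zip(a_amp, p_amp)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, rain)) for i in range(3)]
    return solve(xtx, xty)

def predict(coefs, a, p):
    """Point forecast b0 + wA*A + wP*P in standardized rainfall anomaly units."""
    b0, wa, wp = coefs
    return b0 + wa * a + wp * p

# Hypothetical training series, built exactly from weights (0.1, -0.7, -0.05).
a_train = [-2.0, -1.0, 0.0, 0.5, 1.0, 1.4]
p_train = [3.0, -1.0, 0.0, 9.0, -4.0, 2.0]
rain_train = [predict((0.1, -0.7, -0.05), a, p) for a, p in zip(a_train, p_train)]
fitted = fit_regression(a_train, p_train, rain_train)  # recovers the known weights

# The published 1946-95 equations, applied to illustrative amplitudes A = 1.0, P = 0.0.
HASTENRATH = (0.12, -0.65, -0.05)
FQ = (0.17, -0.84, -0.06)
print(round(predict(HASTENRATH, 1.0, 0.0), 2), round(predict(FQ, 1.0, 0.0), 2))  # -0.53 -0.67
```

Both Atlantic weights are negative. A positive Atlantic amplitude, like the strongly positive January 1997 value, therefore pushes both indices toward dry conditions.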

For the 1-month lead linear regression predictions, we average the predictions from models trained on 1913-95 and on 1946-95, using SST anomalies for November, December and January. The result is:

Predictand   Forecast   Quint   Quint range       Stand. error
Hastenrath    -0.50       2     -0.65 to -0.16       0.57
FQ            -0.65       2     -0.70 to -0.16       0.65

The Hastenrath and FQ forecasts both lie near the lower end of the quint 2 (dry) range. The multiple regression forecast for the Nobre (Feb-May) predictand is for quint 1 (very dry). The standard errors (in standard deviation units) express the inherent uncertainty of these forecasts, estimated from the training period statistics.
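Two computations sit behind this table. The first is averaging the point forecasts from the two training periods. The second is reading the standard error as the spread of the forecast. The following is a minimal sketch of both, assuming Gaussian forecast errors, which the text does not state. For the Hastenrath forecast of -0.50 with a standard error of 0.57, the sketch shows how much of that spread falls below, within and above the quint 2 range. The two individual model forecasts passed to `average_forecast` are hypothetical.

```python
import math

def average_forecast(forecasts):
    """Mean of point forecasts from models trained on different periods."""
    return sum(forecasts) / len(forecasts)

def normal_cdf(x, mean, sd):
    """Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def range_probabilities(mean, sd, lower, upper):
    """P(below lower), P(between lower and upper), P(above upper) for a Gaussian forecast."""
    below = normal_cdf(lower, mean, sd)
    within = normal_cdf(upper, mean, sd) - below
    return below, within, 1.0 - below - within

# Hypothetical forecasts from the 1913-95 and 1946-95 models that average to -0.50.
hastenrath = average_forecast([-0.46, -0.54])
below, within, above = range_probabilities(hastenrath, 0.57, -0.65, -0.16)
print(f"{below:.2f} {within:.2f} {above:.2f}")  # 0.40 0.33 0.28
```

These are not the discriminant analysis probabilities, which come from a different model. The sketch only illustrates what a standard error of 0.57 implies about the spread around a forecast of -0.50.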

The discriminant analysis produces the following probabilities that the Hastenrath and FQ indices will be in each of the quintiles for the 1-month lead time:

              Very dry   Dry   Average   Wet   Very wet
Hastenrath      .33      .38     .19     .08     .02
FQ              .45      .21     .17     .15     .01

The discriminant analysis prediction for Hastenrath favors dry conditions, while that for FQ suggests very dry conditions.
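The most likely category and the combined probability of the two driest quintiles can be read directly from the probabilities above:

```python
# Discriminant analysis probabilities from the table above (quintiles Q1..Q5).
CATEGORIES = ["very dry", "dry", "average", "wet", "very wet"]
PROBS = {
    "Hastenrath": [0.33, 0.38, 0.19, 0.08, 0.02],
    "FQ": [0.45, 0.21, 0.17, 0.15, 0.01],
}

for name, p in PROBS.items():
    most_likely = CATEGORIES[p.index(max(p))]
    dry_side = p[0] + p[1]  # P(very dry or dry)
    print(f"{name}: most likely {most_likely}, P(dry or very dry) = {dry_side:.2f}")
# Hastenrath: most likely dry, P(dry or very dry) = 0.71
# FQ: most likely very dry, P(dry or very dry) = 0.66
```

Both indices put about two-thirds or more of the probability on the two driest quintiles. The climatological value for those two quintiles combined is 0.40.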

Based on the statistical results alone, our best estimate forecast is for dry conditions for the Hastenrath and FQ rainfall indices.

As in 1994-96, dynamical predictions of Northeast Brazil rainfall were also made using a version of the UKMO climate atmospheric general circulation model (AGCM) (see the article by Harrison et al., this issue). When forced with observed SST, the AGCM showed high skill in simulating the interannual variability of Northeast Brazil rainfall.

For 1997, the AGCM results consistently predict that the intertropical convergence zone in the west Atlantic will be displaced southward of its climatological position, leading to above-average rainfall in NE Brazil (see Harrison et al. for details).

In summary, the statistical and dynamical methods give very different predictions for 1997. The regression predictions and the discriminant analysis probability predictions are for dry or very dry conditions in the FQ, Hastenrath and Nobre rainfall regions, as a consequence of the strongly positive Atlantic predictor. The AGCM predictions, however, consistently indicate rainfall well above average in NE Brazil, possibly because of the warm SST anomalies near NE Brazil. The contrast between the statistical and dynamical predictions may reflect differing effects of large-scale and local SST anomalies in the tropical Atlantic.

The large differences between the statistical and dynamical predictions indicate considerable uncertainty in the seasonal rainfall forecast for 1997. Confidence in an overall forecast is very low, and no specific overall best-estimate forecast has been issued.

UPDATED (ZERO-LEAD) FORECASTS, INCLUDING FEBRUARY PREDICTOR DATA

Although zero-lead forecasts are not encouraged for this Bulletin, they appear here as auxiliary information accompanying the long-lead forecasts for the same targets. In February, the Atlantic predictor remained strongly positive while the Pacific predictor stayed near zero (see Figs. 2 and 3). The zero-lead statistical forecasts indicate quint 1 (very dry) conditions, drier than the 1-month lead forecasts.

Ward, M.N. and C.K. Folland, 1991: Prediction of seasonal rainfall in the North Nordeste of Brazil using eigenvectors of sea surface temperature. Int. J. Climatol., 11, 711-743.

Fig. 1. Locations of the stations used in the Hastenrath rainfall time series, and the Fortaleza and Quixeramobim stations. The Nobre rainfall time series is based on stations throughout the bounded region indicated.

Fig. 2. Amplitude time series of the Atlantic eigenvector for January 1990 to February 1997. Positive values (e.g. warm SST anomalies in the north tropical Atlantic and cool anomalies in the south tropical Atlantic) are associated with drier conditions in NE Brazil.

Fig. 3. Amplitude time series of the Pacific eigenvector for January 1990 to February 1997. Positive values (e.g. warm SST anomalies in the central-east equatorial Pacific and cool anomalies in the northwest and southwest Pacific) are associated with drier conditions in NE Brazil.
