***This file contains most of the text documentation from the comprehensive NDP-026C documentation file, ndp026c.pdf (which requires Adobe Acrobat Reader), without any of the tables and figures. For complete documentation, users should view ndp026c.pdf.***

PLEASE NOTE: On March 20, 2000 this page and the global cloudiness data set were updated to reflect the addition of 12 monthly files of ship reports for 1996. Previously, the ship data extended only through 1995.

__________________________________________________________________________

EXTENDED EDITED SYNOPTIC CLOUD REPORTS FROM SHIPS AND LAND STATIONS OVER THE GLOBE, 1952-1996

(Data Set Documentation)

August 1999

Carole J. Hahn
Department of Atmospheric Sciences
University of Arizona
Tucson, AZ 85721-0081

Stephen G. Warren
Department of Atmospheric Sciences
University of Washington
Seattle, WA 98195-1640

TABLE OF CONTENTS

ABSTRACT
1. INTRODUCTION
2. SOURCE DATA
   2.1. Some Problems Encountered in Source Data Sets
   2.2. List of Data Periods Known to be Missing
3. PROCESSING OF WEATHER REPORTS
   3.1. Cloud Information in Synoptic Reports and the "Extended" Cloud Code
   3.2. Processing Through the Total Cloud Stage
   3.3. Consistency Checks for Cloud Types, and the Change Code
   3.4. Corrected Values for Cloud Base Heights in 1994-96 Land Data
   3.5. The Amounts of Middle and High Clouds
   3.6. Determination of Cloudiness at Night
4. THE EXTENDED EDITED CLOUD REPORT AND THE DATA ARCHIVE
   4.1. Contents and Format of the EECR
   4.2. Organization of the Archive
5. COMMENTS ON USE OF THE DATA
   5.1. Computing the Average Cloud Amounts and Frequencies
      5.1.1. Total Cloud Cover
      5.1.2. Low Cloud Types
      5.1.3. Upper Level Clouds
      5.1.4. Middle-Level Amount-When-Present in China, 1971-79
   5.2. Avoiding the Night-detection Bias and Day-night Sampling Bias
   5.3. Avoiding the Clear-sky Bias and Sky-obscured Bias for Cloud-Type Frequency of Occurrence
      5.3.1. Adjustment Factors
      5.3.2. Land Stations That Do Not Report Cloud Types
      5.3.3. Grid Boxes with Ship Reports Unsuitable for Cloud-Type Analyses
      5.3.4. Computational Methods That Correct For These Biases
   5.4. Land Stations Usable for Analysis of Trends in Cloud Cover
      5.4.1. Canadian Stations That Changed Station ID Numbers
   5.5. Land Station Latitude-Longitude Switches
6. COUNT SUMMARIES
   6.1. Distribution of Cloud Reports over the 8 Synoptic Hours
   6.2. Distribution of Code Values
   6.3. Cases of Sky-obscured and Nimbostratus Cloud
   6.4. Distribution of Reports over the Globe
   6.5. Decline in USA Synoptic Cloud Reports Since 1982
7. HOW TO OBTAIN THE DATA
ACKNOWLEDGMENTS
REFERENCES

--------------------------------------------------------------------------------

ABSTRACT

Hahn, C.J., and S.G. Warren, 1999: Extended Edited Synoptic Cloud Reports from Ships and Land Stations Over the Globe, 1952-1996. ORNL/CDIAC-123, NDP-026C, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, TN. (Also available from Data Support Section, National Center for Atmospheric Research, Boulder, CO as DS 292.2)

Surface synoptic weather reports for the entire globe, gathered from various available data sets, were processed, edited, and rewritten to provide a single dataset of individual observations of clouds, spanning the 45 years 1952-1996 for ship data and the 26 years 1971-1996 for land station data. In addition to the cloud portion of the synoptic report, each edited report also includes the associated pressure, present weather, wind, air temperature, and dew point (and sea surface temperature over oceans). The cloud reports included in this "Extended Edited Cloud Report Archive" (EECRA) have passed through extensive quality control tests. Reports from the source data sets that did not meet the quality control standards were rejected for the EECRA. Minor correctable inconsistencies within reports were edited for consistency.
Cases of "sky obscured" were interpreted by reference to the present-weather code as to whether they indicated fog, rain, snow, or thunderstorm. Special coding was added to indicate probable nimbostratus clouds, which are not specifically coded for in the standard synoptic code. Any changes made to an original report are also noted in the archived edited report so that the original report can be reconstructed if desired. This "extended edited cloud report" also includes the amounts, either inferred or directly reported, of low, middle, and high clouds, both overlapped and non-overlapped amounts. The relative lunar illuminance and the solar zenith angle associated with each report are also given, as well as an indicator that tells whether our recommended illuminance criterion was satisfied so that the "night-detection bias" for clouds can be minimized.

The EECRA contains over 72 million cloud observations from ships and 311 million from land stations. Each report is 80 characters in length. The archive consists of 841 files of edited synoptic reports, one file for each month of data for land and ocean separately, and 4 ancillary files which provide important information about reporting characteristics of the land stations.

This data set will be useful for applications such as: (1) developing user-defined cloud climatologies for particular subtypes of clouds, or for different temporal and spatial resolution than we have chosen for our atlases, (2) comparing satellite cloud retrievals with surface observations, to help diagnose difficulties in cloud identification from satellite, and (3) relating the formation of individual types of clouds to their meteorological environments.

1. INTRODUCTION

Surface synoptic weather reports containing cloud information have been gathered for the entire globe from various available data sets and processed, edited, and rewritten to provide a single dataset of individual observations of clouds and other associated weather variables. The reports included span 45 years (1952-1996) for ships and 26 years (1971-1996) for land stations. These cloud data were originally gathered for development of our own cloud climatologies (Hahn et al., 1982; 1984; Warren et al., 1986; 1988; Hahn et al., 1994), and additional years of data have now been processed to accommodate our future climatological studies. As such, the reports in this "Extended Edited Cloud Report Archive" (EECRA) have passed through all the processing procedures for our cloud-climatology work up to the point of being ready to enter into averages. Therefore this archive should be useful to many other researchers for a variety of applications.

A preliminary version of the EECRA has been tested by a few users and some results have already been published (Norris 1998a, b; Norris et al., 1998). Some changes have been made to that preliminary version and all future users should obtain the official release presented here.

The EECRA is an update and extension of the "Edited Cloud Report Archive" (ECRA; Hahn et al., 1996) which covered the 10-year period 1982-1991. Much of the documentation herein is similar to the report for that data set but with additional sections required to discuss the extended format which accommodates the other weather variables, the extended time period which spans changes in data coding practices, and several ancillary files which provide important information about reporting characteristics for ships and land stations. Also, an updated version of our source data for ship observations (see Section 2) was used for the EECRA, resulting in a 34% increase in ocean data volume for 1982-91 over that of the ECRA.
The EECRA has several features that facilitate its use in cloud analyses:

1) Data sets of synoptic weather reports include reports that do not contain cloud information, such as those from automated weather stations on land and buoys in the oceans. These are excluded from the EECRA.

2) The cloud portion of the synoptic report occasionally contains obvious errors or inconsistencies which must be checked for to avoid inclusion of detectably erroneous data in an analysis. Quality control procedures which we have developed over years of analyzing surface cloud reports have been applied so that erroneous or inconsistent reports have either been excluded or, if possible, corrected before inclusion in the archive.

3) Although the amount of low cloud is coded directly into the synoptic report, the amounts of middle and high clouds are not, but they may often be inferred. Where possible for upper level clouds, the EECR (an individual report in the EECRA) includes the "actual" cloud amount (sometimes requiring use of the random-overlap assumption) as well as the non-overlapped amount, which is simply the amount actually seen from below.

4) Cases of "sky obscured" were interpreted by reference to the present-weather code as to whether they indicated fog, rain, snow, or thunderstorm. Special coding was added to indicate probable nimbostratus clouds which are not specifically coded for in the standard synoptic code. Any changes made to an original report are also noted in the edited report so that the original report can be reconstructed if desired.

5) Although all reports that meet the above criteria are included in the EECRA, many of the nighttime reports were made under conditions of insufficient illumination for adequate detection of clouds. Use of such reports results in an underestimate of nighttime cloudiness by about 4% globally and has a profound influence on computed diurnal cycles in cloudiness (Hahn et al., 1995). Reports made under conditions that satisfy the criterion for adequate illumination developed by Hahn et al. (1995) are flagged in the EECR, and both the relative lunar illuminance and the solar altitude are given for each report.

6) Synoptic weather reports contain information in addition to clouds, such as air temperature, pressure, winds, humidity, visibility, past weather and, for ships, sea surface temperature and ocean wave parameters. The EECR retains only the most common weather variables, thus reducing the data volume while allowing cloud characteristics to be assessed within the context of the surrounding meteorological environment. Since only reports with cloud information are retained here, many reports with temperature, pressure, etc., but no cloud data, were eliminated.

Non-standard terms used in the following discussions are defined in the glossary of terms and abbreviations in Appendix A.

CAUTION: The following sections should be given special attention for effective use of the cloud data in this archive:

- Avoiding the clear-sky and sky-obscured biases for cloud types, Sec. 5.3.
- Computation of cloud type frequencies and amount-when-present, Sec. 5.1.
- Amount-when-present of altostratus clouds for regions of China, 1971-79, Sec. 5.1.4.
- Illuminance criterion for avoiding the night-detection bias, Secs. 3.6, 5.2.

And of less critical nature are:

- Use of land stations in trend analyses, Sec. 5.4.
- Incorrect latitude and longitude for some land stations, Sec. 5.5.
- Note on declining number of USA stations since 1982, Sec. 6.5.
- Missing cloud heights in NCEP data beginning September 1994, Sec. 3.4.

2. SOURCE DATA

All source data were obtained from the Data Support Section, National Center for Atmospheric Research (NCAR), Boulder, Colorado. For land data, two data sets were processed. For 1971-1976 we used the "SPOT" archive of the Fleet Numerical Oceanography Center (FNOC).
For 1977-1996 we used data from the National Centers for Environmental Prediction (NCEP; formerly NMC). Reasons for not using land data prior to 1971 are given by Warren et al. (1986). About 315 million reports were processed. Only those stations assigned official station numbers by the World Meteorological Organization (WMO) were used. (See Section 6.5 for comments on characteristics of USA station reports.) Synoptic reporting hours are 00, 03, 06, 09, 12, 15, 18, 21 Greenwich Mean Time (GMT). The source data for 1971-1977 are sorted first by time and second by station number for each month of data. This sort is retained in the EECRA. In the NCEP archive for 1978-1996, the 6-hourly reports (00,06,12,18 GMT) are stored in separate files from the intermediate 3-hourly reports (03,09,15,21 GMT). Within each of these two groups the reports are sorted by time and station number. We processed the two groups in tandem for each month so that in the EECRA the 6-hourly reports occur first within a file, followed by the intermediate 3-hourly reports. The source we used for ship observations is the Comprehensive Ocean-Atmosphere Data Set (COADS; Woodruff et al., 1987; 1998). About 74 million reports were processed, excluding the large number of buoys and other weather reports that do not contain cloud information. (Prior to about 1980 less than 5% of the COADS reports fell into these two categories. During the 1980's this fraction began to increase and in January 1995 the fraction was 58%.) The Release 1 CMR5 data were used for 1952-1979, Release 1A LMRF6 data were used for 1980-1992, and a Release 1A Extension was used for 1993-1995. The Release 1A data are provided in a synoptic sort by time. For the EECRA, the data for each time were then sorted by latitude and longitude. Reports from Release 1 data had to be re-sorted to match this order. More current data were not yet available at the time of the present work. 
Data prior to 1952 are not included because cloud type information was considered to be unreliable prior to that time (Warren et al., 1988). [Cloud- type reporting actually did not stabilize until 1954, as shown in Figures 3 and 4 of London et al. (1991).] 2.1. Some Problems Encountered in Source Data Sets For those who might have occasion to access the source data sets used here, we can point out a few problems that we have encountered with them. Some of these will be discussed in more detail in later sections. Of the 71 months of SPOT data processed, 20% contained some erroneous station elevations of >5000 m or <-400 m. (Actual station elevations range from -350 m to 4877 m.) Investigation showed that these occurred in "bad blocks" of data which contained other errors as well, a consequence of the data format employed by FNOC. All reports from these blocks were excluded from the EECRA. Only one case of station elevation >5 km occurred in the NCEP data. That was in the 1977 data and was corrected for inclusion in the EECRA. Approximately 0.6% of the SPOT reports were duplicates; i.e. a second report from the same station at the same time. We retained these in the EECRA and did not check to see if the weather data in the duplicate reports were identical (sometimes such reports differ in only a single variable and sometimes one report contains cloud-type information and the other does not). The NCEP data had no duplicates until 1993, and after 1993 had less than 0.005% in the worst-case month. In the SPOT data set we frequently encountered stations (identified by their WMO station ID number) whose reported latitude or longitude changed one or more times within a month of data. These were not actual station moves, but round-off errors (latitude and longitude are given only to one decimal place in the SPOT data record) or data-entry errors. Some such latitude-longitude switches did occur in the NCEP data but they were rare. 
Since such errors can affect the assignment of reports to grid boxes for averaging, this problem is dealt with further in Section 5.5. From mid-September 1994 through (at least) April 1997, all cloud base heights are in error in the NCEP data. This was discovered by noticing that no height codes (see Table 1) in the range 4 to 8 occurred in the data reports during this period. (If unnoticed, this would imply quite a lowering of cloud base heights in recent years!) These erroneous heights resulted from a processing error at NCEP. They will not be corrected by NCEP in existing data but will probably be eliminated in future data releases (G. Walters, NCAR, personal communication, 1997). In the EECRA we have corrected these erroneous heights by reference to the SPOT data set for those years, as discussed in Section 3.4. Until September 1993, the NCEP data set included an identifiable class of reports from the United States airways reporting system. These "hourlies" reports, identified by call letters rather than by WMO station numbers, are eliminated from all of our cloud analyses because the reporting procedures for clouds in these reports do not use the WMO synoptic code and cannot be reconciled with it. Beginning in 1982 when NOAA began to close synoptic weather stations in the United States, airways data were "converted" into the synoptic format and included in the NCEP data set with corresponding WMO station numbers (D. Joseph, NCAR, personal communication, 1997). A flag to identify these reports as "converted hourlies" is included in the NCEP data record beginning in 1983, so we were able to eliminate them from the EECRA. We manually deleted 39 stations for 1982. See Section 6.5 for further discussion. The COADS data set is generally reliable. Some reports from the Arctic Ocean, which were included in the EECRA, contain unrealistically high air temperatures (>25 C) and similarly high sea surface temperatures. 
These are probably mis-located reports which should be excluded from Arctic cloud analyses. The present-weather indicator (Ix, Section 3.2) was not available in ship data from 1982 to about 1985, but a slot for it was added to the LMRF format in COADS Release 1A.

2.2. List of Data Periods Known to be Missing

Land station data for the entire globe are missing in the source datasets for the following times, so they are missing also in the Land EECRA.

______________________________________________________________________
            SPOT Data                              NCEP Data
Year  Month  Days    Hours,GMT         Year  Month  Days    Hours,GMT
----  -----  ----    ---------         ----  -----  ----    ---------
1971  Jan    26-31                     1977  Apr    28      06-21
      Feb    1-15                                   29      00-09
      May    4                                      30      06-15
      Jul    16-19                           Jul    5-6
1973  Aug    7-16                      1978  Apr    20      i3-hrly
1974  May    2-12                                   22-23   i3-hrly
      Aug    7-9                                    25-26   i3-hrly
      Nov    16-19                     1979  Nov    9-10    6-hrly
1975  Jan    18-23                     1981  Apr    22      i3-hrly
      Feb    1-5                       1982  Jul    4-10    i3-hrly
      Sep    19-30                     1984  Apr    15-19   i3-hrly
      Oct    10-16                     1986  Nov    2-8     i3-hrly
      Dec    12-25
1976  Nov    14                        1987  Oct    11-13   all but 00,12
______________________________________________________________________

3. PROCESSING OF WEATHER REPORTS

3.1. Cloud Information in Synoptic Reports and the "Extended" Cloud Code

Synoptic weather reports are coded according to the system given by the World Meteorological Organization (WMO, 1988). The information in these reports that relates to cloud analysis is summarized in Table 1. The other weather variables included in the EECR are described in Section 4.1. A more detailed breakdown of the definitions of the cloud and weather types, as used here, is given in Table 2. The table shows the synoptic codes that correspond to various precipitation types (ww codes) as well as the codes that correspond to the various cloud types defined within each of the three reporting levels: low, middle and high (CL, CM, CH).

We give special consideration to the cloud type nimbostratus (Ns), which is not specifically defined in the synoptic code.
Codes CM= 2 or 7 may signify Ns but may also signify As or Ac, respectively. We consider these codes to signify Ns when there is concurrent precipitation in the form of drizzle, rain, or snow as indicated in the present weather code ww (symbolized in Table 2 as D, R and S, respectively). To distinguish Ns from As/Ac we "extend" the synoptic code for CM to include the values 12 and 11 to represent these cases of CM= 2 and 7, respectively. The extended code values (shown in Table 2) are entered in the edited cloud report (Section 4) without loss of the information content in the original report. Nimbostratus is also considered to be present when the middle level is unreported (CM= /) and specified combinations of precipitation and low cloud types are present (Table 2). These cases are given the extended code CM= 10. This definition for Ns has been simplified from that used in our previous work (W86, W88). We no longer define cases of CM=/ with low stratus and drizzle to be Ns because such clouds are probably thin and do not extend up to the middle level. This will cause a slight reduction in computed Ns amounts in comparison to W86 and W88 (Section 6.3). Special consideration is also given to the case of N=9 (sky obscured). If ww indicates that the sky was obscured due to F, Ts, or DRS (symbols defined in Table 2), the cloud type is considered to be Fo, Cb, or Ns, respectively, and is given the extended code CL=11, CL=10, or CM=10, respectively, and the value of N is set to 8 oktas. All the changes described here are coded in a parameter called "the change code" (Section 3.3 below) which is also included in the edited cloud report (Section 4), so that the original report can be reconstructed if desired. 3.2. Processing Through the Total Cloud Stage A cloud report may be suitable for total cloud analysis even if cloud type information is missing or incomplete. 
Certain inconsistencies within the cloud- type portion of the report may, however, make the whole report suspect and cause us to reject it even for total cloud analysis. The processing and quality control checks performed on each weather report read from the original archive (FNOC, NCEP or COADS), and designed to ensure suitability for total cloud analysis, are shown in the flow chart in Figure 1. The percentage of reports discarded at each stage of the processing, discussed in the following paragraphs, is indicated. In the early stages of processing, land and ship reports required slightly different checks (upper portion of Figure 1). A land station that did not have a WMO station number was discarded (most of these were airways data from the United States, not originally reported in the synoptic code), thus ensuring more uniformity in reporting procedures. Any ship report known to be from a buoy (by the "deck" number in the COADS data) was discarded. Any report that had no cloud information (N=/) was discarded. In 1982 WMO introduced several changes in the coding procedure (WMO, 1988). One of these changes is that observers are now permitted to set ww=/ if present weather was either "not available" or "observed phenomena were not of significance" (ww codes 00-03 are considered to represent phenomena without significance). The present weather indicator, Ix, is used to distinguish these cases. Land station reports with Ix values of 4, 5 or 6 signify automatic weather stations and were discarded. Reports with Ix=3 (data not available) were also discarded because without ww it is not possible to interpret cases of N=9 (see W86) or to evaluate the occurrence of precipitation. Ix=2 indicates that observed phenomena were not of significance, while Ix is coded as "1" when ww is given. Occasionally Ix=1 when ww=/; these inconsistent reports were also discarded. 
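
The Ix screening just described for land stations can be sketched as follows. This is a minimal illustration in Python, not the processing code used for the archive. The function name `keep_land_report` and the use of -1 to stand for "/" are choices made for this sketch, and values of Ix not mentioned in the text are simply passed through, which is an assumption.

```python
MISSING = -1  # stands in for "/" in the original report (assumption of this sketch)

def keep_land_report(ix, ww):
    """Return True if a post-1981 land report passes the Ix screening.

    ix: present-weather indicator; ww: present-weather code or MISSING.
    Hypothetical helper; only the rules stated in the text are applied.
    """
    if ix in (4, 5, 6):
        # Automatic weather station: discarded.
        return False
    if ix == 3:
        # Present weather not available: N=9 and precipitation
        # cannot be interpreted, so the report is discarded.
        return False
    if ix == 1 and ww == MISSING:
        # Ix says ww is given, but ww is missing: inconsistent.
        return False
    # Ix=1 with ww given, or Ix=2 (phenomena not of significance).
    return True
```

For example, `keep_land_report(2, MISSING)` keeps a report whose ww was legitimately omitted, while `keep_land_report(1, MISSING)` rejects the inconsistent case.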
Examination of the NCEP data set showed that while land stations adopted this new coding procedure almost immediately, Ix was not consistently coded in ship reports until 1985. We therefore did not screen ship reports on the basis of Ix. Appendix F1 gives the relative occurrence of various present weather codes (where "-1" represents a "/" in the original report) over a sampling of years for both land and ship data, globally. For land, reports with ww=/ were excluded from the EECRA prior to 1982 and Ix inconsistencies were excluded from 1982 onward, so that the sum of the codes -1 to 3 in the EECRA remained within the range 71 to 74% (with 1995 at 76%, possibly because of the deterioration of surface reporting in the 1990's as discussed in Section 6.5). For ships, present weather was reported about 99% of the time from 1962 to 1981 and the sum of the codes -1 to 3 remained fairly constant at about 80% from 1962 to 1995. The large values for ww=-1 from 1952 to 1961 are a result of the inclusion, in those years, of data from the Historic Sea Surface Temperature Project (HSST) in which both present weather and cloud-type information were artificially deleted by the data processors (thus ww=-1 and CL=-1 in the EECRA). (The HSST data are retained in the EECRA for total cloud analyses but will not contribute to cloud-type analyses.) Because of the inability to perform consistency checks for Ix on the ship data, some reports which should ideally be eliminated are included in the EECRA. At the upper horizontal dashed line in Figure 1, 315 million land reports (1971-96) and 73.5 million ship reports (1952-95) remained. The discard-fractions given below this line are fractions of these numbers. If the reported latitude and longitude of a land station assigned the station into a grid box that is entirely water (0.05% of the reports) or if the reported location of a ship assigned the ship into a box that is entirely land (0.1%), the report was discarded. 
Stations on small islands in an otherwise ocean grid box are included in the land data, and reports from the Great Lakes and the Caspian Sea are included in the ship data (Sec. 5.3.2).

If the sky was obscured (N=9) by fog (ww=F; 1.1% land, 1.9% ship), thunderstorms-showers (ww=Ts, abbreviated as T in Figure 1; 0.05% land, 0.1% ship), or drizzle-rain-snow (ww=DRS, abbreviated here as R; 0.4% land, 0.6% ship), the sky was considered to be overcast (N=8). This source of "cloudiness" contributed about 1% to the total cloud cover globally, and much more in some locations and seasons (Hahn et al., 1992). Clouds could not be inferred if the sky was obscured for other reasons, such as blowing dust or snow, and such reports were discarded. The change code, IC=1 (discussed in Section 3.3 below), signifies that a report came through the N=9 branch of the processing. Thus 1.6% of the land reports and 2.6% of the ship reports had N=9 with the ww codes D, R, S, F or Ts.

For our earlier ocean cloud atlas (Warren et al., 1988), which included data only through 1981, we converted N=9 with ww=/ to N=8 for total cloud analyses. Less than 0.1% of the reports fell into this category. We do not make that conversion for the EECRA because after the 1982 code change a large fraction of the reports have ww=/.

Other consistency checks indicated in Figure 1 ensure that the low cloud amount is not greater than the total cloud cover, that precipitation (as defined in Table 2) is not reported with a clear sky, and that if cloud is present (and types are reported), a cloud type must be indicated in at least one of the three possible levels (this test actually discards a report if N>0 and CL=0 and CM<=0 and CH<=0). The re-coding indicated in the lower left box in Figure 1 is necessary only for post-1981 data because of the 1982 code change (WMO, 1988) that instructs observers to set CL=CM=CH=/ when N=0 (this requires special attention in cloud type analysis and will be discussed in Section 5).
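
The tests described above for the total cloud stage can be sketched as follows. This is a minimal illustration, not the code behind Figure 1. The function `total_cloud_stage`, the pre-classified weather category `wx` ("F", "Ts", "DRS" or None, standing in for the ww groupings of Table 2) and the `precip` flag are hypothetical. The order of the tests is an assumption, as is returning from the N=9 branch before the other checks. The last test is read here as N>0 with CL=0 and with CM and CH each 0 or missing; that reading is also an assumption.

```python
MISSING = -1  # stands in for "/" (assumption of this sketch)

def total_cloud_stage(n, nh, cl, cm, ch, wx=None, precip=False):
    """Return (N, CL, CM, IC) for a report kept for total cloud
    analysis, or None if the report is discarded."""
    if n == MISSING:
        return None                      # no cloud information
    if n == 9:                           # sky obscured
        if wx == "F":
            cl = 11                      # fog (Fo)
        elif wx == "Ts":
            cl = 10                      # cumulonimbus (Cb)
        elif wx == "DRS":
            cm = 10                      # nimbostratus (Ns)
        else:
            return None                  # e.g. blowing dust or snow
        return (8, cl, cm, 1)            # overcast, change code IC=1
    if nh != MISSING and nh > n:
        return None                      # low amount exceeds total cover
    if n == 0 and precip:
        return None                      # precipitation with clear sky
    if n > 0 and cl == 0 and cm <= 0 and ch <= 0:
        return None                      # cloud present but no type at any level
    return (n, cl, cm, 0)
```

A report with N=9 in fog, `total_cloud_stage(9, -1, -1, -1, -1, wx="F")`, comes out as overcast with the extended code CL=11 and IC=1, while the same report without a usable weather category is dropped.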
Prior to 1982, stations which normally report cloud types entered 0's for the cloud type variables when N=0, while stations which do not report cloud types entered /'s. These /'s are left intact in the pre-1982 reports so that these two types of stations can be distinguished, thereby avoiding the "clear-sky bias" introduced into the post- 1981 data. Methods for avoiding this bias in post-1981 data are described in Section 5.3. The number of reports that survive these tests and are suitable for total cloud analysis (referred to as "total reports", as opposed to "type reports") is 311 million for land and 71 million for ships. Of these, 227 million and 52 million, respectively, were made under sufficient solar or lunar illuminance (referred to as "light reports") to meet the established illuminance criterion for adequate cloud visibility (Hahn et al., 1995). 3.3. Consistency Checks for Cloud Types, and the Change Code The reports that failed the cloud type consistency checks shown in Figure 1 were discarded. Other inconsistencies are possible which may be correctable or may provide cause to reject the report for cloud type analysis while allowing it to contribute to total cloud analysis. As the synoptic reports were processed, any inconsistency encountered required a change to be made in the report before it was entered into the EECRA. Any changes thus made are noted by assigning a "change code" (IC) to that report. This change code, with values 0 to 9, is given in the EECR (Section 4) so that modifications made to the original report can be identified. The change codes are described briefly in Table 3 along with the frequency of occurrence of each change type. Details of the cloud type processing, which follows the total cloud stage shown in Figure 1, are presented in the form of the FORTRAN code in Table 4. Each segment of the table (delineated by a change code heading) describes the processing of a particular type of inconsistency or change. 
The changes referred to by IC=1 were discussed in section 3.2. Most of the inconsistencies under consideration have been discussed previously (W86, W88) but are summarized here. For a report to provide useful cloud type information, CL (and usually Nh) must be given. If CL is missing (and was not "corrected" by reference to ww in cases with N=9), then the report cannot be used for cloud type analysis and all cloud type variables are set to -1 for consistency (segment IC_5 in Table 4). There is one situation, the case in which there is middle cloud but no low cloud, shown in the IC_2,4 segment in Table 4, for which CL can be given with Nh missing. In that case Nh should give the middle cloud amount, but in many reports from China in the 1970's, Nh was improperly reported as 0 (W86; see also Section 5.1.4). If there was no high cloud then we can correct the report by setting Nh=N (IC=2). But if high cloud is present, then the value of Nh is indeterminate and set to -1. Here again IC is set to 2 (which differs from the handling in H96 which dealt only with post-1981 data) and the report can be used for determination of cloud type frequency but not amount. Because of these changes, it is possible for an EECR to show CL = 0 with Nh = -1. This special procedure is of little consequence in the EECRA except for the Chinese stations in the 1970's for which it was designed. (Compare fractions of CL=-1 and Nh=-1 in the tables of Appendix F.) The situation is similar in segment IC_3,4. If only high cloud is present, Nh should properly be 0 but is occasionally given the value of N by an observer. This is readily corrected (IC=3). On rare occasions 0 -1) are referred to as "type reports". 3.4. Corrected Values for Cloud Base Heights in 1994-96 Land Data As mentioned in Section 2.1, no cloud height codes in the range h= 4 - 8 occur in the NCEP data from mid-September 1994 through April 1997. 
(We discovered that somehow height codes 4 had been converted to 0, 5 to 1, 6 & 7 to 2, and 8 to 3.) These erroneous heights occur in the NCEP data set but not in the SPOT data. To correct this problem for the EECRA, we compared reports from the SPOT data set to reports from the NCEP data set, and inserted SPOT heights into the EECR when a match was found. (A match was considered to be made if the cloud portions of the two reports were otherwise identical for a station at a given time.) For any EECR that could not be matched to a SPOT report (NCEP data contain more reports than SPOT data), h was set to -1. In this way about 80% of the erroneous reports were corrected in the EECRA for the affected months. 3.5. The Amounts of Middle and High Clouds The synoptic code contains two cloud amount variables, N and Nh. The amount of low cloud, if present, is directly specified by Nh. While the amounts of upper level clouds (middle and high) are not specified directly, they may often be inferred. Thus when CL=0, the amount of middle cloud is given by Nh, and when CL=CM=0, the amount of high cloud is given by N. If clouds are present at all three levels, the upper cloud amounts cannot be determined from the report. If clouds are present at just two levels, the amount in the higher of the two levels may be estimated if the extent of overlap is assumed. The EECR provides amounts for which the random overlap assumption was used, where necessary, to estimate the actual cloud amounts (the fraction of the sky occupied by a cloud type, whether visible or not). The EECR also gives the non- overlapped amounts which require no assumptions but which represent only that portion of the upper level cloud visible from below. [Satellite-derived cloud amounts given by ISCCP (Rossow and Schiffer, 1991) are the non-overlapped amounts seen from above.] The random overlap assumption was justified for vertically separated cloud layers by Tian and Curry (1989). 
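
For the two-level case, the estimate can be sketched as below. The actual method is the FORTRAN code of Table 6, which is not reproduced in this file, so the equation used here is the usual random-overlap relation, N = Nh + Nu (1 - Nh/8) in oktas, taken as an assumption. The function name `upper_amount_two_levels` is hypothetical. The restriction to Nh<7 follows the discussion of Table 7 in this section; the special maximum-overlap treatment of Ns is not included.

```python
def upper_amount_two_levels(n, nh):
    """Amounts (oktas) of the higher of two cloud levels.

    n:  total cloud cover; nh: amount of the lower level.
    Returns (actual, non_overlapped); either may be None when
    it cannot be determined. Sketch only, see the assumptions above.
    """
    if n < 0 or nh < 0 or nh > n:
        return (None, None)              # missing or inconsistent input
    # Non-overlapped amount: what is visible from below, no assumption.
    non_overlapped = n - nh
    if nh >= 7:
        # Only 0 or 8 oktas could result; left undetermined.
        return (None, non_overlapped)
    # Random overlap: the upper layer covers the same fraction of the
    # sky behind the lower layer as it does in the gaps.
    actual = 8.0 * (n - nh) / (8 - nh)
    return (actual, non_overlapped)
```

With N=6 and Nh=4, the upper layer shows 2 oktas from below; under random overlap it is estimated to occupy 4 oktas of the sky, half of which is hidden by the lower layer.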
Table 6 gives our method, in the form of FORTRAN code, for determining the actual and non-overlapped amounts of middle and high clouds from a synoptic weather report. This table differs slightly from that in H96 in order to accommodate the possibility of Nh=-1 with CL=0 (lines 5-, 6+ ,24+, 35&). A few points should be noted. The random overlap equation (lines 17 and 38) is invoked only when Nh<7. Table 7, which gives the outcomes of the possible combinations of N and Nh in the equation, shows that only two outcomes are possible for the higher cloud amount when Nh=7, namely 0 and 8 oktas, making this a highly inaccurate determination (W86). In these cases we therefore leave the higher cloud amount undetermined. However, if the higher cloud is Ns, we assume maximum overlap and assign to Ns amount the value of N (lines 13-14, Table 6). In this case the nimbostratus cloud layer is likely to be adjacent to or continuous with the low cloud, so the maximum overlap assumption is more appropriate (Tian and Curry, 1989). Also, certain arbitrary decisions are sometimes required, such as our choice, in line 7 of Table 6, to allow middle cloud to be computed when CH=/. This choice is justifiable since such a case tends to occur with large N so that any error induced in this situation would be small. The numbers of reports processed through each path in Table 6 are listed in Table 8. Light reports (for which the illuminance criterion was met) and dark reports (for which it was not) are both shown for comparison. Non-overlapped (NOL) amounts were computable in more than 90% of the cloud-type reports since one can know that a cloud cannot be seen even if one does not know whether it is present. Thus the non-overlapped amount of an upper level cloud is frequently zero. 
Percentages are not explicitly shown in the table but it can be seen that upper level clouds are reported present more frequently in the set of light reports than in the set of dark reports (38% and 30%, respectively, for land middle clouds, and 40% and 25% for ocean middle clouds, for example). When upper clouds are present, they are more frequently computable within the set of dark reports and the random overlap assumption (ROL) is less often required. [This is an artifact of the lessened ability to distinguish different clouds under conditions of poor illumination and we recommend that dark reports not be used to develop cloud climatologies, as explained in the next section.] Upper level clouds, when reported present, are less likely to be computable in an ocean report than in a land report and are more likely to require ROL because low level clouds are nearly always present over the oceans. (The percentages given here merely represent the fractions of reports within the data set and are not area-weighted global averages.)

3.6. Determination of Cloudiness at Night

The ability of surface observers to adequately detect clouds at night has been questioned for many years (e.g. Riehl, 1947). In an attempt to find a practical solution to this "night-detection-bias", Hahn et al. (1995) analyzed ten years of nighttime data for the latitude band 0-50°N and plotted reported cloud cover as a function of the illumination due to moonlight, which depends on the phase and altitude (angle above or below the horizon) of the moon and on the distance of the moon from the earth. The total cloud cover reported at night increased as the lunar illuminance increased up to a certain threshold, then leveled off. This threshold value of moonlight is referred to as "the illuminance criterion" and can also be satisfied by the twilight produced by the sun at about 9 degrees below the horizon.
Thus the illuminance criterion is met when either the sun is at an altitude greater than -9° or the phase and altitude of the moon are such that its illuminance exceeds the threshold value of 0.11. These conditions were determined for each report using an ephemeris program together with the latitude, longitude, year, month, day, and time of the report.

This illuminance criterion was applied in analyses of total cloud cover and clear-sky frequency (Hahn et al., 1995). Application of the illuminance criterion increased the computed global average total cloud cover at night by about 4% and thus increased the daily average computed cloud cover by about 2%. Diurnal cycles of total cloud cover over land, which typically show daytime maxima, were reduced in amplitude when compared to previous studies which did not use the illuminance criterion (W86). Over the oceans, the increased computed nighttime cloud cover was often sufficient to result in nighttime maxima, in contrast to the daytime maxima previously reported (W88). Preliminary surveys conducted in conjunction with the present work suggest that we should expect similarly dramatic effects on analyses of middle and high clouds but little effect on low clouds.

Because of the importance of moonlight in the detection of clouds at night, parameters relating to the illuminance criterion are included in the edited cloud report (Section 4). Reports for which the illuminance criterion is met are referred to as "light reports", as opposed to "dark reports" (for which it is not met) or "all reports" (both light and dark).

4. THE EXTENDED EDITED CLOUD REPORT AND THE DATA ARCHIVE

4.1. Contents and Format of the EECR

Table 9 shows the variables included in the EECR, the number of characters allocated for each, and the maximum and minimum values allowed. Each item in the table is discussed briefly below.
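As a reading aid for the brightness-related items in the report, the illuminance criterion of Section 3.6 can be written as a simple test. This is a sketch under stated assumptions: the function and constant names are hypothetical, the inputs are taken to be the actual (already decoded) solar altitude in degrees and the relative lunar illuminance, and the treatment of a solar altitude of exactly -9 degrees is an assumption.

```python
SUN_ALTITUDE_LIMIT_DEG = -9.0  # twilight limit quoted in Section 3.6
LUNAR_ILLUMINANCE_THRESHOLD = 0.11  # relative lunar illuminance threshold


def is_light_report(sun_altitude_deg: float, rel_lunar_illuminance: float) -> bool:
    """True when the sun or the moon provides enough light to detect clouds."""
    # Whether the -9 degree boundary itself qualifies is an assumption here.
    sun_ok = sun_altitude_deg >= SUN_ALTITUDE_LIMIT_DEG
    moon_ok = rel_lunar_illuminance > LUNAR_ILLUMINANCE_THRESHOLD
    return sun_ok or moon_ok


def report_class(sun_altitude_deg: float, rel_lunar_illuminance: float) -> str:
    """Label a report 'light' or 'dark' in the sense used in the text."""
    return "light" if is_light_report(sun_altitude_deg, rel_lunar_illuminance) else "dark"


print(report_class(12.0, 0.0))    # daytime
print(report_class(-30.0, 0.25))  # night with a bright, high moon
print(report_class(-30.0, 0.02))  # night with little moonlight
```

In practice the IB flag carried in each report already records the outcome of this test, so a sketch like this is only needed if a different threshold is to be tried.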
Sample reports selected from ship and land data (mostly from December 1981 and January 1982 with a few from earlier years) are provided in Table 10. These reports are in the order in which they appear in their respective files (see next section) though these selections are not consecutive within the file. The reports are numbered in the table for convenience.

Item 1: The first item in the report gives the year, month, day and GMT hour of the report, with two characters allotted for each. There are no spaces ("3", for example, is given as "03") so that the entire item can be read as a single integer. Only the last two digits of the year (1900's) are given. Months are coded as 01 through 12, representing January through December.

Item 2: The IB variable ("B" for "brightness") indicates whether the illuminance criterion of Hahn et al. (1995) was satisfied (IB=1) at the time and place of the report or not (IB=0). This variable can be checked in lieu of SA and RI (items 19 and 20 below) if one accepts the criterion recommended by Hahn et al. (1995).

Item 3: The latitude is given in degrees to two decimal places and written as a 5-digit integer, so it must be divided by 100 to obtain the actual latitude. Actual values range from +90 to -90 for 90N to 90S, respectively. In land reports for 1971-76 the second decimal place is always "0".

Item 4: The longitude is given in degrees to two decimal places and written as a 5-digit integer, so it must be divided by 100 to obtain the actual longitude. Actual values range from 0 to 360E. In land reports for 1971-76 the second decimal place is always "0".

Item 5: For land stations, ID is the 5-digit WMO station number (WMO, 1977). For ships, ID makes use of only 4 digits, the first 3 of which contain the card deck number (Slutz et al., 1985), while the last digit is the "ship type" (Appendix G), which is provided in the COADS data report but is of questionable reliability.
Item 6: This parameter indicates whether a report is from a land station (LO=1) or a ship (LO=2).

Items 7-13: These weather and cloud variables are coded as specified by WMO (1988) except that items 11 and 12 have been "extended" as described in Section 3.1 and Table 2. Also, cases of N=9 (item 8) that were not discarded have been converted to N=8. Any such conversion is recorded in the "change code" (item 18 below). The value "-1" indicates missing data. Item 8 (N) never takes a value of -1 in this data set since all reports with N=/ were discarded during processing. Item 10 (h) may have a value of 9 only when a cloud is present since h was set to -1 in cases of N=0 (Figure 1).

Items 14-15: These variables give the "actual" cloud amounts of middle and high clouds, determined with use of the random overlap equation if necessary (Section 3.5). Values are given in oktas to two decimal places and written as 3-digit integers, so they must be divided by 100 to obtain the actual values. An actual value of 9 (coded value 900) indicates missing data.

Items 16-17: These variables give the "non-overlapped" amounts, in oktas, of middle and high clouds; i.e. the amounts visible from below (Section 3.5). A value of 9 indicates missing data.

Item 18: The change code indicates whether a change was made to the original report during processing. Code values are defined in Table 3. Two digits are allotted in the EECR for IC but the codes used here require only a single digit. A change code of 0 means that no change was made other than the trivial change of converting /'s to 0's in the case of N=0 in the post-1981 data. Examples of reports with each change code are provided in Table 10.

Items 19-20: These variables give the solar and lunar parameters needed to determine the illuminance provided by the sun or moon for the date, time and location of the report (Section 3.6).
SA is the altitude of the sun above the horizon, given to a tenth of a degree (divide the coded value by 10 to obtain the actual value). RI is the relative lunar illuminance defined by Hahn et al. (1995):

   RI = F sin(A) (R^2/r^2),

where A is the lunar altitude, r is the earth-moon distance, R is the mean earth-moon distance, and F is the lunar phase function which varies from 0 to 1 in a concave shape such that a half moon is only 8% as bright as a full moon (Hahn et al., 1995, Figure 1). The illuminance criterion of Hahn et al. (1995) is satisfied (IB=1, item 2) when SA ≥ -9° or RI > 0.11. A negative value of RI means the moon was below the horizon.

Items 21-28: These relate to weather variables other than clouds. Since these are not our main focus, we did not do an in-depth analysis of these variables but performed basic quality control checks. The COADS group (Slutz et al., 1985) did extensive quality control testing on the ship data. We merely blanked (set to our missing-value code) variables whose values were coded as "trimmed" from the COADS summary data. For the land data, we checked whatever quality indicators were available (these are different in the SPOT and NCEP data sets), blanked those indicated to be "bad" or "inappropriate" (e.g. pressure given for an elevation other than sea level), and blanked variables outside the ranges shown in Table 9. Examples of a variety of cases are provided in Table 10. A few comments about the individual variables follow:

Item 21: SLP. About 95% of the SPOT pressures were indicated to be for sea level. It was very rare for any of these to be out of range. The NCEP quality mark was usually missing and about 5% of the values were out of range.

Item 22: WS. NCEP wind speeds are given in knots and we converted them to m/s for the EECRA. SPOT data contained both knots and m/s (about 50% each) and were handled according to an indicator. In the land data, a very few cases of wind speed > 99.9 were converted to 99.9.
(There are only 231 such cases in the entire data set, most of which occur in reports from the early 1970's. An incorrect indicator in a SPOT report could cause the interpreted speed to be either twice or half the true value.) The COADS CMR5 data (1952-79) gave wind as u and v (east and north components) which we converted to speed and direction.

Item 23: WD. Wind directions (the direction, in degrees, 0 to 359, from which the wind blows) were originally given to 10's of degrees although ship data taken from the COADS CMR5 will show values to 1° in the EECRA because of conversion from the u,v components. For land and pre-1980 ship data WD is given as 0 when WS is 0. COADS LMRF included a code of 361 ("calm") for wind direction with wind speeds of 0. We retained this. However we converted COADS code 362 ("variable") to -1 for the EECRA.

Item 24: AT. Air temperatures were given to 0.1° Celsius in COADS. The SPOT data gave temperatures only to whole degrees; fractional degrees appear in NCEP data beginning in 1982.

Item 25: DD. Dew point depression was given as the moisture variable in NCEP and COADS CMR5 data. Dew point temperatures were given in SPOT and COADS LMRF data and were converted to depression values for the EECRA. A few DD values >70 were converted to 70 for the EECRA.

Item 26: EL, SST. For land data this variable gives station elevation (see Section 2.1) to whole meters. For ship data this variable gives sea surface temperature to tenths of a degree.

Item 27: IW. For ship data IW=1 means that wind speed was measured and IW=0 means it was estimated or the method was unknown. For land data IW means something different and is less useful. A value of "1" means that the "wind quality mark" (in NCEP source data) signifies that wind data should be retained, while "0" signifies that wind data should not be used (and are blanked in the EECRA). A value of "9" means that the indicator itself is "missing". This is by far the most common case.
SPOT data do not have this indicator and EECRA values for IW are always "9" for land data prior to 1977.

Item 28: IP, IH. For ship data IH=1 means that cloud base height was measured and IH=0 means it was estimated. IH=9 means the method was unknown. For land no similar variable was available so we make use of the available space by giving the quality flag used to evaluate SLP in item 21, calling it "IP". For SPOT data the value is usually "1" while for NCEP data it is usually "9". A value of "0" is associated with SLP=-1. (Items 27 and 28 could have been omitted in favor of a 4-digit value for the year.)

4.2. Organization of the Archive

The EECR data are divided into 853 files, one for each month for 26 years of land observations (Jan 1971 to Dec 1996; 312 files) and 45 years (plus one month) of ocean observations (Dec 1951 to Dec 1996; 541 files). Within each month the reports are sorted first by time, and then by station number for land and by latitude and longitude for the ocean, as described in Section 2. File sizes range from 30 to 93 MB for land data and from 5 to 15 MB for ocean data. In addition there are 4 ancillary files: XSTATY, YSTATY, XSTALL, and LLFR5C which are described in Sections 5.3 - 5.5 below.

5. COMMENTS ON USE OF THE DATA

Based on our experience and the experience of other pre-release users of the EECRA, we think the following comments are essential.

5.1. Computing the Average Cloud Amounts and Frequencies

The determination of frequencies of occurrence and average cloud amounts from surface observations requires special considerations to avoid several potential biases and to obtain representative values. Upper-level clouds present special problems because they are sometimes partially or completely hidden from the view of the observer by lower clouds. These issues are discussed in detail in W86 and W88 but will be summarized here.

5.1.1.
Total Cloud Cover

Total cloud cover is basically the sum of the values of N in the synoptic code (converted to percent if desired) divided by the number of contributing reports. However, to avoid the day-night sampling bias, which arises because more observations are made in the daytime than at night, some method of equalizing the contribution of reports between day and night is necessary. This is discussed in Section 5.2 below.

5.1.2. Low Cloud Types

Of the 227 million light reports suitable for total cloud analysis for land (Figure 1), 223 million have cloud type information (Table 5). For the ocean these numbers are 52 million and 44 million. In the type reports, the amount of a low cloud type (if present) is always given in the Nh variable of the report. The average amount for a particular low cloud type can be obtained, in a manner similar to that for total cloud amount, by summing the Nh values when the type is present and dividing by the number of contributing reports (using the precautions against the day-night bias discussed above). The contributing reports consist of those with CL ≥ 0 and include reports of N=0 (see Section 5.3 below for avoiding the potential clear-sky and sky-obscured biases).

An alternative, but equivalent, method for obtaining the average amount for a low cloud type is to compute the frequency of occurrence (f) of the type (the number of occurrences of the type divided by the number of contributing reports) and the amount-when-present (awp; sum of Nh's divided by the number of occurrences of the type) separately. Then the average cloud amount is:

   amt = f x awp.

This method is described because it is often of interest to know the frequency of occurrence of a type in addition to its amount, because awp tends to be characteristic of a cloud type, and also because this is the method used to compute upper level cloud type amounts.

5.1.3.
Upper Level Clouds

Cloud type reports do not always contain information about upper level clouds because these clouds may be hidden by an overcast or near-overcast layer of lower clouds. Thus, of the 223 million light-type reports for land (Table 8), only 188 million have information about the middle cloud level (CM ≥ 0) and 152 million have information about the high level (CH ≥ 0). Of the 44 million light-type reports for the oceans, 34 million have CM ≥ 0 and 27 million have CH ≥ 0. The average amounts of upper level cloud types are obtained as described in the last section: amt = f x awp.

When reporting cloud-type frequencies, it is important to state how the frequencies were computed. The term "frequency of occurrence" has sometimes been used by other authors when what was actually computed was "frequency of sightings". To illustrate the difference, consider three surface weather report segments:

   report number    N    CH
         1          4     1
         2          8     /
         3          3     0

In these three reports high cloud (Ci) was seen once, so one could say that the frequency of "sighting" is 33%. But in report #2 one does not know whether Ci was present. So only reports 1 & 3 contain information about high clouds. Therefore the frequency of occurrence is 50%. Statistically, this amounts to assuming that the frequency of occurrence is the same when the high level is visible through gaps in lower clouds as when it is not.

For land, 84% of the light type reports contribute to computation of frequencies at the middle level, and 68% at the high level. For the oceans these values are 77% and 61%. The degree to which these portions of the data set represent the whole data set for types was studied in W82 and is discussed in W86 and W88.
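The distinction between the two frequencies in the three-report example can be checked with a short calculation. This is an illustrative Python sketch: the function name is hypothetical, and "/" is represented by -1 as in the EECRA.

```python
from typing import Iterable, Optional, Tuple

MISSING = -1  # EECRA code for "/" (level not reported or not visible)


def high_cloud_frequencies(ch_codes: Iterable[int]) -> Tuple[float, Optional[float]]:
    """Return (frequency of sighting, frequency of occurrence) for high cloud.

    ch_codes: CH values from a set of type reports, with -1 for "/".
    """
    codes = list(ch_codes)
    if not codes:
        raise ValueError("at least one report is required")

    sightings = sum(1 for ch in codes if ch > 0)     # high cloud seen
    informative = sum(1 for ch in codes if ch >= 0)  # high level reported

    freq_sighting = sightings / len(codes)
    # Only reports that say something about the high level contribute.
    freq_occurrence = sightings / informative if informative else None
    return freq_sighting, freq_occurrence


# CH values of reports 1, 2 and 3 in the example above.
sighting, occurrence = high_cloud_frequencies([1, MISSING, 0])
print(f"sighting: {sighting:.0%}, occurrence: {occurrence:.0%}")
```

The only difference between the two results is the denominator: all reports for "sighting", and only reports with CH ≥ 0 for "occurrence".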
Based on a study of the frequency of occurrence of As/Ac [f(As,Ac)] versus low cloud amount, W88 applied an adjustment to f(As,Ac) which assigned to the cases of CM=/ (15.5% of the type reports for land and 24.5% for ocean) a value that is the average of f(As,Ac) of the reports that have low cloud amounts of 3 to 7 oktas. For high clouds, f was computed only from reports with Nh<7 in order to reduce the partial-undercast bias (W88).

Since we want the actual frequency of occurrence of a cloud type, f is computed as the number of times the type was observed divided by the number of reports of CU ≥ 0 (where CU represents either CM or CH as appropriate). The amount-when-present of an upper cloud type can be determined, when it is reported present (CU>0), only if there are at most two cloud levels present. Furthermore, we do not compute amounts for an upper cloud if it is undercast by a layer that covers 7 oktas or more of the sky (Section 3.5). Therefore awp is computed from an even smaller pool of data than that used for frequency. Table 8 shows that, for land, 77% of the observed (light) occurrences of middle clouds and 74% of the observed occurrences of high clouds are computable. For the ocean data these values are 62% and 46%. Nevertheless, awp computed from these data is probably fairly representative of the actual awp because awp is less variable than f (W86, W88).

Any systematic error inherent in the random-overlap assumption would produce a smaller error in computed amounts since this assumption is used for only a fraction of the computable observations. Table 8 shows that over land the random-overlap assumption is used in 39% of the computable observations (light) for middle cloud and in 55% for high cloud. These fractions are larger for ocean data (66% and 72%).

A special consideration applies to Ns.
Because Ns is defined on the basis of the occurrence of precipitation (Table 2) which does not depend on the visibility of the middle cloud level for its detection, its presence or absence is known for every type report. Thus the number of contributing reports for f(Ns) is the same as that for low cloud types (CL ≥ 0 and Nh ≥ 0). However, when present, its amount is not always known and a separate tally (which will be different from that for the As/Ac clouds) must be kept for determining its awp.

5.1.4. Middle-Level Amount-When-Present in China, 1971-79

In the 1970's many Chinese stations reported Nh=0 with CM>0 and CL=0, resulting in unrepresentative awp for middle clouds. This is a violation of the synoptic coding rules which state that when there is no low cloud present, Nh is given the value for middle cloud amount, as was discussed in Section 3.3. We discovered this violation in our earlier work (W86) and dealt with it by using awp computed only from 1980-81 data for the affected grid boxes, because the coding rules were correctly adhered to in those two years. (In that work we observed that interannual variations of awp are generally smaller than those of frequency. Nevertheless, because of this problem, any trends in middle cloud amount computed for these boxes for the 1970's will depend solely on variations in frequency.) Appendix B shows the fifty 5c grid boxes affected. Now that more years of data are available, awp can be computed for those boxes for 1980-1996. In the EECRA, land station reports that had CL=0, CM>0, CH>0, and Nh=0 are assigned the values Nh=-1 and IC=2 (Section 3.3), so although they cannot be used for awp, they can still be used for frequency. Of the light-type reports in January 1971, for example, 0.8% fall into this category.

5.2. Avoiding the Night-detection Bias and Day-night Sampling Bias

About 55% of all observations, for both land and ocean, are made between 0600 and 1800 local time, producing a potential day-night sampling bias.
The night-detection bias is largely eliminated by using only data for which the illuminance criterion is met (Section 3.6). This, however, enhances the day-night sampling bias unless precautions are taken since only about 42% of the observations between 1800 and 0600 local time qualify as "light" (about 98% of the observations from 0600 to 1800 are "light"). In W86, averages were obtained by first forming averages for each of the 8 synoptic hours and then averaging these 8 numbers. For oceans, data are less plentiful and the 3-hourly times often do not have a sufficient number of reports to obtain a statistically reliable average. Hahn et al. (1994) classified each observation into one of two 12-hour periods, 0600-1800 local time ("day") and 1800-0600 local time ("night"), formed two separate averages, and then averaged these two numbers.

Note that when using only the light reports (to avoid the night-detection bias) to form monthly averages, only about two weeks of data (surrounding full moon) will contribute to the nighttime average in any single month. Due to this "monthly-sampling error" there will be more scatter in monthly averages from year to year, but multi-year averages should become more statistically representative of climatological means as the number of contributing years is increased. Similarly, seasonal averages should be more representative of an individual season than monthly averages are of an individual month.

These considerations of the day-night bias, night-detection bias, and monthly-sampling error apply equally to total cloud analyses and cloud type analyses. However, for fog and precipitation, whose detection does not depend on illumination, all observations may be used.

5.3. Avoiding the Clear-sky Bias and Sky-obscured Bias for Cloud-Type Frequency of Occurrence

The clear-sky bias affects only the computation of the frequency of occurrence of cloud types.
It does not affect the amount-when-present of the types, nor does it affect the total cloud cover or the frequency of occurrence of fog or precipitation. The clear-sky bias is a potential consequence of the changes to the synoptic code in 1982. To illustrate, consider the potential for the clear sky bias to affect the computation of the frequency of occurrence of low cloud types in land station reports. The frequency of occurrence of a low cloud, e.g. cumulus, is the number of times the cloud type was reported present (CL=1 or 2) divided by the number of times the low cloud level was given in the report (CL ≥ 0). If a particular station never reports cloud types (call it an "abstaining" station), then CL for that station would always (prior to 1982) be "/" (translated as "-1" in the EECRA). This causes no problem. However, beginning in 1982 all observers were instructed to record CL=/ whenever N=0. Therefore, since 1982 we can no longer count CL=/ with N=0 as a non-report of the low level from an abstaining station; it is most likely a report of clear sky from a conscientious observing station that always reports the low level, and so must be treated as CL=0. But this treatment causes us to use low-cloud reports from the abstaining stations only when the sky is clear, producing the "clear-sky bias" which would cause the computed frequencies of all cloud types to be too low.

A similar argument can be made for the case of N=9, which is a consequence of the way we must handle sky-obscured cases as discussed in Section 3.1, and not a consequence of the 1982 code changes, so the sky-obscured bias applies to all years. The result is that an abstaining station would contribute to cloud-type analyses only when N=9, producing the "sky-obscured bias" which would increase the computed frequencies of sky-obscured by fog and precipitation and cause the computed frequencies of other cloud types to be too low. Both of these biases may occur simultaneously.
We will first exemplify the magnitude of these potential biases, and then show several methods for avoiding them.

5.3.1. Adjustment Factors

A correction factor was derived in the documentation to our earlier data set (the ECRA, H96). The "clear-sky adjustment factor" (AF0) was defined to convert a "raw" frequency of a particular cloud type, Fr, into an "adjusted" frequency, Fa, as

   Fa = AF0 * Fr

where Fr = Nt / Nr (see Section 5.3.4 for definitions of Nt and Nr). H96 showed that subtracting from Nr the reports of N=0 that came from "abstaining" stations leads to

   Fa = Nt / (Nr - fb * N0)     (1)

or

   AF0 = 1 / (1 - fb * f0)

where f0 is the frequency of N=0 among the reports with cloud-type information (f0 = N0 / Nr, with N0 the number of such N=0 reports), and fb is the fraction of the N=0 reports that came from abstaining stations. We can estimate fb by determining the fraction of times cloud types are unreported (CL=/) when clouds are known to be present (1 ≤ N ≤ 8). The value of AF0 is equal to one if either fb or f0 is zero; i.e. if cloud type information is always given or if the sky is never clear. The value of fb is greater for ship data than for land data but f0 is less for ship data, so that, globally, AF0 is 1.003 for ships and 1.007 for land (H96).

Similarly, the "sky-obscured adjustment factor" can be derived:

   AF9 = 1 / (1 - fb * f9)

where f9 is the frequency of occurrence of N=9 among the reports with cloud-type information. Correction for the sky-obscured bias is more complicated for the cloud type nimbostratus because many cases of N=9 convert to Ns, and corresponding portions of N=9 would have to be removed from Nt as well for this case. Both fb and f9 are smaller for land data than for ship data, and the global averages of AF9 are 1.003 for ships and 1.0003 for land (H96).
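The relation between equation (1) and the adjustment factor can be verified numerically. The Python sketch below is illustrative only: the function names and the counts are hypothetical, Nt is taken to be the number of reports in which the type was present and Nr the number of contributing reports (an assumption, since their definitions are given in Section 5.3.4), and the nimbostratus complication for AF9 is not handled.

```python
def adjustment_factor(fb: float, f_clear_or_obscured: float) -> float:
    """AF = 1 / (1 - fb * f); use f0 for AF0 or f9 for AF9."""
    denominator = 1.0 - fb * f_clear_or_obscured
    if denominator <= 0.0:
        raise ValueError("fb * f must be less than 1")
    return 1.0 / denominator


def adjusted_frequency(nt: int, nr: int, n0: int, fb: float) -> float:
    """Equation (1): Fa = Nt / (Nr - fb * N0)."""
    return nt / (nr - fb * n0)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
nt, nr, n0, fb = 200, 1000, 100, 0.5

raw = nt / nr                 # Fr = Nt / Nr
f0 = n0 / nr                  # frequency of N=0 among type reports
af0 = adjustment_factor(fb, f0)

# Both routes give the same adjusted frequency.
print(round(af0 * raw, 6), round(adjusted_frequency(nt, nr, n0, fb), 6))
```

Dividing the numerator and denominator of equation (1) by Nr gives Fa = Fr / (1 - fb * f0), which is why the two printed values agree.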
It turns out that these adjustment factors are very nearly equal to 1 (no adjustment) for most regions of the globe (see Figure 5a in H96 for land and Figure 2 in this report for ocean), although there are some regions of the globe where the biases are large (and may vary from year to year or season to season or day to night). For land one can essentially eliminate the bias by identifying and excluding the stations that do not report cloud types, while for the ocean one can minimize their effects by excluding the most severely affected grid boxes. These options are described in the following two sections.

5.3.2. Land Stations That Do Not Report Cloud Types

During processing of the original weather reports in the preparation of the EECRA, information was saved from which a list of land stations that normally do not report cloud types could be prepared. Eliminating these stations from cloud type analyses will essentially eliminate the clear-sky bias and the sky-obscured bias in the 1982-96 data as described above. We identified 939 such stations. To emphasize the importance of this, Table 11a lists the 234 "worst offenders" of those 939 stations. If not removed from cloud type analyses, each of these 234 stations would contaminate more than 9 of the 15 years of data. The table also indicates 23 Canadian stations which could be used for cloud types if analyses were restricted to years prior to 1992. [Several Canadian stations (e.g. 71707 listed in Table 11b) stopped reporting cloud types in 1993.]

Selection criteria. During processing of the EECRA, the value of fb (the fraction of reports of CL=/ with N>0) was computed for each station for each month of data. Then fb was examined for all Januarys and Julys for 1982-96 for each station. About 200 stations never report cloud types and always show fb=100%. However, many stations sometimes report cloud types. Any station that had fb ≥ 20% in any year-month was considered to "fail" for that year-month.
Stations with fb<20% were considered to "pass" for that year-month because occasionally the cloud-type portion of the report may be missing even for stations that normally report cloud types (over 1500 stations "failed" at least once). Furthermore, some stations "pass" in some year-months and "fail" in others. Any station that "failed" in at least one fourth of the year-months in which it contained cloud data, for either January or July, was put on the list of stations to exclude for cloud-type analyses.

Some statistics. While 939 stations appear on the list of stations to be excluded for cloud types based on the criteria just described, many of the stations actually have data only for a year or two. Over 10,000 different stations were encountered at least once in the 1982-96 data, but only about 7000 stations appear in any single month. In any single month about 3.3% of the stations fail the fb < 20% test. Note that while the clear-sky bias is not at issue in the pre-1982 data because reports of CL=/ with N=0 can be eliminated in the cloud-type processing simply by excluding all reports with CL=/ (Section 3.2), the same is not true for the sky-obscured bias which is a consequence of the cloud-type processing (Section 5.3 above). Thus it would be useful to eliminate these 939 stations from the pre-1982 data as well. [Also, in pre-1982 data we found that the occurrence of CL=/ in reports of N=0 tends to be smaller than the occurrence of CL=/ in reports with 1 ≤ N ≤ 8 (fb), suggesting that there is a residual, though small, clear-sky bias in the pre-1982 data.]

Ancillary file (XSTATY). An ancillary file is supplied with this archive that lists the 939 stations which should be excluded in cloud type analyses. The format of this file is given in Table 11b, along with selected data records to exemplify the contents of the file. (See also Section 5.4 below.)

5.3.3.
Grid Boxes with Ship Reports Unsuitable for Cloud-Type Analyses Because we cannot identify individual ships, as we did Land Stations, to find those that routinely do not report cloud types, we must either make adjustments to the frequency calculations (Section 5.3.1 above and Section 5.3.4 below) or possibly exclude the most offending ocean boxes to reduce the corresponding biases in the post-1981 ship data. What makes this latter option possible is that these adjustment factors are very nearly equal to 1 (no adjustment) for most regions of the globe, while there are limited regions of the globe where the bias is large. This was evident in Figure 5b in H96 which showed AF0 on a 5c grid for ocean boxes. Biases tended to be large in lakes and semi- enclosed seas. Figure 2a here is a reproduction of Figure 5b from H96 but with boxes of large AF0 values removed. The values plotted on the map are (AF0-1) x 100 for clarity of presentation, so that a 2, for example, on the map means 1.02. Figure 2b is a similar 5c grid map but for AF9 values with the same boxes removed. Selection criteria. Table 11c lists 28 grid boxes which should be excluded for cloud type analyses from ship data. The table indicates the general location of the boxes, the fraction of the box that is ocean (lakes are part of "land"), and it gives fb (the fraction of N=0 or N=9 reports that contribute to a bias), and computed values of AF0 and AF9. Boxes were listed if they had AF0 or AF9 1.2 or if either adjustment factor was greater than 1.1 with fb > 0.5. (Not all Caspian Sea boxes were analyzed but all 6 boxes in that area are on the exclusion list.) Because references to regions of the globe that should be omitted for cloud-type analyses from ship observations are given as "B5c numbers" (see Appendix A), it is necessary to have a means of converting latitude and longitude, which are given in the cloud report, to B5c as shown in Table 11c. 
FORTRAN subroutines provided in Appendix C convert latitude and longitude to B5c numbers and vice versa.

There is one class of ship data, the HSST data described in Section 3.2, that can be identified by the card-deck numbers 150-156 (Table 9). These data contain no cloud-type information (CL=-1 in the EECRA) but make up a significant portion of the ship data from 1952 to 1961. Present weather is also missing (ww=-1) in these data and so the few reports that originally had N=9 have been excluded from the EECRA (Figure 1). The HSST data produce no bias in cloud-type analyses if they are excluded either on the basis of CL=-1 or by card-deck number.

Consequences of the bias. Figure 2b shows that only 2 of 1473 remaining boxes have AF9 greater than 1.07, with values of 1.13 and 1.12 for boxes along the east coast of North America. Since clear-sky frequencies are generally quite small in the open ocean (W88), AF0 values are close to 1 (map values near 0) except in some coastal regions and inland seas such as the Mediterranean where the maximum value of 1.14 is obtained as an annual average (Figure 2a). The examples below show typical values of amount-when-present (awp; not affected by these biases), the "raw" frequency of occurrence (Fr), and amount (amt = Fr x awp) for stratus (St) and cirrus (Ci) cloud types in the Mediterranean region (W88) and show the adjusted frequency (Fa) and corresponding amount for AF0 values of 1.02 and 1.10.

Examples of Possible Influence of AF0 on Cloud Type Amounts
___________________________________________________________
        Typical Mediterranean   with AF0=1.02   with AF0=1.10
Type    awp%    Fr     amt%     Fa      amt%    Fa     amt%
St      60      0.10   6.0      0.102   6.1     0.11   6.6
Ci      36      0.25   9.0      0.255   9.2     0.28   9.9
___________________________________________________________

Thus an amount of 6.0% for stratus cloud would become 6.1% after adjustment with AF0=1.02, or would become 6.6% after adjustment with AF0=1.10.
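The arithmetic in the table can be checked with a short sketch. It assumes, as the tabulated values imply, that the adjusted frequency is Fa = AF0 x Fr and that the amount is the frequency times the amount-when-present. The function name adjusted_amount is illustrative only and is not part of the archive software.

```python
def adjusted_amount(awp_percent, raw_frequency, adjustment_factor):
    """Return (Fa, amount in percent) for one cloud type.

    Assumption (inferred from the table, not quoted from equation 1):
    Fa = AF * Fr and amt = Fa * awp.
    """
    adjusted_frequency = raw_frequency * adjustment_factor
    return adjusted_frequency, adjusted_frequency * awp_percent


# Stratus in the Mediterranean example: awp = 60%, Fr = 0.10.
# AF0 = 1.00 reproduces the unadjusted ("raw") amount.
for af0 in (1.00, 1.02, 1.10):
    fa, amt = adjusted_amount(60.0, 0.10, af0)
    print(f"AF0={af0:.2f}  Fa={fa:.3f}  amt={amt:.1f}%")
```

Passing awp=36 and Fr=0.25 instead gives the corresponding values for the second row of the table.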
Similarly, a cirrus amount of 9.0 becomes 9.2 or 9.9 after these two adjustments, respectively. For many applications, such discrepancies may be acceptable or even within the limits of accuracy inherent in the data. Diurnal variations should not be affected by this bias.

Ancillary file (LLFR5C). Supplied with this archive is the ancillary file LLFR5C that lists, for the 1820 boxes of the 5c grid, the associated latitude and longitude of the box center, the land fraction, and a variable which indicates whether the box is pure land, pure ocean, contains a large lake, contains a small island, or is otherwise both land and ocean. The format of this file is given in Appendix D1. The variable LOB is negative for the 28 boxes listed in Table 11c for exclusion from cloud-type analyses of ship data. A map of this variable on the 5c grid is provided in Appendix D2.

5.3.4. Computational Methods That Correct For These Biases

Norris (1998b, appendix), while analyzing frequencies of low cloud types from a preliminary version of the EECRA, introduced a scheme that essentially incorporates the adjustment factors discussed in Section 5.3.1 into a routine computational method. The method, though developed for low cloud types, will work as well for middle and high clouds. The method is also general so it can be applied to unbiased data as well.
Expressing the adjusted frequency of occurrence of a cloud type t (Fat; from Norris, 1998b) in terms of counts of cloud variables gives

     Fat = (Nt / NL) * [1 - {(Nc + No) / NA}]                 (2)

where Nt is the number of observations reporting the cloud type t, NL is the number of observations that report cloud types for the level (CL, CM, or CH) of the cloud type t when the sky is not clear or obscured, Nc is the total number of observations with clear sky (N=0), No is the total number of observations with obscured sky (N=9; in the EECRA N=9 is always associated with fog or precipitation), and NA is the total number of cloud observations (including those with CL=/).

Note that if the sky is always clear or obscured, NL=0 and the equation fails. Usually this would simply mean that Fat=0 and one would have to decide, by reference to NA, whether to include that value in one's climatology. But if one considers a report of N=9 with precipitation to be the cloud type Ns (Table 2), then one must treat the report as counting towards Nt (for Ns) as well as N=9. Thus, if the sky is always reported as obscured by precipitation corresponding to Ns, then Fat=100%. In such cases it is not possible to know whether the reports are biased. Such a situation should be rare except when averaging very few observations.

Expressing equation 1 (Section 5.3.1), and also incorporating AF9, for the adjusted frequency of occurrence of a cloud type t in terms of counts of cloud variables gives

     Fat = Nt / [Nr - (NTX / NN) * (N0 + N9)]                 (3)

where, for the additional variables, Nr is the number of observations that report cloud types for the level (CL, CM, or CH) of the cloud type t including reports of clear-sky and sky-obscured, NTX is the number of observations in which CL=/ (with 1≤N≤8), NN is the number of observations with 1≤N≤8, N0 is the number of observations with N=0 in the reports with CL≠/, and N9 is the number of observations with N=9 and precipitation or fog in the reports with CL≠/.
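As a sketch of how equations 2 and 3 might be evaluated from tallies, the functions below take the counts defined above as arguments. The function names and the sample tallies are invented for illustration and do not come from the archive. The sample assumes the post-1981 situation in which Nc=N0, No=N9, NL=NN-NTX, and Nr=NL+N0+N9. Returning None for an undefined result is a choice made here, leaving the decision about such cases to the user.

```python
def fat_eq2(nt, nl, nc, no, na):
    """Equation 2: Fat = (Nt / NL) * [1 - (Nc + No) / NA]."""
    if nl == 0 or na == 0:
        return None  # undefined; the caller must decide how to treat it
    return (nt / nl) * (1.0 - (nc + no) / na)


def fat_eq3(nt, nr, ntx, nn, n0, n9):
    """Equation 3: Fat = Nt / [Nr - (NTX / NN) * (N0 + N9)]."""
    # With no reports of 1<=N<=8 the correction term is dropped.
    fb = ntx / nn if nn > 0 else 0.0
    denominator = nr - fb * (n0 + n9)
    if denominator <= 0:
        return None
    return nt / denominator


# Invented tallies for one grid box (hypothetical, for illustration only).
nn, ntx = 800, 80      # reports with 1<=N<=8, and those among them with CL=/
n0, n9 = 150, 50       # clear-sky and sky-obscured reports
nl = nn - ntx          # type-reporting reports, sky neither clear nor obscured
nr = nl + n0 + n9      # type-reporting reports including clear and obscured
na = nn + n0 + n9      # all cloud observations
nt = 180               # reports of the cloud type of interest

print(fat_eq2(nt, nl, n0, n9, na))
print(fat_eq3(nt, nr, ntx, nn, n0, n9))
```

For these tallies both calls give the same value, about 0.2, compared with an unadjusted Nt/Nr of about 0.196.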
This representation is mathematically equivalent to equation 2; differences lie merely in the quantities that are counted and how they are manipulated. If the sky is always clear or obscured, then NN=0 and one can choose to ignore the right-hand term in the denominator, leaving Fat = Nt / Nr (which is equivalent to Fat = Nt / NA of equation 2 in such cases).

In the EECRA all reports with N=9 were converted to N=8 and IC=1 (Section 3.2). Thus to count "N=9", one counts occurrences of IC=1.

For land data, these correctional computations can be eliminated by excluding (for cloud-type analyses, not for total cloud analyses unless one wants a uniform data set for both analyses) the 939 stations listed in the ancillary file XSTATY. For ship data one might omit the correctional computations if analyzing only the expanses of the ocean where AF0 and AF9 are close to 1 (Figure 2). But even if the adjustment equations are employed, we recommend excluding ship data in the regions of the 28 grid boxes listed in Table 11c and shown in Appendix D2 because the biases are so large.

5.4. Land Stations Usable for Analysis of Trends in Cloud Cover

Over 12,000 different land station IDs appear at least once in the 1971-96 data. However, only about 6000 stations report routinely throughout most of the 26-year period. To compute trends in cloud amounts it is desirable to have a long period of record. When computing trends for grid boxes it is important to have the same stations contributing in each year; otherwise spurious interannual variability could appear. We have therefore prepared an ancillary file that provides information from which a user can select stations with the most complete time series for computing trends in cloud cover.

Ancillary file YSTATY. We first prepared a list of all stations that had any cloud type information in light obs (reports made under conditions meeting our illuminance criteria) for the 1982-96 period.
The 939 stations determined to be unsuitable for cloud type analysis (Section 5.3.2) were removed from the list. (There are a few additional stations appearing in the 1971-81 data that report total cloud only, but such stations report CL=/ in those years and do not contribute to the bias.) Associated with each station on the list is the number of Januarys and the number of Julys that had at least 20 such reports, the sum of Januarys and Julys with between 1 and 20 such reports, and the sum of Januarys and Julys that failed the fb<20% selection criterion for cloud types (Section 5.3.2). [The criterion of 20 observations in a month allows a station that routinely reports only once a day to contribute to an analysis but excludes station numbers that appear only spuriously (perhaps due to transcription errors in the station ID). Most land stations, however, report 4 or 8 times a day (W86).] This list became the ancillary file YSTATY. The format is given in Table 12, along with a sample of data records.

Using the file. To help clarify the meaning of the quantities given in the file, we will examine a few of the sample data records in Table 12. Station 01001 has at least 20 light-type reports for all 26 years of the period of record for both January and July. By contrast, for station 43268 there are no reports for January (ms=2) and fewer than 20 reports for one July. Between these extremes, station 44215 has 20 or more reports for 16 Januarys and 15 Julys, has fewer than 20 for 3 year-months (January or July), and has one year-month that failed the fb<20% test (Section 5.3.2). The maximum value that any station can attain for the variable mx from the 1982-96 data is 6 because if more than 3 Januarys or 3 Julys had failed the fb test (Section 5.3.2), the station was entered in the XSTATY file and excluded from this one. Thus the situation with station 06220 is interesting.
It gave no reports for 1982-96 but did give reports for 8 years in the 1971-81 period, most of which are unsuitable for cloud type analysis. Similarly, for station 54523, because of the exclusion criteria used in Section 5.3.2, the single year-month under the mx column must come from the 1971-81 period. Stations such as these two can be excluded from use in trend analyses by selection on m1 and m2 or on mx, although they are acceptable for other analyses (long-term averages, diurnal cycles) because they do not contain the biases of the post-81 data.

The YSTATY ancillary file could be used instead of the XSTATY file (Table 11b) to select stations for cloud type analyses. (The choice depends on the application. For example, when selecting stations for trend analysis using YSTATY, there is no point in also consulting XSTATY. YSTATY could also be used to exclude the few additional, pre-1982, abstaining stations which are not among the 939 listed in XSTATY.)

In this file the locations of stations are given by their B2c numbers. FORTRAN subroutines for converting the B2c numbers to latitude and longitude and vice versa are provided in Appendix C.

Some statistics. The statistics shown in Table 12 indicate that, of the total of 11,586 stations listed in the YSTATY file, 5838 stations have 20 or more light-type obs for 15 or more Januarys or 15 or more Julys. Since these are the stations that will be most desirable to use, Appendix E is provided to show the global distribution of these stations. Stricter criteria of completeness of the period of record than shown here could probably be applied with little loss of geographical coverage (but see Section 6.5 for loss of coverage in recent years).

5.4.1. Canadian Stations That Changed Station ID Numbers

Between June and July 1977, Canadian stations whose 5-digit WMO Station ID numbers began with 72 or 74 were changed to 71 for the first two digits, leaving the last three digits unchanged.
These cases are exemplified in Table 13a. This will cause time series from individual station numbers to appear shorter than in reality for the station. Table 13b shows examples (taken from ancillary file YSTATY) of corresponding new and old station numbers. For example, station 7x926 under List B had 7 Januarys and 6 Julys under the number 72926 and 19 Januarys and 20 Julys under the number 71926, accounting for the 26 years in the land data set. Because of these changes, a station such as 7x072 is not shown on the map in Appendix E even though it actually reported 21 years of "good" Januarys (at least 20 light-type reports per month). With these changes, Canadian stations all now have the prefix 71, while 72 and 74 are reserved for the United States stations. This situation will require some attention when selecting stations for trend analyses.

5.5. Land Station Latitude-Longitude Switches

Occasionally an individual station appeared in the source data with different reported latitude-longitude values within a single month. These tended to be clerical errors or round-off discrepancies rather than real station moves. (We did not examine the data for the purpose of determining which, if any, stations actually moved.) In fact, in some cases the latitude-longitude changed several times in a month by switching back and forth between two sets of values. As a consequence, when using latitude and longitude to assign station reports to grid boxes, some reports for a single station will be included in one grid box while other reports for that station are included in another box for a single month. This can adversely affect interannual variations and trend values (and, to a lesser extent, diurnal cycles and long-term means) computed for affected grid boxes.

Examples are given below showing some reports for each of 4 stations whose latitude-longitude switches resulted in changes in the assigned B2c.
Station elevation and air temperature are included in the examples because they can sometimes aid in determining a problem with the report. Example 1 shows an omitted minus sign for the latitude of a Southern Hemisphere station. In the EECRA we give longitude as values 0° to 360° East, while in the SPOT data longitude is given as -180° to +180°, west-to-east, and in the NMC data as 0° to 360° West. The switch seen in Example 2 could result from improper interpretation or transcription of longitude. Latitude and longitude are given to two decimal places in the NMC data but to only one place in the SPOT data. Example 3 shows round-off discrepancies in SPOT data. It is difficult to explain the situation depicted in Example 4 but such seemingly illogical, multiple switching is common among the stations that show switching.

_______________________________________________
Example 1   Antarctica      -77.90 314.02 89045
-----------------------------------------------
 B2c  YrMnDyHr    Lat     Lon    Id    Elev  temp
  77  81112000   77.90  314.02  89045   243 -14.0
7241  81112012  -77.90  314.02  89045   243 -13.0
_______________________________________________

_______________________________________________
Example 2   Great Britain    51.75 358.42 03649
-----------------------------------------------
 B2c  YrMnDyHr    Lat     Lon    Id    Elev  temp
 694  72041206   51.7     1.4   03649    86   5.0
 694  72041209   51.7     1.4   03649    86   7.0
 765  72041212   51.8   358.4   03649    86  10.0
 765  72041215   51.8   358.4   03649    86  12.0
_______________________________________________

_______________________________________________
Example 3   Jordan           29.48  34.98 40341
-----------------------------------------------
 B2c  YrMnDyHr    Lat     Lon    Id    Elev  temp
1931  72011112   29.4    34.9   40341     3  18.0
1932  72011206   29.5    35.0   40341     2  13.0
1932  72011209   29.5    35.0   40341     2  17.0
1931  72011606   29.4    34.9   40341     3  11.0
1932  72011706   29.5    35.0   40341     2  12.0
_______________________________________________

_______________________________________________
Example 4   USSR             62.08 126.70 24758
-----------------------------------------------
 B2c  YrMnDyHr    Lat     Lon    Id    Elev  temp
 431  72012603   62.1   126.7   24758   244   4.0
 431  72012609   62.1   126.7   24758   244   1.0
 359  72012703   62.5   127.7   24758  9000 -45.0
 359  72012709   62.5   127.7   24758  9000 -43.0
 431  72012715   62.1   126.7   24758   244 -49.0
_______________________________________________

Of the 312 months of land data included in the EECRA, 59 months contain at least one station that switches from one B2c to another. The months involved and the number of stations involved for each month are listed in Table 14a. A total of 649 different stations are affected, some in more than one month. The majority of the switches occur in the years 1971-1976, the years for which the SPOT data set is the source.

An ancillary file (XSTALL) is supplied with this archive that lists the 649 stations, a recommended latitude-longitude to be assigned to each station, obtained from station reports in months that showed no switching, and the year-months during which each station switched. The format of this file is given in Table 14b.

6. COUNT SUMMARIES

6.1. Distribution of Cloud Reports over the 8 Synoptic Hours

Figure 1 and Tables 4, 5 and 8 showed the number of reports processed, deleted and changed, as well as the number of light reports, the number suitable for cloud type analysis, and the number of times upper level cloud amounts were computable. Table 15 shows how the reports are distributed over the synoptic reporting times. Land stations usually report 8 times per day but some do not (notably in the United States and Australia), so that 59% of all reports are made during the 6-hourly times. Ships, however, usually report 4 times per day so only about 11% of the ship reports are for the intermediate 3-hourly times.

Having only 4 reports per day, rather than 8, limits the resolution possible in computations of the phase and amplitude of the diurnal cycle.
It was also noted (W88) that regional averages formed from 6-hourly ship data may be systematically different from averages formed from intermediate 3-hourly data, consistent with a tendency for some ships to give a 3-hourly report only in unusually stormy weather. During the course of the present work we noticed that over the United States, where reports are usually made 6-hourly, the occurrence of N=9 averaged about 7% in January 1978 for the 6-hourly reports but 19% for the intermediate 3-hourly reports. A bias is also possible when averaging over a land grid box that has more than one station if stations within one climatic region report 8 times per day while stations within a different climatic region report 4 times per day.

6.2. Distribution of Code Values

The histograms in Figures 3a for land and 3b for ocean show the frequency of occurrence of the extended code values for the six cloud variables for light reports in the archive of edited cloud reports. [In these figures N=9 is shown separately from N=8 although N=9 is converted to N=8 (with IC=1) in the EECR.] The shaded areas show the occurrence of precipitation (DRSTs, Table 2). Numerical values for the data shown in these figures are provided in Appendix F2. Several interesting features are evident in these figures.

The distribution of codes for total cloud cover N is nearly U-shaped for land but strongly skewed towards the higher amounts for oceans. More than 96% of all precipitation occurs with N≥7 over land and with N≥6 over oceans. About 75% of precipitation occurs with Nh≥6.

The most commonly occurring cloud base height code is h=5 (600-1000m) over land but h=4 (300-600m) over oceans. The large frequency of h=9 over land is a consequence of the large frequency of CL=0 so that h=9 often refers to the middle cloud level.

The lower panels in Figures 3a and 3b show the occurrences of various cloud types within the three reporting levels.
The reports with CL=-1 are not usable for cloud type analysis (2.0% for land data, 14.0% for ocean data). The occurrence of Nh=-1 is slightly greater than CL=-1 (underlined in Appendix F2) because of the processing procedure that allows CL to contribute to frequency determinations when Nh is not available for amounts under some circumstances when only middle clouds are present (Section 5.1.4). Larger fractions of the reports have CM=-1 (17.3% for land, 34.2% for ocean) and CH=-1 (33.1% for land, 47.5% for ocean) because of lower overcast. Thus 98% of the land reports have information about cloud types but only 82% of those have information about the middle cloud level and 66% about high clouds. For the oceans, 86% of the reports have low cloud information but only 60% of those have middle cloud information and 45% have high cloud information. The low cloud type most commonly reported over land is stratocumulus (CL=5). While this type is also common over the oceans, it is exceeded by the cumulus types CL= 1 and 2. About 25% of all precipitation occurs with the stratus cloud CL=7. When CL=7 is reported over land, precipitation is present in 66% of the reports. Precipitation occurs in 34% of the ship reports of CL=7. While 58% (land) and 46% (ocean) of all precipitation occurs with the middle clouds defined to be nimbostratus (CM= 10,11,12), 24% and 37%, respectively, of precipitation occurs when the middle cloud level is not given in the EECR (typically because of low overcast). Because of our definition of Ns shown in Table 2, most of these latter cases must have ww= D or Ts (drizzle, thunderstorms or showers). In the high cloud level, 90% of all precipitation occurs in reports with CH=-1 (high cloud level not reported, usually because of lower overcast). Warren et al. (1988) showed histograms similar to those in Figure 3 for ten spans of years for ship data from 1930 to 1979. 
Such figures give clues to changes in coding practices throughout the years (and also incorporate possible climate changes and may be influenced by shifts in shipping routes or land station locations). The tables in Appendices F3 through F7 provide numerical histograms of the cloud codes for periods in the 1990s, 1980s, 1970s, 1960s and 1950s from the EECRA.

6.3. Cases of Sky-obscured and Nimbostratus Cloud

[Numerical values for this discussion are taken from H96 because some of the values presented were not re-evaluated using the EECRA. Appendix F4 is similar to, though not identical to, the comparable table, Appendix A, in H96 for the 1982 to 1991 data. Values cited differ somewhat from decade to decade.]

The occurrence of reports of sky-obscured (N=9) due to fog or precipitation (defined in Table 2) is about 1.5% for land and 3.5% for ocean, with fog (CL=11) accounting for more than two thirds of these values for both land and ocean. These cases of sky-obscured due to fog make up 14% (land) and 48% (ocean) of reported cases of fog (in light reports). Other (non-obscuring) reports of fog are thin fog or fog at a distance but not at the station. Cases of thunderstorms or showers (CL=10) account for only about 3.5% of the reports of sky obscured, and sky-obscured due to thunderstorms or showers make up only 1.6% (land) and 5.8% (ocean) of the light reports of thunderstorms and showers. The remaining contribution to the reports of sky obscured (about 25%) is due to drizzle, rain or snow. Sky-obscured due to drizzle, rain or snow make up 5.1% (land) and 12.9% (ocean) of the light reports of drizzle, rain and snow.

Table 16 shows the contributions of the three major paths to the frequency of Ns as defined in the EECRA (Table 2). (Frequencies here are based on light-type reports and are slightly higher than the frequencies quoted in the last paragraph which were based on the total set of light reports.)
The largest contributor to Ns in the land data is the path through CM=2,7 (with ww=DRS). For both land and ocean CM=2 is far more important than CM=7 (see CM=11,12 in Appendix F4). The largest contributor to Ns in the ocean data comes through the CM=/ path, which has several contributors itself, the largest being the case of CL=7 with DRS. In the EECRA, we classify reports of sky-obscured due to drizzle, rain or snow as nimbostratus cloud and assign the extended code value CM=10 with IC=1 (Tables 2 and 4). With information provided in the EECR, the user is free to choose any definition of Ns. However, sky-obscuring drizzle is rare, so its assignment has almost no effect on Ns climatology. Excluded from the definition of Ns are the cases of CM=/ with CL=4,5,6,8 and ww=D (more than half of these cases have CL=6). These cases were considered to indicate Ns in our previous climatologies (W86, W88) but after subsequent consideration and discussions with colleagues we concluded that, since drizzle could occur from these low cloud types, the additional inference of Ns above them was inappropriate. Thus the frequencies of occurrence of Ns computed under the current definition will be reduced to about 97% (land) and 90% (ocean) of the frequencies given in W86 and W88. Another change associated with the simplification of our previous definition of Ns involves cases of CL=6,7 with DRS and CM other than 2,7,/. The CL=6,7 in these cases were previously reassigned as Ns, but are left unchanged in the present, simplified definition. This results in a further reduction in computed Ns frequency by factors comparable to those in the last paragraph. Thus the Ns frequencies computed under the present definition may be about 94% (land) and 81% (ocean) of those computed under the previous definition. 
Note that these percentages refer to the number of reports in the data set which contains a disproportionate contribution of reports from the densely populated northern mid-latitudes and thus do not represent the area-weighted global averages. Note also that the user of this data set is not restricted to the definitions assigned here since all the information necessary for any other interpretation is contained in the edited cloud reports.

Note that cases of CM=/ and ww=DRS are not considered to indicate nimbostratus cloud when CL=1,2,3,9 (cumulus or cumulonimbus).

6.4. Distribution of Reports over the Globe

To show the global distribution of the reports, numbers (shown as log10) of light-type reports are displayed on a 10c grid (see Glossary in Appendix A) in Figures 4a (land) and 4b (ocean). Numbers from 1 to 9 appear as 0, numbers from 10 to 99 appear as 1, etc. Grid boxes with no light-type reports are blank.

6.5. Decline in USA Synoptic Cloud Reports Since 1982

In Section 2.1 we noted that when NOAA began to close synoptic weather stations in 1982, NCEP began to convert airways hourly data into the synoptic format and included these converted hourlies under the associated WMO station number for the affected stations. Airways hourly cloud reports are given as clear, scattered, broken, overcast, and obscured. These words were converted to N= 0, 3, 6, 8, 9, respectively. Cloud types are also not given in accordance with WMO definitions. These reports were excluded from the EECRA, resulting in a continual decrease in the number of contributing USA stations. Around 1995 many stations converted to the Automatic Surface Observing System (ASOS) which provides no cloud data except for the base height of clouds below 12,000 feet.
With the increasing automation of weather stations, synoptic cloud observations became voluntary (and some that were made did not get into the available data archives), and by December 1996 the number of USA stations providing at least 10 synoptic cloud reports for the month had dropped to 27. Figure 5 shows the time sequence of the decreasing number of USA stations that provide synoptic cloud data. Clearly, the "Cloud Hole Over the United States?" (Warren et al., 1991) has become a reality. The United States of America no longer contributes sufficient surface-based cloud observations suitable for future climatic analyses.

To ascertain the degree to which other countries have switched to automatic reporting systems in recent years, we computed the ratio of reports contributing for a single season (September-October-November, SON) in 1996 to those contributing in 1979 for grid boxes over the globe. These ratios are given on a 10c grid in Appendix H. Some locations, mostly north of 50°N, have ratios near 0.5. Some grid boxes with only islands (not shown) and one grid box in Antarctica, which may contain only one station, have ratios less than 0.1. At this resolution, the only countries in which synoptic cloud reporting has apparently declined to 20% or less are the United States and New Zealand. In many locations there is an increase in reporting since 1979. Beginning in October 1994, a large number of additional (secondary) stations appeared in the NCEP data set for Australia, accounting for the exceptionally large ratios in that country.

7. HOW TO OBTAIN THE DATA

This documentation and the data described herein are available from:

     Carbon Dioxide Information Analysis Center
     Oak Ridge National Laboratory
     Post Office Box 2008
     Oak Ridge, TN 37831-6335, U.S.A.
     Telephone (423) 574-3645
     (http://cdiac.esd.ornl.gov/)

as NDP-026C or

     Data Support Section
     National Center for Atmospheric Research
     Boulder, CO 80307, U.S.A.
     Telephone (303) 497-1215

as DS 292.2.
The following citation should be used for referencing this archive and/or this documentation report:

Hahn, C.J., and S.G. Warren, 1999: Extended Edited Synoptic Cloud Reports from Ships and Land Stations Over the Globe, 1952-1996. ORNL/CDIAC-123, NDP-026C, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, U.S. Dept. of Energy, Oak Ridge, Tennessee. (Also available from Data Support Section, National Center for Atmospheric Research, Boulder, CO.)

The archives of our earlier climatologies (Hahn et al., 1988; Hahn et al., 1994) and the accompanying atlases (Warren et al., 1986, 1988) are also available from the same sources listed above.

ACKNOWLEDGMENTS

Joel Norris and Steve Klein provided valuable assistance in the early stages of production of this data set. They helped to define the most useful additional quantities, other than clouds, to include in the extended report; they spent many hours running our preliminary programs to produce an early version of the EECRA for the ocean, 1952-1992; and they tested the data set in climatic analyses. We thank Gregg Walters, Dennis Joseph, and Roy Jenne for helpful discussions about land data. We give special acknowledgment to Julius London who initiated the cloud climatology project in 1980 and provided guidance for many years.

This work was supported by NSF Climate Dynamics (Geosystems Database Infrastructure Program), and NOAA Climate Change Data and Detection Program, under grant ATM-95-10170, and by a computing grant from NCAR.

REFERENCES

Hahn, C.J., S.G. Warren, and J. London, 1992: The use of COADS ship observations in cloud climatologies. Proceedings of the International COADS Workshop, H.F. Diaz, K. Wolter, and S.D. Woodruff, Eds., NOAA/ERL, Boulder, CO, 271-280.
Hahn, C.J., S.G. Warren, and J. London, 1994: Climatological Data for Clouds Over the Globe from Surface Observations, 1982-1991: The Total Cloud Edition. NDP-026A, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge, TN. (Also available from Data Support Section, National Center for Atmospheric Research, Boulder, CO.)
Hahn, C.J., S.G. Warren and J. London, 1995: The effect of moonlight on observation of cloud cover at night, and application to cloud climatology. J. Climate, 8, 1429-1446.
Hahn, C.J., S.G. Warren, and J. London, 1996 (H96): Edited Synoptic Cloud Reports from Ships and Land Stations over the Globe, 1982-1991. Numerical Data Package NDP-026B, 47 pp. [Available from Carbon Dioxide Information Analysis Center, MS-050, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831-6050.]
Hahn, C.J., S.G. Warren, J. London, R.M. Chervin and R. Jenne, 1982: Atlas of Simultaneous Occurrence of Different Cloud Types over the Ocean. NCAR Technical Note TN-201+STR, 212 pp. National Center for Atmospheric Research, Boulder, CO.
Hahn, C.J., S.G. Warren, J. London, R.M. Chervin and R. Jenne, 1984: Atlas of Simultaneous Occurrence of Different Cloud Types over Land. NCAR Technical Note TN-241+STR, 216 pp. National Center for Atmospheric Research, Boulder, CO.
Hahn, C.J., S.G. Warren, J. London, and R.L. Jenne, 1988: Climatological Data for Clouds Over the Globe from Surface Observations. NDP-026, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge, TN. (Also available from Data Support Section, National Center for Atmospheric Research, Boulder, CO.)
London, J., S.G. Warren, and C.J. Hahn, 1991: Thirty-year trend of observed greenhouse clouds over the tropical oceans. Adv. Space Res., 11(3), 45-49.
Norris, J.R., 1998a: Low cloud type over the ocean from surface observations, Part I: Relationship to surface meteorology and the vertical distribution of temperature and moisture. J. Climate, 11, 369-382.
Norris, J.R., 1998b: Low cloud type over the ocean from surface observations, Part II: Geographical and seasonal variations. J. Climate, 11, 383-403.
Norris, J.R., Y. Zhang, and J.M. Wallace, 1998: Role of low clouds in summertime atmosphere-ocean interactions over the North Pacific. J. Climate, 11, 2482-2490.
Riehl, H., 1947: Diurnal variation of cloudiness over the subtropical Atlantic Ocean. Bull. Amer. Meteor. Soc., 28, 37-40.
Rossow, W.B. and R.A. Schiffer, 1991: ISCCP cloud data products. Bull. Amer. Meteor. Soc., 72, 2-20.
Slutz, R.J., S.J. Lubker, J.D. Hiscox, S.D. Woodruff, R.L. Jenne, D.H. Joseph, P.M. Steurer and J.D. Elms, 1985: Comprehensive Ocean-Atmosphere Data Set; Release 1. NOAA Environmental Research Laboratories, Boulder, Colo., 268 pp. (NTIS PB86-105723).
Tian, L. and J.A. Curry, 1989: Cloud overlap statistics. J. Geophys. Res., 94, 9925-9935.
Warren, S.G., C.J. Hahn, J. London, R.M. Chervin and R.L. Jenne, 1986 (W86): Global Distribution of Total Cloud Cover and Cloud Type Amounts over Land. NCAR Technical Note TN-273+STR, Boulder, CO, 29 pp. + 200 maps (also DOE/ER/60085-H1).
Warren, S.G., C.J. Hahn, J. London, R.M. Chervin and R.L. Jenne, 1988 (W88): Global Distribution of Total Cloud Cover and Cloud Type Amounts over the Ocean. NCAR Technical Note TN-317+STR, Boulder, CO, 42 pp. + 170 maps (also DOE/ER-0406).
Warren, S.G., J. London and C.J. Hahn, 1991: Cloud hole over the United States? Bull. Amer. Meteor. Soc., 72, 237-238.
Woodruff, S.D., R.J. Slutz, R.L. Jenne and P.M. Steurer, 1987: A comprehensive ocean-atmosphere data set. Bull. Amer. Meteor. Soc., 68, 1239-1250.
Woodruff, S.D., H.F. Diaz, J.D. Elms, and S.J. Worley, 1998: COADS Release 2 data and metadata enhancements for improvements of marine surface flux fields. Phys. Chem. Earth, 23 (in press).
World Meteorological Organization, 1977: Weather Reporting/Messages Meteorologiques, Volume A, Stations. (WMO Publ. No. 9), WMO, Geneva.
World Meteorological Organization, 1988: Manual on Codes, Volume 1. (WMO Publ. No. 306), WMO, Geneva.