                              - July 2007 -

fvs.hlp    VERSION 2007.07    Keith F. Brill    keith.brill@noaa.gov

WELCOME TO fvs -- the EMC/HPC forecast verification system

The fvs script performs one or more of four functions.  Each function is
triggered by the presence of a single character within the single character
string given as the only command-line input.  If no input is given, this
help information is printed.  The system controlled by fvs consists of a
combination of unix C-shell scripts and fortran programs.

Here is a summary of the actions caused by individual characters found in
the command-line input to fvs:

    Character   Action

    h           Prints out this help information
    v           Prints out version number and change summary
    u           Starts user input for search conditions
    s           Starts the search software for display output
    c           Starts the search software for VSDB output
    p           Starts the graphical display software

Entering husp will perform all four functions in that order.  Rearranging
the input characters will not alter the order in which the functions are
performed.

A second input may be given on the fvs command line.  If the first input is
not h, the second input may be any character or string of characters.  In
this case, the presence of the second input triggers fvs to reset the path
that points to the database; fvs will request input from the user to reset
the VSDB_DATA environment variable.

Entering fvs h followed by one of the parameters listed below gives specific
help information on that item.  Most of them are "fvs p" mode parameters.

    adline   - graph mode parm        panel    - graph mode parm
    border   - graph mode parm        parsevgf - PARSEVGF help file
    c_codes  - fvs compute codes      pbs      - General PBS prob file
    clear    - graph mode parm        ptype    - graph mode parm
    colors   - graph mode parm        reflin   - graph mode parm
    ctl_edit - YL's edit trace.ctl    revers   - graph mode parm
    device   - graph mode parm        rstrcs   - graph mode parm
    factor   - graph mode parm        rwtlin   - graph mode parm
    fvs      - this fvs help          ryaxis   - graph mode parm
    gd2obs   - GD2OBS help file       sercor   - graph mode parm
    gdfho    - GDFHO help file        sigtst1  - significance test1
    gdgbs    - GDGBS help file        title    - graph mode parm
    gdgpbs   - GDGPBS help file       trace    - graph mode parm
    gdl1l2   - GDL1L2 help file       t_text   - graph mode parm
    gdpbsfho - GDPBSFHO help file     version  - info on fvs versions
    getter   - KB's edit trace.ctl    witlin   - graph mode parm
    grphprm  - graph parm list        wwx      - WWX PBS prob file
    hstclr   - graph mode parm        xaxis    - graph mode parm
    lablev   - graph mode parm        xlabel   - graph mode parm
    line     - graph mode parm        xudlbl   - graph mode parm
    l_text   - graph mode parm        yaxis    - graph mode parm
    marker   - graph mode parm        ylabel   - graph mode parm
    mkplot   - mkplot_vsdb help       yrlbel   - graph mode parm
    mkvsdb   - mkvsdb_vsdb help

For more information on VSDB data, see the VSDB database documentation
below.

The fvs script is found in $VSDB_SCRIPTS and must be in $PATH.  The
following fortran programs must also be found via $PATH:

    mkflnm_vsdb
    mkplot_vsdb
    mktlst_vsdb
    mktrcs_vsdb
    mkvsdb_vsdb
    mkymrg_vsdb

These programs must be specifically built for your workstation and operating
system type.  Graphical displays are done by GEMPLT, the GEMPAK graphics
primitive software library.

Function u (user interface for search conditions)

This function initiates a script that queries the user for information
regarding search conditions for data to be displayed in a plot or a set of
plots.  Prompting information is given to assist in making choices.  The
search conditions for a total of up to twenty-four traces may be specified.
These traces may be all in one plot or constitute several different plots.
This function creates a file called trace.ctl.

The user is asked to set search conditions for dependent and independent
variables.  It is helpful to have a thorough visual concept of the type of
plot you want before setting the search conditions.  It may be helpful to do
a rough hand tracing of the type of graph you want to see before proceeding.
The following definitions may help you in setting search conditions.

Definitions:

    categorically binned data plot - a plot whose independent variable is
        determined by sets of search conditions that define up to 64 cells
        along the x axis.

    data combination - the process of adding together data values that
        occur at the same point on the abscissa (x axis) as determined by
        the independent variable search conditions

    dependent variable - numbers whose values are graphed along the
        ordinate (y axis) of a graph

    dependent variable search condition - any search condition that
        determines how data will be found and combined with other data but
        does not by itself determine where the data would be plotted on the
        x axis (see independent variable search condition)

    independent variable - a general term for that which determines
        positions on the abscissa (x axis) of a graph

    independent variable search condition - a search condition that
        determines where data would be plotted in the set of locations
        along the abscissa (x axis).  The "values" along the abscissa may
        be character strings (e.g., regions, names of models, verification
        time ranges) or real numbers (e.g., forecast hours, threshold
        values, level values).

    plot - a display consisting of a set of one to eight traces that are
        usually displayed together on the same graph

    scatter plot - a plot formed by two traces whose dependent variable
        values are functions of the same independent variable (usually
        time).  In this case, the x and y axis values are both determined
        by the values of the dependent variable.

    time series - a plot formed by traces of data whose independent
        variable is time

    trace - a sequence of numbers representing values of some dependent
        variable as a function of an independent variable.  Many values may
        have been combined to make the number at a point on a trace.

The user is asked to set some numbers controlling consistency checking,
which is done when the search for data is completed.  One form of
consistency constraint is to make sure that the number of data records
contributing to the combined values is the same from point to point along a
trace and/or through corresponding points on multiple traces.  Another form
of constraint is to make sure that the same set of verifying times and
values of any dependent variables for which a data combination list exists
contributed to the combined values at points on the data traces.  The form
of the constraint is determined by a single number chosen by the user.
Traces identified to be on the same plot are made consistent in accordance
with the selected number.  Another number determines whether the consistency
constraint applies vertically through the traces (with point to point
variation) or horizontally along the traces (with trace to trace variation),
or both (for total consistency, no variation).  It is possible to set a
consistency constraint that results in a count of zero values for a trace.
In any case, a brief message is printed telling how the data were combined.
It is possible to have the software automatically determine the consistency
constraint.
When this is the case, the user is asked to enter a data loss tolerance
percentage.  This is the percentage of the data that you would be willing to
sacrifice to maintain the consistency constraint.  If more than this
percentage would be lost in satisfying the consistency constraint, then the
consistency constraint is removed, and all data found in the search are used
in the data combination.

Functions s and c (search the data base)

In this mode, fvs makes no queries to the user.  It performs a search of the
data base to which the environmental variable VSDB_DATA points.  The file
trace.ctl must exist to provide the search instructions.

A long search involving many input VSDB files will go faster if the process
of generating the file list can be bypassed.  If the search to be executed
involves the same time range and models as the previous search, then copy
the file named vsdb_files_found into vsdb_files_found.save.  The latter file
will be used as the source for the file names, thus saving the time required
to build the list of file names.

The function s generates a file called trace.dat.  The function c generates
a file called vsdb.dat.

You may edit and clone different versions of trace.ctl without actually
executing function u.  You must make sure that the appropriate file is named
trace.ctl in the local directory before initiating function s or c if you
are bypassing function u.  Editing of trace.ctl should be attempted with
great caution.

When the search is completed, a summary of the number of VSDB data records
found, accepted, and rejected is printed out.  The following gives a
description of each column of that output:

    RECORDS FOUND      This column is the count of the number of records
                       actually found for this trace.  If each point on the
                       trace has the same number of records contributing to
                       it, then this column shows the product of that
                       number and the number of points along the trace.

    RECORDS PADDED     This column reports the number of zero count FHO
                       (forecast, hits, obs) records added to this trace
                       before consistency checking is done so that
                       consistency checking does not fail.  For example, if
                       there are no forecasts or observations of
                       precipitation exceeding two inches, then that
                       threshold and higher thresholds could be missing
                       from the data archive.  If they are, the software
                       dummies them in as though they were found.  The
                       total number added is shown.  This column is nonzero
                       only when threshold value is the independent
                       variable for the data plot.

    TOTAL RECORDS      This column is just the sum of the preceding two.

    NUMBER ACCEPTED    This column reports the number of VSDB records
                       accepted after primary consistency checking is
                       complete.  Primary consistency checking is done by
                       matching verification times and/or counting the
                       number of records contributing to each point on an
                       individual trace and comparing that number to either
                       or both the corresponding points on the other traces
                       comprising the plot or the other points along the
                       trace.  The degree and type of consistency checking
                       is under user control.

    CONSISTENCY        This is the number rejected by consistency checking.
    REJECTIONS         It is the difference between the numbers reported in
                       the last two columns: TOTAL RECORDS minus NUMBER
                       ACCEPTED.

    ADDITIONAL         This column reports additional rejections that occur
    REJECTIONS         when points have to be eliminated from one or more
                       traces because those points do not exist on at least
                       one other trace.  This count is non-zero when strict
                       consistency constraints require an exact point for
                       point match among two or more traces.
    TOTAL # OF         This column is the sum of the preceding two columns.
    REJECTIONS

    FINAL #            This column reports the final number accepted after
    ACCEPTED           all rejections have been subtracted.

    # MISSING          This column reports the number of empty bins for a
    POINTS             categorically binned data plot; otherwise, it
                       contains N/A (not applicable), which also is the
                       case for wild card bin conditions.  Empty bins
                       account for disparities in the finally accepted
                       number among traces for which consistency checking
                       is in force, since data totally missing at a point
                       on one trace is not allowed to eliminate
                       consistently matching data on other traces of the
                       same plot set.

Function p (plot statistics)

This function queries the user for information on how to display a graph of
data in trace.dat, generated by applying function s.  Help information is
provided in this function.  Please read instructions carefully before making
choices.

Function p may be applied repeatedly to data found in trace.dat.  The file
trace.dat may be saved under another name so that it is not necessary to
re-execute function s to view the data.  Always make sure that the file
trace.dat contains the appropriate data before running function p (fvs p).
The search conditions used to find the data in trace.dat are saved at the
beginning of the trace.dat file.

Other information:

Required environmental variables:

    VSDB=         path to VSDB software
    VSDB_TBL=     path to tables used by fvs scripts
    VSDB_HLP=     path to help information used by fvs scripts
    VSDB_SCRIPTS= path to fvs scripts
    VSDB_DAY1=    the earliest possible day for all data
    VSDB_DATA=    path to VSDB data directories
    VSDB_TEMP=    path to temporary disk space (usually /tmp)

Example of setting environmental variables:

    setenv VSDB         /export/hp52/wd22kb/vsdb
    setenv VSDB_TBL     /export/hp52/wd22kb/vsdb/tables
    setenv VSDB_HLP     /export/hp52/wd22kb/vsdb/help
    setenv VSDB_SCRIPTS /export/hp52/wd22kb/vsdb/scripts
    setenv VSDB_DAY1    199601010000
    setenv VSDB_DATA    /export/mmbsrv/usr1/wd20er/data/vsdb
    setenv VSDB_TEMP    /tmp

How do I point fvs to my own statistical data base?

1.  Make your database on some network accessible disk.

    a.  Each model or forecast system to be verified must have its own
        directory under the $VSDB_DATA path.

    b.  All of the verification data for a single day must be in a single
        file whose name is by convention the following:

            x_YYYYMMDD.vsdb

        where x is the model name, and YYYYMMDD is the year, month, and
        day.  Note that x is also the subdirectory name and the name
        preceding / in the model header field (field # 2) in all of the
        data records in the files.  The slash / is not used if no
        qualifiers follow the model name.  Data may also be stored in
        monthly files named in the following form:

            x_YYYYMM.vsdb

2.  Once several days of data are in the data base, execute the script
    $VSDB_SCRIPTS/update.files for each of the nine data header fields.
    This script generates table files used by function u that are tailored
    to that particular data.  These files are "." (hidden) files under
    $VSDB_DATA.  Do the following:

    a.  Execute $VSDB_SCRIPTS/update.files for all fields---

            $VSDB_SCRIPTS/update.files ALL

    b.  If you change the contents of one of the header fields in your
        data, rerun $VSDB_SCRIPTS/update.files for that particular field---

            $VSDB_SCRIPTS/update.files n

        where n = 1, 2, 3, 4, 5, 6, 7, 8, or 9

    For more information on header fields, see the database description
    included below.

3.  Make sure $VSDB_DATA points to the correct statistical data base before
    running fvs.
    You may have fvs set the data path on the fly by entering any character
    as a second input on the command line.  You will be queried by fvs for
    the path.

4.  Make symbolic links to point to other data directories that contain
    results to be compared with your data.  This step is optional.

EMC/HPC fvs INSTALLATION INFORMATION

Keith F. Brill    keith.brill@noaa.gov

20 April 1998
UPDATED: 20 January 2000
UPDATED: 07 August 2000
UPDATED: 06 March 2003
UPDATED: 03 May 2004

To run fvs, the NCEP statistical database access and display system, the
following must be in your path:

    fvs          - the driver C-shell script
    mkflnm_vsdb  - fortran program that makes required VSDB file names
                   (legacy program as of March 2003)
    mkplot_vsdb  - fortran program that plots graphs
    mktlst_vsdb  - fortran program that lists out trace information
    mktrcs_vsdb  - fortran program that reads trace data and combines
                   statistical values for plotting
    mkvsdb_vsdb  - fortran program that reads trace data and combines
                   statistical values for VSDB output
    mkymrg_vsdb  - fortran program that scans a search control file to
                   create a list of models and a list of date-time ranges

    $GEMEXE/gplt - GEMPAK graphics subprocess
    $GEMEXE/xw   - GEMPAK X-windows driver subprocess
    $GEMEXE/ps   - GEMPAK PostScript driver subprocess
    $GEMEXE/gf   - GEMPAK GIF driver subprocess

where GEMEXE is an environmental variable defined to be the path to the
GEMPAK executables for your workstation and operating system.

The following environmental variables must be set:

    setenv VSDB         /complete path to fvs scripts, help, tables, etc.
    setenv VSDB_TBL     $VSDB/tables
    setenv VSDB_HLP     $VSDB/help
    setenv VSDB_SCRIPTS $VSDB/scripts
    setenv VSDB_DAY1    199601010000  OR your earliest date
    setenv VSDB_DATA    /complete path to default VSDB data
    setenv VSDB_TEMP    /tmp
    setenv DDAPP        /complete path to the executable file directories

The fortran programs must have been linked to run on your workstation and
operating system.  The following executables are available:

    $DDAPP/exe_hp    HP OS
    $DDAPP/exe_sg6   SGI OS 6
    $DDAPP/exe_lnx   Linux
    $DDAPP/exe_aix   IBM workstations

Executing

    source $VSDB_SCRIPTS/for_fvs

on a supported NCEP workstation should set up everything required to run
fvs.  Note that you may have to redefine VSDB_DATA to look at the data you
want.  There is a script designed to help you set the correct definition for
VSDB_DATA.  The script must ALWAYS be invoked by source:

    source $VSDB_SCRIPTS/set_vsdb_data

The script prompts you for input.  fvs will invoke this script for you when
you enter a second input on the command line.

In a non-NCEP environment, modify either (or both) for_fvs_generic (C shell)
or for_fvs_generic.sh (Bourne/Korn shell) to set the environment appropriate
to your workstation configuration.  These scripts are located under
$VSDB_SCRIPTS.

To build the necessary executable files to run fvs, cd to $VSDB/lib.
Execute the compile script and then the bld script.  For example, to do a
build on a LINUX workstation, enter the following:

    compile lnx ALL
    bld lnx ALL

fvs VERIFICATION STATISTICS DATABASE

Keith F. Brill    Mark D. Iredell

20 January 2000
MODIFIED: Keith F. Brill 16 Feb 2000
MODIFIED: Keith F. Brill 25 May 2000
MODIFIED: Keith F. Brill 07 Aug 2000
MODIFIED: Keith F. Brill 19 Jun 2002
MODIFIED: Keith F. Brill 12 Mar 2003
MODIFIED: Keith F. Brill 24 May 2005

The fvs Verification Statistics Database (VSDB) is an ASCII text file.
Records in the file are separated by linefeeds (X'0A').  The maximum record
size is 512 bytes.  Each record contains one or possibly more statistic
values.
The record is defined by blank-separated text fields.  The maximum field
size is 24 bytes.  A field must not be null or contain an embedded blank.
Fields do not have to line up in columns.  All characters are assumed to be
upper case.

There will be one VSDB file for each day for each model.  The naming
convention for the file will be name_YYYYMMDD.vsdb, where name is the model
name, and YYYYMMDD is the year, month, and day.  This file will include all
verifications done on the particular day YYYYMMDD for that model.  The
models will be placed in separate directories.  The directory name must
agree with name_ in the .vsdb file name.  Also permissible is
name_YYYYMM.vsdb.  All directory names and file names must be composed
entirely of lower-case characters.

VSDB files may be compressed, in which case .Z is appended to the file name,
or gzipped, in which case .gz is appended.  A directory may contain a mix of
.gz, .Z, and regular text files.  When a search is done, the files will be
uncompressed or gunzipped into a directory named vsdb under $VSDB_TEMP.
$VSDB_TEMP must be large enough and have write permission.  If the directory
vsdb does not exist, it will be created with permission open to all.  If
vsdb already exists, it must have write permission.  All files in the
existing vsdb directory will be deleted before the search commences.  Any
files placed in this directory during the search will remain after the
search is done.

The first set of fields consists of the header fields.  The header fields
identify the verification statistic(s).  There are usually 11 header fields,
but more can be added compatibly.  The contents of the header fields should
conform to the standards below:

    Header field 1 : (char) verification database version
    Header field 2 : (char) forecast model verified
    Header field 3 : (char) forecast hour verified
    Header field 4 : (char) verifying date
    Header field 5 : (char) verifying analysis or observation type
    Header field 6 : (char) verifying grid or region
    Header field 7 : (char) statistic type
    Header field 8 : (char) parameter name
    Header field 9 : (char) level description
    Header field 10-- Not yet defined

Following the header fields is a separator field consisting of a single
equals sign (=).

    Separator field : (char) =

The next set of fields consists of the data fields.  The first data field is
typically the number of values used.  The following data fields are one or
more statistics values.  The statistic type header field implies the order
of the statistic values.  A missing value for the data fields is -1.1e31.
All data values must have no more than 9 digits following the decimal point
(e.g., 24.123456789 is valid, while 24.1234567891 is not valid).

    Data field 1 : (real) number of values used (gridpoints or obs)
    Data field 2 : (real) actual statistic value(s)
                   Optional data fields.

Examples:

V01 AVNB 24 1996090100 FNL NHX ACORR(1-20) Z P500 = 3600 94.32
V01 ERL 36 1996090100 MB_PCP G211 FHO>2.5 APCP/24 SFC = 6045 .40 .50 .30
V01 ETAX 24 1996090100 MESO G211 TENDCORR SLP MSL = 10000 77.77
V01 ECM 24 1996090100 FNL NHX RMSE Z P1000 = 3600 -1.1E31
V01 AVN 12 1996090100 AIRCFT/GOOD NHX RMSE T P250-200 = 3600 1.4321E+00

If the same entry in the header field is found in the next VSDB record, then
it may be replaced by ".  This allows for more compact VSDB files.  The "
may be repeated for that header field in following VSDB records until the
entry changes.
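
As an illustration of the record layout just described, the sketch below
splits a single VSDB record into header and data fields.  It is written in
Python purely for illustration (the fvs programs themselves are Fortran and
C-shell); the names parse_vsdb_record and MISSING, and the handling of the "
repetition, are assumptions of this sketch and are not part of the fvs
distribution.

    # Illustrative sketch only: split one VSDB record at the "=" separator,
    # substitute header entries repeated as ", and convert data fields to
    # floats.  parse_vsdb_record and MISSING are hypothetical names.
    MISSING = -1.1e31

    def parse_vsdb_record(line, prev_header=None):
        """Return (header_fields, data_values) for one VSDB record."""
        fields = line.upper().split()
        sep = fields.index("=")          # separator between header and data
        header = fields[:sep]
        if prev_header:                  # " repeats the previous record's entry
            header = [p if f == '"' else f
                      for f, p in zip(header, prev_header)]
        data = [float(v) for v in fields[sep + 1:]]
        return header, data

    # Using the first example record above:
    hdr, vals = parse_vsdb_record(
        "V01 AVNB 24 1996090100 FNL NHX ACORR(1-20) Z P500 = 3600 94.32")
    # hdr[1] is 'AVNB' (model) and vals[0] is 3600.0 (the data count).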
HEADER FIELD STANDARDS

Header field 1: verification database version

    V01
    Vnn       Future versions

Header field 2: forecast model verified

    AVN       Aviation forecast model
    AVNX      AVN Parallel X
    AVNY      AVN Parallel Y
    AVNZ      AVN Parallel Z
    AVNU      AVN Parallel U
    AVNV      AVN Parallel V
    AVN?      AVN future parallel ?
    BAWX      HPC Basic Weather Desk
    COM       Combined Ensemble (SREF)
    ECM       European Center Model
    ECMWF     European Center Model
    ENSnn     Ensemble member nn
    ENSxx     Ensemble product xx
    ETA       Early Eta model
    ETAL      Eta Parallel L
    ETAV      Eta Parallel V
    ETAX      Eta Parallel X
    ETAY      Eta Parallel Y
    ETA?      Eta future parallel ?
    FNL       Final GDAS
    GFS       Global Forecast System
    KFETA     Kain-Fritsch Eta
    LFM       Limited Fine Mesh Model
    MEDR      Medium Range Forecast Desk (HPC)
    MESO      Mesoscale model
    MRF       Medium-range forecast model
    MRFX      MRF Parallel X
    MRFY      MRF Parallel Y
    MRFZ      MRF Parallel Z
    MRFU      MRF Parallel U
    MRFV      MRF Parallel V
    MRF?      MRF future parallel ?
    NGM       Nested-grid forecast model
    NOGAPS    Navy Global Model
    PER/X     Persistence from model X analysis
    QPF/nnn   HPC quantitative precipitation forecast
    RSM       Regional Spectral Model
    RSMH      Hawaiian Regional Spectral Model
    RUC       Rapid Update Cycle
    SPEC      Spectral Model
    SREF      Short Range Ensemble
    SRMEAN    Short Range Ensemble Mean
    TSname    Hurricane model
    UKM       United Kingdom Met Office Model (UKMET)
    UKMET     United Kingdom Met Office Model (UKMET)
    USRname   User-defined experiment
    nnn       Forecaster number identifier
    ##s/nnn   HPC Snowfall probability forecast

The model name may be followed by an optional slash preceding a grid number,
indicating the output grid from which the model data was interpolated for
the verification.  For example, eta/212 implies that grid 212 was the source
of the eta model data used in the verification.  The slash may also be
followed by a qualifying character string.  For example, AVN/ANL would be
the AVN analysis as opposed to the AVN initialization, both of which would
be assigned forecast hours of 00.  Ensemble member names may follow the
slash, e.g., ETA/N1, ETA/CTL, ETA/P1.  Members may also be MEAN, MED, BST,
OPL, or others.  Users must consult ensemble model developers for specific
definitions of these qualifiers.

Qualifiers following a slash may be wildcarded in specifying fvs search
conditions by terminating the header field with / followed by nothing (end
of line).  This feature should not be used if consistency constraints are
set for the search.  For best results when consistency constraints are set,
explicitly list the elements to be found and combined in creating the
trace.ctl file.

The model name may be followed by an optional @ sign preceding a two-digit
model cycle time, e.g., ETA@12, for the 12Z run of the ETA model.

The HPC snowfall forecast is denoted by ##s/nnn, where ## is the product
identifying number (e.g., 93, 94, or 98) and nnn is the reference number of
the forecaster who made the forecast.  The HPC QPF forecast is denoted by
QPF/nnn, where nnn is the forecaster reference number, if available.

Header field 3: forecast hour

    hhhh.d/w

where hhhh.d is the hour in the forecast that lies at the midpoint of the
time interval w.  w is usually the interval over which observations have
been interpolated.  The interval between forecasts used in the interpolation
is w/2.  If w is zero, then /w is omitted, and no time interpolation is
implied (e.g., grid-to-grid verification).
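
To make the forecast hour convention concrete, the short sketch below builds
a header field 3 string from a midpoint hour and an interpolation window.
It is an illustration only (Python, with the hypothetical helper name
fcst_hour_field), not part of the fvs software.

    # Illustrative sketch only: compose header field 3 (hhhh.d/w), omitting
    # /w when the window is zero.  fcst_hour_field is a hypothetical name.
    def fcst_hour_field(midpoint_hour, window=0):
        hh = "%g" % midpoint_hour        # 36.0 -> "36", 36.5 -> "36.5"
        if window == 0:
            return hh                    # no time interpolation implied
        return "%s/%g" % (hh, window)

    # A forecast hour of 36 at the midpoint of a 12-hour observation window:
    # fcst_hour_field(36, 12) returns "36/12".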
Header field 4: verifying date

    yyyymmddhh.d/w/i

where yyyymmddhh.d is the beginning, midpoint, or ending of a time interval
of width w hours with a data increment of i, which gives the time interval
in hours between the data times contributing to the stored statistical
result.  If /w/i are absent, they are both assumed to be 0.  If w is
preceded by a plus (+) sign, then yyyymmddhh.d is the beginning of the time
interval.  If w is preceded by a minus (-) sign, then yyyymmddhh.d is the
ending of the time interval.  Otherwise, unsigned w indicates that
yyyymmddhh.d is the midpoint of an interval w hours in duration.  Note that
.d is the fractional part of an hour and may be omitted if it is 0.

To standardize certain commonly requested time searches, the following
conventions are imposed for yyyymmddhh.d/w:

    hh.d     = 12.0     and w = 24   implies entire day yyyymmdd
    ddhh.d   = 15xx.0   and w = 730  implies entire month yyyymm
    mmddhh.d = 0215xx.0 and w = 2190 implies first quarter of yyyy
    mmddhh.d = 0515xx.0 and w = 2190 implies second quarter of yyyy
    mmddhh.d = 0815xx.0 and w = 2190 implies third quarter of yyyy
    mmddhh.d = 1115xx.0 and w = 2190 implies last quarter of yyyy
    mmddhh.d = 0401xx.0 and w = 4380 implies first half of yyyy
    mmddhh.d = 1001xx.0 and w = 4380 implies last half of yyyy
    mmddhh.d = 0701xx.0 and w = 8760 implies entire year yyyy
    mmddhh.d = 0115xx.0 and w = 2190 implies climatological winter season
                                     for yyyy
    mmddhh.d = 0415xx.0 and w = 2190 implies climatological spring season
                                     for yyyy
    mmddhh.d = 0715xx.0 and w = 2190 implies climatological summer season
                                     for yyyy
    mmddhh.d = 1015xx.0 and w = 2190 implies climatological fall season
                                     for yyyy

Here xx is the valid hour of the averaged forecasts.  The search software
will look for these specific criteria on request for daily, monthly,
quarterly, semi-annually, annually, or seasonally tagged data.  The search
software will NOT make any attempt to decide whether a specific yyyymmddhh.d
lies within intervals defined in the data base using w.  It will, however,
be able to match the string yyyymmddhh.d/w/i.  It will only discriminate on
the basis of w when daily, monthly, quarterly, semi-annually, annually, or
seasonally tagged data is requested.  The i in yyyymmddhh.d/w/i will be used
as a search criterion only.  If it is not present in the data field, it will
be assumed to have a zero value.

Header field 5: verifying data source or analysis

Any name that can be used in field 2 plus:

    MB_PCP    Mike Baldwin's Precipitation Analysis
    ADPUPA    Conventional upper-air
    ADPSFC    Conventional surface
    AIRCAR    ACARS
    AIRCFT    Conventional aircraft
    ANYAIR    Any upper-air data source
    ANYSFC    Any surface data source
    CDTP      GOES cloud-top pressure
    COOP      Cooperative observer network
    CSQ       Combination of COOP, surface, and QPE data
    ERS1DA    ERS Scatterometer data
    GCDTT     GOES cloud-top temperature
    GLBANL    Global Analysis
    HPC/SFC   NCEP/HPC surface analysis
    Knnn      Observation PREPRO type nnn
    ONLYSF    Surface data verified against 2/10-m forecast data
    PROFLR    Profiler
    QPE       Quantitative Precipitation Estimates
    RUC2      Rapid Update Cycle analysis
    SATEMP    Satellite radiances
    SATWND    Satellite winds
    SFCSHP    Conventional marine
    SPSSMI    SSM/I
    SREF_VF   Verifying analyses used for SREF
    TOCC      Total cloud cover
    TOCC_THR  Total Cloud Cover with thresholds
    VADWND    VAD WSR88D wind profiles

A verifying data source name may be followed by /Knnn, where nnn is the
observation type number.  The SREF_VF refers to the analyses used to verify
the Short Range Ensemble Forecast (SREF) fields.
These are derived from various data assimilation systems, including EDAS,
GDAS, and stage-2 precipitation analyses.  SREF verification is a
grid-to-grid comparison.

A data quality flag may be entered after each verifying data type.  The
following flags are standard:

    /GOOD  -- useable data
    /BAD   -- rejected data

/GOOD may be omitted when only GOOD data is used.  The data searching
software must match these verbatim.

Header field 6: verifying grid or region

    Bnnnnn          Buoy, where nnnnn is the buoy number
    CONUS           Continental United States
    EASTR           Eastern half of CONUS
    WESTR           Western half of CONUS
    GBL             Global
    NHX             Northern hemisphere extratropics (20N-80N)
    SHX             Southern hemisphere extratropics (80S-20S)
    TRO             Tropics (20S-20N)
    Gnnn            NCEP grid GRIB type nnn
    Gnnn/SUBSET     NCEP grid subset
    RFC4KM          4-km resolution RFC grids
    Rnnnnn          Rawinsonde station nnnnn
    Rxxnnn          Rawinsonde set xxnnn
    USRname         User-defined grid
    USRname/SUBSET  Subset of user-defined grid
    x:y:c:d         Zonal band from longitude x to longitude y centered on
                    latitude c, d degrees of latitude wide

Grid 104 SUBSET 3-character names:

     1   ATC (1)  Arctic verification region
     2   WCA (2)  Western Canada verification region
     3   ECA (3)  Eastern Canada verification region
     4   NAK (4)  Northern Alaska verification region
     5   SAK (5)  Southern Alaska verification region
     6   HWI (6)  Hawaii verification region
     7   NPO (7)  Northern Pacific Ocean verification region
     8   SPO (8)  Southern Pacific Ocean verification region
     9   NWC (9)  Northern West Coast verification region     effective 6/03
    10   SWC (A)  Southern West Coast verification region     effective 6/03
    11   NMT (B)  Northern Mountain verification region       effective 6/03
    12   SMT (C)  Southern Mountain verification region
    12r  GRB (C)  Great Basin verification region             effective 6/03
    13   NFR (D)  Northern Front Range verification region
    13r  SMT (D)  Southern Mountain verification region       effective 6/03
    14   SFR (E)  Southern Front Range verification region
    14r  SWD (E)  Southwest Desert verification region        effective 6/03
    15   NPL (F)  Northern Plains verification region         effective 6/03
    16   SPL (G)  Southern Plains verification region         effective 6/03
    17   NMW (H)  Northern Midwest verification region
    17r  MDW (H)  Midwest verification region                 effective 6/03
    18   SMW (I)  Southern Midwest verification region
    18r  LMV (I)  Lower Mississippi Valley region             effective 6/03
    19   APL (J)  Appalachians verification region            effective 6/03
    20   NEC (K)  Northern East Coast verification region     effective 6/03
    21   SEC (L)  Southern East Coast verification region     effective 6/03
    22   NAO (M)  Northern Atlantic Ocean verification region
    23   SAO (N)  Southern Atlantic Ocean verification region
    24   PRI (O)  Puerto Rico & Islands verification region
    25   MEX (P)  Mexico verification region
    26   GLF (Q)  Gulf of Mexico verification region
    27   CAR (R)  Caribbean Sea verification region
    28   CAM (S)  Central America verification region
    29   NSA (T)  Northern South America verification region
    30   GMC (U)  Gulf of Mexico Coast                        effective 6/03
         MDA ( )  Middle Atlantic subset of NEC (used by HPC 5/04)

The sequence numbers and alpha-numeric characters inside parentheses refer
to the labelling of the points within the regions on grid 104 displays.  The
lowercase "r" following some sequence numbers above indicates that the 6/03
modification of the regions resulted in this item replacing the region
previously defined for that number.

The Grid 104 subset regions were modified effective 1 June 2003.  The
modifications are described as follows:

1.  The GMC region was added by extracting territory from the SPL and SMW
    regions.
2.  The SMW region was extended further to the north, taking territory away
    from NMW, and renamed LMV.

3.  The APL region was extended slightly further towards the southwest,
    taking a little area away from the old SMW region.

4.  The NPL and SPL regions were shifted westward.

5.  Both the NFR and SFR regions were eliminated, yielding some territory to
    the plains regions and some to the mountains.

6.  The NWC region was extended to south of the San Francisco Bay area and
    was narrowed slightly east to west.

7.  The SWC region yielded some territory to the new Southwest desert
    region.

8.  A new Great Basin region was added by combining area from the former
    southern and northern mountain regions.

HPC Phase error SUBSET names:

    SEUS   SouthEastern US:    (28,-93)->(40,-65)
    CEUS   Central Eastern US: (34,-93)->(46,-65)
    NEUS   NorthEastern US:    (40,-93)->(52,-65)
    SCUS   South Central US:   (28,-110)->(40,-82)
    CCUS   Central Central US: (34,-110)->(46,-82)
    NCUS   North Central US:   (40,-110)->(52,-82)
    SWUS   SouthWestern US:    (28,-128)->(40,-100)
    CWUS   Central Western US: (34,-128)->(46,-100)
    NWUS   NorthWestern US:    (40,-128)->(52,-100)

HPC Snow/Ice accumulation verification regions:

    NE  - Northeastern US
    MA  - Middle Atlantic US
    SE  - Southeastern US
    AP  - Appalachian Mountains
    MW  - Midwestern US
    GP  - Great Plains
    NR  - Northern Rocky Mountains
    SR  - Southern Rocky Mountains
    DSW - Desert Southwestern US
    SGB - Southern Great Basin
    NGB - Northern Great Basin
    NPC - Northern Pacific Coast
    SPC - Southern Pacific Coast

HPC PMSL verification regions:

    MRDG       Continental US NPS grid
    MRDG/PNW   Pacific northwest
    MRDG/PSW   Pacific southwest
    MRDG/DSW   Desert southwest including southern CA
    MRDG/IMN   Northern inter-mountain region of western US
    MRDG/IMS   Southern inter-mountain region of western US
    MRDG/GPN   Northern Great Plains
    MRDG/GPS   Southern Great Plains
    MRDG/MVN   Northern Mississippi Valley including Great Lakes
    MRDG/MVS   Southern Mississippi Valley
    MRDG/ANE   Northeastern US
    MRDG/ASE   Southeastern US

These regions will be defined in a table file.

Header field 7 : statistic type

ACTIVE statistic types consist of numbers from which other statistics can be
computed by the display software.  PASSIVE statistic types consist of
pre-computed numbers that can only be found and displayed.

ACTIVE TYPES:

    ESL1L2/n      n-member ensemble mean L1 L2 norms for scalars
    EVL1L2/n      n-member ensemble mean L1 L2 norms for vectors
    FHO<>*        F, H, and O (three values), where
                     F = Forecasted fraction above/below threshold
                     H = Correct fraction above/below threshold (hits)
                     O = Observed fraction above/below threshold
    GBS|<>/range  General Brier Score for single threshold
    GBS|c1|c2|... General Brier Score with multiple categories, where c1,
                  c2, etc., are category definitions
    ML1L2(*)      L1 and L2 values plus MAE for Scalars (6 values)
    MTRK          Midlatitude storm track errors
    PBS_94E       Partitioned Brier score for HPC excessive rainfall
                  guidance
    PBS_ENS:n/#   Partitioned Brier score for n-member ensemble
    PBS_WWD/#     Partitioned Brier score for HPC winter weather desk
                  forecast verification (7 values)
    PBS_WWX/#     Partitioned Brier score for HPC 93s, 94s, and 98s
                  forecast verification (7 values)
    PBS_xxx/#     Partitioned Brier score for arbitrary product xxx
    PHSE:#        Truncated Zonal trig phase and amplitude errors for
                  scalars (10 values)
    RPS|<>        Ranked Probability Score for defined categories
    RPS/#         RPS for # number of undefined categories
    SAL1L2(*)     Anomaly L1 and L2 values for scalars (5 values)
    SL1L2(*)      L1 and L2 values for Scalars (5 values + optional ones)
    SSAL1L2(*)    Standardized Anomaly L1 and L2 values for scalars
                  (5 values)
    RHET          Ranked histogram array of probabilities that observed or
                  analyzed data falls within intervals of values determined
                  by the ensemble members, including the tails
    TTRK          Tropical storm track errors
    VAL1L2(*)     Anomaly L1 and L2 values for vectors (7 values)
    VL1L2(*)      L1 and L2 values for Vectors (7 values)
    VSAL1L2(*)    Standardized Anomaly L1 and L2 values for vectors
                  (7 values)

PASSIVE TYPES:

    ACORR(*)      Anomaly correlation
    ACORWG(*)     Anomaly correlation for waves 1-20, 1-3, 4-9, 10-20
    AVGFR(*)      Forecast mean
    AVGOB(*)      Observed mean
    BIAS(*)       Forecast mean minus obs mean
    CORR(*)       Correlation
    FARR          False alarm rate for ROC curve
    GDET          GFS legacy deterministic statistics
    LRPS          Legacy Ranked Probability scores
    MAXE(*)       Maximum difference
    PODR          Probability of detection for ROC curve
    RLE/#         Relative Location Error
    RMDIF(*)      RMS & MEAN differences (see below)
    RMSESP        Ensemble mean RMS error and ensemble spread
    RMSE(*)       Root Mean Square Error
    S1(*)         Skill score for gradients
    SDERR(*)      Standard deviation of error = (forecast - obs)
    SDFR(*)       Standard deviation of the forecasts
    SDOB(*)       Standard deviation of the obs
    TENDCORR(*)   Tendency correlation
    RHNT          Ranked histogram array of probabilities that observed or
                  analyzed data falls closest to each ensemble member
    VLCEK         Vlcek's statistics group (see below)
    ???(*)        User defined
    B<>*          Bias above/below threshold
    CSI<>*        Critical Success Index above/below threshold
    ETS<>*        Equitable threat score above/below threshold
    FAR<>*        False alarm rate above/below threshold
    PA<>*         Postagreement above/below threshold
    PF<>*         Prefigurance above/below threshold
    POD<>*        Probability of detection above/below threshold
    TS<>*         Threat score above/below threshold

The qualifier parenthetically enclosed following the statistic type may be
any character string.  The qualifier is optional.  The searching software
must match both the parameter name and the qualifier to find the statistic
values.  SREF relates to verification of the Short Range Ensemble Forecast
(SREF) products.

The scalar anomaly L1L2 data are composed of five numbers in addition to the
data count:

    MEAN [f-c], MEAN [o-c], MEAN [(f-c)*(o-c)], MEAN [(f-c)**2],
    MEAN [(o-c)**2]

For standardized anomalies, the deviations from the mean are divided by the
standard deviation.

The scalar L1L2 data are composed of five numbers in addition to the data
count:

    MEAN [f], MEAN [o], MEAN [f*o], MEAN [f**2], MEAN [o**2]

In these expressions, f are forecast values, o are observed values, and c
are climatological values.

The SL1L2 data may optionally have a sixth number following the data count
value.
In that case, the sixth number is the mean absolute error, MEAN [|f-o|].  If
the sixth number is present, an optional group of four more values is
recognized as the terms needed for the x- and y-direction S1 score
components, defined as follows:

    MEAN [|DXf-DXa|], MEAN [max(|DXf|,|DXa|)], MEAN [|DYf-DYa|], and
    MEAN [max(|DYf|,|DYa|)]

where DXf is the difference between two adjacent points in the x direction
on the forecast grid, DXa is the corresponding difference between two
adjacent points in the x direction on the analysis grid, and DYf and DYa are
similar differences for the y direction.  So, the SL1L2 record may have a
maximum of 11 values including the data count value.

The vector anomaly L1L2 data are composed of seven numbers in addition to
the data count:

    MEAN [uf-c], MEAN [vf-c], MEAN [uo-c], MEAN [vo-c],
    MEAN [(uf-c)*(uo-c)+(vf-c)*(vo-c)], MEAN [(uf-c)**2+(vf-c)**2],
    MEAN [(uo-c)**2+(vo-c)**2]

The vector L1L2 data are composed of seven numbers in addition to the data
count:

    MEAN [uf], MEAN [vf], MEAN [uo], MEAN [vo], MEAN [uf*uo+vf*vo],
    MEAN [uf**2+vf**2], MEAN [uo**2+vo**2]

The ML1L2 statistic type data record is composed of the 5 scalar L1L2 norms
followed by MEAN [|f-o|], the mean absolute error.

In the case of ensemble L1L2 (e.g., ESL1L2, EVL1L2) norms, the usual scalar
and vector L1L2 values are followed by the variance of the members about the
ensemble mean.  This variance is n-1 weighted, where n is the number of
ensemble members.  The individual variances for each contributing forecast
are combined over all ensemble forecasts represented by the VSDB record.
These values, weighted by the number of forecasts for each VSDB record, are
added together when VSDB records are combined by fvs.  The combined variance
is obtained by dividing by the total number of forecasts.  In other words,
this variance is treated just like any other L1L2 norm.  An optional
additional value may be included in these records.  It is the fraction of
ensemble members lying three or more standard deviations from the ensemble
mean.  The scalar ESL1L2 data are composed of six or seven numbers in
addition to the data count.  The vector EVL1L2 data are composed of eight or
nine numbers in addition to the data count.

Note that the statistic type determines whether vector or scalar treatment
is appropriate in the computation of the following statistical quantities
for SAL1L2, VAL1L2, SL1L2, VL1L2 and their ensemble equivalents:

    variance and standard deviation of forecast values
    variance and standard deviation of observed values
    root mean square error
    bias
    covariance
    correlation

In the case of thresholds, the information is not enclosed in parentheses,
but it is given as a real number preceded by either < or >, according to
whether the value is an upper bound or a lower bound, respectively.  The
statistic types followed by <>* in the listing above must ALWAYS be
accompanied by a threshold qualifier.  The searching software will be
binning this kind of data on the basis of the thresholds.

Note that F, H, and O can be used to compute FAR, TS, ETS, POD, PA, B, PF,
and CSI.  Diagnostic software will be included to compute the latter from
the former, which MUST be stored under the FHO statistic type.  The values
of F, H, and O are always entered as decimal values between 0 and 1.0.  The
number of events is simply the product of the value and the count.
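
Because F, H, and O are stored as fractions of the data count, the
categorical scores named above follow from simple contingency-table
arithmetic.  The sketch below (Python, for illustration only; fho_scores is
a hypothetical name, and this is not the fvs diagnostic code itself) shows
one way a few of them could be derived.

    # Illustrative sketch only: derive categorical scores from one FHO
    # record.  F, H, and O are the stored fractions (0-1); n is the data
    # count.  fho_scores is a hypothetical name.
    def fho_scores(n, F, H, O):
        hits = H * n                     # events forecast and observed
        false_alarms = (F - H) * n       # forecast but not observed
        misses = (O - H) * n             # observed but not forecast
        chance = F * O * n               # hits expected by chance (for ETS)
        return {
            "BIAS": F / O,                       # frequency bias
            "POD":  H / O,                       # probability of detection
            "FAR":  (F - H) / F,                 # false alarm rate
            "TS":   H / (F + O - H),             # threat score (CSI)
            "ETS":  (hits - chance) /
                    (hits + false_alarms + misses - chance),
        }

    # For example, F = 0.40, H = 0.30, O = 0.50 over 6045 points gives
    # TS = 0.50 and ETS = 0.25.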
The S1S statistic type denotes S1 score component means after the count
value N:

    DATA Field   Contents
    1            Number of contributing verification events
    2            Mean of |dF-dA|
    3            Mean of MAX (|dF|,|dA|)

where dF is the difference between adjacent forecast points and dA is the
corresponding difference between adjacent analysis points.  For each pair of
points compared, the maximum difference is used to compute the mean in the
denominator.

The RHNT is the array of probabilities of ensemble members being nearest the
verification.  RHET is the array of probabilities of the verification
falling into bins defined by the ensemble members ordered from lowest to
highest value.  If there are Nm ensemble members, there will be Nm
probabilities associated with RHNT, but Nm+1 probabilities associated with
RHET.  In both cases, the last probability is omitted from the record since
it can be computed as a residual.  The probabilities must always sum to 1.

The data record for RHNT contains the following:

    DATA Field   Contents
    1            Number of contributing verification events
    2            probability analysis closest to ensemble member # 1
    3            probability analysis closest to ensemble member # 2
    ...
    Nm           probability analysis closest to ensemble member # Nm-1

    residual = probability analysis closest to ensemble member # Nm

The data record for RHET contains the following:

    DATA Field   Contents
    1            Number of contributing verification events
    2            probability analysis less than lowest ensemble member
    3            probability analysis between lowest and next lowest
    ...
    Nm+1         probability analysis between next highest and highest

    residual = probability analysis greater than highest ensemble member

The residual is one minus the sum of the probabilities stored from element
two onward.  The residual itself is not stored.  RHNT is a passive statistic
type, while RHET is an active type from which the probabilities of the
analysis being below, above, within, or outside the range of the ensemble
members may be computed.

The PHSE:# statistic type denotes a group of error values related to the
difference between a forecast and an analysis in a truncated zonal band.
The # represents the wave number within the zonal band to which the record
applies.  The wave number is not optional; it must be present for accurate
documentation.  PHSE:# may be followed by /WL, where WL is the wavelength to
the nearest whole kilometer.  /WL is optional.  Computationally, meridional
averaging reduces the band to a single one-dimensional array of numbers
which are subjected to trigonometric approximation, allowing phase and
amplitude error calculation for those wavelengths making comparable
contributions to the total variation across the zone.  The data record
contains eleven data fields:

    DATA Field   Contents
    1            "1" (the data count is always one)
    2            Phase error (PE, KM)
    3            Amplitude error (AE, units of data)
    4            Forecast variance contribution * PE
    5            Forecast variance contribution (units**2)
    6            Analysis variance contribution * PE
    7            Analysis variance contribution (units**2)
    8            Analysis phase angle (APA, degrees)
    9            Analysis amplitude (AA, units of data)
    10           Total forecast variance (units**2)
    11           Total analysis variance (units**2)

GBS|<>/range is a general Brier score with optional threshold specifications
and an optional probability range specification.  The latter option is to be
used for single-threshold (two category) situations.  If the GBS statistic
requires specification of multiple categories, then the | character is used
to separate the category descriptions.
When the | character is used, threshold checking is turned off, so multiple
use of < and > is permitted to define the categories.  The categories are
given in order following the GBS.  For example, GBS|>:100 means that
category one is all events greater than or equal to 100, and category two is
all events less than 100.  (Note that : is used in place of = because = has
special meaning as a separator between header and data information in a VSDB
record.)

The /range specification is provided to allow computation of the observed
probability for a given range of forecast probabilities.  This is intended
for the case when there are only two categories.  If ranges are specified,
they can be combined for a total Brier score by terminating the GBS|<>
statistic type with a forward slash (/) in setting search conditions,
because, when nothing follows the forward slash, the search will accept all
data to be combined.

The optional threshold follows the > or < sign.  In setting search
conditions for GBS, the threshold should always be included in the statistic
type, and no separate threshold search condition should be set.  No explicit
threshold checking is done when the | is present.

The data record allows for multiple categories of outcomes:

    DATA Field   Contents
    1            Number of verification events (N)
    2            Category 1 mean product of observed & forecast
                 probabilities
    3            Category 1 mean of squares of forecast probabilities
    4            Fraction of N observed in category 1
    5            Category 2 mean product of observed & forecast
                 probabilities
    6            Category 2 mean of squares of forecast probabilities
    7            Fraction of N observed in category 2
    8            Category 3 mean product of observed & forecast
                 probabilities
    9            Category 3 mean of squares of forecast probabilities
    10           Fraction of N observed in category 3
    11           Category 4 mean product of observed & forecast
                 probabilities
    12           Category 4 mean of squares of forecast probabilities

The fraction of N observed in the last category is computed as a residual
and should not be included in the data record.  For example, if there are
only two categories, there would only be 6 entries in the data field.  The
number of categories is the number of entries divided by 3.

The Brier score is computed as follows:

    GBS = .5 * SUM ( MEAN (F*F) - 2 * MEAN (F*O) + Fraction )

where the SUM is over all categories.  The perfect GBS is 0, the worst is 1.

RPS|<> is the ranked probability score, where category thresholds are given
in the same way as for the GBS statistic type described above.  RPS/# is
also allowed, where the # simply gives the number of categories and,
therefore, the number of probability values constituting each individual
forecast.  The |<> and /# suffixes are optional and may be omitted.  The
data record contains the following:

    DATA Field   Contents
    1            Number of contributing verification events (N)
    2            Ranked probability score
    3            Positive RPS fraction
    4            Climatological ranked probability score (optional)

PBS_ENS:n/# is the partitioned Brier score statistic type for ensembles.
The number of members is given by n.  The threshold specification replaces
#.  This statistic type is used for making reliability diagrams and
computing the Brier Score and its decomposition terms.  The record entries
are as described for PBS_WWX below except that more risk or probability
categories are allowed.  If there are M categories, then there are 2*M data
fields.  The first is always the number of events.  The fraction of events
with no risk forecast is always computed as a residual.
The last M of the 2*M data values are fractions of events correctly
forecast, one for each probability category.  Probabilities for the
categories may be assigned in a file when scores are computed.  Otherwise,
the probabilities are computed internally with equal values assigned to
categories 2 through M-1.  Values of 0 and 1, respectively, are assigned to
categories 1 and M.

PBS_WWX is a single threshold probability verification using the Partitioned
Brier (PB) score for HPC's winter weather forecasts.  The threshold is given
following a slash postfixed to the statistic type identifier.  The data
record contains these eight values:

    DATA Field   Contents
    1            The number of verification events
    2            Fraction of events with low risk forecast
    3            Fraction of events with moderate risk forecast
    4            Fraction of events with high risk forecast
    5            Fraction of events >= threshold, no risk forecast
    6            Fraction of events >= threshold, low risk forecast
    7            Fraction of events >= threshold, moderate risk forecast
    8            Fraction of events >= threshold, high risk forecast

The probabilities of exceeding the threshold for each risk category are 0,
.25, .5, and 1.0 for no, low, moderate, and high risk, respectively.  These
default values may be overridden by specifying them in a file named
pbs_wwx.prob.  The file must have the following structure:

    HEADER RECORD - can be anything
    NO_RISK_PROBABILITY 0.0
    LOW_RISK .25
    MODERATE_RISK .50
    HIGH_RISK 1.00

With the data values and these probabilities, the PB score can be computed
using the following equation:

    PBS = SUM [ n(r) * ( f(r)**2 - 2*f(r)*p(r) + p(r) ) ]

where n(r) is the number of events in category r divided by the total number
of events over all categories, p(r) is the number of events in category r
observed to exceed the threshold divided by the total number of events in
category r, f(r) is the forecast probability of exceeding the threshold
associated with the risk category r, and the summation is over risk
categories, r.  The perfect PBS is zero, the worst possible PBS is 1.

The observed frequency or probability of observations exceeding the
threshold (p(r)) can be displayed for each risk category.  Skill scores like
those computed from FHO values can be computed from the PBS_WWX numbers.
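
The PB score equation above is simple arithmetic once n(r), p(r), and f(r)
are in hand.  The following sketch (Python, illustration only; pbs_score is
a hypothetical name and is not an fvs routine) evaluates it for a set of
risk categories.

    # Illustrative sketch only: evaluate the PB score from per-category
    # values.  n[r] is the fraction of all events in risk category r, p[r]
    # is the observed frequency of exceeding the threshold in category r,
    # and f[r] is the forecast probability assigned to category r.
    def pbs_score(n, p, f):
        return sum(nr * (fr ** 2 - 2.0 * fr * pr + pr)
                   for nr, pr, fr in zip(n, p, f))

    # With the default probabilities f = [0.0, .25, .50, 1.0] and, say,
    # n = [0.70, 0.15, 0.10, 0.05] and p = [0.02, 0.20, 0.50, 0.90],
    # pbs_score(n, p, f) is about 0.068.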
ACORWG is composed of five numbers, the first being the data count.

RMDIF is composed of five numbers: the data count, RMS (f-a), MEAN (f-a),
RMS (f-c), and MEAN (f-c), where f is forecast, a is analysis, and c is
climatology.

The Relative Location Error (RLE) records a displacement of the forecast
relative to the observation.  Descriptor field element 6 gives the direction
from the point of observation to the forecast location using one of the
following designations:

    N   - north
    NNE - north northeast
    NE  - northeast
    ENE - east northeast
    E   - east
    ESE - east southeast
    SE  - southeast
    SSE - south southeast
    S   - south
    SSW - south southwest
    SW  - southwest
    WSW - west southwest
    W   - west
    WNW - west northwest
    NW  - northwest
    NNW - north northwest
    0   - 0 error

RLE is usually followed by /n where n quantifies the forecast and
observation.  The data field contains the following:

    DATA Field   Contents
    1            The number of reports contributing to the record
    2            The relative location error in kilometers
    3            The latitude of the observation
    4            The longitude of the observation

The VLCEK statistic type denotes a group of statistical values preceded by a
data count value of 1.  The values, in the assumed order, are:

    S1  BIAS  SDERR  RMSE  AVGOB  SDOB

These parameter names are defined in the list above.  All are "passive"
parameters.  The VLCEK data base begins at 1200 UTC on 1 October 1977 and
ends at 1200 UTC on 31 December 1999, with the possibility that it may be
continued into the year 2000.  The data are roughly monthly values generated
by the SUMAC program.

The passive statistic type, GDET, is defined for legacy deterministic
statistics.  The data record consists of the following eight values: pattern
anomaly correlation for waves 1--3 (PAC/1-3), PAC/4-9, PAC/10-20, PAC/1-20,
RMSE for the actual forecast, bias (mean error) for the actual forecast,
RMSE for the forecast anomaly, and bias for the forecast anomaly.  These
values follow a data count value of 1; thus, the data record has nine
values.

Two passive statistic types are defined to support display of ROC diagrams
for ensembles.  The PODR and FARR statistic types are the probability of
detection and false alarm rate, respectively.  Each data record consists of
a list of probabilities, one for each ensemble member, following a data
count value of 1.  The data depth of a record currently permits 23 members.
The ROC diagram is displayed as a scatter plot using code 9002 to turn the
data fields into two traces, one for PODR and one for FARR.

The RMSESP statistic type is defined for comparing ensemble spread to the
RMS error of the ensemble mean.  Each data record consists of three values:
1) the data count = 1, 2) the root-mean-squared error of the ensemble mean,
and 3) the ensemble spread.  This is a passive statistic type.

The LRPS statistic type accommodates legacy Ranked Probability scores.  The
data record consists of three values: 1) the data count = 1, 2) the ranked
probability score, and 3) the climatological ranked probability score.  This
is a passive statistic type.

The MTRK and TTRK statistic types denote means associated with storm track
position and intensity errors.  The data field has the following contents:

    DATA Field   Contents
    1            Number of contributing storms, usually 1
    2            Latitude (-90 -> +90) of observed position
    3            Longitude (-180 -> +180) of observed position
    4            Direction (0 -> 360) from north at observed position to
                 vector pointing along great circle arc toward forecast
                 position
    5            Distance in km along great circle arc from observed to
                 forecast position
    6            Distance (5) squared
    7            X-component (km) of vector position error in standard
                 meteorological coordinates
    8            Y-component (km) of vector position error in standard
                 meteorological coordinates; direction(4) = arctan (x/y)
                 and distance squared(6) = x**2 + y**2.
    9--13        SL1L2 norms for central pressure (mb)
    14--18       SL1L2 norms for maximum wind speed (m/s)
                 -End of MTRK Record-
    19--23       SL1L2 norms for 850 vorticity maximum (/s)

Header field 8 : parameter identifier

    APCP/12   12-h Accumulated total precipitation
    APCP/24   24-h Accumulated total precipitation
    APCP/nn   nn-h Accumulated total precipitation
    BPCP      Precipitation covering broken area
    CFR       Cloud fraction
    CPCP/12   12-h Convective precipitation
    CPCP/24   24-h Convective precipitation
    ER        Excessive Rainfall (HPC)
    H         Height above ground level
    HI        Heat Index
    HIAVG     Daily mean heat index
    HIMAX     Daily maximum heat index
    HIMIN     Daily minimum heat index
    IA/xx     Ice accumulation over xx hours (inches)
    Knnn      NCEP parameter GRIB type nnn
    Kxxxxx    5-character NCEP (Russ Jones) identifier
    PMSL      Sea level pressure
    PxxI      xx-h Accumulated total precipitation (inches)
    Q         Specific humidity
    QPF       Quantitative Precipitation Forecast
    RH        Relative humidity
    S         Snowfall
    SF        Snowfall
    SF/xx     Snowfall over xx hours (inches)
    SLP       Sea level pressure
    SPCP/12   12-h Grid scale precipitation
    SPCP/24   24-h Grid scale precipitation
    T         Temperature (sensible)
    TV        Virtual temperature
    U         U wind component
    V         V wind component
    VWND      Vector wind
    WDIR      Wind Direction
    WSPD      Wind Speed
    Z         Height
    ZR        Freezing Rain

Note that accumulation or averaging periods follow the parameter name with /
as the separator.  For records using the MTRK and TTRK statistic types, the
parameter identifies the storm.

Header field 9 : level identifier

    Bx-y      Constant pressure depth boundary layer
    Dx-y      Depth
    Hx-y      Height above ground level
    Px-y      Pressure
    Sx-y      Sigma
    Tx-y      Potential temperature
    Zx-y      Height
    ATMOS     Entire atmosphere
    FRZDN     Lower freezing level
    FRZUP     Upper freezing level
    MSL       Mean Sea Level
    MWND      Maximum wind
    SFC       Surface
    TROP      Tropopause

where x-y gives the bounding values of the levels for a layer.  If -y is not
given, then a single level value is specified.  For B, D, H, P, S, T, and Z,
either x or x-y must ALWAYS be specified.  For records using the MTRK and
TTRK statistic types, the level identifier is reserved; it is the single
letter X.

Data field : count followed by data value(s)

For data combination, the count will always multiply the data value before
summing.  The counts will be summed also.
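
The combination rule stated above amounts to a count-weighted average.  A
minimal sketch follows (Python, illustration only; combine is a hypothetical
name, not an fvs routine).

    # Illustrative sketch only: combine records of the same statistic by
    # multiplying each data value by its count before summing, then summing
    # the counts.  combine is a hypothetical name.
    def combine(records):
        """records: list of (count, value) pairs.  Returns (count, value)."""
        total = sum(n for n, _ in records)
        combined = sum(n * v for n, v in records) / total
        return total, combined

    # Two records with counts 3600 and 1800 and values 94.32 and 90.00:
    # combine([(3600, 94.32), (1800, 90.00)]) returns (5400, 92.88).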