Nonresponse in Household Travel Surveys

3. STATISTICAL METHODS FOR REDUCING THE IMPACT OF NONRESPONSE

3.1 CHAPTER SUMMARY

This chapter describes statistical methods for reducing the impact of nonresponse on survey estimates. Although the best way to reduce the impact of nonresponse is to achieve high response rates in the first place, the methods described here can help reduce the effects of any remaining nonresponse. Unless post-survey adjustments are made, estimates from the survey are likely to be distorted by nonresponse. The magnitude of the bias introduced by nonresponse depends on the proportion of the sample for whom data are not obtained and on how much the respondents differ from the nonrespondents. Two forms of nonresponse are usually distinguished: unit nonresponse (in which no data are obtained at all) and item nonresponse (in which one or more items are missing from an otherwise completed interview or questionnaire). The statistical consequences of unit and item nonresponse are the same; both can reduce the sample size and introduce bias. However, the most effective methods for compensating for the two forms of nonresponse differ.

STATISTICAL METHODS

Two statistical methods can be used to reduce the effects of unit nonresponse: nonresponse weighting adjustments and post-stratification. Nonresponse adjustments should always be made. Post-stratification should be used when accurate population figures are available.

The main tool for reducing the effects of item nonresponse is imputation. It is useful to distinguish "logical" imputation (in which the value of a missing item is inferred from other data) and statistical imputation (in which a statistical procedure is used to predict a value for the missing item). Logical imputation, or editing, should be carried out before statistical imputation is used. "Hot deck" imputation is an especially useful statistical method for imputing missing values in travel surveys. In this form of imputation, similar cases are grouped into cells. Missing values are replaced using the value provided by a "donor" from the same cell; the donor is simply another sample member who provided an answer to the relevant item.
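To make the mechanics of cell-based hot-deck imputation concrete, here is a minimal sketch in Python. The records, cell variables, and field names are hypothetical illustrations, not taken from the report:

```python
import random

def hot_deck_impute(records, cell_keys, target, seed=0):
    """Fill missing values of `target` by borrowing the value of a random
    "donor" record from the same imputation cell (records that match on
    all of the `cell_keys` variables)."""
    rng = random.Random(seed)
    # Collect donor values (cases that answered the item), grouped by cell.
    donors = {}
    for r in records:
        if r[target] is not None:
            cell = tuple(r[k] for k in cell_keys)
            donors.setdefault(cell, []).append(r[target])
    imputed = []
    for r in records:
        r = dict(r)  # do not modify the input records
        if r[target] is None:
            cell = tuple(r[k] for k in cell_keys)
            if cell in donors:  # leave missing if the cell has no donor
                r[target] = rng.choice(donors[cell])
        imputed.append(r)
    return imputed

records = [
    {"hh_size": 2, "area": "urban", "trips": 4},
    {"hh_size": 2, "area": "urban", "trips": 6},
    {"hh_size": 2, "area": "urban", "trips": None},  # recipient case
    {"hh_size": 4, "area": "rural", "trips": 2},
]
filled = hot_deck_impute(records, ["hh_size", "area"], "trips")
print(filled[2]["trips"])  # a donor value from the same cell: 4 or 6
```

In practice the imputation cells would be formed from variables related to the missing item (here, household size and area type), so that the donor resembles the recipient.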

3.2 TYPES OF NONRESPONSE

As we noted in Chapter 1, most discussions of nonresponse distinguish between two forms of nonresponse: unit nonresponse and item nonresponse [1]. Unit nonresponse refers to the failure to obtain questionnaires or data collection forms (such as the travel diaries used in many personal transportation surveys) from a member of the sample. Item nonresponse refers to the failure to obtain a specific piece of information from a responding member of the sample. The term item nonresponse is often used interchangeably with "missing data."

Although their causes are different, unit and item nonresponse have identical statistical consequences. Unless adjustments are made to the data, the level of nonresponse bias will depend on only two factors: the proportion of the sample for whom data were not obtained and how much the respondents differ from the nonrespondents. It does not matter whether the data are missing because a sample member never responded to the survey at all or because the respondent failed to answer a specific question. However, different methods are typically used to reduce the impact of the two forms of nonresponse.

UNIT NONRESPONSE

Unit nonresponse refers to complete nonparticipation by a member of the sample. In travel surveys, data are usually collected for both households and persons. There are, therefore, two types of unit nonresponse: household nonresponse, in which no data are obtained from any member of a sample household, and person nonresponse, in which one or more eligible members of a participating household fail to provide data.

These two types of unit nonresponse are equivalent for surveys that select only one individual from each sample household. Because most personal transportation surveys collect data on every household member, in the rest of this report we assume that all eligible household members are asked to participate.

For many variables, person-level analyses are appropriate. This results in a straightforward pattern of unit nonresponse, since it is relatively easy to decide whether a person is to be considered a respondent. However, if household-level variables are of interest, it may be unclear how to treat a household when some but not all persons from that household provide data. In Chapter 1, we recommend treating the household as a respondent if the majority of its eligible members are classified as respondents and if a set of critical items is available for the household. Even if the household is considered a respondent, some items could still be missing due to item nonresponse.

ITEM NONRESPONSE

The second type of nonresponse is item nonresponse: items are missing from an interview or questionnaire that the respondent otherwise completed. The data could be missing because the respondent refused to answer (e.g., a question on income), because the interviewer failed to ask the question (e.g., he or she may have skipped a question accidentally), or because the respondent simply missed the question (e.g., he or she may have forgotten to complete the back of one page of a diary).

Missing trips. When one or more trips or activities are omitted from a travel diary, those trips can be treated as missing data. Such omissions are often difficult to spot unless the surrounding trips form a chain: if the ending point of one reported trip differs from the starting point of the next, at least one trip between them is probably missing. If undetected, this type of item nonresponse will result in underestimates of the number and distance of trips.

Missing items about a trip or activity. Missing items could include travel times, trip purpose, or the mode of transportation. It may be possible to fill in some or all of the missing information through careful editing of the data; such editing involves looking at the context in which the trip occurred. Editing will, however, often fail to remove all of the item nonresponse within a reported trip or activity.

3.3 IMPACT OF NONRESPONSE

Nonresponse has two negative consequences for the quality of the estimates derived from the survey. First, it can reduce the number of cases for which data are available, making the survey statistics less precise. In many surveys, however, the target sample size is fixed, and additional cases may be fielded to compensate for those lost through nonresponse. Second, and more important, substantial bias can occur if the nonrespondents differ from the respondents on the characteristics of interest. This bias cannot be corrected by fielding additional cases.

REDUCED SAMPLE SIZE

Reductions in sample size due to nonresponse have a direct effect on the variability of the statistics derived from the survey data. Unit nonresponse rates are, as a rule, higher than item nonresponse rates, but the two compound each other in reducing the sample size. This point is illustrated in Table 3.1. For example, if 1,000 responses were sought and there was no nonresponse at all, the standard error for a proportion with a mean of 50 percent would be 0.0158. However, if the unit response rate were 70 percent and the item response rate were 90 percent, we would have only 630 complete responses for a particular item (variable), giving a standard error of 0.0199, an increase of 26.0 percent. If the item response rate fell further, to 80 percent, the standard error for the same variable would rise to 0.0211, an increase of 33.6 percent due to nonresponse. Thus both unit and item nonresponse contribute to higher standard errors, that is, to decreased precision in the estimates from the survey.

Table 3.1 Impact of Nonresponse on Sample Size and Standard Errors
Note: S.E. refers to the standard error of an estimate.
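The standard errors in Table 3.1 follow from the usual formula for a proportion under simple random sampling, SE = sqrt(p(1 - p)/n). A short Python sketch reproduces the figures cited above:

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

se_full = se_proportion(0.5, 1000)  # no nonresponse
se_630 = se_proportion(0.5, 630)    # 70% unit x 90% item response
se_560 = se_proportion(0.5, 560)    # 70% unit x 80% item response

print(round(se_full, 4))  # 0.0158
print(round(se_630, 4), round(100 * (se_630 / se_full - 1), 1))  # 0.0199 26.0
print(round(se_560, 4), round(100 * (se_560 / se_full - 1), 1))  # 0.0211 33.6
```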

INTRODUCTION OF BIAS

The most important effect of nonresponse is the bias it produces. Nonresponse has an impact similar to that of excluding a portion of the population from the sampling frame. In both cases, a possibly nonrandom portion of the population is omitted from the study. This creates the potential for bias, and the effects of this bias can be substantial. Because of the omission of the nonrespondents, the sample no longer represents the entire population of interest. If the analysis ignores the effects of nonresponse, it implicitly assumes that nonrespondents do not differ systematically from the respondents on any characteristic of interest. To the extent that the nonrespondents do differ from the respondents, the results will be biased.

Differences between respondents and nonrespondents. Nonrespondents are rarely randomly distributed in the survey population. As we saw in Chapter 2, response rates vary widely across population subgroups, and the survey variables are often associated with the characteristics of these subgroups. For example, nonresponse rates are usually much higher in cities than in the suburbs, and preferred modes of transportation can differ in the two types of settings. Because they underrepresent city dwellers, travel surveys may underestimate the number of trips made by bus or subway. In one study, nonresponse led to the underestimation of the size of certain segments of the study population by 50 percent [2].

Nonrespondents in transportation surveys. In some respects, nonrespondents in transportation surveys have characteristics similar to those of nonrespondents in other national surveys. They are more likely than respondents to live in densely populated areas, to have low levels of education, to be elderly, and to have visual or hearing difficulties that prevent them from completing surveys. In addition, language barriers for those whose native language is not English, mental and physical handicaps, and poor literacy skills can contribute to nonresponse. The 1995 NPTS pretest found that 4 percent of respondents age 18 and older used a proxy because of language problems.

In addition, members of other groups may be more prone to become nonrespondents in transportation surveys than in other surveys. For example, the elderly may feel that because they leave the house less frequently than younger persons, their data are not needed for transportation surveys (see Chapter 2).

Level of nonresponse bias. For means (and proportions), the magnitude of the bias resulting from nonresponse depends on the response rate (RR, as defined in Chapter 1) and the difference between the means (or proportions) for the respondents (Xbar_R) and nonrespondents (Xbar_NR) [1]:

Bias = (1 - RR) x (Xbar_R - Xbar_NR)

Although there is no hard-and-fast rule about when nonresponse bias represents a serious threat to a survey, many national surveys carried out for the federal government achieve response rates of 80 percent or higher. When the response rate is low (e.g., less than 60 to 70 percent), the potential for bias is high; the mean for the respondents will be a good estimate of the mean for the whole population only if the respondents and nonrespondents are similar. If that assumption is wrong, the unadjusted mean will be a biased estimate for the whole population.

It is important to realize that the validity of doing nothing at all about nonresponse rests on the (often implausible) assumption that the respondents and nonrespondents do not differ. Every compensation procedure, including doing nothing, rests on assumptions about the characteristics of the nonrespondents, but some assumptions are more reasonable than others. In estimating means or totals, the simplest assumption is that the mean for the respondents is equal to the mean for the nonrespondents. An alternative assumption is that the missing data are missing at random; under this assumption, the respondents and nonrespondents may differ in any given sample, but across all possible samples the means for the two groups are the same. Unfortunately, these simple assumptions are rarely tenable in practice. When they are incorrect, even relatively low levels of nonresponse (10 to 20 percent) can produce significant bias. Table 3.2 illustrates how the bias arises. It shows how a proportion estimated from survey data (e.g., the proportion of the study population using public transportation regularly) can be distorted by nonresponse.

When 20 percent of the sample become nonrespondents, and 20 percent of the respondents have some characteristic of interest versus 50 percent of the nonrespondents, the sample estimate will be 20 percent (based on the respondents alone) but the unbiased estimate (based on both the respondents and nonrespondents) will be 26 percent, a difference of 6 percentage points. (The unbiased estimate for the entire population combines .20 for the 80 percent of the population represented by the respondents with .50 for the 20 percent represented by the nonrespondents: .80 x .20 + .20 x .50 = .26.) This margin of error, reflecting the nonresponse bias, is likely to be several times larger than the random sampling error. More important, the error produced by nonresponse will not be random.

Table 3.2 Nonresponse Bias, by Level of Nonresponse
Note: Bias is the difference between the expected value of a sample statistic and the population characteristic it is intended to estimate.
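The arithmetic behind these figures is simply a response-rate-weighted mixture of the respondent and nonrespondent proportions. A small Python sketch, using the example values above:

```python
def overall_proportion(rr, p_resp, p_nonresp):
    """True population proportion as a mixture of the respondent and
    nonrespondent proportions, weighted by the response rate."""
    return rr * p_resp + (1 - rr) * p_nonresp

true_p = overall_proportion(0.80, 0.20, 0.50)
bias = 0.20 - true_p  # respondent-based estimate minus the true value
print(round(true_p, 2), round(bias, 2))  # 0.26 -0.06
```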

3.4 SURVEY METHODS

This section discusses various survey procedures intended to minimize the impact of nonresponse.

AVERTING NONRESPONSE

The best single method for reducing the effects of nonresponse is to have as little nonresponse as possible in the first place. The methods recommended in Chapter 2 all help achieve high response rates, thus minimizing the potential effects of nonresponse bias on survey estimates. Still, every survey incurs some nonresponse, and the effects of nonresponse at the different stages of data collection compound. For example, if 90 percent of the sample households are successfully screened, 90 percent of those return a completed diary, and the diaries include information about 90 percent of the trips on average, then the cumulative response rate across all three stages is only about 73 percent (.9 x .9 x .9 = .729). Despite high response rates at each stage, there is still room for considerable bias. Thus, other procedures are likely to be useful in reducing the impact of nonresponse even when the response rate is quite high.
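The cumulative rate in the example above is just the product of the stage-by-stage response rates, which can be checked directly:

```python
import math

# Response rates at three stages: household screening, diary return,
# and trip reporting within returned diaries (example values from the text).
stage_rates = [0.90, 0.90, 0.90]
cumulative = math.prod(stage_rates)
print(round(cumulative, 3))  # 0.729
```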

KEEP NONRESPONSE RATES AS LOW AS POSSIBLE

Recommendation: Use the methods discussed in Chapter 2 to keep nonresponse to a minimum. The bias from nonresponse is related to the cumulative nonresponse rate, taking into account both unit and item nonresponse.

FOLLOWING UP WITH A SUBSAMPLE OF NONRESPONDENTS

One way to reduce the bias resulting from unit nonresponse is to achieve a high response rate among a subsample of the cases who remain nonrespondents near the end of the regular data collection period. This approach gives some representation in the final sample to the pool of potential nonrespondents.

For example, suppose some combination of callbacks, follow-up letters, and incentives has produced a 60 percent response rate. This means that 40 percent of the cases are, at this point in the data collection effort, still nonrespondents. Rather than retaining all of the nonrespondents for further follow-up, it may be better to select a subsample of one case in four and subject this subsample, representing 10 percent of the original cases, to more intensive follow-up efforts than would otherwise be possible. These more intensive efforts might include, for example, larger incentives, additional contact attempts, or a switch to in-person visits.

Suppose for the sake of illustration that 60 percent of the subsample selected for additional follow-up ultimately provide data. From a statistical point of view, this is equivalent to achieving a response rate of 84 percent within the entire sample. To understand why this is so, let us consider the initial sample as consisting of two strata, or subgroups. The first stratum, encompassing 60 percent of the cases fielded originally, includes those members of the population who require only standard efforts to be reached and persuaded to take part in the survey. The second stratum, encompassing the remaining 40 percent of the population, includes all of those who require additional efforts.

The response rate within the first stratum was 100 percent and the response rate within the second stratum was 60 percent; thus, the final sample respondents represent 84 percent of the population that the original sample was selected to represent (.60 x 1.00 + .40 x .60 = .84). This calculation assumes that the subsample is a random sample of the initial nonrespondents and that the remainder of this group, that is, the portion not selected for further follow-up, is dropped.
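The two-stratum calculation above can be written as a one-line formula; the Python sketch below uses the example's figures:

```python
def effective_response_rate(initial_rr, followup_rr):
    """Effective response rate after subsampling nonrespondents: the
    'standard effort' stratum is fully represented, and the 'additional
    effort' stratum is represented at the follow-up response rate."""
    return initial_rr * 1.0 + (1 - initial_rr) * followup_rr

print(round(effective_response_rate(0.60, 0.60), 2))  # 0.84
```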

Subsampling of nonrespondents is complicated, and it means that the final data set must be weighted. Because of these added costs, the strategy is only useful when two conditions are met: the potential nonresponse bias must be large (because the response rate is low or the respondents and nonrespondents differ sharply), and the more intensive follow-up efforts must be able to achieve a high response rate within the subsample.

It may be difficult to tell whether these conditions are met. If standard follow-up efforts are yielding few additional cases and the projected final response rate is low, subsampling may be worth trying.

CONCENTRATE RESOURCES ON A SUBSAMPLE

Recommendation: When the potential nonresponse bias is large (because response rates are low or respondents and nonrespondents differ sharply), select a subsample of nonrespondents for intensive follow-up efforts. This method is useful when response rates can be increased by concentrating resources.

3.5 STATISTICAL METHODS

This section describes the three main procedures used to compensate for nonresponse in surveys: nonresponse weighting, post-stratification, and imputation [1,3].

WEIGHTING

Need for weights. Data from surveys often require the use of weights to produce unbiased population estimates. Weights are typically applied for three main purposes. First, weights are often needed to compensate for differences in the selection probabilities of individual cases. Such differences can arise by design: a study may deliberately overrepresent one or more subgroups of a population in order to provide enough cases for separate analyses of the oversampled subgroups. For example, a regional transportation study may oversample a smaller jurisdiction to provide enough cases to allow separate estimates to be made for that jurisdiction. Differences in selection probabilities may also arise as an unintended byproduct of the sampling strategy. For example, in telephone samples, households with multiple telephone lines have more chances of being selected into the sample than households with a single line. Either way, estimates for the entire sample will be biased unless the data are appropriately weighted; weights are needed to compensate for both deliberate and inadvertent departures from equal-probability sampling. In the transportation literature, weighting is sometimes referred to as factoring.

Another purpose for weighting is to compensate for subgroup differences in response rates. Even if the sample as selected represents the larger population perfectly, differences in response rates can introduce systematic discrepancies between the population and the sample. For example, in personal transportation surveys, household size may be related to the probability that households will provide the required information (see Section 2.2 in Chapter 2 for a detailed discussion of characteristics related to nonresponse). Differences in response rates across subgroups of the sample can introduce bias into the results. Weighting adjustments can reduce such biases.

A final purpose of weights is to compensate for chance fluctuations away from known population totals. For instance, if one area were overrepresented in a travel survey sample purely by chance, data from the decennial census or the Current Population Survey (CPS) could be used to adjust for this departure from the population distribution. In addition, adjusting the data to known population totals can help reduce the impact of undercoverage (e.g., the omission of persons in households without telephones) on survey estimates.

CALCULATING WEIGHTS

Weights are often calculated in three steps [4]. The first step compensates for differences in selection probabilities; the second, for subgroup differences in response rates; and the third, for differences between the composition of the sample and the composition of the population.

Step 1: the base weight. Typically, the initial, or base, weight (W1i) for a case (e.g., a sample household) is calculated as the inverse of that case's selection probability (Pri):

W1i = 1/Pri

All eligible selections, whether they complete the survey or not, should receive a base weight. The selection probability (or sampling rate) is the proportion of the population selected for the study. In a full random-digit dial (RDD) sample survey, the sampling unit is a telephone number, and the selection probability is the percentage of possible numbers within the study area that were actually included in the sample. This total will include both nonworking and nonresidential numbers. Consider, for example, a study of Queens, New York. Suppose Queens encompasses 150 distinct exchanges (e.g., 753-xxxx). Since each exchange includes 10,000 possible numbers (0000 through 9999), the total population of possible numbers is 1,500,000. If 1,500 of these numbers are selected (each with the same probability), then each has a selection probability of .001 (1,500/1,500,000).

Both the sample and the population from which it was drawn include nonworking and nonresidential numbers in addition to the residential numbers that are actually eligible for the study. The sampling rate is the same for the subset of residential numbers as for the overall set of possible numbers; it is 1 in 1,000, or .001. The presence of ineligible units (such as nonworking numbers) in the sample does not affect the calculation of the weights for the eligible units. In some types of telephone sampling, selection is restricted to certain subgroups of the possible telephone numbers. When the sample is restricted in this way, the population of possible numbers will be smaller than 10,000 times the number of exchanges in the study area.
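Under the assumptions of the Queens example (150 exchanges of 10,000 possible numbers each, 1,500 selections), the base weight calculation is:

```python
def base_weight(n_selected, n_population):
    """Base weight W1 = 1/Pr, the inverse of the selection probability."""
    selection_prob = n_selected / n_population
    return 1.0 / selection_prob

possible_numbers = 150 * 10000  # 150 exchanges x 10,000 numbers each
print(1500 / possible_numbers)              # selection probability: 0.001
print(base_weight(1500, possible_numbers))  # base weight: 1000.0
```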

In stratified sample designs, the population is first divided into subgroups called "strata," and separate samples are selected within each one. Often different sampling probabilities are used within the different strata. For example, the study area might be divided into areas where different modes of transportation are used more often; high-density urban areas might be sampled at a higher rate to increase the number of persons who make trips by walking, riding a bicycle, or taking public transit. Or the sampling rates may be set to ensure that separate estimates can be made for different jurisdictions. If the telephone numbers linked to different areas are sampled at different rates, then a separate selection probability must be computed for each area. Table 3.3 shows an example of how this might be done.

Table 3.3 Calculation of Initial Weights in a Stratified RDD Sample

Note: Both selections and population represent telephone numbers. W1 represents the initial weight for sample cases.

Table 3.3 displays a stratified sample of telephone numbers in New York City, with separate sampling rates in each of the five boroughs. Continuing our earlier example, we have assumed that, in Queens County, 1,500 numbers were included out of 1.5 million possible numbers in the exchanges that serve Queens. That gives a selection probability of .001 for each number and a base weight of 1,000.

In an RDD survey, this base weight should then be adjusted to compensate for the fact that people in households with multiple telephone lines have more than one chance of being selected into the sample. The standard adjustment is quite simple: the base weight for household i (1/Pr1i) is divided by the number of distinct telephone lines (ti) in the household:

W1i' = 1/(Pr1i x ti)

To carry out this adjustment, it is necessary to add questions to the interview to determine how many distinct telephone lines the household has (excluding lines dedicated to faxes or modems). For example, if a household in Queens County reports 3 telephone lines, then the adjusted weight will be 333.33 (= 1/(0.001 x 3)).
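The multiple-line adjustment can be sketched in Python, using the values from the Queens illustration:

```python
def line_adjusted_weight(selection_prob, n_lines):
    """Adjusted base weight W1' = 1/(Pr x t) for a household
    with t distinct telephone lines."""
    return 1.0 / (selection_prob * n_lines)

# Queens household with selection probability .001 and 3 telephone lines.
print(round(line_adjusted_weight(0.001, 3), 2))  # 333.33
```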

In a survey in which households are first screened and then subsampled for the main data collection, the base weight should reflect the selection probabilities at both phases: selection into the screening sample and retention for the main sample:

W1i' = 1/(Pr1i x Pr2i)

in which Pr1i represents the case's probability of inclusion in the screening sample and Pr2i its probability of retention in the main sample.

If the weights have been properly calculated, their sum represents an estimate of the size of the population from which the sample is selected. For example, in each stratum in Table 3.3, the weights sum to the stratum population size.

Recommendation: COMPENSATE FOR DIFFERENCES IN SELECTION PROBABILITIES BY WEIGHTING THE DATA - Each case should receive a base weight equal to the inverse of its selection probability. In a telephone sample, the weight should also reflect the number of lines to which a household is linked.

Step 2: compensating for nonresponse. The base weight (W1 or W1') should then be adjusted to compensate for the effects of nonresponse. Nonresponse adjustments ensure that the sum of the weights is unaffected by nonresponse; they do this by reallocating the weights assigned to the nonrespondents among the respondents. In addition, the nonresponse adjustments can reduce the bias introduced by nonresponse by compensating for differences in nonresponse rates across subgroups of the sample.

Nonresponse adjustments are calculated by grouping cases into nonresponse adjustment cells and finding the weighted response rate for the cases in each cell. In our hypothetical survey of New York City, the adjustment cells might be the five boroughs. For each cell, the weighted response rate is computed using the procedures described in the first chapter of this report. For cases in cell j (e.g., telephone numbers in exchanges linked to the Bronx), the weighted response rate (Rj) is:

Rj = (sum of W1i over the nrj respondents in cell j) / (sum of W1i over the nej eligible cases in cell j)

in which the numerator is the sum of the weights for the respondents in cell j (and nrj is the number of respondents in that cell) and the denominator is the sum of the weights for all eligible cases in that cell (and nej is the number of eligible cases in the cell). As we noted in Chapter 1, the number of eligibles may have to be estimated if there are cases for whom eligibility could not be ascertained.

The adjusted weight (W2) is the base weight divided by the weighted response rate (Rj):

W2i = W1i / Rj

For nonrespondents and ineligible cases, the adjusted weight is set to zero. The sum of the adjusted weights for the respondents in cell j should equal the sum of the base weights for the eligible cases in that cell.

Table 3.4 illustrates the calculation of adjusted weights for our hypothetical sample in New York City. The table shows the weighted numbers of eligible cases and respondents in each borough, the response rate, and the adjusted weight, which incorporates an adjustment for nonresponse. For example, in Queens County, 700 of the selected telephone numbers turned out to be eligible for the study. The sum of the weights for these cases was 700,000 (700 cases, each with a weight of 1,000). The sum of the weights for the 560 respondents was 560,000. The weighted response rate was, therefore, 0.8 (560,000/700,000). The adjusted weight for each respondent from Queens is 1,250 (= 1,000/0.8); each of the nonrespondents receives a weight of 0. The adjusted weights sum to the same total as the initial weights for the eligible cases (700,000 = 700 x 1,000 = 560 x 1,250). In this example, all the cases in each adjustment cell have the same initial weight; in most surveys, different cases would begin with different initial weights.

Table 3.4 Calculation of Adjusted Weights in a Stratified RDD Sample

Note: The numbers given in the second and third columns are weighted (using W1, the initial weight for sample cases); the parenthetical entries in those columns are raw numbers of eligible and responding cases, respectively. W2 is the adjusted weight.
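The cell-level adjustment can be sketched in Python using the Queens figures from the example (a single adjustment cell, for simplicity):

```python
def nonresponse_adjusted_weights(base_weights, responded):
    """Within one adjustment cell, divide each respondent's base weight
    by the cell's weighted response rate; nonrespondents get weight 0."""
    rate = sum(w for w, r in zip(base_weights, responded) if r) / sum(base_weights)
    return [w / rate if r else 0.0 for w, r in zip(base_weights, responded)]

# Queens cell: 700 eligible numbers with base weight 1,000; 560 respond.
base = [1000.0] * 700
responded = [True] * 560 + [False] * 140
adjusted = nonresponse_adjusted_weights(base, responded)
print(round(adjusted[0], 2))  # 1250.0
print(round(sum(adjusted)))   # 700000, same total as the base weights
```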

Ideally, adjustment cells should be formed using variables that are related both to the likelihood of nonresponse and to the substantive variables of interest in the survey (such as travel behavior). Often, however, the choices are quite limited because so little is known about the nonrespondents and because both respondents and nonrespondents must be classified into adjustment cells. For example, in a telephone survey, the only information available for the nonrespondents may be their area code and exchange (and any geographic information that can be inferred from these). Thus, the nonresponse adjustment cells have to be formed using whatever information happens to be available for the nonrespondents.

When there are two phases of data collection (a screening phase and a main interview phase), separate nonresponse adjustments should be calculated for each phase. The same adjustment cells need not be used in both phases; in fact, the screening data are generally useful for forming adjustment cells to compensate for nonresponse to the main data collection. If R1j denotes the weighted response rate in the first phase of data collection and R2k the response rate in the second phase, then the adjusted weight would be:

W2i = W1i / (R1j x R2k)

for the respondents and zero for the nonrespondents.

Recommendation: COMPENSATE FOR DIFFERENCES IN NONRESPONSE RATES BY ADJUSTING THE BASE WEIGHT - The base weights should be adjusted for nonresponse. If data are collected in two phases, separately calculate nonresponse adjustments for each phase.

Step 3: post-stratifying to population estimates. As we noted, the sum of the weights represents an estimate of the size of the survey population. Sometimes independent estimates of the size of the population are available (for example, from decennial census data). A technique called post-stratification can be used to bring the survey weights into agreement with these outside population figures. Post-stratification is used to correct for two types of errors in survey estimates: random sampling error and coverage error. Random sampling error refers to chance departures of the sample from the population it is selected to represent. Post-stratification can be expected to reduce random sampling error when the population estimate is derived from the decennial census or from a survey with a much larger sample than the one used in the transportation survey being weighted. Coverage errors refer to systematic problems in who is included in or excluded from the sample. Post-stratification can be expected to reduce the effects of coverage error when the population estimate gives better coverage of the population than the transportation survey sample does. For example, if a telephone survey was used to collect the data, the sample will necessarily exclude households without telephones. The two most frequent sources of figures for post-stratification are the decennial census and the Current Population Survey; both are thought to achieve much higher levels of coverage of the general population than other surveys do.

Post-stratification involves comparing the sum of the weights (i.e., W2) for a given subgroup with the population estimate for that group. For example, decennial census figures are available for age-race-sex groupings at the level of counties and minor civil divisions. The post-stratification adjustment is calculated by multiplying the adjusted weight of cases in a subgroup, say subgroup j, by the ratio between the population estimate for that subgroup (Nj) and the sum of the weights for sample cases in that subgroup:

W3 = W2 × (Nj / ΣW2), where the sum is taken over the sample cases in subgroup j.

The adjustment cells are typically defined in terms of areas (such as townships) and one or more demographic variables (such as household size). For example, in weighting the data from a household travel survey carried out in a large metropolitan area, the analysts took into account household size, the number of vehicles available, and zones (defined by response rates) within townships in the sample area [2] .

Table 3.5 illustrates the calculation of post-stratified weights with the data from our hypothetical survey of New Yorkers. The sum of the adjusted weights (W2) for Queens is 700,000; according to the 1990 census, the total number of households was about 720,149. This produces a post-stratification adjustment factor of approximately 1.029 and a final weight (W3) of 1,285.98 (= 1,250 × 720,149/700,000).

Table 3.5 Calculation of Post-Stratified Weights

Note: Population estimates are from the 1990 census and represent the number of households in each county. W2 is the adjusted weight and W3 the post-stratified weight.
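The Queens calculation can be reproduced directly. This is an illustrative sketch using the figures quoted in the text (sum of adjusted weights 700,000; 1990 census count 720,149); the function name is ours, not the report's.

```python
# Post-stratification: multiply each case's adjusted weight (W2) by the
# ratio of the census total for its cell (Nj) to the sum of W2 in that cell.

def poststratify(w2, census_total, sum_of_weights):
    """Return the post-stratified weight W3 for one case."""
    return w2 * (census_total / sum_of_weights)

factor = 720149 / 700000
print(round(factor, 3))  # 1.029

w3 = poststratify(1250.0, 720149, 700000)
print(round(w3, 2))      # 1285.98
```

Using the unrounded ratio (rather than the rounded factor 1.029) reproduces the final weight of 1,285.98 shown in Table 3.5.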

Population figures for post-stratification adjustments (the values for Nj in the equation) can be obtained from decennial census data, the CPS, or other Census Bureau estimates. Which source to use will depend on how recent the data are, whether they are based on sufficient sample sizes (in the case of the CPS), and whether they provide appropriate grouping variables.

So far, we have emphasized the calculation of household weights. But in many transportation surveys, both household-level and person-level weights should be calculated. Typically, the same initial weights would be used (since every household member is selected within sample households). The two sets of weights would, however, incorporate different nonresponse and post-stratification adjustments.

Recommendation: IMPROVE THE ESTIMATES BY ADJUSTING WEIGHTS TO KNOWN POPULATION TOTALS - Multiply the weights for the cases in a cell by the ratio of the population estimate for the cell to the sum of the weights for that cell.

ALTERNATIVE METHODS FOR WEIGHTING

Factoring to population totals. In some cases, it is possible to skip this three-step process and simply to weight up to population figures instead:

Wij = Nj/nj

In this equation, Nj represents the population total for weighting cell j and nj the number of completed cases in that cell. The population figures could be based on decennial census data, the CPS, or some other reliable source. To use this method, it is important that the sample be selected with equal probabilities within each group. Suppose, for example, that in our hypothetical survey of New York City we had used this method of weighting. If the cells used for weighting were the different boroughs, then this method would generally yield the same result as the three-step method described earlier. Note, however, that within each borough the sample would overrepresent households with multiple telephone lines. In general, this simple method of weighting ignores any differences in selection probabilities within a weighting cell. Thus, we cannot recommend this approach unless the sample was selected with equal probabilities within each cell.
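The factoring calculation is a single division per cell. This is an illustrative sketch; the Queens household count comes from the 1990 census figure quoted earlier, while the Bronx count and both completed-case counts are hypothetical.

```python
# Factoring directly to population totals: Wj = Nj / nj for every case in
# cell j. Valid only if cases were selected with equal probability within
# each cell.

population = {"Bronx": 450000, "Queens": 720149}   # Nj (Bronx figure illustrative)
completed  = {"Bronx": 300,    "Queens": 560}      # nj completed cases (hypothetical)

weights = {cell: population[cell] / completed[cell] for cell in population}
print(weights["Bronx"])  # 1500.0
```

Each completed Bronx case would then carry a weight of 1,500 households, regardless of any differences in selection probabilities within the borough, which is exactly the limitation noted above.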

More complex schemes for weighting. The method of post-stratification described earlier assumes that population estimates are available for each weighting cell. Sometimes data are available for each variable used in defining the cells but not for every combination of these variables. Figures may be available for the total number of households in each township in a county and for each household-size-by-number-of-vehicles combination, but not for the three-way combination of township by household size by number of vehicles. It is still possible for the weights to take all three variables into account using a technique known variously as multidimensional raking, iterative proportional fitting, or the Deming-Stephan procedure. A very similar procedure- the Fratar method- has been used in transportation planning to project the growth in the number of trips over time [5].

The basic principle behind the procedure is simple. The weights are adjusted to bring the survey figures into line with one set of population figures; then they are adjusted to agree with the other set of population figures. They are then readjusted to agree with the first set of figures, and so on, until the survey weights agree with both sets of population estimates.

An example will make the method clearer. Suppose we have population figures for each of four classes of household and for each of two geographic zones, but not for the cells formed by crossing the household classes with the zones. The goal is to bring the sums of the sample weights into agreement with these figures. The census figures indicate an overall total of 200,000 households in the study area.

Table 3.6 Population Data Available for Weighting Adjustment

The preliminary weights prior to any post-stratification total 180,000, distributed as shown in Table 3.7.

Table 3.7 Sums of Sample Weights by Weighting Class

Not only is the grand total off (180,000 vs. 200,000), but the row and column totals do not match the corresponding figures in Table 3.6.

The process of bringing the two sets of numbers into line starts with an adjustment to the row totals. Let Tjk designate the sum of the weights for a given cell, Tj+ the sum across the cells in row j, and T+k the total across the cells in column k. We will use superscripts to denote the different iterations of the process, with 0 representing the initial weights and totals before any adjustment to population figures. The new weight adjusted to the row population figures will simply be the old, unadjusted weight times an adjustment factor:

Tjk^(1) = Tjk^(0) × (Nj+ / Tj+^(0))

The adjustment factor (Nj+ / Tj+^(0)) is the ratio between the population figure for row j and the sum of the current weights in that row. For example, all the weights in the cell for households with no available vehicles are increased by 1.25 (= 25,000/20,000). After the application of the adjustment factor, the sum of the weights in each row equals the population figure for the row (Nj+). This is shown in Table 3.8. Unfortunately, the column totals are still off.

Table 3.8 Sums of Sample Weights after Initial Adjustment to Row Targets

Thus, the next step is to adjust the new weights to the column totals. Once again, this is done by multiplying the current weights by an adjustment factor- the ratio between the population figure for the column and the sum of the current weights for that column:

Tjk^(2) = Tjk^(1) × (N+k / T+k^(1))

That is, the weights of cases in the first column will be adjusted by a factor of about 1.11 (= 120,000/108,333) and those of cases in the second column will be adjusted by a factor of about 0.87 (= 80,000/91,667). Table 3.9 shows the results. The column totals match the targets (except for rounding error), but now the row totals are off.

Table 3.9 Sums of Sample Weights after Adjustment to Column Targets


The whole process is now repeated, starting with the sums in Table 3.9 (instead of those in Table 3.7). More generally, the weights produced in one iteration (iteration m+1) adjust those produced in the previous iteration (iteration m):

Tjk^(m+1) = Tjk^(m) × (Nj+ / Tj+^(m)) for a row adjustment, and analogously with N+k and T+k^(m) for a column adjustment.

For example, the weights in the first row of Table 3.9 would be adjusted by a factor of about 0.99 (= 25,000/25,342); those in the second row would be adjusted by a factor of about 0.97 (= 50,000/51,469); and so on.

The process generally produces only small changes after three or four iterations. This method can be used with three or more dimensions as well as with two, as in our illustration.
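The alternating row and column adjustments described above can be sketched in a few lines of code. This is an illustrative implementation, not the report's; the 2×2 table of initial weight sums and the row and column targets below are hypothetical (the report's own example uses four household classes by two zones).

```python
# Minimal iterative proportional fitting (raking) sketch. `cells` holds
# the sums of the sample weights by cell; the targets are the independent
# population figures for the row and column margins.

def rake(cells, row_targets, col_targets, iterations=10):
    """Alternately scale rows and columns until the margins match the targets."""
    for _ in range(iterations):
        for j, target in enumerate(row_targets):       # row adjustment step
            row_sum = sum(cells[j])
            cells[j] = [t * target / row_sum for t in cells[j]]
        for k, target in enumerate(col_targets):       # column adjustment step
            col_sum = sum(row[k] for row in cells)
            for row in cells:
                row[k] *= target / col_sum
    return cells

cells = [[20000.0, 30000.0], [60000.0, 70000.0]]       # initial weight sums
raked = rake(cells, row_targets=[60000, 140000], col_targets=[90000, 110000])
print([round(sum(row)) for row in raked])              # row sums converge to the targets
```

As the text notes, only a few iterations are usually needed; after each column step the column margins match exactly and the row margins are only slightly off, with the discrepancy shrinking rapidly from one cycle to the next.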

Recommendation: ADJUST WEIGHTS TO POPULATION TOTALS, EVEN WHEN TOTALS ARE AVAILABLE ONLY FOR INDIVIDUAL VARIABLES - Iterative proportional fitting (also known as frataring or raking) can be used when independent population estimates are not available for every weighting cell but are available for row and column totals.

EDITING AND IMPUTATION

We have recommended both nonresponse weighting and post-stratification as methods for reducing the impact of unit nonresponse on survey results. When individual data items are missing, we recommend a different approach- the imputation of missing values. With imputation, information that is obtained for a case is used to make a guess about the information that is missing.

Statistical imputation should be distinguished from editing of the data or "logical" imputation. It is sometimes possible to figure out what the missing value should be from an examination of the data that were obtained. For example, the destination for a trip may be missing in the diary for one family member but included in the diary for another. Under these circumstances, it may be reasonable to infer, or "logically impute," the missing destination. Similarly, when a respondent's sex is missing but she is listed as the mother of another household member, it is reasonable to fill in her sex as female. Such editing procedures are outside the scope of this report but they can also help reduce the amount and impact of missing data.
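A logical-imputation edit rule of the kind just described can be expressed as a simple check. This is an illustrative sketch with a hypothetical record layout; the field names are ours.

```python
# "Logical" imputation: fill a missing value that is implied by other
# data already on the file, before any statistical imputation is done.

person = {"sex": None, "relationship": "mother"}  # hypothetical record

# If the respondent is listed as the mother of another household member,
# her sex can be edited in as female.
if person["sex"] is None and person["relationship"] == "mother":
    person["sex"] = "female"

print(person["sex"])  # female
```

In practice such edit rules are applied systematically across the file, and any values they cannot resolve are left for statistical imputation.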

Three Statistical Techniques. Three statistical techniques are commonly used to impute missing values [6]: hot deck imputation, cell mean imputation, and regression-based imputation.

With both hot deck and cell mean imputation, cases are first grouped into cells based on variables that are obtained for all (or nearly all) of the cases. The imputation cells serve a function similar to the one played by the adjustment cells in nonresponse weighting. Cases with similar values on the survey variables of interest are placed together in a cell, and the data for the respondents within the cell (or those with complete data) are used to represent the nonrespondents (or those with missing data).

Cell mean imputation is rarely used in surveys, because (except in a few instances) it leads to biased estimates- by replacing each missing value with the cell mean, this method of imputation leads to serious underestimates of the variance of the survey statistics.

Regression-based methods also have problems: they require fitting a separate prediction model for each variable to be imputed, and the predicted values may be implausible or fall outside the range of the observed data; like cell mean imputation, they also tend to understate the variability of the survey estimates.

Both cell-mean and regression-based imputation have their place. But for most surveys, the hot deck approach will be the most practical method.

It is worth noting that new methods of imputing missing values are being developed all the time. The new methods include multiple imputation (in which each missing value is replaced repeatedly, yielding multiple imputed values) and more sophisticated methods of single imputation. Multiple imputation has the advantage that, when done correctly, it does not underestimate the variance introduced by the imputation process [7]. All the other methods tend, to some degree, to lead to underestimation of the variance of the estimates (with cell-mean imputation producing the worst underestimation). However, software is only now being developed for these new methods; as a result, hot deck imputation, which can be carried out with widely available software, remains the most practical method for general use.

Using hot deck imputation. Hot deck imputation is carried out in four steps:

  1. Group cases into imputation cells. The goal here is to group cases likely to have similar values on the variable in question. Depending on the variable, the imputation cells might take into account household size and composition, the predominant mode of transportation, and the age and sex of the respondent.
  2. Sort the cases within cells. Often cases are then sorted within the imputation cells. For example, cases may be sorted by the total number of trips they made during the survey period. The sorting allows the imputation to take into account continuous variables as well as the categorical variables used in forming the imputation cells. The final sort may also be done randomly.
  3. Replace the missing values. The value imputed to a case is just the actual value for the preceding case (the "hot deck" record) in the cell that has a non-missing value for the variable in question. For example, if the third case in the imputation cell had a valid value for a variable and the fourth case was missing that variable, then the value for the third case (the "donor") would be used as the imputed value for the fourth. Sometimes, a limit is imposed on the number of times a given case can serve as a donor (for example, no more than three times). This limits the impact of any single case on the final survey statistics.
  4. Edit the imputed values. Imputation can produce values that are inconsistent with other information about the case. As a result, the imputed values should undergo the same editing and consistency checks as other values. Inconsistent or out-of-range values should be reimputed.
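The four steps above can be sketched as a sequential hot deck. This is an illustrative implementation under simplified assumptions: the cell and sort variables, the records, and the donor-use limit of three are all hypothetical, and step 4 (editing the imputed values) is noted but not implemented.

```python
# Sequential hot deck imputation sketch following the four steps above.
from collections import defaultdict

def hot_deck(records, cell_key, sort_key, var, max_donor_use=3):
    """Impute missing `var` values from the preceding donor in each cell."""
    cells = defaultdict(list)
    for rec in records:                       # step 1: group cases into cells
        cells[cell_key(rec)].append(rec)
    for recs in cells.values():
        recs.sort(key=sort_key)               # step 2: sort within cells
        donor_value, uses = None, 0
        for rec in recs:                      # step 3: replace missing values
            if rec[var] is not None:
                donor_value, uses = rec[var], 0
            elif donor_value is not None and uses < max_donor_use:
                rec[var] = donor_value        # step 4 (edit checks) would follow
                uses += 1
    return records

people = [
    {"hhsize": 2, "trips": 3, "mode": "car"},
    {"hhsize": 2, "trips": 4, "mode": None},  # receives "car" from the donor above
    {"hhsize": 4, "trips": 1, "mode": "bus"},
]
hot_deck(people, cell_key=lambda r: r["hhsize"],
         sort_key=lambda r: r["trips"], var="mode")
print(people[1]["mode"])  # car
```

Sorting within cells (here, by trip count) means the donor is not just in the same cell but also adjacent on a continuous variable, which is what makes the sequential hot deck more refined than drawing a donor at random from the cell.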

As this description makes clear, hot deck imputation is not a simple process. Moreover, it may be useful to form different imputation cells for different variables, making the process even more complex. Several computer programs are available for carrying out imputation and may simplify the work involved.

Still, it is worth bearing in mind the following principles about hot deck imputation:

  1. It is a statistical procedure and it works well on the average. It should not be used to impute data that are unique to an individual case. For example, it would not usually make sense to use hot deck imputation to impute a purpose for a specific trip.
  2. The more data that are imputed for a given case, the less accurate the imputed values are likely to be. It makes sense to impute one or two key missing values for a case. It makes far less sense to impute all of the data for a missing person within an otherwise complete household, or to impute all of the data for a missing trip in an otherwise completed diary.

Recommendation: IMPUTE MISSING VALUES FOR KEY SURVEY VARIABLES - Use hot deck imputation to replace missing values on a few key survey variables. Statistical imputation should not be used to impute all the data for a person or all the data regarding a trip.


References

1. For an introduction to the statistical treatment of nonresponse, see Lessler, J. T., & Kalsbeek, W. D. (1992). Nonsampling Error in Surveys. New York: John Wiley & Sons.

2. Kim, H., Li, J., Roodman, S., Sen, A., Sööt, S., & Christopher, E. (1993). Factoring household travel surveys. Transportation Research Record, 1412, 17-22.

3. Kalton, G. (1983). Compensating for Missing Survey Data. Ann Arbor: Institute for Social Research.

4. Kish, L. (1992). Weighting for unequal P. Journal of Official Statistics, 8, 183-200.

5. Fratar, T., Brant, A. E., & Buttz, C. W. (1962). Estimating traffic patterns in urban areas. Paper presented at the International Road Federation, IVth World Meeting, Madrid, Spain, October 14-20.

6. Little, R. J., & Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: John Wiley & Sons.

7. Rubin, D. (1986). Basic ideas of multiple imputation for nonresponse. Survey Methodology, 12, 37-47.