Peer Review Program
Travel Model Improvement Program - TMIP

Travel Demand Modeling and Network Assignment Models

Estimation of Travel Demand Models with Grouped and Missing Income Data


Chandra Bhat(1)

A method to impute a continuous value for household income from grouped and missing income data for use as an explanatory variable in travel demand estimation was developed. Many data sets collect income in a discrete number of categories or in grouped form to simplify the respondent's task and to encourage a response. In spite of such grouped data collection, many respondents refuse to provide information on income, leading to missing income values. The issue of constructing a continuous measure of income from grouped and missing income data that, when used in travel demand models as an explanatory variable, enables consistent estimation of the model parameters is addressed.


Household income is an important sociodemographic explanatory variable in travel demand models such as car ownership models (1), trip generation models (2), and mode choice models (3). In almost all transportation data sets and in many other data sets (4) household income, an inherently continuous variable, is measured in a discrete number of categories or intervals; that is, it is measured in grouped form (e.g., between $15,000 and $30,000). The income question is also notorious for its high nonresponse rates, leading to missing income observations in most data sets.

Income is measured in grouped form for two related reasons. First, such a measuring scale provides a greater degree of protection of confidentiality compared with a continuous measure (the degree of protection being a function of the size of income intervals), thereby increasing response rates (5). Second, it renders the sensitive income question relatively innocuous during survey administration. Questions that seek a continuous measure of income can offend respondents, particularly in a telephone survey or in a personal interview survey in which respondents are put "on the spot."

Although income is measured in grouped form, it is the continuous measure of income (or some function of this continuous measure) that frequently appears as an explanatory variable in travel demand models. It is important that this continuous measure be a reliable measure of the true income value to enable the development of an accurate and reliable relationship between travel demand variables and their explanatory variables and thus facilitate good prediction of travel demand variables [the research by Hamburg et al. (6) indicates that the estimates in a travel demand model are highly sensitive to the accuracies of sociodemographic input variables and emphasizes the need for accurate measures of the input variables]. This paper proposes a method for constructing such a continuous measure of income for all observations in a cross-sectional data set with grouped and missing income data.

The next section of this paper discusses the motivation for developing methods to explicitly accommodate the grouped and missing nature of income data in travel demand modeling. The subsequent section presents the need to develop a model relating household income and factors affecting household income to impute a continuous income measure from grouped and missing income data for use as an exogenous variable in travel demand models. The following section advances an econometric framework used to impute a continuous income measure through the development of a model relating income to variables influencing income. Empirical results obtained by using a Dutch data set are then presented. The final section provides a summary of the research and highlights important findings.

MOTIVATION FOR TREATMENT OF GROUPED AND MISSING INCOME DATA

The motivation for the treatment of grouped and missing income data originates from the need to develop a consistent relationship between travel demand variables and their explanatory variables (including income). The dependent variable in the demand model may be an observed continuous variable such as trip generation or a latent continuous variable that is a reflection of an observed discrete choice decision such as utilities in the case of a mode choice decision or car ownership propensity in the case of an ordered car ownership model. Unfortunately current procedures for constructing a continuous measure from grouped data and commonly used techniques for handling missing income data do not enable consistent estimation of travel demand models. This inconsistency in commonly used demand estimation procedures without and with missing income data is discussed below.

Commonly Used Estimation Procedures

Grouped Without Missing Income Data

Commonly used estimation procedures construct a continuous value of income from grouped data by assigning to each observation in a category the midpoint of that category's threshold bounds. If the threshold bounds for income category j are $a_{j-1}$ (the lower bound) and $a_j$ (the upper bound), then a continuous income value is constructed for all observations in category j as

$$\bar{I}_i = \frac{a_{j-1} + a_j}{2} \tag{1}$$

In the case of the two categories at either end of the income spectrum, an arbitrary truncation point is used as the representative value.
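The midpoint assignment of Equation 1 is mechanical enough to state as code. The Python sketch below is only an illustration: the function names, the dollar thresholds, and the stand-in values for the two open-ended categories are hypothetical choices, not values from the paper.

```python
def midpoint_values(inner_bounds, low_value, high_value):
    """Representative value for each income category under the midpoint method.

    inner_bounds: finite thresholds a_1 < ... < a_{J-1} separating the J categories.
    low_value, high_value: arbitrary stand-ins for the open-ended end categories.
    """
    inner = [(lo + hi) / 2.0 for lo, hi in zip(inner_bounds[:-1], inner_bounds[1:])]
    return [low_value] + inner + [high_value]


def assign_midpoints(categories, inner_bounds, low_value, high_value):
    """Map each observation's 0-based category index j to its Equation 1 value."""
    reps = midpoint_values(inner_bounds, low_value, high_value)
    return [reps[j] for j in categories]


# Hypothetical thresholds (dollars) and end-category stand-ins.
bounds = [15_000, 30_000, 45_000]
print(midpoint_values(bounds, 10_000, 55_000))                 # [10000, 22500.0, 37500.0, 55000]
print(assign_midpoints([0, 2, 3, 1], bounds, 10_000, 55_000))  # [10000, 37500.0, 55000, 22500.0]
```

Every household in a category receives the same value, and the end categories depend entirely on the analyst's stand-ins. Those are the two features the following discussion criticizes.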

This midpoint method of constructing a continuous measure from grouped data has serious limitations. Consider an underlying linear regression between a demand variable $y_i$ and the actual (but unobservable) income variable $I_i^*$ as follows [the following presentation is based on Hsiao (7) and, for ease of presentation, is confined to the case in which the dependent demand variable is an observed continuous variable]:

$$y_i = \alpha + \beta I_i^* + u_i \tag{2}$$

where

i = index for observations,
$\alpha$ and $\beta$ = parameters to be estimated, and
$u_i$ = an error term.

Assume the standard regression conditions that $u_i$ is an independent and identically distributed (iid) random error term with a mean of zero and that $I_i^*$ is uncorrelated with the error term. If the actual income value $I_i^*$ for an observation is replaced by the midpoint $\bar{I}_i$ of the corresponding income category, the regression may be rewritten as

$$y_i = \alpha + \beta \bar{I}_i + v_i \tag{3}$$

where $v_i = u_i + \beta(I_i^* - \bar{I}_i)$. In this case, when the midpoint values are used, the least-squares estimate of the coefficient on $\bar{I}_i$ is given by (using Equation 2)

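$$\hat{\beta}_{MP} = \frac{\widehat{\operatorname{Cov}}(\bar{I}_i, y_i)}{\widehat{\operatorname{Var}}(\bar{I}_i)} = \beta\,\frac{\widehat{\operatorname{Cov}}(\bar{I}_i, I_i^*)}{\widehat{\operatorname{Var}}(\bar{I}_i)} + \frac{\widehat{\operatorname{Cov}}(\bar{I}_i, u_i)}{\widehat{\operatorname{Var}}(\bar{I}_i)} \tag{4}$$

where $\widehat{\operatorname{Cov}}(\cdot,\cdot)$ and $\widehat{\operatorname{Var}}(\cdot)$ denote the sample covariance and variance.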

To simplify this expression, write the actual (but unobserved) continuous income $I_i^*$ for an observation i falling in the grouped income category j as the sum of three components: the midpoint $\bar{I}_i$ of category j, as computed in Equation 1; an error term $\eta_i$ representing the difference between the expected value of $I_i^*$ given that it falls in category j (or the expected value of the marginal distribution of the continuous income variable between the threshold bounds of category j) and the midpoint of category j; and a random error term $w_i$ representing the difference between the actual continuous income $I_i^*$ and the expected value of $I_i^*$ given that it falls in category j. That is,

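$$I_i^* = \bar{I}_i + \eta_i + w_i \tag{5}$$

where $\eta_i = E[I_i^* \mid a_{j-1} < I_i^* \le a_j] - \bar{I}_i$ and $w_i = I_i^* - E[I_i^* \mid a_{j-1} < I_i^* \le a_j]$. By using Equation 5 one can write $\widehat{\operatorname{Cov}}(\bar{I}_i, I_i^*) = \widehat{\operatorname{Var}}(\bar{I}_i) + \widehat{\operatorname{Cov}}(\bar{I}_i, \eta_i) + \widehat{\operatorname{Cov}}(\bar{I}_i, w_i)$. By substituting this expression into Equation 4 one can rewrite the least-squares estimate of $\beta$ by the midpoint method as

$$\hat{\beta}_{MP} = \beta\left[1 + \frac{\widehat{\operatorname{Cov}}(\bar{I}_i, \eta_i)}{\widehat{\operatorname{Var}}(\bar{I}_i)}\right] + \frac{\beta\,\widehat{\operatorname{Cov}}(\bar{I}_i, w_i) + \widehat{\operatorname{Cov}}(\bar{I}_i, u_i)}{\widehat{\operatorname{Var}}(\bar{I}_i)} \tag{6}$$

where the last term converges to zero, because $w_i$ has a mean of zero within every income category and $u_i$ is uncorrelated with income.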

Thus the parameter estimate on income obtained by the midpoint method converges to the true value $\beta$ in the travel demand model if and only if $\widehat{\operatorname{Cov}}(\bar{I}_i, \eta_i)/\widehat{\operatorname{Var}}(\bar{I}_i)$ converges to zero. However, this will generally not be the case. The magnitude and direction of $\operatorname{Cov}(\bar{I}_i, \eta_i)$ depend on the shape of the distribution of the actual (but unobserved) income variable. Earlier studies (8,9) have indicated that a log-normal form is theoretically and empirically appropriate for the income distribution. $\operatorname{Cov}(\bar{I}_i, \eta_i)$ is, in general, not equal to zero for a log-normal distribution, and no general result regarding its direction and magnitude (and therefore the direction and magnitude of the bias of the midpoint method) can be established for the log-normal distribution. A more definitive result can be established if it is assumed that $I_i^*$ in Equation 2 represents the logarithm of actual income. In this case $I_i^*$ is normally distributed (since actual income is log-normally distributed). Assuming small tail distributions, $\eta_i$ decreases from a positive value for the lower income categories (the expected value of the normal distribution between the threshold bounds of category j is greater than the midpoint) to a negative value for the higher income categories (the expected value of the distribution between the threshold bounds of category j is lower than the midpoint), as indicated by Haitovsky (10). On the other hand, the midpoint of income categories increases as one proceeds from lower to higher categories. Thus the covariance term $\operatorname{Cov}(\bar{I}_i, \eta_i)$ is negative, and the midpoint estimate $\hat{\beta}_{MP}$ in Equation 6 underestimates the magnitude of $\beta$.

The midpoint method leads to inconsistent parameter estimates in the travel demand model (an estimator $\hat{\beta}$ is said to be a consistent estimator of the true $\beta$ if, as the sample size gets infinitely large, the probability that $|\hat{\beta} - \beta|$ is less than any arbitrarily small positive number approaches 1) because $\eta_i$ is not equal to zero. However, if a consistent imputed estimate of income (that is, a consistent estimate of the expected value of $I_i^*$ given that it falls in category j) is used instead of the midpoints, $\eta_i$ is zero and one obtains consistent parameters in the travel demand model (the reader will observe that as the number of income categories increases, or more appropriately as the size of the income interval within each income category decreases, $\eta_i$ becomes closer to zero in the midpoint method and the inconsistency resulting from use of the midpoint method is reduced).
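The attenuation argument can be checked numerically. The sketch below is a hypothetical simulation, not an analysis from the paper. It draws standard normal (log) income $I_i^*$, groups it at the assumed thresholds -2, -1, 0, 1, 2, and generates $y_i$ from Equation 2 with $\alpha = 1$ and $\beta = 2$. It then regresses $y_i$ once on the Equation 1 midpoints (with arbitrary stand-ins of -2.5 and 2.5 for the open-ended categories) and once on the conditional means $E[I_i^* \mid a_{j-1} < I_i^* \le a_j]$. For this setup, a hand calculation gives a probability limit of roughly $2 \times 0.93 \approx 1.86$ for the midpoint slope and exactly $\beta = 2$ for the conditional-mean slope.

```python
import bisect
import math
import random


def phi(z):
    """Standard normal density; 0 at +/-infinity."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=100_000, alpha=1.0, beta=2.0, seed=7):
    """Return (midpoint slope, conditional-mean slope) for simulated grouped data."""
    rng = random.Random(seed)
    inner = [-2.0, -1.0, 0.0, 1.0, 2.0]                 # hypothetical thresholds a_1..a_5
    bounds = [-math.inf] + inner + [math.inf]
    # Equation 1 midpoints; the open-ended end categories get arbitrary stand-ins.
    mids = [-2.5] + [(lo + hi) / 2 for lo, hi in zip(inner[:-1], inner[1:])] + [2.5]
    # Conditional means E[I* | a_{j-1} < I* <= a_j] when I* ~ N(0, 1).
    cmeans = [(phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo))
              for lo, hi in zip(bounds[:-1], bounds[1:])]
    x_mid, x_cm, y = [], [], []
    for _ in range(n):
        i_star = rng.gauss(0.0, 1.0)                     # true (log) income, unobserved
        j = bisect.bisect_left(inner, i_star)            # bounds[j] < I* <= bounds[j + 1]
        x_mid.append(mids[j])
        x_cm.append(cmeans[j])
        y.append(alpha + beta * i_star + rng.gauss(0.0, 1.0))   # Equation 2
    return ols_slope(x_mid, y), ols_slope(x_cm, y)


b_mid, b_cm = simulate()
print(f"midpoint slope: {b_mid:.3f}, conditional-mean slope: {b_cm:.3f}")
```

Replacing the midpoints with the conditional means removes $\eta_i$ from Equation 5, which is why the second slope recovers $\beta$. In practice those conditional means are unknown and must be estimated, which motivates the income model developed below.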

The results regarding the inconsistency of the midpoint method are generalizable to the case of many explanatory variables in the travel demand model. Specifically use of the midpoint income estimate as an explanatory variable leads to inconsistent parameter estimates on all of the explanatory variables in the model, not just the income variable (7).

Grouped with Missing Income Data

The discussion above assumed that there are no missing income observations. Now consider the limitations of commonly used methods when missing income data are present. Current methods adopt one of two strategies to estimate travel demand models from grouped and missing income data. The first strategy is to assign the midpoint of income categories for observations with observed (grouped) income values and to assign the average value of the midpoint estimates of the observed income observations to the missing income observations. As discussed earlier, the midpoint method does not provide consistent estimates of the travel demand model. Also, this assignment of the average of observed income observations to missing income observations assumes that the average income of respondent households (i.e., households that report income) is identical to that of nonrespondent households (i.e., households that do not report income). This may not be true because of systematic variations in observed and unobserved characteristics affecting income earnings between members of respondent and nonrespondent households (11). Observed characteristics may include the education levels of the members of the household, whereas unobserved characteristics may include sensitivity to privacy and fear of governmental or other uses of the data. If such systematic variations are present between members of respondent and nonrespondent households, assigning the average income of respondent households to nonrespondent households is inappropriate and will further contribute to inconsistency in the parameter estimates of the demand model.

The second strategy for estimating travel demand models from grouped and missing income data is to assign the midpoint of income category thresholds for the observed (grouped) income data and to drop all missing income observations. It was already shown that the midpoint method provides inconsistent travel demand parameters. In addition, another dimension of inconsistency arises when all missing income observations are dropped. If systematic variations in income level are present between respondent and nonrespondent households, then the relationship between independent variables and the travel demand variable for nonrespondents may be different from that for respondents. Thus the travel demand relationship obtained by dropping all nonrespondent households will not be a representative relationship for the entire population. This second strategy also wastes the information in the dropped observations, resulting in inefficient estimation.

It is clear from the discussion above that commonly used procedures for dealing with grouped and missing income data are inadequate or waste valuable data. The next section discusses the need to develop a dependent income model, that is, a relationship between household income (the dependent variable) and a set of variables affecting household income (the independent variables), to impute a continuous income measure from grouped and missing data for use as an explanatory variable in travel demand models.

NEED FOR DISAGGREGATE INCOME MODEL FOR IMPUTING INCOME

This section discusses the need to develop a dependent income model to impute a continuous income measure. Cases in which there are no missing income data and in which there are missing income data are discussed.

No Missing Income Data

Earlier it was indicated that use of a consistent imputed estimate of income (that is, assigning to each observation falling in income category j the expected value of the income distribution bounded by the category thresholds) in a travel demand model provides consistent parameter estimates. This method assigns a single value to all income observations in a category. It does not use information on observed variables likely to affect income earnings (such as education level and number of employed adults in a household) that can help to differentiate among the incomes of different households within a particular grouped category. Developing a dependent income model (using the grouped observation on income) and combining the instrumental variable estimate of income from such a model with the information on income categories will enable construction of a consistent and efficient imputed income measure for use in travel demand models. The structure and estimation procedure for imputing income values from grouped data are discussed later in this paper.

Presence of Missing Income Data

The need to develop a dependent income model is critical when missing income data are present, since such a model is the only means of imputing an income measure for the missing data while at the same time accounting for any systematic variations in the observed characteristics (such as education level and number of employed adults) between respondent and nonrespondent households. The model should also account for systematic variations in unobserved characteristics between respondent and nonrespondent households. A consistent and efficient imputed estimate of income for use in travel demand models can be obtained from grouped and missing income data by combining the instrumental variable estimate of income from the model with information on whether a household responded to the income question or not and the income category in which a household's income falls (if the household responded). The structure and estimation procedure for imputing income values from grouped and missing income data are discussed later in this paper.

The discussion above emphasizes the need to develop a dependent income model to impute a continuous income estimate from grouped or grouped and missing income data for use as an exogenous variable in travel demand models. The remainder of this paper presents the econometric framework for imputing income through the development of a dependent income model and presents empirical results of the dependent income model and associated imputed estimates by using a Dutch data set.

ESTIMATION METHODOLOGY

The methodology used to develop a dependent income model and to impute a continuous income value from grouped and missing data is discussed in this section in two stages. In the first stage it is assumed that there are no missing income values. The methodology is then extended to accommodate missing income values in the second stage. The program routines for all estimations in this paper were written and coded by using the GAUSS matrix programming language.

No Missing Income Data

Assume that the actual but unobserved logarithm of household income, $I_i^*$, is a function of a vector $X_i$ of exogenous variables as follows:

$$I_i^* = \gamma' X_i + \varepsilon_i \tag{7}$$

where

$\gamma$ = vector of parameters to be estimated,
$X_i$ = vector of explanatory variables, and
$\varepsilon_i$ = a random disturbance term assumed to be homoscedastic, independent, and normally distributed with a mean of zero and a variance of $\sigma^2$ (a logarithm form is adopted for the dependent income variable because, as indicated earlier, a log-normal form has been found to be theoretically and empirically appropriate for the income distribution).

The observed data on income indicate only that income falls into a prespecified interval. The relationship between the grouped observed income data $I_i$ and the continuous unobserved (log) income value $I_i^*$ is written as follows:

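$$I_i = j \quad \text{if } a_{j-1} < I_i^* \le a_j, \qquad j = 1, \ldots, J, \quad a_0 = -\infty, \; a_J = +\infty \tag{8}$$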

where the $a_j$'s represent known threshold values (which represent the logarithm of the actual income threshold bounds) for each income category j. Representing the cumulative standard normal distribution function by $\Phi(\cdot)$, the probability that household income falls in category j may be written from Equations 7 and 8 as

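$$P(I_i = j) = \Phi\!\left(\frac{a_j - \gamma' X_i}{\sigma}\right) - \Phi\!\left(\frac{a_{j-1} - \gamma' X_i}{\sigma}\right) \tag{9}$$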

Defining a set of dummy variables

$$M_{ij} = \begin{cases} 1 & \text{if } I_i = j \\ 0 & \text{otherwise} \end{cases} \tag{10}$$

the likelihood function for estimation of the parameters $\gamma$ and $\sigma$ is

$$\mathcal{L} = \prod_{i} \prod_{j=1}^{J} \left[P(I_i = j)\right]^{M_{ij}} \tag{11}$$

Initial parameter values for the maximum likelihood search are obtained by assigning to each income observation its conditional expectation on the basis of the marginal distribution of $I^*$ and regressing these conditional expectations on the vector of exogenous variables. The reader will note that the likelihood function of Equation 11 differs from that of the standard ordered probit model. In the ordered probit model, $\sigma$ is unidentifiable (it is normalized, typically to one) and the threshold values (the $a_j$'s) are unknown parameters to be estimated. In contrast, in the current model the threshold values are known, and (as a consequence) $\sigma$ is identifiable.
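Equations 9 through 11 translate directly into a log-likelihood function. The Python sketch below uses only the standard library and assumptions of its own: hypothetical function names, simulated data with a constant and one covariate, and thresholds set to log(24,000) and log(38,000) only to echo the scale of the data used later. The sketch evaluates the likelihood but does not maximize it. A numerical optimizer, for example `scipy.optimize.minimize` applied to the negative of this function, would be needed for estimation, with start values obtained as described above.

```python
import math
import random


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def category_probs(xb, sigma, thresholds):
    """Equation 9: P(I_i = j) for each category j, given the index xb = gamma'X_i.

    thresholds holds the known inner bounds a_1 < ... < a_{J-1} on the log scale;
    a_0 = -inf and a_J = +inf close the end categories.
    """
    bounds = [-math.inf] + list(thresholds) + [math.inf]
    return [Phi((hi - xb) / sigma) - Phi((lo - xb) / sigma)
            for lo, hi in zip(bounds[:-1], bounds[1:])]


def log_likelihood(gamma, sigma, X, cats, thresholds):
    """Logarithm of Equation 11; cats[i] is the 0-based category of household i."""
    ll = 0.0
    for x, j in zip(X, cats):
        xb = sum(g * v for g, v in zip(gamma, x))
        p = category_probs(xb, sigma, thresholds)[j]
        ll += math.log(max(p, 1e-300))       # guard against log(0) far in the tails
    return ll


# Hypothetical simulated data: a constant plus one covariate.
TH = [math.log(24_000), math.log(38_000)]
GAMMA, SIGMA = [10.2, 0.3], 0.5
rng = random.Random(3)
X = [[1.0, rng.gauss(0.0, 1.0)] for _ in range(2_000)]
cats = []
for x in X:
    i_star = GAMMA[0] * x[0] + GAMMA[1] * x[1] + rng.gauss(0.0, SIGMA)   # Equation 7
    cats.append(sum(i_star > a for a in TH))                             # Equation 8

print(log_likelihood(GAMMA, SIGMA, X, cats, TH))
print(log_likelihood([10.2, 0.6], SIGMA, X, cats, TH))   # should be lower
```

Because the thresholds enter as known constants, $\sigma$ appears explicitly in every probability and can be estimated. That is the identification point made in the paragraph above.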

Defining the standard normal density function by $\phi(\cdot)$, an imputed value for household (log) income may be computed for all the observations from the estimates of $\gamma$ and $\sigma$ obtained from maximizing the likelihood function in Equation 11. The imputed value for an income observation in category j may be computed by using the properties of doubly truncated univariate normal distributions (12) as follows:

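$$\hat{I}_i = \hat{\gamma}' X_i + \hat{\sigma}\,\frac{\phi(z_{j-1,i}) - \phi(z_{ji})}{\Phi(z_{ji}) - \Phi(z_{j-1,i})}, \qquad z_{ji} = \frac{a_j - \hat{\gamma}' X_i}{\hat{\sigma}} \tag{12}$$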

These imputed values represent unbiased and consistent measures of (log) income and can be used as an explanatory variable in travel demand models (the imputed values are also guaranteed to fall within the lower and upper boundaries of the observed income categories). If an alternative function of income (other than the log function), $g(I_i^*)$, appears as the explanatory variable in the travel demand model, an imputed value may be computed as
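The imputation formula for observations with reported income, the mean of a doubly truncated normal distribution, takes only a few lines of Python. In this sketch the fitted index $\hat{\gamma}' X_i = 10.2$ and $\hat{\sigma} = 0.5$ are hypothetical values, and the function name is illustrative.

```python
import math


def phi(z):
    """Standard normal density; 0 at +/-infinity."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def impute_log_income(xb, sigma, j, thresholds):
    """Mean of N(xb, sigma^2) doubly truncated to (a_{j-1}, a_j] (Equation 12).

    xb is the fitted index gamma'X_i, j the 0-based reported category, and
    thresholds the known inner bounds a_1 < ... < a_{J-1} (log scale).
    """
    bounds = [-math.inf] + list(thresholds) + [math.inf]
    z_lo = (bounds[j] - xb) / sigma
    z_hi = (bounds[j + 1] - xb) / sigma
    return xb + sigma * (phi(z_lo) - phi(z_hi)) / (Phi(z_hi) - Phi(z_lo))


TH = [math.log(24_000), math.log(38_000)]
for j in range(3):
    print(j, round(impute_log_income(10.2, 0.5, j, TH), 4))
```

Each imputed value lies inside its own category, including the two open-ended ones. The midpoint method cannot guarantee this, because it has to fall back on an arbitrary truncation point for those categories.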

$$\widehat{g(I_i^*)} = g(\hat{I}_i) \tag{13}$$

This imputed value of the function of (log) income is not unbiased, since in general the expected value of a continuous function of a variable is not equal to the function of the expected value of the variable. However it is consistent by Slutsky's theorem and thus will enable consistent estimation of travel demand models.
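The gap between $g(\hat{I}_i)$ and the conditional expectation of $g(I_i^*)$ is easy to see for one concrete choice. The sketch below assumes $g = \exp$, which would apply if the demand model used income in guilders rather than log guilders. It also assumes hypothetical values $\hat{\gamma}' X_i = 10.2$ and $\hat{\sigma} = 0.5$ and log-income bounds of log(24,000) and log(38,000). The exact truncated expectation of $\exp(I_i^*)$ uses the standard log-normal partial-moment formula.

```python
import math


def phi(z):
    """Standard normal density; 0 at +/-infinity."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_mean(mu, sigma, lo, hi):
    """E[I* | lo < I* <= hi] for I* ~ N(mu, sigma^2)."""
    a, b = (lo - mu) / sigma, (hi - mu) / sigma
    return mu + sigma * (phi(a) - phi(b)) / (Phi(b) - Phi(a))


def truncated_mean_exp(mu, sigma, lo, hi):
    """E[exp(I*) | lo < I* <= hi] for I* ~ N(mu, sigma^2) (log-normal partial moment)."""
    a, b = (lo - mu) / sigma, (hi - mu) / sigma
    return math.exp(mu + 0.5 * sigma ** 2) * (Phi(b - sigma) - Phi(a - sigma)) / (Phi(b) - Phi(a))


lo, hi = math.log(24_000), math.log(38_000)
plug_in = math.exp(truncated_mean(10.2, 0.5, lo, hi))   # Equation 13 with g = exp
exact = truncated_mean_exp(10.2, 0.5, lo, hi)            # E[g(I*) | category]
print(round(plug_in), round(exact))
```

Because exp is convex, Jensen's inequality makes the plug-in value smaller than the exact conditional expectation. This is the lack of unbiasedness noted above. The gap narrows as the income intervals become narrower.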

Presence of Missing Income Data

If missing income values are present in the data (as is almost always the case), one of two approaches may be used to construct a continuous value for all observations: (a) the naive approach or (b) the sample selection approach.

Naive Approach

The naive approach employs the method described above to estimate $\gamma$ and $\sigma$ by using only the observed (and grouped) income values. A continuous (log) income value is then imputed by using Equation 12 for observed income values and by using $\hat{I}_i = \hat{\gamma}' X_i$ for missing income values. The naive approach accounts for systematic differences in the observed characteristics (represented by the X vector in Equation 7) that affect income between households that provide income and those that do not. However, it fails to accommodate systematic differences in the unobserved characteristics that affect income between respondent and nonrespondent households; that is, it ignores any "self-selection" in the choice of households to report income. Specifically, unobserved factors that affect household income may also influence the decision of individuals (or households) to report income. For example, it seems at least possible that households with above-average incomes, other things being equal, will be more reluctant than other households to provide information on income [Lillard et al. (11) indicate that this is so in their study of the 1980 Census Population Survey]. Because of this potential sample selection [see Mannering and Hensher (13) for a detailed review of sample selection-related issues], the naive approach will not, in general, provide consistent (continuous) estimates of income for observed or missing income data [the method proposed by Stern (14) for imputing income from grouped and missing income data falls under the naive approach]. To obtain consistent estimates, the decision to report income should be treated as endogenous, as discussed next.

Sample Selection Approach

The sample selection approach uses two equations, one for income reporting and the other for household income, and accounts for the correlation in error terms between the two equations. Thus it accommodates systematic differences in unobserved characteristics between respondent and nonrespondent households. The model system is as follows:

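$$r_i^* = \gamma_r' X_{ri} + \varepsilon_{ri}, \qquad r_i = \begin{cases} 1 & \text{if } r_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \tag{14}$$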

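$$I_i^* = \gamma_I' X_{Ii} + \varepsilon_{Ii}, \qquad I_i = j \ \text{ if } a_{j-1} < I_i^* \le a_j \ \text{ (observed only if } r_i = 1\text{)} \tag{15}$$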

where

$r_i$ = observed binary variable indicating whether or not income is reported ($r_i = 1$ if income is reported and $r_i = 0$ otherwise),
$r_i^*$ = underlying continuous variable related to the observed binary variable $r_i$ as shown above,
$X_{ri}$ and $X_{Ii}$ = vectors of exogenous variables,
$\gamma_r$ and $\gamma_I$ = vectors of parameters to be estimated, and
$\varepsilon_{ri}$ and $\varepsilon_{Ii}$ = normal random error terms assumed to be independent and identically distributed across observations with a mean of zero and variances of one and $\sigma_I^2$, respectively.

The error terms are assumed to follow a bivariate normal distribution (the author is not aware of any earlier application of sample selection in econometric literature in which the variable subjected to sample selection is observed only in grouped form).

The probability that income is observed and falls in income category j from the model system of Equations 14 and 15 is:

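$$P(r_i = 1, I_i = j) = \Phi_2\!\left(\gamma_r' X_{ri}, \frac{a_j - \gamma_I' X_{Ii}}{\sigma_I}; -\rho\right) - \Phi_2\!\left(\gamma_r' X_{ri}, \frac{a_{j-1} - \gamma_I' X_{Ii}}{\sigma_I}; -\rho\right) \tag{16}$$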

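where $\rho$ is the correlation between the error terms $\varepsilon_{ri}$ and $\varepsilon_{Ii}$ and $\Phi_2(\cdot, \cdot; \cdot)$ is the cumulative standard bivariate normal distribution function (its third argument being the correlation).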

Defining a set of dummy variables $M_{ij}$ as in Equation 10 for the observed income observations, the appropriate likelihood function for estimation of the parameters in the model system is

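$$\mathcal{L} = \prod_{i} \left[\Phi(-\gamma_r' X_{ri})\right]^{1 - r_i} \prod_{j=1}^{J} \left[P(r_i = 1, I_i = j)\right]^{r_i M_{ij}} \tag{17}$$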

Initial start values for the maximum likelihood iterations are obtained by assigning to each reported income observation its conditional expectation on the basis of the marginal distribution of the underlying latent continuous variable $I_i^*$. These values are then treated as the actual continuous (log) income values, and Heckman's two-step method for sample selection models (15) is applied to obtain start values for the parameters.
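Evaluating Equations 16 and 17 requires the bivariate normal distribution function $\Phi_2$. The Python standard library does not provide one, so the sketch below computes it by one-dimensional Simpson integration of $\phi(x)\,\Phi\big((k - r x)/\sqrt{1 - r^2}\big)$. This is a simple numerical stand-in, not the routine the paper used. All function names and parameter values are hypothetical. Note the correlation argument $-\rho$: reporting requires $\varepsilon_{ri} > -\gamma_r' X_{ri}$, which flips the sign of the correlation.

```python
import math


def phi(z):
    """Standard normal density; 0 at +/-infinity."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def Phi2(h, k, r, n=2_000):
    """Standard bivariate normal CDF P(Z1 <= h, Z2 <= k) with correlation r, |r| < 1.

    Integrates phi(x) * Phi((k - r x) / sqrt(1 - r^2)) over x in (-inf, h] with
    Simpson's rule on [-10, min(h, 10)]; n must be even.
    """
    if k == -math.inf or h <= -10.0:
        return 0.0
    s = math.sqrt(1.0 - r * r)
    lower, upper = -10.0, min(h, 10.0)
    step = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * step
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * phi(x) * Phi((k - r * x) / s)
    return total * step / 3.0


def reported_category_prob(xb_r, xb_I, sigma_I, rho, j, thresholds):
    """Equation 16: P(r_i = 1, I_i = j); note the -rho in the bivariate CDF."""
    bounds = [-math.inf] + list(thresholds) + [math.inf]
    c_hi = (bounds[j + 1] - xb_I) / sigma_I
    c_lo = (bounds[j] - xb_I) / sigma_I
    return Phi2(xb_r, c_hi, -rho) - Phi2(xb_r, c_lo, -rho)


def loglik_term(xb_r, xb_I, sigma_I, rho, reported, j, thresholds):
    """One household's contribution to the logarithm of Equation 17."""
    if not reported:
        return math.log(Phi(-xb_r))                      # P(r_i = 0)
    return math.log(reported_category_prob(xb_r, xb_I, sigma_I, rho, j, thresholds))


TH = [math.log(24_000), math.log(38_000)]
probs = [reported_category_prob(0.3, 10.3, 0.5, -0.6, j, TH) for j in range(3)]
print(probs, Phi(-0.3), sum(probs) + Phi(-0.3))         # the total should be ~1
```

A useful sanity check is that the probabilities of the J reported categories plus the probability of nonreporting add up to one for every household, whatever the value of $\rho$.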

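The continuous value of (log) income for households that reported income may be computed from the parameter estimates obtained from maximizing Equation 17. By using the properties of doubly truncated bivariate normal distributions (16) and defining the following quantities,

$$h_i = \hat{\gamma}_r' X_{ri}$$

$$c_{ji} = \frac{a_j - \hat{\gamma}_I' X_{Ii}}{\hat{\sigma}_I}, \qquad c_{j-1,i} = \frac{a_{j-1} - \hat{\gamma}_I' X_{Ii}}{\hat{\sigma}_I}$$

$$s = \sqrt{1 - \hat{\rho}^2}$$

$$P_{ij} = \Phi_2(h_i, c_{ji}; -\hat{\rho}) - \Phi_2(h_i, c_{j-1,i}; -\hat{\rho})$$

one can write

$$\hat{I}_i = \hat{\gamma}_I' X_{Ii} + \frac{\hat{\sigma}_I}{P_{ij}}\left[\phi(c_{j-1,i})\,\Phi\!\left(\frac{h_i + \hat{\rho}\, c_{j-1,i}}{s}\right) - \phi(c_{ji})\,\Phi\!\left(\frac{h_i + \hat{\rho}\, c_{ji}}{s}\right) + \hat{\rho}\,\phi(h_i)\left\{\Phi\!\left(\frac{c_{ji} + \hat{\rho}\, h_i}{s}\right) - \Phi\!\left(\frac{c_{j-1,i} + \hat{\rho}\, h_i}{s}\right)\right\}\right] \tag{18}$$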

The above expression collapses to Equation 12 if the correlation between the error terms in the reporting equation and the income equation is zero.

The continuous value of (log) income for households that did not report income may be imputed as follows:

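$$\hat{I}_i = \hat{\gamma}_I' X_{Ii} - \hat{\rho}\,\hat{\sigma}_I\,\frac{\phi(\hat{\gamma}_r' X_{ri})}{\Phi(-\hat{\gamma}_r' X_{ri})} \tag{19}$$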
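The two sample-selection imputations can be computed directly and checked against a simulation. The Python sketch below implements the conditional means $E[I_i^* \mid r_i = 1, a_{j-1} < I_i^* \le a_j]$ and $E[I_i^* \mid r_i = 0]$ for the model of Equations 14 and 15. It then compares both with Monte Carlo averages from correlated normal draws. The parameter values ($\gamma_r' X_{ri} = 0.3$, $\gamma_I' X_{Ii} = 10.3$, $\sigma_I = 0.5$, $\rho = -0.6$) are hypothetical. $\Phi_2$ is again approximated by Simpson integration, an assumption of this sketch.

```python
import math
import random


def phi(z):
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def Phi2(h, k, r, n=2_000):
    """Bivariate standard normal CDF with correlation r, by Simpson's rule."""
    if k == -math.inf or h <= -10.0:
        return 0.0
    s = math.sqrt(1.0 - r * r)
    lower, upper = -10.0, min(h, 10.0)
    step = (upper - lower) / n
    total = sum((1 if i in (0, n) else (4 if i % 2 else 2))
                * phi(lower + i * step) * Phi((k - r * (lower + i * step)) / s)
                for i in range(n + 1))
    return total * step / 3.0


def impute_reported(xb_r, xb_I, sigma_I, rho, j, thresholds):
    """E[I* | r = 1, a_{j-1} < I* <= a_j] under the bivariate normal model (Equation 18)."""
    bounds = [-math.inf] + list(thresholds) + [math.inf]
    h = xb_r
    c_lo = (bounds[j] - xb_I) / sigma_I
    c_hi = (bounds[j + 1] - xb_I) / sigma_I
    s = math.sqrt(1.0 - rho * rho)

    def edge(c):
        # phi(c) * Phi((h + rho c) / s); zero at an infinite bound because phi(c) = 0 there.
        return 0.0 if math.isinf(c) else phi(c) * Phi((h + rho * c) / s)

    prob = Phi2(h, c_hi, -rho) - Phi2(h, c_lo, -rho)               # Equation 16
    numer = (edge(c_lo) - edge(c_hi)
             + rho * phi(h) * (Phi((c_hi + rho * h) / s) - Phi((c_lo + rho * h) / s)))
    return xb_I + sigma_I * numer / prob


def impute_nonreported(xb_r, xb_I, sigma_I, rho):
    """E[I* | r = 0] = gamma_I'X_I - rho * sigma_I * phi(h) / Phi(-h) (Equation 19)."""
    return xb_I - rho * sigma_I * phi(xb_r) / Phi(-xb_r)


def simulate_means(xb_r, xb_I, sigma_I, rho, j, thresholds, n=200_000, seed=11):
    """Monte Carlo check: draw correlated errors and average I* in each group."""
    rng = random.Random(seed)
    bounds = [-math.inf] + list(thresholds) + [math.inf]
    rep, non = [], []
    for _ in range(n):
        e_r = rng.gauss(0.0, 1.0)
        e_I = sigma_I * (rho * e_r + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        income = xb_I + e_I                                          # Equation 15
        if xb_r + e_r > 0.0:                                         # Equation 14: reported
            if bounds[j] < income <= bounds[j + 1]:
                rep.append(income)
        else:
            non.append(income)
    return sum(rep) / len(rep), sum(non) / len(non)


TH = [math.log(24_000), math.log(38_000)]
args = (0.3, 10.3, 0.5, -0.6)          # hypothetical gamma_r'X_r, gamma_I'X_I, sigma_I, rho
f_rep, f_non = impute_reported(*args, 1, TH), impute_nonreported(*args)
s_rep, s_non = simulate_means(*args, 1, TH)
print(f_rep, s_rep, f_non, s_non)
```

With a negative $\rho$, as estimated in the empirical section below, the nonreporter imputation lies above $\gamma_I' X_{Ii}$, and the reporter imputation is pulled down relative to the single-equation result. Setting $\rho = 0$ turns the reported-income expression into the single-equation truncated mean of Equation 12.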

EMPIRICAL RESULTS

This section discusses the data used to develop the dependent income model and to impute income from grouped and missing income observations and also presents estimation results.

Data

The data used in the present study are from the Dutch National Mobility Survey. The survey involved weekly travel diaries and household and personal questionnaires collected during the spring of 1988 [for a detailed description of this survey see van Wissen and Meurs (17)]. The sample included 889 households, 55 of which have missing income data. Household income was available in three categories (for the observed income observations) in the data: (a) less than or equal to 24,000 guilders, (b) from 24,001 to 38,000 guilders, and (c) greater than 38,000 guilders.

Empirical Specification and Results

The variables considered in the income reporting equation and household income equation are listed in Table 1. They included household age and education (see definitions in Table 1), number of employed adults in the household, number of kids in the household, an indicator of whether the household has a "returning" young adult, and unemployment rate in the municipality of household residence. The household age variables enable nonlinear estimation of the age effect on income reporting and income earnings. The education variables indicate the effect of different levels of education of the adults in the household relative to that for households with one or more adults with primary education.

Table 1


Table 2

The naive method and the sample selection method were used to estimate the parameters in the household income equation. The naive method estimates parameters from the households that reported income only (834 households), by using Equation 11, whereas the sample selection method estimates parameters from all observations by using Equation 17. The results are shown in Table 2. The naive method estimates only the income equation, whereas the sample selection method estimates both the reporting equation and the income equation and accounts for the correlation in the unobserved factors that affect both equations. In both models the level of household education and the number of employed adults have a positive effect on income. The magnitudes of the parameters on household education are consistent with the expectation that higher levels of education have a greater effect on income. The unemployment rate in the municipality of the household residence has a significant negative effect. The reporting equation estimation results in the sample selection model indicate that households with older adults, households whose members have a high level of education, and households with a returning young adult are significantly less likely to report income. Thus there are systematic differences in the observed characteristics between households that report income and those that do not.

Table 3

The magnitude and significance of the correlation term $\rho$ in the sample selection model indicate that there is a significant (and rather high) negative correlation in the unobserved factors that affect the reporting equation and the household income equation; that is, households that did not report their incomes were, all observed characteristics being equal, likely to have higher incomes than households that reported their incomes. This indicates that the naive method provides biased and inconsistent estimation results. In particular, the naive method tends to underestimate the magnitudes of parameters on the exogenous variables that have a positive effect on income and tends to overestimate the magnitudes of parameters on the exogenous variables that have a negative effect on income in the income equation because of the negative correlation between the error terms in the reporting and the income equations (although the difference in coefficient estimates between the naive and the sample selection approaches appears to be small, the reader should note that the dependent variable is the logarithm of income, and thus even small coefficient differences could translate into moderate differences with respect to income earnings; the small coefficient differences may also be attributable to the small number of missing income observations in the current data set).

The mean values of imputed (log) income for households that reported income and those that did not, obtained by using the midpoint method, the naive method, and the sample selection method, are shown in Table 3. The mean values for the midpoint method depend on the representative values used for the lowest and the highest income categories. In the computations shown in Table 3, a value of log(15,000) was assigned for the "less than or equal to 24,000 guilders" category and a value of log(43,000) was assigned for the "greater than 38,000 guilders" category. The inconsistency and the ad hoc nature of the midpoint method of imputing income were discussed above. Furthermore, the mean value of imputed (log) income was identical for respondent and nonrespondent households under the midpoint method because the midpoint method does not account for systematic variations in the observed and unobserved characteristics that affect income between respondent and nonrespondent households.

The naive method accounts for systematic variations in observed characteristics between respondent and nonrespondent households. The higher mean estimate for nonrespondent households compared with that for respondent households indicates that nonrespondent households have higher values than respondent households for the observed characteristics that increase income. This is readily observed in the reporting equation estimates of the sample selection model in Table 2, which indicate that nonrespondent households are characterized by adult members with a higher education level than those of adult members in respondent households.

The sample selection method accounts for systematic variations in the observed and unobserved characteristics that affect income between respondent and nonrespondent households. The difference in the mean value of imputed (log) income for respondent and nonrespondent households between the sample selection and naive approaches comprises two components. The first component is an underestimation of income by the naive method on the basis of the observed characteristics that affect income, caused by the biases in the parameter estimates of the naive approach in Table 2. This first component leads to an increase in imputed (log) incomes for both respondent and nonrespondent households in the sample selection method compared with those in the naive method. The second component is the effect of the unobserved characteristics that affect reporting status and income. It leads to a decrease in imputed (log) income for respondent households and an increase for nonrespondent households. The naive method does not consider this second component; only the sample selection model does. The difference in the mean value of imputed (log) income between the sample selection and naive approaches is small for respondent households because the two components act in opposite directions and tend to offset each other. On the other hand, the mean value of imputed (log) income from the sample selection approach is substantially larger than that from the naive approach for nonrespondent households because the two components reinforce each other. Aside from the magnitude of the difference between the estimates of the sample selection and the naive approaches, however, the naive approach provides inconsistent imputed estimates both for respondent and for nonrespondent households because the correlation in the unobserved factors that affect reporting status and income earnings is significantly different from zero in Table 2.
In general the sample selection method is the only approach that provides consistent imputed income estimates from grouped and missing income data.
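The second component described above can be made explicit under a standard Heckman-type (15) assumption. This is used here only as an illustrative sketch. Log income is ln y = x'β + ε with ε ~ N(0, σ²). The latent reporting propensity is R* = z'γ + u with u ~ N(0, 1), corr(ε, u) = ρ, and a household reports income when R* > 0. Then:

- E[ε | reported] = ρσφ(z'γ)/Φ(z'γ)
- E[ε | not reported] = -ρσφ(z'γ)/[1 - Φ(z'γ)]

So, under this sign convention, ρ < 0 lowers imputed log income for respondents and raises it for nonrespondents. The sketch ignores the bracket information for respondents; conditioning on the bracket as well would call for moments of a truncated bivariate normal distribution (16). The function names and all parameter values are hypothetical.

```python
import math


def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def selection_shift(zg: float, rho: float, sigma: float, reported: bool) -> float:
    """E[eps | reporting status] when (eps, u) is bivariate normal.

    zg is the reporting index z'gamma; income is reported when zg + u > 0.
    """
    density = std_normal_pdf(zg)
    if reported:
        return rho * sigma * density / std_normal_cdf(zg)
    return -rho * sigma * density / (1.0 - std_normal_cdf(zg))


def decompose(xb_naive, xb_selection, zg, rho, sigma, reported):
    """Split (selection imputation - naive imputation) of log income into
    (parameter component, unobserved-selection component), ignoring brackets."""
    parameter_component = xb_selection - xb_naive
    unobserved_component = selection_shift(zg, rho, sigma, reported)
    return parameter_component, unobserved_component


if __name__ == "__main__":
    # Hypothetical values: x'beta is 10.0 under the naive fit and 10.1 under
    # the selection model; z'gamma = 1.0, rho = -0.4, sigma = 0.5.
    for status in (True, False):
        p, s = decompose(10.0, 10.1, zg=1.0, rho=-0.4, sigma=0.5, reported=status)
        label = "respondent" if status else "nonrespondent"
        # unobserved component is about -0.058 (respondent), +0.305 (nonrespondent)
        print(f"{label}: parameter {p:+.3f}, unobserved {s:+.3f}, total {p + s:+.3f}")
```

With these hypothetical values, the parameter component (+0.1) and the respondent shift partly offset each other. For nonrespondents the two components add, which mirrors the pattern described in the text.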

CONCLUSION

This paper developed a methodology for imputing a continuous value of income from grouped and missing income data for use as an explanatory variable in travel demand models. The method was applied to data from the Dutch National Mobility Survey. Beyond demonstrating that the procedure can accommodate grouped and missing data, the results show systematic differences in observed and unobserved characteristics between households that report income and households that do not. Failure to account for this sample selection results in biased and inconsistent imputations. Using such inconsistent imputed income values as an explanatory variable will result in unreliable travel demand models.
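The conclusion that ignoring sample selection biases imputations can be checked with a small simulation. The sketch below is not the paper's estimation procedure. It assumes:

- known parameters, no covariates, and ungrouped log income;
- the same bivariate normal reporting model as the Heckman-type setup, with hypothetical values (mean log income 10.0, σ = 0.5, reporting index 1.0, ρ = -0.4).

With no covariates, a naive approach that fits the income equation on respondents alone reduces to imputing the respondents' mean for every nonrespondent. The function names are hypothetical.

```python
import math
import random


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n, mu, sigma, gamma0, rho, seed=0):
    """Draw log incomes for n households, split by reporting status.

    ln y = mu + eps with eps ~ N(0, sigma^2); the latent reporting index is
    R* = gamma0 + u with u ~ N(0, 1); corr(eps, u) = rho; income is reported
    when R* > 0. No covariates and no income brackets, for simplicity.
    """
    rng = random.Random(seed)
    reported, missing = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        # Build eps with variance sigma^2 and correlation rho with u.
        eps = sigma * (rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0))
        if gamma0 + u > 0.0:
            reported.append(mu + eps)
        else:
            missing.append(mu + eps)
    return reported, missing


def expected_mean(mu, sigma, gamma0, rho, reported):
    """Closed-form E[ln y | reporting status] under the same assumptions."""
    density = std_normal_pdf(gamma0)
    if reported:
        return mu + rho * sigma * density / std_normal_cdf(gamma0)
    return mu - rho * sigma * density / (1.0 - std_normal_cdf(gamma0))


if __name__ == "__main__":
    reported, missing = simulate(100_000, mu=10.0, sigma=0.5, gamma0=1.0, rho=-0.4, seed=1)
    naive = sum(reported) / len(reported)  # respondents' mean, applied to nonrespondents
    actual = sum(missing) / len(missing)   # simulated nonrespondent mean
    print(f"naive imputation for nonrespondents: {naive:.3f}")
    print(f"simulated nonrespondent mean:        {actual:.3f}")
    print(f"selection-consistent expectation:    {expected_mean(10.0, 0.5, 1.0, -0.4, False):.3f}")
```

Because reporting is correlated with the unobserved part of income, the respondents' mean understates what nonrespondents earn. Enlarging the sample does not remove this gap. The closed-form expectation that accounts for selection should agree with the simulated nonrespondent mean up to sampling error.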

The methodology developed in this paper is particularly relevant because almost all transportation-related databases record income in grouped form, and because there is a trend toward an increasing percentage of respondents refusing to provide income information in travel and travel-related surveys (11). The methodology is easy to apply and has been coded for use with the GAUSS programming language.

ACKNOWLEDGMENTS

The author would like to thank Frank Koppelman and three anonymous referees for useful suggestions on previous versions of this paper.


REFERENCES
  1. Golob, T. F. The Dynamics of Household Travel Time Expenditures and Car Ownership Decisions. Presented at the International Conference on Dynamic Travel Behavior Analysis, Kyoto, Japan, July 1989.
  2. Meurs, H. Dynamic Analysis of Trip Generation. Presented at the International Conference on Dynamic Travel Behavior Analysis, Kyoto, Japan, July 1989.
  3. Beggan, J. G. The Relationship Between Travel Activity, Behavior and Mode Choice for the Work Trip. M.S. thesis. Transportation Center, Northwestern University, Evanston, Ill., 1988.
  4. Stewart, M. B. On Least Squares Estimation When the Dependent Variable Is Grouped. Review of Economic Studies, 1983, pp. 737-753.
  5. Churchill, G. A., Jr. Marketing Research: Methodological Foundations. The Dryden Press, Chicago, 1983.
  6. Hamburg, J. R., E. J. Kaiser, and G. T. Lathrop. NCHRP Report 266: Forecasting Inputs to Transportation Planning. TRB, National Research Council, Washington, D.C., 1983.
  7. Hsiao, C. Regression Analysis with a Categorized Explanatory Variable. In Studies in Econometrics, Time Series, and Multivariate Statistics. Academic Press, Incorporated, New York, 1983.
  8. Aitchison, J., and J. A. C. Brown. The Lognormal Distribution with Special Reference to Its Uses in Economics. Cambridge University Press, Cambridge, 1976.
  9. Mincer, J. Schooling, Experience and Earnings. National Bureau of Economic Research, New York, 1974.
  10. Haitovsky, Y. Regression Estimation from Grouped Observations. Hafner Press, New York, 1973.
  11. Lillard, L., J. P. Smith, and F. Welch. What Do We Really Know About Wages? The Importance of Nonreporting and Census Imputation. Journal of Political Economy, Vol. 94, No. 3, 1986, pp. 489-506.
  12. Johnson, N., and S. Kotz. Distributions in Statistics: Continuous Multivariate Distributions. John Wiley & Sons, Incorporated, New York, 1972.
  13. Mannering, F., and D. A. Hensher. Discrete/Continuous Econometric Models and Their Applications to Transport Analysis. Transport Reviews, Vol. 7, No. 3, 1987, pp. 227-244.
  14. Stern, S. Imputing a Continuous Income Variable from a Bracketed Income Variable with Special Attention to Missing Observations. Economics Letters, Vol. 37, 1991, pp. 287-291.
  15. Heckman, J. J. Sample Selection Bias as a Specification Error. Econometrica, Vol. 47, 1979, pp. 153-161.
  16. Shah, S. M., and N. T. Parikh. Moments of Singly and Doubly Truncated Standard Bivariate Normal Distribution. Vidya, Vol. 7, 1964, pp.
  17. van Wissen, L., and H. J. Meurs. The Dutch National Mobility Panel: Experiences and Evaluation. Transportation, Vol. 16, No. 2, 1989.

ENDNOTE

1. Department of Civil Engineering and Environmental Engineering, University of Massachusetts, Amherst, Mass. 01003.

Publication of this paper sponsored by Committee on Passenger Travel Demand Forecasting.