Methodology Report #20: Class Variables for MEPS Expenditure Imputations


U.S. Department of Health and Human Services



Marc W. Zodet, Agency for Healthcare Research and Quality; Diana Z. Wobus, Westat; Steven R. Machlin and David Kashihara, Agency for Healthcare Research and Quality; and Deborah D. Dougherty, Westat.

Table of Contents

Abstract

The Medical Expenditure Panel Survey (MEPS)

Introduction

Background

Methodology

Examples

Table 1. Mean total expenditures for physician office visits and inpatient hospital stays, by year (standard error)

Table 2. P-Values (Wald F Statistics) from weighted regression models, by year (SUDAAN)

Table 3. Order of entry into weighted regression models, by year (STEPWISE procedure)

Table 4. Coefficients for select variables from weighted regression models, by year (SUDAAN)

Table 5. Final class variable list for imputing physician office visit expenditures

Table 6. P-values (Wald F Statistics) from weighted regression models, by year (SUDAAN)

Table 7. Order of entry into weighted regression models, by year (STEPWISE procedure)

Table 8. Coefficients for select variables; weighted regression models, by year (SUDAAN)

Table 9. Final class variable list for imputing inpatient hospital expenditures

Table 10. Coefficients for reason in hospital; weighted regression models, by year (SUDAAN)

Summary

References

Abstract

The Medical Expenditure Panel Survey (MEPS) collects data on health care utilization, expenditures, sources of payment, insurance coverage, and health care quality measures. The survey was designed to produce national and regional estimates for the U.S. civilian noninstitutionalized population. The data on medical expenses are collected from both household respondents in the Household Component and from a sample of their health care providers in the Medical Provider Component. In the absence of payment information from either component, expenditure data are derived for sample persons through an imputation process. Missing expense data are imputed at the event level for each medical event type using a weighted hot-deck procedure. This process utilizes individual- and event-level data collected in MEPS that are correlated with medical expenditures. Bivariate analyses and linear regression models were used to assess the current class variables used for imputation. This paper details the methodology used to select, prioritize, and categorize the class variables used to impute missing expenditures for two event types: doctor visits and inpatient hospitalizations.

The estimates in this report are based on the most recent data available at the time the report was written. However, selected elements of MEPS data may be revised on the basis of additional analyses, which could result in slightly different estimates from those shown here. Please check the MEPS Web site for the most current file releases.

Center for Financing, Access, and Cost Trends
Agency for Healthcare Research and Quality
540 Gaither Road
Rockville, MD 20850
http://www.meps.ahrq.gov/


The Medical Expenditure Panel Survey (MEPS)

Background

The Medical Expenditure Panel Survey (MEPS) is conducted to provide nationally representative estimates of health care use, expenditures, sources of payment, and insurance coverage for the U.S. civilian noninstitutionalized population. MEPS is cosponsored by the Agency for Healthcare Research and Quality (AHRQ), formerly the Agency for Health Care Policy and Research, and the National Center for Health Statistics (NCHS).

MEPS comprises three component surveys: the Household Component (HC), the Medical Provider Component (MPC), and the Insurance Component (IC). The HC is the core survey, and it forms the basis for the MPC sample and part of the IC sample. Together these surveys yield comprehensive data that provide national estimates of the level and distribution of health care use and expenditures, support health services research, and can be used to assess health care policy implications.

MEPS is the third in a series of national probability surveys conducted by AHRQ on the financing and use of medical care in the United States. The National Medical Care Expenditure Survey (NMCES) was conducted in 1977, the National Medical Expenditure Survey (NMES) in 1987. Beginning in 1996, MEPS continues this series with design enhancements and efficiencies that provide a more current data resource to capture the changing dynamics of the health care delivery and insurance system.

The design efficiencies incorporated into MEPS are in accordance with the Department of Health and Human Services (DHHS) Survey Integration Plan of June 1995, which focused on consolidating DHHS surveys, achieving cost efficiencies, reducing respondent burden, and enhancing analytical capacities. To accommodate these goals, new MEPS design features include linkage with the National Health Interview Survey (NHIS), from which the sample for the MEPS-HC is drawn, and enhanced longitudinal data collection for core survey components. The MEPS-HC augments NHIS by selecting a sample of NHIS respondents, collecting additional data on their health care expenditures, and linking these data with additional information collected from the respondents’ medical providers, employers, and insurance providers.

Household Component

The MEPS-HC, a nationally representative survey of the U.S. civilian noninstitutionalized population, collects medical expenditure data at both the person and household levels. The HC collects detailed data on demographic characteristics, health conditions, health status, use of medical care services, charges and payments, access to care, satisfaction with care, health insurance coverage, income, and employment.

The HC uses an overlapping panel design in which data are collected through a preliminary contact followed by a series of five rounds of interviews over a two-and-a-half-year period. Using computer-assisted personal interviewing (CAPI) technology, data on medical expenditures and use for two calendar years are collected from each household. This series of data collection rounds is launched each subsequent year on a new sample of households to provide overlapping panels of survey data and, when combined with other ongoing panels, will provide continuous and current estimates of health care expenditures.

The sampling frame for the MEPS-HC is drawn from respondents to NHIS, conducted by NCHS. NHIS provides a nationally representative sample of the U.S. civilian noninstitutionalized population, with oversampling of Hispanics and blacks.

Medical Provider Component

The MEPS-MPC supplements and validates information on medical care events reported in the MEPS-HC by contacting medical providers and pharmacies identified by household respondents. The MPC sample includes all hospitals, hospital physicians, home health agencies, and pharmacies reported in the HC. Also included in the MPC are all office-based physicians:

Providing care for HC respondents receiving Medicaid.
Associated with a 75 percent sample of households receiving care through an HMO (health maintenance organization) or managed care plan.
Associated with a 25 percent sample of the remaining households.

Data are collected on medical and financial characteristics of medical and pharmacy events reported by HC respondents, including:

Diagnoses coded according to ICD-9 (9th Revision, International Classification of Diseases) and DSM-IV (Fourth Edition, Diagnostic and Statistical Manual of Mental Disorders).
Physician procedure codes classified by CPT-4 (Current Procedural Terminology, Version 4).
Inpatient stay codes classified by DRG (diagnosis related group).
Prescriptions coded by national drug code (NDC), medication names, strength, and quantity dispensed.
Charges, payments, and the reasons for any difference between charges and payments.

The MPC is conducted through telephone interviews and mailed survey materials.

Insurance Component

The MEPS-IC collects data on health insurance plans obtained through private and public sector employers. Data obtained in the IC include the number and types of private insurance plans offered, benefits associated with these plans, premiums, contributions by employers and employees, and employer characteristics.

Establishments participating in the MEPS-IC are selected through three sampling frames:

A list of employers or other insurance providers identified by MEPS-HC respondents who report having private health insurance at the Round 1 interview.
A Bureau of the Census list frame of private-sector business establishments.
The Census of Governments from the Bureau of the Census.

To provide an integrated picture of health insurance, data collected from the first sampling frame (employers and other insurance providers) are linked back to data provided by the MEPS-HC respondents. Data from the other two sampling frames are collected to provide annual national and State estimates of the supply of private health insurance available to American workers and to evaluate policy issues pertaining to health insurance. Since 2000, the Bureau of Economic Analysis has used national estimates of employer contributions to group health insurance from the MEPS-IC in the computation of Gross Domestic Product (GDP).

The MEPS-IC is an annual panel survey. Data are collected from the selected organizations through a prescreening telephone interview, a mailed questionnaire, and a telephone follow-up for nonrespondents.

Survey Management

MEPS data are collected under the authority of the Public Health Service Act. They are edited and published in accordance with the confidentiality provisions of this act and the Privacy Act. NCHS provides consultation and technical assistance.

As soon as data collection and editing are completed, the MEPS survey data are released to the public in staged releases of summary reports and microdata files. Summary reports are released as printed documents and electronic files. Microdata files are released on CD-ROM and/or as electronic files.

Printed documents and CD-ROMs are available through the AHRQ Publications Clearinghouse. Write or call:

AHRQ Publications Clearinghouse
Attn: (publication number)
P.O. Box 8547, Silver Spring, MD 20907
800-358-9295
703-437-2078 (callers outside the United States only)
888-586-6340 (toll-free TDD service; hearing impaired only)

To order online, send an e-mail to: ahrqpubs@ahrq.gov.

Be sure to specify the AHRQ number of the document or CD-ROM you are requesting. Selected electronic files are available through the Internet on the MEPS Web site: http://www.meps.ahrq.gov/

For more information, visit the MEPS Web site or e-mail mepspd@ahrq.gov.

Introduction

The Medical Expenditure Panel Survey (MEPS) collects data on health care utilization, expenditures, sources of payment, insurance coverage, and health care quality measures. The survey, conducted annually since 1996 by the Agency for Healthcare Research and Quality (AHRQ), is designed to produce national and regional estimates for the U.S. civilian noninstitutionalized population.

MEPS data on medical expenses are collected from both household respondents in the Household Component and from a sample of their health care providers in the Medical Provider Component. When payment (i.e., expenditure) information is missing from either component, these data are derived through an imputation process. Expense data are collected at the event level for each medical event type, and a weighted hot-deck procedure is used for imputation. This process utilizes individual- and event-level data collected in MEPS that are correlated with medical expenditures. AHRQ uses bivariate analyses and linear regression models to assess potential variables to use in imputation.

Using office-based visits and inpatient stays as examples, this paper details the methodology used to select, prioritize, and categorize the class variables used to impute missing expenditure data. The paper does not address how the imputations are actually carried out; for a more detailed description of the imputation procedure, see Machlin and Dougherty, 2004.


Background

Class variables

A key component of a hot-deck procedure is the matching of sample observations with missing information (i.e., recipients) to similar sample observations not missing the information (i.e., donors). Categorical or "class" variables that characterize the sample observations are used to classify both recipients and donors into imputation cells (i.e., classes). Within each imputation cell, the recipients’ missing values are imputed from the values of the donors. Variables that are considered important predictors of the data to be imputed are the primary candidates for use as class variables. The underlying assumption is that the recipients have similar values with regard to the measure of interest as the donors and that the data associated with the donors within the same imputation cell are appropriate for the imputation of the missing values (Cox, 1980).
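To make the cell mechanics concrete, here is a minimal Python sketch of matching recipients to donors within imputation cells. It is a simplification under stated assumptions: events are plain dictionaries with hypothetical field names (`id`, `weight`, `expenditure`), and each recipient draws one donor from its cell with probability proportional to the donor's survey weight. This is not Cox's weighted sequential hot deck, and the example values are invented.

```python
import random
from collections import defaultdict

def build_cells(events, class_vars):
    """Group donor events (those with a reported expenditure) into imputation cells."""
    cells = defaultdict(list)
    for ev in events:
        if ev["expenditure"] is not None:
            cells[tuple(ev[v] for v in class_vars)].append(ev)
    return cells

def hot_deck(events, class_vars, seed=0):
    """Give each recipient the expenditure of a donor from the same cell.

    Donors are drawn with probability proportional to their survey weight;
    recipients whose cell has no donor are left missing.
    """
    rng = random.Random(seed)
    cells = build_cells(events, class_vars)  # built before any imputation
    for ev in events:
        if ev["expenditure"] is None:
            donors = cells.get(tuple(ev[v] for v in class_vars), [])
            if donors:
                donor = rng.choices(donors, weights=[d["weight"] for d in donors])[0]
                ev["expenditure"] = donor["expenditure"]
                ev["donor_id"] = donor["id"]
    return events

# Hypothetical events: event 3 is a recipient in the (surgery=yes, radiology=no) cell.
events = [
    {"id": 1, "surgery": "yes", "radiology": "no", "weight": 1200.0, "expenditure": 310.0},
    {"id": 2, "surgery": "no", "radiology": "no", "weight": 900.0, "expenditure": 75.0},
    {"id": 3, "surgery": "yes", "radiology": "no", "weight": 1500.0, "expenditure": None},
]
hot_deck(events, ["surgery", "radiology"])
print(events[2]["expenditure"])  # 310.0, from the only donor in its cell
```

Building the cells before imputing keeps imputed values from being reused as donor values within the same pass.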

Class variables are typically ordered in accordance with predictive importance (i.e., more important predictors are ranked higher). If there are fewer donors than recipients in a cell, then the procedure will begin collapsing over the categories of the class variables, starting at the bottom of the list and working up, until a sufficient number of donors are available.
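The collapsing rule can be sketched in the same spirit. In this hypothetical helper, `class_vars` is ordered from most to least important, and a cell with fewer donors than recipients drops its lowest-priority remaining variable until enough donors match. Dropping a whole variable is a simplification: the procedure described here collapses over categories, which can also mean merging levels within a variable.

```python
from collections import Counter

def collapsed_keys(recipients, donors, class_vars):
    """Map each recipient cell to the class variables it keeps after collapsing.

    class_vars is ordered from most to least important; variables are dropped
    from the bottom of the list until the cell has at least as many donors as
    recipients (or no variables are left).
    """
    rec_counts = Counter(tuple(r[v] for v in class_vars) for r in recipients)
    result = {}
    for full_key, n_recipients in rec_counts.items():
        for k in range(len(class_vars), -1, -1):
            prefix = full_key[:k]  # values of the top-k class variables
            n_donors = sum(1 for d in donors
                           if tuple(d[v] for v in class_vars[:k]) == prefix)
            if n_donors >= n_recipients:
                break
        result[full_key] = class_vars[:k]
    return result

class_vars = ["insurance", "surgery", "radiology"]  # hypothetical priority order
donors = [
    {"insurance": "private", "surgery": "yes", "radiology": "no"},
    {"insurance": "private", "surgery": "yes", "radiology": "yes"},
    {"insurance": "medicaid", "surgery": "no", "radiology": "no"},
]
recipients = [
    {"insurance": "private", "surgery": "yes", "radiology": "no"},
    {"insurance": "private", "surgery": "yes", "radiology": "no"},
    {"insurance": "medicaid", "surgery": "no", "radiology": "no"},
]
result = collapsed_keys(recipients, donors, class_vars)
print(result)
```

Here the private/surgery/no-radiology cell has two recipients but one exact donor, so radiology (the lowest-priority variable) is dropped; the Medicaid cell keeps all three variables.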

MEPS event types

MEPS expenditure data are imputed separately for each of 10 event types: hospital inpatient stays, hospital outpatient department visits, emergency room visits, office-based visits (physician and non-physician), home health (agency and paid independent), dental, other medical equipment/supplies, and prescription medications. Separate imputations are conducted for each event type because the relevant variables and statistically significant correlates differ across event types. The class variables are therefore evaluated and chosen separately for each event type, although some class variables are used for more than one type. For example, patient age is a class variable in the imputations of both emergency room expenditures and dental expenditures. Even when the same class variable is used for multiple event types, its categories may be defined differently in each imputation. The remainder of this paper discusses the process by which variables are evaluated and selected for use in the creation of imputation cells.


Methodology

The lists of class variables used to impute event-specific expenditures were initially established based on the first year of MEPS data (1996). The process of identifying predictors of total expenditures was based both on substantive decisions and statistical associations that were identified primarily through multiple linear regression models. In 2002, analysts from AHRQ and Westat, the data collection contractor, jointly began to reevaluate and revise these lists of class variables. The methods presented in this section and the Examples section below are reflective of those efforts and focus primarily on the quantitative methods used in the decision process.

Data

Event-level data were used for these analyses. Only events that were potential donors (i.e., complete on the Household Component and/or the Medical Provider Component) were included. Three years of data were examined: 1997, 1998, and 1999. For the most part, each year was analyzed separately. However, when the number of events was small (e.g., home health services), years of data were pooled to stabilize the variance of the estimates.

Potential class variables

The class variables considered for the imputation were those collected in MEPS that were thought a priori to potentially have a significant impact on total expenditures. Two variables were considered important enough to be included in all imputation procedures: type of insurance coverage and total charges. The former was chosen because the payment for health care services can vary widely by insurance status and type of insurance coverage (e.g., private, Medicare, Medicaid, etc.); the latter because total charges are highly correlated with total expenditures. Unfortunately, when expenditures are missing, total charges are also frequently missing.

Other potential predictors of expenditures were selected quantitatively. These included various indicators of health care services (e.g., laboratory tests, radiology, surgeries/extractions, etc.). Predictors can be specific to the type of event. For example, the number of nights is associated with inpatient hospital stays, but is not relevant to physician office visits.


Regression models

Multiple linear regression was used to evaluate the statistical associations between potential class variables and total expenditures. The dependent variable in each model was total expenditures for the event. Total expenditures were defined as the sum of direct payments for care provided during the year, including out-of-pocket payments, third-party payments (e.g., private insurance, Medicare, and Medicaid), and payments from other miscellaneous sources.

Two approaches were taken when fitting the regression models to assess the association between potential class variables and total expenditures. First, to adjust for the complex design of MEPS, linear regression models were fit using PROC REGRESS in the SUDAAN statistical software package (www.rti.org/sudaan). With these models, the two primary considerations were 1) whether or not the resulting regression coefficients were significant and 2) the relative magnitude and direction of the significant coefficients. Statistical significance was determined at the α=0.05 level. To provide additional guidance in the selection of variables, models were fit using SAS PROC STEPWISE (http://www.sas.com/). The significance level for entry and retention was 0.15 (the SAS default). Block entry grouping of variables was used to ensure that all levels of a particular variable were entered, retained, or eliminated as a group.
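In a design-based linear regression, the sampling weights determine the coefficient estimates, while the strata and primary sampling units enter only through the variance estimates. The coefficients can therefore be illustrated with weighted least squares. The sketch below assumes NumPy and synthetic data; it computes β = (X′WX)⁻¹X′Wy with the survey weights as W and codes each yes/no class variable as an indicator against a reference level. It does not reproduce SUDAAN's design-based standard errors or the Wald F statistics reported in the tables.

```python
import numpy as np

def indicators(values, reference):
    """Indicator (dummy) columns for every level except the reference level."""
    levels = sorted(set(values) - {reference})
    cols = np.array([[1.0 if v == lev else 0.0 for lev in levels] for v in values])
    return cols, levels

def weighted_coefficients(X, y, w):
    """Weighted least-squares estimates beta = (X'WX)^{-1} X'Wy, with an intercept.

    Only point estimates are returned; no design-based standard errors.
    """
    X = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic events: expenditure = 80 + 200*surgery + 70*radiology exactly.
surgery = ["yes", "no", "no", "yes", "no"]
radiology = ["no", "yes", "no", "yes", "no"]
y = np.array([280.0, 150.0, 80.0, 350.0, 80.0])
w = np.array([1500.0, 800.0, 1200.0, 950.0, 600.0])  # hypothetical survey weights

s_cols, _ = indicators(surgery, reference="no")
r_cols, _ = indicators(radiology, reference="no")
beta = weighted_coefficients(np.column_stack([s_cols, r_cols]), y, w)
print(np.round(beta, 2))  # intercept, surgery, radiology
```

Because the synthetic data fit the model exactly, the weights do not change the estimates here; with real, noisy data they would.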

Results from both sets of models (i.e., those fit using SUDAAN and those fit using SAS) were considered when selecting the final list of class variables for the imputation procedures. The model results were also used to prioritize the class variables, with the most important substantive and statistical predictors ranked highest, and to determine the collapsing strategies for variables with three or more levels. When imputation cells had to be collapsed because too few donors were available, the most important predictors of total expenditures (i.e., those higher on the list) were preserved, so that recipients and donors remained matched on those predictors.
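Entry order, the other piece of evidence used to rank variables, can be illustrated with a forward-selection sketch that enters variables in blocks, so that all indicator columns of a variable enter together. This is an assumption-laden stand-in, not SAS PROC STEPWISE: it uses a weighted partial F statistic with a hypothetical F-to-enter threshold in place of the 0.15 significance level, it never removes a variable once entered, and the data are synthetic.

```python
import numpy as np

def weighted_rss(X, y, w):
    """Weighted residual sum of squares from a least-squares fit of y on X."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = (y - X @ beta) * sw
    return float(resid @ resid)

def forward_block_entry(blocks, y, w, f_to_enter=2.0):
    """Order in which variable blocks enter a forward-selection model.

    blocks maps each variable name to its indicator columns, so all levels of
    a variable enter together. f_to_enter is a hypothetical threshold.
    """
    n = len(y)
    X = np.ones((n, 1))  # intercept-only starting model
    remaining, order = dict(blocks), []
    while remaining:
        rss_old = weighted_rss(X, y, w)
        best, best_f = None, f_to_enter
        for name, cols in remaining.items():
            X_new = np.column_stack([X, cols])
            rss_new = weighted_rss(X_new, y, w)
            df_block = cols.shape[1]
            df_resid = n - X_new.shape[1]
            f_stat = ((rss_old - rss_new) / df_block) / (rss_new / df_resid)
            if f_stat > best_f:
                best, best_f = name, f_stat
        if best is None:  # no remaining block clears the threshold
            break
        X = np.column_stack([X, remaining.pop(best)])
        order.append(best)
    return order

# Synthetic data: two real effects and one irrelevant yes/no variable.
rng = np.random.default_rng(1)
n = 200
surgery = rng.integers(0, 2, n).astype(float)
radiology = rng.integers(0, 2, n).astype(float)
irrelevant = rng.integers(0, 2, n).astype(float)
y = 80 + 200 * surgery + 70 * radiology + rng.normal(0, 10, n)
w = rng.uniform(500, 2000, n)  # hypothetical survey weights

blocks = {"radiology": radiology[:, None], "irrelevant": irrelevant[:, None],
          "surgery": surgery[:, None]}
order = forward_block_entry(blocks, y, w)
print(order)  # the two large synthetic effects enter first
```

The blocks are listed with surgery last to show that entry order comes from the F statistics, not from the order in which candidates are supplied.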


Examples

As noted previously, the process for identifying class variables was performed separately for each type of event. Examples of how this process works for physician office visits and inpatient hospital stays are presented below. To provide a point of reference for the magnitude of total expenses attributed to each of these two types of medical events, table 1 presents mean total expenditures per event for 1997 through 1999 for events with complete (i.e., not imputed) data. In 2001, approximately one-third of the expenditure values were fully imputed for physician office visits and hospital inpatient stays.

Table 1. Mean total expenditures for physician office visits and inpatient hospital stays, by year (standard error)

	1997	1998	1999
Physician office visits ¹	$92 ($3)	$98 ($3)	$107 ($3)
Hospital inpatient stays ¹, ²	$5,647 ($301)	$5,375 ($304)	$5,929 ($367)

¹ Estimates are for patients with complete event data (i.e., donors).
² Only events of patients who did not die during the year.

During the late 1990s, total expenditures for a physician office visit averaged roughly $100 per event while facility expenditures for an inpatient hospital stay during this same period averaged approximately $5,600 per event.

Physician office visits

Table 2 summarizes p-values for regression model coefficients fit using SUDAAN (i.e., adjusted for the complex survey design). Separate models were fit for the years 1997, 1998, and 1999 with physician office visit expenditures as the dependent variable in each model. Independent variables in the models were those hypothesized as potentially significant predictors of office visit expenditures and were the candidate variables from which to select the class variables to create the imputation cells.

The information provided in table 2 shows that surgery, radiology, other services, and laboratory services were all statistically significant predictors of physician office visit expenditures across all three years (p-values < 0.01). Other variables were statistically significant predictors in some years, but not others. For example, patient age was highly significant (p-value < 0.01) in 1999, but not in the two preceding years.

Table 2. P-Values (Wald F Statistics) from weighted regression models, by year (SUDAAN)
Dependent variable = physician office visit expenditures

	1997	1998	1999
# Obs used in regression	48,815	34,948	31,978
R²	0.043	0.048	0.032
Class variable ¹
Surgery (yes; no)	<0.01	<0.01	<0.01
Radiology (yes; no)	<0.01	<0.01	<0.01
Other services (yes; no)	<0.01	<0.01	<0.01
Laboratory services (yes; no)	<0.01	<0.01	<0.01
Saw non-MD (yes; no)		<0.10	<0.10
Age (<18; 18-24; 25-64; 65+)			<0.01
Perceived health (poor; other)		<0.10	<0.05
Race/ethnicity (Hispanic; other)
Census region (S; MW; NE; W)
MSA (MSA; Non-MSA)	<0.05	<0.10

¹ Variables forced into the models are not shown (e.g., Insurance Source of Payment [Private; Medicare; Medicaid; CHAMPUS/TRICARE], Decile of Total Charges, and HMO Indicator [Yes; No])


Results from fitting the STEPWISE models for each year are presented in table 3, which shows the order in which the independent variables entered into the models. Surgery, radiology, and other services were consistently the first, second, and third variables entered into the model each year. Perceived health and laboratory services alternated as the fourth and fifth variables, depending on the year.

Table 3. Order of entry into weighted regression models, by year (STEPWISE procedure)
Dependent variable = physician office visit expenditures

	1997	1998	1999
# Obs used in regression	48,815	34,948	31,978
R²	0.042	0.048	0.032
Variable entry order
1st	Surgery	Surgery	Surgery
2nd	Radiology	Radiology	Radiology
3rd	Other services	Other services	Other services
4th	Perceived health	Lab services	Perceived health
5th	Lab services	Perceived health	Lab services
6th	Saw non-MD	Age	Age
7th	Region	Saw non-MD	Region
8th	Region	Region	Saw non-MD

Table 4 presents the SUDAAN regression coefficients for selected variables used in the model. This table illustrates that surgery was consistently associated with higher physician office visit expenditures. For the years observed (i.e., 1997–1999), the average additional expenditure associated with having a surgical procedure during a physician office visit was approximately $200, when controlling for the other variables in the model. This additional expenditure was substantially greater than that associated with any of the other factors considered. For example, the difference between the surgery and radiology coefficients (radiology had the second-largest effect) ranged from approximately $115 in 1999 ($196 minus $81) to approximately $136 in 1997 ($205 minus $69).

Table 4. Coefficients for select variables from weighted regression models, by year (SUDAAN)
Dependent variable = physician office visit expenditures

	β-Coefficients (SE β-Coefficients)
Class variable	1997	1998	1999
Surgery	$205 ($25)	$198 ($28)	$196 ($28)
Radiology	$69 ($5)	$79 ($7)	$81 ($9)
Other services	$53 ($9)	$44 ($10)	$58 ($8)
Lab services	$21 ($4)	$24 ($6)	$20 ($6)
Perceived health	$40 ($25)	$30 ($17)	$34 ($14)
Saw non-MD	-$8 ($6)	-$14 ($8)	-$11 ($7)

Among the four most highly significant variables (i.e., surgery, radiology, other services, and laboratory services), the magnitudes of the coefficients (i.e., the average additional expenditures associated with a particular service) tended to decrease in line with the order in which the variables entered the STEPWISE models. However, while the expenditures associated with surgery were consistently higher than those for any of the other factors considered, the differences among the other services (i.e., radiology, other services, and laboratory services) varied from year to year. For example, a simple comparison of the radiology and other services coefficients showed no significant difference in 1997, but a significant difference in 1998, when payments for office visits involving a radiology service ran about $35 more per visit than those involving other services ($79 versus $44). In summary, of the factors considered, surgery clearly had the greatest impact on increasing physician office visit expenditures.
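The kind of simple comparison described above can be sketched as a z statistic for the difference between two coefficients, using the table 4 estimates for radiology and other services. Table 4 does not report the covariance between the two estimates, so this sketch assumes it is zero; that assumption and the normal reference distribution are ours, not the report's, so the result is only a rough check.

```python
import math

def coef_difference_z(b1, se1, b2, se2, cov=0.0):
    """z statistic for the difference b1 - b2 between two regression coefficients.

    cov is the covariance of the two estimates. It defaults to 0 here only
    because table 4 does not report it; a real test would take it from the
    fitted model's covariance matrix.
    """
    return (b1 - b2) / math.sqrt(se1**2 + se2**2 - 2 * cov)

# Radiology versus other services: coefficients and SEs from table 4.
z_1997 = coef_difference_z(69, 5, 53, 9)
z_1998 = coef_difference_z(79, 7, 44, 10)
print(round(z_1997, 2), round(z_1998, 2))  # prints 1.55 2.87
```

Under these assumptions, the 1997 difference falls short of the two-sided 0.05 critical value of 1.96, and the 1998 difference exceeds it.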


The final list of class variables used to impute physician office visit expenditures is presented in table 5. The top three variables were chosen on substantive grounds: HMO (an indicator of whether or not the patient was enrolled in an HMO), type of insurance coverage, and total charges. The remaining variables were chosen based on the regression results. Surgery, radiology, and other services followed, in that order, primarily because each was highly significant in the SUDAAN models for all three years and because they were consistently the first three variables entered into the STEPWISE models. The laboratory services variable was placed above the perceived health variable because it was more highly significant in each of the SUDAAN models and because it entered into the STEPWISE models before the perceived health variable for two of the three years. In turn, the perceived health variable was more statistically significant in the SUDAAN models than the saw non-MD variable and entered each of the STEPWISE models before saw non-MD, so it was placed higher on the list. Although each was statistically significant in at least one of the years examined, neither age nor MSA (metropolitan statistical area) was included on the final list of class variables: age was significant in only one year (p-value < 0.01), and MSA was never retained in any of the STEPWISE models.

Table 5. Final class variable list for imputing physician office visit expenditures

1.	HMO
2.	Type of Insurance Coverage
3.	Total charges
4.	Surgery
5.	Radiology
6.	Other services
7.	Laboratory services
8.	Perceived health
9.	Saw non-MD

Hospital inpatient stays

Table 6 shows that, of the variables considered, the only statistically significant predictors of inpatient hospital stay expenditures in the SUDAAN models were length of stay and reason in hospital (p-values < 0.01). These results were consistent across all three years. Results from the STEPWISE models confirmed the importance of both length of stay (LOS) and reason in hospital, which were consistently the first and second variables, respectively, added to each of the models (table 7).

Table 6. P-values (Wald F Statistics) from weighted regression models, by year (SUDAAN)
Dependent variable = inpatient hospital stay expenditures

	1997	1998	1999
# Obs used in regression	1,881	1,294	1,259
R²	0.40	0.36	0.44
Class variable ¹
ER before admission (yes; no)
HMO (yes; no)
Length of Stay (0, 1, 2,…6, 7, 8-13, 14-30, 31-60, 61+)	<0.01	<0.01	<0.01
Reason in hospital (surgery; treatment/therapy; diagnostic tests; give birth; to be born; other)	<0.01	<0.01	<0.01
Census region (NE; MW; S; W)
MSA (MSA; non-MSA)

¹ Variables forced into the models are not shown (e.g., Insurance Source of Payment [Private; Medicare; Medicaid; CHAMPUS/TRICARE] and Decile of Total Charges)
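The length-of-stay grouping listed in table 6 can be written as a small binning function. This is a sketch assuming an integer number of days as input; the function name and string labels are illustrative.

```python
def los_category(days):
    """Map a length of stay in days to the categories listed in table 6."""
    if days < 0:
        raise ValueError("length of stay cannot be negative")
    if days <= 7:
        return str(days)  # 0 through 7 are kept as single-day categories
    for low, high in ((8, 13), (14, 30), (31, 60)):
        if low <= days <= high:
            return f"{low}-{high}"
    return "61+"

print([los_category(d) for d in (0, 7, 8, 13, 30, 45, 90)])
```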


Table 7. Order of entry into weighted regression models, by year (STEPWISE procedure)
Dependent variable = inpatient hospital stay expenditures

	1997	1998	1999
# Obs used in regression	1,881	1,294	1,259
R²	0.32	0.31	0.31
Variable entry order
1st	LOS	LOS	LOS
2nd	Reason	Reason	Reason
3rd	ER before	Region	Region
4th	Region		HMO

The SUDAAN coefficients for length of stay and reason in hospital are presented in table 8. For the most part, mean expenditures per stay increased as the length of stay increased. The coefficients for the longest stays behaved erratically, however; for example, in 1997 and 1998 average expenditures dropped sharply for stays of more than 60 days relative to stays of 31–60 days. This may have been due to the influence of outliers in the 31–60 day category and/or may suggest that some other functional form of the variable was more appropriate, but it had no impact on our decision to include length of stay as a high-priority variable. Among the reasons for hospitalization, surgery made the largest contribution to inpatient expenditures: relative to surgery, the coefficients for treatment/therapy, diagnostic tests, giving birth, and other reasons were each at least $3,000 lower in every year.

Table 8. Coefficients for select variables; weighted regression models, by year (SUDAAN)
Dependent variable = inpatient hospital stay expenditures

		β-Coefficients (SE β-Coefficients)
Class variable		1997	1998	1999
Length of stay (days)	0 (Reference)	$0 ($0)	$0 ($0)	$0 ($0)
	1	$2,121 ($411)	$2,020 ($488)	$771 ($550)
	2	$3,824 ($448)	$3,073 ($480)	$2,146 ($638)
	3	$4,715 ($523)	$3,792 ($505)	$3,126 ($569)
	4	$5,637 ($615)	$5,239 ($727)	$4,193 ($708)
	5	$6,922 ($933)	$6,624 ($976)	$4,436 ($707)
	6	$7,853 ($836)	$7,307 ($1,236)	$6,165 ($1,125)
	7	$8,532 ($927)	$7,180 ($1,110)	$7,340 ($1,066)
	8-13	$10,555 ($1,053)	$8,722 ($761)	$8,769 ($1,124)
	14-30	$18,967 ($3,048)	$18,123 ($2,706)	$19,409 ($4,170)
	31-60	$44,950 ($12,311)	$25,739 ($6,567)	$39,188 ($17,209)
	61+	$5,484 ($827)	$15,107 ($9,416)	$48,210 ($11,756)
Reason in hospital	Surgery (reference)	$0 ($0)	$0 ($0)	$0 ($0)
	Treatment/therapy	-$4,342 ($590)	-$3,906 ($676)	-$4,937 ($882)
	Diagnostic tests	-$4,315 ($570)	-$3,543 ($521)	-$4,998 ($734)
	Give birth	-$3,380 ($461)	-$3,122 ($532)	-$3,780 ($622)
	To be born	$2,456 ($4,525)	-$2,082 ($1,956)	-$6,554 ($1,701)
	Other	-$3,792 ($924)	-$3,600 ($1,525)	-$4,567 ($796)


The final list of class variables used to impute inpatient hospital expenditures is presented in table 9. As in all of the imputation procedures, type of insurance coverage and total charges were placed at the top of the list. In addition, an indicator of whether or not there was an emergency room (ER) event before the hospital admission was included because the billing information for the ER visit and the hospital stay is often rolled up into one expenditure figure for the stay. Based on the findings noted above, length of stay and reason in hospital followed, in that order. MSA status and census region were also included on the final list, based in part on their being retained in the STEPWISE models (p-values < 0.15).

Table 9. Final class variable list for imputing inpatient hospital expenditures

1.	Type of Insurance Coverage
2.	Total Charges
3.	ER before Admission
4.	Length of Stay
5.	Reason in Hospital
6.	MSA/Non-MSA
7.	Census Region

Class variable collapsing strategy

Results from the regression modeling presented above were also used to establish the collapsing strategy applied during the hot-deck procedure to variables with three or more levels, and the coefficients from the SUDAAN regression models weighed heavily in these decisions. For example, consider the reason in hospital variable described above. There was little difference between the coefficients for treatment/therapy and diagnostic tests only. Hence, prior to using the variable in the imputation procedure, it seemed reasonable to recode these two levels into one, reducing the variable from six levels to five (table 10). During the imputation procedure, further collapsing of the remaining levels was determined by the number of recipients and donors in a given imputation cell. Because surgery was associated with the highest mean expenditures, the hot deck was programmed to keep surgery as a separate category whenever possible.

Table 10. Coefficients for reason in hospital; weighted regression models, by year (SUDAAN)
Dependent variable = inpatient hospital stay expenditures

	β-Coefficients (SE β-Coefficients)
Reason in hospital	1997	1998	1999
Surgery (reference)	$0	$0	$0
Treatment/therapy ¹	-$4,342 ($590)	-$3,906 ($676)	-$4,937 ($882)
Diagnostic tests only ¹	-$4,315 ($570)	-$3,543 ($521)	-$4,998 ($734)
Give birth	-$3,381 ($461)	-$3,122 ($532)	-$3,780 ($622)
To be born	$2,456 ($4,525)	-$2,082 ($1,956)	-$6,554 ($1,701)
Other	-$3,792 ($924)	-$3,600 ($1,525)	-$4,567 ($796)

¹ Recoded into a single category (i.e., reason in hospital changes from a six-level variable to a five-level variable).
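The observation that treatment/therapy and diagnostic tests only had similar coefficients can be made mechanical with a simple heuristic: for each pair of non-reference levels, take the largest absolute difference between their table 8 coefficients across the three years, and merge the pair with the smallest such gap. This heuristic and the merged-level label are hypothetical illustrations, not the rule the authors used. Surgery, the reference level, is never a candidate for merging, so it stays separate.

```python
from itertools import combinations

# Non-reference coefficients for reason in hospital (1997, 1998, 1999), from table 8.
COEFS = {
    "treatment/therapy": (-4342, -3906, -4937),
    "diagnostic tests only": (-4315, -3543, -4998),
    "give birth": (-3380, -3122, -3780),
    "to be born": (2456, -2082, -6554),
    "other": (-3792, -3600, -4567),
}

def closest_pair(coefs):
    """Pair of levels whose coefficients differ least in their worst year."""
    def worst_gap(pair):
        a, b = pair
        return max(abs(x - y) for x, y in zip(coefs[a], coefs[b]))
    return min(combinations(coefs, 2), key=worst_gap)

pair = closest_pair(COEFS)
merged_label = "treatment or diagnostic"  # hypothetical label for the merged level
recode = {level: (merged_label if level in pair else level)
          for level in ["surgery", *COEFS]}
print(pair)  # ('treatment/therapy', 'diagnostic tests only')
```

The recode maps six original levels onto five, with surgery unchanged.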


Summary

The process of selecting the most appropriate class variables to use when imputing health care expenditures is a combination of art and science that involves both substantive reasoning and statistical analysis. As illustrated above, predictors of expenses can vary by event type, and the selection of class variables includes the examination of both person characteristics and event characteristics. Careful selection of class variables should improve the quality of the hot-deck imputation procedure and reduce bias in MEPS expenditure estimates. The class variables used to impute health care expenditure data in MEPS are periodically reviewed and refined. Class variables being considered for future inclusion in the imputation procedures include provider specialty for ambulatory events and person-level condition information.


References

Cox, B. (1980). The Weighted Sequential Hot Deck Imputation Procedure. American Statistical Association 1980 Proceedings of the Section on Survey Research Methods, 721–726.

Machlin, S. and Dougherty, D. (2004). Overview of Methodology for Imputing Missing Expenditure Data in the Medical Expenditure Panel Survey. Methodology Report No. 19. March 2007. Agency for Healthcare Research and Quality, Rockville, Md. http://www.meps.ahrq.gov/mepsweb/data_files/publications/mr19/mr19.pdf



Suggested Citation:
Zodet, M. W., Wobus, D. Z., Machlin, S. R., Kashihara, D., and Dougherty, D. D. Class Variables for MEPS Expenditure Imputations. Methodology Report No. 20. March 2007. Agency for Healthcare Research and Quality, Rockville, Md. http://www.meps.ahrq.gov/mepsweb/data_files/publications/mr20/mr20.shtml


Agency for Healthcare Research and Quality 540 Gaither Road Rockville, MD 20850 Telephone: (301) 427-1364