Department of Health and Human Services
Administration for Children and Families

Office of Planning, Research & Evaluation (OPRE)

Appendix 4.1: Imputations for Item Nonresponse in the Fall 2002 Data

To facilitate analysis of the data, and to ensure that the results obtained by different analysts are consistent with one another, it is desirable to impute missing responses to produce as complete a data set as possible. Imputation also helps to control for nonresponse bias and produce a more representative file for analysis. For example, many software packages select only the cases that are complete on the set of variables analyzed and ignore the cases with incomplete data. Discarding incomplete cases is inefficient, but more seriously, the complete cases may not be representative of the target population; consequently, estimates derived from them are subject to nonresponse bias.

For this study, missing values for fall 2002 variables due to item nonresponse were imputed using hot deck imputation. Hot deck imputation is a procedure in which cases with missing values for specific variables have the “holes” in their records filled in with values from other, similar cases. Because the imputed values come from actual respondents’ values, hot deck imputation has the desirable property that imputed values are always realistic and preserve the underlying sampling variation in the data.

The “donor” case from which the imputed value is taken (also referred to as the respondent) is randomly selected from a pool of similar children who are matched to the “recipient” (or nonrespondent) on characteristics that are correlated with the variable being imputed. The aim is to construct pools (or imputation classes) that explain as much of the variance in the variable to be imputed as possible, but are of adequate size so that there is some minimum number of respondents in each class, and donors are not reused too many times. The assumption is that within each imputation class, the mechanism that leads to missing data is “ignorable”; that is, the missing values are as though they were missing at random. This means that the probability that a value is missing can depend on the values of the imputation class variables but, within a class, not on the missing outcome values. If implemented carefully, hot deck imputation can preserve the distribution of the data on measured variables so that estimates of distributional characteristics such as percentiles, variability, and correlation will not be distorted. However, if the item response rate is very high, a small percentage of imputed data will have very little effect on the distribution of the variable regardless of the imputation method.
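The donor-selection step described above can be sketched in a few lines of Python. This is an illustration only, not the code used for the study: the function name `hot_deck`, the record layout, the flag suffix `_imputed`, and the sample values are all hypothetical, and the sketch leaves a value missing when its class has no respondents.

```python
import random
from collections import defaultdict

def hot_deck(records, target, class_vars, seed=0):
    """Fill missing `target` values (None) with the value of a randomly
    chosen respondent from the same imputation class."""
    rng = random.Random(seed)

    # Donor pools: the reported values, grouped by imputation class.
    pools = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            pools[tuple(rec[v] for v in class_vars)].append(rec[target])

    completed = []
    for rec in records:
        new = dict(rec)
        new[target + "_imputed"] = False
        if rec[target] is None:
            pool = pools.get(tuple(rec[v] for v in class_vars))
            if pool:  # no respondents in the class: leave the hole
                new[target] = rng.choice(pool)
                new[target + "_imputed"] = True
        completed.append(new)
    return completed

# Hypothetical records; imputation classes are sex by home language.
children = [
    {"sex": "F", "language": "English", "score": 12},
    {"sex": "F", "language": "English", "score": 15},
    {"sex": "F", "language": "English", "score": None},
    {"sex": "M", "language": "Spanish", "score": 9},
    {"sex": "M", "language": "Spanish", "score": None},
]
completed = hot_deck(children, "score", ["sex", "language"])
```

Every imputed value in `completed` is a value actually reported by another child in the same class, which is the property the text attributes to hot deck imputation.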

The variables used to form imputation classes or cells were identified from chi-square tests of association and bivariate correlation coefficients. In some cases, they were also determined by skip patterns in the parent questionnaire and other requirements of logical consistency between questionnaire items. The imputation cells were created by cross-tabulating all of these variables at once. A donor was allowed to be used up to three times. When no more donors were available in an imputation class, adjacent cells were collapsed. The order of collapsing was specified so that levels of the least correlated cell variable were collapsed first, followed by the second least correlated variable, and so on, until a donor was found. Imputed values have been flagged so that an analyst has the option of not using the imputed data, for example when analyzing the effects of the imputed data on the results.
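The reuse limit and the collapsing order can be illustrated as follows. The sketch makes a simplifying assumption that should be read loudly: instead of merging adjacent levels of a cell variable, it drops the least correlated variable from the match entirely, then the next, until a donor turns up. The names `find_donor`, `MAX_USES`, and the sample variables are hypothetical.

```python
import random

MAX_USES = 3  # each donor may be used up to three times

def find_donor(recipient, donors, cell_vars, uses, rng):
    """`cell_vars` is ordered from most to least correlated with the
    variable being imputed. Try the full cross-classification first,
    then relax the match from the least correlated variable upward
    (a simplification of collapsing adjacent cells)."""
    for k in range(len(cell_vars), -1, -1):
        key_vars = cell_vars[:k]
        pool = [
            d for d in donors
            if uses.get(d["id"], 0) < MAX_USES
            and all(d[v] == recipient[v] for v in key_vars)
        ]
        if pool:
            donor = rng.choice(pool)
            uses[donor["id"]] = uses.get(donor["id"], 0) + 1
            return donor, key_vars
    return None, []

# Hypothetical data: one exact-match donor, four recipients.
donors = [
    {"id": 1, "tanf": "yes", "educ": "HS"},
    {"id": 2, "tanf": "yes", "educ": "College"},
]
recipients = [{"tanf": "yes", "educ": "HS"} for _ in range(4)]

rng = random.Random(0)
uses = {}
results = [find_donor(r, donors, ["tanf", "educ"], uses, rng) for r in recipients]
donor_ids = [donor["id"] for donor, _ in results]
```

In this example the first three recipients take donor 1 on an exact match; the fourth finds donor 1 exhausted, so the match is relaxed on `educ` and donor 2 is used.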

We imputed missing data for all fall 2002 demographic variables and the fall 2002 measures of each of the spring 2003 outcomes (e.g., parenting practices, child health, assessment scores, child socio-emotional behavior, and other scale variables). The variables that underwent imputation and their item nonresponse rates for the analysis sample used in this report (the spring 2003 child assessment respondents) are given in Exhibit A.4.1.1.
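As a small worked check, the item nonresponse rate reported for each variable is the imputed count divided by the sum of the reported and imputed counts. The helper name `percent_imputed` is hypothetical; the counts are taken from Exhibit A.4.1.1.

```python
def percent_imputed(reported, imputed):
    """Item nonresponse rate in percent, rounded to one decimal place."""
    return round(100.0 * imputed / (reported + imputed), 1)

# Father's employment status: 1,875 reported and 2,023 imputed.
father_employment = percent_imputed(1875, 2023)
# Derived child race: 3,882 reported and 16 imputed.
child_race = percent_imputed(3882, 16)
```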

The logical relationships between items were taken into account in the imputation to maintain consistency of the data and attempt to preserve correlations among variables. Closely correlated items such as assessment scores or socio-emotional scales were usually imputed from a single donor child. The donor was randomly selected from within a donor pool of children matched by treatment/control group assignment, language spoken at home, sex, race/ethnicity, and age in months as of September 1, 2002. The score and scale variables were imputed in groups according to similar patterns of missingness (i.e., the joint missing rates) and the degree of correlation among them. This strategy was viewed as a compromise between the desire to avoid throwing away reported scores and the goal of preserving the correlation among score variables. In general, only the missing scores were imputed on each record, and children with partially reported scores did not have them overwritten by the donor’s scores. However, for patterns of missingness represented by a small number of children, the donor’s scores were allowed to overwrite the reported scores in the interests of reducing the number of computer runs.

It should be noted that the percentage of child records with partial reporting of score and scale variables is small. The socio-emotional scales were either entirely missing or entirely reported for all but a trivial (< 0.1%) percentage of the sample. For the depression, locus of control, welfare, and crime and violence scales, 8.3 percent of the sample had partially missing data (5.6 percent were missing all but one scale, 2.5 percent were missing only one scale, and 0.2 percent were missing some other combination). For the continuous score variables, less than 5 percent of the sample had partial reporting of scores; most were either missing all scores or none.
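The general rule of filling only the missing scores from a single donor, while keeping whatever the child reported, might look like the sketch below. It is an assumption-laden illustration: the helper names, the variable names in `SCORES`, and the sample values are hypothetical, and donor selection is taken as already done.

```python
from collections import Counter

def missing_pattern(record, score_vars):
    """Tuple of booleans marking which scores are missing (None)."""
    return tuple(record[v] is None for v in score_vars)

def fill_from_donor(recipient, donor, score_vars):
    """Copy only the recipient's missing scores from one donor.
    Reported scores are kept; flags record which values were imputed."""
    filled = dict(recipient)
    flags = {}
    for v in score_vars:
        flags[v] = recipient[v] is None
        if flags[v]:
            filled[v] = donor[v]
    return filled, flags

SCORES = ["ppvt", "letter_word", "spelling"]  # hypothetical variable names
children = [
    {"id": 1, "ppvt": 40, "letter_word": None, "spelling": None},
    {"id": 2, "ppvt": None, "letter_word": None, "spelling": None},
    {"id": 3, "ppvt": 40, "letter_word": None, "spelling": None},
]
donor = {"id": 9, "ppvt": 55, "letter_word": 12, "spelling": 7}

# Children can be grouped by missingness pattern before imputing.
patterns = Counter(missing_pattern(c, SCORES) for c in children)

# Child 1 keeps the reported ppvt score; the other two come from the donor.
filled, flags = fill_from_donor(children[0], donor, SCORES)
```

Taking all of a child's missing scores from the same donor carries the donor's between-score relationships into the imputed values, which is the motivation the text gives for using a single donor.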

The order in which items are imputed is also important in preserving the correlation structure in the data, because some imputed items can be used to form imputation cells in the subsequent imputation of related items. This strategy was used, for example, in the imputation of categorical assessment scores, so that the first score that was imputed could be used to create imputation cells for the next score. It was also used throughout in the imputation of correlated demographic and household variables. Similarly, for items associated with a skip pattern in the parent questionnaire, the item that leads into the skip pattern was imputed first and the subsequent items were imputed depending on the value of the skip indicator. The demographic variables were imputed first, then used to impute parenting practice, household income, child health, assessment score, and scale variables. Items with the least amount of nonresponse within a group of related categorical variables were imputed first, then used in the imputation of items with larger amounts of missing data.
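The ordering rule for related items (least nonresponse first, with each completed item then added to the cell variables for the next) can be sketched as follows. The function names, the fallback to all respondents when a cell is empty, and the sample items are assumptions made for illustration; the sketch also assumes every item has at least one respondent.

```python
import random

def missing_count(records, var):
    return sum(1 for r in records if r[var] is None)

def impute_one(records, target, cell_vars, rng):
    """Single-variable hot deck, applied in place. Falls back to all
    respondents if no respondent matches on every cell variable."""
    respondents = [r for r in records if r[target] is not None]
    for r in records:
        if r[target] is None:
            pool = [d[target] for d in respondents
                    if all(d[v] == r[v] for v in cell_vars)]
            if not pool:
                pool = [d[target] for d in respondents]
            r[target] = rng.choice(pool)

def impute_in_sequence(records, related_vars, base_vars, seed=0):
    """Impute items from least to most nonresponse; each completed item
    joins the cell variables used for the items that follow."""
    rng = random.Random(seed)
    order = sorted(related_vars, key=lambda v: missing_count(records, v))
    cell_vars = list(base_vars)
    for var in order:
        impute_one(records, var, cell_vars, rng)
        cell_vars.append(var)
    return order

# Hypothetical skip-pattern pair: a lead-in item and its follow-up count.
records = [
    {"sex": "F", "spanked": "yes", "times_spanked": 2},
    {"sex": "F", "spanked": "no", "times_spanked": 0},
    {"sex": "F", "spanked": "no", "times_spanked": None},
    {"sex": "M", "spanked": "yes", "times_spanked": 3},
    {"sex": "M", "spanked": None, "times_spanked": None},
]
order = impute_in_sequence(records, ["times_spanked", "spanked"], ["sex"])
```

Because `spanked` is completed first and then used as a cell variable, the imputed count for each child is drawn from a donor with the same lead-in answer, so the two items stay logically consistent.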

In general, donors were randomly selected from within the same Head Start program within a cell when possible, collapsed with a geographically adjacent program in the cell when necessary. Programs were sorted within a cell by broad geographic area (our primary sampling unit, or PSU) within Census region, so adjacent programs tended to be from the same county or a nearby county. When there were a large number of imputation cells, the donor search often was broadened to the entire geographic PSU within a cell, and sometimes PSUs within a region were also collapsed. Some items such as fall scores required a closer match on demographic variables than geography or Head Start program in order to find a similar donor pool, and no attempt to stay within the PSU or program was made for these. Geography was also ignored for certain items requiring a very close match to the donor on other questionnaire items for logical consistency.
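The widening geographic search can be pictured as a sequence of progressively broader donor pools within the same imputation cell. The sketch below is a simplification under stated assumptions: it steps from program to PSU to Census region and omits the intermediate step of collapsing with a geographically adjacent program. The field names and `select_donor` are hypothetical.

```python
import random

LEVELS = ["program", "psu", "region"]  # narrowest to broadest

def select_donor(recipient, donors, rng):
    """Search within the recipient's cell, widening the geography
    one level at a time until some donor is available."""
    in_cell = [d for d in donors if d["cell"] == recipient["cell"]]
    for level in LEVELS:
        pool = [d for d in in_cell if d[level] == recipient[level]]
        if pool:
            return rng.choice(pool), level
    # Last resort: ignore geography and match on the cell alone.
    if in_cell:
        return rng.choice(in_cell), "cell"
    return None, None

donors = [
    {"id": 1, "cell": "A", "program": "P1", "psu": "S1", "region": "South"},
    {"id": 2, "cell": "A", "program": "P2", "psu": "S1", "region": "South"},
    {"id": 3, "cell": "A", "program": "P3", "psu": "S2", "region": "South"},
    {"id": 4, "cell": "B", "program": "P9", "psu": "S9", "region": "West"},
]
rng = random.Random(0)

same_program = select_donor(
    {"cell": "A", "program": "P1", "psu": "S1", "region": "South"}, donors, rng)
same_psu = select_donor(
    {"cell": "A", "program": "P7", "psu": "S2", "region": "South"}, donors, rng)
same_region = select_donor(
    {"cell": "A", "program": "P8", "psu": "S8", "region": "South"}, donors, rng)
cell_only = select_donor(
    {"cell": "B", "program": "P5", "psu": "S5", "region": "Midwest"}, donors, rng)
```

For items where geography was ignored in favor of a closer demographic match, the corresponding sketch would skip the loop over `LEVELS` and draw from the cell directly.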

The distribution of each imputed variable was compared before and after imputation to check that the imputation procedures had not appreciably changed the distribution of the variable. Correlation matrices were examined to check that bivariate correlations among scores and scales were not attenuated. Crosstabs between categorical variables involved in skip patterns and those requiring logical consistency were checked to make sure that inconsistencies had not been introduced. The only variable where the distribution shifted more than a trivial amount was father’s employment status, which had a very high missing rate of about 52 percent. The percentage of fathers employed full-time shifted from 74 percent to 71 percent, and the percentage unemployed increased from 16 percent to 20 percent. Fathers for whom employment status is unknown tend to come from cells with higher unemployment rates among respondents; thus, the inclusion of their imputed values raises the overall unemployment rate. The variables used to create imputation classes for employment status were receipt of food stamps, receipt of TANF, father’s level of education, father’s race, and PSU.
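A before-and-after comparison of this kind can be sketched for a categorical variable as follows. The helper names and the sample values are hypothetical and are not the study's data; they only show the mechanics of the check.

```python
from collections import Counter

def distribution(values):
    """Percent distribution over the non-missing values."""
    reported = [v for v in values if v is not None]
    counts = Counter(reported)
    return {cat: 100.0 * n / len(reported) for cat, n in counts.items()}

def max_shift(before, after):
    """Largest absolute change, in percentage points, in any category."""
    cats = set(before) | set(after)
    return max(abs(after.get(c, 0.0) - before.get(c, 0.0)) for c in cats)

# Illustrative values only: half the cases are missing before imputation.
before_values = ["full-time"] * 3 + ["unemployed"] * 1 + [None] * 4
after_values = ["full-time"] * 5 + ["unemployed"] * 3

before = distribution(before_values)
after = distribution(after_values)
shift = max_shift(before, after)
```

A shift near zero suggests the imputation left the distribution intact. A larger shift, as with father's employment status, is not necessarily an error: it can reflect nonrespondents being concentrated in cells whose respondents differ from the overall sample.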

Exhibit A.4.1.1: Item Nonresponse Rates for Imputed Variables
Variable Name | Reported Count | Imputed Count | Percent Imputed | Total of Reported and Imputed Count
Crime & Violence Maximum Likelihood Ability Estimate | 3,546 | 352 | 9.0% | 3,898
Crime & Violence IRT True-Score | 3,546 | 352 | 9.0% | 3,898
Number of children age 17 and under in household | 3,796 | 102 | 2.6% | 3,898
Restricting Child Movement Scale - fall | 3,539 | 359 | 9.2% | 3,898
Family Cultural Enrichment Scale | 3,524 | 374 | 9.6% | 3,898
Family Cultural Enrichment Scale 2 | 3,540 | 358 | 9.2% | 3,898
Removing Harmful Objects Subscale - fall | 3,538 | 360 | 9.2% | 3,898
# Times child is read to | 3,548 | 350 | 9.0% | 3,898
Safety Devices Subscale - fall | 3,538 | 360 | 9.2% | 3,898
Parental Safety Practices Scale - fall | 3,537 | 361 | 9.3% | 3,898
Spanked child in last week | 3,544 | 354 | 9.1% | 3,898
# Times spanked child | 3,528 | 370 | 9.5% | 3,898
Used time out in last week | 3,542 | 356 | 9.1% | 3,898
# Times used time out | 3,524 | 374 | 9.6% | 3,898
Adult books in home | 3,547 | 351 | 9.0% | 3,898
Derived caregiver's race | 141 | 7 | 4.7% | 148
Derived child race | 3,882 | 16 | 0.4% | 3,898
Child sex | 3,898 | 0 | 0.0% | 3,898
Derived father's race | 3,710 | 188 | 4.8% | 3,898
Head Start participation | 3,897 | 1 | 0.0% | 3,898
Derived mother's race | 3,777 | 121 | 3.1% | 3,898
Caregiver's age | 137 | 11 | 7.4% | 148
Child born in the United States | 3,792 | 106 | 2.7% | 3,898
Economic difficulty scale | 3,525 | 373 | 9.6% | 3,898
Father's employment status | 1,875 | 2,023 | 51.9% | 3,898
Father's highest educational attainment | 3,460 | 438 | 11.2% | 3,898
Father's marital status | 3,421 | 477 | 12.2% | 3,898
Father's age | 3,283 | 615 | 15.8% | 3,898
Biological father's immigrant status | 3,702 | 196 | 5.0% | 3,898
Biological father a recent immigrant | 1,273 | 90 | 6.6% | 1,363
Biological father lives with child | 3,660 | 238 | 6.1% | 3,898
Biological father years in the United States | 1,170 | 193 | 14.2% | 1,363
Grandparent in the household | 3,786 | 112 | 2.9% | 3,898
Anyone in household with health condition | 3,537 | 361 | 9.3% | 3,898
Homelessness | 3,535 | 363 | 9.3% | 3,898
Primary home language | 3,870 | 28 | 0.7% | 3,898
Biological mother's immigrant status | 3,773 | 125 | 3.2% | 3,898
Biological mother recent immigrant | 1,210 | 104 | 7.9% | 1,314
Biological mother lives with child | 3,789 | 109 | 2.8% | 3,898
Biological mother years in the United States | 1,210 | 104 | 7.9% | 1,314
Household monthly income range | 3,403 | 495 | 12.7% | 3,898
Mother's employment status | 3,598 | 300 | 7.7% | 3,898
Mother has a GED | 3,757 | 141 | 3.6% | 3,898
Biological mother educational attainment | 3,757 | 141 | 3.6% | 3,898
Mother's marital status | 3,759 | 139 | 3.6% | 3,898
Mother's age | 3,722 | 176 | 4.5% | 3,898
Number of moves in last 12 months | 3,449 | 449 | 11.5% | 3,898
Other caregiver's employment status | 135 | 13 | 8.8% | 148
Other caregiver's educational attainment | 134 | 14 | 9.5% | 148
Number of adults 18 and over in household | 3,534 | 364 | 9.3% | 3,898
Primary caregiver health impairs caring for child | 3,545 | 353 | 9.1% | 3,898
Primary caregivers health | 3,545 | 353 | 9.1% | 3,898
Child had dental care, fall 02 | 3,542 | 356 | 9.1% | 3,898
Child's health status, fall 02 | 3,544 | 354 | 9.1% | 3,898
Child had care for an injury, fall 02 | 3,537 | 361 | 9.3% | 3,898
Child has health insurance, fall 02 | 3,542 | 356 | 9.1% | 3,898
Child needs ongoing health care, fall 02 | 3,785 | 113 | 2.9% | 3,898
Child has regular place for medical care, fall 02 | 3,538 | 360 | 9.2% | 3,898
PELS, fall 02 | 3,548 | 350 | 9.0% | 3,898
Child has special needs, fall 02 | 3,787 | 111 | 2.8% | 3,898
Child has an unmet health need, fall 02 | 3,540 | 358 | 9.2% | 3,898
Housing problems scale | 3,514 | 384 | 9.9% | 3,898
Receives Food Stamps | 3,771 | 127 | 3.3% | 3,898
Receives TANF | 3,765 | 133 | 3.4% | 3,898
Respondent's relationship to child | 3,786 | 112 | 2.9% | 3,898
Public or subsidized housing | 3,523 | 368 | 9.5% | 3,891
Mother had a teen birth | 3,733 | 165 | 4.2% | 3,898
Number of children under age 6 in household | 3,796 | 102 | 2.6% | 3,898
Depression maximum likelihood ability estimate | 3,536 | 362 | 9.3% | 3,898
Depression IRT true-score | 3,536 | 362 | 9.3% | 3,898
Elision IRT score | 2,408 | 294 | 10.9% | 2,702
Elision true score | 2,408 | 294 | 10.9% | 2,702
PPVT IRT score | 3,187 | 465 | 12.7% | 3,652
PPVT true score | 3,187 | 465 | 12.7% | 3,652
PPVT standard score | 3,187 | 465 | 12.7% | 3,652
PPVT W-ability score | 3,187 | 465 | 12.7% | 3,652
Spanish Elision IRT score | 1,015 | 124 | 10.9% | 1,139
Spanish Elision true score | 1,015 | 124 | 10.9% | 1,139
TVIP IRT score | 1,038 | 101 | 8.9% | 1,139
TVIP true score | 1,038 | 101 | 8.9% | 1,139
TVIP standard score | 1,038 | 101 | 8.9% | 1,139
TVIP W-ability score | 1,038 | 101 | 8.9% | 1,139
Locus of control IRT scale score | 3,534 | 364 | 9.3% | 3,898
Locus of control true scale score | 3,534 | 364 | 9.3% | 3,898
Is respondent mother or father? | 3,796 | 102 | 2.6% | 3,898
How well did child do in bear counting | 3,434 | 464 | 11.9% | 3,898
Bear counting score | 3,260 | 638 | 16.4% | 3,898
Book score, total | 3,473 | 425 | 10.9% | 3,898
Color name score, total | 3,516 | 382 | 9.8% | 3,898
CTOPPP Elision total score | 2,408 | 294 | 10.9% | 2,702
CTOPPP Spanish Elision total score | 1,015 | 124 | 10.9% | 1,139
CTOPPP print score | 1,034 | 105 | 9.2% | 1,139
McCarthy total drawing score | 3,508 | 390 | 10.0% | 3,898
KFAST raw score | 3,220 | 678 | 17.4% | 3,898
PPVT: total score | 3,187 | 465 | 12.7% | 3,652
Print knowledge score: total | 3,445 | 453 | 11.6% | 3,898
TVIP: total score | 1,038 | 101 | 8.9% | 1,139
WJ3 Applied problems standard score | 2,392 | 310 | 11.5% | 2,702
S_WJ3APPLIED_W | 2,392 | 310 | 11.5% | 2,702
WJ3 Applied problems W score | 2,378 | 324 | 12.0% | 2,702
WJ3 Oral comprehension standard score | 2,378 | 324 | 12.0% | 2,702
WJ3 Oral comprehension W score | 2,426 | 276 | 10.2% | 2,702
WJ3 Spelling W score | 2,426 | 276 | 10.2% | 2,702
WJ3 Letter-word standard score | 3,217 | 435 | 11.9% | 3,652
WJ3 Letter-word W score | 3,217 | 435 | 11.9% | 3,652
WJ3 Applied problems total score | 2,392 | 310 | 11.5% | 2,702
WJ3 Oral comprehension, total score | 2,378 | 324 | 12.0% | 2,702
WJ3 Spelling, total score | 2,426 | 276 | 10.2% | 2,702
WJ3 Letter-word total score | 3,217 | 435 | 11.9% | 3,652
WM Applied problems total score | 1,017 | 122 | 10.7% | 1,139
WM Applied problems, standard score | 1,017 | 122 | 10.7% | 1,139
WM Applied problems, W score | 1,017 | 122 | 10.7% | 1,139
WM Dictation, total score | 1,024 | 115 | 10.1% | 1,139
WM Dictation, standard score | 1,024 | 115 | 10.1% | 1,139
WM Dictation, W score | 1,024 | 115 | 10.1% | 1,139
WM Letter-word, total score | 1,028 | 111 | 9.7% | 1,139
WM Letter-word, standard score | 1,028 | 111 | 9.7% | 1,139
WM Letter-word, W score | 1,028 | 111 | 9.7% | 1,139
Child age as of 9/1/02 | 3,898 | 0 | 0.0% | 3,898
Welfare IRT scale score | 3,689 | 209 | 5.4% | 3,898
Welfare true scale score | 3,689 | 209 | 5.4% | 3,898


 

 
