Appendix 4.1: Imputations for Item Nonresponse in the Fall 2002 Data
To facilitate analysis of the data, and to ensure that the results obtained by different analysts are consistent with one another, it is desirable to impute missing responses to produce as complete a data set as possible. Imputation also helps to control for nonresponse bias and produce a more representative file for analysis. For example, many software packages select only the cases that are complete on the set of variables analyzed and ignore the cases with incomplete data. Discarding incomplete cases is inefficient, but more seriously, the complete cases may not be representative of the target population; consequently, estimates derived from them are subject to nonresponse bias.
For this study, missing values for fall 2002 variables due to item nonresponse were imputed using hot deck imputation. Hot deck imputation is a procedure in which cases with missing values for specific variables have the “holes” in their records filled in with values from other, similar cases. Because the imputed values come from actual respondents’ values, hot deck imputation has the desirable property that imputed values are always realistic and preserve the underlying sampling variation in the data.
The “donor” case from which the imputed value is taken (also referred to as the respondent) is randomly selected from a pool of similar children who are matched to the “recipient” (or nonrespondent) on characteristics that are correlated with the variable being imputed. The aim is to construct pools (or imputation classes) that explain as much of the variance in the variable to be imputed as possible but are large enough that each class contains some minimum number of respondents and donors are not reused too many times. The assumption is that within each imputation class, the mechanism that leads to missing data is “ignorable”; that is, the missing values behave as though they were missing at random. This means that the probability that a value is missing can depend on the values of the imputation class variables but, within a class, not on the missing outcome values themselves. If implemented carefully, hot deck imputation can preserve the distribution of the data on measured variables so that estimates of distributional characteristics such as percentiles, variability, and correlation will not be distorted. However, if the item response rate is very high, a small percentage of imputed data will have very little effect on the distribution of the variable regardless of the imputation method.
The variables used to form imputation classes or cells were identified from chi-square tests of association and bivariate correlation coefficients. In some cases, they were also determined by skip patterns in the parent questionnaire and other requirements of logical consistency between questionnaire items. The imputation cells were created by cross-tabulating all of these variables at once. A donor was allowed to be used up to three times. When no more donors were available in an imputation class, adjacent cells were collapsed. The order of collapsing was specified so that levels of the least correlated cell variable were collapsed first, followed by the second least correlated variable, and so on, until a donor was found. Imputed values have been flagged so that an analyst has the option of not using the imputed data, for example when assessing the effect of the imputed data on the results.
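The cell-based donor search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run for the study: the function name `hot_deck`, the record layout, and the `_imputed` flag suffix are hypothetical, and collapsing is simplified to dropping the least correlated class variable entirely rather than merging adjacent levels of it.

```python
import random
from collections import defaultdict

MAX_DONOR_USES = 3  # the text allows each donor to be used up to three times


def hot_deck(records, target, class_vars, rng):
    """Fill missing `target` values in place and return the number imputed.

    `class_vars` is ordered from most to least correlated with `target`, so
    collapsing drops variables from the end of the list first (a simplifying
    assumption; the study collapsed adjacent levels of a variable).
    """
    flag = target + "_imputed"  # hypothetical flag naming convention
    uses = defaultdict(int)     # donor index -> number of times used so far
    for rec in records:
        rec.setdefault(flag, False)
    imputed = 0
    for rec in records:
        if rec[target] is not None:
            continue
        # Try the full cross-classification first, then coarser cells.
        for depth in range(len(class_vars), -1, -1):
            key_vars = class_vars[:depth]
            pool = [
                i for i, d in enumerate(records)
                if d[target] is not None
                and not d[flag]                  # only true respondents donate
                and uses[i] < MAX_DONOR_USES
                and all(d[v] == rec[v] for v in key_vars)
            ]
            if pool:
                donor = rng.choice(pool)
                uses[donor] += 1
                rec[target] = records[donor][target]
                rec[flag] = True                 # lets analysts exclude imputed values
                imputed += 1
                break
    return imputed


# Toy data with hypothetical variable names.
children = [
    {"sex": "F", "lang": "en", "score": 10},
    {"sex": "F", "lang": "en", "score": None},  # donor found in the full cell
    {"sex": "M", "lang": "es", "score": 7},
    {"sex": "M", "lang": "en", "score": None},  # needs "lang" collapsed away
]
n_imputed = hot_deck(children, "score", ["sex", "lang"], random.Random(0))
```

In the toy data each donor pool contains a single child, so the result does not depend on the random seed. A recipient for whom no eligible donor exists at any level of collapsing is left missing and unflagged.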
We imputed missing data for all fall 2002 demographic variables and the fall 2002 measures of each of the spring 2003 outcomes (e.g., parenting practices, child health, assessment scores, child socio-emotional behavior, and other scale variables). The variables that underwent imputation and their item nonresponse rates for the analysis sample used in this report (the spring 2003 child assessment respondents) are given in Exhibit A.4.1.1.
The logical relationships between items were taken into account in the imputation to maintain consistency of the data and to attempt to preserve correlations among variables. Closely correlated items such as assessment scores or socio-emotional scales were usually imputed from a single donor child. The donor was randomly selected from within a donor pool of children matched by treatment/control group assignment, language spoken at home, sex, race/ethnicity, and age in months as of September 1, 2002. The score and scale variables were imputed in groups according to similar patterns of missingness (i.e., the joint missing rates) and the degree of correlation among them. This strategy was viewed as a compromise between the desire to avoid throwing away reported scores and the goal of preserving the correlation among score variables. In general, only the missing scores were imputed on each record, and children with partially reported scores did not have them overwritten by the donor’s scores. However, for patterns of missingness represented by a small number of children, the donor’s scores were allowed to overwrite the reported scores in the interest of reducing the number of computer runs.
It should be noted that the percentage of child records with partial reporting of score and scale variables is small. The socio-emotional scales were either entirely missing or entirely reported for all but a trivial (< 0.1%) percentage of the sample. For the depression, locus of control, welfare, and crime and violence scales, 8.3 percent of the sample had partially missing data (5.6 percent were missing all but one scale, 2.5 percent were missing only one scale, and 0.2 percent were missing some other combination). For the continuous score variables, less than 5 percent of the sample had partial reporting of scores; most were either missing all scores or none.
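As an illustration of imputing a group of correlated scores from a single donor while keeping any scores the child actually reported, a minimal sketch follows. The variable names (`score_a`, `score_b`, `score_c`) and both helper functions are hypothetical; they are not taken from the study's programs.

```python
from collections import Counter

SCORE_VARS = ["score_a", "score_b", "score_c"]  # hypothetical score names


def missing_pattern(record, score_vars):
    """Return a tuple of booleans marking which scores are missing."""
    return tuple(record[v] is None for v in score_vars)


def fill_from_donor(recipient, donor, score_vars):
    """Copy the donor's scores only where the recipient's are missing.

    Returns the completed record and a per-variable imputation flag.
    """
    filled = dict(recipient)
    flags = {}
    for var in score_vars:
        was_missing = recipient[var] is None
        if was_missing:
            filled[var] = donor[var]
        flags[var] = was_missing
    return filled, flags


sample = [
    {"score_a": 12, "score_b": None, "score_c": None},    # partial reporting
    {"score_a": None, "score_b": None, "score_c": None},  # all missing
    {"score_a": 15, "score_b": 40, "score_c": 8},         # complete
]

# Grouping records by missingness pattern shows how many separate
# imputation runs a "fill only what is missing" strategy would need.
pattern_counts = Counter(missing_pattern(r, SCORE_VARS) for r in sample)

filled, flags = fill_from_donor(sample[0], sample[2], SCORE_VARS)
# `filled` keeps the reported score_a and takes score_b and score_c
# from the single donor, so their joint relationship comes from one child.
```

Counting the patterns first makes the trade-off in the text concrete: each distinct pattern needs its own run, which is why rare patterns were handled by letting the donor's scores overwrite the few reported ones.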
The order in which items are imputed is also important in preserving the correlation structure in the data, because some imputed items can be used to form imputation cells in the subsequent imputation of related items. This strategy was used, for example, in the imputation of categorical assessment scores, so that the first score that was imputed could be used to create imputation cells for the next score. It was also used throughout in the imputation of correlated demographic and household variables. Similarly, for items associated with a skip pattern in the parent questionnaire, the item that leads into the skip pattern was imputed first and the subsequent items were imputed depending on the value of the skip indicator. The demographic variables were imputed first, then used to impute parenting practice, household income, child health, assessment score, and scale variables. Items with the least amount of nonresponse within a group of related categorical variables were imputed first, then used in the imputation of items with larger amounts of missing data.
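The ordering rule can be sketched as a simple planning step: sort a group of related items by the amount of missing data, and make each item available as a cell variable once it has been imputed. The function `sequential_plan` and the choice of base class variables are hypothetical. The imputed counts are taken from Exhibit A.4.1.1, but treating these four items as one related group is an assumption made only for illustration.

```python
def sequential_plan(missing_counts, base_class_vars):
    """Return [(variable, class variables available when it is imputed), ...].

    Variables are processed from least to most missing; each becomes a
    candidate class variable for the items imputed after it.
    """
    plan = []
    available = list(base_class_vars)
    for var in sorted(missing_counts, key=missing_counts.get):
        plan.append((var, list(available)))
        available.append(var)  # complete after imputation, so usable later
    return plan


# Imputed counts from Exhibit A.4.1.1; the grouping is illustrative only.
missing_counts = {
    "Father's employment status": 2023,
    "Receives TANF": 133,
    "Father's highest educational attainment": 438,
    "Receives Food Stamps": 127,
}
plan = sequential_plan(missing_counts, ["Primary home language"])
order = [var for var, _ in plan]
```

Under this rule the item with the most missing data is imputed last, when the largest set of completed variables is available to define its cells.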
In general, donors were randomly selected from within the same Head Start program within a cell when possible; when necessary, the program was collapsed with a geographically adjacent program in the cell. Programs were sorted within a cell by broad geographic area (our primary sampling unit, or PSU) within Census region, so adjacent programs tended to be from the same county or a nearby county. When there were a large number of imputation cells, the donor search often was broadened to the entire geographic PSU within a cell, and sometimes PSUs within a region were also collapsed. Some items such as fall scores required a closer match on demographic variables than geography or Head Start program in order to find a similar donor pool, and no attempt to stay within the PSU or program was made for these. Geography was also ignored for certain items requiring a very close match to the donor on other questionnaire items for logical consistency.
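The geographic broadening of the donor search can be pictured as a fallback through nested levels. The sketch below is a simplification under stated assumptions: the field names (`cell`, `program`, `psu`, `region`) and the function `donor_pool` are hypothetical, and the intermediate step of collapsing with a single adjacent program is omitted, so the search moves directly from program to PSU to region.

```python
SEARCH_LEVELS = ("program", "psu", "region")  # narrowest to broadest


def donor_pool(recipient, donors, levels=SEARCH_LEVELS):
    """Return (level, pool) for the narrowest level that yields any donor.

    Donors must always share the recipient's imputation cell; only the
    geographic restriction is relaxed.
    """
    same_cell = [d for d in donors if d["cell"] == recipient["cell"]]
    for level in levels:
        pool = [d for d in same_cell if d[level] == recipient[level]]
        if pool:
            return level, pool
    return None, []


donors = [
    {"id": 1, "cell": "A", "program": "P1", "psu": "S1", "region": "R1"},
    {"id": 2, "cell": "A", "program": "P2", "psu": "S1", "region": "R1"},
    {"id": 3, "cell": "B", "program": "P3", "psu": "S2", "region": "R1"},
]
recipient = {"cell": "A", "program": "P3", "psu": "S1", "region": "R1"}
level, pool = donor_pool(recipient, donors)
# No cell-A donor exists in program P3, so the search widens to PSU S1.
```

For items where geography was ignored altogether, the same function could be called with an empty restriction by matching on the cell alone.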
The distribution of each imputed variable was compared before and after imputation to check that the imputation procedures had not appreciably changed the distribution of the variable. Correlation matrices were examined to check that bivariate correlations among scores and scales were not attenuated. Crosstabs between categorical variables involved in skip patterns and those requiring logical consistency were checked to make sure that inconsistencies had not been introduced. The only variable whose distribution shifted by more than a trivial amount was father’s employment status, which had a very high missing rate of about 52 percent. The percentage of fathers employed full-time shifted from 74 percent to 71 percent, and the percentage unemployed increased from 16 percent to 20 percent. Fathers for whom employment status is unknown tend to come from cells with higher unemployment rates among respondents; thus, the inclusion of their imputed values will raise the overall unemployment rate. The variables used to create imputation classes for employment status were receipt of food stamps, receipt of TANF, father’s level of education, father’s race, and PSU.
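The size of the shift for father's employment status can be unpacked with a little arithmetic: if the combined rate is a mixture of the reported and imputed cases, the rate among the imputed cases alone follows from the counts. The sketch below uses the reported and imputed counts for this variable from Exhibit A.4.1.1 and the rounded percentages quoted in the text. It assumes the percentages are simple unweighted proportions of those counts, which the text does not state, so the results are rough indications only.

```python
def implied_imputed_rate(n_reported, n_imputed, rate_reported, rate_combined):
    """Solve rate_combined * N = rate_reported * n_rep + x * n_imp for x."""
    total = n_reported + n_imputed
    return (rate_combined * total - rate_reported * n_reported) / n_imputed


# Counts from Exhibit A.4.1.1; rates are the rounded figures in the text.
unemployed = implied_imputed_rate(1875, 2023, 0.16, 0.20)  # about 0.237
full_time = implied_imputed_rate(1875, 2023, 0.74, 0.71)   # about 0.682
```

Under these assumptions, roughly 24 percent of fathers with imputed status were assigned "unemployed", compared with 16 percent of fathers with reported status, which is consistent with the explanation that nonrespondents are concentrated in cells with higher unemployment.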
Variable Name | Reported Count | Imputed Count | Percent Imputed | Total of Reported and Imputed Count |
---|---|---|---|---|
Crime & Violence Maximum Likelihood Ability Estimate | 3,546 | 352 | 9.0% | 3,898 |
Crime & Violence IRT True-Score | 3,546 | 352 | 9.0% | 3,898 |
Number of children age 17 and under in household | 3,796 | 102 | 2.6% | 3,898 |
Restricting Child Movement Scale - fall | 3,539 | 359 | 9.2% | 3,898 |
Family Cultural Enrichment Scale | 3,524 | 374 | 9.6% | 3,898 |
Family Cultural Enrichment Scale 2 | 3,540 | 358 | 9.2% | 3,898 |
Removing Harmful Objects Subscale - fall | 3,538 | 360 | 9.2% | 3,898 |
# Times child is read to | 3,548 | 350 | 9.0% | 3,898 |
Safety Devices Subscale - fall | 3,538 | 360 | 9.2% | 3,898 |
Parental Safety Practices Scale - fall | 3,537 | 361 | 9.3% | 3,898 |
Spanked child in last week | 3,544 | 354 | 9.1% | 3,898 |
# Times spanked child | 3,528 | 370 | 9.5% | 3,898 |
Used time out in last week | 3,542 | 356 | 9.1% | 3,898 |
# Times used time out | 3,524 | 374 | 9.6% | 3,898 |
Adult books in home | 3,547 | 351 | 9.0% | 3,898 |
Derived caregiver's race | 141 | 7 | 4.7% | 148 |
Derived child race | 3,882 | 16 | 0.4% | 3,898 |
Child sex | 3,898 | 0 | 0.0% | 3,898 |
Derived father's race | 3,710 | 188 | 4.8% | 3,898 |
Head Start participation | 3,897 | 1 | 0.0% | 3,898 |
Derived mother's race | 3,777 | 121 | 3.1% | 3,898 |
Caregiver's age | 137 | 11 | 7.4% | 148 |
Child born in the United States | 3,792 | 106 | 2.7% | 3,898 |
Economic difficulty scale | 3,525 | 373 | 9.6% | 3,898 |
Father's employment status | 1,875 | 2,023 | 51.9% | 3,898 |
Father's highest educational attainment | 3,460 | 438 | 11.2% | 3,898 |
Father's marital status | 3,421 | 477 | 12.2% | 3,898 |
Father's age | 3,283 | 615 | 15.8% | 3,898 |
Biological father's immigrant status | 3,702 | 196 | 5.0% | 3,898 |
Biological father a recent immigrant | 1,273 | 90 | 6.6% | 1,363 |
Biological father lives with child | 3,660 | 238 | 6.1% | 3,898 |
Biological father years in the United States | 1,170 | 193 | 14.2% | 1,363 |
Grandparent in the household | 3,786 | 112 | 2.9% | 3,898 |
Anyone in household with health condition | 3,537 | 361 | 9.3% | 3,898 |
Homelessness | 3,535 | 363 | 9.3% | 3,898 |
Primary home language | 3,870 | 28 | 0.7% | 3,898 |
Biological mother's immigrant status | 3,773 | 125 | 3.2% | 3,898 |
Biological mother recent immigrant | 1,210 | 104 | 7.9% | 1,314 |
Biological mother lives with child | 3,789 | 109 | 2.8% | 3,898 |
Biological mother years in the United States | 1,210 | 104 | 7.9% | 1,314 |
Household monthly income range | 3,403 | 495 | 12.7% | 3,898 |
Mother's employment status | 3,598 | 300 | 7.7% | 3,898 |
Mother has a GED | 3,757 | 141 | 3.6% | 3,898 |
Biological mother educational attainment | 3,757 | 141 | 3.6% | 3,898 |
Mother's marital status | 3,759 | 139 | 3.6% | 3,898 |
Mother's age | 3,722 | 176 | 4.5% | 3,898 |
Number of moves in last 12 months | 3,449 | 449 | 11.5% | 3,898 |
Other caregiver's employment status | 135 | 13 | 8.8% | 148 |
Other caregiver's educational attainment | 134 | 14 | 9.5% | 148 |
Number of adults 18 and over in household | 3,534 | 364 | 9.3% | 3,898 |
Primary caregiver health impairs caring for child | 3,545 | 353 | 9.1% | 3,898 |
Primary caregivers health | 3,545 | 353 | 9.1% | 3,898 |
Child had dental care, fall 02 | 3,542 | 356 | 9.1% | 3,898 |
Child's health status, fall 02 | 3,544 | 354 | 9.1% | 3,898 |
Child had care for an injury, fall 02 | 3,537 | 361 | 9.3% | 3,898 |
Child has health insurance, fall 02 | 3,542 | 356 | 9.1% | 3,898 |
Child needs ongoing health care, fall 02 | 3,785 | 113 | 2.9% | 3,898 |
Child has regular place for medical care, fall 02 | 3,538 | 360 | 9.2% | 3,898 |
PELS, fall 02 | 3,548 | 350 | 9.0% | 3,898 |
Child has special needs, fall 02 | 3,787 | 111 | 2.8% | 3,898 |
Child has an unmet health need, fall 02 | 3,540 | 358 | 9.2% | 3,898 |
Housing problems scale | 3,514 | 384 | 9.9% | 3,898 |
Receives Food Stamps | 3,771 | 127 | 3.3% | 3,898 |
Receives TANF | 3,765 | 133 | 3.4% | 3,898 |
Respondent's relationship to child | 3,786 | 112 | 2.9% | 3,898 |
Public or subsidized housing | 3,523 | 368 | 9.5% | 3,891 |
Mother had a teen birth | 3,733 | 165 | 4.2% | 3,898 |
Number of children under age 6 in household | 3,796 | 102 | 2.6% | 3,898 |
Depression maximum likelihood ability estimate | 3,536 | 362 | 9.3% | 3,898 |
Depression IRT true-score | 3,536 | 362 | 9.3% | 3,898 |
Elision IRT score | 2,408 | 294 | 10.9% | 2,702 |
Elision true score | 2,408 | 294 | 10.9% | 2,702 |
PPVT IRT score | 3,187 | 465 | 12.7% | 3,652 |
PPVT true score | 3,187 | 465 | 12.7% | 3,652 |
PPVT standard score | 3,187 | 465 | 12.7% | 3,652 |
PPVT W-ability score | 3,187 | 465 | 12.7% | 3,652 |
Spanish Elision IRT score | 1,015 | 124 | 10.9% | 1,139 |
Spanish Elision true score | 1,015 | 124 | 10.9% | 1,139 |
TVIP IRT score | 1,038 | 101 | 8.9% | 1,139 |
TVIP true score | 1,038 | 101 | 8.9% | 1,139 |
TVIP standard score | 1,038 | 101 | 8.9% | 1,139 |
TVIP W-ability score | 1,038 | 101 | 8.9% | 1,139 |
Locus of control IRT scale score | 3,534 | 364 | 9.3% | 3,898 |
Locus of control true scale score | 3,534 | 364 | 9.3% | 3,898 |
Is respondent mother or father? | 3,796 | 102 | 2.6% | 3,898 |
How well did child do in bear counting | 3,434 | 464 | 11.9% | 3,898 |
Bear counting score | 3,260 | 638 | 16.4% | 3,898 |
Book score, total | 3,473 | 425 | 10.9% | 3,898 |
Color name score, total | 3,516 | 382 | 9.8% | 3,898 |
CTOPPP Elision total score | 2,408 | 294 | 10.9% | 2,702 |
CTOPPP Spanish Elision total score | 1,015 | 124 | 10.9% | 1,139 |
CTOPPP print score | 1,034 | 105 | 9.2% | 1,139 |
McCarthy total drawing score | 3,508 | 390 | 10.0% | 3,898 |
KFAST raw score | 3,220 | 678 | 17.4% | 3,898 |
PPVT: total score | 3,187 | 465 | 12.7% | 3,652 |
Print knowledge score: total | 3,445 | 453 | 11.6% | 3,898 |
TVIP: total score | 1,038 | 101 | 8.9% | 1,139 |
WJ3 Applied problems standard score | 2,392 | 310 | 11.5% | 2,702 |
S_WJ3APPLIED_W | 2,392 | 310 | 11.5% | 2,702 |
WJ3 Applied problems W score | 2,378 | 324 | 12.0% | 2,702 |
WJ3 Oral comprehension standard score | 2,378 | 324 | 12.0% | 2,702 |
WJ3 Oral comprehension W score | 2,426 | 276 | 10.2% | 2,702 |
WJ3 Spelling W score | 2,426 | 276 | 10.2% | 2,702 |
WJ3 Letter-word standard score | 3,217 | 435 | 11.9% | 3,652 |
WJ3 Letter-word W score | 3,217 | 435 | 11.9% | 3,652 |
WJ3 Applied problems total score | 2,392 | 310 | 11.5% | 2,702 |
WJ3 Oral comprehension, total score | 2,378 | 324 | 12.0% | 2,702 |
WJ3 Spelling, total score | 2,426 | 276 | 10.2% | 2,702 |
WJ3 Letter-word total score | 3,217 | 435 | 11.9% | 3,652 |
WM Applied problems total score | 1,017 | 122 | 10.7% | 1,139 |
WM Applied problems, standard score | 1,017 | 122 | 10.7% | 1,139 |
WM Applied problems, W score | 1,017 | 122 | 10.7% | 1,139 |
WM Dictation, total score | 1,024 | 115 | 10.1% | 1,139 |
WM Dictation, standard score | 1,024 | 115 | 10.1% | 1,139 |
WM Dictation, W score | 1,024 | 115 | 10.1% | 1,139 |
WM Letter-word, total score | 1,028 | 111 | 9.7% | 1,139 |
WM Letter-word, standard score | 1,028 | 111 | 9.7% | 1,139 |
WM Letter-word, W score | 1,028 | 111 | 9.7% | 1,139 |
Child age as of 9/1/02 | 3,898 | 0 | 0.0% | 3,898 |
Welfare IRT scale score | 3,689 | 209 | 5.4% | 3,898 |
Welfare true scale score | 3,689 | 209 | 5.4% | 3,898 |
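For readers who want to check or reuse the figures above, the Percent Imputed column is consistent with the imputed count divided by the sum of the reported and imputed counts, rounded to one decimal place. The short sketch below recomputes it for a few rows copied from the exhibit; the helper name `percent_imputed` is hypothetical.

```python
def percent_imputed(reported, imputed):
    """Imputed cases as a percentage of reported plus imputed cases."""
    return round(100 * imputed / (reported + imputed), 1)


# (variable, reported, imputed, percent shown in the exhibit)
rows = [
    ("Father's employment status", 1875, 2023, 51.9),
    ("Bear counting score", 3260, 638, 16.4),
    ("Derived caregiver's race", 141, 7, 4.7),
    ("Child sex", 3898, 0, 0.0),
]
mismatches = [
    name for name, rep, imp, shown in rows
    if percent_imputed(rep, imp) != shown
]
```

Note that the denominator differs across rows (for example 148, 1,139, 2,702, 3,652, or 3,898), because some items apply only to a subset of the analysis sample.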