CONTEXTUAL DATA FILES FROM THE 1995
NATIONAL SURVEY OF FAMILY GROWTH:
ACCESS AND ANALYSES
Linda J. Piccinino and William D. Mosher
National Center for Health Statistics
Contextual Data and
the NSFG
The National Survey of Family Growth first constructed a contextual
data file to supplement its 1982 survey of women of childbearing age. The motivation for this was that up until
then, most of the knowledge about
fertility and contraceptive behavior, two main thrusts of the NSFG, was
at the microlevel. While microlevel
data are useful for many things, these kinds of data, when used alone, bias
possible analyses. Because community level factors were not usually measured in
individual level survey data, community context was often omitted when
fertility rates were being explained.[1] However, now that there are datasets and
statistical techniques that allow us to measure community level variables as
well as individual variables, there is a growing recognition that the
characteristics of places in which women and couples live shapes their
reproductive decisions by influencing individual alternatives and their
associated social, economic and psychic costs.[2]
Contextual data are collected to provide information on the context in
which individual attitudes, behavior, or other experiences take place. Data on the characteristics of the communities where
survey respondents live are useful to address issues relevant to policy and
program planning. Contextual data from
the National Survey of Family Growth 1995 (NSFG95) are now available, on
approval, to qualified users. These
data include variables at up to four levels of aggregation (state, county,
census tract, block group) and for calendar years (1990, 1993, 1995). The richness of the NSFG95 data file is
increased by the ability to link NSFG95 main study files to geocoded data
compiled from various government and private sources.
Research with contextual data has
clear policy relevance and can be used to determine whether policies should be
aimed at individuals, or at their neighborhoods, or both. Examples of policy research topics are: the effects of state welfare policies on
teen sexual activity and childbearing; the effects of local economic conditions
on infant mortality and birth rates; and the effect of the availability of
county health and family planning clinics on sexually transmitted diseases
(STDs) and contraceptive use.
The Need for
Confidential Access Systems
Legal restrictions prevent NCHS from releasing contextual data files
directly to the general public. The dictates of Section 308(d) of the Public Health Service
Act make it illegal for the National Center for Health Statistics (NCHS) to
release data to anyone outside of NCHS that could be used to identify survey
respondents. These restrictions are necessary because contextual (neighborhood,
community, macrolevel) data are particularly sensitive and also can be used to
show the characteristics of very small and easily identifiable areas. The confidentiality pledges, made by NCHS
and other researchers, could not be possible if data were released that allowed
respondents to be identified.
NSFG Contextual
Data Access Procedures
Although confidentiality disclosure
limitations apply, access to NSFG contextual data is open to all bona fide researchers for a reasonable
fee. First, the researcher must apply
to NCHS for access. They will then receive a dummy data file complete
with full documentation to familiarize themselves with the data, and to set-up
and test their computer programs. To
better accommodate the needs of researchers wishing to use the NSFG contextual
data, NCHS has devised three options for accessing the data, thereby optimizing
the opportunity for analysis of the data while preserving confidentiality
strictures. These options include
remote access, on-site access, and access through collaboration.
At this time, these are the three
access options that are available: remote access, on-site access, or access
through collaboration. If the
researcher is able to use SAS, remote access can be used. If other software is indicated, then on-site
or collaborative access is suggested.
Remote access
Under the remote access option, the
NSFG95 contextual data files are available to qualified researchers through a
remote system called ANDRE (ANalytic Data Research by E-mail).
SAS programs written by researchers
to analyze the NSFG95 contextual data
are submitted directly in ASCII format via e-mail to ANDRE to be run
in-house 24 hours a day, 365 days a year.
Certain SAS procedures and SAS functions are not allowed. For example, users may not use PROC TABULATE
or PROC IML or functions that can produce listings of individual cases such as
the commands LIST and PRINT. Functions
which select for individual cases also are not allowed, and minimum cell sizes
must be maintained in all frequency tables.
For the NSFG, cells containing fewer than 5 cases will be suppressed
and, if necessary, additional cells will be suppressed (complementary
suppression) to disallow calculation of suppressed cell sizes from
marginals. The analyst will be informed
of all prohibited functions and
commands when they register for remote access.
All output and job logs are scanned for violations. If the program does not break any of the
confidentiality rules, the output is returned via e-mail to the researcher. Researchers are notified if there is a
problem, and told how to remedy it.
This procedure, which is similar to the one used for the Luxembourg
Income Study,[3] is now
in full operation.
Prospective users of these data files will sign a confidentiality agreement
with NCHS, and then move through the remote access process:
> Client writes computer
programs; tests programs on the >dummy= data set.
> Client e-mails >finalized= programs to ANDRE or to contact person at
NCHS.
> NCHS runs programs.
> ANDRE (or contact
person) checks for confidentiality breaches.
> If problems, NCHS e-mails
program back to client for modification.
If no problems, NCHS provides client with output.
On-site access
Contextual data file users may
choose to do their analytic work on-site at the new Research Data Center (RDC),
a facility located at NCHS in Hyattsville, Maryland. Researchers with approved projects are granted access to the data
files, but must complete all work within the confines of the RDC. Electronic or hardcopies of data files or
file documents may be removed from the RDC after being reviewed by RDC staff
for disclosure risk.
Since resources and physical space
are limited, researchers are advised to contact the RDC well in advance of
their planned work visit to ensure that their research proposal has been
processed and accepted, and that all necessary data files, computer hardware,
software, and staff are available at the time of their arrival. Researchers may only work at the RDC under
the supervision of NCHS-RDC staff during normal working hours. There is a fee charged for use of the RDC
facility. This fee helps to defray part
of the cost for equipment, workspace, NCHS staff time for monitoring, technical
assistance, disclosure limitation review, and file management. Files needing special handling and set-up,
or custom file formats, can be accommodated for an additional charge.
Collaboration
NSFG staff welcome the opportunity
to work with researchers from outside agencies, academic institutions, and non-profit
organizations, etc. to analyze data and publish reports using the contextual
data files. Collaborations are formed
on an ad hoc basis and depend on
the joint interests of the prospective researchers. While these collaborations are encouraged, the availability of
the NSFG staff is limited. The NSFG
staff will request joint authorship on
all collaborative projects and will be required to submit all manuscripts to
NCHS for clearance.
Staff programming is another
alternative for users who need a lot of technical assistance with large complex
analytic projects or who do not have access to programming support at their
home institution. Researchers are
advised to contact NCHS-RDC to perform programming tasks necessary for their
research project.
Contextual Data Analysis Software
NCHS currently supports use of several software packages for analysis
of the NSFG contextual data files. SAS
is the primary and most popular package, and is the one that can be used with
our remote access system, ANDRE. Other
packages, including LIMDEP, HLM (Version 4, for Windows) and STATA, are
available for use but must be run manually by an NCHS staff person on-site. SAS PROC MIXED is also being made available.[4]
Origins and Types of Contextual Data in the NSFG
A contextual data file is the set of characteristics of the environment
in which the respondents (women interviewed) live. The 1995 NSFG contextual data file is actually a set of three
main files, one file for variables describing the area in which the respondent
lived at one of three dates: 1990,
1993, and 1995. Each contextual data
file contains 10,847 observations, one for each respondent in the NSFG. The number of states (including the District
of Columbia), counties, tracts and block groups for the respondent residence years are shown in
Table 1.
Table 1. Number of Geographic Areas Represented by
NSFG95 Respondents= Residences |
||||
Year |
States |
Counties |
Tracts |
Blocks |
1990 |
51 |
972 |
5591 |
6208 |
1993 |
51 |
815 |
5605 |
6228 |
1995 |
51 |
890 |
6250 |
7000 |
Source: Battelle, 1997.
Source files
Variables were culled from several existing files, e.g. census data,
area resource files and private databases.
Data on where the women lived are shown by state, county, census tract,
and block group. Hundreds of variables
are included, with variables on topics ranging from the availability of family
planning service providers per capita to median Hispanic household income to
unemployment rates by race and sex.
Each variable in the NSFG95 contextual data file was derived from one
or more of the source files listed below.
When multiple sources were used to calculate a variable, the numerator
was used to code the data source of the constructed variable.[5]
> US Census Bureau files:
--Population Estimates
--State Government Finances
--United States Summary
> Area Resource File, 1995
> Census of Population and
Housing, 1990: Summary Tape File 3A
> HCFA File
> Dept. of Labor ETA Form
Files
> CDC STD File
> NCHS Natality Tapes
> USA Counties, 1994
> Alan Guttmacher
Institute Files
> Uniform Crime Reports,
US-FBI
> The Green Book
Time points
The NSFG contextual file contains a rich set of information for the
respondents= residences at three key points in time:
1990 -- decennial census year
1993 -- year of NHIS from which NSFG household sample drawn
1995 -- NSFG survey year
Levels of aggregation
Data at different geographic levels are shown for the state, county,
census tract & block group in which the sample women resided in 1990, 1993
and 1995. Note that many variables are
only available at the state or county level, and that many variables are highly
correlated with each other. While these
may be a limitation for some analyses, both of these factors reduce the chances
of disclosure.
Examples of variables available
> Percent of population that is black, white, Hispanic
> Median rent; median value of homes
> Median family income; median household income
> Percent receiving public assistance
> Average value of public assistance
> Unemployment rate
> Percent below poverty level
> Rates of gonorrhea, chlamydia, syphilis
> Crime rates (violent, property, and total)
> AFDC payment per family, or per recipient
> AFDC income cut-off
> Infant death rates by race
> Family planning providers per capita
> Medicaid payments per capita
> Abortion rates
Analyses Using the NSFG Contextual Data
Contextual items cover a broad range
of topics, from poverty and income data to information on health services and
sexually transmitted disease infection rates.
A variety of variables from some of these subject areas, as well as
others, illustrate the kinds of research results that can be generated with
these data files.
Test researchers
The NCHS remote data access system was first tested with the help of a
group of researchers[6]
that were interested in doing contextual data analyses with NSFG data. Although
extensive testing was done in-house by NCHS staff, testing by outside
researchers was also desirable to develop a product and system that was usable
by a larger research community with its
unique interests and needs. The
test researchers enabled us to refine our remote automated system, and alerted
us to additional types of software that might be requested as add-ons to our
manual submission process. They also
gave us practice in merging outside data sources with our contextual data
files, and helped us streamline our disclosure review process.
Current research
In an effort to stimulate research on the NSFG contextual data, a paper
session was held at the NCHS National Conference on Health Statistics, in
August, 1999, in Washington, DC. At
this session four groups of researchers presented their analyses that
demonstrated a variety of methodologies, software, and research questions. Some of their work is discussed below.
In their paper on economic incentives, teenage sexual activity, and
contraceptive use, Argys et al.[7]
merged in their own state-level policy variables with NSFG95 data and submitted
programs remotely to create a SAS output analysis data file of contextual and
individual variables. They then worked
on-site at the NCHS RDC and used LIMDEP
to jointly estimate the probability of being sexually active and, conditional
on being sexually active, the probability of using contraception, using a
series of bivariate probit models.
Specific contextual variables from the NSFG file that were of interest
in their study included county-level abortion providers and family planning
clinics, median household income, population density, the male/female ratio,
the AIDS death rate, and others.
State-level variables included per-capita state government expenditures
on health and hospitals, and syphilis and gonorrhea rates.
They found that while some contextual variables had effects, some did
not. The availability of family
planning services (number of clinics per 10,000 women in the county of
residence) had no effect on the probability that a teen was sexually active,
which was encouraging from a policy perspective. They also found strong
evidence that family planning availability was linked to an increased
probability of contraceptive use.
Additionally, they indicated that neighborhood context (as measured by
median income in the census tract) influenced both the decision to become
sexually active and the decision to use contraception.
Brackbill and Piccinino[8]
looked at two important outcome variables -- condom use versus non-use, and
dual use of condoms and other methods -- with the general objective of
explaining effects of individual and contextual variables.[9] This approach was intriguing in that it
additionally used contextual factors to explore outcome behaviors that are
generally only described in terms of individual characteristics. Wherever possible they used block level
contextual variables assuming that the measured population level characteristic
was more likely to encompass the environment in which the respondents
lived. When correlation matrices of
county and block level variables were compared, it was shown that block level
measures had a higher correlation with individual level variables. Multilevel logistic regression models were
used in an exploratory approach to assess possible contextual influences on
behaviors. These model types are
limited, however, because of the variance constraints on the higher level
variables in a mixed (fixed and random effects) level regression. This research, too, found strong effects of
family planning service providers on the outcome variables. Analysis runs were
done with a combination of on-site and remote SAS submissions.
Kanaiaupuni and Fomby=s[10]
multidimensional approach placed an emphasis on policy variables, though the
impact of contextual variables in this paper was not as clear cut as in the
Argys et al.[11] These researchers analyzed variables at the
individual, county, and census tract level using SAS and HLM, a software
package for analyzing hierarchical linear models. In this instance, SAS programs were submitted remotely for early
exploratory work. Later, SAS programs
were e-mailed to an NSFG staff person where they were used to create a set of
intermediate confidential output files.
These files, because they contained identifiers, remained in-house and were used as input for
the series of HLM runs performed by staff of the RDC.
In the first stage of a three-part proposal, Manlove and Terry merged
in state-level data from 1980 to 1995 generated from vital statistics, the
Department of Education and census data to run multivariate hazard models to
analyze the effects of family, individual, partner and community-level
characteristics on the risk of teen motherhood and non-marital teen motherhood
across three cohorts of teens.[12] Another researcher, Zavodny of the Federal
Reserve Bank of Atlanta, Georgia, currently is working with the RDC to run
nested logits in STATA for her project on characteristics of teens and their
partners, and resolution of their pregnancies.
Contextual variables of interest in this analysis are abortion providers
per capita in the state, whether the state allows Medicaid-funded abortions,
whether the state has a parental consent law, as well as variables measuring
the generosity of welfare benefits.
Appendix
Hierarchical Linear Models (HLM)
NCHS now has on-site HLM Version 4.04 by Anthony Bryk, Stephen
Raudenbush and Richard Congdon, Jr.
This software package can read data from SAS and STATA, among other
packages. It allows estimation of two-
and three-level models, e.g. models using data observations within persons,
persons within communities, and communities within states.[13] This software is available for purchase and
license through Scientific Software International, Inc., 7383 N. Lincoln
Avenue, Suite 100, Chicago, IL 60646-1704.
LIMDEP
LIMDEP is a general econometrics program for estimating linear and
non-linear regression models, primarily for cross-sectional, time-series and
panel data. It is available through
Econometric Software, Inc., 15 Gloria Place, Plainview, NY 11803.
SAS PROC MIXED
This procedure is offered as a SAS System feature through the SAS
Institute Inc. It=s
basic function is to fit linear models, and can sometimes used in the analysis
of multilevel models.
More information may be obtained from the SAS Institute Inc., Box 8000,
Cary, NC 27511-8000.
STATA
STATA is a powerful general purpose package for data analysis and data
management, with graphics capabilities and a graphics editor. STATA
covers a
wide range of statistical techniques and is programmable, allowing the
user to add new
commands. For more information contact STATA, 702 University Drive E.,
College Station, TX 77840.
[1] WR Grady, DH Klepinger and JOG Billy,
1993, The Influence of Community Characteristics on the Practice of Effective
Contraception. FPP 25(1), pp. 4-11.
[2] Battelle, 1997, User Documentation for the
National Survey of Family Growth, Cycle V, Contextual Database, Final Report.
[3] Journal
of Official Statistics, 1993, Supplement.
See also S Fienberg and L
Willenborg, 1998, Introduction to the Special Issue: Disclosure Limitation
Methods for Protecting the Confidentiality of Statistical Data, Journal of
Official Statistics.
[4] These
packages are described briefly in the Appendix.
[5] Ibid.
Battelle, 1997, Appendix B.
[6] From academic and government research
environments.
[7] LM Argys et al., 1999, The Impact of Economic Incentives on Teenage Sexual
Activity and Contraceptive Use. Paper
presented at the National Conference on Health Statistics, August 2-4, 1999,
Washington, DC.
[8] RM Brackbill and LJ Piccinino, 1999, The Influence of Individual and
Contextual Variables on Condom Use to Prevent Sexually Transmitted
Diseases. Paper presented at the
National Conference on Health Statistics, August 2-4, 1999, Washington, D.C.
[9] Much of the early analysis for this paper
was done using remote submissions through the automated system ANDRE.
[10] S
Kanaiupuni and P Fomby, 1999, Assessing the Impact of Family Planning Services
on Individual Contraceptive Use and Pregnancy: A Multilevel Analysis. Paper presented at the National Conference
on Health Statistics, August 2-4, 1999, Washington, D.C.
[11] LM Argys et al., ibid.
[12] J Manlove and E Terry, 1999, The Effect of
Contextual Factors on Demographic Trends in Teen Fertility. Poster presented at the National Conference
on Health Statistics, August 2-4, 1999, Washington, D.C.
[13] A Bryk et al., 1996, Hierarchical Linear
and Nonlinear Modeling with the HLM/2L and HLM/3L Programs. Chicago, IL: Scientific Software
International, Inc.