This Web site is a component of the SAMHSA Health Information Network.

The National Clearinghouse for Alcohol and Drug Information, a service of SAMHSA

I. Introduction

The primary objective of the National Household Survey on Drug Abuse (NHSDA) is to measure the prevalence of use of illicit drugs, alcohol, and tobacco products, as well as the nonmedical use of prescription drugs in the United States. The 1998 NHSDA is the eighteenth in the series, which began in 1971 and has been conducted annually since 1990.

The Population Estimates report is published by the Office of Applied Studies (OAS) within the Substance Abuse and Mental Health Services Administration (SAMHSA). The report provides the drug abuse prevention, treatment, and research communities, and other interested parties, with timely data on current substance use prevalence measures. OAS issues a companion volume, the Main Findings report, at a later date to present an expanded analysis of the data, including information on drug and alcohol use trends; demographic correlates of use of illicit drugs, alcohol, and tobacco; patterns and problems of drug use; and perceived harmfulness of drug use. OAS produces and distributes another NHSDA report containing prevalence estimates when the survey results are first released each year. Special analytic reports also are periodically produced on topics of current interest (e.g., drug use among employed people, drug use and family structure).

Estimates presented in Sections III and IV of this report are based on a questionnaire and estimation methodology introduced in 1994 and continued through 1998.¹ Due to the effect this new methodology may have on the magnitude of estimates, comparisons with NHSDA data prior to 1994 should be made with caution.²

SURVEY METHODOLOGY

The NHSDA is based on a stratified, multistage area probability sample. For the 1998 study, 139 primary sampling units (PSUs) were selected at the first stage of sampling. The 1998 national sample comprised 115 PSUs (43 certainty PSUs and 72 noncertainty PSUs). The national certainty PSUs have been included in the NHSDA since 1988, and the noncertainty PSUs are the same PSUs selected since 1996. The 1998 national sample was supplemented by six noncertainty PSUs from the 14 Arizona PSUs that were selected in 1997, two noncertainty PSUs from the four California PSUs that were selected in 1997, and an additional 16 new noncertainty PSUs selected from 13 States to provide minimal sample sizes for testing the small area estimation methodology in all States.

————

¹ See page 9 of the following: Office of Applied Studies, Substance Abuse and Mental Health Services Administration. (1995). National Household Survey on Drug Abuse: Population estimates 1994 (DHHS Publication No. SMA 95-3063). Rockville, MD: Author.

² For additional detailed information on the 1994 questionnaire modification and its implications, see the following: Office of Applied Studies, Substance Abuse and Mental Health Services Administration. (1996). The development and implementation of a new data collection instrument for the 1994 National Household Survey on Drug Abuse (DHHS Publication No. SMA 96-3084). Rockville, MD: Author.

Within each PSU, area segments were selected with probability proportional to a composite size measure that was designed to overrepresent concentrated Hispanic and black neighborhoods, as well as younger individuals.³ Dwelling units were selected from each sample segment. The target population included all civilian residents 12 years old or older of households (including civilians residing on military installations) and noninstitutional group quarters (e.g., college dormitories, homeless shelters, rooming houses). Persons excluded from this target population included military personnel on active duty, transient populations (such as homeless people who do not reside in shelters), and residents of institutional group quarters (e.g., jails, hospitals). Data collection was continuous over the 1998 calendar year.

Survey data were collected through personal visits to each selected residence. Introductory letters were mailed to each residence and were used to explain the survey prior to the interviewer's visit. Upon arrival, the NHSDA field representative conducted a short voluntary screening procedure with any resident of the household 18 years old or older who was capable of providing information on the age, race/ethnicity, gender, and marital status of each resident 12 years old or older. This information was used in a random selection procedure to determine whether any resident members were eligible for an in-depth interview (either one, two, or no individuals were selected). The interviewer had no control over this selection procedure. The 1998 within-household person selection probabilities were based on the race/ethnicity of the head of household and the ages of each household member. Selected individuals then were asked if they would complete a voluntary interview. NHSDA field representatives conducted the interviews using a paper-and-pencil questionnaire that included both interviewer-administered questions and self-administered answer sheets (for collection of sensitive information). All screening and interview responses were kept confidential.

In 1998, a total of 33,128 eligible dwelling unit members were selected for an interview; of these, a total of 25,500 interviews were completed (4,903 interviews were conducted in California, 3,869 interviews were conducted in Arizona and 16,728 interviews were conducted in the remaining States). Total response rates for screening and interviewing were 93.0% and 77.0%, respectively.
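The counts above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures: the variable names are illustrative, and the published interview response rate may have been computed with weights, so the unweighted ratio shown here is an assumption about how the rate was derived.

```python
# Counts reported in the text for the 1998 NHSDA.
selected = 33_128
completed_by_area = {
    "California": 4_903,
    "Arizona": 3_869,
    "Remaining States": 16_728,
}

# The three area counts should add up to the reported total of completed interviews.
completed = sum(completed_by_area.values())

# Unweighted interview response rate (assumption: the report may use a weighted rate).
unweighted_rate = completed / selected

print(completed)
print(f"{unweighted_rate:.1%}")
```

The area counts sum to the reported 25,500 completed interviews, and the unweighted ratio rounds to the same 77.0% given in the text.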

PROCEDURES FOR DERIVING POPULATION ESTIMATES

For convenience, both rate estimates and corresponding population estimates are included in each table. Population estimates are presented in thousands (e.g., a population estimate of 430 represents 430,000 persons). Each "observed estimate" is followed by its 95% confidence interval in parentheses. Estimates are provided for each of the following time periods when use of illicit drugs, alcohol, and tobacco occurred: (a) use in the lifetime (ever used), (b) use in the past year (used past year), and (c) use in the past month (used past month), also referred to as "current use." These estimates have been obtained by weighting the data to reflect current population totals for various demographic subgroup populations.

——————

³ In the interest of readability for this report, “white” is used to indicate “white, non-Hispanic,” and “black” to indicate “black, non-Hispanic.”

Development of Weights

An analysis weight was calculated for each completed interview to reflect selection probabilities and to compensate for nonresponse and undercoverage. A poststratification adjustment was made to force the respondent weight totals to equal U.S. Bureau of the Census projections of the civilian, noninstitutionalized population 12 years old or older. The poststratification totals⁴ were obtained from the National Estimates and Population Projections Branch of the U.S. Bureau of the Census and classified by age group, gender, race/ethnicity, and Hispanic origin. These poststratification totals were appropriately adjusted using State-level population projections that also were obtained from the Census Bureau. With the State-level demographic control totals so obtained, a three-level "State" variable (i.e., California, Arizona, and remainder of the United States) was used in the poststratification to produce stable State-level estimates for California and Arizona. In 1998, regional control totals also were obtained and used during poststratification. In general, the interview samples from each quarter were poststratified to one-fourth of the projected population totals. These totals represent the population at the midpoint of each quarter's data collection period (the 15th day of February, May, August, and November 1998). The resulting quarterly analysis weights sum to the average of the four quarter-specific projections. The final analysis weight can be viewed as the number of population members that each respondent represents.
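The idea of forcing weight totals to match control totals can be illustrated with a simple ratio adjustment. This is a minimal sketch under stated assumptions, not the procedure actually used for the NHSDA: the report does not specify the adjustment algorithm, the cell labels and numbers are made up, and the real cells were defined by age group, gender, race/ethnicity, Hispanic origin, State, and region.

```python
from collections import defaultdict


def poststratify(records, control_totals):
    """Scale weights so each cell's weights sum to that cell's control total.

    records: list of dicts with hypothetical keys "cell" and "weight".
    control_totals: dict mapping cell label -> projected population for the cell.
    """
    weight_sums = defaultdict(float)
    for r in records:
        weight_sums[r["cell"]] += r["weight"]

    adjusted = []
    for r in records:
        # Ratio of the control total to the current weighted total in this cell.
        factor = control_totals[r["cell"]] / weight_sums[r["cell"]]
        adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted


# Made-up annual projections; each quarter is adjusted to one-fourth of them.
annual_projection = {"male_12_17": 4000.0, "female_12_17": 3600.0}
quarter_controls = {cell: total / 4 for cell, total in annual_projection.items()}

# Made-up quarterly sample with pre-adjustment weights.
sample = [
    {"cell": "male_12_17", "weight": 300.0},
    {"cell": "male_12_17", "weight": 500.0},
    {"cell": "female_12_17", "weight": 450.0},
]

adjusted = poststratify(sample, quarter_controls)
male_total = sum(r["weight"] for r in adjusted if r["cell"] == "male_12_17")
female_total = sum(r["weight"] for r in adjusted if r["cell"] == "female_12_17")
print(male_total, female_total)
```

After adjustment, each cell's quarterly weights sum to one-fourth of its projection, so the four quarters together reproduce the annual control total.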

Tables 1A, 1B, and 1C in Section II present the sample sizes and U.S. population totals on which the population estimates of drug use are based. The population figures are estimates of the civilian, noninstitutionalized population and are generated by summing the individual final analysis weights of the respondents belonging to each population category.
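Summing final analysis weights gives both the population total for a category and the estimated number of users within it. The sketch below shows this on four made-up respondents; the field names and weights are hypothetical and are not taken from the survey file.

```python
def weighted_estimates(records, used_key):
    """Return (prevalence rate, estimated users, estimated population)."""
    population = sum(r["weight"] for r in records)
    users = sum(r["weight"] for r in records if r[used_key])
    return users / population, users, population


# Made-up respondents; each weight is the number of people the respondent represents.
respondents = [
    {"weight": 9000.0, "used_past_month": True},
    {"weight": 11000.0, "used_past_month": False},
    {"weight": 10000.0, "used_past_month": False},
    {"weight": 10000.0, "used_past_month": True},
]

rate, users, population = weighted_estimates(respondents, "used_past_month")

# Population estimates in the tables are shown in thousands.
users_in_thousands = round(users / 1000)
print(rate, users_in_thousands, population)
```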

Adjusting for Nonresponse Through Imputation

The population estimates in this report are based on either the total responding sample or all cases in a subgroup, including some cases where missing data for some recency-of-use and frequency-of-use variables were replaced with logically or statistically imputed (i.e., replaced) values. The interview classification "minimally complete" (a status necessary for a case to be included in the database) requires that data on the recency of use of alcohol, marijuana, and cocaine be present. To determine case completeness, an editing procedure was employed to replace missing data for these substances based on information supplied by the respondent elsewhere in the questionnaire. After this editing, case completeness was determined. When necessary, additional logical imputation also was done to replace other inconsistent, missing, or otherwise faulty data.

After editing, any data still missing for recency-of-use and frequency-of-use questions (for drugs other than alcohol, cocaine, and marijuana) were statistically imputed using the technique of sequential hot-deck imputation. The first step in this procedure involves sorting the data file progressively using data on recency of use of alcohol, marijuana, and cocaine; age; gender; Hispanic origin; race/ethnicity; and a State indicator variable (i.e., California, Arizona, or remainder of the United States). The hot-deck imputation procedure replaces a missing item on a particular record by the last encountered nonmissing response for that item (from the previous record) on the sorted database. The hot-deck imputation procedure is appropriate for recency-of-use and most frequency-of-use variables because the level of item nonresponse is low.
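The sequential hot-deck step described above can be sketched in a few lines. This is an illustrative sketch only: the field names, the reduced set of sort variables, and the toy records are assumptions, and the actual NHSDA sort used more variables than shown here.

```python
def sequential_hot_deck(records, sort_keys, item):
    """Fill missing (None) values of `item` with the last nonmissing value
    encountered while walking the file in sorted order."""
    ordered = sorted(records, key=lambda r: tuple(r[k] for k in sort_keys))
    last_donor = None
    result = []
    for r in ordered:
        value = r[item]
        if value is None:
            value = last_donor  # remains None if no donor has been seen yet
        else:
            last_donor = value
        imputed = r[item] is None and value is not None
        result.append({**r, item: value, "imputed": imputed})
    return result


# Toy file: records 2 and 4 are missing the (hypothetical) inhalant recency item.
records = [
    {"id": 3, "alcohol_recency": 3, "age_group": 4, "gender": "M", "inhalant_recency": "never"},
    {"id": 4, "alcohol_recency": 3, "age_group": 4, "gender": "M", "inhalant_recency": None},
    {"id": 1, "alcohol_recency": 1, "age_group": 2, "gender": "F", "inhalant_recency": "past year"},
    {"id": 2, "alcohol_recency": 1, "age_group": 2, "gender": "F", "inhalant_recency": None},
]

filled = sequential_hot_deck(
    records, ["alcohol_recency", "age_group", "gender"], "inhalant_recency"
)
by_id = {r["id"]: r for r in filled}
print(by_id[2]["inhalant_recency"], by_id[4]["inhalant_recency"])
```

Because the file is sorted first, each missing value is filled from a neighboring record with similar characteristics, which is why the choice of sort variables matters.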

Missing data for the variables on frequency of use of alcohol, cocaine, and marijuana in the past 12 months were statistically imputed using a regression-based method of imputation. This imputation procedure involves estimating a polytomous logistic model using a number of respondent characteristics. The explanatory variables used in these models included those variables used in the recency-of-use hot-deck imputation procedure, such as recency of use of alcohol, marijuana, and cocaine; age; gender; Hispanic origin; race/ethnicity; and State. After the model parameters were estimated, the resulting model was used to predict a categorical value for each frequency-of-use item nonresponse. The model-based imputation procedure is appropriate for alcohol, cocaine, and marijuana frequencies for two reasons: (a) the relative amount of nonresponse or faulty responses to these questions is larger than what is observed for the recency-of-use and other frequency-of-use items, and (b) the model-based procedure allows a greater number of statistically significant explanatory variables to contribute to imputing a response compared to what is possible with the hot-deck method.

The main advantage of imputation is that it simplifies the calculation of estimates. Its use can reduce the bias caused by missing data and thus improve the accuracy of estimates. In the 1998 NHSDA, however, the potential impact of bias due to item nonresponse and the impact of imputation on the estimates themselves were quite small because item nonresponse was less than 2% for most of the drug use recency questions.

—————

⁴ These 1998 population projections were based on the 1990 U.S. Census counts.

Sampling Error and Confidence Intervals

The NHSDA, like all sample surveys, has an inherent degree of statistical uncertainty based on the sample design. NHSDA estimates are subject to uncertainties of two types: sampling errors and nonsampling errors. Examples of nonsampling errors are recording mistakes, coding errors, nonresponse, differences in respondents' interpretations of questions, and purposely false answers. The effects of nonsampling errors on the estimates cannot normally be quantified; however, rigorous attempts are made to minimize their occurrence through pretesting, interviewer training, interview verification, coder training, coding verification, and other quality control measures.

Sampling errors denote the random fluctuations that occur in estimates based on samples drawn from a population; such variations can be eliminated only by conducting a complete census. Using the same procedures, different samples drawn from the same population would be expected to result in different estimates. Many of these observed estimates would differ to some degree from the "true" population value, and these differences are due to sampling error. The variance of an estimate is the basic measure of this type of error.

To account for the complex features of the NHSDA sample design (such as unequal selection probabilities, stratification, and clustering), the variance estimates of the NHSDA drug use statistics are computed for this report using a survey data analysis software package called SUDAAN.⁵ Estimates of means or proportions, such as drug use prevalence, take the form of nonlinear statistics where the variances cannot be expressed in closed form. Variance estimation for nonlinear statistics in SUDAAN is based on a first-order Taylor series approximation of the deviations of estimates from their expected values. The resulting variance estimates are approximately unbiased for sufficiently large sample sizes.

For a given variance estimate, the associated design effect is the ratio of the design-based variance estimate over the variance that would have been obtained from a simple random sample of the same size. Because the combined design features of stratification, clustering, and unequal weighting are expected to increase the variance estimates, the design effect should virtually always be greater than one. For prevalence rates near zero, however, the variance-inflating effects of unequal weighting and clustering are sometimes underestimated, resulting in design effects of less than one. Because the corresponding variance estimates are then considered anomalously small, two other variance estimates are computed as quality control measures. The first of these other variance estimates is based only on the stratification and unequal weighting effects, and the second is based on simple random sampling. The variance estimate used for obtaining confidence intervals is then the maximum of these three estimates.
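The design effect and the "maximum of three" rule can be written out as follows. This is a sketch under assumptions: the simple random sampling variance is taken as p(1 - p)/n, which the report does not state explicitly, and the input variances are made-up numbers rather than SUDAAN output.

```python
def srs_variance(p, n):
    """Variance of a proportion under simple random sampling (assumed form)."""
    return p * (1.0 - p) / n


def design_effect(design_var, p, n):
    """Ratio of the design-based variance to the simple random sampling variance."""
    return design_var / srs_variance(p, n)


def variance_for_ci(full_design_var, strat_weight_var, p, n):
    """Largest of the full design-based variance, the variance reflecting only
    stratification and unequal weighting, and the SRS variance."""
    return max(full_design_var, strat_weight_var, srs_variance(p, n))


# Made-up example of a prevalence rate near zero with an anomalously small variance.
p, n = 0.002, 5000
full_design_var = 3.0e-7
strat_weight_var = 4.5e-7

deff = design_effect(full_design_var, p, n)
chosen = variance_for_ci(full_design_var, strat_weight_var, p, n)
print(deff, chosen)
```

Here the design effect falls below one, so the rule discards the full design-based estimate in favor of the largest of the three candidates.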

The 95% confidence intervals for the drug use proportions and corresponding population estimates are constructed based on the logit transformation. Because the drug use proportions in the NHSDA are frequently small, the logit transformation has been used for this report to yield asymmetric interval boundaries. These asymmetric intervals are more balanced with respect to the probability that the interval is above or below the true population value than is the case for standard symmetric confidence intervals.

To illustrate the method, let

p = estimated proportion,

var(p) = variance estimate of p,

q = 1 - p,

L = logit of p = ln[p/(1 - p)], where "ln" denotes the natural logarithm, and

var(L) = var(p)/(pq)².

The approximate 95% confidence interval for L is then calculated as

$$(A, B) = \left( L - 1.96\left(\frac{\sqrt{\mathrm{var}(p)}}{pq}\right),\; L + 1.96\left(\frac{\sqrt{\mathrm{var}(p)}}{pq}\right) \right)$$

where the quantity in parentheses that is multiplied by 1.96 estimates the standard error (SE) of L. Applying the inverse logistic transformation to the confidence interval endpoints, A and B, yields a 95% confidence interval for the proportion, P, as

$$(P_{\text{lower}}, P_{\text{upper}}) = \left( \frac{1}{1 + \exp(-A)},\; \frac{1}{1 + \exp(-B)} \right)$$

where "exp" denotes the inverse log transformation. The lower and upper confidence interval endpoints for percentage estimates are obtained by multiplying the lower and upper endpoints for proportions by 100. The confidence interval for the corresponding population estimate is obtained by multiplying the confidence interval endpoints by the estimated number of individuals in the population subgroup constituting the base or denominator of the associated proportion.

The precise interpretation of the 95% confidence interval is as follows: If repeated samples of identical design are drawn from the population, and the sample estimate and corresponding upper and lower confidence limits are calculated for each sample, then the true population value is covered by the confidence intervals of, on average, 95 of 100 samples.

For tables in this report, each estimate of the number of users of the drug in the defined subgroup (as well as its corresponding estimated percentage of the subgroup's total population) is accompanied by an upper and lower confidence limit. For example, in the lower portion of Table 3A, the "observed estimate" for the total number of people who have "ever used" marijuana is 72,070,000. The "lower limit" is 69,122,000, and the "upper limit" is 75,080,000. The interpretation of these estimates is that one can be 95% confident that the total number of people who have ever tried marijuana at least once in their lifetime lies between 69,122,000 and 75,080,000, with the best 1998 NHSDA estimate being 72,070,000. The corresponding percentage estimates for the lower and upper confidence limits are 31.6% and 34.4%, respectively, with the best estimate being 33.0%.
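The logit-based interval can be computed directly from a proportion and its variance estimate. In the sketch below, the proportion 0.330 is the lifetime marijuana estimate quoted above, but the standard error of 0.0071 is a hypothetical value chosen for illustration; it is not taken from the report.

```python
import math


def logit_confidence_interval(p, var_p, z=1.96):
    """Confidence interval for a proportion built on the logit scale."""
    q = 1.0 - p
    logit = math.log(p / q)
    se_logit = math.sqrt(var_p) / (p * q)  # square root of var(p)/(pq)^2
    a = logit - z * se_logit
    b = logit + z * se_logit
    # Inverse logistic transformation of the endpoints.
    return 1.0 / (1.0 + math.exp(-a)), 1.0 / (1.0 + math.exp(-b))


p = 0.330
se_p = 0.0071  # hypothetical standard error, not a published figure
lower, upper = logit_confidence_interval(p, se_p ** 2)
print(round(lower, 3), round(upper, 3))
```

With this assumed standard error, the interval is close to the published limits of 31.6% and 34.4%, and the upper limit lies slightly farther from the estimate than the lower limit, which is the asymmetry the logit transformation is meant to produce for proportions below 0.5.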

As in other publications in the NHSDA series, estimates with low precision are not reported. The criterion used for suppressing estimates is based on the size of the estimate and the relative standard error (RSE) of the estimate. The RSE is defined as the ratio of the standard error of an estimate to the estimate itself. Specifically, cell percentages and corresponding estimates of numbers of users are suppressed if at least one of the following three criteria is met:

(1) p < .0005 or p ≥ .9995

(2) RSE[-ln(p)] > 0.175 when p < 0.5

(3) RSE[-ln(1-p)] > 0.175 when p > 0.5

where RSE[-ln(p)] is RSE(p)/[-ln(p)]. For computational purposes, this is equivalent to

$$\frac{SE(p)/p}{-\ln(p)} > 0.175 \text{ when } p < 0.5, \qquad \frac{SE(p)/(1-p)}{-\ln(1-p)} > 0.175 \text{ when } p > 0.5,$$

where SE(p) is the standard error estimate of p. The log transformation of p is used to provide a more balanced treatment of measuring the quality of small, large, and intermediate p values. The switch to (1-p) for p greater than 0.5 yields a symmetric suppression rule across the range of possible p values. Because the sample sizes for subgroup populations are relatively large, low precision generally occurs only for prevalence rates that are near either 0% or 100%.
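The suppression rule can be expressed as a small function. This is a sketch under assumptions: the upper bound in criterion (1) is treated as inclusive, the case p = 0.5 (which the criteria do not address) is handled with the first formula since both formulas coincide there, and the example values are made up.

```python
import math


def is_suppressed(p, se_p, threshold=0.175):
    """Return True if an estimated proportion fails the precision criteria."""
    # Criterion (1): estimates extremely close to 0 or 1 (upper bound assumed inclusive).
    if p < 0.0005 or p >= 0.9995:
        return True
    if p <= 0.5:
        # Criterion (2): RSE of -ln(p); p == 0.5 is folded in here by assumption.
        return (se_p / p) / (-math.log(p)) > threshold
    # Criterion (3): the same rule applied to 1 - p.
    return (se_p / (1.0 - p)) / (-math.log(1.0 - p)) > threshold


print(is_suppressed(0.33, 0.0071))   # precise estimate
print(is_suppressed(0.01, 0.004))    # small rate, adequate precision
print(is_suppressed(0.01, 0.009))    # small rate, poor precision
print(is_suppressed(0.0001, 0.00005))  # below the lower bound
```

Because criterion (3) mirrors criterion (2), a proportion of 0.99 with a given standard error is treated the same way as a proportion of 0.01 with that standard error.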
NHSDA drug use prevalence data are presented for each gender; four major age groups (12 to 17, 18 to 25, 26 to 34, and 35 years old or older); three major mutually exclusive racial/ethnic groups, based on respondents' self-classifications (Hispanic in origin, regardless of race; white, not of Hispanic origin; and black, not of Hispanic origin); and four geographic regions. (Those who did not identify themselves as Hispanic, white, or black are included in the population totals, but separate estimates are not presented for this "other" category because the sample size is too small [see Table 1B].) Tables are presented separately for the total population, whites, Hispanics, blacks, and geographic region. The four U.S. Bureau of the Census regions are Northeast, North Central, South, and West (see Exhibit 1). For each drug, eight tables are arranged to facilitate group comparisons. Data for the estimated numbers of users in subgroups are arranged in rows and presented by gender for each of four age groups. Data in the remaining seven tables for each racial/ethnic or regional subgroup are presented first by age, then by gender, and finally for the total population.
Time periods of use shown in column headings are "ever used," "used past year," and "used past month." These categories are cumulative (i.e., those who have "used [in the] past month" also are included in the "used [in the] past year" and "ever used" categories). Likewise, those who have "used [in the] past year" are included in the "ever used" estimates.
Other than presenting results by age group and other basic demographic characteristics, no attempt is made in this report to control for potentially confounding factors that might help explain any associations observed. This point is particularly salient with respect to race/ethnicity, which tends to be highly associated with socioeconomic characteristics. Also, the cross-sectional nature of the data precludes any causal interpretations of observed relationships. Nevertheless, data presented in this report are useful for comparing demographic subgroups with respect to drug use rates, regardless of why they differ.

Prevalence Estimates for Specific Drugs and Drug Classes

Section III presents the basic set of drug use prevalence estimates grouped by various drug categories. The first drug category presented is "Any Illicit Drug," which includes any use of marijuana/hashish; cocaine, including crack; inhalants; hallucinogens, including lysergic acid diethylamide (LSD) and phencyclidine (PCP); heroin; and the nonmedical use of psychotherapeutics (i.e., stimulants, sedatives, tranquilizers, and analgesics). Following the estimates for any illicit use, tables are presented separately for various specific categories of illicit drug use, as well as for alcohol, cigarettes, and smokeless tobacco. The small number of respondents reporting PCP use, LSD use, heroin use, and needle use resulted in low precision for most "used past month" estimates, as well as many other estimates; therefore, less detail is shown for these measures (see Tables 16 to 19).

Frequency of Drug Use Among Past Year Users

Data presented in Section IV are useful for identifying how often a drug is used. After results from earlier surveys were published, the estimate of those who had used a drug in the past month was cited by some readers as an estimate of the number of "regular users." This interpretation was unsatisfactory because past month users include both those experimenting with the drug and regular users. Therefore, information has since been collected on the frequency of drug use in the past year.

Frequency of drug use during the past 12 months is classified into three categories: "at least once," "12 or more days," and "51 or more days." The categories are cumulative; those using "51 or more days" also are counted among the "12 or more days" and the "at least once" users. Similarly, those using "12 or more days" also are counted among those who have used "at least once" in the past year. By definition, estimates for those who have used "at least once" are equivalent to those who have "used past year" in earlier tables.
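The cumulative nature of the three frequency categories can be made concrete with a short sketch. The function name and the idea of classifying from a single "days used" count are illustrative assumptions; the questionnaire may record frequency differently.

```python
def frequency_categories(days_used):
    """Map days of use in the past 12 months to the three cumulative categories."""
    return {
        "at least once": days_used >= 1,
        "12 or more days": days_used >= 12,
        "51 or more days": days_used >= 51,
    }


heavy = frequency_categories(60)   # counted in all three categories
monthly = frequency_categories(12)  # counted in the first two categories
nonuser = frequency_categories(0)   # counted in none
print(heavy, monthly, nonuser)
```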

CONSIDERATIONS IN INTERPRETING THE DATA

The estimates produced in this report should be viewed as approximations based on the best available data. Readers are therefore cautioned to take the following points into account when using or interpreting the data in this report:

The value of self-reports obviously depends on the honesty and memory of sampled respondents. Research has supported the validity of self-report data in similar contexts.⁶,⁷ Although NHSDA procedures are designed to encourage truthfulness and recall, as with all studies of this type, some underreporting or overreporting may occur.

NHSDA drug use prevalence estimates for specific subgroups are sometimes based on modest to small sample sizes, which may lead to substantial sampling error.

Population projections prepared for the U.S. Bureau of the Census' Current Population Survey (and used in weighting the 1998 NHSDA sample) are subject to error, which increases with the age of the last census.

The population surveyed consists of noninstitutionalized civilians living in households, college dormitories, homeless shelters, rooming houses, and on military installations. Therefore, this report does not present estimates for some segments of the U.S. population that may contain a substantial proportion of drug users, such as transients not residing in shelters (e.g., users of soup kitchens or residents of street encampments) and those incarcerated in county jails or State and Federal prisons.

⁶ Rouse, B.A., Kozel, N.J., & Richards, L.G. (Eds.). (1985). Self-report methods of estimating drug use (NIDA Research Monograph No. 57, DHHS Publication No. ADM 85-1402). Rockville, MD: National Institute on Drug Abuse.

⁷ Turner, C.F., Lessler, J.T., & Gfroerer, J.C. (Eds.). (1992). Survey measurement of drug use: Methodological studies (DHHS Publication No. ADM 92-1929). Rockville, MD: National Institute on Drug Abuse.
