Office of National Drug Control Policy


What America's Users Spend on Illegal Drugs 1988–1998

December 2000

Spoken Math Version.

Appendix D.

The formulae in the spoken math version of Appendix D are expressed using the Nemeth Code, developed by Dr. A. Nemeth to articulate mathematical expressions in spoken language.

Please note that we deviate from this standard in coding word variables as words instead of spelling out all letters. For instance, we represent the variable "joints" as "joints," rather than spelling out j o i n t s. We also omit using the term "upword" when coding a word variable that is presented in upper case.

Imputations for Missing Data on Marijuana Use.

Calculating the amount of marijuana used by household members was straightforward. We multiplied the number of marijuana users per month by the average number of joints smoked per user and by the average weight of a joint. The result was then multiplied by twelve months to give an annual estimate. The principal problems in making this calculation are missing data and responses reported as a range, because ranges cannot be used directly in the calculations. Missing data about recent use was not a problem, because the Substance Abuse and Mental Health Services Administration had already imputed those responses. This appendix explains how we imputed responses when either the number of joints smoked or the amount of marijuana smoked was missing or was reported as a range.
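The multiplication described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the input values are hypothetical and are not taken from the survey.

```python
def annual_marijuana_ounces(monthly_users, joints_per_user, ounces_per_joint):
    """Yearly amount: users x joints per user per month x weight per joint x 12."""
    monthly_ounces = monthly_users * joints_per_user * ounces_per_joint
    return monthly_ounces * 12


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
estimate = annual_marijuana_ounces(
    monthly_users=1000, joints_per_user=20, ounces_per_joint=0.01
)
print(estimate)
```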

Imputing the Number of Joints Smoked.

From the National Household Survey for 1991, analysts selected respondents who said they used marijuana in the past month and who gave valid responses to three related questions. The first question was the number of days they smoked marijuana in the past month L pare DAYS R pare. Valid responses were 1 to 30 days. The second question was the number of marijuana cigarettes smoked per day in the past month L pare JOINTS R pare. From the responses to these two questions, analysts created a variable,

TOTAL JOINTS equals DAYS times JOINTS.

The third question was the amount of marijuana used during the last month L pare AMOUNT R pare. This is exactly the question that the analysts sought to answer, but the AMOUNT question was not directly useful for this purpose because it was specified as a range. The acceptable answers to AMOUNT were:

  • 1 to 10 joints
  • 11 to 20 joints
  • 1 ounce
  • 2 ounces
  • 3 to 4 ounces
  • 5 to 6 ounces

The analysts' problem was to infer the amount of marijuana used by people who said they used marijuana in the last month based on the variables TOTAL JOINTS and AMOUNT.

As shorthand, let upper J represent TOTAL JOINTS, let upper A represent AMOUNT, and let upper W equal the weight of marijuana used in ounces. The analysts wanted to estimate upper W.

Now, upper W is unknown, but it might be represented as:

Upper W equals lambda upper J plus epsilon,

where lambda is the weight per joint and epsilon is a random error term, which will be discussed below. This equation, Equation L brack 1 R brack, says that, on average, a person who smokes upper J joints will use upper W ounces of marijuana, because lambda is the average weight of a single joint. Of course, some people who smoke upper J joints use a little less; some use a little more. This variation about what is typical is reflected in the term epsilon.

Assume that epsilon is distributed normally with a mean of zero, a standard deviation of sigma, and that the error terms are independently and identically distributed. It turns out that these assumptions about the distribution of epsilon are hard to justify, and alternative assumptions are adopted later. However, this simple, if somewhat unrealistic, specification is useful for explaining the approach.

Although upper W is unknown to the analysts, it is known to the respondent, and by assumption the value of upper W determines the respondent's answer for AMOUNT. Specifically, the respondent will say that he used:

1 to 10 joints when upper W less equal propto sub 1 base.
11 to 20 joints when propto sub 1 base less upper W less equal propto sub 2 base.
1 ounce when propto sub 2 base less upper W less equal 1 point 5.
2 ounces when 1 point 5 less upper W less equal 2 point 5.
3 to 4 ounces when 2 point 5 less upper W less equal 4 point 5.
5 to 6 ounces when 4 point 5 less upper W.

The logic here is that the respondent will select the usage category that most closely describes his use, although it seems reasonable to suppose that he makes errors when making this translation. Two terms are unknown, propto sub 1 base and propto sub 2 base. The first, propto sub 1 base, is presumably the weight of 10 point 5 joints. The second is harder to interpret, but propto sub 2 base is some value that distinguishes the response "11 to 20 joints" from "1 ounce," at least in the eyes of the respondent.

There are four parameters to be estimated here: lambda, sigma, propto sub 1 base, and propto sub 2 base. These parameters can be estimated by maximum likelihood once a probability has been assigned to every response.

Specifically,

Upper P sub 1 base equals upper P L pare 1 to 10 joints R pare equals upper Phi L pare B frac propto sub 1 base minus lambda upper J over sigma E frac R pare.

Upper P sub 2 base equals upper P L pare 11 to 20 joints R pare equals upper Phi L pare B frac propto sub 2 base minus lambda upper J over sigma E frac R pare minus upper P sub 1 base.

Upper P sub 3 base equals upper P L pare 1 ounce R pare equals upper Phi L pare B frac 1 point 5 minus lambda upper J over sigma E frac R pare minus upper P sub 1 base minus upper P sub 2 base.

Upper P sub 4 base equals upper P L pare 2 ounces R pare equals upper Phi L pare B frac 2 point 5 minus lambda upper J over sigma E frac R pare minus upper P sub 1 base minus upper P sub 2 base minus upper P sub 3 base.

Upper P sub 5 base equals upper P L pare 3 to 4 ounces R pare equals upper Phi L pare B frac 4 point 5 minus lambda upper J over sigma E frac R pare minus upper P sub 1 base minus upper P sub 2 base minus upper P sub 3 base minus upper P sub 4 base.

Upper P sub 6 base equals upper P L pare 5 to 6 ounces or more R pare equals 1 minus upper P sub 1 base minus upper P sub 2 base minus upper P sub 3 base minus upper P sub 4 base minus upper P sub 5 base.

where upper Phi is the standard normal distribution function.
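As a minimal sketch, the six response probabilities and the resulting log likelihood can be computed as follows. The function and argument names are hypothetical, the parameter values in the usage lines are made up for illustration, and this is not the analysts' estimation code. The standard normal distribution function is built from Python's math.erf.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(total_joints, lam, sigma, cut_1, cut_2):
    """Probabilities of the six AMOUNT responses for one respondent.

    cut_1 and cut_2 are the two unknown thresholds; 1.5, 2.5, and 4.5 ounces
    are the known ones. The thresholds are assumed to be in increasing order.
    """
    cuts = [cut_1, cut_2, 1.5, 2.5, 4.5]
    mean = lam * total_joints
    cumulative = [std_normal_cdf((c - mean) / sigma) for c in cuts] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        # Each probability is the cumulative value minus all earlier ones.
        probabilities.append(value - previous)
        previous = value
    return probabilities


def log_likelihood(observations, lam, sigma, cut_1, cut_2):
    """Sum of log probabilities over (total_joints, category_index) pairs."""
    total = 0.0
    for total_joints, category in observations:
        p = category_probabilities(total_joints, lam, sigma, cut_1, cut_2)[category]
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


# Made-up parameter values, for illustration only.
probs = category_probabilities(total_joints=40, lam=0.015, sigma=0.3,
                               cut_1=0.015 * 10.5, cut_2=0.6)
print(probs, sum(probs))
```

A maximum likelihood routine would search over the parameter values to make log_likelihood as large as possible.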

This approach is similar to an ordered probit model. There is an important difference between this approach and a traditional probit model, however. Specifically, the threshold values of 1 point 5, 2 point 5, and 4 point 5 are known although propto sub 1 base and propto sub 2 base are unknown. This allows the parameter sigma to be identified and estimated. In turn, this allows lambda to be identified and interpreted as the weight of a marijuana cigarette.

One further extension is to assume that:

propto sub 1 base equals lambda times 10 point 5.

That is, the parameter propto sub 1 base equals the weight of 10 point 5 joints, because the weight of 10 point 5 joints is the threshold value between the responses "1 to 10 joints" and "11 to 20 joints." There are only three remaining parameters to estimate: propto sub 2 base, lambda, and sigma.

As stated, this model is an unacceptable representation of the relationship between the number of joints smoked and the amount of marijuana smoked. A more convincing model is:

Upper W equals L pare bar lambda plus epsilon sub 1 base R pare upper J plus epsilon equals bar lambda upper J plus upper J epsilon sub 1 base plus epsilon equals bar lambda upper J plus epsilon sub 2 base.

This implies that the average joint weighs bar lambda ounces, but that the weight varies across users. This variation is represented by the distribution of epsilon sub 1 base. The model would be complete once the distribution of epsilon sub 2 base is specified.

The distribution of epsilon sub 2 base has to satisfy some a priori constraints. First, upper W must be positive, so epsilon sub 2 base has a lower limit that depends on bar lambda upper J. Second, the distribution of epsilon sub 2 base should account for an apparent upward skew: inspection of the data shows that some users seem to use much more than the average amount of marijuana, but nobody can use much less because zero is a lower limit. Third, the error term is heteroscedastic.

A new specification is more useful, given these a priori constraints:

Upper W equals lambda upper J equals e supe epsilon supe sub 3 base base upper J,

where epsilon sub 3 base sim upper N L pare mu comma sigma R pare. Here, lambda has a lognormal distribution, and thus lambda upper J is always positive and lambda is skewed upward. In this specification:

Upper E L pare lambda R pare equals e supe mu plus 0 point 5 sigma supe supe 2 base base.

VAR L pare lambda R pare equals e supe 2 mu plus sigma supe supe 2 base base L pare e supe sigma supe supe 2 base base minus 1 R pare.
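The two lognormal moment formulas can be checked numerically. The sketch below compares them with the sample mean and variance of simulated draws. The values of mu and sigma are hypothetical and the function names are ours. Here sigma is treated as the standard deviation of the underlying normal variable.

```python
import math
import random


def lognormal_mean(mu, sigma):
    """Expected value of exp(X) when X is normal with mean mu and sd sigma."""
    return math.exp(mu + 0.5 * sigma ** 2)


def lognormal_variance(mu, sigma):
    """Variance of exp(X) when X is normal with mean mu and sd sigma."""
    return math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)


def simulated_moments(mu, sigma, draws=100_000, seed=0):
    """Sample mean and variance of lognormal draws, for comparison."""
    rng = random.Random(seed)
    sample = [rng.lognormvariate(mu, sigma) for _ in range(draws)]
    mean = sum(sample) / draws
    variance = sum((x - mean) ** 2 for x in sample) / (draws - 1)
    return mean, variance


MU, SIGMA = -4.0, 0.5  # hypothetical values, not estimates from the report
print(lognormal_mean(MU, SIGMA), lognormal_variance(MU, SIGMA))
print(simulated_moments(MU, SIGMA))
```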

Taking logarithms on both sides of L brack 3 R brack, the lognormal specification above, we have

ln upper W equals ln upper J plus epsilon sub 3 base,

which can be rewritten as

ln upper W equals ln upper J plus mu plus epsilon sub 4 base,

where epsilon sub 4 base sim upper N L pare zero comma sigma R pare. As with the earlier, less realistic model, the parameters can be estimated using maximum likelihood. A simple extension is to let mu equal beta sub zero base plus beta sub 1 base upper J over 100. The "100" is just a scale factor that has no effect on the analysis. This specification allows frequent smokers to smoke larger or smaller joints than average smokers.

The most important estimate is upper E L pare lambda R pare, the average weight of a marijuana cigarette. An estimate of upper W, then, is:

Upper W equals upper E L pare lambda R pare upper J.

This tells us that if a respondent says he smoked upper J joints during the month L pare TOTAL JOINTS R pare, then upper E L pare lambda R pare upper J is the best estimate of the quantity (in ounces) of marijuana smoked.


Table D–1 presents parameter estimates based on an analysis of 1623 smokers who reported DAYS, JOINTS, and AMOUNT. Before estimating these parameters, the analysts changed some of the data, as described after the table.

Table D–1
Regression Results: The Total Amount of Marijuana Smoked in the Past Month.

Parameter; Parameter Estimate; Standard Error; Probability
beta sub zero base; minus 4 point 95; point 24; point 0000
beta sub 1 base; point 13; point 11; point 0000
propto sub 2 base; 1 point 50; point 39; point 0001
sigma; 1 point 08; point 013; point 0000
Source: NHSDA 1991

Before calculating TOTAL JOINTS, responses of more than 30 for JOINTS (number of marijuana cigarettes smoked per day in the past month) were truncated to 30. These extreme responses represented only about 0 point 1 percent of the total number of monthly users.

After calculating TOTAL JOINTS, analysts compared TOTAL JOINTS with AMOUNT and corrected for extreme inconsistencies between (or highly unlikely combinations of) the two variables. If JOINTS greater equal 100 and AMOUNT less equal 20 joints or if JOINTS greater equal 200 and AMOUNT less equal 2 ounces, then analysts assumed that the respondents had mistakenly given the total number of joints they had smoked in the past month for the question on JOINTS (number of marijuana cigarettes smoked per day in the past month). For these respondents, analysts treated JOINTS as TOTAL JOINTS in calculating the quantity estimates.
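The two data edits can be sketched as a single function. Several things here are assumptions rather than the analysts' procedure: the function name, the 1-to-6 coding of AMOUNT, and above all the choice to apply the inconsistency test to the raw JOINTS response before it is truncated to 30. The text does not say which value was tested.

```python
def cleaned_total_joints(days, joints_per_day, amount_code):
    """Return TOTAL JOINTS after the truncation and inconsistency edits.

    amount_code is a hypothetical coding of the AMOUNT responses:
    1 = "1 to 10 joints", 2 = "11 to 20 joints", 3 = "1 ounce",
    4 = "2 ounces", 5 = "3 to 4 ounces", 6 = "5 to 6 ounces".
    """
    # Assumption: the consistency check uses the raw (untruncated) response.
    reported_as_total = (
        (joints_per_day >= 100 and amount_code <= 2)      # AMOUNT <= 20 joints
        or (joints_per_day >= 200 and amount_code <= 4)   # AMOUNT <= 2 ounces
    )
    if reported_as_total:
        # Treat the JOINTS response as the monthly total.
        return joints_per_day
    # Otherwise truncate JOINTS at 30 and multiply by DAYS.
    return days * min(joints_per_day, 30)


print(cleaned_total_joints(days=10, joints_per_day=2, amount_code=1))
print(cleaned_total_joints(days=5, joints_per_day=120, amount_code=2))
```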

Results from the analysis imply that a person who smokes 1 joint per month uses 0 point 013 ounces (0 point 37 grams per joint) of marijuana. A person who smokes thirty joints per month uses 0 point 4 ounces (0 point 38 grams per joint) of marijuana. A person who smokes 120 joints per month uses 1 point 79 ounces (0 point 43 grams per joint) of marijuana. Applying the parameter estimates from Table D–1, Equation L brack 7 R brack was then used to compute the average weight per joint L pare upper W over upper J R pare for every respondent in each year of the NHSDA. Results, which appear in Table 6 of the main report, are used in the calculations reported in the body of this report.
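As a rough check, the figures in the preceding paragraph can be approximately reproduced from the rounded estimates in Table D–1. The sketch assumes that mu equals beta sub zero base plus beta sub 1 base times upper J over 100, that the expected weight per joint is e raised to mu plus one half sigma squared, and that an ounce is 28.35 grams. The constant and function names are ours. Because the table reports rounded values, the results come close to the published figures but do not match them exactly.

```python
import math

# Rounded estimates as printed in Table D-1.
BETA_0 = -4.95
BETA_1 = 0.13
SIGMA = 1.08
GRAMS_PER_OUNCE = 28.35  # conversion factor assumed here


def expected_ounces_per_joint(total_joints):
    """Expected weight of one joint, in ounces, for a monthly total of joints."""
    mu = BETA_0 + BETA_1 * total_joints / 100
    return math.exp(mu + 0.5 * SIGMA ** 2)


for joints in (1, 30, 120):
    per_joint = expected_ounces_per_joint(joints)
    monthly_ounces = per_joint * joints
    grams_per_joint = per_joint * GRAMS_PER_OUNCE
    print(joints, round(monthly_ounces, 3), round(grams_per_joint, 2))
```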

Imputing Joints.

A related problem is that the variable JOINTS was sometimes missing. We could not just substitute the average response when JOINTS was known, because those with missing data seemed to have different usage patterns from those who did not have missing data. Instead, we estimated regressions where JOINTS was the dependent variable and MJFREQ was the independent variable. MJFREQ is "frequency used marijuana in the past 12 months." We used results from these regressions to impute responses when JOINTS was missing.

MJFREQ is coded:

1 - several times a day;
2 - daily;
3 - almost daily (3 to 6 days a week);
4 - 1 or 2 times a week;
5 - several times a month (about 25 to 51 days a year);
6 - 1 or 2 times a month (12 to 24 days a year);
7 - every other month or so (6 to 11 days a year);
8 - 3 to 5 days in the past 12 months;
9 - 1 or 2 days in the past 12 months.

We treated this variable as a continuous measure. To capture nonlinearities, we added an additional independent variable MJFREQ supe 2 base equals MJFREQ times MJFREQ.

The regression had two special features. The first was that the respondent could have said that he used zero joints during the month before the interview. After all, marijuana use during the year L pare MJFREQ R pare does not imply marijuana use during the month before the survey L pare JOINTS R pare. To take this special feature into account, the regression specification was written:

Upper Z equals alpha sub zero base plus alpha sub 1 base MJFREQ plus alpha sub 2 base MJFREQ supe 2 base plus epsilon.

JOINTS equals upper Z when upper Z greater equal 0.

JOINTS equals 0 otherwise.

where

Epsilon sim upper N L pare zero comma sigma R pare.

Sigma equals beta sub zero base plus beta sub 1 base upper Z.

The second special feature is that the error term is heteroscedastic: its standard deviation, sigma, is a linear function of the underlying latent variable upper Z.

Table D–2 shows regression results.

Table D–2
Regression Results: The Average Number of Joints Smoked in the Past Month.

Parameter; Model 1 Parameter Estimate; Model 1 Probability; Model 2 Parameter Estimate; Model 2 Probability
alpha sub zero base; 81 point 23; 0 point 00; 12 point 62; 0 point 09
alpha sub 1 base; minus 20 point 64; 0 point 00; minus 1 point 42; 0 point 24
alpha sub 2 base; 1 point 30; 0 point 00; minus 0 point 07; 0 point 30
beta sub zero base; 12 point 15; 0 point 00; 20 point 30; 0 point 00
beta sub 1 base; 0 point 48; 0 point 00; 2 point 18; 0 point 05
upper N; 1418 for Model 1; 190 for Model 2
Source: NHSDA 1991

The table shows two regressions. Model 1 was estimated for the 1418 respondents who reported use of marijuana in the 1991 NHSDA survey. Model 2 was estimated for the 190 respondents whose use of marijuana was imputed by SAMHSA. We estimated two separate models because specification testing showed that estimates based on the 1418 cases did not work well for the 190 cases and vice versa.
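To illustrate how the censored specification behaves, the sketch below computes the expected value of JOINTS for each MJFREQ code from the Model 1 estimates. It rests on two assumptions that the text does not confirm: sigma is computed from the non-random part of upper Z, and the quantity of interest is the expected value of a normal variable censored at zero, which equals the mean times the normal distribution function of mean over sigma, plus sigma times the normal density of mean over sigma. The names are ours, and the text does not say that imputed values were formed this way.

```python
import math

# Model 1 estimates as printed in Table D-2.
MODEL_1 = {"alpha_0": 81.23, "alpha_1": -20.64, "alpha_2": 1.30,
           "beta_0": 12.15, "beta_1": 0.48}


def std_normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_joints(mjfreq, params):
    """Expected JOINTS under censoring at zero, for one MJFREQ code (1 to 9)."""
    z_mean = (params["alpha_0"] + params["alpha_1"] * mjfreq
              + params["alpha_2"] * mjfreq ** 2)
    # Assumption: sigma depends on the non-random part of Z only.
    sigma = params["beta_0"] + params["beta_1"] * z_mean
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    ratio = z_mean / sigma
    return z_mean * std_normal_cdf(ratio) + sigma * std_normal_pdf(ratio)


for code in range(1, 10):
    print(code, round(expected_joints(code, MODEL_1), 1))
```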

The regressions overpredict slightly. Based on the 1418 cases, the regressions predict 23 point 4 joints on average per month. In reality, respondents said they used an average of 21 point 6 joints per month. For the 190 cases, the prediction was 10 point 7 joints on average per month and the actual was 8 point 5 joints. Because these predictions were only used when responses were missing for the variable JOINTS, we considered them to be close enough for our purposes.