Calculating Standard Errors

This document describes how users can calculate standard errors for SESTAT estimates using generalized variance functions (GVFs). Separate sections describe:

the GVF model used,
the method used to obtain directly calculated variance estimators, and
the resulting estimated GVF parameters, displayed as parameter tables.

The parameter tables enable the user to calculate standard errors for a wide range of population totals and percentages. Instead of displaying standard errors, these tables provide parameters that the user inserts into formulas (provided below) to calculate standard errors. Examples showing how to use these tables for SESTAT and each component survey are provided.

The basic steps for approximating standard errors are summarized in the section "Basic Steps to Approximating Standard Errors" below.

The sections that follow explain how to calculate estimated standard errors for SESTAT and give background on the methods used to develop the estimation parameters:

Calculate SESTAT Standard Errors
Background: GVF Model for SESTAT
Background: Developing Directly Calculated Variance Estimators for SESTAT

Calculate SESTAT Standard Errors

We offer two methods for obtaining standard errors for SESTAT estimates.

Method 1 (Look-up Tables) is easiest to use but is appropriate only for predicting standard errors for estimated totals.
Method 2 (Parameter Tables) can be used to predict standard errors for both estimated percentages and totals.

While the 1995 sample designs differ from the 1993 survey sample designs, the designs are similar enough that the generalized variance functions and subsequent lookup tables derived for 1993 are appropriate for evaluating the accuracy of 1995 estimates. Separate generalized variance functions were derived for 1997 and 1999.

Method 1. Obtaining Standard Errors from the Look-up Tables

The look-up tables provide approximate standard errors for estimated counts of scientists and engineers for the total population and for different segments of the population.

There are several versions of the Look-up Tables. As a general rule, use the table that is most specific to the domain you are studying. For example, the "total" category is used when more than one degree level is included.

In many cases, the exact estimate will not be included in the Look-up Tables. For these standard errors, you may use linear interpolation for intermediate values or you may wish to use Method 2 (Parameter Tables).
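The linear interpolation mentioned above can be sketched in Python as follows. The function name `interpolate_se` and the table rows are hypothetical and chosen only for illustration; they are not values from the look-up tables.

```python
def interpolate_se(estimate, table):
    """Linearly interpolate a standard error between two look-up table rows.

    `table` is a list of (estimated_total, standard_error) pairs.
    """
    rows = sorted(table)
    if len(rows) < 2:
        raise ValueError("need at least two table rows to interpolate")
    if not rows[0][0] <= estimate <= rows[-1][0]:
        raise ValueError("estimate lies outside the range of the table")
    # Find the pair of adjacent rows that brackets the estimate.
    for (x0, se0), (x1, se1) in zip(rows, rows[1:]):
        if x0 <= estimate <= x1:
            fraction = (estimate - x0) / (x1 - x0)
            return se0 + fraction * (se1 - se0)


# Hypothetical rows for illustration only; NOT taken from Table A-1.
example_table = [(1_000_000, 10_000.0), (2_000_000, 16_000.0), (5_000_000, 31_000.0)]
print(interpolate_se(1_500_000, example_table))
```

The sketch refuses to extrapolate beyond the first or last row; for estimates outside the tabulated range, Method 2 is likely the safer choice.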

A worked example of this method is given in the section "Example: Using the Look-up Table" below.

Look-up tables for the following groups are available in HTML and Microsoft Excel format:

Table	Group	Excel	HTML
A-1	SESTAT: Total Scientists and Engineers	.xls	.html
A-2	SESTAT: Bachelor's Scientists and Engineers	.xls	.html
A-3	SESTAT: Master's Scientists and Engineers	.xls	.html
A-4	SESTAT: Doctoral Scientists and Engineers	.xls	.html

Method 2. Using the Parameter Tables

Calculating Predicted Standard Errors for Totals

Use the following equation to calculate a predicted standard error for an estimated total:

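$$ s_{\hat{Y}} = \sqrt{\beta_0 \hat{Y}^2 + \beta_1 \hat{Y}} $$

where $s_{\hat{Y}}$ is the predicted standard error of the estimated total $\hat{Y}$, and $\beta_0$ and $\beta_1$ are estimated parameters obtained from the appropriate parameter table below.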

A worked example of this calculation is given in the section "Example: Using the Parameter Tables for Totals" below.
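As a rough sketch, the calculation for totals can be written in Python. The function name `se_total` is an assumption made for illustration. The form of the variance (a quadratic in the total with no constant term) follows the GVF model described in the background section, and the input values are those of the worked example for Table B-1 later in this document.

```python
import math


def se_total(y_hat, beta0, beta1):
    """Predicted standard error of an estimated total under the GVF model.

    The predicted variance is beta0 * y_hat**2 + beta1 * y_hat.
    """
    variance = beta0 * y_hat**2 + beta1 * y_hat
    return math.sqrt(variance)


# Parameters and estimate from the Table B-1 worked example in this document.
se = se_total(8_311_787, beta0=0.00003, beta1=176.69490)
half_width = 1.96 * se  # half-width of a 95% confidence interval
print(round(se), round(half_width))  # 59508 116636
```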

Calculating Predicted Standard Errors for Percents

Use the following equation to calculate a predicted standard error for an estimated percent:

$$ s_p = \sqrt{\frac{\beta_1}{y}\, p\,(100 - p)} $$

where $s_p$ is the predicted standard error for a specific estimated percentage $p$ and $y$ is the estimated number of persons in the base of the percentage. $\beta_1$ is an estimated parameter obtained from the appropriate parameter table below.

A worked example of this calculation is given in the section "Example: Using the Parameter Tables for Percentages" below.
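The percentage calculation can be sketched the same way. The function name `se_percent` is hypothetical, and the percentage is assumed to be expressed on a 0 to 100 scale, as in the worked example for female labor force participation later in this document, whose values are reused here.

```python
import math


def se_percent(p, base, beta1):
    """Predicted standard error (in percentage points) of an estimated percent.

    p     -- estimated percentage, on a 0-100 scale
    base  -- estimated number of persons in the base of the percentage
    beta1 -- parameter taken from the appropriate parameter table
    """
    return math.sqrt((beta1 / base) * p * (100 - p))


# Values from the worked example in this document.
se = se_percent(82, base=3_867_887, beta1=270.32836)
print(round(se, 2), round(1.96 * se, 2))  # 0.32 0.63
```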

Parameter Tables

Parameter tables for the following groups are available in 'html' and Microsoft Excel format:

Table	Group	Excel	HTML
B-1	SESTAT: Total Scientists and Engineers	.xls	.html
B-2	SESTAT: Bachelor's Scientists and Engineers	.xls	.html
B-3	SESTAT: Master's Scientists and Engineers	.xls	.html
B-4	SESTAT: Doctoral Scientists and Engineers	.xls	.html

Background: GVF Model for SESTAT

The Scientists and Engineers Statistical Data System (SESTAT) formed the GVF model for the variance of the estimate as a quadratic function of the total, or:

$$ \mathrm{Var}(\hat{Y}) = \beta_0 Y^2 + \beta_1 Y $$

where $Y$ is the population total and $\mathrm{Var}(\hat{Y})$ is the variance of an estimated total $\hat{Y}$. $\beta_0$ and $\beta_1$ are parameters of the model. (A comparable model found in GVF: A Methodology for Estimating Standard Errors uses the relative variance as the dependent variable.) For the SESTAT data, GVF models were specified for the overall population and for separate subgroups such as gender, race/ethnicity group, field of highest degree, occupation, and combinations of these characteristics. Separate models were estimated for 1993, 1997 and 1999. Because of the similarity in the sample designs for 1993 and 1995, a separate model was not estimated for 1995. Users are urged to use the 1993 results when evaluating standard errors for 1995 estimates.

To fit the model, 60 population totals were estimated for each domain. Direct estimates of the variances for these domain totals were generated using the method of random groups. Ordinary least squares regression was used to derive estimates of $\beta_0$ and $\beta_1$, with the estimated domain totals and their directly calculated variances as inputs. The results are presented as a table of generalized variance model parameters, which can be used to estimate standard errors. Instructions are provided on how to use these parameters to calculate standard errors for an estimated total or proportion.
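The fitting step can be illustrated with a minimal Python sketch. It assumes an unweighted least squares fit of the variance on the squared total and the total, with no intercept. The source does not state whether weights or other refinements were used, so this should not be read as the exact SESTAT procedure. The function name `fit_gvf` and the rescaling of totals (a numerical convenience) are also assumptions.

```python
def fit_gvf(totals, variances):
    """Least squares fit of variance = beta0 * Y**2 + beta1 * Y (no intercept).

    Totals are rescaled by their maximum before solving the 2x2 normal
    equations, which keeps the arithmetic well conditioned.
    """
    scale = max(totals)
    x = [t / scale for t in totals]      # rescaled totals
    u = [xi * xi for xi in x]            # rescaled squared totals
    suu = sum(ui * ui for ui in u)
    suv = sum(ui * xi for ui, xi in zip(u, x))
    svv = sum(xi * xi for xi in x)
    suy = sum(ui * vi for ui, vi in zip(u, variances))
    svy = sum(xi * vi for xi, vi in zip(x, variances))
    det = suu * svv - suv * suv
    a = (suy * svv - suv * svy) / det    # coefficient on rescaled Y**2
    b = (suu * svy - suv * suy) / det    # coefficient on rescaled Y
    # Convert back to the original units of Y.
    return a / scale**2, b / scale


# Synthetic, noise-free inputs for illustration: 60 totals with variances
# generated from known parameters, so the fit should recover them.
totals = [50_000 + i * 100_000 for i in range(60)]
variances = [0.00003 * t * t + 176.6949 * t for t in totals]
beta0, beta1 = fit_gvf(totals, variances)
print(beta0, beta1)
```

In practice the variances would be the directly calculated random group estimates rather than synthetic values, and the fit would not be exact.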

Background: Developing Directly Calculated Variance Estimators for SESTAT

The Method of Random Groups

The random group technique is appropriate when the sampling structure of the survey (or surveys) is so complex that analytically derived variance estimation formulas become unmanageable. In general, variance estimation using the method of random groups consists of drawing multiple samples from a target population (or subpopulation) of interest and then constructing a separate estimate from each sample. The dispersion of these estimates provides the basis for the variance measure.

For the SESTAT variance measures, the survey sample was divided into random subsamples, chosen to mimic the sample design procedures for the total sample and weighted appropriately. From the SESTAT component surveys, the observations within each stratum were randomized, and separate random group samples were systematically selected without replacement:

NSCG - Sampled cases were assigned to random groups within each sampling stratum.
SDR - Respondent cases were assigned to random groups within each sampling stratum.
NSRCG - Because of the two-step sample design of the NSRCG, two sets of random groups were selected:
1. Responding students from certainty institutions were assigned to random groups within each sampling stratum.
2. Noncertainty institutions were assigned to random groups. All sampled students from the institution were then assigned to that institution's random group.

The sets of random groups for each survey were combined to create SESTAT random groups, with each group representing a valid sample of the combined SESTAT target population.
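The general idea can be sketched in Python. Both functions below are illustrative assumptions rather than the SESTAT implementation: `assign_random_groups` shuffles cases within each stratum and then deals them systematically into k groups, and `random_group_variance` applies the textbook random group estimator, in which each group yields its own estimate of the population quantity and the variance of their mean is estimated from their dispersion. The source does not state the exact formula or the number of groups used.

```python
import random
from collections import defaultdict
from statistics import mean


def assign_random_groups(stratum_of, k, seed=0):
    """Map each case id to one of k random groups, stratum by stratum.

    stratum_of -- dict of case id -> stratum label
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for case_id, stratum in stratum_of.items():
        by_stratum[stratum].append(case_id)
    groups = {}
    for cases in by_stratum.values():
        rng.shuffle(cases)                      # randomize within the stratum
        for position, case_id in enumerate(cases):
            groups[case_id] = position % k      # systematic assignment
    return groups


def random_group_variance(group_estimates):
    """Textbook random group variance estimate for the mean of k estimates."""
    k = len(group_estimates)
    if k < 2:
        raise ValueError("need at least two random groups")
    center = mean(group_estimates)
    return sum((e - center) ** 2 for e in group_estimates) / (k * (k - 1))


# Hypothetical group-level estimates, for illustration only.
print(random_group_variance([100.0, 104.0, 96.0, 102.0, 98.0]))  # 2.0
```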

Example: Using the Look-up Table

Assume the estimate of the number of scientists and engineers employed in S&E occupations in 1993 was approximately 8 million people. The total column in the look-up Table A-1 shows a standard error estimate of 57,450 associated with an estimated count of 8 million.

The half-width of the 95% confidence interval is 1.96 (the factor for a 95% confidence interval) times the standard error from the table (57,450): 1.96 x 57,450 = 112,602.

Thus, the 95% confidence interval for the true value is the interval between 7,887,398 and 8,112,602 (8,000,000 +/- 112,602).
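The arithmetic above can be checked with a few lines of Python. The helper name `confidence_interval` is hypothetical; the inputs are the estimate and standard error quoted in this example.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) bounds; z = 1.96 gives a 95% interval."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


low, high = confidence_interval(8_000_000, 57_450)
print(round(low), round(high))  # 7887398 8112602
```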

Example: Using the Parameter Tables for Totals

Suppose SESTAT data are used to estimate the total number of individuals employed in science or engineering occupations. Because the domain for this population is the total science and engineering population, we look in Table B-1 and find the values $\beta_0$ = 0.00003 and $\beta_1$ = 176.69490. We estimate the standard error as:

$$ s_{\hat{Y}} = \sqrt{0.00003\,\hat{Y}^2 + 176.69490\,\hat{Y}} $$

Substituting $\hat{Y}$ = 8,311,787, we obtain $s_{\hat{Y}}$ = 59,508. Thus, a 95% confidence interval for the true total number of individuals employed in science or engineering occupations would be 8,311,787 +/- 116,636, where 116,636 is 1.96 times the standard error.

Example: Using the Parameter Tables for Percentages

To illustrate the use of the formula for determining standard errors for percentages, suppose that we use SESTAT estimates to determine that 82 percent of female scientists and engineers in 1993 were participating in the labor force as of the reference week. The base for this percentage is the number of female scientists and engineers, estimated at 3,867,887. Obtaining the value $\beta_1$ = 270.32836 from the appropriate parameter table, we calculate:

$$ s_p = \sqrt{\frac{270.32836}{y}\, p\,(100 - p)} $$

Substituting the base $y$ = 3,867,887 and the percentage $p$ = 82, we obtain an estimated standard error of 0.32 percentage points. The 95% confidence interval for the labor force participation rate for females is 82% +/- 0.63% (where 0.63 equals 1.96 times the estimated standard error).

Basic Steps to Approximating Standard Errors

The following steps may be followed to approximate the standard error of an estimated total or percentage:

Determine the appropriate survey and domain for a characteristic of interest;
Obtain the estimated total or percentage from the survey;
Determine the most appropriate domain for the estimate from the parameter table(s);
Refer to the parameter table to get the parameter estimates of $\beta_0$ and $\beta_1$ for this domain; and
Compute the approximate standard error using the equations provided.

Updated: October 17, 2001
