Statistical Policy Working Paper 20

Seminar on Quality of Federal Data
Part 2 of 3

Federal Committee on Statistical Methodology
Statistical Policy Office
Office of Information and Regulatory Affairs
Office of Management and Budget

March 1991


MEMBERS OF THE FEDERAL COMMITTEE ON STATISTICAL METHODOLOGY
(February 1991)

    Maria E. Gonzalez, Chair, Office of Management and Budget
    Yvonne M. Bishop, Energy Information Administration
    Warren L. Buckler, Social Security Administration
    Charles E. Caudill, National Agricultural Statistics Service
    Cynthia Z.F. Clark, National Agricultural Statistics Service
    Zahava D. Doering, Smithsonian Institution
    Robert M. Groves, Bureau of the Census
    Roger A. Herriot, National Center for Education Statistics
    C. Terry Ireland, National Computer Security Center
    Charles D. Jones, Bureau of the Census
    Daniel Kasprzyk, Bureau of the Census
    Daniel Melnick, National Science Foundation
    Robert P. Parker, Bureau of Economic Analysis
    David A. Pierce, Federal Reserve Board
    Thomas J. Plewes, Bureau of Labor Statistics
    Wesley L. Schaible, Bureau of Labor Statistics
    Fritz J. Scheuren, Internal Revenue Service
    Monroe G. Sirken, National Center for Health Statistics
    Robert D. Tortora, Bureau of the Census


PREFACE

In 1975, the Office of Management and Budget (OMB) organized the Federal Committee on Statistical Methodology. Comprised of individuals selected by OMB for their expertise and interest in statistical methods, the Committee has, during the past 15 years, determined areas that merit investigation and discussion and overseen the work of subcommittees organized to study particular issues. Since 1978, 19 Statistical Policy Working Papers have been published under the auspices of the Committee.

On May 23-24, 1990, the Council of Professional Associations on Federal Statistics (COPAFS) hosted a "Seminar on the Quality of Federal Data." Developed to capitalize on work undertaken during the past dozen years by the Federal Committee on Statistical Methodology and its subcommittees, the seminar focused on a variety of topics that have been explored thus far in the Statistical Policy Working Paper series. The subjects covered at the seminar included:

    Survey Quality Profiles
    Paradigm Shifts Using Administrative Records
    Survey Coverage Evaluation
    Telephone Data Collection
    Data Editing
    Computer Assisted Statistical Surveys
    Quality in Business Surveys
    Cognitive Laboratories
    Employer Reporting Unit Match Study
    Approaches to Developing Questionnaires
    Statistical Disclosure-Avoidance
    Federal Longitudinal Surveys

Each of these topics was presented in a two-hour session that featured formal papers and discussion, followed by informal dialogue among all speakers and attendees.

Statistical Policy Working Paper 20, published in three parts, presents the proceedings of the "Seminar on the Quality of Federal Data." In addition to providing the papers and formal discussions from each of the twelve sessions, this working paper includes Robert M. Groves' keynote address, "Towards Quality in a Working Paper Series on Quality," and comments by Stephen E. Fienberg, Margaret E. Martin, and Hermann Habermann at the closing session, "Towards an Agenda for the Future."

We are indebted to all of our colleagues who assisted in organizing the seminar, and to the many individuals who not only presented papers and discussions but also prepared these materials for publication. A special thanks is due to Terry Ireland and his staff for their work in assembling this working paper.
Table of Contents

Wednesday, May 23, 1990

Part 1

KEYNOTE ADDRESS
TOWARDS QUALITY IN A WORKING PAPER SERIES ON QUALITY . . . 3
    Robert M. Groves, The University of Michigan and U.S. Bureau of the Census

Session 1 - SURVEY QUALITY PROFILES
THE SIPP QUALITY PROFILE . . . 19
    Thomas B. Jabine, Statistical Consultant
INITIAL REPORT ON THE QUALITY OF AGRICULTURAL SURVEY PROGRAM . . . 29
    George A. Hanuschak, National Agricultural Statistics Service
DISCUSSION . . . 40
    Barbara A. Bailar, American Statistical Association
DISCUSSION . . . 46
    Nancy A. Mathiowetz, U.S. Bureau of the Census

Session 2 - PARADIGM SHIFTS USING ADMINISTRATIVE RECORDS
PARADIGM SHIFTS: ADMINISTRATIVE RECORDS AND CENSUS-TAKING . . . 53
    Fritz Scheuren, Internal Revenue Service
AN ADMINISTRATIVE RECORD PARADIGM: A CANADIAN EXPERIENCE . . . 66
    John Leyes, Statistics Canada
DISCUSSION . . . 77
    Gerald Gates, U.S. Bureau of the Census
DISCUSSION . . . 83
    Edward J. Spar, Market Statistics

Session 3 - SURVEY COVERAGE EVALUATION
CONTROL, MEASUREMENT, AND IMPROVEMENT OF SURVEY COVERAGE . . . 87
    Gary M. Shapiro, U.S. Bureau of the Census; Raymond R. Bosecker, National Agricultural Statistics Service
QUALITY OF SURVEY FRAMES . . . 100
    Judith T. Lessler, Research Triangle Institute
DISCUSSION . . . 108
    Fritz Scheuren, Internal Revenue Service
DISCUSSION . . . 114
    Joseph Waksberg, Westat, Inc.

Session 4 - TELEPHONE DATA COLLECTION
QUALITY IMPROVEMENT IN TELEPHONE SURVEYS . . . 123
    Leyla Mohadjer, David Morganstein, Westat, Inc.
COMPUTER ASSISTED SURVEY TECHNOLOGIES IN GOVERNMENT: AN OVERVIEW . . . 137
    Marc Tosiano, National Agricultural Statistics Service
DISCUSSION . . . 155
    William L. Nicholls II, U.S. Bureau of the Census
DISCUSSION . . . 161
    James T. Massey, National Center for Health Statistics

Part 2

Session 5 - DATA EDITING
OVERVIEW OF DATA EDITING IN FEDERAL STATISTICAL AGENCIES . . . 167
    David A. Pierce, Federal Reserve Board
EDITING SOFTWARE (An excerpt from Chapter IV of Working Paper 18) . . . 173
    Mark Pierzchala, National Agricultural Statistics Service
RESEARCH ON EDITING . . . 180
    Yahia Ahmed, Internal Revenue Service
DISCUSSION . . . 184
    Charles E. Caudill, National Agricultural Statistics Service
DISCUSSION . . . 186
    Richard Bolstein, George Mason University

Session 6 - COMPUTER ASSISTED STATISTICAL SURVEYS
OVERVIEW OF COMPUTER ASSISTED SURVEY INFORMATION COLLECTION . . . 191
    Richard L. Clayton, U.S. Bureau of Labor Statistics
A COMPARISON BETWEEN CATI AND CAPI . . . 197
    Martin Baum, National Center for Health Statistics
COMPUTER ASSISTED SELF INTERVIEWING . . . 202
    Ralph Gillmann, Energy Information Administration
COMPUTER ASSISTED SELF INTERVIEWING: RIGS AND PEDRO, TWO EXAMPLES . . . 205
    Ann M. Ducca, Energy Information Administration
DATA COLLECTION . . . 209
    Cathy Mazur, National Agricultural Statistics Service
DISCUSSION . . . 212
    Robert N. Tinari, U.S. Bureau of the Census
DISCUSSION . . . 216
    David Morganstein, Westat, Inc.

Thursday, May 24, 1990

Session 7 - QUALITY IN BUSINESS SURVEYS
IMPROVING ESTABLISHMENT SURVEYS AT THE BUREAU OF LABOR STATISTICS . . . 221
    Brian MacDonald, Alan R. Tupek, U.S. Bureau of Labor Statistics
A REVIEW OF NONSAMPLING ERRORS IN FEDERAL ESTABLISHMENT SURVEYS WITH SOME AGRIBUSINESS EXAMPLES . . . 232
    Ron Fecso, National Agricultural Statistics Service
DISCUSSION . . . 243
    David A. Binder, Statistics Canada
DISCUSSION . . . 247
    Charles D. Cowan, Opinion Research Corporation

Session 8 - COGNITIVE LABORATORIES
THE BUREAU OF LABOR STATISTICS' COLLECTION PROCEDURES RESEARCH LABORATORY: ACCOMPLISHMENTS AND FUTURE DIRECTIONS . . . 253
    Cathryn S. Dippo, Douglas Herrmann, U.S. Bureau of Labor Statistics
THE ROLE OF A COGNITIVE LABORATORY IN A STATISTICAL AGENCY . . . 268
    Monroe G. Sirken, National Center for Health Statistics
DISCUSSION . . . 278
    Elizabeth Martin, U.S. Bureau of the Census
DISCUSSION . . . 281
    Murray Aborn, National Science Foundation (retired)

Part 3

Session 9 - EMPLOYER REPORTING UNIT MATCH STUDY
INTERAGENCY AGREEMENTS FOR MICRODATA ACCESS: THE ERUMS EXPERIENCE . . . 291
    Thomas B. Petska, Internal Revenue Service; Lois Alexander, Social Security Administration
SAMPLE SELECTION AND MATCHING PROCEDURES USED IN ERUMS . . . 301
    John Pinkos, Kenneth LeVasseur, Marlene Einstein, U.S. Bureau of Labor Statistics; Joel Packman, Social Security Administration
RESULTS, FINDINGS, AND RECOMMENDATIONS OF THE ERUMS PROJECT . . . 309
    Vern Renshaw, Bureau of Economic Analysis; Tom Jabine, Statistical Consultant
DISCUSSION . . . 318
    W. Joel Richardson, Charles A. Waite, U.S. Bureau of the Census
DISCUSSION . . . 324
    Thomas J. Plewes, U.S. Bureau of Labor Statistics

Session 10 - APPROACHES TO DEVELOPING QUESTIONNAIRES
TOOLS FOR USE IN DEVELOPING QUESTIONS AND TESTING QUESTIONNAIRES . . . 331
    Theresa J. DeMaio, U.S. Bureau of the Census
TECHNIQUES FOR EVALUATING THE QUESTIONNAIRE DRAFT . . . 340
    Deborah H. Bercini, National Center for Health Statistics
DESIGNING QUESTIONNAIRES FOR CATI IN A MIXED MODE ENVIRONMENT . . . 349
    Gemma Furno, U.S. Bureau of the Census
DISCUSSION . . . 360
    Carol C. House, National Agricultural Statistics Service

Session 11 - STATISTICAL DISCLOSURE-AVOIDANCE
DISCLOSURE AVOIDANCE PRACTICES AT THE CENSUS BUREAU . . . 367
    Brian Greenberg, U.S. Bureau of the Census
THE MICRODATA RELEASE PROGRAM OF THE NATIONAL CENTER FOR HEALTH STATISTICS . . . 377
    Robert H. Mugge, National Center for Health Statistics (retired)
DISCUSSION . . . 385
    George Duncan, Carnegie Mellon University

Session 12 - FEDERAL LONGITUDINAL SURVEYS
FEDERAL LONGITUDINAL SURVEYS . . . 393
    Daniel Kasprzyk, U.S. Bureau of the Census; Curtis Jacobs, U.S. Bureau of Labor Statistics
THE ADVANTAGES AND DISADVANTAGES OF LONGITUDINAL SURVEYS . . . 407
    Robert W. Pearson, Social Science Research Council
LONGITUDINAL ANALYSIS OF FEDERAL SURVEY DATA . . . 425
    Patricia Ruggles, Joint Economic Committee
DISCUSSION . . . 438
    Michael Brick, Westat, Inc.
DISCUSSION . . . 447
    Marilyn E. Manser, U.S. Bureau of Labor Statistics

TOWARDS AN AGENDA FOR THE FUTURE
    Stephen E. Fienberg, Carnegie Mellon University . . . 455
    Margaret E. Martin . . . 462
    Hermann Habermann, Office of Management and Budget . . . 465


Part 2

Session 5
DATA EDITING


OVERVIEW OF DATA EDITING IN FEDERAL STATISTICAL AGENCIES

David A. Pierce
Federal Reserve Board

Abstract

This paper is the first of three in the session on Data Editing presenting highlights of the report "Data Editing in Federal Statistical Agencies," Statistical Policy Working Paper 18, OMB, prepared by the Subcommittee on Data Editing in Federal Statistical Agencies, FCSM. Included in this paper are a listing of the Subcommittee members, a discussion of its mission statement from the FCSM, the definition and concepts of data editing, the major areas investigated and the methods used to do so, the development of case studies, and the Subcommittee's recommendations for data editing in Federal statistical agencies. The paper highlights the findings from a survey of current data editing practices which was conducted by the Subcommittee.

1. Introduction

The Subcommittee on Data Editing in Federal Statistical Agencies was established by the Federal Committee on Statistical Methodology (FCSM) in November 1988 to document, profile, and discuss the topic of data editing in Federal censuses and surveys. The Subcommittee consisted of the following individuals:

    George Hanuschak, National Agricultural Statistics Service, Chair
    Yahia Ahmed, Internal Revenue Service
    Laura Bauer, Federal Reserve Board
    Charles Day, Internal Revenue Service
    Maria Gonzalez, Office of Management and Budget
    Brian Greenberg, Bureau of the Census
    Anne Hafner, National Center for Education Statistics
    Gerry Hendershot, National Center for Health Statistics
    Rita Hohenbrink, National Agricultural Statistics Service
    Renee Miller, Energy Information Administration
    Tom Petkunas, Bureau of the Census
    David Pierce, Federal Reserve Board
    Mark Pierzchala, National Agricultural Statistics Service
    Marybeth Tschetter, Bureau of Labor Statistics
    Paula Weir, Energy Information Administration

A key aim of this effort was to further the awareness within agencies of each other's data editing practices, as well as of the state of the art of data editing, and thus to promote improvements in data quality throughout Federal statistical agencies. To further these goals, the Subcommittee was given a "charge," or mission statement, of determining how data editing is currently being done in Federal agencies, recognizing areas that may need attention, and, if appropriate, recommending any potential improvements for the editing process.

Among the many items investigated by the Subcommittee were the role of subject matter specialists; hardware, software, and the data base environment; new technologies of data collection and editing, such as CATI and CAPI; current research efforts in the various agencies; and some recently developed editing systems, such as those at the Census Bureau and Statistics Canada.
In fulfilling its mission the Subcommittee followed a number of paths, including developing a questionnaire on survey editing practices, assembling several case studies of editing practices, investigating alternative editing systems and software, exploring research needs and practices, and compiling an annotated bibliography of the literature on editing. The result of the Subcommittee's work is its report (1990), organized into five main chapters with several supporting appendices, as follows:

    Chapters                              Appendices
    I.   Executive Summary                A. Questionnaire Responses
    II.  Background                       B. Case Studies
    III. Current Editing Practices        C. Software Functions Checklist
    IV.  Editing Software                 D. Annotated Bibliography
    V.   Research on Editing              E. Glossary of Terms

After discussing some general topics pertaining to editing and to the Subcommittee's work, this paper summarizes some of the main results of a questionnaire on current editing practices designed, administered, and compiled by the Subcommittee. The two papers immediately following address, respectively, the subjects of software developments and recent research findings in editing.

2. Data Editing--Definition and Concepts

The Subcommittee first addressed the definition of data editing. While no universal definition of survey data editing exists, the following working definition was developed:

    Procedures designed and used for detecting erroneous and/or questionable survey data, with the goal of correcting (manually or electronically) as much of the erroneous data as possible (not necessarily all of the questioned data), usually prior to data imputation and summary procedures.

Thus data editing can be seen as a data quality improvement tool by which erroneous or highly suspect data are found and (if necessary) corrected. We have focused primarily on editing rather than imputation in our work, though in practice the boundary between these is not absolute.

3. Current Editing Practices

To obtain a profile of current editing practices in the various Federal statistical agencies, the Subcommittee developed an editing questionnaire, which was completed for 117 Federal censuses and surveys representing 14 different Federal agencies. These 117 surveys were selected by Subcommittee members, and thus they were not a scientific sample of all Federal surveys; however, the Subcommittee felt that the 117 surveys represented a broad coverage of agencies and types of surveys or censuses that would present different editing situations. The Subcommittee members primarily involved with the questionnaire and editing profile were Charles Day, Yahia Ahmed, George Hanuschak, Rita Hohenbrink, and Renee Miller.

The questionnaire was a six-page document containing general questions about the particular survey as well as specific questions on editing. The report contains a complete listing of the questions asked, along with a tally of the results obtained for the 117 surveys, and should serve as a useful reference for the current (1990) state of data editing practice. A few of the major results follow.

Regarding general characteristics of the surveys, about three-fourths of the surveys are sample surveys, and the remaining one-fourth are censuses. A wide range of collection frequencies is represented, from daily to quinquennial. About one-fourth are completed by individuals, and three-fourths by establishments.
While traditional means of data collection such as mail, personal, and telephone interviews were most common, a small proportion of the surveys used CATI, and some were administrative records.

Turning to editing, while the idea that there is no such thing as a free lunch seems to be as true of data editing as it is of anything else, there was wide variation in the actual cost of editing as a percent of total survey cost. The median editing cost for the surveys was more than one-third of the total cost of the survey. One of the interesting findings was that surveys of individuals had lower relative editing costs than surveys of establishments.

The questionnaire also elicited information on when in the survey process the editing occurs. For about two-thirds of the 117 surveys, most of the data editing takes place after data entry. Editing at the time of data entry is on the increase but not yet common.

Subject matter analysts play a large and important role in data editing. In about three-fourths of the surveys, subject matter analysts review all unusual or large cases. Only seven of the surveys had little or no intervention by subject-matter specialists. In this regard, we found that surveys of establishments had heavier involvement from subject-matter specialists than surveys of individuals; this could also be related to the finding, mentioned above, of lower editing costs in individual than in establishment surveys.

The degree of automation in data editing varies considerably among the surveys in our study. In about three-fifths of the surveys, automated edit checking is done, but error correction is performed by clerks or analysts. In about one-third of the cases, only unusual situations are referred to analysts. Only 3% of the surveys were totally automated, though all but 1% had at least some automation.

There are different types of edits that are applied to surveys. Almost all the surveys in our study use validation editing, which detects inconsistent data within a record. About five-sixths also use macro editing, where aggregated data are examined. The majority of surveys use other types of edits as well, such as range edits, edits using historical data, and ratio edits, some of which may overlap. Additional information is also utilized in editing many of the surveys, such as comparisons with other surveys, comparison to a value estimated by regression analysis, or the use of interquartile measures.

Satisfaction with the current editing system varied widely. About half the respondents were satisfied with their current editing systems, and another one-fourth felt only minor changes were needed. The remaining one-fourth thought major changes were desired, with 5% of those being in favor of a complete overhaul. Among those desiring improvements, the ones most frequently mentioned were: an on-line system for data editing, the use of prior periods' data to test the current period, more statistical edits, more sophisticated validation and macro editing, an audit trail, more automation (particularly automated error correction), user-friendlier systems, incorporation of imputation into the editing package, evaluation of the effects of data editing, reduction of the number of edit flags to follow up, incorporation of information on auxiliary variables, greater use of expert systems, and multivariate editing.

An audit trail, or a complete record of the original and corrected data, the edits failed, and any other relevant information, is very helpful in monitoring and improving the editing process.
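To make the edit types discussed above concrete, the following sketch shows how a few record-level checks (a range edit, a ratio edit, and a simple validation edit) and one macro edit against a prior-period aggregate might be expressed. It is only an illustration of the ideas in this overview, not code from Working Paper 18 or from any agency system; the field names, bounds, and tolerance are hypothetical.

    # Illustrative record-level and macro edits; field names and limits are
    # hypothetical, not taken from any actual Federal survey.

    def range_edit(record, field, low, high):
        """Flag a value falling outside prescribed bounds."""
        value = record[field]
        return [] if low <= value <= high else [f"{field} out of range [{low}, {high}]: {value}"]

    def ratio_edit(record, numerator, denominator, low, high):
        """Flag a ratio of two fields falling outside prescribed bounds."""
        ratio = record[numerator] / record[denominator]
        return [] if low <= ratio <= high else [f"{numerator}/{denominator} out of range: {ratio:.2f}"]

    def validation_edit(record):
        """Flag internally inconsistent data within a record."""
        parts = record["full_time"] + record["part_time"]
        return [] if parts == record["total_employment"] else ["employment components do not sum to total"]

    def macro_edit(current_total, prior_total, tolerance=0.15):
        """Flag an aggregate that moved more than the tolerance from the prior period."""
        change = abs(current_total - prior_total) / prior_total
        return [] if change <= tolerance else [f"aggregate changed {change:.1%} from prior period"]

    record = {"total_employment": 120, "full_time": 90, "part_time": 25, "payroll": 4_800_000}
    flags = (range_edit(record, "total_employment", 1, 10_000)
             + ratio_edit(record, "payroll", "total_employment", 5_000, 150_000)
             + validation_edit(record)
             + macro_edit(current_total=1_540_000, prior_total=1_300_000))
    print(flags)

In practice such checks would be generated from a survey's edit specifications rather than written by hand for each survey, which is part of the motivation for the generalized editing systems described in the next paper.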
The importance of an evaluation of the effects of editing on the data, and our current lack of knowledge of such effects, have also been noted by Bailar (1990).

4. Case Studies

In addition to the breadth of valuable information obtained from the questionnaire, the Subcommittee also felt that an examination of a relatively few surveys in greater depth would shed light on the complexity of the different editing situations in operation. Therefore several case studies are described, some in two-paragraph summary format and others in greater detail. These comprise Appendix B of the report. Anne Hafner and Yahia Ahmed had primary responsibility for preparation of the case studies.

5. Recommendations

The report lists a number of recommendations for future data editing practice, some general and some specific. Many of them fall into the following general categories.

The quality of an agency's existing editing practices and technology should be examined in the light of possible improvements or alternatives, with respect to such criteria as cost efficiency, timeliness, statistical defensibility, and accuracy.

Important recent developments in data processing, such as new microcomputers, workstations, local area networks, data base software, and mainframe linkages, should be examined for their possible incorporation into the survey editing process.

Agencies should stay in communication with each other and with other professionals regarding their research in editing, particularly the development and implementation of new editing procedures and related methodologies such as data base technologies and expert systems.

References

Bailar, Barbara (1990), "Discussion of 'Survey Quality Profiles'," Seminar on the Quality of Federal Data, May 1990, COPAFS. This Proceedings.

Groves, Robert (1990), "Towards Quality in a Working Paper Series on Quality," Keynote Address, Seminar on the Quality of Federal Data, May 1990, COPAFS. This Proceedings.

Hanuschak, George, Yahia Ahmed, Laura Bauer, Charles Day, Maria Gonzalez, Brian Greenberg, Anne Hafner, Gerry Hendershot, Rita Hohenbrink, Renee Miller, Tom Petkunas, David Pierce, Mark Pierzchala, Marybeth Tschetter, and Paula Weir (1990), Data Editing in Federal Statistical Agencies, Statistical Policy Working Paper 18, Statistical Policy Office, Office of Management and Budget, Washington, DC.


EDITING SOFTWARE
(An excerpt from Chapter IV of Working Paper 18)

Mark Pierzchala
National Agricultural Statistics Service

A. Introduction

For most surveys, large parts of the editing process are carried out through the use of computer systems. The task of the Software Subgroup has been to investigate software that in some way incorporates new methodologies, has new ways of presenting data, operates in recently developed hardware environments, or integrates editing with other functions. In order to fulfill this charge, the Subgroup has evaluated or been given demonstrations of new editing software. In addition, the Subgroup has developed an editing software evaluation checklist that appears in Appendix C of Statistical Policy Working Paper 18. This checklist contains possible functions and attributes of editing software and would be useful to an organization evaluating editing software.

Highly technical jargon can be associated with new editing systems, and new approaches to editing may not be familiar to the reader.
The purpose of section B is to explain these approaches and their associated terminology, as well as to discuss briefly the role of editing in assuring data quality.

A distinction must be made between generalized systems and software meant for one or a few surveys. The former is meant to be used for a variety of surveys. Usually there is an institutional commitment to spend staff time and money over several years to develop the system. It is hoped that the investment will be more than recaptured after the system is developed, through the reduction in resources spent on editing itself and the elimination of duplication of effort in preparing editing programs. Some software programs have been developed that address specific problems in a particular survey. While the ideas inherent in this software may be of general interest, it may not be possible to apply the software directly to other surveys. Section C of Chapter IV of Working Paper 18 describes three generalized systems in some detail, and then briefly describes other systems and software. These three systems have been used or evaluated by Subgroup members in their own surveys.

New and exciting statistical methodology is also improving the editing process. This includes developments in detecting outliers, aggregate-level data editing, imputation strategy, and statistical quality control of the process itself. The implementation of these activities, however, requires that the techniques be encoded into a computer program or system.

B. Software Improving Quality and Productivity

Reasons for the Development of New Editing Software

Traditional editing systems do not fully utilize the talents or expertise of subject matter specialists. Much of their time may be spent in dealing with unimportant or spurious error signals and in coping with system shortcomings. As a result, the specialist has less time to deal with important problems. In addition, editing systems may be able to give feedback on the survey itself. For example, a pattern of edit failures may suggest misunderstandings by the respondent or interviewer. If this is recognized, the expertise of the specialist may then be used to improve the survey itself.

Labor costs are a large part of the editing costs and are either steady or increasing, whereas the cost of computing is decreasing. In order to justify the heavy reliance on people in editing, their productivity will have to be improved through the use of more powerful tools. However, even if productivity is improved, different people may do different things in similar situations. If so, this makes the process less repeatable (reproducible) and more subject to criticism. When work is done on paper, it is hard to track, and it is impossible to estimate the effect of editing actions on estimates. Finally, some tasks are beyond the capability of human editors. For example, it may be impossible for a person to maintain the multivariate frequency structure of the data when making changes.

These reasons and several others are commonly given as explanations for the increased use of computer software to improve the editing process. It is in the reconciliation of these two goals (the increased use of computers for some tasks and the more intelligent use of human expertise) that the major challenge in software development lies. There will always be a role for people, but it will be modified.
One positive feature of new editing software is that it can often improve the quality of the editing process and productivity at the same time.

Ways That Productivity Can Be Improved

One way to improve productivity is to break the constraints imposed by computer systems themselves. The use of mainframe systems for editing data is widespread. In some cases, however, an editor may not use the system directly. For example, error signals may be presented on paper printouts, and changes entered by data typists. Processing costs may dictate that editing jobs are run at low priority, overnight, or even less frequently. The effect of the changes made by the editor may not be immediately known; thus, paper forms may be filed, taken from files, and re-filed several times.

The proliferation of microcomputers promises to eliminate many of these bottlenecks, while at the same time it creates some challenges in the process. The editor will have direct access to the computer and will be able to prioritize its use. Once the microcomputer is acquired, user fees are eliminated; thus resource-intensive programs such as interactive editing can be employed, provided the microcomputers are fast enough. Moving from a centralized environment (i.e., the mainframe) to a decentralized environment (i.e., microcomputers) will present challenges of control and consistency. In processing a large survey on two or more microcomputers, communications will be necessary. This will best be done by connecting them into a Local Area Network (LAN).

New systems may reduce or eliminate some editing tasks. For example, where data are edited in batch and error signals are presented on printouts, a manual edit of the questionnaires before the machine edit may be a practical necessity. Editing data and error messages on a printout can be a hard, unsatisfactory chore because of the volume of paper and the static and sometimes incomplete presentation of data. The purpose of the manual edit in this situation is to reduce the number of machine-generated error signals. In an interactive environment, information can be efficiently presented and immediately processed. The penalty associated with machine-generated signals is greatly reduced. As a result, the preliminary manual edit may be eliminated. In addition, questionnaires are handled only once, further reducing filing and data entry tasks.

Productivity may be increased by reducing the need for editing after data are collected. Instruments for Computer Assisted Telephone Interviewing (CATI), Computer Assisted Personal Interviewing (CAPI), and on-site data entry and editing programs are gaining wider use. Routing instructions are automatically followed, and other edit failures are verified at the time of the interview. There may still be many error signals from suspicious edits; however, the analyst has more confidence in the data and is more likely to let them pass.

There are two major ways that productivity can be improved in the programming of the editing instruments. First is to provide a system that will handle all, or an important class, of the agency's editing needs. In this way the applications programmer need not worry about systems details. For example, in an interactive system, the programmer does not have to worry about how and where to flag edit failures, since that capability is already provided. The programmer only codes the edit specification itself. In addition, the end-user has to learn only one system when editing different surveys.
Second is the elimination of multiple specification and programming of variables and edits. For example, if data are collected by CATI and edited with another system, then essentially the same edits will be programmed twice, possibly by two sets of people. If the system integrates several functions, e.g., data entry, data editing, and computer assisted data collection, then one program may be able to handle all of these tasks. This integration would also reduce time spent on data conversion from one system to another.

Systems That Take Editing and Imputation Actions

Some edit and imputation systems take actions usually reserved for people. They choose fields to be changed and then change them. The human element is not removed; rather, this expertise is incorporated into the system.

One way to incorporate expertise is to use the edits themselves to define a feasible region. This is the approach outlined in a famous article by Fellegi and Holt (1976). Edits that are explicitly written are used to generate implied edits. For example, if 100 < x/y < 200 and 3 < y/z < 4 are explicit edits, then an implied edit obtained algebraically is 300 < x/z < 800. Once all implied edits are generated, the complete set of edits is defined as the union of the explicit and implied edits. This complete set of edits is then used to determine a set of fields to be changed for every possible edit failure. This is called error localization. An essential aspect of this method is that changes are made to as few fields as possible, or alternatively to the least reliable set of fields, which are determined by weights given to each field.

The analyst is given an opportunity to evaluate the explicit edits. This is done through the inspection of the implied edits and extremal records (the most extreme records that can pass through the edits without causing an edit failure). In inspecting the implied edits, it may be determined whether the data are being constrained in an unintended way. In inspecting extremal records, the analyst is presented with combinations of the most extreme values possible that can pass the edits. The human editor has several ways to inject expertise into this kind of a system: (1) the specification of the edits; (2) the inspection of implied edits and extremal records and then the re-specification of edits; (3) the weighting of variables according to their relative reliability.

There are some constraints in systems that allow the computer to take editing actions. Fellegi and Holt systems cannot handle certain kinds of edits, notably nonlinear and conditional edits. Also, algorithms that can handle categorical data cannot handle continuous data, and vice versa. Within these constraints (and others), most edits can be handled. For surveys with continuous data, a considerable amount of human attention may still be necessary, either before the system is applied to data or after.

Another way that computers can take editing actions is by modeling human behavior. This is the "expert system" approach. For example, if maize yields typically average 100 bushels per acre, and the value 1,000 is entered, then the most likely correction is to assume that an extra zero was typed. The computer can be programmed to substitute 100 for 1,000 directly and then to re-edit the data.
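The algebra behind implied edits, and the digit-slip correction just described, can be sketched in a few lines of code. The sketch below only illustrates these two ideas; it is not an implementation of the Fellegi-Holt methodology or of any system discussed in Working Paper 18, and the bounds, record values, and plausible range are hypothetical.

    # Two explicit ratio edits and the implied edit they generate algebraically,
    # plus a simple "extra zero" correction of the expert-system kind.
    # All values are hypothetical.

    EXPLICIT_EDITS = {
        ("x", "y"): (100.0, 200.0),   # explicit edit: 100 < x/y < 200
        ("y", "z"): (3.0, 4.0),       # explicit edit: 3 < y/z < 4
    }

    def implied_edit(xy_bounds, yz_bounds):
        """Multiply the bounds on x/y and y/z to obtain bounds on x/z."""
        (lo1, hi1), (lo2, hi2) = xy_bounds, yz_bounds
        return (lo1 * lo2, hi1 * hi2)            # here: 300 < x/z < 800

    def failed_edits(record, edits):
        """Return the ratio edits violated by the record."""
        return [(num, den) for (num, den), (lo, hi) in edits.items()
                if not lo < record[num] / record[den] < hi]

    all_edits = dict(EXPLICIT_EDITS)
    all_edits[("x", "z")] = implied_edit(EXPLICIT_EDITS[("x", "y")],
                                         EXPLICIT_EDITS[("y", "z")])

    # Only the edits involving y fail here, which suggests y as the single field
    # to change; a full Fellegi-Holt system derives such a minimal change set
    # from the complete set of edits rather than by inspection.
    print(failed_edits({"x": 600, "y": 5, "z": 1}, all_edits))   # [('y', 'z')]

    def digit_slip_correction(value, plausible_low, plausible_high, factor=10):
        """If a keyed value is implausible but becomes plausible when an extra
        zero is removed, apply that correction (the maize-yield example)."""
        if not plausible_low <= value <= plausible_high and \
                plausible_low <= value / factor <= plausible_high:
            return value / factor
        return value

    print(digit_slip_correction(1000, 60, 160))   # 100.0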
Ways That Data Quality Can Be Improved or Maintained

It is not clear that editing done after data collection can always improve the quality of data by reducing non-sampling errors. An organization may not have the time or budget to recontact many of the respondents, or may refrain from recontacts in order to reduce respondent burden. Additionally, there may be cognitive errors or systematic errors that an edit system cannot detect. Often, all that can be done is to maintain the quality of the data as they are collected. To use the maize yield example again, if the edit program detects 1,000 bushels per acre and sets the value to 100 bushels per acre, then the edit program has only prevented the data from getting worse. Suppose the true value was really 103 bushels per acre. The edit and imputation program could not get the value closer to the truth in this case. Detecting outliers is usually not the only problem. The proper action to take after detection is the more difficult problem. One of the main reasons that computer assisted data collection is employed is that data are corrected at the time of collection.

There are a few ways that an editing system may be able to improve data quality. A system that captures raw data, keeps track of changes, and provides well-conceived reports may provide feedback on the performance of the survey. This information can be used to improve the survey in the future. To take another agricultural example, farmers often harvest corn for silage (the whole plant is harvested, chopped into small pieces, and blown into a silo). Production of silage is requested in tons. Farmers often do not know their silage production in tons. Instead, the farmer will give the size (diameter and height) of all silos containing silage. In the office, silo sizes are converted into tons of production. If this conversion takes place before data are entered, then there is no indication from the machine edit of the extent of this reporting problem.

Another way that editing software can improve the quality of the data is to reduce the opportunity cost of editing. The time spent on editing leaves less time for other tasks, such as persuading people to participate, checking overlap of respondents between multiple frames, and research on cognitive errors.

Ways That Quality of the Editing Process Can Be Defended or Confirmed

There is a difference between data quality and the quality of the editing process itself. To refer once again to the maize yield example, a good quality process will have detected the transcription error. A poor quality process might have let it pass. Although neither process will have improved data quality, the good quality process would have prevented their deterioration from the transcription error.

Editing and imputation have the potential to distort data as well as to maintain their quality. This distortion may affect the levels of estimates and the univariate and multivariate distributions. A high quality process will attempt to minimize distortions. For example, in Fellegi and Holt systems, changes to the data will be made to the fewest fields possible and in a way such that distributions are maintained.

A survey organization should be able to show that the editing process is not abusing the data. For editing after data collection, this may be done by capturing raw (unedited) data and keeping track of changes and the reasons for change. This is called an audit trail. Given this record keeping, it will be possible to estimate the impact of editing and imputation on expansions and on distributions. It will also be possible to determine the editor effect on the estimates.
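The audit trail described above lends itself to a very small data structure. The following sketch is a minimal illustration of that record keeping, assuming a simple in-memory representation; it is not software from any agency, and the field names, reason text, and weights are hypothetical.

    # A minimal audit-trail record: raw value, corrected value, the edit that
    # failed, the reason, and who made the change. All names are hypothetical.

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class EditAction:
        record_id: str
        variable: str
        old_value: float
        new_value: float
        failed_edit: str                  # which edit rule was violated
        reason: str                       # analyst's justification for the change
        editor: str                       # traceability: who made the change
        timestamp: datetime = field(default_factory=datetime.now)

    class AuditTrail:
        def __init__(self):
            self.actions: list[EditAction] = []

        def log(self, action: EditAction) -> None:
            self.actions.append(action)

        def impact_on_total(self, variable: str, weights: dict[str, float]) -> float:
            """Net effect of all editing actions on a weighted total for one
            variable -- one way of estimating the impact of editing on expansions."""
            return sum(weights.get(a.record_id, 1.0) * (a.new_value - a.old_value)
                       for a in self.actions if a.variable == variable)

    trail = AuditTrail()
    trail.log(EditAction("0042", "maize_yield", 1000, 100,
                         failed_edit="yield range 40-180 bu/acre",
                         reason="extra zero keyed", editor="analyst_07"))
    print(trail.impact_on_total("maize_yield", weights={"0042": 250.0}))   # -225000.0

Grouping the same actions by editor, rather than by variable, would give the editor effect on the estimates mentioned above.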
In traditional batch mode editing on paper printouts, it is not unusual for two or more specialists to edit the same record. For example, one may edit the questionnaire before data entry while another may edit the record after the machine edit. In this case, it is impossible to assign responsibility for an editing action. In an on-line mode, one person handles a record until it is done. Thus all changes can be traced to a person. For editing at the time of data collection (e.g., in CATI), it may be necessary to conduct an experiment to see if either the mode of collection or the edits employed will lead to changes in the data.

A high quality editing process will have other features as well. For example, the process should be repeatable, in time and in space. This means that the same data passed through the same process in two different locations, or twice in one location, will look (nearly) the same. The process will have recognizable criteria for determining when editing is done. It will detect real errors without generating too many spurious error signals. The system should be easy to program in and have an easy user interface. It should promote the integration of survey functions such as micro- and macro-editing. Changes made by people should be on-line (interactive) and traceable. Database connections will allow for quick and easy access to historical and sampling frame data. An editing system should be able to take actions of minor impact without human intervention. It should be able to accommodate new advances in statistical editing methodology.
One question asked was "For future applications, what would you like your edit system to do that it doesn't do now?" The second source was discussions with those responsible for edit tasks within a number of Federal agencies. The following areas emerged as priorities: 0 More on-line edit capabilities 0 Better ways to detect potentially erroneous responses 0 More sophisticated and extensive macro-editing 0 Evaluation of the effect of data editing. 180 Areas of Edit Research Much editing research has been conducted in national statistical offices around the world. It is these organizations, which conduct huge and complicated surveys, that have the most to be gained from developing new systems and techniques. They also have the resources upon which to draw for this development. One area of current research interest is that of "on-line edit capabilities". BLAISE, SPEER, and PEDRO discussed in the preceding paper are examples of such research activities. A second area of active research is in the detection of potentially erroneous responses. The method most commonly used is to employ explicit edit rules. For example, edit rules may require that: 1) The ratio of two fields lie between prescribed bounds, 2) various linear-inequalities and/or equalities hold, or 3) the current response be within some range of a predicted value based on a time series or other models. Edit rules and parameters are highly survey specific. A related area of editing research is the design of edit rules and the development of methods for obtaining sensitive parameters. In order to make sure that all errors are flagged, often many unimportant error flags are generated. These extra flags not only take time to examine but also distract the reviewer from important problems. These extra flags are generated because of the way that the error limits are set. A related area of research focuses on developing statistical editing techniques to reduce the-number of error flags, while at the same time, ensuring that not many errors escape detection. Several research studies in which different statistical techniques (such as clustering, exponential smoothing and Tukey's biweight) to detect potentially erroneous responses or to set error bounds are described in the working paper. In contrast to the rule-driven method f or the detection of potentially erroneous response combinations within a record, one alternative procedure is to analyze the distribution of questionnaire response. Records which do not conform to the observed distribution are then targeted as outliers and are selected for review. Although there has been research interest in this method, no application of these multivariate methods was found. 181 Recommendations The most important recommendation is that agencies recognize the value of editing research and place in high priority on devoting resources to their own research, to monitoring developments in data editing at other agencies and elsewhere and to implement improvements. Often innovations in editing methods made by survey staff are viewed as enhancements to processing for that particular survey and little thought is given to the broader applicability of methods developed. Accordingly, survey staff do not prepare discussion of new methods for publication. We encourage survey staff to take the time to describe their work and publish them in order to share their experiences with others who may be working under similar conditions. 
Recommendations

The most important recommendation is that agencies recognize the value of editing research and place a high priority on devoting resources to their own research, to monitoring developments in data editing at other agencies and elsewhere, and to implementing improvements. Often innovations in editing methods made by survey staff are viewed as enhancements to processing for that particular survey, and little thought is given to the broader applicability of the methods developed. Accordingly, survey staff do not prepare discussions of new methods for publication. We encourage survey staff to take the time to describe their work and publish it in order to share their experiences with others who may be working under similar conditions. It is often in such articles that methods which may be applicable to more than one survey are first introduced and described.

The survey on editing practices indicated that there was little analysis of the effect of editing on the estimates that were produced. Considering that the cost of editing is significant for most surveys, this is clearly an area in which more work is required. A related issue is the need to attempt to determine when to edit and when not to edit. Clearly, all the errors are not going to be found, and we should not attempt to find them all. Therefore, there is a need to design guidelines for determining what is an acceptable level of editing.

Another neglected research area in this country concerns the editing of data at the time they are keyed from mail responses. This area is usually discussed in the setting of quality control; however, it is an area that can benefit from further research from the perspective of data editing.

Annotated Bibliography

It is quite difficult to provide a complete assessment of current research activities in the area of editing because so much of the research, progress, and innovation is described only in survey-specific documentation. However, the group was able to identify 86 references which describe research efforts over the past years. Appendix D of the working paper contains the annotated bibliography. The annotations are brief and are only intended to give a very general idea of each paper's content. The appendix provides a valuable source of information on the editing literature. In addition, it includes papers which describe the underlying methods, the software, proposed uses, and possible advantages of three generalized editing software systems -- GEIS, BLAISE, and SPEER.

Acknowledgements

Other members of the Editing Research Group for Working Paper 18 were Laura Bauer, Federal Reserve Board; Brian Greenberg, Bureau of the Census; Renee Miller, Energy Information Administration; David Pierce, Federal Reserve Board; and Paula Weir, Energy Information Administration.


DISCUSSION

Charles E. Caudill
National Agricultural Statistics Service

As Administrator of a Federal-State cooperative statistical agency, I am quite impressed with the information contained in OMB Statistical Policy Working Paper No. 18 on Data Editing in Federal Statistical Agencies. The working paper thoroughly documents many existing editing practices and generalized editing software developments, and it provides a detailed software evaluation protocol. In addition, it covers current research activities on editing, provides an annotated bibliography, and has a good executive summary including recommendations.

I believe that this report, if read and seriously considered by Federal survey managers and administrators, can have a substantial effect on improving productivity. Thus, "precious" resources could be freed up to more formally address nonsampling errors, quality control, and total survey error models, measurements, and structures. In my opinion, if there was ever a report that survey administrators should take seriously, this is it.

There are several more detailed comments and observations that I have about Working Paper 18. The data on the costs of editing were intriguing. My observation is that there may be an upward bias in the data, and some non-editing cost may have been included. However, even if this is the case, there obviously is still plenty of room for productivity gains in the editing process.
With the proliferation of personal computer networks and data base software, there is substantial potential to improve the productivity of editing systems by being on-line and providing the editor with immediate screen feedback and re-editing of proposed changes. Recent advances in computer processing technology also make the use of audit trails more available to more users. Inexpensive audit trails provide the capability to analyze and conduct research on the effects of editing on the estimators and on the overall performance of the survey as well.

The detailed checklist of edit software system features in Appendix C of Working Paper 18 will be beneficial both to the development of new systems and to the maintenance and evaluation of existing systems. The annotated bibliography of articles and papers on editing presented in Appendix D will be valuable for researchers and system developers as a substantial source of literature and information.

Working Paper 18 certainly demonstrated that current data editing practices are labor intensive. Many remain mainframe and batch oriented, with multiple passes of the data. Also, I think that there may be a tendency to stay with existing systems too long.

My final comments are on total quality management of surveys. As an Administrator, one of my major concerns is with the quality of the final products and reports that the Agency delivers to the public. Thus, if the editing process can be made more efficient without degrading accuracy, then that adds to the potential of using the saved resources on other important areas of the survey process. Total quality management techniques applied to surveys are useful tools in efficiently identifying the most important potential sources of survey error.


DISCUSSION

Richard Bolstein
George Mason University

The serious impact that erroneous survey data can have on results, the fact that the number of errors tends to increase with the size and complexity of the survey, and the relatively large proportion of survey costs currently required to edit and correct data make the need for new and improved methods of data editing imperative. To this end, the authors have done a laudable job in researching methods currently used, presenting several case studies, testing and discussing the advantages and disadvantages of some current and developing editing software, and providing a synopsis of current research.

A working definition of editing was clearly necessary in this study, since, among other things, in order to estimate costs of editing, a fairly rigorous definition of the scope of editing was required. The working definition used by the authors, namely, "procedure(s) designed and used for detecting erroneous and/or questionable survey data with the goal of correcting as much of the erroneous data as possible, usually prior to data imputation and summary procedures," is quite suitable for this purpose. We should keep in mind, however, that while it feels comfortable to clean up erroneous data prior to imputation for missing data, in practice the two are often intertwined.

The paper states that the cost of editing was available for 40% of the 117 surveys in the sample, and cost estimates were possible for an additional 40%. It was reported that between 75% and 80% of these surveys had editing costs of at least 20% of total costs.
It is not too meaningful to compare the relative costs of editing across all types of surveys, however, since one would naturally expect these costs to be higher in less expensive surveys (such as mail or administrative records) than in expensive surveys (such as personal interview surveys or surveys of institutions), as found by the authors. Thus, it would be more informative if the relative cost figures cited above were reported by survey type. Another factor that can account for a large percentage of editing costs is the presence of a relatively large number of questions requiring open-ended responses and subsequent coding of the responses. But although the distribution of the relative cost of editing may vary considerably, there is no doubt that editing is costly, and methods to reduce this cost and improve data quality are much needed.

Finally, no discussion of the costs of editing is complete without determining what percentage is due to bad data that should not have occurred but for inadequate interviewer training, poor supervision and quality control of interviewers, and simple common-sense errors. These are errors which should not have occurred, and they should be deducted from the editing cost estimates of the surveys above, since they are likely to have varied considerably. Although elimination of such unnecessary errors was not part of the project of the three authors, it seems appropriate in a discussion of improving data editing procedures to mention ways in which the need for editing can be reduced.

To illustrate a common-sense error that should be eliminated: in a certain survey, the sponsor of which I will not name, fishermen are interviewed and their catch is weighed and measured. The interviewer is supposed to record weight in kilograms, but the scale used shows weight in both pounds and kilograms. As expected, frequent errors occur. The obvious solution is to use a scale that only shows kilograms, but when I suggested this to the survey firm, the response was "no one makes such a scale." When I then suggested taping over the side of the scale showing pounds, the reply was "but the fishermen want to know what their fish weigh in English." Finally, I suggested taping over the kilogram side of the scale, having the interviewer record the weight in pounds, and having the data entry program convert it to kilograms. The response to this suggestion I am sure you have all heard before: "well, that's the way we're used to doing it." There are numerous other examples, of course (for example, in some surveys interviewers are required to record the hour in military time).

The most promising methods to reduce editing costs and improve data quality (after elimination of the unnecessary errors) are found in interactive data entry software and in general editing software systems. These methods seem appropriate for large, complex surveys, or surveys which are repeated. For small one-time surveys, the cost of purchasing, learning, and programming the software will most likely outweigh the savings, as is true even with CATI. But this is generally not the case with surveys gathering Federal data. The three generalized editing software systems studied in detail by Mark Pierzchala seem very promising, especially BLAISE because of its generality and its ability to handle both categorical and continuous data. GEIS and SPEER are specific to economic-type surveys.

To what extent can graphics or other theoretical tools be used in editing systems?
The STAR WARS software described uses graphics to compare edited values with the originals, but not to detect outliers. The parallel coordinate system for graphic displays of high-dimensional data [see Miller and Wegman (1989), Wegman (1990)] may be used to detect outliers. Yahia Ahmed noted that analysis of the multivariate distribution of questionnaire responses to flag records that do not conform to the distribution as outliers has been infrequently used, no doubt due to its complexity. I believe that graphical methods for detecting outliers will meet with more acceptance than the multivariate analysis approach has, but it would not be cheap (time-wise) and probably would be best used as a final check rather than at the front end of the editing task.

Finally, I have two recommendations. First, in view of the increasing abundance of software we will see in the future, we should construct a standard set of test data sets for evaluating present and future software editing systems. Second, a one- or two-day demonstration seminar of some of these systems would be well received.

References

Miller, J.J. and Wegman, E.J. (1989), "Construction of Line Densities for Parallel Coordinate Plots," Technical Report No. 53, Center for Computational Statistics, George Mason University.

Wegman, E.J. (1990), "Hyperdimensional Data Analysis Using Parallel Coordinates," Journal of the American Statistical Association, to appear.


Session 6
COMPUTER ASSISTED STATISTICAL SURVEYS


OVERVIEW OF COMPUTER ASSISTED SURVEY INFORMATION COLLECTION

Richard L. Clayton
U.S. Bureau of Labor Statistics

This section provides a summary of Working Paper 19 on Computer Assisted Survey Information Collection (CASIC). For additional information, we encourage you to see this document.

The power of rapid calculating has been applied to virtually every phase of the survey process, including sample design and selection, and estimation. The most important implication of these applications is that survey practitioners are allowed to consider a growing range of techniques which were not affordable prior to the availability of inexpensive and fast calculating capability.

The field of computer assisted collection applications may be the area of greatest and most rapid change in survey methods. This field includes the rapidly expanding variety of applications based on the availability of powerful and inexpensive computers. Most familiar of the new techniques are CATI and CAPI. However, a variety of other collection methods are being developed across the Federal government's statistical agencies, including Touchtone Data Entry, Prepared Data Entry, and more recently, Voice Recognition Entry.

High quality published data begin with collecting high quality data from our respondents. Much of survey processing addresses, and compensates for, weaknesses in the quality of the collected data and the data we do not collect. Methods should be developed which capture data quickly and accurately and which allow respondents to answer our questions accurately and quickly. With this in mind, we provide the results of research and development activities throughout the Federal government that use new technological features in seeking new data collection methods, and in modifying old ones, to improve the quality of data collection.
For the purposes of this report, we defined computer assisted survey information collection methods as those using computers as a major feature in the collection of data from respondents, and in the transmission of data to other sites for post-collection processing.

Goal: The overall goal of Working Paper 19 was to provide information on new data collection methods and to challenge Federal survey managers to reconsider their operations in light of the survey methods now available, or made attainable through changing technology, and to reassess their methods of accomplishing the common goal of providing the public with critical information which is accurate, timely and relevant. We hope that by sharing information and experiences, others may gain and the overall effectiveness of governmental activities may be advanced.

Objectives: The primary objective is to describe emerging methods of interactive electronic data collection, their potential benefits, and current examples of their use in Federal surveys. In describing current uses and tests, a secondary objective is to pose questions about the implications of using computer assisted methods and to try to suggest some answers. These questions involve such factors as quality, costs, and respondent reaction to computerized surveys.

Scope: The survey operations included in this report comprise all of the activities and tasks from transmittal of the questionnaire through conduct of the interview, data entry, editing, and followup for nonresponse or edit reconciliation.

The last major survey operation to benefit from automation is data collection. Computers were first applied to collection using mainframes to control certain aspects of telephone collection, and Computer Assisted Telephone Interviewing (CATI) was born. The first applications of CATI stimulated new research worldwide evaluating the impact of CATI on the survey error profile and costs. CATI is now used to assist interviewers in all collection activities, including scheduling calls, controlling detailed interview branching, and editing and reconciliation, providing much greater control over the collection process and reducing many sources of error. At the same time, a tremendous amount of information is captured by the computer, providing additional insight into the data collection process.

The ongoing advances in computer technology, and particularly the advent of microcomputers, continue to offer additional opportunities for improving the quality of published data. The first portable computers were quickly pressed into service to duplicate the advantages of CATI in a personal visit environment. Thus, Computer Assisted Personal Interviewing (CAPI) was launched from the work in CATI.

While CATI and CAPI represent advances for surveys requiring interviewers, microcomputers are now finding important roles in self-administered questionnaires, where interviewers are not needed. Prepared Data Entry (PDE), developed by the Energy Information Administration, allows respondents who have a compatible microcomputer or terminal to access and complete the questionnaire directly on their screen. Touchtone Data Entry (TDE), developed at the Bureau of Labor Statistics, allows respondents to call a toll-free telephone number. Questions posed by a computer are answered using the keypad of their touchtone telephone. The machine repeats the answers for verification with the respondent, and the verified answers are stored in a database. TDE systems are now commonplace for bank transfers and telephone call routing, for example.
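A minimal sketch of the TDE dialogue pattern just described follows: the computer poses each question, the respondent keys an answer, the answer is repeated back for verification, and the confirmed value is stored. The item names, prompts, and console input used here are hypothetical stand-ins for the touchtone interface and the survey database.

```python
# Minimal sketch of a Touchtone Data Entry dialogue: pose a question, accept a
# keyed answer, repeat it back for verification, and store the confirmed value.
# Items, prompts, and console input are illustrative stand-ins only.

QUESTIONS = [
    ("employment", "Enter total employment for the pay period, then press the pound key."),
    ("hours", "Enter total hours paid for the pay period, then press the pound key."),
]

def keypad(prompt):
    """Stand-in for the touchtone keypad; here we simply read from the console."""
    return input(prompt + " ").strip()

def tde_session():
    responses = {}
    for item, prompt in QUESTIONS:
        while True:
            answer = keypad(prompt)
            confirm = keypad(f"You entered {answer}. Press 1 to confirm or 2 to re-enter.")
            if confirm == "1":
                responses[item] = answer   # a production system would write to a database
                break
    return responses

if __name__ == "__main__":
    print(tde_session())
```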
We have just applied existing technology to the data collection process. As an extension of this approach, techniques have been developed more recently allowing respondents to answer the questions by speaking directly into the telephone. The incoming sounds are matched to known patterns recognizing the digits and the words "yes" and "no". Voice Recognition Entry (VRE), as this is known, is not the distant future. The Bureau of Labor Statistics is currently conducting live tests where this method is being warmly received by respondents as natural and convenient. Both TDE and VRE offer inexpensive data collection where the respondents initiate the calls, enter, and verify the data. Refinements to procedures will now focus on minimizing nonresponse prompting activities.

Respondent Burden: For many respondents, the use of automated methods can actually reduce the collection burden placed on them. For example, use of Prepared Data Entry, where respondents interact with computer screens, provides a single set of step-by-step procedures with on-line editing to prevent inconsistent or incorrect reporting, thus reducing the need for expensive and troublesome recontacts. Also, these methods have, in some cases, substantially reduced the time taken to provide complex data for large establishments. Similar methods may be applied to other surveys covering large establishments where the one-time costs of data conversion to a standard format would be cost-effective, especially in repeated surveys.

Quality: Automated collection allows for improved control, yielding reduced error from several sources, including errors caused by the respondent, the interviewer, and post-collection processes such as key entry. The instant status capabilities of CATI, for example, provide stronger intervention features for nonresponse prompting, reducing nonresponse error. In deciding which collection method to use, quality can become a relative concept that is affected by a tradeoff between cost and benefit. The choice of a data collection method is usually based on a combination of performance and cost factors determining affordable quality. For traditional collection methods, these factors and the decision-making process are fairly well known. The new methods discussed in Working Paper 19 expand the array of potential collection tools and challenge the survey designer to reevaluate old cost/performance assumptions.

Costs: The data collection process is composed of a few major activities, including transmitting and receiving the questionnaire, data entry, editing and nonresponse prompting. The labor and nonlabor costs will vary depending on the method used. For example, under mail collection virtually every action is conducted manually and postage is the dominant nonlabor cost. By contrast, CATI operations can minimize postage costs and reduce many of the expensive mail handling operations. However, CATI adds new costs in the form of telephone line charges and computers (including systems design and ongoing maintenance). Self-response methods, such as TDE, VRE and PDE collection, reduce postage, the manual mail operations and the labor involved in CATI interview activities, but may still require edit reconciliation and nonresponse followup. Thus, the factors of production, and the composition of each of those inputs, vary greatly among the existing and newer techniques. Many factors can change in a short period. Only a few years ago, automation costs were driven by the scarcity of mainframe hardware capacity.
Now, the labor involved in developing specialized systems dominates automation costs. Portable and desktop microcomputers were not widely available at the beginning of this decade. Now, microcomputers are widely available, very inexpensive and extremely powerful. Old assumptions about costs need to be reevaluated. Labor and postage costs have risen steadily in recent years, while capital costs, such as microcomputers and telephone services, have been declining. The decision on which collection mode to use, or which combination, will depend on the particular survey application and the existing cost structure. However, it is important to view such investments over the long term, as the relative costs of each of the inputs do not remain constant over time. Survey managers should periodically review old assumptions in light of new technology and project operating costs over the reasonably foreseeable future in deciding whether or not to investigate new methods.

Users: Automated data collection involves three major groups of people: the respondents, the interviewers, and the designers and developers of the system and procedures for collection. This report covers the essential factors involved in successfully incorporating the requirements of each group.

Respondents: The respondent must be considered the primary user of any survey vehicle, whether automated or not, and all aspects of the response environment must be developed with the respondent in mind. The cooperation of the respondent is the single most critical factor in survey operations. Respondents must be treated with the greatest care. We must consider our respondents as customers; after all, if our survey vehicle doesn't "sell", that is, if the questionnaire is not successful in getting an accurate response, we will have no input for the rest of our production process. Even one-time surveys must strive to leave the respondent with a feeling of contribution and importance and, most of all, a willingness to participate in other surveys in the future if called on. Thus, our primary job is to develop techniques which allow the respondent to complete the survey completely and accurately and with a minimum level of burden.

The use of these collection methods, while bringing improvements in the quality of collected data, has entailed other challenges. These automated collection methods are made possible through the close interaction of subject matter experts, statisticians, and computer scientists. To use these methods effectively, each of these groups learned the basic tenets of the others. This close relationship will only continue to grow, with advances in each field aiding advances in the others.

Interviewers: The second most important user is the interviewer. The systems provided to assist in the interview process must be easy to use, must work infallibly, and must actually provide improvements in the interviewer's work environment. Interviewers must feel that they are the most valuable feature in the interview, and that the machine is merely a tool to expedite and simplify their work. This is not always an easy task.

Survey Practitioners: We are the third major group of users. The decisions made early in the development process will carry over into the ongoing use and maintenance of the system. Systems designers face difficult choices, such as building customized systems from scratch versus linking standardized "off the shelf" routines or commercial packages.
The inevitable limitations would have to be traded off against reduced maintenance and lower start-up costs.

Automated collection methods can also improve data quality. All of the methods discussed could be designed to include on-line editing to prevent impossible and inconsistent entries. Some of these methods, such as TDE and VRE, improve data quality by verifying recorded data with the respondent. These are potential improvements. The final impact on quality lies in the up-front planning and execution. This places responsibility for clearly defining and controlling the collection environment directly with the survey designer.

Future: The future application of these techniques is limited only by the creativity and initiative of program managers and planners. The "case studies" serve to illustrate the options available, and will surely raise many more questions for further investigation. We hope that the discussion of technological advances generates discussion and stimulates creative new applications to the whole range of governmental information collection activities. In addition to the methods described here, there are other advances in technology which hold the potential to vastly change data collection. Integrated Services Digital Network (ISDN) is a powerful network system which will provide simultaneous transmission of sound, video and data. The result could be a change in the way some surveys are conducted, offering all of the benefits of personal interviewing with the lower costs of telephone interviewing.

You have heard several different collection methods, currently available, described and discussed. And you can see that the pace of change will accelerate and match changes in technology. So what does the future hold? You have to ask yourself how your survey operations will be conducted in 5 or perhaps 10 years. In doing so, ask yourself how things were done 5 or 10 years ago. What sorts of things have happened and what were their implications?

A COMPARISON BETWEEN CATI AND CAPI
Martin Baum
National Center for Health Statistics

Introduction

I will describe for you some of the critical factors one must consider when deciding whether to conduct a survey by either CATI or CAPI. I also will try to indicate the similarities and differences between these two methods of survey data collection automation.

Definition

Let me first define each of the methods. Computer Assisted Telephone Interviewing (CATI) is a computer assisted survey process which uses the telephone for voice communications between the interviewer and the respondent. Computer Assisted Personal Interviewing (CAPI) is a personal interview, usually conducted at the home or business of the respondent, using a portable computer.

Rationale

The rationale for the development and use of these methods is based primarily on improved data quality and improved timeliness of data release. Cost is a factor, but in our experience it has been a break-even situation; the cost of automating has equaled the savings. This result has been due primarily to the high cost of software development.

Factors

The following are critical factors that must be considered, in addition to improved data quality, timeliness, and cost, when deciding whether to use CATI or CAPI for your survey data collection. I will discuss each of these factors in some detail.
Hardware - CATI

Initially CATI was developed as a mainframe application, but as computer technology changed, CATI moved to the minicomputer and then to a networked microcomputer application. The investment in hardware has steadily decreased without any loss of capability. Telephone technology, which affects telephone availability, is important to the CATI application: no phone, no respondent.

Hardware - CAPI

The most important computer hardware criteria for a CAPI application are generally quite different from those that would be critical to most other applications. The major reason is the role that environmental conditions play in the selection of CAPI hardware. The fact that CAPI is a personal interview situation, usually taking place in or at the home of the respondent, dictates a number of possible circumstances under which the interview will be conducted. For example, screen visibility becomes a paramount criterion because of the environmental conditions. Interviews will take place under all types of lighting conditions: outside in bright sunlight, twilight, and normal light, and inside under lamp light, fluorescent light, and bare bulbs.

Weight is especially critical because of the variety of environmental conditions. Interviewers may be conducting the survey in an urban setting where the computer will be carried up and down the stairs of apartment houses; or in a suburban setting where the computer is carried many blocks; or in a rural setting where the computer is carried long distances from car to house. In any of these conditions, the computer is moved in and out of a car many times. This situation is further compounded by the fact that the interviewer must also carry considerable paper, e.g., back-up paper questionnaires in case the computer fails, and letters of explanation, introduction, and thanks. Carrying all of this weight in and out of cars and up and down steps all day is no easy job, particularly if the computer and back-up battery weigh 10-plus lbs. and the paper weighs an additional 5 lbs. or more.

For a household-type survey, the interviewers are generally reluctant to ask for the respondent's permission to use power for the computer because of fear of possibly losing the interview. Also, surveys frequently are conducted outside of the house where no power is available. Many of our surveys can last as long as 2-4 hours. Consequently, battery life is critical.

Environmental conditions often affect the ergonomics of the hardware. Consider a survey interview conducted where the computer must be placed on the interviewer's lap. This situation would be quite difficult if the computer were top-heavy when open, or if the interviewer were small and the computer's depth long. Balancing would be a problem. Also consider the doorstep interview with a 10 lb. clamshell-design computer.

Software

Now let's discuss the most costly factor in the CATI/CAPI decision - software. There are four components to the CATI/CAPI software: Questionnaire, Case Management, Output Reporting, and Authoring System.

The questionnaire component refers to the software that places each question in the survey on the computer screen in the proper sequence with the appropriate information (i.e., prompts) and allows the entry of an answer or answers to the question, with edits on those answers such as range, specific values, and consistency with another question's answer. This software should also contain on-screen help and, if necessary, rostering.
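To make the questionnaire component concrete, the following minimal sketch drives question sequencing from a small table of question definitions and applies range and consistency edits as answers are entered. The question wording, acceptable ranges, and skip rule are hypothetical and are not taken from any actual CATI or CAPI instrument.

```python
# Minimal sketch of a questionnaire component: questions are presented in
# sequence, answers are edited on entry (range and consistency checks), and a
# simple skip rule handles branching. All question content is illustrative.

QUESTIONS = [
    {"id": "employees", "text": "Number of employees (0-5000)?",
     "edit": lambda a, r: 0 <= a <= 5000},
    {"id": "part_time", "text": "Number of part-time employees?",
     "edit": lambda a, r: 0 <= a <= r["employees"],      # consistency with a prior answer
     "skip_if": lambda r: r["employees"] == 0},          # branching: not asked if no employees
]

def ask(question, responses):
    while True:
        try:
            answer = int(input(question["text"] + " "))
        except ValueError:
            print("Please enter a whole number.")
            continue
        if question["edit"](answer, responses):
            return answer
        print("Entry fails an edit check; please verify with the respondent and re-enter.")

def interview():
    responses = {}
    for q in QUESTIONS:
        if "skip_if" in q and q["skip_if"](responses):
            continue
        responses[q["id"]] = ask(q, responses)
    return responses

if __name__ == "__main__":
    print(interview())
```

Driving the interview from a table of question definitions is also the natural interface for an authoring system of the kind discussed below, since the designer edits the table rather than the program.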
The case management component is the software that allows the interviewer to keep track of the status of the survey interview: Is the interview complete? If it is not complete, what has been completed and what is the next question to be asked? Is the interview a partial interview or is it to be completed later? What sections of the survey are mandatory? In some instances, case management also covers interviewer assignments. In the case of CATI, case management software also would provide the sample selection and the dialing of the phone number.

The output reporting component is often either overlooked or given minimal consideration. This is a big mistake. Collection of the data is not very useful if the data cannot be easily accessed for analysis. Output reports can be categorized as either survey questionnaire statistics or management statistics. The level of detail and complexity can vary significantly. Survey questionnaire reporting can be as little as the ability to place the data into a specific analysis software file format, e.g., SAS, or can include actual analyses. Management statistics can be extremely useful for the conduct of the survey data collection. For example, data can be automatically collected on the time to complete a section of the questionnaire, by interviewer. This information could provide insights for training and/or question rewrites.

The authoring system allows a non-programmer, e.g., a survey questionnaire designer, to create the questionnaire while simultaneously and automatically generating the questionnaire software component. It has been our experience that this is the most difficult component to develop. Although a number of such systems are available, none of them has met all of our requirements for the type of complex survey we conduct, e.g., the NHIS. The authoring system should be extremely user-friendly and be able to handle a large number of question types.

Data Transmission

In the case of CATI, the data are automatically transmitted to a central point for either uploading to a larger computer or further processing, e.g., analysis. In the case of CAPI, the data collection is generally dispersed over a wide geographic area. The two primary methods for data transmission have been mailed floppy disks and telecommunications. For data that are not needed until a day or more later, floppy disks have been adequate. Telecommunications, however, adds a new dimension: two-way communication. Not only can data be transmitted to a central point, but instructions for the interviewers, for example, could be transmitted from the central point to the field. The major problem with the telecommunications method has been maintaining consistent quality of the communication lines. Cost can also be a barrier.

Interviewer Training

The level and amount of training needed depend, to a large extent, on the level of user-friendliness of the software. Our experience has shown that the type of training is different for a CATI or CAPI conducted survey than for a pencil and paper conducted survey. In the paper and pencil conducted survey, training is focused almost entirely on the content of the questionnaire, management of the questionnaire, and the proper question sequencing. It would not be unusual to have an accompanying instruction manual 3-4 inches thick that would have to be learned by each interviewer. In the CATI or CAPI conducted survey, by contrast, training includes both questionnaire content and the care and use of the computer.
The major focus is the computer, not the content, because the computer software can handle most of the problems the interviewer needs to worry about in the pencil and paper conducted survey, such as probes, question sequencing, and completeness. There is one major difference between CATI and CAPI that affects the training: the level of interviewer anxiety. CATI is conducted at a central location where supervision and help are readily available. CAPI, on the other hand, is conducted in the field where no supervision or help is readily available. Therefore, CAPI training must try to provide the interviewers with sufficient confidence in the software and hardware to cope with this lack of help. One method that has proven effective is to emphasize hands-on practice. Interviewers are encouraged to take home their computer and practice interviews with anyone they can get prior to going into the field. In addition, interviewers are given their computer prior to the training so they can have some familiarity with it. CAPI interviewers must be able to cope with problem occurrences. Consequently, training must concentrate on such situations.

Future Technology

Impending technological advances can have a profound impact on these automation methods, particularly CAPI. Changes in hardware, such as an "etch-a-sketch" microcomputer and an inexpensive, long-life, light-weight battery, would open new possibilities for the CAPI conducted survey. Use of a light-weight computer, under 5 lbs., with no keyboard and with light-pen handwritten entry, would allow doorstep surveys as well as reduce training efforts. The "etch-a-sketch" computer has been introduced by one vendor, and several others are about to announce their own. The long-life, light-weight, inexpensive battery, although not currently announced or available, will when available make possible much faster and larger light-weight computers, thus allowing larger and more complex surveys to be automated.

The development of generalized authoring system software would open up the use of CATI and CAPI to the quick-turnaround type of survey. Survey questionnaires could be designed and implemented quickly and easily. Staff productivity would also increase significantly because the computer programming effort to automate each survey questionnaire would be reduced to a minimum. The survey designer, in effect, would be programming the survey while designing the questionnaire.

COMPUTER ASSISTED SELF INTERVIEWING
Ralph Gillmann
Energy Information Administration

The phrase "computer assisted self interviewing" (CASI) covers all survey methods in which respondents access computers. These methods include "computerized self administered questionnaires" (CSAQ) and "prepared data entry" (PDE), where the respondent fills out a computerized version of the survey instrument. Also included are methods where the respondent uses a telephone to access a computer: "touch tone data entry" (TDE) and "voice recognition data entry" (VRE).

Let's step back for a moment and look at different ways that computers can be used in interviews, illustrated in the talk by a diagram that is not reproduced here. The top line represents direct interaction between an interviewer and a respondent. The left line represents the interviewer accessing a computer, as in CATI and CAPI, which were previously discussed. CASI methods are illustrated by the lower right triangle. The diagonal represents respondents accessing an agency computer, as in TDE and VRE. The right line represents respondents accessing their own computers, as in PDE.
With the personal computer (PC) becoming ubiquitous, at least in establishments, respondents usually have access to a computer. The bottom line represents computer-to-computer interaction for data transmission. The missing diagonal would represent the activities of hackers and spies.

Next, let's compare manual and computer assisted methods. Some methods are part manual and part computer assisted. For instance, CATI and CAPI combine a personal interview with an electronic survey instrument. One data collection system which uses all of the computer assisted methods is the Petroleum Electronic Data Reporting Option (PEDRO) in use at the Energy Information Administration. In general, the manual methods are slower and more prone to processing errors. Labor and postage costs are also rising faster than the operational expenses of computer assisted methods.

For transmission of the data to the collecting agency, paper copies can be sent via facsimile machines (fax). This method is faster than the mail but doesn't eliminate the need to key in the data. If the data are in electronic form, a diskette with the data can be mailed in. This is useful if security and authenticity are a particular concern. Transmission time may be saved by sending the data over the telephone network or using "electronic mail" over a computer network. (Note that it's becoming harder to tell telephone and computer networks apart.)

The use of an electronic mail service is feasible now and likely to be more important in the future. This method allows a third party to handle the support for telephone lines, security, and temporary storage. Respondents only need to have a terminal which operates over ordinary telephone lines if the survey instrument resides with the electronic mail service in the form of an electronic questionnaire. Security can be provided by passwords and data encryption. The survey agency can retrieve the data at its convenience.

Finally, CASI offers several quality improvements:

- Increased timeliness of the data (especially important in monthly and weekly surveys)
- Fewer follow-up calls to respondents (because many, if not all, data edits can be done immediately)
- Reduced respondent burden (fewer persons are needed to fill out an electronic form)
- Lower costs (at least in cases where labor and postage make up a large part of the costs)

COMPUTER ASSISTED SELF INTERVIEWING: RIGS AND PEDRO, TWO EXAMPLES
Ann M. Ducca
Energy Information Administration

I am going to talk about two systems that the Energy Information Administration has for reporting data using personal computers (PC's). One system is a mail submission of a PC diskette, and the other uses telecommunications between the respondent's PC and our mainframe computer.

The first example is the Reserves Information Gathering System, known as RIGS. It is a system for reporting data on domestic oil and natural gas reserves on PC diskettes. The data are collected by the EIA in its annual survey of oil and natural gas well operators. Reporting to this survey is mandatory. Briefly, this survey is a stratified sample survey, with the stratification being done according to the amount of production of oil and natural gas. Respondents in the first stratum, representing the largest amounts of production and having the most data to report, are eligible to report using RIGS. They will also continue to have the option of reporting on paper forms. The EIA cannot require an electronic form of submission. RIGS first became operational for the reporting of 1988 data.
We anticipate that 25-30 percent of the 1989 reserves information will be reported using the RIGS system. The EIA sends PC diskettes containing the RIGS processing software by mail to respondents. A user's guide is also provided. The respondents install RIGS onto their PC's and use it to enter data. The basic hardware requirement is an IBM-compatible PC with at least 360K of random access memory, and two floppy disk drives or one floppy and one hard disk drive. A printer should also be attached to the system so that a hard copy can be printed. Version 2.0 or higher of MS DOS is also required. The IBM PC compatible computer was chosen because of its wide availability.

The software for RIGS was originally written in dBASE III, a PC database management system. dBASE III programs can only be executed using the dBASE III software; that is, stand-alone programs cannot be created. Since the EIA did not want to purchase and provide the dBASE III software for every respondent, Clipper, a linkage compiler, was used to compile the dBASE III programs into object code and make the system portable. The licensing agreement with Clipper permits run-time programs created by it to be operated outside the agency. Thus, the respondents are provided with an executable load module, not programs. Licensing agreements must be carefully reviewed before planning to use software products outside an agency. An advantage of a load module is that respondents cannot directly or inadvertently change the programs. Also, there is no cost to the respondents since the RIGS software was developed by the government.

Using the RIGS software, the respondents enter data directly on their PC. The data entry screens for RIGS are formatted like the data collection form. There may be some benefits to exploring other formats which take advantage of options available to automated collection, such as question sequencing. There is also the option of sending an ASCII file to the RIGS system so that data already available in an automated form at the respondent site can be submitted without re-keying. The RIGS User's Guide gives the instructions and record layout requirements for downloading ASCII files.

Respondents are required to submit to us by mail a diskette containing a copy of the cover page and the data. They must also return a paper copy of the cover page with the signature of the certifying official. Because the survey is an annual one, it was decided that telecommunications with the EIA mainframe computer was not needed, and that the mail submission would be sufficient. Since the data in the RIGS system are proprietary, it was also decided that respondents would not be provided with their previous year's data, because of the risk of sending confidential data to the wrong respondent.

Preliminary edits such as range checks are performed as the data are entered into the RIGS system. If the system detects an incorrect entry, the bell sounds and a message appears across the top of the data entry screen. The message will prompt the user for a response. Help screens are available to assist the user, and help is also available by telephone on a toll-free number. For data that have been downloaded into RIGS, an edit report is produced afterwards. A respondent may then use the RIGS edit function to correct the errors. Final edits, such as comparisons with the previous year's reports, are made after the data are returned to the EIA. These edits are performed on our mainframe system.
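The following is a minimal sketch of the kind of preliminary range edit just described, with an audible signal and a prompt when an entry falls outside the expected range. The item name, the acceptance range, and the option to accept and footnote a questionable value are my own illustrative assumptions, not the actual RIGS edit specifications.

```python
# Minimal sketch of an on-entry range edit: out-of-range values sound the bell,
# display a message, and may be accepted with a flag for the later edit report.
# The item and its acceptance range are illustrative assumptions only.

EDIT_RANGES = {
    "crude_oil_reserves_thousand_bbl": (0.0, 5_000_000.0),
}

def enter_value(item):
    low, high = EDIT_RANGES[item]
    while True:
        try:
            value = float(input(f"{item}: "))
        except ValueError:
            print("Please enter a number.")
            continue
        if low <= value <= high:
            return value, False
        print("\a*** Entry is outside the expected range; please verify. ***")
        if input("Accept anyway and add a footnote? (y/n) ").lower().startswith("y"):
            return value, True            # kept, but flagged for the edit report

if __name__ == "__main__":
    value, flagged = enter_value("crude_oil_reserves_thousand_bbl")
    print("Stored:", value, "flagged for edit report:", flagged)
```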
When questionable data are identified, a quality control analyst contacts the respondent by telephone and changes are made by the EIA. Respondents also have the option of making notes in a footnote. These notes may be helpful in explaining data that appear to be questionable.

The second example is the Petroleum Electronic Data Reporting Option (PEDRO). It gathers monthly data on petroleum supplies from petroleum companies. The respondents eligible to use PEDRO participate in seven monthly surveys. They include refineries, storage facilities, pipelines, importers, and extraction facilities. Reporting to these surveys is also mandatory. But again, the EIA cannot require an electronic form of submission. Participation in PEDRO varies among the seven surveys. The market share represented by reports to PEDRO ranges from 25 to 90 percent of the total volume for a survey.

The main difference between the PEDRO and RIGS systems is that PEDRO uses telecommunications to transmit data directly to the EIA mainframe computer. PEDRO users need an IBM-compatible PC with a hard disk and a floppy drive, and a modem. As with the RIGS system, respondents are provided with an executable load module at no cost. PEDRO also requires the Arbiter communications software, which is licensed only for use with the EIA. Arbiter was selected because it satisfied our security needs. The EIA supplies the respondents with Arbiter.

The basic methods of entering data into PEDRO are the same as those with RIGS -- keying on the PC or sending an ASCII file to the PEDRO system. However, data submission in PEDRO is done by telecommunications directly to our mainframe, rather than by mailing diskettes. Since these are proprietary data, PEDRO submissions are encrypted. The transmissions are time-stamped to replicate a postmark. The respondents must use passwords to transmit data, and the password, rather than a written signature, serves as the certification of the validity of the data.

All edits in the PEDRO system appear on the respondent's PC. Since there is a direct link to our mainframe, all data needed for editing comparisons, for example the prior month's data, are available on-line. Preliminary edits are performed before respondents transmit any data. Final edits are performed after the data reach the EIA mainframe, and the results are transmitted back to the user.

The EIA is very pleased with the RIGS and PEDRO reporting systems. We believe that we are getting data faster and more accurately from these systems, and we are encouraged by the increase in interest in using them.

DATA COLLECTION
Cathy Mazur
National Agricultural Statistics Service

In this session, I will first mention several factors to consider when deciding on a mode of data collection. Then I will spend a few minutes comparing the modes of data collection that have been discussed.

The primary factors in choosing a method of data collection for a given survey are (as previously mentioned) the available time frame, the desired quality, and the cost of resources. It is unusual to have all three of these in abundance. Therefore, tradeoffs must be considered. Several other factors to consider, which relate to survey design and operation, are whether the survey is mandatory or voluntary, whether a one-time or ongoing survey is to be implemented, whether households or businesses are sampled, whether the data will be collected in a centralized or decentralized manner, whether networking of computers will be done, the sample size, and the complexity of the questionnaire.
The remaining factors to consider in automated data collection relate to the characteristics of the technology. First is the speed of the hardware and of data transmission over the phone lines. Next are the size of the computer's memory and the system's weight (as in CAPI). Portability is a concern for data collection when different hardware and/or software is to be used (as in Prepared Data Entry (PDE)). The type of display is important in some modes (as in CAPI). The mode of data entry can be through the keyboard, a pushbutton phone, or one's voice. Data verification depends on the desire for quality, the complexity of the data, and other factors. Database generation is also an important step (as was discussed by Martin Baum); it refers to integrating the data with other survey processes (label generation, data summaries). Hardware is selected based on cost, the amount of time available, the data quality desired, and the background of the staff that will operate the machines. Lastly, training is important in any survey, and the amount needed depends on the technology chosen.

The priorities given to these factors, and the relationships between them, help to decide which technology to use. All of the automated technologies combine data collection with data entry, and most add editing at the time of data collection. This reduces the time component and increases the quality component. Also, mixed modes of data collection are possible in a survey.

First, as a means of comparison, a mail or manual survey would require a fairly long time to send out personal enumerators or to send and receive questionnaires through the mail. The amount of editing is very limited, as data entry and editing are done after all the data are collected and the interview is completed. The cost is fairly high if personal interviews are done, and nonresponse may also be high if questionnaires are mailed out.

CATI is used because it collects data quickly and accurately. The cost component (which is fairly high) comes from the hardware, software, training, and support factors (such as phone charges). One cost component which is eliminated is the travel expense, and it has been suggested that CATI improves the cost-benefit tradeoff. The respondent, however, must have a phone. Other benefits are that CATI is useful in complex survey environments, can provide information on call scheduling successes and failures, and can be used for nonresponse followup.

CAPI also has fairly high costs, but it provides accurate data with a tendency toward higher response rates (which may be a problem in CATI), and it saves the separate key entry time. The largest cost component is due to travel (with some in hardware and software support costs). Weight, battery life, and screen visibility are important issues for CAPI.

As to computer assisted self interviewing, three data collection modes are discussed -- Prepared Data Entry (PDE), Touchtone Data Entry (TDE) and Voice Recognition Entry (VRE). PDE provides faster and more accurate data, for an average cost. Costs are incurred in software development and support areas. This mode requires the availability of a PC (usually at establishments), and two issues are data security and data integration (as different PC's are used). TDE allows respondents to call and answer questions posed by a computer using the keypad of their touchtone telephone.
VRE also allows respondents to call and answer questions posed by a computer, but the respondent answers by speaking directly into the telephone, and a computer system translates the incoming sounds into text. TDE and VRE offer low-cost alternatives in a short data collection time, but editing is more limited. In both, surveys tend to be shorter and simpler, nonresponse prompts are used, and respondent acceptance is a concern. TDE requires access to a touchtone phone and service, whereas VRE can use any phone. The Bureau of Labor Statistics collects data monthly for the Current Employment Statistics Program using mail, CATI, TDE, and VRE. The VRE system recognizes continuous speech of the numbers 0-9, "yes", and "no" from any American English-speaking person.

These are not simple issues, and there are no clear-cut answers. The definitions and importance of the factors must be agreed upon. This comparison represents only the current state of technology; much will change with future development. Lastly, I hope this session has made you more aware of the possibilities, the issues, and what to consider when choosing a data collection method.

DISCUSSION
Robert N. Tinari
U. S. Bureau of the Census

I want to begin my remarks today by noting that this paper is a very thorough treatment of the issues surrounding automated survey collection methodologies. I am impressed with the organization of the paper and the thoroughness of discussion of the many considerations that go into selecting, designing, and implementing these types of data collection systems. The subcommittee is to be commended for the excellent job they have done in bringing together in one document a tremendous amount of information that I think will be extremely useful to those considering alternative data collection methodologies.

Based on my experience as a program manager responsible for the initial development and implementation of CATI on the National Crime Survey, there are several issues raised in the paper that I believe need more emphasis.

The first issue I want to discuss has to do with organization and its effect on CATI/CAPI development and implementation. In its conclusion, the committee notes that increased reliance on software development has important implications for hiring and training skilled survey designers. It also states that previously distinct boundaries between occupational groups will continuously blur and disappear, and survey design will likely be increasingly accomplished through teams of skilled workers from different occupations. Based upon my experience, I believe that this is an accurate assessment. Obtaining the maximum benefit from these data collection methodologies requires that a fully integrated system be developed, and this, in turn, requires the concerted effort and collaboration of programmers, survey design experts, statisticians, field staff, program managers, and survey sponsors. However, the level of cooperation and communication necessary to successfully design and implement CATI/CAPI may be very difficult to achieve in a large, hierarchical organization. Staffs tend to be highly specialized and not experienced in projects requiring a multi-disciplined approach.
From my own experience working on one of the first CATI applications at the Census Bureau, we had a very difficult time organizing the right team with the right experience necessary to get the project underway and in keeping the lines of communication open among the various divisions involved to implement it successfully. We learned a lot from that process and have come a long way. A recent example is a cooperative effort between the Economic Area and the Demographic Area in successfully developing and implementing a CATI system for the Survey of Manufacturing Technology. The Industry Division was responsible for conducting the survey and wanted to use CATI for nonresponse followup of manufacturing plants. The division lacked the experience to develop the questionnaire on CATI. Demographic Surveys Division offered to help with the authoring, Industry assisted with testing, and Field Division worked on interviewer training and data collection. The survey was carried out on time, within budget, and with high quality. This is a good example of what can be accomplished by individuals from the various divisions working together and sharing their expertise to get the job done.

Poor organization and control can have a very serious impact on the cost and time of development and the quality of the final product. I believe that what is needed to successfully design and implement automated data collection methodologies is:

o  commitment and full support from upper-level management;
o  a full-time, dedicated staff (no part-time work along with other projects);
o  open lines of communication, with clear assignment of responsibility and accountability;
o  a designated project coordinator/facilitator;
o  breaking down of traditional barriers between survey statisticians, mathematicians, survey designers, programmers, and field staff in order to work effectively;
o  ongoing commitment and organizational change to adapt to the needs of the new data collection methodology, which is especially important if you are using a mixed mode such as personal visit (paper) and centralized telephone (CATI);
o  reduced layers of bureaucracy; and
o  empowerment of the team to get the job done.

We must think of new ways of organizing ourselves to be more flexible and effective in designing and implementing new technologies. In addition, there must be more sharing of information among the various statistical agencies on approaches and experiences in the area of organization.

The second issue has to do with interviewer acceptance of new technologies like CATI and CAPI. The paper points out the importance of involving the user in the design process. I do not think this point can be over-emphasized. In the rush to develop survey instruments on tight time schedules, or in deciding which portable machines to use for CAPI applications, we, the developers and/or program managers, take it upon ourselves to decide what is best for the interviewers and may not actively involve them in the decision or development process. This can be a big mistake. If the interviewers are not comfortable with the interface, if it is slow, clumsy or awkward to use, "not natural" feeling, not helpful, etc., the survey is in serious trouble. If the interviewers have no say in the design and for any reason should decide that the system is not helping them to get the job done better, then you face an uphill struggle to gain their acceptance, and in some instances the system may never be fully accepted.
Interviewers may work to defeat the system, morale may suffer, respondent cooperation may suffer, turnover rates will increase, quality will suffer, and costs will escalate. In addition, if you are contemplating switching from a personal visit environment to CATI, you must consider the effect on the interviewer staff out in the field. Field interviewers will be concerned about losing their jobs, and quality may suffer during the transition to CATI. How the Field interviewers will be treated, and the possible impact on data quality during the transition period, should most definitely be taken into account. For example, in planning the transition of cases from personal visit to CATI for the National Crime Survey, we used attrition among the interviewing staff and hard-to-enumerate areas to select the cases for conversion to CATI. By using this approach, CATI was viewed as a positive tool by the Field staff. This plan helped to gain acceptance of CATI.

The third and final area I want to discuss has to do with the need for adequate testing and evaluation of these new methodologies. Before implementing any survey operation, it is good practice to allow enough time for adequate testing and evaluation of the instrument and the data collection and processing system. This is especially crucial for automated data collection systems. Complex questionnaires (those with complex branching or edits) need to be thoroughly tested and evaluated before they are introduced on a production basis. While the automated data collection systems provide us with the ability to field much more complex questionnaires than we could using conventional paper forms, they also pose additional challenges related to testing. Aside from the obvious problems that may surface during interviewing, if the instrument is not adequately tested, there may be logic errors hidden in the instrument that go undetected or aren't found until after the data collection phase is complete. In addition, when changes are introduced to the questionnaire (even minor ones), thorough testing should be conducted again to ensure that other questions or skip patterns have not been affected.

In the paper, the committee discusses the possible application of expert systems in questionnaire development. I would suggest that perhaps some application could be found for these systems in testing and evaluation as well. There is definitely a need for more systematic and thorough methods for checking out the questionnaire. In addition, attention must be paid to testing the case management, call scheduling, training, data transmission, and processing systems before the survey is fielded. This is not something that needs to be done only before a survey is fielded. There should be an ongoing effort to evaluate how well the system is functioning, allowing feedback for continuous improvement and refinement through monitoring, observation, and debriefing of interviewers and respondents.

I want to thank the organizers for giving me the opportunity to share my views on this important topic. I think the committee has made an important contribution by bringing together in one document many of the issues facing project managers in deciding whether or not to adopt these technologies. I hope that the document will be treated as a dynamic one that will be expanded as we gain more experience with the various aspects of these data collection methodologies.

DISCUSSION
David Morganstein
Westat, Inc.
I thank Terry Ireland for organizing this intriguing session, and I would like to express my appreciation to the speakers for the work they have done in their examination of new methods for assisting in the process of conducting government surveys. It is a pleasure to be given this opportunity to participate in the session as a discussant. The job description for a discussant might be:

- To agree with the speakers' comments,
- To point out errors or omissions,
- To suggest areas of new research, or
- To do something completely different that they'd like to do!

I think I will try a little of all four of these objectives.

There is a great need for new approaches to gaining cooperation as the respondent population is increasingly bombarded with requests for survey participation. The initial 1990 Census experience indicates the level of difficulty surveyors can anticipate. According to our speakers, their "primary job is to develop ... computer related techniques which allow the respondent to answer the survey completely and accurately". The emphasis on the respondent's cooperation is very appropriate. There is a potential trap of having the software developed by software experts who have little knowledge of or interest in the respondent/interviewer who must use the system. At a minimum, a part of the system design team should be practitioners of long standing who understand the process. There may be good reason to have the leader of the team be such a practitioner.

I was concerned by the following statement found in the paper: "Interviewers must believe that Computer Assistance will improve their effectiveness. They need to be convinced that the computer is simply a tool to expedite and simplify their work." This sounds a bit like psychological behavior modification. Such verbal persuasion should be unnecessary. In fact, the users WILL believe and be convinced IF the system actually DOES this! You can be sure that no amount of argumentation will ensure the interviewers' support if the system is awkward, difficult to use, and makes their work harder.

The focus of the paper was primarily on the technology. It said little about comparison studies which measure the accuracy and reliability of CASIC responses as compared to more traditional methods. For example, a 1984 paper by Waterton and Duffy in the International Statistical Review reported self-reported alcohol consumption rates that were significantly higher when obtained via CASI than when previously measured by interviewer. Perhaps there have not been enough such studies; however, there is a need for them.

The paper pointed out the importance of a good authoring system to CAPI but didn't say the same for CATI. I believe it is true in that environment as well. Quality measures (see the Human Interface discussion) are very important and are needed if we are to evaluate the efficacy of these new approaches. The authors also mentioned an evaluation by 'user' (interviewer), something I agree is important as it speaks to the committee's 'primary job' mentioned earlier. I found the Appendix 3 examples a useful reference for contacts. The authors would perform a valuable service if they would include names and phone numbers for all contacts.

These approaches conform to the modern concept of quality. Reduced variability is designed into the system. They reduce the potential for 'creative interviewing', in which undesired variation is introduced by the interviewer during the interview process.
While I have not worked with CASI, it would appear that it could suffer from a potential loss of control by the survey operator. It could be subject to 'creative respondents' who are intrigued by technology or who seek to befuddle the survey operators. Care must be taken to ensure that this does not occur. The survey instrument's logic and design still depend upon the human mind. Techniques for encoding them into a CATI/CAPI/CASI system need to be better understood. An unrealized advantage of these methods is that they force the designer to better understand the instrument and its flow earlier in the process. The designer can't rely upon last-minute training or role plays with the interviewers to clarify muddy logic or instrument flow.

I would like to close my comments on the value of these high tech methods for assisting in survey operations with a short essay on the beauty of the abacus written by Robert Fulghum, taken from All I Really Need to Know I Learned in Kindergarten.

Session 7 - QUALITY IN BUSINESS SURVEYS

IMPROVING ESTABLISHMENT SURVEYS AT THE BUREAU OF LABOR STATISTICS
Brian MacDonald
Alan R. Tupek
U. S. Bureau of Labor Statistics

Introduction

The report on "Quality in Establishment Surveys" (see Statistical Policy Working Paper 15, 1988) concluded that there were few commonly accepted approaches to the design, collection, estimation and analysis of establishment surveys. In contrast to household surveys, there was little standardization of methodological approaches across establishment surveys. The report classified potential sources of errors in establishment surveys and examined the range of practices which are used to improve and measure quality. Each Federal agency which collects statistical data from establishments develops its own frame of business establishments. These frames are of varying quality, which greatly affects the methodology for surveys and contributes to the divergence of methodology across establishment surveys.

This paper first provides a summary of the design considerations for establishment surveys as discussed in Statistical Policy Working Paper 15. It then describes the efforts at the Bureau of Labor Statistics (BLS) to improve its business establishment list, the effect of these improvements on BLS surveys, and the potential impact on other statistical agencies.

Design Considerations for Establishment Surveys

Establishment populations differ from household populations in several ways (see Statistical Policy Working Paper 15). These dissimilarities result in frame development, sample design, and estimation approaches which are in some areas markedly different from approaches for household surveys. Among the major distinctions between establishment and household populations and frames are:

1. Establishments come from skewed populations wherein units do not contribute equally (or nearly equally) to characteristic totals, as is the case for households; and

2. Accuracy of frame information about individual population units is crucial to sample design and estimation for establishment surveys, while for household surveys the accuracy of frame characteristics concerning individual units is not as critical to the sample design.

Establishment surveys are characterized by the skewed nature of the establishment population (see, for example, Table 1). A few large firms commonly dominate the estimates for most of the characteristics of interest.
This is especially true for characteristics tabulated within an industry. Small firms may be numerous, but they often have little impact on survey estimates of level, although they may be more critical to estimates of change over time or for measuring characteristics related to new businesses. This distribution has a major impact both on frame development and maintenance and on the sample designs used for establishment surveys.

[Table 1, illustrating the skewed distribution of the establishment population, is not reproduced here. Source: U.S. Bureau of Labor Statistics.]

List frames are widely used in establishment surveys conducted by the Federal government. The use of list frames for establishment surveys arose from the availability of administrative records on businesses compiled mainly for tax purposes. However, because these administrative record files are not normally developed for statistical purposes, they often need refinement before being used as sampling frames for surveys of businesses. Extensive resources are spent on maintaining the list frames, since a significant source of nonsampling error may be due to inadequacies in the frame.

Establishment list frames typically are characterized by detailed establishment identification information, periodic updating of this information, and multiple sources for the information. The data on the frame are required for sample design, sample selection, identification of sample units, and estimation. The primary source of administrative records for a frame may have shortcomings which require the identification information to be supplemented using other sources of information. This may include using identification information from the surveys themselves. Supplemental files, including the use of area frames, may also be required to overcome coverage problems in the primary source. Duplication of sampling units may also be a problem associated with the use of list frames. Refinement of the frame includes efforts to unduplicate units prior to sampling. The individual establishment information on the frame is critical to the effectiveness of the sample design and estimation for the survey.

Maintaining a frame over time is complicated by the dynamic nature of the establishment community. Changes in ownership, mergers, buyouts, and internal reorganizations make frame maintenance a real challenge. Matching and maintaining unit integrity over time provides the opportunity for consistent unit identification in the numerous periodic surveys conducted by the Federal Government. New establishments must be added to the frame. However, it is often difficult to differentiate, using administrative records, new establishments from formerly existing establishments that have changed their name or corporate identity. It is also difficult to link businesses over time when there have been ownership or other changes. Each survey may have different requirements as to the handling of new establishments and changes in existing establishments. The timeliness of adding new establishments to the frame and reflecting them in the sample is also a problem. The lag time between the formation of new establishments and selecting them into the sample may be anywhere from several months to several years. While new establishments may have little impact on estimates of level, in some instances they may dominate estimates of change.

The Business Establishment List Improvement Project

In May 1987, the Economic Policy Council issued a report that noted five areas in national economic statistics where improvements were needed.
One of these areas dealt with the business lists used by the three major Federal statistical agencies to conduct their surveys. One of their recommendations was that the Bureau of Labor Statistics and the National Agricultural Statistics Service of the Department of Agriculture be designated as the central Federal government agencies for the collection of nonagricultural and agricultural business identification information, respectively. In addition, the Economic Policy Council recommended that efforts be initiated to revise the statutes that prohibit the sharing of survey data collected by the Census Bureau with other specified Federal statistical agencies. The main purpose of the Economic Policy Council recommendations was to have a single, high-quality source of business data available to selected Federal statistical agencies in order to increase the quality and comparability of national economic statistics. Shortly thereafter, the Office of Management and Budget (OMB) requested that the BLS develop a proposal to assume this role. The issue of devoting resources to developing a central frame is not unique to the fragmented U.S. statistical system. Statistics Canada is in the process of developing a central frame for its business establishment surveys (see Colledge and Lussier 1987).

For the BLS universe file to serve adequately as the primary frame for statistical survey sampling by Federal statistical agencies, the BLS recognized that modifications to its existing file were necessary. The most critical need was to improve the information available about employers engaged in multiple operations within a State. The Business Establishment List (BEL) Improvement Project was initiated to do this. Its primary purpose is to create an establishment-based (i.e., worksite-based) register of units with full identification information on United States businesses. At present, data for multi-worksite employers in the BLS register are available mostly at a higher level of aggregation.

The data for the current BLS universe file come primarily from administrative records collected by State Employment Security Agencies (SESAs) as part of the administration of the Federal/State Unemployment Insurance (UI) System. All employers covered by unemployment insurance are required to file quarterly UI Contributions Reports with the SESAs for each of their UI accounts. On these forms, employers report the number of full- and part-time workers employed during the pay period including the 12th of each month in the quarter, and the total payroll for the quarter. This reporting is mandatory for single-location employers as well as those engaged in multiple operations in the State.

Data collection and classification procedures for multi-unit employers differ from those for single units. For multi-unit employers, the statistical branch of the SESA is responsible for the direct collection and review of monthly employment and quarterly wages at the reporting unit (county by industry) level of detail. A multi-unit employer is defined as an employer who has more than one industrial activity (four-digit SIC) and/or county location covered by the same UI account and meets the following criterion: to qualify as a multi-unit employer, the employer must have 50 or more employees in the sum of its secondary industries or counties. The primary industry or county is defined as the industry or county that has the greatest number of employees.
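For concreteness, the reporting-status rule just described can be expressed as a small sketch. This is an illustration only: the data layout (a mapping from industry/county combinations to employment) and the function name are hypothetical and do not reflect the SESAs' actual processing systems.

def classify_ui_account(unit_employment, threshold=50):
    """unit_employment: {(four_digit_sic, county): employees} for one UI account.
    Returns the reporting treatment implied by the rule described above. The
    threshold parameter reflects the 50-employee rule (lowered to 10 under the
    BEL Improvement Project, as discussed below)."""
    if len(unit_employment) <= 1:
        return "single-unit reporter"
    # The primary industry/county is the one with the greatest employment.
    primary = max(unit_employment, key=unit_employment.get)
    secondary_total = sum(unit_employment.values()) - unit_employment[primary]
    if secondary_total >= threshold:
        return "multi-unit reporter (files the statistical supplement)"
    return "treated as a single-unit employer"

# Example: 60 employees outside the primary county qualifies under the 50-employee rule.
print(classify_ui_account({("3421", "County A"): 120,
                           ("3421", "County B"): 40,
                           ("3421", "County C"): 20}))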
Under the BEL Improvement Project (see Searson and Pinkos 1990), this threshold is being lowered from 50 employees to 10 employees, with the States being responsible for collecting employment, wage, and identifying data at the worksite level. Thus, more detailed business identification information will be available for small multi-establishment employers.

Multi-unit employers that do not meet the above criteria are treated as if they were single-unit employers for data collection and recordkeeping purposes. These small multi-unit employers who are engaged in multiple industrial activities within one county are assigned industry codes based on their primary activity (that is, the activity providing the most shipments or sales). Conversely, those in one industry with several locations are given a county code based on the location employing a majority of all the employees.

Large multi-unit employers are treated differently from single units, as they are requested to file a quarterly statistical supplement form in addition to the Contributions Report. On the SESAs' current forms, large multi-unit employers report monthly employment, quarterly wages, industry, and location information for each reporting unit. These supplements are used to maintain separate identification and characteristic records on the individual reporting units so that correct geographic and industrial totals are maintained.

As part of the BEL Improvement Project, the BLS is replacing the 53 individually designed State forms with a standardized statistical supplement form. The name of the form is being changed to the Multiple Worksite Report. Each quarter, the employer will be requested to verify the identifying information (trade name, description of the establishment, and physical location address), which will be computer-printed on the new Multiple Worksite Report for each establishment (worksite). In addition, the employer will be requested to provide the monthly employment and total wages for each worksite for that quarter. By using a standardized form, the reporting burden on many large employers, especially those engaged in multiple economic activities at various locations across numerous States, should be reduced. States will accept listings and floppy diskettes of this information in lieu of the form. In addition, the BLS is investigating the central collection of multiple worksite data from major multi-establishment employers.

The Multiple Worksite Report form will be used in all States to collect data by establishment (worksite) beginning with data for the first quarter of 1991. Some twenty-one States, however, are switching to a State version of the new form with data collected for the first quarter of 1990. As a result of these efforts at worksite reporting, we expect the number of units on the frame to increase from approximately six million to slightly more than seven million. Because the UI system still serves as the basis for the worksite-based frame, both the scope and the employment and wage data on the new frame will be identical to those on the old frame; only the level of disaggregation will be different.

Implications of BEL on BLS Surveys

Several features of the BEL Improvement Project will affect the design of BLS sample surveys (see Plewes 1989).
These include:

- a reporting unit number for each worksite of multi-unit companies;

- better identification information for multi-unit employers, including multiple addresses, worksite descriptions, and telephone numbers;

- better linking of data over time through the use of the reporting unit number for worksites within multi-unit UI account numbers (UI accounts will also be linked through the use of predecessor and successor codes for ownership changes such as buyouts, mergers, etc.);

- more data items for each unit, such as initial date of tax liability, date of establishing a new worksite, and comment codes for explaining unusual employment changes;

- quarterly data, historical files, and response history files to track the surveys for which a worksite has been selected and whether it has responded;

- linking of units within enterprises or corporations, across UI accounts; and

- an improved standard industrial classification (SIC) refiling process, in order to identify new multi-worksite reporters in addition to updating SIC codes on a 3-year cycle.

The effect of these BEL improvements on four areas of survey design will be examined: sample frame development, sample design, data collection, and estimation. Implications for the short term, during the period in which the survey programs transition to the improved system, as well as for the long term, will be discussed. The transitional-period implications are usually related to problems in maintaining consistency of survey estimates while BEL improvements are implemented. The long-term implications are usually related to improvements that can be made to survey designs by reexamining survey design objectives.

Over the years, each BLS survey has developed activities for creating its sampling frame from the old Universe Maintenance System, which BLS will change. These unique activities for each survey focus on specific survey requirements as well as limitations of the list. For example, BLS surveys which attempt to maximize sample overlap over time must match frame units from one time period to another. The BEL improvements will affect the matching operation because of the shift to worksite reporting. During the transition period, the surveys may need to reexamine the need to maximize sample overlap. If they maintain this objective, then less sample overlap is likely, and much of the operation will need to be done manually. However, in the long term the use of reporting unit numbers and predecessor and successor codes should greatly facilitate the automated matching operation.

Other BLS surveys use supplemental frames to survey populations not entirely covered by the BEL. These populations may include railroads; federal, state, and local government; religious organizations; and seasonal industries. BEL improvements will allow many surveys to reexamine the need for supplemental frames, especially for state and local governments and seasonal industries.

Several other long-term benefits for sample frame development are possible through BEL improvements. The availability of quarterly data can be used by some surveys in creating their sample frames. The identification of new businesses on the BEL can be used as a stratification variable for surveys. Although BLS does not now do so, the new list will enable survey operators to conduct surveys of enterprises or companies. This will bring about reconsideration of the scope of the surveys. All surveys will need to modify their control file systems to handle additional data items on the BEL.
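As an illustration of the automated matching operation described above, the following sketch links frame units across two reference periods using reporting unit numbers and predecessor/successor codes. The record layout and field names are hypothetical assumptions; the actual BEL file structures are not described in this paper.

def match_frame_units(old_frame, new_frame):
    """Each frame maps (ui_account, reporting_unit_number) -> record dict.
    A new record may carry a 'predecessor' key naming the old key it replaced
    (e.g., after a buyout or merger)."""
    matches = {}
    candidate_births = []
    for key, record in new_frame.items():
        if key in old_frame:                       # same UI account and reporting unit number
            matches[key] = key
        elif record.get("predecessor") in old_frame:
            matches[key] = record["predecessor"]   # linked via predecessor/successor code
        else:
            candidate_births.append(key)           # possible new business
    return matches, candidate_births

old = {("UI-0001", "001"): {"name": "Acme Fabricating, Plant 1"}}
new = {("UI-0002", "001"): {"name": "Acme Fabricating, Plant 1",
                            "predecessor": ("UI-0001", "001")},
       ("UI-0002", "002"): {"name": "Acme Fabricating, Plant 2"}}
print(match_frame_units(old, new))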
At this stage of the planning process, certain obvious changes have been identified for each survey. The following three examples illustrate the types of operational modifications which are planned. First, the survey which is used to develop the Producer Price Index (PPI) currently must use first-quarter data for measures of size. The BEL improvements will allow the PPI to use more current quarterly data, or other quarters for seasonal industries. This is expected to improve the coverage of some industries and to increase sample design efficiency. Second, an annual survey which measures occupational injuries and illnesses supplements the BEL with a frame of the 500 largest companies in the United States, including all of their subsidiaries. Currently, this supplemental frame is developed specifically for this survey. The BEL improvement plan will provide adequate information on organizational relationships for large companies, so that the separate operation can be terminated. Third, a monthly survey of employers, which measures employment and average hourly earnings, lags in measuring the effect of new businesses. A sampling strategy is being developed for this survey which will bring in a sample of new businesses each month, once the BEL improvements are introduced.

Greater flexibility in sample designs will be possible with the introduction of BEL improvements. Separate strata for seasonal or volatile firms can be considered. Stratification by age of firm may be appropriate for some surveys. Surveys designed to produce local area estimates can use worksite locations for stratification. Surveys may want to stratify by multi-unit reporters versus single reporters, or by enterprise size. The survey response history can be used to avoid overlap between surveys and to spread respondent burden.

During the transition period for BEL improvements, there will be some loss in sample design efficiency. The use of current data to develop sample designs for surveys conducted during the transition period will be somewhat inappropriate. In the long term, sample design efficiencies will be possible through the use of new design variables and more homogeneity within size classes.

Surveys with size cutoffs will need to reevaluate the survey scope or target population. Some BLS surveys cover only large establishments. For example, most of the occupational wage surveys cover only establishments with 50 or more employees. The BEL improvements will shift units between size classes. In general, the sampling unit will shift from a county-wide report to a worksite report. Maintaining a 50-or-more-employee size cutoff will artificially move units into or out of the scope of the survey and decrease employment coverage. The effect on wage estimates will need to be examined, and decisions made on how to maintain consistency over time.

Surveys designed to measure change can use the linking of data over time to improve the efficiency of the sample design through sample overlap. Samples for surveys conducted three or more years apart are now independently selected. With historical relations maintained over time, samples could be selected which improve upon estimates of change, possibly using composite estimation.

The new features of BEL will be most beneficial during the data collection phase.
Because of better address information, especially physical location addresses and telephone numbers, response rates are expected to increase for mail and telephone surveys, since one of the primary reasons for low response rates is failure to reach the correct respondent. Additionally, better address information will result in a decrease in data collection time and effort, such as a reduction in telephone and mail follow-up of nonrespondents.

The breakdown of the multi-establishment companies that presently report on a consolidated basis (e.g., county-wide) into establishment- or worksite-level reporting will affect all BLS surveys. Surveys will need to make special reporting arrangements with these companies to provide data on a worksite basis. Recent cognitive research conducted by Statistics Canada shows that respondents who are in a survey on a regular basis report data in the same manner from one time period to another and usually do not take into account changes to the survey instrument or procedures. The worksite information should reduce the reporting error due to failure to identify the selected sample unit.

The impact of BEL during the estimation process for BLS surveys will vary significantly by survey type and the estimation procedures used. An area of survey estimation that will be affected by BEL is benchmarking. Benchmarking is a process that accounts for changes that occur during the time lapse between the reference date of the sampling frame and the date of data collection. In other words, it accounts for births, or those units which have come into existence since the sampling frame was created. This is accomplished by multiplying the sample estimates of totals by the benchmark factor at the estimating cell level, usually SIC or size class within an SIC. For BLS surveys, the benchmark factor is calculated at the estimating cell level as the ratio of the reference period employment (benchmark employment) to the weighted employment from the sample. Surveys that benchmark at the size class level would be most affected because of the change in the distribution of units across size classes due to worksite-level reporting. For example, size class benchmarks for a survey that measures occupational employment statistics (OES) by industry may be inappropriate during the transition period. A possible solution for all surveys which benchmark by size class is to benchmark at the industry level during the transition period. With the new business registry, population data for benchmarking employment will be available for all 12 months. This additional information may be utilized by the Current Employment Statistics (CES) Survey, a monthly survey of about 300,000 establishments that measures employment at national and State levels by industry, to benchmark the employment data quarterly and thereby better analyze the components of error by time period.

Central Agency Status

When the OMB issues the directive naming BLS as the central agency charged with maintaining a list for nonagricultural businesses, several actions will have to be undertaken before extracts from the BLS list can be made available to other Federal statistical agencies for use in surveys. First, BLS will have to conduct a series of negotiations with the State Employment Security Agencies to gain their agreement to waive or modify existing State confidentiality rules and regulations that currently would not allow widespread use of the State-provided UI data.
We expect that most SESAs will readily welcome the sharing of these data for statistical purposes. There have recently been examples where most, if not all, State agencies authorized this type of data sharing, but on a much more limited basis. In those few States where current State law might prohibit the sharing with other Federal statistical agencies, we will propose modifications to the State Unemployment Insurance laws to allow the sharing and work with the State agencies to seek passage of the needed legislation.

Similarly, certain actions will have to be taken both by BLS and by those Federal statistical agencies authorized by OMB to have access to the BLS list before the sharing can begin. BLS will have to develop formal procedures for use of the file by other agencies. These procedures will include such obvious items as security measures for the data, assurances that the confidential data will be used for statistical purposes only, agreements on feeding back 'corrections' or updates to the file, access rules and techniques (the BLS list is maintained at the NIH computer facility), and arrangements to cover the marginal operating costs of providing the data. A possible solution to the question of providing satisfactory computer security may be for the using agency to have conducted an application security review for its own sensitive Automated Information System in compliance with the requirements of OMB Circular A-130.

Summary

A central agency charged with maintaining a list of nonagricultural businesses provides an opportunity for improving business establishment surveys conducted by the Federal Government. However, the key to its success will rest with the ability of all the agencies involved to provide clear and concise requirements to the central agency, and to weigh the costs of improvements to the central list against the benefits to survey operations and data quality.

References

Colledge, M. and Lussier, R. (1987), "A Generalized Methodology for Economic Surveys," in Proceedings of the Business and Economic Statistics Section, American Statistical Association Annual Meetings, pp. 131-149.

Plewes, T. (1989), "Improving the Business Establishment List: Survey Design Implications," in Proceedings of the Fourth International Roundtable on Business Survey Frames, Newport, Gwent, United Kingdom; available through the U.S. Department of Labor, Bureau of Labor Statistics, in press.

Searson, M., and Pinkos, J. (1990), "The Bureau of Labor Statistics' Business Establishment List Improvement Project," in Proceedings of the Sixth Annual Research Conference, Washington, D.C.: U.S. Department of Commerce, Bureau of the Census, in press.

Statistical Policy Working Paper 15 (1988), "Quality in Establishment Surveys," U.S. Office of Management and Budget.

A REVIEW OF NONSAMPLING ERRORS IN FEDERAL ESTABLISHMENT SURVEYS WITH SOME AGRIBUSINESS EXAMPLES
Ron Fecso
National Agricultural Statistics Service

Working Paper 15 (WP-15), "Quality in Establishment Surveys," addresses the accuracy of establishment surveys. Although WP-15 concentrates on accuracy, we need to recognize that accuracy is only a part of the total quality picture. Remember the importance of other aspects of quality and their interaction with accuracy concepts. The definition of survey quality is the totality of features and characteristics of a survey that bear upon its ability to satisfy a given need. Sometimes these ideas are referred to as "fitness for use."
Discussions of quality usually address how well something is made. We must also address the true needs of the product or service, as well as productivity issues such as increased output and unit cost. Continued pressure on budgets and demands for increased statistical output are quality aspects which may be occupying major portions of our time. Thus, a model for survey quality needs four elements: accuracy, timeliness, relevance, and resources.

The intent of this paper is to provide a glimpse of the nonsampling error treatment from WP-15 and several examples of the treatment of nonsampling errors in agricultural surveys. I hope that I can persuade the audience to study Working Paper 15 in more detail after seeing this commercial.

Many sources of error are possible in establishment surveys. While there are several good ways to organize the presentation of these errors, WP-15 chose two main groupings: design and estimation, and methods and operations. The latter group contains the nonsampling errors which are highlighted here.

Nonsampling Errors

Errors which arise during the specifications for and the conduct of establishment surveys are called nonsampling errors. Commonly known examples of nonsampling errors include incomplete sampling frames, nonresponse, and keypunching errors. The variety of nonsampling error sources and results from studies of these sources lead survey researchers to believe that nonsampling errors may often far exceed sampling error.

There are three objectives found in the chapter on nonsampling errors in WP-15. The objectives are to outline major categories of nonsampling errors in establishment surveys, to identify some of the diverse sources of error in each category, and to provide insight into strategies to detect, measure, and control these errors. The error categories discussed are specification, coverage, response, nonresponse, and processing errors. WP-15 defines each of these error groups, gives examples, identifies major sources of the error, describes methods to control and measure the errors, and profiles the control and measurement techniques used in the major establishment surveys of the Federal Government (9 agencies and 55 surveys). (The presentation contained some detail about response error treatment and examples of WP-15's graphics, since most of the audience had not seen WP-15. These materials are not reproduced here.)

Although several good references are available concerning nonsampling errors in surveys of individuals (for example, United Nations, 1982), WP-15 is the first detailed treatment for Federal establishment surveys. The need for this separate treatment arises because establishment surveys differ from surveys of individuals by typically seeking hard data for which records are available. This characteristic both simplifies the collection and complicates the interpretation of the data. The collection is simplified when hard data on record can be used, rather than relying on the memory, opinions, or interpretations of the respondents. These differences present complications when establishing the concepts and definitions to be used in the surveys. Special care must be taken to consider carefully the establishments' recordkeeping systems, definitions, and data availability to avoid introducing specification error into the data.
Establishment surveys, which commonly use list frames, are subject to errors such as duplication, overcoverage of out-of-scope and out-of-business units, undercoverage of business births, and misclassification of units. The availability of records affects the structure of the response and nonresponse errors, as well as the methods to measure and control them. The treatment of processing errors differs the least from other types of surveys.

SOME HIGHLIGHTS OF WP-15

WP-15, unfortunately, makes no specific recommendations. Yet the profile of nonsampling error practices used in 55 Federal establishment surveys by nine agencies provides considerable insight into the state of quality in these surveys. This commercial for the paper will present a few of the highlights.

o No single measurement of specification error is used in a large majority of the surveys profiled.
o Relatively little is done to measure specification error.
o Few direct measures of list coverage error were reported as regularly used.
o Outside of the calculation of edit failure rates, little response error measurement is done.
o Although follow-up procedures for large units are common, very little is done to directly measure nonresponse error.
o Cognitive studies are rare.
o Questionnaire pretesting was not widely used on a regular basis.
o Relatively few nonsampling error measurements are published.
o There is relatively little information about processing errors.

WP-15 contains considerably more detail on good practices which are currently in use, as well as those practices which are lacking in use and need examination. WP-15 states in an overview that "Nevertheless, the tenor of the findings can be depicted as recommending more work to improve and document the quality of surveys... a need to focus additional attention, and resources, on the general improvement and documentation of survey practices."

A Reinterview Study from Agribusiness

An example of measuring response error in an establishment survey is next. The results presented are from a reinterview study which measured the bias of Computer Assisted Telephone Interviewing (CATI) methods on a National Agricultural Statistics Service (NASS) survey (Fecso and Pafford, 1988). As part of its estimating program, the NASS publishes quarterly estimates of crop acreage, intentions to plant, actual plantings, harvested acreage, stocks of grains, and livestock numbers. The source of these estimates is a multi-purpose, multi-frame survey. Because of the detailed nature of acreage, stocks, and livestock inventory items, the NASS had relied primarily on personal interviews to get the most accurate answers from the farm population. For example, on-farm grain stocks data, extremely important because of their effect on commodity trading, present a collection problem because farmers may store these grains in multiple bins on property they own and/or rent. In addition, farmers often have multiple operating arrangements involving their own grains, those of landlords, and those where formal and informal partnerships exist.

Recently, NASS has expanded the use of telephoning, including CATI, to collect these data. The primary reasons for the change are inadequate budget and the need to reduce the time between initial data collection and publication. We suspected difficulty in using the telephone to collect some of these quarterly survey data.
Obtaining accurate responses is difficult because of the detailed nature of these data and because the centralized (State) telephone interviewers often lack farm experience and familiarity with farm terms. The reinterview study is our first attempt to measure response errors. You can find the use of reinterview methods in the literature for measurement of simple response variance (Bailar, 1968; O'Muircheartaigh, 1986) and correlated response variance (Groves and Magilavy, 1986), for example. This response error study focused on measurement of the bias by treating the final response, reconciled between the CATI response and an independent personal reinterview response, as the "truth." To obtain truth measures, experienced supervisory field enumerators reinterviewed approximately 1,000 farm operations for the December 1986 Agricultural Survey.

The following tables contain the results for the grain stocks items (corn and soybean stocks). Table 1 indicates that the difference in the CATI and final reconciled responses, "the bias," was significant for all but one item (soybean stocks in Indiana). The direction of the bias indicates that the CATI data collection mode tends to underestimate stocks of corn and soybeans. The process of reconciliation identified the reasons for differences. A summary given in Table 2 indicates that an overwhelming percentage of the differences (41.1%) could be related to definitional problems (bias-related discrepancies), and not those of simple response variance (random fluctuation). Definitional discrepancies contributed almost half of the large bias. About two-thirds of the definitional discrepancies had a relative difference (the reconciled response minus the CATI response, divided by the CATI response) of more than 25% or less than -25%. In contrast, the differences due to rounding and estimating contributed less than 10% of the overall bias. Almost all of the rounding and estimating relative differences were between -25% and 25%.

(Table 1. Estimates of Bias in CATI-Collected Responses. An asterisk indicates the CATI and final reconciled responses were significantly different at α = .05.)

These results suggest that we can reduce the bias in the survey estimates generated from the CATI telephone sample using a revised questionnaire design, improved training, or a shift in the mode of data collection back to personal interviews. Considering the constraints of time and budget, the change to additional personal interviews is unlikely. Thus, the alternative is to use reinterview techniques to monitor this bias over time to determine whether the bias has been reduced through improvement in questionnaires or training. If large discrepancies continue, the estimates for grain stocks can be adjusted for bias through a continuing reinterview program. If the bias stabilizes, even at zero, periodic reinterview studies can validate a "constant" bias adjustment used in interim periods.

An Example -- Bias Measurement

NASS conducts crop yield surveys in states which are major producers of field crops. The survey data are used to forecast expected yield and production during the growing season and to estimate these values at harvest. Briefly, the survey design can be described as a multiple-step sampling procedure. Samples are drawn from an area frame to estimate acreage for harvest, followed by subsampling of fields and small plots to make measurements related to yield per acre. Detailed information on the area frame design is available in Fecso, Tortora and Vogel (1986).
More detail on the crop yield surveys, called objective yield (OY) surveys, is in Matthews (1985), Reiser, Fecso and Taylor (1987), and Francisco, Fuller and Fecso (1987).

Several control procedures existed for the OY surveys. Supervisory enumerators visited the plots (approximately a 10 percent subsample, which included the first sample visited by each enumerator). The field office survey statistician occasionally visited plots. Data are hand- and computer-edited. Finally, periodic validation surveys, covering a subset of crops and states in a given year, were conducted to measure the overall bias of the survey estimate in the domain studied.

These control procedures had shortcomings. For example, visits by the supervisory enumerator served mostly as a retraining system; the data were not used to improve the estimates or to estimate biases. Budget and staff reductions reduced the number of field visits by survey managers. Edits have been changing. New computer edits, and the creation of individualized recording forms in some areas, have resulted in estimates which may differ from those based on the old editing procedures. Finally, the expensive and administratively burdensome validation survey received increased questioning.

The validation survey had one major goal -- to measure the differences between the objective yield crop cutting and the farmer's harvest. The validation surveys had clearly shown that the difference between the OY crop cutting and the farmer's harvest is not equal to zero. These studies found differences by crop, year, and state. Since the validation surveys have answered the major question for which they were designed, we asked what purpose they would have in the future. Our main consideration remained the assessment of the bias. Several concepts needed attention. Was the overall bias consistent over the years? Our data are a time series, especially when considered by the users; thus, knowledge of bias-induced level change is important. Are the sources of bias changing? Are there large enough bias changes to deserve extra concern? Are there any needs for procedural changes to reduce specific bias sources, or do we only need to monitor the overall level of bias? Finally, if we use overall bias measures to adjust survey values, are the biases within a specified tolerance?

NASS currently conducts a redesigned validation survey for soybean OY. This survey is done in all states in the OY sample program. This design removed some unpopular aspects of the old validation surveys, including the concentration of work in one or two states and the variable workload resulting from changing states each year. Our goal was to verify the approximate 6% bias adjustment suggested by the historic series of studies.

The current approach differs from prior studies. We now combine sources of error rather than trying to measure specific components. Thus, the results provide a basis for adjusting the survey for the many small sources of error found in prior studies. These errors included: incorrectly measured row widths, field counts differing from lab counts, time lag bias due to the enumeration differing by several days from actual harvest, new planting patterns causing enumeration and imputation difficulties, enumerator fatigue errors, and plot location biases.

The rationale for the redesign begins with our estimator of state yield, the mean of the sample field yields, which is basically unbiased, except that we do not have the true field yield, Y, but a sampled value, y. This estimate can be modeled as follows.
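The model itself appeared as a display in the original paper; what follows is only a minimal measurement-error sketch consistent with the description above, an assumption about its general form rather than the paper's exact formulation:

\[
y_i \;=\; Y_i + b + e_i, \qquad
\hat{\bar Y} \;=\; \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
E\!\left(\hat{\bar Y}\right) \;=\; \bar Y + b,
\]

where \(Y_i\) is the true yield of sample field \(i\), \(y_i\) is the measured (crop-cutting) value, \(e_i\) is a zero-mean error, and \(b\) is the combined bias arising from the many small error sources listed above. Under this reading, the validation survey estimates \(b\) in bushels, which can then be expressed as a percentage of the yield estimate.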
Three years of data from the validation survey have produced the following results:

          Estimated Bias    Estimated         Bias as Percent
  Year    in Bushels        Standard Error    of the Estimate
  1987         2.2               .9                 5.8
  1988         2.3               .8                 7.6
  1989         3.2               .9                 8.7

Thus, the studies validated the 6% adjustment of the survey data as reasonable. Future research can determine the optimal use of the validation survey for adjustment. We also need to assess the implicit missing-at-random assumptions. We can get some ideas on the reasonableness of the assumption by using farmers' reported yields to measure group differences. We need the assumption that the biases measured by the validation survey are uncorrelated with the action of obtaining elevator yields. This assumption is reasonable, but should be tested occasionally. With the redesigned validation survey we have two of the three estimates (the OY yield estimate, the validation survey estimate of OY bias, and a nonresponse bias estimate). These are the estimates of the major error components which are necessary to assess the accuracy of the between-year yield estimates.

Conclusion

Although the level of nonsampling error in establishment surveys was not directly measured in WP-15, nonuse of control and measurement techniques should not be interpreted as a lack of errors. Is it time for us to regain the balance between the importance which we put on the elements of survey quality and our actual practice? For too many years, emphasis in most government agencies has been on timeliness and resources (usually shrinking). It is time to shift more effort to relevance and accuracy issues. We might help ourselves by training users in survey quality concepts so they can help us prioritize our efforts and maybe lead the effort to secure more funding. Our easiest beginning on this road to quality could be merely publishing more of what we do know about the errors.

Increased interest in organized quality efforts such as total quality management philosophies is promising. Organizations need to ask questions such as:

1. What measure(s) does top management use to quantify survey or organizational effectiveness? (Is it the same as the data users'?)

2. How are these measures used to manage and plan for the long run?

Agencies need to assess their training needs. We will face at least some shortage of new hires with the necessary survey research skills. Some predict that the shortage will be acute and go beyond survey skills to general quantitative skills. Will agencies respond with creativity in developing staffing and training plans? We should do more to address this problem now.

Finally, WP-15, actually all the working papers, needs to be more widely read. (Only a small percentage of the audience at the presentation had seen WP-15.) Agencies and users can benefit by identifying errors which were not previously considered and/or techniques which could be used. I caution against being overwhelmed by the quantity of errors displayed in WP-15. Don't worry that you can't eliminate or measure them all at once. I doubt that you have all these errors. Yet, don't be complacent. To improve survey quality you need a strategy. The strategy should define a systematic approach to the improvement and measurement of the effects of existing error sources as well as proposed changes in the survey process. Be flexible as you move along with the strategy, enjoying small successes as they come and avoiding the expectation of overnight miracles.
References

Bailar, B.A. (1968), "Recent Research in Reinterview Procedures," Journal of the American Statistical Association, 63:41-63.

Fecso, Ron (1986), "Sample Survey Quality: Issues and Examples from an Agricultural Survey," Proceedings of the Section on Survey Research Methods, American Statistical Association.

Fecso, R., R.D. Tortora and F.A. Vogel (1986), "Sampling Frames for Agriculture in the United States," Journal of Official Statistics, Vol. 2, No. 3, pp. 279-292.

Fecso, Ron and Brad Pafford (1988), "Response Errors in Establishment Surveys with an Example from an Agribusiness Survey," Proceedings of the Section on Survey Research Methods, American Statistical Association.

Francisco, C., W.A. Fuller and R. Fecso (1987), "Statistical Properties of Crop Production Estimators," Survey Methodology, Vol. 13, No. 1, June 1987, pp. 45-62.

Groves, Robert M. and Lou J. Magilavy (1986), "Measuring and Explaining Interviewer Effects in Centralized Telephone Surveys," Public Opinion Quarterly, Vol. 50:251-266.

Matthews, R.V. (1985), "An Overview of the 1985 Corn, Cotton, Soybean, and Wheat Objective Yield Surveys," USDA, Statistical Reporting Service, Staff Report, Nov. 1985.

Office of Management and Budget (1988), Quality in Establishment Surveys, Statistical Policy Working Paper 15, Washington, D.C.

O'Muircheartaigh, Colm A. (1986), "Correlates of Reinterview Response Inconsistency in the Current Population Survey," Second Annual Research Conference, Bureau of the Census, March 23-26, 1986, Reston, Va.

Pafford, Brad (1988), "Use of Reinterview Techniques for Quality Assurance: The Measurement of Response Error in the Collection of December 1987 Quarterly Grain Stocks Data Using CATI," National Agricultural Statistics Service, Research Report, USDA.

Reiser, M., R. Fecso and K. Taylor (1987), "A Nested Error Model for the Objective Yield Survey," Proceedings of the Section on Survey Research Methods, American Statistical Association.

United Nations (1982), National Household Survey Capability Programme, Nonsampling Errors in Household Surveys, New York.

DISCUSSION
David A. Binder
Statistics Canada

I would like to thank the organizers for inviting me as a discussant at this important session on Quality in Business Surveys. Prior to these meetings, I reviewed once again Statistical Policy Working Paper 15, "Quality in Establishment Surveys," and I would highly recommend that it be read by both novice and experienced survey statisticians who deal with the design or analysis of business surveys. One clear fact which comes out of Working Paper 15 is that there are many issues and methods which are common to most federal business surveys.

Certain issues faced in business surveys are more difficult than in social and demographic surveys. Part of this is due to the complex and dynamic structures within which the business community operates. When designing and conducting such surveys, it is important to keep in mind the operational realities of the business world.

Since there are many commonalities among business surveys, statistical agencies should pool their knowledge and expertise to take advantage of their combined experience. For example, there are sufficiently many common practices for sampling, data collection, editing, estimation, and dissemination of the results that certain standards and guidelines could be developed among the agencies. Sharing information and expertise is a worthwhile objective which meetings such as this can help accomplish. Whereas legalities of data sharing pose some obstacles at present, hopefully these can be overcome in the longer term.
There are, of course, many aspects to improving the quality of business surveys, including frame issues and non-sampling errors. The development of general-purpose business frames can lead to sophisticated and expensive systems, especially with respect to development and maintenance. This is because a general-purpose frame should reflect the realities of the operating structures in the business world, and there must also be user-friendly interfaces with such a frame. In practice, there is often a gap between conceptual frameworks and actual application.

Quality of the Frame

An important area of concern in the quality of business surveys is the quality of the frame itself. Survey quality will depend on the quality of the frame information as well as the ease of accessibility to the frame data. Frames can never be perfect. Some of the sources of error are:

- undercoverage, especially for births
- overcoverage, especially due to duplication and inclusion of out-of-scope units
- misclassification of industry code, employment size, other size measures, etc.
- identification of appropriate reporting units (collection entities) which reflect the operating structure of the business

It is important to include in the development of a frame a program to measure the quality of the frame information. This is particularly true when the frame will be used by a variety of users other than the developers themselves. Examples of quality measures are:

- size of the backlog for SIC classification
- distribution of lag times for births and other updates to the frame
- errors resulting from cutoffs for multi-unit employers
- duplication
- matching errors

If the frame is to contain the most up-to-date information, there should be some facility for incorporating and verifying feedback from the surveys themselves. This can lead to complications where the information being derived from one survey may affect other surveys (e.g., a change in the relationships among multi-unit employers).

Structure of the Frame

If it is anticipated that the Business Establishment Listing (BEL) of the Bureau of Labor Statistics will be used by other agencies conducting business surveys, it should be noted that many of their needs cannot be met within the framework being discussed here. The administrative world does not always correspond to the business world. A listing which is useful for employment and related labor characteristics may not be suitable for surveys of economic production and other special characteristics.

The structure of the BEL for multi-unit employers needs some clarification. Whereas the worksite may be able to report employment data, it may not be able to report on profit and loss or balance sheet data. Different reporting units (collection entities) may need to be identified for different surveys. It cannot be assumed that the respondent will necessarily conform to your concepts. At Statistics Canada, we have developed a hierarchical structure of statistical entities for the larger businesses. These are (i) the enterprise, for which a full set of consolidated financial statements is available; (ii) the company, which can report on profit and loss and other balance sheet items; (iii) the establishment, which can report on such items as value of output, cost of intermediate inputs, inventories, number of employees, and salaries and wages; and (iv) the location, which can report sales and number of employees. This recognizes the relationship between the business world and the statistical needs for economic surveys.
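The four-level hierarchy just described can be pictured as a simple nested data structure. The sketch below is illustrative only; the class and attribute names are assumptions made for exposition and do not reflect Statistics Canada's actual register schema.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Location:                 # can report sales and number of employees
    name: str
    employees: int = 0
    sales: float = 0.0

@dataclass
class Establishment:            # output, intermediate inputs, inventories, payroll
    name: str
    locations: List[Location] = field(default_factory=list)

@dataclass
class Company:                  # profit and loss, balance sheet items
    name: str
    establishments: List[Establishment] = field(default_factory=list)

@dataclass
class Enterprise:               # consolidated financial statements
    name: str
    companies: List[Company] = field(default_factory=list)

    def total_employees(self) -> int:
        # Rolling data up to the enterprise level, one of the retrieval
        # needs noted under "Retrieval Systems" below.
        return sum(loc.employees
                   for c in self.companies
                   for e in c.establishments
                   for loc in e.locations)

acme = Enterprise("Acme Group", companies=[
    Company("Acme Manufacturing Inc.", establishments=[
        Establishment("Fabrication Division", locations=[
            Location("Plant 1", employees=120, sales=2.5e6),
            Location("Plant 2", employees=40, sales=0.9e6)])])])
print(acme.total_employees())   # 160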
However, it is a complex structure to maintain.

Retrieval Systems

Not only are frame maintenance procedures resource-intensive, but effective retrieval systems can be quite complex and expensive to develop. Quality improvements to business surveys through better-quality frames can only be realized if the frame information is easily obtained both cross-sectionally and through time. Examples of some of the needs which are expressed by users of frame information are:

- linking of data through time
- historical files
- response histories
- linking of data within enterprises
- identification of seasonal and volatile firms
- having sufficient structure to roll up to the enterprise and track changes in structure over time
- survey feedback (and verification)
- requirements for estimation (regression, ratio, composite, benchmarking, poststratification)

Other Frame Considerations

The needs of the frame will change depending upon the survey frequency and the reference periods. For example, the units considered in-scope could vary according to whether the survey is monthly, quarterly, or annual. Even with all the complexities I have mentioned regarding the development and maintenance of business frames, I would strongly encourage such development, with any deficiencies explicitly laid out. One of the uses of a high-quality frame is the ability to perform analyses of business demographics, showing the behaviour of births, deaths, mergers, and amalgamations, which is an important side benefit.

Total Survey Error

As was pointed out during the session, improving frame quality is only one of the many mechanisms to meet the overall objective of controlling survey errors. Development of survey quality profiles has been mentioned as an important tool to monitor, control, and manage surveys. Response errors should be a particularly important concern to the survey-taker. However, response errors are often due to the survey instrument itself, rather than the respondent. Recent experiences with cognitive methods have proven useful here. Often there are trade-offs between ideal concepts and the respondents' ability to respond accurately. For example, when asking a farm operator about the value of equipment on land which he operates, he may prefer to report on equipment which he owns but which may be situated on another farm, rather than including equipment which is owned by someone else but which is situated on his land. This creates difficulties for the survey-taker who is trying to avoid coverage errors. These are not easy problems to overcome, but the first step in all these endeavors is to recognize the problem and possibly measure its impact. Without special studies, it would be difficult to assess the relative merits of coverage error on the one hand and response error on the other.

In general, we need to concentrate on methods to synthesize all the errors into an overall measure of survey quality. This would allow informed decisions to be made regarding the relative merits of improving one survey process over another. If such a model existed, we could answer some common concerns, such as the relative contribution of edit and imputation to the reduction in total survey error and whether simpler methods could achieve comparable results. One possibility would be to develop a microdata simulation database which incorporates as many of the known errors as possible. This database would consist of microdata which look like the real population.
Various models for response and nonresponse errors could be simulated, and then the data would be processed using existing or proposed methods. Since the original "true" data are known, we could assess the relative impacts of, for example, improving survey coverage versus using an alternative estimator versus adding more edits to the survey process.

DISCUSSION
Charles D. Cowan
Opinion Research Corporation

What These Papers Have in Common

If there is a single message that comes through in both the papers being discussed, it is that: Avoidance and/or Control is the Best Approach in Dealing with Nonsampling Error. Quality is something that one builds into surveys and continues to monitor. While one cannot completely avoid problems in surveys, it is markedly better to avoid or control a problem than it is to attempt to make an a posteriori correction to fix the problem. Such a fix usually is based on a much smaller amount of information collected from a supplemental sample or survey and adds variance to the original survey estimates. It is also usually the case that a fix introduced at the end of a survey only takes care of one problem and is not very cost-efficient.

In their paper, Tupek and MacDonald describe a process of expanding a sampling frame for business surveys that addresses several different sources of nonsampling error. Their work with the sampling frame deals with coverage issues, timing issues, definitional problems in the surveys, estimation, use of administrative records for weighting and variance reduction, and other aspects of the conduct of business surveys. Their approach is to improve the basic materials used for surveys to encourage more efficiency and accuracy at later stages.

Fecso in his paper describes a process of measuring and controlling as many aspects as possible of the incidence of nonsampling error. He also supports the idea that nonsampling error is best dealt with by avoidance, but he is realistic in suggesting that a catalog of problems is useful for two primary purposes: planning future surveys and providing documentation for users of the current effort. This control process can be used to ensure that the data produced in a survey are of the best quality given the constraint that control is imposed as part of the process, since many types of nonsampling errors cannot be totally avoided.

Specific Quality Issues for Business and Establishment Surveys

As one reads and compares these papers, one is reminded of the fact that business and establishment surveys are different from household surveys in several key ways:

1) The availability of attributes on the frame and the use of this frame information at the unit level differ from what can be done in household surveys,

2) The surveys themselves make extensive use of records as a basis for reporting, and

3) The data to be collected in business and establishment surveys have a multilevel nature, meaning that information about the businesses is hierarchical and we are interested in the information at each level (e.g., Sears Headquarters, regional offices, distribution centers, and individual stores).

These factors are crucial to the design of business and establishment surveys. Use of information on the frame for design, and use of records in collection, make it possible to improve the quality of these types of surveys relative to household surveys, but this is counterbalanced to an extent by the complications introduced by the multilevel nature of the data to be collected.
Tupek and MacDonald note in their paper that, for the surveys they conduct, establishments come from skewed populations, and having this information on the frame makes it possible to design a survey that is much more efficient, especially for multiple characteristics to be measured simultaneously. However, reliance on this information in the frame makes the accuracy of frame information crucial at the individual unit level for both sampling and estimation purposes.

Their project on frame expansion and improvements has an impact in several areas. The first is sample frame development, so that more businesses and establishments are represented. This is broader than a coverage issue, since coverage is usually viewed as a problem that pervades an extant frame. Tupek and MacDonald address coverage issues in this way, but also include whole segments of the business population previously excluded from the frame. A second area impacted by the frame expansion and improvements project on which they report is the actual design of the sample, where the sample can be optimized for making different types of estimates using information available on the frame. A third area impacted by the frame expansion and improvements is data collection, and the final area is estimation. Tupek and MacDonald point out that the new frame encourages the conduct of new longitudinal surveys, the selection of the sample at the unit of analysis (instead of collecting the information by proxy or sampling down to the unit of analysis after starting at a higher level in the hierarchy), improvement in response rates because of higher eligibility rates, savings in terms of time and effort expended on the survey, and improvement in weighting and ratio estimation procedures.

Fecso takes a different approach to dealing with nonsampling error. He catalogs sources of nonsampling error, and his approach is to detect, measure, and control the nonsampling error. Many of the sources of nonsampling error he lists are common to both household and business surveys, but with business surveys he has a variety of records, including past survey collections, available for detection and measurement of nonsampling error. A primary concern for the use of records is the accuracy of the data in the records, since the records themselves could be in error. Although not mentioned in the paper, some of the most interesting work in health care surveys is the modeling of nonsampling error when hospital records and information based on patient recall don't match and either is potentially wrong. The same is true for business surveys -- accuracy in the records systems is crucial for detection and measurement of nonsampling error as part of a quality management system for a survey. Another factor related to accuracy is the consistency of definitions used by different respondents. If the data are accurate but based on different definitions, then there is a problem in how the data might be used for detection and measurement of nonsampling error.

Concerns with Business and Establishment Surveys Not Covered

While both papers are excellent in the way they cover in depth the quality issues facing business and establishment surveys, they both miss some salient points peculiar to these types of surveys. The first was mentioned earlier, namely that businesses are hierarchical, which leads to some difficult questions regarding who reports in these surveys and how the various businesses relate to one another (i.e., at what level do we define the unit of analysis?).
In terms of how units relate, an example was given earlier for Sears, which not only owns Sears Retail but also has Allstate Insurance, a mailing service, regional offices, catalog stores, and local retail stores. Are we interested in these surveys in getting reports from the lowest level in this chain? How exactly does Sears headquarters report -- for itself as an establishment with a certain number of employees, or does it include all employees and sales at all locations? If there is confusion in the reporting rules for a survey, we could wind up with severe overcounting or undercounting of activities and personnel.

Another issue has to do with the reporting of activities within a firm. In reporting mailing activities, for example, each firm and each location of a firm will have some activities to report. To whom do we speak in the firm to get a complete picture? There are separate operating units within firms, each with a manager knowledgeable about his own unit's activities. And there are sometimes other units that assist in terms of technical or operational support. Do we talk to managers in both or all offices or units, or is there a central source that can answer all questions knowledgeably and without duplication?

There are two final concerns we have regarding quality in business and establishment surveys. One has to do with the process of improving and expanding the frame for a business survey, which usually translates into adding smaller firms. These firms are more likely to be related to other members of the population, and they are more prone to movement in and out of the population (births and deaths). Because of these factors, they add a certain amount of instability to the estimation process. This may be good or bad -- on the one hand we have a more realistic representation of the population of businesses when we include more firms, but on the other hand for certain types of statistics we may be adding more variation without a real gain in forecasting or descriptive accuracy. This problem could be labeled "messiness at the edge."

The other problem, not addressed in either paper and of particular concern in the Fecso paper, is that a large, well-conceived and executed survey might not benefit from a nonresponse/nonsampling error correction that is estimated from a small, one-time experiment. While in theory the idea of implementing research studies to monitor the quality of ongoing surveys is laudable and should enhance the quality of the surveys, implementation for Federal surveys often falls a bit short, with a simple, one-time study implemented to measure a particular problem. A small-scale, high-variance research study should be viewed as just that, and not as a vehicle for making corrections to a multimillion-dollar effort. If the nonsampling error problem is sufficient to justify such an effort, and the nonsampling error cannot be dealt with as part of the design, then sufficient resources should be devoted to measurement and control to take care of the problem. Essentially, the problem becomes one of design again, with focus on the proper allocation of resources between the survey and the experiment to fix the survey.

Conclusions

Both papers were excellent summaries of the state of the art for measuring and maintaining quality in Federal surveys of businesses and establishments. Researchers involved in the design of either business or household surveys would benefit from studying and implementing the principles found in either paper.
Session 8 - COGNITIVE LABORATORIES

THE BUREAU OF LABOR STATISTICS' COLLECTION PROCEDURES RESEARCH LABORATORY: ACCOMPLISHMENTS AND FUTURE DIRECTIONS
Cathryn S. Dippo and Douglas Herrmann
U.S. Bureau of Labor Statistics

I. Introduction

The accomplishments of the Cognitive Aspects of Survey Methodology movement (Jabine, et al. 1984) have clearly been substantial. This is especially true in Washington, where three Federal agencies (Bureau of the Census, Bureau of Labor Statistics (BLS), and the National Center for Health Statistics) have established laboratories.

Consider the scope of BLS' survey research programs. Most of the sampling units from which data are collected by or for BLS are establishments. While approximately 60,000 households are questioned about labor force participation each month in the Current Population Survey (CPS), 340,000 establishments are being asked to report their payroll employment each month in the Current Employment Statistics Survey. More than 200,000 price quotes are being collected each month from establishments in the Consumer Price, Producer Price, and International Price Index programs. Moreover, much of the data are currently being collected by mail, without person-to-person interaction. In the future, more and more of the data will be collected with computer assistance, and the human-machine interface will take on added importance. Furthermore, in most establishment surveys, the needed data can be directly observed (e.g., consumer prices) or exist in records rather than in the memories of the respondents. Even in household surveys, many respondents are being asked to recall not only autobiographical events, but also information that exists in household records and information about other members of their household.

Thus, the mission of the Bureau requires the BLS laboratory to consider more than just questionnaires to be used with personal visit interviewing in the context of a household survey about autobiographical events. The Bureau acknowledged this fact when selecting the name for its laboratory -- the Collection Procedures Research Laboratory (CPRL) -- which was established in 1988. The basic goal of the CPRL is to improve through interdisciplinary research the quality of data collected and published by BLS. As originally envisioned, all forms of oral and written communication used in the collection and processing of survey data are appropriate subjects for investigation, as are all aspects of data collection, including mode, manuals, and interviewer training.

The CPRL's staff includes cognitive psychologists, social psychologists, sociologists, and a psychological anthropologist. For most of their projects, they work closely with the economists or program specialists responsible for defining the concepts to be measured by the Bureau's survey programs. To augment staff resources, the CPRL has labor hour contracts with the Institute for Social Research at the University of Michigan and Westat, Inc. The laboratory also does work under contract for other Federal agencies such as the Internal Revenue Service.

Although the CPRL has only existed for two years, its research program has been both broad and prolific. In section II, some accomplishments of the CPRL are reviewed. The discussion is organized within the framework of an information processing model. In section III, some directions for future research are described.
The success of focusing on the cognitive system suggests that focusing on other behavioral systems may produce further gains in data quality through improved survey theory and practice. Moreover, the success of using laboratory techniques for investigating the data collection processes used in sample surveys leads us to believe the techniques can be useful in improving other aspects of survey design.

II. Accomplishments to date

The CPRL has integrated the cognitive approach into the Bureau's survey research program to good effect in many ways. Primarily, the laboratory has changed how data collection research is conducted at BLS. Not only has the research conducted to date affected our understanding of the survey process, but the fact of its existence has heightened awareness throughout BLS of the need for a better understanding of all aspects of the data collection process (Norwood and Dippo in press).

Some results of the CPRL's research efforts are presented here within the framework of an information processing model (Cannell et al. 1989; Tourangeau 1984) that has four distinct stages: comprehension, retrieval, judgment, and communication. As applied to respondents, these stages refer to the comprehension of a question, retrieval of pertinent information, judgment about the accuracy of the information retrieved, and communication of this information within social and other restrictions imposed by the survey situation. As applied to interviewers, these stages may refer to comprehension of the question, retrieval of appropriate ways to say the question aloud, judgment about whether the respondent has understood the question, and communication to ensure the question has been understood (such as by rereading it) or, if the question has apparently been understood, to indicate that another question is about to be presented.

A. Comprehension

Question comprehension clearly requires that the terms making up a question be correctly understood. The accuracy of term comprehension has been shown by many psycholinguistic investigations to differ in certain ways.

Multiple meanings of terms: A term may lead some respondents to answer inappropriately because it may convey a meaning different from that intended by the designer. Research at BLS has accordingly attempted to identify terms with several meanings that are not made explicit by the phrasing of questions and might be likely to produce misinterpretations. Since the issue of employment is of personal significance to most people, questions about employment status are likely to predispose respondents (especially the unemployed or those with insecure employment) to be influenced by social desirability when answering the CPS (DeMaio 1984; Edwards, Levine, and Allen 1989). The misinterpretation of employment status terms may easily occur in a survey such as the CPS (Martin 1987). Accordingly, respondents' interpretations of two key terms on the CPS concerning unemployment status, "on layoff" and "looking for work," have been examined. The CPS definition of unemployment refers to persons who were not employed during the survey week, were available for work, and had made specific efforts to find employment sometime during the prior four weeks. Persons who are waiting to be recalled to a job from which they have been laid off need not be looking for work to be classified as unemployed. As expected, research demonstrates that these terms are sometimes misinterpreted by laboratory respondents to the CPS.
Similar research into the effects of multiple meanings of terms has also been conducted for several sections of the Consumer Expenditure (CE) Interview Survey, including the sections on medical care, home purchase, and trip expenditures (Miller and Downes-LeGuin 1989). Since our results indicated that people interpret "payments" in different ways, the section on medical care expenditures has since been modified to avoid misinterpretations of this term.

Diverse meanings: Diversity of term meaning also may impair comprehension. For example, in a recent pilot survey of business establishments, respondents were asked to report all "nonwage cash payments" paid to employees during the calendar year. BLS defined the payments to include bonuses and awards; lump-sum, cash profit-sharing, and severance payments; and nonregular commissions, but since this technical term probably was not too familiar to respondents, the meanings of "nonwage cash payments" could be expected to vary across respondents. When the interpretations of this term by respondents were investigated, it was found that respondents interpreted "nonwage cash payments" in a diverse fashion. Some interpreted it too broadly to include payments in kind, such as a new car (Boehm 1988), and some too narrowly to include only cash and not cashable checks (Phipps 1990). Another group of respondents who had made such payments simply checked that they had made no payments because of a lack of understanding of what the term included. Respondent exclusion and nonreporting of payments were more serious comprehension errors than inclusion of inappropriate payments, contributing to underreporting.

Format properties: When respondents complete a survey form received in the mail, the format of the instrument may play a crucial role in the respondents' comprehension. If the format does not make it clear what parts of the instructions are essential, respondents may overlook these parts and respond inappropriately. For example, in the Nonwage Cash Payments Pilot Survey (Phipps 1990), instructions, definitions, and examples were on the back of a one-page questionnaire, for which two different layouts were used. One layout required respondents first to provide an annual nonwage cash payment total and an annual payroll total, then answer a set of yes/no questions asking if they made specific types of nonwage cash payments. The second layout placed the set of yes/no questions first, with the payments and payroll totals requested at the bottom of the page. Reporters receiving the second layout were much less likely to provide the annual payroll total, stating in retrospective interviews that they overlooked it or did not understand they were to provide it. Thus, the layout of the second form, combined with a lack of instruction, caused an entire section of the form to be overlooked. As expected, the format of a survey played an important role in the respondents' comprehension of survey items.

The types of cues used on a self-administered form like an expenditure diary also can affect comprehension. In developing a diary for recording clothing expenditures, alternative cueing levels were tested in a laboratory. Results indicated that a shorter diary with multiple pages that repeated the general cues, e.g., buying clothes, was more effective than a longer, more structured version with specific cues.
Respondents were better at clarifying the domain of purchases to be recorded with the general cues than with the specific cues, i.e., the specific cues led them to restrict their comprehension of listed items more narrowly than intended.

B. Retrieval

Most Federal surveys require respondents to retrieve information about factual or autobiographical events. Faced with the need to control data collection costs, the time period for which the events are to be recalled is often long. For example, the reference period for the CE Interview Survey is three months. In the CPS, respondents may be asked questions about last week, the last four weeks, or the last time they worked, which could require recall over a long period of time. (For further discussion of memory retrieval errors in CE and CPS, see Dippo 1989 and Mullin 1990.)

Cues: Often a situation is inadequate in the cues it presents for retrieval. Alternatively, when enough appropriate cues are brought forth, a person can retrieve the previously "forgotten" memory. While some information is probably lost from memory due to diseases and environmental influences (such as alcohol), cues clearly play an important role in retrieval. Accordingly, several investigations have attempted to increase response accuracy on surveys by providing additional cues to retrieval, e.g., Lessler, et al. (1989). Still, it is important to recognize that some cues can be misleading and prevent a respondent from retrieving the appropriate information. Cues facilitate only when they correctly direct retrieval. In the Nonwage Cash Payments Pilot Survey, underreporting was investigated by presenting cues to facilitate retrieval. When respondents (company representatives) were given specific cues pertaining to bonus and award payments, recall of such payments was 11 percent higher than without cues (Phipps 1990). Also, in the CE Diary Survey, cues with varying levels of generality have been tested. For example, general cues included "beef (ground, roasts, steaks, briskets, etc.)" and specific cues included "ground beef, chuck roast, round roast, other roast, round steak, sirloin steak, other steak, other beef and veal." Underreporting was greater with general cues for certain items, particularly nonfood items. On the other hand, the level of reporting for many food items was not affected by the type of cues (Tucker and Bennett 1988). (A small illustrative sketch of this kind of report-to-diary comparison appears below, following the discussion of recall strategies.)

Strategies: To get accurate recall about the past, it is necessary to get people to retrieve the mental records of what they actually did. Several strategies to get respondents to access their memories of experiences have proved useful in our investigations at BLS. One strategy has respondents recall a critical personal event that occurred in the reference period in order to anchor the period. A second strategy has a respondent consult a calendar when attempting to recall. A third strategy has respondents decompose events recalled into smaller events to ensure that what is being recalled is a real experience and not a stereotypical schema. Research funded by BLS has found that respondents vary in the extent to which they employ the strategy that they were instructed to use. Only one-third of the laboratory subjects instructed to use a decomposition strategy when responding to questions on their hours worked used the strategy. Also, the vast majority of proxy respondents presented with this strategy ignored it because they did not have the knowledge necessary to use it.
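As referenced above, the cue experiments gauge underreporting by checking survey reports against a diary or other record. Neither Phipps (1990) nor Tucker and Bennett (1988) describe their tabulations in code, so the sketch below is only a hedged illustration of the general logic; the item names, cue conditions, and resulting rates are invented, and real validation studies use far richer matching rules than exact string comparison.

```python
def underreporting_rate(reported_items, record_items):
    """Crude gauge of underreporting: the share of record (diary) entries
    with no matching survey report, using simple case-insensitive matching."""
    reported = {item.strip().lower() for item in reported_items}
    missed = [item for item in record_items if item.strip().lower() not in reported]
    return len(missed) / len(record_items)

# Hypothetical diary and reports under two cue conditions (all values invented).
diary = ["ground beef", "chuck roast", "sirloin steak", "socks", "light bulbs"]
reports_by_condition = {
    "general cues": ["ground beef", "sirloin steak"],
    "specific cues": ["ground beef", "chuck roast", "sirloin steak", "socks"],
}
for condition, reports in reports_by_condition.items():
    print(condition, round(underreporting_rate(reports, diary), 2))
# general cues 0.6
# specific cues 0.2
```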
Expertise: In a laboratory study of household respondent pairs using the CPS questionnaire, proxy responses disagreed with those of the self-respondent approximately one-third of the time (Boehm 1989). In another laboratory study, when respondents were instructed to use the decomposition procedure, the vast majority of proxy respondents ignored the procedure, since they did not have the knowledge necessary to use it (Edwards, et al. 1989). Self-respondents were found to overreport and proxy respondents to underreport the hours worked. Also, proxy respondents were more likely than self-respondents to make errors, and their errors tended to be larger (see also Tanur 1990). As might be expected, proxies fail in areas they are less likely to know about. For example, proxies underreport more when the person reported on worked weekends or worked extra hours. Also, proxy error was greater when the respondent was unrelated to or from a different generation than the person to whom the data related (Edwards, et al. 1989).

C. Judgment

People may recall correctly but not realize the recalled information is correct. They may recall correct information, know it is correct, but express it inappropriately because they misconceive how responses are to be expressed. It was noted above that field research on the CE Diary Survey indicated specific cues were often more effective and led to less underreporting than general cues (Tucker and Bennett 1988). Laboratory research has indicated that judgment is also a factor. When given specific cues, laboratory subjects were sometimes unsure of where to record products on the form. Whether this hinders reporting is still an open question, but the accuracy of reports is affected (Tucker, et al. 1989). The specific cues also may make the task more onerous.

D. Communication

The importance of communication to cognition has largely been recognized in social psychology and anthropology. A considerable amount of survey research has shown that respondents' inclination to answer questions may be affected by the social desirability of the answers. In some cases, respondents may be disinclined to answer because they do not want to share certain kinds of information. In other cases, they may not want to present themselves in a bad light. In still other cases, they may want to adapt their response to what they perceive to be the expectations of the interviewer. While BLS has yet to complete an investigation of communication, it has recently begun several such investigations. First, the laboratory is conducting research into the psycholinguistic factors that persuade a respondent to provide confidential information to a survey (Herrmann, et al. 1990). This research will indicate the degree of trust elicited by different protection terms (confidential, private, secret, concealed, nondisclosed). Second, we are examining the influence of interviewer errors on the errors of respondents, using techniques developed by Cannell (Cannell, et al. 1989). For example, tape recordings of CE Survey interviews are being analyzed to determine whether the quality of answers produced by respondents varies with the quality of the interviewers' presentation of a question. Third, like other agencies, we are investigating the use of computer-assisted telephone interviewing (CATI) for some BLS surveys.
Research is underway for the CPS, CPI-Housing, and Continuing Point-of-Purchase surveys to determine if people respond in the same manner in a computer-assisted telephone interview as they do in a personal interview. It has been suggested that the personal interview ensures better attention from the respondent, but it has also been suggested that CATI elicits information that otherwise might not be disclosed because the respondent feels less personally involved when interacting with an interviewer on the telephone. In various ways our research is addressing these alternative expectations about CATI.

III. Future directions

Prior to the establishment of the laboratory, BLS sponsored a Questionnaire Design Advisory Conference to seek advice on the types of questionnaire research that should be undertaken for the CE and CPS (Bienias, et al. 1987). The conference participants all advocated the incorporation of cognitive concepts into the BLS research program and suggested that research focus on the issues of respondent rules, respondent and interviewer roles, questionnaire form and content, and statistical estimation. In addition, our ongoing research program has taught us that many aspects of the data collection process require a broader, integrated-systems approach rather than a purely cognitive approach to research. The accuracy and efficiency of survey responses are affected not only by cognitive variables (e.g., abstractness of terms, retrieval cues) but also by other kinds of variables (e.g., physiological, perceptual, emotional, motivational, social, societal, cultural, and economic; see Royce 1973). In some cases, these variables affect responding because they interact with the quality of the cognitive processes underlying responding. In other cases, these other variables leave cognitions unaffected but instead interact with a respondent's inclination to report accurately about these cognitions.

A. Looking beyond the cognitive approach

An integrated-systems conception of cognition has been advocated increasingly in recent years by scholars in anthropology (Cole and Scribner 1974), psychology, and neuroscience. Some noncognitive psychological and societal factors that may affect the response process are: physiological condition, perception, emotional state, motivation, familial roles, and societal norms.

Physiological condition: The accuracy and efficiency of cognitive responses are affected by the physical state of a person's body (Squire 1987). Physiological condition, as affected by physical health, influences a person's ability to understand, remember, reason, and analyze. A variety of routine health conditions (such as the common cold) may impair the accuracy and/or efficiency of cognitive processes (Cutler and Grams 1988). Cognitive processes are also impaired by commonly imbibed substances, such as coffee, tobacco, tranquilizers and antidepressants, and even certain antibiotics.

The CPRL has been sponsoring laboratory research on the effects of computer-assisted personal interviewing (CAPI) on the interviewer (Couper et al. 1990). Although the studies have been within the context of the Consumer Price Index survey, where interviewers conduct interviews both on the doorstep of housing units and walking the aisles of retail establishments, the procedures developed, concerns raised, and results are generally applicable.
For example, more than 40 percent of the 46 interviewers who volunteered to be laboratory subjects stated that they had suffered neck, shoulder, and/or lower back problems in the 12 months prior to any contact with a portable computer. Moreover, approximately 75 percent of the subjects wore some form of corrective lenses, with bifocals presenting particular problems for interviewers trying to focus on the keyboard, screen, and respondent.

Perception: The quality of visual stimuli affects the ease of reading and comprehension. The role of perception is of special importance in many Federal surveys where data are collected via a self-administered form. For these surveys, the perceptual constructs may have significant effects on the quality of data. Wright (1980) suggests classifying form-design issues into three categories: the language of forms, overall structure, and the substructures within the forms, such as the questions themselves. In addition, there are perceptual issues related to the appearance of questionnaires, such as color and print font.

The presence of visual stimuli affects retrieval processes more than thinking about or imagining the stimulus. For example, psychological research indicates that the frequency with which academics use external aids, such as files and piles of papers on one's desk, has been found to be positively correlated with scholarly productivity (Hertel 1988). Survey research indicates that expenditure reporting increases with the use by respondents of an information booklet describing the types of items that belong to the categories being read aloud by the interviewer. More respondents appear to be willing to read the item lists than to listen to an interviewer read the list to them.

Respondents to the Occupational Safety and Health Survey face a very difficult task in deciding if an incident is an injury or an illness and if it is reportable or not. Currently, respondents receive a 22-page set of guidelines. Laboratory staff are now investigating different methods for communicating the decision logic to respondents, e.g., flow charts or graphic representations of the decision paths. In addition, a simple user's guide (no more than 10 pages) is being prepared for respondents who are new to OSHA recordkeeping. Unlike the longer guidelines, this guide contains background on the 1970 OSHA act and provides examples on how to recognize, record, and report occupational injuries and illnesses.

Emotional state: Our cognitive ability to comprehend, retrieve, evaluate, and respond may be affected by our emotional state (Wolkowitz and Weingartner 1988), which in turn may be affected by recent events or prolonged stress. Stress, a major factor moderating emotional states, has been associated with cognitive failures in everyday life. Sometimes, emotional states may prevent people from producing correct responses that they "know" at some level. For example, despite decades of controversy, it is now generally accepted that people sometimes repress memories.

Nontrivial levels of stress are currently experienced by interviewers. With the change over the next decade to increased CATI, the possibility of increased interviewer stress is real. In surveys like the CPS, the proportion of personal visit interviews will increase for most interviewers working in large metropolitan areas as many of their telephone interviews are transferred to a centralized CATI facility.
Concerns about personal safety and administrative pressures to maintain high response rates are but two factors which may contribute to increased interviewer stress. In a centralized CATI facility, interviewers know their work is constantly being monitored. Recent news stories about the effects of constant observation and work quotas in the telephone industry indicate stress levels can be very high in these kinds of situations.

Motivation: We know little about respondents' motivations for responding to survey questionnaires. Census' recent experience of overestimating the mail-return rate in the decennial census is but one indicator of how little we know. At BLS, those of us working on the CE Interview Survey constantly wonder why anyone would agree to an interview that is expected to last 2 hours. To investigate survey respondent motivation, a large-scale research project on household survey response has been initiated by Robert Groves at the University of Michigan, sponsored by the Bureau of Justice Statistics, the Bureau of Labor Statistics, and the National Center for Health Statistics. One part of the project is an examination of both interviewer (e.g., attitudes, behavior, and characteristics) and administrative (e.g., procedures, workload levels, design parameters) influences on survey participation (Groves and Cialdini 1990). To examine the effects of alternative forms of persuasive communication on sample attrition rates and item response rates, BLS is conducting experiments using appeals that stress the use of Current Employment Statistics data by the trade associations representing the establishments (McKay 1990).

Familial roles: The roles people assume within the family have been found in recent years to affect cognitive processes. While it may be assumed in some surveys that people within a home are equally able to answer questions pertaining to the household, research shows that different family roles carry responsibility for knowing about certain kinds of information. For example, wives tend to know more about the health and activities of children, whereas husbands tend to know more about how community activities affect the household. Single parents tend to know the information possessed by both spouses in dual-parent households. With the prevalence of proxy reporting in most household surveys, the importance of learning about what information is exchanged within households, and how, should not be understated. Recent research on proxy reporting in the CPS indicates adults may be worse proxy reporters for youths than for other adults in a household (Tanur 1990). Moreover, the proxy reporting of job search may be dependent upon the type of job search strategies being used by youth. As Tanur notes, there is no literature about family communication patterns and the issue of who in the family talks to whom about what.

Societal norms: Cognitive performance is affected by groups in several ways. For example, people are disinclined to perform memory tasks when the social stereotypes that apply to them indicate that they cannot perform well, such as the stereotypes associated with age or with gender. Also, people will sometimes knowingly give the wrong answer to a question because they recognize that their answer is contradicted by the other members of a group. Moreover, social pressures sometimes dispose people to communicate falsely what they do or do not know in order to achieve social goals.
For example, people may say they cannot recall some event or information to avoid or speed up the questioning, or to make a certain impression on the questioner. We do know that social desirability plays a role, but there has been little research into understanding that role (DeMaio 1984). We also know that the mode of data collection appears to have an effect on data, but we do not know why (Shoemaker, et al. 1989). Recent research by Suchman and Jordan (1990) shows clearly the influence of social and cultural variables.

Evidence indicates that members of all cultures can perform all manner of cognitive tasks equally well if the environment has provided the cultures equivalent education and experience. However, because cultures typically involve different educational systems, belief systems, and occupational opportunities, members of different cultures acquire different cognitive skills (Cole and Scribner, 1974). Thus, members of different subcultures of a multicultural society will interpret certain concepts differently and answer differently.

B. Looking beyond the interviewing process

The research laboratory and laboratory techniques can be used in a variety of survey design applications. Just as the responding process is affected by noncognitive variables, the survey process consists of more than just question answering. The entire survey design process, from defining the concepts to be measured through analyzing the data, involves the communication of concepts between people with different knowledge bases or an interaction between people and things. The process can benefit from a broad range of interdisciplinary research, including cognitive and other areas of psychology, other behavioral sciences, and human neuroscience.

The importance of the role of the interviewer has long been recognized. Data collection and training methods, such as structured questionnaires and verbatim training, have been developed in an attempt to control interviewer error. Interviewer training typically stresses the need for neutrality, the use of specified questionnaire wording and administration procedures, and appropriate probing techniques. Recognizing the importance of this source of error, many BLS-sponsored laboratory studies conducted in the last two years have focused on the interviewer. These studies indicate the role of the interviewer can be studied effectively with laboratory techniques. Thus, it seems natural to expand our research in this area.

IV. Summary

As survey researchers, we really know very little about the psychological processes underlying interviewer and respondent behavior. The few laboratory studies to date indicate the cognitive approach is very useful. With this approach we are learning about the roles of comprehension, recall, judgment, and communication in the survey response process. Eventually, as we learn more, we can develop detailed models which questionnaire designers can use to assess new questions and forms for survey data collection.

Just as the research to date has shown that the cognitive approach is effective, it has shown that a more broad-based approach is necessary. Survey responses clearly emanate from all behavioral systems within and outside the respondent. An understanding of how responding is affected by the cognitive system is not enough. A respondent's behavior is influenced by physiological, emotional, social, societal, and economic variables.
A complete explanation of responding requires an understanding of all systems and how their influences are integrated overall to produce a response. The adoption of an integrated-systems approach would be a natural step in the evolution of survey science. Consider the disciplinary history of economic statistics. First, there were economists producing simple descriptive statistics. The discipline of mathematical statistics was not really incorporated until probability sampling became the basis for sample designs. Then came the advent of computers. Just as we have expanded our use of statistical theory as applied to survey research beyond sampling (e.g., incorporating operations research techniques in sample design optimization and iterative methods such as raking in survey estimation), survey research may progress further by making use not only of cognitive psychology but also of knowledge of other psychological and sociopsychological systems.

References

Bienias, J., Dippo, C., and Palmisano, M. (1987), Questionnaire Design: Report on the 1987 BLS Advisory Conference, Washington, DC: U.S. Department of Labor, Bureau of Labor Statistics.

Boehm, L. (1988), "CES Nonwage Cash Payment Prepilot Interviews," internal memorandum to Alan Tupek dated December 16, Washington, DC: U.S. Department of Labor, Bureau of Labor Statistics.

Boehm, L. (1989), "The Relationship Between Confidence, Knowledge, and Performance in the Current Population Survey," in Proceedings of the Section on Survey Research Methods, American Statistical Association, in press.

Cannell, C., Fowler, F., Kalton, G., Oksenberg, L., and Bischoping, K. (1989), "New Quantitative Techniques for Pretesting Survey Questions," in Bulletin of the International Statistical Institute, pp. 481-495.

Cole, M. and Scribner, S. (1974), Culture and Thought: A Psychological Introduction, New York: John Wiley and Sons.

Couper, M., Groves, R., and Jacobs, C. (1990, in press), "Building Predictive Models of CAPI Acceptance in a Field Interviewing Staff," in Proceedings of the 1990 Annual Research Conference, Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Cutler, S.J. and Grams, A.E. (1988), "Correlates of Self-Reported Everyday Memory Problems," Journal of Gerontology, 43, 582-590.

DeMaio, T. (1984), "Social Desirability and Survey Measurement: A Review," in Surveying Subjective Phenomena, eds. C. Turner and E. Martin, New York: Russell Sage.

Dippo, C.S. (1989), "The Use of Cognitive Laboratory Techniques for Investigating Memory Retrieval Errors in Retrospective Surveys," in Bulletin of the International Statistical Institute, Vol. LIII, Book 2, pp. 363-382.

Edwards, S., Levine, R., and Allen, B. (1989), "Cognitive Strategies for Reporting Hours Worked," in Proceedings of the Section on Survey Research Methods, American Statistical Association, in press.

Groves, R.M. and Cialdini, R. (1990), "Toward a Useful Theory of Survey Participation," unpublished manuscript.

Herrmann, D., van Melis-Wright, M., and Stone, D. (1990), "The Semantic Basis of Confidentiality," in Proceedings of the Section on Survey Research Methods, American Statistical Association, to appear.

Hertel, P. (1988), "External Memory," in M. Gruneberg, P. Morris, and R. Sykes (eds.), Practical Aspects of Memory, New York: John Wiley and Sons.

Jabine, T., Straf, M., Tanur, J., and Tourangeau, R. (1984), Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines, Washington, DC: National Academy Press.
Lessler, J., Salter, W., and Tourangeau, R. (1989), "Questionnaire Design in the Cognitive Research Laboratory: Results of an Experimental Prototype," Vital and Health Statistics, Series 6, No. 1 (DHHS Publication No. PHS 89-1076), Washington, DC: U.S. Government Printing Office.

Martin, E. (1987), "Some Conceptual Problems in the Current Population Survey," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 420-424.

McKay, R. (1990), "Application of Persuasive Communication Strategies to a Business Establishment Survey," in Proceedings of the Section on Survey Research Methods, American Statistical Association, to appear.

Miller, L.A. and Downes-LeGuin, T. (1989), "Reducing Response Error in Consumers' Reports of General Expenses: Application of Cognitive Theory to the Consumer Expenditure Interview Survey," Advances in Consumer Research, in press.

Mullin, P. (1990), "Proposal for Laboratory Research on the Feasibility of an Extended Interview Period for the CPS," unpublished memorandum to A. Tupek, in preparation.

Norwood, J. and Dippo, C. (in press), "Government Applications," in Questions about Questions: Memory, Meaning and Social Interaction in Surveys, New York: Russell Sage.

Phipps, P. (1990), "Applying Cognitive Techniques to an Establishment Mail Survey," paper to be presented at the annual meeting of the American Statistical Association, Anaheim, California, August.

Royce, J.R. (1973), "The Present Situation in Theoretical Psychology," in B.B. Wolman (ed.), Handbook of General Psychology, Englewood Cliffs, NJ: Prentice Hall.

Shoemaker, H., Bushery, J., and Cahoon, L. (1989, in press), "Evaluation of the Use of CATI in the Current Population Survey," in Proceedings of the Section on Survey Research Methods, American Statistical Association.

Squire, L. (1987), Memory and Brain, New York: Oxford University Press.

Suchman, L. and Jordan, B. (1990), "Interactional Troubles in Face-to-Face Survey Interviews," Journal of the American Statistical Association, 85, 232-240.

Tanur, J. (1990, in press), "Reporting Job Search Among Youths: Preliminary Evidence from Reinterviews," in Proceedings of the 1990 Annual Research Conference, Washington, DC: U.S. Department of Commerce, Bureau of the Census.

Tourangeau, R. (1984), "Cognitive Sciences and Survey Methods," in Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines, T. Jabine, M. Straf, J. Tanur, and R. Tourangeau (eds.), Washington, DC: National Academy Press.

Tucker, C. and Bennett, C. (1988), "Procedural Effects in the Collection of Consumer Expenditure Information: The Diary Operations Test," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 256-261.

Tucker, C., Miller, L., Vitrano, F., and Doddy, J. (1989), "Cognitive Issues and Research on the Consumer Expenditure Diary Survey," paper presented at the annual American Association for Public Opinion Research Conference.

Wolkowitz, O.M. and Weingartner, H. (1988), "Defining Cognitive Changes in Depression and Anxiety: A Psychobiological Analysis," Psychiatry Psychobiology, 3, 1-8.

Wright, P. (1980), "Strategy and Tactics in the Design of Forms," Visible Language, XIV 2, pp. 151-193.

THE ROLE OF A COGNITIVE LABORATORY IN A STATISTICAL AGENCY
Monroe G. Sirken
National Center for Health Statistics

Introduction

The statistical survey is an invention of the twentieth century.
It produces a commodity, namely information, which many believe is the most important property in the modern world. Our Federal establishment, for example, would be unable to function nearly as effectively without the information being produced by surveys that are conducted by the Federal agencies represented at this Seminar. The Congressional and Executive branches use Federal surveys to monitor the nation's well-being, to evaluate the government's social, health and economic programs, and to plan legislation involving the collection of billions of tax dollars and the disbursement of billions of benefit dollars.

Federal surveys could not have attained this level of acceptance and importance without the technological advances in survey methods that have occurred during the past half century. However, we can hardly afford to be complacent. As data producers, we are even more mindful than data consumers of the limitations of current survey technology. We realize that further technological advances are essential to assure that Federal surveys will meet the growing needs for more and better survey data.

There have been two major technological advances in survey methodology during the past 50 years, and I believe a third may be in the offing. Each advance has introduced innovative technologies for improving the precision of the survey measurement process and was made possible by technology and theory transfers from the applied sciences. The "sampling" revolution in survey methodology that began in earnest during the 1930's came about as a result of technology transfers from the statistical sciences, and produced substantial advances in survey sampling and estimation methods. The "automation" revolution had its onset in the late 1960's. It came about as a result of technology transfers from the computer sciences, and has produced substantial advances in the methods of compiling and processing survey data. The "cognitive" revolution, which, as some of us believe, got underway during the 1980's [Jabine, 1989], was made possible by technology and concept transfers from the cognitive sciences. Whether called a revolution or a movement, it has been introducing improved methods of designing data collection instruments and conducting questionnaire design research.

Federal statistical agencies were major players in the "sampling" and "automation" revolutions in survey technology. Now they are playing a major role in the "cognitive" movement by developing and applying cognitive laboratory techniques to find better solutions to survey response problems. It is noteworthy that the cognitive movement is confined neither to the U.S. government nor to the United States [Jobe and Mingay, 1991]. This paper, moreover, deals with only one part of the U.S. movement, namely, the work of the cognitive laboratory at the National Center for Health Statistics. The paper briefly describes the history and programs of the NCHS Laboratory and outlines the Laboratory's benefits to survey research, cognitive psychology, and Federal statistics.

History of the NCHS Laboratory

Until 1984, the role of cognition in the survey measurement process was largely ignored in the survey research programs of the National Center for Health Statistics. None of the earlier NCHS projects had been conducted in a cognitive laboratory, though one study [Laurent, Cannell and Marquis, 1972] used psychological theories to guide the development of interviewer and questionnaire techniques.
Prior to 1984, survey response had been modeled as a two-stage stimulus/response process, with little attention paid to the effects that the respondents' mental processes had on the accuracy of their responses. In accordance with this psychological paradigm, survey research investigated the error effects of survey instruments and procedures almost exclusively in field tests. Since these field tests sought to replicate the actual conditions of the survey, they provided little opportunity to investigate cognitive issues, such as the following:

- What kinds of cognitive processing modes and strategies do respondents use in answering survey questions?

- How do the cognitive processing modes and strategies of survey respondents affect the accuracy of their responses to survey questions?

In 1984, with the support of an NSF grant, the NCHS embarked on a demonstration project that was motivated largely by the work of the Advanced Research Seminar on the Cognitive Aspects of Survey Methodology [Jabine, Straf and Tanur, 1984]. This project sought to demonstrate the utility of investigating the cognitive aspects of answering survey questions in a laboratory setting as a means of improving the design of Federal survey instruments [Sirken and Fuchsberg, 1984]. The project compared alternate versions of the dental supplement to the questionnaire of the 1986 National Health Interview Survey. One supplement was designed by the traditional field test method and the other by the proposed cognitive laboratory method [Lessler and Sirken, 1985]. The rationale for the demonstration project as expressed in the NSF grant proposal [Sirken, 1984] was:

"... because (1) questionnaire design is one of the weakest links in the survey measurement process, (2) past efforts to improve the quality of questionnaires have posed serious and difficult methodological problems, (3) the traditional field methods currently being used to improve questionnaire design are inadequate by themselves to handle many of these problems, and (4) complementary methodologies that are not subject to the weakness of traditional field methods need to be developed, it is [therefore] essential to investigate the potential of using the [combined] techniques of the statistical and cognitive sciences in a laboratory setting as a complementary methodology for improving questionnaire design..."

The demonstration project was conducted in an interdisciplinary mode and in close collaboration with university scientists so that, as the NSF grant proposal noted, another potential benefit was:

"... it could go a long way in bridging the gap that exists between cognitive scientists in academia and survey statisticians in Federal Statistical Agencies..."

This was critical to the ultimate success of the project because it was felt that the gap between the disciplines had been largely responsible for the delay in applying cognitive methods in survey research.

At the successful conclusion of the demonstration project in 1986, NCHS established, with the support of a second NSF grant, the National Laboratory for Collaborative Research in Cognition and Survey Measurement. The National Laboratory's broad mission is to promote and advance interdisciplinary research on the cognitive aspects of survey methodology among Federal Statistical Agencies and the nation's universities and research centers. Interdisciplinary research with university scientists is promoted by a Collaborative Research Program which awards competitive research contracts and appoints visiting scientists.
Collaborative research with other Federal Agencies is promoted by the Questionnaire Design Research Laboratory, which serves as the workplace for NCHS and other Federal Agencies to conduct intramural research [Royston, et al 1986]. The Collaborative Research Program has been largely funded by NSF grants, and the Questionnaire Design Research Laboratory has been partially funded by reimbursable work agreements with other PHS Agencies [Sirken, et al 1990].

Activities of the NCHS Laboratory

Much of the work of the National Laboratory is based on a cognitive theory of survey response errors that can be stated as follows: "survey respondents carry out a series of mental tasks in the interval between being asked a survey question and providing a response. When these mental tasks pose serious mental burdens for respondents, they are likely to cause response errors." This view of the survey response process stimulated the development of cognitive methods for designing and pretesting questionnaires and for conducting questionnaire design research.

Developing and testing survey instruments has short term objectives, namely, to detect and revise the design flaws before the survey instruments are field tested. In contrast, questionnaire design research objectives are long term, namely, to improve the designs of the next generation of survey instruments. These differences in objectives led to the development of distinctly different cognitive methods for developing and testing survey instruments and for conducting questionnaire design research.

Developing and Pretesting Questionnaires

The cognitive laboratory approach to developing and pretesting survey questionnaires is based on the premise that the more difficult, unreasonable, or impossible the mental tasks implicit in survey questions, the greater the likelihood of response errors. For example, survey questions containing terms respondents do not understand, that are vague or ambiguous, that impose unrealistic demands on recall, that require complicated mental calculations, that contain too many elements for the respondent to think about simultaneously, that involve issues the respondent knows or cares little about, or that ask for embarrassing or threatening information -- all impose cognitive burdens that are likely to result in invalid responses.

The realization that questionnaires obtain poor quality data when they ask respondents to perform difficult, if not impossible, mental tasks led to the development of a battery of laboratory techniques for investigating the cognitive burdens posed by survey questions [Bercini, in press; Royston, 1989], including think-aloud interviews, in-depth probing, and focus group discussions. These techniques are not new to questionnaire designers [DeMaio, 1983], but never before had they explicitly and systematically served as means of observing the manner in which respondents mentally process survey questionnaires and procedures. Intensive interviewing techniques detect questionnaire design flaws by observing the cognitive problems that result from these flaws. Poor questionnaire designs may impose difficult mental tasks at any cognitive stage of the response process, including comprehending the questions, recalling or estimating the information needed to answer the questions, and deciding whether or how to answer the questions. Identifying the underlying cognitive difficulties experienced by respondents facilitates the process of revising the questionnaires appropriately.
Many questionnaire design problems detected and repaired by laboratory techniques are far less likely to be detected by traditional field testing methods. Consider the following question, which was proposed for the National Health Interview Survey (NHIS): "During the past 12 months, have you been bothered by pain in your abdomen?" When laboratory respondents were asked this question, most answered it readily with a "Yes" or a "No". It was not until the laboratory interviewer probed into how respondents interpreted the term "abdomen" that it became apparent that respondents were unsure of what section of the body to include. The interviews also determined that respondents had variable interpretations of the phrase "bothered by," which in turn affected whether they answered the question affirmatively or negatively. Intensive interviewing methods revealed not only that the question was apt to result in response errors, but also the underlying cause of the problem. When the cause of a question problem is understood, the solution is more likely to be found. In this case, part of the solution was a respondent flash card that showed an outline of the torso with the abdominal area shaded in.

Intensive interviews are conducted by laboratory-trained questionnaire designers with many years of survey research experience. Paid subjects are recruited for the interviews. The topic and target populations of the survey determine the criteria for subject recruitment. Subjects are often selectively recruited to include those that would be most burdened by the survey questions or least successful in adopting effective mental strategies in answering the questions. Laboratory testing is usually carried out in interviewing waves of 5 to 10 subjects at a time; the questionnaire is revised in consultation with the sponsor after each wave; and the testing continues until an acceptable version is obtained. Typically, flawed questions undergo 2-4 revisions before an acceptable version is ready for field testing. Field testing is essential in order to determine how the questionnaire will work under actual survey conditions. Additional laboratory testing may be needed to evaluate the questionnaire revisions that are suggested by the field test.

Depending on the complexity and scope of the questionnaire and on the number of conceptual problems associated with it, laboratory testing can be completed within several weeks or could span a longer period. For example, projects that involve special subject recruitment and testing may require a lead time of about six months or even longer. Also, laboratory projects are conducted collaboratively with survey sponsors and therefore involve frequent meetings to assure that the designed questionnaires satisfy the sponsors' research objectives.

Questionnaire Design Research

Cognitive methods of conducting questionnaire design research investigate why some survey questions and procedures pose cognitive tasks that are difficult, unreasonable, or impossible for respondents to perform. In the same way that much has been learned in medicine by studying the cognitive aspects of amnesia and other memory disorders, so it is hoped that much can be learned in survey research by studying the cognitive aspects of questionnaires that pose severe response burdens. Questionnaire research seeks to improve the design of the next generation of survey questionnaires, especially those questionnaires dealing with topics for which better quality survey data are needed.
Causal relationships between the mental tasks performed by respondents and the accuracy of their responses are investigated in experiments. These experiments may be conducted in the cognitive laboratory or embedded in ongoing surveys. The laboratory approach makes it possible to undertake many types of complex experiments that would be administratively impossible or prohibitively expensive to conduct as field experiments. Embedding cognitive experiments in ongoing surveys makes it feasible to test laboratory findings under actual survey conditions.

Several features of cognitive laboratory experiments are noteworthy. They are interdisciplinary, involving the joint participation of cognitive psychologists and survey researchers. They generally involve testing questions that ask for the kinds of information that typically is poorly reported in surveys. They investigate those mental tasks implied by the survey questions that pose the greatest risks to accurate reporting. For example, if the question implied retrospective reporting, the focus would be on the cognitive aspects of the memory tasks, and if the question asked for sensitive information, the focus would be on the cognitive aspects of risk taking under conditions of uncertainty. Generally, the subjects of laboratory experiments are recruited from population frames that contain information needed to validate the experiment's findings. For example, the laboratory subjects for experiments on retrospective reporting of medical visits were selected from the files of a Health Maintenance Organization, because the files made it possible to recruit subjects with known health conditions and doctor visit patterns [Means, et al, 1988]. Finally, the findings of the laboratory experiments are interpreted in terms of their potential contributions to cognitive theory as well as their implications for improving the design of survey instruments.

A recent project on dietary recall in nutrition surveys illustrates some of the benefits of conducting experiments in a cognitive laboratory. This complex multi-experiment project, involving randomization of subjects, diary keeping, and multiple data collection sessions, could probably not have been undertaken as a traditional field experiment. The project investigated the cognitive burdens posed by the kinds of questions that are asked in household nutrition surveys [Smith, in press]. Generally these surveys collect dietary histories, food frequency inventories, and data on food portion sizes. Collecting these kinds of data imposes mental tasks involving free recall, frequency estimation, and magnitude estimation, respectively. Separate laboratory experiments were designed and conducted to assess the ability of respondents to provide accurate information on each of these tasks. The laboratory subjects participating in these experiments kept food diaries so their subsequent responses to dietary questionnaires could be validated. For example, one of the nutrition survey experiments tested the effect of varying the portion size definitions on respondents' reports of the amount of food consumed. For each listed food item, respondents indicated whether their typical portion was small, medium, or large in comparison with a defined medium portion size. Surprisingly, the food consumption reports in the experiment were invariant to changes in the definition of medium portion size.
These findings raise serious questions about the design of nutrition survey questionnaires and the quality of survey data on food consumption that are based on portion size reports. Over the past several years, laboratory experiments have investigated the cognitive factors involved in responding to difficult-to-answer questions on a variety of health related topics, including utilization of health services, cigarette smoking histories, illegal drug use, chronic pain episodes, and chronic disease prevalence.

A recent project on recall of doctor visits illustrates the benefits of embedding experiments in surveys. This split-ballot experiment was embedded in the pilot study of the National Medical Expenditure Survey. The experiment investigated the relative accuracy of retrospectively reporting doctor visits in a forward or in a backward temporal order [Jobe, et al, 1990]. It was suggested by the findings of previous laboratory experiments indicating that subjects varied in their preference between forward and backward recall order but that backward recall seemed to produce more accurate reporting [Loftus, 1985]. The survey experiment assessed the accuracy of forward, backward, and free recall reporting strategies by comparing the medical visits reported by each strategy with the visits listed in medical records. The survey experiment did not confirm the findings of the laboratory experiments and showed little difference in accuracy between the alternative recall strategies. It was concluded that there was no evidence to suggest that survey instruments should be designed to favor either the forward, backward, or free recall strategies.

Cognitive experiments involving survey material, whether conducted in laboratories or embedded in surveys, are valuable for several reasons. First, they provide in-depth knowledge about the cognitive processes respondents use in answering hard-to-answer survey questions. In particular, they often identify the kinds of question approaches that pose response burdens. And they suggest methods of designing the questionnaires to reduce the response burdens and response errors. Second, because validation information is almost always collected (e.g., diaries, medical record matches, and biochemical markers), the response error effects of different questionnaire designs and cognitive strategies can be assessed. Third, the cognitive bounds on the abilities of respondents to perform specified kinds of mental tasks (comprehension, recall, etc.) posed by survey questions can be assessed.

Benefits of the NCHS Laboratory

The activities and programs of the NCHS cognitive laboratory during the past five years have benefitted survey research, cognitive science, and Federal statistics in a variety of ways. Some of the benefits are briefly outlined in these summary remarks.

Survey research has benefitted from the development of methods for investigating the cognitive aspects of the survey response process. Intensive interviewing methods were perfected for designing and pretesting survey instruments in a laboratory setting, and experimental methods were perfected for conducting laboratory experiments and for embedding experiments in ongoing surveys.

Cognitive science benefitted from the opportunities afforded its scientists by the NCHS laboratory to participate in interdisciplinary research projects in cognition and survey measurement.
Cognitive psychologists participating in these projects had opportunities to test cognitive theories with real world survey phenomena, either in laboratory experiments or in experiments embedded in on-going surveys. And it is believed that the gains in cognitive psychology will ultimately benefit survey research and the quality of Federal surveys.

The activities of the NCHS laboratory fostered an appreciation and respect for the importance of conducting cognition and survey measurement research within and outside the Federal establishment. For example, the NCHS laboratory played a vital role in designing and testing NCHS survey instruments during the past several years, and it is being viewed increasingly as a PHS laboratory with a mission to serve the needs of agencies throughout the Public Health Service. As the first cognitive laboratory of its kind devoted to survey research, the NCHS laboratory served as a point of reference, if not the prototype, for the cognitive laboratories that have since been established at other statistical agencies, including the Bureau of the Census, the Bureau of Labor Statistics, and Statistics Sweden. Information dissemination has always been a high priority activity, and during the past five years the NCHS laboratory staff and collaborators published nearly 50 reports and presented more than 100 papers at meetings and conferences.

Whether the existing movement in cognition and survey research, of which the NCHS laboratory is a part, will evolve into a full-fledged cognitive revolution with an impact equal to the sampling and automation revolutions remains to be determined. We will know that the cognitive revolution has occurred when it becomes apparent that the cognitive sciences are providing scientific support to survey response research comparable to the support the statistical and computing sciences have been providing to research in survey sampling and in the automation of survey data.

References

Bercini, D.H. Pretesting Questionnaires in the Laboratory: An Alternative Approach. Presented at the EPA/AOWNA Symposium on Total Exposure Assessment Methodology. In print, Toxicology and Industrial Health.

DeMaio, Theresa J. (Ed.) (1983). Approaches to Developing Questionnaires. Statistical Policy Working Paper 10. Statistical Policy Office, Office of Information and Regulatory Affairs, Office of Management and Budget. Washington, D.C.

Jabine, Thomas B. (1990). Cognitive Aspects of Questionnaire Development. Presented at the EPA/AOWNA Symposium on Total Exposure Assessment Methodology. In print, Toxicology and Industrial Health.

Jabine, T.B., Straf, M.L., Tanur, J.M., and Tourangeau, R. (Eds.) (1984). Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines. Washington, D.C.: National Academy Press.

Jobe, J.B., White, A.A., Kelley, C.L., Mingay, D.J., Sanchez, M.J., and Loftus, E.F. (1990). Recall Strategies and Memory for Health Care Visits. Milbank Memorial Fund Quarterly/Health and Society, 68, 171-199.

Laurent, A.C., Cannell, C., and Marquis, K. (1972). Reporting Health Events in Household Interviews: Effects of an Extensive Questionnaire and Diary Procedure. Vital and Health Statistics, Series 2, No. 49. Washington, D.C.: U.S. Government Printing Office.
Lessler, J.T. and Sirken, M.G. (1985). Laboratory-Based Research on the Cognitive Aspects of Survey Methodology: The Goal of the National Center for Health Statistics Study. Milbank Memorial Fund Quarterly/Health and Society, 63, 565-581.

Loftus, E.F. and Fathi, D.C. (1985). Retrieving Multiple Autobiographical Memories. Social Cognition, Vol. 3, pp. 280-295.

Royston, P.N. (1989). Using Intensive Interviews to Evaluate Questions. In F.J. Fowler, Jr. (Ed.), Health Survey Research Methods (pp. 3-7) (DHHS Publication No. PHS 89-3447). Washington, D.C.: U.S. Government Printing Office.

Royston, P.N., Bercini, D.H., Sirken, M.G., and Mingay, D. (1986). Questionnaire Design Research Laboratory. American Statistical Association, 1986 Proceedings of the Section on Survey Research Methods, pp. 703-707.

Sirken, Monroe G. (1986). National Laboratory for Collaborative Research on Cognition and Survey Measurement. Grant Proposal to the National Science Foundation. Washington, D.C.

Sirken, Monroe G. (1984). Laboratory Based Research on the Cognitive Aspects of Survey Methodology. Grant Proposal to the National Science Foundation. Washington, D.C.

Sirken, M.G. and Fuchsberg, R. (1984). Laboratory Based Research on the Cognitive Aspects of Survey Methodology. In Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines. Washington, D.C.: National Academy Press.

Smith, A.F. (in press). Cognitive Processes in Long-term Dietary Recall. Vital and Health Statistics, Series 6, No. 4 (DHHS Publication No. PHS 91-1079). Washington, D.C.: U.S. Government Printing Office.

DISCUSSION

Elizabeth Martin
U.S. Bureau of the Census

In their two papers, Monroe Sirken of the National Center for Health Statistics, and Cathryn Dippo and Douglas Herrmann of the Bureau of Labor Statistics, document the activities of the cognitive laboratories which were established in 1984 and 1988, respectively, at their two agencies. The cognitive laboratories represent a commitment to survey data quality which is a credit to the two agencies. And Monroe Sirken and Cathryn Dippo, as two of the main instigators and initiators responsible for establishing the laboratories, deserve credit and appreciation for their effort and accomplishment.

The record of achievement by the two laboratories is a good one. Dippo and Herrmann organize their paper around a clear and comprehensive discussion of the sources of cognitive problems which can introduce errors in the response process; it is impressive how many of these problems have already been tackled in the BLS Collection Procedures Research Laboratory in its short history. Excellent research on a range of topics is also being conducted at the NCHS National Laboratory for Collaborative Research in Cognition and Survey Measurement, though in his paper Sirken does not actually describe the research. The NCHS lab lives up to the "collaborative" in its name; the number and caliber of academic researchers who have been involved in its projects are very high.

The growth of laboratory-based research on cognitive aspects of survey methodology is described by Dippo and Herrmann as a "movement" and by Sirken as a "revolution." These characterizations accurately reflect the enthusiasm and ferment of activity and new ideas in this area. However, "revolution" may not be the most useful metaphor to describe how cognitive psychology is affecting (or, more importantly, should affect) survey research. In fact, the metaphor of "revolution" reflects and reinforces a weakness of the work currently going on in the new cognitive laboratories.
By emphasizing discontinuity with the past, researchers are led to ignore relevant work which preceded many of the methods and ideas of the current "movement." Sirken characterizes survey research as (until recently) "based almost exclusively on the behaviorist paradigm" with "respondent's mental states... virtually ignored." This isn't accurate. Survey researchers, at least those practicing in academic or commercial settings, have hypothesized about and investigated psychological states intervening between survey questions and respondents' answers at least since World War II. (Jean Converse's Survey Research in the United States: Roots and Emergence, 1890-1960 provides a fascinating and useful history which traces the intellectual origins of survey research.) Much of this work is still very relevant, and should be built on rather than ignored. For example, Dippo and Herrmann state that, "except for social desirability, the survey field is just beginning to investigate factors that affected communication of responses." They would benefit from reviewing the survey literature on the topic of communication, beginning with Herbert Hyman et al.'s comprehensive Interviewing in Social Research, published in 1954.

The methods used in the cognitive laboratories also have roots in the past. For example, Naomi D. Rothwell used very similar methods to conduct research on questionnaire design at the Census Bureau during the 1960s and 1970s. It is a bit of an overstatement for Sirken to claim in his paper to have invented the cognitive laboratory, without acknowledging similar, earlier activities. In the field of survey research, there is a tradition of applying ideas from psychology to survey measurement issues. For the new work in the cognitive laboratories to advance the state of the art of survey measurement, it should build on this tradition. This would also increase its credibility to many survey researchers.

A danger of the "revolution" metaphor is that it suggests a philosophy of "out with the old, in with the new." In some cases, this leads researchers to forget what they know about good survey practice. Compared to a survey, the cognitive laboratories generally rely on more intensive, less structured interviews with smaller numbers of respondents. This approach can be very informative about the nature and sources of cognitive errors in surveys. However, the "samples" usually are very small and not selected according to probability methods. One must be cautious in drawing inferences from the results of most of the cognitive lab studies to date. For instance, I think Dippo and Herrmann are overstating the case when they conclude that, "research done at BLS shows clearly that proxy recall is different than self recall, both in terms of amount and kinds of information recalled." Laboratory findings such as this are more usefully thought of as hypotheses which should be subjected to more rigorous testing in a sample survey, and/or experimentally.

It is important to keep in mind that standards of evidence and proof still apply to research conducted in the cognitive laboratories. In some writings, the word "cognitive" is repeated so often as to suggest that the writer believes the word itself is sufficient to establish the merits of the research. But the researcher is still obliged to make his or her case on the evidence. For example, Sirken presents an example of a question on marijuana use which he says was improved by cognitive testing. How do we know it is better?
He presents no evidence or logic to support his claim. In the long run, if the cognitive "movement" is to be taken seriously, it must demonstrate, not simply assert, the value of its products, and be wary of the temptation to oversell itself.

I believe there are two common goals behind the activities in the cognitive laboratories. One goal is to improve particular survey measurements. The second is to develop a theoretical foundation (beyond sampling theory) for improved survey design. The latter, broader aim requires that we develop better measures of nonsampling errors, and a better understanding of the effect of alternative survey designs on nonsampling errors. Methods and ideas from cognitive psychology are tools for achieving both specific and general improvements, but are not an end in themselves. Other social sciences (for example, social psychology) also have relevant knowledge to contribute.

With these goals (and the previously stated cautions) in mind, what then is new and revolutionary about the work being done in the cognitive laboratories? First, this research has yielded new appreciation of the vulnerability of factual survey questions to biases and errors. I think it is fair to say that most government statisticians and academic survey methodologists probably have taken for granted the validity of simple factual questions. The research on problems of comprehension, recall and other cognitive difficulties is contributing to a more sophisticated understanding of how much we have yet to learn about the error properties of survey measurements.

Second, and more important, the research in the cognitive labs represents a new and more extensive set of methods for pretesting survey questionnaires and procedures. This in itself is a great leap forward. Traditionally, pretests of survey questionnaires have been ad hoc and informal, based on interviews with a few respondents and with no real guidelines beyond common sense to decide when one has succeeded or failed. The cognitive laboratories are changing that. Close and in-depth examination of problems of respondent comprehension, recall, and judgment is shedding new light on the causes of these problems and (better yet) new ideas about how to correct them. The new methods which are being used and developed in the cognitive laboratories form a logical series of pretests prior to fielding a survey, proceeding from intensive, informal interviews, to small-scale experiments testing alternative questions or designs, to large-scale field experiments. In addition, as Cathryn Dippo points out in her remarks, testing can be integrated into the main survey itself, to provide ongoing information about nonsampling errors. The new methods thus make possible a more scientific and systematic approach to pretesting, and they promise to yield improvements in the quality of data collected by the federal government.

DISCUSSION

Murray Aborn
National Science Foundation (retired)

I am grateful to my co-discussant, Elizabeth Martin of the Census Bureau, for providing the perfect lead-in to my own commentary on the papers presented at this session. Dr. Martin reminded us of the importance of viewing any disciplinary development from the perspective of its historical predecessors, and in this connection she succeeded in moving the advent of CASM (Cognitive Aspects of Survey Methodology) -- writ large -- back several decades from the year most commonly cited as the date of its birth -- namely, 1980.
More consequential than revising our perception of the chronology of CASM (again writ large) is the difference Dr. Martin's remarks point up between the characterization of CASM in the paper presented by Cathryn Dippo and Douglas Herrmann of the Bureau of Labor Statistics, and the one presented by Monroe Sirken of the National Center for Health Statistics. Dr. Martin's remarks implicitly characterize CASM as a reawakening of old concerns, and thus place her in strong agreement with Dippo and Herrmann's labeling of CASM as a "movement," in contrast with Sirken's labeling of CASM as a methodological "revolution." Indeed, there is much to support the view of CASM as a movement; for instance, the enthusiasm of its adherents and the growing frequency with which its ideology is being endorsed by sectors of the statistical community and users of statistical data generally who have heretofore tended to ignore the psychosocial underpinnings of survey-taking (see, for example, Suchman and Jordan, 1990).

However, this does not mean that Sirken's description of CASM as representing a revolutionary development is totally incorrect. It may merely be premature, for the potential of CASM as a true breakthrough -- as a true revolution in survey research -- is clearly present in the programmatic and research agenda laid out for it in the seminal CASM document prepared by the National Academy of Sciences (see Jabine, et al., 1984). At the present time, only half the CASM prospectus is being actively pursued; namely, those objectives having to do with the adoption of certain recent advances in cognitive science into the survey design and instrumentation process. What we have seen little of to date is action on those objectives having to do with the use of surveys as naturalistic test beds for laboratory-based theories of the functioning of the neuronal mind and, ultimately, the emergence of a new paradigm for social/behavioral research in which survey-taking plays an important role in understanding such basic cognitive phenomena as how the brain stores memories and how mental imagery influences perception and recall, and in which developments in cognitive science relating to such branches of the field as natural language semantics are used to produce greatly improved methods for achieving high-quality survey measurement.

In other words, fulfillment of the "cognitive revolution" alluded to in Monroe Sirken's paper is clearly in prospect, but is yet to materialize. I shall have a bit more to say on this subject at the close of my commentary; meanwhile, however, it is my opinion that much of the force behind Dr. Martin's view of CASM as a reawakening of old survey concerns -- as a "movement" more so than a "revolution" -- stems from the present truncated status of the programmatic agenda initially prescribed for the field. This gives CASM the appearance of a one-sided effort to adopt, in fairly superficial terms, some of the investigative techniques employed in recent laboratory-based cognitive psychology, and incorporate them in the conventional procedures for constructing and pretesting survey questionnaires.
Under such a perspective, not much may appear to have been added to what has long been known to be of influence in survey responding, and audiences such as the one attending the present session may rightfully feel that CASM amounts to little more than another real-life example of the familiar tale of "The Emperor's New Clothes" which, albeit a story from the literature of childhood, embodies a profound adult theme concerning human gullibility and our tendency to accept uncritically what experts -- genuine and otherwise -- tell us is true, novel, or significant.

Now, let me examine the Emperor's New Clothes proposition against the CASM-engendered activities at the BLS and NCHS laboratories reported in the papers by Dippo and Herrmann and by Sirken. Reducing a sample of these activities to their most generic properties (in the sense of survey factors which induce response error), I would break them down into the following classification:

Collection Procedures Research Laboratory (BLS)
- Question Ambiguity (the extent to which a question may be interpreted in more than one way)
- Long-term Recall (the length of time over which the respondent is required to retrieve from memory)
- Emotional Loading (the degree of psychological stress which a question may place upon the respondent)
- Subcultural Norms (question comprehensibility across ethnic subgroups)
- Social Desirability (the extent to which a question is likely to elicit a normative rather than an idiopathic response)

Questionnaire Design Research Laboratory (NCHS)
- Question Wording and Order (the differential results induced by synonymous variation and rearrangement of sequence)
- Memorial Decay (the validity -- or veridicality -- of information supplied from short- and long-term memory)
- Affective Sensitivity (the likelihood that a question may be embarrassing or impinge upon the respondent's privacy)
- Linguistic Complexity (the effect of grammatical construction on the respondent's ability to comprehend)
- Lexical Level (the extent to which a question requires the respondent to have specialized -- in this case medical -- knowledge)

Now, it is hard to believe that the many survey researchers trained in social psychology and cognate fields of social science are oblivious to influences -- such as those charted above -- regardless of whether intellectual, technical, and/or cost factors make it impractical to subject such nonsampling sources of error to adequate control, or to estimate the proportion of total survey error due to their ubiquitous presence. To take the phenomenon of Social Desirability, for example, it does not require a social scientist to comprehend the universal tendency of people to present a societally acceptable facade when questioned about attitudes and behavior. The popular press and many humorous books have for decades poked fun at surveys by ridiculing the informational value of asking such survey items as, "Do you bathe at least once a week?" or "Do you brush your teeth every day?"

To take some other examples, did it require CASM to alert survey researchers to the difference in results when a question is phrased one way as opposed to another? Or to the difficulty of most respondents to deal with questions presented in grammatically complex form? Or to the impingement of certain areas of questioning on the sensitivity of respondents? Or to memorial decay over time? Or to a respondent's understanding of questions embodying medical terminology?
I can't resist regaling the audience with a personal anecdote illustrating how ordinary, and even old-fashioned, if you will, is the appreciation of the fact that few individuals not trained or highly educated in medicine can comprehend medical lexicography, and that one is apt to get ludicrous results from asking questions embodying medical terminology. More than 25 years ago, when employed at the National Institute of General Medical Sciences, I shared an office with a public health epidemiologist who had just returned from a tour of duty in Puerto Rico. He told me of an effort to obtain data on the extent of interruption to normal life activities due to amoebic dysentery, which was then prevalent in most rural areas of Puerto Rico. Having never before conducted a survey, his group of public health officials put together a series of questions utilizing such terms as diarrhea and defecation to get estimates of frequency. When the obtained results showed an average of only one to two bowel movements per day, the survey takers knew something was wrong and quickly realized that it was likely due to the language employed in identifying the disease. The Public Health people reran a small subsample of respondents using the term "bowel movement" in the questionnaire, and obtained a slightly higher, but still medically incredible, estimate of frequency. Finally a native informant suggested that they phrase all questions pertaining to diarrhea in terms of La Mange or "The Curse," as it was known in the rural areas of the island, and when they did this, the average reported frequency shot up to a more medically believable 11 or 12 occurrences per day.

If sheer knowledge that such variables as level of lexical comprehension, differences in subcultural norms, and the tendency to respond in socially desirable ways are sources of error in survey research is nothing new, what, then, is it that is truly new about the CASM movement? There are, to my mind, three major issues that have been brought to the fore by the CASM movement, coupled with the addition of new technical procedures which have proved powerful in cognitive research in psychology and artificial intelligence. And, as I have mentioned before and will emphasize at the close of my remarks, there is the potential for bringing about a truly interdisciplinary effort to understand just what goes on in the interactional dynamics between survey and respondent.

The three major issues which have surfaced as a result of CASM are:

1. A reawakening of the essential conflict between survey questionnairing and ordinary conversation, owing to the need for artificially imposed standardized conditions of administration from the standpoint of survey statistics on the one hand, and the natural world existence of individual differences in mentality on the other.

2. The extent to which laboratory-based treatments and results can be transferred to the field in the case of survey-taking. This issue is of general importance to social science, as well as being particularly relevant to survey research insofar as the laboratory setting, which provides greater conditions of control and flexibility, creates possibilities for a more systematic approach to instrumentation, and hence to survey measurement.

3. The degree to which the contemporary shift in the underlying paradigm of survey research's cognate substantive discipline -- i.e., psychology -- requires a realignment away from behaviorism and toward cognition.
CASM represents a bold attempt to test this issue and assay its yield, but there has thus far been far too little involvement of cognitive psychology per se apart from the importation of certain investigative techniques.

I by no means wish to detract from the accomplishments reported in the papers by Dippo and Herrmann and by Sirken based upon the importation of the techniques employed in contemporary cognitive psychology into the innovative laboratory facilities now ensconced in two such prestigious governmental agencies as BLS and NCHS. Much thought and expertise have been applied to the transfer of technology represented by the successful adoption of such cognitive probes and methods as: (1) Focus Groups; (2) Part-set Cueing; (3) Protocol Analysis; and (4) Think-aloud Procedures. But in my opinion, this could be just the beginning of a truly revolutionary development in survey research and, through its influence, on social science more broadly. The laboratory-based techniques and procedures you have heard presented at this session are derived from research begun in the early 1960's by Nobel Laureate Herbert Simon and Allen Newell that resulted in the General Problem Solver and led to the foundations of the field of artificial intelligence (Barr and Feigenbaum, 1982). The more recent work of Simon (Simon, 1987) shows the even greater potential of cognitive technology to uncover human information processing systems.

However, there is reason to be both pessimistic and optimistic about the future of CASM. On the one hand, the statistical framework of survey research -- the dominant framework for the field -- is concerned with drawing inferences about populations -- about whether the sample of a population is large and representative enough to permit accurate and valid conclusions to be reached about the distribution of characteristics in the population from which the survey sample was drawn. On the other hand, the cognitive framework is concerned with drawing accurate and valid inferences about individuals -- about respondent "truthfulness," if you will. Therefore, one framework calls for instrumentation designed to enhance person-to-person comparability, while the other calls for instrumentation designed to enhance the assessment of person-to-person variations on each survey variable.

It is the work of the two survey/cognitive research laboratories reporting here today that represents one of the two reasons I find for optimism about the future of CASM. Such facilities offer the best opportunities for reconciling the conflicting survey conceptual frameworks described above. The other reason for optimism lies in the pronouncement appearing in a neuropsychological book which has become a national bestseller in addition to its importance to the scientific literature on brain-behavior relationships. I refer to -- and endorse to you as top-quality literature as well as a work of cognitive science importance -- Oliver Sacks' The Man Who Mistook His Wife for a Hat. I close my remarks by quoting from a passage in this work that, I believe, should stimulate cognitive scientists to become fuller participants in CASM, recognizing that survey centers and facilities are ideally suited to cognitive explorations and offer the prospect of a vital new interdiscipline.
After presenting and analyzing the case of The Man Who Mistook His Wife for a Hat, Sacks concludes, as I do here, that:

cognitive sciences are themselves suffering from an agnosia similar to the one afflicting the man who mistook his wife for a hat. That man may thus serve as a warning and parable of what happens to a science which eschews the judgmental, the particular, the personal, and becomes entirely abstract and computational (Sacks, 1987, p. 20).

I hope that cognitive psychologists will take heed of Dr. Sacks' warning and see the opportunity that survey research offers to offset the present trend toward abstract computationalism.

References

1. Suchman, L. and Jordan, B., "Interactional Troubles in Face-to-Face Survey Interviews," JASA, Vol. 85, No. 409, pp. 232-253, 1990.

2. Jabine, T., Straf, M., Tanur, J., and Tourangeau, R. (eds.), Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines, Washington, D.C.: National Academy Press, 1984.

3. Barr, A. and Feigenbaum, E.A. (eds.), The Handbook of Artificial Intelligence, Stanford, CA: Heuristech Press, 2:184-192, 1982.

4. Simon, H., "The Steam Engine and the Computer: What Makes Technology Revolutionary," EDUCOM Bulletin, 22(1):2-5, 1987.

5. Sacks, O., The Man Who Mistook His Wife For A Hat, New York: Harper and Row, p. 20, 1987.

Reports Available in the Statistical Policy Working Paper Series

1. Report on Statistics for Allocation of Funds (NTIS Document Sales, PB86-211521/AS)
2. Report on Statistical Disclosure and Disclosure-Avoidance Techniques (NTIS Document Sales, PB86-211539/AS)
3. An Error Profile: Employment as Measured by the Current Population Survey (NTIS Document Sales, PB86-214269/AS)
4. Glossary of Nonsampling Error Terms: An Illustration of a Semantic Problem in Statistics (NTIS Document Sales, PB86-211547/AS)
5. Report on Exact and Statistical Matching Techniques (NTIS Document Sales, PB86-215829/AS)
6. Report on Statistical Uses of Administrative Records (NTIS Document Sales, PB86-214285/AS)
7. An Interagency Review of Time-Series Revision Policies (NTIS Document Sales, PB86-232451/AS)
8. Statistical Interagency Agreements (NTIS Document Sales, PB86-230570/AS)
9. Contracting for Surveys (NTIS Document Sales, PB83-233148)
10. Approaches to Developing Questionnaires (NTIS Document Sales, PB84-105055/AS)
11. A Review of Industry Coding Systems (NTIS Document Sales, PB84-135276)
12. The Role of Telephone Data Collection in Federal Statistics (NTIS Document Sales, PB85-105971)
13. Federal Longitudinal Surveys (NTIS Document Sales, PB86-139730)
14. Workshop on Statistical Uses of Microcomputers in Federal Agencies (NTIS Document Sales, PB87-166393)
15. Quality in Establishment Surveys (NTIS Document Sales, PB88-232921)
16. A Comparative Study of Reporting Units in Selected Employer Data Systems (NTIS Document Sales, PB90-205238)
17. Survey Coverage (NTIS Document Sales, PB90-205246)
18. Data Editing in Federal Statistical Agencies (NTIS Document Sales, PB90-205253)
19. Computer Assisted Survey Information Collection (NTIS Document Sales, PB90-205261)
20. Seminar on the Quality of Federal Data (NTIS Document Sales, PB91-142414)

Copies of these working papers may be ordered from NTIS Document Sales, 5285 Port Royal Road, Springfield, VA 22161, (703) 487-4650.

1. David A. Pierce is Senior Statistician, Micro Statistics Section, Division of Research and Statistics, Federal Reserve Board, Washington, DC 20551, and a member of the Federal Committee on Statistical Methodology and its Subcommittee on Data Editing in Federal Statistical Agencies. Any views expressed do not necessarily reflect those of the Federal Reserve System.

2. The sampling design in the original CATI sample was stratified simple random sampling. The reinterview sample was a random sample of CATI respondents within strata. The bias was approximated by expanding the difference between the reconciled and CATI responses at the sample unit level.
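For readers unfamiliar with this kind of reinterview calculation, a minimal sketch of how such an expanded unit-level difference is commonly written, assuming stratified simple random sampling with a reinterview subsample in each stratum; the notation below is illustrative and is not taken from the original paper:

$$
\widehat{B} \;=\; \sum_{h=1}^{H} \frac{N_h}{m_h} \sum_{i=1}^{m_h} \left( y_{hi}^{\mathrm{rec}} - y_{hi}^{\mathrm{CATI}} \right),
$$

where $N_h$ is the frame count for stratum $h$, $m_h$ is the number of reinterviewed (reconciled) units in that stratum, and $y_{hi}^{\mathrm{rec}}$ and $y_{hi}^{\mathrm{CATI}}$ are the reconciled and original CATI responses for unit $i$. Each unit-level difference is expanded by the stratum weight $N_h/m_h$ and summed, giving an approximate total bias; dividing by the corresponding expanded CATI total would yield a relative bias if that were the quantity of interest.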