[GRAPHIC] \WP14CVR.GIF MEMBERS OF THE FEDERAL COMMITTEE ON STATISTICAL METHODOLOGY (June 1986) Maria E.Gonzalez (Chair) Office of Management and Budget Barbara A. Bailar William E. Kibler Bureau of the Census National Agricultural Statistics Service Yvonne M. Bishop David Pierce Energy Information Federal Reserve Board Administration Edwin J. Coleman Thomas Plewes Bureau of Economic Analysis Bureau of Labor Statistics John E. Cremeans Jane Ross Office of Business Analysis Social Security Administration Zahava D. Doering Wesley L. Schaible Defense Manpower Data Center Bureau of Labor Statistics Daniel E. Garnick Fritz Scheuren Bureau of Economic Analysis Internal Revenue Service Terry Ireland Monroe G. Sirken National Security Agency National Center for Health Statistics Charles D. Jones Thomas G. Staples Bureau of the Census Social Security Administration Daniel Kasprzyk Robert D. Tortora Bureau of the Census National Agricultural Statistics Service PREFACE The Federal Committee on Statistical Methodology was organized by OMB in 1975 to investigate methodological issues in Federal statistics. Members of the committee, selected by OMB on the basis of their individual expertise and interest in statistical methods, serve in their personal capacity rather than as agency representatives. The committee conducts its work through subcommittees that are organized to study particular issues and that are open to any Federal employee who wishes to participate in the studies. Working papers are prepared by the subcommittee members and reflect only their individual and collective views. The Subcommittee on Statistical Uses of Microcomputers in Federal Agencies organized a one-day workshop held on April 24, 1985. This working paper is based on the workshop and discusses four topics: planning to buy and use microcomputers for statistical purposes; electronic data dissemination; applications of microcomputers; and expert systems. The report is intended to provide helpful guidance to Federal agencies in purchasing and using microcomputers for statistical purposes. The Subcommittee on Statistical Uses of Microcomputers in Federal Agencies was chaired by Terry Ireland of the National Security Agency, Department of Defense. MEMBERS OF THE SUBCOMMITTEE ON USES OF MICROCOMPUTERS IN FEDERAL AGENCIES Terry Ireland*, Chair National Security Agency Ken Berkman Michael Leszcz Bureau of Economic Analysis Internal Revenue Service Jay Casselberry Tom Nagle Energy Information Administration Internal Revenue Service Frederick J. Cavanaugh Ronald Steele Bureau of the Census National Agricultural Statistics Service Lawrence H. Cox Peter Stevens Bureau of the Census Bureau of Labor Statistics Richard Engels Linda Bouchard Taylor Bureau of the Census Internal Revenue Service Maria E. Gonzalez* (ex officio) Mark Winer Office of Management and Budget Office of Management and Budget *Member, Federal Committee on Statistical Methodology ACKNOWLEDGEMENTS The idea of a workshop as a focal point for proceedings on Statistical Uses of Microcomputers was suggested by Maria Gonzalez, Chairperson of the Federal Committee on Statistical Methodology. She also provided contacts in many Federal agencies, which made possible a broad Federal participation in the workshop. The planning of the workshop was done by the Subcommittee. Four topics were selected for the sessions of the workshop. The chairpersons designated by the Subcommittee organized each session. 
They were: Chairperson Session on Planning Lawrence Cox, Bureau of the Census Session on Electronic Data Ken Berkman, Dissemination Bureau of Economic,Analysis Session on Applications Ronald Steele, National Agricultural Statistics Service Session on Expert Systems Terry Ireland, National Security Agency The proceedings were prepared by the chairpersons and rapporteurs of each session based on input from the speakers. The Subcommittee thanks all the speakers in the workshop for their participation. Terry Ireland, who chaired the Subcommittee, and Norman Glick edited the final report. Linda Taylor ably handled all the organizational and administrative details of the workshop the real basis for a very smooth-running conference. -iii- FEDERAL COMMITTEE ON STATISTICAL METHODOLOGY WORKSHOP ON STATISTICAL USES OF MICROCOMPUTERS IN FEDERAL AGENCIES April 24, 1985 TABLE OF CONTENTS Page Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i Members of the Subcommittee on Statistical Uses of Microcomputers . . . . . . . . . . . . . . . . . . . . . . . . . . .ii Acknowledgments. . . . . . . . . . . . . . . . . . . . . . . . . . iii Introduction. MARIA E. GONZALEZ, Office of Management and Budget. . . . . . . . . . . . . . . . . . . . . . . . 1 Session on Planning. . . . . . . . . . . . . . . . . . . . . . . . . 3 Summary. Prepared by FREDERICK J. CAVANAUGH, Bureau of the Census. . . . . . . . . . . . . . . . . . . . . . 3 Introduction. LAWRENCE H. COX, Bureau of the Census. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 The Census Bureau Microcomputer Information Center. RONALD SWANK, Bureau of the Census . . . . . . . . . . 6 The National Security Agency Personal Computing Information Center. KATHY SCHNAUBELT, National Security Agency . . . . . . . . . . . . . . . . . . . . . . . .11 II Use of Microcomputer Technology at the Bureau of Labor Statistics. PETER STEVENS, Bureau of Labor Statistics. . . . . . . . . . . . . . . . . . . . . . . .13 Discussion. LAWRENCE H. COX, Bureau of the Census. . . . . . . . . . . . . . . . . . . . . . . . . . . . .23 Questions,and Answers . . . . . . . . . . . . . . . . . . . . .25 Session on Electronic Data Dissemination . . . . . . . . . . . . . .29 Summary. Prepared by JAY CASSELBERRY, Energy Information Agency. . . . . . . . . . . . . . . . . . . . . . .29 Use of Microcomputer Disks to Disseminate Information. STUART WEISMAN, National Technical Information Service . . . . . . . . . . . . . . . . .29 Cendata: Development and Implementation. BARBARA ALDRICH, Bureau of the Census . . . . . . . . . . . . .34 Electronic Dissemination of Perishable Information. ROXANNE-WILLIAMS, U.S. Department of Agriculture . . . . . . . . . . . . . . . . . . .38 Questions and Answers . . . . . . . . . . . . . . . . . . . . .40 Session on Applications. . . . . . . . . . . . . . . . . . . . . . .45 Summary. Prepared by THOMAS NAGLE, Internal Revenue Service . . . . . . . . . . . . . . . . . . . . . . . .45 -iv- Spreadsheets and Statistical/Econometric Applications in Econometric Research. LINDA P. ATKINSON, U.S. Department of Agriculture . . . . . . . . . .46 Spreadsheets and Data Base Applications Used by the Crop Reporting Board in Reviewing Survey Indications and Preparing Publications. GARY NELSON, U.S. Department of Agriculture. . . . . . . . . . . . .50 Manager's Perspective on the Acquisition and Use of Microcomputer-Based Graphics Packages. RICHARD W. HAYS, Internal Revenue Service . . . . . . . . . . 
.51 Current Applications of UNIX-Based Microcomputer Systems. BRIAN CARNEY, U.S. Department of Agriculture . . . . . . . . . . . . . . . . . . . . . . . . . .54 Equipped for the Future? PAUL DOBBINS, U.S. Department of the Treasury. . . . . . . . . . . . . . . . . . .56 Concerns About Data Integrity, Security, and Accessibility in an Environment Where Microcomputers and Mainframes Are Interfaced. DICK SHIVELY, U.S. Department of Agriculture. . . . . . . . . .58 Questions and Answers. . . . . . . . . . . . . . . . . . . . . . . .61 Session on Expert Systems. . . . . . . . . . . . . . . . . . . . . .67 Summary. Prepared by NORMAN GLICK, National Security Agency . . . . . . . . . . . . . . . . . . . . . . . .67 Introduction. TERRY IRELAND, National Security Agency. . . . . . . . . . . . . . . . . . . . . . . . . . . . .69 Expert System Tutorial. GEORGE LAWTON, Army Research Institute. . . . . . . . . . . . . . . . . . . . . . .70 An Extension of Statistical Software to Expert Systems. JAMES J. FILLIBEN, National Bureau of Standards. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .78 Editing and Imputation. BRIAN GREENBERG, Bureau of the Census . . . . . . . . . . . . . . . . . . . . . . . . .85 Discussion. MARK WINER, Office of Management and Budget. . . . . . . . . . . . . . . . . . . . . . . . . . . . .93 Questions and Answers . . . . . . . . . . . . . . . . . . . . .94 Appendix. Announcement of Workshop on Statistical Uses of Microcomputers in Federal Agencies. . . . . . . . . . .97 -v- INTRODUCTION Maria E. Gonzalez, Office of Management and Budget A subcommittee of the Federal Committee on Statistical Methodology organized a one-day workshop on statistical uses of microcomputers in federal agencies. The purpose of the workshop was to share information among federal agencies on the statistical uses of microcomputers. About 200 persons from federal agencies attended the workshop. The audience had an opportunity to ask questions and make comments in the discussion period of each session. All were acquainted with the uses of microcomputers. Some were also responsible for the planning of statistical uses of microcomputers in their agencies. The announcement of the workshop is included in the Appendix. Four topics were discussed at this workshop. 1. Planning of Statistical Uses of Microcomputers. The first session described three microcomputer information centers in federal agencies. The purpose of personal computer (PC) information centers is to familiarize the agency users with the PC potentialities. This session focused on planning#, implementation, and evaluation within federal agencies of statistical uses of microcomputers. The main questions asked were: Who should have microcomputers? For what purposes should microcomputers be used? In what configurations? At what costs? How will microcomputers coexist with central automatic data processing services? 2. Electronic Data Dissemination. This session dealt with different data dissemination methods. The discussion covered each agency's approach to data dissemination and the problems encountered in implementation. 3. Applications of Microcomputers. This panel discussion focused on the usefulness and weaknesses of microcomputer software and operating systems, the interface of mainframes and microcomputers, and factors affecting data integrity, security, and accessibility. 4. Expert Systems The methodological basis for expert systems was discussed and several examples were given. 
The examples describe current expert systems with statistical applications.

The proceedings of this one-day workshop follow. For each session there is a summary, the presentations, and the discussions that followed.

-1-

SESSION ON PLANNING

SESSION SUMMARY*

The microcomputer technology of the 1980s is a personal and, therefore, a user-oriented technology. However, planning for microcomputer technology is often very complex and causes many changes in the workplace. Program planners must take many factors into account when planning the introduction of a microcomputer system into their organization. Three personal computer information centers were described:

The Census Microcomputer Information Center of the Bureau of the Census
The Personal Computer Information Center of the National Security Agency
The microcomputer system of the Bureau of Labor Statistics

The planning, management and evaluation of microcomputer technology at the Census Bureau officially began in 1983 with a meeting of the Executive Staff. Prior to that time, microcomputer technology testing and evaluation work was ongoing at the Census Bureau, but this was the first time that agency-wide distribution of microcomputers was discussed. The Census Microcomputer Information Center (CMIC) was established as a result of this meeting. To give greater emphasis to the importance of microcomputer technology, the Census Bureau located the Center in the Office of the Director, with its manager reporting directly to the Associate Director for Administration.

The purposes of CMIC are to assist employees in learning about microcomputer technology -- both from a user point of view and a manager/procurer point of view -- and to reduce the overall costs of microcomputer technology purchase and maintenance. Employees are given access to various brands of hardware and software to test prior to purchasing. They are also given "hands-on" experience in the use of the newest in microcomputer hardware and software through special arrangements made with the various vendors and manufacturers. On-site training in the use of hardware and software is provided by outside trainers, with the divisions paying the costs for their employees. Costs currently range from about $100 to $125 per person per day, which are quite favorable in comparison with commercial costs of similar training.

----------------------
*Frederick J. Cavanaugh, Bureau of the Census.

-3-

The activities of the National Security Agency's Personal Computer Information Center (PCIC) started approximately 18 months ago, when NSA established the PCIC to train employees in the use of PCs and vendor-developed software. It did not take long to discover problems of compatibility among various brands of microcomputers. Therefore, standards were established to ensure that:

1. All microcomputer systems at NSA are compatible with one another for effective communications and portability.
2. All systems are able to function using the UNIX operating system -- again, to allow for communications and portability.
3. The microcomputer systems are supportable; that is, they must be easily and cheaply repaired.
4. The systems are secure, so as not to divulge secret information.

NSA has set its microcomputer standards around the IBM PC and PC/XT in a UNIX-based environment (IBM's PC/IX) and its office automation standards around the Wang PC.

BLS's microcomputer system is essential for efficient office operation, and BLS has kept this in mind in designing and developing its system.
The BLS Executive Staff is very supportive of the microcomputer system. In designing the microcomputer system at BLS, several critical needs have to be met. These include:

1. The need for a system that can readily provide terminal communication with mainframe computers.
2. The need for a system capable of communicating among various machines and those located in field offices as well.
3. The need to provide security for confidential information.

BLS undertook research and experiments to determine which microcomputer system best met its needs. Upon completion of the research, a single system comprised of machines from a single manufacturer was implemented and a set of standards was developed around its operation and use. The present system includes over 100 IBM PC/XTs and three Ethernet (FIPS 107) local area networks.

The microcomputer systems described in the presentations form a continuum from the experimental or user-oriented approach to the more standard production or program-oriented approach. However, despite a commonality of needs and objectives, each agency has chosen a different approach to planning and managing microcomputer technology.

-4-

INTRODUCTION
Lawrence R. Cox, Bureau of the Census

Welcome to the Workshop on Statistical Uses of Microcomputers in Federal Agencies, sponsored by the Federal Committee on Statistical Methodology. We begin with this session on planning.

Microcomputer technology is the technology of the 1980's. It is a personal and, therefore, a user-oriented technology. However, its focus on the individual often can be misleading from a planning perspective -- at the agency or office level, planning and managing the use of microcomputer technology becomes very complex very fast. While encompassing important technical issues concerned with hardware, software and communications networks, this technology also quickly brings the planner face-to-face with the business of managing and deriving improvements systematically from technological change.

Inevitably, the introduction of microcomputers into an organization changes the workplace and the skills and orientation of workers. It presents new choices and often demands that these be made swiftly. In large organizations and small offices, the following questions must be addressed:

- where does microcomputer technology fit into the agency or office?
- how should it be introduced?
- how can the organization experiment and grow with this technology?
- what must the organization do to plan and manage this technology effectively?
- should standards be set for its use? which standards? how should they be set? by whom? how should they be enforced?
- what sort of future decisions need to be made, and who should make them?

We are fortunate today to have a panel of experts in this field, whose experience should shed light on answers to these and other important questions facing the statistical program manager about to embark on the introduction of microcomputers into his or her organization. They speak with the experience of individuals tasked with managing groups assigned these responsibilities in three different Federal agencies: the Bureau of the Census, the National Security Agency, and the Bureau of Labor Statistics.

-5-

The speakers are:

- Mr. Ronald Swank, Manager, Census Microcomputer Information Center, Bureau of the Census.
- Ms. Kathy Schnaubelt, Chief, Information Resources and New Technology Branch, National Security Agency; and
- Mr.
Peter Stevens, Chief, Division of Communications and Computing Technology, Bureau of Labor Statistics. Until recently Kathy was Chief of NSA's Personal Computer Information Center and had direct responsibility for the functions we will discuss this morning. Ron and Peter have had these responsibilities on a continuing basis for some time. Each speaker will make a brief presentation on how the problem of planning the use of microcomputers was addressed in their agency. I will follow with a few comments by way of formal discussion, and we will then open the floor for discussion and questions from the audience. THE CENSUS BUREAU MICROCOMPUTER INFORMATION CENTER Ronald Swank, Bureau of the Census The words "microcomputer" and "personal computer" are often used in a manner that blurs their intended use. In the true sense of the word "microcomputer," the Bureau of the Census has been using microcomputers since 1968. The FOSDIC (Film Optical Sensing Device for Input to Computers) allowed us to film and input census forms to computers without manual data entry. In 1973 we attached IBM 6250 tape drives to Sperry mainframes, approximately three years.before Sperry announced similar availability, and experimented with the ATL automated tape library. In 1982, eight Apple II+ personal computers were used to do the Puerto Rican Economic and Agriculture Census data checking and editing. These projects established the feasibility of using microcomputers in much of the Bureaus work. The Bureaus organization (3500 employees at headquarters, 9000 nationwide) can best be described as 35 separate companies (divisions, in our parlance) sharing the same resources, computers, management services, etc. You can imagine the problem this presents in setting priorities, standards and general directives. All of the Census Bureaus funding does not come directly from Congressional appropriations. So there has been a great deal of discussion on:the best way of introducing microcomputer technology to the Census user community, funding it and not intimidating or alienating Bureau users. In 1983, a joint decision was made to establish the Census Microcomputer Information Center (CMIC). The Center with a staff of 4 was placed organizationally in the Director's Office for two reasons: (1) to show Executive staff support for technology and.encourage users to make -6- active efforts to become familiar with its capabilities and (2) to avoid turf battles. The CMIC is a clearinghouse of information for use by Census Bureau employees. The goals and objectives of the CMIC are to: - assist Bureau employees in their analysis of microcomputers; -provide access to and demonstrations of a variety of hardware, software and peripherals; -provide hands-on experience with microcomputers without capital investment by the individual divisions; -provide training on microcomputer hardware and software; -provide a clearinghouse for documentation, catalogs, and pointers to knowledge for microcomputers, end-user computing and office automation; - decrease the cost of hardware and software through more informed procurement decisions. With the direction of program managers, Census Bureau employees may visit the Center for information about microcomputers, for discussions of the characteristics of particular computers and the applicability of microcomputers to projects, or for hands-on experience on a variety of machines in an attempt to implement those projects. 
One can use the computers in the Center for weeks if necessary, experimenting with various software on different machines. The role of the Center is to help Census Bureau staff define their processing needs, advise them of applicable software and guide them towards suitable computer equipment. The CMIC contains the more popular microcomputers and the more popular software. Yet, there are significant numbers of microcomputers that may provide a unique perspective in the industry and may offer the best overall systems for a particular problem. Therefore, the CMIC also sponsors product demonstrations about those microcomputers that are not currently on display in the Center. CENTER OPERATION/USE The Center's hours of operation are 9:00 a.m. to 4:00 p.m. Census Bureau personnel may schedule time to use a particular machine, software package, tutorial or specialized peripheral device for one- hour segments. They may also request one of the Center support personnel to work with them. We generally have personnel in the Center from 7:00 a.m. to 5:30 p.m. Time before and after hours of operation is devoted to Center personnel, allowing us to gather and exchange information on the day's occurrences and to provide specialized support to executives. -7- Some of the typical questions arising on a given day may be as simple as: What's the difference between a hard disk and a floppy? How can I get specific information on a specific product and its capabilities? What kind of tutorials/training are available for Lotus, dbase, etc.? Why doesn't a package perform in a specific manner? WHAT's AVAILABLE IN THE CENTER The Center subscribes to approximately 40 periodicals dealing primarily with microcomputers and associated technology. About 20% of these magazines are provided free. Also the Center has a library of 300+ books dealing with microcomputers, hardware, software, peripheral devices, etc. These books are directed at all levels of personnel. The magazines and books are available for checkout by Census Bureau employees. The Center subscribes to Data Pro for microcomputer hardware and software. There are numerous other vendor- or industry-provided catalogues available for review in the Center: IBM Personal Computer and XT Software Guide The Blue Book for IBM Engineering and Scientific Progress The Book of Apple Software The Ratings Newsletter IBM Software Directory Many of the supply and peripheral device catalogues are provided by vendors Public domain software is available in the Center. Most of it was acquired from Capital-PC for IBM's and compatibles and the Freeloader 500 software for the Apple machine. This software is not copyrighted and is available for the cost of reproduction. We have found many useful utilities available that have saved our users much development time. Microcomputer software in the following categories is available: Communications software Mathematical Database management systems Specialized Electronic spreadsheets Statistical Integrated software Word processing Presentation graphics Utilities Programming languages This software is available for user evaluation. The end user determines whether the product will produce the required results. About 15% of our software was provided by vendors for use in the Center -- but only in the Center -- for evaluation purposes and not for production work. The Bureau's policy on copyrighted software is that it is not to be copied for any reason other than backup. -8- HARDWARE There are 2 IBM PC/XT's hooked to a local area network. 
A Sperry Model 50, a Wang PC a Grid, Apple Macintosh, peripheral devices, plotters,- printers, Polaroid palette, etc. are available. Many microcomputer vendors (43 to date) have come to the Census Bureau to demonstrate their products, and many have loaned their products for evaluation from 30 to 60 days, depending on product. Some of the vendors are: A & F Computers Sony Digital Equipment Fujitsu Olivetti Motorola Hewlett-Packard Exxon Data General Radio Shack ELECTRONIC BULLETIN BOARD In February 1985, an electronic bulletin board was placed into service to facilitate information interchange on product evaluations, user projects, etc. PROCUREMENT POLICY The Census Bureaus procurement policy evolved because of our organizational structure and our funding. While all procurement actions are to be processed and controlled through the Bureaus Procurement office, requests for ADP-related actions will continue to require some specialized processing. The justification and acquisition approval for microcomputer equipment and off-the-shelf software and supplies totaling less than $10,000 is delegated to the Associate Director level. The ADP staff no longer is required to review and approve such purchases. When the purchase order is sent to the Procurement Office, requests for sole source and brandname purchases costing more than $500 must include a brief justification. When the purchase order is received in the Procurement Office, information copies are forwarded to the Census Microcomputer Information Center to be used to update the Census Bureau inventory of microcomputer equipment and software. MICROCOMPUTER MAINTENANCE POLICY The Census Bureau's microcomputer maintenance policy is based on cost. For every six machines purchased we purchase a spare machine because the cost of a one-year maintenance contract on the first six equals the cost of the. spare. These machines are not just stored; they are used in noncritical environments where they can be removed to replace a critical machine as needed within one hour. When a user encounters an equipment problem that is beyond the users capability to resolve, he or she contacts, our Technical Services Division (TSD) service representative who will respond by sending a technician to the user's site to isolate the cause of the equipment problem. -9- If it is something simple that the technician can repair on the spot (such as replacing a fuse, resealing a loose board, or tightening a plug), the technician will make the repair. If the problem cannot be resolved by the technician on site, the technician will telephone the CMIC to request that a replacement computer or input/output device be loaned to the user until the user's machine is repaired. TSD will set up the replacement equipment for the user (if necessary) and take away the machine that needs repair. The user should be able to resume normal operations with minimal delay, aggravation and frustration. If the device is still covered by its original warranty, TSD will arrange to have it repaired under the terms of the guarantee. If the warranty is no longer valid, TSD will arrange to take the machine to a designated dealer for a repair estimate. When the machine is left with the dealer, a hand receipt will be signed by the dealer and returned to TSD. When the dealer calls the estimate to TSD, TSD will prepare a purchase request and forward it to the user's division. 
The division will insert the appropriate accounting code, approve the action, place a priority flag on it, and send it to the Procurement Office. The Procurement Office will expedite all micro maintenance requests by calling the dealer with a purchase order number. When the repairs have been finished and the machine is ready for pickup, a driver will take the purchase order to the dealer and pick up the machine. This procedure is valid for any repairs totaling less than $1,000. In cases involving repair estimates in excess of $1,000, TSD will contact the microcomputer user to discuss whether the repairs should be authorized and, if so, what procedure must be followed. The loaner machine will be under the control of the CMIC with the following priorities governing their use: Top priority -- to any user where TSD has removed a machine for authorized repairs. Second priority -- for use in support of hands-on training classes sponsored by CMIC. Third priority -- for use by someone who wants to do small projects on a borrowed machine. Priority will mean exactly that A broken machine will be replaced with a loaner from the CMIC even if it means having to take the loaner away from someone who is using it under a lower priority. I want to emphasize that this is our current policy, but it can be changed very quickly. We are constantly monitoring this procedure and continually reassessing our options (i.e., outside service contract). MICROCOMPUTER TRAINING SUPPORT We established a classroom with 16 machines for hands-on-training. We did this because of the numbers of people requiring training and the cost of sending people to outside courses. The types of courses taught are: Introduction to Microcomputers, Databases, Word processing, Spreadsheets, Graphics, etc. -10- Originally there were requests for training of 3000 persons in all aspects of microcomputers. That has been reduced to approximately 2100. We believe this training demand will be high initially and then will drop off dramatically. Outside instructors have been hired to teach our classes. We have had a great deal of success with this process because of the quality of instructors acquired. To pay for this training facility we charge back directly to the user division the cost of the instructor, software purchased and maintenance cost of the classroom. This cost goes to a maximum of $125 per class, significantly cheaper than to send all people to outside training. NOTE: Many vendors sell at a small cost educational licensing agreements providing copies of their software for each machine in the classroom. some vendors will not do this; then we must purchase copies for each machine at full price. OFFICE AUTOMATION I have specifically not addressed the topic of office automation, as we are still planning and discussing exactly what office automation is going to mean at the Census Bureau. Our primary planning focus at this time is to determine what functions need to be provided Bureau- wide and what functions will be left to individual operating units. THE NATIONAL SECURITY AGENCY PERSONAL COMPUTING INFORMATION CENTER Kathy Schnaubelt, National Security Agency The National Security Agency established a Personal Computing Information Center (or PCIC for short) approximately a year and a half ago. This action was taken in response to the Agency's growing demand for personal computer products. In the year prior to the opening of the PCIC, many new personal- computer products and vendors were reaching the marketplace. 
A growing number of these products were in turn being purchased by a cross-section of Agency elements. This mix of products across the Agency began surfacing problems such as that of system incompatibility. This may be illustrated by the example of a diskette of data or software running on one computer brand but not on a different brand of computer.

The PCIC was designed to assist Agency personnel in the selection, acquisition and use of an established set of "standard" personal computer products. The basis for the selection of standard products was determined by the Agency's needs as a whole. One such requirement was for the UNIX operating system. Hardware selected as the Agency standard workstation would have to be able to run under the UNIX operating system. At the root of decisions of this nature was the concept of compatible hardware and software products that would be easy for people to acquire.

-11-

Another important concern for us was security. By going to standardization, that problem may be minimized by the selection of products that meet this requirement and then training personnel to use them. A third consideration was supportability. Maintaining a variety of microcomputers, or personal computers, can be a logistics nightmare; stocking of parts, replacing them, etc., in any number can be devastating. Finally, there is cost. By limiting the number of kinds of personal computers and software products that we use, we are able to buy large numbers of each at a lower per-unit cost. Right now we have thousands of microcomputers in the Agency, and we have plans to buy many more, which should result in a significant savings from bulk buys.

The PCIC was established to meet the following objectives: 1) to promote the use of standard equipment; 2) to share and centralize our small systems resources (like everyone here, we have a limited number of people to support these products); 3) to minimize the end-user application load; 4) to maximize cost effectiveness; and 5) to centralize product registration (providing anonymity in our workplace).

The PCIC has become a focal point for all Agency standard products, and to date these products include: an Agency standard terminal/workstation, which is an enhanced IBM XT; the standard office automation equipment, which is the WANG Professional Computer; and an interim standard local area network. So there will be a family of Agency standard host computers. The PCIC provides its customers with information on all of the standard products that are available, and this includes a reference collection of books, periodicals, in-house-developed working aids, research guides, comparison charts of the capabilities of the different products, and a referral service for technical questions. It also provides demonstrations of standard products. Anyone can go down to the PCIC and use one of the standard products, whether it's hardware or software.

To encourage the use of the PCIC by Agency personnel, the PCIC tries to make the acquisition of standard commercial products as simple as possible. Rather than have each office go out and do their own purchase request, an authorized individual can come into the PCIC and request commercial software. The software is actually stocked in the PCIC. We have licensed some items (like CONDOR and MICROPRO products, for example). By doing that, we have actually reduced some costs by 70%. Non-standard products may still be purchased, but on a limited basis. A non-standard product must be requested in writing.
This request is reviewed by a software evaluation team to determine the validity of the purchase request. When a product offers a unique capability, it is purchased and evaluated. A favorable evaluation results in the product's being added to the list of standard products. A product which does not offer any capabilities beyond the standard product line, or in fact is defective, would be placed on a prohibited-purchase list. In any case, the PCIC still does the actual purchasing, whether it's for a standard product or an evaluation copy of a non-standard product. This saves the requester from the paperwork of writing a purchase request document.

-12-

While the purpose of the PCIC is to furnish standard products, it also functions in identifying products that meet certain minimum requirements for Agency use. These products are added to the list of standard products to provide a flexible work environment for Agency personnel. The goal is not to restrict what people do or how they do it, but to make sure that the products they use are compatible with other products used throughout the Agency.

USE OF MICROCOMPUTER TECHNOLOGY AT THE BUREAU OF LABOR STATISTICS
Peter Stevens, Bureau of Labor Statistics

I made the discovery when putting this talk together that I could take the various displays and shuffle them and present them in almost any order I chose. I'm not quite sure what the conclusion from that would be, but with this heady sense of freedom, I decided to start in the middle. Therefore, the first display you see gives a brief introduction as to where we are now.

The Bureau of Labor Statistics has approximately 100 microcomputers, almost all of them standard IBM PC/XT's (see Display 1). We also have three Ethernet Local Area Networks, two in D.C. and the other in the San Francisco regional office. We have network licenses and centralized software libraries for all of these machines. This is one point -- and the first of the points I will be emphasizing -- where some of the things we are doing are, perhaps, different from what is commonly done. Floppy disks have no essential role in the entire operation. If I had my way, I wouldn't have them.

Bureau of Labor Statistics Networks and Microcomputers
Where we are now
Approximately 100 microcomputers in use (mostly highly modified IBM PC/XT's)
Three Ethernet (FIPS 107) Local Area Networks, two in DC, the other in San Francisco.
Software libraries are centralized.
We are close to completing the "large scale pilot" stage of our development effort.
For each application area our goal is to identify and validate quality products which can be made a part of the standard BLS microcomputing environment.
Display 1.

In general, the way people get software onto their machines is through local area networks from centralized storage devices. We are getting to the end of what might be called the "research phase" of this entire new technology operation. The three networks were all acquired by a competitive procurement which we ran a couple of years ago and which is, in effect, a large-scale test.

-13-

That gets to the last point on Display 1, which is the basic goal for what we are trying to accomplish right now: to identify and validate quality products which can be made a part of the standard BLS microcomputing environment; then, in the next stage of our operations, to make standards for use throughout the Bureau.
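The centralized software libraries and LAN-based distribution described above lend themselves to simple automated checks. The following is a minimal illustrative sketch only, not anything BLS actually ran: it assumes a hypothetical JSON manifest of approved package versions kept on a network file server (CENTRAL_MANIFEST) and a similar local record of what a workstation has installed (LOCAL_MANIFEST), and it reports what would need to be installed or upgraded to match the standard configuration.

```python
# Illustrative sketch only -- not BLS's actual tooling. It assumes a
# hypothetical manifest file on a network share listing the approved
# version of each standard package, and a similar file recording what
# is installed on the local workstation.
import json
from pathlib import Path

# Hypothetical locations; in practice these would point at the LAN
# file server and the workstation's own configuration area.
CENTRAL_MANIFEST = Path("N:/standard/manifest.json")   # e.g. {"wordproc": "2.1", ...}
LOCAL_MANIFEST = Path("C:/config/installed.json")

def load(path: Path) -> dict:
    """Read a {package: version} mapping from a JSON file."""
    with path.open() as f:
        return json.load(f)

def compare(central: dict, local: dict) -> list[str]:
    """Report packages that are missing locally or out of date."""
    actions = []
    for package, approved in sorted(central.items()):
        installed = local.get(package)
        if installed is None:
            actions.append(f"INSTALL {package} {approved}")
        elif installed != approved:
            actions.append(f"UPGRADE {package} {installed} -> {approved}")
    return actions

if __name__ == "__main__":
    report = compare(load(CENTRAL_MANIFEST), load(LOCAL_MANIFEST))
    print("\n".join(report) if report else "Workstation matches the standard configuration.")
```

Under a scheme of this kind the support group changes only the central library and its manifest; each workstation converges on the approved versions the next time it checks, which is the point the speaker makes about distributing software over the network rather than by hand-carried floppy disks.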
When I looked at Display 2, I decided I could put it up and talk about it for twenty minutes without any trouble at all, because it enumerates the applications and I think that gives some scope of the project. But given the terrible time constraints that we are under, I will spare you a lot of discussion here.

The following are major application areas:
Word Processing
Graphics
Spreadsheets
Statistical Analysis
Data Base Management
Survey Data Collection
Survey Control
Project Management
Calendar Management
Network Services, including Electronic Mail, Shared Data Management and Inter-network Routing.
National Communications via Public Value-Added Networks (X.25 & FIPS 100 standards).
Mainframe Communications Gateways for Interactive and Batch Operations.
Access to the Local Networks from remote (usually portable) microcomputers.
Display 2.

However, there are two things worth pointing out. Some may know from the previous references that "FIPS" stands for Federal Information Processing Standards, which are produced by NBS and which we are trying to follow. We have more standards than FIPS 100, and those things are, in general, a significant part of our operation. One other point, before moving on here, that I think is worth some mention: applications like word processing, graphics, and spreadsheets are standard and well known; but the applications that I call here Survey Control, Project Management, and Calendar Management get into a function for the microcomputer which I don't think has gotten the emphasis it deserves. This is a Control and Management function. In the same sense that a microcomputer is a useful tool to use with a project management package, it is also used and useful for keeping track of one's personal calendar and the ordinary flow of activities through the division. As we respond to the technology, this is definitely a growing area. Anyway, enough for the present.

-14-

The reason for Display 3 is not so much a chance to give you the details of how the Bureau operates, but to make a point that our efforts in these areas were started in response to a serious and well-understood operational problem that we are having. The large, centralized mainframe computer provides, in our view, a very poor, very weak environment for the general area of interactive applications.

How This All Got Started
Throughout the 70's the Bureau's approach to computing relied upon two large IBM mainframe computer centers accessed via dial-up telephone lines. While this environment served large-scale, batch-oriented survey processing well, other applications were served poorly:
Interactive applications were very hard to develop, and response from the mainframe computers varied widely.
Data communications were a constant source of problems, especially those with our Regional Offices.
The proliferation of incompatible word processing equipment caused continuing operational problems and prevented any more ambitious office automation efforts.
The most promising technical approach to solving these problems was:
Powerful microcomputers for interactive processing.
Local Area Networks for the heaviest communications and for configuration management.
Internetwork and Mainframe Gateways for extended communications.
Public Data Networks for national communications.
Display 3.

Again, I'm sure you wouldn't like to see me stand here and cry, so I'll spare you the details of the problems we have had with data communications since the AT&T divestiture.
The final point under the problem areas is again worth some emphasis. We have, I think, some thirteen odd different brands of word processors in place. None of them communicate with each other. This is a story that has -15- been, again, welltold. There was, in the Bureaus top management and operations management, a perception that this had caused us a great deal of difficulty and a very strong desire not to perpetuate that same sort of incompatibility and lack of communication in the new technology. The lower part of Display 3 shows briefly what we have selected as the technological underpinnings of the steps we are taking. Again, we could have, a long discussion on say, minicomputers versus microcomputers and the local area network services, but it is beyond the scope of this panel. will only mention that these issues were very seriously considered, and the choices listed were not made lightly. I would like to draw your attention to the phrase "configuration management." Having, let's say, several hundred microcomputers all using the same software packages would not be, in our view, sufficient to guarantee compatibility. Companies are constantly issuing new versions, and these new versions are frequently incompatible with each other. So you need not only to standardize with the level of machinery, but you need to do version control and configuration management to insure that the potential of a standard environment endures. One of the major functions of the local area network is that it makes it really possible to do this. If we wish to put up a new version of a particular procedure, we can do so. We can test it and then make that transition very easily. Back when I was planning this, I had visions of myself running down the hall with 500 floppy discs trying to distribute them. It was the horror of that nightmare that led us in that direction. Display 41 "How This All Got Started" is from a configurations perspective. I urge you not to take this too literally, but, in conjunction with Display 3, it does demonstrate the basic structure of the communications and technical environment. The large, vertical black bars indicate the local area networks themselves (that is, cable connections between machines in a single area). We use two computer centers: National Institute of Health and Optimum Systems, Inc. Those dotted lines indicate communications through the public telephone system. On the networks themselves we basically have two types of devices: The workstations (that is, machines that people use) and network services for file storage, printing, Communications, etc. Now we are at the point where we can get down to the most important part of this presentation. One of the things that I would like to try and share with you, from our experience, is an idea that I call, on Display 5, "Important Operating Assumptions." An assumption here means about the same thing that "theory" means in physics or chemistry. It means an idea that we believe and accept as true and act upon, but at the same time are constantly retesting and reevaluating. -16- [GRAPHIC] \WP1417.GIF Important Operating Assumptions No single supplier can come even close to supplying top-quality products for all our requirements. The best quality and most creative software development now is being done by independent (and frequently quite small) Software Vendors. Standards, de facto and formal, play a much more important role for the microcomputer market than they do for the mini or mainframe market. 
We can increase effectiveness and reduce risk by emphasizing, open systems and standards rather than by becoming locked in to one manufacturer's product line. The most reliable source of information about new products is our own testing. The selection, testing and integration of hardware and software are professionally very demanding tasks. Statisticians and economists should not have to become Microcomputer experts to use the equipment well. Quality in the initial selection of hardware and software is only the start of an effective operation. Support, maintenance, and especially release control for software are essential to long-term effectiveness. Planned and controlled redundancy is the best and, in many cases, the only way to achieve high reliability. Display 5. The first four items are a basic description of why we are interested in open systems" or open-systems interconnection. We have substantial experience with being in the tender and enveloping grasp of a single manufacturer and in discovering that manufacturer's products don't meet new needs, or that there is no way to interface some new piece of equipment to the existing equipment. THE MOST RELIABLE SOURCE OF INFORMATION ABOUT ANY PRODUCT IS OUR OWN TESTING. This point belongs in bold print because that is probably the essence of the whole project. The computer business has always been full of what I will call "hype": statements of doubtful truths, made just to sell equipment. The microcomputer business is, if anything, worse than the mainframe side of the business. We have found that things like articles and advertisements in magazines, the flowing promises of salesmen, and similar frivolities are simply not a basis upon which we can operate. We have certain responsibilities to our users in the, Bureau so that when we say something is going to work, they can expect that it will work. We can't then turn and -18- say that the salesman said it will work. Much of our validation is this testing of the product claims. The next two items on Display 5 deal with another very important aspect of our work. Doing the kind of validation that will cut through the hype is, in our view, a demanding task and not one which need be or should be placed upon the working statistician and economist. We have a very large number of users that want to use this technology. We have a much smaller number that wish to become microcomputer experts. We are trying to create an environment in which economists, statisticians, managers, clerical personnel, and the whole BLS community can use microcomputers effectively without having to go through the struggle and pain that is associated with selection, testing, and integration of the underlying technology. The last item in Display 5, I think, is very similar to the ones already expressed by Census. The way you get the reliability is through redundancy. One of the conclusions that followed from that idea is to use a standard configuration. Even though a particular machine may be intended for word processing and the machine next to it may be intended for statistical analysis, the underlying hardware will be the same. So that, if on the day the analysis is due, that particular machine decides to go out to lunch, the other machine can be used to finish the job. We are getting down toward the end, so we can summarize this by talking about the Project Goals and Current Policies (Display 6). 
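Before turning to Display 6, the redundancy point just made can be put in rough numbers. The sketch below is purely illustrative, and the 95 percent per-machine availability figure is an assumption rather than a BLS statistic: it simply shows how quickly the chance of finishing a critical job rises when identically configured machines can stand in for one another.

```python
# Illustrative arithmetic only; the availability figure is assumed,
# not taken from the workshop. With interchangeable standard
# configurations, a job can be finished if at least one machine is up.
def chance_job_completes(per_machine_availability: float, machines: int) -> float:
    """Probability that at least one of `machines` independent,
    identically configured machines is working."""
    return 1.0 - (1.0 - per_machine_availability) ** machines

for n in (1, 2, 3):
    print(f"{n} interchangeable machine(s): {chance_job_completes(0.95, n):.4f}")
# 1 machine:  0.9500
# 2 machines: 0.9975
# 3 machines: 0.999875 (printed as 0.9999)
```

Planned redundancy of this kind works, of course, only if the machines really are interchangeable, which is the argument for a standard hardware and software configuration.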
You may remember that I mentioned there were three important problems that this research effort was attempting to address: the need to have an environment in which we could create good interactive systems; the need to deal with our data communication flows; and a need to provide effective intercommunication between machines when used for statistical survey work, office automation, or any other purpose. Those were the goals and the motivation to start the project. They remain the goals.

Every product we distribute must be thoroughly tested before full regional use. Some of the regional offices have very little background in data processing. What we put there had better work, because we don't have the travel budget to fix the mess if it doesn't.

-19-

Project Goals and Current Policies
Project Goals:
To solve the identified major problems with communications and interactive computing.
To ensure that new products are thoroughly tested before being put into production systems or into all Regional Offices.
To open up new application areas, especially in the areas of end-user computing and office automation.
To establish the basis for the continuing, orderly introduction of improved hardware and software.
Current Policies:
The selection, evaluation, procurement, and support of new products is centralized. Strong, de facto standards exist.
The development of end-user applications is decentralized.
The introduction of new products to Bureau production systems is closely managed. Pilot tests are required and high-level approval must be gained before production commitments are made.
The emphasis on compatibility, full communications, and Bureau-wide usage is quite strong.
Display 6.

Finally, we see this whole technology as having opened up the potential to get into kinds of applications that simply weren't being done at all by any type of computer, such as some of those personal and local organizational ones that I mentioned earlier. We now need to establish a basis so that we can continue to introduce, in an effective and orderly manner, the new products and new technology that continue to pour out of the industry.

From that, we have certain policies: the centralized selection, evaluation, procurement, and support of new products. There is some doubt as to whether we will be able to sustain a centralized procurement function because of some of the problems in government procurement which are beyond the scope of this presentation. In contrast to this centralization, the development of end-user applications is decentralized. That is, the way that persons use the machines for a particular personal or organizational task is a matter of their judgment and their discretion. When we are talking about introducing this technology into Bureau production statistical systems, there is much stronger management control, and developments are closely watched. We insist on Bureau testing and evaluation before committing important Bureau projects to the new technology.

-20-

I think I have said enough about the need for compatibility. Finally, on Display 7, under the heading of Where We Are Going, there is basically more of the same. I mentioned we are getting toward the end of the large-scale research phase. We are planning to add local area networks into all eight regional offices instead of just San Francisco. We have one aspect of the Bureau which may be unique in that the Commissioner of Labor Statistics has a PC in her office. She also has one at home and uses them both.
She has an intense personal interest in what I call here, "Management Communications." Through the local networks we have possibilities that we never had before. Through the research phase of this work, we have not had what I might call "traditional government procurement cost/benefit justification analysis" very much. I expect" as we move to the broader expansion of microcomputers into Bureau activities, that analyses of that nature will become important. There are many areas about procurement issues that are, at the moment, looking through a glass very darkly. Where We Are Going: As the performance of specific hardware and software products is validated, their use will be expanded to production tasks. The number of Local Area Networks will be expanded to include all Regional Offices. The communication facilities will be expanded to include Cooperating State Agencies for data collection and'survey processing. Management communications, among the Commissioner, Office Chiefs and Division Chiefs, will become increasingly important. The number of microcomputer workstations will be significantly expanded. Obsolete or ineffective equipment will be replaced by microcomputers. New hardware and software developments will be watched for possible replacements to standard products. As the new technology replaces existing equipment and applications, greater emphasis will be placed on cost/benefit justifications. Display 7. Display 8 shows where we expect to go technologically. I ask you not to take that too literally. This is not a technical model, but rather a demonstration of the way we see things getting done with each of the regions having its own network communicating to our network in Washington. -21- [GRAPHIC] \WP1422.GIF DISCUSSION Lawrence R. Cox, Bureau of the Census I will attempt to keep my comments brief so that we can have a full interchange between the speakers and the audience in proper "workshop- fashion. In proper "discussant" fashion, I will highlight what I see as the major similarities and differences among the three approaches taken, in the context of what I have learned from the presentations collectively and from my experiences at the Census Bureau. I have learned that microcomputer technology is a must for statistical programs. Automated, interlinked statistical program offices are more efficient and effective than those which are not. Users of statistical information have discovered microcomputer technology; and, so, statistical data providers have a responsibility to keep pace. Data review and analysis at its best is an interactive process between the expert data analyst and the data, supported by statistical software. Mainframe computing cannot offer these services on a large scale in a realistic manner or at a competitive price. I have learned that an organizational focus is needed to provide information and support both to management and users as this new technology becomes introduced and assimilated within the organization. We have seen that such a group can have any of several functions, depending upon organizational size,.needs, goals and objectives: -user education and handholding -repository of literature -source of hands-on experience -maintenance -training -develop and distribute product lists and recommendations -establish guidelines for microcomputer procurement, use, maintenance, training, etc. 
-recommend standards for microcomputer hardware, software and uses of microcomputer technology -establish and enforce such standards -aid in the procurement process -evaluate procurement requests -decide upon procurement requests -advise in the management of this new technology -play an active role in its management These functions, as I have presented them, lie on a continuum from the more passive, permissive or experimental approach to the more standardized, structured, or production-oriented approach. These needs and the management philosophies underlying them seem to me to be well-represented on that continuum by the three agencies represented here today. The free-market or laboratory approach adopted by the Census Bureau says, in effect, let's provide our diverse group of programs and users with the information necessary to begin to explore uses of microcomputer technology. Let's minimize the procurement obstacles to doing so, and let's work closely -23- with users in their applications and see what lessons are to be learned and what patterns emerge. In effect, as an organization, let's not force microcomputer hardware and software choices, but-let's closely manage and monitor several experiments and learn from each of them. At the National Security Agency, decisions were driven by the overriding need to standardize on hardware and software choices- sufficiently to allow diverse and distant groups to talk to each other and access the same data and programs, but stopped short of imposing inessential standards. Within a predefined architecture of standards, NSA users are free to experiment, to share information and to tailor choices to programmatic and individual needs. At the Bureau of Labor Statistics, the requirements for good and standard communications between offices and geographic areas were paramount. Experiments were conducted to fix upon the best choices, from which standards are to emerge. The environment is intended to be uniform and capable of supporting continuing, production-oriented work. Reflecting upon this continuum for a moment, I could equally describe it as being from user-oriented to program oriented, reflecting a progression defined in terms of the number of diverse programs and functions within these agencies which each agency seeks to address with automation at the microcomputer level. Interesting, all three organizations share several characteristics: they are not small, they deal routinely with massive amounts of data, their paramount concern is improved and broader access to their own data, their systems require mainframe-gateways or links, and they operate under strict data security requirements. However, for reasons which we have heard and others you may explore in open discussion, they have chosen three different approaches to tackling the problem of planning and managing microcomputer technology. QUESTIONS AND ANSWERS Ql: How was the Census Bureau able to acquire 500 microcomputers in a little over one year given GSA guidelines? Al (Mr. Swank): The Census Bureau did not go around GSA guidelines and standards, but worked within the existing regulations. Most procurements are off the GSA schedule. Q2: What variety does the Census Bureau have in their brands of microcomputers? A2 (Mr. Swank): Currently there are 25 different brands of microcomputers in operation at the Census Bureau. Q3: Does Census go through the "GSA microcomputer store" in procuring its microcomputers? -24- A3 (Mr. Swank): Yes, when possible. 
However, the GSA microcomputer store does not stock all brands, and this forces the Census Bureau to go elsewhere.
Q4: Why did Census create a separate staff for microcomputers when they already had an established automatic data processing staff?
A4 (Mr. Swank): The Executive Staff of the Census Bureau wanted to show support for microcomputer technology and to give it high visibility and, therefore, created the Census Microcomputer Information Center and placed it in the Director's Office.
Q5: Has the Information Center taken an active role in education of upper-level management in the uses of microcomputer technology?
A5 (Mr. Swank): Yes, each member of the Executive Staff has been given at least an introductory course on microcomputer usage.
Q6: The presentation left several unanswered questions that should be addressed: 1. What about the lack of a management system for electronic files? 2. How are archiving and disposition of files handled? 3. What about programming for the PC's?
A6 (Mr. Swank): Electronic filing systems will come in the near future. There are several such systems in existence now, but the costs are astronomical.
A6 (Mr. Stevens): Software for record retention currently exists, but the big problem is file retention, for which very little software is available.
Q7: Are the PC's at Census "stand-alone" or are they networked?
A7 (Mr. Swank): Some PC's are networked, others are "hardwired" to the mainframe; the majority are "stand-alones."
Q8: Two questions regarding the presentations: 1. What is meant by "software standards"? 2. Some software packages need improvements, corrections, etc. In each agency, does anyone speak to the manufacturers as a representative of the agency?
A8 (Mr. Stevens): "Software standards" means standards for particular categories of software. For example, there are at least three subcategories of word processing software, and each would have a separate software standard at BLS.
A8 (Mr. Swank): Corporate licensing would be the answer. Those manufacturers that will not discuss corporate licensing have so much business they do not need to help keep the client happy.
-25-
A8 (Ms. Schnaubelt): The focal point for NSA is with the vendor rather than the manufacturer. NSA has had problems with RUBIX from IBM. The smaller vendors are much more eager to get the business and give better contractual terms than the large firms.
Q9: Is there a very strong recommendation from the panel for a PC information center?
A9 (Dr. Cox): An independent PC information center is an absolute necessity in a large organization.
A9 (Mr. Swank): Each agency definitely needs at least a resource person if not a center.
Q10: Would a small group need a PC information center?
A10 (Dr. Cox): Not necessarily a center, but at least a reference person.
Q11: Regarding machine-oriented versus people-oriented use of microcomputers, what would the individual agencies do for the people? What are the goals?
A11 (Mr. Swank): At the Census Bureau, if the individual divisions have the budget, they will get the microcomputers they ordered within 30 days of the request.
A11 (Ms. Schnaubelt): The goal is to have a PC on each desk.
A11 (Mr. Stevens): At BLS, the only drawbacks to a microcomputer on every desk are budget and procurement.
Q12: With the advent of work-at-home, is there a use of portable PC's for this purpose?
A12 (Dr. Cox): The major problem with portable PC's for take-home use is data security -- a large problem for each of the agencies represented.
A12 (Ms. Schnaubelt): At NSA, portable microcomputers are used by executives and others, but these machines are kept "clean" (i.e., they have never had any sensitive data on them). The portables are used for training purposes only.
A12 (Mr. Swank): The Census Bureau has many "checkout" machines, but some of these are secure machines and cannot be taken out of the building.
A12 (Mr. Stevens): BLS definitely believes in the work-at-home concept and has machines for this purpose. However, precautions are taken to protect confidential data.
Q13: How are services provided to field operators?
A13 (Mr. Stevens): The regions do their own training on the uses of the BLS system.
-26-
A13 (Mr. Swank): There is a standardized configuration of microcomputer technology in each regional office, with a nationwide company contracted to carry out maintenance.
A13 (Ms. Schnaubelt): Data and software are transmitted world-wide by mail or other secured means of communication.

-27-

SESSION ON ELECTRONIC DATA DISSEMINATION

SESSION SUMMARY

The second session dealt with electronic data dissemination, focusing on disseminating information for use with microcomputers. While the first panel discussion focused on how agencies use microcomputers within their own internal environments, this session deals with the impact of microcomputers on users of federal agencies' data and the possibilities for agencies to make information available for microcomputer users (that is, dissemination of data using floppy discs or through telecommunications). There are some very interesting opportunities for federal statistical agencies to use new media to provide data to users more quickly and in a form that is more highly usable than current printed methods. The three speakers will deal with these issues. The first speaker is from the National Technical Information Service (NTIS), which is primarily an archival-type agency for disseminating federal data and information. The NTIS program to disseminate data on floppy discs, the problems encountered, and the various issues surrounding this area will be discussed. The second speaker is with the Bureau of the Census and works with their telecommunications system called CENDATA. CENDATA is used to distribute perishable Census information to users. Our final speaker is from the Department of Agriculture. She will describe the current, ongoing process to implement a contract with the Martin Marietta Corporation to establish a telecommunications system for the dissemination of large databases containing agricultural information.

USE OF MICROCOMPUTER DISKS TO DISSEMINATE INFORMATION
Stuart Weisman, National Technical Information Service

The history of the National Technical Information Service (NTIS) dates back to 1945 with the establishment of a publication board to assist in making unclassified government documents available to the private sector. The program went through various transformations, reaching its current status as an agency of the Department of Commerce in 1970.
----------
*Jay Casselberry, Energy Information Administration
-29-
The law creating NTIS states that NTIS is to search for, collect, classify, coordinate, integrate, record, catalog, and disseminate information. In the early 1970's, NTIS received its first machine-readable information product. In 1981 a new unit was established within NTIS to manage its product line of data base files and software. In the summer of 1984 NTIS began to sell data on floppy discs.
The current NTIS machine-readable-products program contains about 10 bibliographic data bases, 300 source-text non-bibliographic data bases, 800 numeric and statistical data bases, and 1300 computer software programs. With this substantial amount of information available, NTIS began a review of procedures for disseminating information products for microcomputers. The following criteria were considered when NTIS reviewed the potential for disseminating their information products on microcomputer diskettes:
-Forecasts of the number of microcomputers
-Forecasts of the primary type(s) of microcomputers being used by business and professionals
-Physical size of the computer diskette
-Microcomputer operating systems
-In-house and/or contractor production of diskettes
-Information products to be made available on diskettes
-Entire and/or subsets of information files made available
-Production of microcomputer software
-Whether to reformat the data for use with popular data base and spreadsheet formats

NTIS has decided to make information products available on 5 1/4 inch diskettes for IBM and IBM-compatible microcomputers. Diskettes are produced by a contractor, and costs are determined based on the number of diskettes required. The main problems that have been encountered are the loss or incorrect conversion of data when tapes or diskettes are produced, mishandling of diskettes during shipment, and improper use of the diskettes by customers. The way to overcome these problems is to establish procedures for checking a diskette against the original magnetic computer tape, and to instruct transportation companies and end-users on the proper handling of diskettes. In the future NTIS will consider producing information products on high-density diskettes, hard discs, and, where it is practical, optical or video discs. With the future increase in microcomputer use by business and professionals, NTIS is making a long-term commitment to having information products available for microcomputer users. With the proliferation of data -30- management and analysis being done with microcomputers, NTIS recognizes the needs of this user community. Displays 9 through 16 illustrate the work of NTIS.

HISTORY OF MACHINE-READABLE INFORMATION PRODUCTS
Late 60's   First machine-readable products arrive at NTIS
Early 70's  Production Group formed to process orders for machine-readable products
Late 70's   Concept of Product Management introduced
1981        Office of Data Base Services
1983        Video disc products from NASA
1984        Data files available on diskette
Display 9.

DATA TAPES
Over 1,000 Titles
32 Source Agencies
40 Titles Updated Annually
25 Titles Updated 2-6 Times a Year
15 Titles Updated Monthly
Remainder Updated Less Than Annually
Standing Orders Available
Display 10.

MAJOR DATA COLLECTIONS
National Center for Health Statistics (NCHS)
Federal Communications Commission (FCC)
Energy Information Administration (EIA)/U.S. Department of Energy
National Bureau of Standards (NBS)
Human Nutrition Information Service/U.S. Department of Agriculture
Defense Logistics Supply Center/U.S. Department of Defense
Federal Reserve Board (FRB)
Environmental Protection Agency (EPA)
Display 11.

-31-

DECISIONS, DECISIONS, DECISIONS!!
Size: 5 1/4" vs. 8" (3 1/2" not readily available)
Density: Double- vs. single-sided; single- vs. double-density (quad-density not readily available)
MS-DOS vs. CP/M (or MS-DOS vs. PC-DOS)
Total in-house vs. contracting-out vs. an in-house/out-of-house balance
Products pre-selected vs. demand-driven selections
Complete files only or subsets/extracts
Software
ASCII only or various DBMS/spreadsheet formats
Display 12.

DATA DISKETTES
5 1/4" Diskettes
Standard ASCII Format
For IBM-PC Microcomputer
Unique Accession Numbers Assigned
Data Tapes Converted to Diskettes
Documentation Required
Display 13.

PLAYER RESPONSIBILITIES
NTIS:
  Order input and control
  Copy tape to be used for conversion
  Ship orders (with documentation)
  Available for consultation
Contractor:
  Create master diskettes
  Archive master
  Duplicate master
  Get duplicates to NTIS
  Available for consultation
Source Agency:
  Provide master tape (with appropriate documentation)
  Available for consultation
Display 14.

-32-

The Action
Customer contacts NTIS: "Available on Diskette?"
YES:
  1. Price (based on # of diskettes)
  2. Customer orders
  3. Order to contractor
  4. Contractor duplicates master
  5. Duplicate to NTIS
  6. NTIS mails (with documentation) to customer--overnight delivery
NO:
  1. Estimate price
  2. Customer orders
  3. Copy master tape
  4. Order to contractor with tape
  5. Contractor creates master diskette and duplicates master for customer order
  6. Duplicate to NTIS (price is based on actual # of diskettes)
  7. NTIS mails (with documentation) to customer--overnight delivery
Display 15.

Problems
Original tape ---------------- Bad tape from agency
Copy tape at NTIS ---------------- NTIS error in copying tape
Contractor converts tape to diskette master and duplicates master ---------------- Contractor error in conversion or duplication process
Duplicated diskettes sent to NTIS ---------------- Problems created in handling of diskettes
NTIS ships diskettes to customer ---------------- (magnetic field, dropped, smudge, coffee, etc.)
Customer receives and processes diskettes ---------------- Customer mishandles diskettes (see above), plus diskette processing problems
Display 16.

-33-

CENDATA: DEVELOPMENT AND IMPLEMENTATION
Barbara Aldrich, Bureau of the Census

CENDATA is an information system for disseminating Bureau of the Census ("Census") information electronically. Development of CENDATA began in mid-1983 when Census decided that certain data, especially time-sensitive economic data, should be available on-line. CENDATA was developed under the guidelines that the data should be available on-line as soon as possible after release and that the system should be developed at no cost to Census. The system was proposed as non-sole source (i.e., not limited to only one contractor). In addition, no money was to be involved in the arrangement with any contractor, and Census was to have control over the information made available. During the entire process of developing the specifications and establishing memoranda of understanding with qualified vendors, Department of Commerce lawyers assisted in refining the language and procedures.
Census' list of qualifications for vendors wishing to access CENDATA and make the information available included:
-A CENDATA user should only have to pay for time used accessing CENDATA
-CENDATA should be available separate from other data bases, be clearly identified, and include the entire CENDATA package
-CENDATA must be available seven days a week
-A CENDATA vendor must be willing to accept data delivery via telecommunications
-A CENDATA vendor must be able to offer its users the services of national telecommunications networks
-The system must be an end-use-based, user-friendly system

The reasons behind the above qualifications were to:
-ensure that vendors did not add hidden fees or package CENDATA with other services
-enable users to use major telecommunications networks to minimize costs
-obtain vendors with the capabilities to handle a large-scale data base such as CENDATA
-increase dissemination of Census information products.

Of the dozen vendors who have shown interest in the CENDATA system, four met the criteria established, and memoranda of understanding have been signed with two. -34- The first vendor, Dialog Information Services, went on-line with CENDATA on August 1, 1984. (Dialog is extremely prominent in the library community.) Dialog has CENDATA available using the standard menu-based system and also makes the information available in a full-text-searchable format. In mid-October 1984, the Glimpse Corporation made CENDATA available. Glimpse, in cooperation with the Chemical Bank of New York, markets data to the financial community. With the success achieved by the first two vendors in expanding the dissemination of Census data, Census is anticipating adding new vendors who service different sectors of the public. With the inherent advantages of CENDATA over traditional publications, Census hopes to continue to expand its user network. The primary advantages of CENDATA are the timeliness of the data and the ease of using the system. One of the first goals of CENDATA was to have sensitive economic information available within minutes after any embargo on the information is lifted. Examples of the type of sensitive information available are manufacturers' shipments and orders, retail sales, housing starts, and balance of payments. Having this information available electronically assists users who are located away from Washington, where the information is initially disseminated in press releases. The data are available weeks before users would receive them in published form, and they can be downloaded into a user's standard information system for review and analysis. Census also maintains an inventory of its products on CENDATA. This allows a user to quickly determine if a particular publication has been released and, if so, the price, source, and Government Printing Office stock number. The illustrations that follow, Displays 17 through 21, show how CENDATA has been developed for ease of use. Menus are designed to provide an inexperienced user with a choice of selections, and to move from the general to the more specific. In addition, instructions are provided to help a user move through the system.

THE CENDATA INTERACTIVE SYSTEM
The Online Information Utility at the U.S. Census Bureau.
A very small portion of the Census Bureau's vast data holdings has been included in this "information utility."
Do you wish to see the CENDATA menu? If yes, enter Y or (return). If not, enter LOGOFF to end session.
?Y
Display 17.
-35-

CENDATA MAIN MENU
1  Introduction to Census Bureau Products and Services
2  What's New in CENDATA
3  U.S. Statistics at a Glance
4  Press Releases
5  Census User News
6  Product Information
7  CENDATA User Feedback
8  General Data
9  Agriculture Data
10 Business Data
11 Construction and Housing Data
12 Foreign Trade Data
13 Governments Data
14 International Data
15 Manufacturing Data
16 Population Data
Enter item number or ? for help.
?15
Display 18.

15--MANUFACTURING
1  Introduction to the Manufacturing Statistics Program
2  M3 Preliminary Report, July 1984
. .
8  Aluminum Ingot and Mill Products, June 1984 (CIR 1433-2)
Enter item number or ? for help.
?2
Display 19.

-36-

15.2--M3 PRELIMINARY REPORT, JULY 1984
1  M3 Narrative Summary
2  Value of Manufacturers' Shipments
3  Value of Manufacturers' New Orders
. .
7  Ratio of Manufacturers' Inventories and Unfilled Orders to Shipments
Enter item number or ? for help.
?3
Display 20.

15.2.3--August 30, 1984
TABLE 2, PART 1: VALUE OF MANUFACTURERS' NEW ORDERS FOR INDUSTRY GROUPS, MARKET CATEGORIES, AND SUPPLEMENTARY SERIES
--Seasonally adjusted--
Monthly (Millions of dollars)
SIC                                            Jul.      Jun.      May
Code  Industry                                 1984(p)   1984(r)   1984
      All manufacturing industries.........    192,450   190,620   193,680
      Manufacturing industries with
        unfilled orders....................    103,496   102,051   104,482
      Durable goods industries.............    100,489    99,171   102,256
--more--
Display 21.

After moving through the choices of information topics, the user is presented with the information requested. An experienced user may move through CENDATA more quickly by specifying all parameters of the search at the same time. For example, by specifying 15.2.3 initially, all menus may be bypassed, and the user moves directly to manufacturing (15), the M3 report (2), and specifically the value of manufacturers' new orders (3). This development allows CENDATA to provide the necessary information and instructions for novice users without unduly hindering more experienced users. -37- As with any developing system, Census is soliciting comments from actual and potential users to determine possible system improvements and expansion of the data base. The primary users at the current time are economists, industry analysts, and market researchers. Future plans are to expand the data base with additional Census products. Upcoming products to be added are 1984 country population estimates and statistical profiles of every country in the world. With the addition of the statistical profiles, CENDATA moves into a new area, since the information is from the International Data Base rather than from a publication, and the profiles are not readily available outside the system.

ELECTRONIC DISSEMINATION OF PERISHABLE INFORMATION
Roxanne Williams, Department of Agriculture

The Department of Agriculture has as a primary function the dissemination of information about conditions related to agriculture. The Extension Service is one way the Department uses to get information disseminated at the local level. In addition, the Department has long utilized the printed media for the dissemination of information around the nation. A few years ago, a number of agencies in the Department became dissatisfied with the print media because of the difficulty in getting information to interested parties as quickly as necessary. The agencies, acting independently, tried electronic communication of data. Use was made of a number of commercial services such as DIALCOM, AGNET, and AGRADATA. DIALCOM is equivalent to an electronic bulletin board.
AGNET is an on-line information system developed at the University of Nebraska. About two years ago, the Department started to have problems with the use of these services. Other information companies wanted the Department to provide them the data going to existing services. They did not want to have to go to competitors for the information, for a variety of reasons. One reason was that they wanted to be able to say they obtained the data directly from the USDA. Supplying each potential vendor with USDA data was just too much of a burden for the Department. In order to continue to get data to the ultimate end user and at the same time meet the needs of commercial vendors, it was decided to establish a single department-wide system of electronic data dissemination. No agency will be forced to use this system; but if an agency decides to use electronic media, it must use the Department's system. This central system will then service the commercial vendors, including DIALCOM, AGNET, and AGRADATA. The Department decided to limit the scope of the project to what we call "time-sensitive perishable data." One example of this type of data is the agriculture marketing reports. These are perishable because they contain the current prices and the current sales of all the different commodities around the country. The data are in constant demand, and they are constantly changing as new reports arrive continuously. The demand for the quick and timely dissemination of these data is very high. -38- The Department is utilizing a commercial vendor, the Martin Marietta Corporation, to provide this service. This maintains a Department policy of not allowing public access to the Department's computer. It also keeps the Department from establishing a service that can be adequately provided by the private sector. Martin Marietta acts as an agent of the Department and has agreed not to use its position in order to benefit itself in the dissemination of these data to ultimate users. Martin Marietta can only disseminate these data through the system established for the Department. Other commercial vendors (we call them Level I users) can tie into the system with auto-dial or auto-set facilities. For a price, they can even have the main system's computer call their computer as soon as data are released and transfer those data immediately. Thus all vendors will have excellent and "equal" access to USDA's perishable data. Equal access also meant to us that Martin Marietta would not charge other commercial vendors outrageous prices for access to the system. We wanted to keep the costs to Level I users reasonable. Martin Marietta was very reasonable and agreed to modest and uniform charges. Ease of access was also important to the Department. In order to maintain simplicity and keep programming costs low, we decided to use a straightforward file structure for the data, with access obtained through a menu-driven system. The resulting simplicity of the system not only makes for easy access by users, but it also allows originating offices within the Department to upload files with a minimum of effort. Further, the originating offices maintain complete control over their own data in the system. They determine when data go into the system, when they are to be released, and when they are to be deleted. Martin Marietta only maintains the hardware and software of the system. In addition to meeting the requirements of outside (Level I) users, the system has been designed to meet the Department's own internal requirements for information.
A second type of user (Level II) has been defined. Level II users are primarily offices within the Department and the Extension Service. Other Federal agencies which make heavy use of agriculture data will be included. In order to service the Level II users, we asked Martin Marietta to allow access to smaller segments of data. These users do not need to obtain bulk data by telecommunications. The system allows us to break down bulk reports into smaller segments, all of which are accessible via simple menus. The Department anticipates that the effects of the new system will be manifold. Users should have much better access to a wider range of information. Internal communication of information within the Department should improve significantly. The demand for hard copy should be significantly reduced. All of these effects should help to reduce the cost to the Department of data dissemination.

-39-

QUESTIONS AND ANSWERS
Q1: What were the particular problems with mailing floppy discs; what kind of reject rates were encountered; and, if the discs are used for data transfer, how much of a backup do you need?
A1 (Mr. Weisman): Some problems in handling of the discs during shipment may have been avoided because we chose to use an overnight delivery service instead of the Postal Service. The quality of the service has been very high, there is very little handling required, and the service has not failed yet.
Q2: Did you mention that there were some bad discs that needed to be replaced?
A2 (Mr. Weisman): Yes. It is very difficult to track down where the mishandling of discs actually occurred.
Q3: Is there a flat percentage of reliability?
A3 (Mr. Weisman): The percentage of problems is very small, but problems do occur.
Q4: Has NTIS considered direct phone transmission of data; that is, could users call directly to the NTIS computer, similar to commercial data bases?
A4 (Mr. Weisman): We did make our bibliographic data bases available, similar to what Census is now doing (as mentioned in the talk by Barbara Aldrich). That was started around 1974, or perhaps earlier. I believe there are now four vendors carrying our data base. In addition, NTIS encourages vendors to carry its statistical files and source files. To date, no vendor has elected to carry these files because it is more difficult to carry these files than a bibliographic data base. NTIS has no plans at this time to make these files available through telecommunications.
Q5: What are the plans for disseminating data from the 1990 decennial census?
A5 (Ms. Aldrich): In terms of data dissemination for 1990 decennial census data using CENDATA, there are no solid plans, but it is an issue for thought. The product information section of CENDATA could be used as a daily update or product release for 1990. I believe that there will be some electronic dissemination, but the amount and the level are not really being addressed at this time.
Q6: Please tell us more about the software available through NTIS; is it public-domain software, software that the agencies have written for their own use, or some other type of software?
A6 (Mr. Weisman): While I am the manager for data files and data bases and there is a separate product manager for software, I will try to answer your question. The criteria that NTIS uses for handling software are the same as those used for data files; that is, the software must be Government-produced. The software must also have a common usage and be useful to others.
-40-
Q7: NTIS currently sells a catalog of public domain software for $40 that includes quite a lot of information. Why doesn't NTIS publish separate catalogs of microcomputer software and mainframe software?
A7 (Mr. Weisman): At the present time NTIS has only three packages available on diskettes for microcomputers; the rest are for mainframes. NTIS does not convert software at the present time and may never do so. Currently there are not enough diskettes available for microcomputers to justify a separate catalog.
Q8: Does Census have any feedback from CENDATA users on the services and charges?
A8 (Ms. Aldrich): Yes, based on discussions with users, the charges seem reasonable. DIALOG priced CENDATA at $36 per hour, their most inexpensive commercial rate. That price does not include the telecommunications network charge which, with discount, is generally about $6 per hour. The Chemical Bank version of CENDATA is priced at $28 per hour and includes the telecommunications charge. In addition to the positive feedback we are receiving on prices, we receive feedback on what is in CENDATA, what users would like to see in CENDATA, and what they do not like.
Q9: Is it possible to download CENDATA data and create other data files based on this?
A9 (Ms. Aldrich): CENDATA is all public domain and no part is copyrighted. Therefore, it is available for users to download to their computers or add to other data bases. This caused a slight problem with DIALOG because so many of their data bases are copyrighted. To end any confusion, a notice was put in the DIALOG newsletter pointing out that CENDATA is in the public domain.
Q10 (Mr. Berkman): Would Barbara and Roxanne discuss the impact upon their particular agencies' personnel who generate the data, in transferring the data to the two systems they discussed?
A10 (Ms. Aldrich): I would like to cover the impact in two areas: the positives and the negatives. The negative for the people generating the data is that they must provide it to us in machine-readable form, either on the appropriate kind of floppy disc or via telecommunications to our microprocessor. There are some guidelines with respect to designing tables that must be followed, and these are quite difficult. The industry standard for CRT screens is 80 characters across, so any table must be defined in 75 characters, since the vendors requested five characters for control. Often tables are split vertically, with the first part becoming Table 1, Part A; then the second part is Table 1, Part B; and so forth. The positive advantage to people preparing time-sensitive information and providing the data to CENDATA is a reduction in the interruptions from outside the agency with requests for data. Prior to CENDATA, when a data embargo was lifted, staff members would spend the remainder of the day answering the telephones and reading data over the phone. With the advent of CENDATA, users have an alternative where they can quickly receive the data. They can copy the data from CENDATA to their microcomputers and eliminate the need to listen to it over the phone and record it. There are both positives and negatives for the individuals who provide CENDATA with the information. In all cases the -41- individual division which is the source of the data provides the CENDATA staff with the information.
A10 (Ms. Williams): Agriculture has designed a system whereby each agency retains control over its own data.
This is a very sensitive subject, so the system was designed so that each agency enters its own data into the system. Because of the wide variety of equipment used by our agencies to process data and create reports, the system also needed to be designed so that the agencies did not need to change their current methods of doing business. To accommodate the agencies, each agency only needs to put a header card on its report to identify the report. If a report is to be broken up into different levels of service, an additional header card is necessary. Based on the header card(s), the system knows how to handle the report that follows. One agency, the Agricultural Marketing Service, required another accommodation because it used a leased wire service with a special protocol. Current users of these data had taps on the wire which were usually linked to teletype machines. A microcomputer system was placed between their system and our system to convert the protocol and place the headers on the data. This allowed their system to operate exactly as it did prior to development of our system.
Q11: Does CENDATA provide a computer tape to its vendors or is data communicated via telecommunications? Also, how often are the vendors' files updated?
A11 (Ms. Aldrich): All CENDATA data are transmitted via telecommunications. We use an enhanced word processor with telecommunications capabilities. Information initially goes into a private file where it is integrated into our standard system. We review the system exactly as a user would see it and determine if there are any problems. Simple problems are corrected using the vendor's editor; serious problems may be corrected by deleting the file and starting over. When we give the go-ahead, the data become available on the vendors' systems. On DIALOG the files are brought up overnight, so the data become available the next day. We update daily based on data to be made available and changes in our product listings. The update is controlled by a vendor's software. We move records into and out of their systems.
Q12: Does the Bureau of the Census pay for the update costs?
A12 (Ms. Aldrich): No. Census developed the menu. We work closely with the software design people at each vendor.
Q13: Do the vendors limit the amount of information?
A13 (Ms. Aldrich): Certainly not in the case of DIALOG. They have the philosophy that however much information you can give them, they will accept it. They consider data storage to be cheap and pride themselves on being one of the largest vendors. In the case of Chemical Bank, they have not constrained us either. About once a year they request, for planning purposes, an estimate of how much storage we will need in the next two years. We have a small amount of data available on-line, with a rich potential for it to get out of hand, but thus far there are no problems.
-42-
Q14: What were the reasons Census decided not to go sole source?
A14 (Ms. Aldrich): One of the primary reasons was our objective to get the system operational as quickly as possible. By offering it to several vendors, we could avoid the procurement process. Another appeal was that by going with several vendors, CENDATA would be available to different segments of the community. With different vendors it might be possible to reach users that previously had not been Census data users. I think that in the case of DIALOG we have found a lot of librarians who were not previously users.
Q15: Has meeting the different protocol requirements of the different vendors involved much extra work?
A15 (Ms. Aldrich): No, because we have only one system and one format for the data; each vendor must agree to adapt that format to whatever they see fit to use. There is one set of codes, which are very simple and straightforward.

-43-

SESSION ON APPLICATIONS

SESSION SUMMARY

The relatively recent emergence of powerful microcomputers (micros), coupled with the availability of specialized vendor software packages for micros, has significantly enhanced the federal statistical community's ability to gather, manipulate and analyze data. Today, more than ever, it has become easier to perform data analyses previously considered to be impractical due to resource and time limitations associated with traditional manual and computer methodologies. Accompanying these enhanced analytical capabilities are improved methods for communicating the results of our data analyses. Powerful graphics software, along with improved graphics plotters and color displays, has made it possible to easily paint pictures reflecting data analyses which before were only possible through relatively expensive and involved mainframe processing. The boom in microcomputer usage in the areas of statistical and economic analyses is due in large part to the many advantages micros have over mini and mainframe computers. In particular, today's micros have storage capacities and processing speeds which often exceed mainframe capabilities commonly found just 10 years ago. Micros are generally simpler and easier to use than minis and mainframes; they are often portable; and they cost less to procure, operate and maintain. Micros are usually more reliable (less down time), and they often possess the ability to communicate with minis and mainframes, which permits micros to access and transfer large data files. Along with the "hardware" advantages, there are also "software" advantages associated with micros. In particular, there is an abundance of high-quality and user-friendly vendor software packages available, many of which permit the user to add his or her own code to modify and enhance the package's capabilities. Relative to mini and mainframe costs, these software packages are inexpensive. A few disadvantages of micros should be mentioned as well. The ability to exercise security measures and ensure control appears to be more limited. Today's micros are slow in comparison to current state-of-the-art mainframes. There exist serious compatibility problems of file structures between vendor software packages. Finally, there is often an added personal cost to the micro user in the area of additional time spent in procurement and maintenance, since these activities are usually not required of a mainframe user. The discussions which follow address many of the issues mentioned above.
------------------
*Thomas Nagle, Internal Revenue Service

-45-

SPREADSHEET AND STATISTICAL/ECONOMETRIC APPLICATIONS IN ECONOMETRIC RESEARCH
Linda P. Atkinson, U.S. Department of Agriculture

Microcomputers are in widespread use throughout the Economic Research Service (ERS). I will be discussing their application not by secretarial staff for word processing or by data processing professionals, but rather by the economic research staff themselves. Our economists first became involved with microcomputers through the use of spreadsheet software, and this is still where the bulk of the applications are.
Packages such as Supercalc and Lotus 1-2-3 are used extensively for data preparation, developing tabular reports, producing high-quality charts, graphs, and plots, performing if-then analyses, and interfacing with mainframe software. Some of the systems which have been developed with these packages are, in fact, quite sophisticated. One group, for example, has developed a program using Lotus 1-2-3 to assess the preliminary economic impact of foreign pests on producers, consumers, and society in general. A partial budget analysis is used in which different economic scenarios are simulated by allowing changes in costs of production, yield, and prices for the affected crops. The entire system is menu driven and has options for various tables and graphs which can be produced. The program set-up is being used as a template from which similar analyses can be developed, such as a program to evaluate the impact of changes in ozone concentrations on yields. Another group had been using Supercalc for data entry and preparatory calculations before running a program on the microcomputer to convert the data to the form required for input to mainframe packages such as TROLL or SAS. After running these mainframe programs, files of output were then transmitted back to the microcomputer and reformulated for spreadsheet entry so that tables and graphs of output were automatically generated. Additional changes in the form of model output results could then be made, interfacing the flexibility of the microcomputer with the calculating power of the mainframe computer. Now this group has a simplified version of their model, the world grain-oilseeds-livestock (GOL) trade model, running entirely on the micro in Supercalc. The GOL model is an annual simulation model consisting of 27 country and regional models and 20 major agricultural commodities. The individual models are linked to solve simultaneously for a vector of prices which clear world trade. The global model system has equations for 339 country-commodity combinations. Running a 20-year projection of the full linked model on an IBM PC/XT took 48 hours; however, an individual country model runs in about 15 minutes. They hope to improve speed considerably by the acquisition of an IBM PC/AT with memory upgrades. The program has been set up to ask questions of the user, such as what country is to be analyzed for what start and end dates. Users like the flexibility of the spreadsheet format; one can get in and look at a simulation, watch the numbers change and see where any problems are. Built-in equation writers allow you to change the structure of a model, or you can edit it directly. You can pre-create graphs and have them contain historical data to compare to simulated results. -46- A good reference on building such models in spreadsheets is an article from the February 1985 issue of Byte magazine entitled "Simultaneous Equations with Lotus 1-2-3." The author demonstrates how to formulate and solve a famous macroeconomic model, Klein's Model I, using standard Lotus commands. The Gauss-Seidel iterative method is used to numerically solve the system, with a one-line Lotus macro written to test for convergence. Another example of Supercalc use is to make projections of coarse grain production in foreign countries using population projections, real GNP growth rates, elasticities of consumption with respect to income, and growth rates of production.
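To make the arithmetic concrete, here is a minimal sketch of that kind of projection, written in Python as a modern stand-in for the spreadsheets actually used at ERS; the base figures, growth rates, and elasticity below are entirely hypothetical:

    # Minimal sketch of an elasticity-based projection (hypothetical numbers,
    # not ERS data).  Consumption growth is approximated as population growth
    # plus the income elasticity times per-capita real income growth.

    def project_consumption(base_consumption, years, pop_growth,
                            income_growth, income_elasticity):
        """Project total consumption forward the given number of years."""
        annual_rate = pop_growth + income_elasticity * income_growth
        return base_consumption * (1.0 + annual_rate) ** years

    def project_production(base_production, years, production_growth):
        """Project production forward at an assumed trend growth rate."""
        return base_production * (1.0 + production_growth) ** years

    if __name__ == "__main__":
        consumption = project_consumption(
            base_consumption=10.0,      # million metric tons (hypothetical)
            years=5,
            pop_growth=0.025,           # 2.5 percent per year
            income_growth=0.03,         # 3.0 percent per year, per capita
            income_elasticity=0.4)
        production = project_production(
            base_production=9.0, years=5, production_growth=0.02)
        print(f"Projected consumption: {consumption:6.2f}")
        print(f"Projected production:  {production:6.2f}")
        print(f"Implied import gap:    {consumption - production:6.2f}")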
The spreadsheet format allows the analyst to change one item, such as an elasticity, and have everything else recalculated. In this way it becomes easy to cross-check to see if the implications of certain assumptions are reasonable. A planned enhancement to this analysis technique is to begin to use the regression capabilities of a microcomputer statistical package, ABSTAT specifically. Regression of grain conversions over time can yield estimated elasticities, which can then be put back into the spreadsheet. ABSTAT was acquired as a user-friendly package to do basic descriptive statistics and simple linear regressions. We have also acquired SPSS/PC, the micro version of the popular mainframe package. Many of our economists are accustomed to using SPSS for analyzing survey data and large cross-sectional data files such as those provided by the Census Bureau. To provide databanking of larger files of which portions might be analyzed using SPSS/PC, we recently licensed SPSS/X to run on our in-house minicomputer. SPSS/PC's ability to handle "portable" system files which can be uploaded and downloaded easily aids in forming an interface between the large and small computers. We will first apply this in analyzing the results of an in-house information-needs survey; complete questionnaire results can be stored on the minicomputer, with data for particular groups of respondents or selected variables downloaded to the micro for detailed analysis without having to be redefined. We have two packages in-house that can perform more complex econometric estimation techniques: RATS (Regression Analysis of Time Series) and SORITEC. A domestic sugar model has been set up in SORITEC. Various estimations were performed, including OLS and two-stage least squares and Cochrane-Orcutt autocorrelation correction for each equation. The model was too large at 15 equations for SORITEC to do maximum-likelihood estimation of it, but the new version, when it comes, should be able to handle it. The model was simulated in SORITEC with the various sets of coefficients and also with various changes made to the model, for example perturbing an exogenous variable by 10%. SORITEC has a command to compare actual and fitted values, computing summary statistics to measure goodness-of-fit. Because the model is somewhat large, it is run in a "batch" mode, with Wordstar used to edit the SORITEC program. The model has also been put up on Lotus 1-2-3 to experiment with the parameters. Graphwriter is used to output plots of results. There is a free version of SORITEC called SORITEC Sampler which has the capabilities of the main package up through two-stage least squares. It cannot perform three-stage least squares or maximum-likelihood estimation, or -47- handle nonlinear models. It produces nice screen graphics of regression plots including residuals, which can be dumped to a line printer (but not at present to a plotting device). While not of publication quality, the plots are very useful for analytical work. For example, as part of a farm production model, an equation was estimated with prices paid by farmers for feed as a function of corn price and the price of soybean meal. The residuals showed some problems; an autocorrelation correction was tried and the regression re-estimated. The new plot showed substantial improvement in the residual analysis.
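The regress, inspect the residuals, and re-estimate loop just described can be sketched in a few lines. The following is a minimal illustration in Python, offered as a stand-in for the SORITEC runs described above; it uses synthetic data rather than the actual feed-price series, estimates by OLS, checks the Durbin-Watson statistic, and applies one Cochrane-Orcutt step:

    import numpy as np

    def ols(X, y):
        """Ordinary least squares via a least-squares solve."""
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return beta, resid

    def durbin_watson(resid):
        """Durbin-Watson statistic; values well below 2 suggest positive autocorrelation."""
        return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

    def cochrane_orcutt_step(X, y, resid):
        """One Cochrane-Orcutt iteration: estimate rho from the residuals,
        quasi-difference the data, and re-run OLS on the transformed data."""
        rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
        X_star = X[1:] - rho * X[:-1]
        y_star = y[1:] - rho * y[:-1]
        beta_star, resid_star = ols(X_star, y_star)
        return rho, beta_star, resid_star

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 60
        corn = rng.normal(3.0, 0.3, n)        # hypothetical corn price series
        meal = rng.normal(180.0, 15.0, n)     # hypothetical soybean meal price series
        # AR(1) errors so the example has something for the correction to find
        e = np.zeros(n)
        for t in range(1, n):
            e[t] = 0.7 * e[t - 1] + rng.normal(0.0, 0.5)
        feed = 10.0 + 2.0 * corn + 0.05 * meal + e
        X = np.column_stack([np.ones(n), corn, meal])
        beta, resid = ols(X, feed)
        print("OLS coefficients:", beta, " DW:", round(durbin_watson(resid), 2))
        rho, beta2, resid2 = cochrane_orcutt_step(X, feed, resid)
        print("rho:", round(rho, 2), " corrected coefficients:", beta2,
              " DW:", round(durbin_watson(resid2), 2))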
Another analyst uses RATS to estimate import demand for wheat, corn and soybeans in four Asian countries. The 10-equation model has been run through OLS, instrumental variables and Taylor-series approximations, and he is trying to get around memory constraints (supposedly temporary until the new release of the package) to do seemingly unrelated regressions. The ARIMA time-series analysis capabilities of RATS were used in this project in determining how to average prices on a yearly basis, looking at the cross-covariances between prices and imports to decide on a lag structure. RATS is also being used to estimate a Canadian grains and rapeseed model. Again, a spreadsheet, in this case Lotus, is being used to update the data and provide graphical output, as well as to simulate the results. We have at ERS a number of other software packages for microcomputers to perform more specialized functions. GAUSS is a matrix programming language that allows you to write out an analysis the way you would write it mathematically. You can easily write down the estimation commands for the coefficients of a simple linear model, or the code for a complex statistical algorithm as it appears in a journal article. GAUSS does not currently come with built-in statistical routines but is planned to include them in the future. Another program, TK!Solver, solves simultaneous nonlinear systems, again allowing you to express the equations similarly to how you would mathematically. A package called MUMATH solves mathematical problems symbolically and can take derivatives, etc. Especially useful in macroeconomic theory, it lets one change coefficients or other aspects of a model symbolically rather than numerically and see the logical implications in terms of the cross-relationships that result. We even have some researchers who use small programs written in Basic to perform a specific statistical function, such as regression or the calculation of standard deviations or coefficients of variation, rather than bother learning how to use a more complete statistical package. Finally, I would like to mention one macroeconomic model to which ERS subscribes, FAIRMODEL, which is a model of the U.S. economy developed by Professor Ray Fair of Yale University and programmed for the IBM PC and XT. The model consists of 30 stochastic equations and 98 identities and is re-estimated quarterly. It can be used for forecasting, policy analysis, scenario development and as a research tool. An analyst can run experiments, change exogenous assumptions, enter adjustment factors, or exogenize an equation or block of equations, and view the results. An interface to Lotus 1-2-3 can be obtained with FAIRMODEL to use for setting up an analysis and deriving tables and graphs from the model output. -48- These have been only a few of the very many applications of microcomputers that we have in-house. The use of microcomputers has revolutionized the way our analysts conduct their research. In the area of econometric modeling, many more alternatives can be considered and assumptions tested in a much shorter period of time, taking advantage of the interactive nature of the software on these machines. Researchers who in some cases had little computer experience previously have become proficient with the easy-to-use and flexible software available on microcomputers, particularly spreadsheets, and seem to prefer this to the use of cumbersome statistical packages. However, now that better statistical software is becoming available, interest in it is growing.
The economists I spoke with seemed to want to choose their own components of an analysis system -- spreadsheet, statistical program, graphics package, word processor -- and are concerned with having good interfaces so they can quickly move data from one program to another. Some problems with memory constraints and speed have been experienced, but hardware is rapidly improving to alleviate this. There are worries about having errors creep into programs, especially with spreadsheets that may not be well documented and might be passed from one researcher to another. These issues, and data security, will have to be addressed now by analysts who perhaps had that taken care of for them in a mainframe environment, but this seems to be a fair trade for the ability to interact directly with their models and better understand what the data are saying.

Reference
Johansson, Jan-Henrik, "Simultaneous Equations with Lotus 1-2-3," Byte, February 1985, p. 399.

Acknowledgments
Many thanks to the following researchers who shared the results of their work on microcomputers: Walter Ferguson, Vernon Roningen, Michael Lopez, David Weisblat, Suchada Langley, Gary Lucier, Carlos Arnade, Larry Deaton, Clark Edwards, Paul Prentice, and Merv Yetley.

Software Vendors
AbStat -- Anderson-Bell, P.O. Box 191, Canon City, CO 81212
SORITEC -- Sorites Group, Inc., P.O. Box 340, Springfield, VA 22151
Graphwriter -- Graphic Communications, Inc., 200 Fifth Avenue, Waltham, MA 02254
SPSS/PC -- SPSS, Inc., 444 Michigan Ave., Chicago, IL 60611, (312) 329-2400
-49-
Lotus 1-2-3 -- Lotus Development Corporation, 161 First Street, Cambridge, MA 02142
SuperCalc 3 -- SORCIM/IUS Micro Software, 2195 Fortune Drive, San Jose, CA 95131, (408) 942-1727
MUMATH -- Microsoft Corporation, 10700 Northrup Way, Bellevue, WA 98604
TK!Solver -- Software Arts, Inc., 27 Mica Lane, Wellesley, MA 02181
RATS -- VAR Econometrics, 134 Prospect Ave., Minneapolis, MN 55419
FAIRMODEL -- Urban Systems Research & Engineering, 2067 Massachusetts Avenue, Cambridge, MA 02138, (617) 661-1550

SPREADSHEET AND DATA BASE APPLICATIONS USED BY THE CROP REPORTING BOARD IN REVIEWING SURVEY INDICATIONS AND PREPARING PUBLICATIONS
Gary Nelson, U.S. Department of Agriculture

Our Agency, the National Agricultural Statistics Service, is responsible for gathering crop and livestock statistics for the Department of Agriculture. We make forecasts of the crop size during the growing season, and final estimates at the end of the year. We have a network of 44 field offices serving all fifty states. These field offices regularly survey thousands of operators of farms, ranches and agriculture businesses to gather information about their operations. Statisticians in our field offices assemble the information and make recommendations on such items as acres planted or harvested, yield per acre or the amount of grain that is in storage. They then send indications and recommendations to our headquarters office in Washington, D.C., where the data are assembled and reviewed, and U.S. estimates are set and published. Our state offices are connected to a large computer network, the Martin Marietta Data System. The indications, recommendations and comments are submitted over the network to our office in Washington, D.C. We have several IBM PC/XT's in our section, which we utilize extensively for summarizing data and weighting the data to give state, regional and national totals, as well as for designing questionnaires and various other spreadsheet applications and some graphics applications.
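As a rough illustration of the summarizing and weighting step just mentioned, here is a minimal sketch in Python; the state indications, regions, and expansion weights below are hypothetical, and the actual work is done in spreadsheets on the PC/XT's:

    # Minimal sketch of weighting state-level survey indications up to regional
    # and national totals (hypothetical figures, not actual survey data).

    # Each record: (state, region, indication in thousand acres, expansion weight)
    indications = [
        ("Iowa",     "Corn Belt", 12_900.0, 1.05),
        ("Illinois", "Corn Belt", 11_400.0, 1.02),
        ("Nebraska", "Plains",     7_300.0, 1.08),
        ("Kansas",   "Plains",     1_900.0, 1.10),
    ]

    def weighted_totals(records):
        """Return (state_totals, region_totals, national_total)."""
        state_totals = {}
        region_totals = {}
        national_total = 0.0
        for state, region, indication, weight in records:
            value = indication * weight
            state_totals[state] = value
            region_totals[region] = region_totals.get(region, 0.0) + value
            national_total += value
        return state_totals, region_totals, national_total

    states, regions, national = weighted_totals(indications)
    for region, total in regions.items():
        print(f"{region:10s} {total:12,.0f}")
    print(f"{'U.S.':10s} {national:12,.0f}")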
One microcomputer application that we have developed is called the Grain Stocks Program. This program produces a report, released four times a year, that shows the amount of grain that is in storage, both on the farm and off the farm. The report was produced manually in the past, and we wanted to put it on the micros. In designing this application, we wanted a system that would: be easy to use; be menu driven; be able to download the data from the data base on Martin Marietta to the microcomputers; assemble the data; provide a means for making changes; provide us with summaries and camera copy that we could use to print the report; provide the ability to transmit the changes back to the data base at -50- Martin Marietta; and provide the capability to compute and print a balance sheet. The program uses a combination of Condor and SuperCalc3. Another application of the micros is in tabulating and charting data used in making forecasts of the size of the various crops each month throughout the growing season. These forecasts are released on a specific day each month. Since the forecast of the size of the crop can have a definite impact on prices, it is extremely important that strict security be maintained in compiling these statistics until the report is released to the general public. To ensure that the data are kept confidential, we operate under a "lockup" procedure. The members of the Board review the data, read charts, and recommend a yield for each State and the Region. The Board then jointly agrees on a yield for each State to give the U.S. totals. The biggest use of the micros in this application has been to assemble the data to a Regional level and at the same time provide printed worksheets to the Board members for setting the estimates. We usually have less than one hour to prepare the data for review by the Board. I can enter the indications on the PC, and within about five minutes print out the spreadsheet with all the indications on it. In the past it would take almost one hour with two or three people doing the calculations and checking the totals manually to complete these tasks. Furthermore, these time savings permit extra time for reviewing the data and ensuring they are correct. In conclusion, we find ourselves putting almost all of our calculations on spreadsheets, and even people who have little experience on computers are able to effectively use the micros. In most cases there has been a considerable time savings, coupled with improved data quality.

MANAGER'S PERSPECTIVE ON THE ACQUISITION AND USE OF MICROCOMPUTER-BASED GRAPHICS PACKAGES
Richard W. Hays, Internal Revenue Service

The capability to display statistical data graphically as opposed to tabularly has been greatly enhanced with the advent of graphics software packages which can be used on microcomputer equipment. This paper summarizes the experiences of one small statistically-oriented organization, the Projections and Forecasting Group, a component of the IRS Research Division, in using microcomputer-based graphics to upgrade the quality and impact of its products.

Mission of the Projections and Forecasting Group (PFG)
Until 1983/1984, the Group's projection activities were completely mainframe bound. All consequential projections were performed at a remote IRS computer facility in a dumb-terminal, time-sharing mode. There was no graphics capability in this system. Even tabular information was difficult to extract in a format which was ready for camera-copy reproduction.
The introduction of 16-bit 10 MB hard-disk micros into the Group in early 1984 radically altered work processes within six months:
-51-
-- Lotus 1-2-3 spreadsheet software was used to format smaller projection projects.
-- A downloading capability was created so that large-scale computations could be done on a mainframe, with numbers dumped into preformatted tables.
-- A variety of different tables were created which allowed more rapid scanning for errors or problems in projections.
-- Data transmission arrangements were made with key users so that data previously supplied in hardcopy only could be provided electronically, thereby facilitating further analysis by the user without data re-entry.
-- Experiments with Lotus 1-2-3 graphics suggested that much could be done to present analytical information and projection highlights in pictures rather than words or spreadsheets. Graphic representation of data would expand the managerial and executive audience.

Graphics Experimentation
Early experiments in presentations were done using Lotus graphics and a dot matrix printer. The Group found Lotus graphics satisfactory but limited in the quality of presentation, both in terms of sharpness for reproduction purposes (a printing problem) and sophistication (a software problem). The problem of quality reproduction was solved by acquiring a six-pen Hewlett-Packard plotter. Tests and discussions with other organizations showed eight-pen plotters to be too slow, too complicated and too expensive. The second problem, sophistication and flexibility of presentation, necessitated a software survey. A number of different packages were reviewed against survey criteria:
-- compatibility with Lotus 1-2-3 files,
-- menu structure,
-- equipment compatibility, and
-- memory demands.
Chart Star, software marketed by MicroPro, Inc., was chosen at the end of this survey. Chart Star has a wide range of charts and graphics to choose from and is in all ways superior to Lotus 1-2-3 graphics. For example, bar graphics can be three dimensional; it has exploding pie-charts and a number of other options, all prefaced with easy-to-use menus. With both hardware and software in place, the Group began to routinely use graphical data representations in its reports and documents. We discovered that once it was demonstrated what microcomputer graphics could do, demand for such presentations increased exponentially. Consequently, the Group added to its repertoire a software package called Statmap, which permits shaded/cross-hatched representations of data on maps at the zip code, county, state, and U.S. levels.
-52-

Some Observations on Impact
There is no question that microcomputer graphics have greatly improved the quality of, and increased the audience for, Projections and Forecasting work products. Managers and executives are more aware of key trends and have tended to ask for additional data and displays on them. There are organizational impacts, however.
Search time - Finding the hardware and software which suits the presentation requirements of the organization requires time--staff time. Hardware and software specifications need to be reviewed, demonstrations arranged and procurement initiated. Getting the right technology for your needs requires carving out enough time from everyday work to do an adequate job of review.
Doing so requires consultation and testing and adds to production time. The longer the review chain, the more frequent will be requests to alter graphic presentations or present data in some other manner.

Integration - Good graphics create their own demand. We found that top managers expect textual material with high data content to be graphically illustrated. We have not found software which does a good job of integrating text and graphics for camera-copy development. This means graphics and text are separately produced, then cut and pasted into camera copy. The result is that making changes becomes more difficult than simply making textual adjustments on a word processor.

Color - All good graphics packages can give video displays of charts and graphics in color. With a plotter, camera copy can also be produced in color, as can overhead projections. The rub develops in moving from camera copy to production. Few organizations have the color xerography necessary to make color reproductions, although this may be coming. For the interim, graphic presentations need to be developed with a black and white final product in mind.

Competition - Good analysts quickly realize that graphic data representations help sell their products. Consequently, there is competition for the use of both equipment and software. If the organization has micros with either built-in or external hard disks, the equipment side of the equation can be solved by loading software onto the hard disk. The number of software packages needed will depend on the volume output of the Group. In our case, during peak production, one package for five analysts seems to meet the need. Supervisors and/or reviewers also need to guard against "over illustration," a problem which can occur once analysts have seen the power of graphical presentations.

CURRENT APPLICATIONS OF UNIX-BASED MICROCOMPUTER SYSTEMS

Brian Carney, U.S. Department of Agriculture

The situation in the National Agricultural Statistics Service (NASS) is unusual in that several of our microcomputers are based on the UNIX operating system. Instead of having just one user on a machine, we have multiple users and multiple tasks per user.

First, a little about what the Research Division of NASS does. There are three branches in the division: Remote Sensing, Yield Research, and Sampling Frames and Survey Research. The Remote Sensing Branch uses a UNIX system with exotic graphics hardware for satellite image processing. The Yield Research Branch is using a UNIX system for editing programs that are submitted to a mainframe computer. The Sampling Frames and Survey Research Branch, the group I am in, works in four general areas: nonsampling error research, area-frame design and construction, survey design and analysis, and statistical consulting. Much of the work of the latter branch involves large datasets from our agricultural surveys and requires the use of a statistical package on a mainframe. We use SAS primarily.

Efforts to reduce nonsampling errors in telephone interviews are what led to our use of UNIX. We became research partners with the Center for Computer Assisted Survey Methods at the University of California at Berkeley. That group had been working on a system for computer-assisted telephone interviewing (CATI). Using the CATI system, we replace paper questionnaires with a terminal, and the interviewer can enter respondents' replies directly into the computer. Error checking is performed while the respondent is still on the telephone.
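A minimal sketch of the kind of immediate check involved might look something like the following. This is an illustration only, written in Lisp rather than the actual CATI software, and the question wording and the 0-5000 acreage bound are invented:

    ;; Illustrative only -- not the actual CATI software.  An interview
    ;; item that rejects out-of-range replies while the respondent is
    ;; still on the line.  The wording and the 0-5000 bound are invented.
    (defun ask-numeric (prompt low high)
      (loop
        (format t "~&~a (~a-~a): " prompt low high)
        (let ((reply (read)))
          (if (and (numberp reply) (<= low reply high))
              (return reply)
              (format t "~&Out of range -- please verify with the respondent.~%")))))

    ;; Example: (ask-numeric "Acres of corn planted" 0 5000)

An out-of-range reply can thus be questioned and corrected on the spot, rather than weeks later during office editing.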
CATI was developed and runs under the UNIX operating system. For a while back in 1982, when we were just starting our work with CATI, we tried connecting the terminals over telephone lines at 1200 baud; at that slow data rate it took a while to paint the screen, delaying the interview. Shortly thereafter, some multiuser microcomputers became available that ran UNIX, and the CATI system was ported over to them easily. The cost of the system was not too bad, something under $40,000, so we were able to procure an inhouse system for a work group of about ten people. That was our introduction to UNIX.

Once we had installed the machines, it was clear we could do quite a bit more with them besides run CATI. What we have done falls into the category of analysts' support. There is a video display terminal on each desk, with access to the mainframe systems both interactively and in batch mode. Programs for analyses are written on the UNIX system, using the native full-screen editor, and are transmitted to the mainframe for execution. This avoids the cost of being online to the mainframe. By dialing in to other systems or having them dial in to the UNIX systems, we can move information between UNIX and the word processors and PC's. The electronic mail system has been very useful to the managers; some of the material usually covered in staff meetings is now mailed to the staff using UNIX, and they can read it at their convenience. The electronic mail system extends to the research UNIX systems in Washington and the field.

Of course, UNIX has tools to facilitate programming, technical writing, and publishing, and these are widely used on our UNIX systems. All this capability is right there under UNIX; many operating systems are not so rich. The communications capability of these systems is such that they are accessible from remote dial-in terminals in the same way as mainframes and minicomputers.

I mentioned the specific use of these systems for CATI research. The Agency, because of the interest in making CATI operational, has procured twenty to twenty-two UNIX-based machines for our field offices. Besides running CATI, these will be used for direct data entry, transcribing the responses to the paper questionnaires we still generate in field interviews. Software costs have generally been lower, since one package is purchased per system. The individual price is high, but it is generally cheaper than PC software on a per-user basis.

I have mentioned office automation. All the analysts have a CRT on their desk and prepare reports and analyses through that terminal. We can edit and review before the manuscripts hit hard copy. This can be a real time saver. There are several spreadsheets available, and a number of imaginative simulations have been done on spreadsheets under UNIX by our group. Database software is available, but is used now primarily for administrative purposes.

The statistical analysis capability under UNIX is a limitation right now. Probably one of the most complete statistical languages available is S, from AT&T Bell Labs. However, it is very large and does not run on every UNIX system because of certain hardware and memory requirements. P-Stat and Minitab both run under UNIX, but would have to be converted to run under specific systems. SAS might run someday under UNIX, but probably not for a while. The SORITEC system that Linda Atkinson mentioned is available, too, and is good for econometric analyses.
A system called UNIX/STAT is available for basic statistics and psychometric analyses. Because of these limitations, we do not do much statistical analysis directly on the UNIX systems.

Now a little about UNIX itself. Among its disadvantages is its size: it requires as much as twelve megabytes of disk for the operating system and utilities, of which there are hundreds. The system can appear to be quite complicated, and it is usually necessary to have someone available to help out with solving system problems. The commands are a bit terse, most only two to four characters long, and that can be a problem for new users.

Among the advantages of UNIX are its flexibility and power. You have an operating system that operates on minicomputers, mainframes, and microcomputers and has the same essential capabilities across them all. UNIX is powerful because you can do extremely complicated functions with a very small number of keystrokes. The multitasking means that a user can have several programs operating simultaneously. When I use viewgraphs, I set each one up interactively, then have the system actually draw the viewgraph in the background; while it is drawing, I can go on to the next viewgraph. There are hundreds of utilities native to UNIX that are useful for the full range of tasks from text processing to database to programming. The UNIX hierarchical file structure is important for managing large numbers of files.

Some of the new UNIX systems feature displays with multiple windows that can run different processes at the same time. For applications requiring detailed graphics or typesetting, several new systems use bitmapped screens. What you see is what you get, even to the different fonts, special characters, and drawings. The results, printed on a laser printer, are quite good. It is not unlike the Macintosh, but with a more substantial operating system. Decision support on a microcomputer is the idea of having all the functions an analyst needs for assembling, analyzing, and presenting data, in both graphic and text form. The systems are flexible enough to manage both text and graphics in the same files. UNIX is also developing sophisticated networking to allow shared access to file systems, but with separate processors available to each user.

We have found the UNIX systems to be extremely useful because of their power and the wide variety of utilities built into the operating system. But we have been limited by the small number of statistical packages available under UNIX, and still rely on a mainframe for most of the statistical analyses.

EQUIPPED FOR THE FUTURE?

Paul Dobbins, U.S. Department of the Treasury

The Office of Tax Analysis (OTA), which is part of the Treasury Department's Office of Tax Policy, is responsible for three major functions: first, providing revenue estimates for the Administration for the budget and its quarterly reviews, as mandated by law; second, providing on-demand revenue estimates of tax proposals; and third, providing economic analysis of current and proposed tax legislation, often on very short notice. Timeliness is of the essence for the work of OTA to have any impact during the sensitive, fast-paced negotiations that are characteristic of mark-up whenever a tax bill is pending. OTA specializes in what is called micro-simulation, which in our case is simply defined as modeling the responses of tax or household units to tax-law changes on an individual basis and weighting up the sample results to get population estimates.
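In outline, the computation is nothing more than a weighted sum over the sample. The following is a minimal sketch, in Lisp and purely for illustration; the records, weights, and the one-percent surtax rule are invented and are not OTA's actual models:

    ;; Illustration only -- hypothetical records, weights, and tax rule.
    ;; Each sample unit is (weight taxable-income current-tax).
    (defvar *sample*
      '((1520.0  18000.0  2100.0)
        ( 880.0  52000.0  9400.0)
        ( 310.0 120000.0 31000.0)))

    ;; A made-up proposal: current law plus a 1% surtax on income over $50,000.
    (defun proposed-tax (income current-tax)
      (+ current-tax (* 0.01 (max 0.0 (- income 50000.0)))))

    ;; Weighted sum of per-unit changes = estimated revenue effect.
    (defun revenue-change (sample)
      (reduce #'+
              (mapcar (lambda (unit)
                        (destructuring-bind (weight income tax) unit
                          (* weight (- (proposed-tax income tax) tax))))
                      sample)))

    ;; (revenue-change *sample*) => weighted revenue change for the proposal

Each unit's response to the proposal is computed individually and then weighted up to a population revenue estimate.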
Our data are primarily tax-return information, but we are making increased use of other sources. Even though our data files are relatively small samples, they are still large data sets and have made us largely mainframe bound. (Micro data does not imply microcomputing!) But we have begun to use microcomputers to increase overall office productivity and, having seen the light, we are attempting to push our frontiers even further out to the limits of the possible.

We would ideally have a triad of computers/computer systems supporting our work. At the top or first level would be the individual microcomputer workstations, bringing the burgeoning new wave of software development into the hands of tax economists and lawyers. Currently, only our economists and computer specialists have microcomputers: the Superbrain (TM), a Z80-based CP/M machine, a fine machine in its day but certainly no longer in the mainstream due to memory and system limitations. Our economists have benefited greatly from having these microcomputers. Most of the staff were quickly converted into very proficient users of Wordstar (TM) and Supercalc (TM), and several have demonstrated considerable programming talent. The programming staff has found its burden lightened somewhat by a transfer of focus by staff, wherever possible, from the mainframe to the micros. But as described above, we are mainframe bound by the nature of our micro-simulation bread and butter. The mainframe is the third and fundamental level of our triad.

What we're hoping to implement is a second level: a mini- or supermicro-based network linking the first and third levels together into a smooth, efficiently running system. Why this is needed can be illustrated by a generalized paradigm of how OTA often does its work. A particular tax proposal will need to be reviewed and analyzed in the course of an afternoon. First, the appropriate mainframe simulator will be run and the results brought to the staff, who may then input the numbers into an analytical framework they have developed (e.g., in Supercalc (TM)). These modified results then may be inserted into a document residing on a third device, a stand-alone word processor. In the end we have results, but not without considerable carrying of paper from office to office and re-inputting of numbers at each step along the way. This is, I submit, an old story for many an office. We have duplication and wasted effort. And yet the very presence of microcomputers has made the process faster and more reliable.

A triad of micro, mini, and mainframe computers seems to fit our demands perfectly. The new Micro-Vax (TM) may very well fit into the second slot. The workstations could easily be IBM PC's and look-alikes or DEC PC's, while any mainframe fits the foundation (we're running on a UNIVAC 1100/81 series and will soon have an 1100/92). Our ideal system would make it possible to share software and software tools at the local and network levels, and allow us to easily move text and estimates across and among all of its levels. The intermediate level also goes some way toward narrowing the software gap between mainframes and micros by providing considerable computer power and much of the latest software design. We can only anticipate the hardware advances that will narrow the gap even further.

Finally, it cannot be overemphasized that the introduction of microcomputers has fundamentally altered the way OTA does business.
But perhaps the greatest lesson learned was how much better we can do than our current partial solution. In an environment that features considerable interaction among many different specialists, a network unifying all computer assets seems to be the only way to go.

CONCERNS ABOUT DATA INTEGRITY, SECURITY, AND ACCESSIBILITY IN AN ENVIRONMENT WHERE MICROCOMPUTERS AND MAINFRAMES ARE INTERFACED

Dick Shively, U.S. Department of Agriculture

A. Background (History)

The NASS is known as a data collection agency and reporter of agricultural statistics. In this line, a substantial amount of effort is directed toward list maintenance and data collection. As the agency moved into single- and multiple-frame probability samples for more and more indications, large-scale computer resources became extremely critical, both to allow evaluation of results in a manner timely enough to meet the reporting schedules and to support much more exacting requirements for list maintenance. A large proportion of the field office (SSO) effort is directed toward maintenance of the lists associated with their individual state, as well as collecting and analyzing their data. The NASS estimating procedure normally consists of each SSO preparing the indications and estimates for their individual state; the Crop Reporting Board then reviews all of these estimates to arrive at regional and national estimates. Since many diverse commodities are estimated, and sample sizes are fairly large to cover the desired geographic areas, data conversion is a major task. For this reason, all of the SSO's are mainly equipped as remote-job-entry sites with high-volume data-entry equipment. While the NASS has been a proponent of "generalized" software for some time, operating this software at multiple sites required that installation and maintenance activities be duplicated for each site.

The NASS history in mainframe computing has progressed from each of the field offices (SSO's) being responsible for obtaining its own computer resource locally to the current approach of using a single large-scale commercial time-sharing-network vendor who can provide adequate resources to satisfy all of the agency's requirements. Until the bulk of national-interest surveys were processed on the time-sharing vendor's equipment, it was sometimes difficult to determine what procedures were used to obtain indications, and even more difficult to review multiple state outputs from different reporting formats. Even shared software required modifications to be used on different brands of hardware, and there was little assurance that these modifications would always provide identical results.

The introduction of microcomputers into the NASS offers some possibility of encountering the same problems recognized when local computer capacity was utilized. However, with care in selection of commercial software to ensure standardization where necessary, these devices offer substantial opportunities for improvement in personal productivity.

B. Data Accessibility

The typical processing method for survey data in the NASS for national surveys is for the Washington, D.C., staff to provide general guidelines for the type and amount of data validation to take place, as well as the summarization techniques. The SSO staff provides detailed validation specifications appropriate to their state, taking into consideration any specialized local conditions. SSO personnel collect and validate the data and summarize to the state level.
The D.C. staff consolidates all of the state-level information into regional and national values. The majority of the post-survey analysis is also accomplished by the D.C. staff, using the data from each SSO. For this type of approach, the data values need to be available to people at widely scattered locations. Storage on the microcomputer devices does not satisfy this requirement, since only one user at one location can access the data.

The same accessibility requirement also holds for the values released by the Crop Reporting Board. The D.C. staff reviews the SSO recommendations in establishing regional and national values. Following the Crop Reporting Board review during lockup, the state, regional, and national estimates are made public on a known date and time. These estimates, normally released at 3:00 p.m., must be immediately available to each of the SSO's to prepare reports emphasizing those items of local interest. In addition, many people outside of the NASS are allowed direct access to these published values.

Microcomputer usage in the NASS appears to be best adapted to a support function, rather than providing a source of computational power. This includes primarily office-automation functions, such as word processing and spreadsheet analysis. Because of the volume of data, the stringent time constraints imposed on processing the data, and also the geographic distribution of processing, the data need to be stored in a common repository. This allows each individual SSO access to its own data, while still making the same data available to the review staff in D.C.

A recent "Viewpoint" column in INFOWORLD made the point that data security and microcomputer-enhanced productivity are incompatible. The relationship is very strong, in that the more stringent the security measures become, the less accessible the data are, with corresponding losses in productivity. This column's analogy is that security on microcomputers is similar to inventorying pencils and paper.

One current weak point with data accessibility, from both the mainframe and the micro standpoints, is a lack of communications ability. A reliable file transfer ability, consistent from machine to machine, and allowing usage of the strong points of both mainframe and micro, will enhance our data processing capability.

C. Data Integrity

Storage of data on microcomputers in the NASS environment, on more than a temporary working basis, has a tendency to lead to integrity problems. This happens whenever the same data is stored in more than one location, whether on a mainframe or microcomputer or in a file cabinet. Anytime a value has occasion to be changed, unless all occurrences are simultaneously changed, someone will likely accept the wrong value as correct. Version control is another name for data integrity, and when using individual stand-alone microcomputers this is difficult to handle. Each machine has its own copy of software, so whenever a new version becomes available every micro user must be provided with a copy. This is compounded by those machines having only floppy-disk drives, where the same software and/or data may appear on many diskettes. To upgrade to the new version, all copies must be located and modified. The problem of maintaining data integrity on microcomputers is the same one we have been battling for years: a single copy must be identified as that containing the "correct" values, and any accesses must be directed to this copy.
Local area networks (LANs) provide some help in those situations where data is needed in a small area, where all users can be contained within the LAN. In this way, each user of the LAN has access to the same values, which will alleviate the integrity problem.

D. Data Security

The most secure system is undoubtedly manual, although it may not be very productive. Microcomputers are not the place to store data that is sensitive, unless special precautions are taken, such as a locked environment and limited access. Floppy disks are extremely easy to duplicate and just as easy to carry away from an office without detection. However, this is a human security problem and not a microcomputer security problem. An authorized person can just as easily remove a printout from a mainframe computer or pages from a record book in a file cabinet. A "Jim Seymour" column in PC WEEK suggests increasing security by locking up diskettes when not in use, checking them out for usage, and encrypting the data stored on the micro. To me, this seems contrary to the usage of a PC as a productivity tool.

The main security consideration that should be given to microcomputer usage with sensitive data is the RF, or radio-frequency, emission associated with these machines. Using fairly inexpensive devices, these emissions can be recorded from many feet away from the computer and reconstructed into the data that was being processed. This can be solved with Tempest machines or RF shields.

In closing, I would like to say that I think that microcomputers are an excellent productivity tool. We need to be aware of their strengths and limitations when designing projects for them to accomplish.

QUESTIONS AND ANSWERS

The following discussion reflects questions and answers related to the "Applications of Microcomputers" Session, involving the Chair, the speakers, and certain members of the audience.

Mr. Steele: We have heard several interesting applications discussed here this afternoon, and the purpose was to give you some sense of the breadth of applications that microcomputers are being used for. We have seen applications that were very simple uses of spreadsheets, spreadsheet templates, and data bases, ranging on to econometric projections, graphics, and then the sophisticated time-sharing multiuser UNIX-based systems. We wanted to give you a sense of the kinds of things that are being done so that we can talk more about where we are going. What more can we expect out of microcomputers, and what kinds of developments would we like to see in terms of hardware and software to achieve greater productivity from the use of microcomputers? In that light, I would like to start off and ask several questions.

Q1: Linda, you seem to be a strong proponent of the IBM microcomputers. Could you explain why?

A1 (Ms. Atkinson): Being with the Federal Government, I am probably supposed to start with some sort of a disclaimer about products; therefore, my comments do not constitute a product endorsement. When we first started acquiring PC's, it was done by our Economic Research Staff, and we had a proliferation of all kinds of machines, including some that are no longer being made. It became clear that not only was this hard to support, but we were going to have problems in being able to move data from one machine to another, and possible communications problems later should we want to network them.
Also, the situation that we are seeing now is that the newer software, such as SPSS PC and SAS, is being developed first for the IBM machines and later, if at all, for other machines. So if you are on these other types of computers, you are going to lose some time during that waiting process. I would say by now that probably about eighty percent of our PC's are IBM or IBM compatibles.

Q2: Brian, I see you seem to be a very strong proponent of the UNIX-based machines, and you give very strong arguments for, well, a good discussion of their relative strengths. We have heard a lot of sales hype about UNIX being the operating system of the future. Is that really going to happen?

A2 (Mr. Carney): AT&T would certainly have us believe UNIX will be the operating system of the future. UNIX offers some special capabilities that you don't get with other operating systems. One in particular is software compatibility across UNIX machines. As an example, CATI software will run on practically anybody's UNIX box, anywhere. So we have acquired a degree of vendor independence right there.

But as far as hardware is concerned, say for example you want to network machines together; at that point you are down to a level where UNIX doesn't really do you a whole lot of good, because the individual vendors choose to implement UNIX differently on different types of hardware. There is some effort to relax those restrictions. Sun Microsystems has a hardware and software implementation of networking that is largely vendor independent. But that's not something, in general, that you can get and walk out with today.

UNIX suffers from being extremely complicated. As I mentioned earlier, you have to have an expert around or available when the machine goes down. Because of the size and complexity of the system, it does take the user quite a while to be productive. In terms of the so-called popular software (Lotus, Symphony, etc.), none of that is available under UNIX. Whether it becomes available depends a lot on what AT&T can pull off with their market. They say it will happen, but we haven't seen it yet.

Q3: Paul, as you look toward implementing your design of the future, having this triad or three levels, do you anticipate making them all one brand or one vendor and one standard software line; or what kind of problem do you envision in the interchange of data between packages?

A3 (Mr. Dobbins): First of all, I would say that we don't have enough experience at the moment to say exactly what it is that we will have a year or so down the road. I personally would look forward to having more flexibility and not essentially going toward one standardized system or one standardized software. We ultimately might want to cut bait with our mainframe. As the minis become more powerful and we have super-micros on the desk, we may begin to use the mainframe for initial processing and then download a data base to one of our PC machines. I am looking forward to experimenting for a while before I will be able to say too much more.

Q4: Rick, you mentioned that you had some problems with output devices. Could you expand on that a little? What problems have you encountered, and what could the vendors do to solve some of these?

A4 (Mr. Hays): What we are looking for is a good vehicle for integrating our text with our pictures or graphics. We haven't found that, and we are still looking. One of the things we find is that there are a tremendous number of products out on the market.
We are a small shop and we are doing our own research. So far we haven't had any luck with text/graphics integration. Once you start experimenting, you find that you can spend a lot of time looking at various packages. The other thing that you will find is that people will suggest different software you can look at and try. Each of these experiments takes time out of your production, so there is a tradeoff between finding something that will work for you and finding the best and latest product. Once you have found something which works for you, I think you should stick with it for a while and find out its complete capabilities. When we talk about output devices, we are having problems with finding print devices that will handle graphics, pie charts, and diagrams, as well as text, at a reasonable speed. I think most of the products presently available have problems with integration of text and graphics. It is not an insurmountable problem, but it slows you down.

Q5: Dick, you expressed several concerns about data security when microcomputers are interfaced with mainframes. Does that mean that you think people shouldn't have microcomputers hooked to the mainframes, or that we should lock everyone's door, or what?

A5 (Mr. Shively): All I was saying on that was that the microcomputer has a problem with security. Mainframe computers are very well adapted to securing data, securing devices, making things pretty secure. Microcomputers, by design, are a productivity tool. If we try to add security beyond what is necessary, we reduce a lot of the productivity gains that are possible there.

Mr. Steele: At this time I would like to open the floor up for general questions. Please stand and identify yourselves by your name and agency before asking the question.

Q6: This is a question that is appropriate to organizations that haven't yet capitalized on current technology and are looking to get into the business. What is the appropriate level of computer power for economic feasibility forecasts and statistical work? I wonder if anybody can comment on the relative merits of waiting until 32-bit microprocessor technology is developed.

A6 (Mr. Hays): As it turns out, we recently had a contractor come through and evaluate our operation in terms of what ought to be our computer configuration in the future. Basically, their conception is that anybody who is working with any sizable data base is not going to get along without a mainframe. Their suggestion is that you use the mainframe for your heavy-duty computations and as a storage device, to deal with some of the security problems we talked about here. They then are suggesting to us, at least, that we look now to microcomputers, 16-bit machines that can network into the mainframe and be used either in a stand-alone capacity for test purposes, to try out graphics, to upload and download, or as remote terminals. I know we are going to maintain our mainframe capabilities while doing as much as we can on micros, because micros are much simpler and more flexible for analysts to use. Where we need to spend time is on gateway architecture that links our micros to the mainframe and allows analysts who are not experts in programming to get in and out of the mainframe and get data in and out of software packages.

A6 (Mr. Carney): In the UNIX-based workstations that I am familiar with, the real 32-bit technology comes on a single chip in a powerful box. I think people are looking toward the Motorola 68020, which should be in production sometime in the fall of 1985.
It takes a little while, not too long, under UNIX for the software to catch up with it. It may be a while before it's really as fully mature as you are looking for right now. You've heard this old song before, but you really want to find what software you want first, and then figure out what sort of boxes you can afford that it will run on. You really are looking for productivity, after all.

Q7: Each of the panelists has described a decentralized system, especially one using spreadsheets and/or spreadsheets with something else. How do you insure the integrity of the data that you are using; and second, how do you insure that whatever statistical standards may exist in your agency are being followed in those decentralized systems?

A7 (Mr. Carney): I can talk about it a little bit in the research environment. Basically, we always have to use the data on the mainframe as the benchmark data. That is the correct data, and anything we pull off has to go through checking protocols, and we can't change those numbers. As far as statistical standards go for the research unit, you pretty much have to depend on the review process -- review by our peers.

A7 (Mr. Shively): I second Brian's statement. We basically consider the mainframe data to be the official source unless some special circumstances exist where there is no need for it to ever be on the mainframe. But for any data that is shared or that is nationwide in scope, the mainframe is the official source of control. If you pull a copy of that to your micro to run it through a model, you are working with data that at that point in time is not the official copy. It may be a copy of the official data, and you can use it for your model or plan -- anything like that -- but if you want to go to publication with it, you need to go back to the mainframe for the official copy.

Q8: This is a comment. We are using more and more graphics lately. There are some packages on the market which were released recently that will capture the picture of the U.S. map -- then you can enhance that map. You can add titles, text, whatever you want to these graphics. The program works in the background -- one is the graphics partner. You can call it up like the Sidekick package. It captures the picture you have on the screen, and you can enhance and modify the picture. You have the integration of text and graphics. I suggest that you try SMART; they have the software package. Also, take a look at GEM, which is just coming out from Digital Research.

Q9: I have a question for Linda Atkinson. I think you mentioned a spreadsheet that ran for forty-eight hours? Did you have an 8087 (mathematical functions) processor?

A9 (Ms. Atkinson): Yes, we did. It is a very large model, and this is the simplified version of it on the micro, and yet it took that long. It's very large and it ran very long; but yes, there was an 8087 chip in there. They are hoping that the AT is going to improve the situation. If not, they may not be able to ultimately move to the micro but will need to use the mainframe as well for that model.

Mr. Steele: I have one anecdote that I would like to share with you about computers and applications of computers. I became involved with microcomputers in 1978. By 1980 I had most of my functions in my office already automated on the computer, and I called up one of my colleagues, who was Secretary of the Crop Reporting Board, to get some information from him on when one of our employees had last been in on the Crop Reporting Board.
He said, hold on just a minute, I need to check my data base, and within thirty seconds he had an answer back for me. He read back all the times that this guy had been in and when he was next scheduled, and I was really impressed. I couldn't believe how quickly he had all of that information. I didn't have a nice data base like that, so I asked him what kind of data base he was using, and he said, "file cards." Certainly, the purpose of telling that anecdote is to illustrate that there are certain applications that are best left to a manual procedure. Oftentimes I encounter people trying to automate procedures that aren't well defined manually. I think that any time people try to automate procedures that aren't well defined manually, they are expecting magic; and they usually end up with a lot less than what they are asking for.

Ms. Atkinson: I would like to make one comment about acquiring statistical software. If any of you are looking for software or a good source of reviews of products that people have tried, there are at least two electronic bulletin boards that I am aware of. The Capital PC users group has a special interest group for statistics. Charlie Hallihan, who is one of the chairs of that group, has left some information on the desk outside which will tell you how to access their bulletin board or attend their meetings. There is also a SAS users group, even though SAS is not yet on the micro. This is a group right now who likes SAS and also uses micros; I guess they like to discuss their applications of getting data back and forth from SAS. They have a bulletin board also. A paper was presented on that at the SAS users' group meeting last month which I think had the phone number of the bulletin board. Otherwise, you can contact me. As I said, there is software available on these bulletin boards that people are willing to share. There is a SAS macro on the SAS users' group bulletin board.

SESSION ON EXPERT SYSTEMS

SESSION SUMMARY*

Both the DATAPLOT and the editing and imputation systems described here were developed not by computer scientists or knowledge engineers but by subject-matter specialists who were presented with new tools to assist them in improving their jobs. Although "expert system" tools and techniques were developed by a community of researchers who happen to call their field "Artificial Intelligence," the tools and techniques can be considered useful in their own right without the necessity of calling the result "Artificial Intelligence" systems. In fact, there is good reason to say that none of the existing expert systems is truly intelligent or even expert. A true expert has the ability to learn new rules in his specialty and to apply common sense reasoning in cases where specific rules don't happen to reside in his "knowledge base." Both "learning" and "common sense reasoning" are areas of artificial intelligence research in which there are only a handful of active workers and in which progress has been slow. Contemporary "expert systems" neither learn nor exhibit common sense behavior when it is warranted. But, as a set of tools and techniques, expert system technology has proved to be useful for some specific applications. We have seen two examples of such applications today in Mr. Filliben's DATAPLOT system and in Mr. Greenberg's edit and imputation software.
I think it is worth noting that these successful expert system examples were done by mathematicians and statisticians, rather than artificial intelligence specialists or even computer scientists. Less than two years ago, there were dire warnings that expert system techniques could not be generally applied because there were so few PhDs being granted to people who had specialized in artificial intelligence research. What we are finding, though, is that the techniques important for developing expert systems can be taught to people in other specialties. In fact, many organizations (including the Digital Equipment Corporation and the IRS), given the choice of training artificial intelligence researchers in application domains or training subject-matter specialists in the tools and techniques of artificial intelligence, have opted for the latter. The Bureau of the Census and the National Bureau of Standards show that good subject-matter specialists are perfectly capable of learning the techniques without any deliberate training program by their agencies.

One of the reasons to have a panel on expert systems at a conference on statistical uses of microcomputers is that such systems as DATAPLOT can be adapted to personal computers as soon as the personal computers are powerful enough to accommodate them. To expand on that theme, please note that today's personal computers are already much more powerful than the "supercomputers" of the early 1960's. We are guaranteed, given what can already be seen in computer engineering laboratories, that the inexpensive personal workstations of the future will be powerful enough to accommodate the kinds of systems that need mainframe computers today.

________________________________
*Norman Glick, National Security Agency

But the existence of that future power for the benefit of an individual will make even more important some of the research that hasn't been discussed today but is part of what the artificial intelligence research community is concerned with. The ability to provide a user model to accompany an expert system addresses some of the points made in the talks and the question period today. Users do have different levels of sophistication and expertise of their own. We would like the system to accommodate to the needs of the user, even to adapt to the changing expertise of a single user. The same person who might need substantial help in using a system for the first few times might ultimately consider verbose assistance to be a nuisance. The ability of a system to adapt to the evolving needs of such a user is a subject of active research in the artificial intelligence community today.

Since it was announced that this session might be on "pure fantasy," and since what we have heard from the Bureau of the Census and the National Bureau of Standards has been on eminently practical systems (whether they are called expert systems or not), perhaps we should end with some speculations that some might consider fantasies. One class of artificial intelligence research that promises to have relevance to statistical systems of the future is automatic programming. Both the editing and imputation and the DATAPLOT systems required that the statisticians and mathematicians write programs. Whether they were intentionally building expert systems, unconsciously building expert systems, or simply writing a program to assist in statistical analysis, they needed to provide significant detail about how the computer should do what they wanted.
If sufficient expertise of the programming art can be captured in an expert system and can be combined with sufficient expertise in a particular domain, even one of the domains we've heard about today, then the combined system might permit a user to state what he wants done, rather than the details of how he wants to do it, and a program to perform the job could be generated automatically. To a modest extent, so-called fourth-generation languages provide existence proofs of such systems today. These fourth-generation systems work in very limited domains (e.g., payroll and inventory control), but there is substantial research aimed at increasing the set of applications for which such approaches are practical. Some even see this class of activity as the future of software engineering. Please note that some differences exist in the life cycle of standard software relative to the life cycle of "artificial intelligence" software.

It is clear that current software engineering techniques will not provide the quantity and quality of software required in the future. More statisticians, mathematicians, and psychologists will need to tell computers what they want done in the future without computer-specialist intermediaries. Let's hope the automatic programming "fantasy" becomes less fanciful so that, in the future, more subject-matter specialists can be their own "knowledge engineers" rather than be dependent on programming specialists. Statisticians shouldn't have to spend inordinate time learning the details of how to use specific computers when their talent is to apply their mathematical and statistical knowledge.

INTRODUCTION

Terry Ireland, National Security Agency

It's possible that the organizers of this workshop, and I was one of them, wanted to have one session on pure fantasy, and this might be it. Building expert systems that model in software the behavior of human experts, and evolve in a natural way so you can more clearly understand the expert, is so clearly impossible that there must be -- and is -- an unlimited amount of high-priced advice on how to do it. Some statisticians may argue that random sampling and surveying procedures on computers can already model the experts, so we really, perhaps, have two questions: What do we mean by "an expert"? If we know what an expert is and if we have one on hand, how do we go about modeling his behavior?

In order to give some more practicality and reasonableness to this presentation, we have made absolutely certain that none of the speakers is a computer scientist. However, they are skilled developers and users of software systems, and they have built expert systems. George Lawton is a psychologist with the Army Research Institute. He has an interest in systems that support the interface between human factors and computer science. He will give the introduction to expert systems. Jim Filliben is a statistician with the National Bureau of Standards who has an interest in systems that model and support statistical expertise. In fact, one of his software systems is said to be the most requested piece of software from NTIS. Brian Greenberg is a mathematician with the Bureau of the Census. He has an interest in expert systems for data editing and imputation. Roughly a year ago I gave a talk on expert systems -- an abstract talk, because my practical knowledge was limited. After the talk, Brian came up and observed that he wasn't sure about the jargon that was being used, but he felt that he had built an expert system.
The Rapporteur is Norman Glick. He and I are both computer scientists, and he may try to have the last word. Mark Winer, an economist from the Office of Management and Budget, is our Discussant, who will keep us honest.

Computer scientists are trying to create tools for the development of expert systems and to make them commercially available. They are also trying to give the impression that they are the most skilled at eliciting information from experts. Thus, the once humble programmer now calls himself a knowledge engineer. Ultimately, a knowledge engineer is a person (sometimes a statistician, psychologist, or mathematician) who makes the most substantial effort to understand and model the expert.

EXPERT SYSTEM TUTORIAL

George Lawton, Army Research Institute

A number of things have happened recently that would lead me to change some of the things I say today, had I the opportunity to do so. Fortunately, some of those things are available to you through your local newsstand. One was the publication last month of a magazine called PC, not to be confused with PC World and PC Junior, which has a section describing a number of proprietary software packages which are available on microcomputers for developing your own expert systems. Anyone who is inclined to go out and build an expert system may look at this article and review the software. The other was my attendance last week at a conference at Bell Labs which brought together a number of computer scientists and statisticians who were all interested in what I am going to be discussing here this afternoon. I was surprised to find that there are at least sixty people from the United Kingdom, the United States, and at least a couple of European countries who are interested in this subject. No less than John Tukey of Princeton University believes that this is the next wave of software for statistical applications. It seems to me that this is something of a coming concern.

Expert systems come out of laboratories for research in artificial intelligence. Artificial intelligence, as I think everybody in the world now knows if he reads the popular press, is a line of research developed at MIT, Carnegie-Mellon University, and Stanford University in particular, concerned with building computing machines which will emulate high-level human cognitive capabilities. In the earliest incarnation of artificial intelligence, primarily at Carnegie-Mellon University, researchers tried to develop very powerful general-purpose problem-solving algorithms which would give a user appropriate support in tackling a problem that a human expert could solve. Those programs were largely failures. As a consequence of those failures, research in artificial intelligence has converged on one organizing theme in the past decade: to be intelligent, computer programs have to be able to access large bodies of knowledge. It isn't their deep problem-solving capabilities that make people intelligent; it's the fact that they know a lot about the world in which they operate, and so it must be for programs. An outgrowth of that discovery is a type of computer program which is essentially nothing but a large collection of knowledge and a relatively simple mechanism for accessing that knowledge and using it to solve various problems. These programs are called Expert Systems.
I intend to talk about what expert systems are, about how their software is written, about the software techniques that expert-system developers use, and finally about what statisticians might want to do with expert systems.

ARCHITECTURE BASED ON THREE SEPARATE MODULES
1. KNOWLEDGE BASE
2. INFERENCE MECHANISM
3. USER INTERFACE
Display 22.

Following good programming practice, expert systems are modular, and they usually have at least three fundamental modules. The first is a collection of facts and rules in a knowledge base; the second is a relatively simple program evaluator we call an inference mechanism. The third is the interface with the user, which gives the user the illusion that he is dealing with something that's intelligent (see Display 22). That's what makes the machine capable of passing what we call the Turing Test. This test is very simple. If a person acting as judge asking questions cannot tell the difference between a machine and a human any more frequently than he can tell the difference between a man and a woman, then the machine must be considered intelligent. Of course certain rules prevent the obvious shortcuts (for example, a terminal should be used to ask and to answer the questions). And, of course, the respondents (woman, man, machine) are not required to tell the truth.

In the knowledge base we have some knowledge, and it has to be represented in some form that can be used by the computer. Almost every expert system that I have had occasion to study uses one basic knowledge representation (possibly supplemented by some others): some kind of conditional structure we call production rules. The fundamental idea of a production rule is strikingly simple. It has two parts. The first asks if something -- some state of the world -- is true. The second part takes some specified action if that something is true. Programmers know them as if-then constructs in programming languages. In conventional programs they are usually scattered throughout the program text. In a knowledge base they are collected together into a list of rules.

1. Production Rules
There are two classes of production rules.
A. Situation-Action Rules (which are essentially data-driven procedure invocations)
   e.g., if the data are skewed, then call a re-expression procedure.
B. Conditional Assertions
   e.g., if the case has V1 = 10 and V2 = 20, then it is an outlier.
Display 23.

Production rules really break down into two different classes. One of them is something we call a Situation-Action Rule. It's essentially a pattern-invoked program. The other is really a conditional assertion (see Display 23). Most expert systems are based on one of those two kinds of production-rule systems.

There are at least two other ways of structuring knowledge that are widely used in expert systems.

Conceptual Networks. First, you may have a large number of concepts to represent in the knowledge base, and the concepts might be related to each other (for example, a class-instance relationship or a set-subset relationship). There are species of animals, and each animal species has members. It doesn't make any sense to represent all of the features of each of those members, so they may be organized into conceptual, often hierarchical, networks which show these relationships.

Frames and other structured objects. This is really an extension of the idea of a record, with the record containing not only the usual type of data, e.g., name or age, but also other records as data. Moreover, the data can be specified as a computation: e.g., if you know the person's year of birth and you know the year, you can compute age without storing its actual value. Frames also contain default information to be used or computed when the required data is missing.
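To make the idea of rules-as-data concrete, here is one way the two rules of Display 23 might be written down in a knowledge base. This is only an illustrative sketch in Lisp; the list format and the accessor names are invented for this tutorial, and real systems each use their own notation:

    ;; Illustrative sketch: the Display 23 rules stored as data, not as
    ;; code scattered through a program.  The format is invented here.
    (defparameter *kb-rules*
      '((skew-check                           ; situation-action rule
         (data-skewed ?variable)              ; IF part: a pattern to match
         (call re-expression ?variable))      ; THEN part: invoke a procedure
        (outlier-check                        ; conditional assertion
         (and (= v1 10) (= v2 20))            ; IF part
         (assert-fact (outlier this-case))))) ; THEN part: add a new fact

    (defun rule-name (rule) (first rule))
    (defun rule-if   (rule) (second rule))
    (defun rule-then (rule) (third rule))

Because the rules are ordinary list structure, an inference mechanism of the kind described next can examine them, fire them, and report which ones it used.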
1. Forward Chaining
   e.g., If P then Q; if Q then R; if R then T; P; therefore T.
   Explanation: By P infer Q; by Q infer R; by R infer T.
2. Backward Chaining
   e.g., T if R; R if Q; Q if P; P; therefore T.
   Explanation: To show T, first show R; to show R, first show Q; to show Q, first show P; P is true.
Display 24.

Inference mechanisms. Because we are talking primarily about knowledge that is represented as conditional structures, we must have some logical reasoning process to make use of them. For example, we can use some of the basic rules of logic: if we know that Proposition P is true, and if we know that Proposition P being true implies Proposition Q is true, then we can conclude from these two true statements (one a fact, the other a general reasoning procedure) that Proposition Q is true. Or, we can reverse the process, starting with a goal to show Proposition Q is true and reasoning backwards, looking for a sequence of statements that could bring us to the desired conclusion. Again, most expert systems use some variation of the first and second forms of reasoning shown on Display 24.

Propagation of uncertainties, statistical reasoning. A third form of reasoning, which we will come back to in a minute, attaches to each conditional structure something called a certainty factor. Statisticians might think that the certainty factor might be a probability, because some vary between zero and one, or maybe a correlation, because others vary between minus one and plus one. In general, they are not that well motivated -- they are ad hoc. They are just numbers that somebody pulled out of a hat, saying this is how certain I am that this fact is true. The question is, how do they get propagated through a sequence of inferences? This is a difficult problem about which there has been much recent discussion.

Inheritance of Properties: Again, we can represent objects in terms of a network of class relationships. By default, certain things may inherit properties from their class. If Fido is a dog, then Fido is warm-blooded, because Fido is a dog and dogs are warm-blooded.

Heuristic Rules: In certain cases, no straightforward and logical procedure may apply, in which case you may apply what we call a Risk It Rule. It's a rule of thumb that says, "if we don't know any better, do this."

Meta-reasoning: Last, but not least, an expert system may have rules about rules, a form of reasoning about process often called meta-reasoning. It deals with reasoning about the representation of the problem.

All of these methods have been incorporated into the handling of expert systems.

USER INTERFACE
1. Read user input.
2. Provide user with useful output.
3. Explanation facility, to give the user a useful trace of the program's inferences.
4. Knowledge-base input and editing.
Display 25.

This is the most important part of an expert system. The distinguishing feature of an interface for an expert system -- a well-designed interface -- is Item 3 (see Display 25). Expert systems, unlike other computer programs, explain the conclusions they reach. I would say there is no other really necessary feature for an expert system than this ability to explain. Previously, I showed several logical forms for the reasoning process.
They provide examples of what an explanation might look like. There you have a crude expert system. Such explanations are relatively unfriendly. They just say "by this rule, I infer this; and by that rule, I infer that," and so on, until you get to a conclusion. Other systems are much better at knowledge representations, including diagrammatic representations of good inference and explanations in good English.

Traditional Software Engineering -- Linear Program Development
1. System requirements
2. Software requirements
3. Preliminary design
4. Detailed design
5. Code
6. Debug
7. Test
8. Use
9. Maintain
Display 26.

How do you go about building an expert system? The methodology that most people use is a little different from the methodology you might have learned in a basic programming or a software engineering class. Display 26 shows the steps that a conventional programming text might tell you to follow when building software. By the way, expert systems are just enlarged software systems.

Alternative Approach -- Cycles of Progressive Refinement
Preliminary Requirements
Preliminary Knowledge Engineering
Prototype I
1. Design
2. Coding
3. Debugging
4. Testing
Prototype II
1. etc., etc., etc.
Display 27.

This is an alternative list that is used by most people who have developed expert systems. Rather than starting from the specifications and going step by step through the program development, requirements analysis, and the rest to a final software product, expert-systems developers follow an iterative process which begins with a small program that is written and tested, then elaborated, and written and tested again. In fact, the programming language used in this development methodology was designed to support the iterative and experimental development of software. The ability to express your ideas in a high-level, flexible language enables the programmer to develop rapid prototypes or models of the system he wishes to define.

LISP is the language of American artificial intelligence research. Notice that I said research. It's the language of artificial intelligence research. That may not mean that it's the best language for artificial intelligence implementation. It is a functional programming language. That means that programs written in LISP are functions that can be passed around just as ideas are passed around and used where appropriate. They are more like mathematical functions in the sense that they have the mathematical properties that you associate with a function, rather than the properties that you would associate with a FORTRAN function. That's a formal statement that I don't really want to defend any further than to say it really is true.

Four Basic Components at the LISP Top Level
(print (eval (read)))
The first three are in the top-level loop of the LISP interpreter.
1. Reader
2. Evaluator
3. Printer
4. A table of LISP objects which serves as a data base.
Display 28.

LISP provides a series of operations that you would have to make for yourselves if you were going to write a program in something like FORTRAN. When you invoke LISP on a computer, you are invoking an endless repetitive loop which looks like this. That is a top-level interactive computing environment from which you can either ask for a computation or define a new function, very much as you would interact with another person. Sometimes you are computing values or ascertaining facts; other times you are developing new ideas. Both activities are done in the same environment.
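Spelled out as code, the (print (eval (read))) idea of Display 28 amounts to no more than the following. This is a schematic sketch only, not a real implementation of a LISP top level:

    ;; A schematic of the LISP top level described above -- not a real
    ;; implementation, just the (print (eval (read))) loop spelled out.
    (defun toy-top-level ()
      (loop
        (format t "~&> ")
        (print (eval (read)))))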
The innermost expression is ready and waiting for you to type something into the terminal which it will read. Then there is the high-level functional evaluator which knows how to evaluate any well-formed expression in LISP. Then it will print out the results of that. It continues to go through this loop. It is like a conversation (see Display 28). -75- Invoked by the functions in this loop are three programs: the LISP reader which includes both the ability to scan the characters entered and to format them into a LISP expression, the LISP functional evaluator which can evaluate or compute the expression entered, and the LISP printer which knows how to format and print the results of the evaluation. Moreover, the LISP reader also stores information about the names or identifiers read in. Most of this information is stored in a table that holds rules, values and names. This table is useful as the data base. It is more than a simple table produced, for example, by FORTRAN. Why would you want to use LISP? Because LISP is interactive, you can write a program and see how it works almost immediately. Compilation is unnecessary. Its modularity enables you to write small segments of code in the form of functions, checking each one as it is written. LISP doesn't require declarations, although good programming practice suggests they be included in the final product. This enables you to develop functions quickly for experimental use. LISP dynamically allocates whatever kind of data structure you want to use. That means when you call in a function, LISP will make it immediately available to you. This means LISP handles all storage allocation for you, allocating it when needed, cleaning it up when you are finished with it. Because there are no differences in LISP between programs and data structures, LISP can be represented as lists just as if it were another data structure. Therefore, it is easy for LISP to reason about or deal with its own functions just as humans examine their own procedures. As, a consequence, LISP provides sophisticated tracing and debugging capabilities. PROLOG Prolog is based on a general-purpose pattern matching and inference or theorem-proving mechanism called unification resolution. It is based on a formalism symbolic logic called first order predicate calculus. Basic Components: 1. Read and Print 2. Procedure evaluation based on unification-resolution, implemented as backtracking search. 3. A data base which contains the definitions of procedures and the facts needed by the program. Display 29. PROLOG is a sort of second-class language for artificial intelligence research in the United States, but it is gaining adherents (see Display 29) It has a big following in England and in Europe. PROLOG stands for PRogramming in LOGic. The idea is that PROLOG is a language based on a subset of first-order predicate calculus called Horn clauses. It makes use of inference procedures used in proving the correctness of logical -76- statements. It allows you to write computer programs simply by making statements about what you want to be true in a clause form. PROLOG provides many programming structures you would otherwise have to build. It can read program goals or procedures and print out the results for you. It's interpretive and it gives you all of the facilities of allocation of storage and reformation of unused storage in a manner similar to the support environment in LISP. You do not need LISP or PROLOG to build an expert system--but it sure helps. 
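The shape of the top-level loop in Display 28 can be mimicked in a few lines. The sketch below is in Python rather than LISP; the small symbol table standing in for "a table of LISP objects which serves as a data base" and the use of Python's own evaluator are illustrative assumptions, not a description of a real LISP interpreter.

    # A minimal sketch of the top-level loop in Display 28, written in
    # Python rather than LISP: read, evaluate, print, over and over.

    symbols = {"pi": 3.14159}          # the data base of named objects

    def reader():
        return input("> ")             # 1. Reader: get an expression from the terminal

    def evaluator(expression):
        # 2. Evaluator: compute the expression; names like pi are looked
        #    up in the symbol table.
        return eval(expression, {"__builtins__": {}}, symbols)

    def printer(value):
        print(value)                   # 3. Printer: format and show the result

    if __name__ == "__main__":
        while True:                    # the endless repetitive loop
            try:
                printer(evaluator(reader()))
            except (EOFError, KeyboardInterrupt):
                break
            except Exception as error:  # a bad expression should not end the session
                print("error:", error)

Typing pi * 2 at the prompt prints 6.28318; LISP's top level behaves the same way, except that what you type are LISP expressions and what is remembered between interactions are LISP objects, functions included.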
The two speakers who are immediately following me are going to talk about expert systems they built in FORTRAN. What's important is to identify and to make use of the distinguishing features of expert systems: the abstractions and program structure. How do we use expert systems in statistics? This is a review of suggestions discussed at the conference on Expert Systems and Statistics that I attended last week. First of all, I don't know how many of you are statisticians; but if you are, your knowledge, as distinct from the knowledge of laypeople who come to you for consultation or as distinct from pure subject- matter knowledge (for example, economics) consists of a collection of strategies for working with a set of data. And those strategies could probably be readily represented in the form of an expert system which knows what test to do next. Parts of the data need to be cleaned up. One of the most active areas I of research that I have found is the specification of individual statistical strategies in the form of expert systems, either as interfaces to existing statistical acronyms like S or SPSS or SAS, or in the form of a complete statistical system. Nobody wants to suggest that you can improve upon the capabilities of these statistical packages. What you may want to suggest is that it may be possible to improve their usability by adding something between the user and the package. Another area of research concerns reasoning with uncertainty. Statisticians have something to say about that. I mentioned the adding of certainty factors to knowledge bases in most expert systems. An active area of research is to determine how those certainty factors should be propagated through a rule system and how certain conclusions can be based on uncertain knowledge. Existing statistical software deals only with statistical ideas at the lowest level. It provides code to do things like least-squares fitting and so on. We want to use the ideas of expert systems to move software up one more level to deal with the abstract ideas of statistical strategy--the choosing of statistical methods and selective analysis of data in light of these methods. Our success in this area depends on the development and use of modern programming languages and on the development and use of expert system models. -77- AN EXTENSION OF STATISTICAL SOFTWARE TO EXPERT SYSTEMS James J. Filliben, National Bureau of Standards The outline for this talk falls into four general areas. We are going to be talking about the real relation of an expert system to a particular piece of software, namely, DATAPLOT. I am going to speak first about DATAPLOT to show how the expert system can be described with respect to, DATAPLOT. We are going to be talking about the general structure for the intelligent subsystem -- expert subsystem -- and DATAPLOT and go into the interpretation of conclusions mode and the analysis guideline mode. The last mode deals with providing a guide for carrying out data analysis. We will go through a particular data problem. DATAPLOT is a high-level interactive statistical system with its own language, a high-level language with English-like commands. It was designed at the National Bureau of Standards (NBS) in 1977. The National Technical Information Service (NTIS), has been distributing it for the last three years. The software is written in FORTRAN. The cost is $1200. It is the most heavily distributed software of its type at NTIS. , It has been installed at about 200 sites. 
Next year it will be the most heavily distributed piece of software, period. Its primary capability is graphics. That means it can run on Tektronix, HP and various other graphics terminal devices and on a variety of mainframes. It has both analysis graphics and presentation graphics. There are extensive additional capabilities in graphical data analysis and nongraphical data analysis, modeling and fitting, mathematics and diagram graphics. At the National Bureau of Standards (NBS) we are interested in modeling and fitting data. In particular, we are often interested in fitting nonlinear models. Moreover, we make extensive use of applied mathematics and diagram graphics. By diagram graphics I mean the construction of schematic diagrams. The NBS is an engineering and scientific research organization. We have people who like to make schematics. We spend our time making schematic diagrams and charts. This component of DATAPLOT supports the automation of that work. There is a heavy emphasis on data fitting in order to test underlying assumptions. The graphical displays are important because they provide insight into the underlying structure. Insight is important if you must go into court, for example, and defend your understanding of mechanisms at work in the data you have analyzed. Three notable cases that have arisen in our area are the analysis of the draft lottery, the argument over the use of daylight-saving time a couple of years ago, and data concerning the Alaska Pipeline. Graphical analysis was a critical component in those projects.

Display 30 shows the structure of DATAPLOT. There is a data analysis activity on one side and a mathematics activity on the other side. Three activities common to both are plotting, fitting, and various transformations and function evaluations.

-78-

[GRAPHIC] \WP1479.GIF

Display 30.

Display 31 shows the typical commands you can issue to DATAPLOT. They support plotting (commands 1, 2 and 4), fitting (commands 3 and 5), and function evaluation (command 6).

TYPICAL COMMANDS
1. PLOT X Y
2. PLOT EXP(-X**2) FOR X = -3 .1 3
3. FIT Y = A+B*EXP(-ALPHA*X)
4. BOX PLOT Y X
5. ANOVA Y X1 X2 X3
6. LET A = ROOTS SIN(X**2)+EXP(-X) FOR X = 0 TO 5

Display 31.

Displays 32 and 33 give examples of the display capabilities of DATAPLOT. All the graphics shown can be generated with any sort of system. Whether you have Tektronix, PLOT 10, IGL or any graphics terminal, the important question is how long it takes to generate the graphical display. If it takes more than thirty seconds to a minute to do so, we lose the continuity so important to human-machine interaction. In data analysis the only concern is finding underlying structure and getting insight. When generating graphics gets in the way of the objective of finding underlying structure in data, we lose control of the analysis. Thus, the utility of graphics software is measured not in what it can do per se, but rather in how easy it is to do it -- how easy it is to understand, write, modify and communicate the instructions.

The DATAPLOT Intelligence Subsystem is an augmentation of the current system to provide information and guidance as if a statistical consultant were present during an analysis. Basically we want to provide an expert subsystem that asks the right questions as we step through an analysis. In order to get insight -- more than answers -- asking the right questions is just as important as coming to the right conclusions.
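For readers who want a point of reference, command 3 in Display 31 is a one-line request for a nonlinear least-squares fit. The sketch below shows the same fit written out by hand in Python with SciPy; it is not DATAPLOT code, and the data, starting values, and noise level are made up for the illustration.

    # A sketch of what DATAPLOT's one-line command "FIT Y = A+B*EXP(-ALPHA*X)"
    # is asking for: a nonlinear least-squares fit. Not DATAPLOT code; the
    # data, starting values, and use of SciPy are illustrative assumptions.
    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b, alpha):
        return a + b * np.exp(-alpha * x)

    # Made-up observations roughly following y = 1 + 2*exp(-0.5*x) plus noise.
    x = np.linspace(0.0, 10.0, 50)
    rng = np.random.default_rng(0)
    y = 1.0 + 2.0 * np.exp(-0.5 * x) + rng.normal(0.0, 0.05, size=x.size)

    # Estimate A, B, ALPHA; p0 gives the starting values for the iteration.
    params, covariance = curve_fit(model, x, y, p0=(1.0, 1.0, 1.0))
    a, b, alpha = params
    print(f"A = {a:.3f}, B = {b:.3f}, ALPHA = {alpha:.3f}")

    residuals = y - model(x, *params)   # residuals for checking the assumptions
    print("residual standard deviation:", residuals.std(ddof=3).round(3))

The point of the comparison is not the fit itself but the economy of the command language: one English-like line replaces the boilerplate above, which is exactly the property argued here to matter for keeping an interactive analysis moving. The intelligent subsystem described next builds on that same command language.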
The expert system interacts with the analyst, setting the pace and posing questions along the way: have you checked this, have you checked that, what does such and such a plot look like? It will look like such and such so perhaps you should go in this direction, that direction, etc. Display 34 shows some of the human-machine interaction problems that must be addressed in an expert system. If the user requests an operation like BOXPLOT, he should be able to see a one-line definition and the rationale for its use. In other words, if the expert system recommends a certain course of action, the analyst should be able to ask questions like, "What is the penalty if we don't follow this?. -80- [GRAPHIC] \WP1481.GIF [GRAPHIC] \WP1482.GIF HUMAN PROBLEMS (DESIGN GOALS) DEFINITIONS RATIONALE LINKING TOOLS KEY TESTS HYPOTHESES/CONCLUSIONS VARYING EXPERIENCE Display 34. The expert system should support the linking together of statistical analysis tools, often in unexpected ways. Data analysis is primarily sequential and interactive. We step through the data, step through the analysis; and at each step, the next step is dictated by what we have seen before that. Scientist's often deal with correlation plots to see if there is any correlation structure in the data. DATAPLOT supports a correlation- plot command and many other graphic commands, and analysis. However, if someone asks for a correlation plot, the expert system should assist the analytic effort by carrying out appropriate statistical tests behind the scenes. Another important but time-consuming aspect of data analysis (especially when you are writing research papers for the general science community) is the need to frame your hypothesis and conclusions in proper statistical terms. An expert system should support this formulation. We have found that to be very helpful to the average scientist and engineer. Every paper that goes out of NBS goes through our statistics review process to guarantee that hypotheses and conclusions have been properly stated in statistical terms. The last aspect is the varying experience. Any expert system is going to have a problem dealing with different kinds of methods. No one expert system is going to be ideal because users have various degrees of experience. That's a very sticky problem. A tough problem. You don't want the expert system to be so simple minded that an experienced analyst must go through 20 menus just to carry out a legal analysis. On the other hand, someone with limited experience needs the extensive guidance that 20 menus would give. Display 35 shows the general content of the expert system component of DATAPLOT. -83- SUBSYSTEM OUTPUT (THE EXPERT SYSTEM) SEQUENCE OF MENUS DATAPLOT COMMANDS CAUTIONS/CONCERNS MENU EXPLANATIONS ADDITIONAL TESTS RIGOROUS STATISTICAL CONCLUSIONS Display 35. Sequence of menus: Each menu should have guidelines at the bottom of the menu explaining not only the current menu but offering suggestions as to which menus to, select next for specific analyses and why. These suggestions can include specific DATAPLOT commands. Moreover, within the menu environment the cautions and concerns about the form of the analysis should be displayed clearly (e.g., a caution about the data not following a Normal distribution). The user should always have access to HELP functions for each menu These menu explanations should include a description of where the particular menu fits within the entire collection of menus. 
Additional tests: I mentioned the idea of performing statistical computations behind the scene. Although the analyst may be unaware of their specifics, he may want to make use of their results at a later time. The expert system is aware of this and can provide them. DATAPLOT thus has 2 expert subsystems: A consultant-style expert system which offers expert guidance for thoroughly and rigorously carrying out a data analysis; and a data-interpretive expert system which chooses a test, applies the test to the data, interprets the output, and formulates a rigorous statistical conclusion (couched in proper statistical terminology). The remaining displays provide some idea of the analyst's interaction with the expert system component of DATAPLOT. As you can see the need for a great variety of interactions in a large expert system requires a lot of thought and a large comprehensive software system. If any of you want to see DATAPLOT in operation, we are out in Gaithersburg, and we will be glad to come out and demonstrate it locally. -84- References Filliben, J. J. and Fong, J. T. (1984), "DATAPLOT as an Expert System for Data Analysis," available from American Society of Mechanical Engineers, June, 1984. Hahn, Gerald J. (1985), "More intelligent Statistical Software and Statistical Expert Systems: Future Directions," The American Statistician February, 1985. EDITING AND IMPUTATION Brian Greenberg, Bureau of the Census In talking today about an application of expert-system methods to data editing and imputation, it will be the first time that I use the words "expert system" in describing the edit and imputation program we have developed -- SPEER (Structured Program for Economic Editing and Referrals). In the past, the focus was more on describing the underlying methodology and discussing what the edit and imputation system could do for users. While preparing notes for this talk, I found that the emphasis was less on SPEER itself and more on editing and imputation as an expert system in principle. When work started on our project to develop edit and imputation software we had no intention of building an expert system. The goal was to develop techniques that corrected - survey and census response data and imputed for missing values. Looking back, one can see that as work on, this project proceeded, an expert system was evolving; and in the talk I will describe some of the steps in the development of this system. The purpose of editing and imputation is two-fold. First, if a respondent form is received and some responses are blank (item non- response), one tries to fill in missing values in order to create a complete data record for tabulation purposes. In addition, one wants to detect-erroneous responses and correct them. For example, a response may indicate a fifty-acre farm with five million bushels of wheat or a twelve-year-old grandfather. Such problems do occur in the response data; they can be data entry errors or errors at the source. Which field does one adjust and what value should replace those selected for change? When confronted with large data sets such as one has in the Census Bureau and many other Federal agencies, an automated system is a necessity. For surveys dealing with similar types of data, one would like to have general programs to avoid continually having to reinvent the wheel. 
On the other hand, it is desirable that an edit and imputation program incorporate as much survey-specific information as is available, and one would like the survey-specific information to be exercised through a family of rules. In addition, one usually would like a mathematical model to ensure that rules are applied consistently and to assist in selecting from among rules. In particular, one wants to blend survey-specific information and mathematical procedures within a coherent framework. The expert-system model is a natural structure for this type of program.

-85-

FUNCTIONS IN AN EDIT AND IMPUTATION SYSTEM
EDIT CHECKING
ERROR LOCALIZATION
IMPUTATION

Display 36.

What are the functions in an editing and imputation system? (See Display 36.) The first is edit checking. Edits are rules that detect prohibited response combinations, and it is easy to check when an edit fails, that is, when a prohibited combination is encountered. Given an edit-failing record, one endeavors to change as few responses as possible in order to make the remaining responses consistent. Determining which fields to change is called error localization. Finally, one wants to impute in order to allocate values for non-response and to replace responses deleted during the error localization process.

DESIGNING AN EXPERT SYSTEM FOR EDIT AND IMPUTATION
UNDERSTANDING THEORETICAL ASPECTS OF EDITING AND IMPUTATION
UNDERSTANDING FACETS AND NATURE OF SUBJECT-MATTER EXPERTISE

Display 37.

In designing edit and imputation software along the lines of an expert system, each function described above should be structured in its own module (see Display 37). In a general system one wants to enter as parameters the information that will be requested of all users. Survey-specific information, particularly decision rules, can be entered in specified, well-defined places throughout the program. These rules will be different for each user. SPEER has been employed on six segments of the 1982 Economic Censuses. The edit-checking routine never changed from user to user, and the error-localization subroutine was always the same. The imputation rules, however, varied in each application.

How does one impute? In general, one must rely on those with expertise about the particular survey vehicle. One works with the subject-matter specialists to elicit well-defined decision rules based on their knowledge and experience. What does one have to do in designing edit and imputation programs along the lines of an expert system? First, one must understand the nature and the facets of the subject-matter expertise: What do the experts know? Their experience concerning the survey vehicle is extensive; it is often based on experience in the analysis of response forms and familiarity with respondents. They are knowledgeable about the survey target population, the survey form itself, and often the source of errors or non-response in data.

-86-

As a matter of fact, for some kinds of missing data, the survey specialist can tell you why it's missing. For example, it may be known by people working on a survey that when a certain field is blank, the respondent means zero -- they just routinely skip the question. Other blank fields will never be zero. The respondent either did not know the answer to that question or did not want to reveal it, and so that data field was left blank. Knowledge of this sort is certainly survey-specific.
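A minimal sketch of how such survey-specific knowledge might be written down as data rather than buried in code follows. The field names and the rules themselves are hypothetical examples in Python, not SPEER's actual rule base.

    # A sketch of survey-specific knowledge recorded as data. The field
    # names and rules are hypothetical, not SPEER's actual rule base.
    #
    # Interpretation of a blank response, by field:
    #   "zero"    -- respondents routinely skip the item when the answer is zero
    #   "missing" -- a blank is genuine nonresponse and must be imputed later
    BLANK_RULES = {
        "NUMBER_OF_SEASONAL_EMPLOYEES": "zero",
        "ANNUAL_PAYROLL": "missing",
    }

    def resolve_blanks(record):
        """Apply the survey-specific blank rules; return fields still to impute."""
        still_missing = []
        for field, rule in BLANK_RULES.items():
            if record.get(field) is None:
                if rule == "zero":
                    record[field] = 0            # blank means the respondent meant zero
                else:
                    still_missing.append(field)  # leave for the imputation module
        return still_missing

    record = {"NUMBER_OF_SEASONAL_EMPLOYEES": None, "ANNUAL_PAYROLL": None}
    print(resolve_blanks(record), record)
    # ['ANNUAL_PAYROLL'] {'NUMBER_OF_SEASONAL_EMPLOYEES': 0, 'ANNUAL_PAYROLL': None}

Because the rules sit in one well-defined place, a different survey can supply a different table without touching the routine that applies it, which is the modularity emphasized here.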
Knowledge of that sort cannot be gleaned through standard analysis of reported data, nor are there usually auxiliary data sets available to design models of "missingness." The subject-matter specialist, however, is a source of information that can be profitably utilized. Statistically derived procedures (such as appropriate model-based imputation techniques) can be viewed and utilized as survey-specific decision rules.

In addition to subject-matter expertise, one must incorporate appropriate editing models. In SPEER, the error-localization process is basically a set-covering problem -- a mathematical model. One utilizes linear analysis and graph theory to select fields to delete on edit-failing records. Once these fields are deleted, the remaining fields will be mutually consistent, and then one can begin to impute. The process of imputation uses survey-specific rules provided by subject-matter experts. The knowledge base of decision rules can be organized within coherent imputation modules through which they can be applied. That is, the system goes back and forth between the subject-matter information and the mathematical model.

Mathematical techniques and subject-based imputation rules are two components that one should have in an overall edit and imputation system. Thinking of it that way, the mathematical procedure and the subject-matter rules can be treated as separate. One can extend the mathematical methods and revise the flow of the system as a whole, unencumbered by survey-specific considerations. The survey-specific rules can be examined in their own right, updated and revised as needed, independent of the programs through which they are applied. On the other hand, the mathematical procedures and decision rules are integrated. The mathematical constructions provide a framework to assist in choosing the most appropriate decision rule and to ensure that the value imputed will pass all applicable edits. Thus, an expert system for imputation should do more than provide a vehicle for accessing expert rules. It should also provide a mathematical framework to help decide from among the rules, choosing only rules which are valid for the record under consideration.

SPEER
CONTINUOUS (ECONOMIC) DATA UNDER RATIO EDITS (A(1), ..., A(N))
TYPICAL RATIO EDIT: L(ij) < A(i)/A(j) < U(ij)

Display 38.

-87-

SPEER (Display 38) is an edit and imputation system designed along the lines of an expert system. SPEER was designed for economic data such as wages, assets, inventories, etc. The typical edit is a ratio of two fields, called a ratio edit. The total salaries paid to employees divided by the total number of employees should be within some reasonable range consistent with our knowledge of the industry and occupation. The amount of crop yield divided by the number of acres should be in a certain range. Ratio range checks are a very common edit in economic surveys.

Given that a family of ratio edits is failed by a response record, one must select a set of fields to delete. We illustrate the workings of the error-localization mechanism of SPEER on two examples below. Let the circled numbers in Display 39 represent response fields and the edges in the graph represent edit failures between the adjoining fields. For example, the value in field 6 is inconsistent with fields 1 through 5 as determined by the collection of edit rules. If we delete the value in field 6 -- that is, remove node 6 from our graph -- all edges vanish.
Thus, the remaining fields are mutually consistent: because there are no edges connecting the corresponding nodes, there are no edit failures between them. That is, we can delete a single field to eliminate all edit failures. The lower graph is a little more complicated. One can see, however, that by deleting the values in fields 2 and 3, the remaining fields are mutually consistent. These simple examples capture the spirit of the error-localization methods built into SPEER -- a little graph theory is used to find the minimal subset of field values to delete.

After error localization one has a collection of blank fields (some due to non-response, others because fields were deleted during the error localization process). The remaining fields are consistent with one another, and they must be consistent with the values imputed. The program sets up a series of range specifications for a blank field, taking into account the value of each valid field.

-88-

[GRAPHIC] \WP1489.GIF

Display 39.

If A(n) is missing, and A(j) is consistent for all j ≠ n, then L(nj) < A(n)/A(j) < U(nj), so L(nj)A(j) < A(n) < U(nj)A(j).

Display 40.

Every valid imputation for a missing field (field n in Display 40) must lie in the overlap of the regions determined by each fixed field on the record in order to be consistent with every other field. Once the feasible region for a missing field is computed, the program reaches into the imputation module for the value to be imputed. The first applicable rule is selected and an imputation is derived based on this rule. If the derived value falls in the feasible region, it is accepted as a valid imputation. If not, the second rule is accessed, an imputation value is derived, etc. The value ultimately selected as the imputation will be chosen based on subject-matter-based rules and will also be consistent with all other fields on the record under review, because it is forced to lie in the feasible region.

This may be a good time to provide an example of what a rule sequence might look like. Suppose one is to impute for a field such as Annual Payroll (APR) on an economic census or survey. For concreteness, let us couch our discussion in terms of the 1982 Economic Censuses. The first rule might be to derive an imputation based on the 1982 Administrative Data value for APR. If the value derived does not lie in the feasible region, one might try the 1981 Administrative Data value for APR. If this value is not suitable, we pass to a third option, etc., until a valid imputation is derived.

Some imputation rules can be extremely field-specific. For example, suppose some field is to be reported in tons. Assume that the feasible region allows valid responses to be between 500 and 1,000 tons and the value 1,800,000 was reported and deleted as an error. The applicable option might be to divide the reported value by 2,000 (subject-based information that respondents sometimes report in pounds rather than tons). In this example, we would derive 900 tons and, observing that this value is feasible, accept it as the valid imputation. A common error in reporting economic data is that respondents provide answers in units rather than in thousands as per instructions. For fields in which this error may occur, the first rule (when appropriate) is to divide the reported response by 1,000.

The editing and imputation for the 1982 Enterprise Summary Report and the 1982 Auxiliary Establishment Report (both portions of the 1982 Economic Censuses) was performed using SPEER.
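The following is a compressed sketch, in Python, of the mechanics just described: ratio edits, error localization on the edit-failure graph, a feasible region for each blank field, and an ordered walk through imputation options. It is not SPEER itself -- SPEER poses error localization as a set-covering problem solved with linear analysis and graph theory, while the sketch simply deletes the field involved in the most failed edits -- and all bounds, field names, and candidate values are made up.

    # A compressed, illustrative sketch of SPEER-style processing.
    from itertools import combinations

    FIELDS = ["APR", "EMP", "INV"]          # annual payroll, employees, inventories
    BOUNDS = {                              # L(i,j) < A(i)/A(j) < U(i,j)
        ("APR", "EMP"): (10.0, 40.0),
        ("APR", "INV"): (0.5, 5.0),
        ("EMP", "INV"): (0.01, 0.5),
    }

    def failures(record):
        """Edges of the edit-failure graph: pairs of fields whose ratio edit fails."""
        bad = []
        for i, j in combinations(FIELDS, 2):
            if record.get(i) is None or record.get(j) is None:
                continue
            low, high = BOUNDS[(i, j)]
            if not (low < record[i] / record[j] < high):
                bad.append((i, j))
        return bad

    def localize(record):
        """Greedy error localization: blank out the field in the most failed edits."""
        while failures(record):
            edges = failures(record)
            counts = {f: sum(f in e for e in edges) for f in FIELDS}
            record[max(counts, key=counts.get)] = None
        return record

    def feasible_region(record, field):
        """Intersection of L(n,j)*A(j) < A(n) < U(n,j)*A(j) over the valid fields j."""
        low, high = 0.0, float("inf")
        for (i, j), (l, u) in BOUNDS.items():
            if i == field and record.get(j) is not None:
                low, high = max(low, l * record[j]), min(high, u * record[j])
            elif j == field and record.get(i) is not None:
                low, high = max(low, record[i] / u), min(high, record[i] / l)
        return low, high

    def impute(record, field, options):
        """Accept the first candidate option that falls inside the feasible region."""
        low, high = feasible_region(record, field)
        for label, value in options:
            if value is not None and low < value < high:
                record[field] = value
                return label, value
        return None, None

    record = {"APR": 500_000, "EMP": 200, "INV": 1_000}   # APR fails both of its edits
    localize(record)                                      # deletes APR
    options = [("1982 administrative data", 6_000),       # outside (2000, 5000): rejected
               ("1981 administrative data", 4_500)]       # inside: accepted
    print(impute(record, "APR", options), record)
    # ('1981 administrative data', 4500) {'APR': 4500, 'EMP': 200, 'INV': 1000}

The real system also supports the interactive analyst review described below, in which a person can override these rules; the sketch stops at the batch behavior.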
SPEER was also used to process the Manufacturing, Retail, Wholesale, and Service segments of the 1982 Economic Censuses of Puerto Rico. In each of these applications, the edit-checking and error-localization routines and the basic system flow are the same. Each application, however, had its own family of decision rules for imputation. Each application employed different rules based on the survey-specific fields, the relations between fields, and auxiliary information.

-90-

How does one implement an edit and imputation system based on expert-system principles? For a given application, one starts with the experts, studies their expertise, elicits rules, and embeds those rules in the system components requiring them. Sample data are tested, performance is evaluated, and rules are refined as needed.

Editing of economic data records at the Census Bureau is a two-phase process. All records are run through an automated edit and imputation system in batch mode. Within the automated routines, selected records are targeted as referral cases and are directed for analyst review. An optimal strategy will include automated procedures to resolve the majority of cases and individual review for establishments needing special handling. Typical referral criteria are: (1) a large change to reported data; (2) imputations for large establishments; and (3) a highly atypical combination of responses. The analyst reviewing a response form is a subject-matter specialist, and the review is currently a pencil-and-paper process. After analyst adjustments are made to the results of automated processing on an establishment record, the revised form is once again processed through the automated system.

SPEER allows on-line, interactive processing of referral cases. Used in this mode, the system converses with an expert using it. The human expert can override the decision rules residing in the system and replace them based on his/her expertise and auxiliary information about the case under review. Using this system, the analyst requests a specific record and reviews the processing done by the automated system. The analyst has the original response form and hence access to information not incorporated into the rules. Based on this additional information and his/her own experience, an analyst may overrule the decision rules built into the automated system.

IMPUTATION OPTIONS FOR APR
A. RANGE OF APR: (250, 750)
B. CURRENT VALUE: 375
OPTIONS
1. REPORTED VALUE: 82
2. 1982 ADMINISTRATIVE DATA:
3. 1981 ADMINISTRATIVE DATA BASED:
4. 1977 CENSUS DATA BASED:
5. IMPUTATION AND TOLERANCE:
6. ANALYST SUPPLIED VALUE:

Display 41.

The display seen by the analyst looks something like Display 41. Using Annual Payroll (APR), this display shows an acceptable range for APR from 250 to 750 (i.e., the feasible region). The current value is 375, which was derived by the automated system. The next value is the actual reported value of 82, followed by the reported 1982 Administrative Data and other candidate imputations based on 1981 Administrative Data, 1977 Economic Census Data, etc. The ordering above reflects the order in which the rule options are applied. With the requirement that the imputed value fall in a range consistent with all other fields, plus this variety of options, the analyst has a significant amount of information at his/her disposal to assist in the decision-making process.
If there is reason to believe that the most appropriate imputation value lies outside the feasible region (for example, because of explanatory notes on the form or through a call-back to the respondent), the analyst can select an imputed value outside the feasible region. A revised imputation for field APR is decided, and the analyst enters it into the data record. This value is accepted by the program, and field APR is, considered to be completed. Suppose there is a second field to be reviewed on this record (for example, Number of Employees (EMP)). Once again, the program displays on the terminal screen the feasible region for EMP, currently residing value, and candidate values for imputes derived according to each option, as it did for APR. Note, however, that each of these values is based, in part, on the new value of APR just entered by the analyst. As above, the analyst will determine an appropriate value for EMP, enter this value, and move on to the next field, if any. After all fields have been examined and adjusted if needed, the review is complete. The revised record will be consistent, and no further batch processing will. be required. The important observation from the perspective of an expert system, is that a true expert, converses with the automated expert programs in order to augment the system expertise and override decision rules as needed. Initial testing has shown that analysts have found this system easy to use. It has the potential for making their decisions in the review of establishment records less tenuous than is currently the case. Because the individual review of establishment records is a time-consuming and costly process, one can anticipate savings of time and money in the use of an "expert-system aided," on-line, interactive review process. The on-line, interactive portion of this program has not yet been put to use for actual survey processing. We are actively working with potential users to incorporate this aspect of the program in future editing and imputation processing. In summary, an edit and imputation system should blend statistical and subject-matter expertise in a coherent framework and integrate edit constraints with imputation strategy. We have described a structured system that attempts to meet these requirements and is sufficiently flexible to accommodate a variety of users. Development work continues on this system, enhancements are being made, and additional users are being identified. The references provide more information about some of the technical features of the SPEER system. References Greenberg, Brian (1981), "Developing an Edit System for Industry Statistics," Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface, 11-16, Springer-Verlag, New York. __________ (1982), "Using an Editing System to Develop Editing Specifications, " Proceedings of the Section on Survey Research Methods, American Statistical Association, 366-371. and Surdi, Rita (1984), "A Flexible and Interactive Edit and Imputation System for Ratio Edits," Proceedings of the Section on Survey Research Methods, American Statistical Association, 421-426. -92- DISCUSSION Mark Winer, Office of Management and Budget When Terry first mentioned the idea of having a panel on expert systems in a conference on Statistical Uses of Microcomputers in Federal Agencies, my question was "What do expert systems have to do with microcomputers and statistical systems?." 
I decided I would take a look at all the things I saw today to see how this session fits into the other sessions. The first thing we see is that you can use the IBM PC, or other personal computers, as a terminal to a mainframe. The mainframes have excellent software systems. That allows you to use both machines for the things those machines are best at. With the large amounts of data and large amounts of information you might need with an expert system, it is good to use a mainframe for most cases; but for the kind of processing and quick response you might want it's nice to use downloaded results from an expert system on a personal computer. The second reason this fit in is that, as I mentioned, the system developed by Brian Greenberg has just been adapted to personal computers. As memory capacity and storage capacity on personal computers increase, even the large systems like DATAPLOT could be extended to personal computers. The third reason that this fit in is that every couple of years there is a real hot topic in the computer field. In 1982 and 1983 it was decision support systems. If we were having this workshop in 1982 or 1983, we would have undoubtedly had a panel on decision support systems; and since in 1984 and 1985 the hot topic is expert systems, we are having this workshop and it is incumbent upon you to have a panel on expert systems. This brings up the obvious question of whether an expert system really is something new or is it just something old, another big word that people use to bring out high- priced consultants to design your system. I guess I will say that from what we have seen in these demonstrations today, expert systems are doing more than ordinary software systems. Ordinary software systems help the user do the things he normally does but make it possible for him to do those things faster and save him some of the tedious parts of the task. Both the systems we heard talked about today have the advantage of actually bringing additional knowledge to the user in that he can do what he wouldn't necessarily know how to. Expert systems show you how to locate the error that is the easier error to change if you are trying to do efficient editing. You have subject-matter expertise that analysts couldn't produce on their own. This system does that for you. It uses the subject matter of the expert to figure out how that record can be changed with some help from the system. The DATAPLOT system teaches you about the tests that are available to you as you use them. It suggests to you additional steps to do as you are doing things; so even if you are not an expert statistician, you can figure out the ways to proceed as you are, working on a problem. So, as I say, expert systems provide something beyond what we ordinarily have in software systems. They are an extension of the existing packages rather than things that stand by themselves. -93- QUESTIONS AND ANSWERS Ql: You mentioned a new fad. Isn't some of this just like a sophisticated help facility? Al: (Mr. Winer): Yes. Q2: This is this years new thing, but better help facilities have been a growing need since computers got started. I think expert syst- ems are a logical outgrowth of that. A2 (Mr. Lawton): I think expert systems are more than that, but help facilities are at least part of what we are dealing with here. I would say what makes the help facility more sophisticated is that they have some expertise built into them, that is based on knowledge of where in the program the user calls the help facility. 
So what you say is partly true, but I think there is some intelligence built into the back of the help facility in the expert system that wouldn't be there in a more conventional system or from just reading, the help file. A2 (Mr. Ireland): There probably are two other issues. First, the help system can be changed incrementally as you come to understand what kind of help you need. The idea of rules makes it easy to develop small help modules that are added to a system that already has a help facility. Second, for some of Brian's things, it isn't a user help facility, but a specification of how to handle a particular piece of data. So, the help facility might never be seen by analysts unless they ask to see it, but it would be used to make a proper kind of modification to the data. A2 (Mr. Greenberg): Expert systems can be run in batch mode once ex- pertise is built into it, and that bears on the use of the help facility in a batch mode. Q3: I am curious about some of the details of the DATAPLOT and the editing and imputation system. Let me start with the editing system since that is fresh in your memory. How much memory on the IBM XT does your system take up? A3 (Mr. Greenberg): I really don't know. I haven't been doing the actual transfer to the XT. Q4: But you could fit it into 64OK? A4 (Mr. Greenberg): Yes, plenty of data is on one floppy disk. Q5: You said it was easy to use. What did you say--a half day? How long? A5 (Mr. Greenberg): I would say a half day working with somebody like myself or someone familiar with it. -94- Q6: We do surveys a lot and they are typically tedious -- we have people coming in to do error checking and editing in a rather primitive way, so I think your product would be very useful to us and a lot of other people. What would be a good way for us to learn more about it? A6 (Mr. Greenberg): Drop me a line or give me a call. Q7: The degree of statistical information you obviously have in your head goes well beyond that of everyone in our office. The only fear I would have would be whether those of us who have a much lower level of statistical knowledge could still make use of DATAPLOT. What do you think about that? A7 (Mr. Filliben): That is a general problem, and one of the displays dealt with varying experience. This addresses the point of whether this is an extended help facility. We tried to make sure that the menus that came up would be a part of the education process too--a tutorial, if you will. We have had people use this expert system who have very limited statistical background, and they have come, out with good results. It's a matter of learning, and I think the expert systems are at the point now where it's nice to have a machine that has an expert system, but it's also nice to have some statisticians and other consultants around who can augment them. One thing we did not mention was references. Where does someone go if he really wants to read up on graphical or residual analysis, for example? That is one command as far as DATAPLOT is concerned. There should be a reference command. It's not in there yet, but there is a body of literature that's out there that has a lot of details. If people want to go in and fill up their own base knowledge, they should have access to this base. It is very much, as you say, an extended help. There are lots of different ways these various systems can be of help because there are a lot of different ways we can have deficiencies in our own knowledge. Q8: What kind of mainframe are they working with? A8 (Mr. 
Filliben): All the major mainframes. UNIVACS. one, and in fact the default machine, is the VAX 11/780. IBM/PC's. The Pentagon has it on a Honeywell Multics system. PERKIN-ELMERS. PRIMES. The only machine we had difficulty putting it up on was the CYBER machine. That problem will disappear because we are getting a CYBER machine and we will be forced to address that problem. They have a hardware restriction on memory. In UNIVAC you run into an overlay problem. In terms of whether it would download to a PC or could be put up on a PC, you would need about-a half a megabyte of memory. Small machines--micros--are expanding to the point where it's a real possibility to put DATAPLOT on a PC. Q9: You say that NTIS sells this? A9 (Mr. Filliben): National Technical Information Service sells it for $1200 --a one-time-only fee. You get the source code. The source code on the file is 12 megabytes, so you have to have somewhere where you can put it. Q10: Did you write this yourself? A10 (Mr. Filliben): Yes. -95- Q11: How long did it take? All (Mr. Filliben): We started back in 1972 with a software system called DATAPAK which is free from NBS. That sort of got us into the problem. By 1975 it became clear that interactive systems were becoming more important. By 1977 we had the first DATAPLOT running, and things have essentially been the same since then. We augmented it to include the expert system. Mr. Winer: Perhaps less a question than, a comment. At the end of the last session, I asked the panelists how their decentralized and spreadsheet-type statistical systems insure or assure data integrity and adherence to the statistical standards. Here I think we have had two presentations in which one could in a sense say "Hey, that answers the question!". If people start using systems like Brian's, they will have more data integrity; and if one starts using systems more like Jim.'s, one could have more adherence to agencies present standards. I would like to take this opportunity to thank Terry Ireland who chaired this session, but who is also the Chairman of the Subcommittee of the Federal Committee on Statistical Methodology who organized this entire workshop, including this session. We thank him and thank you all for coming. -96- Appendix Announcement of Workshop on Statistical Uses of Microcomputers in Federal Agencies The Subcommittee on Statistical Uses of Microcomputers in Federal Agencies of the Federal Committee on Statistical Methodology is sponsoring a one-day workshop on April 24, 1985, to discuss with other Federal employees selected topics on statistical uses of microcomputers. The workshop will be held at the IRS Auditorium, 1111 Constitution Avenue, N.W., 7th floor, from 9:15 a.m. to 4:30 p.m. The agenda and speakers are as follows: 9:15 a.m. WELCOME AND INTRODUCTION Chair: Maria Gonzalez, Office of Management and Budget Arrangements: Linda Taylor, Internal Revenue Service 9:20 a.m. PLANNING Chair and Discussant: Larry Cox, Bureau of the Census Rapporteur: Fred Cavanaugh,.Bureau of the Census Microcomputer technology has much to offer statistics, and many statisticians have become microcomputer users at work and at home. This technology and the keen interest of statisticians in it provide statistical agencies with many opportunities, each bringing with it responsibilities for planning, implementation and evaluation: if every statistician (programmer/secretary) in an agency wants a microcomputer, who should have them? For what purposes can/should microcomputers be used? In what configuration? 
At what cost (overall/per user)? How will this technology coexist with central ADP services? What policy decisions need to be made -when -- by whom? In this session on planning, we will explore such questions through discussion, focusing on three different and successful approaches to these problems -- those adopted by the Census Bureau, the National Security Agency and the Bureau of Labor Statistics. Speakers: Ronald R. Swank, Bureau of the Census Kathy Schnaubelt, National Security Agency Peter Stevens, Bureau of Labor Statistics DISCUSSION 10:45 a.m. COFFEE BREAK -97- 11:00 a.m. ELECTRONIC DATA DISSEMINATION Chair: Ken Berkman, Bureau of Economic Analysis Rapporteur: Jay Casselberry, Energy Information Agency This session is a panel discussion of the different approaches to.electronic data dissemination by various Federal agencies. Different approaches will be described with particular emphasis on the factors determining an agency's approach to data dissemination and the problems encountered in their implementation. The experience gained by these agencies will be presented: National Technical Information Service (NTIS) distribution of microcomputer floppy disks; Census CENDATA system; and the U. S. Department of Agriculture's current development of an on-line system. Speakers: Stuart Weisman, National Technical Information Service Barbara Aldrich, Bureau of the Census Roxanne Williams, Department of Agriculture ***DISCUSSION*** 12:30 p.m. LUNCH 1:15 P.M. APPLICATIONS OF MICROCOMPUTERS Chair: Ron Steele, Department of Agriculture Rapporteur: Tom Nagle, Internal Revenue Service This session is a panel discussion of statistical applications of microcomputers. The utility and weaknesses of applications software and operating systems will be discussed. Some examples involve interfacing mainframe and microcomputers. Issues to be addressed include an assessment of the utility of microcomputers at present, the future utility in light of new hardware and software technologies, and considerations regarding data integrity, security and accessibility. Speakers: Linda Atkinson, Department of Agriculture Gary Nelson, Department of Agriculture Rick Hayes, Internal Revenue Service Brian Carney, Department of Agriculture Paul Dobbins, Department of the Treasury Dick Shively, Department of Agriculture. -98- DISCUSSION 2:45 p.m. COFFEE BREAK 3:00 p.m. EXPERT SYSTEMS Chair: Terry Ireland, National Security Agency Discussant: Mark Winer, Office of Management and Budget Rapporteur: Norman Glick, National Security Agency Recently, the idea of incorporating techniques used by professional experts into software has become popular. This session will introduce the basis for expert-system methodology and give several practical examples of expert systems with statistical applications that are currently in use. Speakers: George Lawton, Army Research Institute James Fillben, National Bureau of Standards Brian Greenberg, Bureau of the Census ****DISCUSSION**** 4:30 p.m. ****ADJOURN**** -99- Reports Available in the Statistical Policy working Paper series 1. Report on Statistics for Allocation of Funds (Available through NTIS Document Sales, PB86-211521/AS) 2. Report on Statistical Disclosure and Disclosure-Avoidance Techniques (Available through NTIS Document Sales, PB86211539/AS) 3. An Error Profile: Employment as Measured by the Current Population survey (Available through NTIS Document Sales PB86- 214269/AS) 4. 
Glossary of Nonsampling Error Terms: An Illustration of a Semantic Problem in Statistics (Available through NTIS Document Sales, PB86-211547/AS) 5. Report on Exact and Statistical Matching Techniques (Available through NTIS Document Sales, PB86-215829/AS) 6. Report on Statistical Uses of Administrative Records (Available through NTIS Document Sales, PB86-214285/AS) 7. An Interagency Review of Time-series Revision Policies (Available through NTIS Document Sales, PB86-232451/AS) 8. Statistical Interagency Agreements (Available through NTIS Document Sales, PB86-230570/AS) 9. Contracting for Surveys (Available through NTIS Document Sales, PB83-233148) 10. Approaches to Developing Questionnaires (Available through NTIS Document Sales, PB84-105055/AS) 11. A Review of Industry Coding Systems (Available through NTIS Document Sales, PB84-135276) 12. The Role of Telephone Data Collection in Federal Statistics (Available through NTIS Document Sales, PB85-105971) 13. Federal Longitudinal Surveys (Available through NTIS Document Sales, PB86-139730) 14. Workshop on Statistical Uses of Microcomputers in Federal Agencies (Available through NTIS Document Sales, PB87-166393) Copies of these working papers may be ordered from NTIS Document Sales, 5285 Port Royal Road, Springfield, VA 22161, (703) 487-4650.