Table of Contents

Section                                                        Page

  List of Figures . . . . . . . . . . . . . . . . . . . . . .   iv

I.    RESOURCE IDENTIFICATION PAGE  . . . . . . . . . . . . .    1

II.   RESOURCE OPERATIONS . . . . . . . . . . . . . . . . . .    2

   II.A   PROGRESS  . . . . . . . . . . . . . . . . . . . . .    2

      II.A.1 RESOURCE SUMMARY AND GOALS . . . . . . . . . . .    2

      II.A.2 TECHNICAL PROGRESS . . . . . . . . . . . . . . .    4

         II.A.2.a FACILITY HARDWARE . . . . . . . . . . . . .    4
         II.A.2.b TENEX SYSTEM SOFTWARE . . . . . . . . . . .   12
         II.A.2.c NETWORK COMMUNICATION FACILITIES  . . . . .   15
         II.A.2.d SYSTEM RELIABILITY AND BACKUP . . . . . . .   23
         II.A.2.e PROGRAMMING LANGUAGES . . . . . . . . . . .   25
         II.A.2.f MAINSAIL OVERVIEW . . . . . . . . . . . . .   27
         II.A.2.g OPERATIONS AND USER SOFTWARE  . . . . . . .   30
         II.A.2.h USER ASSISTANCE AND CONSULTING  . . . . . .   36
         II.A.2.i INTRA-COMMUNITY COMMUNICATION . . . . . . .   37
         II.A.2.j DOCUMENTATION AND EDUCATION . . . . . . . .   40
         II.A.2.k SOFTWARE COMPATIBILITY AND SHARING  . . . .   40

      II.A.3 RESOURCE MANAGEMENT  . . . . . . . . . . . . . .   44

         II.A.3.a MANAGEMENT COMMITTEES . . . . . . . . . . .   44
         II.A.3.b NEW PROJECT RECRUITING  . . . . . . . . . .   45
         II.A.3.c STANFORD COMMUNITY BUILDING . . . . . . . .   46
         II.A.3.d RESOURCE ALLOCATION POLICIES  . . . . . . .   46
         II.A.3.e AIM WORKSHOP SUPPORT  . . . . . . . . . . .   47
      II.A.4 FUTURE PLANS . . . . . . . . . . . . . . . . . .   48

   II.B   SUMMARY OF RESOURCE USAGE . . . . . . . . . . . . .   51

      II.B.1 RELATIVE SYSTEM LOADING BY COMMUNITY . . . . . .   51
      II.B.2 INDIVIDUAL PROJECT AND COMMUNITY USAGE . . . . .   54
      II.B.3 NETWORK USAGE STATISTICS . . . . . . . . . . . .   57

   II.C   RESOURCE EQUIPMENT SUMMARY  . . . . . . . . . . . .   59

   II.D   PUBLICATIONS  . . . . . . . . . . . . . . . . . . .   65

III.  RESOURCE FINANCES . . . . . . . . . . . . . . . . . . .   66

   III.A  REFERENCE TO BUDGETARY DETAILS  . . . . . . . . . .   66
   III.B  RESOURCE FUNDING  . . . . . . . . . . . . . . . . .   67

IV.   RESOURCE PROJECT DESCRIPTIONS . . . . . . . . . . . . .   68

   IV.A   FORMALLY APPROVED PROJECTS  . . . . . . . . . . . .   69

      IV.A.1 STANFORD USERS . . . . . . . . . . . . . . . . .   69

         IV.A.1.a DENDRAL PROJECT . . . . . . . . . . . . . .   69
         IV.A.1.b MYCIN PROJECT . . . . . . . . . . . . . . .   89
         IV.A.1.c PROTEIN STRUCTURE MODELING PROJECT  . . . .  100

      IV.A.2 NATIONAL USERS . . . . . . . . . . . . . . . . .  105

         IV.A.2.a CHEMICAL SYNTHESIS PROJECT  . . . . . . . .  105
         IV.A.2.b INTERNIST (DIALOG) PROJECT  . . . . . . . .  110
         IV.A.2.c HIGHER MENTAL FUNCTIONS MODELING  . . . . .  112
         IV.A.2.d LANGUAGE ACQUISITION MODELING PROJECT . . .  118
         IV.A.2.e MEDICAL INFORMATION SYSTEMS LABORATORY  . .  121
         IV.A.2.f RUTGERS COMPUTERS IN BIOMEDICINE  . . . . .  126

   IV.B   INFORMAL PROJECTS . . . . . . . . . . . . . . . . .  136

      IV.B.1 STANFORD PILOT PROJECTS  . . . . . . . . . . . .  136

         IV.B.1.a AI IN MOLECULAR GENETICS - MOLGEN . . . . .  136
         IV.B.1.b BAYLOR-METHODIST CEREBROVASCULAR PROJECT  .  138
         IV.B.1.c AUTOMATIC LV MODELING . . . . . . . . . . .  143
         IV.B.1.d INFORMATION PROCESSING PSYCHOLOGY PROJECT .  147
         IV.B.1.e AIM RESEARCH - UNIVERSITY OF ROCHESTER  . .  150
         IV.B.1.f QUANTUM CHEMICAL INVESTIGATIONS . . . . . .  151

      IV.B.2 NATIONAL PILOT PROJECTS  . . . . . . . . . . . .  154

         IV.B.2.a NATURAL LANGUAGE UNDERSTANDING  . . . . . .  154
         IV.B.2.b KRL PROJECT . . . . . . . . . . . . . . . .  158
         IV.B.2.c COMPUTERIZED PATIENT MONITORING . . . . . .  159
         IV.B.2.d AI IN PSYCHOPHARMACOLOGY  . . . . . . . . .  160

Appendix A
   OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH . . . . . . .  166

Appendix B
   AI HANDBOOK OUTLINE  . . . . . . . . . . . . . . . . . . .  180

Appendix C
   HEURISTIC PROGRAMMING PROJECT WORKSHOP . . . . . . . . . .  194

Appendix D
   TYMNET RESPONSE TIME DATA  . . . . . . . . . . . . . . . .  195

Appendix E
   MAINSAIL DESIGN SUMMARY  . . . . . . . . . . . . . . . . .  208

Appendix F
   SUBSYSTEMS AND DOCUMENTATION DIRECTORIES . . . . . . . . .  218

Appendix G
   SUMEX-AIM SUMMARY FOR ARPANET RESOURCES HANDBOOK . . . . .  230
Appendix H
   AIM MANAGEMENT COMMITTEE MEMBERSHIP  . . . . . . . . . . .  252

Appendix I
   USER INFORMATION - GENERAL BROCHURE  . . . . . . . . . . .  255

Appendix J
   GUIDELINES FOR PROSPECTIVE USERS . . . . . . . . . . . . .  264


List of Figures

 1.   SUMEX-AIM Computer Configuration  . . . . . . . . . . .    9
 2.   Elapsed Time/CPU Minute versus Load Average . . . . . .   10
 3.   I/O Wait and Core Management Overhead versus Load
      Average . . . . . . . . . . . . . . . . . . . . . . . .   11
 4.   TYMNET Network Map  . . . . . . . . . . . . . . . . . .   19
 5.   TYMNET Response Delay Statistics  . . . . . . . . . . .   20
 6.   ARPANET Geographical Network Map  . . . . . . . . . . .   21
 7.   ARPANET Logical Network Map . . . . . . . . . . . . . .   22
 8.   CPU Usage by Community  . . . . . . . . . . . . . . . .   52
 9.   File Space Usage by Community . . . . . . . . . . . . .   53
10.   TYMNET and ARPANET Usage Data . . . . . . . . . . . . .   58



NATIONAL INSTITUTES OF HEALTH
DIVISION OF RESEARCH RESOURCES
BIOTECHNOLOGY RESOURCES PROGRAM

SECTION I - RESOURCE IDENTIFICATION

Report Period:                  From August 1, 1975 to July 31, 1976
Grant Number:                   RR-00785-03
Report Prepared:                May, 1976

Name of Resource:               Stanford University Medical Experimental
                                Computer (SUMEX)
Resource Address:               Stanford University
                                Stanford, California 94305
Resource Telephone Number:

Principal Investigator:         Joshua Lederberg, Ph.D.
Title:                          Chairman and Professor
                                Department of Genetics
Academic Department:            School of Medicine
                                Department of Genetics
Investigator's Telephone No.:   (415) 497-5801

Grantee Institution:            Stanford University
Type of Institution:            Private University

Name of Institution's Biotechnology Resource Advisory Committee:
SUMEX-AIM Executive Committee

Membership of Biotechnology Resource Advisory Committee:

Name                    Title                    Department          Institution

Saul Amarel, Ph.D.      Chairman and Professor   Computer Science    Rutgers University

Donald Lindberg, M.D.   Professor                Pathology           University of Missouri
                        Director                 Information         School of Medicine
                                                 Science Group

Principal Investigator:

Joshua Lederberg, Ph.D.
Chairman and Professor
Signature:                      Date:

Stanford University Official:

D'Ann E. Downey
Sponsored Projects Officer
Signature:                      Date: May 27, 1976





2

II    RESOURCE OPERATIONS

II.A    PROGRESS

II.A.1    RESOURCE SUMMARY AND GOALS

   The SUMEX (Stanford University Medical Experimental computer)
project is a national computer resource funded by the Biotechnology
Resources Program of the National Institutes of Health (NIH-BRP). It
encompasses a dual mission: 1) the promotion of applications of artificial
intelligence (AI) computer science research to biological and medical
problems and 2) the demonstration of computer resource sharing within a
national community of health research projects.   The SUMEX resource
resides administratively within the Genetics Department of the Stanford
University Medical School and serves as a nucleus for a growing community
of projects, both at Stanford and nationally.  SUMEX provides computing
facilities specifically tuned to the needs of AI research and
communication tools to facilitate inter- and intra-group contacts and the
demonstration of research products to medical users.  The project also
develops tools for and encourages community relationships among
collaborating projects and medical researchers.

   User projects are separately funded and autonomous in their
management.  They are selected for access to SUMEX on the basis of their
scientific and medical merits as well as their commitment to the community
goals of SUMEX (see Section II.A.3 on page 44).  Currently active
projects span a broad range of application areas such as clinical
diagnostic consultation, molecular biochemistry, belief systems modeling,
mental function modeling, and instrument data interpretation (see Section
IV on page 68).

   Artificial Intelligence is a branch of computer science which
attempts to discern the underlying principles involved in the acquisition
and utilization of knowledge in reasoning, deduction, and problem-solving
activities (1).  Each authorized project in the SUMEX community is
concerned in some way with the application of these principles to medical
problems.  The tangible objective of this approach is the development of
computer programs which, using formal and informal knowledge bases
together with mechanized hypothesis formation and problem solving
procedures, will be more general and effective consultative tools for the
clinician and medical scientist.  The exhaustive search potential of
computerized hypothesis formation and knowledge base utilization,

----------------------------------------------------------------------

   (1) Two recent reviews give some perspective on the current state of
AI: see Nilsson, N.J., "ARTIFICIAL INTELLIGENCE", Information Processing
74, North-Holland Pub. Co. and a summary by Buchanan, B. G. and
Feigenbaum, E. A., attached as Appendix A, page 166.  An additional
overview of research areas in AI is provided by the outline for an
"Artificial Intelligence Handbook" being prepared under Professor
Feigenbaum by computer science students at Stanford (see Appendix B on
page 180).


constrained where appropriate by heuristic rules or interactions with the
user, has already begun to produce promising results in areas such as
chemical structure elucidation, diagnostic consultation, and mental
function modeling.  Needless to say, much is yet to be learned in the
process of fashioning a coherent scientific discipline out of the
assemblage of personal intuitions, mathematical procedures, and emerging
theoretical structure of the "analysis of analysis" and of problem
solving.  State-of-the-art programs are far more narrowly specialized and
inflexible than the corresponding aspects of human intelligence they
emulate; however, in special domains they may be of comparable or greater
power, e.g., in the solution of formal problems in organic chemistry or in
the integral calculus.

   Our community building role is based upon the current state of
computer communications technology.  While far from perfected, these new
capabilities offer highly desirable latitude for collaborative linkages,
both within a given research project and among them.   Several of the
active projects on SUMEX are based upon the collaboration of computer and
medical scientists at geographically separate institutions; separate both
from each other and from the computer resource.  Another major goal of the
network experiment is to enable diverse projects to interact more directly
and to facilitate selective demonstrations of available programs to
physicians and medical students.  Even in their current developing state,
such communication facilities allow access to the rather specialized SUMEX
computing environment and programs from a great many areas of the United
States (even to a limited extent from Europe) for potential new research
projects and for research product dissemination and demonstration.

    This past year has seen much activity and growth in the SUMEX-AIM
resource and community.  Two new formal projects (one maturing from an
earlier pilot effort) have been authorized to join the AIM community as
well as several new pilot efforts.  These new projects together with the
increasing load from the earlier projects have raised the loading of
SUMEX-AIM beyond productive limits, particularly during prime time. To
accommodate this load, we received authority from the Executive Committee
to augment the processing capacity of the system - implementation of this
addition is in progress.  Efforts have continued to build inter- and
intra-group interactions through system communication facilities,
workshops,  a local "mini-conference" on AI techniques to pull together the
Stanford community of projects, and a seminar project initiated by
Professor E. A. Feigenbaum of Stanford to assemble from the community a
handbook of AI concepts, methods, and state-of-the-art.  There have also
been continuing, substantial efforts by the community to introduce non-
affiliated research people to a number of the programs which are far
enough along in their development.  The system staff has worked hard to
maximize the computing resources and to enhance the repertoire of software
available to the community and has investigated a variety of alternatives
related to the import and export of operational programs.  And finally,
the management committees which help direct the allocation and development
of the resource have been active in recruiting and evaluating new
projects, planning future AIM workshops,  and guiding the dissemination of
resource objectives and opportunities to other medical institutions.


II.A.2    TECHNICAL PROGRESS

II.A.2.a    FACILITY HARDWARE

Memory Swapping Drum System:

    The hardware system has stabilized nicely over the past year,
especially after the correction of a design flaw in the DEC-supplied drum
controller (2).  We had been having an abnormally high number of drum
errors, mostly recoverable.  The number exceeded the manufacturer's
specifications and could not be explained by memory overruns or other
contention problems.  After much detective work (and vendor interactions),
we discovered that the delay in the DEC drum controller between the "drive
select" and "sector ready" signals was too short to allow the read and
write heads to settle down.  Since we put in the appropriate delay
circuitry (in September 1975), the system has run to date without a single
error (recoverable or permanent) or failure in the swapping system!

Computational Capacity:

   A major event over the past year relating to system hardware was the
decision to upgrade the central processor capacity.  An updated diagram of
the hardware configuration is shown in Figure 1 on page 9. As
discussed in the last report, the high loading of SUMEX-AIM during prime
time has restricted work.  From a subjective viewpoint, the system became
unworkably sluggish above a load average (3) of 4 or 5.  We have made a
number of administrative efforts to push as much work as possible into
non-prime time.  These have included excellent cooperation from user
projects in encouraging programmers to work during night hours, doing
operational functions (such as file system dumps) during the evening, and
providing an effective batch system for running jobs in background mode
and during non-prime hours.  These steps have helped make better use of
the non-prime hours but have not substantially relieved the midday
congestion; the decreases achieved were offset by new development work and
increased community use of operational AI programs (ONET, DENDRAL, MYCIN,
and PARRY in particular).  The principal period during which medical
collaborators can, in practice, work with the programs remains the prime
hours and our goal is to provide a computational capacity which allows
reasonably interactive access to the programs at these times.

----------------------------------------------------------------------
   (2) We follow a long tradition in calling our fast, fixed-head disk a
"drum" to distinguish it from the file system disks

   (3) The "load average" signifies the number of jobs waiting in queue
to be processed at a given instant:  it measures the number of people
awaiting service at that moment,  so that responsiveness will be
(approximately) inversely related to the load average. Two, three, or even
four times as many users may be connected to the system at such times; but
users typically take time out to ponder what the computer has reported, or
the jobs may be preoccupied with input or output rather than the CPU.


                           5



   We made a series of measurements to identify system bottlenecks (4)
and observed a number of simultaneously limiting resources.  A plot of the
elapsed time required to accumulate 1 CPU minute is shown in Figure 2 on
page 10 as a function of load average.  A plot of the corresponding
system I/O wait and core management overhead time is shown in Figure 3 on
page 11.  Data are plotted for a variety of jobs, ranging from a one-page
CPU-bound job to large, page-fault-intensive jobs, and include several
FORTRAN and INTERLISP jobs.  The data were not collected on a dry machine;
the jobs were run at night when the user load was low but not zero.  Thus,
some dispersion exists in the data.  The key feature of the data (and
other internal system parameters) is that for small jobs the elapsed
time-to-completion increases linearly with load average.  That is, at load
average N, the CPU is split into roughly N equal shares of 1/N each.  For
larger jobs, the low load average behavior is a linear increase in time-to-
completion with load average but with an offset in elapsed time.  This
reflects the substantial waiting time (for a given job) to swap pages in
and out.  The I/O wait curve shows much dispersion at low load averages,
depending on the character of the system load.  If there are not many jobs
to be run, and one becomes unrunnable because of waiting for swapping or
disk I/O, the wait time will be very high (see the upper curve of Figure
3).  On the other hand, for the small, CPU-bound limit, the I/O wait loss
is negligible (lower curve in Figure 3).
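
   The linear behavior described above can be sketched in a few lines of
Python.  This is only an illustrative model under stated assumptions, not
a fit to the measured data: the function name and the swap offset value
are hypothetical, and a job is assumed never to receive more than the
whole CPU.

```python
def elapsed_per_cpu_minute(load_average, swap_offset=0.0):
    """Estimate elapsed minutes needed to accumulate 1 CPU minute.

    Assumption: at load average N the CPU is shared about 1/N per job,
    so 1 CPU minute takes about N elapsed minutes.  `swap_offset` is a
    hypothetical constant standing in for the page-swapping wait that
    offsets the curve for large jobs.
    """
    if load_average < 0:
        raise ValueError("load average cannot be negative")
    competing_jobs = max(load_average, 1.0)  # a job never gets more than the whole CPU
    return competing_jobs + swap_offset


if __name__ == "__main__":
    for n in (1, 2, 4):
        small = elapsed_per_cpu_minute(n)
        large = elapsed_per_cpu_minute(n, swap_offset=1.5)  # 1.5 is illustrative only
        print(f"load {n}: small job {small:.1f} min, large job {large:.1f} min")
```

   The sketch covers only the linear region; the non-linearity at higher
load averages is not modeled.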

   At load averages above 3 or 4, a non-linearity sets in for large
jobs because of memory limitations and the increased swapping load
(relative wait time, interrupt service, etc.) on the system.  In the same
limit the I/O wait approaches 15-20%.  Of the 512 "core" memory pages
currently on the system, almost 380 are available to users.  With the
current working set (5) limitation parameter (maximum 150 pages), 2-3
large jobs and up to 5 or so mixed jobs may be resident at once.  If more
than this number of jobs are runnable, some must be totally out of core
and receiving no service.  Because it is costly to move whole working sets
in and out of memory (5-10 milliseconds per page), the system attempts to
minimize "thrashing" by approximating a batch mode of scheduling, giving
more run time before forcibly removing one program from memory to run
another.  This degradation is spread around evenly but causes added swap
and core paging time in order to run a job and hence increases the per-job
time to complete.
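
   The memory figures quoted above can be checked with simple arithmetic.
The following Python sketch uses only the numbers given in the text (380
user pages, a 150-page working set limit, 5-10 milliseconds per page); the
function names are hypothetical and the calculation ignores any sharing
of pages between jobs.

```python
USER_PAGES = 380          # core pages available to users (from the text)
MAX_WORKING_SET = 150     # working set limit in pages (from the text)


def resident_large_jobs(user_pages=USER_PAGES, working_set=MAX_WORKING_SET):
    """Number of maximum-size working sets that fit in user core."""
    return user_pages / working_set


def swap_time_seconds(pages, ms_per_page):
    """Time to move `pages` pages at `ms_per_page` milliseconds each."""
    return pages * ms_per_page / 1000.0


if __name__ == "__main__":
    print(f"large jobs resident at once: {resident_large_jobs():.2f}")
    for ms in (5, 10):
        seconds = swap_time_seconds(MAX_WORKING_SET, ms)
        print(f"moving a full working set at {ms} ms/page: {seconds:.2f} s")
```

   Under these figures a full working set costs on the order of a second
to move in one direction, which is consistent with the scheduler's
preference for longer run times between forced removals.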

    However, based on user comments at loads of 3-5, the runtime
increase because of load average (even without any additional overhead)
becomes excessive as well - jobs that ran in several minutes at low load
average begin to take tens of minutes at higher loads making interactions
much more cumbersome.  Another factor is that by the time there are more


----------------------------------------------------------------------
   (4) These bottlenecks refer to program execution. File storage has
been another limiting factor for the system and is discussed later (see
page 7)

   (5) The working set is a group of pages which is a subset of the
total active memory used by a program and which the system "guesses"
(based on previous running history) will be addressed during the next
running time quantum.  In this way the system attempts to keep only those
pages needed at any point for a program to execute during its time slice.


6

than 6 large jobs on the system, we begin to run out of drum swapping
space.  The current capacity is 3300 pages and, allowing for the monitor,
can hold 5-6 full 256K word address spaces.  Typically with a load average
of 4-5, there are many more jobs on the system (25 - 30) with a range of
memory requirements.  Thus under such loads, the drum can accommodate even
fewer large LISP jobs.  Of course when the drum fills, swapping overflows
to the much slower moving-head disks and contends as well with regular
disk I/O traffic.  This substantially increases the system I/O wait time.
As noted on page 12, we have implemented a page migration facility which
assures that drum space is used only by active pages; but under heavy load
we may still exceed the capacity.
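
   The drum capacity estimate can be reproduced as follows.  This Python
sketch assumes the TENEX page size of 512 words (so a full 256K word
address space occupies 512 pages); the number of drum pages reserved for
the monitor is not given in the text, so the value used below is purely
hypothetical.

```python
WORDS_PER_PAGE = 512                 # assumed TENEX page size in words
ADDRESS_SPACE_WORDS = 256 * 1024     # full 256K word address space
DRUM_PAGES = 3300                    # drum capacity in pages (from the text)


def full_address_spaces(drum_pages=DRUM_PAGES, monitor_pages=0):
    """Whole 256K address spaces that fit on the drum after the monitor's share."""
    pages_per_space = ADDRESS_SPACE_WORDS // WORDS_PER_PAGE
    return (drum_pages - monitor_pages) // pages_per_space


if __name__ == "__main__":
    print("no monitor reservation:", full_address_spaces())
    # 300 pages for the monitor is an illustrative guess, not a measured value.
    print("with 300 monitor pages:", full_address_spaces(monitor_pages=300))
```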

   From these data it is clear that with the typical mix of jobs on
SUMEX-AIM, including many large LISP jobs, above a load average of 5 or so
the system runs out of memory, CPU, and swapping space at about the same
time.  Because of budgetary constraints, however, we are not able to
augment all three resources at once.  FOR A GIVEN LOAD, the effect of adding
memory and swapping storage would be to linearize the response curve
(Figure 2) through a reduction in system overhead at load averages above
4.  For load averages in the range of 5-6, this could recover up to 15-20%
in elapsed time/CPU minute.  The augmentation of processor capacity to
first order reduces the overall slope of the curves in Figure 2 and thus
benefits users at all levels of loading.  If the load is truly interactive
(jobs complete or require terminal input after a few minutes), any speed-
up in running time will tend to reduce the load average as well since the
jobs will leave the run queue sooner.  For many long, CPU-bound jobs this
effect doesn't exist and the load average would stay the same - the
overall run times for the jobs would be reduced, however.  In the ideal
case, doubling processor capacity would improve elapsed time/CPU minute by
50%.  This cannot be realized in fact since having a faster processor with
the same memory means that the swapping rate will increase and hence total
overhead will go up.
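
   The trade-off described above can be illustrated with a small Python
sketch.  It is a first-order model under stated assumptions, not a
prediction for the actual system: elapsed time per CPU minute is taken as
load average divided by processor speed, inflated by an overhead
fraction, and the overhead fractions used in the example are
hypothetical.

```python
def elapsed_time(load_average, cpu_speed, overhead_fraction):
    """Elapsed minutes per CPU minute under a simple shared-CPU model."""
    return load_average / cpu_speed * (1.0 + overhead_fraction)


def improvement(load_average, speedup, overhead_before, overhead_after):
    """Fractional reduction in elapsed time when CPU speed is multiplied by `speedup`."""
    before = elapsed_time(load_average, 1.0, overhead_before)
    after = elapsed_time(load_average, speedup, overhead_after)
    return 1.0 - after / before


if __name__ == "__main__":
    # Ideal case: overhead unchanged, so doubling the CPU halves elapsed time.
    print(f"ideal:     {improvement(4, 2.0, 0.15, 0.15):.0%}")
    # Same memory, faster CPU: overhead assumed to rise (0.25 is illustrative).
    print(f"realistic: {improvement(4, 2.0, 0.15, 0.25):.0%}")
```

   In this model any rise in overhead after the processor upgrade pulls
the improvement below the ideal 50%, which is the qualitative point made
in the text.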

   Because of the greater advantage for interactive jobs and because of
budgetary considerations, the strategy approved by the AIM Executive
Committee was to augment the CPU capacity as the first step, taking note
of the certain need to augment memory and swapping storage soon
thereafter.  In addition to the technical arguments, the rationale for the
decision also takes account of the fact that DEC CPU prices have been
rising recently whereas memory prices are presumably still falling.

    We examined both an upgrade from the KI-10 to a KL-10 and the
installation of a dual processor KI-10.  From a technical viewpoint, our
preference was to upgrade to the KL-10, particularly in light of DEC's
indication that the machine would be configured (microcode) within the
year to efficiently run TENEX.  For essentially economic reasons, however,
the KL-10 option was not feasible.  DEC marketing has taken a firm stand
about selling KL-10's as "systems"  which means that we would have had to
upgrade not only the CPU but disks, tapes, and data line scanner as well.
The net cost of upgrade would have been in excess of $500,000 - well over
our budget.  In view of this and the feasibility of a dual processor
system based on our studies,  we decided to add a second KI-10.   The
implementation of this plan is now underway and proceeding very well - we
are about ready to bring up the new system for user testing and hope to
have it ready for use during the next AIM workshop this June.


7

   It must be pointed out that at the time the decision for a dual
processor was made (fall of 1975), we realized that the trend within the
ARPA TENEX community could be toward a DEC-supported TENEX system although
DEC had not made clear its plans for TENEX support.   At this time that
indeed seems to be the long term prognosis.  DEC has announced the KL-20
(2040) running TOPS-20 which is a direct descendant of TENEX. The KL-20
is not a fast enough machine to have benefitted us in upgrade (it is
slower than the KI-10) and delivery will only start in volume next fall.
The rest of the KL-20 series of machines has not been announced although
two bigger machines (currently denoted 1080T - to be called 20??) may be
delivered to ARPA contractors late in 1976.  A substantial amount of work
remains to bring the DEC TENEX system up to the state of the current KA
and KI TENEX systems which will likely take another year at least.   Thus
whereas in the long term the dual KI TENEX system will diverge
increasingly from the DEC mainstream (by current projections), the pace
with which it is coming into operational status and the minimal disruption
to on-going user work, support the correctness of the decision relative to
the pragmatic needs of the existing community.

   We had expected delivery of the second KI-10 in late March but
because of scheduling problems within DEC, we did not receive it until
April 15.  Because of additional delays in getting the needed memory and
I/O bus cables to connect the new machine into the existing system, it
could not be checked out to begin software development until the last week
in April.  Once installed, the machine has worked with only a few minor
problems which were quickly corrected.  Software development has gone
equally well and we were ready to bring the full dual processor system up
to begin user testing as of May 16.

Disk File Storage:

   As mentioned earlier, the system has been operating at file system
capacity as well over the past year.  We have implemented policies which
regularly clean out the file system (expunge deleted and temporary files
as well as archive old files) to keep user projects within allocated
limits.  Nevertheless, many of the projects face severe constraints in
available on-line storage needed for large LISP program development and
community interactions.  Because the system is fully allocated, there is
little we can do to alleviate the problem within the present hardware. We
are implementing operational improvements where possible to facilitate file
archiving and restoring.  We have also investigated the Datacomputer
facility managed by the Computer Corporation of America over the ARPANET
as well as other sources of on-line storage (at less loaded network sites)
which could be available for a fee.  The space available at the
Datacomputer has been disappointing up to now.  Some projects are trying
out storing files at other sites but because of ARPANET access
constraints, this is likely not a useful long term solution for the whole
SUMEX-AIM community.

   Since augmenting the file system is presently beyond our budget
(higher priority is being given to improving computing capacity through
CPU and memory enhancements), we have encouraged user projects which are


8

particularly cramped for space to assist in budgeting disk augmentations
for themselves and the community.  The DENDRAL and Chemical Synthesis
projects each have proposals to cover additional RP-03 drives.  We have
two slots left on the existing controller and incrementally this is the
lowest cost way to augment file space.  Another highly attractive approach
is using a System Concepts SA-10 "IBM" channel with double-density "3330"
drives.  However, the initial cost of the channel, new controller, and one
or two drives would be about $100,000.


[Figure 1.  SUMEX Dual PDP-10 Configuration.  Block diagram; the labels
recoverable from the original are: four MF-10 memory units; two KI-10
central processors; a channel and controller; an RP-10C controller; two
4800 baud network links; TYMSHARE; TIP; AI Lab.  Diagram note: * fast,
fixed-head disk.]


10

[Figure 2.  SUMEX Responsiveness Under Load.  Plot of elapsed time per CPU
minute versus load average.  Legend: small, CPU-intensive job; large,
page-swapping job; FORTRAN x-ray diffraction analysis; INTERLISP CONGEN
jobs.  Curve annotation: large memory, paging-intensive.]


11

[Figure 3.  I/O Wait and Core Management Overhead.  Plot versus load
average (horizontal axis 0 to 7).  Legend: small, CPU-intensive job;
large, page-swapping job; FORTRAN x-ray diffraction analysis; INTERLISP
CONGEN jobs.  Curve annotation: small memory, CPU-intensive.]


12

II.A.2.b    TENEX SYSTEM SOFTWARE

    Our goals to improve resource allocation control capabilities,
improve guest facilities, and maintain system compatibility with other
ARPANET TENEX sites notwithstanding, we have continued to run TENEX
release 1.31 this past year.  The decision to upgrade to a more current
release was deferred pending the decision on the processor augmentation.
Had we been able to work out the acquisition of a KL-10, the monitor
development effort would have had to take a different course to intercept
DEC's monitor development efforts directly.  Since that was not possible
and a second KI-10 was approved in December 1975, we have begun the
conversion to TENEX 1.33/1.34 in concert with the dual processor
development effort (see below for a description of the trade-offs between
the older 1.33 release and the very recent 1.34).

Swapping Storage Management:

    Earlier in the year, while waiting for the processor decision, we
finished implementing a drum page migration system which ensures that the
drum is used only by active (recently accessed) pages.  This optimizes the
use of swapping space and reduces the substantial overhead when swapping
overflows to the disk which is 5-10 times slower and contends with other
file I/O.  The garbage collector operates cyclically (currently every 10
minutes) and if a page allocated on drum has not been accessed during the
previous interval and if alternative space is available on disk, it
reassigns the page to disk.  The cycle time of 10 minutes was chosen to
give a reasonable time for a program to get around to using a page before
declaring it dormant and at the same time not to penalize swapping of
newly created pages by forcing them to reside on disk too long.   This
cycle time seems to be acceptable as we observe migrations of several
hundred pages per cycle on the average.   This has eliminated the situation
where users first on the system leave dormant jobs around on drum and
users who login later have their job pages allocated on disk because no
drum space is free.  Of course, during really peak loads the drum space
may still become saturated with active jobs.  We find that aggregate I/O
wait times approach 40-50% with significant amounts of disk swapping
whereas using drum, the I/O wait falls to under 20%.
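The migration rule described above can be sketched in a few lines of Python. This is only an illustration of the stated policy (sweep every 10 minutes, move a drum page to disk if it was not accessed during the previous interval and disk space is available); the Page record, its field names, and the slot counter are hypothetical and do not come from the TENEX monitor.

```python
from dataclasses import dataclass

CYCLE_SECONDS = 10 * 60  # sweep interval quoted in the text


@dataclass
class Page:
    page_id: int
    on_drum: bool
    last_access: float  # seconds since an arbitrary epoch (hypothetical field)


def migrate_dormant_pages(pages, now, free_disk_slots):
    """Move drum pages untouched during the last cycle to disk.

    Returns the ids of the migrated pages.  Pages stay on drum once the
    disk has no free slots, mirroring "if alternative space is available".
    """
    migrated = []
    for page in pages:
        if free_disk_slots == 0:
            break
        dormant = now - page.last_access >= CYCLE_SECONDS
        if page.on_drum and dormant:
            page.on_drum = False
            free_disk_slots -= 1
            migrated.append(page.page_id)
    return migrated


pages = [
    Page(1, on_drum=True, last_access=0.0),    # not touched for a full cycle
    Page(2, on_drum=True, last_access=590.0),  # touched 10 s before the sweep
    Page(3, on_drum=False, last_access=0.0),   # already on disk
]
moved = migrate_dormant_pages(pages, now=600.0, free_disk_slots=8)
```

In this sketch only page 1 is reassigned to disk; the recently used page 2 keeps its drum slot, which is the behavior the 10 minute cycle is meant to protect.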

Dual Processor Development:

   Since the dual processor decision, the plan of attack has been: a)
develop the dual processor system for KI-TENEX 1.31 which has been a
thoroughly debugged system in our environment, b) in parallel, to transfer
local TENEX changes (drum handler, TYMNET service, special JSYS's, etc.)
to TENEX 1.33/1.34 and debug it as a single processor system, and c) after
the dual processor system has stabilized and TENEX 1.33/1.34 is running
well, complete the upgrade of our TENEX 1.33/1.34 to the dual processor
configuration.  The sequencing of these changes is designed to get the
added capacity on line as soon as possible (particularly for the second
workshop at Rutgers the first week in June) and to minimize the impact on
users.


   The dual processor software system is in the process of final
implementation and checkout by R. Schulz and B. Hasselblad.  We expect to
bring the system up for user testing by mid May, conduct a detailed
performance evaluation after the workshop in June, and complete
documentation in July.  Thus the following is only a preliminary report on
the overall design philosophy.  We will detail the system implementation
and performance in the next report.  From the start the design emphasized
treating the two machines symmetrically and has maintained the ability to
run the system either as a single or dual processor.  The processors
operate independently using common monitor code and system status data,
each scheduling jobs independently, executing system calls, etc.   The
coordination between the machines is through status information in the
data base and a set of interlocks which each machine can test and avoid
simultaneous interference in sensitive areas.  There have been many
difficult issues in constructing the system of monitor interlocks and in
debugging sections of monitor code for dual processor operation.   This
work has been greatly aided by the highly reentrant nature of the initial
TENEX monitor design.  The dual processor design has remained stable from
its initial conception and implementation is proceeding on schedule.  The
detailed design began in early February with final design and
implementation being done during late March and early April.   After
hardware installation during late April, debugging began.  We began user
access to the system on a test basis on May 16.
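As a rough illustration of the symmetric design, the sketch below uses two Python threads as stand-ins for the two processors.  Both run the same code, share one status structure, and take an interlock before touching it.  The run queue, the lock, and the names cpu0 and cpu1 are assumptions made for the example; the actual monitor interlocks are not described in this report.

```python
import threading

run_queue = list(range(100))        # shared status data: runnable job ids
queue_interlock = threading.Lock()  # interlock guarding the sensitive area
completed = {"cpu0": [], "cpu1": []}


def processor(name):
    """Each processor executes the same code and schedules independently."""
    while True:
        with queue_interlock:       # acquire the interlock before touching shared data
            if not run_queue:
                return
            job = run_queue.pop()
        completed[name].append(job)  # "run" the job outside the interlock


cpus = [threading.Thread(target=processor, args=(name,)) for name in completed]
for cpu in cpus:
    cpu.start()
for cpu in cpus:
    cpu.join()

all_jobs = sorted(completed["cpu0"] + completed["cpu1"])
```

Because the interlock serializes access to the queue, every job is taken exactly once no matter how the two threads interleave.  Starting only one thread gives the single processor case with no change to the code, which is the property the design aims to keep.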

TENEX Monitor Upgrade:

  Approximately 6 man-months of effort are being expended to upgrade
the Tenex operating system at SUMEX from version 1.31 to 1.33/1.34.
Version 1.32 was skipped because it was primarily a maintenance release
which contained no new features or capabilities that we desired or
required, although certain bug fixes and efficiency improvements were
incorporated as deemed beneficial.  We have been running version 1.31 for
some 3 years.  The major portion of work involves the incorporation of
local SUMEX features into the new version including the dual processor
changes, with the ensuing checkout and documentation phase.

   Version 1.33 has been out in the field since January, 1975 and is a
well proven and reliable system. It includes numerous bug fixes and
improvements in efficiency along with a number of new features, the most
important of which is the inception of the pie-slice scheduler.  Version
1.34 is the most current version but has not been running as long.   It has
further updates to the pie-slice scheduler, bug fixes, and a
reorganization of the source code.  A final decision about whether to go
to the proven 1.33 or immediately to 1.34 will be made before the end of
May.  We expect to have the new system up and running by late July.

   The pie-slice scheduler provides system administrators with a
mechanism for dividing user communities into groups ("pie-slices") and
establishing minimum service levels for each group.  These minimum service
levels are guarantees which are met by the system regardless of activity
in other groups.  It is possible, of course,  to observe a level of service
in excess of the guarantee.  This may happen either as a result of a group
being explicitly assigned the unclaimed share of an unrepresented group
(the so-called "windfall") or simply as a result of small system load; no
cycles are ever deliberately discarded.  This represents a radical
departure from the basically "laissez faire" 1.31 philosophy. In
particular, at SUMEX where we have three somewhat separate user
communities, a) local Stanford users, b) national AIM users, and c) SUMEX
staff, we will be able to explicitly assign relative priorities of
40-40-20 respectively for the three groups and have the Tenex scheduler
dynamically enforce them.
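The effect of guaranteed shares and the windfall can be illustrated with a small calculation.  The sketch below assumes that the unclaimed share of an idle group is divided among the active groups in proportion to their guarantees.  That proportional rule, the function name, and the group labels are assumptions for the example; in the real scheduler the windfall is assigned explicitly by the administrator.

```python
def effective_shares(guarantees, active_groups):
    """Return each active group's share of the machine, in percent.

    guarantees: mapping of group name -> guaranteed minimum percentage.
    active_groups: groups that currently have runnable jobs.
    Idle groups' shares (the windfall) are split among the active groups in
    proportion to their guarantees, so no cycles are discarded.
    """
    active = {group: guarantees[group] for group in active_groups}
    claimed = sum(active.values())
    if claimed == 0:
        return {}
    return {group: 100.0 * share / claimed for group, share in active.items()}


guarantees = {"stanford": 40, "aim": 40, "staff": 20}
all_busy = effective_shares(guarantees, ["stanford", "aim", "staff"])
staff_idle = effective_shares(guarantees, ["stanford", "aim"])
```

With all three groups busy each receives exactly its guarantee (40, 40, and 20 percent).  When the staff group is idle, its 20 percent is redistributed and the two remaining groups see 50 percent each, a level of service above the guarantee.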

   Completion of the 1.33/1.34 conversion effort has a few other
implications.  First, the dual processor software was developed as a
parallel effort, so those changes will have to be incorporated into
1.33/1.34 as well.  Second, there is a new version of the EXEC (1.53) that
goes along with 1.33 that has a number of new features and takes advantage
of the new monitor JSYS calls provided by 1.33.  Third, there will be a
reasonable documentation effort required, although most of the new
features and commands are already documented in machine readable form, and
only need to be put together in a suitable package.  Fourth, there is the
consideration of moving right on to Tenex 1.34.   This version ties the
core management functions of the operating system in more closely to the
pie-slice scheduler,  and would no doubt be beneficial in our environment.

   In any case, the target is to have 1.33 (or 1.34) up and running
long enough before September so that we can be sure we have a stable
system, and any problems that arise can be ironed out during the summer.

TENEX Executive Upgrade:

    Another area of software development is in the Executive program
which is the basic user interface to manipulate files, directories, and
devices; control job and terminal parameter settings; observe job and
system status;  and execute public and private programs.   As mentioned
above under "monitor",  significant upgrade work here was delayed pending
the decision on system augmentation since the TENEX upgrade affects the
EXEC as well.  As with all system work, we face a dilemma which is
particularly strongly felt in this area; should we run a "standard" system
or should we adapt things to user community needs and thereby tend toward
a "home-brew" system?  This is a difficult issue in that in many respects
the SUMEX community is special - it includes a broad spectrum of users
from professional computer scientists and programmers to biomedical
research scientists and clinicians.  The latter group, of course, want a
minimum impedance to using the performance programs they are interested in
while the former group wants a rich assortment of system facilities and as
much flexibility as possible.  Since most systems are designed for the
programmer community, we have adopted the viewpoint that controlled
augmentations of the system must be made to accommodate the medical user.
Much of this work is still in process and will be for some time.  The key
point of this effort is to introduce knowledge about the individual user
into the system (such as his usual defaults in using system functions, his
level of expertise coupled to on-line assistance, his domain of interest
to alert him to new information and perhaps personalized system commands
or macros convenient to his needs) so that he perceives a system tailored
to his style and conventions in using the computer.

   At this stage such information is stored and used from special files
on a per program basis as in the MSG message reading program and TV
editor.  The EXEC has built in a number of parameter and subcommand
setting commands which can be initialized by the LOGIN.CMD file.   We will
continue to devote effort in this area in up-coming work particularly to
try to design a more uniform system pathway to such user-specific data.

   Other EXEC command changes have been introduced to improve user
interactions with the system (some developed locally and others designed
by BB&N).  These include commands for setting version retention
specifications for files, purging (delete and expunge) individual files,
improved system status displays, mail checking, TENEX error number
interpretation, running programs explicitly as ephemerals (separate,
transient address space) or non-ephemerals, and a group of SET and SHOW
commands for various status conditions.  One particular feature assists in
managing the large number of user written and supported programs that are
available to the community.  To keep these programs separate from the
system-supported programs, another directory was created.  Since the EXEC
routinely searched only <SUBSYS>, <CONNECTED>, and <LOGIN> directories to
find a program, it would miss all the user supported software.  To solve
this problem and give each user added control, we implemented a search
path facility that is user settable.  This allows each user to specify
(with the SET PATH command) up to six directories (and their order) for
the EXEC to search to find a program.  The path is initialized by the EXEC
to <SUBSYS>, <USESYS>, <CONNECTED>, and <LOGIN>.
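The search path behavior can be sketched as an ordered lookup.  The example below is a minimal model, not the EXEC implementation: set_path stands in for the SET PATH command, the contents table is a hypothetical substitute for real directory lookups, and USERPROG is an invented program name.

```python
MAX_PATH_ENTRIES = 6  # limit quoted in the text
DEFAULT_PATH = ["<SUBSYS>", "<USESYS>", "<CONNECTED>", "<LOGIN>"]


def set_path(directories):
    """Validate a user-supplied search path (stand-in for SET PATH)."""
    if len(directories) > MAX_PATH_ENTRIES:
        raise ValueError("a search path holds at most six directories")
    return list(directories)


def find_program(name, path, contents):
    """Return the first directory on the path that holds the program, else None.

    contents maps a directory name to the set of program names stored there;
    it is a hypothetical stand-in for the real file system lookup.
    """
    for directory in path:
        if name in contents.get(directory, set()):
            return directory
    return None


contents = {
    "<SUBSYS>": {"SAIL", "MACRO"},
    "<USESYS>": {"USERPROG"},   # hypothetical user-supported program
    "<LOGIN>": {"MACRO"},
}
old_path = ["<SUBSYS>", "<CONNECTED>", "<LOGIN>"]  # the former fixed search order
path = set_path(DEFAULT_PATH)

missed = find_program("USERPROG", old_path, contents)
found = find_program("USERPROG", path, contents)
first_match = find_program("MACRO", path, contents)
```

Under the old fixed order the user-supported program is not found, while the default path locates it in <USESYS>.  When a name exists in more than one directory, the order of the path decides which copy runs, which is why the order is left under user control.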

   Other changes will be forthcoming with the upgrade to TENEX
1.33/1.34 and the corresponding EXEC 1.53.  These will include better ^C
handling to solve type-ahead problems, a facility to have the EXEC
periodically run a program for you (mail check, calendar check, etc.),
system status displays accommodating the pie-slice scheduler, better human
engineering in various areas, and a number of bug fixes.

   System users can find up-to-date information on EXEC features
through the EXEC manual and various on-line files:

<BULLETINS>NEW-EXEC.INFO
<DOC>TENEX-EXEC-MANUAL-UPDATE.INFO
<BULLETINS>LOGIN-CMD.BBD
<DOC>TENEX-133-EXEC.CHANGES

II.A.2.c   NETWORK COMMUNICATION FACILITIES

   A most crucial aspect of the SUMEX system is effective communication
with remote users.  In addition to the economic arguments for terminal
access, networking offers other advantages for shared computing such as
uniform user access to multiple machines and special purpose resources,
convenient file transfers for software sharing and multiple machine use,
more effective backup, co-processing between remote machines, and improved
inter-user communications.  We have based our remote communication
services on two networks - TYMNET and ARPANET.  These were the only
networks existing at the start of the project which allowed foreign host
access.  Since then, other commercial network systems (notably TELENET)
have come into existence and are growing in coverage and services.   The
two networks to which we are currently connected complement each other;
the TYMNET providing primarily terminal service with very broad
geographical coverage and unrestricted user access, and the ARPANET having
more limited access but providing a broader range of communication
services.  Together, these networks give a good view of the current
strengths and weaknesses of this approach.

   From the user's viewpoint, the reality of using a remote computer as
if it were next door depends singularly on achieving the perception that a
network connection is like a local telephone call to the computer.
Current network terminal facilities do not quite accomplish the illusion
of a local call.  Data loss is not a problem in network communications -
in fact with the more extensive error checking schemes, data integrity is
much higher than for a long distance phone link.  On the other hand,
networking has as its underlying principle that through shared community
use of telephone lines, widespread geographical coverage is possible at
substantially reduced cost.

TYMNET:

   Networks such as TYMNET are a complex interconnection of nodes and
lines spanning the country (see Figure 4 on page 19).  The primary cause
of delay in passing a message through the network is the time to transfer
a message from node to node and the scheduling of this traffic over
multiplexed lines.  This latter effect only becomes important in heavily
loaded situations; the former is always present.  Clearly from the user
viewpoint, the best situation is to have as few nodes as possible between
him and the host - this means many interconnecting lines through the
network and correspondingly higher costs for the network manager.   TENEX
in some ways emphasizes this conflict more than other time-sharing systems
because of the highly interactive nature of terminal handling (e.g.,
command and file name recognition and non-printing program commands as in
text editors or INTERLISP).  In such instances,  individual characters must
be seen by the host machine to determine the proper echo response in
contrast to other systems where only "line at a time" commands are
allowed.  We have connected SUMEX to the TYMNET in two places as shown in
Figure 4 so as to allow more direct access from different parts of the
country.  Nevertheless, based on delay time statistics collected over the
past year from our TYMSTAT program, the response times are not very
acceptable.  The aggregate data are statistically summarized in Appendix D
on page 195 and plots of the response time over the past year for
particular nodes where we have extensive data are shown in Figure 5 on
page 20.  When delay times exceed 200-300 milliseconds, the character
printing lag problems become noticeable with a full duplex, 30 char/sec
terminal.  These times have been particularly bad in New York with peak
delays approaching 3 seconds one way!  Other nodes show uniformly high
readings as well.  These data reflect the subjective comments of many of
our user groups as expressed in their individual reports (see Section
IV on page 68).  Problems have been particularly acute for Dr. Sefir's
group in New York and Dr. Amarel at Rutgers (see page 131).

   We have had numerous meetings with TYMNET personnel to try to ease
these problems and have instituted reroutings of the lines connecting
SUMEX-AIM to the network.  Also local lines to more strategic terminal
nodes have been considered for users in areas poorly served by the
existing line layout.  These remedies have not had substantial effects.
In general the TYMNET design goals are not to provide much better service.
To quote from the April 1, 1976 TYMNET User's Group Newsletter:

"Current delay experience is from 0.25 to 3 seconds
for a character to make a round trip through the
network, with an average of 1.2 seconds.  By early
next year,  "Clusters" will be installed in high
density areas and will be interconnected by 9600 BPS
lines.  The result will be that round trip
response/delay time will be less than 1 second for 80%
of the cases and less than 2 seconds for 98% of the
cases.  This also is the design objective for TYMNET
II."

   We will continue to pursue improvements in TYMNET response but user
terminal interactions such as used in TENEX programs are not realized in
the time-sharing systems offered by most other TYMNET users and hence are
not supported well by TYMNET. With these delays, it is not clear how well
the proposed 1200 baud service they are going to inaugurate will work.
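To put the quoted delays in terminal terms, the short calculation below converts a delay into the number of character times that elapse at 30 char/sec before an echo can appear.  The delay values are the ones quoted in the text and in the newsletter; the function name is an assumption for the example.

```python
CHARS_PER_SECOND = 30  # terminal speed quoted in the text


def character_times(delay_ms):
    """Character times that elapse at 30 char/sec during a delay of delay_ms."""
    return delay_ms * CHARS_PER_SECOND / 1000


delays_ms = {
    "lag becomes noticeable (about 250 ms)": 250,
    "newsletter average round trip": 1200,
    "newsletter worst case round trip": 3000,
}
for label, delay_ms in delays_ms.items():
    print(f"{label}: {character_times(delay_ms):g} character times")
```

At the newsletter's average round trip of 1.2 seconds, the terminal could have printed 36 characters in the time it takes a single typed character to be echoed.  This is why character-at-a-time interaction suffers far more than line-at-a-time systems.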

ARPANET:

   The ARPANET, while designed for more general information transfer
than purely terminal handling, has similar bottleneck problems in its
topology (see the current geographical and logical maps of the ARPANET in
Figure 6 and Figure 7 on page 21).  These are reduced by the use of
relatively higher speed interconnection lines (50 K baud instead of 2400 -
9600 baud lines as in TYMNET) but response delays through many nodes
become objectionable eventually as well.

   We are enforcing a policy to restrict the use of the ARPANET to
users who have affiliations with ARPA-supported contractors and
system/software interchange with cooperating TENEX sites.   The
administration of the network passed from the ARPA Information Processing
Techniques Office to the Defense Communications Agency as of July 1975.
At that time policies were announced restricting access to DOD-affiliated
users.  We have protected the facilities for calling from SUMEX out to
other sites on the ARPANET to authorized users.  This also protects the
SUMEX-AIM machine from acting as an expensive terminal handler for other
machines - this function is better fulfilled by dedicated terminal
handling machines (TIPs).  In general, we have developed excellent working
relationships with other sites on the ARPANET for system backup and
software interchange - such day-to-day working interactions with remote
facilities would not be possible without the integrated file transfer,
communication, and terminal handling capabilities unique to the ARPANET.

   We take very seriously the responsibility to provide effective
communication capabilities to SUMEX-AIM users and are continuously looking
for ways to improve our existing facilities as well as investigate
alternatives becoming available.  We are investigating the TELENET
facilities that have been rapidly expanding this past year.   BB&N has
hooked one of their TENEX systems up to TELENET and subjective reports are
that response problems similar to those reported above were present there
as well.  We have requested specific data on their experience but have not
received any yet.  We have received comments particularly from Professor
Colby's group which uses the ARPANET primarily (see page 116) that serious
network delays place the remote user at a substantial disadvantage in
competing for system resources and that compensating biases in allocation
procedures should be implemented to offset the problems.   Another critical
problem is the lack of high speed printing facilities local to remote
groups.  The new system being installed should help assure remote users
their fair share of CPU; but a simple bias in system percentage will not
offset network delay problems.  The communication problems must be solved
as communications problems and the only way to ensure good terminal
response is to provide high enough speed lines that are not overloaded.


Figure 4.  TYMNET Network Map
[Map not reproduced]


Figure 5.  TYMNET RESPONSE DELAY STATISTICS
[Plots not reproduced.  Panels: St. Louis, MO. (Node 1043); New York (Node
1034); Washington, D.C. (Node 1022).  Vertical axis ticks: 0 to 1000.
Horizontal axis: Jun 1975 through Apr 1976.  Dashed curve: 09:00-17:00
(Pacific Time)]


Figure 6.  ARPANET GEOGRAPHIC MAP, FEBRUARY 1976
[Map not reproduced.  Legend: IMP, TIP, PLURIBUS IMP, SATELLITE CIRCUIT.
(NOTE: THIS MAP DOES NOT SHOW ARPA'S EXPERIMENTAL SATELLITE CONNECTIONS)]


Figure 7.  ARPANET LOGICAL MAP, FEBRUARY 1976
[Map not reproduced.  Legend: IMP, TIP, PLURIBUS IMP, SATELLITE CIRCUIT.
(PLEASE NOTE THAT WHILE THIS MAP SHOWS THE HOST POPULATION OF THE NETWORK
ACCORDING TO THE BEST INFORMATION OBTAINABLE, NO CLAIM CAN BE MADE FOR ITS
ACCURACY)]


II.A.2.d    SYSTEM RELIABILITY AND BACKUP

   System reliability has improved over the past year.  It has been
excellent under stable hardware and software conditions but has degraded
during debugging and development periods (drum debugging, dual processor
work, etc.) and during periods of hardware problems.  The pertinent data
are given below with indications of periods during which development took
place.

SUMEX-AIM CRASH FREQUENCY (crashes/month)
AND DOWN-TIME DATA (hours/month)

               1975                                    1976
Crash Type      MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC  JAN  FEB  MAR  APR

DEC HARDWARE  10 7   22 26    8   10    4   16 9    6 16
SOFTWARE          4    3    6    6    6    0    3    5    1    3    2    5
ENVIRONMENT       0    0    1    0    0    0    1    0    1    1    2    1
TYMNET HDWRE      0    0    0    0    1    0    0    0    0    1    1    0
UNKNOWN           1    0    1    0    0    0    1    0    1    1    0    0

DOWN-TIME
SCHEDULED        80   79   98  123   72   52   52   43   41   48   38   67
UNSCHED          28   19   30   42    7    4    3   21   26   11    2    7

DEFINITIONS:

Crash = Any occasion on which an operational system must be restarted
      or reloaded.  Multiple crashes while trying to reload are not
     counted unless the system comes up fully between crashes.

DEC Hardware Crash = Any crash caused by a failure in the PDP-10
    hardware or peripheral equipment (CPU, disk, drum, etc.)

Software Crash = Any crash caused by a malfunction within the TENEX
     software system.

Environmental Crash = Any crash caused by power failure, air
     conditioning outage, lightning, etc.

TYMNET Hardware Crash = Any crash caused by the TYMNET hardware or the
     interface to the PDP-10.  This includes only the times when a
     TYMNET problem causes the PDP-10 to crash and not the times
     when the TYMNET goes down and the PDP-10 continues in
     operation.

Unknown Crash = All other crashes in which the cause is not assignable.


Scheduled Down-time = Preventive maintenance time (6-8 hours/week),
    file system backup (3-6 hours/week), scheduled maintenance to
     repair non-critical component failures, and system development
     activities requiring a stand-alone machine.

Unscheduled Down-time = Time lost because of unexpected hardware or
      software failure.  For the most part this is the time to
     diagnose and either repair the problem or to reconfigure the
     system and bring it up to run in a somewhat degraded mode until
     a later scheduled shutdown for permanent repair.
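The down-time rows of the table can be totaled to give a rough availability figure for the year.  The sketch below assumes that the twelve values in each row correspond, in order, to May 1975 through April 1976.  The availability fraction is derived here for illustration and is not a figure stated in this report.

```python
import calendar

# (year, month) pairs for May 1975 through April 1976
MONTHS = [(1975, m) for m in range(5, 13)] + [(1976, m) for m in range(1, 5)]
SCHEDULED = [80, 79, 98, 123, 72, 52, 52, 43, 41, 48, 38, 67]  # hours/month
UNSCHEDULED = [28, 19, 30, 42, 7, 4, 3, 21, 26, 11, 2, 7]      # hours/month

# calendar.monthrange returns (weekday of first day, number of days in month)
hours_in_period = sum(24 * calendar.monthrange(y, m)[1] for y, m in MONTHS)
total_scheduled = sum(SCHEDULED)
total_unscheduled = sum(UNSCHEDULED)
available_fraction = 1 - (total_scheduled + total_unscheduled) / hours_in_period

print(f"scheduled down-time:   {total_scheduled} hours")
print(f"unscheduled down-time: {total_unscheduled} hours")
print(f"fraction of period available: {available_fraction:.3f}")
```

On these assumptions scheduled down-time (793 hours) is about four times the unscheduled down-time (200 hours), so most of the time the machine was unavailable was planned maintenance, backup, and development work.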

   Whenever development efforts are undertaken which affect the system
hardware or monitor, additional downtime and some period of unreliability
may result causing more crashes than are representative of the overall
reliability of the system.  The following gives some insight into these
development efforts as reflected in the above data:

Jul - Sep 1975: Debug drum system error rate problem.

Late Apr 1976: Begin dual processor installation.

   As can be seen, we have had some periods of hardware unreliability
stemming mostly from intermittent problems.  Particularly troublesome
components of the system in terms of such problems have been the disk
drives, memories, and during hardware relocations, the inter-device
cabling.  The KI-10 CPU has been very stable and given only one problem
over the past year (an I/O bus driver).

   From the user's viewpoint, besides the obvious inconvenience of not
being able to work during down time, crashes threaten the fragile, highly
interlinked TENEX file system.  This past year, however, we have had to
back up to a previous file system state on only a few occasions.  We save
changed files daily and copy the entire file system to fresh disk packs
weekly.  Thus an unexpected crash may cause the loss of up to one day's
worth of work - it may in fact take longer for a given user to reconstruct
the lost work if complex debugging or development changes were involved
and undocumented.  When the system is known to be subject to intermittent
crashes, we back up more often to protect users.
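The daily and weekly saves differ only in which files they select.  The sketch below models that selection; the function, the time units, and the file names are hypothetical and do not describe the actual backup utilities.

```python
def files_to_save(modified_at, last_save, full_copy=False):
    """Pick the files for one backup run.

    modified_at maps file name -> last modification time (any comparable
    unit); last_save is the time of the previous run.  A full copy takes
    every file; a daily save takes only files changed since the last run.
    """
    if full_copy:
        return sorted(modified_at)
    return sorted(name for name, t in modified_at.items() if t > last_save)


# Times are hours since an arbitrary start; all file names are hypothetical.
modified_at = {"<DOC>NOTES.TXT": 5, "<SRC>PROG.SAI": 30, "<SRC>OLD.SAI": 2}
daily = files_to_save(modified_at, last_save=24)
weekly = files_to_save(modified_at, last_save=24, full_copy=True)
```

Only the file changed since the previous save appears in the daily run, so anything modified after that run and before a crash is what a user stands to lose.  This is the source of the "up to one day" bound, and saving more often shrinks that window.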

   Our current schedule for system backup is early Sunday morning
(Pacific Time).  We now have two students who do the file system backups
at night as well as the archive/retrieve requests.  By moving these
activities to night hours, we off-load them from the prime time and also
provide added coverage for quick recovery from any system crashes.   This
does not require full time attention and the students also help out with
system programming tasks in developing utilities.


   Another aspect of reliability and backup is the need to assure
computing service for critical demonstrations, lectures, and the like. We
have a good mutual relationship with existing ARPANET TENEX sites for such
backup when needed (e.g., for the AIM workshop).

II.A.2.e    PROGRAMMING LANGUAGES

   Over the past year we or members of the SUMEX-AIM community have
continued to maintain the major languages on the system at current release
levels, have TENEXized several languages to improve efficiency, and have
investigated a number of issues related to the efficiency of programs
written in various LISP implementations and the exportability of programs.
These issues are becoming increasingly critical in dealing with AI
performance programs which have reached a level of maturity so that
substantial, non-developmental user communities are growing.  The
following summarizes general accomplishments and the following section
discusses in detail the work this past year in designing a machine-
independent SAIL system (MAINSAIL).

General Language Support:

   The ALGOL-like modeling language, SIMULA, was requested by the
DENDRAL group for consideration as a language in which to implement a more
efficient version of the chemical structure generation programs.   The most
recent release of SIMULA has been brought up on the system.   It is also
used by a number of the Rutgers project members.

   Two existing programming languages were TENEXized by Mr. Tom Wolpert
of IMSSS.  TNXFAIL is now the official version of FAIL for TENEX sites.
His code has been incorporated under compilation switch into the standard
FAIL sources maintained at the Stanford Artificial Intelligence Laboratory
(SU-AI).  Mr. Wolpert also TENEXized UC Irvine - LISP (ILISP) which is an
extension of LISP 1.6 to include the break package and editor facilities
of INTERLISP circa 1971.  ILISP is used extensively by Prof. Colby's group
at SUMEX.

   The latest DEC release of FORTRAN10 was installed late last year and
is relatively stable although several bug fixes have since been made. As
part of an effort to remain current with all new DEC releases, we have
also updated the versions of: MACRO, BLIS10, and BASIC.

   Two other languages which received active maintenance at SUMEX this
year are INTERLISP and SAIL.  New versions of INTERLISP are continually
being issued by XEROX-PARC and are brought up on SUMEX by Mr. Larry
Masinter (of Xerox) and Ms. Suzanne Johnson.  Because of the large number
of LISP programs that are written in various versions of INTERLISP (which
are not necessarily compatible with the new releases) and the need to keep
many of these programs running for the growing community of collaborative
users, we have implemented an orderly scheme of introducing the new versions
slowly.  Old versions are removed only when there are no longer any
SYSOUT's on them.  At the same time we actively encourage users to keep
their programs up-to-date to minimize the maintenance problems with LISP
versions no longer supported.

    TENEX-SAIL is maintained by Dr.  Robert Smith of IMSSS/SUMEX and is
exported through SUMEX so excellent service is locally available.   A PRINT
feature was added to SAIL this year in response to user suggestions as
well as a number of new runtime routines.  This year the RECORD data
structure became available in SAIL and an effort has been made to
familiarize users with this structure which is well-suited to AI
applications.  A collection of utilities for SAIL programming has also
been gathered at SUMEX and introduced to the SAIL users.

   A very important new part of the SAIL system is the BAIL interactive
debugger which was written by Mr.  John Reiser of SU-AI but in close
consultation with SUMEX/IMSSS using the TENEX facilities to test the
BAIL/TENEX interaction.  BAIL allows users to interactively examine and
change the contents of previously-defined variables and to enter SAIL
statements using a subset of the language.  Mr. Reiser offered a BAIL
class at Stanford to introduce his system to local users.

LISP Efficiency:

    There has been an on-going debate this past year between advocates
of INTERLISP and ILISP over the relative efficiencies of the two languages
and the level of assistance the language systems provide the user in
developing programs.  These issues are important because they influence
the time required to develop new AI programs and subsequently the
incremental load placed on the SUMEX machine when in use.   A number of
people have contributed to an evaluation of these two LISP's including Dr.
R. Smith (IMSSS), Dr. Tom Wolpert (formerly of IMSSS), Mr. Larry Masinter
(Xerox PARC), and Mr. Larry Fagan (MYCIN project - formerly USC-ISI). The
tests were based on an implementation of a subset of REDUCE (a symbolic
algebra manipulator).  The results of several iterations in program
refinement by experts in the respective languages were that the runtimes
for the two versions were quite comparable (far less than the factor of
5-10 disparity predicted by ILISP enthusiasts).  A more disquieting result
was the substantial difference in runtimes depending on how particular
functions were coded IN THE SAME LANGUAGE.  It is apparent from the
results that factors of 10 differences in time can result from a
superficial implementation - expert programming insight is essential to
efficient program performance.  This is not a real surprise in that it is
true of programming in any language - the problems may be increased by
such a rich language as INTERLISP with such a wide array of ways to do the
same thing but with little guidance as to the relative costs.   It has
proven very difficult to quantify the "rules" for good programming. Mr.
Masinter and Mr. Phil Jackson attempted to document good INTERLISP
programming habits and issued a bulletin for SUMEX users.


   A further impact of these data is that it is very difficult to
simultaneously develop a new AI program and make the implementation highly
efficient.  With the iterations required to develop the conceptual design
of the program, it is difficult to ensure its efficiency.  This may lead
to the need to reimplement the program after the basic development
stabilizes to increase efficiency while still accommodating convenient and
orderly further development.  Such reimplementation may or may not be best
done in LISP - this will depend on many factors including the nature of
the program data structure requirements and anticipated further
development efforts.

II.A.2.f    MAINSAIL OVERVIEW

   Another aspect of SUMEX's role in encouraging community software
sharing which has received substantial attention this past year is the set
of problems involved in software exportation.  The following is a general
description of the on-going development of a machine-independent
programming system (MAINSAIL) by Mr.  Clark Wilcox of our staff.  A more
detailed description of the language elements can be found in Appendix E
on page 208.

   The MAINSAIL programming system (referred to as SAILEX in the 1975
annual report) has undergone extensive development during the past year.
The considerable interest expressed to date from across the country
indicates that MAINSAIL could be a significant step towards the
distribution of portable software (programs which can be executed, without
alteration, on a variety of computer systems).  In response, SUMEX is
pursuing plans to make MAINSAIL available to other sites, and to promote
the exchange of programs within a diverse computer community.  This type
of language and program sharing is now made more difficult by the
incompatibilities among the various implementations of current languages.
MAINSAIL embodies a unified approach which presents to the user the same
programming system, regardless of what computer or operating system
supports its execution.

   SUMEX, in its role as a nationally shared computer resource, is an
appropriate vehicle for the development of high-quality software unbound
by the underlying machine environment.  We have a built-in community of
program developers acutely aware of the significance of providing their
work to a broader base of users.  This intersection of hardware
capability, software expertise, and dedication to resource sharing
presents a unique opportunity to promote a system designed for program
sharing.

   MAINSAIL is being developed for two closely-related reasons: as a
general-purpose programming system,  and as a tool for research into the
design of a machine-independent programming system.   MAINSAIL will be
fully implemented and actively used on a number of machines.   It is
perhaps one of the most highly-developed languages available for the
mini-computer environment.  Its machine-independent design allows it to be used
for the development of programs on one computer which will be executed on
another.  This capability will be of increasing importance as smaller
dedicated computers are used in conjunction with larger general-purpose
computers, and as programs are more readily exchanged over computer
networks.

   The MAINSAIL language is derived from SAIL, a programming language
developed at Stanford University's Artificial Intelligence Laboratory. It
is not compatible with SAIL, since SAIL was designed for a PDP-10 with
TOPS-10, and hence contains machine-dependencies.  However, it has retained
the basic attributes of SAIL as an extended ALGOL-like language.  Among
MAINSAIL's language features are:

machine-independent language design
straightforward syntax and semantics
efficient code generation for a variety of machines
separately compiled segments
double precision integer and real
bit and address manipulation
variable-length strings
dynamic and static records
generic procedures
default and repeatable arguments
static initialization
in-line assembly language
macro facility
compile-time evaluation
conditional compilation
multiple source files during compilation
sequential and random i/o
terminal interaction
comprehensive system procedures
mathematical library
access to dynamic storage allocation
access to runtime system
"garbage collection" of strings and records.

   MAINSAIL is designed as a programming system rather than just a
programming language.  It is presently composed of a compiler generator, a
compiler, and a runtime system.  Further components envisioned are a
debugger, a code optimizer,  and a text editor.  All of these components
are to be written in MAINSAIL, and hence made fully portable.  Also, there
are plans for extensions to the MAINSAIL language, such as:

coroutines
extensible data types
extensible code generation
list processing system (LEAP).

   In its role as a research tool, MAINSAIL is being used for an
examination of the following design issues:

language design --- what syntactic and semantic constraints are
             imposed by portability.

compiler design --- how can a single language be compiled into
             efficient code for a large number of computers.

runtime design  --- to what extent can the runtime system be made
             transportable.

computer design --- what architectures best support a high-level
             language implementation.

program design  --- what role can portability play in the design of
                reliable software.

   Since the PDP-10 and PDP-11 implementations will be the first to
become available, this user community has been introduced to MAINSAIL at
several meetings of DECUS, the Digital Equipment Corporation Users
Society.  MAINSAIL was first described in a talk delivered at the DECUS
DECsystem10 Fall '75 symposium in Los Angeles.  It was then featured in a
session entitled "Languages for Portability", at the DECUS DECsystem10
Spring '76 symposium in Hyannis Port.  A paper will be presented at the
DECUS mini/midi Spring '76 symposium in Atlanta.  These sessions are
resulting in an almost continual stream of inquiries concerning MAINSAIL.

   Before MAINSAIL is exported to other sites, it will be thoroughly
tested on several local computer systems.  It is now being used on a PDP-10
with TENEX and a PDP-11 with RT-11.  Implementations for a PDP-10 with
TOPS-10 and an IBM-370 with ORVYL (Stanford operating system) are now
under development.  Code has also been generated for an INTERDATA- and
a Data General NOVA, both mini-computers.  There is interest in using
MAINSAIL on a PDP-11 with operating systems such as UNIX and RSX-11; a
DECsystem20; and the NIH-GPP, a parallel processor being constructed for
the NIH Image Processing Unit.  MAINSAIL will be made available on such
machines as sufficient funding is obtained to support an expanded effort.

   A number of projects are interested in using MAINSAIL for the
development of portable software.  Among these are:

robotics project
mass spectrometry system
computer-aided-instruction system for the teaching of logic
automated cell classification laboratory
machine-independent version of (a subset of) INTERLISP
display-oriented text editor (TV-EDIT)

   Several INTERLISP programs are being considered for translation into
MAINSAIL.  These programs have developed to the point that they no longer
require the very general environment supported by INTERLISP, and hence can
avoid the related inefficiencies.  Also, there is interest in providing
such programs to a wider community with systems other than a PDP-10 with
TENEX.


    MAINSAIL can furnish the capabilities necessary to develop generally
useful software, and to distribute that software to a wider community than
any existing language can now reach.  To reach that goal, several issues
must be resolved.

    First, it must be demonstrated that MAINSAIL can be efficiently
implemented on a wide variety of computer systems.   The creation of new
implementations must be largely automated, with the machine-dependent
aspects requiring at most a few man-months for development.   Some
plausible target systems are:

PDP-10      (TENEX, TOPS-10)
PDP-11      (RT-11, RSX-11, RSTS, UNIX)
DECsystem20
IBM-360/370   (ORVYL, TSS, TSO)
NOVA/ECLIPSE

additional large machines: CDC, UNIVAC, . . .
additional small machines: INTERDATA, TI990, MICRODATA, HP-3000, . . .

   Second, a suitable means of distributing the MAINSAIL programming
system, along with portable software written in MAINSAIL, must be
determined which is consistent with resources available to SUMEX. Steps
must be taken to insure compatibility among the various sites, and to
minimize maintenance of the machine-independent parts.  The development
and distribution of MAINSAIL is viewed as a time-consuming process fraught
with complications.  SUMEX must be careful not to make promises which
cannot be kept.

   Finally, program developers must be motivated to design machine-
independent systems, to promote their distribution, and to use portable
software developed at other sites.  Machine-dependencies must be
eliminated, or at least isolated and documented.  Program design must
face, from the outset, the restrictions imposed by portability.

II.A.2.g    OPERATIONS AND USER SOFTWARE

Operations Programs:

    The programs which assist system operations and management have been
effectively organized this past year.  A catalog has been made of
approximately 40 operator programs which had been maintained by various
staff members to facilitate their particular tasks and passed along to
other staff members informally only.  Some of the purposes of the new
operator utilities include: setting the GUEST password and the access of
guests to other user programs (only restricted access is allowed for
guests); providing summary information on directories and groups;
exercising various hardware devices; performing error analysis for system
crashes; changing system downtime information; creating new directories
and setting the appropriate parameters for each user; sending important
notifications to logged in terminals; watching terminal activity; and
handling special resource allocation situations such as project
demonstrations.  Many operator tasks have also been automated as
"autojobs" or batch jobs which run on the system performing tasks
continuously or on predefined time schedules.  In addition to this
collection of basic operator utility programs, software has also been
improved for system-wide statistics gathering, accounting (including
information on diurnal community loading and service), disk allocation
checking and enforcement, and system backup onto tape for file protection
(any file in existence for 24 hours is now guaranteed retrievable for a
two-month period).  A new spooler is currently being written for the
lineprinter to provide faster handling and more control of the printer
queue.  Another development effort is in the area of collecting
and organizing statistics about subsystem use.  Such data can help in
planning where to concentrate development effort, determining which users
to notify about new program information, and getting users and user groups
with similar interests together.

User Software:

   As the user community has stabilized, certain trends in user needs
have been noted and utility software has been collected and written in
those areas.  Particular attention was also paid this year to making the
user interfaces for all existing utilities as consistent as possible and
many programs received minor cosmetic modifications.

STATISTICS PACKAGES

   Three new utility packages were added: STP (STATPACK Statistical
Package) and BANK (a data management package), both from Western Michigan
University; and SPSS (Statistical Package for the Social Sciences),
converted by the University of Pittsburgh for PDP-10 use (and obtained
from them under contract).

DEC SOFTWARE

   There is a new DEC policy to issue maintenance releases at least
every two years for all software.  This has resulted in a large volume of
new DEC releases over the past year.  Essentially we receive DEC software
under three separate arrangements.  First, FORTRAN10 and LINK10 were
purchased under contract.  Second, we have subscribed on an annual basis
to the DEC distribution tapes, from which this year we put up new versions
of RUNOFF, PIP, BASIC, BLIS10, CCL, FILEX, CREF, CHANGE (known at SUMEX as
DCHANGE due to name conflict), FLECS, GLOB, MTCOPY, MACRO, TECO, and
LOADER.  A variety of bug fixes were done on these programs and sent back
to DEC.  Also, the SCAN program was modified for handling TENEX-style
directory names for these programs.  For most of the programs, after some
practice and with the help of a TECO routine to automatically check the
stored SRCCOM record of local modifications to the old versions, it became
a one or two day effort to get each program up.   The third source of DEC
software is from the DECUS tapes which are ordered individually with a
minimal per tape charge.  We order all programs from these tapes which
appear to be possibly useful.  The tapes are then stored until a user
request is actually received for a given program.  The only such request
to date has been for SIMULA.

PA1050 EMULATOR

    All of the above official DEC software plus an assortment of other
programs imported from TOPS10 sites can only be run through the use of a
TOPS10 emulator - a program which converts all TOPS-10 system calls
(UUO's) into "equivalent" JSYS TENEX calls. The PA1050 emulator is a
standard part of TENEX software.  However,  it is badly out-of-date and
inadequate for many complicated programming situations.   SUMEX has
developed its own version of PA1050 (incorporating features of other local
versions wherever possible).  The SUMEX PA1050 has now received reasonably
wide distribution to other TENEX sites.  Modifications to PA1050 this year
included:  conditional code for SRI and IMSSS use of PA1050, a change in
the standard altmode character, additions to the terminal input/output
routines, a provision for running FORTRAN jobs detached, and a number of
other changes to accommodate the new FORTRAN10.  Also added were
capabilities to assign device names to common directories to facilitate
easy access, additions required by the plotter, debugging of buffer
synchronization, improvements in the pseudo-interrupt (PSI) handling, and
other assorted bug fixes.

EDITORS

    A variety of editors are offered so that users will have a choice
and to make available any editor that a user may be accustomed to from
another PDP-10 system.  In general, only SOS or TECO for hardcopy terminals
and TV for DataMedia display terminals are widely used.  We have the TENEX
version of TECO and also added the DEC TECO this year.   The transition
from our local SOS to UTAH-10 SOS is proceeding smoothly and should be
completed this month.  Error reports and suggestions for new features were
solicited from all SOS users and passed along to KKAY@UTAH-10. Similarly,
we have made a recent effort to determine the SUMEX community priorities
for development of TV; and we are currently conferring with Mr. Pentti
Kanerva of IMSSS regarding the future direction of TV development.

BATCH PROCESSING

   With the increasing system load,  it is advantageous to both the user
and the system to perform as much work in non-peak hours as possible.   The
BATCH facility allows submission of non-interactive jobs to be spooled for
later automatic running.  BATCH required extensive debugging and
modification this year but is now fully operational.  Users are being
strongly encouraged to consider this option.


MESSAGE PROGRAMS

   Bug fixing and streamlining of the SNDMSG (mail sending facility)
has continued.  We have also adopted a new standard mail reading facility
called MSG which is authored/maintained by Mr. John Vittal of the
Information Sciences Institute (ISIB).  Extensive communication with Mr.
Vittal in the developmental stages has led to a product which serves the
SUMEX community needs very well and runs with no problems after local
incompatibilities were diagnosed and accommodated in the MSG program.

SEARCH PROGRAMS

   A new fast substring search program, XSEARCH, has been written in
SAIL by Mr. Scott Daniels of IMSSS/SUMEX.  This has been highly successful
as a stand-alone search program.  It can look through the entire <DOC>
directory in about a minute of CPU time.  The core code of XSEARCH has
also been worked into two library packages:  USEARCH which stresses
flexible printing of the context of the found string (and is used as the
basis for the new HELP program) and PSEARCH which uses TENEX PMAP'ing to
increase the search speed even more.  A similar very fast algorithm has
been incorporated in the WHOIS program by Dr. Lederberg.   WHOIS is an
often-used program for searching a file containing user names, addresses,
and affiliations and the speed-up from the better search algorithm has
been a major improvement.
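
   The report does not state which algorithm XSEARCH or WHOIS uses.  Purely
as an illustration of how a substring search can avoid examining every
character position, the following Python sketch uses the Boyer-Moore-Horspool
technique, which is an assumption here and not a description of XSEARCH.  The
function name horspool_find is hypothetical.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Illustrative Boyer-Moore-Horspool search; not the XSEARCH algorithm.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # For each character of the pattern except the last, record how far the
    # pattern may slide when that character is aligned with the pattern's end.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Characters that never occur in the pattern allow a full-length skip.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

   Long patterns let the search skip up to a full pattern length at a time,
which is one plausible source of the kind of speed-up described above.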

OMNIGRAPH

   SUMEX has added a CALCOMP plotter, a Tektronix terminal, and a GT40
light-pen facility to the locally available hardware.   The software for
the Tektronix was already available in the OMNIGRAPH package from NIH.
SUMEX did write a local demonstration program for the cross-hair feature
of the Tektronix.  The CALCOMP plotter was also available with OMNIGRAPH;
but due to differences both in the facilities offered with the particular
terminal and in the TOPS10 and TENEX environments, a large scale three-
fold programming effort was necessary to get our CALCOMP running.  First,
a spooler is being written for the plotter (independent of OMNIGRAPH).
Second, the PLOTX program of OMNIGRAPH has been debugged and extended for
local use: online access is allowed, plot titles are used, x and y may be
stretched independently, a clipping routine is added, and the SAIL record
data structure is used so that the limit on number of plots is removed.
Third, a general CALCOMP control load module was written which currently
is used to drive PLOTX but could also be used as a general plotting
facility to form the basis of other plotting packages.   This module adds
more extensions to the plotting capabilities:  string prints and line plots
with edge checking and arbitrary specification of character codes.

    A light pen facility has been added to the OMNIGRAPH code for the
GT40 by Mr. Frank Wingate of our staff.  This work was done in conjunction
with NIH and the original design was followed with all code being
incorporated back into the NIH master source files for OMNIGRAPH. A
demonstration program was also written for this new feature.


DISPLAY TERMINAL SUPPORT

    An increasing number of users have access to Datamedia display
terminals.  In addition to the TV editor, a variety of programs have been
written specifically for these terminals.  EDIR places a representation of
the user's directory on the screen and allows him/her to point the cursor
at a file and give a command letter to indicate desired actions such as
Archive, Delete, Undelete, Type (which clears and later refreshes the
screen), List, etc.  The screen is updated to reflect the current state of
each file and a running total of active and deleted file pages is
displayed at the top of the screen.

   SYSMON displays system loading statistics on the CRT screen and
updates the display at regular intervals to reflect changes.  Included in
the display is a ranking of active users according to CPU time used and
statistics on i/o use, size of the balance set, idle time, load average,
etc.

    SCROLL is an editor for creating and storing pictures which can then
be reprinted on the screen at any time under program control.   The editor
facility is especially designed for moving freely among the parts of a
picture and is particularly suited for flow-chart drawing.  The ability to
call pictures which are stored under a given name is very useful in CAI
work and other demonstrations where predefined displays can be used as
illustrations.

RECORD

   The RECORD program written in SAIL by Dr. Robert Smith has become
very popular at SUMEX this year and also has been modified to meet the
needs of the DENDRAL group for dealing with their GUEST users.   RECORD was
designed to use the pseudo-teletype facilities for making a file
typescript of a terminal session.   It has been extended so that the PTY
job, once started, can run independently freeing the physical terminal for
other work.  In the new DENDRAL version, a guest user logging-in on a
special directory is automatically interfaced to RECORD to make a
typescript of the session (with the user's knowledge).   This entire
operation is transparent to the user and facilitates later analysis of the
session by the DENDRAL staff to learn where the programs need improvement.

PUB

   PUB (written in SAIL by Mr. Larry Tesler formerly of SU-AI and
currently at XEROX-PARC) is a very powerful documentation preparation
language.  It is more difficult to use than DEC's RUNOFF but also is more
flexible.  Last year SUMEX produced a substantially improved set of macros
that make use of PUB simpler.  This was accompanied by a new manual and a
series of well-received PUB introductory classes.   New extensions to the
PUB macro facility this year include: automatic bibliographic entries from
a library with flexible cross-reference printing formats, automatic
queuing and placement of tables and figures, multi-column handling, and a
marginal notation feature allowing revised or otherwise emphasized text to
be marked.  Requests for the SUMEX macro package for PUB have been
received from NIH, SRI, AMES, ISI, and IMSSS.

DIABLO

   This program is a driver for terminals with a DIABLO or DIABLO-like
print mechanism which are used to produce high quality typed output.  The
DIABLO program is specifically designed for printing PUB output files and
it will handle PUB underlining and half-line printing.  DIABLO now
supports GEN COM as well as DTC terminals.  The PUB/DIABLO combination has
been upgraded to print a form of proportional spacing which consists of
evening the spaces between words when justifying to the right margin.
This produced a significant improvement in the appearance of the output.
Plans are also on-going to include a hyphenation algorithm in PUB which
will be another significant improvement.
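
   The idea of evening the spaces between words when justifying can be
sketched as follows.  This is a simplified Python illustration that works in
whole character cells; the actual PUB/DIABLO implementation is not shown in
this report and may well space in finer units.  The function name justify is
hypothetical.

```python
def justify(words: list[str], width: int) -> str:
    """Pad the gaps between words so the line is exactly `width` characters."""
    if len(words) <= 1:
        # Nothing to even out: pad on the right only.
        return "".join(words).ljust(width)
    gaps = len(words) - 1
    total_spaces = width - sum(len(w) for w in words)
    if total_spaces < gaps:
        raise ValueError("words do not fit in the requested width")
    # Every gap gets `base` spaces; the first `extra` gaps get one more.
    base, extra = divmod(total_spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)
```

   Under this scheme no two gaps on a line differ by more than one space, so
the padding needed to reach the right margin is spread across the line rather
than concentrated in one place.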

TAPE PROGRAMS

   Foreign tapes continue to cause time-consuming problems in
transferring data to our machine.  New tape documentation has been written
which explains the tape situation to users and makes recommendations about
which of the various tape reading programs to use for various format
tapes.  Two new tape programs have also been added.  MTCOPY from DEC has
the ability to handle multiple tapes in a single command.  The SUPXEC
program by Mr. Ron Roberts of IMSSS was designed to operate like an EXEC
for tapes with complete Directory, Copy, and Delete commands available.

   A large variety of smaller utilities programs have also been added
to the SUMEX catalog.  With the file system operating near capacity, one
emphasis this year has been on file management programs.  Utilities are
now available:  to plot the age of a user's files, to allow
deletion/archiving of all files older than a cut-off date, to find all
files newer than n hours, to clean up wasted file space in text files, to
find files with multiple versions, to rename files, to copy selected
portions of files together, to view the actual character codes in a file,
to check the file-descriptor-block (FDB) for a file, to find all new
public files, to find the exact location of a file, to recognize partial
filenames, to print files in reverse order, to provide a handle on files
too large to edit conveniently, to encrypt text files, to convert the
character case of files, etc.
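
   As a rough modern analogue of the cut-off-date utilities listed above, the
following Python sketch selects the files in a directory that have not been
modified within a given number of days.  It assumes a present-day file system
with modification timestamps rather than the TENEX file-descriptor-block, and
the name files_older_than is hypothetical.

```python
import time
from pathlib import Path


def files_older_than(directory, cutoff_days, now=None):
    """Return sorted paths of regular files not modified in the last cutoff_days days."""
    now = time.time() if now is None else now
    cutoff = now - cutoff_days * 86400  # 86400 seconds per day
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

   The resulting list could then be passed to whatever archive or delete step
a site prefers; the sketch deliberately performs no deletion itself.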

  Another area of utilities development has been programs to manage
personal calendars or to serve as on-line reminder systems.  And, of
course, an area of continuing development is informational utilities.  The
WHOIS program to give name/address/phone number and project affiliation
information on users may well be the most-often used of these utility
programs.

II.A.2.h    USER ASSISTANCE AND CONSULTING

   User consulting continues to play a key role at SUMEX.   Because of
the geographic distribution of our users where they may have little or no
direct contact with computer experts and the nature of the user community
in which many non-computer professionals are involved, the user consulting
load is higher than on most similar systems. While direct individual
consultation has been a major component of the effort and will continue to
be, other solutions to the problem are continually sought.   A number of
approaches have proven useful:

1) Foster interactions among users themselves to help each other learn and
to solve particular problems.  This has been the case with the new
statistical packages.  The staff is available to fix program bugs but
in fact has little expertise in the use of the packages.  So the groups
expressing interest have been put in touch with each other.   This
effort has been quite successful.

2) Among the systems coupled by networks (ARPANET currently), focus
maintenance responsibility for particular pieces of software in the
groups which developed them or where an extensive expertise already
exists.  Typically the author/maintainer is best able to deal with user
problems.  With the ARPANET to provide communication access at a
non-local site, the duplicated effort in a local staff trying to
familiarize themselves with imported programs for bug fixing and
consulting is minimized.  An example of this is the shift this year
from our local SOS editor program which had been developed by a former
staff member and was no longer actively supported by any current staff
member to a version of SOS which is well-maintained at UTAH-10 by Kevin
Kay.  He analyzed our version, incorporated the improved features into
his version, and assumed support of it at SUMEX.  This is an
increasingly common phenomenon among the TENEX sites on the ARPANET.
Other software maintained by one site for all the sites includes MSG,
REDUCE, SAIL, INTERLISP and PUB.  In general, one local staff member is
the primary contact for each of these and does handle routine problems
and questions but does so in close communication with the author.  And
in fact, for both SOS and SAIL, users are encouraged to go directly to
the authors; SOS has a built-in GRIPE facility which sends the user
message to SOS@SRI.

3) Employ media which reach a large number of users rather than dealing on
an individual basis.  This includes writing of documentation (see
Section II.A.2.j on page 40) and holding classes and tutorials.
Last year, several classes were given (PUB, SAIL, and machine
language).  These were successful; and we have followed up with an
advanced SAIL course, a BAIL lecture, and a planned repeat of the PUB
class.  More informal meetings have also been held to discuss issues
such as choice of a programming language and efficient programming in
languages such as SAIL and INTERLISP.  Participants in last year's
Workshop requested an INTERLISP tutorial which Dr. R. Carhart of the
DENDRAL project gave.  A number of tutorials on languages and other
aspects of SUMEX use are planned for the June 1976 Workshop.

4) Use other techniques for interactive on-line teaching.  Interactive
computers have been used in a number of areas for computer-assisted-
instruction (CAI) applications.  We may be able to adapt some of these
techniques and will be studying possibilities of a training mode for
selected programs (possibly a text editor or the EXEC). In this we are
fortunate to have close ties with IMSSS, a well-known leader in the
field of CAI.

II.A.2.i    INTRA-COMMUNITY COMMUNICATION

Help System:

   A substantial problem for users not intimately familiar with the
system (and even those who are) is how to locate the appropriate
documentation on-line to answer a particular question as it arises.  Many
times a staff member or other user is not available to help so we have
been developing various forms of on-line assistance or "help".   After
considering a number of help schemes, early this year we put up a
temporary help program which optionally printed a general information file
and then interactively helped the user search through the names of the
<DOC> files with a keyword search.  This approach is rather effective at
SUMEX where long filenames composed of as many keywords as possible are
used to identify the information files.  The interim help program was
well-liked by users.  A new more sophisticated version is currently being
tested.

   It offers all the facilities of the simple version but it also has
extensions in three basic directions.  First, in addition to checking the
filenames for the given keyword,  it also checks the contents of several
general information files, e.g., the file listing all programs with a one-
line summary of each, a file containing an on-line index of the jsys's, a
file containing entries on new programs, and a file containing the TENEX
command equivalents for TOPS10 commands.  If the keyword is found in the
keyword line of any of these entries then the individual entry is printed
out.  These files were all existing documentation files which needed only
slight format changes to be used in this way.  Second, the user can
specify certain standard keywords which allows him to do more specialized
searching of the relevant data base for the topic.  Third, when the
filenames of the <DOC> directory are searched for the keyword and a list
of possibly relevant files has been produced then the user has several
options.  The contents of any of the files can be searched for further
keywords or the entire files (or selected pages) can be typed on the
terminal, put together in a new file, or listed on the printer.   This
allows the user to browse around and tailor-make a new document with just
the desired pieces found.
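
   The two kinds of keyword lookup described above (matching against
keyword-rich filenames, and matching against the keyword line of entries in
a general information file) can be sketched in Python as follows.  The
function names, the (keyword line, body) entry layout, and the sample
filenames in the usage note are assumptions made for illustration; the report
does not give the help program's actual file formats.

```python
def match_filenames(filenames, keyword):
    """Return the filenames that contain the keyword, ignoring case."""
    key = keyword.lower()
    return [name for name in filenames if key in name.lower()]


def match_entries(entries, keyword):
    """Return the bodies of entries whose keyword line mentions the keyword.

    `entries` is a list of (keyword_line, body) pairs, a hypothetical layout.
    """
    key = keyword.lower()
    return [body for keyword_line, body in entries if key in keyword_line.lower()]
```

   For example, with hypothetical filenames TAPE-READING-PROGRAMS.DOC and
PUB-MACROS-MANUAL.DOC, a search on "tape" would select only the first, which
is why long filenames built from many keywords make this simple scheme
effective.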


Bulletin Board System:

   Another kind of user communication system has been under design and
implementation for some time at SUMEX.  Some information is of a more
informal or transient nature than that comprising files suitable for the
<DOC> directory.  Other types of information have relevance for only a
subset of the community such as intra-project communication about program
design, external users, etc.  We have maintained a <BULLETINS> directory
for system-related "bulletins" (see page 228 for a current directory
listing) but have needed a more general facility serving public and
private bulletin boards as well as allowing users to selectively direct
their interests to particular subsets of the information.   Such a
bulletin-board system would fill the gap between the <DOC> directory,
intended for permanent documentation, and the mail system, where each user
(and the system) has his own mailbox.  It would complement the help
system,  providing quick access to such intermediate information.

   An on-line "bulletin board" system is nearly ready for release to
the SUMEX community.  The following is a brief description of the system
in its preliminary implementation.  We expect it to grow and change as
users begin to use it and identify additional communication needs.

   The system has a number of bulletin boards available.  A public
bulletin board encompasses system-wide information.  It is for community
use and has bulletins on new features of system programs, announcements,
corrections, progress reports, suggestions, queries, and comments of
general interest to the SUMEX community.  There may be need for other
public bulletin boards such as for the AIM workshop depending on the
overall volume of information involved.  Private bulletin boards may focus
on individual projects (e.g., DENDRAL, ONET, MOLGEN, etc.), subsystems, or
other subgroupings of the community.

   Bulletins are filed on the bulletin boards under topics, which may
have any number of subtopics.  They each have an expiration date, because
some information may be of a temporary nature.  In general, the kind of
bulletin posted is what used to be sent out in multiple copies with a
SNDMSG distribution list to a number of users, and would now be posted on
the bulletin board pertaining to that interest group or project.

    Two programs operate on the bulletin-board files.   POST is the
equivalent of SNDMSG or ADDMSG with extra editing and display features and
doubles as a bulletin poster.  POST can send copies of the same bulletin
to a user list and bulletin boards at the same time.  When not used to
post a bulletin,  it behaves like SNDMSG.  The program BBD performs other
inquiry and editing functions on bulletin boards.  It is designed to
behave as the TENEX EXEC does, for consistency with existing command
conventions and ease of use.  There are also some similarities between BBD
and message-reading programs like MSG and BANANARD.  A BBD user will be
able to connect to the bulletin board of his choice and:

Get a directory listing of topics and bulletins

Type a (set of) bulletin(s) by number or topic name



Ask to see the first 5 lines only of each

Copy bulletins to other files (including TTY: or LPT:)
  in message format

   In each of the above, the user can narrow his range of bulletins by
asking BBD only for those that meet certain criteria, such as:

New bulletins

Bulletins filed under topics on his "interest list"

Bulletins written by a particular author or group of authors

Deleted bulletins

Expired bulletins

Bulletins posted before or after a specified date

Bulletins with a desired phrase in the message or subject line

or combinations of the above.
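
   The selection criteria listed above can be illustrated with a short
present-day sketch in Python.  It is not the BBD implementation, whose
internals are not described here; the Bulletin record, its field names,
and the select function are all hypothetical.  Each criterion is optional,
and a bulletin is chosen only if it meets every criterion supplied, which
gives the "combinations of the above" behavior.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Bulletin:
    # Hypothetical record layout; the real bulletin-board file format is not given.
    number: int
    topic: str
    author: str
    subject: str
    text: str
    posted: date
    expires: date
    deleted: bool = False


def select(bulletins, *, today, topics=None, authors=None, deleted=None,
           expired=None, posted_after=None, posted_before=None, phrase=None):
    """Return the bulletins that meet every criterion that was supplied.

    A criterion left as None is ignored.  'deleted' and 'expired' are
    three-way: None (do not care), True (only those), False (exclude those).
    """
    chosen = []
    for b in bulletins:
        searchable = (b.subject + "\n" + b.text).lower()
        tests = [
            topics is None or b.topic in topics,        # e.g. the interest list
            authors is None or b.author in authors,
            deleted is None or b.deleted == deleted,
            expired is None or (b.expires < today) == expired,
            posted_after is None or b.posted > posted_after,  # "new" bulletins
            posted_before is None or b.posted < posted_before,
            phrase is None or phrase.lower() in searchable,
        ]
        if all(tests):
            chosen.append(b)
    return chosen
```

Passing the date of the user's last visit as posted_after is one way to
model the "new bulletins" criterion; passing his interest list as topics
models the second.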

   There will be a notification system whereby bulletin-board users are
notified once per day of new bulletin arrivals.  The user tells BBD to add
a given topic to his "interest list".  He is notified of new arrivals only
for those topics, unless his interest list is "all topics".   There is a
separate interest list for each bulletin board.  The notification system
can easily be extended to be an automatic reminder system.  One could post
a bulletin on a "Reminders" bulletin-board.   The bulletin would be sent to
those to whom it was addressed when it expired.
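
   The once-per-day notification pass can be sketched in the same way.
Again this is only an illustration in Python under stated assumptions:
the ALL_TOPICS marker, the dictionary layout of the interest lists, and
the daily_notices function are hypothetical names, not part of BBD.  The
sketch keeps a separate interest list for each bulletin board and treats
"all topics" as matching every new arrival on that board.

```python
ALL_TOPICS = "*"  # hypothetical marker for an interest list of "all topics"


def daily_notices(interest_lists, new_bulletins):
    """Work out which users should hear about which new bulletins.

    interest_lists: {user: {board: set_of_topics or ALL_TOPICS}}
    new_bulletins:  list of (board, topic, number) posted since the last run
    Returns {user: [(board, topic, number), ...]}, omitting users with no news.
    """
    notices = {}
    for user, boards in interest_lists.items():
        hits = [
            (board, topic, number)
            for board, topic, number in new_bulletins
            if board in boards
            and (boards[board] == ALL_TOPICS or topic in boards[board])
        ]
        if hits:
            notices[user] = hits
    return notices
```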

   Bulletin-board users will likely want BBD to assume a few things for
them.  They may specify a bulletin-board to connect to upon entering the
program (default is the main bulletin-board), or they may never want to
see anything but what is on their interest list.   BBD will have a
"Defaults" command which will set these things up on a per user basis.

   Each bulletin board will have a manager, who has special privileges.
He will be able to create and destroy topics, delete and undelete
bulletins,  reorder the bulletins in each topic,  change the assignment of
bulletins to topics, and change expire dates.  Everybody will be able to
delete, undelete, refile, and change the expire dates of bulletins they
author.

    A first version of the bulletin board system is in final checkout.
We expect to release it to the community this summer and to continue
development based on user reactions and needs during the upcoming year.



II.A.2.j    DOCUMENTATION AND EDUCATION

    SUMEX has set and maintained high standards for documentation.   This
year we achieved virtually complete documentation of all available
programs.  The list of the <DOC> directory reported last year contained
142 files.  That has now increased to 220 files (see Appendix F on page
218), many of which have been updated and reorganized.  All of the general
information files have also been updated during the course of the year.
SUMEX is probably the best documented PDP-10 site on the ARPANET.  However,
we have long recognized the fact that the usefulness of the documentation
is severely limited by the ease of access to the particular bit of
information that a user is currently seeking.  As the volume of
documentation increases, more information is available in an abstract
sense but may perversely become less available in real terms because of
the difficulty of finding it.  The HELP and BULLETIN BOARD systems are
designed to help overcome these problems.

   This year, SUMEX submitted a 25-page entry to the ARPANET Resource
Handbook (September 1975) which contains a variety of information on the
ARPANET host systems.  Our entry includes a description of the SUMEX
facility and projects, a list of the software available for export with a
policy statement on the procedure for obtaining this software, and a
summary of the major areas of interest at SUMEX.  A copy of this material
is attached as Appendix G on page 230.

   As a follow-up to last year's very successful SAIL introductory
classes, this year a series of more advanced SAIL and Machine Language
seminars was given by Dr. R. Smith.  These covered the interface of SAIL
to the PDP-10 host machine and timesharing system, SAIL program debugging,
and implementation and efficiency considerations.   Mr. John Reiser of SU-
AI gave a seminar on the features and use of the new SAIL debugging
system, BAIL.  A SAIL Tutorial has been written by Dr. Nancy Smith of
SUMEX which will be published shortly by SU-AI along with a reprinting of
the standard SAIL document plus the TENEX-SAIL Manual and the new BAIL
Manual.  Finally, Dr. Nancy Smith, in conjunction with the improved macro
facilities and documentation, gave an introductory class in using PUB.

II.A.2.k    SOFTWARE COMPATIBILITY AND SHARING

   Over the past year, in our commitment to importing software where
possible rather than reinventing it, we have gained considerable experience
in the sharing of software.  At SUMEX many avenues exist for sharing
between the system staff, various user projects, other facilities, and
vendors.  In the past, without communication networks, the system vendor
served as the focal point for distribution of most software to user sites.
Since the process of distributing tapes (and particularly of handling bug
reports and user suggestions) was very slow, it was common for sites to
take a version of a program and then modify and maintain it locally.   This
caused a proliferation of home-grown versions of software.   Similar
impediments have existed to the dissemination of user software.   User
organizations like SHARE and DECUS have helped to overcome these problems
but communication is still cumbersome.  The advent of fast and convenient
communication facilities coupling communities of computer facilities has
the potential to make a major difference in facilitating inter-group
cooperation and lowering these barriers.

   Recently, the TENEX sites on the ARPANET have been interacting
increasingly with each other to develop new software systems.  This
functions effectively to build communication around the network and
promote a functional division of labor and expertise.  The other major
advantage is that as a by-product of the constant communication about
particular software, personal connections between staff members of the
various sites develop.  These connections serve to pass general
information about software tools and to encourage the exchange of ideas
among the sites.  Certain common problems are now regularly discussed on a
multi-site level.  We continue to draw significant amounts of system
software from other ARPANET sites, reciprocating with our own local
developments.

   Currently the number of sites involved is relatively small and the
interactions are informal.  It may be that this informality is an
essential ingredient to making this process work, much as friendships
among people develop.  It may be the bureaucracy of vendor systems and
procedures (which do have useful fallout in uniform documentation,
interfaces, etc.) which caused the proliferation of home-grown systems in
the past.  Indeed our own attempt at building a SAIL library may have
foundered because we tried to be too formal about it.

   We began an effort last year to accumulate useful SAIL library
routines from the various groups which have been working with this
language (Stanford AI, IMSSS, SRI, NIH, USC-ISI, etc.).   It has been
somewhat surprising that so little communication of SAIL library programs
has taken place - it is almost literally true that each user has his own
stock of tools in private procedure libraries.  We sent a letter to
interested groups soliciting inputs on a basis which attempted to balance
the problem of assuring library quality and integrity against establishing
so high a threshold for quality and polish that individuals are not
motivated to cooperate.  Despite the willingness of an active community to
share time and ideas on an individual basis,  there have been virtually no
external entries to the library in response to our efforts.

   In other areas, however,  where individuals have undertaken the
design of major software components, mutual design cooperation between
sites has a growing list of examples of success.  Undoubtedly the
particular personalities involved play some role as well as the
orientation of the funding agencies.

    Certainly the TENEX operating system itself is an example of
community cooperation although there has been some tendency for
localization because of BB&N's rigidity.  Other examples of cooperation
mentioned earlier include SOS, MSG, PUB, the batch system, and our
substantial efforts to contribute to software exportability through
developing the MAINSAIL system.  This latter effort has received very
enthusiastic support from many quarters of the computer community.   Other
noteworthy examples encountered this past year include the following.

When Mr. John Reiser began writing the BAIL debugger for SAIL, he
was contacted and agreed to design the program for maximum compatibility
between the SU-AI system, TENEX, and standard TOPS-10.  This effort was
quite successful.  BAIL was written in such a way as to use each of the
operating systems optimally with no compromise in program design.  One
estimate of the extra development time involved was only 10 per cent.  The
important ingredients are complete program comments, modular design, and
no unnecessary system-dependent code.  It can be contrasted with other
programs written at SU-AI.  For example,  in the process of designing a
bulletins system, SUMEX learned of the AP NEWS Service program at SU-AI
which has many features similar to our design for bulletins.  The program
was studied and proved to be very difficult to transfer and adapt because
of the choice of language and the degree of dependence on the SU-AI home-
brew operating system.   Both BAIL and NEWS were written at SU-AI and both
are well-written programs by other criteria.

     MLAB and OMNIGRAPH --  Even without the facilities of the ARPANET
and with all the compatibility problems of TOPS-10/TENEX program sharing,
our interactions with the NIH Division of Computing Research and
Technology concerning MLAB and OMNIGRAPH have been mutually beneficial.
SUMEX has sent code for a "TENEX" conditional compilation switch to NIH
which has been incorporated into the source files.  Also, the light pen
and plotting development work done here this past year has had close
communication with NIH.

   With very careful organization by Dr. R. Smith, the export of TENEX-
SAIL has proceeded with very good community cooperation.  A special
directory called <XSAIL> has been created (which can be accessed by the
ANONYMOUS feature of the FTP file transfer program so that there is no
need for exchanging passwords).  This directory contains ALL the necessary
files for exporting the SAIL system.  All sites running TENEX-SAIL were
individually contacted and requested to appoint a local contact for
communications purposes.

    The openness of the ARPANET communication facilities encourages some
members to copy pieces of software without the author's knowledge thereby
defeating the necessary more orderly processes of maintenance and upgrade
and in some cases losing proper attribution for the program's development.
This has occurred with a number of the programs that we have available for
export.  It is difficult to control such behavior without at the same time
limiting access for cooperating members of the community.  We try to
discourage it by pointing out the self-defeating effects.

    Another continuing effort is in the maintenance of software
compatibility with DEC.  The PA1050 program (for TOPS-10 emulation) is an
important part of the software for each TENEX site.  SUMEX made a search
for all local versions of PA1050 and combined the best features.  Much new
work was also done.  This version has been made available to other TENEX
sites - IMSSS and SRI are running it with direct cooperation and other
sites have copied it without informing us.  Since almost all sites using
TENEX are doing government-funded work and this is an obligatory condition
for ARPANET access, we have not felt it necessary until now to take
strenuous (and possibly costly) measures to protect this software.  We
will, however, review this problem periodically.



   A new aspect of DEC compatibility arises with the announcement of
the TOPS-20 operating system which has been developed by DEC for their KL-
20 machine.  The current 2040 system is a relatively small system but will
be followed by larger members of the 20-family.  TOPS-20 is based on
TENEX.  It appears that ARPA may be transferring support from BBN to DEC
for system development and the ARPANET interface for the system.  This has
the potential for greatly decreasing the compatibility problems since both
TOPS-10 and TOPS-20 will be under DEC control.  On the other hand, DEC has
implemented a variety of minor changes already (new JSYS's, different file
name notation, etc.) which are causing a divergence between TOPS-20 and
TENEX that may well lead to greater compatibility problems than exist now.
We noted these possibilities in the decision to remain with TENEX and
implement the dual processor system.  The timing of DEC's evolution of
TOPS-20 with larger scale processors is uncertain, as is the rate at
which the ARPANET community might move in that direction.  There are many
existing KA-10 and KI-10 machines running TENEX for which there are no
current prospects of replacement.  We feel our decision is the correct one
for the next few years, especially in view of budgetary constraints.
However, we are sensitive to the need to remain as parallel as possible with
the mainstream of the community and will actively pursue this goal.



II.A.3    RESOURCE MANAGEMENT

   Over the past year, the SUMEX project has devoted a substantial part
of its effort toward its community-building role in recruiting new
projects, promoting interactions between user projects, and encouraging
dissemination of running performance programs to medical scientists.   The
following summarizes specific aspects of SUMEX-AIM community management
activities.

II.A.3.a    MANAGEMENT COMMITTEES

   The SUMEX-AIM resource is constituted to attempt to bring into
closer contact collaborating health research groups from around the
country.  This mission entails both the recruitment of appropriate
research projects interested in medical AI applications and the catalysis
of interactions among these groups and the broader medical community. As
this effort is not a unilateral undertaking by its very nature, we have
created several management committees to assist in administering the
various portions of the SUMEX resource.   As defined in the SUMEX-AIM
management plan adopted at the time the resource grant was awarded, the
available facility capacity is allocated 40% to Stanford Medical School
projects, 40% to national projects, and 20% to system development and
related functions.

    Within the Stanford aliquot, Dr. Lederberg has established an
advisory committee to assist him in selecting and allocating resources
among projects appropriate to the SUMEX mission.  The current membership
of this committee is listed in Appendix H.

    For the national community,  two committees serve complementary
functions.  An Executive Committee oversees the operations of the resource
as related to national users and makes the final decisions on authorizing
admission for projects.  It also establishes policies for resource
allocation and approves plans for resource development and augmentation
within the national portion of SUMEX.   The Executive Committee oversees
the planning and implementation of the AIM Workshop series and assures
coordination with other AIM activities as well.  The workshops are being
carried out under Dr. S. Amarel of the Rutgers Computers in Biomedicine
resource.  The current membership of the Executive Committee is listed in
Appendix H.

    Under the Executive Committee functions an Advisory Group
representing contact with medical and computer science research relevant
to AIM goals.  The Advisory Group serves several functions in advising the
Executive Committee:  1) recruiting appropriate medical/computer science
projects, 2) reviewing and recommending priorities for allocation of
resource capacity to specific projects based on scientific quality and
medical relevance, and 3) recommending policies and development goals for
the resource.  The current Advisory Group membership is given in Appendix
H.



   These committees are actively functioning in support of the
resource.  Meetings to date have been held by telephone conference for the
most part owing to the size of the groups and to save the time and expense
of personal travel to meet face to face.  These "missings" (a term coined
by Dr. Licklider),  in conjunction with terminal access to related text
materials, have served quite well in accomplishing the agenda business and
greatly facilitate the arrangement of meetings.  A few technical problems
occasionally attend such sessions, such as poor telephone reception for
some members, but in general this approach is quite satisfactory.  The keys
to success seem to be a) fairly short and not too infrequent sessions, b)
a firm agenda, c) mail distribution of relevant documents, d) computer
network backup for exchange of information, and e) informality and
personal rapport of the members.  Other solicitations of advice requiring
review of sizable written proposals are done by the mails.

II.A.3.b    NEW PROJECT RECRUITING

   As a result of the public announcements of the SUMEX resource, NIH
contacts with prospective grantees, and personal contacts by the staff or
committee members, a number of additional projects have been admitted to
SUMEX; others are working tentatively as pilot projects or are under
review.  We have prepared a variety of materials for the new user ranging
from general information such as is contained in the brochure (Appendix I)
to more detailed information and guidelines for determining whether a user
project is appropriate for the SUMEX-AIM resource.  Dr. E. Levinthal has
prepared a questionnaire to assist users seriously considering applying
for access to SUMEX-AIM (see Appendix J).  Pilot project categories have
been established both within the Stanford and national aliquots of the
facility capacity to assist and encourage projects just formulating
possible AIM proposals pending a formal review.

   The projects newly admitted over the past year include (see Section
IV for more detailed descriptions):

National -

1) Chemical Synthesis Project (SECS); Dr. T. Wipke (University of
California at Santa Cruz)

2) Language Acquisition Modelling (ACT); Dr. J.  Anderson (University
of Michigan)

    As an additional aid to new projects or collaborators with existing
projects, we have a limited amount of funds which are being used to
support terminals and communications needs of users without access to such
equipment.  We are currently leasing 6 terminals and 4 modems for users as
well as 4 foreign exchange lines to better couple the Rutgers project into
the TYMNET and a leased line between Stanford and U. C. Santa Cruz for the
Chemical Synthesis project.



II.A.3.c    STANFORD COMMUNITY BUILDING

   During the past year, the Stanford community has undertaken several
efforts to encourage interactions and sharing between the projects
centered here.  Beginning in the fall term, Professor Feigenbaum organized
a seminar class with the goal of assembling a handbook of AI concepts,
techniques, and current state-of-the-art.  This project has had
enthusiastic support from the students, and substantial progress has been
made in preparing many sections of the handbook.  An outline of the material
to be prepared, along with an indication of the status of each article, can be
found in Appendix B on page 180.  Several examples of completed articles
are given in Appendix A on page 166.

   A second community-building effort was a "mini AI conference" held
at Stanford in January 1976.  This 3 day series of meetings featured
presentations by each of the local projects and comparative discussions of
approaches to current problems in AI research such as knowledge
representations, production system strategies and rule formation, etc. A
brief summary of the conference is attached as Appendix C on page 194.

II.A.3.d    RESOURCE ALLOCATION POLICIES

   As the SUMEX facility has become increasingly loaded, a number of
diverse and conflicting demands have arisen which require controlled
allocation of critical facility resources (file space and central
processor time).  We have already spelled out a policy for file space
management; an allocation of file storage is defined for each authorized
project in conjunction with the management committees.   This allocation is
divided among project members in any way desired by the individual
principal investigators.  System allocation enforcement is implemented by
project each week.  As the weekly file dump is done, if the aggregate
space in use by a project is over its allocation, files are archived from
user directories over allocation until the project is within its
allocation.
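
   The weekly enforcement step just described can be sketched as follows.
This is a hedged illustration in Python, not the procedure actually run
with the file dump.  In particular, the order in which files are chosen
for archiving is not stated above; the sketch assumes, purely for
illustration, that the least recently read files go first.  The function
name and the tuple layout are likewise hypothetical.

```python
def files_to_archive(project_allocation, user_allocations, files):
    """Pick files to archive until the project is within its allocation.

    project_allocation: pages allotted to the whole project
    user_allocations:   {user: pages allotted by the principal investigator}
    files:              list of (user, name, pages, last_read_day)
    Returns a list of (user, name) in the order they would be archived.
    """
    usage = {}
    for user, _name, pages, _day in files:
        usage[user] = usage.get(user, 0) + pages
    total = sum(usage.values())

    archived = []
    # Assumption: oldest-read files are considered first.
    for user, name, pages, _day in sorted(files, key=lambda f: f[3]):
        if total <= project_allocation:
            break  # project is back within its allocation
        # Only directories that are over their own allocation lose files.
        if usage[user] > user_allocations.get(user, 0):
            archived.append((user, name))
            usage[user] -= pages
            total -= pages
    return archived
```

Note that a project which is within its aggregate allocation loses no
files even if one of its members is over his individual share, which is
how we read the policy stated above.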

    As described under TENEX monitor software development, we have been
using a primitive CPU scheduling algorithm intended to ensure that no one
user gets more than a fair share of the machine when other users are
contending.  With the implementation of TENEX 1.33 this summer, the pie-
slice allocation system will be available to enforce CPU allocations more
rigorously by project and by community.

   As also mentioned earlier, we have categorized users in terms of
access privileges.  These comprise fully authorized users, pilot projects,
guests, and network visitors in descending order of system capabilities.
We want to encourage bona fide medical and health research people to
experiment with the various programs available with a minimum of red tape
while not allowing unauthenticated users to bypass the advisory group
screening procedures by coming on as guests.  So far we believe we have
had little or no exploitation compared to what other sites have
experienced, perhaps on account of the personal attention that senior
staff gives to the logon records.  However, the experience of most other
computer managers behooves us to be cautious about being as wide-open as
might be preferred for informal service to pilot efforts and
demonstrations.  We will continue developing this mechanism in conjunction
with management committee policy decisions.

II.A.3.e    AIM WORKSHOP SUPPORT

   The Rutgers Computers in Biomedicine resource (under Dr. Saul
Amarel) is actively working on plans for the second AIM workshop this
June.  The current plans call for a four day series of meetings covering a
range of topics related to artificial intelligence research, medical
needs, and resource sharing policies within NIH.  The SUMEX facility will
act as a prime computing base for the workshop demonstrations.  We hope to
have the new dual processor system in operation for the meetings.   A final
decision will depend on progress over the next week in completing the
debugging of the initial system and our ability to assure reliable
operation.  We are in the process of working with Rutgers to provide
backup modes for program demonstrations in the event of computer system
problems.



II.A.4    FUTURE PLANS

System Development:

    In the next year much work remains to complete the dual processor
system.  We must complete installation,  evaluate its performance in terms
of increased throughput,  identify and fix excessive waiting for monitor
interlocks, and optimize system scheduling and resource handling.  We plan
to implement a mutual interrupt facility between the two machines and to
implement a bus switch allowing I/O devices to be moved easily between the
two machines.  This will increase our ability to keep the system running
in the case of a processor failure by reconfiguring to a single processor
mode.

    We plan to continue evaluation of system hardware bottlenecks and to
pursue avenues to eliminate them.  We know that disk space is currently a
problem and are trying to augment the system through user project
cooperation.  Other limiting resources over the next year may be memory
and swapping space.

   We will install version 1.33/1.34 of TENEX with necessary dual
processor and KI-10 modifications in order to stay current with other
TENEX sites and to improve resource allocation controls among the AIM
community members.

   We plan to improve the batch processing capability for those jobs
which need not run interactively.  A current system has helped to move
system loading from prime time to off hours.  We plan to extend facilities
for error handling and more flexible job scheduling.

   We will continue to refine the Executive program and capabilities
for guest users in conjunction with the TENEX 1.33 upgrade.

    We will also investigate ways of improving network communication
services.  This will include attempts to optimize our current facilities
for users through better ties to the networks and selective lines to tie
individual users into more advantageous access points.  We will also
continue to explore other network and communication alternatives as they
become available over the next year.  Specific goals include improved
response times and increased output speeds.  TYMNET will be starting 1200
baud service soon and we would like to make it possible for users to take
advantage of the higher output speed.

MAINSAIL:

    We are awaiting the funding of the MAINSAIL project to allow initial
export of this language system.   We have established contacts with
numerous outside groups interested in this machine-independent language
ranging from university research projects to industry.  (The university's
research projects office is handling any problems or opportunities that
may arise from proprietary values of these products, in accordance with
established procedures).  We have proposed an initial list of target
machines including PDP-10, PDP-11, and Nova.  We plan to develop an
exportable, documented form of the system for each of these environments
and to test them in conjunction with appropriate collaborating user sites.

Adaptive User Interfaces:

   We plan to continue work toward a more adaptive system for users
including both simplifying access for non-expert users and anticipating
default parameter conventions of individual users.   We are now in the
process of defining system calls which will make user specifications
accessible to utility programs in a uniform way.

Software Facilities and Libraries:

   There is a continuing need for improved documentation and self-
learning facilities for various aspects of the system and of available
programs.  We will be up-grading this material, particularly as it relates
to the inexperienced user.

   We will continue to up-grade the various DEC-originated subsystems
to the newest versions to increase the chance of compatibility.  We have
recently done this with FORTRAN and MACRO and will bring the other
programs along as soon as possible.  The whole issue of compatibility is
one which will receive continued attention.  We will also increase our
mutual ties in software sharing with the TENEX and AI communities.

    Some requests to look into additional software subsystems have been
received and we will consider mounting them if the community develops a
definite need.

Informal Information Access:

    One characteristic of the SUMEX community is the diversity of
information, formal and informal, which flows around the system or is
available from users.  We will continue to work on the HELP and BULLETIN
BOARD systems to capture that information and direct it to other
interested individuals.  We will be working on capabilities both to ease
the entry and cataloging of information and to assist in guiding the user
to that subset which is of interest to him at a given time.   These user-
oriented lookup protocols are, of course, strongly related to the problems
of adaptive user interfaces to the system and each will benefit from the
experience of the other.

Community Management:

    We will continue to work with the management committees to recruit
the additional high quality projects which can be accommodated and to
evolve resource allocation policies which appropriately reflect assigned
priorities and project needs.  We hope to make more generally available
information about the various projects both inside and outside of the
community and thereby to promote the kinds of exchanges exemplified
earlier and made possible by network facilities.  The AIM workshops
provide much useful information about the strengths and weaknesses of the
performance programs both in terms of criticisms from other AI projects
and in terms of the needs of practicing medical people.  We plan to use
this experience to guide the community building aspects of SUMEX-AIM.




II.B    SUMMARY OF RESOURCE USAGE

   The following data give an overview of the resource usage from May
1975 through April 1976.  There are three sub-sections containing data
respectively for 1) resource usage by community (AIM, Stanford, and
system), 2) resource usage by project, and 3) network usage data.

II.B.1    RELATIVE SYSTEM LOADING BY COMMUNITY

   The SUMEX resource is divided, for administrative purposes, into 3
major communities: user projects based at the Stanford Medical School,
user projects based outside of Stanford (national AIM projects), and
systems development efforts.  As defined in the resource management plan
approved by BRP at the start of the project, the available resource will
be divided between these communities as follows:

CPU Usage  - Stanford   40%
             AIM        40%
             Staff      20%

File Space - Stanford   27,000 pages(*)
             AIM        27,000 pages
             Staff      13,500 pages

(*) One TENEX page is 512 36-bit words, or 2560 text characters.

An additional allocation of approximately 30,000 pages serves system files
including documentation, subsystems, monitor, etc.
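
The page allocations can be converted to words and characters with a few lines of arithmetic. The sketch below is illustrative only: the page size and allocations come from the text above, while the figure of five characters per 36-bit word is an assumption inferred from the stated 2560 characters per page, and the names are hypothetical.

```python
WORDS_PER_PAGE = 512   # stated in the text
CHARS_PER_WORD = 5     # assumption: 2560 characters / 512 words per page

# File space allocations in TENEX pages, as given in the text.
FILE_ALLOCATION_PAGES = {"Stanford": 27_000, "AIM": 27_000, "Staff": 13_500}


def page_capacity(pages):
    """Convert a page count into words and text characters."""
    words = pages * WORDS_PER_PAGE
    return {"pages": pages, "words": words, "characters": words * CHARS_PER_WORD}


for community, pages in FILE_ALLOCATION_PAGES.items():
    cap = page_capacity(pages)
    print(community.ljust(9), f"{cap['pages']:>7,} pages",
          f"{cap['words']:>11,} words", f"{cap['characters']:>12,} characters")
```

On these assumptions, each 27,000-page allocation corresponds to 13,824,000 words.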

   The monthly usage of CPU and file space resources for each of these
three communities relative to their respective aliquots is shown in the
plots in Figure 8 and Figure 9.  Our diurnal variations in loading have
retained the same characteristics as previously, with a bimodal
distribution reflecting the complementary loads from the east coast and
the west coast.



   Figure 8.
CPU USE BY COMMUNITY

[Plot not reproduced.  Panels: National AIM, Stanford, System Staff.
Horizontal axis: month, May 1975 through April 1976.]


     Figure 9.
FILE SPACE USE BY COMMUNITY

[Plot not reproduced.  Horizontal axis: month, May 1975 through April
1976.]



II.B.2    INDIVIDUAL PROJECT AND COMMUNITY USAGE

   The table following shows average resource usage by project in the
past grant year.  The data displayed include a description of the
operational funding sources (outside of SUMEX-supplied computing
resources) for currently active projects, average monthly CPU consumption
by project (Hours/month), average monthly terminal connect time by project
(Hours/month), and average file space in use by project (Pages/month, 1
page = 512 computer words).  Averages were computed for each project for
the months between May 1975 and April 1976.



RESOURCE USE BY INDIVIDUAL PROJECT

                                      CPU      CONNECT   FILE SPACE
                                    (Hrs/mo)   (Hrs/mo)  (Pages/mo)

STANFORD COMMUNITY

1) DENDRAL PROJECT                    68.41       1574      18280
   "Resource Related Research
    Computers and Chemistry"
   NIH RR-00612 (3 yr award)
   $240,967 this year

2) MYCIN PROJECT                      20.76        494       5959
   "Computer-based Consult.
    in Clin. Therapeutics"
   HEW HSO-1544 (3 yr award)
   $163,965 this year

3) PROTEIN STRUCT MODELING            19.45        296       2452
   "Heuristic Comp. Applied
    to Prot. Crystallog."
   NSF DCR74-23461 (2 yrs.)
   $88,436 total

4) PILOT PROJECTS                     14.12        433       3459
   (see reports in
    Sec IV.B.1)

                                     ------     ------      -----
COMMUNITY TOTALS                     122.74       2797      30150

NATIONAL AIM COMMUNITY

1) SECS PROJECT                       10.32        196       3284
   "Chemical Synthesis"
   NIH proposal pending

2) INTERNIST PROJECT                   7.64        209       4705
     (DIALOG)
   "Computer Model of
    Diagnostic Logic"
   HEW MB-00144 (3 yrs.)
   $167,168 last year

3) Higher Mental Functions             1.94         85       1299
   "Computer Models in
    Psychiatry and Psychother."
   NIH MH-27132 (2 yrs.)
   $67,000 this year

4) ACT PROJECT                         2.77         55        559
   "Language Acquisition
    Modeling"
   NIMH $20,000 this year

5) MISL PROJECT                        0.98         45        773
   "Medical Information
    Systems Laboratory"
   HEW MB-00114 (3 yrs.)
   $248,793 this year

6) RUTGERS PROJECT                    12.17        446       8174
   "Computers in Biomedicine"
   NIH RR-00643 (3 yrs.)
   $314,880 this year

7) AIM PILOT PROJECTS                  0.27          9         66

8) AIM Administration                  1.88         76       2897

                                     ------     ------      -----
COMMUNITY TOTALS                      37.97       1121      21757

SUMEX STAFF AND SYSTEM

1) Staff                              50.99       2126      14453

2) System & Operations                63.71       3688      27033

                                     ------     ------      -----
COMMUNITY TOTALS                     114.70       5814      41486

                                     ======     ======      =====
RESOURCE TOTALS                      275.41       9732      93393
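
The community and resource totals in the table can be cross-checked against the individual project rows. The following is a minimal sketch of that check, using the figures as read from the table (CPU hours, connect hours, and file pages per month); the data layout and function names are hypothetical.

```python
from math import isclose

# (CPU hrs/mo, connect hrs/mo, file pages/mo) per project, read from the table.
USAGE = {
    "Stanford": {
        "rows": [(68.41, 1574, 18280), (20.76, 494, 5959),
                 (19.45, 296, 2452), (14.12, 433, 3459)],
        "total": (122.74, 2797, 30150),
    },
    "National AIM": {
        "rows": [(10.32, 196, 3284), (7.64, 209, 4705), (1.94, 85, 1299),
                 (2.77, 55, 559), (0.98, 45, 773), (12.17, 446, 8174),
                 (0.27, 9, 66), (1.88, 76, 2897)],
        "total": (37.97, 1121, 21757),
    },
    "Staff and System": {
        "rows": [(50.99, 2126, 14453), (63.71, 3688, 27033)],
        "total": (114.70, 5814, 41486),
    },
}
RESOURCE_TOTAL = (275.41, 9732, 93393)


def column_sums(rows):
    """Sum each of the three columns over a list of rows."""
    return tuple(round(sum(column), 2) for column in zip(*rows))


def matches(computed, printed):
    """Compare totals, allowing for rounding in the CPU column."""
    return all(isclose(c, p, abs_tol=0.005) for c, p in zip(computed, printed))


for name, data in USAGE.items():
    print(name, column_sums(data["rows"]), matches(column_sums(data["rows"]), data["total"]))

grand = column_sums([data["total"] for data in USAGE.values()])
print("Resource", grand, matches(grand, RESOURCE_TOTAL))
```

Each community total equals the sum of its project rows, and the three community totals sum to the resource totals.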



II.B.3    NETWORK USAGE STATISTICS

NETWORK USAGE PLOTS

   The plots in Figure 10 show the major billing components for
SUMEX-AIM TYMNET usage.  These include the total connect time for
terminals coming into SUMEX and the total number of characters transmitted
over the net.  The ratio of characters received at SUMEX to characters
sent to the terminal has been about 1:(10-14) over the past couple of
months.  Also shown for recent months is a plot of ARPANET connect time,
which tracks the corresponding data for TYMNET usage fairly closely.  No
data for "character" transmission are available for ARPANET since file
transfers and terminal traffic use different byte sizes and these data are
not resolved and maintained for the ARPANET.



    Figure 10.
SUMEX-AIM NETWORK USAGE

[Plot not reproduced.  Series: TYMNET data and ARPANET data (no ARPANET
data before 10/75 or for 11/75, and no meaningful ARPANET character
count).  Horizontal axis: month, October 1974 through April 1976.]



II.C    RESOURCE EQUIPMENT SUMMARY

   The following table gives a list of the items of equipment purchased
to date for the SUMEX resource along with details on vendor, description,
price, and date.


ITEM            QTY  DESCRIPTION             MANUFACTURER         MODEL       DATE       DATE      PURCHASE  SOURCE
                                                                  NUMBER      INSTALLED  ACCEPTED  PRICE     FUNDS

KI-10 CPU       1    Central processor,      Digital Equipment    KI-10       3/1/74     4/24/74   $178,500  NIH
                     including console       Corporation

                1    Central processor,      Digital Equipment    KI-10       4/15/76    5/7/76    $203,138  NIH
                     including console       Corporation

Memory          3    Core memory (64K        Digital Equipment    MF-10G      3/1/74     4/24/74   $224,910  NIH
                     words including 4       Corporation
                     MC-10 memory ports)

                1    Core memory (64K        Digital Equipment    MF-10G      11/74      12/74     $ 63,484  NIH
                     words including 4       Corporation
                     MC-10 memory ports)

                1    Memory port             Digital Equipment    MX-10       8/74       9/74      $  4,770  NIH
                     multiplexer             Corporation

Clock           1    Programmable clock      Digital Equipment    DK-10       3/1/74     4/24/74   $  2,678  NIH
                                             Corporation

                1    Programmable clock      Digital Equipment    DK-10       4/15/76    5/7/76    (incl. in second
                                             Corporation                                           processor)

Disk System     1    Single double density   Digital Equipment    RP-10C      3/1/74     5/1/74
                     disk controller         Corporation

                1    Memory data channel     Digital Equipment    DF-10       3/1/74     4/24/74
                                             Corporation

                4    Double density disk     Digital Equipment    RP-03       3/1/74     4/24/74   $108,153  NIH
                     drives and disk packs   Corporation

                3    Double density disk     Digital Equipment    RP-03R      2/75       3/75      $ 44,636  NIH
                     drives and disk packs   Corporation


ITEM            QTY  DESCRIPTION             MANUFACTURER         MODEL       DATE       DATE      PURCHASE  SOURCE
                                                                  NUMBER      INSTALLED  ACCEPTED  PRICE     FUNDS

Swapping        2    Fixed head disk with    Digital Development  A-7312-D-8  1/75       3/75      $ 37,206  NIH
Storage              1.7M word capacity and  Corporation
                     4 track parallel access

                1    Special systems         Digital Equipment    RES-10      10/74      11/74     $ 81,090  NIH
                     controller for DDC      Corporation
                     disks

DEC Tapes       1    DEC tape control        Digital Equipment    TD-10       3/1/74     4/24/74
(TU-56)                                      Corporation

                1    Dual DEC tape drive     Digital Equipment    TU-56       3/1/74     4/24/74   $ 17,850  NIH
                                             Corporation

Magnetic Tapes  1    Magnetic tape           Digital Equipment    TM-10A      3/1/74     4/24/74
(2 x TU-30)          controller              Corporation

                2    Tape transports         Digital Equipment    TU-30       3/1/74     4/24/74   $ 31,238  NIH
                                             Corporation

Line Printer    1    Special systems line    Digital Equipment    Special     6/74       7/74      $  7,208  NIH
                     printer control for     Corporation
                     Data Products 2410

                1    Line printer with 96    Data Products        2410        6/74       7/74      $ 18,963  NIH
                     character drum,
                     vertical format
                     control, parity check


ITEM            QTY  DESCRIPTION             MANUFACTURER         MODEL       DATE       DATE      PURCHASE  SOURCE
                                                                  NUMBER      INSTALLED  ACCEPTED  PRICE     FUNDS

GT-40           1    Graphics terminal       Digital Equipment    GT-40       3/1/74     4/24/74   $ 11,156  NIH
                                             Corporation

Line Scanner    1    Data line scanner       Digital Equipment    DC-10A      3/1/74     4/24/74
                                             Corporation

                1    8-line unit             Digital Equipment    DC-10B      3/1/74     4/24/74   $ 16,275  NIH
                                             Corporation

TYMNET          1    PDP-10 TYMNET           TYMSHARE                         8/74       10/74     $ 50,774  NIH
Interface            communications
                     controller

ARPANET         1    BB&N ARPANET/KI-10      Bolt, Beranek                    1/75       2/75      $ 21,200  NIH
Interface            interface and VDH       & Newman

PDP-11/10       1    Communications          Digital Equipment    PDP-11/10   2/75       3/75      $ 13,445  NIH
                     processor               Corporation


ITEM            QTY  DESCRIPTION             MANUFACTURER         MODEL       DATE       DATE      PURCHASE  SOURCE
                                                                  NUMBER      INSTALLED  ACCEPTED  PRICE     FUNDS

Terminals       1    Terminal                Data Terminals       DTC-300     3/18/74    6/74
                                             Communications

                2    Terminals -             Computer             311-3       3/18/74    6/74
                     Execuport portable      Transceiver
                     with carry case         Systems, Inc.

                6    Terminals -             Datamedia            2500        9-10/74    11/74
                     elite CRT with
                     edit capabilities

                2    Terminals -             Datamedia            2500        8/75       8/75
                     elite CRT with
                     edit capabilities

                2    Terminals -             Datamedia            2500        12/10/75   1/29/76
                     elite CRT with
                     edit capabilities

[Purchase prices printed for the terminals: $24,618, $4,597, and $6,402,
all NIH funds; the rows to which they belong cannot be recovered from this
copy.]

Keyboards       3    Keyboards, special,     Datamedia            7ODVK7019   11/74      12/74     $  1,118  NIH
                     for leased Datamedia
                     elite 2500 CRT
                     terminals at -
                       NIH
                       Rutgers Univ.
                       Washington Univ.


ITEM            QTY  DESCRIPTION             MANUFACTURER         MODEL       DATE       DATE      PURCHASE  SOURCE
                                                                  NUMBER      INSTALLED  ACCEPTED  PRICE     FUNDS

Modems               Auto answer modems      Prentice             P-113B      5/6/74     5/6/74
                                             Electronics

                     Auto answer modems      Prentice             P-1200/150  5/6/74     5/6/74
                                             Electronics

                     Auto answer modems      Prentice             P-1200/150  4/2/76     4/2/76
                                             Electronics

                     Originate modems        Prentice             P-1200/150  5/6/74     5/6/74
                                             Electronics

                     Modem enclosure with    Prentice             P-100       5/6/74     5/6/74
                     loopback switch and     Electronics
                     cables

                     Modem enclosures for 8  Prentice             P-850       5/6/74     5/6/74
                     modems with cables,     Electronics
                     power supply, digital
                     loopback, line
                     loopback, indicator
                     lights

                     Acoustic coupler modems Prentice             DC-22       3/74       3/74
                                             Electronics

                     Modem enclosure with    Prentice             P-100       3/74       3/74      $ 11,319  NIH
                     line loopback switch to Electronics
                     house P-103F modems

[Quantities printed for the modem group: 16, 5, 4, 3, 3; the rows to which
they belong cannot be recovered from this copy.]

Oscilloscope    1    Oscilloscope            Tektronix, Inc.      475DM43     1/75       1/75      $  3,476  NIH



II.D    PUBLICATIONS

   Publications for the SUMEX staff have included papers describing the
SUMEX-AIM resource and on-going research:

[1] Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey,
  R.G., and Lederberg, J., Networking and a Collaborative Research
  Community: a Case Study Using the DENDRAL Programs, ACS Symposium
  Series, Number 19, COMPUTER NETWORKING AND CHEMISTRY, Peter Lykos
  (Editor), 1975.

[2] Levinthal, E.C., Carhart, R.E., Johnson, S.M., and Lederberg, J., When
  Computers Talk to Computers, Industrial Research, November 1975.

   Mr. Clark Wilcox was asked to chair the session on Languages for
Portability at the DECUS DECsystem10 Spring '76 Symposium.  A description
of his work will appear in the proceedings.

   In addition as reported earlier, a substantial effort has gone into
developing, upgrading, and extending documentation about the SUMEX-AIM
resource, the SUMEX-TENEX system, and the many subsystems available to
users.  These efforts include a number of major documents (such as SOS,
PUB, and TENEX-SAIL manuals) as well as a much larger number of document
upgrades, user information and introductory notes, an ARPANET Resource
Handbook entry (see Appendix C), and policy guidelines (see Appendix F,
Appendix I, and Appendix J).  Publications for individual user projects
are summarized in the respective reports (see Section IV).



III    RESOURCE FINANCES

III.A    REFERENCE TO BUDGETARY DETAILS

   The budgetary materials for the SUMEX project covering past actual
costs, current performance, and estimates for the next grant year are
submitted in a separate document to the NIH.



III.B    RESOURCE FUNDING

    The SUMEX-AIM resource is essentially wholly funded by the
Biotechnology Resources Program [*].  The various collaborator projects
which use SUMEX are independently funded with respect to their manpower
and operating expenses.  They obtain from SUMEX, without charge, access to
the computing and, in most cases,  communications facilities in exchange
for their participation in the scientific and community building goals of
SUMEX.

[*] Except for the participation by Stanford University in accordance with
  general cost-sharing, and for assistance to SUMEX by other projects
  with overlapping aims and interests.



IV    RESOURCE PROJECT DESCRIPTIONS

   The following are inputs from the various user projects currently in
the SUMEX-AIM community.  These project descriptions and comments are the
result of a solicitation for contributions sent to each of the project
Principal Investigators requesting the following information:

I) Summary of research program

   A) Technical goals
   B) Medical relevance and collaboration
   C) Progress and accomplishments
   D) Current list of project publications
   E) Funding status (current funding level and pending applications or
      renewals)

II) Interactions with the SUMEX-AIM Resource

   A) Examples of collaborations and medical use of programs through
      networks
   B) Useful contacts and cross fertilization with other SUMEX-AIM
      projects (via workshop, messages, terminal links, etc.)
   C) Critique of resource services

   The text which follows on the various projects is primarily the
responsibility of the indicated project leaders.



IV.A    FORMALLY APPROVED PROJECTS

IV.A.1    STANFORD USERS

IV.A.1.a    DENDRAL PROJECT

                DENDRAL PROJECT
Principal Investigators:  Profs. C. Djerassi (Chemistry),
J. Lederberg (Genetics), and E. Feigenbaum (Comp. Sci.)

(Grant NIH RR-00612-06, 3 years, $240,967 this year)

OVERVIEW

   In the period August 1975 to July 1976 the DENDRAL programs and the
gas chromatography/mass spectrometry (GC/MS) data system have made
significant progress toward the goals stated in the research proposal.
This report of progress is organized in three parts, corresponding to the
three specific aims of our December, 1973, proposal: (PART 1) Enhancing
the power of the mass spectrometry resource, (PART 2) Developing
performance and theory formation programs, and (PART 3) Applying the
computer programs and instrumentation to biomedically relevant structure
elucidation problems.

   The DENDRAL project, one of the major users at Stanford of the
SUMEX-AIM computer facility, has also been forming its own community of
remote users.  This national "EXODENDRAL" community has already provided
valuable contributions to program development and both the community and
contributions are expected to grow at an increased rate.

PART 1: ENHANCING THE POWER OF THE M.S. RESOURCE

1.1 Introduction

   Our grant proposal requested funds for significant upgrading of our
capabilities in mass spectrometry.  The goals of this upgrading were to
provide routine high resolution mass spectrometry (HRMS), combined gas
chromatography/low resolution mass spectrometry (GC/LRMS) and to develop a
combined gas chromatography/high resolution mass spectrometry (GC/HRMS)
facility.  In addition, this would provide the capability for new
experiments in the detection and utilization of data on metastable ions.
These capabilities would then be available as required for application to
our wider goal, solution of biomedical structure elucidation problems of a
community of researchers.

   The upgrading included several items of hardware and software
development, as follows: 1) Acquire stand-alone computer support for the
mass spectrometer because existing facilities were inadequate and very
expensive; 2) convert existing software, written in the PL/ACME language,
into FORTRAN so that it would run on the new system; 3) develop new
software as required for the demanding task of GC/HRMS; 4) provide
hardware and software for semi-automatic acquisition of data on metastable
ions.  The initial development phase of this upgrading included
performance tests to determine the capabilities and limitations of the
GC/HRMS system to define the scope of problems to which it can be applied.
The past year's efforts (year two of the DENDRAL grant) have culminated in
accomplishment of many of the above goals for development.  In the first
year, the computer system (a Digital Equipment Corp. PDP 11/45) was
purchased and installed; it is now operating routinely in conjunction with
the mass spectrometer (a Varian-MAT 711) and an auxiliary PDP 11/20
system.  Program conversion and modification for the initial version of the
software system was completed and the computer system now provides
complete stand-alone support for our experiments in mass spectrometry.
Over the past year we have developed further our philosophy of data
acquisition and reduction based on computed models of the actual
performance of the mass spectrometer.  This was and is necessary for
routine automated collection and reduction of combined GC/HRMS data with
minimal operator intervention in the procedures.

   The system development is motivated by two goals.  First, the system
must be robust in the sense that it continues to operate under a variety
of changing conditions, including intermittent misbehavior of the mass
spectrometer.  Robustness ensures that the system can recover from
hardware or software error conditions to prevent fatal "crashes" of the
system and resulting loss of data.  Second, the system must automate the
GC/HRMS task.  The volume of data acquired in GC/HRMS experiments can be
efficiently handled only when every spectrum can be acquired and reduced
for final output by the system without manual intervention.  We are
successful in these goals because we have written the software to
determine the actual performance of the mass spectrometer and to base
subsequent calculations on that measured performance, as opposed to
some hypothetical ideal.

   We are now providing GC/HRMS service on a limited basis as we
improve the system.  The time devoted to system development and testing
will slowly diminish over the next year, leaving additional time for
analysis of mixtures obtained in our own work and that of our
collaborators.  We have deferred implementation of the metastable system
(see below) while the GC/HRMS development is continuing, although we have
completed the hardware and much of the software for the system.

   Because we view GC/HRMS as the most important new capability of our
mass spectrometer/computer work, the requirements of GC/HRMS have guided
development of the software system.  These requirements include continuous
automatic monitoring of instrument performance to avoid wasting time
collecting poor or erroneous data.  By approaching GC/HRMS with an
electrical recording system, we can monitor the instrument continuously,
both during initial setup and during the course of the GC/HRMS experiment.
While photographic recording may capture more of the signal, it is
vulnerable to fluctuations in sample and instrument behavior in addition
to the difficulties in reading the data from film for computer analysis.
Major sections of the software and how they interact with one another are
summarized below.



   During the past year the routine production usage of the HRMS data
has become a reality.  The direct utilization of the system for the
acquisition of high resolution mass spectrometry data typically occupies 6
hours per day.  This figure does not include time for the post-processing
of data,  retrieval of data from the archival data base, or for the
generation of duplicate print outs of selected data.  These demands add 1
to 2 hours of system service each day to the total high resolution system
requirements.

    Low resolution mass spectral data, whether smoothed from high
resolution data or obtained directly as low resolution data, place
additional time demands upon the data system.  High to low resolution
conversion,  low resolution plotting, and low resolution spectral library
searching have all generated a need for increasing amounts of system time.

   In an effort to utilize the data system more completely during non-
prime time, batch and spooling mechanisms have been constructed. The high
resolution spectral reviewing mechanism may be actuated and then left
unattended while the hard-copies are being generated.  The high to low
resolution conversion process contains a mechanism for the generation of a
low resolution plotting spool which can be played without operator
intervention.  Batch procedures have been written which provide for the
archival of newly acquired spectral data in the archival data base.
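
The unattended plotting spool described above can be pictured as a queue that is filled during conversion and drained later. The sketch below is purely illustrative and is not the PDP-11 implementation: the `PlotSpool` class, the `to_low_resolution` function, and the choice to merge high resolution peaks by rounding to the nearest integer mass are all assumptions made for the example.

```python
from collections import deque


def to_low_resolution(peaks):
    """Merge (mass, intensity) peaks onto nominal integer masses (assumed scheme)."""
    merged = {}
    for mass, intensity in peaks:
        nominal = round(mass)
        merged[nominal] = merged.get(nominal, 0.0) + intensity
    return sorted(merged.items())


class PlotSpool:
    """FIFO of pending plot jobs, filled during conversion and played later."""

    def __init__(self):
        self._jobs = deque()

    def enqueue(self, spectrum_id, peaks):
        self._jobs.append((spectrum_id, list(peaks)))

    def play(self, plot):
        """Pass every queued job to `plot` with no operator input."""
        finished = []
        while self._jobs:
            spectrum_id, peaks = self._jobs.popleft()
            plot(spectrum_id, peaks)
            finished.append(spectrum_id)
        return finished


# Usage: convert one spectrum, spool it, and play the spool unattended.
spool = PlotSpool()
high_res = [(43.0184, 10.0), (43.0548, 5.0), (58.0419, 20.0)]
spool.enqueue("scan-001", to_low_resolution(high_res))

plotted = []
spool.play(lambda spectrum_id, peaks: plotted.append((spectrum_id, peaks)))
print(plotted)
```

Separating the step that queues work from the step that plays it is what lets the plotting run during non-prime time.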

    As with any system as large as the high resolution system there is a
continual need for system maintenance and minor software upgrades. A
wider range of data acquisition and analysis places new demands upon the
system which require further modification of the software.

    The net result of the production demands has been to reduce the
amount of system time available for the development of new software
facilities.  Software development and production compete for the available
system time, reducing the productivity of both the chemical user and the
software developer.  This competition can be drastically reduced if
software development can proceed on a machine separate from that on which
production is done.  The SUMEX PDP-10 and TENEX operating system provide a
more tractable medium for development than does the restricted environment
of the PDP-11.

    A major factor in the ease with which programs can be constructed is
the ease with which text can be manipulated.  The TV-EDIT program which is
available on the PDP-10 has proven to be effective for this task.  This
program provides an extremely flexible text editing system for display
terminals.  The mechanics of program construction can be greatly
simplified by the utilization of this facility.  Typically all major text
modifications of programs (more than a few changes) are carried out on the
PDP-10 using TV-EDIT and then transferred to the PDP-11.  Thus the task of
writing FORTRAN programs is simplified even though there are FORTRAN
incompatibilities between the two machines.

    While TV-EDIT has reduced development demands on the PDP-11 by
eliminating PDP-11 text editing sessions, the problems of program
compilation and debugging remain.  Clark Wilcox, of the SUMEX staff, has
provided an effective solution to these problems with the development of
the MAINSAIL (Machine Independent SAIL) compiler.  This compiler provides
the user with a powerful machine independent structured language.  Not
only is the compiler machine independent, but it also exhibits superior
execution speeds and storage requirements as compared to the DOS 9 FORTRAN
which has been used previously.

    The combination of TV-EDIT and MAINSAIL has proven to be an
effective method for the development of software for the PDP-11s within
the PDP-10 environment.  Most debugging can be carried out on the PDP-10
and then transferred to the PDP-11s for final debugging of machine-
dependent facilities.  The class of machine-dependent facilities includes
device drivers and interaction with the operating system.   The class of
machine-independent facilities includes analysis algorithms, file
manipulation, and most other programs which need development.  This means
that the amount of time required on the PDP-11 for program development can
be reduced significantly using the aforementioned process, leaving more
time for production demands.

1.2 Summary

    As the above hardware and software improvements are being made we
will continue evaluation of the GC/HRMS system in parallel with its actual
application to real problems.  GC/HRMS is a relatively new and difficult
technique for routine application.  In order to use it effectively, we
will have to exert some effort toward determining and optimizing the
performance of the many elements of the system: the GC, the MS, and the
computer hardware and software.

PART 2: DEVELOPING PERFORMANCE AND THEORY FORMATION PROGRAMS
      TO ASSIST IN BIOMEDICAL STRUCTURE ELUCIDATION PROBLEMS

2.1 Introduction

   The Heuristic DENDRAL computer programs assist with structure
elucidation problems by helping interpret mass spectra and helping
generate structures that are consistent with data obtained from a variety
of spectroscopic and physical/chemical sources.  The Meta-DENDRAL programs
assist with rule formation problems in cases where the rules of mass
spectrometry are not known.

   Both the interpretation and rule formation programs are written as
interactive tools to be controlled by professionals to combine the
professional's judgment with the computer's combinatorial power.

2.2   CONGEN

   The CONGEN[48,53] program represents a significant extension of a
program which has developed over the last several years, the cyclic
structure generator[40,41].  The purpose of CONGEN is to assist the
chemist in determining the chemical structure of an unknown compound by 1)
allowing him to specify certain types of structural information about the
compound which he has determined from any source (e.g., spectroscopy,
chemical degradation, method of isolation, etc.) and 2) generating an
exhaustive and non-redundant list of structures that are consistent with
the information.  The generation is a stepwise process, and the program
allows interaction at every stage; based upon partial results the chemist
may be reminded of additional information which he can specify, thus
limiting further the number of final structures.

   CONGEN fits with the other DENDRAL programs as a "backstop" solution
to structure elucidation problems.  If the mass spectrum of an unknown
compound is available, then CLEANUP and MOLION could be used, but if the
general class of the compound is not known, PLANNER has no starting point
from which to work.  In such cases,  structural information can be
extracted manually from the spectrum and given to CONGEN for analysis.
Because CONGEN makes no assumptions about the source of this information,
other spectroscopic or chemical techniques may be used to supply
supplemental data.

    At the heart of CONGEN are two algorithms whose accuracy has been
mathematically proven and whose computer implementation has been well
tested.  The structure generation algorithm[31,37,40,41] is designed to
determine all topologically unique ways of assembling a given set of
atoms, each with an associated valence, into molecular structures.   The
atoms may be chemical atoms with standard chemical valences, or they may
be names representing molecular fragments ("superatoms") of any desired
complexity, where the valence corresponds to the total number of bonding
sites available within the superatom.  Because the structure generation
algorithm can produce only structures in which the superatoms appear as
single atoms (we refer to these as intermediate structures), a second
procedure, the imbedding algorithm[48,53], is needed to expand the
superatoms to their full chemical identities.
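
The idea of exhaustive, non-redundant structure generation can be illustrated on a very small scale. The sketch below is a brute-force enumeration, not the proven algorithm used in CONGEN: it tries every assignment of bond orders, keeps those that respect each atom's valence and the hydrogen count, and removes duplicates by comparing all relabelings of like atoms. It does not handle superatoms or imbedding, and all names in it are hypothetical.

```python
from itertools import combinations, permutations, product

VALENCE = {"C": 4, "N": 3, "O": 2}   # standard valences; hydrogens are implicit


def is_connected(n, bonds):
    """True if the n atoms form a single piece under bonds of order > 0."""
    reached, frontier = {0}, [0]
    while frontier:
        atom = frontier.pop()
        for (i, j), order in bonds.items():
            if order and atom in (i, j):
                other = j if atom == i else i
                if other not in reached:
                    reached.add(other)
                    frontier.append(other)
    return len(reached) == n


def canonical(atoms, bonds):
    """Smallest bond list over all relabelings that preserve atom types."""
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):
        if any(atoms[perm[i]] != atoms[i] for i in range(n)):
            continue
        key = tuple(sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for (i, j), order in bonds.items() if order
        ))
        best = key if best is None else min(best, key)
    return best


def generate(atoms, hydrogens, max_order=3):
    """List the topologically unique connected structures for a composition."""
    n = len(atoms)
    pairs = list(combinations(range(n), 2))
    seen, structures = set(), []
    for orders in product(range(max_order + 1), repeat=len(pairs)):
        bonds = dict(zip(pairs, orders))
        free = [VALENCE[a] for a in atoms]      # valences left for hydrogens
        for (i, j), order in bonds.items():
            free[i] -= order
            free[j] -= order
        if min(free) < 0 or sum(free) != hydrogens:
            continue
        if not is_connected(n, bonds):
            continue
        key = canonical(atoms, bonds)
        if key not in seen:
            seen.add(key)
            structures.append(key)
    return structures


# C2H6O: the two isomers are ethanol (C-C-O) and dimethyl ether (C-O-C).
for structure in generate(["C", "C", "O"], hydrogens=6):
    print(structure)
```

The brute-force approach becomes impractical beyond a handful of atoms, which is why a generator that constructs only unique structures, as described above, is needed for real problems.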

    These two routines give the chemist the ability to construct
structures from a given set of molecular "building blocks" which may be
atoms or larger fragments.  By itself, this capacity is of limited utility
because the number of final structures can be overwhelming in many cases.
Usually, the chemist has additional information (if only some general
rules about chemical stability, which the program has no concept of) that
can be used to limit the number of structural possibilities.  For example,
he may know that because of a compound's stability, it cannot contain a
peroxide linkage (O-O) and thus the programs need not consider such
structures when there are two or more oxygens in the "building block"
list.
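
The peroxide example can be sketched as a filter over candidate structures. This is a simplified illustration, assuming a constraint is just a forbidden pair of bonded atom types; the actual constraints in CONGEN are substructures, and the function names and data layout here are hypothetical.

```python
def has_bond(atoms, bonds, first, second):
    """True if some bond joins an atom of type `first` to one of type `second`."""
    wanted = sorted((first, second))
    return any(sorted((atoms[i], atoms[j])) == wanted for i, j in bonds)


def apply_badlist(candidates, badlist):
    """Keep only candidates that contain none of the forbidden bonded pairs."""
    return [
        (atoms, bonds)
        for atoms, bonds in candidates
        if not any(has_bond(atoms, bonds, a, b) for a, b in badlist)
    ]


# Two candidate skeletons for CH4O2, each given as (atom types, bonded index pairs).
atoms = ("C", "O", "O")
candidates = [
    (atoms, [(0, 1), (1, 2)]),   # C-O-O, contains a peroxide linkage
    (atoms, [(0, 1), (0, 2)]),   # O-C-O, no O-O bond
]

kept = apply_badlist(candidates, badlist=[("O", "O")])
print(kept)
```

Only the O-C-O skeleton survives the filter. Applying such constraints during generation rather than afterwards avoids ever constructing the rejected structures.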

   In the past year CONGEN has reached the level of a practical
production program which can aid chemists, both locally and at remote
network sites,  in solving the structures of drug-related compounds and
natural products.  The development of this program during the year has
been strongly guided by the difficulties and new requirements which have
appeared as it was applied to a wide variety of cases, and its efficiency
and usefulness have increased dramatically.  We report here the details of
the modifications and additions we have made to CONGEN, and the effects
they have had on its utility.  Also, because of the rich repertoire of
structure modification and testing functions available within CONGEN, we
have found it to be an invaluable "laboratory" for the testing of new
ideas, and we briefly describe several pilot projects which form the basis
for future research.  Discussion of applications of CONGEN to problems of
biochemical interest is included in Part 3.

    NEW CAPABILITIES FOR THE USER.  There have been several additions to
CONGEN which are visible to the user and which generally increase the
flexibility and power of the program.  These include:

1) Making CONGEN aware of aromaticity, a chemical property of molecules
which results from certain combinations of double bonds in rings.
Aromaticity has a profound effect upon both the chemical reactivity
and symmetry properties of molecules, and CONGEN can now be directed
to detect aromaticity in its output structures, to compensate for the
difference between the actual symmetry of an aromatic system and the
symmetry which appears in the graph representing it, and to
distinguish aromatic from non-aromatic atoms when it tests GOODLIST
and BADLIST entries.

2) Giving the user the ability to type "?" at any prompt in the program,
which results in a summary of the possible inputs.  In some cases this
summary is a list of possible commands, while in others it is a short
explanatory message.  A new interactive teletype-input routine was
developed which makes it easy to include such help messages in the
program, and which mimics the handy command-recognition and command-
completion features of the TENEX operating system.

3) Including new specifications in the EDITSTRUC language for describing
substructural features.  The user can now declare a bond in a
substructure to be an "anybond",  which means that the atoms at the
termini are connected but that the multiplicity of the connection is
unspecified.  This is especially handy when defining substructures
containing aromatic portions because bond multiplicity is an
indistinct concept in aromatic systems.  Another new structural
element which can be specified is a "linknode", a node which stands
for a variable-length chain of atoms of the given type rather than a
single atom.  The minimum and maximum lengths of such a chain can be
specified as well.  The linknode feature is useful for defining
constraints on ring fusions and other constraints such as Bredt's rule
which depend on path length.  Other extensions have been made internal
to CONGEN which will shortly be reflected in the user-level language
of EDITSTRUC.  These include numerical inequalities involving node
properties (e.g., "the number of H's on atom 3 is greater than the
number of H's on atom 5") or linknode lengths (e.g., "the sum of the
lengths of linknodes 2 and 6 is greater than 5"), and greater control
over the number of fittings found for a GOODLIST constraint (e.g., the
ability to distinguish between "the number of N's in six-membered
rings" and "the number of six-membered rings containing N").

4) Allowing greater flexibility in the selection of terminal type. This
choice controls the output of structural drawings so they are best
  suited to the user's terminal.  Several different types of character-
oriented and graphics-display terminals are now supported.



5) Making CONGEN accessible from the GUEST login account at SUMEX. This
involved preventing a GUEST user from reaching certain critical points
in CONGEN which would allow greater system access than is normally
authorized for guests.  We can now offer trial access to CONGEN via
the guest mechanism without worrying about SUMEX misuse.

6) Creating a BATCH command for CONGEN. This allows the user to submit
time-consuming, compute-bound calculations to the batch-processing
facility of SUMEX.  The computation is then run automatically at
off-hours when it will not overload the system resources.  The user can
now run CONGEN in its interactive mode to input all of his data and
then submit the large tasks to BATCH for overnight processing.

7) Including a pruning function MSPRUNE which is used to test a list of
candidate structures for consistency with a set of observed peaks from
a mass spectrum.  The candidates are typically generated by CONGEN
using structural data from other sources.  The user specifies the
observed MS peaks (high- or low-resolution, or a combination of both)
along with a set of constraints on the allowed cleavage processes.
MSPRUNE retains only those candidates which can account for the
observations via one of these allowed processes.   The constraints
speak of the number of bonds broken and the number of steps in a
process, the proximity of pairs of cleaved bonds (i.e., whether or not
two adjacent bonds can break in a given process), the multiplicity or
aromaticity of each cleaved bond and the possible neutral transfers.
MSPRUNE is the first CONGEN function which can aid directly in the
interpretation of "raw" spectral data.
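
   The pruning idea behind MSPRUNE (item 7) can be sketched as follows.
This is a simplified illustration under stated assumptions, not the
MSPRUNE algorithm: each candidate is a graph of groups with nominal
masses, only the "number of bonds broken" and "adjacent bonds"
constraints are modeled, and neutral transfers, bond multiplicity and
multi-step processes are ignored.  All names are hypothetical.

```python
from itertools import combinations


def components(n_atoms, bonds):
    """Connected components (sets of node indices) of an undirected graph."""
    adj = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            a = stack.pop()
            if a in comp:
                continue
            comp.add(a)
            stack.extend(adj[a] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def fragment_masses(masses, bonds, max_breaks=1, allow_adjacent=False):
    """Nominal masses of every fragment reachable under the constraints."""
    found = set()
    for k in range(1, max_breaks + 1):
        for broken in combinations(bonds, k):
            # Two bonds are adjacent when they share a node.
            if not allow_adjacent and any(
                    set(a) & set(b) for a, b in combinations(broken, 2)):
                continue
            remaining = [b for b in bonds if b not in broken]
            for comp in components(len(masses), remaining):
                found.add(sum(masses[i] for i in comp))
    return found


def msprune(candidates, peaks, max_breaks=1, allow_adjacent=False):
    """Keep candidates whose allowed cleavages account for every peak."""
    return [c for c in candidates
            if set(peaks) <= fragment_masses(
                c["masses"], c["bonds"], max_breaks, allow_adjacent)]


chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
candidates = [
    # Groups: CH3 = 15, CO = 28, CH2 = 14 (nominal masses)
    {"name": "2-pentanone", "masses": [15, 28, 14, 14, 15], "bonds": chain},
    {"name": "3-pentanone", "masses": [15, 14, 28, 14, 15], "bonds": chain},
]

survivors = msprune(candidates, peaks=[43, 71])
print([c["name"] for c in survivors])
```

With single-bond cleavages only, a peak at mass 43 can be produced by the
first candidate but not the second, so only the first is retained.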

2.3   Meta-DENDRAL Rule Formation Programs

   The INTSUM program [34] is in routine, production use to assist in
interpretation of the mass spectra of new classes of molecules (see Part 3
for details).  When the mass spectrometry rules for a given class of
compounds are not known, the INTSUM, RULEGEN and RULEMOD programs can help
a chemist formulate those rules.  Essentially, these programs categorize
the plausible fragmentations for a class of compounds by looking at the
mass spectra of several molecules in the class.  All molecules are assumed
to belong to one class whose skeletal structure must be specified.  Also,
the mass spectra and the structures of all the molecules must be given to
the program.

   INTSUM collects evidence for all possible fragmentations (within
user-specified constraints) and summarizes the results.  For example, a
user may be interested in all fragmentations involving one or two bonds,
but not three; aromatic rings may be known to be unfragmented; and the
user may be interested only in fragmentations resulting in an ion
containing a heteroatom.  Under these constraints, the program correlates
all peaks in the mass spectra with all possible fragmentations.   The
summary of results shows the number of molecules in whose spectra there is
evidence for each particular fragmentation, along with the total (and
average) ion current associated with the fragmentation.
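
   The kind of summary INTSUM produces can be sketched as below.  This
is an illustration, not INTSUM itself: the input layout, the
fragmentation labels and the function name intsum_summary are
hypothetical, and the average is taken over the molecules showing
evidence, which is an assumption the report does not spell out.

```python
from collections import defaultdict


def intsum_summary(molecules):
    """Tally, per fragmentation, how many molecules show a peak at the
    predicted mass, plus total and average ion current for those peaks.

    Each molecule is a dict with:
      'fragmentations': {label: predicted fragment mass}
      'spectrum':       {mass: intensity}  (integers, 0 to 1000)
    """
    stats = defaultdict(lambda: {"molecules": 0, "total": 0})
    for mol in molecules:
        for label, mass in mol["fragmentations"].items():
            intensity = mol["spectrum"].get(mass, 0)
            if intensity > 0:
                stats[label]["molecules"] += 1
                stats[label]["total"] += intensity
    return {
        label: {
            "molecules": s["molecules"],
            "total": s["total"],
            "average": s["total"] / s["molecules"],
        }
        for label, s in stats.items()
    }


molecules = [
    {"fragmentations": {"alpha-1": 43, "alpha-2": 71},
     "spectrum": {43: 1000, 58: 300, 71: 120}},
    {"fragmentations": {"alpha-1": 57, "alpha-2": 85},
     "spectrum": {41: 200, 57: 800}},
]

summary = intsum_summary(molecules)
for label, row in sorted(summary.items()):
    print(label, row)
```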

   The RULEGEN program attempts to explain the regularities found by
INTSUM in terms of the underlying structural features around the bonds in
question that seem to "drive" the fragmentations.  For example, INTSUM
will notice significant fragmentation of the two different bonds alpha to
the carbonyl group in aliphatic ketones.  It is left to RULEGEN to
discover that these are both instances of the same fundamental alpha-
cleavage process that can be predicted any time a bond is alpha to a
carbonyl group.

   The RULEMOD program modifies and condenses the set of rules produced
by INTSUM and RULEGEN together.  It looks at the negative evidence
associated with each candidate rule in order to select the best ones, then
merges rules that seem to explain the same breaks (if possible).   The
program was substantially improved in several ways, as described in the
next section.

2.3.1  INTSUM Improvements

   Transfers of arbitrary neutral species can now be specified as part
of the mass spectrometry processes, instead of transfers of hydrogen atoms
alone.  This capability increases the utility of the program in at least
two ways: first,  it allows a chemist to control the program better -- to
produce the kinds of results that are more chemically meaningful -- and
second,  it allows the program to explore more complex processes within its
space and time limitations.  For example,  carbon monoxide and water were
listed as plausible neutral molecules to transfer in or out of fragments
for the triketoandrostanes.  Thus, the processes are listed with and
without these transfers, just as chemists prefer, instead of showing loss
of CO as a set of two breaks around the keto group, or loss of H2O as loss
of oxygen (breaking the C=O bond) accompanied by loss of two hydrogens.
What is more, the program can now produce these results without violating
its chemical heuristics of (a) not breaking adjacent bonds, and (b) not
breaking double bonds.  This economy also pays off in increasing the
complexity of the processes that can be considered.  Because loss of CO,
for example, is a result of a transfer instead of the result of breaking
two bonds, the number of bonds broken in accompanying processes can be
increased by two.

    Another INTSUM improvement was to increase the options for initial
data filtering.  Thresholding is too simple for many problems, so we now
provide an option to cluster peaks and select the n largest peaks from
each cluster.
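
    A minimal sketch of this filtering option follows.  The report does
not say how clusters are formed; the sketch assumes that peaks whose
masses differ by no more than max_gap from the previous peak belong to
the same cluster.  The function names and the max_gap parameter are
hypothetical.

```python
def cluster_peaks(peaks, max_gap=1):
    """Group (mass, intensity) peaks into runs of neighbouring masses."""
    clusters = []
    for mass, intensity in sorted(peaks):
        if clusters and mass - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append((mass, intensity))
        else:
            clusters.append([(mass, intensity)])
    return clusters


def top_n_per_cluster(peaks, n=1, max_gap=1):
    """Keep the n most intense peaks of each cluster, in mass order."""
    selected = []
    for cluster in cluster_peaks(peaks, max_gap):
        best = sorted(cluster, key=lambda p: p[1], reverse=True)[:n]
        selected.extend(sorted(best))
    return selected


peaks = [(41, 200), (42, 50), (43, 1000),
         (55, 30), (56, 10), (57, 400), (58, 90)]

print(top_n_per_cluster(peaks, n=1))
print(top_n_per_cluster(peaks, n=2))
```

Unlike a single global threshold, this keeps the strongest peaks of a
weak cluster even when a much stronger cluster is present elsewhere in
the spectrum.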

    The format of the input data is also now less strict than before.
We have written programs to read spectra in Aldermaston format.   And we
have merged CONGEN's EDITSTRUC package into the INTSUM setup routines to
allow a chemist to associate structures with spectra interactively.  This
greatly decreases the chances of error in setting up the input data.

   Several modifications were also made to the program to increase its
efficiency, e.g., processing all intensities as integers (between 0 and
1000).

2.3.2 RULEGEN Improvements



   The evaluation of prospective rules in RULEGEN guides the entire
rule generation procedure.  To tune this procedure, we modified the
evaluation function in several ways and compared the resulting sets of
rules.  We were looking for an objective way of telling the program to
keep rules general, but "not too general".   The current evaluation
function is substantially improved as a result.

   Because the RULEGEN program searches such a large space of partial
and complete rules, it requires large amounts of computer time (sometimes
more than 60 cpu minutes).  Thus, we have investigated several
improvements for efficiency alone.  In addition, we have made the program
easier to set up and run in batch mode to reduce the chemist's personal
time investment.  And we have made the program easily restarted from any
intermediate point -- to protect the chemist from machine failures.

2.3.3 RULEMOD Improvements

   At the time of the last annual report RULEMOD was a new program
still in its experimental stages.  Since then we have added new
subprograms and integrated the program with other programs to make it a
useful and necessary part of Meta-DENDRAL.

   Two new subprograms greatly improve RULEMOD's performance.  (1) A
program to add specifications to rules was completed.   It looks for
plausible ways of making a rule more specific in order to decrease the
number of counterexamples to the rule.  (2) A complementary program to
make rules more general was also completed.  The program tries to find
ways to reduce the number of descriptors on nodes of subgraphs in order to
increase the breadth of applicability of rules.  Its major constraint is
that it cannot make any change that would increase the number of
counterexamples.  Both of these subprograms make the final rules much
closer to rules that chemists approve of.

   The subprogram that merges rules was also improved.  The program
tries to merge pairs of rules into a more general form for economy and
clarity of rules.  Its major constraint is that no explanations are lost,
i.e., all the data points explained by the initial pair of rules will
still be explained after merging.  Formerly we insisted that the more
general form must cover all the same data points as the initial rules, but
this was found to be too narrow a constraint.  By giving the program a
more global view of the entire set of rules, we can let the more general,
merged form explain fewer data points than its component rules as long as
other rules explain the remainder.
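
   The difference between the former and the current merging criterion
can be stated compactly.  In the sketch below each rule is reduced to
the set of data points it explains; this representation and the two
function names are assumptions made for illustration and do not reflect
how RULEMOD stores its rules.

```python
def merge_allowed_strict(merged, rule_a, rule_b):
    """Former criterion: the merged rule alone must explain every data
    point explained by either of the two initial rules."""
    return (rule_a | rule_b) <= merged


def merge_allowed_global(merged, rule_a, rule_b, other_rules):
    """Current criterion: no explanation may be lost, but points missed
    by the merged rule may be picked up by other rules in the set."""
    covered = set(merged)
    for rule in other_rules:
        covered |= rule
    return (rule_a | rule_b) <= covered


rule_a = {1, 2, 3}
rule_b = {3, 4, 5}
merged = {1, 2, 3, 4}      # the merged form no longer explains point 5
other_rules = [{5, 6}]     # but another rule in the set does

print(merge_allowed_strict(merged, rule_a, rule_b))
print(merge_allowed_global(merged, rule_a, rule_b, other_rules))
```

Here the merge is rejected under the former criterion and accepted under
the current one, because data point 5 is still explained by another rule.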

PART 3:  APPLICATIONS TO BIOMEDICAL STRUCTURE ELUCIDATION PROBLEMS

3.1 Introduction

   In our grant proposal we discussed the application of the
instrumentation and computer programs described above to the study of
molecular structure problems in a variety of biomedical application
areas.  This is our primary research area, and we discussed specific
classes of problems and compounds for investigation.  We also made it
quite clear that our facilities would be made available to a wider community
of collaborators/users as our resources permitted.  Both categories of
application, i.e., within our own group, and with an outside group, are
described in some detail below.

    Our last annual report described several steps taken to encourage a
broad community of researchers to use our facilities.  For example, we
sent a questionnaire to members of the American Society for Mass
Spectrometry, Committee III on Computer Applications, and a follow-up
letter to persons indicating a desire to know more about access to our
programs.  The same note has been sent to several other persons whom we
know from personal contacts might be interested.  Because of the nature of
their investigations, many of these people receive NIH support.   Several
of our publications (e.g., [45]-[49]) mention the availability of our
programs.  In addition, through individual contacts and formal
presentations at conferences we have been encouraging outside use of the
programs.

    The availability of SUMEX as a mechanism for resource sharing has
made it possible for us to extend access to our programs to a number of
people.  Without SUMEX, this access would be impossible, and most of our
programs (those which are not easily exportable) could be used only by
ourselves.

3.2  Applications by Professor Djerassi's Research Group

   Our existing grants, outlined below, mesh well with our
instrumentation and program development under the present award.  Under
NIH Grant GM06840 we have been studying natural products from marine
sources with major emphasis on terpenoids and sterols.   For this work we
have been dependent on the use of our 711 instrument for high resolution
mass spectrometry which we require for the identification of all new
compounds, many of which are present in only very small quantities. We
are particularly anxious to have access to GC coupled with a high
resolution mass spectrometer because we hope to be able to screen large
numbers of marine animals for their sterol content using this technique.

   We are currently engaged in intensive efforts in analysis of
mixtures of marine sterols involving our computer-based procedures.  The
program for the development of the computer operated and assisted system
of marine sterol structure analysis has been planned to proceed in three
stages:

1) Analysis of all literature published concerning marine sterols so
that a complete listing of known sterol structures and organisms
studied could be compiled.

2) Collection, evaluation,  digitization and computer file construction
for the mass spectra of all known marine sterols, followed by the
institution of a computer operated file search sequence for direct
analysis of marine sterol GC-MS data.



3) The application of the INTSUM, RULEGEN, and RULEMOD programs to the
computer file of marine sterol spectra so that a series of
fragmentation rules can be extracted for use in the generation of
possible structures from mass spectral data for new marine sterols,
that is, sterols whose mass spectra cannot be matched with any
spectra contained in the computer search file.

3.3 Applications of Programs by External Scientists

   The DENDRAL project, still one of the major users of the SUMEX-AIM
computer facility, has formed a small community of regular, remote users.
This "exodendral" community has continued to provide valuable
contributions to program development, although the growth of this
community has had to be slowed in response to increasing demands by other
projects upon the SUMEX-AIM facility.  As an example, for the months of
September 1975 to February 1976, the number of CPU hours used by
exodendral persons amounted to at least 8 percent of the CPU hours used by
the DENDRAL project.  There are currently four remote chemist-users whose
groups regularly use CONGEN in their day-to-day work.  Additionally,
there are several remote users who use their accounts on an occasional
basis, or who access SUMEX-AIM via the GUEST mechanism.

   The SUMEX-AIM facility has grown markedly in number of projects over
the past year.  Due to this increase in system loading, the DENDRAL
project, which had previously been able to offer trial usage of its
programs to almost any chemist who expressed a need to use the programs,
has found itself in the unfortunate position of having to carefully
screen potential collaborators.  Those chemists who have been granted
access have been requested to restrict their usage to off-prime time
hours.  CONGEN, the DENDRAL program which receives most of this usage, has
evolved in a manner designed to try to remedy the system loading problem
which can be created by the enthusiasm of its chemist-users.  Since a
typical, long GENERATE, PRUNE or IMBED within CONGEN can be very time
consuming, as well as a voracious consumer of CPU cycles, a provision to
permit a user to easily take advantage of SUMEX-AIM's off-hour batch
processing has been implemented.  A CONGEN user can now interactively set
up his problem, and when ready to commence with a time consuming
procedure, can, from within CONGEN, request automatic submission to BATCH,
to be run late at night.  The CONGEN users also benefit from this
ability, in that they no longer must leave a terminal tied up during the
sometimes hour-long compute times.  This development, then, can be viewed
as responding to CONGEN users' needs as well as being an effort by the
DENDRAL project to be conscientious in its resource-sharing
responsibilities.

   Following is a brief summary of the major users of CONGEN over the
past year, as well as notes on chemists who contacted us about trial usage
of the programs.

   Dr. Clair Cheer, Professor of Chemistry, University of Rhode Island,
Kingston,  Rhode Island.  Dr. Cheer is on sabbatical leave from the
University of Rhode Island to the Stanford University Chemistry
Department.  He has, in recent work with Professor Djerassi's group,
demonstrated the utility of CONGEN in the identification of (+)-Palustrol,
a tricyclic sesquiterpene alcohol from the marine Xeniid Cespitularia
virdis (Cheer, Djerassi et al., Tetrahedron, in press).  Dr. Cheer plans
to continue his work with CONGEN once he returns to Rhode Island in
December.

   Dr. Jon Clardy, Professor of Chemistry, Iowa State University. Dr.
Clardy read of CONGEN in an article appearing in the Journal of the
American Chemical Society and contacted Professor Djerassi concerning the
possibility of using the program from Iowa.   He was offered GUEST access
during the winter of 1975, but has not yet had an opportunity to evaluate
the potentials of the program.

   Dr. Douglas Dorman, Eli Lilly Corp., Indianapolis, Indiana. Dr.
Dorman's research involves the identification and characterization of drug
related compounds by chemical and spectrographic methods.  Using primarily
the NMR and C13 NMR spectra of these various compounds, Dr. Dorman has
found CONGEN to be a time-saving adjunct to his structure elucidation
work.

   Dr. H.M. Fales, National Heart and Lung Institute, Bethesda,
Maryland.  Dr. Fales, along with Doctors Sanford Markey and Peter Roller
had a joint account set up for them in April of 1975.   Most of the use of
this account came during late summer at which time Dr. Fales experimented
with the use of CONGEN for assistance in the elucidation of the structure
of a novel quinolinone, known to be tumorigenic.  Although the crystal
structure had been solved at the time of his usage of CONGEN, Dr. Fales
felt that the program produced an abundance of useful ideas. The main
problem initially faced by Dr. Fales in using CONGEN was in getting a feel
for problem size and the effects of various constraint types.

   Professor Kenneth Gash, California State College at Dominguez Hills.
Professor Gash is a professor of chemistry who is on temporary leave to
Small College, the research branch of Dominguez Hills. Dr.  Gash did some
of the original work, in 1965, with Professor Morton Munk, on the
structure elucidation program developed at Arizona State University. Dr.
Gash has been reviewing some of the problems originally done with Munk's
program and has been studying input, output and constraint capabilities
found in CONGEN.  He has generally concluded that CONGEN provides an
excellent tool for the chemist to use in structure elucidation problems
subject to the constraint of slow system response time.

  Mr. Neil A. B. Gray, King's College, Cambridge, England.  Mr. Gray,
following a three week visit to the Stanford chemistry department,
requested copies of all the current DENDRAL programs to be sent to him in
England.  He is a chemist who has been working in areas related to
developments in various of the DENDRAL programs, and hopes to be able to
benefit from work already done at Stanford.  His current interest in
intelligent constraint application during structure elucidation merges
well with one of the directions in which CONGEN is tending to develop.
Unfortunately, Mr. Gray does not have access to an ARPANET or TYMNET node
to access SUMEX-AIM directly.  Therefore,  all collaboration has had to be
carried on by mail.



   Dr. Jerrold Karliner, Ciba Geigy Corporation, Ardsley, New York.
Dr. Karliner and his research group at Ciba-Geigy have become regular
users of CONGEN in their day-to-day operation of a research laboratory.
Dr. Karliner is a completely self-taught user of CONGEN, and has served to
encourage others to request permission to use this program.

   Dr. Milton Levenberg, Abbott Laboratories, Chicago, Illinois. Dr.
Levenberg has been an occasional user of CONGEN as an adjunct to his work
as head of a mass-spectrometry laboratory.  Primary usage has been to
provide assurance that the proposal of a structure for a compound on the
basis of chemical and spectroscopic evidence has not overlooked other
plausible possibilities.

   Dr. Gino Marco, Ciba Geigy Corporation, Greensboro, North Carolina.
Dr. Marco heard about CONGEN during a company seminar presented by Dr.
Karliner.  After a brief trial use via the GUEST mechanism, Dr. Marco
requested an account for use by his group of metabolic and organic
chemists.  Dr. Marco's research group studies unknown insect metabolites
by micro-IR and micro-NMR methods, and attempts structure elucidation
based on these forms of spectroscopic analysis.  Testing the utility of
the program before implementing it for day to day use, Dr. Marco
discovered that CONGEN could greatly narrow the alternatives of complex
metabolic conjugates which had to be considered in a typical elucidation
problem.  They have established a leased line to the nearest TYMNET node,
and expect increased CONGEN usage in the future.

   Professor G. Minole, Italy.   Professor Minole has been active in
elucidation of structures of marine natural products, an area of interest
which overlaps with our own.  We have provided, by written communication
due to absence of network access,  sets of structural alternatives in
current problems being studied by Professor Minole.   We have used some of
the mass spectrometric prediction functions of our DENDRAL programs to
determine which structures in a set of possibilities could yield the
observed mass spectral data.

   Professor Koji Nakanishi, Department of Chemistry, Columbia
University.  Professor Nakanishi is one of the most active and productive
persons engaged in structure elucidation activities.  He has developed an
active interest in CONGEN and is collaborating with us on several novel
problems.  One of these problems has involved the structure of the active
component of defense secretions of an insect (termite).   Other defense
secretion components are under investigation as we explore structural
alternatives based on current data.

   Dr. David Pensak, DuPont de Nemours and Company, Wilmington,
Delaware.  Dr. Pensak indirectly requested information about CONGEN through a letter
written by his immediate superior to Professor Lederberg.  Dr. Pensak has
been offered GUEST access, and has just begun a potential collaboration
with a DENDRAL group which is studying model builders and their production
of reliable geometries for certain types of molecules.

    Professor Manfred Wolff, University of California at San Francisco.
Dr. Wolff is chairman of the Department of Pharmacological Chemistry, and
inquired as to the possibilities of accessing SUMEX-AIM and appropriate
programs for a faculty which is interested in many aspects of drug design
and drug action, ranging from physical chemistry to purely biological
studies. He has been encouraged to use GUEST access to explore CONGEN,
although he has taken no action up to the present time.

   We have cases where requests for GUEST access had to be denied due
to system loading considerations. We made these decisions according to the
extent to which the requested use would fit within the research guidelines
of SUMEX/AIM and our own stated criteria from the 1973 proposal to NIH.
In one case, for instance, the use was for an individual's report on
potential educational uses of CONGEN.

FUNDING STATUS

   The DENDRAL project is in its sixth year of NIH funding through the
BRB (Grant RR-00612).  For the period 8/1/75 - 7/31/76 the total (direct
costs) amount awarded was $240,967.  After nine months of the seventh year
the project will cease to be supported by the current grant; a competing
renewal application will be submitted June 1, 1976.  For the nine-month
period 8/1/76 - 4/30/77 the total (direct costs) amount awarded is
$210,778.

INTERACTIONS WITH THE SUMEX-AIM RESOURCE

    The research summary above described several ways in which we see
the DENDRAL programs helping biomedical scientists.  See Part 3 for a list
of persons with whom we have actively collaborated.  One of the major
goals of the research is to extend the usefulness of the programs for just
such persons.

   The SUMEX-AIM community is an exciting and productive collection of
projects and individuals who contribute in many ways to the progress of
all projects in the community.  Our programming in INTERLISP, SAIL and
FORTRAN, for example, is speeded considerably by the ready availability of
expert programmers from many projects.  We have shared ideas about
intelligent interfaces between programs and users with members of the
MYCIN, X-Ray Crystallography and MOLGEN projects.  Perhaps the most used
and most useful means of communication is the SNDMSG program on SUMEX. It
is much more efficient than campus mail and much less intrusive, as well
as more efficient for multiple messages, than the telephone.  We are
cooperating with the SUMEX staff on the Bulletin Board facility, which
will be another efficient means of communicating, especially when the
sender of a message is not certain who the receivers should be.  (It will
allow potential receivers to say what they are interested in and notify
them of relevant bulletins, without the sender making an explicit
distribution list.)

   The SUMEX-AIM staff is the most professional computer facility staff
we have worked with over the ten year life of the DENDRAL project.   The
very low amount of unscheduled downtime is a direct indication of their
professional attitude and abilities.  Less measurably, the helpfulness of
the staff also translates directly into increased productivity for
DENDRAL.  There have been numerous instances of the SUMEX staff answering
our questions immediately and fixing errors in system programs for us as
quickly as we could expect.

   As the system becomes more heavily loaded, we notice longer and
longer delays in computer response time.  This is the one major criticism
voiced by DENDRAL project members.  Many of these persons have changed
their work habits to conform to the lighter loading between midnight and
5:00 because they cannot get any significant computing done during the
day.

SUMMARY OF PUBLICATIONS

(1) J. Lederberg, "DENDRAL-64 - A System for Computer Construction,
   Enumeration and Notation of Organic Molecules as Tree Structures and
   Cyclic Graphs", (technical reports to NASA, also available from the
   author and summarized in (12)).
   (1a) Part I.  Notational algorithm for tree structures (1964) CR.57029
   (1b) Part II.  Topology of cyclic graphs (1965) CR.68898
   (1c) Part III.  Complete chemical graphs; embedding rings in trees (1969)

(2) J.  Lederberg, "Computation of Molecular Formulas for Mass
  Spectrometry", Holden-Day, Inc. (1964).

(3) J. Lederberg, "Topological Mapping of Organic Molecules", Proc. Nat.
   Acad. Sci., 53: 1, January 1965, pp.  134-139.

(4) J. Lederberg, "Systematics of organic molecules, graph topology and
   Hamilton circuits,  A general outline of the DENDRAL system." NASA
   CR-48899 (1965)

(5) J. Lederberg, "Hamilton Circuits of Convex Trivalent Polyhedra (up to
   18 vertices)", Am. Math. Monthly, May 1967.

(6) G. L. Sutherland, "DENDRAL - A Computer Program for Generating and
  Filtering Chemical Structures", Stanford Artificial Intelligence
  Project Memo No. 49, February 1967.

(7) J.  Lederberg and E.  A. Feigenbaum,  "Mechanization of Inductive
  Inference in Organic Chemistry", in B.  Kleinmuntz (ed) Formal
  Representations for Human Judgment, (Wiley, 1968) (also Stanford
  Artificial Intelligence Project Memo No. 54, August 1967).

(8) J.  Lederberg, "Online computation of molecular formulas from mass
   number," NASA CR-94977 (1968)

(9) E. A. Feigenbaum and B. G. Buchanan, "Heuristic DENDRAL: A Program
  for Generating Explanatory Hypotheses in Organic Chemistry", in
  Proceedings, Hawaii International Conference on System Sciences, B.
   K. Kinariwala and F. F. Kuo (eds), University of Hawaii Press, 1968.

(10) B. G. Buchanan, G. L. Sutherland, and E. A. Feigenbaum, "Heuristic
   DENDRAL: A Program for Generating Explanatory Hypotheses in Organic
   Chemistry".  In Machine Intelligence 4 (B. Meltzer and D. Michie,
   eds) Edinburgh University Press (1969), (also Stanford Artificial
   Intelligence Project Memo No. 62, July 1968).

(11) E.  A. Feigenbaum, "Artificial Intelligence: Themes in the Second
   Decade".  In Final Supplement to Proceedings of the IFIP68
  International Congress, Edinburgh, August 1968 (also Stanford
  Artificial Intelligence Project Memo No. 67, August 1968).

(12) J. Lederberg,  "Topology of Molecules",  in The Mathematical Sciences -
  A Collection of Essays, (ed.) Committee on Support of Research in the
  Mathematical Sciences (COSRIMS), National Academy of Sciences -
  National Research Council, M.I.T. Press, (1969), pp.  37-51.

(13) G. Sutherland,  "Heuristic DENDRAL: A Family of LISP Programs", to
   appear in D.  Bobrow (ed), LISP Applications (also Stanford
  Artificial Intelligence Project Memo No. 80, March 1969).

(14) J. Lederberg, G. L. Sutherland, B. G. Buchanan, E. A.  Feigenbaum,
   A. V. Robertson, A. M. Duffield, and C. Djerassi, "Applications of
   Artificial Intelligence for Chemical Inference I.   The Number of
   Possible Organic Compounds: Acyclic Structures Containing C, H, O and
   N".  Journal of the American Chemical Society, 91:11 (May 21, 1969).

(15) A. M. Duffield, A. V. Robertson, C. Djerassi, B. G.  Buchanan, G.
   L. Sutherland, E. A. Feigenbaum, and J.  Lederberg,  "Application of
   Artificial Intelligence for Chemical Inference II.  Interpretation of
   Low Resolution Mass Spectra of Ketones".   Journal of the American
  Chemical Society, 91:11 (May 21, 1969).

(16) B.  G. Buchanan, G.  L. Sutherland, E. A.  Feigenbaum, "Toward an
   Understanding of Information Processes of Scientific Inference in the
  Context of Organic Chemistry", in Machine Intelligence 5, (B.
    Meltzer and D. Michie, eds) Edinburgh University Press (1970), (also
   Stanford Artificial Intelligence Project Memo No.  99, September
   1969).

(17) J.  Lederberg, G.  L. Sutherland, B. G. Buchanan, and E. A.
   Feigenbaum, "A Heuristic Program for Solving a Scientific Inference
   Problem: Summary of Motivation and Implementation", Stanford
  Artificial Intelligence Project Memo No. 104, November 1969.

(18) C.  W. Churchman and B. G. Buchanan,  "On the Design of Inductive
   Systems: Some Philosophical Problems".  British Journal for the
  Philosophy of Science, 20 (1969), pp.  311-323.

(19) G. Schroll, A. M. Duffield, C. Djerassi, B. G. Buchanan, G. L.
   Sutherland, E. A.  Feigenbaum, and J. Lederberg, "Application of
   Artificial Intelligence for Chemical Inference III.  Aliphatic Ethers
   Diagnosed by Their Low Resolution Mass Spectra and NMR Data".
  Journal of the American Chemical Society, 91:26 (December 17, 1969).

(20) A. Buchs, A. M. Duffield, G. Schroll, C. Djerassi, A. B.  Delfino,
   B. G.  Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J.
   Lederberg,  "Applications of Artificial Intelligence For Chemical
   Inference. IV.  Saturated Amines Diagnosed by Their Low Resolution
   Mass Spectra and Nuclear Magnetic Resonance Spectra", Journal of the
   American Chemical Society, 92, 6831 (1970).

(21) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M.  Duffield, C.
   Djerassi, B.G.  Buchanan, G.L. Sutherland, E.A.  Feigenbaum and J.
   Lederberg,  "Applications of Artificial Intelligence for Chemical
   Inference V.  An Approach to the Computer Generation of Cyclic
    Structures.  Differentiation Between All the Possible Isomeric
  Ketones of Composition C6H10O", Organic Mass Spectrometry, 4, 493
   (1970).

(22) A. Buchs, A.B. Delfino, A.M. Duffield, C. Djerassi, B.G. Buchanan,
   E.A.  Feigenbaum and J. Lederberg,  "Applications of Artificial
   Intelligence for Chemical Inference VI.  Approach to a General Method
   of Interpreting Low Resolution Mass Spectra with a Computer", Chem.
  Acta Helvetica, 53, 1394 (1970).

(23) E.A.  Feigenbaum, B.G. Buchanan, and J. Lederberg, "On Generality and
   Problem Solving: A Case Study Using the DENDRAL Program".   In Machine
   Intelligence 6 (B. Meltzer and D. Michie, eds.) Edinburgh University
   Press (1971).  (Also Stanford Artificial Intelligence Project Memo
   No. 131.)

(24) A. Buchs, A.B. Delfino, C. Djerassi, A.M. Duffield, B.G. Buchanan,
   E.A. Feigenbaum, J. Lederberg, G. Schroll, and G.L. Sutherland, "The
   Application of Artificial Intelligence in the Interpretation of Low-
   Resolution Mass Spectra", Advances in Mass Spectrometry, 5 (1971),
   314.

(25) B.G. Buchanan and J. Lederberg,  "The Heuristic DENDRAL Program for
   Explaining Empirical Data".  In proceedings of the IFIP Congress 71,
  Ljubljana, Yugoslavia (1971).  (Also Stanford Artificial Intelligence
  Project Memo No. 141.)

(26) B.G. Buchanan, E.A. Feigenbaum, and J.  Lederberg,  "A Heuristic
   Programming Study of Theory Formation in Science." In proceedings of
   the Second International Joint Conference on Artificial Intelligence,
   Imperial College, London (September, 1971).  (Also Stanford
  Artificial Intelligence Project Memo No. 145.)

(27) Buchanan, B. G., Duffield, A.M., Robertson, A.V., "An Application of
   Artificial Intelligence to the Interpretation of Mass Spectra", Mass
   Spectrometry Techniques and Appliances, Edited by George W. A.
   Milne, John Wiley & Sons, Inc., 1971, p. 121-77.

(28) D.H. Smith, B.G.  Buchanan, R.S. Engelmore, A.M. Duffield, A. Yeo,
    E.A.  Feigenbaum, J.  Lederberg, and C. Djerassi,  "Applications of
   Artificial Intelligence for Chemical Inference VIII.  An approach to
   the Computer Interpretation of the High Resolution Mass Spectra of
   Complex Molecules.  Structure Elucidation of Estrogenic Steroids",
  Journal of the American Chemical Society, 94, 5962-5971 (1972).


(29) B.G. Buchanan, E.A.  Feigenbaum, and N.S.  Sridharan,  "Heuristic
   Theory Formation: Data Interpretation and Rule Formation". In
   Machine Intelligence 7, Edinburgh University Press (1972).

(30) Lederberg, J., "Rapid Calculation of Molecular Formulas from Mass
   Values".  Jnl. of Chemical Education, 49, 613 (1972).

(31) Brown, H., Masinter L., Hjelmeland, L., "Constructive Graph Labeling
   Using Double Cosets".  Discrete Mathematics, 7 (1974), 1-30.  (Also
   Computer Science Memo 318, 1972).

(32) B. G. Buchanan, Review of Hubert Dreyfus' "What Computers Can't Do: A
   Critique of Artificial Reason", Computing Reviews (January, 1973).
   (Also Stanford Artificial Intelligence Project Memo No. 181)

(33) D. H. Smith, B. G. Buchanan, R. S. Engelmore, H. Adlercreutz and C.
   Djerassi, "Applications of Artificial Intelligence for Chemical
   Inference IX.  Analysis of Mixtures Without Prior Separation as
   Illustrated for Estrogens".  Journal of the American Chemical Society
   95, 6078 (1973).

(34) D.  H. Smith, B.  G. Buchanan, W. C. White, E. A.  Feigenbaum, C.
   Djerassi and J. Lederberg,  "Applications of Artificial Intelligence
   for Chemical Inference X.   Intsum.  A Data Interpretation Program as
   Applied to the Collected Mass Spectra of Estrogenic Steroids".
   Tetrahedron, 29, 3117 (1973).

(35) B.  G. Buchanan and N. S. Sridharan, "Rule Formation on Non-
   Homogeneous Classes of Objects".  In proceedings of the Third
   International Joint Conference on Artificial Intelligence (Stanford,
   California, August, 1973).  (Also Stanford Artificial Intelligence
   Project Memo No. 215.)

(36) D. Michie and B.G. Buchanan,  "Current Status of the Heuristic DENDRAL
   Program for Applying Artificial Intelligence to the Interpretation of
   Mass Spectra".  August, 1973.  To appear in Computers for
   Spectroscopy (ed. R.A.G.  Carrington) London: Adam Hilger.  Also:
   University of Edinburgh, School of Artificial Intelligence,
   Experimental Programming Report No. 32 (1973).

(37) H. Brown and L. Masinter,  "An Algorithm for the Construction of the
   Graphs of Organic Molecules", Discrete Mathematics, 8 (1974), 227.
   (Also Stanford Computer Science Dept. Memo STAN-CS-73-361, May,
   1973)

(38) D.H. Smith, L.M.  Masinter and N.S. Sridharan, "Heuristic DENDRAL:
   Analysis of Molecular Structure," Proceedings of the NATO/CNNA
   Advanced Study Institute on Computer Representation and Manipulation
   of Chemical Information (W. T. Wipke, S. Heller, R. Feldmann and E.
   Hyde, eds.) John Wiley and Sons, Inc., 1974.

(39) R.  Carhart and C. Djerassi, "Applications of Artificial
   Intelligence for Chemical Inference XI: The Analysis of C13 NMR Data
   for Structure Elucidation of Acyclic Amines", J. Chem. Soc. (Perkin
   II), 1753 (1973).


(40) L.  Masinter, N.S.  Sridharan, R.   Carhart and D.H.  Smith,
   "Application of Artificial Intelligence for Chemical Inference XII:
   Exhaustive Generation of Cyclic and Acyclic Isomers".   Journal of the
   American Chemical Society, 96 (1974), 7702. (Also Stanford
   Artificial Intelligence Project Memo No. 216.)

(41) L.  Masinter, N.S.  Sridharan, R.   Carhart and D.H.  Smith,
   "Applications of Artificial Intelligence for Chemical Inference,
   XIII. Labeling of Objects having Symmetry".  Journal of the American
   Chemical Society, 96 (1974), 7714.

(42) N.S. Sridharan, Computer Generation of Vertex Graphs, Stanford CS
   Memo STAN-CS-73-381, July, 1973.

(43) N.S. Sridharan, et al., A Heuristic Program to Discover Syntheses for
   Complex Organic Molecules, Stanford CS Memo STAN-CS-73-370, June,
   1973.  (Also Stanford Artificial Intelligence Project Memo No. 205.)

(44) N.S. Sridharan, Search Strategies for the Task of Organic Chemical
   Synthesis, Stanford CS Memo STAN-CS-73-391, October, 1973.  (Also
   Stanford Artificial Intelligence Project Memo No. 217.)

(45) R.  G. Dromey, B. G. Buchanan, J.  Lederberg and C. Djerassi,
   "Applications of Artificial Intelligence for Chemical Inference.
   XIV.  A General Method for Predicting Molecular Ions in Mass
   Spectra".  Journal of Organic Chemistry, 40 (1975), 770.

(46) D. H. Smith,  "Applications of Artificial Intelligence for Chemical
   Inference. XV.  Constructive Graph Labelling Applied to Chemical
   Problems.  Chlorinated Hydrocarbons".  Analytical Chemistry, in press
   (to appear May or June, 1975).

(47) R. E.  Carhart, D. H. Smith, H. Brown and N. S. Sridharan,
   "Applications of Artificial Intelligence for Chemical Inference.
   XVI.  Computer Generation of Vertex Graphs and Ring Systems".
   Journal of Chemical Information and Computer Science (formerly
   Journal of Chemical Documentation), in press (to appear in May,
   1975).

(48) R. E. Carhart, D. H. Smith, H. Brown and C. Djerassi, "Applications
   of Artificial Intelligence for Chemical Inference,   XVII. An
   Approach to Computer-Assisted Elucidation of Molecular Structure".
   Journal of the American Chemical Society, submitted for publication.

(49) B.  G. Buchanan, "Scientific Theory Formation by Computer." To appear
   in Proceedings of NATO Advanced Study Institute on Computer Oriented
   Learning Processes, 1974, Bonas, France.

(50) E. A. Feigenbaum, "Computer Applications: Introductory Remarks," in
   Proceedings of Federation of American Societies for Experimental
   Biology, Vol. 33, No. 12 (Dec., 1974) 2331-2332.

(51) S. Hammerum and C. Djerassi,  "Mass Spectrometry in Structural and
   Stereochemical Problems - CCXLIV; The Influence of Substituents and
   Stereochemistry on the Mass Spectral Fragmentation of Progesterone."
   Tetrahedron (accepted for publication), 1975.

(52) S. Hammerum and C. Djerassi, "Mass Spectrometry in Structural and
   Stereochemical Problems CCXLV.  The Electron Impact Induced
  Fragmentation Reactions of 17-Oxygenated Progesterones." Steroids
  (submitted for publication).

(53) H. Brown,  "Molecular Structure Elucidation III." Submitted for
  publication to SIAM Journal on Computing.

(54) R. Davis and J. King, "Overview of Production Systems" To appear in
  Machine Representation of Knowledge, Proceedings of the NATO ASI
  Conference, July, 1975.

(55) B. G. Buchanan, "Applications of Artificial Intelligence to
  Scientific Reasoning." In Proceedings of Second USA-Japan Computer
  Conference, August, 1975.

(56) R. E. Carhart, S. M. Johnson, D. H. Smith, B. G. Buchanan, R. G.
  Dromey, J. Lederberg, "Networking and a Collaborative Research
  Community: A Case Study Using the DENDRAL Program," to appear in
  Computing Networking in Chemistry, Peter Lykos, ed., American
  Chemical Society Symposium Series, No. 19, 1975.

(57) D. H. Smith (Paper XVIII) "The Scope of Structural Isomerism".
  Journal of Chemical Information and Computer Sciences, 15, 203
   (1975).

(58) B. G. Buchanan, D. H. Smith, W. C. White, R. Gritter, E. A.
  Feigenbaum, J. Lederberg and C. Djerassi, "Applications of Artificial
  Intelligence for Chemical Inference.  XXII.  Automatic Rule Formation
  in Mass Spectrometry by Means of the Meta-DENDRAL Program." Submitted
  to Journal of the American Chemical Society.

(59) E. H. Shortliffe, R. Davis, S. G. Axline, B. G. Buchanan, C. C. Green
  and S. N. Cohen, "Computer-Based Consultations in Clinical
  Therapeutics: Explanation and Rule Acquisition Capabilities of the
   MYCIN System." Computers and Biomedical Research 8, 303-320 (1975).

(60) R. Davis, B. Buchanan and E. Shortliffe, "Production Rules as a
  Representation for a Knowledge-Based Consultation Program", accepted
  for publication by Artificial Intelligence.  (Also Stanford
  Artificial Intelligence Project Memo No. AIM-266.)


IV.A.1.b    MYCIN PROJECT

Computer Based Consultation in Clinical Therapeutics

Prof. S. Cohen, M.D. (Pharmacology) and
Dr. B. Buchanan (Computer Science)

(Grant HEW HSO-1544-02, 3 years, $163,965 this year)

Introduction

   This report offers a review of the progress made by the MYCIN
project over the past year.  To provide some background, we start by
describing the system's basic task and documenting its significance.  This
is followed by a description of the way knowledge is represented and used
in the system, and a brief discussion of the advantages of the
representation we have chosen.  The progress report follows, detailing
the accomplishments of the past twelve months and spelling out the plans
for the coming year.

Background

   The ultimate aim of the MYCIN project has been to develop a
computer-based system to which physicians will refer for antimicrobial
therapy advice.  One primary consideration for the system has been its
level of performance.  In order to provide a tool which would actually be
useful, and be used in the clinical setting, we have to provide a system
which displays a high level of competence in its field.  Clinicians must
have confidence in a program's ability before they will be willing to use
it.  A second consideration has been the ability of the system to explain
its reasoning.  Since clinicians are not likely to accept such a system
unless they can understand why the recommended therapy has been selected,
the system has to do more than give dogmatic advice.  It is also important
to let it explain its recommendations when queried, and to do so in terms
that suggest to the physician that the program approaches problems in much
the same way that he does.  This permits the user to validate the
program's reasoning, and to reject the advice if he feels that a crucial
step in the decision process cannot be justified.  It also gives the
program an inherent instructional capability, allowing the physician to
learn from each consultation session.  Third, we feel it is desirable that
an expert in infectious disease therapy who notes omissions or errors in
the program's reasoning should be able to augment or correct the knowledge
base so that future consultations will not repeat the same mistakes. The
system should therefore have some capability for acquiring knowledge via
interaction with experts in the field.

   Progress towards these goals has been made in development of the
MYCIN system, composed of three interrelated modules.  The Consultation
System uses MYCIN's knowledge base along with patient data entered by the
physician to generate therapeutic advice.  The Explanation System has the
ability to generate a thorough documentation of the motivation for
questions the system asks or of the rationale for conclusions it reaches.
Finally, experts may use the Rule Acquisition System to update MYCIN's
knowledge base.  Together, these three modules give the system a wide
range of capabilities for dealing with the problem of advising on
diagnosis and therapy selection for infectious disease.

Significance of the problem

   The task of therapy selection for infectious disease was chosen
because of the demonstrated need for high quality advice in this area.
There have been numerous studies detailing the misuse of antibiotics and
its resultant cost.  One study (reference [2]) indicates that in a recent
year, one of every four people in this country was given penicillin, and
nearly 90% of these prescriptions were unnecessary. A major evaluation of
the antibiotic prescribing habits of a wide range of specialists was
reported within the last two months in reference [3].   It indicates that
the overall score was only 68% correct, and suggests possible underlying
causes.  While there are a number of sociological factors which are also
significant (e.g. patient pressure for treatment even when none is
indicated),  the study suggests that causes for the low score range from
the fact that physicians may be unfamiliar with generic names for
antibiotics, to a lack of knowledge of basic bacteriology, to the fact
that they appear to use antibiotics as a substitute for clinical judgment.
Problems such as these indicate the need for more (and more accessible)
consultants to physicians selecting antimicrobial drugs.

General Approach

   To give a general feeling for the way MYCIN works, we present here a
brief description of the way knowledge is represented in the program, and
indicate how it is used. We also suggest some of the advantages which
result from embodying knowledge in the format we have chosen.

   All knowledge used by MYCIN during a consultation session is
contained in decision rules that have been coded and stored in the
machine.  The MYCIN Project members have identified approximately 400 such
rules during discussions of representative case histories.   Each rule
consists of a set of preconditions (called a PREMISE) which, if true,
permits a conclusion to be made or an action to be taken, according to the
ACTION part of the rule.  Figure 1 below shows one such rule.


   If
      1) the stain of the organism is gramnegative, and
      2) the morphology of the organism is rod, and
      3) the aerobicity of the organism is aerobic,
   Then
      there is suggestive evidence (.6) that the identity of the
      organism is bacteroides.

   RULE124

Figure 1
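
   As an illustration only, a rule of this kind can be pictured as a
small data structure pairing a PREMISE with an ACTION.  The Python sketch
below is not MYCIN's internal representation; the names Rule,
premise_holds and to_english, the (attribute, value) clause format and
the field layout are all assumptions made for this example.  The
attribute values are copied from Figure 1.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# A clause is an (attribute, value) pair, e.g. ("stain", "gramnegative").
# This encoding is an assumption for the sketch, not MYCIN's own format.
Clause = Tuple[str, str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for one decision rule: PREMISE plus ACTION."""
    name: str
    premise: Tuple[Clause, ...]   # every clause must hold
    conclusion: Clause            # the ACTION part: what is concluded
    strength: float               # the number attached to the conclusion


RULE124 = Rule(
    name="RULE124",
    premise=(
        ("stain", "gramnegative"),
        ("morphology", "rod"),
        ("aerobicity", "aerobic"),
    ),
    conclusion=("identity", "bacteroides"),
    strength=0.6,
)


def premise_holds(rule: Rule, facts: Dict[str, str]) -> bool:
    """Return True when every premise clause matches the known facts."""
    return all(facts.get(attr) == value for attr, value in rule.premise)


def to_english(rule: Rule) -> str:
    """Render the rule in the If/Then layout used in Figure 1."""
    lines = ["If"]
    for i, (attr, value) in enumerate(rule.premise, start=1):
        lines.append(f"   {i}) the {attr} of the organism is {value}")
    attr, value = rule.conclusion
    lines.append("Then")
    lines.append(f"   there is evidence ({rule.strength}) that the {attr} "
                 f"of the organism is {value}.")
    return "\n".join(lines)
```

Keeping each rule as plain data, separate from the code that interprets
it, is what would let the same object be both evaluated against patient
data and printed back to the clinician in readable form.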

The system uses its collection of rules to make its conclusions.  If, for
instance, it is attempting to determine the identity of an organism which
is causing an infection, it retrieves the entire list of rules which, like
the one above, conclude about identity.  It then attempts to determine the
truth of the premise of the first rule on the list by evaluating in turn
each of the clauses of its premise. Thus, for the rule above, the first
thing to find out is the gramstain. If this is already available in the
data base, the program retrieves it from there.  If not, gramstain becomes
the new goal, we retrieve all rules which conclude about it, and try to
use each of them to obtain the value of the gramstain. If, after trying
all the rules on the list, the answer still has not been discovered, the
program asks the user.  The rules thus "unwind" to produce a succession of
goals, and it is the attempt to achieve each goal that drives the
consultation. Figure 2 below gives a graphical view of this process. (A
more complete description of the program's operation can be found in
reference [1]).

                    +----------+
                    | identity |
                    +----------+
                     /    |    \
                    /     |     \
                   /      |      \
             RULE124   other rules . . . . . .
             /   |   \
            /    |    \
           /     |     \
   gramstain morphology aerobicity

Figure 2
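
   The goal-directed process just described can be sketched in a few
lines of Python.  This is a simplified illustration under stated
assumptions, not the program's actual control structure: the function
name find_value, the tuple layout of a rule, and the ask_user callback
are hypothetical, the sketch stops at the first rule whose premise
succeeds, and it ignores the strength attached to conclusions.  It also
assumes the rule set contains no circular chains.

```python
from typing import Callable, Dict, List, Optional, Tuple

Clause = Tuple[str, str]
# Assumed rule layout: (name, premise clauses, concluded attribute, value)
Rule = Tuple[str, List[Clause], str, str]


def find_value(goal: str,
               rules: List[Rule],
               facts: Dict[str, str],
               ask_user: Callable[[str], Optional[str]],
               trace: List[str]) -> Optional[str]:
    """Look in the data base first, then try rules, then ask the user."""
    if goal in facts:                      # already available in the data base
        return facts[goal]
    for name, premise, attr, value in rules:
        if attr != goal:                   # keep only rules concluding about goal
            continue
        # Clauses are evaluated in turn; each one becomes a new subgoal.
        if all(find_value(a, rules, facts, ask_user, trace) == v
               for a, v in premise):
            facts[goal] = value
            trace.append(name)             # record which rule was used
            return value
    answer = ask_user(goal)                # no rule produced a value: ask
    if answer is not None:
        facts[goal] = answer
    return answer
```

Called with the goal "identity" and a rule list holding only RULE124,
the sketch sets up stain, morphology and aerobicity as subgoals in that
order and, finding no rules that conclude about them, asks the user for
each.  The trace list keeps the names of the rules that succeeded.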

   Many of the system's important capabilities are made possible by the
way knowledge is represented in rules like the one in Figure 1.   Such rules
offer modular "chunks" of knowledge about the domain, represented in a
form that is comprehensible to the clinician.  For instance, if the system
is asked "How did you determine the identity of ORGANISM-1?", it answers
by displaying each of the rules which were actually used, in the format
shown above. This is something which the clinician can readily understand,
and it provides a far more comprehensible explanation than would be
possible if the program were to use a statistical approach to diagnosis.

   It also means that the expert clinician can offer new "chunks" of
knowledge, by expressing them in this same form.  He can therefore help to
make the program more competent, without having to know anything about
computer programming.

   There are several other interesting and important benefits gained
from the approach we have chosen. These are explained in more detail in
reference [1].

Progress report

General objectives and goals during the previous year

   During the past year's work on the MYCIN system our goals have been
(i) to increase the competence and broaden the scope of the system's
therapeutic advice; (ii) to provide additional features to increase
utility and performance of the system in the clinical setting; (iii) to
develop further the system's collection of user-oriented features to make
it easier for novices to use; (iv) to make it possible for an infectious
diseases expert (who may know nothing about programming) to interact with
and educate the program directly; (v) to develop new techniques to deal
with the technical problems of managing a large and growing system; and
(vi) to design and execute a formal evaluation study to measure the
system's performance.  We consider each of these in turn.

Competence and scope

   One of the major accomplishments of the past year was the extension
of MYCIN to cover diagnosis and therapy for meningitis infections. Over
100 new rules were added to provide this capability. This has proved to
be an especially useful new domain to investigate because it has presented
several new challenges.  In particular, meningitis requires the ability to
deal with a disease that is often diagnosed on clinical grounds before any
specific microbiologic evidence is available. We have thus found it
necessary to consider a larger range of clinical factors.  This has
resulted in a system which has a broader picture of the whole patient, and
thus directly confronts one of the concerns about earlier versions of the
system.  The system has also become more robust, because it requires less
hard microbiologic data,  and is thus less sensitive to inaccurate
laboratory reports.  Like expert clinicians, it is now alert to the
possible existence of anomalous data.

   The broader range of expertise also means that MYCIN can begin
to play a much more effective role in the clinical setting.  Another early
concern was that a system with too narrow a range of capabilities demands
a great deal of judgment before it is even used.  Thus, if MYCIN could
deal only with bacteremia, the user would need to decide that the patient
indeed had a bacteremia before he could use the system.  By giving MYCIN
the ability to diagnose and treat a broader range of infections, we allow
it to become useful at a much earlier stage in a patient's clinical
course.

   Other contributions to the system's competence came from the
expansion of the knowledge base to include information about normal flora
for a wide range of culture sites.  MYCIN can now usually distinguish
between normal and pathological flora,  and can hence decide more precisely
whether to treat.

   We have also investigated the addition of some widely applicable
routines for computing drug dose in renal failure.   These have been
developed by independent investigators,  but are available to us and could
prove to be extremely useful. Our system currently issues only a general
warning to modify dosage in renal failure. Since the problem of determining renal
status and the proper adjustment of drug dose is difficult, customized
drug dosage recommendations will be an important addition to the power of
the system.

   There have also been significant improvements in the system's
ability to handle organism genus and species.  The problem requires that
the system be able to deal with varying degrees of specificity; at times
it can deduce both genus and species, and at others only the genus.   Yet
it must be able to prescribe correctly in all cases.   A fundamental review
of the problem has resulted in the addition of a number of new rules which
handle the problem comprehensively and uniformly.

Additional clinical features

    Several new features have been added to the system in anticipation
of its use on the wards.  MYCIN now keeps continuous statistics on the use
of individual rules from its knowledge base.  This will help us to monitor
long term performance, to study interrelationships between rules, and
perhaps detect inconsistencies or gaps in the knowledge base.

    Also looking ahead, we have designed an  "on-line" evaluation. At
the end of each consultation, the system will ask a few questions about
quality of performance, to get some feedback from the clinicians who are
actually using it.  This interchange will be very brief to avoid being a
burden to the user, but will offer a very important form of instant
criticism from our users.

User-oriented features

   Several "human engineering" capabilities have been improved over the
past year.  For instance, the system's handling of questions asked by the
physician has been made more powerful.  This was achieved by improving our
handling of English text, and by a comprehensive review of the kinds of
questions that are asked.  The system can now answer a broader range of
questions, and, in particular, can explain why it did not take a specific
action, as well as why other conclusions were reached.  Capabilities like
these are very important in allowing our local clinical experts to
discover the program's rationale for its actions. They can then evaluate
its line of reasoning, and suggest any necessary changes.

   We are also engaged in a comprehensive effort to put all of the
system's deductive actions into rules.  Some important steps were
previously performed by blocks of code, and hence could not easily be
explained by the system.  We have begun to reformulate the process in
terms of rules.  This will permit the system to be more specific about the
source of its drug recommendations.

   We have also added several new capabilities to provide more
convenient use of the system in anticipation of its use on the wards.
Among these is the ability for the user to type a comment about system
performance at any time during the consultation.  His comment is recorded
in a special file which is periodically reviewed by our medical staff.
This is in addition to the "on line" evaluation described above, and
allows the user to offer any comment which he may feel is relevant.

   We also have a parallel ability to report problems.   The user can
indicate that the system has "broken down" in some way, and is invited to
describe the problem.  His description is saved along with a copy of the
program, so that our systems programmers can fix it later.

Linking the expert and the program

   We have recently implemented a prototype version of a "bridge"
between the clinical expert in infectious disease and the program, which
will allow the expert to "teach" the program directly.  Formerly, the
expert's comments on the system's performance were given to a programmer,
who then made the relevant changes to MYCIN.  Now the expert can himself
begin to discover the source of many problems, and can indicate the
necessary rules.  The dialogue is carried out in English, and requires no
knowledge of programming.

Technical issues

   Several changes in the structure of the program have made it easier
to deal with the large and constantly changing knowledge base. In
general, we are faced with the challenge of keeping the system's size
within well specified limits,  and have devoted some effort to insure that
it remains sufficiently compact.  We have, for instance, separated MYCIN's
dictionary of English words from the rest of the system. This not only
reduces the space requirement considerably, but has an additional benefit
of making it easier to update the dictionary as the system grows.

   There have always been extensive "self-documenting" capabilities in
the system; that is, MYCIN can supply instructions and helpful information
if the user is confused at any point.  We have recently improved the
handling of this feature so that it is both faster and requires less
space.


Formal evaluation study

   A major undertaking this year has been the design and execution of a
formal evaluation of the system's performance.  The basic idea was to give
the same clinical data to both MYCIN and a set of recognized experts in
infectious disease therapy, to compare their judgments, and to ask the
experts to evaluate MYCIN's performance.  We began by designing a form
that would allow us to separate the variables requiring analysis. We
attempted to determine whether MYCIN (1) asks too many or too few
questions, (2) correctly determines which infections require treatment,
(3) correctly identifies the organisms that may be causing the relevant
infections, and (4) adequately selects therapy to cover for the relevant
organisms.  The form was designed to be maximally informative, but very
simple to complete.  It interweaves a sample consultation with questions
to the expert, and asks him to record his own opinions regarding the
patient and appropriate therapy.  It was tested first in a pre-evaluation
trial run, with five patients evaluated by three local Fellows in the
Division of Infectious Disease at Stanford.

   For the formal study, fifteen patients were selected according to
strictly defined criteria.  For each of these patients we prepared a 1-2
page clinical summary and made copies of relevant material from the
patient's chart.  This information was used to obtain therapeutic advice
for each of them.  Questions posed by MYCIN were answered solely on the
basis of information collected from patient records at the time of the
first positive blood culture,  to simulate actual clinical use of the
system.  These consultations were integrated into the forms and sent along
with the clinical data to ten experts in infectious disease therapy.

   We had decided some time ago that the introduction of the system
onto the wards for experimental use would be predicated on a successful
outcome of this evaluation. Thus, while we had originally expected to
begin use on the wards some time this year, the large amount of work
involved in carrying out the evaluation has delayed us.  We feel quite
strongly that premature introduction of the program would be unwise, since
it would almost surely lead to reduced acceptance by the clinical staff.
Upon the return of the evaluation forms in mid- to late-March we shall
have sufficient data to determine not only the current level of MYCIN's
performance, but also the degree of agreement among the experts
themselves.  By sending five of the ten evaluation packets to experts in
other parts of the country, we are also attempting to determine to what
extent MYCIN reflects clinical judgments that may be peculiar to the
Stanford environment.

   In summary, our work in the past year has focussed on broadening the
scope of clinical competence of the system, and on evaluating its
performance.  In anticipation of the use of MYCIN on the wards, we have
added and strengthened many features, to insure that the program is
maximally useful to the clinician who seeks advice.

Plans for the coming year

   There are a number of major projects planned for the coming year.
There will, for instance, be extensive testing of the new meningitis
rules, to insure both that their performance is satisfactory, and that
there are no unforeseen side effects on the rest of the system.  We plan
also to begin work on a knowledge base for pneumonia as the next step in
increasing the program's scope.  The introduction of the system onto the
wards will give us valuable experience on a wide range of cases, and
provide a basis for on-going monitoring and evaluation of performance.

   We plan also to restructure part of the program's approach to
requesting information from the user.  Our current technique has developed
a small number of technical problems, the most important of which involves
the order in which questions are asked.  By reorganizing some aspects of
the control structure slightly, we will be able to solve all of the
technical problems. From the user's point of view the system will continue
to function as before, but at the start of a consultation it will ask a
number of questions to get a global picture of the current case.   This
offers the additional, unanticipated, advantage of making interaction with
the system seem more natural to the user who is used to presenting the
consultant with a brief overview of the case at hand.

   We have recently discovered an increasing need for the ability to
use information about one infection to conclude things about another, as
for instance when one infection has clinical implications for another.
We plan in the coming year to implement this capability in a quite general
fashion, so that we can deal with interrelationships of infections,
cultures, organisms, and so on.

   As the program becomes available on the ward, it will become more
important to be able to tell the system about new information which may
arrive several days after the initial consultation.  Thus, as new test
results arrive, or as the patient's condition changes, it should be
possible to add the new information, and obtain updated recommendations.
We plan to implement this too in the coming year.

   Finally, since one of our fundamental tasks is the assembly of large
amounts of knowledge of infectious disease diagnosis and therapy, we plan
to develop further the prototype "bridge" which links the medical expert
with MYCIN.  Our current version lacks in particular many of the "human-
engineering" aspects which are so extensively developed in the rest of the
system.  We foresee an important acceleration of this process of knowledge
gathering when it becomes easy for an expert by himself to make
significant changes to the knowledge base.

References

[1] Davis R, Buchanan B, Shortliffe E H, Production Rules as a
  Representation for a Knowledge-based Consultation System, A.I. Memo
  266, Stanford University, November 1975. (submitted to Artificial
  Intelligence).

[2] Kagan B M, Fannin S L, Bardie F, Spotlight on Antimicrobial Agents -
  1973, JAMA, 226, 3 (October 1973) pp 306-310.

[3] Neu H C, Howrey S P, Testing the Physician's Knowledge of Antibiotic
  use, NEJM, 293, 25 (18 Dec 75), pp 1291-5.

The MYCIN Project - List of Publications

[1] Scott A C, Clancey W, Davis R, Shortliffe E H, Explanation
  capabilities of knowledge based production systems (in preparation).

[2] Shortliffe E H, Davis R, Some considerations for the implementation
  of knowledge-based expert systems, SIGART Newsletter, 55:9-12,
  December 1975.

[3] Shortliffe E H, Computer-Based Medical Consultations: MYCIN,
  (adaptation of thesis), American Elsevier, New York, 1976.

[4] Davis R, Buchanan B, Shortliffe E H, Production rules as a
  representation for a knowledge-based consultation system, A.I. Memo
  266, Stanford University, November 1975. (submitted to Artificial
  Intelligence).

[5] Davis R, King J J, An Overview of Production Systems, Machine
  Representations of Knowledge, Proceedings of NATO ASI Conference, to
  appear, Spring 1976.  (Also A.I. Memo 271, Stanford University,
  October 1975).

[6] Shortliffe E H, Judgmental knowledge as a basis for computer-assisted
  clinical decision making, Proceedings of the 1975 International
  Conference on Cybernetics and Society, pp 256-7, September 1975.

[7] Shortliffe E H, Axline S, Buchanan B G, Davis R, Cohen S, A computer-
  based approach to the promotion of rational clinical use of
  antimicrobials, International Symposium on Clinical Pharmacy and
  Clinical Pharmacology, Sept 1975, Boston, Mass. (invited paper)

[8] E H Shortliffe, R Davis, S G Axline, B G Buchanan, C C Green, S N
  Cohen, Computer-based consultations in clinical therapeutics:
  explanation and rule acquisition capabilities of the MYCIN system,
  Computers and Biomedical Research, 8:303-320 (August 1975).

[9] E H Shortliffe and B G Buchanan, A Model of Inexact Reasoning in
  Medicine, Mathematical Biosciences 23:351-379, 1975.

[10] Shortliffe E H, Rhame F S, Axline S G, Cohen S N, Buchanan B G, Davis
  R, Scott A C, Chavez-Pardo R, and van Melle W J, MYCIN: A computer
  program providing antimicrobial therapy recommendations (abstract
  only).  Presented at the 28th Annual Meeting, Western Society For
  Clinical Research, Carmel, CA, 6 Feb 1975.  Clin. Res. 23: 107a
  (1975).  Reproduced in Clinical Medicine, p. 34, August 1975.

[11] Shortliffe E H, MYCIN: A rule-based computer program for advising
  physicians regarding antimicrobial therapy selection (abstract only);
  Proceedings of the ACM National Congress (SIGBIO Session), p. 739,
  November 1974.  Reproduced in Computing Reviews 16:331 (1975).

[12] E H Shortliffe, S G Axline, B G Buchanan, S N Cohen, Design
  considerations for a program to provide consultations in clinical
  therapeutics, Presented at San Diego Biomedical Symposium 1974
  (February 6-8, 1974).

[13] E H Shortliffe, S G Axline, B G Buchanan, T C Merigan, S N Cohen, An
  artificial intelligence program to advise physicians regarding
  antimicrobial therapy, in Computers and Biomedical Research, 6:544-
  560 (1973).

[14] Shortliffe, E H, MYCIN: A rule-based computer program for advising
  physicians regarding Antimicrobial therapy selection, Thesis: Ph.D.
  in Medical Information Sciences, Stanford University, Stanford CA, 409
  pages, October 1974.  (available from NTIS as document ADA001373)

Funding Status

   MYCIN is funded by the Bureau of Health Services Research and
Evaluation, and is currently completing the second year of a three year
grant (S. Cohen & S. Axline, Principal Investigators).  The budget for
the current year (6/1/75 - 5/31/76) is $163,965; a total of $149,982 is
requested for 6/1/76 - 5/31/77. Renewal for the coming year is currently
undergoing an in-house review.  A new 3-year grant request for comparable
levels of funding will be submitted in the fall of 1976.

Interaction with SUMEX-AIM Resource

    During the past year, we have been contacted by a number of
physicians who had read about MYCIN and were interested in trying out the
system.  Using TYMNET, these physicians in Boston, San Diego, Seattle,
Washington D.C.,  and Atlanta were able to use the SUMEX GUEST account to
run consultations on test cases.

   MYCIN users are urged to send us comments about the system's
performance.  [A new "COMMENT" feature in the system allows comments to be
entered at any time, without interrupting the consultation or even
requiring the user to know how to use SNDMSG.] Recent comments from
doctors associated with Rutgers Computers in Biomedicine served as very
helpful guidelines for making the program's instructions and questions
easier for a naive user to understand.  Such comments also focus our
attention on deficiencies in the program's medical knowledge as well as
pointing out programming problems which may exist.

    We have continued interaction via SNDMSG and terminal links with
members of the DIALOG group, who recently wrote MYCIN-like rules for
diagnosing and treating venereal diseases.  We have implemented these
rules and can modify or add to them as the doctors in Pittsburgh run more
consultations to test the validity and completeness of this set of rules.


   At a recent 3-day mini-conference, and at weekly meetings, members
of the different SUMEX-AIM projects which make up the Heuristic
Programming Project at Stanford discuss common problems faced by all the
groups, and how each group handles these problems.  These discussions have
proved very helpful, especially to those projects which are currently in
the design stage.  Several of the projects have been able to benefit from
the work that has been done in MYCIN on designing production rules and
explanation capabilities.

Critique of Resource Services

   Development of MYCIN has been greatly facilitated by the
availability of the Interlisp language and its extensive interactive
debugging capability and user-oriented features; in fact, it is doubtful
that MYCIN could have developed to its current state without this large-
scale interactive resource.  However,  in recent months its use has been
severely limited by the poor response time during peak hours, which
effectively prevents the use of MYCIN or Interlisp at such times.   In this
regard, we have found useful the SUMEX batch facility, which permits us to
run some of our non-interactive tasks at times of low system usage.

   We are fairly pleased with the availability of disk storage,
although its availability may, in the foreseeable future, present some
problems.  Continuing development of the project makes substantial demands
on disk space, since both experimental and publicly accessible versions
of the system must be kept available, as well as a library of patient
cases, system dictionaries, and documentation.  The archival and retrieval
mechanism currently available has proved to be very helpful, and we have
made considerable use of it.  This, along with careful management of the
available space, has made it possible to avoid any problems at this time.
As project development continues, however, we anticipate that disk space
may become a scarce resource.

   One of the outstanding aspects of the facilities at SUMEX continues
to be the attitude and competence of its staff and systems people.  They
seem constantly willing to help with problems and consider suggestions,
encouraging a sense of cooperation in the user community.  Their on-going
development of text editors and other features of the system contributes
directly to its utility as a scientific resource.


IV.A.1.c    PROTEIN STRUCTURE MODELING PROJECT

Protein Crystallography Project

Prof. J. Kraut and Dr. S. Freer (Chemistry, U. C. San Diego) and
Prof. E. Feigenbaum and Dr. R. Engelmore (Comp. Sci., Stanford)

(Grant NSF DCR74-23461, 2 years, $88,436 total)

I. Summary of Research Objectives

A. Collaboration, Biomedical Relevance and Technical Goals

    The Protein Crystallography Project is a collaboration of two
research groups,  one at Stanford University, the other at the University
of California at San Diego.  The Stanford group consists of Edward
Feigenbaum, Robert Engelmore, Penny Nii, and, during the current academic
year, Carroll Johnson of Oak Ridge National Laboratory.  The primary
activities are to 1) identify critical tasks in protein structure
elucidation which may benefit by the application of AI problem-solving
techniques, and 2) design and implement programs to perform those tasks.
The UCSD group consists of protein crystallographers: Joseph Kraut,
Richard Alden and Stephan Freer. As protein crystallographers, their
objective is to seek new techniques that will facilitate the elucidation
of the tertiary structure of proteins.

    The biomedical relevance of protein structure determination is well
known.  Solved protein structures have contributed substantially to our
understanding of molecular biology, enzymology and immunology.

   We have identified two principal areas where we believe the
collaboration is of practical and theoretical interest to both protein
crystallographers and computer scientists working in AI.  The first is the
problem of interpreting a 3-dimensional electron density map.  The second
is the problem of determining a plausible structure in the absence of
phase information normally inferred from experimental isomorphous
replacement data.

B. Specific Objectives

1. Interpretation of electron density maps

   A major challenge in protein crystallography is the interpretation
of the crystallographic electron density map.  Our goal is to build a
knowledge-based system which proposes plausible locations for
substructural units consistent with the electron density map, the amino-
acid sequence (if known), and physical, chemical and stereochemical
constraints.  The system attempts to integrate knowledge from three
different areas: chemical topology, microstructure and macrostructure.
The chemical topology knowledge base is task specific and contains the
known connectivities within the protein structure (i.e., the amino acid
sequence, cofactors, disulfide bridges, and coordination bonds to
prosthetic groups).  The microstructure knowledge base contains known
facts about protein molecules (e.g., the molecular geometry of the peptide
bond and the amino acid side chains, hydrogen bonding properties, helix
forming propensity, etc.).  The macrostructure knowledge base contains
stereotype templates for the plausible major components of the molecule
(e.g., alpha helices and pleated sheets).  Our strategy is to isolate an
individual molecule within the map, then determine the path of the main
chain by a skeletonizing procedure (as is done, for example, by J. Greer,
J. Mol. Biol. (1974), vol. 82, pp. 279-301). We will then parameterize
the density along the backbone, identify the most obvious regions (heavy
atoms, alpha helices, planar groups) and determine, by region growing and
template matching, the identity and position of side chains.

2. Structure Determination in the Absence of Experimental Phase
  Information

   X-ray crystallography is the primary experimental technique for
investigating the 3-D structure of molecules.  The data so obtained are a
collection of intensity measurements at discrete directions with respect
to the x-ray beam and crystal axes.  These intensities are related to the
positions of the atoms in the crystal lattice by a Fourier transformation.
Thus, given the structure of the molecule, its orientation with respect to
the crystal axes and the symmetry properties which determine the molecular
stacking,  one can calculate the intensities.  However, given the
intensities and the unit cell properties, one cannot go the other way.
That is, the inverse Fourier transform cannot be calculated because the
experiment measures only the amplitudes of the diffracted waves and not
their phases -- the classical "phase problem" of x-ray crystallography.
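
   The loss of phase information can be illustrated with a small
one-dimensional sketch.  The example below is not part of the project's
software; the atom weights, coordinates and function names are assumptions
chosen only for illustration.  It computes structure factors
F(h) = sum_j f_j exp(2 pi i h x_j) for a toy unit cell and checks that a
mirrored copy of the structure gives the same intensities |F(h)|^2 even
though the phases differ, so the intensities alone do not fix the atomic
positions.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """Return F(h), h = 0..h_max, for point atoms in a 1-D unit cell.

    atoms is a list of (weight, fractional_coordinate) pairs; the
    weights stand in for scattering factors (illustrative values only).
    """
    return [
        sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)
        for h in range(h_max + 1)
    ]


# Hypothetical three-atom structure and its mirror image (x -> 1 - x).
original = [(6.0, 0.10), (8.0, 0.35), (26.0, 0.60)]
mirrored = [(f, 1.0 - x) for f, x in original]

F_orig = structure_factors(original, 5)
F_mirr = structure_factors(mirrored, 5)

# The measurable quantities, |F(h)|^2, agree for the two structures ...
same_intensities = all(
    math.isclose(abs(a) ** 2, abs(b) ** 2, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(F_orig, F_mirr)
)

# ... but the phases, which the experiment does not record, do not.
phases_differ = any(
    abs(cmath.phase(a) - cmath.phase(b)) > 1e-6
    for a, b in zip(F_orig[1:], F_mirr[1:])
)

print(same_intensities, phases_differ)
```

   Going from atoms to intensities is a direct sum; going back would
require the discarded phase of each F(h), which is why additional
knowledge sources are needed.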

   During our first year we have been investigating and implementing
various computational techniques for inferring trial structures, in the
absence of phase information.  Our aim has been to develop a system of
computer programs which apply as much general and task-specific knowledge
as possible to constrain the search for a plausible structure (or partial
structure) consistent with the experimentally measured structure
amplitudes.

   A procedure well known to x-ray crystallographers investigating the
structures of small molecules is that of Patterson search.  Recently this
technique has been shown to be effective in protein crystallography as
well, when there exists a family resemblance between the molecule under
investigation and a known protein.  Patterson search is basically an
image-seeking technique, where one searches for the "Patterson image" of
an hypothesized molecular structure in the Patterson map derived from the
experimental data.  The Patterson function does not require any knowledge
of phases.
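
   A rotation search of this kind can be sketched in the vector-set form.
The following is a two-dimensional toy under stated assumptions, not the
project's PSRCH program: the fragment coordinates, the tolerance and all
function names are hypothetical, and the "Patterson peaks" are generated
directly from a made-up structure rather than from measured intensities.
The fragment's interatomic vectors are rotated in 5-degree steps and
scored by how many of them coincide with peaks of the target vector set.

```python
import math


def rotate(point, degrees):
    """Rotate a 2-D point counter-clockwise about the origin."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))


def vector_set(atoms):
    """All interatomic vectors r_i - r_j (i != j): the peak positions
    of an idealized point-atom Patterson map."""
    return [(a[0] - b[0], a[1] - b[1])
            for i, a in enumerate(atoms)
            for j, b in enumerate(atoms) if i != j]


def fit(model_vectors, map_vectors, tol=0.05):
    """Count model vectors that coincide with some map vector."""
    return sum(
        any(math.hypot(mx - px, my - py) <= tol for px, py in map_vectors)
        for mx, my in model_vectors
    )


# Hypothetical search fragment (arbitrary units).
fragment = [(0.0, 0.0), (1.5, 0.0), (1.5, 1.0), (0.3, 2.0)]

# "Unknown" structure: the fragment turned by 40 degrees and shifted,
# plus two unrelated atoms placed far away so the toy stays unambiguous.
TRUE_ANGLE = 40
unknown = [(x + 4.0, y + 3.0)
           for x, y in (rotate(p, TRUE_ANGLE) for p in fragment)]
unknown += [(12.0, 11.0), (15.0, 2.0)]

patterson_peaks = vector_set(unknown)

# The vector set contains both v and -v, so 0..175 degrees suffices.
scores = {
    angle: fit(vector_set([rotate(p, angle) for p in fragment]),
               patterson_peaks)
    for angle in range(0, 180, 5)
}
best_angle = max(scores, key=scores.get)
print(best_angle, scores[best_angle])
```

   The shift applied to the fragment has no effect on the score, because
interatomic vectors are unchanged by translation; this is why orientation
can be sought before position.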

   Patterson search is our primary technique for inferring structures
in the absence of experimental phase information.  In order to resolve the
ambiguities which often arise in Patterson search predictions, we are
investigating the use of other knowledge sources, among which are
anomalous dispersion Patterson interpretation, Patterson search in
reciprocal space, superposition and Fourier refinement methods.  The
integration of these diverse knowledge sources is a primary objective of
the research.

C. Summary of Project Accomplishments

   Our activities during the first year fall into three general
categories: augmenting our stockpile of crystallographic computing tools,
applying Patterson search methods to a solved protein structure
(Cytochrome C2) and an unsolved protein structure (Cytochrome F), and
initiating a new research objective.

1. Application of Patterson Search to Cytochrome F

   A major goal of our first year of research has been to apply the
method of Patterson search, in conjunction with other analytic techniques,
to solve a real protein structure.  Cytochrome F is an excellent
candidate, because (1) phase information is not yet available, and (2) the
protein's structure is expected to show a resemblance to other members of
the cytochrome family.  The current hope is that the family resemblance
will be sufficiently strong that the complete structure can be solved by
standard refinement techniques after one finds the correct orientation and
position of a characteristic substructure.  As of this writing a complete
solution has not yet been obtained.  A considerable effort, described in
the remainder of this section, has been invested in the pursuit of the
correct orientations of the protein in the unit cell.  A large number of
Patterson search calculations were performed, exploring the effects of
variations in the search structures, the selection of search vectors, the
choice of measures of fit, and even in the primary data.   It now remains
to be seen if some of the candidate orientations proposed by the search
calculations can be verified by other sources of crystallographic
knowledge.

2. Selection of a new research objective.

   Shortly after the inception of the project, the two collaborating
groups agreed to the need for an additional scientist with an extensive
knowledge of crystallography and crystallographic computing, and a serious
interest in the application of AI techniques to his field of expertise.
We were fortunate to induce Dr. Carroll Johnson of Oak Ridge National
Laboratory, who well fulfills these qualifications, to join our project
for a one-year period beginning September 1, 1975.  His contributions have
been instrumental in defining a new task area for the application of AI
methodology to protein crystallography.  After studying recent work in
visual scene analysis, he noted the similarities of that AI application
with the crystallographic problem of interpreting a 3-dimensional electron
density map, i.e., deriving the coordinates for a trial structure, given
the electron density function, the amino-acid sequence and the
stereochemical principles and constraints known to apply.


   The task so defined contains most if not all of the ingredients for
the development of a knowledge-based system, in the mainstream of current
AI research.  The crystallographer integrates several sources of knowledge
-- chemical, stereochemical, crystallographic -- as he builds a model of
the protein which is consistent with the given data.  He combines this
knowledge with a rich set of heuristics for focussing his attention on
promising regions of the map, for distinguishing characteristic features,
for deciding at what level of detail to stop the interpretation in
different regions, and for evaluating competing hypotheses.

  The model builder's decision-making process is dynamic and flexible,
driven at times by the need to reach specific subgoals, and at other times
by the current state of the model or special features of the data. A
computer program for interpreting the map will require a control structure
which combines both goal-driven and event-driven elements.  The design of
a suitable control structure, and the implementation of a prototype
program for performing the basic interpretive tasks are primary objectives
for our second year of research.

3. Assembly of crystallographic computing tools.

   With the assistance of several crystallographers around the country
we have augmented our collection of crystallographic computing programs
and systems (i.e., integrated collections of programs).  Those programs we
received and/or implemented on SUMEX include:

a) X-RAY 72.  This is a large system of Fortran programs developed by
   J.M. Stewart (Univ. of Md.) and others.  A version written for the
   DEC-10 at the University of Pittsburgh was kindly furnished to us
   by Steve Ernst.  We have implemented some parts of the system as
   separate programs, including the Fourier transform, peak finding,
   and bond length and angle calculating programs.

b) Sequence-Structure Correlator.  A program which predicts alpha helix
  and pleated sheet regions of a protein molecule from its amino-acid
  sequence was furnished to us by Ray Salemme (Univ.  of Ariz.).   The
  program was subsequently rewritten in both the SAIL and LISP
   languages.  The algorithm is based on the rules developed by Chou
  and Fasman (Biochemistry, vol.  13, pp. 211-245 (1974)).

c) Oak Ridge Fast Fourier Program (ORFFP).  A system of Fortran programs
   for generating, analyzing and plotting Fourier maps was obtained
  from Henry Levy of Oak Ridge and implemented on SUMEX.   The
  plotting segment is the ORTEP program written by Carroll Johnson.

d) Greer Skeletonization Program.  A Fortran program for reducing an
  electron density map of a protein to a set of connected line
  segments, following the algorithm proposed by Greer, has been
  written and is currently being debugged.  This program will play a
  pre-processing role in the map interpretation problem, producing a
   highly abstracted representation of the map.

e) Huber Rotation and Translation Search Program.  This is a Patterson
   search program which, like our PSRCH, computes correlations of two
   vector sets in the vector space representation.  The program, in
   Fortran, is on file but has not yet been tested at SUMEX.

f) ROTRAN.  These programs were written by B.M. Craven (Univ. of
   Pittsburgh) and are designed to perform rotational and
   translational Patterson searching, employing a method developed by
   Crowther.  At present we have only a listing of the Fortran program
   and instructions for its use.

D. Publications

   Feigenbaum, E. A., Engelmore, R. S.  and Johnson, C. K., A
Correlation Between Crystallographic Computing and Artificial Intelligence
Research, submitted for publication in Acta Crystallographica.

II. Interaction with the SUMEX-AIM resource

   All program development,  and most communications between the two
collaborating groups are effected on the SUMEX computer.  The UCSD group
has a direct connection to SUMEX via the TYMNET computing network (UCSD
lost its ARPANET connection during the past year).  Routine daily
communications now take place using the system's message facility.
Program files are equally accessible from Stanford and UCSD, so that
either group can construct, edit or exercise the programs.  Large data
files are transmitted on magnetic tape.

    The greatest benefit of the interaction with the SUMEX-AIM resource
is the opportunity to share ideas, programming experience and utility
programs with other users in the community.  The availability of a pool of
INTERLISP programmers, for example, has been of great assistance in our
initial efforts with the electron density map interpretation task.
Members of the SUMEX staff have also been helpful and patient in solving
some of the more mundane problems associated with any computational effort
(e.g., reading magnetic tapes produced at other computer centers).


IV.A.2    NATIONAL USERS

IV.A.2.a    CHEMICAL SYNTHESIS PROJECT

Simulation and Evaluation of Chemical Syntheses (SECS)

   W. Todd Wipke
Department of Chemistry
University of California
Santa Cruz, CA.  95064

I.  Summary of Research Program

A.  Technical Goals

   The long range goal of this project is to develop the logical
principles of molecular construction and to use these in developing
practical computer programs to assist investigators in designing
stereospecific syntheses of complex bio-organic molecules.  Previously the
focus of our work had been to represent as accurately as possible the
fundamental chemical transformations and how steric, proximity, and
electronic factors affect these reactions, going into great detail even
involving analysis of three-dimensional models.  The goals for this year
focused on developing constraints to help guide the synthesis program in
growing the tree of synthetic precursors.  We wanted to utilize high level
information about symmetry and stereochemistry to set up strategies
defining preferred orderings of making and breaking bonds. We also hoped
to completely separate these strategies from the chemical transforms to
allow experimenting with changing transforms keeping strategies constant
and vice versa.  This separation was also deemed important for ease of
maintenance of large transform libraries.  Finally we hoped to use these
strategies for guiding multistep lookahead so the user could see a
sequence developed in the tree at one time.

B.  Medical Relevance and Collaboration

   The development of new drugs and the study of how drug structure is
related to biological activity depends upon the chemist's ability to
synthesize new molecules as well as his ability to modify existing
structures, e.g., incorporating isotopic labels into biomolecular
substrates.  The Simulation and Evaluation of Chemical Synthesis (SECS)
project aims at assisting the chemist in designing stereospecific
syntheses of biologically important molecules.  The advantages of this
computer approach over a manual approach are manifold: 1) greater speed in
designing a synthesis; 2) freedom from bias of past experience and past
solutions; 3) thorough consideration of all possible syntheses using a
more extensive library of chemical reactions than any individual can
remember; 4) greater capability of the computer to deal with the many
structures which result; and 5) capability of the computer to see molecules
in a graph-theoretical sense, free from bias of 2-D projection.  SECS was
designed to be able to apply any kind of chemical transformation, and
because of this generality we see it finding application in biogenesis and
metabolism (see section II A below).

C.  Progress and Accomplishments

    The environment of this project has changed dramatically in the past
year with the move from Princeton to Santa Cruz.   SECS was moved from a
hands-on environment consisting of a KA PDP-10 with an LDS-1 graphics
system and standard DEC software to a remote environment with access to a
KI PDP-10 through a GT40 graphics terminal where the host monitor system
is TENEX.  The compatibility package of TENEX considerably eased the
problems of conversion.  Most problems resulted from differences in file
handling, and differences in the Fortran operating system.   Subtle
problems arose from the fact that our files were organized by tapes and
could not simply all be transferred to disk, because of space problems and
naming conflicts.  SECS was successfully converted to TENEX and the
graphical interaction was modified for greater efficiency in our new
remote low bandwidth mode of communication.

    Progress in developing strategy includes creating a general goal
list structure which allows complex logical combinations of goals to be
expressed, for example, "(break bond 2 or break bond 3) and use group 1".
Thus, instead of one set of "strategic bonds" to be broken, we now can
express strategies involving pairs of bonds, or groups or atoms.  We have
succeeded in isolating strategy from the chemical transforms--strategies
can only contain expressions which refer to structural units of the
molecule or changes in those units, and may not refer to any transform by
name.  The transforms have been given "character" which describes the type
of structural changes likely to occur if the transform is applied, e.g.,
cleaves ring,  removes group, modifies stereochemistry, etc.  The SECS
strategy module first sets up standard goal lists based on graph-theoretic
heuristics and then allows the user to view and modify the goal lists. In
this way the user can place constraints on the syntheses generated, e.g.,
"don't modify this ring; instead, focus your attention on this part of the
molecule."  GOALTST was modified to interpret the new complex goals and
also to test the achievement or violation of a goal as early as possible.
Hence, there is testing before examination of the transform as well as
after interpretation of the transform.   The net result of this work on
strategy is that the user can very closely constrain SECS now to work only
in areas which the user decides are worthwhile; consequently fewer
precursors are generated which the user would delete.
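
   The goal-list idea can be sketched as a nested logical expression
evaluated against the structural changes a transform would make.  The
representation below is an assumption made for illustration; it is not
the SECS goal-list format or the GOALTST code, and the names
`satisfied`, `break_bond`, `use_group` and `modify_ring` are hypothetical.
In keeping with the separation described above, goals mention only
structural changes and never a transform by name.

```python
def satisfied(goal, changes):
    """Evaluate a nested goal expression against a set of changes.

    goal    -- ("and", g1, g2, ...), ("or", g1, g2, ...), ("not", g),
               or a primitive such as ("break_bond", 2)
    changes -- set of primitive structural changes that a candidate
               transform would make (its "character")
    """
    op = goal[0]
    if op == "and":
        return all(satisfied(g, changes) for g in goal[1:])
    if op == "or":
        return any(satisfied(g, changes) for g in goal[1:])
    if op == "not":
        return not satisfied(goal[1], changes)
    return goal in changes


# "(break bond 2 or break bond 3) and use group 1"
goal = ("and",
        ("or", ("break_bond", 2), ("break_bond", 3)),
        ("use_group", 1))

# A user-imposed constraint of the form "don't modify this ring".
constraint = ("not", ("modify_ring", 1))

# Hypothetical character of two candidate transforms.
transform_a = {("break_bond", 3), ("use_group", 1)}
transform_b = {("break_bond", 2), ("modify_ring", 1)}

for name, changes in (("a", transform_a), ("b", transform_b)):
    ok = satisfied(goal, changes) and satisfied(constraint, changes)
    print(name, "accepted" if ok else "rejected")
```

   Because the test needs only the character of a transform, it can be
run before the transform itself is examined, which is the kind of early
rejection the text describes.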

   Significant progress was made in the recognition of symmetry and use
of that information in SECS.  A general algorithm based on SEMA, our
canonical naming algorithm, was developed and implemented for generating
the entire symmetry group of the molecule, using the stereochemical graph
isomorphism group.  We have applied this symmetry knowledge to make SEMA
itself more efficient, and have combined it with symmetry of the
transforms to make application of a transform generate a non-redundant set
of precursors.  Thus, if a double bond is introduced into cyclohexane,
only one cyclohexene is generated, not six.  This algorithm takes into
account all stereocenters in the molecule of both double bonds and
saturated carbon.  Addition of this algorithm reduces execution time of
SECS on certain types of problems by a factor of up to six or more. We
have not yet developed heuristics for creating strategies from this
symmetry group.
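
   The cyclohexane example can be reproduced with a brute-force sketch.
This is not the SEMA-based algorithm: it simply tries every permutation
of the atoms, works on the plain carbon skeleton, and ignores
stereochemistry, so it is practical only for very small molecules.  The
function names and atom numbering are assumptions for illustration.

```python
from itertools import permutations


def automorphisms(n_atoms, bonds):
    """All atom permutations that map the bond set onto itself."""
    bond_set = {frozenset(b) for b in bonds}
    return [
        perm for perm in permutations(range(n_atoms))
        if {frozenset((perm[a], perm[b])) for a, b in bonds} == bond_set
    ]


def bond_classes(n_atoms, bonds):
    """Partition the bonds into symmetry-equivalent classes (orbits)."""
    group = automorphisms(n_atoms, bonds)
    return {
        frozenset(frozenset((p[a], p[b])) for p in group)
        for a, b in bonds
    }


# Carbon skeletons only; hydrogens are implicit.
cyclohexane = [(i, (i + 1) % 6) for i in range(6)]
methylcyclohexane = cyclohexane + [(0, 6)]   # atom 6 is the methyl carbon

ring_classes = bond_classes(6, cyclohexane)
methyl_classes = bond_classes(7, methylcyclohexane)
print(len(ring_classes), len(methyl_classes))
```

   Applying a bond-modifying transform to one representative bond per
class gives a non-redundant set: a single class for cyclohexane, and
four for methylcyclohexane (three distinct ring positions plus the
exocyclic bond).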

    Considerable improvement of aromatic chemistry has resulted from the
addition of "character words" to the aromatic transforms and
reorganization of the aromatic module.  Electronic perception is only
performed now if SECS is fairly certain that aromatic chemistry will be
used and the user can prevent the MO calculations if he wishes.  Work is
still progressing on implementation of strategies to control when to apply
aromatic transforms based on heuristics derived from an extensive
literature study.  Many other modifications have been made to improve the
human engineering of SECS.  Documentation of modules whose authors have
left the project is still continuing.  Now with an expanding users group,
a good user's manual is required and is under revision.

D.  Current list of Project Publications

[1] W.T. Wipke and T.M. Dyott, "Use of Ring Assemblies in a Ring
  Perception Algorithm," J. Chem. Info. and Computer Sci., 15, 140
  (1975).

[2] T.M. Gund, P.v.R. Schleyer, P.H. Gund and W.T. Wipke, "Computer
  Assisted Graph Theoretical Analysis of Complex Mechanistic Problems in
  Polycyclic Hydrocarbons.  The Mechanism of Diamantane Formation from
  Various Pentacyclotetradecanes," J. Amer. Chem. Soc., 97, 743
  (1975).

Papers in Preparation:

[1] W.T. Wipke and P. Gund, "Simulation and Evaluation of Chemical
  Synthesis.  Congestion: A Conformation Dependent Function of Steric
  Environment at a Reaction Center.  Application with Torsional Terms to
  Stereoselectivity of Nucleophilic Additions to Ketones," J. Amer.
  Chem. Soc.

[2] W.T. Wipke, G. Birkhead, and T. Brownscombe, "Correlation of
  Congestion with Stereoselectivity of Olefin Epoxidation."

[3] W.T. Wipke, C. Still, G. Grethe, T.M. Dyott, and P.E. Friedland,
  "ALCHEM: A Language for Representing Chemical Transforms.  Application
  to Heterocyclic Chemistry."

[4] W. Todd Wipke and Hartmut W. Braun, "Graph-theoretical Perception of
  Molecular Symmetry."

[5] W.T. Wipke, G. Smith and H. Braun, "SECS-Simulation and Evaluation
  of Chemical Syntheses: Strategy and Planning," ACS Symposium
  Proceedings.


E.  Funding Status.

IBM Fellowship supporting S. Krishnan (postdoctoral)
    $4000. expires September, 1976

Merck, Sharp and Dohme fellowship supporting Graham Smith (postdoctoral)
    $1000. expires July, 1976

Sandoz Unrestricted grant to support computer synthesis
    $2000

Proposal submitted 1 Mar 1976 to Division of Research Resources
"Resource-Related Research: Biomolecular Synthesis"
    $391,532 for three years

II.  Interactions with the SUMEX-AIM Resource

    A.  Collaborations and Medical Use of Programs Through Networks.
Since SECS has only recently been operating on the SUMEX-AIM resource,
collaborations are just beginning.  However, demonstrations of SECS have
been given at the National Cancer Institute, and a collaboration with the
Division of Chemical Carcinogenesis, using SECS to model the metabolism of
compounds and evaluate the carcinogenic activity of the metabolites, is
currently under discussion.  The National Library of Medicine toxicology
program is also interested in SECS and is planning to access it via
SUMEX-AIM.  Dr. Steve Heller of the EPA and Dr. G.W.A. Milne of the
National Heart and Lung Institute are currently exploring the possibility
of putting SECS up on the Cyphernetics network.  For the past year SECS has
been available over TELENET from First Data Corporation.  Squibb, Merck,
FMC, American Cyanamid, Pfizer and Searle pharmaceutical companies have
used an experimental version of SECS and have provided useful feedback to
us about problems they discovered.  We expect increasing numbers of
academic users to access SECS via SUMEX-AIM as they learn of its
availability.

   The availability of SECS on SUMEX-AIM has also served health-related
research at the University of California, Santa Cruz.  For example, model
building using the SECS model builders is being performed for Professor
Edward Dratz (UCSC) to generate conformations of fatty acids isolated from
visual membranes ("Structure and Function of Visual Photoreceptors,"
EY00175-05), and for Professor Howard Wang (UCSC) to study how
conformations of steroids may affect the local anesthetic - membrane
interaction ("Role of Membrane Proteins in Local Anesthetic Action,"
GM22242-01).

    B.  Cross Fertilization with other SUMEX-AIM projects.   The SECS
project held joint research group meetings at Stanford with the DENDRAL
and AI groups to discuss common problems and research goals.  This has
been very rewarding since the DENDRAL group has useful experience with
symmetry manipulation which SECS was getting into, and the SECS project
has useful experience with representing reactions, which DENDRAL/CONGEN
was getting into.  These joint meetings also let the members meet in
person after having met on-line on the network.  Last year's AIM
Conference at Rutgers was also a valuable experience, which allowed us to
meet people interested in similar problems in different disciplines, and
it also caused us to think about what we were doing in research with some
new perspectives.  We are looking forward to this year's AIM meeting.

    We find the SUMEX-AIM network very well human engineered.   The
ability to leave messages on the network, and to LINK to other users on-
line for advice has been extremely useful to us, since we were new to the
TENEX operating system.  But more than that,  we have been able to utilize
expertise of others which our group lacked, e.g., Trisha Davis (an
undergraduate) has been writing a model builder and display program in
SAIL although there is no SAIL expertise in the SECS group--that would not
have been possible without the network communication features.

     C.  Critique of Resource Services.  The SECS project finds the
SUMEX-AIM staff and community extremely helpful, and anxious to extend
themselves to meet our needs.  SUMEX provided a leased line and modems to
us and provided TYMNET access as well.  Were it not for SUMEX, this
research effort would have perished since there is no adequate computer
facility on campus.

   We do find we are short of disk space and in our grant proposal we
have requested funds for a disk drive to place at SUMEX to help resolve
this problem.  The response time during the day and sometimes even later
is poor for interactive graphics, but hopefully the second processor being
installed will help alleviate that difficulty.  We have an additional
problem that it is difficult to transfer files from TENEX to any other
PDP-10 with the files retaining their filenames.  This problem may also be
resolved if we are able to write tapes locally from over the network.
Basically we have found that SUMEX-AIM provides a productive and
scientifically stimulating environment and we are thankful that we are
able to access the resource and participate in its activities.



IV.A.2.b   INTERNIST (DIALOG) PROJECT

INTERNIST - Diagnostic Logic Program
Dr. H. Pople and J. Myers, M.D.
   University of Pittsburgh

(Grant HEW MB-00144-01, 3 years, $167,168 last year)

I.  SUMMARY OF RESEARCH PROGRAM

A.  BACKGROUND AND OBJECTIVES

   The principal objective of the MIS laboratory at the University of
Pittsburgh is to develop, test, and implement a computer-based diagnostic
consultation system for internal medicine.  Considerable progress towards
this goal had already been made prior to our receipt, in June, 1974, of a
three-year $524,000 grant from the Bureau of Health Resources Development
to establish a "Computer Laboratory Health Care Resource" at Pitt.  At
that time, the medical data base accessed by the INTERNIST (formerly
DIALOG) program was estimated to comprise approximately twenty-five
percent of the major diseases of internal medicine, and a number of case
studies had been run illustrating the power of the INTERNIST heuristic
process in dealing with a variety of complex clinical problems.   Our
research plan envisioned a five-year development effort, intended to
yield:

(a) A four-fold expansion of the data base.

(b) Systematic field testing and evaluation of the system in actual
  clinical settings.

(c) Eventual implementation, making INTERNIST available for clinical
   use on a routine basis.

B.  PROGRESS AND ACCOMPLISHMENTS

   Shortly after award of the BHRD grant, arrangements were concluded
permitting use of the SUMEX-AIM computer resource for INTERNIST research
and development activities.  Although the SUMEX-AIM computer is of the
same genre as the one used in the original INTERNIST development work,
differences in the LISP language supported necessitated major conversion
efforts.

   As mentioned in our last progress report, the need for rapid access
to large data files motivated the design of an interface between the
INTERLISP host processor and a set of structured files containing the
entire vocabulary and network of associations comprising the INTERNIST
data base.

   As of June, 1975, conversion of the data base had been completed and
the necessary interfaces had been established to enable INTERNIST
diagnostic programs to work with these revised structures.  Design and
implementation of an interactive data entry and editing system was
completed in December, 1975, enabling expansion of the on-line data base
to its present size which is approximately 60% complete.

   This newly expanded data base is currently being subjected to
extensive testing in both typical examples of disease and difficult
diagnostic problems.  This procedure of systematically checking the entire
clinical data base should be completed by late June, 1976.  The planned
field test and evaluation effort will then commence in early fall.

C.  PUBLICATIONS

[1] Pople, H.E., "Artificial Intelligence Approaches to Computer-based
  Medical Consultation," Proceedings of IEEE Intercon, 1975, New York.

[2] Pople, H.E., Myers, J.D., and Miller, R.A., "DIALOG: A Model of
Diagnostic Logic for Internal Medicine," Proceedings of Fourth
  International Joint Conference on Artificial Intelligence, Tbilisi,
  Georgia, USSR, 1975.

II.  INTERACTIONS WITH SUMEX-AIM RESOURCE

   Because this year has been largely devoted to system development and
checkout activities, there has been no real opportunity to engage in any
meaningful collaboration via the communication networks associated with
SUMEX-AIM.  We fully expect to exploit this attractive feature of the
resource, however, during the evaluation and field test studies planned
for the coming year.

   Concerning the service provided by SUMEX-AIM, our only complaint is
the heavy loading during prime hours, which effectively prevents serious
use of the INTERNIST diagnostic programs during certain portions of the
day.  We applaud and eagerly await the advent of the SUMEX-AIM dual
processor.



IV.A.2.c    HIGHER MENTAL FUNCTIONS MODELING

HIGHER MENTAL FUNCTIONS MODELING (HMF)
    Project Summary - 1976

  Kenneth M. Colby, M.D.
Professor of Psychiatry, UCLA

(NIH MH-27132-01, 2 years, $67,000 this year)

Introduction.

   One of the oldest and newest applications of computers in artificial
intelligence is the simulation of human cognitive processes.  The Higher
Mental Functions project has been modelling belief systems and related
psycho-pathological delusional systems for a number of years.   The
specific goal for the past two years has been to construct, test, and
validate a computer simulation of paranoid processes.  The development of
such a model has clinical implications for the understanding, treatment,
and prevention of paranoid disorders.

    Recently we have been focussing on the origin of beliefs in belief
systems and the criteria by which beliefs are significant to the entity we
are modelling, i.e., the motivation for the beliefs.  The motivation for an
entity's purposive behavior is based in its affect or emotion system.  We
are currently formulating a theory of the motivational influence of affect
on conative (volitional) and cognitive (inferential) processes, with the
intent of implementing this background theory in a simulation model.  By
specifying the underlying theory of motivation we hope to make the theory
of paranoia more explicit and the paranoid simulation model more adequate.

    The strategy of computer simulation of mental processes can be
characterized roughly by three phases:  (1) identification and critical
description of non-random patterns occurring in the phenomena under study,
(2) explanation by postulation of underlying mechanisms which generate,
produce, or are responsible for the non-random patterns, and (3)
validation by repeated attempts to test the reality of the proposed theory
or model.  The construction and use of simulation models of mental
processes closely parallels model-building in other sciences.  An attempt
is made to reproduce the relevant features of the patterns under study.
This attempt produces simplification and idealization of the phenomena.
Simplification implies that only centrally relevant variables are chosen
for representation in the model.  Idealization implies that exact classes
and perfect properties are assumed in the implementation of the model.
Still, the model can provide an explanation of underlying mechanisms which
is useful in understanding and interpreting the observed patterns of
phenomena.  Finally, a model can be used in practical situations for
prediction, and for providing suggestions to clinicians for potential
control and change in the phenomena.  Such technological purposes are
important for models of mental disorders since the long-range goal of
mental health research is to prevent or reduce conditions of human mental
suffering.

   Simulations of cognitive processes are difficult for a number of
reasons:

(1) The underlying generative mechanisms of human behavior are
  inaccessible to direct observation, and must instead be postulated
  as hypothetical constructs that may (possibly) account for the
  phenomena.

(2) A simulation must take into account the rich background of
  information that a human has available to apply to a contemporary
   situation.

(3) Human beings have internal needs which are a function of the
  immediate past and present, as well as the long-term past of the
   individual.  These needs and past experiences color the human's
  response to the contemporary situation.

(4) Human linguistic behavior is the richest source of data for
  exploring cognitive processes as well as the most complex and
  therefore desirable behavior to simulate.  At the same time, it is
  difficult due to the variety of behavior possible and the variety of
  explanations possible for one specific linguistic action.

(5) Once a simulation model is performing, it is difficult to show the
  subtleties of the model's generative mechanisms.  Instead, some
  attempt is usually made to reveal the internal workings of the model
  and appeal to the observers' intuition and/or introspection.

   Our overall purpose, then, is to develop theories of human mental
processes, specifically psycho-pathological processes, and to implement
these theories in computer simulation models.  The simulation models help
formalize and explicate their associated theories by forcing them into a
single notation and requiring the theory-builders to specify the details
of the theory.  In addition, the models provide a testing ground for
validating the theory.  On the basis of such theories, we hope to explain
the origin of psycho-pathologies and offer principles on which to base
treatment and prevention.

Technical goals.

     A.  Expand the theory of paranoia.  The theory implemented in the
current model (PARRY2), the humiliation theory, postulates that
informational inputs from other people activate a belief in the self's
inadequacy.  The paranoid mode then consists of strategies which forestall
or ward off an impending negative affect experience of humiliation by
negating this belief that the self is wrong and asserting the belief that
the self is being wronged by others.  The theory provides generative
mechanisms for explaining the expression of a delusional system by a
paranoid person, the chronic distress felt by paranoid persons, and for
the sudden and extreme displays of fear and anger in interactions with
other people.


We plan to extend the theory in two ways:

(1) To cover other paranoid phenomena, such as delusions of grandeur,
  the transformation of counter-evidence to evidence supportive of
  delusions, and retrospective misinterpretation of input expressions.

(2) To explain the genesis of paranoid patterns of thought; e.g., the
  manner in which: (a) normal strategies for dealing with the shame-
  humiliation affect are ineffective and paranoid strategies develop,
  (b) strategies are selected as being appropriate and are reinforced
  when they are successful, and (c) persecutory delusional systems
  develop and expand to include much of the paranoid personality's
  cognitive processing.

    B.  Expand the background theory and model of the motivation of
cognitive processes.  The more important characteristics for the model to
have are:

(1) The top-level processes of the model should be purposive intentional
  processes guided by the affect system, rather than a question-answer
  loop or facsimile.

(2) Every action that the model performs should be motivated by an
   intention.  These intentions may be explicit in the case of goals,
  or implicit in the case of an action appropriate to a situation but
  with no explicit end state represented.

(3) Each belief and intention should have an associated measure of its
  significance to the entity.  The criteria for measuring the
  significance are based in the affect system.

(4) The model should have a number of coping mechanisms for avoiding or
  coping with distressing situations.  These mechanisms can be
  reinforced or discarded as they are proved to be more or less useful
   to the model.

(5) The model should be able to change over time to show the development
  of psycho-pathologies.  The most direct form of change is to the
  measures of significance attached to the model's beliefs and
  intentions.
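
   As an illustration of characteristic (4) only, the following minimal
Python sketch shows one way coping mechanisms might be reinforced or
discarded according to how useful they prove.  It is not taken from PARRY;
the names CopingMechanism and update, the numeric usefulness score, and the
step and floor values are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CopingMechanism:
    """A hypothetical coping mechanism with an assumed numeric usefulness score."""
    name: str
    usefulness: float = 0.5


def update(mechanisms: List[CopingMechanism], name: str, succeeded: bool,
           step: float = 0.25, floor: float = 0.2) -> List[CopingMechanism]:
    """Reinforce or weaken the named mechanism, then drop any below the floor."""
    kept = []
    for m in mechanisms:
        if m.name == name:
            # Success reinforces the mechanism; failure weakens it.
            m.usefulness += step if succeeded else -step
        if m.usefulness >= floor:
            kept.append(m)
    return kept


mechanisms = [CopingMechanism("withdraw"), CopingMechanism("blame-others")]
mechanisms = update(mechanisms, "blame-others", succeeded=True)
mechanisms = update(mechanisms, "withdraw", succeeded=False)
mechanisms = update(mechanisms, "withdraw", succeeded=False)
remaining = [m.name for m in mechanisms]
```

After one success and two failures, only the reinforced mechanism remains
in the list.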

     C.  Implement the theories in a simulation model.  We hope to
construct a model in such a way that the theories of motivation and
paranoia can be represented explicitly, and therefore be open to
inspection and modification.  In addition, since we model paranoid
behavior as expressed through linguistic actions, we hope to develop
adequate natural language understanding programs for recognition and
response in dialog situations.

    D.  Develop further techniques and methods for simulating cognitive
processes.  Specifically, we plan to explore human communication through
natural language in dialog situations and human natural language
interfaces with computers.  Also, we will extend our previous attempts at
finding stronger and more sophisticated tests for validation studies.  Our
results should be applicable to other simulations of human mental
activities.

Medical relevance.

   The simulation model of paranoid processes that we are implementing
has implications for the understanding, treatment, and management of
paranoid disorders.

   The shame-humiliation theory and its model suggest that paranoid
phenomena be viewed as a consequence of intentionalistic information-
processing strategies which attempt to avoid or minimize the distressing
experience of humiliation.  In trying to understand what is going on in a
paranoid patient at a symbol-processing level, this perspective directs
clinicians to look for humiliating and shame-engendering situations in the
patient's experience.  These may consist of a single, encompassing
humiliating situation such as a demeaning job, or a series of esteem-
damaging defeats such as repeated failures in disappointing love affairs.

    Since activation of intense shame is posited to be the core process
in paranoid disorders, implications for treatment involve trying to modify
this central mechanism in some way.  One method is to change the patient's
distressing belief in his own inadequacy by exploring topics involving
shame, esteem,  and self-censure.  Another is to desensitize the paranoid
patient to shame experiences through behavior therapy involving a graded
hierarchy of imagined distress situations and countering procedures.
These treatment procedures may be deduced directly from mechanisms in the
model, and the theory used to predict the outcomes of such procedures.

    For management of the disorder, the model predicts that removal from
situational humiliation, as in hospitalization, allows for repair from
breakdowns occurring under repeated activations of shame-humiliation
beliefs.  Also, if the patient returns to unchanged situational
humiliation, as in a distressing home life, he risks a relapse.

Current Status.

    PARRY2 was completed a year ago and has been available for
interviewing and validation tests on the SUMEX system for the past year.
We are now in the process of writing a new version, PARRY3, incorporating
the theoretical constructs presented in this report.   The new version
(which is being completely rewritten) contains mechanisms implementing the
characteristics of models mentioned above with the exception of the
ontogenesis of psycho-pathologies.  In addition,  it contains a new
language recognizer capable of combining pattern-matching rules with
parsing techniques, and explicit rules for recognizing and interpreting
elliptical expressions in dialogs.

Current publications.

[1] Colby, K.M. Artificial Paranoia, Pergamon Press, New York, 1975.



[2] Faught, W.S. Affect as Motivation for Cognitive and Conative
  Processes, IJCAI Proceedings, September, 1975.

[3] Colby, K.M., Hilf, F.D., Wittner, W.K., Parkison, R.C. and Faught,
  W.S., A Note on Estimating Improvement in a Computer Simulation of
  Paranoid Processes. UCLA Department of Psychiatry, Memo ALHMF-2, April
   1975.

[4] Parkison, R.C., Colby, K.M., and Faught, W.S., Conversational Language
  Comprehension Using Integrated Pattern-Matching and Parsing.   UCLA
  Department of Psychiatry, Memo ALHMF-5, April 1976.

[5] Colby, K.M.  Clinical Implications of a Simulation Model of Paranoid
   Processes.  Archives of General Psychiatry, 1976.

[6] Faught, W.S., Colby, K.M., and Parkison, R.C., Inferences, Affects,
  and Intentions in a Model of Paranoia. Cognitive Psychology, 1976.

Funding Status.

Grant NIMH, 2 years, $67,000 this year.

Interactions with the SUMEX-AIM resource.

    The SUMEX-AIM resource and its associated network connections make
possible the merging of artificial intelligence techniques and technology
in psychiatry and the resources of a west-coast center for psychiatric
studies, the Neuro Psychiatric Institute (NPI) at UCLA.  Access to SUMEX
from the NPI has brought a new source of questions and viewpoints to
research in mental disorders, as well as an opportunity for the model-
builders to interact with clinicians in elaborating details of paranoid
phenomena.  In addition, the current simulation model is being explored
for use as a training device for medical students and residents in the
Department of Psychiatry.

Critique of resource services.

   The resource itself has provided excellent facilities and service
for our research needs.  We have had almost no trouble developing
simulations due to the SUMEX system itself, and the cooperation of the
SUMEX staff has been excellent.

   In spite of this, problems have arisen with our use of the resource,
due almost entirely to the network aspect of our access.  (We connect to
SUMEX through both the ARPA net and TYMNET.) We see the problems of
network use of the SUMEX-AIM facility as falling into four broad
categories:

(a) the keyhole effect.  The slow terminal rates typical of a network
  connection (due to phone lines or local computer delay) force the
  user to peer at his files and communicate with the computer through
  a small data channel.  Additionally, the network user may not have
  direct access to a high-speed printing device for listing the day's
  (or even the week's) work.

(b) the dropped connection.  Typically, the network and/or local
  connection of the user to the net can drop, leaving the user's job
  in a dormant (non-running) state, forcing the user to reconnect (and
  perform the typically elaborate reconnect procedure) before his job
  will run again.

(c) slower interactive computer response due to network and local
  hardware, and the subsequent greater amount of time necessary to get
  any work done.

(d) the difficulty of lobbying for system changes and/or additions from
  a distance.  We are made acutely aware of this fact whenever notices
  are given over the system for classes or seminars explaining new
  system features.

Policies and programs that might be useful:

(a) more system programs designed with the network user in mind, such as
  status programs to report the status of detached or batch-run jobs,
  and cleaner detach and attach programs for reestablishing dropped
  connections.

(b) a service-level advantage to non-local users to put them on a more
  equal footing with local users.

(c) a special effort to elicit and implement improvements for network
  access.

   In general, we have found the system reliable, and the staff
courteous and helpful.



IV.A.2.d   LANGUAGE ACQUISITION MODELING PROJECT

Language Acquisition Modeling (ACT)

Dr. John Anderson
(Univ. of Michigan)

(Grant NIMH $20,000 this year)

I. Summary of Research Program

A. Technical goals:

   To develop a production system that will serve as an interpreter of
the active portion of an associative network.  To model a range of
cognitive tasks including memory tasks, inferential reasoning, language
processing, and problem solving.  To develop an induction system capable
of acquiring cognitive procedures with a special emphasis on language
acquisition.

B. Medical relevance and collaboration:

    1. The ACT model is a general model of cognition. It provides a
useful model of the development of and performance of the sorts of
decision making that occur in medicine.

   2. The ACT model also represents basic work in AI.  It is in part an
attempt to develop a self-organizing intelligent system.  As such it is
relevant to the goal of development of intelligent artificial aids in
medicine.

C. Progress and Accomplishments:

    ACT provides a uniform set of theoretical mechanisms to model such
aspects of human cognition as memory, inferential processes, language
processing, and problem solving.  ACT's knowledge base consists of two
components, a propositional component and a procedural component.  The
propositional component is provided by an associative network encoding a
set of facts known about the world.  This provides the system's semantic
memory.  The procedural component consists of a set of productions which
operate on the associative network.  ACT's production system is
considerably different from many of the other currently available systems
(e.g., Newell's PSG).  These differences have been introduced in order to
create a system that will operate on an associative network and in order
to accurately model certain aspects of human cognition.

   A small portion of the semantic network is active at any point in
time.  Productions can only inspect that portion of the network which is
active at the particular time.  This restriction to the active portion of


119

the network provides a means to focus the ACT system in a large data base
of facts.  Activation can spread down network paths from active nodes to
activate new nodes and links.  To prevent activation from growing
continuously there is a dampening process which periodically deactivates
all but a select few nodes.  The condition of a production specifies that
certain features be true of the active portion of the network.  The action
of a production specifies that certain changes be made to the network.
Each production can be conceived of as an independent "demon".   Its
purpose is to see if the network configuration specified in its condition
is satisfied in the active portion.  If it is, the production will execute
and cause changes to memory.  In so doing it can allow or disallow other
productions which are looking for their conditions to be satisfied.   Both
the spread of activation and the selection of productions are parallel
processes whose rates are controlled by "strengths" of network links and
individual productions.  An important aspect of this parallelism is that
it is possible for multiple productions to be applied in a cycle through
the set of productions.  Much of the early work on the ACT system was
focused on developing computational devices to reflect the operation of
parallel, strength-controlled processes and working out the logic for
creating functioning systems in such a computational medium.
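
   The mechanisms described above can be illustrated with a minimal
Python sketch.  It is not the ACT implementation: the class and function
names (Network, Production, run_cycle), the example nodes, and the use of a
fixed strength threshold in place of strength-controlled rates are all
assumptions made for illustration, and the parallel processes are
approximated by firing every matching production within one sequential
cycle.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


@dataclass
class Network:
    """Associative network: weighted links plus the currently active nodes."""
    links: Dict[str, List[Tuple[str, float]]] = field(default_factory=dict)
    active: Set[str] = field(default_factory=set)

    def add_link(self, src: str, dst: str, strength: float) -> None:
        self.links.setdefault(src, []).append((dst, strength))
        self.links.setdefault(dst, [])

    def spread(self, threshold: float) -> None:
        """One step of spreading activation along sufficiently strong links."""
        reached = {dst
                   for src in self.active
                   for dst, strength in self.links.get(src, [])
                   if strength >= threshold}
        self.active |= reached

    def dampen(self, keep: Set[str]) -> None:
        """Deactivate every node except a select few."""
        self.active &= keep


@dataclass
class Production:
    """A condition on the active portion of the network, and an action on it."""
    name: str
    condition: FrozenSet[str]
    action: Callable[[Network], None]


def run_cycle(net: Network, productions: List[Production]) -> List[str]:
    """Fire every production whose condition holds in the active portion."""
    matched = [p for p in productions if p.condition <= net.active]
    for p in matched:
        p.action(net)
    return [p.name for p in matched]


net = Network()
net.add_link("dog", "animal", 0.9)
net.add_link("dog", "bone", 0.2)      # too weak to carry activation here
net.add_link("animal", "alive", 0.8)
net.active = {"dog"}
net.spread(threshold=0.5)             # activates "animal" but not "bone"

productions = [
    Production("infer-alive", frozenset({"dog", "animal"}),
               lambda n: n.add_link("dog", "alive", 1.0)),
    Production("fetch", frozenset({"bone"}), lambda n: None),
]
fired = run_cycle(net, productions)   # only "infer-alive" can see its condition
net.dampen(keep={"dog"})
```

Because productions may inspect only the active nodes, the "fetch"
production never fires even though its node exists in the network; this is
the focusing effect the restriction is meant to provide.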

   We have successfully implemented a number of small-scale systems
that model various psychological tasks in the domain of memory, language
processing, and inferential reasoning.  A larger scale effort is underway
to model the language processing mechanisms of a young child.   This
includes implementation of a production system to analyze linguistic
input, make inferences, ask and answer questions, etc.  Also a great deal
of effort is being given to developing learning mechanisms that will
acquire and organize the productions for this language processing.  This
learning program attempts to acquire procedures from examples of the
computations desired of the procedures.  For instance, the program learns
to comprehend and generate sentences by being given sentences and picture
representations of the meaning of the sentences (actually hand encodings of
the pictures).  Although this effort is focused on induction of linguistic
procedures, the hope is to develop a general model of induction of
cognitive procedures and not to place any language-specificity into the
induction procedures.

D. Current list of project publications:

[1] Anderson, J. R.  Induction of augmented transition networks.
  Cognitive Science, 1976, in press.

[2] Anderson, J. R.  Language, Memory, and Thought. Lawrence Erlbaum
  Assoc., 1976, in press.

[3] Anderson, J. R., Klein, P., and Lewis, C.  Language processing by
  production systems.  To appear in P. Carpenter and M. Just (Eds.),
  Cognitive Processes in Comprehension, Lawrence Erlbaum Assoc.

E. Funding Status:


   The research is currently being funded by a grant from NIMH for
computer simulation of language acquisition.  The level of funding for the
year beginning May 1, 1976 has yet to be determined.  It was $20,000 for
the past year.

II.  Interactions with SUMEX-AIM Resource.

   Our period with the project has been too short to develop any
significant interactions.  The ACT program is currently being made into a
system which will be available to members of the SUMEX-AIM community.



IV.A.2.e    MEDICAL INFORMATION SYSTEMS LABORATORY

MISL Project

Dr. B. McCormick and M. Goldberg, M.D.
(Univ. of Illinois at Chicago Circle)

(Grant HEW MB-00114-02, 3 years, $248,793 this year)

I) Summary of research program

A) Technical goals

   The major goals of the Medical Information Systems Laboratory fall
into three broad categories, described briefly as follows:

1) Construction of a database in ophthalmology.  This will provide a
trial setting for clinical decision-making research.  Four major
  activities are involved: implementation of a clinical data network;
design and on-going development of an Eye Outpatient Index; computer
systems development; and installation of a glaucoma clinic satellite.

2) Network-compatible database design.  This will provide cost-effective
distributed data management for clinical records.  Current projects
include: design of an intelligent coupler for the database system;
continuing design of various levels of database software (a
relational algebraic language -- RAIN, disk controller, database
skeleton); large database / database software compatibility; design
of a separate database computer; design of database network software.

3) Clinical decision support

a. Construction of a consultation system for use in the diagnosis and
  treatment of retinal/choroidal diseases.  The immediate goal is
the development of a system for giving advice about four diseases
prevalent at the University of Illinois Eye and Ear Infirmary:
histoplasmosis, central serous retinopathy, diabetic retinopathy,
  and sickle cell disease.  Besides actual construction of a system,
the project is concerned with theory of diagnosis and knowledge
acquisition methodology.

b. Interface between a pattern recognition system for Structured
Analysis of the Retina (STARE) and the diagnostic model used by
the retinal/choroidal consultation system.

c. Initial development of a causal model for motility.

d. Further inter-institutional communications in disease model theory
and development.

B) Medical relevance and collaboration



   We have chosen to explore inferential relationships between analytic
clinical data and the natural history of glaucoma and selected
retinal/choroidal diseases both in treated and untreated form.   These
investigations are intended to provide much more than a simple excursion
of computer technology into ophthalmology.  They address clinical problems
of national interest, as indicated in the Report of the National Eye
Advisory Council Vision Research Program Planning Committee (DHEW
publication No. NIH-75-644).

   Glaucoma, one of the major causes of blindness in the United States
today, is difficult to diagnose in its early stages.  Some recent evidence
indicates that enlargement of the optic nerve cups may be the first sign
of glaucoma's damage to the eye.  One of the goals of the present project
is the application of a newly developed technique for quantitative
analysis of the optic nervehead.  The technique is sufficiently simple to
permit wide-spread adoption.  If this technique is successful in
identifying very early glaucomatous disk changes, it should permit
institution of therapy at a very early stage, and thereby prevent serious
glaucomatous damage from being done to the eye.

   Diabetic retinopathy is another principal cause of blindness. Very
little is known about its pathophysiology, and there are many gaps in our
knowledge of its natural course.  The present study is designed to elicit
new information about this disease, using a series of new diagnostic tools
which have been developed as part of a system of computerized retinal
image analysis.  The need here is great, because at present there is no
proven satisfactory treatment.

   Sickle hemoglobinopathy can cause ocular changes that lead to loss
of vision and even total blindness.  Little is known about the natural
history of this problem, particularly during its early stages.  The
present project is ideally suited to assist in this study; the nation's
first Sickle Cell Clinic has been established at the Illinois Eye and Ear
Infirmary -- the site of our Ophthalmic Database System.

   At present there is great demand in the United States for improved
efficiency in the delivery of medical care.  Two ways that this can be
accomplished are: 1) by increased use of paramedical personnel to perform
jobs currently being performed by physicians, and 2) by use of automated
equipment to perform tasks previously performed by the physician.   In the
current project we utilize both these methods in the screening of new
ophthalmic patients at the Illinois Eye and Ear Infirmary.   If we can show
that these methods are not only feasible, but also improve the efficiency
and reliability of patient care, then a major contribution to ophthalmic
care for patients in large ambulatory care centers around the country will
have been made.

   Modeling of clinical decision-making is best carried out in intimate
association with an extensive referral clinic where a sufficient patient
population can be accumulated to provide an adequate biostatistical
sample.  In the prescribed setting, the Illinois Eye and Ear Infirmary,
the Medical Information Systems Laboratory has access to:

- clinical expertise provided by a house staff of 45;


- 28 residents, all of whom are required to undertake some
research work as a requirement of their appointment, are
available to explore latent contingencies of the database;

- an indigenous, relatively stable population (25% white,
75% black and other minorities) of a medically underserved
portion of the inner city of Chicago; the clinic provides
ophthalmic services to 50,000 patients per year.

   Commonality of the diseases being studied assists construction of an
adequate biostatistical sample.  Roughly 2% of the general population
exhibit symptoms of diseases treated at each of three specialty clinics
(Glaucoma, Retina, and Motility), or allowing for multiple presenting of
symptoms, approximately 5% by population.

   Besides a strong clinical research orientation, the Medical
Information Systems Laboratory brings to the study of disease a history of
successful engineering-medical collaboration.  MISL's sister project,
"Image Processing in Clinical Ophthalmology," lists the development of the
digital television ophthalmoscope as one of its achievements.  This device
will be a major source of clinical data for our Ophthalmic Database.

C) Progress and accomplishments (of the Clinical Decision
Support activity only)

   Interaction has continued over SUMEX-AIM with the authors of the
Weiss/Kulikowski glaucoma modeling program.  We are now entering cases
into the glaucoma system at the rate of approximately 5 per week.

   Work on the consultation system for retinal/choroidal diseases has
progressed along two fronts.  While interviewing an expert diagnostician,
in order to build a knowledge base for the four diseases mentioned above,
we have been piecing together a theory of diagnosis for ocular fundus
diseases.  We have attempted to incorporate pieces of the expert's
knowledge in a "fuzzy" diagnostic model, based partly on multiple-cue
probability theory, partly on fuzzy set and confirmation theory.   The
framework of the model is a hierarchy of disease categories, each with a
significance tempered by functions built into the system.  Our efforts
have centered on the acquisition of categories for histoplasmosis, central
serous retinopathy, and diabetic retinopathy, but should soon also include
sickle cell disease.
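
   As a purely illustrative sketch, and not the MISL implementation, a
hierarchy of disease categories with graded significance might be modeled
as follows.  The class name, the finding names, the numeric weights (which
carry no clinical meaning), and the use of fuzzy min/max as the combining
functions are all assumptions made for this example; the report does not
specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical hierarchy of disease categories."""
    name: str
    # Finding name -> degree (0..1) to which the finding supports the category.
    cues: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def own_support(self, findings):
        """Support from this category's own cues.

        Assumed rule: each cue contributes min(weight, observed degree),
        and the strongest cue wins (fuzzy OR = max).
        """
        return max((min(w, findings.get(f, 0.0)) for f, w in self.cues.items()),
                   default=0.0)

    def rank(self, findings, ceiling=1.0):
        """Return (name, significance) for this node and all descendants.

        Assumed rule: a subcategory is never more significant than its
        parent category (fuzzy AND = min with the parent's value).
        """
        sig = min(ceiling, self.own_support(findings))
        out = [(self.name, sig)]
        for child in self.children:
            out.extend(child.rank(findings, ceiling=sig))
        return out


# Hypothetical categories and made-up weights, for illustration only.
diabetic = Category("diabetic retinopathy", cues={"microaneurysms": 0.9})
histo = Category("histoplasmosis", cues={"peripapillary scarring": 0.8})
retinal = Category("retinal/choroidal disease",
                   cues={"fundus lesion": 1.0},
                   children=[diabetic, histo])

findings = {"fundus lesion": 0.7, "microaneurysms": 0.8}
ranking = retinal.rank(findings)
```

   In this sketch the parent category caps the significance of its
subcategories, which is one simple way for a hierarchy to "temper" the
value assigned to each category.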

   Our experience in interviewing experts has pointed the way to a
knowledge acquisition methodology that is compatible with our thoughts on
diagnostic reasoning.  Specifically we plan to store, for each disease
category, a representation for the contexts (or frames) in which the
disease's attributes apply.  We conceive of a disease category as a
"sphere" (embodying a structural model of the disease) in a hyperspace
defined by dimensions on attributes.  The significance of (or "belief in")
the model of disease is modified by interactions between attributes.
During acquisition, and after contexts have been defined, we can simulate
clinical situations for the benefit of our expert, who indicates his level
of belief in the disease model in the given situation.  This we plan to do
with the help of plasma panels, for graphical presentations of the
relevant contexts comprising each situation.  This approach is especially
convenient for specifying typical "default" situations, and for modeling
the time course of disease (in terms of modifications on attributes).
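
   The "sphere" picture can be made concrete with a small sketch.  This is
an assumption-laden illustration and not the model under development: the
function name, the 0..1 attribute scales, the Euclidean distance, the
linear fall-off of belief, and the use of per-attribute weights to stand
for a context are all hypothetical choices.

```python
import math


def belief_in_category(case, center, radius, context_weights=None):
    """Graded belief (0..1) that a case lies inside a disease "sphere".

    case, center: dicts mapping attribute name -> value on a 0..1 scale.
    radius: radius of the sphere in attribute space (must be > 0).
    context_weights: optional dict attribute -> weight; a weight of 0
        means the attribute does not apply in the current context.
    Belief is 1 at the center and falls linearly to 0 at the surface.
    """
    weights = context_weights or {}
    squared = 0.0
    for attribute, target in center.items():
        w = weights.get(attribute, 1.0)
        # Attributes missing from the case are treated as matching the center.
        value = case.get(attribute, target)
        squared += w * (value - target) ** 2
    distance = math.sqrt(squared)
    return max(0.0, 1.0 - distance / radius)


# Hypothetical attributes and values, for illustration only.
center = {"lesion_size": 0.5, "lesion_count": 0.5}
typical = belief_in_category({"lesion_size": 0.5, "lesion_count": 0.5},
                             center, radius=1.0)
atypical = belief_in_category({"lesion_size": 0.5, "lesion_count": 1.0},
                              center, radius=1.0)
# In a context where lesion_count does not apply, the deviation is ignored.
in_context = belief_in_category({"lesion_size": 0.5, "lesion_count": 1.0},
                                center, radius=1.0,
                                context_weights={"lesion_count": 0.0})
```

   A simulated clinical situation would then correspond to one choice of
case values and context weights, against which the expert's stated level
of belief could be compared.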

   Presently, while we continue interviews with our expert, we are
formalizing our diagnostic model and expect shortly to finish an in-depth
report.

D) Current list of project publications

[1] Chang Shi-Kuo, O'Brien M., Read J., Borovec R., Cheng W. H., Ke J. S.
   (1976) Design considerations of a database system in a clinical
   network environment.  Accepted for 1976 National Computer Conference.

[2] Chang S. K. (1975) Preliminary report on the implementation of a
   relational data base management system with structurally decomposed
   relations.  MISL internal report M.D.C. 1.1.3.

[3] Chang S. K. and McCormick B. H. (1975) An intelligent coupler for
   distributed database systems.  MISL internal report M.D.C. 1.1.7.

[4] Malone J. (1975) User's guide to uniclass cover synthesis. MISL
   report M.D.C. 4.4.1.

[5] Malone J. (1975) Addendum to AQVAL/1 (AQ7), part 1: User's guide and
   program description.  MISL report M.D.C. 4.4.1.

[6] Manacher G. K. (1975) On the feasibility of implementing a large
   relational data base with optimal performance on a minicomputer.
   Proceedings of the International Conference on Very Large Data Bases,
   Framingham, Mass.

[7] McCormick B. H. and Wilensky J. (1975) Clinical knowledge
   acquisition: design of a relational data base in ophthalmology. To
   be published in Proceedings Second Annual Medical Information
   Systems Conference, Urbana, Ill.

[8] McCormick B. H., Goldberg M. F., and Read J. S. (1974) Clinical
   decision-making: design of a data base in ophthalmology.  Proceedings
   First Annual Medical Information Systems Conference, Urbana Ill.

[9] Michalski, Ryszard S.  (1975) On the selection of representative
   samples from large relational tables for inductive inference.   MISL
   internal report M.D.C. 1.1.9.

[10] Murata T. (1976) Equations governing the behavior of E-nets. MISL
   internal report.

[11] Murata T. (1975) State equation, controllability, and maximal
   matchings of Petri nets.  MISL report M.D.C. 1.1.10.

[12] Murata T. and Church R. W. (1975) Analysis of marked graphs and Petri
   nets by matrix equations.  MISL report M.D.C. 1.1.8.


[13] Vere S. A. (1975) Induction of concepts in the predicate calculus.
   Proceedings of the Fourth International Joint Conference on
   Artificial Intelligence, vol. 1, Tbilisi, U.S.S.R.

[14] Vere S. A. (1975) Relational production systems. MISL internal
   report M.D.C. 1.1.5.

E) Funding status

Year 02 -- 6/30/75 - 6/29/76 : $248,793.

Year 03 -- 6/30/76 - 6/29/77 : $228,000.

II) Interactions with the SUMEX-AIM Resource

A) & B) Collaboration, cross-fertilization

    Most of our interaction of late has involved the Glaucoma Network
fostered by the Rutgers Computers in Biomedicine group.   This network has
made it especially convenient for our expert in glaucoma, Dr. Jacob
Wilensky, to maintain close contact with investigators around the country.

   In addition, monitoring of SUMEX-AIM system messages has helped us
keep abreast of developments in other projects.  We have come to rely on
this facility as a vital source of up-to-date information.

C) Critique of resource services

    In our view SUMEX-AIM services are excellent.  We have been very
pleased with the prompt and personal attention given to our requests by
the resource staff.



IV.A.2.f    RUTGERS COMPUTERS IN BIOMEDICINE

Rutgers Research Resource
Computers in Biomedicine

   Dr. Saul Amarel
  Rutgers University
New Brunswick, New Jersey

(Grant NIH RR-00643-05, 3 years, $314,880 this year)

I.  PROJECT GOALS AND APPROACHES

   The fundamental objective of the Rutgers Resource is to develop a
computer based framework for significant research in the biomedical
sciences and for the application of research results to the solution of
important problems in health care.  The focal concept is to introduce
advanced methods of computer science - particularly in artificial
intelligence - into specific areas of biomedical inquiry.  The computer is
used as an integral part of the inquiry process, both for the development
and organization of knowledge in a domain and for its utilization in
problem solving and in processes of experimentation and theory formation.

   The Resource community includes 46 researchers - 26 members, 8
associates and 12 collaborators.  Members are mainly located at Rutgers.
Collaborators are located in several distant sites and they interact - via
SUMEX-AIM - with Resource members on a variety of projects, ranging from
system design/improvement to clinical data gathering and system testing.
At present,  collaborators are located at the Mt.  Sinai School of
Medicine, N.Y.; Washington University School of Medicine, St. Louis, MO.;
Johns Hopkins Medical Center, Baltimore, Md.; Illinois Eye and Ear
Infirmary, Chicago, Ill.; College of Medicine and Dentistry of New Jersey
(CMDNJ); and the University of Miami.

   Research in the Rutgers Resource is divided between "discipline-oriented"
projects in medicine and psychology and "core" projects in computer
science that are closely coupled with the "discipline-oriented" studies.
Work in the Resource is organized in three AREAS OF STUDY; in each area
there are several projects.  The areas of study and the senior
investigators in each of them are:

(1) Medical Modeling and Decision Making (C. Kulikowski, A. Safir).

(2) Modeling Belief Systems (C. F. Schmidt, N. S. Sridharan).

(3) Representations, Modeling and Hypothesis Formation in AI (S.
  Amarel).

    In addition the Rutgers Resource is sponsoring an Annual National
AIM Workshop, whose main objective is to strengthen interactions between
AIM activities, to disseminate research methodologies and results, and to
stimulate collaborations and imaginative resource sharing within the


127

framework of AIM.  The first AIM Workshop was held at Rutgers on June 14
to 17, 1975.  The Second Workshop is scheduled for June 1 to 4, 1976.

II.  AREAS OF STUDY;  SUMMARY OF GOALS AND ACTIVITIES

(1) Medical Modeling and Decision-Making

Present projects include:

(i) Development and clinical testing of the glaucoma consultation
program based on a causal-associational network (CASNET) model -
  as a collaborative project of the ophthalmological network,
  which has grown in the last year to include: Mt.  Sinai School
  of Medicine, Washington University, Johns Hopkins University,
  University of Illinois at Chicago, and the University of Miami.

(ii) Investigation of descriptive models of diseases based on a
  general semantic network representation, with associated
  strategies of diagnosis, prognosis, and therapy.  These models
  subsume a variety of sub-models useful in general consultation.
  A particular emphasis is placed on the analysis of the time
   course of disease, and inter-relationships between various
   disease sub-processes.

(iii) Development of a data base for clinical research associated with
   the consultation programs.  The results of the data analyses are to
   be used selectively in updating and improving the models of
   diseases.

(iv) Investigation of various techniques for acquiring knowledge from
   clinical experts.  Incorporation of alternative expert opinions
   within a model.

(v) In collaboration with the Mt.  Sinai Rutgers Health Care
  Computer Laboratory, we are developing models for refraction and
  neuro-ophthalmology.

The following is a summary of accomplishments in this area.

(a) The ophthalmological network (ONET) is actively underway, with
  consultation programs available through SUMEX-AIM to the five
  collaborating institutions.

(b) The consultation system has been perfected by adding many details
  of diagnosis, prognosis, and therapy; new observations and
  decision criteria have been specified as the result of suggestions
  by the ONET collaborators.

(c) A data base for storing cases and providing a chronological model
  based interpretation has been created.


(d) A set of programs to analyze the case histories has been
  developed.  They are currently being used in collaborative
  clinical studies by ONET members.   Selected results are to be
  incorporated into the glaucoma model.

(e) A semantic network based model for glaucoma has been implemented
  with expanded capabilities for explanation and a greater facility
  for being updated.

   The progress in extending the collaborative activities of the
ophthalmological network has been made possible by the facilities and
support provided by SUMEX-AIM.  The semantic network model is being
developed on INTERLISP at SUMEX-AIM.  Other program development activities
are evenly distributed between SUMEX-AIM and the Rutgers-10.

(2) Modeling Belief Systems

   The overall goal of this project is to develop a computer based
psychological model of how persons reason about the causes of human
action.  The common-sense notion of social causation which is used to
understand intentional actions has served as the focus of this effort.  A
principal goal of the project is the construction of a system (called
BELIEVER) to assist in the description of the psychological theory and to
facilitate the testing of the theory.  To date we have accomplished the
following:

(a) A working system has been developed that accepts the descriptions
  of Templates, Relations and their consistency conditions which are
  concepts developed in the Meta Description System (MDS) framework.

(b) We have described PLANS, ACTS, PERSONS and SUMMARIES as templates;
  we have defined the consistency conditions associated with the
  relations among these.

(c) Developed procedures in FUZZY for Instantiating Act Descriptions
  and calculating presuppositions needed to form coherent
  interpretations.

(d) We have provided for an easy-to-understand prompting scheme based
  on the concept of templates and consistency conditions.   The
  prompting proceeds using the template descriptions and attempts to
  fill in missing information by implication following based on the
  consistency conditions.

(e) Developed a framework for submitting and analyzing the semantics
  of experimental evidence when the subject responses are solely
  unstructured natural language text.
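
   The prompting scheme in (d) can be pictured with a minimal sketch.  It
is not the MDS or BELIEVER code: the function name, the slot names, and
the two consistency conditions below are hypothetical, chosen only to show
missing template slots being filled by following implications and the
remainder being left for prompting.

```python
def fill_template(slots, conditions, max_rounds=10):
    """Fill missing slots by repeatedly applying consistency conditions.

    slots: dict slot name -> value, with None marking missing information.
    conditions: list of functions; each takes the slot dict and returns a
        dict of implied slot values (possibly empty).
    Returns (filled, missing), where missing lists the slots that a
    prompting scheme would still have to ask the user about.
    """
    filled = dict(slots)
    for _ in range(max_rounds):
        changed = False
        for condition in conditions:
            for name, value in condition(filled).items():
                # Only fill gaps; never overwrite information already given.
                if filled.get(name) is None and value is not None:
                    filled[name] = value
                    changed = True
        if not changed:
            break
    missing = sorted(name for name, value in filled.items() if value is None)
    return filled, missing


# Hypothetical consistency conditions relating an ACT to its PLAN.
def agent_of_act_is_agent_of_plan(s):
    return {"agent": s["plan_agent"]} if s.get("plan_agent") else {}


def unlocking_implies_key(s):
    return {"instrument": "key"} if s.get("action") == "unlock" else {}


act = {"action": "unlock", "agent": None, "plan_agent": "John",
       "instrument": None, "goal": None}
filled, missing = fill_template(
    act, [agent_of_act_is_agent_of_plan, unlocking_implies_key])
```

   Here the agent and instrument are inferred from the conditions, and only
the goal remains to be prompted for.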

Our goals for the immediate future are:

(a) To continue to gather experimental data and follow the analysis to
  suggest hypotheses in the Believer Theory.


(b) Develop a strategy for Plan Recognition based on a Theory of Motivation
  and Consistency postulates of a Person's beliefs and knowledge.

(c) Continue to investigate the innovative uses of our descriptive
  methodology in empirical procedures.

   The development of MDS - which provides a framework for designing
the BELIEVER system - was carried out at SUMEX-AIM until October 1975;
subsequently, most of the work on MDS was shifted to ISID.  While early
versions of BELIEVER were developed at SUMEX-AIM, the last year saw a
shift in computing on this project from SUMEX-AIM to the Rutgers-10 -
since program development is being done on the FUZZY system which runs on
the Rutgers computer.

(3) Representations, Modeling and Hypothesis Formation in Artificial
  Intelligence

   A major part of our effort in this area is oriented to
collaborations with investigators in other Resource projects - involving
applications of AI ideas and programs and also identification and initial
exploration of new significant AI problems.  The collaboration in the
BELIEVER area has led to a close integration of the basic AI oriented
work (N. S. Sridharan) and the discipline oriented work (C. Schmidt) in
the area.  This work is reported under (2) above.

"Core" projects in the present area include:

(i) Development and applications of FUZZY (R. LeFaivre).  The FUZZY
  system was transferred from UNIVAC 1110 LISP to UCI LISP on the
  Rutgers-10.  A number of improvements were made, both to the UCI
  LISP system and to the FUZZY language, to make it both more efficient
  and more powerful.  These changes include a new prettyprint package
  for UCI LISP and functions for computing differences of associative
  nets and creating multiple contexts in FUZZY.  FUZZY is currently
  being used in the initial implementation phase of the BELIEVER
  system.  Applications to reasoning in medical diagnosis are being
  explored.

(ii) Applications of grammatical inference schemes to automatic
  adjustment of medical models on the basis of clinical data (A.
   Walker).

(a) An algorithm was found to eliminate loops from stochastic
   causal models.

(b) A grammar model for the flow of control in programs was
   formulated.

(c) A technique for progressively bounding a search space of
  stochastic grammars, in terms of grammars already found, was
  studied theoretically and tested in practice.

(d) The impact of this work on the automated construction of
  production-rule based AI systems is now under study.
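
   The report does not describe the loop-elimination algorithm mentioned
in (a).  As one familiar illustration only, a self-loop in a stochastic
transition model can be removed by renormalizing the remaining outgoing
probabilities.  The sketch below handles self-loops only, not longer
cycles, and is not claimed to be the algorithm referred to above; the
function name and the dictionary representation are assumptions.

```python
def remove_self_loops(model):
    """Return a copy of a stochastic transition model without self-loops.

    model: dict state -> dict successor -> probability (each row sums to 1).
    A state that returns to itself with probability p eventually leaves,
    so each remaining transition probability q becomes q / (1 - p).
    States that loop with probability 1 (absorbing states) are unchanged.
    """
    result = {}
    for state, row in model.items():
        p = row.get(state, 0.0)
        if p == 0.0 or p >= 1.0:
            result[state] = dict(row)
            continue
        result[state] = {s: q / (1.0 - p) for s, q in row.items() if s != state}
    return result


# Hypothetical three-state model: A loops on itself half the time.
model = {
    "A": {"A": 0.5, "B": 0.25, "C": 0.25},
    "B": {"C": 1.0},
    "C": {"C": 1.0},
}
loop_free = remove_self_loops(model)
```

   The probability of eventually reaching each successor of A is preserved,
while the representation no longer contains the loop on A.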


Computing in this project is done on the Rutgers-10.

(iii) Development of a grammatical inference system using a "developmental
   paradigm" (W. Fabens).  This is a hypothesis formation system which
   attempts to change a given context free grammar so as to accommodate
   new sentences that cannot be derived from the given grammar.   The
   system includes (a) a relaxation parser - which comes as close as it
   can to an interpretation of a given "deviant sentence", (b) a rule
   hypothesizer which uses such an interpretation to propose changes to
   the current grammar, (c) a rule coalescer which summarizes with as
   little loss or gain in generality as possible the newly hypothesized
   grammar.  We have developed programs in all of these areas and are
   currently composing them into a single system.  Computing in this
   project is done on the Rutgers-10.

(iv) Development and study of systems for theory formation in programming
   tasks (S.  Amarel).  A system is being developed for acquiring
   knowledge about a program formation task.  The system involves the
   generation, management and evaluation of programs at various stages
   of specification.  In this project, major emphasis is given to
   problems of representation and to the effect of shifts between
   representations.  Program development is being done on the
   Rutgers-10.

   While the first project discussed in this area is focusing on the
development of AI languages that can provide a stronger basis for system
development in the Resource, the last three projects are focusing on
different AI approaches to hypothesis (theory) formation - an area which
is essential to the automatic acquisition and improvement of a knowledge
base from experimental data.



III.   AIM WORKSHOP

   The first annual AIM Workshop was held June 14-17, 1975. The first
day was devoted to a General Session to provide an overview of current AIM
activities and a broad forum for discussion.  The following three days
were devoted to discussions in depth of AIM designs, and to demonstrations
of current systems.  Several AIM systems were effectively demonstrated on
SUMEX during the Workshop.

   The second annual AIM Workshop will be held June 1-4, 1976. The
entire four days will be devoted to lectures and panel discussions on
current projects in the AIM community, and on problems of knowledge
representation, reasoning, and AI system design.  Papers on language,
speech, vision, education and problem solving will summarize recent AI
approaches, while the role of biomathematical modeling and inference
methods will be the focus of summary papers and panel discussions.
Tutorials on languages and systems available to the AIM community will
also be presented.  The Workshop will conclude with a panel on the
dissemination of scientific information and computer networking.

   The SUMEX-AIM system is essential for the Workshop. Many of the AIM
programs will be running on SUMEX-AIM and accessed via TYMNET or ARPANET
from Rutgers.  The message facilities of SUMEX-AIM are most useful for
planning, communicating and setting up the information pool for the AIM
Workshop.

IV.  INTERACTIONS WITH THE SUMEX-AIM RESOURCE

   During the past year we have continued to use the SUMEX-AIM resource
for program development and testing, for communications between
collaborators distributed in different parts of the country and for
preparation and running of the AIM Workshop.

   Computing in the Rutgers Research Resource is distributed between
SUMEX-AIM and the Rutgers-10.  The relative utilization of SUMEX-AIM by
"local" Rutgers users has decreased this year.  The utilization of SUMEX-
AIM by our "remote" medical collaborators has been growing.  The total
amount of computing at SUMEX by our Resource users and by our
collaborators has decreased relative to the previous year.  One of the
reasons for this was the overloaded condition of SUMEX-AIM. Another
important related problem was the relatively poor quality of communication
facilities available to us via TYMNET.

   In order to provide a more reliable and convenient network
environment for our investigators and their collaborators and also for the
AIM Workshop, we have proceeded this year with the implementation of
several enhancements to the Rutgers-10.  These enhancements were planned
in consultation with AIM management, with the intention of bringing to the
AIM network complementary facilities and added capacity.  Two stages of
enhancement have been completed this year: (a) core memory and fixed head
disk were augmented and the TOPS 6.02 operating system was installed; (b)
a TYMCOM communications unit was installed, making the Rutgers-10
accessible via TYMNET - in time for support of the second AIM workshop.

   The SUMEX-AIM facility played a key role this year in consolidating
our network of collaborators in ophthalmology (ONET) and providing the
support needed for establishing a productive collaboration among the ONET
investigators.

   The SUMEX staff have continued to function as models of excellent
cooperation.  They have been very helpful and responsive in sharing
information and keeping us aware of developments, problems, new ideas,
etc.  SUMEX continues to be a good forum for communicating, linking and
talking with various investigators in our Resource as well as with others
in the AIM community.

   The AIM Workshops rely heavily on SUMEX-AIM.  In the first Workshop,
the demo sessions and the hands-on activities with remote systems were
found to be very effective in disseminating AIM methods and techniques to
a broad group of participants.  The significant role played in these demos
by the SUMEX staff, and by the SUMEX resource, cannot be overstated. In
our planning for the second AIM Workshop we are also placing strong
emphasis on on-line activities, more so considering the broader class of
participants who will meet this year.


   SUMEX-AIM has been most useful in communicating, planning and
helping to set up the information pool for this AIM Workshop.

   In conclusion, the SUMEX-AIM facility is continuing to be an
essential part of our research environment.  In view of our AIM Workshop
activities and the related enhancement of the Rutgers-10, we are moving to
a point where the interactions between the Rutgers project and SUMEX-AIM
are increasing in scope - as Rutgers is gradually adding to its "user"
role a "server" role for the national AIM project.

V.  LIST OF PROJECT PUBLICATIONS

[1] Amarel, S., and Kulikowski, C.  (1972) "Medical Decision Making and
   Computer Modeling", Proc.  of 5th International Conference on Systems
   Science, Honolulu, January 1972.

[2] Amarel, S.  (1974) "Computer-Based Modeling and Interpretation in
   Medicine and Psychology: The Rutgers Research Resource", Proc. on
   "Conference on the Computer as a Research Tool in the Life Sciences",
   June 1974, Aspen, by FASEB; also appears as Computers in Biomedicine
   TR-29, June 1974, Rutgers University.

[3] Amarel, S.  (1974) "Inference of Programs from Sample Computations",
   Proc.  of NATO Advanced Study Institute on Computer Oriented Learning
   Processes, 1974, Bonas, France.

[4] Bruce, B.  (1972) "A Model for Temporal Reference and its Application
   in a Question Answering Program", in "Artificial Intelligence", Vol.
   3, Spring 1972.

[5] Bruce, B.  (1973) "A Logic for Unknown Outcomes", Notre Dame Journal
   of Formal Logic; also appears as Computers in Biomedicine, TM-35,
   Nov.  1973, Rutgers University.

[6] Bruce, B.  (1973) "Case Structure Systems", Proc. 3rd International
   Joint Conference on Artificial Intelligence (IJCAI), August 1973.

[7] Bruce, B.  (1975) "Belief Systems and Language Understanding",
   Current Trends in the Language Sciences, Sedelow and Sedelow (eds.)
   Mouton, in press.

[8] Chokhani, S.  and Kulikowski, C.A.  (1973) "Process Control Model for
   the Regulation of Intraocular Pressure and Glaucoma", Proc.  IEEE
   Systems, Man & Cybernetics Conf., Boston, November 1973.

[9] Chokhani, S.  (1975) "On the Interpretation of Biomathematical Models
   Within a Class of Decision-Making Procedures", Ph.D. Thesis, Rutgers
   University; also Computers in Biomedicine TR-43, May, 1973.

[10] Fabens, W.  (1972) "PEDAGLOT.  A Teaching Learning System for
   Programming Language", Proc.  ACM Sigplan Symposium on Pedagogic
   Languages, January 1972.


[11] Kulikowski, C.A. and Weiss, S.  (1972) "Strategies for Data Base
  Utilization in Sequential Pattern Recognition", Proc.  IEEE Conf. on
   Decision and Control, Symp.  on Adaptive Processes, December 1972.

[12] Kulikowski, C.A. and Weiss, S.  (1973) "An Interactive Facility for
   the Inferential Modeling of Disease", Proc.  7th Annual Princeton
   Conf.  on Information Sciences and Systems, March 1973.

[13] Kulikowski, C.A.  (1973) "Theory Formation in Medicine: A Network
   Structure for Inference", Proc.  International Conference on Systems
   Science, January 1973.

[14] Kulikowski, C.A., Weiss, S., and Safir, A. (1973) "Glaucoma
   Diagnosis and Therapy by Computer", Proc.  Annual Meeting of the
  Association for Research in Vision and Ophthalmology, May 1973.

[15] Kulikowski, C.A.  (1973) "Medical Decision-Making and the Modeling
  of Disease", Proc. First Intern. Conf. on Pattern Recognition,
   October, 1973.

[16] Kulikowski, C.A.  (1974) "Computer-Based Medical Consultation -- A
   Representation of Treatment Strategies", Proc.  Hawaii International
   Conf.  on Systems Science, Jan.  1974.

[17] Kulikowski, C.A.  (1974) "A System for Computer-Based Medical
   Consultation", Proc.  National Computer Conference, Chicago, May
   1974.

[18] Kulikowski, C.A.  and Safir, A.  (1975) "Computer-Based Systems
  Vision Care", Proceedings IEEE Intercon, April 1975.

[19] Kulikowski, C.  and Trigoboff, M.  (1975a) "A Multiple Hypothesis
  Selection System for Medical Decision-Making", Proc.  8th Hawaii
   International Conference on Systems Sciences, January 1975.

[20] Kulikowski, C. & N.S. Sridharan, "Report on the First Annual AIM
   Workshop on Artificial Intelligence in Medicine", Sigart Newsletter
   No.  55, December 1975.

[21] Kulikowski, C., "Computer-Based Consultation Systems as a Teaching
   Tool in Higher Education", 3rd Annual N.J.  Conference on the use of
  Computers in Higher Education, March 1976.

[22] Kulikowski, C., Weiss S., Safir, A.  et al "Glaucoma Diagnosis &
  Therapy by Computer: A Collaborative Network Approach" Proc. of
  ARVO, April 1976.

[23] Kulikowski, C., Weiss, S., Trigoboff, M., Safir, A., "Clinical
  Consultation and the Representation of Disease Processes", Some AI
  Approaches, AISB Conference, Edinburgh, July 1976.

[24] LeFaivre, R., "Procedural Representation in a Fuzzy Problem-Solving
   System", Proc.  Natl.  Computer Conf., New York, June 1976.


[25] LeFaivre, R. and Walker, A.  "Rutgers Research Resource on Computers
   in Biomedicine, H", Sigart Newsletter No.  54, October 1975.

[26] Mauriello, D.  (1974) "Simulation of Interaction Between Populations
   in Freshwater Phytoplankton", Ph.D.  Thesis, Rutgers University,
   1974.

[27] Schmidt, C.  (1972) "A comparison of source unidimensional,
   multidimensional and set theoretic models for the prediction of
   judgements of trait implication", Proc. Eastern Psych. Assoc.
   Meeting, Boston, April 1972.

[28] Schmidt, C.F. and D'Addamio, J.  (1973) "A Model of the Common Sense
  Theory of Intention and Personal Causation", Proc.  of the 3rd IJCAI,
   August 1973.

[29] Schmidt, C.F. and Sedlak, A.  (1973) "An Understanding of Social
   Episodes", Proc.  of Symposium on Social Understanding in Children
  and Adults: Perspectives in Social Cognition, American Psych.   Assoc.
   Convention, Montreal, August 1973.

[30] Schmidt, C.F., Sridharan, N.S., & Goodson, J.L. Recognizing plans
   and summarizing actions.  Proceedings of the Artificial Intelligence
  and Simulation of Behavior Conference, University of Edinburgh,
   Scotland.  July 1976.

[31] Schmidt, C.  Understanding human action: Recognizing the plans and
   motives of other persons.  In (eds. J.  Carroll and J. Payne)
   Cognition and Social Behavior.  Potomac, Maryland:

[32] Schmidt, C.F.  (1975) "Understanding Human Action", Proc.
   Theoretical Issues in Natural Language Processing: An
  Interdisciplinary Workshop in Computational Linguistics, Psychology,
  Artificial Intelligence, Cambridge, Mass., June 1975.  Also appears
   as Computers in Biomedicine, TM-47, June 1975, Rutgers University.

[33] Schmidt, C.F.  (1975) "Understanding Human Action: Recognizing the
  Motives", Cognition and Social Behavior, J.S.  Carroll and J. Payne
  (eds.), New York: Lawrence Erlbaum Associates, in press. Also
  appears as Computers in Biomedicine, TR-45, June 1975, Rutgers
   University.

[34] Sedlak, A.J.  (1974) "An Investigation of the Development of the
  Child's Understanding and Evaluation of the Actions of Others", Ph.D.
   Thesis, Rutgers University.

[35] Sridharan, N.S.,  "The Frame and Focus Problems in AI: Discussion in
   relation to the BELIEVER System".  Proceedings of the Conference on
  Artificial Intelligence & the Simulation of Human Behavior,
  Edinburgh, July 1976.

[36] Srinivasan, C.V.  (1973) "The Architecture of a Coherent Information
   System: A General Problem Solving System", Proc.  of the 3rd IJCAI,
   August 1973.



[37] Tucker, S.S.  (1974) "Cobalt Kinetics in Aquatic Microcosms", Ph.D.
  Thesis, Rutgers University.

[38] Vichnevetsky, R.  (1973) "Physical Criteria in the Evaluation of
  Computer Methods for Partial Differential Equations", Proc.  7th
  International AICA Congress, Prague, Sept.  1973; reprinted in Proc.
   of AICA, Vol.  XVI, No.  1, Jan.  1974, European Academic Press,
   Brussels, Belgium.

[39] Vichnevetsky, R., Tu, K.W., Steen, J.A. (1974), "Quantitative Error
  Analysis of Numerical Methods for Partial Differential Equations",
    Proc.  Eighth Annual Princeton Conference on Information Science and
  Systems, Princeton University, March 1974.

[40] Weiss, S.  (1974) "A System for Model-Based Computer-Aided Diagnosis
   and Therapy", Parts I and II, Ph.D.  Thesis, Rutgers University; also
  Computers in Biomedicine TR-27, Feb.  1974.



IV.B    INFORMAL PROJECTS

   The following is a summary of the various "pilot" projects which
have been admitted to SUMEX on a temporary basis pending development of a
formal proposal.  Many of these projects reflect initial efforts at
formalizing analyses of experimental situations in preparation for the
development of DENDRAL-like heuristic inference generation and modeling.

IV.B.1    STANFORD PILOT PROJECTS

IV.B.1.a    AI IN MOLECULAR GENETICS - MOLGEN

THE MOLGEN SYSTEM FOR EXPERIMENTAL MOLECULAR GENETICS

    Prof. J. Lederberg
Stanford Department of Genetics

   The MOLGEN system is designed to aid the experimental molecular
geneticist in many important phases of laboratory investigation.  It will
be composed of three major interacting parts: an experiment planning
system, an enzymatic action simulation program, and a collection of
knowledge bases containing the rules and heuristics of molecular genetics.

   The experiment planning program will collect information about a
problem from the user, select an appropriate methodology for solution
(information retrieval, simulation, hierarchical planning, or some
combination thereof) and then work interactively with him to solve the
problem.  Some examples of the range of problems MOLGEN's experiment
planner will deal with are:

1. The user wishes to know which enzymes will function under a given pH
   or salt concentration -- a straight information retrieval problem.

2. The user wants an accurate prediction of the ratio of linear to
   circular DNA after application of ligase to a given starting
   concentration of "sticky-ended" DNA -- a problem best handled by a
   discrete simulation.

3. The user wants a verification that a proposed experiment will
   produce something like a desired result -- probably a combination of
   retrieval and simulation.

4. The user wants a plan to synthesize and then analyze a new DNA
   structure -- a deep problem involving hierarchical planning methods
   making full use of all program facilities.

   The simulation program will provide detailed modeling of the action
of enzymes on nucleic acid structures.  The program has been shown to
produce accurate and reproducible results on several diverse structures
for simple ligation, and is being extended to other common enzymatic
actions (exo and endo-nucleases, polymerase, etc.).
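
   As a rough illustration of what a discrete simulation of ligation
might look like, the sketch below treats each DNA fragment as a linear
molecule with two sticky ends and repeatedly joins two randomly chosen
free ends.  If both ends belong to the same molecule it closes into a
circle; otherwise two linear molecules merge.  This is a toy model and
not the MOLGEN simulation program: the function name, the uniform choice
of ends, and the fixed number of ligation events are all assumptions, and
the concentration and length effects that a realistic model would need
are ignored.

```python
import random

def simulate_ligation(n_fragments, n_events, seed=0):
    """Toy discrete ligation model (illustrative only, not MOLGEN's).

    Each linear molecule is represented by the number of original
    fragments it contains and has exactly two free sticky ends.  Each
    ligation event picks two distinct free ends uniformly at random and
    joins them.  Returns (linear_sizes, circular_sizes).
    """
    rng = random.Random(seed)
    linear = [1] * n_fragments      # sizes of linear molecules
    circular = []                   # sizes of closed circles
    for _ in range(n_events):
        if not linear:
            break                   # no free ends remain
        n_ends = 2 * len(linear)
        a, b = rng.sample(range(n_ends), 2)   # two distinct free ends
        mol_a, mol_b = a // 2, b // 2
        if mol_a == mol_b:
            # Both ends on one molecule: it closes into a circle.
            circular.append(linear.pop(mol_a))
        else:
            # Ends on different molecules: merge into a longer linear one.
            lo, hi = sorted((mol_a, mol_b))
            merged = linear[lo] + linear[hi]
            linear.pop(hi)
            linear[lo] = merged
    return linear, circular

linear, circular = simulate_ligation(n_fragments=200, n_events=150, seed=1)
print(len(linear), "linear,", len(circular), "circular")
```

Every event removes exactly one linear molecule, and the total number of
fragments is conserved, which gives two simple checks on any run.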

   The knowledge bases will be composed of collections of the rules and
heuristics used by geneticists, as well as facts about enzymes,
experimental methods, and physical processes like de/renaturation.  They
will be designed to allow access in retrieval, simulation, and planning
modes, so the major problem lies in representation of diverse types of
knowledge in a common, consistent fashion.

   Along with the major system components discussed above, certain
themes remain dominant in all phases of system design.  Primary
consideration is given to making the system an easy and natural tool for
the molecular geneticist to use.  Nucleic acid structure entry, editing,
and display is by way of an interactive, user-oriented program.
Explanation facilities (in the manner of the MYCIN system) will be
provided whenever possible, and all knowledge bases made easily extendable
and modifiable.  We consider the trust and cooperation of the expert user
vital to continued system development, and believe the best way to
secure this cooperation is to make the system immediately useful and
intelligible to geneticists.



IV.B.1.b    BAYLOR-METHODIST CEREBROVASCULAR PROJECT

BAYLOR-METHODIST CENTER FOR CEREBROVASCULAR RESEARCH
       DATA SERVICES RESEARCH LABORATORY

John L. Gedye, M.D.
Department of Neurology
Baylor College of Medicine
    Houston, Texas

   During the year the data services research laboratory has had a
total of about 3,000 hours of man-effort available, of which about 50% has
been devoted to implementation of the local facilities described below,
and a further 5% has been devoted to the SUMEX pilot study.

A) GENERAL GOALS

   Clinical research in neurology, as exemplified by the program of the
Baylor-Methodist Center for Cerebrovascular Research, creates a large
number of data handling problems of a wide variety of types.  The Data
Services Research Laboratory seeks to support the program of the center by
developing and making available a comprehensive range of data acquisition,
storage, processing and display techniques for the center's investigative
laboratories.  These techniques are being designed to facilitate the
systematic study of inter-relationships between the different types of
data gathered from the various cerebrovascular disease patient groups
being studied by the center.

Technical Resources

   At the beginning of last December the Data Services Research
Laboratory accepted delivery of a Digital Equipment Corporation (DEC)
PDP-11/35 computer configuration with 32K core, 2 RK05 disks, a TS03 magnetic
tape unit, a Terminet 1200 printer acting as console and hard-copy device,
and 2 modified Hazeltine 1200 video display terminals for general
interactive use. The system has incoming (1) and outgoing (1) 300 baud
modem interfaces to the public switched network, the latter incorporating a
Bell 801~ autodialler.  This configuration supports time-shared services,
both interactive and batch, based on a single user-language (this is an
extended BASIC called BASYS - the system is currently operating under the
commercially supported version of BASYS V3P, known as AIMS V3P - for
details see the latest edition of the AIMS-11 programming manual, November
1975, obtainable from ARBAT Systems Limited, 61 Broadway, New York, N.Y.
10006).

Access to SUMEX

   This has been by means of TYMNET, which we access through one of the
Houston TYMSAT's.  At the beginning of the project we used a 300 baud TI
Silent 700 terminal in the normal manner, but since the installation of
our local PDP-11/35 configuration in December, we have taken advantage of
our autodial facilities and the supporting facilities provided in BASYS
V3P and have used our BAYSYS terminals for this purpose.   As a result of
this experience we are now considering implementing software which will
allow an easier interface between our local system and the resources of
SUMEX.  We have in mind an ability to create files on our system and pass
them to SUMEX and to pick up files from SUMEX and store them locally.   It
is felt that implementing such facilities will greatly facilitate
interaction with the SUMEX resource and will lead naturally to the
procedures needed to support our AI research.

B) MEDICAL RELEVANCE

   The system designed and implemented for the Center for
Cerebrovascular Research (BAYSYS - not to be confused with BASYS) allows:

1) Maintenance of an immediately accessible, up-to-date, cross-
referenced directory of all patients who have, at any time, come
under the care or investigation of members of the clinical staff of
the Department of Neurology, together with a record for each showing
what data has been gathered and where it is located.  On March 31,
1976 the directory contained 570 entries, and experience to date
suggests that in order to keep up with the patient throughput of the
center, new names will be added at a rate of about 100/month.  As
presently configured the system has a directory capacity of 6,000.

2) Storage in a readily accessible, computer-compatible form, of all
data gathered on patients which may be relevant to the current and
  future research interests of the center.  Investigatory data is
regularly archived on industry compatible magnetic tape in a format
which allows subsequent collation using standard sort and merge
  software.
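
   The directory figures quoted in item 1 imply a finite time before the
configured capacity is reached.  The following back-of-the-envelope
projection assumes that growth stays at a constant 100 entries per month;
the report gives this only as an approximate rate, and the function name
is ours.

```python
def months_until_full(current_entries, capacity, new_per_month):
    """Months of headroom left, assuming a constant rate of new entries."""
    return (capacity - current_entries) / new_per_month

# Figures from the text: 570 entries on March 31, 1976; capacity 6,000;
# roughly 100 new names per month.
months = months_until_full(570, 6000, 100)
print(f"{months:.1f} months (about {months / 12:.1f} years)")
```

Under that assumption the directory would fill in roughly 54 months, or
about four and a half years.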

   The work of the Data Services Research Laboratory is organised
around the assumption that the research activities of the center can, for
all practical purposes, be regarded as a set of inter-related projects,
each of which includes planned data acquisition by one or more of the
investigative laboratories of the center in accordance with a
predetermined schedule.

   As a result of providing these primary data gathering services to
the investigatory laboratories of the center, the Data Services Research
Laboratory will acquire access to a reliable data base covering, in
principle, the entire range of activities of the center, and this will
allow a range of secondary data handling activities to be undertaken on
behalf of the center.

   It appears that the main technical problem that will have to be
solved before it is possible to keep up with the potential flow of data is
the design and implementation of a suitable range of data input procedures
to cope with the wide variety of data types.  It is hoped that the new
hand held OCR wand recently developed by Recognition Equipment, Inc. of
Dallas, Texas will allow us to develop a suitably flexible data entry work
station for our purposes.

C) PILOT STUDY

   The aim of this study has been to formulate a project relevant to
the activities of the center which will provide an acceptable and
legitimate "point of entry" for artificial intelligence research, and
which will allow the systematic formulation of objectives for the future.
We are, at the present time, focussing on situations in which a researcher
working in the Center for Cerebrovascular Research is required to respond
to information from a new source and in some way incorporate it into his
understanding of a class of clinical situations.

Background

    A continual source of background guidance for our work has been the
writings of Stephen Toulmin, particularly his book "Human Understanding"
(the first volume of which appeared in 1972) in which he develops an
evolutionary approach to the subject in terms of a "populational" account
of conceptual change in intellectual disciplines.  From the standpoint of
Toulmin's approach "men demonstrate their rationality not by ordering
their concepts and beliefs in tidy formal structures, but by their
preparedness to respond to novel situations with open minds -
acknowledging the shortcomings of their former procedures and moving
beyond them".  The emphasis in his approach to rationality is thus on
"change", on the circumstances under which and the means by which men
change their concepts and beliefs.  Our work to date can be thought of as
an attempt to model this approach to the growth of human knowledge in a
specific situation - assimilating the results of the new 133Xe inhalation
regional cerebral blood flow assessment technique.  The approach requires
at least 3 levels of activity:

1. Choice of goals of rational enterprise

2. Development of concepts

3. Formulation of arguments

    The basic approach has been to design a system which will, when
provided with a set of descriptions of paradigmatic patients
representing two clinical conditions, automatically formulate an optimal
algorithm (or in other words a set of decision rules which makes the best
use of the available information) for discriminating between those two
conditions, and which can then be used on a new set of patients for
various purposes.  The approach has been tested on regional cerebral blood
flow data: an algorithm was developed from a representative set of 32
patients who acted as paradigms, and was then checked on a similar set of
32 patients.  A diagnostic success rate of 90% was obtained in relation to
the request "Tell me, on the basis of regional cerebral blood flow
measurements alone, whether this individual is normal or abnormal".
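
    The report does not say how the decision rules are derived, so the
following is only a sketch of the general shape of the procedure: fit a
rule on one set of paradigm patients, then check it on a separate set.
A single-threshold rule (a "decision stump") stands in for the unnamed
algorithm, and the patient data are synthetic numbers invented for the
example.

```python
import random

def classify(rule, flows):
    """Apply a (region, threshold, below_is_abnormal) rule to one patient."""
    region, threshold, below_is_abnormal = rule
    return (flows[region] < threshold) == below_is_abnormal

def accuracy(rule, samples):
    hits = sum(classify(rule, flows) == label for flows, label in samples)
    return hits / len(samples)

def fit_threshold_rule(samples):
    """Pick the rule with the best accuracy on the paradigm set.

    `samples` is a list of (flows, label) pairs, where `flows` holds one
    flow value per brain region and `label` is True for abnormal.
    """
    n_regions = len(samples[0][0])
    best = None
    for region in range(n_regions):
        values = sorted({flows[region] for flows, _ in samples})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for below_is_abnormal in (True, False):
                rule = (region, threshold, below_is_abnormal)
                acc = accuracy(rule, samples)
                if best is None or acc > best[0]:
                    best = (acc, rule)
    return best[1]

def synthetic_patients(n, rng):
    """Made-up data: abnormal patients get lower flow in region 0."""
    patients = []
    for i in range(n):
        abnormal = i % 2 == 0
        flows = [rng.gauss(40 if abnormal else 55, 6), rng.gauss(50, 6)]
        patients.append((flows, abnormal))
    return patients

rng = random.Random(0)
paradigms = synthetic_patients(32, rng)   # used to derive the rule
new_set = synthetic_patients(32, rng)     # held out for checking
rule = fit_threshold_rule(paradigms)
print("rule:", rule, "held-out accuracy:", accuracy(rule, new_set))
```

Keeping the checking set separate from the paradigm set is the point of
the design: the success rate then measures how well the rule carries
over to patients it was not derived from.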



   The approach appears to have a wide range of applications in the
context of the work of the center and effort is currently being
concentrated on applying the technique to the systematic exploration of
regional cerebral blood flow differences in relation to such contrasts as
"male/female", "left-hemisphere/right hemisphere" and so on.   The
technique can be applied to data as it accumulates, thus allowing the
detection of trends at an early stage of the research.

   It is inappropriate to go into details of the approach here, but
its essential feature is that it permits revision of both conceptual
boundaries (such as what is meant by "high" as opposed to "low" flow in a
particular brain region) and of arguments expressed in terms of a given
set of concepts (such as: "high" flow in region x, together with "low"
flow in region y, and "high" flow in region z implies multi-infarct
dementia as opposed to Alzheimer's dementia) as a result of the
acquisition of new data.

II) INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A) Collaborations Through the Network

   We have not yet reached a stage at which we are able to support
regular collaboration through the network.  We hope that this will develop
naturally as soon as we have been able to develop an interface between our
local PDP-11/35 system and SUMEX so that we can handle interaction as a
natural part of our day to day activities.  Dr. David Bowen is planning to
cooperate with us during the summer from London over ARPANET, and we
intend to work together on his neurochemical data.

B) Contacts and Cross Fertilisations

   The Rutgers workshop was the most valuable feature of SUMEX-AIM
participation during the year, providing an opportunity to get an overview
of work in progress and to see directions in which our work here could
develop to complement what was being done elsewhere.  On the clinical
application side the "INTERNIST (DIALOG)" project was the most
stimulating, as it demonstrated the challenges that would have to be met
by any practically useful approach, for example: the ability to handle
multiple problems presenting together.

    At a more fundamental level  "DENDRAL" confirmed the value of the
approach to which we are committed - trying to model the research process
itself.  As a result of the Rutgers workshop, a working relationship has
been established with Drs. Lindberg and Blackwell of the University of
Missouri and there have been reciprocal visits between our respective
locations.  On my last visit to Columbia, I gave an invited paper called
"A Jurisprudential Approach to Artificial Intelligence" and have since
been invited to write this up for "Biosciences Communications".

   Dr. Lindberg's work on the encipherment of electrolyte patterns has
proved to be a useful stimulus to our own work on the encipherment of
neurological data, and a suggestion from myself that it might be valuable
to look at electrolyte pattern transitions has been taken up and developed
as an application of the theory of finite state machines.

C) Critique of Resource Service

   Our use of the computational resources of SUMEX has so far been
largely confined to experimentation with the various modules of the system
described above.  This was particularly valuable before our own resources
became available.  We now see ourselves moving to a new mode of operation
in which we try to find out which things are best done on SUMEX and which
locally.  Our main criticism to date has been slowness of response during
peak hours, when, unfortunately, we have sometimes had to try to operate
because of constraints on manpower availability.

FUNDING STATUS

   Work is currently being supported by departmental funds. However,
we have recently received unofficial notification from NIH that funds have
been approved for the support of the Data Services Research Laboratory in
the center grant renewal effective February 1, 1977.  Approval is for one
year in the first instance with support for a further two years subject to
satisfactory administrative arrangements.



IV.B.1.c    AUTOMATIC LV MODELING

Automatic Radiographic Image Analysis by LV Modeling

    Donald C. Harrison, Professor of Medicine
Edwin L. Alderman, Assistant Professor of Medicine
Lynn Quam, Ph.D., Research Associate in Computer Science

Stanford University

   A proposal to carry out this research has been submitted to the NIH.
Medical applications of computer image processing were part of the original
collaborative research goals of SUMEX. This has been supported as a pilot
project to facilitate the development of independent grant support.

   The proposal is to use the facilities of SUMEX-AIM to develop a
mini-computer system for the automatic analysis of left ventricular
angiography in a clinical cardiac catheterization laboratory setting. This
system will be designed to 1) provide frame by frame quantitative volume
measurements, 2) analyze wall motion abnormalities, and 3) generate new
information about left ventricular function.  In conjunction with the
SUMEX systems staff, Dr.  Lynn Quam has done the initial development work
on an interactive graphics and image display system as summarized below.

SUMMARY:

   A general purpose hardware and software system for interactive
graphics and grey-level image display has been developed on the SUMEX
Tenex system, using a Tektronix 611 storage scope controlled by a
PDP-11/10 processor.  The system is capable of producing limited displays
dynamically using the non-storage mode of the 611, whereas complex
displays require the use of storage mode or photography.

   A general purpose graphics package has been developed, which is
essentially compatible with the graphics software at the Stanford A-I Lab.
Consequently, with minor revisions, many of the graphics programs written
at the A-I lab can be used at SUMEX.

   Many of the image processing algorithms originally developed at the
A-I lab by L. H. Quam have been revised to run in the Tenex environment.

   The combined effect of the graphics and image display hardware and
software is the capability for SUMEX users to perform a wide variety of
image enhancement operations on grey-level images, and to display both
line drawing graphics and grey-level images.

PURPOSE:

   The primary purpose for developing the graphics and image display
system was to support the needs of an NHLI proposal in the division of
Cardiology.  The proposed research was to develop algorithms to automate
the procedure for outlining ventricular margins in angiograms, for the
purpose of cardiac dynamic performance evaluation.  The images are
obtained by passing x-rays thru the patient who has a catheter placed in
the heart.  The x-ray target is viewed by both a cine film camera and a
vidicon.  The vidicon output is both viewed directly, and recorded on a
video disc.  Several cardiac cycles are recorded on the disc, then a
radio-opaque dye is injected into the heart using the catheter, and
several (at least 3) more cycles are recorded.  This procedure produces
about 150 images which must be analyzed.

   In the normal clinical operation, a technician manually traces the
ventricular margins using a light pen which is connected to a mini-
computer which computes the desired performance measurements and produces
hard copy output.  The manual tracing is quite tedious and slow.

   The primary difficulty with automating ventricular margin outlining
is that during part of the cardiac cycle the margin is of very low
contrast (poor signal to noise ratio), making it impossible to detect
without taking adjacent (in time) images into consideration.   In order to
develop techniques for automatic margin definition, it was necessary to
have hardware and software for image and graphics display.

HARDWARE:

   The hardware consists of four component parts: a Tektronix 611
scope, a display controller, a PDP-11/10, and a PDP-10 to PDP-11
communication interface.

Tektronix 611 Scope:

   A Tektronix 611 storage oscilloscope is used to generate the images.
Briefly, the 611 scope has high resolution (about 100 points to the inch
on a 7 by 9 inch screen), and storage and non-storage modes.  Using
non-storage mode increases the resolution by about a factor of 2, and allows
the display of grey-level images.  Unfortunately, the 611's deflection
system is too slow for direct viewing of very large images without an
intolerable flicker.  For large grey-level images one of two approaches
must be used: photographic recording of the 611 screen, or halftone grey-
level simulation using storage mode.
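
   The report does not say which halftone algorithm is used.  As one
common way of simulating grey levels on a display whose dots are either
on or off, the sketch below applies ordered dithering with the standard
4 x 4 Bayer matrix.  The choice of technique, the function name, and the
0..255 grey scale are assumptions made for illustration.

```python
# 4x4 Bayer index matrix (values 0..15), a standard ordered-dither pattern.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def halftone(image, max_level=255):
    """Convert a grey-level image (list of rows of ints in 0..max_level)
    into a bi-level image of 0/1 dots using ordered dithering."""
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, level in enumerate(row):
            # Scale the matrix entry to a threshold inside (0, max_level).
            threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) * max_level / 16
            out_row.append(1 if level > threshold else 0)
        out.append(out_row)
    return out

# A flat mid-grey patch should light about half of the dots.
patch = [[128] * 8 for _ in range(8)]
dots = halftone(patch)
print(sum(map(sum, dots)), "of", 64, "dots lit")
```

Because each dot is written once and then held by the storage tube, this
approach trades grey-level fidelity for a flicker-free display.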

Display Controller:

   The X, Y and Z axis signals to the 611 scope are generated by a
display controller designed at SUMEX.  Basically, the X and Y deflection
signals are generated by two 12-bit digital to analog converters which are
driven by X and Y position registers in the display controller.   The Z-
axis signal is controlled by a digital level which turns the 611 beam on
for a time proportional to the binary number in the Z axis register.



   The display controller is capable of two primary modes of operation:
vector and raster.  Vectors are generated as a sequence of discrete
points.  Vectors are specified to the hardware by the starting X, Y
location, the DX, DY distance between the discrete points of the vector,
and N, the number of points in the vector.  Grey-level rasters are
generated as a sequence of discrete points (pixels) each of which can have
an arbitrary intensity.  To the controller, a single raster line is
specified exactly the same as a vector with the addition of N 8-bit bytes
of Z-axis intensity information.
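
   The vector and raster specifications described above can be sketched
as follows.  The function names are ours, and the sketch assumes that the
first point is plotted at the starting location itself; the report does
not state this explicitly.

```python
def vector_points(x, y, dx, dy, n):
    """Expand a vector specification into its n discrete beam positions."""
    return [(x + i * dx, y + i * dy) for i in range(n)]

def raster_line(x, y, dx, dy, intensities):
    """A raster line is a vector plus one 8-bit intensity per point."""
    if any(not 0 <= z <= 255 for z in intensities):
        raise ValueError("intensities must fit in 8 bits")
    points = vector_points(x, y, dx, dy, len(intensities))
    return [(px, py, z) for (px, py), z in zip(points, intensities)]

print(vector_points(100, 200, 2, 1, 4))
print(raster_line(0, 50, 1, 0, [0, 64, 128, 255]))
```

Sharing one specification between vectors and raster lines means that a
raster need not be horizontal: any DX, DY step can carry intensity data.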

The display controller is connected to a PDP-11 Unibus.

PDP-11/10:

   A PDP-11/10 minicomputer is used to control the display.  For images
in non-storage mode, the PDP-11 dynamically refreshes the screen at about
3 microseconds per point (depending on the brightness: 3 microseconds is
the minimum time).  For halftone grey-level simulation, the PDP-11
executes the halftone algorithm.
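
   The refresh figure above explains why only limited displays can be
shown dynamically.  The arithmetic below uses the 3 microsecond minimum
point time from the text; the 30 Hz refresh rate taken as the flicker
threshold is our assumption and does not come from the report.  The
full-screen point count uses the quoted figure of about 100 points to
the inch on a 7 by 9 inch screen.

```python
def max_flicker_free_points(seconds_per_point, refresh_hz):
    """Largest number of points that can be redrawn once per refresh period."""
    return int((1.0 / refresh_hz) / seconds_per_point)

MIN_POINT_TIME = 3e-6   # from the text: 3 microseconds minimum per point
REFRESH_HZ = 30         # assumed flicker threshold, not from the report

budget = max_flicker_free_points(MIN_POINT_TIME, REFRESH_HZ)
print(budget, "points per refresh")

# Points on a full screen at about 100 points per inch, 7 x 9 inches.
full_screen = (7 * 100) * (9 * 100)
print(full_screen, "points on a full screen")
print(full_screen * MIN_POINT_TIME, "seconds for one full-screen pass")
```

Under these assumptions the budget is roughly 11,000 points per refresh,
whereas a full screen holds 630,000 points and would take nearly two
seconds per pass, which is consistent with the need for photography or
storage mode for large images.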

PDP-10 to PDP-11 Interface:

   A general purpose communication interface connects the PDP-10 to the
PDP-11.  This hardware consists of two 32-bit registers, one for each
direction of data transfer, 2 status registers, and 2 control registers.
Using this interface, data can potentially be transferred between the
PDP-10 and the PDP-11 at about 20 microseconds per word.

SOFTWARE:

   The software to utilize the display hardware consists of many
modules, some of which execute on the PDP-10 and others on the PDP-11.

Communication Module:

   Communication between the PDP-10 and the PDP-11 is accomplished by
transferring blocks of data thru the communication interface which is
controlled by programs running in each machine.  The basic operations are:

a. Load a program in the PDP-11
b. Send a block of data to the PDP-11
c. Get a block of data from the PDP-11
d. Start a program running in the PDP-11
e. Stop the program running in the PDP-11

. . .  and a few other operations
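
   The report names the basic operations but not their encoding.  Purely
as an illustration of how such block-oriented requests could be framed,
the sketch below uses invented numeric codes and an invented block layout
(opcode, address, word count, data words); none of these details come
from the actual SUMEX software.

```python
from enum import Enum

class Op(Enum):
    # Operation names follow the list above; the numeric codes are invented.
    LOAD_PROGRAM = 1
    SEND_BLOCK = 2
    GET_BLOCK = 3
    START_PROGRAM = 4
    STOP_PROGRAM = 5

def encode_block(op, address, words):
    """Frame a request as [opcode, address, word count, *data words]."""
    return [op.value, address, len(words), *words]

def decode_block(block):
    """Inverse of encode_block; checks that the word count matches."""
    opcode, address, count = block[:3]
    words = block[3:]
    if len(words) != count:
        raise ValueError("word count does not match block length")
    return Op(opcode), address, words

msg = encode_block(Op.SEND_BLOCK, 0o1000, [1, 2, 3])
print(decode_block(msg))
```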

PDP-11 Display Module:



   The display module interprets blocks of data sent thru the
communication interface as commands to the display.  The basic display
commands are:

a. move the beam to position X,Y
b. draw a line consisting of N points, incrementing the
     beam position by DX,DY between each point.
c. generate a grey-level raster
d. generate a halftone raster
e. set beam brightness
f. display a string of text
g. display subroutine call
h. display subroutine return

   From these primitive commands all higher level display functions are
built.
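
   To show how a few of these primitives might combine, the sketch below
interprets a small display list containing move, line, and subroutine
call and return commands.  The tuple encoding of the commands and the
convention that a line plots a point before each step are assumptions;
the real command format is not given in the report.

```python
def run_display_list(commands, subroutines=None):
    """Interpret a list of display commands and return the points plotted.

    Supported commands (a subset of the list above, with invented encodings):
      ("move", x, y)          set the beam position
      ("line", n, dx, dy)     plot n points, stepping by (dx, dy)
      ("call", name)          run a named display subroutine
      ("return",)             leave the current subroutine
    """
    subroutines = subroutines or {}
    points = []
    beam = [0, 0]

    def execute(cmds):
        for cmd in cmds:
            kind = cmd[0]
            if kind == "move":
                beam[0], beam[1] = cmd[1], cmd[2]
            elif kind == "line":
                _, n, dx, dy = cmd
                for _ in range(n):
                    points.append((beam[0], beam[1]))
                    beam[0] += dx
                    beam[1] += dy
            elif kind == "call":
                execute(subroutines[cmd[1]])
            elif kind == "return":
                return
            else:
                raise ValueError(f"unknown display command: {kind}")

    execute(commands)
    return points

# A three-point vertical tick, drawn at two places via a subroutine.
tick = [("line", 3, 0, 1), ("return",)]
program = [("move", 10, 0), ("call", "tick"), ("move", 20, 0), ("call", "tick")]
print(run_display_list(program, {"tick": tick}))
```

Display subroutines let a repeated shape be sent to the PDP-11 once and
then reused at different beam positions.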

PDP-10 Display Module:

   The PDP-10 display module interprets SAIL procedure calls and
produces blocks of data to send to the PDP-11 display module.   In addition
to the primitives listed above, many higher level display functions are
implemented:

a. display a circle
b. display an arc of a circle
c. plot a graph of the data in an array, labelling the axes
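
   As an example of building a higher level function from the
primitives, a circle or arc can be approximated by short chords, each
expressed as a move followed by a line.  The sketch below is in Python
rather than SAIL, and the tuple command format, the fractional DX, DY
steps, and the default segment counts are all assumptions.

```python
import math

def arc_commands(cx, cy, radius, start, end, segments=32, points_per_segment=8):
    """Approximate an arc from angle `start` to `end` (radians) by chords.

    Each chord is emitted as a 'move' to its start point followed by a
    'line' of `points_per_segment` points with a constant (dx, dy) step.
    """
    commands = []
    for k in range(segments):
        a0 = start + (end - start) * k / segments
        a1 = start + (end - start) * (k + 1) / segments
        x0, y0 = cx + radius * math.cos(a0), cy + radius * math.sin(a0)
        x1, y1 = cx + radius * math.cos(a1), cy + radius * math.sin(a1)
        commands.append(("move", x0, y0))
        commands.append(("line", points_per_segment,
                         (x1 - x0) / points_per_segment,
                         (y1 - y0) / points_per_segment))
    return commands

def circle_commands(cx, cy, radius, **kwargs):
    """A circle is simply an arc covering the full 0..2*pi range."""
    return arc_commands(cx, cy, radius, 0.0, 2 * math.pi, **kwargs)

for command in circle_commands(512, 512, 100, segments=4, points_per_segment=2):
    print(command)
```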

PDP-10 Image Processing Functions:

   Many of the image processing functions developed at the Stanford
Artificial Intelligence Laboratory have been modified to run under Tenex.
The following is a partial list:

a. Input an image from a disk file
b. Output an image to a disk file
c. Input a window of an image from a disk file
d. Display an image on the 611 scope
e. Enhance the contrast (stretch) of the image
f. Rotate the image 90 degrees clockwise
g. High-pass filter the image
h. Low-pass filter the image
i. Display the histogram of the image
j. Expand the image
k. Remove "noise" from the image (local sigma test)
l. Difference two images
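
   A few of the operations in this list are simple enough to sketch
directly.  The versions below work on an image held as a list of rows of
integer grey levels; they are minimal illustrations in Python, not the
SAIL routines themselves, and the linear form of the contrast stretch is
an assumption.

```python
def stretch_contrast(image, max_level=255):
    """Linearly map the darkest pixel to 0 and the brightest to max_level."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [[0] * len(row) for row in image]   # flat image: nothing to stretch
    return [[(p - lo) * max_level // (hi - lo) for p in row] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def histogram(image, levels=256):
    """Count how many pixels take each grey level."""
    counts = [0] * levels
    for row in image:
        for p in row:
            counts[p] += 1
    return counts

def difference(a, b):
    """Pixel-by-pixel difference of two images of the same size."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

image = [[10, 20], [30, 50]]
print(stretch_contrast(image))
print(rotate_90_clockwise(image))
```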



IV.B.1.d    INFORMATION PROCESSING PSYCHOLOGY PROJECT

INFORMATION PROCESSING PSYCHOLOGY

Prof. E. Feigenbaum (Computer Science)
and Prof. H. Cohen (U. C. San Diego)

May 1976 Report

   Information Processing Psychology is concerned with the construction
of models of human cognition, using the methodologies of computer
simulation and artificial intelligence.  The attempt is to give a precise
characterization of the human information processes and information
structures that underlie human problem solving, learning, and perceptual
behavior.  Over the past two decades,  research in this scientific area has
produced computer models of behavior in puzzle-solving, game-playing, and
theorem-proving tasks; rote learning laboratory tasks; linguistic
understanding and long-term memory tasks;  pattern extrapolation tasks,
e.g., as are found in intelligence tests; children's seriation tasks;
concept attainment tasks; visual scene understanding tasks; tasks
involving mental imagery; and many others.  A type of human
cognitive/perceptual activity that has not been much studied is the
behavior associated with the production of works of art.  In the past,
neither graphical/visual art-making nor musical composition has been
studied in depth.

   The particular project described below has sought to bring under
examination one of the least well-defined areas of higher intellectual
functioning -- the activities of art-making performance -- and to develop
a computer model (i.e., information processing model) capable of verifying
the plausibility of a number of hypotheses concerning such activities. We
have addressed the subset of art-making behavior which is concerned with
the production of freehand drawings, and in particular drawings which
might be characterized by their imagistic richness as opposed to formal
complexity.

    The computer model has followed the format used in many A.I.
programs: a production system in which an explicit body of knowledge is
encoded as a set of rules linking the recognition of complex prior program
behavior (in the making of the drawing) and current states within the
drawing itself, to the exercise of appropriate subsequent rules, which in
turn move the drawing into new states.  The model is to be regarded as an
expert or specialist,  in the sense that the encoded knowledge is
specifically concerned with the mechanics of image-building and does not
encompass any other aspect of the world.
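
    The production-system format mentioned above can be sketched in a
few lines: an ordered set of rules is matched against the current state,
and the first rule whose condition holds is fired, moving the state on.
The drawing state and the two rules below are hypothetical, invented
only to show the control loop; they are not taken from the model.

```python
def run_production_system(state, rules, max_steps=20):
    """Repeatedly fire the first rule whose condition matches the state.

    `rules` is an ordered list of (name, condition, action) triples;
    `condition(state)` returns a bool and `action(state)` mutates the state.
    Returns the names of the rules fired, in order.
    """
    fired = []
    for _ in range(max_steps):
        for name, condition, action in rules:
            if condition(state):
                action(state)
                fired.append(name)
                break
        else:
            break   # no rule matched: the system halts
    return fired

# Hypothetical drawing state and rules, invented for illustration.
state = {"open_forms": 1, "closed_forms": 0, "shaded": 0}

def close_form(s):
    s["open_forms"] -= 1
    s["closed_forms"] += 1

def shade_form(s):
    s["shaded"] += 1

rules = [
    ("close-open-form", lambda s: s["open_forms"] > 0, close_form),
    ("shade-closed-form", lambda s: s["shaded"] < s["closed_forms"], shade_form),
]

print(run_production_system(state, rules))
```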

    Since much of what we understand by "meaning" in images -- as
elsewhere -- clearly involves world knowledge, there may seem to be
something anomalous in a program without world knowledge designed to
generate imagistically rich drawings.  However,  our belief has been that a
large part of "meaning" is signalled by the image-structure itself, and
that this is related more to the nature of underlying perceptual processes
than to any particular stored perception of the world.   There should be a
set of pre-acculturated behavioral patterns of so fundamental a kind that
their very exercise would persuade the viewer that some "meaning" was
intended.

   Following from this position, our selection of appropriate
production rules in the production system has tended to stress a number of
low level perceptual activities.  Early versions of the model were able to
differentiate figure from ground, closed forms from open forms, inside
from outside; and also to perform tasks -- like generating a path from one
point to another under certain constraints -- in feedback mode, which
required a continually updated model of the state of the drawing under
construction.

   More recently, the model has been given enough knowledge of the
mechanics of representation to permit it to manipulate the emerging
drawing more fully.  Thus, it knows that a closed form may function as a
delineated area upon which other markings may be made; or whose flatness
may be stressed by cross-hatching; or that the form may "stand for" a
solid object which may be shaded or cast shadows.

    The protocols referred to above are families of behavioral rules
which are distributed throughout the system and become enmeshed into
complex structures.  For example, one aspect of the model's awareness of
figure-ground relationships is a set of avoidance protocols, which prevent
the invasion of existing elements in the drawing.  Which of the set will
be invoked will depend upon both what is being done -- in terms of
currently "open" protocols -- and what is being avoided.

   The major protocols currently available to the model may be
summarized as follows:

closure: forms may be closed by preplanning or, at a later stage, as
suggested by the state of the drawing.  Reinforced by hatching, shading,
marking (recursive repetition, see below), piercing, accretion.

placement: the model is able to select unused areas of the drawing of a
shape and size appropriate to its current plan; its subsequent behavior
is then determined in large part by the precise consideration of its
environs.  The model has no aesthetic criteria or compositional
strategies beyond providing itself with adequate space.

avoidance: may result in the discontinuation or the modification of the
current plan, with or without the development of an alternative plan; or
in an attempt to circumvent the obstructing form.

repetition: in the "placement" area of the production system, this would
result in similar sub-histories being repeated, subject to local
conditions, in other parts of the drawing.   In other conditions it will
result in a recursive use of closed forms as fields upon which other
closed forms may be made; in multiple division or extension of an area;
in zigzags or groups of parallel lines, or in concentricity.
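
As noted above, which avoidance protocol is invoked depends on both the currently "open" protocol and the element being avoided. A minimal sketch of such a selection follows. It is written in Python rather than SAIL, and the rule table, the obstacle names, and the default response are all hypothetical, chosen only to illustrate the idea; none of them is taken from the program itself.

```python
# Hypothetical sketch: the rule table and names below are illustrative only,
# not taken from the SAIL program described in the text.
from typing import Dict, Tuple

# The three kinds of outcome the text lists for an avoidance protocol.
DISCONTINUE = "discontinue current plan"
MODIFY = "modify current plan"
CIRCUMVENT = "circumvent obstructing form"

# (currently open protocol, kind of element being avoided) -> response.
AVOIDANCE_RULES: Dict[Tuple[str, str], str] = {
    ("closure", "closed form"): CIRCUMVENT,
    ("closure", "open line"): MODIFY,
    ("repetition", "closed form"): DISCONTINUE,
}


def select_avoidance(open_protocol: str, obstacle: str) -> str:
    """Pick a response from both what is being done and what is being avoided."""
    # Assumed default: with no specific rule, steer around the obstacle.
    return AVOIDANCE_RULES.get((open_protocol, obstacle), CIRCUMVENT)
```

A table keyed on the pair keeps the two deciding factors explicit; a production system would express the same dependence as condition-action rules.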

   Long term plans include the provision of simple "world knowledge" to
the problem, in order to investigate plausible specialist/non-specialist
interactions in the drawing process as a source of imagistic richness. We
have done some recent experimentation with people, designed to isolate and
examine the protocols actually employed by a group of drawing students in
a visual arts class at U.C., San Diego.  The results of these experiments
are now undergoing statistical analysis, and it is anticipated that much
useful material will be available for the next stage of program
development.

   The program described above was developed in SAIL on the SUMEX
facility, partly during the period when Professor Harold Cohen, of the
Visual Arts Department of U.C., San Diego, was on leave at Stanford
University, and partly upon his return to his campus.   He has assumed the
Directorship of a new Project for Art/Science Studies at UCSD, and has by
gifts and grants procured for the Center a PDP-11-type facility capable of
supporting the research described above on the modeling of art-making
behavior.  The innovative work of this SUMEX pilot project will therefore
be "spun off" to a fruitful environment at UCSD.  As a SUMEX activity,
this particular research has effectively terminated.

Bibliographic References

1.  Cohen, H.  "Steps toward a Theory of Meaning", invited paper,
  International Sculpture Conference, U. of Kansas, March, 1974.

2.  Cohen, H.  "On Tools and People, Including Computers and Artists",
  invited paper, Conference on Computers in Art, Purdue Univ., 1975.

3.  Cohen, H.  "The Simulation of Perception: Problems in Generating
  Drawings by Machine", invited paper, AAAS Annual Meeting, 1972.

4.  Cohen, H.  "On Purpose", <<Studio International>>, January, 1974.

5.  Cohen, H.  "Parallel to Perception", <<Proceedings of the Edinburgh
  Conference on Art and Computing>>, 1973.



IV.B.1.e    AIM RESEARCH - UNIVERSITY OF ROCHESTER

AIM Research - University of Rochester

    Drs. Feldman, Rovner, and Low
        Rochester University

(Grant NSF DCR74-24203, 2 years, $149,956 total and
Sloan Fdn. 74-12-5, 3 years, $120,000 last year)

   SUMEX facilities are being used at the University of Rochester by a
group of about 10 second-year medical students under the direction of
Charles Odoroff, Biostatistics.  Their work is either an optional part of
a course in biostatistics and epidemiology, or an individual project.
They have studied some of the documentation of the MYCIN system, and have
experimented with using it both on canned case histories and on cases from
the files of microbiologists at the U. of R. Medical School.   At least one
evaluative paper has been written.  It is planned to use the CASNET system
in the same way.

   There is continuing system development work, especially for the SAIL
language system.




IV.B.1.f   QUANTUM CHEMICAL INVESTIGATIONS

QUANTUM CHEMICAL INVESTIGATIONS OF
HEME PROTEINS AND FERREDOXINS

      Dr. Gilda Loew
Stanford Department of Genetics

(Grant NSF GB-40105, 2 years, $18,000 this year)

   SUMEX is used for the calculation of various one-electron
electromagnetic properties of iron containing compounds.  The programs
were formulated and written by David Steinberg, Michael Chadwick, and
David Lo.  David Lo was responsible for converting the programs for
interactive use on the PDP system.  Slight improvements were made by
Robert Kirchner, and Sheldon Aronowitz is currently expanding the
formulation to include additional spin and oxidation states of the iron
atom.

   The properties that are calculated include the electric field
gradient at the iron nucleus, quadrupole splitting, isotropic and
anisotropic hyperfine interaction, spin-orbit coupling and zero field
splitting, g values, and temperature dependent effective magnetic moments.
The calculated values are compared directly to experimental results
obtained from published Mossbauer resonance and electron spin resonance
spectra.  Such a comparison determines not only the reliability with which
these properties can be calculated but also gives an indication of the
ability of the model of the iron active site to mimic the actual
environment found in a particular compound or iron containing protein.

  The major input to these properties programs is a description of the
electron distribution of the compound under consideration. This
description is obtained using a semi-empirical molecular orbital method
employing the iterative extended Huckel procedure.  Such a calculation
requires up to 660K core and is performed elsewhere.  When the calculated
electron distribution yields a set of calculated properties in agreement
with observation, we have increased faith in the description of the model
of the active site and can carry the model one step further to make
qualitative inferences about certain properties relevant to the biological
functioning of the compound. These properties, which are harder to
characterize experimentally, include the nature of the ligand binding to
iron, relative bond strengths themselves, net atomic charges, and electric
potentials.  The model may be varied (by changing the spin or
oxidation state of the iron, replacing certain ligands, or simply changing
the geometry of the ligands) and a new set of properties calculated to
predict what effect these changes would have on the observed
electromagnetic properties.

  Such a procedure lends itself well to the study of three classes of
iron containing compounds of biological interest: the one-iron sulfur
proteins known as rubredoxins and the two-iron sulfur proteins called
plant-type ferredoxins which serve as one electron transfer agents, heme
proteins which serve as oxygen as well as one electron transfer agents,
and sideramines which serve as iron transport agents.   The calculated
properties for the first class, used to elucidate the geometry of the
sulfur ligands and the spin state of the iron within the protein, are
reported in the following

PUBLICATIONS:

[1] G.H. Loew, M. Chadwick, and D.A. Steinberg, Theoret. Chim. Acta
  (Berl.) 33, 125 (1974)

[2] G.H. Loew and D.Y. Lo, Theoret. Chim. Acta (Berl.) 33, 137 (1974)

[3] G.H. Loew, M. Chadwick, and D.Y. Lo, Theoret. Chim. Acta (Berl.) 33,
  147 (1974)

[4] G.H. Loew and D.Y. Lo, Theoret. Chim. Acta (Berl.) 32, 217 (1974)

   We are currently performing a systematic study of heme proteins.
The electromagnetic properties of these proteins and of synthesized
compounds which mimic the observed behavior of the proteins have been well
studied experimentally.  But many questions regarding the nature of small
ligand binding to the heme group remain unresolved. Before we can address
ourselves to such problems, we must first be able to theoretically
reproduce the experimentally observed behavior.  The specific areas of
interest are :

a.) deoxy heme in both the relaxed (iron in the plane with a low or high
  spin state) and tense (iron out of the plane with a high spin state)
  configurations.

b.) oxy heme with various oxygen geometries (co-axial or coplanar) and
  several excited electronic states (promotion of an electron from an
  iron d-orbital to an unfilled oxygen orbital).

c.) abnormal heme compounds which do not bind oxygen but which bind
  axially to CN, N3, NO, or OH.

d.) the enzymatic cycle of cytochrome P450 camphor, in which the protein
  has been isolated with the iron in various spin and oxidation
  states.

   Preliminary results for oxy heme have been published (Loew and
Kirchner) in the J. Amer. Chem. Soc. 97, 7388 (1975).

  We have also calculated the electromagnetic properties of ferrocene
Fe(C5H5)2 and the binuclear transition metal complexes biferrocene and
biferrocenylene in various oxidation states.  The work has been published
(Kirchner and Loew) in Theoret. Chim. Acta (Berl.) 41, 1 (1976) and
submitted (Kirchner and Loew) to Inorganic Chemistry.



   The heme work is funded by the National Science Foundation Grant
GB-40105, which was renewed starting June 1976 for a period of two years.

   Undergraduate research projects which attempt to correlate reactive
electronic sites for a series of polycyclic aromatic hydrocarbons with
carcinogenic activity use SUMEX to calculate various measures of C-C and
C-H bond reactivity.  Again, the major input to this program is taken from
the results of an iterative extended Huckel molecular orbital calculation
performed elsewhere.  The only current funding for these projects is a
SCIP computer processing subsidy, although an application to the National
Institutes of Health is pending.



IV.B.2    NATIONAL PILOT PROJECTS

   Over the past year several national pilot projects have been
initiated with the approval of the AIM Executive Committee and advice of
the AIM Advisory Group.  One of these (the ACT Project under Dr. John
Anderson) has moved from pilot status to become a formal project.  The
currently active pilot efforts are summarized below.

IV.B.2.a   NATURAL LANGUAGE UNDERSTANDING

Natural Language Understanding

Prof. R. Lindsay
University of Michigan

(Financial support from University of Michigan)

I.  Summary of Research Program

   The major aims of this pilot project have been to establish
research goals, initiate collaborations with faculty at the University of
Michigan Medical School, and to develop software.  The project staff
consists of Associate Professor Robert K. Lindsay, Dr. Maija Kibens
(Research Associate), and Mrs. Kathie Gourlay (Programmer Analyst), all
of the University of Michigan.

A) Technical Goals

   The overall goal of this project is the development of a
histological model that will assist anatomists in the design and
evaluation of methods of organ culture.  As conceived at present, our
technical goals are the design of (a) a data structure for encoding
descriptions of microscope slides made from organ explants, (b) a model of
microanatomical processes based on the expertise of histologists and
pathologists, and (c) means for construction of the data structures of (a)
from histologists' verbal descriptions.

B) Medical Relevance and Collaboration

  The value of such a system to biology and medicine is far-reaching
to the extent that it succeeds in assisting in the development of organ
culture methodology.  To illustrate with a single important example, the
ability to cultivate the organs of experimental animals is the first step
in the in vitro study of disease processes such as cancer in those organs.
We are working in collaboration with three anatomists in the Department of
Anatomy, Professor Raymond Kahn, Associate Professor William Burkel, and
Assistant Professor Theodore Fischer. For the past two years this group
has been experimenting with methods for cultivating canine prostate.



C) Progress and Accomplishments

   Our efforts have been directed along two fronts.   We are
familiarizing ourselves with the current capabilities, knowledge, and
problems of the histology group. We are also developing artificial
intelligence programs to understand histological information typed in a
natural language format.

Collaborating with the histologists

1.  We are interviewing the principal investigators and their several
  assistants individually to learn from each of them his conception of
  tissue functioning.

2.  Formulating better analysis methods - Together with the histologists
  we have designed a new grading scheme for recording the maintenance
  status of an organ explant.  This scheme is more complex than their
  previous category system in that it includes more factors and finer
   distinctions.  Currently, the group is using standard non-parametric
  statistics to compare the evaluations of the explants.

3.  Designing an AI model - The histologists are enthusiastic about the
   new evaluation method.  They believe it to be a great improvement
  over their previous one.  However, they recognize its inadequacies
  and are open to any artificial intelligence techniques that will
  enable them to capture more of their knowledge about each explant in
  a form that can be used in the design and evaluation of experiments.

Software Development for Natural Language Input

1.  Formulating a general design - The proposed structure for the system
  includes multiple sources of knowledge, each contributing hypotheses
  about the meaning of the input.  The input is to be freely formatted
  with possible typing, spelling, and syntactic errors.  The output
  will be an internal representation of the meaning of the input text,
  namely a representation of an organ explant.  The knowledge will
  include components for: typing correction, word decomposition, morph
  recognition, syntactic analysis, semantic analysis, and histological
  knowledge.  The knowledge components of the system are being
  designed to be independent of each other insofar as possible.

2.  Implementing - The knowledge components for typing correction and
  word decomposition have been written in INTERLISP. The dictionary
  format has been designed.  Work is currently being done on the morph
  recognition component.
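
The design of independent knowledge components, each contributing hypotheses about the input, can be illustrated with a small sketch. It is written in Python rather than INTERLISP, and the dictionary entries, the suffix list, and all function names are hypothetical; the project's actual dictionary format and components are not described here. Only stand-ins for the two components reported as written, typing correction and word decomposition, are shown.

```python
import difflib
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, str]  # (knowledge source name, proposed reading)

# Hypothetical mini-dictionary; the project's real dictionary format is not shown.
DICTIONARY = ["epithelium", "stroma", "necrosis", "gland"]
SUFFIXES = ["es", "s"]  # assumed, deliberately tiny suffix list


def typing_correction(word: str) -> List[Hypothesis]:
    """Propose dictionary words that closely resemble a possibly mistyped word."""
    matches = difflib.get_close_matches(word.lower(), DICTIONARY, n=3, cutoff=0.8)
    return [("typing", m) for m in matches]


def word_decomposition(word: str) -> List[Hypothesis]:
    """Propose stem+suffix splits whose stem is a dictionary word."""
    out: List[Hypothesis] = []
    lowered = word.lower()
    for suffix in SUFFIXES:
        stem = lowered[: -len(suffix)]
        if lowered.endswith(suffix) and stem in DICTIONARY:
            out.append(("decomposition", f"{stem}+{suffix}"))
    return out


# Each source sees only the input word, so sources stay independent.
SOURCES: List[Callable[[str], List[Hypothesis]]] = [
    typing_correction,
    word_decomposition,
]


def hypotheses_for(word: str) -> List[Hypothesis]:
    """Collect hypotheses from every source; no source calls another."""
    results: List[Hypothesis] = []
    for source in SOURCES:
        results.extend(source(word))
    return results
```

Because each source returns its own tagged hypotheses, a later component (syntactic or semantic analysis, say) could be added to `SOURCES` without changing the existing ones.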

D) Publications

    A manuscript describing the results of the application of the
revised evaluation scheme has been written by Professors Kahn, Burkel,
Fischer, and Herwig (the project surgeon).  The paper is titled "Effect of
vitamin A on canine prostate in organ culture".   There have been no
publications to date on the AI aspect of this project.

E) Funding Status

   The canine prostate project is funded by the National Cancer
Institute.  A grant application to NIH for similar work with lung is
pending.  The salaries and research facilities of Professor Lindsay, Dr.
Kibens,  and Mrs.  Gourlay are provided by the University of Michigan
Mental Health Research Institute, a division of the Department of
Psychiatry in the Medical School.

II.  Interactions with the SUMEX-AIM resource

A) Medical Use of Programs through Networks

We have not had occasion for such use.

B) Useful Contacts and Cross Fertilization with Other SUMEX-AIM Projects

   Kathie Gourlay is on the LISP users' mailing list. Most of the other
users and personnel at SUMEX have been very helpful in giving advice and
solving problems.  A few examples of such assistance during the past year
follow.

   There have been some problems with the speed of response in an
INTERLISP program.  Masinter has been very helpful with suggestions. In
one instance, he was able to run the program and interactively pinpoint
some slow spots.

   We have also had communication via SNDMSG with N.  Smith, Hedberg,
Feigenbaum, Davis, Lederberg, R.  Smith, Colby, Parkison, Winograd, and
others.

    Dr.  Kibens has used the LINK facility to obtain answers to
questions about details of system use, and has used SNDMSG extensively for
such purposes.  Other interactions have concerned obtaining information
about current status of natural language input systems for interactive
programs.  SUMEX has also been a very convenient communications facility
via SNDMSG to non-SUMEX users at SRI, CMU, and SU-AI.

C) Critique of Resource Services

   The SUMEX staff has been very helpful.  To cite just one example: At
one time last year, we wanted to transfer some data to SUMEX from a PDP-9
minicomputer which is located here.  Gourlay communicated with Cower via
SNDMSG and arranged to mail a DECtape.  Contrary to what the DECsystem 10
Assembly Language Handbook led Cower to believe, the PDP-9 DECtape was not
compatible with the PDP-10 software.  Cower and another person spent
several hours working unsuccessfully to transfer the data.  We appreciate
their efforts although the problem could not be solved because of serious
incompatibilities.

   Now that Tymnet has several local lines to our area (since January
1976) and we have terminals located in our offices, the SUMEX facility is
very convenient.  The system is quite reliable.  However, when it is down
the explanation given us by Tymnet is almost always out-of-date, e.g.,
maintenance work that was completed hours ago.

    There should be a manual such as the Tenex Executive Manual that
explains the features available on the SUMEX system.  At present, it is
necessary to look in dozens of different documentation files or to learn
by hearsay.  In the interim, it would be good to have one of the system
personnel designated as a general source person for details of system use.

   We would suggest that some thought be given to making the LINKing
facility a more productive and convenient means of communication.  While
LINKing is a potentially useful device,  it is also a potential nuisance to
the recipients.  This is a cause of our reluctance to use this facility.
Perhaps an explicit policy should be decided upon by all SUMEX users to
establish what is the community attitude toward LINKing.  It might
facilitate communication by LINKing if the default option were changed to
REFUSE, but modified to allow immediate acceptance of the LINK upon
learning who it is from.  Certain programs, such as TYPE, are usually run
in REFUSE mode.  Perhaps these programs should set REFUSE mode on entry
and clear it on exit so that the user would be protected from interruption
at those times without needing to have the REFUSE mode set permanently.
There are obviously many such technical changes that could be made to
improve this feature,  and we would like to see some discussion of them.
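
The suggestion that a program set REFUSE mode on entry and clear it on exit can be sketched as a scoped guard. The sketch below is in Python, not anything native to TENEX, and `Session`, `refuse_links`, and `refusing_links` are hypothetical names invented for illustration; no actual TENEX or SUMEX call is shown.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical stand-in for a user's terminal job; not a TENEX interface."""
    refuse_links: bool = False


@contextmanager
def refusing_links(session: Session):
    """Refuse incoming links for the duration of a program, then restore."""
    previous = session.refuse_links
    session.refuse_links = True          # set REFUSE on entry
    try:
        yield session
    finally:
        session.refuse_links = previous  # put back the user's own setting on exit
```

One design choice differs slightly from the wording above: the guard restores the previous setting instead of unconditionally clearing it, so a user who has chosen REFUSE permanently keeps that choice after the program exits.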

    We look forward to the availability of 1200 baud capability over
TYMNET so that listings can be obtained more rapidly.  Any encouragement
from SUMEX to TYMNET that would speed the conversion would be appreciated.
We think it would be wise for the system to be compatible with the VADIC
protocol (soon to be adopted by Bell, we hear unofficially) rather than
with the troublesome Bell 202 equipment, as announced.



IV.B.2.b    KRL PROJECT

Knowledge Representation Language - KRL

Dr. Dan Bobrow, Xerox PARC
Dr. Terry Winograd, SU AI Lab

   This pilot project was just initiated on the SUMEX-AIM facility and
is a medicine-oriented extension of the KRL development effort at Xerox
Palo Alto Research Center.  The basis of the original project is the
development of a systematic programming framework within which to describe
and manipulate knowledge about a task domain and which may be used by a
performance program to reason and solve problems within that domain.  The
development of such AI tools is an important part of the AIM community in
that it allows the more coherent and general formulation of medical AI
programs.

   A first version of KRL has been implemented and several students
will experiment with implementing medical consultation programs (e.g.,
MYCIN, CASNET, or Rubin's model of renal disease) using KRL.



IV.B.2.c    COMPUTERIZED PATIENT MONITORING

Computerized Patient Monitoring and Clinical Decision Making

John J. Osborne, M.D. Director Intensive Care Services
  Richard R. Mitchell, Ph.D. Biomedical Engineer

The Institutes of Medical Sciences
    Pacific Medical Center

   The immediate desire of this pilot project is to use MLAB and to
explore the opportunity to become a regular member of the SUMEX-AIM user
community.

   The project is part of a Bioengineering and Computer Science
Resource for medical research. The Research Data Facility of the
Institutes of Medical Sciences is an NIH-funded center for the development
of computerized patient monitoring and clinical decision making. The
emphasis to date has been in the area of clinical monitoring of the
respiratory parameters of critically ill patients.

   There are three major areas of potential joint cooperation with
SUMEX-AIM:

1.  Clinical Decision Making;

2.  Image Processing;

3.  Modeling.

   In the area of Clinical Decision Making the project is funded to
develop an intelligent system for detecting and reporting alarm conditions
in the hospital intensive care environment.  Its goals include using
sophisticated image processing techniques for the evaluation of pulmonary
physiology using the scintillation camera and 133 Xenon.  The present work
in modeling is limited by the capabilities of the IBM 1800 CSMP and the
investigators are interested in exploring the use of more complicated
models requiring a sophisticated simulation language.



IV.B.2.d    AI IN PSYCHOPHARMACOLOGY

Artificial Intelligence in Psychopharmacology
  (NIH grant application in preparation)

    Dr. Jon F. Heiser, M.D.
   Assistant Adjunct Professor
Dept. of Psychiatry and Human Behavior
University of California at Irvine

A.   Introduction

   This project has just been authorized as an AIM project.  The
following quote from a letter of Drs. Buchanan and Axline of the MYCIN
project describes the collaboration that led to this project and which
is expected to continue.

   "He is extending MYCIN's knowledge base to cover consultations
regarding chemotherapy for psychiatric disorders.  This is valuable to us
for at least two reasons:  it increases the potential uses of the program
and it illuminates those specific parts of the program that are not yet
general enough to be easily extended to new areas.  By pioneering in the
effort to develop a more general framework for medical reasoning computer
programs, Dr. Heiser is helping us provide a means for encoding and
testing large amounts of medical knowledge."

   The objective of the new project is to develop computer based
automated systems capable of assisting in research, teaching and
consultation in psychopharmacology.  It will result in the development of
software which will run on the University of California, Irvine PDP-10.

   [The following material is abstracted from Dr. Heiser's proposal to
the AIM Executive Committee].

A.1 BACKGROUND:

    Information in medicine expands so rapidly that both researchers and
clinicians struggle to digest it and apply it wisely.  Computer-based
instruction (textbooks, journals, individualized supervision and
consultation) is one solution to this problem.  By their very nature
computer-based systems can be programmed to 1) explain their reasoning in
natural language and in terms intuitively acceptable to users of various
degrees of sophistication, 2) have their behavior totally analyzed, and 3)
be easily modified or updated.

   Computer-based knowledge systems have been developed for describing
and solving problems in pharmacology.  At Stanford University Medical
Center in California, when new drugs are prescribed for a patient, their
profile of action is compared to that of drugs already consumed by the
patient.  A warning is generated for the physician if a potential
interaction is noted (1).  Artificially intelligent systems have been
developed which utilize pharmaco-kinetic models to suggest initial doses
and monitor on-going maintenance doses with complex drugs such as
digitalis (2).  Artificial intelligence systems are also being generated
to diagnose patients with infectious diseases and to suggest appropriate
antibiotic agents (3).

   Several other systems were discussed at the First Annual AIM
(Artificial Intelligence in Medicine) Workshop, held at Rutgers University
14 June through 17 June 1975.  (S. Amarel and C.A. Kulikowski of the
Computer Science Department at Rutgers University directed the
conference).

  We have begun to adapt the techniques of the Stanford Group (3,4) in
the generation of an artificially intelligent system which evaluates and
diagnoses psychiatric patients, suggests pharmacological treatment and
monitors the on-going clinical course.

   The system has 16 rules, based on conventional clinical
observations, for diagnosing either mania or schizophrenia.  Clinical
findings are collected and a diagnosis made by manipulating human expert
generated "certainty factors" which are similar to but not identical to
probabilities.  Their precise mathematical nature and manipulation are
described in Shortliffe and Buchanan (4).
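
For orientation, the following is a minimal sketch of how certainty factors can be accumulated in the style generally associated with the model of reference (4): separate measures of belief and disbelief, each in the range 0 to 1, are combined incrementally, and the certainty factor is their difference. The function names are hypothetical, the formulas are stated from general knowledge of that model and should be checked against the cited paper, and nothing here reproduces the 16 rules of the system described above.

```python
from typing import Iterable


def combine(a: float, b: float) -> float:
    """Accumulate two measures in [0, 1] bearing on the same conclusion."""
    # Each new piece of evidence closes part of the remaining gap to 1.
    return a + b * (1.0 - a)


def certainty_factor(beliefs: Iterable[float], disbeliefs: Iterable[float]) -> float:
    """CF = accumulated belief minus accumulated disbelief, in [-1, 1]."""
    mb = 0.0
    for x in beliefs:
        mb = combine(mb, x)
    md = 0.0
    for x in disbeliefs:
        md = combine(md, x)
    return mb - md
```

Two supporting findings of strength 0.6 and 0.5 accumulate to about 0.8 under this scheme, neither their product nor their sum, which illustrates why the factors are "similar to but not identical to" probabilities.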

A.2 RATIONALE:

   Psychopharmacological agents are frequently misused qualitatively
and quantitatively by prescribing physicians as well as by consumers.
Consultation with experts in psychopharmacology is frequently sought and
is given on the basis of clinical data, currently established practice,
evolving research or ad hoc hypotheses.  A computer based consultation
system, available 24 hours per day, could greatly assist non-specialist
physicians in choosing the best psychopharmacological treatment, given the
same expertise and data.  Such a system could also serve as a teacher-
advisor to students and as a reference for various types of
psychopharmacological knowledge, e.g., well established principles and
practice, new but not fully verified ideas and late breaking developments
(5).  Properly weighted, all such information could be used in
consulting and teaching functions.  For example, the system could suggest
less well established, more controversial or more hazardous diagnostic or
therapeutic techniques for a patient with a life-threatening situation not
responsive to conventional measures.  Like the human clinician, in
desperate or excessively chronic circumstances, the system could generate
novel hypotheses with an estimate of potential risks and benefits.

B. SPECIFIC AIMS:

1.  To study existing automated systems, computer based and otherwise,
  which assist in clinical decision making.

2.  To develop a model of expert clinical decision making for clinical
  psychopharmacology.



3.  To implement this model on a computer system such that the system
  can converse in real time in natural language through computer
  terminals with users located close by or remotely.

4.  To evaluate the performance of the system as a teaching and
  consulting aid.

5.  To increase the breadth and depth of the artificially intelligent
  system by:

  a) increasing the technical sophistication, e.g., by adding options
     such as voice activated microphone-loudspeaker (or print-out)
     terminals;

  b) adding other areas of clinical psychiatry towards an ultimate goal
     of having a fully automated and self-contained textbook-consultant
     for psychiatry;

  c) linking the system to precoded data bases so that the system could
     quickly learn from thousands of actual case histories and use this
     data and experience both to modify its intuitively human-like
     rule-based decision model and to generate abstract mathematical or
     statistical decision making models;

  d) integrating the system with biomedical data collecting techniques
     to achieve more direct involvement in research and patient
     evaluation;

  e) proposing new drugs to be synthesized or tested for desired
     psychopharmacologic effects.

   Aims 1-4 should easily be attained within five years.  Aim number 5
is obviously quite speculative and dependent on developments in computer
technology, biomedical engineering, etc.  Promising beginnings have been
made.  An example is a program which reads and analyzes the content of
typed scripts of spontaneous human speech (6). This system parses
sentences into noun and verb clauses, recursively or repeatedly if
necessary, and uses a set of rules to score the noun and verb phrases for
a variety of affects and states of mind by means of a well documented,
reliable and valid technique of content analysis otherwise requiring a
human content analyzer with a common sense knowledge of the world and the
language.

C.  METHODS AND PROCEDURES:

   We plan to study existing systems and to develop a similar system in
clinical psychopharmacology.  Dialogues, equivalent to those planned for
this development, are routinely produced by the antibacterial program
mentioned above in (3) and could be available in the area of clinical
psychopharmacology within a year in complete enough form to be evaluated.
This work will be done in consultation with the Information and Computer
Science Department at University of Calif., Irvine and the Computer
Science Department at Stanford University.  During approximately the first
two years efforts will be concentrated on developing the artificial
intelligence techniques referred to above, including question answering in
natural language, abstract reasoning, advice giving, etc.  The expert
information will be installed in sentence-like form initially from the
working knowledge of the principal investigator and the behavior of the
system compared to the behavior of the principal investigator.   In the
second phase expert information will be abstracted from standard
textbooks, journals and consultation with acknowledged experts.  Here
evaluation becomes more difficult except when the system makes an obvious
mistake.  A third level of expert input will include other and possibly
non-human information sources such as actuarial formulas and other
statistical techniques (12).

   In later phases of the project more concern will be placed on the
diagnosis problem.  This problem is being deemphasized during initial
phases because it has received reasonable attention by other groups
(7-12).

   Many of the above mentioned systems have been formally or informally
evaluated and found to perform, within their range of applicability and
our ability to measure performance, as well as acknowledged experts (12).
Consultation with experts in evaluation research in both education and
clinical medicine is available locally and will be enlisted in later
aspects of the project once a workable system has been developed.

   Risks and hazards in this procedure are minimal since no biological
material is involved, no patient records are used and identification of
patients to be discussed can be secured or eliminated with no impact on
the system or the user.  Use of a consulting and teaching system in
clinical psychopharmacology might involve clinical responsibility and
could be regarded as contributing to or responsible for an error of
omission or commission in clinical judgement and practice.  Every attempt
will be made to complete a thorough evaluation of the system, its validity
and reliability before it is made available to other than a small testing
group.  If in later phases we add the capability of utilizing large data
bases such as available through the Missouri Information System (10,12),
only those well developed procedures for transmitting large data bases
with complete anonymity and protection of individual patient rights will
be used.  Such procedures will be rigidly and consistently adhered to in
all aspects of this project.

D.  SIGNIFICANCE:

   It is hoped that results from these initial studies will stimulate
further research by physicians and graduate students in related fields
such as biochemistry, pharmacology, pharmacy, mathematics and the
information processing sciences.  To have instant access to teaching and
consultation based on a consensus of the best data, the best abstract
mathematical or statistical "number crunching" techniques and the best
human experts would be of great value to researchers, specialists, family
physicians,  students and others.  Evaluation of the effects of such a
system and comparison with traditional methods of research, teaching and
consultation would be of great benefit to medical educators.  It is hoped
that many students would master, beat or "psyche out" the system.  This
would be excellent evidence that learning is occurring.  However, because
of the information explosion, periodic updates to and from the system
should prevent it from becoming obsolete.

References

1.  Cohen, S.N. et al.  Computer-based monitoring and reporting of drug
  interactions.  Proceedings MEDINFO IFIP Conference, Stockholm, Sweden,
  August 1974.

2.  Silverman, H.  A digitalis therapy advisor.  MAC TR-143, Massachusetts
  Institute of Technology, Cambridge, Massachusetts, January 1975.

3.  Shortliffe, E.H., Axline, S.G., Buchanan, B.G. and Cohen, S.N.  Design
  considerations for a program to provide consultation in clinical
  therapeutics.  Proceedings of the San Diego Biomedical Symposium,
  February 1974, 311-319.

4.  Shortliffe, E.H. and Buchanan, B.G.  A model of inexact reasoning in
   medicine.  Mathematical Biosciences 23, 351-379, 1975.

5.  Ayd, F.J.  Rules for neuroleptic therapy.  International Drug Therapy
  Newsletter 9, 33-35 (1974).

6.  Gottschalk, L.A., Hausmann, C. and Brown, J.S.  A computerized scoring
  system for use with content analysis scales.  Comprehensive Psychiatry
  16, 77-90, 1975.

7.  Johnson, J.H., Giannetti, R.A. and Williams, T.A.  Real-time
  psychological assessment and evaluation of psychiatric patients.
  Behavioral Research Methods and Instrumentation 7, 199-200, 1975.

8.  Glueck, B.C.  Computers at the Institute of Living.  In J.F. Crawford,
  D.W. Morgan and D. Gianturco (Eds.), Progress in mental health
  information systems: Computer applications.  Cambridge, Mass:
  Ballinger Publishing Company, 1974.

9.  Laska, E.M.  The Multi-state information system.  In J.F. Crawford,
  D.W. Morgan and D. Gianturco (Eds.), Progress in mental health
  information systems: Computer applications.  Cambridge, Mass:
  Ballinger Publishing Company, 1974.

10. Sletten, I.W., and Hedlund, J.L.  The Missouri automated Standard
  System of Psychiatry: Current status, special problems and future
  plans.  In J.F. Crawford, D.W. Morgan and D. Gianturco (Eds.),
  Progress in mental health information systems: Computer applications.
  Cambridge, Mass: Ballinger Publishing Company, 1974.

11. Spitzer, R.L. and Endicott, J.  Can the computer assist physicians in
  psychiatric diagnosis? American Journal of Psychiatry, 131, 523-530,
  1974.

12. Sletten, I.W. and Hedlund, J.L.  The future of computers and actuarial
  methods in mental health practice.  Presented at the International
  College of Psychosomatic Medicine Symposium IV: Rating Devices and
  Information Processing in Psychosomatics, Catholic University, Rome,
  Italy, September 16-20, 1975.



APPENDIX A

OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH

B. G. Buchanan and E. A. Feigenbaum
     Stanford University

   We give here a brief overview of artificial intelligence (AI) taken
from a description of the Stanford Artificial Intelligence Laboratory.
The articles following the overview are taken from a preliminary draft of
a handbook about AI being written at Stanford under Professor Feigenbaum's
supervision.  The intent of the articles is to convey some sense of the
techniques, problems and successes of AI. Only a few of the most relevant
articles are reproduced here.

OVERVIEW

   Artificial intelligence is the name given to the study of
intellectual processes and how computers can be made to perform them.
Some workers in the field believe that it will be possible to program
computers to carry out many intellectual processes now done by humans.
However, almost all agree that we are not very close to this goal and that
some fundamental discoveries must be made first.  Therefore,  work in AI
includes trying to analyze intelligent behavior into more basic data
structures and processes, experiments to determine if processes proposed
to solve some class of problems really work, and attempts to apply what we
have found so far to practical problems.

   The idea of intelligent machines is very old in fiction, but present
work dates from the time stored program electronic computers became
available starting in 1949.  Any behavior that can be carried out by any
mechanical device can be represented in a computer, and getting a
particular behavior is "just" a matter of writing a program unless the
behavior requires special input and output equipment. It is perhaps
reasonable to date AI from A.M. Turing's 1950 paper. Newell, Shaw and
Simon started their group in 1954 and the M.I.T. Artificial Intelligence
Laboratory was started by McCarthy and Minsky in 1958.  [The Stanford AI
Lab was started in 1963.]

   Board Games. Early work in AI included programs to play games like
chess, checkers, kalah and go.  The success of these programs was related
to the extent that human play of these games makes use of mechanisms we
didn't understand well enough to program.  If the game requires only well
understood mechanisms,  computers play better than humans.   Kalah is such a
game.  The best rating obtained in tournament play by a chess program so
far is around 1700 which is a good amateur level.  The chess programmers
hope to do better.

   Formal Reasoning. Another early problem domain was theorem proving
in logic.  This is important for two reasons.  First, it provides another
area in which our accomplishments in artificial intelligence can be
compared with human intelligence.  Again the results obtained depend on
what intellectual mechanisms the theorem proving requires, but in general
the results have not been as good as with game playing.  (This is partly
because the mathematical logical systems available were designed for
proving metatheorems about logic rather than for proving theorems in
logic.)

   The second reason why theorem proving is important is that logical
languages can be used to express what we wish to tell the computer about
the world, and we can try to make it reason from this what it should do to
solve the problems we give it.  It is quite difficult to express what
humans know about the world in the present logical languages or in any
other way.  Some of what we know is readily expressed in natural language,
but much basic information about causality and what may happen when an
action is taken is not ever explicitly stated in human speech.  This gives
rise to the representation problem of determining what is known in general
about the world and how to express it in a form that can be used by the
computer to solve problems.

    Publications.  The results of current research in artificial
intelligence are published in the journal Artificial Intelligence, and in
more general computer science publications such as those of the ACM and
the British Computer Society.  The ACM has a special interest group on
artificial intelligence called SIGART which publishes a newsletter.  Every
two years there is an international conference on artificial intelligence
which publishes a proceedings.  The fourth and most recent was held in the
U.S.S.R. at Tbilisi in September 1975 and the proceedings are
available.



SUMMARY ARTICLES ON SELECTED TOPICS

   The following are selected articles on various aspects of Artificial
Intelligence research taken from the current collection of articles in the
AI handbook effort.  A complete outline of the articles planned can be
found in Appendix B.  The following articles include discussions of
production systems, rote learning, speech understanding, and PLANNER.

PRODUCTION SYSTEMS

GENERAL DESCRIPTION

   A PRODUCTION SYSTEM consists of a set of rules (the productions), a
data base and an interpreter for the rules.  The data base is a collection
of symbols.  The interpreter tries to match the left hand side of each
production to the data base. The interpreter performs the processes on the
right hand side of the production if the condition on the left hand side
matches some element in the data base.  The productions are generally
ordered so that if the condition on the left hand side of more than one
production matches an element in the data base, the production higher in
the order takes priority.
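
   The rule, data base and interpreter cycle described above can be
sketched in a few lines of Python.  This is a minimal illustration and
not the interpreter of any system named in this article: the rule
names, the sample facts, the `run` function and the convention of
skipping a production that would add nothing new are all assumptions
made for the example.

```python
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, ...]
# A production: (name, condition on the data base, action giving new facts).
Production = Tuple[str,
                   Callable[[Set[Fact]], bool],
                   Callable[[Set[Fact]], Set[Fact]]]


def run(productions: List[Production], data_base: Set[Fact],
        max_cycles: int = 100) -> List[str]:
    """Repeatedly fire the highest-priority production that changes the data base."""
    fired: List[str] = []
    for _ in range(max_cycles):
        for name, condition, action in productions:   # list order = priority
            if condition(data_base):
                new_facts = action(data_base) - data_base
                if new_facts:                         # assumed: ignore no-op firings
                    data_base |= new_facts
                    fired.append(name)
                    break                             # restart from the top rule
        else:
            return fired                              # nothing fired: halt
    return fired


# Hypothetical rules and facts, invented for illustration.
productions: List[Production] = [
    ("wet-if-raining",
     lambda db: ("RAINING",) in db,
     lambda db: {("GROUND", "WET")}),
    ("slippery-if-wet",
     lambda db: ("GROUND", "WET") in db,
     lambda db: {("GROUND", "SLIPPERY")}),
]
data_base: Set[Fact] = {("RAINING",)}
trace = run(productions, data_base)
print(trace)
```

Note that the only state carried between cycles is the data base
itself; the interpreter keeps no record of which rule fired last.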

DATA BASE EXAMPLES

   The data base of a production system may be simply a set of symbols
intended to reflect the state of the world.  Some production systems are
intended to model a memory mechanism, for example, "short term memory".
For these, each element of the data base may represent some piece of
knowledge.  Examples of systems modeling short term memory are PSG
[Newell, 1973] and VIS [Moran, 1973].  Sample elements from the data base
for VIS are

(HEAR NORTH EAST 5 END)
(L-2 LINE EAST P-2 P-1)

   The data bases for knowledge-based experts such as MYCIN [Shortliffe
1975] and DENDRAL [Feigenbaum 1971, Smith 1972] contain facts and
assertions about their respective domains of knowledge.  For example, the
data base in the DENDRAL system contains complex graph structures which
represent molecules and molecular fragments. Sample elements from the
MYCIN data base are

(IDENTITY ORGANISM-1 E.COLI .8)
(SITE CULTURE-2 BLOOD 1.0)

   A third type of data base is the "token stream approach" in which
the data base is a linear stream of tokens accessible only in sequence.
An attempt is made to match each production to the beginning of the stream
and if a match succeeds, characters in the matched segment may be deleted
or modified or new characters may be added.  This data base organization
was used in LISP70 [Tesler 1973].
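
   A token stream data base can be illustrated with the following
sketch, in which each rule rewrites a matching prefix of the stream.
It is not a description of LISP70; the function names `step` and
`rewrite`, the rule format and the sample tokens are hypothetical.

```python
from typing import List, Optional, Tuple

# A rule: (pattern expected at the head of the stream, replacement tokens).
Rule = Tuple[List[str], List[str]]


def step(stream: List[str], rules: List[Rule]) -> Optional[List[str]]:
    """Apply the first rule whose pattern matches the beginning of the stream."""
    for pattern, replacement in rules:
        if stream[:len(pattern)] == pattern:
            return replacement + stream[len(pattern):]
    return None                       # no rule matches the head of the stream


def rewrite(stream: List[str], rules: List[Rule],
            max_steps: int = 100) -> List[str]:
    """Rewrite the head of the stream until no rule applies."""
    for _ in range(max_steps):
        nxt = step(stream, rules)
        if nxt is None:
            break
        stream = nxt
    return stream


# Hypothetical rules and tokens, invented for illustration.
rules: List[Rule] = [(["A", "B"], ["C"]), (["C", "C"], ["D"])]
result = rewrite(["A", "B", "C", "E"], rules)
print(result)
```

Only the front of the stream is ever inspected, which is what
distinguishes this organization from a data base searched as a set.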



   In all the production systems described above the data base is the
only storage medium for all variables of the system. There is no separate
control state information such as a program counter or stack as is used in
procedurally-oriented languages.  The data base is accessible to every
rule in the system and thus serves as a communication channel.  The
contents of the data base always reflect the current state of the
production system.

VARIATIONS OF PRODUCTION SYSTEMS

   Production systems have been used in many different programs and
programming environments.  Many variations of production systems are
possible due to differences in the ordering and accessing of rules.  The
productions are themselves a source of variation in production systems.
It is possible to match against the right hand side of the productions
instead of the left hand side to obtain a recognizer for symbolic strings.
It is also possible to view the left hand side as a goal to be achieved by
matching the right hand side of the production.  The data base may be a
source of variation in production systems as has been discussed above.

ENVIRONMENT

   Production systems are particularly appropriate in a domain
consisting of a large number of independent states requiring independent
actions.  The states and actions can be modeled easily using rules, which
are also modular in nature.  Procedure-oriented systems often find it
difficult to update and maintain large numbers of state variables.
Production systems are particularly appropriate in this instance. Each
production can be viewed as a "demon" ready to be invoked when a
particular system state occurs.  Production systems are also appropriate
where the ability to recognize and react to small variations in the domain
is important.

REFERENCES

   An excellent reference that discusses in detail many aspects of
production systems is a paper by R. Davis and J. King entitled "An
Overview of Production Systems", (A.I.  Memo 271, Stanford Computer
Science Department, November 1975).  Other references used in this article
are:

Minsky M., Computation: Finite and Infinite Machines, Prentice-Hall, 1972.

Newell A., Simon H., Human Problem Solving, Prentice-Hall, 1972.



ROTE LEARNING

BRIEF DESCRIPTION AND HISTORY

   Rote learning is a technique which effectively increases the depth
of tree searches by recognizing nodes (situations) that have been
encountered and evaluated previously.  This is done by consulting a file
which contains for each node previously encountered a description of that
node and the result of the evaluation at that encounter.

   This technique was first used by A. L. Samuel in his Checker playing
program [see Samuel 1959].  The program used rote learning to accumulate
experience over the games it played.

ENVIRONMENT

   Rote learning is particularly useful when searching game type trees
where the value of a position (node, state) is determined by use of an
evaluation function.

TECHNIQUE

    Assume that there exists a list of nodes which have been evaluated
previously.  Associated with each node description is an evaluation. This
list will be called the memory file.  At the very beginning of the
learning process, the file contains nothing in the list.  The steps below
show how the file is built up by the rote learning process.

The basic steps to rote learning are:

1) From the current node which is to be evaluated, form the tree which
   is to be searched.  The form and size of the tree may be governed by
   a set of heuristics (for example, expanding the tree fully to a depth
   of three ply).

2) Evaluate the deepest nodes as follows:

   a) Examine the memory file to see if any of the deepest nodes have
      been previously evaluated.  If so, retrieve from the file the
      evaluations of these nodes.  (Effectively then, these nodes have
      been evaluated by a further tree search.)

   b) Any of the deepest nodes not present in the memory file should
      now be evaluated by the evaluation function.

3) Now that all of the deepest nodes have been evaluated, back up the
   tree in the usual min-max fashion to obtain the evaluation for the
   current node, and to obtain the decision.

4) Save a description of the current node and its value in the memory
   file.

EXAMPLE

   Assume that we are playing a game, and that the tree search
heuristic is to completely expand the tree to a depth of two ply.  Further
suppose that we have arrived at a node A, which we wish to evaluate.
Assume that this is not our first game, so that the memory file is not
empty.  We follow the steps of the rote learning procedure as shown:

STEP 1: From node A, we expand the game tree to a depth of
     two ply, and label the deepest nodes B through J:

[Figure: game tree rooted at node A.  Node A has three successors at a
depth of one ply; the nine nodes at a depth of two ply are labelled B
through J.]

STEP 2:   Looking in the memory file we discover that nodes
     B, C, E, F, and H have been previously evaluated,
      so that we already have their values.   Thus we
      apply the evaluation function only to nodes D, G,
      I, and J to obtain their values.

STEP 3:   Now that the deepest nodes have values, we can go
      up the tree in the standard fashion, eventually
      assigning a value to node A, and deciding which
        branch to take.

STEP 4:   Now that we have a value for node A, we place a
      description of the node (for instance, the state
      vector description) and the value of the node in
      the memory file for possible future use.
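
   The four steps, applied to this example, can be sketched as follows.
This is an illustrative sketch and not Samuel's program.  The names of
the one-ply nodes (X, Y, Z), the grouping of B through J beneath them
and every numeric value are invented for the example.  The sketch also
assumes an even search depth, so that every stored value belongs to a
position with the same side to move.

```python
from typing import Callable, Dict, List

Node = str
MemoryFile = Dict[Node, float]


def minimax(node: Node, depth: int, maximizing: bool,
            successors: Callable[[Node], List[Node]],
            evaluate: Callable[[Node], float],
            memory: MemoryFile) -> float:
    """Back up values min-max fashion; deepest nodes come from memory if present."""
    children = successors(node)
    if depth == 0 or not children:
        if node in memory:             # step 2a: previously evaluated node
            return memory[node]
        return evaluate(node)          # step 2b: apply the evaluation function
    values = [minimax(c, depth - 1, not maximizing, successors, evaluate, memory)
              for c in children]
    return max(values) if maximizing else min(values)    # step 3


def rote_evaluate(node: Node, depth: int,
                  successors: Callable[[Node], List[Node]],
                  evaluate: Callable[[Node], float],
                  memory: MemoryFile) -> float:
    value = minimax(node, depth, True, successors, evaluate, memory)  # steps 1-3
    memory[node] = value                                              # step 4
    return value


# Hypothetical tree and values (assumptions, not taken from the figure).
TREE = {"A": ["X", "Y", "Z"], "X": ["B", "C", "D"],
        "Y": ["E", "F"], "Z": ["G", "H", "I", "J"]}
STATIC = {"D": 6.0, "G": 1.0, "I": 7.0, "J": 8.0}
memory: MemoryFile = {"B": 3.0, "C": 5.0, "E": 2.0, "F": 9.0, "H": 4.0}
evaluated: List[Node] = []


def evaluate(node: Node) -> float:
    evaluated.append(node)             # record which nodes needed fresh evaluation
    return STATIC[node]


value_of_a = rote_evaluate("A", 2, lambda n: TREE.get(n, []), evaluate, memory)
print(evaluated, value_of_a)
```

Only D, G, I and J reach the evaluation function; the other five values
come from the memory file, and node A is added to the file at the end.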

BENEFITS OF ROTE LEARNING

   As can be seen from the example, since the values of nodes B, C, E,
F, and H were retrieved from the memory file, there is in effect a tree
search emanating from each of these nodes, and this tree search took place
sometime in the past.  Thus the depth of the tree search for node A is
only 2 ply in some areas, and of greater ply in others.  Now if node A
itself is ever retrieved from the memory file for evaluation of another
node, the depth of the tree for that node is even greater.

LIMITATIONS

1) A potential problem with implementing rote learning is the storage and
retrieval of information from the memory file, especially as the file
grows in size.  In cases where rote learning is used on a problem of
significant size, searching the memory file becomes a task which can
take the majority of the effort.  Techniques from other areas of computer
science may be used to aid in efficiently maintaining and searching
this file.  In addition, it is sometimes necessary or desirable to cull
the file (delete entries).  In this case, heuristics must be devised to
determine which entries to keep and which should be purged.

2) It is difficult to use rote learning in conjunction with learning
schemes which modify the evaluation function (for example, signature
tables).  The reason for this is that once the evaluation function is
changed, in principle every previously evaluated node in the memory
file should be re-evaluated, so that the values of newly evaluated
nodes and previously evaluated nodes may be meaningfully compared.
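
   One way to address the first limitation is sketched below: the
memory file is held in a hash table so that lookup does not require a
scan, and entries are culled by age.  The class name `MemoryFile`, the
`max_age` parameter and the particular aging rule (an entry is
refreshed whenever it is used) are assumptions chosen for illustration;
other culling heuristics are equally possible.

```python
from typing import Dict, Hashable, Optional


class MemoryFile:
    """Hash-indexed store of node evaluations with age-based culling."""

    def __init__(self, max_age: int) -> None:
        self.max_age = max_age
        self._values: Dict[Hashable, float] = {}
        self._age: Dict[Hashable, int] = {}

    def store(self, node: Hashable, value: float) -> None:
        self._values[node] = value
        self._age[node] = 0

    def lookup(self, node: Hashable) -> Optional[float]:
        if node in self._values:
            self._age[node] = 0        # an entry that is used is refreshed
            return self._values[node]
        return None

    def cull(self) -> int:
        """Age every entry by one; purge entries older than max_age."""
        stale = []
        for node in self._age:
            self._age[node] += 1
            if self._age[node] > self.max_age:
                stale.append(node)
        for node in stale:
            del self._values[node]
            del self._age[node]
        return len(stale)              # number of entries purged


# Hypothetical usage: B is consulted between culls, C is not.
mf = MemoryFile(max_age=1)
mf.store("B", 3.0)
mf.store("C", 5.0)
first_purge = mf.cull()
mf.lookup("B")
second_purge = mf.cull()
print(first_purge, second_purge)
```

The second limitation is not addressed by this sketch; a change to the
evaluation function would still invalidate every stored value.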

COMMENTS

1) Samuel found in his Checker playing program that rote learning worked
best in the opening and end games.  He hypothesized that rote learning
functions reasonably well where the results of any specific action are
long delayed, or in situations where highly specialized techniques are
required (the Checker playing program learned to avoid obvious traps in
the end game).

2) By slight modification of the procedure described above, rote learning
can be used with other tree search techniques, such as the alpha-beta
search, plausibility ordering, or tree-pruning.  The fact to realize is
that it is not necessary to expand the tree before looking nodes up in
the memory file.

3) Samuel observed that rote learning can cause nodes which may both lead
to winning situations to receive equal weight in decision making,
although one of the nodes may lead to a win much more quickly than the
other.  Since it is usually desirable to play a shorter game, the depth
of the tree search which leads to each node's evaluation should be
considered in this case; the node which has a smaller number of plies to
the win should be chosen.  Thus it may be necessary to store in the
memory file the depth of the search for each node stored in the file.
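
   The third comment can be illustrated by storing a ply count beside
each value and using it to break ties.  The entry layout, the function
name `choose` and the sample values below are hypothetical, and the
rule shown (prefer fewer plies) is appropriate only when the tied
values represent wins.

```python
from typing import Dict, Hashable, List, Tuple

# Memory file entry: (value, number of plies searched to reach that value).
Entry = Tuple[float, int]


def choose(moves: List[Hashable], memory: Dict[Hashable, Entry]) -> Hashable:
    """Pick the highest-valued move; among equal values prefer fewer plies."""
    return max(moves, key=lambda m: (memory[m][0], -memory[m][1]))


# Hypothetical entries: P and Q both lead to a win, Q more quickly.
memory: Dict[Hashable, Entry] = {"P": (1.0, 7), "Q": (1.0, 3), "R": (0.2, 1)}
best = choose(["P", "Q", "R"], memory)
print(best)
```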

REFERENCES

    Samuel, A.L.;  "Some Studies in Machine Learning Using the Game of
Checkers," IBM Journal 3, 211-229 (1959).  Reprinted (with minor additions
and corrections) in COMPUTERS AND THOUGHT, edited by Feigenbaum and
Feldman, McGraw-Hill, 1963.



SPEECH UNDERSTANDING

Introduction:

    The aim of a "speech understanding" system is determination, for
spoken utterances, of the intended message in relation to the
accomplishment of some task and in spite of indeterminacies and errors in
generation, transmission, and reception of the utterance.   This is to be
distinguished from the aim of a "speech recognition" system, which is
provision of an orthographic transcription of the sounds and words
corresponding to the acoustic signal.  Thus the aim of a speech
understanding system does not necessarily include production of an
accurate phonetic transcription of the input signal, or an accurate list
of the successive words of the input (although it must surely correctly
recognize most of them). In other words, if a situation arises in which
acoustic processing is unable to resolve the decision between two phonemes
or words at a particular point in an utterance, but the overall system is
still able to decide the meaning of the sentence, then the sentence is
deemed to have been correctly understood.

    It seems apparent that a speech recognition system requires a number
of different types of processing, each of which corresponds to a different
source of information,  in order to achieve its aims.   It is now well
established that knowledge of vocabulary, syntactic, semantic, and
pragmatic constraints of a language is required to compensate for errors
and uncertainties in the acoustic realization of an utterance.

    In summary, a speech understanding system, as presently conceived,
will generally fit the following description.

1) The system is organized into a number of levels, starting with the
acoustic and working up to the syntactic and semantic.

2) Action is generally from the lower levels upward, utilizing programs
that incorporate knowledge of each particular level.

3) Task limitations are used at several levels to help make selections.

4) The higher levels are sometimes used in a feedback mode at lower
  levels to help make selections.
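
    The way in which task limitations and higher levels help make
selections can be suggested by a toy sketch.  It does not describe any
actual speech understanding system: the word lattice, the acoustic
scores, the set of task-legal sentences and the function `understand`
are all invented for illustration, and a real system would use a
grammar and semantics rather than an enumerated set of sentences.

```python
from itertools import product
from math import prod
from typing import Dict, List, Optional, Set, Tuple

Sentence = Tuple[str, ...]


def understand(lattice: List[Dict[str, float]],
               legal: Set[Sentence]) -> Optional[Sentence]:
    """Return the task-legal word sequence with the highest combined acoustic score."""
    best: Optional[Sentence] = None
    best_score = 0.0
    for words in product(*(slot.keys() for slot in lattice)):
        if words not in legal:         # higher-level (task) constraint
            continue
        score = prod(slot[w] for slot, w in zip(lattice, words))
        if score > best_score:
            best, best_score = words, score
    return best


# Hypothetical lattice: one dictionary of candidate words per position.
lattice = [{"show": 0.9}, {"five": 0.55, "file": 0.45}, {"status": 0.8}]
legal = {("show", "file", "status"), ("list", "file", "names")}

acoustic_only = tuple(max(slot, key=slot.get) for slot in lattice)
understood = understand(lattice, legal)
print(acoustic_only, understood)
```

Here the acoustic level alone prefers "five", but the task constraint
selects "file", so the utterance is understood even though the acoustic
decision at that position was not resolved correctly.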

History:

   Speech recognition research has yielded significant results in the
case of isolated words (accuracy greater than 95%). The primary emphasis
has been on acoustic processing and classical pattern recognition and
matching techniques. Straightforward extrapolation of these techniques to
continuous speech recognition, however, has not proved successful.   It is
felt that a major reason for the difficulties encountered is that the
information used by humans in understanding speech is not completely
contained in the acoustic representation of the speech signal. Experiments
by Klatt and Stevens (1972) in the area of spectrogram reading showed that
the performance obtained by human experts for phonetic segmentation and
labelling without conscious appeal to syntactic, semantic, and vocabulary
constraints was approximately 75% correctly labelled, 15% mislabelled,
and 10% missed. When these other sources of knowledge were used, the
success rate for word identification rose to 96%. These results have
greatly influenced recent research in speech understanding.

Possible Applications:

   Speech would be an appropriate input channel to a computer in many
situations.  The average output data rate is higher for speech than for
writing or typing. Use of the speech channel does not tie up other
effectors, such as hands, eyes, feet, or ears. It can therefore be used
while in motion or in parallel with other channels.  Speech is also a
preferred channel for spontaneous communication of the type that is found
in an interactive environment.

    Long range applications are readily listed. They might include, for
example, automatic dictation systems, voice-response order takers, or, in
the computer area, a voice operated graphics terminal.

    In the shorter term,  several tasks have been suggested as possible
vehicles for research in speech understanding (Newell et al 1973). They
are:

1) Querying a Data Management System

2) Data Acquisition of Formatted Information
(voice-key-punch)

3) Querying the Operational Status of a Computer

4) Consulting on the Operation of a Computer (i.e., a
voice-operated HELP)

Unsolved Problems:

    The following is a brief discussion of unsolved problems in speech
understanding following Newell (1973) and roughly ordered in terms of
system level (i.e. from acoustic at the lowest to semantic at the
highest).

    The essential problem of continuous speech at the acoustic level is
phoneme-level identification and not necessarily segmentation between
words. There is, however, a significant amount known about acoustic-
phonetic and phonological rules which has yet to be fully exploited in
production systems.  The difficulty of adapting to multiple speakers of
different sexes and with different dialects also remains a problem,
although it is hoped that proper normalization of acoustic-phonetic and
phonological rules will make them speaker-invariant.  Two other acoustic-
related problems are environmental noise and possible distortions caused
by the communications channel (e.g. the telephone channel).




   At higher levels there are problems with allowable vocabularies.
Present systems attempt to employ vocabularies in which the words are well
separated in a feature space. As vocabularies grow, however, or as the
choice of words becomes constrained (by a task domain, for example), then
the possible errors in matching can be expected to increase. At the
syntactic level, it is questionable how much more progress can be achieved
without the use of general grammars, as opposed to simple ad hoc grammars.
In this regard, the interface between grammars of this type and the
phonemic processing level is not yet well understood.

   Semantic support is another problem area since many of the
interesting applications of speech understanding do not lend themselves to
precise semantic formulation.  The spontaneity which is a major advantage
to speech input works against an understanding system here.

   From the hardware point of view, there remain the expected problems
of real-time response, processing power, memory size, systems
organization, and cost.

   In summary, significant progress in speech understanding awaits
developments in many areas.  It is hoped, however, that many such
developments will occur in the next few years.

References:

A.  Newell et al "Speech Understanding Systems", North-Holland, Amsterdam,
  1973.

D.H.  Klatt and K.N.  Stevens "Sentence Recognition from Visual
Examination of Spectrograms and Machine-Aided Lexical Searching",
Conference Record, 1972 Conference on Speech Communication and
Processing, Newton, Mass., April 1972.



PLANNER

Central Ideas:

   Planner is both a problem solving formalism and a programming
language. It stresses the importance of goal-orientation, procedural
representation of knowledge, pattern directed invocation of procedures and
a flexible backtrack-oriented control structure in a problem solver and in
a high level programming language.

Technical Description:

   Planner was developed as a formalism for problem solving by Hewitt
(1972,1973) and a subset of the Planner ideas was implemented by Sussman
et al (1973) in a programming language called Micro-Planner.

   Planner is primarily oriented towards the accomplishment of goals
which can, in turn, be broken down into multiple subgoals. A goal in this
context can be satisfied by finding a suitable assertion in an associative
data base, or by accomplishing a particular task.  Multiple goals may be
activated at the same time, as might occur, for example, in a problem
reduction type of problem solver. The attempt to satisfy a goal is
analogous to an attempt to prove a theorem.  Planner, however, is not
strictly a theorem-prover. The differences are mainly due to the types of
knowledge which it can manipulate.

   The traditional theorem-prover accepts knowledge expressed in
declarative form, as in the predicate calculus; that is, as statements of
"fact" about some problem domain. Planner, by contrast, is able to deal as
well with knowledge expressed in imperative form; that is, knowledge which
tells the problem solver how to go about satisfying a subgoal, or how to
use a particular assertion.  In fact the emphasis in Planner is on the
representation of knowledge as procedures. This is based on the view that
knowledge about a problem domain is intrinsically bound up with procedures
for its use.

   The ability to use both types of knowledge leads to what has been
called a hierarchical control structure; that is, any procedure (or
theorem in Planner notation) can indicate what the theorem-prover is
supposed to do as it continues the proof.

   Procedures are indexed in an associative data base by the patterns
of what they accomplish.  Thus, they can be invoked implicitly by
searching for a pattern of accomplishment which matches the current goal.
This is known as pattern directed invocation of procedures, and is another
cornerstone of the Planner philosophy.
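
   Pattern directed invocation can be sketched as follows.  This is an
illustration of the idea only, not Planner's implementation or syntax:
the "?x" variable notation, the `match` and `invoke` functions and the
two sample procedures are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pattern = Tuple[str, ...]
Bindings = Dict[str, str]


def match(pattern: Pattern, goal: Pattern) -> Optional[Bindings]:
    """Match a pattern against a goal; items beginning with '?' are variables."""
    if len(pattern) != len(goal):
        return None
    bindings: Bindings = {}
    for p, g in zip(pattern, goal):
        if p.startswith("?"):
            if bindings.setdefault(p, g) != g:
                return None            # same variable bound to two different items
        elif p != g:
            return None
    return bindings


# Procedures indexed by the pattern of what they accomplish.
PROCEDURES: List[Tuple[Pattern, Callable[[Bindings], str]]] = [
    (("ON", "?x", "TABLE"), lambda b: f"put {b['?x']} on the table"),
    (("CLEAR", "?x"), lambda b: f"remove whatever is on {b['?x']}"),
]


def invoke(goal: Pattern) -> List[str]:
    """Run every procedure whose pattern matches the goal; the caller names none."""
    results = []
    for pattern, procedure in PROCEDURES:
        bindings = match(pattern, goal)
        if bindings is not None:
            results.append(procedure(bindings))
    return results


print(invoke(("CLEAR", "BLOCK-A")))
```

The caller states only the goal; which procedure runs is determined
entirely by the match against the stored patterns.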

   The final foundation of Planner is the notion of a backtrack control
structure.  This allows exploration of tentative hypotheses without loss
of the capability to reject the hypotheses and all of their consequences.
This is accomplished by remembering decision points (that is, points in
the program at which a choice is made) and falling back to them, in order
to make alternate choices, if subsequent computation proves unsuccessful.



Example:

    The following, somewhat hackneyed, but still illustrative example is
described in pseudo Micro-Planner.  We will assume that the data base
contains the following assertions.

          (HUMAN TURING)
          (HUMAN SOCRATES)
          (GREEK SOCRATES)
together with the theorem

(THCONSE (X) (FALLIBLE $?X)
        (THGOAL (HUMAN $?X)))

where the theorem is a consequent theorem which can be read as - if we
want to accomplish a goal of the form (FALLIBLE $?X), then we can do it by
accomplishing the goal (HUMAN $?X).

    We now ask the question "is there a fallible Greek?".   This can be
expressed as

(THPROG (X)

(THGOAL (FALLIBLE $?X) $?T)
(THGOAL (GREEK $?X))
(THRETURN $?X))

This program uses a linear approach to answering the question; that is, it
first attempts to find something fallible, then checks that what it has
found is Greek. If so, it returns what it has found.

    Consider what happens when this program is applied to the data base
above. It first finds nothing that is fallible in the list of assertions,
and hence tries the theorem, and searches again for something human. It
finds (HUMAN TURING) and binds TURING to $?X. However, an attempt to
prove (GREEK TURING) fails.  At this point, the backtrack control
structure comes into play. The program returns to the last point at which
a choice was made; that is, to the point at which TURING was bound to $?X.
This binding is undone and the data base is searched again for something
human.  This time (HUMAN SOCRATES) is found and SOCRATES is bound to $?X.
An attempt to prove (GREEK SOCRATES) succeeds and SOCRATES is returned as
the value of the THPROG.
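
   The behavior traced above can be imitated in Python, using a
generator to stand in for the backtrack control structure.  This is a
loose analogy and not Micro-Planner: the functions `goal`, `holds` and
`fallible_greek`, the table of theorems and the trace list are
assumptions made for the sketch, and only one-argument assertions are
handled.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Fact = Tuple[str, str]

ASSERTIONS: List[Fact] = [("HUMAN", "TURING"),
                          ("HUMAN", "SOCRATES"),
                          ("GREEK", "SOCRATES")]
# Consequent theorems: to establish (FALLIBLE x), establish (HUMAN x).
THEOREMS: Dict[str, str] = {"FALLIBLE": "HUMAN"}


def goal(predicate: str) -> Iterator[str]:
    """Yield, one at a time, each x for which (predicate x) can be established."""
    for pred, x in ASSERTIONS:          # first search the assertions
        if pred == predicate:
            yield x
    if predicate in THEOREMS:           # then try the consequent theorem
        yield from goal(THEOREMS[predicate])


def holds(predicate: str, x: str) -> bool:
    return any(candidate == x for candidate in goal(predicate))


def fallible_greek(trace: List[str]) -> Optional[str]:
    for x in goal("FALLIBLE"):          # resuming the generator is a backtrack
        trace.append(f"try {x}")
        if holds("GREEK", x):
            return x
        trace.append(f"reject {x}")     # binding undone; next candidate sought
    return None


trace: List[str] = []
answer = fallible_greek(trace)
print(answer, trace)
```

Each time the Greek test fails, control returns to the suspended
generator, which corresponds to the decision point at which a value
was bound to the variable.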

   This example illustrates, albeit superficially, the basic tenets of
the Planner formalism as they apply in a programming language. The reader
is encouraged to consult the references for the complete details.

References:

C.  Hewitt, "Description and Theoretical Analysis (using schemas) of
PLANNER: A Language for Proving Theorems and Manipulating Models in a
  Robot", PhD Thesis, MIT, Feb. 1971.

C.  Hewitt, "Procedural Embedding of Knowledge in PLANNER", 2nd IJCAI,
  1971.



G. J.  Sussman, T.  Winograd, and E.  Charniak,  "MICRO-PLANNER Reference
Manual", MIT AI Memo 203A, December, 1971.


APPENDIX B

AI HANDBOOK OUTLINE

NOTE:

The following material describes work in progress and planned for
publication.  It is not to be cited or quoted out of the context of this
report without the express permission of Professor E. A. Feigenbaum of
Stanford University.

I. INTRODUCTION

A. Intended Audience

    This handbook is intended for two kinds of audience: computer
science students interested in learning more about artificial
intelligence, and engineers in search of techniques and ideas that might
prove useful in applications programs.

B. Suggested Style For Articles

    The following is a brief checklist that may provide some guidance in
writing articles for the handbook. It is, of course, only a suggested
list.

i) Start with 1-2 paragraphs on the central idea or concept of
   the article. Answer the question "What is the key idea?"

ii) Give a brief history of the invention of the idea, and its use
   in A.I.

iii) Give a more detailed technical description of the idea, its
   implementations in the past, and the results of any
   experiments with it. Try to answer the question "How to do
   it?"

iv) Make tentative conclusions about the utility and limitations
   of the idea if appropriate.

v) Give a list of suitable references.

vi) Give a small set of pointers to related concepts
   (general/overview articles, specific applications, etc.)

vii) When referring in the text of an article to a term which is
   the subject of another handbook article, surround the term by
   +'s; e.g. +Production Systems+.

C. Coding Used In This Outline


   This outline contains a list of the major areas of artificial
intelligence covered in the handbook. At the lowest level, the outline
shows article titles either contained or needed. In the case of an article
that is needed, the notation NEED[#] follows the proposed focus of the
article, where # is a number in the interval [0,10]. Low numbers indicate
little expected difficulty with the article, whereas high numbers indicate
a potentially difficult article. For example, an article on a specific
system, where only a minimal amount of reading is required, would rate
approximately 4, whereas an overview article would likely rate 8 or
greater. In the case of articles which already exist in the handbook, the
notation done[#] is used, where low numbers indicate that the article
needs only minor modifications, and high numbers indicate that major
modifications are required. For example, repair of typographical errors
and wording could be expected to rate 0-2.  Correction of errors in the
article might rate 3-6, and major rewrites which require considerable
reading would likely rate 7-10.

   It should be noted that the real difficulty involved in writing an
article is highly dependent on the a priori knowledge of its author.
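
   For anyone who wants to tabulate the outline mechanically, the coding
scheme can be read with a short script. The following Python sketch is an
illustration only: the function names parse_entry and revision_effort are
invented here, and the three effort bands are simply the 0-2, 3-6 and 7-10
ranges quoted in the text for done[#] ratings.

```python
import re

# Matches a trailing status tag such as "NEED [8]", "done[3]" or "NEED[?]".
STATUS = re.compile(r"\s*(NEED|done)\s*\[\s*(\d+|\?)\s*\]\s*$")


def parse_entry(line):
    """Split an outline line into (title, status, rating).

    status is "NEED", "done", or None when the line has no tag.
    rating is an int in 0..10, or None when absent or written as "?".
    """
    match = STATUS.search(line)
    if match is None:
        return line.strip(), None, None
    title = line[:match.start()].strip()
    status, raw = match.groups()
    rating = None if raw == "?" else int(raw)
    if rating is not None and not 0 <= rating <= 10:
        raise ValueError(f"rating outside [0,10]: {rating}")
    return title, status, rating


def revision_effort(rating):
    """Map a done[#] rating onto the bands given in the text."""
    if rating <= 2:
        return "typographical and wording repairs"
    if rating <= 6:
        return "correction of errors"
    return "major rewrite"


entry = parse_entry("3. Problem-reduction representation         done [3]")
```

With this input, `entry` holds the title, the status "done" and the rating
3, which `revision_effort` places in the "correction of errors" band.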

D. A General View of Artificial Intelligence

Philosophy                         NEED [9]
  This article might address the kinds of questions
  raised by Turing's article (C&T), Dreyfus's
  book, the rebuttals, Lighthill's critique,
  McCarthy's reply, and so on.

Relationship to Society                NEED [8]
  This might touch on science fiction, popular
  misconceptions, the Delphi survey, and so on.

History                          NEED [9]
  Perhaps start with Cybernetics, the Dartmouth
  conference, and so on. See HPS appendix. Also
  note the major centers, their focus and
  personalities. Note the role of ARPA funding on the
  research, the ties to DEC machines and so on.

Conferences and Publications              NEED [6]
  AI journal, SIGART, SIGCAS, MI books, IJCAI
  proceedings, CACM, JACM, Cognitive Psychology, some
  IEEE (Computers, ASSP, SMC), Computational
  Linguistics. Special interest conferences: robotics,
  cybernetics, natural language.
  Note the tech note unofficial type documents.

II. HEURISTIC SEARCH

A. Heuristic Search Overview              NEED [9]

Algorithmic presentation of "heuristic search" procedure.

Heuristics for choosing promising nodes to expand next,
heuristics for choosing operators to use to expand a node.

Meta-rules : using heuristics to choose relevant heuristics.

Pervasive character of the combinatorial explosion.

Arguments (both formal and intuitive) supporting the use of
  heuristic search to muffle this explosion.
  Formal : Completeness of A*; Knuth's recent work on
     alpha-beta search.

Opportunities for future research
  Where do heuristics come from?
    (see Simon's current work; meta-rules; meta-meta-...?)
  Modifying heuristics based on experiences
    (see Berliner's current work)
  Working with symbolic, rather than numerical, values for nodes
  Coding heuristics as production rules
    (e.g.: view Mycin as a heuristic search)

Situations NOT suited to attack by heuristic search
Typically:  non-exponential growth process; no search anyway
    (e.g., finding roots of a quadratic equation)

Identity problems

Disguising Heuristic Search as something else
Disguising something else to appear to be a Heuristic Search

B. Search Spaces

1. Overview                        NEED [8]
  The concept of a search space; how a search space
  can be used to solve (some) problems; different
  representations, different spaces

2. State-space representation             done [6]
  [2 articles exist here, which ought to be unified]

3. Problem-reduction representation         done [3]

4. AND-OR trees and graphs               done [4]

C.  "Blind" Search Strategies

1. Overview                        NEED [5]

2. Breadth-first searching              done [2]

3. Depth-first searching                done [2]

4. Bi-directional searching             NEED [6]
  Discuss heuristics. MI articles by Ira Pohl.

5. Minimaxing                       done [3]

6. Alpha-Beta searching                done [3]

D. Using Heuristics to Improve the Search

1. Overview                        done [7]
  The idea of a heuristic
  The idea of a heuristic evaluation function
  savings in change of representation

2. Best-first searching                 done [4]
  (Ordered-search) but need to add: Martelli's
  work (ask Nils for a draft of this)
  speech rec: IJCAI-3 (Paxton), Reddy's book

3. Hill climbing                     done [3]

4. Means-ends analysis                 done [3]

5. Hierarchical search, planning in abstract spaces NEED [4]
  Abstrips (Sacerdoti)

6. Branch and bound searching             done [4]

7. Band-width searching                 NEED [4]
    Harris - AI journal

E. Programs employing (based on) heuristic search

1. Overview                        NEED [7]
  Comparison of systems.  Results & limitations.
  (This first article should be written later as an
  introduction to the following articles.)

2. Historically important problem solvers

a) GPS                           NEED [4]

b) Strips                         NEED [4]

c) Gelernter's Geom. Program            NEED [3]

III.  Natural Language

A. Overview

1.  Early machine translation             done [5]
  Failures of straightforward approaches

2.  History and Development of N.L.         NEED [8]
  Main ideas (parsing, representation),
  comparison of different techniques.  Mention ELIZA, PARRY.
  Include Baseball, Sad Sam, SIR and Student articles here.
  See Winograd's Five Lectures, Simmons's CACM articles.

B. Representation of Meaning
  (see section VII -- HIP)

C. Syntax and Parsing Techniques

  1. overviews

     a. formal grammars                  done [3]
     b. parsing techniques               NEED [6]

  2. augmented transition nets, Woods        done [3]

  3. Shrdlu's parser (systemic grammars)      done [5]

  4. Case Grammars Bruce (AI Journal, 1/76)    NEED [5]

  5. CHARTS - well formed substrings        NEED [6]

  6. GSP syntax & parser                NEED [6]

  7. H. Simon - problem understanding        NEED [7]

  8. transformational grammars            done [5]

D. Famous Natural Language systems

  1. SHRDLU, Winograd                  NEED [5]

  2. SCHOLAR                        NEED [5]

  3. SOPHIE                         NEED [5]

E. Current translation techniques            NEED [8]
  Wilks' work, commercial systems (Vauquois)

F. Text Generating systems                 NEED [8]
  Goldman, Sheldon Klein, Simmons and Sloan (in S&C)

IV.  AI Languages


  A. Early list-processing languages

      overview article                   done [3]
      languages like COMIT, IPL, SLIP, SNOBOL, FLPL

      Ideas: recursion, list structure,
           associative retrieval

B. Language/system features

0. Overview of current list-processing         NEED [7]
languages

1. Control structures, what languages they     NEED [6]
are in and examples of their use.

Backtracking (parallel processing)
Demons (pseudo-interrupts)
Pattern directed computation

2. Data Structures (lists, associations,       NEED [5]
bags, tuples, property lists,...)

Once again, examples of their use
are important here.

3. Pattern Matching in AI languages          NEED [6]
     see Bobrow & Raphael

4. Deductive mechanisms                  NEED [5]
  see Bobrow & Raphael

C. Current languages/systems

1. LISP, the basic idea                  done [2]

2. INTERLISP                        NEED [5]

3. QLISP (mention QA4)                  done [3]

4. SAIL/LEAP                        done [2]

5. PLANNER                          done [2]

6. CONNIVER                         done [2]

7. SLIP                            NEED [4]

8. POP-2                           NEED [4]

9. SNOBOL                          NEED [4]

10. QA3/PROLOGUE
   (see thm. prov.)

V.  AUTOMATIC PROGRAMMING


A. Overview                           done [7]
B. Program Specification Techniques
  i.e. how does the user describe the program to
  be synthesized?

--an overview article including various methods NEED[9]
  see SAFE system (ISI), Green's tech. report,
    DSSL, Smith's graphic specification, and
    include general remarks on the high-level
     language methods

C. Program Synthesis techniques              NEED[9]
  - given a description of the program in some form,
  generate the actual program

1. Traces                         done[3]

2. Examples                        done[3]
  (include Biggerstaff at U. of Washington)

3. Problem solving applications to AP        NEED[9]
--including classical problem-solving techniques,
   plan modification, "pushing assertions across
   goals," and theorem proving techniques.
  (debugging (Sussman's Hacker), Simon's Heuristic
  Compiler, and Prow (Waldinger) & QA3)
  (Should Theorem-Proving-Techniques remain a
   separate article?)

4. Codification of Programming Knowledge      NEED[?]
  see C. Green's work, Darlington, Rich & Shrobe

5. Integrated AP Systems                NEED[?]
  see Lenat's original work, Heidorn, Martin's
  OWL, PSI at SAIL

D. Program optimization techniques            NEED [7]
  How to turn a rough draft into an efficient
   program.  See Darlington & Burstall, Low, Wegbreit, Kant.

E. Programmer's aids                     NEED [7]
  (Interlisp's DWIM, etc)

F. Program verification                   NEED [7]
  (IJCAI 3)

VI.   THEOREM PROVING

A. Overview                           NEED [9]

B. Resolution Theorem Proving

1. Basic resolution method               done [4]

2. Syntactic ordering strategies           done [2]

3. Semantic & syntactic refinement

[4. other strategies?]

C. Non-resolution theorem proving

1. Natural deduction

2. Boyer-Moore

3. LCF

D. Uses of theorem proving

1. Use in question answering

2. Use in problem solving

3. Theorem Proving languages
    (QA3, Prologue)
4. Man-machine theorem proving
  (Bledsoe)

E. Predicate Calculus

done [2]

done [3]

done [3]

done [6]

NEED [5]

NEED [6]

done [5]

F. Proof checkers

VII.  Human Information Processing - Psychology

(see Perry's outline for details and references)

A. Perception                          NEED [9]

An overview of relevant work in psychology
on attention, visual and auditory perception,
pattern recognition.  Applied perception (PERCEIVER).
Difficulties resulting from inability to introspect.

B. Memory and Learning

1. Basic structures and processes in IPP      NEED [9]

Short- and Long-term memory, Rehearsal, Chunking,
Recognition, Retrieval, recall, Inference and
question-answering, Semantic vs. episodic memory,
Interference and forgetting, Type vs. token nodes
Simon - Sciences of the Artificial

2. Overview of memory models, Representation   NEED [10]
  How to get to the airport: A comparison of
  the various models.

a. Associative memory models

1. semantic nets                NEED [9]
  Quillian (TLC), Nash-Webber (BBN)
  Shapiro, Hendrix (SRI), Woods's article
  in Bobrow & Collins, Simmons (S&C)

2. HAM (Anderson & Bower)          NEED [7]

3. LNR: Active Semantic Networks     NEED [6]

4. Componential analysis           NEED [9]
  Jackendoff, Schank (conceptual
  dependency), (MARGIE), G. Miller

5. EPAM                      NEED [5]

6. Query languages               NEED [7]
  Woods (1968), Ted Codd (IBM SJ)

     b. Other representations

        1. Production systems              done [1]

        2. Frame systems (Minsky, Winograd)    done [7]

        3. Augmented Transition Networks      done [3]

        4. Scripts (Schank, Abelson)         NEED [7]

C.  Psycholinguistics                     NEED [9]

  A prose glossary including:
  Competence vs. performance models, Phonology vs.
  syntax vs. semantics vs. pragmatics, Surface vs.
  deep structure, Taxonomic grammars, generative grammars,
  transformational grammars, Phrase-structure rules,
  transformation rules, Constituents, lexical entries,
  Parsing vs. generation, Context-free vs. Context-sensitive
  grammars, Case systems (e.g., Bruce AI article)

D. Human Problem Solving -- Overview          NEED [8]

  1. PBG's                          done [1]

  2. Concept formation (Winston)          done [2]

  3. Human chess problem solving          NEED [6]


E. Behavioral Modeling

1. Belief Systems                     NEED [8]
  Abelson, McDermott

2. Conversational Postulates (Grice, TW)      NEED [5]

3. Parry                           NEED [5]

VIII.   VISION

A. Overview                          NEED [9]

This article should discuss the early work in vision:
its roots in pattern recognition, character recognition,
Pandemonium, Perceptrons and so on (i.e., the pre-Roberts
work). It should discuss the main ideas of modern vision
work as a lead-in to the more specific articles, for
example the use of hypothesis, model, or expectation
driven strategies. It should also discuss the way in
which the focus of the field flip-flops from front end
considerations to higher level considerations with
time.

B. Polyhedral or Blocks World Vision

An overview article should include the major
ideas in this work together with brief
summaries of the work of the major investigators.
In addition, separate articles should be written
on the work of those listed below.

Overview                       NEED [7]
(Roberts, Huffman and Clowes, Kelley,
  Shirai and others listed below)

Guzman                        done [2]

Falk                          NEED [5]

Waltz                          NEED [7]
  This article should contain more general
  material on constraint satisfaction, drawn
  possibly from Montanari and Fikes

This exhausts my list. Please add others or delete some
of mine if appropriate.
It has been suggested [Bolles] that the most instructive
method of writing these articles would be to provide
simple examples of the problems attacked by the various
programs.


C. Scene Analysis

Overview                          NEED [9]
  This article should describe or point to detailed
  strategies used, and the present state of the
  art.

The following articles should be written or modified to
describe the specialized tools of scene analysis.
See Duda and Hart.

Template Matching                    NEED [5]
  (a non-mathematical description)

Edge Detection                       done [4]

Homogeneous Coordinates                done [7]
  This article should be modified to include
  the general questions of the perspective
  transformation, camera calibration, and so on.

Line Description                     done [4]

Noise Removal                       done [4]

Shape Description                    done [4]

Region Growing (Yakimovsky, Ohlander)       done [3]

Contour Following                    NEED [4]

Spatial Filtering                    NEED [4]

Front End Particulars                 NEED [6]
  This article should contain some description of the
  methods and effects of compression and quantization,
  for example.

Syntactic Methods                    NEED [5]

Descriptive Methods                   NEED [6]
  See Duda and Hart, and Winston

D. Robot and Industrial Vision Systems

Overview and State of the Art             NEED [9]

Hardware                          NEED [8]

E. Pattern Recognition

It's not clear just where this discussion should go, or
what level of detail is required.

Overview                          done [8]
  This article needs to be refocussed and cleaned up

Statistical Methods and Applications        NEED [9]

Descriptive Methods and Applications        NEED [5]

F. Miscellaneous

Multisensory Images                   NEED [7]

Perceptrons                        NEED [6]

IX.  SPEECH UNDERSTANDING SYSTEMS

Overview  (include a mention of ac. proc.)     done [3]

Integration of Multiple Sources
of Knowledge                       NEED [9]
  For example the blackboard of the HEARSAY II system

HEARSAY I                         done [4]

HEARSAY II                        done [5]

SPEECHLIS                         done [2]

SDC-SRI System (VDMS)                 NEED [7]

DRAGON                            done [6]
  Jim Baker's original system plus Speedy-Dragon by
   Bruce Lowerre. This article is a little harder than
  the other system articles because the methods used
  may be unfamiliar to some.

X.  ROBOTICS

Overview                          NEED [9]
  This article should discuss the central issues and
  difficulties of the field, its history, and the
  present state of the art.

Robot Planning and Problem Solving         NEED [8]
  For example, STRIPS and ABSTRIPS. This article
  could be quite general depending on the point of
  view taken.

Arms                              NEED [8]
  Explain the difficulties of control at the bottom
  level, system integration, obstacle avoidance
  and so on. Also note the problems with integration
  of multi-sensory data, for example vision and
  touch feedback.

Present Day Industrial Robots            NEED [7]

Robotics Programming Languages            NEED [6]
  For example WAVE, and AL
  (a short article)

XI.  Applications of AI

An overview article.  What are the attributes   NEED [8]
of a suitable domain?  Custom crafting -
theory vs. actual use. (See EAF: 225 notes, 1972)

A. Chemistry

1. Mass spectrometry (DENDRAL, CONGEN, meta-dendral)   done [6]

2. Organic Synthesis
  Overview                        NEED [8]
  Summarize work of Wipke, Corey, Gelernter, and Sridharan

B. Medicine

1. MYCIN

doneEl

2.  Summarize DIALOG (Pople), CASNET (Kulikowski), NEED [7]
  Pauker's MIT work, and the Genetics counselling
  programs

C. Psychology and Psychiatry
  Protocol Analysis (Waterman and Newell)      NEED [6]

D. Math systems

1.   REDUCE                          NEED [4]

2.  MACSYMA (mention SAINT)              NEED [6]

E. Business and Management Science Applications

1.  Assembly line balancing (Tonge)         NEED [5]

2.  Electric power distribution systems      NEED [5]
  (MI)

F. Miscellaneous

1. LUNAR                          NEED [5]

2. Education                       NEED [7]
  Papert, or more?


  3. SRI computer-based consultation        NEED [6]

  4. RAND--RITA production rule system for    NEED [5]
    intelligent interface software

I. Miscellaneous

Overview of music composition and aesthetics   done [7]

XII.  Where do these go?

Reasoning by analogy                    done [4]

Intelligence augmentation                done [5]

Chess                             done [5]

XIII.  Learning and Inductive Inference

Overview                          NEED [9]

Samuel Checker program                 NEED [5]

Winston                           done [2]

Pattern extrapolation problems--Simon,       NEED [5]
Overview of Induction


APPENDIX C

HEURISTIC PROGRAMMING PROJECT WORKSHOP

   In the first week of January 1976, about fifty representatives of
local SUMEX-AIM projects convened at Stanford for four days to explore
common interests.  Six projects at various degrees of development were
discussed during the conference.  They included the DENDRAL and META-
DENDRAL projects, the MYCIN project, the Automated-Mathematician project,
the Xray-Crystallography project, and the MOLGEN project.  Because of the
interdisciplinary nature of each of these projects, the first day of the
conference was reserved for tutorials and broad overviews.  The domain-
specific background information for each of the projects was presented and
discussed so that more technical discussions could be given on the
following days.  In addition, the scope and organization of each of the
projects were presented, focusing on the tasks that were being automated,
how people perform these tasks, and why the automation was useful or
interesting.

   In the following days of the workshop, common themes in the
management and design of large systems were explored.   These included the
modular representations of knowledge, gathering of large quantities of
expert knowledge, and program interaction with experts in dealing with the
knowledge base.  Several of the projects were faced with the difficulties
of representing diverse kinds of information and with utilizing
information from diverse sources in proceeding towards a computational
goal.  Parallel developments within several of the projects were explored,
for example, in the representation of molecular structures and in the
development of experimental plans in the MOLGEN and DENDRAL projects.  The
use of heuristic search in large, complex spaces was a basic theme of most
of the projects.  The use of modularized knowledge, typically in the form
of rules, was explored for several of the projects with a view towards
automatic acquisition, theory formation, and program explanation systems.

   For each of the projects,  one session was devoted to plans for
future development.  One of the interesting questions for these sessions
was the effect of emerging technology on feasibility of new aspects of the
projects.  The potential uses of distributed computing and parallel
processing in the various projects were explored, particularly in the
context of the DENDRAL project.

   Most of the participants felt that the conference gave them a better
understanding of related projects.  And because many members of the SUMEX-
AIM staff actively participated, the workshop also provided all projects
with information about system developments and plans.  The discussions and
sharing of ideas encouraged by this conference have continued through a
series of weekly lunches open to the whole community.


195

APPENDIX D

TYMNET RESPONSE TIME DATA

  Following are statistics on one-way character transit time delays
over the TYMNET derived from the collected TYMSTAT data between June 1975
and April 1976.  The first line in each section contains the node ID.
Then for each month when data were available for that node, the succeeding
tables in the section give the number of data points collected and delay
statistics in milliseconds for various parts of the day (Pacific Time).
These data have been the basis of numerous conversations with TYMSHARE
over the past year attempting to correct intolerable delay times.  That
fight goes on!
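
  The five rows reported for each period can be derived from raw delay
samples along the following lines. This Python sketch is an illustration
under stated assumptions, not the program used to produce the tables: the
raw TYMSTAT samples are not reproduced in this report, so the `samples`
list and the hours in it are hypothetical, and the names PERIODS,
period_for and summarize are invented. The population form of the
standard deviation is also an assumption; it agrees with the two-point
entries in the tables (a minimum of 131 and a maximum of 158 give an
average of 144.5 and a deviation of 13.5).

```python
from statistics import mean, pstdev

# The four reporting periods (Pacific Time): (label, start hour, end hour).
PERIODS = [
    ("05:00-09:00", 5, 9),
    ("09:00-17:00", 9, 17),
    ("17:00-22:00", 17, 22),
    ("22:00-05:00", 22, 5),  # wraps past midnight
]


def period_for(hour):
    """Return the label of the reporting period containing hour (0-23)."""
    for label, start, end in PERIODS:
        if start < end:
            if start <= hour < end:
                return label
        elif hour >= start or hour < end:
            return label
    raise ValueError(f"hour out of range: {hour}")


def summarize(delays_ms):
    """Compute the five rows shown for one period of one month."""
    return {
        "Number": len(delays_ms),
        "Average Delay": round(mean(delays_ms), 1),
        "Std Deviation": round(pstdev(delays_ms), 1),
        "Minimum Delay": min(delays_ms),
        "Maximum Delay": max(delays_ms),
    }


# Hypothetical samples: (hour of day, one-way delay in milliseconds).
samples = [(6, 131), (8, 158), (23, 310)]

by_period = {}
for hour, delay in samples:
    by_period.setdefault(period_for(hour), []).append(delay)

report = {label: summarize(delays) for label, delays in by_period.items()}
```

Periods with no samples simply do not appear in `report`, which
corresponds to the blank columns in the tables.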

An index to particular nodes follows:

PAGE     NODE
p. 195   1010  OAKLAND         CALIFORNIA     415/465-7000
p. 196   1011  WASHINGTON      D.C.           703/841-9560
p. 196   1012  CHICAGO         ILLINOIS       312/346-4961
p. 197   1014  MIDLAND         TEXAS          915/683-5645
p. 197   1017  PALO ALTO       CALIFORNIA     415/494-3900
p. 197   1022  WASHINGTON      D.C.           703/521-6520
p. 198   1023  SEATTLE         WASHINGTON     206/622-7930
p. 198   1027  LOS ANGELES     CALIFORNIA     213/683-0451
p. 199   1034  NEW YORK        NEW YORK       212/532-7615
p. 200   1036  NEW YORK        NEW YORK       212/344-7445
p. 201   1037  LOS ANGELES     CALIFORNIA     213/629-1561
p. 201   1043  ST LOUIS        MISSOURI       314/421-5110
p. 202   1051  PORTLAND        OREGON         503/224-0750
p. 203   1054  SAN JOSE        CALIFORNIA     408/446-4850
p. 203   1060  MOUNTAIN VIEW   CALIFORNIA     415/965-8815
p. 204   1063  PITTSBURGH      PENNSYLVANIA   412/765-3511
p. 205   1072  PALO ALTO       CALIFORNIA     415/326-7015
p. 205   1073  UNION           NEW JERSEY     201/964-3801
p. 206   1112  NEW YORK        NEW YORK       212/750-9433
p. 207   1116  CHICAGO         ILLINOIS       312/368-4607
p. 207   1173  VALLEY FORGE    PENNSYLVANIA   215/666-9190

1010 OAKLAND           CALIFORNIA    OAK1  E o * 415/465-7000

July 1975

        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1
Average Delay    282.0
Std Deviation        .0
Minimum Delay      282
Maximum Delay      282

August 1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00


196

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

1011 WASHINGTON       D.C.      WASSR1 C ** 703/841-9560

365.:

.0
365
365

July 1975    05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

   1
204.0
  .0
204
204

September 1975

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
                5
           177.6
           38.9
             123
              227

October 1975

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00

Number             1
Average Delay    153.0
Std Deviation       .0
Minimum Delay      153
Maximum Delay      153

November 1975

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
     2
  144.5
   13.5
   131
   158

1012 CHICAGO           ILLINOIS    CH12  E fJ 312/346-4961

December 1975

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
346.:         3
           393.0
3406       160.8
             214
   346        604

March 1976

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00


197

Number              2
Average Delay     251.5
Std Deviation     66.5
Minimum Delay      185
Maximum Delay     318

1014 MIDLAND

June 1975

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

1017 PALO ALTO

July 1975

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

TEXAS       MDL1  C   915/683-5645

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
     2                                 1
  525.0                        310.0
  158.0                               .0
   367                           310
   683                          310

CALIFORNIA    PA1   c z 415/494-3900

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
                           1
                    414.0
                          .0
                      414
                      414

1022 WASHINGTON        D.C.        WAS2  E ** 703/521-6520

July 1975    05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                      3
Average Delay              188.0
Std Deviation               10.6
Minimum Delay               173
Maximum Delay              196

September 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                      3
Average Delay              197.3
Std Deviation              23.5
Minimum Delay               165
Maximum Delay                 220

October 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                          2
Average Delay              261.0
Std Deviation               3.0
Minimum Delay              258
Maximum Delay                264


198

November 1975

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

December 1975

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

1023 SEATTLE           WASHINGTON    SEA1 C   206/622-7930

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
    3
  242.7   3001:
  64.5       161.8
   153        129
   302       774

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
                 2
           208.0
          49.0
             159
             257

September 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1
Average Delay    385.0   391.:
Std Deviation                  .0
Minimum Delay     385        391
Maximum Delay     385        391

March 1976
          05:00-09:00  09:00-17:00  17:00-22:00  22:00-05:00
Number                         1
Average Delay             805.0
Std Deviation                  .0
Minimum Delay               805
Maximum Delay              805

1027 LOS ANGELES        CALIFORNIA    LA2   E ** 213/683-0451

December 1975
           05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                        2
Average Delay              162.0
Std Deviation                6.0
Minimum Delay               156
Maximum Delay               168

January 1976
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number             3
Average Delay     172.3
Std Deviation     9.4
Minimum Delay      161
Maximum Delay      184


199

1034 NEW YORK          NEW YORK     NYCSR1 E ** 212/532-7615
    NEW YORK          NEW YORK     NYCSR1 E ** 212/551-9322

June 1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                      8
Average Delay             561.9
Std Deviation              98.9
Minimum Delay              407
Maximum Delay                709

July 1975    05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number             3
Average Delay    511.3   518.;
Std Deviation    53.8      105.3
Minimum Delay     458        407
Maximum Delay      585        732

September 1975
          05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number               2
Average Delay    418.0   365:;
Std Deviation     95.0      187.7
Minimum Delay      323       187
Maximum Delay      513        828

October 1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number
Average Delay  712::   3941;
Std Deviation    523.5      147.2
Minimum Delay      335        182
Maximum Delay     1783       768

November 1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number            19
Average Delay    635.4   3802;
Std Deviation    511.0       55.4
Minimum Delay      224        264
Maximum Delay     2183         510

December 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number            13         33
Average Delay    855.2      931.2
Std Deviation    996.8      908.4
Minimum Delay      190        223
Maximum Delay     2763       3035

January 1976
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number             4         11
Average Delay    466.0      591.4


Std Deviation     152.7       180.0
Minimum Delay      226        233
Maximum Delay      621        901

February 1976

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
     2         11
  508.5      709.7
  53.5      160.3
   455        466
   562       1028

March 1976

Number
Average Delay
Std Deviation
Minimum Delay
Maximum Delay

05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
     8
  849.8   581.85
  315.1      230.1
   487        331
  1351        953

April 1976
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number            13          6
Average Delay   1180.4      794.3
Std Deviation    511.8      304.0
Minimum Delay      529        471
Maximum Delay     2108       1346

1016 NEW YORK          NEW YORK     NY1   E o * 212/344-7445

June 1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number             4          3
Average Delay    687.8      495.3
Std Deviation     66.9      134.8
Minimum Delay      609        339
Maximum Delay      756        668

July 1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number             6          1
Average Delay    426.5      847.0
Std Deviation     77.2         .0
Minimum Delay      338        847
Maximum Delay      562        847


September 1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                       4
Average Delay            380.8
Std Deviation             34.0
Minimum Delay             346
Maximum Delay               428




1037 LOS ANGELES       CALIFORNIA   LASR1 C E 213/629-1561

December  1975
          05:00-09:00  09:00-17:00  17:00-22:00  22:00-05:00
Number                       1
Average Delay               121.0
Std Deviation                  .0
Minimum Delay                121
Maximum Delay                121

1043 ST LOUIS          MISSOURI     u c   314/421-5110

June 1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number
Average Delay  8001:   766:;           2
                              309.0
Std Deviation     211.1       212.4       39.0
Minimum Delay     431        480        270
Maximum Delay     1124      1347       348

July 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number            16        83          11
Average Delay    649.3      679.9      325.9
Std Deviation    152.9      238.7       53.9
Minimum Delay     435       243        244
Maximum Delay      971        1550        420

August 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number            8         27
Average Delay    660.6      601.9   302.:
Std Deviation    235.8      209.8
Minimum Delay      242        268    3;:
Maximum Delay     942       1079        302

September 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number            8         20
Average Delay    569.4      538.7   369.0
Std Deviation     221.0      228.4       95.0
Minimum Delay      333       238       274
Maximum Delay     988        939        464

October 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number    5171;         26           2
Average Delay            516.3      218.0
Std Deviation    110.6      168.8        9.0
Minimum Delay     380        237        209
Maximum Delay      757       960        227

November 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              2         9          1           1
Average Delay    500.5      532.1      258.0       225.0
Std Deviation     85.5       119.7          .0          .0
Minimum Delay      415        320        258        225
Maximum Delay     586        770       258        225

December 1975
          05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number             4          9           1
Average Delay    498.0      345.9      294.0
Std Deviation    157.2      178.6          .0
Minimum Delay      315        155       294
Maximum Delay     749       807       294

January 1976
           05:00-09:00  09:00-17:00  17:00-22:00  22:00-05:00
Number                      14
Average Delay  374.0      399.6
Std Deviation       .0      174.1
Minimum Delay     374        177
Maximum Delay     374       943

February 1976
          05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number        3441:         3
Average Delay                        172.0
Std Deviation              87.9        7.0
Minimum Delay               153       163
Maximum Delay              491        180

March 1976
           05:00-09:00  09:00-17:00  17:00-22:00  22:00-05:00
Number    849.65         12          4           1
Average Delay             432.7      381.3       160.0
Std Deviation    722.3      265.5      306.2          .0
Minimum Delay      210       238        160        160
Maximum Delay     1779        1200        909        160


April 1976
           05:00-09:00  09:00-17:00  17:00-22:00  22:00-05:00
Number             4          10           1
Average Delay    300.0       279.5       175.0
Std Deviation     36.0       82.0          .0
Minimum Delay      251         201         175
Maximum Delay     347       431         175

1051 PORTLAND          OREGON       POR1 C   503/224-0750

August  1975
           05:00-09:00  09:00-17:00  17:00-22:00  22:00-05:00
Number                        1
Average Delay              299.0
Std Deviation                  .0
Minimum Delay               299
Maximum Delay               299

December 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number    666.:                           3
Average Delay                               229.7
Std Deviation    110.7                         14.4
Minimum Delay      519                             210
Maximum Delay     786                           244

January 1976  05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                                 4
Average Delay                     458.3
Std Deviation                      154.5
Minimum Delay                        266
Maximum Delay                        614

1054 SAN JOSE          CALIFORNIA   CRP2   ? o * 408/446-4850

August 1975   05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                        1
Average Delay               211.0
Std Deviation                  .0
Minimum Delay                211
Maximum Delay                211

1060 MOUNTAIN VIEW     CALIFORNIA   AMEI   E *@ 415/965-8815

June 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                                3
Average Delay                      287.0
Std Deviation                        88.0
Minimum Delay                        171
Maximum Delay                       384

July 1975    05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number
Average Delay      318.:
Std Deviation             124.7
Minimum Delay                 220
Maximum Delay              494



1063 PITTSBURGH        PENNSYLVANIA PIT1 C   412/765-1511

June 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                         2
Average Delay             471.5
Std Deviation              45.5
Minimum Delay               426
Maximum Delay               517

September 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                     3
Average Delay             268.7
Std Deviation             49.5
Minimum Delay                 200
Maximum Delay               315

November  1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1
Average Delay    283.0
Std Deviation       .0
Minimum Delay     283
Maximum Delay     283

December 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1
Average Delay    267.0
Std Deviation       .0
Minimum Delay     267
Maximum Delay     267

February 1976
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number
Average Delay      668.:
Std Deviation
Minimum Delay       6ki
Maximum Delay               668

March 1976
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1                                1
Average Delay    297.0                        266.0
Std Deviation        .0                               .0
Minimum Delay      297                           266
Maximum Delay      297                           266



1072 PALO ALTO         CALIFORNIA    PCOSR1 E ** 415/126-7015

August 1975
               05:00-09:00  09:00-17:00  17:00-22:00  22:00-05:00
Number                         1                     1
Average Delay             169.0                148.0
Std Deviation                 .0                    .0
Minimum Delay              169                  148
Maximum Delay              169                  148

1073 UNION             NEW JERSEY    UNISR1 E o * 201/964-3801

June 1975
               05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number
Average Delay          371.:
Std Deviation                        9.0
Minimum Delay                      362
Maximum Delay                      380

August 1975
               05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1           1
Average Delay    484.0      692.0
Std Deviation       .0          .0
Minimum Delay     484       692
Maximum Delay     484       692

October 1975
               05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number
Average Delay  769.:   485.0 1
Std Deviation    97.5          .0
Minimum Delay     672        485
Maximum Delay     867       485

November 1975
               05:00-09:00  09:00-17:00  17:00-22:00  22:00-05:00
Number
Average Delay  641.:   689 Iii
Std Deviation    204.4      178.2
Minimum Delay     419        476
Maximum Delay     1106        1055

January 1976
               05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1
Average Delay    281.0
Std Deviation       .0
Minimum Delay     281
Maximum Delay     281

March 1976
               05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number
Average Delay  688.;
Std Deviation  221.5
Minimum Delay  467
Maximum Delay  910

April 1976    05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1
Average Delay    1125.0
Std Deviation        .0
Minimum Delay     1125
Maximum Delay     1125

1112 NEW YORK          NEW YORK     NYCSR2 C ,* 212/750-9433
     NEW YORK          NEW YORK     NYCSR2 C E  212/750-9445

June 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number             4
Average Delay    668.5   308:;
Std Deviation    207.6       51.3
Minimum Delay     458        232
Maximum Delay      960       439

July 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number             5          7
Average Delay    655.2      532.9
Std Deviation    176.9       104.2
Minimum Delay      401       356
Maximum Delay     891       679

August 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number
Average Delay      600.:
Std Deviation
Minimum Delay       60:
Maximum Delay               600

December  1975
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number              1
Average Delay    894.0
Std Deviation        .0
Minimum Delay     894
Maximum Delay     894



1116 CHICAGO           ILLINOIS     CHISR1 C 2 312/168-4607

August 1975   05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                         1
Average Delay              166.0
Std Deviation                  .0
Minimum Delay               166
Maximum Delay                166

1173 VALLEY FORGE      PENNSYLVANIA   VFOSR1 g   215/666-9190

December 1975
         05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                       4
Average Delay  311.:      392.8
Std Deviation               102.4
Minimum Delay   3;':        266
Maximum Delay      311        511

January 1976
        05:00-09:00 09:00-17:00 17:00-22:00 22:00-05:00
Number                       4
Average Delay             457.5
Std Deviation               28.2
Minimum Delay                421
Maximum Delay              496


APPENDIX E

MAINSAIL DESIGN SUMMARY

A MACHINE-INDEPENDENT PROGRAMMING SYSTEM

           Clark R. Wilcox
SUMEX Computer Project, Stanford University
        Stanford, California

ABSTRACT

A general-purpose programming system is being
developed for the support of portable software,
and as a tool for research into machine-
independent code generation. The issues involved
in such a design project are discussed, and an
overview is given of the approach taken for
MAINSAIL.

INTRODUCTION

   Much effort is now expended in the development of software whose
conceptual framework, at least, is already well-understood and documented.
A significant amount of time spent in such development is invariably
attributable to the particular environment in which the program will
execute, rather than the function of the program itself. An algorithm is
easily overwhelmed by implementation details, and its intention obscured
by the resulting program. The source language, the operating system, the
size of the machine, the file system, the debugging facilities, the time
schedule, the demands of efficiency: all seem to conspire against clarity
and generality. The original purpose, and the means used to obtain a
running program, can become inextricably enmeshed, the result having no
application beyond its limited context. The program becomes tied to the
machine, the operating system, a particular version of the operating
system, the various local enhancements, and certain terminals with
given keyboards and character sets; it continually becomes obsolete, never
works quite right, and dies a certain death when the author departs. And
yet essentially the same program is developed for other machines, and
meets the same fate. There seems to be neither the time nor the tools to
do it right once, and distribute it; indeed, everyone is busy writing his
own version.

   If a program is to find general use beyond the confines of a
particular implementation, the multitude of machine-dependent traps must
be defended against at every turn. Whether this necessarily entails a loss
in efficiency (program size and execution time), and the inability to use
local features which might otherwise enhance performance, is becoming less
clear, and certainly less important as memory and processing rates
increase. The programming task is being given increased scrutiny, with an
eye to eliminating the duplication, obscurity and inflexibility that are
accepted solely for the sake of execution-time efficiency. Software is
viewed more as a product with general applicability than as a means to an
end. The
tremendous effort required for a quality software product is resulting in
a less tolerant attitude towards programs which must be totally rewritten
if "moved" to a new machine.

   If programmers had access to programming systems which aided in the
creation of portable software, then perhaps we would be surprised at the
tasks now considered machine-dependent which could be cast in a more
general mold, passed from one machine to another, with possibly minor
changes isolated and well-documented. To gain acceptance, such a system
must balance several conflicting requirements without adversely affecting
its ease of use.

PORTABILITY

   The programming system itself must be transportable among a wide
variety of machines. Its design must incorporate the means to ensure
compatible versions among machines, and to allow a new machine to be
implemented with a minimum of effort. A language standard, presumably
enforced in all implementations, is not sufficient. There is little chance
that every version will be totally compatible. A standard retards the
introduction of improvements and new ideas,  since every implementation
requires concurrent upgrading to preserve compatibility. The orchestration
of such updates across a broad class of computers is prohibitive. Thus the
parallel development of the programming system on many machines is not
sufficient, and is an example of the very redundancy which a machine-
independent programming system can alleviate.  Such is the case with many
languages which are now used for program portability, for example FORTRAN,
COBOL, BASIC and SIMULA.

   If a single version of the system could be written and distributed
to all sites, then an elegant solution would be provided to the problem of
maintaining compatibility, and hence portability. There would be no need
for a language standard, since each site would use the same compiler.
Every version would without question be compatible, since there would be
only one version. Any changes to the system would be immediately
transmitted to all sites by merely sending copies of the updated software.
Errors found by one site would result in fixes for every site.

    This type of distribution can take place if the programming system
is written in its own language.  All software comprising the MAINSAIL
programming system is itself written in the MAINSAIL language. The
compiler can compile itself, and its own runtime system. It is easily
bootstrapped since it is written in a subset of MAINSAIL which can be
compiled by an existing compiler for the language SAIL, from which
MAINSAIL is derived. Furthermore, the creation of a MAINSAIL system for a
new computer is largely automated by a compiler-generator program.



   The programming system itself is one example of the portability of
programs written for the system. As a corollary, user programs can be
written which will execute correctly on any implementation. The
consequences of being able to move programs freely among several computers
and operating systems are far-reaching. Programs may be shared among all
sites, regardless of what computers are involved. At a single site, the
same language can be used on all computers, thus promoting program
interchange, and removing the problems involved with using different
languages on each computer. If one computer system becomes unavailable,
programs may be moved to another. The introduction of new computers may
take place without fear that existing programs will become obsolete: it is
only necessary that the programming system be implemented on the new
system.

EFFICIENCY

    In order to compete successfully with existing programming systems,
a machine-independent system must offer advantages greater than the
penalties derived from its lack of intimacy with the host machine. While
this statement is nearly tautological, it nevertheless suggests the
tradeoffs between efficiency and portability which must be dealt with in
the design of such a system.  Machine-independence is more a question of
degree than possibility, since, in theory at least, even an extremely
limited machine can be made to simulate the operations of the most
powerful.

   In order to obtain an acceptable level of efficiency, few
assumptions concerning the target machines should be embodied in the
programming system. It would be unacceptable to model all target machines
as stack machines, if this model must be carried to the point of code
generation. Similarly, register usage, linkage conventions,
addressability, and storage allocation must not be given rigid
characteristics if the system is to be truly portable. Interpreted code
cannot be emitted in every case. Such considerations seem to rule out the
effectiveness of a well-defined "abstract machine" for which code is
generated. Instead, the code should be made to fit each target machine as
well as most compilers now fit the machines for which they were designed.
In many cases MAINSAIL is able to generate better code than existing
compilers. For example, MAINSAIL produces about 10 percent less code than
the SAIL compiler, which was designed for a particular machine (PDP-10).

MACHINE-DEPENDENCIES

    Somewhat paradoxically, a machine-independent programming system can
benefit from features which support its use in machine-dependent
applications. If the language attempts to ban any constructs which it
considers machine-dependent, then programs which by their nature are
heavily dependent on a particular machine configuration cannot be written.
Programmers who would prefer use of the language must turn to another for
such purposes; their preferences may be similarly turned.

   At the very least, linkage should be allowed to external procedures
written in other languages, so that a library of procedures of local
interest can be constructed. If such a procedure is very short, say merely
a call to the operating system,  then the overhead for a procedure call may
be unacceptable. In this case, the ability to insert assembly language
directly into the program is most useful.

   By its very design, MAINSAIL can benefit from machine-dependencies.
Though most of the runtime system is written for portability in MAINSAIL,
some system procedures are too machine-dependent to be written once for
all computers. When writing these procedures for a particular
implementation, it is desirable to use MAINSAIL if possible, because of
the ease with which the machine-dependent portion can be interfaced with
the machine-independent parts. Thus the entire runtime system may be
written in MAINSAIL, which seems almost magical considering that
everything else is also written in MAINSAIL.

   There is of course a danger in explicitly allowing the introduction
of machine-dependencies into the language.  Programmers may begin using
such constructs when not really necessary,  so that the advantages of using
a portable language are lost.

LANGUAGE DESIGN

   In designing a general-purpose language for portability, one is
immediately faced with the problem of data representation, for this is
most closely dictated by the underlying machines. The selection of
primitive data types must be neither so narrow as to prevent the full use
of more powerful machines, nor so broad as to require extensive simulation
on smaller machines. Two basic approaches for data definition suggest
themselves: offer standard definitions from which the programmer must
choose; or give the programmer control over data characteristics such as
range and precision. These approaches can be contrasted for the primitive
data type integer.

   The first would offer one or more standard ranges, for example
INTEGER and LONG INTEGER, with ranges corresponding to, say, 16 bits, and
greater than 16 bits (an upper bound would be of dubious value).   These
ranges would correspond to the minimal ranges expected for all computers
to be implemented, and the programmer would understand that in a program
written for portability, LONG INTEGER would preclude its use on computers
with a small word size, unless this type were simulated. On larger
computers, INTEGER might be represented with, say, 32 bits, and programs
written specifically for such machines could make use of the full range.

   The second approach would include, with each declaration, range
information, for example the smallest and largest values. The compiler
would use this information to allocate the integer, presumably choosing
different representations for different ranges. The programmer need
consider only the characteristics of his data, rather than the various
machines which are to support his program.  The inclusion of a range
specification is also a useful form of program documentation, and aids the
compiler in checking that the variable is properly used. Of course, the
programmer must realize the consequences if his integer range is beyond
that of a 16-bit word.
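
   The second approach can be illustrated with a short sketch, written here
in Python rather than MAINSAIL. It is not taken from any MAINSAIL compiler:
the function name storage_bits, the list of available word sizes, and the
use of twos complement ranges are all assumptions made for illustration.

```python
def storage_bits(lo: int, hi: int, word_sizes=(16, 32, 64)) -> int:
    """Pick the smallest word size, in bits, that holds every value in lo..hi.

    Assumes twos complement words; the sizes offered are hypothetical.
    """
    if lo > hi:
        raise ValueError("empty range")
    for bits in sorted(word_sizes):
        smallest = -(1 << (bits - 1))
        largest = (1 << (bits - 1)) - 1
        if smallest <= lo and hi <= largest:
            return bits
    raise OverflowError("range exceeds the largest available word")
```

A declared range of 0 to 100 would fit a 16-bit word under these
assumptions, while a range of 0 to 32768 would force the next larger size,
which is the consequence the programmer is asked to keep in mind.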



   MAINSAIL presently offers the first approach with data types
INTEGER, LONG (integer), REAL, and DOUBLE (real). LONG and DOUBLE are
useful if the hardware provides these extended data types, or if they are
necessary for the intended applications but must be supported by
software. In the latter case they are expensive to use, and the
single-precision types should be employed where possible. In either case,
machine-dependent considerations are involved in deciding to use these
types, and thus they cannot appear in "portable" programs. This approach
simplifies the compiler design, and perhaps results in more efficient code
for smaller machines, where this is most crucial. The type BITS, for
logical operations on bit vectors, is also offered, and defined as
providing at least 16 bits. Thus the data types are optimized for ease of
implementation, rather than optimal use of storage on machines with larger
words. The compiler is never concerned with an attempt to "pack" a data
type into the available words.

   MAINSAIL says nothing about the bit patterns used to represent data.
For example, integers can be represented as ones complement, twos
complement, or even decimal. Bit operations are allowed only on the type
BITS, with standard conversions among BITS and INTEGER. An INTEGER is
converted to BITS by forming the binary representation of the integer
(undefined if the integer is negative). Similarly, a BITS is converted to
INTEGER by forming the non-negative integer whose binary representation is
given by the bits. Thus it can be determined whether a positive integer is
odd by converting to BITS and testing the low-order bit, no matter what
representation is being used.
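
   The conversions just described can be sketched as follows, in Python
rather than MAINSAIL. The function names and the default width of 16 bits
are assumptions for illustration. The conversions are defined by arithmetic
(division by two) rather than by inspecting a stored bit pattern, which is
why the result does not depend on how the host represents integers.

```python
def integer_to_bits(n: int, width: int = 16) -> list:
    """Binary representation of a non-negative integer, low-order bit first."""
    if n < 0:
        raise ValueError("conversion of a negative integer is undefined")
    bits = []
    for _ in range(width):
        n, bit = divmod(n, 2)   # arithmetic only; no bit pattern is inspected
        bits.append(bit)
    if n != 0:
        raise OverflowError("integer does not fit in the given width")
    return bits


def bits_to_integer(bits) -> int:
    """Non-negative integer whose binary representation is given by the bits."""
    value = 0
    for bit in reversed(bits):
        value = value * 2 + bit
    return value


def is_odd(n: int) -> bool:
    """Test the low-order bit after converting to the bit-vector form."""
    return integer_to_bits(n)[0] == 1
```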

    Another issue of data representation is the character codes.
MAINSAIL offers the type STRING, which is a variable-length sequence of
characters (the number of characters is automatically kept track of).
There are two operations which are concerned with character codes: the
first character of a string may be converted to its integer code; and an
integer may be converted to a string of one character. The codes used to
store characters within strings are of no consequence; there is only a
need for a standard code during the two operations. MAINSAIL decrees that
the ASCII codes are in effect whenever an integer is deemed to be a
character code. Each implementation is responsible for any necessary
conversions to and from the internal codes used in string storage.
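
   A minimal sketch of this arrangement, in Python rather than MAINSAIL,
is given below. It assumes a hypothetical implementation whose internal
string storage uses an EBCDIC code page (Python's "cp037" codec); the
function names are illustrative only. The two conversion operations deal in
ASCII codes whatever the internal storage code may be.

```python
INTERNAL_CODEC = "cp037"   # assumed internal storage code (an EBCDIC variant)


def make_string(text: str) -> bytes:
    """Store text in the implementation's internal character code."""
    return text.encode(INTERNAL_CODEC)


def first_char_code(stored: bytes) -> int:
    """ASCII code of the first character of a stored string."""
    if not stored:
        raise ValueError("empty string has no first character")
    return ord(stored[:1].decode(INTERNAL_CODEC))


def code_to_string(code: int) -> bytes:
    """One-character stored string for the given ASCII code."""
    if not 0 <= code < 128:
        raise ValueError("not an ASCII code")
    return chr(code).encode(INTERNAL_CODEC)
```

Under these assumptions the letter "A" is stored internally as the EBCDIC
value 0xC1, yet first_char_code reports the ASCII value 65.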

    In order to allow the runtime system to be largely written in
MAINSAIL, some assumptions concerning memory and addressability are
necessary. The amount of memory required by each data type is measured in
"storage units." The physical interpretation of a storage unit is
machine-dependent; for example, a storage unit may be a "byte" or a
"word." The number of storage units required by n consecutive values of
the same type, for example elements of an array, is n times the size of a
single value. However, sizes of consecutive values of differing types
cannot be added to
obtain a total size, since machine-dependent "padding" may occur between
the allocations for alignment purposes.
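
   The two sizing rules can be observed on a present-day host with the
following Python sketch, which uses the standard ctypes module. Here the
storage unit is taken to be the byte, and the structure Mixed and the helper
names are hypothetical. The amount of padding, if any, is decided by the
host, so the sketch makes no claim about a particular total.

```python
import ctypes


class Mixed(ctypes.Structure):
    """Two consecutive values of differing types (hypothetical layout)."""
    _fields_ = [("flag", ctypes.c_char), ("count", ctypes.c_int32)]


def array_units(elem_type, n: int) -> int:
    """Storage units occupied by n consecutive values of one type."""
    return ctypes.sizeof(elem_type * n)


def summed_units(struct_type) -> int:
    """Naive total: the sum of the individual field sizes, ignoring padding."""
    return sum(ctypes.sizeof(ftype) for _name, ftype in struct_type._fields_)
```

For an array, array_units(ctypes.c_int32, 5) equals five times the size of
one element. For Mixed, ctypes.sizeof(Mixed) is at least summed_units(Mixed)
and on many hosts is larger, because padding is inserted before the integer.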

    The type ADDRESS is introduced for manipulating memory addresses. A
memory model is adopted which specifies only those addressing
characteristics necessary for the simplest memory accesses. For example,
an address is not used to indicate a particular character of a string,
since this is not possible on some machines without additional information
concerning the location of the character within a word. Associated with
each STRING is a "string descriptor" which contains the current length,
and the location of the first character. A string descriptor is a
primitive data type, since an integer-address pair may not be sufficient.

   Addressability, and the associated issue of program linkage, is an
area which requires special attention. MAINSAIL allows programs to be
written as separate texts, called "segments." These segments are
separately compiled, and linked together to form a program in some
machine-dependent manner. Inter-segment communication is provided by
global data and procedures. Each segment is given a name and
characteristics such as MAIN and OVERLAY. A variable or procedure is
declared "external" by preceding its declaration with the name of the
segment which contains its "internal" occurrence. If a procedure is
internal to an OVERLAY segment, then that segment must be brought into
memory before the procedure can begin execution. MAINSAIL does not provide
the facilities for such overlay handling, but does include the syntax for
specifying which segments are overlays.

   A machine must provide for an address composed of a static or
dynamic base (possibly external), with a static or dynamic offset. Static
means that the value does not change during program execution, i.e. it is
known at compile-time (within relocation). Thus the compiler will produce
inefficient code for a computer which does not provide indexing. A single
level of indirect
addressing can also improve the code quality. For example, if an address
variable is in memory, it is useful to be able to access, say, an integer
pointed to by the address, without first loading the address into an index
register.

   The syntax of expressions and statements is more distant from the
underlying machine, so that there are few difficulties in removing
machine-dependencies. Perhaps the overall result is a clear and
straightforward syntax, since the prejudices and peculiarities exhibited
by more machine-dependent languages are missing. There are no exotic data
operations, since every machine would have to support such operations.
Probably no machine will have instructions corresponding to every
operation, though some come rather close. For example, BITS can be shifted
left or right by any amount. Some machines have instructions which do just
this; others require several instructions, or even a procedure call.
STRING operations are generally too complicated to be carried out in-line,
and thus there is no requirement for byte addressability or compact byte-
manipulation instructions.

COMPILER DESIGN

   The primary consideration in the design of a machine-independent
compiler is the interface between what is known about the language and
assumed about all target machines, and what is left to be supplied for
each implementation. If too much is assumed, then the class of machines is
unduly restricted, and clumsy devices may be necessary to resolve a
distorted model to reality, resulting in needless inefficiencies. If too
little, then the generation of a new system could be a major undertaking,
retarding the spread of the system to new machines.



   In contrast to a compiler-compiler which has no knowledge of the
source language, the MAINSAIL language and compiler evolved by an
iterative process. Features which were felt necessary for an efficient
compiler were simply put into the language. Similarly, the language was
modified in those areas requiring an inordinate amount of time or space
for compilation. With regard to optimizations, this intertwining of design
may result in additional statements in the compiler, yet a smaller
compiler when the optimized version compiles itself.

   The compiler consists of two passes in order to cleanly separate the
machine-independent and dependent phases. The first pass converts the
source program to an intermediate language,  and the second translates this
intermediate language to the target assembly language (which must be
assembled by some machine-dependent assembler not provided by MAINSAIL).
The intermediate language consists of operators with a variable number of
operands. The operators reflect either MAINSAIL operations, such as
addition; program structure, such as procedure entry; or internal
information, such as the handling of temporaries.  In most cases an operand
is a pointer into the symbol table.

   This is quite different from an attempt to generate intermediate
code for an abstract machine. For example, the intermediate code for
"a := a + b" might be <push a>, <add b>, <pop a> if the abstract machine
were stack-oriented, whereas MAINSAIL generates <add b a>. In the former
case, a register-oriented machine could certainly simulate the pushes and
pops, but the generated code would be of dubious quality. A machine with a
memory-to-memory add would suffer even more. MAINSAIL, however, generates
intermediate code which captures only what is in the source program, with
no assumptions concerning the target machine. The <add b a> can involve
registers, a stack, memory-to-memory, or even a procedure call.
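
   The contrast can be made concrete with a small sketch. The Python below
is illustrative only and is not MAINSAIL's intermediate language or code
generators: the Instr tuple, the two lower_* functions, and the pseudo-
assembly mnemonics are all assumptions made for the example. It shows one
machine-neutral instruction <add b a> being expanded differently for a
stack machine and for a machine with a memory-to-memory add.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """One intermediate instruction: an operator plus its operands.

    Operands are plain symbol names here; the report says they are
    normally pointers into the symbol table.
    """
    op: str
    operands: Tuple[str, ...]


def lower_stack(instr: Instr) -> List[str]:
    """Expand <add src dst> for a hypothetical stack-oriented target."""
    if instr.op == "add":
        src, dst = instr.operands
        return [f"PUSH {dst}", f"ADD {src}", f"POP {dst}"]
    raise ValueError(f"no generator for {instr.op}")


def lower_mem_to_mem(instr: Instr) -> List[str]:
    """Expand <add src dst> for a hypothetical memory-to-memory target."""
    if instr.op == "add":
        src, dst = instr.operands
        return [f"ADD {src},{dst}"]
    raise ValueError(f"no generator for {instr.op}")


# "a := a + b" is recorded once, with no target-machine assumptions.
add_b_a = Instr("add", ("b", "a"))
stack_code = lower_stack(add_b_a)
mem_code = lower_mem_to_mem(add_b_a)
```

   The same Instr value feeds both functions, so the choice of registers,
stack, or memory operands is deferred entirely to the target-specific step.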

    The second pass consists of a machine-independent part, and a
machine-dependent part which is translated from a code-generation
language. The machine-independent part is responsible for creating a
convenient interface to the machine-dependent part, consistent with the
separation between the two. It fetches the intermediate instructions, and
sets up the operator and operands for easy accessibility. It supplies
answers to questions concerning the operands, or the current code
generation environment which it is responsible for maintaining.

   MAINSAIL employs a general notion of register which is useful in a
number of contexts. An operand is always associated with a memory
location, and may be temporarily marked as loaded in a register. The
compiler provides several services related to registers, such as: mark an
operand in a register, clear a register, or find the "best" free register.
It will automatically load and store registers when necessary.  A register
may also be marked as containing the address of an operand.

   The services provided for registers are never invoked unless the
code generators either directly request a service, or indicate that
registers are to be used in certain situations (for example, to pass
procedure parameters).  Thus code can be generated for machines with no
registers, for example a stack machine (actually, the top of the stack can
be modeled as a register). A code-generation environment is created and
maintained which is flexible enough to be of use for a wide variety of
computer architectures. Many checks ensure the internal consistency of the
environment; for example, a register cannot be marked with two operands at
the same time. By knowing the rules of this environment, code generators
can be written for a new computer with minimal effort.
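
   The register services described above might be modeled roughly as
follows. This is a minimal sketch, not the MAINSAIL compiler's interface:
the RegisterFile class and its method names are hypothetical, and "best"
free register is reduced here to "first free register".

```python
from typing import Dict, List, Optional


class RegisterFile:
    """Tracks which operand, if any, is currently marked in each register."""

    def __init__(self, names: List[str]) -> None:
        self.contents: Dict[str, Optional[str]] = {n: None for n in names}

    def mark(self, reg: str, operand: str) -> None:
        """Record that `operand` is loaded in `reg`."""
        current = self.contents[reg]
        if current is not None and current != operand:
            # Consistency check: a register cannot hold two operands at once.
            raise RuntimeError(f"{reg} already holds {current}")
        self.contents[reg] = operand

    def clear(self, reg: str) -> None:
        """Forget whatever operand was marked in `reg`."""
        self.contents[reg] = None

    def find_free(self) -> Optional[str]:
        """Return the first free register, or None if all are in use."""
        for reg, operand in self.contents.items():
            if operand is None:
                return reg
        return None

    def where(self, operand: str) -> Optional[str]:
        """Return the register holding `operand`, or None if it is in memory."""
        for reg, held in self.contents.items():
            if held == operand:
                return reg
        return None


# A stack machine can be described with a single pseudo-register:
# the top of the stack.
stack_machine = RegisterFile(["TOS"])
stack_machine.mark("TOS", "a")
```

   A code generator for a machine with no registers would simply never
call these services, which matches the text's point that they are invoked
only on request.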

   The code-generation language provides a powerful and convenient
setting in which to specify code sequences. Declarations give semantic
information concerning register usage, storage units, additional symbol
table entries, and various parameters used within the compiler and runtime
system. A code generator must be written for each intermediate
instruction. A generator has available to it services such as those
discussed above, and the operands of the intermediate instruction. In
general a code-generator looks like the assembly language which it is to
produce, except it contains keywords which are replaced during code
generation with operand names, registers, or constants. The code-
generation language is translated to MAINSAIL, and hence the full power of
MAINSAIL is available. In practice, the constructs provided are sufficient
for almost all situations which arise during code generation.  A code
generator usually takes the form of a series of conditions, each followed
by pseudo assembly language which is to be processed if the condition is
satisfied. The complexity of the conditions is determined by the degree to
which the target machine conforms to the general framework provided for
code generation, and the amount of optimization desired. Procedures can be
used for commonly occurring code sequences.
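
   The "series of conditions, each followed by pseudo assembly language"
can be pictured with the following sketch. It is written in Python rather
than the code-generation language itself, and the rule table, the $SRC,
$DST and $REG keywords, and the mnemonics are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Each rule pairs a condition on the operands with a pseudo-assembly
# template. $SRC, $DST and $REG are keywords replaced during generation.
Rule = Tuple[Callable[[Dict[str, str]], bool], List[str]]

ADD_RULES: List[Rule] = [
    # Special case: adding the constant 1 can use an increment.
    (lambda env: env["SRC"] == "1", ["INC $DST"]),
    # General case: load, add, and store through a register.
    (lambda env: True, ["LOAD $REG,$DST", "ADD $REG,$SRC", "STORE $REG,$DST"]),
]


def generate(rules: List[Rule], env: Dict[str, str]) -> List[str]:
    """Emit the template of the first rule whose condition is satisfied."""
    for condition, template in rules:
        if condition(env):
            out = []
            for line in template:
                for key, value in env.items():
                    line = line.replace("$" + key, value)
                out.append(line)
            return out
    raise ValueError("no rule matched")


general_case = generate(ADD_RULES, {"SRC": "b", "DST": "a", "REG": "R0"})
increment_case = generate(ADD_RULES, {"SRC": "1", "DST": "a", "REG": "R0"})
```

   Adding a more specialized rule ahead of the general one is the kind of
local optimization the generators allow; the more rules a target needs, the
less it conforms to the general framework.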

  Since code generators are associated with intermediate instructions,
they provide only for local optimization. Because of the extreme ease with
which the code generators can be altered, a compiler can be created from
the current generators, and its output examined for errors and
inefficiencies. Based on this, the generators can be altered, a new
compiler created, and so forth. This process continues until the code
appears correct, and is sufficiently efficient. Construction of a new
compiler from a few changes in the generators can be done in a matter of
minutes. Thus a single session spent tuning the generators can produce
significant results.

   The formal separation of target-machine semantics from the more
general aspects of code generation has an exciting potential for research
into the design of instruction sets. Since a wide variety of computers can
be described with the code generators, experiments can be conducted to
test features such as the number of registers, the utility of indirection,
or various procedure linkages. Existing machines can be compared to
determine which is best suited for a high-level language implementation.
For example, an instruction set which allows complete addresses can be
compared with one which offers a base with small displacement, to
determine which requires the fewest memory accesses. A micro-coded
instruction set based on the MAINSAIL intermediate instructions would
produce optimized code sequences.

   The facility with which code generators can be written makes
MAINSAIL accessible to one-of-a-kind machines. For example, there is now
under construction a three-address parallel processor with no registers
which will use MAINSAIL as its high-level language. Programs can be
written, and the code examined, before the machine is complete (even the
assembler for the new machine can be written in MAINSAIL!). Providing such
a machine with a high-level language would be a major undertaking if the
compiler, runtime system and assembler had to be written in assembly
language.

RUNTIME DESIGN

   The runtime system provides support during program execution:
program initialization, file manipulation, i/o, conversions among string
and numeric-bits, string handling, mathematical routines, string and
record collection, and dynamic memory allocation. If MAINSAIL is to be
used as an implementation language, then it may be desired to limit the
size of the runtime package. Since the system procedures are used only in
response to implicit or explicit requests, programs may be written which
require little, if any, support. For example, programs which involve only
arithmetic, logical and address operations, with no i/o, string handling
or dynamic storage allocation, may be compiled into assembly language
programs which call only the system initialization procedure.  By removing
this call, a self-sufficient program is obtained which can be combined
with hand-coded assembly-language modules. In this sense, MAINSAIL can be
regarded as a convenient means of generating assembly language programs.

   Mathematical routines for trigonometric functions, exponentiation,
logarithm,  square root, and random numbers have been written in MAINSAIL,
accurate to at least 17 decimal digits in most cases. Since they are
written in MAINSAIL, there are of course no assumptions regarding word
size or representation. The obscurity of their assembly language
counterparts is in stark contrast to the clarity with which the algorithms
are expressed in a high-level language, and has probably contributed to
the astounding number of times they have been written, over and over
again, for different machines. The same can be said of the MAINSAIL
routines for conversion between string and floating point numbers.

   MAINSAIL has a well-developed i/o capability, including any number
of sequential and random files, and terminal interaction. File names are
represented as strings, and the format of these strings is transparent to
MAINSAIL, since they are handled only by machine-dependent routines.
There are two types of sequential files: text and data. Text files are
meant for legible text, for example a program or document. Whenever
numeric or bits data is written to a text file, an automatic conversion is
made to a string representation; similarly, such reads from a text file
automatically scan for the proper string representation.

   A data file contains machine-readable data in some machine-dependent
format. Any mixture of numeric and bits can reside on a data file,
presumably stored in a compact form identical to the internal
representation within the computer. Since no conversion is necessary,
input and output is efficient.
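
   The difference between the two kinds of sequential file can be sketched
as follows. This is an illustration in Python, not the MAINSAIL runtime:
the four function names are hypothetical, and an 8-byte native integer is
assumed as the "internal representation".

```python
import io
import struct


def write_text(f, value: int) -> None:
    """Text file: convert the number to a legible string on output."""
    f.write(f"{value} ")


def read_text(f) -> int:
    """Text file: skip blanks, then scan the string form of one number."""
    token = ""
    while True:
        ch = f.read(1)
        if ch == "" or (ch.isspace() and token):
            break
        if not ch.isspace():
            token += ch
    return int(token)


def write_data(f, value: int) -> None:
    """Data file: store the internal binary form, with no conversion."""
    f.write(struct.pack("=q", value))


def read_data(f) -> int:
    """Data file: read the 8 bytes back and reinterpret them directly."""
    (value,) = struct.unpack("=q", f.read(8))
    return value


text_file = io.StringIO()
write_text(text_file, 1976)
text_file.seek(0)
text_value = read_text(text_file)

data_file = io.BytesIO()
write_data(data_file, 1976)
data_file.seek(0)
data_value = read_data(data_file)
```

   The text file holds the characters "1976" and is readable by a person;
the data file holds eight bytes whose layout depends on the machine.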

   A random file is composed of fixed-length blocks of data, called
file-blocks. Reads and writes supply a file-block number, and the entire
file-block is involved in the transfer. A file-block is read into, or
written from, a memory area whose address is supplied to the read or write
routine.
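
   Random-file access by file-block number might look like the sketch
below. The names read_block and write_block and the block size of 512
bytes are assumptions for the example, not MAINSAIL's actual routines or
parameters.

```python
import io

BLOCK_SIZE = 512  # bytes per file-block; an arbitrary value for this sketch


def write_block(f, block_no: int, buffer: bytes) -> None:
    """Write one whole file-block from `buffer` at position `block_no`."""
    if len(buffer) != BLOCK_SIZE:
        raise ValueError("transfers always involve an entire file-block")
    f.seek(block_no * BLOCK_SIZE)
    f.write(buffer)


def read_block(f, block_no: int, buffer: bytearray) -> None:
    """Read one whole file-block into the caller-supplied `buffer`."""
    f.seek(block_no * BLOCK_SIZE)
    data = f.read(BLOCK_SIZE)
    if len(data) != BLOCK_SIZE:
        raise EOFError(f"file-block {block_no} does not exist")
    buffer[:] = data


random_file = io.BytesIO()
write_block(random_file, 3, b"x" * BLOCK_SIZE)

area = bytearray(BLOCK_SIZE)  # the memory area supplied to the read routine
read_block(random_file, 3, area)
```

   Because every block has the same length, the byte offset of a block is
just its number times the block size, which is what makes random access
cheap.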

   Files can be opened, closed, and deleted. Additional file-
manipulation routines can be added for each site. Much of the i/o activity
is handled in a machine-independent manner, so that only a few well-
defined elementary procedures need be written for each machine.

CURRENT STATUS

   MAINSAIL now runs on a PDP-10 with TENEX, and a PDP-11 with RT11.
Development is under way for a PDP-10 with TOPS10, a PDP-11 with UNIX, and
the IBM-370. Code has also been generated for an INTERDATA 7/16, VARIAN
and NOVA. Many more machines were examined while developing MAINSAIL, and
will be considered for implementation as sufficient resources are made
available.

   A number of projects across the country are interested in using
MAINSAIL for the development of portable software. Among these are a
robotics project, a mass spectrometry system, a program for chemical
structure elucidation (now written in LISP), a computer-aided-instruction
system for the teaching of logic,  an automated cell classification
laboratory, a machine-independent version of INTERLISP, and a display-
oriented text editor.



APPENDIX F

SUBSYSTEMS AND DOCUMENTATION DIRECTORIES

Nancy Smith
December 1974
(updated April 1975)
(updated Sept. 1975)
(updated Oct. 1975)

   The sources of available documentation for these programs will be
abbreviated as follows:

TUG  Tenex User's Guide (1975 edition)
DUH  DEC Users Handbook
DAL  DEC Assembly Language Handbook
DML  DEC Mathematical Languages Handbook
HC   a hard-copy manual for the language
OL   on-line documentation which can be found by
     @DIR <DOC>programname.* .  The following extensions are
     used on the <DOC> directory:
        .MANUAL         complete, usually fairly long manual
        .HELP or .HLP   shorter summary, list of commands, etc.
        .SUPPLEMENT     on-line supplement to hard-copy doc
        .UPDATE         list of updates by date
        .SAMPLE         sample program or output

   See <DOC>A-LIST-OF-ALL-AVAILABLE-DOCUMENTS.INFO for complete details
on these documents including where and how to order them.

   Many of the major programs also have a <BULLETINS>programname.BBD
file where messages about new developments, bugs, hints for using the
program etc. are sent.  These <BULLETINS> files can be read by any of the
mail reading programs (READMAIL, RD, MSG, or BANANARD).

   New programs or new versions of old programs will be put on <NEWSYS>
for a trial period.  The file <NEWSYS>NEW-SYSTEMS.INFO which is a message
file will have a message about each program available.  These new programs
will not be included in the list of programs given here.

   The HELP program obtained by typing @HELP gives assistance in
finding the appropriate on-line documents for the various programs.



SUBSYS      DESCRIPTION                                           DOC
------------------------------------------------------------------------

2SIDES      makes files for multi-columns and/or 2-sided listing  OL
ACCESS      gives a list of subsys's currently available to GUESTS
ADDMSG      appends a msg to a specified file
AID         algebraic interpretive dialog conversational lang.    HC
AIFAIL      assembly lang. - early version of FAIL from SU-AI     OL,HC
ALIAS       allows a dummy name to be set up for a program
BAIL        SAIL debugger (on <SAIL>)                             OL
BACKUP      short term file loss protection                       OL
BANANARD    msg reading program (many extra features)             OL
BASIC       conversational programming lang. (DEC version)        OL,DML,TUG
BCPL        compiler writing and systems programming lang.        HC
BINCOM      binary comparison of files (now replaced by FILCOM)   DAL
BLIS10      compiler for system implementation (DEC version)      OL,HC,TUG
BLIS11      BLISS for the PDP11
BLISS       compiler for system implementation (TENEXized)        OL,HC(DEC)
BOOTGT      loader for the PDP11 (GT40)
BUDGET      budget management program (especially proposals)      OL
BYE         @BYE same as @BREAK (LINKS)
CALENDAR    calendar management and reminder system               OL,TUG
CAM         the compare and merge program of SOUP
            see <DOC>SOUP.MANUAL
CCL         concise command language                              OL,DUH
CLEAN       a file by file directory clean-up program             OL
COPYM       reading/writing DECtapes                              OL,TUG
CREF        cross-reference assembly listing                      OL,DAL
CRSREF      TENEX cross-referencing program (outfile_infile(s))
CRYPT5      En/Decrypts textfiles to provide security             OL
DCHANGE     character set conversion for "foreign" tapes          OL
            see <DOC>DCHANGE.MANUAL and <DOC>DCHANG.HLP
DCHECK      reads blocks of file into core & calls DDT to examine  OL
DDT         debugger (single-stepping added at IMSSS)             OL,TUG,DAL
DED         text-editor (designed for TENEX)                      OL
DELOLD      deletes files by cutoff date of last access           OL
DELVER      deletes excess versions of files                      TUG
DFTP        file transfers to and from the Datacomputer           OL
            (for certain special file storage needs)
DIABLO      prints final copy of PUB-produced documents on DIABLO  OL
DIREXT      prints directory information for files sorted by      OL
            file extension rather than file name
DO          creates or appends a line to a reminder file          OL
DOM         effects the assembly and loading of a single          OL
            MACRO program
DONE        deletes a line from a reminder file                   OL
DROP        similar to DELVER, deletes oldest and 2nd newest on *.*
DSKACC      gives dsk allocation for all members of accounting groups
DTACOP      DECtape to DECtape copy
DUMPER      reads/writes magnetic tapes
EOFIX       deletes any pages past end of file mark               OL
EXTR        "EXTRactor" processes MACRO/FAIL source files to
            produce .FAI listing of labels defined
F40         FORTRAN IV (see also <DOC>FORTRAN.HELP and            OL,TUG,DML
            <DOC>LISP-FORTRAN-INTERFACE.HELP)
FAIL        assembly language (BBN version of FAIL)               OL,HC
            (see also JSYS manual & <DOC>SUMEX-JSYS'S.INFO)
FED         the final edit program of SOUP
            see <DOC>SOUP.MANUAL
FILCHK      checks SAIL programs for loader incompatibilities     OL
FILCOM      complete file comparison package                      OL,DAL,TUG
FILDMP      dumps files in variety of formats                     OL
FILES       multiple to multiple copies, renames, protections
FILEX       for file transfers converts between DEC machine       OL,DUH
            formats for dsk and DEC-tape
FORMAT      makes table of contents & index for SAIL sourcefiles  OL
FORTRA      FORTRAN10 (version 4) (see also <DOC>FORTRAN.HELP)    OL,HC
FREQ        ranks words in text file according to frequency
FRKCOM      compares an address space with address space of file  TUG
FTP         ARPANET file transfers                                TUG
FUDGE2      updates/manipulates files containing rel programs     DAL,TUG
GETDMP      loads into core .dmp file from SU-AI (SAV only to
            677777) type filename to * prompt
GRIPE       sends comments or complaints about system to staff    TUG
HELP        helps locate on-line documentation
HOSTAT      prints network site status information                TUG
IDDT        DDT for inferior forks                                TUG,OL
IFAIL       assembly language (IMSSS version of FAIL)             OL,HC
ILISP       UC Irvine LISP (extension of LISP 1.6)                OL
IMSSS       direct link to IMSSS
INSPEX      checks files for wasted space and pages past eof      OL
KILL        closes all jfns--useful when RESET can't get a file closed
LAST        gives date, time of last full dump, archive or daily dump
LD          prints SYSTAT-like info
LINK10      DEC loader                                            OL,DAL,TUG
LINK11      linker for PDP11 DOS operating system
LINKSTAT    prints status of IMSSS link
LISP        INTERLISP-see also <DOC>LISP-FORTRAN-INTERFACE.INFO   OL,HC
LOADER      (from IMSSS)-see <DOC>LINK10-LOADER-DIFFERENCES.HELP  TUG
LOADGT      GT40 standard format loader
LOADVT      loader for PDP11 (GT40)
LOWCASE     converts a text file to lowercase
LPTSTS      gives the files on the lineprinter queue & their size  OL
MAC11       MACRO cross-compiler for the PDP11
MACRO       assembly lang-JSYS manual or <DOC>SUMEX-JSYS'S.INFO   TUG,DAL
MAILBOX     to reroute mail (not fully implemented yet)           OL
MAILSTAT    info on queued mail                                   TUG
MANTIS      Fortran debugger
MATHLAB     interactive symbolic algebraic system                 OL
MLAB        mathematical modeling and graphics package            OL
MSGFIX      TECO routine to help fix the format of messages
MTACPY      magtape program                                       TUG
MTCOPY      DEC magtape program                                   OL
MULTI       multiple-fork supervisor--switches between forks
MY-ACCOUNTS prints user's valid accountnames
NDIR        gives compact list of files on connected directory
NETSTAT     prints info on ARPANET status                         TUG
NEWFILES    directory information for files written in last 24 hrs  OL
NEWINFO     gives all new files on public directories or for any  OL
            file group (includes number of reads for each file)
NODE        gives the geographical location of a TYMNET node
NON         zero-compresses file, options to remove linenumbers,
            pagemarks, convert eol's, etc.
PCSAMP      measures the operation of other user programs         TUG
PDP6DT      DEC-tape program
PIP         DEC utilities program                                 OL,DUH
PIP11       transfers PDP11 DOS DECtapes to/from TENEX files      OL
PNTMAK      converts underlines to suitable format for LPT:       OL
POET        text editor designed for TENEX use                    OL
PPL         an interactive extensible programming lang.           TUG
PROFIL      gives freq of execution of SAIL statements            OL,HC
PUB         document preparation lang.                            OL
PUB2        2nd pass of PUB -- used separately to change underlines
RD          mail reading program (MSG is better)                  TUG
READMAIL    mail reading program (MSG is better)                  TUG
RECOG       when ordinary recognition is ambiguous RECOG gives    OL
            the possible filename matches
RECORD      for pseudo-ttys, typescript of job, detaching         OL
            from running job
REDUCE      symbolic algebraic language                           OL
RPURGE      requires confirmation before purging (delete & expunge)  OL
            puts info on purged files in a file by date
RSEXEC      restricted access only                                TUG
RTTY        types out a file starting at the end (reverse)        OL
RUNFIL      uses file instead of tty for input commands           TUG
RUNOFF      document-preparation language (DEC not BBN version)   OL
SAIL        ALGOL-like lang.-see also <DOC>LEAP.MANUAL            OL,HC
SCAN        scans multi-directories for a variety of file info    OL
SEARCH      searches multi-text files for English words or SAIL   OL
            identifiers, can be used with TV editor
SEARCHDIR   substring search of directory information on files    OL
SEARCHP     substring search also allows random reading of file   OL
SEGSAV      reads .shr & .low files to produce TENEX .sav         OL
SITBOL      compiler version of SNOBOL                            OL
SNDMSG      message sender                                        OL,TUG
SNOBOL      string-processing programming lang.                   OL,HC
SORT        stand alone COBOL column-oriented text file sorter    OL,TUG
SOS         text editor                                           OL
SPELL       spelling checker/corrector for text files (not TENEX)  OL
SPSS        Statistical Package for the Social Sciences           OL
SRCCOM      compares text files                                   TUG
STP         Western Michigan University StaTistical Package       OL
SUBMIT      submission to batch (see <DOC>BATCH.HELP)
SWITCH      switches the format of a reminder file                OL
SYSDPY      gives SYSTAT-like info constantly updated on display  OL
            (CRT) terminal
SYSIN       executes LISP SYSOUT's                                OL
TABLE       creates conversion tables for DCHANGE
TALK        used with LINK command to eliminate need for ;'s
TAPCNV      reads card image file processed by MTACPY             TUG
TBASIC      TENEXized version of DARTMOUTH BASIC                  OL
TCTALK      teleconferencing over ARPANET                         OL
TECO        text editor (see TENEX TECO manual)                   OL,TUG
TELNET      restricted access only                                TUG
TIPCOPY     sends text files to a TIP port                        TUG,OL
TMERGE      merges specified text pages from files into new file  OL
TODAY       lists the contents of today's reminder file           OL
TRITAP      processes magtapes from XEROX, IMSSS, BBN             OL
TTYTRB      used to report terminal line problems                 TUG
TTYTST      prints test patterns for diagnosing terminal          TUG
TV          text editor for TEC and DATAMEDIA displays            OL
TVFIX       restores bad TV files (see <DOC>TV.MANUAL)
TYMSTAT     (for TYMNET lines only) gives measure of current
            efficiency of TYMNET transmission
TYPBIN      does an octal dump of a packed file                   TUG
TYPEIN      appends type-in to file with some editing allowed     OL
TYPREL      analyzes contents of .REL files                       TUG
UPCASE      converts an entire file to uppercase
WATCH       continuous on-line monitoring of system activity      TUG
WATCH.IMS   IMSSS version of WATCH
WHAT        lists the contents of a reminder file                 OL
WHO         prints SYSTAT-like information
WHOIS       looks up username & prints name/address info on user  OL
VIEW        examines a file word by word, several typeout modes   OL
XED         text-editor (used with BANANARD)                      OL
XT          reformats and prints text file                        OL
Z           logs jobs off including from inferior (lower) forks &
            prints a witty saying



<DOC> DIRECTORY LISTING

   The following is a listing of the <DOC> directory which contains
most of the on-line formal documentation about the system and subsystems.

<DOC>      13-MAY-76 08:19:25

FILE NAME        SIZE (COMPUTER PGS)

  .HELP;2                  1
2SIDES.HELP;3              3
A-GENERAL.HELP;12            2
A-GUIDE-TO-TENEX-USER'S-GUIDE.INFO;2         5
A-LIST-OF-AVAILABLE-DOCUMENTATION.INFO;8      14
A-SURVEY-OF-THE-DEC-HANDBOOKS.INFO;10         5
ACCOUNT-NAME-USAGE.INFO;2     3
AID.HELP;Q                  2
  .INFO;3                  1
ALL-SUBSYS'S-AVAILABLE-AT-SUMEX.INFO;8        7
BACKUP.HELP;2                2
BAIL.HELP;5                 1
  .MANUAL;3               17
  .UPDATE;1               3
BANANARD.HELP;1
BANK.MANUAL;2      i6
BASIC.HLP;2                 2
  .UPDATE;2                12
BATCH.HELP;3              3
  .UPDATE;2                4
BLIS10.HLP;4                 2
  .UPDATE;2                10
BLISS.HELP;2                 2
BSYS.MANUAL;3              25
BUDGET.MANUAL;7            9
  .UPDATE;2                1
  .SMP;2
CALENDAR.MANUAL;2     :,
CCL.HELP;2                  2
CHECKDSK.HELP;3             4
CHESS.HELP;1               3
CLEAN.HELP;1                 2
COPYM.HELP;2                5
CREF.HLP;1                   1
  .UPDATE;2                 2
CRYPT5.HELP;1                2
DCHANG.HLP;2                 2
DCHANGE.MANUAL;1             12
DCHECK.HELP;1                1
DDT.SUPPLEMENT;1             2
  .HELP;1                  1
  .BRIEF;2                4
  .SUMMARY;1              9



DEC-HANDBOOK-GLOSSARY-UPDATE.INFO;1          3
DEC/TENEX-COMMAND-EQUIVALENTS.INFO;4          11
DED.MANUAL;1               15
DELOLD.HELP;1               1
DESCRIPTION-OF-SUMEX-AIM-PROJECTS.INFO;j      4
DFTP.HELP;j                5
DIABLO.HELP;g              7
DIREXT.HELP;1                2
DOM.HELP;1                  1
DUMP.INFO;1                 1
EDIR.MANUAL;2              9
  .HELP;1                  1
  .UPDATE;1                1
EDIT.INFO;1                 2
EDITOR-PROGRAM-INTERFACE.INFO;2              2
EOFIX.HLP;2                 1
FAIL.MANUAL;j              70
  .HELP;5                3
FILCHK.HELP;1                1
FILCOM.HLP;4                1
FILDMP.HELP;2               2
FILEX.HLP;1                  2
  .UPDATE;1               4
FLECS.HLP;1                  2
FORDDT.HLP;1                1
  .UPDATE;1                 2
FORMAT.HELP;1              3
FORTRA.HLP;1                1
FORTRAN.HELP;2              11
FTP.UPDATE;1               3
  .ANONYMOUS-ACCESS;1       3
GLOB.HLP;1                  1
  .UPDATE;1                 2
GRUMP.HELP;1                 1
GT40-LIGHTPEN.HELP;1         3
GT40-LIGHTPEN-IMPL.DOC;1      8
GT40-OMNI-MONITOR-DIRECTIONS.HELP;1
GT40-OMNIGRAPH.INFO;1         2
GT40/OMNI-MONITOR.DOC;2       2
GUEST-ACCESS-SUMEX.INFO;1      1
GUEST-LOGIN.HELP;1            1
HOW-TO-UPDATE-DOC.INFO;j
IDDT.HELP;1       i
ILISP.MANUAL;1             116
  .TENEX-MANUAL;1         49
  .HELP;2                  2
INSPEX.HLP;1                1
INTERROGATE.HELP;4            2
INTRO-TO-SUMEX-AIM-TENEX.INFO;5
ISAIL.HELP;1                 1
JSYS-INDEX.INFO;1            5
LEAP.MANUAL;j              15
LINK10.HLP;1                 2
  .UPDATE;j               8
LINK10-LOADER-DIFFERENCES.HELP;1



LISP.HELP;3                  2
  .UPDATE;5                4
LISP-FORTRAN-INTERFACE.HELP;2
LIST.HELP;3               6
LOADER.UPDATE;1              2
LPTSTS.HELP;1                2
MACRO.HLP;1                  1
  .UPDATE;3                11
MAILBOX.HELP;1               2
MAKLIB.HLP;1                1
MARK-MSGS.HELP;1              2
MATHLAB.HELP;3              4
MLAB.HELP;1               14
MSG.MANUAL;Q               17
  .UPDATE;3               5
MTCOPY.HLP;2                1
MULTI.HELP;1                1
NEW-SOS-TO-SUMEX-SOS-COMPARISON.HELP;3
NEW-VERSION-SOS.INTRO;143
  .MANUAL;1               31
  .SUPPLEMENT;144            20
NEWFILES.HELP;2              2
NEWINFO.HELP;1               2
NOTE.HELP;1                  2
OLDFILES.HELP;1              1
OMNIGRAPH-USER'S-GUIDE.INFO;1
OVERVIEW-OF-COMPUTER-SYSTEM.INFO;1
PAGESCAN.HELP;1              1
PCAL.HELP;3                 2
PIP.HLP;~                  1
  .UPDATE;2                10
PIP11.HELP;1                 2
PLOTTER.INFO;1               2
PNTMAK.HELP;1
POET.HELP;1       :
  .MANUAL;1              13
PROFIL.UPDATE;2              2
PROJECTS-AND-ASSOCIATED-USERS.INFO;69
PSEARCH.HELP;1               2
PUB.MANUAL;3              62
  .HELP;5               47
  .UPDATE;10              8
RADIX.HELP;1                1
RECOG.HELP;1                 2
RECORD.MANUAL;3             19
REDUCE.MANUAL;1             44
RPURGE.HELP;1                2
RTTY.HELP;1                 1
RUNOFF.HLP;1                 2
  .UPDATE;1              24
  .COMMANDS;1             3
  .HELP;1                  1
SAIL.HELP;2                 1
  .SUPPLEMENT;b           34
  .TENEX-SUPPLEMENT;2       7

2

94
2

7




  .BEGIN-MANUAL;2          57
SAMPLE.PUB;1                 2
SCAN.HELP;1                 1
SCROLL.MANUAL;Q             7
  .HELP;2                  2
SEARCH.MANUAL;j             6
  .INFO;6                  2
SEARCHDIR.HELP;1             1
SEARCHP.HELP;1                2
SEGSAV.HELP;1                1
SETTING-UP-NEW-USER-DIRECTORIES.INFO;1
SIMCOM.HLP;1                 2
SIMDDT.HLP;1                5
SIMRTS.HLP;1                 2
SIMULA.HLP;j                 2
SITBOL.HELP;1              18
SNDMSG.HELP;7              3
SNOBOL.MANUAL;1             40
SORT.HLP;1                  2
  .INFO;1                  1
SOS.HELP;5                7
  .MANUAL;1              54
SOUP.HLP;1                  2
  .MANUAL;2              13
SPELL.MANUAL;10             7
SPSS.MANUAL;1                11
SRCCOM.UPDATE;1
STP.MANUAL;1      E13
  .INDEX;1                  2
SUMEX-JSYS'S.INFO;2          12
SUPXEC.MANUAL;2             9
SYSDPY.HELP;1                2
SYSIN.HELP;1                 1
SYSTEM-SCHEDULE.INFO;4        2
TAPE.INFO;1                 1
TCTALK.MANUAL;2            28
  .SAMPLE;1                2
  .HELP;1                4
TECO.SAMPLE;1
  .COMMANDS;1      :
  .HELP;1                  1
  .TEXT1;1                7
  .TEXT2;1
  .SUMMARY;1      z
TEKTRONIX.HELP;1              1
TELNET.INFO;1                2
TENEX-EXEC-MANUAL-UPDATE.INFO;6
TERMINAL-LINKING.INFO;1       1
TIPCOPY.HELP;1               1
TMERGE.HELP;q               1
TRITAP.HELP;1                1
TV.UPDATE;12              9
  .MANUAL;9              26
TYMNET-INSTRUCTIONS.INFO;~    3
TYPEIN.HELP;2                2

16



UPCASE.HELP;1                1
USER-NAME-ADDRESS-PHONE.INFO;lO'J
VIEW.HELP;2                  2
WHOIS.HELP;1                1
XED.MANUAL;301             19
  .301-CHANGES;301          5
  .NEWS;301
XSEARCH.ALGORITHM;1    ;
  .HELP;2                9
XT.HELP;1                  1

20

220 FILES, 1779 PAGES



<BULLETINS> DIRECTORY LISTING

  The following is a listing of the <BULLETINS> directory which is a
repository of informal or transient information about the system,
subsystems, current events, and items of interest.

<BULLETINS>   13-MAY-76 08:20:41

FILE NAME        SIZE (COMPUTER PGS)

12-MAR-75.SYSLTR;1          3
AIM-WORKSHOP-ATTENDANCE.BBD;1 5
ARITHMETIC.BBD;1             1
ARPANET.BBD;1                2
BANK.BBD;1                  1
BASIC.BBD;1                4
BATCH.BBD;1                 1
BULLETINS.BBD;1               2
CALENDAR.BBD;4               1
COMPATIBILITY.BBD;1           1
CONSTANTS(PHYSICAL-OR-CHEMICAL).BBD;1 1
CRYPTION.BBD;1               2
DATA-MEDIA.BBD;1             1
DATACOMPUTER-SCHEDULE.BBD;1 1
DDT.BBD;1                   1
DFTP.BBD;1                  1
DO.BBD;1                    1
EDIT.BBD;1                  2
EMPLOYMENT-WANTED.BBD;1       1
EXCESS-BAGGAGE.BBD;1          1
FAIL.BBD;1                  1
FILE-MANAGEMENT.BBD;1        3
FILES.BBD;1                6
FORTRAN.BBD;1              19
FTP.CHANGES;1
GOOD-LISP-USAGE.BBD;2   i
GOOD-SYSTEM-USAGE.BBD;1       2
GUEST-LIST.BBD;2             1
ILISP.BBD;1                 1
IMP-PM-SCHEDULE.INFO;1        1
IN-WATS.BBD;1                1
JSYS.BBD;1                  1
KWIC.BBD;1                3
LEAP.BBD;1                  1
LIBRARY-SAIL.BBD;1           5
LINK10.BBD;1                1
LINKINC.BBD;1                1
LISP.BBD;2                   2
LIST.BBD;1                   2
LOGIN-CMD.BBD;1              1
LOGIN-MESSAGES.BBD;2        58
MACRO.BBD;1                 1



MEETINGS.BBD;1               1
MLAB.BBD;1                3
NEW-EXEC.INFO;g             9
PDP11-GT40.BBD;1             2
PLOT.BBD;1                3
PLOTTER.BBD;1                1
PROTECTION.BBD;1             1
PUB.BBD;1                  7
RECORD.BBD;1                 2
SAIL.BBD;1                 4
SEARCH.BBD;1                 2
SNDMSG-READMAIL.BBD;1         2
SORT.BBD;1                  1
SOS.BBD;1                   1
SPELL.BBD;1                  1
TECO.BBD;1                   2
TESTIMONIALS.BBD;1            1
TIMING.BBD;1               3
TYMNET.BBD;1                 1
  .NODE-DOWNTIME;1           1
TYMNET-RESPONSE-STATISTICS.JUN/75-APR/76;1       16
USER-INTERFACE-PROTOCOLS.BBD;1                   2
WORKSHOP-DEMO-SCHEDULE.INFO;1                   1

75 FILES, 293 PAGES



APPENDIX G

SUMEX-AIM SUMMARY FOR ARPANET RESOURCES HANDBOOK

   The following material was assembled as a description of the SUMEX-
AIM resource for the ARPANET RESOURCE HANDBOOK (NIC 23200), edited by E.
J. Feinler of the Network Information Center at Stanford Research
Institute.

(SUMEX-AIM)   Stanford University Medical EXperimental Computer

(FUNCTION)

SERVER   COMPUTER: PDP-10   HOST ADDR. 56   IMP 56/HOST 0

The SUMEX (Stanford University Medical Experimental Computer)
project is a cooperative computer resource involving participation
by the Biotechnology Resources Branch of the National Institutes
of Health and a variety of major research projects, many of which
are also supported by ARPA and which thereby are authorized for
access to the ARPANET.

SUMEX encompasses a dual mission:  1) supporting the development
of artificial intelligence (AI) computer science research with
special emphasis on biological and medical problems and 2) the
demonstration of computer resource sharing within a national
community.  The SUMEX resource resides administratively
within the Genetics Department of the Stanford University Medical
School and serves as a nucleus for a growing community of
projects, both within and external to Stanford.  SUMEX provides
computing facilities specifically tuned to the needs of AI
research and communication tools to facilitate inter- and
intra-group contacts as well as trial dissemination of research
products to medical users.  The project also develops tools for
and takes an active role in stimulating community relationships
among collaborating projects and medical researchers.  Currently
active projects span a broad range of application areas such as
clinical diagnostic decision-making, molecular
structure-interpretation, belief systems modeling, mental
function modeling, and instrument data interpretation.

(ADDRESS)

SUMEX-AIM Computer Project, TB105
Stanford University Medical Center
Stanford, California 94305

(PERSONNEL)

(PRINCIPAL-INVESTIGATOR)
Joshua Lederberg       (LEDERBERG@SUMEX)  (415) 497-5801

(LIAISON)
Richard Cower          (COWER@SUMEX)      (415) 497-5208

(SOFTWARE-CONTACT)
TENEX system:
   Andrew Sweer        (SWEER@SUMEX)      (415) 497-6707
Subsystems:
   Nancy Smith         (NSMITH@SUMEX)     (415) 497-6982

(HARDWARE-CONTACT)
Richard Cower          (COWER@SUMEX)      (415) 497-5208

(CONSULTANT)
Richard Kahler         (KAHLER@SUMEX)     (415) 497-5336

(SYSTEM-USE)

(POTENTIAL USERS)

For information and introductory literature contact:

Dr. Elliott Levinthal
AIM User Liaison
SUMEX Computer Project
c/o Department of Genetics, S047
Stanford University Medical Center
Stanford, California 94305

User projects are separately funded and autonomous in their
management and are selected for access to SUMEX on the basis
of their scientific merit as well as their
commitment to the community goals of SUMEX.   Procedures for
access to SUMEX are governed by a national advisory group.
GUEST access is provided only for limited demonstrations of
applications programs developed by the various SUMEX
projects.  Applications for GUEST access should be made
either to Dr. Levinthal (address above) or the Principal
Investigator of the particular project offering the program.

SUMEX-AIM does not sell computer time.

Long-term online storage is not available to network users.

(SERVICE-SCHEDULE)

SYSTEM DOWNTIME:

THUR -- 18:00 to 24:00 for preventive maintenance
SUN  --  6:00 to 10:00 for system backup

TYPICAL PRIME TIME LOAD = 22 users
MAX. NO. USERS = 50 users
NO. NETWORK SLOTS = 24
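
The downtime schedule above can be expressed as a small lookup. The following Python sketch is illustrative only and is not part of any SUMEX software: the names DOWNTIME_WINDOWS and in_scheduled_downtime are hypothetical, and the sketch assumes that the listed times are local SUMEX time and that the Thursday window runs up to midnight.

```python
from datetime import datetime

# Weekly downtime windows from the schedule above, as
# (weekday, start_hour, end_hour). Weekday numbers follow
# datetime.weekday(), so Thursday = 3 and Sunday = 6.
# All names in this sketch are illustrative assumptions.
DOWNTIME_WINDOWS = [
    (3, 18, 24),  # THUR 18:00 to 24:00, preventive maintenance
    (6, 6, 10),   # SUN   6:00 to 10:00, system backup
]


def in_scheduled_downtime(when: datetime) -> bool:
    """Return True if `when` falls inside a scheduled downtime window."""
    hour = when.hour + when.minute / 60
    return any(
        when.weekday() == day and start <= hour < end
        for day, start, end in DOWNTIME_WINDOWS
    )
```

A network user could call in_scheduled_downtime(datetime.now()) before attempting a connection; unscheduled outages are of course not covered.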


(LOGIN)

TELNET INFO:

. Appropriate transmission mode = Character-at-a-time

. Appropriate echo mode = Full-duplex; however, TENEX
will also operate in half-duplex.

. Mapping between NVT and local character set uses
the full ASCII character set; <CR-LF> received from
an NVT is passed to TENEX as New Line (Octal 37).

. TENEX EXEC does not distinguish between upper
and lower case alphabetics and will accept either.
At SUMEX the defaults are "no raise" and "lowercase".
If the "raise" command is given then lower case
characters received will be translated to upper case,
and echoed in upper case.   "No lowercase" causes
lower case characters to be sent to the terminal as
upper case.

. The user can declare his/her terminal type to EXEC as
follows:

[@]terALT[minal (type is)] TYPE <CR>

. TIP settings - t e 0, e r
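
The character handling described in the TELNET notes above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the TENEX implementation: the function name nvt_to_tenex and its raise_mode parameter are hypothetical, and only the two behaviors named in the text are modeled (the <CR-LF> to octal 37 mapping and the "raise" translation).

```python
TENEX_EOL = 0o37  # New Line character (octal 37) passed to TENEX


def nvt_to_tenex(data: bytes, raise_mode: bool = False) -> bytes:
    """Map an NVT byte stream to what TENEX would receive.

    Each <CR-LF> pair becomes a single octal-37 byte. With raise_mode
    set, lower case letters are translated to upper case, as after the
    "raise" command. The SUMEX default is "no raise", so raise_mode
    defaults to False. This is an illustrative sketch only.
    """
    out = data.replace(b"\r\n", bytes([TENEX_EOL]))
    return out.upper() if raise_mode else out
```

For example, nvt_to_tenex(b"login\r\n") yields the five letters followed by one octal-37 byte.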

USER INFO:

Free experimental use is not available.

.  USERNAME = user's last name

. ACCOUNT NAME = an assigned string

. PASSWORD = user's assigned password

LOGIN:

Full Duplex Login (default condition)

Connect to SUMEX-AIM, then type:
[@]login <SP> USERNAME <SP> PASSWORD <SP> ACCT-NAME <CR>
[job xx on tty xx date time]
[previous login: date time]
[other active jobs for this user if any]
[current systems messages if any]
[next scheduled downtime]
[execution of commands from <username>login.cmd if any]
[notification of the existence of new mail if any]

Half-Duplex Login


Connect to SUMEX-AIM, then type:
[@]half <CR>

[@]login <SP> USERNAME <SP> PASSWORD <SP> ACCT-NAME <CR>
[job xx on tty xx date time]
[previous login: date time]
[other active jobs for this user if any]
[current systems messages if any]
[next scheduled downtime]
[execution of commands from <username>login.cmd if any]
[notification of the existence of new mail if any]
[@]

Guest Login

*GUESTS are users authorized to run the various applications
*programs.  They are provided with a restricted version of
*the EXEC and restricted access to other resources not
*directly related to running the programs.  SNDMSG capability
*is available.

Connect to SUMEX-AIM, then type:
[@]guest <SP> LASTNAME <SP> GUEST-PASSWORD <CR>
[Checking guest registry ...]
[We would like to get some general information from you as
our guest.]
[(CTRL-X to redo the current prompt.)]
[Full name? (end with CR)]
NAME
[Affiliation? (end with CR)]
AFFILIATION
[Mailing address, phone number? (end with ^Z)]
ADDRESS
PHONE-NO.
CONTROL-Z
[From whom did you get the password? (end with CR)]
NAME
[Thank you.  If you come in as a guest again, and use the]
[name "NAME", you will skip these questions.]
[Type "ACCESS" to see what programs are currently available
to guests.]
[job xx on tty xx date time]
[previous login: date time]
[current system messages if any]
[next scheduled downtime]
[@]
[@;   ***** WELCOME TO SUMEX *****]
[@;  Please type HELP after the @ sign if you wish further
     information.]
[@;  NOTE: Say "RUN SNDMSG" to send messages as a guest.]

SUBSYSTEM INTERRUPT = CONTROL C

SUBSYSTEM CONTINUE = [@]con <CR> or [@]c <CR>


(LOGOUT)

LOGOUT:

[@]logout <CR>

[number of other active jobs for this user if any]
[killed job xx, user xyz, acct stuv, tty xx,
at DATE TIME used 0:0:0 in 0:0:0]

After logout,  a Network user should instruct his/her NCP to
close both connections.

AUTOLOGOUT:

Breaking Network connections does not log the user out;
however, his/her job becomes "detached".   If after 20 min.
the job has not been reattached (via the "attach" command),
the job is logged out.
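
The autologout rule above amounts to a simple timing check. The Python sketch below restates it under stated assumptions: the names DETACH_GRACE and job_state are hypothetical, and the sketch treats the 20 minute limit as exact, which the text does not specify.

```python
from datetime import datetime, timedelta
from typing import Optional

DETACH_GRACE = timedelta(minutes=20)  # grace period stated in the text


def job_state(detached_at: datetime, now: datetime,
              reattached_at: Optional[datetime] = None) -> str:
    """Classify a job whose network connection broke at detached_at.

    Illustrative only: a job reattached before the deadline is
    "attached"; otherwise it is "detached" until the deadline and
    "logged out" from then on.
    """
    deadline = detached_at + DETACH_GRACE
    if reattached_at is not None and reattached_at < deadline:
        return "attached"
    return "logged out" if now >= deadline else "detached"
```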

(CONTROL-CHARACTERS)

A few ASCII control characters are listed below:

Delete last character        CONTROL-A (or DEL key)
Delete previous word         CONTROL-W
Delete command or line       CONTROL-X
Abort print                  CONTROL-C
Retype edited command        CONTROL-R
Prompt or help               ?
Force recognition            ALTMODE
Is-system-still-there?       CONTROL-T

For a complete set of control characters available in TENEX
see the BBN TENEX EXEC Language Manual.  Note, however, that
at SUMEX the DELETE (RUBOUT) key deletes a single character;
it does not abort the entire type-in as it does in standard
TENEX.
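
The effect of the editing characters listed above can be illustrated with a short Python sketch. It is a simplified model, not the EXEC's actual behavior: the function name edit_line is hypothetical, a "word" is assumed to be a run of non-space characters, and only the three deletion characters (plus DEL as used at SUMEX) are handled.

```python
CTRL_A, CTRL_W, CTRL_X, DEL = "\x01", "\x17", "\x18", "\x7f"


def edit_line(typed: str) -> str:
    """Apply the SUMEX editing characters to a stream of keystrokes."""
    buf = []
    for ch in typed:
        if ch in (CTRL_A, DEL):       # delete last character
            if buf:
                buf.pop()
        elif ch == CTRL_W:            # delete previous word
            while buf and buf[-1] == " ":
                buf.pop()             # skip spaces after the word
            while buf and buf[-1] != " ":
                buf.pop()             # remove the word itself
        elif ch == CTRL_X:            # delete command or line
            buf.clear()
        else:
            buf.append(ch)
    return "".join(buf)
```

For instance, typing "systaa", then CONTROL-A, then "t" leaves "systat" in the buffer.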

(HELP)

All TENEX commands available to the user are documented
online.  The user may obtain a complete list of these commands
by typing:

[@]?

At any point where TENEX appears to expect a word or argument,
the user may type "?" and a list of allowable keywords or
arguments will be displayed.
Also, at SUMEX, help with using the system or the various
programs may be obtained by typing:

[@]help <CR>

(NETWORK-COMMANDS)

(LIST-ACTIVE-USERS)


Connect to SUMEX-AIM, then type:
[@]systat <CR>

(NETWORK-STATUS)

GENERAL STATUS
Connect and Login to SUMEX-AIM, then type:
  [@]netstat <CR>
  [*] <CR>

NETWORK TENEX LOAD STATUS
Connect to SUMEX-AIM, then type:
  [@]netload <CR>

(LINK-TO-ACTIVE-USERS)

To link to an active user on a given TENEX host,
connect to that host, then type:
[@]link <SP> ACTIVE USERID <CR>
        ACTIVE TTY-NO.<CR>(number - not word 'tty')
[@];...MESSAGE...<CR>
(NOTE: each line must start with a ';' and end with <CR>)
[@];...REPLY... <CR> (Text of reply)
[@]break <CR> (breaks link and returns user to EXEC)

To answer a link from an active user, type:
[link from smith, job x, tty nn]
[@;this is smith, how are you]
CONTROL-C (ONLY if not already in EXEC)
[@];...REPLY...<CR>

To refuse all links, type:
[@]refuse <CR>

Users from other TENEX sites can link to users at
SUMEX through the RSEXEC.  Users at SUMEX require special
authorization for access to RSEXEC to initiate links.

(SEND-MESSAGE)

Login to TENEX, then type:
[@]sndmsg <CR>

[to:] USERNAME@HOSTNAME <CR>
  (NOTE:  Here the user must actually type an "@"
  followed by the HOSTNAME of the recipient.
  If coming through a TIP, type two '@' signs.)
[cc:] USERNAME@HOSTNAME,USERNAME@HOSTNAME,etc. <CR>
[subject:] ...HEADER or TITLE........<CR>
[message:] .......TEXT...............<CR>
...(edit with control-A,Q,R,X or DEL).<CR>
...(call TECO to edit or.............<CR>
....insert file with CONTROL-B)......<CR>
......................end with CONTROL-Z
[q,s,?,carriage-return:] <CR> sends the mail at once
                         Q delivers mail later
                         ? lists other options

(GRIPE)

Login to SUMEX-AIM, then type:
[@]gripe <CR>
[griping on subject of] general <CR>
[message (? for help):]
......MESSAGE......
CONTROL-Z
[thank you for your comments]
[@]

(RETRIEVE-MESSAGE)

During TENEX login this statement will appear:
[you have a message]

Also, at SUMEX, at regular intervals, the message file
will be checked and this statement will appear:
[you have new mail]

To retrieve the message type:
[@]readmail <CR> <CR>
[;<FILENAME>message.txt;1 DATE TIME
SENDER]
[........TEXT........]

For interactive reading and deletion of selected
messages use the BANANARD program rather than READMAIL:
  [@]BANANARD <CR>
<- (help is available on-line by typing ?).

To delete all messages from your directory, type:
[@]delete message.txt <CR>
[@]expunge <CR>

(CONSULTATION)

TENEX offers two ways to send messages to system
programmers: the subsystems GRIPE and SNDMSG, each
obtained by typing the subsystem name to the EXEC.  Each
subsystem provides self-explanatory instructions.  GRIPE is
generally used for constructive criticism about a subsystem
or TENEX.  Gripes are low-level criticisms to which formal
responses are not generally made.  SNDMSG should be used for
direct questions to specific individuals.  If a user does
not know which individual to contact, a message can be sent
via SNDMSG to SUMEX@SUMEX.  The message file on this
directory will be read by the consultant (to be appointed
shortly) and redirected to the appropriate member of the
systems staff for action.

(COMMAND-SUMMARY)


To obtain a complete list of commands that exist in the
TENEX EXEC login to SUMEX, then type:
[@]?

TENEX commands are fully documented in the TENEX EXEC
Manual (Ref. 4).
General help is also available through the HELP program:
[@]help <CR>

(FILE-NAMING)

File names in TENEX are composed of five identifiers. These
are device, directory name, file name, extension, and version.
These five items uniquely identify any file accessible to a
user on the system. The device name identifies which device in
the system contains a given file. The directory name gives the
directory under which the file appears. The file name,
extension, and version identify a particular file in a given
directory.  Here is an example of a TENEX file name:

DSK:<SUBSYS>TECO.SAV;1
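
The five identifiers can be separated mechanically. The following Python sketch parses a fully specified name of the form shown in the example; it is illustrative only. The names TenexFileName and parse_tenex_name are hypothetical, and the sketch requires all five fields to be present, whereas TENEX itself supplies defaults for omitted fields.

```python
import re
from typing import NamedTuple


class TenexFileName(NamedTuple):
    device: str
    directory: str
    name: str
    extension: str
    version: int


# DEVICE:<DIRECTORY>NAME.EXTENSION;VERSION with all five fields present.
# This pattern is an assumption made for illustration, not the full
# TENEX file name syntax.
_PATTERN = re.compile(
    r"^(?P<device>[^:<>]+):<(?P<directory>[^<>]+)>"
    r"(?P<name>[^.;]+)\.(?P<extension>[^.;]*);(?P<version>\d+)$"
)


def parse_tenex_name(text: str) -> TenexFileName:
    """Split a fully qualified TENEX file name into its five parts."""
    m = _PATTERN.match(text)
    if m is None:
        raise ValueError(f"not a fully qualified TENEX file name: {text!r}")
    return TenexFileName(m["device"], m["directory"], m["name"],
                         m["extension"], int(m["version"]))
```

Applied to the example above, the parser returns device DSK, directory SUBSYS, file name TECO, extension SAV, and version 1.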

The HELP program contains pointers to general information
files available on-line as well as pointers to documentation
files for specific programs.  It also gives information on the
assignment of filenames to public files at SUMEX so that
they can be easily located.

(PROTOCOLS)

(SERVER)

Network Server Protocols currently implemented are:

1.  TELNET (Network Standard) (ICP to Socket 1 for old
   protocol TELNET or Socket 27 for new protocol TELNET)
   Establishes an NVT connection to TENEX.

2.  FTP (Network Standard) (ICP to Socket 3).
   Establishes a duplex connection to the File
   Transfer Protocol Server.  SUMEX supports anonymous
   access to a selected set of directories--login with
   username as ANONYMOUS and use lastname as the password.

3.  TENEX-TENEX (Private Standard) (Socket 105 octal)
   Used for ICP to file transfer service.

4.  MAXIM (Private Standard) (ICP to Socket 21).
   Transmits a TENEX 'maxim'.

5.  RSSER (Network Standard) (ICP to Socket 365 octal)
   Establishes a duplex connection to RSEXEC server
   process.

6.  RSEXEC (Network Standard) (ICP to Socket 367 octal).
   User is connected to the RSEXEC which may be used
   to access network news, host status, etc.

7.  DAYTIME (Private Standard) (ICP to Socket 15)
  Transmits day and time in full format.
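
For reference, the socket assignments above can be collected in a table. The Python sketch below is illustrative, and its names (SERVER_SOCKETS, describe) are hypothetical. The text marks only three entries as octal; this sketch assumes the remaining entries are octal as well, since 27, 21, and 15 octal equal 23, 17, and 13 decimal, the customary numbers for these services. That reading is an assumption, not a statement from the text.

```python
# ICP socket numbers from the list above, written as octal literals.
# Treating the unmarked entries (1, 27, 3, 21, 15) as octal is an
# assumption of this sketch; 1 and 3 are the same in either base.
SERVER_SOCKETS = {
    "TELNET (old protocol)": 0o1,
    "TELNET (new protocol)": 0o27,
    "FTP": 0o3,
    "TENEX-TENEX": 0o105,
    "MAXIM": 0o21,
    "RSSER": 0o365,
    "RSEXEC": 0o367,
    "DAYTIME": 0o15,
}


def describe(service: str) -> str:
    """Return the socket for `service` in both decimal and octal."""
    n = SERVER_SOCKETS[service]
    return f"{service}: socket {n} decimal = {n:o} octal"
```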

(USER)

1.  FTP (Network Standard)
   To access, type:
    [@]ftp <CR>
    [*]help <CR>

(NCP-INTERFACE-FROM-LOCAL-PROGRAMS)

The NCP is implemented within the TENEX file system, and
hence a Network connection appears to the assembly-language
programmer as a sequential file whose byte size is that of
the connection.  The usual file JSYSes - openf, closf, bin,
bout, etc. - are used to manipulate the connection.

Network connections are distinguished from other TENEX
files by their file names, in which local socket number,
and remote host and socket number are embedded.   See the
TENEX JSYS Manual for further information.

(RESOURCES)

(HARDWARE)

(COMPUTER)
  TYPE            CORE AMOUNT   CORE SPEED    WORD LENGTH
  (2) PDP-10 KI   256K          1 microsec    36 bits

(PERIPHERALS)
            HOW MANY   TYPE             MAKE                MODEL

(DISKS)     6          DOUBLE DENSITY   DEC                 RP03
(DRUMS)     2          FIXED HEAD       DIG. DEVEL. CORP.   A7312-D-8
(TAPES)     2          9 TRACK          DEC                 TU30
            2          DECTAPE          DEC                 TU56
(PRINTER)   1          ROTATING DRUM    DATA PRODUCTS       2410
(PLOTTER)   1          100 STEPS/INCH   CALCOMP             565

(TERMINALS)

  SUMEX supports a wide variety of terminals. The most
   commonly used display terminal is the DataMedia for which
   a specially designed keyboard is available to interface
   with the TV-editor used at SUMEX.  A variety of other
  software programs are being developed with special display
   features designed for use on the DataMedia terminal.

(OPERATING-SYSTEM)

TENEX is a virtual-memory operating system for a time-shared
DEC PDP-10 computer that provides a 256K word virtual address
space to each process.  It permits the creation and running of
hierarchies of interdependent processes, accommodates large
numbers of users, has extensive file system capabilities, and
offers a well human-engineered executive command language.  Most
programs written for the standard DEC monitor (10/50 code) run
directly.

SUMEX runs a special version of TENEX, modified for the KI-10
computer from the original BBN KA-10 version, to accommodate
the KI-10 paging hardware.  Preliminary modifications to
TENEX version 130 for the KI-10 were made by Dan Murphy at
DEC.  Under Rainer Schulz, that system was extensively
debugged and updated to version 131 (at the NASA Ames Research
Center) as well as substantially improved in throughput (at
Stanford:  Institute for Mathematical Studies in the Social
Sciences and SUMEX).  This version of KI-TENEX currently
operating at SUMEX has approximately twice the throughput
of KA-TENEX systems.  The staff is currently debugging a
dual processor version of TENEX.

SUMEX has a broad range of users including many computer
novices.  To facilitate a smooth interface with our community
of users, we have made a number of local modifications to
TENEX, particularly the EXEC.  Many of these are in the area
of user-settable options.  A complete list and description
of these modifications to the standard TENEX EXEC is
available online in <DOC>TENEX-EXEC-MANUAL-UPDATE.INFO.

(USER-PROGRAMS)

(APPLICATIONS PROGRAMS)

These offerings are expected to increase as our newer projects
become established.  A list of the current programs is
available to GUESTS by typing "ACCESS".

DENDRAL PROGRAMS:

CONGEN
------

(TYPE)

Chemistry:  Computer-Assisted Structure Elucidation

(CONTACT)

JOHNSON@SUMEX or SMITH@SUMEX

(DESCRIPTION)

Congen (CONstrained structure GENeration) accepts as input
known structural features of an unknown molecule (whose
elemental composition is known) and produces all
structural isomers consistent with these data.  The
features and constraints are entered in an interactive
session with the program and results can be drawn at a
terminal or further constraints added based on
examination of new data.  CONGEN represents an initial
version of a program for computer-assisted structure
elucidation.  The structure generator which underlies
CONGEN has been described (see Masinter et al.,
J. Amer. Chem. Soc., 96, 7702 (1973) and ibid., 7714
(1974)).

(ACCESS)

For GUESTS:      [@]congen <CR>
For Regular Users:  [@]sysin <dendral>congen <CR>

(DOCUMENTATION)

<DENDRAL>CONGEN.DOC

INTSUM
------

(TYPE)

Chemistry:  Mass Spectrometry

(CONTACT)

JOHNSON@SUMEX or SMITH@SUMEX

(DESCRIPTION)

Given a set of known,  related structures and the mass
spectrum corresponding to each structure, INTSUM suggests
possible fragmentation processes which resulted in the
observed ions, and then summarizes the results in terms of
processes which are general to the class of structures,
and those which are specific to certain members of the
class. (See Smith et al., Tetrahedron, 29, 3117 (1973)).

(ACCESS)

For GUESTS:       [@]intsum <CR>
For Regular Users:  [@]sysin <dendral>intsum <CR>

(DOCUMENTATION)

<DENDRAL>INTSUM.DOC

MOLION
------

(TYPE)

Chemistry:  Mass Spectrometry

(CONTACT)

JOHNSON@SUMEX or SMITH@SUMEX

(DESCRIPTION)

Molecular ion determination program.  Given a (low
or high resolution) mass spectrum in which the molecular
ion may or may not be present, this program suggests a
ranked list of candidate molecular ions.  (See G. Dromey,
B. G. Buchanan, D. H. Smith, J. Lederberg and C. Djerassi,
J. Org. Chem., 40, 770 (1975)).

(ACCESS)

For GUESTS:       [@]molion <CR>
For Regular Users:  [@]sysin <dendral>molion <CR>

(DOCUMENTATION)

<DENDRAL>MOLION.DOC

PLANNER
-------

(TYPE)

Chemistry:  Mass Spectrometry

(CONTACT)

JOHNSON@SUMEX or SMITH@SUMEX

(DESCRIPTION)

Planner infers possible structures of unknown compounds
(singly or as mixtures) given a mass spectrum and
fragmentation rules of the class of compounds to which the
unknown(s) presumably belongs. (See Smith et al.,
J. Amer. Chem. Soc., 94, 5962 (1972); ibid., 95, 6078
(1973)).

(ACCESS)

For GUESTS:      [@]plan <CR>
For Regular Users:  [@]sysin <dendral>plan <CR>

(DOCUMENTATION)

<DENDRAL>PLAN.DOC


MYCIN PROGRAMS:

MYCIN
-----

(TYPE)

Interactive Consultation for Infectious Disease Patients

(CONTACT)

SCOTT@SUMEX

(DESCRIPTION)

MYCIN is an interactive program which utilizes data
available from the microbiology and clinical chemistry
laboratories, plus the physician's response to
computer-generated questions, to provide physician
nonspecialists with consultative advice on diagnosis and
antimicrobial therapy for patients with bacterial
infections.  The program also has interactive capabilities
allowing the user to request explanations of the
consultation program's actions and reasoning, to ask about
MYCIN's static knowledge base or about specific
conclusions made during the consultation, and to teach
the program by entering new pieces of judgmental
knowledge.

(ACCESS)

For GUESTS:      [@]mycin <CR>
For Regular Users:  [@]<mycin>mycin <CR>

(DOCUMENTATION)

<MYCIN>MYCIN.DOC

HIGHER MENTAL FUNCTIONS MODELING PROGRAMS:

PARRY
-----

(TYPE)

Interactive simulation of paranoid thought processes

(CONTACT)

COLBY@SUMEX or FAUGHT@SUMEX or PARKISON@SUMEX

(DESCRIPTION)

PARRY is an interactive simulation of a model of paranoid
thought processes.  The purpose of the user (usually a
clinical psychiatrist) is to conduct a first interview
with the patient (PARRY) and obtain a diagnosis.   The
interview usually lasts 40-60 input/output pairs.
The program is divided into two modules: the recognizer
module and the response module.  The recognizer accepts
natural language input expressions (English) and determines
their semantic content.  The response module uses the
model's inference, affect, and intention mechanisms
to determine the appropriate response.

(ACCESS)

For GUESTS:      [@]parry <CR>
For Regular Users:  [@]<parry>parry <CR>

(DOCUMENTATION)

<PARRY>PARRY.DOC

GENERAL UTILITIES PROGRAMS FROM NIH:

MLAB
----

(TYPE)

Mathematical Modeling

(CONTACT)

JOHNSON@SUMEX

(DESCRIPTION)

MLAB stands for MODELAB which stands for "modeling
laboratory". It is a program which allows one to do
scalar and matrix computations, curve-fitting, and
differential equation solving. Graphic facilities are
also provided.  This program was written by, and obtained
from, Gary Knott at DCRT/NIH.

(ACCESS)

For GUESTS:       [@]mlab <CR>
For Regular Users:  [@]mlab <CR>

(DOCUMENTATION)

<SYSSUP>MLABD.TXT (also available as interactive help)

OMNIGRAPH
---------

(TYPE)

Terminal Independent Graphics Software

(CONTACT)

JOHNSON@SUMEX

(DESCRIPTION)

Omnigraph is a graphics subroutine package, designed by
Robert F. Sproull while at DCRT/NIH, for driving a number
of different display devices with either SAIL or FORTRAN
as the programming language.  The Omnigraph system is
designed for routine graphics applications, not for
high-performance terminals.  Terminals which can be used
with this package include DEC GT/40, DEC 340,
Tektronix 4010, Computek, and the Ards.

(ACCESS)

Used by linking appropriate language dependent code in
with the program which incorporates the Omnigraph calls.
Terminal type is defined at run-time, and appropriate code
is loaded by the system.  Language dependent code segments
are available as LIB:DISSAI for SAIL, LIB:DISFOR for
F40 Fortran, and as LIB:DISF10 for FORTRAN-10.

(DOCUMENTATION)

<DOC>OMNIGRAPH-USER'S-GUIDE.INFO

RUTGERS RESEARCH RESOURCE ON COMPUTERS IN BIOMEDICINE:
[To be supplied later]

(LOCAL SOFTWARE PROGRAMS)

SUMEX is willing to provide copies of any of its non-contract
software programs to other interested sites.  The following
is a list of those programs which originated at IMSSS or
SUMEX with SUMEX as their network distribution point that
should be of particular interest to other TENEX sites. To
facilitate the most efficient distribution of these programs,
we request that any site desiring one of the programs appoint
a single person to serve as the local contact with SUMEX for
that program.  The designated person should communicate
directly with the author of the program (listed below). In
this way, we can ensure that the interested site receives a
correct working version of the program as well as any updates
as they become available.  We also request that any
subsequent comments or bug reports be channeled to SUMEX
through the local contact.  If local modifications need to
be made, we will be happy to incorporate them under
conditional compilation switches in the master sources,
thereby simplifying the procedure of moving to updated
versions in the future.  We cannot be responsible for
maintaining versions of these programs obtained in any other
manner.

TENEX SAIL:
----- -----

SAIL, which was developed at SU-AI, is an ALGOL-like language
with complex data- and control-structure extensions for
artificial intelligence research; it compiles reasonably
efficient code.  Robert Smith at IMSSS has TENEXized SAIL.
It has been given complete access to the facilities provided
by TENEX, and a number of new features (random input-output,
interrupt system) have been added.  These changes were
merged into the master files at SU-AI, ensuring the integrity
of the SAIL software.

SUMEX and IMSSS are currently organizing a SAIL library.
Anyone interested in contributing routines should contact
DANIELS@SUMEX.

A number of utilities to support SAIL usage are also
available at SUMEX including:

BAIL   written by John Reiser at SU-AI which is an
    interactive runtime debugging package

PROFIL written by D. Sweet which gives the frequency of
     execution of SAIL statements

FILCHK written by R. Smith which checks SAIL programs for
    loader incompatibilities

FORMAT written by R. Smith which adds a table of contents
    and optional index to SAIL programs

Contact RSMITH@SUMEX for details.

TENEX UCI-LISP:
----- ---------

UCI-LISP is an extension of LISP 1.6 with many new features.
It has been TENEXized at IMSSS by Tom Wolpert.  The pmapped
IO is extremely fast. It also includes the edit and break
packages of 1972 INTERLISP.  TENEX UCI-LISP is approximately
6 times faster than INTERLISP.  Complete documentation is
available.  Contact WOLPERT@SUMEX for details.

MACHINE-INDEPENDENT SAIL:
------------------- -----

A machine-independent compiling system is being developed for
a major subset of the SAIL language by Clark Wilcox at SUMEX.
Compilers have been created for a number of machines,
including the PDP-11, PDP-10, IBM-SYSTEM/360, and NOVA. The
compiler and much of the runtime system are written in SAIL,
so that the system is portable.  This system is still
developmental but will be available in the near future.
Inquiries should be addressed to WILCOX@SUMEX-AIM.

RECORD:
-------

The RECORD program by R. Smith creates pseudo-teletype jobs.
It requires the pseudo-teletype code developed at AMES, which
is not supported in the standard BBN TENEX system.
It can be used for running 3 jobs simultaneously from the
same terminal with easy switching back and forth.  It
optionally makes a typescript of the entire session on the
pseudo-teletype, which is useful for preparing
documentation on program use, keeping a record of
applications users, recording the action of program bugs,
etc.  RECORD also allows for detaching from running jobs, with
a very large buffer for storing program output until the
job is reattached.  Contact RSMITH@SUMEX for details.

TV:
---

TV is a display-oriented editor for use on DATAMEDIA, TEC,
IMLAC and several other display terminals.  It was written by
Pentti Kanerva at IMSSS and is a relative of several other
TV-editors developed at IMSSS and SU-AI.  It provides many
features for updating text files, such as good cursor control,
user-defined macros, and string searching.   The latest
version of TENEX SAIL is required to compile the editor.
Contact P. Kanerva through NSMITH@SUMEX for specifics on
which terminals are or could be supported by the addition
of a terminal-dependent front end for the editor.   An older
version of the editor is also available for non-TENEX PDP-10
sites.

PUB Macros Package and Documentation:
--- ------ ------- --- --------------

PUB is a document preparation language written by Larry
Tesler, formerly of SU-AI and currently at XEROX-PARC.
A complete set of PUB macros for easy use of PUB has been
written by Nancy Smith at SUMEX, with a full manual describing
the use of this package.  The macros are designed to let
relatively inexperienced computer users produce
interestingly formatted documents.  The package is designed
for a non-XGP TENEX site.
Contact NSMITH@SUMEX for details.

DDT:
----

Robert Smith at IMSSS has added single-stepping to TENEX DDT.
Contact RSMITH@SUMEX for details.

(NEW DEC RELEASES WITH TENEX MODIFICATIONS)

SUMEX has purchased the latest FORTRAN10 package from DEC and
modified both the programs and PA1050 to get it running on
TENEX.  SUMEX, of course, cannot share the sources but would
be happy to share the modifications with any other TENEX
sites who are DEC-authorized users of the new FORTRAN10.
The following is a partial list of the late release DEC
software running at SUMEX:

FORTRAN10   Version 4
F40         Version 27
LINK10      Version 2A
MACRO       Version 50
BLIS10      Version 5
RUNOFF      Version 10

(OTHER LICENSED SOFTWARE)

SITBOL (Stevens Institute of Technology version of SNOBOL)

(STANDARD SOFTWARE PROGRAMS)

The following are the major standard TENEX and/or DEC
programs routinely offered at PDP-10 sites which are available
at SUMEX, plus an assortment of programs which we have
obtained from other network sites.  In some cases, local
modifications have been made. The documentation for these
programs can be located with the assistance of the HELP
program.

ADDMSG     DCHANGE    FRKCOM     MACRO      REDUCE     SYSIN
AID        DDT        FTP        MAILSTAT   RUNFIL     TALK
BANANARD   DED        FUDGE2     MAILSYS    RUNOFF     TAPCNV
BASIC      DELVER     GRIPE      MULTI      SAIL       TCTALK
BCPL       DUMPER     HOSTAT     NETSTAT    SDDT       TECO
BINCOM     F40        IDDT       PA1050     SNDMSG     TTYTRB
BLIS10     FAIL       ILISP      PCSAMP     SNOBOL     TTYTST
BSYS       FED        INTERLISP  PIP        SORT       TYPBIN
CALENDAR   FILCOM     LD         POET       SOS        TYPREL
CAM        FILDMP     LINK10     PPL        SPELL      UDDT
CCL        FILES      LOADER     PUB        SRCCOM     WATCH
COPYM      FILEX      LOADGT     RD         SUBMIT     XED
CREF       FORTRAN10  LOADVT     READMAIL

(LOCAL UTILITIES PROGRAMS)

SUMEX also has a number of local utilities programs which
we would be happy to share with other sites but for which
we are unable to provide maintenance or guaranteed further
support.  (Many of these programs are from IMSSS).

2SIDES    makes files for multi-column & 2-sided listing
ACCESS    gives a list of programs available to guests
BACKUP    short term file loss protection
BUDGET    budget preparation program
CARDS     creates online "card catalog" for a library
CLEAN     file by file directory clean-up
CRYPT5    en/decrypts text files
DCHECK    reads blocks of file into core & examines with DDT
DIREXT    lists directory ordered by file extension
DIRNUM    translates directory name to no. for DEC programs
DO        creates or appends a line to a reminder file
DONE      deletes a line from a reminder file
FREQ      ranks words in text file according to frequency
LOWCASE   converts a text file to lowercase
NEWFILES  prints info on all user's files written in 24 hrs.
NEWINFO   prints info on all public files written in 24 hrs.
NON       zero-compresses, removes linenos., pagemarks, etc.
PERUSE    allows fast reading of random parts of a file
RPURGE    interactive selection of files to purge--records
          write/create/purge date--optional comments on file
RTTY      types out a file starting at the end (reverse)
SEARCHDIR substring search of directory information--like
          wildcard for names--also searches on author & date
SEARCHP   substring search of multi-files with PERUSEing of
          random parts of the file
TBASIC    TENEXized version of Dartmouth Basic
TMERGE    merges specified pages from file(s) into new file
TODAY     lists the contents of today's reminder file
UPCASE    converts a text file to uppercase
WHAT      lists the contents of a reminder file
WHOIS     looks up username & prints name/address info
XSEARCH   very fast substring search of multiple files with
          optional production of "hit" list for TV-editor
Z         logout from lower forks.
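
As an illustration of the kind of task FREQ performs (ranking the words
of a text file by frequency), a minimal modern sketch in Python follows.
It is not the SUMEX program: the function name rank_words, the
tokenizing rule, and the tie-breaking order are all assumptions made for
this example.

```python
import re
from collections import Counter

def rank_words(text):
    """Return (word, count) pairs, most frequent first.

    Assumption: a "word" is a run of letters or apostrophes, compared
    case-insensitively; ties are broken alphabetically.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    sample = "the model and the data and the rules"
    for word, count in rank_words(sample):
        print(f"{count:4d}  {word}")
```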

(INTERESTS)

(SUMEX STAFF)

The interests of the SUMEX staff center around two themes:
1) providing easy access to the system for a community of remote
users with widely differing computer experience, including medical
professionals with no previous computer experience, and
2) developing means to facilitate communication and resource
sharing among the various projects.  Therefore, areas of
interest and program development include:  message sending and
reading facilities, creation of an on-line bulletin board,
organization of on-line documentation with a help system for
easy access to the material, development of libraries of program
routines, acquisition of available utilities packages, and study of
techniques for acquiring and employing user models in all of these
areas.

(SUMEX PROJECTS AND THEIR PRINCIPAL INVESTIGATOR(S))

(STANFORD)

DENDRAL

Prof. C. Djerassi (Chemistry)
Prof. J. Lederberg (Genetics)
Prof. E. Feigenbaum (Computer Science)

MYCIN

Prof. S. Cohen, M.D. (Pharmacology)
Dr. B. Buchanan (Computer Science)

PROTEIN CRYSTALLOGRAPHY MODELING
Dr. S. Freer (Chemistry, U.C. San Diego)
Prof. E. Feigenbaum (Computer Science, Stanford)
Dr. R. Engelmore (Computer Science, Stanford)

(NATIONAL)

COMPUTER MODEL OF DIAGNOSTIC LOGIC (DIALOG) -- U. of Pittsburgh
Dr. H. Pople
J. Myers, M.D.

HIGHER MENTAL FUNCTIONS MODELING -- UCLA
Kenneth M. Colby, M.D.

MEDICAL INFORMATION SYSTEMS LABORATORY (MISL) -- U. of Illinois
                                                 at Chicago Circle
Dr. B. McCormick
M. Goldberg, M.D.

RUTGERS RESEARCH RESOURCE ON COMPUTERS IN BIOMEDICINE -- Rutgers U.
Prof. S. Amarel

There are also a number of pilot-projects both at Stanford
and nationally.

(DOCUMENTATION)

(PROGRAMS)

A list of the available on-line documentation of programs and
general information files describing the SUMEX system and policies
is available by running the HELP program.  A list of the
hardcopy documentation available and procedures for obtaining copies
is contained in <DOC>A-LIST-OF-AVAILABLE-DOCUMENTATION.INFO.

(PROJECTS)

The following is a partial bibliography of research papers by the
various projects.  More complete bibliographies can be obtained by
contacting the individual project leaders.

DENDRAL:
1.  D. H. Smith, L. M. Masinter, and N. S. Sridharan, "Computer
Representation and Manipulation of Chemical Information",
W.T. Wipke, S. Heller, R. Feldmann, and E. Hyde, Eds., John Wiley
and Sons, Inc., 1974, p. 287.

2.  R. E. Carhart et al., "Applications of Artificial Intelligence
for Chemical Inference.  XVIII.  An Approach to Computer-Assisted
Elucidation of Molecular Structure", J. Amer. Chem. Soc.,
in press, Sept. 1975.

3.  Duffield et al., "Applications of Artificial Intelligence
for Chemical Inference.  II.  Interpretation of Low Resolution
Mass Spectra of Ketones", J. Amer. Chem. Soc., 91,
2977 (1969).

4.  R. E. Carhart et al., "Networking and a Collaborative
Research Community:  A Case Study Using the Dendral Programs",
to appear in the Proceedings of the Amer. Chem. Soc., Aug. 1975.

HIGHER MENTAL FUNCTIONS MODELING:

1.  W. S. Faught, K. M. Colby, R. C. Parkison, "The Interaction
of Inferences, Affects, and Intentions in a Model of Paranoia",
AIM-253, December 1974, Stanford AI Laboratory.

2.  K. M. Colby, R. C. Parkison, B. Faught, "Pattern-Matching
Rules for the Recognition of Natural Language Dialogue
Expressions", AIM-234, Stanford Artificial Intelligence
Laboratory, Stanford, California, June 1974.

MYCIN:

1.  E. H. Shortliffe, F. Rhame, et al., "MYCIN, A Computer
Program Providing Antimicrobial Therapy Recommendations",
Clinical Research, vol 23, p 107A (abstract) 1975.

2.  E. H. Shortliffe, S. G. Axline, B. G. Buchanan, S. N. Cohen,
"Design Considerations for a Program to Provide Consultations in
Clinical Therapeutics".  Presented at San Diego Biomedical
Symposium 1974 (February 6-8, 1974).

3.  E. H. Shortliffe, R. Davis, S. G. Axline, B. G. Buchanan,
C. C. Green, and S. N. Cohen, "Computer-Based Consultations in
Clinical Therapeutics:  Explanation and Rule Acquisition
Capabilities of the MYCIN System", to appear in Computers and
Biomedical Research, June 1975.

4.  E. H. Shortliffe, "MYCIN, A Rule Based Computer Program...",
STAN-CS-74-465, Computer Science Department, Stanford University,
1974.
5.  E. H. Shortliffe, S. G. Axline, B. G. Buchanan, T. C. Merigan,
and S. N. Cohen, "An Artificial Intelligence Program to Advise
Physicians Regarding Antimicrobial Therapy", Computers and
Biomedical Research, 6 (1973), 544-560.

6.  E. H. Shortliffe and B. G. Buchanan, "A Model of Inexact
Reasoning in Medicine", July 1974. To appear in Mathematical
Biosciences.

RUTGERS RESEARCH RESOURCE ON COMPUTERS IN BIOMEDICINE:

1) C.A. Kulikowski, "Computer-Based Systems for Vision Care",
Proc.  IEEE Intercon, April 1975.

2) S. Amarel, "Computer-Based Modeling and  Interpretation in
Medicine and Psychology:  The Rutgers Research Resource",
Federation Proceedings, Vol. 33, No. 12.

3) C. A. Kulikowski, "Computer-Based Medical Consultation--A
Representation of Treatment Strategies", Proc. Hawaii
International Conf. on Systems Science, January 1974.

4) C. A. Kulikowski, "A System for Computer-Based Medical
Consultation", Proc. National Computer Conference, Chicago,
May 1974.

5) S. Amarel, "Inference of Programs from Sample Computations",
Proceedings of NATO Advanced Study Institute on Computer Oriented
Learning Processes, Bonas, France.

6) B. Bruce, "A Logic for Unknown Outcomes", Notre Dame
Journal of Formal Logic.

7) S. Chokhani and C. A. Kulikowski, "Process Control Model for
the Regulation of Intraocular Pressure and Glaucoma", Proc. IEEE
Systems, Man and Cybernetics Conf., Boston, November 1973.

8) C. F. Schmidt and J. D'Addamio, "A Model of the Common Sense
Theory of Intension and Personal Causation", Proc. of the 3rd
IJCAI, August 1973.

9) C. F. Srinivasan, "The  Architecture of a Coherent
Information System:  A General Problem Solving System", Proc.
of the 3rd IJCAI, August 1973.


APPENDIX H

AIM MANAGEMENT COMMITTEE MEMBERSHIP

   The following are the membership lists of the various SUMEX-AIM
management committees at the present time:

AIM EXECUTIVE COMMITTEE:
-----------------------
-----------------------

LEDERBERG, Dr. Joshua (LEDERBERG)     (Chairman)
    Department of Genetics, S331
    Stanford University Medical Center
    Stanford, California 94305
     (415) 497-5801

AMAREL, Dr. Saul (AMAREL)
     Department of Computer Science
     Rutgers University
    New Brunswick, New Jersey 08903
     (201) 932-3546

BAKER,  Dr. William R., Jr. (BAKER) (Executive Secretary)
    Biotechnology Resources Program
     National Institutes of Health
     Building 31, Room 5B25
     9000 Rockville Pike
    Bethesda, Maryland 20014
     (301) 496-5411

LINDBERG, Dr. Donald (LINDBERG)     (Adv Grp Member)
     605 Lewis Hall
     University of Missouri
     Columbia, Missouri 65201
     (314) 882-6966


AIM ADVISORY GROUP:
------------------
------------------

LINDBERG, Dr. Donald (LINDBERG)     (Chairman)
     605 Lewis Hall
     University of Missouri
     Columbia, Missouri 65201
     (314) 882-6966

AMAREL, Dr. Saul (AMAREL)
     Department of Computer Science
     Rutgers University
     New Brunswick, New Jersey 08903
     (201) 932-3546

BAKER,  Dr. William R., Jr. (BAKER) (Executive Secretary)
     Biotechnology Resources Program
     National Institutes of Health
     Building 31, Room 5B25
     9000 Rockville Pike
    Bethesda, Maryland 20014
     (301) 496-5411

BOBROW, Dr. Daniel G.  (BOBROW)
      Xerox Palo Alto Research Center
     3333 Coyote Hill Road
    Palo Alto, California 94304
     (415) 494-4438

FEIGENBAUM, Dr. Edward (FEIGENBAUM)
      Serra House
     Department of Computer Science
     Stanford University
    Stanford, California 94305
     (415) 497-4878

FELDMAN, Dr. Jerome (FELDMAN)
     Department of Computer Science
     University of Rochester
     Rochester, New York
     (716) 275-5671

LEDERBERG, Dr. Joshua (LEDERBERG)     (Ex-officio)
     Principal Investigator - SUMEX
     Department of Genetics, S331
     Stanford University Medical Center
    Stanford, California 94305
     (415) 497-5801

MILLER, Dr. George (GMILLER)
     The Rockefeller University
     1230 York Avenue
     New York, New York 10021
     (212) 360-1801

REDDY, Dr. D. R.  (REDDY)
     Department of Computer Science
     Carnegie-Mellon University
     Pittsburgh, Pennsylvania
     (412) 621-2600, Ext. 149

SAFIR, Dr. Aran (SAFIR)
    Department of Ophthalmology
     Mount Sinai School of Medicine
     City University of New York
     Fifth Avenue and 100th Street
     New York, New York 10029
     (212) 369-4721

STANFORD COMMUNITY ADVISORY COMMITTEE:
-------------------------------------
-------------------------------------

LEDERBERG, Dr. Joshua (LEDERBERG)     (Chairman)
     Principal Investigator - SUMEX
    Department of Genetics, S331
     Stanford University Medical Center
    Stanford, California 94305
     (415) 497-5801

COHEN, Stanley N., M.D.  (COHEN)
      Division  of Clinical Pharmacology, S169
     Stanford University Medical Center
    Stanford, California 94305
     (415) 497-5315

DJERASSI, Dr. Carl
     Department of Chemistry, Stauffer I-106
     Stanford University
    Stanford, California 94305
     (415) 497-2783

FEIGENBAUM, Dr. Edward (FEIGENBAUM)
      Serra House
     Department of Computer Science
     Stanford University
    Stanford, California 94305
     (415) 497-4878

LEVINTHAL, Dr. Elliott C.  (LEVINTHAL)
     Department of Genetics, S047
     Stanford University Medical Center
    Stanford, California 94305
     (415) 497-5813


APPENDIX I

USER INFORMATION - GENERAL BROCHURE

Revised May 1976

   The Stanford University Medical Experimental Computer (SUMEX) was
established in January, 1974, to constitute the first national shared
computing resource for medical research.  An innovative effort to help
biomedical scientists meet today's research requirements and to explore
computer applications in many health fields ranging from basic research to
bedside care, SUMEX is directed by Professor Joshua Lederberg, Chairman of
Stanford's Department of Genetics.  The project has been funded by a grant
from the Division of Research Resources of the National Institutes of
Health (Biotechnology Resources Program) for an initial term that expires
in July, 1978.

   At present, SUMEX consists of a powerful time-shared dual processor
DEC-10 computer system.  It is available to approved users throughout the
United States over computer communications networks.  The project's goals
for its present 5-year term are: 1) the encouragement of applications of
artificial intelligence in medicine (AIM), and 2) the managerial,
administrative and technical demonstration of a nationally-shared
technological resource for health research.

   Such a resource offers scientists both a significant economic
advantage in sharing expensive instrumentation and a greater opportunity
to share ideas about their research.  This is especially timely in
computer science, a field whose intellectual and technological complexity
tends to nurture relatively isolated research groups.  Each group may then
tend to pursue its own line of investigation with limited convergence on
working programs available from others.  In this respect, computer
applications have demonstrated less mutual incremental progress from
diverse sources than is typical of other sciences.  The SUMEX-AIM project
seeks to reduce these barriers to scientific cooperation in the field of
artificial intelligence applied to health research.

ARTIFICIAL INTELLIGENCE

   The term "artificial intelligence" (AI) refers to research efforts
aimed at studying and mechanizing information-processing tasks that have
required the application of some degree of human intelligence.
Controversial speculations on how far this eventually may lead only
distract from pragmatically useful applications of currently feasible art.
The current emphasis in the field is to understand the underlying
principles of a) efficient acquisition and utilization of material
knowledge, and b) the programmed representation of conceptual abstractions
in reasoning, deductive, and problem-solving activities.  At present,
these are far more specialized and inflexible than human intellectual
functions; however, in special domains they may be of comparable or
greater power, e.g., in the solution of formal problems in organic
chemistry or in the integral calculus.

   AI systems are characterized by complex computational processes that
are primarily non-numeric, e.g., graph-searching and symbolic pattern
analysis.  They involve procedures whose execution is controlled by
diverse types and forms of knowledge about a given task domain, such as
models, fragments of "advice", and systems of constraints or heuristic
rules.  Unlike conventional algorithms commonly based on a well-tailored
method for a given task, AI procedures typically use a multiplicity of
methods in a highly conditional manner--depending on the specific data in
the task and a variety of sources of relevant information.  The tangible
objective of this approach is the practical development of computer
programs which, using formal and informal knowledge together with
mechanized hypothesis formation and problem-solving procedures, will offer
more general and effective consultative tools for the clinician and
medical scientist.  Contexts in which experimental data already are
acquired by machine may offer even richer opportunities.

   Each authorized project in the SUMEX-AIM community is concerned in
some way with the application of these principles to medical and health
research problems.  This type of "intelligent" assistance by computer
program is perhaps best illustrated by the following brief descriptions of
a selected sample of SUMEX-AIM projects.

DENDRAL

   The DENDRAL project at Stanford,  under the direction of Professor
Lederberg, Genetics; Professor Edward Feigenbaum, Computer Science; and
Professor Carl Djerassi, Chemistry, is aimed at assisting the biochemist
in interpreting molecular structures from spectroscopic, physical and
chemical information.  In cases where the characteristic spectra of a
compound are not catalogued in libraries, the DENDRAL programs carry out
the rather laborious processes a chemist must go through to interpret the
spectrum from "first principles".  One of the DENDRAL programs, CONGEN
(for CONstrained structure GENeration),  is an interactive program designed
to assist the chemist in the enumeration of structural isomers, based on
inferences about structural features of an unknown compound.   These
inferences, whether obtained from physical,  chemical or spectroscopic
data, are supplied to CONGEN as structural fragments and related
information, using a standard computer terminal connected to SUMEX-AIM.
The program uses atoms and superatoms (non-overlapping structural
fragments known to be present in the molecule) to construct structures;
the procedure is restricted by a variety of constraints on desired and
undesired substructures and ring systems.  There is no direct algorithmic
path available to determine such a molecular structure from the spectral
data--only the inferential process of hypothesis generation and testing
within the domain of reasonable solutions defined by a knowledge of
organic and physical chemistry.
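
   The generate-and-test idea described above can be illustrated with a
deliberately small Python sketch.  It is not CONGEN: it ignores bonding,
valence and ring systems entirely, and only enumerates multisets of
fragments whose atoms add up to a given molecular formula, then filters
them by required and forbidden fragments.  The fragment table and the
function names candidates and constrained are assumptions made for this
example.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical fragment table: name -> atom composition.
# A toy stand-in for "atoms and superatoms"; connectivity is ignored.
FRAGMENTS = {
    "CH3": Counter(C=1, H=3),
    "CH2": Counter(C=1, H=2),
    "OH": Counter(O=1, H=1),
    "C=O": Counter(C=1, O=1),
}

def candidates(formula, max_fragments=6):
    """Generate step: yield every multiset of fragments matching the formula."""
    names = sorted(FRAGMENTS)
    for size in range(1, max_fragments + 1):
        for combo in combinations_with_replacement(names, size):
            total = Counter()
            for name in combo:
                total += FRAGMENTS[name]
            if total == formula:
                yield combo

def constrained(formula, required=(), forbidden=()):
    """Test step: keep candidates with all required and no forbidden fragments."""
    return [
        combo
        for combo in candidates(formula)
        if all(r in combo for r in required)
        and not any(f in combo for f in forbidden)
    ]

if __name__ == "__main__":
    c2h6o = Counter(C=2, H=6, O=1)
    print(constrained(c2h6o))
    print(constrained(c2h6o, forbidden=("OH",)))
```

   Even in this toy form, the constraints do the real work: the
generator proposes every combination it can, and the knowledge supplied
by the user prunes the list to the plausible ones.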


   This process, as implemented in the computer, is a simplified
example of the cycle of inductive hypothesis--deductive verification that
is often taught as a model of the scientific method.  (Whether this is a
faithful description of contemporary science is arguable, and how it may
be implemented in the human brain is unknown.  Regardless, these are
useful leads rather than absolute preconditions for the pragmatic
improvement of mechanized intelligence for more efficient problem-
solving.) The elaboration of these approaches with existing hardware and
software technologies is the most promising approach to enhancing the
application of computers to the vaguely structured problems that dominate
our task domains.

   A new pilot project, MOLGEN, has been motivated by the success of
the DENDRAL effort.  MOLGEN uses similar paradigms in an effort to
mechanize experiment-planning in molecular genetics, particularly work on
DNA structure and inter-species transfer being conducted in Professor
Lederberg's laboratory.  Whereas the DENDRAL goal was a hypothesis (i.e.,
a chemical structure) to explain a set of experimental data, MOLGEN begins
with a stipulated DNA structure and seeks suggested experiment plans that
could either falsify or validate the asserted structure.  At present, this
entails a substantial effort in representing existing knowledge of
experimental techniques (i.e., enzyme specificities, electron-microscopy,
electrophoresis) and the physical biochemistry of DNA.

THE RUTGERS PROJECT
COMPUTERS IN BIOMEDICINE

   Professor Saul Amarel, a Rutgers University computer scientist,
directs several research efforts designed to introduce advanced methods in
computer science--particularly in artificial intelligence and interactive
data base systems--into specific areas of biomedical research.

   For example, a group led by Professor Casimir Kulikowski is
developing computer-based consultation systems for diseases of the eye in
collaboration with Dr. Aran Safir, an ophthalmologist from the Mount Sinai
School of Medicine.  An important development in this area is the
establishment of a national network of collaborators (called the ONET) for
computer diagnosis and treatment of glaucoma.  The computer system, which
includes an elaborate pathophysiologic model of the disease, is being
tested through the SUMEX-AIM network at five eye centers: Mount Sinai
Hospital and Medical Center, New York; Washington University, St. Louis;
The Johns Hopkins University, Baltimore;  the University of Illinois at
Chicago Circle;  and the University of Miami.  Glaucoma,  in one form or
another, affects 2% of all people over 40 years of age.   It is a disease
in which increased pressure within the eye may lead to irreparable optic
nerve damage and blindness.  The computer-based program has great
potential for assisting clinicians and researchers in understanding the
disease, diagnosing it more accurately and improving its treatment.

   In another project, Professor Charles Schmidt, a social
psychologist, is developing a theory of how people arrive at
interpretations of the social actions of others, in collaboration with
Professor N.S. Sridharan, a computer scientist.  The theory will be tested
in situations such as the psychiatric interview and the legal trial.  The
computer system which currently represents the theory is called
"Believer".  It includes a large body of statements about people's
motivations and actions.

   The Rutgers project includes, in addition, several fundamental
studies in artificial intelligence and system design.  These provide much
of the support needed for the development of complex systems such as the
glaucoma consultation and the "Believer" programs.

SIMULATION AND EVALUATION
OF CHEMICAL SYNTHESIS

   The development of new drugs and the study of how drug structure is
related to biological activity depend on the chemist's ability to
synthesize new molecules and modify existing structures, e.g.,
incorporating isotopic labels into biomolecular substrates.   The
Simulation and Evaluation of Chemical Synthesis (SECS) project, directed
by Dr. Todd Wipke, Associate Professor of Chemistry at the University of
California, Santa Cruz,  is aimed at assisting the synthetic chemist in
designing stereospecific syntheses of complex bio-organic molecules.

   The molecule to be synthesized is presented to SECS using
interactive computer graphics.  The program studies the chemical graph and
also 3-dimensional and electronic models of the molecule which it knows
how to construct; then, using fundamental chemical principles and various
heuristics, it works backwards from the target to predict possible
precursors which are one synthetic step away from the target.   The chemist
selects the precursors to be considered by the program for further
analysis.  Thus, SECS acts as a consultant, working with the chemist to
form a chemist-computer team.  The chemist helps guide the search and
decides when to stop the analysis.  Knowledge about chemical
transformations is expressed directly by chemists in ALCHEM, an English-
like language interpreted by SECS.  Goals for further development of the
project include generation of constraining strategies based on symmetry,
steric and electronic considerations, and expansion of the chemical
transform data base.

   In addition to its on-going development on the SUMEX-TYMNET system,
an experimental version of SECS is available over TELENET from First Data
Corporation in Waltham, Massachusetts.  SECS also runs on a Univac system
at the University of Strasbourg, France, and on PDP-10's at the
Universities of Darmstadt and Heidelberg, Germany.  Feedback from this
outside use of SECS spotlights areas for needed work and provides positive
evidence of the usefulness of SECS as a tool in synthetic design.


        MYCIN
Computer-based Consultation
in Clinical Therapeutics

   Dr. Stanley Cohen, Professor and Head of the Division of Clinical
Pharmacology at Stanford, directs this research in collaboration with Dr.
Stanton Axline and with computer scientists interested in artificial
intelligence and medical computing.  The MYCIN system models the decision
processes of medical experts, utilizing both clinical data and the
judgmental knowledge of experts to provide physician nonspecialists with
consultative advice regarding clinical therapeutics.  Although initial
research concerns the use of antimicrobial agents in the treatment of
bacteremias, the system is being expanded to deal with the treatment of
other infections.

   The primary component of the system is the Consultation program
which uses the physician's response to computer-generated questions about
a patient to make deductions about the case.   It then advises the
physician on the infectious disease diagnosis and the recommended
treatment for the patient. The utility and flexibility of this program are
increased by three adjunct programs:  1) a Question-Answering program which
answers questions about the system's knowledge base and about a specific
consultation, 2) an Explanation program which justifies the consultative
advice and explains the system's deduction process, and 3) a Knowledge
Acquisition program which extends the knowledge base of the system through
dialogue with an expert.
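
   The relationship between a consultation that makes deductions from
answers and an explanation facility that justifies them can be sketched
in a few lines of Python.  This is an illustration only, not MYCIN: the
rules and fact names are placeholders invented for the example, and the
simple forward-chaining loop was chosen for brevity, since the text
above does not describe the inference strategy MYCIN actually uses.

```python
# Hypothetical rules: (rule name, premises, conclusion).
# Placeholder names only; nothing here comes from MYCIN's knowledge base.
RULES = [
    ("R1", {"finding_a", "finding_b"}, "hypothesis_x"),
    ("R2", {"hypothesis_x", "finding_c"}, "advice_y"),
]

def consult(answers):
    """Apply RULES repeatedly to the user's answers until nothing new follows.

    Returns (facts, trace); trace maps each derived fact to the rule name
    and premises that produced it, which is what an explanation needs.
    """
    facts = set(answers)
    trace = {}
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                trace[conclusion] = (name, sorted(premises))
                changed = True
    return facts, trace

def explain(fact, trace):
    """Justify a fact by citing the rule that concluded it, if any."""
    if fact not in trace:
        return f"{fact}: given by the user"
    name, premises = trace[fact]
    return f"{fact}: concluded by {name} from {', '.join(premises)}"

if __name__ == "__main__":
    facts, trace = consult({"finding_a", "finding_b", "finding_c"})
    for fact in sorted(facts):
        print(explain(fact, trace))
```

   Keeping the rules as data, separate from the loop that applies them,
is what makes the adjunct programs possible in principle: a rule can be
quoted back as an explanation, or a new one appended to the list.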

    Goals for further development of the system include implementation
and evaluation of the system in the clinical setting at the Stanford
University and Palo Alto Veterans Administration Hospitals.

         ACT
A Model of Human Cognition

    The ACT Project is directed by Dr. John Anderson, Associate
Professor of Psychology at Yale University.  The ACT program provides a
uniform set of theoretical mechanisms to model such aspects of human
cognition as memory,  inferential processes, language processing, and
problem-solving.  The knowledge base consists of two components.   The
propositional component is provided by an associative network encoding a
set of known facts which provide the system's semantic memory.   The
procedural component consists of a set of productions which operate on the
associative network.  The production system used is considerably different
from those in other currently available systems, e.g., Newell's PSG, and
allows the system to operate on an associative network and to more
accurately model certain aspects of human cognition.


ARTIFICIAL INTELLIGENCE METHODOLOGY
APPLIED TO PROTEIN CRYSTALLOGRAPHY

   Members of the artificial intelligence project at Stanford also are
collaborating with Professor Joseph Kraut, Dr. Stephan Freer and other
protein crystallographers at the University of California, San Diego. They
are using the SUMEX-AIM facility as the central repository for programs,
data and other information of common interest. The general objectives of
the project are: 1) to identify critical tasks in protein structure
elucidation which may benefit by the application of AI problem-solving
techniques, and 2) to design and implement programs to perform those
tasks.

   Two principal task areas have been identified where collaboration is
of practical and theoretical interest to both protein crystallographers
and computer scientists working in AI:  1) interpreting a 3-dimensional
electron density map, and 2) determining a plausible structure in the
absence of phase information normally inferred from experimental
isomorphous replacement data.

INTERNIST

   The INTERNIST project, under the direction of Dr. Harry Pople and
Dr. Jack Myers at the University of Pittsburgh, is a large-scale,
computerized medical diagnostic system utilizing the methods and
structures of artificial intelligence.  Unlike most computer diagnostic
programs,  which are oriented to differential diagnosis in a rather limited
area, the INTERNIST system deals with the general problem of diagnosis in
internal medicine and currently accesses a medical data base encompassing
approximately 50% of the major diseases in internal medicine.

MEDICAL INFORMATION SYSTEMS LABORATORY

   The Medical Information Systems Laboratory (MISL) at the University
of Illinois, Chicago Circle Campus,  has been established under the
direction of Dr. Bruce McCormick of the Department of Information
Engineering, in collaboration with Dr. Morton Goldberg of the Department
of Ophthalmology at the University of Illinois Medical Center.

   The foremost goal of the resource is the exploration of artificial
intelligence techniques in automated clinical decision-making in
ophthalmology.  Investigations into the construction of a data base in
ophthalmology, and into distributed data base design, are ancillary goals.
Incorporating reliable clinical information into the ophthalmology data
base is a critical prerequisite to adequate clinical decision support.
Core research concerns the exploration of inferential relationships
between analytic data and the natural history of selected eye diseases,
both in treated and untreated form.


   MISL utilizes the computer facilities of the University of Illinois
and the SUMEX-AIM network, providing the administrative structure for
assembling the expertise of the collaborating departments.  Serving as a
bridge between diverse academic worlds, MISL promotes close involvement
between engineering and medical faculty.  The Illinois Eye and Ear
Infirmary at the Medical Center, with a throughput of 50,000 patients per
year, provides an ideal setting for the direct application of computer
technology to real problems in clinical medicine.

SUMEX-AIM Management

   A significant part of the SUMEX-AIM experiment has been the
development of a management structure to maximize the utility of the
computer capability for a national community.

   Users of the SUMEX facility are divided for administrative purposes
into two groups: 1) local, at Stanford University School of Medicine, and
2) national,  elsewhere in the United States.  As Principal Investigator
for the SUMEX grant, Dr. Lederberg reviews Stanford medical school
projects with the assistance of a local advisory committee.  National
users may gain access to the facility through an advisory panel for a
national program in Artificial Intelligence in Medicine (AIM). The AIM
Advisory Group consists of members-at-large of the AI and medical
communities, facility users and the Principal Investigator of SUMEX as an
ex-officio member.  A representative of the National Institutes of Health-
Biotechnology Resources Program (NIH-BRP) serves as Executive Secretary.

    The SUMEX-AIM computing resource is allocated initially to qualified
users without fee.  This, of course,  entails a careful review of the
merits and priorities of proposed applications.   At the direction of the
Advisory Group, expenses related to communications and transportation to
allow specific users to visit the facility also may be covered.

   SUMEX-AIM is aware of the necessity of making the central facility
available for trial use by potential users and collaborators.   A GUEST
mechanism has been established for those who have an indicated requirement
for brief access to certain programs.  Those who have been given an
appropriate telephone number and login procedure can dial up SUMEX-AIM to
exercise these programs on a trial basis.  A specific objective of many
user projects is the demonstration of their programs for the benefit of a
highly dispersed national community.


USER QUALIFICATIONS

   The SUMEX-AIM facility is a community effort, not merely a machine
service.  Applications for membership are judged on the basis of the
following criteria:

1) The scientific interest and merit of the proposed research and
  its relevance to the health research missions of the NIH.

2) The congruence of research needs and goals to the AI functions of
  SUMEX-AIM as opposed to other computing alternatives.

3) The user's prospective contributions and role in the community,
  with respect to computer science, e.g., developing and sharing
  new systems or applications programs, sharing use of special
  hardware, etc.

4) The user's potential for substantive scientific cooperation with
  the community,  e.g., to share expert knowledge in relevant
  scientific specialties.

5) The quantitative demands for specific elements of the SUMEX-AIM
   resource, taking account of both mean and ceiling requirements.

FACILITY INFORMATION

   The computer facility consists of dual DEC Model KI-10 CPU's
running under a locally-developed dual processor TENEX operating system.
It has 256K words (36-bit) of high-speed memory, 1.6M words of swapping
storage, 70M words of disk storage, two 9-track 800 bpi industry-
compatible tape units, a dual DEC-tape unit, a line printer, and
communications-network interfaces providing user terminal access.  SUMEX
may be accessed by local telephone lines, through the TYMNET and as a host
over the ARPANET communications network.
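
   For readers more used to byte-oriented machines, the word counts
above can be converted to 8-bit byte equivalents with the short Python
sketch below.  The 36-bit word size is stated in the text; the readings
of "K" as 1024 words and "M" as one million words are assumptions, as is
the helper name words_to_bytes.

```python
BITS_PER_WORD = 36   # stated in the text
K = 1024             # assumption: "K" means 2**10 words
M = 1_000_000        # assumption: "M" means a decimal million words

def words_to_bytes(words, bits_per_word=BITS_PER_WORD):
    """Convert a word count to the equivalent number of 8-bit bytes."""
    return words * bits_per_word / 8

CAPACITIES = {
    "high-speed memory": 256 * K,
    "swapping storage": 1_600_000,
    "disk storage": 70 * M,
}

if __name__ == "__main__":
    for name, words in CAPACITIES.items():
        print(f"{name}: {words:,} words = {words_to_bytes(words):,.0f} bytes")
```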

   Program (software) support will evolve from the basic system as
dictated by the research goals and needs of the user.  Initially,
available programs include a variety of TENEX user, utility and text
editor programs.  Major user languages include INTERLISP, SNOBOL, SAIL,
FORTRAN-10, BLISS-10, BASIC, Macro-10, OMNIGRAPH and MLAB.


POTENTIAL USERS

Potential users seeking further information are invited
to write:

Elliott Levinthal, Ph.D.
AIM User Liaison
SUMEX-AIM Computer Project
c/o Department of Genetics, S047
Stanford University Medical Center
Stanford, California 94305
Telephone:  (415) 497-5813

Procedures for access to SUMEX-AIM are governed by the:

Biotechnology Resources Program
Division of Research Resources
National Institutes of Health
Building 31, Room 5B19
Bethesda, Maryland 20014



APPENDIX J

GUIDELINES FOR PROSPECTIVE USERS

     SUMEX-AIM RESOURCE
INFORMATION FOR POTENTIAL USERS

   National users may gain access to the facility resources through an
advisory panel for a national program in Artificial Intelligence in
Medicine (AIM).  The AIM Advisory Group consists of members-at-large of
the AI and medical communities, facility users and the Principal
Investigator of SUMEX as an ex-officio member.  A representative of the
National Institutes of Health-Biotechnology Resources Program (NIH-BRP)
serves as Executive Secretary.

   Under its enabling 5-year grant, the SUMEX-AIM computing resource is
allocated to qualified users without fee.  This, of course, entails a
careful review of the merits and priorities of proposed applications. At
the direction of the Advisory Group, expenses related to communications
and transportation to allow specific users to visit the facility also may
be covered.

USER QUALIFICATIONS

   The SUMEX-AIM facility is a community effort, not merely a machine
service.  Applications for membership are judged on the basis of the
following criteria:

1) The scientific interest and merit of the proposed research and
  its relevance to the health research missions of the NIH.

2) The congruence of research needs and goals to the AI functions of
  SUMEX-AIM as opposed to other computing alternatives.

3) The user's prospective contributions and role in the community,
  with respect to computer science, e.g., developing and sharing
  new systems or applications programs, sharing use of special
  hardware, etc.

4) The user's potential for substantive scientific cooperation with
  the community, e.g., to share expert knowledge in relevant
  scientific specialties.

5) The quantitative demands for specific elements of the SUMEX-AIM
  resource, taking account of both mean and ceiling requirements.

In many respects, this requires a different kind of information for
judgment of proposals than that required for routine grant applications
seeking monetary funding support.  Information furnished by users also is
indispensable to the SUMEX staff in conducting their planning, reporting
and operational functions.

   The following questionnaire encompasses the main issues concerning
the Advisory Group.  However, this should neither obstruct clear and
imaginative presentation nor restrict the format of the application.  The
potential user should prepare a statement in his own words using
previously published material or other documents where applicable.  In
this respect, the questionnaire may be most useful as a checklist and
reference for finding in other documentation the most cogent replies to
the questions raised.

   For users mounting complex and especially non-standard systems, the
decision to affiliate with SUMEX may entail a heavy investment that would
be at risk if the arrangement were suddenly terminated.  The Advisory
Group endeavors to follow a responsible and sensitive policy along these
lines, which is one reason for cautious deliberation; even in the harshest
contingencies, it will make every effort to facilitate graceful entry and
departure of qualified users.  Conversely, it must have credible
information about thoughtful plans for long-term requirements including
eventual alternatives to SUMEX-AIM.  SUMEX-AIM is a research resource, not
an operational vehicle for health care.  Many programs are expected to be
investigated, developed and demonstrated on SUMEX-AIM with spinoffs for
practical implementation on other systems.  In some cases, the size, scope
and probable validation of clinical trials would preclude their being
undertaken on SUMEX-AIM as now constituted.  Please be as explicit as
possible in your plans for such outcomes.

Applicants, therefore, should submit:

1) A one- to two-page outline of the proposal.

2) Response to questionnaire, cross-referenced to supporting
  documents where applicable.

3) Supporting documents.

4) List of submitted materials, cross-referenced.

   We would welcome a draft (2 copies) of your submission for informal
comment if you so desire.  However, for formal consideration by the SUMEX-
AIM Advisory Group, please submit 13 copies of the material requested
above in final form.



Elliott Levinthal, Ph.D.
AIM User Liaison
SUMEX-AIM Computer Project
c/o Department of Genetics, SO47
Stanford University Medical Center
Stanford, California 94305
Telephone:  (415) 497-5813

May, 1976



SUMEX-AIM RESOURCE

QUESTIONNAIRE FOR POTENTIAL USERS

   Please provide either a brief reply to the following or cite
supporting documents.

A) MEDICAL AND COMPUTER SCIENCE GOALS

1) Describe the proposed research to be undertaken on the SUMEX-AIM
   resource.

2) How is this research presently supported? Please identify
  application and award statements in which the contingency of
  SUMEX-AIM availability is indicated.  What is the current status
  of any application for grant support of related research by any
  federal agency?  Please note if you have received notification of
  any disapproval or approval, pending funding, within the past
  three years.  Budgetary information should be furnished where it
  concerns operating costs and personnel for computing support.
  Please furnish any contextual information concerning previous
  evaluation of your research plans by other scientific review
   groups.

3) What is the relevance of your research to the AI approach of
  SUMEX-AIM as opposed to other computing alternatives?

B) COLLABORATIVE COMMUNITY BUILDING

1) Will the programs designed in your research efforts have some
  possible general application to problems analogous to that
   research?

2) What application programs already publicly available can you
  use in your research?  Are these available on SUMEX-AIM or
   elsewhere?

3) What opportunities or difficulties do you anticipate with regard
  to making available your programs to other collaborators within a
  reasonable interval of publication of your work?

4) Are you interested in discussing with the SUMEX staff possible
  ways in which other artificial-intelligence research capabilities
  might interrelate with your work?

5) If approved as a user, would you advise us regarding
  collaborative opportunities similar to yours with other
  investigators in your field?



C) HARDWARE AND SOFTWARE REQUIREMENTS

1) What computer facilities are you now using in connection with
  your research or do you have available at your institution? In
  what respect do these not meet your research requirements?

2) What languages do you either use or wish to use? Will your
  research require the addition of major system programs or
  languages to the system? Will you maintain them? If you are
  committed to systems not now maintained at SUMEX, what effort
  would be required for conversion to and maintenance on the PDP-10
  TENEX system?  What are the merits of the alternative plan of
  converting your application programs to one of the already
  available standards?  Would the latter facilitate the objectives
  of Part B), Collaborative Community Building?

3) Can you estimate your requirements for CPU utilization and disk
  space?  What time of day will your CPU utilization occur? Would
  it be convenient or possible for you to use the system during
  off-peak periods?  Please indicate (as best you can) the basis
  for these estimates and the consequences of various levels of
   restriction or relaxation of access to different resources.
  SUMEX-AIM's tangible resources can be measured in terms of:

a) CPU cycles.

b) Connect time and communications.

c) User terminals (In special cases these may be supported by
  SUMEX-AIM.).

d) Disk space.

e) Off-line media: printer outputs, tapes (At most, limited
  quantities to be mailed.).

Can you estimate your requirements? With respect to a) and b),
there are loading problems during the daily cycle.  Can you
indicate the relative utility of prime-time (0900-1600 PST) vs.
off-peak access?

4) What are your communication plans (TYMNET, ARPANET, other)? How
  will your communication and terminal costs be met?  See following
  note concerning network connections to SUMEX-AIM.

5) If this is a development project, please indicate your long-term
  plans for software implementation in an applied context keeping
   in mind the research mission of SUMEX-AIM.

Our procedures are still evolving, and we welcome your suggestions
about this framework for exchanging information.  Needless to say, each
question should be qualified a) "insofar as relevant to your proposal",
and b) "to the extent of available information".

    Please do not force a reply to a question that seems inappropriate.
We prefer that you label it as such so that it can be dealt with properly
in future dialogue.

    Above all, we are eager to work with potential users in any way that
would help minimize bureaucratic burdens and still permit a responsible
regard for our accountability both to the NIH and the public.   Please do
not hesitate to address the substance of these requirements in the format
most applicable to you.

NETWORK CONNECTIONS TO SUMEX-AIM

TYMNET

     Attached is a list of available TYMNET nodes and associated
telephone numbers.  The cost to users of using TYMNET is the telephone
charge from user location to the nearest TYMNET node.   This is available
only for communication to SUMEX-AIM and not for other facilities that may
be connected to TYMNET.  In some cases, there are "foreign exchanges" set
up by users.  These may offer less expensive communication.  Details of
these possibilities can best be learned by calling the nearest TYMNET
node.  The telephone company can provide information on comparative costs
of leased lines, toll charges, etc.  The initial capital investment for
TYMNET installation as well as login and hourly charges is provided by
SUMEX-AIM.  Standard usage charges on TYMNET are approximately $j/connect-
hour.

ARPANET

    SUMEX-AIM is connected to the ARPANET.  Our name is SUMEX-AIM; our
nickname is AIM.  We support the new TELNET protocol.   Our network address
is decimal 56, octal 70.  This provides convenient access for ARPANET
Hosts and Associates and those who have accounts with ARPANET.
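
    The two forms of the network address quoted above are the same
number written in different bases.  A minimal sketch that checks this
follows.  The split into a host number and an IMP number assumes the 8-bit
ARPANET host-address layout of the period (two high-order bits for the
host, six low-order bits for the IMP); that layout is not stated in this
document, and the function names are hypothetical.

```python
# Check that decimal 56 and octal 70 denote the same network address.
SUMEX_AIM_ADDRESS = 56  # decimal value given in the text


def as_octal(value: int) -> str:
    """Render an integer as an octal digit string without a prefix."""
    return format(value, "o")


def split_address(value: int) -> tuple[int, int]:
    """Split an address into (host, imp).

    Assumption (not from the source): an 8-bit address with the host
    number in the two high-order bits and the IMP number in the six
    low-order bits.
    """
    return value >> 6, value & 0o77


if __name__ == "__main__":
    print(as_octal(SUMEX_AIM_ADDRESS))
    print(split_address(SUMEX_AIM_ADDRESS))
```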