SUMEX

      STANFORD UNIVERSITY
MEDICAL EXPERIMENTAL COMPUTER RESOURCE

RR - 00785

ANNUAL REPORT - YEAR 04

       Submitted to
BIOTECHNOLOGY RESOURCES PROGRAM
NATIONAL INSTITUTES OF HEALTH

June, 1977

     DEPARTMENT OF GENETICS
STANFORD UNIVERSITY SCHOOL OF MEDICINE
Joshua Lederberg, Principal Investigator


NATIONAL INSTITUTES OF HEALTH
DIVISION OF RESEARCH RESOURCES
BIOTECHNOLOGY RESOURCES PROGRAM

SECTION I  - RESOURCE IDENTIFICATION

Report Period:                                        Grant Number:
From August 1, 1976 to July 31, 1977                  RR-00785-04

                                                      Report Prepared:
                                                      June, 1977

Name of Resource:              Resource Address:
Stanford University         Stanford University
Medical Experimental         Stanford, California
Computer (SUMEX)             94305

Resource Telephone Number:

Principal Investigator:
Joshua Lederberg, Ph.D.

Title:                Academic Department:
Chairman and Professor      School of Medicine
Department of Genetics     Department of Genetics

Grantee Institution:         Type of Institution:      Investigator's Telephone No.:

Stanford University         Private University        (415) 497-5501


Name of Institution's Biotechnology Resource Advisory Committee:

SUMEX-AIM Executive Committee

Membership of Biotechnology Resource Advisory Committee:
Name                    Title                    Department                  Institution

Saul Amarel, Ph.D.      Chairman and Professor   Computer Science            Rutgers University
Donald Lindberg, M.D.   Professor                Pathology                   University of Missouri
                        Director                 Information Science Group   School of Medicine
Jack Myers, M.D.        University Professor     At Large                    University of Pittsburgh
                        of Medicine                                          School of Medicine

Principal Investigator:

Joshua Lederberg, Ph.D.
Chairman and Professor



Stanford University Official:

Signature:                      Date:
                                June 10, 1977

Signature:                      Date:
                                June 10, 1977


Table of Contents

BOOK I

Section                                                                 Page

Table of Contents - BOOK I . . . . . . . . . . . . . . . . . . . . . . .   i
List of Figures  . . . . . . . . . . . . . . . . . . . . . . . . . . . .  ii
Table of Contents - BOOK II  . . . . . . . . . . . . . . . . . . . . . . iii

1.   RESOURCE OBJECTIVES AND PROGRESS  . . . . . . . . . . . . . . . . .   1

   1.1   OVERVIEW OF OBJECTIVES AND RATIONALE  . . . . . . . . . . . . .   1

   1.2   BACKGROUND AND PROGRESS . . . . . . . . . . . . . . . . . . . .   4

      1.2.1   PROGRESS SUMMARY . . . . . . . . . . . . . . . . . . . . .   4

      1.2.2   DETAILED PROGRESS REPORT . . . . . . . . . . . . . . . . .   6

         1.2.2.1    DEFINITION OF TERMS AND OBJECTIVES . . . . . . . . .   6
         1.2.2.2    FACILITY HARDWARE  . . . . . . . . . . . . . . . . .   7
         1.2.2.3    SYSTEM SOFTWARE  . . . . . . . . . . . . . . . . . .  14
         1.2.2.4    NETWORK COMMUNICATION FACILITIES . . . . . . . . . .  15
         1.2.2.5    SYSTEM RELIABILITY AND BACKUP  . . . . . . . . . . .  23
         1.2.2.6    PROGRAMMING LANGUAGES  . . . . . . . . . . . . . . .  23
         1.2.2.7    STANFORD AI HANDBOOK PROJECT . . . . . . . . . . . .  26
         1.2.2.8    USER SOFTWARE AND INTRA-COMMUNITY COMMUNICATION  . .  27
         1.2.2.9    DOCUMENTATION AND EDUCATION  . . . . . . . . . . . .  28
         1.2.2.10   SOFTWARE COMPATIBILITY AND SHARING . . . . . . . . .  28
         1.2.2.11   RESOURCE MANAGEMENT  . . . . . . . . . . . . . . . .  29
         1.2.2.12   SUMMARY OF RESOURCE USAGE  . . . . . . . . . . . . .  35
         1.2.2.13   NETWORK USAGE STATISTICS . . . . . . . . . . . . . .  48
         1.2.2.14   PUBLICATIONS . . . . . . . . . . . . . . . . . . . .  51

J. Lederberg


TABLE OF CONTENTS

BOOK I (continued)

                  List of Figures

1.   SUMEX-AIM Computer Configuration  . . . . . . . . . . . . . . . . .   9
2.   Cost-effectiveness of SUMEX Augmentations . . . . . . . . . . . . .  11
3.   Capacity and Loading Increase with Dual Processor Augmentation  . .  13
4.   TYMNET Network Map  . . . . . . . . . . . . . . . . . . . . . . . .  19
5.   ARPANET Geographical Network Map  . . . . . . . . . . . . . . . . .  20
6.   ARPANET Logical Network Map . . . . . . . . . . . . . . . . . . . .  21
7.   Monthly CPU Time Consumed . . . . . . . . . . . . . . . . . . . . .  35
8.   CPU Usage by Community  . . . . . . . . . . . . . . . . . . . . . .  37
9.   File Space Usage by Community . . . . . . . . . . . . . . . . . . .  38
10.  Average Diurnal Loading (3/77): Total Number of Jobs  . . . . . . .  45
11.  Average Diurnal Loading (3/77): Percent Time Used . . . . . . . . .  45
12.  Average Diurnal Loading (3/77): Percent Overhead  . . . . . . . . .  46
13.  Average Diurnal Loading (3/77): Balance Set - Jobs in Core  . . . .  46
14.  Average Diurnal Loading (3/77): Runnable Jobs . . . . . . . . . . .  47
15.  TYMNET and ARPANET Usage Data . . . . . . . . . . . . . . . . . . .  49



Table of Contents

   BOOK II

Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    1

6.    COLLABORATIVE PROJECT PROGRESS AND OBJECTIVES  . . . . . . . . . .   41

   6.1    STANFORD PROJECTS  . . . . . . . . . . . . . . . . . . . . . .   41
      6.1.1   DENDRAL PROJECT  . . . . . . . . . . . . . . . . . . . . .   42
      6.1.2   HYDROID PROJECT  . . . . . . . . . . . . . . . . . . . . .   76
      6.1.3   MOLGEN PROJECT . . . . . . . . . . . . . . . . . . . . . .   81
      6.1.4   MYCIN PROJECT  . . . . . . . . . . . . . . . . . . . . . .   84
      6.1.5   PROTEIN STRUCTURE PROJECT  . . . . . . . . . . . . . . . .  108

   6.2    NATIONAL AIM PROJECTS  . . . . . . . . . . . . . . . . . . . .  112
      6.2.1   [illegible] OF COGNITIVE PROCEDURES (ACT)  . . . . . . . .  113
      6.2.2   CHEMICAL SYNTHESIS PROJECT (SECS)  . . . . . . . . . . . .  118
      6.2.3   HIGHER MENTAL FUNCTIONS PROJECT  . . . . . . . . . . . . .  128
      6.2.4   INTERNIST PROJECT  . . . . . . . . . . . . . . . . . . . .  132
      6.2.5   MEDICAL INFORMATION SYSTEMS LABORATORY . . . . . . . . . .  138
      6.2.6   RUTGERS COMPUTERS IN BIOMEDICINE . . . . . . . . . . . . .  144

   6.3    PILOT STANFORD PROJECTS  . . . . . . . . . . . . . . . . . . .  158
      6.3.1   GENETICS APPLICATIONS PROJECT  . . . . . . . . . . . . . .  159
      6.3.2   BAYLOR-METHODIST CEREBROVASCULAR PROJECT . . . . . . . . .  161
      6.3.3   COMPUTER ANALYSIS OF CORONARY ARTERIOGRAMS . . . . . . . .  165
      6.3.4   QUANTUM CHEMICAL INVESTIGATIONS  . . . . . . . . . . . . .  169

   6.4    PILOT AIM PROJECTS . . . . . . . . . . . . . . . . . . . . . .  171
      6.4.1   COMMUNICATION ENHANCEMENT PROJECT  . . . . . . . . . . . .  172
      6.4.2   AI IN PSYCHOPHARMACOLOGY . . . . . . . . . . . . . . . . .  179
      6.4.3   ORGAN CULTURE PROJECT  . . . . . . . . . . . . . . . . . .  189
      6.4.4   NEUROPROSTHESES PROJECT  . . . . . . . . . . . . . . . . .  191
      6.4.5   MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS . . . . . .  194
      6.4.6   PUFF/VM PROJECT (Abridged - Proposal pending)  . . . . . .  197



TABLE OF CONTENTS

BOOK II (continued)

Appendix I
   OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH  . . . . . . . . . . .  202
   [Deleted - Copies available on request]

Appendix II
   AI HANDBOOK OUTLINE . . . . . . . . . . . . . . . . . . . . . . . .  225

Appendix III
   SUMMARY OF MAINSAIL LANGUAGE FEATURES . . . . . . . . . . . . . . .  231

Appendix IV
   MICROPROGRAMMED MAINSAIL PLANS  . . . . . . . . . . . . . . . . . .  235

Appendix V
   AIM MANAGEMENT COMMITTEE MEMBERSHIP . . . . . . . . . . . . . . . .  239

Appendix VI
   USER INFORMATION - GENERAL BROCHURE . . . . . . . . . . . . . . . .  243

Appendix VII
   GUIDELINES FOR PROSPECTIVE USERS  . . . . . . . . . . . . . . . . .  245




PROGRESS REPORT

            BOOK I

SUMEX-AIM Resource Progress Report

    This is an annual report of the work performed under an NIH Biotechnology
Resources Program grant supporting the Stanford University Medical EXperimental
Computer (SUMEX) research resource for applications of Artificial Intelligence in
Medicine (AIM).  It spans the year from May 1976 to April 1977.  Because we have
invested substantial effort in preparing a related document, an application for
renewal dated June 1, 1977, this report has been prepared by revising and
augmenting that application.  Some sections may inadvertently reflect that
provenance, e.g., by adopting a longer time perspective, but we believe this does
not distort or misrepresent our last year's effort.  Book II of this report is the
same text as used for the renewal, and contains detailed progress reports of
collaborating user projects and other pertinent appendices.

1    RESOURCE OBJECTIVES AND PROGRESS

     1.1   OVERVIEW OF OBJECTIVES AND RATIONALE

    The SUMEX-AIM project is a national computer resource with a dual mission:
1) the promotion of applications of artificial intelligence (AI) computer science
research to biological and medical problems and 2) the demonstration of computer
resource sharing within a national community of health research projects.

    Definitive funding of the SUMEX-AIM resource was initiated in December
1973.  The principal hardware was delivered and accepted in April 1974, and the
system became operational for users during the summer of 1974.  The present
renewal is therefore written from a perspective of just short of three years of
experience in attempting to develop and serve the user community for the
resource.

    The original SUMEX proposal was an outgrowth of two lines of endeavor at
Stanford that had been supported by the Biotechnology Resources Program.  The
ACME project (Advanced Computer for MEdical Research), 1965-72, had introduced
the innovation of interactive time-shared computing to the medical research
community at the Stanford Medical Center.  The second line, the DENDRAL project,
is a resource-related project concerned with applications of artificial
intelligence to problems of molecular characterization by analytical instruments
such as mass spectrometry, gas chromatography, and nuclear magnetic resonance.

    In 1972 we applied to NIH for the establishment at Stanford of a
next-generation computer resource to supplant ACME for applications for which the
university-wide facility was inadequate.  The DENDRAL project was the central
source of this initiative; several other projects, whose needs entailed real-time
instrumentation as much as AI, were also specified.  During the subsequent 18
months, we entered a phase of protracted review and negotiations with BRP and its
advisory groups, from which emerged the policy determination that resources of
this scope were best justified if they could be functionally specialized but
geographically generalized.  The emerging technology of computer networking
opened an opportunity to demonstrate this model in a way that could serve both
local and national needs.

    Our technical task has been achieved: to collect and implement an effective
set of hardware and software tools supporting the development of large and
complex AI programs and to facilitate communications and interactions between
user groups.  In effect, users throughout the country can turn on their own
teletype or CRT-display terminals, dial a local number, and log on to SUMEX-AIM
with the same ease as if it were located on their own campus, and have access to
a specialized resource unlikely to be matched nearby.  From the community
viewpoint, we have substantially increased the roster of user projects, from an
initial 5 to 11 current major projects plus a group of pilot efforts.  Many of
these projects are built around the communications network facilities we have
assembled, bringing together medical and computer science collaborators from
remote institutions and making their research programs available to still other
remote users.  As discussed in the sections describing the individual projects, a
number of the computer programs under development by these groups are maturing
into tools increasingly useful to the respective research communities.  The
demand for production-level use of these programs has surpassed the capacity of
the present SUMEX facility and has raised the general issues of how such software
systems can be optimized for production environments, exported, and maintained.


     1.2    BACKGROUND AND PROGRESS



     1.2.1    PROGRESS SUMMARY

   This progress summary covers the period from December 1973, when the
SUMEX-AIM resource was initially funded, through April 1977.  During this period
we have met all of the defined goals of the resource:

i)     We have established an effective computing facility to support a
       nation-wide community of medical AI research projects, including
       connections to two computer communication networks to provide wide
       geographical access to the facility and research programs.

ii)    We have actively recruited a growing community of user projects and
       collaborations.  The initial complement of collaborators included five
       projects.  This roster has grown to eleven currently authorized projects
       plus a group of approximately six pilot efforts in various stages of
       formulation.  Recruiting efforts have included a public dedication and
       announcement of the resource, NIH referrals from computer-based project
       reviews, direct contacts by resource personnel and on-going projects, and
       contacts through the AIM workshop series coordinated by the Rutgers
       Computers in Biomedicine resource under Dr. Saul Amarel.

iii)   We have established an AIM community management structure based on an
       overseeing Executive Committee and an Advisory Group to assist in
       recruiting and assessing new project applications and in guiding the
       priorities for SUMEX-AIM developments and resource allocations.  These
       committees also provide a formal mechanism for user projects to request
       adjustments in their allocated share of facility resources and to make
       known their desires for resource developments and priorities.

iv)    SUMEX user projects have made good progress in developing more effective
       consultative computer programs for medical research, one of the major
       goals toward which our AI applications are aimed.  These performance
       programs provide expertise in biochemical analysis and synthesis,
       medical diagnosis, and various kinds of cognitive and affective
       psychological modeling.

v)     We have worked hard to build system facilities to enable the inter- and
       intra-group communications and collaborations upon which SUMEX is
       based.  We have a number of examples in which user projects combine
       medical and computer science expertise from geographically remote
       institutions, and numerous examples of users from all over the United
       States, and occasionally from Europe, experimenting with the developing
       AI programs.  The SUMEX staff itself has had good success in
       establishing such sharing relationships on a system level with other
       research groups and has many examples of complementary development and
       maintenance agreements for system programs.

vi)    We have made numerous improvements to the computing resource to extend
       its capacity, to improve its efficiency, to enhance its human
       interfaces, to improve its documentation, and to enhance the range of
       software facilities available to user projects.

vii)   We have begun a core research effort to investigate alternatives and
       programming tools to facilitate the exportability of user and system
       software.  This effort is just now producing a "machine-independent"
       implementation of the ALGOL-like SAIL language which will run on a range
       of large and small machines and provide a language base for transferring
       programs.

viii)  We have supported community efforts in the more systematic documentation
       of AI concepts and techniques and in building more general software
       tools for the design and implementation of AI application programs.
       These have included a Stanford AI Handbook project comprising a
       compendium of short articles about the projects, ideas, problems, and
       techniques that make up the field of AI.


1.2.2    DETAILED PROGRESS REPORT

    The following material covers in greater detail the SUMEX-AIM resource
activities over the past 3.5 years.  These sections attempt to define in more
detail the technical objectives of our research community and describe progress
in the context of the resource staff and the resource management.  Details of the
progress and plans for our external collaborator projects are presented in
Section 6 on page 41 (in Book II).

1.2.2.1    DEFINITION OF TERMS AND OBJECTIVES

    Artificial Intelligence is a branch of computer science which attempts to
discern the underlying principles involved in the acquisition and utilization of
knowledge in reasoning, deduction, and problem-solving activities (1).  Currently
authorized projects in the SUMEX community are concerned in some way with the
application of these principles to biomedical research.  The tangible objective
of this approach is the development of computer programs which, using formal and
informal knowledge bases together with mechanized hypothesis formation and
problem-solving procedures, will be more general and effective consultative tools
for the clinician and medical scientist.  The exhaustive search potential of
computerized hypothesis formation and knowledge base utilization, constrained
where appropriate by heuristic rules or interactions with the user, has already
produced promising results in areas such as chemical structure elucidation and
synthesis, diagnostic consultation, and mental function modeling.  Needless to
say, much is yet to be learned in the process of fashioning a coherent scientific
discipline out of the assemblage of personal intuitions, mathematical procedures,
and emerging theoretical structure of the "analysis of analysis" and of problem
solving.  State-of-the-art programs are far more narrowly specialized and
inflexible than the corresponding aspects of human intelligence they emulate;
however, in special domains they may be of comparable or greater power, e.g., in
the solution of formal problems in organic chemistry or in the integral calculus.
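    Exhaustive hypothesis generation combined with pruning rules can be shown
with a toy "generate-and-test" sketch loosely modeled on structure elucidation
from mass spectra.  It is not taken from DENDRAL or any other SUMEX program.  The
element set, the nominal integer masses, the max_atoms bound, and the function
names are all illustrative assumptions.  The sketch enumerates every C/H/N/O
composition that has a given nominal mass.  It keeps only those compositions
whose rings-plus-double-bonds count is a non-negative whole number, which is a
standard chemical plausibility rule.

```python
from itertools import product

# Nominal (integer) atomic masses; an illustrative assumption, not report data.
MASSES = {"C": 12, "H": 1, "N": 14, "O": 16}


def rings_plus_double_bonds(c: int, h: int, n: int) -> float:
    """Degree of unsaturation for C_c H_h N_n O_o (oxygen does not contribute)."""
    return c - h / 2 + n / 2 + 1


def candidate_formulas(target_mass: int, max_atoms: int = 20) -> list[tuple[int, int, int, int]]:
    """Generate every (C, H, N, O) composition with the target nominal mass,
    then test each one against a plausibility rule (the pruning constraint)."""
    survivors = []
    for c, n, o in product(range(1, max_atoms + 1), range(max_atoms + 1), range(max_atoms + 1)):
        # The hydrogen count is fixed by the remaining mass, so it is not enumerated.
        h = target_mass - (c * MASSES["C"] + n * MASSES["N"] + o * MASSES["O"])
        if h < 0:
            continue
        rdb = rings_plus_double_bonds(c, h, n)
        # Prune: unsaturation must be a non-negative whole number.
        if rdb >= 0 and rdb == int(rdb):
            survivors.append((c, h, n, o))
    return sorted(survivors)


print(candidate_formulas(28))  # -> [(1, 0, 0, 1), (2, 4, 0, 0)], i.e. CO and C2H4
```

At mass 28 the generator proposes CH16, CO, CH2N, and C2H4.  The rule eliminates
CH16 (negative unsaturation) and CH2N (fractional unsaturation).  Even this small
constraint removes half of the hypotheses before any more expensive evaluation.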

    An equally important function of the SUMEX-AIM resource is an exploration
of the use of computer communications as a means for interaction and sharing
between geographically remote research groups in the context of medical computer
science research.  This facet of scientific interaction is becoming increasingly
important with the explosion of complex information sources and the regional
specialization of groups and facilities that might be shared by remote
researchers.  Our community-building role is based upon the current state of
computer communications technology.  While far from perfected, these new
capabilities offer highly desirable latitude for collaborative linkages, both
within a given research project and among projects.  Several of the active
projects on SUMEX are based upon the collaboration of computer and medical
scientists at geographically separate institutions, separate both from each other
and from the computer resource.  The network experiment also enables diverse
projects to interact more directly and facilitates selective demonstrations of
available programs to physicians and medical students.  Even in their current
developing state, these communication facilities have allowed us to demonstrate
access to the rather specialized SUMEX computing environment and programs from a
great many areas of the United States (even, to a limited extent, from Europe),
both for potential new research projects and for research product dissemination
and demonstration.  In a similar way, the network connections have made possible
close collaborations with other facilities in the development and maintenance of
system software.

---------------------------------------------------------------------------------
   (1) For recent reviews giving some perspective on the current state of AI,
see: (i) Winston, P.H., "Artificial Intelligence", Addison-Wesley Publishing Co.,
1977; (ii) Nilsson, N.J., "Artificial Intelligence", Information Processing 74,
North-Holland Pub. Co. (1975); and (iii) a summary by Feigenbaum, E.A., attached
as Appendix I, page 202 (see Book II).  An additional overview of research areas
in AI is provided by the outline for an "Artificial Intelligence Handbook" being
prepared under Professor Feigenbaum by computer science students at Stanford (see
Appendix II on page 225 in Book II).

1.2.2.2    FACILITY HARDWARE

   Based on the AI mission of SUMEX-AIM, we selected a Digital Equipment
Corporation (DEC) model KI-10 computer system for our facility.  This selection
was based on 1) hardware architectural and performance features, 2) available
software support relevant to AI applications, 3) price versus performance data
for the system, and 4) the scope of the user community from which we might expect
to draw collaborators and share software.  This choice has proved highly
effective.

   The current system hardware configuration is diagrammed in Figure 1 on
page 10.  It is the result of a number of augmentations over the past 3 years to
meet the capacity needs of the growing SUMEX-AIM project community.  Our initial
configuration consisted of a KI-10 processor, core memory (192K 36-bit words @ 1
microsecond), swapping storage (1.7M words @ 8 msec average rotational latency
and 2 microsecond/word transfer rate), file storage (40M words), magnetic tapes,
DEC tapes, terminal line scanner, and line printer.  Our network connections are
discussed in Section 1.2.2.4 on page 16.
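    The swapping-storage figures above can be turned into an average page-fetch
time.  The latency, transfer rate, and drum size in the sketch come from the
text.  The 512-word page size is an assumption (the usual TENEX page), and the
report does not state it.

```python
# Drum figures quoted in the text above.
AVG_ROTATIONAL_LATENCY_MS = 8.0
TRANSFER_US_PER_WORD = 2.0
DRUM_WORDS = 1.7e6

# ASSUMPTION: a 512-word page (the usual TENEX page); not stated in this report.
PAGE_WORDS = 512


def page_read_ms(words: int = PAGE_WORDS) -> float:
    """Average time to fetch one page: the average rotational wait plus the transfer."""
    return AVG_ROTATIONAL_LATENCY_MS + words * TRANSFER_US_PER_WORD / 1000.0


print(f"one page: {page_read_ms():.3f} ms")                      # 9.024 ms
print(f"drum holds about {DRUM_WORDS / PAGE_WORDS:.0f} pages")   # 3320 pages
```

Under these assumptions, rotational waiting accounts for close to 90% of each
single-page fetch.  The transfer itself is comparatively cheap.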

   This system reached prime-time saturation by fall of 1974.  Since many of
our medical and other professional collaborators cannot adjust their schedules to
match light computer loading during the night-time hours, prime-time
responsiveness is crucial to supporting medical experimentation with developing
programs and to allowing community growth.  We have taken active steps to
transfer as much prime-time loading as feasible to evening and night hours. These
steps include shifting personnel schedules (particularly for Stanford-based
projects), controlling the allocation of CPU resources between the various user
communities and projects, and developing batch job facilities so that jobs not
requiring intimate user interaction can run during off hours.  Despite these
efforts, prime-time loading has remained quite high, particularly with the growth
in the number of user projects.
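    One simple way to express "controlling the allocation of CPU resources
between the various user communities" is a fair-share rule.  Under such a rule,
the next slice of CPU goes to whichever community is furthest below its allotted
share.  The sketch is hypothetical, and it is not the SUMEX scheduler.  The
community names, the share values, and the function most_underserved are
illustrative assumptions.

```python
def most_underserved(shares: dict[str, float], used_hours: dict[str, float]) -> str:
    """Return the community whose fraction of CPU consumed falls furthest
    below its allotted fraction (hypothetical fair-share rule)."""
    total = sum(used_hours.get(name, 0.0) for name in shares)

    def deficit(name: str) -> float:
        consumed = used_hours.get(name, 0.0) / total if total else 0.0
        return shares[name] - consumed

    return max(shares, key=deficit)


# Illustrative shares and usage only; these are not SUMEX allocations.
shares = {"Stanford projects": 0.4, "National AIM projects": 0.4, "System staff": 0.2}
used = {"Stanford projects": 50.0, "National AIM projects": 30.0, "System staff": 20.0}
print(most_underserved(shares, used))  # -> National AIM projects
```

A production scheduler would probably also age past usage over time and give
priority to interactive jobs.  Both refinements are omitted here.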

   A similar congestion has persisted in the on-line file space we have been
able to allocate to user projects.  Again we have implemented controls to try to
assure effective use of available space and to encourage use of external file
storage facilities such as the ARPANET Data Computer and other computer sites.
Nevertheless, the interactive character of SUMEX use, the large AI program files,
and the extensive use of SUMEX for collaborator communications have continuously
raised file space demands beyond those we could meet.


   We have proposed a number of hardware configuration augmentation steps to
the Executive Committee to provide additional capacity cost-effectively.  These
were based on analyses of predominant system bottlenecks and of enhancement steps
feasible within available budgets.  The enhancements approved by the committee
are listed below.  The first two have been implemented.

1) Add 64K words of core memory and 20M words of file storage (11/74)

2) Add second KI-10 CPU for dual processor operation (5/76)

3) Add 256K words of core memory and upgrade file system to higher volume,
   lower cost technology (recently approved by NIH and the AIM Executive
   Committee with implementation in progress)

    A plot of effective CPU capacity as a function of continuing investment is
shown in Figure 2 on page 11 and displays the cost-effectiveness of our
sequential augmentations.  At the present time our hardware configuration has
grown about as much as is cost-effective.  Additional growth would entail
significant redesigns of the system, including upgrades of existing hardware.
Contemplating such future expansion also raises issues of compatibility with
newer hardware technologies being announced, which offer advantages in speed,
cost, size, and maintainability.  Such a complete upgrade is not envisioned in the
immediate future.  A number of interesting new product announcements are
expected over the next 1 or 2 years, and they could substantially affect such an
upgrade strategy.

Figure 1.  SUMEX-AIM Computer Configuration
[Diagram not reproduced.  Legible labels include four MF-10 memory units,
(2) 4800 baud network links, TYMSHARE TIP, and AI Lab IMP.]


Figure 2.  Cost-effectiveness of SUMEX Augmentations
[Plot not reproduced.  Vertical axis: estimated capacity in useful KI-10
equivalents (net of overhead), up to 2.  Horizontal axis: cumulative system
investment ($M), 0 to 2.  Labeled points: initial purchase, 3/74 (KI-10 with
192K memory); add second KI-10, 5/76; add 256K memory and upgrade file/tape
system (estimated improvement; upgrade in progress).]

    This plot illustrates the incremental increases in computing capacity
achieved as a function of cumulative investment in the SUMEX-AIM facility.  The
higher slope of the curve after the initial investment reflects both the
substantial initial investment in peripheral devices (file system, tapes,
communications, etc.) and the trend toward lower memory prices.  The largest
impact in terms of PDP-10 memory price reductions occurred around the time of
adding the 64K increment in November 1974.  Since then processor prices have
stayed relatively stable and memory prices have dropped less dramatically.  It
should be noted that semiconductor memories have not yet made a big in-road in the
PDP-10 market; this technology is where the more recent memory price reductions
have occurred.

   The original purchase of 1 KI-10 with 192K of memory for about $800K
performed with about 60% efficiency under peak load.  Adding the 64K of memory
for $75K brought the efficiency up to about 85%.  Then adding the second
processor for $200K increased throughput to about 1.3-1.4 KI-10 equivalents.
This step represents about a 59% increase in throughput for a 20% increased
investment.  A proposal has been approved recently by the AIM Executive Committee
and NIH to augment core memory by 256K words.  This augmentation would increase
throughput to about 1.7 KI-10 equivalents for another $100K, which would be a 26%
throughput increase for 8% additional investment.  As part of the proposed memory
augmentation we plan to upgrade the file and tape systems as well, to relieve file
space congestion and increase system operations efficiency.  Including the net
cost of the file/tape upgrade in these figures (purchase price less resale of
existing equipment) raises the proposed additional investment to $160K and the
fractional investment increase from 8% to 13%.  Of course, the disk upgrade
affects CPU throughput only indirectly, in that the increased speed reduces
contention, particularly when moving-head swapping is necessary.  It contributes
primarily to supporting the growing on-line file needs of the projects.

Privileged Communication
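    The throughput and investment percentages quoted above can be checked
directly.  Throughput values are taken from the text, with 1.35 used as the
midpoint of "1.3-1.4" KI-10 equivalents.  The cumulative investment base is an
inferred assumption rather than a reported figure.  A base of about $1.0M before
the second processor is the one implied by "20% increased investment" for $200K,
which gives about $1.2M before the memory augmentation.

```python
def pct(part: float, whole: float) -> float:
    """Percentage that `part` is of `whole`."""
    return 100.0 * part / whole


def pct_increase(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    return pct(after - before, before)


# Throughput in KI-10 equivalents, from the text.
ONE_CPU_256K = 0.85   # one KI-10 plus the 64K increment
TWO_CPUS = 1.35       # midpoint of "1.3-1.4"
TWO_CPUS_512K = 1.7   # after the proposed 256K augmentation

# ASSUMPTION: cumulative investment in $K, inferred from the quoted percentages.
BASE_BEFORE_CPU2 = 1000
BASE_BEFORE_MEMORY = BASE_BEFORE_CPU2 + 200

steps = [
    ("second KI-10", ONE_CPU_256K, TWO_CPUS, 200, BASE_BEFORE_CPU2),
    ("256K memory", TWO_CPUS, TWO_CPUS_512K, 100, BASE_BEFORE_MEMORY),
    ("256K memory + file/tape", TWO_CPUS, TWO_CPUS_512K, 160, BASE_BEFORE_MEMORY),
]

for name, before, after, cost, base in steps:
    print(f"{name}: +{pct_increase(before, after):.0f}% throughput "
          f"for +{pct(cost, base):.0f}% investment")
```

This prints 59%/20%, 26%/8%, and 26%/13%, which agrees with the figures in the
text.  The file/tape upgrade is charged against the same throughput gain,
because its benefit to CPU throughput is only indirect.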

Figure 3.  Capacity and Loading Increase with Dual Processor Augmentation

                        1-PROC OP'N   2-PROC TRANS'N   2-PROC OP'N    2-PROC OP'N
                        1/76 - 4/76   5/76 - 8/76      9/76 - 12/76   1/77 - 3/77
                        -----------   --------------   ------------   -----------

Peak Ld Ave                 4.8            5.6              6.0            6.6

Peak Jobs                  30.2           33.3             34.7           38.1

% Overhead/Processor       18.1           31.1             33.2           31.9

Total CPU Hrs/Mo          304.4          384.9            534.0          520.1

    This table presents system usage data averaged over several months
preceding, during, and after installation of the SUMEX-AIM dual processor system,
in order to show real changes in peak loading capacity and computing resources
delivered.  The first three rows of data are derived from monthly diurnal loading
data and reflect average prime-time peak loading conditions (daily peak usage
figures are often considerably higher, but those shown better represent gross
trends).  The last row gives average total monthly CPU hours delivered during the
various periods.

    Taking as a common criterion that users have pushed both the single and dual
processor systems to the limits of useful work in terms of prime-time
responsiveness, it is clear that the second processor has substantially increased
throughput ("tolerable" peak load average up 38%, number of jobs up 26%, and
delivered CPU hours up 71%).  At the same time the overhead burden per machine
has risen from 18% to 32%, principally in the category of I/O wait (total
scheduler time and time waiting for a runnable job to be loaded into core).  An
additional factor, not explicitly shown in these data (because we have only a
1-msec clock), is the added time spent at interrupt level servicing drum
swapping.  This adds an estimated further 10-15% of overhead.
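The percentage changes quoted above follow directly from the first and last
columns of Figure 3.  A minimal sketch of that comparison (the dictionary keys
are our own labels for the table rows):

```python
# Figure 3 values for the single-processor baseline (1/76 - 4/76)
# and the most recent dual-processor period (1/77 - 3/77).
baseline = {"peak_load_avg": 4.8, "peak_jobs": 30.2, "cpu_hours_per_month": 304.4}
dual = {"peak_load_avg": 6.6, "peak_jobs": 38.1, "cpu_hours_per_month": 520.1}

def percent_change(before, after):
    """Percentage change from before to after, computed as a ratio."""
    return 100.0 * (after / before - 1.0)

changes = {key: percent_change(baseline[key], dual[key]) for key in baseline}
for key, value in changes.items():
    print(f"{key}: {value:+.1f}%")
```

It yields about +37.5%, +26.2% and +70.9%, which round to the 38%, 26% and 71%
cited in the text.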

   We believe these increased overhead figures can be reduced roughly to
single-processor levels by adding more memory, thereby effectively recovering
about 40-50% of the capacity of a KI-10 processor.  A proposal for this
augmentation is now pending with the AIM Executive Committee, and we expect it to
be implemented within the funding ceiling of the current grant.


1.2.2.3    SYSTEM SOFTWARE

    In parallel with the choice of DEC PDP-10 hardware for the SUMEX-AIM
facility, we selected the TENEX operating system developed by Bolt Beranek and
Newman (BBN) as the most effective for our medical AI applications work.  TENEX
was the only available demand-paged system that supported simultaneous users
with large address spaces; it offered the INTERLISP language for LISP-oriented
program development; and it was well integrated with the ARPANET facilities,
which provide an excellent base for our community sharing efforts.  This choice
has proven very effective in that the productivity of the TENEX community in AI
research has been highly advantageous to us (2).

   The original BBN TENEX was written for a hardware-modified KA-10 system.
Running it effectively on the KI-10, with its relatively limited paging
facilities, required a substantial amount of work.  These early phases also
included substantial monitor work to incorporate the memory-sharing interface
that connects us to the TYMNET and to integrate the high-speed swapping storage.
We have made numerous enhancements to the monitor calls and bug corrections to
develop a highly reliable and effective operating system for our community work.

   We continue to work to improve the efficiency of the system and its
effectiveness in allocating valuable resources.  For example, we have modified the
handling of user page tables so that the expensive procedure of clearing page
tables and setting them up to run time-shared users could be minimized.  This
involved creating a pool of page tables which could be allocated to currently
running users and kept available without setup overhead.  We also implemented a
system for migrating dormant pages from our fast swapping storage to moving head
disk.  This preserves that limited resource for the currently active jobs.
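The page-table pool described above behaves much like a small
least-recently-used cache.  The sketch below is a hypothetical model, not the
TENEX monitor code: the class name, the pool size, and the use of a Python dict
to stand in for a hardware page table are all assumptions for illustration.

```python
from collections import OrderedDict

class PageTablePool:
    """Hypothetical fixed pool of page tables, cached per job (LRU reclaim)."""

    def __init__(self, size):
        self.size = size
        self.tables = OrderedDict()   # job id -> page table (virtual page -> frame)
        self.setups = 0               # expensive clear-and-setup operations
        self.hits = 0                 # dispatches that reused a table untouched

    def acquire(self, job, page_map):
        if job in self.tables:
            self.tables.move_to_end(job)      # mark as most recently used
            self.hits += 1
            return self.tables[job]
        if len(self.tables) >= self.size:
            self.tables.popitem(last=False)   # reclaim the least recently used table
        table = dict(page_map)                # stands in for clearing + setting up
        self.setups += 1
        self.tables[job] = table
        return table

pool = PageTablePool(size=2)
for job in ["A", "B", "A", "B", "C", "A"]:
    pool.acquire(job, {0: 100})
print(pool.setups, pool.hits)   # prints "4 2"
```

With a pool of two tables, the six dispatches need only four clear-and-setup
operations; the repeat dispatches of A and B reuse their tables untouched.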

   We have implemented a form of "soft" CPU allocation control in the monitor,
assisted by a program which adjusts user percentages for the scheduler based on
the dynamic loading of the system.  The allocation control structure is built on
the scheduler queue system and takes account of both the a priori allocation of
CPU time and the time actually consumed.  Our TENEX uses a hierarchy of five
queues for jobs, ranging from highly interactive jobs requiring only small
amounts of CPU time between waits to more CPU-intensive jobs which can run for
long periods without user interaction.  The interactive queues (text editing,
etc.) are scheduled at highest priority without consideration of allocation
percentages.  If nothing is runnable from the high-priority queues, the CPU-bound
queues are scanned and jobs are selected for running based on how much of their
allocated time they have received during a given allocation cycle (currently 100
seconds).  If no such jobs are runnable, then those that have already received
their allocation of CPU time are scheduled based on how much they are over
allocation and how long they have waited to be run again.  This system is not a
reservation system in that it does not guarantee a given user some percentage of
the system.  It allocates cycles preferentially, trading off a priori allocations
against actual demand, but does not waste cycles.  This allocation control system
is still in an experimental state, and we are attempting to evolve the "best"
policies with the AIM Executive Committee for dividing the system fairly and
effectively among the various communities of users.

---------------------------------------------------------------------------------
  (2) It should be noted that DEC has recently adopted a form of TENEX (TOPS-20)
as their choice for future system marketing.  They have made improvements in a
number of areas of the monitor and subsystem software but have also shown an
increasing tendency to make changes to the TOPS-20 system that impair
compatibility with older TENEX systems.
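The three-step selection rule described in the scheduler paragraph can be
expressed as a short function.  This is a hypothetical sketch rather than the
SUMEX scheduler: the 100-second cycle comes from the text, but the split of the
five queues into interactive and CPU-bound groups, the `Job` fields, and the
weighting between overage and waiting time are assumptions.

```python
from dataclasses import dataclass

CYCLE_SECONDS = 100           # allocation cycle length quoted in the report
INTERACTIVE_QUEUES = {0, 1}   # assumption: which of the five queues count as interactive
WAIT_WEIGHT = 0.01            # hypothetical trade-off between overage and waiting time

@dataclass
class Job:
    name: str
    queue: int            # 0 = most interactive ... 4 = most CPU-bound
    runnable: bool
    share: float          # a priori fraction of the CPU allotted to the job's user
    received: float       # CPU seconds received in the current allocation cycle
    waited: float = 0.0   # seconds since the job last ran

def usage_ratio(job):
    """Fraction of this cycle's entitlement already consumed."""
    entitlement = job.share * CYCLE_SECONDS
    return job.received / entitlement if entitlement > 0 else float("inf")

def pick_next(jobs):
    ready = [j for j in jobs if j.runnable]
    # 1. Interactive queues run first, regardless of allocation percentages.
    interactive = [j for j in ready if j.queue in INTERACTIVE_QUEUES]
    if interactive:
        return min(interactive, key=lambda j: j.queue)
    # 2. CPU-bound jobs still under allocation: least-served first.
    under = [j for j in ready if usage_ratio(j) < 1.0]
    if under:
        return min(under, key=usage_ratio)
    # 3. Everyone is over allocation: run someone anyway so no cycles are wasted,
    #    favoring small overage and long waits.
    if ready:
        return min(ready, key=lambda j: usage_ratio(j) - WAIT_WEIGHT * j.waited)
    return None
```

The third step is what makes the control "soft" rather than a reservation: a job
already past its share still runs whenever nothing else is ready, so no cycles
go unused.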

   During the spring of 1976 we implemented a dual processor version of TENEX
as the most cost-effective way to increase our processing capacity.  To upgrade
to the new KL-10 technology, we would have had to replace most of the equipment
that had been purchased initially.  For the cost of an additional processor and
8 man-months of intensive software development, we were able to increase our CPU
capacity by 75%.  An additional 40% of the capacity of a KI-10 processor can be
made available by increasing memory to reduce our swapping contention.  The dual
processor system that has evolved is running quite reliably.  It treats the two
machines in an almost symmetric manner; the only difference is that one of the
machines has all of the I/O equipment attached to it.  Both schedule jobs
independently and share the rest of the non-I/O-device monitor code.  The areas
of the monitor involving the management of resources and jobs which cannot be
manipulated by both machines simultaneously are protected by a system of locks.
Measurements we have made indicate that overhead for lock waits is less than
10%.  The overall increase in capacity provided by the processor upgrade is
illustrated in Figure 3 on page 13, which compares key loading parameters in the
periods before and after the dual processor installation.  Having observed the
delivery of DEC's high-performance KL-TENEX systems over the past 6 months, it
seems clear that, for the investment, we made the best choice for the community
by implementing the dual processor upgrade.  We hope to augment the memory soon
to finish exploiting the capacity this extra machine provides and to remove some
non-linearities remaining in system swapping performance.
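The locking discipline described above, with two symmetric processors, shared
monitor data, and I/O confined to one machine, can be modelled with two threads.
This is a hypothetical sketch, not the dual-processor TENEX monitor: the single
lock, the queue structure, and the convention that job names ending in `-io`
need I/O are illustrative assumptions, and it makes no attempt to reproduce the
lock-wait measurements.

```python
import threading
import time
from collections import deque

class DualProcessorModel:
    """Hypothetical model: two symmetric CPUs share monitor data under one lock;
    only CPU 0 has I/O devices attached, so I/O requests are queued for it."""

    def __init__(self, jobs):
        self.lock = threading.Lock()   # guards every shared structure below
        self.run_queue = deque(jobs)
        self.io_queue = deque()
        self.in_flight = 0             # jobs being "computed" outside the lock
        self.completed = []            # (job, cpu_id) pairs
        self.io_done = []              # jobs whose I/O was serviced (always on CPU 0)

    def cpu(self, cpu_id):
        while True:
            with self.lock:
                if cpu_id == 0 and self.io_queue:
                    self.io_done.append(self.io_queue.popleft())
                    continue
                if self.run_queue:
                    job = self.run_queue.popleft()
                    self.in_flight += 1
                elif cpu_id == 0 and self.in_flight:
                    job = None         # the other CPU may still queue I/O for us
                else:
                    return
            if job is None:
                time.sleep(0.001)      # poll again without holding the lock
                continue
            needs_io = job.endswith("-io")   # "compute" phase runs without the lock
            with self.lock:
                self.completed.append((job, cpu_id))
                if needs_io:
                    self.io_queue.append(job)
                self.in_flight -= 1

    def run(self):
        cpus = [threading.Thread(target=self.cpu, args=(i,)) for i in (0, 1)]
        for t in cpus:
            t.start()
        for t in cpus:
            t.join()

model = DualProcessorModel(["a", "b-io", "c", "d-io", "e"])
model.run()
print(sorted(model.io_done))
```

Work done outside the lock can proceed on both CPUs; only queue manipulation is
serialized, which is where lock-wait overhead arises.  (CPython threads do not
execute Python code truly in parallel, so the model shows the protocol, not the
speedup.)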

   Now that the dual processor system has stabilized, we are undertaking
another assessment of system performance to be sure we have removed residual and
correctable inefficiencies.  This study is ongoing.

   Finally, over the past year we made several substantial improvements in the
"GTJFN" monitor call, which interactively acquires handles on file names specified
by the user.  These extensions allow for more general "wild card" specifications
and interactive help in deciding between and searching for existing file name
alternatives.  They also give the user much more flexibility in designating
groups of files and therefore in structuring his data.
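Wild card matching and name recognition of the kind GTJFN provides can be
sketched with Python's standard `fnmatch` module and a longest-common-prefix
rule.  The directory listing is hypothetical, and `*`/`?` follow Python's
`fnmatch` semantics, which are not necessarily those of the TENEX GTJFN call.

```python
import fnmatch
import os.path

# Hypothetical directory listing in NAME.EXTENSION;VERSION style.
DIRECTORY = ["PROGRAM.SAI;3", "PROGRAM.REL;3", "PROFILE.DAT;1", "REPORT.TXT;12"]

def wildcard(pattern, names=DIRECTORY):
    """Names matching a pattern where * matches any run of characters, ? one character."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]

def recognize(prefix, names=DIRECTORY):
    """Extend a partial name as far as it is unambiguous; also return the alternatives."""
    matches = [n for n in names if n.startswith(prefix)]
    if not matches:
        return prefix, []
    return os.path.commonprefix(matches), matches

print(wildcard("PROGRAM.*"))   # both PROGRAM files
print(recognize("PROG"))       # extends to "PROGRAM." and lists two alternatives
print(recognize("REP"))        # unique match: completes the whole name
```

Recognition stops where the candidates diverge and offers the remaining
alternatives, which is the kind of interactive help described above.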

   Given a working dual processor system, the current implementation of
allocation controls in our system, the diverging path of the DEC TOPS-20 system,
the termination of active BBN TENEX development, and the unique complications of
the KI-10 paging system, we have not made any concerted effort to upgrade our
TENEX system to the latest BBN release (1.34).  The advantages of such an upgrade
are not overwhelming in the face of the complicated conversion (KI paging, dual
processor, special swapping device handler, TYMNET service routines, local
JSYS's, etc.) and the resulting system unreliability for some period.


   Another area of software development is the EXECutive program, which is
the basic user interface to manipulate files, directories, and devices; control
job and terminal parameter settings; observe job and system status; and execute
public and private programs.  This work improves system accommodation to users
and provides more convenient and useful information about system and job status.
Through such features as login default files, directed file search path commands,
mail notification, help facilities, better file archival and retrieval commands,
and flexible status information, we have tried to make it easier for users to
work on the SUMEX-AIM machine.

1.2.2.4    NETWORK COMMUNICATION FACILITIES

   A highly important aspect of the SUMEX system is effective communication
with remote users.  In addition to the economic arguments for terminal access,
networking offers other advantages for shared computing, such as uniform user
access to multiple machines and special purpose resources, convenient file
transfers for software sharing and multiple machine use, more effective backup,
co-processing between remote machines, and improved inter-user communications.
Over the past year our network connections have substantially aided us in
exporting the MAINSAIL system.  Because of the developmental nature of the
language at present, it is important that we have close interactions with the
user community and that we be able to perform bug fixes and upgrades effectively.
Since MAINSAIL by its nature involves operations on a variety of machines, and
since our access to example systems cannot be entirely local, the network
connections to Rutgers, the Stanford AI Lab, and Stanford Research Institute have
been invaluable.  It would be considerably more difficult to export MAINSAIL and
communicate with users via tapes and mail.

    We have based our remote communication services on two networks, TYMNET
and ARPANET.  These were the only networks in existence at the start of the
project which allowed foreign host access.  Since then, other commercial network
systems (notably TELENET) have come into existence and are growing in coverage
and services.  The two networks to which we are currently connected complement
each other: the TYMNET provides primarily terminal service with very broad
geographical coverage and unrestricted user access, and the ARPANET has more
limited access but provides a broader range of communication services.
Together, these networks give a good view of the current strengths and weaknesses
of this approach.

   Users asked to accept a remote computer as if it were next door will use a
local telephone call to the computer as a standard of comparison.  Current
network terminal facilities do not quite accomplish the illusion of a local call.
Data loss is not a problem in network communications; in fact, with the more
extensive error checking schemes, data integrity is much higher than for a long
distance phone link.  On the other hand, networking relies upon shared community
use of telephone lines to procure widespread geographical coverage at
substantially reduced cost.  Unless enough total line capacity is provided to
meet peak loads, substantial queueing and traffic jams result in the loss of
terminal responsiveness.
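Why line capacity must cover peak load can be illustrated with the textbook
M/M/1 queueing model, in which mean time in the system is W = 1/(mu - lambda)
for arrival rate lambda and service rate mu.  This is our choice of idealized
model, not a description of TYMNET or ARPANET traffic, and the line capacity of
100 messages per second is hypothetical.

```python
def mm1_delay(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue, W = 1 / (mu - lambda).

    Returns infinity when offered load reaches capacity (the queue grows without bound).
    """
    if arrival_rate >= service_rate:
        return float("inf")
    return 1.0 / (service_rate - arrival_rate)

SERVICE_RATE = 100.0   # hypothetical: messages per second one line can forward
for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    delay_ms = 1000.0 * mm1_delay(utilization * SERVICE_RATE, SERVICE_RATE)
    print(f"utilization {utilization:.2f}: mean delay {delay_ms:.0f} ms")
```

Delay stays modest at moderate utilization but grows without bound as the
offered load approaches line capacity, which is the "traffic jam" effect
described above.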


TYMNET:

    Networks such as TYMNET are a complex interconnection of nodes and lines
spanning the country (see Figure 4 on page 20).  The primary causes of delay in
passing a message through the network are the time to transfer a message from
node to node and the scheduling of this traffic over multiplexed lines.  The
latter effect becomes important only in heavily loaded situations; the former is
always present.  Clearly, from the user's viewpoint, the best situation is to
have as few nodes as possible between him and the host; this means many
interconnecting lines through the network and correspondingly higher costs for
the network manager.  TENEX in some ways emphasizes this conflict more than other
time-sharing systems because of the highly interactive nature of its terminal
handling (e.g., command and file name recognition, and non-printing program
commands as in text editors or INTERLISP).  In such instances, individual
characters must be seen by the host machine to determine the proper echo
response, in contrast to other systems where only "line at a time" commands are
allowed.  We have connected SUMEX to the TYMNET in two places, as shown in
Figure 4, so as to allow more direct access from different parts of the country.
Based on delay time statistics collected during the previous year from our
TYMSTAT program, the response times are scarcely acceptable.  When delay times
exceed 200-300 milliseconds, character printing lag becomes noticeable with a
full duplex, 30 char/sec terminal.  In the past these times have been
particularly bad in New York, with peak delays approaching 3 seconds one way!
Other nodes have shown uniformly high readings as well.  These data were
reflected in the subjective, but strongly articulated, comments of many of our
user groups.
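The cost of character-at-a-time echoing can be sketched by noting that each
keystroke must travel through every node to the host and its echo must travel
back.  The per-hop and host delays below are hypothetical, and applying the
200 ms threshold to the full echo round trip is our assumption; the text reports
some of its delay figures one way.

```python
CHAR_INTERVAL_MS = 1000.0 / 30   # a 30 char/sec terminal prints a character every ~33 ms
NOTICEABLE_MS = 200              # lower end of the 200-300 ms range quoted above

def echo_delay_ms(hops, per_hop_ms, host_ms=0.0):
    """Full-duplex echo: the keystroke goes to the host and the echo comes back."""
    one_way = hops * per_hop_ms
    return 2 * one_way + host_ms

for hops in (2, 4, 8):
    delay = echo_delay_ms(hops, per_hop_ms=25.0, host_ms=20.0)   # hypothetical values
    lag_chars = delay / CHAR_INTERVAL_MS
    status = "noticeable" if delay > NOTICEABLE_MS else "ok"
    print(f"{hops} hops: {delay:.0f} ms echo (~{lag_chars:.1f} character times), {status}")
```

Because the echo delay doubles every per-hop cost, removing intermediate nodes
pays off twice, which is why fewer hops matter so much for TENEX users.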

    We have had numerous meetings with TYMNET personnel to try to ease these
problems and have instituted reroutings of the lines connecting SUMEX-AIM to the
network.  Local lines to more strategic terminal nodes have also been considered
for users in areas poorly served by the existing line layout.  TYMNET has also
made some upgrades in the internal connectivity and speeds with which data is
switched within their node clusters.  These changes seem to have had some
beneficial effects in that delay times have improved and user complaints have
subsided.

    We will continue to pursue improvements in TYMNET response, but the
character-at-a-time terminal interactions used in TENEX programs are not found
in the time-sharing systems offered by most other TYMNET users and hence are not
well supported by TYMNET.  TYMNET has implemented 1200 baud service in 7 major
cities over the past year.  Unfortunately, many of our users are not in these
cities, so we have only limited experience with the 1200 baud support.

ARPANET:

   The ARPANET, while designed for more general information transfer than
purely terminal handling, has similar bottleneck problems in its topology (see
the current geographical and logical maps of the ARPANET in Figure 5 and Figure
6 on page 21).  These are reduced by the use of relatively higher speed
interconnection lines (50K baud instead of the 2400-9600 baud lines used in
TYMNET), but response delays through many nodes eventually become objectionable
as well.
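The benefit of faster interconnection lines can be estimated from transmission
time alone: in a store-and-forward network each node receives a whole message
before passing it on, so a message of b bits on a line of r bits per second
costs b/r per hop.  The sketch ignores propagation, processing, and queueing
delay, treats baud as bits per second, and uses a hypothetical 1000-bit message.

```python
def transmit_ms(bits, line_bps):
    """Time to clock a message onto a line: message length / line speed."""
    return 1000.0 * bits / line_bps

MESSAGE_BITS = 1000   # hypothetical message size
HOPS = 5              # hypothetical path length
for name, bps in [("2400 baud", 2400), ("9600 baud", 9600), ("50K baud", 50000)]:
    per_hop = transmit_ms(MESSAGE_BITS, bps)
    print(f"{name}: {per_hop:.1f} ms per hop, {HOPS * per_hop:.1f} ms over {HOPS} hops")
```

Faster lines shrink each hop's cost, but the total still grows linearly with the
number of nodes traversed, consistent with the observation above.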


   Consistent with the agreements with ARPA when we were initially granted
network access, we are enforcing a policy to restrict the use of the ARPANET to
users who have affiliations with ARPA-supported contractors and to
system/software interchange with cooperating TENEX sites.  The administration of
the network passed from the ARPA Information Processing Techniques Office to the
Defense Communications Agency as of July 1975.  At that time policies were
announced restricting access to DOD-affiliated users.  We have restricted the
facilities for calling out from SUMEX to other sites on the ARPANET to authorized
users.  This also protects the SUMEX-AIM machine from acting as an expensive
terminal handler for other machines; this function is better fulfilled by
dedicated terminal handling machines (TIPs).  In general, we have developed
excellent working relationships with other sites on the ARPANET for system backup
and software interchange; such day-to-day working interactions with remote
facilities would not be possible without the integrated file transfer,
communication, and terminal handling capabilities unique to the ARPANET.

   We take very seriously the responsibility to provide effective
communication capabilities to SUMEX-AIM users and are continuously looking for
ways to improve our existing facilities as well as investigating alternatives as
they become available.  We have done preliminary investigations of the TELENET
facilities, which have been rapidly expanding this past year.  BBN has connected
one of their TENEX systems to TELENET, and although we did not have the same
quantitative tools we have for measuring response on the TYMNET, we observed
TELENET delays at least as long as those encountered on TYMNET.  We did the
reverse experiment by using long distance telephone to connect from the TELENET
node in Washington, D.C. to the SUMEX machine in California and observed the
same sort of delays, reaching several seconds per character.  The TELENET has
many attractive features in terms of a symmetry analogous to that of the ARPANET
for terminal traffic and file transfers, and, being commercial, it would not have
the access restrictions of the ARPANET.  However, until the network throughput
improves we would not get substantial benefits from connecting to it.


Figure 4.  TYMNET Network Map

Figure 5.  ARPANET Geographic Map, April 1977
(Legend: satellite circuit, IMP, TIP, Pluribus IMP.  This map does not show
ARPA's experimental satellite connections.  Names shown are IMP names, not
necessarily host names.)

Figure 6.  ARPANET Logical Map, March 1977
(While this map shows the host population of the network according to the best
information obtainable, no claim can be made for its accuracy.  Names shown are
IMP names, not necessarily host names.)


1.2.2.5   SYSTEM RELIABILITY AND BACKUP

   System reliability has remained high over the past years: excellent under
stable hardware and software conditions, and degrading temporarily during
debugging and development periods and during periods of difficult hardware
problems.  In general we take the system down for approximately 50 hours per
month for scheduled hardware maintenance, file backup, and other maintenance.  In
addition, we average from 10 to 15 hours per month of unscheduled downtime.
During particularly severe hardware or software problems we must absorb
substantially more downtime.
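The downtime figures above translate into availability as follows.  The sketch
assumes a 30-day (720-hour) month; the 50 scheduled and 10-15 unscheduled hours
are from the text.

```python
HOURS_PER_MONTH = 30 * 24   # assumption: a 30-day month (720 hours)
SCHEDULED_DOWN = 50         # hours/month for maintenance and file backup

def availability(unscheduled_hours):
    """Fraction of the month the system is up."""
    up = HOURS_PER_MONTH - SCHEDULED_DOWN - unscheduled_hours
    return up / HOURS_PER_MONTH

for unscheduled in (10, 15):
    print(f"{unscheduled} h unscheduled: {100 * availability(unscheduled):.1f}% available")
```

This gives roughly 91.0-91.7% of clock time available to users in a typical
month.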

1.2.2.6    PROGRAMMING LANGUAGES

    Over the past years we or members of the SUMEX-AIM community have continued
to maintain the major languages on the system at current release levels, have
TENEXized several languages to improve efficiency, and have investigated a number
of issues related to the efficiency of programs written in various LISP
implementations and the exportability of programs.  These issues are becoming
increasingly critical in dealing with AI performance programs which have reached
a level of maturity at which substantial, non-developmental user communities are
growing.  The following paragraphs summarize general accomplishments, and the
subsequent discussion describes in detail the work this past year in designing a
machine-independent ALGOL-like system (MAINSAIL).

LISP Efficiency:

    There has been an ongoing debate among a number of projects over the best
language to choose for developmental implementation of the various AI programs.
The key issues include ease and flexibility of conceptual representation of
program functions and objects, interactive debugging support, efficiency, and
exportability.  To date the predominant language choice for AIM research has been
LISP, and more particularly INTERLISP.  These issues are important because they
influence the time required to develop new AI programs and subsequently the
incremental load placed on the SUMEX machine when in use.  We recently attempted
an evaluation of INTERLISP and ILISP, including the relative efficiencies of the
two languages and the level of assistance the language systems provide the user
in developing programs.  The tests were based on an implementation of a subset of
REDUCE (a symbolic algebra manipulator).  After several iterations of program
refinement by experts in the respective languages, the runtimes for the two
versions were quite comparable (far less than the factor of 5-10 disparity
predicted by ILISP enthusiasts).  A more disquieting result was the substantial
difference in runtimes depending on how particular functions were coded IN THE
SAME LANGUAGE.  It is apparent from the results that differences of a factor of
10 in time can result from a superficial implementation; expert programming
insight is essential to efficient program performance.  This is not a real
surprise, since it is true of programming in any language, but the problems may
be increased by a language as rich as INTERLISP, with such a wide array of ways
to do the same thing but with little guidance as to the relative costs.  It has
proven very difficult to quantify the "rules" for good programming.  Mr. Masinter
and Mr. Phil Jackson attempted to document good INTERLISP programming habits and
issued a bulletin for SUMEX users.
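The point that the same function coded two ways in the same language can differ
greatly in cost is easy to demonstrate.  The example below is a Python analogue
chosen for illustration, not the REDUCE benchmark or INTERLISP code: repeated
list concatenation copies the whole list at every step (quadratic overall),
while appending in place is linear, much as repeatedly APPENDing onto the end of
a LISP list differs from CONSing onto the front and reversing once.

```python
import timeit

def squares_concat(n):
    """Superficial version: `result + [x]` copies the whole list on every step (O(n^2))."""
    result = []
    for i in range(n):
        result = result + [i * i]
    return result

def squares_append(n):
    """Same result, but appends in place (amortized O(1) per step, O(n) overall)."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result

if __name__ == "__main__":
    n = 5000
    for fn in (squares_concat, squares_append):
        seconds = timeit.timeit(lambda: fn(n), number=1)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

Both functions return the same list; the gap in timings widens as `n` grows,
and nothing in the text of the slow version signals its cost.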

   A further implication of these data is that it is very difficult to
simultaneously develop a new AI program and make the implementation highly
efficient.  With the iterations required to develop the conceptual design of the
program, it is difficult to ensure its efficiency.  This may lead to the need to
reimplement the program after the basic development stabilizes, to increase
efficiency while still accommodating convenient and orderly further development.
Such reimplementation may or may not be best done in LISP; this will depend on
many factors, including the nature of the program's data structure requirements
and anticipated further development efforts.

MAINSAIL Progress

    SUMEX, in its role as a nationally shared computer resource, is an
appropriate vehicle for the development of software unbound by the underlying
machine environment.  We have a built-in community of program developers acutely
aware of the significance of providing their work to a broader base of users.
This intersection of hardware capability, software expertise, and dedication to
resource sharing presents a unique opportunity to promote a system designed for
program sharing.

The MAINSAIL (3) project has three closely related goals:

1)  Provide an integrated set of tools for the creation of efficient portable
  software on a variety of computer systems, and provide support and
  continued development of these tools in a form compatible across all
  implementations.

2)  Study innovative approaches to portability, both hardware and software,
  and develop such approaches into effective tools.

3)  Promote the development and distribution of portable software, advise and
  assist in its design, and evaluate its applicability.

    By portable software we mean computer programs which may be executed on a
variety of machines with few, if any, alterations.  MAINSAIL itself will provide
the initial example of portable software, since all of the system is written in
the MAINSAIL language except for those parts which are determined by the host
environment (hardware, instruction set, operating system, etc.).  Even these
parts are embedded within MAINSAIL.

   (3) The MAINSAIL (MAchine-INdependent SAIL) language is derived from SAIL, a
programming language developed at Stanford University's Artificial Intelligence
Laboratory.  It is not compatible with SAIL, since SAIL was designed for a PDP-10
with TOPS-10 and hence contains machine dependencies.  However, it has retained
the basic attributes of SAIL as an extended ALGOL-like language.  A summary of
some of the features of the MAINSAIL language and their relationship to other
languages is given in Appendix III on page 231 (see Book II).


   There is a key distinction between MAINSAIL's approach to portability and
the "classical" approach characterized by languages such as FORTRAN, ALGOL, LISP,
COBOL and BASIC.  These languages attempt to adhere to a single syntax standard
which is separately implemented for each different computer system.  Invariably
these implementations have differences which preclude the creation of a program
that is accepted by all.  It is difficult, if not impossible, to define a
language standard which is unambiguous and at the same time sufficiently
comprehensible to provide the basis for compatible implementations.  Furthermore,
many implementors yield to the temptation to provide "enhancements" to the
standard, which immediately introduce machine and system dependencies.

   MAINSAIL, on the other hand, provides a single system (written primarily in
itself) which is employed at every site.  This is made possible by its ability to
compile itself into code for a variety of machines.  Only the compiler's code
generators and the runtime operating-system interfaces need be rewritten for each
implementation.  These parts of MAINSAIL are at a level which has already been
defined by the machine-independent parts, and do not affect the language from the
user's viewpoint.  Thus the "language standard" has been reduced to a "semantic
standard" which is surrounded by machine-independent software.
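The division of labor described above, with a machine-independent front end and
small per-machine code generators, can be sketched as an abstract back-end
interface.  The three-operation intermediate form and the class names are
hypothetical, not MAINSAIL's actual internals, and the runtime operating-system
interface layer is omitted; the emitted PDP-10 (MOVE/ADD/MOVEM) and PDP-11
(MOV/ADD) instructions are the standard mnemonics for those machines.

```python
from abc import ABC, abstractmethod

# Output of a machine-independent front end for "C := A + B" (hypothetical IR).
IR = [("load", "A"), ("add", "B"), ("store", "C")]

class CodeGenerator(ABC):
    """The only per-machine piece: maps each IR operation to target instructions."""

    @abstractmethod
    def emit(self, op, operand):
        ...

    def generate(self, ir):
        return [self.emit(op, operand) for op, operand in ir]

class PDP10Generator(CodeGenerator):
    def emit(self, op, operand):
        # Accumulator 1 holds the running value.
        return {"load": f"MOVE 1,{operand}", "add": f"ADD 1,{operand}",
                "store": f"MOVEM 1,{operand}"}[op]

class PDP11Generator(CodeGenerator):
    def emit(self, op, operand):
        # Register R0 holds the running value (PDP-11 syntax is source,destination).
        return {"load": f"MOV {operand},R0", "add": f"ADD {operand},R0",
                "store": f"MOV R0,{operand}"}[op]

for generator in (PDP10Generator(), PDP11Generator()):
    print(type(generator).__name__, generator.generate(IR))
```

Porting to a new machine means writing one more `emit` method; the front end and
the program being compiled are untouched.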

    It remains to be seen whether the temptation to augment the language with
machine dependencies (for purposes of ultimate efficiency or to take advantage of
particular local system features) can be overcome.  Herein also lies the biggest
"price" to be paid for exportability.  The code emitted from the MAINSAIL
compiler can be (and is, based on tests to date) at least as efficient as that
from many machine-dependent compilers.  On the other hand, special machine or
operating system features that cannot be uniformly implemented may provide local
optimizations at the cost of exportability, or vice versa.  We cannot effectively
measure the extent of this cost at this stage.

DEVELOPMENT APPROACH

    We do not underestimate the difficulty in obtaining the cooperation of a
community which will span a wide variety of applications and hardware/software
systems.  If MAINSAIL is to obtain widespread use, it is crucial that it have an
effective and credible base of support.  The initial parts of MAINSAIL are just
about ready for limited distribution.  We want to maintain close supervision of
this distribution and ensure that systems labelled as MAINSAIL are not altered
without our approval.  In this regard we are pursuing legal channels to safeguard
the integrity of MAINSAIL software.  We plan to take MAINSAIL through an orderly
progression of development, and to avoid casual distribution with no provision
for a solid base of maintenance and future growth.

REVIEW OF PROGRESS TO DATE

   MAINSAIL has been under development for almost three years now.  Starting
from an initial goal of converting the PDP-10 SAIL compiler to generate code for
a PDP-11, we implemented several versions on a PDP-10 and a PDP-11 and laid the
groundwork for extending the system to a wider variety of machines.  The current
version was begun in August of 1976.


   Early versions of MAINSAIL attempted to maintain close compatibility with
the original SAIL, but in surveying a wider variety of machines (especially
mini-computers), we concluded that this compatibility could be maintained only at
the expense of portability.  We felt that MAINSAIL could contribute more by
providing a truly portable system.  Thus we began redesigning MAINSAIL,
rebuilding from previous implementations.  This effort has resulted in a new
version which is still under development and is now being tested on several
systems.

   Initial implementations of the current design are for DEC PDP-10s with the
TENEX operating system and with the TOPS-10 operating system.  The TENEX version
is being tested at SUMEX and has been installed at one other TENEX site (Stanford
IMSSS).  The TOPS-10 version was developed at SUMEX by using TENEX facilities
which provide compatibility with TOPS-10.  The Rutgers University PDP-10 facility
was chosen for external testing since it is a standard TOPS-10 system and can be
accessed from SUMEX over a network.  MAINSAIL is now undergoing preliminary
testing there.  A modified TOPS-10 version has been set up on the Stanford AI
Lab's PDP-10, but it too has not yet been opened to general use.

   Little additional work will be necessary to make the TENEX version execute
on a DECSYSTEM-20, since TOPS-20 is derived from TENEX.  However, some time will
be needed to take full advantage of the extended instruction set of the KL-10.
Two sites are available for TOPS-20 development: the LOTS facility at Stanford,
and a machine at SRI, close to Stanford and accessible over a network.  Both of
these sites have expressed an interest in using MAINSAIL.

   The PDP-11 has been chosen as the first mini-computer target for MAINSAIL.
Code generators have been written for it but not yet debugged.  Several variants
of these code generators will be necessary to cover the full PDP-11 family.

   MAINSAIL interfaces to three PDP-11 operating systems (RT-11, RSX-11 and
UNIX) are now under development.  All of these operating systems are available to
the MAINSAIL project on PDP-11s at Stanford.  RT-11 will be the first to be
implemented.  The mix of instruction sets, operating systems and configurations
will be a good test of MAINSAIL's ability to provide a compatible implementation,
even across this one family of computers.  We expect the PDP-11 systems to be
operational by this summer.

1.2.2.7   STANFORD AI HANDBOOK PROJECT

   The AI Handbook is a compendium of short articles (3-5 pages each) about
the projects, ideas, problems, and techniques that make up the field of
Artificial Intelligence.  Over 150 articles have been drafted by researchers and
students in the field, on topics ranging in depth from "Augmented Transition
Networks" (ATN's) to "An Overview of Natural Language Research", and covering the
entire breadth of AI research: search, robotics, speech understanding,
real-world applications, etc.  An outline of the current contents of the
Handbook is given in Appendix II on page 225 (see Book II).

J. Lederberg

26


DETAILED PROGRESS REPORT

Section 1.2.2.7

   During the Spring of 1976 the final push for drafting new articles was
completed, with some 60 articles produced by students during that quarter.  Since
then we have begun rewriting the various chapters of the Handbook to produce
coherent manuscripts from the original work of five to ten authors.  This
effort involves rewriting articles for accuracy and completeness as well as
integrating the 15 to 25 articles in a section into an editorially uniform and
readable document.  An editor has been added to the project team who will be
responsible for maintaining a consistent format and style in the Handbook.

   When completed, each chapter will be reviewed by experts in the appropriate
research area before it is released to the public.  At present, the chapter on
Natural Language research is completed and being reviewed, and we expect that the
sections on Search, Speech Understanding, Representation of Knowledge, and
Automatic Programming will be completed during the next two months.  During the
Fall of 1977 the first seven chapters of the handbook will be published in
preliminary form.  Meanwhile, the handbook is already available on-line to
cooperative experts and critics via the SUMEX-AIM network connections.  We are
considering maintaining the handbook on-line, with occasional hard-copy editions,
and believe this method of "publication" may be a prototype for other
encyclopedic monographs.

1.2.2.8   USER SOFTWARE AND INTRA-COMMUNITY COMMUNICATION

   In addition to the system and language software development efforts of
SUMEX, we have assembled, or developed where necessary, a broad range of utilities
and user software.  These include operational aids, statistics packages,
DEC-supplied programs, improvements to the TOPS-10 emulator, text editors, text
search programs, file space management programs, graphics support, a batch
program execution monitor, text formatting and justification assistance, and
magnetic tape conversion aids.  We have also developed a number of user
information assistance programs, such as a "WHOIS" facility to retrieve names and
affiliations of users and a "HELP" facility to locate on-line documentation of
interest through keyword searches.

   Of major importance for our community effort is the set of tools for
inter-user communications.  We have enhanced the message sending and manipulation
programs to better integrate text editing facilities for easier message
preparation and reading.  We have also developed a unique "bulletin board" system
to deal with informal notes, thereby bridging a functional gap between formal
system documents and private message communications between individual users.
The bulletin board system provides an informal and dynamic base for information
about system facilities, lore, bugs, etc., or can provide a means for
intra-project communication and coordination.

   The system has been in operation for more than one year and has been
exported to IMSSS (Stanford's other TENEX site) and USC-ECL.  We have also
proposed that the next generation of ARPANET information services provide for
bulletin board-like facilities.  At SUMEX-AIM there are 10 bulletin boards, 8 of
which are project-specific.  The main system bulletin board currently contains
more than 140 bulletins under 85 topics covering system status announcements,
explanations of recent crashes, hardware troubles and monitor upgrades, new
developments, bugs, and little-documented features of our programming languages
and utilities.  Project bulletin boards have been used for notices and minutes of
meetings, references to and abstracts of papers, coordination of on-going
developments, vacation schedules, documentation and announcements of various
kinds.

Current Bulletin Board features include:

Multiple bulletin boards (public, private, general, specific, etc.).

Topics and subtopics (separated by periods) may be nested to any depth.

Expire dates for each bulletin, after which they are removed automatically.

Interest-list-of-topics for each user allows him to be notified about new
bulletins he is interested in and to ignore others.

Users notified when new bulletins arrive, by running BBCHECK (the bulletin-board
MAIL CHECK) or by mail.

Help and browsing facilitated in a variety of ways (? can be typed anywhere,
general and command-specific help provided).

Command structure modelled after the TENEX EXEC, with conscious attention to
human engineering.

Companion program BBREAD is a bulletin-board READMAIL.

Companion program BBNEWS types out a directory listing of any new bulletins.
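The features listed above imply a fairly simple data model.  The Python sketch below is an illustration only, not the SUMEX implementation (which ran on TENEX); the class and function names, the sample topics, and the rule that an interest in a topic also covers all of its nested subtopics are assumptions.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Bulletin:
    number: int
    topic: str      # nested topic path, components separated by periods
    text: str
    expires: date   # the bulletin is removed automatically after this date


@dataclass
class Board:
    name: str
    bulletins: list = field(default_factory=list)

    def post(self, bulletin: Bulletin) -> None:
        self.bulletins.append(bulletin)

    def expire(self, today: date) -> None:
        """Drop every bulletin whose expire date has passed."""
        self.bulletins = [b for b in self.bulletins if b.expires >= today]


def topic_matches(interest: str, topic: str) -> bool:
    """True if `topic` is `interest` itself or nested anywhere beneath it."""
    wanted, actual = interest.split("."), topic.split(".")
    return actual[:len(wanted)] == wanted


def new_for_user(board: Board, interests: list, seen: set) -> list:
    """Unseen bulletins whose topics fall under the user's interest list."""
    return [b for b in board.bulletins
            if b.number not in seen
            and any(topic_matches(i, b.topic) for i in interests)]


if __name__ == "__main__":
    board = Board("MAIN")
    board.post(Bulletin(1, "SYSTEM.CRASHES", "Crash explanation", date(1977, 7, 1)))
    board.post(Bulletin(2, "LANGUAGES.SAIL.BUGS", "Known bug", date(1977, 6, 1)))
    board.expire(date(1977, 6, 15))                                     # bulletin 2 expires
    print([b.number for b in new_for_user(board, ["SYSTEM"], set())])   # [1]
```

Matching on whole period-separated components, rather than on raw string prefixes, keeps an interest in `SYS` from accidentally matching `SYSTEM`.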

1.2.2.9    DOCUMENTATION AND EDUCATION

   We have spent considerable effort to develop, maintain, and facilitate
access to our documentation so that it accurately reflects available software.
The HELP and Bulletin Board systems have been important in this effort.  We have
limited manpower for user assistance; in general, users are responsible for
their own software development and maintenance.  The SUMEX staff (including
Lederberg and Rindfleisch), however, share the responsibilities for system-level
assistance to users, tracking down bugs, reviewing user suggestions, etc.  The
terminal linking facilities of TENEX have been valuable tools both for assisting
remote user groups and for letting system users communicate with each other.
With the recent initial release of the MAINSAIL system on selected machines, we
are becoming increasingly involved in describing MAINSAIL and advising user
projects on its possible applications.

1.2.2.10    SOFTWARE COMPATIBILITY AND SHARING

   At SUMEX-AIM we firmly believe in importing rather than reinventing
software where possible.  At SUMEX many avenues exist for sharing among the
system staff, various user projects, other facilities, and vendors.  In the past,
without communication networks, the system vendor served as the focal point for
distribution of most software to user sites.  Since the process of distributing
tapes (and particularly of handling bug reports and user suggestions) was very
slow, it was common for sites to take a version of a program and then modify and
maintain it locally.  This caused a proliferation of home-grown versions of
software.  Similar impediments have existed to the dissemination of user
software.  User organizations like SHARE and DECUS have helped to overcome these
problems, but communication is still cumbersome.  The advent of fast and
convenient communication facilities coupling communities of computer facilities
has the potential to make a major difference in facilitating inter-group
cooperation and lowering these barriers.

    The TENEX sites on the ARPANET have been interacting increasingly with each
other to develop new software systems.  This effectively builds communication
around the network and promotes a functional division of labor and expertise.
The other major advantage is that, as a by-product of the constant communication
about particular software, personal connections develop between staff members
of the various sites.  These connections serve to pass general information about
software tools and to encourage the exchange of ideas among the sites.  Certain
common problems are now regularly discussed on a multi-site level.  We continue
to draw significant amounts of system software from other ARPANET sites,
reciprocating with our own local developments.  Interactions have included
mutual backup support, hardware configuration experiments, operating system
enhancements, utility or language software, and user project collaborations.
We have been able to import many new pieces of software and improvements to
existing ones in this way.  Examples of imported software include the message
manipulation program MSG, TENEX SAIL, TENEX SOS, INTERLISP, the RECORD program,
ARPANET host tables, and many others.  Reciprocally, we have exported our
contributions such as the drum page migration system, KI-10 page table
efficiency improvements, GTJFN enhancements, PUB macro files, the bulletin board
system, SNDMSG enhancements, our BATCH monitor, etc.  The most recent example of
this cooperative use of networks is the preliminary export of MAINSAIL.

1.2.2.11    RESOURCE MANAGEMENT

ORGANIZATION AND PROCEDURES

    The SUMEX-AIM resource is administered within the Genetics Department of
the Stanford University Medical School, Professor Lederberg's "main office",
though he also holds appointments in the Computer Science Dept. and the Human
Biology program.  Its mission, locally and nationally, entails both the
recruitment of appropriate research projects interested in medical AI
applications and the catalysis of interactions among these groups and the broader
medical community.  User projects are separately funded and autonomous in their
management.  They are selected for access to SUMEX on the basis of their
scientific and medical merits as well as their commitment to the community goals
of SUMEX.  Currently active projects span a broad range of application areas such
as clinical diagnostic consultation, molecular biochemistry, belief systems
modeling, mental function modeling, and instrument data interpretation (see
Section 6 on page 41 in Book II).

EXECUTIVE AND ADVISORY COMMITTEE ORGANIZATION

   As the SUMEX-AIM project is by its very nature a multilateral undertaking,
we have created several management committees to assist in administering the
various portions of the SUMEX resource.  As defined in the SUMEX-AIM management
plan adopted at the time the initial resource grant was awarded, the available
facility capacity is allocated 40% to Stanford Medical School projects, 40% to
national projects, and 20% to common system development and related functions.
Within the Stanford aliquot, Dr. Lederberg has established an advisory committee
to assist him in selecting and allocating resources among projects appropriate to
the SUMEX mission.  The current membership of this committee is listed in
Appendix V (see Book II).

    For the national community, two committees serve complementary functions.
An Executive Committee oversees the operations of the resource as related to
national users and makes the final decisions on authorizing admission for
projects.  It also establishes policies for resource allocation and approves
plans for resource development and augmentation within the national portion of
SUMEX (e.g., hardware upgrades, MAINSAIL development priorities, etc.).  The
Executive Committee oversees the planning and implementation of the AIM Workshop
series currently implemented under Prof. S. Amarel of Rutgers University and
assures coordination with other AIM activities as well.  The committee will play
a key role in assessing the possible need for additional future AIM community
computing resources and in deciding the optimal placement and management of such
facilities.  The current membership of the Executive Committee is listed in
Appendix V (see Book II).

   Reporting to the Executive Committee, an Advisory Group represents the
interests of medical and computer science research relevant to AIM goals.  The
Advisory Group serves several functions in advising the Executive Committee: 1)
recruiting appropriate medical/computer science projects, 2) reviewing and
recommending priorities for allocation of resource capacity to specific projects
based on scientific quality and medical relevance, and 3) recommending policies
and development goals for the resource.  The current Advisory Group membership is
given in Appendix V (see Book II).

   These committees have actively functioned in support of the resource.
Except for the meetings held during the AIM workshops, the committees have met by
telephone conference, owing to the size of the groups and to save the time and
expense of personal travel to meet face to face.  These telephone meetings, in
conjunction with terminal access to related text materials, have served quite
well in accomplishing the agenda business and have greatly facilitated the
arrangement of meetings.  Other solicitations of advice requiring review of
sizable written proposals are done by mail.

   We will continue to work with the management committees to recruit
additional high-quality projects that can be accommodated and to evolve resource
allocation policies that appropriately reflect assigned priorities and project
needs.  We hope to make information about the various projects more widely
available, both inside and outside of the community, and thereby to promote the
kinds of exchanges exemplified earlier and made possible by network facilities.

NEW PROJECT RECRUITING

    The SUMEX-AIM resource has been announced through a variety of media as
well as by correspondence, contacts of NIH-BRP with a variety of prospective
grantees who use computers, and contacts by our own staff and committee members.
The number of formal projects that have been admitted to SUMEX has more than
doubled since the start of the project; others are working tentatively as pilot
projects or are under review.

   We have prepared a variety of materials for the new user, ranging from
general information such as is contained in a brochure (see Appendix VI in
Book II) to more detailed information and guidelines for determining whether a
user project is appropriate for the SUMEX-AIM resource.  Dr. E. Levinthal has
prepared a questionnaire to assist users seriously considering applying for
access to SUMEX-AIM (see Appendix VII in Book II).  Pilot project categories
have been established within both the Stanford and national aliquots of the
facility capacity to assist and encourage projects just formulating possible AIM
proposals, pending their applications for funding support and, in parallel,
their formal applications for access to SUMEX.  Pilot projects are approved for
access for limited periods of time after preliminary review by the Stanford or
AIM Advisory Group, as appropriate to the origin of the project.

    These contacts have sometimes done much more than provide support for
already-formulated programs.  For example, Prof. Feigenbaum's group at Stanford
has initiated a major collaborative effort with Dr. Osborn's group at the
Institutes of Medical Sciences in San Francisco.  This project in "Pulmonary
Function Monitoring and Ventilator Management - PUFF/VM" (see Section 6.4.6 on
page 197 in Book II) originated as a pilot request to use MLAB in a small way for
modeling.  Subsequently the AI potentialities of this domain were recognized by
Feigenbaum, Nii, and Osborn, who have submitted a joint proposal to NIH and have
pilot status at present.

   The following lists the fully authorized projects currently comprising the
SUMEX-AIM community (see Section 6 in Book II for more detailed descriptions).
The nucleus of five projects that were authorized at the initial funding of the
resource in December 1973 is marked by "<*>".

National Community -

1) Acquisition of Cognitive Procedures (ACT); Dr. J. Anderson (Yale
   University)

<*> 2) Higher Mental Functions Project; K. Colby, M.D. (University of California
    at Los Angeles)

3) INTERNIST Project; J. Myers, M.D. and Dr. H. Pople (University of
   Pittsburgh)

4) Medical Information Systems Laboratory (MISL); J. Wilensky, M.D. and Dr.
   B. McCormick (University of Illinois at Chicago Circle)

<*> 5) Rutgers Computers in Biomedicine; Dr. S. Amarel (Rutgers University)

6) Chemical Synthesis Project (SECS); Dr. T. Wipke (University of California
   at Santa Cruz)

Stanford Community -

<*> 1) DENDRAL Project; Drs. C. Djerassi, J. Lederberg, and E. Feigenbaum

2) Large Multi-processor Arrays (HYDROID); Dr. G. Wiederhold

3) Molecular Genetics Project (MOLGEN); Drs. J. Lederberg, E. Feigenbaum, and
  N. Martin

<*> 4) MYCIN Project; S. Cohen, M.D. and Dr. B. Buchanan

<*> 5) Protein Structure Modelling; Drs. J. Kraut and S. Freer (University of
    California at San Diego) and E. Feigenbaum (Stanford)

   As an additional aid to new projects or collaborators with existing
projects, we provide a limited amount of funds for use to support terminals and
communications needs of users without access to such equipment.  We are currently
leasing 6 terminals and 4 modems for users as well as 4 foreign exchange lines to
better couple the Rutgers project into the TYMNET and a leased line between
Stanford and U. C. Santa Cruz for the Chemical Synthesis project.

STANFORD COMMUNITY BUILDING

   The Stanford community has undertaken several internal efforts to encourage
interactions and sharing between the projects centered here.  Professor
Feigenbaum organized a seminar class with the goal of assembling a handbook of AI
concepts, techniques, and current state-of-the-art.  This project has had
enthusiastic support from the students, and substantial progress has been made in
preparing many sections of the handbook, as reported earlier.  An outline of the material
being prepared can be found in Appendix II on page 225 (see Book II). Several
examples of completed articles are given in Appendix I on page 202 (see Book
II).

    A second community-building effort was a mini-conference on AI held at
Stanford in January 1976.  This three-day series of meetings featured
presentations by each of the local projects and comparative discussions of
approaches to current problems in AI research, such as knowledge representation,
production system strategies and rule formation.  Weekly informal lunch meetings
(SIGLUNCH) are also held among community members to discuss general AI topics,
concerns and progress of individual projects, or system problems as appropriate;
these meetings have also featured a number of outside invited speakers.

AIM WORKSHOP SUPPORT

   The Rutgers Computers in Biomedicine resource (under Dr. Saul Amarel) has
organized a series of workshops devoted to a range of topics related to
artificial intelligence research, medical needs, and resource sharing policies
within NIH.  Meetings have been held for the past two years at Rutgers and
another is planned for this summer.  The SUMEX facility has acted as a prime
computing base for the workshop demonstrations, and we expect to continue this
support for future workshops.  The AIM workshops provide much useful information
about the strengths and weaknesses of the performance programs, both in terms of
criticisms from other AI projects and in terms of the needs of practicing medical
people.  We plan to continue to use this experience to guide the community
building aspects of SUMEX-AIM.

RESOURCE ALLOCATION POLICIES

   As the SUMEX facility has become increasingly loaded, a number of diverse
and conflicting demands have arisen which require controlled allocation of
critical facility resources (file space and central processor time).  We have
already spelled out a policy for file space management: an allocation of file
storage is defined for each authorized project in conjunction with the management
committees.  This allocation is divided among project members in any way desired
by the individual principal investigators.  The system enforces allocations by
project each week: as the weekly file dump is done, if the aggregate space in use
by a project is over its allocation, files are archived from user directories
that are over their allocations until the project is within its allocation.
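The weekly enforcement rule can be sketched as follows.  This is a hypothetical Python illustration, not SUMEX code: the report does not say in what order files are archived, so the sketch assumes that the users furthest over their individual shares are handled first and that each user's least recently used files go first (the `last_used` field is invented for that purpose).

```python
from dataclasses import dataclass, field


@dataclass
class File:
    name: str
    pages: int
    last_used: int          # hypothetical: day number of the last reference


@dataclass
class User:
    name: str
    allocation: int         # pages assigned to this user by the project's PI
    files: list = field(default_factory=list)

    def used(self) -> int:
        return sum(f.pages for f in self.files)


def weekly_enforce(users: list, project_allocation: int) -> list:
    """Archive files until the project fits within its allocation.

    Only users over their individual share lose files.  The archiving order
    (largest overage first, least recently used file first) is an assumption.
    Returns the names of the archived files.
    """
    archived = []
    total = sum(u.used() for u in users)
    for user in sorted(users, key=lambda u: u.used() - u.allocation, reverse=True):
        for f in sorted(user.files, key=lambda x: x.last_used):
            if total <= project_allocation or user.used() <= user.allocation:
                break
            user.files.remove(f)        # stands in for moving the file to the archive
            archived.append(f.name)
            total -= f.pages
    return archived
```

For example, with a 150-page project allocation, a user holding files of 60 and 70 pages against a 100-page share, and a second user well under share, only the less recently used 70-page file is archived; a second run the following week archives nothing.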

    We have recently implemented system scheduling controls that attempt to
maintain the 40:40:20 balance in terms of CPU utilization (see page 14).  The
initial complement of user projects justifying the SUMEX resource was centered to
a large extent at Stanford.  Over the first term of the SUMEX grant, a
substantial growth in the number of national projects was realized.  During the
same time the Stanford group of projects has matured as well, and in practice the
40:40 split between Stanford and non-Stanford projects is not ideally realized
(see Figure 8 on page 38 and the tables of recent project usage on page 40).
Our job scheduling controls bias the allocation of CPU time according to the
percentage of time each community has consumed relative to its allocation under
the 40:40:20 split.  The controls are "soft", however, in that they do not waste
computer cycles if users below their allocated percentages are not on the system
to consume the cycles.  The operating disparity in CPU use to date reflects a
substantial difference in demand between the Stanford community and the
developing national projects, rather than inequity of access.  For example, the
Stanford utilization is spread over a large part of the 24-hour cycle, while
national AIM users tend to be more sensitive to local prime-time constraints.
(The 3-hour time-zone phase shift across the continent is of substantial help in
load-balancing.)  For the present, we propose to continue our policy of "soft"
allocation enforcement for the fair split of resource capacity.  If necessary to
assure proper apportionment, we can implement a pie-slice reservation system to
control the allocations more rigidly.
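The report does not describe how the scheduler computes its bias.  One simple way to obtain the "soft" behaviour described above is sketched below in Python; the share table, the function names, and the strict favor-the-most-underserved rule are assumptions for illustration, not the actual TENEX scheduler change.

```python
SHARES = {"Stanford": 0.40, "AIM": 0.40, "Staff": 0.20}   # the 40:40:20 split


def usage_ratio(community: str, consumed: dict) -> float:
    """Fraction of CPU time consumed, divided by the fraction allocated."""
    total = sum(consumed.values())
    if total == 0:
        return 0.0
    return (consumed[community] / total) / SHARES[community]


def pick_community(consumed: dict, runnable: set):
    """Return the community whose ready job should be favored next.

    Only communities that currently have a runnable job are considered, so
    the processor is never left idle while any work is ready: an over-quota
    community still gets cycles that nobody else wants ("soft" enforcement).
    """
    if not runnable:
        return None
    # sorted() gives a deterministic tie-break between equal ratios
    return min(sorted(runnable), key=lambda c: usage_ratio(c, consumed))
```

With consumption of 60:25:15, the ratios are 1.5 for Stanford, 0.625 for AIM and 0.75 for Staff, so an AIM job is favored when all three communities are waiting; if only Stanford jobs are runnable, Stanford still receives the cycles.  A "pie-slice" reservation system would instead hold capacity back for absent communities.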

   Our system also categorizes users in terms of access privileges.  These
comprise fully authorized users, pilot projects, guests, and network visitors, in
descending order of system capabilities.  We want to encourage bona fide medical
and health research people to experiment with the various programs available with
a minimum of red tape, while not allowing unauthenticated users to bypass the
advisory group screening procedures by coming on as guests.  So far we have had
relatively little abuse compared to what other network sites have experienced,
perhaps on account of the personal attention that senior staff give to the logon
records, and to other security measures.  However, the experience of most other
computer managers warns us against being as wide-open as might be preferred for
informal service to pilot efforts and demonstrations.  We will continue
developing this mechanism in conjunction with management committee policy
decisions.

1.2.2.12    SUMMARY OF RESOURCE USAGE

   The following data give an overview of SUMEX-AIM resource usage.  There are
five sub-sections containing data respectively for 1) monthly CPU time consumed,
2) resource usage by community (AIM and Stanford), 3) resource usage by project,
4) recent diurnal loading data, and 5) network usage data.

MONTHLY CPU TIME CONSUMED

[Plot: monthly CPU time consumed, August 1974 through July 1977; vertical axis 0 to 600.]

Figure 7.  Monthly CPU Time Consumed

Privileged Communication


RELATIVE SYSTEM LOADING BY COMMUNITY

   The SUMEX resource is divided, for administrative purposes, into 3 major
communities: user projects based at the Stanford Medical School, user projects
based outside of Stanford (national AIM projects), and common systems development
efforts.  As defined in the resource management plan approved by BRP at the start
of the project, the available resource in terms of CPU capacity and file space
will be divided between these communities as follows:

Stanford      40%
AIM         40%
Staff       20%

The "available" resources to be divided up in this way are those remaining after
various monitor and community-wide functions are accounted for.  These include
such things as job scheduling, overhead, network service, file space for
subsystems and documentation, etc.

   The monthly usage of CPU and file space resources for each of these three
communities relative to their respective aliquots is shown in the plots in Figure
8 and Figure 9.  It is clear that the Stanford projects have held an edge in
system usage despite our efforts at resource allocation and the substantial
voluntary efforts by the Stanford community to utilize non-prime hours.  This
reflects the development of the Stanford group of projects relative to those
getting started on the national side, and it has correspondingly accounted for
much of the progress in AI program development to date.

[Plot, three panels (National AIM, Stanford, System Staff): monthly CPU usage by
community, August 1974 through July 1977; the National AIM and Stanford panels
have vertical axes from 0 to 40.]

Figure 8.  CPU Usage by Community

[Plot, panels including Stanford (vertical axis 0 to 40) and System Staff
(vertical axis 0 to 20): monthly file space usage by community, August 1974
through July 1977.]

Figure 9.  File Space Usage by Community

INDIVIDUAL PROJECT AND COMMUNITY USAGE

    The following table shows cumulative resource usage by project in the past
grant year.  The data displayed include a description of the operational funding
sources (outside of SUMEX-supplied computing resources) for currently active
projects, total CPU consumption by project (Hours), total terminal connect time
by project (Hours), and average file space in use by project (Pages, 1 page = 512
computer words).  These data were accumulated for each project for the months
between May 1976 and April 1977.  Again the well-developed use of the resource by
the Stanford community can be seen.  It should be noted that the Stanford
projects have voluntarily shifted a substantial part of their development work to
non-prime hours, a shift that is not visible in these cumulative data.  It should
also be noted that a significant part of the DENDRAL and MYCIN efforts, here
charged to the Stanford aliquot, supports development dedicated to national
community access to these systems.  The actual demonstration and use of these
programs by extramural users is charged to the national community in the "AIM
USERS" category, however.

RESOURCE USE BY INDIVIDUAL PROJECT

STANFORD COMMUNITY
                                        CPU       CONNECT     FILE SPACE
                                      (Hours)      (Hours)       (Pages)

1) DENDRAL PROJECT                    1181.64     19657.56         13058
   "Resource Related Research
   Computers and Chemistry"
   NIH RR-00612-08 (3 yrs. 1977-80)
   ARPA DAHC-15-73-C-0435 (2 yrs. 1977-79)

2) HYDROID PROJECT                      40.92       924.49           239
   "Distributed Processing
   and Problem Solving"
   ARPA DAHC-15-73-C-0435

3) MOLGEN PROJECT                       85.61      2487.73          1853
   NSF MCS76-11649
   NSF ...76-11935 (2 yrs. 1976-78)

4) MYCIN PROJECT                       410.87      6640.75          6688
   "Computer-based Consult.
   in Clin. Therapeutics"
   HEW HS-01544 (2 yrs. 1977-79)
   NSF (2 yrs. 1977-79)

5) PROTEIN STRUCT MODELING             159.80      2894.19          2477
   "Heuristic Comp. Applied
   to Prot. Crystallog."
   NSF DCR 74-23461 (2 yrs. 1977-79)
   ARPA DAHC 15-73-C-0435

6) AI HANDBOOK PROJECT                  26.46       464.42           639

7) PILOT PROJECTS                      327.67      5919.33          3506
   (see reports in Section 6.3
   in Book II)
                                      -------     --------        ------
COMMUNITY TOTALS                      2232.97     38988.47         28460

NATIONAL AIM COMMUNITY

1)

ACT PROJECT
"Acquisition of
Cognitive Procedures"
NIMH MH29 353
ONR N0014-77-C-0242

2)

HIGHER MENTAL FUNCTIONS     206.03
"Computer Models in
Psychiatry and Psychother."

NIH MH-27132-02 (2 yrs.)
UCLA NPI Gen. Res.

3)

INTERNIST PROJECT
(DIALOG)
"Computer Model of
Diagnostic Logic"
BHRD rE!-00144-03 (3 yrs.)

4)

MISL PROJECT
"Medical Information
Systems Laboratory"
US-PHS-MB00114-03 (3 yrs.)

5)

RUTGERS PROJECT
"Computers in Biomedicine"
NIH RR-00643-05 (3 yrs.)

6)

SECS PROJECT
"Chemical Synthesis"

7)

AIM PILOT PROJECTS
(see reports in
Section 6.4 in
Book II)

8)  AIM Administration

9)

AIM Users

57.02

COMMUNITY TOTALS

205.20

2721.26

3535

9.27       380.05       876

139.63

2433.43

10862

308.96

4374.03

4515

40.91

1326.56

1558

11.13

56.89



-------

1035.04

1195.84       986

2680.16      2198

  383.22      1762

  672.35       362


---------       ------

16166.90      26654

SUMEX STAFF AND SYSTEM
                                        CPU      CONNECT      FILE SPACE
                                      (Hours)     (Hours)        (Pages)

1) Staff                               903.07     23198.86         11919
2) Miscellaneous                        80.87      2508.98          1721
3) Operations                         1505.50     63113.94         32382
                                      -------    ---------        ------
COMMUNITY TOTALS                      2489.44     88821.78         46022
                                      -------    ---------        ------
                                      -------    ---------        ------
RESOURCE TOTALS                       5757.45    143977.15        101136
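The printed totals are internally consistent.  The short Python check below (variable names hypothetical) confirms that the three community totals add up to the resource totals and that the staff rows add up to their community total, and it converts total file space from pages to words using the figure of 512 computer words per page given in the table notes.  `Decimal` is used so that two-decimal sums compare exactly.

```python
from decimal import Decimal as D

# (CPU hours, connect hours, file pages) as printed in the usage tables
COMMUNITY_TOTALS = {
    "Stanford": (D("2232.97"), D("38988.47"), 28460),
    "National AIM": (D("1035.04"), D("16166.90"), 26654),
    "SUMEX Staff and System": (D("2489.44"), D("88821.78"), 46022),
}
STAFF_ROWS = [
    (D("903.07"), D("23198.86"), 11919),   # Staff
    (D("80.87"), D("2508.98"), 1721),      # Miscellaneous
    (D("1505.50"), D("63113.94"), 32382),  # Operations
]
RESOURCE_TOTALS = (D("5757.45"), D("143977.15"), 101136)
WORDS_PER_PAGE = 512   # from the table notes: 1 page = 512 computer words


def column_sums(rows):
    """Add up each column of an iterable of equal-length tuples."""
    return tuple(sum(column) for column in zip(*rows))


if __name__ == "__main__":
    print(column_sums(COMMUNITY_TOTALS.values()) == RESOURCE_TOTALS)              # True
    print(column_sums(STAFF_ROWS) == COMMUNITY_TOTALS["SUMEX Staff and System"])  # True
    print(RESOURCE_TOTALS[2] * WORDS_PER_PAGE)                                    # 51781632
```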

SYSTEM DIURNAL LOADING VARIATIONS

   The following figures give a picture of the recent variations in diurnal
SUMEX system load, taken during March 1977.  The plots include:

Figure 10 - Total number of jobs logged in to the system

Figure 11 - Percent of total CPU time used by logged-in jobs (maximum is 200%
        for dual processor capacity)

Figure 12 - Percent of total CPU time consumed as overhead: I/O wait, core
        management, scheduling, etc. (maximum is 200%)

Figure 13 - Balance set size (number of jobs in core)

Figure 14 - Number of runnable jobs (whether or not in core)

    The abscissa for these plots is divided into 20-minute intervals throughout
the day.  The ordinate for each interval is the average of all the daily
measurements for that interval over the weekdays of March 1977.  A daily
measurement for a given 20-minute interval is in turn an average of the
corresponding statistic sampled every 10 seconds.  Because these plots display
overall averages, they give a representative illustration of the general
characteristics of diurnal loading.  There are, of course, substantial
fluctuations in the measured quantities from day to day and, for some
quantities, on time scales shorter than the intervals displayed in the figures.
For example, in Figure 14 the number of runnable jobs (equivalent to the system
"load average") follows a fairly smooth curve peaking at 6.7 jobs.  On a scale
of minutes, and from day to day, however, the number of runnable jobs varies
from only a few to 12 or more.  This fluctuation is not shown in the averaged
plots, but it also affects the responsiveness of the system.
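The two-stage averaging behind each plotted point can be illustrated with a short Python sketch. It is not the SUMEX monitoring software; it assumes the samples are available as hypothetical `(day, seconds_since_midnight, value)` tuples already restricted to weekdays, and the name `diurnal_profile` is illustrative only.

```python
from collections import defaultdict
from statistics import mean

INTERVAL_S = 20 * 60  # 20-minute plotting interval, in seconds


def diurnal_profile(samples):
    """samples: iterable of (day, seconds_since_midnight, value) taken every 10 s.

    Returns {interval_index: average over days of that day's interval mean}.
    """
    # Stage 1: average the 10-second samples within each (day, interval) cell.
    cells = defaultdict(list)
    for day, t, value in samples:
        cells[(day, int(t // INTERVAL_S))].append(value)
    daily = {key: mean(vals) for key, vals in cells.items()}

    # Stage 2: average the daily interval means across all days.
    per_interval = defaultdict(list)
    for (_day, idx), m in daily.items():
        per_interval[idx].append(m)
    return {idx: mean(ms) for idx, ms in sorted(per_interval.items())}
```

In this sketch a day with no samples in a given interval simply does not contribute to that interval's average.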

    The heading of each plot shows range statistics for the measurement over
various parts of the day.  Range data include the minimum value "Low", the
average value "Ave", and the maximum value "High".  The first line of the
heading gives the range over the whole day; on succeeding lines, "Prime Time"
covers 6:00-18:00 Pacific time and "Non Prime Time" covers the remaining
night-time hours.
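Given an averaged profile with one value per 20-minute interval, the Low/Ave/High figures in each heading can be reproduced as below. This is a sketch under stated assumptions: intervals are indexed from midnight Pacific time, each interval is assigned to Prime Time by its start time, and `range_stats` is a hypothetical name.

```python
from statistics import mean

PRIME_START_H, PRIME_END_H = 6, 18  # Prime Time is 6:00-18:00 Pacific, per the report


def range_stats(profile, interval_minutes=20):
    """profile: {interval_index: value} for one averaged day.

    Returns {period_name: (low, ave, high)} for the whole day, Prime Time,
    and Non Prime Time (periods with no intervals are omitted).
    """
    groups = {"Total Day": [], "Prime Time": [], "Non Prime Time": []}
    for idx, value in profile.items():
        start_hour = idx * interval_minutes / 60  # interval start, hours after midnight
        groups["Total Day"].append(value)
        if PRIME_START_H <= start_hour < PRIME_END_H:
            groups["Prime Time"].append(value)
        else:
            groups["Non Prime Time"].append(value)
    return {name: (min(v), mean(v), max(v)) for name, v in groups.items() if v}
```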

    Figure 12 shows that the current overhead level for the dual-processor
system is quite high (about 33% per processor).  This is due to the limited
memory size (256K words) we currently have and the resulting increase in the
swapping interrupt rate and I/O wait time.  We have a proposal pending with the
AIM Executive Committee to augment our memory, which should reduce this overhead
to our earlier single-processor levels (about 15-20% per processor).


   Figure 10.  Average Diurnal Loading (3/77): Total Number of Jobs

                 Total Day (Low= 13.2, Ave= 23.7, High= 37.2)
                Prime Time (Low= 13.3, Ave= 28.4, High= 37.2)
            Non Prime Time (Low= 13.2, Ave= 17.9, High= 22.7)

   [Plot not reproduced: number of jobs (0-50) vs. Pacific time of day, 0-24 hours]

Figure 11.  Average Diurnal Loading (3/77): Percent Time Used

                 Total Day (Low= 39.2, Ave= 92.6, High= 133.5)
                Prime Time (Low= 39.2, Ave= 104.3, High= 133.5)
            Non Prime Time (Low= 48.5, Ave= 78.1, High= 117.5)

   [Plot not reproduced: percent of CPU time used (0-200%) vs. Pacific time of
   day, 0-24 hours]


Figure 12.  Average Diurnal Loading (3/77): Percent Overhead

                 Total Day (Low= 24.4, Ave= 46.7, High= 63.9)
                Prime Time (Low= 26.3, Ave= 52.5, High= 63.9)
            Non Prime Time (Low= 24.4, Ave= 39.5, High= 50.3)

   [Plot not reproduced: percent overhead (0-200%) vs. Pacific time of day,
   0-24 hours]

Figure 13.  Average Diurnal Loading (3/77): Balance Set - Jobs in Core

                 Total Day (Low= .7, Ave= 2.4, High= 4.9)
                Prime Time (Low= .7, Ave= 3.1, High= 4.9)
            Non Prime Time (Low= .8, Ave= 1.6, High= 2.8)

   [Plot not reproduced: jobs in core (0-12) vs. Pacific time of day, 0-24 hours]


Figure 14.  Average Diurnal Loading (3/77): Runnable Jobs

                 Total Day (Low= .7, Ave= 2.9, High= 6.7)
                Prime Time (Low= .7, Ave= 3.8, High= 6.7)
            Non Prime Time (Low= .8, Ave= 1.7, High= 3.1)

   [Plot not reproduced: number of runnable jobs vs. Pacific time of day,
   0-24 hours]

Section 1.2.2.13

1.2.2.13   NETWORK USAGE STATISTICS

NETWORK USAGE PLOTS

   The plots in Figure 15 show the major billing components for SUMEX-AIM
TYMNET usage: the total connect time for terminals coming into SUMEX and the
total number of characters transmitted over the net.  The ratio of characters
received at SUMEX to characters sent to terminals has been about 1:12 over our
period of usage.  Also shown for recent months is a plot of ARPANET connect
time, which tracks the corresponding TYMNET data fairly closely.  No
"character" transmission data are available for the ARPANET, because file
transfers and terminal traffic use different byte sizes and these data are not
separated and maintained for the ARPANET.
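A ratio of about 1:12 means that roughly one transmitted character in 13 (about 7.7%) flows toward SUMEX. The sketch below applies that approximate ratio to a hypothetical total character count; `split_traffic` is an illustrative name, and the ratio is the report's rough figure, not an exact billing parameter.

```python
from fractions import Fraction


def split_traffic(total_chars, received_to_sent=Fraction(1, 12)):
    """Split a total character count into (received at SUMEX, sent to terminals)
    using the approximate received:sent ratio (1:12 by default)."""
    received_share = received_to_sent / (1 + received_to_sent)  # 1/13 for 1:12
    received = total_chars * received_share
    return received, total_chars - received
```

For example, a hypothetical 13,000,000 characters would split into about 1,000,000 received at SUMEX and 12,000,000 sent to terminals.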


   [Plots not reproduced: monthly TYMNET connect time and characters
   transmitted, August 1974 through July 1977, with ARPANET connect time for
   recent months]

Figure 15.  TYMNET and ARPANET Usage Data


Privileged Communication


Section 1.2.2.14


1.2.2.14    PUBLICATIONS

   The following are publications by the SUMEX staff; they include papers
describing the SUMEX-AIM resource and ongoing research, as well as
documentation of system and program developments.  Publications for individual
collaborating projects are detailed in their respective reports (see Section 6
on page 41 in Book II).

[1]  Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey, R.G., and
     Lederberg, J., "Networking and a Collaborative Research Community: a Case
     Study Using the DENDRAL Programs," ACS Symposium Series, Number 19, COMPUTER
     NETWORKING AND CHEMISTRY, Peter Lykos (Editor), 1975.

[2]  Levinthal, E.C., Carhart, R.E., Johnson, S.M., and Lederberg, J., "When
     Computers Talk to Computers," Industrial Research, November 1975.

[3]  Wilcox, C.R., "MAINSAIL - A Machine-Independent Programming System,"
     Proceedings of the DEC Users Society, Vol 2, No 4, Spring 1976.

Mr. Clark Wilcox also chaired the session on "Languages for Portability" at the
DECUS DECsystem-10 Spring '76 Symposium.

In addition, as reported earlier, a substantial effort has gone into
developing, upgrading, and extending documentation about the SUMEX-AIM resource,
the SUMEX-TENEX system, the many subsystems available to users, and MAINSAIL.
These efforts include a number of major documents (such as the SOS, PUB, and
TENEX-SAIL manuals) as well as a much larger number of document upgrades, user
information and introductory notes, an ARPANET Resource Handbook entry, and
policy guidelines (see Appendix VI and Appendix VII in Book II).
Publications for individual user projects are summarized in the respective
reports (see Section 6 in Book II).




INTRODUCTION - BOOK II

BOOK II

Collaborating Project Reports and Supporting Appendixes

   The following sections detail the reports and plans for SUMEX-AIM
collaborating projects and also contain additional information, in the form of
appendixes, relating to the core resource progress and operation.  The headings
and page numbering of these sections do not continue sequentially from those of
the Book I progress report.  The discontinuity reflects the initial organization
of this material as part of our renewal grant application.



COLLABORATIVE PROJECTS

6    COLLABORATIVE PROJECT PROGRESS AND OBJECTIVES

   The following subsections report on the collaborative use of the SUMEX
facility, including the formally authorized projects within the Stanford and AIM
aliquots and the various "pilot" efforts currently under way.  These project
descriptions and comments are the result of a solicitation for contributions sent
to each of the project Principal Investigators requesting the following
information:

I)  Summary of research program
    A) Technical goals
    B) Medical relevance and collaboration
    C) Progress summary
    D) Up-to-date list of publications

II) Interactions with the SUMEX-AIM resource
    A) Examples of collaborations and medical use of programs via
       SUMEX
    B) Examples of sharing, contacts, and cross-fertilization with other
       SUMEX-AIM projects (via workshops, system facilities, personal
       contact, etc.)

We believe that the reports of the individual projects speak for themselves as
rationales for participation; in any case the reports are recorded as submitted
and are the responsibility of the indicated project leaders.

6.1    STANFORD PROJECTS

   The following group of projects is formally approved for access to the
Stanford aliquot of the SUMEX-AIM resource.  Their access is based on review by
the Stanford Advisory Group and approval by Professor Lederberg as Principal
Investigator.  As noted previously, the DENDRAL project was the historical core
application of SUMEX.  Although this is described as a "Stanford project," a
significant part of the development effort and of the computer usage is dedicated
to national collaborator-users of the DENDRAL programs.



Section 6.1.1

DENDRAL PROJECT

6.1.1    DENDRAL PROJECT

DENDRAL - Resource Related Research - Computers & Chemistry

Carl Djerassi, Principal Investigator
    Professor of Chemistry
     Stanford University

I.  OVERVIEW OF RESEARCH ACTIVITIES

Technical Goals

   Our research, development and future plans focus on both the question of
structure elucidation in general and the problem of providing computer assistance
to scientists engaged in specific aspects of this important activity.

   A simplified representation of major milestones in solving unknown
biomolecular structures by manual methods is presented in Figure 1.

   [Diagram not reproduced; recoverable box labels: SPECTROSCOPY; NEW
   STRUCTURAL INFERENCES AND CONSTRAINTS; EXAMINE STRUCTURES (COMMON AND
   UNIQUE FEATURES); PLAN NEW EXPERIMENTS; REACTION SEQUENCES]

Figure 1.  Important steps in manual solution of structures of unknown chemical
       compounds.

   These steps, indicated as separate boxes, may be performed explicitly or
implicitly.  When structures are actually solved, the relationships among the
boxes of Fig. 1 are considerably more complex than indicated.  Nevertheless, the
Figure provides a good introduction to both our recent work and our future
directions.  We briefly describe each of the milestones in the following
paragraphs; more detailed discussions of each topic follow in subsequent
sections.


   The first step in identification of an unknown structure is to separate it
from other components in a potentially complex mixture and to isolate it in
reasonably pure form.  These steps are performed by scientists, frequently with
the assistance of various instruments.  Although our research is not directed
toward any part of this separation and isolation procedure (except insofar as
these procedures also yield data which are subject to computer-assisted
interpretation), information about the chemical and physical characteristics of
the compound may be crucial to further efforts to determine its structure.

    Depending on the quantity of sample available and its characteristics,
various spectroscopic and additional chemical data are then collected on the
unknown.  A mass spectrum is frequently obtained, e.g., from a combined gas
chromatograph/mass spectrometer (GC/MS) system.  An important part of our recent
proposal to the NIH is directed toward automation of combined GC/MS systems
operated at high mass spectrometer resolving powers.  Data on elemental
compositions and relative ion abundances are then available in computer-readable
form for further analysis (see MSRANK).  The chemist possesses an armamentarium
of spectroscopic techniques which can be brought to bear on a structure.  One
advantage of our work is that any data so obtained can be used to help solve the
structure, as long as they can be expressed, manually or by computer, as
substructural statements about the unknown.

   The next important phase in structure elucidation is interpretation of the
available data (Fig.  1) in terms of structural features of the molecule.  These
interpretations may be in terms of known structural units ("superatoms",
polyatomic aggregates of atoms in known configurations), or in terms of
structural units, ring sizes, proton or carbon distributions.  The latter set of
features represents constraints on the kinds of structures which are possible.
Our efforts in the area of computer-assisted data interpretation are focussed on
mass spectral and carbon-13 nuclear magnetic resonance (13CMR) data. We are
developing general approaches to automated analysis of these data in terms of
structural features of unknowns.

   Our recent efforts are summarized in Figure 2 and discussed in detail
below.  We have been concerned with the use of these data from two points of
view, planning and prediction (Fig. 2).  During planning, experimental data are
examined in order to extract specific structural information to be used in
assembling candidate structures.  In prediction, each candidate structure is
tested to determine how closely its predicted spectrum agrees with the observed
spectrum, and the candidates can be ranked accordingly.  The Meta-DENDRAL
research is directed toward the determination of spectroscopic rules which can
be used for either planning or prediction (see below).
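The ranking step in prediction can be illustrated with a toy Python sketch. It is not the DENDRAL scoring used by MSPRUNE or MSPRED: here each spectrum is reduced to a hypothetical set of integer m/z peak positions, and candidates are ordered by Jaccard similarity between predicted and observed peaks, a deliberately simple stand-in measure. The names `score` and `rank_candidates` are illustrative.

```python
def score(predicted, observed):
    """Jaccard similarity between predicted and observed sets of m/z peaks."""
    predicted, observed = set(predicted), set(observed)
    union = predicted | observed
    return len(predicted & observed) / len(union) if union else 0.0


def rank_candidates(predictions, observed):
    """predictions: {candidate_name: predicted m/z peaks}.

    Returns candidate names ordered from best to worst agreement with observed.
    """
    return sorted(predictions,
                  key=lambda name: score(predictions[name], observed),
                  reverse=True)
```

A real predictor would also weigh peak intensities; the set-based score only shows how agreement with the observed spectrum turns into a ranking.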

    Given possible structural fragments of the complete molecule and
constraints on how these fragments may be assembled into complete molecules, a
process of structural assembly follows (Fig. 1).  No proven algorithm for
solving this problem existed prior to earlier work supported by the current
grant; traditionally, this process has been left to manual, pencil-and-paper
work.  Our CONGEN program, which was designed to solve this problem, is the
most fully developed of the programs designed to assist in various aspects of
structure elucidation.  It performs the structural assembly process under
constraints and allows the scientist using the program to examine structural
candidates and remove those deemed implausible (Fig. 1).  A large portion of our
recent and future work is directed toward improving the CONGEN program and
building other facilities around it (see later sections).  We have demonstrated
the utility of CONGEN in structural studies, and subsequent sections discuss our
recent developments and applications of CONGEN as well as our interactions with
other scientists desiring access to our programs.

                         DATA INTERPRETATION

   "PLANNING"                              PREDICTION
   EXTRACTION OF STRUCTURAL                USE OF SPECTROSCOPIC
   INFORMATION DIRECTLY FROM               DATA TO RANK
   SPECTROSCOPIC DATA.                     CANDIDATE STRUCTURES.
   1. MASS SPECTRA - MDGGEN                1. MSPRUNE, MSPRED
   2. 13CNMR                               2. 13CNMR

                    \                      /
                           META-DENDRAL
                     FORMATION OF RULES TO BE
                      USED FOR BOTH PLANNING
                         AND PREDICTION.

Figure 2.  Relationship between use of rules in either planning or prediction.
     Both approaches are used in utilizing data for structure elucidation.

   Given a set of structural candidates,  the experimenter examines them to
determine what experiments might be performed to focus on the correct structure
by stepwise rejection of alternative hypotheses.  When there are only a small
number of possibilities under consideration, manual methods suffice.  But CONGEN
provides the capability for exhaustive enumeration of structural possibilities at
a point in a structural problem when there may be many hundreds of possibilities.
It is very difficult to examine these structures and plan experiments by hand.
We have begun exploring ways to provide computer assistance to this important
aspect of structure elucidation.  We refer to this research area as the
Experiment Planner, discussed in more detail below.
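The report leaves the experiment planner as future work, so the following is only an illustration of one possible selection criterion, not the group's design. Assuming each candidate structure's result for each proposed experiment can be predicted (the `outcomes` table and both function names are hypothetical), a planner could prefer the experiment whose least informative result still leaves the fewest candidates, a minimax choice.

```python
from collections import Counter


def worst_case_survivors(predicted):
    """predicted: {candidate: predicted result}.

    Size of the largest group of candidates that share one predicted result,
    i.e. how many could survive if that result were observed.
    """
    return max(Counter(predicted.values()).values())


def best_experiment(outcomes):
    """outcomes: {experiment: {candidate: predicted result}}.

    Choose the experiment that minimizes the worst-case number of survivors.
    """
    return min(outcomes, key=lambda exp: worst_case_survivors(outcomes[exp]))
```

With four candidates, an experiment that splits them 2/2 is preferred over one that splits them 3/1, since any result of the former eliminates at least half.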

   When new experiments have been planned the researcher carries them out and
uses the results as additional constraints on the structural candidates (Fig. 1).
New experiments may include collecting additional spectroscopic data or
performing a sequence of chemical reactions on the unknown.  The latter
experiments may be chosen to convert the unknown into a related compound which
possesses physical or chemical properties more amenable to analysis.  During the
past year we have developed a program that helps scientists carry out computer
representations of chemical reactions and eliminate undesired structural
candidates based on constraints applied to the products of the reaction.  This
work is described in two subsequent sections.  One section describes use of the
program, which we call REACT, to explore structural possibilities exactly as
outlined above.  A later section describes recent progress in increasing the
power of REACT.

Medical Relevance

    Structure elucidation is a fundamental problem for medical practice and
biomedical research.  For example, we are collaborating with physicians in the
Department of Pediatrics who monitor the body fluids of newborn infants in order
to detect abnormal compounds.  Much of the research leading to new drugs and new
methods for synthesizing drugs also depends on careful analysis and
identification of molecular structures of compounds.  The computer tools that we
are developing will aid in the determination of molecular structures by giving
working scientists help with data collection, data interpretation, hypothesis
testing and, most important, systematic consideration of all molecular structures
that are consistent with the interpretations of the available data.


PROGRESS SUMMARY

Experiment Planner

   We have begun preliminary consideration of the design and implementation of
an experiment planner.  This program will assist chemists in designing the most
effective set of experiments to perform in solving a structure.  Although the
experiment planner will be a future activity of our group, we are developing and
using other structure manipulation functions which will provide groundwork for
future developments.

   One important aspect of experiment planning is the ability to examine in
some way the set of candidate structures.  Although many can be drawn for visual
review, drawing is impractical when dozens or hundreds of structures are
involved.  To assist persons using CONGEN in reviewing their structures we have
developed a function auxiliary to CONGEN which we call SURVEY.

SURVEY

FUNCTION:  AIDS IN PERCEPTION OF ANY OF A PRE-SPECIFIED SET OF STRUCTURAL
           FEATURES IN A GROUP OF STRUCTURAL CANDIDATES, E.G.,
           A) FUNCTIONAL GROUPS
           B) TERPENOID SKELETONS
           C) AMINO ACID SKELETONS

Figure 3.  Function of the SURVEY program and examples of recent application
           areas.

   The function of SURVEY is summarized in Figure 3.  SURVEY simply acts as a
reminder to the scientist of the presence or absence of certain structures or
structural features.  During the past year we have used SURVEY extensively.  For
example, we have used it to detect implausible functional groups in a set of
candidate structures, using a file of substructures representing a wide variety
of functionalities.  In many problems, implausible functional groups are
forgotten and CONGEN is never constrained to remove them.  Another example of
the use of SURVEY is in collaborative work with persons in the Department of
Genetics.  In analyzing serum or urinary metabolites in patients at high risk of
metabolic disorders, we have had occasion to use CONGEN to explore unknown
structures [Report HPP-77-11].  Some of these structures could formally be
conjugates of amino acids with organic acids.  If so, such structures will
possess the backbones of naturally-occurring amino acids.  SURVEY was used to
provide a summary of which structural candidates possessed such amino acid
skeletons.
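At its core, a SURVEY-style report asks, for each feature in a library, which candidates contain it as a substructure. The Python sketch below is an assumption-laden toy, not the SURVEY implementation: molecules are hypothetical `(atom_labels, bonds)` pairs of heavy atoms, bond orders and hydrogens are ignored, and substructure search is a brute-force backtracking match suitable only for small structures. The names `graph`, `has_substructure`, and `survey` are illustrative.

```python
def graph(labels, bonds):
    """Build a toy molecule: heavy-atom labels plus an undirected bond set."""
    return list(labels), {frozenset(b) for b in bonds}


def has_substructure(mol, pattern):
    """True if every pattern atom maps to a distinct mol atom with the same
    label and every pattern bond is also a bond in mol (bond orders ignored)."""
    m_labels, m_bonds = mol
    p_labels, p_bonds = pattern
    mapping = {}  # pattern atom index -> mol atom index

    def extend(i):
        if i == len(p_labels):
            return True  # every pattern atom has been placed
        for a in range(len(m_labels)):
            if a in mapping.values() or m_labels[a] != p_labels[i]:
                continue
            # Bonds between pattern atom i and already-mapped atoms must exist in mol.
            if all(frozenset((a, mapping[j])) in m_bonds
                   for j in mapping if frozenset((i, j)) in p_bonds):
                mapping[i] = a
                if extend(i + 1):
                    return True
                del mapping[i]  # backtrack
        return False

    return extend(0)


def survey(candidates, features):
    """For each named feature, list the candidates that contain it."""
    return {name: [c for c, mol in candidates.items() if has_substructure(mol, pat)]
            for name, pat in features.items()}
```

For example, with an O-C-C candidate and a C-O-C candidate, the O-C-C feature is reported only for the first and the C-O-C feature only for the second.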

   We have recently used SURVEY in a related application involving the
structure of "polyalthenol", discussed by LeBoeuf, et al.  (Figure 4).
Superatoms and constraints supplied to CONGEN to derive structural candidates are
summarized in Fig. 4.

   We summarize in Figure 5 the structural possibilities which resulted.
There are five structures possessing a bicyclo[2.1.1] system and six which
possess a bicyclo[4.3.1] system (Fig. 5, top).  These structures are
energetically less favorable; for example, several possess a double bond at a
bridgehead atom, which violates Bredt's Rule.  There remain, however, 11
structures which are not formally excluded by the data presented by LeBoeuf,
et al.  Because these workers based their structural assignment on biogenetic
grounds, we used SURVEY and REACT to test their hypothesis.  We have, in
computer-accessible libraries, known terpenoid ring systems which can be used
within SURVEY to test sets of structures for known skeletons.  None of the 22
structural candidates possesses a previously known skeleton.  Because the
authors postulated a relationship to a known skeleton via a single methyl
shift, we used REACT to exercise a single methyl shift in all possible ways on
each of the 22 candidates.  SURVEY was then used to test the results for the
presence of known terpenoid systems, and the drimane skeleton, the postulated
precursor of polyalthenol, was the only known skeleton which resulted.  This
does not prove the hypothesis of LeBoeuf, et al., but it certainly strengthens
it.

    SURVEY is, however, only the barest beginning of an experiment planner,
even though it has proven useful.  We plan to build from this beginning toward a
much more powerful system.

M. LeBoeuf, M. Hamonnière, A. Cavé, H. Gottleib, N. Kunesch, and E. Wenkert,
Tet. Lett., 3559 (1976).

"POLYALTHENOL"

C23H31NO

SUPERATOMS
   [Superatom drawings (IN, BI, CH2, CH) not recoverable from the scanned
   figure]

ARBITRARY NUMBER
   IN   1
   [Remaining entries (BI, CH2, CH) not reliably recoverable]

CONSTRAINTS

1) ALL FREE VALENCES BONDED TO NON-HYDROGEN ATOMS

2) GOODLIST
       IN-CH2-BI       1 TO ANY
          (EVENTUALLY IN-CH2-CHO,O)
       ME-(BI CH)      1 TO ANY
          (EVENTUALLY CH3-CH, EXACTLY 1)

3) GOODRINGS                    2 EXACTLY 5

4) BADRINGS                     3

Figure 4.  Superatoms and constraints supplied to CONGEN in investigations of
     plausible structural alternatives to the proposed structure of
     Polyalthenol.


   [Structure drawings not recoverable from the scanned figure]

Figure 5.  Structural candidates for polyalthenol based on data given in
           Figure 4.

REACTION CHEMISTRY DEVELOPMENTS

1. SEPARATION FROM CONGEN - COMMUNICATION VIA FILES OF STRUCTURES.

2. ADDING CONSTRAINTS - SITE- AND TRANSFORM-SPECIFIC.

3. CONTROL STRUCTURE - RAMIFICATION
   A. ESTABLISH RELATIONSHIPS AMONG PRODUCTS AND REACTANTS
   B. DEAL PROPERLY WITH RANGES OF NUMBERS OF PRODUCTS

4. INTERACTION - DEVELOP MANIPULATION COMMANDS WHICH PARALLEL LABORATORY
   OPERATIONS, E.G., SEPARATE INTO FLASKS, TEST CONTENTS OF VARIOUS FLASKS,
   INCOMPLETE SEPARATIONS, ETC.

5. REPRESENTATION OF REACTIONS

6. PROSPECTIVE DETECTION OF DUPLICATE PRODUCTS BASED ON SYMMETRY PROPERTIES
   OF: A) STARTING MATERIAL; AND B) TRANSFORMATION.

Figure 6.  Current and future directions for improvement and extension of REACT,
    a program for exploration of applications of reaction chemistry to
    structure elucidation problems.


Applications of REACT to Structure Elucidation Problems

   We have recently described our initial efforts toward representation of
chemical reactions and their use in structure elucidation problems [Report
HPP-76-5].  These efforts provided the framework for carrying out reactions
within the computer which emulate actual laboratory reactions performed on an
unknown.  Constraints on the numbers and identities of the products are used to
constrain the reaction products and, implicitly, the starting materials.  Based
on the results of that work, we drew up a set of steps to be carried out to
provide a truly useful tool for the chemist.  Although the current program can
be used in applications to real problems, it has some fundamental limitations
which we have been working to solve.  The developments we have undertaken to
improve REACT are summarized in Figure 6.
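The way constraints on products implicitly constrain starting materials can be shown with a toy example. In this Python sketch, which is not REACT itself, candidate structures are hypothetical linear chains written as strings, the "reaction" deletes one O atom in every possible way, and a chain read in either direction counts as the same product. Candidates are kept only if their number of distinct products lies in the observed range. All function names are illustrative.

```python
def canonical(chain):
    """Treat a linear chain string as identical when read in either direction."""
    return min(chain, chain[::-1])


def lose_one_oxygen(chain):
    """Toy 'reaction': delete one O atom from a linear chain, in every possible way.

    Returns the set of distinct (canonicalized) products.
    """
    return {canonical(chain[:i] + chain[i + 1:])
            for i, atom in enumerate(chain) if atom == "O"}


def filter_by_product_count(candidates, reaction, low, high):
    """Keep starting candidates whose number of distinct products lies in [low, high]."""
    return [c for c in candidates if low <= len(reaction(c)) <= high]
```

Here "OCCO" yields two raw products that collapse to one by symmetry, so it survives an "exactly one product" constraint along with "CCOC", while "COCO" (two distinct products) is rejected.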

   We first undertook to separate REACT from CONGEN, for two reasons.  One
reason was program size: many functions of CONGEN are not needed when only
REACT is being exercised.  The procedures of structure generation (CONGEN) and
REACT are sequential, so a separate program introduces no problems.  The second
reason was that REACT uses certain CONGEN functions differently.  For example,
the graph matcher is used differently in the two programs, which would have
required keeping two different versions of it if the programs remained together.
The separation has been accomplished: REACT is now a separate program that
communicates structural information with CONGEN via files.  All interactive
portions are consistent with the structural manipulation functions of CONGEN,
so learning the structural language of CONGEN is sufficient to use either
program.

    We have also added new constraint types to REACT, greatly expanding the
ways in which reactions can be defined and constrained.  An example of new
extensions to reaction definitions illustrates some of the new features (Figures
7-10).  The reaction defined here performs a dehydration of an alcohol; the site
of the reaction is defined in Fig. 7.

   The transform is defined as cleavage and loss of the oxygen resulting in
formation of a double bond between the two carbon atoms of the original site
(Fig. 7).  In this particular dehydration the chemist wished to specify a site-
specific constraint.  It was known that a tertiary butyl group was part of the
structure, and the dehydration will be prevented if that group is in close
proximity to the reaction site (i.e., in a position alpha to the carbinol
carbon).

   The definition of this constraint is given in Figure 8.  This constraint
("HINDERED") is then placed on the site-specific BADLIST, as shown in Fig. 9.  The
completed definition of the reaction is summarized in Figure 10.

Privileged Communication

51

J. Lederberg


Section 6.1.1

DENDRAL PROJECT

:EDITREACT
NAME:DEHYDRATION
(NEW REACTION)

*SITE
>CHAIN 3
>ATNAME 1 O
>HRANGE 1 1 1 3 1 3
>ADRAW

DEHYDRATION: (HRANGES NOT INDICATED)

O-C-C

>DONE

*TRANSFORM
>UNJOIN 1 2
>JOIN 2 3
>DELATS 1
>ADRAW

DEHYDRATION: (HRANGES NOT INDICATED)

C=C

>DONE

Figure 7.  Definition of reaction site and chemical transform in REACT.

*DEFINE-CONSTRAINTS
:?
PLEASE ENTER ONE OF:
GRIPE         BUGOUT
TRANSFORMSPECIFIC
GENERAL(G)
DONE
SITESPECIFIC
HALT

:SITESPECIFIC
NAME: HINDERED
(NEW CONSTRAINT)
(WARNING: THE FINAL CONSTRAINTS MUST HAVE AT LEAST ONE ATOM OF THE SITE)
>NDRAW

HINDERED: (HRANGES NOT INDICATED)
NON-C ATOMS: 1 O

1-2-3

>BRANCH 3 2 4 1 4 1
>ADRAW

HINDERED: (HRANGES NOT INDICATED)

      C
      |
O-C-C-C-C
      |
      C

>DONE

Figure 8.  Definition of a site-specific constraint to be applied to the reaction
        DEHYDRATION.

*CONSTRAINTS
:?
PLEASE ENTER ONE OF:
GRIPE        BUGOUT
ST FOR CONSTRAINTS ON STARTING MATERIAL
S FOR SITESPECIFIC CONSTRAINTS
T FOR TRANSFORMSPECIFIC CONSTRAINTS
PR FOR CONSTRAINTS ON PRODUCTS
DONE
HALT

:S
>BADLIST
BADLIST CONSTRAINTS
CONSTRAINT NAME:HINDERED
CONSTRAINT NAME:
-------
>DONE
:DONE

Figure 9.  Specification of constraint named HINDERED as a BADLIST constraint for
        the reaction.

*SHOW
-------
SITE:
NAME-DEHYDRATION
ATOM#  TYPE  ARTYPE  NEIGHBORS  HRANGE
1      O     NON-AR  2          1-1
2      C     NON-AR  1 3
3      C     NON-AR  2          1-3

DEHYDRATION: (HRANGES NOT INDICATED)
NON-C ATOMS: 1 O
1-2-3
TRANSFORM:
   UNJOIN 1 2
   JOIN 2 3
   DELATS 1

DEHYDRATION: (HRANGES NOT INDICATED)

2=3

CONSTRAINTS:
CONSTRAINTS ON STARTING MATERIAL:
NO CONSTRAINTS
SITE-SPECIFIC CONSTRAINTS:
-------
BADLIST CONSTRAINTS
NAME
HINDERED
-------
TRANSFORM-SPECIFIC CONSTRAINTS:
NO CONSTRAINTS
CONSTRAINTS ON PRODUCTS:
NO CONSTRAINTS
*DONE

(DEHYDRATION DEFINED)
(DEHYDRATION ADDED TO THE REACTION LIST)

Figure 10.  Summary of the completed definition of the DEHYDRATION reaction.

  The remaining items summarized in Figure 6 are currently under development.
We are redesigning the control structure so that the scientist using the program
can use intuitive concepts, such as separation, as commands.  To carry this out,
important parts of the current mechanism have to be redesigned.  Although the
current program can be used effectively, its non-intuitive treatment of reactions
yielding multiple products, and of the subsequent separation (within the
computer) and analysis of each product, presents a barrier to use by a wider
community.  We are continuing to develop our capabilities for representing
reactions to ensure that the user of REACT has a complete descriptive language
with which to specify reactions.  We continue to study ways to avoid duplication
in carrying out reactions.  We know how to implement certain of the
symmetry-related constraints and will do so shortly.

CONGEN Developments

   The problem solving paradigm that has emerged from DENDRAL work is the so-
called "plan-generate-test" paradigm.  It is based on heuristic search of a space
of possible hypotheses with planning before generation of hypotheses and testing
of each generated candidate.

   The generator for DENDRAL, named CONGEN, is a general-purpose graph
generator which produces a list of all possible graphs containing specified
numbers of nodes of various types.  The most important features of the generator
are that the list of graphs is guaranteed to be complete and non-redundant and,
equally important, that the list need not be exhaustively generated.   The
generator can be constrained to produce only graphs that meet specified criteria
that are inferred from the initial problem data.

   During the past year, CONGEN has developed along two major lines: 1) tools
have been developed which allow more efficient and "intelligent" use of
substructural information supplied by the chemist; and 2) data from chemical
reactions and from observed mass spectra can now be used to eliminate unlikely
structural candidates from a set produced by a CONGEN generation.  These
extensions are discussed below.

1) Intelligent use of substructural information as constraints

   There is sometimes a significant conceptual gap between the intuitive
chemical phrasing of a CONGEN problem and the phrasing which is most efficient,
in both computer time and storage requirements, for the program.  CONGEN provides
a rich language for stating structure elucidation problems in precise
substructural terms.  However, there are usually many ways of defining a given
problem and different definitions can place widely different demands upon the
program.  We have a continuing interest in reducing this conceptual gap by
making CONGEN responsible for rephrasing a problem in the most efficient way,
thus freeing the chemist to concentrate upon the chemical, rather than the
algorithmic, aspects of a given case.

   One distinction which is frequently puzzling to new CONGEN users is the one
between superatoms and GOODLIST items.  A superatom is a polyatomic "building
block" which CONGEN joins with other superatoms and single atoms to form full

structures.  GOODLIST items are substructures which are required to be present in
those full structures, but they are not incorporated directly into the initial
phrasing of a problem as are superatoms.  Rather, their presence or absence is
tested by a graph-matching routine after the structures are produced.
Frequently, a great many structures produced by the structure generator are
discarded by this final test and a significant amount of the program's time can
be spent "shooting blanks".  The concepts behind these two types of constraints -
that specified substructural features must be present - are similar, but their
implementations differ substantially in efficiency.

   GOODLIST items cannot simply be transferred to the superatom list, though,
because GOODLIST items are allowed to share atoms and bonds with other GOODLIST
items or with superatoms.  For example, if two substructures which are benzene
rings are placed on GOODLIST, then a naphthalene derivative will be an acceptable
structure even though the two occurrences of the ring have two atoms and one
aromatic bond in common.  Because of the building-block nature of superatoms,
they may be joined to one another by additional bonds in CONGEN, but never
"merged" (i.e., overlapped).  Thus the price of efficiency is a more restricted
interpretation of structural possibilities for superatoms.
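To make the naphthalene case concrete, the toy sketch below (not CONGEN's graph matcher) finds every six-membered ring in the naphthalene carbon skeleton and shows that the two ring matches share two atoms and one bond. That is the overlap a GOODLIST post-test accepts and two separate benzene superatoms could never produce. The atom numbering and function name are assumptions made for illustration.

```python
def six_membered_rings(adj):
    """Return each 6-cycle once, as a frozenset of atom numbers.

    Each cycle is grown only from its lowest-numbered atom; the two
    traversal directions give the same frozenset, which the set collapses.
    """
    rings = set()

    def extend(start, path):
        if len(path) == 6:
            if start in adj[path[-1]]:  # closing bond back to the start atom
                rings.add(frozenset(path))
            return
        for nxt in adj[path[-1]]:
            if nxt > start and nxt not in path:
                extend(start, path + [nxt])

    for start in adj:
        extend(start, [start])
    return rings


# Naphthalene skeleton: ring A = 0-1-2-3-4-9, ring B = 4-5-6-7-8-9, fused at 4-9.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 9), (9, 0),
         (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
adj = {a: set() for a in range(10)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

rings = six_membered_rings(adj)
ring_a, ring_b = sorted(rings, key=min)
shared_atoms = ring_a & ring_b
shared_bonds = [e for e in edges if set(e) <= shared_atoms]
```

Both rings are found, and they overlap in atoms 4 and 9 and the bond between them.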

    We have developed a new procedure which captures the best of both
situations.  In order to incorporate a GOODLIST substructure into the problem at
the earliest stage, it is necessary to find all unique ways that the given
substructure can be created using parts of the existing building blocks (atoms
and superatoms).  This produces a set of new CONGEN problems with more or larger
superatoms, each of which is easier to solve than the original one because the
GOODLIST item is built-in and need not be tested.  Figure 11 shows schematically
some of the ways this construction might occur: a) by bonding together two (or
more) existing superatoms to create one larger one; b) by bonding additional
atoms to a superatom to create a larger one; and c) by constructing a copy of the
substructure from single atoms, creating a new superatom.

   Figure 12 summarizes a CONGEN problem which was attempted but which could
not be completed because of the unintelligent use of GOODLIST.  The problem
amounts to finding all ways of allocating three new bonds to the free valences
(the bonds with unspecified termini) in the superatom CEMB such that the three
indicated substructures are present in the final molecules.  There are perhaps
10,000 unique allocations of those three new bonds, but only 7 pass the GOODLIST
tests.  Using GOODLIST as a post-test only, CONGEN would generate all 10,000 and
discard nearly all of them, a process which would have been so lengthy that it
was never completed.  The constructive graph-matching routine approaches the
problem in a much more efficient and chemically intuitive way: 1) there are only
three places in which the first GOODLIST item can be constructed; 2) for each of
these, there are four ways of constructing the second; and 3) for each of these,
there are 0, 1 or 2 ways of incorporating the third.  It quickly arrives at the
correct set of solutions.
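The difference between post-testing and building the requirement in can be seen on a toy problem. This is not CONGEN's constructive graph matcher. It assumes each required substructure has been reduced to a set of candidate bonds, any one of which would create it, and that each free valence may be used by at most one new bond. Both functions, and the data, are hypothetical.

```python
from itertools import combinations


def compatible(bonds):
    """No free valence may be used by more than one new bond."""
    ends = [v for bond in bonds for v in bond]
    return len(ends) == len(set(ends))


def satisfied(bonds, requirements):
    """Each requirement is met if at least one chosen bond belongs to it."""
    return all(any(b in req for b in bonds) for req in requirements)


def generate_and_test(candidates, k, requirements):
    """Post-test style: enumerate every allocation of k bonds, then filter."""
    tested, found = 0, set()
    for combo in combinations(candidates, k):
        if compatible(combo):
            tested += 1
            if satisfied(combo, requirements):
                found.add(frozenset(combo))
    return found, tested


def constructive(candidates, k, requirements):
    """Build each unmet requirement in first, then fill any remaining bond slots."""
    found, expanded = set(), 0

    def extend(chosen):
        nonlocal expanded
        expanded += 1
        unmet = [r for r in requirements if not any(b in r for b in chosen)]
        if not unmet:
            rest = [c for c in candidates if c not in chosen]
            for extra in combinations(rest, k - len(chosen)):
                combo = chosen | frozenset(extra)
                if compatible(combo):
                    found.add(combo)
            return
        if len(chosen) == k:
            return  # no bond slots left, but a requirement is still unmet
        for bond in unmet[0]:
            if compatible(chosen | {bond}):
                extend(chosen | {bond})

    extend(frozenset())
    return found, expanded


# Toy problem: 8 free valences, any pair may be joined, 3 new bonds to place.
candidates = list(combinations(range(8), 2))
requirements = [{(0, 1), (0, 2)}, {(3, 4), (3, 5)}, {(6, 7)}]

post, n_tested = generate_and_test(candidates, 3, requirements)
built, n_expanded = constructive(candidates, 3, requirements)
```

Both strategies return the same four allocations. The post-test version examines all 420 valid allocations of three bonds, while the constructive search expands only 11 partial ones, because it never builds an allocation that lacks a required feature.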

   Most CONGEN problems contain one or more GOODLIST items which can be
processed in this way, and when the constructive graph-matcher is fully
integrated into CONGEN, it will make a substantial difference in its ability to
use this structural information effectively.

[Structure drawings not recoverable from the scan; legible labels: "CEMB" (superatom) and "GOODLIST:".]

Figure 11.  Example of breaking one GOODLIST substructure into several
       subproblems for CONGEN, each with different superatoms.

[Schematic not recoverable from the scan; legible labels: "CONGEN PROBLEM", "GOODLIST ENTRY?", "CONSTRUCTIVE SUBSTRUCTURE SEARCH", "ETC."]

Figure 12.  Example showing the inefficiency of specifying a constraint as a
       GOODLIST item instead of analyzing its implications for constructing
       allowable chemical graphs.

2) New tools for post-pruning CONGEN structures.

   From an algorithmic standpoint, CONGEN is successful if it can, in a
reasonable amount of time and without exhausting storage resources, produce a
list of candidate structures satisfying the chemist's constraints.  However, this
list is often quite large, perhaps several hundred structures, and from a
chemical standpoint the problem may be far from complete.  It remains for the
chemist to discriminate among the candidates, eventually reducing the
possibilities to just one structure.  A SURVEY function is available for
classifying the list into groups of chemically related structures using either
pre-defined or user-defined libraries of substructural features, and this process
can help the chemist perceive groups which might easily be ruled out by
additional experiments.  Also, the graph-matching (pruning) mechanism of CONGEN
allows him to express, in terms of substructural tests on the candidates, new
data which he gathers on the unknown.  These are both important aids in dealing
with a list of candidates, but are restricted to tests which can easily be
phrased purely in terms of structural features of the candidates themselves.

   There are two informative sources of data which cannot always be phrased in
this way: 1) structural features observed in products of the unknown when it
undergoes simple chemical reactions; and 2) empirical spectroscopic measurements
on the unknown which cannot be interpreted unambiguously in precise structural
terms.  During the past year, we have made progress in utilizing such
information.  The program REACT addresses the first problem while MSRANK concerns
the second, in the context of mass spectrometric observations.

2.1 REACT

   This program [see Report HPP-76-5] has two basic goals: 1) to provide the
chemist with a computerized language for defining graph transformations and
applying them to structures, thus simulating chemical reactions; and 2) to
automatically keep track of the interrelationships between structures in a
complex sequence of reactions so that whenever structural claims are made ruling
out structures at one level, the implications in terms of structures at other
levels can be traced.  During the last year some progress has been made toward
both of these goals.

   EDITREACT, the reaction-editing language, has been extended to allow the
user to define subgraph constraints which apply relative to a potential reaction
site rather than to the molecule as a whole.  For example, in the present version
of REACT, we can say either that a hydroxyl group (OH), if present anywhere in
the reactant molecule, would inhibit the reaction, or that such inhibition would
take place only if the OH group is adjacent to the reaction site.  Such site-
specific constraints, applied either before or after the transformation (i.e.,
reaction) has been carried out on the site, are critical to the detailed
description of real chemical reactions.  The inclusion of this facility in REACT
substantially increases its usefulness in real-world chemical problems.
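The two readings of the hydroxyl example differ only in where the substructure test looks. A minimal sketch, assuming a toy graph with implicit hydrogens and treating "adjacent to the site" as "bonded directly to a site atom". The helper names and the 1-pentanol example are illustrative assumptions, not REACT's constraint syntax.

```python
from typing import Dict, Iterable, Set, Tuple


def adjacency(edges: Iterable[Tuple[int, int, int]]) -> Dict[int, Dict[int, int]]:
    """atom -> {neighbour: bond order}, built from (a, b, order) triples."""
    adj: Dict[int, Dict[int, int]] = {}
    for a, b, order in edges:
        adj.setdefault(a, {})[b] = order
        adj.setdefault(b, {})[a] = order
    return adj


def is_hydroxyl_oxygen(elements, bonds, atom) -> bool:
    """Toy test (hydrogens implicit): an O with a single bond to exactly one heavy atom."""
    return elements[atom] == "O" and list(bonds[atom].values()) == [1]


def inhibited_anywhere(elements, bonds) -> bool:
    """Molecule-wide constraint: any OH in the reactant blocks the reaction."""
    return any(is_hydroxyl_oxygen(elements, bonds, a) for a in elements)


def inhibited_at_site(elements, bonds, site: Set[int]) -> bool:
    """Site-specific constraint: only an OH bonded to a site atom blocks this site."""
    return any(is_hydroxyl_oxygen(elements, bonds, n)
               for s in site for n in bonds[s] if n not in site)


# 1-pentanol skeleton: C1-C2-C3-C4-C5-O6, all single bonds.
elements = {1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "O"}
bonds = adjacency([(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1)])
```

The molecule-wide test rejects every potential site in this molecule, whereas the site-specific test blocks a site matched at atoms 4 and 5 (next to the OH) but not one matched at atoms 1 and 2.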

   The bookkeeping problem has undergone a complete reconceptualization in the
past year, the purpose being to mimic more closely the actual steps taken by a
chemist in the laboratory.  In the initial implementation, a set of products
arising from the application of a given reaction to a given starting structure

could be subjected to a multi-level classification which grouped the products
based upon user-defined substructural constraints.  Each of these classes had an
associated minimum and maximum number, representing the numbers of products which
were allowed to be members of the class.  Any starting materials whose products
could not satisfy these conditions were removed from the list of candidates.
Structures in any class could be further reacted, their products classified, and
so on.  This treatment of bookkeeping was sufficient for stating many chemical
problems.  For example, suppose a chemist knew that a particular reaction on an
unknown compound yielded two carbonyl compounds (i.e., containing C=O), at least
one of which was an ester (-O-C=O).  He could define a product class CARBONYL
using the C=O substructure with a minimum and maximum of two products.  He could
then define a sub-class of CARBONYL called ESTERS using the substructure -O-C=O
with a minimum of one and a maximum of two products.  The program would
automatically use this information to eliminate candidate starting structures
which could not give the indicated product distribution with the given reaction.
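The class-based bookkeeping in this example amounts to counting, for each candidate starting structure, how many of its predicted products fall into each class and checking the counts against the class limits. A minimal sketch, assuming products have already been reduced to sets of substructure labels (a stand-in for real graph matching); the `ProductClass` type, the labels and the candidate product lists are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass
class ProductClass:
    name: str
    member: Callable[[FrozenSet[str]], bool]  # substructure test on one product
    lo: int                                   # minimum number of products in the class
    hi: int                                   # maximum number of products in the class
    parent: Optional["ProductClass"] = None   # a sub-class only sees its parent's members

    def members(self, products):
        pool = self.parent.members(products) if self.parent else products
        return [p for p in pool if self.member(p)]


def consistent(products: List[FrozenSet[str]], classes: List[ProductClass]) -> bool:
    """A starting structure survives only if every class count is within its limits."""
    return all(c.lo <= len(c.members(products)) <= c.hi for c in classes)


carbonyl = ProductClass("CARBONYL", lambda p: "C=O" in p, 2, 2)
esters = ProductClass("ESTERS", lambda p: "-O-C=O" in p, 1, 2, parent=carbonyl)

# Predicted products of the reaction for two hypothetical candidate starting structures.
candidate_a = [frozenset({"C=O", "-O-C=O"}), frozenset({"C=O"})]
candidate_b = [frozenset({"C=O"}), frozenset({"C=O"}), frozenset({"C=O"})]
```

Candidate A gives two carbonyl products, one of them an ester, and is kept. Candidate B gives three carbonyl products and would be eliminated.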

   There are chemical problems, though, for which the above scheme is too
rigid.  For example, suppose a reaction gives several products, two of which are
isolated and labelled P1 and P2.  Suppose that only a small amount of P1 is
available, so only mass spectrometric measurements are practical.  Suppose also
that a deuterium-exchange experiment shows that P1 has two exchangeable protons
(say, either N-H or O-H), and that P2 shows a strong carbonyl absorption in the
IR.  P1 might also contain a carbonyl group, but that was never determined;
neither was the number of exchangeable protons in P2, which could be two.  No
matter how one attempts to use the above-described classification system, one
cannot express this information accurately.

   In the new approach, for which the algorithmic design has been completed,
one is allowed to express data in a much more natural sequence which parallels
the experimental steps.  The first experimental step after a reaction is usually
the separation and purification of products.  An analogous step is to be included
in REACT, in which the separation amounts to the setting up of a specified number
of labelled "flasks" (analogous to the labels P1 and P2 in the above example)
each of which is ultimately to contain a specified number (usually 1) of the
products.  As experimental data are gathered on each real product, corresponding
substructure constraints are attached to the corresponding flask in the program.
As each such assertion is made, the bookkeeping mechanism verifies that, for a
set of reaction products from a given starting material, there is at least one
way of distributing them among the flasks such that each product satisfies the
constraints for its flask.  If this test is ever violated, the starting material
is removed as a candidate structure.  Flasks containing more than one product may
be further separated into "subflasks" to any level, and the contents of any flask
may be made to undergo further reactions.  This capability, the reacting of flask
contents, is analogous to common laboratory procedures in which incomplete
separations of products are encountered.  Dealing with such situations adds
considerable complexity to the bookkeeping mechanism, because the contents of a
flask may be ambiguous to the program when the reaction is applied.  REACT must
keep track of all possible structures which might, based on the current flask
constraints, occupy the reacting flask.  If such a reaction fails (because the
products did not satisfy the constraints specified for them), REACT does not
eliminate the starting structure entirely, but notes that the structure may not
occupy that flask in future flask-allocation tests.
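The core check, whether a starting material's products can be distributed among the labelled flasks so that every product satisfies its flask's constraints, is a small assignment problem. The backtracking sketch below assumes that each flask holds exactly its specified number of products and that products not isolated into any flask are ignored, as with the unisolated products in the P1/P2 example. It does not model subflasks or the reacting of flask contents, and all names and feature labels are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Product = FrozenSet[str]  # toy stand-in: the set of features a product is known to have
Flask = Tuple[str, int, Callable[[Product], bool]]  # (label, number of products, constraint)


def allocation(products: Sequence[Product], flasks: List[Flask]):
    """Return one valid {flask label: [product indices]} mapping, or None if none exists."""
    # One slot per product a flask must receive.
    slots = [(label, ok) for label, count, ok in flasks for _ in range(count)]

    def fill(i: int, used: frozenset):
        if i == len(slots):
            return []
        label, ok = slots[i]
        for j, p in enumerate(products):
            if j not in used and ok(p):
                rest = fill(i + 1, used | {j})
                if rest is not None:
                    return [(label, j)] + rest
        return None  # no product can fill this slot given earlier choices

    pairs = fill(0, frozenset())
    if pairs is None:
        return None
    result: Dict[str, List[int]] = {}
    for label, j in pairs:
        result.setdefault(label, []).append(j)
    return result


flasks = [
    ("P1", 1, lambda p: "two exchangeable H" in p),
    ("P2", 1, lambda p: "C=O" in p),
]
products_x = [frozenset({"C=O"}), frozenset({"two exchangeable H"}), frozenset()]
products_y = [frozenset({"C=O", "two exchangeable H"}), frozenset()]
```

For the first hypothetical product set a valid distribution exists, so that starting material stays a candidate. For the second, only one product fits either flask, so no distribution exists and the starting material would be removed.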

2.2 MSRANK

   This program is an outgrowth of MSPRUNE described in last year's annual
report.  It is a combination of a predictor which uses a very simple theory of
mass spectrometry to predict the spectra of candidate structures, and an
evaluation function which compares the predictions with the observed spectrum of
the unknown, assigning a goodness-of-fit score to each candidate.  The candidates
are then sorted based upon how well they match the observations.  The basic
concept here is not a new one to the DENDRAL project [see, for example, Buchanan,
et al. in Machine Intelligence 4 (Meltzer & Michie, eds., Edinburgh Univ. Press,
1969)], but there are some new aspects to the problem when viewed in the overall
CONGEN context.

    Because of the wide variety of structural types which can be produced by
CONGEN, it is necessary for MSRANK to use a very general model of mass
spectrometry.  The best predictive theories of mass spectrometry are limited to
families of closely related structures (i.e., class specific theories), and the
Meta-DENDRAL program is designed to help in discovering such theories.  There are
very few general principles upon which to draw in predicting mass spectra,
though, so MSRANK is limited to only the most approximate kinds of evaluation
functions.  One principle which we noticed being used by practicing mass
spectrometrists was: of two candidate structures for an unknown, the more likely
structure is the one which explains the observations most "simply" - i.e., with
the fewest complex explanations involving many bond cleavages and the transfer of
many hydrogen atoms.  The evaluation function used by MSRANK is based on a
quantitation of this principle.
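The report does not give MSRANK's scoring formula. Purely to illustrate the "simplest explanation" principle, the sketch below assumes the predictor reports, for each fragment mass it can explain, the complexity of its simplest explanation (bond cleavages plus hydrogen transfers, at least 1). Each observed peak then earns 1/complexity, and candidates are sorted by total credit. The weighting, data layout, names and masses are all hypothetical.

```python
from typing import Dict, List, Tuple

# A prediction: fragment mass -> complexity of its simplest explanation
# (number of bond cleavages + hydrogen transfers, assumed >= 1).
Prediction = Dict[int, int]


def simplicity_score(observed: List[int], predicted: Prediction) -> float:
    """Sum over observed peaks of 1/complexity (unexplained peaks earn nothing)."""
    return sum(1.0 / predicted[m] for m in observed if m in predicted)


def rank(observed: List[int], candidates: Dict[str, Prediction]) -> List[Tuple[str, float]]:
    """Most simply explained candidate first."""
    scores = {name: simplicity_score(observed, pred) for name, pred in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


observed = [43, 57, 71]
candidates = {
    "A": {43: 1, 57: 1, 71: 2},  # every peak explained, mostly by single cleavages
    "B": {43: 3, 57: 4},         # two peaks explained, only by complex processes
}
ranking = rank(observed, candidates)
```

Candidate A scores 1 + 1 + 0.5 = 2.5 and ranks first. Candidate B explains fewer peaks, each by more complex processes, and ranks below it.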

   MSRANK is quite new and we have not yet had sufficient experience with it
to evaluate its overall usefulness.  By using only unit plausibilities for
selected characteristics of the mass-spectral cleavages, we are able to duplicate
earlier results obtained with the predictor/comparator functions applied to mono-
and di-ketoandrostanes.  These tests serve to check the accuracy of the MSRANK
program.  We are now doing a systematic study of various classes of compounds by
ranking the spectrum of a known structure against a CONGEN-generated list of
structures which contains the correct one among several closely related ones.

Stereochemistry in CONGEN

    We have started the complex task of giving CONGEN the capability of
recognizing stereochemical features of molecules and using stereochemical
information in structure determination.  The ability to recognize stereochemical
features would allow, for example, the generation of all stereoisomers of a given
topological structure with or without constraints.  The ability to use
stereochemical information would allow the determination of constraints on
stereoisomer (and topological isomer) generation caused by, for example, partial
knowledge of relative or absolute stereochemistry of structural fragments,
knowledge of overall molecular chirality (or lack of), absolute and relative

stereochemistry from circular dichroism measurements, and so forth.  Thus far,
only the topological information (constitution) has been recognized and used by
CONGEN.

   The first stage of this development is to produce a program which generates
all the stereoisomers of a given topological structure.  This program will be
placed at the end of the existing CONGEN program.  The present report describes
the development of the theory and algorithm for stereoisomer generation and the
progress on the programming of this algorithm.
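The algorithm under development is not given here. As a naive illustration only: with n stereocentres there are 2^n R/S assignments, and two assignments describe the same stereoisomer when a symmetry of the topological structure carries one into the other. The sketch assumes those symmetries are supplied as permutations of the centres, and it ignores pseudoasymmetric centres, double-bond stereochemistry and other special cases. For tartaric acid, whose two centres are exchanged by a symmetry, it recovers the familiar three stereoisomers: (R,R), (S,S) and the meso form.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

Assignment = Tuple[str, ...]


def relabel(assignment: Assignment, perm: Sequence[int]) -> Assignment:
    """Move the descriptor at centre position i to position perm[i]."""
    image = list(assignment)
    for i, j in enumerate(perm):
        image[j] = assignment[i]
    return tuple(image)


def orbit(assignment: Assignment, perms: Sequence[Sequence[int]]) -> Set[Assignment]:
    """All assignments reachable by repeatedly applying the symmetry permutations."""
    seen = {assignment}
    frontier = [assignment]
    while frontier:
        current = frontier.pop()
        for perm in perms:
            image = relabel(current, perm)
            if image not in seen:
                seen.add(image)
                frontier.append(image)
    return seen


def stereoisomers(n_centres: int, perms: Sequence[Sequence[int]]) -> List[Assignment]:
    """One canonical (lexicographically smallest) R/S assignment per stereoisomer."""
    return sorted({min(orbit(a, perms)) for a in product("RS", repeat=n_centres)})


tartaric = stereoisomers(2, [(1, 0)])  # the two centres are exchanged by a symmetry
unsymmetric = stereoisomers(2, [])     # two centres with no exchanging symmetry
```

With the exchange symmetry, (R,S) and (S,R) fall into one orbit (the meso form), leaving three stereoisomers. Without it, all four assignments are distinct.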

THE GC/HRMS DATA SYSTEM

New Developments

    In addition to upgrading old versions of the high resolution system, work
is being done on creating a low resolution system for the MAT 711.  The ultimate
aim is to collect data that can be run through CLEANUP, a program that resolves
multiple spectra under a single GC peak and cleans up the final spectra.  The
problem with the current system is that we cannot scan fast enough to provide
CLEANUP the data it needs.  The high resolution system requires resolution good
enough to separate sample peaks from the reference peaks.  If the scan is sped up
past a certain point, SAMRUN can no longer separate the peaks, and therefore
cannot calibrate the run.  At the same time, CLEANUP requires that at least 7
spectra be taken across a GC peak to ensure resolution of multiple spectra.  The
fundamental problem, then, is that an alternate method of calibrating the mass
spectrum, without using known calibration peaks, must be found before the scan
speeds required by CLEANUP can be achieved.  The most direct solution is to
measure the magnetic field strength of the instrument directly and use it to
calculate the mass being observed.  To do this we inserted a Hall probe between
the poles of the magnet and connected it to the data acquisition system on the
PDP-11/20.
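For a magnetic-sector instrument at fixed accelerating voltage V and radius of curvature r, an ion satisfies m/q = B^2 r^2 / (2V), so the mass scales with the square of the field. If the Hall reading h is assumed proportional to B (an idealisation; a real probe and magnet would need correction terms), a one-parameter least-squares fit of m = k*h^2 against a few reference masses converts any later Hall reading into a mass. The sketch below shows only that fit, with invented readings; it is not the calibration code described here.

```python
from typing import Sequence


def fit_k(hall: Sequence[float], masses: Sequence[float]) -> float:
    """Least-squares k for m = k * h**2 (a line through the origin in x = h**2)."""
    x = [h * h for h in hall]
    return sum(m * xi for m, xi in zip(masses, x)) / sum(xi * xi for xi in x)


def mass_from_hall(h: float, k: float) -> float:
    """Convert one Hall reading to a mass using the fitted constant."""
    return k * h * h


# Hypothetical calibration: Hall readings (arbitrary units) for known reference masses.
ref_hall = [2.0, 3.0, 4.0]
ref_mass = [100.0, 225.0, 400.0]  # exactly 25 * h**2, so the fit should give k = 25

k = fit_k(ref_hall, ref_mass)
unknown = mass_from_hall(3.5, k)
```

Whether a single quadratic term is accurate enough is exactly the reproducibility question the following paragraph raises; in practice higher-order terms would likely be needed.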

   The main problems with the Hall probe are: 1) to make sure that the ion
reading and the Hall probe reading are simultaneous; 2) to ensure that the
correct Hall reading can be assigned to the correct ion reading; 3) to determine
the reproducibility of Hall readings versus mass being observed in both dynamic
(scanning) and static situations; and 4) to decide whether the probe has the
speed and accuracy to calibrate the instrument.  The first two problems are a
matter of hardware.  The configuration of the original data collection system is
as follows: the ion detector goes to an A/D converter, which is connected to a
DMA.  The DMA is on an 11/20, which runs a data collection system, SAQMON, that
performs various low level filtering and buffering operations.  The DMA is
actually a low level processor which counts the number of samples taken, stores
them in successive memory locations, and interrupts the central processor when
a block of data has been collected.  The timing of the sample collection is
controlled by a quartz crystal clock.  On each timing pulse, a signal is sent to
the A/D on the ion detector to convert that value to a digital number.  To
accommodate the Hall probe, the DMA was modified so that on each timing pulse the
start signal is sent simultaneously to both the A/D on the ion detector and the
A/D on the Hall probe.  The DMA then services both A/Ds and stores the readings
in successive memory locations.  The net result is that when the DMA interrupts
the central processor, the block of data is a set of pairs of readings: an ion
reading and the Hall reading for the same time.  This solves both of the first
two problems, since the ion reading and the Hall reading are now connected both
in time and in memory location.
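With the modified DMA, each completed block is a flat run of alternating readings. Assuming the ion reading is stored first in each pair (the text does not say which comes first), recovering the (ion, Hall) pairs is a simple de-interleave. The function name and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_block(block: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn [ion0, hall0, ion1, hall1, ...] into [(ion0, hall0), (ion1, hall1), ...]."""
    if len(block) % 2:
        raise ValueError("block must hold whole (ion, Hall) pairs")
    return list(zip(block[0::2], block[1::2]))


pairs = split_block([17, 512, 20, 514, 958, 517])
```

Because each pair shares a memory position as well as a timing pulse, a later calibration step can attach a mass to every ion reading without any separate alignment.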

   The second two problems, testing the reliability and reproducibility of the
Hall probe, require new software.  We are currently modifying portions of the
calibration mechanism of the high resolution system to calculate masses for a
large number of Hall readings.

META-DENDRAL

    The success of any reasoning program is strongly dependent on the amount of
domain-specific knowledge it contains.  This is now almost universally accepted
within AI, partly because of DENDRAL's success.  Because of the difficulty of
extracting specific knowledge from experts to put into the program, many years
ago we began to explore the problems of efficiently transferring knowledge into a
program.  We have looked at two alternatives to "hand-crafting" each new
knowledge base: interactive knowledge transfer programs and automatic theory
formation programs.  In this enterprise the separation of domain-specific
knowledge from the computer programs themselves has been a critical component of
our success.

   One of the stumbling blocks with the interactive knowledge transfer
programs is that for some domains there are no experts with enough specific
knowledge to make a high performance problem solving program.  We were looking
for ways to avoid forcing an expert to focus on original data in order to codify
the rules explaining those data because that is such a time-consuming process.
Therefore we began working on an automatic rule formation program (called
Meta-DENDRAL) that examines the original data itself in order to discover the
inference rules for that part of the domain.

    The problem solving paradigm for Meta-DENDRAL is also the plan-generate-
test paradigm used in Heuristic DENDRAL.  In this case one part of the program
(RULEGEN) generates plausible rules within syntactic and semantic constraints and
within desired limits of evidential support.  The model used to guide the
generation of rules is particularly important since the space of rules is
enormous.  The planning part of the program (INTSUM) collects and summarizes the
evidential support.  The testing part (RULEMOD) looks for counterexamples to
rules and makes modifications to the rules in order to increase their generality
and simplicity and to decrease the total number of rules.

   Meta-DENDRAL successfully formulated rules of mass spectrometry that were
new to the science.  These rules, along with a discussion of the methodology,
were published in the scientific literature [Report HPP-76-4].  The program was
tested to see if it could rediscover the rules of mass spectrometry for two
classes of chemical compounds that were already well understood (amines and
estrogenic steroids).  Then it was applied to three classes of compounds whose
mass spectrometry was not as well known (mono-, di-, and tri-ketoandrostanes).
The program produced three sets of rules that explained much of the significant
data for these classes.  The time for manual rule formation for these data was
estimated to be several months.

   Progress was made on generalizing the Meta-DENDRAL program, and rules for a
new domain were successfully discovered by the program.  A scientific paper on
this application was submitted for publication [Report HPP-77-4].  The new
application was learning rules for interpreting signals from 13C NMR
spectroscopy.  The instrument produces data points in a bar graph in response to
the resonance of each carbon-13 nucleus in the sample.  The rules describe the
environment of a 13C atom and predict a resonance frequency range for every atom
that matches the description.  The Meta-DENDRAL program needed some modification
because the rules are predicting ranges of data points, and not precise
processes, as for the mass spectrometry version.

    The RULEGEN component of Meta-DENDRAL was demonstrated to work with its
heuristic search paradigm.  Guidance from a model of mass spectrometry is an
important feature of RULEGEN.  Also, the program uses problem data for pruning
possible rules (and all more specific rules formed from those).  The amount of
data examined during the search is very large and the space of rules is immense,
so the search needs to be rather coarse in order to produce plausible, but not
necessarily optimal, rules.

    The RULEMOD program for "fine-tuning" Meta-DENDRAL's newly-discovered rules
was finished.  This program provides a number of important subtasks, including
merging similar rules, making rules more specific or more general, and filtering
out the weakest rules.  RULEMOD checks for counterexamples to rules and uses this
information in all of the named tasks.  Because of the expense of computing
counterexamples to possible rules, this computation is delayed until Meta-DENDRAL
has a set of plausible rules, rather than computing counterexamples on each
possible rule examined in the search of the rule space.

    A report was written on the AI methodology underlying Meta-DENDRAL.  The
major idea developed in this report is that knowledge of the domain can be used
effectively to guide a learning program.  The major difference between Meta-
DENDRAL and statistical learning programs is that Meta-DENDRAL uses a strong
model of mass spectrometry, including any assumptions the user cares to make
about the domain, to guide the formation of explanatory rules.

13C NMR SPECTROMETRY

   13C NMR was selected as a new application area for the rule formation
program, Meta-DENDRAL.  The algorithms used for mass spectrometry rule formation
were extended to 13C NMR and used to obtain a set of rules for paraffins and
acyclic amines.  These two classes were chosen since compounds in these classes
are known to show a strong correlation between structural environment and
shift.  Thus, the programs could be tested knowing that the underlying basis for
the form of the rule was valid.

The form of the rule is

substructure ---> shift range.

A sample rule generated is

C-C*-C-X-  --->  19.85 <= δC <= 21.3.

   The asterisk in the substructure description denotes the atom for which the
shift is predicted.  Only topological descriptors were used to construct the
substructures. The addition of stereochemical terms is a topic of current work.

   It was necessary to change RULEGEN so that the left-hand sides of rules
were expanded outward from a carbon atom rather than from a bond.  The right-hand
side of the rule is a range of shifts, whereas in the mass spectrometry program it
is a precise mass.  This modification also required changes in the rule
search procedure.  The user sets two parameters which guide the rule search:
MINIMUM-EXAMPLES, which requires each rule to explain at least a given number of
peaks in the training set, and MAXIMUM-RANGE, which defines the widest acceptable
shift range for a rule.  Together these parameters regulate the degree of
specificity or generality of the rules.
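   A minimal Python sketch of the rule form and the two search parameters
follows, for illustration only.  It assumes that a candidate rule can be reduced
to its substructure string plus the shifts of the training peaks it explains, and
that a rule's shift range is simply the span of those shifts.  The names Rule and
acceptable are hypothetical and are not taken from RULEGEN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a rule 'substructure ---> shift range'."""
    substructure: str   # e.g. "C-C*-C-X-"; * marks the carbon whose shift is predicted
    shifts: tuple       # shifts of the training peaks the rule explains

    @property
    def low(self) -> float:
        return min(self.shifts)

    @property
    def high(self) -> float:
        return max(self.shifts)

    @property
    def width(self) -> float:
        return self.high - self.low


def acceptable(rule: Rule, minimum_examples: int, maximum_range: float) -> bool:
    """Apply MINIMUM-EXAMPLES and MAXIMUM-RANGE, assumed here to be plain thresholds."""
    return len(rule.shifts) >= minimum_examples and rule.width <= maximum_range


# Illustrative shifts only (20.4 is invented); the end points echo the sample rule.
rule = Rule("C-C*-C-X-", (19.85, 20.4, 21.3))
print(rule.low, rule.high, acceptable(rule, 3, 2.0))  # 19.85 21.3 True
```

Raising MINIMUM-EXAMPLES or lowering MAXIMUM-RANGE rejects more candidates, which
is how the two settings trade generality against specificity in this sketch.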

   From the set of rules generated, a subset is selected that corresponds to the
"best" set which still covers all the training set data.  The best rule is
selected by calculating

(number of peaks predicted/(range ** 2)).

   Data which are predicted by the best rule are removed and the next best
rule is found for the remaining data using the criterion given above.  This
process is repeated until all data are explained.
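   The selection procedure just described is a greedy covering loop.  The Python
sketch below assumes each rule is given as the set of training peaks it predicts
together with the width of its shift range.  The function name and the small
min_width floor (added so that a zero-width range cannot cause division by zero)
are hypothetical additions, not part of the report.

```python
def select_rules(rules, all_peaks, min_width=0.01):
    """Greedy selection by (number of peaks predicted) / (range ** 2).

    rules maps a rule name to (set of peak ids it predicts, range width).
    Only peaks not yet explained count toward a rule's score.
    """
    uncovered = set(all_peaks)
    chosen = []
    while uncovered:
        candidates = [name for name, (peaks, _) in rules.items() if peaks & uncovered]
        if not candidates:
            break  # the remaining peaks are not predicted by any rule

        def score(name):
            peaks, width = rules[name]
            return len(peaks & uncovered) / max(width, min_width) ** 2

        best = max(candidates, key=score)
        chosen.append(best)
        uncovered -= rules[best][0]  # remove the data explained by the best rule
    return chosen, uncovered


# Invented toy data: peak ids and range widths are for illustration only.
rules = {
    "R1": ({1, 2, 3, 4}, 2.0),
    "R2": ({1, 2}, 0.5),
    "R3": ({3, 4, 5}, 1.0),
}
print(select_rules(rules, {1, 2, 3, 4, 5}))  # (['R2', 'R3'], set())
```

The narrow rule R2 wins the first round even though R1 predicts more peaks:
squaring the range in the denominator strongly favours specific rules.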

    In order to test the informational content of the rules generated, a second
program was written which applied the rules to a list of candidate molecules and
ranked the molecules.  First, all possible structural isomers for a given
empirical formula were generated using CONGEN.  The rules were applied to each of
the possible isomers and spectra were predicted.  The predicted spectra were
compared to the known spectrum of a compound with the same empirical formula.
The structural isomers were ranked according to a comparison score to determine
how well the correct compound was distinguished from its isomers on the basis of
the predictive rules.
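   The ranking step can be sketched in the same spirit.  The report does not
define the comparison score here, so this Python sketch assumes a hypothetical
one: the number of observed shifts that fall inside some predicted range of a
candidate.  Candidates are then sorted by that score, best first.  The isomer
names and shift values are invented for illustration.

```python
def comparison_score(predicted_ranges, observed_shifts):
    """Hypothetical score: observed shifts lying inside any predicted range."""
    return sum(
        any(low <= shift <= high for low, high in predicted_ranges)
        for shift in observed_shifts
    )


def rank_candidates(candidates, observed_shifts):
    """candidates maps an isomer name to its predicted (low, high) ranges."""
    return sorted(
        candidates,
        key=lambda name: comparison_score(candidates[name], observed_shifts),
        reverse=True,
    )


observed = [14.1, 22.9, 32.0]
candidates = {
    "isomer A": [(13.5, 14.5), (22.5, 23.5), (31.5, 32.5)],
    "isomer B": [(13.5, 14.5), (28.0, 30.0)],
    "isomer C": [(40.0, 45.0)],
}
print(rank_candidates(candidates, observed))  # ['isomer A', 'isomer B', 'isomer C']
```

Recording the rank at which the known compound appears, over a set of unknown
spectra, yields the kind of summary reported for the C9H20 and C6H15N unknowns.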

    The details of the generation of rules and the use of rules for structure
selection can be found in a paper recently submitted for publication [Report
HPP-77-4].

   The 13C NMR rule formation program was applied to a set of paraffins and
acyclic amines.  The program generated 138 rules to cover 435 data peaks.  The
rules generated were applied in a structure selection test for the structural
isomers of C9H20 and C6H15N.  No structures with these empirical formulas were
included in the training set.  Twenty-four C9H20 and eleven C6H15N 13C NMR
spectra were available to act as unknowns in the structure selection test.  The
results of the structure ranking applied to these spectra are shown below.

EMPIRICAL   NUMBER OF           NUMBER OF UNKNOWNS WITH THE CORRECT
FORMULA     CANDIDATE ISOMERS   STRUCTURE RANKED
                                1st      2nd      6th      9th

C9H20       35                  20/24    3/24     -        1/24
C6H15N      39                  8/11     2/11     1/11     -

   The performance of the rules in discriminating among similar structures not
included in the training set data demonstrated the informational content of the
rules.

RECENT PUBLICATIONS

(Only publications related to computers in chemistry are shown.)

HPP-76-1 D.H. Smith, J.P. Konopelski and C. Djerassi, "Applications of
      Artificial Intelligence for Chemical Inference. XIX. Computer
      Generation of Ion Structures", Organic Mass Spectrometry, 11: 86
      (1976).

HPP-76-2 Raymond E. Carhart and Dennis H. Smith, "Applications of Artificial
     Intelligence for Chemical Inference XX. Intelligent Use of Constraints
     in Computer-Assisted Structure Elucidation", Computers In Chemistry (in
      press).

HPP-76-3 C.J. Cheer, D.H. Smith, C. Djerassi, B. Tursch, J.C. Braekman and D.
      Daloze, "Applications of Artificial Intelligence for Chemical Inference
      XXI. Chemical Studies of Marine Invertebrates - XVII. The
      Computer-Assisted Identification of [+]-Palustrol in the Marine
      Organism Cespitularia sp., aff. subviridis", Tetrahedron, 32:1807,
      Pergamon Press (1976).

HPP-76-4 B.G. Buchanan, D.H. Smith, W.C. White, R.J. Gritter, E.A. Feigenbaum,
      J. Lederberg, and Carl Djerassi, "Applications of Artificial
      Intelligence for Chemical Inference XXII. Automatic Rule Formation in
      Mass Spectrometry by Means of the Meta-DENDRAL Program", Journal of the
      American Chemical Society, 98: 6168 (1976).

HPP-76-5 T.H. Varkony, R.E. Carhart and D.H. Smith, "Applications of Artificial
      Intelligence for Chemical Inference XXIII.  Computer-Assisted Structure
      Elucidation. Modelling Chemical Reaction Sequences Used in Molecular
      Structure Problems", in "Computer-Assisted Organic Synthesis", W.T.
      Wipke, Ed., American Chemical Society, Washington, D.C., in press.

HPP-76-6 D.H. Smith and R.E. Carhart, "Applications of Artificial Intelligence
      for Chemical Inference XXIV. Structural Isomerism of Mono- and
      Sesquiterpenoid Skeletons", Tetrahedron, 32:2513, Pergamon Press
      (May 1976).

HPP-76-10 Bruce G. Buchanan and Dennis Smith, "Computer Assisted Chemical
      Reasoning", in Proceedings of the III International Conference on
      Computers in Chemical Research, Education and Technology, Plenum
      Publishing (1976).

HPP-77-4 T.M. Mitchell and G.M. Schwenzer, "Applications of Artificial
      Intelligence for Chemical Inference. XXV. A Computer Program For
      Automated Empirical 13C NMR Rule Formation" (Submitted to JACS,
      January 1977).

HPP-77-6 Bruce G. Buchanan and Tom Mitchell, "Model-Directed Learning of
      Production Rules", Submitted to the Proceedings for the Workshop on
      Pattern-Directed Inference Systems in Hawaii (February 1977).
      (STAN-CS-77-597)

HPP-77-11 Dennis H. Smith and Raymond E. Carhart,  "Structure Elucidation Based on
      Computer Analysis of High and Low Resolution Mass Spectral Data".
      Proceedings of the Symposium on Chemical Applications of High
     Performance Spectrometry. University of Nebraska, Lincoln, (in press).

II.  INTERACTION WITH THE SUMEX-AIM RESOURCE

   The number of persons experimenting with CONGEN has grown as a result of
both the continuing practice of issuing an "invitation for program trial use" at
the conclusion of publications and continuing personal contact between Dendral
project members and potential program users.  Three categories of users make up
this group:

Chemists Using Exported Programs

    The part of CONGEN responsible for teletype output of chemical structures
(the DRAW program) is coded in Fortran. Since the paper describing this program
appeared in print [R. Carhart, JACS, 16:82, 1976], we have exported the program
to half a dozen sites, ranging from Japan, across North America, to England.
Similarly, the entire CONGEN program, which is largely coded in Interlisp and
SAIL, has been exported to a collaborator in England who is very interested in
the methods and programming techniques employed in coding the program.  Another
program which we have exported for use by other chemists is the PDP-11 CLEANUP
program, which was described in ANALYTICAL CHEMISTRY [48:1368, 1976]. This
program "cleans up" new GC/MS data to eliminate noise peaks and to separate the
data associated with components in the mixture.

   In each case, the requestors were provided with an initial choice of format
options from which they could select the one most suitable for their computer
installation.  They were asked to send a 2400 foot reel of magnetic tape
appropriate to the selected format option.  The programs were written on the tape
and returned to them along with a brief written explanation of program
organization. Accurate records are kept of who has received the programs, so that
omissions and errors can be corrected by mail at a later date, should that be
necessary.

1. Dr. James F. Elder, Dow Chemical U.S.A., Midland, Michigan.

2. Dr. Robert M. Supnik, Massachusetts Computer Associates, Inc., Wakefield,
   Massachusetts.

3. Mr. Dan Pearce, Orange County Sheriff-Coroner Department, Santa Ana,
   California 92702.

4. Dr. H. J. Stoklosa, Central Research & Development Department, E. I. du
   Pont de Nemours & Company, Wilmington, Delaware.

5. Dr. Douglas W. Kuehl, Environmental Research Laboratory-Duluth, Duluth,
   Minnesota.

6. Dr. Richard A. Graham, Food Sciences Laboratory, U. S. Army Natick
   Laboratories, Natick, Massachusetts.

7. Dr. Walter M. Shackelford, United States Environmental Protection Agency,
   Environmental Research Laboratory, Athens, Georgia.

8. Dr. Richard Gans, Chemical Research Division, American Cyanamid Company,
   Bound Brook, New Jersey.

9. Dr. John C. Marshall, Department of Chemistry, the University of North
   Carolina, Chapel Hill, North Carolina.

10. Dr. Graham S. King, Department of Chemical Pathology, Queen Charlotte's
  Hospital for Women, London, England.

11. Dr. J. Wyatt, Chemistry Division, Naval Research Laboratory, Washington,
   D.C.

12. Dr. Gareth Templeman, Research and Development Laboratories, The
  Pillsbury Company, Minneapolis, Minnesota.

13. Dr. J. B. Justice, Department of Chemistry, Emory University, Atlanta,
  Georgia.

14. Dr. Thomas Knudsen, Northrop Services, Environmental Sciences Group,
  Research Triangle Park, North Carolina.

15. Dr. Ingolf Meineke, Fachbereich Chemie, Philipps Universitaet, Lahnberge,
  West Germany.

16. Dr. M.A. Shaw, Unilever Research, Port Sunlight Laboratory, Wirral,
  Merseyside, England.

17. Dr. Ernst Weber, Varian MAT, Bremen, West Germany.

18. Paul V. Fennessey, Department of Pediatrics, University of Colorado
  Medical Center, Denver, Colorado.

19. R. G. A. R. Maclagan, Department of Chemistry, University of Canterbury,
  Christchurch, New Zealand.

20. James E. Oberholtzer, Arthur D. Little, Inc., Cambridge, Massachusetts.

21. F. Street, AEI Scientific Apparatus Limited, Manchester, England.

Remote Users of SUMEX

   Because the SUMEX computer is available via both the TYMNET and ARPANET
communication networks, scientists in many parts of the world can directly
access the Dendral programs on SUMEX.  Primary usage is centered on CONGEN,
although INTSUM is also beginning to gain a following.  Although access points
to SUMEX are widespread, they frequently do not reach the whole dispersed group
of scientists who have expressed an interest in using one of the Dendral
programs.  For example, Dr. Joseph Baker of the Roche Institute of Marine
Pharmacology in Dee Why, Australia, is looking at the possibility of accessing
SUMEX by using International Direct Distance Dialing (IDDD).

Chemists Communicating by Mail

   Many scientists interested in using DENDRAL programs in their own work are
not located near a network access point.  Users of this type use the mail to
send details of their structure elucidation problem to a Dendral Project
collaborator at Stanford.

Chemical Problems Posed to CONGEN

   Following is a list of CONGEN users, and a brief summary of their program
interests during the past year.

1. Dr.  Roger Hahn, Syracuse University. While at Stanford he used
  CONGEN to help solve the structures of photoproducts by obtaining all
  possibilities under available constraints and designing NMR
  experiments to differentiate the possibilities. This work will be
  published soon.

2.  Dr. William Epstein, University of Utah. During a demonstration of
  CONGEN, he posed a problem to verify that the structural
  possibilities he determined for an unknown were in fact all
  possibilities.  The structure of methyl santolinate has been
  published (see Epstein, et al., J.C.S. Chem. Commun., 590 (1975)).

3.  Dr. Clair Cheer, University of Rhode Island. While on sabbatical at
  Stanford, Dr.  Cheer has worked on a number of structure elucidation
  problems using CONGEN, including Briareine D and [+]-Palustrol (Cheer
  et al., Tetrahedron Letters, 1807 (1976)). Work is continuing on the
  structure of another marine natural product, presumably a
  cembrenolide, for which there are currently seven possibilities.

4.  Dr. Jerrold Karliner, Ciba-Geigy Corporation. Dr. Karliner has
  solved several structural problems using CONGEN, including material
  with flame retardant properties, an impurity in a production sample
  and nitrogen heterocycles being investigated for pharmacological
  activity.  CONGEN enabled reduction of the number of possibilities to
  the point where subsequent experiments led to unambiguous structural
  assignment.

5. Dr.  Gino Marco, Ciba-Geigy Corporation.  He has used CONGEN to help
  solve structures of conjugates of pesticides with sugars and amino
   acids.

6.  Dr. Milton Levenberg, Abbott Laboratories.  He has worked on the
  structure of a compound with mild antibiotic activity, isolated from
   a fermentation broth.  CONGEN initially determined 33 structural
  possibilities; additional experimental data have since reduced these to
  the current ten.

7. Dr.  David Pensak, DuPont.  He is currently learning to use CONGEN
  and plans to evaluate its utility for structural problems of some of
   his coworkers.

8.  Dr. Douglas Dorman, Eli-Lilly.  He is using CONGEN to assist in
  structure elucidation of metabolites of microorganisms shown to have
  pharmacological activity.  He has worked on five such problems,
  including a current one where the developing MSPRUNE capabilities are
  being used.

9.  Dr. L. Minale, Napoli, Italy.  We have worked with him by sending him
  structural alternatives for proposed structures for some marine
  natural products (Pallescensins, Tetrahedron Letters, 1417 (1975))
  and cyclic diethers from the lipid fraction of a thermophilic
  bacterium (J. C. S. Chem. Commun., 543 (1974)).

10. Dr. K. Nakanishi, Columbia University.  We have worked with him by
  sending him structural possibilities for termite defense compounds
(structure finally solved by X-ray crystallography). This trial plus
  a live demonstration to one of his students has resulted in efforts
  toward continued collaboration on other insect defense secretions and
  exploration of the possibility of his direct access to SUMEX.

11. Dr. L. Dunham, Zoecon Corporation.  We have collaborated with him on
  the use of INTSUM for mass spectral fragmentation studies of insect
  juvenile hormones.

12. Dr. A. G. Gonzales, Tenerife, Spain.  We have recently sent him
  structural alternatives for constituents of Laurencia perforata
  (Tetrahedron Letters, 2499 (1975)), and expect to continue
  discussions on the structures of these compounds.

13. Dr. T. Irie, Sapporo, Japan.  We have recently sent him structural
  alternatives to published structures of constituents of Laurencia
  glandulifera (Tetrahedron Letters, 821 (1974)) and expect to continue
  discussions on this problem.

14. Dr. C. J. Persoons, Delft.  We have corresponded with him on
  structural alternatives for cockroach sex pheromones (Periplanone-B;
  Tetrahedron Letters, 2055 (1976)), and he has agreed to further
  collaboration on new problems.

15. Dr. F. Schmitz, University of Oklahoma. We explored for him
  structural alternatives for an unknown diterpenoid hydrocarbon. We
  obtained 25 possibilities, of which only four obeyed the isoprene
  rule.

16. Dr. J. Baker, Roche Institute of Marine Pharmacology, Australia. We
  plan collaboration with Dr. Baker on the sterol fractions of various
marine organisms and are exploring ways for him to access CONGEN.

17. Dr. E. van Tamelen, Stanford University.  We have used the developing
  reaction features of CONGEN to explore structural possibilities for
  both chemical and biogenetic cyclization products of squalene-oxide
  congeners.  We have suggested alternatives to proposed structures and
  helped to design experiments to differentiate them.

18. Dr. J. C. Braekman, Brussels.  Dr. Braekman visited Stanford as
  part of a continuing collaboration in marine chemistry with Dr.
  Tursch's group.  While at Stanford he explored the use of CONGEN in
  current problems in marine natural products, and worked on the
  problems of Drs. Irie and Gonzales (see above). He is currently
  exploring access to CONGEN from Brussels via TYMNET.

   Use of CONGEN by working scientists has turned up one major area in which
users felt that additional information was necessary.  CONGEN users unanimously
indicated their desire for a method of determining what percentage of the whole
problem has been solved at any moment, i.e., what fraction of the total number of
possible structures is represented by the number already generated.  In a
prototype system we have implemented the Cntrl-I and Cntrl-S user information
interrupts to show how far CONGEN has progressed.  If, for example, someone who
has generated 357 structures is told that these represent 1 percent of the total
possible structures, they immediately know that they do not want to finish
generating all the structures.  Even if there were enough space, 40,000
structures would be far more than they would want to see.
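   The report does not say how CONGEN computes the percentage.  One common
approach, assumed here purely for illustration, treats the generator as a
depth-first search in which every branch at a choice point holds an equal share
of its parent's subtree; the fraction explored then follows from the current
position in the tree, and the total can be extrapolated from the count generated
so far.  The Python function names below are hypothetical.

```python
def fraction_explored(path):
    """Estimate the share of the search tree already finished.

    path lists (index of the branch now being explored, number of branches)
    for each choice point from the root down.  Sibling branches are assumed
    to be equally large, which a real structure generator need not satisfy.
    """
    fraction, share = 0.0, 1.0
    for index, branches in path:
        share /= branches           # size of one branch at this level
        fraction += index * share   # branches to the left are already finished
    return fraction


def estimated_total(generated, fraction):
    """Extrapolate the total number of structures from partial progress."""
    return generated / fraction


print(fraction_explored([(1, 4), (2, 5)]))  # 1/4 + 2/20, about 0.35
print(round(estimated_total(357, 0.01)))    # 35700
```

With the figures in the text, 357 structures at 1 percent extrapolate to about
36,000, the same order of magnitude as the 40,000 cited.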

   We implemented another user-oriented facility for an invited paper
presented at the 172nd American Chemical Society meeting in August 1976.
Special features were added for character-oriented, screen-addressable CRT
terminals to give users an informative visual interface to CONGEN, an otherwise
complex program.  The dynamic field of view provided by this type of terminal was
used to advantage to give the chemist-user a continuous, graphic summary of both
the information he has supplied to the program and the program's dynamic use of
that information.

INTERACTION WITH OTHER SUMEX-AIM PROJECTS

   We have had numerous discussions with Prof. Todd Wipke's research group in
meetings of our combined groups.  Because the problems of manipulating chemical
graphs are much the same for both groups,  frequent discussions are mutually
advantageous.

   Almost daily contact with other Stanford-based projects provides new ideas
and programming assistance.  In particular, there is considerable interaction
with members of the MYCIN, MOLGEN and Protein Crystallography projects.  Many of
our experiment planning ideas have come from discussions with the MOLGEN group.
Our ideas about explaining a program's reasoning are derived from the success of
MYCIN's explanation package.  And our ideas about integrating multiple sources of
knowledge in data interpretation have been enhanced through discussions with the
Protein Crystallography group.  The large number of excellent INTERLISP
programmers in all these groups also provides a pool of programming expertise
that we draw on frequently.

    We are collaborating with Dr. Robert Lindsay on a monograph about the
DENDRAL programs, with most of our interaction and all our text preparation
taking place over the SUMEX system.  We have also discussed helping Dr. Lindsay
with a knowledge-based reasoning program to help pathologists at the University
of Michigan.

6.1.2    HYDROID PROJECT

HYDROID - Studies in Distributed Processing and Problem Solving

        Prof. Gio Wiederhold
Computer Science and Electrical Engineering
       Stanford University

I.  Summary of Research Program

A. Technical Goals

   The objective of this research is the development of a methodology for the
analysis and implementation of alternatives in distributed processing and problem
solving.  One of the primary reasons for interest in this area is its potential
to break through the speed limits imposed by uniprocessor systems.  If such a
breakthrough can be achieved, the viability of the methods being developed by
other projects using the SUMEX-AIM resource will be enhanced.

   The rapid development of microprocessor and communications technology has
given rise to a large number of proposed implementations of networks employing
multiple processors.  The computations to which these distributed systems are to
be applied include heuristic decision-making problems, mathematical modelling,
data reduction, and database search, as well as general purpose multi-access
computing.  However, an adequate global understanding of the computational
tradeoffs implied by network architectures is lacking.

   In order to complement the experimental results of other investigators and
broaden their applicability to the system-design decision-making process, we are
developing a general framework for the study of processor interaction in
distributed processing systems.  The framework consists of rules to obtain
parameters from programs which specify the computations, rules to parameterize
descriptions of networks of processors, and procedures to calculate expected
system performance from these parameter sets.  The framework is to be
sufficiently powerful so that, when it is validated, the methods will be able to
assist in the a priori assessment of the potential performance of new system
alternatives or of systems with improved system components.

   One of the primary tools we are using to analyze the interaction between
computations and distributed processor networks is simulation.  The behaviour of
processor network nodes, interprocessor control and task flow, and problem
decomposition all require simulation at different levels of abstraction.
Analytic queuing models may provide insight into relationships in networks, but
are not adequate to provide quantitative results. Simulation is not seen as the
end product of the study, but as a means to develop and assess the validity of
our model of the interaction of computations and processor network architecture.
Where possible, mathematical results will be used to assess the validity of model
simulations.

    A number of large computational applications are being analyzed in order to
assess their potential for decomposition into modules for distributed processing.
The current candidate applications are:

a) Programs which use heuristic methods in decision-making.  Heuristic
programs frequently employ recursive decomposition of problems into
subsidiary problems which themselves may be suitable for distributed
  processing.

b) Programs which use multi-faceted databases to retrieve and abstract
information. The process of intelligent data retrieval and analysis often
  depends on data or knowledge sources which are being maintained at
geographically distributed processing sites.

c) Programs which acquire data from multiple, possibly dissimilar, sensors and
attempt to reduce this data to simpler hypotheses.

d) Programs which solve large numerical problems, such as those found in image
  processing applications.

Parameters which describe the computations to be simulated include:

a) The computational kernel size: the cycle and memory demand of a
computational unit between interprocessor reference requirements.

b) The computation definition message size: the amount of data required to
  transmit sufficient information to initiate a computational kernel.

c) The database size: the amount of data or program text required to sustain a
  computational kernel, and its availability and residence in the network.
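   To make the three computation parameters concrete, the Python sketch below
records them for one kernel and combines them in a deliberately simple,
hypothetical cost model: execution time on the receiving node plus the time to
ship the computation definition message and, when the database is not already
resident there, the database as well.  The field names, units, and cost formula
are assumptions for illustration, not HYDROID's model.

```python
from dataclasses import dataclass


@dataclass
class Computation:
    kernel_cycles: float   # processor cycles between interprocessor references
    kernel_memory: int     # words of memory demanded (unused by this toy model)
    message_size: int      # words needed to initiate the kernel remotely
    database_size: int     # words of data or program text the kernel needs


def remote_kernel_time(c: Computation, cycles_per_sec: float,
                       words_per_sec: float, database_resident: bool) -> float:
    """Hypothetical time to run one kernel on another node."""
    words_sent = c.message_size + (0 if database_resident else c.database_size)
    return c.kernel_cycles / cycles_per_sec + words_sent / words_per_sec


job = Computation(kernel_cycles=2e6, kernel_memory=1000,
                  message_size=256, database_size=2048)
print(remote_kernel_time(job, 1e6, 1024, database_resident=True))   # 2.25
print(remote_kernel_time(job, 1e6, 1024, database_resident=False))  # 4.25
```

Even in this toy model the remote cost nearly doubles when the database must
travel with the task, which is the kind of tradeoff that parameters such as
locality and database viscosity are meant to expose.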

   The behaviour of the system can be varied through the adjustment of other
parameters.  These parameters may be set to reflect the architecture of specific
hardware systems, or may be varied to obtain optimum performance.  In addition to
obvious parameters (such as the number and power of the processors), we expect the
following parameter types to be important in developing an understanding of the
spectrum of distributed processor architectures:

a) Interconnection density.  As the density decreases, the message delay and
  congestion increase.  This parameter will provide a high level abstraction
of multi-processor connectivity schemes.  Geographical distribution will
  increase message delay and transmission cost.

b) Computational locality.  A high degree of locality (of database or
procedural information in the network) will enhance the probability that
relevant knowledge exists in closely linked nodes, thus counteracting the
effects of a low interconnection density.

c) Database viscosity.  A database, including the programs required to carry
out the computations at a node, may be more or less fixed to one specific
  node.  This therefore encourages the use of certain nodes for specific
  functions.  Many current processor networks are completely rigid in this
  sense, and for these networks optimal initial program and database

allocations may be determined.  However, we hypothesize that a greater
degree of dynamic resource allocation is desirable to cope with changing
loads and in order to enhance reliability.  For this reason this parameter
needs to be included.

d) Redundancy.  In order to assess the costs and benefits in terms of
responsiveness and reliability, the redundancy of database and computations
  will also be made a parameter.  In order to utilize the redundancy well,
the computational resources (programs or data) which affect system
  performance most must be identifiable.

e) Error rate.  In order to test the effectiveness of reliability strategies,
  node and communications channel failures will be simulated.
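   The redundancy and error-rate parameters also illustrate how simulation
results can be checked against mathematical ones, as the framework intends.
Under the simplifying assumption, made only for this sketch, that each replica of
a datum sits on a node that fails independently with probability p, the chance
that at least one of r replicas is reachable is 1 - p^r; a small Monte Carlo run
in Python should agree with that value.  The function names are hypothetical.

```python
import random


def analytic_availability(p_fail: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is up."""
    return 1.0 - p_fail ** replicas


def simulated_availability(p_fail: float, replicas: int,
                           trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability with random node failures."""
    rng = random.Random(seed)
    reachable = 0
    for _ in range(trials):
        # A replica survives when its node does not fail in this trial.
        if any(rng.random() >= p_fail for _ in range(replicas)):
            reachable += 1
    return reachable / trials


for r in (1, 2, 3):
    print(r, analytic_availability(0.1, r), simulated_availability(0.1, r))
```

Disagreement between the two columns would point to an error in the simulation
(or in the assumptions behind the formula), which is the validity check the
framework calls for.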

   An important aspect of this model is that we intend to keep the
abstractions at a sufficiently high level to allow analytic and intuitive
verification of the model behaviour when applied to well understood computations.
Computations have been mapped into specific parallel machines, but these results
are not easily transferred to new architectures.  The distributed processor
systems now being built may have characteristics with unpredicted effects on
system behaviour.  We expect to be able to use the model to find potential
bottlenecks, which then will define areas where extra design attention has a high
payoff.

   We do not intend to build hardware which is based literally on the abstract
model.  We hope to verify results obtained from the model using existing
distributed processor systems and, assuming that our model (with appropriate
parameters describing the load and architecture) matches the given system, be
able to advise on system utilization or development aspects. A local resource of
this type may be the Stanford I processor, now being built under ERDA
sponsorship.  In addition, if we determine that a certain, yet untried,
architecture is promising, we would like to encourage and participate in its
implementation.

B. Medical Relevance and Collaboration

   Many applications at SUMEX consume large quantities of computational
resources. The use of multiple distributed processors may provide a means to gain
the required processing capabilities in an economic manner. In this sense the
medical relevance of this study is indirect.  We are attempting to develop tools
which will be of use in medical computation problems.

   Our studies in distributed data base applications have a more direct
medical relevance.  To this end, we are maintaining contact with Dr. Jim Fries,
whose ARAMIS database network collects data for the analysis of disease progress
and treatment efficacy in rheumatoid arthritis from a variety of institutions.
Sharing of data to provide a broader base for analysis is also a feature of
programs in cardiology and oncology in which physicians at Stanford participate.
In each of these instances the distributed nature of the data resources leads to
differences in the meaning of data items, so that simple aggregation of the data
may not be valid.  Distributed processing may provide a powerful alternative.

C. Progress Summary

   The HYDROID project got underway in the fall of 1976. We have been involved
since that time in developing a basic understanding of important problem areas in
distributed processing and problem solving.

   A weekly research seminar, begun in December 1976, has brought together
members of the faculty and students from a variety of disciplines, and has
included several speakers from application areas where distributed processing
may be beneficial.

   We have developed a formalism in which to express the control of
distributed problem solving in loosely-coupled processor networks. This CONTRACT
NET protocol makes the cost of interprocessor interactions explicit. It is this
cost which appears to generate one of the performance boundaries for distributed
processor systems.
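   The announce, bid, and award cycle at the heart of a contract net can be
sketched in Python as follows.  The bid content is an assumption made for this
illustration: each node bids its estimated completion time, which includes the
message delay to and from the manager, so that interprocessor cost enters the
award decision explicitly.  Node, bid, and award are hypothetical names, not
taken from HPP-77-12.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float              # work units per second
    link_delay: float         # one-way message delay to the manager, seconds
    busy_until: float = 0.0   # when the node finishes work already awarded

    def bid(self, work: float, now: float) -> float:
        """Estimated time at which the manager would receive the result."""
        start = max(now + self.link_delay, self.busy_until)
        return start + work / self.speed + self.link_delay


def award(nodes, work: float, now: float = 0.0):
    """Announce a task, collect one bid per node, award to the lowest bid."""
    bids = {node.name: node.bid(work, now) for node in nodes}
    winner = min(nodes, key=lambda node: bids[node.name])
    winner.busy_until = bids[winner.name] - winner.link_delay
    return winner.name, bids[winner.name]


nodes = [Node("near-slow", speed=1.0, link_delay=0.125),
         Node("far-fast", speed=2.0, link_delay=5.0)]
for _ in range(3):
    print(award(nodes, work=4.0))
# ('near-slow', 4.25)
# ('near-slow', 8.25)
# ('far-fast', 12.0)
```

With these invented parameters the nearby slow node wins the first two contracts;
the distant fast node wins the third only once the near node's backlog outweighs
the far node's communication delay.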

  We have written a basic simulator with which to investigate the merits of
the formalism together with problem solving methods applicable in the distributed
processing environment.  To this end the simulator is currently being tested with
small search problems, as a means of determining what information must be
transferred from node to node in a distributed processor system for such
problems, and what advantages a distributed approach offers.
The simulator is being developed to cover a greater variety of computational
interactions.

D. Publications

1) H. Garcia-Molina and Gio Wiederhold, "Application of the Contract Net Protocol
to Distributed Data Bases", HPP-77-21, Heuristic Programming Project, Stanford
University, April 1977.

2) R. G. Smith, "The Contract Net: A Formalism for the Control of Distributed
Problem Solving", HPP-77-12, Heuristic Programming Project, Stanford
University, February 1977 (also submitted to the Fifth International Joint
Conference on Artificial Intelligence).

II.  Interactions with SUMEX-AIM

  SUMEX-AIM currently provides all computing resources for the project. We
thus enjoy a high degree of interaction with other projects involved in the
problems which result from construction of large programs. Other points of
contact are related to the use of the same programming languages as well as the
abundance of AI expertise residing around the resource.  This latter point is
especially important considering that one of our aims is the discovery of
suitable mappings of well understood AI methods onto highly parallel asynchronous
processor networks.

   SUMEX-AIM is also an excellent medium for informal transmission of reports,
recent results and bulletins to users with related interests and problems.  The
powerful screen-oriented editors available greatly enhance our capabilities for
writing both text and programs.

   Finally, the development of simulation programs generally requires a highly
interactive computing environment - the sort of environment we feel is provided
by SUMEX-AIM.

6.1.3    MOLGEN PROJECT

MOLGEN - An Experiment Planning System for Molecular Genetics

    Prof. J. Lederberg (Genetics, Stanford)
Prof. N. Martin (Computer Science, U. of New Mexico)
Prof. E. Feigenbaum (Computer Science, Stanford)

I.  Summary of Research Program

A.  Technical Goals

   The goal of the MOLGEN project is to develop an experiment planning system
for the domain of molecular genetics.  In order to accomplish this, we hope to
create and apply innovative methods of knowledge management and hierarchical
planning.

   Experiments in molecular genetics are concerned with the study and
manipulation of DNA molecules.  The MOLGEN knowledge base will include both
declarative and procedural information about such structures and the laboratory
tools and techniques which experimental geneticists use.  Also represented will
be much of the strategic information required to join individual experimental
steps into a meaningful whole.  We are using the uniform method of schemata for
representation of all types of knowledge within MOLGEN.  We believe this will
facilitate knowledge acquisition and explanation and provide a consistent means
of storing hierarchical and other relations among objects and rules in the
system.  We hope to make the underlying knowledge base flexible enough to allow
for experimentation with a wide variety of specific planning strategies.
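The report does not spell out MOLGEN's schema format.  The following is purely an illustration.  It assumes a schema is a named set of slots with an "is-a" link to a more general schema, and shows how such a link lets a specific object inherit slot values it does not store itself.  The `Schema` class and every schema name and slot below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Schema:
    """Hypothetical schema: named slots plus an optional 'is-a' parent."""
    name: str
    is_a: Optional["Schema"] = None
    slots: dict = field(default_factory=dict)

    def get(self, slot: str) -> Any:
        """Look the slot up locally, then walk up the is-a hierarchy."""
        schema = self
        while schema is not None:
            if slot in schema.slots:
                return schema.slots[slot]
            schema = schema.is_a
        raise KeyError(f"{self.name} has no slot {slot!r}")

    def ancestors(self) -> list:
        """Names of all more general schemata, nearest first."""
        names = []
        parent = self.is_a
        while parent is not None:
            names.append(parent.name)
            parent = parent.is_a
        return names


# Hypothetical fragment of an object hierarchy; slot values are defaults, not facts.
dna = Schema("DNA", slots={"strandedness": "double", "topology": "linear"})
plasmid = Schema("Plasmid", is_a=dna, slots={"topology": "circular"})
sample = Schema("Sample-Plasmid-1", is_a=plasmid, slots={"host": "E. coli"})

print(sample.get("topology"))      # circular (inherited from Plasmid)
print(sample.get("strandedness"))  # double (inherited from DNA)
print(sample.ancestors())          # ['Plasmid', 'DNA']
```

A more specific schema overrides a default simply by storing its own value, as `Plasmid` does for `topology`.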

B.  Medical relevance and collaboration

   Molecular genetics has at least two major connections to medical research.
Learning about the basic mechanisms which control the operation and transmission
of genetic information is necessary to understand and treat the wide range of
diseases (and health conditions like aging) which are genetically controlled.
Also, recent developments in molecular genetics offer the promise of using
genetic mechanisms to produce essentially limitless amounts of drugs and other
biomedical substances.  The MOLGEN project will develop a system designed to aid
the molecular geneticist in planning experiments of these types.

   The MOLGEN project is a joint effort of the Computer Science Departments of
Stanford and the University of New Mexico and the Genetics Department of
Stanford.  Major participants are Professor Nancy Martin of the University of New
Mexico, Professor Edward Feigenbaum, Peter Friedland, Jonathan King, and Mark
Stefik of Stanford Computer Science, and Professor Joshua Lederberg and Jerry
Feitelson of Stanford Genetics.

C.  Accomplishments

   MOLGEN is in the first year of formal funding as an independent entity.  We
have devoted this year to learning and analyzing the basic knowledge of
experimental molecular genetics and to building part of the central structure of
the knowledge base management system.  A wide variety of experiments has been
studied with the aim of extracting knowledge about the genetic objects and
operators used, as well as the higher-level knowledge used to form the overall
experimental plan.  The object-level knowledge is currently being organized into
the schemata formalism for an initial attempt at a molecular genetics knowledge
base.

   A representation method for DNA structures and an interactive structure
editing and entry system (EDNA) have been built and tested successfully with
geneticist users.  Work is proceeding on the schemata storage and access routines
and on routines for acquiring and editing the rules which describe the procedural
knowledge of the domain.  We plan to have the basic MOLGEN system operational for
the purpose of testing object and operator knowledge (the practical goal of
experiment checking) by the end of July 1977.

D.  Publications

1) N. Martin, P. Friedland, J. King, M. Stefik, "Knowledge Base Management for
   Experiment Planning in Molecular Genetics," submitted to the Fifth
   International Joint Conference on Artificial Intelligence.

2) M. Stefik and N. Martin, "A Review of Knowledge Based Systems as a Basis for a
   Genetics Experiment Designing System," Stanford CS Report STAN-CS-77-596,
   HPP-77-5, February 1977.

3) N. Martin, P. Friedland, M. Stefik, "MOLGEN Knowledge Base I: Object System,"
   to appear as HPP Working Paper.

4) N. Martin, P. Friedland, M. Stefik, "MOLGEN Knowledge Base II: Rule System,"
   to appear as HPP Working Paper.

II.  Interactions with SUMEX-AIM

    All system development has taken place on the SUMEX-AIM facility. We have
used the system not only for programming, but also as a major aid in writing and
transmitting among ourselves the wide variety of formal and informal reports
which are necessary in the MOLGEN design phase.  We believe the availability of
good interactive text editing facilities like TV-Edit increases our productivity
significantly.

   Active collaboration with remote users at the University of New Mexico will
begin in September 1977 (Prof. Nancy Martin has been visiting at Stanford this
year).  We expect this collaboration to occur over the ARPA network.  We also
hope to maintain a collaboration, over a TYMNET link to SUMEX, with Dusko
Ehrlich, formerly a Stanford geneticist and now doing research at the Institut
de Biologie Moleculaire, Faculte de Science, in Paris.

   We have benefited enormously from the collected expertise in both
knowledge-based systems and general programming and design problems available
from other SUMEX-AIM projects.  We have especially strong ties to the knowledge
management expertise of the MYCIN project, but we also share common objectives
with parts of the DENDRAL, SECS, and protein crystallography projects.  We have
also benefited from the intense interaction with many other projects at the AIM
conferences.

   Finally, we have provided small amounts of SUMEX resources to geneticist
users as part of a quid pro quo relationship for helping us understand that
subset of genetic knowledge necessary for our initial knowledge base.  The most
outstanding example of this sort of collaboration has been with Prof. Larry
Kedes' group at the VA Hospital in Palo Alto, which is using SUMEX to determine
the feasibility of automated assistance in analyzing complex DNA base sequences.

6.1.4    MYCIN PROJECT

MYCIN - Computer-based Consultation in Clinical Therapeutics

S. N. Cohen, M.D. (Pharmacology) and
B. G. Buchanan, Ph.D. (Computer Science)
Stanford University

I.  Summary of Research

Technical goals

    The MYCIN project is aimed at the development of a computer program capable
of functioning as an expert consultant on a range of medical decision making
problems.  In particular, we have been working on the construction of a system
that provides consultative advice on the diagnosis and therapy selection for a
number of infectious diseases.  Current areas of competence of the system include
bacteremia and meningitis, and work is underway to extend this to urinary tract
infections, pulmonary infections, and prophylactic use of antibiotics.

Our work has been guided by three fundamental objectives:

(1) A major goal of the MYCIN system has been to provide a computer-based
  therapeutic tool designed to be clinically useful, one that would be used
  eventually in the clinical setting.  This goal requires development of a
  system that has a medically sound knowledge base, and that displays a high
  level of clinical competence in its field.  The program must first
  convince clinicians of the quality of the information it is providing
  before they will be willing to use it.

(2) Since many clinicians are not likely to accept the advice provided by a
  computer-based system unless they can understand why the recommended
  therapy has been selected, the system has to do more than just give advice
  dogmatically.  It should have the ability to explain the reasoning behind
  its decisions, and should be able to do so in terms that suggest to the
  physician that the program approaches the problem in much the same way
  that he does.  This permits the user to validate the program's reasoning,
  and modify (or reject) the advice if he believes that some step in the
  decision process is not justified.  It also gives the program an inherent
  instructional capability that allows the physician to learn from each
  consultation session.

(3) A third major goal is to provide the program with capabilities that enable
  augmentation or modification of the knowledge base by clinical experts in
  infectious disease therapy, in order to improve the validity of future
  consultations.  The system therefore requires some capability for
  acquiring knowledge by interacting with experts in the field, and for
  incorporating this knowledge into its knowledge base.

    Three separate parts of the MYCIN system accomplish these goals.  The
consultation system uses the knowledge base, along with patient-related data
entered by the physician, to generate therapeutic advice.  The explanation system
can explain the reasoning used during the consultation, and can document the
motivation for questions asked or the rationale for conclusions reached.
Finally, the knowledge acquisition system enables experts in antimicrobial
therapy to update MYCIN's knowledge base without requiring that they know how
to program a computer.

   We have also sought to use MYCIN as a framework for understanding the
process of medical decision making and the nature of clinical judgment.
Physicians are constantly faced with the necessity of making decisions based on
information that is both incomplete (missing historical data or test results not
yet available) and inexact (results are rarely definitive).  In addition, those
decisions are often based on rules that are only approximate (e.g., "a
gram-negative anaerobic rod in the blood is probably Bacteroides").  Yet
decisions are made despite these problems, and the results are often later
proven valid.  We have attempted to understand how this is done by developing
in our system a parallel set of capabilities.  We have relied on the
"production rule" encoding of information, in which individual decision rules
are specified in an "if/then" format.  For example, the rule indicated just
above is encoded in the system as:

If   1) the gram stain of the organism is gram negative, and
     2) the morphology of the organism is rod, and
     3) the aerobicity of the organism is anaerobic,
Then there is suggestive evidence (.6) that the identity of the organism
     is Bacteroides.
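The following minimal Python sketch shows how such a rule might be held as data and applied to one organism's findings.  It assumes the certainty-factor arithmetic usually described for MYCIN.  A conjunctive premise is as certain as its least certain clause.  The premise counts as true only above a small threshold (0.2 here).  The conclusion receives the rule's strength (.6) times the premise certainty.  The names, data layout, and threshold are assumptions for illustration, not MYCIN's actual representation.

```python
from dataclasses import dataclass

# Hypothetical layout: attribute -> (value, certainty between -1 and 1).
Findings = dict[str, tuple[str, float]]


@dataclass(frozen=True)
class Rule:
    name: str
    premises: tuple[tuple[str, str], ...]  # (attribute, required value) clauses
    conclusion: tuple[str, str]            # (attribute, concluded value)
    strength: float                        # certainty attached to the rule itself


def apply_rule(rule: Rule, findings: Findings, threshold: float = 0.2) -> float:
    """Certainty the rule lends its conclusion; 0.0 if the rule does not fire."""
    certainties = []
    for attribute, value in rule.premises:
        known = findings.get(attribute)
        if known is None or known[0] != value:
            return 0.0                 # one clause fails, so the whole premise fails
        certainties.append(known[1])
    premise_cf = min(certainties)      # a conjunction is only as certain as its weakest clause
    if premise_cf <= threshold:        # assumed cut-off for treating the premise as true
        return 0.0
    return rule.strength * premise_cf


bacteroides_rule = Rule(
    name="bacteroides-rule",
    premises=(("gram", "gramneg"), ("morphology", "rod"), ("aerobicity", "anaerobic")),
    conclusion=("identity", "bacteroides"),
    strength=0.6,
)

# Stain and shape are certain, aerobicity less so (hypothetical values).
organism_1 = {"gram": ("gramneg", 1.0), "morphology": ("rod", 1.0),
              "aerobicity": ("anaerobic", 0.8)}
print(round(apply_rule(bacteroides_rule, organism_1), 2))  # 0.48
```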

    This encoding of knowledge offers a number of advantages over some of the
more traditional approaches to diagnosis like decision trees, Bayesian analysis,
and utility theory.  Unlike decision trees,  it can deal with both inexact and
incomplete information.  Unlike the Bayesian and utility theory approaches, it
does not need extensive amounts of conditional probability data. A collection of
independent rules is also far easier to augment than a complex decision tree; the
rules thus provide a much more flexible body of knowledge to which new
information is more easily added.  The rules also make possible an explanatory
capability: the system can justify any of its actions or decisions by displaying
the relevant rules it invoked in reaching that decision. This provides an
explanation that is far more comprehensible than any we might be able to provide
by recapping the actions of a program based solely on statistical considerations.
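The explanatory use of the rules can be sketched in Python under several simplifying assumptions.  Certainty factors are left out, each attribute takes one value, a dictionary of canned answers stands in for questions to the physician, and rule cycles are not guarded against.  The search works backward from a goal, the goal-directed style associated with MYCIN.  It records which rule established each conclusion so that the chain of rules can be displayed afterwards.  `Consultation`, `find`, `how`, and the rule names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    premises: tuple[tuple[str, str], ...]  # all (attribute, value) clauses must hold
    conclusion: tuple[str, str]            # (attribute, value) concluded


@dataclass
class Consultation:
    rules: list[Rule]
    answers: dict[str, str]                # stands in for questions put to the physician
    facts: dict[str, str] = field(default_factory=dict)
    support: dict[str, str] = field(default_factory=dict)  # attribute -> rule that concluded it

    def find(self, attribute: str) -> Optional[str]:
        """Goal-directed search: try rules that conclude `attribute`, else ask."""
        if attribute in self.facts:
            return self.facts[attribute]
        concluding = [r for r in self.rules if r.conclusion[0] == attribute]
        for rule in concluding:
            if all(self.find(a) == v for a, v in rule.premises):
                self.facts[attribute] = rule.conclusion[1]
                self.support[attribute] = rule.name
                return self.facts[attribute]
        if not concluding and attribute in self.answers:
            self.facts[attribute] = self.answers[attribute]  # a "question" answered by the user
        return self.facts.get(attribute)

    def how(self, attribute: str) -> list[str]:
        """Explain a conclusion by displaying the rules behind it, deepest last."""
        name = self.support.get(attribute)
        if name is None:
            return [f"{attribute} was given by the user"] if attribute in self.facts else []
        rule = next(r for r in self.rules if r.name == name)
        clauses = " and ".join(f"{a} is {v}" for a, v in rule.premises)
        lines = [f"{name}: since {clauses}, {attribute} is {rule.conclusion[1]}"]
        for a, _ in rule.premises:
            lines += self.how(a)
        return lines


rules = [
    Rule("rule-A", (("gram", "gramneg"), ("morphology", "rod"), ("aerobicity", "anaerobic")),
         ("identity", "bacteroides")),
    Rule("rule-B", (("identity", "bacteroides"),), ("category", "anaerobe")),
]
c = Consultation(rules, {"gram": "gramneg", "morphology": "rod", "aerobicity": "anaerobic"})
print(c.find("category"))   # anaerobe
print("\n".join(c.how("category")))
```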

   A more specific goal of our research involves understanding the process of
infectious disease diagnosis and therapy selection.  This process is not as yet
well understood, and we believe that by dissecting it down to individual decision
rules, we can gain insight into how it works.  In addition, the resulting set of
rules may prove to be a useful compendium of knowledge about the task.

   Since we believe this set of rules will also be quite large, we are
studying the problems of accumulating, managing, and using large stores of such
task-specific knowledge.  We are working on a range of techniques to provide
capabilities like ensuring the consistency of the set of rules and making it easy
to modify existing rules or add new ones.
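One simple form of consistency checking can be sketched as a pairwise scan of the rule set.  The sketch applies two checks.  The first flags identical premises that reach incompatible values for a single-valued attribute.  The second flags a rule whose premises are a strict superset of another rule's with the same conclusion.  These checks are our own illustrative choices; the report does not say which checks MYCIN performs.  Rule names, attributes, and the `single_valued` parameter are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    premises: frozenset[tuple[str, str]]  # set of (attribute, value) clauses
    conclusion: tuple[str, str]           # (attribute, value)


def check_rules(rules: list[Rule], single_valued: set[str]) -> list[str]:
    """Flag rule pairs that contradict or duplicate each other (illustrative checks only)."""
    problems = []
    for r1, r2 in combinations(rules, 2):
        (attr1, val1), (attr2, val2) = r1.conclusion, r2.conclusion
        if (r1.premises == r2.premises and attr1 == attr2 and val1 != val2
                and attr1 in single_valued):
            # Same evidence, incompatible conclusions about a one-valued attribute.
            problems.append(f"conflict: {r1.name} vs {r2.name}")
        elif r1.conclusion == r2.conclusion and r1.premises < r2.premises:
            # r2 demands more evidence to reach a conclusion r1 already reaches.
            problems.append(f"{r2.name} is subsumed by {r1.name}")
        elif r1.conclusion == r2.conclusion and r2.premises < r1.premises:
            problems.append(f"{r1.name} is subsumed by {r2.name}")
    return problems


P = frozenset
rules = [
    Rule("r1", P({("gram", "gramneg"), ("morphology", "rod")}), ("category", "group-1")),
    Rule("r2", P({("gram", "gramneg"), ("morphology", "rod")}), ("category", "group-2")),
    Rule("r3", P({("gram", "gramneg"), ("morphology", "rod"), ("site", "blood")}),
         ("category", "group-1")),
]
print(check_rules(rules, single_valued={"category"}))
# ['conflict: r1 vs r2', 'r3 is subsumed by r1']
```

MYCIN-style rules carry certainty factors, so a subsumed rule is not necessarily useless; it may add weight to a conclusion.  Such findings are best treated as items for an expert to review rather than as automatic deletions.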

   Finally, since computer consultants are designed for use by people who
might not otherwise make use of computers, we have devoted a great deal of
attention to the issue of human engineering, and the "habitability" of the
system.  This ranges from such minor items as the automatic correction of
misspelled answers to the sophisticated explanation capabilities available.

Medical relevance and collaboration

   A number of recent studies indicate a major need to improve the quality of
antimicrobial therapy.  Almost one-half of the total amount spent on drugs for
hospitalized patients goes to antibiotics [1,2], and if the results of a number
of recent studies are to be believed, a significant part of this therapy is
associated with serious misuse [2,3,4,5].  Some of the inappropriate therapy
involves incorrect selection of a therapeutic regimen [4], while another serious
problem is the incorrect decision to administer any antibiotic [2,4,5].  One
recent study concluded that one out of every four people in the United States was
given penicillin during a recent year, and that nearly 90% of these prescriptions
were unnecessary [6].  Other studies have shown that physicians will often reach
therapeutic decisions that differ significantly from the decisions that would
have been suggested by experts in infectious disease therapy practicing at the
same institution.

   Nonexperts sometimes choose a drug regimen designed to cover all
possibilities, prescribing either several drugs or one of the so-called "broad
spectrum" antibiotics, even though appropriate use of clinical data might have
led to more rational and less toxic therapy.  Within a hospital environment in
which professional resources are often overburdened, and in environments where
expert sources are not readily available, a computer-based consultant will be
highly useful.  Such a system will also have broad fringe benefits in its
educational impact on staff physicians and in providing a framework for quality
control and peer-review evaluations.

   Antimicrobial therapy appears to be an especially suitable area for the
initial development of a computer-based system to assist physicians with
decisions in clinical therapeutics.  The components of the decision making
process in antimicrobial therapy are more readily definable than in many other
areas of medicine, and the consequences of the physician's decision can usually
be assessed in terms of the direct therapeutic action. Nevertheless, the general
approach used here is applicable to other areas of clinical decision making.  The
basis of rational antimicrobial therapy decisions is identification of the
microorganisms causing the infectious disease.  Accurate identification is
important because of the specificity of antibiotic action: drugs that are highly
effective against certain organisms are often useless against others.  The
patient's clinical status and history (including information such as prior
infections and treatments) provide data that may be valuable to the physician in
identifying the disease-causing organisms.  However, bacteriological cultures
that use specimens taken from the site of the patient's infection usually provide
the most definitive identifying information.

   Initial culture reports from a microbiological laboratory may become
available within 12 hours from the time a clinical specimen is obtained from the
patient.  While the information in these early reports often serves to classify
the organism in general terms, it does not often permit precise identification.
It may be clinically unwise to postpone therapy until such identification can be
made with certainty, a process that usually requires 24 to 48 hours, or longer.
Thus it is commonly necessary for the physician to estimate the range of possible
infecting organisms, and to start appropriate therapy even before the laboratory
is able to identify the offending organism and its antibiotic sensitivities.  In
this setting MYCIN plays two roles: (a) providing consultative advice that will
assist the physician in making the best therapeutic decision that can be made on
the basis of available information, and (b) by its questioning of the physician,
pinpointing the items of clinical data that are necessary to increase the
validity of the clinical decision.

   Our project is an interdisciplinary effort involving computer scientists
from the Stanford Computer Science Department and clinicians from both the
Department of Clinical Pharmacology at Stanford and the Department of Infectious
Disease at the University of Arizona.  The task of the clinicians has been to
specify the decision rules necessary for diagnosis and therapy selection, while
the computer scientists have been devising ways to represent and use this
information in the computer.  The system is then tested by the clinicians using
real cases obtained from journals and medical records.

A complete listing of the staff is given below.

Stanley N. Cohen, MD, Clinical Pharmacology
Bruce G. Buchanan, PhD, Computer Science
Stanton Axline, MD, Infectious Disease (now at University of Arizona)
Randall Davis, PhD, Computer Science
Frank Rhame, MD (to 9/75), Infectious Disease
Edward Shortliffe, MD, PhD (to 6/76, returning 6/77), Infectious Disease
Victor Yu, MD, Infectious Disease
Rudolpho Chavez-Pardo, MD (to 9/75), Clinical Pharmacology
A. Carlisle Scott, MS, Computer Science
Sharon Wraith, BS, Clinical Pharmacology
Jan Aikins, BS, Computer Science
Robert Blum, MD, presently in Computer Science
William Clancey, AB, Computer Science
Larry Fagan, AB, Computer Science
William van Melle, AB, Computer Science

Progress Report

Period covered:  June 1, 1974 through September 30, 1976

Summary

   Over the past three years we have designed, built and partially evaluated a
computer program capable of diagnosis and therapy selection for certain varieties
of infectious diseases.  The program is intended to function as a consultant, and
"interviews" a doctor about his patient, requesting information on clinical
findings and results of laboratory tests.  It relies on a store of judgmental
knowledge (obtained from experts in infectious disease) to determine the
conclusions which can be drawn from the answers it receives.  This judgmental
knowledge is in the form of some 400 decision rules dealing with the wide range
of topics that must be considered in determining the likely identity of causative
organisms and selecting appropriate antimicrobials.

   MYCIN is composed of the three systems described earlier (the consultation,
explanation, and knowledge acquisition systems), all of which reference the
knowledge base of decision rules.  The program is currently capable of dealing
with bacteremia and meningitis infections.  It can diagnose the likely presence
of more than 35 different organisms and can recommend therapy for 100 organisms,
selecting drugs from a "pharmacopoeia" of 30 antimicrobials.  The system can
tailor its therapy recommendations to a specific organism and infection, can
adjust dosage levels and durations in response to impaired renal status, and can
combine drugs to create combination therapies, giving it a wide range of clinical
applicability.

Detailed Report

   Our work in the past several years has been organized around five main
areas of investigation.  We have

a) increased the system's competence in existing areas of clinical expertise
  while expanding its scope

b) developed a number of user-oriented features to increase the program's
  attractiveness to clinicians

c) developed a range of knowledge acquisition capabilities to speed the
  process of expanding the system's clinical competence

d) solved a number of technical problems to ensure that the program does not
  outgrow the computer resources available to it

e) evaluated the system's level of expertise.

Clinical Capabilities

   Since the primary qualification for any clinical consultant is competence
in the domain, we have devoted significant effort to expanding MYCIN's knowledge
base and widening its scope of competence.

   For instance, although the system was directed initially at patients with
positive blood cultures, the basic methodology has been generalized to support a
much broader approach to the problem.  MYCIN has now gained the ability to deal
with infections from which the causative pathogen hasn't been isolated (e.g.,
pneumonia), or which haven't even been cultured (e.g., brain abscess).  With this
broadening of scope, it has also become necessary to be able to evaluate the
meaningfulness of isolates from cultures taken from sites other than blood.  For
urine and sputum isolates, for example, the system gained the ability to base its
evaluation of sterility of an isolate on both the method of collection and the
user's estimate of how conscientiously the specimen was collected.

   An extensive review of the program's approach to drug selection has led to
a major revision in the basis for therapy selection during the course of program
development.  The program was given the ability to consider both the infectious
disease diagnosis and the significance of the organism as further determinants of
therapy, in addition to organism identity.  These three together have become the
primary factors in drug selection, with drug toxicity and ecological factors as
secondary considerations.  The result is a more appropriate, more sharply
focused drug selection that also includes dose, route, and duration.
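One way to read "primary" versus "secondary" considerations is a lexicographic ordering.  Under that reading, toxicity and ecological factors only break ties among drugs that are equally good on the primary factors.  The sketch assumes each candidate has already been given numeric scores.  The drug names, the score scales, and the lexicographic rule itself are assumptions; a weighted score would be an equally plausible reading of the text.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    drug: str
    primary: float     # hypothetical fit to organism identity, diagnosis, and significance
    toxicity: float    # hypothetical; lower is better
    ecological: float  # hypothetical; lower is better (e.g., breadth of spectrum)


def rank(candidates: list[Candidate]) -> list[str]:
    """Order drugs by the primary factors; secondary factors only break ties."""
    ordered = sorted(candidates, key=lambda c: (-c.primary, c.toxicity, c.ecological))
    return [c.drug for c in ordered]


# Entirely hypothetical drugs and scores.
candidates = [
    Candidate("drug-A", 0.9, 0.4, 0.2),
    Candidate("drug-B", 0.9, 0.1, 0.5),
    Candidate("drug-C", 0.7, 0.0, 0.0),
]
print(rank(candidates))  # ['drug-B', 'drug-A', 'drug-C']
```

`drug-C` is the least toxic, but it still ranks last because a lexicographic key never lets a secondary factor outweigh a primary one.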

   While the initial development of the knowledge base focused on rules
concerned with the diagnosis and therapy for blood infections (bacteremia), the
complexity of infectious disease therapy and the frequent occurrence of multiple
infections in a single patient require broader knowledge if the system is to
be clinically useful.

   In response we have extended MYCIN's knowledge base, while at the same time
improving the degree of sophistication with which the system deals with
bacteremia.  The second major area has been the diagnosis and treatment of
meningitis, and more than 100 rules were added to provide the ability to deal
with it.  In the process the program was also extended beyond bacteria, as it
gained the ability to consider and treat both fungi and viruses.

   This area has proved to be an especially useful domain because it has
presented several new challenges.  In particular, meningitis requires the ability
to deal with a disease that is often diagnosed on clinical grounds alone, before
any specific microbiological evidence is available (by comparison, the diagnosis
of bacteremia on clinical grounds alone is far less certain, and usually requires
establishment of the fact that bacterial growth has occurred in blood cultures).
For this reason, extension of the project into the meningitis area has made it
necessary for MYCIN to consider a larger range of clinical factors, and has
resulted in a system which has a broader picture of the whole patient.

   Other contributions to the system's competence have come from expansion of
the knowledge base to include information about normal bacteriological flora for
a wide range of culture sites.  This enables the program to distinguish between
normal and pathological flora, and it can as a result decide more precisely on
whether to treat.
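The use of normal-flora knowledge can be illustrated with a table lookup.  The sketch assumes a mapping from culture site to the organisms regarded as normal there.  Its handful of entries is illustrative only (blood and cerebrospinal fluid are normally sterile), not a clinical reference.  The function name and labels are hypothetical.  A real decision would also weigh factors such as the number of positive cultures and the method of collection.

```python
# Illustrative entries only; not a clinical reference table.
NORMAL_FLORA: dict[str, set[str]] = {
    "blood": set(),                          # normally sterile
    "csf": set(),                            # cerebrospinal fluid, normally sterile
    "throat": {"viridans-streptococci"},
    "skin": {"staphylococcus-epidermidis"},
}


def classify_isolate(site: str, organism: str) -> str:
    """Label an isolate 'normal flora', 'potential pathogen', or 'unknown site'."""
    flora = NORMAL_FLORA.get(site)
    if flora is None:
        return "unknown site"
    return "normal flora" if organism in flora else "potential pathogen"


print(classify_isolate("throat", "viridans-streptococci"))  # normal flora
print(classify_isolate("blood", "viridans-streptococci"))   # potential pathogen
```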

User Oriented Features

   Clinicians traditionally shun computer programs, and we believe this is in
large measure due to insufficient attention paid to user-oriented features.  As a
result, we have devoted significant effort to ensuring that MYCIN is responsive
to its users in a number of unique ways.  The development of the explanation and
question answering capabilities has been essential for this work, and both
have grown extensively in power.

   The system's ability to explain the motivations for its questions, for
instance, underwent a major design revision.  It is now based on a more powerful
approach that relies on the program's knowledge of its own control structure and
its ability to examine its own rules.  The user can now fully explore the system's
current line of reasoning, rather than just a single level, as initially
implemented.
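Exploring the current line of reasoning beyond a single level can be pictured as walking up a stack of pending goals.  In the sketch below, each frame records a goal and the rule being tried for it.  Each successive WHY moves one frame further out, until the top-level goal of the consultation is reached.  `ReasoningStack`, the goal wording, and the rule names are hypothetical; the report does not describe MYCIN's internal data structures.

```python
from dataclasses import dataclass, field


@dataclass
class ReasoningStack:
    """Hypothetical record of pending goals, outermost first."""
    frames: list[tuple[str, str]] = field(default_factory=list)  # (goal, rule being tried)

    def push(self, goal: str, rule: str) -> None:
        self.frames.append((goal, rule))

    def pop(self) -> None:
        self.frames.pop()

    def why(self, level: int) -> str:
        """Answer the level-th successive WHY (0 = the question just asked)."""
        if level >= len(self.frames):
            return "That is the top-level goal of the consultation."
        goal, rule = self.frames[-1 - level]
        return f"[{level + 1}] The answer helps determine {goal}, using {rule}."


stack = ReasoningStack()
stack.push("the therapy to recommend", "rule-T")    # outermost goal
stack.push("the identity of organism-1", "rule-A")  # subgoal behind the current question
for level in range(3):
    print(stack.why(level))
```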

   The language understanding capabilities of the question answering system
have also been extensively revised.  They now allow a broader range of questions
to be asked and offer more precise answers.  The use of this feature was also
simplified so that the user no longer needs to classify his questions.

   A comprehensive review of the kinds of questions asked by users of the
system has led to a number of important features.  MYCIN can now answer a much
wider range of questions, and can, in particular, explain why it did not take a
specific action, as well as why positive conclusions were reached.  It is our
feeling that capabilities such as these are of great importance in enabling the
project's staff and clinical experts to understand the program's rationale for
its actions in instances where its recommendations do not appear to be the most
appropriate and most correct.  Thus, the line of reasoning of the program can be
evaluated, and requirements for new or modified rules can be uncovered.  These
kinds of capabilities are also important in optimizing user acceptance of the
system.

   A substantial addition to the question-answering facility enables the
system to explain the process of therapy selection.  In comparison to the
diagnostic process, therapy selection is complicated somewhat by the need to
consider a range of different factors simultaneously, such as the total number of
drugs recommended, the degree of sickness of the patient, possible interactions
between drugs, toxicity and other side effects, etc.  Despite this complexity,
explanations of therapy selection are phrased at a conceptual level that makes
them comprehensible to the physician.  As before, this makes it possible for the
physician to verify the validity of the system's decisions, and makes it clear to
him that the system reaches its results in much the same way that he does.

   The explanation consists of a step-by-step review of the reasoning which
led to recommending a particular drug for a specific organism.  It considers such
issues as why a drug was first considered for an organism, why a drug may have
been chosen as the best therapy for that organism, how the total number of drugs
was reduced by considering common drug classes among the candidates, and whether
contraindications apply based on the patient's allergies, age, and other
factors.  By characterizing each drug according to this scheme, the program can
explain why a drug was or wasn't prescribed, as well as why one drug is to be
preferred over another.  This offers an important explanatory capability that
will make the system more attractive and acceptable to clinicians.
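The drug-count reduction step can be sketched as a greedy set cover.  First, contraindicated drugs are removed from each organism's ranked list.  Then the drug that treats the most still-uncovered organisms is chosen first, and each step is logged so it can be explained.  This is a swapped-in technique, not a description of MYCIN's procedure.  It also simplifies "common drug classes" to individual drugs, and it assumes the contraindicated set has already been derived from the patient's allergies, age, and so on.  All drug and organism names are hypothetical.

```python
def select_therapy(candidates: dict[str, list[str]],
                   contraindicated: set[str]) -> tuple[list[str], list[str]]:
    """Pick a small set of drugs covering every organism; return (drugs, explanation).

    `candidates` maps each organism to its acceptable drugs, best first.
    """
    notes = []
    usable = {}
    for organism, drugs in candidates.items():
        for drug in drugs:
            if drug in contraindicated:
                notes.append(f"{drug} not considered for {organism}: contraindicated")
        usable[organism] = [d for d in drugs if d not in contraindicated]

    uncovered = set(usable)
    chosen = []
    while uncovered:
        # Which still-uncovered organisms would each drug treat?  Organisms are visited in
        # sorted order so that ties go deterministically to the drug encountered first.
        coverage: dict[str, set[str]] = {}
        for organism in sorted(uncovered):
            for drug in usable[organism]:
                coverage.setdefault(drug, set()).add(organism)
        if not coverage:
            notes.append(f"no usable drug for: {', '.join(sorted(uncovered))}")
            break
        best = max(coverage, key=lambda d: len(coverage[d]))
        chosen.append(best)
        notes.append(f"{best} chosen to cover {', '.join(sorted(coverage[best]))}")
        uncovered -= coverage[best]
    return chosen, notes


candidates = {
    "organism-1": ["drug-A", "drug-B"],
    "organism-2": ["drug-C", "drug-B"],
}
drugs, why = select_therapy(candidates, contraindicated=set())
print(drugs)  # ['drug-B'] (one drug covers both organisms)
drugs, why = select_therapy(candidates, contraindicated={"drug-B"})
print(drugs)  # ['drug-A', 'drug-C']
print("\n".join(why))
```

Greedy selection does not guarantee the smallest possible drug set, but each choice is easy to state in an explanation.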

   Several capabilities have been added to make the program easy to use. The
system is now more tolerant of erroneous or inappropriate responses, and is able
to provide a reworded question, along with a list of acceptable answers. In
addition, it has the ability to recognize responses which are not sufficiently
precise, and can rephrase its questions accordingly.
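The tolerant handling of answers, together with the automatic correction of misspelled answers mentioned earlier, might look roughly like the following sketch.  It uses Python's standard `difflib.get_close_matches` as a stand-in spelling corrector; the report does not say how MYCIN matched misspellings.  The function name, the 0.75 similarity cut-off, and the example question are assumptions.

```python
import difflib
from typing import Optional


def interpret(answer: str, acceptable: list[str],
              reworded: str) -> tuple[Optional[str], str]:
    """Return (accepted answer or None, message for the user).

    `acceptable` is assumed to be lower-case.
    """
    text = answer.strip().lower()
    if text in acceptable:
        return text, ""
    # Likely misspelling: take the closest acceptable answer if it is similar enough.
    close = difflib.get_close_matches(text, acceptable, n=1, cutoff=0.75)
    if close:
        return close[0], f"(taking {text!r} to mean {close[0]!r})"
    # Otherwise re-ask in different words and list what would be accepted.
    return None, f"{reworded} Acceptable answers are: {', '.join(acceptable)}."


choices = ["aerobic", "anaerobic", "facultative"]
question = "Is the organism aerobic, anaerobic, or facultative?"
print(interpret("Anaerobc", choices, question))  # corrected to 'anaerobic'
print(interpret("green", choices, question))     # re-asked with the list of answers
```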

   We have recently added to the system the ability to modify drug dosage in
cases of renal failure.  Whereas previously the system only issued a warning to
modify doses, it is now able to use either creatinine clearance or serum
creatinine levels to compute the level of renal function.  The program then uses
drug-specific information (e.g., half-life, percent loss of the drug via renal
excretion, etc.) to adjust the regimen.  It can either (a) adjust dose levels

J. Lederberg

90

Privileged Communication


MYCIN PROJECT

Section 6.1.4

leave levels unchanged, or (c) allow the physician to select a dose interval, for
which it chooses an appropriate dose level.
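The report does not give MYCIN's formulas, so the following is only a plausible sketch.  It substitutes two standard pharmacokinetic approximations.  The first is the Cockcroft-Gault estimate of creatinine clearance from serum creatinine, age, weight, and sex.  The second is a Dettli-style adjustment factor Q = 1 - fe(1 - CrCl/CrCl_normal), where fe is the fraction of the drug excreted unchanged by the kidney.  Q then drives the three options (a), (b), and (c) for adjusting the regimen.  The "normal" clearance of 120 mL/min, the drug parameters, and all names are assumptions, and the code is not dosing guidance.

```python
from dataclasses import dataclass
from typing import Optional

NORMAL_CRCL = 120.0  # mL/min; assumed "normal" creatinine clearance


def cockcroft_gault(age: float, weight_kg: float, serum_creatinine: float,
                    female: bool) -> float:
    """Estimate creatinine clearance (mL/min) from serum creatinine (mg/dL)."""
    crcl = (140 - age) * weight_kg / (72 * serum_creatinine)
    return crcl * 0.85 if female else crcl


@dataclass
class Drug:
    name: str
    dose_mg: float         # usual dose
    interval_h: float      # usual dosing interval
    half_life_h: float     # elimination half-life with normal renal function
    fraction_renal: float  # fraction excreted unchanged by the kidney (fe)


def adjustment_factor(drug: Drug, crcl: float) -> float:
    """Q: the patient's elimination rate as a fraction of normal."""
    ratio = min(crcl / NORMAL_CRCL, 1.0)
    return 1 - drug.fraction_renal * (1 - ratio)


def adjusted_half_life(drug: Drug, crcl: float) -> float:
    """Half-life lengthens in proportion to 1/Q."""
    return drug.half_life_h / adjustment_factor(drug, crcl)


def adjust_regimen(drug: Drug, crcl: float, mode: str,
                   chosen_interval_h: Optional[float] = None) -> tuple[float, float]:
    """Return (dose_mg, interval_h) under option 'a', 'b' or 'c' from the text."""
    q = adjustment_factor(drug, crcl)
    if mode == "a":  # lower the dose, keep the usual interval
        return drug.dose_mg * q, drug.interval_h
    if mode == "b":  # keep the dose, lengthen the interval
        return drug.dose_mg, drug.interval_h / q
    if mode == "c":  # physician picks the interval; keep Q times the usual mg per hour
        if chosen_interval_h is None:
            raise ValueError("option 'c' needs chosen_interval_h")
        return drug.dose_mg * q * chosen_interval_h / drug.interval_h, chosen_interval_h
    raise ValueError(f"unknown option {mode!r}")


# Hypothetical drug and patient, for illustration only.
drug_x = Drug("drug-X", dose_mg=80, interval_h=8, half_life_h=2, fraction_renal=0.9)
crcl = cockcroft_gault(age=70, weight_kg=72, serum_creatinine=2.0, female=False)  # 35 mL/min
for option in ("a", "b"):
    dose, interval = adjust_regimen(drug_x, crcl, option)
    print(option, f"{dose:.1f} mg every {interval:.1f} h")
dose, interval = adjust_regimen(drug_x, crcl, "c", chosen_interval_h=12)
print("c", f"{dose:.1f} mg every {interval:.1f} h")
print(f"half-life {adjusted_half_life(drug_x, crcl):.1f} h")
```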

   Since the problem of determining renal status and the proper adjustment of
drug dose is important in the use of aminoglycoside antibiotics, cephalosporins,
and other antimicrobial agents,  the customization of drug dosage recommendations
will be an important addition to the power of the system.

   We have found, in addition, that there is a substantial amount of
information that is routinely collected in every consultation, like the date and
site of each of the cultures, gram stain and morphology results for each of the
organisms that grew out, etc.  Currently, the program exhaustively analyzes each
culture and all of its organisms in turn.  Some users of the program appear to be
impatient with this method, and would much prefer to enter all the relevant data
on all the cultures and organisms at once.  This is faster and easier, since the
information can be gathered in a single review of the chart, instead of having to
review it several times as each culture is processed.  In response to this, we
have reorganized the consultation slightly, so that it is possible to enter all
of this data at once, at the beginning.  This offers two other advantages in
addition to improving the program's acceptability to its users.  First, it
provides a basis for our future efforts to write rules which deal with
interactions between infections (see below, "Specific Aims"), and second, it
suggests a mechanism for eventually merging our work with the product of existing
efforts to organize and automate the recording and handling of medical record
data.  This latter development may in time make it possible for MYCIN to obtain a
large part of the information it requires directly from such automated records,
sharply reducing the number of questions it has to ask, and speeding up the
consultation considerably.

   Finally, several new capabilities make the system convenient to use, in
anticipation of its evaluation in the clinical setting.  Among these is the
user's option to type a comment about system performance at any time during
the consultation.  The comment is recorded in a special file which is reviewed
periodically by our medical staff, and provides an ongoing opportunity for users
to offer feedback aimed at improving the usefulness of the system.  The user can
also indicate his belief that the system has "broken down" in some way, and he is
invited to describe the problem.  His description is saved along with information
about the current state of the program, so that our systems programmers can deal
with the problem later.

Knowledge Acquisition

   A preliminary knowledge acquisition program was completed in the middle of
1974, and demonstrated the feasibility of having a physician teach the system new
rules using a rather stylized subset of English.  Building on the experience
gained here, work began on a revised program designed to allow the user to
examine and modify the program's knowledge and behavior as a single, unified
action.  This program was designed to make the explanation and knowledge
acquisition capabilities available together, exploiting the fact that the
nature of the explanations a user requests can give a clear hint about the
content of a new rule.  The program was also designed to advise the user about the effect of
his rule on the original deficiency, indicating, for instance, whether or not it
corrects the problem he noticed.
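The "does the new rule correct the problem?" check can be illustrated with a toy rule interpreter. This is a minimal sketch under stated assumptions, not MYCIN's mechanism: MYCIN chains backward from goals and attaches certainty factors to its conclusions, whereas this sketch chains forward over plain true/false facts. The `Rule` class, the `check_new_rule` helper, and all fact names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical toy rule: if every premise fact holds, conclude a new fact."""
    name: str
    premises: FrozenSet[str]
    conclusion: str


def conclusions(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Forward-chain until no rule adds a new fact (a toy, not MYCIN's control structure)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


def check_new_rule(facts: Set[str], rules: List[Rule], new_rule: Rule,
                   expected: str) -> Tuple[bool, bool]:
    """Report whether the expected conclusion is reached before and after adding new_rule."""
    before = expected in conclusions(facts, rules)
    after = expected in conclusions(facts, rules + [new_rule])
    return before, after


# Usage with hypothetical facts for a case the expert found wanting.
case_facts = {"gram=negative", "morphology=rod", "site=blood"}
existing_rules = [Rule("R1", frozenset({"site=blood"}), "culture=significant")]
proposed = Rule("R2", frozenset({"gram=negative", "morphology=rod"}),
                "suspect=enterobacteriaceae")
print(check_new_rule(case_facts, existing_rules, proposed,
                     "suspect=enterobacteriaceae"))  # prints (False, True)
```

Reporting the "before" and "after" results side by side mirrors the advice described in the text: the expert sees both that the deficiency existed and whether his rule removes it.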

Privileged Communication

91

J. Lederberg


Section 6.1.4

MYCIN PROJECT

   Work on a preliminary version of this new program was completed in 1976,
making available a broad range of useful features enabling our clinical experts
to add rules to the system without requiring that they have a knowledge of
programming.  If the expert finds that MYCIN's handling of a particular problem
is at variance with his own expert knowledge, he can use the explanation
capabilities to discuss the line of reasoning in use at that time, can add or
modify rules in the knowledge base, and can determine the effects of the changes
on MYCIN's subsequent performance.  (Quality control is maintained on the overall
system by regular meetings of our clinical and pharmacological experts who
determine the "official" MYCIN knowledge base.)

Technical Issues

   As MYCIN's clinical capabilities have expanded, efficiency has improved as
a result of a number of modifications to the system's technical capabilities.
Early in our work, for instance, a comprehensive review and modification of the
control structure was undertaken to improve efficiency and generality.  The
resulting program was both more direct and faster.

   More recently, modifications have been made so that the large English
dictionary can be kept on the disk and accessed only as needed, rather than
kept in core, which slows the system's response.  The self-documenting
features of the program have also been made faster, and the system's
interaction with the terminal has been made more uniform, to prepare for the
time when different users of the system may have different kinds of terminals.
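The idea of keeping a large dictionary on disk and fetching entries only when needed can be sketched in modern terms. This is a hypothetical illustration, not MYCIN's Interlisp implementation: it assumes a tab-separated file with one `word<TAB>entry` line per word, keeps only a word-to-byte-offset index in memory, and caches recently used entries with `functools.lru_cache`. The class name, helper, and file contents are invented for the example.

```python
import functools
import tempfile
from typing import Dict, Optional


class DiskDictionary:
    """Hypothetical on-disk dictionary: only a word -> byte-offset index stays in memory."""

    def __init__(self, path: str, cache_size: int = 128) -> None:
        self.path = path
        self.disk_reads = 0
        self.offsets: Dict[str, int] = {}
        offset = 0
        with open(path, "rb") as f:
            for line in f:
                word = line.split(b"\t", 1)[0].decode("utf-8")
                self.offsets[word] = offset
                offset += len(line)
        # Wrap the disk read in a small LRU cache so frequently used words stay in core.
        self.lookup = functools.lru_cache(maxsize=cache_size)(self._read_entry)

    def _read_entry(self, word: str) -> Optional[str]:
        offset = self.offsets.get(word)
        if offset is None:
            return None  # unknown word: no disk access needed
        self.disk_reads += 1
        with open(self.path, "rb") as f:
            f.seek(offset)
            line = f.readline().decode("utf-8").rstrip("\r\n")
        return line.split("\t", 1)[1]


def build_demo_dictionary() -> str:
    """Write a two-entry demo file (hypothetical contents) and return its path."""
    with tempfile.NamedTemporaryFile("wb", suffix=".tsv", delete=False) as tmp:
        tmp.write(b"culture\tNOUN: a laboratory specimen\n")
        tmp.write(b"organism\tNOUN: the infecting agent\n")
        return tmp.name


words = DiskDictionary(build_demo_dictionary())
print(words.lookup("organism"))  # first lookup reads the disk
print(words.lookup("organism"))  # second lookup is served from the cache
print(words.disk_reads)
```

The trade-off is the one the text describes: memory holds only a small index plus a bounded cache, at the cost of a disk read the first time each word is needed.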

Evaluation Activities

   Since clinicians are likely to require documentation of MYCIN's competence
and utility before seeking its advice, considerable time has been spent on
evaluating the system and on implementing a range of program features to support
these efforts.

   In the past two years we have obtained many useful suggestions from
clinicians when the system was presented at several different conferences.  In
February 1975 it was presented to the Western Society for Clinical Research, in
September 1975 to the International Symposium on Clinical Pharmacy and Clinical
Pharmacology, and more recently (June 1976), it was presented to the Drug
Information Association.

   A large-scale formal study and evaluation of MYCIN's performance was begun
in January 1976.  The same set of clinical data was provided to both MYCIN and a
set of experts in infectious disease therapy.  [Five of the experts were
nationally recognized authorities in the field; the other five were clinical
fellows in the Infectious Disease Division at Stanford.  A complete list of
names, titles and affiliations is found in Appendix B.] The judgments of the
program and the experts were compared, and the experts were asked to evaluate
MYCIN's performance.

    To do this, we first designed a form to allow us to separate the variables
requiring analysis.  The parameters evaluated include:

A.  the "quality" of the interaction (were any questions irrelevant
     or missing?)
B.  the program's ability to determine organism identity
C.  the program's ability to determine organism significance
D.  the program's ability to select proper therapy
E.  overall performance evaluation
F.  potential impact as a clinical tool or teaching facility

The evaluation form was designed to be informative yet simple to complete. It
was tested in a pre-evaluation trial run, then used for the formal study.

    Consecutive patients with positive blood samples were evaluated for
inclusion in the study by project personnel until we obtained at least 10
patients for whom MYCIN recommended therapy and 15 patients overall.  (Patients
were rejected if they were outpatients when the sample was drawn, if they had had
a previous blood culture in the preceding seven days, or if they had a diagnosis
of meningitis or infectious endocarditis.)  For each of the patients accepted, a
one- to two-page clinical summary was prepared and combined with a summary of the
laboratory test data as of the time when the first blood culture was obtained.
This information was then used to obtain a therapeutic evaluation from MYCIN.
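The selection procedure amounts to a filter plus a stopping rule over a stream of consecutive patients. The sketch below assumes a hypothetical `Candidate` record; it reads "in the preceding seven days" as seven days or fewer, and assumes whether MYCIN recommends therapy is known at enrollment time. Both readings are assumptions, not details stated in the report.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

EXCLUDED_DIAGNOSES = {"meningitis", "infectious endocarditis"}


@dataclass
class Candidate:
    """Hypothetical record for a patient with a positive blood sample."""
    outpatient_when_drawn: bool
    days_since_previous_blood_culture: Optional[int]  # None if no earlier culture
    diagnosis: str
    mycin_recommends_therapy: bool


def eligible(p: Candidate) -> bool:
    """Apply the three exclusion criteria stated in the text."""
    if p.outpatient_when_drawn:
        return False
    days = p.days_since_previous_blood_culture
    if days is not None and days <= 7:  # assumed reading of "preceding seven days"
        return False
    return p.diagnosis not in EXCLUDED_DIAGNOSES


def enroll(consecutive: Iterable[Candidate], min_treated: int = 10,
           min_total: int = 15) -> List[Candidate]:
    """Take eligible consecutive patients until both stopping targets are met."""
    enrolled: List[Candidate] = []
    for p in consecutive:
        if not eligible(p):
            continue
        enrolled.append(p)
        treated = sum(q.mycin_recommends_therapy for q in enrolled)
        if treated >= min_treated and len(enrolled) >= min_total:
            break
    return enrolled
```

Because both targets must be met, the study can run past 15 patients if fewer than 10 of them needed therapy.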

   Each of the participating experts received a set of fifteen evaluation
forms (one for each patient).  Each form contained: (a) the clinical summary and
lab data; (b) space for the expert to record his conclusions about the nature of
the infection, likely causative organisms, and appropriate therapy; and (c) a
transcript of the MYCIN consultation along with space for the expert to record
his opinion of various aspects of MYCIN's performance.  By presenting the
information in this order, we obtained a therapeutic regimen from the expert
based on the same information supplied to MYCIN.  This allowed us to compare the
expert's answers to MYCIN's, and also gave us the expert's opinion of the
system's performance.

    In the past few months a sufficient number of the forms have been returned
that we were able to do a preliminary analysis.  The figures below are based on
the forms from the nine experts (out of ten) who have returned them.

   Since it is difficult to select a single number which summarizes
performance, we have in general measured each of the parameters listed above in
three ways: (i) the percent of instances in which the program was judged exactly
correct, (ii) the percent of instances in which the program's performance was
judged exactly correct or an acceptable alternative, and (iii) the percent of
cases in which a majority of the experts judged its performance exactly correct
or an acceptable alternative.  By using all three measures, we obtain a range of
figures which give a good picture of the program's performance.
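The three measures can be computed from a table of ratings as follows. This is a sketch under an assumed data layout: each row holds every expert's rating of the program on one case, encoded as the hypothetical strings "exact", "acceptable", or "unacceptable". Measures (i) and (ii) pool every individual rating, while (iii) counts cases, which is why the third column of Table I has a much smaller N. For organism identity the pooled unit in the report is the organism choice rather than the case; the sketch treats each row as a case for simplicity.

```python
from typing import Sequence, Tuple

ACCEPTED = {"exact", "acceptable"}


def three_measures(ratings: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Return measures (i), (ii), (iii) as fractions.

    ratings[c][e] is expert e's rating of the program on case c, one of
    "exact", "acceptable", or "unacceptable" (a hypothetical encoding).
    """
    pooled = [r for case in ratings for r in case]
    n = len(pooled)
    exact = sum(r == "exact" for r in pooled) / n           # measure (i)
    exact_or_ok = sum(r in ACCEPTED for r in pooled) / n    # measure (ii)
    majority_ok = sum(                                       # measure (iii)
        sum(r in ACCEPTED for r in case) > len(case) / 2 for case in ratings
    ) / len(ratings)
    return exact, exact_or_ok, majority_ok
```

For two cases rated `["exact", "exact", "acceptable"]` and `["unacceptable", "unacceptable", "exact"]`, the measures are 0.5, about 0.67, and 0.5. The majority measure is strict (more than half), so a tie among an even number of raters does not count.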

   All of these attempts to evaluate performance are complicated by the fact
that (as expected) the experts' own choices about each patient were not
unanimous.  Thus, we cannot ask whether MYCIN's answers were "correct" in any
absolute sense, since there was no agreement on what constitutes "correct".
Instead, we ask how often each individual expert rated the program's responses as
correct.  But given the variation among the experts themselves, the program can
never be expected to reach 100%, and depending on the extent of the intra-group
variation, the absolute limit may in fact be much lower.  Thus the ideal question
to ask is "Do experts rate MYCIN's performance correct at least as often as they
rate each other's performance correct?"  This would give a good indication of how
close the system's performance was to that of the group of experts as a whole.

   We have been able to do this in a few isolated cases, but in general it
requires more information than we were able to collect.  The problem, discussed
in more detail below, is that we asked each expert for his own choices for each
patient and asked him to rate MYCIN's choices, but without a second round of
questionnaires, which would ask each expert to rate the acceptability of the
other 9 experts' responses, we lack direct information about intra-expert
variability.  The figures below should be reviewed with this caveat in mind.

A. "Quality" of the interaction

   To measure the first item, the experts were instructed to mark any
questions in the consultation which they felt were irrelevant, and to note any
questions which they felt were omitted by the system.  Overall MYCIN did quite
well, as there were no consultations in which a majority of the experts felt that
any particular question was irrelevant or omitted.  On the average, there were
0.53 questions judged irrelevant and 0.55 indicated as omitted.

Table I summarizes the next four measurements.

                      | MYCIN 1st choice    | MYCIN 1st choice       | MYCIN 1st choice      |
                      | identical to an     | identical to or an     | identical to or an    |
                      | expert's 1st choice | acceptable alternative | acceptable alternative|
                      |                     | to an expert's 1st     | judged by a majority  |
                      |                     | choice                 | of experts            |
----------------------+---------------------+------------------------+-----------------------+
ORGANISM IDENTITY     |  56.3%   (N = 414)  |  75.6%   (N = 414)     |  81.8%   (N = 11)     |
----------------------+---------------------+------------------------+-----------------------+
ORGANISM SIGNIFICANCE |  91.7%   (N = 36)   |  NA                    |  100%    (N = 4)      |
----------------------+---------------------+------------------------+-----------------------+
THERAPY SELECTION     |  12%     (N = 99)   |  75%     (N = 99)      |  91%     (N = 11)     |
----------------------+---------------------+------------------------+-----------------------+
OVERALL PERFORMANCE   |  17.0%   (N = 135)  |  59.3%   (N = 135)     |  60.0%   (N = 15)     |
----------------------+---------------------+------------------------+-----------------------+

                                   Table I

Summary of nine experts' responses to MYCIN's performance on 15 cases

B. Organism Identity

   For organism identity, the experts were asked to rate each of MYCIN's
selections as exactly correct (they agreed that the organism was likely to be
present), an acceptable alternative (they had not chosen that organism, but
agreed it might be present), or an unacceptable choice (they disagreed with its
selection).  Since 11 of the cases were not contaminants, the system chose a
total of 46 organisms in those cases, and 9 experts rated each of those choices,
we have an N of 414 (46 x 9) for the first two columns and 11 (one per case) for
the third.
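The sample sizes in Table I follow from the number of rated items times the nine experts, and the majority-column percentages follow from simple case counts. The short check below reproduces that arithmetic. The counts passed to `percent` are back-calculated from the table (33 of the 36 significance ratings, 9 and 10 of the 11 non-contaminant cases, 9 of the 15 cases), and the dictionary layout is only an illustration.

```python
EXPERTS = 9  # experts whose forms had been returned

# Rated items per row of Table I; each rating by one expert counts once.
sample_sizes = {
    "organism identity": 46 * EXPERTS,     # 46 organisms in 11 non-contaminant cases
    "organism significance": 4 * EXPERTS,  # 15 cases - 11 non-contaminants = 4 cases
    "therapy selection": 11 * EXPERTS,     # one regimen per non-contaminant case
    "overall performance": 15 * EXPERTS,   # one overall rating per case
}


def percent(k: int, n: int) -> float:
    """Percentage rounded to one decimal place, as printed in Table I."""
    return round(100 * k / n, 1)


print(sample_sizes)
print(percent(33, 36), percent(9, 11), percent(10, 11), percent(9, 15))
```

The therapy figure of 10 out of 11 cases works out to 90.9%, which Table I reports as 91%.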

   In 56% of the instances the system's choices were identical to the
experts', in 75% they were either identical or acceptable alternatives, and in
82% of the cases its results were acceptable to a majority of the experts.

   In addition, the experts were asked to indicate which organisms they felt
MYCIN had overlooked in its diagnosis.  For the 11 non-contaminant cases, the
experts indicated an average of only 0.35 organism identities that were
overlooked by the system.  In no case did a majority of experts feel that any
particular organism had been overlooked, suggesting that even the 0.35 figure is
a result of intra-expert variation.

C. Organism Significance

   The first question on the evaluation form gave the expert a chance to
indicate that he felt the patient did not need to be treated.  The first column
of the second row gives the percentage of instances in which the expert indicated
that no treatment was necessary for a case in which MYCIN also judged the organism
to be a contaminant.  (There is no figure in the second column, since we did not
ask about a "close call" on whether or not to treat.  In addition, the measurement
is based only on the contaminant cases, since in many of the cases where both
MYCIN and the expert determined that treatment was necessary, they based that
decision on different organisms.  We felt that it would be misleading to call
these situations "agreements".)

    As the figures show, in only three out of 36 instances was there any
disagreement with the system's decision on whether or not to treat.

D. Therapy Selection

   The expert was asked to select therapy for the organisms which he felt were
likely to be present before looking at MYCIN's therapy recommendation.  He was
then asked to judge MYCIN's choice of therapy for that patient.  Since MYCIN was
selecting therapy for the organisms which it felt were present (which may have
differed from those chosen by the expert), this provides a fundamental comparison
of performance: it compares the therapy selections of the two when they are
faced with the same clinical situation.

    This comparison is complicated by the difficulty noted above: the
variability in the experts' own performance and the need to judge MYCIN with
respect to that variability.  Looking only at exact agreements
(i.e., two identical therapies) produces the figure in the first column, which
indicates that 12% of the time MYCIN's recommendation was identical to that of an
expert.  Comparing each expert's therapy choice with the other 8 indicates that
35% of the time (N= 396) any pair of experts chose identical regimens. The
experts were also asked to judge whether MYCIN's therapy was an acceptable
alternative (if it was not identical to their own), producing the figure in the
second column.  This indicates that it was either identical, or they felt it was
an acceptable alternative 75% of the time.  (Unfortunately, we have no reliable
way of judging the intra-expert variability here, without a second round of
questionnaires which asked each expert to rate the acceptability of the other
experts' choices.) [As an alternative, we have attempted to develop a measure of
how "far apart" two non-identical regimens are.  But the problem is difficult:
for example, for gram negative rods with salmonella most likely, is gentamycin
and chloramphenicol "very different" from gentamycin and ampicillin?   We have
been working on a "drug metric" to solve this problem, attempting to base the
"difference" between two drugs on factors like organism susceptibility, toxicity,
and drug efficacy, but this work is still in progress.]
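The inter-expert figure (35%, N = 396) is consistent with counting each unordered pair of experts once per case: 11 cases with 9 experts give 11 x 36 = 396 comparisons. The sketch below computes such a pairwise rate. It assumes, hypothetically, that a regimen is represented as an unordered set of drug names and that "identical" means set equality; it does not attempt the "drug metric" described above as still in progress.

```python
from itertools import combinations
from typing import FrozenSet, Sequence, Tuple

Regimen = FrozenSet[str]  # a regimen as an unordered set of drug names (hypothetical)


def pairwise_identical_rate(cases: Sequence[Sequence[Regimen]]) -> Tuple[float, int]:
    """Fraction of expert pairs, over all cases, whose regimens are identical.

    Each unordered pair of experts is counted once per case, so 11 cases with
    9 experts give 11 * 36 = 396 comparisons.
    """
    pairs = 0
    identical = 0
    for regimens in cases:
        for a, b in combinations(regimens, 2):
            pairs += 1
            identical += a == b
    return identical / pairs, pairs
```

Representing regimens as sets makes "gentamycin and ampicillin" equal to "ampicillin and gentamycin", but treats any other difference, however small clinically, as a disagreement. This is exactly the limitation that motivates a graded drug metric.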

   The figure in the third column gives a crude overall measure of therapy
selection performance, and indicates that in 91% (10 out of 11 cases), a majority
of the experts rated MYCIN's regimen as either identical to their own or an
acceptable alternative.

   [The evaluation form also asked each expert to choose a regimen for the
organisms which MYCIN had selected.  The intent here was to compare the system's
performance against the expert when both were faced with the same set of
organisms (rather than compared with the same clinical situation, as above).
Unfortunately, inconsistent answers on the part of the experts indicated that
they were not answering the question according to the instructions.  It appeared
that they were not able to suspend their own judgments about organism identity
sufficiently to select a regimen based on MYCIN's organisms alone.   For this
reason, we believe the data to be unreliable, and have not included it here.]

E. Overall Performance

   At the end of each evaluation form, the expert was asked to rate the
system's overall performance as either excellent, good, fair, or poor.  The first
two columns of the last row indicate that 17% of these evaluations were
"excellent", and almost 60% were either "excellent" or "good" (only 13% were
"poor").  In 60% of the cases (9 out of 15), a majority of the experts felt that
MYCIN's overall performance was either "excellent" or "good".
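The overall-performance summary combines a pooled tally of all ratings with a per-case majority count. A minimal sketch follows. It assumes a hypothetical layout in which each row holds all experts' ratings for one case, and it uses a strict majority (more than half) for the last figure.

```python
from collections import Counter
from typing import Dict, Sequence

GOOD_OR_BETTER = {"excellent", "good"}


def summarize_overall(cases: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Summarize "excellent"/"good"/"fair"/"poor" ratings as in the text.

    cases[c] holds every expert's overall rating for case c (hypothetical layout).
    """
    counts = Counter(r for case in cases for r in case)
    n = sum(counts.values())
    majority = sum(
        sum(r in GOOD_OR_BETTER for r in case) > len(case) / 2 for case in cases
    )
    return {
        "excellent": counts["excellent"] / n,
        "excellent_or_good": (counts["excellent"] + counts["good"]) / n,
        "poor": counts["poor"] / n,
        "cases_with_majority_good": majority / len(cases),
    }
```

With 15 cases and 9 experts, the pooled fractions have N = 135 and the majority fraction has N = 15, matching the last row of Table I.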

F. Present Utility and Future Potential

   Finally, after completing the entire set of 15 patients, each expert was
asked to rate MYCIN's present utility and future potential as a clinical tool and
as an educational tool, rating it as having "considerable", "some", or "no"
potential.  The table below summarizes their response.

Evaluation of Present Utility

                     | "considerable" | "some"  | "none"  |
---------------------+----------------+---------+---------+
clinical tool        |      11%       |   67%   |   22%   |
educational tool     |      11%       |   89%   |    0%   |
---------------------+----------------+---------+---------+

Evaluation of Future Potential

                     | "considerable" | "some"  | "none"  |
---------------------+----------------+---------+---------+
clinical tool        |      11%       |   89%   |    0%   |
educational tool     |      67%       |   33%   |    0%   |
---------------------+----------------+---------+---------+

                        Table II
Opinions of 9 experts on MYCIN's present utility and future potential

    To aid these evaluation efforts, we have also implemented a number of
useful features in the system.  For instance, MYCIN now keeps continuing
statistics of the use of rules in its knowledge base.  This will help us to
monitor its long term performance, to study the interrelationship between rules,
and perhaps detect automatically any inconsistencies or gaps in the knowledge
base.
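Continuing rule-use statistics could take the form of a running log of which rules fire in each consultation. The sketch below is a hypothetical illustration, not MYCIN's implementation. It counts how often each rule fires and how often pairs of rules fire together, which bears on "the interrelationship between rules". It also lists rules that have never fired, as one possible, assumed signal that a rule is unreachable or that the knowledge base has a gap. Rule names are invented.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


class RuleUsageLog:
    """Hypothetical running statistics on which rules fire in each consultation."""

    def __init__(self, rule_names: Iterable[str]) -> None:
        self.rule_names: List[str] = list(rule_names)
        self.fired: Counter = Counter()     # rule -> consultations in which it fired
        self.co_fired: Counter = Counter()  # (rule, rule) -> consultations where both fired
        self.consultations = 0

    def record(self, fired_rules: Iterable[str]) -> None:
        """Add one consultation; repeated firings within it count once."""
        fired = sorted(set(fired_rules))
        self.consultations += 1
        self.fired.update(fired)
        self.co_fired.update(combinations(fired, 2))

    def never_fired(self) -> List[str]:
        """Rules that have not fired yet; candidates for a closer look."""
        return [r for r in self.rule_names if self.fired[r] == 0]
```

Sorting the fired rules before forming pairs keeps each pair in one canonical order, so `("RULE001", "RULE002")` and `("RULE002", "RULE001")` are not counted separately.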

   We have also designed and implemented a mechanism for "on-line" evaluation.
At the end of each consultation, the system asks the clinician using it a few
questions about the quality of its performance.  This
interchange will be brief to avoid being a burden to the user, but it is expected
to represent an important addition to the other evaluation efforts.

   It will, for instance, make possible a new form of evaluation of the
system.  Rather than using a series of "prepackaged" cases as was done in our
initial evaluation, the next stage will be carried out using information entered
at a terminal by the evaluator.  The participating panel of experts will be
selecting patients in areas covered by the MYCIN knowledge base, and will engage
in a dialogue with the system about those patients.  Following completion of the
session, the on-line evaluation feature will ask questions about system
performance, and the responses will be tabulated and evaluated on-line by
appropriate biostatistical programs.  Specific recommendations which may point
out problem areas in the consultation will be reviewed by our staff.  By this
process we expect to be able to maintain a continuing evaluation of MYCIN's
capabilities in various areas, and pinpoint specific areas where performance is
suboptimal.

MYCIN Project Publications

THESES

Davis R,
  Applications of meta level knowledge to the construction, maintenance, and
  use of large knowledge bases, Thesis: PhD in Computer Science, AI Memo 283,
  304 pp, Stanford University, July 1976.

Shortliffe E H,
  MYCIN: A rule-based computer program for advising physicians regarding
  antimicrobial therapy selection, Thesis: PhD in Medical Information
  Sciences, Stanford University, Stanford CA, 409 pp, October 1974.  Also,
  Computer-Based Medical Consultations: MYCIN, American Elsevier, New York,
  1976.

PAPERS

Buchanan B G, Davis R, Yu V, Cohen S N,
  Rule-based medical decision making by computer, Proc. MEDINFO 1977, to
  appear.

Clancey W,
  Chronicler: an explanation system based on set-predicate representation
  of computational processes, submitted to 5th IJCAI.

Aikins J S,
  Use of models in a rule-based consultation system, short paper submitted
  to 5th IJCAI.

Davis R,
  Interactive transfer of expertise: acquisition of new inference rules,
  submitted to 5th IJCAI.

Davis R,
  Knowledge acquisition in rule-based systems: knowledge about representations
  as a basis for system construction and maintenance, to appear in Pattern
  Directed Inference Systems, Waterman and Hayes-Roth (eds.), Academic Press,
  in press.  Also to be presented at Pattern Directed Inference Systems
  Workshop, Honolulu, May 1977.

Davis R, Buchanan B G,
  Meta-level knowledge: overview and applications, submitted to 5th IJCAI,
  Cambridge, MA, August 1977.

Davis R,
  A decision support system for medical diagnosis and therapy selection, Data
  Base (SIGBDP newsletter), 8 (Winter 1977) pp 58-72.

Wraith S, Aikins J, Buchanan B G, Clancey W, Davis R, Fagan L, Scott A C,
van Melle W, Yu V, Axline S, Cohen S,
  Computerized consultation system for selection of antimicrobial therapy,
  American Journal of Hospital Pharmacy, 33 (December 1976) pp 1304-1308.

Scott A C, Clancey W, Davis R, Shortliffe E H,
  Explanation capabilities of knowledge based production systems, American
  Journal of Computational Linguistics, Microfiche 62, 1977.  Also, HPP Memo
  77-1, Stanford Computer Science Department, February 1977.

Shortliffe E H, Davis R,
  Some considerations for the implementation of knowledge-based expert
  systems, SIGART Newsletter, 55:9-12, December 1975.

Davis R, Buchanan B, Shortliffe E H,
  Production rules as a representation for a knowledge-based consultation
  system, Artificial Intelligence, 8 (Spring 1977) pp 15-45.  (Also, AI Memo
  256, Stanford University, October 1975.)

Davis R, King J J,
  An overview of production systems, in Elcock and Michie (eds.), Machine
  Intelligence 8: Machine Representations of Knowledge, John Wiley, to appear,
  1977.  (Also AI Memo 271, Stanford University, October 1975.)

Shortliffe E H,
  Judgmental knowledge as a basis for computer-assisted clinical decision
  making, Proceedings of the 1975 International Conference on Cybernetics and
  Society, pp 256-7, September 1975.

Shortliffe E H, Axline S, Buchanan B G, Davis R, Cohen S,
  A computer-based approach to the promotion of rational clinical use of
  antimicrobials, in Gouveia, Tognoni and Van der Kleijn (eds.), Clinical
  Pharmacy and Clinical Pharmacology, pp 259-274, Elsevier/North Holland
  Biomedical Press, 1976.

Shortliffe E H, Davis R, Axline S G, Buchanan B G, Green C C, Cohen S N,
  Computer-based consultations in clinical therapeutics: explanation and rule
  acquisition capabilities of the MYCIN system, Computers and Biomedical
  Research, 8:303-320 (August 1975).

Shortliffe E H, Buchanan B G,
  A model of inexact reasoning in medicine, Mathematical Biosciences
  23:351-379, 1975.

Shortliffe E H, Rhame F S, Axline S G, Cohen S N, Buchanan B G, Davis R,
Scott A C, Chavez-Pardo R, van Melle W J,
  MYCIN: A computer program providing antimicrobial therapy recommendations
  (abstract only).  Presented at the 28th Annual Meeting, Western Society for
  Clinical Research, Carmel, CA, 6 Feb 1975.  Clin Res 23:107a (1975).
  Reproduced in Clinical Medicine, p. 34, August 1975.

Shortliffe E H,
  MYCIN: A rule-based computer program for advising physicians regarding
  antimicrobial therapy selection (abstract only), Proceedings of the ACM
  National Congress (SIGBIO Session), p. 739, November 1974.  Reproduced in
  Computing Reviews 16:331 (1975).

Shortliffe E H, Axline S G, Buchanan B G, Cohen S N,
  Design considerations for a program to provide consultations in clinical
  therapeutics, presented at San Diego Biomedical Symposium 1974 (February
  6-8, 1974).

Shortliffe E H, Axline S G, Buchanan B G, Merigan T C, Cohen S N,
  An artificial intelligence program to advise physicians regarding
  antimicrobial therapy, Computers and Biomedical Research, 6:544-560
  (1973).

Articles About MYCIN

"Which Antibiotic?" Emergency Medicine, January 1977, pp 152-162.

II.  Interactions with the SUMEX-AIM resource

Collaborations and medical use of programs

Dr. Jon Heiser

    We have been working with Dr. Jon Heiser of the Department of Psychiatry of
the University of California at Irvine in an effort to create a consultant for
the use of psychoactive drugs.  We began by creating a version of Mycin that had
all of the infectious disease knowledge removed from it, and showed Dr. Heiser
how to build up the required base of knowledge about the new field.

   He has, with his students, developed a small, but functional system that
demonstrates encouraging performance on the task. Work has now begun in earnest
to extend the competence of this pilot system, to produce a consultant with a
useful level of performance.

   It is interesting to note that the explanation capabilities required no
modification whatever, and worked in the new system exactly as designed for the
original system, despite the change in domains.

INTERNIST Project

   The Sumex computer has made possible a valuable interaction between
researchers on the MYCIN project at Stanford University and those working on the
INTERNIST project at the University of Pittsburgh.  These researchers are
studying the possible representations and uses for disease models in a medical
diagnosis system.  Both research groups have been able to run each other's
programs and to study the medical knowledge bases which are stored on the Sumex
computer.  Communication between project members has also been greatly
facilitated through use of the Sumex system.

Stanford Infectious Disease Faculty

   Dr. Victor Yu of our group has been actively soliciting the involvement of
the Stanford ID faculty in the development and evaluation of Mycin.  He recently
presented the system to the faculty and fellows of the Department, and has been
seeking ways to involve the system in the Department's educational activities.
For instance, medical students under his supervision have used the system during
their ID rotation, comparing its results and reasoning process with their own on
problems encountered in patients on the wards.

The Pulmonary Function Facility

   Members of the Mycin project have also been collaborating with Dr. John
Osborn and his co-workers of the Presbyterian Hospital/Pacific Medical Center in
San Francisco on the development of a program to interpret the results of
standard pulmonary function tests.  The program is designed to perform a range of
tasks, including: identifying the need to repeat tests because of poor patient
effort;  identifying the need for additional information in order to make a more
definitive diagnosis;  reporting and explaining the reasons for primary and
secondary diagnoses and severity of any disease state; identifying the relation
between diagnosis and any referral diagnosis; and interpreting any change from
previous tests, or limitations on the interpretation because of the test
methodology and the patient effort.

Sharing with other projects

   Groups at Rutgers University, the University of Pittsburgh, Rochester
University, and the University of Virginia Medical School have all been involved
in varying degrees with running Mycin and evaluating its performance.  They have
suggested improvements to its design and to its stock of medical knowledge, and
have made useful contributions to its development.

    In addition, we have made use of the programs developed at both Rutgers and
Pittsburgh.  The former has been instructive to us in its handling of dynamically
changing situations, while the latter has helped us to develop our own ideas
about the modelling and use of prototypical descriptions of disease states.

   The Molgen group at Stanford has also profited from much of our experience
in acquiring knowledge and building large knowledge bases.  Several of their

techniques for accumulating knowledge about genetics are based on extensions to
ideas first suggested in some of our work.

   In all of these cases, the use of Sumex as a national resource has clearly
been a critical factor in making possible this sort of interaction.

References

[1]  Reinan H H, D'Ambola J, The use and cost of antimicrobials in hospitals,
     Arch Environ Health, 13:631-636 (1966).

[2]  Kunin C M, et al., Use of antibiotics: a brief exposition of the problem
     and some tentative solutions, Ann Int Med, 79:555-560 (1973).

[3]  Sheckler W E, Bennett J V, Antibiotic usage in seven community hospitals,
     J Amer Med Assoc, 213:264-267 (1970).

[4]  Roberts A W, Visconti J A, The rational and irrational use of systemic
     antimicrobial drugs, Amer J Hosp Pharm, 29:828-834 (1972).

[5]  Simmons H E, Stolley P D, This is medical progress? Trends and
     consequences of antibiotic use in the United States, J Amer Med Assoc,
     227:1023-1026 (1974).

[6]  Kagan B M, Fanin S L, Bardie F, Spotlight on antimicrobial agents, JAMA,
     226:306-310 (1973).

Section 6.1.5

PROTEIN STRUCTURE PROJECT

6.1.5    PROTEIN STRUCTURE PROJECT

Protein Structure Modeling Project

Prof. J. Kraut and Dr. S. Freer (Chemistry, U. C. San Diego) and
Prof. E. Feigenbaum and Dr. R. Engelmore (Computer Science, Stanford)

I.  Summary of research program

A. Technical goals

  The goals of the protein structure modeling project are to 1) identify
critical tasks in protein structure elucidation which may benefit from the
application of AI problem-solving techniques, and 2) design and implement
programs to perform those tasks.  We have identified two principal areas which
are of both practical and theoretical interest to protein crystallographers
and to computer scientists working in AI.  The first is the problem of interpreting
a three-dimensional electron density map.  The second is the problem of
determining a plausible structure in the absence of phase information normally
inferred from experimental isomorphous replacement data.  Current emphasis is on
the implementation of a program for interpreting electron density (e.d.) maps.

B. Medical relevance and collaboration

  The biomedical relevance of protein crystallography has been well stated in
a recent textbook on the subject (Blundell & Johnson, Protein Crystallography,
Academic Press, 1976):

"Protein Crystallography is the application of the
techniques of X-ray diffraction . . . to crystals of one
of the most important classes of biological molecules,
the proteins. . . .  It is known that the diverse
biological functions of these complex molecules are
determined by and are dependent upon their
three-dimensional structure and upon the ability of
these structures to respond to other molecules by
changes in shape.  At the present time X-ray analysis of
protein crystals forms the only method by which detailed
structural information (in terms of the spatial
coordinates of the atoms) may be obtained.  The results
of these analyses have provided firm structural evidence
which, together with biochemical and chemical studies,
immediately suggests proposals concerning the molecular
basis of biological activity."

   The Project is a collaboration of computer scientists at Stanford
University and crystallographers at the University of California at San Diego
(under the direction of Prof. Joseph Kraut) and at Oak Ridge National
Laboratories (Dr. Carroll Johnson).

C. Progress summary

   During the past year we have been designing and implementing a system of
programs for interpreting three-dimensional e.d. maps.  Progress has been made by
attacking the problem from two directions: working upward from the primary data
(i.e., the array of e.d. values) to higher-level symbolic abstractions, and
working downward from the given amino acid sequence and other experimental
information to generate candidate structures which can then be confirmed by the
abstracted data.

   In the "bottom-up" area of research we have developed and implemented
programs for analyzing topological features of the skeletonized e.d. map in terms
of protein structural elements (e.g., side chains, chain ends, bridges, etc.),
for finding local maxima, and, recently, for generating a critical point network,
i.e., a three-dimensional spanning tree which connects all critical points
(peaks, saddle points) found in the map.
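The report says only that the critical point network is a three-dimensional spanning tree over the peaks and saddle points; it does not say how the tree is chosen. As a minimal sketch, assuming the critical points are given as (x, y, z) grid coordinates and that the network is a minimum spanning tree under plain Euclidean distance (an assumption, not the project's stated criterion), Prim's algorithm produces such a network. The function name `critical_point_tree` is hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def critical_point_tree(points: Sequence[Point]) -> List[Tuple[int, int]]:
    """Connect all critical points with a minimum spanning tree (Prim's algorithm).

    Assumption for illustration: the edge weight is the Euclidean distance
    between grid coordinates; the original program's criterion is not stated.
    Returns a list of (parent_index, child_index) edges.
    """
    n = len(points)
    in_tree = [False] * n
    edges: List[Tuple[int, int]] = []
    # Heap entries are (distance, from_index, to_index); -1 marks the root.
    heap: List[Tuple[float, int, int]] = [(0.0, -1, 0)] if n else []
    while heap:
        _, src, dst = heapq.heappop(heap)
        if in_tree[dst]:
            continue  # already connected by a shorter edge
        in_tree[dst] = True
        if src >= 0:
            edges.append((src, dst))
        for k in range(n):
            if not in_tree[k]:
                heapq.heappush(heap, (math.dist(points[dst], points[k]), dst, k))
    return edges
```

For four points at (0, 0, 0), (1, 0, 0), (5, 0, 0) and (1, 1, 0), this yields the edges (0, 1), (1, 3) and (1, 2): the distant point is attached through its nearest neighbour rather than directly to the root.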

   In the "top-down" area we have designed and implemented, in INTERLISP, a
structure inference program which generates structural hypotheses at several
levels of detail.  At present the program can infer the location of heavy atoms,
cofactors and chain ends from the amino acid sequence, other chemical information,
and the symbolic abstractions of the e.d. map.  These features provide
toeholds, i.e., islands of certainty, from which additional structure is inferred
by extension.  Work is currently in progress on identification of the main chain,
disambiguation of multiply connected regions and classification of side chain
regions.

   The system under development is knowledge-based.  Both the corpus of
knowledge of the task domain and the problem-solving strategy knowledge are
incorporated as production-like rules.
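The report does not give the rule format. As an illustration only, the toehold-and-extension idea can be sketched as a tiny forward-chaining interpreter in which each rule inspects the current set of facts and proposes new ones, and rules keep firing until nothing changes. The fact vocabulary (`located`, `connected`) and the single rule below are hypothetical stand-ins, not the project's actual knowledge base.

```python
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple[str, ...]
# A rule looks at the current facts and returns the facts it would add.
Rule = Callable[[Set[Fact]], Iterable[Fact]]


def run_rules(facts: Set[Fact], rules: List[Rule]) -> Set[Fact]:
    """Fire every rule repeatedly until no rule adds a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_facts = set(rule(facts)) - facts
            if new_facts:
                facts |= new_facts
                changed = True
    return facts


def extend_from_toehold(facts: Set[Fact]) -> Iterable[Fact]:
    """Hypothetical rule: a region connected to a located region is located too."""
    located = {f[1] for f in facts if f[0] == "located"}
    for f in facts:
        if f[0] == "connected" and f[1] in located and f[2] not in located:
            yield ("located", f[2])
```

Starting from one toehold `("located", "r1")` and links r1-r2, r2-r3 and r5-r6, the loop locates r2 and then r3, but never reaches the disconnected r5-r6 pair. Keeping rules as independent functions over a shared fact set mirrors the production-like style the text describes.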

D. List of Publications

1) Robert S. Engelmore and H. Penny Nii, "A Knowledge-Based System for the
Interpretation of Protein X-Ray Crystallographic Data," Heuristic Programming
Project Memo HPP-77-2, January, 1977.  (Alternate identification: STAN-CS-77-589)

2) E.A. Feigenbaum, R.S. Engelmore, C.K. Johnson, "A Correlation Between
Crystallographic Computing and Artificial Intelligence," in Acta
Crystallographica, A33:13, (1977). (Alternate identification: HPP-77-Z)

II.  Interaction with the SUMEX-AIM resource

A. Collaborations

   The protein structure modeling project has been a collaborative effort
since its inception, involving co-workers at Stanford and UCSD (and, more
recently, at Oak Ridge).  The SUMEX facility has provided a focus for the
communication of knowledge, programs and data.  Without the special facilities
provided by SUMEX the research would be seriously impeded.  Computer networking
has been especially effective in facilitating the transfer of information.  For
example, the more traditional computational analyses of the UCSD crystallographic
data are made at the CDC 7600 facility at Berkeley.  As the processed data,
specifically the e.d. maps and their Fourier transforms, become available, they
are transferred to SUMEX via the FTP facility of the ARPANET with a minimum of
fuss.  (Unfortunately, other methods of data transfer are often necessary as
well; see below.)  Programs developed at SUMEX, or transferred to SUMEX from other
laboratories, are shared directly among the collaborators.  Indeed, for some of
the programs which originated at UCSD and elsewhere, our off-campus
collaborators frequently find it easier to use the SUMEX versions because of the
interactive computing environment and ease of access.  Advice, progress reports,
new ideas, general information, etc. are communicated via the message and/or
bulletin board facilities.

B. Interaction with other SUMEX-AIM projects

   Our interactions with other SUMEX-AIM projects have been mostly in the form
of personal contacts.  We have strong ties to the DENDRAL, Meta-DENDRAL and
MOLGEN projects and keep abreast of research in those areas on a regular basis
through informal discussions.  The SUMEX-AIM workshop in June, 1976 provided an
excellent opportunity to survey all the projects in the community.  Common
research themes, e.g., knowledge-based systems, as well as alternate
problem-solving methodologies were particularly valuable to share.  (That workshop
was very likely the most significant conference for applied AI to be held in
1976.)

Section 6.2

NATIONAL AIM PROJECTS

6.2   NATIONAL AIM PROJECTS

   The following group of projects is formally approved for access to the AIM
aliquot of the SUMEX-AIM resource.  Their access is based on review by the AIM
Advisory Group and approval by the AIM Executive Committee.

ACQUISITION OF COGNITIVE PROCEDURES (ACT)

Section 6.2.1

6.2.1   ACQUISITION OF COGNITIVE PROCEDURES (ACT)

Acquisition of Cognitive Procedures (ACT)

Dr. John Anderson
Yale University

I.  Summary of Research Program

A.  Technical goals:

   To develop a production system that will serve as an interpreter of the
active portion of an associative network.  To model a range of cognitive tasks
including memory tasks, inferential reasoning, language processing, and problem
solving.  To develop an induction system capable of acquiring cognitive
procedures with a special emphasis on language acquisition.

B.  Medical relevance and collaboration:

1.  The ACT model is a general model of cognition.  It provides a useful
  model of the development and performance of the sorts of decision
  making that occur in medicine.

2.  The ACT model also represents basic work in AI.  It is in part an attempt
  to develop a self-organizing intelligent system.  As such it is relevant
  to the goal of developing intelligent artificial aids in medicine.

   We have been evolving a collaborative relationship with Drs. James Greeno
and Alan Lesgold at the University of Pittsburgh.  They are applying ACT to
modeling the acquisition of reading and problem solving skills.  We plan to make
ACT a guest system within SUMEX.  ACT is currently at the state where it can be
shipped to other INTERLISP facilities.  We have received a number of inquiries
about the ACT system.  ACT is a system in a continual state of development, but we
periodically freeze versions of ACT which we maintain and make available to the
national AI community.

C.  Progress and accomplishments:

   ACT provides a uniform set of theoretical mechanisms to model such aspects
of human cognition as memory, inferential processes, language processing, and
problem solving.  ACT's knowledge base consists of two components, a
propositional component and a procedural component.  The propositional component
is provided by an associative network encoding a set of facts known about the
world.  This provides the system's semantic memory.  The procedural component
consists of a set of productions which operate on the associative network.  ACT's
production system is considerably different from many of the other currently
available systems (e.g., Newell's PSG).  These differences have been introduced
in order to create a system that will operate on an associative network and in
order to accurately model certain aspects of human cognition.

   A small portion of the semantic network is active at any point in time.
Productions can only inspect that portion of the network which is active at the
particular time.  This restriction to the active portion of the network provides
a means to focus the ACT system in a large data base of facts.  Activation can
spread down network paths from active nodes to activate new nodes and links.  To
prevent activation from growing continuously there is a dampening process which
periodically deactivates all but a select few nodes.  The condition of a
production specifies that certain features be true of the active portion of the
network.  The action of a production specifies that certain changes be made to
the network.  Each production can be conceived of as an independent "demon."  Its
purpose is to see if the network configuration specified in its condition is
satisfied in the active portion.  If it is, the production will execute and cause
changes to memory.  In so doing it can allow or disallow other productions which
are looking for their conditions to be satisfied.  Both the spread of activation
and the selection of productions are parallel processes whose rates are
controlled by the "strengths" of network links and individual productions.  An
important aspect of this parallelism is that it is possible for multiple
productions to be applied in a cycle through the set of productions.  Much of the
early work on the ACT system was focused on developing computational devices to
reflect the operation of parallel, strength-controlled processes and on working
out the logic for creating functioning systems in such a computational medium.
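The report does not give ACT's activation equations. A minimal sequential sketch of the three ideas above, under stated assumptions: activation passes along each link in proportion to the link's strength and a hypothetical `rate` parameter; dampening keeps only the `keep` most active nodes; and a production is eligible when every node named in its condition lies in the active portion. ACT's real processes run in parallel and match network configurations rather than bare node sets, so this only approximates the mechanism.

```python
from typing import Dict, List, Set, Tuple

# node -> list of (neighbour, link strength); strengths are illustrative
Network = Dict[str, List[Tuple[str, float]]]


def spread(network: Network, activation: Dict[str, float], rate: float = 0.5) -> Dict[str, float]:
    """One step of spreading activation, scaled by link strength (assumed rule)."""
    new = dict(activation)
    for node, level in activation.items():
        for neighbour, strength in network.get(node, []):
            new[neighbour] = new.get(neighbour, 0.0) + rate * strength * level
    return new


def dampen(activation: Dict[str, float], keep: int) -> Dict[str, float]:
    """Deactivate all but the `keep` most active nodes."""
    top = sorted(activation.items(), key=lambda kv: kv[1], reverse=True)[:keep]
    return dict(top)


def matching_productions(productions: Dict[str, Set[str]], activation: Dict[str, float]) -> List[str]:
    """Names of productions whose condition nodes are all in the active portion."""
    active = set(activation)
    return sorted(name for name, condition in productions.items() if condition <= active)
```

For example, starting with only "dog" active in a network where "dog" links to "animal" (strength 0.8) and "bark" (0.6), one spread step gives "animal" 0.4 and "bark" 0.3; dampening to two nodes leaves "dog" and "animal", so a production conditioned on {"dog", "animal"} is eligible while one needing "bark" is not.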

   We have successfully implemented a number of small-scale systems that model
various psychological tasks in the domains of memory, language processing, and
inferential reasoning.  A larger-scale effort is underway to model the language
processing mechanisms of a young child.  This includes implementation of a
production system to analyze linguistic input, make inferences, ask and answer
questions, etc.  A great deal of effort is also being given to developing
learning mechanisms that will acquire and organize the productions for this
language processing.  This learning program attempts to acquire procedures from
examples of the computations desired of the procedures.  For instance, the
program learns to comprehend and generate sentences by being given sentences and
picture representations of the meaning of the sentences (actually hand encodings
of the pictures).  Although this effort is focused on induction of linguistic
procedures, the hope is to develop a general model of induction of cognitive
procedures and not to place any language-specificity into the induction
procedures.

   At the time of this report, we have completed the F version of ACT, which is
the system with learning capabilities.  We are currently testing and training
the system on a number of linguistic examples.  Other projects, which are
progressing in earlier versions of ACT, include use of spreading activation to
model semantic disambiguation, modeling of the reading process, and modeling of
solutions to word arithmetic problems.

D.  Current list of project publications:

[1] Anderson, J.R.  Computer simulation of a language acquisition system: A
  second report.  In D. LaBerge and S.J. Samuels (Eds.), Perception and
  Comprehension.  Hillsdale, N.J.: L. Erlbaum Assoc., 1976.

[2] Anderson, J.R.  Language, Memory, and Thought.  Hillsdale, N.J.: L. Erlbaum
  Assoc., 1976.

[3] Anderson, J.R.  Induction of augmented transition networks.  Cognitive
  Science, 1977, in press.

[4] Anderson, J.R. & Kline, P.  Design of a production system.  Paper to be
  presented at the Workshop on Pattern-Directed Inference Systems, Hawaii,
  May 23-27, 1977.

[5] Anderson, J.R., Kline, P. & Lewis, C.  Language processing by production
  systems.  To appear in P. Carpenter and M. Just (Eds.), Cognitive Processes
  in Comprehension.  L. Erlbaum Assoc., 1977.

[6] Kline, P.J. & Anderson, J.R.  The ACTE User's Manual, 1976.

II.  Interaction With the SUMEX-AIM Resource

    The SUMEX-AIM resource is superbly suited to the needs of our project.  We
have made the most extensive use of the INTERLISP facilities and the facilities
for communication on the ARPANET.  We have found the SUMEX personnel extremely
helpful both in responding to our immediate emergencies and in providing
advice helpful to the long-range progress of the project.  Despite the fact that
we are on the other side of the continent, we have felt almost no degradation in
our ability to do research.  We find we can easily list on the terminal a small
portion of a program under modification.  The willingness of SUMEX to mail
program listings has also meant we can keep relatively up-to-date records of all
programs under development.

   A unique advantage of working with SUMEX from the East Coast is the low
loading of the system during the mornings.  We have been able to get a great deal
of work done during these hours and try to save our computer-intensive work for
them.

   We have found our one AIM workshop so far (1976) a very useful opportunity
to meet with colleagues and exchange ideas.

   A particularly striking example of the utility of the SUMEX resource was
illustrated in the move from Michigan.  In the summer of 1976 Anderson moved to
Yale and Greeno to Pittsburgh.  There was no loss at all associated with having
to transfer programs from one system to another.  At Yale we were programming the
day after we arrived.  The SUMEX link has also permitted continued collaboration
with Greeno.

Section 6.2.2

CHEMICAL SYNTHESIS PROJECT (SECS)

6.2.2   CHEMICAL SYNTHESIS PROJECT (SECS)

SECS - Simulation and Evaluation of Chemical Synthesis

        W. Todd Wipke
     Department of Chemistry
University of California at Santa Cruz

I.  Summary of Research Program

A.  Technical Goals.

   The long-range goal of this project is to develop the logical principles of
molecular construction and to use these in developing practical computer programs
to assist investigators in designing stereospecific syntheses of complex
bioorganic molecules.  Our specific goals this past year focused on improvement of
the library of chemical transforms, completion of the perception of molecular
symmetry, and integration of symmetry information throughout SECS, including the
strategy module.  We also wanted to improve the execution speed of SECS and the
speed of graphical interaction over remote communication lines.  We planned to
simplify the program from the user's viewpoint by including automatic file
failsafing, improvement of HELP commands, and non-fatal handling of all errors,
as well as production of user's manuals for operation of the program and the
writing of chemical transforms.  Additionally, we intended to initiate
applications of SECS to the areas of biosynthesis and metabolism of compounds, as
well as phosphorus chemistry.  Finally, we hoped to improve the strategic
constraints and controls that guide SECS in growing a synthesis tree.

B.  Medical Relevance and Collaboration.

   The development of new drugs and the study of how drug structure is related
to biological activity depend upon the chemist's ability to synthesize new
molecules as well as his ability to modify existing structures, e.g., by
incorporating isotopic labels into biomolecular substrates.  The Simulation and
Evaluation of Chemical Synthesis (SECS) project aims at assisting the chemist in
designing stereospecific syntheses of biologically important molecules.  The
advantages of this computer approach over manual approaches are manifold: 1)
greater speed in designing a synthesis; 2) freedom from the bias of past
experience and past solutions; 3) thorough consideration of all possible syntheses
using a more extensive library of chemical reactions than any individual person
can remember; 4) greater capability of the computer to deal with the many
structures which result; and 5) the capability of the computer to see molecules
in a graph-theoretical sense, free from the bias of 2-D projection.

   SECS was designed to be able to apply any kind of chemical transformation,
and because of this generality we see SECS finding application in biogenesis and
metabolism (see Section II.A below).  The objective of using SECS in biogenesis
is to predict possible biogenetic pathways for a given natural product and also
to predict related compounds which might also co-occur in nature.  This can be a
great aid in searching for new natural products and in structure elucidation.

   The objective of using SECS in metabolism is to predict the plausible
metabolites of a given xenobiotic so that they may be analyzed for possible
carcinogenicity.  Metabolism researchers may also find this useful in the
identification of metabolites, in that it suggests what to look for, and in the
identification of possible metabolic pathways connecting a metabolite to a
xenobiotic.

C.  Progress and Accomplishments.

   RESEARCH ENVIRONMENT: At the University of California, Santa Cruz, we have
a CT-40 graphics terminal connected to the SUMEX-AIM resource by a 1200 baud
leased line and a TI 725 thermal printing teletype connected via TYMNET at 300
baud.  UCSC has only a small IBM 370/145 and a PDP-11/45 (limit of 12K words per
user) available, both of which are unsuitable for this research.  From July until
December our research group had to occupy temporary space during relocation, but
it is now finally in permanent space in Thimann Laboratories, where we have close
collaboration with other organic chemists.

    CHEMICAL TRANSFORMS: The library of chemical transforms has been
reorganized and reevaluated during the past year by Mr. Dolata, a student of
Professor D.A. Evans of Cal Tech.  New reactions were added, the scope and
limitations of others were updated, and leading references were provided.
Additionally, Merck, Sharp, and Dohme Research Laboratories provided revisions of
many transforms which a group of 25 synthetic chemists had carefully researched.

   SYMMETRY: An efficient algorithm for recognizing molecular symmetry was
developed last year.  This year that algorithm has been tested against all
possible molecular point groups, and a few problems which developed were
corrected.  The algorithm has been documented, and initial studies have begun on
actually determining the point group of a molecule.  The symmetry group is now
utilized in conjunction with the symmetry of a chemical transform so that the
transform is applied in all possible unique ways, to generate a non-redundant set
of precursors.  This symmetry of course takes into account the stereochemistry of
saturated centers and double bonds.  We have surveyed literature syntheses for
examples of existing heuristics based on symmetry which can be used for
automatically generating high-level strategies.  This information has never been
pulled together before and should also make an interesting contribution to
organic synthesis.
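One standard way to apply a transform "in all possible unique ways" is to group candidate reaction sites into symmetry classes and keep one representative per class. The sketch below assumes the molecule's symmetry is already available as a list of atom-index permutations that includes the identity and forms a group; it ignores the stereochemical refinements mentioned above, and the function names are hypothetical rather than SECS routines.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Bond = Tuple[int, int]
Symmetry = Sequence[Sequence[int]]  # atom permutations, identity included


def canonical_bond(bond: Bond, symmetry: Symmetry) -> Bond:
    """Representative of a bond's symmetry class: its smallest image."""
    images = []
    for perm in symmetry:
        a, b = perm[bond[0]], perm[bond[1]]
        images.append((min(a, b), max(a, b)))
    return min(images)


def unique_sites(bonds: Iterable[Bond], symmetry: Symmetry) -> List[Bond]:
    """Keep one bond per symmetry class, so a transform is applied once per class."""
    seen: Set[Bond] = set()
    result: List[Bond] = []
    for bond in bonds:
        key = canonical_bond(bond, symmetry)
        if key not in seen:
            seen.add(key)
            result.append(bond)
    return result
```

For the carbon chain of n-butane (atoms 0-1-2-3, symmetry = identity plus end-to-end reflection `[3, 2, 1, 0]`), the three bonds collapse to two classes, (0, 1) and (1, 2), so a bond-breaking transform would yield two distinct precursors instead of three.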

   STRATEGIC CONTROL: Last year we began developing an implementation of
strategic control for SECS, and a simple language for expressing strategies
independent of chemical transforms.  Since these strategies contain expressions
which refer to the molecular structure, it was also necessary to incorporate
symmetry here.  For example, if a particular bond is designated as strategic
to break, but a transform breaks another bond, the strategy is still satisfied if
the two bonds are equivalent by symmetry.  This problem becomes more complex when
pairs of bonds are specified and when logical connectives (AND, OR, XOR, and NOT)
are involved.  This has, however, been solved.  Other changes since last year
include a completely new user interface to strategy to allow error correction and
very easy modification of goals.  Finally, quantitative experiments have been
performed to measure the effect of developing a synthesis tree with various types
of strategic constraints.  The net result of this work is that the user can now
more easily constrain SECS to work only in areas which the user decides are
worthwhile; consequently, fewer precursors are generated which the user would
otherwise delete.
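The symmetry-aware strategy test described above can be sketched as a small recursive evaluator. Assumptions for illustration only: a strategy is a nested tuple such as ("AND", ("BOND", (0, 1)), ("NOT", ("BOND", (1, 2)))); a BOND leaf is satisfied when any bond the transform broke is symmetry-equivalent to it; XOR is read as "exactly one operand true"; and the symmetry is given as atom permutations including the identity. This is not the SECS strategy language itself.

```python
from typing import Sequence, Set, Tuple

Bond = Tuple[int, int]
Symmetry = Sequence[Sequence[int]]  # atom permutations, identity included


def canonical(bond: Bond, symmetry: Symmetry) -> Bond:
    """Representative of the bond's symmetry class: its smallest image."""
    return min(
        (min(p[bond[0]], p[bond[1]]), max(p[bond[0]], p[bond[1]])) for p in symmetry
    )


def satisfied(expr: tuple, broken: Set[Bond], symmetry: Symmetry) -> bool:
    """Evaluate a strategy expression against the bonds a transform broke."""
    op = expr[0]
    if op == "BOND":
        target = canonical(expr[1], symmetry)
        # A strategic bond counts as broken if any symmetry-equivalent bond was.
        return any(canonical(b, symmetry) == target for b in broken)
    values = [satisfied(sub, broken, symmetry) for sub in expr[1:]]
    if op == "AND":
        return all(values)
    if op == "OR":
        return any(values)
    if op == "XOR":
        return sum(values) == 1  # exactly one operand true
    if op == "NOT":
        return not values[0]
    raise ValueError(f"unknown connective: {op}")
```

With the butane chain 0-1-2-3 and its end-to-end reflection, a strategy asking to break bond (0, 1) is satisfied by a transform that breaks (2, 3), which is exactly the equivalence case the text describes.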

    USER INTERFACES: Users of SECS had difficulty understanding how to copy
files into work areas in order to save or restore synthesis trees.  Now SECS does
all file manipulation, eliminating the problem.  Further, SECS now automatically
failsafes the synthesis tree at key points so that in the event of machine or
communication failure the user can automatically restart his analysis from the
last key point.  Considerable modifications were made to the graphical interface
to increase readability and speed of interaction.  Over long, slow
communication lines (which happens to be the way most SECS users access the
program), interactive graphics must be done with care, minimizing the amount
and frequency of picture transmission, in order to achieve tolerable man-machine
communication.  Lastly, we have implemented appropriate input checks to
eliminate the possibility of a fatal crash from user input errors.  According
to user reports, such crashes had been a major problem.

    PHOSPHORUS CHEMISTRY: Graphical input and output procedures were developed
for entering the stereochemical configuration of a trigonal bipyramid (TBP)
phosphorus atom and for producing a correct structural diagram from the machine's
internal representation.  The SEMA algorithm for generating a stereochemically
unique name was extended to deal with the 20 possible configurations of a TBP
atom, including the ability to recognize enantiomers.  The ALCHEM language for
representing chemical transforms was extended to facilitate manipulation of
TBPs, including changes from trigonal and tetrahedral configurations to square
base pyramid and TBP.  Queries may deal with apicophilicity, and axial or
equatorial orientation.  The fine details of phosphorus chemistry, such as the
fact that groups entering or leaving the phosphorus coordination sphere normally
do so from the apical position, pseudorotation, apicophilicity, and strain
energy, are considered in evaluating TBP structures.  A library of phosphorus
chemistry transforms is being developed in collaboration with a group at the
University of Strasbourg, France.

    COMPUTER-AIDED ELUCIDATION OF BIOGENETIC PATHWAYS: Although a great amount
of effort has been spent on various areas of biogenesis, there have been few
attempts to develop general techniques for the elucidation of biogenetic schemes.
As a result, the formulation of biogenetic schemes has often been criticized for
its lack of rigor and explicit criteria.  Our approach is to develop general
techniques which lead to the postulation of plausible biogenetic pathways, using
the SECS program as an aid in obtaining and analyzing solutions to this complex
problem.  It is our hope that this application of computer problem solving will
not only uncover new ways of recognizing and evaluating biogenetic pathways but
also provide added support to deductions made from biogenetic schemes, such as
the generality of a scheme which may be tested in only a few species.

   With the proper input information and goals well defined, there may be
explicit rules to guide the chemist to plausible biogenetic pathways for a
particular natural product.  Unfortunately, the vast majority of solutions to
this problem are determined by a combination of the experienced natural products
chemist's ability to consider the most important rules involved and his unique
set of experience-based prejudices.  There may be some means to represent and
utilize all of the known relevant rules, data and possibly even experience-based
prejudices to arrive at the best plausible pathways.  The most precise method for
representing, developing and testing such a theory is in the form of a computer
program.  To implement such a computer program, known rules and constraints must
be clearly defined; then those that are applicable can be applied at each step of
the analysis toward the desired goal.  This will keep the solution pathways
logically pure and ensure that all alternatives which satisfy the rules and
constraints are considered.  This guarantee of completeness simply cannot be
made using hand analysis.

   A new reaction library containing biogenetic transformations has been
written.  After a natural product is entered, the program will apply the
biogenetic transforms which fit the natural product.  This generates a set of
plausible biogenetic precursors to the target natural product.  By continuing
this process with the precursors generated, the plausible biogenetic pathways for
the natural product can be discovered.

   The structures of marine natural products were entered into the program and
the plausible biogenetic pathways for these compounds were generated and
analysed.  Biogenetic pathways which had been proposed in the literature were
among the pathways discovered, as were other plausible pathways which would now
have to be considered.  The success we attained in this research effort verified
the applicability of the SECS program as an aid in the analysis of metabolic
pathways.

    COMPUTER-AIDED PREDICTION OF METABOLITES FOR CARCINOGENICITY STUDIES: We
have initiated a research project in collaboration with the Chemical
Carcinogenesis group at the National Cancer Institute.  The objective of this
research is to establish a computer program by which a biochemist or metabolism
expert can explore the metabolism of a chemical compound.  The investigator
enters the substrate molecule by interacting with an input and structure editing
module.

   Then the program will apply the biological transforms which "fit" the
structure, taking into consideration all the context information (2-D, 3-D, and
electronic) available about the transform and all perceived information about the
structure.  This will generate a set of metabolites which are one step away from
the substrate structure.  The metabolites will be ranked according to expected
probability or yield.  The exact parameters which should be monitored will be
determined during the course of this research.  An evaluation module may then
screen these metabolites according to criteria specified by the investigator.
Duplicate metabolites arising from different pathways will be labelled to
indicate that fact.  Finally the investigator will be shown the set of
metabolites together with data about the transform which produced each one and
the values of the parameters being monitored.

   The investigator may select one metabolite for further metabolism or may
request that all be processed for a specified number of steps.  In this way a
"tree" of metabolites is produced and displayed.  The entire state of the user's
tree may be saved to permit continuation of the analysis at another time.
Exploration of the metabolism tree will be guided predominantly and interactively
by the expert investigator.  We feel that, at this stage in the development of the
fields of metabolism and carcinogenicity, interactive guidance by the expert is
necessary.  There are many areas where the theory is very thin and a given
biological transformation may have been observed for only a few substrates.  When
this transform is applied to a new substrate, some unrealistic metabolites may be
generated owing to the lack of contextual information and constraints.  An
expert is necessary to prune the tree and prevent the automatic processing of
those unreasonable intermediates.  It is much more efficient for the expert to do
this pruning as the tree is being grown, rather than after an enormous tree
has been completed.
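The tree-growing procedure described above (apply every fitting transform, label duplicates reached by different pathways, rank metabolites by expected likelihood, let the expert prune as the tree grows, and expand for a chosen number of steps) can be sketched as follows. This is a hypothetical illustration, not SECS code: structures are plain strings standing in for canonical molecular names, each transform carries an assumed likelihood that is multiplied along a pathway, `keep` is a callback standing in for the interactive expert, and `add_group` is a toy transform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (name, assumed relative likelihood, function returning product structures)
Transform = Tuple[str, float, Callable[[str], List[str]]]


@dataclass
class Node:
    structure: str
    transform: str = ""
    score: float = 1.0
    duplicate: bool = False
    children: List["Node"] = field(default_factory=list)


def grow_tree(root: str, transforms: List[Transform], steps: int,
              keep: Callable[[Node], bool] = lambda node: True) -> Node:
    """Apply every transform for `steps` levels, building a metabolite tree.

    `keep` stands in for the expert pruning candidates as the tree grows.
    A structure already in the tree is labelled a duplicate and not re-expanded.
    """
    top = Node(root)
    seen: Dict[str, Node] = {root: top}
    frontier = [top]
    for _ in range(steps):
        next_frontier: List[Node] = []
        for node in frontier:
            for name, likelihood, apply in transforms:
                for product in apply(node.structure):
                    child = Node(product, name, node.score * likelihood)
                    if product in seen:
                        child.duplicate = True  # reached by a different pathway
                        node.children.append(child)
                    elif keep(child):
                        seen[product] = child
                        node.children.append(child)
                        next_frontier.append(child)
            # Rank each node's metabolites by estimated likelihood, highest first.
            node.children.sort(key=lambda c: c.score, reverse=True)
        frontier = next_frontier
    return top


def add_group(tag: str) -> Callable[[str], List[str]]:
    """Toy transform: append a one-letter 'group'; sorted letters act as a canonical name."""
    return lambda s: [] if tag in s else ["".join(sorted(s + tag))]
```

With two toy transforms, `("a-step", 0.6, add_group("a"))` and `("b-step", 0.3, add_group("b"))`, two steps from "X" reach "Xab" along both pathways; the second arrival is kept in the tree but marked as a duplicate, mirroring the labelling the text calls for. Pruning inside the loop, rather than after the fact, reflects the point that it is cheaper for the expert to cut branches while the tree is small.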

    At some point, either during tree generation or at the end, the metabolites
will be passed to another program which will identify those metabolites which are
identical or "similar" to known carcinogens.  These will be so marked in the
tree.

    Presently, the major task is the acquisition of the metabolism knowledge
base, i.e., the writing of the transformation library to be utilized.  Metabolism
experts at the National Institutes of Health are gleaning this information from
both their own research and the metabolism literature.  This information will be
encoded, and the first testing of this new application for the SECS program will
begin in June 1977.

D.  Current List of Project Publications

W.T. Wipke and P. Gund, "Simulation and Evaluation of Chemical Synthesis.
    Congestion: A Conformation Dependent Function of Steric Environment at a
    Reaction Center.  Application with Torsional Terms to Stereoselectivity of
    Nucleophilic Additions to Ketones," J. Am. Chem. Soc., 98, 8107 (1976).

W.T. Wipke, G. Smith and H. Braun, "SECS-Simulation and Evaluation of Chemical
    Syntheses: Strategy and Planning," ACS Symposium Proceedings, 1977.

W.T. Wipke, "Computer Planning of Research in Organic Chemistry," Proceedings of
    the Third International Symposium on Computers in Chemical Education,
    Research, and Technology, Caracas, Venezuela, 1976.

S.A. Godleski, P.v.R. Schleyer, E. Osawa, and W.T. Wipke, "The Systematic
    Prediction of the Most Stable Neutral Hydrocarbon Isomer," J. Am. Chem.
    Soc., 99, 0000 (1977).

F. Choplin, R. Marc, G. Kaufmann, and W.T. Wipke, "Computer Design of Synthesis
    in Phosphorus Chemistry.  Automatic Treatment of Stereochemistry," J. Am.
    Chem. Soc., 99, 0000 (1977).

Manuals:

SECS Users Manual, June 1976.
SECS Users Guide, Aug 24, 1976.
ALCHEM Tutorial, Sep 21, 1976.

II.  Interactions with SUMEX-AIM Resource

A.  Examples of Collaborations and Medical use of Programs via SUMEX.

    SECS is available in the GUEST area of SUMEX and has been accessed
experimentally by many others as well.  Professor R. V. Stevens (UCLA) explored
some syntheses of lycopodine while visiting Santa Cruz and, as a result, has
requested that UCLA obtain a graphics terminal so that he and others at UCLA can
access SECS via SUMEX.  Professor W. G. Dauben's group (Berkeley) has utilized the
SECS model builder on SUMEX and is now extending the capabilities of that module
of SECS.  Mr. Mel Spann of the National Library of Medicine toxicology program is
collaborating with us in developing a library for the metabolism of
catecholamines.  Also collaborating with us on metabolism are Drs. Ted Gram from
Guarino's lab, Harry Gelboin, Dhiren Thakker and Haruhiko Yagi from Jerina's lab,
Lance Pohl from Gillette's lab, Sidney Nelson from Mitchell's lab, Lionel Poirier
from Weisburger's lab, and Ken Chu and Sidney Siegel, all of whom are from the
National Cancer Institute.  Dr. Steve Heller of the EPA and Dr. G.A. Milne of
the National Heart and Lung Institute have expressed interest in putting SECS on
the Cyphernetics network as part of the NIH chemical information system.
Restrictions on the allowed core image size on that system have so far held up
the negotiations.

    For the past two years SECS has been available over TELENET from First Data
Corporation and has been accessed by industry: Squibb; Merck, Sharp and Dohme;
Pfizer; Searle; Lederle Labs; FMC; and, recently, 3M Corporation and Stauffer.
Dr. Beryl Dominy of Pfizer recently presented a paper before the Pharmaceutical
Manufacturers Association entitled "SECS and the Information Scientist," in which
he describes his experiences with SECS.  These include an example in which a
synthetic chemist who was having difficulty with a particular synthesis turned to
SECS for possible solutions.  SECS suggested another route as being better, and
indeed that is what the chemist found when he later tried it in the lab.

   The availability of SECS on SUMEX-AIM has also served health-related
research at the University of California, Santa Cruz.  Model building using the
SECS model builder is being performed for Professor Edward Dratz (UCSC) to
generate conformations of fatty acids isolated from visual membranes ("Structure
and Function of Visual Photoreceptors," EY00175), and for Professor Howard Wang
(UCSC) to study how conformations of steroids may affect the local
anesthetic-membrane interaction ("Role of Membrane Proteins in Local Anesthetic
Action," GM22242).

   We have assisted Professor J. E. McMurry in his synthetic work toward
aphidicolin and digitoxigenin by using the model builder to predict possible
reaction pathways.  An example is given below, in which the conformation of
the epoxy-ylide was calculated along with the strain energies of the two possible
closure products.

[Figure: structure of the epoxy-ylide, bearing an epoxide and a +PR3 ylide group]

   Utilizing the SECS model builder, we have shown that attack on the epoxide
to form the fused system should be much more favorable than attack to form the
bicyclic compound.  Similar studies have been undertaken to predict the
stereochemistry resulting from the acid-catalyzed cyclization of McMurry's
digitoxigenin precursor (HL-18118, "Total Synthesis of Cardiac Aglycones"):
application of SECS using a special library of cationic sigmatropic rearrangement
transforms generated the possible products, which facilitated identification of
some of the side products in the early cyclization experiments.  We have also
collaborated with Professor Phil Crews (UCSC) on the biogenesis of marine natural
products.  Dr. Wipke has also used several SUMEX programs, such as CONGEN, in his
course on Computers and Information Processing in Chemistry.

B.  Examples of Sharing, Contacts and Cross-fertilization with Other
SUMEX-AIM Projects.

    In collaboration with Dr. Ray Carhart and Dr. Dennis Smith of the
DENDRAL/CONGEN Project, a Computers in Chemistry Workshop was held at U.C. Santa
Cruz on the weekend prior to the Fall 1975 American Chemical Society National
Meeting held in San Francisco.  The workshop attracted participants representing
all parts of the chemical community: academia, industry and government.  Morning
lecture/discussion sessions introduced the SECS and CONGEN programs running on
SUMEX, and the afternoon and evening sessions allowed "hands-on" experience for
the participants.  The response of the workshop participants was very positive;
many showed so much interest that future collaboration and/or use of the
powerful non-numerical computing tools available on SUMEX was discussed.

   The SECS project has held joint research group meetings at Stanford with
the DENDRAL and AI groups to discuss common problems and research goals.  This
has been very rewarding, since the groups are complementary in orientation.
These joint meetings also let the members meet in person after having met
on-line on the network.  Last year's AIM Conference at Rutgers was also a
valuable experience, which allowed us to meet people interested in similar
problems in different disciplines.  It was particularly useful to have the
opportunity to talk with experts designing new languages for knowledge
representation and to hear them compare their systems.

6.2.3    HIGHER MENTAL FUNCTIONS PROJECT

Modeling of Higher Mental Functions

Kenneth M. Colby, M.D.

Professor of Psychiatry and Biobehavioral Sciences
   University of California at Los Angeles

I.  Summary of Research Program

A.  Technical Goals:

There are three technical goals of the Higher Mental Functions Project:

(1) To improve, and experiment "therapeutically" with, a computer simulation of
  paranoid processes in order to make treatment recommendations to
  clinicians based on experience with the model.

(2) To develop a new taxonomy of psychiatric patients based on the conceptual
  patterns appearing in accounts of their illnesses.

(3) To develop an intelligent speech prosthesis for patients suffering from
  communication disorders.

B.  Medical Relevance and Collaboration:

   The Higher Mental Functions Project is located in the Neuropsychiatric
Institute at UCLA.  The medical relevance of its research concerns the fields of
psychiatry and neurology.  The Project collaborates with clinicians and
investigators in psychiatry, neurology, the neural sciences and neurolinguistics.

C.  Progress Summary:

   We have improved the paranoid model to the point where it can be utilized
for therapy experiments.  (The model has now passed a true Turing Test, in which
it could not be distinguished from real patients.)

   The taxonomy effort is just under way, using the language recognition
program that serves as the front end of the paranoid model.  This program will
have to be extended and modified to find and classify the conceptual patterns
appearing in patients' accounts of their illnesses.

   We have interfaced a micro-processor with a voice-synthesizer to provide a
speech prosthesis for patients unable to speak.  The next step is to write an
"intelligent" algorithm which attempts to figure out what the patient is trying
to say from his partial input information.
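The report notes that this algorithm has yet to be written. As one hypothetical illustration of the idea, and not the project's design, a prosthesis could rank completions of a partially entered word by how often the patient has used each word before. The function name, vocabulary, and counts below are made up; a practical system would also need sentence context.

```python
from collections import Counter


def suggest(partial: str, vocabulary: Counter, k: int = 3) -> list[str]:
    """Return up to k words that begin with the typed fragment, most frequently
    used first, ties broken alphabetically (a hypothetical ranking scheme)."""
    fragment = partial.lower()
    matches = [word for word in vocabulary if word.startswith(fragment)]
    matches.sort(key=lambda word: (-vocabulary[word], word))
    return matches[:k]


# Made-up personal vocabulary with usage counts.
vocab = Counter({"water": 12, "want": 9, "walk": 4, "wife": 7, "doctor": 5})
print(suggest("wa", vocab))  # → ['water', 'want', 'walk']
print(suggest("w", vocab))   # → ['water', 'want', 'wife']
```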


II.  Interactions with the SUMEX-AIM Resource

A.  Collaborations:

   The project collaborated with Professor Jon Heiser, Department of
Psychiatry, University of California, Irvine,  and consulted with Professor Robert
K. Lindsay, Department of Psychology, University of Michigan, in conducting a
Turing Test of the paranoid model.  Other users of SUMEX have received advice and
suggestions regarding their problems as well as opportunities to contrast their
simulations with ours.  We have benefitted greatly from others' comments on the
adequacy and inadequacy of our paranoid model.

B.  Sharing, etc.:

   Members of the project have participated in two workshops held at Rutgers,
presenting several papers, chairing panels, and conducting discussion groups.
Informal discussions with large numbers of workers in Artificial Intelligence in
Medicine have led to a helpful sharing of ideas and techniques.  SUMEX is
valuable to us as a communication channel combining the advantages of a telephone
and the U.S. mail without the disadvantages of either.  For widely scattered
researchers, it facilitates the intimate, low-level communication which is
normally accomplished in hallways or around water coolers.  The individual
discussions are not very profound, but the cumulative effect subtly improves our
research.

   The existence of SUMEX as an independent project relieves numerous
researchers of the burden of separately financing and staffing a large
computer facility.

D.  Up-to-date List of Publications:

Colby, K.M., Parkison, R.C. and Faught, B.  Pattern-matching Rules for the
  Recognition of Natural Language Dialogue Expressions.  Am. J.
  Computational Linguistics, Microfiche 5, Sept., 1974.

Colby, K.M.  Clinical Implications of A Simulation Model of Paranoid
    Processes.  Archives of General Psychiatry, 33, 854-857, 1976.

Faught, W., Colby, K.M. and Parkison, R.C.  Inferences, Affects and Intentions
   in A Model of Paranoia.  Cognitive Psychology, 9, 153-187, 1977.

Colby, K.M.  An Appraisal of Four Psychological Theories of Paranoid
    Phenomena.  J. of Abnormal Psychology, 86, 54-59, 1977.

Parkison, R.C., Colby, K.M. and Faught, W.S.  Conversational Language
   Comprehension Using Integrated Pattern Matching and Parsing. Artificial
   Intelligence (In Press) 1977.

Colby, K.M., Christinaz, D. and Graham, S.  A Computer-driven, Personal,
   Portable and Intelligent Speech Prosthesis for Aphasic Disorders.   Brain
  and Language (In Press) 1977.

Colby, K.M.  On the Way People and Models Do It.  Perspectives in Biology and
   Medicine (In Press) 1977.

Heiser, J., Colby, K.M., Faught, W. and Parkison, R.C.  Testing Turing Test
   (Forthcoming).

Faught, W.S.  Conversational Action Patterns in Dialogs.  Proceedings of the
   Workshop on Pattern-directed Inference Systems, May, 1977.

6.2.4    INTERNIST PROJECT

INTERNIST - Diagnostic Logic Project

J. Myers, M.D. and H. Pople, Ph.D.
   University of Pittsburgh

I.  SUMMARY OF RESEARCH PROGRAM

A.  Objectives

   The principal objective of this research project has been and continues to
be the development, evaluation, and implementation of a computer-based diagnostic
consultation system for internal medicine.  This work, which was initiated at the
University of Pittsburgh approximately six years ago, has been supported for the
past three years by a grant from the Bureau of Health Resources Development. A
heuristic diagnostic program called INTERNIST has been developed, along with an
extensive medical database now comprising more than four hundred disease
categories and two thousand manifestations of disease.  The system has been
tested with a wide variety of difficult clinical problems: cases published in the
medical journals, CPC's, and other interesting and unusual problems arising in
the local teaching hospitals.  In the great majority of these test cases, the
heuristic INTERNIST program has proved to be effective in sorting out the pieces
of the puzzle and coming to a correct diagnosis.  In some cases, as many as six
distinct disease entities have been identified correctly.

    We believe that by the time of the expiration of the BHRD grant in June,
1977, our original objective, which was to develop a system providing expert
diagnostic capability with regard to the major diseases of internal medicine,
will have been accomplished to the extent possible in the current laboratory
framework.

    At that time, we propose to initiate a broader collaboration, which will
invite the participation of remote users in

(a) further evaluation of the INTERNIST programs and data-base.

(b) development of specialized data-bases and procedures for various medical
  subspecialties.

(c) refinement of the user interface.

(d) investigation of alternate uses of the INTERNIST data-base.

   We believe that the expansion of the experience base of INTERNIST users,
which will result from this type of collaboration, will significantly enhance the
further course of INTERNIST development.

B.  Progress Summary

   Expansion of the medical data-base to encompass new areas of disease is an
ongoing activity of the project.  Much of this work is carried out by medical
students who elect to take part in the project as part of their fourth-year
clinical rotation, with the period of participation varying from 6 to 18 weeks.

   Each student is assigned a group of diseases, usually in a specific
clinical area, for study.  The literature on a disease is studied exhaustively
for all available quantitative data.  Clinical experts on the faculty are
frequently consulted, particularly about controversial data.  The student
compiles a complex list of the manifestations of the disease under study and
assigns tentative measures of strength of association.

   The clinical principal investigator together with any other clinicians
working on the project then review the data exhaustively in order to assure the
appropriateness and completeness of the disease profile.

   The profile is then entered into the computer and tested for completeness
and reliability against a typical or "textbook" clinical case.  When
available, other cases of the disease from the floors of our university hospital
and published cases, such as the clinical-pathological conferences of the
New England Journal of Medicine and the American Journal of Medicine, are also
used.  Further refinement occurs in the course of continued use of the data-base.

   In addition to this data-base development, work on a refined diagnostic
program has also been an on-going activity during this period.

   The present INTERNIST process employs a 'problem-formation' heuristic,
which identifies one of perhaps several problems in a clinical case as its
initial focus of problem-solving attention.  Although only one problem is
considered at a time, the process recycles after each problem is solved, thereby
uncovering the entire complex of diseases present.  In the great majority of
clinical cases tested, this strategy of iterative problem formation and solution
has proved effective in sorting out the complexities of a case and rendering a
correct diagnosis.  In many respects, however, it seems clear that performance
could be significantly enhanced if the program were to attend to the various
component problems and their inter-relationships simultaneously.  A more global
problem-formation strategy could be expected to yield more rapid convergence on
the correct diagnosis in many cases, and in at least some cases to prevent
missed diagnoses.
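The one-problem-at-a-time cycle described above can be illustrated with a greedy sketch. This is an assumption-laden toy, not INTERNIST's actual scoring: each disease profile is a hypothetical map from manifestation to a made-up weight standing in for the "strength of association" compiled by the students, and a disease is concluded when it accounts for enough of the still-unexplained findings.

```python
def iterative_diagnosis(findings: set[str],
                        profiles: dict[str, dict[str, float]],
                        threshold: float = 1.0) -> list[str]:
    """Focus on one problem at a time: pick the disease that best accounts for the
    still-unexplained findings, conclude it, then recycle on the remainder."""
    unexplained = set(findings)
    diagnoses: list[str] = []
    while unexplained:
        # Score each not-yet-concluded disease by the weight of unexplained findings it covers.
        scores = {
            disease: sum(w for m, w in profile.items() if m in unexplained)
            for disease, profile in profiles.items()
            if disease not in diagnoses
        }
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break  # nothing left accounts for the residue strongly enough
        diagnoses.append(best)
        unexplained.difference_update(profiles[best])  # these findings are now explained
    return diagnoses


# Illustrative profiles with made-up weights (not medical data).
profiles = {
    "hepatitis": {"jaundice": 2.0, "fever": 1.0, "elevated ALT": 2.5},
    "pneumonia": {"fever": 1.0, "cough": 2.0, "infiltrate on x-ray": 2.0},
    "anemia": {"fatigue": 1.5, "pallor": 1.5},
}
case = {"jaundice", "elevated ALT", "fever", "cough", "infiltrate on x-ray", "rash"}
print(iterative_diagnosis(case, profiles))  # → ['hepatitis', 'pneumonia']
```

Note that once "hepatitis" is concluded, the shared finding "fever" no longer counts toward "pneumonia". Settling one problem before looking at the others loses that kind of inter-relationship, which is the limitation the text proposes to address with a more global strategy.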

   Alternative problem-formation strategies that exploit the type of
pseudoparallel processing facilitated by the INTERLISP 'spaghetti stack' are
presently being investigated.  We believe that this research will also set the
stage for subsequent development of a therapeutic management component of the
INTERNIST consultation facility; however, at present it is not possible to
project a precise timetable for the development of these additional
capabilities.

C.  Publications

1. Pople, H.E., Myers, J.D., & Miller, R.A., "The DIALOG Model of Diagnostic
   Logic and its Use in Internal Medicine."  Proceedings of the Fourth
   International Joint Conference on Artificial Intelligence, Tbilisi, USSR,
   September 1975.

2. Pople, H.E., "Artificial-Intelligence Approaches to Computer-based Medical
   Consultation."  Proceedings IEEE Intercon, New York, 1975.

3. Pople, H.E., "The Syntheses of Composite Hypotheses in Diagnostic Problem
   Solving: An Exercise in Hypothetical Reasoning."  Proceedings of the Fifth
   International Joint Conference on Artificial Intelligence, August 1977
   (forthcoming).

II.  INTERACTION WITH SUMEX-AIM RESOURCE

A.  Medical Use of Programs and Collaborations

    Because of the research and development nature of our work on the INTERNIST
system over the past several years, we have been somewhat limited in our ability
to establish widespread collaborations.  However, members of the medical house
staff in the local hospitals who have some prior experience with the project have
continued to work with INTERNIST while pursuing their medical training.  In
addition, project staff often have occasion to interact with individuals and
groups interested in the characteristics of the diagnostic system from both
medical and computer science perspectives.  Future plans for more extensive
collaboration are discussed in section III.

B.  AIM Interactions

   We have benefited considerably from interactions with other members of the
SUMEX-AIM community.  In June 1976 we participated in the AIM workshop at
Rutgers, which provided an excellent perspective on what else is going on in
the field.  During the past several months we have had useful exchanges with
Randy Davis, Victor Yu, and John Foy, three individuals participating in the
MYCIN project.  In addition, we routinely interact with SUMEX staff regarding
fine points and problems relating to our use of system facilities.

   The opportunity to keep abreast of developments in a fast-changing field is
one of the principal benefits to be derived from the collegial environment
fostered by SUMEX-AIM.

6.2.5    MEDICAL INFORMATION SYSTEMS LABORATORY

MISL - Medical Information Systems Laboratory

M. Goldberg, M.D. and B.H. McCormick, Ph.D.
University of Illinois at Chicago Circle

I) SUMMARY OF RESEARCH PROGRAM

A.) TECHNICAL GOALS

   The Medical Information Systems Laboratory (MISL) was established under
grant HM-0114 in Chicago to pursue three activities: i) construction of a
database in ophthalmology, ii) clinical knowledge system support, and iii)
network-compatible database design.  Priorities in year 04 of MISL's operation
are the same as in previous years: investigations into how to construct a
database in ophthalmology, and into distributed database design, are ancillary to
the exploration of a clinical knowledge system to support clinical decision
making.  We are developing ways to get reliable clinical information into the
ophthalmic database primarily because we are interested in getting significant
clinical decision support out of it.

B.) APPROACH AND MEDICAL RELEVANCE

B.1) Construction of the database in ophthalmology

   A specific aim of this project is to construct a workable database in
ophthalmology, using the outpatient population of the Illinois Eye and Ear
Infirmary.  We view this database as a testbed for developing clinical decision
support systems.  The Ophthalmology Department of the Illinois Eye and Ear
Infirmary provides an excellent environment for evaluating new techniques for
capturing and using clinical information.

B.2) Clinical knowledge support system

   The goals for clinical knowledge system development are to provide a
flexible user interface for a prototype relational database system, to devise
means of accessing alphanumeric and pictorial information stored in the database
system, and to provide efficient means for logically restructuring a database so
that it can be adapted to different operating environments in a network-
compatible distributed medical information network.

   No clinical database, however, has intrinsic significance beyond its
ability to support the diagnosis and management of disease.  Additional goals for
the clinical knowledge system are therefore to devise computer-based consultation
systems for glaucoma and selected retinal/choroidal diseases, and to provide
formal models which permit the relational development and evaluation of
rule-based consultation systems containing 2,000 to 10,000 rules.  Recognizing
that a continuum exists between physician-guided decision support and
computer-based consultation, we choose to describe these services as a Clinical
Knowledge System: the combination of a clinical database and rules for its
interpretation.
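The pairing of stored data with interpretive rules can be illustrated with a small forward-chaining sketch. Forward chaining is our assumption for illustration, since the report does not say how MISL's rules are executed. The rule identifiers, field names, and the pressure threshold are hypothetical examples, not clinical guidance.

```python
from typing import Any, Callable

Record = dict[str, Any]
# A rule: (identifier, premise over the facts, name of the fact it concludes).
Rule = tuple[str, Callable[[Record], bool], str]


def interpret(record: Record, rules: list[Rule]) -> Record:
    """Forward-chain over a patient record: fire every rule whose premise holds,
    add its conclusion (tagged with the rule that produced it), and repeat until
    no new conclusions appear."""
    facts = dict(record)
    changed = True
    while changed:
        changed = False
        for rule_id, premise, conclusion in rules:
            if conclusion not in facts and premise(facts):
                facts[conclusion] = rule_id
                changed = True
    return facts


# Illustrative rules only; the threshold is an example, not clinical guidance.
rules: list[Rule] = [
    ("R1", lambda f: f.get("iop_mmHg", 0) > 21, "elevated_iop"),
    ("R2", lambda f: "elevated_iop" in f and f.get("field_defect", False), "suspect_glaucoma"),
]
print(interpret({"iop_mmHg": 28, "field_defect": True}, rules))
# → {'iop_mmHg': 28, 'field_defect': True, 'elevated_iop': 'R1', 'suspect_glaucoma': 'R2'}
```

Recording which rule established each conclusion keeps the database and its interpretation separable, so conclusions can later be explained or re-derived when rules change.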

C.) PROGRESS SUMMARY (INCLUDING ITEMS OF INTEREST TO SUMEX-AIM COMMUNITY ONLY)

C.1) The database in ophthalmology

    Physician terminals and interfaces to ophthalmic instruments have been
positioned in the general eye clinic and several key ophthalmic subspecialty
clinics.  Systematic, modular hardware and software for clinical source data
acquisition have been established.  The clinical support system computer will
shortly be transferred to the newly dedicated Goldberg Research Center, adjacent
to the Illinois Eye and Ear Infirmary.  We look forward to stabilizing the
hardware configuration, telecommunication linkages and software support.

C.2) Clinical knowledge system support

C.2.a) Development of the relational database includes the following:

- A user interface through which unsophisticated users communicate with the
database.

- An intelligent coupler that serves as an intermediary between the end user
and the distributed database system.  The coupler listens to the user's
retrieval requests; helps the user formulate his requests correctly;
efficiently translates the user's retrieval requests into a network-compatible
retrieval command language; and obtains authorization from the system for
data retrieval and/or update.

- Tools for picture data management.  Graphical indexing techniques are
provided so that the clinical researcher and physician can easily retrieve
pictorial/graphical information from the medical database.

- Means for logical database synthesis.  This involves conversion of the user's
view of the database into a logically coherent physical organization.

C.2.b) Development of a computer-based consultation system for diagnosis and
     management of glaucoma.

   This involves ongoing collaboration between Dr. Jacob Wilensky at MISL
and, through SUMEX-AIM, other investigators around the United States.  Included
are the original investigators in glaucoma consultation: Dr. Casimir Kulikowski
(Rutgers), Dr. Shalom Weiss (Mt. Sinai Hospital, NY), and Dr. Aaron Safir (Mt.
Sinai Hospital).

C.2.c) Development of a consultation system for diagnosis and management of
     retinal/choroidal diseases.

   A design has been proposed (in Walser and McCormick, see below) for MEDICO,
a consultation system that advises non-expert physicians in the management of
chorioretinal diseases.  In addition, a major subsystem of MEDICO, responsible
for mediating the acquisition and organization of rules, has been implemented.

C.2.d) Formal models for consultation systems.

   Petri nets have been studied, primarily by Murata (see below), as a formal
representation for interacting parallel processes.  Petri nets are similar to
causal networks, as described by Kulikowski and Weiss at Rutgers, except that
cyclic activity is easily represented with Petri nets.  The similarity between
Petri nets and inference nets has also been noted (Walser and McCormick).  The
utility of the Petri net framework for modeling physical processes was explored
by Walser through the construction of a simulated coffee maker.  Further studies
are planned.
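The basic Petri net firing rule, and the way a token returning to its starting place captures cyclic activity, can be sketched as follows. This is a minimal illustration of ordinary Petri nets under stated assumptions: arc multiplicity is expressed by listing a place more than once. The coffee-maker places and transition are our own hypothetical stand-ins, not Walser's model.

```python
from collections import Counter
from dataclasses import dataclass

Marking = dict[str, int]  # number of tokens held in each place


@dataclass(frozen=True)
class Transition:
    name: str
    inputs: tuple[str, ...]   # a place listed twice needs two tokens
    outputs: tuple[str, ...]


def enabled(marking: Marking, t: Transition) -> bool:
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= n for p, n in Counter(t.inputs).items())


def fire(marking: Marking, t: Transition) -> Marking:
    """Consume one token per input arc and produce one token per output arc."""
    if not enabled(marking, t):
        raise ValueError(f"transition {t.name!r} is not enabled")
    new = dict(marking)
    for p in t.inputs:
        new[p] -= 1
    for p in t.outputs:
        new[p] = new.get(p, 0) + 1
    return new


# Hypothetical coffee maker: the heater token returns to "heater_idle" after
# brewing, a cycle that the net represents directly.
brew = Transition("brew",
                  inputs=("water", "grounds", "heater_idle"),
                  outputs=("coffee", "heater_idle"))
m0 = {"water": 2, "grounds": 1, "heater_idle": 1}
m1 = fire(m0, brew)
print(m1)                 # → {'water': 1, 'grounds': 0, 'heater_idle': 1, 'coffee': 1}
print(enabled(m1, brew))  # → False
```

After one firing, the heater is idle again but no grounds remain, so the transition is disabled until a token is added to "grounds".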

D.) LIST OF MISL PUBLICATIONS

Chang S. K., Donato N., McCormick B. H., Reuss J., and Rocchetti R. (1977) A
   relational database system for pictures.  Proc. IEEE Workshop on Picture
   Data Description and Management, April 20-22, 1977, Chicago, Illinois.

Chang S. K. and Cheng W. H. (1975) A database skeleton and its application to
   logical database synthesis.  MISL report M.D.C. 1.1.17.

Chang S. K. and McCormick B. H. (1975) An intelligent coupler for distributed
   database systems.  MISL report M.D.C. 1.1.7.

Malone J. E. (1976) Interval generalization of structure representation.  MISL
   report M.D.C. 1.1.22.

Malone J. E. (1975) User's guide to uniclass cover synthesis.  MISL report M.D.C.
   4.4.1.

Malone J. E. (1975) Addendum to AQVAL/1 (AQ7), part 1: User's guide and program
   description.  MISL report M.D.C. 4.4.1.

Manacher G. K. (1977) The case for strong loops and selection structures in
   ordinary computer languages.  MISL report M.D.C. 1.1.21.

Manacher G. K. (1975) On the feasibility of implementing a large relational data
   base with optimal performance on a minicomputer.  Proc. International
   Conference on Very Large Data Bases, Framingham, Mass.

McCormick B. H. and Nordmann B. J. Jr. (1977) Modular asynchronous control
   design.  Forthcoming in IEEE Transactions on Computers.  Also MISL report
   M.D.C. 1.1.25.

McCormick B. H. and Aflendola H. C. (1977) Cytospectrometers for subcellular
   particles and macromolecules: design considerations.  Presented at Workshop
   on Theory, Design and Biomedical Applications of Solid State Chemical
   Sensors, Case Western Reserve University, March 23-30, 1977.  Also MISL
   report M.D.C. 1.1.24.

McCormick B. H. and Wilensky J. (1975) Clinical knowledge acquisition: design of
   a relational data base in ophthalmology.  Proc. Second Annual Medical
   Information Systems Conference, Urbana, Ill.

McCormick B. H., Goldberg M. F., and Read J. S. (1974) Clinical decision-making:
   design of a data base in ophthalmology.  Proc. First Annual Medical
   Information Systems Conference, Urbana, Ill.

Michalski R. S. and Chang S. K. (1976) A self-model for a relational database.
   MISL report M.D.C. 1.1.16.

Michalski R. S. (1975) On the selection of representative samples from large
   relational tables for inductive inference.  MISL report M.D.C. 1.1.9.

Murata T. (1976) On liveness and other properties of E-nets.  MISL report M.D.C.
   1.1.15.

Murata T. (1975) Bibliography on Petri nets and related topics.  MISL report
   M.D.C. 1.1.20.

Murata T. (1976) A method for synthesizing marked graphs from given markings.
   Presented at 17th Annual Symposium on Foundations of Computer Science,
   October 25-27, Houston, Texas.

Murata T. (1976) On deadlock and the liveness of E-nets.  Presented at the 17th
   Annual Symposium on Foundations of Computer Science, October 25-27,
   Houston, Texas.

Murata T. (1975) State equation, controllability, and maximal matchings of Petri
   nets.  MISL report M.D.C. 1.1.10.

Murata T. and Church R. W. (1975) Analysis of marked graphs and Petri nets by
   matrix equations.  MISL report M.D.C. 1.1.8.

Vere S. A. (1975) Induction of concepts in the predicate calculus.  Proc. Fourth
   IJCAI.

Vere S. A. (1975) Relational production systems.  Forthcoming in Artificial
   Intelligence.  Also MISL report M.D.C. 1.1.5.

Walser R. L. and McCormick B. H. (1976) Organization of clinical knowledge in
   MEDICO.  Proc. Third Illinois Conference on Medical Information Systems,
   Urbana, Ill.

Walser R. L. and McCormick B. H. (1977) A system for priming a clinical knowledge
   base.  Forthcoming in Proc. 1977 National Computer Conference, June 13-16,
   Dallas, Texas.

II) INTERACTION WITH SUMEX-AIM RESOURCE

A.) COLLABORATION

   Major collaboration at present is through the ONET, involving the
ophthalmology departments of five medical schools.  Dr. Jacob Wilensky is
actively engaged in evaluating and modifying the Glaucoma Consultation Program,
written originally by Shalom Weiss.

6.2.6    RUTGERS COMPUTERS IN BIOMEDICINE

Rutgers Research Resource - Computers in Biomedicine

  Principal Investigator:  Saul Amarel
Rutgers University, New Brunswick, New Jersey

I) SUMMARY OF RESEARCH PROGRAM

A) Goals and Approach

    The fundamental objective of the Rutgers Resource is to develop a
computer-based framework for significant research in the biomedical sciences and
for the application of research results to the solution of important problems in
health care.  The focal concept is to introduce advanced methods of computer
science, particularly in artificial intelligence, into specific areas of
biomedical inquiry.  The computer is used as an integral part of the inquiry
process, both for the development and organization of knowledge in a domain and
for its utilization in problem solving and in processes of experimentation and
theory formation.

   The Resource community includes 48 researchers: 30 members, 8 associates
and 10 collaborators.  Members are mainly located at Rutgers.  Collaborators are
located at several distant sites, and they interact via SUMEX-AIM with Resource
members on a variety of projects, ranging from system design and improvement to
clinical data gathering and system testing.  At present, collaborators are
located at the Mt. Sinai School of Medicine, N.Y.; Washington University School
of Medicine, St. Louis, Mo.; Johns Hopkins Medical Center, Baltimore, Md.;
Illinois Eye and Ear Infirmary, Chicago, Ill.; and the University of Miami.

   Research in the Rutgers Resource is directed both to "discipline-oriented"
projects in medicine and psychology and to "core" projects in computer science
that are closely coupled with the discipline-oriented studies.  Work in the
Resource is organized in three AREAS OF STUDY, each comprising several projects.
The areas of study and the senior investigators in each are:

(1) Medical Modeling and Decision Making (C. Kulikowski, A. Safir).

(2) Modeling Belief Systems and Common-sense Reasoning (C.F. Schmidt, N.S.
  Sridharan).

(3) Artificial Intelligence: Representations, Reasoning and System Development
  (S. Amarel).

   In addition, the Rutgers Resource is sponsoring an Annual National AIM
Workshop, whose main objectives are to strengthen interactions between AIM
activities, to disseminate research methodologies and results, and to stimulate
collaborations and imaginative resource sharing within the framework of AIM.  The
second AIM Workshop was held near the New Brunswick Rutgers campus on June 1-4,
1976.  The third Workshop is scheduled for July 6-8, 1977.

B) Medical Relevance; Collaborations

   A major part of our research focuses on the development of computer-based
medical consultation systems.  We are applying artificial intelligence
approaches to problems of: knowledge acquisition from experts in a medical
specialty and from their clinical experience; the representation and management
within the computer of these complex and changing data bases of medical
knowledge; and the development of a sufficiently rich repertoire of reasoning
strategies for diagnosis, prognosis, therapy selection, explanation and teaching.
By linking such a system to a data base of prospectively chosen cases, we are in
a position to provide a powerful tool for clinical research with built-in
interpretive capabilities.

   Our approach emphasizes the development of clinically useful models that
describe the pathophysiology and dysfunction of diseases, and their application
to a variety of tasks:

a) Consultation embodying expert knowledge, which is expressed in terms
acceptable to the clinician;

b) Clinical research aid, assisting the investigator to:

i) Summarize and incorporate his knowledge, experience, and opinions into
a computer system;

ii) Analyze his data, check it against that of other investigators, and pool
  it when appropriate to draw stronger conclusions based on a larger
  sample of cases;

iii) Test, evaluate and modify the data base of models and decision
  strategies to produce an up-to-date summary of experience in his
   specialty.

c) Screening and diagnosis, to aid nursing or paramedical personnel in
performing routine decision procedures within restricted medical
environments;

d) Instruction to provide practitioners and support personnel with
appropriate explanation and guidance in clinical decision-making.

   A novel aspect of our work is the creation of a network of clinical
investigators who collaborate on the testing and continued development of the
computer programs needed to accomplish the above tasks.  During 1976, the
ophthalmological network (ONET) of glaucoma investigators grew and established
itself, with several significant collaborative research projects currently
underway.  The glaucoma consultation program, based on the causal-associational
network (CASNET) model developed within the Rutgers Resource, was jointly
presented by the ONET members at the 1976 meeting of the Association for
Research in Vision and Ophthalmology.  An important new emphasis has been the
incorporation into the consultation program of alternative expert opinions on
subjects currently under debate.  Dr. Douglas Anderson of the Bascom-Palmer Eye
Institute at the University of Miami has joined ONET to provide such alternatives
and to strengthen the glaucoma model in certain important areas.  The SUMEX-AIM
shared computer resource has been essential to the activities of ONET.

   The knowledge base and the strategies of our CASNET glaucoma consultation
system are being strengthened and refined continuously in the ONET environment.
The system has now reached a point where leading ophthalmologists consider it
"highly competent to expert" in several subspecialties of glaucoma.  The ONET
group was confident enough in the system to demonstrate it at the October 1976
meeting of the American Academy of Ophthalmology and Otolaryngology.  Reactions
to the system were very favorable.  The responses of an independent sample of
ophthalmologists, taken at this meeting, strongly emphasized the importance of
the system for glaucoma research.

   In addition to the main glaucoma research activities, the Resource has
collaborated with the Mt. Sinai-Rutgers Health Care Computer Laboratory in the
development of models for refraction and visual fields.  These will be used by
clinical prototype programs for guiding paramedical personnel in data acquisition
and decision-making.  These programs run on the PDP-11 computers of the clinical
ophthalmological system at Mt. Sinai, which are to be linked to the PDP-10 at
Rutgers for access to the more complex models of disease when they are needed.
The activities in conjunction with the Health Care Computer Laboratory reflect
the more applied aspects of our work in the medical area.

    The collaboration with Dr. R. Nordyke of the Straub Clinic on thyroid
disease consultation systems has continued at a low level of activity during
1976.

   In the area of Belief Systems, collaboration has continued with Professor
Andrea Sedlak and her group at the University of North Carolina.  This
collaboration focuses on developmental aspects of action perception.

    In the AI Area we had extensive interactions with researchers in several
institutions on problems of representation, problem solving systems, natural
language processing, automatic programming, data base systems, and interactive
systems.  Contacts continued with the natural language group at BBN (Woods,
Bruce) on the design of natural language processors for medical systems. Also,
we had contacts with the Stanford-Xerox group (Winograd, Bobrow) which is
involved in the development of KRL (Knowledge Representation Language).

   Following the Rand Workshop on Biomedical Modeling (February 18-20, 1976),
in which S. Amarel participated, preliminary contacts were initiated with Dr. D.
Garfinkel of the University of Pennsylvania concerning possible applications of
AI methods to the modeling of metabolic processes.

   Our close contacts with the Stanford projects on Heuristic Programming
(Drs.  Buchanan, Feigenbaum, Lederberg) are continuing.   The orientation and
approach of these Stanford projects are very similar to ours.  We continue to
share with the investigators in DENDRAL and METADENDRAL a strong interest in
computer-based methods of scientific inference and in AI ideas and techniques for
representation of knowledge in computers, diagnostic problem solving and theory
formation.

   One of the significant collaborative developments this period was the joint
work of Ed Feigenbaum and his students at Stanford, and Saul Amarel and his
students at Rutgers, on the development of an AI Handbook.  This handbook is
being prepared on the SUMEX-AIM and RUTGERS-10 computers, and it is intended to
provide a network-accessible encyclopedic coverage of the AI field for the AIM
community and AIM guests.

C) Progress Summary

1. Areas of Study and Projects

a) Medical Modeling and Decision-Making

   The main achievements of the past year were: the consolidation of the
ophthalmological network (ONET) of collaborating glaucoma investigators using the
SUMEX-AIM shared resource facility; the testing and improvement of the CASNET
consultation system with the help of the collaborators; the design and
implementation of a time-oriented database system and a set of analysis programs
for aiding joint clinical research activities within ONET; and the development of
a new knowledge-based consultation system (IRIS).

   The network of investigators in glaucoma is designed to foster the
development of consultation systems that embody sufficient depth of knowledge and
expert opinion in a variety of subareas to be useful as research and teaching
tools.  The collaborative activities, coordinated by Dr. A. Safir at Mt. Sinai,
bring together selected scientist-users with complementary interests and
strengths in different aspects of glaucoma, and Resource investigators who are
concentrating on the development of new computer science methodologies in
modeling and problem solving.  During this period, the CASNET glaucoma
consultation program has been tested more extensively.  The collaborators met
several times to discuss the structure of the glaucoma model and suggested many
improvements and additions.  A significant new capability of the program is the
inclusion of alternative interpretations that capture differences of opinion
among the experts on aspects of the model that are currently under debate.

   A new development during this period has been the implementation of a
time-sequenced data base for glaucoma, which has the dual purpose of aiding the
clinical research of ONET collaborators and of providing a systematic means for
evaluating and improving the performance of the consultation programs.

   In the area of general methods and systems we have developed a multilevel
semantic network representation for characterizing disease processes, their
anatomical descriptions and their taxonomic identification.  This representation
is used by a set of normative rules for diagnostic, prognostic and therapeutic
reasoning, resulting in a very general and flexible system for clinical
consultation.  A prototype model called IRIS is being developed using the
glaucoma knowledge base.
We have also continued our investigations of other representation paradigms: a
frame-based approach and the relationship to mathematical models of optics and
refraction.  Another subproject is concerned with developing methods of inference
over network structures that will permit us to incorporate the results of
clinical experience with different groupings of case-types into the models of
consultation, aiding at the same time in the evaluation of the programs.

b) Modeling of Belief Systems and Common-Sense Reasoning

   During this period a major achievement was the development and
implementation of the AIMDS system.  This is an MDS-based system that is
specialized and augmented for use in modeling reasoning about actions.  A
noteworthy aspect of the system is the use of the MDS concepts of Consistency
Conditions and Residues to guide frame instantiations and the drawing of further
inferences from such frame instantiations.

   The BELIEVER theory is a psychological model of the processes involved in
the interpretation of, and common-sense reasoning about, observed human actions.
The AIMDS system is being constructed to provide a framework for formulating,
studying and testing the BELIEVER theory.  The computer system and the
psychological theory are growing together, and they are strongly influencing each
other's development.  The domain of common-sense reasoning about actions
represents a prototypical example of knowledge based reasoning.  The richness of
the psychological data that this theory must explain, namely, persons' linguistic
descriptions and summarizations of everyday behavior, has forced us to think very
carefully about how knowledge is to be represented and used.  Out of this has
emerged a general scheme that not only seems psychologically plausible but also
appears to provide a useful framework for viewing a wide variety of problems of
interpretation including medical diagnosis and theory-based interpretive problems
involved in organic chemistry.

   Along with the implementation of the system, we have developed the
representation of the central knowledge components of the BELIEVER theory.   The
central common-sense concepts of Person, Plan and Act have been represented as
frames.  These frames are highly articulated structures which express the core
assumptions of the common-sense psychological theory.  By expressing these
concepts as frames we have been able to provide a representation of these
assumptions that can be used to guide and control the overall processes of
reasoning about particular persons, plans and actions.  The procedural components
of the theory have been defined and are closely linked to these frames.  This
interplay and association between processes and highly articulated structures
promises to provide a basis for strongly decomposing the knowledge of the domain.
Since the interdependencies of these concepts are represented structurally rather
than procedurally, the active database of our MDS-based system provides the basis
for communication and cooperation between the processes that monitor these
person, plan and act frames.

   The definition of these central structural components, together with the
general system components, has also provided a competence theory within which
detailed predictions of the BELIEVER theory were specified.  These predictions
about the structure of summary protocols were tested and borne out by the data.
This provides one of the few examples of the verification of predictions derived
from work on the development of psychological theory using AI concepts in the
process of theory formation.

c) Artificial Intelligence: Representations, Reasoning and
    Systems Development

    Our work in this area continues to be oriented toward collaboration with
investigators in other Resource projects and toward the study of basic AI
problems related to Resource applications.  The collaborations involve adapting
and augmenting existing AI methods and techniques to handle specific key problems
identified in the application projects.

    The close collaboration with investigators in the Belief Systems area has
resulted this year in the development of the AIMDS System for handling problems
of action interpretation of the type encountered in the domain of the BELIEVER
theory.  This system has provided one of the first examples of a working frame-
based AI system.  In addition, it has led to several important AI results, such
as elucidation of the "frame problem" and unification of previous approaches to
planning in heuristic problem solving.

   Our research in language processing has led this period to two important
applications, in Medical Systems and in Belief Systems.  In one project, the
PEDAGLOT system is being adapted to provide a natural language interface for
communicating patient case histories to our glaucoma system.  In a second
project, PEDAGLOT is providing the basis for implementing the experimental
component of a competence theory within which the BELIEVER theory can be
evaluated.  Empirical work in this area requires the ability to process summaries
and other natural language data.

   In the basic component of our work on language processing, we continued to
develop a language inference system based on a "developmental paradigm" for
grammar acquisition.  We made progress in the area of coalescing rules of
hypothesized grammars, and we started to look into ways of using semantic
information to guide the hypothesis formation process.

   In another project, which is also focusing on hypothesis formation, we are
studying processes of computer assisted acquisition of domain knowledge from
empirical data, where knowledge is in the form of weighted production rules.
This type of knowledge can be represented as a stochastic graph.  This year we
obtained several new results in this area.  We explored the implications of these
results with the help of an experimental program which constructs a stochastic
graph from empirical data.  Also, we wrote a program which makes use of a file of
graph-structured knowledge to make decisions about a domain.
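
   As an illustration only, the simplest reading of "constructing a stochastic
graph from empirical data" can be sketched in Python: count the observed
transitions between states, normalize each node's outgoing counts into
probabilities, and read each weighted edge a -> b as a weighted production rule.
The function names, the input format (lists of state sequences) and the
most-probable-successor decision rule are hypothetical assumptions made for this
sketch; they do not describe the Resource's actual program.

```python
from collections import Counter, defaultdict


def build_stochastic_graph(sequences):
    """Estimate edge weights P(b | a) from observed state sequences.

    Each weighted edge a -> b can be read as a weighted production
    rule "a -> b with weight p". Hypothetical helper, for illustration.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent pair (a, b) observed in the data.
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    graph = {}
    for a, successors in counts.items():
        total = sum(successors.values())
        # Normalize so the outgoing weights of each node sum to 1.
        graph[a] = {b: n / total for b, n in successors.items()}
    return graph


def most_likely_successor(graph, state):
    """Decision step (assumed): return the highest-weight successor, or None."""
    successors = graph.get(state)
    if not successors:
        return None
    return max(successors, key=successors.get)


observations = [["S", "A", "B"], ["S", "A", "C"], ["S", "A", "B"]]
graph = build_stochastic_graph(observations)
print(graph["A"])
print(most_likely_successor(graph, "A"))
```

For these three observations the sketch gives A -> B a weight of 2/3 and
A -> C a weight of 1/3, so the decision step chooses B.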

    In our work on theory formation in programming, we developed a formation
strategy which combines a global, model-guided approach with a local analysis of
special cases.  To study this strategy experimentally, we are now developing a
system for acquiring and handling information about programs in various stages
of specification, as well as other knowledge relevant to the formation task.

   During this period we made important progress in building a strong basis of
AI languages for our work.  The UCI-LISP and FUZZY programming languages were
adapted to the RUTGERS-10 and they were further improved.  The availability of
these languages made possible the implementation of major parts of AIMDS over a
relatively short period of time.  Work has now started on exploring the use of
FUZZY (including its features for effective use of incomplete and/or uncertain
knowledge) and AIMDS in certain problems of medical decision making.

2. AIM Workshop

   The Second AIM Workshop took place June 1 to 4, 1976 near the Rutgers
campus, and it was attended by about 150 participants.  The program included
reviews of recent AI developments in Medicine, Biochemistry and Psychology;
lectures and panel discussions on knowledge representation and AI system design;
papers summarizing recent AI work in other application areas (outside AIM); and
presentations of current research on computer-based biomathematical models.  The
Workshop included panels on networking and shared resources; in addition, there
were a number of informal meetings in which specific projects or issues were
discussed in depth.  Hands-on experimentation and demonstration of AI systems
(which were accessed via TYMNET and ARPANET) were an important feature of the
Workshop.  All indications are that the Workshop was very effective in
stimulating scientific interactions and in disseminating work being done in the
area of AIM.

   In support of the AIM Workshop series we devoted considerable effort this
period to systems development, to related computer and networking enhancements,
to preparing proceedings for the first Workshop, and to preparing comprehensive
supporting documentation for the second.

   A panel on Applications of AI to Science and Medicine was organized for the
week following the Second AIM Workshop at the National Computer Conference in New
York.  It was intended to further augment the dissemination activities of AIM by
bringing to a wide audience of professionals in the computer field recent
developments in the AIM community.

D) Up-to-Date List of Publications

Amarel, S. and Kulikowski, C. (1972) "Medical Decision Making and Computer
   Modeling", Proc. of 5th International Conference on Systems Science,
   Honolulu, January 1972.

Amarel, S. (1974) "Inference of Programs from Sample Computations", Proc. of
   NATO Advanced Study Institute on Computer Oriented Learning Processes, 1974,
   Bonas, France.

Amarel, S. (1974) "Computer-Based Modeling and Interpretation in Medicine and
   Psychology: The Rutgers Research Resource", Proc. of Conference on the
   Computer as a Research Tool in the Life Sciences, June 1974, Aspen, by
   FASEB; also appears as Computers in Biomedicine TR-29, June 1974, Rutgers
   University; also in Computers in Life Sciences, W. Siler and D. Lindberg
   (eds.), FASEB and Plenum, 1975.

Amarel, S. (1976) Abstract of Panel on "AI Applications in Science and Medicine"
   in 1976 National Computer Conference Program, N.Y., June 7-10, 1976.

Bruce, B. (1972) "A Model for Temporal Reference and its Application in a Question
   Answering Program", Artificial Intelligence, Vol. 3, Spring 1972.

Bruce, B. (1973) "A Logic for Unknown Outcomes", Notre Dame Journal of Formal
   Logic; also appears as Computers in Biomedicine, TM-35, Nov.  1973, Rutgers
   University.

Bruce, B. (1973) "Case Structure Systems", Proc. 3rd International Joint
   Conference on Artificial Intelligence (IJCAI), August 1973.

Bruce, B. (1975) "Belief Systems and Language Understanding", Current Trends in
   the Language Sciences, Sedelow and Sedelow (eds.), Mouton, in press.

Chokhani, S. and Kulikowski, C.A. (1973) "Process Control Model for the
   Regulation of Intraocular Pressure and Glaucoma", Proc. IEEE Systems, Man &
   Cybernetics Conf., Boston, November 1973.

Chokhani, S. (1975) "On the Interpretation of Biomathematical Models Within a
   Class of Decision-Making Procedures", Ph.D. Thesis, Rutgers University;
   also Computers in Biomedicine TR-43, May 1973.

Fabens, W. (1972) "PEDAGLOT.  A Teaching Learning System for Programming
   Language", Proc.  ACM Sigplan Symposium on Pedagogic Languages, January
   1972.

Fabens, W. (1975) "PEDAGLOT and Understanding Natural Language Processing",
   Proc. of the 13th Annual Meeting of the Asso. for Computational
   Linguistics, October 30 - Nov. 1, 1975.

Kulikowski, C.A. and Weiss, S. (1972) "Strategies for Data Base Utilization in
   Sequential Pattern Recognition", Proc. IEEE Conf. on Decision and Control,
   Symp. on Adaptive Processes, December 1972.

Kulikowski, C.A. and Weiss, S. (1973) "An Interactive Facility for the Inferential
   Modeling of Disease", Proc. 7th Annual Princeton Conf. on Information
   Sciences and Systems, March 1973.

Kulikowski, C.A. (1973) "Theory Formation in Medicine: A Network Structure for
   Inference", Proc. International Conference on Systems Science, January
   1973.

Kulikowski, C.A., Weiss, S. and Safir, A. (1973) "Glaucoma Diagnosis and Therapy
   by Computer", Proc. Annual Meeting of the Asso. for Research in Vision and
   Ophthalmology, May 1973.

Kulikowski, C.A. (1973) "Medical Decision-Making and the Modeling of Disease",
    Proc.  First Interntl.  Conf.  on Pattern Recognition, October 1973.

Kulikowski, C.A. (1974) "Computer-Based Medical Consultation - A Representation
   of Treatment Strategies", Proc.  Hawaii Interntl.  Conf.  on Systems
   Science, Jan.  1974.

Kulikowski, C.A. (1974) "A System for Computer-Based Medical Consultation", Proc.
   Natl. Computer Conf., Chicago, May 1974.

Kulikowski, C.A. and Safir, A.  (1975) "Computer-Based Systems Vision Care",
   Proceedings IEEE Intercon, April 1975.

Kulikowski, C.A. and Trigoboff, M. (1975) "A Multiple Hypothesis Selection System
   for Medical Decision-Making", Proc. 8th Hawaii Internatl. Conf. on
   Systems.

Kulikowski, C. and Sridharan, N.S. (1975) "Report on the First Annual AIM Workshop
   on Artificial Intelligence in Medicine", Sigart Newsletter No. 55, December
   1975.

Kulikowski, C. (1976) "Computer-Based Consultation Systems as a Teaching Tool in
   Higher Education", 3rd Annual N.J. Conf. on the Use of Computers in Higher
   Education, March 1976.

Kulikowski, C., Weiss, S., Safir, A. et al. (1976) "Glaucoma Diagnosis & Therapy by
   Computer: A Collaborative Network Approach", Proc. of ARVO, April 1976.

Kulikowski, C., Weiss, S., Trigoboff, M. and Safir, A. (1976) "Clinical Consultation
   and the Representation of Disease Processes: Some AI Approaches", AISB
   Conference, Edinburgh, July 1976.

LeFaivre, R. and Walker, A. (1975) "Rutgers Research Resource on Computers in
   Biomedicine", Sigart Newsletter No. 54, October 1975.

LeFaivre, R., (1976) "Procedural Representation in a Fuzzy Problem-Solving
   System", Proc. Natl. Computer Conf., New York, June 1976.

LeFaivre, R. (1977) "Fuzzy Representation and Approximate Reasoning", submitted to
   IJCAI-77, MIT.

Mathew, R., Kulikowski, C. and Kaplan, K. (1977) "A Multileveled Representation for
   Knowledge Acquisition in Medical Consultation Systems", Proc. MEDINFO 77 (in
   press).

Mauriello, D. (1974) "Simulation of Interaction Between Populations in Freshwater
   Phytoplankton", Ph.D. Thesis, Rutgers University, 1974.

Schmidt, C. (1972) "A comparison of some unidimensional, multidimensional and
   set theoretic models for the prediction of judgements of trait implication",
   Proc. Eastern Psych. Asso. Meeting, Boston, April 1972.

Schmidt, C.F. and D'Addamio, J. (1973) "A Model of the Common Sense Theory of
   Intention and Personal Causation", Proc. of the 3rd IJCAI, August 1973.

Schmidt, C.F. and Sedlak, A. (1973) "An Understanding of Social Episodes", Proc.
   of Symposium on Social Cognition, American Psych.   Asso.  Convention,
   Montreal, August 1973.

Schmidt, C.F. (1975) "Understanding Human Action", Proc. Theoretical Issues in
   Natural Language Processing: An Interdisciplinary Workshop in Computational
   Linguistics, Psychology, Artificial Intelligence, Cambridge, Mass., June
   1975.  Also appears as Computers in Biomedicine, TM-47, June 1975, Rutgers
   University.

Schmidt, C. (1975) "Understanding Human Action: Recognizing the Motives",
   Cognition and Social Behavior, J.S. Carroll and J. Payne (eds.), New York:
   Lawrence Erlbaum Associates, in press.  Also appears as Computers in
   Biomedicine, TR-45, June 1975, Rutgers University.

Schmidt, C.F., Sridharan, N.S., and Goodson, J.L. (1976) "Recognizing Plans and
   Summarizing Actions", Proceedings of the Artificial Intelligence and
   Simulation of Behavior Conference, University of Edinburgh, Scotland, July
   1976.

Schmidt, C. (1976) "Understanding Human Action: Recognizing the Plans and Motives
   of Other Persons", in Cognition and Social Behavior, J. Carroll and J. Payne
   (eds.), Potomac, Maryland: Lawrence Erlbaum Associates, 1976.

Schmidt, C.F. and Goodson, J.L. (1976) "The Subjective Organization of Summaries
   of Action Sequences", 17th Annual Meeting of the Psychonomic Society, St.
   Louis, 1976.

Sedlak, A.J. (1974) "An Investigation of the Development of the Child's
   Understanding and Evaluation of the Actions of Others", Ph.D. Thesis,
   Rutgers University.

Sridharan, N.S. (1976) "The Frame and Focus Problems in AI: Decision in Relation
   to the BELIEVER System", Proceedings of the Conference on Artificial
   Intelligence & the Simulation of Human Behavior, Edinburgh, July 1976.

Sridharan, N.S. (1976) "An Artificial Intelligence System to Model and Guide
   Organic Chemical Synthesis", Planning in Chemical Synthesis & Computer,
   American Chemical Society Press, September 1976.

Sridharan, N.S. and Schmidt, C.F. (1977) "Knowledge-Directed Inference in
   BELIEVER", Workshop on Pattern-Directed Inference Systems, Hawaii, May 1977.

Srinivasan, C.V. (1973) "The Architecture of a Coherent Information System: A
   General Problem Solving System", Proc.  of the 3rd IJCAI, August 1973.

Trigoboff, M. (1976) "Propagation of Information in a Semantic Net", Proc. of the
   Conference on Artificial Intelligence and the Simulation of Behaviour,
   Edinburgh, Scotland, July 1976; updated version appears in CBM-TM-57, Dept.
   of Computer Science, Rutgers University, 1977.

Tucker, S.S. (1974) "Cobalt Kinetics in Aquatic Microcosms", Ph.D. Thesis,
   Rutgers University.

Van der Mude, A. and Walker, A. (1976) "Some Results on the Inference of
   Stochastic Grammars", abstract in Proc. Symposium on New Directions and
   Recent Results in Algorithms and Complexity, Dept. of Computer Science,
   Carnegie-Mellon University.

Vichnevetsky, R. (1973) "Physical Criteria in the Evaluation of Computer Methods
   for Partial Differential Equations", Proc.  7th Internatl. AICA Congress,
   Prague, Sept.  1973; reprinted in Proc. of AICA, Vol.  XVI, No. 1, Jan.
   1974, European Academic Press, Brussels, Belgium.

Vichnevetsky, R., Tu, K.W. and Steen, J.A. (1974) "Quantitative Error Analysis of
   Numerical Methods for Partial Differential Equations", Proc. 8th Annual
   Princeton Conference on Information Science and Systems, Princeton
   University, March 1974.

Walker, A. (1975) "Formal Grammars and the Regeneration Capability of Biological
   Systems", Journal Comp. and Syst. Sciences, Vol. 11, No. 2, 252-261.

Weiss, S. (1974) "A System for Model-Based Computer-Aided Diagnosis and Therapy",
   Parts I and II, Ph.D. Thesis, Rutgers University; also Computers in
   Biomedicine TR-27, Feb. 1974.

Weiss, S., Kulikowski, C. and Safir, A. (1977) "Glaucoma Consultation by Computer",
   Computers in Biology and Medicine (in press).

II) INTERACTIONS WITH THE SUMEX-AIM RESOURCE

   During the past year we have continued to use the SUMEX-AIM resource for
program development and testing, for communication between collaborators
distributed in different parts of the country, and for preparing and running the
AIM Workshop.  We continue to access SUMEX-AIM via TYMNET and, to a lesser
extent, via ARPANET.  SUMEX-AIM played a key role in consolidating our network of
collaborators in ophthalmology (ONET) and in providing the support needed for
establishing a productive collaboration among the ONET investigators.  It has
also been most useful in communicating, planning and helping to set up the
information pool for the Second AIM Workshop.

   Computing in the Rutgers Research Resource continues to be distributed
between SUMEX-AIM and the RUTGERS-10.  The two computers are providing
complementary resources for our research and for our national collaborations.  At
present, our computing is distributed about 3 to 1 between RUTGERS-10 and
SUMEX-AIM.  Our total demand at SUMEX-AIM is estimated at about 5000 connect
hours for the current year, with most of the work done in INTERLISP (about 80% of
our total connect hours) and the rest devoted mainly to communications and to
limited program testing within ONET.

   The SUMEX-AIM facility was used for demonstrations of AIM programs in
first-year classes and in second-year seminars at the Rutgers Medical School,
CMDNJ; CASNET, MYCIN, INTERNIST and PARRY were accessed interactively in these
classes and seminars.  Another innovative use of SUMEX-AIM has been the
collaborative development of the AI HANDBOOK, which is intended to provide a
computer-based and network-accessible encyclopedic coverage of the AI field for
the AIM community and AIM guests.  The AI HANDBOOK was initiated by Dr. E.
Feigenbaum and his students at Stanford.  During the year, a graduate class at
Rutgers, given by Dr. S. Amarel, worked on the AI HANDBOOK and contributed
several articles.

    We find that the SUMEX-AIM bulletin board plays an important role in
communicating ideas and information on services among users.  Since the MYCIN
group at Stanford regularly posts summaries of meetings and other technical
information on the MYCIN bulletin board, we have been able to keep track of
their program and problems.  This was particularly useful for our work on IRIS
where concepts close to the MYCIN CF formalism are being studied.

   6.3    PILOT STANFORD PROJECTS

   The following are descriptions of the informal pilot projects currently
using the Stanford portion of the SUMEX-AIM resource pending funding and full
review and authorization.

6.3.1   GENETICS APPLICATIONS PROJECT

Computer Science Applications in Genetics

    Prof. L. L.  Cavalli-Sforza
     Department of Genetics
Stanford University School of Medicine

   We have been quite satisfied with the use of programs such as REDUCE, MLAB
and SPSS.  REDUCE has been used by graduate student D. Wagener and by
L. Cavalli-Sforza to check algebra, and has been of great help where algebraic
manipulations were too lengthy for hand verification.  Unfortunately, REDUCE
limits the length of the algebraic expansions it can manipulate, and the limit
is not always generous enough for our purposes.  The limit has been increased,
but there is now no warning when the length of an expression overruns the new
limit; the penalty is the total loss of the information.  If this could be
mended, the program would be much more useful.  MLAB is very useful for
least-squares fitting of complex systems of equations.  SPSS is widely used and
well known; it is working well on the system.

    Special modelling efforts involved the following:

1) A program for information storage and retrieval, which may also be useful
for the analysis of multi-dimensional contingency tables.  It has been applied
to anthropological and archeological survey and excavation data collected in
Calabria, Italy, by A. Ammerman.  The information collected (site coordinates,
material found, elevation, land form, soil, ecological and geological data,
etc.) refers to hundreds of sites and will eventually be analyzed according to
models of the growth and spread of Neolithic populations.  We eventually hope to
investigate the power of new techniques of statistical analysis that employ
spectral analysis of the matrices representing the data.

2) Similar situations are also being investigated, on the basis of other data
available in the literature, by simulating population growth and spread, e.g.
for the Bandkeramik populations of Central Europe.  We thus hope eventually to
explain the geographic distribution of genes in Europe, the Middle East and
nearby areas on the hypothesis that the present distribution predominantly
reflects a major radiation of a population of farmers that accompanied the
spread of agriculture from the Middle East, from 9000 to 5000 years ago.

3) The geographic distribution of genes, as observed today, is analyzed by means
of gene frequency maps.  We have developed many methods of interpolating data
for map construction and many methods of graphical display of the resulting
maps, and we are currently comparing the construction methods.  Some of them are
fairly sophisticated, but more work will be needed before our programs can be
considered to interpolate intelligently.  Our tests of validity are based on
eliminating each observation in turn, computing its expected value from the
remaining observations and comparing it with the observed one (a sort of
jackknifing).  Results could clearly be improved if this procedure could be
carried out simultaneously for several genes and alleles; at the moment it is
done for one allele at a time.  Simultaneous analysis is an ambitious program
but would considerably improve present results.  At the moment, for instance, we
have no way to make the gene frequencies of all alleles at a locus sum to 100%
(except approximately, because we cannot consider more than one allele at a
time).  In addition, other information on the populations (whether they are
isolates, etc.) could be introduced and verified by the program, and specific
hypotheses on the evolutionary factors affecting the gene frequencies could be
tested more directly.  At present, the major limitation on these more
sophisticated analyses is the availability of computer space.
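The report does not say which interpolation methods were compared. As a minimal sketch of the leave-one-out (jackknife-style) validity test described above, the following assumes inverse-distance weighting as the interpolator, applied to one allele at a time; the function names and the sample frequencies are hypothetical.

```python
import math

def idw_estimate(target, samples, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) samples."""
    num = den = 0.0
    for x, y, value in samples:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value  # a sample sits exactly on the target point
        weight = 1.0 / d ** power
        num += weight * value
        den += weight
    return num / den

def leave_one_out_residuals(samples, power=2.0):
    """Drop each observation in turn, predict it from the others,
    and return observed - expected for every site (one allele at a time)."""
    residuals = []
    for i, (x, y, observed) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        expected = idw_estimate((x, y), others, power)
        residuals.append(observed - expected)
    return residuals

# Hypothetical frequencies of one allele at four sampled populations: (x, y, p).
sites = [(0.0, 0.0, 0.30), (1.0, 0.0, 0.40), (0.0, 1.0, 0.20), (1.0, 1.0, 0.30)]
residuals = leave_one_out_residuals(sites)
rms = math.sqrt(sum(e * e for e in residuals) / len(residuals))
print(f"leave-one-out RMS error: {rms:.3f}")
```

The RMS of the residuals gives one number per interpolation method, which is one simple way the methods could be ranked against each other.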

6.3.2    BAYLOR-METHODIST CEREBROVASCULAR PROJECT

Baylor-Methodist Cerebrovascular Project

          John L. Gedye, M.D.
      Data Services Research Laboratory
Department of Neurology, Baylor College of Medicine

   During the year the Data Services Research Laboratory has had a total of
about 2,500 hours of man-effort available, of which about 5% has been devoted to
activities directly related to the SUMEX pilot study.

I) Summary of research program

A) Technical goals

   The general goal of the laboratory - the creation of a computer-based
system for the support of clinical research in neurology, as described in the
1975-76 annual report - remains unchanged.

   In spite of the limited manpower available during the year, good progress
has been made toward the specific goal of developing the PDP11/35-based clinical
research system 'CLINSYS' to a point where it can begin to give real support to
Departmental projects.

   We have made good progress in recent weeks with the development of software
that will allow users of our local system easier access to the resources of
SUMEX.  It is now possible to give the command 'SUMEX' to our local system
executive and have the entire login procedure, through to receipt of the final
SUMEX '@' prompt, carried out automatically.  Control characters allow the
user's terminal to be switched between SUMEX and the local system; these have
been chosen to be compatible with the BARANARD control characters, so that the
two can be operated without interference.

   Facilities have been provided that allow ASCII files to be created on
either system and transferred to the other.  These facilities will operate under
our local PDP11/35 batch system.  We have tested them by creating a test data
file of about 1,000 ASCII characters on an account on the PDP11/35 and
submitting a batch job (to run at a specified time) that logs into SUMEX,
transfers the test data file, copies it back again onto the PDP11/35 account
and logs out.  It then logs in again and repeats the whole process with the
latest copy of the file.  In this way we hope to estimate the reliability of
this form of data transmission (at present it looks as if the error rate will be
less than 1 in 16,000 characters) and to lay the foundations for a system that
will allow us to make maximum use of SUMEX off-peak time in the projects
described below.
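A minimal sketch of how the round-trip test could be scored, assuming each cycle's returned copy is compared character by character with the copy that was sent; the report does not describe how errors were counted, and the function names are hypothetical. The last function applies the standard "rule of three": if no errors are seen in n transmitted characters, 3/n is an approximate 95% upper bound on the error rate.

```python
def char_errors(sent, received):
    """Characters that differ between the two copies, plus any length difference."""
    mismatches = sum(1 for a, b in zip(sent, received) if a != b)
    return mismatches + abs(len(sent) - len(received))

def round_trip_error_rate(cycles):
    """cycles: (sent, returned) text pairs, one per round trip.

    Every character crosses the link twice per round trip (out and back), so
    the rate is expressed per character transmitted."""
    errors = sum(char_errors(sent, returned) for sent, returned in cycles)
    transmitted = sum(2 * len(sent) for sent, _ in cycles)
    return errors, transmitted, errors / transmitted

def zero_error_upper_bound(transmitted):
    """Approximate 95% upper bound on the error rate when no errors were seen
    in `transmitted` characters (the "rule of three", 3 / n)."""
    return 3.0 / transmitted

original = "X" * 1000
corrupted = original[:500] + "Y" + original[501:]
errors, transmitted, rate = round_trip_error_rate(
    [(original, original), (original, corrupted)])
print(errors, transmitted, rate)
```

Under this rule, supporting a rate below 1 in 16,000 with no observed errors would take roughly 48,000 error-free characters, i.e. about 24 round trips of the 1,000-character test file.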

B) Medical relevance and collaboration

   The development of CLINSYS has continued on the general lines described in
the 1975-76 annual report.  Specific data acquisition procedures have been
designed and implemented for: clinical psychology - both conventional and
automated testing techniques have been accommodated; clinical physiology -
facilities for the manual entry of Xe133 inhalation regional cerebral blood flow
measurements have been provided, and work is now in progress on a system for
direct transmission of data to the PDP11/35 from the integral PDP11/05 that is
part of the equipment; and hematology - provision has been made for the
acquisition of data from tests of platelet function.

   Because of its central importance, a major emphasis has been placed on
making provision for the acquisition of suitably summarised CT scan data, and a
number of exploratory studies have been carried out with the result that we hope
to have the first edition of a 'CT scan system' working in the near future. This
will have an important part to play in future projects.

   No further progress has been made with the implementation of a work station
incorporating the hand-held OCR wand developed by Recognition Equipment
Incorporated - which was described in the 1975-76 report - but we intend to make
use of such a 'wand' work station in the context of a system for acquiring data
from the radiologist's 'CT scan report' as part of the 'CT' record.

C) Progress summary

   The aim of our 'pilot study' remains unchanged - to formulate a project
relevant to the activities of the Department which will provide an acceptable and
legitimate 'point of entry' for artificial intelligence research, and which will
allow the systematic formulation of objectives for the future.

   Work has continued along the lines discussed in the 1975-76 report, using,
as test data, results from 69 demented patients and 15 controls who had had
regional cerebral blood flow measurements.  This work has led to a promising 'AI'
approach that is now being applied to CT scan data; when its feasibility has
been demonstrated, the way will be open for work to go ahead on the
implementation of a general purpose program.

D) Publications

   There are as yet no publications dealing with the 'pilot study' as such.
Certain aspects of the work referred to in this report have been mentioned in
publications but these are all currently 'in press'. Details are available on
request.

II) Interactions with the SUMEX-AIM resource

A) Little has so far been achieved by way of collaborations through the
network, although the SNDMSG facility has been useful for keeping in touch
with contacts made at the 1975 workshop.

It is hoped, though, that in the future we may be able to test out the
concept of a CT scan archive created by the joint efforts of a dispersed
community of users.

B) For some reason I did not hear about the 1976 workshop until it was over,
  and so far have heard nothing about a 1977 one.  I found the 1975 workshop
  very useful, and would strongly support the continuation of the workshops in
  some form - particularly if one could get down to fundamentals with people
  working on similar problems.

I have kept in close contact with Paul Blackwell at Columbia, Missouri since
the 1975 workshop, and we last met at an N.S.F. Conference on 'MATHEMATICAL
STRUCTURE IN THE HUMAN SCIENCES' at Penn State in March.

6.3.3   COMPUTER ANALYSIS OF CORONARY ARTERIOGRAMS

Computer Analysis of Coronary Arteriograms

Donald C. Harrison, M.D., Edwin L. Alderman, M.D., and Lynn Quam, Ph.D.
   Division of Cardiology, Stanford University Medical School

    The goal of this project is to develop computer techniques for automatic
acquisition of the anatomic distribution of the coronary arteries and for
quantitation of the degree of narrowing of these vessels.  To do this, two
different types of image processing techniques will be developed.  First, a
three-dimensional representation of the coronary arterial tree will be
automatically constructed from coronary arteriograms taken sequentially from
several different views.  Second, the amount of stenosis will be measured by
combining information from multiple sequential frames in order to improve
resolution and reduce radiographic noise.

    Coronary arteriography is the definitive test for the evaluation of
patients with coronary artery disease.  There is no other test currently
available which provides information concerning the location and severity of
coronary narrowings and the distribution of coronary blood vessels in the
myocardium.  Numerous studies document that prognosis in patients with coronary
disease reflects the severity of anatomic disease.  Coronary vascular anatomy and
the extent of lesions are, in an epidemiologic sense, more precise indicators of
prognosis than are clinical symptoms.

    At the present time, categorization of the extent of coronary vascular
disease is based somewhat simplistically on the number of major coronary vessels
involved and a rough estimate of the percentage obstruction.  Computer
representation of the coronary tree, coupled with either interactive or automatic
entry of the degree of stenosis, will permit the development of more precise
indices of anatomic disease of the myocardium.

    Computer image processing techniques offer the possibility of objectively
measuring the severity of coronary stenosis,  both at the point of maximal
narrowing and averaged over a segment of the vessel.

APPROACH:

   An extensive set of image processing functions has been developed and
applied to detect the regions of the arteriograms that correspond to the
arterial tree.  These regions are then transformed to a "skeleton" that roughly
corresponds to the midlines of the vessels in the arterial tree.  This skeleton
is then transformed to a graph representation that can be topologically and
geometrically analyzed to distinguish vessel intersections (in the 2-d
projection, not real 3-space intersections) from vessel bifurcations.  The result
is a graph structure interpretation of the arterial tree with quantitation of the
locations (2-d) of bifurcations and, for each vessel segment, the path of the
vessel midline and the vessel diameter.  The computer algorithms are described in
more detail in the following sections.

Data Acquisition:

   We have digitized a number of 35 mm cine frames from three subjects using
both an Optronics film scanner and a Dicomed film digitizer operating at 25 and
50 micron pixel resolution.  For each subject, frames are manually selected so
that they provide good contrast in the proximal vessels in both the LAO and RAO
projections and are approximately synchronized within the cardiac cycle.

Pre-processing:

   The digitized frames are computer enhanced using high frequency filtering
to eliminate the x-ray exposure gradient and emphasize sharp edges which tend to
correspond to the vessels.
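The report does not specify the filter. One simple way to suppress a slowly varying exposure gradient while keeping sharp detail is to subtract a local (box) mean from each pixel, a crude high-pass filter. The sketch below assumes that approach, with a hypothetical window radius, and uses plain Python lists so that it is self-contained.

```python
def local_mean(image, radius):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for ii in range(max(0, i - radius), min(h, i + radius + 1)):
                for jj in range(max(0, j - radius), min(w, j + radius + 1)):
                    total += image[ii][jj]
                    count += 1
            out[i][j] = total / count
    return out

def high_pass(image, radius=2):
    """Remove the slowly varying background (e.g. an exposure gradient) by
    subtracting the local mean; sharp features such as vessel edges remain."""
    background = local_mean(image, radius)
    return [[p - b for p, b in zip(row, bg_row)]
            for row, bg_row in zip(image, background)]

# A 7 x 7 frame with a pure left-to-right exposure gradient ...
gradient = [[10.0 * j for j in range(7)] for _ in range(7)]
# ... and the same frame with one bright, vessel-like pixel added.
with_spike = [row[:] for row in gradient]
with_spike[3][3] += 100.0

flat = high_pass(gradient, radius=1)
spiky = high_pass(with_spike, radius=1)
print(flat[3][3], round(spiky[3][3], 2))
```

On a pure linear gradient the interior of the filtered frame is exactly zero, while an isolated bright pixel survives almost undiminished; only the clipped border pixels are distorted.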

   High contrast areas in the enhanced frames are detected by a simple
threshold region detector.  Currently, many regions are detected which do not
correspond to the arterial tree, but are caused by background features such as
vertebrae.  We are in the process of digitizing another set of frames that have
been chosen to include time-synchronized pre-injection frames in order to permit
background subtraction.  The result of this step is a binary image corresponding
to high density areas in the frame.

   The root of the arterial tree is manually specified by the operator, and a
connected point region grower finds all points connected to the root.  This
usually finds all medium and large sized vessels, and some smaller vessels.
Unconnected background is totally eliminated. Sometimes, substantial pieces of
the arterial tree are not connected to the root.  When this occurs, the operator
can run the region grower from new starting points.  The result of this step is a
binary image corresponding to most of the arterial tree.
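A minimal sketch of the connected region grower, assuming 8-connectivity and a breadth-first search from the operator-specified root (the report does not state which connectivity was used). Extra seeds model the operator restarting the grower when parts of the tree are disconnected; the function names are hypothetical.

```python
from collections import deque

def grow_region(binary, seed):
    """Return the set of foreground pixels 8-connected to `seed` (row, col)."""
    h, w = len(binary), len(binary[0])
    r0, c0 = seed
    if not binary[r0][c0]:
        return set()  # the chosen seed lies on background
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h and 0 <= nc < w and binary[nr][nc]
                        and (nr, nc) not in region):
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region

def grow_from_seeds(binary, seeds):
    """Union of the regions grown from the root and any extra operator seeds."""
    tree = set()
    for seed in seeds:
        tree |= grow_region(binary, seed)
    return tree

mask = [[1, 1, 0, 0, 0],
        [0, 1, 0, 0, 1],
        [0, 1, 1, 0, 1],
        [0, 0, 0, 0, 0]]
print(len(grow_region(mask, (0, 0))), len(grow_from_seeds(mask, [(0, 0), (1, 4)])))
```

Everything not reached from a seed (here the two pixels on the right until a second seed is given) is discarded, which is how unconnected background is eliminated.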

   We expect that by using background subtraction we can very reliably detect
the arterial tree and eliminate most of the manual "hand-holding" in the previous
steps.

Arterial Tree Graph Formation:

   The binary image of the arterial tree is "skeletonized" by computing the
distance transform of the image and connecting peaks and ridges in distance.  The
distance transform computes, for each point in the image, the Euclidean distance
to the nearest zero (a point not in the region).  Points at vessel midlines are
easily detected because they are local maxima (ridges) in distance from their
vessel walls.
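A brute-force sketch of the distance transform and midline detection, assuming that a "local maximum" means a region pixel whose distance is at least that of all eight neighbours. Practical implementations use much faster two-pass distance transforms; this one is written for clarity only, and the simple maximum test would miss midline points on tapering vessels, where distance keeps increasing along the midline.

```python
import math

def distance_transform(binary):
    """Brute-force Euclidean distance from each region pixel to the nearest
    0 pixel (quadratic in the number of pixels; for illustration only)."""
    h, w = len(binary), len(binary[0])
    zeros = [(r, c) for r in range(h) for c in range(w) if not binary[r][c]]
    if not zeros:
        raise ValueError("image has no background pixels")
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if binary[r][c]:
                dist[r][c] = min(math.hypot(r - zr, c - zc) for zr, zc in zeros)
    return dist

def ridge_points(dist):
    """Region pixels whose distance is >= every 8-neighbour (local maxima)."""
    h, w = len(dist), len(dist[0])
    ridges = set()
    for r in range(h):
        for c in range(w):
            d = dist[r][c]
            if d == 0.0:
                continue  # background pixel
            neighbours = [dist[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr or dc) and 0 <= r + dr < h and 0 <= c + dc < w]
            if all(d >= n for n in neighbours):
                ridges.add((r, c))
    return ridges

# A horizontal "vessel" three pixels wide, surrounded by background.
vessel = [[0] * 7] + [[0] + [1] * 5 + [0] for _ in range(3)] + [[0] * 7]
print(sorted(ridge_points(distance_transform(vessel))))
```

The detected points lie on the middle row of the vessel, and their distance values (2.0 here) give the half-width used later for vessel diameter.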

   The 2-dimensional array of ridge-peak information is next processed to form
a graph structure describing the connectivity of vessel segments (distance
ridges) to nodes (points where 3 or more ridges converge).
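To find the nodes where three or more ridges converge, a common approach for one-pixel-wide skeletons is the crossing number: walk once around a pixel's eight neighbours and count background-to-skeleton transitions. A count of 1 marks a segment end, 3 or more a junction. This is an assumed stand-in, since the report does not say how nodes were located. Simply counting skeleton neighbours would also flag the pixels beside a junction, as it would for pixels (1, 2) and (1, 4) in the T-shaped example below.

```python
# The eight neighbours in circular order: N, NE, E, SE, S, SW, W, NW.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def crossing_number(skel, r, c):
    """Number of background-to-skeleton transitions around pixel (r, c)."""
    h, w = len(skel), len(skel[0])
    ring = []
    for dr, dc in RING:
        nr, nc = r + dr, c + dc
        ring.append(1 if 0 <= nr < h and 0 <= nc < w and skel[nr][nc] else 0)
    return sum(1 for k in range(8) if ring[k] == 0 and ring[(k + 1) % 8] == 1)

def classify_skeleton(skel):
    """Split skeleton pixels into junction nodes (>= 3) and segment ends (== 1)."""
    nodes, ends = set(), set()
    for r, row in enumerate(skel):
        for c, value in enumerate(row):
            if not value:
                continue
            cn = crossing_number(skel, r, c)
            if cn == 1:
                ends.add((r, c))
            elif cn >= 3:
                nodes.add((r, c))
    return nodes, ends

# A T-shaped skeleton: a horizontal vessel with one branch leaving downward.
t_skeleton = [[0, 0, 0, 0, 0, 0, 0],
              [0, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 0, 0]]
nodes, ends = classify_skeleton(t_skeleton)
print(sorted(nodes), sorted(ends))
```

Tracing the skeleton pixels between these nodes and ends then yields the vessel segments of the graph.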

   The graph is simplified by detecting and eliminating insignificant terminal
segments which are usually the result of noise in the image.
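A sketch of the spur-pruning step, assuming the graph is stored as a list of (node, node, length) segments and that "insignificant" means a terminal segment shorter than a chosen length threshold; both are assumptions, as are the node names in the example. The root is protected so that the grower's starting point is never pruned.

```python
from collections import Counter

def prune_spurs(segments, min_length, protected=frozenset()):
    """Drop terminal segments (an end of degree 1) shorter than min_length.

    segments: list of (node_a, node_b, length) tuples."""
    degree = Counter()
    for a, b, _ in segments:
        degree[a] += 1
        degree[b] += 1
    kept = []
    for a, b, length in segments:
        dangling = [n for n in (a, b) if degree[n] == 1 and n not in protected]
        if dangling and length < min_length:
            continue  # short dangling branch: most likely image noise
        kept.append((a, b, length))
    return kept

segments = [("root", "n1", 40.0), ("n1", "n2", 35.0), ("n1", "spur", 2.0),
            ("n2", "end1", 20.0), ("n2", "end2", 18.0)]
pruned = prune_spurs(segments, min_length=5.0, protected={"root"})
print(pruned)
```

A fuller version would repeat the pass and merge the two segments left at any node whose degree drops to 2.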

   We have now accomplished a significant simplification of the data, from the
original 2-dimensional array of x-ray density data to an essentially
1-dimensional description of the vessel midlines and points of bifurcation and
intersection.  This data (when vessel width is included) is sufficient to
completely reconstruct the binary image of the arterial tree.

Topologic and Geometric Graph Analysis:

   The graph is next analyzed to determine the proximal-distal orientation of
each vessel segment.  At the distal node of a vessel segment, every other segment
attached to that node must point within 90 degrees of that segment's direction.
Any segment violating this rule is identified as an intersection.  Starting from
the root of the arterial tree, all segments are classified by this procedure.
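A sketch of the 90-degree orientation rule, assuming each segment's pointing direction is approximated by the straight line between its end nodes (real vessel segments are curved, so a local tangent near the node would be more faithful). A negative dot product between the incoming direction and a candidate outgoing direction means the angle exceeds 90 degrees, so that segment is flagged as a likely crossing. The graph and node names are hypothetical.

```python
from collections import deque

def orient_tree(pos, adj, root):
    """Walk outward from the root, orienting segments proximal -> distal.

    pos: node -> (x, y); adj: node -> list of neighbouring nodes.
    Returns (oriented, flagged): oriented lists (proximal, distal) pairs;
    flagged lists (node, neighbour) segments that turn back by more than
    90 degrees relative to the incoming segment (candidate intersections)."""
    oriented, flagged = [], []
    seen = set()
    queue = deque()
    for nb in adj[root]:
        seen.add(frozenset((root, nb)))
        oriented.append((root, nb))
        queue.append((root, nb))
    while queue:
        u, v = queue.popleft()
        dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
        for w in adj[v]:
            edge = frozenset((v, w))
            if w == u or edge in seen:
                continue
            seen.add(edge)
            ex, ey = pos[w][0] - pos[v][0], pos[w][1] - pos[v][1]
            if dx * ex + dy * ey < 0:  # angle between directions exceeds 90 degrees
                flagged.append((v, w))
            else:
                oriented.append((v, w))
                queue.append((v, w))
    return oriented, flagged

pos = {"R": (0, 0), "A": (0, 10), "B": (3, 20), "C": (-4, 18),
       "X1": (10, 5), "X2": (-10, 15)}
adj = {"R": ["A"], "A": ["R", "B", "C", "X1", "X2"],
       "B": ["A"], "C": ["A"], "X1": ["A"], "X2": ["A"]}
oriented, flagged = orient_tree(pos, adj, "R")
print(oriented, flagged)
```

Here X1 and X2 are the two halves of a vessel crossing at A. Only X1 violates the rule; X2 passes, which is why flagged nodes still need the rule-based matching of distal and proximal segments.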

   Nodes that have been identified as intersections are now analyzed in order
to match distal segments with proximal segments according to a set of rules
about arterial topology and geometry.

   Having resolved vessel intersections, we now transform the graph to a
simple tree structure which corresponds topologically to the arterial tree.

Future Directions:

   The above computer algorithms have been successfully applied to the images
in a few sets of digitized data.  We plan to digitize frames prior to injection
to enable background subtraction, which we believe will greatly improve the
reliability and accuracy of the initial vessel detection.  The algorithms have
not yet been tried on cases with abnormal angiograms, and we expect that as more
cases are incorporated into our image library, it will be necessary to develop
more rules and analytical techniques in order to properly interpret the
2-dimensional images.

   Based on the encouraging progress that has been made in processing
coronary arteriograms, and on other areas of expertise in image processing
within the Stanford University Medical Center, we have developed and submitted to
the NHLBI, on November 1, 1976, a new grant proposal titled "Computerized Medical
Image Processing Laboratory".  This proposal contains a detailed report of the
progress made up to that time and details the further steps that we propose to
pursue.

USE OF SUMEX RESOURCE:

   Work on this project has been dependent on the SUMEX facility for several
reasons.  First, this project has not been funded to provide its own computer
facilities.  Second, although the Stanford Division of Cardiology does have
minicomputer systems that could be used for this project, it is considerably
easier to develop image processing and artificial intelligence techniques on a
larger scale system in which many powerful tools already exist.  In the research
phase of this project it is important to be able to perform experiments easily
and quickly, without the difficulty of fitting the experimental programs into
a small computer memory.

6.3.4   QUANTUM CHEMICAL INVESTIGATIONS

Theoretical Investigations of Heme Proteins and Opiate Narcotics

   Dr.  Gilda Loew
Department of Genetics
Stanford University

    SUMEX is used for the calculation of various one-electron electronic
properties of iron-containing compounds.  The programs were formulated and
written by David Steinberg, Michael Chadwick and David Lo.  David Lo was
responsible for converting the program for interactive use on the PDP system.
Robert Kirchner made slight improvements, and Sheldon Aronowitz has expanded the
formulation to include additional spin and oxidation states of the iron atom.

   The properties that are calculated include the electric field gradient at
the iron nucleus, quadrupole splitting, isotropic and anisotropic hyperfine
interaction, spin-orbit coupling and zero field splitting, g values and
temperature dependent effective magnetic moments. The calculated values are
compared directly to experimental results obtained from published Mossbauer
resonance and electron spin resonance spectra.  Such a comparison determines not
only the reliability with which these properties can be calculated but also gives
an indication of the ability of the model of the iron active site to mimic the
actual environment found in a particular compound or iron-containing protein.

   The major input to these properties programs is a description of the
electron distribution of the compound under consideration.  This description is
obtained using a semi-empirical molecular orbital method employing the iterative
extended Huckel procedure.  Such a calculation requires up to 660K core and is
performed elsewhere.  When the calculated electron distribution yields a set of
calculated properties in agreement with observation, we have increased faith in
the description of the model of the active site and can carry the model one step
further to make qualitative inferences about certain properties relevant to the
biological functioning of the compound.

  We are currently performing a systematic study of heme proteins.  The
electromagnetic properties of these proteins and of synthesized model compounds
which mimic the observed behavior of the proteins have been well studied
experimentally.  Specifically, we have addressed the following problems:

(1) Cooperativity of oxygen binding to hemoglobin. Calculations have been made
  for high and low affinity forms of deoxyhemoglobin. This work has been
  submitted to Nature (Loew and Kirchner).

(2) The nature of oxygen binding to the heme unit. Calculations were made of
  model oxyheme compounds with varying oxygen geometry and electron
  configuration.  This work is now in press in the Journal of the American
  Chemical Society (Kirchner and Loew).

(3) The enzymatic cycle of an oxidative metabolizing heme enzyme called
  cytochrome P-450.  This enzyme is responsible for drug metabolism and
  toxicity and for activation of many chemical carcinogens.  Preliminary
  characterization of the enzymatically active state has been made.  This work
  is in press in the Journal of the American Chemical Society (Loew, Kert,
  Hjelmeland and Kirchner).

   In a completely different context, we have been using SUMEX to calculate
the conformation of pentapeptides (enkephalins) which have been recently found to
be endogenous opiates.  The aim of this study is to determine in what way, if
any, they can mimic the structure of prototype opiates such as morphine and
meperidine.  For this work, we use a protein conformation program with empirical
interaction potentials.  Quantum mechanical conformational calculations of the
same peptides are being performed by us elsewhere, and the results of the two
methods are being compared.

    6.4    PILOT AIM PROJECTS

   The following are descriptions of the informal pilot projects currently
using the AIM portion of the SUMEX-AIM resource pending funding and full review
and authorization.

6.4.1    COMMUNICATION ENHANCEMENT PROJECT

Communication Enhancement Project

John B. Eulenberg, Ph.D. and Carl V. Page, Ph.D.
     Department of Computer Science
       Michigan State University

I) Summary of research program.

A) Technical goals.

   The major goal of this research is the design of intelligent speech
prostheses for persons who experience severe communication handicaps.  Essential
subgoals are:

(1) Design of input devices for persons with greatly restricted movement.

(2) Development of software for text-to-speech translation.

(3) Research in knowledge representations for syntax and semantics of spoken
  English in restricted real world domains.

(4) Development of micro-computer based portable speech prostheses.

B)  Medical Relevance and Collaboration.

   We have exchanged visits and had many conversations with Dr. Kenneth Colby
of UCLA, who is working on similar problems for people who have aphasia.

   The need for such technology in the medical area is very great. Millions
of people around the world lead isolated existences unable to communicate because
of stroke, traumatic brain injury, cerebral palsy, and other causes.  The
emergence of inexpensive micro-processors and sound synthesizers makes it
possible to develop devices now that can be the prototypes for widespread use.

   We have organized institutes to bring together the many professionals who
have an interest in this area.  Together with the Tufts New England Medical
Center, the TRACE Center of the U. of Wisconsin, and the Children's Hospital at
Stanford, we have begun the first newsletter for dissemination in this area. Dr.
John B. Eulenberg helped to organize the first Federal workshop for governmental
agencies that have some interest in funding work in these areas.  Represented were
the Bureau of Education for the Handicapped, the Veterans Administration, NIMH,
NINCDS, NSF, and others.  We have also been in touch with United Cerebral Palsy
associations at the state and national levels.  There is much interest in this
area from medical, educational, and governmental communities, but no traditional
means of supporting it.

C) Progress summary.

   Although some facets of the research have been underway at MSU for several
years, we have been using SUMEX-AIM for only six weeks at this time, having
received our password in March, 1977.  During the last six weeks, we have:

1) Designed and built hardware and software allowing us to transmit files to
  SUMEX from our Nova 2/10 at 300 baud.

2) Organized a research team of 4 students possessing background in
  artificial intelligence, led by Dr. Carl V. Page, to develop a
  semantics-based speech generator.  We expect to have a prototype running in
  June (written in SAIL).  To this end we are concentrating on semantics
  associated with personal needs, small talk (weather, etc.), and perhaps
  obtaining geographic directions.

3) Begun conversion of ORTHOPHONE, MSU's large English text-to-speech
  program, from its CDC 6500 Fortran implementation to a SAIL version.

4) Obtained temporary local support for terminals and tie-lines to use the
  SUMEX-AIM facility.  We requested these in our original proposal but were
  not granted them, and we have to share our tie-lines and terminals with
  others.  At present the lack of a dedicated tie-line from East Lansing to
  Tymshare in Ann Arbor or Detroit is a problem for us from 0600 to 0900 PST.

During the past few months, Dr. Richard Reid of our project has:

5) Developed a personal communication system for a 10-year-old person who
  has cerebral palsy.  It is micro-computer-based and can accept input via
  an adaptive switch from a series of menus displayed on a TV screen, via
  Morse code, or by a keyboard.  Its outputs can be TV display, hard copy,
  Morse code, spoken English, or musical sounds.  We expect to use
  knowledge gained from the SUMEX-AIM semantics project to specify the
  content and connection of the choice menus for this project.

During the past three months, we have:

6) Begun to experiment with the interaction of knowledge sources
  (letter and word frequencies, syntactics, semantics and pragmatics) as a
  means of anticipating likely inputs and displaying them for a person to
  choose from.

7) Built and tested a myoelectric interface and used it (together with a
  miniature FM transmitter) to input changing muscle potentials into a
  computer.  There is reason to believe that this means of input may
  provide a higher bit rate than any other known means for those people who
  experience severe motor problems due to cerebral palsy.
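Of the knowledge sources listed in item 6, word frequency is the simplest to illustrate. The sketch below, which is not the project's method, ranks completions of a partially entered word by how often each word occurred in a sample of a user's earlier messages, so that the most likely choices can be offered first. Syntactic, semantic and pragmatic sources are not modelled, and the sample text and function names are invented for illustration.

```python
from collections import Counter

def build_lexicon(words):
    """Word -> frequency table from a sample of the user's earlier text."""
    return Counter(word.lower() for word in words)

def suggest(lexicon, prefix, k=3):
    """Up to k known words beginning with `prefix`, most frequent first
    (ties broken alphabetically), for display as a short choice menu."""
    prefix = prefix.lower()
    matches = [(word, n) for word, n in lexicon.items() if word.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in matches[:k]]

sample = ("I want water please I want to go outside "
          "water is cold I want help").split()
lexicon = build_lexicon(sample)
print(suggest(lexicon, "w"))
print(suggest(lexicon, ""))
```

Each letter the user selects narrows the menu, so for a switch or Morse-code user a good ranking directly reduces the number of selections needed per word.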

D) Up-to-date list of publications (1976 to date).

For John B. Eulenberg:

"Technical Systems Development, Headend", Interim Report, April, 1976,
Experimental Applications of Two-Way Cable Delivery, NSF Grant No. APR
75-14286.

"Interactive New Hired Information Access System with Both Voice and Hard
Copy Output: User's Guide to NHQUERRY", April 11, 1976, with Steven
Kludt and Jerome Jackson (Artificial Language Laboratory Report AEB
041176).

"Language Individualization in a Computer-Based Speech Prosthesis System",
National Computer Conference, New York, June 9, 1976.

"Individualization in a Speech Prosthesis System", Proceedings of the 1976
Conference on Systems and Devices for the Disabled, June 13, 1976.

"The LEAF Language", Interim Report, September, 1976, NSF Grant No. APR
75-14286.

"A Programmable Multi-Channel Modem Output Switch", September 22, 1976,
with Joseph C. Gehman and Juha Koljonen (Artificial Language Laboratory
Report AEB 092276).

"SMPTE Time Code Interface and Computer-Controlled Video Switcher", with
Michael Gorbutt and Dennis Phillips, Interim Report, March, 1977, NSF
Grant No. APR 75-14286.

For Carl V. Page:

"Heuristics for Signature Table Analysis as a Pattern Recognition
Technique", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-7,
No. 2, February 1977.

"Discriminant Grammars, an Alternative to Parsing", with Alan Filipski,
Proceedings of the IEEE Workshop on Picture Processing, Computer
Graphics, and Pattern Recognition, April 22, 1977.

"Pattern Recognition and Data Structures", chapter in "Data Structures in
Computer Graphics and Pattern Recognition", edited by Allen Klinger,
Academic Press, 1977.

   During 1976 Dr. Eulenberg presented 15 lectures around the country on his
research, was interviewed on television eight times, and appeared on radio five times.

II) Interactions with the SUMEX-AIM resource.

   Again we point out that we have been a part of this community for only
about 6 weeks and we will have more to say next year.

A) Examples of medical collaboration and medical use of programs via SUMEX.

   The faculty in the MSU College of Human Medicine who teach medical decision
making were shown a demonstration of the SUMEX system, MYCIN and PARRY.  We plan
to present a demonstration to advanced medical students and faculty at the
Medical School in the near future.

   A member of our Medical School faculty, Dr. Richard Ropple, an expert on
myoelectronics, is a member of our research group.

   The Dean of our College of Human Medicine visited our laboratory in April,
1977 and we expect encouragement and collaboration.

B) Examples of sharing, contacts, and cross-fertilization with other
  SUMEX-AIM projects.

1.  We have met with Dr. Kenneth Colby on many occasions, including the
  SUMEX-AIM workshop in June, 1976.  Our work in many ways complements his,
  and we have had several worthwhile interchanges of information.  We are
  converting our major software programs for speech generation and
  adaptive inputs to the SUMEX-AIM system, in part so that they can be used
  by Dr. Colby and his group.

2.  Mr. Douglas Appelt, a doctoral student at SU-AI, was our principal
  systems programmer last summer.  He is currently doing research in the
  same area as ours with Dr. Gary Hendrix of SRI.  We have used his
  knowledge of your system (via the message-sending routines) to assist us
  in starting our project.  Mr. Appelt will be working with us at MSU
  again this summer (June-Sept., 1977), and he will be using the SUMEX-AIM
  system.

6.4.2    AI IN PSYCHOPHARMACOLOGY

Artificial Intelligence in Psychopharmacology

      Jon F. Heiser, M.D.
Dept. of Psychiatry and Human Behavior
University of California at Irvine

I.  Summary of Research Program

A.  Technical Goals

1.  We propose to construct a computer-based system embodying some of the
  knowledge of an expert in clinical psychopharmacology.  Such a system
  could greatly assist physicians and students who are not specialists
  in the chemotherapy of mental disorders in choosing the best
  psychopharmacological treatment for patients for whom such treatment
  is indicated.  The system could also serve as a teaching source of
  psychodiagnostic and psychopharmacological knowledge.

2.  The specific aims of this project are:

o To develop a set of MYCIN-type rules which model expert
clinical teaching, consulting and decision-making in clinical
psychopharmacology,

o To implement this set of rules in the MYCIN system, and

o To evaluate the performance of the resulting system as a teaching
and consulting aid.

3.  No system currently available or under development approaches the
  goals of the project in the field of clinical psychopharmacology.

4.  It is anticipated that the research will fall into two distinct
  phases, each of approximately 18 months' duration.  The first and
  current phase involves evaluating the relevance of the structure of
  the MYCIN system for use in clinical psychopharmacology by replacing
  the current infectious disease diagnosis and therapy rules and
  parameters with psychopharmacology rules and parameters.  The second
  phase will involve accumulating a large body of rules, entering
  them into the MYCIN system, and evaluating their performance.  Toward
  the end of this phase, the behavior of the system will be compared
  with the behavior of recognized experts working on the Adult
  Inpatient Psychiatric Service of the UCI Medical Center.  This
  evaluation will focus on the adequacy of the system for representing
  the knowledge of a skilled psychopharmacologist rather than on actual
  system performance in the clinical setting.

B.  Medical Relevance and Collaboration

1.   Medical Relevance

a.  For many years it has been well recognized that potent, effective
  psychopharmacological agents are frequently used in an
  unsystematic and irrational manner.  The most prescribed
  medication in the United States today is diazepam (Valium), a
  minor tranquilizer.  The six most prescribed medications
  are all psychoactive agents.  In California, instances of
  repetitive use of psychotropic drugs have been reported by 70% of
  a random sample of adults.  About 30% of the sample had used
  psychotropic drugs in the preceding twelve months.  Another study
  showed that 20% of a medical population was taking psychoactive
  agents at any given time.  These figures do not include alcoholic
  beverages or non-prescription and illicit drugs with psychoactive
  properties.  Many persons are advised to ingest a daily
  pharmacologic stew consisting of one or more neuroleptic agents,
  an antidepressant, an anti-parkinsonian agent, one or more
  tranquilizers, a hypnotic and possibly a psychostimulant.  These
  regimens are often complicated by non-prescription remedies,
  alcoholic beverages and illicit drugs.  The inevitable drug-drug
  interactions affect the absorption, distribution, binding, metabolism
  and excretion of many drugs.

b.  Each year Americans spend over $700,000,000 for psychotropic
  drugs.  In a recent year $150,000,000 was spent on the
  anti-anxiety agent chlordiazepoxide (Librium).  Between 20 and 25
  million prescriptions are written each year for diazepam.  It is
  estimated that 170,000,000 prescriptions for psychotropic drugs
  were written in 1967, and that 202,000,000 prescriptions were
  written in 1970, more than one for every person in the United
  States.  About 17% of all prescriptions written are for
  psychoactive drugs.  If we include medications in which a
  psychotropic drug is combined with an antispasmodic, vasodilator,
  or other agent, probably 25% of all prescriptions contain
  psychotropic drugs.  The vast majority of these prescriptions are
  written by physicians who are not psychiatrists.

c.  Many physicians practicing today, including psychiatrists,
  completed their formal medical training before the 1950's, when
  modern psychopharmacological agents first became available.
  Their training typically included no instruction in modern
  clinical psychopharmacology.  Even physicians trained since the
  mid-1950's cannot be expected to keep abreast of the expanding and
  changing field of psychopharmacology.  The principles and
  practices recommended a few years ago are rapidly becoming
  obsolete.  A recent study showed that general knowledge of the
  pharmacology, physiology, and side effects of psychoactive
  medications was low among both psychiatrists and
  non-psychiatrists: fewer than 20% of the physician subjects were
  able to devise a psychopharmacologically rational dosage schedule
  for benzodiazepines.  Fifty percent of the non-psychiatrist
  medical staff felt that doses of up to one gram per day of a
  tricyclic antidepressant, more than three times the recommended
  maximum and a potentially fatal amount, might be prescribed for
  depressive symptoms.

d.  We estimate that there are at least 25 discrete syndromes
  currently identified in clinical psychiatry, each of which has a
  unique hierarchy of pharmacological treatment.  Each treatment in
  each section has its own set of potential side effects, adverse
  reactions, and drug-drug, drug-host, drug-age and
  drug-state-of-health interactions.  In addition, for each
  therapeutic regimen in each hierarchy, there are several classes
  of drugs, each typically consisting of more than one agent or
  combination of agents, which are potentially beneficial and which
  can be preferentially ranked depending on several other factors in
  the clinical situation.

2.  Medical Collaboration

1.  The principal investigator, Jon F. Heiser, M.D., is a physician
  who is board certified in psychiatry and in full time teaching,
  research and University service.

2.  Three medical students have participated in this project to date:
  Clifford Risk, Dana W. Ludwig, and Sue A. Clear.

3.  Two resident physicians have participated in this project: Bronco R.
  Radisavljevic, M.D., and Steven J. Smith, M.D.

4.  A Doctor of Pharmacy participates in the research: Pierre J.
  Menard, Pharm. D.

C.  Progress Summary

1.  Our initial endeavors to extend a MYCIN-like system to clinical
  psychopharmacology have been successful.  None of the envisioned or
  predicted problems related to the vagaries of psychiatric terminology
  and diagnosis have materialized.  The system appears to work promptly
  and accurately to diagnose psychiatric conditions, and in a very
  rudimentary form, to suggest appropriate pharmacological treatment.

The current system consists of about 20 clinical parameters and about
50 decision rules aimed chiefly at a small number of diagnosed
psychiatric conditions.
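To make the split between clinical parameters and decision rules concrete, here is a minimal Python sketch of how a MYCIN-style knowledge base might represent them. It is an illustration under stated assumptions, not HEADMED's or MYCIN's actual Interlisp data structures. The class names, the (parameter, value) encoding, and the FIRST-EPISODE parameter are hypothetical. The example parameter and rule are modelled on the sample consultation that follows. The sketch assumes MYCIN's usual conventions: a premise counts as true only when its certainty exceeds 0.2, and a conclusion inherits the rule's certainty factor scaled by the weakest premise.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    """A clinical parameter (e.g. THOUGHT-PATTERN) and its acceptable answers."""
    name: str
    expected: tuple

    def validate(self, answer):
        """Normalize an answer; reject it, as the consultation does, if unknown."""
        answer = answer.strip().upper()
        if answer not in self.expected:
            raise ValueError(f"{answer} not a recognized response; "
                             f"expected responses are: {' '.join(self.expected)}")
        return answer


@dataclass
class Rule:
    """IF every premise holds THEN conclude (parameter, value) with certainty cf."""
    name: str
    premises: list      # (parameter, value) pairs that must all hold
    conclusion: tuple   # (parameter, value) concluded by the rule
    cf: float           # the rule's own certainty factor

    def fire(self, facts):
        """facts maps (parameter, value) -> certainty in [-1, 1].

        A premise counts as true only if its certainty exceeds 0.2; the
        conclusion gets the rule's cf scaled by the weakest premise.
        Returns None when the rule does not apply.
        """
        tally = min(facts.get(p, 0.0) for p in self.premises)
        if tally <= 0.2:
            return None
        return self.conclusion, tally * self.cf


# Hypothetical encodings modelled on the sample consultation.
thought = Parameter("THOUGHT-PATTERN",
                    ("NORMAL", "PROLIXIC", "TANGENTIAL", "CONFABULATION"))
rule071 = Rule("RULE071",
               [("DIAGNOSIS", "MANIA"), ("FIRST-EPISODE", "NO")],
               ("TREATMENT-NAME", "LITHIUM"), 0.9)
```

In the consultation the resident reports that this is the patient's first episode. Under this encoding, the FIRST-EPISODE premise is therefore never established and a rule like `rule071` would not fire. That outcome would be consistent with lithium being absent from the final recommendations.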

The following is a sample consultation on a hypothetical but realistic
case:

Sample Consultation with HEADMED,
                   a Psychopharmacology Advisory System

Patient History

  A 37 year old man was brought into the emergency room by his distraught
family members.  They described how the patient, a well-liked, successful
and cheerful but reserved history professor, had become increasingly
outspoken, short-tempered, and inattentive to accepted norms of social
behavior over the preceding three weeks.  An unprovoked argument and physical
assault upon a long-time friend motivated the family to seek professional
help.

  Results of a physical exam and routine laboratory studies were within
normal limits.  Consultation with the psychiatric service was requested.
A first year psychiatric resident re-interviewed the patient and his
family, and completed a formal mental status exam prior to obtaining the
following consultation from the system.

[Questions are generated by the system and appear in mixed upper and lower
case; user responses are in boldface capitals and follow a double asterisk.
Added commentary is indented in italics.]

Special options (type for help):
** AFB
Hello, Dr. Bowen
Do you wish to run the consultation program
**YES

29-OCT-76 00:47:26

--------PATIENT-1--------
1) PATIENT'S NAME     [The consultation begins with a few standard
                questions.]

**JOHN DOE

2) PATIENT'S AGE IN YEARS
** 37
3) SEX:
** MALE

4) IS John Doe ORIENTED
** N

          [Since the patient is disoriented, the system begins to
            probe a little deeper.]
5) IS John Doe DISORIENTED-TO-PERSON
** Y

6) IS John Doe DISORIENTED-TO-PLACE
** N
7) IS John Doe DISORIENTED-TO-TIME
** Y

8) PLEASE TYPE IN THE ONE WORD WHICH BEST DESCRIBES THE THOUGHT PATTERNS
  OR THOUGHT DISORDER OF John Doe
** CONFUSED

CONFUSED not a recognized response
What is THE THOUGHT PATTERN OR THOUGHT DISORDER OF John Doe
Expected responses are: NORMAL PROLIXIC TANGENTIAL CONFABULATION etc.
Enter HELP for list of user options.

[The initial response was not recognized, so the user was reminded of
some of the acceptable responses.  He then requested a display of the
entire list.  Note that both of these capabilities are part of the
core of MYCIN itself.  Applying them to this new domain required no
more than supplying the relevant list of acceptable answers.]

** ?

Expected responses are:  NORMAL PROLIXIC TANGENTIAL CONFABULATION
RETARDED LOOSE CONCRETE AUTISTIC DELUSIONAL WORD-SALAD NANICY 'IIJTE
Enter HELP for list of user options.
** TANGENTIAL

9) Has any biological relative of John Doe been evaluated or treated
for a mental or psychiatric problem
** pj

10) What is the single word which best describes John Doe's behavior
**

What is the best single word description of John Doe's behavior
Expected responses are: NORMAL AVERAGE UNREMARKABLE CATATONIC-EXCITEMENT etc.

Enter HELP for list of user options.
** HYPERACTIVE

11) What is the single word which best describes John Doe's mood
** LABILE

12) WHAT IS THE SINGLE WORD WHICH BEST DESCRIBES THE EMOTIONS OR FEELINGS
   OF John Doe
** ELATED

13) Does John Doe tend to feel worse or be more symptomatic for the
  first several hours after awakening for the day
** WHY

[i.e. WHY is it important to determine whether or not PATIENT-1 tends
  to feel worse or be more symptomatic for the first several hours
  after awakening for the day]

[4.0] This will aid in determining the DIAGNOSIS OF PATIENT-1.
   If

[4.1] John Doe tends to feel worse or be more symptomatic for
     the first several hours after awakening for the day
then

  there is weakly suggestive evidence (.45) that THE DIAGNOSIS OF
    John Doe is depression
[RULE017]

[All the standard explanatory capabilities of the core system extend
to the new domain without additional programming effort.]

[back to question 13...]
** N
14) Does John Doe have early morning insomnia
** N
15) Please specify the number of pounds that John Doe lost or gained
  in the 2 months prior to seeking medical or psychiatric help.
  Estimate if necessary.  Precede weight LOSS with a MINUS sign (for
  example: -8 - a weight loss of (approximately) 8 pounds within the
  2 month period prior to seeking professional help)

** 5

16) Has psychological testing been ordered on John Doe
** N
17) ARE THE PERCEPTIONS OF John Doe NORMAL, i.e., NEITHER DISTORTED
   NOR DISORDERED
** N

18) DOES John Doe HAVE DEPERSONALIZATION
** N

19) DOES John Doe HAVE EXPERIENCES OF ALIENATION and/or EXPERIENCES OF
   INFLUENCE
** N

20) DOES John Doe HAVE INCOMPLETE AUDITORY HALLUCINATIONS and/or OTHER
   HALLUCINATIONS
** Y

21) DOES John Doe HAVE COMPLETE AUDITORY HALLUCINATIONS and/or THOUGHT
   BROADCASTING
** N

22) what is John Doe's level of consciousness
**

What is THE LEVEL OF CONSCIOUSNESS OF John Doe
Expected responses are:  HYPERALERT VIGILANT ALERT NORMAL etc.
Enter HELP for list of user options.
** VIGILANT
23) Is this the first episode of significant mental disturbance that
  John Doe has had
** WHY

[i.e. WHY is it important to determine whether or not this is PATIENT-1's
  first episode of significant mental disturbance]

[8.0] This will aid in determining THE TREATMENT NAME OF PATIENT-1.
    If

[8.1] THE DIAGNOSIS OF John Doe is mania, and
[8.2] this is not John Doe's first episode of significant mental
       disturbance
then

there is strongly suggestive evidence (.9) that THE TREATMENT
  NAME OF John Doe is lithium
[RULE071]

[back to question 23...]
[The system has finished with the diagnostic phase, and asks a
question to help determine the appropriate therapy.]
** Y

[And finally the system presents its diagnosis and therapy recommendations.
The numbers are used to give relative weightings to the conclusions:
negative numbers indicate that the evidence tends to rule out that
possibility.]

The diagnosis for patient 1 is:
        ORGANIC-BRAIN-SYNDROME   (.84)
        DEPRESSION               (.37)
        MANIA                    (.35)
        SCHIZOPHRENIA            (.2)
        PERSONALITY-DISORDER     (-.4)
        NEUROSIS                 (-.4)

So the treatment should be
        EVALUATION               (.84)
        ANTIDEPRESSANT           (.3)
        NEUROLEPTIC              (.28)
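The weights in the summary above are certainty factors. The Python sketch below shows one way several rules' conclusions about the same hypothesis can be pooled into a single number. It follows the belief/disbelief scheme originally described for MYCIN. Supporting evidence accumulates into a measure of belief (MB), opposing evidence accumulates into a measure of disbelief (MD), and the reported factor is MB - MD. The rule outputs in `example` are invented for illustration. The report does not say which rules produced, for instance, the .84 for ORGANIC-BRAIN-SYNDROME.

```python
def combine(evidence):
    """Pool certainty factors in [-1, 1] reported by several rules.

    Supporting evidence accumulates into a measure of belief (MB), opposing
    evidence into a measure of disbelief (MD); the result is MB - MD.
    """
    mb = md = 0.0
    for cf in evidence:
        if cf > 0:
            mb += cf * (1.0 - mb)     # each new rule closes part of the gap to 1
        elif cf < 0:
            md += -cf * (1.0 - md)
    return mb - md


def rank(hypotheses):
    """Order hypotheses from best to least supported by their pooled factor."""
    scored = {name: combine(cfs) for name, cfs in hypotheses.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Invented rule outputs, for illustration only.
example = {"DEPRESSION": [0.45, 0.2], "NEUROSIS": [-0.4]}
for name, cf in rank(example):
    print(f"{name:<15} ({cf:.2f})")
```

This scheme makes belief grow monotonically toward 1 as independent supporting rules fire, so no single weak rule can dominate. A negative pooled value, like those listed for PERSONALITY-DISORDER and NEUROSIS, means the disconfirming evidence outweighs the confirming evidence.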

Recently work has been directed toward expanding the system and
revising the representation of psychiatric diagnosis and treatment
recommendation.

We have also begun development of a small system to score the
Minnesota Multiphasic Personality Inventory (MMPI) psychological test
using empirically well established rules easily coded into the MYCIN
system.
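Rule-based MMPI scoring generally has two steps. First, a raw score is computed for each scale by counting items answered in the keyed direction. Second, that raw score is converted to a T score with mean 50 and standard deviation 10, and interpretive rules fire on elevated scales. The Python below sketches this under explicit assumptions. The item keys and normative means and standard deviations are hypothetical placeholders, not real MMPI values; only the scale labels D and Pd and the conventional T >= 70 cutoff are taken from standard MMPI practice.

```python
# Hypothetical scoring keys: item number -> answer scored toward the scale.
KEYS = {"D": {2: "T", 5: "F", 9: "T"},
        "Pd": {4: "T", 7: "F"}}

# Hypothetical normative (mean, standard deviation) of each raw score.
NORMS = {"D": (1.0, 0.5), "Pd": (1.0, 0.5)}


def raw_score(scale, answers):
    """Count items answered in the keyed direction ('T' or 'F')."""
    return sum(1 for item, keyed in KEYS[scale].items()
               if answers.get(item) == keyed)


def t_score(scale, raw):
    """Linear T score: mean 50, standard deviation 10 in the normative group."""
    mean, sd = NORMS[scale]
    return 50 + 10 * (raw - mean) / sd


def profile(answers):
    """T score for every scale, given answers as {item number: 'T'/'F'}."""
    return {scale: t_score(scale, raw_score(scale, answers)) for scale in KEYS}


def elevated(answers, cutoff=70):
    """Rule: flag a scale when its T score reaches the conventional cutoff."""
    return [scale for scale, t in profile(answers).items() if t >= cutoff]
```

In a MYCIN-style implementation, each interpretive rule like `elevated` would become a production rule whose premise tests a scale's T score.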

D.  Up to date list of publications.

1.  No reports of this work have been published to date.

2.  Heiser, J.F.  Computer-Aided Diagnosis of Psychiatric Patients.
  Presented to the Research Meeting, School of Engineering, University
  of California, Irvine, 7 October 1976.

3.  Brooks, R. E. and Heiser, J.F.  An Application of Artificial
  Intelligence to Psychiatry.  Presented to:

(a) Indian Institute of Technology, Madras, India,
   28 September 1976, and

(b) Madras Christian College, Madras, India, 3 October 1976.

4.  Heiser, J.F. and Brooks, R. E.  Artificial Intelligence in
  Psychopharmacology.  Accepted for presentation at the VI World
  Congress of Psychiatry, Honolulu, Hawaii, 28 August - 3 September 1977.

II.  Interactions with the SUMEX-AIM resource

A.  Examples of collaboration and medical use of programs via SUMEX

1.  As explained fully in the attached research grant application, the
  MYCIN group has been working informally with Dr. Heiser on the
  development of a knowledge base of decision criteria for
  psychopharmacology over the past two years.

B.  Examples of sharing, contacts, and cross-fertilization with other
   SUMEX-AIM projects (via workshops, system facilities, personal
   contacts, etc.)

1.  Dr. Heiser's introduction to the SUMEX-AIM project first occurred
  at the first AIM workshop held at Rutgers in June 1975.

2.  Although Dr. Heiser had previously heard of the MYCIN project, his
  official collaboration with MYCIN resulted from discussions
  originating at the first AIM Workshop.

3.  A collaborative experiment with Kenneth Mark Colby, M.D., and
  members of the PARRY project was developed, implemented and
  analyzed completely on SUMEX-AIM.  Enclosed is a rough draft of a
  paper reporting this "Turing Test" which was performed on-line on
  SUMEX, with the psychiatrist-judges located at Irvine, the patient-
  person at UCLA and PARRY at SUMEX.

4.  Much technical support has been received freely and continuously
  from the SUMEX staff and members of the MYCIN team, including basic
  instruction in the use of SUMEX, TENEX, and MYCIN, principles of
  knowledge representation in MYCIN, and on-going consultation for
  details of implementing HEADMED in MYCIN.

Much information has been obtained during three visits to SUMEX
and MYCIN, but daily work in this project would be impossible
without the ability to converse via links, messages, and telephone
conversations with members of the SUMEX and MYCIN staffs.

6.4.3   ORGAN CULTURE PROJECT

Application of Computer Science to Organ Culture

Professor Robert K. Lindsay and Dr. Maija Kibens
    The University of Michigan, Ann Arbor

I) Summary of Research Program

   The goal of this research project is to develop new methods for the design
and analysis of organ culture experiments, using techniques of artificial
intelligence.

    The cultivation of organ fragments is an important method for the study of
disease processes.  In contrast to cell culture, organ culture is designed to
inhibit outgrowth of cells and to deal with normal tissue relationships as they
exist in the body, divorced from the complexities of organ interaction. The
technique involves the maintenance of differentiated cells as a group within
their normally associated tissues.  With an ability to maintain differentiated
tissues in culture, a direct histologic and biochemical assessment of factors
influencing an organ is possible.  Such a biologic model would permit
investigation of the structural and functional effects of various substances
directly on the target organ.  With a chemically defined medium, the technique
would allow a simultaneous evaluation of metabolites or hormones released by the
organ fragments.

   The research is being done in collaboration with Professors Raymond Kahn,
Theodore Fischer, and William Burkel of the Department of Anatomy, the University
of Michigan Medical School.

    We have been working on methods of image analysis of microscope slides.
This has been approached from two directions.  On the one hand we are writing
programs for special image analysis hardware.  These programs will calculate
various indices of the condition of the cultivated organ fragments based upon
measured morphological features. The second approach is to translate the
biologist's verbal descriptions of microscope slides into computer data
structures which encode conditions not detectable by our image analysis programs,
though readily seen and reported by trained human observers.  We have developed a
dictionary of anatomical terms and programs for morphological analysis. At
present we are working on the syntactic analysis of the scientist's verbal
descriptions.
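The Python sketch below illustrates dictionary-driven morphological analysis of anatomical terms. It reduces an inflected or derived word to a dictionary root by trying suffix-replacement rules in order and accepting the first result found in the lexicon. The lexicon entries, suffix rules, and feature labels are hypothetical examples, not the project's actual dictionary or programs.

```python
# Hypothetical fragment of an anatomical dictionary: root -> part of speech.
LEXICON = {"cell": "NOUN", "epithelium": "NOUN", "capillary": "NOUN",
           "nucleus": "NOUN", "necrotic": "ADJ"}

# Hypothetical suffix rules: (surface ending, root ending, feature), tried in order.
SUFFIX_RULES = [("ial", "ium", "ADJ-FORM"),   # epithelial  -> epithelium
                ("ies", "y", "PLURAL"),        # capillaries -> capillary
                ("ei", "eus", "PLURAL"),       # nuclei      -> nucleus
                ("s", "", "PLURAL")]           # cells       -> cell


def analyze(word):
    """Return (root, category, feature), or None if the word is not analyzable."""
    word = word.lower()
    if word in LEXICON:
        return word, LEXICON[word], None
    for ending, replacement, feature in SUFFIX_RULES:
        if word.endswith(ending):
            root = word[: -len(ending)] + replacement
            if root in LEXICON:
                return root, LEXICON[root], feature
    return None
```

Checking each candidate root against the dictionary keeps rules such as the bare "-s" rule from producing spurious roots. The feature label records what the suffix contributed, which a later syntactic analysis of the biologist's description could use.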

II) Interactions with the SUMEX-AIM resource

   We have had valuable contacts with members of the DENDRAL project and the
MOLGEN project, which share certain goals and methods with our own work.

6.4.4   NEUROPROSTHESES PROJECT

Neuroprostheses Project

M. G. Mladejovsky, Ph.D., Director
  Division of Artificial Organs
University of Utah Medical Center
  Salt Lake City, Utah 84112

I.  Research Summary

   Our research involves the investigation of artificial vision by electrical
stimulation of visual cortex and artificial hearing by electrical stimulation of
the cochlea. This effort has involved the collaboration of several people from
many disciplines, not only from the University of Utah, but also from the Ear
Research Institute, Los Angeles; University of Western Ontario, London, Ontario;
and Columbia University, New York.

   The instrumentation involved is controlled by a minicomputer system
consisting of a PDP-8 and a PDP-11/05. Experimental protocols are implemented by
programs running in the PDP-11.  We sought access to SUMEX in order to use the
BLISS-11 compiler, which runs on the PDP-10. We are using BLISS-11 as the
implementation language for an interactive programming system which will enable
more flexible control and variation of our experiments.

   The base language we are using is BALM (Malcolm Harrison, "BALM
Programmer's Manual", Courant Institute, NYU, 1974). This language is defined in
terms of an abstract machine called the MBALM machine. The plan of attack is as
follows:

1) implement the MBALM machine in BLISS-11
2) bring up BALM, using a dummy garbage collector and no virtual memory
3) implement garbage collection and virtual memory
4) add floating point operations
5) add a graphics package
6) add real-time capabilities
7) provide an interface to PDP-11 machine language
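Step 2 brings BALM up with only a "dummy" garbage collector. The report does not describe MBALM's storage layout, so the Python sketch below only illustrates the general idea under that assumption. List cells are allocated from a fixed-size store by advancing a free pointer, and nothing is ever reclaimed. When the store fills, the system simply stops with an error; step 3 would replace that exhaustion branch with a real collector. `CellStore`, `build_list`, and `to_python` are hypothetical names.

```python
class CellStore:
    """Fixed-size pool of list cells with a 'dummy' (do-nothing) collector."""

    def __init__(self, size):
        self.car = [None] * size   # first field of each cell
        self.cdr = [None] * size   # second field: index of next cell or None
        self.free = 0              # next never-used cell (bump allocation)

    def cons(self, head, tail):
        """Allocate a cell; a pointer is simply an index into the store."""
        if self.free == len(self.car):
            # A real collector (step 3) would reclaim unreachable cells here;
            # the dummy version reclaims nothing, so the system must stop.
            raise MemoryError("cell store exhausted: dummy collector frees nothing")
        index = self.free
        self.car[index] = head
        self.cdr[index] = tail
        self.free += 1
        return index


def build_list(store, items):
    """Build a linked list from Python values; None plays the role of NIL."""
    pointer = None
    for item in reversed(items):
        pointer = store.cons(item, pointer)
    return pointer


def to_python(store, pointer):
    """Walk a linked list in the store back into a Python list."""
    values = []
    while pointer is not None:
        values.append(store.car[pointer])
        pointer = store.cdr[pointer]
    return values
```

Deferring collection in this way lets the rest of the interpreter be bootstrapped and tested first. The cost is that any program allocating more cells than the store holds will halt.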

   The project has progressed to the point that step 2 is almost complete.
This has involved installing a new version of BLISS-11 at SUMEX, writing software
to allow file transfers between SUMEX and our PDP-11 (which is connected to the
Utah TIP as a terminal), writing MBALM and various support routines in BLISS-11,
implementing an I/O package for BALM in assembly language, and performing a
bootstrapping process with the BALM self-definition.  Our schedule calls for
completing steps 3, 4, and 5 by 1 July 1977.  Steps 6 and 7 have not been planned
in detail at this time.

   We are planning to run the resulting programming system on our PDP-11/05,
which has 28K core and a GT-40 graphics system and runs the RT-11 operating system.
Modifying the system to run under a different operating system should be
straightforward. However, whether the system will run efficiently on a machine
with less than 20K core is questionable; it is too early to say.

   There have been no new publications by our group since our application was
filed last year. Currently several papers are in progress but have not yet been
submitted for publication. A partial list of previous publications is attached.

   When the BALM system has reached a stable state, we will be happy to
provide documentation and sources for it to anyone who requests them.

II.  Interactions with SUMEX

   We have been fully satisfied with our use of SUMEX. By far our greatest
use of the system has been of the text editors and the BLISS-11 compiler.

   We have also become acquainted through SUMEX with the OMNIGRAPH graphics
package available from NIH and have obtained a copy of the OMNIGRAPH manual. We
have not used OMNIGRAPH yet but may wish to in the future; we are considering the
features of OMNIGRAPH in the design of the graphics package for our interactive
system.

   We are quite interested in using the MAINSAIL system being developed at
SUMEX and have been told that RT-11 is one of the first operating systems under
which it will be available.

IV.  Publications

Dobelle, W. H., Mladejovsky, M. G., and Girvin, J. P. Artificial vision for the
   blind: electrical stimulation of visual cortex offers hope for a functional
   prosthesis. Science, 183, 1 February 1974, 440-444.

Dobelle, W. H., and Mladejovsky, M. G. Phosphenes produced by electrical
   stimulation of human occipital cortex and their application to the
   development of a prosthesis for the blind. J. Physiol., 243, 1974, 553-576.

Dobelle, W. H., Mladejovsky, M. G., Evans, J. R., Roberts, T. S., and Girvin,
   J. P. 'Braille' reading by a blind volunteer by visual cortex stimulation.
   Nature, 259, 15 January 1976, 111-112.

Mladejovsky, M. G., Eddington, D. K., Evans, J. R., and Dobelle, W. H. A
   computer-based brain stimulation system to investigate sensory prostheses
   for the blind and deaf. IEEE Trans. Biomed. Eng., BME-23, No. 4, July 1976,
   285-296.

Mladejovsky, M. G., Eddington, D. K., Dobelle, W. H., and Brackmann, D. E.
   Artificial hearing for the deaf by cochlear stimulation: pitch modulation
   and some parametric thresholds. Transactions of ASAIO, 21, 1975, 1-6.

6.4.5   MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS

Mathematical Modeling of Physiological Systems

John J. Osborn, M.D., Director
    Research Data Facility
The Institutes of Medical Sciences
San Francisco, California 94115

   The overall goal of the collaboration between the Institutes of Medical
Sciences and SUMEX is the application of computer technology to clinical
medicine.  Our efforts during the past year have been in the fields of
knowledge engineering and mathematical modeling.

   We are using our available computer based physiological measurement systems
to provide the basis on which physiological interpretation is being developed
using knowledge engineering, and to provide the data with which mathematical
models are being developed using the SUMEX modeling facility.

BIOMEDICAL KNOWLEDGE ENGINEERING IN CLINICAL MEDICINE (KEMED)

   The KEMED system is conceived as an application of the discipline of
heuristic based programming to the interpretation of measurements made in
clinical medicine.  The long-range goal of the project is to do research on a
biomedical knowledge-based system for interpreting the clinical significance of
physiological data. This interpretation will be used to aid in diagnostic
decision making and the selection of therapeutic action.  Even the best
measurements often go unused because of the reasonable reluctance of clinical
staff to make measurements whose results they only poorly understand and whose
relation to clinical management is ambiguous. We will use techniques of
biomedical knowledge engineering to extract and systematize the heuristic
knowledge used by experts in the practice of their clinical art.  These
techniques will be used to construct and utilize a knowledge base to guide
inference making by computer programs.

   The first program in the KEMED system is designed for interpretation of
standard pulmonary function laboratory test data. A knowledge base was developed
for interpreting the relationship between measured flows, lung volumes, pulmonary
diffusion capacity and pulmonary mechanics and the standard diagnoses of
pulmonary function.  The knowledge base includes interpretation of measured test
results and diagnosis of the type and severity of any pulmonary disease which may
be present.  The program is being developed as an extension to the MYCIN
formalism, and it makes extensive use of the MYCIN structures and programming
system.  Funding has been requested to continue this work.
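The KEMED rules themselves are not reproduced in this report. The Python below is a rough sketch of the kind of rule such a knowledge base encodes. It classifies a pulmonary function study as obstructive and/or restrictive and grades severity. The thresholds are common textbook conventions chosen for illustration, not KEMED's actual criteria: an FEV1/FVC ratio below 0.70, a TLC below 80% of predicted, and the FEV1 severity bands. The function name is hypothetical. Unlike a MYCIN-formalism rule, this sketch draws categorical conclusions without certainty factors.

```python
def interpret_pft(fev1_fvc, tlc_pct_pred, fev1_pct_pred):
    """Classify a pulmonary function study with illustrative threshold rules.

    fev1_fvc      -- FEV1/FVC ratio (e.g. 0.65)
    tlc_pct_pred  -- total lung capacity, percent of predicted
    fev1_pct_pred -- FEV1, percent of predicted (used to grade severity)
    """
    patterns = []
    if fev1_fvc < 0.70:          # reduced expiratory flow relative to volume
        patterns.append("obstructive")
    if tlc_pct_pred < 80:        # reduced lung volume
        patterns.append("restrictive")
    if not patterns:
        return ["normal"]
    # Illustrative severity bands based on FEV1 as a percentage of predicted.
    if fev1_pct_pred >= 80:
        severity = "mild"
    elif fev1_pct_pred >= 50:
        severity = "moderate"
    else:
        severity = "severe"
    return [f"{severity} {pattern} disease" for pattern in patterns]
```

A call such as `interpret_pft(0.60, 100, 45)` applies the obstruction rule alone and grades it by FEV1. In a MYCIN-style knowledge base, each `if` test would instead be a separate rule contributing a certainty factor to the diagnosis.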

MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS

   Mathematical models of the cardio-pulmonary system are being developed to
extract clinical physiological information from data acquired by the patient
monitoring system.  Two approaches are being taken: 1) parsimonious models of the
dynamic behavior of CO2 following an increase in inspired oxygen concentration
are being developed for automated patient monitoring applications, and 2) a
detailed model of the regional behavior of radioactive tracers in the lung is
being used as a standard for evaluating the parsimonious models.  The MLAB
(Modelling Laboratory) program, available on SUMEX, is being used extensively for
model development, by simulating hypothesized models, and for data analysis, i.e.,
identification of model parameters from experimental data.  The CO2 dilution
method has been applied successfully in the ICU, and additional funding has been
requested.  Two new methods for measuring regional lung function with radioactive
tracers, for which MLAB was essential, have been developed, and further funding
has been requested.  MLAB was used to perform an error analysis of the method for
measuring regional pulmonary shunt fraction.  Also, using MLAB model simulation
to understand the complex dynamics of 133-Xenon in the lung-tissue system, we
have developed a method for measuring intraregional ventilation/perfusion ratio
maldistribution that significantly extends the sensitivity of previous
methods.  A model of the oculomotor system is presently being developed on MLAB in
collaboration with the Smith-Kettlewell Institute of the Visual Sciences.  We
anticipate that this model will be used in the future for the treatment of
patients with strabismus.
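MLAB itself is not shown here. As a rough illustration of what "identification of model parameters from experimental data" involves, the following Python sketch fits a single-compartment washout model, C(t) = C0 * exp(-k * t), by ordinary least squares on ln C. The one-compartment model, the function name `fit_washout`, and the synthetic data are assumptions made for illustration; they are not the models or measurements described above.

```python
import math

def fit_washout(times, concs):
    """Fit C(t) = C0 * exp(-k * t) by ordinary least squares on ln C(t).

    Returns (C0, k). Assumes every concentration is positive.
    """
    n = len(times)
    logs = [math.log(c) for c in concs]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx                    # slope of ln C versus t equals -k
    intercept = y_mean - slope * t_mean  # intercept equals ln C0
    return math.exp(intercept), -slope

# Synthetic, noise-free "measurements" generated from known parameters.
times = [0.0, 1.0, 2.0, 4.0, 8.0]
concs = [5.0 * math.exp(-0.3 * t) for t in times]
c0, k = fit_washout(times, concs)
print(f"C0 = {c0:.3f}, k = {k:.3f}")  # C0 = 5.000, k = 0.300
```

With noisy measurements the log transform weights the points unevenly, so a nonlinear least-squares fit of the exponential itself would be the more careful choice.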

Interface with SUMEX

   We use SUMEX through the Tymshare network using a terminal.  The facilities
of SUMEX, including both text editing and message sending, are excellent
additions to our in-house facilities (a PDP-11 based system).  The message
system is particularly useful for communicating ideas and questions to other
colleagues using the SUMEX system.  Our principal difficulty with SUMEX is
turn-around time.  Both the MYCIN and MLAB systems are interactive, and the 33-60
second response times associated with MYCIN and MLAB jobs are at best
discouraging.

   We have a strong desire to develop in-house capabilities in artificial
intelligence.  We have already invested a significant number of hours in
developing competence with the MYCIN system, and we are confident of developing
an extremely capable staff in heuristic programming.  An in-house AI
computational capability is more difficult to conceive.  Developing an
artificial intelligence programming facility on a PDP-11 based system remains a
significant long-term interest.  A satellite capability offers both the
potential of not adding further load to SUMEX and the potential of more rapid
interaction with the user.

   The SUMEX facility contributed to the following grant applications and
articles:

Bibliography

1) Simulation to Relate Measured Gas Concentrations at the Mouth to Pulmonary
   Mechanics and Perfusion.  J.C. Kunz, R.R. Mitchell, D.H. McClung, J.J.
   Osborn.  Submitted to the 1977 ACEMB.

2) Identifiability of Pulmonary and Recirculation Parameters Following
   Sequential Bolus Inputs of 133-Xe.  R.R. Mitchell, R.J. Fallat.  Submitted
   to the 1977 ACEMB.

3) Simulation of Intraregional Ventilation-Perfusion Ratio Maldistribution.
   J.C. Glaub, R.R. Mitchell, R.J. Fallat.  Submitted to the 1977 ACEMB.

4) Measurement of Residual Volume and Ventilation Distribution Using Helium and a
   Five Vital Capacity Breath Maneuver.  R.R. Mitchell.  Technical Report 32,
   Institutes of Medical Sciences, Feb. 1977.

5) Identification of Human Oculomotor System Parameters with Application to
   Strabismus.  N.K. Gupta, A.V. Phatak, Systems Control; R.R. Mitchell, Heart
   Research Institute, and Carter Collins, Smith-Kettlewell Institute; Institutes
   of Medical Sciences.  Submitted to the Joint Automatic Control Conference, 197..


6.4.6    PUFF/VM PROJECT

PUFF/VM - Pulmonary Function and Ventilator Management Project

          John J. Osborn, M.D.
The Institutes of Medical Sciences (San Francisco)

and

          E. A. Feigenbaum
Computer Science Department, Stanford University

Note: The PUFF/VM project is the outgrowth of the efforts of Prof. Feigenbaum's
group at Stanford to establish new application areas for AI in medical research.
It represents a collaboration with Dr. Osborn's group, which has been working on
another AIM pilot project titled "Mathematical Modeling of Physiological
Systems".  A PUFF/VM proposal is currently pending with NIH, and PUFF/VM is
being reviewed in parallel by the AIM Executive Committee for separate pilot
status.

1.  Overall Objectives:

   Our immediate objective is to develop a computer programming system for
interpreting the clinical significance of measures of pulmonary function.  We
hope to develop this system for diagnostic use in the pulmonary function
laboratory and to aid diagnosis and ventilator management of respiratory
insufficiency in the intensive care unit.  We hope to demonstrate the clinical
effectiveness of such a system for improving the accuracy and timeliness of
diagnosis.

   Our long-range goal is to develop an integrated system for making and
interpreting measures of pulmonary function.  We believe that this is possible
because of the present and potential contribution of instrumentation and data
analysis systems to the diagnosis and clinical management of pulmonary distress.
We believe, in addition, that the discipline of knowledge-based heuristic
programming is potentially the best basis on which to develop a system for
automatically interpreting the results of the measures of pulmonary function.

   We aim, in the long run, to develop an implementation inexpensive enough
that the system will find wide acceptance in the delivery of clinical care.

[Further details will be furnished by Dr. Feigenbaum on request]




Appendix I

OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH


ARTIFICIAL INTELLIGENCE RESEARCH

What is it? What has it achieved? Where is it going?

Excerpt from a report by
Professor Edward A. Feigenbaum
   Stanford University

[The contents of this appendix have been deleted for brevity -
copies may be had upon request to Dr. Lederberg]


Appendix II

AI HANDBOOK OUTLINE
NOTE:
The following material is a tentative outline of a handbook on artificial
intelligence planned for publication.  It is not to be cited or quoted out of the
context of this report without the express permission of Professor E. A.
Feigenbaum of Stanford University.  This handbook is intended for two kinds of
audience: computer science students interested in learning more about artificial
intelligence, and engineers in search of techniques and ideas that might prove
useful in applications programs.  Articles in the first seven sections are
expected to appear in the first volume to be published in preliminary form by
September 1977.  The remaining articles are expected to appear in the second
volume to be published in preliminary form by June 1978.

   The following is a brief checklist that was used to guide the computer
science students engaged in writing articles for the handbook.  It is, of course,
only a suggested list.

i) Start with 1-2 paragraphs on the central idea or concept of the
   article. Answer the question "What is the key idea?"

ii) Give a brief history of the invention of the idea, and its use in
   A.I.

iii) Give a more detailed technical description of the idea, its
   implementations in the past, and the results of any experiments with
   it. Try to answer the question "How to do it?"

iv) Make tentative conclusions about the utility and limitations of the
   idea if appropriate.

v) Give a list of suitable references.

vi) Give a small set of pointers to related concepts (general/overview
   articles, specific applications, etc.)

vii) When referring in the text of an article to a term which is the
   subject of another handbook article, surround the term with +'s, e.g.
   +Production Systems+.
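The +term+ convention in item vii makes cross-references detectable by a program. A minimal Python sketch, assuming a marked term never contains a plus sign or spans a line break, that collects the marked terms from an article's text (the function name `find_cross_references` is hypothetical):

```python
import re

# A cross-reference is any run of characters other than '+' or newline
# enclosed between two '+' signs.
CROSS_REF = re.compile(r"\+([^+\n]+)\+")

def find_cross_references(text):
    """Return the handbook terms marked as +Term+ in an article, in order."""
    return CROSS_REF.findall(text)

sample = "Rules of this kind are discussed under +Production Systems+ and +Frames+."
print(find_cross_references(sample))  # ['Production Systems', 'Frames']
```

Ordinary arithmetic such as "a + b + c" in the text would be misread as a reference, so a real tool would need a stricter marker or a check against the list of article titles.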

AI Handbook Articles

I. INTRODUCTION
  A. Philosophy
  B. Relationship to Society
  C. History
  D. Conferences and Publications


II. HEURISTIC SEARCH
  A. Heuristic Search Overview
  B. Search Spaces
     0. Overview
     1. State-space representation
     2. State-space search
     3. Problem-reduction representation
     4. AND-OR trees and graphs
  C. "Blind" Search Strategies
     1. Overview
     2. Breadth-first searching
     3. Depth-first searching
     4. Bi-directional searching
     5. Minimaxing
     6. Alpha-Beta searching
  D. Using Heuristics to Improve the Search
     1. Overview
     2. Best-first searching
     3. Hill climbing
     4. Means-ends analysis
     5. Hierarchical search, planning in abstract spaces
     6. Branch and bound searching
     7. Band-width searching
  E. Programs employing (based on) heuristic search
     1. Overview
     2. Historically important problem solvers
        a) GPS
        b) Strips
        c) Gelernter's Geom. Program

III. AI Languages
  A. Early list-processing languages
  B. Language/system features
     0. Overview of current LP languages
     1. Control structures
     2. Data Structures (lists, associations)
     3. Pattern matching in AI languages
     4. Deductive mechanisms
  C. Current languages/systems
     1. LISP, the basic idea
     2. INTERLISP
     3. QLISP (mention QA4)
     4. SAIL/LEAP
     5. PLANNER
     6. CONNIVER
     7. SLIP
     8. POP-2
     9. SNOBOL
    10. QA3/PROLOGUE


IV. Representation of Knowledge
  A. Overviews
     1. Survey of representation techniques
     2. Issues and problems in representation theory
  B. Representation Schemes
     1. Predicate calculus
     2. Semantic nets -- Quillian, Hendrix, LNR
     3. Production rules
     4. MERLIN
     5. Procedures (SHRDLU, actors, demons)
     6. Frames
     7. Componential analysis
     8. Scripts
     9. KRL
    10. Multiple Knowledge Sources - Blackboard
    11. Query languages
    12. FOL

V. SPEECH UNDERSTANDING SYSTEMS
  A. Overview (include a mention of ac. proc.)
  B. Integration of Multiple Sources of Knowledge
  C. The ARPA speech systems
     1. HEARSAY I
     2. HEARSAY II
     3. SPEECHLIS
     4. SDC-SRI System (VDMS)
     5. DRAGON

VI. Natural Language
  A. Overview - History & Issues
  B. Representation of Meaning
  C. Grammars and Parsing
     1. Review of formal grammars
     2. Extended grammars
        a. Transformational grammars
        b. Systemic grammars
        c. Case Grammars
     3. Parsing techniques
        a. Overview of parsing techniques
        b. Augmented transition nets, Woods
        c. CHARTS - GSP
  D. Text Generating systems
  E. Machine Translation
     1. Overview & history
     2. Wilks' machine translation work
  F. Famous Natural Language systems
     1. Early NL systems (SAD-SAM through ELIZA)
     2. PARRY
     3. MARGIE
     4. LUNAR
     5. SHRDLU, Winograd


VII. Applications-oriented AI research (overview)
  A. Chemistry
     1. Mass spectrometry - DENDRAL
     2. Organic Synthesis - overview
  B. Medicine
     1. MYCIN
     2. Others
  C. Psychology and Psychiatry
     1. Protocol Analysis (Waterman and Newell)
  D. Math systems
     1. REDUCE
     2. MACSYMA (mention SAINT)
  E. Business and Management Science Applications
     1. Assembly line / power distribution
  F. Miscellaneous
     1. LUNAR
     2. Education
     3. SCHOLAR
     4. SOPHIE
     5. SRI computer-based consultation
     6. RAND--RITA production rule system
     7. Rendezvous - Query languages

VIII. AUTOMATIC PROGRAMMING
  A. Overview
  B. Program Specification Techniques
  C. Program Synthesis techniques
     0. Overview
     1. Traces
     2. Examples
     3. Problem solving applications to AP
        a. Sussman's Hacker
        b. Program Synthesis by Theorem Proving
     4. Codification of Programming Knowledge
     5. Integrated AP Systems
  D. Program optimization techniques
  E. Programmer's aids
  F. Program verification

IX. THEOREM PROVING
  A. Overview
  B. Resolution Theorem Proving
     1. Basic resolution method
     2. Syntactic ordering strategies
     3. Semantic & syntactic refinement
  C. Non-resolution theorem proving
     0. Overview
     1. Natural deduction
     2. Boyer-Moore
     3. LCF
  D. Uses of theorem proving
     1. Use in question answering
     2. Use in problem solving
     3. Theorem Proving languages
     4. Man-machine theorem proving
  E. Predicate Calculus
  F. Proof checkers

X. Human Information Processing -- Psychology
  A. Perception
  B. Memory and Learning
     1. Basic structures and processes in IPP
     2. Memory Models
        a. Semantic net memory models
        b. HAM (Anderson & Bower)
        c. EPAM
        d. Productions (HPS)
        e. Conceptual Dependency
  C. Psycholinguistics
  D. Human Problem Solving
     0. Overview
     1. PBG's
     2. Human chess problem solving
  E. Behavioral Modeling
     1. Belief Systems
     2. Conversational Postulates (Grice, TW)
     3. PARRY

XI. VISION
  A. Overview
  B. Polyhedral or Blocks World Vision
     1. Overview
     2. Guzman
     3. Falk
     4. Waltz
  C. Scene Analysis
     1. Overview
     2. Template Matching
     3. Edge Detection
     4. Homogeneous Coordinates
     5. Line Description
     6. Noise Removal
     7. Shape Description
     8. Region Growing (Yakimovsky, Ohlander)
     9. Contour Following
    10. Spatial Filtering
    11. Front End Particulars
    12. Syntactic Methods
    13. Descriptive Methods
  D. Robot and Industrial Vision Systems
     1. Overview and State of the Art
     2. Hardware
  E. Pattern Recognition
     1. Overview
     2. Statistical Methods and Applications
     3. Descriptive Methods and Applications
  F. Miscellaneous
     1. Multisensory Images
     2. Perceptrons

XII. ROBOTICS
  A. Overview
  B. Robot Planning and Problem Solving
  C. Arms
  D. Present-Day Industrial Robots
  E. Robotics Programming Languages

XIII. Learning and Inductive Inference
  A. Overview
  B. Samuel Checker program
  C. Winston -- concept formation
  D. Pattern extrapolation problems--Simon,
  E. Overview of Induction
  F. AQVAL (Michalski at U. Ill.)
  G. Parameter adjustment of linear functions
  H. Rote learning
  I. D.A. Waterman's machine learning of heuristics
  J. Learning by debugging
  K. Learning by parameter Adaptation
  L. Signature & move phase tables

XIV. Reasoning and Planning
  A. Reasoning by analogy
     1. Overview
     2. ZORBA
  B. Planning
     1. NOAH
     2. ABSTRIPS


Appendix III

SUMMARY OF MAINSAIL LANGUAGE FEATURES

MAINSAIL LANGUAGE FEATURES

Clark R. Wilcox
Stanford University

Portable ALGOL-like language with dynamic memory support

   MAINSAIL is an ALGOL-like language with dynamic memory support for strings,
arrays, records, modules and files. The driving force behind its design is
support for the development of portable software. At the same time, low-level
features allow the programmer to deal with the underlying representation of data
aggregates. These low-level features have made it possible for most of the
runtime system to be written as MAINSAIL modules.

Intended applications

   MAINSAIL is not oriented toward any particular application. Its flexible
use of memory makes it suitable for tasks whose memory requirements are
difficult to predict prior to execution, as is often the case with knowledge
representation. The string capabilities facilitate word processing applications
such as compilers, text editors and document preparation, and "friendly"
interactive programming. These same facilities require runtime support, so a
MAINSAIL program is not a stand-alone body of code and thus may not be
appropriate for some primitive system utilities.

Portability

   A primary goal is that compatible implementations be provided on a variety
of computer systems. Programs which are written for portability should be able to
execute on any of the implementations with the same effect.  Such programs must
adhere to reasonable constraints with regard to data and memory ranges, as
described in the language manual. Programs which violate these constraints are
not considered portable, and thus may behave differently on different
implementations. This design for portability raises a number of questions with
regard to how well MAINSAIL will fit any particular machine. It is too early to
provide a conclusive answer to such concerns, though it appears that many
machines will efficiently support MAINSAIL implementations.

Modularity

   In addition to the more obvious effects the machine-independent design has
on data types and operations, it also necessitates a model of runtime
interactions which can be supported on a broad range of computers.  In particular,
MAINSAIL must be able to execute in a limited address space, which means that
programs must be broken into pieces (modules) which need be in memory only when
executing.  The inability to characterize linkage and overlay systems in a
machine-independent manner has forced MAINSAIL to take over these functions, and
thus assume duties often considered part of the operating system.

   A MAINSAIL program consists of an open-ended collection of modules, i.e.,
the programmer need not specify what modules make up a program.  The modules may
originate from many files at execution, as contrasted to the common approach of
having a single "save file" or "load module" which may contain an overlay
structure.

   The modules are compiled separately and assembled into a form which does
not require linkage prior to execution; MAINSAIL resolves all inter-module
references at runtime.  Modules are automatically brought into memory as needed.
If there is insufficient room in memory for an incoming module, MAINSAIL
automatically swaps out one or more resident modules to make room.  This swapping
could involve I/O to an external device or memory mapping.  Modules are
position-independent, i.e., they do not contain references to fixed memory
locations.  Thus they may be moved about during execution, and need not be swapped
into the same memory locations from which they were swapped out.  This
generalization of the traditional overlay structure will make possible the
implementation of sizeable programs in a limited address space, while at the same
time using the minimum possible memory on larger systems.
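The on-demand loading and swapping just described can be sketched as follows. The text does not say how MAINSAIL chooses which resident module to swap out; this Python sketch assumes a least-recently-used policy, and the class name `ModuleMemory`, the module names, and the sizes are hypothetical.

```python
from collections import OrderedDict

class ModuleMemory:
    """Hypothetical module manager: loads modules on first reference and
    swaps out the least recently used resident module when memory is full."""

    def __init__(self, capacity, sizes):
        self.capacity = capacity          # total memory available (arbitrary units)
        self.sizes = sizes                # module name -> size of the module
        self.resident = OrderedDict()     # resident modules, least recently used first
        self.swaps = []                   # names of modules swapped out, in order

    def used(self):
        return sum(self.resident.values())

    def call(self, name):
        """Reference a module, bringing it into memory if it is not resident."""
        if name in self.resident:
            self.resident.move_to_end(name)   # mark as most recently used
            return
        size = self.sizes[name]
        if size > self.capacity:
            raise MemoryError(f"module {name} cannot fit in memory")
        while self.used() + size > self.capacity:
            victim, _ = self.resident.popitem(last=False)  # evict the LRU module
            self.swaps.append(victim)
        self.resident[name] = size

mem = ModuleMemory(capacity=10, sizes={"PARSE": 4, "EVAL": 5, "PRINT": 3})
for module in ["PARSE", "EVAL", "PARSE", "PRINT"]:
    mem.call(module)
print(list(mem.resident), mem.swaps)  # ['PARSE', 'PRINT'] ['EVAL']
```

Because MAINSAIL modules are position-independent, a real manager could also compact memory by moving resident modules; the sketch ignores fragmentation and tracks only total size.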

Data types

   In order to allow efficient operation on machines with a small word size,
yet provide access to large values when necessary, MAINSAIL offers both short and
"long" data types: integer, long integer, real, long real, bits and long bits. In
practice the long forms are used much less frequently than the short forms, and
thus can be simulated if necessary without major degradation in efficiency.
These data ranges have been chosen to fit the range of machines for which
MAINSAIL is intended.
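To show why the long forms can be simulated on a small-word machine, the Python sketch below adds two 32-bit unsigned "long" values using only 16-bit word operations and an explicit carry. The 16-bit and 32-bit sizes and the unsigned wraparound are assumptions chosen for illustration; the actual MAINSAIL data ranges are given in the language manual.

```python
MASK16 = 0xFFFF

def split(value):
    """Split a 32-bit unsigned value into (high, low) 16-bit words."""
    return (value >> 16) & MASK16, value & MASK16

def long_add(a, b):
    """Add two 32-bit unsigned 'long' values using only 16-bit word operations."""
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    lo = a_lo + b_lo                        # may overflow into bit 16
    carry = lo >> 16                        # 0 or 1
    hi = (a_hi + b_hi + carry) & MASK16     # discard overflow: wrap at 32 bits
    return (hi << 16) | (lo & MASK16)

print(hex(long_add(0x0001FFFF, 0x00000001)))  # 0x20000
```

Each long addition costs two short additions plus carry handling, which is tolerable precisely because, as noted above, the long forms are used much less often.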

Strings

    A MAINSAIL string is a variable-length sequence of characters. The
programmer does not need to specify a maximum length for a string, as is common in
many languages.  Instead, MAINSAIL keeps track of the current number of
characters in a string and automatically handles storage allocation. Most
existing general-purpose languages have omitted a full implementation of strings,
apparently on the assumption that strings could not be implemented efficiently
and were dispensable.  However, the hardware design trend is toward
microprogrammed instruction sets which support string operations, in view of the
increasing acceptance of computers for word processing.
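The automatic storage handling can be pictured with a small Python sketch of a string that records its current length separately from its allocated storage and enlarges the storage when an append would overflow it. The doubling growth policy, the initial capacity, and the class name `VarString` are assumptions; the text does not describe MAINSAIL's actual allocation strategy.

```python
class VarString:
    """Hypothetical variable-length string: tracks its length and grows its storage."""

    def __init__(self):
        self._store = bytearray(4)   # allocated storage (initial capacity is arbitrary)
        self._length = 0             # number of characters currently in use

    def append(self, text):
        data = text.encode("ascii")
        needed = self._length + len(data)
        if needed > len(self._store):
            new_capacity = len(self._store)
            while new_capacity < needed:
                new_capacity *= 2            # grow geometrically
            new_store = bytearray(new_capacity)
            new_store[:self._length] = self._store[:self._length]  # copy old contents
            self._store = new_store
        self._store[self._length:needed] = data
        self._length = needed

    def __len__(self):
        return self._length

    def capacity(self):
        return len(self._store)

    def __str__(self):
        return self._store[:self._length].decode("ascii")

s = VarString()
for word in ["MAIN", "SAIL", " strings"]:
    s.append(word)
print(str(s), len(s), s.capacity())  # MAINSAIL strings 16 16
```

The programmer never states a maximum length; doubling keeps the number of reallocations logarithmic in the final length.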

Classes, records and pointers

   MAINSAIL employs a general notion of "class" as a collection of data and
procedure fields.  Classes serve two purposes: they specify the interfaces
through which modules communicate with one another, and they are used as
templates for the creation of and access to records. A record is a dynamically
allocated memory area which contains data corresponding to the fields of the
class to which it belongs. The fields of a record are accessed by means of a
pointer to the record, combined with the name of the field. The pointer must have
been associated with the record's class when it was declared.


    The notion of "prefix class" was introduced to provide for a hierarchy of
classes. A class which is declared with a prefix class is conceptually made a
member of the prefix class, and inherits the fields of the prefix class as its
initial fields. For example, the concept "doubly-linked list" may be represented
as a class with two pointer fields, say "left" and "right".  Any other class will
automatically inherit these two fields if it is declared with the
doubly-linked-list class as its prefix.
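Python dataclass inheritance behaves much like the prefix-class rule described above: a subclass's field list begins with the fields of its base class. The sketch below models the doubly-linked-list example; the `Patient` class is hypothetical, and Python's rules are offered only as an analogy for MAINSAIL's semantics, not as its syntax.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass(eq=False)  # eq=False: '==' compares identity, much like pointer equality
class DoublyLinkedList:
    """Prefix class: its fields become the initial fields of any class built on it."""
    left: Optional["DoublyLinkedList"] = None
    right: Optional["DoublyLinkedList"] = None

@dataclass(eq=False)
class Patient(DoublyLinkedList):
    """Hypothetical class declared with DoublyLinkedList as its prefix."""
    name: str = ""
    age: int = 0

print([f.name for f in fields(Patient)])  # ['left', 'right', 'name', 'age']

a = Patient(name="A", age=40)
b = Patient(name="B", age=52)
a.right, b.left = b, a            # link the records through the inherited fields
print(a.right.name, b.left.name)  # B A
print(isinstance(a, DoublyLinkedList))  # True
```

Code written against the prefix class (for example, a routine that walks `right` pointers) works unchanged on `Patient` records, which is the practical benefit of inheriting the fields as initial fields.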

   The language contains rules which govern the use of pointers according to
the relationships between classes and prefix classes. MAINSAIL provides for
secure use of pointers in the majority of cases, but allows insecure operations
if desired.

Arrays

   MAINSAIL's implementation of arrays is quite flexible in that it allows the
programmer full control over the creation and disposal of arrays.  This is to be
contrasted with classical ALGOL, where array allocation is tied to block
structure.  An array is actually a pointer to a record, and thus is allowed many
of the same constructs provided for pointers, such as assignment, equality
comparison, and parameter passing.  An array may be a field of a class, so that
any number of records may be allocated which contain array fields.   This
capability is particularly useful in image processing, where flexible array
allocation can significantly simplify program logic.

Procedures

    Procedures play a major role in MAINSAIL. Procedures may be typed for use
in expressions.  There are three simple parameter passing mechanisms: USES passes
a value to the procedure; PRODUCES passes a value back to the caller; and MODIFIES
passes and returns a value.  Optional arguments, repeatable arguments, and generic
procedures provide useful syntactic constructs. Any procedure may be invoked
recursively. Other procedure characteristics are COMPILETIME (if all arguments
are constants, the procedure is evaluated during compilation), INLINE (produces
"in-line" code), and CODED (supports assembly language coding).

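Python has no reference parameters, so the following sketch emulates the three mechanisms with a hypothetical mutable `Ref` cell: plain arguments play the role of USES, a `Ref` that the procedure only writes plays PRODUCES, and a `Ref` that it both reads and writes plays MODIFIES. This illustrates the semantics only; it is not MAINSAIL syntax, and the names `Ref` and `divide` are invented for the example.

```python
class Ref:
    """Hypothetical mutable cell used to emulate PRODUCES and MODIFIES parameters."""
    def __init__(self, value=None):
        self.value = value

def divide(dividend, divisor, quotient, remainder, calls):
    # USES: dividend and divisor are only read.
    # PRODUCES: quotient and remainder are only written; incoming values are ignored.
    # MODIFIES: calls is read and then written back.
    quotient.value = dividend // divisor
    remainder.value = dividend % divisor
    calls.value += 1

q, r, n = Ref(), Ref(), Ref(0)
divide(17, 5, q, r, n)
print(q.value, r.value, n.value)  # 3 2 1
```

Stating the direction of each parameter in the declaration, as MAINSAIL does, tells both the reader and the compiler which values flow in, out, or both.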
Embedded assembly language

    A number of facilities support the use of assembly language within a
MAINSAIL program: CODED procedures, the Code statement, and the various forms of
encoding variable offsets.  Of course, assembly language cannot appear within a
machine-independent program, but there are nevertheless many instances in which
the target machine is known.  The MAINSAIL interface to each operating system makes
extensive use of the assembly language facilities.

Compiletime support

   Most present-day compilers were designed to work in a sequential access
mode, and suffer from the resulting limitations.  The MAINSAIL compiler was
designed with the understanding that the source files would be on random-access
devices, so that it need not progress through the file in a strictly linear
fashion.  Any number of nested input files are allowed; in fact, the same file may
be scanned several times during compilation (contrast this with a compiler
designed for input from punched card decks).


    Compilation involves interaction with the user in that the programmer can
put messages in a source file which are displayed during compilation.   The user
can affect the course of the compilation by specifying the names of files to be
compiled as requested by directives within the file being compiled, and by
defining values which govern the scanning of the source text.  The compiler has
the ability to quickly search through a file for the text to be compiled as
specified either by earlier source text, or interactively by the user.   This
allows a single file to be made a repository of fragments of source text needed
during many different compilations, and quickly searched during a particular
compilation.

   Conditional compilation allows an arbitrarily complicated expression
(ultimately made up of constant operands) to be evaluated by the compiler to
determine whether a particular segment of the source file is to be ignored. In
general, the compiler will evaluate all expressions involving only constant
operands (of type boolean, (long) integer, (long) bits, and string) and
compiletime procedures.  These facilities are quite important when building a
large parameterized system.

   A save and restore facility allows the current state of the symbol table to
be saved.  It may be restored during a later compilation to avoid recompiling
unchanged text.  This is particularly useful for the development of a collection
of modules all of which utilize one or more common "header" files.
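A minimal Python sketch of the save-and-restore idea, assuming the saved state is keyed by a hash of the header text so that an unchanged header is restored rather than recompiled. The toy `NAME = VALUE` header format, the in-memory `cache` dictionary standing in for a saved file, and the function names are hypothetical.

```python
import hashlib
import pickle

def compile_header(header_text):
    """Toy stand-in for compiling declarations: one 'NAME = integer' per line."""
    table = {}
    for line in header_text.splitlines():
        if "=" in line:
            name, value = line.split("=", 1)
            table[name.strip()] = int(value)
    return table

def symbols_for(header_text, cache):
    """Return (symbol table, restored?) for a header, saving the table on first use."""
    key = hashlib.sha256(header_text.encode("utf-8")).hexdigest()
    if key in cache:
        return pickle.loads(cache[key]), True      # restore the saved symbol table
    table = compile_header(header_text)
    cache[key] = pickle.dumps(table)               # save it for later compilations
    return table, False

cache = {}
header = "MAXLEN = 80\nTABSIZE = 8"
print(symbols_for(header, cache))   # ({'MAXLEN': 80, 'TABSIZE': 8}, False)
print(symbols_for(header, cache))   # ({'MAXLEN': 80, 'TABSIZE': 8}, True)
```

Keying on the header's contents means any edit to the header forces a fresh compilation, so a stale table is never restored.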

   A comprehensive macro facility provides for the definition of constants,
arbitrary text, and arbitrary text with parameters.  Many commonly used constants
are predefined, especially as needed by the system procedures, to simplify the
passing of bits parameters consisting of predefined "flags".
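Predefined flag constants combined into a single bits parameter can be sketched with Python's standard `enum.IntFlag`. The flag names and values below are hypothetical, not MAINSAIL's predefined constants.

```python
from enum import IntFlag

class OpenFlags(IntFlag):
    """Hypothetical predefined flags, one bit each."""
    INPUT = 1
    OUTPUT = 2
    RANDOM = 4
    CREATE = 8

def describe(flags):
    """List the names of the flags set in a bits parameter."""
    return [member.name for member in OpenFlags if member in flags]

bits = OpenFlags.INPUT | OpenFlags.RANDOM   # the caller ORs predefined flags together
print(int(bits), describe(bits))            # 5 ['INPUT', 'RANDOM']
```

Naming each bit once, in a shared definition, spares callers from remembering numeric masks and lets the called procedure test for each option independently.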

File system

   A simple yet powerful file system has been designed which, like all
features of MAINSAIL, is guaranteed on every implementation.  When a file is
opened for use, the program specifies whether it contains text or data (binary),
and whether access is sequential or random.  A fundamental assumption is the
ability to communicate with a controlling terminal, called the tty ("teletype").
For example, error messages are output to the tty, and a response is expected.


Appendix IV

MICROPROGRAMMED MAINSAIL PLANS

Plans for a Microprogrammed Implementation of MAINSAIL

Clark R. Wilcox
Stanford University

   In this appendix we shall discuss our plans for a microprogrammed
implementation of MAINSAIL. The goal of this research is to determine the
feasibility of distributing a cost-effective integrated hardware-software
programming environment. A computer which operates under the control of a
microprogrammable control store offers a new approach to efficient program
execution, which we summarize below. We feel this approach could offer the means
of developing reasonably priced computing resources capable of
executing programs which are too demanding for present mini-computers. It appears
that such machines may be widely available within a few years. We propose to
purchase the necessary hardware to enable us to develop a microprogrammed
MAINSAIL implementation.

The emulation approach to high-level language implementation

   Traditional implementations of high-level languages involve translation to
the fixed machine languages of the target machines. Such machine languages have
not been designed for the efficient representation of high-level languages, with
the result that an excessive number of overhead instructions are required to map
the high-level language into its directly-executable machine code "surrogate".
With the advent of microprogrammable computers with writable control stores, a
different approach appears to hold great promise for the efficient execution of
high-level languages.

   A micro-coded computer executes the instructions in main memory under
control of the micro program. Thus the machine code may be viewed as data which
is interpreted, or emulated, by the micro program, rather than as direct signals
to the hardware. The micro program is written in a more primitive machine code
called micro code, which (usually) directly controls the hardware. Most
micro-coded computers have been designed for the emulation of a particular machine
code, and thus the micro code is simply a means of reducing the complexity of the
hardware while perhaps providing a "higher-level" machine code. The micro code is
placed into a high-speed memory (relative to main memory), so that many micro
instructions can be executed in the time it takes to fetch a single instruction
from main memory.

   The same technique of interpreting a particular machine code with a micro
program can be broadened to the ability to interpret an arbitrary machine code.
Such a micro computer is called a "soft" machine, or "universal host", since it
is not oriented toward any particular machine code. Instead, the language
implementor chooses a suitable machine-code representation.  A compiler is
constructed which translates into this representation, and a micro program is
written which interprets the representation. This approach is known as a
"directly executable language", or DEL, since the high-level language has been
translated into a form tailor-made for it.  The unnecessary overhead instructions
are eliminated, with a resulting decrease in program size and increase
in execution speed.  There is evidence [3,4,6] that this approach can provide
substantial dividends.
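The contrast between a conventional machine code and a DEL can be illustrated with a toy interpreter loop standing in for the micro program. In this Python sketch both instruction sets and their opcode names are hypothetical: the conventional sequence needs four load/add/store instructions for the statement c := a + b, while a DEL instruction tailored to the statement needs one. It counts instructions only; a real comparison would also count the micro cycles each instruction takes.

```python
def run(program, memory):
    """Toy interpreter for straight-line code; returns instructions executed."""
    regs = [0, 0]       # only the conventional instruction set uses registers
    executed = 0
    for op, *args in program:
        executed += 1
        if op == "LOAD":      # conventional: register <- memory
            regs[args[0]] = memory[args[1]]
        elif op == "ADDR":    # conventional: register <- register + register
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "STORE":   # conventional: memory <- register
            memory[args[1]] = regs[args[0]]
        elif op == "ADD3":    # DEL-style: memory <- memory + memory, one instruction
            memory[args[0]] = memory[args[1]] + memory[args[2]]
        else:
            raise ValueError(f"unknown opcode {op}")
    return executed

conventional = [("LOAD", 0, "a"), ("LOAD", 1, "b"), ("ADDR", 0, 0, 1), ("STORE", 0, "c")]
del_code = [("ADD3", "c", "a", "b")]

results = {}
for name, code in [("conventional", conventional), ("DEL", del_code)]:
    mem = {"a": 2, "b": 3, "c": 0}
    count = run(code, mem)
    results[name] = (mem["c"], count)
    print(name, *results[name])
# conventional 5 4
# DEL 5 1
```

Both programs compute the same result; the DEL version is smaller because the instruction set was chosen to match the source statement rather than the hardware's registers.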

A MAINSAIL Directly Executable Language (DEL)

   We propose to design a MAINSAIL DEL and implement it on a microprogrammable
computer. The goal is to evaluate the economic and technical advantages of
exporting a combined hardware-software environment for program development and
distribution. In particular, we want to orient MAINSAIL's design and
implementation toward such an emulation approach and compare the resulting
"MAINSAIL machine" with conventional implementations.

   We are interested in determining whether a "soft" machine of this sort can
be provided cheaply enough to serve as a basis for the distribution of software
which presently requires expensive hardware facilities. Hardware which can be
specifically tailored for high-level language execution may provide the quickest
route to the economically viable distribution of programs which exceed the limits
of present general-purpose mini-computers.

   This work will complement the on-going implementations of MAINSAIL on
conventional hardware. Thus we will be in a unique position to compare the two
approaches. We expect the MAINSAIL DEL to outperform other MAINSAIL
implementations in much the same way that DELtran (a DEL for FORTRAN II)
outperforms FORTRAN II [3]. Initial measurements show that the DELtran
representation is less than one fifth the size of the code generated by the
FORTRAN-H optimizing compiler,  and executes about five times faster.

   MAINSAIL is perhaps better suited to the emulation approach than FORTRAN
because of the locality of reference provided by procedures, records and modules.
A preliminary DEL has already been designed for MAINSAIL, but further work is
necessary before we can predict (or demonstrate) size and execution comparisons
with standard implementations. There is much work to be done in determining the
efficient representation of ALGOL-like languages for the purpose of emulation,
and providing data from actual implementations.

    A MAINSAIL DEL could provide facilities which are impossible to provide
efficiently on conventional machines.  These facilities relate to the
monitoring of the program during execution. Since the emulator is simply a
program written in micro code, it can be made to perform any kind of
execution-time checks with no need to alter the DEL. By contrast, on conventional
hardware the MAINSAIL compiler must generate different code depending on the
amount of checking to be performed.

    The emulator can also provide execution profiles and comprehensive
debugging facilities such as instruction traps and single stepping. We expect to
provide several emulators which are oriented toward particular types of
execution, e.g. a "fast" emulator which maximizes execution speed, a "careful"
emulator which provides comprehensive runtime checks, a "performance monitoring"
emulator which gathers information concerning program execution, and a
"debugging" emulator which allows interactive debugging.
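The two paragraphs above can be illustrated with a toy interpreter. The sketch below is written in Python rather than microcode. Its four-instruction stack "DEL" (PUSH, ADD, MUL, HALT) and the names `emulate` and `CheckError` are hypothetical stand-ins, not the MAINSAIL DEL design. It runs one program image under three emulators. A "fast" emulator lets 32-bit arithmetic wrap silently. A "careful" emulator traps on overflow. A "performance monitoring" variant counts the opcodes it executes. The program itself is never recompiled.

```python
from collections import Counter

# Hypothetical 32-bit word limits (the proposal asks for 32-bit arithmetic).
INT_MIN, INT_MAX = -2**31, 2**31 - 1


class CheckError(Exception):
    """Raised by the careful emulator when a runtime check fails."""


def wrap32(x):
    """Two's-complement wraparound, as an unchecked 32-bit ALU would produce."""
    return (x + 2**31) % 2**32 - 2**31


def emulate(program, careful=False, profile=None):
    """Interpret a toy stack DEL (an illustrative assumption, not MAINSAIL's).

    careful -- trap on 32-bit overflow instead of silently wrapping.
    profile -- a Counter that, if given, records how often each opcode runs.
    The program itself is identical in every mode.
    """
    stack, pc = [], 0
    while True:
        op, arg = program[pc]
        pc += 1
        if profile is not None:
            profile[op] += 1          # performance-monitoring hook
        if op == "PUSH":
            stack.append(arg)
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            exact = a + b if op == "ADD" else a * b
            if careful and not INT_MIN <= exact <= INT_MAX:
                raise CheckError(f"32-bit overflow in {op} at pc={pc - 1}")
            stack.append(wrap32(exact))
        elif op == "HALT":
            return stack
        else:
            raise ValueError(f"unknown opcode {op!r}")


# One "compiled" program, run unchanged under three emulators.
PROGRAM = [("PUSH", INT_MAX), ("PUSH", 1), ("ADD", None), ("HALT", None)]

print(emulate(PROGRAM))                 # fast: the sum wraps silently
counts = Counter()
emulate(PROGRAM, profile=counts)        # performance monitoring
print(dict(counts))
try:
    emulate(PROGRAM, careful=True)      # careful: the overflow is trapped
except CheckError as err:
    print("trapped:", err)
```

In this sketch, a new kind of check or a single-stepping debugging emulator would be another variant of `emulate`. The `PROGRAM` list, which stands in for compiled MAINSAIL code, would not change.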

J. Lederberg


Privileged Communication


MICROPROGRAMMED MAINSAIL PLANS

Appendix IV

   Another advantage of the emulation approach is the simplification of the
compiler. Since the compiler will translate MAINSAIL into its own DEL, the code
generators become almost trivial. MAINSAIL operations which require many
instructions on existing machines can be compactly represented in the DEL.  The
compiler need not worry about register optimization, since there will be no
registers in the DEL representation.  And since the MAINSAIL DEL is a close
representation of the source code, there is no reason to "drop into assembly
language": any "sensible" program which could be written in the DEL could
more easily be written in MAINSAIL.

Hardware support

    To support this development, we propose the purchase of a dynamically
micro-programmable machine with such supporting hardware as is necessary.  This
machine should be a universal host in the sense that it is not already oriented
towards a particular machine code. Its software support is of little consequence
since we will design our own operating system and high-level language support.

   We are interested in implementing sophisticated programs, and thus require
a large address space (say 24 bits) and 32-bit arithmetic. We need sufficient
control store, say 16K words, to support a debugging emulator and selected parts
of the operating system. The micro store must be able to transfer words quickly
to and from main memory; in particular, we want to be able to switch emulators
quickly. There must be facilities for interfacing to a variety of peripherals
and to other computers.

   There are some machines now available along these general lines (e.g. [1]),
and more are expected to be introduced soon. Indeed, manufacturers are beginning to
include user-microprogrammable features with new models of their traditional
hardware, e.g. Digital Equipment Corporation's PDP-11/60 and Data General's
Eclipse.

    One such machine, EMMY, has been developed by the Stanford Emulation
Laboratory, under the direction of Professor Michael Flynn of the Department of
Electrical Engineering [2,5]. EMMY is a universal host machine which closely fits
our needs. It is an unbiased yet efficient host for a wide range of target
machine architectures. EMMY is scheduled to go into production by ICL of England
in late 1977 (the Emulation Laboratory has been involved in the development of
a prototype). We feel that this machine would suit our needs, but further
evaluation is necessary.

   We expect most of the development of the MAINSAIL DEL to be independent of any
particular microprogram representation. In particular, we are not at this time
proposing to carry out any hardware design to orient the host processor towards
MAINSAIL, though this approach would be reasonable if a large number of
processors were to be distributed solely to support MAINSAIL execution.


References

1. Burroughs Corp., "B-1700 Systems Reference Manual," Burroughs Corp., Detroit,
   Michigan, 1972.

2. Flynn, M. J., Hoevel, L. W., and Neuhauser, C. J., "The Stanford Emulation
   Laboratory," Digital Systems Laboratory, Technical Report No. 118, Stanford
   University, June 1976.

3. Hoevel, L. W. and Flynn, M. J., "The Structure of Directly Executed Languages:
   A New Theory of Interpretive System Support," Digital Systems Laboratory,
   Technical Report No. 130, Stanford University, March 1977.

4. Hoevel, L. W., "DELtran Principles of Operation," Digital Systems Laboratory,
   Technical Note No. 108, Stanford University, March 1977.

5. Neuhauser, C. J., "An Emulation Oriented, Dynamic Microprogrammable
   Processor," Digital Systems Laboratory, Technical Note No. 65, Stanford
   University, October 1975.

6. Wilner, W., "Burroughs B-1700 Memory Utilization," AFIPS Proceedings, Vol. 41-1,
   FJCC, 1972, pp. 579-586.



AIM MANAGEMENT COMMITTEE MEMBERSHIP

Appendix V


   The following are the membership lists of the various SUMEX-AIM management
committees at the present time:

AIM EXECUTIVE COMMITTEE:

LEDERBERG, Joshua, Ph.D.           (Chairman)
    Department of Genetics, S331
     Stanford University Medical Center
     Stanford, California 94305
     (415) 497-5801

AMAREL, Saul, Ph.D.
     Department of Computer Science
     Rutgers University
    New Brunswick, New Jersey 08903
     (201) 932-3546

BAKER, William R., Jr., Ph.D.       (Executive Secretary)
    Biotechnology Resources Program
     National Institutes of Health
     Building 31, Room 5B43
     9000 Rockville Pike
     Bethesda, Maryland 20014
     (301) 496-5411

LINDBERG, Donald, M.D.
     605 Lewis Hall
     University of Missouri
     Columbia, Missouri 65201
     (314) 882-6966

MYERS, Jack D., M.D.
      School of Medicine
     Scaife Hall, 1291
    University of Pittsburgh
     Pittsburgh, Pennsylvania 15261
     (412) 624-2649

(Adv Grp Member)


AIM ADVISORY GROUP:

LINDBERG, Donald, M.D.
     605 Lewis Hall
     University of Missouri
    Columbia, Missouri 65201
     (314) 882-6966

AMAREL, Saul, Ph.D.
    Department of Computer Science
     Rutgers University
     New Brunswick, New Jersey 08903
     (201) 932-3545

(Chairman)

BAKER, William R., Jr., Ph.D.       (Executive Secretary)
    Biotechnology Resources Program
     National Institutes of Health
    Building 31, Room 5B43
     9000 Rockville Pike
    Bethesda, Maryland 20014
     (301) 496-5411

BOBROW, Daniel G., Ph.D.          [Term expiring]
     Xerox Palo Alto Research Center
     3333 Coyote Hill Road
    Palo Alto, California 94304
    (415) 494-4438

FEIGENBAUM, Edward, Ph.D.
    Department of Computer Science
     Polya Hall, Room 213
     Stanford University
    Stanford, California 94305
    (415) 497-4079

FELDMAN, Jerome, Ph.D.           [Term expiring]
    Department of Computer Science
    University of Rochester
     Rochester, New York
    (714) 275-5671

LEDERBERG, Joshua, Ph.D.           (Ex-officio)
    Principal Investigator - SUMEX
    Department of Genetics, S331
    Stanford University Medical Center
    Stanford, California 94305
    (415) 497-5801

MILLER, George, Ph.D.             [Term expiring]
    The Rockefeller University
     1230 York Avenue
     New York, New York 10021
     (212) 360-1801


MOHLER, William C., M.D.
     Associate Director
    Division of Computer Research and Technology
     National Institutes of Health
    Building 12A, Room 3033
     9000 Rockville Pike
    Bethesda, Maryland 20014
     (301) 496-1158

MYERS, Jack D., M.D.
     School of Medicine
    Scaife Hall, 1291
    University of Pittsburgh
    Pittsburgh, Pennsylvania 15261
     (412) 624-2649

REDDY, D.R., Ph.D.              [Term expiring]
     Department of Computer Science
    Carnegie-Mellon University
    Pittsburgh, Pennsylvania
    (412) 621-2600, Ext. 149

SAFIR, Aran, M.D.
    Department of Ophthalmology
     Mount Sinai School of Medicine
     City University of New York
     Fifth Avenue and 100th Street
     New York, New York 10029
    (212) 369-4721


STANFORD COMMUNITY ADVISORY COMMITTEE:

LEDERBERG, Dr. Joshua             (Chairman)
     Principal Investigator - SUMEX
     Department of Genetics, S331
     Stanford University Medical Center
    Stanford, California 94305
     (415) 497-5801

COHEN, Stanley N., M.D.
    Department of Clinical Pharmacology, 5169
     Stanford University Medical Center
    Stanford, California 94305
     (415) 497-5315

DJERASSI, Dr. Carl
    Department of Chemistry, Stauffer I-106
     Stanford University
     Stanford, California 94305
     (415) 497-2783

FEIGENBAUM, Dr. Edward
      Serra House
     Department of Computer Science
     Stanford University
     Stanford, California 94305
     (415) 497-4878

LEVINTHAL, Dr. Elliott C.
     Department of Genetics, SO47
     Stanford University Medical Center
     Stanford, California 94305
     (415) 497-5813



USER INFORMATION - GENERAL BROCHURE

Appendix VI

     Revised May 1976



GUIDELINES FOR PROSPECTIVE USERS

Appendix VII

    SUMEX-AIM RESOURCE
INFORMATION FOR POTENTIAL USERS

    National users may gain access to the facility resources through an
advisory panel for a national program in Artificial Intelligence in Medicine
(AIM).  The AIM Advisory Group consists of members-at-large of the AI and medical
communities, facility users and the Principal Investigator of SUMEX as an ex-
officio member.  A representative of the National Institutes of Health-
Biotechnology Resources Program (NIH-BRP) serves as Executive Secretary.

   Under its enabling ?-year grant, the SUMEX-AIM computing resource is
allocated to qualified users without fee.  This, of course, entails a careful
review of the merits and priorities of proposed applications.  At the direction
of the Advisory Group, expenses related to communications and transportation to
allow specific users to visit the facility also may be covered.

USER QUALIFICATIONS

   The SUMEX-AIM facility is a community effort, not merely a machine service.
Applications for membership are judged on the basis of the following criteria:

1) The scientific interest and merit of the proposed research and its
  relevance to the health research missions of the NIH.

2) The congruence of research needs and goals to the AI functions of SUMEX-AIM
   as opposed to other computing alternatives.

3) The user's prospective contributions and role in the community, with
  respect to computer science, e.g., developing and sharing new systems or
  applications programs, sharing use of special hardware, etc.

4) The user's potential for substantive scientific cooperation with the
  community, e.g., to share expert knowledge in relevant scientific
  specialties.

5) The quantitative demands for specific elements of the SUMEX-AIM
   resource, taking account of both mean and ceiling requirements.

    In many respects, this requires a different kind of information for
judgment of proposals than that required for routine grant applications seeking
monetary funding support.  Information furnished by users also is indispensable
to the SUMEX staff in conducting their planning, reporting and operational
functions.


   The following questionnaire covers the main issues of concern to the
Advisory Group.  However, it should neither obstruct clear and imaginative
presentation nor restrict format of the application.  The potential user should
prepare a statement in his own words using previously published material or other
documents where applicable.  In this respect, the questionnaire may be most
useful as a checklist and reference for finding in other documentation the most
cogent replies to the questions raised.

   For users mounting complex and especially non-standard systems, the
decision to affiliate with SUMEX may entail a heavy investment that would be at
risk if the arrangement were suddenly terminated.  The Advisory Group endeavors
to follow a responsible and sensitive policy along these lines, which is one
reason for cautious deliberation; even in the harshest contingencies, it will
make every effort to facilitate the graceful entry and departure of qualified
users.  Conversely, it must have credible information about thoughtful plans for
long-term requirements, including eventual alternatives to SUMEX-AIM.  SUMEX-AIM
is a research resource, not an operational vehicle for health care.  Many programs
are expected to be investigated, developed and demonstrated on SUMEX-AIM with
spinoffs for practical implementation on other systems.  In some cases, the size,
scope and probable validation of clinical trials would preclude their being
undertaken on SUMEX-AIM as now constituted.  Please be as explicit as possible in
your plans for such outcomes.

Applicants, therefore, should submit:

1) One to two-page outline of the proposal.

2) Response to questionnaire, cross-referenced to supporting documents
  where applicable.

3) Supporting documents.

4) List of submitted materials, cross-referenced.

   We would welcome a draft (2 copies) of your submission for informal comment
if you so desire.  However, for formal consideration by the SUMEX-AIM Advisory
Group, please submit 13 copies of the material requested above in final form.

Elliott Levinthal, Ph.D.
AIM User Liaison
SUMEX-AIM Computer Project
c/o Department of Genetics, SO47
Stanford University Medical Center
Stanford, California 94305
Telephone:  (415) 497-5813

May, 1976

    SUMEX-AIM RESOURCE
QUESTIONNAIRE FOR POTENTIAL USERS

   Please provide either a brief reply to the following or cite supporting
documents.

A)  MEDICAL AND COMPUTER SCIENCE GOALS

1) Describe the proposed research to be undertaken on the SUMEX-AIM
   resource.

2) How is this research presently supported?  Please identify application
   and award statements in which the contingency of SUMEX-AIM availability
   is indicated.  What is the current status of any application for grant
   support of related research by any federal agency?  Please note if you
   have received notification of any disapproval or approval, pending
   funding, within the past three years.  Budgetary information should be
   furnished where it concerns operating costs and personnel for computing
   support.  Please furnish any contextual information concerning previous
   evaluation of your research plans by other scientific review groups.

3) What is the relevance of your research to the AI approach of SUMEX-AIM
  as opposed to other computing alternatives?

B) COLLABORATIVE COMMUNITY BUILDING

1) Will the programs designed in your research efforts have some possible
  general application to problems analogous to that research?

2) What application programs already publicly available can you use in
  your research?  Are these available on SUMEX-AIM or elsewhere?

3) What opportunities or difficulties do you anticipate with regard to
  making available your programs to other collaborators within a
  reasonable interval of publication of your work?

4) Are you interested in discussing with the SUMEX staff possible ways in
  which other artificial-intelligence research capabilities might
  interrelate with your work?

5) If approved as a user, would you advise us regarding collaborative
  opportunities similar to yours with other investigators in your field?

C) HARDWARE AND SOFTWARE REQUIREMENTS

1) What computer facilities are you now using in connection with your
  research or do you have available at your institution?  In what respect
  do these not meet your research requirements?


2)  What languages do you either use or wish to use? Will your research
  require the addition of major system programs or languages to the
  system?  Will you maintain them?  If you are committed to systems not
  now maintained at SUMEX, what effort would be required for conversion to
  and maintenance on the PDP-10 - TENEX system?  What are the merits of
  the alternative plan of converting your application programs to one of
  the already available standards?  Would the latter facilitate the
  objectives of Part B), Collaborative Community Building?

3) Can you estimate your requirements for CPU utilization and disk space?
  What time of day will your CPU utilization occur?  Would it be
  convenient or possible for you to use the system during off-peak
  periods?  Please indicate (as best you can) the basis for these
  estimates and the consequences of various levels of restriction or
   relaxation of access to different resources.  SUMEX-AIM's tangible
   resources can be measured in terms of:

a)  CPU cycles.

b) Connect time and communications.

c)  User terminals (in special cases these may be supported by SUMEX-AIM).

d) Disk space.

e)  Off-line media: printer outputs, tapes (at most, limited quantities
   to be mailed).

Can you estimate your requirements? With respect to a) and b), there
are loading problems during the daily cycle.  Can you indicate the
relative utility of prime-time (0900-1500 PST) vs. off-peak access?

4)  What are your communication plans (TYMNET, ARPANET, other)?  How will
  your communication and terminal costs be met? See following note
  concerning network connections to SUMEX-AIM.

5) If this is a development project, please indicate your long-term plans
  for software implementation in an applied context keeping in mind the
   research mission of SUMEX-AIM.

   Our procedures are still evolving, and we welcome your suggestions about
this framework for exchanging information.  Needless to say, each question should
be qualified a) "insofar as relevant to your proposal", and b) "to the extent of
available information".

   Please do not force a reply to a question that seems inappropriate. We
prefer that you label it as such so that it can be dealt with properly in future
dialogue.


   Above all, we are eager to work with potential users in any way that would
help minimize bureaucratic burdens and still permit a responsible regard for our
accountability both to the NIH and the public.  Please do not hesitate to address
the substance of these requirements in the format most applicable to you.

NETWORK CONNECTIONS TO SUMEX-AIM

   Attached is a list of available TYMNET nodes and associated telephone
numbers.  The cost to users of using TYMNET is the telephone charge from the
user's location to the nearest TYMNET node.  This arrangement applies only to
communication with SUMEX-AIM and not to other facilities that may be connected
to TYMNET.  In some cases, there are "foreign exchanges" set up by users.  These
may offer less expensive communication.  Details of these possibilities can best
be learned by calling the nearest TYMNET node.  The telephone company can provide
information on comparative costs of leased lines, toll charges, etc.  SUMEX-AIM
covers the initial capital investment for TYMNET installation as well as login
and hourly charges.  Standard usage charges on TYMNET are approximately
$j/connect-hour.

ARPANET

    SUMEX-AIM is connected to the ARPANET.  Our name is SUMEX-AIM; our nickname
is AIM.  We support the new TELNET protocol.  Our network address is decimal 56,
octal 70.  This provides convenient access for ARPANET Hosts and Associates and
those who have accounts with ARPANET.
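The two forms of the network address given above are the same number, since octal 70 = 7 x 8 + 0 = 56. The short illustrative check below uses a hypothetical helper, `octal_to_decimal`, to spell out the positional expansion. It then compares the result against Python's built-in octal formatting.

```python
def octal_to_decimal(digits):
    """Positional expansion: each octal digit is worth 8 times the digit to its right."""
    value = 0
    for d in digits:
        value = value * 8 + int(d, 8)
    return value


# SUMEX-AIM's ARPANET address as quoted in the text: decimal 56, octal 70.
print(octal_to_decimal("70"))   # 7*8 + 0
print(format(56, "o"))          # built-in octal rendering of decimal 56
```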
