Reprinted from FORMAL REPRESENTATION of HlJ!~~24N JUDGMEPJT,
Benjamin Kleinmuntz (ed.), 1968. Copyright @ 1968 by
John Wiley & Sons, Inc.

CHAPTER 7

  Joshua Lederberg
and Edward A. Feigenbaum
    Stanford University

    MECHANIZATION
OF INDUCTIVE INFERENCE IN
  ORGANIC CHEMISTRY +

INTRODUCTION

  A paradigm of the scientific method is the successive
alternation of reference to hypothesis and datum. We start
with a hypothesis, which moves us to select some aspect of
the real world for empirical enquiry. Sensory impressions
are translated by convention into data. More refined hypoth-
eses are then induced by a poorly understood process which
contains at least two elements: (1) the data somehow suggest
a hypothesis, and (2) deductive algorithms are applied to the
hypothesis to make logically necessary predictions; these are
then matched with the data in a search for contradictions. A
hypothesis is regarded as inductively proven (i.e.. we have
achieved a scientific discovery) when its predictions are
satisfied, and when we have the illusion of inductive exhaus-
tion that no other hypothesis will lead to equally concordant
predictions. It is rare for inductive exhaustionto be rigorously
justified.  Usually, the process is reiterated many times:
each refinement of hypothesis suggests the examination of new
data; each new datum leads to a discrimination among existing
hypotheses. or suggests another refinement.

  Our main contribution is, perhaps, the suggestion that
organic chemistry is an apt field for the mechanization of the
process of scientific induction. It can be circumscribed so
that the first studies are simplified without undue loss of

  *The research reported here was supported in part by the Advanced
Research Projects Agencyofthe Office ofthe SecretaryofDefense (SD-183 1,
and in part by the National Aeronautics and Space Administration (NsG
81-601.

187


188   Formal Representation of Human Judgment

utility or generality.  Real data of any desired level of com-
plexity can be adduced. Few natural sciences are so rich in
inductive analysis from information that can be presented in a
simple, uniform format. By contrast, genetics or embryology
are sciences that might invoke models of most of external
reality. Above all, the hypotheses of organic chemistry can be
abstractly represented, that is, as structure diagrams. These
lead, in turn, to an algebra for inductive exhaustion rarely
available in any other scientific field at the present time.

  As will be seen, our program (named DENDRAL) is
firmly rooted in this algebra which can generate an exhaustive
and irredundant list of hypotheses from initial contextualdata
(1,2,3,4). The problem of inductionis then reduced to efficient
selection from this prospective list. This is only feasible if
the experimental data are recurrently consulted to guide the
hypothesis-generator.  Unproductive branches of the genera-
tion tree are anticipated and avoided as soon as possible;
conversely, the data are used for heuristic reordering of the
priorities with which the hypotheses are brought up for ex-
amination. The program thus simulates a systematic idealiza-
tion more than it does the haphazard evocation of new concepts
in human intelligence.

   HEURISTIC DENDRAL: A SUMMARY FROM THE
STANDPOINT OF ARTIFICIAL INTELLIGENCE RESEARCH

  The intent of this section is to present in a succinct and
compact fashion information relevant to a general understand-
ing of what our program does and how it does it.

Motivation and Task Environment

  We have been interested in exploring processes of em-
pirical inquiry, particularly discovery processes involved in
searching a hypothesis space for hypotheses meaningful and
relevant to the explanations of real-world data. Some practical
considerations concerning the automation of routine scientific
endeavor in particular environments, using heuristic pro-
gramming techniques, also supplied some of the motivation.
We were interested also in exploring man-machine interaction
in the context of scientific problem solving, not only as an


Mechanization of Inductive Inference    189

augmentation to human problem solving processes ("smart
scratch paper") but also as a means for "educating" a suitably
receptive program, by making it easy for a human skilled in
the task area to impart to the program his heuristic search
rules and other information relevant to good performance in
the task.

  The task area chosen was the analysis of mass spectra
of organic molecules. The hypotheses relevant to explaining
mass spectral data in organic chemical analysis are essen-
tially graphs--molecular graphs consisting of atoms and
bonds, such as are seen in textbooks on organic chemistry,
  The main tasks presented to the program at present are:        4
   1. Given a chemical (compositional) formula, output a list
of the chemically most plausible isomers (structuralvariants)    -

of the composition, ordered from most plausible through least
plausible if that is requested.

  2. Given a mass spectrum and a composition, output a
list of the most plausible isomers of the composition in the
light of the spectral data given: Restated, generate a hypoth-
esis or list of hypotheses ,to best explain some given spectral
data.

Proceeeeo, Algorithmic and Heurietic

  The program that solves these problems is called
Heuristic Dendral.  It `is a LISP program of some 30-40
thousand words, developed on the SDC Q-32 time-sharing
system and presently operating on the PDP-6 at the Stanford
Artificial Intelligence Project. Though the program consists
of many functions, the most important activities can be
summarized as follows:

  1. At the most basic level, there is an algorithm, called
the Dendral Algorithm, rarely exercised without constraints.
Given a chemical composition,, it will generate all of the
topologically possible noncyclical connected graphs that canbe
made from the atoms of the composition, given the valences of
these atoms.  Associated with the Dendral Algorithm is a
notation for these graph structures, called Dendral notation.
Canonical forms of the graph structures in Dendral notation
exist and are used.  The Dendral Algorithm is a systematic


190   Formal Representation of Human Judgment

and exhaustive "topologist" and knows nothing about chemistry,
except the valences of atoms. But, using a chess analogy, it
is the "legal move generator," the ultimate guarantor of the
completeness of the hypothesis space.

  2. Heuristic processes control and limit the generation
process (i.e., prune the implicit generation tree). Taken
together, these heuristics constitute the program's "chemical
model."   This model includes: a list of denied embedded
subgraphs, the existence of any one of which in a strucutre
rules out that structure as a plausible hypothesis; a list of
well-known, stable, and generally highly significant radicals
which are treated in an aggregate fashion as "superatoms,"
or essentially higher level concepts; an evaluation function'
not dependent on spectral data that evalutes the potential
fruitfulness of attempting to generate structures from a col-
lection of as yet unassigned atoms (i.e., evaluates the worth
of pursuing a particular subproblem); a data matching process
that we sometimes call "the zero-order theory of the mass
spectrometer" that makes decisions about the relevance of the
subproblems based on the actual mass numbers present in the
given spectrum; a "rote memory,"called the Dictionary, which
is the memory of previously solved subproblems, correspond-
ing to the theorem memory in theorem-proving programs; and
a few other heuristic processes of lesser importance.

   3.  Learning processes in Heuristic Dendral are relatively
simple, and a high order of learning by the program itself
remains more of a goal than an accomplished fact. The main
"internal" learning process is the Dictionary building activity.
Learning on the subproblem evaluation function a la Samuel's
Checker Program is possible, but not implemented. "Ex-
trinsic" learning, in the sense of a human expert communicat-
ing to the program the elements of the chemical model, has
been extensively and successfully used. Perhaps this is high
level programming, but in the same sense that pedagogy in
general is high level programming activity.

ORGANIC CHEMISTRY THE PROBLEM-CONTEXT
             OF DENDRAL

  The fundamental problem of organic chemistry is the
topological structure of a molecule. This was first brought into


Mechanization of Inductive Inference    191

focus by the Swedish chemist, Jons Jakob Berzelius (1779-
1848) when he established the occurence of chemical isomers.
These are different organic molecules having the same
chemical composition or ensemble of atoms; hence they have
different structures (i.e., connectivities of the atoms with re-
spect to atom-to-atom bonds).  For one of the simplest ex-
amples, take C,H,O, which has the two isomers, dimethyl
ether and ethanol (Fig. 7.1). Todetermine that the composition
of a compound once obtained as a pure sample, say C2H60, is
essentially a mechanical process of quantitative analysis. To
assign it to one of the possible isomers is a much more
demanding intellectual exercise.

  H H
H-Lo-&-H
!l k

  HH
H-C-&-H
IilL

2 H3           ,CH,
\CHs                CHa
               `OH

(4

(b)

Fig. 7.1 Two isomers of CzHsO: (a) Dimethyl ether (O..CH3 CH3), (b)
Ethanol or ethyl alcohol (CHZ..CH3 OH).

Each of these may also be represented by isomorphic graphs. for ex-
ample, for ethanol: (CH3.CHI.OH) or (OH.CH2.CH3) or (CH2..OH CH3).
The previous notation is in canonical DENDRAL form, being initialized by
the center of the graph, followed byadot for each radical. and then a list of
radicals in order of an algorithmically defined value. For internal repre-
sentation or more compact coding, the H's can be dropped, leaving us with
O..CC and C..CO respectively. Where symmetries prevail, we can go one
step further, using the `/I as a ditto mark, as in (O./C) for (O..CC). This
economy is, of course, trivial here, but not so in more complex formulas.

  At this level of analysis, structure means connectivity,
not geometry.  In fact, with the help of X-ray diffraction
analysis, a great deal can be learned about the actual disposi-
tion in space of the atoms in a molecule in the crystalline state.
However, the molecules, especially in the liquid or gaseous
states, may be undergoing a variety of dynamic transitions-
linear, rotational, and rocking modes about every chemical
bond.  Chemical geometry is beyond the scope of the present


192   Formal Representation of Human Judgment

discussion, but what we know of it could be superimposed
upon the topological frameowrk developed below.

  The preceding paragraph canbe summarized:  a chemical
structure is represented by an undirected graph whose nodes
are atoms, whose edges are chemical bonds.   While the
analogy was recognized 100 years ago, this outlook has still
to penetrate the teaching of organic chemistry.

  In practical problem solving the chemist uses every
possible datum. For example, smell can help him decide be-
tween dimethyl ether and ethanol, if he didnot already recog-
nize that the ether would be much more volatile than its
isomeric alcohol.  He also has a repertoire of reagents that
can help to detect various fragments (called radicals) in the
molecule (e.g., -OH).  More recently a specialized instrument,
the mass spectrometer, has been developed which facilitates
a unified systematic attack on structural problems, Briefly,
a molecule is bombarded by an electron beam which sputters
off an electron, leaving a positively charged molecule-ion. A
fraction of these fragment, giving radical ions of various
sizes corresponding to different modes of cleavage, often com-
plicated by further rearrangements and reactions of fragments.
Finally, the ensemble of molecule- and radical-ions is re-
solved by careful acceleration through electrostatic and mag-
netic fields.

  The utility of the mass spectrometer and some examples
of the logical inference employed in exploiting it are reviewed
by McLafferty (1966).

  The mass spectrum is a paired list of mass numbers and
their relative intensities.  Mass spectrometers of very high
resolution have been built, capable of distinguishing between
radicals of different composition but the same integer atomic
weight.  For example, the radical -NH, M =15.0110 can be
distinguished from the radical -CH,, M =15.0215. This cap-
ability is especially useful for determining the formula of the .
intact molecule.  Unless we specify otherwise, however, we
have in mind the more ordinary low resolution mass spectrom-
eter which lumps together species having the same integral
mass. However, more precise data are readily accommodated
and avidly used by the program logic.

  The stated goal of our program is thenan inductive solu-
tion of the mass spectrum. That is, a molecular formula and
its mass spectrum are given as data, We must induce the


Mechanization of Inductive Inference    193

structure (hypothesis) that best satisfies the data. Our basic
approach to this has been first to furnish the computer with
a language in which chemical structure hypotheses can be
expressed, then to interrogate chemists and their literature
for the rules and techniques they have used in problem solving
and attempt to translate these into computer algorithms. In
the course of searching for these heuristics, we have in fact
discovered a number of algorithms which are much more
systematic than the approaches commonly used by chemists
in this field.

ISOMERS

  Underlying the solution of virtually every problem and
subproblem in structural organic chemistry is the potential
exhaustion of the list of possible isomers of a given molecule
or radical. It is remarkable that while hundreds of thousands
of students of elementary organic chemistry are challenged in
this way every year, no algorithmfor generating and verifying
complete lists of isomers has hitherto been presented. Each
student is left to work out his own intuitive approach to this
problem, which may account for the bafflement with which
very many students approach the subject upon their first ex-
posure to it.

  The core of DENDRAL is a notation for chemical struc-
tures and an algorithm capable of producing all distinct iso-
mers and casting each of theminto a canonical representation.
This will be outlined in more detail further on.

  The lowest level of DENDRAL might be called the topo-
logist.  This machine considers only the valence rules and
elementary graph theory in constructing lists of isomers. It
uses two elementary concepts, one, the center of a graph as a
point of departure, andtwo, arecursiveprocedurefor evaluat-
ing a radical as a way of specifyingthe canonical representa-
tion of a given molecule.  After the center of the map is fixed,
being either a bond or an atom of known valence, the radicals
pendant on the center must be listed in nondecreasing value.
The apical node of each radical is then regarded as a new
center and the process continues recursively.

  The same approach can be used to make a generator from
DENDRAL. From the formula or composition list, a bond or a


194   Formal Representation of Human Judgment

given species of atom is first taken as the central feature and
the remaining atoms partitioned in appropriate ways, and
these partitions assigned tentatively to the pendant radicals.
For each radical then successive allocations are made for the
apical node and then partitions are allocated to the pendant
subradicals, and so forth.

TABLE 7.1  Canons of Dendral Order* (Hierarchy of Vector Valuation
         in Decreaeing Order of Significance)

The DENDRAGVALUE of a Radical Consists of the Vector:

COUNT

Rings by number of rings t
Other atoms (except II)

COMPOSITION of radical

  Rings t by valuation of ring
   Compoeition, Vertex Group, Path List, Vertex List, Substituent Locations
  Other atome by atomic number (S, P, 0, N,C)

UNSATURATIONS (afferent link included; ring paths excluded)

APICAL NODE

Ring Value t
Degree :  number of efferent radicals
Composition: e.g. (S, P, 0, N, C)
Afferent Hnk: (i, :, .)

APPENDANT RADICALS if any

(nested vectors in canonical order): nil if current apex is terminal
Enantiomerism around apex (DL, D, L, unspecified) if applicable

*From J. Lederberg (1964), DENDRAL64, NASA CR-57029, Star No, N65-13158
t
Rings are not discussed in this paper.
  Each line above is a separate cell or subcell of the vector.

  Table 7.1 summarizes the order of allocation for evaluat-
ing a radical, and for generating the next structure, a pro-
cedure used recursively in the DENDRAL program in LISP.
At this time, the operational program is confined to acyclic
structures. However, the specifications have been detailed for
a complete system, including ring structures of arbitrary com-
plexity as well as consideration of optical isomerism (1,2).
Table 7.2 lists the computation of all the isomers generated by
the topologist for the formula CSH,N02, one of whose isomers


Mechanization of Inductive Inference    195

is the common amino acid, alanine. This exercise is already
at the very margin of human capability, barring the possible
rediscovery of this algorithm. In practice no intelligent human
has the patience to attempt to generate such a list by the in-
tuitive process. The chemist willoftenthendemand redundant
information at this point in order to narrow the range of pos-
sibilities he is obliged to consider before he will make the
effort to produce an exhaustive list.

  The topologist knows only the valence rules as quasi-
empirical data, that is, that four bonds must issue from each
carbon atom, three from any nitrogen, two from any oxygen,
and but one from hydrogen.  With this very limited quota of
chemical insight, the topologist produces many structures that
would be regarded as absurdities by the experiencedchemist.
for example the radical (.O.NH*OH) in no. 4of Table 7.2. The
next stage in the development of DENDRAL is then to impart a
certain amount of additional chemical information taken from
the real world. Indoingthisadefinite context is implied, even
if this is not immediately overt.  There are probably many
realms of organic chemistry, for example, at ultra low tem-
peratures, of which we have only limited experience.  The
implicit context we have in fact adopted is that of the natural
product, that is to say, molecular species that might be reason-
ably stable at ambient temperatures, and therefore stand some
chance of persisting or being isolated from natural sources.
However, this rule has been applied rather cautiously and the
lists that will be adduced for further illustration still contain
a number of items which would be regarded as quite dubious
by this criterion.

  The program is quite amenable to adjustment to any given
set of facts.  Indeed, a certain stage in the program can be
switched on to interrogate the chemist to help to find the con-
text in which various rules will be applied or not.  At this stage
chemical insight is given most explicitly by providing a list
of forbidden substructures. Whenever these substructures are
encountered during the building of a potential molecule, the
generator is adjusted to ignore that entire branchof synthetic
possibilities. In order to effectuate this use of a "BADLIST,"
a graph matching algorithm has been incorporated into the
DENDRAL program. At best, however, graph matching is an
expensive proposition and it soon became necessary to seek
ways of economizing on redundant computation.  The last


  ((U
   1.
   2.
   3.
   4.
   5.
!:
   8.
1::
2:
  13.
  14.
  15.
:;:
  18.
i::
21.
E:
24.
25.
26.
9;:
29.
30.
31.

TABLE 7.2 The Isomers of Ala-
nine, C3H7N02, without Chemical
      Common Sense*

>(ISOMERS *AlANINE)

>BADLIST
NIL

>GOODLIST
NIL

>SPECTRUM
NIL

> DICTLIST
NIL

>(ILLEGAL ATTACHMENTS)
(NIL NIL NIL)

C3H7N02
MOLECULES

1.1 (C . 3.1 (N . 1.1 (0 . 2.))
. C3H7   0.N = 0,
. CH..CH3 CH3   0.N = 0,
. CH2.CH = CH2   NH .O.OH,
. CH2.CH = CH2   O.NH.OH,
. CHZ.CH = CH2   O.O.NH2,

. CH2.CH = CH2   N..OH OH,
. CH = CH.CH3   NH .O .OH.

CH = CH.CHi  O.NH.OH;
CH = CH.CH3   O.O.NH2,
CH = CH.CH3   N..OH OH,
C.=CH3 CH2   NH.O.OH,
C.=CH3 CH2   O.NH.OH,
C.=CH3 CH2   O.O.NH2,
C.=CH3 CH2   N..OH OH,
CH2.CH2.NH2   O.CH = 0,
CH2 .CH2.NH2   *COOH,

CHZ.NH.CH3   O.CH = 0,
CH2.NH.CH3   *COOH a

NH .C2H5  O.CH = 0,'
NH.C2H5   `COOH,
CH..CH3 NH2   O.CH = 0,
CH. .CH3 NH2   %OOH,
N..CH3 CH3   O.CH = 0,
N..CH3 CH3   *COOH,
CHZ.CH = NH   CHZ.O.OH,
CHZ.CH = NH   O.CH2.0H,
CH2.CH = NH   O.O.CH3,

. CH2.CH = NH   CH..OH OH,
. CH = CH.NH2   CH2.0.0H.

. CH = CH.NH2  O.CHZ.OH;.
. CH = CH.NH2   O.O.CH3,

196


TABLE 7.2 (Continued)

32.
33.
34.
35.
36.
37.
38.
39.
40.
41.
42.
43.
44.
45.

CH = CH.NH2

CH2.N = CH2
CH2.N = CH2
CH5.N = CH2
CH2.N = CH2
CH = N.CH3
CH = N.CH3
CH = N.CH3
CH = N.CH3
NH.CH = CH2
NH.CH = CH2
NH.CH = CH2

NH.CH = CH2
N = CH .CH3
N = CH.CHj
N = CH .CH3
N = CH.CH3
C.=CH3 NH
C.=CH3 NH

47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
72.
73.
74.
75.
76.
77.
78.
79.
80.
:::
83.
84.

CH..OH OH,
CH2.0.0H.

O.CH2.0H;
O.O.CH3,
CH..OH OH,
CHZ.O.OH,
O.CH2.0H,
O.O.CH3,
CH..OH OH,
CHZ.O.OH,
O.CH2.0H,
O.O.CH3,
CH..OH OH,
CH2.0.0H,
O.CH2.0H,
O.O.CH3,
CH..OH OH,
CHZ.O.OH,
O.CH2.0H.

C.=CH3 NH  O.O.CH3, '
C.=CH3 NH   CH..OH OH,
C = .CH2 NH2   CHZ.O.OH,

C = .CH2 NH2  O.CH2.0H;
C = .CH2 NH2   O.O.CH3.

C = .CH2 NH2  CH..OH 6H,
CH2 .CH2 .OH   CH2.N = 0,
CH2.CH2.OH   CH = N.OH,
CH2.CH2.OH   NH .CH = 0,
CH2 .CH2 .OH   N = CH.OH,
CH2.CH2 .OH   O.CH = NH,
CH2.CH2.OH   0.N = CH2,
CH2.CH2.OH   *CONHZ,
CH2.CH2 .OH   C = .NH OH,
CH2 .O .CH3   CH2.N = 0,
CH2.D.CH3   CH = N.OH,
CH2 .O .CH3   NH.CH = 0,
CH2 .O .CH3   N=CH.OH,
CH2 .O .CH3   O.CH = NH,
CH2 .O .CH3   0.N = CH2,
CH2 .O .CH3   KONH3,
CH2 .O .CH3   C = .NH OH,
0 .C2H5   CH2.N = 0,
D .C2H5   CH = N.OH,
O.C2H5   NH .CH = 0,
0 .C2H5   N = CH.OH,
0 .C2H5   O.CH = NH,
0 .C2H5   0.N = CH2,
0 .C2H5   KONH2,
0 .C2H5   C = .NH OH,
CH. .CH3 OH   CH2.N = 0,
CH..CH3 OH   CH = N.OH,
CH. .CH3 OH   NH.CH = 0,
CH..CH3 OH   N = CH.OH,
    197


TABLE 7.2 (Continued)

85.
86.
87.
8
90:
91.
92.
93.
94.
95.
96.
2
99.
100.
101.
102.
103.
104.
105.
106.
107.
108.
109.
110.
111.
112.
113.
114.
115.
116.
117.
118.
119.
120.
121.
122.
123.
124.
125.
126.
127.
128.
129.
130.
131.
132.
133.
134.
135.
136.
137.

CH..CH3 OH   O.CH = NH,

CH. .CH3 OH
CH..CH3 OH



CH2.CH=O
CH..CH3 OH



CH2.CH =0
CH2 .CH = 0

CH2.CH = 0
CHZ.CH=O
CH2.CH =0
CH2.CH = 0
CH2 .CH = 0
CH = CH.OH
CH = CH.OH
CH = CH.OH
CH = CH.OH
CH = CH.OH
CH = CH.OH
CH = CH.OH
CH = CH .OH
O.CH=CH2
O.CH=CH2
O.CH=CH2
O.CH = CH2
O.CH=CH2
O.CH=CH2
O.CH=CH2
O.CH=CH2
C.=CH3 0
C. = CH3 0
C. = CH3 0
C. =CH3 0
C.=CH3 0
C. = CH3 0
C.=CH3 0
C.=CH3 0
C = .CH2 OH
C = .CH2 OH

C = .CH2 OH
C = .CH2 OH

C = .CH2 OH   O.CH2.NH2,
. C = .CH2 OH   O.NH.CH3,
. C= .CH2 OH   CH. .NH2 OH,
  C = .CH2 OH
- CH.C2H5     N..CH3 OH,
         N.O.OH,
= C..CH3 CH3   N.O.OH,
=  CH .CH2 .NH2   CH.O.OH,

=  CH .CH2 .NH2  C..OH Oi,
= CH.NH.CH3   CH.O.OH,
= CH.NH.CH3   C..OH OH,
=  N.C2H5   CH.O.OH,
= N.C2H5   C..OH OH,
=  C..CH3 NH2   CH.O.OH,
     198

C= .NH'OH,
CH2 .NH .OH,
0.N = CH2;



CH2.O.NH2,
"CONH2.

NH.CH2.0H,
NH.O.CH3.

O.CHZ.Nii,
O.NH.CH3,
CH. .NH2 OH,
N..CH3 OH.

CH2 .NH .0:-i;
CH2.O.NH2,
NH .CHZ.OH,
NH.O.CH3,
O.CH2.NH2,
0 .NH.CH3,
CH. .NH2 bH,
N..CHS OH.

CH2.NH.OH;
CH2.O.NH2,
NH.CH2.0H,
NH.O.CH3,
0 .CH2 .NH2,
O.NH.CH3,
CH. .NH2 OH,
N..CH3 OH,
CH2.NH.OH,
CH2.O.NH2,
NH.CHZ.OH,
NH.O.CH3,

O.CH2.NH2,
O.NH.CH3.

CH. .NH2~ 6H
N..CH3 OH,
CH2.NH.OH
CH2.O.NH2
NH.CH2.0H
NH.O.CH3,


TABLE 7.2 (Continued)

138.
139.
140.
141.
142.
143.
144.
145.
146.
147.
148.
149.
150.
151.
152.
153.
154.
155.
156.
157.
158.
159.
160.
161.
162.
163.
164.
165.
166.
167.
168.
169.
170.
171.
172.
173.
174.
175.
176.
177.
178.
179.
180.
181.
182.
183.
184.
185.
186.
187.
188.
189.
190.

= C..CH3 NH2   C..OH OH,
= CH.CH2.0H   CH.NH.OH.

= CH.CH2.0H  CHIO.NH2;
= CH .CH2 .OH   N.CH2.0H,
= CH .CH2 .OH   N.0 .CH3,
=  CH .CH2 .OH   C..NHZ OH,
= CH.O.CH3   CH.NH.OH,
= CH.O.CH3   CH.O.NH2,
= CH.O.CH3   N.CH2.0H,
= CH.O.CH3   N.O.CH3,
= CH.O.CH3   C..NHZ OH,
= C..CH3 OH   CH .NH.OH,
= C. .CH3 OH   CH.O.NH2,
= C..CH3 OH   N.CH2.0H,
= C..CH3 OH   N.O.CH3,
= C..CH3 OH   C..NHZ OH,
CH...  CH3   CH = NH   O.OH,
c.=.  CH3   CH .NH2   O-OH,
CH... CH3   N = CH2   O.OH,
c.=.  CH3   N.CH3   O.OH,
CH... CH3   CHZ.OH   N=O,
C..= CH3   CH2 .OH   N.OH,

CH... CH3  CH =O  NHSiH.
CH... CH3   CH = 0   O.NH2,

C.= . CH3   CH .OH   NH.OH,
C.= . CH3   CH .OH   O.NH2.

c= . .  CH2   CH2.NH2  o.oil,
c= . .  CH2   NH .CH3   O.OH,
c= . .  CH2   CH2 .OH   NH .OH,
c= . .  CH2   CH2 .OH   O.NH2,
c= . .  CH2   O.CH3   NH.OH,
c= . .  CH2   O.CH3   O.NH2,
CH...  NH2   CH = CH2   O.OH,
c.=.   NH2   CH .CH3   O.OH,
CH...  NH2   CH2 .OH   CH=O,

NH2   CH2 .OH   CH .OH,
NH2 O.CH3   CH=O.

c.. =
E".=
c Y
c= ::

c= . .
c= . .

NH2  0 .CH3  CH .OH',
NH   C2H5   O.OH,
NH   CH2 .OH   CHZ.OH,
NH   CH2 .OH   O.CH3,
:I?   0 .CH3   0 .CH3,
    C2H5   N=O,
OH   C2H5   N.OH,

CH.. .
c.. =

CH... OH   CH = CH2  NH.~H,
CH... OH   CH = CH2   O.NH2.
C.=. OH   CH .CH3  NH.OH,'
C.=. OH   CH.CH3   O.NH2,
CH... OH   CH2.NH2   CH = 0,
C..= OH   CH2.NH2   CH .OH,
CH... OH   NH .CH3   CH=O,
     199

O.CH3   N= 0,
0 .CH3   N.OH.


200

Formal Representation of Human Judgment

TABLE 7.2 (Continued)

191.
192.
193.
194.
195.
196.
197.
198.
199.
200.
201.
202.
203.
204.
205.
206.
207.
208.
209.
210.
211.
212.
213.
214.
215.
216.

.=

C.
CH
CH
C.
C.

. . .

OH
OH
OH
OH
OH

  . . .
  = .
  = .
z:::
c.=,
c.=.
c= . .
c= . .
c= . .

NH .CH3   CH.OH,
CH = NH   CH2.0H,
CH = NH   O.CH3,
CH .NH2   CHZ.OH,
CH.NH2   O.CH3,
N = CH2   CHZ.OH,
N = CH2   O.CH3,
N .CH3   CH2.0H,
N.CH3   O.CH3,
C2H5   NH.OH,
C2H5   O.NH2,
CH2.NH2   CHZ.OH,
CH2.NH2   O.CH3,
NH .CH3   CH2.0H,
NH .CH3   O.CH3,
CH = CH2   O.OH,
CH2.0H   CH=O,
O.CH3   CH = 0,
C2H5   CH = 0,
CH = CH2   CH2.0H,
CH = CH2   O.CH3,

0
0
0
CH3

c- . .
c= . .
c= . .
N . . .
N . . .
N . . .
N . . .
N . . .
N...

OH
OH
0
0
0

CH3
CH3
OH
OH
OH

C . . . .  CH3   CH3 OH  N=O,
C . . . .  CH3   NH2 OH   CH=O.

C.... CH3 OH OH  CH = Nil;
C.... CH3 OH OH   N = CH2,
C *...  NH2 OH OH   CH=CH2.

*This is a complete list of the topological possi-
bilities.  The restraints of BADLIST and of a
filtered DICTIONARY have been relaxed. Com-
pare with Table 7.4: the additional structures
here are chemically implausible for the standard
context of the intended use of DENDRAL. For
example, no structures are empirically known
which contain the radical (.O.NH.OH).

In these and following tables, the text is all
computer output except lines prefixed with >.
which are input from the teletype.

important feature merely exploits an idiosyncrasy of the DEN-
DRAL program that makes it easy to detect linear sequences
of nodes that might be on a list of illegal attachments, for
example, -N-N-N or -0-O.

Of far greater generality is the use of a dictionary of
solved subproblems. As soonas the programhas gone a short
way towards a solution of any practical problem, DENDRAL
would find itself constantly redoing the same subproblems over
and over again as it rebuilds radicals on one side of the


TABLE 7.3 Calling the Function
(PRINDICTER) Results in a Dump of
the Dictionary in Its Current State*

(PRINDICTER)
UOOCOlNOlOOl 1.   .CH2.NH.OH
2.    .CH2.O.NH2
3.    .NH.O.CH3
4.    .O.NH .CH3
5.    .N..CH3 OH

u01c02001 1.   .CH2.CH = 0
2.    = CH.CH2.0H
3.    = CH.O.CH3
4.    .O.CH = CH2
5.    .C. = CH3 0
6.    =C..CH3 OH

UOlCOlNOlOOl 1.    .CH = N.OH
2.    = CH.NH.OH
3.    = CH.O.NH2
4.    .NH.CH = 0
5.    = N.O.CH3
6.    .O.CH = NH
7.    .O.N = CH2
8.    .*CONH2

uooco2oo1        CH2.CH2.OH
2.  .CH2.;:CH3-
3.    .O.C2H5
4.    .CH..CH3 OH

uooco1oo2
(NO STRUCTURES)

u00c01001 1.    .CH2 .OH
2..   .O.CH3

UOlC02NOl 1.    .CH2.CH = NH
2.    .CHZ.N = CH2
3.    .CH = N.CH3
4.    = CH.CH2.NH2
5.    = CH.NH.CH3
6.    .N = CH.CH3
7.    = N.C2H5
8.    .C. = CH3 NH
9.    = C..CH3 NH2

UOlCOlNOl 1.    .CH = NH
2    = CH.NH2
      .N = CH2
4.    = N.CH3

u01c01002       .O .CH = 0
2.  .*COOH1 '

         201


TABLE 7.3 (Continued)

u01c01001       .CH = 0
2.  = CH.ki

UOOC02NOl       CH2.CH2.NH2
2.  .CH2.1Nii CHj
3.   .NH .C2tt;
4.    .CH. .CH3 NH2
5.    .N. .CH3 CH3

UOOCOlNOl       .CH2 .NH2
2.  .NH.CH;-

UOON01002
(NO STRUCTURES)

UOONOlOOl        .NH.OH
2.  .O.NH;'

UOONOl 1.   .NH2

UOlCO3 1.   .CH2.CH = CH2
5:    .CH = CH.CH3
     =CH.C2H5
4.    .c. = CH3 CH
5.    = C..CH3 CH3

UOlCO2 1.   .CH = CH2
2.    = CH.CH3

u01c01 1.   =CH2

UOlN01002 1.   .O.N = 0

UOOOD2
(NO STRUCTURES)

UOlNOlOOl 1.   .N = 0
2.   = N.OH

UOlNOl 1.   = NH

u00001 1.   .OH

uo1002
(NO STRUCTURES)

u01001 1. =o
iJooco3
2.  .CH?.CH3%~;

uooco2 1.   .C2H5
      202


Mechanization of Inductive Inference    203

TABLE 7.3 (Continued)

u00c01 1.   .CH3

DONE

o This example shows the dictionary
that was built ??? Table 7.4, and
contains the radicals needed to gen-
erate the molecules isomeric to
C3H7NO2. The headings encode the
compositions in the form UaaCbb
NccOdd where C, N, 0 have their
usual connotation of atoms, and U
stands for "unsaturations." This is
calculated as double-bond-equiva-
lents, or the number of pairs of H by
which the composition falls short of
a saturated, that is, double-bond-
free molecule.

molecules after reconstructing the other side,  In order to
avoid the waste involved in this redundancy, the program
automatically generates a list of compositions which is con-
sulted whenever a new radical is to be generated. If the com-
position of the new radical appears in the dictionary, the
dictionary contents are simply copied out. If not, the problem
is solved and a new dictionary item is entered for further use
later, Insofar as the dictionary has already been filtered with
respect to BADLIST, a great deal of effort can be saved, and
in fact the program would not be practical for molecules of
even moderate complexity were it not for this feature. As an
example, the dictionary that has been generated in the solution
of the alanine problem is given in Table 7.3, and the filtered
list of isomers is Table 7.4. It is also feasible and desirable
to give chemical insight into the program by overt manipulation
of the dictionary. That is to say, when a given context calls for
it, the radicals corresponding to a given composition can be
entered directly, usually with the aim of excluding certain
idiosyncratic items. This must be done with great care, since
the list of larger radicals that may be generated later relies
upon the dictionary already established for smaller radicals.

  A serious problem encountered in practice is managing
the trade-off between the growth of the dictionary and the cor-
responding loss of scratch space for the LISP program to


TABLE 7.4 The Isomers of Alanine, CsH?NOt,
     Restrrlned by Common Sense*

>(ISOMERS MIANINE)

> BADLIST

NC (1. (N 0)) (1. (N ON) UN 0) (1. C (1. (N 0))))
(C (3. C (1. (N 0) (1. ii)))) (C (3, C) (1. tN 0) (1. MM
ON 0) (1. HI (1. C (3. 0)) (C (2. C (1. (N 0) (1. H))))
(C (2. C) (1. (N 0) (1. HI)) UN 0) (1. tl) (1. C (2. C)))
(N (2. C (1. 0 (1. Ii)))) (C (2. N) (1. 0 (1. H)I) (0 (1.
(1. C (2. NM (0 (1. 0)) (0 (1. N (1. ON (N 11. 0)
(1. 011 K (1. HI (1. N (2. ON (N (1.' C (1. HI) (2. 0))
(*o* (1. 0 (1. HII) UN 0) (1. C (1. 0 (1. HII (2. 0))))

> GOODLIST
NIL

> SPECTRUM
NIL

> DICT LIST
NIL

>(ILLEGAL ATTACHMENTS)
(NIL ((N N N)) ((0) WHZOHW

CW7NO2
MOLECULES
NJ . 1.1 (C . 3.1 (N . 1.) (0 . 2.))

5:
3.
4.
ii:
2
9.
ii:
K
14.
ii:
17.
ii:
20.
Ii:
Z:
25.

.

.

C3H7   0.N = 0,
CH..CH3 CH3   0.N = 0,
CH2.CH2 .NH2   0 .CH =0,
CH2.CH2 .NH2   *COOH,
CHZ.NH.CH3  *COOH,
NH .C2H5   O.CH = 0,
CH. .CH3 NH2   SOOH,
N..CH3 CH3   O.CH = 0,
CHZ.CH2.OH   CH = N.OH,
CH2 .CH2 .OH   NH .CH = 0,
CHZ.CH2 .OH   O.CH = NH,
CH2 .CH2 .OH   0-N = CH2,
CH2 .CH2 .OH   `CONHZ,
CH2 .O.CH3   CH = N.OH,
CHZ.O.Ct-43   *CONHZ,
O.C2H5   CH = N.OH,
O.CZH5   NH.CH = 0,
CH..CH3 OH   CH = N.OH,
;t+;.g3 00"   `CONH2,
CH2:CH = 0   CHZ.NH.OH,
       CH2.O.NH2,
CHZ.CH = 0  NH.O.CH3,
CH2 .CH = 0   O.NH.CH3,
cyt& i   N..CH3 OH,
C        CH2 .NH.OH,
        204

tl)


   Mechanization of Inductive Inference

TABLE 7.4 (Continued)

26.
27.
28.
29.
30.
2
33.
34.
E:
37.
2
40.
41.
fS:
44.
45.
46.
47.
48.
49.
50.

. C.=CH3 0   CH2.O.NH2,
. C. =CH3 0  NH.O.CH3,
. C.=CH3 0   O.NH.CH3,
. C.=CH3 0   N..CH3 OH,
  CH.CH2.0H
l CH.CH2.0H  CH.O.NH2,
         N.O.CH3,
= CH.O.CH3   CH.O.NH2,
= CH.O.CH3   N.O.CH3,
C..= CH3   CH2 .OH   N.OH,
C  = CH3
Cl?... CH3   0 .CH3   N.OH,
          CH = 0   NH <OH,
CH... CH3   CH = 0   O.NH2,
C= . . CH2   CHZ.OH   O.NH2,
`34.;. NH2   CH2 .OH   CH=O,
C= . . NH   CH2 .OH   CH2 .OH,
C= . . NH   CH2.0H   0 .CH3,
CH... OH   CH2.NH2   CH=O,

205

CH... OH   CH = NH   CH2 .OH,
c=.. 0   C2H5   NH.OH.

           -
c=.. 0   C2H5  O.NH2;
c=.. 0   CH2 .NH2   CHZ.OH,
c=.. 0   CH2.NH2   O.CH3,
c=.. 0   NH.CH3   CHO.OH,
N . . .  CH3   O.CH3   CH=O,
N . . .  OH   C2H5   CH = 0,

*This restraint is implemented by systematic
graph-matching against a BADLIST which
contains the worst monstrosities of frag-
ments,  as indicated in the dialogue that
precedes the output table.

maneuver in.  If left unchecked the dictionary building can
easily reach the point of exhausting available computing room
and paralyzing the program. A heuristic management of the
dictionary would be a close analog to the human solution to this
problem and is being studied at the present time.  For example,
very large dictionaries could be stored on external memories,
and only those segments kept in core that are needed for the
current operations of the program.

   These facilities have been built into the DENDRAL gen-
erator program in such a way as to leave it in a state of high
_ efficiency. Thus the filters are not applied at the end after the
production of a larger redundant list, they are applied at the
earliest possible stage in the tree building program. When
C3H,N0, is examined by this filtered DENDRAL generator the
results of Table 7.4 are obtained. Eachof these is a moderately


206   Formal Representation of Human Judgment

plausible chemical isomer.  No. 7 is the actual structure of
alanine.  The order of output is the canonical DENDRAL
sequence.

  It may be of some interest that three of the structures in
Table 7.4 have apparently not yet been reported in the chemical
literature, although they would appear to be reasonable candi-
dates for synthesis by a chemistry graduate student. With even
slightly more complex molecules, one should expect to find that
only a small minority of the potential structural species are
in fact already known to chemical science. Without an algo-
rithmic generator, however, it has not hitherto been possible
to make any realistic estimates of the extent of empirical
coverage of the theoretical expectations.

  It should be perfectly obvious that again with a small
increase in complexity the number of possible isomers will
grow very quickly and one may have to rely upon a heuristic
rather than an exhaustive approach to the generation of hypoth-
eses apt to a given set of data. In particular it might be
desirable to use some a priori notions of plausibility in the
generator and then to seek ways of adjusting the program so
that the parameters for plausibility sequences were already
sensitive to qualities in the data themselves. One approach to
this uses GOODLIST, an ordered list of preferred substruc-
tures. That is to say, we would assign the highest plausibility
and therefore priority for deductive corroboration of those
molecules which contain items in GOODLIST. Inorder to ac-
complish this each GOODLIST item is regarded as a "super
atom" of appropriate valence, and the corresponding subset of
atoms from the compositional formula is allocated to the super
atom.  Thus the very common radical -COOH, the carboxyl
radical, is a very common ensemble of a double bond, a car-
bon atom, and two oxygen atoms, (C.:OH 0). Insofar-as the
molecular formula permits, various numbers of these sets of
atoms are assigned to carboxyl groups, and the construct
-COOH is then regarded as if it were a univalent superatom.

  Certain housekeeping details must be looked after to be
sure of avoiding redundant representations and to reconvert
the constructions to canonical form. They will, however, no
longer be in canonical sequence, but rather have some implicit
order of plausibility in the sequence with which they are put
out. When alanine is subjected to such a procedure, the order-
ing of Table 7.5 is obtained. It will be noted that alanine is a
very early entry in this table.


TABLE 7.5 The Isomers of Alanine, as in Table 7.4, but
   Resequenced by the Application ofGOODLIST*

>(SETQ GOODLIST SAVEGOODLIST)
WCOOH* (1. C (1. 0) (2. 0)) 100 . 01 (*CO* (2. C (2. 0))

iO0. 0.) -(*CHNH2* (2. C (1. N)) 100. 0.) FCH20H"
(1. C (1. 0)) 100. 0.1 (WOH* (2. N (1. 0)) 100. 0.)
;;;N"o*))(l. C (2. N)) 100. 0.1 FNCH2* (1. N (2 C))
>clSOn;rEdS *ALANINE)
MOLECULES
((U .  0.) (C .  1.1 (*COOH* . 1.) (%HNH2* . 1 .)I
  1.    . CH2.CH2.NH2   %OOH,
  2. .  CH. .CH3 NH2   xCOOH,

MOLECULES
KU .  0.) (C . 2.) (N .  1.) (XCOOH" . 1.))
  1. .  CH2.NH.CHS   *COOH,

MOLECULES
uu .  0.) wo* .  1.) (*CHNH2* .  1.) (*CH20H* . 1.))

1.    c=.. 0   CH2.NH2   CH2.0H,
2.    CH... NH2   CH2 .OH   CH = 0,

MOLECULES
KU .  0.1 (C . 1.) (0 . 1.) (*co* . 1.1
1.    c=.. 0   CH2.NH2   0 .CH3,
       CH2.CH2.NH2
3: CH... OH      O.CH=O,
              CH2.NH2   CH = 0,

MOLECULES
((IJ . 0.) CC . 1.) (N . 1.) (*CO* . 1.)
1.    c=.. 0   NH.CH3   CH2.0H,
2. .  CH2 .CH2 .OH   NH .CH = 0,
3. .  CH2.CH2.OH   *CONH2,

MOLECULES
((U I  0.) (C . 2.) (*CO* . 1.) (*NOH* .
     c= 0
:: N..:' OH   C2H5   NH .OH,
              C2H   CH = 0,
3. .   CH2.CH = 0   CH2.NH.OH,
4. .  CH2 .CH = 0   N..CH3 OH,
5. .  C. = CH3 0   CH2.NH.OH,
6.      C.=CH3 0
7.   CH...        N..CH3 OH,
          CH3   CH = 0   NH.OH,

MOLECULES

(KHNH2"

("CH20H*

1 .))

((IJ . 0.) (`2 . 2.1 (N . 1.) (0 . 1.) ('CO* , 1.1)
:: :  CH2.O.CH3   "CONH2,
        CH2.CH = 0   CH2.O.NH2,
3. .  C. = CH3 0   CH2 .O .NH2,
4. .  NH .C2H5   O.CH = 0,
5. .  CH2 .CH = 0   NH.O.CH3,
6. .  C. = CH3 0   NH.O.CH3,
7. . O.C2H5   NH.CH = 0,
              207

1.))







1 .)I


TABLE 7.5 (Continued)

8.    . CHZ.CH = 0   O.NH.CH3,
9.      C.=CH3 0
   c= 0     0 .NH.CH3,
10.            C2H5
    iii. .CH3 OH    O.NH2,
11.
12. CH... CH3     *CONH2,
               CH = 0   O.NH2,
13.      N..CH3 CH3   O.CH = 0,
14.   Ii... CH3 O.CH3   CH =0,
MOLECULES
uu .  1.1 (%HNH2* . 1.) KH20H* . 2.1)
XNO ALLOWABLE STRUCTURES

MOLECULES
((U .  1.) cc . 1.1 (0 . 1 .I (eHNH2* . 1 .I VCH20H* . 1.))

MOLECULES
((U .  1.1 (C . 2.) (0 . 2.) (*CHNH2* . 1.1)
*NO ALLOWABLE STRUCTURES

MOLECULES
(NJ . 0.) FCH20H* . 2.1 FCHNH* . 1.1)
*NO ALLOWABLE STRUCTURES

MOLECULES
((U . 0.) PCH20H* . 2 .I PNCH2*. 1.))
`?dO ALLOWABLE STRUCTURES

MOLECULES
KU .  1.1 (C . 1.1 (N . 1.1 (KH20H* . 2.))
1.    C=.. NH   CH2.0H   CH2 .OH,

MOLECULES
((U .  1.1 (C .  2.1 (XCH20H" . 1.1 (*NOH* . 1.1)
1.      CH2.CH2.OH   CH = N.OH,
2.   i..= CH3   CH2 .OH   N.OH,

MOLECULES
au . 0.) (C . 1.1 (0 . 1.1 (`CH20H* . 1.1 (`*CHNH* . 1.))
1.      CH2.CH2.OH
2. CH... OH     O.CH = NH,
              CH = NH   CHZ.OH,

MOLECULES
((U . 0.) (c . 1.) (0 . 1.1 (*CH20H* . 1.) FNCH2* . 1.1)
1. .  CH2.CH2.OH   0.N = CH2,

MOLECULES
((U , 1.) (C . 2.) (N . 1.) (0 . 1.1 (`CH20H* . 1.))
::     =  CH .CH2 .OH   CH .O.NH2,
     = CH.CHZ.OH   N.0 .CH3,
3.    C= . . CH2   CH2.0H   0 .NH2,
4.    C = .`. NH   CH2 .OH   O.CH3,

MOLECULES
((U . 1.) (`i-l . 3.) (0 . 1.) (*NOH* . 1.))
              208


Mechanization of Inductive Inference    209

TABLE 7.5 (Continued)

1. .  CH2.O.CH3   CH = N.OH,
2. , O.C2H5   CH = N.OH,
3.      CH..CH3 OH   CH = N.OH,
4.   i..= CH3   O.CH3   N.OH,

MOLECULES
((U .  0.) (C .  2.1 (%HNH* . 1.))
kNO ALLOWABLE STRUCTURES

MOLECULES
((U .  0.) cc . 2.1 (0   2.1 (*NCHZ* . 1.))
j;NO ALLOWABLE STRUCTURES

MOLECULES
KU .  1.1 (C   3.1 (N . 1.1 (0 . 2.1)
1. . CjH7   0.N = 0,
2.      CH. .CH3 CH3   0.N = 0,
3.   i CH.O.CH3   CH.O.NHZ,
4.    = CH.O.CH3   N.O.CH3,
*Substructures defined in GOODLIST are pre-
vented from reappearing except under the cor-
responding superatom. Thus the final block of
four molecules is the group containing none
of the defined superatoms: *COOH*, *CO*,
*CHNH2*, *CH20H*, *NOH*, *CHNH*, *NCHP*.
In many applications, the count of a given super-
atom will be set to zero for a particular con-
text, or conversely, to non-zero. For example,
the superatom o NCH2* is quite likely to be
suppressed if the chemist knows that form-
aldehyde was not used in the synthesis of the
molecule being analysed.

  These computations are brought to the surface here only
in order to reveal the heuristic revision of priorities that is
available to DENDRAL.  In actual problem solving, many of
these hypotheses would be rejected long before a trial molecule
was completed.

REFERENCE TO DATA

  With these facilities we are now readytoattempt to apply
DENDRAL to explicit data. The actual processes in the mass
spectrometer are too complicated to be deaIt with head-on in
the first instance. We therefore deal with various models of


210   Formal Representation of Human Judgment

the behavior of the mass spectrometer, the theories of mass
spectrometry.  To exercise the simpler logical elements of
heuristic DENDRAL, we begin with a zero order theory, one
which postulates that the mass spectrum is obtained by assign-
ing a uniform intensity to each fragment that can be secured
by breaking just one bond in the molecule. We neglect the
splitting of bonds affecting only a hydrogen atom. To test the
program we do not at first use a real spectrum, but rather the
spectrum predicted by this idealized theory for some given
isomer.

  As before, the predicter is deeply embedded within the
DENDRAL generator, so that the structure building tree is
truncated at the earliest point that a violationof the theory by
the data set is encountered.  This leads to a very efficient
set of trials, not of completed, but of tentative and partial
structures when the program is given a molecular composition
and a hypothetical zero-order spectrum. The essence of the
program is to generate all of the partitions at a given level.
and then to scan these for compatibility with the mass list of
the fragments.  There are also some pertinent apriori con-
siderations about the partitioning of molecular compositions,
and this has been used to reorder the primary partitions in
the most plausible sequence.  We manage the sequence with
which hypotheses are tested but still retainthe exhaustive and
irredundant character of the generator. Owing to imperfect
memory and nonstandard formats, human judgment rarely
succeeds so well at this.

  Each of the plausibility operations plainly should and can
be related to a statement of context. For example, in setting
up the GOODLIST, the chemist will be interrogated about the
likelihood of certain radicals, and cues for this can also be
obtained directly from the mass spectrum, For example, the
program is aware that mass number 45 is almost pathognomic
for the radical -COOH. Hence, this superatom will be set to
zero in the absence of a signal at that mass. Conversely, in a
high-resolution analysis, the occurrence of mass number
44.998 would justify fixing -COOH as nonzero.

PERFORMANCE

The description, so far,  characterizes an operational


Mechanization of Inductive Inference    211

program.  Its main features can be routinely demonstrated
without special preparation by remote teletypewriter inter-
actions with the PDP-6 computer at Stanford University.
DENDRAL has been tested in a number of ways in an attempt
to evaluate its performance as a working tool. It will, of
course, vastly outdo the human chemist in such contrived but
potentially useful exercises as making an exhaustive and
irredundant list of isomers of a given formula (Table 7.4
shows this for C,H,NOz). In many cases, particularly when an
adequate dictionary has been previously built and no further
entries are being made? the computer will output its solutions
at teletype speed. The program is also slightly faster than the
human operator at subgraph-matching, that is, searching a
series of molecular structures for the presence of any member
of a given list of forbidden embedded subgraphs. It will outdo
the human by approximately 100 : 1, or perhaps better if ac-
curacy is given due weight in converting structural representa-
tions into canonical form and testing for isomorphism.

  A few real spectra have been input, with surprisingly
crisp results in view of the known imperfections of the zero
order theory of mass spectrometry.

  Thus heuristic DENDRAL was run with data on threonine
obtained with a Bendix time-of-flight instrument (Fig. 7.2).
The program returned two solutions, threonine, the correct
structure, and one other (Fig. 7.3). The second isomer has
not, to our knowledge, been analyzed by mass spectrometry.
However, its spectrum can be predicted to resemble that of
threonine very closely in its qualitative features.

  When Dendral was challenged with C,H,NO, under the
conditions of Table 7.4 it returned 238 "plausible isomers,"
of which only these two satify the data according to the pro-
gram's model of the theory of mass spectrometry. The in-
clusion of the data shortens the computationtime from about 30
minutes to about 3 minutes.

  It is not easy to test the exhaustiveness of the DENDRAL
generator without extensive files of known structures. How-
ever, it is possible to write recursive combinatorial expres-
sions to count the expected numbers of isomeric alkane mole-
cules (C.H,,+.J and alkyl radicals (-CnHan+J as shown in Table
7.6.  These numbers have been verified by DENDRAL for
radicals through C9H,, and for molecules through &HZ8, after
which the LISP program structure becomes too unwieldy to


90



80



70 1

60-
8,-
5    _
c
$40 -


.30 -



20 -

    ----_.

10 -



  0   IL

--




_1

0        20

--


?"1

----
d

.---L--B----

Fig. 7.2 The mass spectrum of threonine (CH.. .NH2 CH. .CH3OH C.
-OH 0) presented as a bar diagramfromMartin(1965). This yielded a list
of mass numbers: (15 16 17 18 29 30 43 45 56 57 74 75 119). This list was
input to DENDRAL. which responded to (ISOMERS C4H9N03) with two
solutions: 1. CH. .NH2 CH. .CH3 OH C. = OH 0    threonine
       2. c.... CH3 OH CH2.NH2 C. =OH 0   See Fig. 5.3.

(4                 (b)

Fig. 7.3 (a) Threonine, (CH...NH2 CH..CHJ OH C. =OH 0). and (b) 2-
Methyl, P-hydroxy, 3-aminopropionic acid. (C....CH3 OH CH2.NH2 C.
=OH 0).

212


Mechanization of Inductive Inference    213

TABLE 7.6  Counting the Isomeric Alkanes and Alkyl Radicals:
         C.H2,+2and GHz~+~*

c     Alkane     Alkyl         C     Alkane     Alkyl

1          1          1         11        159       1238

2          1         1         12        355       3057

3         1         2        13        802       7639

4          2          4         14       1858      19241
5          3         8         15       4347      48865

6          5         17        16      10359      124906

7          9         39        17       24894     321198

8         18         89        18      605"3 I<     832019
9         35        211         19     148284     2156010
10         75        507         20     366319     5622109


*The figures were generated by a computer program folIowing the
algorithm of Henze and Blair (1931).  Cayley's historic algorithm is
incorrect, buut is still quoted by a recent monograph on applications
of graph theory (1965).

continue in core memory. Since there are no chemical pro-
hibitions, the list (Table 7.7) of 75 isomeric decanes may illus-
trate the systematic combinatorial aspects of DENDRAL more
vividly to a human reader than the preceding outputs do.  Iso-
mers are, of course, vastly more numerous for compositions
containing some N and 0 atoms.

  Facilities have been provided in the past, but are not
available on our present computer system owing to hardware
limitations, for providing two-dimensional graphic displays of
structural maps as translations of DENDRAL notations. These
programs also enabled man-computer interactions where the
chemist could manipulate chemical structures to a substantial
degree.

  Where DENDRAL begins to be shaky is, as usual, when
confronted with subtle changes of context which the user may
often find difficult to communicate precisely to the program,
even when he can do this readily to his fellow scientists. As
far as possible we seek to get out of this difficulty by building
interrogation subroutines into the program so that the chemis!
can provide data rather than obliging him to write new program
text in the LISP language. Present efforts are concentrated on
elaborating the theory of mass spectrometry as represented in


TABLE 7.7  The 75Isomers of Decane, CJL

(ISOMERS ClOH22)

MOLECULES

((U
1.
C:
4.
5.
6.
2
9.
10.
11.
:;:
14.
:::
17.
18.
:`o:
21.
22.
23.
24.
25.
26.
27.
28.
29 .
30.
31.
E:
34.
35.
36.
37.
2
40.
41.
42.
43.
44.
4465.
47:
4%.
49.

0.) (C . 10.))
  .  CH2 .CH2 .C3H7   CH2.CH2.C3H7,
  .  CH2 .CH2 .C3H7   CH2 .CH2 .CH. .CH3 CH3,
  .  CH2 .CH2 .C3Hf   CHZ.CH. .CH3 C2H5,
  .  CH2 .CH2 .C3H7   CH2.C...CH3 CH3 CH3,
  .  CH2 .CH2 .C3H7   CH. .CH3 C3H7,
. CH2.CH2.C3H7   CH. .CH3 CH, .CH3 CH3,
  .  CH2 .CH2 .C3H7   CH. .C2H5 C2H5,
  .  CH2 .CH2 .C3H7   C...CH3 CH3 C2H5,
. CH2.CH2.CH. .CH3 CH3   CH2.CH2.CH. .CH3 CH3,
  .  CH2 .CH2 .CH. .CH3 CH3   CHZ.CH ..CH3 C2H5,
. CH2.CH2.CH..CH3 CH3   CH2.C.. .CH3 CH3 CH3,

CH2.CH2.CH..CH3 CH3   CH. .CH3 C3H7,
CH2.CH2.CH..CH3 CH3   CH. .CH3 CH. .CH3 CH3.

CH2 .CH2 .CH 1 .CH3 CHi  CH ..C2ii5 C2i5,     '

CH2.CH2.CH..CH3 CH3   C. ..CH3 CH3 C2H5,
CH2.CH. .CH3 C2H5   CH2 .CH. .CH3 C2H5,
CH2 .CH. .CH3 C2H5   CH2.C.. .CH3 CH3 CH3,
CH2.CH..CH3 C2H5   CH. .CH3 C3H7,
CH2.CH..CH3 C2H5   CH. .CH3 CH. .CH3 CH3,
CH2.CH. .CH3 C2H5   CH. .C2H5 C2H5,
CH2.CH. .CH3 C2H5   C...CH3 CH3 C2H5,
CH2.C...CH3 CH3 CH3   CH2.C. ..CH3 CH3 CH3,

. CH2.C...CH3 CH3 CH3   CH. .CH3 C3H7,
. CH2.C...CH3 CH3 CH3   CH..CH3 CH..CH3 CH3.

. CH2.C...CH3 CH3 CHi                      '
                CH. .C2H5 C2H5,
. CH2.C...CH3 CH3 CH3   C. ..CH3 CH3 C2H5,
. CH. .CH3 C3H7   CH. .CH3 C3H7,
. CH. .CH3 C3H7   CH. .CH3 CH. .CH3 CH3,

. CH..CH3 C3H7   CH. .C2H5 C2H5,
. CH. .CH3 C3H7   C. ..CH3 CH3 C2H5.

. CH..CH3 CH..CH3 CH3  CH. .CH3 Cti. .CH3 CH3,
. CH..CH3 CH..CH3 CH3   CH, .C2H5 C2H5,
. CH..CH3 CH..CH3 CH3   C. ..CH3 CH3 C2H5,
. CH..C2H5 C2H5   CH. .C2H5 C2H5,
. CH. .C2H5 C2H5   C. ..CH3 CH3 C2H5,
. C...CH3 CH3 C2H5   C...CH3 CH3 C2H5.

CH...
E:::
CH...
CH...
CH...
CH...
2:::
CH...
:i:::
CH...

CH3
CH3
CH3
CH3
CH3
CH3
CH3
CH3
CH3
CH3
C2H5
C2H5
C2H5

CH2 .C3H7  CH2.C3H7, '
CH2.C3H7   CH2.CH. .CH3 CH3,
CH2 .C3H7   CH. .CH3 C2H5,
CH2 .C3H7   C.. .CH3 CH3 CH3,
CH2.W. .CH3 CH3   CHZ.CH. .CH3 CH3,
CHZ.CH. .CH3 CH3   CH. .CH3 C2H5,
CH2.CH..CH3 CH3   C.. .CH3 CH3 CH3,
CH. .CH3 C2H5   CH. .CH3 C2H5,
CH. .CH3 C2H5   C...CH3 CH3 CH3,
C...CH3 CH3 CH3   C.. .CH3 CH3 CH3,
C3H7   CH2 .C3H7,
C3H7   CH2 .CH. .CH3 CH3,
C3H7   CH. .CH3 C2H5,

  214


           Mechanization of Inductive Inference    215

TABLE 7.7 (Continued)

50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
72.
2:
75.

CH...
CH...
CH...
CH...
CH...
CH...
CH . _ .
CH...
CH...
C....

C2H5   C3H7   C. ..CH3 CH3 CH3,
C2H5   CH. .CH3 CH3   CH2 .C3H7,
C2H5   CH. .CH3 CH3   CHZ.CH. .CH3 CH3,
C2H5   CH. .CH3 CH3   CH. .CH3 C2H5,
C2H5   CH. .CH3 CH3   C. ..CH3 CH3 CH3,
C3H7   C3H7   C3H7,
C3H7   C3H7   CH. .CH3 CH3,
C3H7   CH. .CH3 CH3   CH. .CH3 CH3,
CH. .CH3 CH3   CH. .CH3 CH3   CH. .CH3 CH3,
CH3

C.... CH3
    CH3
::::' CH3
C...: CH3
C.... CH3
C . . . .  CH3
C . . . .  CH3
C.... CH3
C . . . .  CH3
     CH3
k::: CH3
C . . . .  CH3
C . . . .  CH3
k:::  CH3
     C2H5
C . . . .  C2H5

CH3









C2H5
CH3
CH3
CH3









C2H5
CH3
CH3
CH3
CH3
C2H5

C3H7









C2H5  CH2.C3H7,
C3H7









      CHZ.CH. .CH3 CH3,
    CH2.CH. .CH3 CH3,
C3H7









C2H5  CH. .CH3 C2H5,









       CH..CH3 C2H5.
C3H7   C.. .CH3 CH3 CH3,
CH..CH3 CH3   CH2.C3H7,
CH..CH3 CH3   CH2.CH. .CH3 CH3,
CH..CH3 CH3   CH _ .CH3 C2H5,
CH..CHS CH3   C...CH3 CH3 CH3,
C2H5   CH2.C3H7,

C2H5   C2H5  C...CHi CH3 dH3,
C2H5   C3H7   C3H7,
C2H5   C3H7   CH _ .CH3 CH3,
C2H5   CH. .CH3 CH3   CH. .CH3 CH3,
C2H5   C2H5   C3H7,
C2H5   C2H5   CH. .CH3 CH3,

the predictor subprogram. This is giving verypromising re-
sults, the chief limitations being (1) the precise definition of
the rules actually used by the chemist and operant in nature,
and (2) the translation of these conceptual algorithms into
viable program.  These two issues are. however. not as in-
dependent as might be imagined. It is the clumsiness of the
program writing and debugging that impedes rapid testing of
the correctness with which a rule has been formulated.  In our
experience each half hour of conference has generated approxi-
mately a man-month of programming effort. It is obvious that
despite the simplicity of the DENDRAL notation for chemical
structures, we still have a long way to go in the development
of a language for the simple expression of other conceptual
constructs of organic chemistry, particularly context defini-
tions and reaction mechanisms.

  Insofar as programs are also graphs and an effective sub-
routine may be regarded as a hypothesis that matches its in-
tended functions, the latter being both logically deducible and


216   Formal Representation of Human Judgment

operationally testable by running the subroutine, program
writing may be regarded as an inductive process roughly
analogous to the induction of structural formulas as solutions
to sets of chemical data. We believe it may be necessary to
produce a solution to this meta language puzzle before the
implementation of human ideas in computer subroutines can
proceed efficiently enough for the rapid and effective transfer
of human insights into machine judgment. Nevertheless. by the
rather laborious process that we have outlined, DENDRAL has
proceeded to that stage of sophistication where it is at least
no longer an occasion of embarrassment to demonstrate it to
our scientific colleagues and friends who have no interest
whatsoever in computers per se.

  The deferral of cyclic structures will weaken the casuistic
impact of the program upon chemists. However, the acyclic
molecules give sufficient play for analyzing the inductive pro-
cess.  Furthermore, it may be advantageous to leave a blemish
that diminishes the latent threat of artificial intelligence to
human aspirations. However, a complete notation and specifi-
cations for cyclic DENDRAL have been documented (Leder-
berg, 1965) and this is being programmed now in response to
the utilitarian demands of chemist friends.

BUILDING DE NDRAL

  DENDRAL was developed in the LISP 1.5 and 1.6 dialects.
The original package was composed by Mr. William White
working from the' specifications summarized in Table 7,1, and
a version of DENDRAL which almost worked was generated
on the IBM 7090 with the helpof a time-shared editing system
run on the PDP- 1. When the LISP system on System Develop-
ment Corporation's Q-32 became available to us, we pursued
a vigorous programming effort by remote teletype communica-
tion from Stanford to Santa Monica. This proved to be a very
powerful and remarkably reliable system, and Mr. White and
Mrs. Georgia Sutherland perfected the program (Sutherland,
1967) on that computer with a total effort of about one year.

  In retrospect it is quite obvious that the program simply
could never have been written and debugged without the help
of the rapid interaction provided by the time-sharing system.
We stress ylezx?l' advisedly, in the light of our own experience


Mechanization of Inductive Inference    217

with the human frustrations involved in the typicalturnaround
times for error detection and error correction under the
operating system for the IBM 7090. In November 1966 we
moved our operations to LISP 1.5 on the PDP-6 computer in-
stalled for the Artificial Intelligence Project at Stanford.
Despite the avowed close compatibility of the LISP systems,
approximately 3 man-months of effort were required to trans-
fer the program from one dialect to the other.

PATTERN RECOGNITION

   As the candidate structures become more and more com-
plex we have to abandon the idea of exhaustive enumeration of
possible structures.  Instead, the data are scrutinized for cues
that offer any preference for certain kinds of structures as
starting points.  As we keep examining the problem we do find
more and more ways in which such cues can be exploited.  For
example, an elementary pattern analysis of the period with
which mass numbers are represented, for example, for gaps
in the sequence of mass numbers with significant intensity
around a period of about 14 mass units (CH,), can give signifi-
cant hints about the existence of a number of branch points
within the molecule. If these can be limited, the extent of the
necessary tree building can be drastically curtailed from first
principles.  Likewise, an examination of mass numbers ap-
proximating half the total molecular weight can lead to some
trial hypotheses about the major partition of the molecule,
which again can truncate the development. We do not, how-
ever, yet have a program sophisticated enough to make a pro-
found reexamination of its own strategy at any level more
complicated than the resetting of numerical parameters, a
limitation closely related to the meta language challenge
mentioned above. In sum, we find that the development of this
program has not encountered very much that is fundamentally
new in principle: problem solving in this field has much the
same flavor as the solutions already adduced for chess,
checkers, theorem proving, etc.  One possible advantage of
pursuing investigations in artificial intelligence and heuristic
programming within this framework is that the practical
utility of what has already been produced should suffice to en-
gage the attention of a considerable number of human chemists


218   Formal Representation of Human Judgment

working on practical problems in a fashion that lends itself
to machine observation and emulation of their techniques.

PROGRAMMING AS INDUCTION

  The game of writing programs becomes more and more an
experimental science as the complexity of the programs in-
creases. At the limit, the programmer has the insecure hope
that his text will (1) run and (2) accomplish the intended
goals, that is, his program is a hypothesis that needs deductive
elaboration to verify it. This suggests that program writing
ought to be mechanized by a process analogous to the induction
of chemical hypotheses by DENDRAL and starting with
mechanized observations of human techniques of problem-
solving,

  The pervasive role of analogy in human judgment suggests
that much would be gained in artificial intelligence if a large
compatible tool kit of successful programs were available both
to the human and the mechanized programmer. Unfortunately,
artificial intelligence researchers go to such excesses in their
originality and improvisation of idiosyncratic dialects, that
there is no easy way in which past successes of unpredictable
relevance can be immediately tried out for a new problem.
Experimental science, on the other hand, is replete with impor-
tant advances that resulted from the provocative availability of
a new technique waiting on the shelf to find a use. Indeed, mass
spectrometry itself has exactly that history.