Proposal to THE ADVANCED RESEARCH PROJECTS AGENCY for continuation of THE HEURISTIC PROGRAMMING PROJECT

ARPA Order No. 2494
February 1977

Computer Science Department
STANFORD UNIVERSITY

Edward A. Feigenbaum, Professor of Computer Science
Joshua Lederberg, Professor of Genetics
Bruce G. Buchanan, Adjunct Professor of Computer Science
Co-Principal Investigators

Abstract

The research project in Heuristic Programming is proposed for continuation of Contract DAHC15-73-C-0435 for two years beginning July 1, 1977 at a total cost of $650,000.00. This project will be administratively distinct within the Computer Science Department.

STANFORD UNIVERSITY HEURISTIC PROGRAMMING PROJECT

Principal Investigators: Prof. Edward A. Feigenbaum, Prof. Joshua Lederberg, Prof. Bruce G. Buchanan

Contract No. DAHC15-73-C-0435
July 1, 1977 to June 30, 1979

1 OBJECTIVES

The objectives of the Heuristic Programming Project are to provide conceptual and programming tools for building knowledge-based programs. More specifically, the proposed work centers on three major themes:

I-A. Generalization of knowledge-based systems design and implementation techniques.

I-B. Extension of current research toward new or more powerful techniques for knowledge-based systems.

II. Orientation toward other ARPA-IPTO advanced R&D efforts.

2 BACKGROUND AND TECHNICAL NEED

2.1 Symbolic Computation

Computer scientists have long recognized that a computer is a general symbol-manipulating device. Arithmetic constitutes a special case of this capability -- the manipulation of those symbols that are numbers. In this proposal we will be discussing non-numeric symbol manipulation by computers. In thinking about non-numeric computation, it is useful to think about:

a. inference methods (as opposed to calculation and algorithms)

b. qualitative "lines of reasoning" (as opposed to quantitative formulations)

c. symbolic facts (not merely numeric parameters and formulas)

d. decision rules of expertise and judgment (as opposed to mathematical decision rules)

Symbolic computation, though general and powerful, has hardly begun to be exploited in real applications. The specialty within Computer Science that has studied complex methods of symbolic computation is "Artificial Intelligence Research".

2.2 The Intelligent Agent Viewpoint

Though artificial intelligence research has a number of different goals, one of them can be described as the study, design and implementation of "intelligent agent" computer programs. A succinct version of this view was presented by Feigenbaum in a report to the Director of ARPA in 1973, as follows:

Artificial Intelligence research is that part of Computer Science that is concerned with the symbol-manipulation processes that produce intelligent action. By "intelligent action" is meant an act or decision that is goal-oriented, arrived at by an understandable chain of symbolic analysis and reasoning steps, and is one in which knowledge of the world informs and guides the reasoning.

The potential uses of computers by people to accomplish tasks can be "one-dimensionalized" into a spectrum representing the nature of instruction that must be given the computer to do its job. Call it the WHAT-TO-HOW spectrum.
At one extreme of the spectrum, the user supplies his intelligence to instruct the machine with precision exactly HOW to do his job, step-by-step. Progress in Computer Science can be seen as steps away from that extreme "HOW" point on the spectrum: the familiar panoply of assembly languages, subroutine libraries, compilers, extensible languages, etc. At the other extreme of the spectrum is the user with his real problem (WHAT he wishes the computer, as his instrument, to do for him). He aspires to communicate WHAT he wants done in a language that is comfortable to him (perhaps English); via communication modes that are convenient for him (including perhaps, speech or pictures); with some generality, some abstractness, perhaps some vagueness, imprecision, even error; without having to lay out in detail all necessary subgoals for adequate performance - with reasonable assurance that he is addressing an intelligent agent that is using knowledge of his world to understand his intent, to fill in his vagueness, to make specific his abstractions, to correct his errors, to discover appropriate subgoals, and ultimately to translate WHAT he really wants done into processing steps that define HOW it shall be done by a real computer.

The research activity aimed at creating computer programs that act as "intelligent agents" near the WHAT end of the WHAT-TO-HOW spectrum can be viewed as the long-range goal of AI research. Thus, the Intelligent Agent is a knowledge-based system -- the essential system component is a body of knowledge representing the user's problem domain.

2.3 Knowledge-based Systems Research and Design

Early work in artificial intelligence aimed toward the creation of generalized problem solvers. Work on programs like GPS [by Newell and Simon] and theorem proving [Nilsson71], for instance, was inspired by the apparent generality of human intelligence and motivated by the belief that it might prove possible to develop a single program applicable to all (or most) problems. While this early work demonstrated that there was a large body of useful general purpose techniques (such as problem decomposition into subgoals, and heuristic search in its many forms), these techniques did not by themselves offer sufficient power for expert levels of performance. Recent work has instead focused on the incorporation of large amounts of task-specific knowledge in what have been called "knowledge-based" systems. Rather than non-specific problem solving power, knowledge-based systems have emphasized high performance based on the accumulation of large amounts of knowledge about a single domain.

A second successful focus in work on intelligent systems has been the emphasis on the utility of solving "real world" problems, rather than artificial problems fabricated in simplified domains. This is motivated by the belief that artificial problems may prove in the long run to be more a diversion than a foundation for further work, and by the belief that the field has developed sufficiently to provide techniques that can aid working scientists. While artificial problems may serve to isolate and illustrate selected aspects of a task, solutions developed for those selected aspects often do not generalize well to the complete problem.

There are numerous current examples of successful systems embodying both of these trends, systems which apply task-specific knowledge to real world problems. Our project is widely regarded as the initiator of this line of endeavor.
Over the past 12 years, with support from ARPA, NIH, and NSF, our scientists have studied extensively the problems of knowledge-based systems design and have implemented (or are in the process of implementing) a number of such systems. The accomplishments are summarized in the proposal section on Accomplishments, but a narrative here may prove useful.

DENDRAL: An intelligent assistant to an analytic and structural chemist. It infers the structures of complex organic molecules from structural constraints. These constraints are either supplied interactively by the user from his "private" knowledge and intuition, or are inferred automatically from instrument data, such as mass spectral data, nuclear magnetic resonance data, etc. For those families of molecules for which the knowledge base has been carefully elaborated, the DENDRAL program performs at levels equalling or exceeding the best human experts. The DENDRAL program now has a significant user community in university laboratories and in industry, and is being used to solve difficult real problems.

Meta-DENDRAL: This program is focused on the problem of elaborating DENDRAL's knowledge base for specific families of compounds. It infers an empirical theory (a body of fragmentation rules) of the mass spectrometry of specific families from recorded mass spectral data. It has not only "rediscovered" rules previously acquired from chemists, but has discovered novel rules for certain families--rules that have recently warranted publication in the chemical literature.

MYCIN: This program is an intelligent assistant to a physician diagnosing infectious diseases. In conjunction with its diagnoses, it recommends therapeutic action. It is capable of explaining its line-of-reasoning in any (and varying) level of detail to the user in English. It can accept new decision rules from the user in English. It keeps an updated model of its own knowledge base, which it uses to critique the introduction of new rules into the system. It is capable of acquiring and using measures of the uncertainty of the knowledge, and produces a "believability" index with each inference, i.e. it is capable of approximate implication. A version called EMYCIN, sans infectious disease knowledge, has been developed to extend the use of the system to other domains. (One of the project scientists, E. Shortliffe, was recently awarded the Grace Murray Hopper Award of the Association for Computing Machinery for his distinguished work on MYCIN.)

HASP: Project scientists working in a classified environment led the development of a signal-understanding program for continuous surveillance of certain objects of military interest. The program ran successfully in a number of highly varied test situations, and is being further developed in a currently-funded ARPA program. The program used a design for incremental hypothesis formation that was a modification of the HEARSAY design for the CMU speech-understanding system. Symbolic knowledge from a number of sources was used to aid the interpretation of the primary signal data. Time-dependent analysis was novel in this system and played an important role.

AM: This remarkable program conjectures "interesting" mathematical concepts. Its knowledge base encompasses the (usually private) knowledge of a mathematician as to what constitutes an "interesting" construct in mathematics.
Starting with the simplest set-theory concepts, and hundreds of rules defining "interestingness" of mathematical concepts, it has conjectured such concepts as addition, multiplication, factorization, primes, unique factorization into primes (the fundamental theorem of arithmetic), and an almost unstudied concept in number theory called "maximally divisible numbers".

MOLGEN: (under development) This program is being designed to be an intelligent assistant to an experimental molecular geneticist in formulating plans for laboratory experiments involving the manipulation of short DNA strands with restriction enzymes. The program is concerned with representing knowledge about planning and with the automatic formulation of plans to the level of detail demanded by the user. The program's knowledge must be represented at various levels -- biological, genetic, topological, and chemical -- and these levels must be incorporated into the reasoning.

Crystallographic Image Interpretation: (under development) This program is being designed to interpret ambiguous, incomplete three-dimensional image data obtained in x-ray crystallography of protein structures. The image input data is the so-called electron density map and the answer desired is an approximately correct protein molecule (or portion thereof). As with HASP, many sources of symbolic data support the interpretation of the primary signal data. The HASP program organization has been imported into this program as a test of its generality. The interpretation problem is difficult because the best wavelength available (x-rays) is too long to resolve atoms and interatomic separations; hence the need for additional sources of symbolic knowledge, e.g. the amino acid sequence of the protein.

In short, as the capsule sketches above indicate, the main themes of our work involve: the acquisition and maintenance of knowledge bases; the utilization of this knowledge in a variety of ways for data interpretation, problem solving, and planning; and the representation of this knowledge for computer inference. These programs constitute superb "laboratories" in which to study the problems of design of knowledge-based systems. Though built to a large extent with non-ARPA support, they serve well the purposes of the ongoing and proposed ARPA-sponsored research.

2.4 Production-Rule-Based Systems: Methodology and System Features

We sketch below our approach to issues of knowledge representation, management and use. It consists of (a) a representation of knowledge in terms of production rules and (b) interactive programs for acquiring, using, and explaining the knowledge in the system. An example of a production rule from the MYCIN domain of infectious disease diagnosis is shown below, in both the internal format (LISP), and the English translation which is generated from it.
PREMISE ($AND (SAME CNTXT GRAM GRAMNEG)
              (SAME CNTXT MORPH ROD)
              (SAME CNTXT AIR ANAEROBIC))
ACTION  (CONCLUDE CNTXT IDENT BACTEROIDES TALLY .6)

If 1) the gram stain of the organism is gram negative, and 2) the morphology of the organism is rod, and 3) the aerobicity of the organism is anaerobic, then there is suggestive evidence (.6) that the identity of the organism is Bacteroides.

The premise of the rule is a Boolean combination of preconditions, each of which is a predicate testing the value of an associative (attribute, object, value) triple. Such rules contain modular "chunks" of knowledge about the domain, represented in a form that will be comprehensible to someone who is acquainted with the field, but may know nothing about computer programming. Many unique and important capabilities are made possible by encoding knowledge in this form, capabilities reflected in the organization and architecture of the system.

The knowledge base of a system contains the collection of domain-specific rules which give the system its competence in a particular domain. The problem data base contains information about a specific problem given to the system. An interactive "intelligent agent" program uses the collection of rules in the knowledge base to reason about the task at hand. If, for instance, the MYCIN program is attempting to determine the identity of an organism responsible for a particular infection, it retrieves the entire list of rules which, like the one above, conclude about identity. It then attempts to ascertain whether the conclusion of the first rule is valid, by evaluating in turn each of the clauses of the premise. Thus, for the rule above, the first thing to find out is the gram stain of the current organism. If this information is already available in the data base, the program retrieves it. If not, determination of gram stain becomes a new subgoal, and the program retrieves all rules which conclude about it, and tries to use each of them to obtain the value of gram stain. If, after trying all the relevant rules, the answer still has not been discovered, the program asks the user for the relevant information which will permit it to establish the validity of the premise clause. Thus, the rules unwind to produce a succession of goals (i.e., the system performs an exhaustive depth-first search of the and/or goal tree formed by the rules), and it is the attempt to achieve each goal that drives the consultation.

The use of a rule-based representation of knowledge also makes it possible for the system to offer explanations of all intermediate and final conclusions. During the course of interaction with the user, for example, the consultation program requests many pieces of information. If the motivation for requesting any particular piece of information is unclear, the user may temporarily put off answering and instead inquire why such information is needed and how it will be used. The explanation program which allows him to do this can explore past, current, and future (potential) lines of reasoning, at varying levels of detail according to instructions from the user. A second form of explanation is available after the consultation has ended. For example, again using MYCIN to illustrate, if a user asks "How did you determine the identity of the organism?" the program answers by displaying the rules which were actually used, and explaining, if requested, how each of the premises of the rules was established.
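The goal-directed chaining just described can be made concrete with a short sketch. The following simplified LISP is our own illustration of the control scheme, not MYCIN's actual code: the rule layout and function names are invented for the example, and the TALLY (certainty factor) machinery is omitted for brevity (it is discussed in section 2.5).

    (defparameter *rules*
      ;; Each rule: (premise-conditions (attribute value)).
      ;; This one is a simplified form of the Bacteroides rule above.
      '((((gram gramneg) (morph rod) (air anaerobic))
         (ident bacteroides))))

    (defvar *facts* '())                ; the problem data base

    (defun lookup (attr) (cdr (assoc attr *facts*)))

    (defun ask-user (attr)
      ;; Last resort: ask the user, as the consultation program does.
      (format t "~&What is the ~A? " attr)
      (let ((val (read)))
        (push (cons attr val) *facts*)
        val))

    (defun trace-goal (attr)
      "Determine a value for ATTR: the data base first, then every rule
    concluding about ATTR (each premise clause becoming a subgoal),
    and finally the user."
      (or (lookup attr)
          (dolist (rule *rules* (ask-user attr))
            (destructuring-bind (premise (concl-attr concl-val)) rule
              (when (and (eq concl-attr attr)
                         (every (lambda (clause)
                                  (eq (trace-goal (first clause))
                                      (second clause)))
                                premise))
                (push (cons attr concl-val) *facts*)
                (return concl-val))))))

Calling (trace-goal 'ident) with an empty data base makes the sketch ask for the gram stain, morphology, and aerobicity in turn, then record and return BACTEROIDES: the unwinding of rules into a succession of goals described above.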
This is something which someone familiar with the domain (in this case, a physician) can readily understand without having to know very much about either computers in general or this particular system. Note that it also provides a far more comprehensible and acceptable explanation than would be possible if the methodology employed were based on statistical approaches to decision making.

The knowledge acquisition program can aid in the task of adding new "chunks" of knowledge to the program's expanding knowledge base. The rule-based representation of knowledge means that the expert himself can specify the new knowledge by expressing it in this format. He can thus help make the program more competent, without having to know anything about computer programming. In addition, since the rules are designed to be independent of one another, and are used by the program as necessary in order to deal with the particular situation under consideration, the addition of a new rule or modification of an existing rule does not require alteration of other items in the knowledge base, as is often necessary with systems using other methodologies.

2.5 Judgmental and Inexact Knowledge

Since we want to deal with real-world domains in which reasoning is often judgmental and inexact, we require some mechanism for being able to say that "A suggests B", or "C and D tend to rule out E." The numbers used to indicate the strength of a rule have been called Certainty Factors (CFs). The methods for combining CFs are embodied in a model of approximate implication. These are derived from and are related to probabilities, but are distinctly different. For the rule given above, the evidence is "suggestive" (.6 out of 1), but not absolutely certain. Evidence confirming an hypothesis is collected separately from that which disconfirms it, and the truth of the hypothesis at any time is the algebraic sum of the current evidence for and against it. This is an important aspect of the truth model, since it makes plausible the simultaneous existence of evidence in favor of and against the same hypothesis. We believe this is an important characteristic of any model of inexact reasoning.

2.6 Incomplete Knowledge

Another fundamental characteristic of the information in real world problems is its incompleteness. In a military domain, for instance, information about an unfolding situation may be incomplete sometimes because it is totally unavailable, or, more typically, because it is not possible to collect all necessary data before deciding upon a course of action. Since both of these are common to problems in a wide range of domains, a basic requirement of an intelligent system is that it be able to reach a decision based on whatever information is currently available. Unlike many previous methodologies for decision making (e.g., decision trees), the goal-directed chaining of rules that we used above as our example is quite flexible with respect to the amount of information present, because the reasoning chains are constructed dynamically. Decision constraints (decision "logic") are computed at time of execution. There is thus no need to attempt to create a pre-determined sequence of decisions that takes account of every contingency. Instead the information present at the time of the problem-solving interaction will itself determine the direction in which reasoning ought to proceed.
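The model of approximate implication sketched in section 2.5 can be illustrated in a few lines of LISP. The combining function below (each new piece of evidence closes a fraction of the remaining gap) is the one from the published MYCIN certainty-factor model; the surrounding data structures are our own simplified illustration.

    (defstruct hypothesis
      (belief 0.0)       ; accumulated confirming evidence
      (disbelief 0.0))   ; disconfirming evidence, kept separately

    (defun combine (old new)
      ;; Published MYCIN combining function for two pieces of
      ;; evidence in the same direction.
      (+ old (* new (- 1.0 old))))

    (defun add-evidence (hyp cf)
      "Fold in one rule's certainty factor CF (-1..1): positive CF
    confirms the hypothesis, negative CF disconfirms it."
      (if (plusp cf)
          (setf (hypothesis-belief hyp)
                (combine (hypothesis-belief hyp) cf))
          (setf (hypothesis-disbelief hyp)
                (combine (hypothesis-disbelief hyp) (abs cf))))
      hyp)

    (defun certainty (hyp)
      "The current truth of the hypothesis: the algebraic sum of the
    evidence for and against it."
      (- (hypothesis-belief hyp) (hypothesis-disbelief hyp)))

For example, two confirming rules of strengths .6 and .4 accumulate belief .6 + .4 x (1 - .6) = .76; a disconfirming rule of strength .3 then leaves a net certainty of .46, with both bodies of evidence still individually inspectable, as the truth model requires.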
2.7 Multiple Uses of Knowledge Bases

The accumulation of a large store of task-specific knowledge presents a number of unique opportunities. First, of course, it supports high performance, and this is typically why it is assembled. But there are numerous other ways in which the same knowledge base can be employed, with little or no extra overhead. One area of investigation is its use as a reference source for computer-assisted education. That is, in much the same way that an expert taught the system about the domain, is it not possible to have the system teach a student? In a similar vein, it might be used as an on-line reference source, one that was easily searched, read, and modified as knowledge in the domain accumulated.

2.8 Summary of Advantages of Production Rule Methodology

2.8.1 Modularity and Ease of Modification by Non-programmer User

As discussed above, the individual rules, as "units" of knowledge, can be removed or replaced quickly by the user-analyst. This makes for a highly adaptive man-computer system, one which gives the user the (correct) feeling that he is in charge of the nature and course of the analysis method and strategy. Such a system is bound to be more powerful and responsive than a computer-only inference system.

2.8.2 Flexibility with Robustness -- Knowledge is Separated from Software

Most large programs are currently similar to a house of cards in that changes or additions must be made with the utmost care, lest unanticipated remote effects make the entire structure collapse. Generally there are too many unknown, unstable, and critical interdependencies. Using production rules, the effects of changes are propagated naturally and non-destructively.

2.8.3 Highly Integrated and Dynamic Documentation

Documentation is natural, self-evident, and (where necessary) enforced. The rules themselves are represented in a language "natural" to the user-analyst (e.g. English for a physician-user of MYCIN, chemical graphs for a user of DENDRAL). The system can answer the primary explanatory questions concerning use of rules: HOW is the rule being used and WHY was the rule invoked in the line-of-reasoning. Rule forms govern the acquisition of a new rule, and include all information appropriate and useful for documentation (including ancillary information about when, by whom, and why it was added to the knowledge base). In addition, the same kind of "natural" questioning can be used to interrogate the knowledge base. Thus a manual for the knowledge base is unnecessary.

2.8.4 Rule Criticism During Acquisition

Because the acquisition of new rules is controlled by the so-called "rule models" inferred from an examination of the knowledge base, the system can catch many errors of commission and omission made by the user.

2.9 Technical Need

The defense systems of the United States are as much information processing systems as they are systems of weapons technology and use. Military personnel, particularly those in command and other decision-making positions, are routinely inundated with massive amounts of problem data. They use their expertise and good judgment to convert that data into relevant information on the basis of which intelligent choices and plans can be made. There is a pressing and omnipresent need in the services for computer-based intelligent assistance in this vital data-to-information transformation.
Our basic research has been aimed at developing the methodology that will facilitate the programming of such agents; and at implementing real systems in physical, chemical, biological, medical, and military problem domains so as to give credibility to the view that such agents are possible.

Traditional mathematical methods are often powerful, but are sometimes abused, and in any event are scarce. For most problems of interest, these methods are being pressed to the limits of their ability to represent the problems appropriately. Most of the world's (and the military's) problems are not basically problems of quantitative formulation and analysis, and the attempt to put them in this straitjacket is often frustrating. Most real-world problems involving decision making and problem solving are symbolic in nature, involving qualitative reasoning, using knowledge that is informal, experiential, judgmental, and often tacit (i.e. private). As we have said, our research is aimed at this class of problems.

"Expert" knowledge-based intelligent-agent programs meet these kinds of needs (among many others):

a. the problem-dependent knowledge base can be made thorough and more complete than that known to any individual user

b. the use of the knowledge in decision-making and problem solving can be made highly systematic, so that relevant knowledge or appropriate solution paths are not inadvertently ignored.

c. the systems can be made available in crisis situations that are typically characterized by information overload of the human analysts and decision makers.

d. symbolic knowledge can help to structure and guide expensive computational processes, by focusing the processing on appropriate calculations, thereby reducing the computational time and cost by (sometimes) orders of magnitude (this advantage was most obvious in the HASP domain).

Finally, from the point of view of the people doing analysis, decision-making, planning, etc. (i.e. from the human engineering point of view) there are these needs:

a. for systems that are easily expanded and modified, so that the system can change as human understanding of the problem changes

b. for programs that can represent human expertise and the inexactness of human judgment in a natural way.

c. for programs that can build and maintain credibility with their users by offering clearly understandable explanations of their lines-of-reasoning, thereby making the users feel comfortable in accepting assistance from the agent.

2.10 Technology Transfer

In view of the importance of the needs described above, our project scientists have been, and are now, engaged in efforts to transfer our techniques and methods to technologists working on defense systems, and to various scientific disciplines. Our previous proposals have documented our technology transfer efforts of past years. The two year period just ending has been rich with such activities.

a. Feigenbaum and other project scientists have interacted vigorously with RAND Corporation scientists on the transfer of methods and systems developed here to the RITA system development for command-and-control applications.

b. We have had similar high levels of interaction with scientists at SRI on the development of their systems and methods in military applications. Of particular note is our interaction with them on Inference Nets; and the participation of Feigenbaum and Lederberg at a special meeting relating to EW analysis issues.
c. Feigenbaum and Nii have worked with the scientists at Systems Control Inc. on the follow-on effort to HASP (the SIAP program).

d. Feigenbaum gave an all-day seminar to senior engineers at the National Security Agency on knowledge-based systems, and our project scientists have held meetings with NSA scientists on this subject.

e. Feigenbaum is consulting with System Development Corporation on the writing of a tutorial report on Knowledge-based Systems for a military agency.

f. Feigenbaum has interacted with scientists at NELC on the formulation of a Navy project applying knowledge-based system methods to a C-C Testbed application.

g. Feigenbaum and project scientists have interacted with engineers at General Motors Research on the application of knowledge-based systems to their industrial problems.

h. Considerable interaction with industrial chemists regarding the use of the DENDRAL programs has been underway for some time. One of our scientists will be spending a year bringing up an "export" version of DENDRAL in England under the sponsorship of the Science Research Council and Shell Oil Co. Other industrial users of the programs include Eli Lilly Co., Monsanto Chemical Co., Du Pont, and Kodak.

i. A general and massive effort at technology transfer has been the initiation of the Handbook of Artificial Intelligence, being written by Feigenbaum and his students (see subsequent proposal material for more details).

j. The AGE project described later in the proposal is a major effort to make available significant portions of our technology in the form of software packages to a broad scientific and engineering community.

3 ACCOMPLISHMENTS

The Heuristic Programming Project has become known as one of the most active artificial intelligence research groups in the country because of our long history of accomplishments as well as the intensity of our current work. Summarized below are the most notable accomplishments in the past two year period, followed by a review of our work in the years prior to that.

3.1 1976 to 1977

1) Production Rules

The production rule methodology was extended and generalized in two important respects. First, the formalism for the premise clauses was extended to allow comparisons among objects as well as tests on individual objects. For example, instead of merely testing to see if the identity of an infecting organism is E. coli, the MYCIN program can now test to see if the current organism has the same identity as any other organism cultured from this patient. Second, explanation capabilities for production rule systems were extended to answer a broader range of questions. For example, questions about why the MYCIN program did NOT consider E. coli to be a likely infecting organism can now be answered. Negative questions are often highly relevant, but are difficult for a program to answer because their answers come from rules that the program did NOT invoke or procedures that did NOT get executed.

2) Planning, particularly planning of experiments

The framework was laid for a rule-based hierarchical planning program to guide scientific experimentation. The knowledge needed to make the planning inferences can come from multiple sources of scientific knowledge. (Coordinate funding to support much of this work was obtained from NSF.) The testbed for these ideas is the MOLGEN experiment planning program in molecular genetics, largely funded by the NSF.
The planning problems are general: how to represent planning strategies, how to represent objects at arbitrary levels of detail, how to cope with imprecision in the planning rules.

3) Meta-DENDRAL: new results = new science

Meta-DENDRAL successfully formulated rules of mass spectrometry that were new to the science. These rules, along with a discussion of the methodology, were published in the scientific literature [ref XXII]. The program was tested to see if it could rediscover the rules of mass spectrometry for two classes of chemical compounds that were already well understood (amines and estrogenic steroids). Then it was applied to three classes of compounds whose mass spectrometry was not as well known (mono-, di-, and tri-ketoandrostanes). The program produced three sets of rules that explained much of the significant data for these classes. The time for manual rule formation for these data was estimated to be several months. Automatic rule formation was completed in less than an hour of processing time.

4) Meta-DENDRAL: generalization

Progress was made on generalizing the Meta-DENDRAL program, and rules for a new domain were successfully discovered by the program. A scientific paper on this application was submitted for publication [ref Schwenzer]. The new application was: learning rules for interpreting signals from C13-NMR spectroscopy. The instrument produces data points in a bar graph in response to the resonance of each carbon-13 nucleus in the sample. The rules describe an environment of a C13 atom and predict a resonating frequency range for every atom that matches the description. The Meta-DENDRAL program needed some modification because the rules predict ranges of data points, and not precise processes, as for the mass spectrometry version.

5) Meta-DENDRAL: methodology

A report was written on the AI methodology underlying Meta-DENDRAL [ref BGB/TM]. The major idea is using knowledge of the domain to guide a learning program. The major difference between Meta-DENDRAL and statistical learning programs is that Meta-DENDRAL uses a strong model of mass spectrometry, including any assumptions the user cares to make about the domain, to guide the formation of explanatory rules.

6) Meta-DENDRAL: negative results

Our attempt to produce rules of various forms from Meta-DENDRAL was premature because the syntactic form of the rules is intimately tied to the rule generator. We are working on the problem of generalizing Meta-DENDRAL, and this task is among those that we consider important. Of course, the program can search an immense variety of rules within a specified family. At issue here is the lack of flexibility and generality across family boundaries.

7) HASP: generalization

The control structure and representational scheme of the HASP program were successfully mapped into a new domain (image analysis of x-ray crystallographic data, as described in the background section). A working paper on this was published [ref Nii].

8) Knowledge Engineering: techniques for interactive control

Mechanisms were developed for giving users of rule-based interactive programs more control over the running programs. While a knowledge-based system is making its inferences, in between statements that get printed, a user often wonders what is going on and whether the answer from the program will be worth the wait. We introduced into the CONGEN program two helpful interrupt capabilities that can be generalized to other programs.
First, the user can interrupt the computation to ask for an estimate of the amount of work remaining to be done. This requires a model of how to extrapolate from current status to final status. Second, the user can request any, or all, of the intermediate solutions already generated in order to decide whether to continue the work or to introduce new constraints on the problem.

9) Rule-based teaching program: held in abeyance

Before we could produce an "intelligent teacher" that used an existing production rule knowledge base for instructional purposes, it was necessary to work on the problem of providing explanations in many different ways (proposed here). Thus the effort expended on this problem only uncovered new research problems for the time being.

10) Distributed computing

The first version of the "contract nets" programming organization was developed and a paper about it written. Contract nets will allow various types of AI methods to be programmed for parallel, asynchronous nets of computers.

11) Documentation of AI concepts, techniques, programming methodology: AI Handbook

Organization of the volume was completed, and first drafts of most of the volume were completed. The publication of a technical report was premature, but informal copies of material were sent to ARPA-IPTO.

3.2 1975 to 1976

1) The methodology and a prototype system for encoding rules of strategy for problem solving were developed for production rule systems. We call these "meta-rules" and have reported their use in [ref Davis1976]. A meta-rule reorders or prunes the list of relevant domain rules to be invoked, on the basis of the encoded strategy, in order to pursue the consultation one way rather than another.

2) A program (named Teiresias) was written for interactive acquisition of new rules, using strong syntactic models of the rules to guide the interaction. This work was reported in [ref Davis1976]. The system has strong expectations about rules that make certain conclusions and about rules that refer to various attributes in their premises. Thus it can remind the user to include additional clauses in the rule and it can interpret many ambiguous statements written by the user.

3) A program was written to formulate models of rule-forms in a production rule system's knowledge base. These we call "rule schemas"; their automatic formulation and use are reported in [ref Davis1976]. These models are induced over the set of the existing rules in the system at any moment. They are changed each time a new rule is added so that the information is always current.

4) The MYCIN program was "emptied" of its medical knowledge to produce a program that could accept knowledge of non-medical (or other) domains. This version still makes strong assumptions about the organization of objects in the domain and about the nature of rules. If the structure of the new domain is similar to MYCIN's in these respects, however, the program can accept new rules and offer expert advice in this domain. One test domain was the diagnosis of, and recommendations for fixing, problems in an automobile horn.

5) The RULEMOD program for "fine-tuning" Meta-DENDRAL's newly-discovered rules was finished. This program performs a number of important subtasks, including merging similar rules, making rules more specific or more general, and filtering out the weakest rules.
RULEMOD checks for counterexamples to rules and uses this information in all of the named tasks. Because of the expense of computing counterexamples to possible rules, this computation is delayed until Meta-DENDRAL has a set of plausible rules, rather than computing counterexamples on each possible rule examined in the search of the rule space.

6) The RULEGEN component of Meta-DENDRAL was demonstrated to work with heuristic search methods used in many other AI programs (cf. Nilsson's book, Problem-Solving Methods in Artificial Intelligence, McGraw-Hill). Guidance from a model of mass spectrometry is an important feature of RULEGEN. Also, the program uses problem data for pruning possible rules (and all more specific rules formed from those). The amount of data examined during the search is very large and the space of rules is immense, so the search needs to be rather coarse in order to produce plausible, but not necessarily optimal, rules.

7) A program was written to discover new, interesting conjectures in mathematics. This program, called AM, starts with a set of fundamental propositions of number theory and explores interesting extensions to the set, for example by looking at analogies and by looking for interesting properties that various subsets of the numbers share. This work was reported in [ref Lenat].

8) A rule-based system for integrating multiple sources of knowledge, in the task of interpreting 3-D image data, was designed and partly implemented. (This task was funded in large part by the National Science Foundation.) The data for this prototype system come from the domain of protein crystallography. KSs (knowledge sources) are being developed and implemented for identifying individual molecules in the crystal structure, locating heavy atoms and cofactors, locating the backbone of the polymer, identifying side chains, matching the features in the data to the amino acid sequence, etc. The inferred structure is a multi-leveled hypothesis, conforming to the levels of detail to which model builders refer in applying their skill. A flexible, rule-based control structure allows the system to be exercised in many different modes, e.g. event-driven or goal-driven. Knowledge sources are kept distinct from the program that deploys them, so that the effectiveness of specific KSs on the system's performance can be readily investigated.

3.3 Years Prior to 1975

Under ARPA support, and later under NIH support, the HPP developed a high performance, knowledge-based program for the interpretation and explanation of signal data from a mass spectrometer. This is the Heuristic DENDRAL program, which is now widely acclaimed as one of the most powerful and useful of AI programs.

The problem solving paradigm that has emerged from DENDRAL work is the so-called "plan-generate-test" paradigm. It is based on heuristic search of a space of possible hypotheses, with planning before generation of hypotheses and testing of each generated candidate. The generator for DENDRAL is a general-purpose graph generator which produces a list of all possible graphs containing specified numbers of nodes of various types. The most important features of the generator are that the list of graphs is guaranteed to be complete and non-redundant and, equally important, that the list need not be exhaustively generated. The generator can be constrained to produce only graphs that meet specified criteria that are inferred from the initial problem data.

The planning program is domain-specific.
It contains a large amount of knowledge of mass spectrometer fragmentation processes, nuclear magnetic resonance phenomena, chemical stability, etc., and can accept additional knowledge (or assumptions) about the problem area from the scientist using the program.

The testing program uses a theory of mass spectrometry to make testable predictions for each candidate hypothesis. Matching the observed data with the prediction, then, allows the program to disconfirm some hypotheses and confirm the rest to varying degrees. The confirmed hypotheses are ranked, and the most plausible hypotheses are given to the user as the best explanations of the initial data.

The success of any reasoning program is strongly dependent on the amount of domain-specific knowledge it contains. This is now almost universally accepted within AI, and DENDRAL's success is one of the reasons for its acceptance. We also showed the overall benefits of separating the knowledge of the domain from the computer code that manipulates that knowledge for problem solving. Because of the difficulty of extracting specific knowledge from experts to put into the program, we began to explore the problems of transferring knowledge into a program efficiently. We have looked at two alternatives to "hand-crafting" each new knowledge base: interactive knowledge transfer programs and automatic theory formation programs. In this enterprise the separation of domain-specific knowledge from the computer programs themselves has been a critical component of our success.

We first developed a representation of knowledge that is natural and easy for experts to think about. The ideas had been discussed by others (e.g., Newell), but we brought them together in a problem-solving system in science. We codify individual facts about the domain as attributes and values associated with objects -- "the X of Y is Z". In addition, we developed an easy way of encoding and manipulating the degree of certainty (called the "certainty factor" or CF) associated with each fact. The rules of inference between facts are then encoded in conditional sentences, or production rules, which allow new conclusions to be made (with some CF) any time the statements mentioned in the premise clauses are true (or "true enough").

For interactive knowledge transfer we developed programs that carry on a dialogue with an expert in order to help him encode his knowledge of the domain. At first these programs were extremely simple, but we have by now given them the ability to understand stylized (not entirely free form) English sentences typed by the expert and have given them some understanding of the content, as well as the form, of the rules.

Automatic knowledge acquisition is a longer-term research problem. One of the stumbling blocks with the interactive knowledge transfer programs is that for some domains there are no experts with enough specific knowledge to make a high performance problem solving program. We were looking for ways to avoid forcing an expert to focus on original data in order to codify the rules explaining those data, because that is such a time-consuming process. Therefore we began working on an automatic rule formation program (called Meta-DENDRAL) that examines the original data itself in order to discover the inference rules for that part of the domain.

The problem solving paradigm for Meta-DENDRAL is also the plan-generate-test paradigm used in Heuristic DENDRAL.
In this case the program generates plausible rules within syntactic and semantic constraints and within desired limits of evidential support. The model used to guide the generation of rules is particularly important since the space is enormous. The planning part of the program collects and summarizes the evidential support. The testing part looks for counterexamples to rules and makes modifications to the rules in order to increase their generality and simplicity and to decrease the total number of rules.

Another application of artificial intelligence has been made by the MYCIN group, now called the Knowledge-Based Consultation Systems group to reflect our interest in basic methodological issues. Work on MYCIN began in 1971 with a small group looking specifically at medical decision making. The resulting program, based on many of the ideas developed during work on DENDRAL, was shown to perform at the level of an expert in one area of medicine. The methodology was general, however, and we sought ARPA support in 1975 to extend that methodology beyond the one application area. This was the start of our work on generalized production systems, along with programs that could explain the system's reasoning and programs to acquire new production rules from experts. Under separate funding the MYCIN program continues to develop into a high performance system that can provide expert advice about problems that are too complex for a non-expert to deal with alone. Other research groups have studied MYCIN for their own applications, including ARPA-sponsored groups at SRI and RAND. The conceptual simplicity of the production rule interpreter in MYCIN makes it relatively easy to map the program into new domains, with their own sets of domain-specific rules. MYCIN's rule base contains about 400 rules at present, and so we are looking closely at ways to keep the amount of processing from growing with the number of rules.

4 PROPOSED WORK

4.1 Prologue

The Heuristic Programming Project's contract support from ARPA is part of a mosaic of funding that supports basic research in the design and engineering of knowledge-based systems. The work is usually set in the context of domains of knowledge in science or medicine. The ARPA contract supports this work in the direction of research goals of interest to ARPA's IPTO in the area of knowledge-based systems. Grant support from the National Science Foundation and the National Institutes of Health provides domain expertise as well as some of the basic computer science support. Because the task domains and the basic research goals of the ARPA-sponsored research mesh so well with NIH's goals for the SUMEX-AIM computer resource, PDP-10 time and file space are provided by SUMEX-AIM to the ARPA project. The synergism among these various projects and interests has been very great, making this project one of the most effective--and indeed one of the most cost-effective--in the history of artificial intelligence research.

4.2 Major Themes

The statement of tasks that follows is woven of three major themes:

I-A. Generalization of knowledge-based systems design and implementation techniques. The systems we have built are sufficient in number and variety to support an attempt to generalize our techniques so that they will be readily accessible to a broad community of system builders, not merely the small Stanford group.
We believe that we are at a significant point of cumulation in this science, and that the product of the scientific generalization should be programs embodying the generalized methods, serving future knowledge-based systems design and implementation efforts.

I-B. Extension of current research toward new or more powerful techniques for knowledge-based systems. The line of research we have been pursuing has been very fruitful. Most of the important scientific thrusts are currently active but not yet fully exploited. There are substantial scientific and practical benefits to pushing ahead to a fuller exploitation during the coming period. Much of the work on knowledge acquisition, representation of production rules, integration of multiple sources of knowledge, etc., is to be viewed in this light.

II. Orientation toward other ARPA-IPTO advanced R&D efforts. We are orienting a portion of our total effort to a rule-based approach to some of the problems in the interpretation of image data; and to a study of the program organizations and signal-interpretation methods that might be used in distributed sensor net applications.

4.3 Tasks

The following list summarizes the research tasks to be performed during the two year period. It is broken into two major parts. The first set of tasks consists of frontier problems of advanced symbolic computing. The second set includes research problems that relate directly to other ARPA-IPTO programs.

I. Expert Knowledge Based Systems and Heuristic Rule Technique Development

Proposed is a continuation of our basic research on knowledge-based systems design and techniques for representing "expert" symbolic knowledge in the form of heuristic rules. The research is composed of five major parts (A-E) discussed below.

A. Development of program packages embodying generalized techniques for work on knowledge-based systems; preparation of a compendium of these techniques and associated methods and programs in artificial intelligence research.

1. AGE package ("Attempt to GEneralize"). Isolate inference, control and representation techniques from previous knowledge-based programs; reprogram them for domain independence; write a rule-based interface that will help a user understand what the package offers and how to use the modules; and make the package available to other ARPA contractors, service labs doing knowledge-based systems development, and the general scientific community.

Detailed Discussion: The goal of this new effort is to construct a computer program to facilitate the building of knowledge-based systems. The design and implementation of the program will be based primarily on the experience gained in building knowledge-based systems at the Heuristic Programming Project in the last decade. The programs that have been built are: DENDRAL, Meta-DENDRAL, MYCIN, AM, HASP, and MOLGEN (currently under development). Initially, the AGE program will embody methods used in our programs. However, the long-range objective is to integrate methods and techniques developed at other A.I. laboratories. The final product is to be a collection of useful "building-block" subprograms, combined with a knowledge-based front-end that will assist a user in constructing knowledge-based programs.
It is hoped that AGE can speed up this process and facilitate transfer of the technology by: (1) packaging common AI software tools so that they do not need to be reprogrammed for every problem; and (2) helping people who are not knowledge-engineering specialists to write knowledge-based programs.

Two specific research activities of the AGE effort are:

1. The isolation of techniques used in knowledge-based systems. It has always been difficult to determine if a particular problem-solving method used in a knowledge-based program is "special" to a particular domain or whether it generalizes easily to other domains. In the currently existing knowledge-based programs the domain-specific knowledge and the manipulation of such knowledge using AI techniques are often so closely coupled that it is difficult to make use of the programs for other domains. We need to isolate the AI techniques that are general and to determine precisely the conditions for their use.

2. Guiding users in the application of these techniques. Once the various techniques are isolated and programmed for use, an "intelligent front end" is needed to guide users in their application. Initially, we assume that the user understands AI techniques and knows what he wants to do, but that he does not understand how to use the AGE program to accomplish his task. The program at this stage of the development will need to have the basic tools coupled with a package to guide the user in applying these tools. A longer-range interest involves helping the user determine what techniques are applicable to his task. That is, we assume that the user does not understand the necessary techniques of writing knowledge-based programs. Some questions to be posed are: What are the criteria for determining if a particular application is suited to a particular problem-solving framework? How do you decide the best way to represent knowledge for a given problem? There are some smaller, but by no means trivial, questions which also need answering. Is there a "best way" to write production rules which would apply to many task domains? Is there a data representation that would cover many tasks? What is the best way to handle differences in the ability of the users of the AGE program?

Research Plan: The AGE program will be developed along two separate fronts, both of which are divided into incremental development stages. The first of these fronts is the development of the ability to help build many different types of knowledge-based programs (the "generality" front). The second front is the development of "intelligence" in the interaction between the user and the AGE program; i.e. moving from dialogues on "how to use the tools in AGE" to "what tools to use" (the "how-to-what" dialogue front). The current plan for the development of the AGE program, with approximate milestone dates, is listed below.

a. Generality: The development of a program package that will enable the user to build HASP-like knowledge-based programs. The applicable tasks are characterized by the need for: the integration of multiple sources of knowledge, multi-level representation of solution hypotheses, opportunistic problem-solving methods, and explanation capability for the reasoning steps. The HASP-like paradigm has been used to solve problems of interpreting large amounts of digitized physical signals, but can also be extended to problems of processing large amounts of symbolic data.
Dialogue: The development of dialogue to show the user how to utilize the packaged components in AGE to build HASP-like programs. The interactive capability will be limited to: specifying how to build a multi-level hypothesis structure; how to write production rules to represent domain knowledge; and how to use various techniques available for opportunistic hypothesis formation. (12/77)

b. Generality: Supplement the ability to build HASP-like programs with a capability to build MYCIN-like goal-oriented programs. Dialogue: Same level of dialogue capability, with additional ability to discuss how to chain rules and how to specify the necessary parameters for the context tree. (9/78)

c. Generality: Same level as for 9/78, i.e. ability to build HASP-like, MYCIN-like, or combination of HASP- and MYCIN-like knowledge-based programs. Dialogue: Begin to extract from the user some key characteristics of the task, and using that information begin to suggest appropriate knowledge representation and problem-solving techniques for the user's task. This interactive capability will be limited to the generality level at this point in the AGE development. (12/78)

d. Test phase: test the usefulness of the AGE system by developing an application program in some task domain. (a) An application program will be chosen from among ongoing program development efforts within our own project or within the ARPA contractor community. An application will be chosen whose primary task is that of interpreting large amounts of symbolic data or described signal data. Among the possibly appropriate tasks would be one of rule-based image analysis (see proposal section IIa). (The search for an application will be concurrent with the development of the dialogue capabilities of c., 12/78.) (b) Collect specific knowledge needed for the application program and begin to develop the program using the AGE system. (6/79)

2. PLAN package. Oriented toward the representation of plans-of-action and toward an expert's knowledge of the best problem solving strategies to employ in his domain. Will do inference on components of planning and strategy rules so that new plans and strategies can be constructed readily from previous ones. The representation will allow the manipulation of various "levels of detail" of plans and strategies. The package will be made available as previously mentioned in connection with AGE.

Detailed Discussion: Before starting a technical presentation of the ideas for the Plan Package, it is worth highlighting some of the issues which motivate its development.

a. How can a variety of types of domain actions be accommodated in a knowledge base?

b. How can a variety of types of strategy and control knowledge be incorporated in a knowledge base?

c. How can a variety of types of problem solving states be expressed and manipulated by the system?

d. How should plans be represented?

e. How can the problem statements for a variety of types of problems be acquired?

f. How does the expression and representation of problem solving states relate to the expression of the domain and strategy knowledge?

The Plan Package consists of two major entities -- the Planning Network and the Strategy Package. The Planning Network is a set of software which manages the representation of the plans created during the problem solving process. When a problem is acquired from a user, it is represented as an initial planning network.
Problem solving takes place as the active strategy rules manipulate the planning network to create solutions. The Strategy Package itself is discussed in the next section.

Since the planning state knowledge is important for the expression of strategy in the Plan Package, it is worthwhile exploring briefly the nature of this knowledge. It is useful to consider the planning network as being composed of three parallel planes -- the solution plane, the planning plane, and the focus plane. These planes contain, respectively, (1) the solution steps (domain rule applications) and world states, (2) the planning and design steps, and (3) the focus-of-attention knowledge. All three planes of the network are built dynamically during the problem solving process. Different types of nodes in the network correspond to the different components of the problem solving process.

A number of issues have been raised about the management of strategy knowledge.

a. How should strategies be expressed?
b. How can strategy information be assimilated so that the system will use it appropriately when designing or explaining solutions?
c. How can a knowledge based system assist a domain expert in structuring and expressing his ideas about strategy?

Means-ends analysis is one of the simplest ideas in the current stock of methods for problem solving. As such, it should exist as a standard strategy in a strategy package of artificial intelligence techniques, to be used as needed. The current state of artificial intelligence, in which a researcher must re-code means-ends analysis any time he wishes to use it, is akin to a carpenter forging a new hammer for each job. One approach to making an instance of means-ends analysis available as a tool would be to provide a packaged program which accepts arguments for the various components of means-ends analysis (e.g., a difference table, a difference function, etc.). The alternative being proposed here is a system which uses schemata to drive the strategy acquisition process and which can guide a user through the details. The goal is to create a supportive environment for the painless testing of fairly high level strategies. Such a system should be able to draw on its knowledge base to provide assistance in casting a problem into a means-ends framework.

In summary, other systems have stumbled over the expression of more complex forms of domain and strategy rules and have been limited to solving a single kind of problem. We propose extending this work by developing what we have termed the Plan Package. The Plan Package consists of two major components -- a schema-based representation for the problem-solving states, termed the Planning Network, and a schema-based representation for domain rules and strategies, termed the Strategy Package. The Planning Network will provide a representation for a variety of types of problem solving, so that the problem solving system will be able to solve more than one type of problem. The Strategy Package will provide a set of standard artificial intelligence strategies in the form of schemata, which may be instantiated into strategy rules when they are supplied with the particulars of domain knowledge. These schemata will facilitate the acquisition of tailored strategies by guiding a user a step at a time through the particulars of the acquisition process. A sketch of the packaged means-ends analysis idea follows.
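To make the "packaged program" alternative concrete, here is a minimal sketch of means-ends analysis parameterized by a difference function, a difference table, and an operator applier. It is a simplified illustration (it omits subgoaling on operator preconditions), and the function signatures are assumptions for this sketch, not a proposed implementation.

    # Sketch: means-ends analysis as a packaged, parameterized tool.
    # The caller supplies the domain-specific components as arguments.

    def means_ends_analysis(state, goal, differences, difference_table,
                            apply_op, max_depth=20):
        """differences(state, goal) -> list of symbolic differences.
        difference_table: maps each difference to candidate operators.
        apply_op(op, state) -> new state, or None if op is inapplicable.
        Returns a list of operators forming a plan, or None on failure."""
        if max_depth == 0:
            return None
        diffs = differences(state, goal)
        if not diffs:
            return []                     # goal reached; empty plan suffices
        for diff in diffs:                # work on each outstanding difference
            for op in difference_table.get(diff, []):
                new_state = apply_op(op, state)
                if new_state is None:
                    continue              # operator not applicable here
                rest = means_ends_analysis(new_state, goal, differences,
                                           difference_table, apply_op,
                                           max_depth - 1)
                if rest is not None:
                    return [op] + rest    # prepend op to the recursive plan
        return None

A researcher could then use the tool by supplying only the domain particulars, rather than re-coding the method itself, which is precisely the economy argued for above.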
Research Plan: The Plan Package will be developed and tested in the domain of molecular genetics as part of the MOLGEN project. It will be further developed and extended to other domains, as a test of generality, as part of the AGE project.

Time  Milestone

3mo   MORSE (Molgen Object, Rule & Strategy Editor) operational for objects.

6mo   MORSE operational for rules. System capable of simulating several kinds of laboratory experiments. Geneticists begin using the system to build the Knowledge Base.

9mo   MORSE operational for some strategies. Capable of planning limited classes of experiments. Primitive performance monitoring operational. Begin CS experiments.

12mo  MOLGEN starts having impact by trying to design real experiments. Penny starts getting involved to use parts of the system for experiments in the AGE project.

3. AIHANDBOOK. Documentation of methods, techniques, programs, languages, etc., with tutorial overviews, whose goal is the transfer of knowledge-based systems and other A.I. technology to the broader engineering, scientific, and military community.

Detailed Discussion: A significant "technology transfer" gap exists in the dissemination of information concerning the concepts, methods, techniques, and programs of artificial intelligence research. The quality of the science is widely recognized (e.g., four of the twelve Turing Award winners of the Association for Computing Machinery are researchers in artificial intelligence). But the use of the science in engineering applications is very small, and the amount of university teaching being done to train new scientists and engineers in the use of the work is also small. Consequently, we were motivated to begin a major work of documentation to fill the gap. The emerging pair of volumes is called the Handbook of Artificial Intelligence. It is being prepared by project scientists and Stanford Computer Science students. It is a project of low cost and potentially high impact.

The Handbook will consist of approximately two hundred articles on all aspects of artificial intelligence research and development. Each article will be approximately three pages in length and will include a brief discussion of the concept, method, technique, program, etc., and a small set of primary source references for further study. In addition, the Handbook will contain numerous "overview" articles summarizing main lines of activity. The entire work will be tied together by an index constructed as a relational network (a "semantic net"), so that a user can be led to an article related to his interest even though he did not know that he needed that article. The outline for the AI Handbook has been put together with the advice and consultation of many AI scientists from laboratories in this country and abroad.

Research Plan: The Handbook will be produced first in a preliminary edition of two volumes, so that comments, criticism, and corrections can be obtained from a wide audience. The preliminary edition of the first volume will be available in Fall, 1977 (so we intend). The preliminary edition of the second volume will be available by Summer, 1978. All the necessary editorial changes, and the entire preparation of the Handbook for publication, will be done during the 1978-79 academic year. More than 150 "first drafts" of articles are already written and have been made available to ARPA-IPTO for its internal uses.

B. Production rule representation of knowledge.

1. Particular focus will be placed on the representation of expert problem solving strategies (i.e., rule-directed behavior at the level of program control).
Detailed Discussion: The strategies for problem solving are as important for high performance as the domain specific rules themselves. Some of the strategies that experts use can be expressed in meta-rules, which guide the application and determine the relevance of domain rules. We have already developed the meta-rule formalism for representing knowledge about the reordering and pruning of domain rules. We are proposing to extend that formalism to additional kinds of strategy knowledge. For example, one strategy for dealing with a complex situation is to view it initially as if it were a simple one, find the specific complicating factors, and deal with those as special cases. Here it is "default information" about the uncomplicated situation that allows us to focus on the important special case rules. We propose to use knowledge about commonly expected associations (expressed not in meta-rules but in general "models" of the domain) to guide the application of rules. This work will be carried out in the context of the MYCIN and MOLGEN programs. A sketch of the meta-rule mechanism appears after the research plan below.

Research Plan:

a. Work with MYCIN's domain experts to formulate at least three models of associated knowledge and expected ("default") values for relevant parameters. (6/78)

b. Develop a formalism for encoding this kind of knowledge.

c. Isolate the critical times in MYCIN's performance, specifically in the control structure, when strong expectations from models will be most beneficial for guiding the consultation. Modify the control structure to refer to the information stored in models.

d. Test the cost effectiveness of using models of typical situations in the domain in the context of the three (or more) sample in-depth models. (6/79)
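The following is a minimal sketch of how meta-rules can prune and reorder a set of candidate domain rules before any are applied. The rule and meta-rule formats, and the clinical example, are hypothetical simplifications of the MYCIN-style formalism, shown only to fix the idea.

    # Sketch: meta-rules that prune and reorder candidate domain rules.
    # Rule and meta-rule formats here are illustrative assumptions.

    def select_rules(candidate_rules, context, meta_rules):
        """Apply each applicable meta-rule to prune or reorder candidates."""
        rules = list(candidate_rules)
        for meta in meta_rules:
            if not meta['applies'](context):
                continue
            if meta['kind'] == 'prune':
                rules = [r for r in rules if not meta['reject'](r, context)]
            elif meta['kind'] == 'reorder':
                rules.sort(key=lambda r: meta['priority'](r, context),
                           reverse=True)
        return rules

    # Example meta-rule (hypothetical domain content): if the patient is
    # a compromised host, try rules mentioning unusual organisms first.
    prefer_unusual = {
        'kind': 'reorder',
        'applies': lambda ctx: ctx.get('compromised_host'),
        'priority': lambda rule, ctx: 1 if 'unusual-organism' in rule['tags']
                                        else 0,
    }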
2. Incorporate time-dependent relations into the production rule formalism and control structure; i.e., use time as an integral part of the analysis, where appropriate. Correspondingly, in the dynamic model of the situation (the current best hypothesis) built up by a reasoning program, keep a "time line" of relations between events. HASP was the ground-breaking program for this work, and the new work will be oriented toward signal data interpretation and also toward MYCIN.

Detailed Discussion: One area of improvement for production systems is to handle uniformly rules which interpret a sequence of events over time, as opposed to making conclusions based on a single "snapshot" of the decision making situation. This capability will allow the expert to capture the more dynamic situations in his problem domain. The main extension to knowledge-based systems is to allow the expert to specify trends to be searched for in the data which has accumulated up to the present time, to provide predictions of future trends, and to specify particular milestones which are anticipated. As the processing continues forward in time, the program can compare these anticipated events against the actual data to determine if the previous decision making was effective. The representation of trends can also be used to "work backwards" to estimate what certain data values might have been at a previous time when they were not then reported.

The major problems lie in picking both the most effective representation for this type of information and the methods for recognizing and describing trends, and in incorporating the uncertainties in the data or in the method of data collection into the prediction process. Much of the richness of time oriented analysis is based on mathematical models (e.g., differential equations, simulations, and statistics), which must be integrated with the heuristic information used by the expert to make decisions. The system must also handle the problems of predicting past or future events from data not known with certainty, and of using the current situation to determine how much reliability can be placed on a prediction made from the available data. In a consultation system, it is important to know which types of facts are expected to change, in order to ask the user for only the minimum facts needed to update the current model of the problem situation. A sketch of a "time line" representation appears after the research plan below.

Research Plan:
Design internal representation (3 months)
Integrate mathematical models (4 months)
Forward and backward prediction under heuristic control (5 months)
Handle uncertainty in the prediction process (6 months)
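As an illustration of the "time line" idea, here is a minimal sketch of a record of timestamped observations, with a crude trend test and linear extrapolation backward and forward in time. The representation, the trend test, and the field names are assumptions made for this sketch only; the real design problem (integrating uncertainty and mathematical models) is exactly what the research plan above addresses.

    # Sketch: a "time line" of observations with trend detection and
    # simple linear extrapolation. Representation is illustrative.

    class TimeLine:
        def __init__(self):
            self.events = []              # (time, parameter, value) triples

        def record(self, time, parameter, value):
            self.events.append((time, parameter, value))
            self.events.sort()

        def history(self, parameter):
            return [(t, v) for (t, p, v) in self.events if p == parameter]

        def trend(self, parameter):
            """Crude trend test: sign of the average first difference."""
            h = self.history(parameter)
            if len(h) < 2:
                return 'unknown'
            diffs = [v2 - v1 for (_, v1), (_, v2) in zip(h, h[1:])]
            avg = sum(diffs) / len(diffs)
            return 'rising' if avg > 0 else 'falling' if avg < 0 else 'steady'

        def extrapolate(self, parameter, time):
            """Linear estimate of a value at a past or future time.
            Assumes at least two observations of the parameter exist."""
            (t1, v1), (t2, v2) = self.history(parameter)[-2:]
            slope = (v2 - v1) / (t2 - t1)
            return v2 + slope * (time - t2)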
3. Extend the capabilities of the production rule formalism to include general knowledge of the task domain not easily represented in production rules, such as mathematical relations. Extend the control structure to use this general knowledge in order to direct the reasoning and focus the system's attention on the most relevant subset of rules. Toward similar ends, we will develop further our previous work on the use of meta-rules as guidance for the use of domain rules in a program.

Detailed Discussion: One very powerful form of reasoning used by a human consultant when he is presented with a new situation is that of classifying and comparing the new situation with "typical" situations he has encountered previously. For example, an auto mechanic who is asked to fix a car which won't start immediately begins to check those things which are typically malfunctioning in a "car won't start" situation. That is, he will check the starter, the battery, the supply of gas, etc., and not bother to check the oil level, because he knows that the oil level is not typically at fault in a "car won't start" situation. Sometimes the consultant is unable to classify the new situation as one he has encountered previously, and he must rely upon his basic knowledge of the task domain in order to complete the consultation. When he is able to classify the new situation as being similar to a "typical" situation, he has, in effect, a model of what to expect in the current consultation. He uses this model of a similar consultation to guide him in performing the current one. The model also gives him expectations about what will happen in this new situation, because it tells him what typically happens in such situations. And because he can compare his findings with what is "typically" found, he has a way to check the accuracy of the information he obtains in the new situation.

We propose to add such a set of models to a rule-based consultation system in order to control the invocation of the rules and to give the system knowledge about what to expect in typical situations. These models will also give us the ability to take appropriate action in response to inconsistent information. That is, given knowledge about what constitutes a "typical" case, atypical findings can be isolated and checked for their validity. A sketch of this use of models appears after the research plan below.

Research Plan:

a. Plan the design of models and the model-based system (9/77)
b. Add models to the data base and implement a model-directed consultation system (3/78)
c. Provide consistency checking of information obtained during the consultation
d. Provide explanation and interactive knowledge-acquisition facilities for the model-based system (12/78)
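A minimal sketch of the "typical situation" idea follows: a model records expected and typically irrelevant findings, the consultation selects the best-fitting model, and atypical findings are flagged for a validity check before they are believed. The model contents and the overlap-counting match rule are illustrative assumptions, not the proposed design.

    # Sketch: models of "typical" situations set expectations and
    # flag atypical findings. Contents are illustrative only.

    models = {
        'car-wont-start': {
            'expected': {'starter': 'suspect', 'battery': 'suspect',
                         'fuel-supply': 'suspect'},
            'typically_irrelevant': ['oil-level'],
        },
    }

    def classify(findings, models):
        """Pick the model whose expectations overlap the findings most."""
        def score(model):
            return len(set(findings) & set(model['expected']))
        best = max(models.values(), key=score)
        return best if score(best) > 0 else None

    def atypical(findings, model):
        """Findings the model neither expects nor dismisses; these are
        candidates for a validity check before being accepted."""
        return [f for f in findings
                if f not in model['expected']
                and f not in model['typically_irrelevant']]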
4. Develop a formalism for measuring the power and performance of a program's knowledge base. Create methods for assessing strengths and weaknesses in the knowledge base and for recommending modifications.

Detailed Discussion: The objective of this research is to develop principles and methods for measuring the performance of certain kinds of AI real world problem solving systems, and to investigate how to use such measurements to improve system performance. This work is motivated by the belief that for an AI system to solve real world problems effectively, it is just as important for it to know how its knowledge is used as it is to know what knowledge it has in the first place. The work is also motivated by the conviction that performance knowledge is essential for conveying a program's scope and limitations to a user, who can then use the program more intelligently. Furthermore, it is hoped that the principles developed here will facilitate the creation of a general laboratory for testing AI methods, whose use will increase our understanding of the field.

This investigation will primarily be concerned with how well the knowledge base of an AI problem solving system is organized and used. The emphasis will be on such factors as the frequency of accessing and grouping of information about domain objects; the cost of evaluating and invoking rules characterizing the domain; and the relative merits of strategies for selecting knowledge sources. There are of course other important determinants of system performance, such as the selection of the best internal representation for a data structure, or the selection of the most efficient method of searching a data file. However, the main concern here is to assist expert users in organizing domain knowledge for effective problem solving in conjunction with the AI system.

We will seek to develop interactive methods for a user or designer of a knowledge based system to express system performance criteria; to describe which parts of the system should be monitored, in what detail, and what special events to look for; and to enter heuristics to help the system decide what changes to recommend on the basis of its observations. The system should provide a description of the distribution of effort and an estimate of the costs and benefits of further observation of parts of the system. A sketch of this kind of instrumentation follows the research plan below.

Research Plan:

a. Develop layered information gathering tools -- tools which offer a choice of more or less detail for more or less cost -- and establish how to use them effectively.
b. Establish ways to characterize which parts of a system to observe and what events to detect.
c. Develop a means for stating performance criteria (6/78).
d. Provide examples of using a description of behavior and a statement of performance criteria to recommend knowledge base changes.
e. Investigate ways of estimating the utility of monitoring the system (6/79).
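A minimal sketch of layered instrumentation: a monitor counts rule invocations, optionally measures evaluation cost at the finer (more expensive) level of detail, and a simple heuristic flags rules that are often tried but rarely succeed. The detail levels, thresholds, and recommendation rule are illustrative assumptions.

    # Sketch: instrumenting a knowledge base to see how its rules are used.
    import time
    from collections import defaultdict

    class Monitor:
        def __init__(self, detail='coarse'):
            self.detail = detail           # 'coarse' (counts) or 'fine' (+cost)
            self.stats = defaultdict(
                lambda: {'tried': 0, 'fired': 0, 'cost': 0.0})

        def observe(self, rule_name, evaluate):
            """Wrap one rule evaluation, recording counts and optional cost."""
            s = self.stats[rule_name]
            s['tried'] += 1
            start = time.perf_counter() if self.detail == 'fine' else None
            result = evaluate()
            if start is not None:
                s['cost'] += time.perf_counter() - start
            if result:
                s['fired'] += 1
            return result

        def recommendations(self, min_tries=20, min_yield=0.05):
            """Heuristic: flag rules often tried but rarely firing; these
            are candidates for reordering, pruning, or expert revision."""
            out = []
            for name, s in self.stats.items():
                if s['tried'] >= min_tries and s['fired'] / s['tried'] < min_yield:
                    out.append(f"rule {name}: tried {s['tried']}, fired "
                               f"{s['fired']}; consider demoting or revising")
            return out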
C. Program designs for expert problem solving, with particular focus on planning methods

1. Develop programs and methods for hierarchical planning in task domains in which knowledge is uncertain. Though neither hierarchical planning methods nor methods of inexact inference are entirely new, their confluence is now an important issue.

Detailed Discussion: Significant progress has been made in recent years on methods for both hierarchical planning (producing successively more detailed plans until all factors have been considered) and inexact inference (reasoning with rules whose conclusions can only be "trusted" with a measured degree of certainty). Combining these two powerful problem-solving techniques is now an important research issue for complex, real-world domains. Hierarchical planning in these domains is necessary because of the combinatorial explosion that results when all details are considered at all stages of plan formation. Yet the rules of these domains make use of uncertain knowledge to draw uncertain conclusions -- conclusions which may change as further details become available at more refined stages of planning. The major research question is how to properly modify the basic structure of hierarchical planning to handle the reality of inexact domain rules. Certainly Sacerdoti's idea of postponing decisions involving the ordering of steps in a plan until such a decision is necessary seems a reasonable beginning. The system must be able to handle cases where plans fail at low levels of refinement not only because detailed preconditions of rules cannot be satisfied, but also because the very actions of the rules themselves change when all details are considered. This occurs because the abstractions of rules which are valid at the highest levels of hierarchical planning cannot be expected to provide more than a reasonable guess about the likely action of a rule for which a detailed simulation may be necessary. In simple domains the virtues of hierarchical planning were obtained by merely considering successively more rule preconditions at each level of planning; the action of the rule stayed constant. The problem for domains with inexact knowledge is not merely one of considering at each level of planning only those details which are necessary; it is one of considering those details which are reasonable -- a task which may itself be rule based. A sketch of this refinement problem appears after the research plan below.

Research Plan:

a. A goal of the first six months is to categorize the variety of problems which result when planning in complex domains where knowledge is tentative and uncertain.
b. Develop rule-based systems which provide heuristic control at the problem points in hierarchical planning -- expanding the notion of domain dependent critics as introduced by Sacerdoti. This process should take between 1 and 1 1/2 years.
c. Test these systems in a variety of problem domains. This will involve methods for expert entry and explanation of the rules described in (b). The process of testing and expanding will start after 1 year and continue through the second year (6/79).
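A minimal sketch of the refinement problem under inexact rules: a planner expands an abstract plan level by level, re-estimating each step's certainty as more detail is considered, and abandons the refinement (forcing replanning) when a step's certainty collapses. The plan representation, the certainty and expansion functions, and the threshold are illustrative assumptions, not Sacerdoti's design or our eventual one.

    # Sketch: hierarchical plan refinement where a rule's estimated
    # certainty may change as more detail is considered.

    THRESHOLD = 0.2    # illustrative cutoff for abandoning a step

    def refine(plan, level, max_level, certainty, expand):
        """plan: list of abstract steps.
        certainty(step, level) -> degree of belief that the step acts as
        predicted when examined at this level of detail.
        expand(step, level) -> list of more detailed substeps.
        Returns a fully refined plan, or None if replanning is needed."""
        if level > max_level:
            return plan
        detailed = []
        for step in plan:
            if certainty(step, level) < THRESHOLD:
                return None        # the abstraction was a bad guess; replan
            detailed.extend(expand(step, level))
        return refine(detailed, level + 1, max_level, certainty, expand)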
D. Heuristic Rule Acquisition, by automatic rule formation methods and semi-automatic model-directed rule extraction

1. Extend the interactive production rule acquisition system to handle acquisition of other types of information about the task domain, i.e., domain-specific models other than rule-based models. Use the domain-specific models to check for inconsistencies in new rules, i.e., exploit the inevitable redundancy to enforce logical consistency among rules.

Detailed Discussion: The information about "typical" situations, discussed in section B above, will be very helpful for reasoning and problem solving. It is necessary, however, to be able to acquire this information from experts before it can be used. This will require giving the production rule system the same acquisition and explanation capabilities for these "models" as it has for its rules. The current knowledge-acquisition facility of the consultation system will be extended so that these models can be acquired through interaction with the user. The explanation capabilities of the system will also be extended and improved so that "higher-level" explanations of system performance can be provided. For example, we could explain which hypothesis, i.e., which model, is being tried during a given consultation, and we could say why it is being tried.

Research Plan:

a. Define a vocabulary to be used in the acquisition of models.
b. Implement the basic knowledge-acquisition program based on this vocabulary.
c. Provide good human engineering features for knowledge acquisition. For example, simple spelling errors should be tolerated, and careful prompts should be used when communicating with the user.
d. Extend the explanation facility so that "higher-level", contextual explanations can be handled.

2. Develop programs for acquiring one particular type of non-rule-based auxiliary model of a domain: an expert's taxonomy of the objects in his domain. This taxonomy is a necessary precondition for applying rules that mention the objects.

Detailed Discussion: In many domains a strict hierarchical relationship exists among an important subset of the objects in that domain. Experts in these domains often form that subset of objects into a taxonomy, with important consequences for rules affecting or using those objects. The major idea is that objects can be understood and accessed through this hierarchical relationship; for example, if no rules can be found which specifically mention object A, then perhaps relevant rules can be found which do mention its parent in the taxonomy, object B. Also, a taxonomical classification makes acquisition and explanation affecting objects simpler and more efficient; misunderstood objects can be clarified by showing their relationship to other, better understood ones.

An automated system for extracting from an expert his taxonomy of a complex domain should provide interactive facilities to allow the expert to make completely clear the relationships between objects in the taxonomy. It is important for a rule-based system using the taxonomy to understand whether these relationships are based on some underlying theory of the domain, or whether they simply reflect a particular expert's view of the pathways through which objects should be accessed. The system should allow diverse types of relationships within the taxonomy and should make expansion and modification a relatively simple process for even a large taxonomy. A sketch of rule retrieval through a taxonomy appears after the research plan below.

Research Plan:

a. Initially develop an interactive framework for describing and modifying an expert's taxonomy of objects in his domain. This should be accomplished within six months.
b. Combine this taxonomy acquisition system into a variety of knowledge-based systems to test its flexibility and utility. This will be an ongoing process over a several year period.
c. Develop methods for describing the domain theory that underlies the taxonomic classification, along with means for testing the supplied taxonomy to check if its organization really does represent that theory. This should be accomplished in some generality within two years.
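The following is a minimal sketch of the retrieval idea stated above: if no rules mention object A, walk up the taxonomy and use rules that mention an ancestor. The taxonomy fragment and rule index are invented for illustration and are not drawn from any of our knowledge bases.

    # Sketch: rule retrieval through an object taxonomy.
    parents = {                        # illustrative taxonomy fragment
        'e.coli': 'enterobacteriaceae',
        'enterobacteriaceae': 'gram-negative-rod',
        'gram-negative-rod': 'organism',
    }

    rules_by_object = {                # illustrative rule index
        'gram-negative-rod': ['RULE050'],
        'organism': ['RULE001', 'RULE002'],
    }

    def relevant_rules(obj):
        """Walk up the taxonomy until some rules are found."""
        while obj is not None:
            if obj in rules_by_object:
                return obj, rules_by_object[obj]
            obj = parents.get(obj)     # fall back to the parent object
        return None, []

    # relevant_rules('e.coli') -> ('gram-negative-rod', ['RULE050']):
    # no rules mention e.coli directly, so rules about its ancestor apply.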
3. Develop a general representation for checking the form of new knowledge entered about objects and relations in the domain. These so-called "schemata" can also guide the acquisition of knowledge.

Detailed Discussion: One of the most successful systems to address the knowledge acquisition issue is the MYCIN/TEIRESIAS system developed at Stanford. MYCIN/TEIRESIAS developed techniques for the acquisition of domain knowledge in the form of rules and objects. MYCIN/TEIRESIAS has approximately four hundred production ("If ... then ...") rules in its knowledge base. Problem solving context and expectation models are used to guide the process of acquiring these rules from a user. MYCIN/TEIRESIAS also has several hundred objects in its vocabulary (e.g., gramstain, nutrient, and culture). To guide the acquisition of these objects, which have varied structure, a schema -- that is, a description of structure -- is kept for each type of object. When a new instance of an object is acquired, MYCIN/TEIRESIAS decomposes the acquisition process into a number of small steps corresponding to the components of the object. A schema for schemata (the "schema-schema") gives a description of the schemata in the knowledge base. This makes it possible to acquire a new kind of object by first acquiring a new schema for it using the schema-schema, and then acquiring an instance using the new schema.

Several extensions to the ideas in MYCIN/TEIRESIAS are needed to build a powerful problem solving system. The part of the medical diagnostic task represented in this system is described in terms of rules of a very limited format. Some common kinds of operations and strategies are awkward to express within this constrained format. For example, iteration can be expressed only by introducing some extra dummy variables. Following the analogy of using schemata to describe different types of objects, it is proposed that rule schemata be developed to describe different types of rules. (MYCIN/TEIRESIAS may be viewed as having only one type of rule, which has an "if" component and a "then" component, with associated experts for guidance in filling the two components. This "schema" for rules in MYCIN/TEIRESIAS is built into the system.) We propose to develop a variety of schemata to cover a broad range of domain operations and strategies. Just as an object can be decomposed into its component objects, a rule can be decomposed into its component subactions. In the proposed system, a user could select from a menu of standard artificial intelligence strategies, and the system would guide the acquisition of the strategy rule using appropriate schemata from the knowledge base.

In summary, the acquisition process for a strategy rule is broken down into a number of small and manageable steps. The schema used to guide the acquisition process inherits many of its specifications for creating the rule from its ancestors in the schemata hierarchy. It is suggested that this process can be used to help prevent required entries from being forgotten when a new rule is acquired. Much of the structure of a rule can be filled in automatically -- for example, the iterative loop in the means-ends analysis example. Tests on the sets of acceptable values for the components of instances can be built into the schemata as a further check on the correctness of what a user enters. The goal of this process of assisted acquisition is to make the acquisition of domain specific strategy rules as painless and bug-free as possible. A sketch of schema-guided acquisition appears after the research plan below.

Research Plan:

a. Develop the vocabulary of schemata that extends the MYCIN/TEIRESIAS capabilities to iterative statements.
b. Design a hierarchy of schemata and programs for "inheriting" or transferring knowledge from one level of the hierarchy to others.
c. Design the "menu" of AI strategies and instantiate all schemata below at least one of them.
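A minimal sketch of schema-guided acquisition follows: a schema lists an object's components together with acceptability tests, prompts for each component in turn, and re-prompts until the test passes, so that required entries cannot be forgotten or malformed. The slot names and tests are hypothetical stand-ins for the TEIRESIAS-style mechanism, not its actual vocabulary.

    # Sketch: a schema decomposes acquisition into one step per component.
    culture_schema = {
        'type': 'culture',                    # illustrative schema
        'slots': [
            ('site',   lambda v: v in ('blood', 'urine', 'csf')),
            ('date',   lambda v: len(v) > 0),
            ('method', lambda v: len(v) > 0),
        ],
    }

    def acquire(schema, ask):
        """ask(prompt) -> the user's reply. Each slot is acquired in a
        small step; the slot's test guards against unacceptable values."""
        instance = {'type': schema['type']}
        for slot, test in schema['slots']:
            value = ask(f"Value for {slot} of the new {schema['type']}? ")
            while not test(value):
                value = ask(f"'{value}' is not acceptable for {slot}; "
                            f"please try again: ")
            instance[slot] = value
        return instance

    # acquire(culture_schema, input) walks a user through a new culture.

A schema-schema would, in the same way, guide the entry of new schemata like the one above, so that entirely new kinds of objects and rules can be introduced without reprogramming.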
4. Additional experiments will test the relative power of rule formation under different learning conditions.

Detailed Discussion: Given a body of data from which rules are to be formed, together with a basic approach to rule induction, there remains a range of ways in which the data may be utilized, which differ in the degree of parallelism involved in the examination of instances. At one extreme are methods in which rules are formed and refined in a sequence of steps, each step involving the examination of one new instance. At the other extreme are methods which involve a single-pass rule formation process using all available data. There are, of course, many intermediate possibilities. We propose to investigate, within the Meta-DENDRAL framework, whether some of these methods are optimal in the sense of yielding rules of comparatively high quality with the expenditure of comparatively little computing effort. It is hoped that the investigation will lead us to some general insights concerning the optimal utilization of data in automatic rule formation.

Research Plan:

a. Develop and implement one or more procedures for updating an evolving set of rules on the basis of newly examined data. These procedures will make use of existing capabilities of the RULEGEN and RULEMOD programs, and will make possible the implementation of a variety of schemes for data utilization, as described above.
b. Select and implement a representative subset of the class of data utilization schemes indicated above, and test their performance in the application area of mass spectrometry.
c. Describe in a technical report these experiments, their results, and the lessons learned.

E. Explanation by program of its line-of-reasoning

Current capabilities are exemplified by the interactive explanation capability (in English) of the MYCIN system, and the non-interactive capability in HASP.

1. Continue program development to extend explanation capabilities to complex procedures in addition to production rules. Also extend the facility to allow explanation programs to be provided with additional knowledge about the purposes of requests, thereby allowing them to produce the "most relevant" explanations of lines of reasoning.

Detailed Discussion: Our previous work demonstrated the explanation capability of rule-based expert systems [Comp. Biomed. Res. paper, and AJCL microfiche]. We propose to extend the advantages of an explanation system to programs whose knowledge is procedurally embedded. The key concepts we will explore are: 1) an "event history" which records decisions made by the program in the form of traces; 2) an "event structure" which abstractly represents the steps and strategies of a procedure; and 3) an interactive question-answerer which can selectively retrieve traces from the event history, guided by the organization of the event structure.

For the sake of illustration we will use an example from antibiotic therapy selection in the MYCIN program. One relevant question at the end of a consultation with MYCIN is "Why did you recommend streptomycin for the streptococcus infection?". In order to answer this question the program needs to keep a record of the major conceptual reasoning steps taken by the various procedures in the therapy selection algorithm at the end of a MYCIN consultation. It must also be able to structure these "events" so that only the relevant steps will be retrieved in answer to the user's question. A sketch of this scheme appears after the research plan below.

Research Plan:

a. Develop a model for the types of problem-solving methods which might be explained by this scheme. Is there a broad class of programs to which it might be applicable? (8/77)
b. Develop a general schema for the event structure which can be used to abstractly represent the class of programs determined in (a). (10/77)
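A minimal sketch of the event-history idea: decisions are recorded as traces tagged with the step of an abstract event structure that produced them, and the question-answerer retrieves only the traces belonging to the step asked about. The step names and trace format are illustrative assumptions, not the MYCIN therapy algorithm itself.

    # Sketch: an event history indexed by an abstract event structure.
    event_structure = [            # illustrative steps of a therapy procedure
        'rank-organisms', 'propose-drugs', 'screen-sensitivity',
        'select-therapy',
    ]

    history = []                   # traces: (step, description) pairs

    def record(step, description):
        assert step in event_structure     # traces must map to known steps
        history.append((step, description))

    def explain(step):
        """Answer a 'why' question by retrieving traces from one step."""
        return [d for (s, d) in history if s == step]

    record('propose-drugs', 'streptomycin proposed for streptococcus')
    record('screen-sensitivity', 'organism judged sensitive to streptomycin')
    # explain('propose-drugs') returns only the reasoning steps relevant
    # to the user's question about the streptomycin recommendation.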
2. Extend the explanation capabilities of production systems to allow the explanation of strategy decisions embodied in meta-rules.

Detailed Discussion: Strategy decisions may sometimes be more important than decisions about more specific steps in the reasoning. We propose to extend the capabilities of the MYCIN explanation system to phrase answers to questions in terms of the general strategies as well as in terms of individual rules. For example, the answer to the question "Why did you ask me about X?" may sometimes refer to the strategy decision in which X was relevant, e.g., "Because one of the strategies for finding out about Q is only relevant when you know X." That is, the answer may be given in terms of the strategy rules as well as the individual domain rules. Again, the question "Why didn't you use rule R?" may refer to the general strategy under which the program was operating at the time -- "Because my strategy rule S said that R would not be useful in this context." Finally, general questions about the domain, and not about the specific consultation, may refer to strategies, e.g., "In general, how do you conclude the identity of infecting organisms?" This may be answered by mentioning the individual domain rules, as MYCIN does now, or by mentioning some of the general strategies for reasoning about this parameter.

Research Plan:

a. Extend the history list mechanism of the MYCIN program to accept information about the strategies that MYCIN is carrying out during a consultation.
b. Upgrade the question-answering capabilities of the MYCIN program to answer general questions about the program's reasoning in terms of strategy rules. An important problem to be solved is recognizing the need to frame an answer in terms of strategies rather than individual rules.

II. Research in Support of Other IPTO R&D Programs

A. Rule-based methods in the interpretation of image data

Many of the rule-based methods previously developed by our project, and to be studied and developed in this period, can be applied to assist the interpretation of image data, in at least the following ways:

... modularity/flexibility. Qualitative rules of interpretation can be changed easily when existing rules are deemed inadequate by the image analysis expert.

... acquisition. New rules can be inserted into the program under guidance of a rule acquisition system (similar to MYCIN's) to extend performance to new or changed circumstances.

... explanation. The line-of-reasoning being used by the program to interpret the image can be exhibited for the human analyst to understand and evaluate.

This important application of our work is not now being made, and we propose to orient a portion of our effort in this direction. A.I. researchers in IPTO's Image Understanding program are doing significant work primarily in "front-end" image processing and procedure-based interpretation, but not in rule-based methods such as exist in MYCIN, DENDRAL, and HASP.

Because the start-up costs in money and scientific energy to understand the real problems in image analysis and build base-level software are so great, we propose to develop our prototype system in a domain in which we have considerable expertise, excellent data, extensive software, and (last but not least) some other financial support from NSF. This domain is protein crystallography, involving the interpretation of three-dimensional images of the electron density of a protein molecule in a crystal. The rule-based system to be developed in this prototype will be engineered to be as portable as possible, to facilitate its application to the photo-interpretation prototypes being built by the Image Understanding projects, e.g., those at the Stanford AI Laboratory, SRI, and Carnegie Mellon University.

This proposed activity has additional plausibility because of the following technical circumstances:

... the HASP rule-based program structure developed by some of our staff is applicable to this problem.

... as part of our current activity, to test the generality of that HASP program structure, we have already mapped the basic ideas in the HASP program onto the crystallographic data interpretation problem. Hence we have a running start on the problem.
Detailed Discussion: For the reasons discussed above, we propose to develop rule-based methods for interpreting 3-D images, within the context of protein crystallographic data. Our ultimate goal is to transfer the rule-based methodology to other IPTO R&D programs in image interpretation. In this respect, preliminary discussions have been held with Prof. Raj Reddy of CMU. Prof. Reddy is interested in our work and eager to collaborate. The technology transfer is expected to be bi-directional in this effort: the Image Understanding projects stand to benefit by employing a rule-based structure in their programs, and we expect to benefit from their expertise in front-end processing of the primary image data.

The interpretation of an electron density image, derived from the reduction of X-ray crystallographic data, is a necessary step in understanding the 3-D structure of proteins and other macromolecules. When crystallographers use the term "electron density map" they usually have in mind some pictorial representation of the electron density defined over a certain region of 3-space (usually some fraction of the unit cell of the crystal). By carefully studying the map the experienced interpreter can find features which allow him to infer important features of the image: approximate atomic locations, molecular boundaries, groups of atoms, the backbone of the polymer, etc. An "interpretation" is a model of the molecular structure which conforms to the electron density map and is also consistent with his knowledge of protein chemistry, stereochemical constraints, and other available chemical and physical data (e.g., the amino acid sequence).

Our goal is to build an intelligent assistant to the image analyst, i.e., a computational system that can generate and verify its own structural hypotheses and explain the steps taken in the interpretive process. This capability implies (1) a representation of the 3-D image data suitable to machine interpretation, (2) a substantial knowledge base appropriate to the task domain, (3) a wide assortment of model building algorithms and heuristics, in order to achieve acceptable performance, and (4) a flexible, rule-based system for incorporating both the knowledge sources and the control strategy. An elaboration of these points has recently been documented (Engelmore and Nii, HPP-77-2). The fourth point, concerning knowledge representation and utilization, is of primary concern here, because of its applicability in other image understanding contexts.

The expert analysts who interpret these images move continually across a large field of basic facts, special features of the data, and implications of the partial model already built, looking for any and all opportunities to add another piece to their structure. There are several desiderata for working in this "opportunistic" mode of hypothesis formation: (1) the inference making rules and the strategies for their deployment should be separated from one another, (2) the rules should be separated from the mechanics of the program in which they are embedded, and (3) the representation of the hypothesis space should be compatible with the various kinds of hypothesis generating rules available. (The hypothesis structure represents an a priori established plan for problem solving.)
The modularity of such a system allows users to add or change rules for manipulating the data base, as well as to investigate different solution strategies, without having to make major modifications to the system. These issues, which have general applicability beyond the specific task domain of protein crystallography, are discussed further in Appendix A.

The formal and informal procedures which comprise our knowledge sources are expressed as rules, as in the DENDRAL, HASP, and MYCIN systems. These rules are collected into sets, each set being appropriate for use on a particular class of events. The events generally reflect the level on which the inference is being made, which in turn reflects the level of detail of the model. The correspondence between event classes and rule sets is established by another set of rules, the event rules. The event rules thus form a second layer of rules which direct the system's choice of knowledge sources for a given event, reflecting the system's knowledge of what it knows. (A similar set of rules, the job rules, performs the same role when the system operates in goal-driven mode.) Maintaining the rule-based structure affords flexibility in choosing different combinations of knowledge sources to work together, without having to make any changes in the knowledge sources themselves. Thus yet a higher level knowledge source, the strategy rules, can manipulate the events in order to choose the combination of KSs suited to a particular stage or state of the solution hypothesis. Examples of strategy rules are the merging of two or more events to produce a new event that can lead to further progress, or shifting from event-driven to goal-driven mode.

We thus have a completely rule-based control structure, employing three distinct levels of rules (or knowledge): the specialists, commonly called the knowledge sources; the event processing rules (or job processing rules), representing knowledge about the capabilities of the specialists; and the strategy rules, which know when to use all available knowledge to solve the problem. Although this pyramidal structure of rules and meta-rules could continue indefinitely, the flexibility of knowledge deployment offered by our three-tiered system would appear to be sufficient for this problem solving system. Similar ideas in a simpler context have been explored by Davis (HPP-76-7) for the MYCIN system. A sketch of this three-tiered control appears below.
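The following is a minimal sketch of the three-tiered, completely rule-based control structure just described: knowledge-source rules post events, event rules map event classes to rule sets, and strategy rules manipulate the event list itself. The dispatch loop and the rule formats are illustrative assumptions, not the HASP or crystallography code.

    # Sketch: three levels of rules driving an event-based monitor.
    def run(event_list, event_rules, strategy_rules, hypothesis):
        """event_rules: event class -> list of rule sets (knowledge sources).
        strategy_rules: functions that may merge or reorder events.
        Each KS rule is {'situation': pred, 'action': fn -> new events}."""
        while event_list:
            # Strategy rules manipulate the events themselves, e.g. by
            # merging two events or switching the processing mode.
            for strategy in strategy_rules:
                event_list = strategy(event_list)
            event = event_list.pop(0)
            # Event rules select the knowledge sources for this event class.
            for ks_set in event_rules.get(event['class'], []):
                for rule in ks_set:
                    if rule['situation'](hypothesis, event):
                        new_events = rule['action'](hypothesis, event)
                        event_list.extend(new_events)   # KSs post new events
        return hypothesis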
Research Plan: We propose the following tasks and dates of completion:

1. Continue the implementation of the rule-based image interpretation system in the crystallographic context. Sub-tasks include:
a. Implementation of additional knowledge sources for matching templates of predicted sub-structures with primary and/or processed image data. (3/78)
b. Partitioning of the KSs into functionally related sets of rules. (3/78)
c. Development and implementation of new strategy rules for directing the utilization of the KSs. (6/78)

2. Implement an explanation subsystem, using the history list feature presently incorporated, which interfaces conveniently with the user. (6/78)

3. Develop contacts with at least one IPTO Image Understanding group, in order to understand their particular problems and associated technical issues. (9/77)

4. Select one of the Image Understanding groups for collaboration, and begin working with it, in order to orient our prototype development towards its particular needs. (6/78)

5. Begin the transfer of ideas and software from the prototype system to the collaborating Image Understanding project. (6/79)

B. Distributed Sensor/Computer Networks

We propose a two-year effort of relatively small size to study issues in signal interpretation, artificial intelligence methods, and computer organization raised by the IPTO initiative in Distributed Sensor/Computer Networks. The general problem to be studied is an inference problem: interpreting and integrating the sensory information being received by a geographically distributed array of sensors/computers for which there is no designated central processing agent. Programs resident at each local sensor/computer would compute a "local" understanding, communicating it to neighbors as appropriate and calling upon neighboring computational assistance as needed. The local sensor/computer stations may be integrating more than one type of instrumental data. Our experience with this type of problem arises partly from our work on the geographically distributed, multi-sensor (but centrally processed) HASP task, and partly from our planning for multi-instrument data interpretation in analytic chemistry (arising from the work on DENDRAL, and currently under funding consideration by NIH).

The problem described above is inextricably tied up with two other problems, only one of which is being studied elsewhere in the program. The first is that of conceiving how the methods of artificial intelligence research that are currently well understood can be mapped into a highly-parallel asynchronous computing environment. Attempts will be made to study this problem for (at least) heuristic search methods, rule-based inference methods, and graph matching. The second is that of conceiving a flexible enough architecture of small computers and communications to support the requirements of the AI methods, especially in their application to the interpretation of geographically distributed signals. We have begun initial and relatively abstract studies of both of these problems.

We define "distributed processing" to mean processing that involves one or both of the following attributes: physical decomposition of the processor into relatively independent processor nodes, and functional decomposition of the top level task into relatively independent subtasks. There are several applications, particularly distributed sensor networks and distributed command and control systems, in which the confluence of distributed processing and AI techniques appears attractive. We propose to study and develop techniques for symbolic inference in a distributed processing environment, and to discover the constraints (if any) placed on a distributed processor architecture by these applications.

These techniques are based on an emerging formalism we have been developing, which we call the CONTRACT NET. In a contract net, individual tasks are dealt with as contracts that exist between pairs of processor nodes that take on the roles of contract manager and contractor respectively. In any such system with distributed executive control, there must be a method for choosing the small number of tasks that can be processed at a given time instant from the large number generally available. To this end, we wish to explore the concept of a "priority description", which is intended to offer a more detailed specification of task characteristics and importance than the usual integer ratings. A sketch of a single contract cycle follows.
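Here is a minimal sketch of one contract cycle in simulation: a manager node announces a task, candidate contractor nodes submit bids, and the manager awards the contract to the best bid. The message formats, the bid evaluation, and the sample priority description are illustrative assumptions about the emerging formalism, not its settled design.

    # Sketch: one contract cycle in a simulated contract net.
    class Node:
        def __init__(self, name, eligible, rate_task):
            self.name = name
            self.eligible = eligible     # eligible(announcement) -> bool
            self.rate_task = rate_task   # rate_task(announcement) -> bid value

        def bid(self, announcement):
            if self.eligible(announcement):
                return {'bidder': self, 'value': self.rate_task(announcement)}
            return None                  # node declines to bid

    def award_contract(manager_name, announcement, nodes):
        """Announce a task, collect bids, and award to the best bidder."""
        bids = []
        for node in nodes:
            b = node.bid(announcement)
            if b is not None:
                bids.append(b)
        if not bids:
            return None                  # no eligible contractor responded
        best = max(bids, key=lambda b: b['value'])
        return {'manager': manager_name,
                'contractor': best['bidder'].name,
                'task': announcement['task']}

    # A "priority description" could replace a single integer rating:
    announcement = {'task': 'interpret-sensor-segment',
                    'priority': {'deadline': 'tight', 'importance': 'high'}}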
Our approach to the architectural problem is toward a much looser coupling between the computers of the net (i.e., much more local processing, much less communication) than is implicit in the design of the Cm* machine architecture being studied at Carnegie-Mellon. A possible and important by-product of this study could be a computational route to extreme speed-ups in symbolic computation, arising from the collective processing activity of large numbers of inexpensive microcomputers. The slow speed of symbolic computation, in contrast to numeric calculation, has been a major hurdle in the development of AI science and AI applications.

Joining us in these studies have been a member of the staff of the Canadian Defense Research Establishment Atlantic (a Stanford student) and a staff member of the Norwegian Defense Research Establishment (a Stanford post-doctoral Fellow). We have had some indications of possible industry support for the research on distributed architectures.

We propose three tasks:

1. Interact with the other investigators in the IPTO Distributed Sensor Network project to understand and help to further refine the scientific and practical problems therein, and produce a report (or series of reports) on the methods available from artificial intelligence research that might be applicable to the development of a total system. Our methodology will be computer simulation (not hardware building). The sense data interpretation problem that will give specificity to our study will be an image data interpretation problem, probably the same one discussed in (a.) above (the crystallographic image data interpretation problem), for reasons of economy of effort and funds.

2. Prepare a report on the realization of at least two of the well-understood AI methods in a distributed computing environment. Again, the methodology will be computer simulation, but the tasks chosen for study will range more broadly than just signal data interpretation.

3. Prepare report(s) on studies of various highly parallel asynchronous architectures suited to symbolic computation. The methodology will involve both abstract analytic studies and computer simulation. We do not at this time intend to construct any network until we understand the issues from a theoretical point of view.

Research Plan:

1. Complete the design of the CONTRACT NET formalism for expressing control of problem solving in a distributed network, and incorporate the completed design into a simulation program.
2. Report on the variations in control that can be achieved through the use of priority descriptions.
3. Expand the simulator to allow exploration of the suitability of different processor node interconnection alternatives for a distributed processor (6/78).
4. Test the CONTRACT NET formalism on a variety of simple yet representative problems, using the simulator, and report results.
5. Report on the problems raised by the requirement for global information in a distributed network in which there is limited memory at each processor node (6/79).

Appendix A. Extended Discussion of Proposed Rule-based System for Image Understanding

In this appendix we amplify some issues of knowledge representation and utilization in the context of the proposed 3-D image understanding research.
Representation of Knowledge in the Image Interpretation System

The problem of representing a large and diverse body of knowledge, in a form which will allow it to be used cooperatively and efficiently in the search for plausible hypotheses, is of central concern to this research. The system currently under development draws upon many concepts which have emerged in the design of other large knowledge-based systems, e.g., the use of production rules (as in DENDRAL, HASP, etc.) and a multi-level space of hypotheses (as in Hearsay-II).

Knowledge consists of facts, algorithms, and heuristics (rules of good guessing). The facts required for protein structure inference are general physical, chemical, stereochemical, and crystallographic constraints. Typical factual knowledge stored in the system includes the molecular structure and chemical properties of the twenty amino acid building blocks. These facts are encoded in tables or in property lists attached to specific structural entities. Algorithms and heuristics comprise the formal and informal knowledge which generate and/or verify hypothesis elements. We have been guided by two general principles in the representation of the knowledge sources: 1) decompose identifiable areas of knowledge into elementary units, each of which increments the hypothesis when specified preconditions are met; 2) represent the elementary units as situation-action rules.

To illustrate, consider the relatively simple example of heavy atom location. This subproblem is decomposed into two independent parts: 1) inferring the presence of heavy atoms and 2) determining their spatial locations. These two independent parts are represented as two separate KSs, invoked under different conditions. A rule contained in the first KS is:

IF the composition list contains a cofactor of type heme,
THEN: 1) create a superatom node of type heme in the model plane,
2) create an atom node of type iron in the model plane,
3) create membership links between the iron and the heme,
4) put "cofactor-posited" on the event-list,
5) put "heavyatom-posited" on the event-list.

Note that several actions may be performed for a given situation. An action may itself be a situation-action rule, and may be iterative. Also note that at least one of the actions of each rule is to place a token on an event-list. The event-list is used by the interpreter, discussed in the next section, to determine what to do next, i.e., which set of knowledge sources will be invoked after the current event has been processed.
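For illustration, the heme rule above might be encoded as a situation-action pair in roughly the following way. The node-creation and linking helpers, and the field names, are hypothetical stand-ins for the actual system's primitives.

    # Sketch: the heme rule above as a situation-action rule.
    def heme_situation(model, data):
        return 'heme' in data['composition_list']

    def heme_action(model, event_list):
        # Create the posited structural entities in the model plane.
        heme = model.create_node(plane='model', kind='superatom', type='heme')
        iron = model.create_node(plane='model', kind='atom', type='iron')
        model.link(iron, heme, label='membership')
        # Tokens on the event-list drive the monitor's choice of the
        # next knowledge sources to invoke.
        event_list.append('cofactor-posited')
        event_list.append('heavyatom-posited')

    heme_rule = {'situation': heme_situation, 'action': heme_action}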
Control Structure for the Image Interpretation System

There are several choices of control structure faced by the designer of a knowledge-based system. Basically the choices are among points on a spectrum, at the extremes of which are goal-driven and event-driven systems. In a goal-driven system (of which MYCIN is a well-known example (Shortliffe, HPP-74-2)) the rule interpreter selects a rule which concludes with the goal being sought. In our system, we might imagine having a goal rule such as the following:

IF 1) the topological description is complete, and
2) the coordinates of all atoms in the structure are assigned, and
3) the structure satisfies stereochemical constraints, and
4) the structure is consistent with the electron density map, and
5) the structure is consistent with auxiliary chemical data,
THEN: signify that a model has been completed.

The interpreter would then attempt to verify each of the premises in the goal rule. To do that, other rules would be selected whose conclusions (the right-hand sides) verify the premises under consideration, and the interpreter would attempt to verify the premises of these rules, and so on, working through the list of rules in this recursive fashion. The program's focus of attention is determined by the current rule whose premises are being evaluated. Many levels of recursion may occur before a rule is reached which is relevant to the current state of the system. A goal-driven monitor is attractive in that it pursues a logical chain of reasoning, in which the purpose of each move is clearly revealed by the tree of subgoals. A vital requirement for the success of a goal-driven control structure is the relative independence of the subgoals. Although some aspects of the protein image interpretation problem can be solved in this fashion, the independence requirement is not generally satisfied. For example, the location of each atom in the molecule must obey stereochemical constraints with respect to neighboring atoms, so that one cannot simply determine each atom's position independently.

An alternate way to focus attention is to employ an event-driven control structure. In this scheme the current state of the hypothesis space determines what to do next. The monitor continually refers to a list of current events -- the event-list mentioned in the rules discussed above -- which is used to trigger those knowledge sources most likely to make further headway. As a knowledge source makes a change in the current hypothesis, it also places a symbol on the event-list to signify the type of change made. Thus, as events are drawn from the event-list for processing, new events are added, so that under normal conditions the monitor always has a means for choosing its next move. An explanation capability is provided by keeping a list of the events processed, their predecessor and successor events, and the KSs invoked.

The system we are currently developing operates in both goal-driven and event-driven modes (with an emphasis on the latter), and is closely related to the HASP system design. The normal iterative cycle of problem solving uses the event-list to trigger knowledge sources, which create or change hypothesis elements and place new events on the event-list. Thus the system's behavior is "opportunistic" in that it is guided primarily by what was most recently discovered, rather than by a necessity to satisfy subgoals. The choice of an event-driven control structure as the primary mode of operation is based partly on subgoal interdependence, partly on efficiency in selecting appropriate knowledge sources, and partly on conformity with the structure modeling process normally employed by protein crystallographers.

BIBLIOGRAPHY

Heuristic Programming Project Memos Since 1964

HPP-64-1 J. Lederberg, "DENDRAL-64 -- A System for Computer Construction, Enumeration and Notation of Organic Molecules as Tree Structures and Cyclic Graphs", (technical reports to NASA, also available from the author and summarized in (12)).
(1a) Part I. Notational algorithm for tree structures (1964) CR.57029
(1b) Part II. Topology of cyclic graphs (1965) CR.68898
(1c) Part III. Complete chemical graphs; embedding rings in trees (1969)

HPP-64-2 J. Lederberg, "Computation of Molecular Formulas for Mass Spectrometry", Holden-Day, Inc. (1964).

HPP-65-1 J. Lederberg, "Topological Mapping of Organic Molecules", Proc. Nat. Acad. Sci., 53:1, January 1965, pp. 134-139.
HPP-65-2  J. Lederberg, "Systematics of Organic Molecules, Graph Topology and Hamilton Circuits. A General Outline of the DENDRAL System", NASA CR-48899 (1965).

HPP-65-3  AIM-30, AD785056. Edward A. Feigenbaum, Richard W. Watson, "An Initial Problem Statement for a Machine Induction Research Project", (April 1965).

HPP-66-1  AIM-38, AD785066. Donald A. Waterman, "A Filter for a Machine Induction System", (January 1966).

HPP-66-2  AIM-46, CS-50, PB176761. Staffan Persson, "Some Sequence Extrapolating Programs: a Study of Representation and Modeling in Inquiring Systems", Ph.D. Thesis in Computer Science, (U.C. Berkeley, September 1966).

HPP-66-3  AIM-47. Bruce Buchanan, "Logics of Scientific Discovery", Ph.D. Thesis in Philosophy, (Michigan State University, December 1966).

HPP-67-1  AIM-49. Georgia Sutherland, "DENDRAL -- a Computer Program for Generating and Filtering Chemical Structures", (February 1967).

HPP-67-2  J. Lederberg, "Hamilton Circuits of Convex Trivalent Polyhedra (up to 18 Vertices)", Am. Math. Monthly, (May 1967).

HPP-67-3  AIM-54. Joshua Lederberg, Edward A. Feigenbaum, "Mechanization of Inductive Inference in Organic Chemistry", (August 1967). Also in Formal Representation of Human Judgment, B. Kleinmuntz, Editor, Wiley, 1968.

HPP-68-1  J. Lederberg, "Online Computation of Molecular Formulas from Mass Number", NASA CR-94977 (1968).

HPP-68-2  AIM-62. Bruce G. Buchanan, Georgia Sutherland, E.A. Feigenbaum, "Heuristic DENDRAL: a Program for Generating Explanatory Hypotheses in Organic Chemistry", (July 1968). Also in Machine Intelligence 4, Edinburgh Univ. Press, pp. 209-254, 1969.

HPP-68-3  AIM-67, AD680487. Edward A. Feigenbaum, "Artificial Intelligence: Themes in the Second Decade", Proceedings of IFIP International Congress, Edinburgh, (August 1968).

HPP-68-4  AIM-74, CS-118, AD681027. Donald Waterman, "Machine Learning of Heuristics", Ph.D. Thesis in Computer Science, (Stanford University, December 1968).

HPP-68-5  E.A. Feigenbaum and B.G. Buchanan, "Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry", in Proceedings, Hawaii International Conference on System Sciences, B.K. Kinariwala and F.F. Kuo (eds.), University of Hawaii Press, 1968.

HPP-69-1  AIM-80, AD685612. Georgia Sutherland, "Heuristic DENDRAL: a Family of LISP Programs", (March 1969).

HPP-69-2  AIM-99. Bruce G. Buchanan, G.L. Sutherland, E.A. Feigenbaum, "Toward an Understanding of Information Processes of Scientific Inference in the Context of Organic Chemistry", (September 1969). Also in Machine Intelligence 5, Edinburgh Univ. Press, 1970.

HPP-69-3  AIM-102. Donald A. Waterman, "Generalization Learning for Automating the Learning of Heuristics", Artificial Intelligence, Vol. 1, No. 1/2 (July 1969).

HPP-69-4  AIM-104. Joshua Lederberg, Georgia Sutherland, Bruce G. Buchanan, Edward A. Feigenbaum, "A Heuristic Program for Solving a Scientific Inference Problem: Summary of Motivation and Implementation", (November 1969). In Proceedings of IV Systems Symposium, Case Western Reserve Univ., Springer-Verlag, 1970.

HPP-69-5  J. Lederberg, "Topology of Molecules", in The Mathematical Sciences - A Collection of Essays, (ed.) Committee on Support of Research in the Mathematical Sciences (COSRIMS), National Academy of Sciences - National Research Council, M.I.T. Press, (1969).

HPP-69-6  J. Lederberg, G.L. Sutherland, B.G. Buchanan, E.A. Feigenbaum, A.V. Robertson, A.M. Duffield, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference I.
The Number of Possible Organic Compounds: Acyclic Structures Containing C, H, O and N", Journal of the American Chemical Society, 91 (1969).

HPP-69-7  A.M. Duffield, A.V. Robertson, C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum, and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference II. Interpretation of Low Resolution Mass Spectra of Ketones", Journal of the American Chemical Society, 91:11 (May 21, 1969).

HPP-69-8  C.W. Churchman and B.G. Buchanan, "On Design of Inductive Systems: Some Philosophical Problems", British Journal for the Philosophy of Science, 20 (1969).

HPP-69-9  G. Schroll, A.M. Duffield, C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum, and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference III. Aliphatic Ethers Diagnosed by Their Low Resolution Mass Spectra and NMR Data", Journal of the American Chemical Society, 91 (1969).

HPP-70-1  A. Buchs, A.B. Delfino, C. Djerassi, A.M. Duffield, B.G. Buchanan, E.A. Feigenbaum, J. Lederberg and G. Schroll, "Applications of Artificial Intelligence for Chemical Inference IV. Saturated Amines Diagnosed by Their Low Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra", Journal of the American Chemical Society, 92 (1970).

HPP-70-2  AIM-123. Bruce G. Buchanan, Thomas E. Headrick, "Some Speculation About Artificial Intelligence and Legal Reasoning", (May 1970). Stanford Law Review, Fall 1970.

HPP-70-3  Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M. Duffield, C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference V. An Approach to the Computer Generation of Cyclic Structures. Differentiation Between All the Possible Isomeric Ketones of Composition C6H10O", Organic Mass Spectrometry, 4, 493 (1970).

HPP-70-4  A. Buchs, A.B. Delfino, A.M. Duffield, C. Djerassi, B.G. Buchanan, E.A. Feigenbaum and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference VI. Approach to a General Method of Interpreting Low Resolution Mass Spectra with a Computer", Helvetica Chimica Acta, 53, 1394 (1970).

HPP-70-5  AIM-131, CS-176, AD715128. Edward A. Feigenbaum, Bruce G. Buchanan, Joshua Lederberg, "On Generality and Problem Solving: a Case Study Using the DENDRAL Program", (August 1970). Also in Machine Intelligence 6, Edinburgh Univ. Press, 1971.

HPP-71-1  A. Buchs, A.B. Delfino, C. Djerassi, A.M. Duffield, B.G. Buchanan, E.A. Feigenbaum, J. Lederberg, G. Schroll, and G.L. Sutherland, "Applications of Artificial Intelligence for Chemical Inference VII. The Application of Artificial Intelligence in the Interpretation of Low-Resolution Mass Spectra", Advances in Mass Spectrometry, 5, 314 (1971).

HPP-71-2  AIM-141, CS-205, AD730506. Bruce G. Buchanan, Joshua Lederberg, "The Heuristic DENDRAL Program for Explaining Data", in Proceedings of the IFIP Congress 71, Ljubljana, Yugoslavia (February 1971).

HPP-71-3  AIM-147, CS-216, AD732457. Robert E. Kling, "Reasoning by Analogy with Applications to Heuristic Problem Solving: a Case Study", Ph.D. Thesis in Computer Science, (August 1971).

HPP-71-4  AIM-141. B.G. Buchanan, E.A. Feigenbaum, and J. Lederberg, "A Heuristic Programming Study of Theory Formation in Science", in Proceedings of the Second International Joint Conference on Artificial Intelligence, Imperial College, London (September 1971).

HPP-71-5  B.G. Buchanan, A.M. Duffield, A.V.
Robertson, "An Application of Artificial Intelligence to the Interpretation of Mass Spectra", Mass Spectrometry Techniques and Appliances, G.W.A. Milne, Ed., John Wiley & Sons, Inc., p. 121, (1971). 5 HPP-72-1 D.H. Smith, B.G. Buchanan, R.S. Engelmore, A.M. Duffield, A. Yeo, E.A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference. VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids", Journal of the American Chemical Society, 94, 5962 (1972). HPP-72-2 B.G. Buchanan, E.A. Feigenbaum, and N.S. Sridharan, "Heuristic Theory Formatlon: Data Interpretation and Rule Formation". In Machine Intelligence 7, Edinburgh University Press (1972). HPP-72-3 AIM-181, CS-325 Bruce G. Buchanan, "Review of Hubert Dreyfus' 'What Computers Can't Do': a Critique of Artificial Reason", (December 1972). Computing Reviews (January, 1973). Also in Computing Reviews, January 1973. HPP-72-4 J. Lederberg, "Rapid Calculation of Molecular Formulas from Mass Values". Journal of Chemical Education, 49, 613, (1972). HPP-72-5 Joshua Lederberg, "Use of a Computer to Identify Unknown Compounds: The Automation of Scientific Inference", Biochemical Applications of Mass Spectrometry, (1972). HPP-72-6 STAN-CS-318 H. Brown, L. Masinter, and L. Hjelmeland, "Constructive Graph Labeling Using Double Cosets". Discrete Mathematics, 7, 1 (1974). HPP-73-1 D.H. Smith, B.G. Buchanan, R.S. Engelmore, H. Aldercreutz and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference IX. "Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens". Journal of the American Chemical Society, 95, 6078 (1973). HPP-73-2 D.H. Smith, B.G. Buchanan, W.C. White, E.A. Feigenbaum, C. Djerassi and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference X. "INTSU!I. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids". Tetrahedron, 29, 3117, (1973). 6 HPP-73-3 AIM-205, CS-370, AD764288 N.S. Sridharan, et.al, "A Heuristic Program to Discover Syntheses for Complex Organic Molecules", (June 1973). HPP-73-4 STAN-CS-73-381 N.S. Sridharan, "Computer Generation of Vertex Graphs", (July 1973). HPP-73-5 R. Carhart and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference XI: The Analysis of Cl3 NMR Data for Structure Elucidation of Acyclic Amines", Journal of the American Chemical Society (Perkin II), 1753, (1973). HPP-73-6 AIM-215, CS-387, AD769380 B.G. Buchanan and N.S. Sridharan, "Analysis of Behavior of Chemical Molecules: Rule Formation on Non-Homogeneous Classes of Objects". In proceedings of the Third International Joint Conference on Artificial Intelligence, (August 1973). Also in Machine Intelligence-7, Edinburgh Univ. Press, 1974. HPP-73-7 Il. Michie and B.G. Buchanan, "Current Status of the Heuristic DENDRAL Program for Applying Artificial Intelligence to the Interpretation of Mass Spectra", in "Computers for Spectroscopy", [Also: University fo Edinburgh, School of Artificial Intelligence, Experimental Programming Report No. 32 (1973).1 HPP-73-8 AIM-217, CS-391, AD770610 N.S. Sridaran "Search Strategies for the Task of Organic Chemical Synthesis", (August 1973). HPP-73-9 E.H. Shortliffe, S.G. Axline, B.G. Buchanan, T.C. Merigan, S.N. Cohen, "An Artificial Intelligence Program to Advise Physicians Regarding Antimicrobial Therapy", Computers and Biomedical Research, 6:544-560 (1973). 
HPP-73-10  AIM-216, CS-389, AD71299. Larry Masinter, N.S. Sridharan, J. Lederberg, D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference XII. Exhaustive Generation of Cyclic and Acyclic Isomers", (September 1973). Also in J. Am. Chem. Soc., Vol. 96, No. 25, December 11, 1974, pp. 7702-7714.

HPP-74-1  E.H. Shortliffe, S.G. Axline, B.G. Buchanan, S.N. Cohen, "Design Considerations for a Program to Provide Consultations in Clinical Therapeutics", Proceedings of San Diego Biomedical Symposium 1974, (February 6-8, 1974).

HPP-74-2  AIM-251, CS-465, ADA001373. Edward H. Shortliffe, "MYCIN: A Rule-Based Computer Program for Advising Physicians Regarding Antimicrobial Therapy Selection", Ph.D. Thesis in Medical Information Sciences, (October 1974).

HPP-74-3  B.G. Buchanan, "Scientific Theory Formation by Computer", in Proceedings of NATO Advanced Study Institute on Computer Oriented Learning Processes, Bonas, France, (1974).

HPP-74-4  E.A. Feigenbaum, "Computer Applications: Introductory Remarks", in Proceedings of Federation of American Societies for Experimental Biology, 33, 2331 (1974).

HPP-74-5  L.M. Masinter, N.S. Sridharan, R.E. Carhart and D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference XIII. Labeling of Objects Having Symmetry", Journal of the American Chemical Society, 96, 7714 (1974).

HPP-74-6  D. Michie and B.G. Buchanan, "The Scientist's Apprentice", in Computers for Spectroscopy (ed. R.A.G. Carrington), London: Adam Hilger, 1974.

HPP-75-1  E.H. Shortliffe, B.G. Buchanan, "A Model of Inexact Reasoning in Medicine", Mathematical Biosciences, 23:351-379 (1975).

HPP-75-2  E.H. Shortliffe, R. Davis, S.G. Axline, B.G. Buchanan, C.C. Green, S.N. Cohen, "Computer-Based Consultations in Clinical Therapeutics: Explanation and Rule Acquisition Capabilities of the MYCIN System", Computers and Biomedical Research, 8:303-320 (August 1975).

HPP-75-3  E.H. Shortliffe, F.S. Rhame, S.G. Axline, S.N. Cohen, B.G. Buchanan, R. Davis, A.C. Scott, R. Chavez-Pardo, and W.J. Van Melle, "MYCIN: A Computer Program Providing Antimicrobial Therapy Recommendations", Clinical Medicine, (August 1975).

HPP-75-4  E.H. Shortliffe, "Judgmental Knowledge as a Basis for Computer-Assisted Clinical Decision Making", Proceedings of the 1975 International Conference on Cybernetics and Society, (September 1975).

HPP-75-5  E.H. Shortliffe, S. Axline, B.G. Buchanan, R. Davis, S. Cohen, "A Computer-Based Approach to the Promotion of Rational Clinical Use of Antimicrobials", International Symposium on Clinical Pharmacy and Clinical Pharmacology, Boston, Mass. (September 1975). Also in Clinical Pharmacy and Clinical Pharmacology, William A. Gouveia, Gianni Tognoni, Eppo van der Kleijn, Editors, Elsevier/North-Holland Publishing Company, pp. 259-274, (1976).

HPP-75-6  AIM-266, CS-517, ADA019641. Randall Davis, Bruce Buchanan, Edward Shortliffe, "Production Rules as a Representation of a Knowledge-Based Consultation Program", (October 1975).

HPP-75-7  AIM-271, CS-524, ADA019702. Randall Davis, Jonathan King, "An Overview of Production Systems", in Machine Representations of Knowledge, Dordrecht: D. Reidel Publ. Co., 1976. Proceedings of NATO ASI Conference, (October 1975).

HPP-75-8  R.G. Dromey, B.G. Buchanan, J. Lederberg and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference XIV. A General Method for Predicting Molecular Ions in Mass Spectra", Journal of Organic Chemistry, 40, 770 (1975).

HPP-75-9  D.H.
Smith, "Applications of Artificial Intelligence for Chemical Inference. I07. Constructive Graph Labelling Applied to Chemical Problems. Chlorinated Hydrocarbons". Analytical Chemistry, 47, 1176 (1975). HPP-75-10 R.E. Carhart, D.H. Smith, H. Brown and N.S. Sridharan, "Applications of Artificial Intelligence for Chemical Inference XVI. Computer Generation of Vertex Graphs and Ring Systems", Journal of Chemical Information and Computer Science, 15, 124 (1975). 9 HPP-75-11 R.E. Carhart, D.H. Smith, H. Brown and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference XVII. An Approach to Computer-Assisted Elucidation of Molecular Structure", Journal of the American Chemical Society, 97, 5755 (1975). HPP-75-12 B.G. Buchanan, "Applications of Artificial Intelligence to Scientific Reasoning", In Proceedings of Second USA-Japan Computer Conference, American Federation of Information Processing Societies Press, (August 1975). HPP-75-13 R.E. Carhart, S.M. Johnson, D.H. Smith, B.G. Buchanan, R.G Dromey, J. Lederberg, "Networking and a Collaborative Research Community: A Case Study Using the DENDRAL Program", in "Computer Networking and Chemistry", P. Lykos, Ed., American Chemistry Society, Washington, D.C., (1975). HPP-75-14 D.H. Smith, "The Scope of Structural Isomerism", Artificial Intelligence for Chemical Inference. XVIII. Journal of Chemical Information and Computer Sciences, 15, 203 (1975). HPP-75-15 E.H. Shortliffe, R. Davis, "Some Considerations for the Implementation of Knowledge-Based Expert Systems", SIGART Newsletter, 55:9-12, (December 1975). HPP-76-1 D.H. Smith, J.P. Konopelski and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference. XIX. Computer Generation of Ion Structures", Organic Mass Spectrometry, 11, 86, (1976). HPP-76-2 Raymond E. Carhart and Dennis H. Smith, "Applications of Artificial Intelligence for Chemical Inference XX. Intelligent Use of Constraints in Computer-Assisted Structure Elucidation". (1976). HPP-76-3 C.J. Cheer, D.H. Smith, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference XXI. Chemical Studies of Marine Interbrates - XVII. The Computer-Assisted Identification of [+I-Palostrol in the Marine Organism Cespitularia sp., aff. subviridis". Tetrahedron. Vol. 32, pp. 1807 to 1810, Pergamon Press, (1976). 10 HPP-76-4 B.G. Buchanan, D.H. Smith, W.C. White, R.J. Gritter, E.A. Feigenbaum, J. Lederberg, and Carl Djerassi, "Application of Artificial Intelligence for Chemical Inference XXII. Automatic Rule Formation in Mass Spectrometry by Means of the Meta-DENDRAL Program", Journal of the American Chemical Society, 98, 6168 (1976). HPP-76-5 T.H. Varkony, R.E. Carhart and D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference XXIII. Computer-Assisted Structure Elucidation. Modelling Chemical Reaction Sequences Used in Molecular Structure Problemsl,2". (1976). HPP-76-6 D.H. Smith and R.E. Carhart "Applications of Artificial Intelligence for Chemical Inference XXIV. Structural Isomerism of Mono and Sesquiterpenoid Skeletons 1,2-n, Tetrahedron, Vol. 32, pp. 2513 to 2519, Pergamon Press (May 1976). HPP-76-7 AIM-283, CS-552 Randall Davis, "Applications of Meta Level Knowledge to theConstruction, Maintenance and Use of Large Knowledge Bases", Thesis: Ph.D. in Computer Science, (July 1976). HPP-76-8 AIM-286, CS-570 Douglas Lenat, "AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search", Thesis: Ph.D. in Computer Science, (July 1976). 
HPP-76-9  AIM-291, CS-577. Bruce G. Buchanan, Joshua Lederberg, John McCarthy, "Three Reviews of J. Weizenbaum's Computer Power and Human Reason", (November 1976).

HPP-76-10  Bruce G. Buchanan and Dennis Smith, "Computer Assisted Chemical Reasoning", in Proceedings of the III International Conference on Computers in Chemical Research, Education and Technology, Plenum Publishing, (1976).

HPP-77-1  STAN-CS-77-593. A.C. Scott, W. Clancey, R. Davis, E.H. Shortliffe, "Explanation Capabilities of Knowledge-Based Production Systems", American Journal of Computational Linguistics, Microfiche 62, Knowledge-Based Consultation Systems, (1977).

HPP-77-2  STAN-CS-77-589. Robert S. Engelmore and H. Penny Nii, "A Knowledge-Based System for the Interpretation of Protein X-Ray Crystallographic Data", (January 1977).

HPP-77-3  B.G. Buchanan, R. Davis, V. Yu and S. Cohen, "Rule Based Medical Decision Making by Computer", (submitted to MEDINFO 77, 1977).

HPP-77-4  T.M. Mitchell and G.M. Schwenzer, "Applications of Artificial Intelligence for Chemical Inference XXV. A Computer Program for Automated Empirical 13C NMR Rule Formation", (submitted to JACS, January 1977).

HPP-77-5  STAN-CS-77-596. Mark Stefik and Nancy Martin, "A Review of Knowledge-Based Systems as a Basis for a Genetics Experiment Designing System", (February 1977).

HPP-77-6  STAN-CS-77-597. Bruce G. Buchanan and Tom Mitchell, "Model-Directed Learning of Production Rules", submitted to the Proceedings of the Workshop on Pattern-Directed Inference Systems, Hawaii, (February 1977).

HPP-77-7  H. Penny Nii and Edward A. Feigenbaum, "Rule-based Understanding of Signals", to be presented at the Workshop on Pattern-Directed Inference Systems, (May 1977).

HPP-77-8  R. Davis, "Knowledge Acquisition in Rule-based Systems: Knowledge about Representations as a Basis for System Construction and Maintenance", submitted to the Pattern-Directed Inference Conference, (May 1977).

HPP-77-9  R. Davis, "Interactive Transfer of Expertise I: Acquisition of New Inference Rules", submitted to Fifth IJCAI, (August 1977).

HPP-77-10  R. Davis, "A Decision Support System for Medical Diagnosis and Therapy Selection", Data Base (SIGBDP Newsletter), 8, pp. 58-72, (Winter 1977).

HPP-77-11  Dennis H. Smith and Raymond E. Carhart, "Structure Elucidation Based on Computer Analysis of High and Low Resolution Mass Spectral Data", Proceedings of the Symposium on Chemical Applications of High Performance Mass Spectrometry, University of Nebraska, Lincoln, (1976, in press).
Budget

Heuristic Programming Project
Proposal for Renewal of Contract DAHC 15-73-C-0435
July 1, 1977 - June 30, 1979

                                           7-1-77/     7-1-78/     7-1-77/
                                           6-30-78     6-30-79     6-30-79
Salaries

Faculty
  Feigenbaum, Edward A., Professor
    (33%, Summer Quarter, 1977; 17%, Academic Year, 1977-78;
     100%, Summer Quarter, 1978; 18%, Academic Year, 1978-79)
  Buchanan, Bruce G., Adjunct Professor (50%, Calendar Year)
  Michie, Donald, Visiting Professor in Artificial Intelligence
    (38%, One Quarter)

Research Staff
  Brown, Dennis, Research Associate (100%, Summer Quarter)
  Creary, Lewis G., Research Associate (100%, Summer Quarter)
  Davis, Randall (or replacement), Research Associate (50%, Calendar Year)
  Engelmore, Robert, Research Associate (50%, Calendar Year)
  Research Associate (to be named) (100%, Calendar Year)
  Nii, H. Penny, Scientific Programmer (100%, Calendar Year)
  White, William C., Programmer (50%, Calendar Year)
  Five Student Research Assistants (to be named)
    (50%, Academic Year; 100%, Summer Quarter)
  Secretarial and Clerical Assistance (100%, Calendar Year)

Individual salary figures, as printed: 15,750 16,695 32,445; 4,667 4,947
9,614; 4,375 10,000 15,120; 4,375 10,000 16,027; 31,147; 24,538 19,000
26,010 13,000 50,548; 9,896 10,490 20,386; 34,965 37,065 72,030; 9,600
10,368 19,968; 15,829; 3,750 3,750.

Subtotal, Salaries                         137,000     160,181     297,181

Staff Benefits                              27,126      33,109      60,235
  (19.0%, 9-1-76/8-31-77; 20.0%, 9-1-77/8-31-78; 20.8%, 9-1-78/8-31-79)

Travel                                       7,000       7,100      14,100
  (10 West Coast trips, 6 East Coast trips, 1 Foreign trip)

Expendable Supplies and Materials            3,000       3,300       6,300
  (telephone, postage, office supplies, consulting, etc.)

Publications and Report Costs                6,000       6,500      12,500

Equipment Rental and Maintenance             5,000       5,000      10,000

Equipment Purchase (Terminal Equipment)      7,500      10,000      17,500

Subtotal, Direct Costs                     192,626     225,190     417,816

Indirect Costs, 58% (excluding equipment)  107,374     124,810     232,184

TOTAL BUDGET                               300,000     350,000     650,000
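As a check on the indirect cost computation, the 58% rate applies to direct costs net of equipment purchase: for 1977-78, 0.58 × (192,626 - 7,500) = 107,373, matching the 107,374 shown to within rounding; for 1978-79, 0.58 × (225,190 - 10,000) = 124,810, as shown.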