NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report

Other Potentially Related Research (chapter)
Mary Elizabeth Stevens, National Bureau of Standards

... present time.1/ In terms of the state of the art of automatic indexing, therefore, we shall not consider these approaches as more than indications for future research. A few suggestive examples are discussed briefly below.

The multi-pronged attack on mechanized information selection and retrieval problems headed by Salton and his associates includes the exploration of tree structures to represent both the relationships between terms in a classification schedule or indexing term vocabulary and the results of automatic syntactic analyses of natural-language text. It is proposed, then, that computer programs can transform the syntactic trees representing word strings in the original text into simplified, condensed structures with normalized terms, and can then compare these trees with the classificatory trees (Salton, 1961 [516]). Manipulating such trees together with appropriate dictionaries or thesauri can yield, for a given proposed index term, the preferred term for a particular system, a set of synonymous terms, the sets of all terms in which the given term is included, and the like.

Anger considers some of the problems involved in complete syntactic analysis of texts with the objective of identifying the total network of relationships expressed and implied, as proposed by Lecerf, Ruvinschii, and Leroy, among others, of the Research Group on Automated Scientific Information (GRISA), EURATOM. Assuming that computer programs for syntactic analysis are or will be available, he suggests that simplifications may be obtained by determining only the basic relations indicated by direct syntactic dependencies or by linking words (Anger, 1961 [15]).
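To make the tree manipulation described above concrete, here is a minimal Python sketch under stated assumptions: a toy vocabulary (the `PREFERRED` and `BROADER` tables are hypothetical, not taken from Salton's system) stores each variant term's preferred form and each preferred term's parent in a classification tree. The three lookups mirror the ones listed in the text: the preferred term, the set of synonymous terms, and the terms in which a given term is included, read here as the broader terms above it in the tree. It does not attempt the syntactic-tree transformation or the tree comparison itself.

```python
# Hypothetical toy vocabulary: maps each variant term to its preferred (normalized) form.
PREFERRED = {
    "automatic indexing": "automatic indexing",
    "machine indexing": "automatic indexing",
    "mechanized indexing": "automatic indexing",
    "indexing": "indexing",
    "information retrieval": "information retrieval",
    "document retrieval": "information retrieval",
}

# Hypothetical classification tree: each preferred term points to its broader (parent) term.
BROADER = {
    "automatic indexing": "indexing",
    "indexing": "information retrieval",
}


def preferred_term(term: str) -> str:
    """Return the system's preferred term; unknown terms are returned unchanged."""
    return PREFERRED.get(term, term)


def synonyms(term: str) -> set[str]:
    """Return every variant term that shares the proposed term's preferred form."""
    target = preferred_term(term)
    return {t for t, p in PREFERRED.items() if p == target}


def including_terms(term: str) -> list[str]:
    """Walk up the tree, collecting every broader term that includes the given term."""
    chain = []
    node = BROADER.get(preferred_term(term))
    while node is not None:
        chain.append(node)
        node = BROADER.get(node)
    return chain


if __name__ == "__main__":
    print(preferred_term("machine indexing"))    # automatic indexing
    print(sorted(synonyms("machine indexing")))  # ['automatic indexing', 'machine indexing', 'mechanized indexing']
    print(including_terms("machine indexing"))   # ['indexing', 'information retrieval']
```

Keeping synonyms and hierarchy in separate tables means a variant term is normalized once and then all tree walks operate on preferred terms only.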
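Anger's proposed simplification can likewise be sketched, assuming (as he does) that a syntactic analysis is already available. Here the parse is a hypothetical list of `Token` records, each carrying the index of its governing word, and a small hypothetical `LINKING_WORDS` set of prepositions stands in for "linking words". The function keeps only direct head-dependent pairs and folds each linking word into a (governor, link, object) triple; this illustrates the idea only and is not the GRISA procedure.

```python
from dataclasses import dataclass

# Hypothetical stand-in for "linking words": prepositions that relate their governor to their object.
LINKING_WORDS = {"of", "for", "in", "on", "by", "with"}


@dataclass
class Token:
    index: int  # position in the sentence, starting at 1
    word: str
    head: int   # index of the governing token; 0 marks the root


def basic_relations(tokens: list[Token]) -> list[tuple[str, ...]]:
    """Keep only direct head-dependent pairs, folding linking words into triples."""
    by_index = {t.index: t for t in tokens}
    relations = []
    for tok in tokens:
        if tok.head == 0:
            continue  # the root has no governor
        head = by_index[tok.head]
        if head.word.lower() in LINKING_WORDS and head.head != 0:
            # tok depends on a linking word: relate the linking word's governor to tok
            governor = by_index[head.head]
            relations.append((governor.word, head.word, tok.word))
        elif tok.word.lower() in LINKING_WORDS:
            continue  # the linking word is reported through its object instead
        else:
            relations.append((head.word, tok.word))
    return relations


if __name__ == "__main__":
    # "syntactic analysis of scientific texts", with a hand-supplied dependency parse
    parse = [
        Token(1, "syntactic", 2),
        Token(2, "analysis", 0),
        Token(3, "of", 2),
        Token(4, "scientific", 5),
        Token(5, "texts", 3),
    ]
    print(basic_relations(parse))
```

For this phrase the sketch yields ("analysis", "syntactic"), ("texts", "scientific"), and ("analysis", "of", "texts"), discarding everything but the direct relations.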
A specific program for automatically extracting syntactic information from text has been studied by Lemmon (1962 [354]). Salton himself has also further explored the possibilities of combining dictionary lookups, word suffixes as indicators of syntactic role, and predictive syntactic analysis for text processing (1962 [518], 1963 [519]). Members of the Harvard group and guest investigators are also investigating a variety of word and document association techniques, and of synonymous word and phrase groupings, which serve to "clue" the selection of a subject heading.

1/ Major difficulties have to do both with limitations on the grammars and vocabularies tested so far and with ambiguities and the number of alternative parsings generated. See, for example, Bobrow, 1963 [68]; Kuno and Oettinger, 1963 [341]; and Robinson, 1964 [502]. Bobrow provides a survey of syntactic analysis programs as of 1963, noting the limitations or restrictions of each. He reports, for example, that available programs to compute word classes are not always correct in the class assignments made and that analysis systems are not complete unless they provide means for distinguishing between "meaningless strings and grammatical sentences whose meaning can be understood". He concludes: "Until a method of syntactic analysis provides, for example a means of mechanizing translation of natural language, processing of a natural language input to answer questions, or a means of generating some truly coherent discourse, the relative merit of each grammar will remain moot." ([68], p. 385) Robinson ([502], p. 12) says of the sentences that can be parsed correctly that they are: "Usually short sentences with no complicated embeddings of relative clauses and few participial or prepositional phrase modifiers. These include the basic sentences that most grammars are equipped to handle and that adult writers seldom produce."
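Salton's combination of dictionary lookup with word suffixes as indicators of syntactic role (1962 [518]) can be illustrated roughly as follows. This is a minimal sketch assuming a tiny hypothetical dictionary and suffix table: an exact dictionary entry wins; otherwise the first matching suffix assigns a word class. The predictive syntactic analysis mentioned in the text is not attempted.

```python
# Hypothetical word dictionary: exact entries take priority over suffix guesses.
DICTIONARY = {
    "the": "article",
    "of": "preposition",
    "is": "verb",
    "analysis": "noun",
    "index": "noun",
}

# Hypothetical suffix table, checked in order; longer suffixes are listed first.
SUFFIXES = [
    ("ation", "noun"),
    ("ness", "noun"),
    ("ment", "noun"),
    ("ical", "adjective"),
    ("ive", "adjective"),
    ("ize", "verb"),
    ("ing", "verb"),
    ("ly", "adverb"),
    ("ed", "verb"),
]


def word_class(word: str) -> str:
    """Assign a word class by dictionary lookup, falling back to suffix rules."""
    w = word.lower()
    if w in DICTIONARY:
        return DICTIONARY[w]
    for suffix, cls in SUFFIXES:
        # require at least three letters of stem so short words like "red" are not misread
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return cls
    return "unknown"


def tag(sentence: str) -> list[tuple[str, str]]:
    """Tag each whitespace-separated word with its guessed class."""
    return [(w, word_class(w)) for w in sentence.split()]


if __name__ == "__main__":
    print(tag("the classification of syntactical indexing"))
```

The fallback also shows the weakness Bobrow reports: `word_class("string")` returns `"verb"` because of the `-ing` ending, an incorrect class assignment that only a larger dictionary or context-sensitive analysis would catch.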
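The text does not describe how the Harvard association techniques work, so the following is only a hedged illustration of one simple kind of word-heading association, not a reconstruction of those methods. Assuming a hypothetical training set of documents already indexed under subject headings, each word's association with a heading is taken as the share of that heading's documents containing the word, and a new document is "clued" toward the heading with the largest summed association. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_clues(indexed_docs):
    """indexed_docs: list of (words, heading) pairs from already-indexed documents.

    Returns per-heading word document frequencies and per-heading document counts.
    """
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for words, heading in indexed_docs:
        doc_counts[heading] += 1
        word_counts[heading].update(set(words))  # count each word once per document
    return word_counts, doc_counts


def suggest_heading(words, word_counts, doc_counts):
    """Return the heading whose clue words are most strongly associated with `words`."""
    scores = {}
    for heading, n_docs in doc_counts.items():
        # association of a word with a heading = share of that heading's documents containing it
        scores[heading] = sum(word_counts[heading][w] / n_docs for w in set(words))
    return max(scores, key=scores.get)


if __name__ == "__main__":
    training = [
        ({"syntax", "parsing", "grammar"}, "Linguistics"),
        ({"grammar", "sentence"}, "Linguistics"),
        ({"index", "term", "retrieval"}, "Information retrieval"),
        ({"retrieval", "document", "term"}, "Information retrieval"),
    ]
    counts, docs = train_clues(training)
    print(suggest_heading({"term", "document"}, counts, docs))  # Information retrieval
```

Normalizing by each heading's document count keeps headings with many training documents from winning merely because they have seen more words.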