MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
present time. 1/ In terms of the state-of-the-art of automatic indexing, therefore, we
shall not consider these approaches as more than indications for future research. A few
suggestive examples are discussed briefly below.
The multi-pronged attack on mechanized information selection and retrieval
problems headed by Salton and his associates includes the exploration of tree structures
to represent both the relationships between terms in a classification schedule or indexing
term vocabulary and the results of automatic syntactic analyses of natural language
text. It is proposed, then, that computer programs can transform the syntactic trees
representing word strings in the original text into simplified, condensed structures
with normalized terms and can compare these trees
with the classificatory trees (Salton, 1961 [516]). Manipulation of such trees together
with appropriate dictionaries or thesauri can result, for a given proposed index term, in
the finding of a preferred term for a particular system, or a set of synonymous terms, or
sets of all terms in which the given term is included, and the like.
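
The kinds of lookup described above (preferred term, synonymous terms, and the terms in which a given term is included) can be illustrated with a minimal sketch. It is not Salton's program: the tiny classification schedule, the thesaurus entries, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical classification schedule: each term points to its broader (parent) term.
PARENT: Dict[str, str] = {
    "information retrieval": "documentation",
    "automatic indexing": "information retrieval",
    "automatic abstracting": "information retrieval",
}

# Hypothetical thesaurus: variant form -> preferred term for this system.
PREFERRED: Dict[str, str] = {
    "machine indexing": "automatic indexing",
    "mechanized indexing": "automatic indexing",
    "auto-abstracting": "automatic abstracting",
}


def preferred_term(term: str) -> str:
    """Normalize a proposed index term and map it to the system's preferred term."""
    normalized = term.strip().lower()
    return PREFERRED.get(normalized, normalized)


def synonyms(term: str) -> List[str]:
    """Return the preferred term together with all variants that map to it."""
    target = preferred_term(term)
    variants = {v for v, p in PREFERRED.items() if p == target}
    return sorted(variants | {target})


def broader_terms(term: str) -> List[str]:
    """Walk up the classification tree, collecting every term that includes this one."""
    chain: List[str] = []
    seen = set()
    node = PARENT.get(preferred_term(term))
    while node is not None and node not in seen:  # 'seen' guards against cycles
        chain.append(node)
        seen.add(node)
        node = PARENT.get(node)
    return chain
```

For example, under these assumed tables, broader_terms("mechanized indexing") first resolves the variant to "automatic indexing" and then climbs to "information retrieval" and "documentation".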
Anger considers some of the problems involved in complete syntactic analysis of
texts with the objective of identifying the total network of relationships expressed and
implied, as proposed by Lecerf, Ruvinschii, and Leroy, among others, of the Research
Group on Automated Scientific Information (GRISA), EURATOM. Assuming that computer
programs for syntactic analysis are or will be available, he suggests that simplifications
may be obtained by determining only the basic relations that are indicated by direct
syntactic dependencies or by linking words (Anger, 1961 [15]).
A specific program for automatically extracting syntactic information from text has
been studied by Lemmon (1962 [354]). The possibilities for combining dictionary lookups,
word suffixes as indicators of syntactic role, and predictive syntactic analysis for text
processing have also been further explored by Salton himself (1962 [518], 1963 [519]).
A variety of word and document association techniques and of synonymous word and
phrase groupings which serve to "clue" the selection of a subject heading are also being
investigated by members of the Harvard group and guest investigators.
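
The combination of dictionary lookup with word suffixes as indicators of syntactic role, mentioned above, might be sketched as follows. This is only an illustrative assumption about how such a combination could work, not a reconstruction of the cited programs: the dictionary entries, suffix rules, and class names are hypothetical, and the predictive syntactic analysis step is not modeled.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary of common function words with known word classes.
DICTIONARY: Dict[str, str] = {
    "the": "article",
    "of": "preposition",
    "and": "conjunction",
    "is": "verb",
}

# Hypothetical suffix rules, tried in order when dictionary lookup fails.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("tion", "noun"),
    ("ment", "noun"),
    ("ly", "adverb"),
    ("ize", "verb"),
    ("ous", "adjective"),
]


def word_class(word: str) -> str:
    """Assign a word class by dictionary lookup first, then by suffix."""
    w = word.lower()
    if w in DICTIONARY:
        return DICTIONARY[w]
    for suffix, cls in SUFFIX_RULES:
        if w.endswith(suffix):
            return cls
    return "unknown"  # left for a later stage, e.g. syntactic analysis, to resolve


def tag(text: str) -> List[Tuple[str, str]]:
    """Tag each whitespace-separated word of a text with its guessed class."""
    return [(w, word_class(w)) for w in text.split()]
```

Suffix rules of this kind are heuristic and will sometimes misassign a class, which is one reason a further analysis stage is needed to resolve the words left "unknown" or wrongly tagged.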
1/
Major difficulties have to do with limitations both upon grammars and vocabularies
so far tested and with ambiguities and the number of alternative parsings generated.
See, for example, Bobrow, 1963 [68], Kuno and Oettinger, 1963 [341], and
Robinson, 1964 [502]. Bobrow provides a survey of syntactic analysis programs
as of 1963, noting limitations or restrictions on each. He reports, for example,
that available programs to compute word classes are not always correct in the
class assignments made and that analysis systems are not complete unless they
provide means for distinguishing between "meaningless strings and grammatical
sentences whose meaning can be understood". He concludes: "Until a method of
syntactic analysis provides, for example a means of mechanizing translation of
natural language, processing of a natural language input to answer questions, or a
means of generating some truly coherent discourse, the relative merit of each
grammar will remain moot." ([68], p. 385) Robinson ([502], p. 12) says of
sentences which can be parsed correctly, that they are: "Usually short sentences
with no complicated embeddings of relative clauses and few participial or
prepositional phrase modifiers. These include the basic sentences that most
grammars are equipped to handle and that adult writers seldom produce."