NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report

Conclusion

Mary Elizabeth Stevens
National Bureau of Standards

Let us recall the objections to the use of the terms "auto-encoding" (or "auto-indexing" or "auto-abstracting") because of the possible connotation of self-encoding, etc. 1/ This is an objection based upon avoiding ambiguous or misleading terminology, but it also points to an objection as to the principle involved, namely that of treating the document itself, in its own right, as a self-sufficient, self-contained universe of discourse, and of assuming that some type of summation-condensation over a number of different and individually-derived representations of the separate documents in a collection can provide an effective selection-retrieval guidance system to the contents of various specific documents in that collection.

Even when the actual operations are to be abetted by synonym reduction and normalization procedures (whether at the indexing or search negotiation stage, or both), there is a significant difference between this endogenous hypothesis and its exogenous alternative: that the basis for automatic indexing be the consensus of the collection, or of a sample of the collection, or of prior indexing.

Assignment indexing, especially in the sense that concept-indexing is the goal, may be subjectively preferable to derivative indexing not only because it involves exogenous emphases but because it tends to delimit, centralize, and standardize the access points available to the user in his search-retrieval operations. However, in terms of the human indexing situation, it involves all the traditional difficulties of indexing, which in turn invoke the problems of evaluating indexing systems: "Justification for any indexing technique must ultimately be based on successful retrieval.
Success can only be evaluated in terms of a closed system; that is, a system wherein sufficient knowledge is available of the entire contents of the materials, so that an evaluation can be made of various techniques as to their retrieval effectiveness. The various systems . . . cannot really be weighed except on the basis of a test comparing one against the other. This has not been done in any place." 2/

Nevertheless, there are a variety of reasons for accepting even the relatively crude derivative indexing products as practical tools today, for seeking machine-usable rules for the improvement of these products, and for continuing research efforts in automatic assignment indexing and automatic classification.

There are, first and foremost, the cases where conventional indexes are inadequate or non-existent. Thus Wyllys claims: "It is well-known that the current methods of producing, through human efforts, condensed representations of documents are already hopelessly inadequate to cope with the present volume of scientific and technical literature. Many papers are never indexed or abstracted at all, and even in the cases of those that are indexed or abstracted, the indexes and abstracts do not become available until six months to two years after the publication of the paper." 3/

Again, with respect to automatic derivative indexing, especially KWIC indexes based on titles alone, there can be no question as to the evaluation criterion of timeliness. The success of this aspect is widely acknowledged by users, systems planners, and interested observers. On the other hand, there is very little reported evidence available on which

1/ See p. 3 of this report.
2/ Black, 1963 [64], p. 16.
3/ Wyllys, 1961 [650], p. 6.