MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Conclusion
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Let us recall the objections to the use of the terms "auto-encoding" (or "auto-indexing"
or "auto-abstracting") because of the possible connotation of self-encoding, etc. 1/
This is an objection based upon avoiding ambiguous or misleading terminology, but it also
points to an objection to the principle involved: that of treating the document itself,
in its own right, as a self-sufficient, self-contained universe of discourse, and of assuming
that some type of summation-condensation over a number of different and individually
derived representations of the separate documents in a collection can provide an effective
selection-retrieval guidance system to the contents of various specific documents in that
collection. Even when the actual operations are abetted by synonym reduction and
normalization procedures (whether at the indexing or search negotiation stage, or both),
there is a significant difference between this endogenous hypothesis and its exogenous
alternative: that the basis for automatic indexing be the consensus of the collection, or of
a sample of the collection, or of prior indexing.
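The contrast between the endogenous and exogenous hypotheses can be illustrated with a minimal sketch, assuming a toy collection and a hypothetical stop list. The first function treats the document as a self-contained universe and ranks its words by frequency alone. The second draws on the consensus of the collection by down-weighting words that occur in many documents. The smoothed tf-idf style weighting is a later, standard technique swapped in purely for illustration; it is not a procedure described in this report, and all function names and data are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop list, assumed for illustration only.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lower-case the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def endogenous_terms(doc, k=3):
    """Derive index terms from the document alone: its k most frequent words."""
    counts = Counter(tokenize(doc))
    # Highest frequency first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def exogenous_terms(doc, collection, k=3):
    """Weight the document's words by the collection: rarer words score higher."""
    n_docs = len(collection)
    doc_freq = Counter()
    for other in collection:
        doc_freq.update(set(tokenize(other)))  # count each word once per document
    counts = Counter(tokenize(doc))
    # Smoothed weight: a word found in every document of the collection scores zero.
    scores = {
        term: tf * math.log((n_docs + 1) / (doc_freq[term] + 1))
        for term, tf in counts.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy data, invented for illustration.
    doc = "indexing of documents: automatic indexing of chemical documents"
    collection = [
        doc,
        "retrieval of documents by computer",
        "indexing of documents for retrieval",
        "evaluation of indexing and retrieval of documents",
    ]
    print(endogenous_terms(doc, k=2))             # ['documents', 'indexing']
    print(exogenous_terms(doc, collection, k=2))  # ['automatic', 'chemical']
```

With this invented data the endogenous selection favors "documents" and "indexing", words that recur within the document but also pervade the collection, while the exogenous selection favors "automatic" and "chemical", which distinguish the document from its neighbors.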
Assignment indexing, especially in the sense that concept-indexing is the goal, may
be subjectively preferable to derivative indexing not only because it involves exogenous
emphases but also because it tends to delimit, centralize, and standardize the access points
available to the user in his search-retrieval operations. However, in terms of the human
indexing situation, it involves all the traditional difficulties of indexing, which in turn
invoke the problems of evaluating indexing systems:
"Justification for any indexing technique must ultimately be based on successful
retrieval. Success can only be evaluated in terms of a closed system; that is, a
system wherein sufficient knowledge is available of the entire contents of the
materials, so that an evaluation can be made of various techniques as to their
retrieval effectiveness. The various systems ... cannot really be weighed except
on the basis of a test comparing one against the other. This has not been done in
any place." 2/
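Black's criterion can be made concrete with a minimal sketch, assuming a closed test collection in which the documents relevant to each query are known in advance. The passage names no particular measure; recall (the share of relevant documents retrieved) and precision (the share of retrieved documents that are relevant) are used here as the conventional choice. The technique names, query identifiers, and document identifiers are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Recall and precision of one search against the known relevant documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    # Convention assumed here: an empty retrieval has precision 0.
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def compare_techniques(results_by_technique, relevance):
    """Mean recall and precision per technique over the same closed test set.

    results_by_technique maps a technique name to {query: retrieved doc ids};
    relevance maps each query to the doc ids judged relevant to it.
    """
    summary = {}
    for name, results in results_by_technique.items():
        pairs = [recall_precision(results.get(q, []), rel) for q, rel in relevance.items()]
        summary[name] = (
            sum(r for r, _ in pairs) / len(pairs),
            sum(p for _, p in pairs) / len(pairs),
        )
    return summary


if __name__ == "__main__":
    # Hypothetical closed collection: every relevant document is known in advance.
    relevance = {"q1": ["d1", "d2"], "q2": ["d3"]}
    results = {
        "derivative": {"q1": ["d1", "d4"], "q2": ["d3", "d5"]},
        "assignment": {"q1": ["d1", "d2"], "q2": []},
    }
    print(compare_techniques(results, relevance))
    # {'derivative': (0.75, 0.5), 'assignment': (0.5, 0.5)}
```

Scoring both techniques against the same relevance judgments is what permits the one-against-the-other comparison the passage calls for; in this invented data the derivative run has the higher mean recall while the two tie on precision.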
Nevertheless, there are a variety of reasons for accepting even the relatively crude
derivative indexing products as practical tools today, for seeking machine-usable rules
for the improvement of these products, and for continuing research efforts in automatic
assignment indexing and automatic classification. There are, first and foremost, the
cases where conventional indexes are inadequate or non-existent. Thus Wyllys claims:
"It is well-known that the current methods of producing, through human efforts,
condensed representations of documents are already hopelessly inadequate to cope
with the present volume of scientific and technical literature. Many papers are
never indexed or abstracted at all, and even in the cases of those that are indexed
or abstracted, the indexes and abstracts do not become available until six months
to two years after the publication of the paper." 3/
Again, with respect to automatic derivative indexing, especially KWIC indexes based
on titles alone, there can be no question as to the evaluation criterion of timeliness. The
success of this aspect is widely acknowledged by users, systems planners, and interested
observers. On the other hand, there is very little reported evidence available on which
1/ See p. 3 of this report.
2/ Black, 1963 [64], p. 16.
3/ Wyllys, 1961 [650], p. 6.