MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
"Interpretation of data revealed, among other things, that 64.4 percent of the
title entries contained as keywords one or more of the ILP subject heading words
under which they were indexed, and 25.1 percent contained logical equivalents.
The remaining 10.5 percent of the title entries had non-descriptive titles."
The difficulties with titles as sources of the indexing information stem from at least
three distinct types of determining factors: (1) the language habits, background,
interests, and idiosyncrasies of the author; (2) the interests, familiarity with the subject
matter, language habits, imagination, and idiosyncrasies of the user, and (3) factors
largely extrinsic to either the particular author or the particular user. In the first case,
we find especially the problem of the witty, punning, deliberately non-informative title,
the so-called "pathological title". Janaske gives the provocative example, in the literature
of information selection and retrieval itself, of "The Golden Retriever". 2/ Even in the
non-pathological case, however, there is the serious question of whether the author himself
is likely to be a good indexer. 3/
On the user side, the normal critical problems of "bringing the vocabulary of
indexer and searcher into coincidence" (Bernier, 1953 [55]) are aggravated by the facts
that the user of KWIC must anticipate the terminology used by a large number of
different "indexers" (i.e., the authors), that title words spelled the same but with quite
different meanings in different special applications are grouped together in the same
place in the index, and that the same concepts may be expressed in quite different
phraseology depending on the author's, rather than the user's, field of specialization.
To these aggravating circumstances there must be added in turn the psychological
acceptability to the individual user of the scatter and redundancy, to say nothing of the
format and legibility, of a particular published index.
Such factors affecting the particular user will of course vary with the nature and
purpose of his search. Kennedy points out, for example, that the location of a document from
only a single clue, a single title word, is particularly easy with a permuted title index
and he emphasizes that the "index purpose, use, size, statement and array are other
factors of considerable moment in judging the value of title indexes". 4/
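The two points above, that a single remembered title word is enough to locate a document and that identically spelled words from unrelated fields are filed together, can be illustrated with a minimal sketch of a permuted title index. This is not the procedure of any particular KWIC program. The stop list, the function names, and every sample title except "The Golden Retriever" are assumptions made for illustration only.

```python
# Hypothetical stop list; production KWIC systems used far longer ones.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}
PUNCT = '.,;:"()'


def kwic_lines(titles):
    """Return sorted (keyword, rotated_title, doc_id) entries.

    Each non-stop word of each title becomes a keyword, and the title is
    rotated so that the keyword comes first (a simple permuted index).
    """
    lines = []
    for doc_id, title in titles.items():
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(PUNCT).lower()
            if not key or key in STOP_WORDS:
                continue
            # Words before the keyword wrap around after a "/" separator.
            wrapped = ["/"] + words[:i] if i else []
            lines.append((key, " ".join(words[i:] + wrapped), doc_id))
    return sorted(lines)


def lookup(lines, word):
    """Return the document ids filed under a single title word."""
    key = word.lower()
    return [doc_id for k, _, doc_id in lines if k == key]


# Sample data: D1 is the title cited in the text; D2 and D3 are invented.
TITLES = {
    "D1": "The Golden Retriever",
    "D2": "Transmission of Information in Noisy Channels",
    "D3": "Automatic Transmission Design for Trucks",
}
INDEX = kwic_lines(TITLES)

if __name__ == "__main__":
    for keyword, rotated, doc_id in INDEX:
        print(f"{keyword:<14}{rotated}  [{doc_id}]")
```

In this sketch `lookup(INDEX, "transmission")` returns both the communications title and the automotive one, since the index groups entries by spelling alone, while a searcher who asks for "retrieval" finds nothing filed for "The Golden Retriever".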
1/ National Science Foundation's CR&D Report No. 11, [430], p. 62.
2/ Janaske, 1962 [299], p. 4.
3/ See, for example, a report on a conference on better indexes for technical literature,
   ASLIB Proceedings, 13:4, April 1961, with a number of statements on the author as
   a poor indexer. See also Crane and Bernier, 1958 [144], p. 515: "Not even authors
   are qualified to index their own work unless they are equipped for the task by
   training and experience."
4/ Kennedy, 1961 [311], p. 125.