NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report

Chapter: Indexes Generated by Machine-Automatic Derivative Indexing
Mary Elizabeth Stevens, National Bureau of Standards

"Interpretation of data revealed, among other things, that 64.4 percent of the title entries contained as keywords one or more of the ILP subject heading words under which they were indexed, and 25.1 percent contained logical equivalents. The remaining 10.5 percent of the title entries had non-descriptive titles." 1/

The difficulties with titles as sources of the indexing information stem from at least three distinct types of determining factors: (1) the language habits, background, interests, and idiosyncrasies of the author; (2) the interests, familiarity with the subject matter, language habits, imagination, and idiosyncrasies of the user; and (3) factors largely extrinsic to either the particular author or the particular user. In the first case, we find especially the problem of the witty, punning, deliberately non-informative title, the so-called "pathological title". Janaske gives the provocative example, in the literature of information selection and retrieval itself, of "The Golden Retriever". 2/ Even in the non-pathological case, however, there is the serious question of whether the author himself is likely to be a good indexer. 3/

On the user side, the normal critical problems of "bringing the vocabulary of indexer and searcher into coincidence" (Bernier, 1953 [55]) are aggravated by the facts that the user of KWIC must anticipate the terminology used by a large number of different "indexers" (i.e., the authors), that title words spelled the same but with quite different meanings in different special applications are grouped together in the same place in the index, and that the same concepts may be expressed in quite different phraseology depending on the author's, rather than the user's, field of specialization.
To these aggravating circumstances there must be added in turn the psychological acceptability to the individual user of the scatter and redundancy, to say nothing of the format and legibility, of a particular published index. Such factors affecting the particular user will of course vary with the nature and purpose of his search. Kennedy points out, for example, that the location of a document from only a single clue, a single title word, is particularly easy with a permuted title index, and he emphasizes that the "index purpose, use, size, statement and array are other factors of considerable moment in judging the value of title indexes". 4/

1/ National Science Foundation's CR&D Report No. 11 [430], p. 62.
2/ Janaske, 1962 [299], p. 4.
3/ See, for example, a report on a conference on better indexes for technical literature, ASLIB Proceedings, 13:4, April 1961, with a number of statements on the author as a poor indexer. See also Crane and Bernier, 1958 [144], p. 515: "Not even authors are qualified to index their own work unless they are equipped for the task by training and experience."
4/ Kennedy, 1961 [311], p. 125.
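
The behavior of a permuted title index described above can be illustrated with a minimal sketch. It is not the KWIC program of any system cited in this report: the function name `title_word_index`, the short stop list, and every sample title except "The Golden Retriever" are assumptions made up for illustration, and the sketch omits the aligned keyword-in-context display of a real KWIC listing. It only files each title under each of its significant words.

```python
from collections import defaultdict

# Hypothetical stop list; an operational system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def title_word_index(titles):
    """File every title under each of its non-stop words (lowercased)."""
    index = defaultdict(list)
    for title in titles:
        for word in title.split():
            key = word.strip('.,:;"()').lower()
            if key and key not in STOP_WORDS:
                index[key].append(title)
    # Sort the keywords alphabetically, as in a printed index.
    return dict(sorted(index.items()))


# Only the first title comes from the text; the others are invented samples.
titles = [
    "The Golden Retriever",
    "Retrieval of Legal Documents",
    "Cell Division in Plants",
    "A Fuel Cell for Spacecraft",
]

index = title_word_index(titles)
for keyword, entries in index.items():
    for entry in entries:
        print(f"{keyword:<12} {entry}")
```

In this sketch the two unrelated senses of "cell" fall under the same heading, and the punning title is reachable only through "golden" and "retriever", never through "retrieval" or "information". A searcher who knows one title word, on the other hand, finds the document with a single lookup, which is the case Kennedy singles out as particularly easy.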