NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Compiled by Machine
Mary Elizabeth Stevens
National Bureau of Standards
In machine-compiled indexes, no items or entries are eliminated by the machine,
whereas in even the most rudimentary of machine-generated indexes, such as KWIC,
various reductive or extractive operations are automatically applied as part of the
machine procedure. In this section we shall briefly discuss machine-compiled indexes
and related devices: concordances, mechanically prepared card or book catalogs,
citation indexes, and special indexes such as Tabledex.
The use of machines to compile, sort, duplicate, and list index entries can be
considered mechanized indexing only in a relatively trivial sense. We shall therefore
consider only a few representative examples, emphasizing early work and some of the
pioneering instances.
2.1 Concordances and Complete Text Processing
When, as early as 1856, Crestadoro proposed the use of permutations of the words in
titles as a subject-content index, the only "machines" available for the processing
operations were people acting in a strictly clerical way. Precisely such clerical
operations have been used for centuries in a process that is, in the special sense of
full representation of document contents, an index-producing operation: the making of
concordances. 1/
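Crestadoro's scheme was carried out by hand, and the text does not describe its mechanics. As a rough illustration only, the Python sketch below assumes that a permuted title index rotates each title so that every word in turn is brought to the filing position, and then sorts the entries alphabetically. The function `permuted_title_index` and its `stop_words` parameter are hypothetical. With an empty stop list every word yields an entry and nothing is eliminated, as in a machine-compiled index; a non-empty stop list stands in for the kind of reductive operation attributed above to KWIC.

```python
from typing import Iterable


def permuted_title_index(
    titles: Iterable[str], stop_words: frozenset[str] = frozenset()
) -> list[tuple[str, str]]:
    """Hypothetical sketch of a permuted (rotated) title index.

    Assumption: each title is split on whitespace and rotated so that every
    word not in `stop_words` becomes, in turn, the leading filing word.
    Returns (filing key, rotated title) pairs in alphabetical order.
    """
    entries: list[tuple[str, str]] = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.lower().strip(".,;:")
            if key in stop_words:
                continue  # reductive step: stop words get no entry of their own
            # Rotate: words from position i onward, then " / " marks the wrap-around.
            rotated = " ".join(words[i:])
            if i:
                rotated += " / " + " ".join(words[:i])
            entries.append((key, rotated))
    entries.sort()
    return entries


titles = ["Automatic Indexing", "Indexes Compiled by Machine"]
for key, entry in permuted_title_index(titles, stop_words=frozenset({"by"})):
    print(f"{key:<10} {entry}")
```

Here "Indexes Compiled by Machine" yields entries under "indexes", "compiled", and "machine", but none under "by"; calling the function without a stop list would add that fourth entry.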
The task of listing each separate word in a book in all the contexts in which it appears
is incredibly time-consuming and tedious when carried out by hand. Some compilers have
spent the greater part of their lifetimes at this task. For example: "It took James
Strong thirty years to compile his exhaustive Concordance of the Bible..." 2/
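The clerical task just described, listing every word together with each context in which it occurs, is the kind of operation that later became a routine machine job. The Python sketch below is illustrative only: the function `build_concordance`, the use of a fixed window of `width` words on each side as the "context", and the simple letters-and-apostrophes tokenizer are all assumptions, not a description of any particular historical program.

```python
import re
from collections import defaultdict


def build_concordance(lines: list[str], width: int = 2) -> dict[str, list[tuple[int, str]]]:
    """Hypothetical sketch: map each distinct word to every (line number, context).

    Nothing is eliminated: every occurrence of every word is listed.
    Assumption: `width` words on each side of the occurrence serve as context.
    """
    concordance: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        words = re.findall(r"[A-Za-z']+", line)  # crude tokenizer (assumption)
        for i, word in enumerate(words):
            context = " ".join(words[max(0, i - width): i + width + 1])
            concordance[word.lower()].append((line_no, context))
    # Alphabetical order of headwords, as in a printed concordance.
    return dict(sorted(concordance.items()))


verses = ["In the beginning was the Word", "and the Word was with God"]
for word, occurrences in build_concordance(verses, width=1).items():
    for line_no, context in occurrences:
        print(f"{word:<10} {line_no:>3}  {context}")
```

Because every occurrence is kept, the concordance grows in step with the text itself, which is why manual compilation of a full concordance could take decades.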
The use of machines capable of processing signals that represent and preserve
information offered a potentially revolutionary change, and the advent of the electronic
computer opened up even more radical possibilities of very high-speed processing.
As early as 1949, J. W. Mauchly (the co-inventor of ENIAC and UNIVAC) envisioned
the use of computers for documentation and library science activities. He suggested that
the full information contents of the Library of Congress collections could be recorded in
machine language, stored in this form on magnetic tape, and searched by machine in a
procedure that would match words or other selection indicia occurring in the recorded
information against the specified words or selection criteria of a query or search
prescription. Specifically, he estimated that the entire collection, then amounting to
10,000,000 books, could, when transcribed to binary-code representation, 3/ be
serially searched in 20 hours. 4/
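The text gives only the outline of Mauchly's proposal. A minimal Python sketch of a serial search in that spirit follows. The record layout (an identifier paired with the recorded text), the rule that a record matches only when it contains every query word, and the function name `serial_search` are hypothetical. The closing lines simply restate the estimate above as a rate: 10,000,000 books in 20 hours is 500,000 books per hour, or roughly 139 books per second.

```python
from typing import Iterable, Iterator


def serial_search(records: Iterable[tuple[str, str]], query_terms: set[str]) -> Iterator[str]:
    """Hypothetical sketch: scan every record once, in stored order.

    Assumption: a record is (identifier, text), and it is selected only if its
    text contains all of the query words (a simple AND criterion).
    """
    wanted = {term.lower() for term in query_terms}
    for record_id, text in records:
        if wanted <= set(text.lower().split()):
            yield record_id


tape = [
    ("b1", "Automatic indexing by machine"),
    ("b2", "A concordance of the Bible"),
    ("b3", "Machine indexing of titles"),
]
print(list(serial_search(tape, {"indexing", "machine"})))

# Mauchly's estimate expressed as a scanning rate.
BOOKS = 10_000_000
SEARCH_HOURS = 20
books_per_hour = BOOKS / SEARCH_HOURS      # 500,000 books per hour
books_per_second = books_per_hour / 3600   # about 138.9 books per second
```

A serial search reads every record for every query, so its running time depends on the size of the collection rather than on how many records match.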
1/
See, for example, Black, 1962 [65], p. 314: "The oldest book in the world has had
such an index for many years--the concordance to the Bible;" Markus, 1962 [394],
p. 19: "The ultimate in permutation for indexing is a published concordance;" Linder,
1960 [363], p. 99: "We know of a concordance prepared in the 13th Century;"
Simmons and McConlogue, 1962 [555], p. 3: "Complete indexing has been used of
course for centuries in the preparation of concordances."
2/
Carlson, 1963 [101], p. 211.
3/
That is, markings which have one of two values (thus, binary digits or "bits") can
be used to distinguish between $2^n$ different other symbols, such as alphabetic
characters, by using $\log_2 2^n = n$ such markings. A binary code for the 26 letters
of the English alphabet requires a five-bit representation for each letter, since
$2^4 = 16 < 26 \le 32 = 2^5$. If numeric digit characters are also recorded
($26 + 10 = 36$ characters), a six-bit code representation is required, since
$2^5 = 32 < 36 \le 64 = 2^6$.
4/
Mauchly, 1949 [406], p. 295. See also "Report to the Secretary of Commerce on the
application of machines..." 1954 [620], p. 67.
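Footnote 3/ rests on a small calculation: $n$ two-valued markings distinguish $2^n$ symbols, so $N$ symbols need the smallest $n$ with $2^n \ge N$, that is, $\lceil \log_2 N \rceil$ markings. The Python sketch below reproduces the footnote's two figures. The function name `bits_needed` is hypothetical, and integer `bit_length()` is used in place of a floating-point logarithm to avoid rounding issues.

```python
def bits_needed(n_symbols: int) -> int:
    """Smallest number of two-valued markings that can distinguish n_symbols symbols.

    For n_symbols >= 1, (n_symbols - 1).bit_length() equals ceil(log2(n_symbols)).
    A single symbol needs no markings at all, so bits_needed(1) returns 0.
    """
    if n_symbols < 1:
        raise ValueError("need at least one symbol")
    return (n_symbols - 1).bit_length()


print(bits_needed(26))       # letters only
print(bits_needed(26 + 10))  # letters plus numeric digits
```

Subtracting one before taking the bit length makes exact powers of two come out right: 32 symbols fit in five bits, while 33 require six.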