NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
Mary Elizabeth Stevens
National Bureau of Standards
6.4 Probabilistic Indexing and Natural Language Text Searching
As in the case of automatic indexing proposals based upon automatic sentence
extraction techniques, machine searching of full natural language text has been suggested
as a basis for, at least, automatic derivative indexing. We have remarked previously
that the machine use of complete text can only be considered to be "indexing" in a very
special sense, that it is subject either to the non-availability of suitable corpora already
in machine-usable form or to high costs of conversion to this form, and that too little
is yet known of linguistic analysis and searching-selection strategies effectively applicable
to natural language materials. Various examples of corroborating opinion, other than
those previously cited, are as follows:
"Machine searching is superb if it is known exactly how to describe the object of
search, and if one could know how to choose from among many possible searching
strategies. I doubt if any one is yet in this comfortable position with respect
to machine searching of text." 1/
"The most effective programs in automatic linguistic analysis have served only
to illustrate how really complex is the structure of the language, and how far
removed the present state of the art is from any system which might be useful
in practice." 2/
"The recognition of words involves only the matching of digital codes, but
the recognition of an idea is a severe intellectual problem, the solution to
which will probably never be exact. Nevertheless, this is the problem which
must be attacked if accuracy is ever to be attained, or even approached, in
using the text of information items as a basis for their recovery." 3/
Nevertheless, some of the work both in natural language text searching and in
"probabilistic indexing" (where weights representing judgments as to the degree of relevance
of an indexing term to an item are used either in indexing or in search) provides instructive
insights into some of the problems of automatic indexing.
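To make the notion of weighted indexing concrete, the following is a minimal Python sketch under stated assumptions: each item carries index terms with weights between 0 and 1 expressing an indexer's judgment of relevance, and a search ranks items by summing the weights of the request terms they carry. The item names, terms, weights, and the summation rule are all hypothetical illustrations; this is not the relevance-number computation of Maron, Kuhns, and Ray.

```python
# Hypothetical weighted index: item -> {term: weight}, weights in [0, 1]
# expressing an indexer's judgment of how relevant the term is to the item.
INDEX = {
    "doc_a": {"statutes": 0.9, "retrieval": 0.4},
    "doc_b": {"retrieval": 0.8, "linguistics": 0.6},
    "doc_c": {"linguistics": 0.3},
}


def rank(index, query_terms):
    """Rank items by the summed weights of the query terms they carry.

    Items carrying none of the query terms (score 0) are left out.
    The summation rule is an illustrative assumption only.
    """
    scores = {}
    for item, weights in index.items():
        score = sum(weights.get(term, 0.0) for term in query_terms)
        if score > 0:
            scores[item] = score
    # Highest combined weight first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for item, score in rank(INDEX, ["retrieval", "linguistics"]):
        print(item, round(score, 2))
```

Summation is only one possible combining rule; a multiplicative rule, or one that treats the weights explicitly as probabilities, could order the same items differently.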
In the period 1958-1960, work at Ramo-Wooldridge resulted in the release or
publication of provocative papers by Maron, Kuhns, and Ray on "probabilistic indexing"
(1959 [398], 1960 [397]) and by Swanson on natural language text searching by computer
(1960 [587, 582], 1963 [583]). Subsequent work along these lines has included further
developments at Thompson Ramo-Wooldridge, the law statutes work at the Health Law
Center at the University of Pittsburgh, and the experimental investigations of Eldridge
and Dennis in a project jointly sponsored by the American Bar Foundation, IBM, and the
Council on Library Resources.
1/ Doyle, 1959 [168], p. 2.
2/ Salton, 1962 [520], p. 111-1 through 111-2.
3/ Doyle, 1959 [165], p. 12.