MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
"We have also perceived that two different cognitive processes seem to be
responsible for each type of correlation, one (adjacent correlation) involving
the habitual use of word groups as semantic units, and the other (proximal
correlation) having to do with the pattern of reference to various aspects of
that which is being discussed. We can call the statistical effects, respectively,
`language redundancy', and `reality redundancy'. Such a resolution of statistical
effects is full of significance for information retrieval because it appears likely
that reality redundancy can vary greatly from one science to another, whereas
language redundancy, a universal property of talking and writing, is relatively
invariant." 1/
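The distinction Doyle draws can be made concrete with a small counting sketch. The quoted passage does not give formal definitions, so the following is only one possible reading, under stated assumptions: a pair counts as "adjacent" in a document if the two words occur side by side there at least once, and as "proximal" if they share the document but are never side by side in it. The function name and the once-per-document counting rule are illustrative choices, not taken from the source.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Return (adjacent, proximal) counters over tokenized documents.

    Assumed definitions (not from the source):
      adjacent: the pair occurs side by side at least once in the document
      proximal: the pair shares the document but is never side by side in it
    Each unordered pair is counted at most once per document.
    """
    adjacent = Counter()
    proximal = Counter()
    for tokens in documents:
        # Unordered pairs of distinct neighboring words.
        adj_pairs = {frozenset(p) for p in zip(tokens, tokens[1:]) if p[0] != p[1]}
        # Every unordered pair of distinct words in the document.
        all_pairs = {frozenset(p) for p in combinations(sorted(set(tokens)), 2)}
        for pair in adj_pairs:
            adjacent[pair] += 1
        for pair in all_pairs - adj_pairs:
            proximal[pair] += 1
    return adjacent, proximal
```

Under this reading, a habitual word group such as "air defense" accumulates adjacent counts, while words such as "air" and "output" that merely appear in the same item accumulate proximal counts.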
With respect to the "semantic roadmap" or "association map" technique itself,
Doyle's suggestion is that various measures of word and index term cross-associations
may be applied to the generation of graphic displays of both types of co-occurrence
relationships. Because of the variety of, in particular, the "proximal" correlations, it
is assumed that the literature searcher should be given a display in which the repre-
sentation of the assemblage of the varied relationships is two-dimensional rather than
one-dimensional. 2/ An example is given, based upon computer processing of 600 abstracts of SDC
internal reports to find intersections between 500 topical words, of associational con-
nections for the word "output". This was generated by selecting the eight words most
strongly correlated in the data with "output", such as "manual" and "radar", and then
finding three other words highly correlated with each of these and also correlated with
"output" itself. From the initial graph, it is further shown that item surrogates might
be generated by word selection rules applied to documents to pick up, for example,
"New York Air Defense system data outputs D.C." 3/
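The two-stage selection described above (eight words most strongly correlated with the central word, then three further words for each of these that are also correlated with the central word) can be sketched as follows. This is a minimal illustration, not Doyle's program: the text does not state which correlation measure was used, so a Jaccard overlap of document sets stands in for it here, and the names `build_association_map`, `n_first`, and `n_second` are hypothetical. The sketch also assumes that a second-stage word may itself be one of the first-stage words.

```python
from collections import defaultdict


def build_association_map(doc_term_sets, center, n_first=8, n_second=3):
    """Return {word: [linked words]} for a two-stage association map."""
    # Build term -> set of ids of the documents that contain it.
    index = defaultdict(set)
    for doc_id, terms in enumerate(doc_term_sets):
        for term in terms:
            index[term].add(doc_id)
    postings = dict(index)

    def strength(a, b):
        # Assumed stand-in measure: Jaccard overlap of the document sets.
        docs_a = postings.get(a, set())
        docs_b = postings.get(b, set())
        union = docs_a | docs_b
        return len(docs_a & docs_b) / len(union) if union else 0.0

    def ranked_neighbors(term, candidates, k):
        # Keep only positively associated words; break ties alphabetically.
        scored = [(strength(term, c), c) for c in candidates]
        scored = [(s, c) for s, c in scored if s > 0]
        scored.sort(key=lambda sc: (-sc[0], sc[1]))
        return [c for _, c in scored[:k]]

    vocabulary = set(postings) - {center}
    first_stage = ranked_neighbors(center, vocabulary, n_first)
    graph = {center: first_stage}
    for word in first_stage:
        # Second-stage candidates must also be correlated with the center.
        candidates = {
            c for c in vocabulary if c != word and strength(center, c) > 0
        }
        graph[word] = ranked_neighbors(word, candidates, n_second)
    return graph
```

The returned dictionary is an adjacency list that a plotting routine could lay out in two dimensions; the layout step itself is outside the scope of this sketch.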
Continuing related work by Doyle and others at SDC has included various experi-
mental studies of "pseudo-documents" consisting of lists of the twelve most frequently
occurring words in 100-item samples of abstracts in various subject fields (Doyle, 1961
[161]). Of special interest in terms of potential improvements and modifications to
machine indexing techniques are studies, based on similar lists, looking to the separa-
tion of words that may have been used in several different senses, i.e., the detection of
homographs by statistical means (Doyle, 1963 [171]). More recent investigations by
Doyle involve considerations of differences between word-grouping and document-group-
ing techniques and of possibilities for use of hybrid methods.
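A "pseudo-document" of the kind described above, a list of the twelve most frequently occurring words in a sample of abstracts, might be generated along the following lines. This is a sketch under assumptions: the text does not say how words were tokenized or whether common function words were excluded, so whitespace tokenization, lowercasing, and the optional `stop_words` parameter are illustrative choices only.

```python
from collections import Counter


def pseudo_document(abstracts, size=12, stop_words=frozenset()):
    """Return the `size` most frequent words across a sample of abstracts."""
    counts = Counter(
        word
        for text in abstracts
        for word in text.lower().split()  # assumed tokenization: whitespace
        if word not in stop_words
    )
    # Sort by descending frequency, breaking ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:size]]
```

Applied separately to samples from different subject fields, the function yields one word list per field, which is the form of input the studies described above appear to have worked from.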
6.2.4 Work of Giuliano and Associates, the ACORN Devices
A program directed toward the design of "an English command and control language
system" under an Air Force contract with Arthur D. Little, Inc., involves several inter-
related aspects of natural language text processing, use of statistical association factors
in search, man-machine interaction during search, and display of associational relation-
ships by means of analog network devices. In this program and in related research,
Giuliano and his associates are convinced that:
1/ Doyle, 1961 [169], p. 15.
2/ Doyle, 1962 [163], p. 379.
3/ Doyle, 1961 [169], pp. 24-25.