NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report

Chapter: Other Potentially Related Research
Mary Elizabeth Stevens
National Bureau of Standards

"We have also perceived that two different cognitive processes seem to be responsible for each type of correlation, one (adjacent correlation) involving the habitual use of word groups as semantic units, and the other (proximal correlation) having to do with the pattern of reference to various aspects of that which is being discussed. We can call the statistical effects, respectively, 'language redundancy', and 'reality redundancy'. Such a resolution of statistical effects is full of significance for information retrieval because it appears likely that reality redundancy can vary greatly from one science to another, whereas language redundancy, a universal property of talking and writing, is relatively invariant." 1/

With respect to the "semantic roadmap" or "association map" technique itself, Doyle's suggestion is that various measures of word and index term cross-associations may be applied to the generation of graphic displays of both types of co-occurrence relationships. Because of the variety of, in particular, the "proximal" correlations, it is assumed that the literature searcher should be given a display in which the representation of the assemblage of the varied relationships is two-dimensional rather than one. 2/

An example is given, based upon computer processing of 600 abstracts of SDC internal reports to find intersections between 500 topical words, of associational connections for the word "output". This was generated by selecting the eight words most strongly correlated in the data with "output", such as "manual" and "radar", and then finding three other words highly correlated with each of these and also correlated with "output" itself.
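The two-stage selection just described can be illustrated with a short Python sketch. It is not Doyle's program: the report does not state which correlation measure was used, so the sketch assumes a phi coefficient computed over binary word-in-abstract occurrence. The function names (phi_correlation, build_postings, association_map), the parameters n_first and n_second, and the six-abstract toy corpus are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def phi_correlation(docs_a, docs_b, n_docs):
    """Phi coefficient between two words, given the sets of documents
    containing each word (an assumed measure, not taken from the report)."""
    a, b = len(docs_a), len(docs_b)
    both = len(docs_a & docs_b)
    denom = sqrt(a * (n_docs - a) * b * (n_docs - b))
    if denom == 0:
        return 0.0  # a word in no documents or in all documents carries no signal
    return (n_docs * both - a * b) / denom


def build_postings(abstracts):
    """Map each word to the set of abstract indices in which it occurs."""
    postings = {}
    for doc_id, text in enumerate(abstracts):
        for word in set(text.lower().split()):
            postings.setdefault(word, set()).add(doc_id)
    return postings


def association_map(postings, n_docs, center, n_first=8, n_second=3):
    """Return {first_level_word: [second_level_words]} for a center word.

    First level: the n_first words most strongly correlated with the center.
    Second level: for each first-level word, the n_second other words most
    correlated with it that are also positively correlated with the center.
    """
    def corr(u, v):
        return phi_correlation(postings[u], postings[v], n_docs)

    # Keep only words positively correlated with the center word.
    related = {}
    for word in postings:
        if word != center:
            c = corr(center, word)
            if c > 0:
                related[word] = c

    # Ties are broken alphabetically so the result is deterministic.
    first = sorted(related, key=lambda w: (-related[w], w))[:n_first]

    graph = {}
    for w in first:
        candidates = [v for v in related if v != w and corr(w, v) > 0]
        candidates.sort(key=lambda v: (-corr(w, v), v))
        graph[w] = candidates[:n_second]
    return graph


# Hypothetical toy corpus standing in for the 600 SDC abstracts.
abstracts = [
    "output display radar manual",
    "output radar tracking",
    "output manual input display",
    "radar tracking display",
    "budget personnel report",
    "personnel report schedule",
]
postings = build_postings(abstracts)
graph = association_map(postings, len(abstracts), "output", n_first=2, n_second=2)
for word, neighbours in graph.items():
    print(word, "->", neighbours)
```

With the defaults n_first=8 and n_second=3 the function mirrors the counts given in the example. The resulting dictionary is only the link structure; laying it out as a two-dimensional display is a separate step not attempted here.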
From the initial graph, it is further shown that item surrogates might be generated by word selection rules applied to documents to pick up, for example, "New York Air Defense system data outputs D.C." 3/

Continuing related work by Doyle and others at SDC has included various experimental studies of "pseudo-documents" consisting of lists of the twelve most frequently occurring words in 100-item samples of abstracts in various subject fields (Doyle, 1961 [161]). Of special interest in terms of potential improvements and modifications to machine indexing techniques are studies, based on similar lists, looking to the separation of words that may have been used in several different senses, i.e., the detection of homographs by statistical means (Doyle, 1963 [171]). More recent investigations by Doyle involve considerations of differences between word-grouping and document-grouping techniques and of possibilities for use of hybrid methods.

6.2.4 Work of Giuliano and Associates, the ACORN Devices

A program directed toward the design of "an English command and control language system" under an Air Force contract with Arthur D. Little, Inc., involves several interrelated aspects of natural language text processing, use of statistical association factors in search, man-machine interaction during search, and display of associational relationships by means of analog network devices. In this program and in related research, Giuliano and his associates are convinced that:

1/ Doyle, 1961 [169], p. 15.
2/ Doyle, 1962 [163], p. 379.
3/ Doyle, 1961 [169], pp. 24-25.