MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
"The criteria for attributing significance to words . . . may be positional (in virtue of
their occurrence in titles or section headings), or semantic (in virtue of their
relation to words like `summary'), or perhaps even pragmatic (in the case of names
of specialists mentioned in text footnotes, or bibliography)."
"A cataloguer or abstract-writer would naturally give more weight to a technical
word that appears in a title, in a first paragraph, or in a summary. A machine
can be programmed to do the same. It can be instructed to recognize the title by
position and capitalization . . . It can place first-paragraph indications . . . It can
test every heading or subtitle for the words `summary' or `conclusions' and place
a summary indication after each word in the summary paragraphs." 1/
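The positional clues named in the passage above (title, first paragraph, summary or conclusions heading) can be illustrated with a minimal sketch. The numeric weights, the function names, and the representation of a document as a title plus a list of (heading, paragraph) pairs are all assumptions made for illustration; the quoted authors name the clues but assign no values to them, and title recognition by position and capitalization is not attempted here.

```python
import re
from collections import defaultdict

# Hypothetical weights: the source names the clues but gives no numbers.
TITLE_WEIGHT = 3.0
FIRST_PARAGRAPH_WEIGHT = 2.0
SUMMARY_WEIGHT = 2.0
BODY_WEIGHT = 1.0

# Heading words that mark a summary paragraph, as in the quoted passage.
SUMMARY_HEADINGS = ("summary", "conclusions")


def words(text):
    """Return the lowercase alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())


def positional_weights(title, sections):
    """Accumulate a weight per word according to where it occurs.

    `sections` is a list of (heading, paragraph) pairs in document order.
    The title is assumed to be already identified.
    """
    scores = defaultdict(float)
    for w in words(title):
        scores[w] += TITLE_WEIGHT
    for i, (heading, paragraph) in enumerate(sections):
        weight = BODY_WEIGHT
        if i == 0:
            # "first-paragraph indication"
            weight = FIRST_PARAGRAPH_WEIGHT
        if any(h in heading.lower() for h in SUMMARY_HEADINGS):
            # "summary indication" for paragraphs under a summary heading
            weight = max(weight, SUMMARY_WEIGHT)
        for w in words(paragraph):
            scores[w] += weight
    return dict(scores)
```

Under these assumed weights, a word that appears in the title, in the first paragraph, and in a summary paragraph accumulates credit from all three positions, while a word found only in the body receives the base weight.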
"The statistical criteria . . . by no means exhaust the potential clues to the
representativeness of sentences. Among other plausible clues are certain words
and phrases ... authors use words such as `conclusion', `demonstrate', `disclose',
`prove', `show', and `summary' (and related forms of these) with high frequency in
sentences that contain concise statements about the topic or topics of the article.
The occurrence in a sentence of such a phrase as `it was found that . . .', `the
experiment proves . . .', or `the central problem is . . .' would indicate probably
even more sharply than any single word could that the sentence was likely to be
highly representative of the topics . . ." 2/
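The cue-word and cue-phrase clues described in the passage above can be sketched as a simple sentence scorer. The cue words and phrases are those listed in the quotation; the weights, the function names, and the regular expression standing in for "related forms" are assumptions of this sketch, not a procedure given in the source.

```python
import re

# Cue words from the quoted passage. The listed inflections approximate
# "related forms of these" and are an assumption of this sketch.
CUE_WORD = re.compile(
    r"\b(?:conclusions?|conclud(?:e|es|ed|ing)"
    r"|demonstrat(?:e|es|ed|ing|ion)"
    r"|disclos(?:e|es|ed|ing|ure)"
    r"|prov(?:e|es|ed|en|ing)"
    r"|show(?:s|n|ed|ing)?"
    r"|summar(?:y|ies|ize|izes|ized))\b",
    re.IGNORECASE,
)

# Cue phrases from the quoted passage, matched with single spaces only.
CUE_PHRASES = (
    "it was found that",
    "the experiment proves",
    "the central problem is",
)

# Hypothetical weights: a phrase counts for more than a single word,
# following the remark that phrases indicate "even more sharply".
WORD_WEIGHT = 1.0
PHRASE_WEIGHT = 2.0


def cue_score(sentence):
    """Score a sentence by the cue words and cue phrases it contains."""
    lowered = sentence.lower()
    score = WORD_WEIGHT * len(CUE_WORD.findall(sentence))
    score += PHRASE_WEIGHT * sum(lowered.count(p) for p in CUE_PHRASES)
    return score


def rank_sentences(sentences):
    """Return the sentences ordered from highest to lowest cue score."""
    return sorted(sentences, key=cue_score, reverse=True)
```

A sentence containing a cue phrase that itself includes a cue word (for example `the experiment proves') is credited for both in this sketch. Whether such double counting is desirable is a design choice the source does not address.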
3.3.6 Recent Examples of Mixed Systems Experimentation
The suggestions sampled above for the use of various special clues in automatic
extraction make it clear that improved systems will depend largely upon a mixture
of means for determining the subject-representativeness of words, phrases, and
sentences. Many of the clues suggested by Edmundson and Wyllys continue to be
explored in mixed systems, for example at RAND 3/ and at the System Development
Corporation (1962 [590]). Two specific recent examples of mixed systems experimentation
are the automatic abstracting experiment programs at Thompson Ramo-Wooldridge and
the work involving detection of first incidences of nouns at the Harvard Computation
Laboratory.
The TRW programs to investigate possibilities of computer generation of document
auto-abstracts, involving both English and Russian language texts, are based upon a
combination of four different methods to measure significance and determine
representativeness. These four methods are briefly described as follows:
The Key method has as its source of machine-recognizable clues the specific
characteristics of the body of the document and is based on a Key Glossary of
content words taken from the body of the document.
1/ Edmundson and Wyllys, 1961 [181], pp. 227 and 229.
2/ Wyllys, 1963 [653], p. 25.
3/ See National Science Foundation's CR&D report No. 11, [430], pp. 314-315.