this list exhaustive. This produced the following
results:
Relevant = 55
Rel_ret = 17
R-Precision = 0.2182
Thus we reduced the recall but increased the precision.
Presumably, by adding more specific cancers (or at least
those that are statistically most common) we could
have improved the recall here.
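For reference, R-Precision here is precision measured at rank R, where R is the number of relevant documents for the topic (55 in this case). The following minimal sketch makes the computation explicit; the names are illustrative only, not part of the TOPIC system:

```python
def r_precision(ranked_doc_ids, relevant_ids):
    # Precision at cutoff R, where R = |relevant_ids|.
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    hits = sum(1 for d in ranked_doc_ids[:r] if d in relevant_ids)
    return hits / r

# For this topic, 12 relevant documents in the top 55 ranks
# gives 12/55 = 0.2182, matching the figure reported above.
```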
The second problem is more severe, though. It appears
impossible to build any kind of model that would allow
us to determine, with confidence, that the
person who has died is a US citizen. In our revised
results list we find many prominent persons who died of
a named cancer but who are not US citizens (e.g., the
Venezuelan Ambassador).
In addition, the notion of prominence is also hard to
capture. Of course, we might argue that anyone whose
obituary is on the wire service is prominent by
definition! Be that as it may, we observed a number of
documents that we did not retrieve because we had not
included the specific prominent role indicator in our
Topic. Thus we added the following role words -
"author", "poet", "writer", "artist", "painter" - to the
Topic and got the following results:
Relevant = 55
Rel_ret = 33
R-Precision = 0.0909
Thus we improved the recall but at the expense of the
precision again. Notice that we still have not included
any business or government roles, which presumably
would help retrieve the relevant documents in the WSJ
corpus.
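To make the mechanics concrete, the revised Topic is in effect the original cancer concept conjoined with a disjunction over role words. A minimal term-matching sketch follows; the word lists are illustrative subsets only, and plain token matching stands in for the considerably richer TOPIC query language:

```python
# Illustrative word lists; not the full sets used in our Topic.
ROLE_WORDS = {"author", "poet", "writer", "artist", "painter"}
CANCER_TERMS = {"cancer", "leukemia", "lymphoma", "melanoma"}

def matches_topic(text):
    tokens = set(text.lower().split())
    has_cancer = bool(CANCER_TERMS & tokens)  # original concept
    has_role = bool(ROLE_WORDS & tokens)      # newly added OR'd disjuncts
    return has_cancer and has_role

print(matches_topic("the poet died of leukemia at 71"))  # True
```

Extending ROLE_WORDS with business and government titles would be the analogous step for the WSJ corpus.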
Our conclusion is that this is a significant challenge for
Topic, and for all other systems. The citizenship question
often cannot be resolved by reference to the text alone,
and we see no alternative but to accept the false hits.
Prominence is also difficult, but could conceivably be
approached by an extensive list of prominence and role
words. The specific cancer seems tractable since there
are only a finite number of cancers and just a small set
of those are common.
3.4.2.3 AD HOC TOPIC 133
A relevant document for this topic must describe some
design feature of the Hubble Space Telescope, but must
not report on the launch activity itself, nor on the Hubble
Constant or Edwin Hubble.
The official Topic was essentially a simple structure of
the form: "Hubble Space Telescope" AND NOT launch AND
NOT Edwin Hubble. This gave the following results:
Relevant = 80
Rel_ret = 29
R-Precision = 0.3625
which is surprisingly poor given the apparent simplicity
of the topic.
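The strictness becomes apparent if the official structure is read as document-level boolean filtering. The sketch below uses plain substring matching purely for illustration; TOPIC's actual negation operator is what is analyzed next:

```python
def strict_topic_133(text):
    # "Hubble Space Telescope" AND NOT launch AND NOT "Edwin Hubble",
    # applied at the whole-document level.
    t = text.lower()
    return ("hubble space telescope" in t
            and "launch" not in t
            and "edwin hubble" not in t)

doc = ("The Hubble Space Telescope's 2.4-meter primary mirror was "
       "polished to exacting tolerances well before launch.")
print(strict_topic_133(doc))  # False: one passing mention of
                              # "launch" discards a design document
```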
Analysis of the behavior of the negation function in
Topic shows that it is too restrictive, and so we
eliminated the negated concepts, leaving just the phrase
"Hubble Space Telescope". Using this as the query
gave:
Relevant = 80
Rel_ret = 78
R-Precision = 0.6000
which would have been above median and close to best.
Adding as disjuncts (OR) the words "Hubble" and "HST"
gave:
Relevant = 80
Rel_ret = 79
R-Precision = 0.6000
that is, we retrieved one extra relevant document with no
decrease in precision.
We conclude that although the information need
statement is careful to spell out the cases where the
document will be non-relevant, the TREC corpus has
few documents where these conditions apply, so that a
simple query performs very well. This is presumably
the approach most sites took.
4. FINAL OBSERVATIONS FROM
TREC-2
The TREC-2 topic descriptions, particularly the ad hoc
topics, exceed the level of domain knowledge available
to most users of heterogeneous document collections.
Most operational users of Topic (content-based) search
are driven by time pressures to locate/summarize the most
relevant details in the fewest possible documents. The
exhaustive search result analysis implied by examining
hundreds of relevant documents will not be undertaken in
most user environments; our experience is that ten to
thirty documents is the level of search result analysis
performed by a user (unless significant duplication of
material occurs earlier, which would reduce the number
of documents actually analyzed). Ergonomically, high
precision in the first (10, 20, ... 50) documents is more
likely to keep users attracted than high recall at much
larger counts.
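Under that view, the quantity of interest is precision within the first k documents rather than recall over the full ranking; a minimal sketch, again with illustrative names:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    # Fraction of the first k retrieved documents that are relevant.
    return sum(1 for d in ranked_doc_ids[:k] if d in relevant_ids) / k

# e.g., compare precision_at_k(run, qrels, 10)
# against precision_at_k(run, qrels, 50)
```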
Although we have yet to perform any analysis of
duplicate information on the TREC-2 results, our belief
is that duplicate data is plentiful in the TREC-2 "relevant
lists", and that the reading of duplicate data by the
human user will cause the result analysis to be
(prematurely) terminated.
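We have yet to perform this analysis; one conceivable approach would be word-shingle overlap between documents in a result list, sketched below purely as a hypothetical (nothing of the kind was run on the TREC-2 data):

```python
def shingles(text, k=4):
    # All contiguous k-word sequences in the document.
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def near_duplicate(a, b, threshold=0.8):
    # Jaccard overlap of shingle sets as a duplicate heuristic.
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return False
    return len(sa & sb) / len(sa | sb) >= threshold
```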
We are certain that, unless summarization is performed,
the relevant search results on most topics are too
numerous to warrant user attention. It would seem