this list exhaustive. This produced the following
results:
Relevant = 55
Rel_ret = 17
R-Precision = 0.2182
Thus we reduced the recall but increased the precision.
Presumably, by adding more specific cancers (or at least
those that are statistically most common) we could
have improved the recall here.
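For reference, R-Precision here is precision measured at rank R, where R is the number of relevant documents for the topic (55 in this case). The following minimal sketch makes the computation explicit; the names are illustrative only, not part of the TOPIC system:

```python
def r_precision(ranked_doc_ids, relevant_ids):
    # Precision at cutoff R, where R = |relevant_ids|.
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    hits = sum(1 for d in ranked_doc_ids[:r] if d in relevant_ids)
    return hits / r

# For this topic, 12 relevant documents in the top 55 ranks
# gives 12/55 = 0.2182, matching the figure reported above.
```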
The second problem is more severe, though. It appears
impossible to build any kind of model that would allow
us to determine, with confidence, that the
person who has died is a US citizen. In our revised
results list we find many prominent persons who died of
a named cancer but who are not US citizens (e.g., the
Venezuelan Ambassador).
In addition, the notion of prominence is also hard to
capture. Of course, we might argue that anyone whose
obituary is on the wire service is prominent by
definition! Be that as it may, we observed a number of
documents that we did not retrieve because we had not
included the specific prominent role indicator in our
Topic. Thus we added the following role words -
"author", "poet", "writer", "artist", "painter" - to the
Topic and got the following results:
Relevant = 55
Rel_ret = 33
R-Precision = 0.0909
Thus we improved the recall but at the expense of the
precision again. Notice that we still have not included
any business or government roles, which presumably
would help retrieve the relevant documents in the WSJ
corpus.
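To make the mechanics concrete, the revised Topic is in effect the original cancer concept conjoined with a disjunction over role words. A minimal term-matching sketch follows; the word lists are illustrative subsets only, and plain token matching stands in for the considerably richer TOPIC query language:

```python
# Illustrative word lists; not the full sets used in our Topic.
ROLE_WORDS = {"author", "poet", "writer", "artist", "painter"}
CANCER_TERMS = {"cancer", "leukemia", "lymphoma", "melanoma"}

def matches_topic(text):
    tokens = set(text.lower().split())
    has_cancer = bool(CANCER_TERMS & tokens)  # original concept
    has_role = bool(ROLE_WORDS & tokens)      # newly added OR'd disjuncts
    return has_cancer and has_role

print(matches_topic("the poet died of leukemia at 71"))  # True
```

Extending ROLE_WORDS with business and government titles would be the analogous step for the WSJ corpus.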
Our conclusion is that this is a significant challenge for
Topic, and for all other systems. The citizenship question
often cannot be resolved by reference to the text alone,
and we see no alternative but to accept the false hits.
Prominence is also difficult, but could conceivably be
approached by an extensive list of prominence and role
words. The specific cancer seems tractable since there
are only a finite number of cancers and just a small set
of those are common.
3.4.2.3 AD HOC TOPIC 133
A relevant document for this topic must describe some
design feature of the Hubble Space Telescope, but must
not report on the launch activity itself, nor on the Hubble
Constant or Edwin Hubble.
The official Topic was essentially a simple structure of
the form: "Hubble Space Telescope" AND NOT launch AND
NOT Edwin Hubble. This gave the following results:
Relevant = 80
Rel_ret = 29
R-Precision = 0.3625
which is surprisingly poor given the apparent simplicity
of the topic.
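The strictness becomes apparent if the official structure is read as document-level boolean filtering. The sketch below uses plain substring matching purely for illustration; TOPIC's actual negation operator is what is analyzed next:

```python
def strict_topic_133(text):
    # "Hubble Space Telescope" AND NOT launch AND NOT "Edwin Hubble",
    # applied at the whole-document level.
    t = text.lower()
    return ("hubble space telescope" in t
            and "launch" not in t
            and "edwin hubble" not in t)

doc = ("The Hubble Space Telescope's 2.4-meter primary mirror was "
       "polished to exacting tolerances well before launch.")
print(strict_topic_133(doc))  # False: one passing mention of
                              # "launch" discards a design document
```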
Analysis of the behavior of the negation function in
Topic shows that it is too restrictive, and so we
eliminated the negated concepts, leaving just the phrase
"Hubble Space Telescope". Using this as the query
gave:
Relevant = 80
Rel_ret = 78
R-Precision = 0.6000
which would have been above median and close to best.
Adding as disjuncts (OR) the words "Hubble" and "HST"
gave:
Relevant = 80
Rel_ret = 79
R-Precision = 0.6000
that is, we retrieved one extra relevant document with no
decrease in precision.
We conclude that although the information need
statement is careful to spell out the cases where the
document will be non-relevant, the TREC corpus has
few documents where these conditions apply, so that a
simple query performs very well. This is presumably
the approach most sites took.
4. FINAL OBSERVATIONS FROM
TREC-2
The TREC-2 topic descriptions, particularly the ad hoc
topics, exceed the level of domain knowledge available
to most users of heterogeneous document collections.
Most operational users of Topic (content-based) search
are driven by time pressures to locate/summarize the most
relevant details in the fewest possible documents. The
exhaustive search result analysis implied by examining
hundreds of relevant documents will not be undertaken in
most user environments; our experience is that ten to
thirty documents is the level of search result analysis
performed by a user (unless significant duplication of
material occurs earlier, which would reduce the number
of documents actually analyzed). Ergonomically, high
precision in the first (10, 20, ... 50) documents is more
likely to keep users attracted than high recall at much
larger counts.
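Under that view, the quantity of interest is precision within the first k documents rather than recall over the full ranking; a minimal sketch, again with illustrative names:

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k):
    # Fraction of the first k retrieved documents that are relevant.
    return sum(1 for d in ranked_doc_ids[:k] if d in relevant_ids) / k

# e.g., compare precision_at_k(run, qrels, 10)
# against precision_at_k(run, qrels, 50)
```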
Although we have yet to perform any analysis of
duplicate information on the TREC-2 results, our belief
is that duplicate data is plentiful in the TREC-2 "relevant
lists", and that the reading of duplicate data by the
human user will cause the result analysis to be
(prematurely) terminated.
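We have yet to perform this analysis; one conceivable approach would be word-shingle overlap between documents in a result list, sketched below purely as a hypothetical (nothing of the kind was run on the TREC-2 data):

```python
def shingles(text, k=4):
    # All contiguous k-word sequences in the document.
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def near_duplicate(a, b, threshold=0.8):
    # Jaccard overlap of shingle sets as a duplicate heuristic.
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return False
    return len(sa & sb) / len(sa | sb) >= threshold
```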
We are certain that, unless summarization is performed,
the relevant search results on most topics are too
numerous to warrant user attention. It would seem