(documents not in the LSI scaling were mislabeled), so the lsial run results are incomplete and misleading. We have corrected this translation problem, and the corrected results are labeled lsial*. These results are summarized in Table 2. We have not yet completed the comparison against the 9 separate subspaces from TREC-1.

                   lsiasm     lsial      lsial*
                             (error)   (correct)
    Rel_ret          7869      4756       6987
    Avg prec        .3018     .1307      .2505
    Pr at 100       .4306     .2664      .3922
    Pr at 10        .5020     .3340      .5100
    R-prec          .3580     .1937      .3069
    Q >= Median    37 (2)    16 (1)     25 (1)
    Q <  Median    13 (0)    34 (7)     25 (0)

Table 2: LSI Adhoc Results. Comparison of standard vector method with LSI (corrected version, but missing relevance judgements).

In terms of absolute levels of performance, both lsiasm and lsial* are about average. The SMART results (lsiasm) are somewhat worse than the TREC-2 SMART results reported by Buckley et al., Fuhr et al., or Voorhees, but this is because we used slightly different pre-processing options and did not include phrases. Although it is generally difficult to compare across systems, the SMART (lsiasm) and LSI (lsial*) runs can meaningfully be compared since both use the same pre-processing; the starting term-document matrix was the same in both cases. Much to our disappointment, the reduced-dimension LSI performance appears to be somewhat worse than the comparable SMART vector method. However, it is important to realize that many of the documents returned by lsial* were not judged for relevance because they were not submitted as an official run. Table 3 shows the number of documents for which there are no judgements.

Consider the results for just the top 100 documents for each query (i.e., the documents judged by the NIST assessors). For lsiasm, all 5000 documents were judged since this was an official run, and 2153 were relevant. For lsial*, only 4073 documents were judged and almost as many, 2122, were relevant. Thus, if only 31 of the 927 unjudged lsial* documents are relevant (2122 + 31 = 2153), LSI performance would be comparable to SMART performance, and if more than 31 were relevant, LSI performance would be somewhat better. Similarly, for the top 1000 documents, lsial* had more than 4000 more documents without relevance judgements than did lsiasm.

                       top 100             top 1000
                    lsiasm   lsial*     lsiasm   lsial*
    relevant          2153     2122       7869     6987
    not-relevant      2847     1961      15559    12230
    not-judged           0      927      26572    30694

Table 3: Summary of missing relevance judgements for standard vector method and LSI.

Because the missing relevance judgements make direct comparisons between SMART and LSI difficult, we decided to look at performance for just the documents for which we had relevance judgements. That is, we looked at performance considering just the 38175 unique documents for which we have adhoc relevance judgements. These results are shown in Table 4.

                   lsiasm    lsial*
    Documents       38175     38175
    Rel_ret          9493      9596
    Avg prec        .3700     .3789
    Pr at 100       .4306     .4466
    Pr at 10        .5020     .5220
    R-prec          .3977     .3995

Table 4: LSI Adhoc Results. Comparison of standard vector method with LSI using only documents for which relevance judgements were available.
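To make the difference between these scorings concrete, the following is a minimal sketch (not the evaluation code actually used for these runs; the qrels dictionary, ranking list, and function names are illustrative placeholders) of the per-topic bookkeeping behind Tables 2-4: counting relevant, not-relevant, and not-judged documents at a cutoff as in Table 3, and computing precision at k either with unjudged documents counted as non-relevant (the official treatment underlying Table 2) or with unjudged documents removed from the ranking first (the treatment underlying Table 4).

    from collections import Counter

    def judge_counts(ranked_docs, qrels_for_topic, k):
        # Break the top-k documents of one topic's ranking into the three
        # categories reported in Table 3.
        counts = Counter()
        for doc in ranked_docs[:k]:
            if doc not in qrels_for_topic:
                counts["not-judged"] += 1
            elif qrels_for_topic[doc]:
                counts["relevant"] += 1
            else:
                counts["not-relevant"] += 1
        return counts

    def precision_at_k(ranked_docs, qrels_for_topic, k, judged_only=False):
        # Precision at cutoff k. With judged_only=True, unjudged documents are
        # dropped from the ranking before scoring (the Table 4 treatment);
        # otherwise they simply count as non-relevant (the official scoring).
        if judged_only:
            ranked_docs = [d for d in ranked_docs if d in qrels_for_topic]
        top = ranked_docs[:k]
        return sum(1 for d in top if qrels_for_topic.get(d, 0)) / k

    # Toy data for a single topic: d3 and d5 were never judged by the assessors.
    qrels = {"d1": 1, "d2": 0, "d4": 1}        # docno -> binary relevance
    ranking = ["d1", "d3", "d2", "d4", "d5"]   # system output, best first

    print(judge_counts(ranking, qrels, k=3))                      # relevant 1, not-judged 1, not-relevant 1
    print(precision_at_k(ranking, qrels, k=3))                    # ~0.33 (d3 counted as non-relevant)
    print(precision_at_k(ranking, qrels, k=3, judged_only=True))  # ~0.67 (d3 removed, d4 moves up)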
The most striking aspect of these results is the higher overall levels of performance. This is to be expected since we are only considering the 38175 documents for which we have relevance judgements; there are 700k fewer documents than in the official results. Considering only this subset of documents, there is a small advantage for LSI compared to the SMART vector method. Taken together with the results for just the top 100 documents, these results suggest that LSI can outperform a straightforward vector method. We were somewhat disappointed at the relatively