The table below illustrates the problem of weighting phrasal terms, using topic 101 and a relevant document (WSJ870226-0091).

Topic 101 matches WSJ870226-0091
(duplicate terms not shown)

TERM                    TF.IDF   NEW WEIGHT
sdi                       1750         1750
eris                      3175         3175
star                      1072         1072
wars                      1670         1670
laser                     1456         1456
weapon                    1639         1639
missile                    872          872
space+base                2641         2105
interceptor               2075         2075
exoatmospheric            1879         3480
system+defense            2846         2219
reentry+vehicle           1879         3480
initiative+defense        1646         2032
system+interceptor        2526         3118
DOC RANK                    30           10

Changing the weighting scheme for compound terms, along with other minor improvements (such as expanding the stopword list for topics, or correcting a few parsing bugs), has led to an overall increase in precision of nearly 20% over our official TREC-2 ad-hoc results. Table 4 summarizes these new runs for queries 101-150 against the WSJ database. Similar improvements have been obtained for queries 51-100.

Run                 nyuir1   nyuir1a  nyuir2   nyuir2a
Name                ad-hoc   ad-hoc   ad-hoc   ad-hoc
Queries             50       50       50       50

Total number of docs over all queries
  Ret               49884    50000    49876    50000
  Rel               3929     3929     3929     3929
  RelRet            2983     3108     3274     3401

Recall (interp) Precision Averages
  0.00              0.7013   0.7201   0.7528   0.8063
  0.10              0.4874   0.5239   0.5567   0.6198
  0.20              0.4326   0.4751   0.4721   0.5566
  0.30              0.3531   0.4122   0.4060   0.4786
  0.40              0.3076   0.3541   0.3617   0.4257
  0.50              0.2637   0.3126   0.3135   0.3828
  0.60              0.2175   0.2752   0.2703   0.3380
  0.70              0.1617   0.2142   0.2231   0.2817
  0.80              0.1176   0.1605   0.1667   0.2164
  0.90              0.0684   0.1014   0.0915   0.1471
  1.00              0.0102   0.0194   0.0154   0.0474

Average precision over all rel docs
  Avg               0.2649   0.3070   0.3111   0.3759

Precision at
  5 docs            0.4920   0.5200   0.5360   0.6040
  10 docs           0.4420   0.4900   0.4880   0.5580
  15 docs           0.4240   0.4653   0.4693   0.5253
  20 docs           0.4050   0.4420   0.4390   0.4980
  30 docs           0.3640   0.3993   0.4067   0.4607
  100 docs          0.2720   0.2914   0.3094   0.3346
  200 docs          0.1886   0.2064   0.2139   0.2325
  500 docs          0.1026   0.1103   0.1137   0.1229
  1000 docs         0.0597   0.0622   0.0655   0.0680

R-Precision (after Rel)
  Exact             0.3003   0.3332   0.3320   0.3950

Table 4. Automatic ad-hoc run statistics for queries 101-150 against the WSJ database: (1) nyuir1 - TREC-2 official run using two topic fields only; (2) nyuir1a - revised term weighting run; (3) nyuir2 - official TREC-2 run using three topic fields only; and (4) nyuir2a - revised weighting run.

The results of the routing runs against the SJMN database are somewhat more troubling. Applying the new weighting scheme, we did see the average precision increase by some 5 to 12% (see column 4 in Table 3), but the results remain far below those for the ad-hoc runs. Direct runs of queries 51-100 against the SJMN database produce results that are about the same as in the routing runs (which may indicate that our routing scheme works fine); however, the same queries run against the WSJ database have retrieval precision some 25% above the SJMN runs. This may indicate some problems with the SJMN database or the relevance judgements for it.

'HOT SPOT' RETRIEVAL

Another difficulty with frequency-based term weighting arises when a long document needs to be retrieved on the basis of a few short relevant passages. If the bulk of the document is not directly relevant to the query, then there is a strong possibility that the document will score low in the final ranking, despite some strongly relevant material in it. This problem can be dealt with by subdividing long documents at paragraph breaks, or into approximately equal-length fragments, and indexing the database with respect to these (e.g., Kwok 1993).
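To make the fragment-based alternative concrete, the sketch below splits a long document into approximately equal-length fragments and ranks the document by its best-scoring fragment. This is only an illustrative sketch: the fragment size, the helper names, and the plain tf.idf scoring used here are assumptions for exposition, not the weighting scheme of this paper or of Kwok (1993).

import math
from collections import Counter

def split_into_fragments(tokens, size=200):
    # Split a token list into approximately equal-length fragments.
    return [tokens[i:i + size] for i in range(0, len(tokens), size)] or [tokens]

def tfidf_score(query_terms, fragment_tokens, doc_freq, num_docs):
    # Score one fragment: sum of tf * idf over query terms found in it.
    # (Plain tf.idf is an assumption standing in for the paper's weighting.)
    tf = Counter(fragment_tokens)
    score = 0.0
    for term in query_terms:
        if tf[term] and doc_freq.get(term):
            idf = math.log(num_docs / doc_freq[term])
            score += tf[term] * idf
    return score

def hot_spot_score(query_terms, doc_tokens, doc_freq, num_docs, size=200):
    # Rank a document by its best fragment rather than by the whole text.
    return max(tfidf_score(query_terms, frag, doc_freq, num_docs)
               for frag in split_into_fragments(doc_tokens, size))

Under such a scheme, a very long document with one strongly relevant passage competes on the strength of that passage alone, rather than being dragged down by the bulk of non-relevant surrounding text.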
While such approaches are effective, they also tend to be costly because of increased index size and more complicated