from the WSJ training database:

GTS (takeover) = 0.00145576
GTS (merge)    = 0.00094518
GTS (buy-out)  = 0.00272580
GTS (acquire)  = 0.00057906

with

SIM (takeover, merge)   = 0.190444
SIM (takeover, buy-out) = 0.157410
SIM (takeover, acquire) = 0.139497
SIM (merge, buy-out)    = 0.133800
SIM (merge, acquire)    = 0.263772
SIM (buy-out, acquire)  = 0.109106

Therefore both takeover and buy-out can be used to specialize merge or acquire. With this filter, the relationships between takeover and buy-out and between merge and acquire are either both discarded or accepted as synonymous. At this time we are unable to tell synonymous or near-synonymous relationships from those which are primarily complementary, e.g., man and woman.

In TREC-1 the impact of query expansion through term similarities on the system's overall performance was generally disappointing. For TREC-2 we have made a number of changes to the term correlation model, but again time limitations prevented us from properly testing all options. Among the most important changes are:

(1) Exclusion of pairs obtained from SUBJECT-VERB relations: we determined that these contexts are generally of little use, as neither subject nor verb subcategorizes well for the other. Moreover, we observed that the presence of these pairs was the source of many unwanted term associations.11

(2) Automatic pruning of low-content terms from the queries: terms with low idf weights, and terms with low information contribution weights that are elements of compound terms, are removed from queries before database search. As we tuned various cutoff thresholds we noted that a significant increase in both recall and precision could be obtained.12

11 Subject-Verb pairs were retained as compound terms, however.
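The pruning step in item (2) can be illustrated with a minimal sketch. It is not the system's actual implementation: the function name `prune_query`, the dictionary layout, and the cutoff values `min_idf` and `min_ic` are all hypothetical, and the text does not report the thresholds that were used. The sketch assumes idf and information contribution weights have already been computed.

```python
def prune_query(single_terms, compound_terms, idf, ic, min_idf=2.0, min_ic=0.1):
    """Drop low-content single terms from a query (illustrative only).

    single_terms   : list of single-word query terms
    compound_terms : list of (x, y) pairs kept as compound terms
    idf            : dict term -> idf weight
    ic             : dict (term, (x, y)) -> information contribution weight
    min_idf, min_ic: hypothetical cutoffs, not values from the paper
    """
    kept = []
    for term in single_terms:
        # Criterion 1: terms with low idf weight are removed.
        if idf[term] < min_idf:
            continue
        # Criterion 2: a term that is an element of compound terms is removed
        # when its information contribution to each of them is low.
        owners = [pair for pair in compound_terms if term in pair]
        if owners and all(ic.get((term, pair), 0.0) < min_ic for pair in owners):
            continue
        kept.append(term)
    return kept


query_terms = ["company", "takeover", "hostile", "bid"]
compounds = [("takeover", "bid")]
idf_weights = {"company": 0.5, "takeover": 4.0, "hostile": 5.0, "bid": 3.0}
ic_weights = {
    ("takeover", ("takeover", "bid")): 0.6,
    ("bid", ("takeover", "bid")): 0.05,
}
pruned = prune_query(query_terms, compounds, idf_weights, ic_weights)
```

With these made-up weights, "company" is dropped for low idf and "bid" for low information contribution, while the compound term itself stays in the query.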
12 The Information Contribution measure indicates the strength of word pairings, and is defined as

$IC(x, [x,y]) = \frac{f_{x,y}}{n_x + d_x - 1}$

where $f_{x,y}$ is the absolute frequency of pair [x,y] in the corpus, $n_x$ is the frequency of term x at the head position, and $d_x$ is a dispersion parameter understood as the number of distinct syntactic contexts in which term x is found.

word          cluster
takeover      merge, buy-out, acquire, bid
benefit       compensate, aid, expense
capital       cash, fund, money
staff         personnel, employee, force
attract       lure, draw, woo
sensitive     crucial, difficult, critical
speculate     rumor, uncertainty, tension
president     director, executive, chairman
vice          deputy
outlook       forecast, prospect, trend
law           rule, policy, legislate, bill
earnings      profit, revenue, income
portfolio     asset, invest, loan
inflate       growth, demand, earnings
industry      business, company, market
growth        increase, rise, gain
firm          bank, concern, group, unit
environ       climate, condition, situation
debt          loan, secure, bond
lawyer        attorney
counsel       attorney, administrator, secretary
compute       machine, software, equipment
competitor    rival, competition, buyer
alliance      partnership, venture, consortium
big           large, major, huge, significant
fight         battle, attack, war, challenge
base          facile, source, reserve, support
shareholder   creditor, customer, client, investor, stockholder

Table 1. Selected clusters obtained from syntactic contexts, derived from approx. 40 million words of WSJ text, with weighted Tanimoto formula.
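The Information Contribution measure from footnote 12 can be computed from pair counts roughly as follows. This is a sketch under stated assumptions: the function name `information_contribution` is hypothetical, each input tuple stands for one corpus occurrence of a pair [x,y] with x in the head position, and the dispersion d_x is approximated by the number of distinct partners y observed with x, which is one possible reading of "distinct syntactic contexts".

```python
from collections import Counter, defaultdict


def information_contribution(pairs):
    """Return IC(x, [x,y]) = f_xy / (n_x + d_x - 1) for each observed pair.

    pairs: iterable of (x, y) tuples, one tuple per occurrence in the corpus,
           with x taken to be the term in the head position (assumption).
    """
    pair_freq = Counter(pairs)        # f_xy: absolute frequency of [x,y]
    head_freq = Counter()             # n_x: frequency of x at the head position
    contexts = defaultdict(set)       # distinct partners of x, used for d_x
    for (x, y), f in pair_freq.items():
        head_freq[x] += f
        contexts[x].add(y)
    return {
        (x, y): f / (head_freq[x] + len(contexts[x]) - 1)
        for (x, y), f in pair_freq.items()
    }


# Toy counts, invented for illustration only.
observed = [("takeover", "bid")] * 3 + [("takeover", "hostile")]
ic = information_contribution(observed)
```

For the toy data, n_x = 4 and d_x = 2 for "takeover", so the pair seen three times receives 3 / (4 + 2 - 1) and the pair seen once receives 1 / (4 + 2 - 1).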