are none of these in the current example), and lower values
to features outside the optimal tree.8 At this point the
specific values chosen represent our "best guess" at a
weighting scheme; further experimentation will undoubtedly
reveal a better strategy. As in the first canonical form, the overall
weight for the TOPIC tree is based on the cross-validation
rate for the maximal tree.


4 The TREC-2 Experiments

      For TREC-2 we again focused only on the document
routing problem. Since our technique requires training
data, it does not easily lend itself to the ad hoc retrieval
problem, and so rather than "force-fit" it we chose to
generate four sets of results for the routing queries (topics
51-100). Each set of results was generated totally
automatically. The result sets are labelled ads1, ads2, ads3, and
ads4, and the table below shows to which combinations of
features and TOPIC models they correspond.

             Table 1: Results Identification

   Result Set     Word Features    TOPIC Model

     ads1           stemmed          model-1
     ads2           unstemmed        model-1
     ads3           stemmed          model-2
     ads4           unstemmed        model-2
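
The four result sets are the cross product of the two TOPIC models and the two kinds of word features. A minimal sketch of that enumeration follows; the variable names are illustrative and are not taken from the original system.

```python
from itertools import product

# Labels as used in Table 1; the variable names are hypothetical.
topic_models = ["model-1", "model-2"]
word_features = ["stemmed", "unstemmed"]

# Pair each model with each feature type. The feature type varies fastest,
# which reproduces the row order of Table 1.
result_sets = {
    f"ads{i}": {"word_features": feats, "topic_model": model}
    for i, (model, feats) in enumerate(product(topic_models, word_features), start=1)
}

for run_id, combo in result_sets.items():
    print(run_id, combo["word_features"], combo["topic_model"])
```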

4.1 The Experimental Procedure

      The experimental procedure for TREC-2 consists
of five basic steps. We briefly describe each of these:
  *  First, we generated the CART training data from the
     information need statements and the ground truth
     files (i.e., the qrels) provided by NIST. This produced
     two feature sets for each topic (corresponding to the
     stemmed and unstemmed versions of the features),
     and two sets of training vectors labelled with the
     ground truth information.9 Since CART is a
     statistically oriented classifier, we decided to minimize the
     "noise" in the training sets by using only the Wall
     Street Journal articles identified in the qrel files.
     Further, for all but topics 80 and 81, we used just the
     Wall Street Journal articles on Disk 2.
  *  Second, we grew the CART trees from this training
     data. Since we had two sets of training data for each
     topic, we grew two trees for each topic.
  *  Third, we used the algorithms described in Section 3
     to convert the CART trees into a TOPIC readable
     form. This produced four TOPIC definitions for each
     of the information need statements. Table 1 above
     shows the various combinations.
  *  Fourth, we ran the TOPIC definitions against the
     indexed unseen data.10 Again, to minimize noise
     effects, we used only the Associated Press articles on
     Disk 3 to generate our official results.
  *  Fifth, we sorted and merged the results generated by
     TOPIC and converted them into the TREC format for
     scoring by NIST.

4.2 Discussion of Official Results

       The official results for ads1 and ads2, together with
the unofficial results for ads3 and ads4, are shown in
Table 2.

      Although we generated four sets of results, the
resource constraints at NIST resulted in only ads1 and ads2
being officially scored. References in the remainder of the
paper to scores associated with ads3 and ads4 are to the
unofficial scores generated by us using the TREC-2 scoring
program and the published qrels for the routing topics.

8. A variable is in the optimal tree if its k-value is greater than k*,
on the fringe if k=k*, and outside the optimal tree if k<k*. Note
that in general the individual features appear at multiple locations
in the tree. Our strategy is to remove duplicates by retaining the
instance with the highest k-value.
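
The rule in footnote 8 can be sketched in a few lines. This is only an illustration: the feature names, the k-values, and the value of k* below are made up, and "outside the optimal tree" is taken to be the remaining case, a k-value below k*.

```python
def dedupe_features(instances):
    """Keep, for each feature, the instance with the highest k-value."""
    best = {}
    for name, k in instances:
        if name not in best or k > best[name]:
            best[name] = k
    return best


def classify(k, k_star):
    """Place a feature relative to the optimal tree, following footnote 8."""
    if k > k_star:
        return "in optimal tree"
    if k == k_star:
        return "fringe"
    return "outside optimal tree"  # assumed to mean k < k*


# Hypothetical (feature, k-value) pairs; a feature may occur at several nodes.
instances = [("merger", 5), ("merger", 2), ("stock", 3), ("bank", 1)]
k_star = 3  # hypothetical k* for the optimal tree

placement = {
    name: classify(k, k_star)
    for name, k in dedupe_features(instances).items()
}
print(placement)
```

Deduplicating before classifying means a feature that occurs both inside and outside the optimal tree is treated as being inside it, which is what retaining the highest k-value implies.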



        Table 2: TREC-2 Results (AP Only)

 Run ID   No. Retr.   No. Rel.   Rel. Ret.   Avg. Prec.   Exact Prec.

 ads1      40,423      5,677        822        0.0195       0.0390
 ads2      33,034      5,677      1,468        0.0821       0.1092
 ads3      49,006      5,677      1,182        0.0168       0.0374
 ads4      50,000      5,677      1,847        0.0630       0.0868
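
As a rough cross-check of Table 2, recall and precision over the whole retrieved set can be derived from the counts in each row. The sketch below uses only the counts from the table, with the four runs labelled ads1 through ads4; the derived ratios are not figures reported in the paper, and the set-level precision is not the same measure as the average or exact precision columns.

```python
# Counts copied from Table 2: run -> (retrieved, relevant, relevant retrieved)
table2 = {
    "ads1": (40423, 5677, 822),
    "ads2": (33034, 5677, 1468),
    "ads3": (49006, 5677, 1182),
    "ads4": (50000, 5677, 1847),
}


def recall(run):
    """Fraction of all relevant documents that the run retrieved."""
    _, relevant, rel_retrieved = table2[run]
    return rel_retrieved / relevant


def set_precision(run):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, _, rel_retrieved = table2[run]
    return rel_retrieved / retrieved


for run in table2:
    print(f"{run}: recall={recall(run):.4f} set precision={set_precision(run):.4f}")
```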

       The first observation is that the trees built using
exact words as features (i.e., results ads2 and ads4) had
higher precision than those built using word stems. We


9. The feature specification and extraction procedure we used is
identical to that used in TREC-1 and is described in detail in the
TREC-1 proceedings. The only differences are the addition of a
stemmed version of the features and the fact that we do not make
use of the feature count information.
10. We are grateful to Verity Inc. for allowing us to have access to
their computer systems and databases.