The effect of using this information is shown in Table 5, which reports system performance for Topic 22 of adsba1.

Table 5: Performance of T* as a Function of Output Ordering

Tree/Output Ordering                      Rel-Ret @ 200   Recall @ 200   Precision @ 200
T*: no additional ordering                28              0.2642         0.1400
T*: ordering based on surrogate splits    43              0.4057         0.2150

Without any ordering we detected 28 relevant documents in the first 200; with the ordering scheme just described, this improved to 43. This in turn translated into an increase of 14 points of recall and 7 points of precision at the 200-document point.

The surrogate splits also give us some insight into the overall behavior of the tree selection process as the number of training samples changes. Although the optimal tree for Topic 22 in adsba2 was

    class 0 (0.050)
    drug<=0.50
    class 1 (0.862)

and performed worse than the optimal tree for adsba1, its top three surrogate splits are:

    cocaine<=0.50      [0.89]
    coca<=0.50         [0.88]
    government<=0.5    [0.86]

That is, although the optimal split for the augmented training set changed, we still see the importance of the same set of word features, which in turn indicates a certain stability in the underlying feature space. We might predict that, as the training set increases in size, the splits will also become more stable.

4.5. Commentary

The TREC corpus represents a significant challenge for our system. Our previous results with a small corpus, while encouraging, did not allow us to evaluate how well the technique might do with realistically sized document collections. Our conclusion from the TREC results is that CART does exhibit some interesting behaviors on a realistic corpus and that, despite the small size of the training sets and the restricted choice of features, it produces competitive results for some topics.
So although the overall performance is moderate (relative to the better performing systems at TREC), we believe that the absolute performance (given that the system is totally auto-

9. As yet, there is no theoretical justification for this algorithm. It does, however, have the intuitive property that documents that satisfy additional splits get a higher score in proportion to the "power" of those splits.
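The intuition in footnote 9 can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the additive scoring rule, and the reading of "satisfying" a split as the word being present (count above the threshold) are all assumptions made here for illustration. The three surrogate splits are those quoted for Topic 22. The helper `precision_recall_at_k` simply re-derives the Table 5 figures; the total of 106 relevant documents is not stated in the text but is inferred from 28 / 0.2642.

```python
from typing import Dict, List, Tuple

# (word, threshold, power) for the three surrogate splits quoted for Topic 22.
# Treating the bracketed values as additive weights is an assumption.
SURROGATES: List[Tuple[str, float, float]] = [
    ("cocaine", 0.50, 0.89),
    ("coca", 0.50, 0.88),
    ("government", 0.50, 0.86),
]


def surrogate_score(word_counts: Dict[str, float],
                    surrogates: List[Tuple[str, float, float]] = SURROGATES) -> float:
    """Sum the power of each surrogate split the document satisfies.

    Assumption: a document "satisfies" a split when the word count
    exceeds the threshold, i.e. the word occurs in the document.
    """
    return sum(power for word, threshold, power in surrogates
               if word_counts.get(word, 0.0) > threshold)


def rank_documents(docs: List[Tuple[str, Dict[str, float]]]) -> List[str]:
    """Order document ids by descending surrogate score.

    sorted() is stable, so documents with equal scores keep input order.
    """
    ranked = sorted(docs, key=lambda d: surrogate_score(d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked]


def precision_recall_at_k(relevant_retrieved: int, k: int,
                          total_relevant: int) -> Tuple[float, float]:
    """Return (precision, recall) after the first k retrieved documents."""
    return relevant_retrieved / k, relevant_retrieved / total_relevant


if __name__ == "__main__":
    # Hypothetical documents that the tree has already assigned to class 1.
    docs = [
        ("d1", {"drug": 2}),
        ("d2", {"drug": 1, "cocaine": 3, "coca": 1}),
        ("d3", {"drug": 1, "government": 1}),
    ]
    print(rank_documents(docs))
    print(precision_recall_at_k(28, 200, 106))
    print(precision_recall_at_k(43, 200, 106))
```

Under these assumptions, a document that mentions both "cocaine" and "coca" is ranked ahead of one that satisfies only the primary split on "drug", which is the kind of reordering that moved additional relevant documents into the first 200.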