The effect of using this information is shown in Table 5, which gives the impact on system
performance for Topic 22 of adsba1. Without any ordering we found 28 relevant documents in
the first 200, but with the ordering scheme just described we were able to improve this to
43 in the first 200. This in turn translated into an increase of 14 points of recall and
7.5 points of precision at the 200-document point.

          Table 5: Performance of T* as a Function of Output Ordering

                                                 Rel-Ret   Recall  Precision
                Tree/Output Ordering              @ 200    @ 200    @ 200

         T*: no additional ordering                28      0.2642   0.1400
         T*: ordering based on surrogate splits    43      0.4057   0.2150
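As a check on the arithmetic behind Table 5, the short sketch below recomputes the @200 figures. The total number of relevant documents for Topic 22 is not stated in this excerpt; the value 106 is our own inference from the table (28 / 0.2642 is approximately 106) and should be read as an assumption, as should the helper names `precision_at` and `recall_at`.

```python
# Worked check of Table 5: precision and recall at a cutoff of 200 documents.
# ASSUMPTION: TOTAL_RELEVANT is not given in the text; it is inferred from
# Table 5 (28 / 0.2642 ~= 106).
TOTAL_RELEVANT = 106
CUTOFF = 200


def precision_at(rel_ret: int, cutoff: int = CUTOFF) -> float:
    """Fraction of the top `cutoff` retrieved documents that are relevant."""
    return rel_ret / cutoff


def recall_at(rel_ret: int, total_relevant: int = TOTAL_RELEVANT) -> float:
    """Fraction of all relevant documents that appear in the top `cutoff`."""
    return rel_ret / total_relevant


# Rel-Ret @ 200 values taken directly from Table 5.
ROWS = [("no additional ordering", 28), ("surrogate-split ordering", 43)]

for label, rel_ret in ROWS:
    print(f"{label:26s} recall={recall_at(rel_ret):.4f} "
          f"precision={precision_at(rel_ret):.4f}")

# Differences between the two rows (reported in the text as points of recall/precision).
gain_recall = recall_at(43) - recall_at(28)
gain_precision = precision_at(43) - precision_at(28)
print(f"recall gain = {gain_recall:.4f}, precision gain = {gain_precision:.4f}")
```

Under this assumption the recomputed values match the table to four decimal places, and the gains come out at about 0.14 in recall and 0.075 in precision.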

  The surrogate splits also give us some insight into the overall behavior of the tree
selection process as the number of training samples changes. Thus, although the optimal
tree for Topic 22 in adsba2 was:
     drug<=0.50
         true:  class 0    (0.050)
         false: class 1    (0.862)

and performed worse than the optimal tree for adsba1, its top three surrogate splits are:
     cocaine<=0.50      [0.89]
     coca<=0.50         [0.88]
     government<=0.50   [0.86]

That is, although the optimal split changed for the augmented training set, the same set
of word features remains important, which in turn indicates a certain stability in the
underlying feature space. We might predict that, as the training set grows, the splits
would also become more stable.
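The output ordering in Table 5 is "based on surrogate splits", and footnote 9 says that documents satisfying additional splits get a higher score in proportion to the "power" of those splits. The exact scheme is described earlier in the paper and is not reproduced here, so the following is only a hypothetical illustration of that idea, using the splits listed above as sample data. Several choices are ours, not the paper's: the bracketed surrogate values are used as the "power" of each split, the `feature > threshold` side of every split is assumed to be the relevant side (by analogy with the primary split, where class 1 lies on the `drug > 0.50` side), and the names `Split`, `favours_relevant`, and `order_documents` are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """A threshold split 'feature <= threshold'.

    ASSUMPTION: the 'feature > threshold' side is the relevant (class 1) side.
    """
    feature: str
    threshold: float
    weight: float  # ASSUMPTION: the bracketed surrogate value, used as its "power"

    def favours_relevant(self, doc: dict[str, float]) -> bool:
        # Missing words count as 0 occurrences.
        return doc.get(self.feature, 0.0) > self.threshold


# Primary split and top three surrogates for Topic 22 (adsba2), as listed above.
PRIMARY = Split("drug", 0.50, 1.0)
SURROGATES = [
    Split("cocaine", 0.50, 0.89),
    Split("coca", 0.50, 0.88),
    Split("government", 0.50, 0.86),
]


def order_documents(docs: dict[str, dict[str, float]]) -> list[str]:
    """Rank documents by tree class first, then by summed surrogate weight."""
    def key(doc_id: str) -> tuple[int, float]:
        doc = docs[doc_id]
        tree_class = 1 if PRIMARY.favours_relevant(doc) else 0
        bonus = sum(s.weight for s in SURROGATES if s.favours_relevant(doc))
        return (tree_class, bonus)

    # reverse=True puts class 1 first and higher surrogate scores earlier;
    # ties keep their input order because Python's sort is stable.
    return sorted(docs, key=key, reverse=True)


# Hypothetical documents given as word counts.
EXAMPLE_DOCS = {
    "d1": {"drug": 1},
    "d2": {"drug": 2, "cocaine": 1, "coca": 1},
    "d3": {"cocaine": 3},
    "d4": {"drug": 1, "government": 1},
}

print(order_documents(EXAMPLE_DOCS))
```

With this scoring, `d2` (which contains "drug", "cocaine", and "coca") is ranked ahead of `d1` (which contains only "drug"), even though the single-split tree alone places both in class 1; `d3` stays last because the tree assigns it to class 0.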

4.5. Commentary
  The TREC corpus represents a significant challenge for our system. Our previous
results with a small corpus, while encouraging, did not allow us to evaluate how well
the technique might do with realistically sized document collections. Our conclusion
based on the results we have from TREC is that CART does exhibit some interesting
behaviors on a realistic corpus, and that, despite the small size of the training sets and
the restricted choice of features, for some topics it produces competitive results. So
although the overall performance is moderate (relative to the better performing systems
at TREC), we believe that the absolute performance (given that the system is totally auto-

  9. As yet, there is no theoretical justification for this algorithm. It does, however, have the
  intuitive property that documents that satisfy additional splits get a higher score in proportion
  to the "power" of those splits.

