

         The major procedures used for evaluation in the SMART system

are described elsewhere [3,4].  They are the recall-precision curve and

four global measures:  rank recall, log precision, normalized recall, and

normalized precision.  The measures vary from 0 to 1, with 0 representing

the worst possible performance and 1 representing perfect performance.

These measures all reflect both recall and precision, in that a measure

of 1 requires both perfect recall and perfect precision; however, the

rank recall and normalized recall measures weight recall more heavily than

precision, while the log precision and normalized precision measures weight

precision more heavily than recall.  The "quasi-Cleverdon" recall-precision

curves shown here are recall-precision curves averaged over the set of 42

requests.
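Since the definitions are deferred to the references, the following Python is offered only as a sketch of how the four global measures can be computed. It assumes the standard Salton-style definitions, in which r_1 < ... < r_n are the ranks of the n relevant documents in a ranking of N documents. The function name smart_global_measures is hypothetical and is not part of the SMART system, and treating the 0/0 case of log precision as 1 is likewise an assumption.

```python
import math
from typing import Dict, Sequence


def _log_factorial(k: int) -> float:
    # ln(k!) via the log-gamma function, which avoids overflow for large k.
    return math.lgamma(k + 1)


def smart_global_measures(ranks: Sequence[int], n_docs: int) -> Dict[str, float]:
    """Hypothetical helper: four global measures from the 1-based ranks of
    the relevant documents in a ranking of n_docs documents.

    Assumed (Salton-style) definitions, with n relevant and N total:
      rank recall          = sum(i) / sum(r_i)
      log precision        = sum(ln i) / sum(ln r_i)
      normalized recall    = 1 - (sum r_i - sum i) / (n (N - n))
      normalized precision = 1 - (sum ln r_i - sum ln i) / ln(N! / (n! (N - n)!))
    """
    r = sorted(ranks)
    n, N = len(r), n_docs
    if n == 0 or n >= N or r[0] < 1 or r[-1] > N or len(set(r)) != n:
        raise ValueError("need 1 <= n < N distinct ranks within 1..N")
    ideal = range(1, n + 1)  # ranks the relevant documents get in a perfect ranking
    sum_i, sum_r = sum(ideal), sum(r)
    sum_log_i = sum(math.log(i) for i in ideal)
    sum_log_r = sum(math.log(x) for x in r)
    # ln of the binomial coefficient C(N, n): the largest possible log-rank gap.
    log_binom = _log_factorial(N) - _log_factorial(n) - _log_factorial(N - n)
    return {
        "rank_recall": sum_i / sum_r,
        # Only n == 1 with r = [1] makes both log sums 0; treat that as perfect.
        "log_precision": sum_log_i / sum_log_r if sum_log_r > 0 else 1.0,
        "normalized_recall": 1 - (sum_r - sum_i) / (n * (N - n)),
        "normalized_precision": 1 - (sum_log_r - sum_log_i) / log_binom,
    }


if __name__ == "__main__":
    # Relevant documents ranked 1, 2, 3 out of 200: every measure is 1.0.
    print(smart_global_measures([1, 2, 3], 200))
```

Placing the relevant documents at the bottom of the ranking (ranks 198, 199, 200 of 200) drives normalized recall and normalized precision to 0, which matches the stated range of 0 (worst) to 1 (perfect).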


3.   Results

         Table 1 shows the distribution of association pairs as a function

of word frequency, using the cosine correlation with a cutoff of .6.  It is

seen that the largest number of correlations occurs for words of very low

frequency, namely frequencies 1 and 2.  With the correlation measure used, it is

very easy for low-frequency words to co-occur significantly since, if two

words of frequency 1 occur in the same document, they will always have a

correlation of 1.0.  With a collection of 200 documents, in which 1179

words occur only once, one may expect over 7000 correlations above the

cutoff between pairs of frequency-1 words purely by chance.  If the words

of frequency 2 are also considered, the total number of random correlations

above .6 would be expected to be about 12000.
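The effect of chance co-occurrence can be illustrated with a small Python sketch. It assumes, since the section does not say, that each word is represented as a binary document-incidence vector, so the cosine of two words is |A ∩ B| / sqrt(|A| |B|) over their document sets. The random model in expected_singleton_pairs is likewise hypothetical: each frequency-1 word is placed in a document with probability proportional to a supplied document length. It counts unordered pairs, and a count over a full symmetric correlation matrix would double it. All function names are illustrative, and the sketch is not meant to reproduce the figures reported above.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Set


def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine of two binary document-incidence vectors, each given as the
    set of document ids in which the word occurs (assumed representation)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def pairs_above_cutoff(index: Dict[str, Set[int]], cutoff: float = 0.6) -> int:
    """Count unordered word pairs whose cosine is strictly above the cutoff."""
    return sum(1 for u, v in combinations(index, 2)
               if cosine(index[u], index[v]) > cutoff)


def expected_singleton_pairs(m: int, doc_lengths: Sequence[float]) -> float:
    """Hypothetical random model: expected number of unordered pairs among m
    frequency-1 words that fall in the same document by chance, when each
    word lands in document d independently with probability proportional
    to doc_lengths[d]. Any such pair has cosine 1.0."""
    total = float(sum(doc_lengths))
    p_same_doc = sum((length / total) ** 2 for length in doc_lengths)
    return m * (m - 1) / 2 * p_same_doc


if __name__ == "__main__":
    index = {
        "w1": {7},       # frequency 1
        "w2": {7},       # frequency 1, same document as w1: cosine 1.0
        "w3": {7, 12},   # frequency 2: cosine with w1 is 1/sqrt(2), about 0.707
        "w4": {7, 30},   # frequency 2: cosine with w3 is 0.5, below the cutoff
    }
    print(pairs_above_cutoff(index))  # 5
```

Under these assumptions, a frequency-1 word and a frequency-2 word sharing a document score about .707 and pass the .6 cutoff, while two frequency-2 words sharing only one document score .5 and do not. Unequal document lengths raise the sum of squared probabilities above 1/D for D documents, so the uniform case gives the smallest chance count for a given m.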

It is clear therefore that the 18000 correlations observed do not actually