CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Simulated ranking and document output cut-off
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
CHAPTER 5
Simulated ranking and document output cut-off
There is confusion of ends and means in this type of attack upon
measurement in principle. Perhaps if medicine threw away the
thermometer, the encephalograph, the X-ray, and all other
technicalities, medicine would become much more human! How
much more preferable the tender hand on the brow than a nasty
piece of glass in the mouth - how inhuman! But is it sympathy
and fellow-feeling that we want from the physician or a technical
competence to identify the condition and give us the cure? The
bedside manner still has a place in the cure, even although the
hand on the brow has been replaced by the thermometer.
L.T. Wilkins: Social Deviance, page 9
With all the results so far given, the presentation has been on the basis
of coordination level cut-offs. The reader is invited to consider the same
test results, but now presented on the basis of a simulated ranking order and a
document output cut-off. In Chapter 3, one of the main problems considered
was that of totalling the results of a set of questions that was heterogenous in
having different numbers of starting terms and matching terms. Several
solutions were considered, but only brief mention was made of one possible
method, namely document output cut-off. Although this method was recognised
as having many advantages, it was decided not to use it for the main test
results; this was partly because of the additional effort required to obtain the
necessary prerequisite of a ranking order, but also because it would have
involved a transformation of the test results as actually obtained by the
co-ordination level cut-off. At a later date a simpler method of deriving a
simulated ranking order was found and, in trying this out, it was shown that
there was a possibility of obtaining an 'area measure' which could be used for
producing an order of performance effectiveness for the different index languages.
Therefore, the majority of the test searches were converted to a simulated
ranking order, and in this chapter the results are presented by the document
output cut-off method.
The influence of the SMART system was mainly responsible for our
original investigation into attempting to obtain a ranked output for the Cranfield
test searches. In the SMART system, the output of a search is arranged in an
order of decreasing correlation with the search question; this is established by
each document having a scoring that is obtained by calculations based on the
match between the request terms and the document terms in the particular
dictionary being tested. Thus every document in the collection is assigned a
rank order number, the rank position reflecting the correlation with the search
question. A sample output from the SMART system, showing the results for
Question 147 searched on the Cranfield 200 document collection for fourteen
different options, is given in Fig. 5.1. This output sheet shows, for each of the
fourteen options, the file numbers of the fifteen highest ranked documents and
also the rank numbers of the five documents which are relevant to this particular
question. The heading at the top of each section refers to the particular option
being tested, and it can be seen that, with 'ABSTR OLD QS', for instance, the
five relevant documents, Nos. 708, 711, 713, 712 and 709 were ranked 21, 32,
68, 76 and 122 respectively.
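
As an illustration of what a document output cut-off means for a ranked output of this kind, the following minimal sketch takes the five rank positions quoted above for Question 147 and counts how many relevant documents fall within the first N documents of the ranking. The function name, the cut-off values chosen, and the use of the usual recall and precision ratios are assumptions made for the example; they are not taken from the SMART output sheet or from the test procedure itself.

```python
from bisect import bisect_right

# Ranks of the five relevant documents for Question 147 under
# 'ABSTR OLD QS', as quoted in the text (200 document collection).
RELEVANT_RANKS = [21, 32, 68, 76, 122]


def recall_precision_at(cutoff, relevant_ranks):
    """Return (recall, precision) when the top `cutoff` documents are output.

    Hypothetical helper for illustration only: recall is the fraction of
    relevant documents retrieved, precision is the fraction of retrieved
    documents that are relevant.
    """
    ranks = sorted(relevant_ranks)
    # Number of relevant documents whose rank is <= cutoff.
    retrieved_relevant = bisect_right(ranks, cutoff)
    recall = retrieved_relevant / len(ranks)
    precision = retrieved_relevant / cutoff
    return recall, precision


# Illustrative cut-off points (assumed, not taken from the report).
for cutoff in (15, 30, 50, 100, 200):
    recall, precision = recall_precision_at(cutoff, RELEVANT_RANKS)
    print(f"cut-off {cutoff:3d}: recall {recall:.2f}, precision {precision:.3f}")
```

With a cut-off at the fifteen highest ranked documents, as listed on the output sheet, none of the five relevant documents would be retrieved for this option, since the best of them is ranked 21.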
In Fig. 5.2 are shown the conventional search results for 42 questions
by Index Language I.1.a, and these are set out in coordination levels.