Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2

Cyril Cleverdon, Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

CHAPTER 5

Simulated ranking and document output cut-off

There is confusion of ends and means in this type of attack upon measurement in principle. Perhaps if medicine threw away the thermometer, the encephalograph, the X-ray, and all other technicalities, medicine would become much more human! How much more preferable the tender hand on the brow than a nasty piece of glass in the mouth - how inhuman! But is it sympathy and fellow-feeling that we want from the physician or a technical competence to identify the condition and give us the cure? The bedside manner still has a place in the cure, even although the hand on the brow has been replaced by the thermometer.

L.T. Wilkins: Social Deviance, page 9

With all the results so far given, the presentation has been on the basis of coordination level cut-offs. The reader is invited to consider the same test results, but now presented on the basis of a simulated ranking order and a document output cut-off. In Chapter 3, one of the main problems considered was that of totalling the results of a set of questions that was heterogeneous in having different numbers of starting terms and matching terms. Several solutions were considered, but only brief mention was made of one possible method, namely document output cut-off.
Although this method was recognised as having many advantages, it was decided not to use it for the main test results; this was partly because of the additional effort required to obtain the necessary prerequisite of a ranking order, but also because it would have involved a transformation of the test results as actually obtained by the coordination level cut-off. At a later date a simpler method of deriving a simulated ranking order was found and, in trying this out, it was shown that there was a possibility of obtaining an 'area measure' which could be used for producing an order of performance effectiveness for the different index languages. Therefore, the majority of the test searches were converted to a simulated ranking order, and in this chapter the results are presented by the document output cut-off method.

The influence of the SMART system was mainly responsible for our original investigation into attempting to obtain a ranked output for the Cranfield test searches. In the SMART system, the output of a search is arranged in an order of decreasing correlation with the search question; this is established by each document having a score that is obtained by calculations based on the match between the request terms and the document terms in the particular dictionary being tested. Thus every document in the collection is assigned a rank order number, the rank position reflecting the correlation with the search system. A sample output from the SMART system, showing the results for Question 147 searched on the Cranfield 200 document collection for fourteen different options, is given in Fig. 5.1. This output sheet shows, for each of the fourteen options, the file numbers of the fifteen highest ranked documents and also the rank numbers of the five documents which are relevant to this particular question.
The heading at the top of each section refers to the particular option being tested, and it can be seen that, with 'ABSTR OLD QS', for instance, the five relevant documents, Nos. 708, 711, 713, 712 and 709, were ranked 21, 32, 68, 76 and 122 respectively. In Fig. 5.2 are shown the conventional search results for 42 questions by Index Language I.1.a, and these are set out in coordination levels.
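
The rank positions quoted above for Question 147 are enough to illustrate what a document output cut-off means in practice. The following is a minimal sketch in Python, not the procedure used in the project: it assumes the usual definitions of recall (relevant documents retrieved divided by all relevant documents) and precision (relevant documents retrieved divided by documents examined), and the function name `cutoff_performance` and the chosen cut-off values are hypothetical, introduced only for illustration.

```python
# Ranks of the five relevant documents for Question 147 under the
# 'ABSTR OLD QS' option, as quoted in the text (file number -> rank).
RELEVANT_RANKS = {708: 21, 711: 32, 713: 68, 712: 76, 709: 122}


def cutoff_performance(relevant_ranks, cutoff):
    """Return (relevant_retrieved, recall, precision) at a document output cut-off.

    Assumption: the searcher examines exactly `cutoff` documents, taken
    from the top of the ranked output.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be a positive number of documents")
    # A relevant document counts as retrieved if its rank is within the cut-off.
    hits = sum(1 for rank in relevant_ranks.values() if rank <= cutoff)
    recall = hits / len(relevant_ranks)
    precision = hits / cutoff
    return hits, recall, precision


# Illustrative cut-off values (an assumption, not taken from the report).
for cutoff in (15, 25, 50, 100, 200):
    hits, recall, precision = cutoff_performance(RELEVANT_RANKS, cutoff)
    print(f"cut-off {cutoff:3d}: {hits} relevant, "
          f"recall {recall:.2f}, precision {precision:.3f}")
```

With these ranks, a cut-off of 15 documents (the number of top-ranked documents listed on the output sheet) retrieves none of the five relevant documents, since the first of them appears at rank 21; all five are retrieved only once the cut-off reaches 122.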