Information Retrieval Experiment

Chapter: Laboratory tests of manual systems
E. Michael Keen
Butterworth & Company
Karen Sparck Jones (editor)

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

were free to select and reject entries individually and did not have to accept every entry a given heading led to. The resulting `selected' precision ratios not unexpectedly favoured the one index that did not contain abstracts, because swift title scanning was unrecordable. The remaining five indexes had remarkably similar precision levels. The INSPEC printed index testers chose recall and time as their main measures, so it can be suggested that time is a suitable replacement for precision in these circumstances. A plot from the Off-shelf test16 of relevant selected against time is given in Figure 8.6: we may regard this plot as the printed index tester's equivalent of the `Cranfield' recall/precision plot.

Figure 8.6. Results of information science searches of six indexes in the Aberystwyth off-shelf test, taken from Figure 4.2 in Keen16 (horizontal axis: search time in minutes)

However, it could be argued that the criterion of search time is conceptually distinct from non-relevant entries, and that it is the recording process that fails: a longer time spent on an index search may not mean that more irrelevant entries are encountered. So better methods were used in EPSILON to measure both precision and time, but again quite different index types exhibited similar precision results.
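The two measures discussed above, `selected' precision and relevant selected against time, can be illustrated with a minimal sketch. It assumes that selected precision is the proportion of a searcher's selected entries later judged relevant, and that each selection is logged with its elapsed search time. The `Selection` record, the function names, the sample log and the checkpoint minutes are all hypothetical and are not taken from the Off-shelf or EPSILON tests.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    """One entry a searcher chose to select (hypothetical record layout)."""
    minute: float    # elapsed search time when the entry was selected
    relevant: bool   # relevance judgement made afterwards


def selected_precision(selections):
    """Proportion of selected entries judged relevant (0.0 if none selected)."""
    if not selections:
        return 0.0
    return sum(s.relevant for s in selections) / len(selections)


def relevant_selected_by(selections, checkpoints):
    """Cumulative number of relevant selections made by each checkpoint minute."""
    return [
        sum(1 for s in selections if s.relevant and s.minute <= t)
        for t in checkpoints
    ]


# Invented search log for one index: five selections, three of them relevant.
log = [
    Selection(1.5, True),
    Selection(4.0, False),
    Selection(5.5, True),
    Selection(11.0, True),
    Selection(14.0, False),
]

print(selected_precision(log))                        # 0.6
print(relevant_selected_by(log, [3, 6, 9, 12, 15]))   # [1, 2, 2, 3, 3]
```

Plotting the second list against its checkpoints, one curve per index, gives a relevant-selected-against-time curve of the general kind described for Figure 8.6. Note that the precision figure needs no timing data at all, which is one way of seeing why search time and non-relevant entries are distinct criteria.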
Specifically, the selected precision results of the comparable set of five Off-shelf indexes fell in the range 43-52 per cent, in EPSILON search tests [OCRerr]52 per cent and in EPSILON scanning tests [OCRerr]54 per cent. The suggested explanation is that there is a natural level of precision where searchers' tolerance for examining irrelevant entries is the governing factor, whatever the system type. This kind of result was also seen in WUSCS, with precision 43[OCRerr]8 per cent22, but may not be limited to heuristic systems, as