IRE
Information Retrieval Experiment
Laboratory tests of manual systems
chapter
E. Michael Keen
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
were free to select and reject entries individually and did not have to accept
every entry a given heading led to.
The resulting 'selected' precision ratios not unexpectedly favoured the one
index that did not contain abstracts, because swift title scanning was
unrecordable. The remaining five indexes had remarkably similar precision
levels. The INSPEC printed index testers chose recall and time as their main
measures, so it can be suggested that time is a suitable replacement for
precision in these circumstances.
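As a minimal sketch of how a 'selected' precision ratio might be computed, assume it is the proportion of the entries a searcher selected that were judged relevant. That definition, the function name and the sample figures are illustrative assumptions, not data from the tests.

```python
def selected_precision(relevant_selected: int, total_selected: int) -> float:
    """Proportion of selected entries judged relevant (assumed definition)."""
    if total_selected == 0:
        raise ValueError("no entries were selected")
    return relevant_selected / total_selected


# Hypothetical search: 12 entries selected, 6 of them judged relevant.
ratio = selected_precision(6, 12)
print(f"{ratio:.0%}")  # prints "50%"
```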
A plot from the Off-shelf test¹⁶ of relevant entries selected against time is given in
Figure 8.6: we may regard this plot as the printed index tester's equivalent of
the 'Cranfield' recall/precision plot.
[Figure 8.6 plot: relevant entries selected against search time (min, axis marked at 6, 9, 12 and 15) for the indexes LISA, LL, ISA, BS, RZI and CCA.]
Figure 8.6. Results of information science searches of six indexes in
the Aberystwyth off-shelf test, taken from Figure 4.2 in Keen¹⁶
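A plot of relevant entries selected against time can be built by counting, at each time checkpoint, how many relevant entries a searcher had selected so far. The sketch below assumes the selection times were logged in minutes; the function name and the sample times are hypothetical and are not taken from the Off-shelf test.

```python
from bisect import bisect_right


def relevant_selected_by(times_min, checkpoints):
    """Count relevant entries selected at or before each checkpoint (minutes)."""
    ordered = sorted(times_min)
    return [bisect_right(ordered, t) for t in checkpoints]


# Hypothetical times (minutes into the search) at which relevant entries were selected.
times = [1.5, 4.0, 5.5, 8.0, 13.0]
print(relevant_selected_by(times, [6, 9, 12, 15]))  # prints [3, 4, 4, 5]
```

Plotting these counts for each index on shared time axes would give one curve per index, which is the kind of comparison the figure makes.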
However it could be argued that the criterion of search time is conceptually
distinct from non-relevant entries, and it is the recording process that fails: a
longer time spent on an index search may not mean more irrelevant entries
are encountered. So better methods were used in EPSILON to measure both
precision and time, but again quite different index types exhibited similar
precision results. Specifically, the selected precision results of the comparable
set of five Off-shelf indexes fell in the range 43-52 per cent, in EPSILON
search tests […]52 per cent and in EPSILON scanning tests […]54 per cent.
The suggested explanation is that there is a natural level of precision where
searchers' tolerance for examining irrelevant entries is the governing factor¹
whatever the system type. This kind of result was also seen in WUSCS, with
precision 43-48 per cent²², but may not be limited to heuristic systems, as