IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
some 7 phrases/item, or, about 15 singlet terms/item. It might be argued that
this number may be small enough to be prejudicial against certain of the
search strategies. No particular view is offered on this point other than that
it was not felt to be the case during the experiment and does not seem to be
obviously so. With the above qualification the following conclusions may be
drawn from the experimental results:
(1) As measured by information scientist effort expended on the purely
intellectual aspects of profile compilation and modification, the simplest
search strategy, CT (co-ordinate matching of terms without weights),
occupied almost exactly half as much time as the most complex strategy,
BW (boolean logic with weights).
(2) The search strategies exhibiting the best retrieval performance were
GWC (group-weight cumulation) and TWC (term-weight cumulation).
In the boolean comparison of retrieval performance, strategy BW
appeared to do very well but it is now considered that the method of
evaluation was faulty and no conclusions are drawn concerning either of
the boolean search strategies. The worst performer was strategy CT,
always being in one of the last two positions.
(3) Although the best retrieval performances were produced by strategies
using weighting techniques, experience gained during the project in
subjectively assigning weights to terms suggested that the majority of
SDI users would not be particularly attracted to doing this task for
themselves.
(4) The most cost-effective strategy overall was CRTW (co-ordinate
matching of restricted list of terms with weights). In terms of information
scientist effort only, the most cost-effective strategies were CRTW, CT
and TWC, and, although not strictly comparable, the least cost-effective
was BW.
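
As an illustration of the difference between the simplest strategy and the weighted ones, the following is a minimal sketch and not the project's actual software. It assumes that co-ordinate matching (CT) scores an item by the number of profile terms it contains, that term-weight cumulation (TWC) scores it by the sum of the weights of the matched terms, and that an item is retrieved when its score reaches a profile threshold. The function names, the example terms, the weights and the thresholds are all hypothetical.

```python
def coordinate_match(profile_terms, item_terms):
    """CT-style score (assumed): count of profile terms present in the item."""
    return len(set(profile_terms) & set(item_terms))


def term_weight_cumulation(profile_weights, item_terms):
    """TWC-style score (assumed): sum of weights of profile terms in the item."""
    item = set(item_terms)
    return sum(weight for term, weight in profile_weights.items() if term in item)


def is_retrieved(score, threshold):
    """An item is output to the user when its score reaches the threshold."""
    return score >= threshold


# Hypothetical profile: terms with subjectively assigned weights.
profile = {"corrosion": 3, "steel": 1, "aluminium": 2}
# Hypothetical item, indexed by single terms.
item = {"corrosion", "steel", "coating", "marine"}

ct_score = coordinate_match(profile.keys(), item)    # two terms match
twc_score = term_weight_cumulation(profile, item)    # 3 + 1

print(ct_score, is_retrieved(ct_score, threshold=2))
print(twc_score, is_retrieved(twc_score, threshold=5))
```

Under these assumptions the same item passes an unweighted threshold of two matching terms but fails a weighted threshold of five, which shows how weights allow a profile to discriminate between items that match the same number of terms.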
In the secondary experiment comparing controlled-language and free-
language boolean profiles, the former: (1) were compiled more quickly (given
pre-knowledge of the controlled language); (2) comprised fewer search terms;
and (3) showed comparable overall retrieval performance. Their main
drawback is that the use of controlled language is not likely to appeal to those
non-information workers who wish to prepare their own profiles. Although
not evident in this study, another factor which can work against controlled-
language profiles is that, in subject areas where new terminology is being
introduced rapidly, the controlled language may lag behind and be inadequate
until updated.
14.5 Retrospect
Looking back after some five years, the experiment is seen to have been in the
mainstream of information retrieval research at the time. On the whole its
methodology was based on established procedures and it also reflected the
changing emphasis in retrieval experiments, viz. whereas in the 1960s the
main interest had been in indexing languages, by the early 1970s the
concentration was on search techniques. With the growing interest in