IRE: Information Retrieval Experiment, ed. Karen Sparck Jones (Butterworth & Company)
Chapter: An experiment: search strategy variations in SDI profiles, by Lynn Evans

some 7 phrases/item, or about 15 singlet terms/item. It might be argued that this number is small enough to disadvantage certain of the search strategies. No firm view is offered on this point, except that this was not felt to be the case during the experiment and does not seem obviously so.

With the above qualification, the following conclusions may be drawn from the experimental results:

(1) Measured by the information scientist effort expended on the purely intellectual aspects of profile compilation and modification, the simplest search strategy, CT (co-ordinate matching of terms without weights), took almost exactly half as much time as the most complex strategy, BW (boolean logic with weights).

(2) The search strategies with the best retrieval performance were GWC (group-weight cumulation) and TWC (term-weight cumulation). In the boolean comparison of retrieval performance, strategy BW appeared to do very well, but the method of evaluation is now considered to have been faulty, so no conclusions are drawn about either of the boolean search strategies. The worst performer was strategy CT, which was always in one of the last two positions.
(3) Although the best retrieval performances were produced by strategies using weighting techniques, experience gained during the project in subjectively assigning weights to terms suggested that most SDI users would not be particularly keen to do this task themselves.

(4) The most cost-effective strategy overall was CRTW (co-ordinate matching of a restricted list of terms with weights). In terms of information scientist effort alone, the most cost-effective strategies were CRTW, CT and TWC, and the least cost-effective, although not strictly comparable, was BW.

In the secondary experiment comparing controlled-language and free-language boolean profiles, the controlled-language profiles:

(1) were compiled more quickly (given prior knowledge of the controlled language);
(2) comprised fewer search terms; and
(3) showed comparable overall retrieval performance.

Their main drawback is that a controlled language is unlikely to appeal to non-information workers who wish to prepare their own profiles. Another factor that can work against controlled-language profiles, although it was not evident in this study, is that in subject areas where new terminology is being introduced rapidly, the controlled language may lag behind and be inadequate until it is updated.

14.5 Retrospect

Looking back after some five years, the experiment can be seen to have been in the mainstream of information retrieval research at the time. On the whole its methodology was based on established procedures, and it also reflected the changing emphasis in retrieval experiments: whereas in the 1960s the main interest had been in indexing languages, by the early 1970s the concentration was on search techniques. With the growing interest in