Information Retrieval Experiment
Retrieval system tests 1958-1978
Karen Sparck Jones
Butterworth & Company

(1) that artificial indexing languages do not perform strikingly better than natural language;
(2) that complex structured descriptions do not perform strikingly better than simple ones;
(3) that the number of searching keys is more important than their individual quality;
(4) that the characterization of queries is more important than that of documents;
(5) that formal properties of the data may be turned to advantage, as in weighting schemes.

But of course, as these statements all refer only to mechanism variables, they can have real meaning only by being related to their environment of data parameters; and the main failure of information retrieval research has been in determining those environment properties significant for system operation and in establishing the relationship between data and mechanism variables. Cleverdon[120] in 1971 maintained that 'it is, in theory, possible to design and operate a system that will achieve a given satisfactory performance, at the least possible cost, in a particular environment'. But he also observes that while it is possible, in any given situation, to design an effective system, 'a problem that is still unsolved is how it is possible to predict exactly what a situation will be. ... Designing for the hypothesised, but probably non-existent, "average" user, we may produce systems that satisfy no-one'.
(pp. 67-8) Some advance in this area since 1971 can in fact be detected: a good deal of rather crude evidence about systems has been gathered; and some system models have been proposed which have stood up to initial testing, for example the Robertson[121] and van Rijsbergen probabilistic theories. But it remains the case that our ignorance is large: to take a conspicuous instance, we have virtually no information about the real recall levels of large online search systems, or about real recall for many retrieval schemes investigated by research workers.

12.9 The current state of retrieval system understanding

After an evaluative survey of the retrieval test literature, van de Water et al.[122] concluded that the standards and content of tests were slightly higher than those found in a survey carried out five years earlier, but that information science was nowhere near established as a science. This is certainly true; but perhaps this is aiming too high too soon. A more reasonable question is whether retrieval research has any more modest, but nonetheless material, achievements to its credit. The best way of answering this question is to ask whether there have been any research results which have been applied to operational systems. Even allowing for some delay, one would hope that after five or ten years good research results could have had operational outcomes. Cleverdon[123] considered this question in 1976. Looking at the historical development of retrieval systems, he asked whether some more conspicuous research projects had contributed, either positively or negatively, to the