Information Retrieval Experiment
Retrieval system tests 1958-1978
Karen Sparck Jones
Butterworth & Company

The decade 1958-1968

increasing specialization and growing volume of the literature. The felt inadequacies of crude natural language indexing of the kind originally represented by Taube's Uniterms, and the difficulties of replacing it by anything more sophisticated done automatically, encouraged the characteristic development of the 1960s: the large thesaurus. This was a human product, used for manual indexing, but the index descriptions of documents it provided were increasingly exploited in machine searching, or more accurately, in machine scanning for document descriptions matching a human search specification. The old professional dogma about the need for sophisticated indexing, and the new economic fact about the potentialities of automated databases, were combined in the large batch document retrieval systems established during the middle 1960s. The proposals for novel intellectual and technological approaches needed testing by controlled experiment, or at least investigation. Thus quite apart from automation, it was apparent that the new indexing methods, especially post-coordination, whether applied with a natural or a controlled indexing language, should be compared with more established methods. This applied to facetted classification as well, for example. The application of conventional methods within an automated environment also called for studies, primarily relating to costs.
At the same time, the innovative approaches to automatic indexing as well as searching required extensive testing, both for feasibility and effectiveness. Thus during the first decade after 1958, experimental work was primarily focused on tests comparing forms of manual indexing, primarily in terms of the indexing languages used, on studies of the effects of automation on systems involving manual indexing, and on wholly automatic methods of document and request characterization and searching. However, most studies of systems involving automatic searching with manual indexing were less studies of the effects of automation as such than studies of the behaviour of indexing languages. Indeed the salient feature of the testing done between 1958 and 1968 was its concern with indexing languages. Most of the tests done in the period, and all of the major ones, therefore fall into one or the other of two groups: one concerned with manual indexing using manually constructed indexing languages, and the other with automatic indexing. The first group includes the various Cranfield tests1-6, Schuller's test7, the Syntol work8, Altmann's9, Blagden's10, and Shaw and Rothman's11. The problems encountered with post-coordinate indexing using a thesaurus led to a whole subgroup of projects, Lancaster's Medlars investigation12,13, the series of (Case) Western Reserve University (CWRU) studies14,15, and tests on roles and links, including those of Sinnett16, Herner, Lancaster and Johanningsmeier17, Cohen, Lauer and Schwartz18, Montague19, and Oot et al.20. The second group of tests, on automatic indexing, includes those conducted by Dale and Dale21, O'Connor22, Damerau23, Borko24, Tague25, Melton26, Dennis27, and Stone and Rubinoff28, by the Smart Project29-32, and at A. D. Little33,34, as well as the other research reported by Stevens, Heilprin and Guiliano35 and by Stevens36.
The tests of indexing languages in the first group focused mainly on controlled languages, with some work on simple natural language indexing. Those of the second were sometimes concerned simply with automatic indexing, e.g. Dale and Dale's and Dennis' tests, and sometimes with comparison between automatic and manual indexing, as in the Smart tests and in Melton's, Damerau's and Borko's.