Information Retrieval Experiment
Retrieval system tests 1958-1978 (chapter)
Karen Sparck Jones
Butterworth & Company
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
The decade 1958-1968
increasing specialization and growing volume of the literature. The felt
inadequacies of crude natural language indexing of the kind originally
represented by Taube's Uniterms, and the difficulties of replacing it by
anything more sophisticated done automatically, encouraged the character-
istic development of the 1960s: the large thesaurus. This was a human
product, used for manual indexing, but to provide index descriptions of
documents increasingly exploited in machine searching, or more accurately,
in machine scanning for document descriptions matching a human search
specification. The old professional dogma about the need for sophisticated
indexing, and the new economic fact about the potentialities of automated
databases, were combined in the large batch document retrieval systems
established during the middle 1960s.
The proposals for novel intellectual and technological approaches needed
testing by controlled experiment, or at least investigation. Thus quite apart
from automation, it was apparent that the new indexing methods, especially
post-coordination, whether applied with a natural or a controlled indexing
language, should be compared with more established methods. This applied
to facetted classification as well, for example. The application of conventional
methods within an automated environment also called for studies, primarily
relating to costs. At the same time, the innovative approaches to automatic
indexing as well as searching required extensive testing, both for feasibility
and effectiveness. Thus during the first decade after 1958, experimental work
was primarily focused on tests comparing forms of manual indexing,
primarily in terms of the indexing languages used, on studies of the effects of
automation on systems involving manual indexing, and on wholly automatic
methods of document and request characterization and searching. However
most studies of systems involving automatic searching with manual indexing
were less studies of the effects of automation as such than studies of the
behaviour of indexing languages. Indeed the salient feature of the testing
done between 1958 and 1968 was its concern with indexing languages.
Most of the tests done in the period, and all of the major ones, therefore fall
into one or the other of two groups: one concerned with manual indexing
using manually constructed indexing languages, and the other with automated
indexing. The first group includes the various Cranfield tests1-3,6, Schuller's
test7, the Syntol work8, Altmann's9, Blagden's10, and Shaw and Rothman's11
projects, Lancaster's Medlars investigation12,13, and the series of (Case)
Western Reserve University (CWRU) studies14,15. The problems encountered
with post-coordinate indexing using a thesaurus led to a whole subgroup of
tests on roles and links, including those of Sinnett16, Herner, Lancaster and
Johanningsmeier17, Cohen, Lauer and Schwartz18, Montague19, and
Oot et al.20. The second group of tests, on automatic indexing, includes those
conducted by Dale and Dale21, O'Connor22, Damerau23, Borko24,
Tague25, Melton26, Dennis27, and Stone and Rubinoff28, by the Smart
Project29-32, and at A. D. Little33,34, as well as the other research reported
by Stevens, Heilprin and Giuliano35 and Stevens36. The tests of indexing
languages in the first group focused mainly on controlled languages, with
some work on simple natural language indexing. Those of the second were
sometimes concerned simply with automatic indexing, e.g. Dale and Dale's
and Dennis' tests, sometimes with comparison between automatic and
manual indexing, as in the Smart tests and Melton's, Damerau's and Borko's