IRE
Information Retrieval Experiment
Retrieval system tests 1958-1978
chapter
Karen Sparck Jones
Butterworth & Company
D11/D12:D21/D22:M11/M12:M21/M22; and we can clearly continue, for
as many variables and values of each as we can identify.
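The way such a set of study possibilities grows can be sketched in a few lines of Python. This is only an illustration, assuming that the notation above denotes two data variables (D1, D2) and two mechanism variables (M1, M2) with two values each; the dictionary layout and the function name `enumerate_conditions` are hypothetical and not part of the original account.

```python
from itertools import product

# Assumed reading of the notation: each variable (D1, D2, M1, M2) has two
# values, e.g. D1 takes the values D11 and D12. Names are illustrative only.
variables = {
    "D1": ["D11", "D12"],
    "D2": ["D21", "D22"],
    "M1": ["M11", "M12"],
    "M2": ["M21", "M22"],
}


def enumerate_conditions(variables):
    """Return every combination that picks one value per variable."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


conditions = enumerate_conditions(variables)
print(len(conditions))  # 16 distinct experimental conditions
print(conditions[0])
```

Adding a further variable, or a further value of an existing one, multiplies the number of conditions, which is why a full study of even a modest set of variables quickly becomes expensive.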
In general retrieval system tests have exhibited biases in the way they have
approached this set of study possibilities. Much more attention has been paid
to the mechanism variables M than the data variables D. The mechanism
variables have been made explicit, the data ones left implicit: in other words,
though test authors have often paid lip service to the possible influence of
their data variable values on their results, they have nevertheless tended to
characterize the entire system performance in terms of the mechanism
variables studied; variable D has been left undifferentiated, while perhaps
several values of a single M variable, or a few values of several M variables,
have been examined. It has not, moreover, been open to third parties to put
different tests together on the grounds that while their data variable values
have differed their mechanism variable values have been the same, so
amalgamating the tests would permit the effects of data variation to be
examined: the mechanism variables have generally not been identically or
sufficiently similarly treated.
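The condition under which a third party could amalgamate tests can be stated as a simple check: the tests must share their mechanism variable values while differing in their data variable values. The following is a minimal sketch of that check, assuming each test is described by a record of its variable values; the records, the names `test_a` to `test_c` and the function `combinable_groups` are invented for illustration and describe no actual test.

```python
from collections import defaultdict

# Hypothetical test descriptions; the values are illustrative only.
tests = [
    {"name": "test_a", "data": {"D1": "D11"}, "mechanism": {"M1": "M11"}},
    {"name": "test_b", "data": {"D1": "D12"}, "mechanism": {"M1": "M11"}},
    {"name": "test_c", "data": {"D1": "D11"}, "mechanism": {"M1": "M12"}},
]


def combinable_groups(tests):
    """Group tests with identical mechanism values but differing data values."""
    by_mechanism = defaultdict(list)
    for t in tests:
        key = tuple(sorted(t["mechanism"].items()))
        by_mechanism[key].append(t)

    groups = []
    for members in by_mechanism.values():
        data_settings = {tuple(sorted(m["data"].items())) for m in members}
        # Only groups whose data settings differ say anything about data variation.
        if len(data_settings) > 1:
            groups.append([m["name"] for m in members])
    return groups


print(combinable_groups(tests))  # [['test_a', 'test_b']]
```

The check requires exact agreement of mechanism values; the text's weaker condition of "sufficiently similar" treatment would need a judgement that this sketch does not attempt.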
Some projects, like those of Salton and Sparck Jones, have begun to tackle
this problem by working with more than one data set; but it has to be
recognized (as the data details of Sparck Jones and Bates [95] make plain) that
the characterization and control of data variables in these test series is much
less systematic even than that of the mechanism variables. It is moreover
generally the case that where the same data have been used by different
projects, the treatment of the mechanism variables has been too heterogeneous
for it to be possible to combine the test results to obtain information about an
extended set of mechanism variable values.
12.8 Methodological and substantive achievements
Thus if we accept that a proper understanding of retrieval systems can be
achieved only with the aid of both a well-organized descriptive framework
and extensive series of experiments, each bearing on the other, and look now
at the evidence of the chapter survey, what methodological and substantive
progress has been made in achieving this understanding?
If we compare, say, Montague's test of 1965 [19] with Evans' of 1975 [61, 62], we
can detect some methodological improvements and a substantive
development: Montague's test was vitiated by the use of incomparable query sets
and incomparable document sets, i.e. the individual experiments in the group
could not be compared usefully with one another because the data sets used
differed. In many of them the query set used was also very small. Evans used
a constant set of queries and documents for a range of comparisons, and a
somewhat larger query set than any of Montague's. The substantive
development is represented by the shift from document indexing, studied in
Montague's test, to query formulation, the focus of Evans' test. At the same
time, the difference between the tests is not as large as might be hoped for, in
methodological solidity or depth of understanding. While Montague's test is
open to criticism in mixing real and synthetic queries, in Evans' the amount
of output assessed for relevance per query was somewhat arbitrarily varied.
Again, while Montague's test explored a variety of rather arbitrarily related