IRE
Information Retrieval Experiment
Gedanken experimentation: An alternative to traditional system testing?
chapter
William S. Cooper
Butterworth & Company
Karen Sparck Jones
experimental one, gedanken experimentation is an approach which offers
considerable hope of supplementing, and perhaps in many cases rendering
less necessary, classical retrieval testing.
11.1 Theory and experiment in information retrieval
If one were pressed to describe the central 'theory' underlying document
retrieval, it would be hard to do much more than list the obvious conceptual
elements of the retrieval situation. A typical list would note that there must
be a collection of documents or records of some kind and a population of
potential searchers; that to provide them with search assistance it seems
necessary to isolate certain search properties of the documents (the
'descriptors' or 'index terms') and of the searchers' information needs (usually
specified in the form of 'requests' or 'queries'); that rules for matching
information-need properties against document properties (the 'match
function' or 'retrieval strategy') are also needed; and so forth. Although some
might be willing to dignify such an account with the name 'theory', it is really
not so much a theory of retrieval as a review of the problem setting with
suggested terminology for discussing it. Occasionally a powerful bit of real
theory might surface, as for instance the theory of syntax in a scheme for
automatic indexing, or Boolean logic in the specification of certain request
languages, but these have to do with special kinds of retrieval systems or
their components and do not constitute an overall theory of retrieval. In fact,
in the search for a general theory it is hard to do much better than to give
some elaboration of the vague rule that a system should retrieve for the user
those documents most likely to satisfy him. As scientific theories go this truism
is not very impressive, but it is the only wisp of general theory we have. What
was said of a recent political candidate can be said of document retrieval
theory: Deep down inside it's shallow.
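
The conceptual elements listed above (a collection, descriptors, a request,
and a match function) can be made concrete in a few lines. What follows is
only a minimal sketch under stated assumptions, not a method proposed in this
chapter: the names Document, boolean_and_match and retrieve are hypothetical,
the three-document collection is invented for illustration, and a conjunctive
Boolean match stands in for whatever retrieval strategy a real system might use.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """A record in the collection, represented only by its index terms."""
    doc_id: str
    descriptors: frozenset[str]


def boolean_and_match(query_terms: set[str], document: Document) -> bool:
    """Match function (assumed): true when every query term is a descriptor."""
    return query_terms <= document.descriptors


def retrieve(query_terms: set[str], collection: list[Document]) -> list[str]:
    """Apply the match function to each document; return matching ids in order."""
    return [d.doc_id for d in collection if boolean_and_match(query_terms, d)]


# Hypothetical collection, invented purely for illustration.
collection = [
    Document("d1", frozenset({"indexing", "syntax"})),
    Document("d2", frozenset({"indexing", "evaluation"})),
    Document("d3", frozenset({"evaluation", "relevance"})),
]

# A request expressed as a set of query terms.
result = retrieve({"indexing", "evaluation"}, collection)
```

As the text observes, such a sketch merely restates the problem setting in
executable form. It says nothing about which documents are most likely to
satisfy the searcher, which is where a general theory would have to begin.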
Perhaps partly in recognition of this paucity of theory, many researchers
have turned to experimentation, and especially laboratory experimentation.
As might be expected, the classical experimental approach has been fairly
theory-independent, consisting essentially in the trying out of various
competing retrieval schemes (including indexing methods, etc.) to see which
seem to work best. The methodology involved has been ably documented in
other chapters of this book, and so need not be reviewed here except to note
that the difficulties to be met in drawing useful conclusions from a retrieval
experiment of classical design have turned out to be much more numerous
and serious than had been expected. There are sampling and other statistical
difficulties; difficulties in generalizing results obtained in just one or a few
test collections; difficulties in generalizing the needs of the test user
population, or in the absence of a real user population difficulties in assuring
the realism of manufactured requests; difficulties arising from the variability
and sensitivity to test conditions of the judgements of document relevance or
usefulness; difficulties in extrapolating results to real situations where
something about the system or the environment is bound to be different; and
difficulties arising from the interaction of various available features of the
retrieval rules under test which, if at all numerous, cannot as a practical