IRE
Information Retrieval Experiment
Gedanken experimentation: An alternative to traditional system testing?
chapter
William S. Cooper
Butterworth & Company
Karen Sparck Jones
guesswork more thoughtful and theory-guided than it might otherwise be,
to make each guess resemble the physicists' analytical, theory-derived expert
surmise more than it does a layman's initial hunch. The hope is that
enlightened guesswork of this sort, though far from infallibly correct, is less
likely to be mistaken than it would be in the absence of the aids proposed by
the information scientist.
Gedanken experimentation is not incompatible with classical, full-scale
system testing, but if successful should reduce the need for it. Ideally, the two
might be combined; that is, classical retrieval tests would be made to confirm
that the theory-guided guesswork proposed by the information scientist does
indeed yield better retrieval results than the traditional guesswork it is
intended to replace. Perhaps a modest amount of full-scale testing of this sort
is called for when a particular form of gedanken experimentation is first
introduced. However, at least three considerations might cause a researcher
to hesitate before undertaking extensive testing along these lines. First, since
it is the results of 'theory-guided' guesswork that are to be tested out, there is
a priori reason to suppose, even in the absence of any tests, that the guesses
are probably superior to traditional ways of guessing which have no explicit
theoretical underpinnings. At least, the method is probably superior if its
underlying rationale is sound, suggesting that it may be easier and more
appropriate to undertake a critical examination of the theory than a large-
scale empirical test. Secondly, as a practical matter there is likely to be a
trade-off between test effort and effort spent in devising better techniques of
gedanken experimentation: resources invested in the one will be lost to the
other. Since gedanken techniques usually offer hope of improvement with
little danger of making things worse, it would seem sensible to put the
emphasis there. Finally, full-scale retrieval tests are difficult, expensive,
unreliable, and often inconclusive. This suggests that the research effort
devoted to them might be better spent simply in developing design ideas
(such as gedanken experimentation techniques) which have a theoretically
defensible basis and implementing these ideas operationally, without
bothering to test them out empirically at all. From a scientific point of view
that may be a heretical suggestion, but information retrieval is more a
technology than a science and, as has already been pointed out, technologies
often progress faster via a process of inspired tinkering than through
programmes of formal experimentation.
In a broader perspective, what may be called for is a shift in the
information retrieval field's research priorities (a shift which may already
be under way) from conventional trial-and-error testing of plausible but
somewhat ad hoc systems to the generation of theoretically more soundly
motivated design ideas. The sounder the theory behind a design idea, the less
the need to test it out empirically. The development of gedanken techniques
is an area which seems ripe for such idea generation. We classify it here as a
design rather than a testing activity, because gedanken experiments are
essentially attempts to make rational judgements about design parameters of
various kinds. But carrying out a gedanken experiment often involves
envisioning a trivial retrieval experiment of some sort, and in that special
sense might be regarded as an (imaginary) testing activity moved back in
time into the design stages.
Whether it is regarded as a design activity or an unconventional