IRE Information Retrieval Experiment Gedanken experimentation: An alternative to traditional system testing? chapter William S. Cooper Butterworth & Company Karen Sparck Jones
guesswork more thoughtful and theory-guided than it might otherwise be, to make each guess resemble the physicists' analytical, theory-derived expert surmise more than it does a layman's initial hunch. The hope is that enlightened guesswork of this sort, though far from infallibly correct, is less likely to be mistaken than it would be in the absence of the aids proposed by the information scientist. Gedanken experimentation is not incompatible with classical, full-scale system testing, but if successful should reduce the need for it. Ideally, the two might be combined; that is, classical retrieval tests would be made to confirm that the theory-guided guesswork proposed by the information scientist does indeed yield better retrieval results than the traditional guesswork it is intended to replace. Perhaps a modest amount of full-scale testing of this sort is called for when a particular form of gedanken experimentation is first introduced. However, at least three considerations might cause a researcher to hesitate before undertaking extensive testing along these lines.
First, since it is the results of 'theory-guided' guesswork that are to be tested out, there is a priori reason to suppose, even in the absence of any tests, that the guesses are probably superior to traditional ways of guessing which have no explicit theoretical underpinnings. At least, the method is probably superior if its underlying rationale is sound, suggesting that it may be easier and more appropriate to undertake a critical examination of the theory than a large-scale empirical test. Secondly, as a practical matter there is likely to be a trade-off between test effort and effort spent in devising better techniques of gedanken experimentation: resources invested in the one will be lost to the other. Since gedanken techniques usually offer hope of improvement with little danger of making things worse, it would seem sensible to put the emphasis there. Finally, full-scale retrieval tests are difficult, expensive, unreliable, and often inconclusive. This suggests that the research effort devoted to them might be better spent simply in developing design ideas (such as gedanken experimentation techniques) which have a theoretically defensible basis and implementing these ideas operationally, without bothering to test them out empirically at all. From a scientific point of view that may be a heretical suggestion, but information retrieval is more a technology than a science and, as has already been pointed out, technologies often progress faster via a process of inspired tinkering than through programmes of formal experimentation. In a broader perspective, what may be called for is a shift in the information retrieval field's research priorities (a shift which may already be under way) from conventional trial-and-error testing of plausible but somewhat ad hoc systems to the generation of theoretically more soundly motivated design ideas. The sounder the theory behind a design idea, the less the need to test it out empirically.
The development of gedanken techniques is an area which seems ripe for such idea generation. We classify it here as a design rather than a testing activity, because gedanken experiments are essentially attempts to make rational judgements about design parameters of various kinds. But carrying out a gedanken experiment often involves envisioning a trivial retrieval experiment of some sort, and in that special sense might be regarded as an (imaginary) testing activity moved back in time into the design stages. Whether it is regarded as a design activity or an unconventional I