Information Retrieval Experiment (IRE)
Karen Sparck Jones (ed.), Butterworth & Company
Chapter: Gedanken experimentation: An alternative to traditional system testing?
William S. Cooper

refer to such data-gathering as an experiment, since the aim is merely to obtain crude estimates of certain statistics rather than to test anything. Here is the kind of small-scale empirical investigation by which many conventional 'What-works-best?' tests might well be replaced, if indeed there is to be any experimentation at all beyond gedanken experimentation.

11.4 Further remarks

These examples by no means exhaust the possibilities inherent in an approach based on probability and utility theory and gedanken or small-scale experimentation. There are ways of combining gedanken weighted indexing with gedanken weighted requesting, of constructing thesauri which weight relationships among terms probabilistically by thought experiment, of translating boolean requests into probabilistically weighted ones, and so on.

One of the most far-reaching advantages of the probabilistic approach to system design is that it provides a natural means of combining large numbers of weak clues. Many kinds of evidence that are not exploited in conventional systems could be brought to bear in ordering system output, and it would be natural to utilize them in a probabilistic system. Among them are the many kinds of relatively weak clues available even before a request is received, e.g.
document recency, citedness, language, level of technicality, form of publication, and so on. These could all be used with low weights as part of the probability computations and would, for many kinds of requests, be apt to bring about greatly improved retrieval. Known-work searches on the basis of non-standard clue-types constitute another possible application [1, 2]. There is much scope for further investigation in this area.

11.5 Summary

When a retrieval system design is explicitly probabilistic or utility-theoretic, its parameters are endowed with a clear meaning which makes their estimation a fit subject for gedanken experimentation or, in some cases, small-scale statistical estimation techniques. Since, by virtue of the statistical theory embodied in them, such systems are known a priori to make optimal or near-optimal use of the data at their disposal, comparative tests among whole systems of this kind may be largely replaceable by tests of the accuracy of their associated input data estimation methods, or in obvious cases by simple judgements of which of these estimation methods is probably most accurate.

This suggests as potentially advantageous an approach to information retrieval research which (1) emphasizes the discovery of explicitly probabilistic or utility-theoretic retrieval system designs; (2) emphasizes the development of improved input estimation methods, including gedanken experimentation techniques; and (3) de-emphasizes the role of traditional comparative system tests in favour of restricted data-gathering aimed at measuring error of estimation in the input data.

Gedanken experimentation, as opposed to actual data-gathering, is apt in general to be most valuable where decisions must be taken quickly, frequently, and with a minimum of fuss. Indexing and request-weighting