Information Retrieval Experiment
Edited by Karen Sparck Jones
An experiment: search strategy variations in SDI profiles
Lynn Evans
Butterworth & Company
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
automatic indexing it is now being seen more clearly how interdependent are
indexing and searching methods.
At various points in the above description some shortcomings of the
original experiment have been mentioned. It may be useful to conclude by
gathering together and discussing these defects and also those questions
which were raised but remained unresolved. It is hoped that some activities
were performed adequately, but inevitably these are of less interest and will
only be mentioned briefly.
Those parts of the investigation which are considered to have been sound
include: a very adequate document collection; a meaningful range of search
strategies; a realistic profile compilation method involving standard tasks
which allowed an accurate measure of the effort required from the
information scientist on the different search strategies; a valid procedure for
collecting relevance assessments; and the recruitment of the user group and
the mechanics of the experiment in general.
Less satisfactory areas include: the rather low number of queries; retrieval
performance evaluation by the boolean comparison method; the absence of
automatic term-weighting; the lightweight nature of the cost data; and the
significance of the experimental results.
Concerning the number of queries, it is now considered (although nowhere
proved) that perhaps twice the number of queries would have been more
convincing, or at least a number large enough that the results of a few
individual queries do not obtrude on the overall results. In our experiment
this effect was exemplified by the differences observed when calculating by
the two averaging methods, numbers and ratios. With a greater number of
queries it would also have been possible to ignore those queries for which
there were too few or, less importantly, too many relevant items in the
collection. It is not clear what the implications of such a practice are but
certainly the results would thereby be more reproducible. As has already
been mentioned too many recall/precision ratios of the order 0/1, 1/1, etc.,
are not really acceptable. The problem could have been eased indirectly if a
more drastic approach had been taken originally with some of the user
interest statements. Those that clearly comprised more than one question
could have been treated separately. This would have resulted in `cleaner'
profiles of which fewer were overlong, some profile performances would
probably have been subject to less extraneous influences, and the number of
queries would have been larger. Although a token number of the user
statements were in fact split up more could have been and the experiment
would have been better for it. At the time the view taken was that as little as
possible should be done to change the conditions from that of `real life' and,
since these were statements very like those received from users of an
operational system, the less tampering the better. This view is now deemed
to have been misguided: to have done what is now suggested would not have
affected the validity of the test in any way.
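The sensitivity of the overall results to the choice of averaging method can be sketched as follows. The figures below are purely illustrative, not the experiment's own data: averaging "by ratios" takes the mean of the per-query precision ratios, so a query with very few relevant or retrieved items (a 0/1 or 1/1 ratio) carries the same weight as any other; averaging "by numbers" pools the counts first, so such queries carry little weight.

```python
# Illustrative sketch (hypothetical per-query figures, not the
# experiment's data): precision averaged "by ratios" versus
# "by numbers" over a small query set.

queries = [
    # (relevant retrieved, total retrieved) for each query
    (8, 20),
    (5, 25),
    (1, 1),   # a 1/1 query: its ratio of 1.0 dominates the by-ratios mean
    (0, 1),   # a 0/1 query: its ratio of 0.0 likewise distorts it
]

# Averaging by ratios: each query contributes equally,
# however few documents are involved.
by_ratios = sum(r / n for r, n in queries) / len(queries)

# Averaging by numbers: pool the counts first, so queries with
# few documents have little influence on the result.
by_numbers = sum(r for r, _ in queries) / sum(n for _, n in queries)

print(f"by ratios:  {by_ratios:.3f}")   # 0.400
print(f"by numbers: {by_numbers:.3f}")  # 0.298
```

With few queries the two figures diverge noticeably; with a larger query set the extreme ratios would be diluted and the two averages would move closer together, which is the effect referred to above.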
The most disappointing outcome of the whole experiment was the failure
to develop an acceptable method for comparing an optimum boolean strategy
with any strategy producing a ranked output. A few simple examples quickly
show the inappropriateness of using the boolean output itself as the basis for
comparison. Very little can be offered in the way of a solution even now and