Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2

Conclusions chapter
Cyril Cleverdon, Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

... correct, and attempt to find the reasons why they should be as they are. It would be quite incorrect to suggest that no-one has previously argued in favour of single terms, natural language and coordination, for these were the bedrock of the Uniterm System of coordinate indexing as originally propounded by the late Dr. Taube in 1951. But while the device of coordination - or, as we would now term it, post-coordination - continues in favour, there are few who now accept (for Information Retrieval Systems) uncontrolled vocabularies, and some who insist additionally on the use of links and roles. Even Dr. Taube himself was, within a couple of years of the inception of the Uniterm System, to start devising associated maps, and there is no indication, in the writings at that time of the group at Documentation Inc., of any awareness that the resultant increased recall would be more than offset by the lower precision. There are doubtless indexes in existence which follow the original Uniterm principles, but one of the few persons who has consistently, in print, advocated the use of natural language and coordination is Mr. Th. te Nuyl with his L'Unité System (Ref. 16). Even so, for most people L'Unité System will be associated mainly with the ingenious coding system rather than the use of natural language.
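The trade-off mentioned above, in which a broader search raises recall at the cost of precision, can be illustrated with a small sketch of post-coordinate searching on single natural-language terms. This is an illustration only: the titles, the query, the relevance judgements and the function names are all invented for the example and are not data or procedures from the test. The recall ratio is taken as relevant documents retrieved over all relevant documents, and the precision ratio as relevant documents retrieved over all documents retrieved.

```python
from collections import Counter, defaultdict

# Hypothetical toy collection: document number -> title (invented for illustration).
titles = {
    1: "boundary layer flow over a flat plate",
    2: "heat transfer in laminar boundary layer",
    3: "flutter of thin plate wings",
    4: "laminar flow heat transfer measurements",
    5: "supersonic wing design",
}

# Index each document under the single words of its title (uncontrolled vocabulary).
index = defaultdict(set)
for doc_id, title in titles.items():
    for word in title.lower().split():
        index[word].add(doc_id)


def coordinate_search(index, terms, level):
    """Return documents indexed under at least `level` of the search terms."""
    hits = Counter()
    for term in terms:
        for doc_id in index.get(term, set()):
            hits[doc_id] += 1
    return {doc_id for doc_id, count in hits.items() if count >= level}


def recall_ratio(retrieved, relevant):
    """Relevant documents retrieved divided by all relevant documents."""
    return len(retrieved & relevant) / len(relevant)


def precision_ratio(retrieved, relevant):
    """Relevant documents retrieved divided by all documents retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


# Hypothetical query and relevance judgements (assumed, not taken from the test).
query = ["laminar", "boundary", "layer"]
relevant = {2, 4}

# Relax the coordination level from all three terms down to any one term.
for level in (3, 2, 1):
    retrieved = coordinate_search(index, query, level)
    print(level, sorted(retrieved),
          round(recall_ratio(retrieved, relevant), 2),
          round(precision_ratio(retrieved, relevant), 2))
```

In this toy case, demanding all three terms retrieves only one of the two relevant documents with no irrelevant ones, while accepting any single term retrieves both relevant documents along with an irrelevant one.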
It is of interest to note that the clustering of the natural language terms into broad alphabetical groups (as in L'Unité) brings about the confounding of word forms, so, possibly unintentionally, te Nuyl did adopt a coding device which was, it would appear from the results of this test, the only way to improve performance over natural language. Then there are, of course, permuted title indexes, which use the natural language of the title, but these can hardly be considered in the same light, since they do not have the facilities of post-coordination. Therefore it is against these few that are ranged, for instance, the activities over the last fifty years of the Universal Decimal Classification, which is probably now more widely used than ever before. At the same time, a large number of national and international organisations are engaged in constructing thesauri, while many groups in the research field are endeavouring to develop computer methods for the formation of classes of terms (e.g. Ref. 17). The effort that is put into these activities, by whichever process the classes may be formed, is presumably influenced by the widely held belief that it is only by such means that a high recall ratio is obtainable. Yet even in Cranfield I we reported that a recall ratio of 97% was possible merely by using the words in the titles. There was no way of knowing in that experiment the corresponding precision ratio, but it was not only assumed (correctly) that it would be very low, but it was also assumed that it would be lower than would have been the case if such a recall ratio had been obtained with a conventional index language. As far as this test is concerned, the latter assumption would be unjustified; is it now reasonable to assume that the grouping of natural language terms to form controlled vocabularies, or the broadening of search strategies, must inevitably result in a loss in overall performance?
We would certainly not make such a statement on the basis of this single test; however, it would be surprising if the comparative test results were peculiar to the particular environment of this test, and it does seem