CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Conclusions
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
correct, and attempt to find the reasons why they should be as they are.
It would be quite incorrect to suggest that no-one has previously argued
in favour of single terms, natural language and coordination, for these were
the bedrock of the Uniterm System of coordinate indexing as originally
propounded by the late Dr. Taube in 1951. But while the device of
coordination - or, as we would now term it, post-coordination - continues in favour,
there are few who now accept (for Information Retrieval Systems) uncontrolled
vocabularies, and some who insist additionally on the use of links and roles.
Even Dr. Taube himself was, within a couple of years of the inception of the
Uniterm System, to start devising associated maps, and there is no
indication, in the writings at that time of the group at Documentation Inc., of any
awareness that the resultant increased recall would be more than offset by
the lower precision.
There are doubtless indexes in existence which follow the original
Uniterm principles, but one of the few persons who has consistently, in
print, advocated the use of natural language and coordination is Mr. Th.
te Nuyl with his L'Unité System (Ref. 16). Even so, for most people L'Unité
System will be associated mainly with the ingenious coding system rather
than the use of natural language. It is of interest to note that the clustering
of the natural language terms into broad alphabetical groups (as in L'Unité)
brings about the confounding of word forms, so, possibly unintentionally,
te Nuyl did adopt a coding device which was, it would appear from the
results of this test, the only way to improve performance over natural
language.
Then there are, of course, permuted title indexes, which use the
natural language of the title, but these can hardly be considered in the same
light, since they do not have the facilities of post-coordination.
Therefore it is against these few that are ranged, for instance, the
activities over the last fifty years of the Universal Decimal Classification,
which is probably now more widely used than ever before. At the same
time, a large number of national and international organisations are
engaged in constructing thesauri, while many groups in the research field
are endeavouring to develop computer methods for the formation of classes
of terms (e.g. Ref. 17).
The effort that is put into these activities, by whichever process the
classes may be formed, is presumably influenced by the widely held
belief that it is only by such means that a high recall ratio is obtainable.
Yet even in Cranfield I we reported that a recall ratio of 97% was possible
merely by using the words in the titles. There was no way of knowing in
that experiment the corresponding precision ratio, but it was not only
assumed (correctly) that it would be very low, but it was also assumed that
it would be lower than would have been the case if such a recall ratio had
been obtained with a conventional index language.
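
The two measures referred to here can be illustrated with a minimal sketch. It assumes the usual definitions: the recall ratio is the percentage of the relevant documents that a search retrieves, and the precision ratio is the percentage of the retrieved documents that are relevant. The function names and the document identifiers are hypothetical and are not taken from the test data.

```python
def recall_ratio(retrieved, relevant):
    """Percentage of the relevant documents that the search retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when no documents are relevant")
    return 100.0 * len(set(retrieved) & relevant) / len(relevant)


def precision_ratio(retrieved, relevant):
    """Percentage of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        raise ValueError("precision is undefined when nothing is retrieved")
    return 100.0 * len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical collection: four documents are relevant to the question.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A broad search (for example, any title word matching) retrieves d1..d20.
broad_search = ["d%d" % i for i in range(1, 21)]

# A narrow search retrieves only two documents, both of them relevant.
narrow_search = ["d1", "d2"]

broad_recall = recall_ratio(broad_search, relevant_docs)
broad_precision = precision_ratio(broad_search, relevant_docs)
narrow_recall = recall_ratio(narrow_search, relevant_docs)
narrow_precision = precision_ratio(narrow_search, relevant_docs)
```

In this invented case the broad search finds every relevant document but most of what it retrieves is irrelevant, while the narrow search retrieves only relevant documents but misses half of them, which is the inverse relationship between the two ratios that the discussion relies on.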
As far as this test is concerned, the latter assumption would be
unjustified; is it now reasonable to assume that the grouping of natural
language terms to form controlled vocabularies, or the broadening of
search strategies, must inevitably result in a loss in overall performance?
We would certainly not make such a statement on the basis of this
single test; however, it would be surprising if the comparative test results
were peculiar to the particular environment of this test, and it does seem