CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Testing Techniques
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
on which it was retrieved are decoded as Property and Hypersonic. Intellect
was put in on later searches, to eliminate such unwanted combinations.
Contrary to the example shown in Fig. 6.7, in practice the score sheets for a
question rarely recorded documents with only one search term present, since this
would usually have involved recording the large majority of the documents in the
collection. The decision as to the coordination score at which to begin recording documents
varied for each question, depending partly on the number of starting terms in the
question. The objective was to examine an average of about 100 documents from the
collection (involving two or three score sheets), and this decision was fairly easily made
by looking at the density of postings on the search sheets. In some cases, when postings
were very heavy, a proportion of the collection only was examined (e.g. if half the
collection, the odd or even numbered documents only, etc.), and the results scaled up.
This was done to reduce the large clerical effort involved in searching so many questions
this way (involving looking at nearly 400,000 'documents' on the search sheets in this
first series of tests alone), but was only done when the results were statistically valid.
An exception to this was that the relevant documents were always fully recorded.
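
The scaling-up step described above can be sketched as follows. This is a minimal illustration in Python, not the project's actual clerical procedure; the function names, the sampling-fraction parameter and the counts in the usage lines are hypothetical. Because the relevant documents were always fully recorded, only the non-relevant count is scaled.

```python
def estimate_nonrelevant(sampled_nonrelevant: int, sampling_fraction: float) -> float:
    """Scale a non-relevant count from a sample up to the whole collection.

    sampling_fraction is the share of the collection actually examined,
    e.g. 0.5 when only the odd-numbered documents were looked at.
    """
    if not 0.0 < sampling_fraction <= 1.0:
        raise ValueError("sampling_fraction must be in (0, 1]")
    return sampled_nonrelevant / sampling_fraction


def estimate_total_retrieved(relevant_recorded: int,
                             sampled_nonrelevant: int,
                             sampling_fraction: float) -> float:
    """Combine the fully recorded relevant count with the scaled non-relevant count."""
    return relevant_recorded + estimate_nonrelevant(sampled_nonrelevant, sampling_fraction)


# Hypothetical figures: 2 relevant documents (fully recorded) and
# 30 non-relevant documents found in a half-collection sample.
estimated_nonrelevant = estimate_nonrelevant(30, 0.5)
estimated_total = estimate_total_retrieved(2, 30, 0.5)
```

With these illustrative figures, the 30 sampled non-relevant documents scale to an estimated 60 for the full collection, giving an estimated 62 documents retrieved in all.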
To obtain the final results for a question, the documents which had been assessed
as relevant were recorded on a separate score sheet, and deleted from those first
produced. The base document for the question being tested was deleted altogether
at this stage. Then the actual numbers of relevant and non-relevant documents
were totalled up, a separate total being obtained for each index language, at all
coordination levels and at each exhaustivity level. The final record is seen on
a Results Sheet (Fig. 6.8). Here, for question 181, it is noted that the Search
rule is type A which, as stated previously, allowed any combination of terms to be
accepted; the question has 7 starting terms. The search sheets were examined
for all documents having a coordination score of 3 or more, and there are two
relevant documents sought in this question. Three tables of figures are given, for
the three levels of exhaustivity, each table recording the coordination score and
language variables. For example, using the highly exhaustive indexing (weights
5-10), a three term coordination score using language 3 retrieves both of the relevant
documents, and 60 non-relevant documents. At the next level of exhaustivity
(weights 7-10), the non-relevant documents drop to 45; at the lowest level of
exhaustivity, the non-relevant documents drop to 10. In this case the recall is
maintained throughout, but with index language 6, for instance, at a coordination
score of 4, the effect of moving from high exhaustivity to low exhaustivity is to lose
the one relevant document retrieved. It will be noticed that no non-relevant figures
are given for coordination scores 1+ and 2+, although the relevant documents are
shown here. In general, an attempt was made to cut down the clerical effort by
ignoring the count of non-relevant documents when the precision ratio was less than
3%, although, as will be recounted in the next volume, some sampling was done at
these low precision levels. The figures obtained from this particular question are
then ready to be totalled with those from other questions to provide results for a set
of questions. This, and the various methods for arriving at these totals, will be
considered in the next volume.
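
As an illustration of how the figures on a Results Sheet relate to recall and precision, the sketch below recomputes the ratios for question 181, language 3, at a coordination score of 3 or more. The counts are those quoted above; the class and function names, and the label used for the lowest exhaustivity level, are assumptions made for this example. It takes recall as relevant retrieved over relevant sought, and precision as relevant retrieved over all documents retrieved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultCell:
    """One cell of a results table: counts at a given coordination level."""
    relevant_retrieved: int
    nonrelevant_retrieved: int


def recall_ratio(cell: ResultCell, relevant_sought: int) -> float:
    """Relevant documents retrieved, as a fraction of those sought."""
    return cell.relevant_retrieved / relevant_sought


def precision_ratio(cell: ResultCell) -> float:
    """Relevant documents retrieved, as a fraction of all documents retrieved."""
    total = cell.relevant_retrieved + cell.nonrelevant_retrieved
    return cell.relevant_retrieved / total if total else 0.0


RELEVANT_SOUGHT = 2        # two relevant documents sought in question 181
PRECISION_CUTOFF = 0.03    # non-relevant counts were generally ignored below 3%

# Question 181, language 3, coordination score 3+, by exhaustivity level.
cells = {
    "weights 5-10": ResultCell(relevant_retrieved=2, nonrelevant_retrieved=60),
    "weights 7-10": ResultCell(relevant_retrieved=2, nonrelevant_retrieved=45),
    "lowest exhaustivity": ResultCell(relevant_retrieved=2, nonrelevant_retrieved=10),
}

for level, cell in cells.items():
    recall = recall_ratio(cell, RELEVANT_SOUGHT)
    precision = precision_ratio(cell)
    counted = precision >= PRECISION_CUTOFF
    print(f"{level}: recall={recall:.0%} precision={precision:.1%} counted={counted}")
```

On these figures recall stays at 100% across all three exhaustivity levels while precision rises as exhaustivity falls. Even the most exhaustive level (2 relevant in 62 retrieved) lies just above the 3% cutoff.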
There were many additional tests, in which were investigated the effect of such
matters as the single term hierarchies, the set of concept languages, again
incorporating the various recall devices such as alphabetical and hierarchical grouping,
and also the various searches with controlled terms. These other tests meant, of
course, that the preparation of the question-indexes had to be commenced from the
beginning. For instance, the single-term hierarchies resulted in a group of terms