ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text

Testing Techniques (chapter). Cyril Cleverdon, Jack Mills, Michael Keen. Cranfield. An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

on which it was retrieved are decoded as Property and Hypersonic. Intellect was put in on later searches, to eliminate such unwanted combinations. Contrary to the example shown in Fig. 6.7, in practice the score sheets for a question rarely recorded documents with only one search term present, since this would usually have involved recording the large majority of the documents in the collection. The coordination score at which recording began varied from question to question, depending partly on the number of starting terms in the question. The objective was to examine an average of about 100 documents from the collection (involving two or three score sheets), and this decision was fairly easily made by looking at the density of postings on the search sheets. In some cases, when postings were very heavy, only a proportion of the collection was examined (e.g. for half the collection, the odd or even numbered documents only), and the results were scaled up. This was done to reduce the large clerical effort involved in searching so many questions in this way (involving looking at nearly 400,000 'documents' on the search sheets in this first series of tests alone), but only when the results were statistically valid. An exception was that the relevant documents were always fully recorded. To obtain the final results for a question, the documents which had been assessed as relevant were recorded on a separate score sheet, and deleted from those first produced.
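The recording procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `keep` and `scale` parameters, and the toy index are hypothetical and do not come from the report. The coordination score is taken here to be the number of question terms found among a document's index terms. The sketch also leaves out the rule that relevant documents were always fully recorded.

```python
from collections import Counter


def coordination_score(question_terms, document_terms):
    """Number of question terms that also occur among the document's index terms."""
    return len(set(question_terms) & set(document_terms))


def tally_scores(question_terms, index, min_score, keep=lambda doc_id: True, scale=1):
    """Count documents at each coordination score of min_score or more.

    index     -- dict mapping document number to its index terms (toy data here)
    min_score -- lowest coordination score worth recording for this question
    keep      -- predicate selecting the part of the collection to examine
    scale     -- factor used to scale a sampled count up to the full collection
    """
    counts = Counter()
    for doc_id, terms in index.items():
        if not keep(doc_id):
            continue  # document lies outside the examined sample
        score = coordination_score(question_terms, terms)
        if score >= min_score:
            counts[score] += scale
    return counts


# Hypothetical six-document collection and three-term question.
index = {
    1: {"hypersonic", "flow", "wing"},
    2: {"hypersonic", "property"},
    3: {"wing", "flutter"},
    4: {"flow", "heat transfer"},
    5: {"hypersonic", "flow", "heat transfer"},
    6: {"hypersonic", "flow", "heat transfer", "wing"},
}
question = ["hypersonic", "flow", "heat transfer"]

# Full examination, recording from a coordination score of 2 upwards.
full = tally_scores(question, index, min_score=2)

# Odd-numbered documents only, with the counts doubled to estimate the whole.
sampled = tally_scores(question, index, min_score=2,
                       keep=lambda doc_id: doc_id % 2 == 1, scale=2)
```

In this contrived collection the scaled-up sample happens to agree with the full count. With real postings the scaled figure is only an estimate, which is presumably why the report limits sampling to cases where the results were statistically valid.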
At this final stage the base document for the question being tested was deleted altogether. Then the actual numbers of relevant and non-relevant documents were totalled, a separate total being obtained for each index language, at all coordination levels and at each exhaustivity level. The final record is seen on a Results Sheet (Fig. 6.8). Here, for question 181, it is noted that the Search rule is type A which, as stated previously, allowed any combination of terms to be accepted; the question has 7 starting terms. The search sheets were examined for all documents having a coordination score of 3 or more, and there are two relevant documents sought in this question. Three tables of figures are given, for the three levels of exhaustivity, each table recording the coordination score and language variables. For example, using the highly exhaustive indexing (weights 5-10), a three term coordination score using language 3 retrieves both of the relevant documents, and 60 non-relevant documents. At the next level of exhaustivity (weights 7-10), the non-relevant documents drop to 45; at the lowest level of exhaustivity, the non-relevant documents drop to 10. In this case the recall is maintained throughout, but with index language 6, for instance, at a coordination score of 4, the effect of moving from high exhaustivity to low exhaustivity is to lose the one relevant document retrieved. It will be noticed that no non-relevant figures are given for coordination scores 1+ and 2+, although the relevant documents are shown here. In general, an attempt was made to cut down the clerical effort by ignoring the count of non-relevant documents when the precision ratio was less than 3%, although, as will be recounted in the next volume, some sampling was done at these low precision levels. The figures obtained from this particular question are then ready to be totalled with those from other questions to provide results for a set of questions.
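As a worked illustration of the figures quoted above for question 181 (language 3, coordination score of 3 or more), the following Python sketch computes recall and precision ratios at the three exhaustivity levels. It assumes the usual definitions: recall is the relevant documents retrieved as a percentage of the relevant documents sought, and precision is the relevant documents retrieved as a percentage of all documents retrieved. The function and variable names are hypothetical. The weights for the lowest exhaustivity level are not given in this passage, so that level is labelled only by name.

```python
def recall_ratio(relevant_retrieved, relevant_sought):
    """Relevant documents retrieved, as a percentage of those sought."""
    return 100.0 * relevant_retrieved / relevant_sought


def precision_ratio(relevant_retrieved, non_relevant_retrieved):
    """Relevant documents retrieved, as a percentage of all documents retrieved."""
    retrieved = relevant_retrieved + non_relevant_retrieved
    return 100.0 * relevant_retrieved / retrieved if retrieved else 0.0


RELEVANT_SOUGHT = 2  # question 181 has two relevant documents

# (exhaustivity level, relevant retrieved, non-relevant retrieved), as quoted in the text
rows = [
    ("weights 5-10", 2, 60),
    ("weights 7-10", 2, 45),
    ("lowest exhaustivity", 2, 10),
]

results = [
    (level,
     recall_ratio(relevant, RELEVANT_SOUGHT),
     precision_ratio(relevant, non_relevant))
    for level, relevant, non_relevant in rows
]

for level, recall, precision in results:
    print(f"{level}: recall {recall:.0f}%, precision {precision:.1f}%")
```

On these figures recall stays at 100% while precision rises as exhaustivity falls. The precision at the most exhaustive level (2 out of 62 retrieved) is just above the 3% threshold below which non-relevant documents were generally not counted.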
This totalling, and the various methods for arriving at the totals, will be considered in the next volume. There were many additional tests, which investigated the effect of such matters as the single term hierarchies and the set of concept languages, again incorporating the various recall devices such as alphabetical and hierarchical grouping, and also the various searches with controlled terms. These other tests meant, of course, that the preparation of the question-indexes had to be commenced from the beginning. For instance, the single-term hierarchies resulted in a group of terms