CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Indexing Procedures
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Hierarchical linkage may be implemented in a post-coordinate system by generic
posting, by coding based on a form of semantic factoring as in the WRU system,
or by search strategies based on a classification schedule or thesaurus. In a
pre-coordinate system it may be achieved to a large degree by the file arrangement, as
in the systematic juxtaposition of related classes in a classified file; it may be achieved
fragmentarily by inversion of subject headings in an alphabetical subject catalogue
(e.g., Drag, Base; Drag, Form; Drag, Induced; etc.) or by search strategies based
on a syndetic network of see also references.
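Generic posting can be pictured as follows: a document indexed under a specific term is also posted under every broader term, so a search on the broader term retrieves it. The sketch below is a minimal illustration only, assuming a hypothetical hierarchy held in a `BROADER` dictionary that maps each term to its immediate broader term; the terms, the document identifiers and the functions `ancestors` and `generic_posting` are illustrative and are not taken from the project.

```python
# Hypothetical broader-term hierarchy: each term maps to its immediate broader term.
BROADER = {
    "Swept wings": "Wings",
    "Wings": "Aerodynamic structures",
    "Base drag": "Drag",
    "Induced drag": "Drag",
}


def ancestors(term: str) -> list[str]:
    """Return all broader terms of `term`, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def generic_posting(index: dict[str, set[str]], doc_id: str, terms: list[str]) -> None:
    """Post doc_id under each assigned term and under every one of its broader terms."""
    for term in terms:
        for t in [term, *ancestors(term)]:
            index.setdefault(t, set()).add(doc_id)


index: dict[str, set[str]] = {}
generic_posting(index, "D1", ["Swept wings", "Induced drag"])
generic_posting(index, "D2", ["Base drag"])

# A search on the generic term "Drag" now retrieves both documents.
print(sorted(index["Drag"]))  # ['D1', 'D2']
```

The cost of this approach is paid at input time (more postings per document), in exchange for a search that needs no knowledge of the hierarchy.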
Measuring the performance of index devices
In order to establish recall and precision performance figures for the different
devices, both singly and in various combinations, it was first desirable to
establish, as far as possible, figures for indexing in which none of the devices
was operating. It would then be possible to determine the impact on these figures
of introducing each device in turn. This assumes, of course, a test collection
and a set of questions to be put to it, for which it is known exactly which documents
are relevant to each question, as described in the previous chapter.
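Once the relevant documents for a question are known, the two performance figures follow directly: recall is the proportion of the relevant documents that are retrieved, and precision is the proportion of the retrieved documents that are relevant. A minimal sketch, with hypothetical document identifiers and relevance judgments; the function name `recall_precision` is illustrative, and returning 0.0 for an empty set is an assumed convention.

```python
def recall_precision(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (recall, precision) for one question."""
    hits = len(retrieved & relevant)  # relevant documents actually retrieved
    # Assumed convention: a ratio with an empty denominator is reported as 0.0.
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical judgments for a single question.
relevant = {"D1", "D4", "D7", "D9"}
retrieved = {"D1", "D2", "D4", "D5", "D6"}

r, p = recall_precision(retrieved, relevant)
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.50 precision=0.40
```

Applying the same function to the output of a search before and after a device is introduced gives the kind of comparison described above.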
Performance figures for an 'unindexed' collection seemed to imply a situation
in which the complete text of each item in the collection was searched for each
question. This would have been too tedious an operation, although something like it,
on a small scale and using computer facilities, has been described by
Swanson (Ref. 18). The alternative we decided to take was to use, as the
base situation, one in which the simplest known indexing device was used, and to
measure the impact on this of all the other devices. This simplest device was taken
to be the condensation of the full text into an index language consisting solely of
the 'uniterms' thrown up by the title and text of the document itself, quite
uncontrolled by any prior index language.
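As a rough picture of what an uncontrolled uniterm language contains, the sketch below pulls single words from a title and text. It is an assumption for illustration only: the indexing in the project was done by indexers, not by program, and the tokenizing rule, the small `STOP_WORDS` list and the function `uniterms` are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop list; not taken from the project.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to", "with"}


def uniterms(title: str, text: str) -> set[str]:
    """Return the uncontrolled single-word terms found in a title and text."""
    # Lowercase, split on anything that is not a letter, drop stop words.
    words = re.findall(r"[a-z]+", f"{title} {text}".lower())
    return {w for w in words if w not in STOP_WORDS}


terms = uniterms(
    "Drag on swept wings at high subsonic speeds",
    "Measurements of drag were made on wings of several sweep angles.",
)
# No control is applied, so 'sweep' and 'swept' remain separate terms.
print(sorted(terms))
```

Because nothing relates 'sweep' to 'swept', grouping such forms would be one example of a device that could be applied on top of this base language and measured against it.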
So the first step was to establish, by indexing the test documents, a crude,
elemental index language from which all the other languages (each one characterized
by the addition of a particular device or aggregate of devices) would be derivable.
Before this could be done, it was necessary to provide for the control of two major
parameters in indexing: exhaustivity and specificity.
Exhaustivity and specificity
Exhaustivity in indexing refers to the degree to which one recognizes (i.e., includes
in the index descriptions) the different concepts or notions dealt with in a document.
Specificity refers to the generic level at which these concepts or notions are
recognized. For example, suppose a report has as its main theme the subject 'Drag on
swept wings at high subsonic speeds'. If one neglects, for the time being, the various
subsidiary themes which are also dealt with, this report may be said to deal with
three concepts: an aerodynamic characteristic, an aerodynamic structure and a
flow condition. If these concepts were described in that generic fashion in the index
description, the description would be exhaustive but not specific. If the description
consisted only of Drag - High subsonic speeds, it would be neither exhaustive nor
specific; for while the terms retained are specific, the absence of any reference to
Swept wings implies that the subject deals with aerodynamic structures in general
(some structure is implicit, of course), and this is less than specific, since to be
specific a description must be exactly coextensive with the notion represented. There
can be no reduction in exhaustivity which is not a reduction in specificity; but the
reverse does not hold.
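The distinction can be checked mechanically under a simple model, assumed here for illustration: each concept of the example report is identified by its exact term; a hypothetical `BROADER` dictionary links each exact term to the generic term used in the example; a description is exhaustive if every concept is represented by its own term or a broader one, and specific if every concept appears under its own exact term. The names `represents`, `is_exhaustive` and `is_specific` are illustrative.

```python
# Hypothetical links from each exact term to the generic term used in the example.
BROADER = {
    "Drag": "Aerodynamic characteristic",
    "Swept wings": "Aerodynamic structure",
    "High subsonic speeds": "Flow condition",
}

# The three concepts of the main theme, each given by its exact term.
CONCEPTS = ["Drag", "Swept wings", "High subsonic speeds"]


def represents(term: str, concept: str) -> bool:
    """True if `term` is the concept's own term or any broader term of it."""
    current = concept
    while True:
        if term == current:
            return True
        if current not in BROADER:
            return False
        current = BROADER[current]


def is_exhaustive(description: set[str]) -> bool:
    """Every concept is recognized, at whatever generic level."""
    return all(any(represents(t, c) for t in description) for c in CONCEPTS)


def is_specific(description: set[str]) -> bool:
    """Every concept is recognized under its own exact (coextensive) term."""
    return all(c in description for c in CONCEPTS)


descriptions = {
    # Expected: exhaustive, not specific.
    "generic": {"Aerodynamic characteristic", "Aerodynamic structure", "Flow condition"},
    # Expected: neither exhaustive nor specific.
    "partial": {"Drag", "High subsonic speeds"},
    # Expected: exhaustive and specific.
    "full": {"Drag", "Swept wings", "High subsonic speeds"},
}

for name, desc in descriptions.items():
    print(f"{name}: exhaustive={is_exhaustive(desc)}, specific={is_specific(desc)}")
```

Under this model `is_specific` implies `is_exhaustive`, so a description can be exhaustive without being specific, but never specific without being exhaustive, which is the asymmetry stated above.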