CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Formation of Index Languages
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
potent precision device and whilst the measurement of its impact, alone and in conjunction
with other devices (including all the recall devices) was of course essential, it
could not be included as a variable when measuring the impact of the other devices
on single terms. Completely free manipulation of classes is only feasible if we begin
with single terms; this is a basic assumption of post-coordinate systems. It was
clearly desirable to obtain performance figures for the impact of single devices on
single classes before attempting to measure the joint impact of several devices - and
even a slight degree of pre-coordination would have compromised such figures.
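The post-coordinate assumption can be illustrated with a minimal sketch. The index contents and the `search` function below are hypothetical, invented for illustration only. Each document carries single terms, and any combination of classes is formed at search time by requiring all query terms to be present.

```python
# Hypothetical toy index: each document is indexed by a set of single terms
# (post-coordinate indexing); terms are combined only at search time.
index = {
    "D1": {"Hypersonic", "Flow", "Cone"},
    "D2": {"Hypersonic", "Flight"},
    "D3": {"Supersonic", "Flow", "Cone"},
}


def search(*terms):
    """Return the documents indexed by every query term (AND coordination)."""
    wanted = set(terms)
    return sorted(doc for doc, doc_terms in index.items() if wanted <= doc_terms)


print(search("Hypersonic"))          # ['D1', 'D2']
print(search("Hypersonic", "Flow"))  # ['D1']
print(search("Flow", "Cone"))        # ['D1', 'D3']
```

Had D1 been indexed instead under a single pre-coordinated term such as `Hypersonic flow`, a search on `Flow` would miss it unless the compound were first broken down, which is why even slight pre-coordination would distort figures for single terms.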
Confounding of synonyms
This is perhaps the most obvious of all indexing devices and the one least likely
to be neglected even in the crudest of indexes. Much of this work was straightforward:
e.g., recognition of synonymity between such terms as Acoustics and Sound, Amount
and Quantity, Calculation and Computation, Axisymmetric and Axisymmetrical, Vertex
and Apex, Viscid and Viscous. However, exact synonymity is relatively rare (there
might even be argument about some of the examples above). The commoner situation
is a partial synonymity, where terms are interchangeable only in particular contexts.
The evident richness of the English language, even in the literature of high-speed aerodynamics,
led to quite different terms being used on different occasions (but often in
the same document) to represent the same thing; e.g., the notion of Proximity might
be conveyed by that term or by Near, Nearest, Nearly, Close, Closely, Off, Adjacent,
Contact, etc. Two terms which might be used synonymously on most occasions would
occasionally diverge seriously; e.g., Interplanetary flight is equated with Interplanetary
voyage, Hypersonic flight with Hypersonic flow, and Free flight with Free falling. But
Voyage, Flow and Falling cannot be regarded as synonyms of one another, so Flight cannot
simply be confounded with all three.
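The distinction between exact and partial synonymity can be sketched as two tables: one applied to every occurrence of a word, the other applied only to whole phrases. The term pairs come from the examples above; the table names, the tuple representation of phrases and the choice of preferred forms are assumptions made for illustration, not the project's actual procedure.

```python
# Hypothetical synonym list. Exact synonyms are confounded for every
# occurrence; partial synonyms are confounded only inside a stated phrase.
EXACT = {
    "Sound": "Acoustics",
    "Quantity": "Amount",
    "Computation": "Calculation",
    "Apex": "Vertex",
}

# Phrase-level (context-dependent) equivalences: phrase -> preferred phrase.
PARTIAL = {
    ("Interplanetary", "voyage"): ("Interplanetary", "flight"),
    ("Hypersonic", "flow"): ("Hypersonic", "flight"),
    ("Free", "falling"): ("Free", "flight"),
}


def normalize(phrase):
    """Map a tuple of words to the preferred form used in the index."""
    phrase = PARTIAL.get(phrase, phrase)  # context-dependent equivalences first
    return tuple(EXACT.get(word, word) for word in phrase)


print(normalize(("Interplanetary", "voyage")))  # ('Interplanetary', 'flight')
print(normalize(("Supersonic", "flow")))        # ('Supersonic', 'flow')
print(normalize(("Apex",)))                     # ('Vertex',)
```

Because `flow` is mapped to `flight` only within the phrase `Hypersonic flow`, it is left alone in `Supersonic flow`, and no chain Voyage = Flight = Flow is ever formed.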
The establishment of a synonym-list suffered one unfortunate drawback in that
it preceded the construction of classification schedules. Ideally, a synonym-list
in any given area should be extracted from a detailed classification; only by a
systematic organization of all the terms used, according to their meanings, can the ramifications
of complete and partial synonymity be exposed. For administrative reasons, however,
it was desirable to proceed with the measurement of relatively straightforward devices
like synonyms, word-forms, weights, etc., whilst the preparations for the more
difficult devices like hierarchical linkage went on.
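Deriving a synonym list from a classification can be sketched, for the exact-synonym case only, as grouping the terms filed at the same class. The schedule fragment and its class marks are hypothetical, invented for illustration; a real schedule would also have to record the contexts that govern partial synonymity.

```python
from collections import defaultdict

# Hypothetical fragment of a classified schedule: each used term is filed
# under a class mark. Q31 stands for a class subordinate to Q3.
SCHEDULE = {
    "Acoustics": "Q3",
    "Sound": "Q3",
    "Noise": "Q31",
    "Vertex": "G7",
    "Apex": "G7",
}


def synonym_list(schedule):
    """Group terms that share a class mark; only shared marks yield synonyms."""
    classes = defaultdict(list)
    for term, mark in schedule.items():
        classes[mark].append(term)
    return {mark: sorted(terms) for mark, terms in classes.items() if len(terms) > 1}


print(synonym_list(SCHEDULE))
# {'Q3': ['Acoustics', 'Sound'], 'G7': ['Apex', 'Vertex']}
```

Noise, filed in a subordinate class, is not returned as a synonym of Sound: the classification keeps a species apart from its genus, which a flat list compiled without it may fail to do.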
The truth of the assertions just made was borne out when the classified hierarchies
were completed, in that a number of further synonyms, unrecognized in the synonym
programme, were disclosed. However, these cases were relatively few and we are
satisfied that the synonym-list on which the tests were made was reasonable on the
whole.
One difficult decision necessary in establishing the synonym list was whether
we should recognize variant word forms as synonyms. Whilst the usual view of
synonymity excludes variant word forms as being examples of a grammatical rather than
a semantic relationship, the practice of many subject heading lists, thesauri, etc.,
which do not distinguish variant word forms at all, is an implicit acceptance of the view
that such variants are virtually synonymous. Certainly, in the process of indexing
by natural language terms extracted from the documents, the fact that one word form
rather than another was selected was often almost fortuitous and this is shown, with
examples, in the section on hierarchical linkage. However, this argument was not
regarded as acceptable; a thesaurus, etc. may fail to distinguish generic levels
as well as variant word forms, and so implicitly confound a genus and its species,
which are plainly not synonyms.
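Keeping variant word forms out of the synonym list means that the two devices can be measured separately. A minimal sketch, assuming each device is a separate lookup table that can be switched on or off; the table contents (apart from terms quoted in this section), the function name and its flags are hypothetical.

```python
# Hypothetical, separate tables for the two devices, so that each can be
# switched on or off independently when measuring its effect.
SYNONYMS = {"Sound": "Acoustics", "Apex": "Vertex"}
WORD_FORMS = {"Nearest": "Near", "Closely": "Close", "Flows": "Flow"}


def index_terms(terms, use_synonyms=False, use_word_forms=False):
    """Apply only the devices that are switched on, word forms first."""
    result = []
    for term in terms:
        if use_word_forms:
            term = WORD_FORMS.get(term, term)
        if use_synonyms:
            term = SYNONYMS.get(term, term)
        result.append(term)
    return result


print(index_terms(["Sound", "Flows"], use_synonyms=True))    # ['Acoustics', 'Flows']
print(index_terms(["Sound", "Flows"], use_word_forms=True))  # ['Sound', 'Flow']
print(index_terms(["Sound", "Flows"], True, True))           # ['Acoustics', 'Flow']
```

Merging the two tables into one would make it impossible to attribute a change in performance to either device alone.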