CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Formation of Index Languages
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
generalized category of Spatial Phenomena, its class mates will eventually include
such terms as Underwater or Buried. In a question on the Upper atmosphere, if ex-
pansion of the first term brings in such classes it is clearly unhelpful. Many other
examples could be given; e.g., Boosted and Reinforced could reasonably be assigned
to a basic category of activities affecting the dimensions of a thing. But in the col-
lection indexed these terms appeared in quite different contexts - Boosted under
Rocketry and Reinforced under Structures.
The procedure finally adopted was a compromise. Where, in the test collection,
an index term had appeared only in one particular context it was placed in a hierarchy
reflecting that context, without regard to whether it was a necessary or contingent
relation. For example, Gun would be regarded as a method of propulsion in any
fundamental hierarchy; but in the test collection it appears only as a designation of
a kind of aeronautical testing device (a special kind of wind tunnel) and so it was
located with the latter. An extreme example would be Gamma; as a single term, this
could hardly appear in any 'fundamental' category other than Letters; in the test col-
lection it appeared only as the designation of a kind of steel and was located as such.
If, however, a term appeared in several different contexts suffering a significant
qualification of meaning, it was placed in a 'fundamental' category; e.g. Integral
appeared in its mathematical and structural sense and was therefore placed in a
Common properties category. Similarly, the term Working appeared in two main
guises: to designate a particular section of a wind tunnel and to designate a fluid
(e.g., a test gas). The sense of the term Working is significantly different in the
two contexts and it was therefore relegated to a common properties category.
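The compromise described above can be sketched in a few lines of Python. This is only an illustration, not the project's own procedure, which was carried out by hand. The function name, the context labels ("Wind tunnels", "Steels", and so on) and the default category name are assumptions introduced for the example. The sketch also assumes that every distinct context implies a significant qualification of meaning, whereas in the project that was a matter of judgement.

```python
from collections import defaultdict


def place_terms(occurrences, fundamental="Common properties"):
    """Assign each index term to a hierarchy (hypothetical helper).

    occurrences: iterable of (term, context) pairs taken from the indexing.
    fundamental: name of the fundamental category used for terms found in
                 more than one context (an assumed label).
    """
    contexts = defaultdict(set)
    for term, context in occurrences:
        contexts[term].add(context)

    placement = {}
    for term, seen in contexts.items():
        if len(seen) == 1:
            # Only one context in the collection: follow that context,
            # whether the relation is necessary or contingent.
            placement[term] = next(iter(seen))
        else:
            # Several contexts (assumed here to differ significantly in
            # meaning): fall back to a fundamental category.
            placement[term] = fundamental
    return placement


# Illustrative data; the context labels are assumptions, not the report's.
occurrences = [
    ("Gun", "Wind tunnels"),
    ("Gamma", "Steels"),
    ("Integral", "Mathematics"),
    ("Integral", "Structures"),
    ("Working", "Wind tunnel sections"),
    ("Working", "Fluids"),
]
placement = place_terms(occurrences)
```

With this data, Gun and Gamma follow their single context, while Integral and Working fall into the common properties category, as in the examples given in the text.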
Problems of terminology
Closely related to the above problem was that of interpreting the intended mean-
ing (from the point of view of the test collection) of the terms used in the natural lan-
guage indexing. The organization of terms into hierarchies constitutes a form of
controlled vocabulary, of course; in this case, it was a control being exercised retro-
spectively, after the indexing stage. The object was to place each term in the hier-
archy to which it would have been assigned had the indexing been done using the con-
trolled vocabulary. So where the same essential notion was conveyed in various
grammatical styles, these variants would have been ignored and one form done ser-
vice for all; that is to say, the particular grammatical form of a term might have to
be disregarded since its semantic content in the index was the only point of interest
now. For example, a writer might refer indifferently to 'reduction of x by compres-
sion', or 'reduction of x by compressing' or 'reduction of x compressively' without
wishing to convey a significantly different idea. Again, any one of the phrases 'plate
with curvature', 'curved plate', 'curve of the plate', and 'curving the plate' might
be used in a report without any intention of conveying different nuances of meaning
(i.e., without meaning to refer to the structure or the property or the operation in
particular). Other examples were Test and Testing; Calculation, Calculating and
Calculated; Asymptote, Asymptotic and Asymptotically. All these variations in ex-
pression were ignored and the different forms juxtaposed in the hierarchies.
Where different word forms reflected significantly different emphases in mean-
ing they were assigned to their formal categories. So Buckle and Buckling appeared
as processes and Buckled as a property; Cantilever was used to characterize a kind
of beam, but Cantilevered designated a type of structure. Scooped appeared as a
property and Scooping as a process.
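
The treatment of word forms in the two preceding paragraphs can be illustrated with a small Python sketch. It is a hedged illustration only: the table names, the function name and the data layout are assumptions made for the example, and the decision as to which forms differ significantly in meaning was an editorial judgement in the project, not something computed.

```python
# Forms that were treated as mere grammatical variants and juxtaposed.
VARIANT_GROUPS = [
    ("Test", "Testing"),
    ("Calculation", "Calculating", "Calculated"),
    ("Asymptote", "Asymptotic", "Asymptotically"),
]

# Forms whose difference in emphasis was judged significant, with the
# formal category each was assigned to (labels paraphrase the text).
DISTINCT_FORMS = {
    "Buckle": "process",
    "Buckling": "process",
    "Buckled": "property",
    "Cantilever": "kind of beam",
    "Cantilevered": "type of structure",
    "Scooped": "property",
    "Scooping": "process",
}


def hierarchy_entry(term):
    """Return (forms sharing the entry, formal category or None).

    Hypothetical helper: exceptions are checked first, then the variant
    groups; any other term stands alone with no category recorded.
    """
    if term in DISTINCT_FORMS:
        return (term,), DISTINCT_FORMS[term]
    for group in VARIANT_GROUPS:
        if term in group:
            return group, None
    return (term,), None
```

Checking the exceptions before the variant groups reflects the order of the rule in the text: forms are juxtaposed unless a difference in emphasis has been recorded for them.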