CRANV1P1 ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text Formation of Index Languages chapter Cyril Cleverdon Jack Mills Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

- 70 -

generalized category of Spatial Phenomena, its class mates will eventually include such terms as Underwater or Buried. In a question on the Upper atmosphere, if expansion of the first term brings in such classes it is clearly unhelpful. Many other examples could be given; e.g., Boosted and Reinforced could reasonably be assigned to a basic category of activities affecting the dimensions of a thing. But in the collection indexed these terms appeared in quite different contexts - Boosted under Rocketry and Reinforced under Structures.

The procedure finally adopted was a compromise. Where, in the test collection, an index term had appeared only in one particular context it was placed in a hierarchy reflecting that context, without regard to whether it was a necessary or contingent relation. For example, Gun would be regarded as a method of propulsion in any fundamental hierarchy; but in the test collection it appears only as a designation of a kind of aeronautical testing device (a special kind of wind tunnel) and so it was located with the latter. An extreme example would be Gamma; as a single term, this could hardly appear in any 'fundamental' category other than Letters; in the test collection it appeared only as the designation of a kind of steel and was located as such. If, however, a term appeared in several different contexts suffering a significant qualification of meaning, it was placed in a 'fundamental' category; e.g.
Integral appeared in its mathematical and structural sense and was therefore placed in a Common properties category. Similarly, the term Working appeared in two main guises: to designate a particular section of a wind tunnel and to designate a fluid (e.g., a test gas). The sense of the term Working is significantly different in the two contexts and it was therefore relegated to a common properties category.

Problems of terminology

Closely related to the above problem was that of interpreting the intended meaning (from the point of view of the test collection) of the terms used in the natural language indexing. The organization of terms into hierarchies constitutes a form of controlled vocabulary, of course; in this case, it was a control being exercised retrospectively, after the indexing stage. The object was to place each term in the hierarchy to which it would have been assigned had the indexing been done using the controlled vocabulary. So where the same essential notion was conveyed in various grammatical styles, these variants would have been ignored and one form done service for all; that is to say, the particular grammatical form of a term might have to be disregarded since its semantic content in the index was the only point of interest now. For example, a writer might refer indifferently to 'reduction of x by compression', or 'reduction of x by compressing' or 'reduction of x compressively' without wishing to convey a significantly different idea. Again, any one of the phrases 'plate with curvature', 'curved plate', 'curve of the plate', and 'curving the plate' might be used in a report without any intention of conveying different nuances of meaning (i.e., without meaning to refer to the structure or the property or the operation in particular). Other examples were Test and Testing; Calculation, Calculating and Calculated; Asymptote, Asymptotic and Asymptotically.
All these variations in expression were ignored and the different forms juxtaposed in the hierarchies.

Where different word forms reflected significantly different emphases in meaning they were assigned to their formal categories. So Buckle and Buckling appeared as processes and Buckled as a property; Cantilever was used to characterize a kind of beam, but Cantilevered designated a type of structure. Scooped appeared as a property and Scooping as a process.
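
The compromise procedure and the treatment of word forms described above can be restated as a short sketch in present-day terms. It is offered only as an illustration and not as the method actually used by the project, which relied on intellectual judgement throughout. Several things are assumptions: the context labels, the names place_terms, canonical and MERGED_FORMS, and the choice of which word form serves as the shared label. The sketch also simplifies the test of 'a significant qualification of meaning' to a count of distinct contexts.

```python
from collections import defaultdict

# Hypothetical sample data: (index term, context in which it was used).
# The context labels are illustrative, not the project's schedule headings.
OCCURRENCES = [
    ("Gun", "Wind tunnels"),
    ("Gamma", "Steels"),
    ("Integral", "Mathematics"),
    ("Integral", "Structures"),
    ("Working", "Wind tunnel sections"),
    ("Working", "Fluids"),
]

# Hand-made table of word forms treated as one notion (examples from the text).
# Which form stands for the group is an assumption of this sketch.
MERGED_FORMS = {
    "Testing": "Test",
    "Calculating": "Calculation",
    "Calculated": "Calculation",
    "Asymptotic": "Asymptote",
    "Asymptotically": "Asymptote",
}


def canonical(term):
    """Return the shared label for a word form, or the term itself if kept distinct."""
    return MERGED_FORMS.get(term, term)


def place_terms(occurrences):
    """Map each term to ("context", name) or ("fundamental", contexts)."""
    contexts = defaultdict(set)
    for term, context in occurrences:
        contexts[canonical(term)].add(context)

    placement = {}
    for term, seen in contexts.items():
        if len(seen) == 1:
            # Used in one context only: file it under that context.
            placement[term] = ("context", next(iter(seen)))
        else:
            # Used in several contexts: file it in a 'fundamental' category.
            placement[term] = ("fundamental", tuple(sorted(seen)))
    return placement


if __name__ == "__main__":
    for term, where in sorted(place_terms(OCCURRENCES).items()):
        print(term, where)
```

Forms such as Buckled or Cantilevered are deliberately absent from the table, so canonical leaves them unchanged and they can be placed in their own formal categories. The table is hand-made rather than produced by automatic stemming because the decision to merge or separate forms rested on meaning and not on spelling.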