IRS13 Scientific Report No. IRS-13 Information Storage and Retrieval Suffix Dictionaries chapter E. M. Keen Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

be stored in the system to word stems only. The suffix "s" dictionary is applied in the same manner, but in this case the only "suffix" removed is the terminal "s", with the object of conflating singular and plural word forms.

Many of the considerations relating to the methods of construction of the stem dictionary have been discussed by Salton and Lesk [7]. The comments made here relate to the extent to which the present dictionaries correctly conflate English word forms and so use the correct stems.

The conflation of singular and plural words is not perfectly achieved by terminal "s" removal, although over [...] success is obtained in the case of the Cran-1 aerodynamics terminology. The failures are due to well-known singular and plural forms such as "body" and "bodies", "axis" and "axes", "bureau" and "bureaux", "appendix" and "appendices", etc. Also, the terminal "s" does not always denote a plural form, and words like "bluntness" and "aerodynamics" have the "s" removed. This latter occurrence rarely affects retrieval, however, since a request and a document both containing the word "bluntness" will match on the word without its terminal "s". It is possible to imagine a case of incorrect conflation; for example, the word "axe" could be incorrectly conflated with "axes". Such occurrences are extremely rare within the narrow subject fields under test.

The full suffix removal procedure incorporates spelling rules which correctly identify "bod" as the stem of both "body" and "bodies", and correctly conflate "hope", "hoped" and "hoping", as well as "hop", "hopped" and "hopping". There are some cases, however, where the correct stem is not recognized.
For example, the words "computation", "computations" and "computational" are correctly conflated and given the same concept number by the look-up procedure, but a second group of similar words, including "compute", "computed", "computers", "computer", and "computing", is given a second concept number.
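
The behaviour of terminal "s" removal described above can be illustrated with a minimal sketch. It is not the dictionary procedure used in the system itself. The helper names `strip_terminal_s` and `conflated` are hypothetical, and the sketch assumes that two word forms count as conflated when they reduce to the same string after a single terminal "s" is removed.

```python
def strip_terminal_s(word: str) -> str:
    """Remove one terminal "s" if present (hypothetical helper, not the report's code)."""
    return word[:-1] if word.endswith("s") else word


def conflated(a: str, b: str) -> bool:
    """Assumption: two forms are conflated when their reduced strings are identical."""
    return strip_terminal_s(a) == strip_terminal_s(b)


# A regular plural is conflated as intended.
print(conflated("computation", "computations"))

# The irregular plurals named in the text are missed.
for singular, plural in [("body", "bodies"), ("axis", "axes"),
                         ("bureau", "bureaux"), ("appendix", "appendices")]:
    print(singular, plural, conflated(singular, plural))

# A terminal "s" that is not a plural marker is still removed, but the
# request and the document are reduced in the same way and still match.
print(strip_terminal_s("bluntness"))

# The rare incorrect conflation mentioned in the text.
print(conflated("axe", "axes"))
```

Under these assumptions, "axis" reduces to "axi" while "axes" reduces to "axe", so the pair is missed, whereas "axe" and "axes" both reduce to "axe" and are wrongly merged.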