2. Similarity Measures

In the present system, documents and queries are represented as vectors in n-dimensional Euclidean space, where n is the number of allowable concepts or index terms in the system. Documents are retrieved on the basis of their closeness to the query vector, with "closeness" meaning a small Euclidean distance between the vectors. Because the document vectors vary in length, however, the perpendicular distance between vectors, measured at some fixed distance from the origin, may not always be a good measure. Normalizing the document vectors so that their endpoints lie on the unit hypersphere, and using the arc length along the hypersphere as the distance measure, removes this problem. The measures used in the present study are functions of this arc length through the cosine of the angle between two document vectors. They are defined as follows:

(1) Cosine measure

$$S_{d_1 d_2} = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert}$$

(2) Tanimoto's measure [4]

$$S_{d_1 d_2} = \frac{d_1 \cdot d_2}{d_1 \cdot d_1 + d_2 \cdot d_2 - d_1 \cdot d_2}$$

where $d_1$ and $d_2$ are document vectors and $S_{d_1 d_2}$ is the similarity of document one with document two. Bonner uses coefficient (2) to form his document-document similarity matrices, while Rocchio uses the cosine measure to compare documents.
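The two measures, together with the arc length they are said to depend on, can be sketched in a few lines of Python. This is an illustrative sketch, not the study's own program: the function names (`dot`, `normalize`, `cosine_similarity`, `tanimoto_similarity`, `arc_length`) are hypothetical, vectors are assumed to be plain lists of index-term weights, and returning 0.0 for an all-zero vector is an assumption the text does not address. The example vectors are invented for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product d1 . d2 of two document vectors over the same index terms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of index terms")
    return sum(x * y for x, y in zip(a, b))


def normalize(d: Vector) -> list:
    """Scale d so that its endpoint lies on the unit hypersphere."""
    length = math.sqrt(dot(d, d))
    if length == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    return [x / length for x in d]


def cosine_similarity(d1: Vector, d2: Vector) -> float:
    """Measure (1): d1.d2 / (|d1| |d2|)."""
    norm_product = math.sqrt(dot(d1, d1)) * math.sqrt(dot(d2, d2))
    if norm_product == 0.0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot(d1, d2) / norm_product


def tanimoto_similarity(d1: Vector, d2: Vector) -> float:
    """Measure (2): d1.d2 / (d1.d1 + d2.d2 - d1.d2)."""
    shared = dot(d1, d2)
    # By Cauchy-Schwarz the denominator is at least (|d1|^2 + |d2|^2) / 2,
    # so it is zero only when both vectors are all-zero.
    denominator = dot(d1, d1) + dot(d2, d2) - shared
    if denominator == 0.0:
        return 0.0  # assumption: two empty documents are given similarity 0
    return shared / denominator


def arc_length(d1: Vector, d2: Vector) -> float:
    """Arc length between the normalized vectors on the unit hypersphere,
    i.e. the angle (in radians) whose cosine is measure (1)."""
    c = cosine_similarity(d1, d2)
    return math.acos(max(-1.0, min(1.0, c)))  # clamp away rounding error


if __name__ == "__main__":
    d1 = [1, 1, 0, 1]  # hypothetical binary index-term vectors
    d2 = [1, 0, 1, 1]
    print(round(cosine_similarity(d1, d2), 4))  # 0.6667
    print(tanimoto_similarity(d1, d2))          # 0.5
```

For these two vectors $d_1 \cdot d_2 = 2$ and $d_1 \cdot d_1 = d_2 \cdot d_2 = 3$, so the cosine is $2/3$ and Tanimoto's measure is $2/(3 + 3 - 2) = 1/2$. Once both vectors are normalized, $d_1 \cdot d_1 = d_2 \cdot d_2 = 1$ and Tanimoto's measure reduces to $c/(2 - c)$, where $c$ is the cosine, which is one way to see that it, too, is a function of the arc length.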