Feedback and Mixing Experiments with MatchPlus

S. Gallant, W. Caid, J. Carleton, T. Gutschow, R. Hecht-Nielsen, K. Qing, D. Sudbeck

... documents have been retrieved, MatchPlus can revert to context vectors for finding the closest remaining document.

4. MatchPlus requires only about 300 multiplications and additions to search a document. Moreover, it is easy to decompose the search of a corpus of documents with either parallel hardware or, less expensively, several networked conventional machines (or chips). Each machine can search a subset of the document context vectors and return the closest distances and document numbers in its subset. The closest among the distances returned by all the processors then determine the documents chosen for retrieval (see the sketch following Section 3.2).

We also plan to investigate a cluster tree pruning procedure that finds nearest-neighbor document context vectors without having to compute dot products for all document context vectors. This data organization affects retrieval speed, but does not change the order in which documents are retrieved.

3 Experiments and Runs Submitted

The four runs (two ad hoc, two routing) submitted to TREC-2 are described in the following sections.

3.1 Totally Automated

This run used the entire topic as a query, with a weight of 2 applied to the topic section. We also employed a $Match filter that put documents having at least 4 of the concept terms before other documents. Context vectors determined the actual order of the documents (as well as which documents were in the 1000-document submission list). Note that these ad hoc retrieval runs were totally automated from topic to retrieval list.

3.2 Relevance Feedback From the First 20 Retrievals

Here we took the top 20 documents from the previous section, read them to estimate relevance, and then modified the initial query for the remaining 980 retrievals. To change a query, we added and normalized all relevant document context vectors from the first 20 retrievals and then added this vector to 0.7 times the original query vector. (The 0.7 factor was obtained from experiments with a different corpus.)
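A minimal sketch of this feedback update, assuming NumPy; the function name and the vector dimensionality are illustrative, not from the paper:

```python
import numpy as np

def feedback_query(query_vec, relevant_doc_vecs, original_weight=0.7):
    """Update a query context vector from judged-relevant documents:
    sum the relevant document context vectors, normalize the sum, and
    add it to 0.7 times the original query vector (the paper's factor,
    obtained from experiments on a different corpus)."""
    summed = np.sum(relevant_doc_vecs, axis=0)
    summed /= np.linalg.norm(summed)      # normalize the relevant-document sum
    return original_weight * query_vec + summed

# Illustrative usage with ~300-dimensional context vectors.
rng = np.random.default_rng(0)
q = rng.standard_normal(300)
relevant = [rng.standard_normal(300) for _ in range(5)]
q_new = feedback_query(q, relevant)       # query for the remaining 980 retrievals
```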
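The decomposed search from item 4 above can likewise be sketched in a few lines. This is a single-process stand-in: in the setting the paper describes, each shard would live on a separate machine or chip, and only each subset's best scores and document numbers would travel over the network. All names here are hypothetical, and dot-product similarity (larger is closer) stands in for "closest distance".

```python
import heapq
import numpy as np

def search_shard(query_vec, shard_vecs, shard_doc_ids, k):
    """One worker: score its subset of document context vectors
    (about 300 multiplications and additions per document) and
    return the k best (score, doc_id) pairs from its subset."""
    scores = shard_vecs @ query_vec
    best = np.argsort(-scores)[:k]
    return [(float(scores[i]), shard_doc_ids[i]) for i in best]

def distributed_search(query_vec, shards, k=1000):
    """Merge step: the overall closest documents are the best k
    among the candidates returned by all workers."""
    candidates = []
    for shard_vecs, shard_doc_ids in shards:
        candidates.extend(search_shard(query_vec, shard_vecs, shard_doc_ids, k))
    return heapq.nlargest(k, candidates, key=lambda pair: pair[0])
```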
3.3 Routing Using Neural Network Learning and Output Mixing

Both routing runs used two types of neural network learning; sketches of both appear after Section 3.3.3.

3.3.1 Stem Weight Learning

The Stem Weight Learning approach uses neural network learning to compute weights for terms in a query; there is one neural network input for every query term. Every judged document for a query provides a training example as follows. Let V_n be the context vector for judged document n. If query term i has context vector V_i, then the ith network input for training example n is given by V_i · V_n, the inner product of V_i and V_n. For training example n, the desired network output is ±1, according to the relevance of document n.

A fast single-cell learning algorithm, the pocket algorithm with ratchet [3, 4], generated weights. (More complex algorithms did not produce better results.) These weights were then used as term weights for normal query processing.

Note that Wong, Yao, et al. [9] previously used a similar approach, applying a variant of perceptron learning to learn weights with the SMART vector space model.

3.3.2 Full Context Vector Learning

This approach is similar to the previous Stem Weight approach, except we compute an entire query context vector rather than weights for stems. Here we use document context vectors directly as inputs to the network; for example, the ith network input for training example n is given by the ith component of V_n. The network weights produced by learning are directly interpreted as a query context vector.

For most topics, this approach was not as good as the previous approach, for two apparent reasons. First, the Stem Weight approach makes use of the original query terms, a valuable piece of user input. Second, the Stem Weight approach uses fewer (5-50) trainable parameters, possibly avoiding a tendency of the Full Context Vector approach (~300 parameters) to overfit the data.

3.3.3 Routing Run #1: Best Candidate

Our first routing run was to take the best candidate from 4 sources: the two neural network approaches, [...]. Terms in the concept section of the topic were used for routing experiments.
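As promised above, here is a minimal sketch of the Stem Weight training setup of Section 3.3.1, assuming NumPy. pocket_ratchet is a simplified rendering of the pocket algorithm with ratchet [3, 4] (it checks full training accuracy after every mistake rather than tracking run lengths), and all other names are hypothetical.

```python
import numpy as np

def stem_weight_inputs(term_vecs, doc_vec):
    """One network input per query term: the inner product of term i's
    context vector with the judged document's context vector."""
    return np.array([tv @ doc_vec for tv in term_vecs])

def pocket_ratchet(X, y, iterations=1000, seed=0):
    """Single-cell perceptron learning with a pocket: the pocket keeps
    the best weights seen so far, and the ratchet swaps them in only
    when a candidate classifies strictly more examples correctly."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    pocket_w, pocket_correct = w.copy(), 0
    for _ in range(iterations):
        i = rng.integers(len(y))
        if np.sign(X[i] @ w) != y[i]:           # mistake-driven update
            w = w + y[i] * X[i]
            correct = int(np.sum(np.sign(X @ w) == y))
            if correct > pocket_correct:        # ratchet: keep strictly better weights
                pocket_w, pocket_correct = w.copy(), correct
    return pocket_w

# Building training data for one topic: one row per judged document,
# one column per query term; targets are +1/-1 by judged relevance.
# X = np.stack([stem_weight_inputs(term_vecs, V_n) for V_n in judged_docs])
# y = np.array(judgments)
# term_weights = pocket_ratchet(X, y)
```

The learned weights then take the place of ordinary term weights in normal query processing, as the section describes.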
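A companion sketch of the Full Context Vector setup of Section 3.3.2, under the same assumptions (NumPy, hypothetical names), with a plain perceptron standing in for the single-cell learner:

```python
import numpy as np

def full_vector_query(doc_vecs, relevance, iterations=1000, seed=0):
    """The network inputs for training example n are the components of
    V_n itself, so the roughly 300 learned weights can be read off
    directly as a query context vector."""
    rng = np.random.default_rng(seed)
    X = np.asarray(doc_vecs, dtype=float)    # one row per judged document
    y = np.asarray(relevance, dtype=float)   # +1 relevant, -1 not relevant
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        i = rng.integers(len(y))
        if np.sign(X[i] @ w) != y[i]:        # mistake-driven update
            w = w + y[i] * X[i]
    return w / np.linalg.norm(w)             # interpret weights as the query vector
```

Note the parameter counts the section compares: 5-50 per-term weights above versus roughly 300 components here, which is the suspected source of overfitting.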