NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Feedback and Mixing Experiments with MatchPlus
S. Gallant, W. Caid, J. Carleton, T. Gutschow, R. Hecht-Nielsen, K. Qing, D. Sudbeck
National Institute of Standards and Technology
D. K. Harman
documents have been retrieved, MatchPlus can revert to context vectors for finding the closest remaining document.
4. MatchPlus requires only about 300 multiplications and additions to search a document. Moreover, it is easy to decompose the search for a corpus of documents with either parallel hardware or, less expensively, several networked conventional machines (or chips). Each machine can search a subset of the document context vectors and return the closest distances and document numbers in its subset. The closest from among the distances returned by all the processors then determines the documents chosen for retrieval (see the sketch below). We also plan to investigate a cluster tree pruning procedure that finds nearest-neighbor document context vectors without having to compute dot products for all document context vectors. This data organization affects retrieval speed, but does not change the order in which documents are retrieved.
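As an illustration of that decomposition, here is a minimal sketch in Python (our own construction with hypothetical names, not MatchPlus itself). Each simulated machine scores its shard of document context vectors by dot product and returns its k best candidates; the merge step keeps the overall k best:

    import numpy as np

    def search_shard(vectors, ids, query, k):
        # One machine's work: a ~300-multiply dot product per document
        # in its subset, returning the k best (score, document id) pairs.
        scores = vectors @ query
        best = np.argsort(-scores)[:k]
        return [(scores[i], ids[i]) for i in best]

    def parallel_search(shards, query, k):
        # Merge step: the overall k closest documents must appear among
        # the k closest reported by each shard, so a local top-k suffices.
        candidates = []
        for vectors, ids in shards:
            candidates.extend(search_shard(vectors, ids, query, k))
        return sorted(candidates, key=lambda c: -c[0])[:k]

    # Example: 10,000 documents with 300-dimensional context vectors,
    # split across 4 simulated machines.
    rng = np.random.default_rng(0)
    docs, ids = rng.standard_normal((10_000, 300)), np.arange(10_000)
    shards = [(docs[i::4], ids[i::4]) for i in range(4)]
    print(parallel_search(shards, rng.standard_normal(300), k=10))

Dot-product similarity (higher is closer) stands in for the "closest distances" above; with unit-length context vectors the two orderings agree.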
3 Experiments and Runs Submitted
The four runs (two ad hoc, two routing) submitted to TREC-2 are described in the following sections.

3.1 Totally Automated

This run used the entire topic as a query, with a weight of 2 applied to the topic section. We also employed a $Match filter that put documents having at least 4 of the concept terms before other documents. Context vectors determined the actual order of the documents, as well as which documents were in the 1000-document submission list (a sketch of this ordering appears below). Note that these ad hoc retrieval runs were totally automated from topic to retrieval list.
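A minimal sketch of that ordering rule (our own rendering; it assumes each document is given as a pair of its term set and its context vector, both hypothetical layouts):

    import numpy as np

    def matchplus_order(docs, query_vec, concept_terms, min_matches=4):
        # $Match filter as described above: documents containing at least
        # min_matches concept terms sort ahead of all others; within each
        # group, the query-document context vector dot product orders them.
        def key(doc):
            terms, vec = doc
            passes = len(concept_terms & terms) >= min_matches
            return (0 if passes else 1, -float(vec @ query_vec))
        return sorted(docs, key=key)[:1000]   # 1000-document submission list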
3.2 Relevance Feedback From the First 20 Retrievals

Here we took the top 20 documents from the previous section, read them to estimate relevance, and then modified the initial query for the remaining 980 retrievals. To change a query we added and normalized all relevant document context vectors from the first 20 retrievals and then added this vector to 0.7 times the original query vector. (The 0.7 factor was obtained from experiments with a different corpus.)
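In other words, if Q is the original query context vector and the V_d are the relevant document context vectors among the first 20 retrievals, the modified query is 0.7 Q plus the normalized sum of the V_d. A minimal sketch, assuming numpy vectors as in the earlier examples and at least one relevant document:

    import numpy as np

    def feedback_query(query_vec, relevant_doc_vecs, alpha=0.7):
        # "Added and normalized" the relevant context vectors, then
        # added the result to alpha times the original query vector.
        summed = np.sum(relevant_doc_vecs, axis=0)
        summed /= np.linalg.norm(summed)
        return alpha * query_vec + summed

This is similar in spirit to Rocchio feedback, except that only relevant documents contribute and the blending factor was tuned on a separate corpus.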
3.3 Routing Using Neural Network Learning and Output Mixing

Both routing runs used two types of neural network learning.
3.3.1 Stem Weight Learning

The Stem Weight Learning approach uses neural network learning to compute weights for terms in a query; there is one neural network input for every query term. (Terms in the concept section of the topic were used for the routing experiments.)
Every judged document for a query provides a training example as follows. Let V_n be the context vector for judged document n. If query term i has context vector V_i, then the ith network input for training example n is given by V_i . V_n, the inner product of V_i and V_n. For training example n, the desired network output is +1 or -1, according to the relevance of document n.
A fast single-cell learning algorithm, the pocket algorithm with ratchet [3, 4], generated the weights (see the sketch below). (More complex algorithms did not produce better results.) These weights were then used as term weights for normal query processing.
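A minimal sketch of this training setup, in our own simplified rendering of the pocket algorithm with ratchet (not HNC's implementation; term_vectors, doc_vectors, and relevance are hypothetical numpy arrays):

    import numpy as np

    def train_stem_weights(term_vectors, doc_vectors, relevance, passes=100):
        # term_vectors: (num_terms, 300) context vectors of the query terms.
        # doc_vectors:  (num_docs, 300) context vectors of judged documents.
        # relevance:    (num_docs,) desired outputs, +1 relevant / -1 not.
        x = doc_vectors @ term_vectors.T          # input (n, i) = V_i . V_n
        x = np.hstack([x, np.ones((len(x), 1))])  # bias input
        w = np.zeros(x.shape[1])                  # current perceptron weights
        pocket, pocket_correct = w.copy(), 0
        rng = np.random.default_rng(0)
        for _ in range(passes * len(x)):
            n = rng.integers(len(x))
            if np.sign(x[n] @ w) != relevance[n]:
                w = w + relevance[n] * x[n]       # perceptron update on error
                # Ratchet: pocket the new weights only if they classify
                # strictly more training examples correctly.
                correct = int(np.sum(np.sign(x @ w) == relevance))
                if correct > pocket_correct:
                    pocket, pocket_correct = w.copy(), correct
        return pocket[:-1]                        # one weight per query term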
Note that Wong, Yao, et al. [9] previously used a similar approach, applying a variant of perceptron learning to learn weights with the SMART vector space model.
3.3.2 Full Context Vector Learning
This approach is similar to the previous Stem Weight approach, except we compute an entire query context vector rather than weights for stems. Here we use document context vectors directly as inputs to the network; for example, the ith network input for training example n is given by the ith component of V_n. The network weights produced by learning are directly interpreted as a query context vector (see the sketch below).

For most topics, this approach was not as good as the previous approach, for two apparent reasons. First, the Stem Weight approach makes use of the original query terms, a valuable piece of user input. Second, the Stem Weight approach uses fewer (5-50) trainable parameters, possibly avoiding a tendency of the Full Context Vector approach (about 300 parameters) to 'overfit the data'.
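In terms of the hypothetical sketch in Section 3.3.1, this variant is the same learning problem with the identity matrix standing in for the term vectors, since each input is then a raw component of V_n:

    # Full Context Vector variant, reusing train_stem_weights from above:
    # with identity "term vectors", input (n, i) is just component i of
    # V_n, so the learned weight vector is itself a query context vector.
    query_context_vector = train_stem_weights(
        np.eye(doc_vectors.shape[1]), doc_vectors, relevance)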
3.3.3 Routing Run #1: Best Candidate

Our first routing run took the best candidate from 4 sources: the two neural network approaches,