SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Query Improvement in INformation Retrieval Using Genetic Algorithms - A Report on the Experiments of the TREC Project
chapter
J. Yang
R. Korfhage
E. Rasmussen
National Institute of Standards and Technology
Donna K. Harman
(3) Effects of the genetic operators - mutation and crossover
Previous examples have shown the overall effect of the GA operations, where the
genetic modification processes on the query term weights have improved retrieval results.
However, in most situations, the function of the two important operators - mutation which
assigns a new weight to a randomly selected term, and crossover which exchanges individual
term weights between two randomly selected pairs - is difficult to display. However, in
experiments on the DOE database we found one example, topic 13, which shows how
mutation and crossover can create a new query individual retrieving, in this example, all of the
relevant documents which are retrieved part by one of their parent and part by the other
parent, together with the other new query individual retrieving none.
Table 12 depicts this situation. The top part of the table shows the term weights of the
query individuals. The left side are the two original query individuals - the parents, and the
right side are the children after mutation and crossover. The rest of the table shows the
documents retrieved by the parent 1, parent 2 and the children. We can see that one of the
children has inherited all of the proper term weights from the parents, and this child retrieved
all of the relevant documents retrieved by both of the parents. Note that although two of the
term weights are changed by the mutation, it does not affect the combination effect because
the difference between the changed weights and the original ones from one of the parents is
not significant. Of cause, the effects of crossover and mutation are not always this clear cut.
(4) Parallel search using multiple query individuals
The genetic algorithm using multiple individuals in the document retrieval process
provides the capability of parallel search with different query individuals searching the
document space simultaneously. Each individual, with the same terms but different weights,
may retrieve different documents which are located in different areas in the document space.
This makes the genetic search distinct from other searching methods, where only one query is
used.
The effects of parallel search can be explained by showing the documents retrieved
from several query individuals. Here we give the results from topic 24. Table 13 displays the
term weights of query individuals in the first generation, and Table 14 shows the relevant
documents retrieved by four different query individuals from Table 13. We can see that for
these four queries, there is no overlap among the relevant documents retrieved. Thus each
query individual emphasizes different concepts by assigning high weights to them. Documents
which are more closely related to those concepts would be retrieved by different query
individuals.
The situation also can be viewed as having the query individuals handle the AND and
OR operation simultaneously, which is impossible for methods using only one query unless
they use the Boolean model. However, many researchers reject the Boolean model because it
is difficult for users to apply the AND/OR operations correctly (e.g., Bookstein, 1985).
46