lations, and the searchers' responses to our questionnaire. Table 1 shows the distribution of the numbers of words and operators per query, and of the time required to construct a query. Table 2 shows the distribution of searchers' attitudes to the topics, each indicated on a scale of one to five, from least to most.

                  Mean   StdDev.   Min.     Max.     N
Operators         9.94     5.72    1.00    44.00   375
Words            19.40    14.63    1.00   145.00   375
Time (minutes)   11.31     7.48    1.00    40.00   367

Table 1. Characteristics of queries for ad hoc and routing topics.

                       Mean   StdDev.   Min.   Max.     N
Familiarity            1.81     1.15    1.00   5.00   388
Ease of construction   2.82     1.11    1.00   5.00   372
Enough information     3.20     1.11    1.00   5.00   322

Table 2. Characterization of topics by searchers, for routing and ad hoc topics.

Our ad hoc questionnaire also included a question on how many years of experience each searcher had in online searching. The mean response was 6.8 years. Unfortunately, we do not have these data for the routing searchers.

We wished to consider whether there were any relationships between the various characteristics of queries and topics and the performance of the queries themselves. For this purpose, we constructed a table in which each separate query formulation (75 x 5 = 375) is associated with performance measures, the characteristics enumerated in Tables 1 and 2, and the three topic categories of broadness, hardness and restriction defined by Harman (this volume). For performance, we considered using one or more of three measures: average of 11-point precision; precision at 100 documents; and R-precision (defined by Harman, this volume). Factor analysis of these three measures showed that a single factor accounts for more than 90% of the variance among them, so that they represent, in effect, a single aspect or factor of performance.
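The three measures are defined in Harman (this volume) rather than here, so the following is only a minimal sketch that assumes the usual definitions: precision at a cutoff, precision at rank R (where R is the number of relevant documents), and the mean of interpolated precision at the 11 recall levels 0.0, 0.1, ..., 1.0. The function names and the example ranking are hypothetical and are not taken from the experiments.

```python
def precision_at(ranked_rel, k):
    """Fraction of the top-k retrieved documents that are relevant.

    ranked_rel is a list of 0/1 relevance flags in rank order; ranks beyond
    the end of the list count as non-relevant.
    """
    return sum(ranked_rel[:k]) / k


def r_precision(ranked_rel, total_relevant):
    """Precision after retrieving as many documents as there are relevant ones."""
    return precision_at(ranked_rel, total_relevant)


def eleven_point_average(ranked_rel, total_relevant):
    """Mean of interpolated precision at recall 0.0, 0.1, ..., 1.0."""
    # Record (recall, precision) after each rank position.
    points = []
    hits = 0
    for rank, rel in enumerate(ranked_rel, start=1):
        hits += rel
        points.append((hits / total_relevant, hits / rank))
    # Interpolated precision at a recall level is the best precision
    # observed at that recall level or any higher one.
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]
    return sum(interpolated) / len(levels)


# Hypothetical ranking: 10 retrieved documents, 3 relevant in the collection.
ranking = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
p_at_5 = precision_at(ranking, 5)
r_prec = r_precision(ranking, 3)
avg_11 = eleven_point_average(ranking, 3)
```

All three values are computed from the same ranked list and reward the same thing, relevant documents near the top, which is consistent with a single factor accounting for most of their variance.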
The average precision was chosen as representative of this factor, and we have used it both in evaluating our retrieval results and in attempting to determine the effect on retrieval performance of the other variables we have considered. Since this variate does not exhibit a normal distribution, logarithmic and logistic transforms were explored. The logistic transform gives the most nearly normal distribution of the transformed score, but we still cannot say that the transformed variable follows a normal distribution.

The results of applying ANOVA to seek a predictor of the average precision, p, are shown in Table 3. No significant relations appear. Because of the range of values assumed by the variables Operators, Words and Time, the relation for these variables was sought using regression analysis. Once again, no significant relations were found, and the scatter plots (not included here) make it clear that there is no trend to be found. Both hardness and broadness are significantly related to performance. The former is expected, since hardness is determined by median average precision; the latter is less obvious.

Analysis of variance for log(p/(1-p))

Independent variable   Significance
Familiarity            0.149
Easiness               0.169
Information            0.907

Table 3. Significance levels of F-tests using ANOVA to seek dependence of the logistically transformed average precision on the searchers' assessments of their query formulation.

The search for relations between average precision and characteristics of the query formulation, whether provided by the searcher or determined from the query text itself, was motivated by the results, discussed below, which show that it is desirable to weight formulations in proportion to their average precision. Thus, if we could find a surrogate for average precision that can be known without evaluating the retrieved documents, it would be possible to approximate the effective combination on the first pass of a retrieval operation. This hope is frustrated at this time.
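The kind of analysis behind Table 3 can be sketched as follows: average precision is passed through the logistic transform log(p/(1-p)), and a one-way ANOVA F statistic is computed across groups defined by a searcher rating. This is an illustrative sketch only. The function names, the clipping constant eps, and the data are hypothetical and are not the values used in the experiments.

```python
import math


def logit(p, eps=1e-6):
    """Logistic transform log(p / (1 - p)).

    p is clipped to [eps, 1 - eps] so that scores of exactly 0 or 1 stay
    finite; the clipping is an assumption of this sketch.
    """
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: average precision grouped by a searcher rating.
by_rating = {
    1: [0.10, 0.25, 0.18],
    2: [0.22, 0.12, 0.30],
    3: [0.15, 0.28, 0.20],
}
groups = [[logit(p) for p in scores] for scores in by_rating.values()]
f_stat, df_b, df_w = one_way_anova_f(groups)
```

The significance level reported in a table such as Table 3 would then be the upper-tail probability of f_stat under the F distribution with (df_b, df_w) degrees of freedom, which a statistics package can supply.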
3.3 Query Combination and Data Fusion Results: Ad hoc Topics

The official results reported to TREC-2 were for the overall performance of each of two treatments for the ad hoc topics, and of one treatment for the routing topics. For those results, we refer the reader to the relevant section of this volume. Here we report on our further investigations of the effect of query combination, and of data fusion, on performance.

Our first investigation in query combination was to see whether combining query formulations has a regular, beneficial effect, as hypothesized. To do this, we generated the five different search groups for the ad hoc topics, as described in section 2.2, and did experimental runs on all single query groups, all 2-way, 3-way and 4-way combinations of queries, and the combination of all 5 query formulations. The results are presented in Table 4, where it is evident that the average performance increases monotonically as more evidence is added. The increase is strict and significant, as shown in Table 4a, where we display the number of times that each combination level performed better than each other level. We note that the data fusion results are not significantly better than any but 1-way combination (that is, average performance for single queries), but also that data fusion performance is not significantly different from unweighted 5-way combination