SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Natural Language Processing in Large-Scale Text Retrieval Tasks
chapter
T. Strzalkowski
National Institute of Standards and Technology
Donna K. Harman
APPENDIX A: EXAMPLE QUERY
We show TREC topic 057 and a part of the resulting
search query, with only the top-ranked terms shown
in Table A.1. Please note that we rank terms by their
idf scores even though their actual scores are idf *
weight. It is worth pointing out, however, that the
NIST system uses idf scores to decide whether a term
falls below a preset "significance" threshold. We show
only the fields used for query generation.
<top>
<num> Number: 057
<title> Topic: MCI
<desc> Description:
Document will discuss how MCI has been doing since the Bell
System breakup.
<narr> Narrative:
A relevant document will discuss the financial health of MCI
Communications Corp. since the breakup of the Bell System (AT&T
and the seven regional Baby Bells) in January 1984. The status
indicated may not necessarily be a direct or indirect result of the
breakup of the system and ensuing regulation and deregulation of
Ma Bell or of the restrictions placed upon the seven Bells; it may
result from any number of factors, such as advances in
telecommunications technology, MCI initiative, etc. MCI's financial
health may be reported directly: a broad statement about its
earnings or cash flow, or a report containing financial data such as a
quarterly report; or it may be reflected by one or more of the
following: credit ratings, share of customers, volume growth, cuts in
capital spending, figure net loss, pre-tax charge, analysts' or
MCI's own forecast about how well they will be doing, or MCI's
response to price cuts that AT&T makes at its own initiative or
under orders from the Federal Communications Commission
(FCC), such as price reductions, layoffs of employees out of a
perceived need to cut costs, etc. Daily OTC trading stock market and
monthly short interest reports are NOT relevant; the inventory
must be longer term, at least quarterly.
</top>
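The term ordering described at the start of this appendix (rank by idf, score by idf * weight, drop terms below a "significance" threshold) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the log(N/n) form of idf, the function names, the document counts, the weights, and the threshold value are hypothetical and are not taken from the system described in this paper or from the NIST system.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    # Assumed idf form: log(N / n). The paper does not give its exact formula.
    return math.log(num_docs / doc_freq)


def rank_query_terms(terms, num_docs, threshold):
    """Rank query terms by idf and attach their idf * weight scores.

    terms: dict mapping term -> (document frequency, term weight).
    Terms whose idf falls below `threshold` are dropped, mimicking the
    "significance" cut-off that the text attributes to the NIST system.
    Returns a list of (term, idf, score) sorted by idf, highest first.
    """
    rows = []
    for term, (doc_freq, weight) in terms.items():
        term_idf = idf(num_docs, doc_freq)
        if term_idf < threshold:
            continue  # below the significance threshold
        rows.append((term, term_idf, term_idf * weight))
    # Display order uses idf alone, even though the score is idf * weight.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical collection statistics and weights, for illustration only.
example_terms = {
    "mci": (10, 1.0),
    "breakup": (50, 0.5),
    "the": (900, 1.0),
}
ranked = rank_query_terms(example_terms, num_docs=1000, threshold=1.0)
for term, term_idf, score in ranked:
    print(f"{term:10s} idf={term_idf:.3f} score={score:.3f}")
```

With these made-up numbers the very frequent term "the" falls below the threshold and is dropped, while the remaining terms are listed in idf order regardless of their weights.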