NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)

Natural Language Processing in Large-Scale Text Retrieval Tasks
T. Strzalkowski
Editor: Donna K. Harman, National Institute of Standards and Technology

APPENDIX A: EXAMPLE QUERY

We show TREC topic 057 and part of the resulting search query, with only the top-ranked terms shown, in Table A.1. Note that we rank terms by their idf scores, even though their actual scores are idf * weight. It is worth pointing out, however, that the NIST system uses idf scores to decide whether a term falls below a preset "significance" threshold. We show only the fields used for query generation.

<top>
<num> Number: 057
<title> Topic: MCI
<desc> Description: Document will discuss how MCI has been doing since the Bell System breakup.
<narr> Narrative: A relevant document will discuss the financial health of MCI Communications Corp. since the breakup of the Bell System (AT&T and the seven regional Baby Bells) in January 1984.
The status indicated may not necessarily be a direct or indirect result of the breakup of the system and ensuing regulation and deregulation of Ma Bell or of the restrictions placed upon the seven Bells; it may result from any number of factors, such as advances in telecommunications technology, MCI initiative, etc. MCI's financial health may be reported directly: a broad statement about its earnings or cash flow, or a report containing financial data such as a quarterly report; or it may be reflected by one or more of the following: credit ratings, share of customers, volume growth, cuts in capital spending, figure net loss, pre-tax charge, analysts' or MCI's own forecast about how well they will be doing, or MCI's response to price cuts that AT&T makes at its own initiative or under orders from the Federal Communications Commission (FCC), such as price reductions, layoffs of employees out of a perceived need to cut costs, etc. Daily OTC trading stock market and monthly short interest reports are NOT relevant; the inventory must be longer term, at least quarterly.
</top>
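
The ranking convention described at the start of this appendix (terms ordered by idf, scored by idf * weight, and dropped when idf falls below a "significance" threshold) can be sketched as follows. This is a minimal illustration, not the code of either system: the log2(N/df) form of idf, the function names, the threshold value, and all document frequencies and weights are assumptions made up for the example.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency as log2(N / df); one common form, assumed here."""
    return math.log2(num_docs / doc_freq)


def rank_query_terms(term_stats, num_docs, threshold):
    """Return (term, idf, score) tuples ordered by idf, highest first.

    term_stats maps term -> (doc_freq, weight). Terms whose idf is below
    `threshold` are dropped. The score is idf * weight, but it is
    deliberately not used for ordering.
    """
    rows = []
    for term, (doc_freq, weight) in term_stats.items():
        term_idf = idf(num_docs, doc_freq)
        if term_idf < threshold:
            continue  # below the hypothetical significance threshold
        rows.append((term, term_idf, term_idf * weight))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical collection size, document frequencies, and weights.
NUM_DOCS = 1024
TERM_STATS = {
    "mci": (4, 1.0),
    "breakup": (16, 2.0),
    "financial": (256, 1.0),
    "document": (512, 1.0),
}

ranked = rank_query_terms(TERM_STATS, NUM_DOCS, threshold=1.5)
for term, term_idf, score in ranked:
    print(f"{term:10s} idf={term_idf:.1f} score={score:.1f}")
```

With these made-up numbers, "breakup" has a higher idf * weight score than "mci" but is still listed second, because the listing order depends on idf alone; "document" is dropped because its idf is below the threshold.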