NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
National Institute of Standards and Technology, Donna K. Harman (ed.)
Retrieval Experiments with a Large Collection using PIRCS
K. Kwok, L. Papadopoulos, K. Kwan

      h. use of manual labor
         (1) mostly manually built using special interface
         (2) mostly machine built with manual correction
         (3) initial core manually built to "bootstrap" for completely machine-built completion
         (4) other (describe): SEARCH FOR WSJ TERMINOLOGY IN LIBRARY AND FROM TOPICS.
   2. externally-built auxiliary file: NONE
      a. type of file (Treebank, WordNet, etc.)
      b. total amount of storage (megabytes)
      c. total number of concepts represented
      d. type of representation (frames, semantic nets, rules, etc.)

II. Query construction (please fill out a section for each query construction method used)
   A. Automatically built queries (ad-hoc)
      1. topic fields used: <TITLE>, <DESC>, <NARR>, <CON>
      2. total computer time to build query (cpu seconds): [illegible] S (AVERAGE FOR EACH QUERY)
      3. which of the following were used?
         a. term weighting with weights based on terms in topics: YES + OTHER WEIGHTS
         b. phrase extraction from topics: NO
         c. syntactic parsing of topics: NO
         d. word sense disambiguation: NO
         e. proper noun identification algorithm: NO
         f. tokenizer (recognizes dates, phone numbers, common patterns): NO
            (1) which patterns are tokenized?
         g. heuristic associations to add terms: NO
         h. expansion of queries using previously-constructed data structure (from part I): YES
            (1) which structure? WORD-PAIR PHRASE FILE
         i. automatic addition of Boolean connectors or proximity operators: NO
         j. other (describe): NONE
   B. Manually constructed queries (ad-hoc)
      1. topic fields used: <TITLE>, <DESC>, <NARR>, <CON>
      2. average time to build query (minutes): 300 minutes for 25 queries
      3. type of query builder
         a. domain expert: NO
         b. computer system expert: YES
      4. tools used to build query
         a. word frequency list: SOMETIMES
         b. knowledge base browser (knowledge base described in part I): NO
            (1) which structure from part I
         c. other lexical tools (identify): NO
      5. which of the following were used?
         a. term weighting: YES
         b. Boolean connectors (AND, OR, NOT): YES
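The form records only that the automatic ad-hoc queries were weighted by topic terms and expanded with a word-pair phrase file; it gives neither the file format nor the weighting formula. The sketch below is a minimal, hypothetical illustration of that kind of expansion, not the PIRCS code. `PHRASE_FILE`, `build_query`, and the use of raw within-topic frequency as the weight are all assumptions made for illustration.

```python
from collections import Counter

# HYPOTHETICAL stand-in for the "word-pair phrase file": a set of adjacent
# word pairs assumed to have been extracted from the collection beforehand.
PHRASE_FILE = {("insider", "trading"), ("interest", "rate")}


def build_query(topic_text, phrases=PHRASE_FILE):
    """Return a term -> weight map for one topic (illustrative only).

    Single words are weighted by their raw frequency in the topic text
    (an assumed, simple scheme; no stop-word removal or stemming).
    Each adjacent word pair found in `phrases` is added as an extra
    phrase term, weighted by how often that pair occurs in the topic.
    """
    # Lowercase and strip surrounding punctuation; drop empty tokens.
    tokens = [
        t for t in (w.strip('.,;:()"').lower() for w in topic_text.split()) if t
    ]
    weights = Counter(tokens)
    # Expansion step: look up every adjacent pair in the phrase file.
    for pair in zip(tokens, tokens[1:]):
        if pair in phrases:
            weights[" ".join(pair)] += 1
    return dict(weights)
```

For example, `build_query("Insider trading cases and insider trading law.")` adds the phrase term `insider trading` with weight 2 alongside the single-word counts; a real system would combine such phrase terms with collection-based weights rather than raw counts.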