Text REtrieval Conference (TREC)
System Description

Organization Name: Microsoft/City University Run ID: plt8f1
Section 1.0 System Summary and Timing
Section 1.1 System Information
Hardware Model Used for TREC Experiment: 16 node PII cluster
System Use: SHARED
Total Amount of Hard Disk Storage: 144 Gb
Total Amount of RAM: 6,144 MB
Clock Rate of CPU: 300 MHz
Section 1.2 System Comparisons
Amount of developmental "Software Engineering": ALL
List of features that are not present in the system, but would have been beneficial to have:
List of features that are present in the system, and impacted its performance, but are not detailed within this form: Parallelism was used.
Section 2.0 Construction of Indices, Knowledge Bases, and Other Data Structures
Length of the stopword list: 450 words
Type of Stemming: LOVINS
Controlled Vocabulary: NO
Term weighting: YES
  • Additional Comments on term weighting: Okapi BM25
Phrase discovery: NO
  • Kind of phrase:
  • Method used: OTHER
Type of Spelling Correction: NONE
Manually-Indexed Terms: NO
Proper Noun Identification: NO
Syntactic Parsing: NO
Tokenizer: NO
Word Sense Disambiguation: NO
Other technique: NO
Additional comments: A simple SGML/HTML parser was used.
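The "simple SGML/HTML parser" mentioned above is not specified further; as a hedged illustration only (not the run's actual code), tag stripping of this kind can be done in a few lines:

```python
import re

# Illustrative sketch of a minimal SGML/HTML tag stripper: markup is
# discarded and whitespace collapsed before tokenization and indexing.
TAG_RE = re.compile(r"<[^>]+>")

def strip_sgml(text: str) -> str:
    """Remove SGML/HTML tags and collapse the remaining whitespace."""
    return " ".join(TAG_RE.sub(" ", text).split())
```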
Section 3.0 Statistics on Data Structures Built from TREC Text
Section 3.1 First Data Structure
Structure Type: INVERTED INDEX
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 0.038 Gb
Total computer time to build: 0.085 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: NO
Only single terms used: YES
Concepts (vs. single terms) represented: NO
  • Number of concepts represented:
Type of representation: simple stemmed keyword
Auxiliary files used: NO
  • Type of auxiliary files used:
Additional comments: This is the training collection used.
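The structure described in this section is a document-level inverted index (single stemmed terms, no positions). A hedged sketch of the core mapping, with stopping and stemming omitted for brevity:

```python
from collections import defaultdict

# Illustrative document-level inverted index: each term maps to the
# sorted list of ids of the documents containing it. No term positions
# are stored, matching the first data structure described above.
def build_inverted_index(docs):
    """docs: dict of doc_id -> list of terms. Returns term -> sorted doc ids."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}
```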
Section 3.2 Second Data Structure
Structure Type: INVERTED INDEX
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 0.085 Gb
Total computer time to build: 0.18 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: YES
Only single terms used: YES
Concepts (vs. single terms) represented: NO
  • Number of concepts represented:
Type of representation: simple stemmed keyword
Auxiliary files used: NO
  • Type of auxiliary files used:
Additional comments: This is the test collection used.
Section 3.3 Third Data Structure
Structure Type:
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: Gb
Total computer time to build: hours
Automatic process:
Manual hours required: hours
Type of manual labor: NONE
Term positions used:
Only single terms used:
Concepts (vs. single terms) represented:
  • Number of concepts represented:
Type of representation:
Auxiliary files used:
  • Type of auxiliary files used:
Additional comments:
Section 4.0 Data Built from Sources Other than the Input Text
Internally-built Auxiliary File

File type: NONE
Domain type: DOMAIN INDEPENDENT
Total Storage: Gb
Number of Concepts Represented: concepts
Type of representation: NONE
Automatic or Manual:
  • Total Time to Build: hours
  • Total Time to Modify (if already built): hours
Type of Manual Labor used: NONE
Additional comments:
Externally-built Auxiliary File

File is: NONE
Total Storage: Gb
Number of Concepts Represented: concepts
Type of representation: NONE
Additional comments:
Section 5.0 Computer Searching
Average computer time to search (per query): 6.7 CPU seconds
Times broken down by component(s):
Section 5.1 Searching Methods
Vector space model: NO
Probabilistic model: YES
Cluster searching: NO
N-gram matching: NO
Boolean matching: NO
Fuzzy logic: NO
Free text scanning: NO
Neural networks: NO
Conceptual graphic matching: NO
Other: NO
Additional comments: The timings are the average query optimisation times.
Section 5.2 Factors in Ranking
Term frequency: YES
Inverse document frequency: YES
Other term weights: NO
Semantic closeness: NO
Position in document: NO
Syntactic clues: NO
Proximity of terms: NO
Information theoretic weights: NO
Document length: YES
Percentage of query terms which match: NO
N-gram frequency: NO
Word specificity: NO
Word sense frequency: NO
Cluster distance: NO
Other: NO
Additional comments:
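The three factors marked YES above (term frequency, inverse document frequency, and document length) are the components combined by the Okapi BM25 weight named in Section 2.0. A hedged sketch of one common BM25 formulation follows; the parameter values k1 and b are illustrative defaults, not the run's actual settings:

```python
import math

def bm25_score(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Okapi BM25 weight for one query term in one document.

    tf: term frequency in the document; df: number of documents
    containing the term; doc_len / avg_doc_len: document length and
    the collection average; n_docs: collection size. k1 and b are the
    usual tuning constants (values here are illustrative only).
    """
    # Inverse document frequency component.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Document-length normalization of the term-frequency component.
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)
```

A document's score for a query is the sum of this weight over the query terms it contains.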
Send questions to trec@nist.gov

Disclaimer: Contents of this online document are not necessarily the official views of, nor endorsed by the U.S. Government, the Department of Commerce, or NIST.
