Text REtrieval Conference (TREC)
System Description

Organization Name: Microsoft/City University Run ID: plt8f1
Section 1.0 System Summary and Timing
Section 1.1 System Information
Hardware Model Used for TREC Experiment: 16 node PII cluster
System Use: SHARED
Total Amount of Hard Disk Storage: 144 Gb
Total Amount of RAM: 6,144 MB
Clock Rate of CPU: 300 MHz
Section 1.2 System Comparisons
Amount of developmental "Software Engineering": ALL
List of features that are not present in the system, but would have been beneficial to have:
List of features that are present in the system, and impacted its performance, but are not detailed within this form: Parallelism was used.
Section 2.0 Construction of Indices, Knowledge Bases, and Other Data Structures
Length of the stopword list: 450 words
Type of Stemming: LOVINS
Controlled Vocabulary: NO
Term weighting: YES
  • Additional Comments on term weighting: Okapi BM25
Phrase discovery: NO
  • Kind of phrase:
  • Method used: OTHER
Type of Spelling Correction: NONE
Manually-Indexed Terms: NO
Proper Noun Identification: NO
Syntactic Parsing: NO
Tokenizer: NO
Word Sense Disambiguation: NO
Other technique: NO
Additional comments: A simple SGML/HTML parser was used.
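The "simple SGML/HTML parser" mentioned above is not specified further; as a hedged illustration only (not the run's actual code), tag stripping of this kind can be done in a few lines:

```python
import re

# Illustrative sketch of a minimal SGML/HTML tag stripper: markup is
# discarded and whitespace collapsed before tokenization and indexing.
TAG_RE = re.compile(r"<[^>]+>")

def strip_sgml(text: str) -> str:
    """Remove SGML/HTML tags and collapse the remaining whitespace."""
    return " ".join(TAG_RE.sub(" ", text).split())
```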
Section 3.0 Statistics on Data Structures Built from TREC Text
Section 3.1 First Data Structure
Structure Type: INVERTED INDEX
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 0.038 Gb
Total computer time to build: 0.085 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: NO
Only single terms used: YES
Concepts (vs. single terms) represented: NO
  • Number of concepts represented:
Type of representation: simple stemmed keyword
Auxiliary files used: NO
  • Type of auxiliary files used:
Additional comments: This is the training collection used.
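The structure described in this section is a document-level inverted index (single stemmed terms, no positions). A hedged sketch of the core mapping, with stopping and stemming omitted for brevity:

```python
from collections import defaultdict

# Illustrative document-level inverted index: each term maps to the
# sorted list of ids of the documents containing it. No term positions
# are stored, matching the first data structure described above.
def build_inverted_index(docs):
    """docs: dict of doc_id -> list of terms. Returns term -> sorted doc ids."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}
```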
Section 3.2 Second Data Structure
Structure Type: INVERTED INDEX
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: 0.085 Gb
Total computer time to build: 0.18 hours
Automatic process: YES
Manual hours required: hours
Type of manual labor: NONE
Term positions used: YES
Only single terms used: YES
Concepts (vs. single terms) represented: NO
  • Number of concepts represented:
Type of representation: simple stemmed keyword
Auxiliary files used: NO
  • Type of auxiliary files used:
Additional comments: This is the test collection used.
Section 3.3 Third Data Structure
Structure Type:
Type of other data structure used:
Brief description of method using other data structure:
Total storage used: Gb
Total computer time to build: hours
Automatic process:
Manual hours required: hours
Type of manual labor: NONE
Term positions used:
Only single terms used:
Concepts (vs. single terms) represented:
  • Number of concepts represented:
Type of representation:
Auxiliary files used:
  • Type of auxiliary files used:
Additional comments:
Section 4.0 Data Built from Sources Other than the Input Text
Internally-built Auxiliary File

File type: NONE
Domain type: DOMAIN INDEPENDENT
Total Storage: Gb
Number of Concepts Represented: concepts
Type of representation: NONE
Automatic or Manual:
  • Total Time to Build: hours
  • Total Time to Modify (if already built): hours
Type of Manual Labor used: NONE
Additional comments:
Externally-built Auxiliary File

File is: NONE
Total Storage: Gb
Number of Concepts Represented: concepts
Type of representation: NONE
Additional comments:
Section 5.0 Computer Searching
Average computer time to search (per query): 6.7 CPU seconds
Times broken down by component(s):
Section 5.1 Searching Methods
Vector space model: NO
Probabilistic model: YES
Cluster searching: NO
N-gram matching: NO
Boolean matching: NO
Fuzzy logic: NO
Free text scanning: NO
Neural networks: NO
Conceptual graphic matching: NO
Other: NO
Additional comments: The timings are the average query optimisation times.
Section 5.2 Factors in Ranking
Term frequency: YES
Inverse document frequency: YES
Other term weights: NO
Semantic closeness: NO
Position in document: NO
Syntactic clues: NO
Proximity of terms: NO
Information theoretic weights: NO
Document length: YES
Percentage of query terms which match: NO
N-gram frequency: NO
Word specificity: NO
Word sense frequency: NO
Cluster distance: NO
Other: NO
Additional comments:
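The three factors marked YES above (term frequency, inverse document frequency, and document length) are the components combined by the Okapi BM25 weight named in Section 2.0. A hedged sketch of one common BM25 formulation follows; the parameter values k1 and b are illustrative defaults, not the run's actual settings:

```python
import math

def bm25_score(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Okapi BM25 weight for one query term in one document.

    tf: term frequency in the document; df: number of documents
    containing the term; doc_len / avg_doc_len: document length and
    the collection average; n_docs: collection size. k1 and b are the
    usual tuning constants (values here are illustrative only).
    """
    # Inverse document frequency component.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Document-length normalization of the term-frequency component.
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + norm)
```

A document's score for a query is the sum of this weight over the query terms it contains.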
Send questions to trec@nist.gov

Disclaimer: Contents of this online document are not necessarily the official views of, nor endorsed by the U.S. Government, the Department of Commerce, or NIST.
