System Summary and Timing
  Organization Name: Rank Xerox Research Centre (RXRC)
  List of Run IDs: base.xerox, simple.xerox, join.xerox, join-short.xerox,
             cmwe.xerox, cjoin.xerox (NLP); xerox-spS, xerox-spP, xerox-spT, 
             xerox-spD (SPANISH/SP)

  Construction of Indices, Knowledge Bases, and other Data Structures 

    Methods Used to build Data Structures 

    - Length (in words) of the stopword list: 619 (NLP); stopwords selected
      by part of speech (SP)
    - Controlled Vocabulary?: no  
    - Stemming Algorithm: morphology            
      - Morphological Analysis: yes, inflectional (English/Spanish) 
    - Term Weighting: sqrt(tf)*idf/sqrt(doc-length)
    - Phrase Discovery?: yes, some runs
      - Kind of Phrase: adjacent pairs, syntactic pairs (NLP), adjacent noun
        pairs (SP)
      - Method Used (statistical, syntactic, other): syntactic, statistical
    - Syntactic Parsing?: yes, some runs
    - Word Sense Disambiguation?: no
    - Heuristic Associations (including short definition)?: no
    - Spelling Checking (with manual correction)?: no
    - Spelling Correction?: no
    - Proper Noun Identification Algorithm?: no
    - Tokenizer?: standard SMART tokenizer
    -  Manually-Indexed Terms?: no 
    -  Other Techniques for building Data Structures: no 
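
    The term-weighting formula listed above, sqrt(tf)*idf/sqrt(doc-length),
    can be sketched as follows. This is only an illustration: the form does
    not define idf or doc-length, so the sketch assumes idf = log(N/df) and
    doc-length = number of tokens in the document. The function name
    term_weights and its parameters are hypothetical.

```python
import math
from collections import Counter


def term_weights(doc_tokens, doc_freq, num_docs):
    """Weight each term as sqrt(tf) * idf / sqrt(doc_length).

    Assumptions (not stated in the form): idf = log(num_docs / df),
    and doc_length = number of tokens in the document.
    """
    doc_length = len(doc_tokens)
    if doc_length == 0:
        return {}
    tf = Counter(doc_tokens)
    weights = {}
    for term, freq in tf.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term has no collection statistics; skip it
        idf = math.log(num_docs / df)
        weights[term] = math.sqrt(freq) * idf / math.sqrt(doc_length)
    return weights
```

    Taking the square root of tf damps the effect of repeated terms, and
    dividing by sqrt(doc-length) keeps long documents from dominating.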

    Statistics on Data Structures built from TREC Text

    - Inverted index           
      - Run ID: base.xerox 
      - Total Storage (in MB): 90 
      - Total Computer Time to Build (in hours): 0.13 (real) 
      - Automatic Process? (If not, number of manual hours): yes  
      - Use of Term Positions?: no 
      - Only Single Terms Used?: yes 
    - Inverted index           
      - Run ID: simple.xerox  
      - Total Storage (in MB): 125  
      - Total Computer Time to Build (in hours): 0.3 (real) 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: no, stems and phrases 
    - Inverted index           
      - Run ID: join.xerox 
      - Total Storage (in MB): 126 
      - Total Computer Time to Build (in hours): 0.3  
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: no, stems and syntactic pairs 
    - Inverted index           
      - Run ID: join-short.xerox 
      - Total Storage (in MB): 126 
      - Total Computer Time to Build (in hours): 0.3 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: no, stems, phrases and syntactic pairs 
    - Inverted index           
      - Run ID:  cmwe.xerox 
      - Total Storage (in MB): 126 
      - Total Computer Time to Build (in hours): 0.3 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: no, stems and syntactic pairs 
    - Inverted index           
      - Run ID:  cjoin.xerox 
      - Total Storage (in MB): 126 
      - Total Computer Time to Build (in hours): 0.3 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: no, stems, phrases and syntactic pairs 
    - Inverted index           
      - Run ID: xerox-spS 
      - Total Storage (in MB): ??? 
      - Total Computer Time to Build (in hours): ??? 
      - Automatic Process? (If not, number of manual hours): ??? 
      - Use of Term Positions?: no  
      - Only Single Terms Used?: yes  
    - Inverted index           
      - Run ID: xerox-spP, xerox-spT, xerox-spD
      - Total Storage (in MB): 191 
      - Total Computer Time to Build (in hours): 0.5 
      - Automatic Process? (If not, number of manual hours): yes 
      - Use of Term Positions?: no 
      - Only Single Terms Used?: no, stems and phrases 
    - Clusters           
    - N-grams, Suffix arrays, Signature Files           
    - Knowledge Bases            
      - Use of Manual Labor                  
    - Special Routing Structures           
    - Other Data Structures built from TREC text           

  Query construction

    Automatically Built Queries (Ad-Hoc)

    - Topic Fields Used: title, desc, narr (not in join-short.xerox, 
      xerox-spD) 
    - Average Computer Time to Build Query (in cpu seconds): 9 
    - Method used in Query Construction          
      - Term Weighting (weights based on terms in topics)?: yes
      - Phrase Extraction from Topics?: yes, some runs
      - Syntactic Parsing of Topics?: yes, some runs
      - Word Sense Disambiguation?: no
      - Proper Noun Identification Algorithm?: no
      - Tokenizer?: SMART's standard tokenizer
      - Heuristic Associations to Add Terms?: no 
      - Expansion of Queries using Previously-Constructed Data Structure?: 
        ??? (tagging???)              
      - Automatic Addition of Boolean Connectors or Proximity Operators?: no 
      - Other: no 
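
    As a rough illustration of the simplest phrase type used in some runs
    (adjacent pairs), the sketch below extracts adjacent word pairs from a
    tokenized topic. It assumes that a stopword breaks adjacency and that a
    pair is stored as a single term joined by an underscore; both choices,
    and the name adjacent_pairs, are hypothetical and not taken from the
    form. The syntactic pairs of the NLP runs require a parser and are not
    covered here.

```python
def adjacent_pairs(tokens, stopwords):
    """Return adjacent-word pairs in which neither word is a stopword."""
    pairs = []
    for left, right in zip(tokens, tokens[1:]):
        if left in stopwords or right in stopwords:
            continue  # assumed: a stopword breaks adjacency
        pairs.append(f"{left}_{right}")
    return pairs
```

    For example, the tokens "information retrieval of spanish text" with
    "of" as a stopword yield the pairs information_retrieval and
    spanish_text.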

    Manually Constructed Queries (Ad-Hoc)

    - Topic Fields Used: title, desc, narr 
    - Average Time to Build Query (in Minutes): 10 
    - Type of Query Builder          
      - Domain Expert: no 
      - Computer System Expert: yes 
    - Tools used to Build Query          
      - Word Frequency List?:  no 
      - Knowledge Base Browser?: no                
      - Other Lexical Tools?: no               
    - Method used in Query Construction          
      - Term Weighting?: yes 
      - Boolean Connectors (AND, OR, NOT)?:  no 
      - Proximity Operators?:  no 
      - Addition of Terms not Included in Topic?: no              
      - Other: no 

  Searching

    Search Times

      - Run ID: averaged over all runs 
      - Computer Time to Search (Average per Query, in CPU seconds): 2

    Machine Searching Methods

      - Vector Space Model?:  yes 

    Factors in Ranking

      - Term Frequency?:  yes 
      - Inverse Document Frequency?: yes 
      - Other Term Weights?: no 
      - Semantic Closeness?: no 
      - Position in Document?: no 
      - Syntactic Clues?: no 
      - Proximity of Terms?: no 
      - Information Theoretic Weights?: no 
      - Document Length?: yes 
      - Percentage of Query Terms which match?: 1.09-1.80  (NLP) 
      - N-gram Frequency?: no 
      - Word Specificity?:  no 
      - Word Sense Frequency?: no 
      - Cluster Distance?: no 
      - Other: no 
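
    The ranking factors above (vector space model with term frequency,
    inverse document frequency and document length) can be illustrated
    with a minimal inner-product ranking over an inverted index. This is
    a sketch under stated assumptions, not the SMART implementation: the
    posting layout, the function name rank and the tie-breaking by
    document ID are hypothetical.

```python
from collections import defaultdict


def rank(query_weights, inverted_index, top_k=10):
    """Score documents by the inner product of query and document weights.

    inverted_index maps term -> list of (doc_id, weight) postings. The
    stored weight is assumed to already combine tf, idf and the
    document-length normalization.
    """
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in inverted_index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    # Highest score first; ties broken by document ID for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]
```

    Only the postings of terms that occur in the query are visited, so
    term positions are not needed, which is consistent with the indices
    described earlier.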

    Machine Information

    - Machine Type for TREC Experiment: SPARC Ultra I 
    - Was the Machine Dedicated or Shared: shared 
    - Amount of Hard Disk Storage (in MB): 9000 
    - Amount of RAM (in MB): 132 
    - Clock Rate of CPU (in MHz): 167

    System Comparisons 

    - Given appropriate resources            
      - Could your system run faster?: yes 
    - Features the System is Missing that would be beneficial:  Boolean 
      matching, proximity information

    Significant Areas of System

    - Brief Description of features in your system which you feel impact the 
      system and are not answered by above questions : ???