System Summary and Timing

Organization Name: MDS at RMIT
List of Run ID's: MDS001 MDS002 MDS003

Construction of Indices, Knowledge Bases, and other Data Structures

Methods Used to build Data Structures
- Length (in words) of the stopword list: none
- Controlled Vocabulary?: no
- Stemming Algorithm: Lovins
- Morphological Analysis: no
- Term Weighting: IDF
- Phrase Discovery?: no
- Syntactic Parsing?: no
- Word Sense Disambiguation?: no
- Heuristic Associations (including short definition)?: no
- Spelling Checking (with manual correction)?: no
- Spelling Correction?: no
- Proper Noun Identification Algorithm?: no
- Tokenizer?:
- Manually-Indexed Terms?: no
- Other Techniques for building Data Structures: no

Statistics on Data Structures built from TREC Text
- Inverted index
  - Run ID: MDS001
  - Total Storage (in MB): Not Applicable
  - Total Computer Time to Build (in hours): Not Applicable
  - Automatic Process? (If not, number of manual hours): Not Applicable
- Search Times
  - Run ID: MDS002
  - Computer Time to Search (Average per Query, in CPU seconds): Because awk scripts are used extensively to run many processes, we do not have CPU times available. Thus elapsed times are given: 75 sec elapsed time.
  - Component Times:
      First search: 20 sec elapsed time
      Term expansion: 15 sec elapsed time
      Second search: 30 sec elapsed time
      Combination: 10 sec elapsed time
    (Estimates due to awk scripts.)
- Search Times
  - Run ID: MDS003
  - Computer Time to Search (Average per Query, in CPU seconds): 75 sec elapsed time
  - Component Times:
      First search: 20 sec elapsed time
      Term expansion: 15 sec elapsed time
      Second search: 30 sec elapsed time
      Combination: 10 sec elapsed time
    (Estimates due to awk scripts.)
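
As a quick check on the search-time figures reported for MDS002 and MDS003, the short sketch below sums the four component estimates and shows each stage's share of the per-query elapsed time. The variable names are illustrative only; the numbers are the elapsed-time estimates given in the form, not CPU times.

```python
# Elapsed-time estimates per query (seconds), as reported for MDS002 and MDS003.
component_times = {
    "First search": 20,
    "Term expansion": 15,
    "Second search": 30,
    "Combination": 10,
}

# The components should add up to the reported 75 sec elapsed time.
total = sum(component_times.values())

# Share of the total elapsed time spent in each stage, in percent.
shares = {name: 100.0 * secs / total for name, secs in component_times.items()}

for name, secs in component_times.items():
    print(f"{name}: {secs} sec ({shares[name]:.1f}% of {total} sec)")
```

On these estimates the two awk-based stages (term expansion and combination) account for 25 of the 75 seconds, one third of the elapsed time per query.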

Factors in Ranking
- Term Frequency?: yes
- Inverse Document Frequency?: yes
- Other Term Weights?: no
- Semantic Closeness?: no
- Position in Document?: no
- Syntactic Clues?: no
- Proximity of Terms?: no
- Information Theoretic Weights?: no
- Document Length?: yes, pivoted
- Percentage of Query Terms which match?: no
- N-gram Frequency?: no
- Word Specificity?: no
- Word Sense Frequency?: no
- Cluster Distance?: no
- Other: no

Machine Information
- Machine Type for TREC Experiment: Sparc 10 (4 cpus)
- Was the Machine Dedicated or Shared: shared
- Amount of Hard Disk Storage (in MB): 20,000
- Amount of RAM (in MB): 245
- Clock Rate of CPU (in MHz): 50

System Comparisons
- Given appropriate resources:
  - Could your system run faster?: Yes. For MDS001, the system was simulated. Exhaustive search was used to compute similarities. We are planning to use the information learnt in these experiments as input to a new version of mg. For MDS002 and MDS003, simple awk scripts were used for finding terms for expansion and for combining. Both of these processes were much slower than they need be.
- Features the System is Missing that would be beneficial: Phrases
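
To illustrate how the ranking factors listed above (term frequency, inverse document frequency, pivoted document length) might combine under the exhaustive search described for MDS001, here is a minimal sketch. It is not the system's actual code: the logarithmic TF and IDF forms, the slope value of 0.2, the use of the average document length as the pivot, and the names `pivoted_norm` and `rank_exhaustive` are all assumptions made for illustration, and the Lovins stemming step is omitted.

```python
import math
from collections import Counter


def pivoted_norm(doc_len, pivot, slope=0.2):
    """Pivoted length normalization: (1 - slope) * pivot + slope * doc_len.

    The slope of 0.2 is an assumed value, not one reported in the form.
    """
    return (1.0 - slope) * pivot + slope * doc_len


def rank_exhaustive(query_terms, docs, slope=0.2):
    """Score every document in turn (no inverted index) and sort by score."""
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    # Assumed pivot: the average document length in terms.
    pivot = sum(len(terms) for terms in docs.values()) / n_docs

    scores = []
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        norm = pivoted_norm(len(terms), pivot, slope)
        score = 0.0
        for q in query_terms:
            if tf[q]:
                idf = math.log(1.0 + n_docs / df[q])  # assumed IDF form
                score += (1.0 + math.log(tf[q])) * idf / norm  # assumed TF form
        scores.append((doc_id, score))

    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy collection; the documents are already tokenized.
docs = {
    "d1": "index build index search".split(),
    "d2": "search query expansion terms query search query".split(),
    "d3": "stemming algorithm".split(),
}
ranking = rank_exhaustive(["query", "expansion"], docs)
```

Because every document is scanned for every query, the cost grows with the size of the collection, which is consistent with the form's remark that the system could run faster given an indexed implementation.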