Source file Subset Definition (SSD) files

By default, the TDT3trk.pl and TDT3det.pl programs score the output of a system over the entire test corpus, as defined by the index file. Optionally, each program accepts a subset definition file, given with the -U SubsetFile option for TDT3trk.pl or the -S SubsetFile option for TDT3det.pl, and uses it to compute performance statistics over arbitrarily defined, independent subsets.

For the topic detection evaluation, sub-setting and scoring are performed after the reference topics have been mapped to the hypothesis topics, so the topic cluster mapping is not conditioned on the subsets. The tracking evaluation script, by contrast, scores each subset independently.

The subset file is a simple SGML file with the following format:

The index file examples directory, doc/example_indexes, contains three subset definition files which were generated by the TDT3BuildIndex.pl script. They are

In the 'set' tag, the 'title' field is currently unused, but the 'heading' field is used to differentiate subsets in the generated report. The 'filename' field of the 'source_file' tag identifies which source files are included in the subset. This value should be the root filename of the processed source file, i.e. the filename with neither a parent directory (e.g. tkntext or asrtext) nor a file type extension (e.g. .tkn or .asr).
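
The root filename rule above can be illustrated with a short Python sketch. It is not part of the TDT3 distribution: the helper name root_filename and the sample paths are hypothetical, and the sketch assumes that paths use forward slashes and that only the final extension is stripped.

```python
from pathlib import PurePosixPath


def root_filename(path: str) -> str:
    """Return the filename without its parent directory or final extension.

    Hypothetical helper for illustration only; it is not provided by the
    TDT3 evaluation scripts.
    """
    # .stem drops every leading directory component and the last suffix.
    return PurePosixPath(path).stem


# Hypothetical source file paths, used only to show the transformation.
examples = [
    "tkntext/19981001_1830_1900_ABC_WNT.tkn",
    "asrtext/19981001_1830_1900_ABC_WNT.asr",
]

for path in examples:
    print(path, "->", root_filename(path))
```

Under these assumptions, both sample paths reduce to the same root filename, which would be the value to place in the 'filename' field.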