For the topic detection evaluation, sub-setting and scoring are performed after the reference topics have been mapped to the hypothesis topics, so the topic cluster mapping is not conditioned on the subsets. The tracking evaluation script, by contrast, scores each subset independently.
The subset file is a simple SGML file with the format:
The index file examples directory, doc/example_indexes, contains three subset definition files which were generated by the TDT3BuildIndex.pl script. They are
In the 'set' tag, the 'title' field is currently unused, but the 'heading' field is used to differentiate subsets in the generated report. The 'filename' field of the 'source_file' tag identifies which source files are included in the subset. This value should be the root filename of the processed source file, i.e. the filename with neither a parent directory (e.g. tkntext or asrtext) nor a file type extension (e.g. .tkn or .asr).
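As an illustration of the root filename rule described above, the following is a minimal Python sketch and not part of the distributed scripts. The helper name root_filename and the path tkntext/example_source.tkn are hypothetical and chosen only for the example. It assumes the root filename can be obtained by dropping the parent directory and then the last file type extension.

```python
import os


def root_filename(path):
    """Return the file name with no parent directory and no file type extension.

    Hypothetical helper: it only illustrates the value expected in the
    'filename' field of a 'source_file' tag.
    """
    base = os.path.basename(path)         # drop the parent directory, e.g. "tkntext/"
    root, _ext = os.path.splitext(base)   # drop the extension, e.g. ".tkn"
    return root


if __name__ == "__main__":
    # Hypothetical path to a processed source file.
    print(root_filename("tkntext/example_source.tkn"))  # prints "example_source"
```

With this rule, the tokenized and ASR versions of the same source file (for example under tkntext and asrtext) reduce to the same 'filename' value.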