For each line in the SEE results tables:

Document selector code (A-J)
    A letter code assigned to the person at NIST (or at another evaluation: TREC or TDT) who chose the documents to be summarized.

Model summarizer code (A-J)
    A letter code assigned to the person who created the model summary.

Assessor code (A-J)
    A letter code assigned to the person who judged the peer summary.

Peer summarizer code (baseline [1-5], manual [A-J], or system submission [6-26])
    A code indicating the source of the peer summary, i.e., the one being judged.
      1-5   indicates a simple automatic algorithm was used; NIST created these
            baselines and their definitions are in the guidelines.
      A-J   indicates a human created the summary - this way we can compare
            manual against manual to see an upper limit on performance.
      6-26  indicates a participating system created the summary.
    As I indicated in the email announcing results, these codes are found in the
    table on the results page. If you want the lines for the summaries you
    submitted, just look at the table for the task you are interested in and at
    all lines with a peer summarizer code equal to your code from the results
    page table.

Count of quality questions with non-0 answers
    Simply the number of the 12 quality questions with scores > 0.

Mean of the quality question scores
    Average of the 12 scores, each with a value 0-3.

    The 12 peer quality questions (Q1-Q12) ask for counts of ERRORS, with the
    answers bucketed as 0 = no errors, 1 = 1-5, 2 = 6-10, 3 = 11 or more. You
    can find these questions by following the link to the manual evaluation
    protocol. No quality questions were asked about task 1 (the very short,
    10-word summaries), since the format of those summaries was unrestricted
    and most of the questions would not have applied.

Fraction of unmarked peer units at least related to the model's subject
    We ask the assessor what fraction of the peer units that did not overlap at
    all in meaning with any model unit were at least related to the subject of
    the model.

Number of peer units
    Number of rough sentences in the peer.

Number of marked peer units
    Number of peer units that the assessor felt expressed at least some of the
    meaning of the model.

Number of unmarked peer units
    Number of peer units that the assessor felt did not express any of the
    meaning of the model.

Number of model units
    The number of roughly elementary discourse units (e.g., clauses) in the
    model.

Mean coverage
    As indicated in the protocol, the assessor judges the coverage by the peer
    summary of each unit in the model. This is the mean of those coverage
    scores.

Median coverage
    Median of the per-model-unit coverage scores.

Sample std of coverage scores
    Sample standard deviation of the per-model-unit coverage scores.
    (A computational sketch of these per-summary statistics follows this list.)

Mean length-adjusted coverage
Median length-adjusted coverage
Sample std of adjusted coverage scores
    These are the analogous statistics, but for a coverage score that emphasizes
    brevity as well as coverage. Details here:
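
Below is a minimal sketch of how the per-summary statistics defined above could
be computed from one SEE results line. The function name, argument names, and
the example values are hypothetical, not part of the official evaluation code;
it simply illustrates the count of non-0 quality answers, the mean quality
score, and the mean/median/sample standard deviation of the per-model-unit
coverage scores.

    from statistics import mean, median, stdev

    def summarize_see_line(quality_scores, coverage_scores):
        """Compute the per-summary statistics described above.

        quality_scores:  the 12 quality-question answers, each 0-3
                         (bucketed error counts).
        coverage_scores: one coverage score per model unit, as judged
                         by the assessor.
        """
        return {
            # Count of quality questions with non-0 answers
            "quality_nonzero_count": sum(1 for s in quality_scores if s > 0),
            # Mean of the quality question scores
            "quality_mean": mean(quality_scores),
            # Per-model-unit coverage statistics
            "coverage_mean": mean(coverage_scores),
            "coverage_median": median(coverage_scores),
            # Sample standard deviation needs at least two model units
            "coverage_sample_std": stdev(coverage_scores)
                                   if len(coverage_scores) > 1 else 0.0,
        }

    # Hypothetical example: 12 quality answers and 5 model-unit coverage scores
    print(summarize_see_line([0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0],
                             [1.0, 0.5, 0.0, 0.75, 0.25]))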