Information Technology Lab, Information Access Division NIST: National Institute of Standards and Technology

  • ACE Phase 1 (2000)
    EDT Guideline Clarifications


    Prepared March 9-10, 2000
    with corrections to 2c and 3 made March 12, 2000
    based on a presentation to the EDT community, March 9, 2000
    A. Meyers and R. Grishman, NYU

    1.  Extents

    It is important to include all modifiers in the extent, even if they are non-restrictive modifiers or modifiers of the modifiers.

    1a. Floated quantifiers

    A floated quantifier is a quantifier, such as "both", "all", or "each", which modifies a noun phrase but appears in a position other than at the beginning of the noun phrase; for example, "Fred and Mary both like parsing."  Floated quantifiers should not be included in the extent of the noun phrase they modify.  Thus in
     
    [He] and [the pope] both want to end the exploitation of man by man.


    the extents are "he" and "the pope".

    1b.  Identification of announcers in broadcast non-sentences

    In the phrase
     
    "[Pamela McCall] in [the [BBC] news room in [London]]"


    the extent of the announcer mention is just "Pamela McCall"; the phrases that follow are not part of the mention of the announcer.  This would also be the case if the text were
     

    "[Pamela McCall], BBC News, [London]"


    Similarly for "Orelon Sidney with your latest headline news weather update", the mention of the announcer is just "Orelon Sidney".
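
    The bracket notation used in these examples can be read mechanically. The following is a minimal sketch, assuming that every bracket pair marks one mention extent and that brackets nest properly; the function name bracket_extents is hypothetical and not part of any ACE tooling.

```python
def bracket_extents(annotated: str) -> list[str]:
    """Return the text of each bracketed extent, in order of its opening bracket.

    Assumption: '[' and ']' are used only as extent markers and are balanced.
    """
    stack = []    # (slot in results, start offset in the bracket-free text)
    plain = []    # characters of the text with brackets stripped
    results = []  # one entry per extent, filled when its ']' is reached
    for ch in annotated:
        if ch == "[":
            results.append(None)
            stack.append((len(results) - 1, len(plain)))
        elif ch == "]":
            if not stack:
                raise ValueError("unbalanced ']'")
            slot, start = stack.pop()
            results[slot] = "".join(plain[start:])
        else:
            plain.append(ch)
    if stack:
        raise ValueError("unbalanced '['")
    return results


extents = bracket_extents("[Pamela McCall] in [the [BBC] news room in [London]]")
```

    Applied to the first example above, the announcer's extent comes out as just "Pamela McCall", while the nested extents ("BBC", "London") are reported separately from the phrase that contains them.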

    2.  GSPs

    2a.  "Arab"

    "... she will discuss this situation with [Arab] leaders." (CNN 599)


    "Arab" is not an adjective associated with a GSP: it is not associated with a specific political unit or set of units.

    2b.  "the French"

    "I'm still working with uh Russians, [the French], the United Nations"


    Words like "the French", "the English", and "the Dutch", when they refer to a group of people, are instances of plural nouns which evoke entities of type person.  Thus, in this example, "the French" evokes an entity of type person with extent "the French" and head "French".  This contrasts with anaphoric usage, such as "You like the German wine, but I like the French."  In such a case, "French" is a proper adjective which evokes a GSP entity.

    2c.  plural GSPs

    A mention of a set of GSPs, in the form of an NP whose head is a plural GSP noun, evokes an entity of type GSP.  Thus the phrase "the southern states" evokes an entity of type GSP.  In contrast, the phrase "the South" evokes an entity of type location.  The difference between these cases is the type of the head of the phrase.

    3.  Named heads of unmarkable phrases

    If a full noun phrase is not markable by EDT rules, and the head of the phrase is a name which, if it stood alone, would evoke an EDT entity, then treat the name as a mention whose extent and head are this name.
     
    "Today's [New York Times]"  (NYT 485)


    In this example, "Today's New York Times" refers to an issue of a newspaper, which is not a reportable entity.  However, the head of the phrase, "New York Times", is a name which is a mention of an entity of type organization.  Therefore we generate a mention of this entity, whose head and extent are "New York Times".
     

    "The next [Jordan]? Not Yet..." (NYT 471)


    Here "Jordan" does refer to a real person, but the full noun phrase means something like "the next person like Jordan" and in the current context is an attribute which is not being positively asserted (of Kobe Bryant).  It is therefore not markable (generic).  However, the head, "Jordan", is markable as a mention of the entity Michael Jordan.
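
    The rule in this section can be summarized as a small fallback procedure. The sketch below is illustrative only: the Mention record, the EDT_TYPES set (limited to the types named in this document), and the function named_head_mention are hypothetical names, and deciding whether a phrase is markable or what type a name would evoke is assumed to be done by the annotator beforehand.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the entity types named in this document.
EDT_TYPES = {"person", "organization", "GSP", "location", "facility"}


@dataclass
class Mention:
    extent: str
    head: str
    entity_type: str


def named_head_mention(phrase_is_markable: bool,
                       head: str,
                       head_name_type: Optional[str]) -> Optional[Mention]:
    """Apply the named-head rule to a noun phrase.

    head_name_type is the entity type the head would evoke if it stood
    alone as a name, or None if the head is not such a name.
    """
    if phrase_is_markable:
        # The rule does not apply; the full phrase is annotated as usual.
        return None
    if head_name_type in EDT_TYPES:
        # Extent and head are both just the name.
        return Mention(extent=head, head=head, entity_type=head_name_type)
    return None


nyt = named_head_mention(False, "New York Times", "organization")
jordan = named_head_mention(False, "Jordan", "person")
```

    In both examples above the full phrase is unmarkable, so the resulting mention has the name alone as both extent and head.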

    4. Periodicals are organizations

    Following the named entity guidelines (NE99, section 5.4.4), all periodicals are to be treated as organizations.  We don't require annotators to know which periodicals correspond to organizations of the same name.
     
    The Archives of Disease in Childhood (CNN 1006)

     
    The New York Times


    Both of these are to be treated as organizations.

    5. Names

    As the guidelines indicate, names are atomic ... names within other names are not to be marked.  It is therefore important to know what constitutes a name, even if the name is not annotated (because it is not one of the EDT types).

    Law cases are not names, so the individual constituents (the plaintiff and defendant) are markable:
     

    [Barry] vs. [the United States] (NYT 485)


    Holiday names are names, so "Valentine" is not markable (as a person entity) in
     

    Valentine's Day (Minicorpus 1 APW 467)


    Speech names, such as "State of the Union" (Minicorpus 1:ea 95), are names, so the constituent "the Union" is not markable.

    Titles are not names, so in "Secretary of [state]" (CNN 599) "state" can be annotated as a reference to an organization (the Department of State).
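
    One way to picture the atomicity rule is as a filter over candidate mentions. The following is a rough sketch, assuming spans are given as (start, end) character offsets with an exclusive end; the function name drop_mentions_inside_names is hypothetical, and identifying which spans are names is left to the annotator.

```python
Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def drop_mentions_inside_names(candidates: list[Span],
                               name_spans: list[Span]) -> list[Span]:
    """Keep only candidate mentions that are not nested inside a name."""

    def strictly_inside(inner: Span, outer: Span) -> bool:
        # inner lies within outer and is not the whole of outer
        return outer[0] <= inner[0] and inner[1] <= outer[1] and inner != outer

    return [c for c in candidates
            if not any(strictly_inside(c, n) for n in name_spans)]


# "Valentine's Day" is a name, so the candidate "Valentine" is dropped.
holiday = "Valentine's Day"
kept_holiday = drop_mentions_inside_names([(0, 9)], [(0, len(holiday))])

# A law case is not a name, so no name span covers it and both parties stay.
case = "Barry vs. the United States"
kept_case = drop_mentions_inside_names([(0, 5), (10, len(case))], [])
```

    The filter only removes mentions nested within a larger name; a candidate that coincides exactly with a name span is left for the usual typing rules.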

    6. Facility vs. organization

    Many entities, such as museums and schools, have characteristics of both facilities and organizations.  In some cases, we explicitly list in the guidelines whether such entities should be typed as facilities or as organizations.
    For other ambiguous entities which are not explicitly listed, we provide the following general guidance:  if the class of facility/organization evokes in the listener an image of a building (e.g., a hospital or a hotel), then classify the entity as a facility.

    7.  TV Programs

    A TV network or station is an organization, but a TV program is not. If a TV network name is embedded in a program name (ABC News, CBS Nightly News), it is not markable since names are indivisible.

    Although some program names may also be names of organizations, this may be difficult for the annotator to ascertain, so we uniformly consider TV program names not markable.

    8.  Organization/person

    If a text does not make clear if an entity is an organization or a set of people (without formal association), mark the entity as a person.  In all three cases below, the bracketed item is to be marked as a person entity.
     
    "[Animal advocates] _ with a big assist from the pet food lobby _  say these problems should be dealt with on a case-by-case basis."  (NYT 486)
    "[Authorities] have just used water on the fire for the first time  and they say there was a risk the impact of the water on four hundred tons of burning ammonium nitrate, might have caused an explosion." (mini-1 CNN 3)
    "[Authorities] said the new casualties included wounded people who later died in hospital and bodies of victims taken from the scene before being accounted for." (mini-1 APW 484)

     

    9. "We/Us/Our" in news broadcast articles

    "We/Us/Our" in news broadcast articles are to be marked as mentions of entities of type person.  They are considered to refer to the staff of the News show, and not to be coreferential with the broadcasting organization (ABC, CBS).

        "Just before [we] leave you, a brief review of the day's two major stories."  (ABC 1676)

        "A couple of minutes ago, [we] reported that the big tobacco companies were in Congress today, pushing hard for a national settlement of all the lawsuits against them." (ea 809)

        "Finally from [us] this evening, [our] regular Monday report" (ea 1564)
     

    10. "The World"

    "The world" is frequently used to represent the entire population of the planet or all the governments, etc.   It is markable as a mention of an entity of type location.

        "[The entire world] will see images of the Pope in Cuba." (ea 877)
     

    11.  Generic vs. specific

    11a. Indefinite NPs that are reported as part of concrete events should be interpreted as specific (nongeneric), even in cases of plural NPs of nonspecific number. Consider

       "[Animal advocates] _ with a big assist from the pet food lobby _
       say these problems should be dealt with on a case-by-case basis."
       (NYT 486)

    If you believe that an actual set of animal advocates is taking this position, then it is nongeneric. If you believe that the reporters are making a generalization about animal advocates around the country or throughout the entire world, then it should be interpreted as generic.  If the text read "said" (instead of "say"), it would probably be referring to a specific event and so would be a nongeneric NP.

    11b. If there is a literal interpretation of a sentence with an NP referring to a specific set, the NP is markable, even if there is an implied reading in which the NP is generic (does not refer to a specific set).  For example, in
     

    "All the musicians that I've known are drunks."


    the speaker is making a statement about a finite set (all the musicians that I've ever known). The audience may be expected to generalize this set to all musicians generically, but the literal reading is specific, so this mention is reportable.

    11c.  If there is a literal interpretation of a sentence with an NP referring to a generic set, the NP is generic (non-markable), even if there is an implied coreference of this (overtly generic) NP with another entity mentioned in the discourse.  Thus in
     

    Fiercely aggressive representation ... cannot be an excuse for smearing [a lawyer] (NYT 463)


    "a lawyer" is to be treated as generic (non-markable), even though there is an implied reference to a specific lawyer mentioned earlier in this discourse.

    Page Created: September 6, 2007
    Last Updated: November 4, 2008
