PROPOSAL NO.: 2000-08

DATE: May 10, 2000
REVISED:

NAME: Definition of Additional Subfields in field 754 (Added Entry--Taxonomic Identification) in the Bibliographic format

SOURCE: Florida Center for Library Automation

SUMMARY: This paper proposes adding subfields to field 754 to provide different levels of hierarchy to record taxonomic identification. This would be used instead of repeating subfield $a if desired to allow for more flexible searching and display of the data in the field.

KEYWORDS: Field 754; Taxonomic identification; Added Entry--Taxonomic Identification

RELATED:

STATUS/COMMENTS:

05/10/00 - Forwarded to the MARC Advisory Committee for discussion at the July 2000 MARBI meetings.

07/09/00 - Results of MARC Advisory Committee discussion - Rejected.
LC should rewrite the proposal using the structure that is in field 654 (Subject Added Entry--Faceted Topical Terms) because it may be more expandable.

07/27/00 - Results of LC/NLC review - Agreed with the MARBI decisions.


PROPOSAL NO. 2000-08: Definition of Additional Subfields in Field 754

1. BACKGROUND

Field 754 (Added Entry--Taxonomic Identification) contains taxonomic identification information associated with the item described in the record. It has only two subfields defined apart from the standard $6 and $8 subfields. Subfield $a is defined as "taxonomic name/taxonomic hierarchical category" and $2 is for the "source of taxonomic identification". MARC 21 explains that "subfield $a contains a taxonomic name and the taxonomic hierarchical category to which the name belongs. The taxonomic name conforms to the syntax controls of the taxonomic classification system identified in subfield $2. Subfield $a is repeatable for each taxonomic name/taxonomic hierarchy associated with the item. Each combination is input in repeatable $a subfields in taxonomic hierarchical order." The example given is:

754 ##$aPlantae (Kingdom) $aSpermatophyta (Phylum) $aAngiospermae (Class)
$aDicotyledoneae (Subclass) $aRosales (Order) $aRosaceae (Family) $aRosa (Genus)
$asetigera (Species) $atomentosa (Variety). $2[code for Lyman David Benson's Plant Classification]

2. DISCUSSION

Field 754 is used by the State University System of Florida for taxonomic information in Florida Environments Online, a database of resources pertaining to the species and ecology of Florida. It has not been determined whether field 754 is used by anyone else, although the Florida Center for Library Automation is advocating the use of 754 within the Z39.50 Biological Interest Group (Z-Big) when mapping its biological attribute set, commonly known as the Darwin Core, to MARC.

A problem with field 754 is that it uses a flat structure (repeated subfields $a) to represent hierarchical information. Different levels of the hierarchy are supposed to be designated by parenthetical labels within the subfield itself, e.g. "$a Rosa (Genus)".

There are several disadvantages to this approach. The primary problem is that a taxonomic attribute cannot be mapped directly to a simple field/subfield combination. When searching for the genus "Rosa", for example, a search of 754 $a alone is not sufficient, because $a holds names from every level of the hierarchy and a match would not be specific to the genus. One must instead use a complex search requiring that 754 $a contain the term "Rosa" adjacent to the term "Genus", or that the terms "Rosa" and "Genus" both occur within the same occurrence of $a, or some similar construction.
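The kind of compound match described above can be sketched as follows. This is only an illustration: the field is modelled as a plain list of (subfield code, value) pairs taken from the example in section 1, and the function name `rank_matches` is hypothetical, not part of any MARC library or search system.

```python
import re

# Field 754 from the section 1 example, modelled (as an assumption) as a
# list of (subfield code, value) pairs rather than a real MARC record.
FIELD_754 = [
    ("a", "Plantae (Kingdom)"),
    ("a", "Spermatophyta (Phylum)"),
    ("a", "Angiospermae (Class)"),
    ("a", "Dicotyledoneae (Subclass)"),
    ("a", "Rosales (Order)"),
    ("a", "Rosaceae (Family)"),
    ("a", "Rosa (Genus)"),
    ("a", "setigera (Species)"),
    ("a", "tomentosa (Variety)."),
]


def rank_matches(field, term, rank):
    """Return the $a values whose name is `term` AND whose label is `rank`.

    Both conditions must hold within the same occurrence of $a, which is
    why a simple "754 $a contains term" search is not enough.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(term) + r"\s*\(\s*" + re.escape(rank) + r"\s*\)\.?\s*$",
        re.IGNORECASE,
    )
    return [value for code, value in field if code == "a" and pattern.match(value)]


print(rank_matches(FIELD_754, "Rosa", "Genus"))   # the single genus occurrence
print(rank_matches(FIELD_754, "Rosa", "Family"))  # no match: Rosa is not a family here
```

The rank has to be recovered from the text of each subfield, so the search logic depends on the exact form of the parenthetical label.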

A second problem is that labels are embedded within the data, causing a lack of flexibility on display. Some implementations may wish to display the complete taxonomic hierarchy with embedded labels, whereas others, particularly within scientific applications, may want to compress it. Genus and species, for example, are very commonly used together for searching, and in some applications may be the only taxonomic information recorded. A user searching on "Rosa setigera" will probably want data returned in the form "Rosa setigera" rather than "Rosa (Genus) setigera (Species)". While the desired display is possible, it requires parsing the text to strip the parenthetical labels held within it.
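A minimal sketch of the parsing this implies is given below. It assumes each $a value has the shape "name (Label)" with an optional trailing full stop, as in the section 1 example; the names `split_label` and `compact_display` are hypothetical.

```python
import re

# Assumed shape of a labelled $a value: "name (Label)" plus optional full stop.
LABEL = re.compile(r"^\s*(?P<name>.*?)\s*\((?P<rank>[^()]*)\)\s*\.?\s*$")


def split_label(value):
    """Split 'Rosa (Genus)' into ('Rosa', 'Genus'); rank is None if unlabelled."""
    match = LABEL.match(value)
    if match:
        return match.group("name"), match.group("rank").strip()
    return value.strip(), None


def compact_display(values, ranks=("Genus", "Species")):
    """Join the names found at the requested ranks, dropping the labels."""
    by_rank = {}
    for value in values:
        name, rank = split_label(value)
        if rank is not None:
            by_rank[rank.lower()] = name
    return " ".join(by_rank[r.lower()] for r in ranks if r.lower() in by_rank)


values = ["Rosaceae (Family)", "Rosa (Genus)", "setigera (Species)", "tomentosa (Variety)."]
print(compact_display(values))  # Rosa setigera
```

Any variation in how labels are entered (spelling, punctuation, missing parentheses) would have to be handled in the pattern, which is the fragility the paragraph above points to.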

It is proposed that the definition and use of subfield $a remain essentially unchanged. This will obviate the need for retrospective conversion of existing 754 fields, and will preserve an option for implementations that wish to enter data with parenthetical labels. It is further proposed that additional subfields be defined in field 754 to represent discrete levels of the hierarchy for implementations that prefer that approach. These would be used instead of repeated subfields $a with parenthetical labels.

Subfield $b would be reserved for the highest level of taxonomic information (whatever that might be in the taxonomic scheme indicated in $2), $c for the next highest level, and so on. This retains an advantage of the current definition, namely that it does not assume any particular taxonomic classification system. That is, $b is not tied to "Kingdom", $c to "Phylum", etc. A few subfields would be left undefined, allowing for future definition if needed. Subfield $z, defined in some fields for a public note, could also be defined as such in 754; in the Florida implementation it would contain the common name corresponding to the scientific name given in the other subfields.

This would allow simpler mapping to search use attributes and more flexible display, as it is easier to add a constant label than to strip embedded data. In addition, it would allow taxonomic information already created to remain in its present form, while also allowing the same information to be recorded in the following form:

754 $b Plantae $c Spermatophyta $d Angiospermae $e Dicotyledoneae $f Rosales $g Rosaceae
$h Rosa $i setigera $j tomentosa $2[code for Lyman David Benson's Plant Classification]
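With the proposed form, lookup by level and display with labels might look like the sketch below. The list-of-pairs model and the function names are assumptions made for illustration. The `LABELS` table is also hypothetical: it pairs subfield codes with the rank names from the section 1 example for one scheme only, since the proposal does not tie $b to "Kingdom" or $c to "Phylum".

```python
# The proposed form of the example, modelled (as an assumption) as a list
# of (subfield code, value) pairs.
FIELD_754 = [
    ("b", "Plantae"), ("c", "Spermatophyta"), ("d", "Angiospermae"),
    ("e", "Dicotyledoneae"), ("f", "Rosales"), ("g", "Rosaceae"),
    ("h", "Rosa"), ("i", "setigera"), ("j", "tomentosa"),
]

# Hypothetical label table for this one scheme; a different scheme named in
# $2 would need its own table.
LABELS = {
    "b": "Kingdom", "c": "Phylum", "d": "Class", "e": "Subclass",
    "f": "Order", "g": "Family", "h": "Genus", "i": "Species", "j": "Variety",
}


def subfield(field, code):
    """Return the first value of the given subfield, or None if absent."""
    for c, value in field:
        if c == code:
            return value
    return None


def labelled_display(field, labels):
    """Add constant labels at display time instead of storing them in the data."""
    return " ".join(f"{value} ({labels[c]})" for c, value in field if c in labels)


# Compact display: a direct field/subfield lookup, no text parsing needed.
print(subfield(FIELD_754, "h"), subfield(FIELD_754, "i"))  # Rosa setigera

# Full display with labels supplied by the application.
print(labelled_display(FIELD_754, LABELS))
```

Both displays come from the same stored data; only the application-side label table changes.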

3. PROPOSED CHANGES

In field 754 (Added Entry--Taxonomic Identification) in the MARC 21 Bibliographic Format (<> indicates addition; [] indicates omission):


Library of Congress Help Desk (01/26/01)