ABOUT BIOTA

BIOTA stands for Biosystematic Information on Terrestrial Arthropods (Arthropoda exclusive of Crustacea). The BIOTA database is managed and maintained by the Systematic Entomology Laboratory (SEL), Plant Sciences Institute, Agricultural Research Service, United States Department of Agriculture. BIOTA encompasses all of the ongoing and anticipated ADP projects within SEL, with the focus on producing a nomenclatorial database of the arthropods of North America. Although SEL is a primary developer of the nomenclatorial database, the project is a cooperative, community-wide effort. Without the support and contributions of individual specialists and institutions outside of SEL, it cannot succeed.

ORGANIZATION AND COOPERATORS

BIOTA is managed and coordinated by a committee of eight scientists within SEL. This group will help coordinate the gathering and review of data within their respective fields of expertise, either by providing original data or by working with cooperators within their area of taxonomic specialty.

COOPERATORS

Many people within the scientific community have graciously agreed to cooperate in helping to build the BIOTA Nomenclatorial Checklist. Many of these data are accessible through other specialty homepages or published reports. All of these sources are gratefully acknowledged and are cross-referenced as noted in the following examples.

Julian Donahue, Los Angeles, CA (Lepidoptera)
Rosser W. Garrison, Azusa, CA (Odonata)
Patrick McCafferty, Purdue University, West Lafayette, IN (Ephemeroptera)--See also May Fly Central Homepage:
Douglass Miller, Systematic Entomology Laboratory, ARS, USDA, Beltsville, MD (Coccoidea)--See also Scalenet Homepage:
John Morse, Clemson University, Clemson, SC (Trichoptera)--See also Trichoptera Homepage:
Lois B. O'Brien, Florida A & M University, Tallahassee, FL, and Steve Wilson, Central Missouri State University, Warrensburg, MO (Fulgoroidea)
Norman Platnick, American Museum of Natural History, New York, NY (Arachnida)
Randall T. Schuh, American Museum of Natural History, New York, NY (Miridae)
Margaret Thayer, Field Museum, Chicago, IL (Coleoptera)--See also Coleoptera Homepage:
F. Christian Thompson, Systematic Entomology Laboratory, ARS, USDA, Washington, DC, and Neal Evenhuis, B. P. Bishop Museum, Honolulu, HI (editors, Diptera)

HISTORICAL BACKGROUND LEADING TO DEVELOPMENT OF BIOTA

This nomenclatorial database project had its inception with a 1991 meeting of a group of systematists and collection managers called the Entomological Collections Network (ECN). The group met to discuss the feasibility of a check list for North America and made preliminary determinations on the personnel available to contribute to it and the types of information they wished to see in the database. While the project was highly important, the group did not have the resources to undertake such a large project or to maintain the database once it had been created. In July 1993 individuals within the Systematic Entomology Laboratory (SEL) expanded the vision of ECN and began to work on a nomenclatorial database of the terrestrial arthropods of the world. This project is now a Laboratory project, involving all of the personnel within SEL, and has become the core of the BIOTA program.

WHY THE BIOTA DATABASE IS IMPORTANT TO YOU

Entomological systematists and collection managers must treat well over a million described species. Estimates of the total number of species range from 2 to 100 million, but numbers near ten million are commonly used (Hammond 1992). For North America the number of valid species is more than 100,000 (Kosztarab and Schaefer 1990), and the estimated number of names is probably one third more. Whatever the exact numbers may be, insects, centipedes, millipedes, spiders, and mites represent an overwhelming majority of the species of plants and animals alive today. The completed nomenclatorial database will be the first comprehensive list of these taxa since Linnaeus's Systema Naturae, 13th edition (Gmelin 1790-1792).

The most used and useful resources for the collection manager, and for systematists in general, are check lists and catalogs. The number of battered copies of check lists and catalogs on the shelves of curators' and collection managers' offices is a testament to this fact.

The numbers, ecological importance, and economic impact of terrestrial arthropods are of major significance at a time of greatly increased interest in, and concern with, biodiversity and the biological resources of the world. Collections of plant and animal specimens held by various institutions and individuals around the world serve diverse functions: 1) legislation, 2) commerce (labelling of samples), 3) biological control, 4) quarantine, 5) agriculture (programs within U.S. agencies such as the Department of Defense, Department of the Interior, Environmental Protection Agency, Food and Drug Administration, Health and Human Services, State, and the Department of Agriculture), 6) biodiversity, 7) evolution, 8) ecology, 9) biogeography, and 10) natural resources. The creation, curation, and maintenance of these collections depend on accurate and complete authority files of the species, genera, and families of organisms together with their synonyms.

STRATEGY

Much data about scientific names exist. Beyond the name is its history (who created it and who subsequently used it), typification (what is its type? who designated the type? where is the type?), its placement within a classification, and its underlying concept now and in the past (what characters circumscribe the concept and what are their histories). Magnitude is the challenge for entomology: There are millions of kinds of arthropods; can entomologists know them all? The key to biosystematic knowledge is names: All information is indexed by scientific names. Thus, the first step to knowing biodiversity is to have a stable and comprehensive naming system. To develop this system, we need data about names. To know terrestrial arthropods, we need information about numerous arthropods. Much of the data about names (typification, circumscription, detailed taxonomic placement, history) is useful only to the systematist, who must make decisions about names and their underlying concepts. Users of names depend on systematists to know these data and to make correct decisions based on them. Users usually want to know only the correct (valid) name and the basic classification of the name (class, order, family, and genus). So initially, we will concentrate on providing the basic data that users need most.

DATA COVERAGE

The database is constructed in FileMaker Pro, version 3.0. This database format was chosen for its simplicity and ease of use. Databases constructed in FileMaker Pro are easy to manage and easy to convert and export to other kinds of databases. Because BIOTA will be the information source for the ITIS database, our minimal format will be fully compatible with that system. However, BIOTA will be designed to accommodate much more comprehensive data.

Within a hierarchical framework, the nomenclatorial database consists of four tables: 1) species-group names, 2) biogeographic ranges of valid species-group names, 3) genus-group names, and 4) family-group names.

The Species Table

The records of the species table (Fig. 2) are designed to list every scientific name, available or unavailable, applied to species of terrestrial arthropods. These names include valid senior synonyms, junior synonyms, homonyms, misspellings, unavailable names, misidentifications, and invalid combinations. Each record contains the following fields (each is explained as necessary):

ORDER: The order to which the name belongs.

FAMILY: The family to which the name belongs.

GENUS: The generic assignment for this name.

SPECIES: The species-group name.

AUTHOR(S): The author(s) of the name.

DATE: The date of publication of the name. The brackets and parentheses conventions are ignored; only the valid date for the species-group name is given.

STATUS: The currently recognized status of a species-group name. The status field appears as a pop-up list in the data entry program and allows the following possibilities:

Valid: This name is the currently recognized senior synonym for a species.
Syn.: This name is a junior synonym.
Homo.: This name is a junior primary or secondary homonym.
Unav.: This name is nomenclatorially unavailable. The name may be infrasubspecific, a nomen nudum, or fail some other criterion of availability under the Code of Zoological Nomenclature.
Missp.: This name is a misspelling of an available name. In cases of identical misspellings, only the first instance of the misspelling is listed in the database.
Misid.: A misidentification of a specimen through the incorrect application of a species-group name. Only misidentifications that have been published and are included in widely cited or important works will be included, e.g., misidentifications in revisions, identification manuals, and non-systematic papers.
Icomb.: This name is an invalid combination, representing an obsolete classification. Some species of great economic or social importance continue to be referred to by names used in older classifications. For example the corn earworm is now called Helicoverpa zea but commonly appears in the literature as Heliothis zea. Only invalid combinations of names of great importance will be captured.
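
The status list above behaves like a closed vocabulary, so it can be sketched as an enumeration. The following is a minimal illustration only, not the FileMaker Pro implementation: the Python names (Status, parse_status, GENUS_STATUSES) are assumptions introduced here, while the abbreviations are those given in the list.

```python
from enum import Enum


class Status(Enum):
    """Status abbreviations as listed for the STATUS pop-up (member names are assumed)."""
    VALID = "Valid"
    SYNONYM = "Syn."
    HOMONYM = "Homo."
    UNAVAILABLE = "Unav."
    MISSPELLING = "Missp."
    MISIDENTIFICATION = "Misid."
    INVALID_COMBINATION = "Icomb."


def parse_status(code: str) -> Status:
    """Return the Status for an abbreviation, ignoring case and surrounding blanks."""
    wanted = code.strip().lower()
    for status in Status:
        if status.value.lower() == wanted:
            return status
    raise ValueError(f"unknown status code: {code!r}")


# The genus table uses the same choices except for the invalid combination.
GENUS_STATUSES = [s for s in Status if s is not Status.INVALID_COMBINATION]

print(parse_status("Syn."))   # Status.SYNONYM
print(len(GENUS_STATUSES))    # 6
```

Rejecting unknown codes at entry time plays the same role as the pop-up list: a record cannot carry a status outside the agreed vocabulary.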

ORIGINAL GENUS: This is the generic name associated with a species-group name at the time of its proposal. For example, the black cutworm Agrotis ipsilon Hufnagel was described as Phalaena ipsilon Hufnagel 1766. The original genus, therefore, is Phalaena. This field is included for the use of those who follow the parenthesis convention for authors' names and for the detection of primary homonymies. The original genus also allows for the reconstruction of the original combination, which can be used as a primary key for valid names in a database representing multiple classifications.
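
The three uses of the original genus named above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's data entry program: the class and function names are hypothetical, and grouping by original genus plus species-group name only flags candidates for primary homonymy that a systematist would still have to confirm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesName:
    genus: str           # GENUS field (current generic assignment)
    species: str         # SPECIES field
    authors: str         # AUTHOR(S) field
    date: int            # DATE field
    original_genus: str  # ORIGINAL GENUS field


def original_combination(name: SpeciesName) -> str:
    """Rebuild the original combination, usable as a classification-independent key."""
    return f"{name.original_genus} {name.species} {name.authors} {name.date}"


def author_citation(name: SpeciesName) -> str:
    """Author and date, in parentheses when the species has left its original genus."""
    citation = f"{name.authors}, {name.date}"
    return f"({citation})" if name.genus != name.original_genus else citation


def primary_homonym_candidates(names):
    """Group proposals sharing original genus and species name but differing in author or date."""
    proposals = defaultdict(set)
    for name in names:
        proposals[(name.original_genus, name.species)].add((name.authors, name.date))
    return {key: sorted(found) for key, found in proposals.items() if len(found) > 1}


cutworm = SpeciesName("Agrotis", "ipsilon", "Hufnagel", 1766, "Phalaena")
print(original_combination(cutworm))  # Phalaena ipsilon Hufnagel 1766
print(author_citation(cutworm))       # (Hufnagel, 1766)
```

Because the original combination never changes when a species is moved between genera, it stays stable as a key even when several classifications are stored side by side.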

VALID GENUS, VALID SPECIES, VALID AUTHORS & VALID DATE: These four fields contain the valid name, author(s), and date for the species-group name. The valid species-group name is given in its original orthography (spelling) unless corrected under the Code of Zoological Nomenclature.

AUTHORITY & TELEPHONE: These two fields contain the name and telephone number (or E-Mail address) of someone who has offered to serve as an authority for the species-group name in the record in case the user of the database needs more information on a particular species.

NA SPECIES: A logical true or false field reflecting the occurrence or non-occurrence of the species in North America north of Mexico. A synonym or some other status name automatically takes the true or false value of the valid name for the species. The range table subdivides the Nearctic into Mexico, conterminous United States, Canada, Alaska, and Greenland.
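
The rule that a non-valid name takes the NA SPECIES value of its valid name can be sketched as a single pass over the records. This is an illustration only: the dictionary keys are hypothetical stand-ins for the fields described above, and the flag value in the sample records is filled in purely for demonstration.

```python
def propagate_na_flag(records):
    """Copy NA SPECIES from each valid name to the names that point to it.

    Each record is a dict with the (hypothetical) keys genus, species, status,
    valid_genus, valid_species and na_species. Records whose valid name is not
    found are returned so that they can be reviewed.
    """
    na_by_valid_name = {
        (r["genus"], r["species"]): r["na_species"]
        for r in records
        if r["status"] == "Valid"
    }
    unresolved = []
    for r in records:
        if r["status"] == "Valid":
            continue
        key = (r["valid_genus"], r["valid_species"])
        if key in na_by_valid_name:
            r["na_species"] = na_by_valid_name[key]
        else:
            unresolved.append(r)
    return unresolved


records = [
    {"genus": "Helicoverpa", "species": "zea", "status": "Valid",
     "valid_genus": "Helicoverpa", "valid_species": "zea", "na_species": True},
    {"genus": "Heliothis", "species": "zea", "status": "Icomb.",
     "valid_genus": "Helicoverpa", "valid_species": "zea", "na_species": None},
]
unresolved = propagate_na_flag(records)
print(records[1]["na_species"], unresolved)  # True []
```

Returning the unresolved records rather than guessing a value keeps the flag tied to a reviewed valid name.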

COMMENT: The comment field is a memo field of no fixed length and serves various purposes. Primarily it lists the source of the name, either a publication or a non-published source. If the source is a publication, such as a catalog or revision, the inclusion of the source gives the user entry into more extensive literature on the species, e.g., biologies, foodplants, distributions, etc. The memo field is also used to include information about misspellings, misidentifications, unavailable names, and homonymies. Although unlimited in length, the source field is not to be used for information outside the scope of a check list. The source memo field appears as a button in Fig. 2 whose selection produces a pop-up editing window.

All other buttons in Fig. 2 represented by words in brackets or parentheses are control buttons for the data entry program.

The Range Table

The Range table gives the geographical range of each valid species by biogeographic region. More detailed distribution information falls outside the scope of a check list project. The exception is that North America is subdivided into Mexico, conterminous United States, Canada, Alaska, and Greenland. Within the data entry program the range table appears as an associated browse table within the species entry screen (Fig. 3) (only part of the table is visible in the figure). It is also accessible as a separate screen in the data entry program (Fig. 4). The fields in the range table are as follows:

GENUS & SPECIES: These two fields contain the currently accepted name for a species and replicate each valid name in the species-group name table.

NEARCTIC, NEOTROPIC, PALEARCTIC, ETHIOPIAN, ORIENTAL, AUSTRALIAN, OCEANIC & ANTARCTIC: The fields represent the various biogeographical regions. Within each field the presence of a species in that region is indicated with a small "x." The field is blank if the species does not occur in that region. The regions generally follow biological boundaries, although there are exceptions caused by practical limitations imposed by the type of geographical data available. For example, when a data source gives only "Mexico," then for practical reasons, that "Mexico" is considered part of the Nearctic region, even though Mexico is partly Neotropical and partly Nearctic. Without further information, it is impossible to determine whether the species recorded as occurring in "Mexico" occurs in the Neotropical or Nearctic parts of the country.

There are two exceptions to the rule of presence or absence of a species in a biogeographical region, the NEARCTIC and OCEANIC fields. The NEARCTIC field can contain five flags: 1) m = Mexico, 2) u = conterminous United States, 3) c = Canada, 4) a = Alaska, and 5) g = Greenland. For example, if a species occurs in the United States, Canada, and Alaska, the letters "uca" are entered in the field. The complete sequence of letters is "mucag." The state of Hawaii is part of the Oceanic biogeographical region.
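
The NEARCTIC flags can be decoded and encoded mechanically. The sketch below assumes the field holds only the five letters given above; the function names are introduced here for illustration and are not part of the data entry program.

```python
# Flag letters and their meanings, in the canonical order "mucag".
NEARCTIC_FLAGS = {
    "m": "Mexico",
    "u": "conterminous United States",
    "c": "Canada",
    "a": "Alaska",
    "g": "Greenland",
}
CANONICAL_ORDER = "mucag"


def decode_nearctic(field: str) -> list[str]:
    """Translate a NEARCTIC field such as "uca" into area names, in canonical order."""
    flags = field.strip()
    unknown = set(flags) - set(NEARCTIC_FLAGS)
    if unknown:
        raise ValueError(f"unexpected NEARCTIC flags: {sorted(unknown)}")
    return [NEARCTIC_FLAGS[f] for f in CANONICAL_ORDER if f in flags]


def encode_nearctic(areas) -> str:
    """Build the flag string for a collection of area names, in canonical order."""
    code_by_area = {area: code for code, area in NEARCTIC_FLAGS.items()}
    codes = {code_by_area[area] for area in areas}  # KeyError on an unknown area
    return "".join(c for c in CANONICAL_ORDER if c in codes)


print(decode_nearctic("uca"))
# ['conterminous United States', 'Canada', 'Alaska']
print(encode_nearctic(["Alaska", "Canada"]))  # ca
```

An empty field decodes to an empty list, which matches the convention that a blank means the species does not occur in the region.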

The Nearctic region has been subdivided for a number of reasons. The first stage of the project is to produce a list of the North American species even though the eventual product will cover the entire world. The National Biological Survey also will need a list developed for the United States, and several Federal agencies need a breakdown of North American range based on legal and political definitions, not biological ones.

The state of Hawaii is in the Oceanic region. Presence of a species in Hawaii is indicated by the letter "h." If a species occurs in the Oceanic region, but not Hawaii, the letter "x" is used. If a species is absent in the Oceanic region, the field is left blank.
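
Putting the ordinary "x" convention together with the NEARCTIC and OCEANIC exceptions, one row of the range table can be checked and summarized as follows. This is a sketch only: it assumes a row is available as a mapping from field name to the text in the field, the function names are hypothetical, and the sample row is invented purely for illustration.

```python
# Region fields in the order given for the range table.
REGIONS = ["NEARCTIC", "NEOTROPIC", "PALEARCTIC", "ETHIOPIAN",
           "ORIENTAL", "AUSTRALIAN", "OCEANIC", "ANTARCTIC"]


def is_allowed(region: str, mark: str) -> bool:
    """Check a field value against the conventions described for its region."""
    if region == "NEARCTIC":
        return set(mark) <= set("mucag")   # any combination of the five flags
    if region == "OCEANIC":
        return mark in ("", "x", "h")      # h = Hawaii, x = elsewhere in the region
    return mark in ("", "x")               # ordinary presence or absence


def regions_present(row: dict) -> list[str]:
    """Return the regions in which the species is recorded, in table order."""
    present = []
    for region in REGIONS:
        mark = row.get(region, "").strip()
        if not is_allowed(region, mark):
            raise ValueError(f"invalid value {mark!r} in {region}")
        if mark:
            present.append(region)
    return present


def occurs_in_hawaii(row: dict) -> bool:
    """True when the OCEANIC field carries the Hawaii flag."""
    return row.get("OCEANIC", "").strip() == "h"


# Invented row, for illustration only.
row = {"NEARCTIC": "uc", "PALEARCTIC": "x", "OCEANIC": "h"}
print(regions_present(row))   # ['NEARCTIC', 'PALEARCTIC', 'OCEANIC']
print(occurs_in_hawaii(row))  # True
```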

The other words in the screen (Fig. 4) represent database control buttons.

The Genus Table

The genus table (Fig. 5) contains all of the genus-group names. The fields in this table closely parallel the species table.

ORDER, FAMILY, GENUS, AUTHOR(S) & DATE: These fields are essentially the same as in the species-group name table.

STATUS: This field has the same structure as the equivalent field in the species-group name table, with the same choices in the pop-up list. The possibility of an invalid combination (Icomb.), of course, is missing.

VALID GENUS, VALID AUTHOR(S) & VALID DATE: The accepted name if the genus-group name of the record is not valid.

COMMENTS: The same field as in the species-group name table.

Buttons in the top right-hand corner of the screen represent database controls.

The Family Table

The family-group name table contains all of the family-group names. Only the basic data elements are captured, in the following fields (Fig. 6):

ORDER: As in the species- and genus-group name tables.

FAMILY, AUTHOR(S) & DATE: The family-group name as originally spelled, its author(s) and the date of publication.

VALID FAMILY, DATE, VALID SUBFAMILY, DATE, VALID TRIBE & DATE: These six fields are contained in a box in Fig. 6 labeled "The Current Classification" and represent the current classification into which the family-group name falls. This particular example is simple: The family-group name Anataelinae was proposed as a subfamily by Burr in 1909 and currently is considered to be a subfamily of the Pygidicranidae. There is no current tribal classification of this family of the earwigs. Therefore the tribal level of the box is empty.

COMMENTS: See under the species- and genus-group name tables.

Data Sources and Previous Work

The knowledge needed to build the checklist is scattered across many outdated published works and in personal research files. Some of the data are summarized in catalogs, a few of which are based on computerized databases. More or less complete taxonomic catalogs exist in some form for most taxa of North American terrestrial arthropods, and many of these are in some kind of electronic storage. Many of them, however, are not generally available, and none is quite complete, but these problems can be solved. The status of our knowledge of the terrestrial arthropod fauna of North America is summarized in two recent works (Danks 1979 for Canada; Kosztarab & Schaefer 1990 for the United States); CONABIO is preparing a similar summary for the Mexican fauna (Llorente, in correspondence). These works list all the major published data sources, which are not cited here.

Management Plan

The organization of the project is three tiered (Fig. 7). The first level is a Steering Committee consisting originally of Ronald W. Hodges, Robert W. Poole [responsible for program and enquiry aspects], and F. Christian Thompson (Systematic Entomology Laboratory), Charles Mitter (University of Maryland), Ronald McGinley (Smithsonian Institution), and Scott Miller (Bishop Museum). They make decisions on database structure, set data entry protocols, and coordinate with the outside contributors to the project. Together they are responsible for the construction of the database, its completion, and its integrity. The Systematic Entomology Laboratory has assumed responsibility for maintaining, updating, and distributing the completed database.

The second level is an Advisory Committee. The members represent diverse interests by scientific study, biological group, and geographic location. They are available to advise the steering committee.

The third level of the organization consists of a network of collaborators. The list continues to grow as the project becomes better established. Responsibilities of collaborators vary. Some will provide parts of the database relevant to groups they work on. Others will coordinate other collaborators for major groups as well as enter data. For example, Margaret Thayer will coordinate the entry of the Coleoptera names for North America.

The management plan incorporates five levels of activity. Advice will be sought from the Advisory Committee and the scientific community through the Entomological Collections Network. Advice will be considered by the Steering Committee and incorporated into the project. The database, standards, and protocols established by the Steering Committee will be implemented by the Production and Scientific Editors. They will oversee the work of the catalogers, handle the data flow to and from the collaborators, and ensure the integrity of the information. Data will be searched for and/or entered by the catalogers, who will make modifications as directed by the editors or in accordance with standards and protocols. Names and associated data ultimately will be provided by systematists, who will certify the final data.

Dissemination

The database will be made available through the Internet. The server will use UNIX as its operating system and will be connected through one of the nodes already established by the Agricultural Research Service of the United States Department of Agriculture at its Beltsville campus. The database will be accessed through Telnet and FTP. The front end of the database will be a series of screens allowing the user to view the data in browse tables filtered to fit the user's needs or through a series of queries. Query results may be viewed on the screen or written to text files that the user may download through FTP. We intend to make the database available in part as soon as a significant block of data is entered and reviewed.

The database also will be distributed on CD-ROM. A machine to master the CD-ROM has been acquired. The first CD-ROM will contain the North American names; later editions will be expanded to cover the world fauna.

References

ASC. 1992. An Information Model for Biological Collections. Report of the Biological Collections Data Standards Workshop, August 18-24, 1992. 99 pp. DRAFT October 1992.

CSIRO. 1993. Australian National Insect Collection. Database field specifications, data dictionary and user guide. 31 pp., CSIRO Division of Entomology, Canberra 25 May 1993.

Danks, H. V. 1979. Canada and its insect fauna. Mem. Entomol. Soc. Canada 108:1-573.

Gmelin, J. F. 1790-1792. Systema Naturae, Editio decima tertia, aucta, reformata. 1(4-6): 1517-3020. Lipsiae.

Hammond, P. 1992. Species inventory. Pp. 17-39. In Groombridge, B. (ed.), Global Biodiversity. Status of the Earth's Living Resources. xix + 585 pp. Chapman & Hall.

IOPI. 1993. World Vascular Plant Checklist. A Case Model of Checklist Data. DRAFT, Version 4.0 04.02.1993.

Kosztarab, M., & Schaefer, C. (eds.) 1990. Systematics of the North American Insects and Arachnids: Status and Needs. Virginia Agric. Exp. Sta., Information Ser. 90-1, 247 pp.

Miller, S. E. 1992. Specimen databases and the lack of standard nomenclature: A proposal for North American Insects. Insect Collection News 7:7-8.

Thompson, F. C. (coordinator). 1990. Automatic Data Processing for Systematic Entomology: Promises and Problems. A Report for the Entomological Collections Network. [48 pp.] Entomological Collections Network, Baton Rouge.