Database Objects: Report on the SWNVF Database

Relational Databases
Primary Tables
Foreign Key Tables

III. DATABASE OBJECTS AND STRUCTURE

The database is a collection of visual (field) and laboratory measurements as well as supporting metadata, taken for samples from geographic points or for depth intervals from field observations or geophysical logs within the SWNVF. Field measurements include such items as color, alteration, and texture. Laboratory measurements include various kinds of chemical analyses, age determinations, X-ray diffraction analyses, petrographic analyses, including optical mineralogic analyses for individual minerals, microprobe analyses, and physical properties such as grain and bulk densities. The database is structured so that subsets of information can be extracted for specific geographic and time (age or stratigraphic) points or ranges. The database is designed to facilitate comprehensive, integrated analyses of all available geologic data from the SWNVF. Most important types of geologic information are included within the initial version of the database or work is in progress to add such information to an updated version. A few types of data important for geologic interpretation are omitted from this database; prominent among such missing information is paleomagnetic data (see, for example Hudson et al., 1994).

Relational Databases

A relational database consists of a set of tables. These tables have a row/column structure in which columns reflect "entities" or data items and associated attribute information, while rows reflect observations of the entities and associated attributes. Entities can be simple, i.e. location (loc_id), sample (sam_id) and split (spl_id); or complex, i.e. split_id/run_no. Attributes provide descriptive and/or auxiliary information which "track" with the entity to create a unique record or observation. Examples of attributes are north and east coordinates, sample type, worker(s), weight % SiO2,or volume % quartz.

If there is a need to extract specific observations from a database table, then the entity "identifier" must be unique, non-null and thus distinguishable from all other entity entries in the table. For example, if the results (attributes) from each of several analytical runs for a split_id need to uniquely identified, the initial entity: split_id, may need to be combined with an attribute: run_no, to create a unique, multi-column entity identifier, defined as a Primary Key. The Primary Key of a table, composed of a single or multiple columns, provides an unambiguous link to auxiliary tables; and is considered as a Foreign Key by the auxiliary tables. For example, the set of attribute information from a single sample may reflect the efforts of several workers (see Figure 9). A spreadsheet-type approach to this case would be either to have columns: worker_#1, worker_#2 ... in a single observation or row of the spreadsheet/table, or to have multiple rows, with the only difference between rows being the name of the worker. The relational database approach would be to define the sam_id as the Primary Key, and drop the worker field/column from the initial table so that all observations maintain a one-to-one relationship between attributes and entities. The worker information, which reflects a one-to-many, worker-to-sample relationship, would be stored in a separate table with columns: sam_id, worker_name. Sam_id would be defined as a foreign key, referencing the sam_id column of the initial sample table. This approach obviates the need for creating multiple, near duplicate columns in a table, or creating rows of mostly-redundant information, yet allows full recovery of all of the data associated with the entity. The process of minimizing redundant data within the database is called normalization.

Table A.

Sam_id	attr #1	attr #2	attr #3	worker #1	worker #2	worker #3

BV-3	12.3	6.4	blue	Smith	Jones	Barney
BV-4	8.2	2.3	grey	Jones	Barney

Table B.

Sam_id	attr #1	attr #2	attr #3	worker

BV-3	12.3	6.4	blue	Smith
BV-3	12.3	6.4	blue	Jones
BV-3	12.3	6.4	blue	Barney
BV-4	8.2	2.3	grey	Jones
BV-4	8.2	2.3	grey	Barney

Table C.

Sam_id	attr #1	attr #2	attr #3

BV-3	12.3	6.4	blue
BV-4	8.2	2.3	grey

Table D.

Sam_id	worker

BV-3	Smith
BV-3	Jones
BV-3	Barney
BV-4	Jones
BV-4	Barney

Figure 9. Comparison of spreadsheet and relational table structures. Spreadsheet Table A has multiple columns for workers. Identification of work performed by Barney would requires a query of three fields. Spreadsheet Table B has a single worker field, but must duplicate other attribute data within each row. Relational Table C stores unique (one-to-one) attribute values for a sam-id, while relational Table D stores multivalued (one-to-many) values for a sam-id (primary key). A query of the worker field of Table D, with a "relate" to Table C by sam-id, could retrieve all attribute data (attr #1, attr #2, attr #3) created by Barney.

Database for the SWNVF

The database for the SWNVF is relational, and is currently loaded in the commercial Oracle relational database management system (RDBMS) product. Data objects and structures (outlined in Figure 10) were created to support full extraction of all information originally contained within nonrelational database tables, ensure extensibility of the data set, establish hierarchical, parent-child relationships, and allow geographically-based data retrieval. The database was "normalized" to reduce storage of redundant data, allow enforcement of quality assurance (integrity) during the database loading stage, and fully exploit relationships among the data. Finally, accelerated storage retrieval has been initiated to a small degree through de-"normalization", which provides repetitive data within two or more tables to avoid frequent linking between the tables.

Figure 10. Entity-relationship diagram for geological data of the SWNVF

Primary Tables

The entity-relationship model for "point" data results in a set of database tables that also incorporate the integral hierarchy at which these data were collected. This hierarchy is reflected through successive one-to-many tables in which:

- a location (x,y coordinate) can have many samples (with different z coordinates),

- a sample can have many splits (i.e. chemical, petrologic),

- a split can have many measurements (i.e. chemical - Na, K, Mg) and,

- measurements can have replicates (although measurements averaged from replicates are reported in the database, except for microprobe analyses where mineral zonation can be a factor).

Database tables that define location, sample, and split (i.e. pa_split, ca_split, ma_split) contain columns that define attributes specific to the location, sample, or split. These tables also provide cumulative supporting information for measurements. Additional descriptive information for each measurement such as uncertainties, units of measurement, etc. apply directly (uniquely) to a particular measurement and are usually contained in tables at the measurement level. At all levels, one-to-many relationships between data fields (entities) within the data relevant at that level are managed by placing these data in separate tables.

The entity-relationship model for data from "geologic intervals" is simpler than that from single points. Geologic intervals normally represent several types of observations for each drill hole, each observation represented by a character string. Measurements of all types are combined within a single large database table; an attribute identifies the measurement type for each geologic interval. The loc_ids that provide drill hole names, together with depths/elevations of the interval boundaries, allow data for geologic intervals to be integrated with data for samples. Because there are no "splits" for geologic intervals, fewer data are required to define attributes.

At present, measurement tables contain metadata (information about the data) as well as measurements. These tables could be split into pairs of "clustered" tables although it may be difficult in some cases to discriminate between data and metadata. For example, measurement units might be viewed as data critical to the measurement, or simply might represent an attribute to be stored in a separate table.

Foreign Key Tables

Foreign Key tables provide data code definitions, subsidiary data code information, acceptable values for fields (to insure data retrievability), and/or groupings of data.

Many of the primary tables contain descriptive data that are abbreviated as codes. Foreign Key tables with the naming convention "attribute"_list can be linked to the primary tables to insure that input data is correct, and to allow expansion of the codes at the time of data retrieval, as shown in Table 13.

texture_code	texture_name

A	partly altered
B	blackened mafic
G	granular
H	shreddy (disaggregated)
M	microlite or groundmass phase
P	prismatic (elongate)
S	shattered
W	wormy (coarse) resorption

Table 13. Portion of the Foreign Key table texture_list

The Foreign Key tables allow compaction of the data in the larger primary tables and can also supply additional subsidiary information about the data item. For example, worker names are generally entered as initials. The database table worker_list contains the worker code and the worker name, organization, city, and state. Other tables, such as oxide_list, contain information to allow mapping and/or data conversion, as shown in Table 14.

oxide_code	oxide_name	element_code	element_name	ox_el_wt_ratio

Ag2O	Silver Oxide	Ag	Silver	1.07416
Al2O3	Alumina	Al	Aluminum	1.88988
As2O3	Arsenic Oxide	As	Arsenic	1.32038

Table 14. Portion of the Foreign Key table oxide_list

Database table ca_measure provides all chemical analyses in the database as their most common oxide, except for elements that generally occur in native form, such as mercury. The last field (ox_el_wt_ratio) in table oxide_list provides the conversion factor from oxide weight to element weight, which allows extraction of chemical analyses in elemental representation.

Hierarchical groupings for codes is accomplished by imbedding a "parent:child" relationship within a Foreign Key table. Within such tables, paired fields containing the same type data represent the "parent" and "child", as illustrated in Table 15.

Column 1	Column 2	Column 3	Column 4
record ID	child	parent	other data fields

1	C	B
2	D	B
3	B	A
4	A

Table 15. Illustration of parent:child relationships within database, implying the hierarchical grouping:

                    A
                     |
                    B
                    / \
                  C D

Use of such a Foreign Key table allows extraction of other data fields based on a "parent", all "children", the children's "children", etc. to some specified level. Thus, in the example above, data fields can be extracted for all records where Column 2 is B, C, or D by extracting records where Column 3 is B, that is, extracting records for B and its children. In this normalized database, parent:child relationships are defined within a Foreign Key table and not by an additional column within a data table. An example within the database is Foreign Key table comp_list, which defines components, mostly minerals, that are analyzed by petrography and by XRD, as shown in Table 16.

comp_code	comp_name	comp_group_code
child		Parent

WW	all minerals	(no parent)
FE	felsic minerals	WW
FS	Feldspar	FE
KF	K-spar	FS
PL	Plagioclase	FS
QZ	Quartz	FE
AO	Anorthoclase	KF
MC	Microcline	KF
OR	Orthoclase	KF
SD	Sanidine	KF

Table 16. Portion of the Foreign Key table comp_list

One could use this table to create a query that would extract measurements for all felsic minerals (FE, FS, KF, PL, QZ, AO, MC, OR, SD) through identification of the parent component (FE). The hierarchy also allows use of felsic mineral abundances in special cases. For example, feldspar is often completely destroyed by hydrothermal alteration, rendering K-spar and plagioclase indistinguishable and thus individually indeterminate. In such cases, total feldspar might be determined for comparison with total feldspar in samples outside the hydrothermally altered area. Another example is to combine individual analyses determined by a petrographer when these are demonstrated to be inaccurate. In many cases optical petrographic analyses are erroneous for K-spar and plagioclase. Combining these into total feldspar retains reliable petrographic information while avoiding a complete loss of information. The hierarchy established in database table comp_list for mafic minerals has particularly useful application in petrographic analysis owing to the great sensitivity of these minerals to alteration.

Overlapping groupings of codes can be accomplished by a many-to-many mapping of codes. The mapping table lith_group_map can be combined with the Foreign Key table lith_list to allow groupings of lithologic information to be extracted from primary tables. Portions of the tables lith_group_map and lith_list shown in Tables 17 and 18 illustrate how a particular lithology can be included within two or more groups. For example, an extraction of information for dikes should include basaltic dikes (lith_code BD in Table 17), but an extraction for basic rocks should also include basaltic dikes. This is accomplished by defining two parents for lith_code BD in Table 18, assuring retrieval from any appropriate parent.

lith_code	lith_name

BA	basic rocks
BD	basaltic dike
BS	basalt
GB	gabbro
ID	dike
IS	shallow intrusives
IN	intrusive

Table 17. Portion of the Foreign Key table lith_list

lith_code	lith_group_code

BD	IN
BD	BA
BS	BA
GB	BA
ID	IN
IS	IN

Table 18. Portion of the Foreign Key table lith_ group_map

Back