III. DATABASE OBJECTS AND STRUCTURE
The database is a collection of visual (field) and laboratory measurements as well as supporting metadata, taken for samples from geographic points or for depth intervals from field observations or geophysical logs within the SWNVF. Field measurements include such items as color, alteration, and texture. Laboratory measurements include various kinds of chemical analyses, age determinations, X-ray diffraction analyses, petrographic analyses, including optical mineralogic analyses for individual minerals, microprobe analyses, and physical properties such as grain and bulk densities. The database is structured so that subsets of information can be extracted for specific geographic and time (age or stratigraphic) points or ranges. The database is designed to facilitate comprehensive, integrated analyses of all available geologic data from the SWNVF. Most important types of geologic information are included within the initial version of the database or work is in progress to add such information to an updated version. A few types of data important for geologic interpretation are omitted from this database; prominent among such missing information is paleomagnetic data (see, for example Hudson et al., 1994).
A relational database consists of a set of tables. These tables have a row/column structure in which columns reflect "entities" or data items and associated attribute information, while rows reflect observations of the entities and associated attributes. Entities can be simple, i.e. location (loc_id), sample (sam_id) and split (spl_id); or complex, i.e. split_id/run_no. Attributes provide descriptive and/or auxiliary information which "track" with the entity to create a unique record or observation. Examples of attributes are north and east coordinates, sample type, worker(s), weight % SiO2, or volume % quartz.
If there is a need to extract specific observations from a
database table, then the entity "identifier" must be unique, non-null and thus
distinguishable from all other entity entries in the table. For example, if
the results (attributes) from each of several analytical runs for a split_id
need to uniquely identified, the initial entity: split_id,
may need to be combined with an attribute: run_no,
to create a unique, multi-column entity identifier, defined as a Primary Key.
The Primary Key of a table, composed of a single or multiple columns, provides
an unambiguous link to auxiliary tables; and is considered as a Foreign Key
by the auxiliary tables. For example, the set of attribute information from
a single sample may reflect the efforts of several workers (see Figure 9). A
spreadsheet-type approach to this case would be either to have columns: worker_#1,
worker_#2 ... in a single observation or row of the spreadsheet/table, or to
have multiple rows, with the only difference between rows being the name of
the worker. The relational database approach would be to define the sam_id
as the Primary Key, and drop the worker field/column from the initial table
so that all observations maintain a one-to-one relationship between attributes
and entities. The worker information, which reflects a one-to-many, worker-to-sample
relationship, would be stored in a separate table with columns: sam_id,
worker_name. Sam_id
would be defined as a foreign key, referencing the sam_id
column of the initial sample table. This approach obviates the need for creating
multiple, near duplicate columns in a table, or creating rows of mostly-redundant
information, yet allows full recovery of all of the data associated with the
entity. The process of minimizing redundant data within the database is called
normalization.
Table A.
Sam_id |
attr #1 |
attr #2 |
attr #3 |
worker #1 |
worker #2 |
worker #3 |
BV-3 |
12.3 |
6.4 |
blue |
Smith |
Jones |
Barney |
BV-4 |
8.2 |
2.3 |
grey |
Jones |
Barney |
Table B.
Sam_id |
attr #1 |
attr #2 |
attr #3 |
worker |
BV-3 |
12.3 |
6.4 |
blue |
Smith |
BV-3 |
12.3 |
6.4 |
blue |
Jones |
BV-3 |
12.3 |
6.4 |
blue |
Barney |
BV-4 |
8.2 |
2.3 |
grey |
Jones |
BV-4 |
8.2 |
2.3 |
grey |
Barney |
Table C.
Sam_id |
attr #1 |
attr #2 |
attr #3 |
BV-3 |
12.3 |
6.4 |
blue |
BV-4 |
8.2 |
2.3 |
grey |
Table D.
Sam_id |
worker |
||
BV-3 |
Smith |
||
BV-3 |
Jones |
||
BV-3 |
Barney |
||
BV-4 |
Jones |
||
BV-4 |
Barney |
Figure 9. Comparison of spreadsheet and relational table structures. Spreadsheet Table A has multiple columns for workers. Identification of work performed by Barney would requires a query of three fields. Spreadsheet Table B has a single worker field, but must duplicate other attribute data within each row. Relational Table C stores unique (one-to-one) attribute values for a sam-id, while relational Table D stores multivalued (one-to-many) values for a sam-id (primary key). A query of the worker field of Table D, with a "relate" to Table C by sam-id, could retrieve all attribute data (attr #1, attr #2, attr #3) created by Barney.
Database for the SWNVF
The database for the SWNVF is relational, and is currently loaded in the commercial Oracle relational database management system (RDBMS) product. Data objects and structures (outlined in Figure 10) were created to support full extraction of all information originally contained within nonrelational database tables, ensure extensibility of the data set, establish hierarchical, parent-child relationships, and allow geographically-based data retrieval. The database was "normalized" to reduce storage of redundant data, allow enforcement of quality assurance (integrity) during the database loading stage, and fully exploit relationships among the data. Finally, accelerated storage retrieval has been initiated to a small degree through de-"normalization", which provides repetitive data within two or more tables to avoid frequent linking between the tables.
Figure 10. Entity-relationship diagram for geological data of the SWNVF
The entity-relationship model for "point" data results in a set of database tables that also incorporate the integral hierarchy at which these data were collected. This hierarchy is reflected through successive one-to-many tables in which:
- a location (x,y coordinate) can have many samples (with different z coordinates),
- a sample can have many splits (i.e. chemical, petrologic),
- a split can have many measurements (i.e. chemical - Na, K, Mg) and,
- measurements can have replicates (although measurements averaged from replicates are reported in the database, except for microprobe analyses where mineral zonation can be a factor).
Database tables that define location, sample, and split (i.e. pa_split, ca_split, ma_split) contain columns that define attributes specific to the location, sample, or split. These tables also provide cumulative supporting information for measurements. Additional descriptive information for each measurement such as uncertainties, units of measurement, etc. apply directly (uniquely) to a particular measurement and are usually contained in tables at the measurement level. At all levels, one-to-many relationships between data fields (entities) within the data relevant at that level are managed by placing these data in separate tables.
The entity-relationship model for data from "geologic intervals" is simpler than that from single points. Geologic intervals normally represent several types of observations for each drill hole, each observation represented by a character string. Measurements of all types are combined within a single large database table; an attribute identifies the measurement type for each geologic interval. The loc_ids that provide drill hole names, together with depths/elevations of the interval boundaries, allow data for geologic intervals to be integrated with data for samples. Because there are no "splits" for geologic intervals, fewer data are required to define attributes.
At present, measurement tables contain metadata (information about the data) as well as measurements. These tables could be split into pairs of "clustered" tables although it may be difficult in some cases to discriminate between data and metadata. For example, measurement units might be viewed as data critical to the measurement, or simply might represent an attribute to be stored in a separate table.
Foreign Key tables provide data code definitions, subsidiary data code information, acceptable values for fields (to insure data retrievability), and/or groupings of data.
Many of the primary tables contain descriptive data that are abbreviated as codes. Foreign Key tables with the naming convention "attribute"_list can be linked to the primary tables to insure that input data is correct, and to allow expansion of the codes at the time of data retrieval, as shown in Table 13.
texture_code |
texture_name |
A |
partly altered |
B |
blackened mafic |
G |
granular |
H |
shreddy (disaggregated) |
M |
microlite or groundmass phase |
P |
prismatic (elongate) |
S |
shattered |
W |
wormy (coarse) resorption |
Table 13. Portion of the Foreign Key table texture_list
The Foreign Key tables allow compaction of the data in the larger primary tables and can also supply additional subsidiary information about the data item. For example, worker names are generally entered as initials. The database table worker_list contains the worker code and the worker name, organization, city, and state. Other tables, such as oxide_list, contain information to allow mapping and/or data conversion, as shown in Table 14.
oxide_code |
oxide_name |
element_code |
element_name |
ox_el_wt_ratio |
Ag2O |
Silver Oxide |
Ag |
Silver |
1.07416 |
Al2O3 |
Alumina |
Al |
Aluminum |
1.88988 |
As2O3 |
Arsenic Oxide |
As |
Arsenic |
1.32038 |
Table 14. Portion of the Foreign Key table oxide_list
Database table ca_measure provides all chemical analyses in the database as their most common oxide, except for elements that generally occur in native form, such as mercury. The last field (ox_el_wt_ratio) in table oxide_list provides the conversion factor from oxide weight to element weight, which allows extraction of chemical analyses in elemental representation.
Hierarchical groupings for codes is accomplished by imbedding a "parent:child" relationship within a Foreign Key table. Within such tables, paired fields containing the same type data represent the "parent" and "child", as illustrated in Table 15.
Column 1 |
Column 2 |
Column 3 |
Column 4 |
record ID |
child |
parent |
other data fields |
1 |
C |
B |
|
2 |
D |
B |
|
3 |
B |
A |
|
4 |
A |
Table 15. Illustration of parent:child relationships within database, implying the hierarchical grouping:
A
|
B
/
\
C
D
Use of such a Foreign Key table allows extraction of other data fields based on a "parent", all "children", the children's "children", etc. to some specified level. Thus, in the example above, data fields can be extracted for all records where Column 2 is B, C, or D by extracting records where Column 3 is B, that is, extracting records for B and its children. In this normalized database, parent:child relationships are defined within a Foreign Key table and not by an additional column within a data table. An example within the database is Foreign Key table comp_list, which defines components, mostly minerals, that are analyzed by petrography and by XRD, as shown in Table 16.
comp_code |
comp_name |
comp_group_code |
child |
Parent |
|
WW |
all minerals |
(no parent) |
FE |
felsic minerals |
WW |
FS |
Feldspar |
FE |
KF |
K-spar |
FS |
PL |
Plagioclase |
FS |
QZ |
Quartz |
FE |
AO |
Anorthoclase |
KF |
MC |
Microcline |
KF |
OR |
Orthoclase |
KF |
SD |
Sanidine |
KF |
Table 16. Portion of the Foreign Key table comp_list
One could use this table to create a query that would extract measurements for all felsic minerals (FE, FS, KF, PL, QZ, AO, MC, OR, SD) through identification of the parent component (FE). The hierarchy also allows use of felsic mineral abundances in special cases. For example, feldspar is often completely destroyed by hydrothermal alteration, rendering K-spar and plagioclase indistinguishable and thus individually indeterminate. In such cases, total feldspar might be determined for comparison with total feldspar in samples outside the hydrothermally altered area. Another example is to combine individual analyses determined by a petrographer when these are demonstrated to be inaccurate. In many cases optical petrographic analyses are erroneous for K-spar and plagioclase. Combining these into total feldspar retains reliable petrographic information while avoiding a complete loss of information. The hierarchy established in database table comp_list for mafic minerals has particularly useful application in petrographic analysis owing to the great sensitivity of these minerals to alteration.
Overlapping groupings of codes can be accomplished by a many-to-many mapping of codes. The mapping table lith_group_map can be combined with the Foreign Key table lith_list to allow groupings of lithologic information to be extracted from primary tables. Portions of the tables lith_group_map and lith_list shown in Tables 17 and 18 illustrate how a particular lithology can be included within two or more groups. For example, an extraction of information for dikes should include basaltic dikes (lith_code BD in Table 17), but an extraction for basic rocks should also include basaltic dikes. This is accomplished by defining two parents for lith_code BD in Table 18, assuring retrieval from any appropriate parent.
lith_code |
lith_name |
BA |
basic rocks |
BD |
basaltic dike |
BS |
basalt |
GB |
gabbro |
ID |
dike |
IS |
shallow intrusives |
IN |
intrusive |
Table 17. Portion of the Foreign Key table lith_list
lith_code |
lith_group_code |
BD |
IN |
BD |
BA |
BS |
BA |
GB |
BA |
ID |
IN |
IS |
IN |
Table 18. Portion of the Foreign Key table lith_ group_map
|
Back |