Providing Object-Oriented Interfaces for Molecular Biology Databases*

Victor M. Markowitz, I-Min A. Chen, Anthony Kosky, and Ernest Szeto

Information and Computing Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720

Numerous repositories for molecular biology data are implemented using commercial relational database management systems (DBMSs). These DBMSs provide reliable facilities for managing data in molecular biology databases (MBDs) such as the Genome Sequence Data Base (GSDB), but they do not provide constructs for directly representing application-specific objects such as sequences and alleles: such objects are usually represented by disconnected tuples scattered among multiple tables. Interacting with MBDs implemented on relational DBMSs can be substantially simplified by providing these MBDs with object-oriented interfaces for examining (browsing and querying) their structure and content, thus insulating users and applications from the underlying DBMSs.

The first step in developing an object-oriented interface for a relational MBD is constructing an object view for the MBD. For most MBDs, such a view cannot be constructed automatically. It is often hard to determine algorithmically which tables represent classes of objects and which tables represent attributes (e.g., set-valued attributes) that cannot be represented as columns in the tables representing classes. Furthermore, the foreign-key information that is essential for determining the relationships between different classes of objects is often missing from relational MBD definitions. Consequently, an interactive procedure is needed for filling in the missing information and/or determining the structure of the desired object view for a relational MBD.
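To illustrate why classes and set-valued attributes are hard to tell apart, the sketch below applies one simple heuristic to a relational schema. It is a hypothetical example, not the rule used by the OPM retrofitting tool: the `Table` record and the `classify` function are invented for illustration, and the heuristic assumes that a two-column table whose primary key spans both columns, one of which is a foreign key, encodes a set-valued attribute of the referenced class.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical description of one relational table."""
    name: str
    columns: list[str]
    primary_key: list[str]
    # column name -> referenced table; frequently missing in real MBD schemas
    foreign_keys: dict[str, str] = field(default_factory=dict)


def classify(tables):
    """Split tables into classes and set-valued attributes (assumed heuristic).

    A two-column table whose primary key covers both columns, exactly one of
    which is a foreign key, becomes a set-valued attribute of the referenced
    class. Every other table becomes a class whose foreign-key columns are
    object-valued attributes and whose other columns are primitive.
    """
    classes = {}
    set_attributes = {}
    for t in tables:
        fk_in_key = [c for c in t.primary_key if c in t.foreign_keys]
        if (len(t.columns) == 2
                and set(t.primary_key) == set(t.columns)
                and len(fk_in_key) == 1):
            owner = t.foreign_keys[fk_in_key[0]]
            value_col = next(c for c in t.columns if c != fk_in_key[0])
            set_attributes.setdefault(owner, []).append((t.name, value_col))
        else:
            classes[t.name] = {
                c: t.foreign_keys.get(c, "primitive") for c in t.columns
            }
    return classes, set_attributes
```

If the foreign key on a table such as a hypothetical `keyword(seq_id, word)` is not declared, the heuristic silently treats that table as a class rather than as a set-valued attribute of `sequence`, which is exactly the kind of gap the interactive procedure is meant to fill.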

We have developed a retrofitting tool for constructing and maintaining object views on top of existing relational MBDs without affecting the structure and content of these MBDs, and therefore without disturbing existing applications based on them. This tool is based on the Object-Protocol Model (OPM), which provides constructs for modeling objects and protocols (laboratory experiments) specific to molecular biology applications.[1] The OPM retrofitting tool generates a canonical OPM view using all the available information on the relational MBD, and provides facilities for interactively refining this view: renaming classes and attributes, changing the value classes of attributes, hiding classes and attributes from the view, defining new derived classes and attributes, grouping simple attributes into tuple attributes, adding and removing classes and attributes, merging and splitting classes, and so on.
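A few of the refinement operations listed above can be sketched as edits to a view that only records how view-level names map back to tables and columns, so the underlying database is never modified. This is a hypothetical illustration, not the OPM tool's interface: the `ObjectView` class, its methods, and the example table and column names are all assumptions, and only renaming and hiding are shown.

```python
class ObjectView:
    """Hypothetical object view layered over unchanged relational tables."""

    def __init__(self, classes):
        # Canonical view: one class per table, one attribute per column.
        # view class -> {view attribute: (underlying table, underlying column)}
        self.classes = {
            cls: {attr: (cls, attr) for attr in attrs}
            for cls, attrs in classes.items()
        }
        self.hidden = set()  # (class, attribute) pairs hidden from the view

    def rename_class(self, old, new):
        self.classes[new] = self.classes.pop(old)
        self.hidden = {(new if c == old else c, a) for c, a in self.hidden}

    def rename_attribute(self, cls, old, new):
        attrs = self.classes[cls]
        attrs[new] = attrs.pop(old)  # the (table, column) source is preserved
        if (cls, old) in self.hidden:
            self.hidden.discard((cls, old))
            self.hidden.add((cls, new))

    def hide_attribute(self, cls, attr):
        self.hidden.add((cls, attr))

    def visible(self, cls):
        """Visible attributes of cls, each with its underlying (table, column)."""
        return {
            a: src for a, src in self.classes[cls].items()
            if (cls, a) not in self.hidden
        }
```

Because every view attribute keeps its `(table, column)` source, a query posed against the refined view can still be translated into terms of the original tables, which is what lets existing relational applications run undisturbed.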

Once an OPM view has been developed for a relational MBD, the object-oriented interface on top of the MBD is provided by the OPM data management tools. The OPM retrofitting tool has been used to construct an OPM view for GSDB 2.2, so that the OPM browsing and querying tools can be used on top of GSDB 2.2.

*Supported by a grant from the Director, Office of Energy Research, Office of Health and Environmental Research, of the U.S. Department of Energy under Contract DE-AC03-76SF00098.

[1] Chen, I.A., and Markowitz, V.M., An Overview of the Object-Protocol Model (OPM) and OPM Data Management Tools, Information Systems, Vol. 20, No. 5 (July 1995), pp. 393-418.


Abstracts scanned from text submitted for January 1996 DOE Human Genome Program Contractor-Grantee Workshop.
