AMASE: An Object-Oriented Metadatabase Catalog for Accessing Multi-Mission Astrophysics Data

Cynthia Cheung
David Leisawitz
Nick Roussopoulos
Stephen Kelley
Jane Wang
Gail Reichert
David Silberberg
NASA/GSFC
Astrophysics Data Facility
Greenbelt, MD
Dept. of Computer Science & UMIACS
University of Maryland
College Park, MD 20742
Hughes STX
Greenbelt, MD

Abstract:

This paper describes the development of AMASE, an astrophysics catalog, which uses object-oriented database (OODB) technology to facilitate access to NASA's multi-mission and multi-spectral astrophysics data holdings. The catalog is an on-line metadatabase consisting of scientific reference pointers to enable easy location of low-level space astrophysics data in the public archives. The metadatabase catalog contains both descriptions of mission-oriented data and published astrophysics catalog data. We use OODB techniques to classify data products with scientific attributes, enabling queries across mission and spectral boundaries of data residing in heterogeneous databases. We discuss the advantages of using an object-relational database (a hybrid of object-oriented and relational database technologies) to mitigate many of the problems that arise in using pure relational technology.

1 Introduction

Astrophysical data analysis demands efficient access to large volumes of complex data. NASA data sets in the data archives are structured with a mission perspective. These data sets are products of space missions and, hence, are stored in the archives according to mission-specific parameters, in disparate mass storage systems and in different formats. This is largely a historical development, reflecting the evolution of computers and mass storage technology in the past 30 years.

Scientists often wish to conduct research that cuts across mission boundaries, and would do so more often were it not so challenging to locate and understand unfamiliar data sets. It is generally advantageous, for example, to use multi-wavelength observations to interpret a class of celestial objects. Yet it is currently necessary to learn the specific structure of each NASA mission data archive before one can determine whether the relevant data exist and then access them. The Internet has provided various means for scientists to access the mission data centers through Web browsers, FTP, Gopher or customized interfaces, but the procedure for locating data granules depends on the services provided by each individual data center. The user has to know a priori which data archive to search. Some mission archives have built interfaces to make the underlying database structure transparent to the users. Nonetheless, the search is conducted one data set at a time and one mission at a time, and the researcher must integrate information gleaned from the various catalogs. The problem is exacerbated if the user is unfamiliar with the data products generated by a particular mission and thus does not even know where to begin. A uniform, multi-mission and multi-spectral user interface to these heterogeneous data holdings is clearly needed.

This paper describes the development of AMASE, an astrophysics metadatabase catalog system using object-oriented database (OODB) technology to facilitate access to NASA's multi-mission and multi-spectral astrophysics data holdings. The AMASE catalog is an on-line metadatabase consisting of scientific reference pointers to enable easy location of low-level space astrophysics data in the public archives. We use OODB techniques to classify data products with scientific attributes, enabling queries across mission and spectral boundaries of data residing in heterogeneous databases. AMASE contains both descriptions of mission-oriented data and published astrophysics catalog data.

Section 2 of this paper discusses the rationale for using OODB technology. Section 3 discusses the schema design. Section 4 describes the query capability and the HTML interface. Finally, the conclusions are in Section 5.

2 Adopted Object-Oriented Database Design Paradigm

Traditionally, relational databases are used to represent the astrophysics catalogs. However, due to the hierarchical nature of many of the data relationships, an object-oriented approach results in a data model which more closely mirrors the structure of the data. The Object-Oriented Database (OODB) paradigm allows greater flexibility in characterizing classes of objects and can provide a ``scientific'' view into the heterogeneous multi-variate catalogs.

The second and perhaps more important reason for choosing OODB is that a significant fraction of the metadata are either in semi-structured form, e.g., documentation in free text, or have complex data types, neither of which is supported by traditional relational systems. For example, data types such as celestial positions, with implicit transformation between different coordinate systems, are required to support spatial queries. Furthermore, many of these complex data types are linked together via arbitrary programs (methods) to form other high-level objects. Such objects cannot be built using SQL statements on normalized relations because they are arbitrary and data driven.

The third compelling reason for the OODB approach is heterogeneity of the astrophysics catalogs and their data. Seemingly similar attributes may have different calibration, different units, and even different interpretation. The OO approach allows us to apply different algorithms depending on the data source as necessary.
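
As a minimal sketch of this idea, written in Python rather than in Illustra, each data source can carry its own conversion method so that callers compare values on one common scale. The class names and the choice of jansky as the common unit are assumptions made for illustration and are not taken from the AMASE schema.

```python
from dataclasses import dataclass


@dataclass
class FluxMeasurement:
    """Base class: a flux density value as recorded by some source catalog."""
    value: float

    def to_jansky(self) -> float:
        raise NotImplementedError("each source defines its own conversion")


@dataclass
class JanskyFlux(FluxMeasurement):
    """Hypothetical source that already reports flux density in Jy."""

    def to_jansky(self) -> float:
        return self.value


@dataclass
class MilliJanskyFlux(FluxMeasurement):
    """Hypothetical source that reports flux density in mJy."""

    def to_jansky(self) -> float:
        return self.value / 1000.0


def brightest(measurements):
    """Pick the brightest measurement from mixed sources on a common scale."""
    return max(measurements, key=lambda m: m.to_jansky())
```

The caller never inspects the source; the method attached to each class selects the appropriate algorithm, which is the behavior the object-oriented approach is meant to provide.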

Despite the limited modeling capabilities of traditional relational database systems, they offer maturity, robustness, and predictable performance. A central tenet of our approach to building AMASE is to accommodate access to existing astrophysics databases. They constitute a huge investment over many years and the information in them is a key NASA asset. It would be impractical to require that these databases be ported to a new database system and their application programs be rewritten for this new database. Since many of these are implemented in relational systems, interoperability of these systems with AMASE is essential. The Object-Relational system provides the right hooks and tools for such interoperation.

For this reason we were compelled to use the hybrid Object-Relational approach, which combines the advantages of both worlds. We selected Illustra as a platform for AMASE. Illustra is a commercial Object-Relational DBMS which offers the advantages of OODB modeling and the ease of writing SQL-like queries. It supports SQL3 OO features such as inheritance, polymorphism, and user-defined data types. One of the most important data types that we found necessary for searching is the spatial data type, together with its underlying spatial R-tree indexing. Finally, Illustra allows access to relational systems, such as Sybase, through datablades, and others will be available soon. It is one of the primary goals of AMASE to interoperate with legacy systems, and Illustra provides an appropriate platform for achieving it.

3 AMASE Database Design

NASA astrophysics data holdings are structured in AMASE from a ``scientific'' perspective. Astrophysicists can search for a class of celestial objects by characterizing their data requirements in scientific parameters, without knowledge of the location of the distributed data holdings, the number of missions, or the interfaces to all the data storage systems. AMASE provides a uniform user interface which supports global multi-mission and multi-spectral searches across heterogeneous and distributed data holdings, just as if they were stored inside a single database.

Two missions in different wavelength bands and with different mission operations philosophies were selected to demonstrate the multi-mission and multi-spectral search capability. ROSAT is an X-ray mission that is still operating, with data gathered from pointed observations by guest investigators. IRAS was an infrared mission that was operated mainly in the scanning mode. The available data products from these missions were reviewed and parameterized for input into AMASE. We decided to store only fundamental astronomical measurements that will help locate low-level data. Derived quantities will be generated dynamically by algorithms stored in the OODB. Data products in the archives are referenced by hypertext pointers. Currently, the pointers bring up the appropriate data request form of the data center and instructions on how to access the particular data set. Eventually, the data retrieval information will be built into the database and made transparent to the user.

We seek to encapsulate the existing data, metadata, associated documentation and analysis software into ``objects'' in our catalog, anticipating the complex requirements in the course of scientific inquiries. The scientific attributes include fundamental astronomical measurements such as name of object, position, coordinates, flux, spectral bandpass, surface brightness, color, velocity, proper motion and redshift, as well as astronomical classification. These attributes are generally available in published astronomical catalogs.

AMASE is designed to capture this relevant information from the astronomical catalogs and use it as criteria to search the mission data archives. The mission data are described using attributes gleaned from the mission timelines, target lists and observation logs that are generated directly from the FITS headers of data files in the archives. These attributes include pointing position, observation time, instrument configuration and name of primary target. Other information such as the instrument field of view, spectral coverage and operation time span is also captured by the search algorithms to augment the search and provide the user with ``supplementary information''.
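
To illustrate how such attributes might be pulled from FITS headers, the following Python sketch parses fixed-format 80-character header cards and keeps only the keywords of interest. It is a simplified reading of the card layout (keyword in columns 1-8, value indicator `= ` in columns 9-10) and does not handle embedded quotes or continuation cards. The keyword selection, the `RA` and `DEC` keyword names, and the example card values are assumptions for illustration, not the keywords used by any particular mission.

```python
def parse_card(card: str):
    """Parse one 80-character FITS header card into (keyword, value).

    Simplified: handles quoted strings, numbers, and T/F logicals, and
    returns None for cards that carry no value (COMMENT, HISTORY, END).
    """
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        return None
    body = card[10:]
    if body.lstrip().startswith("'"):
        # String value: the text between the first pair of single quotes.
        start = body.index("'") + 1
        end = body.index("'", start)
        return keyword, body[start:end].rstrip()
    # Non-string value: drop the optional "/ comment" part.
    text = body.split("/", 1)[0].strip()
    if text in ("T", "F"):
        return keyword, text == "T"
    try:
        return keyword, int(text)
    except ValueError:
        return keyword, float(text)


def observation_attributes(cards, wanted=("OBJECT", "RA", "DEC",
                                          "DATE-OBS", "INSTRUME")):
    """Collect the wanted keywords from a list of header cards."""
    attrs = {}
    for card in cards:
        parsed = parse_card(card.ljust(80))
        if parsed and parsed[0] in wanted:
            attrs[parsed[0]] = parsed[1]
    return attrs


# Illustrative cards only; the values are made up for this example.
EXAMPLE_CARDS = [
    "OBJECT  = 'EXAMPLE TARGET'     / name of primary target",
    "RA      =                182.5 / pointing RA, degrees",
    "DEC     =                 39.4 / pointing Dec, degrees",
    "DATE-OBS= '1991-05-31'         / observation date",
    "INSTRUME= 'EXAMPLE-CAM'        / instrument name",
    "NAXIS   =                    2 / not needed for the catalog",
    "COMMENT cards like this one carry no value",
    "END",
]
```

Calling `observation_attributes(EXAMPLE_CARDS)` yields a dictionary holding only the five descriptive keywords, which is the kind of compact record a metadatabase entry would store in place of the data file itself.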


  
Figure 1: The AMASE schema

Figure 2: The AMASE schema


In AMASE, all the scientific attributes and mission data attributes are linked to an ``Astronomical Object''. An ``Astronomical Object'' can be a single astronomical object, a class of astronomical objects, or an abstract class with well-defined scientific attributes. A hierarchy of astronomical ``classes'' has been defined in the object-oriented database so that a child class inherits all the attributes of the parent class. For example, an active galaxy is characterized by all the attributes of the ``galaxy'' class, plus some additional attributes that are specific to the ``active galaxy'' class, such as Radio Loud and Nuclear Activity. The schema also supports astronomical name aliases. When a user queries for science data, the search criteria are issued against the ``Astronomical Object'' while the query results returned are in terms of mission data descriptors. The mission data descriptors identify the names of the relevant mission data sets, archive location and retrieval information.
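
The inheritance relationship can be sketched in Python as follows. This is an illustration of the idea only, not the AMASE schema: the `redshift` attribute on the galaxy class and all field types are assumptions, while the two active-galaxy attributes follow the example in the text.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class AstronomicalObject:
    name: str
    aliases: List[str] = field(default_factory=list)


@dataclass
class Galaxy(AstronomicalObject):
    redshift: float = 0.0        # hypothetical galaxy-level attribute


@dataclass
class ActiveGalaxy(Galaxy):
    radio_loud: bool = False     # "Radio Loud" in the text
    nuclear_activity: str = ""   # "Nuclear Activity" in the text


def attribute_names(cls):
    """List every attribute of a class, inherited ones first."""
    return [f.name for f in fields(cls)]


def of_class(objects, cls):
    """Select objects belonging to a class or to any of its subclasses."""
    return [obj for obj in objects if isinstance(obj, cls)]
```

A query against the parent class then returns members of the child class as well: `of_class(objs, Galaxy)` includes every `ActiveGalaxy` in `objs`, whereas `of_class(objs, ActiveGalaxy)` excludes ordinary galaxies.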

We also designed a hierarchy of types. We defined position types for multiple coordinate systems including galactic, ecliptic, and equatorial. We modeled the positional uncertainty of multiple instruments and calibration techniques, spectral types, variability, uncertainty, and extinction. Finally, we defined methods specifically to handle positional uncertainty, spectral variability, and flux sensitivity.
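
As an example of the kind of transformation a position type must support, the sketch below converts equatorial coordinates to galactic coordinates using the standard spherical-trigonometry relations. It assumes the J2000 equinox and the commonly quoted J2000 pole constants; the equinox actually used by AMASE is not stated here, and the function name is hypothetical.

```python
import math

# Commonly quoted J2000 constants defining the galactic system (degrees).
RA_NGP = 192.85948   # right ascension of the north galactic pole
DEC_NGP = 27.12825   # declination of the north galactic pole
L_NCP = 122.93192    # galactic longitude of the north celestial pole


def equatorial_to_galactic(ra_deg, dec_deg):
    """Convert J2000 (RA, Dec) in degrees to galactic (l, b) in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(RA_NGP), math.radians(DEC_NGP)

    # Galactic latitude from the angular distance to the galactic pole.
    sin_b = (math.sin(dec) * math.sin(dec0)
             + math.cos(dec) * math.cos(dec0) * math.cos(ra - ra0))
    b = math.asin(max(-1.0, min(1.0, sin_b)))

    # Galactic longitude, measured from the direction of the celestial pole.
    y = math.cos(dec) * math.sin(ra - ra0)
    x = (math.sin(dec) * math.cos(dec0)
         - math.cos(dec) * math.sin(dec0) * math.cos(ra - ra0))
    lon = (L_NCP - math.degrees(math.atan2(y, x))) % 360.0
    return lon, math.degrees(b)
```

Storing one canonical representation and deriving the others on demand keeps the stored positions consistent; the same pattern would apply to ecliptic coordinates.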

We aggregated instrument, position, and wavelength types into classes such as AstronomicalObject and ObservationEntry. Names, positions, and spectral types are extracted from published catalogs and linked to the NASA mission catalog via the class of AstronomicalObject. Another feature of the design captures references to on-line data, mission proposals, targets and timelines.
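
A rough Python sketch of this linkage follows. The two class names come from the text, but their fields, the `Catalog` helper, the name-normalization rule, and the example names and URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObservationEntry:
    """A mission data descriptor: which data set, and where to get it."""
    mission: str
    dataset: str
    archive_url: str


@dataclass
class AstronomicalObject:
    name: str
    aliases: List[str] = field(default_factory=list)
    observations: List[ObservationEntry] = field(default_factory=list)


class Catalog:
    """Resolves any known name or alias to the same object."""

    def __init__(self):
        self._by_name: Dict[str, AstronomicalObject] = {}

    @staticmethod
    def _key(name: str) -> str:
        # Assumed normalization: ignore case and whitespace.
        return "".join(name.split()).upper()

    def add(self, obj: AstronomicalObject) -> None:
        for name in [obj.name, *obj.aliases]:
            self._by_name[self._key(name)] = obj

    def find_observations(self, name: str) -> List[ObservationEntry]:
        obj = self._by_name.get(self._key(name))
        return list(obj.observations) if obj else []


# Placeholder entries; neither the names nor the URL refer to real data.
catalog = Catalog()
catalog.add(AstronomicalObject(
    name="Obj A",
    aliases=["CAT1 0001"],
    observations=[
        ObservationEntry("ROSAT", "example-pointed-obs",
                         "https://archive.example/rosat"),
        ObservationEntry("IRAS", "example-scan",
                         "https://archive.example/iras"),
    ],
))
```

The search is phrased in terms of the astronomical object, while the result is a list of mission data descriptors: `catalog.find_observations("cat1 0001")` returns both the ROSAT and the IRAS entries through the alias.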

The schema of the classes used in AMASE is shown in Figures 1 and 2.

4 The AMASE System Implementation

This section describes the system's query capabilities and its interface. In the prototype, emphasis was primarily given to the basic functionality of the system although we also tried to improve performance using advanced indexing and replication techniques.

AMASE runs on an SGI Indigo-2 workstation, with a 40-GB magneto-optical jukebox and a stand-alone DLT tape drive as secondary storage.

4.1 Advanced Query Capability

In addition to the query capability provided by Illustra, AMASE supports three categories of astrophysics-specific queries: name/alias retrieval, positional search, and astronomical object class search.

Among the OO features of Illustra, we capitalize on its built-in spatial data type and the associated access and manipulation functions and procedures, which facilitate point and region queries. We also use reference types to link object classes, and collection types to support multi-valued attributes, e.g., multiple aliases of an AstronomicalObject. Special-purpose methods are used to retrieve and convert class-specific attribute values.
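
The logic of a positional query with a search radius can be sketched as below. This Python version scans a list linearly, whereas AMASE relies on Illustra's spatial type and R-tree index; the function names and the `(name, ra, dec)` tuple layout are assumptions for illustration.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form), inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(objects, ra, dec, radius_deg):
    """Return (separation, name) pairs within radius_deg, nearest first.

    `objects` is an iterable of (name, ra_deg, dec_deg) tuples.
    """
    hits = []
    for name, obj_ra, obj_dec in objects:
        sep = angular_separation(ra, dec, obj_ra, obj_dec)
        if sep <= radius_deg:
            hits.append((sep, name))
    return sorted(hits)
```

Using the great-circle separation rather than simple coordinate differences matters near the celestial poles, where a small angular distance can span a large range in right ascension. A spatial index serves to avoid computing this separation for every object in the catalog.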

Figures 3, 4 and 5 show some of the queries supported by AMASE as HTML forms, together with the output produced.


  
Figure 3: Name-based Query


  
Figure 4: Spatial Search with a radius


  
Figure 5: Result Screen and Continuation Query Form


4.2 The HTML User Interface

We have designed and implemented an HTML interface to AMASE to provide accessibility to remote users. It consists of a sequence of forms and result pages. The forms include radio buttons, data entry fields, and push buttons grouped into centered tables for presentation. The selection and data entry fields are validated by methods built into the HTML interface, which at the same time take care of coordinate transformation (from simple translation to precession). The forms execute ``cgi''-based code that invokes the underlying queries and generates the HTML-formatted output. Embedded in or associated with displayed objects are links to external catalogs and/or data archives.
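
The validation of positional form fields might look like the following Python sketch, which accepts sexagesimal or decimal input and converts it to degrees. The accepted input formats and the function names are assumptions; precession and other transformations mentioned above are not covered.

```python
def _sexagesimal(text):
    """Parse 'a b c', 'a:b:c', or a single decimal number into a signed value."""
    stripped = text.strip()
    sign = -1.0 if stripped.startswith("-") else 1.0
    parts = stripped.lstrip("+-").replace(":", " ").split()
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 fields, got {text!r}")
    # float() raises ValueError on non-numeric input such as 'abc'.
    first, minutes, seconds = ([float(p) for p in parts] + [0.0, 0.0])[:3]
    if first < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"field out of range in {text!r}")
    return sign * (first + minutes / 60.0 + seconds / 3600.0)


def parse_ra(text):
    """Right ascension given in hours (h m s) -> degrees in [0, 360)."""
    hours = _sexagesimal(text)
    if not 0 <= hours < 24:
        raise ValueError(f"right ascension out of range: {text!r}")
    return hours * 15.0


def parse_dec(text):
    """Declination given in degrees (d m s) -> degrees in [-90, 90]."""
    degrees = _sexagesimal(text)
    if not -90 <= degrees <= 90:
        raise ValueError(f"declination out of range: {text!r}")
    return degrees
```

Rejecting malformed input before the query is issued keeps invalid positions from reaching the database, and converting to a single unit at the interface means the query code handles only decimal degrees.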

4.3 Data Entry

We have populated the database with approximately 40,000 astronomical objects from selected regions of the sky, specifying position (galactic, ecliptic and equatorial), type, wavelength band, object name aliases, etc. We populated several other classes (of equivalent cardinality) which encapsulate primary and derived observation data and published astronomical catalog information.

5 Conclusion and Future Work

In this paper we described AMASE, an astrophysics catalog that uses object-oriented database (OODB) technology to facilitate access to NASA's multi-mission and multi-spectral astrophysics data holdings. We capitalized on OODB techniques to classify data products with scientific attributes, enabling queries across mission and spectral boundaries of data residing in heterogeneous databases. We discussed the advantages of using an object-relational database (a hybrid of object-oriented and relational database technologies) to mitigate many of the problems that arise in using pure relational technology.

Our immediate goal is to expand the AMASE database by including more catalog and observation entries from other NASA missions. We also plan to extend the query capability by allowing ``data mining'', for example searching for ``proximity'' of objects, variability, and uncertainty of retrieved values. In addition, we plan to develop semiautomatic tools to extract pertinent information from raw files in FITS format.


Last Update 20 November, 1995

Jim Blackwell