GEDP Components
The GEDP is built on a J2EE-based architecture that follows a Model-View-Controller
(MVC) approach, separating presentation from control logic. Details of the architecture components
are provided below.
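As a rough illustration of the separation described above, the following minimal sketch keeps data (model), rendering (view), and request handling (controller) in separate classes. All class and method names are hypothetical and are not taken from the GEDP code base. In a J2EE application the controller role is typically played by a servlet and the view by a JSP page; plain classes stand in here so the sketch runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: every name here is hypothetical, not part of the GEDP code base.
final class MvcSketch {

    /** Model: holds experiment names and knows nothing about presentation. */
    static final class ExperimentModel {
        private final List<String> names = new ArrayList<>();

        void add(String name) {
            names.add(name);
        }

        List<String> all() {
            return new ArrayList<>(names); // defensive copy handed to the view
        }
    }

    /** View: turns model data into text and contains no control logic. */
    static final class ExperimentListView {
        String render(List<String> names) {
            StringBuilder out = new StringBuilder("Experiments:");
            for (String name : names) {
                out.append("\n - ").append(name);
            }
            return out.toString();
        }
    }

    /** Controller: interprets a request, updates the model, and passes the result to the view. */
    static final class ExperimentController {
        private final ExperimentModel model;
        private final ExperimentListView view;

        ExperimentController(ExperimentModel model, ExperimentListView view) {
            this.model = model;
            this.view = view;
        }

        String handle(String action, String argument) {
            if ("add".equals(action)) {
                if (argument == null || argument.trim().isEmpty()) {
                    throw new IllegalArgumentException("An experiment name is required");
                }
                model.add(argument.trim());
            } else if (!"list".equals(action)) {
                throw new IllegalArgumentException("Unknown action: " + action);
            }
            return view.render(model.all());
        }
    }

    public static void main(String[] args) {
        ExperimentController controller =
                new ExperimentController(new ExperimentModel(), new ExperimentListView());
        controller.handle("add", "Sample experiment A");
        System.out.println(controller.handle("list", null));
    }
}
```

Because the view only receives a list of names, the rendering could be swapped (for example, to HTML) without touching the model or the controller.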
Use Cases
Use cases transform user requirements into a sequence of user-system interactions. Our
software development process begins with a high-level domain analysis and a detailed use case analysis.
These use cases drive the business logic included in the GEDP.
Design Document
Draft GEDP design document providing an overview of MVC components.
Object Models
Class diagrams for GEDP architectural components. Object models describe
database, utility, mapping, and bean classes affiliated with the MVC
infrastructure.
Data Models
The GEDP data model is based on the
MAGE-ML and MIAME specifications.
Diagrams of the GEDP data model along with database creation scripts are provided.
Java API
The Java API for the GEDP domain and enterprise classes that encapsulate and persist expression data.
MAGE Components
The MAGE-OM
is currently being leveraged to facilitate data integration. MAGE-APIs are currently under development.
MAGE Object Model (MAGE-OM)
MGED's UML representations of gene expression data. MAGE-OM objects are currently mapped to data
captured within the GEDP database.
MAGE API
A Java, SOAP, and HTTP API to the MAGE-OM leveraging the NCICB's
caBIO infrastructure.
The APIs have been developed and are currently being tested.
netCDF Components
netCDF (network Common Data Form)
is an interface and library for accessing array data.
The netCDF library defines a binary representation of scientific data, and netCDF
components support the creation, access, and sharing of scientific data in a way that is
efficient in both performance and storage. The GEDP leverages the netCDF libraries to efficiently store and retrieve
intensity values from large experiment files.
For example, during the upload of an Affymetrix platform experiment, the GEDP system
extracts the data from the uploaded Affymetrix .cel files and stores the .cel file header
data in the GEDP database. The .cel file data values are stored separately in netCDF
format. The generated netCDF files can be downloaded from the GEDP site and may be
processed as described below.
- Many statistical analysis packages (R, for example)
support reading netCDF files directly.
- A standard
set of tools
is available for displaying the contents of netCDF files.
- C++
and Java APIs are available for
accessing the data in a netCDF file.
The GEDP application includes a
Java class that uses the aforementioned netCDF Java API to provide
simplified access to the data in the netCDF files created during the
experiment submission process.
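The upload step described above (header fields to the database, intensity values to a separate array store) can be sketched roughly as follows. This is not the GEDP implementation. It assumes a simplified, text-style layout with a `[HEADER]` section of `key=value` lines and an `[INTENSITY]` section whose data rows are `X Y MEAN STDV NPIXELS`. Real .cel files may differ (for example, binary variants), and a plain in-memory list stands in for the netCDF file that a real system would write through a netCDF library. The names `CelSplitSketch` and `ParsedCel` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names and an assumed, simplified text layout.
final class CelSplitSketch {

    /** Result of the split: header fields (database side) and mean intensities (array side). */
    static final class ParsedCel {
        final Map<String, String> header = new LinkedHashMap<>();
        final List<Double> meanIntensities = new ArrayList<>();
    }

    static ParsedCel parse(String celText) {
        ParsedCel result = new ParsedCel();
        String section = "";
        for (String rawLine : celText.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line.substring(1, line.length() - 1); // e.g. HEADER, INTENSITY
                continue;
            }
            int eq = line.indexOf('=');
            if ("HEADER".equals(section) && eq > 0) {
                // key=value metadata that would be stored in the database
                result.header.put(line.substring(0, eq), line.substring(eq + 1));
            } else if ("INTENSITY".equals(section) && eq < 0) {
                // data row (assumed columns: X Y MEAN STDV NPIXELS); keep the MEAN column
                String[] cols = line.split("\\s+");
                if (cols.length >= 3) {
                    result.meanIntensities.add(Double.parseDouble(cols[2]));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "[HEADER]\nCols=2\nRows=1\n\n"
                + "[INTENSITY]\nNumberCells=2\nCellHeader=X\tY\tMEAN\tSTDV\tNPIXELS\n"
                + "0\t0\t120.5\t10.1\t9\n1\t0\t98.0\t8.3\t9\n";
        ParsedCel parsed = parse(sample);
        System.out.println("Header fields: " + parsed.header);
        System.out.println("Mean intensities: " + parsed.meanIntensities);
    }
}
```

Keeping the two outputs separate mirrors the storage split in the text: small, queryable metadata on one side and a large numeric array on the other.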
R Components
For statistical processing, the GEDP leverages the R statistical package.
R is a language and environment for statistical computing and graphics and provides a variety of
statistical (e.g. clustering) techniques.
MGED Ontologies
The Microarray Gene Expression Data (MGED) Society has initiated an
Ontology Working Group
tasked with developing an ontology for describing samples used in microarray experiments.
The use of the
NCI Enterprise Vocabulary Services, as well as MGED ontologies under development,
is being explored for future releases.
NCI's cancer Bioinformatics Infrastructure Objects (caBIO)
caBIO is an architecture that provides programmatic access via APIs (Java, SOAP, HTTP) to a variety of
publicly available NCI intramural (CGAP,
CMAP, etc.) and extramural (UniGene, LocusLink, etc.) data repositories.
The GEDP uses caBIO APIs to obtain gene lists and cellular pathway information. caBIO is also used
to map Affymetrix reporter IDs to GenBank accession numbers.