Cancer.gov
National Cancer Institute   The Director's Challenge    
 






The Gene Expression Data Portal (GEDP) is a portal that allows users to submit and retrieve microarray experiments (Affymetrix and Spotted Array), as well as other types of expression experiments (SAGE).

The GEDP includes the eXpressionWay tool, which allows researchers to view detailed information about the genes associated with their experiments, including the cellular pathways that contain the target genes. Additional analysis tools are currently under development. For detailed information on using the GEDP and the eXpressionWay tool, please see the GEDP User's Guide.

GEDP Components
The GEDP is designed using a J2EE-based architecture that follows a Model-View-Controller (MVC) approach, which separates the presentation logic from the control logic. Details of the architecture components are provided below.

Use Cases
Use cases are utilized to transform user requirements into a sequence of user-system interactions. Our software development process begins with a high-level domain analysis and detailed use case analysis. These use cases drive the business logic included in the GEDP.

Design Document
Draft GEDP design document providing an overview of MVC components.

Object Models
Class diagrams for GEDP architectural components. Object models describe database, utility, mapping, and bean classes affiliated with the MVC infrastructure.

Data Models
The GEDP data model is based on the MAGEML and MIAME specifications. Diagrams of the GEDP data model, along with the associated database creation scripts, are provided.

Java API
The Java API for the GEDP domain and enterprise classes that encapsulate and persist expression data.
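
The MVC separation described under GEDP Components can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the J2EE stack the GEDP uses, and every name in it (Experiment, ExperimentStore, render_experiments, ExperimentController) is hypothetical and not part of the GEDP Java API. It only shows how data, presentation, and control logic can be kept apart.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Model object: holds data only, with no presentation logic."""
    name: str
    platform: str  # for example "Affymetrix", "Spotted Array", or "SAGE"


@dataclass
class ExperimentStore:
    """Model layer: an in-memory stand-in for database-backed classes."""
    items: list = field(default_factory=list)

    def add(self, experiment):
        self.items.append(experiment)

    def by_platform(self, platform):
        return [e for e in self.items if e.platform == platform]


def render_experiments(experiments):
    """View: turns model objects into text and knows nothing about storage."""
    if not experiments:
        return "No experiments found."
    return "\n".join(f"{e.name} ({e.platform})" for e in experiments)


class ExperimentController:
    """Controller: interprets a request, queries the model, selects a view."""

    def __init__(self, store):
        self.store = store

    def submit(self, name, platform):
        self.store.add(Experiment(name, platform))

    def list_by_platform(self, platform):
        return render_experiments(self.store.by_platform(platform))
```

Because the view function receives plain model objects, the presentation could be swapped (for example, from text to HTML) without touching the controller or the store.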

MAGE Components
The MAGE-OM is currently being leveraged to facilitate data integration. MAGE-APIs are currently under development.

MAGE Object Model (MAGE-OM)
MGED's UML representations of gene expression data. MAGE-OM objects are currently mapped to data captured within the GEDP database.

MAGE API
A Java, SOAP, and HTTP API to the MAGE-OM leveraging the NCICB's caBIO infrastructure. The APIs have been developed and are currently being tested.

netCDF Components
netCDF (network Common Data Form) is an interface and library for accessing array data. The netCDF library defines a binary representation of scientific data, and netCDF components support the creation, access, and sharing of scientific data in a manner that is efficient in both performance and storage. The GEDP leverages the netCDF libraries to efficiently store and retrieve intensity values from large experiment files.

For example, during the upload of an Affymetrix platform experiment, the GEDP system extracts the data from the uploaded Affymetrix .cel files and stores the .cel file header data in the GEDP database. The .cel file data values are stored separately in netCDF format. The generated netCDF files can be downloaded from the GEDP site and may be processed as described below.

  1. Many statistical analysis packages (R, for example) can read netCDF files directly.
  2. A standard set of tools is available for displaying the contents of netCDF files.
  3. C++ and Java APIs are available for accessing the data in a netCDF file.

The GEDP application includes a Java class that uses the aforementioned Java netCDF API to provide simplified access to the data in the netCDF files created during the experiment submission process.
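
The upload step described above, in which header fields go to the database and intensity values are stored separately, can be sketched roughly as follows. This is an illustration under stated assumptions, not the GEDP implementation: it assumes a simplified text layout with [HEADER] and [INTENSITY] sections, loosely modeled on text-format .cel files, and it returns a plain Python list where a real system would write a netCDF variable through a netCDF library. The function name split_cel_text and the SAMPLE values are made up for the example.

```python
def split_cel_text(text):
    """Split a simplified CEL-like text file into header fields and intensities.

    Returns (header, intensities). header is a dict of key/value strings taken
    from the [HEADER] section. intensities is a flat list of floats taken from
    the third column of each data row in the [INTENSITY] section, in file order.
    """
    header = {}
    intensities = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].upper()  # remember which section we are in
            continue
        if section == "HEADER" and "=" in line:
            key, value = line.split("=", 1)
            header[key.strip()] = value.strip()
        elif section == "INTENSITY" and "=" not in line:
            # Assumed row layout: X, Y, MEAN, ...; keep only the MEAN column.
            fields = line.split()
            if len(fields) >= 3:
                intensities.append(float(fields[2]))
    return header, intensities


# Made-up sample input in the assumed layout (not real experiment data).
SAMPLE = """[HEADER]
Cols=2
Rows=1

[INTENSITY]
NumberCells=2
CellHeader=X Y MEAN
0 0 126.0
1 0 98.5
"""

header, values = split_cel_text(SAMPLE)
```

Here header would be the part destined for database columns, while values is the numeric payload that would be written to a netCDF file for later retrieval.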

R Components
For statistical processing, the GEDP leverages the R statistical package. R is a language and environment for statistical computing and graphics and provides a variety of statistical (e.g. clustering) techniques.

MGED Ontologies
The Microarray Gene Expression Data (MGED) Society has initiated an Ontology Working Group tasked with developing an ontology for describing samples used in microarray experiments. The use of the NCI Enterprise Vocabulary Services, as well as the MGED ontologies under development, is being explored for future releases.

NCI's cancer Bioinformatics Infrastructure Objects (caBIO)
caBIO is an architecture that provides programmatic access via APIs (Java, SOAP, HTTP) to a variety of publicly available NCI intramural (CGAP, CMAP, etc.) and extramural (Unigene, LocusLink, etc.) data repositories. The GEDP uses caBIO APIs to obtain gene lists and cellular pathway information. caBIO is also used to map Affymetrix reporter IDs to GenBank accession numbers.

Please send comments and suggestions to ncicb@pop.nci.nih.gov.