#03 caArray Database: Data Management and Annotation Tool for Microarray Data

The caArray database is a standards-based, open-source data management system that features MIAME 1.1-compliant data annotation forms, controlled vocabularies (MGED ontology), and MAGE-ML import and export. caArray also provides interfaces for programmatic access to microarray data and analytical tools. The caArray database and tools can be accessed at http://caArray.nci.nih.gov

The caArray data portal is based on MAGE-OM, the international data standard for microarray data. caArray includes web-based data annotation forms that capture MIAME 1.1-level annotations using controlled terminology from the MGED ontology. These annotations include information about contacts, protocols, biomaterials, experiments, arrays and array designs. caArray 1.0 supports submission of Affymetrix and GenePix native data files, as well as MAGE-ML import and export. In addition to document-based data submission, caArray provides application programming interfaces (APIs) for programmatic access to data: the MAGE-OM API can be used for fine-grained data retrieval, and the EJB API transfers data via data transfer objects (DTOs). End users can access the data files and annotations through the caArray data portal by downloading the native data files or exporting MAGE-ML.

caArray common data elements (CDEs) are stored in the NCICB’s shared, publicly accessible metadata repository, caDSR. Currently, MGED ontology terms are stored in the caArray database. A connection to the caCore’s Enterprise Vocabulary Services (EVS) will be available later this year.

Two open source microarray data analysis tools work together with the caArray database: caWorkbench, a desktop tool for analysis, annotation and visualization of microarray data, and webCGH, an application that allows users to view DNA copy number measurements relative to genome locations and annotated genome features. Both of these tools also connect to the NCICB’s cancer Bioinformatics Infrastructure Objects (caBIO) model, permitting access to a variety of genomic, cancer model, and clinical trial information. Additional tools that retrieve data from caArray via the MAGE-OM API are being developed by several NCI-designated cancer centers funded by the cancer Biomedical Informatics Grid (caBIG) program.

The caArray database and analysis tools were developed to be consistent with caBIG compatibility guidelines that highlight the use of controlled vocabularies, CDEs, well-documented APIs and UML models. caBIG is a new initiative coordinated by NCI in partnership with other members of the cancer research community. caBIG seeks to create a network that links organizations, institutions, and individuals to enable the sharing of cancer research infrastructure, data, and interoperable tools. It is an open-access, open-source activity that promises to expedite progress in cancer research. caArray’s compatibility with the caBIG design requirements facilitates the cross-silo use of cancer biology information to promote integrated cancer research.

caArray has an n-tier architecture and is built with future extensibility in mind. It uses the J2EE framework and provides programmatic interfaces as well as a web portal interface for submission and retrieval of microarray data. The EJB interface provides a transactional API, with Data Transfer Objects (DTOs) used to transfer the actual data. The web portal, built on the Struts framework, uses the EJBs and corresponding DTOs to perform transactions with the back end. An RMI-based query API based on MAGE-OM provides fine-grained programmatic access to the persisted data. Internally, caArray uses an object-relational mapping (ORM) tool called OJB to abstract the actual data source from the application and provide object-based access to data. NetCDF, a binary file format with an open API, is used for storage and fast retrieval of data. This file format stores data in cube matrix format and can be queried on its dimensions. It allows faster query and retrieval of data than database or text files. The application uses Java Message Service (JMS) to parse large data files asynchronously.
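
To make the idea of a cube that "can be queried on its dimensions" concrete, the following is a minimal sketch in Python. It is not the NetCDF API and not caArray code: the `DataCube` class, the dimension names (`probe`, `hybridization`, `quantitation`) and all values are hypothetical and chosen only for illustration. A real array store would slice by index along each dimension rather than scan a dictionary as this toy does.

```python
class DataCube:
    """Toy in-memory cube keyed by one label per dimension.

    Hypothetical illustration only; this is not the NetCDF API.
    """

    def __init__(self, dimension_names):
        # Ordered dimension names, e.g. ("probe", "hybridization", "quantitation").
        self.dimension_names = tuple(dimension_names)
        self._cells = {}

    def put(self, value, **coords):
        """Store one value at a fully specified coordinate."""
        if set(coords) != set(self.dimension_names):
            raise KeyError("a coordinate is required for every dimension")
        key = tuple(coords[name] for name in self.dimension_names)
        self._cells[key] = value

    def query(self, **fixed):
        """Return {coordinate: value} for cells matching the fixed dimensions."""
        unknown = set(fixed) - set(self.dimension_names)
        if unknown:
            raise KeyError(f"unknown dimensions: {sorted(unknown)}")
        # Map each fixed dimension to its position in the coordinate tuple.
        positions = {self.dimension_names.index(n): v for n, v in fixed.items()}
        return {
            key: value
            for key, value in self._cells.items()
            if all(key[i] == label for i, label in positions.items())
        }


# Made-up example values, not real measurements.
cube = DataCube(("probe", "hybridization", "quantitation"))
cube.put(812.5, probe="probe_1", hybridization="hyb_A", quantitation="signal")
cube.put(0.01, probe="probe_1", hybridization="hyb_A", quantitation="p_value")
cube.put(640.0, probe="probe_1", hybridization="hyb_B", quantitation="signal")
cube.put(95.2, probe="probe_2", hybridization="hyb_A", quantitation="signal")

# Fix two dimensions and let the third vary: all signal values for one probe.
signals = cube.query(probe="probe_1", quantitation="signal")
for coordinate, value in sorted(signals.items()):
    print(coordinate, value)
```

Fixing some dimensions and leaving the others free is the kind of access pattern that a dimensioned file format serves well, because the reader does not have to parse unrelated rows as it would in a flat text file.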

caArray uses JAAS for authentication and authorization. The role-based access control implementation uses the common NCICB security schema. This configurable security architecture allows LDAP- or RDBMS-based authentication and has a concept of groups, which allows data to be shared among a consortium of researchers. MAGE-OM also uses a common security service to filter objects based on user roles and permissions. This implementation is provided via aspect-oriented programming (AOP). The Solaris production environment is listed below; information about other configurations that we have tested and verified can be found at http://caarray.nci.nih.gov/caARRAY/devdoc/caarraydbdocs.

Component	DBMS	Application Server
Model	Sunfire 1280	SunFire 480R
CPU	4 x 900 MHz (UltraSPARC III)	2 x 900 MHz (UltraSPARC III)
Memory	8 GB	10 GB
Local Disk	36 GB (Mirrored)	36 GB (Mirrored)
Network Link Speed	Fiber	100 MB (Switched)
OS	Sun Solaris 5.8	Sun Solaris 5.8
Comments	Shared with other NCICB databases. DB: Oracle 8i	App server: JBoss 3.2.3
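
The group-based sharing and object filtering described in the security paragraph above can be illustrated with a small sketch. This is an assumption-laden toy in Python, not JAAS, not the NCICB security schema, and not caArray code: the `User` and `Experiment` classes, the `can_read` and `visible_to` functions, and the user and group names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    """Hypothetical authenticated user with group memberships."""
    name: str
    groups: frozenset = frozenset()


@dataclass
class Experiment:
    """Hypothetical protected object: owned by one user, shared with groups."""
    title: str
    owner: str
    read_groups: set = field(default_factory=set)


def can_read(user, experiment):
    # The owner can always read; others need at least one shared group.
    if user.name == experiment.owner:
        return True
    return bool(user.groups & experiment.read_groups)


def visible_to(user, experiments):
    # Filter a result set down to the objects this user is allowed to see.
    return [e for e in experiments if can_read(user, e)]


alice = User("alice", frozenset({"consortium_x"}))
bob = User("bob", frozenset({"consortium_x"}))
carol = User("carol")

experiments = [
    Experiment("private study", owner="alice"),
    Experiment("shared study", owner="alice", read_groups={"consortium_x"}),
]

print([e.title for e in visible_to(bob, experiments)])
print([e.title for e in visible_to(carol, experiments)])
```

Applying the filter in one place, on every result set, is the design point here: query code does not need to know about permissions, which is similar in spirit to applying such a check as a cross-cutting concern.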

caArray datasets and open source tools are publicly available and can be accessed at http://caArray.nci.nih.gov; caArray source code is available for local installations at http://ncicb.nci.nih.gov/download under an open source license. For more information about the caArray database and tools, please contact NCICB application support at ncicb@pop.nci.nih.gov.

last modified 06-02-2005 02:45 PM
