James Myers, Al Geist, Jens Schwidder, Alan Chappell, Tara Talbott, Mike Peterson
During the last quarter, the SAM team focused its development efforts on migration to the Slide version 2 code base and on improving portal and Grid integration. Complementing this work were a large number of community interactions at major conferences and deepening coordination with collaborating projects. SAM-related presentations were given at Grid Forum 9, SuperComputing 2003, the 2nd International Semantic Web Conference, and the 2nd Annual Data Provenance and Annotation Workshop. Ongoing discussions with the NSF-sponsored George E. Brown, Jr. Network for Earthquake Engineering Simulation Grid (NEESgrid) project have resulted in a plan to include the SAM-based Electronic Laboratory Notebook (ELN) in the NEESgrid software suite.
Ongoing work includes development of SAM 2.0, including support for versioning and semantically scoped queries; development of the Data Format Description Language (DFDL) within the Global Grid Forum; and interactions with the Jakarta Slide project and the Java Content Repository standard Expert Groups. NEESgrid-funded integration work will also begin in the next quarter.
Data Grid Integration: The SAM team is currently investigating options for integrating SAM's naming, annotation, translation, and records capabilities with underlying Data Grid repositories. Initial development work has demonstrated connections to Data Grids via GridFTP using the Java CoG Kit, and design options for deeper integration of access control and other metadata are being explored.
Slide 3.0 Migration: Discussions with the University of Michigan CHEF project and the Storage Resource Broker team have identified the emerging Java Content Repository (JSR 170) standard as a potential integration API. To prepare for such use, the SAM team is continuing to investigate the changes that will be necessary to migrate SAM MMS and notebook functionality to the Slide 3.0 JSR 170 reference implementation. Slide 3.0 will provide a higher-level server-side API and will standardize some of the functionality for messaging and configurable security that the SAM project has added to Slide.
Semantic Grid: With the release of initial RDF capabilities in SAM 1.1 and growing community interest in semantic data mapping, the SAM team is shifting emphasis towards the detailed design of the Semantic Services (SS) layer and Semantic Grid concepts. Several initial capabilities are being developed to help elicit requirements while design work proceeds towards a more comprehensive mechanism. In particular, a pedigree/provenance property has been defined whose value is dynamically generated based on a description of pedigree in terms of other relationships. The pedigree definition and property value are both in the Resource Description Framework (RDF) format. An additional mechanism, now in progress, will export an RDF description of all the existing WebDAV properties on a resource, making SAM metadata available for processing in RDF-capable applications and agents. SAM and CMCS team members presented papers describing this work at the Dublin Core and Semantic Web conferences (see below).
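The two mechanisms described above can be illustrated with a minimal sketch. It is not SAM's implementation: the function names, the example URL, and the `derivedFrom` relationship are hypothetical, and N-Triples is used here only because it is the simplest RDF serialization to emit by hand. The first function exports a resource's WebDAV properties as RDF statements; the second derives a pedigree value by following whichever relationships a pedigree definition names.

```python
def to_ntriples(resource_url, props):
    """Export WebDAV properties as N-Triples.

    props maps (namespace, local_name) to a string value. Each property
    becomes one statement whose predicate is namespace + local_name.
    """
    lines = []
    for (namespace, name), value in sorted(props.items()):
        # Escape the characters that N-Triples literals require escaping.
        literal = (value.replace("\\", "\\\\")
                        .replace('"', '\\"')
                        .replace("\n", "\\n"))
        lines.append(f'<{resource_url}> <{namespace}{name}> "{literal}" .')
    return "\n".join(lines)


def pedigree(resource, links, pedigree_predicates):
    """Derive a pedigree value on demand.

    links is a list of (subject, predicate, object) relationships and
    pedigree_predicates is the (hypothetical) pedigree definition: the set
    of predicates that count as provenance. Returns every resource reachable
    from `resource` through those predicates, nearest first.
    """
    found, frontier, seen = [], [resource], {resource}
    while frontier:
        current = frontier.pop(0)
        for subject, predicate, obj in links:
            if (subject == current and predicate in pedigree_predicates
                    and obj not in seen):
                seen.add(obj)
                found.append(obj)
                frontier.append(obj)
    return found


if __name__ == "__main__":
    print(to_ntriples("http://example.org/data/run1.xml",
                      {("DAV:", "getcontenttype"): "text/xml"}))
    links = [("c", "derivedFrom", "b"), ("b", "derivedFrom", "a"),
             ("c", "cites", "z")]
    print(pedigree("c", links, {"derivedFrom"}))
```

Because the pedigree is computed from the relationships at request time, changing the pedigree definition changes the property value without rewriting any stored metadata.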
Data Format Description Language (DFDL): Work is continuing to design a standard for a language that can describe the content of arbitrary data files. The Grid Forum DFDL working group has been very active, working by email and during intensive meetings at Grid Forum 9 and SuperComputing 2003. The SAM team has been closely involved in crafting the standard and has contributed significant concepts that derive from the development of the BFD language and its extensions within the SAM project.
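The core idea of a format description language, that a declarative description rather than hand-written parsing code drives the reading of a data file, can be sketched as follows. This is not DFDL or BFD syntax; the description layout and field names are hypothetical, and the sketch handles only a fixed-length record using Python's standard `struct` module.

```python
import struct

# Hypothetical description of a file header: (field name, struct format code).
HEADER_DESCRIPTION = [
    ("magic", "4s"),         # 4-byte identifier
    ("version", "H"),        # unsigned 16-bit integer
    ("sample_count", "I"),   # unsigned 32-bit integer
]


def parse_record(description, data, byte_order="<"):
    """Parse one fixed-length record according to a declarative description.

    Returns the decoded fields and the number of bytes consumed, so the
    caller can continue with the next description at that offset.
    """
    fmt = byte_order + "".join(code for _, code in description)
    size = struct.calcsize(fmt)
    values = struct.unpack(fmt, data[:size])
    fields = dict(zip((name for name, _ in description), values))
    return fields, size


if __name__ == "__main__":
    raw = struct.pack("<4sHI", b"SAMD", 2, 10) + b"payload..."
    print(parse_record(HEADER_DESCRIPTION, raw))
```

A generic parser of this kind can read a new format by swapping in a new description, which is the property that makes such descriptions useful for automated metadata extraction.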
SAM 2.0: Initial migration to the Slide 2 codebase has been performed. On the client side, this work introduced support for HTTPS in the standard SAM library, based on the Apache Commons HttpClient package. The SAM server has also been refactored to work with Slide 2, and work is continuing to expose and exploit newly available capabilities including DAV versioning, binding (hard links), and the DASL search language. The SAM team is also exploring the capabilities of new data storage modules developed for Slide, which promise better performance and scalability. In terms of community involvement, this migration synchronizes SAM with the Slide team, and we have begun submitting bug fixes that will be incorporated into the Slide 2 release.
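For readers unfamiliar with DASL, the sketch below builds the XML body of a `basicsearch` query of the kind a client would send with the WebDAV SEARCH method. The element names follow the basicsearch grammar as later standardized in RFC 5323; whether a given Slide 2 build accepts exactly this form is an assumption, and the function name and scope path are hypothetical.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"


def build_basicsearch(scope_href, select_props, where_prop, where_literal):
    """Build a DASL basicsearch body: select properties from a scope
    where one DAV property equals a literal value."""
    ET.register_namespace("D", DAV)

    def dav(tag):
        return f"{{{DAV}}}{tag}"

    request = ET.Element(dav("searchrequest"))
    search = ET.SubElement(request, dav("basicsearch"))

    # Properties to return for each matching resource.
    select = ET.SubElement(ET.SubElement(search, dav("select")), dav("prop"))
    for name in select_props:
        ET.SubElement(select, dav(name))

    # Collection to search, including everything beneath it.
    scope = ET.SubElement(ET.SubElement(search, dav("from")), dav("scope"))
    ET.SubElement(scope, dav("href")).text = scope_href
    ET.SubElement(scope, dav("depth")).text = "infinity"

    # Condition: where_prop == where_literal.
    eq = ET.SubElement(ET.SubElement(search, dav("where")), dav("eq"))
    ET.SubElement(ET.SubElement(eq, dav("prop")), dav(where_prop))
    ET.SubElement(eq, dav("literal")).text = where_literal

    return ET.tostring(request, encoding="unicode")


if __name__ == "__main__":
    print(build_basicsearch("/notebooks/", ["displayname"],
                            "getcontenttype", "text/xml"))
```

The resulting string would be sent as the body of a SEARCH request with a `text/xml` content type; the server replies with a multistatus response listing the selected properties of each match.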
Open Source Software Licensing: This work is essentially complete. A last review at PNNL on January 20, 2004 should allow us to proceed with posting source code at www.sourceforge.net.
Support For External Data Transformations: SAM now provides multi-step metadata generation and data transformation mechanisms that may include BFD, web service, and/or XSLT steps. This capability is being exploited in the CMCS project to provide translations of molecular structure file formats using third-party tools wrapped as web services.
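The shape of such a multi-step mechanism can be sketched as a chain of steps, each of which may transform the data, contribute metadata, or both. This is an illustration only, not SAM's interface: the step functions below are hypothetical local stand-ins for what would be BFD, web service, or XSLT steps, and the XYZ-style molecular structure input is invented for the example.

```python
def run_pipeline(data, steps):
    """Run named steps in order. Each step takes the current data and
    returns (new_data, metadata_dict); metadata keys are prefixed with
    the step name so later steps cannot silently overwrite earlier ones."""
    metadata = {}
    for name, step in steps:
        data, new_metadata = step(data)
        metadata.update({f"{name}.{key}": value
                         for key, value in new_metadata.items()})
    return data, metadata


def extract_atom_count(data):
    """Metadata-only step: read the atom count from the first line."""
    return data, {"atoms": int(data.splitlines()[0])}


def normalize_symbols(data):
    """Transformation-only step: capitalize element symbols in atom lines."""
    lines = data.splitlines()
    fixed = []
    for line in lines[2:]:
        symbol, rest = line.split(maxsplit=1)
        fixed.append(f"{symbol.capitalize()} {rest}")
    return "\n".join(lines[:2] + fixed) + "\n", {}


if __name__ == "__main__":
    xyz = "2\nwater fragment\no 0.0 0.0 0.0\nh 0.0 0.0 1.0\n"
    result, meta = run_pipeline(xyz, [("extract", extract_atom_count),
                                      ("normalize", normalize_symbols)])
    print(result)
    print(meta)
```

Keeping every step behind the same call signature is what lets a remote web service be substituted for a local transformation without changing the pipeline itself.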
SAM team members participated in a wide range of meetings, workshops, reviews, and collaboration discussions during this quarter:
Collaboratory for Multiscale Chemical Science (CMCS): An ongoing collaboration related to the use of SAM as the primary CMCS data/metadata management system. CMCS collaborated on the design of the web service interface for metadata extraction and data transformation and is providing ongoing feedback including bug reports and performance evaluation.
Network for Earthquake Engineering Simulation Grid (NEESgrid): PNNL has been offered a subcontract from the NEESgrid project to integrate the ELN and SAM capabilities into the NEESgrid infrastructure. The effort will leverage ongoing work in the SAM, CMCS, and CHEF projects and will focus on integration with the NEESgrid portal and metadata/data repository. This effort will result in a notebook capability for the NEESgrid project that will launch directly from the NEESgrid portal, provide Grid-based single sign-on with the portal, and store/retrieve data from the NEESgrid data/metadata repository. As a result of this subcontract, Jim Myers will resign from the NEESgrid External Advisory Board.
Web Downloads: Registrations to download SAM and notebook software are continuing at a pace of 1-2 per day.
International Conference on Semantics for a Networked World, with a focus on Grid Databases: Jim Myers was invited to serve on the Program Committee for this conference, which will be held July 17-19, 2004, in Paris, France.