To: ADEC Interoperability Working Group
From: John Good, Joe Mazzarella, Bruce Berriman
Subject: Alternative Plan for NASA Archive Interoperability

We appreciate the work Tom has put into getting the momentum going with his draft proposal. Appropriate cross-referencing and cross-linking can indeed be very useful. However, we think we can make much greater progress toward the goal of true integration and interoperability without excessive effort, and we would like to propose a plan for doing so here.

While we concur with some basic elements of the original plan (such as the importance of providing XML output), it does not seem to provide adequate interoperability at the level of basic data services, or to bring us close enough to the broader NVO goal of an infrastructure for data exchange. With modest effort, we believe we can get where we want to be within 6-9 months. We bear in mind that, with the limited resources available to us, we can only afford to engage in new efforts that materially benefit each of our centers individually, independent of their contribution to the broader NVO goals.

Rather than starting with an approach that focuses on linking Web forms, we think we should begin with the data services themselves. With some basic documentation and a directory of available services (URLs), combined with output options that supply data in "raw" formats, we can easily construct a rich variety of data integration services. Specifically, we suggest the following enhancements to our existing systems:

1. Document Primary URL Data Services: If we can all agree on SOAP as a standard structure for query requests, we won't object. However, we suspect it will be a while before everyone has time to implement it, and the need is not yet pressing. URL-based request mechanisms will need to be maintained for years to come to support queries issued from HTML forms via Web browsers.
Furthermore, most current data services require only a few simple keyword/value pairs, and the URL mechanism handles these just fine whether the request comes from a Web browser or from a client program. There is little to be gained by adding the complexity of SOAP/XML to data requests until a specific distributed application requires it. Besides the fact that services return HTML pages instead of raw data, the main problem facing most potential direct users is that the services themselves are often inadequately documented, or not documented at all. We need complete documentation of URL keywords, including their allowed values and ranges, elevating the services to an "API" of sorts. We therefore suggest that all current and future services be documented and maintained in this manner. For an example of what we mean, see the services described at http://irsa.ipac.caltech.edu/applications/Oasis/svc/

2. Provide "Raw Data" Output Modes for Data Services: All services should provide a mode (or an alternate service) that returns "raw" data. By raw we mean a data stream containing a FITS image (e.g., MIME type image/x-fits) or spectrum, or tabular data either as a flat ASCII data stream or (if we can agree on a consolidation of VOTable and XDF) as an XML data stream. In some cases XML-based data constructs are better suited than ASCII or FITS tables, for example for object attributes (and links to related data resources) in NED query reports, catalog lists, catalog data dictionaries, bibliographic reference lists, XY plots, data collections, etc.

3. Integration Services: The role of "integration services" should be to give the data centers (and astronomers with sufficient programming skills) everything they need to build data integration and knowledge discovery tools suited to their specific needs. This is a key NVO goal, and it is well within reach if facilitated by the ADEC efforts outlined below.
One example of such an enhancement would be a uniform set of services that return very quick summaries (estimates are acceptable) of the data available for a region of the sky. For instance, if you asked IRSA for an inventory of its holdings within 1 degree of M51, such a service would reply that there are about 3000 2MASS sources, 2 IRAS ISSA images, 100 2MASS images, etc. NED would reply that it has approximately 106 galaxies, 3 galaxy clusters, 2 galaxy groups, 30 infrared sources, 1 QSO, 19 radio sources, 2 supernovae, 2 visual sources, and 19 X-ray sources. Such a service should take only a few seconds to respond, and if we structure the response correctly (in XML), parallel responses from all archives could be merged before presentation. For each entry we would also include the complete URL of the relevant data service. This is fairly simple to implement: we already have one service that generates such statistics, and most other centers have similar summary services.

Each of us (and anyone else so motivated) will be free to use these services to collect and merge such inventories and to present the result as appropriate for our user communities. Several questions should be left to the initiative of each center (or of anyone else who wishes to build wrappers around our services): whether to present the composite inventory as an HTML page or through a GUI, whether to go the next step and collect actual data, how to let the user specify which subset of the data to retrieve, and what to do with the data once retrieved. As an aside, this could also form the basis of a wider collaborative effort to define various "data collections" or "Concept Maps".

4. Regarding GLU: Recent experience indicates that direct URLs work fine, without the extra maintenance required to support GLU. We gauge this success by the high degree of user satisfaction with the one-click data service links in NED's query reports (based on positions and object names) and within the OASIS data integration workbench. In astronomy there simply aren't that many resources to deal with, and they change infrequently enough that it is quite easy to keep URL dependencies in software up to date. If this situation changes and it becomes truly advantageous to add directory-type services for NASA archive services and for the more general VO, an approach using industry-standard XML-based solutions such as RDF, WSDL and UDDI is preferable to GLU, for reasons well stated in Timothy Kimbal's message of 01/07/2002.

We believe that the above plan is workable within 6-9 months, provides truly new functionality (rather than a rehash of existing capabilities) and, most importantly, has both the appearance and the reality of moving toward true archive service interoperability.
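The quick-inventory service proposed in item 3 could be consumed by a client along the lines of the minimal Python sketch below. It sends plain URL keyword/value requests to several archives in parallel and merges their XML replies into one list. Everything specific here is an assumption rather than an agreed interface: the keywords `ra`, `dec` and `radius`, the `<inventory archive="...">` / `<entry type="..." count="..." url="..."/>` reply layout, and the names `InventoryEntry`, `build_inventory_url`, `parse_inventory` and `fetch_inventories` are all hypothetical placeholders for whatever the working group would actually document.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


@dataclass
class InventoryEntry:
    archive: str   # which archive answered, e.g. "IRSA" or "NED"
    category: str  # kind of holding, e.g. "2MASS sources"
    count: int     # approximate number of items (estimates are fine)
    url: str       # complete URL of the data service for this entry


def build_inventory_url(base_url: str, ra: float, dec: float, radius_deg: float) -> str:
    """Compose a plain URL keyword/value request.

    The keyword names ra/dec/radius are hypothetical; each archive would
    document its own keywords, as proposed in item 1.
    """
    return base_url + "?" + urlencode({"ra": ra, "dec": dec, "radius": radius_deg})


def parse_inventory(xml_text: str) -> List[InventoryEntry]:
    """Parse one archive's reply in an assumed (not agreed) XML layout."""
    root = ET.fromstring(xml_text)
    archive = root.get("archive", "unknown")
    return [
        InventoryEntry(
            archive=archive,
            category=el.get("type", ""),
            count=int(el.get("count", "0")),
            url=el.get("url", ""),
        )
        for el in root.findall("entry")
    ]


def fetch_inventories(urls: List[str], fetch: Callable[[str], str]) -> List[InventoryEntry]:
    """Query all archives in parallel and merge their replies.

    `fetch` maps a URL to the response body; it is passed in so the
    transport (urllib, a cache, a test stub) stays interchangeable.
    Results keep the order of `urls`, because Executor.map preserves order.
    """
    workers = max(1, min(8, len(urls)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        replies = list(pool.map(fetch, urls))
    merged: List[InventoryEntry] = []
    for reply in replies:
        merged.extend(parse_inventory(reply))
    return merged
```

With a real transport, `fetch` could be something like `lambda u: urllib.request.urlopen(u, timeout=10).read().decode("utf-8")`. Keeping the merge step separate from both the transport and the presentation matches the plan's intent that each center decides for itself how to display the composite inventory and whether to follow the per-entry URLs to retrieve actual data.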