SOFTWARE AND DATA REQUIREMENTS FOR THE XML REGISTRY FOR THE EPA-STATE NATIONAL ENVIRONMENTAL INFORMATION EXCHANGE NETWORK

 

 

 

CONTRACT NO. 68-W-99-002

TASK ORDER No. 021

 

 

 

Prepared for:

 

United States Environmental Protection Agency

Office of Environmental Information

1200 Pennsylvania Avenue, NW.

Washington, DC 20460

 

Task Order Project Officer:

 

Michael Pendleton

 

 

 

 

 

 

 

 

 

Prepared by:

 

Systems Development Center

Science Applications International Corporation

6565 Arlington Boulevard

Falls Church, VA 22042


 


 


CONTENTS

EXECUTIVE SUMMARY

1.0  INTRODUCTION
     1.1  Purpose
     1.2  Scope
     1.3  System Overview
     1.4  System Architecture

2.0  REFERENCES

3.0  APPLICABLE STANDARDS
     3.1  OASIS/ebXML
     3.2  International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 11179-3:2000 Information technology – Metadata Registry (MDR) - Part 3, Registry metamodel and basic attributes
     3.3  Universal Description, Discovery and Integration (UDDI)
     3.4  Assumptions about Applicable Standards

4.0  SOFTWARE REQUIREMENTS
     4.1  Roles and Role Management
          4.1.1  Registration Authority
          4.1.2  Registry Administrator
          4.1.3  Responsible Organization
          4.1.4  Submitting Organization
          4.1.5  Registry Clients
     4.2  Accessibility
     4.3  Lifecycle Management
          4.3.1  Registration
                 4.3.1.1  Registered Objects
                 4.3.1.2  XML tags
                 4.3.1.3  XML datatypes
                 4.3.1.4  XML schemas (DETs)
                 4.3.1.5  XML namespaces
                 4.3.1.6  XML Trading Partner Agreements
                 4.3.1.7  XML document
                 4.3.1.8  WSDL document
                 4.3.1.9  Registration Process
          4.3.2  Development Forum
          4.3.3  Classification
          4.3.4  Administration
          4.3.5  Version Control
          4.3.6  Object Status Management
          4.3.7  Validation
          4.3.8  Modifying Content
          4.3.9  Approving Objects
          4.3.10  Retiring Objects
          4.3.11  Removing Objects
          4.3.12  Quality Control and Error Handling
          4.3.13  Audit Trail Maintenance
     4.4  Query Management
          4.4.1  Discovery/Query
          4.4.2  Retrieval

5.0  DATA REQUIREMENTS
     5.1  XML Objects and Metadata
     5.2  Data Requirements of the OASIS/ebXML RIM version 2.0
     5.3  Data Requirements of the UDDI Specification version 3.0
     5.4  Data Requirements of the ISO/IEC 11179 Metamodel
     5.5  Data Requirements Summary

6.0  INTEROPERABILITY REQUIREMENTS
     6.1  Security and Privacy
     6.2  Linkages

7.0  CONCEPT OF OPERATIONS

8.0  PRELIMINARY REGISTRY TOOL OPTIONS
     8.1  Background
     8.2  Existing Online Registries
     8.3  Available Registry Software
     8.4  Commercially Available Tools
     8.5  Related Software

9.0  ACCEPTANCE REQUIREMENTS

 

EXHIBITS

 

Exhibit 1.  Major Metadata Groupings

Exhibit 2.  Data Requirements Summary

 

APPENDIXES

 

Appendix A     Summary of XML Registry Software Requirements

Appendix B     Data Requirements for the OASIS/ebXML Registry Information Model v. 2.0

Appendix C     Data Requirements for the UDDI Specification v. 3.0

Appendix D    Data Requirements for the ISO/IEC 11179 Part 3 Metamodel

Appendix E     XML Registry Requirements Glossary


EXECUTIVE SUMMARY

 

“In the simplest sense, the benefits of XML will be achieved only if organizations of a significant number are using the same XML definitions.  Therefore, these XML definitions must be available for partners to discover and retrieve.  A registry/repository is a mechanism used to discover and retrieve documents, templates, and software (i.e., objects and resources) over the Internet.”  (http://xml.gov)

 

The Environmental Protection Agency (EPA) and its state and tribal information trading partners have initiated collaborative design and development of an Internet-based voluntary National Environmental Information Exchange Network (Network) for state, federal, and Native American Tribal environmental agencies.  An eXtensible Markup Language (XML) Registry is proposed as a component of the Network to serve as a clearinghouse of Network-related information, as well as to provide operational support for implementation of the State and EPA nodes of the Network.  In addition, the State-EPA Network XML registry may become part of a larger federation of federal XML registries.  The registry will support both human and automated interactions for XML object registration, object status tracking, and querying and retrieval for reuse.

 

The goal of the Network Steering Board is to provide a vehicle for standardizing information exchanges to improve the quality and consistency of the data, and to reduce the reporting burden on the states and tribes.  Therefore, the Network dataflows should be based on data standards that are stored in the Environmental Data Registry.  To ensure the greatest interoperability, the XML Registry should link data standard metadata to the XML schemas and related documents that are based upon the approved data standards.  To support harmonization of dataflows on the Network, it is important that approved XML schemas, the standard XML tags, and other component parts be available for discovery, reuse, and reference in new XML schemas.

 

To achieve all of these goals, the proposed XML Registry will be developed based upon three standards: Organization for the Advancement of Structured Information Standards/Electronic Business using eXtensible Markup Language (OASIS/ebXML), International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 11179, and Universal Description, Discovery and Integration Initiative (UDDI).  The OASIS/ebXML standard will be used as a source of specifications for basic XML registry functionality and services.  ISO/IEC 11179 will be used as a source of specifications for the storage of XML tags that are related to corresponding, well-documented  data elements, along with associated enumerated value lists, and their linkage to other XML objects (documents, trading partner agreements, datatypes).  The UDDI specification will guide the registration and discovery of Web services that are part of the Network.

 

This XML Registry Requirements Document will serve to inform the decision about whether to acquire or build an XML Registry to support the Network.  The document outlines applicable standards, surveys available tools, and describes functional and data requirements needed to support the Network.  Once initial decisions have been made on the requirements, an analysis of available implementation options will be developed.   


1.0       INTRODUCTION

 

“In the simplest sense, the benefits of XML will be achieved only if organizations of a significant number are using the same XML definitions.  Therefore, these XML definitions must be available for partners to discover and retrieve.  A registry/repository is a mechanism used to discover and retrieve documents, templates, and software (i.e., objects and resources) over the Internet.” (http://xml.gov)

 

EPA and its state and tribal information trading partners have initiated collaborative design and development of an Internet-based voluntary National Environmental Information Exchange Network (Network) for state, federal, and Native American Tribal environmental agencies.

 

According to the State/EPA Information Management Workgroup, “a Network based on standardized Internet language will allow individual agencies to invest in internal data storage systems of their choice at a pace they can afford, while also supporting easy exchange of environmental data between agencies.”  The Network will facilitate information exchanges between “nodes” maintained individually by participating partners that will use the Internet to exchange information via standardized eXtensible Markup Language (XML) Data Exchange Templates (DETs) or schemas.  (The term “schema” is used in this report to refer to an XML document designed for data exchange.)  Schemas will be based upon the approved data standards to bring better consistency and quality to the data that trading partners exchange.  Exchange of data between nodes will be governed by Trading Partner Agreements (TPAs) between the partners.  TPAs document the agreed-upon data, exchange format, frequency of exchange, security, and related issues.
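
The items a TPA documents can be pictured as a simple record.  The following Python sketch is illustrative only: the class, its field names, and the example values are assumptions made for this illustration and are not drawn from any actual agreement or Network specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TradingPartnerAgreement:
    """Illustrative record of what a TPA documents (field names are hypothetical)."""
    partners: Tuple[str, str]            # the two nodes exchanging data
    data_description: str                # the agreed-upon data
    schema_name: str                     # exchange format (the XML schema used)
    frequency: str                       # frequency of exchange
    security: str                        # agreed security arrangements
    related_issues: Tuple[str, ...] = () # any other documented issues


def covers_exchange(tpa, sender, receiver, schema_name):
    """Return True if the TPA governs this sender/receiver/schema combination."""
    same_partners = {sender, receiver} == set(tpa.partners)
    return same_partners and tpa.schema_name == schema_name


# Hypothetical example values, not taken from any actual agreement.
tpa = TradingPartnerAgreement(
    partners=("State Node A", "EPA Node"),
    data_description="Facility identification data",
    schema_name="FacilitySite",
    frequency="monthly",
    security="SSL with authenticated users",
)
```

In this sketch an exchange is considered governed by the agreement only when both partners and the schema match; an actual Network node may apply different or additional checks.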

 

One of the critical nodes on the Network will be an XML Registry that will provide the capability to share information about XML schemas approved for use on the Network, as well as information about schemas under development.  An XML Registry contains registry entries that contain descriptive information, or metadata, about registered XML objects.  The objects may be stored in the registry or in a related repository.  The registry supports the submission and registration of objects, administration of the objects, and makes the metadata available for discovery, understanding, and reuse.  This XML Registry will serve as a location for one-stop shopping of selected information related to the Network, including both a “clearing house” for information and “operational support” for Node implementation.  It should not duplicate functions provided on other Network Nodes.

 

As the information on the Network should be based on data standards approved by the Environmental Data Standards Council (EDSC), the XML Registry should be related to the Environmental Data Registry (EDR) that contains metadata about standard data elements, associated enumerated value domains, and data element groups.  Data standards are "documented agreements on formats and definitions of common data” that are established to bring better consistency and quality to the information that organizations maintain.  The EDR also registers application data elements.  Data trading partners may also develop XML schemas for data they want to share.  It should be possible to document the data elements (as specified by XML tags in an XML schema) in the EDR, even though the data may not be “standardized” through any formal process.

 

1.1       Purpose

 

This XML Registry Requirements document records the requirements for an XML Registry to support the EPA/State Network.  It is part of a series of documents designed to inform the decision about how to provide an XML Registry to support the Network.  The document describes applicable standards, surveys available tools, and sets out the functional and data requirements needed to support the Network.

 

1.2       Scope

 

This document identifies functional and data requirements of the XML Registry software, as well as necessary interconnections to related applications.  This document does not include design specifications for the XML Registry, as it may be used to inform a decision to purchase an available registry solution rather than to build a new one.  An options analysis will be addressed in a follow-on document.  The document also includes a high-level concept of operations based upon current understanding of the Network architecture.  A more detailed concept of operations may be developed after a registry solution is selected and the architecture of the Network is more fully defined.

 

1.3       System Overview

 

An XML Registry is planned as part of the Network to serve as a central location for XML objects and related resources.  The XML Registry will provide a lifecycle management interface for managing XML objects through their development and implementation lifecycle.  This interface will be accessible to a limited set of authorized users who will make use of the registration and update functions to manage the metadata about the XML objects, including their status, version, and organizational contacts.  It will provide a forum for exchange of information about XML objects under development to promote harmonization and reuse of schemas.  It will provide a means of tracking an XML object through its progress from development to review to approval.  It will also provide a source of standardized formats for transmitting data.
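
The progression of an object from development to review to approval can be sketched as a small state machine.  The Python example below is a minimal sketch under stated assumptions: the status names mirror the three stages named above, while the class, method, and attribute names are hypothetical and do not come from any registry standard.

```python
from enum import Enum


class Status(Enum):
    DEVELOPMENT = "development"
    REVIEW = "review"
    APPROVED = "approved"


# Allowed forward moves only. A real registry may also allow other moves,
# such as returning an object from review to development.
NEXT_STATUS = {
    Status.DEVELOPMENT: Status.REVIEW,
    Status.REVIEW: Status.APPROVED,
}


class TrackedObject:
    """Metadata tracked for one XML object (attribute names are hypothetical)."""

    def __init__(self, name, version, contact):
        self.name = name
        self.version = version
        self.contact = contact
        self.status = Status.DEVELOPMENT
        self.history = [Status.DEVELOPMENT]  # every status the object has held

    def advance(self):
        """Move the object to the next status and record the change."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"{self.name} is already {self.status.value}")
        self.status = NEXT_STATUS[self.status]
        self.history.append(self.status)
        return self.status
```

Keeping the history alongside the current status is one simple way to support the status tracking described above; it is a design choice of this sketch, not a stated requirement.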

 

The XML Registry will include a query interface that will allow users such as system developers to access available resources (such as schemas and trading partner agreements) through a central registry, in order to promote reuse and discourage development of disparate exchange formats.  The query and retrieval functions will include both a Web site to support human interactions with the XML Registry and an Application Programming Interface (API) that will enable automatic query and retrieval of objects from the Registry.  As the XML objects in the Registry will be linked to the related data elements and definitions, users will be able to query the Registry based on semantic content, assuring more efficient searching and effective query results.

 

The EPA-State Network XML Registry will include both a registry and a repository function.  A registry is a facility that stores relevant descriptive information (metadata) about registered objects, and makes that information available for discovery, understanding, and reuse.  A repository is a storage facility from which registered objects can be retrieved.  Note that a registered object can be stored in the registry, in a repository connected to the XML Registry, or in a separate location, because an XML object may be accessed through a Unique Identifier (UID) that references the object’s location.

 

A registered object is something that an organization wants to publish for discovery and retrieval.  Registered objects may include: XML tags (elements), enumerated value lists, XML schemas, XML schema components, XML datatypes, XML namespaces, XML documents, trading partner agreements, and administrative documents (submittal and approval documentation). 
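
The distinction between registry and repository can be illustrated with a short Python sketch.  It assumes a registry entry that holds metadata plus a location reference, and a repository that holds the objects themselves.  All class, method, and field names are hypothetical, and the example shows only the separation of roles, not a design for the Network registry.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RegistryEntry:
    """Metadata about one registered object (field names are hypothetical)."""
    uid: str          # unique identifier for the registered object
    object_type: str  # for example "XML schema" or "XML namespace"
    description: str
    location: str     # reference to where the object itself is held


class Repository:
    """Stores the registered objects themselves, keyed by location."""

    def __init__(self) -> None:
        self._content: Dict[str, str] = {}

    def store(self, location: str, content: str) -> None:
        self._content[location] = content

    def fetch(self, location: str) -> Optional[str]:
        return self._content.get(location)


class Registry:
    """Stores metadata and resolves a UID to the object's content."""

    def __init__(self, repository: Repository) -> None:
        self._entries: Dict[str, RegistryEntry] = {}
        self._repository = repository

    def register(self, entry: RegistryEntry, content: Optional[str] = None) -> None:
        if entry.uid in self._entries:
            raise ValueError(f"UID already registered: {entry.uid}")
        self._entries[entry.uid] = entry
        if content is not None:  # object is held in the connected repository
            self._repository.store(entry.location, content)

    def describe(self, uid: str) -> RegistryEntry:
        return self._entries[uid]

    def retrieve(self, uid: str) -> Optional[str]:
        """Return the content, or None if the object is held elsewhere."""
        return self._repository.fetch(self._entries[uid].location)
```

An object registered without content remains discoverable through its metadata; in this sketch `retrieve` returns `None` for it, and a caller would follow the entry's `location` to obtain the object from wherever it is held.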

 

Section 3.0 of this document provides an introduction to the standards that are applicable to the XML Registry design and operation.  At this time, no single standard describes a comprehensive XML Registry to manage the full array of objects needed to support standards-based XML.  The data standards that support the dataflows on the Network should be fully documented in the XML Registry, and the Registry should provide Web services to support business-to-business transactions.  The registry will need to include documentation for the data elements referred to by the XML tags in schemas as well as the XML schemas themselves.  The registry will need to include data elements and their definitions to help manage the semantics (meaning) of data from the time of creation through all stages of processing, analysis, and use.  To meet the requirements of the Network, the XML Registry will need to be based on a combination of standards, including ISO/IEC 11179, OASIS/ebXML, and UDDI.

 

1.4       System Architecture

 

The XML Registry described in this requirements document would provide a single source of metadata for data elements, XML schema, and Web services to support the development of harmonized, standards-based data exchanges.  An architecture is needed that will support the entire Network enterprise.  It is envisioned that state and EPA programs will be developing schemas to define data exchanges on the Network and searching for and using Network schemas to format instance documents used in actual data exchanges.  Following is a list of issues to be considered in selecting an XML Registry Architecture. 

 

•   Availability and Reliability.  As it is envisioned that the XML Registry will support day-to-day Network operations by serving schemas for the validation of data in instance documents, the registry needs to be deployed on a robust platform.  The registry needs to be reliably available during business hours across the entire United States, which will require selection of an architecture that can provide the needed availability.

 

•   Currency.  As it is envisioned that the XML Registry will serve as the source of standardized XML components and the system of record for current schemas in use, it is important that the data be kept current.

 

•   Information Sharing.  There is a requirement that the XML Registry serve as a forum for collaborative development of schemas, which means that the architecture needs to support sharing information about standard XML components, and provide a forum for discussion about schemas under development.

 

•   Security.  Security of information in the XML Registry is required to ensure that the data not be altered due to intentional or unintentional actions.  Standard Internet security methods, such as Secure Sockets Layer (SSL), will be required to protect both the data and the servers hosting the data.

 

Architectural options to be considered include: 

 

•   A single, centralized XML Registry.

•   A distributed network of XML Registries.

•   Multiple registries operating in a peer-to-peer network.

 

A single, centralized XML Registry could manage the information about all of the dataflows on the Network. 

 

The single registry option can provide the greatest benefits for easing information sharing and maintaining current information.  A single registry allows the Network to reference one location for all standard XML components, thus improving ease of query and retrieval.  A single registry could provide a sole discussion forum about schemas under development, thus engaging all potentially interested parties in harmonizing schemas.  The single registry provides the simplest solution for maintaining current information on schemas in use since it avoids the problem of duplicating or replicating data and maintaining data in different locations.  The registry is intended to provide data update services.  If data is updated in a variety of registries, extra effort is needed to copy updates to the various registries on the Network to maintain currency.  The single registry can also provide greater data security as it ensures that data and system integrity are overseen by a single operation.

 

The main drawback of a single registry is that it represents a single point of failure during Network operations, a risk that can be mitigated by the architectural solution chosen for implementation.  For example, a computer center can provide a backup, mirrored environment to ensure continuous operation.  A single, centralized registry may also be overloaded by requests during Network operations, which can be overcome by providing adequate telecommunications and processing capacity to support demand.
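
The effect of a mirrored environment on availability can be estimated with simple arithmetic.  The Python sketch below assumes that each site fails independently and has the same availability; both are simplifying assumptions, the function names are hypothetical, and any figures supplied to it are illustrative rather than measured or required values.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(availability: float, copies: int) -> float:
    """Probability that at least one of `copies` identical sites is up.

    Assumes the sites fail independently, which is a simplification:
    the service is down only when every copy is down at the same time.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - availability) ** copies


def expected_downtime_hours(availability: float) -> float:
    """Expected hours per year during which the service is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Illustrative comparison with a hypothetical per-site availability of 99%.
single = combined_availability(0.99, copies=1)
mirrored = combined_availability(0.99, copies=2)
print(f"single site:   {expected_downtime_hours(single):.1f} hours down per year")
print(f"mirrored pair: {expected_downtime_hours(mirrored):.1f} hours down per year")
```

With the hypothetical 99 percent figure, a single site would be unavailable for roughly 88 hours a year, while a mirrored pair would be unavailable for under one hour, provided the independence assumption holds.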

 

A distributed network of XML registries could be managed separately by the various participating organizations.  One benefit of the distributed network is that it enables each participating organization to manage its own registry for its own XML components.  For example, a state environmental agency could have an XML registry on its node where it could manage XML components for use on the Network, as well as other state-specific XML components.  If each registry on the network maintained a copy of all of the Network XML components, this distributed architecture would provide an automatic backup system in the event that one registry fails to operate.  However, keeping multiple copies of the XML Registry current across multiple registries is a major endeavor that requires the resources needed to automatically propagate changes to all the registries to avoid a problem with data currency.  Also, the automatic propagation creates a potential for errors caused by collisions with other XML registry content.  The distributed registry will also make it more difficult to query and retrieve the standard XML components for reuse.  In addition, one of the goals of the Registry is to serve as a collaboration tool for coordinated development of harmonized schema.  At this time, harmonization is easier to facilitate through a single source of current information about XML components that are undergoing change with resulting changes in versions.  With multiple registries, sharing information across systems and tracking changes/versions becomes more difficult. 

 

The third option is a peer-to-peer architecture in which multiple registries are networked together and XML objects can link to data on other network servers.  In this model, data would be shared among the systems.  The intent of a peer-to-peer model is to allow registry participants to link to XML components on a number of servers, building on XML products provided by a number of Network participants, and maintained by those participants on their individual registries operating in a shared environment.  This model could distribute the responsibility and the cost of XML registry data maintenance among all participants.  Although peer-to-peer architecture does not require replication of data across participating servers, some data replication would be needed to avoid the availability issue presented by the potential downtime of a single, central registry.  The need for some data replication adds cost and raises the potential for errors, just as with the distributed network of registries.  In addition, peer-to-peer architecture presents potential security problems to those participating in the Network.

                                                           

The Environmental Information Exchange XML Registry requirements include a registry service that provides the means for managing objects in a repository and a registry client that is used to access them.  To support lifecycle management and human querying, the registry services will be implemented using a public Web site with some functions restricted to registered users via authentication using Secure Sockets Layer (SSL).  A Web services API will support the automated business-to-business transactions.  For example, a search command across multiple sites that are part of a UDDI network would enable an organization to find and retrieve schemas in the XML Registry using keywords in a search (such as “water” or “waste”).  In addition, the registry is required to communicate with other registries that may contribute to or download from the central registry.
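
A keyword search of the kind described above can be sketched in Python as a filter over registry metadata.  The entry layout, the function name, and the sample entries are all hypothetical and serve only to illustrate the idea; a real registry API would follow the query interfaces defined by the applicable standards.

```python
def search(entries, keyword, object_type=None):
    """Return entries whose name, description, or keywords mention `keyword`.

    The match is a case-insensitive substring test; `object_type`, when
    given, limits the results to one kind of registered object.
    """
    needle = keyword.lower()
    hits = []
    for entry in entries:
        if object_type is not None and entry["type"] != object_type:
            continue
        searchable = " ".join(
            [entry["name"], entry["description"], *entry["keywords"]]
        ).lower()
        if needle in searchable:
            hits.append(entry)
    return hits


# Hypothetical registry entries, invented for this illustration.
ENTRIES = [
    {"uid": "obj-001", "type": "schema", "name": "WaterQualityMonitoring",
     "description": "Hypothetical schema for water monitoring results",
     "keywords": ["water", "monitoring"]},
    {"uid": "obj-002", "type": "schema", "name": "HazardousWasteManifest",
     "description": "Hypothetical schema for waste shipments",
     "keywords": ["waste"]},
    {"uid": "obj-003", "type": "trading partner agreement",
     "name": "State-EPA water data TPA",
     "description": "Hypothetical agreement covering water data exchange",
     "keywords": ["water"]},
]

for hit in search(ENTRIES, "water", object_type="schema"):
    print(hit["uid"], hit["name"])
```

The sketch searches one local list of entries; searching across multiple sites would require the same query to be sent to each participating registry and the results merged.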

                                               

 

2.0       REFERENCES

 

Blueprint for a National Environmental Information Exchange Network, (Information Management Working Group) Network Blueprint team, October 30, 2000; document amended June 2001.

 

Cooperation between XML Registries and Related Registries, A Collaborative Effort between the XML Working Group and Federal and State Government Agencies, XML Working Group Task 2.2.3.2 Registry Standards Harmonization http://xml.gov/documents/completed/lbnl/20020417status.htm

 

DISA Registry Initiative, http://www.disa.org/drive/Registry_resources.html

 

DoD XML Registry, http://diides.ncr.disa.mil/xmlreg/user/index.cfm

 

ebXMLSoft, Inc., http://www.ebxmlsoft.com/index.html

 

freebXML Initiative, http://www.freebxml.org/registry.htm

 

ISO/IEC 11179 Information Technology–Metadata Registries (MDR), http://metadata-stds.org/

 

ISO/IEC FDIS 11179-3 Information technology – Metadata Registry (MDR) - Part 3, Registry metamodel and basic attributes, June 2002.

 

Logistics Management Institute, Requirements for an XML Registry, May 2001.

Metadata for Documents, http://www.nist.gov/sc4/liaisons/tc10sc8/Annex1.ppt

                             

National Environmental Information Exchange Network Information Package, June 2001.

 

OASIS/ebXML Registry Information Model, v 2.0.  By members of the OASIS/ebXML Registry Technical Committee.  Approved April 2002, http://www.oasis-open.org/committees/regrep/documents/2.1/specs/ebrim_v2.1.pdf

 

OASIS/ebXML Registry Services Specification, v 2.0.  By members of the OASIS/ebXML Registry Technical Committee.  Approved April 2002, http://www.oasis-open.org/committees/regrep/documents/2.1/specs/ebrs.pdf

 

OASIS ebXML Registry Technical Committee, http://www.oasis-open.org/committees/regrep/

 

Oracle Corporation Web Services Overview,  http://www.oracle.com/ip/develop/ids/jdevdocs/9iWebSv.pdf and http://otn.oracle.com/tech/webservices/htdocs/uddi/overview.html

 

Reference Model for an Open Archival Information System (OAIS), http://www.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf

 

State/EPA Registry at NIST, http://xmlregistry.nist.gov/EPA-States/

 

Universal Description, Discovery and Integration (UDDI) version 3.0, Published Specification, 19 July 2002, http://uddi.org/pubs/uddi-v3.00-published-20020719.htm

 

XML.gov Registries, http://xml.gov/registries.htm

 

XML.org Registry, http://www.xml.org/xml/registry.jsp

 

XML Registry and Repository, from OASIS Cover Pages, http://xml.coverpages.org/xmlRegistry.html

 

 

3.0       APPLICABLE STANDARDS

 

Currently there are various specifications for XML registries including, pre-eminently, the OASIS/Electronic Business using eXtensible Markup Language (ebXML) Registry standard, which is being developed by the Organization for the Advancement of Structured Information Standards (OASIS), and the UDDI specification, which was developed by a vendor consortium and is undergoing further development by an OASIS Technical Committee.  Also relevant to the XML Registry is the ISO/IEC 11179 Metadata Registry standard.

3.1       OASIS/ebXML

 

OASIS and UN/CEFACT initially developed separate XML Registry/Repository specifications.  The efforts were merged into a single OASIS Technical Committee.  The goal of the OASIS/ebXML registry is to “provide a stable store where information submitted by a Submitting Organization is made persistent.”  The stored information can be used to facilitate ebXML-based Business to Business (B2B) partnerships and transactions.  Submitted content may include XML schema and documents, process descriptions, ebXML Core Components, context descriptions, Unified Modeling Language (UML) models, information about users and user roles, and software components.

 

The OASIS/ebXML registry information model (RIM) is intended to achieve interoperable registries and repositories with an interface that enables submission, query, and retrieval of the contents of the registry and repository.  The registry specification is designed to serve a wide range of business categories by covering the spectrum from general-purpose document registries to real-time B2B registries.  The registry specification includes the OASIS/ebXML Registry Information Model version 2.1 (version 3.0 is expected to be published early in 2003), which provides a blueprint for the ebXML Registry.  The information model can be used to guide registry implementers in registry design.  The RIM describes the types of objects that are stored in a registry, the type of metadata recorded about the objects, and how the information in a registry is organized.

 

The Registry Information Model is accompanied by a Registry Services Specification, which "defines the interface to the ebXML Registry Services as well as interaction protocols, message definitions and XML schema."  Registry Services may be implemented in several ways, including as a public Web site, as a private Web site, or as a service hosted by a Virtual Private Network (VPN) provider.  The ebXML Registry Service comprises a set of interfaces designed to manage the objects and inquiries associated with the OASIS/ebXML Registry.  The two primary interfaces for the Registry Service are a Life Cycle Management interface that controls the processes necessary for managing an object within the XML Registry and a Query Management interface that controls the release of information from the XML Registry.  Both of these interfaces are accessed through a Registry Client Interface.  The Registry Services Specification defines the interfaces exposed by the Registry Service as well as the interface for the Registry Client.  The XML Registry makes use of a repository for storing and retrieving persistent information required by the Registry Services.

 

3.2       International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 11179-3:2000 Information technology – Metadata Registry (MDR) - Part 3, Registry metamodel and basic attributes


ISO/IEC 11179 is not an XML specification.  The ISO/IEC 11179-3:2002 revision, being published in December 2002, provides a metamodel for shareable data through specification of a data element metadata registry and guidance for full documentation of data elements.  The purpose of the standard is to promote the standardization and registration of data elements and their components that document data in order to make the data understandable and available for sharing and reuse.  The standard provides guidance on the formulation and maintenance of discrete descriptions and semantic content (metadata) that can be used to formulate data elements and their components in a consistent, standard manner.  It describes the data element characteristics necessary to uniquely identify and fully document data elements and their components to enable sharing, including identifiers, definitions, and classification categories.  The standard includes the documentation of value domains that store the names and definitions of the permissible values or enumerated values associated with data elements.

 

ISO/IEC 11179 describes a metadata registry that can assist users of shared data in having a common understanding of an item’s meaning, representation, and identification.  Metadata about data elements and their components is stored in a metadata registry that can support data sharing with descriptions of data.  Registration is the process of documenting the metadata about data elements and their components.  Registration should be carried out at the data element and component level to promote and maximize semantic value.  Complete data element metadata, as outlined in the ISO/IEC 11179 model, enables the end user to interpret the intended meaning confidently, correctly, and unambiguously.

 

There are commonalities between a data element metadata registry and an XML registry.  Both seek to document reusable specifications for information objects: ISO/IEC 11179 focuses on the individual parts (XML tags and data elements), and both types of registries register groups (XML schema and data standards).  XML schema are hierarchical groupings of elements of data.  The ISO/IEC 11179 metadata registry model includes the semantic content about data elements in the form of definitions and supporting information that can provide a valuable tool for searching by keyword or concept.  Merging the capabilities of an ISO/IEC 11179 metadata registry and an XML registry would meet many needs for B2B transactions.

 

There are plans to improve interoperability between ISO/IEC 11179 metadata registries and XML registry/repositories.  This collaboration will be the subject of a conference planned for Santa Fe, New Mexico in January 2003.

 

3.3       Universal Description, Discovery and Integration (UDDI)

 

UDDI (Universal Description, Discovery, and Integration) is an XML-based specification that allows businesses to publish and discover information about their business functions and Web services offerings.  Web services are Internet-based applications that perform specific tasks and comply with a standard specification; they are platform and language neutral, and can be described, published, discovered, and invoked dynamically in a distributed computing environment.

 

The UDDI project is an industry initiative to create a platform-independent, open framework for describing services, discovering businesses, and integrating business services using the Internet, as well as an operational registry that is available today.  UDDI is the first truly cross-industry effort driven by all major platform and software providers, as well as marketplace operators and e-business leaders.  UDDI describes a Web service as “a self-describing, self-contained, modular unit of application logic that provides some business functionality to other applications through an Internet connection.  Applications access Web services via ubiquitous Web protocols and data formats, such as Hypertext Transfer Protocol (HTTP) and XML, with no need to worry about how each Web service is implemented.  Web services can be mixed and matched with other Web services to execute a larger workflow or business transaction.”  This specification can enhance description, discovery, and integration capabilities for the Network.

 

The UDDI registry can be conceptually viewed as an "extended telephone directory," providing registration and searching of:

 

•  White pages (business address, contact, known identifiers).

•  Yellow pages (categorizations based on standard taxonomies such as industry classification or recognized locational codes).

•  Green pages (technical information about Web services such as interface specifications expressed in Web Services Description Language (WSDL)).
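
The three kinds of "pages" can be pictured as fields on a single registry record.  The following is a minimal sketch in Python, not the UDDI data model itself; the class names, field names, category codes, and example.org addresses are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceBinding:
    """Green pages: technical details for one Web service (hypothetical fields)."""
    name: str
    access_point: str  # URL where the service is invoked
    wsdl_url: str      # pointer to the WSDL interface description


@dataclass
class BusinessEntry:
    """One registered business, grouping the three kinds of pages."""
    name: str                                                     # white pages
    contact: str                                                  # white pages
    categories: set[str] = field(default_factory=set)             # yellow pages
    services: list[ServiceBinding] = field(default_factory=list)  # green pages


def find_by_category(entries, category):
    """Yellow-pages lookup: businesses classified under a category code."""
    return [e for e in entries if category in e.categories]


# Illustrative data only.
entries = [
    BusinessEntry(
        "Example State Agency", "registry@example.org", {"air-quality"},
        [ServiceBinding("FacilityQuery", "https://example.org/ws",
                        "https://example.org/ws?wsdl")],
    ),
    BusinessEntry("Example Lab", "lab@example.org", {"water-quality"}),
]
matches = find_by_category(entries, "air-quality")
```

A search by category returns the matching business entries, and each entry carries the pointers needed to locate the technical description of its services.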

 

The UDDI registry specifications were developed by a vendor consortium, are undergoing further development under the auspices of OASIS, and are described on the UDDI Web site (http://www.uddi.org).  Version 3.0 of the Published Specification, dated July 19, 2002, is available at http://uddi.org/pubs/uddi-v3.00-published-20020719.htm.  The UDDI Business Registry supports Web services by providing a place for a company to register its business and the services that it offers.  People or businesses that need a service can use this registry to find a business that provides the service.  Major services, such as the EDR or the Network XML registry, can be registered in a UDDI registry, with a pointer to each service.  Lower-level content in the EDR and Network XML registry could also be registered in a UDDI registry, with appropriate lower-level pointers to the content.  The registry specification describes the major information components for this Web services registry, including metadata about business entities, business services, service types, and specification pointers to technical information about the service.

 

The XML Registry should enable access to contact information, Web addresses, and interfaces of Web services.  It should deploy, manage, and secure Web services for the Network.

3.4       Assumptions about Applicable Standards

 

One of the goals of the XML Registry is to enable management and linking of atomic information objects like data elements and the groupings of those elements in order to provide the capability of managing the data elements individually and leveraging those granular components in a variety of structures (schemas).  The registry should also make them available for searching based on semantic content.  Another goal is to provide an active Web services registry as one of the nodes of the Network.  In order to meet these goals, a combination of standards is needed.

 

The XML Registry needs to comply with parts of the most recent OASIS/ebXML standards, including the RIM and the Registry Services specification, as those are the most widely accepted standards in the emerging field of XML schema management.  As the requirements for this XML Registry include linking XML group objects to elementary objects (XML tags, data elements, enumerated lists, and related components and metadata), the ISO/IEC 11179 standard can be used to define the data requirements for administering the element data.  As it is intended that the XML Registry will provide Web services, the XML Registry will also be based on the UDDI standard.  Therefore, the data requirements for the XML Registry will be based on a combination of the OASIS/ebXML RIM, the metamodel for the ISO/IEC 11179 standard for metadata registries, and the UDDI specification.

 

 

4.0       SOFTWARE REQUIREMENTS

 

This section will describe functional requirements for the XML Registry.  Appendix A includes a summary of the requirements.  Key requirements include: support for human as well as automated system interactions to search and retrieve XML objects, linkages to Web services, potential for interoperability with other XML registries, inclusion of an expandable hierarchical classification scheme for organization of XML objects for discovery and retrieval, linking XML objects to related data element metadata to support discovery based on semantic content, and registration of Web services that will be part of the Network.

 

This section describes the supported user roles and the requirements for registry accessibility.  The registry requirements are described in two sections: lifecycle management and query management.  It is assumed that the registry services will be available through interfaces designed for both automatic and human interactions with the registry content. 

 

4.1       Roles and Role Management

 

The OASIS/ebXML Registry Services Specification describes the different types of XML Registry users.  While the Responsible Organization (RO) is the “owner” of the XML objects submitted on its behalf, the associated Submitting Organization (SO) is the Point of Contact for the object.  As the ISO/IEC 11179 standard has an established set of registry roles and responsibilities, the OASIS/ebXML standard modeled its roles and responsibilities on those in ISO/IEC 11179.  It is assumed that the XML Registry will adopt the same structure and extend the list as needed.  For registry administration purposes, it is important to distinguish between Registry Guests or Clients and Registered Users.  Registry Clients do not have rights to submit or update Registry content; they have only query access to discover and retrieve content.  They have no contract and do not require authentication to use the registry.  Registered Users have a contract with the Registration Authority (RA) and must be authenticated before using the registry.  They can submit or update Registry content.  The following sections describe the categories of Registry Users.

 

4.1.1  Registration Authority.  A Registration Authority (RA) is the host of the registered XML Objects and is responsible for the content of the Registry.  The RA is the single organization responsible for establishing and maintaining information about the XML objects.  The RA is responsible for:

 

•  establishing policies and procedures for using the registry.

•  enrolling and maintaining a list of Registry organizations and users.

•  providing authentication certificates for appropriate Registry organizations and users.

•  ensuring that registered objects are reused by Registry users.

•  receiving and processing submissions for registration of objects.

•  assigning appropriate registration and administrative statuses to the objects.

•  deleting objects, if necessary.

 

The ISO/IEC 11179 standard permits a network of hierarchical registration authorities, with a single Registration Authority responsible for the entire registry, complemented by subsidiary RAs associated with a specific program or node.  The Registration Authority is a type of Registered User.

 

4.1.2  Registry Administrator.  A Registry Administrator oversees the XML Registry and is responsible for the availability of its services and the integrity of its data.  The Registry Administrator evaluates and enforces registry security policy.  The Registry Administrator may be the same individual as the Registration Authority.  The Registry Administrator is a type of Registered User.

 

4.1.3  Responsible Organization.  A Responsible Organization (RO) oversees the coordination of XML objects in a particular organization (e.g., a program office or a state) and the contents of the metadata associated with the XML object.  An RO can create registry objects.  An RO is a type of Registered User.  An RO is responsible for:

 

•  Advising about names, meaning, and permissible values of tags or data elements submitted for registration.

•  Coordinating the development of objects so that proposed elements are unique, proposed groups use standard elements, and proposed schemas are harmonized with related dataflows.

•  Identifying the need to update registered objects.

•  Ensuring the quality of the metadata for the XML objects associated with the RO.

 

4.1.4  Submitting Organization.  A Submitting Organization (SO) is the organization or unit within an organization that submits an XML object for addition, change, or cancellation/withdrawal.  The SO would be enrolled as a Registry user and issued an authentication certificate to perform a number of lifecycle operations on its own XML Objects.  An SO can be the same as an RO or the point of contact for an RO.  An SO is a type of Registered User.

 

An SO might be a functional area business manager or an application system manager.  An SO is responsible for:

 

•  Identifying and documenting XML objects appropriate for registration.

•  Submitting proposals for registering XML objects to the appropriate RO.

 

4.1.5  Registry Clients.  A registry client is any person who uses the registry to discover and retrieve an XML object.  Registry clients do not need to be registered users.  It is important for the XML Registry to record retrievals of XML objects so that usage of the objects can be tracked.  If an unregistered user retrieves an XML schema and makes use of it or references it from an external XML schema, that user will want to know if that schema has been superseded by a subsequent version or has been retired.  One way to track usage is to require users to enter their name and email address when they retrieve an object, so that they can be notified when the retrieved object changes its status.  Alternatively, users can be given the option of signing up for notification of change.
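
The retrieval tracking described above could be approached along the following lines.  This is a minimal sketch in Python under stated assumptions: the class name, method names, and message wording are hypothetical, and delivery of the notices (for example, by email) is left out.

```python
from collections import defaultdict


class RetrievalLog:
    """Records who retrieved which object so they can be told of status changes."""

    def __init__(self):
        # object identifier -> set of email addresses that retrieved it
        self._subscribers = defaultdict(set)

    def record_retrieval(self, object_id, email):
        """Note that the user at this address retrieved the object."""
        self._subscribers[object_id].add(email)

    def notifications_for(self, object_id, new_status):
        """Return (email, message) pairs to send when an object changes status."""
        message = f"Object {object_id} is now {new_status}"
        recipients = self._subscribers.get(object_id, ())
        return [(email, message) for email in sorted(recipients)]


# Illustrative use with a hypothetical object identifier and address.
log = RetrievalLog()
log.record_retrieval("schema-A", "user@example.org")
notices = log.notifications_for("schema-A", "superseded")
```

The same structure would serve the opt-in alternative: the registry would call `record_retrieval` only for users who sign up for notification.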

 

4.2       Accessibility

 

The XML Registry needs to be accessible to all participating Network partners.  It also needs to be secure, so that data cannot be compromised or lost.   Data maintenance functions would be accessed over a secure connection using SSL.  Data maintenance would be carried out on a backend server, and a schedule would be set up to capture data and copy it to the public access server where it could be searched, but not modified. 

 

Procedures for lifecycle management of the XML objects need to be developed so that rules can be published regarding the accessibility of objects during different draft states. 

As the XML Registry will need to serve schemas upon request by Network Nodes to support runtime validation, it needs to be available 24 hours a day, 7 days a week, in a robust environment that supports reliable operations.  The XML Registry would also include a public access capability to discover and retrieve published XML objects.

 

4.3       Lifecycle Management

 

This section will describe the functions of the registry required to manage the registration of objects and subsequent management of them, including version changes, configuration management, content updates, and retirement. 

 

4.3.1    Registration

 

One of the goals of the XML Registry is to serve as a central repository that is the single source of information related to the Network.  The Network dataflows will be built upon standards, and standards will take a number of forms, including standard data elements, and collections of data elements in data types or schema components.  Therefore, the XML Registry needs to register the full array of XML objects so it can adequately support the Network. 

 

4.3.1.1  Registered Objects.  The XML Registry needs to register the following objects: XML tags (elements), enumeration lists, XML schema, XML schema components, attributes, datatypes, Trading Partner Agreements (TPAs), and Web Services Description Language (WSDL) documents.  The XML Registry needs to include functionality for registering and managing XML objects within namespaces.  The XML Registry will have administrative functions, and will record the status of XML objects and keep track of which schemas are using different schema components.  As a result, it will also need to store XML documents, including standard approval documents for XML schema.  All objects in the registry will have a Universally Unique Identifier (UUID).  The registry will generate the UUID upon object submittal unless the user supplies a conforming UUID along with the submitted object.
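
The UUID rule in the preceding paragraph can be sketched with the Python standard library.  The function name is hypothetical, and the handling of a supplied value that is not well formed (here, a new UUID is generated rather than the submission being rejected) is an assumption, since the requirement does not say which is intended.

```python
import uuid


def assign_uuid(supplied=None):
    """Keep a conforming submitter-supplied UUID; otherwise generate one."""
    if supplied is not None:
        try:
            # uuid.UUID raises ValueError for strings that are not well formed.
            return str(uuid.UUID(supplied))
        except ValueError:
            pass  # assumption: fall through and generate a fresh UUID
    return str(uuid.uuid4())


kept = assign_uuid("12345678-1234-5678-1234-567812345678")
generated = assign_uuid()
replaced = assign_uuid("not-a-uuid")
```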

 

One of the tenets for managing dataflows on the Network is to harmonize the schemas that are exchanging data on the Network.  Harmonization involves ensuring that the dataflows don’t duplicate or conflict with one another.  This is achieved by developing data types or sub-schema components for some pieces of data and posting those in a central location.  These are, in turn, referenced by other schemas on the network, ensuring reuse of schema design and consistent data flows.  In this fashion, the XML Registry would support, though not enforce, harmonization of schemas on the Network.

 

One of the organizational strategies for the XML Registry is use of namespaces, which will provide a context for tags and schemas and ensure that there will be no tag name collisions.  Namespaces will need to be hierarchical with one namespace for everything exchanged on the Network, and possibly, subsidiary namespaces for specific program areas or specific data exchanges.  The network is developing a Core Reference Model (CRM) that may provide some structure for the Network namespaces.

 

Additional metadata to be managed in the XML Registry includes contact and administrative information about submitters and users of particular XML objects.  This will facilitate collaboration between organizations seeking to create and use XML schema for a particular dataflow. 

 

4.3.1.2  XML tags.  XML tags represent elements that are the basic building blocks of an XML instance document.  XML tags are similar to data elements in a database.  Elements in a schema can be defined globally or locally.  Global elements are defined in the root element of a schema and may be defined through a complex datatype.  Local elements are nested inside a schema structure and cannot be externally referenced.  Attributes can be recorded about each XML tag to provide further information about the elements.  XML tags can be associated with enumerations or lists of enumerated values that establish the names or codes and associated definitions for a set of codes or values that form the content of the data element.  XML tags will be related to data element metadata, including related definitions.

 

4.3.1.3  XML datatypes.  XML datatypes represent the kinds of information that elements and attributes can hold, such as character strings or dates.  Simple data types can be predefined or user-defined.  Examples of predefined datatypes include string, decimal, date, integer, and the like.  Complex datatypes are user-defined datatypes that contain child elements or attributes, and they can be defined globally in the root element of a schema or locally anywhere in a schema associated with a single element.
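
The difference between global and local declarations can be seen by inspecting a small schema.  The sketch below uses Python's standard XML parser; the schema content and the element and type names (Facility, FacilityType, FacilityName, InspectionDate) are hypothetical examples, not Network definitions.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A small illustrative schema: one global element, one global complex
# datatype, and two local elements that use predefined simple datatypes.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Facility" type="FacilityType"/>
  <xs:complexType name="FacilityType">
    <xs:sequence>
      <xs:element name="FacilityName" type="xs:string"/>
      <xs:element name="InspectionDate" type="xs:date"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

root = ET.fromstring(SCHEMA)
element_tag = f"{{{XSD_NS}}}element"

# Global elements are direct children of the schema root.
global_elements = [e.get("name") for e in root.findall(element_tag)]

# Every other element declaration is nested inside a structure and is local.
all_elements = [e.get("name") for e in root.iter(element_tag)]
local_elements = [n for n in all_elements if n not in global_elements]
```

A registry could use the same distinction to decide which declarations are candidates for registration as individually referenceable objects.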

 

4.3.1.4  XML schemas (DETs).  XML offers a number of DETs.  XML Document Type Definitions (DTDs) and XML schema both are document types that define the required structure of an XML document and the constraints on its content.  While initial DET development was done using DTDs, EPA applications are moving towards XML schemas as a standard for designing data exchange protocols.  It is assumed that XML schemas will be developed according to the DET guidance.  The Network Core Reference Model may also provide some structure for Network schemas.  This model can form the basis for developing a parser to validate the schema. 

 

4.3.1.5  XML namespaces.  The World Wide Web Consortium (W3C) defines an XML namespace as “a collection of names, identified by a Uniform Resource Identifier (URI) reference, which are used in XML documents as element types and attribute names.”  In practice, namespaces are being used to identify groups of XML objects that share a common context within a specific business or program area.  The addition of the namespace context allows overlapping XML names to be tagged with distinguishing labels, to avoid potential name collisions in application.  Therefore, namespaces are organizational mechanisms that allow a business area data steward to oversee naming in a particular area to enforce naming conventions and ensure uniqueness.  A namespace is declared in the root element of a schema by binding a namespace identifier to a user-defined namespace prefix.  (For example, a namespace identifier might be www.epa.gov/xml, with epa as the namespace prefix.)  At this time, EPA has not decided on a namespace strategy.  However, it is clear to developers of XML schemas that namespaces will be needed to ensure uniqueness of naming to avoid tag name collisions on the network.  Namespace management will require controls on changing objects that are referenced externally and requirements for notification of changes.
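
The way namespaces prevent tag name collisions can be illustrated with a short Python sketch.  The namespace URIs and the element names below are assumptions made for the example (the http://www.epa.gov/xml form simply writes the identifier mentioned above as a full URI); they do not represent a decided EPA namespace strategy.

```python
import xml.etree.ElementTree as ET

# Two elements share the local name "Status" but belong to different namespaces.
DOC = """\
<epa:Report xmlns:epa="http://www.epa.gov/xml"
            xmlns:st="http://example.org/state/xml">
  <epa:Status>Approved</epa:Status>
  <st:Status>Pending</st:Status>
</epa:Report>
"""

root = ET.fromstring(DOC)

# Map each prefix to its namespace identifier for lookups.
ns = {"epa": "http://www.epa.gov/xml", "st": "http://example.org/state/xml"}

epa_status = root.find("epa:Status", ns).text
state_status = root.find("st:Status", ns).text
```

Because each tag is qualified by its namespace identifier, the two Status elements remain distinct even though their local names are identical.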

 

4.3.1.6  XML Trading Partner Agreements.  A Trading Partner Agreement (TPA) is a document that defines the conditions under which two partners will transact business together.  It is referred to as a Collaboration Protocol Agreement in the ebXML specifications.

 

4.3.1.7  XML document.  An XML document is a document that contains data surrounded by XML tags.  In XML, documents can be seen independently of files.  One document can comprise many files, or one file can contain many documents.  An XML document may be any of a number of data exchange formats, including XML schema, DTDs, and others.

 

4.3.1.8  WSDL document.  A WSDL document is just a simple XML document.  It contains a set of definitions to define a Web service.  A WSDL document defines a Web service using these elements:  <portType> (operations performed by the Web service), <message> (messages used by the Web service), <types> (datatypes used by the Web service), and <binding> (communication protocols used by the Web service).
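
A skeletal WSDL document and a way to read its main parts are sketched below in Python.  The service, message, and operation names and the example.org namespace are hypothetical, and the binding is left empty for brevity, so the fragment illustrates the document structure rather than a deployable service description.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

WSDL = """\
<definitions name="SchemaLookup"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="http://example.org/lookup"
             targetNamespace="http://example.org/lookup">
  <types/>
  <message name="GetSchemaRequest">
    <part name="schemaName" type="xs:string"/>
  </message>
  <message name="GetSchemaResponse">
    <part name="schemaText" type="xs:string"/>
  </message>
  <portType name="SchemaLookupPort">
    <operation name="GetSchema">
      <input message="tns:GetSchemaRequest"/>
      <output message="tns:GetSchemaResponse"/>
    </operation>
  </portType>
  <binding name="SchemaLookupBinding" type="tns:SchemaLookupPort"/>
</definitions>
"""

root = ET.fromstring(WSDL)
ns = {"w": WSDL_NS}

# Messages exchanged and operations offered by the service.
messages = [m.get("name") for m in root.findall("w:message", ns)]
operations = [o.get("name") for o in root.findall("w:portType/w:operation", ns)]

# Local names of the top-level sections, in document order.
top_level = [child.tag.split("}")[1] for child in root]
```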

 

4.3.1.9  Registration Process.  The XML Registry should support automated registration of XML schema and other objects.  Authorized users will submit their XML objects as part of a Registration Package through the Registration Application Programming Interface (API), through SSL, that will fully document the object to be registered.  The Registration Package will consist of metadata about the object in the form of a Registry Entry and attached files containing the object to be registered.  The API will load the Registry Entry into the XML Registry.  If the object to be registered is an XML schema, it will be parsed to validate it, and to capture additional information for the registration process.  Sub-schemas, tags, and associations in the schema will be analyzed for registration as individual objects.  If the object is valid, the XML Registry will store the object in a repository file and the related information in the registry with a draft status and make it available for review.  The API would check the XML tags against the registered tags (and associated data elements) in the EDR.  If the tags in the schema did not exist in the EDR, the API would register the new tags and associated data elements with a draft status in the EDR.
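
The registration steps described above can be sketched as a single function.  This is an illustration under stated assumptions, not the Registration API: the function name and the dictionary-based registry and EDR stores are hypothetical, the validation step checks only that the schema is well-formed XML, and authentication and SSL transport are omitted.

```python
import uuid
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def register_schema(schema_text, registry, edr_tags):
    """Register a schema as a draft; return its id, or None if it does not parse."""
    try:
        root = ET.fromstring(schema_text)
    except ET.ParseError:
        return None  # reject a package whose schema is not well-formed XML

    # Capture the element names declared anywhere in the schema.
    tags = [e.get("name") for e in root.iter(f"{{{XSD_NS}}}element")
            if e.get("name")]

    # Tags not yet known to the EDR are added there with a draft status.
    for tag in tags:
        edr_tags.setdefault(tag, "draft")

    # Store the object and its related information with a draft status.
    object_id = str(uuid.uuid4())
    registry[object_id] = {"status": "draft", "tags": tags,
                           "content": schema_text}
    return object_id


# Illustrative run with hypothetical tag names.
registry, edr = {}, {"FacilityName": "registered"}
schema = ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
          '<xs:element name="FacilityName" type="xs:string"/>'
          '<xs:element name="PermitNumber" type="xs:string"/>'
          '</xs:schema>')
object_id = register_schema(schema, registry, edr)
rejected = register_schema("<broken", {}, {})
```

In this run the existing FacilityName tag keeps its registered status, while the new PermitNumber tag is added to the EDR as a draft.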

 

The Registration Package will contain information that will support the creation of associations between objects in the Registration Package, and between the registered objects and the Registration Package itself.  Associations can be made to other objects within the XML Registry or to objects external to the registry.  Examples of associations include:

 

•  An XML document can be associated with its corresponding schema.

•  An XML schema may be related to an approval document.

•  An elementary object (like an XML tag) may be related to a registered schema.

•  A registered schema may reference a registered datatype or schema component.

•  A registered object may supersede an earlier version of an object.

 

Authorized personnel from EPA program offices and states and tribes will register as submitters, and in that role, will be able to register draft XML objects using the approved Registration Package format. 

 

All objects in the registry will have a UUID that will be generated by the XML Registry upon object submittal unless the user supplies a conforming UUID along with the submitted object.  The registry must create an audit trail to capture the events in the process of submitting an XML Object.

 

In order to accurately track and verify the submission through the registration process, the XML Registry will require all Registration Packages to contain the registry-provided authentication digital certificate and corresponding digital signature.

 

4.3.2    Development Forum

 

One of the intended uses of the XML Registry is to function as an information clearinghouse for all Network participants.  As a clearinghouse it would provide information on ongoing Network projects, post work in progress on draft XML objects, and provide a forum for exchange of ideas on XML objects. 

 

The Network is designed to serve a distributed enterprise of state and EPA programs.  Its users will be geographically distributed, have diverse programmatic interests, and will have needs for a forum for collaboration on XML objects throughout the lifecycle.  The registry features will promote reuse of existing XML objects and promote harmonization of new development with existing and emerging XML development.  This can be done by allowing registration of XML objects in draft form for discovery, review, and comment by other users.  The application should be able to track versions of XML objects as changes are made over time.  Draft XML objects will be submitted through a Registration Package with the same security requirements as final XML objects.

 

Initially, collaborators on development of an XML schema would post a notification in the development forum.  Initial versions of the schema would be registered as a file, with the minimum amount of metadata (including schema name and description), allowing them to be discovered in searches.  These initial versions would not undergo the validation process and would carry an early draft status and a version number lower than 1 (0.9, for example).  A discussion forum would permit the posting of comments by registry users.  Once the XML schema had undergone initial review and revision, it would be assigned a 1.0 version number, and would be submitted to the registry for validation.

 

The XML Registry should also provide the capability to register the building blocks of XML schema, including tags (data elements), datatypes, enumerated lists, and schema components that can be discovered for reuse in new XML object development.  These components will be registered for reuse in the metadata registry.  This step will promote using standards and harmonizing schema across the Network. 

                             

4.3.3    Classification

 

Part of the registration process will be to prepare and submit metadata about a registered object that will support administration of the object, as well as facilitate search and retrieval of the object.  A classification scheme will be used to organize the registry contents.  A classification scheme may also be a registered object. 

 

A classification scheme is an arrangement or division of objects into groups that are based on characteristics that the objects have in common, e.g., origin, composition, structure, application, or function. 

 

A registered object can be classified in a number of ways.  As part of the submission process, the SO must classify the object according to one or more previously registered classification schemes.  Classification schemes commonly used for retrieval include geographic location, industry, subject matter, and various taxonomies.  The Network is currently evaluating use of a Core Reference Model (CRM) that will identify the major entities or business areas of data exchange.  The CRM may form the basis of a classification scheme for the XML Registry. 

 

4.3.4    Administration

 

The XML Registry will have a Registration Authority who will oversee the XML objects in the Registry and who will have rights to change the administrative status of objects and to delete objects, if required.  The Registry administration functions from the ISO/IEC 11179 metadata registry standard will be adopted for this XML Registry.

 


The following sections address the administrative functions to be supported including XML object version control, promotion of objects through a series of administrative statuses, management of information about who is using an XML object, maintenance of an audit trail about changes to the object, and management of information about Web services accessibility. 

 

An additional piece of metadata, Object Stability, will help users identify which objects to reuse.  The Submitter can select a value that describes the current status of the object, indicating its level of stability (static or dynamic).

 

Section 5.0 will address the metadata required to be associated with a registered object as part of the registry entry.  A registry entry is relevant descriptive information about a registered object.  The principal metadata attributes for each registered object may include: Name, Name Context, Version, Object Identifier, Object Type, Classification, Registration Status, Administration Status, Status Date, Role (Submitting Organization, Responsible Organization, Registration Authority), and Stability. 
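
The principal attributes listed above could be carried in a record such as the following Python sketch.  The field names are a direct transliteration of the list, the field types are assumptions, and the sample values are hypothetical; Section 5.0 remains the authority for the actual data requirements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RegistryEntry:
    """Descriptive metadata about one registered object (illustrative only)."""
    name: str
    name_context: str
    version: str
    object_identifier: str
    object_type: str
    classification: list[str]
    registration_status: str
    administration_status: str
    status_date: date
    submitting_organization: str
    responsible_organization: str
    registration_authority: str
    stability: str  # "static" or "dynamic"

    def __post_init__(self):
        # Object Stability is limited to the two values named in the text.
        if self.stability not in ("static", "dynamic"):
            raise ValueError(f"unknown stability: {self.stability!r}")


# Hypothetical sample entry.
entry = RegistryEntry(
    name="FacilityName", name_context="Network", version="1.0",
    object_identifier="12345678-1234-5678-1234-567812345678",
    object_type="XML tag", classification=["Facility"],
    registration_status="Recorded", administration_status="Draft",
    status_date=date(2002, 12, 1),
    submitting_organization="Example Program Office",
    responsible_organization="Example Program Office",
    registration_authority="Example Registration Authority",
    stability="static",
)
```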

 

The XML Registry may also manage information about external data (or reference documents) that are information items that are related to a registered object but which reside outside the registry.  External data items may be submitted for an object that is being registered; the XML Registry will record an association with the external data, but the external data will not be managed as a registered object. 

 

4.3.5  Version Control

 

As data exchanges change over time, the XML objects supporting the exchanges will change.  Just as application systems are assigned new version numbers when changes are made, XML schemas and other objects will change and new version numbers will be needed.

 

Authorized Submitters will be able to submit objects for registration.  Once registered, registered objects will not be updated directly, but a new version of a registered object could be submitted, and would be linked to the older version(s) of the object.  Administrative metadata about an object can be modified by authorized users without versioning.  Submitters would not be allowed to delete objects, but would be able to recommend retirement of objects or mark objects to restrict publication.  The Registration Authority would be allowed to delete the objects recommended for deletion.  This controlled change environment will ensure that objects that are referenced by other objects cannot be changed or removed, which would cause a reference failure.  For this reason, usage needs to be tracked to predict the impact of versioning or retiring an XML object.
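
The controlled change environment described above might be sketched as follows.  This is a minimal Python illustration under stated assumptions: the class names, the integer version numbering, and the sample identifiers are hypothetical and are not prescribed by this document or by the standards.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ObjectVersion:
    """One immutable version of a registered object (hypothetical structure)."""
    object_id: str
    version: int
    content: str
    previous_version: Optional[int]  # link to the older version, if any


class VersionedStore:
    """Keeps every version; registered content is never updated in place."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, int], ObjectVersion] = {}
        self._latest: Dict[str, int] = {}

    def submit(self, object_id: str, content: str) -> ObjectVersion:
        # A resubmission creates a new version linked to the previous one.
        previous = self._latest.get(object_id)
        number = 1 if previous is None else previous + 1
        new_version = ObjectVersion(object_id, number, content, previous)
        self._versions[(object_id, number)] = new_version
        self._latest[object_id] = number
        return new_version

    def get(self, object_id: str, version: int) -> ObjectVersion:
        return self._versions[(object_id, version)]

    def history(self, object_id: str) -> List[int]:
        # Walk the links from the latest version back to the first.
        chain: List[int] = []
        current = self._latest.get(object_id)
        while current is not None:
            chain.append(current)
            current = self._versions[(object_id, current)].previous_version
        return chain
```

Because each version is frozen and older versions are retained, an object that references version 1 of a tag would continue to resolve after version 2 is registered.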

 


4.3.6    Object Status Management

 

In the OASIS/ebXML model, the required functions of the lifecycle management service include: submit, approve, update, deprecate, and remove XML objects, and add or remove slots from the registry entries.  Slots are extensions to the required metadata about the registered objects.  Each organization needs to determine its own process of review and approval, and assignment of the rights to designate an XML Object as Approved. 

 

ISO/IEC 11179 uses different terms to describe the lifecycle management of a registered object.  ISO/IEC 11179 also has a more detailed list of statuses in the lifecycle (including both administrative and registration statuses), and a lifecycle that supports review and approval for use.  Since the Network is intended to be based on the data standards, some XML objects including tags and eventually schema components and/or data types and even schemas may be part of a standard.  Therefore, it is preferred to have a lifecycle that supports review and approval of the registered objects.

 

The XML Registry will have a dual system of statuses.  Administrative status will designate the object’s position in the Registration Authority’s processing lifecycle.  Administrative statuses may include: received, draft, rejected, submitted for certification, processed, and being promoted.  Registration status will be used to designate the object’s position in the registration (and review and approval) lifecycle.  Registration statuses will be based upon the XML Technical Resources Group (TRG) approval statuses, and will include: working draft, last call working draft, candidate recommendation, and proposed recommendation.  Users would have to consider the status of an object before reusing it.  An object with a draft status is posted for comment and is subject to change during its development.  An object with a recommended status has been through a review and approval process and is posted to promote reuse.
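
The dual system of statuses could be represented as two independent enumerations, as in the following Python sketch.  The status values are those listed above; the type names are hypothetical, and the assumption that registration statuses advance in the order listed is made only for illustration.

```python
from enum import Enum
from typing import Optional


class AdministrativeStatus(Enum):
    RECEIVED = "received"
    DRAFT = "draft"
    REJECTED = "rejected"
    SUBMITTED_FOR_CERTIFICATION = "submitted for certification"
    PROCESSED = "processed"
    BEING_PROMOTED = "being promoted"


class RegistrationStatus(Enum):
    WORKING_DRAFT = "working draft"
    LAST_CALL_WORKING_DRAFT = "last call working draft"
    CANDIDATE_RECOMMENDATION = "candidate recommendation"
    PROPOSED_RECOMMENDATION = "proposed recommendation"


def next_registration_status(current: RegistrationStatus) -> Optional[RegistrationStatus]:
    """Return the next status in the listed order, or None at the end.

    The linear ordering is an assumption of this sketch, not a stated rule.
    """
    ordered = list(RegistrationStatus)  # Enum members iterate in definition order
    position = ordered.index(current)
    return ordered[position + 1] if position + 1 < len(ordered) else None
```

Keeping the two statuses in separate types reflects that an object's position in the Registration Authority's processing lifecycle is tracked independently of its position in the review and approval lifecycle.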

 

4.3.7    Validation

 

The XML Registry will need to provide validation of XML objects during registration.  Validation will ensure that the schema is well-formed, and that all the external references in it are functional.  As schemas reference tags, this will require registration of XML tags and associated data elements.  The Registration API can ensure that the XML components are registered in the correct order so that validation can be achieved.

 

Validation can include checking the registry to ensure that tag names are not duplicated within a namespace, ensuring that an XML schema references valid tags, and ensuring that external links in a schema are valid.  The API will need to use the URL to access the external registry in order to retrieve the requested XML object to ensure that the usage is correct in the schema. If the external source is not available, the parser will fail and an error message will be returned.

 

Schemas will need to be periodically revalidated as referenced components will be retired or versioned, and it will be critical to identify the need to change the referencing schema so that it will work as designed.
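
Two of the checks described in this section, well-formedness and duplicate tag names within a namespace, could be sketched as follows.  This Python example uses only the standard library; the function name and the representation of already-registered tags as (namespace, name) pairs are assumptions, and the check of external references is not attempted here.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def validate_submission(schema_text: str, registered: Set[Tuple[str, str]]) -> List[str]:
    """Return a list of error messages; an empty list means the checks passed."""
    try:
        root = ET.fromstring(schema_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors: List[str] = []
    namespace = root.get("targetNamespace", "")
    seen: Set[Tuple[str, str]] = set()
    # Only top-level element declarations are inspected in this sketch.
    for element in root.findall(f"{{{XSD_NS}}}element"):
        name = element.get("name")
        if name is None:
            continue
        key = (namespace, name)
        if key in seen or key in registered:
            errors.append(f"duplicate tag name '{name}' in namespace '{namespace}'")
        seen.add(key)
    return errors
```

A registry built along these lines would return the collected messages to the submitter rather than storing an object that failed a check.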

 

4.3.8    Modifying Content

 

The XML Registry will provide the capability for an authorized SO to modify XML administrative metadata about objects on behalf of the RO that submitted the object and therefore has authorization to change it.  Authorized updates will involve modifying the metadata about the object.  If the object itself is replaced or over-written, it will be versioned. 

 

4.3.9    Approving Objects

 

Before an object can be discovered and retrieved, it needs to be approved for publication in the XML Registry.  Rules will be determined by the Environmental Information Exchange Network Steering Board (NSB).

 

A likely scenario is that a Submitting Organization will be able to decide when an object is available for public review and comment.  Only schemas that have been reviewed by the XML Technical Resources Group and found to be compatible with data standards and harmonized with other Network dataflows will be marked as recommended for reuse.

 

4.3.10  Retiring Objects

 

Once a Submitting or Responsible Organization is no longer using an XML schema or other object, the object can be marked as retired.  It will not be deleted from the registry, as it may be referenced by other XML objects or may be useful in a historical context. 

 

4.3.11  Removing Objects

 

As the XML Registry is a historical record of XML objects usage, it will be rare that an XML object will be deleted from the XML Registry.  Rules for deletion will be established.  Only the Registration Authority will have the right to delete a record.  Submitting and Responsible organizations will be able to mark objects for restricted publication, or as recommended for deletion.

 

 

 

 

 

4.3.12  Quality Control and Error Handling

 

During validation, the XML Registry will return an error to the user if an attempt is made to submit an object that fails validation, violates a data constraint or duplicates an existing object (like a tag within a namespace).

 

4.3.13  Audit Trail Maintenance

 

An audit trail is a historical record of all actions taken on a registry entry by a registered user.  The audit trail maintains information on the creation, the impact of any change, the related submission, and the submitting organization.  This enables all registry entry actions (creation, update, or deletion) to be traced back to the submitting organization and recorded with the date and time of the action.  The audit trail is a requirement of the OASIS/ebXML standard.  The ISO/IEC 11179 standard does not include rigorous tracking of changes as it only requires the recording of date of receipt and date of last modification.  However, for a registry operating in a runtime environment, audit trails will be essential to ensure data integrity.
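
An append-only audit trail of the kind described above might look like the following Python sketch.  The class and field names are hypothetical; the three action names follow the actions named in this section (creation, update, deletion).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEvent:
    entry_id: str
    action: str          # "creation", "update", or "deletion"
    organization: str    # submitting organization responsible for the action
    timestamp: datetime  # date and time of the action


class AuditTrail:
    """Append-only record of actions taken on registry entries."""

    ALLOWED_ACTIONS = ("creation", "update", "deletion")

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, entry_id: str, action: str, organization: str) -> AuditEvent:
        if action not in self.ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        event = AuditEvent(entry_id, action, organization, datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, entry_id: str) -> List[AuditEvent]:
        # Events are returned in the order in which they were recorded.
        return [event for event in self._events if event.entry_id == entry_id]
```

Each event is immutable once recorded, so any action on a registry entry can be traced back to the organization that performed it.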

 

4.4       Query Management

 

One of the primary purposes of the XML Registry is to facilitate the discovery of XML objects for retrieval for potential reuse.  The registry services will include querying and retrieving XML objects from the registry and repository by both human interactions with a Web site that enables browsing and drill-down, and by automated interactions with the XML Registry via the API.  Unlike the registration and maintenance functions, the query and retrieval functions will not require users to be registered, and therefore SSL and authentication through digital certificates and signatures are not needed.

 

4.4.1    Discovery/Query

 

To improve the ability to discover XML objects, the objects need to be fully described with metadata that can be stored in a database and searched.  The metadata attributes are what classify the objects, promote understanding, and facilitate discovery through searches.

 

Namespaces will provide hierarchical classification schemes enabling association of an XML object with primary business areas, and serving as one of the most useful ways to find the object.  Queries will support searches by object identifier, version number, associations, classifications, descriptions, names, alternate names, affiliated responsible and submitting organizations and using organizations.  “Keyword” queries will be possible as XML objects will be linked to metadata that provides semantic content, including data elements and definitions.  The registry services interface will return data on the registry entry (metadata) and allow linkage to the object itself stored in the repository.

 

Web site queries via human interaction may be performed using a Web browser to browse and drill down or using a filtered query that allows multiple searches to narrow results.

 

XML Registry queries may also be conducted in an automated fashion using the API.  The API will permit other organizations to interact with the XML Registry through a Simple Object Access Protocol (SOAP) message, using standard query syntax, including SQL for complex queries.
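
A SOAP query message of the kind described above could be assembled as in the following Python sketch.  The SOAP 1.1 envelope namespace is the standard one; the query namespace, the RegistryQuery element name, and the SQL table and column names are hypothetical placeholders, since the actual message format would be defined by the API design.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
QUERY_NS = "urn:example:registry-query"                # hypothetical namespace

ET.register_namespace("soap", SOAP_NS)


def build_query_message(sql_text: str) -> bytes:
    """Wrap a query string in a SOAP envelope and return it as UTF-8 bytes."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical payload element carrying the SQL text of a complex query.
    query = ET.SubElement(body, f"{{{QUERY_NS}}}RegistryQuery")
    query.text = sql_text
    return ET.tostring(envelope, encoding="utf-8")


message = build_query_message(
    "SELECT id FROM RegistryEntry WHERE name = 'FacilityName'"
)
```

The resulting bytes would be sent to the registry by the calling organization's own transport code, which is outside the scope of this sketch.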

 

XML files will be stored in a publicly accessible area, and will be available for discovery by Internet search engines.  Successful discovery and retrieval would be limited without the advantage of organized metadata.

 

4.4.2    Retrieval

 

The registry services interface will support query of objects by searching the related metadata.  Once an object is found, the registry service will provide access to the object in the repository.  Users or applications will be able to retrieve not just the registry entry metadata but the objects themselves.  The XML Registry should provide support for the runtime validation of XML content against the registered XML schema as part of the Network operations.  The XML Registry/Repository will serve schemas for data validation.  Data validation will be done by the Network Nodes that will initiate a query of the XML Registry to access the relevant schemas for use in data validation.

 

 

5.0       DATA REQUIREMENTS

 

In order to develop an XML Registry that meets the software requirements specified in Section 4, the data requirements for the XML Registry need to incorporate components of three standards: the OASIS/ebXML Registry Information Model (RIM) v 2.0, the UDDI Published Specification version 3.0, and the metamodel for metadata registries in ISO/IEC 11179 Part 3.  This section will describe what XML objects will be stored in the XML Registry, and what metadata can be recorded about each object.  The section will briefly describe the information requirements of each standard.  How the data will be structured will be addressed in a follow-on design document.

 

5.1       XML Objects and Metadata

 

An XML Registry must be able to record a broad range of information related to XML transactions.  The information may be recorded in a database, linked as documents, or made accessible from another resource.  The types of XML objects that could be included in a Registry are: XML components (XML schema with component relationships, XML Datatypes, XML attributes, Object Classification schemes, XML tags), Documents (Trading Partner Agreement, Trading Partner Profile), Registry Packages, Service Bindings, Users and Organizations (Names, Mailing/Locational Information, Contact Information), and Web Services.

 

Registration of the above objects will require the following major metadata groupings.  Descriptions of the types of information that could be part of each grouping are provided.

 

 

Metadata Grouping Name | Definition | Data Content
Administration | Maintains information necessary for the management of Registry Objects. | Administration and Registration Status, Object Stability, Internal Identifier, Registry Package Description
Point of Contact | Information about an individual or organization that has a role related to a Registry Object. | Person and/or Organization Name, Mailing/Locational Address, Telephone Numbers, Email Address, Role
Descriptive | Bibliographic information about a Registry Object. | Name, Name Context, Object Type, Definition, Abstract, Purpose, Version, Format, Alternate Identifier, Effective Date, End Date
Classification | Arrangement of objects into groups based on characteristics which the objects have in common. | Group/Category, Associations to Other Objects
Security | Defines the access control for Registry Objects. | Object Access, User Roles, Permissions, User Authentication
Linked Objects | Associates content in the registry with content that may reside outside the registry. | Linked Documents, Web Site URLs
Audit Tracking | Record of information changes. | Create Date, Create User, Last Change Date, Last Change User, Data Change Description
Web Services | Information about the linking of the registry to other resources using Web services. | Bindings and Associations, Links, Usage, Compliance
XML Tags | Describes the relationship of XML tags in Registry Objects to data standards or data elements in applications. | XML Tag Names, Data Element Relationships

Exhibit 1.  Major Metadata Groupings

 

 

The following sections will outline how the OASIS/ebXML RIM version 2.0, UDDI Specification, and ISO/IEC 11179-3 handle these high-level data requirements.

 

5.2       Data Requirements of the OASIS/ebXML RIM version 2.0

 

The ebXML/RIM version 2.0 model addresses all of the data groupings listed above.  The registry information model includes:

 

Registry Object - An abstract base class used by most classes in the model.  Registry Objects are related to subclasses that include information related to the following data groupings from Exhibit 1: Administration, Point of Contact, Descriptive, Classification, Security, Linked Objects, Audit Tracking, Web Services, XML Tags, and XML Objects.

 

Classification Scheme - A structured way to classify or categorize Registry Objects.  The structure of a Classification Scheme may be defined internal or external to the Registry.  This is related to the Classification and XML Objects section of the data groupings.

 

Auditable Event - This is an action that changes a Registry Object instance.  The rules established for the registry define what actions require audit tracking and the level of tracking needed for each action.  This section is related to the Administrative, Audit Tracking, Point of Contact, and XML Objects data groupings.

 

User, Postal Address, Email Address, and Organization - This information provides a way to identify people or organizations with an interest in the Registry Object.  This information can be related to the Point of Contact data grouping.  The User information can also be related to the Audit Tracking and XML Objects data groups.

 

Service and Service Binding - This information represents technical information on a specific way to access a specific interface and includes the linkage information.  This can be related to the Web Services and XML Objects data groups.

 

External Link - URLs can be used to associate content in the Registry with content that may reside outside the registry, such as a DTD or Trading Partner Agreement that is on another Web site.  The use of External Links supports this service and can be found in the Linked Objects data grouping from the table above.

 

Security - The Security section of the model requires that each Registry Object be associated with security controls that govern access to operations or methods performed on that object.  This includes mechanisms that control Permissions, Privileges, Roles, Access Groups, User Identification, and Authentication.  The Security data grouping would include this type of information.

 

Appendix B describes the high-level data requirements of the OASIS/ebXML RIM.

 

5.3       Data Requirements of the UDDI Specification version 3.0

 

The UDDI specification describes the Web services and behaviors of all instances of a UDDI registry.  Central to UDDI’s purpose is the representation of data and metadata about Web services.  A UDDI registry offers a standard mechanism to classify, catalog and manage Web services so that they can be discovered and used.  The UDDI information model consists of the following entities:

 

Business Entity - The top-level XML element in a business’s UDDI entry captures the starting set of information required by partners seeking to locate information about a business’s services including its name, its industry or product category, its geographic location, and optional categorization and contact information.  It includes support for “yellow pages” taxonomies to search for businesses by industry, product, or geography.  This can be related to the Point of Contact, Descriptive, and Classification data groupings.

 

Business Service - A grouping of a series of related Web services that can be related to either a business process or a category of services.  An example of a business process could be a logistics/delivery process, which could include several Web Services including shipping, routing, warehousing, and last-mile delivery services.  By organizing Web Services into groups associated with categories or business processes, UDDI allows more efficient search and discovery of Web Services.  This can be related to the Descriptive, Classification, and Web Services data groups.

 

Binding Template - One or more technical Web Service Descriptions captured in an XML element called a binding template.  The binding template contains the information that is relevant for application programs that need to invoke or to bind to a specific Web Service.  This information includes the Web Service's URL address, and other information describing hosted services, routing and load balancing facilities.  This can be related to the Web Services data group.

 

Compliance Information - Each Binding Template element references one or more elements called tModels.  A tModel contains information that enables a client to determine whether the specific Web service being invoked complies with a particular behavior or programming interface.  This can be related to the Web Services data group.

 

Appendix C describes the high-level data requirements of the UDDI specification.

 

5.4       Data Requirements of the ISO/IEC 11179 Metamodel

 

The 11179 metamodel is a blueprint for the elements of an information architecture.  The 11179 metamodel is designed to manage individual elements of information, such as enumerated values and their definitions (called value domains) and data elements, as well as groups of these elements by classification schemes.  Classification schemes can be used to group data elements and enumerated values associated with a data standard, an information system, or an XML schema.  In addition, by hierarchical arrangement of classification schemes, the data elements and enumerated values can be organized to represent the structure of a data base or an XML schema.  Following are descriptions of the major components of the 11179 metamodel, and an indication of how they relate to the XML metadata groupings.

 

Administrative Data.  The 11179 metamodel contains a great deal of metadata about each component.  Many components of the metamodel are designated as Administered Components.  Administered components are components of the metamodel that require definitions and specification for reuse and/or sharing in or among enterprises.  Each administered component carries administration information, including an identifier (composed of a Registration Authority Identifier, Data Identifier, and Version), registration and administration status designations, origin, organization and contact information, as well as create date, change date, effective date, and end date.  Each administered component in the model can be registered and tracked independently.  The components of this Administration Region of the 11179 metamodel are related to the Administrative and Point of Contact data groupings.
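
The three-part identifier carried by each administered component could be modeled as in the following Python sketch.  The three parts are those named above; the class name, the colon-separated text form, and the sample values are assumptions made only for illustration.

```python
from typing import NamedTuple


class AdministeredComponentId(NamedTuple):
    registration_authority_id: str
    data_id: str
    version: str

    def as_text(self) -> str:
        # The colon-separated text form is an assumption of this sketch.
        return ":".join(self)

    @classmethod
    def from_text(cls, text: str) -> "AdministeredComponentId":
        parts = text.split(":")
        if len(parts) != 3:
            raise ValueError("expected three colon-separated parts")
        return cls(*parts)


# Sample values are invented for illustration.
identifier = AdministeredComponentId("EXAMPLE-RA", "12345", "1.0")
```

Because the version is part of the identifier, two versions of the same data element would be registered and tracked as distinct administered components.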

 

Data elements.  Data elements are the heart of the ISO 11179 metamodel.  While some components of the model, such as value domains and classification schemes, can be registered independently, much organizational metadata is focused on the primary elements of an information architecture, data elements.  Registration of a data element in a metadata registry requires that certain characteristics of the data element be recorded to clearly describe and define it.  These characteristics are stored as attributes of the data element, stored in separate, related tables.  Data elements are equivalent to XML tags, and so this Data Element Region is related to the XML tag data grouping.  The 11179 model allows data elements to be related to one another through a data element derivation (more than one data element can be used to derive another data element, or more than one data element can be combined to create a derived data element).  This could be related to the XML object data grouping.

 

Names and Definitions.  Data element (and other model object) names and definitions are stored in the Naming and Identification Region of the metamodel.  This organization allows a data element to have multiple names and definitions in context, and allows XML tag names to be represented as alternate data element names in context.  While it is another region of the model, it carries data element attributes that are related to XML tags. 

 

Value Domains.  Data Elements generally are associated with a Value Domain that provides representation information.  Value Domains can be enumerated or non-enumerated.  An enumerated domain is associated with a discrete set of permitted values, such as names or codes.  A non-enumerated domain must include a definition/description of the possible valid values for the data element representation.  These might be established by a range or a rule.  A value meaning is the meaning or semantic content of a value, or a data value.  A value meaning is paired with a permissible value to explain its meaning.  Value domains would be related to enumerations for XML tags.
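
The distinction between enumerated and non-enumerated value domains might be sketched as follows.  This Python example is illustrative only: the class names are hypothetical, and the state codes and numeric range are invented sample data rather than registered values.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EnumeratedValueDomain:
    # Maps each permissible value to its value meaning.
    permissible_values: Dict[str, str]

    def is_valid(self, value: str) -> bool:
        return value in self.permissible_values

    def meaning(self, value: str) -> str:
        return self.permissible_values[value]


@dataclass
class NonEnumeratedValueDomain:
    description: str             # definition of the possible valid values
    rule: Callable[[str], bool]  # range or rule that establishes validity

    def is_valid(self, value: str) -> bool:
        return self.rule(value)


def in_ph_range(value: str) -> bool:
    """Example rule: the value must parse as a number from 0 to 14."""
    try:
        return 0.0 <= float(value) <= 14.0
    except ValueError:
        return False


state_codes = EnumeratedValueDomain({"VA": "Virginia", "MD": "Maryland"})
ph_values = NonEnumeratedValueDomain("A number from 0 to 14", in_ph_range)
```

An enumerated domain of this kind corresponds to an enumeration that could be attached to an XML tag, while a non-enumerated domain corresponds to a range or rule.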

 

Classification Schemes.  Classification schemes enable data elements or other objects to be organized by groups or themes.  A classification scheme is defined as the descriptive information for the arrangement or division of objects into groups.  In the Classification Region of the model, a classification scheme can be defined and related to data elements and other objects to be included in that classification grouping.  This is related to the Classification and XML object data groupings.

 

Appendix D describes the high-level data requirements of the ISO/IEC 11179 metamodel.

 

5.5       Data Requirements Summary

 

To summarize the data requirements, the following table shows the major groupings of data to be recorded about Registry Objects.  The table also shows whether the major groupings are accommodated by the data requirements for ISO/IEC 11179, OASIS/ebXML, and UDDI.

 

 

Data grouping | OASIS/ebXML | UDDI | ISO/IEC 11179
Administration | Yes | Yes | Yes
Point of Contact | Yes | Yes | Yes
Descriptive | Yes | Yes | Yes
Classification | Yes | Yes | Yes
Security | Yes | No | No
Linked Objects | Yes | No | No
Audit Tracking | Yes | Yes | No
Web Services | Yes | Yes | No
XML Tags | No | No | Yes
XML Objects | Yes | Yes | Yes

Exhibit 2.  Data Requirements Summary

 

The table shows that the three standard metamodels have a lot in common.  In addition to including many of the same attributes, the registry metamodels are similar in other ways.  All the specifications, OASIS/ebXML, ISO/IEC 11179, and UDDI, assign a unique identifier to each object; OASIS/ebXML assigns a UUID, while ISO/IEC 11179 composes its identifier from a Registration Authority Identifier, Data Identifier, and Version (see Section 5.4).  Alternate identifiers may be needed for Registry Objects to enable association with external links, as well as associations among Registry Objects (any Registry Object instance may be associated with any other Registry Object instance).  All of the specifications include metadata for administrative and point of contact identification and management, and use a classification scheme to organize records for retrieval.  The levels of security and audit tracking required by the three specifications vary, and the Registry would need to be built to the most stringent requirement identified by review of the specifications (e.g., Audit Tracking in OASIS/ebXML).  The association of XML tags to data element information is supported only by the ISO/IEC 11179 specification, so associations between the XML registry and the 11179 registry structure need to be created.  By incorporating the information requirements of all three of these specifications, the registry can accommodate all the information needs expressed by the listed data groupings.  Summaries of the data requirements of the three applicable standards can be found in Appendixes B, C, and D.  Appendix E contains a glossary of the terms and definitions used in this document.
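
The coverage shown in Exhibit 2 can be checked mechanically.  The following Python sketch transcribes the Yes entries of the exhibit into a lookup table and reports which data groupings would be left unsupported by a chosen set of standards; the variable and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

ALL_THREE = {"OASIS/ebXML", "UDDI", "ISO/IEC 11179"}

# Standards marked "Yes" for each data grouping in Exhibit 2.
SUPPORT: Dict[str, Set[str]] = {
    "Administration": set(ALL_THREE),
    "Point of Contact": set(ALL_THREE),
    "Descriptive": set(ALL_THREE),
    "Classification": set(ALL_THREE),
    "Security": {"OASIS/ebXML"},
    "Linked Objects": {"OASIS/ebXML"},
    "Audit Tracking": {"OASIS/ebXML", "UDDI"},
    "Web Services": {"OASIS/ebXML", "UDDI"},
    "XML Tags": {"ISO/IEC 11179"},
    "XML Objects": set(ALL_THREE),
}


def uncovered(standards: Iterable[str]) -> List[str]:
    """Return the data groupings not supported by any of the given standards."""
    chosen = set(standards)
    return [group for group, supported_by in SUPPORT.items()
            if not (supported_by & chosen)]
```

For example, uncovered(["OASIS/ebXML"]) returns only the XML Tags grouping, which is consistent with the need noted above to associate the XML registry with the 11179 registry structure.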

 

 

6.0       INTEROPERABILITY REQUIREMENTS

 

6.1       Security and Privacy

 

The security requirements of the XML Registry are in part determined by the Information Management Working Group (IMWG) Network Blueprint.  The blueprint states that public key infrastructure (PKI) technology using digital signatures and digital certificates should be considered for verifying and authenticating the validity of partners exchanging information.  The secure sockets layer (SSL) and Secure HyperText Transfer Protocol (S-HTTP) are specified for transmitting data securely.  An information request may flow over the network under different security levels, including public access and end-to-end authentication through certificates with digital signatures. 

 

The security requirements of the ebXML model include:

 

•  The registry must be able to authorize appropriate access to its contents.  The identity of the ownership of registry content as well as the privileges assigned to a user for the registry content must be authenticated.  The registry must be able to assure confidentiality for some of the registry contents that may not be suitable for public viewing.  Roles will determine the level of authorization assigned to a user. 

•  The registry requires user-level security and document-level authorization.  Session-based security may be used to avoid authenticating every message or interaction. 

•  The registry may accept content from a client only if a certificate issued by the Registration Authority is provided and the content is digitally signed.  Messages between registry services and their clients must be confidential.  Messaging can use the distinguished name from the certificate to authenticate the user when the registry receives a request.  The distinguished name is the name that is associated with the digital certificate that is being used to authorize a request to the registry.  The payload of the message also must be signed, and the registry will store the signature as part of the content.
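
Role-based authorization keyed by the distinguished name could be sketched as follows.  This Python example assumes that certificate and signature verification has already been performed elsewhere; the role names, operation names, and sample distinguished name are hypothetical, chosen to mirror the rights described in Section 4 (unregistered users may query, submitters may recommend deletion, and only the Registration Authority may delete).

```python
from typing import Dict, Set

# Hypothetical mapping of roles to permitted operations.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "guest": {"query"},
    "submitter": {"query", "submit", "recommend_deletion"},
    "registration_authority": {"query", "submit", "change_status", "delete"},
}


class AccessController:
    """Authorizes operations by the role assigned to a distinguished name."""

    def __init__(self) -> None:
        self._roles: Dict[str, str] = {}

    def assign_role(self, distinguished_name: str, role: str) -> None:
        if role not in ROLE_PERMISSIONS:
            raise ValueError(f"unknown role: {role}")
        self._roles[distinguished_name] = role

    def is_authorized(self, distinguished_name: str, operation: str) -> bool:
        # Names with no assigned role are treated as unregistered guests.
        role = self._roles.get(distinguished_name, "guest")
        return operation in ROLE_PERMISSIONS[role]


controller = AccessController()
controller.assign_role("CN=Example Submitter,O=Example State Agency,C=US", "submitter")
```

In this arrangement the role, not the individual user, determines the level of authorization, so changing a user's rights requires only a change of role assignment.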

 

6.2       Linkages

 

It is intended that the XML Registry will serve as a functional node on the EPA/State Network.  The XML Registry will interoperate with each of the EPA and state nodes to provide a central source of information on the Network.  The XML Registry could support runtime validation of the content of XML instance documents against the registered XML schema as part of the Network operations.  As the XML Registry will need to serve schemas upon request by Network Nodes to support runtime validation, it needs to be available 24 hours a day, 7 days a week, in a robust environment that supports reliable operations.

 

The XML Working Group of the federal Chief Information Officers Council is analyzing the requirements for an XML Registry that would serve the federal government.  Initial results indicate that this may become a federation of separate registries, each serving an agency or department or a particular business area.  It is intended that the State/EPA Network Registry would be able to interact with that federation of federal registries.  At this time, the requirements for that registry or group of registries have not been defined, so there is no specific plan for how to implement that linkage. 

 

States may also develop statewide XML registries and the State/EPA Network Registry may also have to interoperate with those registries as some content may be common among those registries and the EPA/State XML Registry. 

 

 

 

 

7.0       CONCEPT OF OPERATIONS

 

The OASIS/ebXML Registry is designed to support business processes.  A business process is defined as “a collection of business transactions between business partners.”

 

Participants in the Network will maintain Trading Partner Profiles (TPP) that describe the business processes in which each organization can engage.  The TPP will specify the technological capabilities supported and the requirements that must be met to exchange business documents with them.  In order to exchange data over the network, a set of organizations will need to sign a Trading Partner Agreement (TPA) that defines the conditions under which the partners will transact business.  The TPA will address: identification of the participating organizations, purpose of the TPA, dataflows to be used (including the specific format and structure to be used for exchanging information and the URL for the location of the format), transport protocols and electronic addresses of the parties, procedures for dispute resolution, duration of the TPA, contingencies for exchange failure, internal system requirements, legal framework, rules for message exchanges, parallel paper transactions, performance and reliability, quality and stewardship, record retention, roles and responsibilities, security, termination conditions, and intended use of data.  TPAs and TPPs will become registered XML objects in the XML Registry. 

 

A pair of business partners, such as a state and EPA, planning to exchange data on a particular business process, such as verifying facility data in a central file of facility data, or updating information on wastewater permits, will use the XML Registry in the process of defining the protocol for data exchange. 

 

Once the need for data exchange is established, the parties may agree to collaborate on developing an XML schema.  In the Discovery and Retrieval Phase, they will query the XML Registry to discover available components and review them for potential reuse or reference/inclusion in the new XML schema.  They will retrieve them via download for analysis and reuse.  In the XML Registry, they can review requirements for XML schema design and post notice of intent to develop this new schema in the Developers Forum.  They would download available components for use, including standard XML tags or XML schema components or data types. 

 

Once the draft schema was prepared, they would prepare a minimum set of metadata about the schema so that they could post the draft schema as a file available for discovery and review in the Registry.  During the review period, the developers may be contacted by other registry users regarding similar efforts underway, or may receive comments from reviewers.  Some collaboration might be required to harmonize with related exchange templates.  After a specified review period, the Submitting Organization would review comments, make any needed changes, and submit the XML schema to the XML TRG for approval consideration.  At that point, the SO would be required to complete a registration package for full registration in the XML Registry.  The SO would use the Registration API to prepare the registry package for submission.  The package might consist of a schema, its associated metadata, and documents.  The API would analyze the package and notify the developers of any missing or problematic components.  The API would create a UUID for the XML object, validate the XML schema, and parse the XML schema for components that could be registered as data types, tags, enumerated value lists, or schema components.  Once an XML object was approved and completely registered, the SO could request that the RA change the status of the XML schema to recommended, and it could be posted for reuse.
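
The package analysis performed by the Registration API could be sketched as follows.  This Python example is a simplified illustration: the dictionary layout of the package, the list of required parts, and the function name are assumptions, and only top-level schema declarations are extracted.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

XSD_NS = "http://www.w3.org/2001/XMLSchema"
REQUIRED_PARTS = ("schema", "metadata")  # assumed minimum package contents


def process_package(package: Dict[str, Any]) -> Dict[str, Any]:
    """Check a registration package, assign a UUID, and list its components."""
    missing = [part for part in REQUIRED_PARTS if not package.get(part)]
    if missing:
        return {"accepted": False, "problems": [f"missing {part}" for part in missing]}
    try:
        root = ET.fromstring(package["schema"])
    except ET.ParseError as exc:
        return {"accepted": False, "problems": [f"schema not well-formed: {exc}"]}

    tags: List[str] = []
    data_types: List[str] = []
    for child in root:  # top-level declarations only
        name = child.get("name")
        if name is None:
            continue
        if child.tag == f"{{{XSD_NS}}}element":
            tags.append(name)
        elif child.tag in (f"{{{XSD_NS}}}simpleType", f"{{{XSD_NS}}}complexType"):
            data_types.append(name)
    return {
        "accepted": True,
        "object_id": f"urn:uuid:{uuid.uuid4()}",
        "components": {"tags": tags, "data_types": data_types},
        "problems": [],
    }
```

The components returned would be candidates for separate registration as tags or data types, and the list of problems would be sent back to the developers when the package is incomplete.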

 

Once populated, the XML Registry will be an integral part of the Network.  All of the content related to the Network data exchanges will be stored in the Registry.  During the runtime phase, ebXML messages will be exchanged among trading partners using the messaging service.  Automated business-to-business transactions would be facilitated using the API to support automated querying.  External users would be able to construct a query on the UDDI network, specifying a keyword to use in searching classification schemes.  The XML Registry Query API would receive the query message and return both the metadata in the registry entry and the linked object from the repository.  Users could also use a Web interface to browse and drill down in search of XML objects that matched their search criteria.  The registry would track usage of the XML objects by collecting information about registry clients who downloaded the objects.  That way, the Registry Administrator could notify them of any change to an object.
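The usage tracking described above can be sketched as a simple mapping from each XML object to the clients that downloaded it.  The class and method names below are hypothetical, and a real registry would persist this information rather than hold it in memory.

```python
from collections import defaultdict


class UsageTracker:
    """Records which registry clients downloaded which XML objects (illustrative only)."""

    def __init__(self):
        # Maps an object identifier to the set of client contact addresses.
        self._downloads = defaultdict(set)

    def record_download(self, object_id, client_contact):
        self._downloads[object_id].add(client_contact)

    def notification_list(self, object_id):
        """Return the clients to notify when the given object changes."""
        return sorted(self._downloads.get(object_id, ()))


tracker = UsageTracker()
tracker.record_download("facility-schema", "node-admin@example.org")
tracker.record_download("facility-schema", "developer@example.org")
print(tracker.notification_list("facility-schema"))
```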

 

If a change was made to any XML object, the SO could use the lifecycle management function to register a new version of the XML object.  The older version would remain in the registry to support references to it.  Objects would be versioned when any major part of the metadata was modified, or if the XML object itself was changed.  The registry administration function would notify users about changes in objects for which they were registered.  Through the registration of a new version of an object, the SO would declare the old object as “retired.”  However, if the SO actually wanted to delete an object from the registry, they would need to make a request to the RA.  The RA would conduct an impact analysis of the deletion of the object by analyzing usage of the object.  The SO would be able to mark objects to restrict their publication.  
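The versioning behavior described above, in which a new version is registered and the older version is retained but marked as retired, can be sketched as follows.  The class names and the status value "approved" are assumptions for illustration; only "retired" is taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredVersion:
    version: int
    content: str
    status: str = "approved"  # hypothetical status label for the current version


@dataclass
class RegistryObjectHistory:
    object_id: str
    versions: list = field(default_factory=list)

    def register_version(self, content):
        """Add a new version; the previous one stays in the registry as retired."""
        if self.versions:
            self.versions[-1].status = "retired"
        new_version = RegisteredVersion(version=len(self.versions) + 1, content=content)
        self.versions.append(new_version)
        return new_version

    def current(self):
        return self.versions[-1] if self.versions else None


history = RegistryObjectHistory("facility-schema")
history.register_version("<schema v1/>")
history.register_version("<schema v2/>")
print([(v.version, v.status) for v in history.versions])
```

Because retired versions are kept in the list rather than deleted, references to an older version remain resolvable, which is the reason given above for retaining them.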

 

The registry would support the data validation to be done on the Network by serving schemas upon request.  Data exchanges on the Network would require access to the schemas to support data validation.  Data validation would be carried out by the nodes.  The nodes would send a request to the Registry to access the relevant schema to support data validation.
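The exchange between a node and the Registry described above can be sketched as follows.  The function names and the `fetch_schema` callable are hypothetical stand-ins for the request a node would send to the Registry.  The check shown, which confirms that the document is well-formed and that its root element is declared in the schema, is only a placeholder for full schema validation, which would require a validating XML Schema processor.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"


def declared_elements(schema_text):
    """Return the names of the top-level elements declared in a schema."""
    root = ET.fromstring(schema_text)
    return {e.get("name") for e in root.findall(f"{XSD_NS}element") if e.get("name")}


def check_exchange(document_text, schema_id, fetch_schema):
    """Node-side check: request the schema from the Registry, then inspect the document."""
    schema_text = fetch_schema(schema_id)  # stands in for the request to the Registry
    try:
        document_root = ET.fromstring(document_text)
    except ET.ParseError:
        return False
    local_name = document_root.tag.rsplit("}", 1)[-1]
    return local_name in declared_elements(schema_text)


# A dictionary stands in for the Registry in this sketch.
REGISTRY = {
    "facility-v1": '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
                   '<xs:element name="Facility" type="xs:string"/></xs:schema>'
}
print(check_exchange("<Facility>Plant 7</Facility>", "facility-v1", REGISTRY.__getitem__))
```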

 

 

 

8.0       PRELIMINARY REGISTRY TOOL OPTIONS

                       

8.1       Background

 

EPA and its information trading partners have identified a need for an XML Registry to support proposed XML data interchange over the Network.  XML Registry development is a relatively new field.  It is important to survey existing registries and available tools as part of the process of identifying a solution for the Network.

 

The XML Working Group of the Federal CIO Council (XMLWG) has created a working group to consider development of federal government XML Registries.  Both the Department of Defense and the National Institute of Standards and Technology registries have been considered as prototypes for a government-wide registry.  At this time, an alternatives analysis of managing XML resources in federal agencies is under development.  The alternatives analysis compares the implications of not developing any government registry with two major architectural options:

 

•    Single Unified Registry/Repository: Building a single federal registry/repository that requires every federal agency wishing to publish schemas or artifacts to submit its objects to the central registry/repository.

 

•    Federated/Distributed Model: Each agency or entity may develop or acquire its own registry/repository, meeting government-wide specifications to ensure interoperability with the central government-wide registry/repository.

 

The report’s findings will help shape the future of XML registries in the federal sector.

In addition, new tools are emerging from the commercial sector that may change the options available to the Network in the near future.  This section simply presents information about available registries and registry development tools.  The tools have not been tested and analyzed; the information presented is based on what is available in the industry press and in marketing materials.  Upon determining the requirements of the XML Registry, an options analysis will be conducted to determine if any of the available registry software could meet those requirements.

 

8.2       Existing Online Registries

 

Some registries are already available online and can be used to post XML objects for discovery and reuse.

 

 

 

XML.org

 

The XML.org Registry offers a central clearinghouse for developers and standards bodies to publicly submit, publish, and exchange XML schemas, vocabularies and related documents.  Operated by OASIS—the non-profit XML interoperability consortium—the XML.org Registry is a self-supporting resource created by and for the community-at-large.

 

Industry groups and other organizations that have developed XML schemas or vocabularies are encouraged to register their work at the XML.org Registry.  The registry is available online at http://www.xml.org/xml/registry.jsp.  Schemas can be registered and searched by industry.  The environmental industry is included.  The registry is an independent entity that will serve as a model for future registries.  It offers no administrative controls on what is registered and cannot be tailored specifically to meet EPA Network needs.

 

NIST registry

 

The National Institute of Standards and Technology (NIST), in cooperation with EPA, supported the development of a prototype XML registry as a proof of concept.  It was based on software developed for the Defense Logistics Information Service (DLIS) and was compliant with the ebXML version 1 specification.  It is available at: http://xmlregistry.nist.gov/EPA-States/.

 

This software includes the ability to search the registry and repository for XML objects by URN, common name, version, keyword, organization, object type, file type, and dates.  The software also allows viewing of XML objects.  If you are an authorized user, you can submit and validate a schema, and conduct some administration of XML objects in the registry.

 

This implementation has several deficiencies.  Object file names are limited to 8 characters.  The registry does not support tracking the status of an XML object.  The registration of schema modules was not entirely successful.

 

Environmental Data Registry (EDR)

 

The EDR is not an XML Registry.  It is a metadata registry that is based upon the ISO/IEC 11179 metamodel.  Currently, the EDR stores XML tags as alternate name contexts for approved standard data elements.  The EDR has a lot in common with the XML Registries in that it registers information resources and other “group objects” that are associated with individual data elements that are registered.  The EDR also has a lot of the functionality that is defined in the OASIS/ebXML version 2 specification, including notification of change, web-based querying and retrieval, flexible object linking through associations, secure data update, version control, and status tracking.  The EDR does not currently support all of the functionality of the XML registries as specified, but could be modified to meet all of the stated requirements.

 

8.3       Available Registry Software

 

Several implementations of XML registries offer a source of code that could be obtained and customized for Network use.  XML registries have been developed for the Federal government, and the code can be obtained for reuse.  Another XML registry has been developed by the Open Source Code community that makes software freely available for reuse.

 

DISA/DoD Registry

 

The Defense Information Systems Agency (DISA) of the Department of Defense (DoD) has developed an XML registry to promote interoperability.  The registry provides a baseline set of XML Information Resources developed through coordination and approval among the DoD communities and includes the following functions: browse, search, and retrieve XML objects by keyword, information resource type, version, and namespace.

 

The availability of this code would need to be investigated.

 

Open Source ebXML Registry

 

An alpha version of an Open Source OASIS ebXML registry has been developed by a consortium called the ebxmlrr project.  The registry implements version 2.1 of the OASIS ebXML Registry specifications.  The release is available in both source and binary form under an Apache style open source license that permits royalty free use of the source and binaries.  The release may be downloaded from the following location:

http://sourceforge.net/project/showfiles.php?group_id=37074&release_id=97479

 

The code was developed by an international collaboration of developers who are jointly participating in an open source community project hosted at SourceForge.  The initial code base for the ebXMLRR project originated from a donation by Sun Microsystems, Inc. to the open source development community of an internal implementation developed at Sun.  Sun donated the code to the ebXMLRR project at SourceForge in November of 2001.  Since then, an international community of developers committed to open international standards has worked together to complete this implementation of the OASIS ebXML Registry/Repository standard.  The ebXMLRR project information is available at http://ebxmlrr.sourceforge.net.

 

 

Since both the registry and the client software are based on Java and XML, the implementations are portable across platforms and operating systems, and can also interoperate with implementations on other platforms and those written in other languages.  The software can interoperate with any database that supports SQL97, including Oracle. 

 

The registry is designed to store metadata about Web Service descriptions, XML data and documents, binary data (such as images, sound files, video data, executable application files, CAD files, etc.), and any other kind of data.  Using the registry, this data can be searched and classified using advanced and ad hoc query mechanisms such as XML filter query and SQL query.  The registry features a Lifecycle Manager for schema validation and other administrative functions and a Query Manager that provides read-only query functions.

 

In addition to the registry, the ebxmlrr UI client software provides a unique user interface (UI) for graphically visualizing registry content.  It is based on the Java API for XML Registries ('JAXR'). JAXR provides a single standard Java API that allows Java programmers to interact with emerging standards for XML Registries including both ebXML Registry and UDDI.  This UI client software also uses the Java API for XML Messaging ('JAXM') in order to send SOAP-based messages between the client software and the registry.  These SOAP messages are used to send requests to the OASIS ebXML Registry and to receive responses from the registry.

 

8.4       Commercially Available Tools

 

A couple of vendors have developed XML Registry Software that is available commercially.  More products are currently under development by software vendors.  Following is a brief description of the currently available software.

 

XML Global

 

XML Global has developed an array of GoXML tools, including the GoXML Registry that is designed to store and organize business documents, processes, and services.  The registry is based on the ebXML standard version 2.  The registry tool is integrated with the related software tools, such as GoXML Transform, which stores transformation rules, schemas, DTDs and EDI dictionaries.  GoXML Transform Central is an open standards-based platform for Enterprise Integration and automated supply chain management that includes ebXML Message Handling Services, Web Services, and Process and Transformation integration, with optional metadata management provided by Registry services.  XML Global's GoXML Registry is a metadata registry server with Web and programmatic interfaces that includes a registry engine, a repository, a Web-based registry client and a registry services API.  The registry tool includes lifecycle management of registry objects, version control, management of authorized users and privileges, search and retrieval of repository objects, content administration, and registry service interfaces.  The XML Global products run on Windows, Solaris, and Linux platforms.

 

XML Canon

 

XML Canon/Developer is a repository for XML schemas, DTDs, instance documents, stylesheets, and adjuncts.  It includes the ability to index and search XML objects.  It provides namespace management and flexible lifecycle management, as well as check-in, check-out, and version control.  It includes a data dictionary for the management of vocabulary components.  It can be integrated with related TIBCO products for development of XML.

 

XML Canon Developer runs on Windows, NT, UNIX (HP UX, Linux, and Solaris).  It can be Web-enabled through use of an add-on Portal package.

 

8.5       Related Software

 

Oracle XML DB

 

The Oracle Corporation has included Oracle XML DB as part of its Oracle 9i Relational Database Management System (RDBMS) software package.  Oracle XML DB is a storage and retrieval technology for XML objects.  It is based on the W3C XML data model, provides the capability to store XML objects in a relational database, and can serve as an XML repository.  It provides a variety of functions, including access control, folder organization, WebDAV and FTP access, SQL search, hierarchical indexing, and a navigational API to rename, delete, and copy files.  Because this product is very new, little information is available.  It appears that XMLSpy can be used to register an XML schema in the Oracle 9i database using a graphical schema editor.  However, Oracle XML DB does not appear to include registry functions, although those could be developed to manage the registration and retrieval of objects in the repository.  In addition, Oracle9iAS Release 2 includes a registry that is fully compliant with UDDI v1.0 and v2.0, and provides a comprehensive J2EE platform to develop, deploy, and manage Web services.

 

DISA Registry Initiative (DRIve)

 

The Data Interchange Standards Association (DISA) developed an XML registry based upon the ebXML V. 1 specifications to register business process models, XML schemas and DTDs, and related business objects (such as industry-specific code lists) in a systematic way.  The registry enables search and retrieval of models, schemas, profiles, and other objects, using common retrieval methods, including browse and drill-down, as well as filtered searches, both required by the ebXML specifications.  In addition, as required by ebXML, DRIve will use standard message formats based on SOAP, an XML messaging specification, for submission of objects and responses.  It will also provide security features to protect against unauthorized access, preserve the integrity of objects, and ensure nonrepudiation of entries.

 

DRIve participation is limited to members of DISA affiliated organizations – Accredited Standards Committee X12, Hotel Electronic Distribution Network Association, Interactive Financial Exchange, Mortgage Industry Standards Maintenance Organization, Open Philanthropy Exchange Forum, and Open Travel Alliance (and others that may join with DISA as the project unfolds).

 

It is not known whether this software prototype could be obtained for reuse.

 

 

9.0       ACCEPTANCE REQUIREMENTS

 

It is anticipated that the requirements listed and described in Appendix A will be part of the acceptance criteria for an XML Registry application.                             


 


 

                                                                                               

 

 

 

 

 

 

 

 

 

 

APPENDIX A

 

Summary of XML Registry Software Requirements


 



 

Requirement Number | Requirement Description
1 | Provide registry access to authorized users from EPA, States, Tribes, and industry partners.
2 | Serve as a single, centralized XML Registry that will manage the information about all of the dataflows on the Network and serve as the single source of information related to the Network.
3 | Comply with the OASIS/ebXML Registry Information Model (version 2.0).
4 | Comply with ISO/IEC 11179 (International Standard) Metadata Registries (MDR).
5 | Provide the capability to manage both atomic information objects, such as data elements and enumerations, and the groupings of those elements, and permit linking so that the granular components (elements) can be leveraged in a variety of structures (schemas).
6 | Provide support for human as well as automated system interactions to search and retrieve XML objects.  The human interface needs to be accessible in accordance with Federal Section 508 guidelines.
7 | Provide linkages to Web services.
8 | Provide potential for data exchange with other XML registries.
9 | Include an expandable hierarchical classification scheme for organization of XML objects for discovery and retrieval.
10 | Manage XML objects within namespaces and provide the capability to manage a dynamic, hierarchical namespace architecture.
11 | Record the status of XML objects and control the update of that status.
12 | Keep track of which schemas are using which schema components.
13 | Generate a Universally Unique Identifier (UUID) for each registered object.
14 | Support schema harmonization by providing a place to register new schema development efforts and by registering standard data elements and data types or sub-schema components for standard groupings of data to promote reuse.
15 | Store contact and administrative information about submitters and users of particular XML objects.
16 | Store XML objects including XML tags (elements), enumerations (name/value pairs), XML schemas, XML schema components, XML datatypes, XML namespaces, XML documents, trading partner agreements, and administrative documents (approval documentation, submission manifests).
17 | Support automated registration of submitted objects and related metadata.
18 | Serve as an information clearinghouse by providing information on ongoing Network projects, posting work in progress on draft XML objects, and providing a forum for exchange of ideas on XML objects.
19 | Provide security features that restrict access to authorized users in order to protect data integrity.
20 | Upon registration, validate the schema against schema guidelines to ensure that it is well-formed.
21 | Provide version control on all registered objects.
22 | Ensure unified understanding of registered objects through use of complete metadata, including definitions.
23 | Provide an audit trail for all actions taken on a registry entry.
24 | Create interfaces to support several registry user roles, including Registration Authority, Submitting Organization, and Responsible Organization, as well as registry guests/clients.
25 | Support query of Registry Entry metadata using browse and drill down techniques.
26 | Support additional filtered and ad hoc queries for certain user classes through the API.
27 | Support retrieval of objects from repository via download.
28 | Support human and automated registry service interfaces to submit, register, modify, search, and retrieve objects.
29 | Support tracking of an XML object through the lifecycle from registration in draft through retirement.  A dual set of status codes will keep track of administrative and registration statuses.
30 | Create potential for future linking to other federal, state, and other related XML registries.
31 | Support the runtime validation of XML content as part of the Network operations by giving the Network Nodes that validate the data content of Network exchanges access to the registered XML schemas.

 

 

 

 

 

 

 

 

 


 


 

 

 

 

 

 

 

 

 

 

 

 

APPENDIX B

 

Data Requirements for the OASIS/ebXML Registry Information Model v. 2.0


 



 

OASIS/ebXML Model Classes and Descriptions

Classes | Description

Model Section: Detail View
RegistryObject | An abstract base class used by most classes in the model; provides minimal metadata for registry objects.
RegistryEntry | Common base class for classes in the information model that require additional metadata beyond the minimal metadata required by Registry Object.
Slot | Provides a dynamic way to add arbitrary attributes to Registry Object instances.
ExtrinsicObject | Provides metadata that describes submitted content whose type is not intrinsically known to the Registry and therefore must be described by means of additional attributes.
RegistryPackage | Allows for grouping of logically related Registry Object instances even if the individual member objects belong to different submitting organizations.
ExternalIdentifier | Provides additional identifier information for the Registry Object.
ExternalLink | Used to associate content in the registry with content that may reside outside the registry.

Model Section: Registry Audit Trail
AuditableEvent | Describes the information model elements that support the audit trail capability of the Registry.  Provides a long-term record of events that effect a change in a Registry Object.
User | User instances keep track of the identity of the user that generated the Auditable Event.
Organization | Provides information on related organizations.
Postal Address | A simple reusable entity class that defines attributes of a postal address.
Telephone Number | A simple reusable entity class that defines attributes of a telephone number.
Email Address | A simple reusable entity class that defines attributes of an email address.
Person Name | A simple entity class for a person’s name.
Service | Provides information on services, such as Web services.
ServiceBinding | Registry Object instances that represent technical information on a specific way to access a specific interface offered by a Service instance.
SpecificationLink | Provides linkage between a Service Binding and one of its technical specifications that describes how to use the service using the Service Binding.

Model Section: Association of Registry Objects
Association | Used to define many-to-many associations among Registry Objects in the information model.

Model Section: Classification of Registry Objects
Classification Scheme | The metadata that describes a registered taxonomy.
Classification Node | Defines the tree structure where each node in the tree is a Classification Node.
Classification | Classifies a Registry Object instance by referencing a node defined within a particular classification scheme.

Model Section: Security View
AccessControlPolicy | Defines the policy rules that govern access to operations or methods performed on the Registry Object.
Permission | Used for authorization and access control to Registry Objects.
Privilege | Controls access to a protected Registry Object.
Privilege Attribute | A common base class for all types of security attributes that are used to grant specific access control privileges.
Role | Roles are used to grant Privileges to Principals.
Group | An aggregation of users that may have different Roles.
Identity | Used to identify a person, an organization, or software service.
Principal | An entity that has a set of Privilege Attributes.
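The relationship among the classification classes listed above (a tree of Classification Nodes, and a Classification that ties a Registry Object to one node) can be sketched in Python.  This is a simplified illustration, not the ebXML class definitions: the attribute names and the sample taxonomy codes are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassificationNode:
    """One node in a classification tree; the root node has no parent."""
    code: str
    parent: Optional["ClassificationNode"] = None

    def path(self):
        """Return the path from the root of the tree down to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.code)
            node = node.parent
        return "/" + "/".join(reversed(parts))


@dataclass
class Classification:
    """Classifies a registry object by referencing a node in a scheme."""
    classified_object_id: str
    node: ClassificationNode


# Hypothetical taxonomy for illustration.
media = ClassificationNode("Media")
water = ClassificationNode("Water", parent=media)
drinking_water = ClassificationNode("DrinkingWater", parent=water)

tag = Classification("facility-schema", drinking_water)
print(tag.node.path())
```

Browse and drill-down queries can then be answered by comparing node paths, since every object classified under a node shares that node's path as a prefix.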

 

 

 

                                                                                                           

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 APPENDIX C

 

Data Requirements for the UDDI Specification v. 3.0


 




Model Sections | Descriptions

Entity: Business Entity | Top-level XML element in a business’s UDDI entry; captures the starting set of information required by partners seeking to locate information about a business’s services.
Business Key | Uniquely identifies the Business Entity within the registry.
Discovery URLs | List of Uniform Resource Locators (URL) that point to alternate, file-based service discovery mechanisms.
Name | Simple textual name for a Business Entity.
Description | Simple textual descriptive information about the Business Entity.
Contacts | Records contact information for a person or a job role within the Business Entity so that someone who finds the information can make human contact for any purpose.
Business Services | Describe families of Web services provided by the Business Entity.
Identifier Bag | List of other identifiers, each valid in its own identifier system (e.g., tax identifier or DUNS number).
Category Bag | List of business categories that each describes a specific business aspect of the Business Entity (e.g., industry, product category or geographic region).
Signature | May be digitally signed using XML digital signatures.

Entity: Business Service | A grouping of a series of related Web services that can be related to either a business process or a category of services.
Service Key | Identifies the Business Service within the registry.
Business Key | Identifies the Business Entity that provides the Business Service.
Name | Simple textual name for the Business Service.
Description | Simple textual descriptive information about the Business Service.
Category Bag | List of business categories that each describes a specific business aspect of the Business Service (e.g., industry, product category or geographic region).
Signature | May be digitally signed using XML digital signatures.

Entity: Binding Template | Technical descriptions of Web services are provided by Binding Template entities.
Binding Key | Identifies a Binding Template.
Service Key | Identifies the Business Service that contains the Binding Template.
Description | Simple textual descriptive information about the Binding Template.
Access Point | An attribute-qualified URI, typically a URL, representing the network address of the Web service being described.
tModel Instance Details | List of one or more tModel Instance Info elements.
Category Bag | List of categorizations that each describes a specific aspect of the Binding Template (e.g., industry, product category or geographic region).
Signature | May be digitally signed using XML digital signatures.

Entity: tModel | Describes Web services in ways that are meaningful enough to be useful during searches, which is an important goal of UDDI.
Name | Simple textual name for the tModel.
Description | Simple textual descriptive information about the tModel.
Overview Doc | Used to house references to remote descriptive information or instructions related to the tModel.
Identifier Bag | List of logical identifiers, each valid in its own identifier system.
Category Bag | List of categories that describe specific aspects of the tModel (e.g., its technical type).
Signature | May be digitally signed using XML digital signatures.

Entity: Publisher Assertion Structure | A set of Business Entity structures whose members would like to make some of their relationships visible in their UDDI registrations.
From Key | The first of two Business Entity instances between which an assertion is made.
To Key | The second of two Business Entity instances between which an assertion is made.
Keyed Reference | Describes the relationship between the Business Entity elements identified by From Key and To Key.
Signature | May be digitally signed using XML digital signatures.

Entity: Operational Info Structure | Used to convey the operational information for the UDDI core data structures (the Business Entity, Business Service, Binding Template and tModel structures).
Created | Information about a publishing operation is captured whenever a UDDI core data structure is published.
Modified | The time at which the entity with which the Operational Info is associated was created or last changed.
Modified Including Children | Contains information about how modifications are related to each other.
Node ID | A unique key that is used to identify a node within a UDDI registry.
Authorized Name | Provides an indication of the owner of the data.
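The containment among the first three entities in the table (a Business Entity contains Business Services, each of which contains Binding Templates, linked by their keys) can be sketched by building a simplified XML structure.  The element and attribute names follow the UDDI data structures, but the sketch omits the UDDI namespace and most optional content, and the function name, key values, and URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_business_entity(business_key, name, service_key, service_name,
                          binding_key, access_point):
    """Build a simplified businessEntity tree (illustrative; no namespace, not schema-validated)."""
    entity = ET.Element("businessEntity", businessKey=business_key)
    ET.SubElement(entity, "name").text = name
    services = ET.SubElement(entity, "businessServices")
    # The service carries its own key plus the key of the entity that provides it.
    service = ET.SubElement(services, "businessService",
                            serviceKey=service_key, businessKey=business_key)
    ET.SubElement(service, "name").text = service_name
    bindings = ET.SubElement(service, "bindingTemplates")
    # The binding carries its own key plus the key of the service that contains it.
    binding = ET.SubElement(bindings, "bindingTemplate",
                            bindingKey=binding_key, serviceKey=service_key)
    ET.SubElement(binding, "accessPoint").text = access_point
    return entity


entity = build_business_entity(
    "uddi:example.org:agency",            # hypothetical keys and address
    "Example State Agency",
    "uddi:example.org:agency:facility-service",
    "Facility Data Exchange",
    "uddi:example.org:agency:facility-binding",
    "https://node.example.org/facility",
)
print(ET.tostring(entity, encoding="unicode"))
```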

 

 

 

 

 

 

 

 

 

 


 

 


 

 

 

 

 

 

 

 

 

 

 

 


APPENDIX D

 

Data Requirements for the ISO/IEC 11179 Part 3 Metamodel


 


 


 

The primary purpose of ISO/IEC 11179-3 is to specify the structure of a metadata registry, the basic attributes that are required to describe metadata items, and the types of metadata items that are administered in a registry.  The basic unit for which metadata is collected in the registry is called an administered item, and information about an administered item is recorded in an administration record.  Table 1 describes the type of information included in an administration record and includes detailed information about the data element region of the registry.  Types of administered items, along with their definitions, are included in Table 2.

 

Table 1: Administration Record and Data Element Region

Data Requirements | Descriptions

Region: Administration/Identification | Contains information related to the identification and registration of items submitted to the Registry.
Registration Authority Identifier | An identifier assigned to the organization responsible for maintaining the Registry.
Language Identification | The collection of identifiers required to identify a language or language variation for a particular purpose.
Contact | An instance of a role of an individual or an organization to whom an information item(s), a material object(s) and/or person(s) can be sent to or from in a specified context.
Item Identifier | An identifier for an item.
Administration Record | A collection of administrative information for an administered item.

Region: Naming and Identification | Manages the names and definitions of administered items.
Context | A universe of discourse in which a name or definition is used.
Terminological Entry | An entry containing information on terminological units for a specific administered item within a context.
Language Section | The part of a terminological entry containing information related to one language.
Designation | The designation of an administered item within a context.
Definition | The definition of an administered item within a context.

Region: Classification | The descriptive information for an arrangement or division of objects into groups.
Classification Scheme | The descriptive information for an arrangement or division of objects into groups based on characteristics that the objects have in common.
Classification Scheme Item | Item of content in a classification scheme.
Classification Scheme Item Relationship | The relationship among items in a classification scheme.

Region: Data Element | A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of attributes.
Data Element Concept | A concept that can be represented in the form of a data element, described independently of any particular representation.
Value Domain | A set of permissible values.
Representation Class | The classification of types of representations.
Data Element Example | A representative illustration of a data element.
Data Element Derivation | The relationship among a data element which is derived, the rule controlling its derivation, and the data elements from which it is derived.
Derivation Rule | The logical, mathematical, and/or other operations specifying derivation.

 

Table 2: Type of Administered Items in ISO/TEC 11179-3

Name

Description

Classification Scheme

The descriptive information for an arrangement or division of objects into groups based on characteristics that the objects have in common.

Conceptual Domain

A set of valid value meanings.

Context

A universe of discourse in which a name or definition is used.

Data Element

A unit of data for which the definition, identification, representation, and permissible values are specified by means of a set of attributes.

Data Element Concept

A concept that can be represented in the form of a data element, described independently of any particular representation.

Object Class

A set of ideas, abstractions, or things in the real world that are identified with explicit boundaries and meaning and whose properties and behavior follow the same rules.

Property

A characteristic common to all members of an object class.

Representation Class

The classification of types of representations.

Value Domain

A set of permissible values.
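
The administered item types in Table 2 relate to one another. As a minimal sketch in Python, assuming the usual ISO/IEC 11179 reading in which a data element concept pairs an object class with a property and a data element pairs that concept with a value domain, the relationships could be modeled as follows. The class names, field names, and sample state codes are hypothetical and are not part of the registry requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """A concept described independently of any particular representation."""
    object_class: str   # hypothetical example: "Facility"
    property_name: str  # hypothetical example: "State Code"


@dataclass(frozen=True)
class ValueDomain:
    """A set of permissible values."""
    name: str
    permissible_values: frozenset

    def permits(self, value):
        return value in self.permissible_values


@dataclass(frozen=True)
class DataElement:
    """A data element concept combined with a value domain (assumed model)."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def is_valid(self, value):
        # A value is valid only if the value domain permits it.
        return self.value_domain.permits(value)


# Hypothetical sample data, for illustration only.
STATE_CODES = ValueDomain("State Codes", frozenset({"DC", "MD", "VA"}))
FACILITY_STATE = DataElement(
    name="Facility State Code",
    concept=DataElementConcept("Facility", "State Code"),
    value_domain=STATE_CODES,
)
```

With this sketch, `FACILITY_STATE.is_valid("VA")` is true and `FACILITY_STATE.is_valid("ZZ")` is false, because only values in the value domain are permitted.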

 

 

 

 

 

 

 

 


 

 

 

 

 

 

 

 

 

APPENDIX E

 

XML Registry Requirements Glossary

 


 


Appendix E - XML Registry Requirements Glossary

 

API - Application Programming Interface.

 

Business Process - a collection of business transactions between business partners.

 

B2B - Business to Business.

 

Classification Scheme - A classification scheme is an arrangement or division of objects into groups that are based on characteristics that the objects have in common, e.g., origin, composition, structure, application, or function. 

 

CRM - Core Reference Model. 

 

Dataflow - a collection of elements that passes from one process to another.

 

DET - Data Exchange Templates.

 

Distinguished Name - the name that is associated with the digital certificate that is being used to authorize a request to the registry.

 

Distributed Architecture - several registries exist and interact with a “central” XML registry.

 

ebXML (electronic business eXtensible Markup Language) - defines an entire e-commerce infrastructure, of which the registry is an integral part.

 

ebxmlrr - OASIS ebXML Registry Reference Implementation Project.

 

EDSC - Environmental Data Standards Council.

 

Information Model - describes the types of objects that are stored in a registry, the type of metadata recorded about the objects, and how the information in a registry is organized.

 

ISO/IEC 11179 (International Standard) Metadata Registries (MDR) - the International Standard for standardizing and registering data elements and their components so that they can be shared and understood.

 

Module - the module entity represents a segment of source code that may be used by many programs.

 

NEIEN - National Environmental Information Exchange Network, a.k.a. the "Network."

 

NSB - Network Steering Board.

 

Node - An endpoint of a link or juncture common to two or more links in a network.

 

OASIS (Organization for the Advancement of Structured Information Standards) - a nonprofit international consortium that creates interoperable industry specifications based on public standards, such as XML and Standard Generalized Markup Language (SGML).

 

Object - A passive entity that contains or receives data, e.g., bytes, fields, files, directories, network nodes, pages, programs, segments, words.

 

Parser - A program that interprets user input and determines what to do with the input.

 

Peer-to-peer network - A network where there is no dedicated server. Every computer can share files and peripherals with all other computers on the network, given that all are granted access privileges.

 

PKI - Public Key Infrastructure. 

 

Registered Object - something that an organization wants to publish for discovery and retrieval.  Registered objects may include XML tags (elements), XML schemas, XML schema fragments, XML datatypes, namespaces, documents, trading partner agreements, and administrative documents.

 

Registry - The mechanism used to register, discover, and retrieve documents, templates, and software (i.e., objects and resources).

 

Registry Client - Any user who uses the registry to discover and retrieve an XML object.

 

Registration Authority (RA) - A recognized expert organization that is responsible for populating and maintaining the registry.

 

Registration status - designates the object’s position in the registration (and review and approval) lifecycle.  Registration statuses will be based upon the XML Technical Resources Group (TRG) approval statuses and include: working draft, last call working draft, candidate recommendation, and proposed recommendation.

 

Registry client -  Registry clients do not have rights to submit or update registry content, but only have query access to discover and access content.  They have no contract and do not require authentication to use the registry.

 

Registry entry - relevant descriptive information, or metadata, about a registered object.

Repository - a storage facility for registered objects with an access method that enables retrieving individual objects, perhaps with an additional authentication and permission layer.

 

Responsible Organization (RO) - the organization responsible for coordinating XML objects within a particular organizational unit (e.g., a program office or a state).

 

RIM - Registry Information Model.  The RIM describes the types of objects that are stored in a registry, the type of metadata recorded about the objects, and how the information in a registry is organized.

 

S-HTTP - Secure Hyper-Text Transfer Protocol.

 

SSL - Secure Sockets Layer.

 

Static Configuration - a configuration in which objects can be submitted, but registered objects cannot be updated or deleted.

 

Submitting Organization (SO) - An individual or organizational element designated to identify and report data elements suitable for registration.  The entity that originally registered an object.

 

SOAP - Simple Object Access Protocol.  Provides a lightweight, XML-based messaging format that is independent of operating system, programming language, and platform.

 

Tags - elementary objects of an XML schema. XML tags are data identifiers enclosed in angle brackets, like this: <...>

 

TPA (Trading Partner Agreement) - Conditions under which the partners will transact business.

 

TPP (Trading Partner Profile) - A description of the business processes in which each organization engages.

 

UDDI - Universal Description, Discovery, and Integration.

 

URI - Uniform Resource Identifier.

 

UML - Unified Modeling Language.

 

UN/CEFACT - United Nations Centre for Trade Facilitation and Electronic Business.

 

URN - Uniform Resource Name; a persistent, globally unique name assigned to an object.

 

UUID - Universally Unique Identifier.

 

Version  - The version number is a value that identifies the sequence of changes in specifications for a data item for audit trail purposes.

 

VPN - Virtual Private Network.

 

W3C - World Wide Web Consortium.

 

WSDL (Web Services Description Language) document - a simple XML document containing a set of definitions for a Web service.

 

XML (eXtensible Markup Language) - a system for defining specialized markup languages that are used to transmit formatted data.

 

XML attributes - attributes are normally used to describe XML objects or to provide additional information about elements.

 

XML DTDs - Document Type Definitions.

 

XML Namespaces - A collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names. In order for XML documents to be able to use elements and attributes that have the same name but come from different sources, there must be a way to differentiate between the markup elements that come from the different sources.

 

XML schema - a document that defines the required structure of an XML document and constraints on its content.
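
To illustrate the XML Namespaces entry above, the following minimal Python sketch parses a document in which two elements share the local name "name" but come from different sources. The namespace URIs, prefixes, and element content are hypothetical and chosen only for illustration; they are not defined by the registry requirements.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only for illustration.
FACILITY_NS = "urn:example:facility"
PERMIT_NS = "urn:example:permit"

DOCUMENT = f"""
<report xmlns:fac="{FACILITY_NS}" xmlns:per="{PERMIT_NS}">
  <fac:name>Riverside Treatment Plant</fac:name>
  <per:name>Discharge Permit</per:name>
</report>
"""


def names_by_namespace(xml_text):
    """Map each namespace URI to the text of its <name> child element."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        # ElementTree expands a prefixed tag to "{namespace-uri}local-name".
        if not child.tag.startswith("{"):
            continue  # skip elements that are not in any namespace
        uri, _, local = child.tag[1:].partition("}")
        if local == "name":
            result[uri] = child.text
    return result


NAMES = names_by_namespace(DOCUMENT)
```

Here `NAMES` maps each namespace URI to its own "name" value, so the two elements remain distinguishable even though their local names are identical.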