SOFTWARE AND DATA REQUIREMENTS FOR THE XML REGISTRY FOR THE EPA-STATE NATIONAL ENVIRONMENTAL INFORMATION EXCHANGE NETWORK
CONTRACT NO. 68-W-99-002
TASK ORDER No. 021
Prepared for:
United States Environmental Protection Agency
Office of Environmental Information
1200 Pennsylvania Avenue, NW.
Washington, DC 20460
Task Order Project Officer:
Michael Pendleton
Prepared by:
Systems Development Center
Science Applications International Corporation
6565 Arlington Boulevard
Falls Church, VA 22042
CONTENTS
EXECUTIVE SUMMARY
1.0 INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 System Overview
1.4 System Architecture
2.0 REFERENCES
3.0 APPLICABLE STANDARDS
3.1 OASIS/ebXML
3.2 International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 11179-3:2000 Information technology – Metadata Registry (MDR) - Part 3, Registry metamodel and basic attributes
3.3 Universal Description, Discovery and Integration (UDDI)
3.4 Assumptions about Applicable Standards
4.0 SOFTWARE REQUIREMENTS
4.1 Roles and Role Management
4.1.1 Registration Authority
4.1.2 Registry Administrator
4.1.3 Responsible Organization
4.1.4 Submitting Organization
4.1.5 Registry Clients
4.2 Accessibility
4.3 Lifecycle Management
4.3.1 Registration
4.3.1.1 Registered Objects
4.3.1.2 XML tags
4.3.1.3 XML datatypes
4.3.1.4 XML schemas (DETs)
4.3.1.5 XML namespaces
4.3.1.6 XML Trading Partner Agreements
4.3.1.7 XML document
4.3.1.8 WSDL document
4.3.1.9 Registration Process
4.3.2 Development Forum
4.3.3 Classification
4.3.4 Administration
4.3.5 Version Control
4.3.6 Object Status Management
4.3.7 Validation
4.3.8 Modifying Content
4.3.9 Approving Objects
4.3.10 Retiring Objects
4.3.11 Removing Objects
4.3.12 Quality Control and Error Handling
4.3.13 Audit Trail Maintenance
4.4 Query Management
4.4.1 Discovery/Query
4.4.2 Retrieval
5.0 DATA REQUIREMENTS
5.1 XML Objects and Metadata
5.2 Data Requirements of the OASIS/ebXML RIM version 2.0
5.3 Data Requirements of the UDDI Specification version 3.0
5.4 Data Requirements of the ISO/IEC 11179 Metamodel
5.5 Data Requirements Summary
6.0 INTEROPERABILITY REQUIREMENTS
6.1 Security and Privacy
6.2 Linkages
7.0 CONCEPT OF OPERATIONS
8.0 PRELIMINARY REGISTRY TOOL OPTIONS
8.1 Background
8.2 Existing Online Registries
8.3 Available Registry Software
8.4 Commercially Available Tools
8.5 Related Software
9.0 ACCEPTANCE REQUIREMENTS
EXHIBITS
Exhibit 1. Major Metadata Groupings
Exhibit 2. Data Requirements Summary
APPENDIXES
Appendix A Summary of XML Registry Software Requirements
Appendix B Data Requirements for the OASIS/ebXML Registry Information Model v. 2.0
Appendix C Data Requirements for the UDDI Specification v. 3.0
Appendix D Data Requirements for the ISO/IEC 11179 Part 3 Metamodel
Appendix E XML Registry Requirements Glossary
EXECUTIVE SUMMARY
“In the simplest sense, the benefits of XML will
be achieved only if organizations of a significant number are using the same
XML definitions. Therefore, these XML
definitions must be available for partners to discover and retrieve. A registry/repository is a mechanism used to
discover and retrieve documents, templates, and software (i.e., objects and
resources) over the Internet.” (http://xml.gov)
The Environmental Protection Agency (EPA) and
its state and tribal information trading partners have initiated collaborative
design and development of an Internet-based voluntary National Environmental
Information Exchange Network (Network) for state, federal, and Native American
Tribal environmental agencies. An eXtensible Markup Language
(XML) Registry is proposed as a component of the Network to serve as a
clearinghouse of Network related information, as well as to provide operational
support for implementation of the State and EPA nodes of the Network. In addition, the State-EPA Network XML
registry may become part of a larger federation of federal XML registries. The registry will support both human and
automated interactions for XML object registration, object status
tracking, and querying and retrieval for reuse.
The goal of the Network Steering Board is to
provide a vehicle for standardizing information exchanges to improve the
quality and consistency of the data, and to reduce the reporting burden on the
states and tribes. Therefore, the
Network dataflows should be based on data standards that are stored in the
Environmental Data Registry. To ensure
the greatest interoperability, the XML Registry should achieve the linkage
between data standard metadata and the XML schemas and related documents that
are based upon the approved data standards.
To support harmonization of dataflows on the Network, it is important
that approved XML schemas, standard XML tags, and other component parts
be available for discovery, reuse, and reference in new XML schemas.
To achieve all of these goals, the proposed XML
Registry will be developed based upon three standards: Organization for the
Advancement of Structured Information Standards/Electronic Business using
eXtensible Markup Language (OASIS/ebXML), International Organization for
Standardization/International Electrotechnical Commission (ISO/IEC) 11179, and
Universal Description, Discovery and Integration Initiative (UDDI). The OASIS/ebXML standard will be used as a
source of specifications for basic XML registry functionality and services. ISO/IEC 11179 will be used as a source of
specifications for the storage of XML tags that are related to corresponding,
well-documented data elements, along
with associated enumerated value lists, and their linkage to other XML objects
(documents, trading partner agreements, datatypes). The UDDI specification will guide the registration and discovery
of Web services that are part of the Network.
This XML Registry Requirements Document will
serve to inform the decision about whether to acquire or build an XML Registry
to support the Network. The document
outlines applicable standards, surveys available tools, and describes
functional and data requirements needed to support the Network. Once initial decisions have been made on the
requirements, an analysis of available implementation options will be
developed.
1.0 INTRODUCTION
“In the simplest sense, the benefits of XML will
be achieved only if organizations of a significant number are using the same
XML definitions. Therefore, these XML
definitions must be available for partners to discover and retrieve. A registry/repository is a mechanism used to
discover and retrieve documents, templates, and software (i.e., objects and
resources) over the Internet.” (http://xml.gov)
EPA and its state and tribal information trading
partners have initiated collaborative design and development of an
Internet-based voluntary National Environmental Information Exchange Network
(Network) for state, federal, and Native American Tribal environmental
agencies.
According to the State/EPA Information
Management Workgroup, “a Network based on standardized Internet language will
allow individual agencies to invest in internal data storage systems of their
choice at a pace they can afford, while also supporting easy exchange of
environmental data between agencies.”
The Network will facilitate information exchanges between “nodes”
maintained individually by participating partners that will use the Internet to exchange information via standardized
eXtensible Markup Language (XML) Data Exchange Templates (DETs) or schemas.
[The term schema will be used in this report to refer to an XML document
designed for data exchange.] Schemas
will be based upon the approved data standards to bring better consistency and
quality to the data that trading partners exchange. Exchange of data between nodes will be governed by Trading
Partner Agreements (TPAs) between the partners. TPAs document the agreed-upon data, exchange format, frequency of
exchange, security, and related issues.
One of the critical nodes on the Network will be
an XML Registry that will provide the capability to share information about XML
schemas approved for use on the Network, as well as information about schemas
under development. An XML Registry
contains registry entries that contain descriptive information, or metadata,
about registered XML objects. The
objects may be stored in the registry or in a related repository. The registry supports the submission and
registration of objects, administration of the objects, and makes the metadata
available for discovery, understanding, and reuse. This XML Registry will serve as a location for one-stop shopping
for selected information related to the Network, including both a “clearinghouse”
for information and “operational support” for Node implementation. It should not duplicate functions provided
on other Network Nodes.
As the information on the Network should be
based on data standards approved by the Environmental Data Standards Council
(EDSC), the XML Registry should be related to the Environmental Data Registry
(EDR) that contains metadata about standard data elements, associated
enumerated value domains, and data element groups. Data standards are “documented agreements on formats and
definitions of common data” that are established to bring better consistency
and quality to the information that organizations maintain. The EDR also registers application data
elements. Data trading partners may
also develop XML schemas for data they want to share. It should be possible to document the data elements (as specified
by XML tags in an XML schema) in the EDR, even though the data may not be
“standardized” through any formal process.
1.1 Purpose
This XML Registry Requirements document serves
to document the requirements for an XML Registry to support the EPA/State
Network. It is part of a series of
documents designed to inform the decision about how to provide an XML Registry to
support the Network. The document describes
applicable standards, surveys available tools, and describes functional and
data requirements needed to support the Network.
1.2 Scope
This document identifies functional and data
requirements of the XML Registry software, as well as necessary
interconnections to related applications.
This document does not include design specifications for the XML
Registry, as it may be used to inform a decision to purchase an available
registry solution rather than to build a new one. An options analysis will be addressed in a follow-on
document. The document also includes a
high-level concept of operations based upon current understanding of the
Network architecture. A more detailed
concept of operations may be developed after a registry solution is selected
and the architecture of the Network is more fully defined.
1.3 System Overview
An XML Registry is planned as part of the
Network to serve as a central location for XML objects and related
resources. The XML Registry will
provide a lifecycle management interface that will be a tool to manage XML
objects through their development and implementation lifecycle. This interface will be accessible to a
limited set of authorized users who will make use of the registration and
update functions to manage the metadata about the XML objects, including their
status, version, and organizational contacts.
It will provide a forum for exchange of information about XML objects under
development to promote harmonization and reuse of schemas. It will provide a means of tracking an XML
object through its progress from development to review to approval. Finally, it will provide a source of standardized
formats for transmitting data.
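
The progression of an XML object from development to review to approval can be pictured as a small state machine. The following Python sketch is illustrative only: the status names, the transition table, and the function name are assumptions drawn from the description above, not from any Network specification.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical lifecycle statuses for a registered XML object."""
    DEVELOPMENT = "development"
    REVIEW = "review"
    APPROVED = "approved"


# Assumed transitions: an object moves forward one stage at a time,
# and a reviewer may send it back to development.
ALLOWED = {
    Status.DEVELOPMENT: {Status.REVIEW},
    Status.REVIEW: {Status.APPROVED, Status.DEVELOPMENT},
    Status.APPROVED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Under these assumptions, an object in development cannot jump straight to approved; it must first pass through review.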
The XML Registry will include a query interface
that will allow users such as system developers to access available resources
(such as schemas and trading partner agreements)
through a central registry, in order to promote
reuse and discourage development of disparate exchange formats. The query and retrieval functions will
include both a Web site to support human interactions with the XML Registry and
an Application Programming Interface (API) that will enable automatic query and
retrieval of objects from the Registry.
As the XML objects in the Registry will be linked to the related data
elements and definitions, users will be able to query the Registry based on
semantic content, assuring more efficient searching and effective query
results.
The EPA-State Network XML Registry will include
both a registry and a repository function.
A registry is a facility that stores relevant descriptive information
(metadata) about registered objects, and makes that information available for
discovery, understanding, and reuse. A
repository is a storage and retrieval facility for registered objects that can
be retrieved. Note that a registered
object can be stored in the registry, in a repository connected to the XML
Registry, or in another separate place since an XML object may be accessed
through use of a Unique Identifier (UID) that references the object’s location.
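
The distinction between the registry (metadata) and the repository (the objects themselves) can be sketched as follows. This is a minimal illustration, not a design: the class names, field names, UID format, and example URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RegistryEntry:
    """Metadata about one registered object (all field names are hypothetical)."""
    uid: str          # unique identifier for the registered object
    object_type: str  # e.g. "schema", "tag", "trading partner agreement"
    name: str
    status: str
    version: str
    location: str     # where the object itself can be retrieved


class Registry:
    """Holds metadata only; the objects live in a repository or elsewhere."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegistryEntry] = {}

    def register(self, entry: RegistryEntry) -> None:
        """Add an entry, refusing to overwrite an existing UID."""
        if entry.uid in self._entries:
            raise KeyError(f"UID already registered: {entry.uid}")
        self._entries[entry.uid] = entry

    def locate(self, uid: str) -> Optional[str]:
        """Return the stored location for a UID, or None if it is unknown."""
        entry = self._entries.get(uid)
        return entry.location if entry else None


# Example: register a schema whose file is held outside the registry.
registry = Registry()
registry.register(RegistryEntry(
    uid="urn:example:schema:facility",
    object_type="schema",
    name="Facility",
    status="development",
    version="0.1",
    location="https://repository.example/schemas/facility.xsd",
))
```

Because the entry carries only a location, the same lookup works whether the object is held in the registry, in a connected repository, or somewhere else.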
A registered object is something that an
organization wants to publish for discovery and retrieval. Registered objects may include: XML tags
(elements), enumerated value lists, XML schemas, XML schema components, XML
datatypes, XML namespaces, XML documents, trading partner agreements, and
administrative documents (submittal and approval documentation).
Section 3.0 of this document provides an
introduction to the standards that are applicable to the XML Registry design
and operation. At this time, no single
standard describes a comprehensive XML Registry to manage the full array of
objects needed to support standards-based XML.
The data standards that support the dataflows on the Network should be
fully documented in the XML Registry, and the Registry should provide Web
services to support business-to-business transactions. The registry will need to include
documentation for the data elements referred to by the XML tags in schemas as
well as the XML schemas themselves. The
registry will need to include data elements and their definitions to help
manage the semantics (meaning) of data from the time of creation through all
stages of processing, analysis, and use.
To meet the requirements of the Network, the XML Registry will need to
be based on a combination of standards, including ISO/IEC 11179, OASIS/ebXML,
and UDDI.
1.4 System Architecture
The XML Registry described in this requirements
document would provide a single source of metadata for data elements, XML
schema, and Web services to support the development of harmonized,
standards-based data exchanges. An
architecture is needed that will support the entire Network enterprise. It is envisioned that state and EPA programs
will be developing schemas to define data exchanges on the Network and
searching for and using Network schemas to format instance documents used in
actual data exchanges. Following is a list
of issues to be considered in selecting an XML Registry Architecture.
• Availability and Reliability. As it is envisioned that the XML Registry
will support day-to-day Network operations by serving schemas for the
validation of data in instance documents, the registry needs to be deployed on
a robust platform. The registry needs
to be reliably available during business hours across the entire United States,
which will require selection of an
architecture that can provide the needed availability.
• Currency. As it is envisioned that the XML Registry will serve as the
source of standardized XML components and the system of record for current
schemas in use, it is important that the data be kept current.
• Information Sharing. There is a requirement that the XML Registry
serve as a forum for collaborative development of schemas, which means that the
architecture needs to support sharing information about standard XML
components, and provide a forum for discussion about schemas under development.
• Security. Security of information in the XML Registry is required to ensure
that the data is not altered by intentional or unintentional actions. Standard Internet security methods, such as
Secure Sockets Layer (SSL), will be required to protect both the data and the servers
hosting the data.
Architectural options to be considered
include:
• A single, centralized XML Registry.
• A distributed network of XML Registries.
• Multiple registries operating in a peer-to-peer network.
A single, centralized XML Registry could manage
the information about all of the dataflows on the Network.
The following describes the benefits of a
single, centralized registry. The
single registry option can provide the greatest benefits for easing information
sharing and maintaining current information.
A single registry allows the Network to reference one location for all
standard XML components, thus improving ease of query and retrieval. A single registry could provide a single
discussion forum about schemas under development, thus engaging all potentially
interested parties in harmonizing schemas.
The single registry provides the simplest solution for maintaining
current information on schemas in use since it avoids the problem of
duplicating or replicating data and maintaining data in different
locations. The registry is intended to
provide data update services. If data
is updated in a variety of registries, extra effort is needed to copy updates
to the various registries on the Network to maintain currency. The single registry can also provide greater
data security as it ensures that data and system integrity are overseen by a
single operation.
The drawbacks to a single registry include its
possible failure during Network operations.
The single registry does represent a single point of failure, a
situation that can be overcome by the architectural solution chosen for
implementation. A computer center can
provide a backup, mirrored environment to ensure continuous operation. A single, centralized registry may also be
overloaded by Network operations requests, which can be overcome by providing
adequate telecommunications and processing capacity to support demand.
A distributed network of XML registries could be
managed separately by the various participating organizations. One benefit of the distributed network is
that it enables each participating organization to manage its own registry for
its own XML components. For example, a
state environmental agency could have an XML registry on its node where it
could manage XML components for use on the Network, as well as other
state-specific XML components. If each
registry on the network maintained a copy of all of the Network XML components,
this distributed architecture would provide an automatic backup system in the
event that one registry fails to operate.
However, keeping multiple copies of XML Registry content current across
multiple registries is a major endeavor that requires the resources needed to
automatically propagate changes to all the registries to avoid a problem with
data currency. Also, the automatic
propagation creates a potential for errors caused by collisions with other XML
registry content. The distributed
registry will also make it more difficult to query and retrieve the standard
XML components for reuse. In addition,
one of the goals of the Registry is to serve as a collaboration tool for
coordinated development of harmonized schemas.
At this time, harmonization is easier to facilitate through a single
source of current information about XML components that are undergoing change
with resulting changes in versions.
With multiple registries, sharing information across systems and
tracking changes/versions becomes more difficult.
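
The currency and collision concerns of the distributed option can be made concrete with a small sketch. It assumes, purely for illustration, that each registry copy tracks an integer version per component; the data layout and the function name are hypothetical and do not describe any existing registry product.

```python
from typing import Dict, List, Tuple

# Each registry copy maps a component name to (version, content).
Copy = Dict[str, Tuple[int, str]]


def propagate(source: Copy, replicas: List[Copy], name: str) -> List[int]:
    """Copy one component from source to every replica.

    Returns the indexes of replicas where a collision was detected, meaning
    the replica already holds the same version number with different
    content. Colliding replicas are left unchanged for manual resolution.
    """
    version, content = source[name]
    collisions = []
    for index, replica in enumerate(replicas):
        held = replica.get(name)
        if held is not None and held[0] == version and held[1] != content:
            collisions.append(index)
        elif held is None or held[0] < version:
            replica[name] = (version, content)
    return collisions
```

Even this simplified version shows the extra work involved: every update must visit every replica, and each collision needs a resolution step that a single registry never encounters.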
The third option is a peer-to-peer architecture
in which multiple registries are networked together and XML objects can link to
data on other network servers. In this
model, data would be shared among the systems.
The intent of a peer-to-peer model is to allow registry participants to
link to XML components on a number of servers, building on XML products
provided by a number of Network participants, and maintained by those
participants on their individual registries operating in a shared
environment. This model could
distribute the responsibility and the cost of XML registry data maintenance
among all participants. Although
peer-to-peer architecture does not require replication of data across
participating servers, some data replication would be needed to avoid the
availability issue presented by the potential downtime of a single, central
registry. The need for some data
replication adds costs and raises the potential for errors, just as with the distributed
network of registries. In addition,
peer-to-peer architecture presents potential security problems to those
participating in the Network.
The Environmental Information Exchange XML
Registry requirements include a registry service that provides the means for
managing objects in a repository and a registry client that is used to access
them. To support lifecycle management
and human querying, the registry services will be implemented using a public
Web site with some functions restricted to registered users via authentication
using Secure Sockets Layer (SSL). A Web
services API will support the automated business-to-business transactions. For example, a search command across
multiple sites that are part of a UDDI network would enable an organization to
find and retrieve schemas in the XML Registry using keywords in a search (like
water or waste). In addition, one of
the requirements is the need for the registry to communicate with other
registries that may contribute to or download from the central registry.
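
The keyword search described above might behave like the following sketch. It searches in-memory lists that stand in for several registries; it does not use the UDDI API, and the entry fields ("name", "description") and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def search_registries(
    registries: Iterable[List[Dict[str, str]]], keyword: str
) -> List[Dict[str, str]]:
    """Return entries from all registries whose name or description mentions keyword."""
    needle = keyword.lower()
    hits = []
    for entries in registries:
        for entry in entries:
            text = (entry.get("name", "") + " " + entry.get("description", "")).lower()
            if needle in text:
                hits.append(entry)
    return hits
```

A call such as search_registries([state_entries, epa_entries], "water") would return matching entries from both lists in the order they were searched, where state_entries and epa_entries are hypothetical lists of entry dictionaries.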
2.0 REFERENCES
Blueprint for a National Environmental Information Exchange Network, (Information Management Working Group) Network Blueprint team, October 30, 2000; document amended June 2001.
Cooperation between XML Registries and Related Registries, A Collaborative Effort between the XML Working Group and Federal and State Government Agencies, XML Working Group Task 2.2.3.2 Registry Standards Harmonization, http://xml.gov/documents/completed/lbnl/20020417status.htm
DISA Registry Initiative, http://www.disa.org/drive/Registry_resources.html
DoD XML Registry, http://diides.ncr.disa.mil/xmlreg/user/index.cfm
ebXMLSoft, Inc., http://www.ebxmlsoft.com/index.html
Freeb XML Initiative, http://www.freebxml.org/registry.htm
ISO/IEC 11179 Information Technology–Metadata Registries (MDR), http://metadata-stds.org/
ISO/IEC FDIS 11179-3 Information technology – Metadata Registry (MDR) - Part 3, Registry metamodel and basic attributes, June 2002.
Logistics Management Institute, Requirements for an XML Registry, May 2001.
Metadata for Documents, http://www.nist.gov/sc4/liaisons/tc10sc8/Annex1.ppt
National Environmental Information Exchange Network Information Package, June 2001.
OASIS/ebXML Registry Information Model, v 2.0. By members of the OASIS/ebXML Registry Technical Committee. Approved April 2002, http://www.oasis-open.org/committees/regrep/documents/2.1/specs/ebrim_v2.1.pdf
OASIS/ebXML Registry Services Specification, v 2.0. By members of the OASIS/ebXML Registry Technical Committee. Approved April 2002, http://www.oasis-open.org/committees/regrep/documents/2.1/specs/ebrs.pdf
OASIS ebXML Registry Technical Committee, http://www.oasis-open.org/committees/regrep/
Oracle Corporation Web Services Overview, http://www.oracle.com/ip/develop/ids/jdevdocs/9iWebSv.pdf and http://otn.oracle.com/tech/webservices/htdocs/uddi/overview.html
Reference Model for an Open Archival Information System (OAIS), http://www.ccsds.org/documents/pdf/CCSDS-650.0-B-1.pdf
State/EPA Registry at NIST, http://xmlregistry.nist.gov/EPA-States/
Universal Description, Discovery and Integration (UDDI) version 3.0, Published Specification, 19 July 2002, http://uddi.org/pubs/uddi-v3.00-published-20020719.htm
XML.gov Registries, http://xml.gov/registries.htm
XML.org Registry, http://www.xml.org/xml/registry.jsp
XML Registry and Repository, from OASIS Cover Pages, http://xml.coverpages.org/xmlRegistry.html
3.0 APPLICABLE STANDARDS
Currently there are various specifications for
XML registries including, pre-eminently, the OASIS/Electronic Business using
eXtensible Markup Language (ebXML) Registry standard, which is being developed
by the Organization for the Advancement of Structured Information Standards
(OASIS), and the UDDI specification, which was developed by a vendor consortium
and is undergoing further development by an OASIS Technical Committee. Also relevant to the XML Registry is the
ISO/IEC 11179 Metadata Registry standard.
3.1 OASIS/ebXML
OASIS and UN/CEFACT initially developed separate
XML Registry/Repository specifications.
The efforts were merged into a single OASIS Technical Committee. The goal of the OASIS/ebXML registry is to
“provide a stable store where information submitted by a Submitting
Organization is made persistent.” The
stored information can be used to facilitate ebXML-based Business to Business
(B2B) partnerships and transactions.
Submitted content may include XML schema and documents, process
descriptions, ebXML Core Components, context descriptions, Unified Modeling
Language (UML) models, information about users and user roles, and software
components.
The OASIS/ebXML registry information model (RIM)
is intended to achieve interoperable registries and repositories with an
interface that enables submission, query, and retrieval on the contents of the
registry and repository. The registry
specification is designed to serve a wide range of business categories by
covering the spectrum from general purpose document registries to real-time B2B
registries. The registry specification
includes the OASIS/ebXML Registry Information Model version 2.1 (v. 3.0 is expected to be published early in
2003), which provides a blueprint for the ebXML Registry. The information model can be used to guide
registry implementers in registry design.
The RIM describes the types of objects that are stored in a registry,
the type of metadata recorded about the objects, and how the information in a
registry is organized.
The Registry
Information Model is accompanied by a Registry
Services Specification, which "defines the interface to the ebXML
Registry Services as well as interaction protocols, message definitions and XML
schema." Registry Services may be
implemented in several ways, including as a public Web site, as a private Web site,
or as a service hosted by a Virtual Private Network (VPN) provider. The ebXML Registry Service comprises a
set of interfaces designed to manage the objects and inquiries associated with
the OASIS/ebXML Registry. The two
primary interfaces for the Registry Service consist of a Life Cycle Management
interface that controls the processes necessary for managing an object within
the XML Registry and a Query Management Interface that controls the release of
information from the XML Registry. Both
of these interfaces are accessed through the use of a Registry Client
Interface. The Registry Services
Specification defines the interfaces exposed by the Registry Service as well as
the interface for the Registry Client.
The XML Registry makes use of a repository for storing and retrieving
persistent information required by the Registry Services.
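
The division of labor between the two interfaces can be illustrated with a minimal Python sketch. It is not taken from the Registry Services Specification: the class names, method names, and status strings below are hypothetical, and the registry and repository are reduced to in-memory dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRegistry:
    """Toy stand-in: metadata lives in `entries`, content in `repository`."""
    entries: dict = field(default_factory=dict)      # object id -> metadata
    repository: dict = field(default_factory=dict)   # object id -> stored content


class LifeCycleManager:
    """Hypothetical interface that manages objects within the registry."""

    def __init__(self, registry):
        self.registry = registry

    def submit(self, object_id, metadata, content):
        self.registry.entries[object_id] = dict(metadata, status="Submitted")
        self.registry.repository[object_id] = content

    def approve(self, object_id):
        self.registry.entries[object_id]["status"] = "Approved"

    def deprecate(self, object_id):
        self.registry.entries[object_id]["status"] = "Deprecated"


class QueryManager:
    """Hypothetical interface that releases information from the registry."""

    def __init__(self, registry):
        self.registry = registry

    def find_by_name(self, text):
        return sorted(oid for oid, meta in self.registry.entries.items()
                      if text.lower() in meta["name"].lower())

    def get_content(self, object_id):
        return self.registry.repository[object_id]


class RegistryClient:
    """Single access point through which both interfaces are reached."""

    def __init__(self, registry):
        self.lifecycle = LifeCycleManager(registry)
        self.query = QueryManager(registry)


client = RegistryClient(InMemoryRegistry())
client.lifecycle.submit("obj-1", {"name": "Facility Schema"}, "<xs:schema/>")
client.lifecycle.approve("obj-1")
```

In this sketch, a client that only needs to discover and retrieve objects would use `client.query` alone, while submission and status changes go through `client.lifecycle`.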
3.2 International Organization for
Standardization (ISO)/International Electrotechnical Commission (IEC)
11179-3:2000 Information technology – Metadata Registry (MDR) - Part 3,
Registry metamodel and basic attributes
ISO/IEC 11179 is not an XML specification. The ISO/IEC 11179-3:2002 revision, which is being
published in December 2002, provides a metamodel for shareable data through
specification of a data element metadata registry and guidance for full
documentation of data elements. The
purpose of the standard is to promote the standardization and registration of
data elements and their components that document data in order to make the data
understandable and available for sharing and reuse. The standard provides guidance on the formulation and maintenance
of discrete descriptions and semantic content (metadata) that can be used to
formulate data elements and their components in a consistent, standard
manner. It describes the data element
characteristics necessary to uniquely identify and fully document data elements
and their components to enable sharing, including identifiers, definitions, and
classification categories. The standard
includes the documentation of value domains that store the names and
definitions of the permissible values or enumerated values associated with data
elements.
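
As an illustration only, the data element characteristics listed above (identifier, definition, classification categories, and a value domain of permissible values) might be represented as follows. The class and field names are hypothetical simplifications and do not reproduce the ISO/IEC 11179 metamodel; the sample identifier and definition are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PermissibleValue:
    code: str       # the value as it appears in the data
    meaning: str    # the definition of that value


@dataclass
class ValueDomain:
    name: str
    permissible_values: list = field(default_factory=list)

    def allows(self, code):
        """True if `code` is one of the enumerated permissible values."""
        return any(pv.code == code for pv in self.permissible_values)


@dataclass
class DataElement:
    identifier: str
    name: str
    definition: str
    classifications: list
    value_domain: ValueDomain


# Hypothetical sample content.
state_codes = ValueDomain("State Code", [
    PermissibleValue("VA", "Virginia"),
    PermissibleValue("NM", "New Mexico"),
])
element = DataElement(
    identifier="DE-0001",
    name="Facility State Code",
    definition="The code for the state in which a facility is located.",
    classifications=["Location"],
    value_domain=state_codes,
)
```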
ISO/IEC 11179 describes a metadata registry that
can assist users of shared data in having a common understanding of an item’s
meaning, representation, and identification.
Metadata about data elements and their components is stored in a metadata
registry that can support data sharing with descriptions of data. Registration is the process of documenting
the metadata about data elements and their components. Registration should be carried out at the
data element and component level to promote and maximize semantic value. Complete data element metadata, as outlined
in the ISO/IEC 11179 model, enables the end user to interpret the intended
meaning confidently, correctly, and unambiguously.
There are commonalities between a data element
metadata registry and an XML registry.
Both seek to document reusable specifications for information objects,
with ISO/IEC 11179 focusing on the individual parts (XML tags and data
elements) and both types of registries registering groups (XML schema and data
standards). XML schema are hierarchical
groupings of elements of data. The
ISO/IEC 11179 metadata registry model includes the semantic content about data
elements in the form of definitions and supporting information that can provide
a valuable tool for searching by keyword or concept. Merging the capabilities of an ISO/IEC 11179 metadata registry and an XML
registry would meet many needs for B2B transactions.
There are plans to improve interoperability
between ISO/IEC 11179 metadata registries and XML registry/repositories. This collaboration will be the subject of a
conference planned for Santa Fe, New Mexico in January 2003.
3.3 Universal Description, Discovery and
Integration (UDDI)
UDDI (Universal Description, Discovery, and
Integration) is an XML-based specification which allows businesses to publish
and discover information about their business
functions and Web services offerings.
Web services are Internet-based applications that perform
specific tasks and comply with a standard specification; they are platform and
language neutral, and can be described, published, discovered, and invoked
dynamically in a distributed computing environment.
The UDDI project is an industry initiative to
create a platform-independent, open framework for describing services,
discovering businesses, and integrating business services using the Internet,
as well as an operational registry that is available today. UDDI is the first truly cross-industry effort
driven by all major platform and software providers, as well as marketplace
operators and e-business leaders. UDDI
describes a Web service as “a self-describing, self-contained, modular unit of
application logic that provides some business functionality to other
applications through an Internet connection.
Applications access Web services via ubiquitous Web protocols and data
formats, such as Hyper-Text Transport Protocol (HTTP) and XML, with no need to
worry about how each Web service is implemented. Web services can be mixed and matched with other Web services to
execute a larger workflow or business transaction.” This specification can enhance description, discovery, and
integration capabilities for the Network.
The UDDI registry can be conceptually viewed as
an "extended telephone directory," providing registration and
searching of:
• White pages (business address, contact, known identifiers).
• Yellow pages (categorizations based on standard taxonomies such as industry classification or recognized locational codes).
• Green pages (technical information about Web services such as interface specifications expressed in Web Services Description Language (WSDL)).
The UDDI registry specifications are undergoing further development
under the auspices of OASIS and are described on the UDDI Web site (http://www.uddi.org). Version 3.0 of the Published Specification,
dated July 19, 2002, can be found at this Web address: http://uddi.org/pubs/uddi-v3.00-published-20020719.htm.
The UDDI Business Registry supports Web services by providing a place for a
company to register its business and the services that it offers. People or businesses that need a service can
use this registry to find a business that provides the service. Major services, such as the EDR or the Network
XML registry, can be registered in a UDDI registry, with a pointer to each
service. Lower-level content in the EDR
and Network XML registry could also be registered in a UDDI registry, with
appropriate lower level pointers to the content. The registry specification describes the major information
components for this Web services registry, including metadata about business
entities, business services, service types, and specification pointers to
technical information about the service.
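
The "extended telephone directory" view can be sketched in Python as shown below. This is a simplified illustration, not the UDDI data model: the class names, field names, sample organization, and example.org addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """Green pages: technical pointers for one Web service."""
    name: str
    access_point: str
    wsdl_url: str


@dataclass
class BusinessEntry:
    name: str                                       # white pages
    contact: str                                    # white pages
    categories: set = field(default_factory=set)    # yellow pages
    services: list = field(default_factory=list)    # green pages


def find_by_category(directory, category):
    """Yellow-pages search: names of businesses in a category."""
    return [b.name for b in directory if category in b.categories]


def find_service_pointers(directory, service_name):
    """Green-pages search: (business, WSDL pointer) pairs for a service."""
    return [(b.name, s.wsdl_url)
            for b in directory for s in b.services if s.name == service_name]


# Hypothetical directory content.
directory = [
    BusinessEntry(
        name="State Environmental Agency",
        contact="registry@example.org",
        categories={"environmental regulation"},
        services=[ServiceEntry("SchemaLookup",
                               "https://example.org/lookup",
                               "https://example.org/lookup.wsdl")],
    ),
]
```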
The XML Registry should enable access to contact
information, Web addresses, and interfaces of Web services. It should deploy, manage, and secure Web
services for the Network.
3.4 Assumptions about Applicable Standards
One of the goals of the XML Registry is to
enable management and linking of atomic information objects like data elements
and the groupings of those elements in order to provide the capability of
managing the data elements individually and leveraging those granular
components in a variety of structures (schemas). The registry should also make them available for searching based
on semantic content. Another goal is to
provide an active Web services registry as one of the nodes of the
Network. In order to meet these goals,
a combination of standards is needed.
The XML Registry needs to comply with parts of
the most recent OASIS/ebXML standards, including the RIM and the Registry
Services specification, as those are the most widely accepted standards in the
emerging field of XML schema management.
As the requirements for this XML Registry include linking XML group
objects to elementary objects (XML tags, data elements, enumerated lists, and
related components and metadata), the ISO/IEC 11179 standard can be used to
define the data requirements for administering the element data. As it is intended that the XML Registry will
provide Web services, the XML Registry will also be based on the UDDI standard. Therefore, the data requirements for the XML
Registry will be based on a combination of the OASIS/ebXML RIM, the metamodel
for the ISO/IEC 11179 standard for metadata registries, and the UDDI
specification.
4.0 SOFTWARE REQUIREMENTS
This section will describe functional
requirements for the XML Registry.
Appendix A includes a summary of the requirements. Key requirements include: support for human
as well as automated system interactions to search and retrieve XML objects,
linkages to Web services, potential for interoperability with other XML
registries, inclusion of an expandable hierarchical classification scheme for
organization of XML objects for discovery and retrieval, linking XML objects to
related data element metadata to support discovery based on semantic content,
and registration of Web services that will be part of the Network.
This section describes the supported user roles
and the requirements for registry accessibility. The registry requirements are described in two sections:
lifecycle management and query management.
It is assumed that the registry services will be available through
interfaces designed for both automatic and human interactions with the registry
content.
4.1 Roles and Role Management
The OASIS/ebXML Registry Services Specification
describes the different types of XML Registry users. While the Responsible Organization (RO) is the “owner” of the
XML objects submitted on its behalf, the associated Submitting Organization (SO) is the Point of Contact for the
object. As the ISO/IEC 11179 standard
has an established set of registry roles and responsibilities, the OASIS/ebXML
standard modeled its roles and responsibilities on those in ISO/IEC 11179. It is assumed that the XML Registry will adopt
the same structure and extend the list as needed. For registry administration purposes, it is important to
distinguish between Registry Guests or Clients and Registered Users. Registry Clients do not have rights to
submit or update Registry content, but only have query access to discover and
access content. They have no contract
and do not require authentication to use the registry. Registered Users have a contract with the
Registration Authority (RA) and must be authenticated to use the registry. They can submit or update Registry
content. The following sections
describe the categories of Registry Users.
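
The distinction between Registry Clients and Registered Users can be expressed as a simple access check. The following Python sketch is illustrative only: the permission table is an assumption about how rights might be assigned to each role, not a requirement stated by either standard.

```python
from enum import Enum


class Role(Enum):
    REGISTRY_CLIENT = "Registry Client"
    SUBMITTING_ORGANIZATION = "Submitting Organization"
    RESPONSIBLE_ORGANIZATION = "Responsible Organization"
    REGISTRATION_AUTHORITY = "Registration Authority"


# Hypothetical permission table; the exact rights per role are an assumption.
PERMISSIONS = {
    Role.REGISTRY_CLIENT: {"query"},
    Role.SUBMITTING_ORGANIZATION: {"query", "submit", "update"},
    Role.RESPONSIBLE_ORGANIZATION: {"query", "submit", "update"},
    Role.REGISTRATION_AUTHORITY: {"query", "submit", "update", "delete"},
}


def is_allowed(role, action, authenticated=False):
    """Queries are open to everyone; any other action requires an
    authenticated Registered User whose role grants that action."""
    if action == "query":
        return True
    return authenticated and action in PERMISSIONS[role]
```

For example, `is_allowed(Role.REGISTRY_CLIENT, "query")` succeeds without authentication, while `is_allowed(Role.SUBMITTING_ORGANIZATION, "submit")` fails until the caller has been authenticated.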
4.1.1 Registration Authority. A Registration Authority (RA) is the host of
the registered XML Objects and is responsible for the content of the
Registry. The RA is the single
organization responsible for establishing and maintaining information about the
XML objects. The RA is responsible for:
• Establishing policies and procedures for using the registry.
• Enrolling and maintaining a list of Registry organizations and users.
• Providing authentication certificates for appropriate Registry organizations and users.
• Ensuring that registered objects are reused by Registry users.
• Receiving and processing submissions for registration of objects.
• Assigning appropriate registration and administrative statuses to the objects.
• Deleting objects, if necessary.
The ISO/IEC 11179 standard permits a network of
hierarchical registration authorities, with a single Registration Authority
responsible for the entire registry, complemented by subsidiary RAs associated
with a specific program or node. The
Registration Authority is a type of Registered User.
4.1.2 Registry Administrator. A Registry Administrator oversees the XML
Registry and is responsible for the availability of its services and the
integrity of its data. The Registry
Administrator evaluates and enforces registry security policy. The Registry Administrator may be the same
individual as the Registration Authority.
The Registry Administrator is a type of Registered User.
4.1.3 Responsible Organization. A Responsible
organization (RO) oversees the coordination of XML objects in a particular
organization (e.g., a program office or a state) and the contents of the
metadata associated with the XML object.
An RO can create registry objects.
An RO is a type of Registered User.
An RO is responsible for:
• Advising about names, meaning, and permissible values of tags or data elements submitted for registration.
• Coordinating the development of objects so that proposed elements are unique, proposed groups use standard elements, and proposed schemas are harmonized with related dataflows.
• Identifying the need to update registered objects.
• Ensuring the quality of the metadata for the XML objects associated with the RO.
4.1.4 Submitting Organization. A Submitting Organization (SO) is the
organization or unit within an organization that submits an XML object for
addition, change or cancellation/withdrawal.
An SO would be enrolled as a Registry user and issued an authentication
certificate to perform a number of lifecycle operations on its own XML
Objects. An SO can be the same as an RO
or the point of contact for an RO. An SO
is a type of Registered User.
An SO might be a functional area business manager
or an application system manager. An
SO is responsible for:
• Identifying and documenting XML objects appropriate for registration.
• Submitting proposals for registering XML objects to the appropriate RO.
4.1.5 Registry Clients. A registry client is any person who uses the
registry to discover and retrieve an XML object. Registry clients do not need to be registered users. It is important for the XML Registry to
record retrievals of XML objects so that usage of the objects can be
tracked. If an unregistered user
retrieves an XML schema and makes use of it or references it from an external
XML schema, that user will want to know if that schema has been superseded by a
subsequent version or has been retired.
One way to track usage is to require users to enter their name and email
address when they retrieve an object.
That way, they can be notified when the retrieved object changes its
status. Alternatively, users can be
given the option of signing up for notification of change.
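
A minimal Python sketch of the retrieval tracking and change notification described above follows. The class and method names are hypothetical, and actual delivery of e-mail is left out; the sketch only produces the list of notices that would be sent.

```python
class RetrievalLog:
    """Records retrievals so usage can be counted and users notified."""

    def __init__(self):
        self.retrievals = []   # (object_id, email or None), one per retrieval
        self.subscribers = {}  # object_id -> set of e-mail addresses

    def record(self, object_id, email=None):
        """Log a retrieval; the e-mail address is optional for guests."""
        self.retrievals.append((object_id, email))
        if email:
            self.subscribers.setdefault(object_id, set()).add(email)

    def usage_count(self, object_id):
        return sum(1 for oid, _ in self.retrievals if oid == object_id)

    def notices_for(self, object_id, new_status):
        """Build (address, message) pairs for a status change."""
        message = f"{object_id} is now {new_status}"
        return [(email, message)
                for email in sorted(self.subscribers.get(object_id, ()))]


log = RetrievalLog()
log.record("FacilitySchema-1.0", "analyst@example.org")
log.record("FacilitySchema-1.0")  # anonymous retrieval, counted but not notified
```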
4.2 Accessibility
The XML Registry needs to be accessible to all
participating Network partners. It also
needs to be secure, so that data cannot be compromised or lost. Data maintenance functions would be
accessed over a secure connection using SSL.
Data maintenance would be carried out on a backend server, and a
schedule would be set up to capture data and copy it to the public access
server where it could be searched, but not modified.
Procedures for lifecycle management of the XML
objects need to be developed so that rules can be published regarding the
accessibility of objects during different draft states.
As the XML Registry will need to be available to
serve schemas upon request by Network Nodes to support runtime validation, the
XML Registry needs to be available 24 hours a day, 7 days a week, in
a robust environment to support reliable operations. The XML Registry would also include a public access capability to
discover and retrieve published XML objects.
4.3 Lifecycle Management
This section will describe the functions of the
registry required to manage the registration of objects and subsequent
management of them, including version changes, configuration management,
content updates, and retirement.
4.3.1 Registration
One of the goals of the XML Registry is to serve
as a central repository that is the single source of information related to the
Network. The Network dataflows will be
built upon standards, and standards will take a number of forms, including
standard data elements, and collections of data elements in data types or
schema components. Therefore, the XML
Registry needs to register the full array of XML objects so it can adequately
support the Network.
4.3.1.1 Registered Objects. The
XML Registry needs to register the following objects: XML tags (elements),
enumeration lists, XML schema, XML schema components, attributes, datatypes,
Trading Partner Agreements (TPAs), and Web Services Description Language (WSDL)
documents. The XML Registry needs to
include functionality for registering and managing XML objects within namespaces. The XML Registry will have administrative
functions, and will record the status of XML objects and keep track of which
schemas are using different schema components.
As a result, it will also need to store XML documents, including
standard approval documents for XML schema.
All objects in the registry will have a Universally Unique Identifier
(UUID). The registry will generate the
UUID upon object submittal unless the user supplies a conforming UUID along
with the submitted object.
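
The UUID rule can be sketched with the Python standard library. The function name is hypothetical, and the choice to generate a new identifier when the supplied value does not conform (rather than rejecting the submission) is an assumption.

```python
import uuid


def assign_uuid(supplied=None):
    """Keep a conforming user-supplied UUID; otherwise generate one."""
    if supplied is not None:
        try:
            return str(uuid.UUID(supplied))
        except ValueError:
            pass  # not a conforming UUID: fall through and generate one
    return str(uuid.uuid4())


kept = assign_uuid("12345678-1234-5678-1234-567812345678")
generated = assign_uuid()
```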
One of the tenets for managing dataflows on the
Network is to harmonize the schemas that are exchanging data on the
Network. Harmonization involves
ensuring that the dataflows don’t duplicate or conflict with one another. This is achieved by developing data types or
sub-schema components for some pieces of data and posting those in a central
location. These are, in turn,
referenced by other schemas on the network, ensuring reuse of schema design and
consistent data flows. In this fashion,
the XML Registry would support, though not enforce, harmonization of schemas on
the Network.
One of the organizational strategies for the XML
Registry is use of namespaces, which will provide a context for tags and
schemas and ensure that there will be no tag name collisions. Namespaces will need to be hierarchical with
one namespace for everything exchanged on the Network, and possibly, subsidiary
namespaces for specific program areas or specific data exchanges. The network is developing a Core Reference
Model (CRM) that may provide some structure for the Network namespaces.
Additional metadata to be managed in the XML
Registry includes contact and administrative information about submitters and
users of particular XML objects. This
will facilitate collaboration between organizations seeking to create and use
XML schema for a particular dataflow.
4.3.1.2 XML tags. XML tags represent elements that are the
basic building blocks of an XML instance document. XML tags are similar to data elements in a database. Elements in a schema can be defined globally
or locally. Global elements are declared
as direct children of the root element of a schema and may be defined through a complex
datatype. Local elements are nested
inside a schema structure and cannot be externally referenced. Attributes can be recorded about each XML
tag to provide further information about the elements. XML tags can be associated with enumerations
or lists of enumerated values that establish the names or codes and associated
definitions for a set of codes or values that form the content of the data
element. XML tags will be related to
data element metadata, including related definitions.
4.3.1.3 XML datatypes.
XML datatypes represent
the kinds of information that elements and attributes can hold, such as
character strings or dates. Simple data
types can be predefined or user-defined.
Examples of predefined datatypes include string, decimal, date, integer,
and the like. Complex datatypes are
user-defined datatypes that contain child elements or attributes, and they can
be defined globally in the root element of a schema or locally anywhere in a
schema associated with a single element.
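
To make the distinction between global and local declarations concrete, the Python sketch below parses a small, hypothetical schema and separates the two kinds of element. The element and type names are invented for illustration; only the W3C XML Schema namespace is real.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema: one global element, one complex type with two
# local elements, and one simple type with an enumeration list.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Facility" type="FacilityType"/>
  <xs:complexType name="FacilityType">
    <xs:sequence>
      <xs:element name="FacilityName" type="xs:string"/>
      <xs:element name="StateCode" type="StateCodeType"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="StateCodeType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="VA"/>
      <xs:enumeration value="NM"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

root = ET.fromstring(SCHEMA)

# Global elements are direct children of the root xs:schema element.
global_elements = [e.get("name") for e in root.findall(f"{{{XS}}}element")]

# Every other xs:element is nested inside a type and is therefore local.
all_elements = [e.get("name") for e in root.iter(f"{{{XS}}}element")]
local_elements = [n for n in all_elements if n not in global_elements]

# Enumerated values that constrain the content of StateCode.
enumerated_values = [e.get("value") for e in root.iter(f"{{{XS}}}enumeration")]
```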
4.3.1.4 XML schemas (DETs). XML
offers a number of DETs. XML Document Type
Definitions (DTDs) and XML schema both are document types that define the
required structure of an XML document and the constraints on its content. While initial DET development was done using
DTDs, EPA applications are moving towards XML schemas as a standard for
designing data exchange protocols. It
is assumed that XML schemas will be developed according to the DET
guidance. The Network Core Reference
Model may also provide some structure for Network schemas. This model can form the basis for developing
a parser to validate the schema.
4.3.1.5 XML namespaces.
The World Wide Web
Consortium (W3C) defines an XML namespace as “a collection of names, identified
by a Uniform Resource Identifier (URI) reference, which are used in XML
documents as element types and attribute names.” In practice, namespaces are being used to identify groups of XML
objects that share a common context within a specific business or program area. The addition of the namespace context allows
overlapping XML to be tagged with distinguishing labels, to avoid potential
name collisions in application.
Therefore, namespaces are organizational mechanisms that allow a
business area data steward to oversee naming in a particular area to enforce naming
conventions and ensure uniqueness. A
namespace is declared in the root element of a schema by binding a namespace
identifier to a user-defined namespace prefix. (For example, the namespace identifier might be www.epa.gov/xml and the namespace prefix
might be epa.) At this time, EPA has
not decided on a namespace strategy.
However, it is clear to developers of XML schemas that namespaces will
be needed to ensure uniqueness of naming to avoid tag name collisions on the
network. Namespace management will
require controls on changing objects that are referenced externally and
requirements for notification of changes.
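
The way a namespace prevents tag name collisions can be shown with a short Python sketch. The www.epa.gov/xml identifier and the epa prefix come from the example above; the second namespace, the st prefix, and the Permit element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two elements share the local name "Permit" but belong to different namespaces.
DOC = """\
<report xmlns:epa="www.epa.gov/xml" xmlns:st="example.org/state">
  <epa:Permit>EPA-123</epa:Permit>
  <st:Permit>ST-456</st:Permit>
</report>
"""

root = ET.fromstring(DOC)

# The parser expands each prefix to its namespace identifier,
# so the two tags remain distinct.
expanded_tags = [child.tag for child in root]

# Lookups are made by namespace identifier, not by prefix spelling.
ns = {"epa": "www.epa.gov/xml"}
epa_permit = root.find("epa:Permit", ns).text
```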
4.3.1.6 XML Trading Partner Agreements. A
Trading Partner Agreement (TPA) is a document that defines the conditions under
which two partners will transact business together. It is also referred to as a Collaboration Protocol Agreement (CPA) in
the ebXML specifications.
4.3.1.7 XML document. An XML document is a
document that contains data surrounded by XML tags. In XML, documents can be seen independently of files. One document can comprise many files, or one
file can contain many documents. An XML
document may be any of a number of data exchange formats, including XML schema,
DTDs, and others.
4.3.1.8 WSDL document. A WSDL
document is an XML document that
contains a set of definitions describing a Web service. A WSDL document defines a Web service using these
elements: <portType> (operations
performed by the Web service), <message> (messages used by the Web
service), <types> (datatypes used by the Web service), and
<binding> (communication protocols used by the Web service).
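
The four elements can be seen in a skeletal WSDL document. The Python sketch below parses such a skeleton and lists its sections. The service, message, and operation names are hypothetical, and the skeleton is deliberately incomplete (it is not a deployable service description); the namespace shown is the WSDL 1.1 namespace.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Skeleton only: real WSDL documents carry much more detail in each section.
WSDL = """\
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="SchemaLookup">
  <types/>
  <message name="GetSchemaRequest"/>
  <portType name="SchemaLookupPort">
    <operation name="GetSchema"/>
  </portType>
  <binding name="SchemaLookupBinding" type="SchemaLookupPort"/>
</definitions>
"""

root = ET.fromstring(WSDL)

# Strip the namespace from each top-level tag to get the section names.
sections = [child.tag.split("}")[1] for child in root]

# Operations offered by the service are listed under <portType>.
operations = [op.get("name") for op in root.iter(f"{{{WSDL_NS}}}operation")]
```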
4.3.1.9 Registration Process. The XML
Registry should support automated registration of XML schema and other
objects. Authorized users will submit
their XML objects over SSL through the Registration
Application Programming Interface (API) as part of a Registration Package that will fully document
the object to be registered. The
Registration Package will consist of metadata about the object in the form of a
Registry Entry and attached files containing the object to be registered. The API will load the Registry Entry into
the XML Registry. If the object to be
registered is an XML schema, it will be parsed to validate it, and to capture
additional information for the registration process. Sub-schemas, tags, and associations in the schema will be
analyzed for registration as individual objects. If the object is valid, the XML Registry will store the object in
a repository file and the related information in the registry with a draft
status and make it available for review.
The API would check the XML tags against the registered tags (and
associated data elements) in the EDR.
If the tags in the schema did not exist in the EDR, the API would
register the new tags and associated data elements with a draft status in the
EDR.
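
The registration steps described above can be sketched in Python as follows. This is a minimal illustration under several assumptions: the function name is hypothetical, validation is reduced to a well-formedness check, the registry is a dictionary, and the `known_tags` dictionary stands in for the registered tags in the EDR.

```python
import xml.etree.ElementTree as ET

XS_ELEMENT = "{http://www.w3.org/2001/XMLSchema}element"


def register_schema(name, schema_text, registry, known_tags):
    """Parse a submitted schema, store it as a draft, and register new tags."""
    try:
        root = ET.fromstring(schema_text)        # well-formedness check only
    except ET.ParseError as err:
        return {"accepted": False, "reason": str(err)}

    # Collect the distinct tag names declared in the schema, in document order.
    names = (e.get("name") for e in root.iter(XS_ELEMENT))
    tags = list(dict.fromkeys(n for n in names if n))

    # Tags not yet known are registered with a draft status.
    new_tags = [t for t in tags if t not in known_tags]
    for tag in new_tags:
        known_tags[tag] = "draft"

    registry[name] = {"status": "draft", "content": schema_text, "tags": tags}
    return {"accepted": True, "new_tags": new_tags}


schema = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="FacilityName" type="xs:string"/>'
    '<xs:element name="StateCode" type="xs:string"/>'
    '</xs:schema>'
)
registry, known_tags = {}, {"StateCode": "approved"}
result = register_schema("FacilitySchema", schema, registry, known_tags)
```

In this run only FacilityName is new, so it is added with a draft status while the existing StateCode entry is left unchanged.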
The Registration Package will contain
information that will support the creation of associations between objects in
the Registration Package, and between the registered objects and the
Registration Package itself.
Associations can be made to other objects within the XML Registry or to
objects external to the registry.
Examples of associations include:
• An XML document can be associated with its corresponding schema.
• An XML schema may be related to an approval document.
• An elementary object (like an XML tag) may be related to a registered schema.
• A registered schema may reference a registered datatype or schema component.
• A registered object may supersede an earlier version of an object.
Authorized personnel from EPA program offices
and states and tribes will register as submitters, and in that role, will be
able to register draft XML objects using the approved Registration Package
format.
All objects in the registry will have a UUID
that will be generated by the XML Registry upon object submittal unless the
user supplies a conforming UUID along with the submitted object. The registry must create an audit trail to
capture the events in the process of submitting an XML Object.
In order to accurately track and verify the
submission through the registration process, the XML Registry will require all
Registration Packages to contain the registry-provided authentication digital
certificate and corresponding digital signature.
4.3.2 Development Forum
One of the intended uses of the XML Registry is
to function as an information clearinghouse for all Network participants. As a clearinghouse it would provide
information on ongoing Network projects, post work in progress on draft XML objects,
and provide a forum for exchange of ideas on XML objects.
The Network is designed to serve a distributed
enterprise of state and EPA programs.
Its users will be geographically distributed, have diverse programmatic
interests, and will have needs for a forum for collaboration on XML objects
throughout the lifecycle. The registry
features will promote reuse of existing XML objects and promote harmonization
of new development with existing and emerging XML development. This can be done by allowing registration of
XML objects in draft form for discovery, review, and comment by other
users. The application should be able
to track versions of XML objects as changes are made over time. Draft XML objects will be submitted through
a Registration Package with the same security requirements as final XML
objects.
Initially, collaborators on development of an
XML schema would post a notification in the development forum. Initial versions of the schema would be
registered as a file, with the minimum amount of metadata (including schema
name and description), allowing them to be discovered in searches. These initial versions would not undergo the
validation process and would carry an early draft status and a version number
lower than 1 (0.9, for example). A
discussion forum would permit the posting of comments by registry users. Once the XML schema had undergone initial
review and revision, it would be assigned a 1.0 version number, and would be
submitted to the registry for validation.
The XML Registry should also provide the
capability to register the building blocks of XML schema, including tags (data
elements), datatypes, enumerated lists, and schema components that can be
discovered for reuse in new XML object development. These components will be registered for reuse in the metadata
registry. This step will promote using
standards and harmonizing schema across the Network.
4.3.3 Classification
Part of the registration process will be to
prepare and submit metadata about a registered object that will support
administration of the object, as well as facilitate search and retrieval of the
object. A classification scheme will be
used to organize the registry contents.
A classification scheme may also be a registered object.
A classification scheme is an arrangement or
division of objects into groups that are based on characteristics that the
objects have in common, e.g., origin, composition, structure, application, or
function.
A registered object can be classified in a
number of ways. As part of the
submission process, the SO must classify the object according to one or more
previously registered classification schemes.
Classification schemes commonly used for retrieval include geographic
location, industry, subject matter, and various taxonomies. The Network is currently evaluating use of a
Core Reference Model (CRM) that will identify the major entities or business
areas of data exchange. The CRM may
form the basis of a classification scheme for the XML Registry.
4.3.4 Administration
The XML Registry will have a Registration
Authority who will oversee the XML objects in the Registry and who will have
rights to change the administrative status of objects and to delete objects, if
required. The Registry administration
functions from the ISO/IEC 11179 metadata registry standard will be adopted for
this XML Registry.
The following sections address the
administrative functions to be supported including XML object version control,
promotion of objects through a series of administrative statuses, management of
information about who is using an XML object, maintenance of an audit trail
about changes to the object, and management of information about Web services
accessibility.
An additional piece of metadata, Object
Stability, will help users identify which objects to reuse. The Submitter can select a value that
describes the current status of the object, indicating its level of stability
(static or dynamic).
Section 5.0 will address the metadata required
to be associated with a registered object as part of the registry entry. A registry entry is relevant descriptive
information about a registered object.
The principal metadata attributes for each registered object may
include: Name, Name Context, Version, Object Identifier, Object Type,
Classification, Registration Status, Administration Status, Status Date, Role
(Submitting Organization, Responsible Organization, Registration Authority),
and Stability.
The XML Registry may also manage information
about external data (or reference documents) that are information items that
are related to a registered object but which reside outside the registry. External data items may be submitted for an
object that is being registered; the XML Registry will record an association
with the external data, but the external data will not be managed as a
registered object.
4.3.5 Version Control
As data exchanges change over time, the XML
objects supporting the exchanges will change.
Just as application systems are assigned new version numbers when
changes are made, XML schemas and other objects will change and new version
numbers will be needed.
Authorized Submitters will be able to submit
objects for registration. Once
registered, registered objects will not be updated directly, but a new version
of a registered object could be submitted, and would be linked to the older
version(s) of the object.
Administrative metadata about an object can be modified by authorized
users without versioning. Submitters
would not be allowed to delete objects, but would be able to recommend
retirement of objects or mark objects to restrict publication. The Registration Authority would be allowed
to delete objects that have been recommended for retirement. This controlled change environment will ensure that objects that
are referenced by other objects cannot be changed or removed, which would cause
a reference failure. For this reason,
usage needs to be tracked to predict the impact of versioning or retiring an
XML object.
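
A minimal Python sketch of this controlled change environment follows. The class, its methods, and the role strings are hypothetical; the sketch only illustrates three rules from the text: registered versions are not updated in place, new versions are linked to older ones, and referenced objects cannot be deleted.

```python
class VersionedRegistry:
    def __init__(self):
        self.objects = {}         # (name, version) -> record
        self.referenced_by = {}   # (name, version) -> keys that reference it

    def submit(self, name, version, content, supersedes=None):
        key = (name, version)
        if key in self.objects:
            raise ValueError("registered versions are not updated in place")
        if supersedes is not None and (name, supersedes) not in self.objects:
            raise KeyError(f"unknown earlier version: {supersedes}")
        self.objects[key] = {"content": content, "supersedes": supersedes}
        return key

    def add_reference(self, source, target):
        """Record that `source` references `target` (usage tracking)."""
        self.referenced_by.setdefault(target, set()).add(source)

    def history(self, name, version):
        """Follow the supersedes links back to the earliest version."""
        chain = []
        while version is not None:
            chain.append(version)
            version = self.objects[(name, version)]["supersedes"]
        return chain

    def delete(self, key, role):
        if role != "Registration Authority":
            raise PermissionError("only the Registration Authority may delete")
        if self.referenced_by.get(key):
            raise ValueError("object is still referenced by other objects")
        del self.objects[key]


reg = VersionedRegistry()
v1 = reg.submit("StateCodeType", "1.0", "<xs:simpleType/>")
v2 = reg.submit("StateCodeType", "1.1", "<xs:simpleType/>", supersedes="1.0")
schema_key = reg.submit("FacilitySchema", "1.0", "<xs:schema/>")
reg.add_reference(schema_key, v1)
```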
4.3.6 Object Status Management
In the OASIS/ebXML model, the required functions
of the lifecycle management service include: submit, approve, update,
deprecate, and remove XML objects, and add or remove slots from the registry
entries. Slots are extensions to the
required metadata about the registered objects. Each organization needs to determine its own process of review
and approval, and assignment of the rights to designate an XML Object as
Approved.
ISO/IEC 11179 uses different terms to describe
the lifecycle management of a registered object. ISO/IEC 11179 also has a more detailed list of statuses in the
lifecycle (including both administrative and registration statuses), and a
lifecycle that supports review and approval for use. Since the Network is intended to be based on the data standards,
some XML objects including tags and eventually schema components and/or data
types and even schemas may be part of a standard. Therefore, it is preferred to have a lifecycle that supports
review and approval of the registered objects.
The XML Registry will have a dual system of
statuses. Administrative status will
designate the object’s position in the Registration Authority’s processing
lifecycle. Administrative statuses may
include: received, draft, rejected, submitted for certification, processed, and
being promoted. Registration status
will be used to designate the object’s position in the registration (and review
and approval) lifecycle. Registration statuses
will be based upon the XML Technical Resources Group (TRG) approval statuses,
and will include: working draft, last call working draft, candidate
recommendation, and proposed recommendation.
Users would have to consider the status of an object before reusing
it. An object with a draft status is
posted for comment and is subject to change during its development. An object with a recommended status has been
through a review and approval process and is posted to promote reuse.
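The dual system of statuses could be represented as two independent enumerations, as in the sketch below. The status values are taken from the lists above; the class names and the rule that both "working draft" statuses count as draft are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AdministrativeStatus(Enum):
    """Position in the Registration Authority's processing lifecycle."""
    RECEIVED = "received"
    DRAFT = "draft"
    REJECTED = "rejected"
    SUBMITTED_FOR_CERTIFICATION = "submitted for certification"
    PROCESSED = "processed"
    BEING_PROMOTED = "being promoted"


class RegistrationStatus(Enum):
    """Position in the registration (review and approval) lifecycle."""
    WORKING_DRAFT = "working draft"
    LAST_CALL_WORKING_DRAFT = "last call working draft"
    CANDIDATE_RECOMMENDATION = "candidate recommendation"
    PROPOSED_RECOMMENDATION = "proposed recommendation"


@dataclass
class ObjectStatus:
    administrative: AdministrativeStatus
    registration: RegistrationStatus

    def is_subject_to_change(self) -> bool:
        # Assumption: both working-draft statuses are treated as "draft".
        return self.registration in (
            RegistrationStatus.WORKING_DRAFT,
            RegistrationStatus.LAST_CALL_WORKING_DRAFT,
        )
```

A user deciding whether to reuse an object would check `is_subject_to_change()` before building on it.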
4.3.7 Validation
The XML Registry will need to provide validation
of XML objects during registration.
Validation will ensure that the schema is well-formed, and that all the
external references in it are functional.
As schemas reference tags, this will require registration of XML tags
and associated data elements. The
Registration API can ensure that the XML components are registered in the
correct order so that validation can be achieved.
Validation can include checking the registry to
ensure that tag names are not duplicated within a namespace, ensuring that an
XML schema references valid tags, and ensuring that external links in a schema
are valid. The API will need to use the
URL to access the external registry in order to retrieve the requested XML
object to ensure that the usage is correct in the schema. If the external
source is not available, the parser will fail and an error message will be
returned.
Schemas will need to be periodically revalidated
as referenced components will be retired or versioned, and it will be critical
to identify the need to change the referencing schema so that it will work as
designed.
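Two of the validation checks described above, well-formedness and duplicate tag names within a namespace, might look like the following sketch. It uses only the Python standard library; the function name and the representation of registered tags as (namespace, name) pairs are assumptions, and the check of external links is deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def validate_submission(schema_text: str,
                        registered_tags: Set[Tuple[str, str]]) -> List[str]:
    """Return a list of error messages; an empty list means the checks passed."""
    try:
        root = ET.fromstring(schema_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    errors: List[str] = []
    namespace = root.get("targetNamespace", "")
    seen: Set[str] = set()
    # Only top-level element declarations are checked in this sketch.
    for element in root.findall(f"{{{XSD_NS}}}element"):
        name = element.get("name")
        if name is None:
            continue  # a ref="..." element declares no new tag
        if name in seen or (namespace, name) in registered_tags:
            errors.append(f"duplicate tag '{name}' in namespace '{namespace}'")
        seen.add(name)
    return errors
```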
4.3.8 Modifying Content
The XML Registry will provide the capability for
an authorized SO to modify XML administrative metadata about objects on behalf
of the RO that submitted the object and therefore has authorization to change
it. Authorized updates will involve
modifying the metadata about the object.
If the object itself is replaced or over-written, it will be
versioned.
4.3.9 Approving Objects
Before an object can be discovered and
retrieved, it needs to be approved for publication in the XML Registry. Rules will be determined by the
Environmental Information Exchange Network Steering Board (NSB).
A likely scenario is that a Submitting
Organization will be able to decide when an object is available for public
review and comment. Only schemas that
have been reviewed by the XML Technical Resources Group and found to be
compatible with data standards and harmonized with other Network dataflows will
be marked as recommended for reuse.
4.3.10 Retiring Objects
Once a Submitting or Responsible Organization is
no longer using an XML schema or other object, it can be marked as
retired. It will not be deleted from
the registry as it may be referenced by other XML objects, or it may be useful
in a historical context.
4.3.11 Removing Objects
As the XML Registry is a historical record of
XML objects usage, it will be rare that an XML object will be deleted from the
XML Registry. Rules for deletion will
be established. Only the Registration
Authority will have the right to delete a record. Submitting and Responsible organizations will be able to mark
objects for restricted publication, or as recommended for deletion.
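The division of rights described above can be expressed as a small permission table. This is a sketch only: the enumeration names are hypothetical, and granting the Registration Authority only the delete action is a simplification, since the text does not say which other actions it may perform.

```python
from enum import Enum
from typing import Dict, FrozenSet


class Role(Enum):
    SUBMITTING_ORGANIZATION = "SO"
    RESPONSIBLE_ORGANIZATION = "RO"
    REGISTRATION_AUTHORITY = "RA"


class Action(Enum):
    RESTRICT_PUBLICATION = "restrict publication"
    RECOMMEND_DELETION = "recommend deletion"
    DELETE = "delete"


# Organizations may only mark objects; they may never delete them.
_ORGANIZATION_ACTIONS = frozenset(
    {Action.RESTRICT_PUBLICATION, Action.RECOMMEND_DELETION}
)

PERMISSIONS: Dict[Role, FrozenSet[Action]] = {
    Role.SUBMITTING_ORGANIZATION: _ORGANIZATION_ACTIONS,
    Role.RESPONSIBLE_ORGANIZATION: _ORGANIZATION_ACTIONS,
    # Assumption: only the delete right is modeled for the RA here.
    Role.REGISTRATION_AUTHORITY: frozenset({Action.DELETE}),
}


def is_allowed(role: Role, action: Action) -> bool:
    """Return True if the role holds the right to perform the action."""
    return action in PERMISSIONS.get(role, frozenset())
```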
4.3.12 Quality Control and Error Handling
During validation, the XML Registry will return
an error to the user if an attempt is made to submit an object that fails
validation, violates a data constraint or duplicates an existing object (like a
tag within a namespace).
4.3.13 Audit Trail Maintenance
An audit trail is a historical record of all
actions taken on a registry entry by a registered user. The audit trail maintains information on the
creation, the impact of any change, the related submission, and the submitting
organization. This enables all registry
entry actions (creation, update, or deletion) to be traced back to the
submitting organization and recorded with the date and time of the action. The audit trail is a requirement of the
OASIS/ebXML standard. The ISO/IEC 11179
standard does not include rigorous tracking of changes as it only requires the
recording of date of receipt and date of last modification. However, for a registry operating in a
runtime environment, audit trails will be essential to ensure data integrity.
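An audit trail of the kind described above could be kept as an append-only list of events, each tied to the submitting organization and stamped with the date and time. The sketch below is illustrative; the class and field names are assumptions and do not reproduce the OASIS/ebXML class definitions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

ALLOWED_ACTIONS = ("creation", "update", "deletion")


@dataclass(frozen=True)
class AuditEvent:
    object_id: str
    action: str
    submitting_organization: str
    timestamp: datetime


class AuditTrail:
    """Hypothetical append-only record of actions taken on registry entries."""

    def __init__(self) -> None:
        self._events: List[AuditEvent] = []

    def record(self, object_id: str, action: str, organization: str) -> AuditEvent:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        event = AuditEvent(object_id, action, organization,
                           datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, object_id: str) -> List[AuditEvent]:
        """Return all events for one registry entry in the order recorded."""
        return [e for e in self._events if e.object_id == object_id]
```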
4.4 Query Management
One of the primary purposes of the XML Registry
is to facilitate the discovery of XML objects for retrieval for potential
reuse. The registry services will
include querying and retrieving XML objects from the registry and repository by
both human interactions with a Web site that enables browsing and drill-down,
and by automated interactions with the XML Registry via the API. Unlike the registration and maintenance
functions, the query and retrieval functions will not require users to be
registered, and therefore SSL and authentication through digital certificates and
signatures are not needed.
4.4.1 Discovery/Query
In order to improve the ability to discover XML
objects they need to be fully described with metadata that can be stored in a
database and can be searched. The
metadata attributes are what classify the objects, promote understanding, and
facilitate discovery through searches.
Namespaces will provide hierarchical
classification schemes enabling association of an XML object with primary
business areas, and serving as one of the most useful ways to find the
object. Queries will support searches
by object identifier, version number, associations, classifications,
descriptions, names, alternate names, affiliated responsible and submitting
organizations and using organizations.
“Keyword” queries will be possible as XML objects will be linked to
metadata that provides semantic content, including data elements and
definitions. The registry services
interface will return data on the registry entry (metadata) and allow linkage
to the object itself stored in the repository.
Web site queries via human interaction may be
performed using a Web browser to browse and drill down or using a filtered
query that allows multiple searches to narrow results.
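A filtered query that narrows results with successive searches might work as sketched below. The metadata field names, the sample entries, and the case-insensitive substring matching are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List

RegistryEntry = Dict[str, str]

# Hypothetical sample metadata; not actual registry content.
SAMPLE_ENTRIES: List[RegistryEntry] = [
    {"name": "FacilityName", "object_type": "tag", "namespace": "facility"},
    {"name": "FacilitySite", "object_type": "schema", "namespace": "facility"},
    {"name": "PermitNumber", "object_type": "tag", "namespace": "permit"},
]


def filter_entries(entries: Iterable[RegistryEntry],
                   **criteria: str) -> List[RegistryEntry]:
    """Keep entries whose metadata contain every criterion (case-insensitive)."""
    results = list(entries)
    for field_name, wanted in criteria.items():
        # Each criterion narrows the result of the previous one.
        results = [e for e in results
                   if wanted.lower() in e.get(field_name, "").lower()]
    return results
```

For example, `filter_entries(SAMPLE_ENTRIES, namespace="facility", object_type="tag")` first narrows to the facility namespace and then to tags.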
XML Registry queries will be conducted in an
automated fashion using the API. The
API will permit other organizations to interact with the XML Registry through a
Simple Object Access Protocol (SOAP) message, using standard query syntax,
including SQL for complex queries.
XML files will be stored in a publicly
accessible area, and will be available for discovery by Internet search
engines. Successful discovery and
retrieval would be limited without the advantage of organized metadata.
4.4.2 Retrieval
The registry services interface will support
query of objects by searching the related metadata. Once an object is found, the registry service will provide access
to the object in the repository. Users
or applications will be able to retrieve not just the registry entry metadata
but the objects themselves. The XML
Registry should provide support for the runtime validation of XML content against
the registered XML schema as part of the Network operations. The XML Registry/Repository will serve
schemas for data validation. Data
validation will be done by the Network Nodes that will initiate a query of the
XML Registry to access the relevant schemas for use in data validation.
5.0 DATA REQUIREMENTS
In order to develop an XML Registry that meets
the software requirements specified in Section 4, the data requirements for the
XML Registry need to incorporate components of three standards: the OASIS/ebXML
Registry Information Model (RIM) v 2.0, the UDDI Published Specification
version 3.0, and the metamodel for metadata registries in ISO/IEC 11179 Part
3. This section will describe what XML
objects will be stored in the XML Registry, and what metadata can be recorded
about each object. The section will
briefly describe the information requirements of each standard. How the data will be structured will be
addressed in a follow-on design document.
5.1 XML Objects and Metadata
An XML Registry must be able to record a broad
range of information related to XML transactions. The information may be recorded in a database, linked as
documents, or can be accessible from another resource. The type of XML objects that could be
included in a Registry are: XML components (XML schema with component
relationships, XML Datatypes, XML attributes, Object Classification schemes,
XML tags), Documents (Trading Partner Agreement, Trading Partner Profile),
Registry Packages, Service Bindings, Users and Organizations (Names,
Mailing/Locational Information, Contact Information), and Web Services.
Registration of the above objects will require
the following major metadata groupings.
Descriptions of the types of information that could be part of each
grouping are provided.
Metadata Grouping Name | Definition | Data Content |
Administration | Maintains information necessary for the management of Registry Objects. | Administration and Registration Status, Object Stability, Internal Identifier, Registry Package Description |
Point of Contact | Information about an individual or organization that has a role related to a Registry Object | Person and/or Organization Name, Mailing/Locational Address, Telephone Numbers, Email Address, Role |
Descriptive | Bibliographic information about a Registry Object. | Name, Name Context, Object Type, Definition, Abstract, Purpose, Version, Format, Alternate Identifier, Effective Date, End Date |
Classification | Arrangement of objects into groups based on characteristics which the objects have in common. | Group/Category, Associations to Other Objects |
Security | Defines the access control for Registry Objects. | Object Access, User Roles, Permissions, User Authentication |
Linked Objects | Associates content in the registry with content that may reside outside the registry. | Linked Documents, Web Site URLs |
Audit Tracking | Record of information changes. | Create Date, Create User, Last Change Date, Last Change User, Data Change Description |
Web Services | Information about the linking of the registry to other resources using Web services. | Bindings and Associations, Links, Usage, Compliance |
XML Tags | Describes the relationship of XML tags in Registry Objects to data standards or data elements in applications. | XML Tag Names, Data Element Relationships |
Exhibit 1. Major Metadata Groupings
The following sections will outline how the
OASIS/ebXML RIM version 2.0, UDDI Specification, and ISO/IEC 11179-3 handle
these high-level data requirements.
5.2 Data Requirements of the
OASIS/ebXML RIM version 2.0
The ebXML/RIM version 2.0 model addresses all of
the data groupings listed above. The
registry information model includes:
Registry Object - An abstract base class used by
most classes in the model. Registry
Objects are related to subclasses that include information related to the
following data groupings from Exhibit 1: Administration, Point of Contact,
Descriptive, Classification, Security, Linked Objects, Audit Tracking, Web
Services, XML Tags and XML Objects.
Classification Scheme - A structured way to
classify or categorize Registry Objects.
The structure of a Classification Scheme may be defined internal or
external to the Registry. This is
related to the Classification and XML Objects section of the data groupings.
Auditable Event - This is an action that changes
a Registry Object instance. The rules
established for the registry define what actions require audit tracking and the
level of tracking needed for each action.
This section is related to the Administrative, Audit Tracking, Point of
Contact, and XML Objects data groupings.
User, Postal Address, Email Address, and
Organization - This information provides a way to identify people or
organizations with an interest in the Registry Object. This information can be related to the Point
of Contact data grouping. The User
information can also be related to the Audit Tracking and XML Objects data groups.
Service and Service Binding - This information
represents technical information on a specific way to access a specific
interface and includes the linkage information. This can be related to the Web Services and XML Objects data
groups.
External Link - URLs can be used to associate
content in the Registry with content that may reside outside the registry, such
as a DTD or Trading Partner Agreement that is on another Web site. The use of External Links supports this
service and can be found in the Linked Objects data grouping from the table
above.
Security - The Security section of the model
requires that each Registry Object be associated with security controls that
govern access to operations or methods performed on that object. This includes mechanisms that control
Permissions, Privileges, Roles, Access Groups, User Identification, and
Authentication. The Security data
grouping would include this type of information.
Appendix B describes the high-level data
requirements of the OASIS/ebXML RIM.
5.3 Data Requirements of the UDDI
Specification version 3.0
The UDDI specification describes the Web
services and behaviors of all instances of a UDDI registry. Central to UDDI’s purpose is the
representation of data and metadata about Web services. A UDDI registry offers a standard mechanism
to classify, catalog and manage Web services so that they can be discovered and
used. The UDDI information model
consists of the following entities:
Business Entity - The top-level XML element in a
business’s UDDI entry captures the starting set of information required by
partners seeking to locate information about a business’s services including
its name, its industry or product category, its geographic location, and
optional categorization and contact information. It includes support for “yellow pages” taxonomies to search for
businesses by industry, product, or geography.
This can be related to the Point of Contact, Descriptive, and Classification
data groupings.
Business Service - A grouping of a series of
related Web services that can be related to either a business process or a
category of services. An example of a
business process could be a logistics/delivery process, which could include
several Web Services including shipping, routing, warehousing, and last-mile
delivery services. By organizing Web
Services into groups associated with categories or business processes, UDDI
allows more efficient search and discovery of Web Services. This can be related to the Descriptive,
Classification, and Web Services data groups.
Binding Template - One or more technical Web
Service Descriptions captured in an XML element called a binding template. The binding template contains the
information that is relevant for application programs that need to invoke or to
bind to a specific Web Service. This
information includes the Web Service’s URL address and other information
describing hosted services, routing and load balancing facilities. This can be related to the Web Services data
group.
Compliance Information - Each Binding Template
element contains an element called a tModel, which carries information that
enables a client to determine whether the specific Web service being invoked
complies with a particular behavior or programming
interface. This can be related to the
Web Services data group.
Appendix C describes the high-level data
requirements of the UDDI specification.
5.4 Data Requirements of the ISO/IEC 11179
Metamodel
The 11179 metamodel is a blueprint for the
elements of an information architecture.
The 11179 metamodel is designed to manage individual elements of
information, such as enumerated values and their definitions (called value
domains) and data elements, as well as groups of these elements by
classification schemes. Classification
schemes can be used to group data elements and enumerated values associated
with a data standard, an information system, or an XML schema. In addition, by hierarchical arrangement of
classification schemes, the data elements and enumerated values can be organized
to represent the structure of a database or an XML schema. Following are descriptions of the major
components of the 11179 metamodel, and an indication of how they relate to the
XML metadata groupings.
Administrative Data. The 11179 metamodel contains a great deal of metadata about each
component. Many components of the
metamodel are designated as Administered Components. Administered components are components of the metamodel that
require definitions and specification for reuse and/or sharing in or among
enterprises. Each administered
component carries administration information, including an identifier (composed
of a Registration Authority Identifier, Data Identifier, and Version),
registration and administration status designations, origin, organization and
contact information, as well as create date, change date, effective date, and
end date. Each administered component
in the model can be registered and tracked independently. The components of this Administration Region
of the 11179 metamodel are related to the Administrative and Point of Contact
data groupings.
Data elements.
Data elements are the heart of the ISO 11179 metamodel. While some components of the model, such as
value domains and classification schemes, can be registered independently, much
organizational metadata is focused on the primary elements of an information
architecture, data elements.
Registration of a data element in a metadata registry
requires that certain characteristics of the
data element be recorded to clearly describe and define it. These characteristics are recorded as
attributes of the data element and stored in separate, related tables. Data elements are equivalent to XML tags,
and so this Data Element Region is related to the XML tag data grouping. The 11179 model allows data elements to be
related to one another through a data element derivation (more than one data
element can be used to derive another data element, or more than one data
element can be combined to create a derived data element). This could be related to the XML object data
grouping.
Names and Definitions. Data element (and other model object) names and definitions are
stored in the Naming and Identification Region of the metamodel. This organization allows a data element to
have multiple names and definitions in context, and allows XML tag names to
be represented as alternate data element names
in context. While it is another region
of the model, it carries data element attributes that are related to XML
tags.
Value Domains.
Data Elements generally are associated with a Value Domain that provides
representation information. Value
Domains can be enumerated or non-enumerated.
An enumerated domain is associated with a discrete set of permitted
values, such as names or codes. A
non-enumerated domain must include a definition/description of the possible
valid values for the data element representation. These might be established by a range or a rule. A value meaning is the meaning or semantic
content of a value, or a data value. A
value meaning is paired with a permissible value to explain its meaning. Value domains would be related to
enumerations for XML tags.
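The distinction between enumerated and non-enumerated value domains can be illustrated with the sketch below. The class names, the state-code example, and the latitude rule are hypothetical and are not drawn from ISO/IEC 11179 itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EnumeratedDomain:
    """A discrete set of permissible values, each paired with a value meaning."""
    permissible_values: Dict[str, str]  # permissible value -> value meaning

    def is_valid(self, value: str) -> bool:
        return value in self.permissible_values

    def meaning(self, value: str) -> str:
        return self.permissible_values[value]


@dataclass
class NonEnumeratedDomain:
    """Valid values are described by a definition and checked by a rule or range."""
    description: str
    rule: Callable[[str], bool]

    def is_valid(self, value: str) -> bool:
        return self.rule(value)


def is_latitude(value: str) -> bool:
    """Example rule: a decimal number between -90 and 90 inclusive."""
    try:
        return -90.0 <= float(value) <= 90.0
    except ValueError:
        return False


# Illustrative domains only.
STATE_CODES = EnumeratedDomain({"VA": "Virginia", "MD": "Maryland"})
LATITUDE = NonEnumeratedDomain("Decimal degrees from -90 to 90", is_latitude)
```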
Classification Schemes. Classification schemes enable data elements
or other objects to be organized by groups or themes. A classification scheme is defined as the descriptive information
for the arrangement or division of objects into groups. In the Classification Region of the model, a
classification scheme can be defined and related to data elements and other objects
to be included in that classification grouping. This is related to the Classification and XML object data
groupings.
Appendix D describes the high-level data
requirements of the ISO/IEC 11179 metamodel.
5.5 Data Requirements Summary
To summarize the data requirements, the
following table shows the major groupings of data to be recorded about Registry
Objects. The table also shows whether
the major groupings are accommodated by the data requirements for ISO/IEC
11179, OASIS/ebXML, and UDDI.
Data grouping | OASIS/ebXML | UDDI | ISO/IEC 11179 |
Administration | Yes | Yes | Yes |
Point of Contact | Yes | Yes | Yes |
Descriptive | Yes | Yes | Yes |
Classification | Yes | Yes | Yes |
Security | Yes | No | No |
Linked Objects | Yes | No | No |
Audit Tracking | Yes | Yes | No |
Web Services | Yes | Yes | No |
XML Tags | No | No | Yes |
XML Objects | Yes | Yes | Yes |
Exhibit 2. Data Requirements Summary
The table shows that standard metamodels have a
lot in common. In addition to including
many of the same attributes, the registry metamodels are similar in other
ways. All the specifications, OASIS/ebXML, ISO/IEC 11179, and UDDI, handle
object identification in the same way by assigning a UUID to each object. Alternate identifiers may be needed for
Registry Objects to enable association with external links, as well as
associations among objects (any Registry Object instance may be associated with any other
Registry Object instance). All of the
specifications include metadata for administrative and point of contact
identification and management, and use a classification scheme to organize
records for retrieval. The levels of
security and audit tracking required by the three specifications vary and the
Registry would need to be built to the most stringent requirement identified by
review of the specifications (e.g., Audit Tracking in OASIS/ebXML). The association of XML tags to data element
information is only supported by the ISO/IEC 11179 specification so
associations between the XML registry and the 11179 registry structure need to
be created. By incorporating the
information requirements of all three of these specifications, the registry can
accommodate all the information needs expressed by the listed data
groupings. Summaries of the data
requirements of the three applicable standards can be found in Appendixes B, C,
and D. Appendix E contains a glossary
of the terms and definitions used in this document.
6.0 INTEROPERABILITY REQUIREMENTS
6.1 Security and Privacy
The security requirements of the XML Registry
are in part determined by the Information Management Working Group (IMWG)
Network Blueprint. The blueprint states
that public key infrastructure (PKI) technology using digital signatures and
digital certificates should be considered for verifying and authenticating the
validity of partners exchanging information.
The secure sockets layer (SSL) and Secure HyperText Transfer Protocol
(S-HTTP) are specified for transmitting data securely. An information request may flow over the network
under different security levels, including public access and end-to-end
authentication through certificates with digital signatures.
The security requirements of the ebXML model
include:
• The registry must be able to
authorize appropriate access to its contents.
The identity of the ownership of registry content as well as the
privileges assigned to a user for the registry content must be
authenticated. The registry must be
able to assure confidentiality for some of the registry contents that may not
be suitable for public viewing. Roles
will determine the level of authorization assigned to a user.
• The registry requires user-level
security and document-level authorization.
Session-based security may be used to avoid authenticating every message
or interaction.
• The registry may accept content
from a client only if a certificate issued by the Registration Authority is
provided and is digitally signed.
Messages between registry services and their clients must be confidential. Messaging can use the distinguished name
from the certificate to authenticate the user when the registry receives a
request. The distinguished name is the
name that is associated with the digital certificate that is being used to
authorize a request to the registry.
The payload of the message also must be signed, and the registry will
store the signature as part of the content.
6.2 Linkages
It is intended that the XML Registry will serve
as a functional node on the EPA/State Network.
The XML Registry will interoperate with each of the EPA and state nodes
to provide a central source of information on the Network. The XML Registry could support runtime
validation of the content of XML instance documents against the registered XML
schema as part of the Network operations.
As the XML Registry will need to be available to serve schemas upon
request by Network Nodes to support runtime validation, the XML Registry needs
to be available 24 hours a day, 7 days a week in a robust environment to
support reliable operations.
The XML Working Group of the federal Chief
Information Officer’s Council is analyzing the requirements for an XML Registry
that would serve the federal government.
Initial results indicate that this may become a federation of separate
registries, each serving an agency or
department or a particular business area. It is intended that the State/EPA Network
Registry would be able to interact with that federation of federal
registries. At this time, the
requirements for that registry or group of registries have not been defined, so
there is no specific plan for how to implement that linkage.
States may also develop statewide XML registries
and the State/EPA Network Registry may also have to interoperate with those
registries as some content may be common among those registries and the
EPA/State XML Registry.
7.0 CONCEPT OF OPERATIONS
The OASIS/ebXML Registry is designed to support
business processes. A business process
is defined as “a collection of business transactions between business
partners.”
Participants in the Network will maintain
Trading Partner Profiles (TPP) that describe the business processes in which
each organization can engage. The TPP
will specify the technological capabilities supported and the requirements that
must be met to exchange business documents with them. In order to exchange data over the network, a set of
organizations will need to sign a Trading Partner Agreement (TPA) that defines
the conditions under which the partners will transact business. The TPA will address: identification of the
participating organizations, purpose of the TPA, dataflows to be used
(including the specific format and structure to be used for exchanging
information and the URL for the location of the format), transport protocols
and electronic addresses of the parties, procedures for dispute resolution,
duration of the TPA, contingencies for exchange failure, internal system
requirements, legal framework, rules for message exchanges, parallel paper
transactions, performance and reliability, quality and stewardship, record
retention, roles and responsibilities, security, termination conditions, and
intended use of data. TPAs and TPPs
will become registered XML objects in the XML Registry.
A pair of business partners, such as a state and
EPA, planning to exchange data on a particular business process, such as
verifying facility data in a central file of facility data, or updating
information on wastewater permits, will use the XML Registry in the process of
defining the protocol for data exchange.
Once the need for data exchange is established,
the parties may agree to collaborate on developing an XML schema. In the Discovery and Retrieval Phase, they
will query the XML
Registry to discover available components and
review them for potential reuse or reference/inclusion in the new XML
schema. They will retrieve them via
download for analysis and reuse. In the
XML Registry, they can review requirements for XML schema design and post
notice of intent to develop this new schema in the Developers Forum. They would download available components for
use, including standard XML tags or XML schema components or data types.
Once the draft schema was prepared, they would
prepare a minimum set of metadata about the schema so that they could post the
draft schema as a file available for discovery and review in the Registry. During the review period, the developers may
be contacted by other registry users regarding similar efforts underway, or may
receive comments from reviewers. Some
collaboration might be required to harmonize with related exchange
templates. After a specified review
period, the Submitting Organization would review comments, make any needed
changes, and submit the XML schema to the XML TRG for approval
consideration. At that point, the SO
would be required to complete a registration package for full registration in
the XML Registry. The SO would use the
Registration API to prepare the registry package for submission. The package might consist of a schema, its
associated metadata, and documents. The
API would analyze the package and notify the developers of any missing or
problematic components. The API would
create a UUID for the XML object, validate the XML schema, and parse the XML
schema for components that could be registered as data types, tags, enumerated
value lists, or schema components.
Once an XML object was approved and completely registered, the SO could
request that the RA change the status of the XML schema to recommended, and it
could be posted for reuse.
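The package-processing steps described above (check for missing components, create a UUID, validate the schema, and parse it for registrable components) could be sketched as follows. The function name, the package layout as a dictionary, and the choice of required parts are assumptions; validation here is limited to well-formedness, and this is not the Registration API itself.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Dict, List

XSD_NS = "http://www.w3.org/2001/XMLSchema"
REQUIRED_PARTS = ("schema", "metadata")  # assumption: documents are optional


def _names(root: ET.Element, tag: str) -> List[str]:
    """Collect the name attribute of every XSD construct of the given kind."""
    return [e.get("name") for e in root.iter(f"{{{XSD_NS}}}{tag}") if e.get("name")]


def process_package(package: Dict[str, str]) -> Dict[str, object]:
    """Check a registry package and extract components that could be registered."""
    missing = [part for part in REQUIRED_PARTS if not package.get(part)]
    if missing:
        return {"accepted": False, "missing": missing}

    # Raises xml.etree.ElementTree.ParseError if the schema is not well-formed.
    root = ET.fromstring(package["schema"])
    components = {
        "tags": _names(root, "element"),
        "data_types": _names(root, "simpleType") + _names(root, "complexType"),
        "enumerated_values": [e.get("value")
                              for e in root.iter(f"{{{XSD_NS}}}enumeration")],
    }
    return {"accepted": True, "uuid": str(uuid.uuid4()), "components": components}
```

A package with a missing part is returned to the developers with the list of what is absent rather than being assigned an identifier.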
Once populated, the XML Registry will be an
integral part of the Network. All of
the content related to the Network data exchanges will be stored in the
Registry. During the runtime phase,
ebXML messages will be exchanged among trading partners using the messaging
service. Automated business to business
transactions would be facilitated using the API to support automated querying. External users would be able to construct a
query on the UDDI network, specifying a keyword to use in searching
classification schemes. The XML
Registry Query API would receive the query message and return both the metadata
in the registry entry as well as the linked object from the repository. Users could also use a Web interface to
browse and drill down in search of XML objects that matched their search
criteria. The registry would track
usage of the XML objects by collecting information about registry clients
who downloaded the objects. That way, the Registry Administrator could
send them notification of change to an object.
If a change was made to any XML object, the SO
could use the lifecycle management function to register a new version of the
XML object. The older version would
remain in the registry to support references to it. Objects would be versioned when any major part of the metadata
was modified, or if the XML object itself was changed. The registry administration function would
notify users about changes in objects for which they were registered. Through the registration of a new version of
an object, the SO would declare the old object as “retired.” However, if the SO actually wanted to delete
an object from the registry, they would need to make a request to the RA. The RA would conduct an impact analysis of
the deletion of the object by analyzing usage of the object. The SO would be able to mark objects to
restrict their publication.
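A minimal sketch of the versioning behavior described above, assuming a simple in-memory model. The class names and the status strings "active" and "retired" are illustrative assumptions, not definitions taken from the registry specifications.

```python
from dataclasses import dataclass, field


@dataclass
class RegisteredVersion:
    version: int
    content: str
    status: str = "active"


@dataclass
class VersionedObject:
    name: str
    versions: list = field(default_factory=list)

    def register_version(self, content):
        """Add a new version and mark the previous one as retired."""
        if self.versions:
            self.versions[-1].status = "retired"
        new = RegisteredVersion(version=len(self.versions) + 1, content=content)
        self.versions.append(new)
        return new

    def get(self, version):
        """Older versions stay retrievable so existing references keep working."""
        return self.versions[version - 1]


schema = VersionedObject("Facility schema")
schema.register_version("first draft of the schema")
schema.register_version("revised schema")
print([(v.version, v.status) for v in schema.versions])
```

Retired versions are kept in the list rather than deleted, which mirrors the requirement that deletion go through a separate request to the RA.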
The registry would support the data validation
to be done on the Network by serving schemas upon request. Data exchanges on the Network would require
access to the schemas to support data validation. Data validation would be carried out by the nodes. The nodes would send a request to the
Registry to access the relevant schema to support data validation.
8.0 PRELIMINARY REGISTRY TOOL OPTIONS
8.1 Background
EPA and its information trading partners have
identified a need for an XML Registry to support proposed XML data interchange
over the Network. XML Registry
development is a relatively new field.
It is important to survey existing registries and available tools as
part of the process of identifying a solution for the Network.
The XML Working Group of the Federal CIO Council
(XMLWG) has created a working group to consider development of federal
government XML Registries. Both the
Department of Defense and the National Institute of Standards and Technology
registries have been considered to be prototypes for a government-wide
registry. At this time, an alternatives
analysis of managing XML resources in federal agencies is under
development. The alternatives analysis
compares the implications of not developing any government registry with two
major architectural options:
• Single Unified Registry/Repository: Building a single federal registry/repository
that requires that every federal agency wishing to publish schemas or artifacts
submit their objects to the central registry/repository.
• Federated/Distributed Model: Each agency or
entity may develop or acquire its own registry/repository, meeting
government-wide specifications to ensure interoperability with the central
government-wide registry/repository.
The report’s findings will help shape the future
of XML registries in the federal sector.
In addition, new tools are emerging from the
commercial sector that may change the options
available to the Network in the near
future. This section simply presents
information about available registries and registry development tools. The tools have not been tested and analyzed;
the information presented is based on what is available in the industry press
and in marketing materials. Upon determining
the requirements of the XML Registry, an options analysis will be conducted to
determine if any of the available registry software could meet those
requirements.
8.2 Existing Online Registries
Some registries are already available online and
can be used to post XML objects for discovery and reuse.
XML.org.
The XML.org Registry offers a central
clearinghouse for developers and standards bodies to publicly submit, publish,
and exchange XML schemas, vocabularies and related documents. Operated by OASIS—the non-profit XML
interoperability consortium—the XML.org Registry is a self-supporting resource
created by and for the community-at-large.
Industry groups and other organizations that
have developed XML schemas or vocabularies are encouraged to register their
work at the XML.org Registry. The
registry is available online at http://www.xml.org/xml/registry.jsp. Schemas can be registered and searched by
industry. The environmental industry is
included. The registry is an independent
entity that will serve as a model for future registries. It offers no administrative controls on what
is registered and cannot be tailored specifically to meet EPA Network needs.
NIST registry
The National Institute of Standards and
Technology (NIST), in cooperation with EPA, supported the development of a
prototype XML registry as a proof of concept.
It was based on software developed for the Defense Logistics Information
Service (DLIS) and was compliant with the ebXML version 1 specification. It is available at: http://xmlregistry.nist.gov/EPA-States/.
This software includes the ability to search the
registry and repository for XML objects by URN, common name, version, keyword,
organization, object type, file type, and dates. The software also allows viewing of XML objects. If you are an authorized user, you can
submit and validate a schema, and conduct some administration of XML objects in
the registry.
This implementation has several deficiencies. Objects are limited to an
8-character file name, the registry does not support tracking the
status of an XML object, and the
registration of schema modules was not entirely successful.
Environmental Data Registry (EDR)
The EDR is not an XML Registry. It is a metadata registry that is based upon
the ISO/IEC 11179 metamodel. Currently,
the EDR stores XML tags as alternate name contexts for approved standard data
elements. The EDR has a lot in common
with the XML Registries in that it registers information resources and other
“group objects” that are associated with individual data elements that are
registered. The EDR also has a lot of
the functionality that is defined in the OASIS/ebXML version 2 specification,
including notification of change, web-based querying and retrieval, flexible
object linking through associations, secure data update, version control, and
status tracking. The EDR does not
currently support all of the functionality of the XML registries as specified,
but could be modified to meet all of the stated requirements.
8.3 Available Registry Software
Several implementations of XML registries offer
a source of code that could be obtained and customized for Network use. XML registries have been developed for the
Federal government, and the code can be obtained for reuse. Another XML registry has been developed by
the Open Source Code community that makes software freely available for reuse.
DISA/DoD Registry
The Defense Information Systems Agency (DISA) of
the Department of Defense (DoD) has developed an XML registry to promote
interoperability. The registry provides
a baseline set of XML Information Resources developed through coordination and
approval among the DoD communities and includes the following functions:
browse, search, and retrieve XML objects by keyword, information resource type,
version, and namespace.
The availability of this code would need to be
investigated.
Open Source ebXML Registry
An alpha version of an Open Source OASIS ebXML
registry has been developed by a consortium called the ebxmlrr project. The registry implements version 2.1 of the
OASIS ebXML Registry specifications.
The release is available in both source and binary form under an Apache style
open source license that permits royalty free use of the source and
binaries. The release may be downloaded
from the following location:
http://sourceforge.net/project/showfiles.php?group_id=37074&release_id=97479
The code was developed by an international
collaboration with developers from around the world who are jointly
participating in an open source community project hosted at Source Forge. The initial code base for the ebXMLRR
project at Source Forge originated from a donation by Sun Microsystems, Inc. to
the open source development community of an internal implementation developed
at Sun. Sun donated the code to the
ebXMLRR project at Source Forge in November of 2001. Since then, an international community of developers committed to
open international standards has worked together to complete this
implementation of the
OASIS ebXML Registry/Repository standard. The ebXMLRR project information is available
at http://ebxmlrr.sourceforge.net.
Since both the registry and the client software
are based on Java and XML, the implementations are portable across platforms
and operating systems, and can also interoperate with implementations on other
platforms and those written in other languages. The software can interoperate with any database that supports
SQL97, including Oracle.
The registry is designed to store metadata about
Web Service descriptions, XML data and documents, binary data (such as images,
sound files, video data, executable application files, CAD files, etc.), and
any other kind of data. Using the
registry, this data can be searched and classified using advanced and ad hoc
query mechanisms such as XML filter query and SQL query. The registry features a Lifecycle Manager
for schema validation and other administrative functions and a Query Manager
that provides read-only query functions.
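To illustrate what an ad hoc SQL query over registry metadata might look like, the following sketch uses an in-memory SQLite database. The table name `registry_entry`, its columns, and the sample rows are assumptions made for this example; they do not reflect the actual ebxmlrr database schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registry_entry ("
    " id TEXT PRIMARY KEY, name TEXT, object_type TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO registry_entry VALUES (?, ?, ?, ?)",
    [
        ("urn:uuid:0001", "Facility schema", "XMLSchema", "Approved"),
        ("urn:uuid:0002", "State code list", "Enumeration", "Draft"),
        ("urn:uuid:0003", "Permit schema", "XMLSchema", "Draft"),
    ],
)


def find_entries(object_type, status):
    """Read-only query over registry metadata using bound parameters."""
    rows = conn.execute(
        "SELECT name FROM registry_entry"
        " WHERE object_type = ? AND status = ? ORDER BY name",
        (object_type, status),
    )
    return [name for (name,) in rows]


print(find_entries("XMLSchema", "Draft"))
```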
In addition to the registry, the ebxmlrr UI
client software provides a unique user interface (UI) for graphically
visualizing registry content. It is
based on the Java API for XML Registries ('JAXR'). JAXR provides a single
standard Java API that allows Java programmers to interact with emerging
standards for XML Registries including both ebXML Registry and UDDI. This UI client software also uses the Java
API for XML Messaging ('JAXM') in order to send SOAP-based messages between the
client software and the registry. These
SOAP messages are used to send requests to the OASIS ebXML Registry and to
receive responses from the registry.
8.4 Commercially Available Tools
A couple of vendors have developed XML Registry
Software that is available commercially.
More products are currently under development by software vendors. Following is a brief description of the
currently available software.
XML Global
XML Global has developed an array of GoXML
tools, including the GoXML Registry that is designed to store and organize
business documents, processes, and services.
The registry is based on the ebXML standard version 2. The registry tool is integrated with the
related software tools, such as GoXML Transform, which stores transformation
rules, schemas, DTDs and EDI dictionaries.
GoXML Transform Central is an open standards-based platform for
Enterprise Integration and automated supply chain management that includes
ebXML Message Handling Services, Web Services, and Process and Transformation
integration, with optional metadata management provided by Registry
services. XML Global's GoXML Registry
is a metadata registry server with Web and programmatic interfaces that
includes a registry engine, a repository, a Web-based registry client and a
registry services API. The registry
tool includes lifecycle management of registry objects, version control,
management of authorized users and privileges, search and retrieval of
repository objects, content administration, and registry service interfaces. The XML Global products run on Windows,
Solaris, and Linux platforms.
XML Canon
XML Canon/Developer is a repository for XML
schemas, DTDs, instance documents, stylesheets, and adjuncts. It includes the ability to index and search
XML objects. It includes namespace
management, and flexible lifecycle management, as well as check-in and
check-out and version control. It
includes a data dictionary for the management of vocabulary components. It can be integrated
with related TIBCO products for development of XML.
XML Canon Developer runs on Windows, NT, and UNIX
(HP-UX, Linux, and Solaris). It can be
Web-enabled through use of an add-on Portal package.
8.5 Related Software
Oracle XML DB
The Oracle Corporation has included Oracle XML
DB as part of its Oracle 9i Relational Database Management System (RDBMS)
software package. Oracle XML DB is a
storage and retrieval technology for XML objects. It is based on the W3C XML data model and provides the capability
to store XML objects in a relational database and can serve as an XML
repository. It provides a variety of
functions, including access control, folder organization, WebDAV and FTP
access, SQL search, hierarchical indexing, and a navigational API to rename,
delete, and copy files. Because this
product is very new there is minimal available information. It appears that you can use XMLSpy to
register an XML schema into the Oracle 9i database using a graphical schema
editor. But, it does not appear to
include registry functions—although those could be developed to manage the
registration and retrieval of objects in the repository. In addition, Oracle9iAS Release 2 includes a
fully UDDI v1.0 and 2.0-compliant registry, and provides a comprehensive J2EE
platform to develop, deploy, and manage Web services.
DISA Registry Initiative (DRIve)
The Data Interchange Standards Association
(DISA) developed an XML registry based upon the ebXML V. 1 specifications to
register business process models, XML schemas and DTDs, and related business
objects (such as industry-specific code lists) in a systematic way. The registry enables search and retrieval of
models, schemas, profiles, and other objects, using common retrieval methods,
including browse and drill-down, as well as filtered searches, both required by
the ebXML specifications. In addition,
as required by ebXML, DRIve will use standard message formats based on SOAP, an
XML messaging specification, for submission of objects and responses. It will also include security features to protect against
unauthorized access, preserve the integrity of objects, and support nonrepudiation of entries.
DRIve participation is limited to members of
DISA affiliated organizations – Accredited Standards Committee X12, Hotel
Electronic Distribution Network Association, Interactive Financial Exchange,
Mortgage Industry Standards Maintenance Organization, Open Philanthropy
Exchange Forum, and Open Travel Alliance (and others that may join with DISA as
the project unfolds).
It is not known whether this software prototype
could be obtained for reuse.
9.0 ACCEPTANCE REQUIREMENTS
It is anticipated that the requirements listed
and described in Appendix A will be part of the acceptance criteria for an XML
Registry application.
APPENDIX A
Summary of XML Registry Software Requirements
Requirement Number | Requirement Description |
1 | Provide registry access to authorized users from EPA, States, Tribes, and industry partners. |
2 | Serve as a single, centralized XML Registry that will manage the information about all of the dataflows on the Network and serve as the single source of information related to the Network. |
3 | Comply with the OASIS/ebXML Registry Information Model (version 2.0). |
4 | Comply with ISO/IEC 11179 (International Standard) Metadata Registries (MDR). |
5 | Provide the capability to manage both atomic information objects like data elements and enumerations, as well as the groupings of those elements and permit linking so that the granular components (elements) can be leveraged in a variety of structures (schemas). |
6 | Provide support for human as well as automated system interactions to search and retrieve XML objects. Human interface needs to be accessible in accordance with Federal Section 508 guidelines. |
7 | Provide linkages to Web services. |
8 | Provide potential for data exchange with other XML registries. |
9 | Include an expandable hierarchical classification scheme for organization of XML objects for discovery and retrieval. |
10 | Manage XML objects within namespaces and provide the capability to manage a dynamic, hierarchical namespace architecture. |
11 | Record the status of XML objects and control the update of that status. |
12 | Keep track of which schemas are using different schema components. |
13 | Generate a Universally Unique Identifier (UUID) for each registered object. |
14 | Support schema harmonization by providing a place to register new schema development efforts and by registering standard data elements and data types or sub-schema components for standard groupings of data to promote reuse. |
15 | Store contact and administrative information about submitters and users of particular XML objects. |
16 | Store XML objects including XML tags (elements), enumerations (name/value pairs), XML schemas, XML schema components, XML datatypes, XML namespaces, XML documents, trading partner agreements, and administrative documents (approval documentation, submission manifests). |
17 | Support automated registration of submitted objects and related metadata. |
18 | Serve as an information clearinghouse by providing information on ongoing Network projects, posting work in progress on draft XML objects, and providing a forum for exchange of ideas on XML objects. |
19 | Provide security features to ensure control of access to authorized users to protect data integrity. |
20 | Upon registration, provide schema validation against schema guidelines to ensure that it is well-formed. |
21 | Provide version control on all registered objects. |
22 | Ensure unified understanding of registered objects through use of complete metadata, including definitions. |
23 | Provide an audit trail for all actions taken on a registry entry. |
24 | Create interfaces to support several registry user roles, including Registration Authority, Submitting Organization, and Responsible Organization, as well as registry guests/clients. |
25 | Support query of Registry Entry metadata using browse and drill down techniques. |
26 | Support additional filtered and ad hoc queries for certain user classes through the API. |
27 | Support retrieval of objects from repository via download. |
28 | Support human and automated registry service interfaces to submit, register, modify, search, and retrieve objects. |
29 | Support tracking of an XML object through the lifecycle from registration in draft through retirement. A dual set of status codes will keep track of administrative and registration statuses. |
30 | Create potential for future linking to other federal, state, and other related XML registries. |
31 | The XML Registry should provide support for the runtime validation of XML content as part of the Network operations by providing access to the registered XML schema by Network Nodes that will validate the data content of Network exchanges. |
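Requirements 23 and 29 (audit trail and a dual set of status codes) could be modeled along the following lines. This is a sketch only: the specific status values and the transition rules in `ALLOWED` are assumptions made for illustration, since the actual status codes are defined elsewhere by the registry procedures.

```python
from enum import Enum


class RegistrationStatus(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    RECOMMENDED = "recommended"
    RETIRED = "retired"


class AdministrativeStatus(Enum):
    SUBMITTED = "submitted"
    IN_REVIEW = "in review"
    COMPLETE = "complete"


# Assumed transition rules; real rules would come from registry procedures.
ALLOWED = {
    RegistrationStatus.DRAFT: {RegistrationStatus.APPROVED,
                               RegistrationStatus.RETIRED},
    RegistrationStatus.APPROVED: {RegistrationStatus.RECOMMENDED,
                                  RegistrationStatus.RETIRED},
    RegistrationStatus.RECOMMENDED: {RegistrationStatus.RETIRED},
    RegistrationStatus.RETIRED: set(),
}


class TrackedObject:
    """An XML object carrying both status codes and an audit trail."""

    def __init__(self, name):
        self.name = name
        self.registration = RegistrationStatus.DRAFT
        self.administrative = AdministrativeStatus.SUBMITTED
        self.audit_trail = []  # one tuple per action taken on the entry

    def set_registration(self, new_status, actor):
        if new_status not in ALLOWED[self.registration]:
            raise ValueError(
                f"{self.registration.value} -> {new_status.value} not allowed")
        self.audit_trail.append(
            (actor, self.registration.value, new_status.value))
        self.registration = new_status


schema = TrackedObject("Facility schema")
schema.set_registration(RegistrationStatus.APPROVED, "RA")
print(schema.registration.value, schema.audit_trail)
```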
APPENDIX B
Data Requirements for the OASIS/ebXML Registry Information Model v. 2.0
OASIS/ebXML
Model Classes and Descriptions |
|
Classes |
Description |
Model Section: Detail View |
|
RegistryObject |
An abstract base class used by most classes in
the model; provides minimal metadata for registry objects. |
RegistryEntry |
Common base class for classes in the
information model that require additional metadata beyond the minimal
metadata required by Registry Object. |
Slot |
Provide a dynamic way to add arbitrary
attributes to Registry Object instances. |
ExtrinsicObject |
Provide metadata that describes submitted
content whose type is not intrinsically known to the Registry and therefore
must be described by means of additional attributes. |
RegistryPackage |
Allow for grouping of logically related
Registry Object instances even if the individual member objects belong to
different submitting organizations. |
ExternalIdentifier |
Provides additional identifier information for
the Registry Object |
ExternalLink |
Used to associate content in the registry with
content that may reside outside the registry. |
Model Section: Registry Audit Trail |
|
AuditableEvent |
Describes the information model elements that
support the audit trail capability of the Registry. Provides a long-term record of events that effect a change in a
Registry Object. |
User |
User instances keep track of the identity of
the user that generated the Auditable Event. |
Organization |
Provides information on related organizations. |
Postal Address |
A simple reusable entity class that defines
attributes of a postal address. |
Telephone Number |
A simple reusable entity class that defines
attributes of a telephone number. |
Email Address |
A simple reusable entity class that defines
attributes of an email address. |
Person Name |
A simple entity class for a person’s name. |
Service |
Provides information on services, such as Web
services. |
ServiceBinding |
Registry Object instances that represent
technical information on a specific way to access a specific interface
offered by a Service instance. |
SpecificationLink |
Provides linkage between a Service Binding and
one of its technical specifications that describes how to use the service
using the Service Binding |
Model Section: Association of Registry Objects |
|
Association |
Used to define many-to-many associations among
Registry Objects in the information model. |
Model Section: Classification of Registry
Objects |
|
Classification Scheme |
The metadata that describes a registered
taxonomy. |
Classification Node |
Defines the tree structure where each node in
the tree is a Classification Node. |
Classification |
Classifies a Registry Object instance by
referencing a node defined within a particular classification scheme. |
Model Section: Security View |
|
AccessControlPolicy |
Defines the policy rules that govern access to
operations or methods performed on the Registry Object. |
Permission |
Used for authorization and access control to
Registry Objects. |
Privilege |
Controls access to a protected Registry
Object. |
Privilege Attribute |
A common base class for all types of security
attributes that are used to grant specific access control privileges. |
Role |
Roles are used to grant Privileges to
Principals. |
Group |
An aggregation of users that may have
different Roles. |
Identity |
Used to identify a person, an organization, or
software service. |
Principal |
An entity that has a set of Privilege
Attributes. |
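The classification classes in this table (a scheme, a tree of nodes, and a classification that points at one node) can be illustrated with a small sketch. The Python class and attribute names below are simplified assumptions for illustration and do not reproduce the attribute sets defined in the OASIS/ebXML Registry Information Model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ClassificationNode:
    """One node in the tree of a classification scheme."""
    code: str
    parent: Optional["ClassificationNode"] = None
    children: List["ClassificationNode"] = field(default_factory=list)

    def add_child(self, code):
        child = ClassificationNode(code=code, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Path from the root of the scheme to this node."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.code)
            node = node.parent
        return "/" + "/".join(reversed(parts))


@dataclass(eq=False)
class Classification:
    """Links one registry object to one node of a scheme."""
    classified_object_id: str
    node: ClassificationNode


media = ClassificationNode("Media")  # root node stands in for the scheme
air = media.add_child("Air")
link = Classification("urn:uuid:0001", air)
print(link.classified_object_id, link.node.path())
```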
APPENDIX C
Data Requirements for the UDDI Specification v. 3.0
Model Sections |
Descriptions |
||
Entity: Business Entity |
Top-level XML element in a business’s UDDI
entry, captures the starting set of information required by partners seeking
to locate information about a business’s services. |
||
Business Key |
Uniquely identifies the Business Entity within
the registry. |
||
Discovery URLs |
List of Uniform Resource Locators (URL) that
point to alternate, file-based service discovery mechanisms. |
||
Name |
Simple textual name for a Business Entity. |
||
Description |
Simple textual descriptive information about
the Business Entity. |
||
Contacts |
Records contact information for a person or a
job role within the Business Entity so that someone who finds the information
can make human contact for any purpose. |
||
Business Services |
Describes families of Web services provided by the Business Entity. |
||
Identifier Bag |
List of other identifiers, each valid in its
own identifier system (e.g., tax identifier or DUNS number). |
||
Category Bag |
List of business categories that each
describes a specific business aspect of the Business Entity (e.g., industry, product category or geographic
region). |
||
Signature |
May be digitally signed using XML digital
signatures. |
||
Entity: Business Service |
A grouping of a series of related Web
services that can be related to either a business process or a category of
services. |
||
Service Key |
Identifies the Business Service within the
registry. |
||
Business Key |
Identifies the Business Entity that contains the Business Service. |
||
Name |
Simple textual name for the Business Service. |
||
Description |
Simple textual descriptive information about
the Business Service. |
||
Category Bag |
List of business categories that each
describes a specific business aspect of the Business Service (e.g., industry,
product category or geographic region.) |
||
Signature |
May be digitally signed using XML digital
signatures. |
||
Entity: Binding Template |
Technical descriptions of Web services are
provided by Binding Template entities. |
||
Binding Key |
Identifies a Binding Template. |
||
Service Key |
Identifies the Business Service that contains
the Binding Template. |
||
Description |
Simple textual descriptive information about
the Binding Template. |
||
Access Point |
An attribute-qualified URI, typically a URL,
representing the network address of the Web service being described. |
||
tModel Instance Details |
List of one or more tModel Instance Info
elements. |
||
Category Bag |
List of categorizations that each describes a
specific aspect of the Binding Template (e.g., industry, product category or
geographic region.) |
||
Signature |
May be digitally signed using XML digital
signatures. |
||
Entity:
tModel |
Describing Web services in ways that are
meaningful enough to be useful during searches is an important goal of UDDI. |
||
Name |
Simple textual name for the tModel. |
||
Description |
Simple textual descriptive information about
the tModel. |
||
Overview Doc |
Used to house references to remote
descriptive information or
instructions related to the tModel. |
||
Identifier Bag |
List of logical identifiers, each valid in its
own identifier system. |
||
Category Bag |
List of categories that describe specific
aspects of the tModel (e.g., its technical type). |
||
Signature |
May be digitally signed using XML digital
signatures. |
||
Entity: Publisher Assertion Structure |
A set of Business Entity structures whose
members would like to make some of
their relationships visible in their UDDI registrations. |
||
From Key |
The first of two Business Entity instances
between which an assertion is made |
||
To Key |
The second of two Business Entity instances between
which an assertion is made. |
||
Keyed Reference |
Describes the relationship between the
Business Entity elements identified by From Key and To Key |
||
Signature |
May be digitally signed using XML digital
signatures. |
||
Entity: Operational Info Structure |
Used to convey the operational information for
the UDDI core data structures (the Business Entity, Business Service, Binding
Template and tModel structures). |
||
Created |
Information about a publishing operation is
captured whenever a UDDI core data structure is published. |
||
Modified |
The time at which the entity with which the
Operational Info is associated was created or last changed. |
||
Modified Including Children |
Contains information about how modifications
are related to each other. |
||
Node ID |
A unique key that is used to identify a node
within a UDDI registry. |
||
Authorized Name |
Provides an indication of the owner of the
data. |
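The nesting of the core UDDI structures listed above (a Business Entity containing Business Services, each containing Binding Templates) can be sketched by building a small XML fragment. The function name, the key values, and the access point URL are hypothetical, and the output is a simplified illustration that has not been validated against the UDDI schema.

```python
import xml.etree.ElementTree as ET

UDDI_NS = "urn:uddi-org:api_v3"


def qualified(tag):
    """Return the tag name qualified with the UDDI v3 API namespace."""
    return f"{{{UDDI_NS}}}{tag}"


def build_business_entity(business_key, name, service_key, service_name,
                          binding_key, access_point):
    """Build a minimal businessEntity with one service and one binding."""
    entity = ET.Element(qualified("businessEntity"),
                        {"businessKey": business_key})
    ET.SubElement(entity, qualified("name")).text = name
    services = ET.SubElement(entity, qualified("businessServices"))
    service = ET.SubElement(
        services, qualified("businessService"),
        {"serviceKey": service_key, "businessKey": business_key})
    ET.SubElement(service, qualified("name")).text = service_name
    templates = ET.SubElement(service, qualified("bindingTemplates"))
    template = ET.SubElement(
        templates, qualified("bindingTemplate"),
        {"bindingKey": binding_key, "serviceKey": service_key})
    ET.SubElement(template, qualified("accessPoint"),
                  {"useType": "endPoint"}).text = access_point
    return entity


# All keys and the URL below are made-up example values.
example = build_business_entity(
    "uddi:example.org:business", "Example Agency",
    "uddi:example.org:service", "Schema lookup",
    "uddi:example.org:binding", "http://registry.example.org/query")
print(ET.tostring(example, encoding="unicode"))
```

Each child structure carries the key of its parent, which is how the Service Key and Business Key rows in the table tie the entities together.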
APPENDIX D
Data Requirements for the ISO/IEC 11179 Part 3 Metamodel
The primary purpose of the ISO/IEC 11179-3 is to
specify the structure of a metadata registry, the basic attributes which are
required to describe metadata items, and the types of metadata items that are
administered in a registry. The basic
unit for which metadata is collected in the registry is called an administered
item and information about an administered item is recorded in an
administration record. The first table
describes the type of information included in an administration record and
includes detailed information about the data element region of the
registry. Types of administered items
along with their definitions are included in Table 2 of Appendix D.
Table 1: Administration Record and Data Element Region |
|||
Data Requirements |
Descriptions |
||
Region: Administration/Identification |
Contains information related to the
identification and registration of items submitted to the Registry. |
||
Registration Authority Identifier |
An Identifier assigned to the organization
responsible for maintaining the Registry. |
||
Language Identification |
The collection of identifiers required to
identify a language or language variation for a particular purpose. |
||
Contact |
An instance of a role of an individual or an
organization to whom an information item(s), a material object(s) and/or
person(s) can be sent to or from in a specified context. |
||
Item Identifier |
An identifier for an item. |
||
Administration Record |
A collection of administrative information for
an administered item. |
||
Region: Naming and Identification |
Manages the names and definitions of
administered items. |
||
Context |
A universe of discourse in which a name or
definition is used. |
||
Terminological Entry |
An entry containing information on
terminological units for a specific administered item within a context. |
||
Language Section |
The part of a terminological entry containing
information related to one language. |
||
Designation |
The designation of an administered item within
a context. |
||
Definition |
The definition of an administered item within
a context. |
||
Region: Classification |
The descriptive information for an arrangement
or division of objects into groups. |
||
Classification Scheme |
The descriptive information for an arrangement
or division of objects into groups based on characteristics, which the
objects have in common. |
||
Classification Scheme Item |
Item of content in a classification scheme. |
||
Classification Scheme Item Relationship |
The relationship among items in a
classification scheme. |
||
Region: Data Element |
A unit of data for which the definition,
identification, representation, and permissible values are specified by means
of a set of attributes. |
||
Data Element Concept |
A concept that can be represented in the form
of a data element, described independently of any particular representation. |
||
Value Domain |
A set of permissible values. |
||
Representation Class |
The classification of types of
representations. |
||
Data Element Example |
A representative illustration of a data
element. |
||
Data Element Derivation |
The relationship among a data element which is
derived, the rule controlling its derivation, and the data elements from
which it is derived. |
||
Derivation Rule |
The logical, mathematical, and/or other
operations specifying derivation. |
Table 2: Types of Administered Items in ISO/IEC 11179-3 |
|
Name |
Description |
Classification Scheme |
The descriptive information for an arrangement
or division of objects into groups based on characteristics, which the
objects have in common. |
Conceptual Domain |
|
Context |
A universe of discourse in which a name or
definition is used. |
Data Element |
A unit of data for which the definition,
identification, representation, and permissible values are specified by means
of a set of attributes. |
Data Element Concept |
A concept that can be represented in the form
of a data element, described independently of any particular representation. |
Object Class |
A set of ideas, abstractions, or things in the
real world that are identified with explicit boundaries and meaning and whose
properties and behavior follow the same rules. |
Property |
A characteristic common to all members of an
object class. |
Representation Class |
The classification of types of
representations. |
Value Domain |
A set of permissible values. |
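The relationship among a data element, its data element concept, and its value domain can be illustrated with a short sketch. The class and attribute names are simplified assumptions for illustration and are not the attribute names defined in the ISO/IEC 11179-3 metamodel; the sample values are made up.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ValueDomain:
    """A named set of permissible values."""
    name: str
    permissible_values: FrozenSet[str]


@dataclass(frozen=True)
class DataElementConcept:
    """An object class paired with one of its properties."""
    object_class: str
    property_name: str


@dataclass(frozen=True)
class DataElement:
    """A concept represented through a specific value domain."""
    name: str
    concept: DataElementConcept
    value_domain: ValueDomain

    def is_permissible(self, value):
        return value in self.value_domain.permissible_values


state_codes = ValueDomain("State codes", frozenset({"VA", "MD", "DC"}))
facility_state = DataElement(
    "Facility State Code",
    DataElementConcept("Facility", "State"),
    state_codes,
)
print(facility_state.is_permissible("VA"), facility_state.is_permissible("XX"))
```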
APPENDIX E
XML Registry Requirements Glossary
API - Application Programming Interface.
Business
Process - a collection of
business transactions between business partners.
B2B - Business to Business.
Classification
Scheme - A classification
scheme is an arrangement or division of objects into groups that are based on
characteristics that the objects have in common, e.g., origin, composition,
structure, application, or function.
CRM - Core Reference Model.
Dataflow - a collection of elements that passes from one
process to another.
DET - Data Exchange Templates.
Distinguished
Name - the name that is
associated with the digital certificate that is being used to authorize a
request to the registry.
Distributed
Architecture - several
registries exist and interact with a “central” XML registry.
ebXML (electronic business eXtensible Markup Language) - defines an entire
e-commerce infrastructure, of which the registry
is an integral part.
ebxmlrr
- OASIS ebXML Registry
Reference Implementation Project.
EDSC - Environmental Data Standards Council.
Information
Model - describes the types of
objects that are stored in a registry, the type of metadata recorded about the
objects, and how the information in a registry is organized.
ISO/IEC
11179 (International Standard) Metadata Registries (MDR) - the International Standard for
standardizing and registering data elements and their components so that they
can be shared and understood.
Module - the module entity represents a segment of
source code that may be used by
many programs.
NEIEN - National Environmental Information Exchange
Network, a.k.a. the "Network."
NSB - Network Steering Board.
Node - An endpoint of a link or juncture common to
two or more links in a network.
OASIS
(Organization for the Advancement of Structured Information Standards) - a nonprofit international consortium that
creates interoperable industry specifications based on public standards, such
as XML and Standard Generalized Markup Language (SGML).
Object - A passive entity that contains or receives
data, e.g., bytes, fields, files, directories, network nodes, pages, programs,
segments, words.
Parser - A program that interprets user input and
determines what to do with the input.
Peer-to-peer
network - A network in which there
is no dedicated server. Every computer can share files and peripherals with all
other computers on the network, provided that all are granted access privileges.
PKI - Public Key Infrastructure.
Registered
Object - something that an
organization wants to publish for discovery and retrieval. Registered objects may include XML tags
(elements), XML schemas, XML schema fragments, XML datatypes, namespaces,
documents, trading partner agreements, and administrative documents.
Registry - The mechanism used to register, discover, and
retrieve documents, templates, and software (i.e., objects and resources).
Registry
Client - Any user who uses the
registry to discover and retrieve an XML object.
Registration
Authority (RA) - A recognized
expert organization that is responsible for populating and maintaining the
registry.
Registration
status - Registration status will be used to
designate the object’s position in the registration (and review and approval)
lifecycle. Registration statuses will
be based upon the XML Technical Resources Group (TRG) approval statuses and
include: working draft, last call working draft, candidate recommendation, and
proposed recommendation.
Registry
client - Registry clients do not have rights to
submit or update registry content, but only have query access to discover and
access content. They have no contract
and do not require authentication to use the registry.
Registry
entry - relevant
descriptive information, or metadata,
about a registered object.
Repository - a storage facility for registered objects
with an access method that enables retrieving individual objects, perhaps with
an additional authentication and permission layer.
Responsible
Organization (RO) - the
entity responsible for coordinating XML objects in a
particular organization (e.g., a program office or a state).
RIM - Registry Information Model. The RIM describes the types of objects that
are stored in a registry, the type of metadata recorded about the objects, and
how the information in a registry is organized.
S-HTTP - Secure Hypertext Transfer Protocol.
SSL - Secure Sockets Layer.
Static
Configuration - a
configuration in which objects can be submitted, but registered objects cannot
be updated or deleted.
Submitting
Organization (SO) - An
individual or organizational element designated to identify and report data
elements suitable for registration. The
entity that originally registered an object.
SOAP - Simple Object Access Protocol. Provides a lightweight messaging format that
works with any operating system, any programming language, and any platform.
Tags - elementary objects of an XML schema. XML tags
are data identifiers enclosed in angle brackets, like this: <...>
TPA (Trading Partner Agreement) - Conditions under
which the partners will transact business.
TPP (Trading Partner Profile) - A description of the
business processes in which each
organization engages.
UDDI - Universal Description, Discovery, and
Integration.
URI - Uniform Resource Identifier.
UML - Unified Modeling Language.
UN/CEFACT - United Nations Centre for Trade Facilitation
and Electronic Business.
URN - Uniform Resource Name; a persistent, globally unique name assigned to
an object.
UUID - Universally Unique Identifier.
Version - The
version number is a value that identifies the sequence of changes in
specifications for a data item for audit trail purposes.
VPN - Virtual Private Network.
W3C - World Wide Web Consortium.
WSDL
(Web Services Description Language) document - a simple XML document containing a set of definitions for a Web
service.
XML
(eXtensible Markup Language) -
a system for defining specialized markup languages that are used to transmit
formatted data.
XML
attributes - attributes are
normally used to describe XML objects or to provide additional information
about elements.
XML
DTDs - Document Type
Definitions.
XML
Namespaces - A collection of
names, identified by a URI reference, which are used in XML documents as
element types and attribute names. In order for XML documents to be able to use
elements and attributes that have the same name but come from different
sources, there must be a way to differentiate between the markup elements that
come from the different sources.
XML schema - a document that defines the required structure of an XML document and constraints on its content.
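The XML Namespaces entry above describes why two elements with the same name must be distinguishable when they come from different sources. The following minimal sketch shows one way this looks in practice, using Python's standard xml.etree.ElementTree module. The namespace URIs, prefixes, and element content are hypothetical and were chosen only for illustration; they are not names defined by the registry.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two "name" elements from two different sources,
# each bound to its own (made-up) namespace URI.
DOC = """\
<report xmlns:fac="urn:example:facility" xmlns:org="urn:example:organization">
  <fac:name>Plant 7</fac:name>
  <org:name>State Agency</org:name>
</report>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed tag to "{namespace-URI}local-name",
# so the two "name" elements stay distinct after parsing.
tags = [child.tag for child in root]

# Prefix-to-URI map used only for lookups; the prefixes need not match
# the ones used in the document, only the URIs matter.
ns = {"fac": "urn:example:facility", "org": "urn:example:organization"}
facility_name = root.find("fac:name", ns).text
organization_name = root.find("org:name", ns).text
```

Both elements share the local name "name", yet each lookup returns only the element bound to the requested namespace URI.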