Semantic Plug and Play

Preface

This paper has been prepared as a contribution to the Joint Workshop on Standards for the Use of Models that Define the Data and Processes of Information Systems, September 1996 in Seattle, Washington.

The objective of the paper is to identify opportunities for collaboration among organizations whose charter is the development of standards for information systems based on models. In the paper I describe what is currently going on in the standards organizations I have worked in over the last few years, why it is important to business, and how the business goals can be better achieved through the development of a convergent process.

The paper is organized into the following sections:

  • Preface
  • Semantic Plug and Play -- Model-Driven, Interoperable Information Systems
  • 1. Interoperability through semantic plug and play
  • 2. Scenario for rapid interoperation
  • 3. Process of semantic plug and play
  • 4. Architecture for semantic plug and play
  • 5. Some conclusions about semantic plug and play
  • Appendices: Background for Semantic Plug and Play
  • Appendix 1: Concepts of Semantic Plug and Play
  • Appendix 2: Modeling and Modeling Techniques
  • Appendix 3: Current Standards Relevant to Semantic Plug and Play

    I have tried to show a number of relationships, similarities and differences among existing standards. To do so I appeal to a number of concepts that are relatively common to thinking about information management. Since the terminology used to express those concepts varies significantly across the global audience of the workshop, and since some members of the workshop may find some of the concepts novel, I have provided explanations of them, but moved them to Appendix 1, so as not to bog down your reading of the main message. Similarly, I have placed in Appendix 3 brief descriptions of the work of those standards in terms of those concepts.

    I have taken advantage of the HTML format in which this paper is published by incorporating "hot links" from the text to these appendices. This feature allowed me to minimize the amount of explanation required of a concept or standard in order to talk about it. I have also included URL links, where I am aware of them, to pages on the World Wide Web to provide access to further information about the various standards. I hope you find this organization useful.

    Your feedback is encouraged. Please send your comments to Jim Fulton at jfulton@atc.boeing.com.

    Semantic Plug and Play --
    Model-Driven, Interoperable Information Systems

    Boeing, along with most large corporations, uses many different computing tools to manage vast volumes of information. Competitive pressures require continuous improvement not only in the individual tools, but in their ability to share data, and by so doing to work together, to interoperate. The cost of this interoperation is high. Neither the vendor products available in the marketplace, nor Boeing's home-grown tools are designed to make agile changes in the set of tools they interact with. The task of integration is time-consuming and labor-intensive, and must be repeated with every change in the tool suite. Moreover, the unrelenting competition in the global marketplace means that changes are seemingly always upon us.

    Suppose, though, that interoperability were somehow built into the tools. How would things be different? One approach is suggested by the current marketing buzzword "Plug and Play".

    1. Interoperability through semantic plug and play

    We hear the words "Plug and Play" and they sound neat, but only real techno-weenies know what they really mean (until we're told that the system we have doesn't really support it). Taking the words at face value we can distinguish at least three kinds:

    1.1. Hardware plug and play

    The concept of "plug and play" has become a major factor in the marketing of personal computers. Customers are thrilled:

    Of course, marketeers are salivating. Especially when indications are that the concept might actually work (with the usual limitations and conditions and exceptions and exclusions from blame, all spelled out in legally unexceptionable language).

    1.2. Object plug and play

    On a larger scale, we witness the emergence of another kind of plug and play in the marketing of architectures that support the sharing of interoperable objects. SQL, CORBA, OLE, ODBC, HTML, JAVA or whatever is the latest bit of alphabet soup to leak from the ad agency's pen (now there's an anachronism) -- all promise us that we can mix and match our information systems on whatever platform we choose. Just install our operating system, buy our software development kits, and just sign here, if you please. Systems users and administrators can't wait:

    If only it worked!

    1.3. Semantic plug and play

    By semantic plug and play (SPP) I mean an architecture in which the relationships among the data manipulated by various applications are managed through the models that define that data and the operations performed upon it. Hardware plug and play promises that the physical components of an architecture can be changed and continue to function and exchange signals as necessary without excessive dependency on the user's knowledge of how to control their interrelationships. Object plug and play promises that different object managers and databases can be assembled to exchange objects as necessary without excessive dependency on the user's knowledge of how each manager controls the objects within its scope. Semantic plug and play promises that different applications can exchange specific types of objects, each with specific roles in each of several applications, without excessive dependency on the user's knowledge of how that specific data functions in those other applications. Again, these promises can be exciting both to users and to administrators:

    How would semantic plug and play work? We can imagine the following scenario:

    2. Scenario for rapid interoperation

    The new analysis application has just arrived from the vendor, and it is your job to install it. More importantly, you have to integrate it with the other applications your company uses. This new tool will analyze data from your CAD (computer-aided design) applications, comparing overall cost and quality factors, and feed the results of the analysis into your materials planning applications. This of course means that it has to be able to find the right data in your CAD tools and produce the right data for materials planning. That is, your job is to link the data needed by this tool to the data available from those other applications.

    Fortunately, according to the marketing blurbs this tool conforms to the industry standard for this kind of analysis, and will work readily with any CAD and materials planning tools that conform to their own industry standards. However, the tool also offers additional features that provide analytic capabilities that significantly improve its value to the company, and the decision has been made that those features must be implemented. But those features require data that are not included in the CAD standard, and the results are not part of the materials planning standard. So some tinkering is going to be needed.

    You begin the install process. The loading and configuring required for the tool to operate in standalone mode is straightforward. The installer detects that you have common brands both of an object-request broker (ORB) and a relational database (RDB), each with an interface that conforms to its respective standard, and asks which of these it is to use to access CAD data. You select the ORB. It asks the same question about where to put the results of the analysis, and you select the RDB. It asks whether each tool maintains its own object dictionary, and you point it instead to an integrated dictionary in the RDB. The installer examines the object definitions for the ORB and detects that the standard CAD data is available. It asks whether you want to accept the standard mappings for the CAD data, and you agree. It then asks whether you want to make use of the extended feature set. When you answer affirmatively, it informs you that additional CAD data is needed beyond what is needed in the standard. Here the tailoring begins.

    During the planning that led to the selection of this tool, the business view of the data manipulated by the tool was mapped to the business view of data used by other tools, and the engineering process was modified to assure that the right data was captured in the CAD tool. Furthermore, an approach to solving some of the implementation problems was selected: some of the data, which classifies parts in ways not defined by the standard, will go into minor text fields that the standard provides primarily for documentation purposes; the rest, numeric arrays that are a natural by-product of the engineering process, are already available in the CAD tool, but are not included in standard exchanges. These revisions to the CAD data model have already been documented in the dictionary. The installer examines the object definitions and produces a list of object types whose physical data representations are compatible with the data needed for the additional features, including the definitions you had already placed there for this purpose. Even with those definitions, however, the semantically correct mapping is not one that the installer can ferret out for itself with accuracy, but it finds and invokes a synonymy analyzer in your dictionary, and produces a set of alternative mappings. One by one you select the object types that have been allocated for these purposes.

    When you come to the minor standard fields that are being adapted to this purpose, the installer informs you that those fields can be used only if their contents are restricted more tightly than the standard permits. Since the CAD tool allows anything permitted by the standard, the installer asks if it can take over the management of those fields by replacing the modules the CAD tool uses to put data into them. You say no (under certain conditions, engineering needs to be able to put values in those fields other than what this analysis tool allows), and the installer informs you that it will not be able to perform its special analyses for a CAD design unless the values in these fields meet its requirements. It asks if it can put a wrapper around the modules that modify those fields to issue a warning when values are inserted that do not meet its requirements, and you agree. It also does a quick check of the data already collected in those fields by the CAD tool, and tells you that 83% of those fields contain data that prevents the use of the special features. It asks you if you want to install a conversion add-in to the CAD tool that will help the engineers to find those fields and reset the values where appropriate; you say yes.

    You go through a similar process for all the CAD data the tool requires, and again for all of the data produced by the tool to make it available to materials planning and other downstream applications. The effect of all of this is to map the definitions of the data manipulated by this tool to the definitions of the data already being manipulated by other applications. When you are finished with this process, the installer asks if you are ready to complete the installation. When you select "deferred installation" the installer generates the changes to the database schema needed to accommodate the storage of its results, and compiles all the linkage modules necessary to share the selected data with the other tools. It then gives you the name of the installation routine to be executed later that night when it won't conflict with users. The next day some engineers begin using the conversion add-in to enable the analysis routines; others start performing the analysis, and by week's end the materials planning people are seeing benefit from the improvements the analysis makes in their process.

    3. Process of semantic plug and play

    The insertion into the architecture of tools that are designed for semantic plug and play should follow a process very similar to the scenario described above. If we strip away the anecdotal character of the scenario and ask what such a process would really involve, we get something like the following:

    1. Explicit models of the services required and provided by each tool are mapped to the models of the enterprise:
    2. The mappings between the models are then compiled into the code that links the tool into the architecture:
    3. The tool then operates as though its view of the enterprise were the only view there is.

    As a result the tool then interoperates with all the other tools in the environment.
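
    To make the three steps above concrete, here is a minimal sketch in present-day Python; the model structures, the mapping form, and the generated linkage function are all hypothetical illustrations, not drawn from any of the standards discussed in this paper.

```python
# Hypothetical illustration of the three-step process above; the model and
# mapping structures are invented for this sketch, not taken from any standard.

# Step 1: explicit models of the services required by the tool and provided
# by the enterprise, each expressed as attribute definitions.
tool_model = {"AnalysisInput": ["part_id", "mass_kg", "material_code"]}
enterprise_model = {"Part": ["part_number", "mass", "material"]}

# The mapping relates each attribute in the tool's view to an attribute in the
# enterprise view (established by a human integrator, as in the scenario above).
mapping = {
    ("AnalysisInput", "part_id"): ("Part", "part_number"),
    ("AnalysisInput", "mass_kg"): ("Part", "mass"),
    ("AnalysisInput", "material_code"): ("Part", "material"),
}

# Step 2: "compile" the mapping into a linkage function that serves the
# tool's view from enterprise data.
def compile_linkage(mapping):
    def fetch(tool_type, enterprise_record):
        return {
            tool_attr: enterprise_record[ent_attr]
            for (t_type, tool_attr), (_, ent_attr) in mapping.items()
            if t_type == tool_type
        }
    return fetch

# Step 3: the tool operates as though its own view were the only one.
fetch = compile_linkage(mapping)
part = {"part_number": "P-100", "mass": 12.5, "material": "AL-7075"}
print(fetch("AnalysisInput", part))
# {'part_id': 'P-100', 'mass_kg': 12.5, 'material_code': 'AL-7075'}
```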

    In order for such a process to work,

    4. Architecture for semantic plug and play

    Semantic plug and play will require some significant improvements to existing technology, which will require cooperative interaction among tool users, tool vendors and industry standards organizations. It is unreasonable to expect that any but the largest of corporations (if even they) can afford to build and maintain a model integration facility of the kind described above on their own. It is also unreasonable to expect that any vendor can offer a commercially viable version of such a facility unless each of the components in the architecture conforms to an appropriate set of standards. In what follows I will describe the kinds of technologies and standards that seem to be required to achieve a semantic plug and play environment.

    4.1. Modeling languages

    The models that define the data, processing, object and knowledge services in the view of any application must be expressed in languages that conform to appropriate industry standards. For an MIF (model integration facility) to function as described above, it must be able to compare the models delivered with a tool with the models already available in its local enterprise repository. This means that tool models must be delivered in a form that the MIF can recognize and translate into the form used by local integrators.

    This does not mean that there will be a single standard modeling language. Modeling languages, as seen by the creator or user of a model, are presentations of views of information systems appropriate to different phases and tasks of the life cycles of those systems (i.e., modeling languages present views of information systems, which themselves present views of the business). It is not likely that any one model presentation language can supplant all of the modeling languages currently available. Even for a particular modeling domain, e.g., data modeling, competition among modeling tool vendors will create evolutionary pressure on the languages.

    What a tool must provide, however, is not the presentation of its models but their public interface. That is, the tool must provide a representation of the models that can be imported into the MIF and presented through whatever model presentation tools have been adopted. This requires that the tool vendor and the tool user adopt modeling languages that conform to a common standard for public interfaces.
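
    As an illustration of what such a public interface might look like, the following sketch uses a hypothetical neutral model structure (the field names and JSON serialization are my own invention, not any standard's) and shows the kind of import step an MIF might perform on it.

```python
# A minimal, hypothetical "public interface" form for a data model: a neutral
# structure that any modeling tool (EXPRESS-, CDIF- or IDEF-based) could emit
# and any model integration facility could import. The field names are
# illustrative only; no standard defines exactly this form.
import json

public_model = {
    "model": "VendorAnalysisTool",
    "entities": [
        {"name": "AnalysisInput",
         "attributes": [{"name": "part_id", "type": "STRING"},
                        {"name": "mass_kg", "type": "REAL"}]}
    ],
}

def import_model(serialized):
    """What an MIF might do on import: parse the neutral form and index
    entities by name, independent of the presentation language used to
    create the model."""
    model = json.loads(serialized)
    return {e["name"]: e for e in model["entities"]}

repository = import_model(json.dumps(public_model))
print(repository["AnalysisInput"]["attributes"][0])  # {'name': 'part_id', 'type': 'STRING'}
```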

    4.1.1. Modeling language standards

    Today if vendors wanted to provide models to their customers, they could do so through any of several standard forms:

    Unfortunately, these standards are relatively immature and are not fully implemented in the modeling tools available on the market. Moreover the standards are overlapping and non-interchangeable. A vendor who selected a CDIF-based tool could not supply models to an EXPRESS-based MIF, or vice-versa. But vendors should not have to know or care whether their customers model in CDIF or EXPRESS or IDEF, any more (now that STEP is in place) than CAD tool vendors need to know which product data managers are used by their customers.

    4.1.2. Requirements for modeling languages

    To achieve semantic plug and play in a form that allows vendors and customers relative freedom of choice over modeling tools, there needs to be a harmonization of standards for modeling languages:

    4.1.2.1. Standard public interface

    A standard public interface for modeling tools would allow tool vendors to use any standards-conformant modeling tools to define the services that their tools provide or require, and to make those models available to any customer regardless of the modeling tools used by their customers.

    4.1.2.2. Standard integrated semantics

    A standard integrated semantics for the public interface should allow the exchange of multiple interrelated views of the tool and its role in the information system. It may be convenient presently to exchange data models, processing models, object models and knowledge models as separate units of modeling information. Those are the modeling techniques currently in use, and many vendors may well be able to specify fully the services they require from and provide to an environment using only one of those models. However, complex tools require the ability to specify interconnections between the definitions in these models:

    4.1.2.3. Process-driven modeling techniques

    Standards organizations should strive to develop process-driven modeling techniques. Every STEP application protocol (AP) is driven by a consensus business process, as expressed in an application activity model (AAM). The process is defined generically so that it can be tailored to the details of any enterprise, but the process provides a context for deciding what kinds of data services must be exchanged through a STEP standard.

    Modeling techniques and languages, and the standards that codify them, seem to be driven more by religious fervor than by either science or consensus (although the recent trend toward using or incorporating formal logic into modeling languages is a hopeful sign). It is time that the work on measurable, repeatable software engineering processes, which is going on in part in ISO/IEC JTC1/SC7, be used to provide a similar foundation for modeling languages.

    It is perfectly reasonable that a vendor should promote a modeling tool as the best approach to defining the functionality of an application, because the promotion addresses the tool's visible trade-offs among presentation, functionality and performance. It is not reasonable for standards organizations to engage in competitive non-cooperation in formulating a standard for exchanging models.

    4.2. Model mapping

    A critical step in a semantic plug and play architecture is the semantic mapping of the services of one component to those of another (or to the conceptual schema that brokers the exchange of services between them). Unless an application specifically states that a service is defined in terms of an unrestricted industry standard, the mapping of the services of one application to those of another will invariably require the knowledgeable participation of experts in the applications, both in the tool and in the business processes that the tool supports.

    The objective of model mapping is to create rules by which instances of data, processing, object and knowledge services defined in one view can be linked to instances defined in another view. Such a linkage should allow those services to be physically provided by one application (the server) but accessed by another (the client). Model mapping facilities are needed whether the client handles its own interfaces to the various servers that support it, or negotiates the services it requires through a conceptual processor that hides the details of physical distribution of services among the various applications.

    Regardless of the tools or techniques used initially to define the views, the need for view mapping requires that those definitions conform to a modeling standard, which can be input to a model integration facility (MIF) in which the services required by each client application can be mapped to the services provided by the server applications. Once the mapping is accomplished, the view mapping facility should be able to generate the code that enables services requested by the client to be provided by the server.
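
    As a concrete, entirely hypothetical illustration of such instance-level linkage, the sketch below routes a client view's reads and writes through declared mapping rules to a server's representation; the class, the attribute names and the mapping form are invented for this example.

```python
# Hypothetical sketch: a client-view object whose attribute reads and writes
# are routed, through declared mapping rules, to the server's representation.
class MappedView:
    def __init__(self, server_record, attribute_map):
        # attribute_map: client attribute name -> server attribute name
        object.__setattr__(self, "_server", server_record)
        object.__setattr__(self, "_map", attribute_map)

    def __getattr__(self, name):
        return self._server[self._map[name]]

    def __setattr__(self, name, value):
        self._server[self._map[name]] = value

# The server application physically holds the data...
server_part = {"part_number": "P-100", "mass": 12.5}

# ...while the client accesses it through its own view, via the mapping.
client_view = MappedView(server_part, {"part_id": "part_number", "mass_kg": "mass"})
print(client_view.mass_kg)      # 12.5
client_view.mass_kg = 13.0      # the update lands in the server's record
print(server_part["mass"])      # 13.0
```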

    4.2.1. Model mapping standards

    At present the integration of a tool into an environment is not accomplished through model mapping, although reverse engineering a model from a tool might be used as a step in that process. Even if tools came out of the box with complete models, there are neither technology nor techniques nor standards available to use those models to support the integration process. Instead tool integration is achieved at the implementation level by writing translators or APIs that physically translate a client view into a server view. Modeling tools that might provide cost or impact analysis are not generally used to accomplish such a mapping.

    Some modeling languages, such as EXPRESS and CDIF, have capabilities that could be used to relate the objects in one model to those in another, but those capabilities were not designed for the application of modeling to semantic plug and play environments. However, ISO TC184/SC4 has done some work in this area:

    4.2.2. Requirements for model mapping

    The requirements for model-mapping languages are similar to requirements for modeling languages in general.

    4.2.2.1. Language-independent mapping languages

    In a semantic plug and play environment, semantic mapping is performed by an enterprise to map a tool into an enterprise model. The model integration facility (MIF) that supports that mapping needs to be able to import models in the format of the public modeling interface, whatever their source. Thus the MIF must be independent of the presentation language in which the models were built.

    4.2.2.2. Standard semantics for mapping languages

    Moreover, the mappings themselves are shareable information. An international enterprise that performed a mapping at one site would presumably like other sites to take advantage of that work, regardless of the commercial MIF selected at those other sites. Moreover, having built up a set of mappings from various tools over a number of years, the enterprise would like to be able to migrate those mappings to the new and improved MIF product that it purchases.

    The interchange and migration of mappings between models, as well as the models themselves, requires an industry standard for the public interface for the mappings. As with modeling languages themselves, the presentation language that an MIF provides to create and view mappings is a competitive feature of the MIF, as long as the presentation is equivalent to the public interface.

    4.2.2.3. Process-driven mapping language

    Just as I recommended above that standards for modeling languages should be driven by the requirements of the entire life cycle of software engineering, I recommend here that standards for model-mapping languages should be driven by the requirements of the software engineering processes that are sensitive to those mappings. Certainly one such process is the generation of the code to implement service exchange, a part of the semantic plug and play installation process. But this may be just one of many views of the use of those mappings. The software engineering process being examined by JTC1/SC7 should be reviewed to determine whether other processes might be sensitive to those mappings, and what other information should be associated with the mappings to serve those processes.

    4.3. Model implementation

    For semantic plug and play to work as described above, the process of mapping the models of a new tool into the enterprise models must result in the tool's interoperation with the other tools in that environment. To achieve this at minimal cost, the services required or supported by the tool, and the mechanisms by which the tool accesses or provides those services, must be implicit in the models. Furthermore, the infrastructure embedded in the MIF (model integration facility) needs to have the functionality to realize those implicit services with real ones.

    4.3.1. Model implementation standards

    Industry standards currently support three approaches that I know of for making explicit the services that are implicit in models:

    4.3.2. Requirements for model implementation

    In order to create a semantic plug and play environment, the modeling infrastructure needs to be improved in several ways supported by standards.

    4.3.2.1. Modeling language-independent compilers

    Any processes that operate on models should be implemented in terms of the standard public interface for the models, not in terms of the presentation provided to modelers. For example, it should be possible to generate the pre- and post-processors for a STEP Part 21 file format from any standard data model, whether that model be constructed using EXPRESS or any other presentation language. And the same principle holds for SQL and SDAI.

    Of course we can approximate that independence by extracting an EXPRESS file from the integrated models, and then passing that file to an EXPRESS compiler. But tools tend to gravitate toward more complex data structures that allow them to pass on to users more precise decision-making information.
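
    By way of a rough illustration of compiling from the public model form rather than from any particular presentation language, the sketch below derives a simple instance-file writer directly from a neutral model structure. The output merely imitates the flavor of a STEP Part 21 DATA section; it is not a conformant ISO 10303-21 implementation, and the model structure is the same hypothetical one used in the earlier sketches.

```python
# Hypothetical: generate a simple instance-file writer from a neutral model
# structure (regardless of whether the model was authored in EXPRESS or any
# other presentation language). The output imitates the flavor of a STEP
# Part 21 DATA section but is NOT a conformant ISO 10303-21 file.
def make_writer(model):
    order = {e["name"]: [a["name"] for a in e["attributes"]] for e in model["entities"]}

    def write(instances):
        lines = ["DATA;"]
        for i, (entity, values) in enumerate(instances, start=1):
            fields = ",".join(repr(values[a]) for a in order[entity])
            lines.append(f"#{i}={entity.upper()}({fields});")
        lines.append("ENDSEC;")
        return "\n".join(lines)

    return write

model = {"entities": [{"name": "AnalysisInput",
                       "attributes": [{"name": "part_id"}, {"name": "mass_kg"}]}]}
write = make_writer(model)
print(write([("AnalysisInput", {"part_id": "P-100", "mass_kg": 12.5})]))
# DATA;
# #1=ANALYSISINPUT('P-100',12.5);
# ENDSEC;
```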

    4.3.2.2. Integrity preserving exchange mechanisms

    There should be a standard set of integrity-preserving access functions derivable from any data model, where "integrity-preserving" here implies preservation of consistency, not necessarily preservation of completeness. The ability to restrict access to such functions is critical to protecting data in a shared environment. The primary work of the model integration facility (MIF) will be to link these access functions from the clients through the view mappings to the servers. Functions outside this standard set will normally require a controlled design process, and special protections for the data they access.
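
    A minimal sketch of what such derived access functions might look like, assuming a hypothetical constraint on a classification field (echoing the restricted text field in the scenario earlier in the paper); the field name and the allowed values are invented.

```python
# Hypothetical sketch of "integrity-preserving" access functions derived from
# a data model: the model declares a constraint, and the only write path the
# environment exposes is a generated setter that enforces it. The constraint
# and field names are invented for this example.
ALLOWED_CLASSIFICATIONS = {"STRUCTURAL", "FASTENER", "ELECTRICAL"}

def make_setter(field, allowed):
    def set_value(record, value):
        if value not in allowed:
            raise ValueError(f"{field}={value!r} violates the enterprise restriction")
        record[field] = value
    return set_value

set_classification = make_setter("user_class", ALLOWED_CLASSIFICATIONS)

part = {"part_number": "P-100", "user_class": None}
set_classification(part, "STRUCTURAL")      # accepted: consistency preserved
# set_classification(part, "misc. notes")   # would raise ValueError
```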

    One of the chief problems for a tool is the reconciliation of the integrity constraints built into the tool as purchased with those that have been established in the enterprise. An example might be a text-valued field intended to support user-classification of parts. The standard in which the field is defined might make no restriction on the values of this field, other than to specify its length, and a tool that conforms to the standard would in turn impose no restriction. But a given enterprise might well decide to impose a usage for this field that restricted it to values that could be used to pass data critical to down-stream processing. (The semantic plug and play scenario above suggested such a situation.) How does the tool learn of and implement this specialization of the field? Several possibilities need to be explored:

    It is premature at this point to speculate which of these approaches is best suited for standardization.

    4.3.2.3. Compilable model mappings

    The mapping between tool models and enterprise models specifies how services are to be exchanged among the tools used by the enterprise. For semantic plug and play to work economically, this mapping must be compilable, i.e., there need to be tools that compile the mapping into linkages between the services of a tool and services of the enterprise.

    One approach to this compilability would look something like this:

    Note that in this section I talk about compiling the mappings, not the mapping languages, although this amounts to compiling the standard public interface for the mappings.
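
    Purely as a sketch of what compiling the mappings could mean in practice, the fragment below emits the source text of a linkage function from a declared mapping and then installs it; both the mapping form and the generated function are hypothetical.

```python
# Hypothetical: emit the source of a linkage module from a declared mapping,
# rather than interpreting the mapping at run time. The mapping form is the
# same invented structure used in the earlier sketches.
mapping = {"part_id": "part_number", "mass_kg": "mass"}

def generate_linkage_source(view_name, mapping):
    body = ",\n        ".join(
        f"{client_attr!r}: record[{server_attr!r}]"
        for client_attr, server_attr in mapping.items()
    )
    return (
        f"def fetch_{view_name}(record):\n"
        f"    return {{\n        {body}\n    }}\n"
    )

source = generate_linkage_source("analysis_input", mapping)
print(source)            # the generated module text, ready to be compiled and installed
namespace = {}
exec(source, namespace)  # stand-in for the overnight "deferred installation" step
print(namespace["fetch_analysis_input"]({"part_number": "P-100", "mass": 12.5}))
# {'part_id': 'P-100', 'mass_kg': 12.5}
```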

    4.4. Application domain standards

    If a tool defines its view of the world from scratch, making no attempt to reuse publicly available definitions of the data, processing, object and knowledge services that it exchanges, even if those definitions are made available in a standard language, then the tool can hardly be said to support semantic plug and play. Even when tool developers are not extending the state of the art but only implementing concepts that everyone agrees are "implicit" in the domain, the task of resolving differences in terminology, description and details of representation for a large and highly interrelated collection of data is an extremely arduous process.

    On the other hand, when a tool implements an existing industry standard that has already been incorporated into an enterprise, then the focus of integration can be directed to the differences between the tool and the standard, i.e., to the extensions to the standard and to the unsupported features of the standard. This is a much more manageable task.

    4.4.1. Current application domain standards

    A computing tool almost invariably provides a proprietary exchange mechanism that enables some level of data sharing among its different installations. When different users select different tools, however, data sharing depends on the agreement of the vendors to adopt some sharing mechanism as a standard. For a while the exchange format used by the dominant vendor might suffice as a standard, but the cost to other vendors of keeping up with changes in this format invariably leads to a breakdown in such informal arrangements, and if market pressure is sufficient, a formal standard is sought.

    For example, the various application protocols (APs) defined in STEP (ISO 10303 -- Standard for the Exchange of Product model data) define the data requirements of a number of product data domains. STEP APs provide a working mechanism for domain-specific data sharing. Recently ISO TC184/SC4 has been exploring how the STEP APs, together with the various parts of the Parts Library standard (PLIB), can be made to support cross-domain data sharing (AP interoperability), but no long-term approach has yet been adopted.

    STEP Architecture versus a Three-Schema Architecture. On the surface the STEP architecture looks to be one in which sharing of services through views of a single source of services would be a natural consequence. Every application reference model (ARM) of every AP is mapped through an application interpreted model (AIM) to the STEP generic resource models (GRM). To the uninitiated this looks like a three-schema architecture, with an interface between a conceptual schema and several external schemas; but there are significant differences:

    STEP has developed a mechanism for data sharing among APs: an application interpreted construct (AIC) is a model that is shared among multiple AIMs in a way that allows sharing of instances among those APs. But the value of AICs, given the potential of the GRM as a true conceptual schema, is to my mind negligible!

    4.4.2. Requirements for application domain standards

    Certainly the process of developing application domain standards should proceed as rapidly as possible. Such a process has many benefits, including the following:

    To play an effective role in semantic plug and play in a business arena that allows vendors and customers relative freedom of choice over modeling tools, the standards for application domains should support, promote and exploit the convergence of standards for a modeling infrastructure.

    4.4.2.1. Convergence with modeling language standards

    A domain standard defines the format and interpretation of a public interface that enables the exchange of information between tools that implement that domain in different (or the same) enterprise. The definitions of the domain by the standard should themselves be made available through a standard public interface for modeling tools. However, this does not imply that the models that define the standard must be either created or used in a single standard model presentation language. When STEP first started participants used a variety of modeling languages - ER, IDEF, NIAM, etc. - which were basically similar but differed in details (and which tended to enflame religious disputes between modeling ideologies). More importantly, at that time there was no public interface that allowed models to be exchanged between implementations of these languages.

    I participated in a PDES (Product Data Exchange using STEP--the US activity contributing to STEP) project whose goal was to enable such an interchange of models. That project concluded that since models were in reality only sets of rules, the only adequate mechanism for exchanging models would have to be logic-based, for that was the most comprehensive formal foundation for rules. We expressed that conclusion in the Technical Report on the Semantic Unification Meta-Model, Volume 1: Semantic Unification of Static Models [ISO TC184/SC4/WG3 N175], and referred to the Semantic Unification Meta-Model as the "SUMM". I fear this was another case of the right answer and the wrong solution. We completed a technical report but no mechanism for actually converting models from one form to another. In the meantime STEP went through a shift in procedures and began modeling entirely in EXPRESS, and the market for model exchange within STEP disappeared.

    Thus was EXPRESS born of necessity, to serve both as a presentation language and as a public interface. In this dual role it serves a number of purposes:

    And this has spawned a market for EXPRESS tools, with EXPRESS-G presentations, model editing capabilities, and compilers that generate database schemas, parsers for STEP Part 21 exchange files, and (on the horizon) APIs that conform to SDAI (STEP Part 22--Standard Data Access Interface). So STEP is wedded to EXPRESS. (As with most marriages, this one is not without its marital spats and demands.)

    However, EXPRESS was not designed around the processes by which models are created, integrated, analyzed, shared, reused, adapted, and managed within a large enterprise. Much less was it designed to support an environment in which different processes require different views of models. In such a context, the use of a public interface as a presentation language begins to fail to meet user requirements. In those enterprises what are valuable in STEP (or any domain standard) are the following facts (roughly in order of importance):

    1. The standard models dictate a public interface for the domains within which enterprises need to exchange data (and it helps that there are tools that can compile the public interface form of the models into the code that will manage the public interface for the domain being modeled).
    2. Vendors of tools for a standard domain can be persuaded to
    3. The models can be obtained through a standard public interface for models.

    What is not important to an enterprise is that these models were created or are used within the enterprise in the form of EXPRESS as a presentation language. Any presentation will do, as long as it meets the needs of the processes that use and create models as the product of their activity. An integrated standard for modeling is much more likely to achieve this. As with most marriages, STEP's relationship to its modeling technique will be the stronger when it is attracted to the reality, not the image.

    4.4.2.2. Convergence with model-mapping language standards

    Similarly, no standard, including STEP, should develop its own parochial approach to mapping one model as a view of the other. The need for such mapping is generic: it applies to all tools, whether they support end-users or modelers or systems developers; it is certainly not specific to product data, or any other application domain. It should be possible to incorporate application domain standards into an enterprise with the same kind of model integration facility (MIF) that is used to incorporate new tools. That means that the model mapping language used in the development of standards for an application domain should not be specific to that application domain, but should be a generic standard.

    4.4.2.3. Convergence with standards for a three-schema architecture

    The most critical requirement for application domain standards is to provide a mechanism by which individual domains can interoperate with others. This kind of interoperation cannot be based on the assumption that the interoperating domains are all standardized within the same framework. Industry needs the ability to plug in new technology, regardless of whether it implements an application domain standard. The approach taken by STEP to render its own APs interoperable must be compatible with the standards we need to implement semantic plug and play for technology in general.

    The three-schema architecture is probably the firmest foundation that standards can give to semantic plug and play. STEP's mapping of APs to the GRM (generic resource models), especially with the introduction of AICs (application interpreted constructs), is, as we have seen, an incomplete step toward a three-schema architecture. Following this direction to its conclusion, in cooperation with the work being done on Conceptual Schema Modeling Facilities (CSMF) by JTC1/SC21/WG3, should lead to effective standards and technologies to support the three-schema architecture.

    5. Some conclusions about semantic plug and play

    Let me conclude by summarizing my suggestions for specific improvements to standards, and then tying it all together in a grand synthesis, just like they taught us to do in school.

    5.1. Summary of standards required for semantic plug and play

    To review let me briefly repeat the requirements described above for improvements in the standards for modeling and the use of models.

    1. We need the following improvements in standards for modeling languages:
    2. We need the following improvements in standards for model mapping:
    3. We need the following improvements to the infrastructure by which models are implemented:
    4. We need the following improvements to standards for application domains:

    5.2. What is really needed: cooperative progress toward standards for semantic plug and play

    You will have noticed from what I have said that I believe some existing standards don't stand up to all my theoretical biases. What you might not have noticed is that I don't fault any of the organizations for these "shortfalls". Theoretical purity prevents compromise when compromise is needed, and invariably overlooks practical needs and obstacles that an ecumenical approach can usually recognize and provide for.

    The question now is how to move toward an environment that allows sharing of services not only across STEP APs, but across any domains where an enterprise finds such sharing useful, whether those domains be standardized by STEP, by some other standard, or not at all. The primary thesis of the paper is that semantic plug and play is an achievable approach to such sharing, one that builds upon techniques pioneered by STEP, by CDIF, by SQL, by CSMF and by others.

    By way of a straw horse for consideration, let me suggest the following:

    Organizations such as TC184/SC4, which are developing application domain standards, should cooperate with organizations such as JTC1/SC7 and JTC1/SC21, which are building infrastructure standards, in the development of a joint standard for an implementable three-schema architecture, in which the relationship between application views and a single source of services (SSS) is well defined by models. Such a standard would allow individual application domain standards to be implemented on a stand-alone basis, thereby enabling semantically precise communication among specific classes of tools. But it would also allow those tools to be plugged into the semantically rich backbone of an enterprise and play a critical niche role.

    Such an architecture requires a much more extensive consensus among the cooperating organizations about what the architectural components are, how they are interrelated, how they are defined in models, what standards apply to what components, and who is responsible for what standards. This kind of consensus cannot be achieved through the casual, occasional and superficial monitoring that typifies most liaison activities. Although organizations readily agree to the establishment of liaisons, all too often the message being communicated is "We'll open our doors and let you see how we are doing things right, and then you can follow our lead and stop your inane folly." Rarely, I should say "never in my experience", has a liaison been able to trigger a profound synthesis of two standards organizations, when both had the inertia of existing standards.

    To achieve this consensus will require each organization to make a major commitment to a cooperative joint process, and to recognize that not only do they have something critical to contribute to the process, but something to gain. The process must provide a manageable, politically sensitive compromise between the needs of each organization to carry out its program to provide timely standards that meet immediate needs, while at the same time taking meaningful steps toward a convergent architecture. What can be done to initiate this process? Possibilities include

    Appendices: Background for Semantic Plug and Play

    The following sections provide background material for the above discussion of semantic plug and play:

    Most of you will be familiar enough with most of this material (more than likely enough to find some errors, or at least points of disagreement, with what is said here), but the audience for this paper is so diverse that I could not rely on an assumption that it was so familiar that it needn't be said. By way of compromise I have moved this background material here to an appendix and provided hot links to it from the text above.

    Appendix 1: Concepts of Semantic Plug and Play

    In order to describe the architectural requirements for semantic plug and play, it is useful to make a number of distinctions. Please accept my apologies if this section seems somewhat didactic. My goal is to describe how to make semantic plug and play work in a technologically heterogeneous environment. Hence I want to describe the various technological cultures that make up that environment with as little prejudice as I can muster for or against any of the technologies or paradigms.

    A-1.1. Information Services for Semantic Plug and Play

    Applications interact by providing services to one another. These services can take any of several forms: data, processing, objects or knowledge.

    A-1.1.1. Roles in the Exchange of Information Services

    When a computing architecture is built so that several concurrently operating hardware components are arranged so that one provides common services to the others, it is called a client/server architecture. It is useful to generalize this terminology somewhat to distinguish the role of applications in all the various exchanges of services that are required:

    An application can be a client for some services and a server for others. Moreover, an application might initially invoke a service from a server, and then repeatedly use the results of that service without recurrent requests to the server. It might even put those results into another application, e.g., a local database, so that it can retrieve them more efficiently.

    A-1.1.2. Mechanisms for the Exchange of Information Services

    If an application needs a service performed by some other service provider, whether the service involve data, processing, objects or knowledge, it can achieve that objective in any of several ways:

    A-1.1.3. Kinds of Information Services

    Although I have spoken of data, processing, objects or knowledge services as though they were just special cases of a general concept of service, this is really a case of generalization not specialization. The mechanisms for sharing data, processing, objects, and knowledge have for the most part evolved independently and to some extent in competition with one another. To provide an adequate view of what those services are, we need to examine each type of service individually and then look for potentially valuable generalizations.

    A-1.1.3.1. Data Services

    Data are any putative representations of facts, whether they be sentences in a natural language, labeled diagrams, or structured sequences of binary ones and zeros in a computer. The simplest and most common form of service between applications is the exchange of data. Successful exchange requires that both sender and receiver agree on the format of the data to be exchanged, and also to some criteria for determining that a particular packet of data is the "right" response to a particular request. It is not required that the sender know what the receiver will do with the data, nor that the receiver know how the sender got the data in the first place.

    Although much of the debate about data architectures today seems to be couched in terms of the relational versus the object-oriented paradigm, from the standpoint of the applications, the real question is one of deciding how best to access the data that the application shares with other applications. There seem to be three options: file exchange, database query, and database navigation.

    A-1.1.3.1.1. File Exchange

    File exchange is the oldest form of data sharing among applications, and is the simplest form to set up initially. Although applications might use the exchange format as their primary data structure, this normally works only with relatively simple applications. Normally an application will translate the exchange form into an internal data structure that allows it to provide competitively high performance.

    Reliance on file exchange as the mechanism for data sharing tends to create a one-schema architecture, with the consequent N-Squared Problem. Nonetheless, because of its simplicity it is commonly used. Part 21 of STEP defines the format of a file to exchange instances of data conforming to an EXPRESS model, and hence conforming to the specific EXPRESS models that define the STEP application protocols. It is currently the only form of data sharing within STEP that has been made an international standard, although a draft international standard is under review for another mechanism, which is mentioned below.
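
    For readers unfamiliar with the N-Squared Problem, the arithmetic is simple: with n tools exchanging data pairwise, the number of directed translators grows as n(n-1), while a shared neutral exchange format needs only about 2n (one importer and one exporter per tool). A trivial illustration:

```python
# The arithmetic behind the N-Squared Problem: pairwise translators versus
# translators to and from one neutral exchange format.
def pairwise_translators(n):
    return n * (n - 1)      # one directed translator per ordered pair of tools

def neutral_format_translators(n):
    return 2 * n            # one importer and one exporter per tool

for n in (3, 10, 30):
    print(n, pairwise_translators(n), neutral_format_translators(n))
# 3 6 6
# 10 90 20
# 30 870 60
```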

    The ripple effect of the N-Squared Problem seems largely nullified when the exchange format is standardized. Internal application data structures have to evolve in ways that allow them to continue to exchange the standard data format. Problems still remain, however, especially when a company needs to support cross-domain data sharing:

    A-1.1.3.1.2. Database Query

    Access through a query language is the hallmark of the relational database paradigm. Of course the word 'query' here is misleading, since query languages are used to add and alter data in a data base as well as to answer questions. A RDBMS (relational database management system) typically provides a query language that is a dialect of the standard for the Structured Query Language (SQL), where 'dialect' implies both additions ("enhancements") to and subtractions from ("unsupported minor features of") the standard.

    The database query approach to data access enables a single database to support data sharing among multiple applications. Databases can be designed to integrate the data from all applications, and each application can access just the data it is interested in by tailoring the queries it makes to its own purposes.

    The essential feature of the query approach to data access is that it is "set-oriented": a query retrieves (or updates) the data specified about each member of the set of objects selected in the query. Queries about individual objects are special cases of this set orientation, in which the selection criteria specify data values that are unique to the individual, i.e., key data: the classical search mechanism still examines all the rows of the table to find "all" the rows with the specified value. Of course, the only RDBMSs that survive in the marketplace are those that replace the classical search mechanism with an indexed search.

    The set-oriented approach works well when all the data required by the query can be "normalized" to a single two-dimensional table, or at least to a small number of such tables. When relationships are required between tables, the classical solution is to perform a "join" to form a virtual super-table that combines the data from all the joined tables in a highly redundant form. The classical form of the join process was extremely poor in performance, and commercial relational databases have competed in finding proprietary ways for using indexes and other techniques to support cross-table queries with acceptable performance.
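
    To ground the set-oriented point, here is a small self-contained example using an in-memory SQLite database; the tables, columns and data are invented for illustration. A single query answers for every member of the selected set, and a join relates the two tables.

```python
# Set-oriented access: one query answers for the whole selected set, and a
# join relates data across tables. Tables and data are invented for this sketch.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE part (part_number TEXT PRIMARY KEY, material TEXT);
    CREATE TABLE analysis (part_number TEXT, max_stress REAL);
    INSERT INTO part VALUES ('P-100', 'AL-7075'), ('P-101', 'TI-6AL4V');
    INSERT INTO analysis VALUES ('P-100', 310.0), ('P-101', 520.0);
""")

# A set-oriented query with a join: every aluminum part and its stress result.
rows = db.execute("""
    SELECT p.part_number, a.max_stress
    FROM part AS p JOIN analysis AS a ON a.part_number = p.part_number
    WHERE p.material = 'AL-7075'
""").fetchall()
print(rows)  # [('P-100', 310.0)]
```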

    A-1.1.3.1.3. Database Navigation

    Although query languages are commonly associated with the "old" relational database paradigm, they are not the original form of database access. When databases first emerged as mechanisms for sharing data among applications, they provided two essential components:

    The library of data access functions is known as an application programming interface (API). Such APIs are the typical form of data access in the "new" object-oriented database paradigm. Although a version of the SQL standard is being written to support access to objects in an ODBMS (object-oriented database management system), it is not yet clear that the use of a query language for ODBMSs will become as widespread as it did for RDBMSs.
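
    The following sketch contrasts that navigational, API-based style of access with the set-oriented query above; the classes and the tiny object store are hypothetical, standing in for the API an ODBMS or network database would provide.

```python
# Hypothetical sketch of navigational (API-based) access, as contrasted with
# set-oriented queries: the application fetches one object by identity and
# then follows its relationships one link at a time.
class Part:
    def __init__(self, part_number, material):
        self.part_number = part_number
        self.material = material
        self.analyses = []          # navigable relationship to Analysis objects

class Analysis:
    def __init__(self, part, max_stress):
        self.part = part
        self.max_stress = max_stress
        part.analyses.append(self)

# A minimal "API" over an object store keyed by identity.
store = {}
def get_part(part_number):
    return store[part_number]

p = Part("P-100", "AL-7075")
store[p.part_number] = p
Analysis(p, 310.0)

# Navigation: start from a known object and traverse links.
print(get_part("P-100").analyses[0].max_stress)   # 310.0
```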

    A-1.1.3.2. Processing Services

    Although theoretically each application could do all of its own processing, i.e., perform all the transformations required to meet the objectives of the business that are its responsibility, this is generally an expensive practice. Once a computing process has been designed, implemented and tested, it is likely to be less costly to reuse that process in other applications than for them to implement their own versions. Reuse can take place either at compile-time, in which case a copy of the code is inserted into the reusing application, or at run time, in which case the reusing application (the client) asks the original application (the server) to perform the process for it.

    A-1.1.3.3. Object Services

    An object, as defined by the adherents of object-oriented design and programming, is an encapsulated package of data and processing. Object services enable one application to invoke the data and processing services of an object without itself having to contain the code to perform those services. All that is required is a set of object interface services, for which there are several competing emerging standards: OLE, CORBA, ODBC.

    "Object" seems to be one of those words whose application to information processing has made them emotively non-neutral. Everyone has a strong opinion about what they mean, whether positive or negative. Advocates seem to be able to use the words in glowing ways demonstrating how obvious it is that they have a new and better way of doing things, that this is the way of the future, and that funds should be diverted from obsolete approaches to the new paradigm. Detractors object that the hyperbole for the new approach is just old wine in new bottles, i.e., new words for the same old thing, that they can do with their old techniques anything that the new approach can do, and that what little real innovation is to be found was something they were thinking about adding to their approach anyway.

    However this debate turns out, it does seem clear that object definition involves a "packaging" or "encapsulation" of data and processes that provides a kind of cohesion that is not normally found in traditional information systems. This packaging often involves implementations that presume a kind of inter-object messaging in order to be executed. Thus even if the details of object definitions are very like data and process definitions, at the very least the packaging will need to be defined in order for the semantic plug and play process to map the objects to sharable enterprise objects with any accuracy.

    A-1.1.3.4. Knowledge Services

    The work "knowledge", like the word "object" discussed above, seems to be emotively charged in its application to computing systems, as in knowledge-based engineering (KBE) or knowledge-exchange. However we react to the hype surrounding the word, there does seem to be a potential for changes in the kinds of interactions between computing systems.

    The key word that typifies a knowledge-based system and distinguishes knowledge-exchange from data exchange is "rule". Of course all software is constructed of rules; a program is always a set of rules. Ordinary programmers, not LISPers, just don't happen to use that word. But that misses the essential point. It is not that KBE systems are built with rules; it's that they exchange rules.

    Contrast a traditional exchange of geometric data between CAD (computer-aided design) tools, perhaps through STEP, and an exchange of "knowledge" between KBE tools. Where the CAD tools exchange the data needed to reconstruct in the receiving tool the geometric specification of a particular part, the KBE tools exchange rules that define a whole class of parts, from which the geometry of that particular part can be generated by specifying appropriate parameters.

    The rules to be exchanged can vary from parametric descriptions (functions that generate specific parts from particular parameters), to the programs that implement such functions, to sets of axioms that can be fed as input to an "inference engine". Usually exchanges presume a common "ontology", i.e., a predefined set of terms (data definitions), functions (executable processes) and background rules (axioms), so that what is exchanged is only the rules particular to a specific application of the ontology.

    Knowledge services therefore would include the ability to exchange, combine, verify, and execute sets of rules, to apply "inference engines" to those rules to determine consequences, and to apply them to local data to derive instances.
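
    As a hedged illustration of the difference between exchanging data and exchanging rules, the sketch below shows one part instance's explicit geometry next to a parametric rule that can regenerate the geometry of any member of a part family from its parameters; the part family and its design rules are invented for this example.

```python
# Hypothetical contrast between exchanging data and exchanging a rule.
# Data exchange: one particular bracket's geometry, as explicit values.
bracket_instance = {"length_mm": 120.0, "width_mm": 40.0, "hole_count": 4}

# Knowledge exchange: a parametric rule that generates the geometry of any
# bracket in the family from its parameters (the rule itself is what is shared).
def bracket_family(length_mm, hole_spacing_mm=30.0):
    width_mm = max(25.0, 0.3 * length_mm)           # design rule, not a stored value
    hole_count = int(length_mm // hole_spacing_mm)  # another rule
    return {"length_mm": length_mm, "width_mm": width_mm, "hole_count": hole_count}

# The receiver regenerates any member of the class by supplying parameters.
print(bracket_family(120.0))   # {'length_mm': 120.0, 'width_mm': 36.0, 'hole_count': 4}
print(bracket_family(200.0))   # {'length_mm': 200.0, 'width_mm': 60.0, 'hole_count': 6}
```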

    A-1.2. Approaches to Client/Server Architectures

    The complexity of a semantic plug and play architecture depends to a large extent on the diversity of the applications that must interoperate within it. It is inherently more difficult to support interoperability when the mix of applications is varied and changing. To bring this dimension into our discussion, it is useful to distinguish domain-specific data-sharing from cross-domain data sharing:

    Domain-specific data-sharing takes place among applications supporting users who do the same kind of thing. For example, structural engineers define the geometry of parts that make up airplanes, cars, ships, etc. The computing tools that support that activity are commonly called CAD (computer-aided design) tools. Different companies (sometimes different divisions of the same company) select different CAD tools to support this activity, and build their business processes around the tool they select. These companies (and divisions) have found that they need to exchange data between these tools because they define the parts that are exchanged. The information needed by the engineer in the company buying the part is much the same as that needed by the engineer in the selling company (i.e., of the same domain), but the different representations in the different tools have made that exchange difficult.

    Cross-domain data sharing takes place when different kinds of applications have to share a common subset of data, or to put it another way, when they work on different subsets ("views") of a logically common pool of information. For example a CAD tool might have to share data with an engineering analysis tool. The analysis tool you installed in our make-believe scenario above does not do structural design, but it uses some of the results of structural design to perform certain analyses, which the structural engineers might want to reflect in the drawings they produce to present their designs. So the CAD tool might send certain geometric attributes of parts identified by certain key data to the analysis tool, which will return, say, the values of a stress analysis for the CAD tool to display using its visualization capabilities. Although it is possible for a single vendor to support multiple domains with a tool suite integrated through a proprietary data format, it has often been found that such suites achieve integration at a significant cost in functionality, relative to comparable standalone products. Industry has tended to select tools from different vendors that do their respective jobs well, and to integrate them with some home-grown procedures and interfaces.

    A-1.2.1. Data Sharing through Views

    Cross-domain data sharing implies an environment in which most of the applications use only a portion of the data available. Indeed, for most objects, it is unlikely that any one application will use all of the data about objects of that kind. I will use the word "view" to refer to a perspective of a business that distinguishes its information requirements from those of other perspectives. For example, engineering, manufacturing, marketing, purchasing, etc., are all major perspectives, i.e., views, of a business. Each view

    It should be noted that this very approach means that one cannot expect a precise definition of what a view is: just as a view is one among many perspectives of a business, implemented to facilitate the operation of the business from that perspective, so views themselves are multi-dimensional objects that admit of multiple perspectives. Whether it is very like a snake, or very like a rope, or very like a wall, or very like a tree depends on what part of the elephant you grasp.

    A-1.2.2. Three-Schema Architecture

    The classical three-schema architecture (3SA) organized the definition of data into three kinds of schemas: external schemas, each defining the data as seen by a particular application or group of users; a single conceptual schema, defining the shared, technology-neutral meaning of all the data; and internal schemas, defining how the data is physically stored and accessed.

    The objective of these distinctions was an architecture that would enable every application, external or internal, to interact in its assigned role through a single two-way interface to the conceptual processor, which would effectively provide a virtual single source of data (SSD).

    A-1.2.2.1. One-, Two- and Three-Schema Architectures

    It was this single interface that distinguished the three-schema architecture from a one-schema architecture or a two-schema architecture:

    Assuming the technology can be made available, a semantic plug and play environment is easier to establish in a three-schema architecture. The model of a tool need be mapped only to the conceptual schema, and only one interface need be constructed.
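
    The economics are easy to see with a little arithmetic; the sketch below simply counts the interfaces that must be built and maintained under a point-to-point approach and under a shared conceptual schema.

        # Illustrative arithmetic only: how many mappings must be built and
        # maintained with and without a shared conceptual schema.

        def point_to_point_interfaces(n_tools):
            # every pair of tools needs its own two-way mapping
            return n_tools * (n_tools - 1) // 2

        def conceptual_schema_interfaces(n_tools):
            # each tool is mapped once, to the conceptual schema
            return n_tools

        for n in (3, 5, 10, 20):
            print(n, point_to_point_interfaces(n), conceptual_schema_interfaces(n))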

    A-1.2.2.2. Three Schema Architectures for All Information Services

    The distinctions among these architectures were originally defined in terms of data, but there is no reason in principle why they do not apply just as well to all information services. A conceptual schema can provide definitions not just of the data services but also of processing, object and knowledge services. With the proper directory a conceptual processor can serve as a broker for all of those services, thus creating a single virtual source of information services, or more compactly, a single source of services (SSS).

    Such a generalization tends to blur the distinction between internal and external schemas. The conceptual processor would route requests for processing, object and knowledge services to applications that in a classical three-schema architecture would be classified as external, since that is where data conversion is done. Of course that distinction has been fuzzy anyway: commercial applications, especially those implementing current computer-aided design (CAD) standards such as STEP Part 203 (Configuration Controlled Design), are increasingly not only supporting users directly in their design activities, but also providing access to the data from those activities through a STEP interface, thereby serving as both an external and an internal processor. This blurring of distinctions is no major loss to the objectives of the three-schema architecture. What is important is that a client application negotiates its service requirements through a single server (the conceptual processor) that in turn routes service requests to the original service providers.
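
    The brokering idea can be sketched as follows; the class and operation names are my own invention and do not correspond to any of the standards discussed here.

        # A minimal sketch of a conceptual processor acting as a broker: clients
        # negotiate all service requests through one interface, and the broker's
        # directory routes each request to the registered provider.

        class ConceptualProcessor:
            def __init__(self):
                self._directory = {}   # (service_kind, name) -> provider callable

            def register(self, service_kind, name, provider):
                self._directory[(service_kind, name)] = provider

            def request(self, service_kind, name, **arguments):
                provider = self._directory.get((service_kind, name))
                if provider is None:
                    raise LookupError(f"no provider registered for {service_kind}:{name}")
                return provider(**arguments)

        broker = ConceptualProcessor()
        broker.register("data", "part_weight", lambda part_id: {"PN-1001": 12.5}.get(part_id))
        broker.register("process", "convert_units", lambda value, factor: value * factor)

        print(broker.request("data", "part_weight", part_id="PN-1001"))
        print(broker.request("process", "convert_units", value=12.5, factor=2.2046))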

    A-1.2.3. Federated Applications

    Although there are a number of data management systems that claim that they can play the role of internal processor, and perhaps at least part of the role of conceptual processor, there seem to be few applications that are designed to play the role of external processor in the classical sense, i.e., few are designed to plug into a data management architecture and access the data they require through the facilities of the conceptual processor. Each, it seems, wants to have control of its physical data structures.

    There are some good reasons for this:

    Federated data architectures seek to achieve an adequate measure of data sharing without compromising the responsibility of each application in the network to meet its own users' performance and functionality requirements. In such an architecture, each application decides

    For any data that the application decides to acquire from elsewhere, the application is a data client; for any data to which the application decides to provide access for other applications, the application is a data server. Data managers might well be important data servers in this federated network, but unless they implement all of the services of a conceptual processor, they have no distinguished status as far as the network is concerned.

    The question still remains whether

    Since some data is invariably stored multiple times in a federated architecture, any federated conceptual schema will need to establish rules that define which physical copy is the master, i.e., the copy that is by default retrieved to fulfill requests for that logical data. Even more difficult is the negotiation of a process for resolving differences in changes that are made by users of different applications.
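
    A sketch of one such master-copy rule is shown below, with invented names; the harder problem of reconciling conflicting changes is deliberately not addressed.

        # For each logical data element, the federated conceptual schema records
        # which application holds the master copy; retrievals are answered from
        # that copy by default. (Conflict resolution is not shown.)

        master_copy = {
            "part.geometry": "cad_tool",
            "part.weight": "analysis_tool",
        }

        copies = {
            "cad_tool":      {"part.geometry": "outline-v7", "part.weight": 12.1},
            "analysis_tool": {"part.geometry": "outline-v6", "part.weight": 12.5},
        }

        def retrieve(logical_name):
            owner = master_copy[logical_name]      # the rule: which copy is master
            return copies[owner][logical_name]

        print(retrieve("part.geometry"))   # served from the CAD tool's copy
        print(retrieve("part.weight"))     # served from the analysis tool's copy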

    A-1.3. Representations of Views

    A schema defines the kinds of services required or provided by a tool, or (in the case of a conceptual schema) shared among a collection of tools. Schemas are commonly captured in data models, enhanced to some extent with processing, object or knowledge models. These models specify how the business objects of interest to the users of the tool are to be represented and manipulated. If we look more closely, however, we find that there are at least three kinds of representation that must be defined for a tool: the presentation through which its users see and manipulate information, the public interface through which other tools can access that information, and the internal representation in which the tool actually stores it.

    Standards are normally concerned with public interfaces. For the most part the effectiveness of a presentation is part of the competitive advantage that a tool will use in its marketing. However, there are some issues about the relationships among these representations that are of importance in defining standards for semantic plug and play.

    The symptom of these issues lies in the difference between the information in the presentation and the information in the public interface. Vendors are used to building tools that work in a standalone fashion; interfaces that allow exchange of data with other tools are usually developed (if at all) as afterthoughts, and often with the concern that to provide a full interface, i.e., a public interface that fully captures the presentation, is just an invitation for a user to convert to another tool. The result is that a tool often makes information visible in a presentation that cannot be extracted from the public interface. If that data is needed by another tool, it must be manually entered.

    This difference can actually be exacerbated by the existence of a domain standard to which the tool conforms. The vendor will provide a standard public interface that does not contain data reflecting vendor enhancements, i.e., additional data or functions that users find enticing but that are not yet part of the standard. The vendor can rightfully claim (depending on the exact wording of the standard) that to put the extensions into the public interface would render the tool non-conformant.

    Another difference occurs when information provided by the tool does not end up as expected in the presentations of the tool. For example some CASE tools have had public interfaces through which data could be imported into the tool, but the objects thus imported would not appear in the graphic presentations of the tools, unless the user went through some contortions to move those objects into the appropriate graphics. I might import an Entity definition into the tool, but it would not appear on the Entity-Relationship diagram unless I edited the diagram, added a new object and then grabbed the imported data from some other tool presentation. The tool did not have the capability of even rudimentary graphic layout from imported data.

    The ideal of semantic plug and play is that there be an equivalence between the set of presentations of a tool and its public interface. Any information a user can see in any presentation should be exportable from that presentation into the interface and importable from the interface into the presentation. Standards should be flexible enough to allow tools to supplement a standards-conformant core with data that has not yet been standardized (and might never be).
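
    The equivalence can be stated as a simple check, sketched below with invented structures: nothing visible in any presentation should be missing from the public interface, and nothing in the public interface should be impossible to present.

        # Illustrative check of the ideal stated above.

        presentations = {
            "er_diagram":  {"Entity.Part", "Entity.Assembly", "Relationship.Contains"},
            "report_view": {"Entity.Part", "Attribute.Part.Weight"},
        }

        public_interface = {"Entity.Part", "Entity.Assembly", "Relationship.Contains"}

        visible = set().union(*presentations.values())

        not_exportable = visible - public_interface    # seen, but cannot be extracted
        not_presentable = public_interface - visible   # importable, but never shown

        print("visible but not in the public interface:", not_exportable)
        print("in the public interface but never presented:", not_presentable)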

    Appendix 2: Modeling and Modeling Techniques

    In the wide world of business, the word "model" can be applied to anything from a look-alike toy to a full-scale physical mockup to an electronic abstraction used for simulation to a whole class of products. Those are all legitimate uses of the word. For our purposes here, however, I shall always (unless I specify otherwise) use the word to mean a definition of a view.

    A-2.1. What is a Model?

    The word "modeling technique" bears an ambiguity that has proved an obstacle in achieving convergence of modeling approaches. It can mean either of the following:

    By "modeling technique" I shall in this paper exclusively mean the latter of these two. More accurately, I shall mean the underlying semantics of a modeling language. In order to make effective use of a modeling technique, an enterprise must have a well-designed method or process that guides the collection of information, and that process may well be acquired from a company that specializes in systems development methodologies. But the kinds of standards being considered here are neutral with respect to modeling methods, i.e., how the data is collected, and address only the mechanisms for sharing models among different methods.

    A-2.2. Modeling Techniques

    A model captures the information needed about a component of an information system in order to make decisions about implementing it or integrating it in a way that allows it to operate and interoperate with other components of that system.

    Just as a view represents a certain perspective of the business, a model represents a certain perspective of the view. A model is a representation of the view for a particular purpose. Each model we take the time to build should assemble information about the view that we need in order to perform some task in implementing the view. Models should not be built for their own sake. No matter how well-entrenched a modeling technique is, if we do not know what we are going to do with that model, if there are no down-stream processes that need the definitions that are assembled in the model, then the model is a waste of time.

    What kinds of models do we need? That is like asking what kinds of data we need to do our business. The electronic marketplace is filled with "new" kinds of data--usually new ways of presenting data, but occasionally truly new kinds of data--that proclaim themselves to be essential to the competitive edge. A model is needed if it plays a valuable role in defining, implementing, operating or improving the aspect of the business comprehended by a view. So it is not possible to limit a priori the list of useful models.

    For the time being then we must leave aside any attempt to provide a complete list of the kinds of models that are needed, and ask only what kinds of models are currently finding significant use in defining views.

    A-2.2.1. Data Definition

    The most commonly used form of view definition is data modeling. Its objective is to define the kinds of data that can be manipulated (created, calculated, retrieved, altered or removed) by an application that supports a view. These definitions specify not only the types of data that are needed but the relationships among data of these types and the rules for manipulating the data in ways that assure that it continues to conform to the assumptions that are expected by the view. (These rules are commonly called "integrity constraints".)
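
    A toy example of what such a definition supplies is sketched below; the types, relationship and constraints are invented for the illustration.

        # A toy data definition: types of data, a relationship among them, and
        # integrity constraints that keep the data conforming to the view's rules.

        from dataclasses import dataclass

        @dataclass
        class Part:
            part_id: str
            weight_kg: float

        @dataclass
        class Assembly:
            assembly_id: str
            part_ids: list          # relationship: an assembly is made of parts

        def check_integrity(parts, assemblies):
            """Integrity constraints: weights are positive, and every part an
            assembly refers to must exist in the data collection."""
            errors = []
            known = {p.part_id for p in parts}
            errors += [f"{p.part_id}: non-positive weight" for p in parts if p.weight_kg <= 0]
            for a in assemblies:
                errors += [f"{a.assembly_id}: unknown part {pid}"
                           for pid in a.part_ids if pid not in known]
            return errors

        parts = [Part("PN-1", 2.0), Part("PN-2", -1.0)]
        assemblies = [Assembly("ASM-1", ["PN-1", "PN-3"])]
        print(check_integrity(parts, assemblies))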

    A-2.2.1.1. Data Definition: Role in Semantic Plug and Play

    In a semantic plug and play architecture, a data model for the view implemented by a tool would enable the following:

    A-2.2.1.2. Data Definition: State of the Art

    A competitive market for data modeling tools and techniques emerged in the early 80's, and led to a proliferation of alternatives, each promoting its competitive edge.

    A-2.2.1.2.1. Common Forms of Data Definition

    The most common technique of data modeling is entity-relationship attribute (ERA) modeling, of which there are many dialects:

    A-2.2.1.2.2. Distinguishing Features of Data Definition

    Other distinctions among data definition languages include their support for the following:

    The dominance of ERA modeling is easy to understand: not only is it relatively easy to teach (its graphical presentation put the essentials of the relational data model within reach of mere mortals), but vendors of CASE (computer-aided software engineering) tools were quickly able to market products that not only provided graphical editors for these models but also could generate database schemas from those models. For data bigots this was the millennium: our data-driven database environments were now themselves (meta-)data-driven. Look what you can do with them: Build a data model in your CASE tool and (depending on the tool) it will

    There were of course a few minor problems:

    For most of us who go into information systems, these are the kinds of problems that exist to be solved (for others of course they exist to be exploited). It seems quite clear that for semantic plug and play to become a reality, there must be communicable techniques for defining the semantics of the data a view needs to manipulate in order to compare that data with what is needed by other views.

    A-2.2.1.3. Data Definition: Current Standards

    The wide recognition of the importance of defining the meaning and form of data has led to a variety of standards activity on the subject. The standards described in this section are the international and US (ANSI) standards that I am familiar with.

    A-2.2.2. Process Definition

    Process modeling is another common form of view definition. Where data models define the rules imposed by a view for a permissible state of a data collection, process models define the transformations permitted by a view between those states.
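
    The complementary roles of the two kinds of model can be sketched as follows; the order states, transitions and rules are invented for the illustration.

        # "Data model": which states of the data are permissible.
        # "Process model": which transformations between those states are permitted.

        def state_is_valid(order):
            return order["quantity"] > 0 and order["status"] in {"open", "released", "closed"}

        permitted_transitions = {
            ("open", "released"),
            ("released", "closed"),
        }

        def apply_transition(order, new_status):
            if (order["status"], new_status) not in permitted_transitions:
                raise ValueError(f"transition {order['status']} -> {new_status} not permitted")
            changed = dict(order, status=new_status)
            assert state_is_valid(changed), "transition produced an invalid state"
            return changed

        order = {"quantity": 5, "status": "open"}
        order = apply_transition(order, "released")
        print(order)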

    A-2.2.2.1. Process Definition: Role in Semantic Plug and Play

    In a semantic plug and play architecture, a process model for the view implemented by a tool would enable the following:

    A-2.2.2.2. Process Definition: State of the Art

    Process modeling, as a formal technique within industry, got its start in the 1970's with the development of structured analysis and structured programming, of which there were several dialects.

    Computing and the development of computing systems were both immensely costly, and computing development was a chaotic process: chaotic in the theoretical sense that the usability of its product was highly sensitive to the accuracy of the specifications the process was given to implement, and chaotic in the more colloquial sense that no one really knew what they were doing or why. There was therefore a major need for techniques for specifying requirements for computing systems in ways that reduced the errors and consequent rework. Structured analysis was a technique for analyzing the processes to be implemented by a computing system into progressively smaller and precisely defined components with precisely specified interfaces.

    Programming theory identified some generic control structures that were common to all programming languages, and described the benefits in reusability and maintainability of restricting programs to the use of those structures; techniques such as structured programming, finite-state modeling and Petri nets were the result. So-called lower-CASE tools emerged that facilitated the construction of programs in various programming languages (usually COBOL or FORTRAN) based on those structures, sometimes through the use of a higher-order language, i.e., a process modeling language.

    Although there is a lot of hype about the desirability of "executable process models", the techniques of process modeling have not yet matured to the point where that goal is generally achievable; it has been achieved only on a limited basis in tightly constrained environments.

    A-2.2.2.2.1. Common Forms of Process Definition
    A-2.2.2.2.2. Distinguishing Features of Process Definition Techniques

    Process definition techniques vary in a number of ways, which tool vendors select and adapt to make their products more market-worthy. The dimensions of this variation include the following, and even greater variation exists among tools in the extent to which these distinctions form the basis for the code generated by a tool:

    A-2.2.2.3. Process Definition: Current Standards

    Tool vendors were quick to implement proprietary versions of structured analysis, i.e., upper-CASE process modeling. Notational schemes differed, but since these models were not used to generate code, the semantic variations were not noticed by the users.

    A-2.2.3. Object Definition

    Object-oriented systems are claimed to offer features that are not to be found in traditional systems:

    How successfully object-oriented programming and object-oriented databases will compete with other development paradigms remains to be seen. At this point it appears that OO implementations will find a substantial niche in the marketplace, but by no means will drive out all competitors.

    A-2.2.3.1. Object Definition: Role in Semantic Plug and Play

    In a semantic plug and play architecture, an object model for the view implemented by a tool would enable the following:

    A-2.2.3.2. Object Definition: State of the Art

    Is object definition really something new? Since we do not yet have a clear semantics for object models (i.e., one that is at least agreed to by the practitioners) that can be compared and contrasted with semantic definitions of data and process models, it is difficult to understand what, beyond the definition of data, processes and their relationships, the definition of objects requires in order to support the semantic plug and play of object-oriented systems with other object-oriented systems, let alone with other kinds of systems. My personal viewpoint is that the object-oriented movement did reveal some significant gaps in the modeling techniques theretofore in use and codified by being hard-coded into CASE tools, gaps that most serious practitioners had recognized and begun to find ways to correct. Any information about a business that is needed to build an OO system is also needed to build a traditional system; it just gets used in a different way. We should be able to model the business without committing ourselves to whether we are going to support its requirements by traditional systems, object-oriented systems, other paradigms or some mixture of them all. In other words, I don't believe in object-oriented analysis; the gaps can be rectified by incremental, evolutionary improvement in classical techniques. Of course I'm willing to be taught.

    A-2.2.3.3. Object Definition: Current Standards

    There are several overlapping approaches to object sharing currently being debated. The competition (and consequent hyperbole) is so severe that the standardization process is going on outside the traditional standards organizations, and suggests that the fate of certain major companies rests on their success in making their approach a de facto standard, to be blessed by some standards organization when it gets around to it. Among the competitors are

    However, all of the above standards seem to be concerned with the implementation and exchange of objects, not their definition. The following seem to be addressing techniques for modeling objects:

    A-2.2.4. Knowledge Definition

    Whatever the form of the knowledge application, in order for it to plug and play in our operating environment, it will still be necessary to compare its view of data, services and rules with those in its new environment, and to create mappings between them. This requires that we have interchangeable techniques for modeling knowledge systems.

    Knowledge definition therefore seems to be an extension beyond data and process modeling. Starting with a data model that defines the basic concepts to be expressed, a process model that defines the basic transformations of that data, and an object model that defines the packaging of data and processes into exchangeable units, knowledge definition adds the facilities that enable the exchange of rules between systems.
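
    The layering can be sketched as follows; the ontology, rules and data are invented, and the "rule evaluator" is deliberately trivial.

        # Sketch of the layering: a shared ontology supplies the terms and
        # functions, what is exchanged is only the application-specific rules,
        # and a simple evaluator applies them to local data to derive new facts.

        ontology_functions = {"greater": lambda a, b: a > b}   # shared, predefined

        # exchanged rules: (premise as (function, attribute, constant), conclusion)
        exchanged_rules = [
            (("greater", "span_m", 30.0), "requires_stress_review"),
            (("greater", "weight_kg", 500.0), "requires_lift_plan"),
        ]

        def derive(facts, rules):
            derived = set()
            for (fn_name, attribute, constant), conclusion in rules:
                if ontology_functions[fn_name](facts[attribute], constant):
                    derived.add(conclusion)
            return derived

        local_data = {"span_m": 42.0, "weight_kg": 180.0}
        print(derive(local_data, exchanged_rules))   # {'requires_stress_review'}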

    Here we are treading on relatively new ground. Aside from the interlingua project and the KIF language which sprang from it, and which is being explored by a number of standards organizations, little work has been done on what it means to exchange knowledge. Certainly no one has addressed the problem of how to achieve semantic plug and play of knowledge systems.

    A-2.2.4.1. Knowledge Definition: Role in Semantic Plug and Play

    In a semantic plug and play architecture, a definition of the knowledge managed by a knowledge-based tool would enable the following:

    A-2.2.4.2. Knowledge Definition: State of the Art

    KBE tools are being used for an ever-increasing percentage of the design process. Their ability to generate designs for whole classes of parts significantly reduces the effort required to produce those designs following the traditional one-by-one design process, and therefore significantly reduces the costs of design.

    However, KBE tools tend to be standalone products, whose interfaces to other tools, whether knowledge-based or otherwise, have to be built and maintained by using organizations.

    The approaches to knowledge definition that are implemented in these tools are proprietary, and are treated as a competitive advantage. The Knowledge Sharing Project at Stanford provides a general facility for defining ontologies in KIF for use by knowledge tools.

    A-2.2.4.3. Knowledge Definition: Current Standards

    There are no existing standards for knowledge definition. However, several organizations are exploring such standards.

    A-2.2.4.3.1. STEP Knowledge Exchange

    ISO TC184/SC4 has long identified four levels of product data sharing for STEP:

    1. File exchange, currently implemented by Part 21
    2. Shared memory, no longer being considered
    3. Data sharing, planned to be implemented by Part 22 (SDAI)
    4. Knowledge sharing

    No work has been done to date on knowledge sharing.

    A-2.2.4.3.2. STEP Parametrics

    Although no general work has been done on STEP knowledge sharing, a substantial investigation has been made of techniques for exchanging parametric part definitions, i.e., rules that enable the generation of classes of parts from sets of parameters. As yet no STEP Parts have resulted from this activity.

    A-2.2.4.3.3. KIF Knowledge Exchange

    KIF was developed with the deliberate intention of supporting the exchange of knowledge between different tools. Although it has demonstrated its ability to achieve this goal when tools are designed with KIF planned as an exchange mechanism, it is not yet clear that it will work for KBE tools currently in production. KIF is in the standardization process through ANSI X3T2.

    A-2.3. A Framework for Modeling

    John Zachman ["A Framework for Information Systems Architecture," IBM Los Angeles Scientific Center Report No. G320-27B5] proposed that models of an information system be organized into a two-dimensional matrix. One dimension showed various aspects of an information system that could be defined and managed with a high degree of independence of one another: data, processes and locations (and in some versions events, organization and motivation). The other dimension showed increasing degrees of technology-dependence, from the business view, whose specification "should" be totally independent of implementing technology, to the specific implementation of the system with specific software running on specific machines.

    Whatever we think of Zachman's framework, it does seem that there is an important difference between business-oriented, technology-independent models in the upper regions of the framework, and the computing-oriented, technology-specific models in the lower regions, a difference that is significant to the kinds of standards that are appropriate. The primary function of technology-specific models, whether they be text files containing the source code for a program, or data structures in a CASE repository, is to construct an implementation that does the job specified for it well in the selected environment. The primary "users" of these models are the compilers that convert their instructions into executable code. They are a medium of exchange between a person and a machine. Business models on the other hand serve to communicate the semantics of various components of the information system, the meaning of the data and the reasons for certain kinds of transformations. As such they are primarily devices to communicate among people. In a semantic plug and play environment, the communication of such semantics is critical to achieving interoperation.

    This difference means that it is much more important that standards for communication of semantics, i.e., for modeling languages, be interchangeable than it is for standards for programming languages. Standards are required for programming languages only to assure that a program written in a given language will be compilable on any machine that has a compiler for that language, or to enable a program in one language to invoke a program in another. There is no requirement that a compiler for one language be able to interpret a program in another language. Modeling languages are different. For semantic plug and play to work, the models that define the semantics of the view implemented in a tool must be interpretable by whatever view mapping tool a potential user might have. That means that there must be a semantic, language-independent core of the modeling tools that they all share and that they can use to communicate the specific semantic assumptions of a view.

    Zachman's framework was found by many to provide a useful starting point for discussion of the role of models in the management of information systems, but it never evolved into an implementable architecture, perhaps because its scope was too big for any one vendor to comprehend, and because for industry to implement it by assembling off-the-shelf components would require precisely the kind of semantic plug and play for modeling tools that we are talking about here for the information systems modeled by those tools.

    Appendix 3: Current Standards Relevant to Semantic Plug and Play

    The standards organizations included in this section are those that seem appropriate to participate in the Joint Workshop on Standards for the Use of Models that Define the Data and Processes of Information Systems. That is, they develop standards for the definition, management or use of models. Comments and descriptions are mine, based on my experience with these organizations, but are sometimes drawn from "official" sources. Where available, I have included the URL of a Web site to provide access to an organization's own perspective of itself.

    A-3.1. ANSI X3T2

    ANSI X3T2 is the organization chartered to develop US standards for communications and to represent the US in ISO/IEC JTC1/SC7.

    A-3.1.1. CSMF (Conceptual Schema Modeling Facilities)

    One of the primary focuses of attention of X3T2 over the past few years has been the US position on the CSMF standard being developed by JTC1/SC21/WG3. The US has promoted a logic-based approach to conceptual schema modeling as the only way to adequately enable the integration of databases and knowledge bases into an integrated conceptual schema. To that end X3T2 is in the process of developing US standards on two languages based on logic, preparing to champion them through the process of international standardization, and promoting their application to the needs of international standards.

    A-3.1.1.1. KIF

    KIF (Knowledge Interchange Format) was developed at Stanford University as part of the Knowledge Sharing Project under contract with ARPA (Advanced Research Projects Agency). Its objective was to provide a means to exchange "knowledge" among "knowledge-based engineering" tools. A working draft is currently under review by ANSI X3T2 for US standardization.

    KIF is a character-based form of formal logic, i.e., the first-order predicate calculus with identity and functions. As such it inherits all of the formal characteristics of the predicate calculus. It carries the classical text-book definitions of its semantics and the classical axiom sets as defaults, but users are free to tailor the axiom sets or the semantics to meet their needs.

    KIF was developed with the deliberate intention of supporting the exchange of knowledge between different tools. Although it has demonstrated its ability to achieve this goal when tools are designed with KIF planned as an exchange mechanism, it is not yet clear that it will work for KBE tools currently in production.

    KIF is in the standardization process through ANSI X3T2, and is currently being explored by several international standards organizations:

    A-3.1.1.2. Conceptual Graphs

    Conceptual Graphs (CG) is a modeling technique developed by John Sowa. It is intended to provide a graphic form of presentation of formal logic, based on work by Charles Sanders Peirce.

    CG is currently under review for standardization by ANSI X3T2, and is being proposed to ISO/IEC JTC1/SC21/WG3 as a standard graphic visualization language for CSMF. X3T2 is deliberately defining the initial core KIF and CG standards as formally equivalent alternative representations of the predicate calculus.

    A-3.2. EIA (Electronic Industries Association)

    EIA is an ANSI-certified standards organization, with principal focus on electronic data standards, such as EDIF (Electronic Design Interchange Format). Because of the apparent similarity between circuit diagrams and graphic CASE models, vendors and users of CASE products explored the possible application of EDIF to the exchange of data between CASE tools. The effort failed since the similarity in the graphic structure of the two domains was not matched by a corresponding similarity in the semantics of those graphs, and the functions that CASE tools needed to perform were based primarily on the semantics. However, the group of vendors and users rechanneled their efforts into a new standard within EIA, namely CDIF.

    A-3.2.1. CDIF (CASE Data Interchange Format)

    CDIF is an interim standard for the exchange of data between tools used in software engineering, i.e., in the specification, design and implementation of information systems with software. CDIF is being reviewed for international standardization by ISO/IEC JTC1/SC7.

    CASE tools commonly provide a family of modeling techniques, i.e., representations of different aspects of information systems (entity-relationship diagrams, data-flow diagrams, state-transition diagrams, etc.). Thus CDIF was built as a family of standards which includes several modeling techniques, each defined as a subject area. These subject areas currently include Foundation and Common (both of which are automatically included in every other subject area), Data Definition, Data Modeling and Data Flow. A State-Event subject area has been released as a working draft, and work is beginning on an Object Modeling subject area. Each of these subject areas is being defined as an instance of a meta-meta-model defined in the Framework document. There is a standard encoding based on this meta-meta-model for the exchange both of subject areas and of the models that instantiate the subject areas.

    A-3.2.1.1. CDIF Framework

    The family resemblance among the various CDIF subject areas is the result of a common framework that defines the structure and concepts to be used in defining a subject area. If each subject area is a meta-model, whose instances are models of a certain kind, then the framework defines a meta-meta-model, whose instances are the subject areas themselves. The CDIF exchange format is defined in terms of the meta-meta-model and is inherited by each of the subject areas.
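
    The levels can be illustrated as follows; the structures shown are my own simplification, not the actual CDIF meta-meta-model.

        # Illustration of the levels only: the framework's meta-meta-model defines
        # what a subject area may contain, a subject area (a meta-model) defines a
        # kind of model, and a model instantiates the subject area.

        meta_meta_model = {"MetaEntity", "MetaAttribute", "MetaRelationship"}

        data_modeling_subject_area = {        # a meta-model: instance of the level above
            "MetaEntity": ["Entity", "Attribute"],
            "MetaRelationship": ["Entity.Has.Attribute"],
        }

        a_model = {                           # a model: instance of the subject area
            "Entity": ["Part", "Assembly"],
            "Attribute": ["Part.Weight"],
            "Entity.Has.Attribute": [("Part", "Part.Weight")],
        }

        # every construct used in the subject area must come from the meta-meta-model
        assert set(data_modeling_subject_area) <= meta_meta_model
        print(a_model["Entity"])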

    A-3.2.1.2. CDIF Data Definition Subject Area

    The CDIF Data Definition (DDEF) subject area enables the definition of the types of data that a computer can distinguish as elementary values of the data defined in models of data and data flows. This subject area was developed separately from the Data Modeling subject area in order that it could be used separately by subject areas that needed to define data without using the whole of DM.

    A-3.2.1.3. CDIF Data Modeling Subject Areas

    The CDIF Data Modeling (DM) subject area supports traditional CASE techniques of data modeling. By the very nature of the process by which vendors of different tools came to agree on the standard, DM provides a rich and flexible meta-model for integrating data models. It is an ERA form of modeling which

    Although each of the features of the subject area is used by one tool or another, few tools will take advantage of all of those features. It remains to be seen whether a process can be put in place to exchange models through CDIF in a way that assures that models can be "round-tripped", i.e., sent from one tool through CDIF and back, without loss of information.
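
    One way to picture the round-trip problem is sketched below, with invented structures: whatever the interchange meta-model does not carry is lost on the way back.

        # Hypothetical round-trip check: export carries only the features the
        # interchange meta-model knows about; re-import reveals what was lost.

        interchange_features = {"entities", "attributes", "relationships"}

        def export(model):
            return {k: v for k, v in model.items() if k in interchange_features}

        def reimport(exchanged):
            return dict(exchanged)   # the receiving tool sees only what was carried

        tool_a_model = {
            "entities": ["Part"],
            "attributes": ["Part.Weight"],
            "relationships": [],
            "diagram_layout": {"Part": (120, 80)},   # tool-specific feature
        }

        round_tripped = reimport(export(tool_a_model))
        lost = set(tool_a_model) - set(round_tripped)
        print("lost in the round trip:", lost)       # {'diagram_layout'}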

    A-3.2.1.4. CDIF Data Flow Subject Area

    The CDIF Data Flow subject area is intended to enable the exchange of models which implement the various flavors of structured analysis, including Yourdon, SADT and IDEF0. It supports such features of data flow models as

    A-3.2.1.5. CDIF State-Event Subject Area

    CDIF also includes a preliminary State-Event subject area, which is intended to enable the exchange of models which implement the various flavors of finite-state process modeling, including state-transition diagrams and Petri nets.

    A-3.2.1.6. CDIF Object Modeling Subject Area

    EIA is beginning work on an Object Modeling subject area for CDIF. An approach to object definition, based on work by Rumbaugh and Booch, seems to be emerging from OMG, and is being explored by EIA for this purpose.

    A-3.3. IEEE (Institute of Electrical and Electronics Engineers) Computer Society

    The IEEE Computer Society (IEEE CS) is an ANSI-certified organization chartered to develop US computing standards.

    A-3.3.1. IEEE IDEF (Integrated Definition Language)

    IDEF (nee ICAM Definition Language) is a family of languages for modeling information systems. It was originally developed under the auspices of the US Air Force Integrated Computer-Aided Manufacturing (ICAM) project. It became a FIPS (Federal Information Processing Standard) and was managed by the IDEF Users Group. Recently projects have been initiated with IEEE CS to develop US standards for two languages in that family, IDEF0 and IDEF1X. It is anticipated that upon completion of the ANSI standardization process, these languages will be proposed to ISO/IEC JTC1/SC7 for international standardization.

    Part of the IEEE IDEF standards is an exchange language (public interface) called IDL, a somewhat unfortunate selection of a name since CORBA also uses the name IDL for its public interface.

    A-3.3.2. IEEE P1175

    P1175 is an IEEE standard that has many similarities to EIA CDIF. It is included here for completeness, but I am not sufficiently well-versed in it to provide any details.

    A-3.4. ISO TC184/SC4: Industrial Data

    SC4 is chartered with the development of international standards for industrial data. It is currently working on three such standards: STEP (STandard for the Exchange of Product model data), PLIB (Parts Library) and MANDATE (Manufacturing Data). Working documents of SC4 are available through SOLIS (STEP On-Line Information Service); International Standards (IS) and Draft International Standards (DIS) are copyrighted by and are available for sale through ISO.

    A-3.4.1. STEP (ISO 10303: STandard for the Exchange of Product model data)

    STEP is the most widely known of the SC4 standards, and so far the only one to have achieved publication as an International Standard.

    A-3.4.1.1. EXPRESS (ISO 10303-11)

    EXPRESS is the language used by ISO TC184/SC4 to define STEP. It is being used in an increasing number of applications outside of STEP.

    A-3.4.1.2. STEP File Exchange (ISO IS 10303-21)

    STEP Part 21 defines the mechanism for exchanging instances of the data defined in an EXPRESS model. (Unlike most other major Parts of STEP, Part 21 does not have a convenient, catchy acronym.) Part 21 has been the basis for tool vendors to write import and export functions that allow their tools to "speak STEP" in the form of exchangeable files. To my knowledge this is the first industry standard for a model-driven form of data exchange.

    A-3.4.1.3. SDAI (ISO DIS 10303-22: Standard Data Access Interface)

    SDAI is a standard for an application programming interface (API) that provides data manipulation functions for instances of data defined in an EXPRESS model. SDAI is intended as the basis for tool vendors to "speak STEP" to a shared database. Part 22 defines an abstract API which must be combined with a binding to a particular language. The language bindings currently under development are Part 23 (C++) and Part 24 (CORBA IDL).
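
    The architectural idea, an abstract access interface plus a concrete binding, can be sketched as follows; the operation names are invented and are not those defined in Part 22.

        # Illustration of the idea only: an abstract data-access interface and one
        # concrete binding of it. The names here are not the actual SDAI operations.

        from abc import ABC, abstractmethod

        class AbstractModelAccess(ABC):
            """The abstract interface: what operations exist, independent of
            language binding or storage."""
            @abstractmethod
            def create_instance(self, entity_type): ...
            @abstractmethod
            def put_attribute(self, instance, name, value): ...
            @abstractmethod
            def get_attribute(self, instance, name): ...

        class InMemoryBinding(AbstractModelAccess):
            """One concrete binding of the abstract interface."""
            def __init__(self):
                self._instances = []
            def create_instance(self, entity_type):
                self._instances.append({"type": entity_type})
                return self._instances[-1]
            def put_attribute(self, instance, name, value):
                instance[name] = value
            def get_attribute(self, instance, name):
                return instance[name]

        access = InMemoryBinding()
        part = access.create_instance("part")
        access.put_attribute(part, "weight_kg", 12.5)
        print(access.get_attribute(part, "weight_kg"))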

    A-3.4.1.4. STEP Generic Resources (ISO 10303-4x)

    The "40-Series" of STEP parts, i.e., ISO 10303-41 through -49, define the Generic Resource Models (GRM) of STEP. These are STEP schemas, defined in EXPRESS, that are not intended for direct implementation, but are to be reused in the development of implementable application protocols.

    A-3.4.1.5. STEP Application Protocols (ISO 10303-2xx)

    STEP is implemented as a collection of application protocols (APs), for example, STEP Part 203 (Configuration Controlled Design) and the other STEP Parts in the 200-series. Each AP defines a particular domain of sharable product data. An application protocol itself consists of several components:

    A-3.4.1.6. STEP Knowledge Exchange

    TC184/SC4 has long identified four levels of product data sharing for STEP:

    1. File exchange, currently implemented by Part 21
    2. Shared memory, no longer being considered
    3. Data sharing, planned to be implemented by Part 22 (SDAI)
    4. Knowledge exchange

    No work has been done to date on developing a standard for knowledge exchange.

    A-3.4.1.7. STEP Parametrics

    Although no general work has been done on STEP knowledge sharing, a substantial investigation has been made of techniques for exchanging parametric part definitions, i.e., rules that enable the generation of classes of parts from sets of parameters. As yet no STEP Parts have resulted from this activity.

    A-3.4.2. PLIB (ISO DIS 13584: Parts Library)

    PLIB (Parts Library) is another standard of ISO TC184/SC4, developed under WG2. Although it is intended to use as much of the STEP architecture and models as possible, it is developing facilities for including definitions of the concepts used to describe the parts in standard parts libraries, even when those descriptions do not conform to STEP. Thus PLIB will encapsulate semantic definitions in whatever form they might take, rather than develop a technique for specifying those semantics.

    A-3.5. ISO/IEC JTC1/SC7: Software Engineering

    JTC1/SC7 is chartered to develop international standards for software engineering. Among the subjects currently being addressed by its various working groups are the following:

    In practice it has been WG11 that has been concerned with standards for the exchange of CASE models. It is they who are reviewing CDIF as an international standard for this purpose, as a means of completing their Project 7.28 -- Software Engineering Data Description and Interchange (SEDDI).

    Because of its awareness that other organizations are also developing standards for CASE modeling, WG11 has established a number of liaisons:

    A-3.6. ISO/IEC JTC1/SC14: Data Elements

    ISO/IEC WD 11179, Specification and Standardization of Data Elements, is a working draft of a standard for the attribution, classification, definition, naming, identification and registration of data elements. This standard is being developed by ISO/IEC JTC1/SC14, with US input from ANSI X3L8.

    A-3.7. ISO/IEC JTC1/SC21: OSI

    A-3.7.1. ISO/IEC JTC1/SC21/WG3: Database

    A-3.7.1.1. SQL (Structured Query Language)

    SQL is a database query language based on the relational model of database structures. It is declarative in nature, in that a statement in SQL specifies what data is to be retrieved, added or updated, but leaves it up to the database implementation to determine both how to physically store data and how to execute SQL statements against that structure.
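
    The declarative character can be seen in a small runnable example; I use Python's built-in sqlite3 module here simply as a convenient stand-in for a relational DBMS, and the table and data are invented.

        # The SQL states what is wanted; the engine decides how to store and
        # retrieve it.

        import sqlite3

        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE part (part_id TEXT PRIMARY KEY, weight_kg REAL)")
        db.executemany("INSERT INTO part VALUES (?, ?)",
                       [("PN-1", 12.5), ("PN-2", 3.2), ("PN-3", 40.0)])

        # 'What', not 'how': no access paths, storage structures or loops are specified.
        for row in db.execute("SELECT part_id, weight_kg FROM part WHERE weight_kg > 10"):
            print(row)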

    There is an international standard for SQL, the most recent version of which (SQL-92) is defined in ISO/IEC 9075:1992 and was developed by ISO/IEC JTC1/SC21/WG3. Most commercial relational database management systems (RDBMS) provide a dialect based on that standard, which usually offers some alluring extensions and also fails to implement some "non-essential" features.

    A version of the SQL standard is under development that is supposed to provide support for object-oriented applications.

    A-3.7.1.2. CSMF (Conceptual Schema Modeling Facilities)

    CSMF (Conceptual Schema Modeling Facilities) is a standard being developed by ISO/IEC JTC1/SC21/WG3 for defining conceptual schemas. Because WG3 is also responsible for the SQL data definition and manipulation language standard, the role of that language in the CSMF standard is a major consideration. Also under consideration are logic-based modeling languages, such as Conceptual Graphs and KIF, in order that the CSMF standard be adequate to emerging disciplines such as knowledge-based engineering.

    A-3.7.1.3. IRDS (Information Resource Dictionary System)

    IRDS is an industry standard for a repository to support the sharing of models of information systems.

    A-3.8. ISO/IEC JTC1/SC22

    A-3.8.1. ISO/IEC JTC1/SC22/WG2: PCTE (Portable Common Tool Environment)

    A-3.9. UN/EDIFACT

    A-3.9.1. Basic Semantic Repository (BSR)

    The Basic Semantic Repository (BSR) is being developed by the UN/EDIFACT committee to provide a library of terminology that can be used with minimum ambiguity in electronic commerce. An element of that library is referred to as a Basic Semantic Unit (BSU).

    A-3.10. Object Database Consortia and Standards

    There are several overlapping approaches to object sharing currently being debated. The competition (and consequent hyperbole) is so severe that the standardization process is going on outside the traditional standards organizations, and suggests that the fate of certain major companies rests on their success in making their approach a de facto standard, to be blessed by some standards organization when it gets around to it. Among the competitors are

    A-3.10.1. CORBA (Common Object Request Broker Architecture)

    CORBA is a commercial standard, developed by the OMG (Object Management Group) consortium, for an Object Request Broker (ORB) that will transmit messages between objects, regardless of the manner of their implementation, and thereby enable interoperability among any applications that conform to the standard, so long as those applications agree on the semantics of the objects being interchanged. The language defined by CORBA for expressing these messages is IDL, which is different from the language of the same name for the IDEF public interface.

    A-3.10.2. OLE (Object Linking and Embedding)

    OLE is a mechanism promoted and implemented by Microsoft to enable the assembly of complex documents whose components are created and maintained by different applications. OLE allows a document developer to collect data in a database application, link that data into a spreadsheet for analysis, and link the spreadsheet into a document in a word processor for editing of explanatory text, along with graphics linked from yet another application, which may itself be linked to the spreadsheet.

    Current implementations of OLE tend to be "coarse-grained" in that the objects that are embedded or linked are usually whole documents, tables, pictures, or large, named components of such objects. OLE does not currently facilitate linking individual data items or query results from databases.

    A-3.10.3. ODBC (Open Database Connectivity)

    ODBC is a standard for a set of interfaces that enables the exchange of data based on the SQL query language. ODBC is the result of collaboration between X/Open and the SQL Access Group. ODBC enables an application running on one machine to talk to data servers on other machines through drivers that handle the linkage between the SQL statements executed on the client and the facilities of the servers.

