Federal XML Working Group

 

Wednesday, September 10, 2003 Meeting Minutes

 

State Plaza Hotel

2117 E Street, N.W.

Washington DC 20037

 

Please send all comments or corrections to these minutes to Glenn Little at glittle@lmi.org.

 

Mr. Owen Ambur:  I think most everyone here today knows each other but for the benefit of Norm Walsh, who’s on the telecon, let’s go ahead and introduce ourselves. I’m Owen Ambur, chairman of the Working Group, and I have some brief announcements:

 

1.      First, Lee Ellis has become the GSA co-chair of this group. We’re looking into becoming a Community of Practice, as opposed to a Working Group as in the past.  I think most of you know that Karen Evans has been named to the position at OMB [U.S. Office of Management and Budget, http://www.whitehouse.gov/omb/] that Mark Forman vacated [Administrator for IT and E-Government]. That’s of particular interest to us because when she was named vice chair of the CIO Council last year, she issued a statement of vision for the Council that expressly referenced XML.  So if any of you have not seen it, I have a copy you can look at.  [Editor’s note:  It is available online at http://www.cio.gov/documents/karen_memo_12_17_02.pdf ]

2.      I think most of you know about the XML Authoring and Editing Tool Forum on September 29, which we are cosponsoring in cooperation with the DC XML Users Group and Booz Allen Hamilton. Though the program has not been posted yet, as of yesterday 120 folks had already registered, and the facility only holds 150.  So if you have any colleagues who may be interested, I encourage you to have them register as soon as possible via the link in the “What’s New” section of the XML.gov home page [http://xml.gov/index.asp#new].  [Editor’s note: The forum has been moved to the Key Bridge Marriott.]

3.      Last, on the 22nd of September, I’m working with Martin Smith at the Department of Homeland Security to schedule an expert panel discussion on XML metadata high-level design issues.  A group of experts is being invited to discuss high-level issues that agencies should have in mind as they begin to implement XML.  It’s scheduled to be held at GSA.  I don’t know the size of the room, but when I agreed to have the XML Working Group cosponsor it, I insisted that a large enough facility be secured to accommodate some of our stakeholders.  [Editor’s note:  This event was moved to FEMA.]

 

[Introductions]

 

Mr. Ambur:  Alright, Norm [Walsh], I know you’re on the line. Is there anyone else?

 

[Brand Niemann and Marc Le Maitre introduced themselves.]

 

Mr. Ambur:  Marc, do you have anything new to tell us about XRI [OASIS Extensible Resource Identifier TC, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xri] and identity theft?

 

Mr. Le Maitre:  Not at this time.

 

Mr. Ambur:  Well, if that’s all the introductions and announcements…

 

Mr. John Kane:  What’s Marion’s [Marion Royal of GSA] position to be, with this committee, if he’s no longer co-chair?

 

Mr. Ambur:  Lee Ellis is here. Lee, would you like to take a moment to shed some light on that?

 

Mr. Lee Ellis:  My name is Lee Ellis. I’m the new co-chair. I was appointed by GSA ‘Electronic Government’ to replace Marion in this position. We’ll be getting a new website up soon—in a week or two. It’ll be set up to be a mirror image of the current one, but will reflect that we’re a user group not associated with the CIO Council. We’re now linked to the [GSA] Office of Electronic Government and Technology. They’ll do the support we’re accustomed to. I’ve accepted the appointment that will replace Marion, but he’ll still work and collaborate with us.  The reason for this is that we view it as a new chapter.

 

Two contract RFPs [Requests for Proposal] were just let this past week:

1.      For a consolidated component registry, intended also to introduce a collaborative environment to develop and register components, schemas, documents, and other online endeavors. The registry is intended to grow and evolve as the technology progresses. It’s intended to establish it as a platform for federal government work.

2.      Second, we let a contract for an E-initiatives platform called BGeForms [Business Gateway e-Forms Project], co-sponsored with the Small Business Administration, to be a gateway for communication with small business. It will include a platform for registration and for XML-enabled electronic forms, all using a Tamino native XML database. It’s viewed as an opportunity for agencies to fulfill GPEA requirements. If any agencies have a problem, they need to speak up now. We’ll institute a way to get forms in, to hold them in a registry and be filled out, and in the future, to be auto-processed to the agency where they belong. In the future, under other contracts, as we grow, it will evolve into a full E-Form XML business gateway; a collaborative environment for XML, for sharing components, for records management. It should fit into everything.

 

Mr. Ambur: That’s exciting stuff -- because one of the reasons this group was formed in 2000 was to pursue the potential to render all .gov forms in XML and gather the data from them in XML.  So it’s very exciting for me, personally, to see that objective coming to fruition.

 

Mr. Ellis: Ken Sall is working with GSA and SiloSmashers to do schemas for e-travel—getting e-authentication online. There are six now?

 

Mr. Ken Sall:  When I last tracked them. Some are on the Fenestra website [http://www.fenestra.com/eforms/], as well as WebServices.gov [http://web-services.gov/]. 

 

Mr. Ambur:  There’s a link to that site from the ‘What’s Important’ section on the xml.gov home page. [http://xml.gov/index.asp#important]

 

Mr. Ellis:  We also have an electronic tools form we’ll integrate into it. This is “proprietary agnostic,” so it will accommodate all the technologies that are there, so we’ve had Sun [Microsystems, http://www.sun.com/index.xml] and various vendors work with us to try to refine these architectural component problems and achieve resolution. We’ve tried to mitigate as many problems as possible, and technical situations that preclude any one vendor or portion of the citizenry. We also want to make it 508 [Section 508 of the Rehabilitation Act] compliant.

 

Mr. Kane:  Is Marion still the Contracting Officer for projects he’s been involved in until now?

 

Mr. Ellis:  I work with Marion. He’s usually the Program Manager, and I’m the Operations Manager.

 

Mr. Kane:  How does this consolidated registry contract fit in with the Yellow Dragon NIST contract?

 

Mr. Ellis:  We have a memo of understanding to coordinate with them. This is the second level of that. We had to open the bid to everyone. Instead of a pilot, this is the ‘do’ portion of what we’ve been meeting about over the last two years—putting something in place. It’s not a pilot; it’s the operation of the E-Gov Consolidation Registry.

 

Mr. Kane:  Is Yellow Dragon still a pilot?

 

Mr. Ellis:  Still a pilot. As we evolve this, we can still do as much as possible with the pilot registry.

 

Mr. Kane:  If we have a schema on the pilot, should we move it somewhere else?

 

Mr. Ellis:  Yes.

 

Mr. Houser:  The definition of “component”…are we using the FEA [Federal Enterprise Architecture, http://www.feapmo.gov/] definition of ‘component’?

 

Mr. Ellis:  Yes. I also work with the Architectural Component subcommittee, in which Marion is still active. That would be the link. Until recently, we haven’t knuckled down to ‘What is a component?’ We’ve broken them down into two arenas: technical and service components. For instance, Pay.gov would be considered a service component. A technical component would be more schema-based, or something you can drop into your code. We’ll try to have a registry where you can break these down into categories of usage.

 

Mr. Houser:  Maybe service components that are complex, like designation of benefits based upon income verification. Is that in your scope?

 

Mr. Ellis:  It’s probably not refined at this point down to that level. The first portion sets the platform as a collaboration zone. It’ll allow us to communicate as the entire federal government, and to share…let you log into your particular group or someone else’s group as a visitor or general user. We’re going incrementally at first to set the environment, set the registries, using the pilots to build a platform and build a working business model that will encompass the whole federal government, but be usable for your particular section.

 

One of the problems we tried to overcome by utilizing a collaboration zone is also to utilize roles and responsibilities, because some key management positions don’t have technical expertise that developers would utilize. So, we have those from super-technical to the managerial side, so as we evolve, the platform will evolve as well. One of the reasons why E-Forms was chosen was because on one contract, they have 170 business cases at OMB for E-Forms for the federal government. If we only have 28 executive agencies and 80 small bureaus, that means that some agencies are putting in two and three business cases for forms, so we’re combining them into one consolidated effort.

 

Mr. Houser:  Could you send this out specifically to the list [XML Working Group listserve] so I can forward it to my forms people?

 

Mr. Ellis:  Sure.

 

Mr. Kane:  What’s the timeline for implementation?

 

Mr. Ellis:  We hope to have the collaboration zone operating by the end of the calendar year.

 

Mr. Houser:  In the forms solutions, what vendors are participating?

 

Mr. Ellis:  All the major forms players are involved in some fashion. Only five vendors actually came in on the original platform. The first portion was to set up the gateway—I think it was Software AG, SETA, SAIC…I’m not sure whether Mitretek was one.

 

Mr. Houser:  What you’re talking about are integrators and developers. I’m talking about forms vendors like Adobe and Microsoft.

 

Mr. Ellis:  I don’t think any of them were participating solely.

 

Mr. Ambur:  The intent is to enable any eForms software vendor to interact with the XML registry to obtain the necessary XML schemas and implement them in their products, so that folks are free to use any software tool they wish to provide the valid XML instance documents required to do business with any government agency.

 

Mr. Houser:  Until you corral them, you’ll have lots of random e-forms initiatives persisting.

 

Mr. Ellis:  And that’s great. I applaud that effort. That’s why we try to start off vendor-agnostic. To mitigate those problems, we came up with setting up a collaborative environment as far as the component registry. On the business gateway, we’re setting up a portal effort, because there are so many legacy systems. You can’t replace them all at once. You can’t tell people to use only one contractor. So the way to mitigate is to set up a business gateway, business routing, places to start—so people don’t have to use efforts aimed at their particular agencies, but can piggy-back on others. Keep your costs down, open it up to citizenry, business, and government, and make it open.

 

Mr. Houser:  Owen, you can join me on this, because you were at the Semantic Web seminar on Monday. The big obstacle in data administration is getting everyone to agree on terms and metadata. One of the benefits of semantic technologies is that they let people use their own ontologies. We can use the semantic technologies to concord the separate ontologies. I wonder if GSA is looking at that?

 

Mr. Ellis:  It’s a huge backbone of the effort. Agencies can have their own forms. We had to have a platform to let agencies control the configuration management of their arenas. We also wanted to set up a platform where they could find out more. We also wanted it Web-based, because not everyone is doing their thing from within the federal agency itself. People are in remote locations, universities, etc. Eventually we want to get into e-authentication, because not all people use electronic signatures. We wanted it Web-based, because of the disparity of time zones, military people…all of those were huge factors. So we’re setting the base to accomplish those goals. That’s why the original portions were portal efforts and collaboration zones—because agencies don’t want to give up control.

 

Mr. Ambur:  But collaboration services in conjunction with the XML registry will facilitate the creation of Communities of Practice who need to work together to share data.  So this is good; almost too good to be true.

 

Mr. Ellis:  That’s where this room will come into play: be facilitators to agencies, say “There’s stuff out there to explore,” make it win-win. The project is very big. OMB is 100% behind us. There’s no problem with funding at this juncture. We need to take the best logical steps, one-by-one, to build on the infrastructure. We’ll start out with the FEA model, and the XML Working Group, and build from there.

 

Mr. Morgan:  Do you imagine this group focusing on particular agencies that are eager to start, and taking on their work as pilots, and focusing on early implementation? As the system comes online in phases, we can be there.

 

Mr. Ellis:  Exactly. We decided this was to be a 4-phase project. We’ll exercise those based on how fast we can force-feed it to everyone. We have OMB’s backing. We looked at the Clinger-Cohen Act…cease-and-desist things that are old technology. If you have to be GPEA- and 508-compliant, and you have to get things organized in an enterprise architecture environment throughout the federal government, pretty soon you’ll not have a lot of options, because OMB is already looking at turning off some of the older systems that are not compliant. We already know we’ll miss GPEA compliance this year. We need to make the best effort for next year.

 

Mr. Sall:  Could you see this group—similar to what Roy is saying—could you see this group bringing use cases to the deployment? Not just specific forms and needs, but use cases in agencies?

 

Mr. Ellis:  Absolutely.

 

Mr. Ambur:  I’d say we should focus on stuff that works.

 

Mr. Ellis:  We’re looking at stuff that works. We’ll measure our efforts on what does work. We’ve done a lot of planning for several years. Now it’s time to do it in the “do world,” not the bureaucrat world.

 

Mr. Ambur:  That’s exciting stuff.  I appreciate your sharing it with us.  I didn’t know whether you were ready to discuss it.

 

Mr. Ellis:  It’s the tip of the iceberg.

 

Mr. Ambur:  I appreciate it.

 

Mr. Houser:  Please send out an email to round up the troops on this.

 

Mr. Ambur:  The sooner people know about it, the better.

 

Mr. Sall:  Is the Statement of Work on FedBizOpps [Federal Business Opportunities government procurement website, http://www.fedbizopps.gov/]?

 

Mr. Ellis:  Not yet.

 

Mr. Ambur:  We’re a little behind schedule, but the information that Lee had to share with us is very good, and probably more important than the other topics on our agenda anyway.

 

[Additional introductions]

 

Mr. Ambur:  With that, Ken and Norm, I’ll turn it over to you.

 

Mr. Sall:  We can’t get the Internet portion to work, but we do have your slides up here.

 

Mr. Walsh:  That’s not a problem.

 


Mr. Norm Walsh
Sun Microsystems

XML, RDF, RSS, and XSLT: A Mixture of Technologies

 

Mr. Walsh:  I’m sorry I can’t be there in person. I appreciate Owen’s invitation to speak about RSS and XML technologies. If I could get to D.C., I would have been there in person. Thanks, Ken, for being my fingers.

 

Slide 2  [About Norm]:  I’m an XML Standards Engineer at Sun. That’s a high-falootin’ way to say I work in a lot of standards groups in a lot of organizations to make sure that core XML technologies come together in the most useful way in the world at large. I also work on Java standards and tools.

 

Slide 3  [About This Presentation]:  One of the things I point to is the W3C [World Wide Web Consortium, http://www.w3.org/] Technical Architecture Group and the Electronic group. They’re trying to put together the Architecture of the World Wide Web document, which describes how the pieces of Web technology fit together. At the moment, it’s supposed to cover three areas:

1.      How do you identify things, how do you represent information in RDF [Resource Description Framework, http://www.w3.org/RDF/], etc., and how do you reach them when you’ve found them? We hope to get the first draft out this year. The others are

2.      Semantic Web issues, and

3.      Web Services.

 

That’s not obvious in the Web document.

 

Slide 4  [Goals]:  Working with the 11 very intelligent people doing this, we need to look at lots of interesting questions. That’s also why I built the website “nwalsh.com” that attracted Owen’s or Ken’s interest, or some number of you thought it might be interesting to have me talk about it.

 

Mr. Ambur:  Norm, the topic of our discussion later in this meeting will be the emerging technology lifecycle management process.  I have suggested we should use XML and related open standards technology to support the process on a widely distributed basis, so that we can more effectively collaborate with vendors as well as our .gov colleagues to come more quickly, efficiently, and effectively to an understanding of the merits of proposed emerging technology components.  So that’s the background on why we were interested in scheduling your talk—because you’re already doing it, by using technologies like XML, RDF, XSLT, and RSS.

 

Mr. Walsh:  I’m not sure how much exposure members of your Working Group have with emerging technology or XML technologies, so I’ll give a high-level overview. I’ll go over it on the next slide, and how they complement each other, but not go into it in so much detail that you’ll be bored out of your skull by my ramblings. So I’ll give an overview, and then open up the floor. In case you want me to describe how these relate to problems you have, I’d be happy to do that.

 

Slide 5  [Web Site as Information System Microcosm]:  So the high level is building an information system. For you, we’re talking about electronic government. I try something much smaller—the playground—but we have many of the same problems. First, you have content. Then you have “when it was created, who created it, who has rights, when do they expire, what are the related tools?” Any of these attributes you can add to say things about it. We need navigation tools, etc. Once people have found the content, you have to deliver it to them. They may want it in other formats; they may want Braille, other colors, etc. If you manage a large repository, how do you let people subscribe to your site and get things that they want to look at? Then, how do you update, change, or remove content?

Mr. Sall:  Exactly. What you’re talking about with content delivery, metadata, notification—these are all aspects of the repository Lee Ellis is talking about.

 

Mr. Walsh:  Terrific. So I talk about some of your issues. I’m not going to cover all the topics today—only three.

 

Slide 6  [Content = XML]:  The content, I’m assuming, is XML. I’m not going to spend a lot of time talking about it. I’m sure it’s familiar to you. Just two points, because it’s different for RDF, and I want to highlight the distinctions:

1.      First, XML is a tree structure. For any XML document, you get one tree structure, and for any tree structure, you get one XML document. There are variations, but the documents and the trees are more or less equivalent to each other.

2.      The other point is that XML documents stand by themselves. If I give you two and say, “Please merge them,” it doesn’t have any particular meaning. There’s no mechanical way of saying “Every third element has a certain attribute.” It means they’re basically islands. If you want to index or address them, or write a process, you have to go over each in turn.
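
[Editor’s note: For readers less familiar with XML, here is a minimal, hypothetical document illustrating the one-document/one-tree point; the element names are invented for illustration only:

    <component>
      <name>Example Component</name>
      <description>A short description of the component.</description>
    </component>

There is exactly one root element and one nesting hierarchy, i.e., one tree. Simply concatenating a second such document onto this one would not even be well-formed XML, which is the sense in which XML documents “stand by themselves.”]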

 

Slide 7  [Metadata = RDF]:  The metadata format that a lot of people are using (and the one that it seems as if people are going to use) is RDF. It’s unlike XML. It’s a graph, not a tree.  It has multiple parents and multiple siblings.  If you want to write it as XML on disk, there are any number of ways to do it. There’s no one-to-one correspondence between stuff on disk and RDF.  Also, if you want to merge two graphs, there’s no confusion about what it means. You just have a bigger graph, with all the nodes and edges. If you want to write a process for RDF, you can write one for that one graph. 
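
[Editor’s note: A minimal sketch of the merge point, using the Dublin Core vocabulary discussed later in the meeting; the document URI is a hypothetical placeholder. Here are two separate RDF/XML graphs describing the same resource:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.gov/documents/123">
        <dc:creator>Jane Smith</dc:creator>
      </rdf:Description>
    </rdf:RDF>

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.gov/documents/123">
        <dc:date>2003-09-10</dc:date>
      </rdf:Description>
    </rdf:RDF>

Merging them requires no special rules: the result is simply one larger graph in which both the creator and date statements hang off the same node.]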

 

Mr. Sall:  With the tree-versus-graph distinction, are you saying that in the XML case, you have explicit document order of how you traverse elements in documents, but with RDF it’s more like the traditional computer sense of trees that have multiple children, so there are multiple traverse orders, like pre and post? Is that what you’re talking about?

 

Mr. Walsh:  Not quite. It’s closer to what you said first. You have obvious document order in a tree. You know where you are by what’s seen and not seen. RDF is not like that. It’s just a bag of stuff. I have a slide later where I talk more about each of the technologies.

 

Mr. Houser:  More like the difference between a book and a map?

 

Mr. Walsh:  Yes. A book is like an XML document; there’s a beginning, middle, and end. RDF is more like a map; you can go anywhere from where you start.

 

Slide 8  [Notification = RSS]:  It’s an XML vocabulary; a format designed to address the idea of notification. The only points are, unlike RDF and XML, there’s no standards organization working on RSS. It’s a grass-roots development process, and it has its own community of people working on it, but the point is, use it anyway. It’s simple enough that whatever comes from the grass-roots process is easy to adapt to. There’s no need to get concerned about the lack of a standards organization on it. If you want a good example of a success story, the Simple API for XML [SAX, http://www.saxproject.org/] was grass-roots development. It’s almost universally used.

 

Mr. Ambur:  Why hasn’t RSS been taken to a voluntary consensus standards organization?  Commonly, software vendors have no incentive to collaborate quickly to develop standards because we, their customers, aren’t smart enough to insist upon it.  If we keep buying proprietary stovepipe systems, vendors will be more than happy to sell them to us, in the hope of locking us into doing business with them, and no one else.

 

Mr. Walsh:  When the thing started, they needed a syndication format. They came out with a version with a vocabulary of 10 or 15 elements, and they didn’t need a standards organization. Now they’re working on the next generation. It was never big enough for the critical mass to take it to a standards organization. It’s possible that the work now will bring all these flavors together as one. It might go to the W3C or somewhere else, but the process is now working well enough that collaboration hasn’t seemed necessary.

 

Mr. Ambur:  I will be very interested to see if we can capitalize on RSS in the emerging technology process, because at least in the early stages of considering emerging technology components you don’t need to manage all of the information about all of them in one place. Instead, we should syndicate and access relevant information wherever it exists on the Web.

 

Mr. Walsh:  Because for people with business models that need aggregation, it’s a natural choice for them.

 

Mr. Sall:  Can you comment on how the meaning of the acronym changes?

 

Mr. Walsh:  It was originally a rich site summary language; then people decided they could use it to publish summaries. There was a business model to be made to publish those and aggregate, so they started using it as a syndication format. After a while, they thought of…

 

Mr. Ambur:  Really simple syndication.

 

Mr. Walsh:  Thank you. No one frets too much about what it means. The work to define the next generation is struggling to come up with a name. I’m a fan of using the technology, and I’m not worrying about the details of what it’s called.

 

Mr. Sall:  Their own website describes RDF as “summary” now. It’s curious that they have that acronym.

 

Mr. Walsh:  Actually, RDF is one of the RSS flavors. I’ll talk about why in a bit.

 

The central point is that it’s a simple way to publish notifications, and let people subscribe and get things when they need them. It has value. Rather than invent your own, you might use the flavor of RSS you like best.

 

Slide 9  [Access/Delivery/Update]:  Access and delivery and update are interesting problems. I’d be happy to discuss it if you want to some other time, or you can send me an email, but I couldn’t put it all in today’s talk, so that’s the last on those.

 

[Slide bullets were the following:

Access =    Metadata as content

Delivery = Sending what the user wants (XML, HTML, PDF, etc.). (There are some interesting architectural and technical issues here.)

Update =    Adding content, providing for feedback]

 

Slide 10  [RDF Concepts]:  I’ll talk a little about what RDF is, because everyone knows what XML and XML documents look like and stand for. It was developed by the W3C as part of the Semantic Web activity, which it actually predates. It’s the same product base. It’s really a framework for representing metadata statements. By that, I mean a piece of content that is authored by a person, or “has these access rights,” or “is related to this organization.” RDF is simple. All it has is three parts: subject, predicate, and object (value for the predicate). The object is either a simple value, like “Name” or “Date,” or another subject also on the graph. Whatever else you take away from this, and about how complex you’ve heard it was, or issues about what XML format to use to represent things, the important thing is, it’s just a collection of simple statements called “triples.”

 

Slide 11  [RDF Statements]:  Here are some informal examples of RDF statements. Each has a subject, a verb, and an object on the other end. If you download the presentation, the appendix at the end has the list in proper RDF. This is informal, not proper. All the RDF statements look like this. The only other thing is that the first statement is true. The third statement says I’ve created this document. Note that I appear as the subject in the first sentence and the object in the third. Any questions on what RDF is?
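
[Editor’s note: The appendix slides with the “proper RDF” are not reproduced in these minutes. As a generic sketch only (the URIs are hypothetical, and Dublin Core and FOAF are used here simply as familiar vocabularies), statements like those on the slide might be written in RDF/XML as:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <!-- Subject: a person; predicate: foaf:name; object: a literal value -->
      <rdf:Description rdf:about="http://example.org/people/norm">
        <foaf:name>Norman Walsh</foaf:name>
      </rdf:Description>
      <!-- Subject: an essay; predicate: dc:creator; object: the same person resource -->
      <rdf:Description rdf:about="http://example.org/essays/42">
        <dc:creator rdf:resource="http://example.org/people/norm"/>
      </rdf:Description>
    </rdf:RDF>

Note how the same resource appears as the subject of the first statement and as the object of the second, which is the pattern described above.]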

 

Mr. Ambur:  The way you have explained it is pretty clear.

 

Slide 12  [Why RDF?]:  So why use RDF? Two things: it extends easily. It’s easy to add new subjects and predicates. Now you have a distributed Working Group. If you adopt the RDF framework as a way to do metadata, then an agency such as the IRS can have its subject and predicate for its purposes, the EPA can have the same for its; each domain can define it for itself, and can establish common tools that work with all the vocabularies. It’s very easy to combine their data later on. If, later, the IRS and EPA need to bring their metadata together, because they’re graphs, it’s very easy to throw that metadata together and apply the same set of tools. It’s a very big win. And down the road, for what the Semantic Web is working on: because statements are so simple, it’s possible to write tools that draw logical inferences from the statements.

 

You can correlate separate metadata statements together. The nice thing about writing your own vocabulary is, suppose the IRS and EPA define “author” differently. When you combine the material later and you know they mean the same concept, you can add metadata that says they mean the same thing. Your tool will know they mean the same thing. So that’s where the promise of RDF is. It’s only in the last 18 months that there have been tools freely available that allow you to solve the problems. I’ll be honest—I didn’t use it for many years. Now, in the last 18 months, the tools are there. I’m not a convert to the Semantic Web vision, but RDF allows you to use them easily. That’s all it takes to impress me.
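
[Editor’s note: A minimal sketch of the “say they mean the same thing” idea. One common way to state that two independently defined predicates are equivalent is with the Web Ontology Language (OWL), which comes up later in the discussion; the agency vocabulary URIs here are hypothetical:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:owl="http://www.w3.org/2002/07/owl#">
      <!-- Assert that the two agencies' "author" predicates carry the same meaning -->
      <rdf:Description rdf:about="http://vocab.irs.example.gov/author">
        <owl:equivalentProperty rdf:resource="http://vocab.epa.example.gov/creator"/>
      </rdf:Description>
    </rdf:RDF>

Once this single statement is added to the merged graph, a tool that understands it can treat metadata recorded with either predicate as interchangeable.]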

 

Mr. Houser:  Would you compare it to Dublin Core [Dublin Core Metadata Initiative, http://dublincore.org/] or Topic Maps [XTM (XML Topic Maps), http://www.topicmaps.org/]?

 

Mr. Walsh:  Dublin Core is an example of an RDF vocabulary, for example, of predicates you’d use. It’s very widely used. If you’re talking about document authorship and description, it makes sense to look at the Dublin Core vocabulary. There’s still a benefit in choosing the same names where you can. Topic Maps are a different thing. RDF is all about representing metadata. It does it with these triples. Topic Maps do also, but it’s a competing standard—an ISO [International Organization for Standardization, http://www.iso.ch/iso/en/ISOOnline.openerpage] standard. It has a slightly different vision of what metadata is like, but both communities agree that they’re talking about the same problem underneath. I’ve been to several conferences about building a framework to unify them. I started out using RDF. If I had been in a different community, I might have been using Topic Maps.

 

Mr. Houser:  We’re considering a metadata standard for our community. Which should we choose as the basis for that policy?

 

Mr. Walsh:  Do you mean RDF or Topic Maps?

 

Mr. Houser:  Or Dublin Core.

 

Mr. Walsh:  Dublin Core is an example of RDF, so if you’re going to RDF, yes, you should do Dublin Core. I was trying to say at the end that I’m using RDF, but I’ve not given Topic Maps a fair shake. It makes sense to look at RDF and Topic Maps as well, if you’re thinking of going in that direction. Choosing Dublin Core or another predicate library that you know about is also a good idea.

 

Mr. Houser:  So there’s no straight transformation between the two?

 

Mr. Walsh:  I’ve seen papers on how to map one to the other, but there’s some disagreement. I think in the next few years there will be tools to map, but now they’re slightly different ways to look at metadata.

 

Mr. Ambur:  Walt, when you say “we’re looking at policy,” are you talking about the Veterans Administration?

 

Mr. Houser:  Yes, the Veterans Administration.

 

Mr. Ambur:  It’s my understanding that NARA had proposed a policy for the management of Web records. Does it address the metadata requirements?

 

Mr. Houser:  It doesn’t address that.  Norm, can you put OWL [Web Ontology Language, http://www.w3.org/TR/owl-ref/] in the context of this presentation?

 

Mr. Walsh:  OWL is Web Ontology Language. RDF is a collection of statements, but a collection isn’t a language. There’s not an obvious relationship between statements. They might have the same subject and predicate, but they’re just in a bag. OWL is an effort to define standard semantic statements for the languages—the range of subjects and predicates—so OWL is a natural extension of RDF, designed to help advance the concepts of the Semantic Web.

 

Mr. Ambur:  Another thing I’ll mention in this context is that the FirstGov folks at GSA are working on a content model for FirstGov. I talked to Dana Hallman about briefing this group on their content model when she’s ready to talk about it.

 

Mr. Sall:  The conference we went to on the Semantic Web—their website has a nice paper. It talks about the pros and cons…

 

Ms. Elizabeth Fong:  Can you give us that [website URL]?

 

Mr. Sall:  Not offhand, but I can send it to the list [TopQuadrant, http://www.topquadrant.com/].

 

Mr. Brand Niemann:  It’s on the web-services.gov [http://web-services.gov/] site for September 8. The link’s there.

 

Slide 13  [RSS]:  RSS is an XML vocabulary, schema, or DTD that defines elements for summarizing content—saying, “We wrote this, having this link, and having an abstract.” Some flavors let you have RSS in metadata content. That’s very popular for publishing what’s new and available about websites. There are end-user tools for aggregation, and some business models for aggregation. There are websites, like syndicate.com and userland.com, that are aggregating RSS flavors together. There are dozens, so you can go to one place and get all these RSS feeds.
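
[Editor’s note: A minimal sketch of what an RSS feed looks like, here in the RSS 2.0 flavor; the channel and item shown are illustrative only and are not an actual xml.gov feed:

    <rss version="2.0">
      <channel>
        <title>XML.gov What's New</title>
        <link>http://xml.gov/</link>
        <description>Recent additions to the XML.gov site</description>
        <item>
          <title>XML Authoring and Editing Tool Forum</title>
          <link>http://xml.gov/index.asp#new</link>
          <description>Registration is open for the September 29 forum.</description>
          <pubDate>Wed, 10 Sep 2003 00:00:00 GMT</pubDate>
        </item>
      </channel>
    </rss>

An aggregator or RSS viewer polls the feed URL periodically and shows the reader any new items.]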

 

Slide 14  [RSS Viewer]:  Here’s an example of the RSS viewer I use. This is what I mean by aggregation. On the left panel are RSS feeds. The highlighted one is about Norman Walsh. On the right side, you see a list of my most recent articles. You get the abstract of that article, a subcategory to “Daily Quotation.” There are a bunch of things that are hidden. You could add a reference to any website you want, so you can imagine a future where all agencies are getting XML online. They could use RSS to publish what’s new, and individual people in agencies that need to keep track have a tool like this RSS viewer to go out and grab those documents, and as new things occur, they’re highlighted. It’s an easy way to subscribe to a “what’s new” website, for example.

 

Slide 15  [Other RSS Applications]:  So RSS is used for website updates. I recently saw someone propose that magazines and journals publish their indexes this way. A perfect example of how you can take this is, I publish my daily schedule. Every day my computer publishes an RSS feed of things I need to do in the next seven days, so I get this new notification in my RSS viewer, so I’m using it for a different sort of notification, but it gives the flavor of using it.

 

Mr. Ambur:  Norm, are you familiar with the RDF calendar initiative?

 

Mr. Walsh:  I have a slightly different set of representations for content in my Palm Pilot. They’re not yet harmonized, but putting them in there makes a lot of sense.

 

Mr. Ambur:  I first heard about it a year or two ago. I’m looking for a better understanding of how and when we might be able to use it on the XML.gov site.

 

Mr. Walsh:  I could imagine when that occurs in the RDF format, so people pick it up. The other thing is, they require a critical mass of people using them before they’ll take off. Initially it didn’t look like it was going anywhere. Now there are 400,000 users, so it looks like it’s taking off. As people take advantage of it, it encourages more people to use it.

 

Mr. Ambur:  I know of a specific example of a use case, where a publisher was co-sponsoring an XML event and wanted me to establish a link to it.  They had the event listed on their site but it was buried in a lengthy list along with other events of no particular interest to my stakeholders.  I wasn’t willing to link to their site from the XML.gov site and force my stakeholders to scroll to look for it.  I wanted to be able to point directly to it.

 

Mr. Walsh:  One thing—I’m doing Extreme Markup Language, or www.extreme.org. They had work I didn’t have. Between RDF and my Palm, it was actually the first work I did with RDF.

 

Slide 16  [XSLT]:  XSLT is a language to transform from one language to another. Some people think it’s odd. I’m not inclined to, since I’m on the group that developed it. Why I use it now is to transform XML documents from one form to another. I’d be surprised if some of you weren’t using it. It’s used for formatting objects, then putting them into PDF for printing. It’s used for summaries, and then it’s also used for metadata extraction. Someone was talking about an initiative for metadata content on the website. One of the ways…I want content on my site in RDF for navigation, but I don’t want to maintain it in two places, so lots of it comes out of updates I publish there. I auto-extract the RDF.

 

If you’re working with XML and have to transform from one flavor to another, I recommend you look at XSLT. Michael Kay has the definitive book, and so does Jeni Tennison. At the end of my presentation, I have a recommendation page.
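
[Editor’s note: A minimal sketch of an XSLT stylesheet used for metadata extraction, in the spirit of what Mr. Walsh describes; the input element names (essay, title, author, uri) are hypothetical, not his actual document format:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <xsl:output method="xml" indent="yes"/>
      <!-- Read a hypothetical <essay uri="..."> document and emit Dublin Core metadata about it as RDF -->
      <xsl:template match="/essay">
        <rdf:RDF>
          <rdf:Description rdf:about="{@uri}">
            <dc:title><xsl:value-of select="title"/></dc:title>
            <dc:creator><xsl:value-of select="author"/></dc:creator>
          </rdf:Description>
        </rdf:RDF>
      </xsl:template>
    </xsl:stylesheet>

The same stylesheet language can also produce HTML, or formatting objects for PDF, from the same source document, which is the multi-output use described above.]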

 

Slide 17  [XSLT and RDF: Oil and Water?]:  One complication of XSLT you run into if you’re using RDF: I can’t really explain it, but I’ll try to describe it. Remember, RDF is a graph. XSLT is a tree translation language. Graphs are not trees, so if you apply XSLT to RDF, you bump into problems of getting a tree view of RDF to do XSLT over it. I’ve written RDFTwig [http://rdftwig.sourceforge.net/] for an XSLT process that I use. That I’ll talk about later on, but you’re going to need to plan for some pain to get XSLT to process RDF. But it’s worth the pain, because it’s a phenomenal way to represent metadata.

 

Mr. Houser:  Would you have to pick “Point A” and “Point B,” and proceed across the graph to come up with the serialization?

 

Mr. Walsh:  That’s one of the things you have to do. One way is to always use the same tools for serialization. The RDFTwig technique is one way to build the tree: do Point A, then come back and do Point B. It allows you to dynamically do it.

 

Mr. Houser:  Why do XSLT on RDF?

 

Mr. Walsh:  The next slide will make that clear. I have ….I don’t know what your projector is like. Can you read the picture?

 

Slide 18  [Putting the Pieces Together]:  I have XML in the far left side. Those are actual essays I write for my website…I want to produce RSS, PDF, and an HTML version to publish on the website. Because I’m extracting the metadata, when it comes time to produce, for example, the HTML version, I need access to the document itself and the metadata as well. That’s why I have to apply XSLT to the RDF, because I’m already using XSLT to fix the transformation problem for which I already have the answer. So I’ll talk a little about what the diagram says, to give you a flavor of how the pieces fit together.

 

We start with the XML essay. We make an XML document with the title, author, and content. We want to add the metadata to the metadata for my site, so the first step is to process the essay with XSLT to produce the metadata. That’s new. I also have additional metadata from other essays, and some I track by hand. I put that through cwm [http://www.w3.org/2000/10/swap/doc/cwm.html]. In the middle, I have RDF. That’s the meat for my entire site…every essay, and the hierarchies, all the navigational information. I use RDFTwig to extract the meat I need for that, along with the original XML, and using those, I can build the HTML and PDF versions, and the updated RSS feed. It has all the articles I wrote in the last 30 days in reverse chronological order, built only from the RDF.

 

Why have metadata built by hand? One of the things we were keenly aware of when we built the site was, I did not know what I was going to write about. I wanted the information in a topic hierarchy. I didn’t want information about what topic each essay was on in the essay itself, because later on I would have had to edit every essay that had topic information in it in order to add a branch. So I maintained it separately. Because it’s in RDF, it merges easily with the RDF coming out of the essays for a unified view of the data sources.
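
[Editor’s note: A minimal sketch of the kind of separately maintained topic metadata described here; the essay URI and the use of dc:subject are illustrative assumptions:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <!-- Hand-maintained topic assignments, kept outside the essays themselves -->
      <rdf:Description rdf:about="http://example.org/essays/42">
        <dc:subject>XML</dc:subject>
        <dc:subject>Metadata</dc:subject>
      </rdf:Description>
    </rdf:RDF>

Because this file and the metadata extracted from the essays are both RDF statements about the same essay URIs, combining them is just a graph merge, with no re-editing of the essays required.]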

 

Mr. Ambur:  I like this picture a lot, because it depicts what I think some people have posed as a false choice—namely, that you have to have either internal or external metadata.  I think the real answer is that we need to have both internal and external metadata.  It is important to have at least some metadata embedded directly within each record, but it will never be possible or feasible to anticipate and embed in any record all the metadata that may ever be appropriate to associate with it.  Thus, the architecture you are depicting is clearly the right answer.  The question I have is whether there are scalability issues associated with it.  For example, would it make sense for governmentwide use, as opposed to on a single, relatively small site like yours?

 

Mr. Walsh:  I’m absolutely concerned that it has scalability issues. The culprit is the cwm tool. It’s actually doing a bunch of work. If I gave it a million essays, it would fail. That doesn’t say it’s a bad design or information flow, but the evil to avoid is duplicate information, so if you have metadata, you want to use it directly, not maintain, for example, the title of the article in two places. You have topic hierarchies that are not represented in one place, but you don’t want to maintain the article in two places. The issue of scalability is not one I’ve addressed yet. It is doable. You would want to ask the people who are developing it to consider scalability. It’s possible to augment the RDF without rebuilding the whole graph.

 

Slide 19  [Successes]:  So what have we learned? What works well? RDF lets you aggregate metadata. It has been phenomenally valuable. It lets me ignore the fact that they come from different sources. It allows me to modify one file with hierarchy in it. It’s a success story for deriving content from metadata.

 

One of the questions that came up was the way some people are doing markup in RSS.  I was asked to write an opinion piece for XML.com. I did, and it was published. I realized that I wanted people to know that I wrote that, but out of respect for XML.com, I couldn’t republish it on my site. It wouldn’t be fair to copy it to my site. I was thinking about it for five seconds when I realized that all the navigation on my site is metadata. All I had to do to make it appear to be on my site was construct five or six lines of metadata and put it in, and now there’s a summary in my RSS feed. It fits neatly into my site, and the fact that it’s published on a different site is more or less irrelevant. I was delighted when I realized I get that win out of the situation.

 

Slide 20  [Failures]:  You saw that picture. It was fairly complicated. There were lots of arrows, circles…it means there are lots of pieces fitting together. My guess is that, for the size and scale of your pieces, it’s not complicated. Compared to mine, it’s more complicated.

 

Mr. Ambur:  I think the issue relates to your second bullet. If each component can scale massively, then your depiction is not that complicated. We are supposed to think in terms of a component-based architecture, so if we have logically separated functions, then it makes perfect sense.  The question is whether the components, individually and collectively, can scale massively to meet the needs of government as a whole, or whether it will be necessary to have smaller, stovepiped applications that duplicate components, despite the additional maintenance overhead and costs.

 

Mr. Walsh:  If you asked me to make it for EGov, then I’d have to think hard about scalability problems. For myself, if I wanted to do attachments, I would find a replacement for cwm. I spoke to Tim Berners-Lee about it. He admits it’s not the fast way. It’s the easy way. There are other engines that do the same thing, that are non-quadratic, so just replacing that single component might make all my problems go away for a while. I’m sure with your systems, you need to look at scalability for all components.

 

Slide 21  [Conclusions]:  So what conclusions can we draw?  XML is a big win. If there were any other conclusion, you’d have to be surprised. Storing metadata separately has its advantages. RDF is very useful. I’m happy I’m using it. It satisfies my most important test, which is that it allows me to solve problems easily. RSS keeps people up-to-speed; I check my sites several times every day. I’m trying to keep RSS behind what’s published. XSLT is the obvious choice of tool for transformations and, with a little work, it can work with RDF as well.

 

That’s the end of my presentation. The next slide has some references. The appendix is the three slides I mentioned before. It’s a more technical look at the example of RDF from earlier. It’s too technical for here, but it provides the whole story.

 

Mr. Kane:  The Adobe XMP [Extensible Metadata Platform, http://www.adobe.com/products/xmp/main.html] is RDF compliant. They’re talking about moving it to XML. Is that true—are they moving to XML, or will it always be RDF?

 

Mr. Walsh:  I can’t speak for Adobe, but the XSD [XML Schema Definition] stuff they’re vetting looks like XML.

 

Mr. Kane:  It can be, but I don’t think it has to be yet. I was wondering, if Adobe had plans to make it such, would it indicate that they’re embracing XML more than they have?

 

Mr. Walsh:  I have no information from which to comment. It clearly can be XML. I would feel bad if I had a group of people who said, “It must be.” All the metadata for images in the date tag [on Mr. Walsh’s material] you can embed in XMP and JPEG. I happen to use a different method.

 

Mr. Ambur:  Adobe plans to be at the September 29 XML tool forum. They registered a little late, so I’m not sure whether they made it onto the agenda or not but, hopefully, they or one of their partners can answer the question. We’re a little over our time, so let’s go right into the break.  When we come back, I’ll give a brief presentation on what we’re going to do next.

 

Break

 

 

Owen Ambur
U.S. Fish & Wildlife Service
XML Working Group Co-chair

Emerging Technology:

Managing the IT Innovation LifeCycle: XSD for Stage 1 - Identification

 

Mr. Ambur:  Shall we get started again? I’m going to whip through my presentation pretty quickly, since we have a small group and I think people here have a pretty good understanding of what we’re up to. We’re well behind schedule, but for good reason, because we got good information this morning, including some serendipitous information that I didn’t know whether Lee could share with us.  I think you all know what I’m up to, so I’ll whip through it.

 

Mr. Houser:  I’m not as sure…

 

Mr. Ambur:

 

Slide 2  [Context]:  The Emerging Technology subcommittee of the CIO Council has been asked to develop a process whereby the emerging technology [ET] lifecycle can be better managed. The driving force is the inability of .gov decision makers to deal with all the information coming at them, particularly from proponents of new and emerging technologies, which by definition are not well understood yet.  The ET Subcommittee has been tasked to develop this process in order to provide for more efficient and effective communication than we currently have, not only for the benefit of .gov folks but for vendors as well.  The suggestion I made early-on is to structure this communication in a way that takes advantage of open standards like XML, while recognizing the driving and shaping force of the FEA and the electronic government initiatives. First and foremost, we’re not doing this just as an academic exercise; we’re doing it to acquire ET components that work and make sense for use by government agencies.

 

Slide 3  [Principles and Assumptions (Stephen Covey, et al.)]:  The objective ultimately is to acquire and use technology more effectively than we currently are.  I suggested the target of this process is a fully completed OMB Circular A-11, Exhibit 300 for components to be acquired and used, if not government-wide, then at least by more than one government agency.  We can’t bite off the whole process and deal with it all at once; we have to deal with a component at a time, a manageable chunk at a time. We should take advantage of whatever information already exists, wherever it exists, but we should be clear about how we can understand what folks have to tell us about their proposed ET components.

 

Slide 4  [Principles and Assumptions (W3C, OASIS, et al.)]:  So XML has an important role to play—including something that Norm briefed us on—structuring meaning for more efficient and effective communication. We should practice what we preach with respect to component-based architecture, by taking one step at a time.  Indeed, this is not a new concept.  We can think back several years ago to Raines’ Rules.  Former director Raines of OMB instructed us to pursue investments in small chunks, each of which adds value in its own right. We should adhere to that principle in constructing our own process.  [Editor’s note:  Raines’ Rules are summarized at http://users.erols.com/ambur/itmra.htm  See especially Rule 7.]

 

Slide 5  [Today’s Objective]:  Today’s exercise is to agree substantially on the elements of the first stage of the process.  I suggest that the first data component of the ET process should be an XML schema to help us identify proposed components. The aim is to deliver this draft schema to the ET subcommittee at its meeting next week.  I’m hoping the schema will contain good semantics for the elements and that they’ll be in fairly good form.  I’m not aiming for perfection.  Also, this group is not a decision-making body.  Our role is to propose, draft, and advise.

 

Slide 6  [Key Issues]:  Some key issues I hope to address are:

 

1.      Best names for elements, so they clearly convey what we mean.  The semantics depend on deciding about the context.

2.      I propose eight elements for stage one. There might be one or two of those elements that we don’t need in the first stage.  Or we may need more.  However, I do believe we should stick to a small and manageable number for the first stage.

3.      A point that Ken [Sall] made, that was helpful in thinking this through—I was thinking the name of each element itself should be fully descriptive—I’ll say more about that in a minute.

4.      With respect to proper form, we need to make sure we’re not violating the XML Developer’s Guide, and that we’re adhering to any other relevant practices.

 

Slide 8  [Semantics - Just FEA?]:  With respect to semantics, should we focus just on the context of the FEA, or should we bear in mind a broader context, like the World Wide Web?  For example, under the first sub-bullet, should we use the FEA TRM as a controlled vocabulary? Or should we think more broadly in terms of technical standards that may not yet be recognized in the TRM?   With respect to the concept of emerging technology, [at this point, Mr. Ambur displayed slide 7 - ‘Semantics - Just ET?’], should we craft the schema broadly enough to encompass all types of technology, or should we focus it more narrowly on information technology?  I think it is a given that emerging technology is only emerging for part of its lifecycle and that at some point in its maturity ET becomes simply IT.  And assumptions like this affect how the elements of the process should be named.  I’ll have more specifics on the elements later. With respect to context, should we just focus on the FEA, which is all OMB and CIOs care about, even though proposed ET components may not fit neatly into the FEA?  It may be unrealistic for us to expect proponents of proposed ET components to plug into the controlled vocabulary in the FEA right away. We should think in the larger context of how to enable such components to be brought into and matured in the process.

 

Mr. Houser:  Is there a controlled vocabulary in the TRM [Technical Reference Model, http://feapmo.gov/featrm2.asp] or FEA?

 

Mr. Ambur:  They’ve identified specific terms for standards recognized in the TRM.  It’s a subset of all the standards in the world. With respect to ET, should we force proponents to limit their proposals to those standards?  I would prefer that we reference an external, more comprehensive listing of standards than just the FEA TRM.  I know, for example, that NIST has expressed concern about the TRM with respect to where we start and stop such a listing of standards officially recognized by the federal government.

 

Mr. Houser:  They also have considerable experience in where to start and stop.

 

Mr. Ambur:  I don’t want to prematurely enforce structure on folks who want to bring good ideas. The process should move into more structure, but at least in the first stage, we should ask whether it is appropriate to try to force proposed ET components into the TRM or whether a larger, less well controlled vocabulary might be more applicable.  So those are the kinds of issues we’re looking for advice on.

 

Slide 9  [Other Semantics]:  In terms of context—with the end in mind—when I first thought about it, I thought of taking a subset of elements from Exhibit 300 and just using them in the ET process.  However, the schema for Exhibit 300 is not in conformance with the XML Developer’s Guide and it is also focused on projects -- which may involve many different components -- so it’s not appropriate to use ‘out-of-the-box’ if you will.  However, it is important that we map the elements of the ET process into Exhibit 300, while not being hung up on using the specific element names, which may be inapplicable in the early stages of the ET process.

 

Slide 10  [Semantics - Broader Context]:  On this slide, I tried to pick up the point Ken made about parent and child elements; so if we make “Technology” the root element and “Information” a child element, then we can use “ComponentName,” “ComponentDescription,” and “ComponentType,” and it’s clear what they mean in the context. Anything else on that, Ken?

 

Mr. Sall:  It comes down to intent. I’m for using this. Some would say you need full, expansive names to make them totally reusable by themselves, but my comment was that in this context, you gain a lot by doing it this way. It’s not totally inconsistent with what UBL [OASIS Universal Business Language, [http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl] does. UBL does data modeling with ISO 11179 [Specification and Standardization of Data Elements], and if I recall, most end up being a string anyway, so it’s when you have a constrained data type that we gain the most usefulness out of the reuse.

 

Slide 11  [Data Types & Constraints]:  If you look at the XSD for the OMB 300, most elements that map to these are OMB rich text data types. I don’t understand the benefit of rich text, and I don’t think, in the initial stages of the ET process, that we should constrain people to use OMB-specified terms if we’re reaching out to the world to come to us.  I do think that for “ComponentName” and “ComponentDescription” we should impose length constraints, because I propose that those two elements be indexed on the ET.gov site, so people can briefly scan through the component name and description to see if it may match their interest.  I propose that the other elements of the schema be used to provide different, selective views of the listings of the components -- so people could look, for example, only at those components that incorporate standards of interest to them.  Initially, I proposed that we use an element name DataTypeOrModel, because the schema for Exhibit 300 includes an element called dataTypesUtilized.  However, they don’t use that term in the same sense that database designers and XML developers do.  The examples OMB provides include natural resources and health data, so there may be a better term to use in naming that element.  The only one of the eight elements I am proposing that doesn’t have a direct analog in the schema for Exhibit 300 is one I call ‘ComponentType,’ for which I am proposing the controlled vocabulary should be “hardware, software, or data.”  I don’t think the Component Subcommittee or GSA is thinking that way. They seem to be thinking only of Web Services components.  That may make sense for E-Gov in general but it does not make sense in the early stages of the ET process.  For example, if you’re with Conformative Systems [http://www.conformative.com/] and you have a hardware device, do we say, “We don’t want to talk to you?”  Particularly in the early stages of the ET process, we need to be more flexible. When our draft schema gets to the ET Subcommittee or to the Components Subcommittee, they may say “No, that’s not the way we want to do it” but, from my perspective, it makes sense.
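
[Editor’s note: A minimal, hypothetical XSD sketch of the kind of structure under discussion, showing only some of the proposed elements; the nesting, the length limits, and the enumeration values are illustrative assumptions, not the draft schema delivered to the ET Subcommittee:

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="Technology">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="Information">
              <xsd:complexType>
                <xsd:sequence>
                  <!-- Indexed and displayed on the ET.gov listing, so length-constrained -->
                  <xsd:element name="ComponentName">
                    <xsd:simpleType>
                      <xsd:restriction base="xsd:string">
                        <xsd:maxLength value="100"/>
                      </xsd:restriction>
                    </xsd:simpleType>
                  </xsd:element>
                  <xsd:element name="ComponentDescription">
                    <xsd:simpleType>
                      <xsd:restriction base="xsd:string">
                        <xsd:maxLength value="500"/>
                      </xsd:restriction>
                    </xsd:simpleType>
                  </xsd:element>
                  <!-- Controlled vocabulary proposed in the discussion above -->
                  <xsd:element name="ComponentType">
                    <xsd:simpleType>
                      <xsd:restriction base="xsd:string">
                        <xsd:enumeration value="Hardware"/>
                        <xsd:enumeration value="Software"/>
                        <xsd:enumeration value="Data"/>
                      </xsd:restriction>
                    </xsd:simpleType>
                  </xsd:element>
                  <!-- Link to further information about the component -->
                  <xsd:element name="WebAddress" type="xsd:anyURI"/>
                </xsd:sequence>
              </xsd:complexType>
            </xsd:element>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>]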

 

Mr. Sall:  Would you consider adding elements, like ‘Service,’ or ‘BusinessProcess?’

 

Mr. Ambur:  I think that’s more in line with the Service Component Reference Model [SRM, http://www.feapmo.gov/feaSrm2.asp].

 

Mr. Sall:  I was thinking of the Component subcommittee. When they talk about components, they include those things. They include a number of things—more than we normally think of when we talk about components. They specifically talk about service, business process, software, data models…

 

Mr. Ambur:  To me, it’s two different things, and should be reflected in two different elements.  The reason I put a question mark next to “ServiceType” is because I’m not sure it’s really relevant to the first stage, the identification stage of emerging technology.  I’m not averse to making it an optional element in the first stage, but I wouldn’t want to make it mandatory, because some components may not fit neatly into the categories of the SRM.  Likewise with respect to the TRM [Technical Reference Model, http://www.feapmo.gov/feaTrm2.asp] there may not yet be any standard established that is applicable to an emerging technology component.  And we should not preclude proponents of other models from advancing them.

 

Mr. Bruce Cox:  It may be useful to say, “In the context of the FEA, this is where it fits, but in a broader sense, here’s where it fits.” And maybe also for ‘ServiceType,’ so maybe we need two elements, and show in a larger context that “This is their position.”

 

Mr. Ambur:  I think that might happen in the second or third stage of the process.  Initially, it seems to me proponents should be free to identify any service or technical classification system they choose, but in the next stage and particularly in the third stage, where the ET subcommittee accepts stewardship, then they have to fit more specifically into the FEA.

 

Slide 12  [Other Overriding Issues?]:  We want to keep the scope narrow, to deliver something to the ET subcommittee, because we’re not the decision-makers anyway. We just put a draft on the table and try to stimulate action.  In that regard, it is noteworthy that the first draft was put together back in March by Jonathan Smith of Booz Allen, who was then under contract to provide support to the ET Subcommittee.  Since then, nothing further has really been done.  Part of the reason Norm Lorentz and John Gilligan decided to eliminate the AIC’s working groups was that they were viewed as a distraction from the deliverables, but in fact what we are trying to do is deliver the schemas required to support the ET process, which no one else is planning to do.  It’s a “Catch 22,” but we should work through it and deliver a draft schema to the ET Subcommittee next week, so at least if they say, “That’s wrong,” there’ll be some burden on them to say what’s right, and not just continue to engage in high-level theoretical discussions that don’t lead to any actionable steps.

 

That’s my quick spiel.  Ken, you have XML Spy to display the draft XML schema for review and editing…

 

Mr. Sall:  Sure.

 

Mr. Ambur:  Ken took the proposed elements and put them into XSD form. I understand that XML Spy is not the best for live editing, but maybe we can show what it looks like and go from there.  In the handout I gave you, the tabulation of the elements, if you see where it says “Derived from” at the top, the link in the online version points to the March draft that Jonathan Smith prepared.  Jonathan initially identified the elements of Exhibit 300 he thought were applicable to the ET process.  Then Kevin Phelps, who staffs the ET Subcommittee for Mark Day, added some additional value.  Then I took those elements and mapped them to elements of the XSD that Susie Adams of Microsoft compiled under contract to OMB. Those elements are in the first column of the table. That column also includes the data types OMB has specified for those elements.

 

Mr. Sall:  Are you looking at the first stage?

 

Mr. Ambur:  I think you can go right to the draft of the XSD. Matthew [McKennirey], you’re not on the line are you? [No response.] Matthew made one suggestion regarding the WebAddress element -- that we use the Data Type “anyURI”.

 

In the middle column are my first cuts at the elements I believe should be contained in the schema for the first stage of the process.  Three of them are ComponentName, ComponentDescription, and WebAddress.  ComponentName and ComponentDescription would be indexed and displayed on the et.gov site.  The URL indicated in the WebAddress element would be established as a hypertext link on the ComponentName so that viewers could jump to the full instance document to see the rest of the information on components that may be of interest to them.  The “costAndEfficiency” is right from the OMB 300 XSD.  The prompt for that element in the exhibit itself is, “How will this project reduce cost and improve efficiency?” We can simply change that element to conform to the upper camel case practice recommended in the XML Developers Guide (CostAndEfficiency) so that it leads directly into the 300 Exhibit in a straightforward manner.  Or perhaps there may be other terms like “Justification” or “Benefit” that make more sense in the first stage of the ET process.
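
[Editor’s note:  A minimal sketch of how the Stage 1 elements discussed here might be grouped in XSD, offered only as an illustration. The wrapper element name “Component” and the simple types are placeholders; the root element and the data types were still under discussion at this point in the meeting:

<!-- "Component" is a placeholder wrapper; the root element was settled later in the meeting -->
<xs:element name="Component">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="ComponentName" type="xs:string"/>
      <xs:element name="ComponentDescription" type="xs:string"/>
      <xs:element name="WebAddress" type="xs:anyURI"/>
      <!-- Unconstrained in the first stage of the ET process -->
      <xs:element name="CostAndEfficiency" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
]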

 

I made the element “TechnicalStandards” plural in the first draft, because there may be more than one technical standard that applies.  However, the proper name should be TechnicalStandard (singular) and the element should be repeatable, to accommodate the fact that one component may embody multiple technical standards.
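
[Editor’s note:  Declared locally within the component’s content model, a singular, repeatable element might read as follows (illustrative only):

<!-- Local declaration inside the component's sequence; name kept singular, occurrence repeatable -->
<xs:element name="TechnicalStandard" type="xs:string" maxOccurs="unbounded"/>
]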

 

Two of the other elements specified in the 300 XSD are “relationtoFEAServiceCompRefModel” and “dataTypesUtilized.”  The latter relates to the DRM, but the XSD predated the DRM, so it couldn’t reference the DRM per se as a controlled vocabulary.  The former references the FEA SRM.  Initially, I proposed to change the dataTypesUtilized element to DataTypeOrModel.  I’m interested in your comments on how that element should properly be named and whether we should permit reference to any data model that may be applicable to proposed ET components, rather than just the FEA DRM per se.  The final element I have proposed for Stage 1 is “ComponentType.”  To me, the distinction among hardware, software, and data is pretty simple and basic.  It might also be appropriate to include the term “firmware” in that classification, but no one has made that case yet.

 

So you [Ken Sall] have the XSD up there?

 

Mr. Sall:  I have two of them. [Mr. Sall displayed schema documents that were not available for the minutes.]

 

Mr. Ambur:  Have you had a chance to look at Matthew’s?

 

Mr. Sall:  Briefly. He added some value. He replaced my ‘Foo’ element with the capability to add some unconstrained information.

 

Mr. Ambur:  In his revision of your draft, can you explain the meaning of making an element “discursive” as he has proposed?

 

Mr. Sall:  I can’t, other than the capability of breaking it down further, so he put the explanation right here, in the ‘Other’ column. You can add more child elements across ‘efficiency’ down the road.

 

Mr. Morgan:  Do you imagine CostAndEfficiency to be described by [Microsoft] Excel tables, or charts, or graphs, where that component needs lots of file types?

 

Mr. Sall:  He’s taking that in general, where the top level he went through had another level of expressiveness.

 

Mr. Houser:  The other thing is that CostAndEfficiency is kind of a…sort of mixing things all together into a mess of confusion. ‘Efficiency’ is the “efficient allocation of resources.” Resources have costs, then you have revenues, which could be under ‘Benefits.’ Mixing them together gives me some discomfort.

 

Mr. Ambur:   Should we rename the element in the context of the first stage of the ET process?

 

Mr. Houser:  It’s a marketing problem that Roy just mentioned, that has to be sold to the OMB folks.

 

Mr. Morgan:  Who wrote the circular.

 

Mr. Ambur:  I’m willing to make the pitch that we’re mapping into their elements, rather than merely using them exactly “as is” if it does not make sense to do so.

 

Mr. Houser:  ‘costAndEfficiency’ can be a basket into which you put a variety of concepts.

 

Mr. Ambur:  I propose in the initial stage that we not index this element on the ET.gov website, so it may not be necessary to constrain it at all.  Proponents would be free to use that element to convey any information they deem to be relevant, and those who go to the proponents’ sites to view such information could take it for whatever it may be worth.  Thus, I’m not particularly concerned about how we name that element in the first stage of the ET process.

 

Mr. Houser:  Just as long as they can go to the site.

 

Mr. Ambur:  Right. If you have a better term that’s more meaningful, I’d be happy to argue that it maps into the “costAndEfficiency” element in the XSD for Exhibit 300.

 

Mr. Cox:  If we use the same component name, they can reuse the text they’ve already written for their ‘300’ component.

 

Mr. Houser:  They’ll never agree on what term to use, because some agencies have already done their own automatic process like we have.

 

Mr. Sall:  The OMB 300 XSDs tend to take everything normally called a “string” and define it as an OMB rich text thing. That’s just the name. It’s really just XHTML. It says, “Wherever we want text to appear, anything—that’s XHTML.” That can fit here. It goes to Roy’s point about being able to do tables. They’ve opened that up. It’s in line with your point—using what they’ve got, hooking in, in a clear way, to let you piggy-back on elements, actual data types they define, if it makes sense in the context of what you’re trying to do.
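
[Editor’s note:  The following is only a sketch of one common way in XSD to admit XHTML markup inside an element (mixed content plus a namespace wildcard). It is not OMB’s actual “rich text” definition, and the type name is a placeholder:

<!-- Sketch only; not OMB's actual definition -->
<xs:complexType name="RichTextSketch" mixed="true">
  <xs:sequence>
    <xs:any namespace="http://www.w3.org/1999/xhtml" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
]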

 

Mr. Ambur:  I’d ask if you think it makes sense. To me, it’s a needless complication in the first stage of the process to require the use of data types specified by OMB.  I’m hoping there’ll be others in the world who run with the information provided in accordance with the XSD and do much better, more creative things with the data than we might ever dream of doing ourselves.  I’ve already had contact with the IAC [Industry Advisory Council, http://www.iaconline.org/]. I’m also looking into whether ITAA [Information Technology Association of America, http://www.itaa.org/] might be interested in picking up on that potential.

 

Mr. Cox:  This field that supports rich text—what comes to mind is [Microsoft] InfoPath, because if you’re defining an XSD schema in InfoPath, it’s an available data type, but if you’re not, that may be why there’s rich text there.

 

Mr. Houser:  Rich text is not a standard format.

 

Mr. Ambur:  I absolutely want to avoid any proprietary lock-ins.

 

Mr. Sall:  They just call it rich text, right? Doesn’t it just map to the XSD?

 

Mr. Cox:  I don’t know. Does anyone have a tool to map Microsoft 2003?

 

Mr. Ambur:  Perhaps the thought is that agencies will use InfoPath to produce their 300 Exhibits.

 

Mr. Cox:  Yes, InfoPath—but only if you define the XSD to do that.

 

Mr. Ambur:  I’m hoping these XML instance documents that validate on the schema will be on the Web and proliferate, and the folks out there will create their own—maybe having nothing to do with government at all.  Perhaps we can piggy-back on the efforts of others.  I could go for renaming the CostAndEfficiency element “Justification” or “Benefit.”  However, I don’t feel strongly about it.  If we decide to rename that element and since it’s not constrained in the first stage of the ET process anyway, we can say that it will eventually map into the costAndEfficiency element of Exhibit 300.  On the other hand if we don’t change the name, it provides one clear link directly into OMB’s XSD.

 

Mr. Sall:  Someone else suggested some additional changes and additions to the schema for the first stage of the ET process.

 

Mr. Ambur:  It was Mamoon Yunus [ForumSystems, http://www.forumsystems.com/].

 

Mr. Sall:  This is looking at the schema side of the event.  Seeing how the data will appear might help. Elements and names and hierarchies are a little different. Should we go over that?

 

Mr. Ambur:  It could be helpful.  Jay Di Silvestri of Corel [http://www.corel.com/servlet/Satellite?pagename=Corel/Home] offered to put together an instance document so that we can see how the XSD causes the data to look in actual use.  I suggested they work with DataPower and Conformative and ForumSystems, three vendors of XML acceleration hardware.  I suggested they work with those three because, to me, it’s easier to get my mind around hardware components than software or data components because they have physical reality.  We can see where they begin and end, and we can readily determine whether they fit together or not, physically speaking.  Moreover, particularly in the early stages of the ET process, I don’t think we should define everything as a Web Services component.  My sense is that some folks in the Components Subcommittee may wish to ignore hardware components, but I believe the ET process should be flexible enough to accommodate all kinds of components while at the same time making it easy for folks to identify particular kinds of components, if they know in advance what kind they are looking for.

 

Mr. Sall:  At the top level, we have “TechnologyType” [in the schema on display] and it has a namespace that we just made up.

 

Mr. Ambur:  Mark Day has already reserved the ET.gov and EmergingTechnology.gov domains for use by the ET Subcommittee.  The namespace could also be an extension to the XML.gov site, but I understand GSA is planning to use the XML.gov URL for the XML registry that will support the eForms portal and the Business Gateway.
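
[Editor’s note:  A namespace for the draft schema might be declared along these lines. The URI shown is purely a placeholder and does not reflect any decision recorded in these minutes:

<!-- The targetNamespace URI is a placeholder, not a decision of the group -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://et.gov/schema/stage1"
           xmlns="http://et.gov/schema/stage1"
           elementFormDefault="qualified">
  <!-- element declarations for the first stage would go here -->
</xs:schema>
]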

 

Mr. Sall:  It has similar elements: “WebAddress,” “ComponentDescription,” “CostAndEfficiency,” “TechnicalStandard,” “DataTypeOrModel,” and “ComponentType.”  He’s added some levels of descriptive detail to the technical standard, based on looking at the actual values in the FEA. There’s a big list of services and technical standards. He has access channels. This can also come in on the wireless. I don’t know what that system is. Then electrical channels, delivery channels…it seems like he’s blending a lot of the FEA terminology.

 

Mr. Ambur:  I agree completely that somewhere in the process the linkages to the FEA should be drawn. I asked ForumSystems and DataPower to think about it before their joint presentation to the XML Working Group awhile back, and Mamoon Yunus of ForumSystems has done so with respect to their product.  For me, the bottom line is, this is more complexity than I want in the first stage of the ET process. Back to the addition that Matthew proposed with respect to discursiveness, I can’t go to the ET Subcommittee and explain it, and if I can’t explain it myself, I’m not going to suggest to anyone else that they should use it.  What I want to do is take a fairly simple schema to the ET Subcommittee that adds significant value.  I want to deliver it to them in fairly good form and then let them do what they may with it.

 

Mr. Sall:  [Displaying another slide] This is what you’re talking about.

 

Mr. Ambur:  Yes, and is “DataTypeOrModel” the best name for that element? I think we should change it.  I think Matthew suggested we use “DataModel.”  Does anyone else have another suggestion?  The prompt in the Exhibit 300 is, “What types of data will be used in this investment?” Examples of “data types” provided by OMB in Exhibit 300 include health data and natural resources data.  However, that is a different interpretation of the term “data type” than used by database designers and XML schema developers.  What might be a better name for that element?

 

Mr. Houser:  DataDomain.

 

Mr. Ambur:  That was one of the ideas we had. We didn’t know whether it was the best.

 

Mr. Houser:  I’m thinking, “OK, object-oriented programming.” Are you talking about the UML [Unified Modeling Language, http://www.uml.org/] model?

 

Mr. Ambur:  No.

 

Mr. Sall:  So “DataDomain.”

 

Mr. Ambur:  That’s good. That’s exactly the kind of feedback we’re looking for. That brings us back to the context issue. Does it make sense to make “Technology” the top parent element, then under that, “Information” as a child element?

 

Mr. Cox:  If you’re talking about other kinds of information than IT, that’s where you need to make a distinction, but I don’t think you need to do that. I would put ‘IT’ in front of it. You’re talking about an Information Technology component. I’d say ‘ITComponent.’

 

Mr. Ambur:  Walt is making the point that the Developer’s Guide says, “Avoid abbreviations.” A caveat is that ‘IT’ is well recognized, so I have no vehement objections, but…

 

Mr. Cox:  I agree.

 

Mr. Morgan:  Does the EGov Act give ideas of the overall domain of this discourse?

Mr. Ambur:  The Act requires NSF or someone to develop a taxonomy for scientific research documentation, but I don’t think it requires development of a taxonomy for ET or IT per se.

 

Mr. Houser:  “Technology” is only one. You can have different kinds, and you can have capital investments that are non-technologies.

 

Mr. Ambur:  I’m thinking of “Technology” being the top and “InformationTechnology” being the child element of interest to us. However, other folks might use the top element in conjunction with child elements for other kinds of technology, such as medical technology or environmental technology.

 

Mr. Cox:  ‘Information Technology’ as a phrase is something everyone recognizes. They know what you mean. If you separate the two layers, they scratch their heads and say, “Why?” You have to explain.

 

Mr. Ambur:  So you suggest “InformationTechnology” as the top-level parent element?  I can live with that.

 

Mr. Sall:  The root element of the whole thing will be “InformationTechnology.” That’s the change, right?

 

Mr. Cox:  What is the root now?

 

Mr. Ambur:  I didn’t have one.

 

Mr. Sall:  I put one in, to be able to generate…

 

Mr. Ambur:  I’m comfortable with limiting the context to “InformationTechnology” at the root element if everyone else is.

 

Mr. Houser:  You can always change it.

 

Mr. Ambur:  Someone probably will. This will at least stimulate them to take that action if they want to.

 

Mr. Morgan:  I may be naïve, but does OMB accept requests for things like medical or space technology?

 

Mr. Houser:  Yes.

 

Mr. Morgan:  Do they use the 300?

 

Mr. Ambur:  Exhibit 300 is for all capital investments, not just information technology.

 

Mr. Houser:  I didn’t know they were doing ‘300s’ for anything else but IT. I thought it was rolling into [OMB Exhibit] 53.

 

Mr. Ambur:  Exhibit 53 is a subset of all of the data submitted in the A-11 Exhibit 300s, just for IT.

 

Mr. Houser:  I’m not sure I can agree, but I don’t know if I can disagree.

 

Mr. Ambur:  I’ve not been involved with it for several years but at that point, for example, the Bureau of Reclamation was preparing them for hydroelectric power investments, and my own agency was required to submit them for major investments in refuge roads, etc.

 

Mr. Houser:  I can check.

 

Mr. Ambur:  For the purpose of our discussion, I suggest we use different elements for the first stage of the ET process anyway, and map into the 300.  Both Exhibit 300 and the XSD for it are moving targets anyway.  I think it is best to use semantically clear and logical terms for the first stage of the ET process, as long as we’re mapping them and have a clear vision of how to match them up with the elements of Exhibit 300 further along in the process.

 

Mr. Sall:  One of the things we’re trying to do is to be compliant with the XML Developers Guide, and the guidance from LMI [Logistics Management Institute, http://www.lmi.org/default.htm] is to make sure you document the elements liberally. So you put the documentation that Owen had right in the schema; obviously you have to update it if you change the HTML page, but it serves as a way to explain clearly what these terms are, so when you’re in front of the ET subcommittee, it defines it right there. In XMLSpy, you can use that [Mr. Sall displayed several additional pages of XML]. All the documentation shows up. It’s like the approach Matthew took with ‘any.’ Replacing ‘Foo’ with ‘any’ allows you to add on without constraining them. The documentation will be replaced in the long run anyway, but in the short run…
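
[Editor’s note:  A sketch, not Mr. Sall’s or Mr. McKennirey’s actual file, of how xs:annotation/xs:documentation and an unconstrained xs:any placeholder can be combined. The documentation text is taken from the Exhibit 300 prompt quoted earlier in these minutes:

<xs:element name="CostAndEfficiency">
  <xs:annotation>
    <xs:documentation>
      How will this project reduce cost and improve efficiency?
      Unconstrained in the first stage; maps to costAndEfficiency in the Exhibit 300 XSD.
    </xs:documentation>
  </xs:annotation>
  <!-- Mixed content with a wildcard acts as an unconstrained placeholder -->
  <xs:complexType mixed="true">
    <xs:sequence>
      <xs:any processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
]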

 

Mr. Ambur:  Sort of a placeholder.

 

Mr. Sall:  A placeholder, and it doesn’t constrain it.

 

Mr. Ambur:  It could be a string or rich text.

 

Mr. Sall:  Right.

 

Ms. Fong:  The technical standard—what’s the relationship to the TRM? Suppose the technology doesn’t have supporting standards, or you want to call it ‘Technology’ maybe. I don’t know how you want to fit it.

 

Mr. Ambur:  The whole context is information technology. The notion we’re trying to get across is that it should be based upon interoperability rather than someone’s proprietary implementation, and at some point in the process we should lead into the notion of voluntary consensus standards.  However, in the first stage, we want to be cognizant of the fact that there might be innovative ideas out there for which there is no standard yet. But by the time it gets to the later stages, the issue of standards should be addressed.  Components that are eventually acquired should either comply with voluntary consensus standards for interoperability, or a clear justification should be provided for not complying with such standards.  I suggest that TechnicalStandard should be an optional element in the first stage, to give people the ability to identify related standards if applicable.  Remember, in the first stage, the technical standards associated with proposed components would not be displayed on ET.gov, but they could be used to sort and select components if viewers know they are particularly or only interested in those that comply with certain standards.  So if they have an interest in components that comply with a standard like WebDAV [Web-based Distributed Authoring and Versioning, http://www.ics.uci.edu/~ejw/authoring/], for example, like I do, they have a way to identify components that work with the standard.
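
[Editor’s note:  Making TechnicalStandard optional as well as repeatable, as suggested here, would amount to a local declaration along these lines (illustrative only):

<!-- Optional and repeatable local declaration -->
<xs:element name="TechnicalStandard" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
]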

 

Is “TechnicalStandard” the most descriptive term? First “InformationTechnology” as the root element and then “TechnicalStandard” as a child element?

 

Mr. Cox:  I think it’s sufficiently, correctly ambiguous. It doesn’t say “ISO,” or “OASIS,” or “W3C.” It’s just a generic term.

 

Mr. Morgan:  The justification said there could be zero or one of them?

 

Ms. Fong:  Zero or many.

 

Mr. Morgan:  I imagine some components we look at have many related standards.

 

Mr. Sall:  At some point, the term was “TechnicalStandards” (plural). The guidance says don’t use plural element names, because we can always repeat the element, so the element itself is singular.

 

Mr. Morgan:  If the element is repeated, the metadata about the instances would be different. It could talk about different standards. There could be many instances talking about ET.

 

Mr. Ambur:  I thought about paring it back to just “Standard” but I like the notion of leaving “Technical” in there, because it indicates that we will point to the FEA TRM eventually. With respect to the FEA models, OMB acknowledges they’re not complete, and they invite agencies to help populate them further. We can support that objective by enabling people to identify technical standards that have not been previously identified in the TRM.

 

Mr. Ambur:  What about “WebAddress”?  Matthew suggested using the datatype “anyURI”?

 

Mr. Sall:  It makes sense. I think that’s what he’s talking about.

 

Mr. Ambur:  As I recall, he also proposed to rename the element itself but “WebAddress” is fine with me. Any other ideas?

 

Mr. Morgan:  Doesn’t ‘URI’ have a larger meaning?

 

Mr. Sall:  It depends upon whether you’re defining HTTP. ‘URN’ vs. ‘URL’—is that what you’re saying?

 

Mr. Houser:  ‘URI’ is more correct.  What is OMB doing?

 

Mr. Sall:  I don’t recall.

 

Mr. Ambur:  They don’t have an analogous term.  The element of the XSD for Exhibit 300 that I’m mapping into is “SponsorOwnerContactInformation.”

 

Mr. Morgan:  There could be other information in there.

 

Mr. Ambur:  I like “WebAddress” because it conveys meaning to the common person. I don’t fully understand the difference between URI, URL, and URN, but I’m comfortable with the term “WebAddress”.  If we want to change it later, fine. My concern is that, when proponents provide the values required by the element, folks can use it to link and jump to the site where the full, valid XML instance documents reside on the Web.

 

Mr. Cox:  The appropriate data type is ‘URI.’

 

Ms. Fong:  Keep “WebAddress.”

 

Mr. Ambur:  I’m comfortable with that.
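
[Editor’s note:  For illustration, the consensus here — keep the element name “WebAddress” and use the built-in anyURI type — could be declared as:

<xs:element name="WebAddress" type="xs:anyURI"/>
]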

 

So with respect to ComponentName and ComponentDescription, what about constraints?  My concern is to make it as easy as possible to scan through the list of names, and then if you want more information than provided in the brief description, you can just click on the name and go to the instance document.  I’m currently suggesting that ComponentName be limited to 60 characters and ComponentDescription to 600 characters.  I won’t know what that really means until I see it on the screen, but I can live with it as a placeholder for now, unless someone has a better idea.
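
[Editor’s note:  The 60- and 600-character limits proposed here could be expressed in XSD as follows (illustrative only):

<!-- Length limits as proposed: 60 characters for the name, 600 for the description -->
<xs:element name="ComponentName">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:maxLength value="60"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
<xs:element name="ComponentDescription">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:maxLength value="600"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
]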

 

Mr. Sall:  Unless you don’t want to constrain them at all.

 

Mr. Ambur:  Well, I think we at least need to constrain the length of ComponentName, and if someone has a 30,000-word description of their component, you’d have to scroll a long way to get to the next one.

 

Mr. Cox:  At least to prevent dumping of marketing material.

 

Mr. Ambur:  We absolutely don’t want rich text. With respect to the CostAndEfficiency element, I can live with no constraints since that element will not be indexed on the et.gov site, and proponents can put whatever they want in that field for display on their own Web sites anyway.  So I think we have consensus on TechnicalStandard and, for DataTypeOrModel, to use “DataDomain,” right?   With respect to ComponentType—we need to talk about that more. Ken—you wanted other things on there?

 

Mr. Sall:  I just thought, in terms of the Component subcommittee, it seems like you’re going after different…I’d like to engage them in that discussion.

 

Mr. Cox:  What comes to mind is the OSI Model [ISO Open System Interconnection Reference Model, http://www.iso.org/iso/en/CatalogueListPage.CatalogueList?ICS1=35&ICS2=100], which has physical layers. Maybe the user vocabulary should have something like that, to say, “I’m addressing the data layer,” etc.

 

Mr. Houser:  If they don’t get it quickly, it won’t fly.

 

Mr. Cox:  Does the FEA have a description of the overall architecture that includes those?

 

Mr. Ambur:  You mean hardware, software, and data?  I don’t think so.  My concern is that they started with the Business Reference Model [http://www.feapmo.gov/feabrm.asp]. It’s a 30,000-foot view of the activities of government, but it doesn’t touch the ground, where the rubber meets the road and the business actually gets done.  OMB loves it because they think it gives them an understanding of everything government does, but I’m afraid that it obscures the components that cut across the business lines.  Lee [Ellis] talked about forms this morning. Form automation and records management requirements and components cut across every business line.  By focusing on business lines, I’m afraid we risk building bigger stovepipes while failing to recognize the potentials to reuse hardware, software, and data components across business lines.

 

Mr. Cox:  The U.S. Patent and Trademark Office [http://www.uspto.gov/] enterprise architecture is going to have that physical layer as part of the architecture, on top of which applications run, etc. So those things are going to be part of the FEA, whether they’re wanted or not, so they have to take it into account, and eventually it should be here. Whatever arises out of agency architectures has to have a place in the FEA.

 

Mr. Houser:  When we talk about saying we need middleware, and this and that, it just gets complicated.

 

Mr. Cox:  If emerging technology addresses one of the layers, there needs to be a way of addressing what it’s about.

 

Ms. Fong:  What’s a smart card? Middleware?

 

Mr. Houser:  I don’t think so.

 

Mr. Morgan:  It has three things: hardware, software, and data.

 

Mr. Ambur:  That’s a good example. So we couldn’t constrain ComponentType to one value for smart cards.

 

Ms. Fong:  So it has to be one or many; it could be hardware, software, and data.

 

Mr. Sall:  So we want ‘ComponentType’ to be…

 

Ms. Fong:  Multiple type.

 

Mr. Ambur:  So the element with the controlled vocabulary is repeatable.

 

Mr. Cox:  This is a case where an example in the comment would be helpful.

 

Mr. Sall:  When we get the schema closer to where we want it to be, we can generate an instance. That would help, or people could play around, like start with the instance and go backward with the schema.
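
[Editor’s note:  A purely hypothetical instance fragment, using the smart card example raised above, to show how a repeatable ComponentType might look. All values and the namespace URI are placeholders:

<!-- All values and the namespace URI are placeholders -->
<InformationTechnology xmlns="http://et.gov/schema/stage1">
  <ComponentName>Smart card</ComponentName>
  <ComponentDescription>Placeholder description of a smart card component.</ComponentDescription>
  <WebAddress>http://www.example.com/smartcard</WebAddress>
  <ComponentType>hardware</ComponentType>
  <ComponentType>software</ComponentType>
  <ComponentType>data</ComponentType>
</InformationTechnology>
]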

 

Mr. Ambur:  Any other big concerns?

 

Mr. Morgan:  I’m counting on some time this afternoon to continue with this.

 

Mr. Ambur:  Is anyone staying?

 

[Mr. Houser and Mr. Kane indicated that they planned to remain for the afternoon Registry/Repository meeting.]

 

Mr. Ambur:  Bruce, are you comfortable with this?

 

Mr. Cox:  Yes. I’m looking forward to using it.

 

Mr. Ambur:  Based upon the information gathered in the first stage, the second stage should then speak for itself.  Either folks will express interest in proposed components identified and described in the first stage, or they won’t.  And that’s what I’m aiming for.

 

Mr. Morgan:  As soon as CIOs do that to vendors and vendors try to use it, we’ll get feedback.

 

Mr. Ambur:  As soon as the possibility of establishing an ET process was raised, Davis Roberts [of the XML Working Group and SAIC, http://www.saic.com/] indicated she is faced with that same problem.  Mark Forman is directing people to her as a representative of IAC and she can’t deal with all of them.

 

Mr. Ambur:  We’ll be back in an hour. Thank you all.

 

 

End meeting.

 

Attendees:

 

Last Name       First Name      Organization
Ambur           Owen            FWS
Barr            Annie           GSA
Brish           Arie            Conformative Systems
Cox             Bruce           USPTO
Ellis           Lee             GSA
Fong            Elizabeth       NIST
Houser          Walt            VA
Kane            John            NARA
Kantor          Bohdan          Library of Congress
Le Maitre       Marc            OneName
Morgan          Roy             NIST
Niemann         Brand           EPA
Ruggero         Russ
Sall            Ken             SiloSmashers