The Draft Meeting Notes Digest of the Grid Information Services Working Group

 
Information Services
Request for Comments: GIS-WG-010-1
Obsoletes: none
Category: informational

1 Purpose of the Document

This document contains the past meeting notes of the Grid Information Services working group of the Grid Forum. The document is updated periodically. Each time it is updated, the version number is increased by one, so that a specific version of the document can be properly referenced.

2 GF6 / GlobalGF1 (Amsterdam, Netherlands, 5-7 March 2001)

Notes will be included at a later time.

3 GF5 (Marlborough, MA, U.S.A.) October 2000

We list here the relevant activities pursued during the fifth Grid Forum. These include an XML tutorial held by Reagan Moore, a schema suggested within the performance working group, and a Jini activity.

In meetings outside the official schedule, we talked with Internet2 about their taking the lead on the people development [2]. We should monitor their progress and comment on the directions they take. The references to TERENA [3], [4] are also useful.

4 GF4 (Redmond, WA, U.S.A.) July 2000

The fourth Grid Forum was successful in that all discussions held in the working groups have been included in the documents. We observed that many of the documents presented had not been read by the other working groups. Moreover, the discussion of the compute resource document within the scheduling working group was a failure: the document was originally scheduled for a timeslot of 30-45 minutes, but the total time the scheduling working group reserved for it was about 10 minutes, leaving the participants of that session with the feeling that not even the surface could be touched. Ken Klingenstein introduced the educational person object, which is also discussed as part of the TERENA middleware project. To find out more about these activities, we refer to the appropriate working group document.

5 GF3 (San Diego, CA, U.S.A.) March 2000

The activities related to the third Grid Forum are listed below. We had two main sessions in which we covered many different topics.

5.1 Summary of the meeting

During the Grid Forum the discussions were centered around the following topics:

  1. Charter. We established that the mission statement of the charter is accepted by the working group. We identified that the charter should include a finer grain of milestones to encourage others to participate in smaller tasks.
  2. Introduction. Since we recognized that the knowledge difference between participating members was large, we decided to have a 45 minute introduction about the goals, purpose, and the current activities of the working group. We identified that this introduction is still too short and that we want to hold a tutorial on Information Services prior to the next Grid Forum.
  3. Chairs. Since we established that the work of the group is too much to be handled by a single chair, we nominated Mary Thomas and Lennart Johnsson as co-chairs. No other volunteers were found during the meeting. Gregor suggested that a person from the eGrid or jGrid (Japan) could serve as an additional co-chair.
  4. Report Presentations. We spent two hours briefly reviewing some of the reports published through our web site. The reports discussed are listed below:

    1. Directory based information services

      1. Schema related documents

        1. GOS was presented in its entirety. (presented by Gregor von Laszewski)
        2. MDSML was presented in its entirety. In addition, shortcomings as well as differences to DSML and RISML were discussed. (presented by Gregor von Laszewski)
        3. RISML was introduced and compared to MDSML and DSML. We may need a more formal document on it. (presented by Brett Didier)
        4. In the discussion we identified that DSML and MDSML should be supported, but that DSML is not sufficient to help organize schemas in a distributed fashion (as our working group is doing). MDSML can export to DSML but not vice versa.
      2. Objects related documents

        1. People: The object classes for people are based on the X.500 and inetOrg person. A grid person is an extension of inetOrg person. We discussed extensively with Internet2 about joining our efforts as they maintain a large database of educational people. (presented by Brett Didier)
        2. Compute resources: Due to time constraints we did not fully present the compute resources. We may consider presenting them at the next Grid Forum. We discussed the notion of ImageResources vs. Physical resources. We noticed that some groups, such as those around Andrew Chenn and Alexander Reinefeld, could use this notion. It was introduced by MDS but has so far not received much notice, since it is one of the hidden features of the original MDS. We hope to encourage discussion on this issue. (presented by Gregor von Laszewski)
    2. Other presentations

      1. XML data framework (is this correct?)

        1. GXD: We presented the concepts of GXD. (presented by Pete Vanderbilt) <ask Pete for a one paragraph summary to be included here, that includes a contrast to (a) ) >
  5. Open discussion. (see next section).

5.2 Notes to presentations and discussions

5.2.1 Presentations

5.2.1.1 Grid object specification.

The purpose of Gos is to allow the formalization of objects and of defined scopes of objects. The difference between the LDAPv3 schema syntax and the Gos syntax lies in the descriptive annotations and the introduction of a simple namespace. The namespace is a simple syntactical trick and thus can be implemented with current LDAP directory technology, even though a namespace is not part of the directories. A program converts this language to a schema that an LDAP server can understand. The inclusion of aggregates is another syntactical trick that allows the use of LDAP servers that do not support object classes of the KIND abstract. Aggregate class attributes are ``inlined'' into other object classes during program execution. At this time it is unclear whether we should use the term abstract class in lieu of aggregate class. No compelling reason was brought up by the Grid Forum members that favors one or the other. Thus, this issue is postponed until it is brought up on the e-mail list. Aggregates help to introduce multiple inheritance. Note that LDAP servers do not allow multiple inheritance.
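
The inlining of aggregate class attributes described above can be illustrated with a minimal sketch. It is not the Gos converter itself: the dictionary layout, the aggregate names (Contact, Location), and the attribute lists are assumptions chosen only for illustration.

```python
# Hypothetical in-memory form of Gos-like definitions. The class and
# attribute names below are illustrative assumptions, not Gos syntax.
AGGREGATES = {
    "Contact": ["mail", "telephoneNumber"],
    "Location": ["building", "roomNumber"],
}

OBJECT_CLASSES = {
    "GridPerson": {
        "superior": "inetOrgPerson",   # a single superior class
        "attributes": ["cn"],
        "aggregates": ["Contact", "Location"],
    },
}


def inline_aggregates(object_classes, aggregates):
    """Copy the attributes of each referenced aggregate into the class."""
    flattened = {}
    for name, definition in object_classes.items():
        attributes = list(definition["attributes"])
        for aggregate_name in definition.get("aggregates", []):
            for attribute in aggregates[aggregate_name]:
                if attribute not in attributes:   # skip duplicates
                    attributes.append(attribute)
        # The result no longer mentions aggregates at all, so it can be
        # written out as an ordinary object class definition.
        flattened[name] = {
            "superior": definition["superior"],
            "attributes": attributes,
        }
    return flattened


FLATTENED = inline_aggregates(OBJECT_CLASSES, AGGREGATES)
```

After inlining, the class has one superior and a flat attribute list, which is why the approach can give the effect of multiple inheritance on servers that do not offer it.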

5.2.1.2 MDSML.

In MDSML, attributes are defined with each object. LDAP and DSML separate the definition of attribute types from the object definition. The group developing MDSML found this to be quite cumbersome in an environment where a large group of objects and attributes is maintained. Instead they suggested defining the type as part of the object specification itself and developing a program that performs type checking. The group at ANL (Gregor von Laszewski and Peter Lane) has successfully demonstrated a prototype version of such a validating MDSML compiler. Although MDSML can easily be converted to separate the attribute definition from the attribute specification, the development team wanted to point out this important difference. Once the working group decides, in collaboration with other projects, how to proceed, it is very likely that the definition of the types in Gos will be maintained as is, but that in MDSML it will be separated. If this is the case, it must be possible to allow (objectclass+, attributes+)*. This has to be compared against DSML.
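
The conversion from per-object attribute typing to separate attribute type definitions, together with the type check it requires, might look roughly like the following sketch. The input layout, the object names, and the type names are assumptions and do not reproduce MDSML or DSML syntax.

```python
# Hypothetical objects with inline attribute types (MDSML-like idea).
# Names and type labels are illustrative assumptions.
OBJECTS = {
    "ComputeResource": {"hostname": "string", "cpuCount": "integer"},
    "Person": {"cn": "string", "hostname": "string"},
}


def separate_attribute_types(objects):
    """Split inline-typed objects into object classes and attribute types.

    Raises ValueError when two objects give the same attribute
    conflicting types, which is the check a validating compiler needs.
    """
    attribute_types = {}
    object_classes = {}
    for object_name, attributes in objects.items():
        for attribute_name, type_name in attributes.items():
            known = attribute_types.setdefault(attribute_name, type_name)
            if known != type_name:
                raise ValueError(
                    f"{attribute_name}: {known} conflicts with {type_name}"
                )
        # The object class now only lists attribute names.
        object_classes[object_name] = sorted(attributes)
    return object_classes, attribute_types


OBJECT_CLASSES, ATTRIBUTE_TYPES = separate_attribute_types(OBJECTS)
```

The sketch shows why the separated form needs a global view: an attribute shared by several objects must agree on one type, whereas the inline form only reveals a disagreement when the definitions are compared.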

5.2.1.3 RISML.

RISML and MDSML are fairly similar, but have some syntactic differences. The presentation initiated some additional discussion on what syntax may be appropriate for our problems.

5.2.1.4 Issues.

The following issues were raised during the discussion:

  1. Can any of the above syntaxes express complicated data types and how will they map to the underlying data model? Preliminary Answer: If we use LDAP, LDAP does not support a complex data model. We are restricted to what LDAP provides. On the other hand we can build and register services as meta-data in the directory that then use more sophisticated data types.
  2. Is it foreseeable that picking a single model is too restrictive? Preliminary Answer: We need multiple models standing side by side. We use LDAP as a kind of meta-model. It is important for the WG to consider different data models as well. GXD and XML are good examples of this because they can be transferred to other implemented data models. Unfortunately, there was, and maybe still is, a lack of manpower to analyze all of the problems and models. We have chosen a straightforward beginning, as the current active members are convinced that directory-based information models must be supported due to the administrative domain problems in security that force organizational independence of information services.
  3. How could one represent the CIM model, and has anyone looked into this? See Section [*].
  4. How important is the child-of relationship? It is not part of LDAP, but is a part of the X.500 definition. There are advantages to defining client side programs using this relationship. Example: If you know where objects are located in a tree, you can search more efficiently.
  5. What platforms does the ANL Grid software for MDSML run on? The program is written in Java, but the translator from LaTeX to XML is ``hacked'' in Perl ;-)
  6. How do we deal with people that want to use DSML rather than our object description? The lack of namespaces and child-of relationships in DSML is a major issue. There are good reasons not to use DSML. One is that it is a moving target. Another, as pointed out above, is that it is ill suited for handling a large collaborative effort for defining objects. The original DSML specification even cites some other disadvantages.
  7. Are you planning to use aggregates for some of the additional information (e.g. accounting, security)? Yes, but it depends on what is required; there is not enough information to give a complete answer at this time. We should examine those classes to see how we can best utilize them.
  8. We identified that there were no leads working on object classes for data resources (hard drives and such, not data grid) and network resources.
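
The remark in item 4, that knowing where objects sit in the tree lets a client search more efficiently, can be sketched as follows. The distinguished names, the attribute names, and the naive comma-based name parsing are assumptions for illustration; a real directory server evaluates the search base itself, and the filter below only mimics that narrowing.

```python
# Hypothetical directory entries keyed by distinguished name.
ENTRIES = {
    "hn=node1, ou=mcs, o=grid": {"objectclass": "computeResource", "cpus": 16},
    "hn=node2, ou=mcs, o=grid": {"objectclass": "computeResource", "cpus": 4},
    "hn=node9, ou=nas, o=grid": {"objectclass": "computeResource", "cpus": 64},
}


def parse_dn(dn):
    """Split a name into components (naive: no escaped commas)."""
    return tuple(part.strip().lower() for part in dn.split(","))


def is_under(dn, base):
    """True if dn equals base or lies below it in the tree."""
    dn_parts, base_parts = parse_dn(dn), parse_dn(base)
    if len(dn_parts) < len(base_parts):
        return False
    return dn_parts[len(dn_parts) - len(base_parts):] == base_parts


def search(entries, base, predicate):
    """Apply the predicate only to entries below the given base."""
    return [
        dn for dn, attributes in entries.items()
        if is_under(dn, base) and predicate(attributes)
    ]


def has_at_least_8_cpus(attributes):
    return attributes["cpus"] >= 8


SITE_RESULT = search(ENTRIES, "ou=mcs, o=grid", has_at_least_8_cpus)
GLOBAL_RESULT = search(ENTRIES, "o=grid", has_at_least_8_cpus)
```

A client that knows the relevant entries live under one organizational unit can start the search there instead of at the root, so fewer entries have to be examined.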

5.2.2 GXD

5.2.2.1 Presentation.

GXD (Grid eXtensible Data) is a software framework facilitating publication and use of data from diverse data sources. GXD defines an object-oriented data model designed to represent a wide range of things including data, its metadata, resources and query results. GXD also defines a data transport language, a dialect of XML, for representing instances of the data model. This language allows for a wide range of data source implementations by supporting both the direct incorporation of data and the specification of data by various rules.

The GXD software library, prototyped in Java, includes client and server runtimes.  The server runtime facilitates the generation of documents containing data encoded in the GXD transport language. The GXD client runtime interprets these documents (potentially from many data sources) to create an illusion of a globally interconnected data space, one that is independent of data source location and implementation.

The basic idea is that a data source, whether implemented using a database, files or a high-performance storage system, can be exported in GXD format using commodity web technology (such as CGI programs, servlets, JSP and JDBC). An application using the GXD runtime can then access any such data source. Over time, data sources can evolve to use common schemas and to contain cross-references to one another.

GXD is typically used to represent a range of things from archives (self-describing, top-level containers), through various kinds of organizational structures to datasets. ``Small'' datasets may be represented directly in GXD while ``large'' datasets are kept in their native format (such as HDF) and referenced using URLs combined with identifying metadata.  GXD can also represent things like authorship, lineage and data quality.

Although GXD may be applied to scientific data, it was designed to be more general. In particular, GXD can represent infrastructure items, such as users, machines, accounts and so on. As such, it could serve as a component of an implementation-neutral grid information service.

For more information on GXD, see www.nas.nasa.gov/pv/gxd/.

5.2.2.2 Issues.

  1. What are the real applications/schemas for GXD?

    Currently GXD is being applied to experimental wind tunnel data. GXD presents various views of the data including one based on hierarchical containers representing ``tests'', ``runs'' and ``sequences''. We are looking for other applications, possibly in Earth Sciences or Astrobiology.
  2. What are the limitations of GXD?

    Here are a few considerations.  First, GXD is designed for relatively slowly-changing data (in that it internally caches GXD documents) - for quickly changing data, you might use the upcoming ``methods'' feature or, alternatively, use GXD to return a handle to some other service. Second, GXD is meant to complement other Grid technology and, so, doesn't have things like persistent dataset caching or replication. And finally, the GXD implementation is prototype quality and Java only.
  3. Why not use GXD to save big data?

    There are already many tools that operate on data in standard scientific data formats, such as HDF. If you were to put binary data into XML, its size would increase dramatically and you'd probably need to get it back into its native form in order to use the standard tools. Using GXD, you instead encode a reference to the data (together with some metadata saying what the data is).
  4. Is there a method for discovery of interfaces?

    Yes, each node carries an indication of which (GXD) interfaces it implements. For example, a client can determine whether a given GXD data node implements a given extension to a standard interface.
  5. How does GXD relate to Java XML parser classes?

    The Java XML Parser generates schema-specific Java classes that allow for convenient access to XML data. Currently GXD supports only generic access (more like DOM) although GXD may have schema-specific Java classes in the future. Also the GXD classes have more functionality, like the ability to follow links and to provide more of an object model.
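
The answer to question 3 (referencing large datasets rather than embedding them) can be illustrated with a small sketch. It does not use the GXD transport language: the element name, the attribute names, the URL, and the choice of base64 as the text encoding for binary data are all assumptions.

```python
import base64
import xml.etree.ElementTree as ET


def embed(payload):
    """Put binary data directly into an XML element as base64 text."""
    element = ET.Element("dataset")            # hypothetical element name
    element.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(element)


def reference(url, data_format):
    """Describe the data by location and format instead of embedding it."""
    element = ET.Element("dataset", {"href": url, "format": data_format})
    return ET.tostring(element)


PAYLOAD = bytes(3000)                          # stand-in for a binary dataset
EMBEDDED = embed(PAYLOAD)
REFERENCED = reference("http://example.org/run42.hdf", "HDF")
```

Under these assumptions the embedded document is larger than the original binary data and grows with it, while the referencing document stays the same small size regardless of the dataset, and the data remains in its native format for the standard tools.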

5.2.3 General issues

This section contains an unordered list of questions that were raised during the meeting. We found it important to maintain this list in order to address them in the appropriate task groups. Additional items that were discussed led to the task group definitions and their working agenda listed in the appendix. We have refrained from repeating this information in this section.

  1. Working group activities

    1. The current working group activities are characterized by a small group of people. This group works largely on information services and models based on directories. Due to the lack of manpower within the group, other information models have not been considered. The group of actively working members feels that directories play an essential role. Initial other research activities like GXD are complementary to the directory effort. There could be others!
  2. Getting involved in the working group

    1. The reason why we presented in detail examples of documents produced by the working group is to show that it is possible to contribute to the group with minimal background knowledge in information services. Examples are: a literature review, defining some needed object classes, ...
    2. A recommendation was issued to download the X.500 and X.501 documents from the IETF website.
    3. Where does a new member of the working group start? See Section [*].
  3. Increasing relationship to other working groups:

    1. Accounting: Gregor will contact the chair of the Accounting working group to identify someone to participate and start the communication.
    2. Security: Marlon Pierce (NPAC) will help start the communication with this group regarding the people representation.
    3. Collaborative environments: PNNL & ...
    4. Networking: performance working group.
    5. Remote IO GF working group: Reagan Moore

      1. The overlap between GIS and Remote I/O needs to be addressed. The GIS is the integration among all wg's. Is it up to us to define the objects or the schemas? It is our task to convince the other wg's that our definitions/interfaces are correct. Need a similar notation for communications in each of the wg's. These are exercises to show that they are doable and show how simple the task is.
  4. Increasing relationships to activities outside of current working groups:

    1. Action Item - create Steering Group to assess the Digital Library technologies and how they can interoperate with the products of this group, also we need to evaluate other implementations to avoid redundancy of information/effort.
    2. Another group to communicate with is the Archivist Group. The issue here is how to keep information beyond the life of LDAP.
    3. Identify people interested in interfacing with CORBA
  5. We need a possible whitepaper on the GIS. As part of the whitepaper we may have to discuss the following:

    1. Reagan Moore is interested in understanding how the work this group is doing relates to data services. An example of an issue with these services is that there is no persistent identifier: the data moves around and the reference breaks. How would this be managed by this service? Peter Vanderbilt suggests GXD. The real answer is that this service shouldn't manage it, but should be used to advertise the data services.
    2. Donald Petravick: Purpose of this is to describe data, the job of the GIS is to identify that we have a data center and publish the service rather than to push the data into the GIS concept.
    3. The next question is whether some services are generalized across all of the services, for example user names. Gregor: Registration of services will help address some of these issues. I do not think that we should take data out of SRB, but propose the registration of services that identify the interface to the service. The user could customize the view that they are interested in, for example a host oriented view vs. a data oriented view vs. a service oriented view.
    4. Pete Vanderbilt: Queries for information or scheduling jobs will require knowing where the data is located and possibly replicated. We will need to combine information on where data is replicated from SRB and from GIS to identify where the host is located (on which the SRB catalog is running).
    5. We could try to make it easier for groups doing the programming by providing sets of guidelines that would help the programmers define how to organize the model.
    6. The approach of the group should be minimalist in nature for the service. Remember that the physicists/chemists don't speak CORBA. We want to provide something that everyone can understand.
  6. The challenge is that the activities of other WGs will have different requirements for the services. The directory must support a rich level of complexity, yet provide a minimal picture. Providing hierarchical structure, use of referrals or replication may not be the right answer. (Gregor disagrees with this).

    1. The info service doesn't need to know LSF, but rather that a computing system is running LSF.
    2. Two layered world: what services available, now that you know where they are, use these services.
  7. Peter V: LDAP points at services.... Wonder about if each little service will have its own API's/protocols separately, or coordinated for similar ways to get to the services. There will be times when you have to take data from multiple services and join them... will need a common data model. Building individual services, but until they use the same interfaces and protocols, they won't work together. We need simple layer services that work together.

    1. GVL: We need to support both issues in a GIS WG. I can't require everyone to commit to the same protocol. Separate domains, etc. Need to be careful to assign yet another layer of complexity. Do we support organizational structure like service registration, or is there another model? First step in defining information discovery. This is why LDAP and X500 are being adopted so widely.
  8. Will there be a wg on interoperability services? Will this occur by default? Do we need to formalize it in order to watch it? People aren't aware of all of the services.

    1. Should we make a survey and maintain a list of projects that are solely related to interoperable projects? People are not publicizing their grid work/activities. It is not a good idea to just have links; they should be augmented with additional information to support decision making. It is not clear how to organize this with limited manpower. Work out a plan for the wg before the next GF on how to support this capability so that we can make it a future task.
    2. Reagan Moore will organize the survey for information services. Will then need to reorganize the web site and make it more beneficial.
  9. Why are the grid services more than just the info services? What is the grid arch... multiple? This is being done in the higher up wg. The GIS wg assumes the grid architecture.
  10. In next few months, Mary will be doing design for user portal stuff. She plans on publishing schemas to GIS and wants comments. NCSA/NPACI/IPG user portal. User Portal, Gateway, CPSE, Globus, MIX.
  11. Andrew Chen is also doing work in this - send email. The group around Satoshi M. is also doing this kind of work.
  12. We need SQL, Oracle, and xml/excellon expertise, as well as Jini, CORBA, and RMI. Grid requirements differ from industrial ones.
  13. Naming issue: we need to reach syntactic agreement on the top level of the tree. Should we have o=grid, and dc=X (domain component)? The answer is yes, but it must be stated.

    1. Action: Form the LDAP/X.500 Working Group. Lead: Steve Fitzgerald. Steve's comment: Globus is reworking LDAP to incorporate a dc component (this will allow a service to look up services by domain name).
  14. Existing Infrastructure:

    1. Lead Jarich.... (name?, email, Poland eGrid)
    2. GIS should look into what is being done at NCSA/NASA.
    3. We need an expert in relational databases to help put together an RFC document:

      1. Volunteers wanted ... Jarich...
      2. Comment about needing an LDAP interface to the db so that you don't need an MDS. Should we have a discussion about other protocols, as I am not sure that LDAP is the best protocol?

        1. Action Item: have a volunteer investigate other protocols besides LDAP and MDS. Establish a list of items/activities that appear to be useful. Pete volunteered to lead; Brett has also volunteered. The W3C and IETF have people doing RFCs in this area. We need contacts to steer this activity.
  15. Discussions and Observations

    1. GIS should follow a minimalistic approach for the model.
    2. Evaluation of other-than-LDAP technology is necessary, but it was pointed out that the current group does not have enough resources. The directory information model is service based and includes the concept of a metadata repository and the registration of services in it.
    3. Should a working group on interoperable services be formed? What does this mean exactly?
    4. wrong: - GIS vs Security
      The security group has an implementation in its charter while the GIS does not, although the Globus team may provide a prototype implementation.
    5. Application and User working group (?): there is currently no clear vision of how these groups could contribute. The chairs will review in the near future the mission of these groups and whether the documents have relevance to these groups.
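
The service registration idea raised in several of the items above (the information service advertises which services exist and how to reach them, rather than holding their data) can be sketched as a two-step lookup. The record fields, the service kind label, and the host and endpoint values are assumptions for illustration, not an agreed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    """What the information service stores about a service (hypothetical)."""
    kind: str        # e.g. which scheduler a computing system is running
    host: str        # where the service runs
    protocol: str    # how a client should talk to it
    endpoint: str    # placeholder contact address


class ServiceRegistry:
    """Holds only advertisements; the service's own data stays with it."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def lookup(self, kind):
        """Step one: find out which services of this kind exist and where."""
        return [record for record in self._records if record.kind == kind]


REGISTRY = ServiceRegistry()
REGISTRY.register(
    ServiceRecord("scheduler/LSF", "cluster.example.org", "lsf",
                  "cluster.example.org:9999")
)
REGISTRY.register(
    ServiceRecord("data-catalog/SRB", "data.example.org", "srb",
                  "data.example.org:9998")
)

# Step two, contacting the service over its own protocol, is left to
# the client and is outside the information service.
SCHEDULERS = REGISTRY.lookup("scheduler/LSF")
```

In this sketch the registry knows that a computing system is running LSF, but nothing about LSF itself, which matches the minimalist, two-layered view discussed in the meeting.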

5.3 Conclusion

From the response of the participants we conclude that we had a successful meeting. Nevertheless, a lot of work has to be done on a voluntary basis. The current active members of the working group have thus chosen to work on a subset of issues. More volunteers are needed to expand in other directions.

6 GF2 (Chicago, IL, U.S.A.) October 1999


6.1 Working Group related Agenda

During the second Grid Forum meeting we had the following agenda

6.2 Summary of action items/goals:

As part of the discussions we derived the following summary of actions and Goals:

6.3 Supplementary Notes Taken on Oct 19, 1999

Gregor gave a condensed version of the presentation made at the first Grid Forum meeting, which proposed the need for the Grid Information Service Working Group. There was agreement by the attendees for the need of existence of this working group, although opinions varied on what the focus of this working group should be. This session was widely attended and this contributed to confusion about the purpose and expectations of this working group, since the participants did not share a common terminology. Different opinions were expressed relating to the level (schema, api, protocol) that should be the focus of this working group, what products this working group should provide (single schema definition, schema definition language, software tools), and how this group should work with the other groups (no interaction, liaisons). No clear agreement on any of these issues occurred during this meeting, but it was important for this discussion to occur to clearly point out the lack of agreement about the purpose of this group due to members which had not previously attended the meetings of this working group.

Some key issues that came up in this meeting:

6.4 Supplementary Notes Taken on Oct 20, 1999

6.5 Supplementary Notes Taken on Oct 20, 1999

7 GF1 Defining the Charter

The meeting was attended by representatives of the following groups/projects: Alliance, ASCII, Gateway, Globus, DOE2000, IPG, RIB, VA Linux, three different groups from problem solving environments, Scheduling and monitoring, and ComputingPortals. This is important to note in order to emphasize that not only members of the MDS team participated in the meeting. During the course of the meeting we determined the following action items.

7.1 Action Items (Charter Definition)

  1. [done] Define a working group charter

    1. We developed the following General statement to be included in the charter
    2. Identify requirements for and facilitate the development of interoperable models and mechanisms for the information services necessary for doing grid-based computing, such as the definition of the data models, mechanisms for accessing metadata, ... .
    3. The working group will also shepherd the implementation of standards of the models and mechanisms so developed where the community deems it appropriate.
    4. define a timeline as part of the charter
  2. [done] Define a preliminary process that governs the activities within the working group

    1. Motivation: The working group could potentially be large. We concluded it would be advantageous to work on two levels. Thus the working group is similar in function to an area in the IETF. Each working group has multiple task groups.
    2. Define concrete task groups. Task groups should be initiated by the community or at the suggestion of the working group chair or steering group.
  3. [continuous] Publicize the effort in the community

    1. Communicate and coordinate with other groups (not only Grid Forum)
    2. publicize the charter to the community and ask for participation in the working group
    3. Examples: Information services workshop at NAS IPG, TERENA meeting, Jiniforum, Web page
  4. [in progress] Getting bodies behind the effort.

    1. Due to personnel changes at NASA many of the tasks could not be completed
    2. Grid Forum may be so far just a smaller group => encourage others to participate in Gridforum (TERENA, IETF, ...)

7.2 Discussion Topics

  1. Who are the customers?

    1. What are the requirements defined by the customer
    2. Define/suggest an extensible and consistent data model.
    3. There could be more than one.
    4. (Policies can be part of the data model?)
  2. Define/suggest API for easy access

    1. briefly discussed the OMG model
    2. CFP, Proposal,
  3. Possible short term goals

    1. Task 0:

      1. Usage Scenarios/Requirement
      2. Solicit usage scenarios which describe what objects are involved, when, where, and why.
      3. Ask this from the other Grid groups.
    2. Task 1:

      1. Define a grid information objectmodel.
  4. How do we write down the objectmodel?

    1. identify candidates for syntax
    2. groups need to exchange their schemas
    3. standard schema definition language
    4. define a standard repository interface which interfaces to multiple different repositories/protocols/API/Bindings
    5. Information modeling tools
    6. Idea: Instead of asking for one gigantic proposal, follow the Portal/Datorr model and ask for smaller proposals
  5. Proposals for Information about specific schemas

    1. People (X.500)
    2. Hardware Resources
    3. Computers
    4. Network
    5. Data Storage
    6. Software
    7. Jobs/Tasks
    8. Collaborative Envir.
  6. Each group

    1. Gathering information about particular area
    2. Talk to each other and other groups
    3. gather requirements
    4. formulate use cases for each area
    5. making sure that at least one proposal in that area appears
    6. statement about scalability
    7. statement about how to do security, access restrictions.

8 Authors' Addresses

Gregor von Laszewski
 
Mathematics and Computer Science Division
9700 South Cass Avenue
Argonne National Laboratory
Argonne, IL 60439, U.S.A.
phone: (630) 252 0472
fax: (630) 252 5986
e-mail: gregor@mcs.anl.gov

Karen Schuchart
 
Brett Didier
 
Mary Thomas
 
Mike Helm
mike@fionn.lbl.gov
Steve Fitzgerald
steve@isi.edu
Al Gilman
 
Peter Lane
 
Pete Vanderbilt
 

9 Acknowledgement

This document has been written with the help of the working group members. Thus, correctly speaking, we ought to have every working group member listed as author ;-) This document was formatted with LaTeX [1].

10 Copyright

Copyright (C) Grid Forum 2000. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Grid Forum or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into

Bibliography

1
Ralph Droms.
A LaTeX style for RFCs and Internet drafts.
Internet draft, IETF, Bucknell University, http://www.ietf.org/internet-drafts/2-latex.template, July 1991.

2
Internet2.
LDAP Internet2 edu person.
http://www.educause.edu/eduperson/.

3
Trans-European Research and Education Networking Association (TERENA).
http://www.terena.nl/.

4
TERENA task force LSD: LDAP services deployment.
http://www.terena.nl/task-forces/tf-lsd/.


created by Gregor von Laszewski, gregor@mcs.anl.gov