Comments on EGEE Middleware Architecture DJRA1.1
Version 0.2 dated July 1, 2004

David Adams
July 20, 2004

General comments
--I feel this is a significant improvement over the previous draft (dated June 8, 2004). The system appears more as a loosely coupled collection of interacting services rather than a monolithic aggregation.
--However, the services are not independent and it would be useful to have a diagram or diagrams illustrating the dependencies.
--A section on service infrastructure would be useful. See comments in the following. Common issues such as service discovery and creation would be discussed there. The distribution of catalogs might also be covered. Another issue is WSDL vs. client API's.
--How do users chase down the point of failure, which may occur anywhere in a long chain of services?

2. Executive summary.
--Figure 1 is useful but I am not comfortable with the services bridging categories.
--In "API and Grid Access Service", I would replace "user will gain access" with "user may gain access". The GAS is optional, isn't it?

3. Requirements
--In "Security", I presume the third A should be auditing rather than accounting.

4. Service Oriented Architecture
--In "Services", I find the definition of service misleading. I would say that a service instance is a process that may depend on the state of other service instances. I suppose the document is technically correct if "service" refers to the template used to create the instance rather than the instance itself, but it would be helpful to draw this distinction or drop the second sentence.

5. Security Services

5.1 Authentication
--I would like to see a connection from user credentials to user identity and, in particular, a mechanism to connect the multiple credentials that a user might obtain to his or her single identity.

5.2 Authorisation
--I naively expect that we would use a pull model to put minimal burden on the user/client. Or is the agent something that appears only behind the GAS?
Or perhaps I completely misunderstand.

5.2.2 Policy combination and evaluation
--An example policy *might* make this clearer.

6. API and Grid Access Service
--The last sentence of the introductory section indicates that services have to be directly accessible, i.e. without the GAS. Presumably this implies they must provide authentication and authorization. What model is followed here?

7. Basic Information and Monitoring Services
--Is a ProducerFactory conceptually different from a ConsumerFactory? Or a WMS or SE factory? Perhaps there should be a dedicated section on service infrastructure to address issues such as factories.

7.1.4 Security
--Isn't the discussion of authorization rules more general and not just limited to information services? This might be included in the aforementioned section on service infrastructure.

8. Job Management Services
--I don't see any discussion of the environment in which a job can expect to run. Is there a sandbox? Is all input and output restricted to this sandbox? What about /tmp? What about access to SE's? How about access to external services, e.g. DB's?

8.1. Accounting
--Isn't accounting a type of information service? Or maybe it should have its own category. It doesn't fit too well in job management.

8.2. Computing element
--I propose to draw a clearer distinction between push and pull CE's. Rather than saying a CE provides both interfaces, reserve CE for the push model and invent a new term, say computing manager (CM), for the pull interface. A CM might be implemented making use of a generic CE interface.

8.3. Workload Management
--I wonder if the detailed discussion of the internals of the WMS is appropriate for this document. Presumably a service with the same interface but different internal decomposition would still be considered a WMS.
--As mentioned in the text, the interface of the WMS might be the same as that of the (push) CE. If so, maybe it should simply be called a CE.

8.4.
Job provenance
--I would like to have the possibility of access to the output sandbox to understand failures.

8.5. Package management
--I would first like to have a package location service, i.e. one that returns the location (accessible directory name) given the package name. Next would be something that is able to install a package and its dependencies. Finally, the service could also manage the space, deleting unused packages when space is needed.

9. Data Services

9.1 Storage Element
--I am not really comfortable with the emphasis on tactical vs. strategic storage. It might be useful to have a short-lived SE, but a site should not reclaim the storage before the contracted lifetime has expired. Instead I would identify the characteristics of SE's (reliability, lifetime, deletion policies, ...) and label them accordingly.
--The phrase "LFN or GUID" appears in many places. I propose the SE only support one primary identifier (e.g. the GUID). A user holding a secondary identifier such as an LFN or symlink would obtain the corresponding primary ID from a catalog service before interacting with the SE.
--I don't see the posix-like file I/O as a required part of the SE interface. It should be offered as a separate service.

9.1.4. Posix-like File I/O
--The mutability of logical files also needs to be addressed at the SE level. Or are we assuming that a modified file gets a new GUID? If so, this should be made explicit and prominent.
--I see two components to posix-like I/O: mapping file names (with directories) to GUID's (analogous to inodes), and I/O operations on the files associated with these GUID's. The latter is a site service and the former is more naturally left to the VO's.

9.2.1. File Names
--There is another important file name, the physical file name recognized by the OS. This is required if a file is to be opened by an ordinary application (one not using the grid posix-like I/O) or when a file in the sandbox is to be inserted into an SE.
--As discussed extensively in the ATLAS DB list, there are problems if the GUID used for a file is the same as the POOL GUID and the POOL file is published before and after data is appended. Are we assuming or allowing use of POOL GUID's?

9.2.3. Catalog Services
--It looks like the FC has two roles: to track the symlink-LFN-GUID associations used in the posix file I/O and to flag master SURLs. These should be separated. The former is part of the Posix I/O service. The latter, if present, should be stored in the replica catalog.
--The name "File Catalog" is too generic in any case.

9.2.4. Operations
--This looks like it could be subsumed into the preceding section.

9.2.5. Scalability and Consistency
--Clearly these comments apply to all catalogs, not just file catalogs. This discussion should move accordingly.

9.2.6. The Master Replica
--I don't think we need a "master source for replications". If the logical files are immutable and replication is reliable, then any replica can be used.
--If update operations are to be supported now or ever, we need considerably more discussion, and a single master replica is not the only approach. For example, there could be a monotonically increasing version number or update time associated with each replica. I would find it useful to have a flag with each replica indicating whether or not it is the final version.
--I am not advocating mutability. I think we can simply assign a new primary ID if a file is changed. In any case, we should have a clearer vision in the architecture.

9.2.7. Directories
--This discussion can move to the description of the Posix I/O service.

9.2.8. Virtual directories
--This high-level view is very experiment-specific and should not appear in the middleware discussion. It is also a pretty complicated way of describing something that boils down to a list of logical files.

9.2.9. Datasets
--Again, not a discussion appropriate to the middleware architecture.

9.2.10.
Security
--The files can be accessed through the SE interface without reference to the Posix I/O service. Thus, the file ACL's must be maintained by the SE, not the Posix I/O service. Directories only exist in the Posix I/O service and so their ACL's reside there.
--I suppose the preceding suggests the NTFS-like ACL is more appropriate.

9.3. Data management

9.3.4. Transfer Fetcher
--This seems to be part of the DS with no external interface. Another implementation of the DS might do without such a component. Thus it is only an implementation detail and need not be described in the architecture.

9.3.6. File Transfer Library
--It appears that this exists to provide users with the Posix I/O view of data management. If so, it should be clearly labeled as such. If other API's are included, they should be moved to a separate library.

9.4. File Metadata Catalog
--I would like to see EGEE address catalog services more generally, not restrict itself to file metadata. My users may want to assign metadata to applications, jobs, datasets, users or anything else that can be identified.

10 Use cases
--The use cases illustrate use of the described services. It would be helpful to have many more illustrating details. These might include inserting a file, extracting a file, replicating a file and the steps required to run a job.
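
To make concrete the kind of detail I am asking for, here is a minimal Python sketch of inserting, replicating and extracting a file. It assumes the SE knows files only by GUID, as proposed under 9.1, and that logical files are immutable. Every class, function and name in it (NameCatalog, ReplicaCatalog, StorageElement, insert_file, replicate_file, extract_file, the SE names and the LFN) is hypothetical and invented for illustration; none is taken from DJRA1.1 or from any EGEE interface.

```python
# Hypothetical sketch only: these names are NOT from DJRA1.1 or any EGEE API.
import uuid


class NameCatalog:
    """VO-level mapping from a secondary name (LFN) to the primary GUID."""

    def __init__(self):
        self._guid_by_lfn = {}

    def register(self, lfn, guid):
        self._guid_by_lfn[lfn] = guid

    def resolve(self, lfn):
        return self._guid_by_lfn[lfn]


class ReplicaCatalog:
    """Records which SE's hold a replica of each GUID."""

    def __init__(self):
        self._ses_by_guid = {}

    def add(self, guid, se_name):
        self._ses_by_guid.setdefault(guid, set()).add(se_name)

    def locations(self, guid):
        return sorted(self._ses_by_guid.get(guid, ()))


class StorageElement:
    """Site service that identifies files by GUID only (assumption from 9.1)."""

    def __init__(self, name):
        self.name = name
        self._files = {}

    def put(self, guid, data):
        if guid in self._files:
            # Immutable logical files: a changed file would get a new GUID.
            raise ValueError("GUID already stored on " + self.name)
        self._files[guid] = bytes(data)

    def get(self, guid):
        return self._files[guid]


def insert_file(lfn, data, se, names, replicas):
    """Store data on one SE, then publish the GUID and the LFN."""
    guid = str(uuid.uuid4())
    se.put(guid, data)
    replicas.add(guid, se.name)
    names.register(lfn, guid)
    return guid


def replicate_file(guid, source, target, replicas):
    """Copy a file between SE's; any replica can serve as the source."""
    target.put(guid, source.get(guid))
    replicas.add(guid, target.name)


def extract_file(lfn, names, replicas, ses):
    """Resolve the LFN to a GUID first, then read from any SE holding it."""
    guid = names.resolve(lfn)
    se_name = replicas.locations(guid)[0]
    return ses[se_name].get(guid)


if __name__ == "__main__":
    names, replicas = NameCatalog(), ReplicaCatalog()
    se_a, se_b = StorageElement("se-a"), StorageElement("se-b")
    guid = insert_file("/vo/data/run1.dat", b"payload", se_a, names, replicas)
    replicate_file(guid, se_a, se_b, replicas)
    print(replicas.locations(guid))
    print(extract_file("/vo/data/run1.dat", names, replicas,
                       {"se-a": se_a, "se-b": se_b}))
```

In this sketch the client talks to the name catalog only to turn an LFN into a GUID, and the SE never sees the LFN. A use case written at roughly this level of detail in the document would show which service is contacted at each step.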