Comments on EGEE Middleware Architecture DJRA1.1
Version 0.2 dated July 1, 2004
David Adams
July 20, 2004

General comments
--I feel this is a significant improvement over the previous draft
(dated June 8, 2004). The system appears more as a loosely coupled
collection of interacting services rather than a monolithic
aggregation.
--However, the services are not independent and it would be useful to
have a diagram or diagrams illustrating the dependencies.
--A section on service infrastructure would be useful. See comments
in the following. Common issues such as service discovery and
creation would be discussed there. The distribution of catalogs might
also be covered. Another issue is WSDL vs. client API's.
--How do users chase down the point of failure which may occur at
any point in a long chain of services?

2. Executive summary.
--Figure 1 is useful but I am not comfortable with the services
bridging categories.
--In "API and Grid Access Service", I would replace "user will gain
access" with "user may gain access". The GAS is optional, isn't it?

3. Requirements
--In "Security", I presume the third A should be auditing rather than
accounting.

4. Service Oriented Architecture
--In "Services", I find the definition of service misleading. I would
say that a service instance is a process that may depend on the state
of other service instances. I suppose the document is technically
correct if service refers to the template used to create the instance
rather than the instance itself but it would be helpful to draw this
distinction or drop the second sentence.

5. Security Services

5.1 Authentication
--I would like to see a connection from user credentials to user
identity and, in particular, a mechanism to connect the multiple
credentials that a user might obtain to his or her single identity.
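As an illustration of the kind of connection I have in mind, here is
a minimal sketch in Python. The class and method names
(IdentityRegistry, link, identity_for), the identity label and the
credential subject strings are all my own invention for this example;
none of them come from the architecture document.

```python
class IdentityRegistry:
    """Hypothetical map from credential subjects to one user identity."""

    def __init__(self):
        self._identity_of = {}  # credential subject -> identity

    def link(self, identity, subject):
        """Attach a credential subject to an identity; refuse conflicts."""
        current = self._identity_of.get(subject)
        if current is not None and current != identity:
            raise ValueError(f"{subject!r} already linked to {current!r}")
        self._identity_of[subject] = identity

    def identity_for(self, subject):
        """Return the identity behind a credential, or None if unknown."""
        return self._identity_of.get(subject)


registry = IdentityRegistry()
# Two credentials from different (made-up) authorities, one person.
registry.link("user-0001", "/O=ExampleCA-A/CN=Jane Doe")
registry.link("user-0001", "/O=ExampleCA-B/CN=Jane Doe")
```

With something like this, authorization and accounting could be keyed
on the identity rather than on whichever credential happened to be
presented.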

5.2 Authorisation
--I naively expect that we would use a pull model to put minimal
burden on the user/client. Or is the agent something that appears
only behind the GAS? Or perhaps I completely misunderstand.

5.2.2 Policy combination and evaluation
--An example policy *might* make this clearer.

6. API and Grid Access Service
--The last sentence of the introductory section indicates that
services have to be directly accessible, i.e. without the GAS.
Presumably this implies they must provide authentication and
authorization. What model is followed here?

7. Basic Information and Monitoring Services
--Is a ProducerFactory conceptually different from a ConsumerFactory?
Or a WMS or SE factory? Perhaps there should be a dedicated section
on service infrastructure to address issues such as factories.

7.1.4 Security
--Isn't the discussion of authorization rules more general and not
just limited to information services? This might be included in the
aforementioned section on service infrastructure.

8. Job Management Services
--I don't see any discussion of the environment in which a job can
expect to run. Is there a sandbox? Is all input and output restricted
to this sandbox? What about /tmp? What about access to SE's? How
about access to external services, e.g. DB's?

8.1. Accounting
--Isn't accounting a type of information service? Or maybe it should
have its own category. It doesn't fit too well in job management.

8.2. Computing element
--I propose to draw a clearer distinction between push and pull CE's.
Rather than saying a CE provides both interfaces, reserve CE for the
push model and invent a new term, say computing manager (CM), for the
pull interface. A CM might be implemented making use of a generic
CE interface.
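The following Python sketch shows roughly how I picture the split.
It is only an illustration under my own assumptions: the class names,
the submit and pull_one methods, and the in-memory queue standing in
for a task source are all hypothetical and not taken from the
architecture document.

```python
from collections import deque


class ComputingElement:
    """Push model: the caller hands a job to the CE, which runs it."""

    def __init__(self, name):
        self.name = name
        self.accepted = []

    def submit(self, job):
        # A real CE would pass the job to a batch system; we only record it.
        self.accepted.append(job)
        return f"{self.name}:{len(self.accepted)}"


class ComputingManager:
    """Pull model: fetches work when asked and runs it through a CE."""

    def __init__(self, ce, task_queue):
        self._ce = ce
        self._queue = task_queue

    def pull_one(self):
        """Take the next queued job, if any, and submit it to the CE."""
        if not self._queue:
            return None
        return self._ce.submit(self._queue.popleft())


queue = deque(["job-a", "job-b"])
cm = ComputingManager(ComputingElement("ce1"), queue)
first = cm.pull_one()
```

The point is that the CM owns the decision of when to ask for work,
while the CE keeps a single, push-only interface.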

8.3. Workload Management
--I wonder if the detailed discussion of the internals of the WMS is
appropriate for this document. Presumably a service with the same
interface but different internal decomposition would still be
considered a WMS.
--As mentioned in the text, the interface of the WMS might be the
same as the (push) CE. If so, maybe it should simply be called a CE.

8.4. Job provenance
--I would like to have the possibility of access to the output
sandbox to understand failures.

8.5. Package management
--I would first like to have a package location service, i.e. one
that returns the location (accessible directory name) given the
package name. Next would be something that is able to install a
package and its dependencies. Finally the service could also manage
the space, deleting unused packages when space is needed.
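The three levels described above could be sketched as follows. This
is a toy in-memory model under my own assumptions: PackageService,
locate, install and reclaim are hypothetical names, the dependency
table is supplied by hand, the dependency graph is assumed to be
acyclic, and no files are actually fetched or deleted.

```python
class PackageService:
    """Hypothetical package service: locate, install, reclaim space."""

    def __init__(self, root, dependencies):
        self._root = root          # base directory for installed packages
        self._deps = dependencies  # package name -> list of package names
        self._installed = {}       # package name -> directory

    def locate(self, name):
        """Level 1: return the directory of an installed package, or None."""
        return self._installed.get(name)

    def install(self, name):
        """Level 2: install dependencies first, then the package itself."""
        if name in self._installed:
            return self._installed[name]
        for dep in self._deps.get(name, []):
            self.install(dep)  # assumes no dependency cycles
        # A real service would fetch and unpack here; we only record a path.
        self._installed[name] = f"{self._root}/{name}"
        return self._installed[name]

    def reclaim(self, still_needed):
        """Level 3: drop packages not required by anything still needed."""
        keep, stack = set(), list(still_needed)
        while stack:
            pkg = stack.pop()
            if pkg not in keep:
                keep.add(pkg)
                stack.extend(self._deps.get(pkg, []))
        removed = sorted(p for p in self._installed if p not in keep)
        for pkg in removed:
            del self._installed[pkg]
        return removed


deps = {"analysis": ["core"], "core": [], "viewer": ["core"]}
service = PackageService("/opt/packages", deps)
service.install("analysis")
service.install("viewer")
```

Note that reclaim keeps the dependencies of any package still in use,
which is the behavior I would expect from the space-management level.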

9. Data Services

9.1 Storage Element
--I am not really comfortable with the emphasis on tactical vs.
strategic storage. It might be useful to have a short-lived SE but
a site should not reclaim the storage before the contracted lifetime
has expired. Instead I would identify the characteristics of SE's
(reliability, lifetime, deletion policies, ...) and label them
accordingly.
--The phrase "LFN or GUID" appears in many places. I propose the SE
only support one primary identifier (e.g. the GUID). A user holding
a secondary identifier such as an LFN or symlink would obtain the
corresponding primary ID from a catalog service before interacting
with the SE.
--I don't see the posix-like file I/O as a required part of the
SE interface. It should be offered as a separate service.
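To make the single-primary-identifier proposal above concrete, here
is a minimal Python sketch. The Catalog and StorageElement classes,
their methods, the example LFN and the placeholder GUID string are
all assumptions of mine for illustration; they do not correspond to
any interface in the architecture document.

```python
class Catalog:
    """Hypothetical catalog: secondary names (LFN, symlink) -> GUID."""

    def __init__(self):
        self._guid_of = {}

    def register(self, name, guid):
        self._guid_of[name] = guid

    def resolve(self, name):
        """Return the GUID for a secondary name; KeyError if unknown."""
        return self._guid_of[name]


class StorageElement:
    """Hypothetical SE whose interface accepts GUIDs only."""

    def __init__(self):
        self._data = {}

    def put(self, guid, content):
        self._data[guid] = content

    def get(self, guid):
        return self._data[guid]


catalog = Catalog()
se = StorageElement()
guid = "guid-0001"  # placeholder string, not a real GUID format
se.put(guid, b"event data")
catalog.register("/vo/run1/events.root", guid)

# Client side: translate the LFN first, then talk to the SE by GUID only.
content = se.get(catalog.resolve("/vo/run1/events.root"))
```

The SE never sees an LFN, so the "LFN or GUID" alternatives disappear
from its interface and name handling stays with the catalog.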

9.1.4. Posix-like File I/O
--The mutability of logical files also needs to be addressed at the
SE level. Or are we assuming that a modified file gets a new GUID?
If so, this should be made explicit and prominent.
--I see two components to posix-like I/O: mapping file names (with
directories) to GUID's (analogous to inodes) and I/O operations on
the files associated with these GUID's. The latter is a site service
and the former is more naturally left to the VO's.

9.2.1. File Names
--There is another important file name, the physical file name
recognized by the OS. This is required if a file is to be opened by
an ordinary application (one not using the grid posix-like I/O) or
when a file in the sandbox is to be inserted into an SE.
--As discussed extensively in the ATLAS DB list, there are problems
if the GUID used for a file is the same as the POOL GUID and the
POOL file is published before and after data is appended. Are we
assuming or allowing use of POOL GUID's?

9.2.3. Catalog Services
--It looks like the FC has two roles: to track the
symlink-LFN-GUID associations used in the posix file I/O and
to flag master SURLs. These should be separated. The former is
part of the Posix I/O service. The latter, if present, should be
stored in the replica catalog.
--The name "File Catalog" is too generic in any case.

9.2.4. Operations
--This looks like it could be subsumed into the preceding section.

9.2.5. Scalability and Consistency
--Clearly these comments apply to all catalogs, not just file catalogs.
This discussion should move accordingly.

9.2.6. The Master Replica
--I don't think we need a "master source for replications". If the
logical files are immutable and replication is reliable, then any
replica can be used.
--If update operations are to be supported now or ever, we need
considerably more discussion and a single master replica is not the
only approach. For example, there could be a monotonically increasing
version number or update time associated with each replica. I would
find it useful to have a flag with each replica indicating whether or
not it is the final version.
--I am not advocating mutability. I think we can simply assign a new
primary ID if a file is changed. In any case, we should have a clearer
vision in the architecture.
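As an illustration of the version-number alternative mentioned above,
here is a small Python sketch. The Replica record, its fields, the
usable_replicas helper and the example SURLs are hypothetical and
only meant to show the idea; nothing here is taken from the
architecture document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    surl: str
    version: int         # monotonically increasing with each update
    final: bool = False  # True once no further updates are expected


def usable_replicas(replicas):
    """Return the replicas that carry the highest version number seen."""
    if not replicas:
        return []
    latest = max(r.version for r in replicas)
    return [r for r in replicas if r.version == latest]


replicas = [
    Replica("srm://site-a.example/f1", version=1),
    Replica("srm://site-b.example/f1", version=2, final=True),
    Replica("srm://site-c.example/f1", version=2, final=True),
]
current = usable_replicas(replicas)
```

Any replica at the latest version can serve as a source, so no single
master is needed; the stale copy at version 1 is simply skipped.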

9.2.7. Directories
--This discussion can move to the description of the Posix I/O service.

9.2.8. Virtual directories
--This high-level view is very experiment-specific and should not
appear in the middleware discussion. It is also a pretty complicated way
of describing something that boils down to a list of logical files.

9.2.9. Datasets
--Again, not a discussion appropriate to the middleware architecture.

9.2.10. Security
--The files can be accessed through the SE interface without reference
to the Posix I/O service. Thus, the file ACL's must be maintained by
the SE, not the Posix I/O service. Directories only exist in the Posix
I/O service and so their ACL's reside there.
--I suppose the preceding suggests the NTFS-like ACL is more appropriate.

9.3. Data management

9.3.4. Transfer Fetcher
--This seems to be part of the DS with no external interface. Another
implementation of the DS might do without such a component. Thus it
is only an implementation detail and need not be described in the
architecture.

9.3.6. File Transfer Library
--It appears that this exists to provide users with the Posix I/O view
of data management. If so, it should be clearly labeled as such. If
other API's are included, they should be moved to a separate library.

9.4. File Metadata Catalog
--I would like to see EGEE address catalog services more generally, not
restrict itself to file metadata. My users may want to assign metadata
to applications, jobs, datasets, users or anything else that can be
identified.

10 Use cases
--The use cases illustrate use of the described services. It would be 
helpful to have many more illustrating details. These might include
inserting a file, extracting a file, replicating a file and the steps
required to run a job.