"EGEE Middleware Architecture" EGEE-DJRA1.1-476451-v0.1 from June 8, 2004 General comments ---------------- 0. The paper presents a set of service descriptions, which are often very comprehensive and some even present architectural proposals. The services clearly follow (inherit?) the EDG middleware organisation, with few rather minor additions. 1. The paper does not describe gLite architecture. It presents a set of services and notes on WSRF framework only. Different sections appear to assume certain architecture (e.g., orientation towards PC clusters, a centralised workload manager, existence of a user interface), but there is no description of the relation of the services and their intended reason of existence. An overview section, presenting relations of different services and components, is needed. 2. It is only a draft, but still, it contains too many implied assumptions (see later comments for details). A person without prior knowledge of EDG middleware can easily get lost. A person with a good knowledge of EDG can easily map proposed components onto existing EDG ones. 3. The document gives an impression of a set of discoordinated Sections written by different people who did not communicate with each other. The level of details in different Sections and entire layout differ by orders of magnitude. This is perhaps due to the architecture absence. Several services offer overlapping functionalities (especially those related to information services, accounting, logging and such) , while some gaps are left uncovered. Also, some services have contradictory ideas about each other. See detailed comments below. Editor should go through the document and align it, perhaps together with the authors. 4. EGEE Annex 1 implied at least 2 alternatives for each service. This paper presents only one, perhaps due to the absence of the architecture on which the services should be mapped. 5. The computational power is always implied (though not explicitly described) to be PC clusters consisting of one master and several worker nodes. Since the architecture is absent, it is not clear why other kinds of resources, e.g., standalone workstations, or Condor pools, or Mosix clusters, are not considered. Operating system for all the components appears to be implicitly GNU/Linux, though no reasoning is given - architecture is missing. 6. Related issue: the notion of a WN (worker node) is used extensively and yet is never defined. 7. Applications requirements most of the time are not correlated with HEPCAL ones, and in places are misinterpreted (see detailed comments). One can assume there was no users representative in the design team, and if it is true, it's quite unfortunate. 8. It seems that the [absent] architecture implies VO-based implementation, where each VO is served by a set of dozens of services. It is very unlikely to scale beyound 2-3 VOs. 9. Each service description needs notes on fault tolerance and recovery. Some service descriptions offer it, but most don't. 10.Each service description needs an assessment of latency introduced by this service, and the architecture - if and when it appears - should present overall design latencies. 11.Authentication and authorisation services are the most abused ones: every service makes use of those, but none correlates with the actually presented description (sections 4.1, 4.2), which is fairly vague and does not really describe those details used by many other service descriptions. See detailed comments. 12.Many references are missing. Too many to list here. 
Every mentioned tool, application, installation, abbreviation, organisation, project and initiative (starting with GGF) must have an appropriate reference. The editor should go through the document and add the missing citations.

13. Figure 6 offers the closest view of the architecture; perhaps it should be extended to the other services, for completeness.

14. The term "VO" is used throughout, but often in an inconsistent manner. It would be nice to define what a VO is, its role and functionality, and then use the term consistently.

15. There is a bit too much general description of various services (e.g. security, WSRF) which does not really belong in an architecture paper, at least in its existing shape, where it is rather abstract and not specific. Perhaps references to the corresponding GGF documents should be put in place of those texts.

16. If one collects all the services mentioned in the document, the list is longer than the one originally proposed in Section 1. E.g., Section 4.4 refers to a "configuration service", which is never expanded upon. One may guess that some of these services are internal to components, but this is hardly ever explained.

Detailed comments
-----------------

Section 1, "Introduction"
`````````````````````````

It is rather strange to start building a Grid from VOs; but as this seems to be an architectural decision, it has to be explicitly stated in the missing "Architecture" section. Much attention is paid to sharing resources inside a VO, while none to sharing resources *between* VOs. The entire paper does not seem to take into account that VOs may come in hundreds; SWEGRID alone has more than a dozen. As NorduGrid experience shows, it is not a big task to arrange resource sharing inside a VO, as the ownerships and policies tend to be largely the same and easily managed. The real issue is how to share resources between VOs.

The list of Grid projects does not mention Globus - the project that coined the name. Is this intentional, or simply another implied thing?

The list of services is odd - both in its granularity and in the separation into "site specific" and "Grid wide specific". This separation is never supported in the later course of the document, and since there is no architecture, there is nothing to validate it. Moreover, the separation is really questionable. For example, an information collector and an authorization service are definitely site-specific, yet the former is not listed at all, and the latter is declared Grid-wide. Nobody authorises anybody Grid-wide!

The granularity of the services is also questionable. For example, Job Monitoring is definitely a part of the CE, as this is where the jobs actually are, and the CE is the only component which can reliably assess job status. It is also not clear why Grid Access and Authentication are separate services: one cannot exist, or makes no sense, without the other. The Package Manager service needs an explanation here. Services such as VO management and resource validation (QoS) are missing, or are perhaps included in other services - so the granularity really varies by orders of magnitude.

Figure 1 looks like an attempt at an architecture, but it needs an explanation. Also, plenty of links are missing, at least as it turns out after reading the rest of the document. Then, after a questionable grouping into two categories, another regrouping into five categories is performed, for no obvious reason.
Security services: these include allowing/denying access to resources. While this partially belongs to security, most access considerations are driven by VO policies, related to financial and political motives, not to security. Despite the apparently VO-based approach, there is no dedicated set of VO-related services, such as VO management, inter-VO accounting, authorization, monitoring etc.

API and Grid Access Service: this appears to be the client part in the absent architecture. This is not explicitly stated, and the section does not describe a User Interface - a notion heavily used by other services.

Information and monitoring services: it is really refreshing to see the move towards a fine-grained authorization scheme for information access. Good.

Job management services: rightfully include accounting; however, accounting is also done against storage usage, network usage and whatever else, not necessarily connected to jobs as such. So the split is questionable.

Data services: should include accounting, see above. Also, it is very disturbing to see that the designers limit themselves to file-based data. Even GGF considers two options: files and databases. Unfortunately, HEP-specific data containers, such as ROOT files, are not considered by anybody, despite clear HEPCAL requirements. Neither are HEPCAL Datasets considered (rather, they are misinterpreted, see later).

Section 2, "Applications Requirements"
``````````````````````````````````````

Must be split into several subsections:
- user requirements
- resource owner requirements
- security and privacy requirements
- ...

User requirements must include:
- Robustness
- Reliability
- Stability
- Round-the-clock operability
- Workload optimisation based on:
  * cycle availability
  * storage availability
  * quality of data access (effective, not geographical, "proximity")
  * bandwidth usage
  * costs of resource usage
  * ... (see e.g. HEPCAL)
- ...

Resource owner requirements must include:
- Non-invasiveness
- Portability
- ...

Resource sharing across organisations: it looks like the recipe is to join these organisations into a VO and thus arrange for resource sharing. What about sharing between VOs?

Section 3, "Service Based Architecture"
```````````````````````````````````````

A very consistent Section, but with the wrong name, or the wrong contents - the Section describes a framework, not an architecture.

A major general point: WSRF is described many times as a proposal with specifications yet to be defined. The advantage of SOA is said to be a single well-defined protocol, and yet the same Section says this protocol is not well defined. While this is acceptable, and actually belongs in an R&D project, one can hardly use these arguments for EGEE, which is a deployment project. Deployment needs performance, and there is no reference to any analysis of the performance of a service-based system. The LHC experiments in particular need low latencies and high robustness, and it is not clear whether a generic SOA is better in this respect than a tailored, dedicated architecture. What is good for an abstract Grid is not necessarily optimal for such a unique installation as the LHC, and a detailed comparative analysis is sorely missing.

Section 3.1 says "This model is clearly very suitable for Grid middleware, where we have to manage Virtual Organizations (VO) and Site policies." That is not what was argued before, and it is far from clear what VOs have to do with SOA.

Section 3.3 praises SOAP, which is good for Web-based applications that can tolerate high latencies, but is not necessarily optimal for high data volumes and high throughput. It is ASCII, after all, and it tends to inflate the needed bandwidth by wrapping a single 1-bit flag into a complex envelope. There is certainly a trade-off between standardization and performance.
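As a back-of-the-envelope illustration of this overhead (a hand-written, generic SOAP 1.1 message, not taken from the gLite document; the "jobIsDone" element and its namespace are invented for this example):

    # Minimal illustration of SOAP message overhead for a single boolean flag.
    # The envelope is a generic SOAP 1.1 message; the body element is invented.

    soap_message = """<?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <jobIsDone xmlns="urn:example:jobstatus">true</jobIsDone>
      </soap:Body>
    </soap:Envelope>
    """

    payload_bits = 1                                # the actual information content
    wire_bytes = len(soap_message.encode("utf-8"))  # what is sent, before HTTP/TLS overhead

    print(f"payload: {payload_bits} bit, on the wire: {wire_bytes} bytes "
          f"(~{8 * wire_bytes} bits, plus HTTP and transport security overhead)")

The factor of several hundred shown by such a toy message is of course amortised for large payloads, but for frequent, small control messages it directly translates into latency and bandwidth costs.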
Section 3.5 "State" is generally very reasonable, but does not appear to be applied in the rest of the document. NorduGrid's Grid Manager fits this particular framework better than the proposed gLite services do.

Section 3.6 "WSRF" says literally: "To date, not all of these specifications have been made public yet". What are the chances that they will be defined within the EGEE lifetime? This does not seem to be a reliable choice for a 2-year project aiming at providing production facilities. And a P.S.: in the words of Carl Kesselman, "WSRF is not a silver bullet".

Section 4 "EGEE Grid Services"
``````````````````````````````

Section 4.1 Authentication
``````````````````````````

One of the best sections in the document, clearly presenting requirements and options. However, it does not mention whether this model is already implemented or still needs work to be realized. Also, no alternatives are offered.

Section 4.1.2 assumes a centrally managed set of CAs (the "single trust domain") - is that really the intention? Newcomers and outsiders will have enormous problems, which is not quite the Grid idea.

Section 4.1.3 seems to be detached from the rest, and it is not clear whether it is proposed for implementation in EGEE.

Section 4.1.4 "Revocation" is very important, as it influences the way services interoperate. Unfortunately, no details are given.

Section 4.1.5 does not offer procedures for dealing with stolen or lost credentials. E.g., if a secure storage gets stolen, what is the procedure? Shall the user notify some agency (a Grid Police)? Are there any tools for that? If the credentials of an entire organisation get lost or compromised, will every user be notified? By whom? In both cases, what will happen to the tasks which are already running in the system?

Section 4.1.6 on pseudonyms is very interesting, but again, it is not clear whether it is part of the proposed architecture or just an aside.

Missing sections:
* Identity modifications: people's names change, domain names change, organisation names and affiliations change - basically, the identity of each actor and legal body can change many times. How does this system accommodate that?
* Multiple identities and credentials: a person or a resource may well have several legal identities, like double citizenship. How does this system accommodate that?

Section 4.2 "Authorization"
```````````````````````````

Also a very nice section, which unfortunately stops short of proposing a definite tool or mechanism. But perhaps that will come in the separate deliverable for this area.

Section 4.2.2 "Delegation": the important issue is how one can guarantee that any service can impersonate any user and access, in his/her name, any other service. For example, in present Grid implementations a user may submit a job which needs to fetch data from a storage service whose identity the user does not know in advance. How can one ensure that (a) the computer on which the job is executed accepts the credentials of that storage element, and (b) the storage element will serve data for that user? Currently this problem is not solved. One must not expect that all resources will accept all users and all certification authorities.
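A minimal sketch of the kind of check a storage element would have to perform on a delegated credential chain, assuming the usual proxy-style delegation; all names and structures below are invented for illustration, and no real GSI or proxy library is used:

    # Hypothetical sketch of the SE-side acceptance test for a delegated chain.
    from dataclasses import dataclass

    @dataclass
    class Credential:
        subject: str   # identity this credential asserts
        issuer: str    # who signed it

    def accepts(chain, trusted_cas, authorized_users):
        """chain[0] is the end-entity (user) certificate, the rest are delegated proxies."""
        if chain[0].issuer not in trusted_cas:
            return False                   # (a) the SE must trust the user's CA at all
        for parent, child in zip(chain, chain[1:]):
            if child.issuer != parent.subject:
                return False               # each proxy must be signed by the previous link
        original_user = chain[0].subject
        return original_user in authorized_users   # (b) the SE must authorize that user

    chain = [
        Credential(subject="/O=SomeVO/CN=Jane Doe", issuer="/CN=Some CA"),
        Credential(subject="/O=SomeVO/CN=Jane Doe/CN=proxy", issuer="/O=SomeVO/CN=Jane Doe"),
        Credential(subject="/O=SomeVO/CN=Jane Doe/CN=proxy/CN=proxy",
                   issuer="/O=SomeVO/CN=Jane Doe/CN=proxy"),
    ]
    print(accepts(chain, trusted_cas={"/CN=Some CA"},
                  authorized_users={"/O=SomeVO/CN=Jane Doe"}))

The sketch omits signature verification and proxy policy restrictions, which is exactly where the practical difficulty lies; both conditions (a) and (b) are matters of local policy, and the document does not say how they are to be satisfied for a storage element chosen at run time.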
Missing section: dynamic authorisation. Roles within a VO change dynamically, and the authorization services must follow. For example, a user submits a task as a manager and is authorised to access certain resources. In the course of the task he is demoted to a regular user, or changes VO altogether. He must not be able to access those sensitive services any longer, within hours at most after his authorisation pattern changed. And the other way around: a person may be given extra authority by his VO, and he should be able to exert it as soon as possible.

Section 4.3 "Auditing"
``````````````````````

Describes no services??? Impossible to comment, it has no substance. It seems the auditors are supposed to simply browse through hundreds of logs from different services - so much for a service. That is what they do anyway, so perhaps this section is not needed? Section 4.3.3 explicitly says "The auditing process in itself does not require a separate service", while Figure 1 "gLite Services" has a box for it, and it is listed as a service in Section 1. Which one is wrong?

Section 4.4 "API and Grid Access Service"
`````````````````````````````````````````

A rather odd section. From the "Use cases" in Section 5 one discovers that GAS is actually the user client, realized as a service, but this section does not say so. It describes a client-like service with STRONG assumptions about the architecture, which of course is not there. It is also not clear whether any regular client is foreseen - e.g., the Web has browsers for clients, and browsers are not services. Is something similar planned for gLite? The architecture MUST mention it. Perhaps the name should be "Client API..."? And Figure 3 should replace the strawman in Figure 6 - then we would almost get the architecture (save for the security).

What is the "configuration service"? The single line on page 20 is very vague and does not explain how this service relates and communicates to others, how it maintains the lists, etc. This service seems to be one of the central ones and deserves more than one line of explanation. Is it a single point of failure? When this service is down, what is the solution?

Section 4.4.1 lists Information, Configuration and Authorization services. There is no Configuration service description anywhere.

Section 4.5 "Information and Monitoring"
````````````````````````````````````````

A pretty empty section for such an important issue. Information collection and propagation is essential for any Grid - it is its nervous system. Without it, the other services are disconnected. The purpose of the Information System is to provide for resource discovery, and this part is not addressed at all. So the section deserves a much better description, with functionality requirements and architectural details; at the moment it only lists implementation details, no architecture. Alternatives must also be presented.

The whole Section focuses on the querying aspect, not so much on how the information is collected. Neither does it describe how to achieve the "appearance of a single federated database", which is set as the goal. It also does not seem to consider it important that the information in such a database be relevant - will anything do? There must be some criteria on how recent the "stored" information is.

The section is actually a description of R-GMA without naming it; it even uses R-GMA-specific terms (e.g. "TableAuthorisation", "log4j" !!!). It hardly passes for an architectural proposal - rather, it is an implementation description. It does not describe an informational model of EGEE: what are the components, what are the units, objects and actors (nodes? jobs? clusters? users?)?
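For illustration only, a hypothetical sketch of the kind of entities and attributes such an informational model would have to enumerate (shown as flat tables purely for brevity; nothing here is taken from the document):

    # Hypothetical minimal information schema, to illustrate what an
    # "informational model" would have to spell out.
    schema = {
        "cluster": ["cluster_id", "site", "owner_vos", "lrms_type", "total_cpus"],
        "queue":   ["queue_id", "cluster_id", "max_walltime", "free_slots", "allowed_vos"],
        "job":     ["job_id", "queue_id", "owner_dn", "status", "submitted_at"],
        "storage": ["se_id", "site", "total_space", "free_space", "allowed_vos"],
    }

    # Each attribute would further need a type, units, an update frequency and an
    # authoritative producer - exactly the details the architecture should fix.
    for table, columns in schema.items():
        print(f"{table}: {', '.join(columns)}")

Whether such a model is then expressed relationally, hierarchically or otherwise is a separate decision, which brings us to the next point.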
Introduction: it bluntly states that the relational model is the one. No explanation is given, and this contradicts both the hierarchical structure proposed by Globus and the hierarchical nature of information on a Grid. If it has been decided that EGEE opts for a relational model, a solid reason must be given. The suggested system is a complicated hierarchy of caches and providers, which has been shown (by EDG) to be neither satisfactory nor scalable.

The list of services should be given in human words, not in "OnDemandProducer" parlance with no obvious meaning. The structure of the information system - its architecture - must be described. E.g., one expects that there will be sensors, which send information to some nodes, which in turn are known to a central "dispatcher". Or is it flat, where each sensor knows everything about the others? What is the entry point to such a system? Is there a registry of resources? Is it one registry or many? How does the whole thing scale?

Section 4.5.3 "Services": this seems like too many layers of services - could it not be kept simple? With so many intermediate steps the system is more likely to fail. What is the topology of the producers? How scalable is a system where higher-level producers pull information snapshots from lower-level ones? Furthermore, it appears that even the primary producers have a "minimum retention period" (read: a cache?); the risk of very outdated information being propagated all the way to the clients is extremely high.

IMPORTANT: the section assumes that users can enter data into the system. This must be declared in the [absent] architecture, as it has very strong implications for resource providers. It is a potential security hole, so clear reasoning and methods to protect it must be given. The last sentence mentions a "mediator" - who or what is that? A description is needed. How does it find the resources?

Section 4.5.4 "Security": it mentions the issue above; however, it does not deal with the situation where some resource owners allow users to publish information while others do not. This will render the proposed system useless. Either all must allow it (impossible) or it should be forbidden (pretty feasible). Another issue which is not addressed is how to deal with confidential information which, e.g., should not be visible to other VOs, or to certain members of the same VO. How are the authorisation rules supposed to propagate through all the layers of caches?

Section 4.6 "Job Monitoring"
````````````````````````````

It is not clear why this is a separate service. Also, the approach is wrong: it limits job monitoring to stdout monitoring, which is the least significant part of it. A job does not know anything about itself, while the LRMS does. So it is the task of the CE to collect such information and publish it into the information system - there is no need for a separate service. Another concern: in the model of a PC cluster with worker nodes on a private network, implementation of this service hardly seems possible.

Section 4.7 "Accounting"
````````````````````````

Probably the best section in the entire document: it consistently presents requirements and a draft architecture. It is not clear, though, why accounting is a service separate from the Information System: after all, all the information overlaps. It would be rather silly to have different information collectors/meters for accounting and for the information system. This is why, again and again, an OVERALL ARCHITECTURE is badly needed, to prevent such overlaps.
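To illustrate the overlap in question: a single usage record, produced once by one meter, can feed both consumers. The sketch below is hypothetical; none of its names come from the gLite document.

    # One meter, one usage record, two consumers - no duplicated collection.
    usage_record = {
        "job_id": "job-000123",
        "user_dn": "/O=SomeVO/CN=Jane Doe",
        "site": "site-A",
        "cpu_seconds": 5400,
        "wallclock_seconds": 6100,
        "storage_bytes": 2_000_000_000,
    }

    def publish_to_information_system(record):
        # the information system cares about current state and recent history
        print("infosystem <-", record["job_id"], record["site"])

    def publish_to_accounting(record):
        # accounting cares about resources consumed per user and VO
        print("accounting <-", record["user_dn"], record["cpu_seconds"], "CPU s")

    for consumer in (publish_to_information_system, publish_to_accounting):
        consumer(usage_record)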
It is not clear why abstraction layers are needed - to compare oranges with bananas? The scheme assumes that if a user has a certain amount of credits, he will have to balance between, say, storage and CPU usage: if he spends too much disk, he will not be able to compute for longer than a certain time. This makes no sense. An abstraction within the same class of services is fine, but not between them.

Cost computation and billing: how would this system deal with dynamically changing costs? Such as sales of CPU time? Or bonuses? Or inflation? Could users split a bill, e.g. charge 50%/50% the different VOs with which they are affiliated?

Section 4.8 "Computing Element"
```````````````````````````````

An incredibly confusing/confused section! The CE in a computational Grid is the central element, its heart, and yet this section gives a very played-down description. Is a CE a gatekeeper? It represents a computing resource all right, but what is the architecture? Is there one CE per cluster? Or one per node? One per processor? It manages jobs, but why does it not manage the data connected to a job? What is a job? Does it include input/output data management (location resolution, download and upload, possibly registration)? This is what a CE should manage, too! Distribution of jobs within a heterogeneous cluster is a task for the local RMS, not the CE - unless of course the CE is a replacement for the LRMS, which is not clear. The interface to the LRMS mentions "standards" - there are none; the known LRMSes do not adhere to standards. Even OpenPBS and PBSPro use different notations.

Section 4.8.1 "Job management functionality": must include job-related data management - stage-in and stage-out, as LRMSes do, but in a global sense. If the job description presents URLs for input data, the CE must fetch them. If the job description presents a URL for output data, the CE must upload it in the name of the user. It must also include setting up the job environment, e.g. defining the necessary paths, according to the job description. The CE appears by design unable to send jobs to a queue: in a situation with several WMSes, it is said to accept the first job to arrive and refuse the others. So it makes no use of local queues, creating instead one huge global queue. This will not scale.

Section 4.8.2 "Other functionality": the CE is said to send "CE availability" messages to the WMS. In this respect it duplicates the job of the Information System. If this is an architectural decision, it must be stated. The WMS should be perfectly able to interpret the information in the Information System, and needs no messages from all the CEs. And what about authentication and authorization on the resource - is that not the job of the CE?

Section 4.8.3 "Internal CE architecture" mentions a mapping between Grid users and local (Unix?) users, while "Authorization" does not imply it. It is not clear why it is necessary. The procedure of job acceptance (creation of the UC, execution in the JC, MON) seems unnecessarily complicated, and it is not clear why a single component must use Web Services internally, to communicate between its own sub-components. This seems to add latency and increase the probability of errors.

Missing section: interactive tasks.

Section 4.9 "Workload management"
`````````````````````````````````

Assumes only one Workload Manager??? A single point of failure??? Is this an architectural decision? Why does it need Logging and Bookkeeping, while we already have Accounting and the Information System? The same information collected three times?!
Why are the two main job management requests "submission and cancellation"? This should be compared with any LRMS, which has requests such as resubmission, suspension and, most importantly, monitoring. The matchmaking seems too simplistic: it must also be based on the credits available to the user, on bandwidth, and on other optimization parameters.

Sections 4.9.3 and 4.9.4: are these components (ISM and TQ) necessary? Requirements must be presented. How does the ISM get filled with information? Why not re-use R-GMA? Or are they related in some way?

Section 4.9.5 "Job Logging and Bookkeeping": as mentioned above, why is this a WMS component and not part of Accounting?

Section 4.10 "Job Provenance"
`````````````````````````````

A nice description, but it is absolutely impossible to fulfil, as it will demand more storage than the entire LHC data!!! Why not delegate some of the described tasks to Accounting?

Section 4.10.2 mentions "input/output sandboxes", which are not part of the [absent] architecture. It is unclear what is actually meant, or what assumptions are made.

Section 4.10.3 mentions various servers, but does not define their location: are they stand-alone, or do they reside on a gatekeeper? Or on Worker Nodes? Or on Storage Elements? Well, as such components are not defined anyway, it is difficult to make any match. Also, this section suggests naming some files after the JobID. This is impossible on Linux file systems, as all known JobIDs are URLs and include characters incompatible with file names. Or does the [absent] architecture imply a different pattern for JobIDs?

Section 4.11 "Storage Element"
``````````````````````````````

A general point: there is almost no description of what functionality SEs are expected to provide. One would imagine they manage authentication/authorisation, disk space and quotas, support information providers/meters/whatever-they-are-called, keep a local registry of data and publish it or propagate it on request to global registries, and so on. All of this should be listed explicitly, not scattered across the section.

Introduction, "Tactical Storage": this seems to be a local cache for a CE, which is a good thing to have. But then it should be a CE component.

Section 4.11.2 says it will use "security services for the purpose of authorization accounting" - but accounting does not belong to the security services, according to the previous sections.

Figure 8 is impossible to digest; it needs a WTF service... Figure 9 uses the undefined notion of the Worker Node and makes strong implications about its capabilities: what will happen if a WN cannot communicate with services other than the CE, because the sysadmins decided so?

Section 4.12 "File and Replica Catalogs"
````````````````````````````````````````

Unfortunately, the section assumes that a file is the unit of information/data, which is of course not always the case. Also, in many areas, from HEP to biotech, researchers operate with datasets, not with files - rather, they do not care whether the data are stored in files, databases or elsewhere.

Introduction: mentions a "Replica Catalogue" which is never defined. It is unclear whether replicas always reside on SEs, or can be anywhere else. E.g., if a client fetches a file for analysis, is that copy considered a replica? If a job caches the file on a local disk, is that a replica? Physically they are, but this architecture does not address it. The introduction also mentions that the Catalogs create the illusion of a single file system, which contradicts the suggested design of SEs in 4.11.5, which argues that a virtual file system is possible.

It decodes GUID as "Grid Unique ID", while the rest of the section uses "Globally Unique ID" - is this the same entity or two different ones? In any case, the Catalogs must not assign GUIDs. A GUID is by definition Globally Unique and is assigned when a file is created, not when it is registered - just as a DNS server does not assign unique IP numbers, it only lists them. Thus it is either the application that assigns a GUID, or an SE, if the application is incapable of doing it.

Section 4.12.2 "File names" argues that a GUID has a 1:1 correspondence to an LFN. This must be a mistake, as one of the two would then not be needed. Back to the example of IPs: one may well have several aliases (LFNs) for a single unique number. For example, the same file may have different logical meanings for different analysis streams, so it may well carry several LFNs while having only one GUID.
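In other words, the natural relation is many LFNs to one GUID, as the following trivial sketch shows (hypothetical names, not the gLite catalog interface):

    # Hypothetical illustration of the many-to-one LFN -> GUID relation argued above.
    lfn_to_guid = {
        "lfn:/dc04/zmumu/run00283/digits.002": "8f2a1c34-0000-4000-8000-1234567890ab",
        "lfn:/analysis/muon-stream/sample-17": "8f2a1c34-0000-4000-8000-1234567890ab",
    }

    # Several logical names, one and the same file:
    assert len(set(lfn_to_guid.values())) == 1

    # A strict 1:1 mapping, as the document proposes, would make one of the two
    # identifiers redundant - renaming an LFN would have to change nothing else.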
Section 4.12.2 also introduces the SURL. Why the "S" is needed is unclear: it is a URL, just like any other URL - why invent a new name? The same section says that the LFN is mutable; this contradicts the HEPCAL requirements. LFNs are immutable.

Missing section: Datasets. How is it foreseen to organize data into sets? They can overlap, and have subsets and supersets. There must be a service which, for example, prevents deletion of a file when an entire dataset is erased, if the file also belongs to another dataset, and so on. See HEPCAL for details.

Figure 10: what are the cylinders?..

Section 4.12.8 invents a "directory" notion and claims it "maps well" to a dataset, while of course it does not. There are some similarities, but there are differences as well: directories largely represent a hierarchy, while datasets are not hierarchical at all.

Section 4.12.10 mentions ACLs on a filesystem - which filesystem? Do the Catalogs rely on filesystems? Are they not databases? Some notes on the inheritance of rights are needed. E.g., if a user defines a dataset, he must be able to list its contents even if it was filled by another user. Of course, he must first define who is allowed to manipulate the dataset description, its relation to other datasets, etc. That does not mean the same users can manipulate the data in the dataset, of course, and vice versa.

Section 4.13 "Data Management"
``````````````````````````````

A wealth of fresh ideas, with a well-defined architecture. However, the reason for the existence of this service is not explained. It appears as if the primary goal of this service is bulk/asynchronous replication; it does not address regular data transfer. If this particular service is to be used for all data transfers, it will only complicate simple things, create another single point of failure and decrease the overall performance. But it is a nice theoretical service-based framework exercise, deserving a separate project.

Introduction: again, it implies the existence of Worker Nodes with some unknown functionality. It also uses the notion of a "local SE", something which is not described in the SE section. It appears as if the service only deals with data transfer between SEs, and it is unclear how users can fetch or upload files elsewhere, which is particularly important for analysis tasks.

Section 4.13.3 "Services Overview": it is unclear which part of the service resolves the LFN. For example, a user or a script running on a computing resource issues the instruction:

> glite-copy MY_FILE_00283.SET2::FF .

Here MY_FILE_00283.SET2::FF is a logical file name. What chain is being invoked? Which subservices are instantiated?
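The document should spell out a chain along the following lines; this is a purely hypothetical sketch of the steps the reviewers expect to be defined, and none of the function or service names below appear in gLite:

    # Purely hypothetical sketch of the resolution chain the document should define.
    def copy_by_lfn(lfn, destination):
        guid = file_catalog_lookup(lfn)         # 1. LFN -> GUID (a File Catalog?)
        surls = replica_catalog_lookup(guid)    # 2. GUID -> replica locations (a Replica Catalog?)
        surl = pick_best_replica(surls)         # 3. replica selection - by whom, on what criteria?
        transfer(surl, destination)             # 4. the actual transfer - which protocol, which service?

    # Stubs only, so the sketch runs; each step raises the questions asked above:
    # which subservice performs it, where does it run, and how is it authorized
    # on behalf of the user?
    def file_catalog_lookup(lfn):
        return "guid-placeholder"

    def replica_catalog_lookup(guid):
        return ["srm://se1.example.org/data/f1", "srm://se2.example.org/data/f1"]

    def pick_best_replica(surls):
        return surls[0]

    def transfer(surl, destination):
        print(f"copy {surl} -> {destination}")

    copy_by_lfn("MY_FILE_00283.SET2::FF", ".")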
Section 4.13.4 "Data Scheduler": this appears to be an asynchronous tool, similar to the infamous GDMP. It has no practical use in job execution; its only use is bulk replication and synchronisation of repositories. If this is the purpose of the entire service, it must be made clear in the very first line of the Section.

Section 4.14 "Metadata Catalog"
```````````````````````````````

Pretty vague, with almost no substance: everything "may" or "can" be, nothing is definite. Difficult to comment. Section 4.14.3 mentions tests of such services, but does not describe the outcome: did it work as expected, or was it a total failure? No reference is given either.

Section 4.15 "Package Manager"
``````````````````````````````

This must be the slot for Pacman, if one reads between the lines. It uses the terms "Job Wrapper" and "JA", which are not defined anywhere. It does not even define what a "package" is. And it offers to manage a file system - which is a job for the SE or the CE, as one might expect.

Section 5 "Use cases"
`````````````````````

Very useful; many things become clear by going through them. Too bad they come at the end - they would be better presented as parts of the initial architecture somewhere at the beginning. All the cases assume there are several WMSes, while this is not what the corresponding Section foresees. A contradiction.

Section 5.1 "Grid Login" says that the GAS will "create the appropriate Grid service instances" (item 8) - one may find it difficult to believe that the GAS will create an instance of, say, a Workload Manager, or a Computing Element. This is certainly not what was meant!

Sections 5.2 "Production Job" and 5.3 "Analysis Job" seem to assume that the only difference between a production and an analysis job is that the latter has input data, while the former has none. This is not true; please refer to HEPCAL II.

That's all...