"EGEE Middleware Architecture" EGEE-DJRA1.1-476451-v0.1 from June 8, 2004 General comments ---------------- 0. The paper presents a set of service descriptions, which are often very comprehensive and some even present architectural proposals. The services clearly follow (inherit?) the EDG middleware organisation, with few rather minor additions. 1. The paper does not describe gLite architecture. It presents a set of services and notes on WSRF framework only. Different sections appear to assume certain architecture (e.g., orientation towards PC clusters, a centralised workload manager, existence of a user interface), but there is no description of the relation of the services and their intended reason of existence. An overview section, presenting relations of different services and components, is needed. 2. It is only a draft, but still, it contains too many implied assumptions (see later comments for details). A person without prior knowledge of EDG middleware can easily get lost. A person with a good knowledge of EDG can easily map proposed components onto existing EDG ones. 3. The document gives an impression of a set of discoordinated Sections written by different people who did not communicate with each other. The level of details in different Sections and entire layout differ by orders of magnitude. This is perhaps due to the architecture absence. Several services offer overlapping functionalities (especially those related to information services, accounting, logging and such) , while some gaps are left uncovered. Also, some services have contradictory ideas about each other. See detailed comments below. Editor should go through the document and align it, perhaps together with the authors. 4. EGEE Annex 1 implied at least 2 alternatives for each service. This paper presents only one, perhaps due to the absence of the architecture on which the services should be mapped. 5. The computational power is always implied (though not explicitly described) to be PC clusters consisting of one master and several worker nodes. Since the architecture is absent, it is not clear why other kinds of resources, e.g., standalone workstations, or Condor pools, or Mosix clusters, are not considered. Operating system for all the components appears to be implicitly GNU/Linux, though no reasoning is given - architecture is missing. 6. Related issue: the notion of a WN (worker node) is used extensively and yet is never defined. 7. Applications requirements most of the time are not correlated with HEPCAL ones, and in places are misinterpreted (see detailed comments). One can assume there was no users representative in the design team, and if it is true, it's quite unfortunate. 8. It seems that the [absent] architecture implies VO-based implementation, where each VO is served by a set of dozens of services. It is very unlikely to scale beyound 2-3 VOs. 9. Each service description needs notes on fault tolerance and recovery. Some service descriptions offer it, but most don't. 10.Each service description needs an assessment of latency introduced by this service, and the architecture - if and when it appears - should present overall design latencies. 11.Authentication and authorisation services are the most abused ones: every service makes use of those, but none correlates with the actually presented description (sections 4.1, 4.2), which is fairly vague and does not really describe those details used by many other service descriptions. See detailed comments. 12.Many references are missing. Too many to list here. 
Every mentioned tool, application, installation, abbreviation, organisation, project and initiative (starting with GGF) must have an appropriate reference. The editor should go through the document and add the missing citations.

13. Figure 6 offers the closest view of the architecture; perhaps it should be extended to the other services, for completeness.

14. The term "VO" is used throughout, but often in an inconsistent manner. It would be nice to define what a VO is, its role and functionality, and then use the term consistently.

15. There is a bit too much general description of various services (e.g. security, WSRF) which does not really belong in an architecture paper, at least in its existing shape, where it is rather abstract and not specific. Perhaps references to the corresponding GGF documents should be put in place of those texts.

16. If one collects all the services mentioned in the document, the list is longer than the one originally proposed in Section 1. E.g., Section 4.4 refers to a "configuration service", which is never expanded upon. One may guess that some of these services are internal to components, but this is hardly ever explained.

Detailed comments
-----------------

Section 1, "Introduction"
`````````````````````````

It is rather strange to start building a Grid from VOs; but as this seems to be an architectural decision, it has to be explicitly stated in the missing "Architecture" section. Much attention is paid to sharing resources inside a VO, while none to sharing resources *between* VOs. The entire paper does not seem to take into account that VOs may come in hundreds; SWEGRID alone has more than a dozen. As NorduGrid experience shows, it is not a big task to arrange resource sharing inside a VO, as the ownerships and policies tend to be largely the same and easily managed. The real issue is how to share resources between VOs.

The list of Grid projects does not mention Globus - the project that coined the name. Is this intentional, or simply another implied thing?

The list of services is odd - both in its granularity and in the separation into "site specific" and "Grid wide specific". This separation is never supported in the later course of the document, and since there is no architecture, there is nothing to validate it. Moreover, the separation is really questionable. For example, an information collector and an authorization service are definitely site-specific, yet the former is not listed at all, and the latter is declared Grid-wide. Nobody authorises anybody Grid-wide!

The granularity of the services is also questionable. For example, Job Monitoring is definitely a part of the CE, as this is where the jobs actually are, and the CE is the only component which can reliably assess job status. It is also not clear why Grid Access and Authentication are separate services: one cannot exist, or makes no sense, without the other. The Package Manager service needs an explanation here. Services such as VO management and resource validation (QoS) are missing, or are perhaps included in other services - so the granularity really varies by orders of magnitude.

Figure 1 looks like an attempt at an architecture, but it needs an explanation. Also, plenty of links are missing, at least as it turns out after reading the rest of the document. Then, after a questionable grouping into two categories, another regrouping into five categories is performed, for no obvious reason.
Security services: these include allowing/denying access to resources. While this partially belongs to security, most access considerations are driven by VO policies, related to financial and political motives, not to security. Despite the apparently VO-based approach, there is no dedicated set of VO-related services, such as VO management, inter-VO accounting, authorization, monitoring etc.

API and Grid Access Service: this appears to be the client part in the absent architecture. This is not explicitly stated, and the section does not describe a User Interface - a notion heavily used by other services.

Information and monitoring services: it is really refreshing to see the move towards a fine-grained authorization scheme for information access. Good.

Job management services: rightfully include accounting; however, accounting is also done against storage usage, network usage and whatever else, not necessarily connected to jobs as such. So the split is questionable.

Data services: should include accounting, see above. Also, it is very disturbing to see that the designers limit themselves to file-based data. Even GGF considers two options: files and databases. Unfortunately, HEP-specific data containers, such as ROOT files, are not considered by anybody, despite clear HEPCAL requirements. Neither are HEPCAL Datasets considered (rather, they are misinterpreted, see later).

Section 2, "Applications Requirements"
``````````````````````````````````````

Must be split into several subsections:
- user requirements
- resource owner requirements
- security and privacy requirements
- ...

User requirements must include:
- Robustness
- Reliability
- Stability
- Round-the-clock operability
- Workload optimisation based on:
  * cycle availability
  * storage availability
  * quality of data access (effective, not geographical, "proximity")
  * bandwidth usage
  * costs of resource usage
  * ... (see e.g. HEPCAL)
- ...

Resource owner requirements must include:
- Non-invasiveness
- Portability
- ...

Resource sharing across organisations: it looks like the recipe is to join these organisations into a VO and thus arrange for resource sharing. What about sharing between VOs?

Section 3, "Service Based Architecture"
```````````````````````````````````````

A very consistent Section, but with the wrong name, or the wrong contents - the Section describes a framework, not an architecture.

A major general point: WSRF is described many times as a proposal with specifications yet to be defined. The advantage of SOA is said to be a single well-defined protocol, and yet the same Section says this protocol is not well defined. While this is acceptable, and actually belongs in an R&D project, one can hardly use these arguments for EGEE, which is a deployment project. Deployment needs performance, and there is no reference to any analysis of the performance of a service-based system. The LHC experiments in particular need low latencies and high robustness, and it is not clear whether a generic SOA is better in this respect than a tailored, dedicated architecture. What is good for an abstract Grid is not necessarily optimal for such a unique installation as the LHC, and a detailed comparative analysis is sorely missing.

Section 3.1 says "This model is clearly very suitable for Grid middleware, where we have to manage Virtual Organizations (VO) and Site policies." That is not what was argued before, and it is far from clear what VOs have to do with SOA.

Section 3.3 praises SOAP, which is good for Web-based applications that can tolerate high latencies, but is not necessarily optimal for high data volumes and high throughput. It is ASCII, after all, and it tends to inflate the needed bandwidth by wrapping a single 1-bit flag into a complex envelope. There is certainly a trade-off between standardization and performance.
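As a back-of-the-envelope illustration of this overhead (a hand-written, generic SOAP 1.1 message, not taken from the gLite document; the "jobIsDone" element and its namespace are invented for this example):

    # Minimal illustration of SOAP message overhead for a single boolean flag.
    # The envelope is a generic SOAP 1.1 message; the body element is invented.

    soap_message = """<?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <jobIsDone xmlns="urn:example:jobstatus">true</jobIsDone>
      </soap:Body>
    </soap:Envelope>
    """

    payload_bits = 1                                # the actual information content
    wire_bytes = len(soap_message.encode("utf-8"))  # what is sent, before HTTP/TLS overhead

    print(f"payload: {payload_bits} bit, on the wire: {wire_bytes} bytes "
          f"(~{8 * wire_bytes} bits, plus HTTP and transport security overhead)")

The factor of several hundred shown by such a toy message is of course amortised for large payloads, but for frequent, small control messages it directly translates into latency and bandwidth costs.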
Section 3.5 "State" is generally very reasonable, but does not appear to be applied in the rest of the document. NorduGrid's Grid Manager fits this particular framework better than the proposed gLite services do.

Section 3.6 "WSRF" says literally: "To date, not all of these specifications have been made public yet". What are the chances that they will be defined within the EGEE lifetime? This does not seem to be a reliable choice for a 2-year project aiming at providing production facilities. And a P.S.: in the words of Carl Kesselman, "WSRF is not a silver bullet".

Section 4 "EGEE Grid Services"
``````````````````````````````

Section 4.1 Authentication
``````````````````````````

One of the best sections in the document, clearly presenting requirements and options. However, it does not mention whether this model is already implemented or still needs work to be realized. Also, no alternatives are offered.

Section 4.1.2 assumes a centrally managed set of CAs (the "single trust domain") - is that really the intention? Newcomers and outsiders will have enormous problems, which is not quite the Grid idea.

Section 4.1.3 seems to be detached from the rest, and it is not clear whether it is proposed for implementation in EGEE.

Section 4.1.4 "Revocation" is very important, as it influences the way services interoperate. Unfortunately, no details are given.

Section 4.1.5 does not offer procedures for dealing with stolen or lost credentials. E.g., if a secure storage gets stolen, what is the procedure? Shall the user notify some agency (a Grid Police)? Are there any tools for that? If the credentials of an entire organisation get lost or compromised, will every user be notified? By whom? In both cases, what will happen to the tasks which are already running in the system?

Section 4.1.6 on pseudonyms is very interesting, but again, it is not clear whether it is part of the proposed architecture or just an aside.

Missing sections:
* Identity modifications: people's names change, domain names change, organisation names and affiliations change - basically, the identity of each actor and legal body can change many times. How does this system accommodate that?
* Multiple identities and credentials: a person or a resource may well have several legal identities, like double citizenship. How does this system accommodate that?

Section 4.2 "Authorization"
```````````````````````````

Also a very nice section, which unfortunately stops short of proposing a definite tool or mechanism. But perhaps that will come in the separate deliverable for this area.

Section 4.2.2 "Delegation": the important issue is how one can guarantee that any service can impersonate any user and access, in his/her name, any other service. For example, in present Grid implementations a user may submit a job which needs to fetch data from a storage service whose identity the user does not know in advance. How can one ensure that (a) the computer on which the job is executed accepts the credentials of that storage element, and (b) the storage element will serve data for that user? Currently this problem is not solved. One must not expect that all resources will accept all users and all certification authorities.
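A minimal sketch of the kind of check a storage element would have to perform on a delegated credential chain, assuming the usual proxy-style delegation; all names and structures below are invented for illustration, and no real GSI or proxy library is used:

    # Hypothetical sketch of the SE-side acceptance test for a delegated chain.
    from dataclasses import dataclass

    @dataclass
    class Credential:
        subject: str   # identity this credential asserts
        issuer: str    # who signed it

    def accepts(chain, trusted_cas, authorized_users):
        """chain[0] is the end-entity (user) certificate, the rest are delegated proxies."""
        if chain[0].issuer not in trusted_cas:
            return False                   # (a) the SE must trust the user's CA at all
        for parent, child in zip(chain, chain[1:]):
            if child.issuer != parent.subject:
                return False               # each proxy must be signed by the previous link
        original_user = chain[0].subject
        return original_user in authorized_users   # (b) the SE must authorize that user

    chain = [
        Credential(subject="/O=SomeVO/CN=Jane Doe", issuer="/CN=Some CA"),
        Credential(subject="/O=SomeVO/CN=Jane Doe/CN=proxy", issuer="/O=SomeVO/CN=Jane Doe"),
        Credential(subject="/O=SomeVO/CN=Jane Doe/CN=proxy/CN=proxy",
                   issuer="/O=SomeVO/CN=Jane Doe/CN=proxy"),
    ]
    print(accepts(chain, trusted_cas={"/CN=Some CA"},
                  authorized_users={"/O=SomeVO/CN=Jane Doe"}))

The sketch omits signature verification and proxy policy restrictions, which is exactly where the practical difficulty lies; both conditions (a) and (b) are matters of local policy, and the document does not say how they are to be satisfied for a storage element chosen at run time.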
Missing section: dynamic authorisation. Roles within a VO change dynamically, and the authorization services must follow. For example, a user submits a task as a manager and is authorised to access certain resources. In the course of the task he is demoted to a regular user, or changes VO altogether. He must not be able to access those sensitive services any longer, within hours at most after his authorisation pattern changed. And the other way around: a person may be given extra authority by his VO, and he should be able to exert it as soon as possible.

Section 4.3 "Auditing"
``````````````````````

Describes no services??? Impossible to comment, it has no substance. It seems the auditors are supposed to simply browse through hundreds of logs from different services - so much for a service. That is what they do anyway, so perhaps this section is not needed? Section 4.3.3 explicitly says "The auditing process in itself does not require a separate service", while Figure 1 "gLite Services" has a box for it, and it is listed as a service in Section 1. Which one is wrong?

Section 4.4 "API and Grid Access Service"
`````````````````````````````````````````

A rather odd section. From the "Use cases" in Section 5 one discovers that GAS is actually the user client, realized as a service, but this section does not say so. It describes a client-like service with STRONG assumptions about the architecture, which of course is not there. It is also not clear whether any regular client is foreseen - e.g., the Web has browsers for clients, and browsers are not services. Is something similar planned for gLite? The architecture MUST mention it. Perhaps the name should be "Client API..."? And Figure 3 should replace the strawman in Figure 6 - then we would almost get the architecture (save for the security).

What is the "configuration service"? The single line on page 20 is very vague and does not explain how this service relates and communicates to others, how it maintains the lists, etc. This service seems to be one of the central ones and deserves more than one line of explanation. Is it a single point of failure? When this service is down, what is the solution?

Section 4.4.1 lists Information, Configuration and Authorization services. There is no Configuration service description anywhere.

Section 4.5 "Information and Monitoring"
````````````````````````````````````````

A pretty empty section for such an important issue. Information collection and propagation is essential for any Grid - it is its nervous system. Without it, the other services are disconnected. The purpose of the Information System is to provide for resource discovery, and this part is not addressed at all. So the section deserves a much better description, with functionality requirements and architectural details; at the moment it only lists implementation details, no architecture. Alternatives must also be presented.

The whole Section focuses on the querying aspect, not so much on how the information is collected. Neither does it describe how to achieve the "appearance of a single federated database", which is set as the goal. It also does not seem to consider it important that the information in such a database be relevant - will anything do? There must be some criteria on how recent the "stored" information is.

The section is actually a description of R-GMA without naming it; it even uses R-GMA-specific terms (e.g. "TableAuthorisation", "log4j" !!!). It hardly passes for an architectural proposal - rather, it is an implementation description. It does not describe an informational model of EGEE: what are the components, what are the units, objects and actors (nodes? jobs? clusters? users?)?
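For illustration only, a hypothetical sketch of the kind of entities and attributes such an informational model would have to enumerate (shown as flat tables purely for brevity; nothing here is taken from the document):

    # Hypothetical minimal information schema, to illustrate what an
    # "informational model" would have to spell out.
    schema = {
        "cluster": ["cluster_id", "site", "owner_vos", "lrms_type", "total_cpus"],
        "queue":   ["queue_id", "cluster_id", "max_walltime", "free_slots", "allowed_vos"],
        "job":     ["job_id", "queue_id", "owner_dn", "status", "submitted_at"],
        "storage": ["se_id", "site", "total_space", "free_space", "allowed_vos"],
    }

    # Each attribute would further need a type, units, an update frequency and an
    # authoritative producer - exactly the details the architecture should fix.
    for table, columns in schema.items():
        print(f"{table}: {', '.join(columns)}")

Whether such a model is then expressed relationally, hierarchically or otherwise is a separate decision, which brings us to the next point.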
Introduction: it bluntly states that the relational model is the one. No explanation is given, and this contradicts both the hierarchical structure proposed by Globus and the hierarchical nature of information on a Grid. If it has been decided that EGEE opts for a relational model, a solid reason must be given. The suggested system is a complicated hierarchy of caches and providers, which has been shown (by EDG) to be neither satisfactory nor scalable.

The list of services should be given in human words, not in "OnDemandProducer" parlance with no obvious meaning. The structure of the information system - its architecture - must be described. E.g., one expects that there will be sensors, which send information to some nodes, which in turn are known to a central "dispatcher". Or is it flat, where each sensor knows everything about the others? What is the entry point to such a system? Is there a registry of resources? Is it one registry or many? How does the whole thing scale?

Section 4.5.3 "Services": this seems like too many layers of services - could it not be kept simple? With so many intermediate steps the system is more likely to fail. What is the topology of the producers? How scalable is a system where higher-level producers pull information snapshots from lower-level ones? Furthermore, it appears that even the primary producers have a "minimum retention period" (read: a cache?); the risk of very outdated information being propagated all the way to the clients is extremely high.

IMPORTANT: the section assumes that users can enter data into the system. This must be declared in the [absent] architecture, as it has very strong implications for resource providers. It is a potential security hole, so clear reasoning and methods to protect it must be given. The last sentence mentions a "mediator" - who or what is that? A description is needed. How does it find the resources?

Section 4.5.4 "Security": it mentions the issue above; however, it does not deal with the situation where some resource owners allow users to publish information while others do not. This will render the proposed system useless. Either all must allow it (impossible) or it should be forbidden (pretty feasible). Another issue which is not addressed is how to deal with confidential information which, e.g., should not be visible to other VOs, or to certain members of the same VO. How are the authorisation rules supposed to propagate through all the layers of caches?

Section 4.6 "Job Monitoring"
````````````````````````````

It is not clear why this is a separate service. Also, the approach is wrong: it limits job monitoring to stdout monitoring, which is the least significant part of it. A job does not know anything about itself, while the LRMS does. So it is the task of the CE to collect such information and publish it into the information system - there is no need for a separate service. Another concern: in the model of a PC cluster with worker nodes on a private network, implementation of this service hardly seems possible.

Section 4.7 "Accounting"
````````````````````````

Probably the best section in the entire document: it consistently presents requirements and a draft architecture. It is not clear, though, why accounting is a service separate from the Information System: after all, all the information overlaps. It would be rather silly to have different information collectors/meters for accounting and for the information system. This is why, again and again, an OVERALL ARCHITECTURE is badly needed, to prevent such overlaps.
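To illustrate the overlap in question: a single usage record, produced once by one meter, can feed both consumers. The sketch below is hypothetical; none of its names come from the gLite document.

    # One meter, one usage record, two consumers - no duplicated collection.
    usage_record = {
        "job_id": "job-000123",
        "user_dn": "/O=SomeVO/CN=Jane Doe",
        "site": "site-A",
        "cpu_seconds": 5400,
        "wallclock_seconds": 6100,
        "storage_bytes": 2_000_000_000,
    }

    def publish_to_information_system(record):
        # the information system cares about current state and recent history
        print("infosystem <-", record["job_id"], record["site"])

    def publish_to_accounting(record):
        # accounting cares about resources consumed per user and VO
        print("accounting <-", record["user_dn"], record["cpu_seconds"], "CPU s")

    for consumer in (publish_to_information_system, publish_to_accounting):
        consumer(usage_record)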
It is not clear why abstraction layers are needed - to compare oranges with bananas? The scheme assumes that if a user has a certain amount of credits, he will have to balance between, say, storage and CPU usage: if he spends too much disk, he will not be able to compute for longer than a certain time. This makes no sense. An abstraction within the same class of services is fine, but not between them.

Cost computation and billing: how would this system deal with dynamically changing costs? Such as sales of CPU time? Or bonuses? Or inflation? Could users split a bill, e.g. charge 50%/50% the different VOs with which they are affiliated?

Section 4.8 "Computing Element"
```````````````````````````````

An incredibly confusing/confused section! The CE in a computational Grid is the central element, its heart, and yet this section gives a very played-down description. Is a CE a gatekeeper? It represents a computing resource all right, but what is the architecture? Is there one CE per cluster? Or one per node? One per processor? It manages jobs, but why does it not manage the data connected to a job? What is a job? Does it include input/output data management (location resolution, download and upload, possibly registration)? This is what a CE should manage, too! Distribution of jobs within a heterogeneous cluster is a task for the local RMS, not the CE - unless of course the CE is a replacement for the LRMS, which is not clear. The interface to the LRMS mentions "standards" - there are none; the known LRMSes do not adhere to standards. Even OpenPBS and PBSPro use different notations.

Section 4.8.1 "Job management functionality": must include job-related data management - stage-in and stage-out, as LRMSes do, but in a global sense. If the job description presents URLs for input data, the CE must fetch them. If the job description presents a URL for output data, the CE must upload it in the name of the user. It must also include setting up the job environment, e.g. defining the necessary paths, according to the job description. The CE appears by design unable to send jobs to a queue: in a situation with several WMSes, it is said to accept the first job to arrive and refuse the others. So it makes no use of local queues, creating instead one huge global queue. This will not scale.

Section 4.8.2 "Other functionality": the CE is said to send "CE availability" messages to the WMS. In this respect it duplicates the job of the Information System. If this is an architectural decision, it must be stated. The WMS should be perfectly able to interpret the information in the Information System, and needs no messages from all the CEs. And what about authentication and authorization on the resource - is that not the job of the CE?

Section 4.8.3 "Internal CE architecture" mentions a mapping between Grid users and local (Unix?) users, while "Authorization" does not imply it. It is not clear why it is necessary. The procedure of job acceptance (creation of the UC, execution in the JC, MON) seems unnecessarily complicated, and it is not clear why a single component must use Web Services internally, to communicate between its own sub-components. This seems to add latency and increase the probability of errors.

Missing section: interactive tasks.

Section 4.9 "Workload management"
`````````````````````````````````

Assumes only one Workload Manager??? A single point of failure??? Is this an architectural decision? Why does it need Logging and Bookkeeping, while we already have Accounting and the Information System? The same information collected three times?!
Why are the two main job management requests "submission and cancellation"? This should be compared with any LRMS, which has requests such as resubmission, suspension and, most importantly, monitoring. The matchmaking seems too simplistic: it must also be based on the credits available to the user, on bandwidth, and on other optimization parameters.

Sections 4.9.3 and 4.9.4: are these components (ISM and TQ) necessary? Requirements must be presented. How does the ISM get filled with information? Why not re-use R-GMA? Or are they related in some way?

Section 4.9.5 "Job Logging and Bookkeeping": as mentioned above, why is this a WMS component and not part of Accounting?

Section 4.10 "Job Provenance"
`````````````````````````````

A nice description, but it is absolutely impossible to fulfil, as it will demand more storage than the entire LHC data!!! Why not delegate some of the described tasks to Accounting?

Section 4.10.2 mentions "input/output sandboxes", which are not part of the [absent] architecture. It is unclear what is actually meant, or what assumptions are made.

Section 4.10.3 mentions various servers, but does not define their location: are they stand-alone, or do they reside on a gatekeeper? Or on Worker Nodes? Or on Storage Elements? Well, as such components are not defined anyway, it is difficult to make any match. Also, this section suggests naming some files after the JobID. This is impossible on Linux file systems, as all known JobIDs are URLs and include characters incompatible with file names. Or does the [absent] architecture imply a different pattern for JobIDs?

Section 4.11 "Storage Element"
``````````````````````````````

A general point: there is almost no description of what functionality SEs are expected to provide. One would imagine they manage authentication/authorisation, disk space and quotas, support information providers/meters/whatever-they-are-called, keep a local registry of data and publish it or propagate it on request to global registries, and so on. All of this should be listed explicitly, not scattered across the section.

Introduction, "Tactical Storage": this seems to be a local cache for a CE, which is a good thing to have. But then it should be a CE component.

Section 4.11.2 says it will use "security services for the purpose of authorization accounting" - but accounting does not belong to the security services, according to the previous sections.

Figure 8 is impossible to digest; it needs a WTF service... Figure 9 uses the undefined notion of the Worker Node and makes strong implications about its capabilities: what will happen if a WN cannot communicate with services other than the CE, because the sysadmins decided so?

Section 4.12 "File and Replica Catalogs"
````````````````````````````````````````

Unfortunately, the section assumes that a file is the unit of information/data, which is of course not always the case. Also, in many areas, from HEP to biotech, researchers operate with datasets, not with files - rather, they do not care whether the data are stored in files, databases or elsewhere.

Introduction: mentions a "Replica Catalogue" which is never defined. It is unclear whether replicas always reside on SEs, or can be anywhere else. E.g., if a client fetches a file for analysis, is that copy considered a replica? If a job caches the file on a local disk, is that a replica? Physically they are, but this architecture does not address it. The introduction also mentions that the Catalogs create the illusion of a single file system, which contradicts the suggested design of SEs in 4.11.5, which argues that a virtual file system is possible.

It decodes GUID as "Grid Unique ID", while the rest of the section uses "Globally Unique ID" - is this the same entity or two different ones? In any case, the Catalogs must not assign GUIDs. A GUID is by definition Globally Unique and is assigned when a file is created, not when it is registered - just as a DNS server does not assign unique IP numbers, it only lists them. Thus it is either the application that assigns a GUID, or an SE, if the application is incapable of doing it.

Section 4.12.2 "File names" argues that a GUID has a 1:1 correspondence to an LFN. This must be a mistake, as one of the two would then not be needed. Back to the example of IPs: one may well have several aliases (LFNs) for a single unique number. For example, the same file may have different logical meanings for different analysis streams, so it may well carry several LFNs while having only one GUID.
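In other words, the natural relation is many LFNs to one GUID, as the following trivial sketch shows (hypothetical names, not the gLite catalog interface):

    # Hypothetical illustration of the many-to-one LFN -> GUID relation argued above.
    lfn_to_guid = {
        "lfn:/dc04/zmumu/run00283/digits.002": "8f2a1c34-0000-4000-8000-1234567890ab",
        "lfn:/analysis/muon-stream/sample-17": "8f2a1c34-0000-4000-8000-1234567890ab",
    }

    # Several logical names, one and the same file:
    assert len(set(lfn_to_guid.values())) == 1

    # A strict 1:1 mapping, as the document proposes, would make one of the two
    # identifiers redundant - renaming an LFN would have to change nothing else.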
Section 4.12.2 also introduces the SURL. Why the "S" is needed is unclear: it is a URL, just like any other URL - why invent a new name? The same section says that the LFN is mutable; this contradicts the HEPCAL requirements. LFNs are immutable.

Missing section: Datasets. How is it foreseen to organize data into sets? They can overlap, and have subsets and supersets. There must be a service which, for example, prevents deletion of a file when an entire dataset is erased, if the file also belongs to another dataset, and so on. See HEPCAL for details.

Figure 10: what are the cylinders?..

Section 4.12.8 invents a "directory" notion and claims it "maps well" to a dataset, while of course it does not. There are some similarities, but there are differences as well: directories largely represent a hierarchy, while datasets are not hierarchical at all.

Section 4.12.10 mentions ACLs on a filesystem - which filesystem? Do the Catalogs rely on filesystems? Are they not databases? Some notes on the inheritance of rights are needed. E.g., if a user defines a dataset, he must be able to list its contents even if it was filled by another user. Of course, he must first define who is allowed to manipulate the dataset description, its relation to other datasets, etc. That does not mean the same users can manipulate the data in the dataset, of course, and vice versa.

Section 4.13 "Data Management"
``````````````````````````````

A wealth of fresh ideas, with a well-defined architecture. However, the reason for the existence of this service is not explained. It appears as if the primary goal of this service is bulk/asynchronous replication; it does not address regular data transfer. If this particular service is to be used for all data transfers, it will only complicate simple things, create another single point of failure and decrease the overall performance. But it is a nice theoretical service-based framework exercise, deserving a separate project.

Introduction: again, it implies the existence of Worker Nodes with some unknown functionality. It also uses the notion of a "local SE", something which is not described in the SE section. It appears as if the service only deals with data transfer between SEs, and it is unclear how users can fetch or upload files elsewhere, which is particularly important for analysis tasks.

Section 4.13.3 "Services Overview": it is unclear which part of the service resolves the LFN. For example, a user or a script running on a computing resource issues the instruction:

> glite-copy MY_FILE_00283.SET2::FF .

Here MY_FILE_00283.SET2::FF is a logical file name. What chain is being invoked? Which subservices are instantiated?
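The document should spell out a chain along the following lines; this is a purely hypothetical sketch of the steps the reviewers expect to be defined, and none of the function or service names below appear in gLite:

    # Purely hypothetical sketch of the resolution chain the document should define.
    def copy_by_lfn(lfn, destination):
        guid = file_catalog_lookup(lfn)         # 1. LFN -> GUID (a File Catalog?)
        surls = replica_catalog_lookup(guid)    # 2. GUID -> replica locations (a Replica Catalog?)
        surl = pick_best_replica(surls)         # 3. replica selection - by whom, on what criteria?
        transfer(surl, destination)             # 4. the actual transfer - which protocol, which service?

    # Stubs only, so the sketch runs; each step raises the questions asked above:
    # which subservice performs it, where does it run, and how is it authorized
    # on behalf of the user?
    def file_catalog_lookup(lfn):
        return "guid-placeholder"

    def replica_catalog_lookup(guid):
        return ["srm://se1.example.org/data/f1", "srm://se2.example.org/data/f1"]

    def pick_best_replica(surls):
        return surls[0]

    def transfer(surl, destination):
        print(f"copy {surl} -> {destination}")

    copy_by_lfn("MY_FILE_00283.SET2::FF", ".")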
Section 4.13.4 "Data Scheduler": this appears to be an asynchronous tool, similar to the infamous GDMP. It has no practical use in job execution; its only use is bulk replication and synchronisation of repositories. If this is the purpose of the entire service, it must be made clear in the very first line of the Section.

Section 4.14 "Metadata Catalog"
```````````````````````````````

Pretty vague, with almost no substance: everything "may" or "can" be, nothing is definite. Difficult to comment. Section 4.14.3 mentions tests of such services, but does not describe the outcome: did it work as expected, or was it a total failure? No reference is given either.

Section 4.15 "Package Manager"
``````````````````````````````

This must be the slot for Pacman, if one reads between the lines. It uses the terms "Job Wrapper" and "JA", which are not defined anywhere. It does not even define what a "package" is. And it offers to manage a file system - which is a job for the SE or the CE, as one might expect.

Section 5 "Use cases"
`````````````````````

Very useful; many things become clear by going through them. Too bad they come at the end - they would be better presented as parts of the initial architecture somewhere at the beginning. All the cases assume there are several WMSes, while this is not what the corresponding Section foresees. A contradiction.

Section 5.1 "Grid Login" says that the GAS will "create the appropriate Grid service instances" (item 8) - one may find it difficult to believe that the GAS will create an instance of, say, a Workload Manager, or a Computing Element. This is certainly not what was meant!

Sections 5.2 "Production Job" and 5.3 "Analysis Job" seem to assume that the only difference between a production and an analysis job is that the latter has input data, while the former has none. This is not true; please refer to HEPCAL II.

That's all...