Subject: Re: Comments on the HES document
From: Ed Frank
Date: Thu, 18 Apr 2002 14:13:21 -0500
To: David Adams
CC: Atlas Database Group, Ed Frank

Dear David (Adams),

In the spirit of RD's, "better late than never," here are some comments about HES. These comments are from the perspective of the Atlas Database Architecture (ADB). Compared to ADB, HES expands upon some issues, omits others, and is simply different on a few.

Before expanding upon these, let me say what my personal stake in the discussion is. I believe that a significant source of dissatisfaction in other experiments arises from users feeling that data-handling requires too much effort on their part. When they say this, I think they mean more than the amount of work: they also mean that they encounter too many surprises. There are too many times when they do not know what to do to get what they want or, even, when they see little correlation between their data-handling planning and the response of the system.

Therefore, when we wrote the architecture document, besides worrying about performance issues (as TomL pointed out), we worried a great deal about *clearly* stating a simple, finite, understandable language for data-handling. We spent much of the document trying to convey what this limited number of ideas meant. Even though we kept the number small, people already comment that it is hard to understand.

So, my stake in this discussion is that I want to maintain that part of the architecture is a _small_ "data handling language," i.e., a clear, small (but sufficient) list of things that can be done.
When I remark below that "this is different" and "that is different," my concern isn't so much that a proposal has been made for a change. Rather, the concern is typically that either a) a significant complication in the user language has been introduced or b) the user language has gotten mixed with the *implementation* ontologies, and this will ultimately destroy the utility of the user language.

To illustrate, let me turn to Solveig's comments. Solveig commented that it wasn't clear why much of what is inside the HES document is there, e.g., if HES is an implementation, then why redefine all these terms? I appreciate your desire to make the document self-contained, but please realize that as a consequence you are maximizing the probability that it will be inconsistent with other documents, and you are forcing people to read _considerably_ more volume and to wonder about the relationship between what they are reading and other documents. So, please count this as a vote to make the HES document not self-contained and to simply reference the ADB. Similarly, you have comments in the document that are really compute model, not HES, and they too should be excised and inserted into a dedicated compute model document.

Now, on to differences and omissions.

RD has already pointed out that the thrust of ADB was to leverage off of collections. Users say, "I want to look at the GoodJet data" and so request that collection. HES, however, in many places puts _files_ as first-class entities. This may be an example of where we get in trouble mixing implementation (HES) with Architecture, but please consider the following point: A job is started that will iterate over an enormous dataset and will select 1 in 10 events to produce an output. Suppose the input data-sample is 10 terabytes in size. We will need to write 1000 GB of output. Do you see that it does not work to specify an output _file?_ If we constrain the output to be, oh, 2 GB files, we need 500 files.
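The arithmetic in the example above can be checked with a few lines of Python. This is only a rough sketch: the function name and the decimal unit convention (1 TB = 1000 GB) are assumptions made for the illustration, and nothing here comes from ADB or HES.

```python
# Hypothetical helper, not part of ADB or HES; decimal units are assumed.
GB = 10**9
TB = 10**12


def output_files_needed(input_bytes, keep_one_in, max_file_bytes):
    """Return (output_bytes, file_count) for a job that keeps 1 event in
    `keep_one_in`, assuming the kept events have the average event size."""
    output_bytes = input_bytes // keep_one_in
    # Integer ceiling division: a partly filled last file still counts.
    file_count = -(-output_bytes // max_file_bytes)
    return output_bytes, file_count


# The numbers from the example: 10 TB input, 1 in 10 kept, 2 GB files.
skim_bytes, skim_files = output_files_needed(10 * TB, 10, 2 * GB)
```

With these inputs `skim_bytes` is 1000 GB and `skim_files` is 500. In a real job the selection rate is not known in advance, so the count is only discovered as the job runs, which is the point being made about naming an output file up front.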
Therefore, in ADB, there is an entity called a Resource Manager along with a prescription for the interaction between the manager and converters (or other consumers of information). The Manager is responsible for several things.

1. When a converter needs a resource (here, a file), the Manager provides a file from a *resource pool.* Here, given an output *collection* name, we will accrete files into the output collection's list of consumed resources.

2. The resource pool is the unit of site-level resource management. One example of a resource pool would be a set of storage allocated for user files, or a bunch of storage for ESD. Therefore, site managers *allocate* or assign resources by talking to the database, making it aware of resources. The Storage Manager then forms the rendezvous between consumers, e.g., converters, and the sys-admin's configuration.

3. The Resource Manager, since it is by design the rendezvous point, is in many cases the place where appropriate updates of meta data are ensured. For example, in HES/mySQL, you can imagine that you produce a table that lists allocated resources of some kind (fileIDs bound to unused storage, say). When a converter tries to do a write and discovers there's not enough room in the current resource (see the ADB), the Storage Manager grabs an entry from this table. It ALSO then updates meta-data of various kinds, e.g., updates the tie between logical FN and collection via the appropriate table.

So, in summary, two points here. First, I believe there is a significant omission with respect to providing ADB's stipulated resource manager. Second, the HES document's attempt to work in terms of _files_ does not, and cannot, work because you will not know in many instances, e.g. output, what the files are! They will be created, in unknown numbers, over the course of the job. So, there are both technical and conceptual reasons for focusing on collections rather than files.

Continuing with differences, let me reiterate RD's point.
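Before moving on, the three Resource Manager responsibilities listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ADB design or any HES code: the class, method, pool, and file names are all hypothetical, and in-memory dictionaries stand in for the database tables.

```python
from collections import defaultdict, deque


class PoolExhausted(Exception):
    """Raised when a resource pool has no unused files left."""


class ResourceManager:
    """Hypothetical sketch; every name here is an assumption."""

    def __init__(self):
        # pool name -> queue of unused file IDs (the "allocated resources" table)
        self._pools = defaultdict(deque)
        # collection name -> file IDs consumed so far (the file/collection tie)
        self._collection_files = defaultdict(list)

    def allocate(self, pool, file_ids):
        """Site-manager side: make the manager aware of unused resources."""
        self._pools[pool].extend(file_ids)

    def next_file(self, pool, collection):
        """Converter side: called when the current file is full.

        Takes an unused file from the pool AND records it against the
        output collection, so the meta data is updated in one place.
        """
        if not self._pools[pool]:
            raise PoolExhausted(pool)
        file_id = self._pools[pool].popleft()
        self._collection_files[collection].append(file_id)
        return file_id

    def files_of(self, collection):
        """Return the files accreted into a collection so far."""
        return list(self._collection_files[collection])


manager = ResourceManager()
manager.allocate("user_pool", ["f001", "f002", "f003"])
first = manager.next_file("user_pool", "GoodJetSkim")
second = manager.next_file("user_pool", "GoodJetSkim")
```

The job only ever names the collection ("GoodJetSkim" here); the list of files behind it grows as the job runs and is never specified by the user.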
The collection is _the_ fundamental concept in the ADB data-handling language. Perhaps it is not clear there, but collections can contain collections and so, I think, are the same as your datasets. I'm happy to rename things if that is so and if that is preferable. However, what I do wish to insist upon is that collections/datasets return to their status as first-class entities. Their names are the principal means of rendezvous in many places of the system. For example, the typical chain for a physicist locating data was envisaged to be:

    physicist ---------------------> collection name -----------> filename(s)
                query on collection                    meta-data catalog
                or other meta data

The physicist never sees the filenames.

Next difference.... Placement categories and sharing. The HES document says that one can share at the placement category level OR at the EDO level. At ANL, we agreed that our escalation schedule was:

1. data would be shared at the clustering level.

2. clustering and sharing would be separately configurable, BUT sharing would be in groups.

3. As in 2, but allow per-object control of sharing.

I do not wish to revisit the discussion that led to this choice, but the decision was to do #1 _only._ Please confirm that the HES document does both 1 and 3 by allowing both placement category and EDO references.

With regard to placement categories, I'm not sure how to even begin. Somehow in HES they are much more intrusive on the user's psyche. This is one of those "it's just different" situations and I don't know how to address it at the moment. Let me try in a separate note.

Streams.... In ADB, Streams mean exactly what they mean in Athena. A stream is a specification for an event selection prescription. It is true that we bound streams to collection names and used stream names as a token for configuration, but you introduce a huge architectural change via streams in HES.
When you say there can be multiple input streams, one of which must contain eventIDs, and say that the system then acts on input to load different pieces of the event data from the different streams, that is completely foreign to ADB. We can discuss whether to include it, but please read here that the two just are different. What you have described is basically the CLEO stop-based I/O. I'd like to stay with the original ADB.

Please refer back to the ADB and review the first few pages. Those pages explain the notion of read a tape, write a tape, plus three optimizations: event sharing, data sharing and data placement. Somehow, when I'm all done reading HES, I just don't see the connection between those ideas and HES very clearly. This may, again, be because the document tries to be more than an implementation. It is unavoidable that an implementation picks up its own concepts and language that apply within the implementation domain, but since you have those intermixed with incomplete recapitulations of the ADB, I come away feeling that a) I could never use a HES-based system just by reading the ADB and that b) significant new ideas that go well beyond implementation have been introduced, and that is why. Given my statement about my primary stake being the keeping of a clear, simple data-handling concept for users, I am left feeling concerned.

As a way to proceed, let me suggest the following. The HES document is trying to be four separate things at once.

1. It recapitulates the ADB.

2. It defines/explores a compute model for the experiment.

3. It defines a substantial fraction of the meta-data related to data-handling.

4. It defines the HES implementation.

I think these should be separated. Just drop part 1. If you wish, I can write a 2 paragraph summary to give context to the document. Item 2 should also be excised and placed into a new document that David Quarrie has been trying to get the experiment to write for a long time.
Item 3 should be excised into a separate document _except_ for those pieces that are part of the implementation. Consider: the meta data that you introduce in many cases represents the semantic content of the interconnect to the grid. It needs a separate existence. Item 4, of course, should remain. Just to say it: I think your exploration of the meta data is fantastically valuable. I am trying to _promote_ it to a higher level of being.

Again, my stake in this discussion is that I want to maintain that part of the architecture is a _small_ "data handling language," i.e., a clear, small (but sufficient) list of things that can be done. The litmus test is that a user should be able to read a _short_ document of _concepts_ and then be able to do a substantial amount of data handling.

Sorry for the length. There were many other technical comments that are better left to another time.

Thank you very much for your work on HES. It is exciting to see a credible realization of the proposed database architecture! I'd like to see HES come to life.

I'll bet you really need coffee by now.... sorry.

-Ed

--
Ed Frank
Office: (773) 702-7475
http://hep.uchicago.edu/~efrank
Fax : (773) 702-1914
Enrico Fermi Institute / University of Chicago / Chicago, IL USA