Subject:
Re: Comments on the HES document
From:
Ed Frank <efrank@hep.uchicago.edu>
Date:
Thu, 18 Apr 2002 14:13:21 -0500
To:
David Adams <dladams@bnl.gov>
CC:
Atlas Database Group <atlas-database@atlas-lb.cern.ch>, Ed Frank <efrank@hep.uchicago.edu>

Dear David (Adams),

In the spirit of RD's, "better late than never," here are some comments
about HES.  These comments are from the perspective of the Atlas
Database Architecture (ADB).

Compared to ADB, HES expands upon some issues, omits others and is
simply different on a few.  Before expanding upon these, let me say
what is my personal stake in the discussion.  I believe that a
significant source of dissatisfaction in other experiments arises from
users feeling the data-handling requires too much effort on their
part.  When they say this, I think they mean more than the amount of
work; they also mean that they have too many surprises.
There are too many times when they do not know what to do to get what
they want or, even, when they see little correlation between their
data handling planning and the response of the system.

Therefore, when we wrote the architecture document, besides worrying
about performance issues (as TomL pointed out), we worried a great
deal about *clearly* stating a simple, finite, understandable
language for data-handling.  We spent much of the document trying to
convey what this limited number of ideas meant.
Even though we kept the number small, people already comment that it is
hard to understand.

So, my stake in this discussion is that I want to maintain that part
of the architecture is a _small_ "data handling language,"  i.e., a
clear, small ( but sufficient ) list of things that can be done.

When I remark below that "this is different" and "that is different,"
my concern isn't so much that a proposal has been made for a change; rather,
the concern is typically that either a) a significant complication in
the user language has been introduced or b) the user language has gotten
mixed with the *implementation* ontologies, and this will ultimately
destroy the utility of the user language.  To illustrate, let me turn
to Solveig's comments.

Solveig commented that it wasn't clear why a bunch of what was inside
of the HES document was there, e.g., if HES is an implementation, then
why redefine all these terms?  I appreciate your desire to make the document
self contained, but please realize that as a consequence you are maximizing
the probability that it will be inconsistent with other documents and you
are forcing people to read _considerably_ more volume and to wonder about
the relationship between what they are reading and other documents.  So,
please count this as a vote to make the HES document not be
self contained and to simply reference the ADB.  Similarly, you have
comments in the document that are really compute model, not HES, and they
too should be excised and inserted into a dedicated compute model document.

Now, on to differences and omissions.

RD has already pointed out that the thrust of ADB was to leverage off
of collections.  Users say, "I want to look at the GoodJet data" and
so request that collection.  HES, however, in many places puts _files_
as first-class entities.  This may be an example of where we get in
trouble mixing implementation (HES) with Architecture, but please
consider the following point: A job is started that will iterate over
an enormous dataset and will select 1 in 10 events to produce an
output.  Suppose the input data-sample is 10 Terabytes in size.  We will
need to write 1000 GB of output.

Do you see that it does not work to specify an output _file?_ If we
constrain the output to be, oh, 2 GB files, we need 500 files.
Therefore, in ADB, there is an entity called a Resource Manager along
with a prescription for the interaction between the manager and
converters (or other consumers of information).  The Manager is
responsible for several things.
	1.  When a converter needs a resource (here, a file), the Manager
            provides a file from a *resource pool.*  Here, given an output
            *collection* name, we will accrete files into the output
            collection's list of consumed resources.
	2.  The resource pool is the unit of site-level resource management.
            One example of a resource pool would be a set of storage allocated
            for user files, or a bunch of storage for ESD.  Therefore, site
            managers *allocate* or assign resources by talking to the
            database, making it aware of resources.  The Storage Manager
	    then forms the rendezvous between consumers, e.g., converters,
            and the sys-admin's configuration.
	3.  The Resource Manager, since it is by design the rendezvous point,
            is the place to ensure appropriate update of meta data in
            many cases.  For example, in HES/mySQL, you can imagine you produce
            a table that lists allocated resources of some kind (fileIDs 
            bound to unused storage, say).  When a converter tries to do
            a write and discovers there's not enough room in the current
            resource (see the ADB), the Storage Manager grabs an entry from
            this table.  It ALSO then updates meta-data of various kinds,
            e.g., updates the tie between logical FN and collection via
            the appropriate table.
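
To make the three responsibilities above concrete, here is a minimal
sketch in Python.  It is not the ADB or HES interface: the class names
(ResourcePool, ResourceManager, Converter), the method next_file, the
output collection name "SelectedEvents" and the fileID format are all
assumptions invented for illustration.  It only shows that a job can
name an output *collection* and let the manager accrete files into it
as each one fills up.

```python
from dataclasses import dataclass, field

GB = 10**9


@dataclass
class ResourcePool:
    """Storage a site manager has registered for one purpose (hypothetical)."""
    name: str
    free_file_ids: list  # fileIDs bound to unused storage


@dataclass
class ResourceManager:
    """Rendezvous between consumers (converters) and the site configuration."""
    pool: ResourcePool
    # Meta-data tie: collection name -> fileIDs consumed so far.
    files_by_collection: dict = field(default_factory=dict)

    def next_file(self, collection):
        """Take an unused file from the pool and record it for the collection."""
        if not self.pool.free_file_ids:
            raise RuntimeError(f"resource pool {self.pool.name!r} is exhausted")
        file_id = self.pool.free_file_ids.pop(0)
        self.files_by_collection.setdefault(collection, []).append(file_id)
        return file_id


class Converter:
    """Writes to a collection and asks the manager when a file is full."""

    def __init__(self, manager, collection, max_file_bytes):
        self.manager = manager
        self.collection = collection
        self.max_file_bytes = max_file_bytes
        self.current_file = None
        self.bytes_in_current = 0

    def write(self, n_bytes):
        # No file yet, or not enough room left: request a new resource.
        if (self.current_file is None
                or self.bytes_in_current + n_bytes > self.max_file_bytes):
            self.current_file = self.manager.next_file(self.collection)
            self.bytes_in_current = 0
        self.bytes_in_current += n_bytes


# A pool of 1000 pre-allocated files, each limited to 2 GB by the converter.
pool = ResourcePool("user-files", [f"file{i:04d}" for i in range(1000)])
manager = ResourceManager(pool)
converter = Converter(manager, "SelectedEvents", max_file_bytes=2 * GB)

# 1000 GB of selected output, written in 1 GB chunks for brevity.
for _ in range(1000):
    converter.write(1 * GB)

files_used = manager.files_by_collection["SelectedEvents"]
print(len(files_used))  # 500
```

The job never names an output file.  The 500 files exist only as
entries the manager recorded against the collection while the job ran.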

So, in summary, two points here.  First, I believe there is a significant
omission with respect to providing ADB's stipulated resource manager.  Second,
the HES document's attempt to work in terms of _files_ does not/can not
work because you will not know in many instances, e.g. output, what the
files are!  They will be created, in unknown numbers, over the course of
the job.  So, there are both technical and conceptual reasons for focusing
on collections rather than files.

Continuing with differences, let me reiterate RD's point.  The collection
is _the_ fundamental concept in the ADB data-handling language.  Perhaps
it is not clear there, but collections can contain collections and so, I
think, are the same as your datasets.  I'm happy to rename things if that
is so and if that is preferable.  However, what I do wish to insist upon
is that collections/datasets return to their status as first-class entities.
Their names are the principal means of rendezvous in many places of the
system.  For example, the typical chain for a physicist locating data was
envisaged to be:

    physicist ---------------------> collection name -----------> filename(s)
               query on collection                    meta-data
                catalog or other
                 meta data

The physicist never sees the filenames.
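
A minimal sketch of that chain in Python, assuming a toy catalog held
in two dictionaries.  The function names (find_collections,
resolve_files), the meta-data keys, the "lfn:" filename format and the
collections other than GoodJet are hypothetical, not taken from ADB
or HES.  It also shows a collection containing other collections.

```python
# Hypothetical collection catalog: collection name -> descriptive meta-data.
catalog = {
    "GoodJet": {"kind": "jet", "quality": "good"},
    "LooseJet": {"kind": "jet", "quality": "loose"},
    "AllJet": {"kind": "jet", "quality": "any"},
}

# Hypothetical meta-data: a collection lists files and/or sub-collections.
contents = {
    "GoodJet": {"files": ["lfn:jet-0001", "lfn:jet-0002"], "collections": []},
    "LooseJet": {"files": ["lfn:jet-0003"], "collections": []},
    "AllJet": {"files": [], "collections": ["GoodJet", "LooseJet"]},
}


def find_collections(**criteria):
    """Step 1: the physicist queries the catalog and gets collection names."""
    return sorted(
        name
        for name, meta in catalog.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


def resolve_files(collection, _seen=None):
    """Step 2: the system, not the user, maps a collection to filenames."""
    seen = set() if _seen is None else _seen
    if collection in seen:  # guard against a collection nested in itself
        return []
    seen.add(collection)
    entry = contents[collection]
    files = list(entry["files"])
    for child in entry["collections"]:
        files.extend(resolve_files(child, seen))
    return files


names = find_collections(kind="jet", quality="good")
print(names)                    # ['GoodJet']
print(resolve_files("AllJet"))  # files of GoodJet, then LooseJet
```

Only find_collections is something a user would call.  resolve_files
stands in for what the framework does internally once it is handed a
collection name.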

Next difference....Placement categories and sharing.

The HES document says that one can share at the placement category level
OR at the EDO level.  At ANL, we agreed that our escalation schedule was:
	1.  data would be shared at the clustering level.
	2.  clustering and sharing would be separately configurable, BUT
            sharing would be in groups.
	3.  As in 2, but allow per-object control of sharing.
I do not wish to revisit the discussion that led to this choice, but the
decision was to do #1 _only._   Please confirm that the HES document
does both 1 and 3 by allowing both placement category and EDO references.

With regard to placement category, I'm not sure how to even begin.
Somehow in HES they are much more intrusive on the user's psyche.
This is one of those "it's just different" situations and I don't know
how to address it at the moment.  Let me try in a separate note.

Streams....

In ADB, Streams mean exactly what they mean in Athena.  A stream is
a specification for an event selection prescription.  It is true that we
bound streams to collection names and used stream names as a token for
configuration, but you introduce a huge architectural change via streams
in HES.  When you say there can be multiple input streams, one of which
must contain eventID's and say that the system then acts on input to
load different pieces of the event data from the different streams, that
is completely foreign to ADB.  We can discuss whether to include it,
but please read here that the two just are different.  What you have
described is basically the CLEO stop-based I/O.  I'd like to stay with
the original ADB.

Please refer back to the ADB and review the first few pages.  Those
pages explain the notion of read a tape, write a tape plus three
optimizations, event sharing, data sharing and data placement.
Somehow, when I'm all done reading HES, I just don't see the
connection between those ideas and HES very clearly.  This may, again,
be because the document tries to be more than an implementation.  It
is unavoidable that an implementation picks up its own concepts and
language that apply within the implementation domain, but since you
have those intermixed with incomplete recapitulations of the ADB, I
come away feeling that a) I could never use a HES based system just by
reading the ADB and that b) significant new ideas that go well beyond
implementation have been introduced, and that is why.  Given my
statement about my primary stake being the keeping of a clear, simple
data handling concept for users, I am left feeling concerned.

As a way to proceed, let me suggest the following.  The HES document is
trying to be 4 separate things at once.
   1.  It recapitulates the ADB
   2.  It defines/explores a compute model for the experiment.
   3.  It defines a substantial fraction of the meta-data related to
       data-handling.
   4.  It defines the HES implementation.

I think these should be separated.  Just drop part 1.  If you wish, I
can write a 2 paragraph summary to give context to the document.  Item
2 should also be excised and placed into a new document that David Quarrie
has been trying to get the experiment to write for a long time.  Item 3
should be excised into a separate document _except_ for those pieces that
are part of the implementation.  Consider- the meta data that you introduce
represents, in many cases, the semantic content of the interconnect to
the grid.  It needs a separate existence.  Item 4, of course, should
remain.

Just to say it- I think your exploration of the meta data is fantastically
valuable.  I am trying to _promote_ it to a higher level of being.

Again, my stake in this discussion is that I want to maintain that part
of the architecture is a _small_ "data handling language," i.e., a
clear, small ( but sufficient ) list of things that can be done.  The
litmus test is that a user should be able to read a _short_ document
of _concepts_ and then be able to do a substantial amount of data
handling.

Sorry for the length.  There were many other technical comments that
are better left to another time.  Thank you very much for your work on
HES.  It is exciting to see a credible realization of the proposed
database architecture!  I'd like to see HES come to life.

I'll bet you really need coffee by now.... sorry.

	-Ed


--
Ed Frank
Office: (773) 702-7475
Fax   : (773) 702-1914
http://hep.uchicago.edu/~efrank
Enrico Fermi Institute / University of Chicago / Chicago, IL USA