APIs: General
Background
During the meeting, we referred to the architecture design
for the request manager developed in the previous meeting, and modified it
slightly to reflect our thinking. This architecture is attached as a PowerPoint
figure (11.3.Request.mgr.arch.ppt). We'll refer to it below.
Terminology
The following terminology was used to refer to various
components and concepts.
Request Manager - the component responsible for
accepting estimation and execution requests from the user application at a
particular site. This component manages multiple requests concurrently, and uses
the other services to request file transfers.
Matchmaking Service - the
component that makes decisions on the best way to execute a request. That is, it
decides the site from which to access each file of a given request, based on
information it finds in the metadata catalog.
Metadata Catalog - the
component that keeps up-to-date information on where files reside, as well as
various statistics on network speed and availability. It also contains
information on which protocols each site supports (FTP, HTTP, etc.). This
component includes the "replica catalog".
Resource Manager - a component
that is associated with each resource in the system, such as disk cache, robotic
tape systems, network resources, etc. One instance of this is a "disk cache
manager" assumed to exist with each shared disk cache on the
system.
Event Component - event data is often partitioned into
"components", such as the "hits" data, the "tracks" data, the "raw" data, etc.
The reason for such partitioning is to avoid reading an entire event's data when
only some of the components are needed.
File Bundle - the set of files,
one for each component, that must be available at the same time in a disk cache
for processing by the application. An index is used to determine the set of
bundles needed for each request.
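The component/bundle model described above could be sketched as follows. This is an illustrative data structure only; the class and field names (ComponentFile, FileBundle, etc.) are invented for the example and do not come from the actual system.

```python
from dataclasses import dataclass

# Hypothetical sketch of the event-component / file-bundle model;
# all names here are illustrative, not part of the real system.

@dataclass(frozen=True)
class ComponentFile:
    component: str   # e.g. "hits", "tracks", "raw"
    path: str        # logical file name
    size_mb: int

@dataclass(frozen=True)
class FileBundle:
    """The set of files, one per component, that must be co-resident
    in a disk cache before the application can process its events."""
    files: tuple     # one ComponentFile per component
    events: tuple    # event ids served by this bundle

    def components(self):
        return {f.component for f in self.files}

# A bundle pairing the "hits" and "tracks" components for events 1-3
bundle = FileBundle(
    files=(ComponentFile("hits", "/lfn/run7/hits.001", 120),
           ComponentFile("tracks", "/lfn/run7/trk.001", 45)),
    events=(1, 2, 3),
)
```

The index mentioned above would map a request's event set onto a list of such FileBundle objects.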
Sequence of operations as a result of a request
The following is the sequence of operations that we foresee as a
result of a typical request made by applications.
1. The application issues an "estimate" request
2. The Request Interpreter (RI) consults the Logical Index
Service (properties-events-files) to get the files involved in the estimate
request
3. The RI consults the Matchmaking Service for an estimate of executing the
request
4. The application issues an "execute" request
5. The RI consults the Logical Index Service to generate the set of bundles
for the request
6. The set of {bundles:events} is passed to the Request Planner (RP)
7. The application client issues a request for the next events to process (the
files of at least one bundle must be in some disk cache to process events)
8. The RP consults the Matchmaking Service to get the preferred order of file
transfers
9. The RP issues a "get file, source, destination" request to the File Transfer
Manager (FTM), one file at a time
10. The FTM issues a "reserve space" request
to the Resource Reservation Service (one of the Grid Services) if "destination"
is a remote site
11. The FTM issues a "move file" request to the File Access
Service (one of the Grid Services)
12. The File Access Service notifies the
Matchmaking Service (or the metadata catalog) of the file just moved to the disk
cache
13. The application client issues commands to read events out of the
files
The above actions are numbered in the architecture figure.
The APIs are defined to correspond to these actions.
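The 13-step sequence above can be walked through in code with stub services standing in for the real components. This is only an illustration of the control flow; every class, method, and value below is invented for the sketch.

```python
# Illustrative walk-through of the request sequence, with stubs in
# place of the real services; all names and values are invented.

class LogicalIndexService:
    def files_for(self, request):            # step 2
        return ["f1", "f2"]
    def bundles_for(self, request):          # step 5
        return [["f1", "f2"]]                # one bundle of two files

class MatchmakingService:
    def estimate(self, files):               # step 3
        return {"total_time_s": 600}
    def transfer_order(self, bundles):       # step 8: preferred order
        return [("f1", "siteA", "local"), ("f2", "siteB", "local")]

class FileTransferManager:
    def get_file(self, f, src, dst):         # steps 9-12
        # would reserve space (10), move the file (11), and let the
        # File Access Service update the metadata catalog (12)
        return True

def handle_request(request):
    li, mms, ftm = (LogicalIndexService(), MatchmakingService(),
                    FileTransferManager())
    files = li.files_for(request)                        # 2
    est = mms.estimate(files)                            # 3
    bundles = li.bundles_for(request)                    # 5-6
    moved = [ftm.get_file(f, src, dst)                   # 9, one file
             for f, src, dst in mms.transfer_order(bundles)]  # at a time
    return est, moved
```

Steps 1, 4, 7, and 13 belong to the application client and are omitted from the driver function.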
Summary of issues discussed, conclusions, and recommendations
1. The
Request Manager architecture was described using 13 function calls, shown in the
architecture figure. No changes were proposed to the architecture's structure,
except for one of the function calls (12); see the explanation below.
2. It was concluded that the action of notifying the metadata catalog of a
file being replicated in some cache will be done by the component that executes
the file transfer. Since this component is the Grid Storage Access Service, the
API (12) was moved in the architecture across the Globus line (see
Figure).
3. The APIs between the Application (making the request) and the
Request Manager were defined before the meeting (see enclosed APIs.doc). These
were discussed. It was concluded that in addition to the "priority" of a
request, it is useful to permit the specification of a "job_type" and a "hint".
For example, a job_type can be "on_line" or "batch", where batch can be treated
as a background job. Examples of hints are that the files for this request are
"clustered" on tapes, or that the processing rate per event is expected to be
"fast" or "slow" (specified as seconds per MB).
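The request parameters discussed above (priority, job_type, hint) could be collected into a single request record, sketched below. The field names and values are assumptions for illustration, not the actual API.

```python
# Hypothetical request record for the application -> Request Manager
# API; field names are invented for this sketch.
from dataclasses import dataclass, field

@dataclass
class ExecuteRequest:
    request_id: str
    priority: int = 0
    job_type: str = "on_line"      # "on_line" or "batch"
    hints: dict = field(default_factory=dict)

req = ExecuteRequest(
    "r42",
    priority=5,
    job_type="batch",              # treated as a background job
    hints={"clustered_on_tape": True,
           "processing_rate_s_per_mb": 2.0},  # "slow" processing
)
```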
4. The APIs between the File Transfer
Manager (previously called "Cache Manager") and the other components were also
defined before the meeting. The File Transfer Manager is responsible for making
reservations for resources, and scheduling file transfers or file pinning. It
was agreed that performing this function a file at a time is adequate for the
time being. The next point describes the other alternative of making aggregate
resource reservations.
5. Next we discussed the API between the Request
Manager (RM) and the Matchmaking Service (MMS). There are two such APIs: one
for estimation of an entire request and one for execution of a request. Both the
estimation and the execution function calls pass the entire set of bundles to
the MMS. For estimation, the MMS returns various estimation measures, such as
the number of files found in various caches, the number of files to be
transferred from one site to another, the number of files to be read from some
tape system, the average rate of file streaming, and the total estimated
time.
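The estimation measures listed above could be returned as a single record, sketched below; the field names are invented for illustration and are not the defined API.

```python
# Hypothetical estimation result for the RM -> MMS "estimate" call;
# field names are illustrative only.
from dataclasses import dataclass

@dataclass
class EstimateResult:
    files_in_cache: int           # already resident in some cache
    files_site_to_site: int       # to be transferred between sites
    files_from_tape: int          # to be staged from a tape system
    avg_stream_rate_mb_s: float   # average rate of file streaming
    total_time_s: float           # total estimated time

est = EstimateResult(files_in_cache=10, files_site_to_site=4,
                     files_from_tape=2, avg_stream_rate_mb_s=5.0,
                     total_time_s=900.0)
```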
6. For execution, the MMS checks with the metadata catalog and the
various resource managers to determine how much of the request (a set of
bundles) can be performed at that time. According to the available resources, it
returns a subset of the files that it recommends for processing. We refer to
this recommendation as a task, to indicate that
a request execution may be
split into multiple tasks. The files in this task are ordered to indicate the
preferred order of execution of file transfers from the MMS's point of view.
There is no obligation on the part of the RM to adhere to the recommendation or
to execute the entire task at once. Each file in the task has a specification of
source and
destination, file size, and other properties (such as network
bandwidth and intermediate cache requirement). The file list should maximize the
number of bundles (at least one bundle is required) but other files that may not
participate in bundles can be added to the task. These additional files will be
needed later, and it is best to get them at the same time as the bundle files.
The functionality of this API was defined during the meeting and it is included
here (see API_rm-mms.doc). The API is asynchronous. First, the RM calls the MMS
(API 8). After the MMS figures out the plan it calls back (API 8').
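The asynchronous call/callback pattern of API 8 and 8' can be sketched as follows. The thread-and-queue mechanics, method names, and file list are all assumptions made for the example; the real API is in API_rm-mms.doc.

```python
# Minimal asynchronous sketch of the RM -> MMS execute call (8) and
# the MMS -> RM callback (8'); all names here are invented.
import threading
import queue

class MMS:
    """Stateless matchmaker: receives the bundle set, plans in the
    background, then calls back with an ordered task."""
    def execute(self, bundles, callback):            # API 8
        def plan():
            # order the files to maximize complete bundles (simplified:
            # just flatten the bundles in order)
            task = [{"file": f, "source": "siteA", "destination": "local"}
                    for b in bundles for f in b]
            callback(task)                           # API 8'
        threading.Thread(target=plan).start()

class RM:
    def __init__(self):
        self.tasks = queue.Queue()
    def on_plan(self, task):                         # 8' handler
        self.tasks.put(task)

rm, mms = RM(), MMS()
mms.execute([["f1", "f2"]], rm.on_plan)              # returns immediately
task = rm.tasks.get(timeout=5)                       # plan arrives later
```

The "did_not_work" call described next fits the same pattern: the RM reports a failed transfer and the MMS eventually calls back with a new recommendation for that file.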
7. The API from the RM to the MMS also includes a function call
"did_not_work", which notifies the MMS that one of the file transfers failed
(most likely because its replica was removed in the meantime). The MMS can then
come up with another recommendation for that file.
8. The API described above permits the MMS
to be stateless, i.e. it does not have to remember what tasks were recommended
or completed. After the RM performs part or all of a task, it can then issue
another call with the remaining bundles. Such a request can even be made while
it is executing the current task. In our discussions, we concluded that it might
be beneficial if the MMS does maintain state, since it may notice that some
bundles have been cached for another request and may want to notify the RM even
without being asked for a plan. We realized that in this case the mms-rm API is
identical to the one we already have, so we decided to permit the MMS to call
the RM multiple times with the same plan_token. However, this is only an option
for a possible future implementation of the MMS.
9. The issue of who
should make the reservations was discussed. It makes sense that the MMS will
make the reservations for the task it recommends. However, we decided that we
want the RM to have the freedom to execute any part of the task in order to
benefit the application. For example, it may want to get at least one bundle
first, so that processing can start as soon as possible, or it may want to pace
the file transfers because the processing is slow and it does not want to flood
its local cache with files that will not be accessed for a long time. We felt
that this level of discretion should be given to site RMs, but in general RMs
should try to follow the recommended plan.
10. There was a long
discussion as to whether it will be beneficial to ask the Globus Storage
Reservation Service for an aggregate resource reservation. For example, would it
be useful to reserve the aggregate space on some cache necessary for reading all
the files needed out of some tape? Of course, this is a good idea. However, if
we take into account that the matchmaking service already checked that the space
was available, it is sufficient to claim that space one chunk at a time. This
choice simplifies the various APIs. We note that the option of aggregate
resource reservation (and perhaps aggregate file transfer) may be necessary for
better flow optimization.
Other recommendations
1. It is recommended that the Globus and SRB
efforts be coordinated to provide uniform interfaces to their
functionality. In this way, the RM and MMS could interact with either system
using the same APIs. CORBA interfaces, defined using IDL, may be preferable, as
CORBA is an industry-wide standard.
2. Similarly, it is recommended that the APIs of the
Globus LDAP metadata catalog and SRB's MCAT be coordinated and
standardized.
3. It is recommended that the metadata catalog keep
information about the protocols supported by the sites where storage resources
reside. Using this information, the MMS can recommend a protocol (PFTP, FTP,
HTTP, etc.) that both source and destination sites support.
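The protocol recommendation above amounts to intersecting the per-site protocol lists kept by the metadata catalog. A minimal sketch, in which the site names and the preference order are invented:

```python
# Illustrative protocol matching based on the per-site protocol
# information the metadata catalog is recommended to keep.
# Site names and the preference order are assumptions for the example.
site_protocols = {
    "lbnl":  {"pftp", "ftp", "http"},
    "fermi": {"ftp", "http"},
}

def pick_protocol(source, destination, preference=("pftp", "ftp", "http")):
    """Return the most preferred protocol both sites support, else None."""
    common = site_protocols[source] & site_protocols[destination]
    for p in preference:
        if p in common:
            return p
    return None
```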
Open issues
1. Authorization was discussed, and there was much
confusion as to how it will be performed. Specifically, the following situation
needs clarification. The RM supports multiple users on its site. It makes the
reservations and file transfer requests on behalf of these users. Now, suppose
that a user logged in and was authenticated by Globus. When the RM requests a
reservation or file transfer on behalf of that user, what does it have to pass
Globus for Globus to authenticate the user and permit the transactions? How does
Globus perform the authorization to the remote site if the user was not
previously authenticated on that site? Does it have to perform authentication
and authorization for each reservation and file transfer request?
2. How
do we handle application failures? One option is for a resource manager to
notice that a file was not touched in a long time (set as system parameter). At
that time it may ask the application "are you alive?" Another option is to
require the application to let the system know it is alive every so often (set
as system parameter), or it is assumed to be dead. Yet another option is to
require the application to do a read every so often, or the file will be closed
by the resource manager. Although this issue was not decided, it seems that the
last solution is the easiest to implement as it uses an existing function (read
a file). This is the solution implemented in the OOFS, the system used for
BABAR.
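The third option above (close a file if the application does not read it within a system-set interval) can be sketched as a last-read timestamp plus a periodic sweep. The class, the timings, and the sweep mechanism are all invented for this illustration; the OOFS implementation may differ.

```python
# Sketch of liveness-by-read: if the application does not read from a
# cached file within a (system-set) timeout, the resource manager
# closes it. Names and timings are illustrative only.
import time

class CachedFile:
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_read = time.monotonic()
        self.open = True

    def read(self):
        # the application's periodic read keeps the file alive
        self.last_read = time.monotonic()

    def sweep(self):
        # resource manager's periodic check
        if self.open and time.monotonic() - self.last_read > self.timeout_s:
            self.open = False      # presume the application is dead
        return self.open

f = CachedFile(timeout_s=0.05)
f.read()
alive = f.sweep()                  # read just happened; still open
time.sleep(0.1)
closed = not f.sweep()             # no read within timeout; closed
```

The appeal of this scheme, as noted above, is that it reuses an existing function (reading a file) rather than adding a separate heartbeat protocol.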
3. Some requests may be limited as to where the files can be
cached (e.g. the application uses Objectivity). How do we handle such
requests? Perhaps it is enough for the catalog to know about the types of sites
and the types of files. To handle this, the requests need to give a hint about
their type. This issue was not resolved.
Action items
1. The API definitions need to be completed. Most of the
APIs are already defined. The rm-mms API needs to be fleshed out.
2. It
was generally agreed that as soon as we agree on a version of the APIs, anybody
who wishes to participate in quick experiments should do so. Some examples are
discussed below.
2.1 Miron suggested setting up a "mock application module" at the U. of
Wisconsin that can consult the Logical Index (using the IDL) of STACS at LBNL.
The index will be modified to return a set of bundles. The mock application
module will then interact with the STACS "file transfer service" to cache files
at the U. of Wisconsin as quickly as possible.
2.2 The same as above could be
done, except that the destination of the file transfers will be ANL.
2.3 The same as above could be done interfacing to SAM at Fermi. An index
service and "file transfer service" at Fermi will be accessed instead of
LBNL.
2.4 A similar scenario can be set up with the application running
at ANL.
2.5 A future goal may be achieved when the STACS "file transfer
service" is made available under Globus. Then, the file transfer request could
be made directly to Globus.
2.6 If the above is successful, access from a
combination of sites can be experimented with.
These simple scenarios can provide us with much insight into the
performance of a large number of file transfers and the behavior of the network.
Some monitoring tools will need to be developed to measure the performance.