Monitoring use case: Host selection for computationally intensive applications

A researcher intends to start a long-running computational task. Suitable computational resources within the grid need to be discovered by the researcher and/or a proxy acting on the researcher's behalf (e.g. GRAPPA). Potential discriminating information may include historical values of various compute node characteristics. By analyzing historical trends, the researcher can better predict the likely running time of the task and identify the "best" node on which to execute it. The appropriate sensors, the archival system, and an interface to the MDS will be developed.

Patrick McGuigan, mcguigan@cse.uta.edu

CPU load, available memory, available free secondary storage, and network interface utilization.
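
As one illustration, these metrics could be gathered on a Linux host by reading the /proc filesystem and the filesystem statistics. The Python sketch below is not the actual sensor implementation; the choice of language, the network interface name "eth0", and the filesystem path "/" are assumptions made purely for the example.

    import os

    def read_cpu_load():
        # 1-minute load average, first field of /proc/loadavg
        with open("/proc/loadavg") as f:
            return float(f.read().split()[0])

    def read_free_memory_kb():
        # "MemFree" entry of /proc/meminfo, reported in kB
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemFree:"):
                    return int(line.split()[1])
        return 0

    def read_free_disk_kb(path="/"):
        # Free secondary storage on the filesystem containing `path`
        st = os.statvfs(path)
        return st.f_bavail * st.f_frsize // 1024

    def read_interface_bytes(iface="eth0"):
        # Cumulative receive/transmit byte counts from /proc/net/dev;
        # utilization is the difference between successive samples.
        with open("/proc/net/dev") as f:
            for line in f:
                if line.strip().startswith(iface + ":"):
                    fields = line.split(":", 1)[1].split()
                    return int(fields[0]), int(fields[8])
        return 0, 0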

The running time of computationally intensive tasks may be predicted by analyzing the data contained in the measurement archives.
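
One simple form such a prediction might take is to scale a task's nominal CPU time by the average load recorded in the archive. The scaling model below is an assumption used only to illustrate the idea, not the method that will actually be deployed.

    def estimate_runtime(nominal_cpu_seconds, load_samples):
        # Scale the task's nominal (unloaded) CPU time by the average
        # 1-minute load observed in the archived samples: each unit of
        # load roughly corresponds to one competing runnable process.
        if not load_samples:
            return nominal_cpu_seconds
        avg_load = sum(load_samples) / len(load_samples)
        return nominal_cpu_seconds * (1.0 + avg_load)

    # A 2-hour task on a host averaging a load of 0.5 over the archived
    # window would be estimated at roughly 3 hours.
    print(estimate_runtime(7200, [0.4, 0.5, 0.6]))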

The collection, storage and retrieval of measurement data should impose a minimal load on the host.

At this time, the measurement frequency is expected to be twice per minute (one sample every 30 seconds).
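
A sensor honoring this rate might be structured as a simple periodic loop; the collect and store callables below are hypothetical placeholders for the sensor readings and the archive write, respectively.

    import time

    SAMPLE_INTERVAL_SECONDS = 30   # twice per minute

    def run_sensor(collect, store):
        # `collect` gathers one measurement record; `store` writes it
        # into the measurement archive.
        while True:
            store(collect())
            time.sleep(SAMPLE_INTERVAL_SECONDS)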

The frequency at which the data will be accessed has an upper limit based on the frequency of ATLAS job submissions for computationally intensive tasks.

For this scenario, timeliness is very important.

Scaling becomes an issue if many potential hosts are asked for archival information. It is possible that the list of potential hosts can be reduced based on other factors (e.g. OS, architecture, installed software base).
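
A pre-filter of this kind might look like the sketch below; the dictionary representation of a host record, the specific keys, and the default values are assumptions, since the real attribute names would come from MDS.

    def prefilter_hosts(hosts, os_name="Linux", arch="i686",
                        required_software=("VDT 1.0",)):
        # Discard hosts that cannot run the job at all, so that only the
        # remaining candidates are queried for archival measurements.
        return [h for h in hosts
                if h["os"] == os_name
                and h["arch"] == arch
                and all(s in h["software"] for s in required_software)]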

Users of the system will need to have valid grid credentials for accessing the archives via MDS.

The largest concern is failure of the RDBMS and its host. If information providers cannot access the database, then requests for data from MDS clients will fail and suitable hosts may not be found for jobs.


At this time, an archive should contain 1 week’s worth of measurements.
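
One way the one-week retention window might be enforced is a periodic pruning job. In the sketch below, the table name, schema, and the use of sqlite3 in place of the actual RDBMS are all assumptions for illustration.

    import sqlite3
    import time

    ONE_WEEK_SECONDS = 7 * 24 * 60 * 60

    def prune_archive(db_path="archive.db"):
        # Delete measurements older than one week; intended to run
        # periodically (e.g. from cron) so the archive holds roughly
        # one week of samples per host.
        cutoff = time.time() - ONE_WEEK_SECONDS
        conn = sqlite3.connect(db_path)
        with conn:
            conn.execute("DELETE FROM measurements WHERE timestamp < ?",
                         (cutoff,))
        conn.close()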

VDT 1.0 on Linux