Proposed 2000 and 2001 Livermore Computing Services to ASCI Alliance Sites
Introduction
The monthly tri-laboratory/Alliance telecons indicate that Alliance sites are interested in an overview of services proposed by Livermore Computing (LC) to the Alliances in the coming year and a half. This short document provides a high-level view of the proposed service structure, as well as some links to more detailed information. It also distills what has been learned concerning Alliance expectations and discusses issues arising therefrom. This document is intended as a starting point for more detailed discussions with Alliance sites in the near future.

Evolution of LC Computing Environment

Computing Infrastructure
Currently, LC offers a fair share of 41% of the 0.9-TeraOP ASCI Blue-Pacific computer to the Alliance sites. While discussions with our vendor partner, IBM, are far from finalized, it is now likely that we will retire Blue-Pacific in Spring 2001, after the 12-TeraOP system stabilizes. Depending on the outcome of the negotiations, we will then be able to replace Blue-Pacific with a significant fraction of the current classified 3.9-TeraOP system. Such action will significantly increase Alliance access to both capacity and capability. The magnitude of the increase depends on the outcome of discussions with IBM. Details will be reported as soon as we are certain of what can be managed, but the vector is up.

Long-Term Storage Infrastructure
LLNL manages a long-term storage environment in its Open Computing Facility (OCF), where the unclassified IBM Blue system resides. Since network bandwidth between LLNL and Alliance sites will continue to lag demand, especially in the case of large data sets, we expect there will be continued requirements for access to OCF storage environments. Current FY00 plans for this environment are shown in Figures 1 and 2 and Tables 1 and 2.
Table 1. FY00 OCF end-node aggregate transfer rate specifications.
Table 2. FY00 OCF/SCF capacity specifications.
This capability will be shared by LLNL institutional users and stockpile stewardship program users, including Alliance customers. Assuming that funding is available in FY01, this environment would be further enhanced. Obviously, these resources are neither infinite nor infinitely expandable, so we request that users show discretion in the use of these assets. In any event, remote customers, in planning their modus operandi, are invited to count on the ability to store some large data sets and restart files remotely at LLNL.

Recommended Use of Storage Environments
Tables 3-5 give guidance about how the storage environments should be used.

Table 3. Recommended use of storage systems.
Table 4. Recommended method to transfer files off IBM to HPSS.
Table 5. Recommended method to transfer files to and from LLNL.
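To make the recommended transfer path concrete, the following sketch shows a file moving from an Alliance site to LLNL storage. All host names, directory paths, and file names here are hypothetical placeholders, and `scp` stands in for the generic "FTP using SSH" mechanism; consult the LC Hotline for the actual host names and staging areas.

```shell
# Hop 1 (from the Alliance site): push the file to an LLNL host over SSH.
# "blue.llnl.gov" and the staging path are placeholders, not real endpoints.
scp restart0042.dat alliance_user@blue.llnl.gov:/nfs/tmp0/alliance_user/

# Hop 2 (from the LLNL host): forward the staged file to HPSS storage,
# per the method recommended in Table 4. "storage" is a placeholder name.
ssh alliance_user@blue.llnl.gov
ftp storage
# ftp> put /nfs/tmp0/alliance_user/restart0042.dat
# ftp> quit
```

The reverse direction works the same way: pull the file from storage onto the LLNL host first, then transfer it back to the Alliance site over SSH.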
The only transfer mechanism between the Alliances and LLNL is currently FTP, using SSH (or a subsequent upgrade) for security. Alliances will FTP their files to an LLNL host before sending them to storage, and vice versa; in other words, a hop is required. GSSFTP requires cross-cell trust relationships between LLNL and the remote Alliance site, which may not be permitted within the time frame of this document.

Remote Networking
LLNL has an OC-3 (155-Mb/s) ATM service provided by ESnet for Internet access. This OC-3 service is also shared with several DOE labs and facilities via a mesh of virtual private networks (VPNs). By late CY00 we expect that the ASCI WAN will be in production (it went out for an RFQ in March 2000), which should greatly reduce this VPN traffic on the ESnet service. Additionally, LLNL is considering an upgrade of the ESnet service to OC-12 (622 Mb/s). This will only be pursued if we are able to:
Alliances interested in pursuing this discussion further should contact Dave Wiltzius at wiltzius1@llnl.gov. LC is constrained by information security measures imposed by DOE, as discussed in more detail later in this document. This presents significant challenges for providing high-speed access to our computing resources.

Customer Services
The LC Customer Services and Support Group provides technical training, online documentation, and a call center for all local and remote users of the LC production machines. The group consists of 24 people, including documentarians, training instructors, computer scientists, and support personnel.
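As a preview of the access model described in the Resource Management section below, a session might proceed as in the following hypothetical transcript. The host name, user name, and executable paths are placeholders; the `globusrun` resource-manager contacts (`jobmanager-fork`, `jobmanager-dpcs`) are the interfaces named in that section.

```shell
# Step 1: secure interactive login to an ASCI resource on which the
# Globus client is installed (a CRYPTOCard one-time password may also
# be required). The host name is a placeholder.
ssh alliance_user@blue.llnl.gov

# Step 2: interactive access through the fork jobmanager.
globusrun -o -r blue.llnl.gov/jobmanager-fork '&(executable=/bin/hostname)'

# Step 3: batch access through the DPCS jobmanager; the executable path
# and node count are illustrative only.
globusrun -o -r blue.llnl.gov/jobmanager-dpcs \
    '&(executable=/g/g0/alliance_user/mycode)(count=64)'
```

Because the Globus client runs on the ASCI host rather than at the Alliance site, the user's local workstation serves only as a display, much like an X terminal.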
Resource Management
Many of the capabilities requested by the Alliances are provided by the Globus toolkit. Globus has been selected by the ASCI Distributed Resource Management (DRM) team as the mechanism for providing distributed computational services within the ASCI community. Unfortunately, the realities of the existing ASCI laboratory security infrastructure preclude deploying ASCI resources as an integral part of the "grid." However, we hope that Alliance sites will see immediate benefits from being able to use Globus capabilities within the ASCI environment. For our initial DRM offering, we envision that an Alliance customer would access ASCI resources by using SSH (and, shortly, a CRYPTOCard) to perform a secure interactive login to an ASCI resource on which the Globus client has been installed. The user would then use his or her local workstation much like an X terminal while operating within the tri-lab ASCI environment. Within the tri-lab ASCI community, Globus capabilities would be provided incrementally as they are installed and tested. The first planned Globus capabilities are interactive access to ASCI resources (jobmanager-fork) and batch access (jobmanager-dpcs, jobmanager-lsf, etc.).

Visualization
Immediate analysis may occur on Blue or, on an exception basis, on our large unclassified SGI server.* If analysis of the data will occur over a long period of time, we recommend that the data be sent back to the PI's site. The infrastructure for data analysis at LC was not designed to support multiple remote users. If a data set is over 0.5 TB and long-term analysis is needed, you should contact LC to see how we can support your special needs. Transferring such a large data set could cause significant delays for all other traffic to and from LLNL and should be scheduled. We will consider supplying long-term access to our SGI server to support data analysis on a case-by-case basis.

Alliance Expectations
LC recently contacted Alliance sites to determine their future computing requirements. Beyond the implicit requirements of interactive access, batch systems, and file transport capabilities, we inferred the following additional requirements:
Issues
Specifically, the current ASCI DRM product plan is based on DCE/Kerberos authentication, while the standard Globus product will be based on PKI for at least the next 2 to 3 years. This makes the requirements specified above difficult to attain for at least two years. More generally, there are issues regarding some of the broader goals outlined above. For instance, the request for access "without regard to geographical or agency boundaries" potentially clashes at some level with the mandate that LLNL demonstrate to DOE and Congress diligence in the protection of sensitive information. Some of this sensitive but unclassified information will continue to co-inhabit networks that may also be accessed by Alliance customers' systems. At times the details of proposed security solutions are not locally determined, and LC may find itself imposing security measures that conflict with security capabilities inherent in applications sponsored or desired by the Alliances. However, we do intend to provide Globus capabilities within the tri-lab ASCI environment and to extend those capabilities to Alliance sites as possible. The goal, then, is to work toward making it possible for people to complete large simulations and access the resulting data sets, even if such processes are not as easy as they might otherwise be in the absence of comprehensive security protections.

Summary
We have provided a very high-level outline of the services that we believe are attainable. Near-term services are particularly constrained by DOE and LLNL information security mandates and policies and by limited resources at LC. We are eager to work with the Globus development community to help resolve these issues related to institutional security policies and practices. Nonetheless, this paper provides a starting point for discussion, and through such discussions it is likely that we can discover refinements to this proposal that can be put to use to the advantage of the Alliances.
To summarize how Alliance users will access LC’s computing resources, at least in the near term (CY00-01):
For more information, contact the Livermore Computing Hotline at (925) 422-4532, or e-mail the consultants at lc-hotline@llnl.gov.

UCRL-MI-138472