
Petabyte Access and Storage Solutions: SP System Utilization

David M. Malon and Edward N. May
Argonne National Laboratory

The purpose of the Petabyte Access and Storage Solutions (PASS) project is to develop scalable solutions to the problems of data retrieval and analysis for scientific experiments that will generate thousands of terabytes of data per year. The work is motivated in part by the needs of the high energy physics community and experiments such as those at the Large Hadron Collider now under construction at the European Laboratory for Particle Physics (CERN). A critical component of the PASS project is the development of tools for parallel query processing and for transparent parallel access to high-performance multilevel mass storage. Without such tools, vast amounts of experimental data will languish unanalyzed. The Argonne SP system has been integral to the PASS tool development work.

The SP system was used in early phases of the PASS project to address questions of how readily high energy physics queries might be parallelized. Among the problems addressed on the SP system in the second half of 1994 were provision of parallel data servers and transparent access to local and remote data on multilevel mass storage, particularly via Unitree and direct RAID utilization.

Current SP efforts are directed primarily toward efficient data retrieval and delivery to parallel physics queries. Parallelization takes place on two levels: physics queries run in parallel in what is essentially a single-program-multiple-data (SPMD) mode, and parallel data servers deliver data to concurrent query processes. Communication takes place not only across these regimes but also within them. For example, parallel query processes may share a work queue to ensure that data accessible to multiple processes are analyzed exactly once in a load-balanced way.
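
The shared work queue can be illustrated with a minimal sketch. It is not the PASS implementation: threads stand in for SP query processes, the chunk identifiers and the function name run_queries are hypothetical, and the physics analysis is reduced to a placeholder. It shows only the idea that each process pulls the next unit of work when it becomes free, so every unit is analyzed exactly once and faster processes naturally take on more units.

```python
import queue
import threading
from collections import Counter


def run_queries(chunk_ids, num_workers=4):
    """Analyze every chunk exactly once using a shared work queue.

    chunk_ids and num_workers are illustrative names, not PASS terms.
    Returns (times each chunk was analyzed, chunks handled per worker).
    """
    work = queue.Queue()
    for cid in chunk_ids:
        work.put(cid)

    processed = Counter()   # chunk id -> number of times it was analyzed
    per_worker = Counter()  # worker rank -> number of chunks it handled
    lock = threading.Lock()

    def worker(rank):
        while True:
            try:
                # Each get removes the chunk from the queue, so no other
                # worker can receive the same chunk.
                cid = work.get_nowait()
            except queue.Empty:
                return  # no work left
            # Placeholder for the real physics analysis of this chunk.
            with lock:
                processed[cid] += 1
                per_worker[rank] += 1

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, per_worker


if __name__ == "__main__":
    processed, per_worker = run_queries(range(100), num_workers=4)
    print("chunks analyzed:", len(processed))
    print("chunks per worker:", dict(per_worker))
```

Because work is claimed dynamically rather than assigned in advance, the split of chunks among workers varies from run to run; only the exactly-once property is guaranteed.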

Queries now run with transparent access to local and remote data on disk, on raw RAID, in Unitree file systems, and on tape (8mm and DD2), and may take advantage of multiple remote data servers. Test queries run on the SP I/O nodes against a 6-gigabyte database have processed input data from raw RAID at a median rate of 30 megabytes per second, and from Unitree at a rate of 24 megabytes per second. Simultaneous queries using disjoint raw RAID databases have shown no degradation; a pair of simultaneous queries competing for the same data achieves a median processing rate of 21 megabytes per second.
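
To put these rates in perspective, the following sketch converts them into approximate times for one full pass over the 6-gigabyte test database. It rests on assumptions not stated above: that a gigabyte is 1000 megabytes, that a query reads the whole database once, and that the quoted rate is sustained for the entire scan. The function name scan_seconds is illustrative.

```python
def scan_seconds(database_gb, rate_mb_per_s, mb_per_gb=1000):
    """Seconds needed to read database_gb gigabytes at rate_mb_per_s.

    Assumes decimal units (1 GB = 1000 MB) and a constant rate.
    """
    return database_gb * mb_per_gb / rate_mb_per_s


# Rates quoted in the text, in megabytes per second.
rates = {
    "raw RAID": 30,
    "Unitree": 24,
    "raw RAID, two competing queries": 21,
}

if __name__ == "__main__":
    for source, rate in rates.items():
        print(f"{source}: about {scan_seconds(6, rate):.0f} s")
```

Under these assumptions a full scan takes a few minutes in every case, roughly 200 seconds from raw RAID and 250 seconds from Unitree.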

Near-term goals of this project include testing scalability with 200 gigabytes of D0 experimental data from Fermilab, unification of the parallel query processing with the parallel data server capabilities, integration with client/server features developed for a Fermilab implementation, development of a query broker capability compliant with the Common Object Request Broker Architecture (CORBA), redefinition of interfaces using Interface Definition Language (IDL) specifications, and testing with commercial CORBA-compliant object request brokers such as IBM's Distributed System Object Model (DSOM).

Further Information

Further details about the PASS project will be made available on the Web in the near future. Currently available on the Web is information about PTool, the lightweight persistent object manager developed at the University of Illinois at Chicago and used in many of the PASS project's SP experiments.

References

D. Malon, D. Lifka, E. May, R. Grossman, X. Qin, and W. Xu, Parallel query processing for event store data. Proceedings of Computing in High Energy Physics '94, San Francisco, April 1994.

E. May, D. Lifka, D. Malon, R. Grossman, X. Qin, D. Valsamis, and W. Xu, A multilevel object store and its application to HEP data analysis. Proceedings of Computing in High Energy Physics '94, San Francisco, April 1994.

Acknowledgments

The PASS project is funded jointly by the U.S. Department of Energy's Office of Scientific Computing as part of the High Performance Computing and Communications Initiative, and by the Office of Nuclear and High Energy Physics.

The work of the PASS project has been undertaken in collaboration with groups at Argonne National Laboratory, the Lawrence Berkeley Laboratory, the Fermi National Accelerator Laboratory, and the University of Illinois at Chicago.


Edward N. May
High Energy Physics Division
Argonne, IL 60439 USA
Phone: (708) 252-6222
FAX: (708) 252-5076
may@anl.gov
HEPnet Decnet: anlhep::enm
Bitnet: enm@anlhep

Last updated on January 28, 2000