Resources
from draft document Reliability of Grid Computing Systems
This list of references is also found in the draft OGF informational document titled Reliability of Grid Computing Systems.
[Abaw2004] Abawajy, J. H., “Fault-Tolerant Scheduling Policy for Grid Computing Systems,” 18th International Parallel and Distributed Processing Symposium, April 2004.
[Acce2007] AccessGrid Home Page. http://www.accessgrid.org/, 2007.
[Alti2005] Altintas,
[Anan2003] Anand, S., et al., “Flow-based Multistage Co-allocation Service,” The 2003 International Conference on Communications in Computing, Las Vegas, Nevada, USA, June 2003.
[Andr2002] Andrzejak, A., Graupner, S., Kotov, V., and Trinks, H., “Algorithms for Self-Organization and Adaptive Service Placement in Dynamic Distributed Systems,” Hewlett-Packard Corporation, HPL-2002-259, 2002.
[Alvi2001] Alvisi, L., et al., "Wrapping Server-Side TCP to Mask Connection Failures," in INFOCOM 2001, 22-26 April 2001, vol. 1, pp. 329-337.
[Arno1999]
[Aviz2004] Avizienis, A., Laprie, J., Randell, B., and Landwehr, C., “Basic Concepts and Taxonomy of Dependable and Secure Computing,” IEEE Transactions on Dependable and Secure Computing, Volume 1, Number 1, January-March 2004.
[Bane2002] Banerjee, S., Bhattacharjee, B., and Kommareddy, C., "Scalable Application Layer Multicast," ACM SIGCOMM, 2002.
[Barc2005] Barcellos, M., “Evaluating High-Throughput Reliable Multicast for Grid Applications in Production Networks,” 2005 IEEE International Symposium on .
[Bart2003] Bartolini, N., Presti, F. L., and Petrioli, C., "Optimal Dynamic Replica Placement in Content Delivery Networks," The 11th IEEE International Conference on Networks, ICON 2003, 2003, pp. 125-130.
[Batc2004] Batchu, R., Dandass, Y., Skjellum, A., and Beddhu, M., “MPI/FT: A Model-Based Approach to Low-Overhead Fault Tolerant Message-Passing Middleware,” Cluster Computing, pp. 303–315, Oct. 2004.
[Baus2003] Bausch, W., Pautasso, C., and Alonso, G., “Programming for Dependability in a Service-based Grid,” Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID’03), 2003.
[Bell2002]
[Bell2003]
[Bezz2006] Bezzine, S., et al., “A Fault Tolerant and Multi-Paradigm Grid Architecture for Time Constrained Problems: Application to Option Pricing in Finance,” Second IEEE International Conference on e-Science and Grid Computing, 2006, p. 49, December 2006.
[Bosc2002] Bosilca, G., et al., “MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes,” Proceedings of IEEE SuperComputing, November 2002.
[Bout2005] Bouteiller, A., et al., “MPICH-V: a Multiprotocol Automatic Fault Tolerant MPI,” International Journal of High Performance Computing and Applications, Volume 20, Issue 3, pp. 319-330, 2006.
[Bunt2007] Buntinas, D., et al., “Blocking vs. Non-Blocking Coordinated Checkpointing for Large-Scale Fault Tolerant MPI,” accepted for publication in Future Generation Computer Systems, Elsevier Press, 2007.
[Chen2002a] Chen, M., Kiciman, E., Fratkin, E., Fox, A., and Brewer, E. “Pinpoint: Problem Determination in Large, Dynamic Internet Services”, Proceedings of 2002 International Conference on Dependable Systems and Networks (DSN), IPDS track, Washington, DC, June 23-26, 2002.
[Cher1999] Chervenak, A., et al., “The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Data Sets,” Journal of Network and Computer Applications, 2001(23): pp. 187-200.
[Cher2002] Chervenak, A., et al., “Giggle: A Framework for Constructing Scalable Replica Location Services,” SC2002 Conference,
[Cher2004] Chervenak, A. L., et al., “Performance and Scalability of a Replica Location Service,” Thirteenth IEEE Int'l Symposium High Performance Distributed Computing (HPDC-13),
[Cher2005] Chervenak, A., Schuler, R., Kesselman, C., Koranda, S., and Moe, B., “Wide area data replication for scientific collaborations,” Proceedings of the 6th International Workshop on Grid Computing, November 2005.
[Chiu1998] Chiu, D.,
[Choi1999] Choi, J., Choi, M., and Lee, S., “An Alarm Correlation and Fault Identification Scheme Based on OSI Managed Object Classes,” IEEE International Conference on Communications,
[Chun2004] Chun, G., et al., “Benchmark Probes for Grid Assessment,” The 18th International Parallel and Distributed Processing Symposium (IPDPS'04), p. 276a, 2004.
[Clap2004] Clapp, G., Gannet, J., and Skoog, R., “Requirements and Design of a Dynamic Grid Networking Layer,” 2004 IEEE International Symposium on Cluster Computing and the Grid, 2004.
[Coll2007] Colling, D., et al., “On Quality of Service Support for Grid Computing,” The 2nd International Workshop on Distributed Cooperative Laboratories and Instrumenting the GRID (INGRID 2007), April 2007.
[Cond2007] “Adding high availability to Condor Central manager,” See http://dsl.cs.technion.ac.il/projects/gozal/project_pp./ha/ha.html.
[Cox2002] Cox, W., et al., Web Services Transaction (WS-Transaction), 2002. See http://dev2dev.bea.com/pub/a/2004/01/ws-transaction.html.
[Cybo2006] Cybok, D., “A Grid workflow infrastructure,” Concurrency and Computation: Practice And Experience, Volume 18, Issue 10, pp. 1243–1254, 2006.
[Czaj1999] Czajkowski, K., Foster,
[Das2002] Das, A., Gupta, I., and Motivala, A., “SWIM: Scalable weakly-consistent infection-style process group membership protocol,” Proceedings of the International Conference on Dependable Systems and Networks (DSN’02), pp. 303–312, June 2002.
[Deel2003] Deelman, E., et al., “Mapping Abstract Complex Workflows onto Grid Environments,” Journal of Grid Computing, Volume 1, pp. 25-39, 2003.
[Demm1989] Demmy, W. and Petrini, A., “Statistical Process Control in Software Quality Assurance,” Proceedings of the 1989 National Aerospace and Electronics Conference,
[Deri2004] Deris, M., Abawajy, J., and Suzuri, H., “An efficient replicated data access approach for large-scale distributed systems,” IEEE International Symposium on Cluster Computing and the Grid, April 2004.
[Dull2001] Dullman, D., et al., “Models for Replica Synchronisation and Consistency in a Data Grid,” Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing, pp. 67-75, 2001.
[Duar2006]
[Dura2005] Durand, J., and Karmarkar, A., “Message Reliability Protocol Standards for Web Services: An Analysis,” The 3rd IEEE European Conference on Web Services (IEEE ECOWS 2005), November 2005,
[Elno2002] Elnozahy, E., Johnson, D., and Wang, Y., “A survey of rollback recovery protocols in message-passing systems,” ACM Computing Surveys, Volume 34, Issue 3, pp. 375–408, 2002.
[Emme2005] Emmerich, W., et al., “Grid Service Orchestration Using the Business Process Execution Language (BPEL),” Journal of Grid Computing, Volume 3, pp. 283–304, 2006.
[Fahr2005] Fahringer, T., et al., “ASKALON: a tool set for cluster and Grid computing,” Concurrency and Computation: Practice and Experience, Volume 17, pp. 143-169, 2005.
[Fang2007] Fang, C., et al., “Fault tolerant Web Services,” Journal of Systems Architecture, Volume 53, Issue 1, January 2007, pp. 21-38.
[Fost2005] Foster et al., A Globus Primer Describing Globus Toolkit Version 4, Draft
[Frey2001] Frey, J., Tannenbaum, T., Livny, M., Foster, I., and Tuecke, S., “Condor-G: A Computation Management Agent for Multi-Institutional Grids,” Proceedings of the Tenth IEEE International Symposium on High Performance Distributed Computing,
[Fox2005] Fox, G., Pallickara, S., Pierce, M., and Gadgil, H., “Building Messaging Substrates for Web and Grid Applications,” Philosophical Transactions of the Royal Society: Mathematical, Physical and Engineering Sciences (Scientific Applications of Grid Computing Special Issue), Volume 363, Issue 1833, pp. 1757–1773, 2005.
[Fox2006] Fox, G., “Collaboration and Community Grids,” International Symposium on Collaborative Technologies and Systems, pp. 419-428, May 2006.
[Gabr2003] Gabriel, E., Fagg, et al., "A Fault-Tolerant Communication Library for Grid Environments," Seventeenth Annual ACM International Conference on Supercomputing (ICS'03), International Workshop on Grid Computing and e-Science,
[Glob2005] Reliable File Transfer (RFT) Service, Globus Toolkit, version 4.0, http://www.globus.org/toolkit/docs/4.0/data/rft/.
[Grah2002] Graham, R., et al., “A Network-Failure-Tolerant Message-Passing System For Terascale Clusters,” Proceedings of the 16th international conference on
[Gray2004] Gray, J. and Lamport, L., “Consensus on Transaction Commit,” Microsoft Research Corporation, MSR-TR-2003-96.
[Grus1998] Gruschke, B., “A New Approach for Event Correlation based on Dependency Graphs,” Fifth Workshop of the OpenView University Association: OVUA’98,
[Gupt2001] Gupta, I., Chandra, T. D., and Goldszmidt, G. S., “On scalable and efficient distributed failure detectors,” Proceedings of 20th Annual ACM Symposium on Principles of Distributed Computing, pp. 170–179, 2001.
[Hill2005] Hillenbrand, M., Götze, J., and Müller, P., “Creating Dependable Web Services Using User-transparent Replica,” Proceedings of the International Conference on Next Generation Web Services Practices (NWeSP’05), 2005.
[Hilt2001] Hiltunen, M.A.; Schlichting, R.D.;
[Hori2005] Horita, Y., Taura, K., and Chikayama, T., “A Scalable and Efficient Self-Organizing Failure Detector for Grid Applications,” Grid Computing Workshop, 2005.
[Hosc2000] Hoschek, W., et al., “Data management in an international data grid project,” Proceedings of GRID Workshop, pp. 77–90, 2000.
[Hued2006] Huedo, E., Montero, R. S., and Llorente,
[Hwan2003] Hwang, S., and Kesselman, C., “GridWorkflow: A Flexible Failure Handling Framework for the Grid,” in Proceedings of the 12th IEEE Intl. Symposium on HPDC, 2003.
[Iamn2000] Iamnitchi, A. and Foster, I. "A problem specific fault tolerance mechanism for asynchronous, distributed systems," in Proceedings of 2000 International Conference on Parallel Processing (29th ICPP'00), Toronto, Canada, August 2000, IEEE.
[Ietf1985] File Transfer Protocol, Internet Engineering Task Force (IETF), http://www.ietf.org/, RFC 959, October 1985.
[Ietf1995] A Border Gateway Protocol 4 (BGP-4), Internet Engineering Task Force (IETF), http://www.ietf.org/, RFC 1771, March 1995.
[Ietf1999] Multicast Dissemination Protocol version 2 (MDPv2) – Internet Draft, Internet Engineering Task Force, October 1999.
[Ietf2001] Multiprotocol Label Switching Architecture, Internet Engineering Task Force (IETF), http://www.ietf.org/, RFC 3031, January 2001.
[Ietf2002a] Version 2 of the Protocol Operations for the Simple Network Management Protocol (SNMP), Internet Engineering Task Force (IETF), http://www.ietf.org/, RFC 3416, December 2002.
[Ietf2002b] Overview and Principles of Internet Traffic Engineering, Internet Engineering Task Force (IETF), http://www.ietf.org/, RFC 3272, May 2002.
[Ietf2002c] Applicability Statement for Traffic Engineering with MPLS, Internet Engineering Task Force (IETF), http://www.ietf.org/, RFC 3346, August 2002.
[Ietf2007a] Open Shortest Path First IGP (Interior Gateway Protocol), Internet Engineering Task Force (IETF), http://www.ietf.org/, 2007.
[Ietf2007b] NACK-Oriented Reliable Multicast (NORM) Protocol, Internet Engineering Task Force (IETF), http://www.ietf.org/, March 2007.
[Jain2004] Jain, A. and Shyamasundar, R., “Failure Detection and Membership Management in Grid Environments,” in Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing (GRID’04), 2004.
[Jits2007] Jitsumoto, H., Endo, T., Matsuoka, S., "ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs," IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007) pp.1-8, March 2007.
[Jo2005] Jo, J., Seok, W., Kwak, J. and Byeon, O., “Design and Implementation of QoS Measurement and Network Diagnosing Framework for IP Multicast in Advanced Collaborative Environment,” Proceedings of the Fourth Annual ACIS International Conference on Computer and Information Science (ICIS’05), 2005.
[Juha2003] Juhasz, Z., Andics, A., and Szabolcs P., “Towards a Robust and Fault-Tolerant Discovery Architecture for Global Computing Grids,” Scalable Computing: Practice and Experience, Volume 6, Number 2, pp. 22-33, 2003.
[Keya2002] Keyani, P., Larson, B., and Senhil, M. “Peer Pressure: Distributed Recovery from Attacks in Peer-to-Peer Systems”, in Web Engineering and Peer-to-Peer Computing, Gregori, E. et al. (eds.), NETWORKING 2002 Workshops, Pisa, Italy, May 19-24, 2002, Revised Papers, Lecture Notes in Computer Science 2376 Springer 2002, ISBN 3-540-44177-8, pp. 306-320.
[Khar2004] Kharchenko, V., Popov, P., and Romanovsky, A., “On Dependability of Composite Web Services with Components Upgraded Online,” Proceedings of the International Conference on Dependable Systems and Networks (DSN 2004), Florence, Italy, pp. 287–291, June 2004.
[Koeh2003] Koehler, J., and Srivastava, B., “Web Service Composition - Current Solutions and Open Problems.” ICAPS 2003 Workshop on Planning for Web Services, pp. 28 – 35, 2003.
[Kola2005] Kola, G., Kosar, T., and Livny, M., "Faults in Large Distributed Systems and What We Can Do About Them," Proceedings of 11th European Conference on Parallel Processing (Euro-Par 2005), pp. 442-453,
[Kris2002] Krishnan, S., Wagstrom, P., and von Laszewski, G., “GSFL: A workflow framework for Grid services,” http://www-unix.globus.org/cog/papers/gsfl-paper.pdf, January 2004.
[Kuo2005] Kuo, D. and Mckeown, M., “Advance Reservation and Co-Allocation Protocol For Grid Computing,” Proceedings of the First International Conference on e-Science and Grid Computing (e-Science’05), 2005.
[Lac2006] Lac, C. and Ramanathan, S., "A Resilient Telco Grid Middleware," Proceedings of the 11th IEEE Symposium on Computers and Communications (ISCC'06), pp. 306-311, 2006.
[Lame2002] Lamehamedi, H., Szymanski, B., Shentu, Z., and Deelman, E., “Data replication strategies in grid environments,” Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing, 2002, pp. 378-383.
[Lamp2001] Lamport, L., “Paxos Made Simple,” ACM SIGACT News (Distributed Computing Column), Volume 32, Number 4, pp. 18-25, 2001.
[Lan2002] Lan, J., “Cache Consistency Techniques for Peer-to-Peer File Sharing Networks,” Master’s Thesis, Department of Computer Science, University of Massachusetts Amherst, June 2002.
[Lanf2002] Lanfermann, G., Allen, G., Radke, T., and Seidel, E., "Nomadic Migration: Fault Tolerance in a Disruptive Grid Environment," Second IEEE/ACM International Symposium Cluster Computing and the Grid, 2002, pp. 280, May 2002.
[Leau2006] Leai, K., Tan, L., Turner, K. “Orchestrating Grid Services using BPEL and Globus Toolkit 4,” Proceedings of the 7th PGNet Symposium, pp. 31-36, 2006.
[Lean2004] Leangsuksun, C., et al., “A Failure Predictive and Policy-Based High Availability Strategy for Linux High Performance Computing Cluster,” The Fifth LCI International Conference on Linux Clusters: the HPC Revolution 2004,
[Lee2001] Lee, B. and Weissman, J. B., "Dynamic Replica Management in the Service Grid," in IEEE 2nd International Workshop on Grid Computing, November 2001.
[Lee2003] Lee, H., et al., "Grid Fault Tolerance Service for Quality of Service," The 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003.
[Lei2007] Lei, M.; Vrbsky, S.V.; Zijie, Q., “Online Grid Replication Optimizers to Improve System Reliability,” IEEE International Symposium on Parallel and Distributed Processing Symposium, pp. 26-30 March 2007
[Li2006] Li, Q., Xu, M., and Zhang, H., “A Root-fault Detection System of Grid Based on Immunology,” Proceedings of the Fifth International Conference on Grid and Cooperative Computing (GCC 2006), Changsha, China, October 2006, pp. 369-373.
Web Service Robust GridFTP", The 2004 International MultiConference in Computer Science and Computer Engineering,
[Lima2005a] Limaye, K., Leangsuksun, C. B., et al., “Job-Site Level Fault Tolerance for Cluster and Grid environments,” The 2005 IEEE Cluster Computing,
[Lima2005b] Limaye, K., Tikotekar, A., and Leangsuksun, B., “Fault tolerance-enabled HPC resource management with HA-OSCAR framework,” High Availability and Performance Computing Workshop,
[Liu2004] Liu, X., Xia, H., and Chien, A., “Validating and Scaling the MicroGrid: A Scientific Instrument for Grid Dynamics,” Journal of Grid Computing, Volume 2, Number 2, pp. 141-161, 2004.
[Liu2005] Liu, Y., Leangsuksun, C., Song, H., and Scott, S., "Reliability-aware Checkpoint/Restart Scheme: A Performability Trade-off," Proceedings of IEEE International Conference on Cluster Computing, September 2005.
[Look2004a] Looker, N., Munro, M., and Xu, J., “Practical Dependability Analysis of SOAP Based Systems,” Proceedings of the
[Look2004b] Looker, N., Munro, M., and Xu, J., “WS-FIT: A Tool for Dependability Analysis of Web Services,” Proceedings of the 28th Annual International Computer Software and Applications Conference (COMPSAC),
[Look2005] Looker, N., Burd, S., Drummond, M., and Munro, M., "Pedagogic Data as a Basis for Web Service Fault Models," IEEE International Workshop on Service-Oriented System Engineering,
[Look2007] Looker, N., Munro, M., and Xu, J., "Determining the Dependability of Service-Oriented Architectures," Submitted to the International Journal of Simulation and Process Modelling, 2007.
[Loug2002] Loughran, Making Web Services that Work, HP Laboratories, Hewlett-Packard Corporation, HPL-2002-274, 2002.
[Louc1998] Louca, S., Neophytou, N., Lachanas, A., and Evripidou, P., “MPI-FT: A portable fault tolerance scheme for MPI,” Proceedings of the PDPTA ’98 International Conference,
[Lowe2003] Lowekamp, B., et al., “Enabling Network Measurement Portability Through a Hierarchy of Characteristics,” Proceedings of the Fourth International Workshop on Grid Computing (GRID’03), 2003.
[Lui2006] Lui, P. and Wu, J. J., “Optimal Replica Placement Strategy for Hierarchical Data Grid Systems,” Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), pp. 417-420, 2006
[Macl2006] MacLaren, J., Keown, M., and Pickles, S., “Co-Allocation, Fault Tolerance and Grid Computing,” Proceedings of the
[Marc2001] Marchetti, C., Virgillito, A., and Baldoni, R. “Design of an Interoperable FT-CORBA Compliant Infrastructure,” Proceedings of the European Research Seminar on Advances in Distributed Systems (ERSADS), 2001.
[Matt2006] Mattmann, C., et al., “A Classification and Evaluation of Data Movement Technologies for the Delivery of Highly Voluminous Scientific Data Products,” National Aeronautics and Space Administration, Document 20060044153, 2006.
[Milo2000] Milojicic, D., Douglis, F., Paindaveine, Y., Wheeler, R., and Zhou, S., "Process Migration Survey," ACM Computing Surveys, September 2000.
[Mill2006] Mills, K. and Dabrowski, C., “Investigating Global Behavior in Computing Grids,” Self-Organizing Systems, Lecture Notes in Computer Science, Vol. 4124, pp. 120-136, 2006.
[Mill2007] Mills, K. and Dabrowski, C., “Can Economics-based Resource Allocation Prove Effective in a Computation Marketplace?” accepted for publication in the Journal of Grid Computing, 2007.
[Mogi2006] Mogilevsky, D., Koenig, G., and Yurcik, W., “Byzantine Anomaly Testing for Charm++: Providing Fault Tolerance and Survivability for Charm++ Empowered Clusters,” Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid Workshops (CCGRIDW'06), p. 30, May 2006.
[Mpi2003] MPI: A Message-Passing Interface Standard, Message Passing Interface Forum, http://www.mpi-forum.org/, 2003.
[Oasi2004a] Business Transaction Protocol (BTP) Version 1.1, Committee Draft, June 2004.
[Oasi2004b] Web Services Base Faults (WS-BaseFaults), OASIS, 2004.
[Oasi2004c] WS-Reliability 1.1, OASIS, Committee Draft 1.086, August 2006.
[Oasi2006a] Web Services Business Process Execution Language (WSBPEL), OASIS WS-BPEL 2.0 Committee Draft, May 2006.
[Oasi2006b] Web Services Reliable Messaging (WS-ReliableMessaging), Committee Draft 04, wsrm-1.1-spec-cd-04, August 2006
[Oasi2007] Web Services Coordination (WS-Coordination), Version 1.1 OASIS Standard, April 2007.
[Ogf2004a] Networking Issues for Grid Infrastructure, Open Grid Forum Informational Document, GFD-I.037, November 2004.
[Ogf2004b] A Hierarchy of Network Performance Characteristics for Grid Applications and Services, Open Grid Forum, GFD-R-P.023 (Proposed Recommendation), May 2004.
[Ogf2005a] An Architecture for Grid Checkpoint and Recovery (GridCPR) Services and a GridCPR Application Programming Interface, Draft Document, Global Grid Forum, 2005.
[Ogf2005b] GridFTP v2 Protocol Description, GFD-R-P.047, Open Grid Forum, May 2005.
[Ogf2006a] OGSA WSRF Basic Profile 1.0, Open Grid Forum, GFD.72, September 2006.
[Ogf2006b] Configuration Description, Deployment, and Lifecycle Management CDDLM Deployment API, Open Grid Forum, GFD.69, April 2006.
[Ogsa2006c] The Open Grid Services Architecture, Version 1.5, Open Grid Forum, GFD.80, September 2006.
[Ogf2007a] Use-Cases and Requirements for Grid Checkpoint and Recovery, Version 1.0, Open Grid Forum, GFD-I.92, May 2007.
[Ogf2007b] Web Services Agreement Specification (WS-Agreement), Open Grid Forum, GFD.107, May 2007.
[Natr2001] Natrajan, A., Humphrey, M., and Grimshaw, A., "Capacity and Capability Computing in Legion," The 2001 International Conference on Computational Science, May 2001
[Neko2005] Nekovee, M., Barcellos, M., and Daw, M., “Reliable multicast for the Grid: a case study in experimental computer science,” Philosophical Transactions of the Royal Society A, Volume 10, Number 1098, 2005.
An Analysis of Reliable Delivery Specifications for Web Services", International Conference on Information Technology: Coding and Computing, 2005 (ITCC 2005), Volume 1, pp. 360-365, April 2005.
[Pope2007] Popescu, A., Constantinescu, D., Erman, D., and Ilie, D., “A Survey of Reliable Multicast Communication,” Third EuroNGI Conference on Next Generation Internet Networks, pp. 111-118, May 2007.
[Qui2001] Qiu, L., Padmanabhan, V., and Voelker, G. "On the Placement of Web Server Replicas", Twentieth Annual Joint Conference of the IEEE Computer and Communications Societies - INFOCOM 2001, pp. 1587-1596.
[Ra2005] Ra, D., et al., “Scalable
[Rang2001] Ranganathana, K., and Foster,
[Rang2002] Ranganathan, K., Iamnitchi, A., and Foster, I., "Improving Data Availability through Dynamic Model-Driven Replication in Large Peer-to-Peer Communities," in Global and Peer-to-Peer Computing on Large Scale Distributed Systems Workshop,
[Rena2006] Ranaldo, N., Tretola, G., and Zimeo, E., “Hierarchical and Reliable Multicast Communication for Grid Systems,” Current & Future Issues of High-End Computing, Proceedings of the International Conference ParCo, pp. 137-144, 2005.
[Ripe2002] Ripeanu, M., and Foster,
[Rti2002] Research Triangle Institute, The Economic Impacts of Inadequate Infrastructure for Software Testing, May 2002.
[Sant2005]
EDOC
[Schn2006] Schneider, J., Linnert, B., and Burchard, L., "Distributed Workflow Management for Large-Scale Grid Environments," International Symposium on Applications and the Internet (SAINT'06), 2006, pp. 229-235.
[Song2007] Song, C. X., Topkara, U., Woo, J., and Park, S. K., "Assessing Reliability of Grid Software Systems Using Emergent Features," The 2nd Workshop on Reliability and Robustness in Grid Computing Systems, the 19th Open Grid Forum (OGF19),
[Stel1999] Stelling, P., Foster, I., Kesselman, C., Lee, C., and von Laszewski, G., “A Fault Detection Service for Wide Area Distributed Computations,” Cluster Computing, Volume 2, Number 2, 1999, pp. 117-128.
[Stoc2001] Stockinger, H., et al., “File and object replication in data grids,” Tenth IEEE Symposium on High Performance and Distributed Computing, pp. 305–314, 2001.
[Sun2005] N1 Grid Engine User’s Guide, Sun MicroSystems, Inc., May 2005.
[Tai2004] Tai, S., Mikalsen, T., and Rouvellou,
[Taki2005] Takizawa, S. et al., “A Scalable Multi-Replication Framework for Data Grid,” Proceedings of the 2005 Symposium on Applications and the Internet Workshops (SAINT-W’05), 2005.
[Tann2002] Tannenbaum, T., Wright, D., Miller, K., and Livny, M., “Condor - A Distributed Job Scheduler,” in Beowulf Cluster Computing with Linux, The MIT Press, MA,
[Tart2002] Tartanoglu, F., Issarny, V., Romanovsky, A., and Levy, N., “Dependability in the Web Service Architecture,” Proceedings of the ICSE 2002 Workshop on Architecting Dependable Systems (
[Tart2003] Tartanoglu, F., Issarny, V., Romanovsky, A., and Levy, N., “Coordinated Forward Error Recovery for Composite Web Services,” Proceedings of the 22nd International Symposium on Reliable Distributed Systems, SRDS (
[Tauf2005] Taufer, M.,Teller, P.,
[Topk2006] Topkara, U., Song, C. X., Woo, J., and Park, S. K., "Connected in a Small World: Rapid Integration of Biological Resources," Grid Computing Environments Workshop (in conjunction with Supercomputing'06),
[Town2005] Townend, P., Groth, P., Looker, N., and Xu, J., FT-Grid: A Fault-Tolerance System for e-Science, Proceedings of the
[Turn2007] Turner, K. and Tan, K., “Graphical Composition of Grid Services,” Lecture Notes in Computer Science 4401, pp. 1-17, Springer,
[Yemi1996] Yemini, A. and Kliger, S., “High Speed and Robust Event Correlation,” IEEE Communications Magazine, Volume 34, Number 5, pp. 82–90, May 1996.
[Urga2001] Urgaonkar, B., et al., “Maintaining Mutual Consistency for Cached Web Objects,” Proceedings of the 21st International Conference on Distributed Computing Systems (ICDCS-21), Phoenix, Arizona, April 2001.
[Valc2005] Valcarenghi, L. and Piero C., “QoS-Aware Connection Resilience for Network-Aware Grid Computing Fault Tolerance,” Proceedings of 2005 7th International Conference on Transparent Optical Networks,
[Verm2003] Verma, D., et al., “SRIRAM: A scalable resilient autonomic mesh,” IBM Systems Journal, Volume 42, Number 1, pp. 19-28, 2003.
[vonL2004] von Laszewski, G., et al., “GridAnt: A Client-Controllable Grid Workflow System,” Argonne National Laboratory Preprint ANL/MCS-P1098-1003 and Thirty-seventh Hawai’i International Conference on System Science,
[Wald2006] Waldrich, O., Wieder, P., and Ziegler, W., “A Meta-Scheduling Service for Co-allocating Arbitrary Types of Resources,” Lecture Notes in Computer Science 3911, pp. 782-791, 2006.
[Wang2006] Wang, X., Zhuang, Y., Hou, H., "Byzantine Fault Tolerance in MDS of Grid System," International Conference on Machine Learning and Cybernetics, pp.2782-2787, August 2006.
[Wate2004] Waters, G., Crawford, J., and Lim, S., “Optimising Multicast Structures for Grid Computing,” Computer Communications, Volume 27, pp. 1389-1400, September 2004.
[Woo2003] Woo, N., et al., "MPICH-GF: Providing fault tolerance on grid environments," The 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2003), May 2003.
[Wsdl2001] Web Services Description Language (WSDL) 1.1, "http://www.w3.org/TR/2001/NOTE-wsdl-20010315"
[Wsrm2005] Web Services Reliable Messaging Protocol (WS-ReliableMessaging), BEA Systems, IBM, Microsoft Corporation, Inc, and TIBCO Software Inc., 2005.
[W3c2007] SOAP Version 1.2 Part 1: Messaging Framework (Second Edition), World Wide Web Consortium (W3C), W3C recommendation, April 2007.
[Xian2006] Xiang, Y., Li, Z., and Chen, H., “Optimizing Adaptive Checkpointing Schemes for Grid Workflow Systems,” Proceedings of the Fifth International Conference on Grid and Cooperative Computing Workshops (GCCW'06), 2006.
[Xie2004] Xie, M., Dai, Y., and Poh, K., Computing Systems Reliability, Kluwer Academic Publishers:
[Yeom2006] Yeom, H., “Providing Fault-tolerance for Parallel Programs on Grid (FT-MPICH)”, presented at the GGF First Workshop of Reliability and Robustness in Grid Computing Systems, Athens, Greece, February 2006.
[Yosh2005] Yoshimoto, K., Kovatch, P., Andrews, P. "Co-Scheduling with User-Settable Reservations," LNCS, 3834 ed., Workshop on Job Scheduling Strategies for Parallel Processing, Jun. 2005, pp. 146-156.
[Yu2004] Yu, J., and Buyya, R., “A Novel Architecture for Realizing Grid Workflow using Tuple Spaces,” The 5th IEEE/ACM International Workshop on Grid Computing (Grid 2004),
[Yu2005] Yu, J., and Buyya, R., A Taxonomy of Workflow Management Systems for Grid Computing, Technical Report GRIDS-TR-2005-1,
[Zand1999] Zandy, V., Miller, B., and Livny, M. "Process Hijacking," The Eighth International Symposium on High Performance Distributed Computing, pp. 177-184, August 1999.
[Zhan2004] Zhang, X., Zagorodnov, D., Hiltunen, M., Marzullo, K., and Schlichting, R., “Fault–tolerant Grid Services Using Primary–Backup: Feasibility and Performance,” Cluster 2004,
[Zhan2006a] Zhang, X., Junqueira, F., Hiltunen, M., Marzullo, K., and Schlichting, R., “Replicating Nondeterministic Services on Grid Environments,” 15th IEEE International Symposium on High Performance Distributed Computing, June 2006, pp. 105–116.
[Zhan2006b] Zhang, Q., et al., “Dynamic Replica Location Service Supporting Data Grid Systems,” Sixth IEEE International Conference on Computer and Information Technology (CIT'06), p. 61, 2006.