Author: Miron Livny, University of Wisconsin – Madison

Miron Livny received a BSc degree in Physics and Mathematics in 1975 from the Hebrew University and MSc and PhD degrees in Computer Science from the Weizmann Institute of Science in 1978 and 1984, respectively. Since 1983 he has been on the Computer Sciences Department faculty at the University of Wisconsin-Madison, where he is currently a Professor of Computer Sciences and is leading the Condor project. Dr. Livny's research focuses on distributed processing and data management systems and data visualization environments. His recent work includes the Condor high throughput computing system, the DEVise data visualization and exploration environment and the BMRB repository for data from NMR spectroscopy.

Title: Condor – A Project and a System

Abstract:

Since the mid-1980s, the Condor project (www.cs.wisc.edu/Condor) at the University of Wisconsin-Madison has been engaged in the development, implementation, deployment and evaluation of mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Guided by both the technological and sociological challenges of such computing environments, the Condor Team has been building and supporting software tools that enable scientists and engineers to increase their computing throughput. Today, the project consists of more than 35 students, full-time staff and faculty who participate in a wide range of national and international multi-disciplinary efforts. Over the last decade, the Condor system has gained the confidence of users and system administrators in both academia and industry, as was recently demonstrated by the inclusion of Condor in the Red Hat distributions. Deployed at more than 1500 sites and integrated into the software stacks of most grid projects, Condor offers an effective bridge between consumers and providers of computing and data resources. We will present the principles that have been guiding us in the evolution of the Condor project and the design of the Condor system. The challenges we face in sustaining and evolving the project will be addressed, and our plans for short- and long-term enhancements to Condor and related technologies will be outlined.

Title: Submitting locally and running globally – The GLOW and OSG experience

Abstract:

The Grid Laboratory of Wisconsin (GLOW) is an NSF- and UW-funded distributed facility at the University of Wisconsin – Madison campus. It is part of the newly formed Center for High Throughput Computing (CHTC) and consists of more than 2500 processing cores and 150 TB of storage located at six different sites. Since its inception in the winter of 2004, it has been serving a broad set of disciplines ranging from Biotechnology and Computer Sciences to Medical Physics and Economics. Each of the GLOW sites is configured as an autonomous, locally managed Condor pool that can operate independently when disconnected from the other sites. Under normal conditions, the six pools act like a single Condor system that is coordinated via a highly available campus-wide matchmaking service. On-campus and off-campus users interact with GLOW through job managers located on their desktop computers or community gateways.

The Open Science Grid (OSG) is a US national distributed computing facility, co-funded by DOE and NSF, that supports scientific computing via an open collaboration of researchers, software developers and computing, storage and network providers. The OSG Consortium is building and operating the OSG facility, bringing resources and researchers from universities and national laboratories together and cooperating with other national and international infrastructures to give scientists access to shared resources world-wide. The particular characteristics of the OSG are to: provide guaranteed and opportunistic access to shared resources; operate a heterogeneous environment, both in the services available at any site and for any Virtual Organization, and with multiple implementations behind common interfaces; support multiple software releases at any one time; interface to campus and regional grids; and federate with other national and international cyber-infrastructures.

In the talk, we will discuss how the High Throughput Computing (HTC) principles that have been guiding us for more than two decades are implemented in the context of these two facilities. Capabilities to “elevate” local GLOW jobs to the national OSG infrastructure will be discussed. These capabilities follow our long-standing “bottom-up” approach to the construction and operation of large-scale distributed computing infrastructure that maximizes reachable capacity while preserving local access, environment and autonomy.