Berkeley UPC System Internals Documentation version 2.8.0

High-level System Design

The Berkeley UPC compiler project aims to develop a fully portable, high-performance UPC compiler that runs on a wide variety of shared-memory and distributed-memory platforms using different network interconnects, including large-scale multiprocessors, PC clusters, and clusters of shared-memory multiprocessors. One of the main goals of the project is to experiment with parallel compiler optimization techniques without being tied to a particular system architecture or network. Portability is achieved by translating UPC programs to an intermediate representation in C, which can then be compiled using the system's ANSI C compiler and linked to a standardized runtime system and a communication system tailored to the specific platform.

High-level system diagram

The figure shows the high-level system diagram for a UPC application compiled using the Berkeley UPC compiler. The generated C code runs on top of the UPC runtime system, which provides platform independence and implements language-specific features such as shared memory allocation and shared pointer manipulation. The runtime system implements remote operations by calling the GASNet communication interface, which provides hardware-independent lightweight networking primitives. 

The Berkeley UPC-to-C Translator

The Berkeley UPC-to-C translator performs source-to-source translation from UPC code to ANSI-compliant C code, with shared memory operations transformed into calls to the Berkeley UPC runtime.  The translator is based on the Open64 compiler suite; please refer to the Open64 documentation page for information about the design and specific features of that compiler.  Numerous modifications have been made to the translator to support UPC features, and the translator's C front end has also been extended to support both 32- and 64-bit platforms.  A detailed description of our modifications can be found here.

The figure gives a high-level overview of the translation process.  Upon receiving a preprocessed UPC program, the translator invokes the UPC front end, whose function is to parse and type-check the program and to generate the Whirl intermediate format for it.  UPC-specific information such as shared types and the block sizes of distributed arrays is kept in the symbol table, so that the translator can take advantage of it during optimization.  The back end operates on the Whirl intermediate language and lowers shared memory operations into equivalent runtime library calls.  The Berkeley UPC runtime provides a wide variety of shared memory accesses, from non-blocking to bulk, so that the translator can experiment with different strategies for communication optimizations such as message pipelining and prefetching.  Finally, the whirl2c component translates Whirl into ANSI-compliant C code, with shared variables declared as the opaque pointer-to-shared construct.
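
As a rough illustration of the lowering step, the sketch below shows what C code for the UPC statement a[i] = a[j] + 1 on a cyclically distributed shared int array might look like once shared accesses have become runtime calls. Every name here (mock_shared_ptr_t, mock_elem, mock_rt_get, mock_rt_put) is a hypothetical stand-in invented for this example, not part of the Berkeley UPC runtime interface, and the "remote" memory is simulated with a local array so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; NOT the Berkeley UPC runtime API. */
#define MOCK_THREADS 4
#define MOCK_ELEMS_PER_THREAD 8

/* Opaque pointer-to-shared: generated code never dereferences it directly. */
typedef struct {
    int    thread;  /* thread with affinity to the data */
    size_t offset;  /* byte offset within that thread's shared segment */
} mock_shared_ptr_t;

/* Simulated per-thread shared segments (a real runtime would use the network). */
static int mock_segment[MOCK_THREADS][MOCK_ELEMS_PER_THREAD];

/* Pointer-to-shared for element i of a cyclic (block size 1) shared int array. */
mock_shared_ptr_t mock_elem(size_t i) {
    mock_shared_ptr_t p;
    assert(i < (size_t)MOCK_THREADS * MOCK_ELEMS_PER_THREAD);
    p.thread = (int)(i % MOCK_THREADS);
    p.offset = (i / MOCK_THREADS) * sizeof(int);
    return p;
}

/* Mock runtime "get": copy nbytes from shared memory into a private buffer. */
void mock_rt_get(void *dst, mock_shared_ptr_t src, size_t nbytes) {
    memcpy(dst, (char *)mock_segment[src.thread] + src.offset, nbytes);
}

/* Mock runtime "put": copy nbytes from a private buffer into shared memory. */
void mock_rt_put(mock_shared_ptr_t dst, const void *src, size_t nbytes) {
    memcpy((char *)mock_segment[dst.thread] + dst.offset, src, nbytes);
}

/* Lowered form of the UPC statement:  a[i] = a[j] + 1;  */
void lowered_assign(size_t i, size_t j) {
    int tmp;
    mock_rt_get(&tmp, mock_elem(j), sizeof tmp);  /* shared read  -> get */
    tmp = tmp + 1;                                /* private computation */
    mock_rt_put(mock_elem(i), &tmp, sizeof tmp);  /* shared write -> put */
}
```

Because every shared access goes through a call of this shape, the translator is free to substitute a non-blocking or bulk variant of the same operation without changing the surrounding private code.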

The Berkeley UPC Runtime

The Berkeley UPC runtime specification documents the platform-independent interface between compiler-generated C code and the UPC runtime system. It provides platform-independent job/thread control, shared memory access (put/get operations and bulk transfer operations), shared pointer manipulation and arithmetic, shared memory management, UPC barriers and UPC locks. The shared memory operations provide very flexible non-blocking shared memory access with a wide range of synchronization alternatives.
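
To make "shared pointer manipulation and arithmetic" more concrete, the following sketch works through the index arithmetic that the UPC language defines for a blocked array declared as shared [B] T a[N]: element i has affinity to thread (i / B) mod THREADS. The struct layout and the function names (sketch_sptr_t, sketch_sptr_from_index, sketch_sptr_inc) are assumptions made for this example; the runtime's actual pointer-to-shared representation is opaque and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical representation; the real pointer-to-shared type is opaque. */
typedef struct {
    size_t thread;  /* thread with affinity to the element */
    size_t phase;   /* position within the current block, 0..block-1 */
    size_t local;   /* element index within that thread's part of the array */
} sketch_sptr_t;

/* Locate element i of an array with the given block size on `threads` threads. */
sketch_sptr_t sketch_sptr_from_index(size_t i, size_t block, size_t threads) {
    sketch_sptr_t p;
    assert(block > 0 && threads > 0);
    p.thread = (i / block) % threads;
    p.phase  = i % block;
    p.local  = (i / (block * threads)) * block + p.phase;
    return p;
}

/* Advance a pointer-to-shared by one element without knowing its index. */
sketch_sptr_t sketch_sptr_inc(sketch_sptr_t p, size_t block, size_t threads) {
    assert(block > 0 && threads > 0);
    if (p.phase + 1 < block) {
        /* Still inside the same block: stay on the same thread. */
        p.phase++;
        p.local++;
    } else {
        /* Block exhausted: move to the start of the next thread's block. */
        p.phase = 0;
        p.local -= block - 1;
        p.thread++;
        if (p.thread == threads) {
            /* Wrapped past the last thread: next row of blocks on thread 0. */
            p.thread = 0;
            p.local += block;
        }
    }
    return p;
}
```

For example, with a block size of 3 on 4 threads, element 14 is in block 4, which wraps around to thread 0 at phase 2 and local index 5. Incrementing a pointer step by step from element 0 yields the same triples as computing them directly from the index.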

In addition to providing platform independence, the Berkeley UPC runtime interface provides a cleanly documented, compiler-independent "UPC abstract machine" that can be used as a compilation target by other front-end UPC compilers.

The UPC Runtime Specification:
    Postscript     Acrobat PDF     Text

UPC Runtime Implementation Notes:
Memory Management in the UPC Runtime
Handling Static Data in the UPC Runtime

The GASNet Networking Layer

GASNet is a language-independent, low-level networking layer that provides network-independent high-performance communication primitives tailored for implementing parallel global address space SPMD languages such as UPC and Titanium.

The design of GASNet is partitioned into two layers to make porting easy without sacrificing performance. The lower level is a narrow but very general interface called the GASNet core API; its design is based heavily on Active Messages, and it is implemented directly on top of each individual network architecture. The upper level is a wider and more expressive interface called the GASNet extended API, which provides high-level operations such as remote memory access and various collective operations.

System diagram showing GASNet layers

We've written a network-independent reference implementation of the extended API purely in terms of the core API, which allows GASNet (and the GAS compiler) to be ported quickly and easily to a new network architecture by re-implementing only the minimal core API. GASNet is structured so that implementers can additionally choose to bypass certain functions in the reference implementation of the extended API and implement them directly on the hardware, improving the performance of specific operations when hardware support is available (e.g., special network support for puts/gets or hardware-assisted broadcast).
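
The layering described above can be sketched in a few lines of C. This is a single-process simulation under stated assumptions: all of the mock_ names are invented for the example and are not GASNet functions, the "network" is a direct function call, and each node's segment is a local byte array. It shows a put built purely on an active-message request, plus a function pointer through which a port could substitute a native put.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is NOT the GASNet API. */
#define MOCK_NODES 2
#define MOCK_SEG_BYTES 64

/* Simulated remotely accessible segment of each node. */
static unsigned char mock_seg[MOCK_NODES][MOCK_SEG_BYTES];

/* ---- "core" layer: active messages only ---- */
typedef void (*mock_am_handler_t)(int node, const void *payload,
                                  size_t nbytes, size_t arg);

/* Deliver a message to `node` and run `handler` there (simulated in place). */
void mock_core_am_request(int node, mock_am_handler_t handler,
                          const void *payload, size_t nbytes, size_t arg) {
    assert(node >= 0 && node < MOCK_NODES);
    handler(node, payload, nbytes, arg);
}

/* ---- "extended" layer: reference put written only in terms of the core ---- */
static void mock_put_handler(int node, const void *payload,
                             size_t nbytes, size_t offset) {
    assert(offset <= MOCK_SEG_BYTES && nbytes <= MOCK_SEG_BYTES - offset);
    memcpy(&mock_seg[node][offset], payload, nbytes);
}

void mock_ext_put_reference(int node, size_t offset,
                            const void *src, size_t nbytes) {
    mock_core_am_request(node, mock_put_handler, src, nbytes, offset);
}

/* ---- optional bypass: a port with hardware put support skips the AM path ---- */
void mock_ext_put_native(int node, size_t offset,
                         const void *src, size_t nbytes) {
    assert(node >= 0 && node < MOCK_NODES);
    assert(offset <= MOCK_SEG_BYTES && nbytes <= MOCK_SEG_BYTES - offset);
    memcpy(&mock_seg[node][offset], src, nbytes);  /* stands in for an RDMA write */
}

/* Clients call through this pointer; it defaults to the reference version. */
typedef void (*mock_put_fn_t)(int, size_t, const void *, size_t);
mock_put_fn_t mock_ext_put = mock_ext_put_reference;

/* Read-only view of a node's segment, for inspection. */
const unsigned char *mock_segment_of(int node) {
    assert(node >= 0 && node < MOCK_NODES);
    return mock_seg[node];
}
```

In this arrangement a new port only has to supply mock_core_am_request to get a working put; replacing the mock_ext_put pointer with a native version is a later, optional optimization that callers never see.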

The GASNet Specification:
    Acrobat PDF     Postscript      HTML      Text
For the most up-to-date version of the spec, see the GASNet web page.
For citations, please use: U.C. Berkeley Tech Report CSD-02-1207, November 2002. (spec v1.1)
