6. Software Development Toolkit
The Software Development Toolkit is the foundation of the functional architecture
in NWChem. It consists of various useful elements for memory management and
data manipulation that are needed to facilitate the
development of parallel computational chemistry algorithms. The
memory management elements implement the NUMA memory management module
for efficient execution in parallel environments and provide the means
for interfacing between the calculation modules
of the code and the system hardware. Efficient data manipulation is
accomplished using the runtime data base, which
stores the information needed to run particular calculations and allows
different modules to have access to the same information.
This chapter describes the various elements of the Software Development Toolkit
in detail.
6.1 Non-Uniform Memory Access (NUMA)
All computers have several levels of memory, with parallel computers generally
having more than computers with only a single processor. Typical memory levels
in a parallel computer include the processor registers,
local cache memory, local main memory,
and remote memory. If the computer also supports virtual memory, local and
remote disk memory are added to this hierarchy. These levels vary in size,
speed, and method of access, and in NWChem the differences among them are
lumped under the general concept Non-Uniform Memory Access (NUMA). This
approach allows the developer to think of all memory anywhere in the system as
accessible to any processor as needed. It is then possible to focus
independently on the questions of memory access methods and memory access costs.
Memory access methods are determined by the programming model, the available
tools, and the desired coding style for an application. Memory access costs are determined by the
program structure and the performance characteristics of the computer system.
The design of a code's major algorithms, therefore, is critical to
the creation of an efficient parallel program.
In order to scale to massively parallel computer
architectures in all aspects of the hardware (i.e., CPU, disk,
and memory), NWChem uses Non-Uniform Memory Access
to distribute the data across all nodes. Memory access is achieved
through explicit message passing using the TCGMSG interface.
The Memory Allocator (MA) tool is used to allocate memory that is local to
the calling process. The Global Arrays (GA) tool is used to share
arrays between processors as if the memory were physically shared.
The complex I/O patterns required to accomplish efficient memory management
are handled with the abstract programming interface ChemIO.
The following subsections discuss the TCGMSG message passing tool,
the Memory Allocator library,
the Global Arrays library, and ChemIO, and describe how they are used in NWChem.
TCGMSG is a toolkit for writing portable parallel programs using
a message passing model. It is relatively simple, having limited
functionality that includes point-to-point communication, global
operations, and a simple load-balancing facility, and was designed with
chemical applications in mind. This simplicity contributes to the
robustness of TCGMSG and its exemplary portability, and also to
its high performance for a wide range of problem sizes.
In the TCGMSG model, a send operation blocks until the message
is explicitly received, and messages from a particular process can be
received only in the order sent. Processes should be thought of as being
connected by ordered
synchronous channels, even though messages are actually sent without any
synchronization between sender and receiver, insofar as buffering permits.
The amount of buffering depends
greatly on the mechanism used by the particular platform, so it
is best not to count on this feature.
Detailed information that includes
documentation of the programming interface is available on-line as part
of the EMSL webpage, at
/docs/parsoft/tcgmsg/
A more general tool for message passing is MPI, which includes concepts
such as process groups, communication contexts, and virtual topologies.
Process groups can be used to specify that only certain processes are
involved in a particular task, or to allow separate groups of processes
to work on different tasks. Communication context provides an additional
criterion for message selection, enhancing internal communication
flexibility without incurring conflicts with other modules. MPI has been
implemented in NWChem as an alternative to TCGMSG, and the code
can be compiled with this option specified. However, it
is not an undertaking for the faint of heart and it is highly advisable to
contact nwchem-support@emsl.pnl.gov
before trying this option.
The TCGMSG-MPI library is distributed with the Global Arrays package.
This library is an implementation of the TCGMSG message passing interface on top
of MPI and system-specific resources. Using this library, it is
possible to use both MPI and TCGMSG interfaces in the same application.
TCGMSG offers a much smaller set of operations than MPI, but these include
some unique capabilities, such as
- nxtval - a shared memory counter with atomic updates, often used in
dynamic load-balancing operations
- plcopy - function to copy the contents of a sequential file to all processes
- mitoh, mdtoh, etc. - portable Fortran equivalents of the C sizeof operator
The nxtval operation is implemented in TCGMSG-MPI in different ways, depending
on the platform.
- SGI Origin-X - shared memory and mutexes or semaphores
- IBM SP
- under MPL - interrupt receive
- under LAPI communication library - atomic read-modify-write
- under thread-safe MPI - atomic read-modify-write
- Intel NX - interrupt receive, with signal-based implementation
of the MPI library
- Cray T3D/E - SHMEM library
- Fujitsu VX/VPP - MPlib
- server implementation using dedicated MPI process
Detailed information that includes
documentation of the programming interface is available on-line as part
of the EMSL webpage, at
/docs/parsoft/tcgmsg-mpi/
The Memory Allocator (MA) is used to allocate data that will generally not be directly
shared with other processes, such as workspace for a particular local
calculation or for replication of very small sets of data. The MA tool
is a library of routines that comprises a dynamic memory allocator
for use by C, FORTRAN, or mixed-language applications. It provides
both heap and stack memory management disciplines, debugging and
verification support (for detecting memory leaks, for example), usage
statistics, and quantitative memory availability information.
Applications written in FORTRAN
require this sort of library because the language does not
support dynamic memory allocation. Applications written in C can benefit from
using MA instead of the ordinary malloc() and free()
routines because of the extra features MA provides. MA is designed to be
portable across a large variety of platforms.
Detailed information on specific routines is available in the MA man pages,
which can be accessed by means of the command man ma. (Note: this
will work only if the local environment variable MANPATH includes
the path $(NWCHEM_TOP)/src/man/ma/man. See Section 8.3
for information on system and environment requirements for running NWChem.)
The following subsections present a summary list of the MA routines, and
a brief discussion of the implementation of this feature.
All MA memory must be explicitly assigned a specific type by defining each
data item in units of integer, logical,
double precision, or character words. The type of data is specified
in arguments using predefined Fortran parameters (or macros in C).
These parameters are available in the include
files mafdecls.fh
in
Fortran and in macdecls.h
in C. The parameters are typed as follows:
- MT_INT -- integer
- MT_DBL -- double precision
- MT_LOG -- logical
- MT_CHAR -- character*1
To access required MA definitions, C applications should include
macdecls.h and FORTRAN applications should include
mafdecls.fh. These are public header files for a dynamic memory
allocator, and are included in the .../src/ma
subdirectory of the
NWChem directory tree. The files contain the type declarations
and parameter declarations for the datatype constants, and define
needed functions and variable types.
The memory allocator uses the following memory layout definitions:
- segment = heap_region stack_region
- region = block block block ...
- block = AD gap1 guard1 client_space guard2 gap2
A segment of memory is obtained from the OS upon initialization. The
low end of the segment is managed as a heap. The heap region grows
from low addresses to high addresses. The high end of the segment is
managed as a stack. The stack region grows from high addresses to low
addresses.
Each region consists of a series of contiguous blocks, one per
allocation request, and possibly some unused space. Blocks in the
heap region are either in use by the client (allocated and not yet
deallocated) or not in use by the client (allocated and already
deallocated). A block on the rightmost end of the heap region becomes
part of the unused space upon deallocation. Blocks in the stack
region are always in use by the client, because a stack block becomes
part of the unused space immediately upon deallocation.
A block consists of the client space, i.e., the range of memory
available for use by the application. Guard words adjacent to each end
of the client space help detect improper memory access by the
client. Bookkeeping information is stored in an allocation descriptor (AD).
Two gaps, each zero or more bytes long, are defined to satisfy alignment constraints
(specifically, to ensure that AD and client_space are aligned
properly).
All MA routines are shown below, grouped by category and listed
alphabetically within each category. The FORTRAN interface is given
here. Information on the C interface is available in the man
pages, which also contain more detailed information on the
arguments for these routines.
Initialization:
- MA_init(datatype, nominal_stack, nominal_heap)
- integer datatype
- integer nominal_stack
- integer nominal_heap
- MA_sizeof(datatype1, nelem1, datatype2)
- integer datatype1
- integer nelem1
- integer datatype2
- MA_sizeof_overhead(datatype)
- MA_initialized()
Allocation:
- MA_alloc_get(datatype, nelem, name, memhandle, index)
- integer datatype
- integer nelem
- character*(*) name
- integer memhandle
- integer index
- MA_allocate_heap(datatype, nelem, name, memhandle)
- integer datatype
- integer nelem
- character*(*) name
- integer memhandle
- MA_get_index(memhandle, index)
- integer memhandle
- integer index
- MA_get_pointer() -- C only
- MA_inquire_avail(datatype)
- MA_inquire_heap(datatype)
- MA_inquire_stack(datatype)
- MA_push_get(datatype, nelem, name, memhandle, index)
- integer datatype
- integer nelem
- character*(*) name
- integer memhandle
- integer index
- MA_push_stack(datatype, nelem, name, memhandle)
- integer datatype
- integer nelem
- character*(*) name
- integer memhandle
Deallocation:
- MA_chop_stack(memhandle)
- MA_free_heap(memhandle)
- MA_pop_stack(memhandle)
Debugging:
- MA_set_auto_verify(value)
- logical value
- integer ivalue
- MA_set_error_print(value)
- logical value
- integer ivalue
- MA_set_hard_fail(value)
- logical value
- integer ivalue
- MA_summarize_allocated_blocks()
- MA_verify_allocator_stuff()
Iteration Over Allocated Blocks:
- MA_get_next_memhandle(ithandle, memhandle)
- integer ithandle
- integer memhandle
- MA_init_memhandle_iterator(ithandle)
Statistics:
- MA_print_stats(printroutines)
Errors considered fatal by MA result in program termination. Errors
considered nonfatal by MA cause the MA routine to return an error
value to the caller. For most boolean functions, false is returned
upon failure and true is returned upon success. (The boolean
functions for which the return value means something other than
success or failure are MA_set_auto_verify(), MA_set_error_print(), and MA_set_hard_fail().) Integer
functions return zero upon failure; depending on the function, zero
may or may not be distinguishable as an exceptional value.
An application can force MA to treat all errors as fatal via
MA_set_hard_fail().
If a fatal error occurs, an error message is printed on the standard
error (stderr). By default, error messages are also printed for
nonfatal errors. An application can force MA to print or not print
error messages for nonfatal errors via MA_set_error_print().
Globally addressable arrays have been developed to simplify writing
portable scientific software for both shared and distributed memory
computers. Programming convenience, code extensibility and
maintainability are gained by adopting the shared memory programming
model. The Global Array (GA) toolkit provides an efficient and portable
"shared memory" programming interface for distributed memory computers.
Each process in a MIMD parallel program can asynchronously access
logical blocks of physically distributed matrices without need for
explicit cooperation by other processes.
The trade-off with this approach is that
access to shared data will be slower than access
to local data, and the programmer must be aware of this in designing modules.
From the user perspective, a global array can be used as if it were stored
in the shared memory. Details of the data distribution, addressing and
communication are encapsulated in the global array objects. However,
the information on the actual data distribution can be obtained and
taken advantage of whenever data locality is important.
The Global Arrays tool has been designed to complement the message-passing
programming model. The developer can use both shared memory and message
passing paradigms in the same program, to take advantage of existing
message-passing software libraries such as TCGMSG. This tool is also
compatible with the Message Passing Interface (MPI). The Global Arrays toolkit
has been in the public domain since 1994 and is actively supported. Additional
documentation and information on performance and applications is available
on the web site /docs/global/.
Currently support is limited to 2-D double precision or integer arrays
with block distribution, at most one block per array per processor.
Available global (GA)
and local (MA) memory can interact within NWChem in only two ways:
- GA is allocated within MA, and GA is limited only by the available
space in MA.
- GA is not allocated within MA, and GA is limited at initialization
(within NWChem input this is controlled by the MEMORY directive).
If GA is allocated within MA, then
the available GA space is limited to the currently available MA space. This
also means that the total allocatable memory for GA and MA must be
no more than the available MA space.
If GA is not allocated within MA, then local and global arrays occupy essentially
independent space. The allocatable memory for GA is limited only by the available
space for GA, and similarly, the allocatable memory for MA is limited only
by the available local memory.
When allocating space for GA,
some care must be exercised in the treatment of the information returned by
the routine ga_memory_avail(), whether or not
the allocation is done in MA. The routine ga_memory_avail()
returns the amount of memory (in bytes)
available for use by GA in the calling process.
This returned value must be converted to double precision words when
using double precision.
If a uniformly distributed GA is desired, it is also necessary to find
the minimum of this value across all nodes. This value will in general be
a rather large number.
When running on a platform with many nodes and having a large memory,
the aggregate GA memory, even in double precision words, could be a large enough
value to overflow a
32-bit integer. Therefore, for calculations that require knowing the size of
total memory, it is advisable to first store the size of memory on each node
in a double precision
number and then sum these values across all the nodes.
The following pseudo-code illustrates this process for an application.
#include "global.fh"
#include "mafdecls.fh"
integer avail_ma, avail_ga
avail_ma = ma_inquire_avail(mt_dbl)
avail_ga = ga_memory_avail()/ma_sizeof(mt_dbl,1,mt_byte)
if (ga_uses_ma()) then
c
c available GA space is limited to currently available MA space,
c and GA and MA share the same space
c
allocatable_ga + allocatable_ma <= avail_ma = avail_ga
else
c
c GA and MA are independent
c
allocatable_ga <= avail_ga
allocatable_ma <= avail_ma
endif
c
c find the minimum value of available GA space over all nodes
c
call ga_igop(msgtype,avail_ga,1,'min')
c
c determine the total available GA space
c
double precision davail_ga
davail_ga = ga_memory_avail()/ma_sizeof(mt_dbl,1,mt_byte)
call ga_dgop(msgtype,davail_ga,1,'+')
The following routines are invoked for operations that are globally collective.
That is, they must be
simultaneously invoked by all processes as if in SIMD mode.
- ga_initialize_() -- initialize global array internal
structures
- ga_initialize_ltd(mem_limit) -- initialize global arrays and set
memory usage limits
- integer mem_limit -- [input] GA total memory (specifying a value less than 0
means "unlimited memory")
- ga_create(type,dim1,dim2,array_name,chunk1,chunk2,g_a) -- create an array
- integer type -- [input] MA type
- integer dim1, dim2 -- [input] array dimensions (dim1,dim2) as in FORTRAN
- character array_name -- [input] unique character string identifying the array
- integer chunk1, chunk2 -- [input] minimum size that dimensions should
be chunked up into;
setting chunk1=dim1 gives distribution by rows
setting chunk2=dim2 gives distribution by columns
Actual chunk sizes are modified so that they are
at least the min size and each process has either
zero or one chunk.
(Specifying both as less than or equal to 1
yields an even distribution)
- integer g_a -- [output] integer handle for future references
- ga_create_irreg(type, dim1, dim2, array_name, map1, nblock1,
map2, nblock2, g_a) -- create an array with irregular
distribution
- integer type -- [input] MA type
- integer dim1, dim2 -- [input] array dimensions (dim1,dim2) as in FORTRAN
- character array_name -- [input] unique character string identifying the array
- integer map1 -- [input] array of starting indices ilo for each block
- integer nblock1 -- [input] number of blocks dim1 is divided into
- integer map2 -- [input] array of starting indices jlo for each block
- integer nblock2 -- [input] number of blocks dim2 is divided into
- integer g_a -- [output] integer handle for future references
- ga_duplicate(g_a, g_b, array_name) -- create an array with same properties as reference
array
- character array_name -- [input] unique character string identifying the array
- integer g_a -- [input] integer handle of reference array
- integer g_b -- [output] integer handle for new array
- ga_destroy_(g_a) -- destroy an array
- integer g_a -- [input] integer handle of array to be destroyed
- ga_terminate_() -- destroys all existing global arrays
and de-allocates shared memory
- ga_sync_() -- synchronizes all processes (a barrier)
- ga_zero_(g_a) -- zero an array
- integer g_a -- [input] integer handle of array to be zeroed
- ga_ddot_(g_a, g_b) -- dot product of two arrays (double precision only)
- integer g_a -- [input] integer handle of first array in dot product
- integer g_b -- [input] integer handle of second array in dot product
- ga_dscal -- scale the elements in an array by a constant
(double precision data only)
- ga_dadd -- scale and add two arrays to put result in a
third (may overwrite one of the other two, doubles only)
- ga_copy(g_a, g_b) -- copy one array into another
- integer g_a -- [input] integer handle of array to be copied
- integer g_b -- [input] integer handle of array g_a is copied into
- ga_dgemm(transa, transb, m, n, k, alpha, g_a, g_b, beta, g_c) --
BLAS-like matrix multiply
- character*1 transa, transb
- integer m, n, k
- double precision alpha, beta
- integer g_a, g_b, g_c
- ga_ddot_patch(g_a, t_a, ailo, aihi, ajlo, ajhi,
g_b, t_b, bilo, bihi, bjlo, bjhi) -- dot product of two arrays (double precision
only; patch version) (Note: patches of different shapes and distributions
are allowed, but not recommended, and both patches must have the same number
of elements)
- integer g_a -- [input] integer identifier of first array containing
patch for dot product
- integer t_a -- [input] transpose of first array
- integer ailo, aihi -- [input] high and low indices for i dimension of
patch of array for dot product
- integer ajlo, ajhi -- [input] high and low indices for j dimension of
patch of array for dot product
- integer g_b -- [input] integer identifier of second array containing
patch for dot product
- integer t_b -- [input] transpose of second array
- integer bilo, bihi -- [input] high and low indices for i dimension of
patch of array for dot product
- integer bjlo, bjhi -- [input] high and low indices for j dimension of
patch of array for dot product
- ga_dscal_patch -- scale the elements in an array by a
constant (patch version)
- ga_dadd_patch -- scale and add two arrays to put result
in a third (patch version)
- ga_ifill_patch -- fill a patch of array with value
(integer version)
- ga_dfill_patch -- fill a patch of array with value
(double version)
- ga_matmul_patch(transa, transb, alpha, beta, g_a, ailo, aihi,
ajlo, ajhi, g_b, bilo, bihi, bjlo, bjhi, g_c, cilo, cihi,
cjlo, cjhi) -- matrix multiply (patch version)
- character transa -- [input] transpose of first array for matrix multiply
- character transb -- [input] transpose of second array for matrix multiply
- double precision alpha -- [input] scale factor for the product of the patches of g_a and g_b
- double precision beta -- [input] scale factor for the patch of g_c
- integer g_a -- [input] integer identifier of first array for matrix multiply
- integer ailo, aihi -- [input] high and low indices for i dimension of
patch of first array for matrix multiply
- integer ajlo, ajhi -- [input] high and low indices for j dimension of
patch of first array for matrix multiply
- integer g_b -- [input] integer identifier of second array for matrix multiply
- integer bilo, bihi -- [input] high and low indices for i dimension of
patch of second array for matrix multiply
- integer bjlo, bjhi -- [input] high and low indices for j dimension of
patch of second array for matrix multiply
- integer g_c -- [input] integer identifier of resultant array for matrix multiply
- integer cilo, cihi -- [input] high and low indices for i dimension of
patch of resultant array for matrix multiply
- integer cjlo, cjhi -- [input] high and low indices for j dimension of
patch of resultant array for matrix multiply
- ga_diag(g_a, g_s, g_v, eval) -- real symmetric generalized eigensolver
(sequential version
ga_diag_seq
also exists)
- integer g_a -- matrix to diagonalize
- integer g_s -- metric
- integer g_v -- global matrix to return evecs
- double precision eval(*) -- local array to return evals
- ga_diag_reuse(reuse,g_a,g_s,g_v,eval) -- a
version of ga_diag for repeated use
- integer reuse -- allows reuse of factorized g_s: flag is
0 first time, greater than 0 for
subsequent calls, less than 0
deletes factorized g_s
- integer g_a -- matrix to diagonalize
- integer g_s -- metric
- integer g_v -- global matrix to return evecs
- double precision eval(*) -- local array to return evals
- ga_diag_std(g_a, g_v, eval) -- standard real symmetric eigensolver
(sequential version also exists)
- integer g_a -- [input] matrix to diagonalize
- integer g_v -- [output] global matrix to return evecs
- double precision eval(*) -- [output] local array to return evals
- ga_symmetrize(g_a) -- symmetrizes matrix A into 0.5(A+A') (NOTE: diag(A)
remains unchanged.)
- integer g_a -- [input] matrix to symmetrize
- ga_transpose(g_a) -- transpose a matrix
- integer g_a -- [input] matrix to transpose
- ga_lu_solve(trans, g_a, g_b) -- solves system of linear equations based
on LU factorization (sequential version
ga_lu_solve_seq
also exists)
- character*1 trans -- [input] transpose or not
- integer g_a -- [input] coefficient matrix A
- integer g_b -- [output] rhs matrix B, overwritten on exit
by the solution vector, X of AX = B
- ga_print_patch(g_a, ilo, ihi, jlo, jhi, pretty) -- print a patch of an array to the
screen
- integer g_a -- [input] integer identifier of array to be printed
- integer ilo, ihi -- [input] high and low indices for i dimension of patch
of array to be printed
- integer jlo, jhi -- [input] high and low indices for j dimension of patch
of array to be printed
- integer pretty -- [input] flag for format of output to screen;
- pretty = 0, spew output out with no formatting
- pretty = 1, format output so that it is readable
- ga_print(g_a) -- print an entire array to the screen
- integer g_a -- [input] integer identifier of array to be printed
- ga_copy_patch(trans, g_a, ailo, aihi, ajlo, ajhi, g_b, bilo,
bihi, bjlo, bjhi) -- copy data from a patch of one global
array into another array (Note: the patch can change shape, but the total number of elements
must be the same between the two arrays)
- character*1 trans -- [input] transpose or not
- integer g_a -- [input] integer identifier of array to be copied
- integer ailo, aihi -- [input] high and low indices for i dimension of
patch of array
to be copied
- integer ajlo, ajhi -- [input] high and low indices for j dimension of
patch of array
to be copied
- integer g_b -- [output] integer identifier of array data is to be
copied into
- integer bilo, bihi -- [input] high and low indices for i dimension of patch
of array being copied into
- integer bjlo, bjhi -- [input] high and low indices for j dimension of patch
of array being copied into
- ga_compare_distr_(g_a, g_b) -- compare distributions of two global
arrays
- integer g_a -- [input] integer identifier of first array
- integer g_b -- [input] integer identifier of second array
Operations that may be invoked by any process in true MIMD style:
- ga_get_(g_a, ilo, ihi, jlo, jhi, buf, ld) -- read from a patch of an array
- integer g_a -- [input] integer handle of array
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- double precision buf -- [output] local buffer that receives the patch
- integer ld -- [input] leading dimension of buf
- ga_put_(g_a, ilo, ihi, jlo, jhi, buf, ld) -- write to a patch of an array
- integer g_a -- [input] integer handle of array
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- double precision buf -- [input] local buffer containing the data to be written
- integer ld -- [input] leading dimension of buf
- ga_acc_(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha) -- accumulate into a patch of an array (double
precision only)
- integer g_a -- [input] integer handle of array
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- double precision buf -- [input] local buffer containing the data to be accumulated
- integer ld -- [input] leading dimension of buf
- double precision alpha -- [input] scale factor (the patch is incremented by alpha times buf)
- ga_scatter_(g_a, v, i, j, nv) -- scatter elements of v into an array
- integer g_a -- [input] integer handle of array that elements of v are to be scattered into
- double precision v -- [input] local array of values to be scattered
- integer i, j -- [input] local arrays of element indices (i,j) as in FORTRAN
- integer nv -- [input] number of elements to be scattered
- ga_gather_(g_a, v, i, j, nv) -- gather elements of array g_a into local array v
- integer g_a -- [input] integer handle of array from which elements are to be gathered
- double precision v -- [output] local array that receives the gathered elements
- integer i, j -- [input] local arrays of element indices (i,j) as in FORTRAN
- integer nv -- [input] number of elements to be gathered
- ga_read_inc_(g_a, i, j, inc) -- atomically read and increment the value
of a single array element (integers only)
- integer g_a -- [input] integer handle of array
- integer i, j -- [input] array element indices (i,j) as in FORTRAN
- integer inc -- [input] amount to increment array element value
- ga_locate(g_a,i,j,owner) -- determine which process `holds' an array
element (i,j)
- integer g_a -- [input] integer handle of array
- integer i, j -- [input] array element indices (i,j) as in FORTRAN
- integer owner -- [output] index number of processor holding the element
- ga_locate_region_(g_a, ilo, ihi, jlo, jhi, map, np) --
determine which processes `hold' an
array section
- integer g_a -- [input] integer handle of array
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- integer map -- [output] array describing the blocks of the region and the processes that hold them
- integer np -- [output] number of processes that hold a piece of the region
- ga_error(string, icode) -- print error message and terminate the
program
- character string -- [input] error message to be printed
- integer icode -- [input] integer flag for error code
- ga_summarize(verbose) -- print information about all
allocated arrays (note: assumes no more than 100 arrays are allocated and
are numbered -1000, -999, etc.)
- integer verbose -- [input] if non-zero, print distribution information
Operations that may be invoked by any process in true MIMD style and
are intended to support writing of new functions:
- ga_distribution_(g_a, me, ilo, ihi, jlo, jhi) -- find coordinates of the array patch
that is `held' by a processor
- integer g_a -- [input] integer handle of array
- integer me -- [input] index number of processor holding the patch
- integer ilo, ihi -- [output] high and low indices for i dimension of region
- integer jlo, jhi -- [output] high and low indices for j dimension of region
- ga_access(g_a, ilo, ihi, jlo, jhi, index, ld) -- provides access to a patch of a global array
- integer g_a -- [input] integer handle of array to be accessed
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- integer index -- [output] MA-style index to the local data of the patch
- integer ld -- [output] leading dimension of the patch
- ga_release(g_a, ilo, ihi, jlo, jhi) -- relinquish access to internal data
- integer g_a -- [input] integer handle of array to be released
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- ga_release_update_(g_a, ilo, ihi, jlo, jhi) -- relinquish access after data were
updated
- integer g_a -- [input] integer handle of array to be updated and released
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- ga_check_handle(g_a, fstring) -- verify that a GA handle is valid
- integer g_a -- [input] integer handle of array
- character* fstring -- [input] name of routine originating the check
Operations to support portability between implementations:
- ga_nodeid_() -- find requesting compute process message id
- ga_nnodes_() -- find number of compute processes
- ga_dgop(type, x, n, op) -- equivalent to TCGMSG dgop, for use in data-server
mode where only compute processes participate
- integer type -- [input] message type id
- integer n -- [input] number of elements in x
- double precision x -- [input/output] array of values, overwritten with the result
- character op -- [input] operation to be performed
- ga_igop(type, x, n, op) -- equivalent to TCGMSG igop, for use in data-server mode
where only compute processes participate; performs the operation specified by the input variable
op (supported operations include addition, multiplication, maximum, minimum,
and maximum or minimum of the absolute
value), and returns the value in x.
- integer type -- [input] message type id
- integer n -- [input] number of elements in x
- integer x -- [input/output] array of values, overwritten with the result
- character op -- [input] operation to be performed
- ga_brdcst(type, buf, len, originator) -- equivalent to TCGMSG brdcst, for use in data server mode
with predefined communicators
- integer type -- [input] message type (as in TCGMSG)
- any buf -- [input/output] data buffer
- integer len -- [input] length of the buffer, in bytes
- integer originator -- [input] number of the originating processor
Other utility operations:
- ga_inquire_(g_a, atype, adim1, adim2) -- find the type and
dimensions of the array
- integer g_a -- [input] integer identifier of array
- integer atype -- [output] MA type
- integer adim1, adim2 -- [output] array dimensions (adim1,adim2) as in FORTRAN
- ga_inquire_name_(g_a, array_name) -- find the name of the array
- integer g_a -- [input] integer identifier of array
- character* array_name -- [output] string containing name of the array
- ga_inquire_memory_() -- find the amount of memory in
active arrays
- ga_memory_avail_() -- find the amount of memory (in bytes) left for
GA
- ga_summarize(verbose) -- prints summary info about allocated
arrays
- integer verbose -- [input] if non-zero, print distribution information
- ga_uses_ma_() -- finds if memory in arrays comes from MA
(memory allocator)
- ga_memory_limited_() -- finds if limits were set for
memory usage in arrays
Note that consistency is only guaranteed for
- Multiple read operations (as the data does not change)
- Multiple accumulate operations (as addition is commutative)
- Multiple disjoint put operations (as there is only one writer
for each element)
The application has to worry about everything else (usually by
appropriate insertion of ga_sync calls).
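The consistency rules above can be illustrated with a short sketch (plain Python, not the GA API): because addition is commutative, concurrent accumulates yield the same result in every interleaving, whereas an unsynchronized read-modify-write (get, modify, put) can lose an update — exactly the case that requires ga_sync.

```python
import itertools

# Several "processes" each accumulate into the same element of a global
# array.  Every interleaving of commutative additions gives the same
# final value, which is why GA guarantees consistency for accumulates.
contributions = [2.0, 3.0, 5.0]

finals = set()
for order in itertools.permutations(contributions):
    elem = 0.0
    for c in order:
        elem += c            # models ga_acc on one element
    finals.add(elem)
assert len(finals) == 1      # identical result for every ordering

# By contrast, read-modify-write (ga_get, local update, ga_put) is NOT
# safe without synchronization: two processes that both read the old
# value and put back their own sum lose one of the updates.
elem = 0
snap_a = elem                # process A: ga_get
snap_b = elem                # process B: ga_get, before A's put lands
elem = snap_a + 2            # process A: ga_put
elem = snap_b + 3            # process B: ga_put overwrites A's update
print(elem)                  # 3, not 5 -- a lost update
```

The second half is precisely the pattern that must be bracketed by ga_sync (or replaced by ga_acc) in a real GA program.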
Subroutines that appear in the files of directory .../src/global/src, but
are not in the (ga.tex) document:
- ga_get_local(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc) -- local
read of a 2-dimensional patch of data from a global array
- ga_get_remote(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc) -- read an
array patch from a remote processor
- ga_put_local(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc) -- local
write of a 2-dimensional patch of data into a global array
- ga_put_remote(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc) -- write an
array patch to a remote processor
- ga_acc_local(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc, alpha) -- local
accumulate of a 2-dimensional patch of data into a global array
- ga_acc_remote(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc, alpha) -- accumulate an
array patch on a remote processor
- ga_scatter_local(g_a, v, i, j, nv, proc) -- local
scatter of v into a global array
- ga_scatter_remote(g_a, v, i, j, nv, proc) -- scatter of v into an
array patch on a remote processor
- ga_gather_local(g_a, v, i, j, nv, proc) -- local
gather of a global array patch into v
- ga_gather_remote(g_a, v, i, j, nv, proc) -- gather of an
array patch on a remote processor into v
- ga_dgop_clust(type, x, n, op, group) -- equivalent to TCGMSG dgop, for use in data-server
mode where only compute processes participate
- ga_igop_clust(type, x, n, op, group) -- equivalent to TCGMSG igop, for use in data-server
mode where only compute processes participate
- ga_brdcst_clust(type, buf, len, originator, group) -- internal GA routine
that is used in data server mode with predefined communicators
- ga_debug_suspend() -- suspend execution of a particular process (e.g., so that a debugger can be attached)
- ga_copy_patch_dp(t_a, g_a, ailo, aihi, ajlo, ajhi,
g_b, bilo, bihi, bjlo, bjhi) -- copy a patch by column order (Fortran convention)
- ga_print_stats_() -- print GA statistics for each process
- ga_zeroUL(uplo, g_A) -- set to zero the L/U triangle part of an NxN
double precision global array A
- ga_symUL(uplo, g_A) -- make a symmetric square matrix from a
double precision global array A in L/U triangle format
- ga_llt_s(uplo, g_A, g_B, hsA) -- solves a system of linear equations [A]X = [B]
- ga_cholesky(uplo, g_a) -- computes the Cholesky factorization of an NxN
double precision symmetric positive definite matrix to obtain the L/U factor
on the lower/upper triangular part of the matrix
- ga_llt_f(uplo, g_A, hsA) -- computes the Cholesky factorization of an NxN
double precision symmetric positive definite global array A
- ga_llt_i(uplo, g_A, hsA) -- computes the inverse of a global array that is
the lower triangular L or the upper triangular Cholesky factor U of an NxN
double precision symmetric positive definite global array (LL' or U'U)
- ga_llt_solve(g_A, g_B) -- solves a system of linear equations [A]X = [B]
using the Cholesky factorization of an NxN
double precision symmetric positive definite global array A
- ga_spd_invert(g_A) -- computes the inverse of a double precision array
using the Cholesky factorization of an NxN
double precision symmetric positive definite global array A
- ga_solve(g_A, g_B) -- solves a system of linear equations [A]X = [B], trying
first to use the Cholesky factorization routine ga_llt_solve; if not
successful, calls the LU factorization routine and solves the system with
forward/backward substitution
- ga_ma_base_address(type, address) -- auxiliary routine to provide MA
base addresses of the data (calls the C routine ga_ma_get_ptr())
- ga_ma_sizeof(type) -- auxiliary routine to provide MA
sizes of the arrays (calls the C routine ga_ma_diff())
In some cases (notably workstation clusters) the global array tools
use a ``data-server'' process on each node in addition to the compute
processes. Data-server processes do not follow the same flow of
execution as compute processes, so TCGMSG global operations
(brdcst, igop, and dgop) will hang when invoked.
The global array toolkit provides ``wrapper'' functions
(ga_brdcst, ga_igop, and ga_dgop) which properly
exclude data server processes from the global communication and must
be used instead of the corresponding TCGMSG functions.
The limited buffering available on the IBM SP-1/2 means that GA and
message-passing operations cannot interleave as readily as they do on
other machines. Basically, in transitioning from GA to message
passing or vice versa the application must call ga_sync().
ChemIO is a high-performance parallel I/O abstract programming interface for
computational chemistry applications.
The development of out-of-core methods for
computational chemistry requires efficient and portable implementation of often complex
I/O patterns. The ChemIO interface addresses this problem by providing
high-performance implementations on multiple platforms that hide some of the
complexity of the underlying I/O patterns from the programmer through the use of
high-level libraries. The interface is tailored to the requirements of
large-scale computational chemistry problems and supports three distinct
I/O models. These are
- Disk Resident Arrays (DRA) -- for explicit transfer between global
memory and secondary storage, allowing the programmer to manage the movement of array
data structures between local memory, remote memory, and disk storage. This component
supports collective I/O operations, in which multiple processors cooperate in
a read or write operation and thereby enable certain useful optimizations.
- Exclusive Access Files (EAF) -- for independent I/O to and from
scratch files maintained on a per-processor basis. It is used for out-of-core
computations in calculational modules that cannot easily be organized to perform collective I/O
operations.
- Shared Files (SF) -- for creation of a scratch file that can be
shared by all processors. Each processor can perform noncollective read or write
operations to an arbitrary location in the file.
These models are implemented in three user-level libraries in ChemIO: Disk Resident
Arrays, Exclusive Access Files, and Shared Files. These libraries are layered on
a device library, the Elementary I/O library (ELIO), which provides a portable
interface to different file systems. The DRA, EAF, and SF modules are fully
independent. Each one can be modified or even removed without affecting the others.
ELIO itself is not exposed to applications.
The ELIO library implements a set of elementary I/O primitives including blocking and
non-blocking versions of read and write operations, as well as wait and probe operations to
control status of non-blocking read/writes. It also implements file operations such
as open, close, delete, truncate, end-of-file detection, and an inquiry function
for the file/filesystem that returns the amount of available space and the filesystem
type. Most of these operations are commonly seen in various flavors of the UNIX
filesystem. ELIO provides an abstract portable interface to such functionality.
(Insert gory details here.)
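The ELIO primitive set described above can be sketched in miniature (hypothetical Python names; the real ELIO is a C library): blocking read/write at absolute offsets, plus a non-blocking write whose completion is controlled with probe/wait, here built on a thread pool and POSIX positional I/O.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Toy ELIO-like layer.  Function names are illustrative, not the real
# ELIO API; os.pread/os.pwrite stand in for the per-filesystem backends.
_pool = ThreadPoolExecutor(max_workers=4)

def elio_write(fd, offset, data):        # blocking write at an offset
    return os.pwrite(fd, data, offset)

def elio_read(fd, offset, nbytes):       # blocking read at an offset
    return os.pread(fd, nbytes, offset)

def elio_awrite(fd, offset, data):       # non-blocking write -> request
    return _pool.submit(os.pwrite, fd, data, offset)

def elio_probe(req):                     # has the request completed?
    return req.done()

def elio_wait(req):                      # block until the request completes
    return req.result()

path = tempfile.mktemp()
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
req = elio_awrite(fd, 0, b"hello")
elio_wait(req)                           # request complete; data is readable
data_back = elio_read(fd, 0, 5)
assert data_back == b"hello"
os.close(fd)
os.remove(path)
```

The wait/probe pair is the essential contract: once wait returns (or probe reports completion), the buffer may be reused and the data observed by subsequent reads.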
The computational chemistry parallel algorithms in NWChem have been implemented in terms
of the Global Arrays shared memory programming model. The GA library (see Section
6.1.3) uses a shared memory programming model in which data locality
is managed explicitly by the programmer. This management is achieved by explicit
calls to functions that transfer data between a global address space (a distributed
array) and local storage. The GA library allows each process in a MIMD parallel
program to access asynchronously logical blocks of physically distributed matrices
without the need for explicit cooperation from other processes.
The GA model exposes to the programmer the non-uniform memory access (NUMA) characteristics
of modern high-performance computer systems. The disk resident array (DRA) model extends
the GA model to another level in the storage hierarchy, namely, secondary storage. It
introduces the concept of a disk resident array -- a disk-based representation of an
array -- and provides functions for transferring blocks of data between global arrays
and disk arrays. It allows the programmer to access data located on disk via a
simple interface expressed in terms of arrays rather than files.
At the present time, (NOTE: The source of this statement is a document
created 5/10/95) all operations are declared to be collective.
This simplifies implementation on machines where only some processors
are connected to I/O devices.
Except where stated otherwise, all operations are synchronous (blocking)
which means that control is returned to the calling process only after
the requested operation completes.
All operations return an error code with value 0 if successful, greater than
zero if not successful.
A program that uses Disk Resident Arrays should look like the following example:
program foo
#include "mafdecls.h"
#include "global.fh"
#include "dra.fh"
c
call pbeginf() ! initialize TCGMSG
if(.not. ma_init(...)) ERROR ! initialize MA
call ga_initialize() ! initialize Global Arrays
if(dra_init(....).ne.0) ERROR ! initialize Disk Arrays
c do work
if(dra_terminate().ne.0)ERROR ! destroy DRA internal data structures
call ga_terminate ! terminate Global Arrays
call pend() ! terminate TCGMSG
end
List of DRA operations:
- status = dra_init(max_arrays, max_array_size, total_disk_space, max_memory) --
initializes disk resident array I/O subsystem;
max_array_size, total_disk_space and max_memory are given
in bytes;
max_memory specifies how much local memory per processor the
application is willing to provide to the DRA I/O subsystem for
buffering.
The value of "-1" for any of input arguments means:
"don't care", "don't know", or "use defaults"
- integer max_arrays -- [input]
- double precision max_array_size -- [input]
- double precision total_disk_space -- [input]
- double precision max_memory -- [input]
- status = dra_terminate() --
closes all open disk resident arrays and shuts down
DRA I/O subsystem.
- status = dra_create(type,dim1,dim2,name,filename,mode,rdim1,rdim2,d_a) --
creates a new disk resident array with the specified dimensions
and type.
(Note: Only one DRA object can be stored in the DRA meta-file identified by
filename.
DRA objects persist on disk after calling dra_close();
dra_delete() should be used instead of dra_close() to delete the disk
array and its associated meta-file.
The disk array is implicitly initialized to "0".)
- integer type -- [input] MA type identifier
- integer dim1 -- [input]
- integer dim2 -- [input]
- character*(*) name -- [input]
- character*(*) filename -- [input]
name of the abstract
meta-file that will store the data on the disk
- integer mode -- [input] specifies access permissions
as read, write, or read-and-write
- integer rdim1,rdim2 -- [input]
specifies dimensions of a
"typical" request; value of "-1" for either rdim1 or rdim2
means "unspecified"
- integer d_a -- [output] DRA handle
- status = dra_open(filename, mode, d_a) --
Open and assign DRA handle to disk resident array stored in DRA
meta-file filename. Disk arrays that are created
with dra_create and saved by calling dra_close can be
later opened and accessed by the same or different
application.
- character*(*) filename -- [input]
name of the abstract
meta-file that stores the data on the disk
- integer mode -- [input] specifies access permissions
as read, write, or read-and-write
- integer d_a -- [output] DRA handle
- status = dra_write(g_a, d_a, request) --
writes asynchronously specified global array to specified
disk resident array;
dimensions and type of g_a and d_a must match. If dimensions
don't match, dra_write_section should be used instead.
The operation is by definition asynchronous (but could
be implemented as synchronous i.e., it would return only
when I/O is done.)
- integer g_a -- [input] GA handle
- integer d_a -- [input] DRA handle
- integer request -- [output] request id
- status = dra_write_section(transp, g_a, gilo, gihi, gjlo, gjhi,
d_a, dilo, dihi, djlo, djhi, request) --
writes asynchronously specified global array section to
specified disk resident array section:
OP(g_a[ gilo:gihi, gjlo:gjhi]) -> d_a[ dilo:dihi, djlo:djhi],
where OP is the transpose operator (.true./.false.).
Returns an error if the two sections' types or sizes mismatch.
See dra_write specs for discussion of request.
- logical transp -- [input] transpose operator
- integer g_a -- [input] GA handle
- integer d_a -- [input] DRA handle
- integer gilo -- [input]
- integer gihi -- [input]
- integer gjlo -- [input]
- integer gjhi -- [input]
- integer dilo -- [input]
- integer dihi -- [input]
- integer djlo -- [input]
- integer djhi -- [input]
- integer request -- [output] request id
- status = dra_read(g_a, d_a, request) --
reads asynchronously specified global array from specified
disk resident array;
Dimensions and type of g_a and d_a must match; if dimensions
don't match, dra_read_section could be used instead.
See dra_write specs for discussion of request.
- integer g_a -- [input] GA handle
- integer d_a -- [input] DRA handle
- integer request -- [output] request id
- status = dra_read_section(transp, g_a, gilo, gihi, gjlo, gjhi,
d_a, dilo, dihi, djlo, djhi, request) --
reads asynchronously specified global array section from
specified disk resident array section:
OP(d_a[ dilo:dihi, djlo:djhi]) -> g_a[ gilo:gihi, gjlo:gjhi]
where OP is the transpose operator (.true./.false.).
See dra_write specs for discussion of request.
- logical transp -- [input] transpose operator
- integer g_a -- [input] GA handle
- integer d_a -- [input] DRA handle
- integer gilo -- [input]
- integer gihi -- [input]
- integer gjlo -- [input]
- integer gjhi -- [input]
- integer dilo -- [input]
- integer dihi -- [input]
- integer djlo -- [input]
- integer djhi -- [input]
- integer request -- [output] request id
- status = dra_probe(request, compl_status) --
tests for completion of dra_write/read or
dra_write/read_section operation which sets the value
passed in request argument;
completion status is 0 if the operation has been completed, non-zero
if not done yet
- integer request -- [input] request id
- integer compl_status -- [output] completion status
- status = dra_wait(request) --
blocks the calling process until completion of the dra_write/read or
dra_write/read_section operation that set the value
passed in the request argument.
- integer request -- [input] request id
- status = dra_inquire(d_a, type, dim1, dim2, name, filename) --
returns dimensions, type, name of disk resident array,
and filename of DRA meta-file associated with d_a
handle.
- integer d_a -- [input] DRA handle
- integer type -- [output]
- integer dim1 -- [output]
- integer dim2 -- [output]
- character*(*) name -- [output]
- character*(*) filename -- [output]
- status = dra_delete(d_a) --
deletes a disk resident array associated with d_a handle.
Invalidates handle.
The corresponding DRA meta-file is destroyed.
- integer d_a -- [input] DRA handle
- status = dra_close(d_a) --
closes DRA meta-file associated with d_a handle and
deallocates data structures corresponding to this disk
array. Invalidates d_a handle. The array on the disk is
persistent.
- integer d_a -- [input] DRA handle
- subroutine dra_flick() --
returns control to DRA for a VERY short time to improve
progress of pending asynchronous operations.
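The index arithmetic behind section transfers such as dra_read_section can be illustrated with a toy sketch (plain Python, not the DRA library): a dim1 x dim2 array of doubles stored column-major (the Fortran convention) in a file, and a helper that reads back an arbitrary [ilo:ihi, jlo:jhi] section by computing per-element byte offsets.

```python
import os
import struct
import tempfile

# Toy disk-resident array: DIM1 x DIM2 doubles, column-major on disk.
DIM1, DIM2 = 4, 3
path = tempfile.mktemp()

# "dra_write": dump the whole array a[i][j] = 10*i + j (1-based indices).
with open(path, "wb") as f:
    for j in range(1, DIM2 + 1):         # column-major: j varies slowest
        for i in range(1, DIM1 + 1):
            f.write(struct.pack("d", 10.0 * i + j))

def read_section(ilo, ihi, jlo, jhi):
    """Read d_a[ilo:ihi, jlo:jhi] as a dict keyed by (i, j)."""
    out = {}
    with open(path, "rb") as f:
        for j in range(jlo, jhi + 1):
            for i in range(ilo, ihi + 1):
                # column-major byte offset of element (i, j), 8-byte doubles
                offset = ((j - 1) * DIM1 + (i - 1)) * 8
                f.seek(offset)
                out[(i, j)] = struct.unpack("d", f.read(8))[0]
    return out

sec = read_section(2, 3, 1, 2)           # a 2x2 section of the disk array
assert sec[(2, 1)] == 21.0 and sec[(3, 2)] == 32.0
os.remove(path)
```

A real DRA implementation transfers whole aligned blocks rather than single elements, which is why dra_create accepts the "typical request" dimensions rdim1/rdim2 as a layout hint.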
The EAF module supports a particularly simple I/O abstraction in which each processor
in a program is able to create files that it alone has access to. The EAF interface
is similar to the standard C UNIX I/O interface and is implemented as a thin
wrapper on the ELIO module. It provides Fortran and C applications with capabilities
that include
- eaf_write and eaf_read -- blocking write and read operations
- eaf_awrite and eaf_aread -- non-blocking (asynchronous)
write and read operations
- eaf_wait and eaf_probe -- operations that can be used to control
or determine completion status of outstanding nonblocking I/O requests
- eaf_stats -- operation that takes a full path to a file or directory and
returns the amount of disk space available and the filesystem type (e.g., PFS, PIOFS,
standard UNIX, etc.)
- eaf_length and eaf_truncate -- operations that allow the programmer
to determine the length of a file and truncate a file to a specified length
- eaf_eof -- operation that determines whether the end of the file has been reached
- eaf_open, eaf_close, and eaf_delete -- functions that interface to
UNIX open, close, and unlink operations
The syntax of EAF is similar to the standard Unix C
file operations, although there are some differences, as a result of
introducing new semantics or extended features
available through EAF.
The primary functionality of EAF is illustrated here by
tracing execution of example program segments.
Example 1:
basic open-write-read-close sequence.
#include "chemio.h"
#include "eaf.fh"
integer fh ! File Handle
integer sz ! Return value of size written
integer stat ! Return status
integer buf(100) ! Data to write
fh = EAF_OpenPersist('/tmp/test.out', ELIO_RW)
sz = EAF_Write(fh, 0, buf, 100*EAF_SZ_INT)
if(sz .ne. 100*EAF_SZ_INT)
$ write(0,*) 'Error writing, wrote ', sz, ' bytes'
sz = EAF_Read(fh, 0, buf, 100*EAF_SZ_INT)
if(sz .ne. 100*EAF_SZ_INT)
$ write(0,*) 'Error reading, read ', sz, ' bytes'
stat = EAF_Close(fh)
end
The include file 'chemio.h' defines the permission macros ELIO_R, ELIO_W, and
ELIO_RW for read, write, and read-write permissions, respectively. The
header file 'eaf.fh' is a Fortran program segment externally defining the EAF
routines and must appear before any executable code using EAF.
EAF_OpenPersist opens a persistent file, as opposed to a scratch file
(EAF_OpenScratch) which is deleted when it is closed. This file is named
'/tmp/test.out' and has read-write permissions. The returned value is the
file handle for this file and should not be directly manipulated by the user.
EAF_Write writes to the file opened with file handle, fh, at absolute offset
0. It is legal to write a scalar or array, for instance in the above
example both 'buf' and 'buf(1)' have the same meaning. The last argument is
the number of bytes to be written. It is important to multiply the number of
array elements by the element size. The following macros are provided in
'eaf.fh':
- EAF_SZ_BYTE
- EAF_SZ_CHARACTER
- EAF_SZ_INTEGER
- EAF_SZ_LOGICAL
- EAF_SZ_REAL
- EAF_SZ_COMPLEX
- EAF_SZ_DOUBLE_COMPLEX
- EAF_SZ_DOUBLE_PRECISION
The return value is the number of bytes written. If this number does not
match the requested number of bytes to be written, an error has occurred.
Example 2: read/write operations
EAF_Read is syntactically and semantically identical to EAF_Write, except that
the buffer is read, not written.
#include "chemio.h"
#include "eaf.fh"
integer fh ! File Handle
integer id1, id2 ! asynchronous ID handles
integer stat ! Return status
integer pend ! Pending status
integer iter ! Iterations counter
integer buf(100), x ! Data
iter = 0
fh = EAF_OpenScratch('/piofs/mogill/test.out', ELIO_RW)
stat = EAF_AWrite(fh, 0, buf, 100*EAF_SZ_INT, id1)
if(stat .ne. 0) write(0,*) 'Error doing 1st asynch write. stat=', stat
stat = EAF_AWrite(fh, 100*EAF_SZ_INT, x, 1*EAF_SZ_INT, id2)
if(stat .ne. 0) write(0,*) 'Error doing 2nd asynch write. stat=', stat
100 stat = EAF_Probe(id1, pend)
iter = iter + 1
write(0,*) 'Waiting', iter
if(iter .lt. 100 .and. pend .eq. ELIO_PENDING) goto 100
stat = EAF_Wait(id1)
stat = EAF_ARead(fh, 0, buf, 100*EAF_SZ_INT, id1)
if(stat .ne. 0) write(0,*) 'Error doing 1st asynch read. stat=', stat
stat = EAF_Wait(id2)
stat = EAF_AWrite(fh, 100*EAF_SZ_INT, x, 1*EAF_SZ_INT, id2)
if(stat .ne. 0) write(0,*) 'Error doing 2nd asynch write. stat=', stat
stat = EAF_Wait(id2)
stat = EAF_Wait(id1)
stat = EAF_Close(fh)
end
This example demonstrates the use of asynchronous reading and writing. The
entire buffer 'buf' is written to offset 0, the beginning of the file. The file
is simultaneously written to from the scalar x at the position following the
buffer. The positions in the file are determined by the absolute offset argument,
as with the synchronous write.
The first write, id1, is repeatedly probed for completion for 100 tries or
until completion, whichever comes first. The two possible pending statuses
are ELIO_DONE and ELIO_PENDING.
When a completed asynchronous operation is detected with EAF_Wait or
EAF_Probe, the id is invalidated with ELIO_DONE. The following EAF_Wait(id1)
blocks until id1 completes. Using EAF_Probe or EAF_Wait with an invalidated
ID has no effect.
Once id1 is freed, it is reused in the first asynchronous read statement.
The following EAF_Wait blocks for completion and invalidation of id2, which
is then used to asynchronously write the scalar x again.
The EAF_Close deletes the file because it was opened as a scratch file.
List of EAF Functions
- integer EAF_OpenPersist(fname, type) -- opens a persistent file; returns file
handle, or -1 upon error
- character fname -- Character string of a globally unique filename (path may
be fully qualified)
- integer type -- Read write permissions. Legal values are ELIO_W, ELIO_R,
and ELIO_RW
- integer EAF_OpenScratch(fname, type) -- open a scratch file that is automatically
deleted upon close; returns file handle, or -1 upon error
- character fname -- Character string of a globally unique filename (path may
be fully qualified)
- integer type -- Read write permissions. Legal values are ELIO_W, ELIO_R,
and ELIO_RW
- integer EAF_Write(fh, offset, buf, bytes) -- synchronously write to the
file specified by the file handle; returns number of bytes written, or -1 on error
- integer fh - File Handle
- integer offset -- Absolute offset, in bytes, at which to start writing
- any buf -- Scalar or array of data
- integer bytes -- Size of buffer, in bytes
- integer EAF_AWrite(fh, offset, buf, bytes, req_id) --
asynchronously writes to the file specified by the file handle,
and returns a handle to the asynchronous operation;
if there are more than MAX_AIO_REQ asynchronous requests (reading
or writing) pending, the operation is handled in a synchronous
fashion and returns a "DONE" handle.
Returns
0 if successful, -1 if an error occurs.
(On architectures where asynchronous I/O operations are not supported,
all requests are handled synchronously, returning a "DONE" handle.)
- integer fh -- [input] file descriptor
- integer offset -- [input] absolute offset, in bytes, to start writing at
- any buf - [input] scalar or array of data
- integer bytes -- [input] size of buffer, in bytes
- integer req_id -- [output] handle of asynchronous operation
- integer EAF_Read(fh, offset, buf, bytes) --
synchronously reads from the file specified by the file handle;
returns number of bytes read, or -1 if an error occurs
- integer fh -- [input] file descriptor
- integer offset -- [input] absolute offset, in bytes, to start reading at
- any buf -- [output] scalar or array of data
- integer bytes -- [input] size of buffer, in bytes
- integer EAF_ARead(fh, offset, buf, bytes, req_id) --
asynchronously reads from the file specified by the file handle,
and returns a handle to the asynchronous operation.
If there are more than MAX_AIO_REQ asynchronous requests (reading
or writing) pending, the operation is handled in a synchronous
fashion and returns a "DONE" handle.
On architectures where asynchronous I/O operations are not supported,
all requests are handled synchronously, returning a "DONE" handle.
Returns
0 if successful; -1 if an error occurs.
- integer fh -- [input] file descriptor
- integer offset -- [input] absolute offset, in bytes, to start reading at
- any buf -- [output] scalar or array of data
- integer bytes -- [input] size of buffer, in bytes
- integer req_id -- [output] handle of asynchronous operation
- integer EAF_Probe(id, status) --
determines if an asynchronous request is completed or pending;
returns
ELIO_OK if successful, or ELIO_FAIL if not successful;
'status' returns ELIO_PENDING if the asynchronous operation is
not complete, or ELIO_DONE if finished.
When the asynchronous request is complete, the 'id' is invalidated
with ELIO_DONE.
- integer id -- [input] handle of asynchronous request
- integer status -- [output] pending or completed status argument
- integer EAF_Wait(id) --
waits for the completion of the asynchronous request, id;
returns
ELIO_OK if successful, or ELIO_FAIL if not successful;
'id' is invalidated with ELIO_DONE
- integer id -- [input] handle of asynchronous request
- integer EAF_Close(fh) --
closes a file;
returns
ELIO_OK if successful; aborts if not successful
- integer fh -- [input] file handle
The Shared File module supports the abstraction of a single contiguous secondary storage
address space (a "file") that every processor has access to. Processes create and
destroy SF objects in a collective fashion, but all other file I/O operations are
noncollective. A shared file can be thought of as a one-dimensional array of bytes located
in shared memory, except that the library interface is required to actually access the
data.
The library is capable of determining the striping factor and all other internal
optimizations for the "file". The programmer has the option, however, of giving the library a few
helpful hints, to reduce the number of decisions the interface must take care of. These hints
are supplied when the shared file is created, and can be any or all of the following:
- Specify a hard limit (not to be exceeded) for the file size.
- Specify a soft limit for the file size; that is, an estimate of the expected
shared file size, which can be exceeded at run time, if necessary.
- Specify the size of a "typical" request.
Non-collective I/O operations in SF include read, write, and wait operations. Read and write
operations transfer the specified number of bytes between local memory and disk at a
specified offset. The library does not perform any explicit control of consistency
in concurrent accesses to overlapping sections of the shared files. For example, SF
semantics allow a write operation to return before the data transfer is complete.
This requires special care in programs that perform write operations in critical sections,
since unlocking access to a critical section before write completes is unsafe.
To allow mutual exclusion control in access to shared files, the sf_wait function
is provided. It can be used
to enforce completion of the data transfer so that the data can be safely accessed
by another process after access to the critical section is released by the writing
process. The function sf_waitall can be used to force the program to wait for completion
of multiple SF operations specified through an argument array of request identifiers.
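The critical-section hazard described above can be simulated with plain threads (a sketch of the semantics, not the SF library): an "asynchronous write" whose transfer finishes after the lock is released lets another process read stale data, while calling the wait operation (modeled here by a thread join) before unlocking makes the update visible.

```python
import threading
import time

# shared["file"] models a region of the shared file; the lock models the
# application's critical section around it.
shared = {"file": 0}
lock = threading.Lock()

def async_write(value, delay):
    """Models sf_write: returns at once; the transfer completes later."""
    def transfer():
        time.sleep(delay)
        shared["file"] = value
    t = threading.Thread(target=transfer)
    t.start()
    return t                     # the "request id"; join() models sf_wait

# Unsafe: the lock is released before the write completes.
with lock:
    req = async_write(42, delay=0.3)
    # missing sf_wait here!
with lock:                       # another process enters the section
    stale = shared["file"]       # still sees 0 -- the write is in flight
req.join()

# Safe: sf_wait (join) inside the critical section before unlocking.
with lock:
    req = async_write(99, delay=0.1)
    req.join()                   # sf_wait: transfer is complete
with lock:
    fresh = shared["file"]       # guaranteed to observe 99
assert stale == 0 and fresh == 99
```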
The actual size of a shared file might grow as processes perform write operations
beyond the current end-of-file boundary. Data in shared files are implicitly initialized
to zero, which means that read operations at locations that have not been written to
return zero values. However, reading beyond the current end-of-file boundary is an
error.
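The growth, zero-fill, and end-of-file rules above can be modeled in a few lines (a toy byte-array model, not the SF API): writes past the current end grow the file and zero-fill any hole, reads of unwritten-but-in-bounds bytes return zeros, and reads past end-of-file fail.

```python
class ToySharedFile:
    """Byte-array model of an SF file: zero-filled holes, hard EOF."""

    def __init__(self):
        self.data = bytearray()          # end-of-file = len(self.data)

    def write(self, offset, buf):
        end = offset + len(buf)
        if end > len(self.data):         # growing zero-fills the gap
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = buf

    def read(self, offset, nbytes):
        if offset + nbytes > len(self.data):
            raise IOError("read beyond end-of-file")
        return bytes(self.data[offset:offset + nbytes])

sf = ToySharedFile()
sf.write(4, b"ab")                       # file grows to 6 bytes
assert sf.read(0, 4) == b"\0\0\0\0"      # unwritten region reads as zeros
try:
    sf.read(0, 7)                        # past EOF: an error, not zeros
    reached_eof_error = False
except IOError:
    reached_eof_error = True
assert reached_eof_error
```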
Shared files can be used to build other I/O abstractions. In many cases, this process
requires adding an additional consistency control layer. A single file pointer view,
for example, can be implemented by adding an atomically modifiable pointer
variable located in shared memory, using the GA toolkit or some other means.
The shared files model consists of the following elements:
- Shared files are non-persistent (temporary)
- Shared files resemble one-dimensional arrays in main memory
- Each process can independently read/write to any location in the file
- The file size has a hard limit specified when it is created
- User can also specify (or use "don't know" flag) the estimated approximate
file size - might be exceeded at run-time (a hint)
- sf_flush flushes the buffers so that previously written data goes to the disk
before the routine returns.
- All routines return an error code: "0" means success.
- sf_create and sf_destroy are collective
- file, request sizes, and offset (all in bytes) are DOUBLE PRECISION arguments,
all the other arguments are INTEGERS
- read/writes are asynchronous
List of SF Functions:
integer sf_create(fname, size_hard_limit, size_soft_limit, req_size, handle)
fname -- meta-file name
size_hard_limit -- max file size in bytes not to be exceeded (a hint)
size_soft_limit -- estimated file size (a hint)
req_size -- size of a typical request (a hint)
handle -- returned handle to the created file
Creates a shared file using the name and path specified in fname as a template.
The argument req_size specifies the size of a typical request (-1 = "don't know").
integer sf_write(handle, offset, bytes, buffer, request_id)
handle -- file handle returned from sf_create [in]
offset -- location in file (from the beginning)
where data should be written to [in]
buffer -- local array containing the data to be written [in]
bytes -- number of bytes to write [in]
request_id -- id identifying asynchronous operation [out]
asynchronous write operation
integer sf_read(handle, offset, bytes, buffer, request_id)
handle -- file handle returned from sf_create [in]
offset -- location in file (from the beginning)
where data should be read from [in]
buffer -- local array to put the data [out]
bytes -- number of bytes to read [in]
request_id -- id identifying asynchronous operation [out]
asynchronous read operation
integer sf_wait(request_id)
request_id -- id identifying asynchronous operation [in/out]
blocks calling process until I/O operation associated with id completed,
invalidates request_id
integer sf_waitall(list, num)
list(num) -- array of ids for asynchronous operations [in/out]
num -- number of entries in list [in]
blocks calling process until all "num" I/O operations associated with ids
specified in list completed, invalidates ids on the list
integer sf_destroy(handle)
handle -- file handle returned from sf_create [in]
The run time data base is the parameter and information repository for
the independent modules (e.g., SCF, RIMP2) comprising NWChem.
This approach is similar in spirit to the GAMESS
dumpfile or the Gaussian checkpoint file. The only way modules can
share data is via the database or via files, the names of which are stored in
the database (and may have default values). Information is stored
directly in the database as typed arrays, each of which is described by
- a name, which is a simple string of ASCII characters (e.g.,
"reference energies"
),
- the type of the data (real, integer, logical, or character),
- the number of data items, and
- the actual data (an array of items of the specified type).
A database is simply a file and is opened by name. Usually there is
just one database per calculation, though multiple databases may be
open at any instant.
By default, access to all open databases occurs in parallel, meaning
that
- all processes must participate in any read/write of any database
and any such operation has an implied synchronization
- writes to the database write the data associated with process
zero but the correct status of the operation is returned to all
processes
- reads from the database read the data named by process zero and
broadcast the data to all processes, checking dimensions and types
of provided arrays
Alternatively, database operations can occur sequentially. This means
that only process zero can read/write the database, and it does so
with no communication or synchronization with the other processes.
Any read/write operation by a process other than process zero is
an error.
Usually, all processes will want the same data at the same time from
the database, and all processes will want to know of the success or
failure of operations. This is readily done in the default parallel
mode. An exception to this is during the reading of input.
Usually, only process zero will read the input and needs to store the
data directly into the database without involving the other processes.
This is done using sequential mode.
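As a sketch of this input-reading pattern (assuming the database was opened earlier, using ga_nodeid() from the Global Arrays library to identify process zero, and with an illustrative entry name; error checking is omitted):

```fortran
c     sketch: process zero stores input data in sequential mode,
c     then the previous access mode is restored for the rest of the run
      logical rtdb_parallel, rtdb_cput, oldmode, ok
      integer ga_nodeid, rtdb
      oldmode = rtdb_parallel(.false.)
      if (ga_nodeid() .eq. 0) then
c        only process zero may touch the database in sequential mode
         ok = rtdb_cput(rtdb, 'title', 1, 'water dimer')
      endif
      oldmode = rtdb_parallel(oldmode)
```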
The following subsections contain a detailed listing of the C and Fortran API.
Programs using RTDB routines must include the appropriate header file;
rtdb.fh for Fortran, or rtdb.h for C. These files define the return
types for all rtdb functions. In addition, rtdb.fh specifies the
following parameters
- rtdb_max_key -- an integer parameter that defines the maximum
length of a character string key
- rtdb_max_file -- an integer parameter that defines the maximum
length of a file name
The Fortran routines return logical values; .true.
on success, .false.
on failure. The C routines return integers; 1 on success, or 0 on failure.
All rtdb_*
functions are also mirrored by routines rtdb_par_*
in which process 0 performs the operation; all other processes
receive the broadcast result of a read, and their writes are discarded.
The functions that control opening, closing, writing to and reading information
from the runtime database are described in this section.
C routine:
int rtdb_parallel(const int mode)
Fortran routine:
logical function rtdb_parallel(mode)
logical mode [input]
This function sets the parallel access mode of all databases to mode and returns the
previous setting. If mode is true then accesses are in parallel, otherwise they are
sequential.
C routine:
int rtdb_open(const char *filename, const char *mode, int *handle)
Fortran routine:
logical function rtdb_open(filename, mode, handle)
character *(*) filename [input]
character *(*) mode [input]
integer handle [output]
This function opens a database. It requires the following arguments:
- filename -- path to the file associated with the database
- mode -- specifies the initial condition of the database
- new -- Open only if it does not exist already
- old -- Open only if it does exist already
- unknown -- Create a new database or open the existing database Filename (preserving contents)
- empty -- Create a new database or open the existing database Filename (deleting contents)
- scratch -- Create a new database or open the existing database Filename (deleting contents)
that will be automatically deleted upon closing. Note that items
cached in memory are not written to disk when this mode is specified.
- handle -- returns an integer handle which must be used in all future
references to the database
C routine:
int rtdb_close(const int handle, const char *mode)
Fortran routine:
logical function rtdb_close(handle, mode)
integer handle [input]
character*(*) mode [input]
This function closes a database. It requires the following arguments:
- handle -- unique handle created when the database was first opened
- mode -- specifies the fate of the information in the database after closing;
- keep -- Preserve the data base file to enable restart
- delete -- Delete the data base file, freeing all resources
When closing a database file that has been opened with the rtdb_open argument
mode specified as scratch, the value for mode for the function
rtdb_close is automatically set to delete. Database files needed for
restart must not be opened as scratch files.
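Combining rtdb_open and rtdb_close, a restartable calculation might manage its database as in the following sketch. The file name is illustrative, errquit is assumed to be an error-termination utility, and error handling is reduced to a single check.

```fortran
c     sketch: create or reopen a database, use it, and keep it
c     for restart when closing
      logical rtdb_open, rtdb_close, ok
      integer rtdb
      ok = rtdb_open('h2o.db', 'unknown', rtdb)
      if (.not. ok) then
c        could not create or open the database
         call errquit('rtdb_open failed', 0)
      endif
c     ... put/get entries via the routines described below ...
      ok = rtdb_close(rtdb, 'keep')
```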
C routine:
int rtdb_put(const int handle, const char *name, const int ma_type,
const int nelem, const void *array)
Fortran routine:
logical function rtdb_put(handle, name, ma_type, nelem, array)
integer handle [input]
character *(*) name [input]
integer ma_type [input]
integer nelem [input]
<ma_type> array(nelem) [input]
This function inserts an entry into the database, replacing the previous entry.
It requires the following arguments:
- handle -- unique handle created when the database was first opened
- name -- entry name of data array to be put into the database (null-terminated character string)
- ma_type -- MA type of the entry
- nelem -- number of elements of the given type
- array -- array of length nelem containing data to be inserted
C routine:
int rtdb_get(const int handle, const char *name, const int ma_type,
const int nelem, void *array)
Fortran routine:
logical function rtdb_get(handle, name, ma_type, nelem, array)
integer handle [input]
character *(*) name [input]
integer ma_type [input]
integer nelem [input]
<ma_type> array(nelem) [output]
This function gets an entry from the data base. It requires the following arguments:
- handle -- unique handle created when the database was first opened
- name -- entry name of data array to get from the database (null-terminated character string)
- ma_type -- MA type of the entry (which must match entry type in the database)
- nelem -- size of array in units of ma_type
- array -- buffer of length nelem defined by calling routine to store the returned data
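For example, the following sketch stores and later retrieves an array of double precision values. The MA type constant mt_dbl is assumed to come from the MA include file, the database handle from an earlier call to rtdb_open, and error checking is omitted.

```fortran
c     sketch: store three reference energies, then read them back,
c     possibly from a different module of the code
c     (mt_dbl is the MA type constant for double precision data)
      logical rtdb_put, rtdb_get, ok
      integer rtdb
      double precision e(3), echeck(3)
      data e /-76.0d0, -76.2d0, -76.4d0/
      ok = rtdb_put(rtdb, 'reference energies', mt_dbl, 3, e)
c     ... elsewhere in the calculation ...
      ok = rtdb_get(rtdb, 'reference energies', mt_dbl, 3, echeck)
```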
logical function rtdb_cput(handle, name, nelem, buf)
integer handle [input]
character *(*) name [input]
integer nelem [input]
character *(*) buf [input]
logical function rtdb_cget(handle, name, nelem, buf)
integer handle [input]
character *(*) name [input]
integer nelem [input]
character *(*) buf [output]
These functions are Fortran routines that provide put/get functionality for character
variables. The two functions have identical argument lists; the only difference between them is that
rtdb_cput puts the specified character data into the database, while
rtdb_cget copies the data from the database. The arguments are as follows:
- handle -- unique handle created when the database was first opened
- name -- entry name of the data to put into or get from the database (null-terminated character string)
- buf -- character variable to be put into the database (for rtdb_cput), or character
buffer in the calling routine to store the returned character data (for rtdb_cget)
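A character entry can be handled as in this sketch, where the entry name is illustrative, nelem is taken to be 1 for a single string, and error checking is omitted:

```fortran
c     sketch: store a character string and read it back
      logical rtdb_cput, rtdb_cget, ok
      integer rtdb
      character*80 theory
c     ... rtdb obtained earlier from rtdb_open ...
      ok = rtdb_cput(rtdb, 'task:theory', 1, 'scf')
      ok = rtdb_cget(rtdb, 'task:theory', 1, theory)
```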
C routine:
int rtdb_ma_get(const int handle, const char *name, int *ma_type,
int *nelem, int *ma_handle)
Fortran routine:
logical function rtdb_ma_get(handle, name, ma_type, nelem, ma_handle)
integer handle [input]
character *(*) name [input]
integer ma_type [output]
integer nelem [output]
integer ma_handle [output]
This function returns the MA type, the number of elements of that type, and the MA handle
of the specified entry. (The MA handle refers to memory
automatically allocated to hold the data read from the database.) The function requires
the following arguments:
- handle -- unique handle created when the database was first opened
- name -- entry name of information to get from the database (null-terminated character string)
- ma_type -- returns MA type of the entry in the database
- nelem -- returns number of elements of type ma_type in data
- ma_handle -- returns MA handle to data
C routine:
int rtdb_get_info(const int handle, const char *name, int *ma_type,
int *nelem, char date[26])
Fortran routine:
logical function rtdb_get_info(handle, name, ma_type, nelem, date)
integer handle [input]
character *(*) name [input]
integer ma_type [output]
integer nelem [output]
character*26 date [output]
This function queries the database to obtain the number of elements in the
specified entry, its MA type, and the date of its insertion into the rtdb.
It requires the following arguments:
- handle -- unique handle created when the database was first opened
- name -- entry name of data for which information is to be obtained
(null-terminated character string in
C, standard FORTRAN character constant or variable in FORTRAN)
- ma_type -- returns MA type of the entry
- nelem -- returns number of elements of the given type
- date -- returns date of insertion (null-terminated
character string or FORTRAN character variable)
C routines:
int rtdb_first(const int handle, const int namelen, char *name)
int rtdb_next(const int handle, const int namelen, char *name)
Fortran routines:
logical function rtdb_first(handle, name)
integer handle [input]
character *(*) name [output]
logical function rtdb_next(handle, name)
integer handle [input]
character *(*) name [output]
These routines enable iteration through the items in the database in
an effectively random order. The function rtdb_first returns
the name of the first user-inserted entry in the database. The function
rtdb_next returns the name of the user-inserted entry put into the
database after the entry identified by the previous call to rtdb_next
(or by the call to rtdb_first, on the first call to rtdb_next).
The arguments required for the C routines are as follows:
- handle -- unique handle created when the database was first opened
- namelen -- size of buffer in calling routine required to store name
- name -- buffer to hold returned name of next (or first) entry in the database
The Fortran routines take the same handle and name arguments, but the
length of the buffer need not be passed explicitly.
An example of the use of these functions in C is to count and print the names of all
entries in the database. This can be implemented as follows:
char name[256];
int n, status, rtdb;
for (status=rtdb_first(rtdb, sizeof(name), name), n=0;
status;
status=rtdb_next(rtdb, sizeof(name), name), n++)
printf("entry %d has name '%s'\n", n, name);
C routine:
int rtdb_delete(const int handle, const char *name)
Fortran routine:
logical function rtdb_delete(handle, name)
integer handle [input]
character *(*) name [input]
This function deletes an entry from the database.
- handle -- unique handle created when the database was first opened
- name -- entry name of data to delete from the database (null-terminated character string)
This function does not return any arguments. The value returned by the function itself
indicates the success or failure of the delete operation:
- 1 if the key was present and was successfully deleted
- 0 if the key was not present, or if an error occurred
C routine:
int rtdb_print(const int handle, const int print_values)
Fortran routine:
logical function rtdb_print(handle, print_values)
integer handle [input]
logical print_values [input]
This function prints the contents of the data base to STDOUT. It requires
the following arguments:
- handle -- unique handle created when the database was first opened
- print_values -- (boolean flag) if true, values as
well as keys are printed out.
Dunyou Wang
2009-03-13