6. Software Development Toolkit
The Software Development Toolkit is the foundation of the functional architecture
in NWChem. It consists of various useful elements for memory management and
data manipulation that are needed to facilitate the
development of parallel computational chemistry algorithms. The
memory management elements implement the NUMA memory management module
for efficient execution in parallel environments and provide the means
for interfacing between the calculation modules
of the code and the system hardware. Efficient data manipulation is
accomplished using the runtime data base, which
stores the information needed to run particular calculations and allows
different modules to have access to the same information.
This chapter describes the various elements of the Software Development Toolkit
in detail.
6.1 Non-Uniform Memory Access (NUMA)
All computers have several levels of memory, with parallel computers generally
having more than computers with only a single processor. Typical memory levels
in a parallel computer include the processor registers,
local cache memory, local main memory,
and remote memory. If the computer also supports virtual memory, local and
remote disk memory are added to this hierarchy. These levels vary in size,
speed, and method of access, and in NWChem the differences among them are
lumped under the general concept Non-Uniform Memory Access (NUMA). This
approach allows the developer to think of all memory anywhere in the system as
accessible to any processor as needed. It is then possible to focus
independently on the questions of memory access methods and memory access costs.
Memory access methods are determined by the programming model, the available
tools, and the desired coding style for an application. Memory access costs are determined by the
program structure and the performance characteristics of the computer system.
The design of a code's major algorithms, therefore, is critical to
the creation of an efficient parallel program.
In order to scale to massively parallel computer
architectures in all aspects of the hardware (i.e., CPU, disk,
and memory), NWChem uses Non-Uniform Memory Access
to distribute the data across all nodes. Memory access is achieved
through explicit message passing using the TCGMSG interface.
The Memory Allocator (MA) tool is used to allocate memory that is local to
the calling process. The Global Arrays (GA) tool is used to share
arrays between processors as if the memory were physically shared.
The complex I/O patterns required to accomplish efficient memory management
are handled with the abstract programming interface ChemIO.
The following subsections discuss the TCGMSG message passing tool,
the Memory Allocator library,
the Global Arrays library, and ChemIO, and describe how they are used in NWChem.
TCGMSG is a toolkit for writing portable parallel programs using
a message passing model. It is relatively simple, having limited
functionality that includes point-to-point communication, global
operations, and a simple load-balancing facility, and was designed with
chemical applications in mind. This simplicity contributes to the
robustness of TCGMSG and its exemplary portability, and also to
its high performance for a wide range of problem sizes.
In the TCGMSG model, a send operation blocks until the message
is explicitly received, and messages from a particular process can be
received only in the order sent. Processes should be thought of as being
connected by ordered
synchronous channels, even though messages are actually sent without any
synchronization between sender and receiver, insofar as buffering permits.
The amount of buffering depends
greatly on the mechanism used by the particular platform, so it
is best not to count on this feature.
Detailed information that includes
documentation of the programming interface is available on-line as part
of the EMSL webpage, at
/docs/parsoft/tcgmsg/
A more general tool for message passing is MPI, which includes concepts
such as process groups, communication contexts, and virtual topologies.
Process groups can be used to specify that only certain processes are
involved in a particular task, or to allow separate groups of processes
to work on different tasks. Communication context provides an additional
criterion for message selection, enhancing internal communication
flexibility without incurring conflicts with other modules. MPI has been
implemented in NWChem as an alternative to TCGMSG, and the code
can be compiled with this option specified. However, it
is not an undertaking for the faint of heart and it is highly advisable to
contact nwchem-support@emsl.pnl.gov
before trying this option.
The TCGMSG-MPI library is distributed with the Global Arrays package.
This library is an implementation of the TCGMSG message passing interface on top
of MPI and system-specific resources. Using this library, it is
possible to use both MPI and TCGMSG interfaces in the same application.
TCGMSG offers a much smaller set of operations than MPI, but these include
some unique capabilities, such as
- nxtval - a shared memory counter with atomic updates, often used in
dynamic load-balancing operations
- plcopy - function to copy the contents of a sequential file to all processes
- mitoh, mdtoh, etc. - portable Fortran equivalents of the C sizeof operator
The nxtval operation is implemented in TCGMSG-MPI in different ways, depending
on the platform.
- SGI Origin-X - shared memory and mutexes or semaphores
- IBM SP
- under MPL - interrupt receive
- under LAPI communication library - atomic read-modify-write
- under thread-safe MPI - atomic read-modify-write
- Intel NX - interrupt receive, with signal-based implementation
of the MPI library
- Cray T3D/E - SHMEM library
- Fujitsu VX/VPP - MPlib
- server implementation using dedicated MPI process
Detailed information that includes
documentation of the programming interface is available on-line as part
of the EMSL webpage, at
/docs/parsoft/tcgmsg-mpi/
The Memory Allocator (MA) is used to allocate data that will generally not be directly
shared with other processes, such as workspace for a particular local
calculation or for replication of very small sets of data. The MA tool
is a library of routines that comprises a dynamic memory allocator
for use by C, FORTRAN, or mixed-language applications. It provides
both heap and stack memory management disciplines, debugging and
verification support (for detecting memory leaks, for example), usage
statistics, and quantitative memory availability information.
Applications written in FORTRAN
require this sort of library because the language does not
support dynamic memory allocation. Applications written in C can benefit from
using MA instead of the ordinary malloc() and free()
routines because of the extra features MA provides. MA is designed to be
portable across a large variety of platforms.
Detailed information on specific routines is available in the MA man pages,
which can be accessed by means of the command man ma. (Note: this
will work only if the local environment variable MANPATH includes
the path $(NWCHEM_TOP)/src/man/ma/man. See Section 8.3
for information on system and environment requirements for running NWChem.)
The following subsections present a summary list of the MA routines, and
a brief discussion of the implementation of this feature.
All MA memory must be explicitly assigned a specific type by defining each
data item in units of integer, logical,
double precision, or character words. The type of data is specified
in arguments using predefined Fortran parameters (or macros in C).
These parameters are available in the include
files mafdecls.fh
in
Fortran and in macdecls.h
in C. The parameters are typed as follows:
- MT_INT -- integer
- MT_DBL -- double precision
- MT_LOG -- logical
- MT_CHAR -- character*1
To access required MA definitions, C applications should include
macdecls.h and FORTRAN applications should include
mafdecls.fh. These are public header files for a dynamic memory
allocator, and are included in the .../src/ma
subdirectory of the
NWChem directory tree. The files contain the type declarations
and parameter declarations for the datatype constants, and define
needed functions and variable types.
The memory allocator uses the following memory layout definitions:
- segment = heap_region stack_region
- region = block block block ...
- block = AD gap1 guard1 client_space guard2 gap2
A segment of memory is obtained from the OS upon initialization. The
low end of the segment is managed as a heap. The heap region grows
from low addresses to high addresses. The high end of the segment is
managed as a stack. The stack region grows from high addresses to low
addresses.
Each region consists of a series of contiguous blocks, one per
allocation request, and possibly some unused space. Blocks in the
heap region are either in use by the client (allocated and not yet
deallocated) or not in use by the client (allocated and already
deallocated). A block on the rightmost end of the heap region becomes
part of the unused space upon deallocation. Blocks in the stack
region are always in use by the client, because a stack block becomes
part of the unused space immediately upon deallocation.
A block consists of the client space, i.e., the range of memory
available for use by the application. Guard words adjacent to each end
of the client space help detect improper memory access by the
client. Bookkeeping information is stored in an allocation descriptor (AD).
Two gaps, each zero or more bytes long, are defined to satisfy alignment constraints
(specifically, to ensure that AD and client_space are aligned
properly).
All MA routines are shown below, grouped by category and listed
alphabetically within each category. The FORTRAN interface is given
here. Information on the C interface is available in the man
pages, which also contain more detailed information on the
arguments for these routines.
Initialization:
- MA_init(datatype, nominal_stack, nominal_heap)
- integer datatype
- integer nominal_stack
- integer nominal_heap
- MA_sizeof(datatype1, nelem1, datatype2)
- integer datatype1
- integer nelem1
- integer datatype2
- MA_sizeof_overhead(datatype)
- MA_initialized()
Allocation:
- MA_alloc_get(datatype, nelem, name, memhandle, index)
- integer datatype
- integer nelem
- character*(*) name
- integer memhandle
- integer index
- MA_allocate_heap(datatype, nelem, name, memhandle)
- integer datatype
- integer nelem
- character*(*) name
- integer memhandle
- MA_get_index(memhandle, index)
- integer memhandle
- integer index
- MA_get_pointer() -- C only
- MA_inquire_avail(datatype)
- MA_inquire_heap(datatype)
- MA_inquire_stack(datatype)
- MA_push_get(datatype, nelem, name, memhandle, index)
- integer datatype
- integer nelem
- character*(*) name
- integer memhandle
- integer index
- MA_push_stack(datatype, nelem, name, memhandle)
- integer datatype
- integer nelem
- character*(*) name
- integer memhandle
Deallocation:
- MA_chop_stack(memhandle)
- MA_free_heap(memhandle)
- MA_pop_stack(memhandle)
Debugging:
- MA_set_auto_verify(value)
- logical value
- integer ivalue
- MA_set_error_print(value)
- logical value
- integer ivalue
- MA_set_hard_fail(value)
- logical value
- integer ivalue
- MA_summarize_allocated_blocks()
- MA_verify_allocator_stuff()
Iteration Over Allocated Blocks:
- MA_get_next_memhandle(ithandle, memhandle)
- integer ithandle
- integer memhandle
- MA_init_memhandle_iterator(ithandle)
Statistics:
- MA_print_stats(printroutines)
Errors considered fatal by MA result in program termination. Errors
considered nonfatal by MA cause the MA routine to return an error
value to the caller. For most boolean functions, false is returned
upon failure and true is returned upon success. (The boolean
functions for which the return value means something other than
success or failure are MA_set_auto_verify(), MA_set_error_print(), and MA_set_hard_fail().) Integer
functions return zero upon failure; depending on the function, zero
may or may not be distinguishable as an exceptional value.
An application can force MA to treat all errors as fatal via
MA_set_hard_fail().
If a fatal error occurs, an error message is printed on the standard
error (stderr). By default, error messages are also printed for
nonfatal errors. An application can force MA to print or not print
error messages for nonfatal errors via MA_set_error_print().
Globally addressable arrays have been developed to simplify writing
portable scientific software for both shared and distributed memory
computers. Programming convenience, code extensibility and
maintainability are gained by adopting the shared memory programming
model. The Global Array (GA) toolkit provides an efficient and portable
"shared memory" programming interface for distributed memory computers.
Each process in a MIMD parallel program can asynchronously access
logical blocks of physically distributed matrices without need for
explicit cooperation by other processes.
The trade-off with this approach is that
access to shared data will be slower than access
to local data, and the programmer must be aware of this in designing modules.
From the user perspective, a global array can be used as if it were stored
in the shared memory. Details of the data distribution, addressing and
communication are encapsulated in the global array objects. However,
the information on the actual data distribution can be obtained and
taken advantage of whenever data locality is important.
The Global Arrays tool has been designed to complement the message-passing
programming model. The developer can use both shared memory and message
passing paradigms in the same program, to take advantage of existing
message-passing software libraries such as TCGMSG. This tool is also
compatible with the Message Passing Interface (MPI). The Global Arrays toolkit
has been in the public domain since 1994 and is actively supported. Additional
documentation and information on performance and applications is available
on the web site /docs/global/.
Currently support is limited to 2-D double precision or integer arrays
with block distribution, at most one block per array per processor.
Available global (GA)
and local (MA) memory can interact within NWChem in only two ways:
- GA is allocated within MA, and GA is limited only by the available
space in MA.
- GA is not allocated within MA, and GA is limited at initialization
(within NWChem input this is controlled by the MEMORY directive).
If GA is allocated within MA, then
the available GA space is limited to the currently available MA space. This
also means that the total allocatable memory for GA and MA must be
no more than the available MA space.
If GA is not allocated within MA, then local and global arrays occupy essentially
independent space. The allocatable memory for GA is limited only by the available
space for GA, and similarly, the allocatable memory for MA is limited only
by the available local memory.
When allocating space for GA,
some care must be exercised in the treatment of the information returned by
the routine ga_memory_avail(), whether or not
the allocation is done in MA. The routine ga_memory_avail()
returns the amount of memory (in bytes)
available for use by GA in the calling process.
This returned value must be converted to double precision words when
using double precision.
If a uniformly distributed GA is desired, it is also necessary to find
the minimum of this value across all nodes. This value will in general be
a rather large number.
When running on a platform with many nodes and having a large memory,
the aggregate GA memory, even in double precision words, could be a large enough
value to overflow a
32-bit integer. Therefore, for calculations that require knowing the size of
total memory, it is advisable to first store the size of memory on each node
in a double precision
number and then sum these values across all the nodes.
The following pseudo-code illustrates this process for an application.
#include "global.fh"
#include "mafdecls.fh"
integer avail_ma, avail_ga
avail_ma = ma_inquire_avail(mt_dbl)
avail_ga = ga_memory_avail()/ma_sizeof(mt_dbl,1,mt_byte)
if (ga_uses_ma()) then
c
c available GA space is limited to currently available MA space,
c and GA and MA share the same space
c
allocatable_ga + allocatable_ma <= avail_ma = avail_ga
else
c
c GA and MA are independent
c
allocatable_ga <= avail_ga
allocatable_ma <= avail_ma
endif
c
c find the minimum value of available GA space over all nodes
c
call ga_igop(msgtype,avail_ga,1,'min')
c
c determine the total available GA space
c
double precision davail_ga
davail_ga = ga_memory_avail()/ma_sizeof(mt_dbl,1,mt_byte)
call ga_dgop(msgtype,davail_ga,1,'+')
The following routines are invoked for operations that are globally collective.
That is, they must be
simultaneously invoked by all processes as if in SIMD mode.
- ga_initialize_() -- initialize global array internal
structures
- ga_initialize_ltd(mem_limit) -- initialize global arrays and set
memory usage limits
- integer mem_limit -- [input] GA total memory (specifying a value less than 0
means "unlimited memory")
- ga_create(type,dim1,dim2,array_name,chunk1,chunk2,g_a) -- create an array
- integer type -- [input] MA type
- integer dim1, dim2 -- [input] array dimensions (dim1,dim2) as in FORTRAN
- character array_name -- [input] unique character string identifying the array
- integer chunk1, chunk2 -- [input] minimum size that dimensions should
be chunked up into;
setting chunk1=dim1 gives distribution by rows
setting chunk2=dim2 gives distribution by columns
Actual chunk sizes are modified so that they are
at least the min size and each process has either
zero or one chunk.
(Specifying both as less than or equal to 1
yields an even distribution)
- integer g_a -- [output] integer handle for future references
- ga_create_irreg(type, dim1, dim2, array_name, map1, nblock1,
map2, nblock2, g_a) -- create an array with irregular
distribution
- integer type -- [input] MA type
- integer dim1, dim2 -- [input] array dimensions (dim1,dim2) as in FORTRAN
- character array_name -- [input] unique character string identifying the array
- integer map1 -- [input] array of starting indices ilo for each block
- integer nblock1 -- [input] number of blocks dim1 is divided into
- integer map2 -- [input] array of starting indices jlo for each block
- integer nblock2 -- [input] number of blocks dim2 is divided into
- integer g_a -- [output] integer handle for future references
- ga_duplicate(g_a, g_b, array_name) -- create an array with same properties as reference
array
- character array_name -- [input] unique character string identifying the array
- integer g_a -- [input] integer handle of reference array
- integer g_b -- [output] integer handle for new array
- ga_destroy_(g_a) -- destroy an array
- integer g_a -- [input] integer handle of array to be destroyed
- ga_terminate_() -- destroys all existing global arrays
and de-allocates shared memory
- ga_sync_() -- synchronizes all processes (a barrier)
- ga_zero_(g_a) -- zero an array
- integer g_a -- [input] integer handle of array to be zeroed
- ga_ddot_(g_a, g_b) -- dot product of two arrays (double precision only)
- integer g_a -- [input] integer handle of first array in dot product
- integer g_b -- [input] integer handle of second array in dot product
- ga_dscal -- scale the elements in an array by a constant
(double precision data only)
- ga_dadd -- scale and add two arrays to put result in a
third (may overwrite one of the other two, doubles only)
- ga_copy(g_a, g_b) -- copy one array into another
- integer g_a -- [input] integer handle of array to be copied
- integer g_b -- [input] integer handle of array g_a is copied into
- ga_dgemm(transa, transb, m, n, k, alpha, g_a, g_b, beta, g_c) --
BLAS-like matrix multiply
- character*1 transa, transb
- integer m, n, k
- double precision alpha, beta
- integer g_a, g_b, g_c
- ga_ddot_patch(g_a, t_a, ailo, aihi, ajlo, ajhi,
g_b, t_b, bilo, bihi, bjlo, bjhi) -- dot product of two arrays (double precision
only; patch version) (Note: patches of different shapes and distributions
are allowed, but not recommended, and both patches must have the same number
of elements)
- integer g_a -- [input] integer identifier of first array containing
patch for dot product
- integer t_a -- [input] transpose of first array
- integer ailo, aihi -- [input] high and low indices for i dimension of
patch of array for dot product
- integer ajlo, ajhi -- [input] high and low indices for j dimension of
patch of array for dot product
- integer g_b -- [input] integer identifier of second array containing
patch for dot product
- integer t_b -- [input] transpose of second array
- integer bilo, bihi -- [input] high and low indices for i dimension of
patch of array for dot product
- integer bjlo, bjhi -- [input] high and low indices for j dimension of
patch of array for dot product
- ga_dscal_patch -- scale the elements in an array by a
constant (patch version)
- ga_dadd_patch -- scale and add two arrays to put result
in a third (patch version)
- ga_ifill_patch -- fill a patch of array with value
(integer version)
- ga_dfill_patch -- fill a patch of array with value
(double version)
- ga_matmul_patch(transa, transb, alpha, beta, g_a, ailo, aihi,
ajlo, ajhi, g_b, bilo, bihi, bjlo, bjhi, g_c, cilo, cihi,
cjlo, cjhi) -- matrix multiply (patch version)
- character transa -- [input] transpose of first array for matrix multiply
- character transb -- [input] transpose of second array for matrix multiply
- double precision alpha -- [input] scale factor for the product of the patches of g_a and g_b
- double precision beta -- [input] scale factor for the patch of g_c
- integer g_a -- [input] integer identifier of first array for matrix multiply
- integer ailo, aihi -- [input] high and low indices for i dimension of
patch of first array for matrix multiply
- integer ajlo, ajhi -- [input] high and low indices for j dimension of
patch of first array for matrix multiply
- integer g_b -- [input] integer identifier of second array for matrix multiply
- integer bilo, bihi -- [input] high and low indices for i dimension of
patch of second array for matrix multiply
- integer bjlo, bjhi -- [input] high and low indices for j dimension of
patch of second array for matrix multiply
- integer g_c -- [input] integer identifier of resultant array for matrix multiply
- integer cilo, cihi -- [input] high and low indices for i dimension of
patch of resultant array for matrix multiply
- integer cjlo, cjhi -- [input] high and low indices for j dimension of
patch of resultant array for matrix multiply
- ga_diag(g_a, g_s, g_v, eval) -- real symmetric generalized eigensolver
(sequential version
ga_diag_seq
also exists)
- integer g_a -- matrix to diagonalize
- integer g_s -- metric
- integer g_v -- global matrix to return evecs
- double precision eval(*) -- local array to return evals
- ga_diag_reuse(reuse,g_a,g_s,g_v,eval) -- a
version of ga_diag for repeated use
- integer reuse -- allows reuse of factorized g_s: flag is
0 first time, greater than 0 for
subsequent calls, less than 0
deletes factorized g_s
- integer g_a -- matrix to diagonalize
- integer g_s -- metric
- integer g_v -- global matrix to return evecs
- double precision eval(*) -- local array to return evals
- ga_diag_std(g_a, g_v, eval) -- standard real symmetric eigensolver
(sequential version also exists)
- integer g_a -- [input] matrix to diagonalize
- integer g_v -- [output] global matrix to return evecs
- double precision eval(*) -- [output] local array to return evals
- ga_symmetrize(g_a) -- symmetrizes matrix A into 0.5(A+A') (NOTE: diag(A)
remains unchanged.)
- integer g_a -- [input] matrix to symmetrize
- ga_transpose(g_a) -- transpose a matrix
- integer g_a -- [input] matrix to transpose
- ga_lu_solve(trans, g_a, g_b) -- solves system of linear equations based
on LU factorization (sequential version
ga_lu_solve_seq
also exists)
- character*1 trans -- [input] transpose or not
- integer g_a -- [input] coefficient matrix A
- integer g_b -- [output] rhs matrix B, overwritten on exit
by the solution vector, X of AX = B
- ga_print_patch(g_a, ilo, ihi, jlo, jhi, pretty) -- print a patch of an array to the
screen
- integer g_a -- [input] integer identifier of array to be printed
- integer ilo, ihi -- [input] high and low indices for i dimension of patch
of array to be printed
- integer jlo, jhi -- [input] high and low indices for j dimension of patch
of array to be printed
- integer pretty -- [input] flag for format of output to screen;
- pretty = 0, spew output out with no formatting
- pretty = 1, format output so that it is readable
- ga_print(g_a) -- print an entire array to the screen
- integer g_a -- [input] integer identifier of array to be printed
- ga_copy_patch(trans, g_a, ailo, aihi, ajlo, ajhi, g_b, bilo,
bihi, bjlo, bjhi) -- copy data from a patch of one global
array into another array (Note: the patch can change shape, but the total number of elements
must be the same between the two arrays)
- character*1 trans -- [input] transpose or not
- integer g_a -- [input] integer identifier of array to be copied
- integer ailo, aihi -- [input] high and low indices for i dimension of
patch of array
to be copied
- integer ajlo, ajhi -- [input] high and low indices for j dimension of
patch of array
to be copied
- integer g_b -- [output] integer identifier of array data is to be
copied into
- integer bilo, bihi -- [input] high and low indices for i dimension of patch
of array being copied into
- integer bjlo, bjhi -- [input] high and low indices for j dimension of patch
of array being copied into
- ga_compare_distr_(g_a, g_b) -- compare distributions of two global
arrays
- integer g_a -- [input] integer identifier of first array
- integer g_b -- [input] integer identifier of second array
Operations that may be invoked by any process in true MIMD style:
- ga_get_(g_a, ilo, ihi, jlo, jhi, buf, ld) -- read from a patch of an array
- integer g_a -- [input] integer handle of array
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- double precision buf -- [output] local buffer that receives the patch
- integer ld -- [input] leading dimension of buf
- ga_put_(g_a, ilo, ihi, jlo, jhi, buf, ld) -- write to a patch of an array
- integer g_a -- [input] integer handle of array
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- double precision buf -- [input] local buffer containing the data to be written
- integer ld -- [input] leading dimension of buf
- ga_acc_(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha) -- accumulate into a patch of an array (double
precision only)
- integer g_a -- [input] integer handle of array
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- double precision buf -- [input] local buffer containing the data to be accumulated
- integer ld -- [input] leading dimension of buf
- double precision alpha -- [input] scale factor (the patch is incremented by alpha times buf)
- ga_scatter_(g_a, v, i, j, nv) -- scatter elements of v into an array
- integer g_a -- [input] integer handle of array that elements of v are to be scattered into
- double precision v -- [input] local array of values to be scattered
- integer i, j -- [input] local arrays of element indices (i,j) as in FORTRAN
- integer nv -- [input] number of elements to be scattered
- ga_gather_(g_a, v, i, j, nv) -- gather elements of array g_a into local array v
- integer g_a -- [input] integer handle of array from which elements are to be gathered
- double precision v -- [output] local array that receives the gathered elements
- integer i, j -- [input] local arrays of element indices (i,j) as in FORTRAN
- integer nv -- [input] number of elements to be gathered
- ga_read_inc_(g_a, i, j, inc) -- atomically read and increment the value
of a single array element (integers only)
- integer g_a -- [input] integer handle of array
- integer i, j -- [input] array element indices (i,j) as in FORTRAN
- integer inc -- [input] amount to increment array element value
- ga_locate(g_a,i,j,owner) -- determine which process `holds' an array
element (i,j)
- integer g_a -- [input] integer handle of array
- integer i, j -- [input] array element indices (i,j) as in FORTRAN
- integer owner -- [output] index number of processor holding the element
- ga_locate_region_(g_a, ilo, ihi, jlo, jhi, map, np) --
determine which processes `hold' an
array section
- integer g_a -- [input] integer handle of array
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- integer map -- [output] array describing the blocks of the region and the processes that hold them
- integer np -- [output] number of processes that hold a piece of the region
- ga_error(string, icode) -- print error message and terminate the
program
- character string -- [input] error message to be printed
- integer icode -- [input] integer flag for error code
- ga_summarize(verbose) -- print information about all
allocated arrays (note: assumes no more than 100 arrays are allocated and
are numbered -1000, -999, etc.)
- integer verbose -- [input] if non-zero, print distribution information
Operations that may be invoked by any process in true MIMD style and
are intended to support writing of new functions:
- ga_distribution_(g_a, me, ilo, ihi, jlo, jhi) -- find coordinates of the array patch
that is `held' by a processor
- integer g_a -- [input] integer handle of array
- integer me -- [input] index number of processor holding the patch
- integer ilo, ihi -- [output] high and low indices for i dimension of region
- integer jlo, jhi -- [output] high and low indices for j dimension of region
- ga_access(g_a, ilo, ihi, jlo, jhi, index, ld) -- provides access to a patch of a global array
- integer g_a -- [input] integer handle of array to be accessed
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- integer index -- [output] MA-style index to the local data of the patch
- integer ld -- [output] leading dimension of the patch
- ga_release(g_a, ilo, ihi, jlo, jhi) -- relinquish access to internal data
- integer g_a -- [input] integer handle of array to be released
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- ga_release_update_(g_a, ilo, ihi, jlo, jhi) -- relinquish access after data were
updated
- integer g_a -- [input] integer handle of array to be updated and released
- integer ilo, ihi -- [input] high and low indices for i dimension of region
- integer jlo, jhi -- [input] high and low indices for j dimension of region
- ga_check_handle(g_a, fstring) -- verify that a GA handle is valid
- integer g_a -- [input] integer handle of array
- character* fstring -- [input] name of routine originating the check
Operations to support portability between implementations:
- ga_nodeid_() -- find requesting compute process message id
- ga_nnodes_() -- find number of compute processes
- ga_dgop(type, x, n, op) -- equivalent to TCGMSG dgop, for use in data-server
mode where only compute processes participate
- integer type -- [input] message type id
- integer n -- [input] number of elements in x
- double precision x -- [input/output] array of values, overwritten with the result
- character op -- [input] operation to be performed
- ga_igop(type, x, n, op) -- equivalent to TCGMSG igop, for use in data-server mode
where only compute processes participate; performs the operation specified by the input variable
op (supported operations include addition, multiplication, maximum, minimum,
and maximum or minimum of the absolute
value), and returns the value in x.
- integer type -- [input] message type id
- integer n -- [input] number of elements in x
- integer x -- [input/output] array of values, overwritten with the result
- character op -- [input] operation to be performed
- ga_brdcst(type, buf, len, originator) -- equivalent to TCGMSG brdcst, for use in data server mode
with predefined communicators
- integer type -- [input] message type (as in TCGMSG)
- any buf -- [input/output] data buffer
- integer len -- [input] length of the buffer, in bytes
- integer originator -- [input] number of the originating processor
Other utility operations:
- ga_inquire_(g_a, atype, adim1, adim2) -- find the type and
dimensions of the array
- integer g_a -- [input] integer identifier of array
- integer atype -- [output] MA type
- integer adim1, adim2 -- [output] array dimensions (adim1,adim2) as in FORTRAN
- ga_inquire_name_(g_a, array_name) -- find the name of the array
- integer g_a -- [input] integer identifier of array
- character* array_name -- [output] string containing name of the array
- ga_inquire_memory_() -- find the amount of memory in
active arrays
- ga_memory_avail_() -- find the amount of memory (in bytes) left for
GA
- ga_summarize(verbose) -- prints summary info about allocated
arrays
- integer verbose -- [input] if non-zero, print distribution information
- ga_uses_ma_() -- finds if memory in arrays comes from MA
(memory allocator)
- ga_memory_limited_() -- finds if limits were set for
memory usage in arrays
Note that consistency is only guaranteed for
- Multiple read operations (as the data does not change)
- Multiple accumulate operations (as addition is commutative)
- Multiple disjoint put operations (as there is only one writer
for each element)
The application has to worry about everything else (usually by
appropriate insertion of ga_sync calls).
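The consistency rules above can be illustrated with a short sketch (plain Python, not the GA API): because addition is commutative, concurrent accumulates yield the same result in every interleaving, whereas an unsynchronized read-modify-write (get, modify, put) can lose an update — exactly the case that requires ga_sync.

```python
import itertools

# Several "processes" each accumulate into the same element of a global
# array.  Every interleaving of commutative additions gives the same
# final value, which is why GA guarantees consistency for accumulates.
contributions = [2.0, 3.0, 5.0]

finals = set()
for order in itertools.permutations(contributions):
    elem = 0.0
    for c in order:
        elem += c            # models ga_acc on one element
    finals.add(elem)
assert len(finals) == 1      # identical result for every ordering

# By contrast, read-modify-write (ga_get, local update, ga_put) is NOT
# safe without synchronization: two processes that both read the old
# value and put back their own sum lose one of the updates.
elem = 0
snap_a = elem                # process A: ga_get
snap_b = elem                # process B: ga_get, before A's put lands
elem = snap_a + 2            # process A: ga_put
elem = snap_b + 3            # process B: ga_put overwrites A's update
print(elem)                  # 3, not 5 -- a lost update
```

The second half is precisely the pattern that must be bracketed by ga_sync (or replaced by ga_acc) in a real GA program.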
Subroutines that appear in the files of directory .../src/global/src, but
are not in the (ga.tex) document:
- ga_get_local(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc) -- local
read of a 2-dimensional patch of data from a global array
- ga_get_remote(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc) -- read an
array patch from a remote processor
- ga_put_local(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc) -- local
write of a 2-dimensional patch of data into a global array
- ga_put_remote(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc) -- write an
array patch to a remote processor
- ga_acc_local(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc, alpha) -- local
accumulate of a 2-dimensional patch of data into a global array
- ga_acc_remote(g_a, ilo, ihi, jlo, jhi, buf, offset, ld, proc, alpha) -- accumulate an
array patch on a remote processor
- ga_scatter_local(g_a, v, i, j, nv, proc) -- local
scatter of v into a global array
- ga_scatter_remote(g_a, v, i, j, nv, proc) -- scatter of v into an
array patch on a remote processor
- ga_gather_local(g_a, v, i, j, nv, proc) -- local
gather of a global array patch into v
- ga_gather_remote(g_a, v, i, j, nv, proc) -- gather of an
array patch on a remote processor into v
- ga_dgop_clust(type, x, n, op, group) -- equivalent to TCGMSG dgop, for use in data-server
mode where only compute processes participate
- ga_igop_clust(type, x, n, op, group) -- equivalent to TCGMSG igop, for use in data-server
mode where only compute processes participate
- ga_brdcst_clust(type, buf, len, originator, group) -- internal GA routine
that is used in data server mode with predefined communicators
- ga_debug_suspend() -- suspend execution of a particular process (e.g., so that a debugger can be attached)
- ga_copy_patch_dp(t_a, g_a, ailo, aihi, ajlo, ajhi,
g_b, bilo, bihi, bjlo, bjhi) -- copy a patch by column order (Fortran convention)
- ga_print_stats_() -- print GA statistics for each process
- ga_zeroUL(uplo, g_A) -- set to zero the L/U triangle part of an NxN
double precision global array A
- ga_symUL(uplo, g_A) -- make a symmetric square matrix from a
double precision global array A in L/U triangle format
- ga_llt_s(uplo, g_A, g_B, hsA) -- solves a system of linear equations [A]X = [B]
- ga_cholesky(uplo, g_a) -- computes the Cholesky factorization of an NxN
double precision symmetric positive definite matrix to obtain the L/U factor
on the lower/upper triangular part of the matrix
- ga_llt_f(uplo, g_A, hsA) -- computes the Cholesky factorization of an NxN
double precision symmetric positive definite global array A
- ga_llt_i(uplo, g_A, hsA) -- computes the inverse of a global array that is
the lower triangular L or the upper triangular Cholesky factor U of an NxN
double precision symmetric positive definite global array (LL' or U'U)
- ga_llt_solve(g_A, g_B) -- solves a system of linear equations [A]X = [B]
using the Cholesky factorization of an NxN
double precision symmetric positive definite global array A
- ga_spd_invert(g_A) -- computes the inverse of a double precision array
using the Cholesky factorization of an NxN
double precision symmetric positive definite global array A
- ga_solve(g_A, g_B) -- solves a system of linear equations [A]X = [B], trying
first to use the Cholesky factorization routine ga_llt_solve; if not
successful, calls the LU factorization routine and solves the system with
forward/backward substitution
- ga_ma_base_address(type, address) -- auxiliary routine to provide MA
base addresses of the data (calls the C routine ga_ma_get_ptr())
- ga_ma_sizeof(type) -- auxiliary routine to provide MA
sizes of the arrays (calls the C routine ga_ma_diff())
In some cases (notably workstation clusters) the global array tools
use a ``data-server'' process on each node in addition to the compute
processes. Data-server processes do not follow the same flow of
execution as compute processes, so TCGMSG global operations
(brdcst, igop, and dgop) will hang when invoked.
The global array toolkit provides ``wrapper'' functions
(ga_brdcst, ga_igop, and ga_dgop) which properly
exclude data server processes from the global communication and must
be used instead of the corresponding TCGMSG functions.
The limited buffering available on the IBM SP-1/2 means that GA and
message-passing operations cannot interleave as readily as they do on
other machines. Basically, in transitioning from GA to message
passing or vice versa the application must call ga_sync().
ChemIO is a high-performance parallel I/O abstract programming interface for
computational chemistry applications.
The development of out-of-core methods for
computational chemistry requires efficient and portable implementation of often complex
I/O patterns. The ChemIO interface addresses this problem by providing
high-performance implementations on multiple platforms that hide some of the
complexity of the underlying I/O patterns from the programmer through the use of
high-level libraries. The interface is tailored to the requirements of
large-scale computational chemistry problems and supports three distinct
I/O models. These are
- Disk Resident Arrays (DRA) -- for explicit transfer between global
memory and secondary storage, allowing the programmer to manage the movement of array
data structures between local memory, remote memory, and disk storage. This component
supports collective I/O operations, in which multiple processors cooperate in
a read or write operation and thereby enable certain useful optimizations.
- Exclusive Access Files (EAF) -- for independent I/O to and from
scratch files maintained on a per-processor basis. It is used for out-of-core
computations in calculational modules that cannot easily be organized to perform collective I/O
operations.
- Shared Files (SF) -- for creation of a scratch file that can be
shared by all processors. Each processor can perform noncollective read or write
operations to an arbitrary location in the file.
These models are implemented in three user-level libraries in ChemIO: Disk Resident
Arrays, Exclusive Access Files, and Shared Files. These libraries are layered on
a device library, the Elementary I/O library (ELIO), which provides a portable
interface to different file systems. The DRA, EAF, and SF modules are fully
independent. Each one can be modified or even removed without affecting the others.
ELIO itself is not exposed to applications.
The ELIO library implements a set of elementary I/O primitives including blocking and
non-blocking versions of read and write operations, as well as wait and probe operations to
control status of non-blocking read/writes. It also implements file operations such
as open, close, delete, truncate, end-of-file detection, and an inquiry function
for the file/filesystem that returns the amount of available space and the filesystem
type. Most of these operations are commonly seen in various flavors of the UNIX
filesystem. ELIO provides an abstract portable interface to such functionality.
(Insert gory details here.)
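The ELIO primitive set described above can be sketched in miniature (hypothetical Python names; the real ELIO is a C library): blocking read/write at absolute offsets, plus a non-blocking write whose completion is controlled with probe/wait, here built on a thread pool and POSIX positional I/O.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Toy ELIO-like layer.  Function names are illustrative, not the real
# ELIO API; os.pread/os.pwrite stand in for the per-filesystem backends.
_pool = ThreadPoolExecutor(max_workers=4)

def elio_write(fd, offset, data):        # blocking write at an offset
    return os.pwrite(fd, data, offset)

def elio_read(fd, offset, nbytes):       # blocking read at an offset
    return os.pread(fd, nbytes, offset)

def elio_awrite(fd, offset, data):       # non-blocking write -> request
    return _pool.submit(os.pwrite, fd, data, offset)

def elio_probe(req):                     # has the request completed?
    return req.done()

def elio_wait(req):                      # block until the request completes
    return req.result()

path = tempfile.mktemp()
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
req = elio_awrite(fd, 0, b"hello")
elio_wait(req)                           # request complete; data is readable
data_back = elio_read(fd, 0, 5)
assert data_back == b"hello"
os.close(fd)
os.remove(path)
```

The wait/probe pair is the essential contract: once wait returns (or probe reports completion), the buffer may be reused and the data observed by subsequent reads.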
The computational chemistry parallel algorithms in NWChem have been implemented in terms
of the Global Arrays shared memory programming model. The GA library (see Section
6.1.3) uses a shared memory programming model in which data locality
is managed explicitly by the programmer. This management is achieved by explicit
calls to functions that transfer data between a global address space (a distributed
array) and local storage. The GA library allows each process in a MIMD parallel
program to access asynchronously logical blocks of physically distributed matrices
without the need for explicit cooperation from other processes.
The GA model exposes to the programmer the non-uniform memory access (NUMA) characteristics
of modern high-performance computer systems. The disk resident array (DRA) model extends
the GA model to another level in the storage hierarchy, namely, secondary storage. It
introduces the concept of a disk resident array -- a disk-based representation of an
array -- and provides functions for transferring blocks of data between global arrays
and disk arrays. It allows the programmer to access data located on disk via a
simple interface expressed in terms of arrays rather than files.
At the present time, (NOTE: The source of this statement is a document
created 5/10/95) all operations are declared to be collective.
This simplifies implementation on machines where only some processors
are connected to I/O devices.
Except where stated otherwise, all operations are synchronous (blocking)
which means that control is returned to the calling process only after
the requested operation completes.
All operations return an error code with value 0 if successful, greater than
zero if not successful.
A program that uses Disk Resident Arrays should look like the following example:
program foo
#include "mafdecls.h"
#include "global.fh"
#include "dra.fh"
c
call pbeginf() ! initialize TCGMSG
if(.not. ma_init(...)) ERROR ! initialize MA
call ga_initialize() ! initialize Global Arrays
if(dra_init(....).ne.0) ERROR ! initialize Disk Arrays
c do work
if(dra_terminate().ne.0)ERROR ! destroy DRA internal data structures
call ga_terminate ! terminate Global Arrays
call pend() ! terminate TCGMSG
end
List of DRA operations:
- status = dra_init(max_arrays, max_array_size, total_disk_space, max_memory) --
initializes disk resident array I/O subsystem;
max_array_size, total_disk_space and max_memory are given
in bytes;
max_memory specifies how much local memory per processor the
application is willing to provide to the DRA I/O subsystem for
buffering.
The value of "-1" for any of input arguments means:
"don't care", "don't know", or "use defaults"
- integer max_arrays -- [input]
- double precision max_array_size -- [input]
- double precision total_disk_space -- [input]
- double precision max_memory -- [input]
- status = dra_terminate() --
closes all open disk resident arrays and shuts down
DRA I/O subsystem.
- status = dra_create(type,dim1,dim2,name,filename,mode,rdim1,rdim2,d_a) --
creates a new disk resident array with the specified dimensions
and type.
(Note: Only one DRA object can be stored in the DRA meta-file identified by
filename.
DRA objects persist on disk after calling dra_close();
dra_delete() should be used instead of dra_close() to delete the disk
array and its associated meta-file.
The disk array is implicitly initialized to "0".)
- integer type -- [input] MA type identifier
- integer dim1 -- [input]
- integer dim2 -- [input]
- character*(*) name -- [input]
- character*(*) filename -- [input]
name of the abstract
meta-file that will store the data on the disk
- integer mode -- [input] specifies access permissions
as read, write, or read-and-write
- integer rdim1,rdim2 -- [input]
specifies dimensions of a
"typical" request; value of "-1" for either rdim1 or rdim2
means "unspecified"
- integer d_a -- [output] DRA handle
- status = dra_open(filename, mode, d_a) --
Open and assign DRA handle to disk resident array stored in DRA
meta-file filename. Disk arrays that are created
with dra_create and saved by calling dra_close can be
later opened and accessed by the same or different
application.
- character*(*) filename -- [input]
name of the abstract
meta-file that stores the data on the disk
- integer mode -- [input] specifies access permissions
as read, write, or read-and-write
- integer d_a -- [output] DRA handle
- status = dra_write(g_a, d_a, request) --
writes asynchronously specified global array to specified
disk resident array;
dimensions and type of g_a and d_a must match. If dimensions
don't match, dra_write_section should be used instead.
The operation is by definition asynchronous (but could
be implemented as synchronous i.e., it would return only
when I/O is done.)
- integer g_a -- [input] GA handle
- integer d_a -- [input] DRA handle
- integer request -- [output] request id
- status = dra_write_section(transp, g_a, gilo, gihi, gjlo, gjhi,
d_a, dilo, dihi, djlo, djhi, request) --
writes asynchronously specified global array section to
specified disk resident array section:
OP(g_a[ gilo:gihi, gjlo:gjhi]) -> d_a[ dilo:dihi, djlo:djhi],
where OP is the transpose operator (.true./.false.).
Returns an error if the two sections' types or sizes mismatch.
See dra_write specs for discussion of request.
- logical transp -- [input] transpose operator
- integer g_a -- [input] GA handle
- integer d_a -- [input] DRA handle
- integer gilo -- [input]
- integer gihi -- [input]
- integer gjlo -- [input]
- integer gjhi -- [input]
- integer dilo -- [input]
- integer dihi -- [input]
- integer djlo -- [input]
- integer djhi -- [input]
- integer request -- [output] request id
- status = dra_read(g_a, d_a, request) --
reads asynchronously specified global array from specified
disk resident array;
Dimensions and type of g_a and d_a must match; if dimensions
don't match, dra_read_section could be used instead.
See dra_write specs for discussion of request.
- integer g_a -- [input] GA handle
- integer d_a -- [input] DRA handle
- integer request -- [output] request id
- status = dra_read_section(transp, g_a, gilo, gihi, gjlo, gjhi,
d_a, dilo, dihi, djlo, djhi, request) --
reads asynchronously specified global array section from
specified disk resident array section:
OP(d_a[ dilo:dihi, djlo:djhi]) -> g_a[ gilo:gihi, gjlo:gjhi]
where OP is the transpose operator (.true./.false.).
See dra_write specs for discussion of request.
- logical transp -- [input] transpose operator
- integer g_a -- [input] GA handle
- integer d_a -- [input] DRA handle
- integer gilo -- [input]
- integer gihi -- [input]
- integer gjlo -- [input]
- integer gjhi -- [input]
- integer dilo -- [input]
- integer dihi -- [input]
- integer djlo -- [input]
- integer djhi -- [input]
- integer request -- [output] request id
- status = dra_probe(request, compl_status) --
tests for completion of dra_write/read or
dra_write/read_section operation which sets the value
passed in request argument;
completion status is 0 if the operation has been completed, non-zero
if not done yet
- integer request -- [input] request id
- integer compl_status -- [output] completion status
- status = dra_wait(request) --
blocks the calling process until completion of the dra_write/read or
dra_write/read_section operation that set the value
passed in the request argument.
- integer request -- [input] request id
- status = dra_inquire(d_a, type, dim1, dim2, name, filename) --
returns dimensions, type, name of disk resident array,
and filename of DRA meta-file associated with d_a
handle.
- integer d_a -- [input] DRA handle
- integer type -- [output]
- integer dim1 -- [output]
- integer dim2 -- [output]
- character*(*) name -- [output]
- character*(*) filename -- [output]
- status = dra_delete(d_a) --
deletes a disk resident array associated with d_a handle.
Invalidates handle.
The corresponding DRA meta-file is destroyed.
- integer d_a -- [input] DRA handle
- status = dra_close(d_a) --
closes DRA meta-file associated with d_a handle and
deallocates data structures corresponding to this disk
array. Invalidates d_a handle. The array on the disk is
persistent.
- integer d_a -- [input] DRA handle
- subroutine dra_flick() --
returns control to DRA for a VERY short time to improve
progress of pending asynchronous operations.
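The index arithmetic behind section transfers such as dra_read_section can be illustrated with a toy sketch (plain Python, not the DRA library): a dim1 x dim2 array of doubles stored column-major (the Fortran convention) in a file, and a helper that reads back an arbitrary [ilo:ihi, jlo:jhi] section by computing per-element byte offsets.

```python
import os
import struct
import tempfile

# Toy disk-resident array: DIM1 x DIM2 doubles, column-major on disk.
DIM1, DIM2 = 4, 3
path = tempfile.mktemp()

# "dra_write": dump the whole array a[i][j] = 10*i + j (1-based indices).
with open(path, "wb") as f:
    for j in range(1, DIM2 + 1):         # column-major: j varies slowest
        for i in range(1, DIM1 + 1):
            f.write(struct.pack("d", 10.0 * i + j))

def read_section(ilo, ihi, jlo, jhi):
    """Read d_a[ilo:ihi, jlo:jhi] as a dict keyed by (i, j)."""
    out = {}
    with open(path, "rb") as f:
        for j in range(jlo, jhi + 1):
            for i in range(ilo, ihi + 1):
                # column-major byte offset of element (i, j), 8-byte doubles
                offset = ((j - 1) * DIM1 + (i - 1)) * 8
                f.seek(offset)
                out[(i, j)] = struct.unpack("d", f.read(8))[0]
    return out

sec = read_section(2, 3, 1, 2)           # a 2x2 section of the disk array
assert sec[(2, 1)] == 21.0 and sec[(3, 2)] == 32.0
os.remove(path)
```

A real DRA implementation transfers whole aligned blocks rather than single elements, which is why dra_create accepts the "typical request" dimensions rdim1/rdim2 as a layout hint.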
The EAF module supports a particularly simple I/O abstraction in which each processor
in a program is able to create files that it alone has access to. The EAF interface
is similar to the standard C UNIX I/O interface and is implemented as a thin
wrapper on the ELIO module. It provides Fortran and C applications with capabilities
that include
- eaf_write and eaf_read -- blocking write and read operations
- eaf_awrite and eaf_aread -- non-blocking (asynchronous)
write and read operations
- eaf_wait and eaf_probe -- operations that can be used to control
or determine completion status of outstanding nonblocking I/O requests
- eaf_stats -- operation that takes a full path to a file or directory and
returns the amount of disk space available and the filesystem type (e.g., PFS, PIOFS,
standard UNIX, etc.)
- eaf_length and eaf_truncate -- operations that allow the programmer
to determine the length of a file and truncate a file to a specified length
- eaf_eof -- operation that determines whether the end of the file has been reached
- eaf_open, eaf_close, and eaf_delete -- functions that interface to
UNIX open, close, and unlink operations
The syntax of EAF is similar to the standard Unix C
file operations, although there are some differences, as a result of
introducing new semantics or extended features
available through EAF.
The primary functionality of EAF is illustrated here by
tracing execution of example program segments.
Example 1:
basic open-write-read-close sequence.
#include "chemio.h"
#include "eaf.fh"
integer fh ! File Handle
integer sz ! Return value of size written
integer stat ! Return status
integer buf(100) ! Data to write
fh = EAF_OpenPersist('/tmp/test.out', ELIO_RW)
sz = EAF_Write(fh, 0, buf, 100*EAF_SZ_INT)
if(sz .ne. 100*EAF_SZ_INT)
$ write(0,*) 'Error writing, wrote ', sz, ' bytes'
sz = EAF_Read(fh, 0, buf, 100*EAF_SZ_INT)
if(sz .ne. 100*EAF_SZ_INT)
$ write(0,*) 'Error reading, read ', sz, ' bytes'
stat = EAF_Close(fh)
end
The include file 'chemio.h' defines the permission macros ELIO_R, ELIO_W, and
ELIO_RW for read, write, and read-write permissions, respectively. The
header file 'eaf.fh' is a Fortran program segment externally defining the EAF
routines and must appear before any executable code using EAF.
EAF_OpenPersist opens a persistent file, as opposed to a scratch file
(EAF_OpenScratch) which is deleted when it is closed. This file is named
'/tmp/test.out' and has read-write permissions. The returned value is the
file handle for this file and should not be directly manipulated by the user.
EAF_Write writes to the file opened with file handle, fh, at absolute offset
0. It is legal to write a scalar or array, for instance in the above
example both 'buf' and 'buf(1)' have the same meaning. The last argument is
the number of bytes to be written. It is important to multiply the number of
array elements by the element size. The following macros are provided in
'eaf.fh':
- EAF_SZ_BYTE
- EAF_SZ_CHARACTER
- EAF_SZ_INTEGER
- EAF_SZ_LOGICAL
- EAF_SZ_REAL
- EAF_SZ_COMPLEX
- EAF_SZ_DOUBLE_COMPLEX
- EAF_SZ_DOUBLE_PRECISION
The return value is the number of bytes written. If this number does not
match the requested number of bytes to be written, an error has occurred.
Example 2: read/write operations
EAF_Read is syntactically and semantically identical to EAF_Write, except that
the buffer is read, not written.
#include "chemio.h"
#include "eaf.fh"
integer fh ! File Handle
integer id1, id2 ! asynchronous ID handles
integer stat ! Return status
integer pend ! Pending status
integer iter ! Iterations counter
integer buf(100), x ! Data
iter = 0
fh = EAF_OpenScratch('/piofs/mogill/test.out', ELIO_RW)
stat = EAF_AWrite(fh, 0, buf, 100*EAF_SZ_INT, id1)
if(stat .ne. 0) write(0,*) 'Error doing 1st asynch write. stat=', stat
stat = EAF_AWrite(fh, 100*EAF_SZ_INT, x, 1*EAF_SZ_INT, id2)
if(stat .ne. 0) write(0,*) 'Error doing 2nd asynch write. stat=', stat
100 stat = EAF_Probe(id1, pend)
iter = iter + 1
write(0,*) 'Waiting', iter
if(iter .lt. 100 .and. pend .eq. ELIO_PENDING) goto 100
stat = EAF_Wait(id1)
stat = EAF_ARead(fh, 0, buf, 100*EAF_SZ_INT, id1)
if(stat .ne. 0) write(0,*) 'Error doing 1st asynch read. stat=', stat
stat = EAF_Wait(id2)
stat = EAF_AWrite(fh, 100*EAF_SZ_INT, x, 1*EAF_SZ_INT, id2)
if(stat .ne. 0) write(0,*) 'Error doing 2nd asynch write. stat=', stat
stat = EAF_Wait(id2)
stat = EAF_Wait(id1)
stat = EAF_Close(fh)
end
This example demonstrates the use of asynchronous reading and writing. The
entire buffer 'buf' is written to offset 0, the beginning of the file. The file
is simultaneously written to from the scalar x at the position following the
buffer. The positions in the file are determined by the absolute offset argument,
as with the synchronous write.
The first write, id1, is repeatedly probed for completion for 100 tries or
until completion, whichever comes first. The two possible pending statuses
are ELIO_DONE and ELIO_PENDING.
When a completed asynchronous operation is detected with EAF_Wait or
EAF_Probe, the id is invalidated with ELIO_DONE. The following EAF_Wait(id1)
blocks until id1 completes. Using EAF_Probe or EAF_Wait with an invalidated
ID has no effect.
Once id1 is freed, it is reused in the first asynchronous read statement.
The following EAF_Wait blocks for completion and invalidation of id2, which
is then used to asynchronously write the scalar x again.
The EAF_Close deletes the file because it was opened as a scratch file.
List of EAF Functions
- integer EAF_OpenPersist(fname, type) -- opens a persistent file; returns file
handle, or -1 upon error
- character fname -- Character string of a globally unique filename (path may
be fully qualified)
- integer type -- Read write permissions. Legal values are ELIO_W, ELIO_R,
and ELIO_RW
- integer EAF_OpenScratch(fname, type) -- open a scratch file that is automatically
deleted upon close; returns file handle, or -1 upon error
- character fname -- Character string of a globally unique filename (path may
be fully qualified)
- integer type -- Read write permissions. Legal values are ELIO_W, ELIO_R,
and ELIO_RW
- integer EAF_Write(fh, offset, buf, bytes) -- synchronously write to the
file specified by the file handle; returns number of bytes written, or -1 on error
- integer fh - File Handle
- integer offset -- Absolute offset, in bytes, at which to start writing
- any buf -- Scalar or array of data
- integer bytes -- Size of buffer, in bytes
- integer EAF_AWrite(fh, offset, buf, bytes, req_id) --
asynchronously writes to the file specified by the file handle,
and returns a handle to the asynchronous operation;
if there are more than MAX_AIO_REQ asynchronous requests (reading
or writing) pending, the operation is handled in a synchronous
fashion and returns a "DONE" handle.
Returns
0 if successful, -1 if an error occurs.
(On architectures where asynchronous I/O operations are not supported,
all requests are handled synchronously, returning a "DONE" handle.)
- integer fh -- [input] file descriptor
- integer offset -- [input] absolute offset, in bytes, to start writing at
- any buf - [input] scalar or array of data
- integer bytes -- [input] size of buffer, in bytes
- integer req_id -- [output] handle of asynchronous operation
- integer EAF_Read(fh, offset, buf, bytes) --
synchronously reads from the file specified by the file handle;
returns number of bytes read, or -1 if an error occurs
- integer fh -- [input] file descriptor
- integer offset -- [input] absolute offset, in bytes, to start reading at
- any buf -- [output] scalar or array of data
- integer bytes -- [input] size of buffer, in bytes
- integer EAF_ARead(fh, offset, buf, bytes, req_id) --
asynchronously reads from the file specified by the file handle,
and returns a handle to the asynchronous operation.
If there are more than MAX_AIO_REQ asynchronous requests (reading
or writing) pending, the operation is handled in a synchronous
fashion and returns a "DONE" handle.
On architectures where asynchronous I/O operations are not supported,
all requests are handled synchronously, returning a "DONE" handle.
Returns
0 if successful; -1 if an error occurs.
- integer fh -- [input] file descriptor
- integer offset -- [input] absolute offset, in bytes, to start reading at
- any buf -- [output] scalar or array of data
- integer bytes -- [input] size of buffer, in bytes
- integer req_id -- [output] handle of asynchronous operation
- integer EAF_Probe(id, status) --
determines if an asynchronous request is completed or pending;
returns
ELIO_OK if successful, or ELIO_FAIL if not successful;
'status' returns ELIO_PENDING if the asynchronous operation is
not complete, or ELIO_DONE if finished.
When the asynchronous request is complete, the 'id' is invalidated
with ELIO_DONE.
- integer id -- [input] handle of asynchronous request
- integer status -- [output] pending or completed status argument
- integer EAF_Wait(id) --
waits for the completion of the asynchronous request, id;
returns
ELIO_OK if successful, or ELIO_FAIL if not successful;
'id' is invalidated with ELIO_DONE
- integer id -- [input] handle of asynchronous request
- integer EAF_Close(fh) --
closes a file;
returns
ELIO_OK if successful; aborts if not successful
- integer fh -- [input] file handle
The Shared File module supports the abstraction of a single contiguous secondary storage
address space (a "file") that every processor has access to. Processes create and
destroy SF objects in a collective fashion, but all other file I/O operations are
noncollective. A shared file can be thought of as a one-dimensional array of bytes located
in shared memory, except that the library interface is required to actually access the
data.
The library is capable of determining the striping factor and all other internal
optimizations for the "file". The programmer has the option, however, of giving the library a few
helpful hints, to reduce the number of decisions the interface must take care of. These hints
are supplied when the shared file is created, and can be any or all of the following:
- Specify a hard limit (not to be exceeded) for the file size.
- Specify a soft limit for the file size; that is, an estimate of the expected
shared file size, which can be exceeded at run time, if necessary.
- Specify the size of a "typical" request.
Non-collective I/O operations in SF include read, write, and wait operations. Read and write
operations transfer the specified number of bytes between local memory and disk at a
specified offset. The library does not perform any explicit control of consistency
in concurrent accesses to overlapping sections of the shared files. For example, SF
semantics allow a write operation to return before the data transfer is complete.
This requires special care in programs that perform write operations in critical sections,
since unlocking access to a critical section before write completes is unsafe.
To allow mutual exclusion control in access to shared files, the sf_wait function
is provided. It can be used
to enforce completion of the data transfer so that the data can be safely accessed
by another process after access to the critical section is released by the writing
process. The function sf_waitall can be used to force the program to wait for completion
of multiple SF operations specified through an argument array of request identifiers.
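The critical-section hazard described above can be simulated with plain threads (a sketch of the semantics, not the SF library): an "asynchronous write" whose transfer finishes after the lock is released lets another process read stale data, while calling the wait operation (modeled here by a thread join) before unlocking makes the update visible.

```python
import threading
import time

# shared["file"] models a region of the shared file; the lock models the
# application's critical section around it.
shared = {"file": 0}
lock = threading.Lock()

def async_write(value, delay):
    """Models sf_write: returns at once; the transfer completes later."""
    def transfer():
        time.sleep(delay)
        shared["file"] = value
    t = threading.Thread(target=transfer)
    t.start()
    return t                     # the "request id"; join() models sf_wait

# Unsafe: the lock is released before the write completes.
with lock:
    req = async_write(42, delay=0.3)
    # missing sf_wait here!
with lock:                       # another process enters the section
    stale = shared["file"]       # still sees 0 -- the write is in flight
req.join()

# Safe: sf_wait (join) inside the critical section before unlocking.
with lock:
    req = async_write(99, delay=0.1)
    req.join()                   # sf_wait: transfer is complete
with lock:
    fresh = shared["file"]       # guaranteed to observe 99
assert stale == 0 and fresh == 99
```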
The actual size of a shared file might grow as processes perform write operations
beyond the current end-of-file boundary. Data in shared files are implicitly initialized
to zero, which means that read operations at locations that have not been written to
return zero values. However, reading beyond the current end-of-file boundary is an
error.
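The growth, zero-fill, and end-of-file rules above can be modeled in a few lines (a toy byte-array model, not the SF API): writes past the current end grow the file and zero-fill any hole, reads of unwritten-but-in-bounds bytes return zeros, and reads past end-of-file fail.

```python
class ToySharedFile:
    """Byte-array model of an SF file: zero-filled holes, hard EOF."""

    def __init__(self):
        self.data = bytearray()          # end-of-file = len(self.data)

    def write(self, offset, buf):
        end = offset + len(buf)
        if end > len(self.data):         # growing zero-fills the gap
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = buf

    def read(self, offset, nbytes):
        if offset + nbytes > len(self.data):
            raise IOError("read beyond end-of-file")
        return bytes(self.data[offset:offset + nbytes])

sf = ToySharedFile()
sf.write(4, b"ab")                       # file grows to 6 bytes
assert sf.read(0, 4) == b"\0\0\0\0"      # unwritten region reads as zeros
try:
    sf.read(0, 7)                        # past EOF: an error, not zeros
    reached_eof_error = False
except IOError:
    reached_eof_error = True
assert reached_eof_error
```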
Shared files can be used to build other I/O abstractions. In many cases, this process
requires adding an additional consistency control layer. A single file pointer view,
for example, can be implemented by adding an atomically modifiable pointer
variable located in shared memory, using the GA toolkit or some other means.
The shared files model consists of the following elements:
- Shared files are non-persistent (temporary)
- Shared files resemble one-dimensional arrays in main memory
- Each process can independently read/write to any location in the file
- The file size has a hard limit specified when it is created
- User can also specify (or use "don't know" flag) the estimated approximate
file size - might be exceeded at run-time (a hint)
- sf_flush flushes the buffers so that previously written data goes to the disk
before the routine returns.
- All routines return an error code: "0" means success.
- sf_create and sf_destroy are collective
- file, request sizes, and offset (all in bytes) are DOUBLE PRECISION arguments,
all the other arguments are INTEGERS
- read/writes are asynchronous
List of SF Functions:
integer sf_create(fname, size_hard_limit, size_soft_limit, req_size, handle)
fname -- meta-file name
size_hard_limit -- max file size in bytes not to be exceeded (a hint)
size_soft_limit -- estimated file size (a hint)
req_size -- size of a typical request (a hint)
handle -- returned handle to the created file
Creates a shared file using the name and path specified in fname as a template.
The argument req_size specifies the size of a typical request (-1 = "don't know").
integer sf_write(handle, offset, bytes, buffer, request_id)
handle -- file handle returned from sf_create [in]
offset -- location in file (from the beginning)
where data should be written to [in]
buffer -- local array containing the data to be written [in]
bytes -- number of bytes to write [in]
request_id -- id identifying asynchronous operation [out]
asynchronous write operation
integer sf_read(handle, offset, bytes, buffer, request_id)
handle -- file handle returned from sf_create [in]
offset -- location in file (from the beginning)
where data should be read from [in]
buffer -- local array to put the data [out]
bytes -- number of bytes to read [in]
request_id -- id identifying asynchronous operation [out]
asynchronous read operation
integer sf_wait(request_id)
request_id -- id identifying asynchronous operation [in/out]
blocks calling process until I/O operation associated with id completed,
invalidates request_id
integer sf_waitall(list, num)
list(num) -- array of ids for asynchronous operations [in/out]
num -- number of entries in list [in]
blocks calling process until all "num" I/O operations associated with ids
specified in list completed, invalidates ids on the list
integer sf_destroy(handle)
handle -- file handle returned from sf_create [in]
The run time data base is the parameter and information repository for
the independent modules (e.g., SCF, RIMP2) comprising NWChem.
This approach is similar in spirit to the GAMESS
dumpfile or the Gaussian checkpoint file. The only way modules can
share data is via the database or via files, the names of which are stored in
the database (and may have default values). Information is stored
directly in the database as typed arrays, each of which is described by
- a name, which is a simple string of ASCII characters (e.g.,
"reference energies"
),
- the type of the data (real, integer, logical, or character),
- the number of data items, and
- the actual data (an array of items of the specified type).
A database is simply a file and is opened by name. Usually there is
just one database per calculation, though multiple databases may be
open at any instant.
By default, access to all open databases occurs in parallel, meaning
that
- all processes must participate in any read/write of any database
and any such operation has an implied synchronization
- writes to the database write the data associated with process
zero but the correct status of the operation is returned to all
processes
- reads from the database read the data named by process zero and
broadcast the data to all processes, checking dimensions and types
of provided arrays
Alternatively, database operations can occur sequentially. This means
that only process zero can read/write the database, and it does so
with no communication or synchronization with the other processes.
Any read/write operation by a process other than process zero is
an error.
Usually, all processes will want the same data at the same time from
the database, and all processes will want to know of the success or
failure of operations. This is readily done in the default parallel
mode. An exception to this is during the reading of input.
Usually, only process zero will read the input and needs to store the
data directly into the database without involving the other processes.
This is done using sequential mode.
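As a sketch of this input-reading pattern (assuming the database was opened earlier, using ga_nodeid() from the Global Arrays library to identify process zero, and with an illustrative entry name; error checking is omitted):

```fortran
c     sketch: process zero stores input data in sequential mode,
c     then the previous access mode is restored for the rest of the run
      logical rtdb_parallel, rtdb_cput, oldmode, ok
      integer ga_nodeid, rtdb
      oldmode = rtdb_parallel(.false.)
      if (ga_nodeid() .eq. 0) then
c        only process zero may touch the database in sequential mode
         ok = rtdb_cput(rtdb, 'title', 1, 'water dimer')
      endif
      oldmode = rtdb_parallel(oldmode)
```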
The following subsections contain a detailed listing of the C and Fortran API.
Programs using RTDB routines must include the appropriate header file;
rtdb.fh for Fortran, or rtdb.h for C. These files define the return
types for all rtdb functions. In addition, rtdb.fh specifies the
following parameters
- rtdb_max_key -- an integer parameter that defines the maximum
length of a character string key
- rtdb_max_file -- an integer parameter that defines the maximum
length of a file name
The Fortran routines return logical values; .true.
on success, .false.
on failure. The C routines return integers; 1 on success, or 0 on failure.
All rtdb_*
functions are also mirrored by routines rtdb_par_*
in which process 0 performs the operation; all other processes
receive the broadcast result of a read, and their writes are discarded.
The functions that control opening, closing, writing to and reading information
from the runtime database are described in this section.
C routine:
int rtdb_parallel(const int mode)
Fortran routine:
logical function rtdb_parallel(mode)
logical mode [input]
This function sets the parallel access mode of all databases to mode and returns the
previous setting. If mode is true then accesses are in parallel, otherwise they are
sequential.
C routine:
int rtdb_open(const char *filename, const char *mode, int *handle)
Fortran routine:
logical function rtdb_open(filename, mode, handle)
character *(*) filename [input]
character *(*) mode [input]
integer handle [output]
This function opens a database. It requires the following arguments:
- filename -- path to the file associated with the database
- mode -- specifies the initial condition of the database
- new -- Open only if it does not exist already
- old -- Open only if it does exist already
- unknown -- Create a new database or open the existing database Filename (preserving contents)
- empty -- Create a new database or open the existing database Filename (deleting contents)
- scratch -- Create a new database or open the existing database Filename (deleting contents)
that will be automatically deleted upon closing. Note that items
cached in memory are not written to disk when this mode is specified.
- handle -- returns an integer handle which must be used in all future
references to the database
C routine:
int rtdb_close(const int handle, const char *mode)
Fortran routine:
logical function rtdb_close(handle, mode)
integer handle [input]
character*(*) mode [input]
This function closes a database. It requires the following arguments:
- handle -- unique handle created when the database was first opened
- mode -- specifies the fate of the information in the database after closing;
- keep -- Preserve the data base file to enable restart
- delete -- Delete the data base file, freeing all resources
When closing a database file that has been opened with the rtdb_open argument
mode specified as scratch, the value for mode for the function
rtdb_close is automatically set to delete. Database files needed for
restart must not be opened as scratch files.
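Combining rtdb_open and rtdb_close, a restartable calculation might manage its database as in the following sketch. The file name is illustrative, errquit is assumed to be an error-termination utility, and error handling is reduced to a single check.

```fortran
c     sketch: create or reopen a database, use it, and keep it
c     for restart when closing
      logical rtdb_open, rtdb_close, ok
      integer rtdb
      ok = rtdb_open('h2o.db', 'unknown', rtdb)
      if (.not. ok) then
c        could not create or open the database
         call errquit('rtdb_open failed', 0)
      endif
c     ... put/get entries via the routines described below ...
      ok = rtdb_close(rtdb, 'keep')
```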
C routine:
int rtdb_put(const int handle, const char *name, const int ma_type,
const int nelem, const void *array)
Fortran routine:
logical function rtdb_put(handle, name, ma_type, nelem, array)
integer handle [input]
character *(*) name [input]
integer ma_type [input]
integer nelem [input]
<ma_type> array(nelem) [input]
This function inserts an entry into the database, replacing the previous entry.
It requires the following arguments:
- handle -- unique handle created when the database was first opened
- name -- entry name of data array to be put into the database (null-terminated character string)
- ma_type -- MA type of the entry
- nelem -- number of elements of the given type
- array -- array of length nelem containing data to be inserted
C routine:
int rtdb_get(const int handle, const char *name, const int ma_type,
const int nelem, void *array)
Fortran routine:
logical function rtdb_get(handle, name, ma_type, nelem, array)
integer handle [input]
character *(*) name [input]
integer ma_type [input]
integer nelem [input]
<ma_type> array(nelem) [output]
This function gets an entry from the data base. It requires the following arguments:
- handle -- unique handle created when the database was first opened
- name -- entry name of data array to get from the database (null-terminated character string)
- ma_type -- MA type of the entry (which must match entry type in the database)
- nelem -- size of array in units of ma_type
- array -- buffer of length nelem defined by calling routine to store the returned data
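For example, the following sketch stores and later retrieves an array of double precision values. The MA type constant mt_dbl is assumed to come from the MA include file, the database handle from an earlier call to rtdb_open, and error checking is omitted.

```fortran
c     sketch: store three reference energies, then read them back,
c     possibly from a different module of the code
c     (mt_dbl is the MA type constant for double precision data)
      logical rtdb_put, rtdb_get, ok
      integer rtdb
      double precision e(3), echeck(3)
      data e /-76.0d0, -76.2d0, -76.4d0/
      ok = rtdb_put(rtdb, 'reference energies', mt_dbl, 3, e)
c     ... elsewhere in the calculation ...
      ok = rtdb_get(rtdb, 'reference energies', mt_dbl, 3, echeck)
```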
logical function rtdb_cput(handle, name, nelem, buf)
integer handle [input]
character *(*) name [input]
integer nelem [input]
character *(*) buf [input]
logical function rtdb_cget(handle, name, nelem, buf)
integer handle [input]
character *(*) name [input]
integer nelem [input]
character *(*) buf [output]
These functions are Fortran routines that provide put/get functionality for character
variables. The two functions have identical argument lists; the only difference between them is that
rtdb_cput puts the specified character data into the database, while
rtdb_cget copies the data from the database. The arguments are as follows:
- handle -- unique handle created when the database was first opened
- name -- entry name of the data to put into or get from the database (null-terminated character string)
- buf -- character variable to be put into the database (for rtdb_cput), or character
buffer in the calling routine to store the returned character data (for rtdb_cget)
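A character entry can be handled as in this sketch, where the entry name is illustrative, nelem is taken to be 1 for a single string, and error checking is omitted:

```fortran
c     sketch: store a character string and read it back
      logical rtdb_cput, rtdb_cget, ok
      integer rtdb
      character*80 theory
c     ... rtdb obtained earlier from rtdb_open ...
      ok = rtdb_cput(rtdb, 'task:theory', 1, 'scf')
      ok = rtdb_cget(rtdb, 'task:theory', 1, theory)
```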
C routine:
int rtdb_ma_get(const int handle, const char *name, int *ma_type,
int *nelem, int *ma_handle)
Fortran routine:
logical function rtdb_ma_get(handle, name, ma_type, nelem, ma_handle)
integer handle [input]
character *(*) name [input]
integer ma_type [output]
integer nelem [output]
integer ma_handle [output]
This function returns the MA type, the number of elements of that type, and the MA handle
of the specified entry. (The MA handle refers to memory
automatically allocated to hold the data read from the database.) The function requires
the following arguments:
- handle -- unique handle created when the database was first opened
- name -- entry name of information to get from the database (null-terminated character string)
- ma_type -- returns MA type of the entry in the database
- nelem -- returns number of elements of type ma_type in data
- ma_handle -- returns MA handle to data
C routine:
int rtdb_get_info(const int handle, const char *name, int *ma_type,
int *nelem, char date[26])
Fortran routine:
logical function rtdb_get_info(handle, name, ma_type, nelem, date)
integer handle [input]
character *(*) name [input]
integer ma_type [output]
integer nelem [output]
character*26 date [output]
This function queries the database to obtain the number of elements in the
specified entry, its MA type, and the date of its insertion into the rtdb.
It requires the following arguments:
- handle -- unique handle created when the database was first opened
- name -- entry name of data for which information is to be obtained
(null-terminated character string in
C, standard FORTRAN character constant or variable in FORTRAN)
- ma_type -- returns MA type of the entry
- nelem -- returns number of elements of the given type
- date -- returns date of insertion (null-terminated
character string or FORTRAN character variable)
C routines:
int rtdb_first(const int handle, const int namelen, char *name)
int rtdb_next(const int handle, const int namelen, char *name)
Fortran routines:
logical function rtdb_first(handle, name)
integer handle [input]
character *(*) name [output]
logical function rtdb_next(handle, name)
integer handle [input]
character *(*) name [output]
These routines enable iteration through the items in the database in
an effectively random order. The function rtdb_first returns
the name of the first user-inserted entry in the database. The function
rtdb_next returns the name of the user-inserted entry put into the
database after the entry identified by the previous call to rtdb_next
(or by the call to rtdb_first, on the first call to rtdb_next).
The arguments required for the C routines are as follows:
- handle -- unique handle created when the database was first opened
- namelen -- size of buffer in calling routine required to store name
- name -- buffer to hold returned name of next (or first) entry in the database
The Fortran routines take the same handle and name arguments, but the
length of the buffer need not be passed explicitly.
An example of the use of these functions in C is to count and print the names of all
entries in the database. This can be implemented as follows:
char name[256];
int n, status, rtdb;
for (status=rtdb_first(rtdb, sizeof(name), name), n=0;
status;
status=rtdb_next(rtdb, sizeof(name), name), n++)
printf("entry %d has name '%s'\n", n, name);
C routine:
int rtdb_delete(const int handle, const char *name)
Fortran routine:
logical function rtdb_delete(handle, name)
integer handle [input]
character *(*) name [input]
This function deletes an entry from the database.
- handle -- unique handle created when the database was first opened
- name -- entry name of data to delete from the database (null-terminated character string)
This function does not return any arguments. The value returned by the function itself
indicates the success or failure of the delete operation:
- 1 if the key was present and was successfully deleted
- 0 if the key was not present, or if an error occurred
C routine:
int rtdb_print(const int handle, const int print_values)
Fortran routine:
logical function rtdb_print(handle, print_values)
integer handle [input]
logical print_values [input]
This function prints the contents of the data base to STDOUT. It requires
the following arguments:
- handle -- unique handle created when the database was first opened
- print_values -- (boolean flag) if true, values as
well as keys are printed out.
Dunyou Wang
2009-03-13