DESCRIPTION
Kuck and Associates' CLASSPACK Basic Math Library contains
highly optimized versions of standard computational building
blocks which users of high-performance computers can use to
improve the speed and portability of their numerical
application programs. The BLAS (Basic Linear Algebra
Subprograms) routines, which make up the bulk of this library,
are widely used in dense numerical linear algebra programs.
Since they cover the most commonly used vector and matrix
operations, the availability of versions tuned to the user's
machine provides a simple way to speed up existing programs
and to write new numerical programs. They also enhance
portability, since programs can contain standard calls to a
widely available set of library routines. The BLAS routines
were developed in three groups. The BLAS 1 routines consist of
vector-vector operations such as dot products and "scalar *
vector + vector". The BLAS 2 routines provide matrix-vector
operations such as matrix-vector multiply and the rank-1
update of a matrix. The BLAS 3 routines provide matrix-matrix
operations such as rank-k update and the solution of
triangular systems with multiple right-hand sides. This
library contains all three groups.
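As a rough illustration of the three groups, the sketch below writes out one operation of each kind as plain loops. It is not library code: the program name, array sizes, and data are made up for the illustration, and the tuned library routines accept far more general arguments (increments, leading dimensions, transposition options) than these loops do.

```fortran
      program levels
!     Illustrative loops only; the library routines are tuned and
!     take more general arguments than shown here.
      implicit none
      integer n
      parameter (n = 3)
      real x(n), y(n), a(n,n), c(n,n), alpha
      integer i, j, k
      alpha = 2.0
      do i = 1, n
         x(i) = real(i)
         y(i) = 1.0
         do j = 1, n
            a(i,j) = 0.0
            c(i,j) = 0.0
         end do
      end do
!     Level 1 (vector-vector): y := alpha*x + y
      do i = 1, n
         y(i) = alpha*x(i) + y(i)
      end do
!     Level 2 (matrix-vector): rank-1 update a := a + alpha*x*y'
      do j = 1, n
         do i = 1, n
            a(i,j) = a(i,j) + alpha*x(i)*y(j)
         end do
      end do
!     Level 3 (matrix-matrix): rank-k update c := c + a*a'
      do j = 1, n
         do k = 1, n
            do i = 1, n
               c(i,j) = c(i,j) + a(i,k)*a(j,k)
            end do
         end do
      end do
      write(*,'(3f8.1)') y
      write(*,'(3f10.1)') (c(i,i), i = 1, n)
      end program levels
```

The amount of arithmetic per array element grows from Level 1 to Level 3, which is why the higher-level routines generally leave more room for machine-specific tuning.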
The Fourier Transform is a common operation in many fields,
especially signal processing. Kuck and Associates' Basic
Math Library includes three routines (in single-precision
and double-precision versions) for performing FFTs (Fast
Fourier Transforms). These have been tuned for optimal
performance on this machine.
While FFT routines are not as standardized as the BLAS
routines, portability is eased by limiting changes to small
sections of code which interface between the user's program
and the available library.
In addition to BLAS and Fast Fourier Transform routines,
special routines are provided for machines with Intel i860
processors. These routines include the sparse BLAS routines
for performing the scalar-vector product and dot product,
special scalar-vector product routines designed for the case
when the result vector resides in the on-chip cache, several
special vector-vector operations, programs for interchanging
the dimensions of 2-, 3-, and 4-dimensional arrays, and
subroutines for solving tridiagonal, block tridiagonal,
pentadiagonal, and block pentadiagonal systems.
The routines in this library are packaged as Kuck and
Associates' Basic Math Library. The object code is supplied
as a library archive called libkmath.a. Programs must be
linked with this library by using the -lkmath option on the
Fortran compiler command line.
The descriptions of the routines in this manual specify the
legal values for all scalar integer and character arguments.
The routines themselves check these arguments when they are
called to determine if they have legal values. If the value
of one of the arguments is not legal, an error action is
taken. The action taken depends on the class of routine.
o BLAS Level 1 Routines: Return to the calling program
without performing any computations.
o BLAS Level 2 Routines: Issue an error message which
contains the name of the routine and the number of the
argument with the illegal value, e.g.,
** On entry to DGEMV parameter number 1 had an illegal value
o BLAS Level 3 Routines: Issue an error message which
contains the name of the routine and the number of the
argument with the illegal value, e.g.,
** On entry to DGEMM parameter number 1 had an illegal value
o Fast Fourier Transforms: Return to the calling program
without performing any computations.
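The Level 2 and Level 3 behaviour can be pictured with the following sketch. It is an assumption-laden illustration, not the library's code: chkarg is a hypothetical helper, it accepts only upper-case option letters, and the argument positions it reports (1, 2, 3, 6) follow the usual DGEMV argument order (TRANS, M, N, ALPHA, A, LDA, ...).

```fortran
      program chkdem
!     Illustrative only: calls the hypothetical checker with an
!     illegal transposition option.
      implicit none
      call chkarg('DGEMV ', 'X', 3, 2, 3)
      end program chkdem

      subroutine chkarg(name, trans, m, n, lda)
!     Hypothetical checker (not a library routine): reports the
!     position of the first illegal argument, if any.
      implicit none
      character*6 name
      character*1 trans
      integer m, n, lda, info
      character*16 head
      character*18 mid
      character*21 tail
      head = ' ** On entry to '
      mid  = ' parameter number '
      tail = ' had an illegal value'
      info = 0
      if (trans.ne.'N' .and. trans.ne.'T' .and. trans.ne.'C') then
         info = 1
      else if (m .lt. 0) then
         info = 2
      else if (n .lt. 0) then
         info = 3
      else if (lda .lt. max(1,m)) then
         info = 6
      end if
      if (info .ne. 0) then
         write(*,'(3a,i2,a)') head, name, mid, info, tail
      end if
      end subroutine chkarg
```

With the call shown, the first argument is rejected, so the message names parameter number 1.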
The following sections describe the BLAS (Basic Linear
Algebra Subprograms) routines available in the Basic Math
Library. The routines are arranged in alphabetical order,
ignoring the prefixes which indicate the data type the
specific routine uses (e.g., routine names beginning with D
are DOUBLE PRECISION). All of the BLAS routines come in
multiple versions, usually for SINGLE PRECISION real, DOUBLE
PRECISION real, single precision COMPLEX, and DOUBLE
PRECISION COMPLEX.
This section also gives performance hints for Intel i860
processors: how to obtain optimal performance from the
routines in the Basic Math Library on computers which use
the Intel i860 processor. Whenever possible, follow these
guidelines concerning array arguments to routines in the
library:
o Vector arguments should have an increment equal to one.
o Double precision complex arrays should always begin on a
16-byte boundary, i.e., the memory address of the first
element of the array should be divisible by 16. Some Fortran
compilers provide an option that will force this alignment.
As an alternative, common blocks begin on 16-byte boundaries
and so the first array at the beginning of a common block
will always have the desired alignment. Subsequent arrays in
the common block will also be on a 16-byte boundary provided
that the storage preceding them is a multiple of 16 bytes, as
is the case when every array in the block is double precision
complex. For example, the declarations
      DOUBLE PRECISION COMPLEX X, Y
      COMMON X(20), Y(10)
will force X and Y onto 16-byte storage boundaries.
o Array arguments to the FFT programs should lie on 16-byte
storage boundaries.
o Vector arguments to BLAS 1 routines should be in the data
cache. If a BLAS 1 routine has two vector arguments and only
one of them can be in cache, it is better for the first
argument to be in cache than the second. A vector will
usually be in cache after it has just been referenced,
provided it is small enough (no larger than 8192 bytes). The
first loop below will generally perform better than the
second, since X will be in cache after the first call and is
accessed as the first operand, while the columns of Y will
not remain in cache.
      DO 10 J = 1, 100
         S = S + SDOT( 1000, X, 1, Y(1,J), 1 )
   10 CONTINUE
      DO 20 J = 1, 100
         S = S + SDOT( 1000, Y(1,J), 1, X, 1 )
   20 CONTINUE
There is not much impact if the vector arguments to the BLAS
2 or FFT routines are not in cache because these routines
are designed to manage the cache. The BLAS 3 routines also
manage the cache but have no vector arguments.
For more information, consult the following articles:
C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic
Linear Algebra Subprograms for Fortran Usage," ACM Trans. on
Math. Soft. 5 (1979) 308-325.
J. Dongarra, J. DuCroz, S. Hammarling, and R. Hanson, "An
Extended Set of Fortran Basic Linear Algebra Subprograms,"
ACM Trans. on Math. Soft. 14,1 (1988) 1-32.
J. Dongarra, J. DuCroz, I. Duff, and S. Hammarling, "A Set
of Level 3 Basic Linear Algebra Subprograms," ACM Trans. on
Math. Soft. (Dec. 1989).
ROUTINE ARGUMENTS
Vector arguments are passed in one-dimensional arrays.
Associated with the vector are a length and an increment
which are passed as integer variables. The length specifies
the number of elements in the vector. The increment (also
called stride) of the vector specifies the spacing between
vector elements and the order of the elements in the
one-dimensional array in which the vector is passed. If a
vector of length n and increment incx is passed in a
one-dimensional array x, then its values are stored at x(1),
x(1 + |incx|), ..., x(1 + (n-1)*|incx|). If incx is positive,
then the elements are stored in increasing order in the array
x. If incx is negative, then the elements are stored in
decreasing order with the first element being stored in
x(1 + (n-1)*|incx|). If incx is zero, then all elements of
the vector have the same value which is stored in x(1). The
dimension of the one-dimensional array holding the vector
must always be at least
idimx = 1 + (n-1)*|incx|
Example 1: Let x(1:10) be the one-dimensional real array
x = ( 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0 )
If
incx = 2 and n = 4
then the vector argument with elements in order from first
to last is
( 1.0, 3.0, 5.0, 7.0 )
If
incx = -2 and n = 4
then the vector argument with elements in order from first
to last is
( 7.0, 5.0, 3.0, 1.0 )
If
incx = 0 and n = 4
then the vector argument with elements in order from first
to last is
( 1.0, 1.0, 1.0, 1.0 )
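The three cases of Example 1 can be reproduced with a short sketch that applies the storage rule directly. The subroutine showv is a hypothetical helper written for this illustration, not a library routine; it only reads the array and prints the logical vector elements in order from first to last.

```fortran
      program stride
!     Illustrative only: prints the logical elements of a vector
!     argument using the storage rule described above.
      implicit none
      real x(10)
      integer i
      do i = 1, 10
         x(i) = real(i)
      end do
      call showv(4, x, 2)
      call showv(4, x, -2)
      call showv(4, x, 0)
      end program stride

      subroutine showv(n, x, incx)
!     Hypothetical helper (not a library routine): writes the n
!     logical elements of the vector held in x with increment incx.
      implicit none
      integer n, incx, i, ix
      real x(*)
!     The first element is x(1), unless incx is negative, in which
!     case it is x(1 + (n-1)*|incx|).
      ix = 1
      if (incx .lt. 0) ix = 1 + (n-1)*abs(incx)
      write(*,'(4f6.1)') (x(ix + (i-1)*incx), i = 1, n)
      end subroutine showv
```

The three output lines list the vectors for incx = 2, incx = -2, and incx = 0, in that order.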
One-dimensional substructures of a matrix, such as the rows,
columns, and diagonals, can be passed as vector arguments
provided the correct starting address and increment are
specified. With Fortran column-major ordering used for
storing the m-by-n matrix, the increment between elements in
the same column is 1, the increment between elements in the
same row is m, and the increment between elements on the
same diagonal is m+1.
Example 2: Let A be the real 5x4 matrix declared as
      REAL A(5,4)
The third column of A can be scaled by 2.0 by invoking the
BLAS routine sscal with the following statement:
      CALL SSCAL (5, 2.0, A(1,3), 1)
The second row can be scaled by 2.0 with the statement:
      CALL SSCAL (4, 2.0, A(2,1), 5)
The main diagonal of A, which has min(5,4) = 4 elements, can
be scaled by 2.0 with the statement:
      CALL SSCAL (4, 2.0, A(1,1), 6)
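The addressing used in Example 2 can be checked without the library by a minimal sketch such as the one below. The subroutine scalev is a hypothetical stand-in for SSCAL that handles positive increments only. Note that the main diagonal of a 5x4 matrix has only min(5,4) = 4 elements, which is the length used in the sketch; a length of 5 with increment 6 would reach past the end of the 20-element array.

```fortran
      program substr
!     Illustrative only: scales a column, a row, and the main
!     diagonal of a 5x4 matrix through stride arguments.
      implicit none
      integer m, n
      parameter (m = 5, n = 4)
      real a(m,n)
      integer i, j
      do j = 1, n
         do i = 1, m
            a(i,j) = 1.0
         end do
      end do
!     Column 3: m elements, increment 1.
      call scalev(m, 2.0, a(1,3), 1)
!     Row 2: n elements, increment m (the leading dimension).
      call scalev(n, 2.0, a(2,1), m)
!     Main diagonal: min(m,n) elements, increment m+1.
      call scalev(min(m,n), 2.0, a(1,1), m+1)
      do i = 1, m
         write(*,'(4f6.1)') (a(i,j), j = 1, n)
      end do
      end program substr

      subroutine scalev(n, alpha, x, incx)
!     Hypothetical stand-in for SSCAL (positive increments only).
      implicit none
      integer n, incx, i
      real alpha, x(*)
      do i = 0, n-1
         x(1 + i*incx) = alpha*x(1 + i*incx)
      end do
      end subroutine scalev
```

Elements that lie on two of the scaled substructures, such as A(2,2), are scaled twice and print as 4.0.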
Notes: Some routines have the restriction that the vector
increment cannot be zero. This restriction is required to
maintain compatibility with the Argonne BLAS interface. If
the increment of a vector argument is not specified, then it
is assumed to be 1.
Matrix arguments are passed in two-dimensional arrays.
Fortran column-major ordering of the storage is assumed,
i.e., elements of the same column occupy successive storage
locations. Associated with a matrix argument are three
quantities: its leading dimension, which specifies the number
of storage locations between successive elements in the same
row, its number of rows, and its number of columns. The
leading dimension of the matrix must always be at least as
large as the number of rows. In addition, a character
transposition parameter is often passed which indicates
whether the matrix argument is to be used in normal or
transposed form or, if the matrix is complex, whether the
conjugate transpose of the matrix is to be used. The values
of the transposition parameter for these cases are 'N', 'T',
and 'C', respectively.
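The role of the leading dimension can be made concrete with a small sketch. It stores a matrix with 3 rows and 2 columns in an array whose leading dimension is 4 and shows that element (i,j) occupies position i + (j-1)*lda of the underlying storage. The program name, sizes, and the EQUIVALENCE used to view the storage as a one-dimensional array are choices made for this illustration only.

```fortran
      program ldemo
!     Illustrative only: element (i,j) of a matrix stored with
!     leading dimension lda is at position i + (j-1)*lda of the
!     underlying one-dimensional storage.
      implicit none
      integer lda, m, n
      parameter (lda = 4, m = 3, n = 2)
      real a(lda,n), s(lda*n)
      integer i, j
      equivalence (a, s)
      do j = 1, n
         do i = 1, lda
            a(i,j) = real(10*i + j)
         end do
      end do
!     Each line prints i, j, a(i,j), and the same value fetched
!     through the one-dimensional view s.
      do i = 1, m
         do j = 1, n
            write(*,'(2i3,2f7.1)') i, j, a(i,j), s(i + (j-1)*lda)
         end do
      end do
      end program ldemo
```

The last two columns of each output line agree, and row 4 of the array is never touched by the 3-row matrix, which is why the leading dimension may exceed the number of rows.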
EXAMPLES
This section presents examples illustrating the calling
sequences of routines in the Basic Math Library.
Example 1: The following program illustrates a call to the
BLAS 1 routine saxpy.
      integer n, incx, incy
      real x(5), y(9), alpha
      data n / 5 /, incx / 1 /, incy / -2 /, alpha / 1.1 /
      data x / 0.1, 0.2, 0.3, 0.4, 0.5 /
      data y / 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 /
c
      call saxpy( n, alpha, x, incx, y, incy )
c
      write(*,1) y
c
    1 format( 9f8.2 )
      end
Output:
1.55 2.00 3.44 4.00 5.33 6.00 7.22 8.00 9.11
Example 2: The following program illustrates a call to the
BLAS 2 routine sger.
      integer m, n, lda, incx, incy, i, j
      real x(3), y(3), alpha, a(4,2)
      data m /3/, n / 2 /, lda /4/
      data incx / 1 /, incy / 2 /, alpha / 0.1 /
      data x / 0.1, 0.2, 0.3/, y / 1.0, 2.0, 3.0/
      data a / 1.1, 2.1, 3.1, 4.1, 1.2, 2.2, 3.2, 4.2/
c
      call sger( m, n, alpha, x, incx, y, incy, a, lda )
c
c     print all lda rows; row 4 lies outside the m-by-n matrix
      do 1 i = 1, lda
         write(*,2) ( a(i,j), j = 1, n )
    1 continue
c
    2 format( 2f8.2 )
      end
Output:
1.11 1.23
2.12 2.26
3.13 3.29
4.10 4.20
Example 3: The following program illustrates a call to the
BLAS 3 routine strmm.
      integer m, n, lda, ldb, i, j
      real alpha, a(4,3), b(5,2)
      character side, uplo, transa, diag
      data m /3/, n / 2 /, lda /4/, ldb /5/
      data alpha / 1.0 /
      data side / 'L'/, uplo /'U'/, transa /'N'/, diag /'U'/
      data a / 1.1, 2.1, 3.1, 4.1, 1.2, 2.2, 3.2, 4.2,
     &         1.3, 2.3, 3.3, 4.3/
      data b / 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0/
c
      call strmm( side, uplo, transa, diag, m, n, alpha,
     &            a, lda, b, ldb )
c
c     print all ldb rows; rows 4 and 5 are not part of the matrix
      do 1 i = 1, ldb
         write(*,2) ( b(i,j), j = 1, n )
    1 continue
c
    2 format( 2f8.2 )
      end
Output:
0.73 2.48
0.89 2.54
0.30 0.80
0.40 0.90
0.50 1.00
Example 4: The following program illustrates calls to the
FFT routines scfft1d and csfft1d:
      integer n, iflag
      real r(10)
      complex wsave(10)
      data n /8/
      data r /1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0/
      iflag = 0
      call scfft1d( r, n, iflag , wsave )
      iflag = 1
      call scfft1d( r, n, iflag, wsave )
      write(*,1) r
      iflag = 1
      call csfft1d( r, n, iflag, wsave )
      write(*,1) r
    1 format(10f8.2)
      end
Output:
36.00 0.00 -4.00 9.66 -4.00 4.00 -4.00 1.66 -4.00 0.00
1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 0.00 0.00
Acknowledgement and Disclaimer