DESCRIPTION
	  Kuck and Associates' CLASSPACK Basic Math Library contains
	  highly optimized versions of standard	computational building
	  blocks which users of	high-performance computers can use to
	  improve the speed and	portability of their numerical appli-
	  cations programs.  The BLAS (Basic Linear Algebra Subpro-
	  grams) routines, which make up the bulk of this library, are
	  widely used in dense numerical linear	algebra	programs.
	  Because they cover the most commonly needed vector and matrix
	  operations, versions tuned to the user's machine provide a
	  simple way to speed up existing programs and to write new
	  numerical programs.  They also enhance portability, since
	  programs can contain standard calls to a widely available
	  set of library routines.  The BLAS routines were
	  developed in three groups.  The BLAS 1 routines consist of
	  vector-vector	operations such	as dot products	and "scalar *
	  vector + vector".  The BLAS 2	routines provide matrix-vector
	  operations such as matrix-vector multiply and	the rank-1
	  update of a matrix. The BLAS 3 routines provide matrix-
	  matrix operations such as rank-k update and the solution of
	  triangular systems with multiple right-hand sides.  This
	  library contains all three groups.

	  The Fourier Transform	is a common operation in many fields,
	  especially signal processing.	 Kuck and Associates' Basic
	  Math Library includes	three routines (in single-precision
	  and double-precision versions) for performing	FFTs (Fast
	  Fourier Transforms).	These have been	tuned for optimal per-
	  formance on this machine.

	  While	FFT routines are not as	standardized as	the BLAS rou-
	  tines, portability is	eased by limiting changes to small
	  sections of code which interface between the user's program
	  and the available library.

	  In addition to BLAS and Fast Fourier Transform routines,
	  special routines are provided	for machines with Intel	i860
	  processors.  These routines include the sparse BLAS routines
	  for performing the scalar-vector product and dot product,
	  special scalar-vector	product	routines designed for the case
	  when the result vector resides in the	on-chip	cache, several
	  special vector-vector	operations, programs for interchanging
	  the dimensions of 2-,	3-, and	4-dimensional arrays, and sub-
	  routines for solving tridiagonal, block tridiagonal, penta-
	  diagonal, and	block pentadiagonal systems.

	  The routines in this library are packaged as the Kuck	&
	  Associates' Basic Math Library.  The Basic Math Library
	  object routines are packaged as a library archive called
	  libkmath.a. Programs must be linked with this	library	by
	  using	the -lkmath option on the Fortran compiler command
	  line.

	  The descriptions of the routines in this manual specify the
	  legal	values for all scalar integer and character arguments.
	  The routines themselves check	these arguments	when they are
	  called to determine if they have legal values.  If the value
	  of one of the	arguments is not legal,	an error action	is
	  taken.  The action taken depends on the class	of routine.

	    o BLAS Level 1 Routines: Return from the program without
	    performing any computations.

	    o BLAS Level 2 Routines: Issue an error message which
	    contains the name of the routine and the number of the
	    argument with the illegal value, e.g.,

	       ** On entry to DGEMV  parameter number  1 had an illegal value

	    o BLAS Level 3 Routines: Issue an error message which
	    contains the name of the routine and the number of the
	    argument with the illegal value, e.g.,

	       ** On entry to DGEMM  parameter number  1 had an illegal value

	    o Fast Fourier Transforms: Return from the program without
	    performing any computations.
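
	  The Level 2 behavior described above can be pictured with a
	  minimal sketch.  The routine CHKMV and the particular checks
	  it makes are hypothetical and are not part of the library;
	  only the message text is taken from this manual.  The
	  argument positions assume the standard DGEMV calling
	  sequence, in which the leading dimension is argument 6.

```fortran
c     Sketch: argument checking in the style of the message above.
c     CHKMV and its checks are hypothetical, for illustration only.
      program errchk
      integer info
c     'X' is not a legal transposition value, so info becomes 1
      call chkmv( 'X', 3, 2, 4, info )
      if ( info .ne. 0 ) write(*,1) 'DGEMV ', info
    1 format( ' ** On entry to ', a6, ' parameter number ', i2,
     $        ' had an illegal value' )
      end

      subroutine chkmv( trans, m, n, lda, info )
      character trans
      integer m, n, lda, info
c     info is the position of the first illegal argument, or 0
      info = 0
      if ( trans .ne. 'N' .and. trans .ne. 'T' .and.
     $     trans .ne. 'C' ) then
         info = 1
      else if ( m .lt. 0 ) then
         info = 2
      else if ( n .lt. 0 ) then
         info = 3
      else if ( lda .lt. max(1,m) ) then
         info = 6
      end if
      end
```

	  The sketch only reports the first illegal argument it
	  finds, which matches the single parameter number shown in
	  the message.
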

	  The following sections describe the BLAS (Basic Linear
	  Algebra Subprograms) routines available in the Basic Math
	  Library.  The routines are arranged in alphabetical order,
	  ignoring the prefixes which indicate the data type the
	  specific routine uses (e.g., routine names beginning with D
	  are DOUBLE PRECISION).  All of the BLAS routines come in
	  multiple versions, usually for SINGLE PRECISION real, DOUBLE
	  PRECISION real, single precision COMPLEX, and DOUBLE
	  PRECISION COMPLEX.

	  This section contains "Performance Hints for Intel i860
	  Processors" on how to obtain optimal performance from the
	  routines in the Basic Math Library on computers which use
	  the Intel i860 processor.  Whenever possible, follow these
	  guidelines concerning array arguments to routines in the
	  library.

	  Vector arguments should have an increment equal to one.

	  Double precision complex arrays should always begin on a
	  16-byte boundary, i.e., the memory address of the first
	  element of the array should be divisible by 16.  Some
	  Fortran compilers provide an option that will force this
	  alignment.  As an alternative, common blocks begin on
	  16-byte boundaries, so the first array in a common block
	  will always have the desired alignment.  Subsequent double
	  precision complex arrays in the common block will also be
	  on a 16-byte boundary, provided the storage preceding them
	  in the block is a multiple of 16 bytes.  For example, the
	  declarations

		    DOUBLE PRECISION COMPLEX X,	Y
		    COMMON X(20), Y(10)

	  will force X and Y onto 16-byte storage boundaries.

	  Array arguments to the FFT programs should lie on 16-byte
	  storage boundaries.  Vector arguments to BLAS 1 routines
	  should be in the data cache.  If a BLAS 1 routine has two
	  vector arguments and only one of them is in cache, it is
	  better for that one to be the first argument than the
	  second.  A vector will usually be in cache after it has just
	  been referenced, provided it is small enough (no larger than
	  8192 bytes).  The first loop below will generally perform
	  better than the second, since X will be in cache after the
	  first call and is accessed as the first operand, while Y
	  will not remain in cache.

		    DO 10 J = 1, 100
		      S	= S + SDOT( 1000, X, 1,	Y(1,J),	1 )
	       10   CONTINUE

		    DO 10 J = 1, 100
		       S = S + SDOT( 1000, Y(1,J), 1, X, 1 )
	       10   CONTINUE

	  Performance is not much affected if the vector arguments to
	  the BLAS 2 or FFT routines are not in cache, because these
	  routines are designed to manage the cache.  The BLAS 3
	  routines also manage the cache but have no vector arguments.

	  For more information, consult the following articles:

	  C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic
	  Linear Algebra Subprograms for Fortran Usage," ACM Trans. on
	  Math.	Soft. 5	(1979) 308-325.

	  J. Dongarra, J. DuCroz, S. Hammarling, and R.	Hanson,	"An
	  Extended Set of Fortran Basic	Linear Algebra Subprograms,"
	  ACM Trans. on	Math. Soft. 14,1 (1988)	1-32.

	  J. Dongarra, J. DuCroz, I. Duff, and S. Hammarling, "A Set
	  of Level 3 Basic Linear Algebra Subprograms," ACM Trans. on
	  Math. Soft. (Dec. 1989).

     ROUTINE ARGUMENTS
	  Vector arguments are passed in one-dimensional arrays.
	  Associated with the vector are a length and an increment
	  which	are passed as integer variables.  The length specifies
	  the number of	elements in the	vector.	 The increment (also
	  called stride) of the	vector specifies the spacing between
	  vector elements and the order of the elements in the
	  one-dimensional array in which the vector is passed.  If a
	  vector of length n and increment incx is passed in a
	  one-dimensional array x, then its values are stored at x(1),
	  x(1 + |incx|), ..., x(1 + (n-1)*|incx|).  If incx is
	  positive, then the elements are stored in increasing order
	  in the array x.  If incx is negative, then the elements are
	  stored in decreasing order, with the first element being
	  stored in x(1 + (n-1)*|incx|).  If incx is zero, then all
	  elements of the vector have the same value, which is stored
	  in x(1).  The dimension of the one-dimensional array holding
	  the vector must always be at least
			    idimx = 1 + (n-1)*|incx|

	  Example 1: Let x(1:10) be the	one-dimensional	real array
	   x = ( 1.0, 2.0, 3.0,	4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0 )
	  If
			      incx = 2 and n = 4
	  then the vector argument with	elements in order from first
	  to last is
			     ( 1.0, 3.0, 5.0, 7.0 )
	  If
			      incx = -2	and n =	4
	  then the vector argument with	elements in order from first
	  to last is
			     ( 7.0, 5.0, 3.0, 1.0 )
	  If
			       incx = 0	and n =	4
	  then the vector argument with	elements in order from first
	  to last is
			     ( 1.0, 1.0, 1.0, 1.0 )
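
	  The subscript rule used in Example 1 can be checked with a
	  short sketch.  The program and its variable names are
	  illustrative only and call no library routine; the
	  subscript formula is the one given in the text above.  With
	  incx = -2 and n = 4 it visits subscripts 7, 5, 3, 1, which
	  hold the values 7.0, 5.0, 3.0, 1.0.

```fortran
c     Sketch: list the elements of a strided vector in logical order.
c     The subscript rule follows the description above; the program
c     and variable names are illustrative only.
      program stride
      integer n, incx, i, k
      real x(10)
      data x / 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0 /
      n = 4
      incx = -2
      do 10 i = 1, n
c        logical element i lives at subscript k of the array x
         if ( incx .ge. 0 ) then
            k = 1 + (i-1)*incx
         else
            k = 1 + (n-i)*abs(incx)
         end if
         write(*,1) i, k, x(k)
   10 continue
    1 format( 2i4, f8.2 )
      end
```

	  Setting incx to 2 or 0 in the sketch reproduces the other
	  two cases of Example 1.
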

	  One-dimensional substructures	of a matrix, such as the rows,
	  columns, and diagonals, can be passed	as vector arguments
	  provided the correct starting	address	and increment are
	  specified.  With Fortran column-major	ordering used for
	  storing the m-by-n matrix, the increment between elements in
	  the same column is 1,	the increment between elements in the
	  same row is m, and the increment between elements on the
	  same diagonal is m+1.

	  Example 2: Let A be the real 5x4 matrix declared as
		     REAL A(5,4).

	  The third column of A	can be scaled by 2.0 by	invoking the
	  BLAS routine sscal with the following	statement:
		     CALL SSCAL	(5, 2.0, A(1,3), 1).

	  The second row can be	scaled by 2.0 with the statement:
		     CALL SSCAL	(4, 2.0, A(2,1), 5).

	  The main diagonal of A, which has four elements, can be
	  scaled by 2.0 with the statement:
		     CALL SSCAL (4, 2.0, A(1,1), 6).
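
	  The diagonal case is the easiest one to get wrong, because
	  the element count is the smaller of the two matrix
	  dimensions while the increment is the number of rows plus
	  one.  The following sketch illustrates this under stated
	  assumptions: MYSCAL is a hypothetical stand-in that takes
	  its arguments in the same order as SSCAL and handles
	  positive increments only, so that the sketch runs without
	  linking a BLAS library.

```fortran
c     Sketch: scale the main diagonal of a 5x4 matrix by 2.0.
c     MYSCAL is a hypothetical stand-in with the same argument order
c     as SSCAL (positive increments only), so the sketch runs
c     without linking a BLAS library.
      program diag
      integer m, n, i, j
      parameter ( m = 5, n = 4 )
      real a(m,n)
      do 20 j = 1, n
         do 10 i = 1, m
            a(i,j) = 1.0
   10    continue
   20 continue
c     the diagonal has min(m,n) elements spaced m+1 locations apart
      call myscal( min(m,n), 2.0, a(1,1), m+1 )
      do 30 i = 1, m
         write(*,1) ( a(i,j), j = 1, n )
   30 continue
    1 format( 4f8.2 )
      end

      subroutine myscal( n, alpha, x, incx )
      integer n, incx, i
      real alpha, x(*)
      do 10 i = 1, n
         x( 1 + (i-1)*incx ) = alpha * x( 1 + (i-1)*incx )
   10 continue
      end
```

	  In the printed matrix the four diagonal entries are 2.00
	  and every other entry, including all of row 5, remains
	  1.00.
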


	  Notes: Some routines have the	restriction that the vector
	  increment cannot be zero. This restriction is	required to
	  maintain compatibility with the Argonne BLAS interface.  If
	  the increment	of a vector argument is	not specified, then it
	  is assumed to	be 1.


	  Matrix arguments are passed in two-dimensional arrays.
	  Fortran column-major ordering of the storage is assumed,
	  i.e., elements of the same column occupy successive storage
	  locations.  Associated with a matrix argument are three
	  quantities: its leading dimension, which specifies the
	  number of storage locations between adjacent elements in
	  the same row; its number of rows; and its number of columns.
	  The leading dimension of the matrix must always be at least
	  as large as the number of rows.  In addition, a character
	  transposition parameter is often passed which indicates
	  whether the matrix argument is to be used in normal or
	  transposed form or, if the matrix is complex, whether the
	  conjugate transpose of the matrix is to be used.  The values
	  of the transposition parameter for these cases are 'N',
	  'T', and 'C', respectively.
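
	  Because the leading dimension is passed separately from the
	  number of rows, a submatrix can be passed by giving the
	  address of its first element together with the leading
	  dimension of the full array.  The following sketch
	  illustrates this; PRTMAT is a hypothetical helper written
	  for the illustration and is not a library routine.

```fortran
c     Sketch: pass a 3x2 submatrix of a 4x3 array by giving the
c     leading dimension of the full array. PRTMAT is a hypothetical
c     helper, not a library routine.
      program ldim
      integer lda, i, j
      parameter ( lda = 4 )
      real a(lda,3)
      do 20 j = 1, 3
         do 10 i = 1, lda
            a(i,j) = real(i) + 0.1*real(j)
   10    continue
   20 continue
c     rows 2-4, columns 2-3: start at a(2,2), keep lda = 4
      call prtmat( 3, 2, a(2,2), lda )
      end

      subroutine prtmat( m, n, a, lda )
      integer m, n, lda, i, j
      real a(lda,*)
      do 10 i = 1, m
         write(*,1) ( a(i,j), j = 1, n )
   10 continue
    1 format( 2f8.2 )
      end
```

	  Inside PRTMAT the element a(i,j) refers to element
	  a(i+1,j+1) of the full array, so the helper prints rows 2
	  through 4 of columns 2 and 3.
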

     EXAMPLES
	  This section presents	examples illustrating the calling
	  sequence of programs in the Basic Math Library.

	  Example 1: The following program illustrates a call to the
	  BLAS 1 routine saxpy.

		     integer n,	incx, incy
		     real x(5),	y(9), alpha
		     data n / 5	/, incx	/ 1 /, incy / -2 /, alpha / 1.1	/
		     data x / 0.1, 0.2,	0.3, 0.4, 0.5 /
		     data y / 1.0, 2.0,	3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 /
	       c
		     call saxpy( n, alpha, x, incx, y, incy )
	       c
		     write(*,1)	y
	       c
		1      format( 9f8.2 )
		     end

	  Output:

	      1.55    2.00    3.44    4.00    5.33    6.00    7.22    8.00    9.11

	  Example 2: The following program illustrates a call to the
	  BLAS 2 routine sger.


		     integer m, n, lda, incx, incy, i, j
		     real x(3),	y(3), alpha, a(4,2)
		     data m /3/, n / 2 /, lda /4/
		     data incx / 1 /, incy / 2 /, alpha	/ 0.1 /
		     data x / 0.1, 0.2,	0.3/,  y / 1.0,	2.0, 3.0/
		     data a / 1.1, 2.1,	3.1, 4.1, 1.2, 2.2, 3.2, 4.2/
	       c
		     call sger(	m, n, alpha, x,	incx, y, incy, a, lda )
	       c

		     do	1 i = 1, lda
			write(*,2) ( a(i,j), j = 1, n )
		1      continue
	       c
		2      format( 2f8.2 )
		     end

	  Output:

	      1.11    1.23
	      2.12    2.26
	      3.13    3.29
	      4.10    4.20

	  Example 3: The following program illustrates a call to the
	  BLAS 3 routine strmm.

		     integer m, n, lda, ldb, i, j
		     real alpha, a(4,3), b(5,2)
		     character side, uplo, transa, diag
		     data m /3/, n / 2 /, lda /4/, ldb /5/
		     data alpha	/ 1.0 /
		     data side / 'L'/, uplo /'U'/, transa /'N'/, diag /'U'/

		     data a / 1.1, 2.1,	3.1, 4.1, 1.2, 2.2, 3.2, 4.2, 1.3, 2.3,	3.3, 4.3/

		     data b / 0.1, 0.2,	0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0/
	       c
		     call strmm( side, uplo, transa, diag, m, n, alpha,	a, lda,	b, ldb )
	       c
		     do	1 i = 1, ldb
			write(*,2) ( b(i,j), j = 1, n )
		1    continue
	       c
		2    format( 2f8.2 )
		     end

	  Output:

	      0.73    2.48
	      0.89    2.54
	      0.30    0.80
	      0.40    0.90
	      0.50    1.00

	  Example 4: The following program illustrates calls to	the
	  FFT routines scfft1d and csfft1d:

		     integer n,	iflag
		     real r(10)
		     complex wsave(10)
		     data n /8/
		     data r /1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0,	8.0, 9.0, 10.0/

		     iflag = 0
		     call scfft1d( r, n, iflag , wsave )
		     iflag = 1
		     call scfft1d( r, n, iflag,	wsave )
		     write(*,1)	r
		     iflag = 1
		     call csfft1d( r, n, iflag,	wsave )
		     write(*,1)	r
		1    format(10f8.2)
		     end

	  Output:

	     36.00    0.00   -4.00    9.66   -4.00    4.00   -4.00    1.66   -4.00    0.00
	      1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    0.00    0.00
Acknowledgement and Disclaimer