Application: sparse matrix-vector multiplication. 


Wim Vanroose, 9/4/2002

Introduction

Computing the matrix exponential of an operator acting on a vector is an important problem in mathematics, physics and engineering. Many problems reduce to an ODE of the form

d v(t)/dt =  A v(t)

with v(t) a time-dependent vector and A a linear operator. The solution v(t) is subject to the initial condition v(0) = v_0.
The solution of this equation can be formally written as

v(t) = exp(t A) v_0.

So, calculating the matrix exponential provides us with the solution.
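
As a minimal illustration of this formula (the operator, initial vector and time below are made up for the example), for a small dense A the exponential can be formed directly:

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])   # example operator: a harmonic oscillator
v0 = np.array([1.0, 0.0])     # initial condition v(0) = v_0
t = 0.5

v_t = expm(t * A) @ v0        # v(t) = exp(t A) v_0
print(v_t)                    # ~ [cos(t), -sin(t)]

Forming exp(t A) explicitly like this is of course only feasible when A is small and dense.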

For small t << 1, approximations of the exponential like Pade or Crank-Nicolson work very well. The problem is that in the application I am interested in, the matrix A is a huge sparse matrix, and the Pade and similar approximations only work for small matrices.
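
For concreteness, here is a sketch of a single Crank-Nicolson step, which is exactly the (1,1) Pade approximant of exp(dt A); the sparse example operator (a 1D Laplacian) and the step size are choices of the sketch:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
# example sparse operator: a tridiagonal 1D Laplacian
A = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csc")
I = sp.identity(n, format="csc")
dt = 0.01
v = np.ones(n)

# one step: solve (I - dt/2 A) v_new = (I + dt/2 A) v,
# so v_new ~ exp(dt A) v with an error of O(dt^3) per step
v_new = spla.spsolve(I - 0.5 * dt * A, (I + 0.5 * dt * A) @ v)

Every step requires a sparse linear solve, so this only pays off when dt is small and the solve is cheap.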

Since the matrix A is sparse, the exponential can be calculated with the Lanczos algorithm. By applying A multiple times to the initial state v_0, and using a Gram-Schmidt-like orthogonalization, the Lanczos algorithm creates an orthogonal basis for the Krylov subspace

Span(v_0, A v_0, A^2 v_0, ...).

The action of the matrix exponential on v_0 is then calculated in this subspace. This is possible since the representation of A in this limited subspace is again a small matrix, and the exponential of that small matrix can be calculated with the conventional Pade approximation.
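
A sketch of the procedure (the Krylov dimension m, the function names, and the use of scipy's expm for the small Pade step are choices of the sketch, not prescribed above):

import numpy as np
from scipy.linalg import expm

def lanczos_expm(matvec, v0, t, m=30):
    # Approximate exp(t A) v0 in an m-dimensional Krylov subspace;
    # matvec(v) must return A v.
    n = v0.shape[0]
    V = np.zeros((n, m))            # orthonormal Lanczos vectors
    alpha = np.zeros(m)             # diagonal of the small matrix
    beta = np.zeros(m - 1)          # off-diagonal of the small matrix
    beta0 = np.linalg.norm(v0)
    V[:, 0] = v0 / beta0
    for j in range(m):
        w = matvec(V[:, j])                 # apply A, possibly on the fly
        alpha[j] = np.dot(V[:, j], w)
        w -= alpha[j] * V[:, j]             # Gram-Schmidt-like step:
        if j > 0:                           # the three-term recurrence
            w -= beta[j - 1] * V[:, j - 1]
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] == 0.0:              # invariant subspace found
                m = j + 1
                V, alpha, beta = V[:, :m], alpha[:m], beta[:m - 1]
                break
            V[:, j + 1] = w / beta[j]
    # small tridiagonal representation of A in the Krylov basis
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    # exponential of the small matrix (conventional Pade, via scipy),
    # applied to the first basis vector: exp(t A) v0 ~ ||v0|| V exp(t T) e_1
    return beta0 * V @ expm(t * T)[:, 0]

The point is that A only enters through matvec, so the large matrix itself never has to be stored.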

The code for this procedure is pretty easy: you essentially have to write a matrix-vector product routine that implements the application of A to a vector v. I have this code running on a single-processor machine.
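
For example, a hypothetical routine of this kind for a 1D Laplacian; the matrix is never stored, only its action on v is computed:

import numpy as np

def apply_A(v):
    # w[i] = -2 v[i] + v[i-1] + v[i+1], computed on the fly
    w = -2.0 * v
    w[:-1] += v[1:]     # superdiagonal contribution
    w[1:] += v[:-1]     # subdiagonal contribution
    return w

# combined with the Lanczos sketch above:
# v_t = lanczos_expm(apply_A, np.ones(1000), t=0.1, m=30)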

Application: sparse matrix-vector multiplication

So, in order to port the matrix exponential code to a parallel machine, I need to calculate the matrix-vector product in parallel. Therefore, the application I am interested in is how to organise sparse matrix-vector multiplication on a parallel machine. Obviously, this is not the first application that relies heavily on this multiplication.

Because the matrix is sparse, it looks very difficult to find the optimal way of programming the matrix-vector product: the performance depends critically on the distribution of the non-zero elements.

We don't want to use a parallel generalisation of the uniprocessor BLAS routine dgemv; for example, PLAPACK has the call PLA_dgemv and ScaLAPACK has PSgemv. These routines are optimized for dense matrices and don't take advantage of the sparsity of the matrix A. In addition, in our application A is so big that we never want to construct it explicitly. We prefer to apply it on-the-fly when we calculate the matrix-vector product.
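
One possible organisation (an assumption of this sketch, not a statement about what PLAPACK or ScaLAPACK do) is to distribute the rows of A in blocks over the processes, keep the matrix implicit, and gather the full input vector before each product; mpi4py is assumed as the message-passing layer:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n_local = 1000                       # rows owned by this process
offset = rank * n_local              # global index of the first local row

def apply_A_local(v_global):
    # local rows of a 1D Laplacian, applied to the full vector on the fly
    w = np.empty(n_local)
    for i in range(n_local):
        g = offset + i               # global row index
        w[i] = -2.0 * v_global[g]
        if g > 0:
            w[i] += v_global[g - 1]
        if g < v_global.size - 1:
            w[i] += v_global[g + 1]
    return w

v_local = np.ones(n_local)
# gather the pieces of v from all processes into one full vector
v_global = np.concatenate(comm.allgather(v_local))
w_local = apply_A_local(v_global)    # each process computes its own rows

Gathering the whole vector on every process is wasteful for a banded operator like this one, where only the boundary entries of the neighbouring blocks are needed; that is exactly the kind of dependence on the non-zero structure discussed next.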

I've found some recent papers on this problem, and it seems that the results depend on the distribution of the non-zero elements in the matrix A.

Conclusion

At first sight, no statement about the peak performance and the scalability can be made without considering the structure of the sparse matrix.