Quick Start Guide
Access to Franklin
Software
Status & StatsNERSC MOTD Announcements Known Problems Current Queue Look Completed Jobs List Job Stats |
Using LibSci on FranklinThe Cray Scientific Libraries package, LibSci, is available on Franklin. Libsci includes BLAS, LAPACK, ScaLAPACK, BLACS, and SuperLU routines. SuperLu User's Guide, LAPACK, ScaLAPACK, BLAS, and BLACS man pages The LibSci module is automatically loaded upon login. The compiler wrappers (ftn, CC, cc) will automatically load and link to the LibSci library. BLAS and LAPACK functionality included in LibSci contains Cray custom routines combined with the University of Texas' libGoto library. BLAS and LAPACK from LibSci are preferred and default functions than the corresponding BLAS and LAPAC functions provided in ACML. Threaded BLAS in LibSci can be used by setting the GOTO_NUM_THREADS for Franklin and by using the appropriate Torque options for using less than 4 MPI tasks per node for your job. Scalapack in LibSci is also supported in mixed mode, i.e. MPI parallelism across compute nodes with threaded BLAS level parallelism on cores within a node. The code needs to be linked with libsci_mp library via "-lsci_mp". This is performed via compile and runtime options, and environment variables, as follows: 1) adjust the process grid dimensions in ScaLAPACK to account for the decrease in BLACS nodes; 2) ensure that the number of BLACS processes required is equal to the number of nodes required, NOT equal to the number of cores; 3) link with libsci_mp library via "-lsci_mp". 4) set GOTO_NUM_THREADS in the PBS job script used to launch the job. Here is a sample job script for using threaded BLAS or mixed mode Scalapack: #PBS -q debug #PBS -l mppwidth=64 #PBS -l mppnppn=1 #PBS -l walltime=00:30:00 #PBS -j eo cd $PBS_O_WORKDIR setenv GOTO_NUM_THREADS 4 ftn atest.f90 -lsci_mp aprun -n 64 -N 1 ./a.out LibSci also includes the automatic benchmarking interfaces for IRT, the Iterative Refinement Toolkit. Iterative Refinement can be used 'under-the-covers' in your application by setting the environment variable IRT_USE_SOLVERS to 1. You can therefore obtain significant speed-ups on applications that are performing LAPACK or ScaLAPACK linear solvers (Cholesky or LU) just by setting this environment variable. It is particularly useful for benchmarkers who cannot change their codes. Please refer to "intro_irt" man page for details about the explicit interface to IRT, as well as the caveats and safety of using iterative refinement. You can use IRT under-the-covers if your code using the following routine or combination of routines: dgetrf and dgetrs (LAPACK) dpotrf and dpotrs (LAPACK) dgesv (LAPACK) dposv (LAPACK) pdgetrf and pdgetrs (ScaLAPACK) pdpotrf and pdpotrs (ScaLAPACK) pdgesv (ScaLAPACK) pdposv (ScaLAPACK) zgetrf and zgetrs (LAPACK) zpotrf and zpotrs (LAPACK) zgesv (LAPACK) zposv (LAPACK) pzgetrf and pzgetrs (ScaLAPACK) pzpotrf and pzpotrs (ScaLAPACK) pzgesv (ScaLAPACK) pzposv (ScaLAPACK) Example Fortran Code using ScaLAPACK from LibSciBelow is the compile and run of a Fortran that uses ScaLAPACK from LibSci.Notice that there is no explicit reference to LibSci in the compiler line. It is included in the default user environment, and the compiler wrappers perform the linking automatically. % cat ABCp.f90 ! Matrix-Matrix Multiply, AB = C using LibSci ScaLAPACK ! filename: ABCp.f90 ! compile: ftn -fast -o ABCp ABCp.f90 ! input: ABCp.dat ! prow number of rows in proc grid ! pcol number of columns in proc grid ! n number of rows/columns in matrix A ! nb matrix distribution block size ! oputput: fort.u, where u=10+processor number, and stdout implicit none integer :: n, nb ! problem size and block size integer :: myunit ! local output unit number integer :: myArows, myAcols ! size of local subset of global array integer :: i,j, igrid,jgrid, iproc,jproc, myi,myj, p real*8, dimension(:,:), allocatable :: myA,myB,myC integer :: numroc ! blacs routine integer :: me, procs, icontxt, prow, pcol, myrow, mycol ! blacs data integer :: info ! scalapack return value integer, dimension(9) :: ides_a, ides_b, ides_c ! scalapack array desc ! Read problem description open(unit=1,file="ABCp.dat",status="old",form="formatted") read(1,*)prow read(1,*)pcol read(1,*)n read(1,*)nb close(1) if (((n/nb) < prow) .or. ((n/nb) < pcol)) then print *,"Problem size too small for processor set!" stop 100 endif ! Initialize blacs processor grid call blacs_pinfo (me,procs) call blacs_get (0, 0, icontxt) call blacs_gridinit(icontxt, 'R', prow, pcol) call blacs_gridinfo(icontxt, prow, pcol, myrow, mycol) myunit = 10+me write(myunit,*)"--------" write(myunit,*)"Output for processor ",me," to unit ",myunit write(myunit,*)"Proc ",me,": myrow, mycol in p-array is ", & myrow, mycol ! Construct local arrays ! Global structure: matrix A of n rows and n columns ! matrix B of n rows and n column ! matrix C of n rows and n column myArows = numroc(n, nb, myrow, 0, prow) myAcols = numroc(n, nb, mycol, 0, pcol) write(myunit,*)"Size of global array is ",n," x ",n write(myunit,*)"Size of block is ",nb," x ",nb write(myunit,*)"Size of local array is ",myArows," x ",myAcols ! Initialize local arrays allocate(myA(myArows,myAcols)) allocate(myB(myArows,myAcols)) allocate(myC(myArows,myAcols)) do i=1,n call g2l(i,n,prow,nb,iproc,myi) if (myrow==iproc) then do j=1,n call g2l(j,n,pcol,nb,jproc,myj) if (mycol==jproc) then myA(myi,myj) = real(i+j) myB(myi,myj) = real(i-j) myC(myi,myj) = 0.d0 ! write(myunit,*)"A(",i,",",j,")", & ! " --> myA(",myi,",",myj,")=",myA(myi,myj), & ! "on proc(",iproc,",",jproc,")" endif enddo endif enddo ! Prepare array descriptors for ScaLAPACK ides_a(1) = 1 ! descriptor type ides_a(2) = icontxt ! blacs context ides_a(3) = n ! global number of rows ides_a(4) = n ! global number of columns ides_a(5) = nb ! row block size ides_a(6) = nb ! column block size ides_a(7) = 0 ! initial process row ides_a(8) = 0 ! initial process column ides_a(9) = myArows ! leading dimension of local array do i=1,9 ides_b(i) = ides_a(i) ides_c(i) = ides_a(i) enddo ! Call ScaLAPACK library routine call pdgemm('T','T',n,n,n,1.0d0, myA,1,1,ides_a, & myB,1,1,ides_b,0.d0, & myC,1,1,ides_c ) ! Print results call g2l(n,n,prow,nb,iproc,myi) call g2l(n,n,pcol,nb,jproc,myj) if ((myrow==iproc) .and. (mycol==jproc)) & write(*,*) 'c(',n,n,')=',myC(myi,myj) ! Deallocate the local arrays deallocate(myA, myB, myC) ! End blacs for processors that are used call blacs_gridexit(icontxt) call blacs_exit(0) end ! convert global index to local index in block-cyclic distribution subroutine g2l(i,n,np,nb,p,il) implicit none integer :: i ! global array index, input integer :: n ! global array dimension, input integer :: np ! processor array dimension, input integer :: nb ! block size, input integer :: p ! processor array index, output integer :: il ! local array index, output integer :: im1 im1 = i-1 p = mod((im1/nb),np) il = (im1/(np*nb))*nb + mod(im1,nb) + 1 return end ! convert local index to global index in block-cyclic distribution subroutine l2g(il,p,n,np,nb,i) implicit none integer :: il ! local array index, input integer :: p ! processor array index, input integer :: n ! global array dimension, input integer :: np ! processor array dimension, input integer :: nb ! block size, input integer :: i ! global array index, output integer :: ilm1 ilm1 = il-1 i = (((ilm1/nb) * np) + p)*nb + mod(ilm1,nb) + 1 return end % cat runit #PBS -N scalapackjob #PBS -q debug #PBS -l mppwidth=64 #PBS -l walltime=00:05:00 #PBS -e run.out #PBS -j eo cd $PBS_O_WORKDIR ftn -o ABCp -fast ABCp.f90 aprun -n 64 ./ABCp % cat ABCp.dat 8 8 20000 64 % cat run.out Warning: no access to tty (Bad file descriptor). Thus no job control in this shell. /opt/xt-asyncpe/1.0c/bin/ftn: INFO: linux target is being used c( 20000 20000 )= 5333133330000.000 Application 243414 resources: utime 0, stime 0 -------------------------- Batch Job Report ------------------------------ Job Id: 5570270.nid00003 User Name: yunhe Group Name: yunhe Job Name: scalapackjob Session ID: 31162 Resource List: walltime=00:05:00 Queue Name: debug Account String: mpccc Total Tasks: 64 Tasks per Node: 4 Node Count: 16 Job Exit Summary: APINFO_SUCCESS: application completed with no detectable system errors |
Page last modified: Wed, 22 Oct 2008 21:00:13 GMT Page URL: http://www.nersc.gov/nusers/systems/franklin/software/libsci.php Web contact: webmaster@nersc.gov Computing questions: consult@nersc.gov Privacy and Security Notice |