Weekly Communication
Week June 28, 2002
FOR symmetric operator
I observed more than 2 times speedup when I switch gmres+ilu
to icc+cg in sloving the Dagger operator. But after a few time
steps, I got "current blowup". That means the algorithm is
unstable. I need to check the code again to be sure that
everything is correct.
This is what I did:
I refered to Hanks's OMP version and followed the
same rule to form the star and dagger matriices.
Then I solved them in 3 ways:
1) using gmres+ilu
-0.8754 0.4613e-11 -0.3435e-5
-0.3687 0.2878e-11 -0.8994e-5 --- original
2) using icc+ilu
-1.2726 0.2108e-11 -0.3080e-5
-0.3687 0.2878e-11 -0.8994e-5 --- original
3) using icc+cg
no problem but unstable.
I think this is due to the boundary condition.
So I need to talk to Hank how he did in the OMP version to
keep the matrix symmetric under different boundary conditions.
BTW, I didn't see and efficiency improvement when ilu is
switched to icc.
Week June 14, 2002
FOR symmetric operator
I have implemented the symmetric operators and their solvers
in FOUR ways. Some of them have close digits, but none of them
completely match. These differences are due to the diagonalized
lumped approximation lmass (userData->work.lmass):
_ _ - -
lmass=| sum_j mass_1j | to Mass=|mass_11 mass_12 ... mass_1k|
| sum_j mass_2j | |mass_21 mass_22 ... mass_2k|
| . | | . . ... . |
| . | | . . ... . |
| . | | . . ... . |
| sum_j mass_lj | |mass_l1 mass_l2 ... mass_lk|
- - - -
EXAMPLE I: grad * grad U = W
int grad * grad U lambda_i da = int W lambda_i da
-->>
- sum_j (int grad lambda_j * grad lambda_i da) U_j =
__________________________________________________
| sum_j (int lambda_i lambda_j da) W_j
| ____________________________________
| |
| |
Stiff * U = Mass * W
|
|
->lmass * W
->matrix * vector ==>> vector * vector
EXAMPLE II: 1/R * grad * R * grad U - 1/(dt * g) = 1/(dt * g) W
int 1/R *grad*R*grad U lambda_i da -
__________________________________
| int 1/(dt*g) lambda_i da =
| ________________________
| | int 1/(dt*g) W lambda_i da
| | __________________________
| | |
| | |
DaggerMatrix * U - Mass * D * U = Mass * D * W
| |
| |
->lmass * D * U |
->lmass * D * W
->matrix * D ==>> vector * D
D is the diagonal matrix given by
_ _
D=| 1/(dt*g_1) |
| 1/(dt*g_2) |
| . | g_j is resistivity|viscosity
| . | So D changes in every call.
| . |
| 1/(dt*g_k) |
- -
JUSTIFY lmass*W approximating Mass*W
1- Mass is symmetric, so there exits matrix X:
X^T Mass X = Diag(eigen_1, eigen_2, ... eigen_k)
|
|
first eigenvalue of Mass
2- According to Gershgorin theory,
|eigen_i| <= sum_j |mass_ij|
= sum_j mass_ij as mass_{ij}s are positive.
|
|
the ith component of lmass
==>> This might be the reason that lmass*W & lmass*D*W works.
FOUR ways used to implement the symmetric star and dagger operators
grad * R * grad U - R/(dt*g) * U = -R/(dt*g) * W
grad *(1/R)* grad A - 1/(dt*g*R) * U = -1/(dt*g*R) * W
1- lamss*DR*W to approximate Mass*DR*W:
the same idea Hank used in EXAMPLE I, now D changes to DR
_ _
DR=| 1/(dt*gR_1) |
| 1/(dt*gR_2) |
| . | gR_j=R_j/g_j or
| . | gR_j=1/(g_j*R_j)
| . |
| 1/(dt*gR_k) |
- -
2- lmassr*D*W to approximate Mass*DR*W:
the same concept Hank used in lmass. forming lmassr
_ _ - -
lmassr=|sum_j masR_1j| to MassR=|masR_11 masR_12 ... masR_1k|
|sum_j masR_2j| |masR_21 masR_22 ... masR_2k|
| . | | . . ... . |
| . | | . . ... . |
| . | | . . ... . |
|sum_j masR_lj| |masR_l1 masR_l2 ... masR_lk|
- - - -
massR_ij=mass_ij/R_{vertex k} or mass_{ij}*R_{vertex k}
MassR is made symmetric.
3- Using Mass*DR*W directly ==>> closer digits
4- using MassR*D*W directly ==>> closer digits
5- StarMatrix = (stiff_ij/Raverage)
DaggerMatrix = (stiff_ij*Raverage)
digits of gamma, wkin, w(10,1)
original -0.3687 0.2878e-11 -0.8994e-5
method 1 0.2495 0.2044e-11 -0.1514e-4
method 2 0.2439 0.2142e-11 -0.1571e-4
method 3 -1.7153 0.9873e-12 -0.1690e-6
method 4 -0.6488 0.3063e-11 -0.6332e-5
Raverg 5 -0.8754 0.4613e-11 -0.3435e-5
Rasymm 5 -1.2726 0.2108e-11 -0.3080e-5
Results from 1- 2- were quite different from original numbers.
Possible reason: g is almost the same over (R,Z) space, but
R changes a lot.
CONCLUSION
1- Mass*DR & MassR*D are not symmetric generally.
if g is constant over (R,Z) space, they are symmetric;
2- in order to make Mass*DR & MassR*D symmetric generally,
that will be prohibitively expensive, therefore, unrealistic.
3- check the code again and seek a better way for approximation
otherwise, continue to use general method.
Week June 14, 2002
For GMRES + ILU
1) on Uniprocessor, ilu fill-in is supported up to 5 levels;
2) on Multiprocessor,
block Jacobi preconditioning with ILU on each processor
To use ILU(k) on each block use -sub_pc_ilu_levels 3
==>> The reason why ilu with fill-in is faster:
With more fill-ins, ilu is more closely approximate the original
matrix. Therefore, the number of iterations decreases.
==>> Preconditioning:
cluster the eigenvalues far from zero and close to 1
==>> ICCG: CG with ICC
for Ax=b if A SPD (Symmetric Positive Definite)
for normal equation A^TA x= A^T b if A is not SPD
For turning off
eta, rmu, vmu, pkks, pdissf, pkkk, icheta
we can save about 23% of user time:
RadialGrids TurnON TurnOff Difference Percentage
30 54.13 42.34 11.79 21.78%
40 91.62 70.37 21.25 23.19%
50 139.73 107.39 32.34 23.14%
60 203.09 155.43 47.65 23.37%
70 282.72 216.29 66.43 23.50%
(poloidal partition is 3)
Form symmetric star dagger operator:
0)star[3,3] =
Dlam_0*Dlam_0/R_0 Dlam_0*Dlam_1/R_1 Dlam_0*Dlam_2/R_2
Dlam_1*Dlam_0/R_1 Dlam_1*Dlam_1/R_2 Dlam_1*Dlam_2/R_0
Dlam_2*Dlam_0/R_2 Dlam_2*Dlam_1/R_0 Dlam_2*Dlam_2/R_1
dagger[3,3] =
Dlam_0*Dlam_0*R_0 Dlam_0*Dlam_1*R_1 Dlam_0*Dlam_2*R_2
Dlam_1*Dlam_0*R_1 Dlam_1*Dlam_1*R_2 Dlam_1*Dlam_2*R_0
Dlam_2*Dlam_0*R_2 Dlam_2*Dlam_1*R_0 Dlam_2*Dlam_2*R_1
massr[3,3] =
lam_0* lam_0/R_0 lam_0* lam_1/R_1 lam_0* lam_2/R_2
lam_1* lam_0/R_1 lam_1* lam_1/R_2 lam_1* lam_2/R_0
lam_2* lam_0/R_2 lam_2* lam_1/R_0 lam_2* lam_2/R_1
rmass[3,3] =
lam_0* lam_0*R_0 lam_0* lam_1*R_1 lam_0* lam_2*R_2
lam_1* lam_0*R_1 lam_1* lam_1*R_2 lam_1* lam_2*R_0
lam_2* lam_0*R_2 lam_2* lam_1*R_0 lam_2* lam_2*R_1
1) mesh/felement/accumLocalFE.c
1.1-in AccumLocalFE
-double star[9], dagger[9], massr[9], rmass[9];
double Rave;
-Rave = ( r[map.elements[jElement][0]] +
r[map.elements[jElement][1]] +
r[map.elements[jElement][2]] )/3.0;
-star[i*3+j] = s[i*3+j]/r[map.elements[jElement][(i+j)%3]];
dagger[i*3+j] = s[i*3+j]*r[map.elements[jElement][(i+j)%3]];
massr[i*3+j] = m[i*3+j]/r[map.elements[jElement][(i+j)%3]];
rmass[i*3+j] = m[i*3+j]*r[map.elements[jElement][(i+j)%3]];
-UpdateMassStiff(jElement, m, s, dRoverR, star, dagger, massr, rmass,
UpdateMassStiff(jElement, m, s, dRoverR, star, dagger, massr, rmass,
UpdateMassStiff(jElement, m, s, dRoverR, star, dagger, massr, rmass,
1.2-in UpdateMassStiff
l->matrix[whichPhiSection].star[iRow][iColumn] += star[i*3+j];
l->matrix[whichPhiSection].dagger[iRow][iColumn] += dagger[i*3+j];
l->matrix[whichPhiSection].massr[iRow][iColumn] += massr[i*3+j];
l->matrix[whichPhiSection].rmass[iRow][iColumn] += rmass[i*3+j];
1.3-in accumLocalFE.h
UpdateMassStiff(int jE, double *m, double *s, double *dRoverR,
double *star , double *dagger, double *massr, double *rmass,
2) mesh/felement/felement.h
2.1-in class FEMatrix
double **star;
double **dagger;
double **massr;
double **rmass;
2.2-in class FiniteElementMatrix
Mat *Star; /*cj form symm operator*/
Mat *Dagger;
Mat *MassR;
Mat *RMass;
3) mesh/felement/allocateLocalFEMatrix.c
3.1-in AllocLocalFEMatrix
p[i].star = fMatrix(matrixInfo);
p[i].dagger = fMatrix(matrixInfo);
p[i].massr = fMatrix(matrixInfo);
p[i].rmass = fMatrix(matrixInfo);
4) mesh/felement/constructFEMatrix.c
4.1-in ConstructFEMatrix
gfe->Star =(Mat *)malloc((unsigned)map.NumPhiSections....
gfe->Dagger=(Mat *)malloc((unsigned)map.NumPhiSections....
gfe->MassR =(Mat *)malloc((unsigned)map.NumPhiSections....
gfe->RMass =(Mat *)malloc((unsigned)map.NumPhiSections....
4.2-in ConstructFEMatrix
ierr = MatCreateMPIAIJ(.... &(gfe->Star[whichPhiSection]));
ierr = MatCreateMPIAIJ(.... &(gfe->Dagger[whichPhiSection]));
ierr = MatCreateMPIAIJ(.... &(gfe->MassR[whichPhiSection]));
ierr = MatCreateMPIAIJ(.... &(gfe->RMass[whichPhiSection]));
4.3-in ConstructFEMatrix
ierr = MatSetLocalToGlobalMapping(gfe->Star[....], ....
ierr = MatSetLocalToGlobalMapping(gfe->Dagger[....], ....
ierr = MatSetLocalToGlobalMapping(gfe->MassR[....],
ierr = MatSetLocalToGlobalMapping(gfe->RMass[....],
4.4-in ConstructFEMatrix
ierr = MatSetValuesLocal(gfe->Star[whichPhiSection], .....
lfe->matrix[whichPhiSection].star[i], ....
ierr = MatSetValuesLocal(gfe->Dagger[whichPhiSection], ....
lfe->matrix[whichPhiSection].dagger[i], ....
ierr = MatSetValuesLocal(gfe->MassR[whichPhiSection], ....
lfe->matrix[whichPhiSection].massr[i], ....
ierr = MatSetValuesLocal(gfe->RMass[whichPhiSection], ....
lfe->matrix[whichPhiSection].rmass[i],
4.5-in ConstructFEMatrix
ierr = MatAssemblyBegin(gfe->Star[whichPhiSection], ....
ierr = MatAssemblyEnd(gfe->Star[whichPhiSection], .....
ierr = MatAssemblyBegin(gfe->Dagger[whichPhiSection], ....
ierr = MatAssemblyEnd(gfe->Dagger[whichPhiSection], ....
ierr = MatAssemblyBegin(gfe->MassR[whichPhiSection], ....
ierr = MatAssemblyEnd(gfe->MassR[whichPhiSection], ....
ierr = MatAssemblyBegin(gfe->RMass[whichPhiSection], ....
ierr = MatAssemblyEnd(gfe->RMass[whichPhiSection], ....
4.6-in ConstructFEMatrix
Free_fMatrix(lfe->matrixInfo, lfe->matrix[....].star);
Free_fMatrix(lfe->matrixInfo, lfe->matrix[....].dagger);
Free_fMatrix(lfe->matrixInfo, lfe->matrix[....].massr);
Free_fMatrix(lfe->matrixInfo, lfe->matrix[....].rmass);
5) m3d/interface/poissc.c
5.1-in poissc_
int iR=0;
poissolve(i, gfe.Stiffness, iR,
5.2-in poistarc_
int iR=1;
poissolve(i, gfe.Star, iR,
5.3-in poisdagc_
int iR=2;
poissolve(i, gfe.Dagger, iR,
5.4-in poissolve(.... Mat *Msolver, const int iR, ....
poisNeuman ( whichPhiSection, Msolver, iR, A, B);
poisDiriclet( whichPhiSection, Msolver, iR, A, B, ibc);
5.5-in poisNeuman (.... Mat *Msolver, const int iR,
in poisDiriclet(.... Mat *Msolver, const int iR, ....
if( iR==1 )
ierr = VecPointwiseDivide(userData.canvas.rhs.g,
grid.R.g[whichPhiSection],userData.canvas.rhs.g )
if( iR==2 )
ierr = VecPointwiseMult(userData.canvas.rhs.g,
grid.R.g[whichPhiSection],userData.canvas.rhs.g )
5.5==>>5.6
if( iR==0 )
ierr = VecPointwiseMult(A.g[whichPhiSection],
userData.work.lmass.g[whichPhiSection], userData.canvas.rhs.g);
CHKERRQ(ierr);
if( iR==1 )
ierr = MatMult(gfe.MassR[whichPhiSection], A.g[whichPhiSection],
userData.canvas.rhs.g ); CHKERRQ(ierr);
if( iR==2 )
ierr = MatMult(gfe.RMass[whichPhiSection], A.g[whichPhiSection],
userData.canvas.rhs.g ); CHKERRQ(ierr);
5.1-in poismc_
int iR=0;
Diffpar(i, gfe.Stiffness, iR,
5.2-in poismstc_
int iR=1;
Diffpar(i, gfe.Star, iR,
5.3-in poismdgc_
int iR=2;
Diffpar(i, gfe.Dagger, iR,
5.4-in Diffpar(.... Mat *Msolve, const int iR, ....
if( iR==1 )
ierr = VecPointwiseDivide(userData.canvas.rhs_algebraic.g,
grid.R.g[whichPhiSection],userData.canvas.rhs_algebraic.g )
if( iR==2 )
ierr = VecPointwiseMult(userData.canvas.rhs_algebraic.g,
grid.R.g[whichPhiSection],userData.canvas.rhs_algebraic.g )
5.4==>>5.5
if( iR==0 )
ierr = VecPointwiseDivide(userData.work.lmass.g[whichPhiSection],
userData.coeff.resistivity.g[whichPhiSection],
userData.canvas.rhs_algebraic.g ); CHKERRQ(ierr);
if( iR==1 )
ierr = MatMult(gfe.MassR[whichPhiSection],
userData.coeff.resistivity.g[whichPhiSection],
userData.canvas.rhs_algebraic.g ); CHKERRQ(ierr);
if( iR==2 )
ierr = MatMult(gfe.RMass[whichPhiSection],
userData.coeff.resistivity.g[whichPhiSection],
userData.canvas.rhs_algebraic.g ); CHKERRQ(ierr);
6) mhd/allocation/mhdData.h
in MHD_work add
ParData lmassr;
ParData lrmass;
7) mhd/allocation/constructMHDdata.c
under m3d work space add
duplicatePData(map, mhd.basic[0].tI, &mhd.work.lmassr);
duplicatePData(map, mhd.basic[0].tI, &mhd.work.lrmass);
8) mhd/allocation/constructLmass.c
ierr = MatMult(gfe->MassR[i], scratch->tmpData.g,
userData->work.lmassr.g[i]); CHKERRQ(ierr);
ierr = MatMult(gfe->RMass[i], scratch->tmpData.g,
userData->work.lrmass.g[i]); CHKERRQ(ierr);
GetArrayInPData(*phone, *map, userData->work.lmassr);
GetArrayInPData(*phone, *map, userData->work.lrmass);
9)
5.5-in poisDiriclet(.... Mat *Msolver, const int iR, ....
if( iR==0 )
ierr = VecPointwiseMult(A.g[whichPhiSection],
userData.work.lmass.g[whichPhiSection],
userData.canvas.rhs.g);
CHKERRQ(ierr);
if( iR==1 )
ierr = VecPointwiseMult(A.g[whichPhiSection],
userData.work.lmassr.g[whichPhiSection],
userData.canvas.rhs.g);
CHKERRQ(ierr);
if( iR==2 )
ierr = VecPointwiseMult(A.g[whichPhiSection],
userData.work.lrmass.g[whichPhiSection],
userData.canvas.rhs.g);
CHKERRQ(ierr);
5.6-in poisNeuman (.... Mat *Msolver, const int iR,
if(iR==0){
ierr = VecPointwiseMult(A.g[whichPhiSection],
userData.work.lmass.g[whichPhiSection],
userData.canvas.rhs.g); CHKERRQ(ierr); }
if(iR==1){
ierr = VecPointwiseMult(A.g[whichPhiSection],
userData.work.lmassr.g[whichPhiSection],
userData.canvas.rhs.g); CHKERRQ(ierr); }
if(iR==2){
ierr = VecPointwiseMult(A.g[whichPhiSection],
userData.work.lrmass.g[whichPhiSection],
userData.canvas.rhs.g); CHKERRQ(ierr); }
if(iR==0){
for (i=0; i
Week June 7, 2002
Notes:
=>> Using Petsc, we don't have many options to make use of the matrix
structure to improve the performance. During these 2 weeks, I
strongly feel that we are limited by this package.
=>> The following 3 lines are recommended to be included in your job
submit script files:
-ksp_gmres_restart 120
-ksp_gmres_modifiedgramschmidt
-pc_ilu_levels 3
Here is my test: 32 = Number of grids in minor radial direction.
- preconditioning is better
-ksp_type gmres -pc_type ilu is 10 times faster than
-ksp_type gmres -pc_type none
- gmres with modified Gram-Schmidt is even better
-ksp_gmres_modifiedgramschmidt -pc_type ilu is 9% faster than
-ksp_type gmres -pc_type ilu
- increase restart number to get even better peformance
-ksp_gmres_restart 120 -ksp_gmres_modifiedgramschmidt -pc_type ilu
is again 9% faster than
-ksp_gmres_modifiedgramschmidt -pc_type ilu
- increase fillin level in ILU to again get even better peformance
-ksp_gmres_restart 120 -ksp_gmres_modifiedgramschmidt -pc_ilu_levels 3
is again 9% faster than
-ksp_gmres_restart 120 -ksp_gmres_modifiedgramschmidt -pc_type ilu
SO THE PERFORMANCE IS IMPROVED 4 TIMES COMPARED TO THE DEFAULT SETTING.
Jin: w3.pppl.gov/~jchen/doc.html is updated
. Eigenspectrum and Performance
. Direct GMRES and ILU preconditioned GMRES
. connectivity matrix: connecting lfe and gfe