Re: Accuracy Problems in Inverting Matrices



The important question is the last one below: what problem are you
_really_ trying to solve?  If you are trying to find parameters of
the generalized linear model  Gb=y  , where G has many more rows
than columns, then constructing G-transpose * G and inverting is not 
the way to go. Forming the normal equations (G-transpose * G) greatly 
magnifies any inherent ill-conditioning in the problem.

(Aside: All these computations are being performed in floating-point
arithmetic.  In floating-point, most arithmetic operations give inexact
results, so it would be miraculous to routinely find an exact identity
when multiplying a matrix by its computed inverse.  If you have a stable
algorithm and the problem is well-behaved, then the round-off errors
won't kill you.)

(Aside 2: When a problem is "ill-conditioned", small uncertainty in the
initial data leads to large uncertainty in the results.  The condition
number is a rough measure of this uncertainty.  With a condition number
of 10^15, it is possible for a perturbation in the last digit of a number
to a change in the leading digit of the result, i.e., to wipe out 
all accuracy, regardless of the method used to solve it.)

Forming the normal equations can cange a moderately conditioned problem
into a very ill-conditioned problem.  Working with the original matrix G
is much better.  JAMA provides a means to do this.  If you wish to find the
best least squares approximation to the solution of Gb=y, then use the
JAMA QRDecomposition class to construct a QR decomposition.  The class then
has a method (solve) which solves Gb=y, given y.

Ron Boisvert

> RICHET Yann ADECCO wrote:
> 
> Hello,
> 
> I've tryed to solve your problem with MATLAB.
> Matlab results are very close to JAMA ones, and the same problem of inverse appears.
> 
> I think you can't inverse a matrix with such a condition number ( 10^5 in the first case and 10^15 in the second) .
> JAMA uses a LU decomposition to inverse, and probably SAS uses a different algorithm.
> 
> But maybe you can use a more stable numeric algorithm to compute your GLM ?
> What expression are you trying to solve exactly ?
> 
> Yann
> 
> 
> 
> 
>      -----Message d'origine-----
>      De : Boyce Byerly [mailto:BByerly@newcourse.com]
>      Envoyé : mercredi 13 mars 2002 19:36
>      À : Multiple recipients of list
>      Objet : Accuracy Problems in Inverting Matrices
> 
>      Hello!
> 
>      I've been using this excellent package for a week or two now.  I'm
>      using it to solve parameter estimates for General Linear Models, and I'm running into some
>      problems when I try to invert a matrix.  The matrices I'm inverting
>      are square matrices of the form  (G_transpose * G), where G contains
>      a matrix of independent variables used in statistical models.
> 
>      I was double-checking myself by multiplying the Original Matrix (Gt*G)
>      by the calculated inverse from the jama class. I then look at the
>      resulting matrix to find out if I get the identity matrix back.  Results
>      vary from "almost right" to "very far away from an identity matrix."
>      I'm also testing it by looking at the interim products that SAS
>      produces -- SAS agrees that I'm calculating (Gt*G) correctly, but
>      the answers for (Gt*G)^-1 are not correct at all.
> 
>      I'm using a very recent version of jama, only about a week old.  I'm
>      using it in conjunction with jdk1.4.0 (the real one, not a beta), and
>      I'm using the Eclipse 2.0 IDE.  I did wrap the Jama Matrix class
>      in my own class to provide a little error-checking, but the below
>      problems are illustrated using code like this:
> 
>      XT_X.print(20,15);
>      Matrix result = XT_X.inverse();
>      result.print(20,15);
> 
>      So I'm using the Jama.Matrix methods directly.  I don't think my
>      wrapping could have messed anything up.
> 
>      I provide an example below.  Here's one where the answers are
>      not _quite_ right.  Note that I had to go out to about fifteen decimal
>      places to illustrate the problems:
> 
>      X matrix follows
> 
>      1.000000000000000 6.000000000000000 28.000000000000000 10.000000000000000
> 
>      1.000000000000000 12.000000000000000 40.000000000000000 20.000000000000000
> 
>      1.000000000000000 10.000000000000000 32.000000000000000 17.000000000000000
> 
>      1.000000000000000 8.000000000000000 36.000000000000000 12.000000000000000
> 
>      1.000000000000000 9.000000000000000 34.000000000000000 11.000000000000000
> 
>      -----------------
> 
>      XT_X matrix follows
> 
>      5.000000000000000 45.000000000000000 170.000000000000000 70.000000000000000
> 
>      45.000000000000000 425.000000000000000 1562.000000000000000 665.000000000000000
> 
>      170.000000000000000 1562.000000000000000 5860.000000000000000 2430.000000000000000
> 
>      70.000000000000000 665.000000000000000 2430.000000000000000 1054.000000000000000
> 
>      -----------------
> 
>      Note: The above matrix XT_X, is exactly what SAS reports as the values.
> 
>      XT_X_inv matrix follows
> 
>      18.062318840579740 1.061594202898550 -0.722826086956522 -0.202898550724636
> 
>      1.061594202898533 0.516304347826089 -0.093297101449275 -0.181159420289856
> 
>      -0.722826086956520 -0.093297101449275 0.038496376811594 0.018115942028986
> 
>      -0.202898550724632 -0.181159420289856 0.018115942028985 0.086956521739131
> 
>      -----------------
> 
>      The above, XT_X_inv, the inverse matrix, is somewhat different from what SAS calculates.
> 
>      For example, the [0,0] element is 18.06 here; SAS calculates it as 17.59. None of the estimates
> 
>      are off by an order-of-magnitude, but it is pretty different.
> 
>      Sas calculates it as:
> 
> 
> 
>      X'X Inverse Matrix
> 
>      Intercept Years Age Income
> 
>      Intercept 17.588888889 0.6388888889 -0.680555556 2.3333333333
> 
>      Years 0.6388888889 0.1388888889 -0.055555556 2.0833333333
> 
>      Age -0.680555556 -0.055555556 0.0347222222 -0.208333333
> 
>      Income 2.3333333333 2.0833333333 -0.208333333 11.5
> 
> 
> 
>      -----------------
> 
>      Now, I multiply to make sure everything was done right -- I should get the identity
> 
>      matrix out, but here's what I do get:
> 
> 
> 
>      Test Matrix XTxX X XT_X_inv = ident? follows
> 
>      0.999999999999993 -0.000000000000004 0.000000000000000 -0.000000000000002
> 
>      -0.000000000000369 1.000000000000014 0.000000000000004 -0.000000000000021
> 
>      -0.000000000000284 0.000000000000000 1.000000000000014 -0.000000000000028
> 
>      -0.000000000000398 0.000000000000000 0.000000000000018 0.999999999999986
> 
> 
> 
>      Now we are starting to get trouble.  The above is _supposed_ to be the identity
>      matrix, but, again, nothing's quite right.  You have to go out to about ten decimal
>      places to see it, but the problem's definitely arising.
> 
>      If this was the worst of my problems, I wouldn't worry -- statistics, after all, isn't
>      the most precise use for mathematics.  However, when I start to use even slightly
>      complicated datasets, the inverse function becomes so inaccurate that it's not
>      usable at all.  The above example is a regression-type parameter estimate; when
>      you're doing proper GLM, you get matrices with lots of 0's and 1's in them, and
>      that seems to really cause mischief.  Here's another example:
> 
> 
> 
>      XT_X matrix follows
> 
>      16.000000000000000 6.000000000000000 3.000000000000000 7.000000000000000 380.999999999999940
> 
>      6.000000000000000 6.000000000000000 0.000000000000000 0.000000000000000 132.399999999999980
> 
>      3.000000000000000 0.000000000000000 3.000000000000000 0.000000000000000 91.000000000000000
> 
>      7.000000000000000 0.000000000000000 0.000000000000000 7.000000000000000 157.600000000000000
> 
>      380.999999999999940 132.399999999999980 91.000000000000000 157.600000000000000 9559.939999999999000
> 
> 
>      ---------------
> 
>      XT_X_inv (the inverse) matrix follows
> 
>      -750599937895078.600000000000000 750599937895081.500000000000000 750599937895082.600000000000000 750599937895081.800000000000000 -0.130781270308912
> 
>      750599937895080.600000000000000 -750599937895082.000000000000000 -750599937895082.800000000000000 -750599937895082.400000000000000 0.063857537946074
> 
>      750599937895081.500000000000000 -750599937895082.400000000000000 -750599937895082.400000000000000 -750599937895082.500000000000000 0.038786411622836
> 
>      750599937895080.600000000000000 -750599937895082.100000000000000 -750599937895082.600000000000000 -750599937895082.100000000000000 0.062500000000000
> 
>      -0.091994858686076 0.025071126323238 -0.000000000000000 0.023713588377164 0.003032797539101
> 
> 
> 
>      And look at what you get at the far end, when you test to make sure that (XT_X) * (XT_X)^-1 is really the identity matrix. There's actually a -47
> 
>      on the bottom row where there should be a 0.  If you look at the inverse matrix above, there are some very large repeating numbers in it, that
> 
>      I presume mean we've exceeded the limits of certain numbers.
> 
> 
> 
> 
> 
>      Test Matrix XTxX X XT_X_inv = ident? follows
> 
>      -1.050041159394993 -0.447900870846192 1.000000000000000 0.034877171699625 0.000000000000000
> 
>      -0.180119290036474 0.319417125196756 0.000000000000000 -0.360320898863438 0.000000000000000
> 
>      0.128467859567074 -0.218527504585311 1.000000000000000 -0.092063457678042 0.000000000000000
> 
>      -0.498389728925595 -1.048790491457637 0.000000000000000 0.737261528241105 0.000000000000000
> 
>      -47.465329347366380 -16.321536617420860 -16.000000000000000 -13.299517929611255 1.000000000000004
> 
> 
> 
>      SAS calculates the inverse as:
> 
>      Intercept 2.4704468492 -0.802279656 0.0103525939 -0.434668572 0.0691731207 0 36.005808632
> 
>      cylinders -0.802279656 0.5077243651 -0.009628452 0.1010873279 -0.446990855 0 -0.116141914
> 
>      displace 0.0103525939 -0.009628452 0.0002048951 -0.001311149 0.0107440513 0 -0.058843371
> 
>      make AMC -0.434668572 0.1010873279 -0.001311149 0.3461095006 0.1154580223 0 -2.129155798
> 
>      make Audi 0.0691731207 -0.446990855 0.0107440513 0.1154580223 1.1002262293 0 1.7149657708
> 
>      make Buick 0 0 0 0 0 0 0
> 
>      mpg 36.005808632 -0.116141914 -0.058843371 -2.129155798 1.7149657708 0 162.60315576
> 
> 
> 
>      Could somebody help me out here?  Am I doing something unreasonable?  Is there an extension that lets me
> 
>      use some more precise numeric class?
> 
> 
> 
>      -Boyce
> 
>      _______________________________________
> 
>      W. Boyce Byerly, Ph.D.



Date Index | Thread Index | Problems or questions? Contact list-master@nist.gov