03-FEB-97                                                    12-FEB-97 09:27:26

Leonid Petrov
pet@leo.gsfc.nasa.gov


                Implementation fast algorithms in SOLVE.
                =======================================

I. User guide.
   ~~~~~~~~~~~

   In order to increase the speed of data processing new fast algorithms were
included in SOLVE software system. In interactive mode a new option FAST_MODE
was added at the bottom of OPTIN page. In batch mode a new keyword FAST_MODE
was added in setup section. It is necessary to emphasize that new fast
algorithm don't do any approximation. They produce exactly the same solution
(upto the rounding errors) as a straightforward approach but calculate it
rather faster exploiting sparseness of the normal matrix. Variable FAST_MODE
may have the following values:

1) FAST_MODE = NONE  --  the old way of calculations which was used routinely
   from 1976 till 1996 is chosen. Sparseness of the normal matrix is not taken
   into account. Further I will call it FULL case.

2) FAST_MODE = DOT_PRODUCT  --  small optimization for calculation of dot
   product will be introduced. For huge sessions with 5-30 thousands
   observations calculation normal matrix is speeded up to 2-3 times.

3) FAST_MODE = B3D -- algorithm B3D is used for solving normal system.
   Theoretical background is described in a paper L. Petrov "Multigroup LSQ
   method and its generalization". Here I will put only a brief summary of the
   approach and dwell upon some details of an implementation it in SOLVE.
   Tests show that for sessions where 9-12 stations participated, and
   atmosphere and clocks are modelled by linear spline with 60 minutes
   segments B3D algorithm finds the solution by 10 times faster. When the time
   span for clock is 60 minutes and for atmosphere is 20 minutes B3D algorithms
   works by 20 times faster then for a straightforward FULL approach.

   1.1 Restrictions:
   ~~~~~~~~~~~~~~~~~

1) When atmosphere and clock modelled by linear spline the time spans should
either be the same or be a multiple each other. Arbitrary combination of
time spans leads to loss of regularity of zeroes in normal matrix and
CAN NOT BE treated by B3D algorithm.

2) Time spans for spline modelling clock and/or atmosphere should be the same
for all stations ('batch' mode parameterization in SOLVE notation).

3) Databases should be sorted strictly in time order. (More than 200 databases
have a wrong time order before 1997 ).

4) Sessions where real clock breaks detected and lock function is modelled by
the special manner to treat them are not nowadays supported in B3D option.
It is planned to support this case in future versions.


   1.2 Defaults and environment variables.
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  Initial values of new variables FAST_MODE and FAST_DBG (variable FAST_DBG is
not intended for users, Their value should be NONE. Any other values will
produce some debug printout) has two level defaults. Batch_mode defaults are
NONE. When keyword FAST_MODE is messed in job configure file, then default
value is used. To pot initial value in screen form OPTIN verify the value of
environment variable FAST_MODE. If such a variable is setup, their value will
be used as a default instead of batch-mode default.

  A special environment variable FAST_MODE__BYPASS is introduced to provide
processing long arc list in batch mode. If such an environment variable is set
up then for each session 1) FAST_MODE will be installed in according to its
value by superseding the values from BATCH or OPTIN 2) it will be tested
whether this sessions can be analyzed by B3D. If not then FAST_MODE will be
set up to NONE, but temporary, only for this session.


II. Programmer's guide.
    ~~~~~~~~~~~~~~~~~~~


2.1. FAST_MODE = DOT_PRODUCT

   FAST_MODE = DOT_PRODUCT  works twice: in NORML when normal
   equations are calculated and in CRES when residuals are produced. The total
   number of variables which enter in normal system for one session is
   hundreds and thousands. But the number of parameters on which one equation
   of conditions depends is considerably lesser: usually several tens. Thus,
   the vector of equations of conditions contains a lot of zeroes. Old SOLVE
   uses this property only in one half. When FAST_MODE is DOT_PRODUCT or B3D
   then special fast algorithm for calculation of dot product of sparse
   vectors is used.


2.2. FAST_MODE = B3D

   The normal system which is produced during analysis of one session when we
model atmosphere, clock and sometimes EOP by linear spline (or, the same,
"piece-wise linear function") has two important properties: 1) normal matrix
has a lot of zeroes; 2) the distribution of zeroes in normal matrix has a
regular structure. There exist such a parameters permutation that yields such
a picture:

                         GBBBBBBBBBBBBBB
                         BCD
                         BDCD
                         B DCD
                         B  DCD
                         B   DCD
                         B    DCD
                         B     DCD
                         B      DCD
                         B       DCD
                         B        DCD
                         B         DCD
                         B          DCD
                         B           DCD

  Fig. 1 B3D case (symmetric, double-bordered, block-threediagonal matrix).


  where letter C, B, D, G denotes a blocks of non-zeroes elements (although
not all elements necessarily are non-zero) and blank denotes zero blocks.
Such case resembles situation with global estimation when we have symmetric,
double-bordered, block-diagonal matrix (cf. Fig. 2) -- B1D case. For solving
such system with B1D structure a powerful algorithm is used denoted as "ARCPE
(ARC Parameter Elimination) algorithm" or "SLSQ -- Sequential Least SQuare"
or B1D.


                         GBBBBBBB
                         BC
                         B C
                         B  C
                         B   C
                         B    C
                         B     C
                         B      C

  Fig. 2 B1D case (symmetric, double-bordered, block-diagonal matrix).

  Algorithm B3D is a generalization above mentioned algorithm to the case
when our normal matrix is symmetric, double-bordered, block-threediagonal
matrix. It is the case when some parameters are modelled by linear spline.
Implementing B3D in SOLVE made by adding some patches in code of ENTER, OPTIN,
BATCH, PROC, NORML, CRES.

  When some parameters are modelled by linear spline the values of the
parameters on the boundary points are estimates. Consider the simpliest case.
We have two stochastic parameters: atmosphere delay in zenith direction and
clock variations (clock drift and drift rate is treated as global parameters
for whole session) modelled by N-segmented linear spline. The time span of the
model for atmosphere and clock is the same and all observations are within the
first and the last (N+1 -th) spline boundary.

                      *

    |_______|_______|_______|_______|____... ___|
    1       2       3       4       5          N+1

    Fig. 3 Example of linear spline parameterization.

  Thus, f.e. if an observation (denoted by asterisk) lays between 3-rd and 4-th
boundary, the parameters for 3-rd and 4-th boundary enters to equation of
conditions while parameters for 1--2 and 5-- N+1 direction  enter in equations and
the coefficients for them are zeroes. If all stochastic parameters for each
boundary point gathered in one block then each observation which lays between
K-th and K+1 -th boundary points will supply stuff in  K-th, K+1 -th block and
to a global block (for non-stochastic parameters). Thus, we see that an order of
parameters cannot be an arbitrary for proper implementation of B3D algorithm.

  The following approaches used in this implementation of B3D algorithm in
SOLVE. The list of parameters names (as 20-letters string) are generated by
GET_NAMES (from cutil). This list is scanned by routine MAP_PARAM (from proc)
and the data structure which maintains a correspondence between parameter in
an order for FULL case and in an order for B3D case is generated. For each
parameter a block type (G -- global (for entier session), B -- mixed
local-global, C -- diagonal local-local, D -- down-diagonal local-local)
Number of block (0 for global block) and a place in a block are calculated.
When PARTL (from cutil) scans the list of parameters and builds an equation of
conditions it assigns an attribute for a stochastic parameter: is this
parameter a left or a right boundary for this observation. If this parameter
is a left boundary it is put to the list of parameters for left boundary for a
segment under consideration. Thus each equation of condition is separated onto
three parts: part of global parameters, part of left boundaries, and part of
right boundaries. And each equation of condition updates 6
submatrices-accumulators.  After reaching the next boundary these
submatrices-accumulators are used for creating blocks of normal system
(routine NSG_B3D from PROC).

  Constraints modify the normal matrix. There are two places where constraints
are imposed: in PROC (calling subroutine is CNSTR which is called after LOOP)
and in NORML (routine NORML_MAIN). All routines which imposes constraints in
PROC were modifyed. In the case of FULL matrix they work by old way, and in the
case of B3D they support a B3D scheme of keeping parameters. At the last
moment when the elements of the normal matrix are actually modified the
variable FAST_MODE is tested and in the case of B3D the address of the element
which should be modified is calculated using a map of correspondence between
parameters in FULL case and in B3D case. Constraints imposed in NORML are
calculated by different approach. An array MAT which in the FULL case keeps
the normal matrix in B3D case is zeroed out. After passing through all
subroutines which impose a variety of constraints array MAT will contain all
corrections to a normal matrix (but the order of the elements in array MAT
corresponds to the order of the elements in the FULL case). Then routine
MAPCNST_B3D (from norml) scans the array MAT, searches for non-zero elements
there and updates a set of submatrices where block of normal matrix are kept.

  In the case of B3D a routine NORML_B3D (from norml) is called. It finds the
parameter estimates, th estimates of their variances and the estimates of the
covariance matrix. Not all elements of the covariance matrix are calculated.
Only these blocks of the covariance matrix calculated which corresponds to
non-zero blocks of normal matrix. NB: the blocks of the covariance matrix
which corresponds to zero blocks of normal system ARE NOT ZERO. They are not
calculated due to saving time reasons.

  After solving the normal system by B3D algorithm a set of submatrices where
blocks of covariance matrix, the parameters estimations and their variances
are kept is reordered and expanded to the size of FULL matrix. It is done by
routine EXPAND_B3D. Starting from this point all further manipulations made by
CRES, ADJUST and other executables for the case B3D and FULL are exactly the
same.

  There are two unsubstantial complication of above mentioned scheme. Let we
have two stochastic parameters, say, clock and atmosphere modelled by linear
spline with different time span. Generally speaking normal matrix will not
have bordered, block-threediagonal structure for such a case and B3D algorithm
cannot be applied. But there is one exception. If a time span of one parameter
IS A MULTIPLE of time span of other we can unite several blocks in one. Each
united block will contains zeroes inside, but we will treat it as non-zero
elements. This scheme will not have extreme performance but nevertheless will
have considerable advantages before the FULL case. Imagine that clock behaviour
is modelled by 60 minutes segmented linear spline and atmosphere is modelled by
20 minutes time spans spline, and beginning and ending boundaries are not
coincide. Thus, one block will contain 1 set of parameters for clock boundary
and 3 sets of parameters for atmosphere boundary points.


       |___.___.___|___.___.___|___.___.___|___.___.___|


    Fig. 4  Example linear spline parametrization with non-equal time spans.


                 G  CAA   C
                 G  C AA  C
                 G  C  A  CA

                 G        CAA   C
                 G        C AA  C
                 G        C  A  CA

                 G              CAA   C
                 G              C AA  C
                 G              C  A  CA

                 G                    CAA   C
                 G                    C AA  C
                 G                    C  A  CA

    Fig. 5  Example of matrix for equations of conditions for linear spline
            parametrization with non-equal time spans.

  The difficulty is that not all parameters for the right boundary are
located in the next block. For example, looking at the fig. 5 we see that for
equations in the first two lines the clock parameters for the right boundary
are located in the next block, but the parameters for atmosphere -- at the
same block. It is not true for equations in the 3-rd line. To handle such a
case properly a special logical array is calculated in MAP_PARAM. If the value
of the element of this array which corresponds to a given parameter is false
then the parameters for a right boundary assumed to be in the current block,
not in the next.

  Additional difficulty emerges when the last boundary for the parameters
modelled by linear spline with shorter segments doesn't coincide with the last
boundary for the parameters with longer segments. It may occur when there are
too few observations after the last boundary for the parameters with longer
spans that they appeared to be unsufficient for creating an additional segment
for parameter with longer segment, but an additional segment for the parameters
with shorter segment will be created. The case when the last boundary for the
parameters with the longest time span is after the last boundary for the
parameters with shorter time span is not supported. Only the case when the
boundary for parameters with the longest time span is before (or coincide) with
the last boundary with shorter time span is supported. At the latter case the
last block will have such a particularity: for observations after the last
boundary for parameter with longer time span, each observations will supply
parameters only for the parameters of the last block. To handle such a case
properly another special logical array is calculated in MAP_PARAM. If the value
of the element of this array which corresponds to a given parameter is false
then the parameters for the left boundary assumed to be in the next block, not
in the current.


2.3. Testware


     2.3.1. FAST_DBG


  For debugging purposes a variable FAST_DBG introduced. It can be assigned
either in batch mode (keyword FAST_DBG) or in interactive mode.

  FAST_DBG = NONE  normal mode. No printout generated.

  FAST_DBG = APPEARANCE -- Messages in LOOP, DO_***, PROC, NORML, CRES will
             be generated. It allows to traces execution of a number of
             subroutines.

  FAST_DBG = PRINTOUT -- in FULL case 4 printout files will be generated:
             1-2) /tmp/nor_full.mat and /tmp/nor_full.vec with normal matrix
             and normal vector; 3-4) /tmp/cov_full.mat and /tmp/cov_full.vec
             with covariance matrix and the vector of estimates. In B3D case
             three files will be generated: 1) /tmp/param.fil which
             contains map of correspondence between parameters in FULL and
             in B3D case; 2) /tmp/nor_b3d.bin with a set of submatrices for the
             normal matrix and normal vector; 3) /tmp/cov_b3d.bin with a set
             of submatrices for the covariance matrix and the vector of the
             estimates. Some other informations are printed on the screen.

  FAST_DBG = TIMER -- time of calculation some portion of code in PROC, NORML,
             CRES will be put on the screen.


     2.3.2. TB3D


  Executable TB3D allows user to look at the normal matrices/vectors,
covariance matrix, vector of the estimates and compare its values. It is
assumed that 1) before using TB3D SOLVE ran with FAST_DBG = PRINTOUT 2)
only one user ran SOLVE in FAST_DBG = PRINTOUT mode (since the file names are
unique). The following options are available:

  0 -- Look at full norml matrix,
  1 -- Look at B3D  norml matrix/vector
  2 -- Look at transformation B3D submatrices   --->  full matrix
  3 -- Look at transformation full matrix       --->  B3D submatrices
  4 -- Compare norml matrix/vector B3D         <--->  full
  5 -- Compare covariance matrix/estimates B3D <--->  full
  6 -- look at B3D  cov matrix / vector of B3D estimates

  In the mode 4 the differences of normal matrix and normal vector for B3D
and FULL case which exceed 1.D-12 will be put on the screen.

  In the mode 5 the differences of covariance matrix and vector of the
estimates for B3D and FULL case which exceed 1.D-6 will be put on the screen.


     2.3.3. STAND_B3DX


  Executable STAND_B3DX allows user to compare results of the solution LSQ
problem by FULL algorithm and B3D algorithm using simulation data.