03-FEB-97 12-FEB-97 09:27:26 Leonid Petrov pet@leo.gsfc.nasa.gov Implementation fast algorithms in SOLVE. ======================================= I. User guide. ~~~~~~~~~~~ In order to increase the speed of data processing new fast algorithms were included in SOLVE software system. In interactive mode a new option FAST_MODE was added at the bottom of OPTIN page. In batch mode a new keyword FAST_MODE was added in setup section. It is necessary to emphasize that new fast algorithm don't do any approximation. They produce exactly the same solution (upto the rounding errors) as a straightforward approach but calculate it rather faster exploiting sparseness of the normal matrix. Variable FAST_MODE may have the following values: 1) FAST_MODE = NONE -- the old way of calculations which was used routinely from 1976 till 1996 is chosen. Sparseness of the normal matrix is not taken into account. Further I will call it FULL case. 2) FAST_MODE = DOT_PRODUCT -- small optimization for calculation of dot product will be introduced. For huge sessions with 5-30 thousands observations calculation normal matrix is speeded up to 2-3 times. 3) FAST_MODE = B3D -- algorithm B3D is used for solving normal system. Theoretical background is described in a paper L. Petrov "Multigroup LSQ method and its generalization". Here I will put only a brief summary of the approach and dwell upon some details of an implementation it in SOLVE. Tests show that for sessions where 9-12 stations participated, and atmosphere and clocks are modelled by linear spline with 60 minutes segments B3D algorithm finds the solution by 10 times faster. When the time span for clock is 60 minutes and for atmosphere is 20 minutes B3D algorithms works by 20 times faster then for a straightforward FULL approach. 1.1 Restrictions: ~~~~~~~~~~~~~~~~~ 1) When atmosphere and clock modelled by linear spline the time spans should either be the same or be a multiple each other. Arbitrary combination of time spans leads to loss of regularity of zeroes in normal matrix and CAN NOT BE treated by B3D algorithm. 2) Time spans for spline modelling clock and/or atmosphere should be the same for all stations ('batch' mode parameterization in SOLVE notation). 3) Databases should be sorted strictly in time order. (More than 200 databases have a wrong time order before 1997 ). 4) Sessions where real clock breaks detected and lock function is modelled by the special manner to treat them are not nowadays supported in B3D option. It is planned to support this case in future versions. 1.2 Defaults and environment variables. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Initial values of new variables FAST_MODE and FAST_DBG (variable FAST_DBG is not intended for users, Their value should be NONE. Any other values will produce some debug printout) has two level defaults. Batch_mode defaults are NONE. When keyword FAST_MODE is messed in job configure file, then default value is used. To pot initial value in screen form OPTIN verify the value of environment variable FAST_MODE. If such a variable is setup, their value will be used as a default instead of batch-mode default. A special environment variable FAST_MODE__BYPASS is introduced to provide processing long arc list in batch mode. If such an environment variable is set up then for each session 1) FAST_MODE will be installed in according to its value by superseding the values from BATCH or OPTIN 2) it will be tested whether this sessions can be analyzed by B3D. If not then FAST_MODE will be set up to NONE, but temporary, only for this session. II. Programmer's guide. ~~~~~~~~~~~~~~~~~~~ 2.1. FAST_MODE = DOT_PRODUCT FAST_MODE = DOT_PRODUCT works twice: in NORML when normal equations are calculated and in CRES when residuals are produced. The total number of variables which enter in normal system for one session is hundreds and thousands. But the number of parameters on which one equation of conditions depends is considerably lesser: usually several tens. Thus, the vector of equations of conditions contains a lot of zeroes. Old SOLVE uses this property only in one half. When FAST_MODE is DOT_PRODUCT or B3D then special fast algorithm for calculation of dot product of sparse vectors is used. 2.2. FAST_MODE = B3D The normal system which is produced during analysis of one session when we model atmosphere, clock and sometimes EOP by linear spline (or, the same, "piece-wise linear function") has two important properties: 1) normal matrix has a lot of zeroes; 2) the distribution of zeroes in normal matrix has a regular structure. There exist such a parameters permutation that yields such a picture: GBBBBBBBBBBBBBB BCD BDCD B DCD B DCD B DCD B DCD B DCD B DCD B DCD B DCD B DCD B DCD B DCD Fig. 1 B3D case (symmetric, double-bordered, block-threediagonal matrix). where letter C, B, D, G denotes a blocks of non-zeroes elements (although not all elements necessarily are non-zero) and blank denotes zero blocks. Such case resembles situation with global estimation when we have symmetric, double-bordered, block-diagonal matrix (cf. Fig. 2) -- B1D case. For solving such system with B1D structure a powerful algorithm is used denoted as "ARCPE (ARC Parameter Elimination) algorithm" or "SLSQ -- Sequential Least SQuare" or B1D. GBBBBBBB BC B C B C B C B C B C B C Fig. 2 B1D case (symmetric, double-bordered, block-diagonal matrix). Algorithm B3D is a generalization above mentioned algorithm to the case when our normal matrix is symmetric, double-bordered, block-threediagonal matrix. It is the case when some parameters are modelled by linear spline. Implementing B3D in SOLVE made by adding some patches in code of ENTER, OPTIN, BATCH, PROC, NORML, CRES. When some parameters are modelled by linear spline the values of the parameters on the boundary points are estimates. Consider the simpliest case. We have two stochastic parameters: atmosphere delay in zenith direction and clock variations (clock drift and drift rate is treated as global parameters for whole session) modelled by N-segmented linear spline. The time span of the model for atmosphere and clock is the same and all observations are within the first and the last (N+1 -th) spline boundary. * |_______|_______|_______|_______|____... ___| 1 2 3 4 5 N+1 Fig. 3 Example of linear spline parameterization. Thus, f.e. if an observation (denoted by asterisk) lays between 3-rd and 4-th boundary, the parameters for 3-rd and 4-th boundary enters to equation of conditions while parameters for 1--2 and 5-- N+1 direction enter in equations and the coefficients for them are zeroes. If all stochastic parameters for each boundary point gathered in one block then each observation which lays between K-th and K+1 -th boundary points will supply stuff in K-th, K+1 -th block and to a global block (for non-stochastic parameters). Thus, we see that an order of parameters cannot be an arbitrary for proper implementation of B3D algorithm. The following approaches used in this implementation of B3D algorithm in SOLVE. The list of parameters names (as 20-letters string) are generated by GET_NAMES (from cutil). This list is scanned by routine MAP_PARAM (from proc) and the data structure which maintains a correspondence between parameter in an order for FULL case and in an order for B3D case is generated. For each parameter a block type (G -- global (for entier session), B -- mixed local-global, C -- diagonal local-local, D -- down-diagonal local-local) Number of block (0 for global block) and a place in a block are calculated. When PARTL (from cutil) scans the list of parameters and builds an equation of conditions it assigns an attribute for a stochastic parameter: is this parameter a left or a right boundary for this observation. If this parameter is a left boundary it is put to the list of parameters for left boundary for a segment under consideration. Thus each equation of condition is separated onto three parts: part of global parameters, part of left boundaries, and part of right boundaries. And each equation of condition updates 6 submatrices-accumulators. After reaching the next boundary these submatrices-accumulators are used for creating blocks of normal system (routine NSG_B3D from PROC). Constraints modify the normal matrix. There are two places where constraints are imposed: in PROC (calling subroutine is CNSTR which is called after LOOP) and in NORML (routine NORML_MAIN). All routines which imposes constraints in PROC were modifyed. In the case of FULL matrix they work by old way, and in the case of B3D they support a B3D scheme of keeping parameters. At the last moment when the elements of the normal matrix are actually modified the variable FAST_MODE is tested and in the case of B3D the address of the element which should be modified is calculated using a map of correspondence between parameters in FULL case and in B3D case. Constraints imposed in NORML are calculated by different approach. An array MAT which in the FULL case keeps the normal matrix in B3D case is zeroed out. After passing through all subroutines which impose a variety of constraints array MAT will contain all corrections to a normal matrix (but the order of the elements in array MAT corresponds to the order of the elements in the FULL case). Then routine MAPCNST_B3D (from norml) scans the array MAT, searches for non-zero elements there and updates a set of submatrices where block of normal matrix are kept. In the case of B3D a routine NORML_B3D (from norml) is called. It finds the parameter estimates, th estimates of their variances and the estimates of the covariance matrix. Not all elements of the covariance matrix are calculated. Only these blocks of the covariance matrix calculated which corresponds to non-zero blocks of normal matrix. NB: the blocks of the covariance matrix which corresponds to zero blocks of normal system ARE NOT ZERO. They are not calculated due to saving time reasons. After solving the normal system by B3D algorithm a set of submatrices where blocks of covariance matrix, the parameters estimations and their variances are kept is reordered and expanded to the size of FULL matrix. It is done by routine EXPAND_B3D. Starting from this point all further manipulations made by CRES, ADJUST and other executables for the case B3D and FULL are exactly the same. There are two unsubstantial complication of above mentioned scheme. Let we have two stochastic parameters, say, clock and atmosphere modelled by linear spline with different time span. Generally speaking normal matrix will not have bordered, block-threediagonal structure for such a case and B3D algorithm cannot be applied. But there is one exception. If a time span of one parameter IS A MULTIPLE of time span of other we can unite several blocks in one. Each united block will contains zeroes inside, but we will treat it as non-zero elements. This scheme will not have extreme performance but nevertheless will have considerable advantages before the FULL case. Imagine that clock behaviour is modelled by 60 minutes segmented linear spline and atmosphere is modelled by 20 minutes time spans spline, and beginning and ending boundaries are not coincide. Thus, one block will contain 1 set of parameters for clock boundary and 3 sets of parameters for atmosphere boundary points. |___.___.___|___.___.___|___.___.___|___.___.___| Fig. 4 Example linear spline parametrization with non-equal time spans. G CAA C G C AA C G C A CA G CAA C G C AA C G C A CA G CAA C G C AA C G C A CA G CAA C G C AA C G C A CA Fig. 5 Example of matrix for equations of conditions for linear spline parametrization with non-equal time spans. The difficulty is that not all parameters for the right boundary are located in the next block. For example, looking at the fig. 5 we see that for equations in the first two lines the clock parameters for the right boundary are located in the next block, but the parameters for atmosphere -- at the same block. It is not true for equations in the 3-rd line. To handle such a case properly a special logical array is calculated in MAP_PARAM. If the value of the element of this array which corresponds to a given parameter is false then the parameters for a right boundary assumed to be in the current block, not in the next. Additional difficulty emerges when the last boundary for the parameters modelled by linear spline with shorter segments doesn't coincide with the last boundary for the parameters with longer segments. It may occur when there are too few observations after the last boundary for the parameters with longer spans that they appeared to be unsufficient for creating an additional segment for parameter with longer segment, but an additional segment for the parameters with shorter segment will be created. The case when the last boundary for the parameters with the longest time span is after the last boundary for the parameters with shorter time span is not supported. Only the case when the boundary for parameters with the longest time span is before (or coincide) with the last boundary with shorter time span is supported. At the latter case the last block will have such a particularity: for observations after the last boundary for parameter with longer time span, each observations will supply parameters only for the parameters of the last block. To handle such a case properly another special logical array is calculated in MAP_PARAM. If the value of the element of this array which corresponds to a given parameter is false then the parameters for the left boundary assumed to be in the next block, not in the current. 2.3. Testware 2.3.1. FAST_DBG For debugging purposes a variable FAST_DBG introduced. It can be assigned either in batch mode (keyword FAST_DBG) or in interactive mode. FAST_DBG = NONE normal mode. No printout generated. FAST_DBG = APPEARANCE -- Messages in LOOP, DO_***, PROC, NORML, CRES will be generated. It allows to traces execution of a number of subroutines. FAST_DBG = PRINTOUT -- in FULL case 4 printout files will be generated: 1-2) /tmp/nor_full.mat and /tmp/nor_full.vec with normal matrix and normal vector; 3-4) /tmp/cov_full.mat and /tmp/cov_full.vec with covariance matrix and the vector of estimates. In B3D case three files will be generated: 1) /tmp/param.fil which contains map of correspondence between parameters in FULL and in B3D case; 2) /tmp/nor_b3d.bin with a set of submatrices for the normal matrix and normal vector; 3) /tmp/cov_b3d.bin with a set of submatrices for the covariance matrix and the vector of the estimates. Some other informations are printed on the screen. FAST_DBG = TIMER -- time of calculation some portion of code in PROC, NORML, CRES will be put on the screen. 2.3.2. TB3D Executable TB3D allows user to look at the normal matrices/vectors, covariance matrix, vector of the estimates and compare its values. It is assumed that 1) before using TB3D SOLVE ran with FAST_DBG = PRINTOUT 2) only one user ran SOLVE in FAST_DBG = PRINTOUT mode (since the file names are unique). The following options are available: 0 -- Look at full norml matrix, 1 -- Look at B3D norml matrix/vector 2 -- Look at transformation B3D submatrices ---> full matrix 3 -- Look at transformation full matrix ---> B3D submatrices 4 -- Compare norml matrix/vector B3D <---> full 5 -- Compare covariance matrix/estimates B3D <---> full 6 -- look at B3D cov matrix / vector of B3D estimates In the mode 4 the differences of normal matrix and normal vector for B3D and FULL case which exceed 1.D-12 will be put on the screen. In the mode 5 the differences of covariance matrix and vector of the estimates for B3D and FULL case which exceed 1.D-6 will be put on the screen. 2.3.3. STAND_B3DX Executable STAND_B3DX allows user to compare results of the solution LSQ problem by FULL algorithm and B3D algorithm using simulation data.