Memo: Speed of SOLVE.
==========================

03-JUN-99                                    03-JUN-99 19:33:05

Leonid Petrov
pet@leo.gsfc.nasa.gov

I. Introduction.
~~~~~~~~~~~~~~~~

As I mentioned in my memo "About problems with speed of SOLVE" of
26-MAY-97, SOLVE spent 70% or more of its time on input/output when making
the longest global runs, which analyze 2700-3000 sessions. Four steps were
proposed at that time to speed up global SOLVE:

1) to put all scratch files, superfiles and SOLVE executables on the local
   disk of the fastest machine;
2) to eliminate the train structure of SOLVE and unite all procedures in
   one executable;
3) to eliminate reordering of the CGM after processing each session;
4) to move the calculation of partial derivatives with respect to the
   troposphere gradient to CALC.

Steps 1-3 have already been implemented. Step 4 has been implemented in
CALC but not yet in SOLVE. Additional tricks were implemented in SOLVE; for
example, reading the superfile catalogue SUPCAT only once, before
processing the first session, also helps a lot.

II. What did we achieve?
~~~~~~~~~~~~~~~~~~~~~~~~

SOLVE has a built-in feature for time profiling. If the keyword
FAST_DBG TIM is specified in the control file, then SOLVE puts records with
CPU and elapsed time in the file TIMRxx in the WORK_DIR directory. The
TIMRxx file can be analyzed by the program prf.e.

Several test runs have been made on the machines miro (HP 9000/715) and
bootes (HP 9000/780). Summaries of the time profiling are presented in
Appendices A, B and C. An explanation of the items is presented in
Appendix D.

All runs were made on the fastest machines. Superfiles were located on
local disks; scratch files and executables were also on the local disk;
arc-files were written to the local disk. The machines were not running
other time-consuming processes.

Elapsed time in general depends on various factors: speed of the processor,
speed of input/output, overall network performance and so on.
Nevertheless, tests on miro and bootes showed that NO TRAIN f-SOLVE takes
about half the time of s-SOLVE. The main routines of NO TRAIN f-SOLVE took
the following shares of elapsed time:

           bootes   miro
   BATCH     11%      8%
   PROC      50%     44%
   FORW       9%      8%
   BACK       7%     11%
   CRES      20%     22%

Despite differences in the control files used for the tests on bootes and
miro, the shares are about the same. We see that the share of elapsed time
used for solving normal equations is about 10-20%. Overheads for I/O of
superfiles and arc-files are at the level of 15-20%, which is not perfect
but acceptable. But PROC and CRES take 2/3 of the entire time. Further
efforts to accelerate f-SOLVE should be focused on reducing the time spent
by PROC and CRES. PROC and CRES spend part of their time making equations
of conditions and part of their time on algebra: making normal equations
(PROC) or computing residuals (CRES). Both parts require further
optimization.

III. What is the fastest way to use SOLVE now.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 1) Use the fastest machine in exclusive mode :-);
 2) Put all superfiles on the local disk(s);
 3) Make arc-file directories on the local disk(s);
 4) Use the B1B3D (for a global run) or B3D (for independent runs) method;
 5) Use NO TRAIN;
 6) Use SAVING_RATE not less than 10 (recommended value 100);
 7) Specify GLO_PARLIM a bit larger than the expected number of global
    parameters;
 8) Try to reduce the number of sessions which fall back into a non-fast
    mode or, better, exclude them altogether;
 9) Use FAST_COV LOC unless there is a special need for examining chi/ndg;
10) Do not use SINGULARITY_CHECK ACTION REPARAMETERIZE if possible.

Appendix A:
^^^^^^^^^^^

MIRO. INDEPENDENT run.
miro HP-UX B.10.20 A 9000/715
1836 sessions.
60/60 clock/atmosphere
FAST_COV SEG
TRAIN NO
FAST_MODE B3D
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1) ADJST        283.95    299.73   1836
 2) BATCH-01       6.64     33.34      1
 3) BATCH-02     505.28    707.16   1836
 4) BATCH-05      14.09     25.05   1836
 5) BATCH-06       1.43      3.39      1
 6) CRES-01       79.40    113.95   1836
 7) CRES-02     3009.07   3094.68   1836
 8) CRES-03       30.93     34.37   1836
 9) NORML-02      53.29     55.15   1836
10) NORML-03     515.17    529.88   1836
11) NORML-04      32.22     33.10   1836
12) PROC-01      529.34    551.50   1836
13) PROC-03     6236.99   6483.89   1836
14) PROC-04      343.05    391.14   1836
===================================================
               11640.85  12356.33  12391.00

MIRO. INDEPENDENT run.
1836 sessions. 60/60 clock/atmosphere
TRAIN YES
FAST_MODE B3D
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1) ADJST        381.12    400.75   1836
 2) BATCH-01       6.06     19.60      1
 3) BATCH-02     456.20   1064.62   1836
 4) BATCH-05      12.84     17.97   1836
 5) BATCH-06       1.37      2.38      1
 6) CRES-01      117.85    168.25   1836
 7) CRES-02     3765.39   3832.13   1836
 8) CRES-03       47.61     46.92   1836
 9) GLOBL         14.07     26.30   1836
10) NORML-01      35.51     37.11   1836
11) NORML-02     489.37    494.42   1836
12) NORML-03     509.61    517.61   1836
13) NORML-04     104.29    325.17   1836
14) PROC-01      446.74    459.71   1836
15) PROC-03     6725.88   6918.84   1836
16) PROC-04      487.85    747.10   1836
===================================================
               13601.76  15078.88  16640.00

MIRO. INDEPENDENT run.
1836 sessions.
60/60 clock/atmosphere
TRAIN YES
FAST_MODE NONE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1) ADJST        380.50    404.64   1836
 2) BATCH-01       6.43     29.44      1
 3) BATCH-02     469.68   1088.18   1836
 4) BATCH-05      11.81     13.59   1836
 5) BATCH-06       1.42      2.43      1
 6) CRES-01      111.32    158.96   1836
 7) CRES-02     7034.86   7316.24   1836
 8) CRES-03       47.56     51.21   1836
 9) GLOBL         14.16     19.23   1836
10) NORML-01      35.97     37.40   1836
11) NORML-02     118.98    113.96   1836
12) NORML-03    2845.95   2919.11   1836
13) NORML-04      25.05     24.56   1836
14) PROC-01      440.72    466.76   1836
15) PROC-03     7583.20   7937.65   1836
16) PROC-04      457.07    526.64   1836
===================================================
               19584.68  21110.00  22619.00

Appendix B:
^^^^^^^^^^^

MIRO. COMPLETE run.
1836 sessions. 60/60 clock/atmosphere, 1879 global parameters
FAST_COV SEG
FAST_MODE NONE
TRAIN NO
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1) ADJST        272.39    281.31   1837 |  1.54%
 2) BACK-01      458.06    656.84   1836 |  3.61%
 3) BACK-02     1054.18   1337.66   1836 |  7.35%
 4) BATCH-01       8.51     32.37      2 |   .18%
 5) BATCH-02    1023.22   1405.31   3672 |  7.72%
 6) BATCH-05      49.69    102.99   3672 |   .57%
 7) BATCH-06      22.62     24.05      1 |   .13%
 8) CRES-01       86.83    138.87   1836 |   .76%
 9) CRES-02     3692.23   3819.89   1836 | 20.98%
10) CRES-03       32.98     37.06   1836 |   .20%
11) FORW-01      913.71    926.92   1836 |  5.09%
12) FORW-02      140.24    358.47   1836 |  1.97%
13) FORW-03      122.21    122.26   1836 |   .67%
14) FORW-04        5.97     18.05      1 |   .10%
15) NORML-01        .02       .02      1 |   .00%
16) NORML-02     329.98    334.56      1 |  1.84%
17) NORML-03     547.01    554.77      1 |  3.05%
18) NORML-04        .58      3.07      1 |   .02%
19) PROC-01      474.33    485.61   1836 |  2.67%
20) PROC-03     6949.10   7155.15   1836 | 39.29%
21) PROC-04      364.83    414.69   1836 |  2.28%
=================================================================
               16548.69  18209.92  18380.0 | 5:06:20

MIRO. COMPLETE run.
1836 sessions.
60/60 clock/atmosphere, 1879 global parameters
FAST_COV SEG
FAST_MODE B1B3D
TRAIN YES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1) ADDER         32.17     33.72     21
 2) ADJST        350.18    378.11   1837
 3) ARCPE-02     634.54    646.93   1815
 4) ARCPE-03      65.06    264.51   1815
 5) ARCPE-04    2735.26   2854.60   1836
 6) ARCPE_01     364.78    374.43   1836
 7) BACK-01     2111.36   2392.07   1836
 8) BACK-02     2271.38   2711.26   1836
 9) BATCH-01       6.58     30.73      2
10) BATCH-02     875.76   2124.22   3672
11) BATCH-05     408.73    460.44   3672
12) BATCH-06       1.71     25.25      1
13) CRES-01      115.18    153.38   1836
14) CRES-02     4491.95   4995.78   1836
15) CRES-03       48.36     53.45   1836
16) GLOBL         59.21     99.77   3672
17) NORML-01        .02       .18      1
18) NORML-02     351.81    357.95      1
19) NORML-03     548.30    555.85      1
20) NORML-04        .55      3.07      1
21) PROC-01      445.19    463.74   1836
22) PROC-03     7511.68   7937.80   1836
23) PROC-04      540.89    875.48   1836
===================================================
               23970.65  27792.72  29817.00

MIRO. COMPLETE run.
1836 sessions. 60/60 clock/atmosphere, 1879 global parameters
FAST_MODE NONE
TRAIN YES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1) ADDER       2649.12   2799.84   1836
 2) ADJST        349.45    374.50   1837
 3) ARCPE-04    4269.63   4393.67   1836
 4) ARCPE_01     363.64    373.48   1836
 5) BACK-01     2050.90   2095.26   1836
 6) BACK-02     2476.68   2781.71   1836
 7) BATCH-01       6.93     28.71      2
 8) BATCH-02     910.78   2139.81   3672
 9) BATCH-05     410.44    471.95   3672
10) CRES-01      112.72    143.85   1836
11) CRES-02     7780.41   7965.28   1836
12) CRES-03       47.03     47.65   1836
13) GLOBL        108.25    306.34   3672
14) NORML-01        .02       .17      1
15) NORML-02     348.80    352.48      1
16) NORML-03     546.73    552.55      1
17) NORML-04        .53      3.05      1
18) PROC-01      449.97    467.39   1836
19) PROC-03     8865.79   9470.75   1836
20) PROC-04      502.14    587.72   1836
===================================================
               32249.96  35356.16  37380.22   10:23:00

Appendix C:
^^^^^^^^^^^

BOOTES. COMPLETE run.
bootes HP-UX B.10.20 A 9000/780
FAST_COV LOC
SEG_OUTPUT NO
TRAIN NO
$FLAGS ATMOSPHERES AUTO 30
       CLOCKS AUTO 60
sessions: 2 832
observations: 2 419 641
Global Parameters: 970
Total Arc Parameters : 1 047 798
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1) ADJST        242.40    253.53   2833 |  1.62%
 2) BACK-01      305.22    505.92   2832 |  3.23%
 3) BACK-02      261.96    621.35   2832 |  3.97%
 4) BATCH-01       8.66     48.08      2 |   .31%
 5) BATCH-02    1342.32   1753.26   5664 | 11.19%
 6) BATCH-05      17.78     50.22   5664 |   .32%
 7) BATCH-06      13.98     15.18      1 |   .10%
 8) CRES-01       71.45    158.53   2832 |  1.01%
 9) CRES-02     2512.63   2922.48   2832 | 18.65%
10) CRES-03       24.91     27.59   2832 |   .18%
11) FORW-01      689.22    704.06   2832 |  4.49%
12) FORW-02      125.47    403.53   2832 |  2.58%
13) FORW-03      243.61    247.88   2832 |  1.58%
14) FORW-04        1.04      4.41      1 |   .03%
15) NORML-01        .00       .00      1 |   .00%
16) NORML-02      15.23     15.91      1 |   .10%
17) NORML-03      27.50     27.82      1 |   .18%
18) NORML-04        .07       .07      1 |   .00%
19) PROC-01      273.42    279.83   2832 |  1.79%
20) PROC-02     2184.00   2364.27   1080 | 15.09%
21) PROC-03     4584.43   4772.38   2832 | 30.46%
22) PROC-04      397.75    493.52   2832 |  3.15%
=================================================================
               13343.05  15669.82  15825.0 | 4:23:45

BOOTES. COMPLETE run.
bootes HP-UX B.10.20 A 9000/780
FAST_COV SEG
SEG_OUTPUT YES
TRAIN NO
$FLAGS ATMOSPHERES AUTO 30
       CLOCKS AUTO 60
sessions: 2 832
observations: 2 419 641
Global Parameters: 970
Total Arc Parameters : 1 047 798
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 1) ADJST        350.18    366.12   2833 |  2.10%
 2) BACK-01      309.07    511.50   2832 |  2.93%
 3) BACK-02     1033.62   1436.03   2832 |  8.23%
 4) BATCH-01       8.43     55.75      2 |   .32%
 5) BATCH-02    1366.28   1858.92   5664 | 10.66%
 6) BATCH-05      16.95     54.86   5664 |   .31%
 7) BATCH-06      14.14     15.66      1 |   .09%
 8) CRES-01       74.04    157.36   2832 |   .90%
 9) CRES-02     2583.18   3089.06   2832 | 17.71%
10) CRES-03       28.10     34.99   2832 |   .20%
11) FORW-01      696.90    709.57   2832 |  4.07%
12) FORW-02      169.50    489.56   2832 |  2.81%
13) FORW-03      245.06    248.93   2832 |  1.43%
14) FORW-04        1.04      4.18      1 |   .02%
15) NORML-01        .00       .00      1 |   .00%
16) NORML-02      15.27     15.77      1 |   .09%
17) NORML-03      27.48     27.88      1 |   .16%
18) NORML-04        .09       .08      1 |   .00%
19) PROC-01      276.25    285.28   2832 |  1.64%
20) PROC-02     2296.99   2476.14   1080 | 14.19%
21) PROC-03     4827.20   5095.26   2832 | 29.21%
22) PROC-04      398.85    513.17   2832 |  2.94%
=================================================================
               14738.62  17446.07  17597.0 | 4:53:17

Appendix D:
^^^^^^^^^^^

Comments on time profiling listing

 2) BACK-01  -- time for reading the arc-file and computing the lists of
                global and local parameters.
 3) BACK-02  -- time for computing the adjustments of local parameters and
                their covariance matrix.
 4) BATCH-01 -- time taken by BATCH before processing the first session.
 5) BATCH-02 -- time for reading the superfile and writing it to the
                scratch file.
 6) BATCH-05 -- time for saving the solution so that the run can be
                restored in the case of failure.
 7) BATCH-06 -- time taken by BATCH after processing the first session.
 8) CRES-01  -- time taken by CRES before computation of residuals.
 9) CRES-02  -- time taken by CRES for computation of residuals.
10) CRES-03  -- time taken by CRES after computation of residuals.
11) FORW-01  -- time taken by BATCH for parameter elimination.
12) FORW-02  -- time taken by BATCH for writing arc-files.
13) FORW-03  -- time taken by BATCH for CGM update and recovering memory
                faults.
14) FORW-04  -- time taken by BATCH for CGM reordering and writing down.
16) NORML-02 -- time taken by NORML for imposing constraints.
17) NORML-03 -- time taken by NORML for CGM inversion.
19) PROC-01  -- time taken by PROC before processing the first
                observation.
20) PROC-02  -- time taken by PROC for the additional run caused by
                re-parameterization.
21) PROC-03  -- time taken by PROC for the final run (it will be the only
                run in the absence of re-parameterization).
22) PROC-04  -- time taken by PROC for imposing constraints.
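
The per-routine shares quoted in Section II can be approximately recovered
from a listing by summing the elapsed time of all items that belong to the
same routine. The sketch below is in Python and is not part of SOLVE or of
prf.e. It assumes that the three numeric columns of a listing are CPU time,
elapsed time and number of calls; the memo does not label the columns, but
the percent column agrees with elapsed time divided by the total. The name
elapsed_shares is made up for this illustration. The rows are copied from
the first BOOTES listing of Appendix C.

```python
import re
from collections import defaultdict

# Rows of the first BOOTES listing (Appendix C).
# Assumed columns: CPU time, elapsed time, number of calls, percent share.
LISTING = """
 1) ADJST 242.40 253.53 2833 | 1.62%
 2) BACK-01 305.22 505.92 2832 | 3.23%
 3) BACK-02 261.96 621.35 2832 | 3.97%
 4) BATCH-01 8.66 48.08 2 | .31%
 5) BATCH-02 1342.32 1753.26 5664 | 11.19%
 6) BATCH-05 17.78 50.22 5664 | .32%
 7) BATCH-06 13.98 15.18 1 | .10%
 8) CRES-01 71.45 158.53 2832 | 1.01%
 9) CRES-02 2512.63 2922.48 2832 | 18.65%
10) CRES-03 24.91 27.59 2832 | .18%
11) FORW-01 689.22 704.06 2832 | 4.49%
12) FORW-02 125.47 403.53 2832 | 2.58%
13) FORW-03 243.61 247.88 2832 | 1.58%
14) FORW-04 1.04 4.41 1 | .03%
15) NORML-01 .00 .00 1 | .00%
16) NORML-02 15.23 15.91 1 | .10%
17) NORML-03 27.50 27.82 1 | .18%
18) NORML-04 .07 .07 1 | .00%
19) PROC-01 273.42 279.83 2832 | 1.79%
20) PROC-02 2184.00 2364.27 1080 | 15.09%
21) PROC-03 4584.43 4772.38 2832 | 30.46%
22) PROC-04 397.75 493.52 2832 | 3.15%
"""

# index) NAME cpu elapsed calls
ROW = re.compile(r"^\s*\d+\)\s+(\S+)\s+([\d.]+)\s+([\d.]+)\s+(\d+)",
                 re.MULTILINE)


def elapsed_shares(listing):
    """Return (total elapsed time, {routine: percent of total elapsed})."""
    elapsed = defaultdict(float)
    for name, _cpu, wall, _calls in ROW.findall(listing):
        routine = re.split(r"[-_]", name)[0]  # "PROC-03" -> "PROC"
        elapsed[routine] += float(wall)
    total = sum(elapsed.values())
    return total, {r: 100.0 * t / total for r, t in elapsed.items()}


total, shares = elapsed_shares(LISTING)
for routine in sorted(shares, key=shares.get, reverse=True):
    print(f"{routine:6s} {shares[routine]:5.1f}%")
```

For this listing the grouped shares of BATCH, PROC, FORW, BACK and CRES
come out within about one percentage point of the bootes column in
Section II.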