Some Detailed Progress

   11/23/98

     Implement communications directly on memory windows
     (In-core implementation of 3-D fields)

     The parallel in-core mode, which bypasses ramdisk3d completely, now
    works.  It gives results identical to the 1-PE mode.  For a 1-day
    simulation with the explicit free surface, the MOM timing in in-core
    mode is now 50 sec, vs. 74 sec for the out-of-core parallel mode
    (parallel_1d and no max_window).

    Now that we know how the data are related, we are ready to complete
    the parallel I/O implementation.  There are 2 options: build I/O on
    either the in-core parallel mode or the out-of-core parallel mode.
    Starting with in-core (max_window) seems easier.

    Also, should we base it on single-PE netCDF or multi-PE netCDF?
    There is no big difference here, since we will re-shuffle the data on
    the processors.  We will start with the simpler one.

  12/17/98

    Parallel I/O implementation

    The parallel I/O of MOM now works, after the in-place data redistribution
    routines were integrated and a large number of inconsistencies were
    corrected.

    At the moment, all the 3-D data arrays are redistributed to PE=0 and
    written out there.  2-D arrays are handled similarly.  The memory usage
    for I/O is essentially imt*jmt*km.  We will continue working to get
    multi-PE I/O going, i.e., I/O on all of the lower km PEs (km = # of depth
    levels), so that the max memory usage for I/O is imt*jmt.  The interface
    remains the same, using the netCDF wrapper that Sheldon wrote.
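
    The memory argument above can be illustrated with a toy model, in
    which PEs are simulated as list entries rather than real processes.
    This is only a sketch, not the MOM code: the function names, the even
    split of rows across PEs, and the choice of sending level k to PE k
    are all assumptions made for illustration.

```python
# Toy model of the two gather strategies; PEs are simulated as list entries.
# Assumptions (not from the log): rows (j) are split evenly across PEs,
# and level k is sent to PE k in the multi-PE variant.

def make_local_slabs(imt, jmt, km, npes):
    """Give each simulated PE a contiguous block of rows for all levels."""
    rows_per_pe = jmt // npes
    slabs = []
    for pe in range(npes):
        j0 = pe * rows_per_pe
        slab = {(k, j, i): float(k * jmt * imt + j * imt + i)
                for k in range(km)
                for j in range(j0, j0 + rows_per_pe)
                for i in range(imt)}
        slabs.append(slab)
    return slabs

def gather_to_root(slabs):
    """Single-PE I/O: PE 0 receives every point of the 3-D field."""
    buffers = [dict() for _ in slabs]
    for slab in slabs:
        buffers[0].update(slab)
    return buffers

def gather_by_level(slabs, km):
    """Multi-PE I/O: PE k receives only level k (needs at least km PEs)."""
    assert len(slabs) >= km
    buffers = [dict() for _ in slabs]
    for slab in slabs:
        for (k, j, i), value in slab.items():
            buffers[k][(k, j, i)] = value
    return buffers

def peak_io_words(buffers):
    """Largest I/O buffer held by any single PE, in array elements."""
    return max(len(b) for b in buffers)

# Small hypothetical grid: 6 x 8 points, 4 levels, 4 PEs.
imt, jmt, km, npes = 6, 8, 4, 4
slabs = make_local_slabs(imt, jmt, km, npes)
root_buffers = gather_to_root(slabs)
level_buffers = gather_by_level(slabs, km)

print("single-PE I/O peak:", peak_io_words(root_buffers), "= imt*jmt*km")
print("multi-PE I/O peak: ", peak_io_words(level_buffers), "= imt*jmt")
```

    In this model the peak buffer shrinks by a factor of km when each
    level is written from its own PE, which is the motivation stated above.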

  12/22/98

      Bottleneck detected: diagnostic file repeatedly opened and closed

     The bottleneck in the diag() subroutine was identified and eliminated:
     getunit() is now called once per simulated day instead of every time
     step.  The final version, including the in-core mode, achieves a
     speedup of 3.5 times over the old version (parallel_1d with memory
     swapping between memory window and ramdisk) on 32 processors.  This
     will cut a 6-month simulation to 2-3 months.
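
     The effect of the getunit() change on the number of file opens can
     be sketched as follows.  The function name count_opens and the values
     of 48 time steps per day and 30 days are hypothetical, chosen only
     for illustration; the counter stands in for the real open/close calls.

```python
def count_opens(steps_per_day, ndays, open_every):
    """Count file opens when the diagnostic file is reopened
    every `open_every` time steps."""
    opens = 0
    for step in range(steps_per_day * ndays):
        if step % open_every == 0:
            opens += 1  # stands in for a getunit()-style open
    return opens

steps_per_day = 48  # hypothetical time steps per simulated day
ndays = 30          # hypothetical run length in days

opens_per_step = count_opens(steps_per_day, ndays, open_every=1)
opens_per_day = count_opens(steps_per_day, ndays, open_every=steps_per_day)

print("opens when called every time step:", opens_per_step)
print("opens when called once per day:   ", opens_per_day)
```

     With these assumed values the number of opens drops by a factor equal
     to the number of time steps per day.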
 

  05/07/99

     Stand-alone snapshot_netCDF Implementation

     A stand-alone snapshot code was written to simulate the remap-write
     strategy used in MOM3.  The netCDF format is used for the data output.

     Here is the snapit_netcdf figure for the results.  The total time here
     includes the file "open" and file "close" times, the remapping time and
     the writing time, since this gives a more realistic timing for a
     production environment.  The total I/O time remains constant with
     respect to the total number of PEs, since I/O is always done from the
     four I/O PEs, independent of the total number of computing PEs.  The
     remapping time drops from 8.38 sec on 32 PEs to 4.67 sec on 256 PEs for
     the 1442x514x40 case (a factor of about 1.8), showing that the
     remapping algorithm scales.
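
     One way to quantify the remapping timings quoted above is to compute
     the speedup and the relative parallel efficiency between the two runs.
     The sketch below uses only the two timings from this entry; the helper
     names are illustrative and not part of the snapshot code.

```python
def speedup(t_base, t_new):
    """Ratio of the baseline time to the new time."""
    return t_base / t_new

def relative_efficiency(t_base, p_base, t_new, p_new):
    """Speedup divided by the increase in PE count."""
    return speedup(t_base, t_new) / (p_new / p_base)

# Remapping times for the 1442x514x40 case, as reported in this entry.
t32, t256 = 8.38, 4.67

s = speedup(t32, t256)
e = relative_efficiency(t32, 32, t256, 256)

print(f"remapping speedup, 32 -> 256 PEs: {s:.2f}x")
print(f"relative parallel efficiency:     {e:.2f}")
```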
 

  08/03/99

   Stand-alone snapshot_FFIO Implementation

   FFIO format is used for the data output in the stand-alone snapshot code to
   simulate the remap-write strategy used in MOM3.

   The results are very similar to those from the netCDF format output.
   Here is the snapit_ffio figure for the results.
 

Last modified in September 1999.
