Implement communications directly on memory windows
(in-core implementation of 3-D fields)
The parallel in-core mode, i.e., bypassing ramdisk3d completely, now works.
It gives results identical to the 1-PE mode. In the in-core mode, the mom
timing for a 1-day simulation with the explicit free surface is now 50 sec,
vs. 74 sec for the out-of-core parallel mode (parallel_1d and no max_window).
Now that we know how the data are related, we are ready to complete the
parallel I/O implementation. There are two options: build the I/O on the
in-core parallel mode or on the out-of-core parallel mode. Starting with
the in-core mode (max_window) seems easier.
Also, should we build on single-PE netCDF or multi-PE netCDF?
There is no big difference here, since we will re-shuffle the data across
the processors either way. We will start with the simpler one.
Parallel I/O implementation
The parallel I/O of MOM now works, after the in-place data redistribution
routines were integrated and a large number of inconsistencies were
corrected. At the moment, all 3-D data arrays are redistributed to PE 0 and
written out there; 2-D arrays are handled similarly. The memory usage for
I/O is essentially imt*jmt*km. We will continue toward multi-PE I/O, i.e.,
writing from the lowest km PEs (km = number of depth levels), so that the
maximum memory usage for I/O per PE is imt*jmt. The interface remains the
same, using the netCDF wrapper that Sheldon wrote.
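The two redistribution schemes above can be sketched serially with numpy
(the real code uses message passing between PEs; the grid sizes and array
names here are illustrative, not the production dimensions):

```python
import numpy as np

# Toy grid: imt (longitude), jmt (latitude), km (depth levels)
imt, jmt, km = 16, 8, 5
npes = 4  # compute PEs, 1-D decomposition in latitude

# Each compute PE owns one latitude slab of the 3-D field.
slabs = [np.full((imt, jmt // npes, km), pe, dtype=np.int32)
         for pe in range(npes)]

# Scheme 1: redistribute everything to PE 0 and write there.
# PE 0 must buffer the full imt*jmt*km array.
full = np.concatenate(slabs, axis=1)
assert full.shape == (imt, jmt, km)

# Scheme 2 (planned multi-PE I/O): the lowest km PEs each collect one
# full horizontal level, so the per-PE I/O buffer is only imt*jmt.
levels = [full[:, :, k].copy() for k in range(km)]
assert all(lev.shape == (imt, jmt) for lev in levels)

print("scheme 1 buffer words:", full.size)              # imt*jmt*km = 640
print("scheme 2 buffer words per PE:", levels[0].size)  # imt*jmt = 128
```

The sketch only shows the buffer-size trade-off; in the actual model the
concatenation and level extraction are gather operations across PEs.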
Bottleneck from repeatedly opening and closing the diagnostic file
The bottleneck in the diag() subroutine was identified and eliminated:
getunit() is now called once per simulated day instead of every time step.
The final version, including the in-core mode, achieves a 3.5x speedup over
the old version (parallel_1d with memory swapping between the memory window
and ramdisk) on 32 processors. This will cut a 6-month simulation to 2-3
months.
05/07/99
Stand-alone snapshot_netCDF Implementation
A stand-alone snapshot code was written to simulate the remap-write
strategy used in MOM3. The netCDF format is used for the data output.
Here is the snapit_netcdf figure for the results. The total time includes
the file open time, the file close time, the remapping time, and the
writing time, since this represents more realistic timing in a production
environment. The total I/O time remains constant with respect to the total
number of PEs, since the writes are always done from the four I/O PEs,
independent of the number of computing PEs. The remapping time decreases
from 8.38 sec on 32 PEs to 4.67 sec on 256 PEs for the 1442x514x40 case,
showing good scaling of the remapping algorithm.
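The remap step above can be illustrated serially: the field starts in a
latitude-slab compute decomposition and is remapped to a fixed set of four
I/O PEs, each owning one contiguous latitude band. All sizes here are toy
values (not the production 1442x514x40 grid), and the function name is a
hypothetical stand-in:

```python
import numpy as np

imt, jmt, km = 12, 16, 4
n_io = 4  # fixed number of I/O PEs, as described in the text

def remap_to_io(n_compute):
    """Remap a latitude-slab compute decomposition to n_io latitude bands."""
    # Compute decomposition: one latitude slab per compute PE.
    slabs = [np.full((imt, jmt // n_compute, km), pe)
             for pe in range(n_compute)]
    field = np.concatenate(slabs, axis=1)
    # I/O decomposition: n_io contiguous bands, independent of n_compute.
    width = jmt // n_io
    return [field[:, io * width:(io + 1) * width, :] for io in range(n_io)]

# The I/O-side partition is identical whatever the compute PE count,
# which is why write time stays constant while only remapping scales.
bands_8 = remap_to_io(8)
bands_16 = remap_to_io(16)
assert len(bands_8) == len(bands_16) == n_io
assert all(b.shape == (imt, jmt // n_io, km) for b in bands_8)
```

In the real code each band is sent to its I/O PE and written from there;
this sketch only demonstrates that the target decomposition is fixed.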
08/03/99
Stand-alone snapshot_FFIO Implementation
The FFIO format is used for the data output in the stand-alone snapshot
code to simulate the remap-write strategy used in MOM3. The results are
very similar to those from the netCDF output. Here is the snapit_ffio
figure for the results.
Last modified in September 1999.