UCRL-CODE-2001-010
NOTICE 1
This work was produced at the University of California, Lawrence Livermore National Laboratory (UC LLNL) under contract no. W-7405-ENG-48 (Contract 48) between the U.S. Department of Energy (DOE) and The Regents of the University of California (University) for the operation of UC LLNL. The rights of the Federal Government are reserved under Contract 48 subject to the restrictions agreed upon by the DOE and University as allowed under DOE Acquisition Letter 97-1.
DISCLAIMER
This work was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights. Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes.
NOTIFICATION OF COMMERCIAL USE
Commercialization of this product is prohibited without notifying the Department of Energy (DOE) or Lawrence Livermore National Laboratory (LLNL).
In addition, to test scalability, several runs of configurations 3 and 4 will be made.
The decks used in the following tests may be found in ~/irs-1.4/decks.
Each run will generate three files, which should be saved. These will be variations on the following names.
The "irs" in the above names will be different for each of the tests.
This should be the first shakedown test. It should detect any obvious errors in the executable, such as mismatched libraries or run-time errors inherent in the code.
All versions of this code should run sequentially. That is, versions compiled with MPI and/or OpenMP threads should run sequentially as well as in parallel mode.
Using input deck zrad.0001, run the code sequentially as follows:
irs -k seq zrad.0001
It may help to run each test in a separate directory, so that all created files will be saved and not overwritten by subsequent runs.
If the code ran successfully, something similar to the following will be printed to the screen at the end of the run.
BENCHMARK microseconds per zone-iteration = 2.1199106478034
wall time used:          9.107000e+01 seconds
total cpu time used:     8.918000e+01 seconds
physics cpu time used:   8.892000e+01 seconds
This BENCHMARK score should be printed to stdout (the screen) and also to a file named "seqhsp".
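As a quick sanity check, the BENCHMARK line can be pulled from the seqhsp file with grep. The sketch below first creates a stand-in seqhsp containing the sample line above, since the real file exists only after a run; after an actual run, grep the real seqhsp instead.

```shell
# Sketch: confirm a run completed by locating the BENCHMARK line.
# The seqhsp here is a stand-in built from the sample output above;
# after a real run, grep the actual seqhsp file instead.
printf 'BENCHMARK microseconds per zone-iteration = 2.1199106478034\n' > seqhsp
grep 'BENCHMARK' seqhsp
```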
If the code was built to take advantage of OpenMP threads, then the following two runs should be made. The number of OpenMP threads spawned is usually controlled by an environment variable such as OMP_NUM_THREADS. For these tests, set this value to at least 4.
irs -k omp_seq zrad.0008.seq
irs -k omp_thr zrad.0008.seq -threads
The first run exercises the code without threads; the second runs the same file with threads.
In general, MPI-parallel runs are started by a command akin to the following.

mpirun -np 4 code.to.run arg1 arg2 arg3

This may be different on your platform. However, assuming this syntax, the following all-MPI tests may be run.
These tests exercise the code in MPI parallel mode using from 8 to 8000 processors.
mpirun -np 8 irs -k mpi.0008 zrad.0008
mpirun -np 64 irs -k mpi.0064 zrad.0064
mpirun -np 216 irs -k mpi.0216 zrad.0216
mpirun -np 512 irs -k mpi.0512 zrad.0512
mpirun -np 1000 irs -k mpi.1000 zrad.1000
mpirun -np 1728 irs -k mpi.1728 zrad.1728
mpirun -np 2744 irs -k mpi.2744 zrad.2744
mpirun -np 4096 irs -k mpi.4096 zrad.4096
mpirun -np 5832 irs -k mpi.5832 zrad.5832
mpirun -np 8000 irs -k mpi.8000 zrad.8000
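The processor counts in the list above are the cubes of the even numbers 2 through 20. As a sketch (assuming the generic mpirun syntax shown earlier), the full series of command lines can be generated in a loop rather than typed out by hand:

```shell
# Sketch: the all-MPI decks are sized at the cubes of the even numbers
# 2..20 (8, 64, 216, ..., 8000), so the whole command series can be
# generated in a loop. This only prints the commands; it does not run them.
for n in 2 4 6 8 10 12 14 16 18 20; do
  np=$((n * n * n))
  printf 'mpirun -np %d irs -k mpi.%04d zrad.%04d\n' "$np" "$np" "$np"
done
```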
This set of tests runs the code in a mode that uses both MPI and OpenMP threads. As in the simpler threaded case above, the mechanism used to specify the number of threads is assumed to be the environment variable OMP_NUM_THREADS.
Since the architecture of the machines on which this code will run is unknown, the run configurations that best exercise a given machine are also unknown. Decks are provided to test the following numbers of CPUs: 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832, and 8000.
The number of MPI tasks and OpenMP threads used to cover the processors can be varied. The following runs show variations for the 8- and 64-processor decks; use corresponding arguments for the larger decks.
export OMP_NUM_THREADS=2
mpirun -np 4 irs -k mpi.0004mpi.002thr zrad.0008 -threads
export OMP_NUM_THREADS=4
mpirun -np 2 irs -k mpi.0002mpi.004thr zrad.0008 -threads
export OMP_NUM_THREADS=2
mpirun -np 32 irs -k mpi.0032mpi.002thr zrad.0064 -threads
export OMP_NUM_THREADS=4
mpirun -np 16 irs -k mpi.0016mpi.004thr zrad.0064 -threads
export OMP_NUM_THREADS=8
mpirun -np 8 irs -k mpi.0008mpi.008thr zrad.0064 -threads
export OMP_NUM_THREADS=16
mpirun -np 4 irs -k mpi.0004mpi.016thr zrad.0064 -threads
export OMP_NUM_THREADS=32
mpirun -np 2 irs -k mpi.0002mpi.032thr zrad.0064 -threads
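In each pair above, the MPI task count times the thread count equals the deck's processor count. A sketch that generates the 64-processor combinations (it prints one combined line per run, using inline environment assignments rather than a separate export, purely for brevity):

```shell
# Sketch: for the zrad.0064 deck, MPI tasks x OpenMP threads = 64.
# Each printed line combines the export and mpirun steps shown above;
# nothing is executed here.
total=64
for threads in 2 4 8 16 32; do
  tasks=$((total / threads))
  printf 'OMP_NUM_THREADS=%d mpirun -np %d irs -k mpi.%04dmpi.%03dthr zrad.%04d -threads\n' \
    "$threads" "$tasks" "$tasks" "$threads" "$total"
done
```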
The "-k xxx" option in each of the above runs determines the prefix of the files each run creates. For instance, the sequential run

irs -k seq zrad.0008.seq

produces files whose names begin with "seq", such as seqhsp and seq.tmr.
The top of the seq.tmr file contains information that can be checked to verify the run. The important information is the number of MPI processes, the number of domains, and whether threading is on or off.
As the code is run on more processors, the .tmr file can be checked to ensure that these values match the intended configuration.
Sample listings from several runs appear below; the important information is in the header lines of each listing.
Function Timers Analysis Across Processors

Num of MPI Processes : 1
Num of Domains       : 8
Threading            : OFF

Sorted by Max Wall Seconds

ROUTINE OR DOMAIN       PROCESSOR  MFLOP/WALL SEC  FLOPS         CPU SECS      WALL SECS
----------------------  ---------  --------------  ------------  ------------  ------------
xirs (G)
  Aggregate             N/A        4.955942e+01    1.830046e+11  3.693100e+03  3.692630e+03
  Average               N/A        4.955942e+01    1.830046e+11  3.693100e+03  3.692630e+03
  Wall Seconds Max      Proc 0     4.955942e+01    1.830046e+11  3.693100e+03  3.692630e+03
  Wall Seconds Min      Proc 0     4.955942e+01    1.830046e+11  3.693100e+03  3.692630e+03
Function Timers Analysis Across Processors

Num of MPI Processes : 2
Num of Domains       : 8
Threading            : OFF

Sorted by Max Wall Seconds

ROUTINE OR DOMAIN       PROCESSOR  MFLOP/WALL SEC  FLOPS         CPU SECS      WALL SECS
----------------------  ---------  --------------  ------------  ------------  ------------
xirs (G)
  Aggregate             N/A        8.965630e+01    1.830046e+11  2.023910e+03  2.041180e+03
  Average               N/A        4.482815e+01    9.150232e+10  2.023910e+03  2.041180e+03
  Wall Seconds Max      Proc 1     4.482815e+01    9.150232e+10  2.023910e+03  2.041180e+03
  Wall Seconds Min      Proc 0     4.482837e+01    9.150232e+10  2.013380e+03  2.041170e+03
Function Timers Analysis Across Processors

Num of MPI Processes : 1
Num of Domains       : 8
Threading            : ON

Sorted by Max Wall Seconds

ROUTINE OR DOMAIN       PROCESSOR  MFLOP/WALL SEC  FLOPS         CPU SECS      WALL SECS
----------------------  ---------  --------------  ------------  ------------  ------------
xirs (G)
  Aggregate             N/A        8.150262e+01    1.830043e+11  1.780673e+08  2.245380e+03
  Average               N/A        8.150262e+01    1.830043e+11  1.780673e+08  2.245380e+03
  Wall Seconds Max      Proc 0     8.150262e+01    1.830043e+11  1.780673e+08  2.245380e+03
  Wall Seconds Min      Proc 0     8.150262e+01    1.830043e+11  1.780673e+08  2.245380e+03
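The header fields used for verification can be extracted from a .tmr file with grep. The sketch below builds a stand-in sample.tmr from the header of the first listing, since a real seq.tmr exists only after a run; after an actual run, grep the real seq.tmr (or mpi.*.tmr) file instead.

```shell
# Sketch: pull the verification fields out of a .tmr header.
# sample.tmr is a stand-in built from the sequential listing above;
# grep the real .tmr file after an actual run.
cat > sample.tmr <<'EOF'
Function Timers Analysis Across Processors
Num of MPI Processes : 1
Num of Domains : 8
Threading : OFF
EOF
grep -E 'Num of MPI Processes|Num of Domains|Threading' sample.tmr
```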