IRS: Implicit Radiation Solver 1.4 Benchmark Runs


UCRL-CODE-2001-010

NOTICE 1

This work was produced at the University of California, Lawrence Livermore National Laboratory (UC LLNL) under contract no. W-7405-ENG-48 (Contract 48) between the U.S. Department of Energy (DOE) and The Regents of the University of California (University) for the operation of UC LLNL. The rights of the Federal Government are reserved under Contract 48 subject to the restrictions agreed upon by the DOE and University as allowed under DOE Acquisition Letter 97-1.

DISCLAIMER

This work was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights. Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes.

NOTIFICATION OF COMMERCIAL USE

Commercialization of this product is prohibited without notifying the Department of Energy (DOE) or Lawrence Livermore National Laboratory (LLNL).


Contents

  Running the Code
    Running Sequential
    Running Parallel using OpenMP Threads
    Running Parallel using MPI
    Running Parallel using MPI and OpenMP Threads
  Expected Results
  Interpreting Results

Running the Code

The code should be run in the following configurations.
  1. Sequential
  2. Parallel using OpenMP Threads
  3. Parallel using MPI
  4. Parallel using MPI and OpenMP Threads

In addition, to test scalability, configurations 3 and 4 should be run several times at increasing processor counts.

The decks used in the following tests may be found in ~/irs-1.4/decks.

Each run will generate three files, which should be saved. The name of each file consists of a prefix followed by a fixed suffix; the prefix will be different for each of the tests and is set with the "-k" option shown in the runs below.

Running Sequential

This should be the first shakedown test. It should detect any obvious errors in the executable, such as mismatched libraries or run-time errors inherent in the code.

All versions of this code should be able to run sequentially. That is, versions compiled with MPI and/or OpenMP threads should run sequentially as well as in parallel mode.

Using input deck zrad.0001, run the code sequentially as follows:

        irs -k seq zrad.0001

It may help to run each test in a separate directory, so that all created files will be saved and not overwritten by subsequent runs.
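
For example, one possible layout is the following (the directory name here is arbitrary, and the deck path follows the location given above):

        # give the sequential test its own directory so its output
        # files are not overwritten by later runs
        mkdir seq_test
        cd seq_test
        irs -k seq ~/irs-1.4/decks/zrad.0001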

If the code ran successfully, something similar to the following will be printed to the screen at the end of the run.

     
        BENCHMARK microseconds per zone-iteration = 2.1199106478034
 
        wall        time used: 9.107000e+01 seconds
        total   cpu time used: 8.918000e+01 seconds
        physics cpu time used: 8.892000e+01 seconds

This BENCHMARK score should be printed to stdout (the screen) and also to a file named "seqhsp".
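
If desired, the score can be confirmed afterwards by searching that file (assuming a POSIX shell; the file name follows from the "-k seq" option used above):

        # verify that the benchmark line was also written to the hsp file
        grep BENCHMARK seqhsp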

Running Parallel using OpenMP Threads

If the code was built to take advantage of OpenMP threads, then the following two runs should be made. The number of OpenMP threads spawned is usually controlled by an environment variable such as OMP_NUM_THREADS. For these tests, set this value to at least 4.

        irs -k omp_seq zrad.0008.seq

        irs -k omp_thr zrad.0008.seq -threads

The first run executes the code without threads. The second run executes the same deck with threads.
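
For the second run, the thread count can be set, for example, as follows in a bash-style shell (csh-style shells would use "setenv OMP_NUM_THREADS 4" instead):

        # request 4 OpenMP threads for the threaded run
        export OMP_NUM_THREADS=4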

Running Parallel using MPI

In general, MPI parallel runs of a code are started by something akin to the following.

        mpirun -np 4 code.to.run arg1 arg2 arg3

This may be different on your platform. However, assuming this syntax, the following all-MPI tests may be run.

These tests exercise the code in MPI parallel mode on 8 to 8000 processors.
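
The exact launch lines depend on the local MPI environment. As an illustration only, and assuming decks named after their processor counts and "-k" prefixes chosen to keep the output files distinct (both of which are assumptions, not names taken from the distribution), the two smallest runs might look like:

        # hypothetical all-MPI runs, one MPI task per processor
        mpirun -np 8  irs -k mpi0008 zrad.0008
        mpirun -np 64 irs -k mpi0064 zrad.0064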

Running Parallel using MPI and OpenMP Threads

This set of tests runs the code in a mode that uses both MPI and OpenMP threads. As in the previous, simpler threaded case, the number of threads is assumed to be specified by the environment variable OMP_NUM_THREADS.

Since the architecture of the machines this code will run on is unknown, the runs that best exercise a given machine are also unknown. Decks are provided to test the following numbers of CPUs: 8, 64, 216, 512, 1000, 1728, 2744, 4096, 5832, and 8000.

The numbers of MPI tasks and OpenMP threads used to cover the processors can be varied. The following runs show variations for the 8- and 64-processor cases; use corresponding arguments for the larger runs.
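
As an illustration only (the deck names, "-k" prefixes, and task/thread splits below are assumptions, not values taken from the distribution, and the "-threads" flag follows the earlier threaded example), the 8- and 64-processor cases might be divided between MPI tasks and OpenMP threads as follows:

        # hypothetical 8-processor variations
        mpirun -np 8 irs -k mpi8x1 zrad.0008                # 8 MPI tasks x 1 thread
        export OMP_NUM_THREADS=4
        mpirun -np 2 irs -k mpi2x4 zrad.0008 -threads       # 2 MPI tasks x 4 threads

        # hypothetical 64-processor variations
        mpirun -np 64 irs -k mpi64x1 zrad.0064              # 64 MPI tasks x 1 thread
        mpirun -np 16 irs -k mpi16x4 zrad.0064 -threads     # 16 MPI tasks x 4 threads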


Expected Results

Each run will generate three files, which should be saved. The name of each file consists of a prefix followed by a fixed suffix; the "-k xxx" option in each of the above runs determines the prefix. For instance, the sequential run

        irs -k seq zrad.0008.seq

produces files whose names begin with "seq", such as "seqhsp" and "seq.tmr".

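A quick way to confirm that the expected output files were written (assuming a POSIX shell and the "seq" prefix from the run above) is:

        # list every output file produced with the "seq" prefix
        ls seq*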

Interpreting Results

The top of the 'seq.tmr' file contains information that can be checked to verify the run; the important items are the number of MPI processes, the number of domains, and whether threading is ON or OFF.

As the code is run on more processors, the .tmr file can be checked to ensure that each run actually used the intended number of MPI processes, number of domains, and threading mode.
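
A quick way to check these header fields without opening the file (assuming a POSIX shell and the field names shown in the sample listings below) is:

        # print the parallel configuration recorded at the top of the timer file
        grep -E "Num of MPI Processes|Num of Domains|Threading" seq.tmr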

Sample listings from several runs are shown below, with the important information appearing in the header lines of each listing.


Sample .tmr data from Sequential run

Function Timers Analysis Across Processors
Num of MPI Processes : 1
Num of Domains       : 8
Threading            : OFF
Sorted by Max Wall Seconds

                                        PROCESSOR      MFLOP /                                                    
                  ROUTINE               OR DOMAIN     WALL SEC        FLOPS     CPU SECS    WALL SECS    
------------------------- ---------- ------------ ------------ ------------ ------------ ------------ 
xirs (G)                   Aggregate          N/A 4.955942e+01 1.830046e+11 3.693100e+03 3.692630e+03 
                             Average          N/A 4.955942e+01 1.830046e+11 3.693100e+03 3.692630e+03 
                    Wall Seconds Max    Proc    0 4.955942e+01 1.830046e+11 3.693100e+03 3.692630e+03 
                    Wall Seconds Min    Proc    0 4.955942e+01 1.830046e+11 3.693100e+03 3.692630e+03 


Sample .tmr data from MPI parallel run

Function Timers Analysis Across Processors
Num of MPI Processes : 2
Num of Domains       : 8
Threading            : OFF
Sorted by Max Wall Seconds

                                        PROCESSOR      MFLOP /                                                    
                  ROUTINE               OR DOMAIN     WALL SEC        FLOPS     CPU SECS    WALL SECS   
------------------------- ---------- ------------ ------------ ------------ ------------ ------------ 
xirs (G)                   Aggregate          N/A 8.965630e+01 1.830046e+11 2.023910e+03 2.041180e+03 
                             Average          N/A 4.482815e+01 9.150232e+10 2.023910e+03 2.041180e+03 
                    Wall Seconds Max    Proc    1 4.482815e+01 9.150232e+10 2.023910e+03 2.041180e+03 
                    Wall Seconds Min    Proc    0 4.482837e+01 9.150232e+10 2.013380e+03 2.041170e+03 

 

Sample .tmr data from Threaded parallel run

Function Timers Analysis Across Processors
Num of MPI Processes : 1
Num of Domains       : 8
Threading            : ON
Sorted by Max Wall Seconds

                                        PROCESSOR      MFLOP /                                                    
                  ROUTINE               OR DOMAIN     WALL SEC        FLOPS     CPU SECS    WALL SECS   
------------------------- ---------- ------------ ------------ ------------ ------------ ------------ 
xirs (G)                   Aggregate          N/A 8.150262e+01 1.830043e+11 1.780673e+08 2.245380e+03 
                             Average          N/A 8.150262e+01 1.830043e+11 1.780673e+08 2.245380e+03 
                    Wall Seconds Max    Proc    0 8.150262e+01 1.830043e+11 1.780673e+08 2.245380e+03 
                    Wall Seconds Min    Proc    0 8.150262e+01 1.830043e+11 1.780673e+08 2.245380e+03 

