TotalView Part 3:
Debugging Parallel Programs

Blaise Barney, Lawrence Livermore National Laboratory

Part 3 Contents

  1. Process/Thread Groups
  2. Debugging Threaded Codes
    1. Overview
    2. Finding Thread Information
    3. Selecting a Thread
    4. Execution Control for Threaded Programs
    5. Viewing and Modifying Thread Data
  3. Debugging OpenMP Codes
    1. Overview
    2. Debugging OpenMP Programs
  4. Debugging MPI Codes
    1. Overview
    2. Starting an MPI Debug Session
    3. Selecting an MPI Process
    4. Controlling MPI Process Execution
    5. Viewing and Modifying Multi-process Data
    6. Displaying Message Queue State
  5. Debugging Hybrid Codes
    1. Overview
    2. Debugging Hybrid Programs
  6. Batch System Debugging
  7. Topics Not Covered
  8. References and More Information



Preface



Process/Thread Groups


TotalView P/T Groups:

Types of P/T Groups:

Selecting P/T Groups:

Important:



Debugging Threaded Codes

Overview

General Threads Model:

Supported Platforms:

Important Differences:


Finding Thread Information

Root Window:

Process Window:


Selecting a Thread

By Diving:

By Thread Navigation Buttons:
  • Use the thread navigation buttons located in the bottom right corner of the Process Window.
  • Cycle through the threads until the desired thread's information fills the Process Window.
Process Window thread navigation buttons

Differentiating Threads:


Execution Control for Threaded Programs

Three Scopes of Influence:

Synchronous vs. Asynchronous:

Warning: With asynchronous thread control, unexpected program behavior (such as hanging) can occur if some threads step or run while others are stopped, particularly in library routines. Pressing CTRL-C may cancel the command that caused the hang.
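The hang described in this warning can be illustrated outside the debugger. The following is a minimal sketch, not TotalView code: it assumes the stopped thread happens to hold a lock that the running thread needs. The names `demo_hang`, `holder`, `runner` and `resume_holder` are hypothetical, and an event stands in for the debugger resuming the stopped thread. A timeout is used so the sketch terminates; a real program would wait indefinitely.

```python
import threading


def demo_hang(timeout=0.2):
    """Return True if the running thread blocked on a lock held by the stopped thread."""
    lock = threading.Lock()
    holder_has_lock = threading.Event()
    resume_holder = threading.Event()  # stands in for the debugger resuming the thread
    result = {}

    def holder():
        # Acquire the lock, then wait as if stopped by the debugger.
        with lock:
            holder_has_lock.set()
            resume_holder.wait()

    def runner():
        # This thread is allowed to run while the holder is "stopped".
        holder_has_lock.wait()
        acquired = lock.acquire(timeout=timeout)
        result["blocked"] = not acquired
        if acquired:
            lock.release()

    t_holder = threading.Thread(target=holder)
    t_runner = threading.Thread(target=runner)
    t_holder.start()
    t_runner.start()
    t_runner.join()      # returns once the timed acquire gives up
    resume_holder.set()  # "resume" the stopped thread so it releases the lock
    t_holder.join()
    return result["blocked"]


if __name__ == "__main__":
    print("running thread blocked:", demo_hang())
```

Library routines often take internal locks of this kind, which is one plausible reason the warning singles them out.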

Thread-specific Breakpoints:


Viewing and Modifying Thread Data

Laminated Variables:

In the Kernel:



Debugging OpenMP Codes

Overview

Threads Model:

Supported Platforms:

Supported Features:


Debugging OpenMP Programs

Just Like Threads (sorta):

Setting the Number of Threads:

Code Transformation:

Master Thread vs. Worker Threads:

Stack Parent Token Line:

Example OpenMP Session:

  1. Master thread Stack Trace Pane showing original routine
  2. Process/thread status bars differentiating threads
  3. Master thread Stack Frame Pane showing shared variables
  4. Worker thread Stack Trace Pane showing outlined routine. Stack Parent Token Line is missing in this case.
  5. Worker thread Stack Frame Pane showing private variables
  6. Root Window showing all threads
  7. Threads Pane showing all threads plus selected thread

    Example OpenMP Debug Session

Execution Control:

Warning: With asynchronous execution, single stepping or running one OpenMP thread while others are stopped can lead to unexpected program behavior (such as hanging). Pressing CTRL-C may cancel the command that caused the hang.

Viewing and Modifying Data:

Manager Threads:

THREADPRIVATE Data:



Debugging MPI Codes

Overview

Multi-Process:

Supported Platforms:


Starting an MPI Debug Session

Just a Little Bit Different:

Example:

  1. Start TotalView with the parallel task manager process. Note that the order of arguments and executables is important, and differs between platforms.

    Examples:

    IBM (at LC):
        totalview poe -a myprog -procs 4 -rmpool 0
    QUADRICS, Intel Linux, under SLURM:
        totalview srun -a -n 16 -p pdebug myprog
    MVAPICH, Opteron Linux, under SLURM:
        totalview srun -a -n 16 -p pdebug myprog
    SGI:
        totalview mpirun -a myprog -np 16
    Sun:
        totalview mprun -a myprog -np 16
    MPICH:
        mpirun -np 16 -tv myprog

  2. The Root Window and Process Window will appear as usual. However, the manager process will be loaded, not your program. Start the manager process by typing g in the Process Window or by selecting:

    Process Window >  Process Menu  >  Go 

  3. A dialog window will then appear, notifying you that this is a parallel job and asking whether you wish to stop the job now. Click "Yes" (see below).

  4. TotalView will then acquire the MPI tasks which are running under the manager process. When this is done, the Process Window will default to displaying the state information and source for MPI task 0. You are now ready to begin debugging your program.

    Example windows for starting an MPI program under manager process
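Because the argument order differs between platforms, it can help to keep the launch lines in one place. The sketch below simply parameterizes the example commands listed in step 1 by program name and task count. The dictionary keys and the function name `launch_command` are hypothetical labels, and the pool (`0`) and partition (`pdebug`) values are carried over unchanged from the examples; adjust them for your own system.

```python
# Launch command templates copied from the platform examples in step 1.
# The dictionary keys are hypothetical labels chosen for this sketch.
LAUNCH_TEMPLATES = {
    "ibm_poe": "totalview poe -a {prog} -procs {n} -rmpool 0",
    "slurm_srun": "totalview srun -a -n {n} -p pdebug {prog}",
    "sgi_mpirun": "totalview mpirun -a {prog} -np {n}",
    "sun_mprun": "totalview mprun -a {prog} -np {n}",
    "mpich_mpirun": "mpirun -np {n} -tv {prog}",
}


def launch_command(platform, prog, ntasks):
    """Return the launch line for one platform label, program, and task count."""
    try:
        template = LAUNCH_TEMPLATES[platform]
    except KeyError:
        raise ValueError(f"no launch template for platform {platform!r}") from None
    return template.format(prog=prog, n=ntasks)


if __name__ == "__main__":
    # Print every template filled in for a 16-task run of "myprog".
    for name in LAUNCH_TEMPLATES:
        print(f"{name:13s} {launch_command(name, 'myprog', 16)}")
```

Note how the program name precedes the task count for `poe`, `mpirun` (SGI) and `mprun`, but follows it for `srun` and for MPICH's `mpirun -tv` form.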


Selecting an MPI Process

By Diving:

By Process Navigation Buttons:


Example:


Controlling MPI Process Execution

Starting and Stopping Processes:


Warning: If you use accelerator keys to control execution, be sure to type the right key! It is a fairly common mistake to use a process-level command instead of a group-level command (and vice versa), for example, typing g instead of G.

Holding and Releasing Processes:

Breakpoints and Barrier Points:

Warning About Single Process Commands:


Viewing and Modifying Multi-process Data

Laminated Variables:


Displaying Message Queue State

Types of Messages Displayed:

Actions:

Message Queue Graph:

Notes:




Debugging Hybrid Codes

Overview

What are "Hybrid" Codes?

Nothing New (Just More of It):

Supported Platforms:


Debugging Hybrid Programs

Starting a Hybrid Code Debug Session:

Tying it All Together:

Example:



Batch System Debugging


Why Would You Want To?

A General Approach:

A Couple of LC-Specific Notes:

Example Batch Debug Session:



Topics Not Covered


TotalView includes a number of other features and functions not covered in this tutorial. A partial list of these appears below. Please consult the TotalView Documentation for more information.



References and More Information


The most useful documentation and reference material is from Etnus. You can download this from the "Documentation" section of their web site at www.etnus.com. If you already have TotalView installed, the same documentation comes with the installation and is available from the install directory and TotalView's "Help" menu.


This concludes TotalView Part 3
