NERSC OpenMP Tutorial

Introduction

This tutorial will take you through the OpenMP directives, starting with the most basic and useful ones.

Execution Model

A program that is written using OpenMP directives begins execution as a single process, called the master thread of execution. The master thread executes sequentially until the first parallel construct is encountered. The PARALLEL / END PARALLEL directive pair constitutes the parallel construct.

When a parallel construct is encountered, the master thread creates a team of threads and becomes the master of that team. The program statements enclosed in the parallel construct, including routines called from within the construct, are executed in parallel by each thread in the team.

Upon completion of the parallel construct, the threads in the team synchronize and only the master thread continues execution. Any number of parallel constructs can be specified in a single program. As a result, a program may fork and join many times during execution.
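
For illustration, here is a minimal sketch of the fork/join model (this program is not among the copied examples):

PROGRAM FORKJOIN
        IMPLICIT NONE

        PRINT *, "Serial: only the master thread executes this"
!$OMP PARALLEL
        PRINT *, "Parallel: every thread in the team executes this"
!$OMP END PARALLEL
        PRINT *, "Serial again: the team has joined; only the master continues"

END PROGRAM FORKJOIN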

The degree of parallelism of an OpenMP code depends on the code, the platform, the hardware configuration, the compiler, and the operating system. In no case are you guaranteed to have each thread running on a separate processor.

The Cray XT4 will only allow you to specify a number of threads less than or equal to the number of processor cores on a node. The Franklin compute nodes all have four cores, so you can execute 1, 2, 3, or 4 threads on them.

On Franklin, OpenMP codes may be run only on compute nodes, via a PBS batch script.

Examples

Examples using OpenMP can be copied to your $HOME/openmp_examples directory on Franklin by using:

% cd $HOME
% mkdir openmp_examples
% module load training
% cp $EXAMPLES/OpenMP/tutorial/* openmp_examples

OpenMP Directive Syntax

OpenMP directives are inserted directly into source code. Free-form Fortran source code directives begin with the sentinel !$OMP. Fixed-form Fortran source code directives begin with the sentinels !$OMP, C$OMP, or *$OMP. Sentinels must start in column one. Continuation lines are permitted using the same format as the Fortran source code format you are using (free or fixed).
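
For example, these illustrative fragments (not from the examples directory) show the same directive in each source form, with the REDUCTION clause continued onto a second line:

! Free-form source: continuation indicated by a trailing &
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I) &
!$OMP REDUCTION(+:S)

C Fixed-form source: sentinel starts in column one; the & in
C column six of the sentinel marks the continuation line
C$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I)
C$OMP&REDUCTION(+:S)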

Following are descriptions of the basic use of OpenMP directives with examples.

PARALLEL Directive

A Parallel Region is a block of code that is to be executed in parallel by a number of threads. Each thread executes the enclosed code separately.

Note that all code within a parallel region is executed by each thread unless other OpenMP directives specify otherwise. For instance, a DO loop that lies within a parallel region will be executed completely (and redundantly) by each thread unless a parallel DO directive is inserted before the loop. A DO or PARALLEL DO directive is necessary if you want the loop to be executed once with different threads performing different iterations of the loop in parallel.
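
As a hypothetical illustration (not among the copied examples), every thread below executes all three iterations of the loop, so with 4 threads the PRINT statement runs 12 times:

PROGRAM REDUNDANT
        IMPLICIT NONE
        INTEGER I

!$OMP PARALLEL PRIVATE(I)
        DO I=1,3
                PRINT *, "iteration",I
        END DO
!$OMP END PARALLEL

END PROGRAM REDUNDANT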

It is illegal to branch out of a Parallel Region.

!$OMP PARALLEL [clause] 

	code block       

!$OMP END PARALLEL 

The [clause] can be one or more of PRIVATE, SHARED, DEFAULT, FIRSTPRIVATE, REDUCTION, COPYIN, and IF.
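
For example, the IF clause makes the fork conditional; in this illustrative skeleton the enclosed code runs in parallel only when N exceeds the threshold, and serially otherwise:

!$OMP PARALLEL IF(N > 1000) DEFAULT(SHARED) PRIVATE(I)

	code block

!$OMP END PARALLEL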

Examples

!Filename: parallel.f90
!
!This simply shows that code in a PARALLEL
!region is executed by each thread.

PROGRAM PARALLEL 
        IMPLICIT NONE
        INTEGER I

        I=1

!$OMP PARALLEL FIRSTPRIVATE(I)

        PRINT *, I

!$OMP END PARALLEL 

END PROGRAM PARALLEL 

To run on Franklin:

> cat parallel.pbs
#PBS -N parallel
#PBS -j oe
#PBS -o parallel.out
#PBS -q interactive
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=4
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o parallel -mp=nonuma -Minfo=mp parallel.f90

export OMP_NUM_THREADS=4
aprun -n 1 -N 1 -d 4 ./parallel
> qsub parallel.pbs
498022.nid00003
> cat parallel.out
/opt/xt-asyncpe/1.0c/bin/ftn: INFO: linux target is being used
parallel:
    12, Parallel region activated
    14, Parallel region terminated
            1
            1
            1
            1
Application 277470 resources: utime 0, stime 0

The next example shows the use of the REDUCTION clause. When the parallel region ends, each thread's private copy of a reduction variable is combined with the variable's original value using the specified operator. Here each of the four threads stores its thread number (0 through 3), so I becomes 1+(0+1+2+3)=7, J becomes 1*(0*1*2*3)=0, and K becomes MAX(1,0,1,2,3)=3. Also note that each thread executes the PRINT statement in the parallel region.

!Filename: reduction.f90
!
!This program shows the use of the REDUCTION clause.

PROGRAM REDUCTION 
        IMPLICIT NONE
        INTEGER tnumber, OMP_GET_THREAD_NUM
        INTEGER I,J,K
        I=1
        J=1
        K=1
        PRINT *, "Before Par Region: I=",I," J=", J," K=",K
        PRINT *, ""

!$OMP PARALLEL DEFAULT(PRIVATE) REDUCTION(+:I)&
!$OMP			 REDUCTION(*:J) REDUCTION(MAX:K)

        tnumber=OMP_GET_THREAD_NUM()

        I = tnumber
        J = tnumber
        K = tnumber

        PRINT *, "Thread ",tnumber, "         I=",I," J=", J," K=",K

!$OMP END PARALLEL

        PRINT *, ""
        print *, "Operator            +     *    MAX"
        PRINT *, "After Par Region:  I=",I," J=", J," K=",K

END PROGRAM REDUCTION 

To run on Franklin:

> cat reduction.pbs
#PBS -N reduction
#PBS -j oe
#PBS -o reduction.out
#PBS -q interactive
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=4
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o reduction -mp=nonuma -Minfo=mp reduction.f90

export OMP_NUM_THREADS=4
aprun -n 1 -N 1 -d 4 ./reduction
> qsub reduction.pbs
498041.nid00003
> cat reduction.out
/opt/cray/xt-asyncpe/2.0/bin/ftn: INFO: linux target is being used
reduction:
   15, Parallel region activated
   25, Begin critical section
       End critical section
       Parallel region terminated
 Before Parallel Region: I=            1  J=            1  K=            1
 
 Thread             0               I=            0  J=            0  K= 
            0
 Thread             2               I=            2  J=            2  K= 
            2
 Thread             3               I=            3  J=            3  K= 
            3
 Thread             1               I=            1  J=            1  K= 
            1
 
 Operator                 +     *    MAX
 After Parallel Region:  I=            7  J=            0  K=            3
Application 2015449 resources: utime 0, stime 0

DO Directive

The DO directive specifies that the iterations of the immediately following DO loop are to be divided among the threads of the team and executed in parallel. The DO directive must be enclosed in a parallel region; it creates no threads by itself. The associated loop cannot be a DO WHILE.

!$OMP DO [clause[[,] clause] ...]  
	do_loop 
!$OMP END DO [NOWAIT] 

The clause can be one of PRIVATE, FIRSTPRIVATE, LASTPRIVATE, REDUCTION, SCHEDULE, and ORDERED.

It is illegal to branch out of a DO loop associated with the DO directive.
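
There is an implied barrier at END DO unless NOWAIT is specified. In the illustrative fragment below (the arrays and their use are hypothetical), threads finishing the first loop proceed immediately to the second, which is safe here because the second loop does not read anything written by the first:

!$OMP PARALLEL PRIVATE(I) SHARED(A,B)
!$OMP DO SCHEDULE(STATIC)
        DO I=1,N
                A(I) = 2.0*A(I)
        END DO
!$OMP END DO NOWAIT
!$OMP DO SCHEDULE(STATIC)
        DO I=1,N
                B(I) = B(I) + 1.0
        END DO
!$OMP END DO
!$OMP END PARALLEL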

Example

!Filename: dodir.f90
!

PROGRAM DODIR 
        IMPLICIT NONE
        INTEGER I,L
        INTEGER, PARAMETER:: DIM=16
        REAL A(DIM),B(DIM),S
        INTEGER nthreads,tnumber
        INTEGER OMP_GET_NUM_THREADS,OMP_GET_THREAD_NUM

        CALL RANDOM_NUMBER(A)
        CALL RANDOM_NUMBER(B)

!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(A,B)
!$OMP DO SCHEDULE(STATIC,4)
        DO I=2,DIM
                B(I) = ( A(I) - A(I-1) ) / 2.0

                nthreads=OMP_GET_NUM_THREADS()
                tnumber=OMP_GET_THREAD_NUM()
                print *, "Thread",tnumber," of",nthreads," has I=",I
        END DO
!$OMP END DO
!$OMP END PARALLEL

        S=MAXVAL(B)
        L=MAXLOC(B,1)

        PRINT *, "Maximum gradient: ",S," at location:",L

END PROGRAM DODIR 

Compiling and running on Franklin:

> cat dodir.pbs
#PBS -N dodir
#PBS -j oe
#PBS -o dodir.out
#PBS -q interactive
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=4
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o dodir -mp=nonuma -Minfo=mp dodir.f90

export OMP_NUM_THREADS=4
aprun -n 1 -N 1 -d 4 ./dodir
> qsub dodir.pbs
500611.nid00003
> cat dodir.out
/opt/cray/xt-asyncpe/2.0/bin/ftn: INFO: linux target is being used
dodir:
   15, Parallel region activated
   17, Parallel loop activated with static block-cyclic schedule
   24, Barrier
       Parallel region terminated
 Thread            0  of            4  has I=            2
 Thread            0  of            4  has I=            3
 Thread            0  of            4  has I=            4
 Thread            0  of            4  has I=            5
 Thread            3  of            4  has I=           14
 Thread            3  of            4  has I=           15
 Thread            3  of            4  has I=           16
 Thread            1  of            4  has I=            6
 Thread            1  of            4  has I=            7
 Thread            1  of            4  has I=            8
 Thread            1  of            4  has I=            9
 Thread            2  of            4  has I=           10
 Thread            2  of            4  has I=           11
 Thread            2  of            4  has I=           12
 Thread            2  of            4  has I=           13
 Maximum gradient:    0.6280164      at location:            1
Application 2016137 resources: utime 0, stime 0

Notice that the loop iterations were divided among the 4 threads in contiguous chunks of four, as requested with the SCHEDULE(STATIC,4) clause. Also note that if we had not enclosed the DO loop in a DO/END DO directive block, the loop would not have been split; instead it would have been executed in full by each of the OMP_NUM_THREADS threads.

PARALLEL DO Directive

The PARALLEL DO directive provides a shortcut form for specifying a parallel region that contains a single DO directive. The semantics are identical to specifying a PARALLEL directive followed by a DO directive.

!$OMP PARALLEL DO [clause[[,] clause] ...]  
	do_loop 
!$OMP END PARALLEL DO 

The clause can be any of those associated with the PARALLEL and DO directives described above.
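
For instance, the two illustrative fragments below are equivalent:

!$OMP PARALLEL DO PRIVATE(I)
        DO I=1,N
                A(I) = 2.0*A(I)
        END DO
!$OMP END PARALLEL DO

!The same computation written as a PARALLEL region containing a DO directive:
!$OMP PARALLEL PRIVATE(I)
!$OMP DO
        DO I=1,N
                A(I) = 2.0*A(I)
        END DO
!$OMP END DO
!$OMP END PARALLEL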

Example

!Filename: pardo.f90
!

PROGRAM PARDO 
        IMPLICIT NONE
        INTEGER I,J
        INTEGER, PARAMETER:: DIM1=10000, DIM2=200
        REAL A(DIM1),B(DIM2,DIM1),C(DIM2,DIM1)
        REAL before, after, elapsed,S
        INTEGER nthreads,OMP_GET_NUM_THREADS

        CALL RANDOM_NUMBER(A)

        call cpu_time(before)

!$OMP PARALLEL DO SCHEDULE(RUNTIME) PRIVATE(I,J) SHARED (A,B,C,nthreads)
        DO J=1,DIM2
                nthreads = OMP_GET_NUM_THREADS()
          DO I=2, DIM1
                B(J,I) = ( (A(I)+A(I-1))/2.0 ) / SQRT(A(I))
                C(J,I) = SQRT( A(I)*2 ) / ( A(I)-(A(I)/2.0) )
                B(J,I) = C(J,I) * ( B(J,I)**2 ) * SIN(A(I))
          END DO
        END DO

!$OMP END PARALLEL DO

        call cpu_time(after)
        !Elapsed time in seconds (cpu_time already returns seconds)
        elapsed = after-before

        S=MAXVAL(B)

        WRITE(6,'("Maximum of B=",1pe8.2," found in ",1pe8.2," &
		&seconds using", I2," threads")') S,elapsed,nthreads


END PROGRAM PARDO 

Compiling and running on Franklin:

> cat pardo.pbs
#PBS -N pardo
#PBS -j oe
#PBS -o pardo.out
#PBS -q interactive
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=4
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o pardo -mp=nonuma -Minfo=mp pardo.f90

export OMP_NUM_THREADS=4
aprun -n 1 -N 1 -d 4 ./pardo
> qsub pardo.pbs
500719.nid00003
> cat pardo.out
/opt/cray/xt-asyncpe/2.0/bin/ftn: INFO: linux target is being used
pardo:
   16, Parallel region activated
   17, Parallel loop activated with  runtime schedule schedule
   24, Parallel region terminated
Maximum of B=2.92E+01 found in 7.68E-02 seconds using 4 threads
Application 2016575 resources: utime 0, stime 0
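
Because the loop uses SCHEDULE(RUNTIME), the schedule is chosen at run time from the OMP_SCHEDULE environment variable. For example, adding a line such as the following (an illustrative choice, not part of the original script) to the batch script before the aprun command would select a dynamic schedule with a chunk size of 8:

export OMP_SCHEDULE="dynamic,8"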

SECTIONS Directive

The SECTIONS directive specifies that the code in the enclosed SECTION blocks is to be divided among the threads in the team. Each section is executed once, by one thread.

!$OMP SECTIONS [clause[[,] clause] ...]  

[!$OMP SECTION]
	code block                  
[!$OMP SECTION
	code block]
... 

!$OMP END SECTIONS [NOWAIT] 

The clause can be one of PRIVATE, FIRSTPRIVATE, LASTPRIVATE, and REDUCTION.

Example

!Filename: sections.f90
!
!This shows code that is executed 
!in sections.

PROGRAM SECTIONS 
        IMPLICIT NONE
        INTEGER OMP_GET_THREAD_NUM, tnumber 


!$OMP PARALLEL
!$OMP SECTIONS PRIVATE(tnumber) 

!$OMP SECTION
        tnumber=OMP_GET_THREAD_NUM()
        PRINT *,"This is section 1 being executed by thread",tnumber    
!$OMP SECTION
        tnumber=OMP_GET_THREAD_NUM()
        PRINT *,"This is section 2 being executed by thread",tnumber    
!$OMP SECTION
        tnumber=OMP_GET_THREAD_NUM()
        PRINT *,"This is section 3 being executed by thread",tnumber    
!$OMP SECTION
        tnumber=OMP_GET_THREAD_NUM()
        PRINT *,"This is section 4 being executed by thread",tnumber    
!$OMP END SECTIONS 
!$OMP END PARALLEL

END PROGRAM SECTIONS 

Compiling and running on Franklin:

> cat sections.pbs
#PBS -N sections
#PBS -j oe
#PBS -o sections.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o sections -mp=nonuma -Minfo=mp sections.f90

export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./sections
> qsub sections.pbs
500738.nid00003
> cat sections.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
sections.f90:
sections:
    11, Parallel region activated
    12, Begin sections
    17, New section
    20, New section
    23, New section
    26, End sections
        Parallel region terminated
 This is section 2 being executed by thread            1
 This is section 4 being executed by thread            1
 This is section 1 being executed by thread            0
 This is section 3 being executed by thread            0
Application 4734431 resources: utime 0, stime 0

SINGLE Directive

The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team. The other threads wait at the END SINGLE directive unless NOWAIT is specified. It is illegal to branch out of a SINGLE block.

!$OMP SINGLE [clause[[,] clause] ...]  
	block                   
!$OMP END SINGLE [NOWAIT]  

The clause can be PRIVATE or FIRSTPRIVATE.

Example

!Filename: single.f90
!
!This shows use of the SINGLE directive.
!

PROGRAM SINGLE 
        IMPLICIT NONE
        INTEGER, PARAMETER:: N=12
        REAL, DIMENSION(N):: A,B,C,D
        INTEGER:: I
        REAL:: SUMMED

!$OMP PARALLEL SHARED(A,B,C,D) PRIVATE(I)

!***** Reading files fort.10, fort.11, fort.12 in parallel

!$OMP SECTIONS
!$OMP SECTION
        READ(10,*) (A(I),I=1,N)
!$OMP SECTION
        READ(11,*) (B(I),I=1,N)
!$OMP SECTION
        READ(12,*) (C(I),I=1,N)
!$OMP END SECTIONS

!$OMP SINGLE
        SUMMED = SUM(A) + SUM(B) + SUM(C)
        PRINT *, "Sum of A+B+C=",SUMMED
!$OMP END SINGLE 

!$OMP DO SCHEDULE(STATIC,4)
        DO I=1,N
                D(I) = A(I) + B(I)*C(I)
        END DO
!$OMP END DO

!$OMP END PARALLEL

        PRINT *, "The values of D are", D

END PROGRAM SINGLE 

Compiling and running on Franklin. The files named fort.10, fort.11, and fort.12 each contain twelve values of 1.0:

> cat single.pbs
#PBS -N single
#PBS -j oe
#PBS -o single.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o single -mp=nonuma -Minfo=mp single.f90

export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./single
> qsub single.pbs
500801.nid00003
> cat single.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
single.f90:
single:
    13, Parallel region activated
    17, Begin sections
    20, New section
    22, New section
    24, End sections
    26, Begin single section
    29, End single section
        Barrier
    32, Parallel loop activated; static block-cyclic iteration allocation
    35, Barrier
        Parallel region terminated
 Sum of A+B+C=    36.00000    
 The values of D are    2.000000        2.000000        2.000000     
    2.000000        2.000000        2.000000        2.000000     
    2.000000        2.000000        2.000000        2.000000     
    2.000000    
Application 4735339 resources: utime 0, stime 0

MASTER Directive

The code enclosed with MASTER and END MASTER directives is executed only by the master thread of the team. There is no implied barrier either on entry to or exit from the MASTER section. Branching out of a MASTER block is illegal.

!$OMP MASTER                 
	block                  
!$OMP END MASTER  
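
Example

A minimal illustrative program (not among the copied examples); note that, unlike SINGLE, the other threads do not wait at END MASTER:

PROGRAM MASTEREX
        IMPLICIT NONE
        INTEGER tnumber, OMP_GET_THREAD_NUM

!$OMP PARALLEL PRIVATE(tnumber)
        tnumber = OMP_GET_THREAD_NUM()
!$OMP MASTER
        PRINT *, "Printed only by the master thread"
!$OMP END MASTER
        PRINT *, "Printed by every thread, including thread", tnumber
!$OMP END PARALLEL

END PROGRAM MASTEREX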

BARRIER Directive

!$OMP BARRIER

The BARRIER directive synchronizes all threads in a team. When encountered, each thread waits until all of the other threads in that team have reached this point.

Example

!Filename: barrier.f90
!
!This shows use of the BARRIER directive.
!

PROGRAM ABARRIER 
        IMPLICIT NONE
        INTEGER:: L
        INTEGER:: nthreads, OMP_GET_NUM_THREADS
        INTEGER:: tnumber, OMP_GET_THREAD_NUM

!$OMP PARALLEL SHARED(L) PRIVATE(nthreads,tnumber)
        nthreads = OMP_GET_NUM_THREADS()
        tnumber  = OMP_GET_THREAD_NUM()

!$OMP MASTER

        PRINT *, ' Enter a value for L'
        READ(5,*)  L

!$OMP END MASTER

!$OMP BARRIER

!$OMP CRITICAL

        PRINT *, ' My thread number         =',tnumber
        PRINT *, ' Number of threads        =',nthreads
        PRINT *, ' Value of L               =',L
        PRINT *, ''

!$OMP END CRITICAL

!$OMP END PARALLEL


END PROGRAM ABARRIER 

Compiling and running on Franklin:

> cat barrier.pbs
#PBS -N barrier
#PBS -j oe
#PBS -o barrier.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o barrier -mp=nonuma -Minfo=mp barrier.f90

export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./barrier < ninety
> cat ninety
90
> qsub barrier.pbs
500868.nid00003
> cat barrier.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
barrier.f90:
barrier:
    12, Parallel region activated
    16, Begin master section
    21, End master section
    23, Barrier
    27, Begin critical section __cs_unspc
    30, End critical section __cs_unspc
        Parallel region terminated
  Enter a value for L
  My thread number         =            0
  Number of threads        =            2
  Value of L               =           90
 
  My thread number         =            1
  Number of threads        =            2
  Value of L               =           90
 
Application 4736056 resources: utime 0, stime 0

FLUSH Directive

!$OMP FLUSH(list) 

The FLUSH directive identifies synchronization points at which the implementation is required to provide a consistent view of memory. The directive must appear at the precise point in the code at which the synchronization is required. The optional list argument is a comma-separated list of variables to be flushed; if the list is omitted, all variables are flushed.

The FLUSH directive is implied for the following directives: BARRIER, CRITICAL and END CRITICAL, END DO, END SECTIONS, END SINGLE, END PARALLEL, and ORDERED and END ORDERED.

The FLUSH is not implied at END DO, END SECTIONS, or END SINGLE if NOWAIT is present.
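
Example

As an illustrative sketch of explicit FLUSH use (adapted from the producer/consumer pattern in the OpenMP specification; this file is not among the copied examples), one thread writes a value and raises a flag while another spins until it sees the flag. Flushing on both sides ensures each thread observes a consistent view of the shared variables:

PROGRAM FLUSHEX
        IMPLICIT NONE
        INTEGER:: PAYLOAD, FLAG

        PAYLOAD = 0
        FLAG = 0

!$OMP PARALLEL SECTIONS SHARED(PAYLOAD, FLAG)
!$OMP SECTION
        PAYLOAD = 42            ! producer: write the data first,
!$OMP FLUSH(PAYLOAD)
        FLAG = 1                ! then raise the flag
!$OMP FLUSH(FLAG)
!$OMP SECTION
        DO                      ! consumer: spin until the flag is seen
!$OMP FLUSH(FLAG)
                IF (FLAG == 1) EXIT
        END DO
!$OMP FLUSH(PAYLOAD)
        PRINT *, "Consumer read PAYLOAD =", PAYLOAD
!$OMP END PARALLEL SECTIONS

END PROGRAM FLUSHEX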

ATOMIC Directive

!$OMP ATOMIC

The ATOMIC directive ensures that a specific memory location is updated atomically, rather than exposing it to the possibility of simultaneous writes by multiple threads.

Example

!Filename: density.f90
!
PROGRAM DENSITY
        IMPLICIT NONE

        INTEGER, PARAMETER:: NBINS=10
        INTEGER, PARAMETER:: NPARTICLES=100000
        REAL:: XMIN, XMAX, MAXMASS, MINMASS

        REAL, DIMENSION(NPARTICLES):: X_LOCATION, PARTICLE_MASS
        INTEGER, DIMENSION(NPARTICLES):: BIN
        REAL, DIMENSION(NBINS):: GRID_MASS, GRID_DENSITY
        INTEGER, DIMENSION(NBINS):: GRID_N

        REAL:: DX,DXINV,TOTAL_MASS,CHECK_MASS
        INTEGER:: I, CHECK_N, XMAX_LOC(1)

        GRID_MASS=0.0
        TOTAL_MASS=0.0
        GRID_N=0
        CHECK_MASS=0.0
        CHECK_N=0

! Initialize particle positions and masses
        CALL RANDOM_NUMBER(PARTICLE_MASS)
        CALL RANDOM_NUMBER(X_LOCATION)

        MAXMASS = MAXVAL(PARTICLE_MASS)
        MINMASS = MINVAL(PARTICLE_MASS)
        XMAX = MAXVAL(X_LOCATION)
        XMAX_LOC = MAXLOC(X_LOCATION)   ! index of the particle at XMAX
        XMIN = MINVAL(X_LOCATION)
        PRINT *, 'MINMASS =',MINMASS,' MAXMASS = ',MAXMASS
        PRINT *, 'XMIN =',XMIN,' XMAX = ',XMAX


! Grid Spacing (and inverse)
        DX = (XMAX-XMIN) / FLOAT(NBINS)
        DXINV = 1/DX

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I) REDUCTION(+:TOTAL_MASS)

!$OMP DO
        DO I = 1, NPARTICLES
                IF (I==XMAX_LOC(1)) THEN
                        BIN(I) = NBINS
                ELSE
                        BIN(I) = 1 + ( (X_LOCATION(I)-XMIN) * DXINV )
                END IF

                IF(BIN(I) < 1 .OR. BIN(I) > NBINS) THEN
!                  Off Grid!
                   PRINT *, 'ERROR: BIN =',BIN(I),' X =',X_LOCATION(I)  

                ELSE
!$OMP ATOMIC
                     GRID_MASS(BIN(I)) = GRID_MASS(BIN(I)) &
					+ PARTICLE_MASS(I)
!$OMP ATOMIC
                     GRID_N(BIN(I))    = GRID_N(BIN(I)) + 1

                     TOTAL_MASS = TOTAL_MASS + PARTICLE_MASS(I)
                END IF
        END DO
!$OMP END DO

!$OMP END PARALLEL

        DO I=1, NBINS
                GRID_DENSITY(I) = GRID_MASS(I) * DXINV
        END DO

        PRINT *, 'Total Particles =',NPARTICLES
        PRINT *, 'Total Mass =',TOTAL_MASS
        DO I=1,NBINS
                PRINT *, 'DENSITY(',I,' ) =',GRID_DENSITY(I),' &
			&MASS(',I,' ) =',GRID_MASS(I)
        END DO

! Check for consistency
        DO I=1,NBINS
                CHECK_MASS = CHECK_MASS + GRID_MASS(I)
                CHECK_N    = CHECK_N + GRID_N(I)
        END DO

        PRINT *, 'Particles on Grid =', CHECK_N
        PRINT *, 'Total Mass on Grid =', CHECK_MASS
END PROGRAM DENSITY

Without the ATOMIC directives, different threads could update the same grid mass bin at the same time, producing erroneous results. Compiling and running on Franklin:

> cat density.pbs
#PBS -N density
#PBS -j oe
#PBS -o density.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o density -mp=nonuma -Minfo=mp density.f90

export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./density
> qsub density.pbs
500899.nid00003
> cat density.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
density.f90:
density:
    40, Parallel region activated
    43, Parallel loop activated; static block iteration allocation
    55, Begin critical section
        End critical section
    57, Begin critical section
        End critical section
    62, Barrier
    64, Begin critical section
        End critical section
        Parallel region terminated
 MINMASS =   4.8334509E-06  MAXMASS =    0.9999951    
 XMIN =   4.3212090E-06  XMAX =    0.9999959    
 Total Particles =       100000
 Total Mass =    50035.98    
 DENSITY(            1  ) =    50715.23      MASS(            1  ) = 5071.481    
 DENSITY(            2  ) =    50151.22      MASS(            2  ) = 5015.081    
 DENSITY(            3  ) =    50154.58      MASS(            3  ) = 5015.417    
 DENSITY(            4  ) =    49183.93      MASS(            4  ) = 4918.352    
 DENSITY(            5  ) =    49083.88      MASS(            5  ) = 4908.347    
 DENSITY(            6  ) =    50813.93      MASS(            6  ) = 5081.351    
 DENSITY(            7  ) =    51367.36      MASS(            7  ) = 5136.694    
 DENSITY(            8  ) =    49535.90      MASS(            8  ) = 4953.549    
 DENSITY(            9  ) =    50296.39      MASS(            9  ) = 5029.598    
 DENSITY(           10  ) =    49062.40      MASS(           10  ) = 4906.199    
 Particles on Grid =       100000
 Total Mass on Grid =    50036.07    
Application 4736271 resources: utime 0, stime 0

Running a second time shows that floating-point addition of numbers spanning five orders of magnitude (MINMASS to MAXMASS) is not associative: the per-bin MASS values and the Total Mass on Grid differ slightly between runs because the threads accumulate contributions in a different order. You will not get precisely the same results each time, as you would if you had used a single thread.

 MINMASS =   4.8334509E-06  MAXMASS =    0.9999951    
 XMIN =   4.3212090E-06  XMAX =    0.9999959    
 Total Particles =       100000
 Total Mass =    50035.98    
 DENSITY(            1  ) =    50715.10      MASS(            1  ) = 5071.468    
 DENSITY(            2  ) =    50151.42      MASS(            2  ) = 5015.101    
 DENSITY(            3  ) =    50154.55      MASS(            3  ) = 5015.414    
 DENSITY(            4  ) =    49184.07      MASS(            4  ) = 4918.366    
 DENSITY(            5  ) =    49083.94      MASS(            5  ) = 4908.353    
 DENSITY(            6  ) =    50813.96      MASS(            6  ) = 5081.354    
 DENSITY(            7  ) =    51367.18      MASS(            7  ) = 5136.676    
 DENSITY(            8  ) =    49535.82      MASS(            8  ) = 4953.542    
 DENSITY(            9  ) =    50296.48      MASS(            9  ) = 5029.606    
 DENSITY(           10  ) =    49062.43      MASS(           10  ) = 4906.203    
 Particles on Grid =       100000
 Total Mass on Grid =    50036.08    
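
A minimal single-threaded illustration of this non-associativity, using hypothetical values:

PROGRAM ASSOC
        IMPLICIT NONE
        REAL:: A, B, C

        A = 1.0E8
        B = -1.0E8
        C = 1.0E-5

        PRINT *, (A+B)+C        ! prints approximately 1.0E-05
        PRINT *, A+(B+C)        ! prints 0.0; C is lost when added to B
END PROGRAM ASSOC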

As an illustration of how things go wrong without it, here is sample output with the ATOMIC directives removed from the code. Note that the particle conservation check fails:

 MINMASS =   4.8334509E-06  MAXMASS =    0.9999951    
 XMIN =   4.3212090E-06  XMAX =    0.9999959    
 Total Particles =       100000
 Total Mass =    50035.98    
 DENSITY(            1  ) =    45400.19      MASS(            1  ) = 4539.981    
 DENSITY(            2  ) =    44862.25      MASS(            2  ) = 4486.188    
 DENSITY(            3  ) =    45124.16      MASS(            3  ) = 4512.378    
 DENSITY(            4  ) =    44338.48      MASS(            4  ) = 4433.811    
 DENSITY(            5  ) =    44486.02      MASS(            5  ) = 4448.565    
 DENSITY(            6  ) =    45709.66      MASS(            6  ) = 4570.928    
 DENSITY(            7  ) =    46367.48      MASS(            7  ) = 4636.710    
 DENSITY(            8  ) =    44487.38      MASS(            8  ) = 4448.701    
 DENSITY(            9  ) =    45177.90      MASS(            9  ) = 4517.752    
 DENSITY(           10  ) =    44100.41      MASS(           10  ) = 4410.005    
 Particles on Grid =        90036
 Total Mass on Grid =    45005.02    
Application 4736632 resources: utime 0, stime 0

ORDERED Directive

The code enclosed with ORDERED and END ORDERED directives is executed in the order in which iterations would be executed in a sequential execution of the loop. The ORDERED directive can only appear in the context of a DO or PARALLEL DO directive. It is illegal to branch into or out of an ORDERED block.

!$OMP ORDERED              
	block                   
!$OMP END ORDERED          

Example

!Filename: ordered.f90
!

PROGRAM ORDERED
        IMPLICIT NONE

        INTEGER, PARAMETER:: N=1000, M=4000
        REAL, DIMENSION(N,M):: X,Y
        REAL, DIMENSION(N):: Z
        INTEGER I,J

        CALL RANDOM_NUMBER(X)
        CALL RANDOM_NUMBER(Y)
        Z=0.0

        PRINT *, 'The first 20 values of Z are:'

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I,J)

!$OMP DO SCHEDULE(DYNAMIC,2) ORDERED
        DO I=1,N
                DO J=1,M
                        Z(I) = Z(I) + X(I,J)*Y(J,I)
                END DO

!$OMP ORDERED
                IF(I<21) THEN
                        PRINT *, 'Z(',I,') =',Z(I)
                END IF
!$OMP END ORDERED

        END DO

!$OMP END DO

!$OMP END PARALLEL


END PROGRAM ORDERED

Compiling and running on Franklin:

> cat ordered.pbs
#PBS -N ordered
#PBS -j oe
#PBS -o ordered.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V

cd $PBS_O_WORKDIR

ftn -o ordered -mp=nonuma -Minfo=mp ordered.f90

export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./ordered
> qsub ordered.pbs
500980.nid00003
> cat ordered.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
ordered.f90:
ordered:
    15, Parallel region activated
    18, Parallel loop activated; dynamic iteration allocation
    31, Barrier
        Parallel region terminated
 The first 20 values of Z are:
 Z(            1 ) =    1028.067    
 Z(            2 ) =    1015.378    
 Z(            3 ) =    1010.786    
 Z(            4 ) =    1003.594    
 Z(            5 ) =    990.9749    
 Z(            6 ) =    982.2872    
 Z(            7 ) =    1021.567    
 Z(            8 ) =    1019.952    
 Z(            9 ) =    1011.410    
 Z(           10 ) =    986.6424    
 Z(           11 ) =    987.3596    
 Z(           12 ) =    992.1103    
 Z(           13 ) =    1001.674    
 Z(           14 ) =    1000.352    
 Z(           15 ) =    1021.354    
 Z(           16 ) =    1009.728    
 Z(           17 ) =    996.0969    
 Z(           18 ) =    1005.107    
 Z(           19 ) =    993.8898    
 Z(           20 ) =    981.4053    
Application 4736903 resources: utime 22, stime 3

Without the ORDERED directive, the PRINT statements execute in whatever order the threads reach them, and sample output looks like this:

 The first 20 values of Z are:
 Z(            3 ) =    1010.786    
 Z(            1 ) =    1028.067    
 Z(            2 ) =    1015.378    
 Z(            5 ) =    990.9749    
 Z(            6 ) =    982.2872    
 Z(            7 ) =    1021.567    
 Z(            8 ) =    1019.952    
 Z(            9 ) =    1011.410    
 Z(           10 ) =    986.6424    
 Z(           11 ) =    987.3596    
 Z(           12 ) =    992.1103    
 Z(           13 ) =    1001.674    
 Z(           14 ) =    1000.352    
 Z(           15 ) =    1021.354    
 Z(           16 ) =    1009.728    
 Z(           17 ) =    996.0969    
 Z(           18 ) =    1005.107    
 Z(           19 ) =    993.8898    
 Z(           20 ) =    981.4053    
 Z(            4 ) =    1003.594    
Application 4736940 resources: utime 24, stime 2
