Introduction
This tutorial will take you through the OpenMP directives, starting with the most basic and useful directives.
Execution Model
A program that is written using OpenMP directives begins execution as a single process, called the master thread of execution. The master thread executes sequentially until the first parallel construct is encountered. The PARALLEL / END PARALLEL directive pair constitutes the parallel construct.
When a parallel construct is encountered, the master thread creates a team of threads, and the master becomes the master of the team. The program statements that are enclosed in a parallel construct, including routines called from within the construct, are executed in parallel by each thread in the team.
Upon completion of the parallel construct, the threads in the team synchronize and only the master thread continues execution. Any number of parallel constructs can be specified in a single program. As a result, a program may fork and join many times during execution.
The degree of parallelism of an OpenMP code depends on the code, the platform, the hardware configuration, the compiler, and the operating system. In no case are you guaranteed to have each thread running on a separate processor.
The Cray XT4 will only allow you to specify a number of threads less than or equal to the number of processor cores on a node. The franklin compute nodes all have four cores, so you can execute 1, 2, 3, or 4 threads on them.
On franklin, OpenMP codes may be run only on compute nodes, submitted via a PBS script.
Examples
Examples using OpenMP can be copied to your $HOME/openmp_examples directory on franklin by using:
% cd $HOME
% mkdir openmp_examples
% module load training
% cp $EXAMPLES/OpenMP/tutorial/* openmp_examples
OpenMP Directive Syntax
OpenMP directives are inserted directly into source code. Free-form Fortran source code directives begin with the sentinel !$OMP. Fixed-form Fortran source code directives begin with the sentinels !$OMP, C$OMP, or *$OMP. Sentinels must start in column one. Continuation lines are permitted using the same format as the Fortran source code format you are using (free or fixed).
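For instance, a directive too long for one line can be continued as sketched below (an illustrative fragment, not one of the tutorial examples): in free form the line ends with an ampersand and the next line repeats the sentinel; in fixed form the continuation line carries a non-blank, non-zero character in column six.

```fortran
! Free-form continuation: trailing & plus a repeated sentinel
!$OMP PARALLEL DEFAULT(PRIVATE) REDUCTION(+:I) &
!$OMP REDUCTION(*:J)

! Fixed-form continuation: non-blank character in column 6
C$OMP PARALLEL DEFAULT(PRIVATE) REDUCTION(+:I)
C$OMP+ REDUCTION(*:J)
```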
Following are descriptions of the basic use of OpenMP directives with examples.
PARALLEL Directive
A Parallel Region is a block of code that is to be executed in parallel by a number of threads. Each thread executes the enclosed code separately.
Note that all code within a parallel region is executed by each thread unless other OpenMP directives specify otherwise. For instance, a DO loop that lies within a parallel region will be executed completely (and redundantly) by each thread unless a parallel DO directive is inserted before the loop. A DO or PARALLEL DO directive is necessary if you want the loop to be executed once with different threads performing different iterations of the loop in parallel.
It is illegal to branch out of a Parallel Region.
!$OMP PARALLEL [clause[[,] clause ...]]
   code block
!$OMP END PARALLEL
The optional clause can be one of the following:

- IF (scalar_logical_expression)
- PRIVATE (list)
- SHARED (list)
- DEFAULT (PRIVATE | SHARED | NONE)
- FIRSTPRIVATE (list)
- REDUCTION ({operator | intrinsic}: list)
- COPYIN (list)
- NUM_THREADS (scalar_integer_expression)
Examples
!Filename: parallel.f90
!
!This simply shows that code in a PARALLEL
!region is executed by each thread.
PROGRAM PARALLEL
   IMPLICIT NONE
   INTEGER I
   I=1
!$OMP PARALLEL FIRSTPRIVATE(I)
   PRINT *, I
!$OMP END PARALLEL
END PROGRAM PARALLEL
To run on franklin:
> cat parallel.pbs
#PBS -N parallel
#PBS -j oe
#PBS -o parallel.out
#PBS -q interactive
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=4
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o parallel -mp=nonuma -Minfo=mp parallel.f90
export OMP_NUM_THREADS=4
aprun -n 1 -N 1 -d 4 ./parallel

> qsub parallel.pbs
498022.nid00003
> cat parallel.out
/opt/xt-asyncpe/1.0c/bin/ftn: INFO: linux target is being used
parallel:
     12, Parallel region activated
     14, Parallel region terminated
            1
            1
            1
            1
Application 277470 resources: utime 0, stime 0
The next example shows the use of the REDUCTION clause. It illustrates how each thread's private copy of a reduction variable is combined into a single value when the threads leave the parallel region. Also note that each thread executes the PRINT statement in the parallel region.
!Filename: reduction.f90
!
!This program shows the use of the REDUCTION clause.
PROGRAM REDUCTION
   IMPLICIT NONE
   INTEGER tnumber, OMP_GET_THREAD_NUM
   INTEGER I,J,K
   I=1
   J=1
   K=1
   PRINT *, "Before Par Region: I=",I," J=", J," K=",K
   PRINT *, ""
!$OMP PARALLEL DEFAULT(PRIVATE) REDUCTION(+:I) &
!$OMP REDUCTION(*:J) REDUCTION(MAX:K)
   tnumber=OMP_GET_THREAD_NUM()
   I = tnumber
   J = tnumber
   K = tnumber
   PRINT *, "Thread ",tnumber, " I=",I," J=", J," K=",K
!$OMP END PARALLEL
   PRINT *, ""
   PRINT *, "Operator          +      *     MAX"
   PRINT *, "After Par Region: I=",I," J=", J," K=",K
END PROGRAM REDUCTION
To run on franklin:
> cat reduction.pbs
#PBS -N reduction
#PBS -j oe
#PBS -o reduction.out
#PBS -q interactive
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=4
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o reduction -mp=nonuma -Minfo=mp reduction.f90
export OMP_NUM_THREADS=4
aprun -n 1 -N 1 -d 4 ./reduction

> qsub reduction.pbs
498041.nid00003
> cat reduction.out
/opt/cray/xt-asyncpe/2.0/bin/ftn: INFO: linux target is being used
reduction:
     15, Parallel region activated
     25, Begin critical section
         End critical section
         Parallel region terminated
 Before Parallel Region: I= 1  J= 1  K= 1

 Thread  0  I= 0  J= 0  K= 0
 Thread  2  I= 2  J= 2  K= 2
 Thread  3  I= 3  J= 3  K= 3
 Thread  1  I= 1  J= 1  K= 1

 Operator          +      *     MAX
 After Parallel Region: I= 7  J= 0  K= 3
Application 2015449 resources: utime 0, stime 0
DO Directive
The DO directive specifies that the iterations of the immediately following DO loop must be executed in parallel. The DO directive must be enclosed in a parallel region; it creates no threads by itself. The DO loop that follows the directive cannot be a DO WHILE loop.
!$OMP DO [clause[[,] clause ...]]
   do_loop
!$OMP END DO [NOWAIT]
The clause can be one of the following:

- PRIVATE (list)
- FIRSTPRIVATE (list)
- LASTPRIVATE (list)
- REDUCTION ({operator | intrinsic}: list)
- SCHEDULE (type[, chunk])
- ORDERED
It is illegal to branch out of a DO loop associated with the DO directive.
Example
!Filename: dodir.f90
!
PROGRAM DODIR
   IMPLICIT NONE
   INTEGER I,L
   INTEGER, PARAMETER:: DIM=16
   REAL A(DIM),B(DIM),S
   INTEGER nthreads,tnumber
   INTEGER OMP_GET_NUM_THREADS,OMP_GET_THREAD_NUM
   CALL RANDOM_NUMBER(A)
   CALL RANDOM_NUMBER(B)
!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(A,B)
!$OMP DO SCHEDULE(STATIC,4)
   DO I=2,DIM
      B(I) = ( A(I) - A(I-1) ) / 2.0
      nthreads=OMP_GET_NUM_THREADS()
      tnumber=OMP_GET_THREAD_NUM()
      PRINT *, "Thread",tnumber," of",nthreads," has I=",I
   END DO
!$OMP END DO
!$OMP END PARALLEL
   S=MAXVAL(B)
   L=MAXLOC(B,1)
   PRINT *, "Maximum gradient: ",S," at location:",L
END PROGRAM DODIR
Compiling and running on franklin:
> cat dodir.pbs
#PBS -N dodir
#PBS -j oe
#PBS -o dodir.out
#PBS -q interactive
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=4
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o dodir -mp=nonuma -Minfo=mp dodir.f90
export OMP_NUM_THREADS=4
aprun -n 1 -N 1 -d 4 ./dodir

> qsub dodir.pbs
500611.nid00003
> cat dodir.out
/opt/cray/xt-asyncpe/2.0/bin/ftn: INFO: linux target is being used
dodir:
     15, Parallel region activated
     17, Parallel loop activated with static block-cyclic schedule
     24, Barrier
         Parallel region terminated
 Thread 0  of 4  has I= 2
 Thread 0  of 4  has I= 3
 Thread 0  of 4  has I= 4
 Thread 0  of 4  has I= 5
 Thread 3  of 4  has I= 14
 Thread 3  of 4  has I= 15
 Thread 3  of 4  has I= 16
 Thread 1  of 4  has I= 6
 Thread 1  of 4  has I= 7
 Thread 1  of 4  has I= 8
 Thread 1  of 4  has I= 9
 Thread 2  of 4  has I= 10
 Thread 2  of 4  has I= 11
 Thread 2  of 4  has I= 12
 Thread 2  of 4  has I= 13
 Maximum gradient: 0.6280164  at location: 1
Application 2016137 resources: utime 0, stime 0
Notice that the loop iterations were divided among the four threads in chunks of four, as we requested with the SCHEDULE(STATIC,4) clause. Also note that if we had not enclosed the DO loop in a DO/END DO directive pair, the loop would not have been split; instead it would have been executed redundantly, OMP_NUM_THREADS times.
PARALLEL DO Directive
The PARALLEL DO directive provides a shortcut form for specifying a parallel region that contains a single DO directive. The semantics are identical to specifying a PARALLEL directive followed by a DO directive.
!$OMP PARALLEL DO [clause[[,] clause ...]]
   do_loop
!$OMP END PARALLEL DO
The clause can be any of those associated with the PARALLEL and DO directives described above.
Example
!Filename: pardo.f90
!
PROGRAM PARDO
   IMPLICIT NONE
   INTEGER I,J
   INTEGER, PARAMETER:: DIM1=10000, DIM2=200
   REAL A(DIM1),B(DIM2,DIM1),C(DIM2,DIM1)
   REAL before, after, elapsed,S
   INTEGER nthreads,OMP_GET_NUM_THREADS
   CALL RANDOM_NUMBER(A)
   CALL cpu_time(before)
!$OMP PARALLEL DO SCHEDULE(RUNTIME) PRIVATE(I,J) SHARED(A,B,C,nthreads)
   DO J=1,DIM2
      nthreads = OMP_GET_NUM_THREADS()
      DO I=2, DIM1
         B(J,I) = ( (A(I)+A(I-1))/2.0 ) / SQRT(A(I))
         C(J,I) = SQRT( A(I)*2 ) / ( A(I)-(A(I)/2.0) )
         B(J,I) = C(J,I) * ( B(J,I)**2 ) * SIN(A(I))
      END DO
   END DO
!$OMP END PARALLEL DO
   CALL cpu_time(after)
   !Elapsed time in seconds
   elapsed = after-before
   S=MAXVAL(B)
   WRITE(6,'("Maximum of B=",1pe8.2," found in ",1pe8.2," &
        &seconds using", I2," threads")') S,elapsed,nthreads
END PROGRAM PARDO
Compiling and running on franklin:
> cat pardo.pbs
#PBS -N pardo
#PBS -j oe
#PBS -o pardo.out
#PBS -q interactive
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=4
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o pardo -mp=nonuma -Minfo=mp pardo.f90
export OMP_NUM_THREADS=4
aprun -n 1 -N 1 -d 4 ./pardo

> qsub pardo.pbs
500719.nid00003
> cat pardo.out
/opt/cray/xt-asyncpe/2.0/bin/ftn: INFO: linux target is being used
pardo:
     16, Parallel region activated
     17, Parallel loop activated with runtime schedule
     24, Parallel region terminated
Maximum of B=2.92E+01 found in 7.68E-02 seconds using  4 threads
Application 2016575 resources: utime 0, stime 0
SECTIONS Directive
The SECTIONS directive specifies that the code in the enclosed SECTION blocks is to be divided among the threads in the team. Each section is executed once.
!$OMP SECTIONS [clause[[,] clause ...]]
[!$OMP SECTION]
   code block
[!$OMP SECTION
   code block]
...
!$OMP END SECTIONS [NOWAIT]
The clause can be one of the following:
- PRIVATE (list)
- FIRSTPRIVATE (list)
- LASTPRIVATE (list)
- REDUCTION ({operator | intrinsic}: list)
- See operator and intrinsic list for Parallel Regions.
Example
!Filename: sections.f90
!
!This shows code that is executed
!in sections.
PROGRAM SECTIONS
   IMPLICIT NONE
   INTEGER OMP_GET_THREAD_NUM, tnumber
!$OMP PARALLEL
!$OMP SECTIONS PRIVATE(tnumber)
!$OMP SECTION
   tnumber=OMP_GET_THREAD_NUM()
   PRINT *,"This is section 1 being executed by thread",tnumber
!$OMP SECTION
   tnumber=OMP_GET_THREAD_NUM()
   PRINT *,"This is section 2 being executed by thread",tnumber
!$OMP SECTION
   tnumber=OMP_GET_THREAD_NUM()
   PRINT *,"This is section 3 being executed by thread",tnumber
!$OMP SECTION
   tnumber=OMP_GET_THREAD_NUM()
   PRINT *,"This is section 4 being executed by thread",tnumber
!$OMP END SECTIONS
!$OMP END PARALLEL
END PROGRAM SECTIONS
Compiling and running on franklin:
> cat sections.pbs
#PBS -N sections
#PBS -j oe
#PBS -o sections.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o sections -mp=nonuma -Minfo=mp sections.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./sections

> qsub sections.pbs
500738.nid00003
> cat sections.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
sections.f90:
sections:
     11, Parallel region activated
     12, Begin sections
     17, New section
     20, New section
     23, New section
     26, End sections
         Parallel region terminated
 This is section 2 being executed by thread 1
 This is section 4 being executed by thread 1
 This is section 1 being executed by thread 0
 This is section 3 being executed by thread 0
Application 4734431 resources: utime 0, stime 0
SINGLE Directive
The SINGLE directive specifies that the enclosed code is to be executed by only one thread in the team. Threads that are not executing in the SINGLE directive wait at the END SINGLE directive unless NOWAIT is specified. It is illegal to branch out of a SINGLE block.
!$OMP SINGLE [clause[[,] clause ...]]
   block
!$OMP END SINGLE [NOWAIT]
The clause can be one of the following:
- PRIVATE (list)
- FIRSTPRIVATE (list)
Example
!Filename: single.f90
!
!This shows use of the SINGLE directive.
!
PROGRAM SINGLE
   IMPLICIT NONE
   INTEGER, PARAMETER:: N=12
   REAL, DIMENSION(N):: A,B,C,D
   INTEGER:: I
   REAL:: SUMMED
!$OMP PARALLEL SHARED(A,B,C,D) PRIVATE(I)
!***** Reading files fort.10, fort.11, fort.12 in parallel
!$OMP SECTIONS
!$OMP SECTION
   READ(10,*) (A(I),I=1,N)
!$OMP SECTION
   READ(11,*) (B(I),I=1,N)
!$OMP SECTION
   READ(12,*) (C(I),I=1,N)
!$OMP END SECTIONS
!$OMP SINGLE
   SUMMED = SUM(A) + SUM(B) + SUM(C)
   PRINT *, "Sum of A+B+C=",SUMMED
!$OMP END SINGLE
!$OMP DO SCHEDULE(STATIC,4)
   DO I=1,N
      D(I) = A(I) + B(I)*C(I)
   END DO
!$OMP END DO
!$OMP END PARALLEL
   PRINT *, "The values of D are", D
END PROGRAM SINGLE
Compiling and running on franklin. The files fort.10, fort.11, and fort.12 each contain twelve values of 1.0:
> cat single.pbs
#PBS -N single
#PBS -j oe
#PBS -o single.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o single -mp=nonuma -Minfo=mp single.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./single

> qsub single.pbs
500801.nid00003
> cat single.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
single.f90:
single:
     13, Parallel region activated
     17, Begin sections
     20, New section
     22, New section
     24, End sections
     26, Begin single section
     29, End single section
         Barrier
     32, Parallel loop activated; static block-cyclic iteration allocation
     35, Barrier
         Parallel region terminated
 Sum of A+B+C= 36.00000
 The values of D are 2.000000 2.000000 2.000000 2.000000
 2.000000 2.000000 2.000000 2.000000 2.000000 2.000000
 2.000000 2.000000
Application 4735339 resources: utime 0, stime 0
MASTER Directive
The code enclosed with MASTER and END MASTER directives is executed only by the master thread of the team. There is no implied barrier either on entry to or exit from the MASTER section. Branching out of a MASTER block is illegal.
!$OMP MASTER
   block
!$OMP END MASTER
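MASTER is typically used for I/O or setup work that only one thread should perform, as in the BARRIER example later in this tutorial. A minimal sketch (not part of the examples directory; the file name is hypothetical):

```fortran
!Hypothetical file: masterdemo.f90
PROGRAM MASTERDEMO
   IMPLICIT NONE
   INTEGER :: tnumber, OMP_GET_THREAD_NUM
!$OMP PARALLEL PRIVATE(tnumber)
   tnumber = OMP_GET_THREAD_NUM()
!$OMP MASTER
   ! Executed by the master thread (thread 0) only; the other
   ! threads skip this block and continue without waiting.
   PRINT *, 'Master thread', tnumber, 'doing setup'
!$OMP END MASTER
!$OMP END PARALLEL
END PROGRAM MASTERDEMO
```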
BARRIER Directive
!$OMP BARRIER
The BARRIER directive synchronizes all threads in a team. When encountered, each thread waits until all of the other threads in that team have reached this point.
Example
!Filename: barrier.f90
!
!This shows use of the BARRIER directive.
!
PROGRAM ABARRIER
   IMPLICIT NONE
   INTEGER:: L
   INTEGER:: nthreads, OMP_GET_NUM_THREADS
   INTEGER:: tnumber, OMP_GET_THREAD_NUM
!$OMP PARALLEL SHARED(L) PRIVATE(nthreads,tnumber)
   nthreads = OMP_GET_NUM_THREADS()
   tnumber = OMP_GET_THREAD_NUM()
!$OMP MASTER
   PRINT *, ' Enter a value for L'
   READ(5,*) L
!$OMP END MASTER
!$OMP BARRIER
!$OMP CRITICAL
   PRINT *, ' My thread number =',tnumber
   PRINT *, ' Number of threads =',nthreads
   PRINT *, ' Value of L =',L
   PRINT *, ''
!$OMP END CRITICAL
!$OMP END PARALLEL
END PROGRAM ABARRIER
Compiling and running on franklin:
> cat barrier.pbs
#PBS -N barrier
#PBS -j oe
#PBS -o barrier.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o barrier -mp=nonuma -Minfo=mp barrier.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./barrier < ninety

> cat ninety
90
> qsub barrier.pbs
500868.nid00003
> cat barrier.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
barrier.f90:
barrier:
     12, Parallel region activated
     16, Begin master section
     21, End master section
     23, Barrier
     27, Begin critical section __cs_unspc
     30, End critical section __cs_unspc
         Parallel region terminated
  Enter a value for L
  My thread number = 0
  Number of threads = 2
  Value of L = 90

  My thread number = 1
  Number of threads = 2
  Value of L = 90

Application 4736056 resources: utime 0, stime 0
FLUSH Directive
!$OMP FLUSH(list)
The FLUSH directive identifies synchronization points at which the implementation is required to provide a consistent view of memory. The directive must appear at the precise point in the code at which the synchronization is required. The optional list argument is a comma-separated list of the variables to be flushed; if it is omitted, all variables are flushed.
The FLUSH directive is implied for the following directives:
- BARRIER
- CRITICAL and END CRITICAL
- END DO
- END PARALLEL
- END SECTIONS
- END SINGLE
- ORDERED and END ORDERED
The directive is not implied if NOWAIT is present.
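A common use of FLUSH is hand-coded signaling between threads through shared variables. The sketch below (not from the examples directory; the file name is hypothetical) has one thread produce a value and raise a flag, while the other spins on the flag; each FLUSH names only the variables that must be made consistent:

```fortran
!Hypothetical file: flushdemo.f90
PROGRAM FLUSHDEMO
   IMPLICIT NONE
   INTEGER :: VAL, FLAG
   INTEGER :: OMP_GET_THREAD_NUM
   VAL = 0
   FLAG = 0
!$OMP PARALLEL NUM_THREADS(2) SHARED(VAL,FLAG)
   IF (OMP_GET_THREAD_NUM() == 0) THEN
      VAL = 42
!$OMP FLUSH(VAL)
      ! VAL must be visible before the flag is raised
      FLAG = 1
!$OMP FLUSH(FLAG)
   ELSE
      DO
!$OMP FLUSH(FLAG)
         IF (FLAG == 1) EXIT
      END DO
!$OMP FLUSH(VAL)
      PRINT *, 'Consumer sees VAL =', VAL
   END IF
!$OMP END PARALLEL
END PROGRAM FLUSHDEMO
```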
ATOMIC Directive
!$OMP ATOMIC
The ATOMIC directive ensures that a specific memory location is to be updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.
Example
!Filename: density.f90
!
PROGRAM DENSITY
   IMPLICIT NONE
   INTEGER, PARAMETER:: NBINS=10
   INTEGER, PARAMETER:: NPARTICLES=100000
   REAL:: XMIN, XMAX, MAXMASS, MINMASS
   REAL, DIMENSION(NPARTICLES):: X_LOCATION, PARTICLE_MASS
   INTEGER, DIMENSION(NPARTICLES):: BIN
   REAL, DIMENSION(NBINS):: GRID_MASS, GRID_DENSITY
   INTEGER, DIMENSION(NBINS):: GRID_N
   REAL:: DX,DXINV,TOTAL_MASS,CHECK_MASS
   INTEGER:: I, CHECK_N, XMAX_LOC(1)

   GRID_MASS=0.0
   TOTAL_MASS=0.0
   GRID_N=0
   CHECK_MASS=0.0
   CHECK_N=0

! Initialize particle positions and masses
   CALL RANDOM_NUMBER(PARTICLE_MASS)
   CALL RANDOM_NUMBER(X_LOCATION)
   MAXMASS = MAXVAL(PARTICLE_MASS)
   MINMASS = MINVAL(PARTICLE_MASS)
   XMAX = MAXVAL(X_LOCATION)
   XMIN = MINVAL(X_LOCATION)
   XMAX_LOC = MAXLOC(X_LOCATION)   ! used below; missing in the original listing
   PRINT *, 'MINMASS =',MINMASS,' MAXMASS = ',MAXMASS
   PRINT *, 'XMIN =',XMIN,' XMAX = ',XMAX

! Grid Spacing (and inverse)
   DX = (XMAX-XMIN) / FLOAT(NBINS)
   DXINV = 1/DX

!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I) REDUCTION(+:TOTAL_MASS)
!$OMP DO
   DO I = 1, NPARTICLES
      IF (I==XMAX_LOC(1)) THEN
         BIN(I) = NBINS
      ELSE
         BIN(I) = 1 + ( (X_LOCATION(I)-XMIN) * DXINV )
      END IF
      IF(BIN(I) < 1 .OR. BIN(I) > NBINS) THEN
         ! Off Grid!
         PRINT *, 'ERROR: BIN =',BIN(I),' X =',X_LOCATION(I)
      ELSE
!$OMP ATOMIC
         GRID_MASS(BIN(I)) = GRID_MASS(BIN(I)) + PARTICLE_MASS(I)
!$OMP ATOMIC
         GRID_N(BIN(I)) = GRID_N(BIN(I)) + 1
         TOTAL_MASS = TOTAL_MASS + PARTICLE_MASS(I)
      END IF
   END DO
!$OMP END DO
!$OMP END PARALLEL

   DO I=1, NBINS
      GRID_DENSITY(I) = GRID_MASS(I) * DXINV
   END DO
   PRINT *, 'Total Particles =',NPARTICLES
   PRINT *, 'Total Mass =',TOTAL_MASS
   DO I=1,NBINS
      PRINT *, 'DENSITY(',I,' ) =',GRID_DENSITY(I),' &
           &MASS(',I,' ) =',GRID_MASS(I)
   END DO
! Check for consistency
   DO I=1,NBINS
      CHECK_MASS = CHECK_MASS + GRID_MASS(I)
      CHECK_N = CHECK_N + GRID_N(I)
   END DO
   PRINT *, 'Particles on Grid =', CHECK_N
   PRINT *, 'Total Mass on Grid =', CHECK_MASS
END PROGRAM DENSITY
Without the ATOMIC directives, different threads would try to update the grid mass bins at the same time, causing erroneous results. Compiling and running on franklin:
> cat density.pbs
#PBS -N density
#PBS -j oe
#PBS -o density.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o density -mp=nonuma -Minfo=mp density.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./density

> qsub density.pbs
500899.nid00003
> cat density.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
density.f90:
density:
     40, Parallel region activated
     43, Parallel loop activated; static block iteration allocation
     55, Begin critical section
         End critical section
     57, Begin critical section
         End critical section
     62, Barrier
     64, Begin critical section
         End critical section
         Parallel region terminated
 MINMASS = 4.8334509E-06  MAXMASS =  0.9999951
 XMIN = 4.3212090E-06  XMAX =  0.9999959
 Total Particles = 100000
 Total Mass = 50035.98
 DENSITY( 1 ) = 50715.23  MASS( 1 ) = 5071.481
 DENSITY( 2 ) = 50151.22  MASS( 2 ) = 5015.081
 DENSITY( 3 ) = 50154.58  MASS( 3 ) = 5015.417
 DENSITY( 4 ) = 49183.93  MASS( 4 ) = 4918.352
 DENSITY( 5 ) = 49083.88  MASS( 5 ) = 4908.347
 DENSITY( 6 ) = 50813.93  MASS( 6 ) = 5081.351
 DENSITY( 7 ) = 51367.36  MASS( 7 ) = 5136.694
 DENSITY( 8 ) = 49535.90  MASS( 8 ) = 4953.549
 DENSITY( 9 ) = 50296.39  MASS( 9 ) = 5029.598
 DENSITY( 10 ) = 49062.40  MASS( 10 ) = 4906.199
 Particles on Grid = 100000
 Total Mass on Grid = 50036.07
Application 4736271 resources: utime 0, stime 0
Running a second time reveals that floating-point additions of numbers ranging over five orders of magnitude (MINMASS to MAXMASS) are not associative; note the values of MASS and Total Mass. You will not get precisely the same results each time, as you would if you had used a single thread.
 MINMASS = 4.8334509E-06  MAXMASS =  0.9999951
 XMIN = 4.3212090E-06  XMAX =  0.9999959
 Total Particles = 100000
 Total Mass = 50035.98
 DENSITY( 1 ) = 50715.10  MASS( 1 ) = 5071.468
 DENSITY( 2 ) = 50151.42  MASS( 2 ) = 5015.101
 DENSITY( 3 ) = 50154.55  MASS( 3 ) = 5015.414
 DENSITY( 4 ) = 49184.07  MASS( 4 ) = 4918.366
 DENSITY( 5 ) = 49083.94  MASS( 5 ) = 4908.353
 DENSITY( 6 ) = 50813.96  MASS( 6 ) = 5081.354
 DENSITY( 7 ) = 51367.18  MASS( 7 ) = 5136.676
 DENSITY( 8 ) = 49535.82  MASS( 8 ) = 4953.542
 DENSITY( 9 ) = 50296.48  MASS( 9 ) = 5029.606
 DENSITY( 10 ) = 49062.43  MASS( 10 ) = 4906.203
 Particles on Grid = 100000
 Total Mass on Grid = 50036.08
As an illustration of how things go wrong, here is sample output with the ATOMIC directives removed from the code. Note that the particle conservation check fails:
 MINMASS = 4.8334509E-06  MAXMASS =  0.9999951
 XMIN = 4.3212090E-06  XMAX =  0.9999959
 Total Particles = 100000
 Total Mass = 50035.98
 DENSITY( 1 ) = 45400.19  MASS( 1 ) = 4539.981
 DENSITY( 2 ) = 44862.25  MASS( 2 ) = 4486.188
 DENSITY( 3 ) = 45124.16  MASS( 3 ) = 4512.378
 DENSITY( 4 ) = 44338.48  MASS( 4 ) = 4433.811
 DENSITY( 5 ) = 44486.02  MASS( 5 ) = 4448.565
 DENSITY( 6 ) = 45709.66  MASS( 6 ) = 4570.928
 DENSITY( 7 ) = 46367.48  MASS( 7 ) = 4636.710
 DENSITY( 8 ) = 44487.38  MASS( 8 ) = 4448.701
 DENSITY( 9 ) = 45177.90  MASS( 9 ) = 4517.752
 DENSITY( 10 ) = 44100.41  MASS( 10 ) = 4410.005
 Particles on Grid = 90036
 Total Mass on Grid = 45005.02
Application 4736632 resources: utime 0, stime 0
ORDERED Directive
The code enclosed with ORDERED and END ORDERED directives is executed in the order in which iterations would be executed in a sequential execution of the loop. The ORDERED directive can only appear in the context of a DO or PARALLEL DO directive. It is illegal to branch into or out of an ORDERED block.
!$OMP ORDERED
   block
!$OMP END ORDERED
Example
!Filename: ordered.f90
!
PROGRAM ORDERED
   IMPLICIT NONE
   INTEGER, PARAMETER:: N=1000, M=4000
   REAL, DIMENSION(N,M):: X
   REAL, DIMENSION(M,N):: Y   ! dimensioned (M,N) so that Y(J,I) below is in bounds
   REAL, DIMENSION(N):: Z
   INTEGER I,J
   CALL RANDOM_NUMBER(X)
   CALL RANDOM_NUMBER(Y)
   Z=0.0
   PRINT *, 'The first 20 values of Z are:'
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(I,J)
!$OMP DO SCHEDULE(DYNAMIC,2) ORDERED
   DO I=1,N
      DO J=1,M
         Z(I) = Z(I) + X(I,J)*Y(J,I)
      END DO
!$OMP ORDERED
      IF(I<21) THEN
         PRINT *, 'Z(',I,') =',Z(I)
      END IF
!$OMP END ORDERED
   END DO
!$OMP END DO
!$OMP END PARALLEL
END PROGRAM ORDERED
Compiling and running on franklin:
> cat ordered.pbs
#PBS -N ordered
#PBS -j oe
#PBS -o ordered.out
#PBS -q interactive
#PBS -S /bin/bash
#PBS -l mppwidth=1
#PBS -l mppnppn=1
#PBS -l mppdepth=2
#PBS -l walltime=00:05:00
#PBS -V
cd $PBS_O_WORKDIR
ftn -o ordered -mp=nonuma -Minfo=mp ordered.f90
export OMP_NUM_THREADS=2
aprun -n 1 -N 1 ./ordered

> qsub ordered.pbs
500980.nid00003
> cat ordered.out
/opt/xt-pe/2.0.44a2/bin/snos64/ftn: INFO: linux target is being used
ordered.f90:
ordered:
     15, Parallel region activated
     18, Parallel loop activated; dynamic iteration allocation
     31, Barrier
         Parallel region terminated
 The first 20 values of Z are:
 Z( 1 ) = 1028.067
 Z( 2 ) = 1015.378
 Z( 3 ) = 1010.786
 Z( 4 ) = 1003.594
 Z( 5 ) = 990.9749
 Z( 6 ) = 982.2872
 Z( 7 ) = 1021.567
 Z( 8 ) = 1019.952
 Z( 9 ) = 1011.410
 Z( 10 ) = 986.6424
 Z( 11 ) = 987.3596
 Z( 12 ) = 992.1103
 Z( 13 ) = 1001.674
 Z( 14 ) = 1000.352
 Z( 15 ) = 1021.354
 Z( 16 ) = 1009.728
 Z( 17 ) = 996.0969
 Z( 18 ) = 1005.107
 Z( 19 ) = 993.8898
 Z( 20 ) = 981.4053
Application 4736903 resources: utime 22, stime 3
Without the ordered directive, sample output looks like this:
 The first 20 values of Z are:
 Z( 3 ) = 1010.786
 Z( 1 ) = 1028.067
 Z( 2 ) = 1015.378
 Z( 5 ) = 990.9749
 Z( 6 ) = 982.2872
 Z( 7 ) = 1021.567
 Z( 8 ) = 1019.952
 Z( 9 ) = 1011.410
 Z( 10 ) = 986.6424
 Z( 11 ) = 987.3596
 Z( 12 ) = 992.1103
 Z( 13 ) = 1001.674
 Z( 14 ) = 1000.352
 Z( 15 ) = 1021.354
 Z( 16 ) = 1009.728
 Z( 17 ) = 996.0969
 Z( 18 ) = 1005.107
 Z( 19 ) = 993.8898
 Z( 20 ) = 981.4053
 Z( 4 ) = 1003.594
Application 4736940 resources: utime 24, stime 2
Page last modified: Fri, 21 May 2004 22:21:36 GMT
Page URL: http://www.nersc.gov/nusers/help/tutorials/openmp/print.php