From: Karl Feind
Date: Mon, 16 Nov 1998 12:05:51 -0600 (CST)
To: mpi-comm@XXXXXXXXXXX, kaf@XXXXXXX
Subject: MPI-2 thread safety and collectives
Cc: judith@XXXXXXX, gropp@XXXXXXXXXXX
X-UIDL: 16baa7c1cf170ce0d8814dbeeb7521ef

I'd like to get some opinions about interpreting an aspect of the MPI-2
thread safety specification which deals with collectives.

-----------------------------------------------------------------------------
MPI-2 Standard, Section 8.7.1, paragraphs 1-3:

Threads

In a thread-compliant implementation, an MPI process is a process that may
be multi-threaded.  Each thread can issue MPI calls; however, threads are
not separately addressable: a rank in a send or receive call identifies a
process, not a thread.  A message sent to a process can be received by any
thread in this process.

Rationale.  This model corresponds to the POSIX model of interprocess
communication: the fact that a process is multi-threaded, rather than
single-threaded, does not affect the external interface of this process.
MPI implementations where MPI `processes' are POSIX threads inside a single
POSIX process are not thread-compliant by this definition (indeed, their
``processes'' are single-threaded).  (End of rationale.)

Advice to users.  It is the user's responsibility to prevent races when
threads within the same application post conflicting communication calls.
The user can make sure that two threads in the same process will not issue
conflicting communication calls by using distinct communicators at each
thread.  (End of advice to users.)
-----------------------------------------------------------------------------

The basic question is simple: how do you define "conflicting communication
calls"?  There are two possible interpretations:

1) Any MPI collective call on the same communicator.

or

2) The same MPI collective call on the same communicator.

When I read the text of the standard, I tend to take interpretation #1, but
one MPI Forum member I talked to recalls the forum specifically taking
interpretation #2.  Hence I want to get more feedback to ensure that I'm
interpreting this correctly.

I think that choosing interpretation #1 would be very desirable because it
would permit some MPI collectives to be layered on top of other collectives
without introducing thread-safety problems.  As a common example, the ROMIO
MPI I/O software is layered on MPI collectives.  Interpretation #1 would
permit ROMIO to be layered and still not violate thread safety.

Consider scenario S1 below, which illustrates this thread-safety issue as
it affects the ROMIO MPI-2 I/O layered implementation.  The ROMIO function
MPI_File_open() calls MPI_Allreduce() using the communicator passed by the
user into MPI_File_open().  Suppose the user had another thread executing
an MPI_Allreduce() collective operation on the same communicator.  If a
race condition allows the threads to execute in different orders, we would
get an incorrect result:

Scenario S1
-----------

        |  Process 0                     |  Process 1                     |
        |  ---------                     |  ---------                     |
        |                                |                                |
  Time  |  thread A calls MPI_Allreduce  |  thread C calls MPI_Allreduce  |
    |   |    via MPI_File_open           |                                |
    |   |                                |                                |
    |   |  thread B calls MPI_Allreduce  |  thread D calls MPI_Allreduce  |
    |   |                                |    via MPI_File_open           |
    V   |                                |                                |

Under interpretation #1, you could say that the user needs to ensure that
all threads issuing any collective calls on the same communicator are
properly ordered.  With this interpretation, scenario S1 would be erroneous,
and the implementation wouldn't need to deal with matching up the collective
calls on multiple threads.
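To make S1 concrete, here is a minimal sketch (not part of the original
posting) of what the process-0 code might look like, assuming a
thread-compliant implementation that provides MPI_THREAD_MULTIPLE and POSIX
threads; the file name and reduction arguments are arbitrary placeholders:

    #include <mpi.h>
    #include <pthread.h>

    /* Thread A: opens a file; ROMIO's MPI_File_open() internally performs
     * an MPI_Allreduce() on the communicator passed in (MPI_COMM_WORLD). */
    static void *thread_A(void *arg)
    {
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "testfile",
                      MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
        MPI_File_close(&fh);
        return NULL;
    }

    /* Thread B: a user-level collective on the same communicator, with no
     * ordering relative to the collective hidden inside MPI_File_open(). */
    static void *thread_B(void *arg)
    {
        int in = 1, out = 0;
        MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int provided;
        pthread_t a, b;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        if (provided == MPI_THREAD_MULTIPLE) {
            pthread_create(&a, NULL, thread_A, NULL);
            pthread_create(&b, NULL, thread_B, NULL);
            pthread_join(a, NULL);
            pthread_join(b, NULL);
        }

        MPI_Finalize();
        return 0;
    }

On each process the two MPI_Allreduce() calls (one issued directly, one
issued inside MPI_File_open()) can reach the library in either order, so
under interpretation #1 this program is erroneous unless the user either
orders the two threads or, as the advice-to-users text suggests, gives the
explicit MPI_Allreduce() its own communicator (e.g. one obtained with
MPI_Comm_dup()).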
Notice that under both interpretations, a scenario like S2 should be
erroneous because the same collective function is called by the two threads.

Scenario S2
-----------

        |  Process 0                     |  Process 1                     |
        |  ---------                     |  ---------                     |
        |                                |                                |
  Time  |  thread A calls MPI_Allreduce  |  thread C calls MPI_Allreduce  |
    |   |    for MPI_MAX operation       |    for MPI_SUM operation       |
    |   |                                |                                |
    |   |  thread B calls MPI_Allreduce  |  thread D calls MPI_Allreduce  |
    |   |    for MPI_SUM operation       |    for MPI_MAX operation       |
    V   |                                |                                |

Thanks for any opinions or recollections of forum discussions on this matter.

Karl Feind                        E-Mail: kaf@XXXXXXX
Silicon Graphics                  Phone:  612/683-5673
655F Lone Oak Drive               Fax:    612/683-5276
Eagan, MN 55121