Date: Mon, 16 Nov 1998 14:01:32 -0600 (CST)
From: Tony Skjellum
To: Karl Feind
cc: mpi-comm@XXXXXXXXXXX, judith@XXXXXXX, gropp@XXXXXXXXXXX
Subject: Re: MPI-2 thread safety and collectives
In-Reply-To: <199811161805.MAA09295@XXXXXXXXXXXXXXXX>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Karl,

First of all, we discussed for a long time the position that if one
serializes the sequence of collective operations in a set of threads
(same communicator), and if this sequence is the same across the
communicator (independent of the number of threads in each process),
then this is prima facie legal.  If this is not what was finally
agreed on, then interpretation #1 is the right one in my opinion.

Common implementations allow collectives to be insulated from each
other by the semantics of the operations (with a backup being that
each collective has its own tag, though that is generally unneeded).
That would suggest your interpretation #2, but it depends in general
on the strategy, used in MPICH and other implementations, of using a
collective tag per operation.

In fact, for most collective operations the effective total order is
enough to gain separation (i.e., to avoid race conditions) between
multiple invocations of the same operation, or of different
operations, without the tag; so anything that implicates this tag as
a condition of thread safety is not desirable IMHO.
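For concreteness, here is a minimal sketch of the "distinct
communicators" advice quoted from Section 8.7.1 below.  The pthread
scaffolding and the per-thread MPI_Comm_dup are my illustration, not
text from the standard, and the sketch assumes an implementation that
actually provides MPI_THREAD_MULTIPLE:

  /* Sketch only: each thread posts collectives on its own duplicate
   * of MPI_COMM_WORLD, so the calls cannot conflict under either
   * interpretation #1 or #2.  Error handling omitted. */
  #include <mpi.h>
  #include <pthread.h>

  static void *reduce_thread(void *arg)
  {
      MPI_Comm comm = *(MPI_Comm *) arg;  /* this thread's private dup */
      int in = 1, out;

      /* Safe: no other thread issues collectives on this communicator. */
      MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, comm);
      return NULL;
  }

  int main(int argc, char **argv)
  {
      int provided;
      MPI_Comm comm_a, comm_b;
      pthread_t a, b;

      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      if (provided < MPI_THREAD_MULTIPLE)
          MPI_Abort(MPI_COMM_WORLD, 1);

      /* MPI_Comm_dup is itself collective, so do it before the
       * threads start. */
      MPI_Comm_dup(MPI_COMM_WORLD, &comm_a);
      MPI_Comm_dup(MPI_COMM_WORLD, &comm_b);

      pthread_create(&a, NULL, reduce_thread, &comm_a);
      pthread_create(&b, NULL, reduce_thread, &comm_b);
      pthread_join(a, NULL);
      pthread_join(b, NULL);

      MPI_Comm_free(&comm_a);
      MPI_Comm_free(&comm_b);
      MPI_Finalize();
      return 0;
  }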
-Tony

Anthony Skjellum, PhD, President (tony@mpi-softtech.com)
MPI Software Technology, Inc., Ste 201, 1 Research Blvd., Starkville, MS 39759
+1-(601)320-4300 x15; FAX: +1-(601)320-4301; http://www.mpi-softtech.com
*The source for MPI, MPI-2, MPI/RT, Embedded & High Performance Middleware(tm)*

On Mon, 16 Nov 1998, Karl Feind wrote:

> I'd like to get some opinions about interpreting an aspect of the
> MPI-2 thread safety specification which deals with collectives.
>
> -----------------------------------------------------------------------
> MPI-2 Standard, Section 8.7.1, paragraphs 1-3:
>
> Threads
>
> In a thread-compliant implementation, an MPI process is a process that
> may be multi-threaded. Each thread can issue MPI calls; however,
> threads are not separately addressable: a rank in a send or receive
> call identifies a process, not a thread. A message sent to a process
> can be received by any thread in this process.
>
> Rationale.
>
> This model corresponds to the POSIX model of interprocess
> communication: the fact that a process is multi-threaded, rather than
> single-threaded, does not affect the external interface of this
> process. MPI implementations where MPI `processes' are POSIX threads
> inside a single POSIX process are not thread-compliant by this
> definition (indeed, their ``processes'' are single-threaded). ( End of
> rationale.)
>
> Advice to users.
>
> It is the user's responsibility to prevent races when threads within
> the same application post conflicting communication calls. The user
> can make sure that two threads in the same process will not issue
> conflicting communication calls by using distinct communicators at
> each thread. ( End of advice to users.)
> -----------------------------------------------------------------------
>
> The basic question is simple: how do you define "conflicting
> communication calls"?  There are two possible interpretations:
>
>       1) Any MPI collective call on the same communicator.
>    or 2) The same MPI collective call on the same communicator.
>
> When I read the text of the standard, I tend to take interpretation
> #1, but one MPI Forum member I talked to recalls the Forum
> specifically taking interpretation #2.  Hence I want to get more
> feedback to ensure that I'm interpreting this correctly.
>
> I think that choosing interpretation #1 would be very desirable
> because it would permit some MPI collectives to be layered on top of
> other collectives without introducing thread-safety problems.  As a
> common example, the ROMIO MPI I/O software is layered on MPI
> collectives.  Interpretation #1 would permit ROMIO to be layered and
> still not violate thread safety.
>
> Consider scenario S1 below, which illustrates this thread-safety
> issue as it affects the ROMIO MPI-2 I/O layered implementation.
>
> The ROMIO function MPI_File_open() calls MPI_Allreduce() using the
> communicator passed by the user into MPI_File_open().  Suppose the
> user had another thread executing an MPI_Allreduce() collective
> operation on the same communicator.
>
> If a race condition allows the threads to execute in different
> orders, we would get an incorrect result:
>
> Scenario S1
> -----------
>
>        | Process 0                     | Process 1
>        | ---------                     | ---------
>        |                               |
>   Time | thread A calls MPI_Allreduce  | thread C calls MPI_Allreduce
>    |   |   via MPI_File_open           |
>    |   |                               |
>    |   | thread B calls MPI_Allreduce  | thread D calls MPI_Allreduce
>    |   |                               |   via MPI_File_open
>    V   |                               |
>
> Under interpretation #1, you could say that the user needs to ensure
> that all collective calls on the same communicator, from whatever
> thread, are properly ordered.  With this interpretation, scenario S1
> would be erroneous, and the implementation wouldn't need to deal with
> matching up the collective calls on multiple threads.
>
> Notice that under both interpretations, a scenario like S2 should be
> erroneous, because the same collective function is called by the two
> threads.
>
> Scenario S2
> -----------
>
>        | Process 0                     | Process 1
>        | ---------                     | ---------
>        |                               |
>   Time | thread A calls MPI_Allreduce  | thread C calls MPI_Allreduce
>    |   |   for MPI_MAX operation       |   for MPI_SUM operation
>    |   |                               |
>    |   | thread B calls MPI_Allreduce  | thread D calls MPI_Allreduce
>    |   |   for MPI_SUM operation       |   for MPI_MAX operation
>    V   |                               |
>
> Thanks for any opinions or recollections of Forum discussions on this
> matter.
>
> Karl Feind             E-Mail: kaf@XXXXXXX
> Silicon Graphics       Phone:  612/683-5673
> 655F Lone Oak Drive    Fax:    612/683-5276
> Eagan, MN 55121
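P.S.  For the archive, here is scenario S1 spelled out as code.  This
is my sketch, not ROMIO source: the file name is a placeholder, error
handling is omitted, and it assumes an implementation that provides
MPI_THREAD_MULTIPLE.  The thing to notice is that a process-local
mutex around the two calls would serialize the threads within one
process but could not make process 0 and process 1 choose the same
order; the ordering has to be established at the program level, or
the threads given distinct communicators as in the sketch above.

  /* Sketch of scenario S1 (NOT ROMIO source; illustrative only).
   * Two threads per process post collectives on the SAME
   * communicator: one directly, one indirectly through
   * MPI_File_open, which ROMIO layers on MPI_Allreduce.  Under
   * interpretation #1 this program is erroneous unless it imposes
   * one order on the two calls that is identical in every process. */
  #include <mpi.h>
  #include <pthread.h>

  static MPI_Comm comm;                /* shared by both threads */

  static void *user_thread(void *arg)  /* threads B and C in S1 */
  {
      int in = 1, out;
      MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, comm);
      return NULL;
  }

  static void *io_thread(void *arg)    /* threads A and D in S1 */
  {
      MPI_File fh;

      /* MPI_File_open is collective and calls MPI_Allreduce on
       * `comm' internally, so it races with user_thread above.
       * The file name "data" is a placeholder. */
      MPI_File_open(comm, "data", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
      MPI_File_close(&fh);
      return NULL;
  }

  int main(int argc, char **argv)
  {
      int provided;
      pthread_t t1, t2;

      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      MPI_Comm_dup(MPI_COMM_WORLD, &comm);

      /* Nothing orders the two collectives, so each process may see
       * them in a different order -- the race in S1. */
      pthread_create(&t1, NULL, user_thread, NULL);
      pthread_create(&t2, NULL, io_thread, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);

      MPI_Comm_free(&comm);
      MPI_Finalize();
      return 0;
  }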