Date: Wed, 09 May 2001 19:52:21 +0200
From: Hubert Ritzdorf
Organization: C&C Research Laboratories, NEC Europe Ltd.
X-Mailer: Mozilla 4.76C-SGI [en] (X11; U; IRIX 6.5 IP32)
X-Accept-Language: en
MIME-Version: 1.0
To: mpi-21@XXXXXXXXXXXXX
Subject: Re: MPI_FINALIZE
References:
Content-Type: text/plain; charset=iso-8859-2
Content-Transfer-Encoding: 7bit
Sender: owner-mpi-21@XXXXXXXXXXXXX
Precedence: bulk

Jeff Squyres wrote:

> A few points:
>
> 1. Implementing the change in MPI_FINALIZE to make it collective over "the
> union of all processes that have been and continue to be connected" is a
> non-trivial distributed algorithm, since it is essentially a barrier over
> potentially unrelated and not-directly-connected processes.
>
> 2. Is there a difference between "have been and continue to be connected"
> and "are connected"?
>
> 3. This change can potentially drastically change the semantics of
> currently-valid MPI programs.
>
> As one example: currently-valid "task-farm" programs may unintentionally
> cause a lot of "zombied" MPI processes that are simply waiting for an
> MPI_FINALIZE from their ancestor(s). Consider what happens if a root
> process continually spawns short-lived MPI processes to perform some task
> in a "fire and forget" kind of model. The short-lived child processes
> could previously invoke MPI_FINALIZE and die. With the proposed change,
> the short-lived processes will now block waiting for the parent to invoke
> MPI_FINALIZE as well.
>
> This program can be fixed by having the root and child processes invoke
> MPI_COMM_DISCONNECT right after spawning (or after whenever the last
> message between the root and children finishes) so that the child can
> MPI_FINALIZE by itself, and then die.

As I understand the MPI-2 standard (page 106, lines 11-41), this is exactly
the procedure that the standard requires. Thus, programs that do not
disconnect before calling MPI_Finalize must expect to wait in MPI_Finalize.
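
To make the rule concrete, here is a toy model in plain Python. It makes no
MPI calls and is not how any MPI implementation works internally. It assumes
that two processes count as connected when a chain of live communicators
links them; the `World` class and its `create`, `disconnect`, and
`finalize_group` methods are hypothetical names introduced only for this
illustration.

```python
class World:
    """Toy model: each live communicator links all of its member processes."""

    def __init__(self):
        self.comms = {}  # communicator name -> set of process names

    def create(self, name, members):
        # Stands in for creating a communicator (e.g. the result of a spawn).
        self.comms[name] = set(members)

    def disconnect(self, name):
        # Stands in for all members calling MPI_COMM_DISCONNECT on `name`.
        del self.comms[name]

    def finalize_group(self, proc):
        """Return the processes that MPI_FINALIZE on `proc` is collective over."""
        group, frontier = {proc}, [proc]
        while frontier:
            p = frontier.pop()
            for members in self.comms.values():
                if p in members:
                    new = members - group
                    group |= new
                    frontier.extend(new)
        return group


# Task farm: a root spawns two children; each spawn yields an intercommunicator.
w = World()
w.create("spawn_c1", {"root", "c1"})
w.create("spawn_c2", {"root", "c2"})

# c1 and c2 never talk to each other, yet they are connected through root.
print(sorted(w.finalize_group("c1")))

# After root and c1 disconnect, c1 can finalize on its own.
w.disconnect("spawn_c1")
print(sorted(w.finalize_group("c1")))
print(sorted(w.finalize_group("c2")))
```

In this model the first query returns all three processes, which is the
"not-directly-connected" case from point 1. After the disconnect, c1 is alone
in its group, while c2 still has to wait for root.
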
The programs you mentioned, which expect to terminate before all connected
processes have called MPI_Finalize, are not ``valid'' MPI programs.
We have implemented MPI_Finalize for NEC SX systems according to page 106;
this amounts to a barrier over all MPI processes that are connected.

> But my concern is backwards compatibility: we have no idea how many
> programs exist that rely on MPI_FINALIZEing over just MPI_COMM_WORLD.
> Changing the spec now could cause unintended side-effects in
> currently-valid MPI programs.

I don't see this backwards compatibility problem; the programs do not
conform to the standard and may still run (including the ``zombie'' MPI
processes, which may waste resources).

Best regards

Hubert

--
______________________________________________________________________________
Hubert Ritzdorf
NEC Europe Ltd.
C&C Research Laboratories
Rathausallee 10
D-53757 Sankt Augustin
______________________________________________________________________________