__color__ __group__ ticket summary component version type owner created _changetime _description _reporter 1 None 428 MPI_Win_fence memory consumption mpich2 None bug jayesh * 1235780829 1236984277 {{{ Hi List, the attached test program uses MPI_Accumulate/MPI_Win_fence for one sided communication with derived datatype. The program runs fine with mpich2-1.1a2 except for my debugging version of MPICH2 compiled with ./configure --with-device=ch3:nemesis --enable-g=dbg,mem,meminit In this case the MPI_Win_fence on the target side comuses about 90% of main memory (e.g. > 3 GB). As the behaviour is completely different for predefined datatypes, I suspect that the memory consumption is related to the construction of the derived datatype on the target side. Is there a workaround for this? Thanks + Best regards, Dorian }}} Dorian Krause 1 None 462 test1_dt failed in nightlies mpich2 None bug jayesh 1237132067 1237223429 test1_dt (rma) failed in the nightlies. We should consider this separately from the attribute test failures (which were fixed last night). jayesh 2 None 29 nemesis ext_procs optimization mpich2 None bug buntinas 1217598154 1236632771 In r975 I committed a rough cut of dynamic processes for nemesis newtcp. In mpid_nem_inline.h I commented out an optimization that uses MPID_nem_mem_region.ext_procs because it prevents the proper operation of dynamic processes. Unfortunately, removing it adds ~100ns to our zero-byte message latencies. So there is a FIXME in the code that reads like this: {{{ /* FIXME the ext_procs bit is an optimization for the all-local-procs case. This has been commented out for now because it breaks dynamic processes. Some other solution should be implemented eventually, possibly using a flag that is set whenever a port is opened. [goodell@ 2008-06-18] */ }}} In general, this won't affect real uses who run any inter-node jobs, since they were already polling every time anyway. However, it does hurt those wonderful microbenchmarks. A hack fix is to leave this in but also check to see if a port has been opened. A possibly better fix is to only poll the network every X iterations of "poll everything", where X is some tunable parameter. This req is a reminder for this FIXME. -Dave goodell 2 None 79 Nemesis support for non-Intel/AMD platforms mpich2 None bug buntinas 1218473627 1236632795 (This is a resend of one of the lost emails) Should we make the default device depend on whether we're intel-Unix? Bill William Gropp Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign William Gropp 2 None 148 multiple netmod support mpich2 None bug buntinas * 1221675882 1221683096 This is a place holder for supporting multiple netmods simultaneously in nemesis * poll on multiple netmods * configure which vcs use which netmods buntinas 2 None 149 Define netmod interface mpich2 None bug buntinas * 1221675971 1236632814 Place holder for defining the netmod interface * versioning * allow for future modifications buntinas 2 None 152 PAC_F90_CHECK_COMPILER_OPTION + pgf90 mpich2 None bug chan 1221712853 1237058207 PAC_F90_CHECK_COMPILER_OPTION rejects -O2 as a valid compiler flag for pgf90, because the compiler produces different output (just the file name) with -O2 when linked with object file compiled without -O2. Here is the related config.log output: configure:10592: checking whether routines compiled with -O2 can be linked with ones compiled without -O2 configure:10598: pgf90 -c conftest2.f90 >conftest2.out 2>&1 configure:10601: $? 
= 0 configure:10603: pgf90 -O2 -o conftest conftest2.o conftest.f90 >conftest.bas 2>&1 configure:10606: $? = 0 configure: Compiler output differed in two cases 0a1 > conftest.f90: configure:10651: result: no A.Chan Anthony Chan 2 None 165 Config and binary file conflict mpich2 None bug balaji 1222202918 1224791129 Hi, has anyone let you guys know about the file conflict in your "mpd" projects (Music Player Daemon and Multi Processing Daemon) yet? This old bug report summarizes it, and aside from telling the package manager not to install both on the same system, nothing has happened since: http://bugs.gentoo.org/145367 I've run into the same problem and was wondering if you'd be willing to do something about it? Debian calls its mpd "mpich-mpd-bin". Why not just "mpich-mpd" I'm not sure but that sounds reasonable to me. On the other hand, MusicPD is pretty much standalone and there are probably less scripts out in the wild that have its name hardcoded---you start in an init script and that's it. So that would perhaps be easier to change: /usr/bin/musicplayerd and /etc/musicplayerd.conf? Would be great if you guys could work something out... cheers, _Matthias -- I prefer encrypted and signed messages. KeyID: FAC37665 Fingerprint: 8C16 3F0A A6FC DF0D 19B0 8DEF 48D9 1700 FAC3 7665 Matthias Bethke 2 None 179 Fw: [ROMIO Req #936] Inconsistent and incorrect use of MPIR_Nest_incr and MPIR_Nest_decr mpich2 None bug None 1222886214 1222886214 Forwarding to Trac. > From: William Gropp > To: romio-maint@mcs.anl.gov > Content-Type: text/plain; charset="US-ASCII"; format=flowed; delsp=yes > Content-Transfer-Encoding: 7bit > Mime-Version: 1.0 (Apple Message framework v929.2) > Subject: [ROMIO Req #936] Inconsistent and incorrect use of MPIR_Nest_incr and MPIR_Nest_decr and MPI routines > Date: Mon, 22 Sep 2008 09:09:07 -0500 > X-Mailer: Apple Mail (2.929.2) > X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at mailgw.mcs.anl.gov > Cc: romio-maint@mcs.anl.gov > > I enabled the nesting tests in MPICH2 and found a number of problems > in the ROMIO code (particularly in iwrite.c and iread.c). In looking > at these files, I saw no need for the MPIR_Nest_incr or Nest_decr . > These macros should only be used when calling an MPI routine (not an > MPIR or other internal implementation routine). Their purpose is to > tell the MPI routine not to invoke the MPI error handler but instead > to return an error code; they're also currently used to avoid > recursive calls to the global thread mutex; again, this only applies > to the MPI routines, not the internal routines. > thakur@mcs.anl.gov (Rajeev Thakur) 2 None 221 Fwd: MPICH2 bug? (attributes) mpich2 None bug gropp 1224603493 1230652737 {{{ Jeff was kind enough to point out this bug in our attribute handling. I've tested it out on my mac with gcc and g77 and I definitely get a bus error in case 4. His tests are in the attached tarball and something along these lines should probably be added to our test suites. -Dave Begin forwarded message: {{{ > From: Jeff Squyres > Date: October 21, 2008 Oct 21 8:28:17 AM CDT > To: Dave Goodell > Subject: MPICH2 bug? > > Yo Dave -- > > MPI attributes suck. I was the poor schlep who was tapped to write > them in OMPI, and it took me a *long* time to get them right. In > doing so, I came up with what I thought were 9 discrete cases for > reading and writing attributes. 
I outline the details in the > comment beginning here: > > https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/attribute/ > attribute.c#L23 > > To make sure I got this stuff right, I wrote up a test program that > checks all 9 cases. MPICH2 seems to segv on case 4 (I didn't > really dig any further than that). Can you check it out? > > -- > Jeff Squyres > Cisco Systems }}} }}} goodell 2 None 258 Fwd: Reducing MPI_REAL16's mpich2 None bug None 1225299144 1225307529 {{{ Yet another wacky inter-language issue to check on... Begin forwarded message: > From: Jeff Squyres > Date: October 29, 2008 Oct 29 11:25:21 AM CDT > To: Dave Goodell > Subject: Reducing MPI_REAL16's > > Yo Dave -- > > In the genre of obscure MPI bugs... Will MPICH2 also have this > same issue? > > https://svn.open-mpi.org/trac/ompi/ticket/1603 > > See comments 1, 2, and 4 in particular. > > -- > Jeff Squyres > Cisco Systems > }}} goodell 2 None 277 Eliminating MPICH2ism from test suite mpich2 None bug gropp 1226099615 1226349256 I'm running the MPICH2 test suite under the IBM MPI, and I've found a number of problems. Some are ambiguities in the MPI spec that have been fixed in 2.1; some are improper use of MPICH internals (there were some refs to status.count and status.cancelled, neither of which is valid MPI). Some are unsupported features in the IBM MPI. Some appear to be bugs in the IBM MPI, for which I'll probably want to enhance the output from the tests in those cases. This is a place holder for the updates. gropp 2 None 278 MPICH2-1.0.8 on windows with gfortran mpich2 None bug jayesh 1226099669 1236888782 {{{ BTW: cygwin now has gfortran available. So you might want to support it aswell in the binary distribution. [difference being single underscore vs double underscore fortran symbols]. Its available as gfortran-4.exe as part of gcc4-gfortran package. Satish }}} Satish Balay 2 None 279 Re: MPICH2-1.0.8 on windows with Compaq f90 mpich2 None bug jayesh 1226156500 1236891730 {{{ Changing to INTEGER (KIND=4) gets this going and I have a successful configure & build of PETSc with it.. [as mentioned in the previous e-mail using 'INTEGER' on the 32bit windows install might work with both g77 & compaq f90] Satish On Fri, 7 Nov 2008, Satish Balay wrote: > > This is with compaq f90 on windows. It support (KIND=4) - but not > (KIND=8)[its an old compiler - but I think some folks still use it - > as it goes with VC6, so I test PETSc with it] > > Satish > > ----------------------------- > > Checking for header mpif.h > sh: /home/sbalay/petsc-dev/bin/win32fe/win32fe f90 -c -o conftest.o -threads -debug:full -fpp:-m -I/cygdrive/c/Program\ Files/MPICH2/include conftest.F > Executing: /home/sbalay/petsc-dev/bin/win32fe/win32fe f90 -c -o conftest.o -threads -debug:full -fpp:-m -I/cygdrive/c/Program\ Files/MPICH2/include conftest.F > sh: conftest.i^M > c:\PROGRA~1\MPICH2\INCLUDE\mpif.h(404) : Error: This is not a valid data type. [KIND]^M > INTEGER (KIND=8) MPI_DISPLACEMENT_CURRENT^M > ----------------^^M > > Possible ERROR while running compiler: conftest.i^M > c:\PROGRA~1\MPICH2\INCLUDE\mpif.h(404) : Error: This is not a valid data type. [KIND]^M > INTEGER (KIND=8) MPI_DISPLACEMENT_CURRENT^M > ----------------^^M > ret = 256 > Source: > program main > include 'mpif.h' > end > > > }}} Satish Balay 2 None 284 Retrofit job attributes to PMI v1.1 mpich2 None bug buntinas 1226437293 1236620279 There is a function called PMI_Get_clique_ranks() which is not part of the PMI v1 interface and is not implemented in all process managers. 
The proposal was made to implement PMI v2's job attributes functionality into PMI v1 (and bump the version number to 1.1) to address the need for getting info on local processes from the process manager, in a more general way. This is a placeholder to remind us to do this. buntinas 2 None 287 Trunk warnings: "...Handles still allocated" mpich2 None bug 1226523478 1233517929 When I run pt2pt/scancel & some io tests on trunk I get the following message when compiled with "--enable-strict --enable-g=all", ############################################################################# shakey:/sandbox/jayesh/freshBuild/mpich2/test/mpi/pt2pt> mpiexec -n 2 scancel No Errors In direct memory block for handle type REQUEST, 3 handles are still allocated ############################################################################# Regards, Jayesh jayesh 2 None 288 Re: [mpi-all-commits] r3512 - mpich2/branches/dev/knem/src/mpid/ch3/include mpich2 None bug None 1226587769 1226588955 {{{ The request creation is on the critical path for latency - the intent was that the parts of the code that needed these fields was responsible for setting them before using them. While these changes are reasonable temporary fixes, we need to reduce the overhead in request creation and management. One starter would be to correct the code that should have set the fields that are now nulled out here - where are they? Bill On Nov 13, 2008, at 8:26 AM, goodell@mcs.anl.gov wrote: > Author: goodell > Date: 2008-11-13 08:26:51 -0600 (Thu, 13 Nov 2008) > New Revision: 3512 > > Modified: > mpich2/branches/dev/knem/src/mpid/ch3/include/mpidimpl.h > Log: > Merge r3511 from trunk -> knem. This zeroes out some additional > fields in MPID_Requests. > > > Modified: mpich2/branches/dev/knem/src/mpid/ch3/include/mpidimpl.h > =================================================================== > --- mpich2/branches/dev/knem/src/mpid/ch3/include/mpidimpl.h > 2008-11-13 14:20:26 UTC (rev 3511) > +++ mpich2/branches/dev/knem/src/mpid/ch3/include/mpidimpl.h > 2008-11-13 14:26:51 UTC (rev 3512) > @@ -316,6 +316,7 @@ > (sreq_)->comm = comm; \ > (sreq_)->cc = 1; \ > (sreq_)->cc_ptr = &(sreq_)->cc; \ > + (sreq_)->partner_request = NULL; \ > MPIR_Comm_add_ref(comm); \ > (sreq_)->status.MPI_ERROR = MPI_SUCCESS; \ > (sreq_)->status.cancelled = FALSE; \ > @@ -331,6 +332,8 @@ > (sreq_)->dev.segment_ptr = NULL; \ > (sreq_)->dev.OnDataAvail = NULL; \ > (sreq_)->dev.OnFinal = NULL; \ > + (sreq_)->dev.iov_count = NULL; \ > + (sreq_)->dev.iov_offset = NULL; \ > } > > /* This is the receive request version of MPIDI_Request_create_sreq */ > @@ -353,11 +356,14 @@ > (rreq_)->cc_ptr = &(rreq_)->cc; \ > (rreq_)->status.MPI_ERROR = MPI_SUCCESS; \ > (rreq_)->status.cancelled = FALSE; \ > + (rreq_)->partner_request = NULL; \ > (rreq_)->dev.state = 0; \ > (rreq_)->dev.cancel_pending = FALSE; \ > (rreq_)->dev.datatype_ptr = NULL; \ > (rreq_)->dev.segment_ptr = NULL; \ > (rreq_)->dev.iov_offset = 0; \ > + (rreq_)->dev.OnDataAvail = NULL; \ > + (rreq_)->dev.OnFinal = NULL; \ > MPIDI_CH3_REQUEST_INIT(rreq_);\ > } > > William Gropp Deputy Director for Research Institute for Advanced Computing Applications and Technologies Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign }}} William Gropp 2 None 292 Some thread tests hang with Nemesis and SMPD mpich2 None bug jayesh * 1226850709 1228324911 {{{ Looks like with Nemesis and SMPD, the following thread tests fail: alltoall, multisend, multispawn, taskmaster 
http://www.mcs.anl.gov/research/projects/mpich2/nightly/new/latest/run_20568 10692/test_1/make_testing.html Rajeev }}} "Rajeev Thakur" 2 None 299 ERROR MESSAGE: ../../../include/mpitypedefs.h:17:25: sys/bitypes.h: No such file or directory mpich2 None bug None 1227254507 1228235134 {{{ Dear Sir/Mdm, I am trying to install MPICH2 into CYGWIN and encountered this error message at the end of my make.log file. Attached is my make.log file for your kind consideration. I would like to request for some advice about it, please. Thank you. regards, Cornelius 1) This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. 2) If you believe you're receiving this e-mail in error or prefer not to receive publicity materials from Data Storage Institute, please send e-mail to admin@dsi.a-star.edu.sg with the subject "Unsubscribe". Please remember to include the body text received. Removal requests will be honored and respected. Please allow 1 to 3 days for processing. You may still receive other e-mails from us within the grace period. 3) As an anti-virus measure, our mail server rejects the following attachments: *.bat, *.com, *.cmd, *.exe, *.hta, *.Ink, *.pif, *.scr, *.shs; *.vb*; *.{*, *.js, *.sct, *.wsh, *.jse, *.swf. If you need to send us an attachment of this type, please contact us at helpline@dsi.a-star.edu.sg. Thank you! }}} "Lee Mun Chiew, Cornelius" 2 None 300 cross-compiling is broken in MPICH2-1.1a2 mpich2 None bug gropp 1227293869 1227567796 {{{ The culprit is PAC_CC_FUNCTION_NAME_SYMBOL finding its way into the top-level configure.in. It invokes AC_RUN_IFELSE, which won't work when cross-compiling. I note that there is other code in configure.in to test for the presence of __func__ and such, lacking only the check to see that it works correctly. (Would a compiler really ever provide this without it working correctly?) Another approach would be to forget the autoconf stuff and make the use of __func__ conditional under something like this: #if __STDC__ && __STDC_VERSION__ >= 199901L The downside is that you miss out on usage for pre-C99 compilers that support __func__. But from the looks of what Globus is doing with this, that doesn't seem like a big deal. -dg }}} David Gingold 2 None 304 Mem leak during error condns in MPIR/MPIC* funcs mpich2 None bug jayesh 1227553867 1237158827 This is a placeholder to remind us to cleanup memory in error cases for MPIR/MPIC* functions. eg: In bcast.c we have the following code, MPIR_Bcast(){ ... if (!is_contig || !is_homogeneous) { tmp_buf = MPIU_Malloc(nbytes); ... } ... if ((nbytes < MPIR_BCAST_SHORT_MSG) || (comm_size < MPIR_BCAST_MIN_PROCS)) { ... while (mask < comm_size) { if (relative_rank & mask) { ... if (mpi_errno != MPI_SUCCESS) { /* FIXME: tmp_buf NOT FREED IN THIS CASE */ MPIU_ERR_POP(mpi_errno); } ... } mask <<= 1; } ... } if (!is_contig || !is_homogeneous) { ... MPIU_Free(tmp_buf); } fn_exit: ... fn_fail: ... } There are many cases like these in the MPIR/MPIC* funcs. A good fix would be to get rid of MPIU_Malloc() and use MPIU_CHKLMEM_MALLOC()/MPIU_CHKLMEM_FREEALL() instead. Regards, Jayesh jayesh 2 None 307 about /iface:mixed_str_len_arg mpich2 None bug None 1227634426 1229106357 {{{ Dear MPI developing group, I am trying to run a FORTRAN code (Intel Fortran compilier). In my code I need to use the compilier option: "/iface:mixed_str_len_arg". 
Unfortunately MPICH2 does not support this mixed_str_len_arg. I tried compiling the MPICH2 source code with the mixed_str_len_arg option, but it still does not work. Do you know how to compile an MPI version that supports /iface:mixed_str_len_arg (based on the Intel Fortran compiler)? Cheers, Wei Yao }}} "Wei Yao" 2 None 315 Minor wart during MPE install mpich2 None bug chan 1228236112 1228430941 {{{ I saw this flash by when doing an install: Installing SLOG2SDK's share mkdir: /Users/gropp/tmp/mpi2-inst-nemesis/share/logfiles: File exists *** Error making directory /Users/gropp/tmp/mpi2-inst-nemesis/share/logfiles. *** Installed MPE2 in /Users/gropp/tmp/mpi2-inst-nemesis Bill William Gropp Deputy Director for Research Institute for Advanced Computing Applications and Technologies Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign }}} William Gropp 2 None 321 f77/buildiface bugs around HAVE_MULTIPLE_PRAGMA_WEAK mpich2 None bug None 1228498140 1229624590 {{{ This fixes a couple of problems in the f77/buildiface script. One is a typo (SYBMOLS instead of SYMBOLS), the other was a scope error hiding behind that. -dg .... Index: src/binding/f77/buildiface =================================================================== --- src/binding/f77/buildiface (revision 65243) +++ src/binding/f77/buildiface (working copy) @@ -820,7 +820,7 @@ #endif /* USE_WEAK_SYMBOLS */\ /* End MPI profiling block */\n\n"; - &AddFwrapWeakName( $lcname, $ucname ); + &AddFwrapWeakName( $lcname, $ucname, $args ); } } @@ -3567,11 +3567,11 @@ # Allow multiple underscore versions of names # but without the PMPI versions (needed for the wrapper library) sub AddFwrapWeakName { - my ($lcname, $ucname) = @_; + my ($lcname, $ucname, $args) = @_; print $OUTFD " /* These definitions are used only for generating the Fortran wrappers */ -#if defined(USE_WEAK_SYBMOLS) && defined(HAVE_MULTIPLE_PRAGMA_WEAK) && \\ +#if defined(USE_WEAK_SYMBOLS) && defined(HAVE_MULTIPLE_PRAGMA_WEAK) && \\ defined(USE_ONLY_MPI_NAMES)\n"; &print_weak_decl( $OUTFD, "MPI_$ucname", $args, $lcname ); &print_weak_decl( $OUTFD, "mpi_${lcname}__", $args, $lcname ); }}} David Gingold 2 None 325 maint/updatefiles fails to create configure on niagara mpich2 None bug gropp 1228778619 1234825132 Running maint/updatefiles on niagara1 doesn't create configure files. No error is reported by maint/updatefiles. Perhaps another change in the last 2 months improved the error reporting. buntinas 2 None 331 corrupted block allocated in segment.c[222] mpich2 None bug None 1229615379 1229965347 {{{ Hi, I'm using MPICH2-1.0.8 on CentOS 5.2 with MPI_THREAD_MULTIPLE. I configured it with: CC="icc" ./configure --prefix=$HOME/mpich2-icc --with-device=ch3:sock --with-thread-package=pthreads --enable-threads --enable-error-checking=all --enable-error-messages=all --enable-timing=all --enable-g=all --disable-fast I'm running my program with MALLOC_CHECK_=1 and got the following message repeated many times in stderr: [4] Block at address 0x0000000008a7cce0 is corrupted (probably write past end) [4] Block allocated in segment.c[222] My program ended with this message after running for over 5 hours on 8 nodes: rank 7 in job 1 node40_45659 caused collective abort of all ranks exit status of rank 7: killed by signal 9 I don't know if this helps or if these messages came at the same time. Please let me know if you need more information or if there is a way I can log more information.
I use MPI_Send, MPI_Bsend, MPI_Ssend, MPI_Recv, MPI_Barrier, and MPI_Reduce in my program. This is the first time I've seen this error, but in previous runs I have had messages that say: job aborted; reason = mpd disappeared but I don't know if this is related. Cheers, Shawn }}} "Shawn Poindexter" 2 None 333 a problem with Fortran and mpe_logf on Windows mpich2 None bug jayesh * 1229682354 1234893136 {{{ Dear all, I have installed MPICH2 on my PC running Windows XP and Digital Visual Fortran 6.0. All things are OK but I can't generate a clog file after running wmpiexec. If (include 'mpe_logf.h') is added to the source.f, many errors are reported during the link operation. The considered source file and the generated errors are attached. Please tell me how I can solve this problem. Thank you. Alaa El-nashar }}} alaa nashar 2 None 343 [mpich-discuss] request to enhance jumpshot script for portability mpich2 None bug None 1230648771 1230662103 {{{ version used: 1.0.8 the current jumpshot script created by config has the JAVA path and MPICH2 installation path hard coded in. This makes shipping jumpshot to a different env impossible without hacking. I am requesting that the jumpshot/jumpshot.in code (in src/mpe2/src/slog2sdk/bin) be enhanced with the following changes: # Set JAVA environment if [ "XX${JRE}" = "XX" ] ; then JVM=/bin/java ### else JVM=${JRE}/bin/java fi JVMOPTS="" ### # Assume user's environmental JVMFLAGS is better than what configure found. JVMFLAGS=${JVMFLAGS:-${JVMOPTS}} ### # Set PATH to various jar's needed by the GUI MPIEXEC_PATH=`which mpiexec` echo ${MPIEXEC_PATH} if [ "XX${MPIEXEC_PATH}" = "XX" ] ; then GUI_LIBDIR=/lib ### else GUI_LIBDIR=$(dirname $(dirname $MPIEXEC_PATH))/lib fi where paths in <> are hard coded like the existing code. the env var JRE (or any name the MPICH2 group prefers) decides where to pick up the JRE. lines marked ### are lines from the 1.0.8 release. this will make relocating jumpshot to a different system easy. thanks tan }}} chong tan 2 None 354 [mpich2-dev] followup, smpd + mpiexec_rsh.c mpich2 None bug jayesh 1231876471 1236108778 {{{ Another bug with smpd and mpiexec_rsh startup: the working directory is never passed to the rsh invocations. E.g. I run the job from /home/frey/mpitest and have the program attempting to open "test.in" and it fails since the "rsh" puts me in /home/frey. So relative paths will never work with mpiexec_rsh startup. :::::::::::::::::::::::::::::::::::::::::::::::::::::: Jeffrey T. Frey, Ph.D. Systems Programmer IV / Cluster Management Network & Systems Services / College of Engineering University of Delaware, Newark DE 19716 Office: (302) 831-6034 Mobile: (302) 419-4976 http://turin.nss.udel.edu/ 99 A1 7F 5E 71 70 8A 38 3C 4A A2 B1 4D 0A B2 49 :::::::::::::::::::::::::::::::::::::::::::::::::::::: }}} Jeffrey Frey 2 None 355 RE: [mpich-discuss] MPICH2 1.1a2 - problems with more than 4 computers mpich2 None bug jayesh 1231882249 1232032590 {{{ Hi, From the error codes in the hostname tests it looks like Computer1 (where the shared network folder resides) is unable to handle the number of connections to it. ############ Error code desc from MS ############ ERROR_REQ_NOT_ACCEP (71 0x47) : No more connections can be made to this remote computer at this time because there are already as many connections as the computer can accept.
############ Error code desc from MS ############ We should retry (but we do not) in this case. Can you verify that the existing network mapped drive connections are cleanedup in all the machines (Type "net use" in a command prompt on each machine to view the existing network mapped conns)? Regards, Jayesh _____ From: Tina Tina [mailto:gucigu@gmail.com] Sent: Tuesday, January 13, 2009 3:21 PM To: Jayesh Krishna Subject: Re: [mpich-discuss] MPICH2 1.1a2 - problems with more than 4 computers Dear Community! I started testng with the exampel cpi.exe program (so the problem is not in my program). I run the following command for all computers X=(1..8) and everything worked ok: "C:\Program Files\MPICH2\bin\mpiexec.exe" -map X:\\Computer1\MPI$ -wdir X:\CPI\ -hosts 1 ComputerX -machinefile "C:\Program Files\MPICH2\bin\machines.txt" X:\CPI\cpi.exe Than I ran the following command: "C:\Program Files\MPICH2\bin\mpiexec.exe" -map X:\\Computer1\MPI$ -wdir X:\CPI\ -n X -machinefile "C:\Program Files\MPICH2\bin\machines.txt" X:\CPI\cpi.exe Note: I also changed the machines.txt file as you suggested (adding :1). The result was the following for X up to 5 it worked ok (I did only one test run). But when I tested with X=6 (aka. on 6 computers). I got the following error: launch failed: CreateProcess(X:\CPI\cpi.exe) on 'Computer2' failed, error 3 - The system cannot find the path specified. On next run with X=6: launch failed: CreateProcess(X:\CPI\cpi.exe) on 'Computer2' failed, error 3 - The system cannot find the path specified. launch failed: CreateProcess(X:\CPI\cpi.exe) on 'Computer6' failed, error 3 - The system cannot find the path specified. launch failed: CreateProcess(X:\CPI\cpi.exe) on 'Computer3' failed, error 3 - The system cannot find the path specified. launch failed: CreateProcess(X:\CPI\cpi.exe) on 'Computer5' failed, error 3 - The system cannot find the path specified. launch failed: CreateProcess(X:\CPI\cpi.exe) on 'Computer4' failed, error 3 - The system cannot find the path specified. On next run with X=6: I got the same error as on the first run. And this errors were repeating on and on and on ... most of the times the error with only one computer and in most cases it was the second computer in the machinefile list. But not necesary. When there were more than one launch failed errors (like in second case) the order could be also different. In 20 tries not one was successfull. Than just for kicks I tried with X=8 I got the same errors with random number of launch failed errors and more or less random ComputerX that reported this. But every now or than I got one of the following errors (after the list of launch failed errors): 1) unable to post a write for the next command, sock error: generic socket failure, error stack: MPIDU_Sock_post_writev(1768): An established connection was aborted by the software in your host machine. (errno 10053) unable to post a write of the close command to tear down the job tree as part of the abort process. unable to post an abort command. 2) unable to post a read for the next command header, sock error: generic socket failure, error stack: MPIDU_Sock_post_readv(1656): An existing connection was forcibly closed by the remote host. (errno 10054) unable to post a read for the next command on left context. 3) unable to read the cmd header on the left context, socket connection closed. Hope this info helps Regards P.S.: I tried a couple of runs with X=5 and got mixed results, on some runs it worked ok on some it did not. Basically the same as with my program. 
So I would still say, as the number of computers increases, the problem gets worse. P.P.S.: Almost forgot to test the hostname. Here are the results of two runs. "C:\Program Files\MPICH2\bin\mpiexec.exe" -map X:\\computer1\MPI$ -wdir X:\CPI\ -n 8 -machinefile "C:\Program Files\MPICH2\bin\machines.txt" hostname *********** Warning ************ Unable to map \\computer1\MPI$. (error 71) *********** Warning ************ *********** Warning ************ Unable to map \\computer1\MPI$. (error 71) *********** Warning ************ computer4 computer1 computer8 computer2 *********** Warning ************ Unable to map \\computer1\MPI$. (error 71) *********** Warning ************ computer7 computer5 computer3 *********** Warning ************ Unable to map \\computer1\MPI$. (error 71) *********** Warning ************ computer6 "C:\Program Files\MPICH2\bin\mpiexec.exe" -map X:\\computer1\MPI$ -wdir X:\CPI\ -n 8 -machinefile "C:\Program Files\MPICH2\bin\machines.txt" hostname *********** Warning ************ Unable to map \\computer1\MPI$. (error 71) *********** Warning ************ *********** Warning ************ Unable to map \\computer1\MPI$. (error 71) *********** Warning ************ computer3 computer7 computer5 computer1 computer4 computer8 computer2 computer6 2009/1/13 Jayesh Krishna Hi, # Do you get any error message related to mapping network drives when you ran your job ? Please provide us with the command+output of your MPI job (Copy-paste your complete mpiexec command and its output in your email). # Can you run a command like (Note that I have removed "-noprompt" option), mpiexec -map x:\\computer1\MPI -wdir x:\ -n 8 -machinefile testallnamesmf.txt hostname with the following contents in the machinefile (testallnamesmf.txt - contains all the computer/host names - Note that I specify that only 1 MPI process be launched on each host using "hostname:1" syntax), computer1:1 -ifhn 192.168.1.1 computer2:1 -ifhn 192.168.1.2 ... computer8:1 -ifhn 192.168.1.8 # Does your program fail consistently for certain computers ? Try running a simple job (mpiexec -map x:\\computer1\MPI -wdir x:\ -n 1 -machinefile testmf.txt hostname) with only specifying 1 computer/host at a time. # Try removing "-noprompt" from the mpiexec command and see if mpiexec prompts you for anything (password, inputs etc). Regards, Jayesh _____ From: mpich-discuss-bounces@mcs.anl.gov [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of Tina Tina Sent: Tuesday, January 13, 2009 12:01 PM To: mpich-discuss@mcs.anl.gov Subject: [mpich-discuss] MPICH2 1.1a2 - problems with more than 4 computers Dear Community! I am using the latest version of MPICH2 for Windows (the problem occurs also on 1.0.8). I have 8 computers connected over giga-bit switch. I have written a program that uses MPI for paralelization. When I run a program on one or two computers. Everything works OK (lets say most of the time). When I run it on 4 computers, sometimes it works and sometimes it does not. The error that I get is: launch failed: CreateProcess(X:\mpi_program.exe) on 'computerX' failed, error 3 - The system cannot find the path specified. Most times I get this error for one computer in machine list, but it can also happen for 2 or more computers etc. If I increase number of computers over 4. I get this error almost every time. With 6 or more this happens every time. It looks like the higher the number the worse it gets. I would really like to make this work. Has anybody had such experiences and what was the solution. 
It looks like the computer tries to start the program before the mapped drive would be made operational. Is there any way to increase this delay? Or are there any other settings that needs to be set? There are some other errors that I occasionally get, but this is the most important one (for now). Systems: Windows XP SP3 (on all computers) Installed latest MPICH2 Connection giga-bit NICs (local network) over switch Example of run command: "C:\Program Files\MPICH2\bin\mpiexec.exe" -map X:\\computer1\MPI -wdir X:\ -n 4 -machinefile "C:\Program Files\MPICH2\bin\machines.txt" -noprompt X:\mpi_program.exe \\computer1\MPI is a shared folder on computer1 from which the command is run machines.txt consists of following lines: computer1 -ifhn 192.168.1.1 computer2 -ifhn 192.168.1.2 ... computer8 -ifhn 192.168.1.8 These are the NICs I would like MPI to use them for communication. The order of computers in machines.txt is irrelevant (it happens on every combination). Regards }}} "Jayesh Krishna" 2 None 363 Re: MPI_IN_PLACE bug in Allgatherv in MPE's collchk mpich2 None bug chan 1231966878 1237008489 {{{ ----- "Satyanarayana Kakollu" wrote: > Hi Anthony, > Is it safe to use MPI_ALLGATHERV with MPI_IN_PLACE in fortran? > > Should we just use the recv buffer as send buffer instead of > MPI_IN_PLACE? > > Thanks, > Satya > > > > On Tue, Jan 6, 2009 at 4:45 PM, Anthony Chan > wrote: > > > > > Hi Satyanarayana, > > > > The support of MPI_IN_PLACE for Allgatherv in CollChk library > > is definitely in 1.0.6p1. My simple test program didn't reveal > > any problem. If your program is small, could you send it to > > me so I can check if the collchk library contains any bug ? > > > > Thanks, > > A.Chan > > > > ----- "Anthony Chan" wrote: > > > > > ----- "Rajeev Thakur" wrote: > > > > > > > That might be a bug in the collchk library. If sendbuf is > > > MPI_IN_PLACE > > > > in > > > > Allgatherv, the sendcount argument should be ignored. > > > > > > > > Rajeev > > > > > > > > > > > > > > > > _____ > > > > > > > > From: Satyanarayana Kakollu [mailto:kakollu@gmail.com] > > > > Sent: Friday, December 19, 2008 9:53 AM > > > > To: Anthony Chan > > > > Cc: Rajeev Thakur > > > > Subject: Re: Trouble with MPI_BCAST > > > > > > > > > > > > Thank you Rajeev and Anthony, > > > > > > > > -mpe=mpicheck give the following message at an MPI_ALL_GATHERV > call > > > > in our > > > > code. > > > > > > > > ALLGATHERV (Rank 0) --> Inconsistent datatype signatures > detected > > > > between > > > > local rank 0 > > > > > > > > I am using the MPI_IN_PLACE option with send count set as '0', > can > > > > this be > > > > the problem ? > > > > > > > > Satya > > > > > > > > On Wed, Dec 17, 2008 at 10:02 PM, Anthony Chan > > > > > wrote: > > > > > > > > > > > > > > > > Or use "mpicc -mpe=mpicheck" or "mpif90 -mpe=mpicheck" as > linker. > > > > > > > > A.Chan > > > > > > > > > > > > ----- "Rajeev Thakur" wrote: > > > > > > > > > Satya, > > > > > Try linking with -lmpe_collchk. It will run MPE's > > > > > collective call > > > > > checker to see if there is any discrepancy in the parameters > > > passed > > > > > to > > > > > MPI_Bcast. If that doesn't show any errors, try running a > simple > > > > test > > > > > program that contains only the broadcast. 
> > > > > > > > > > Rajeev > > > > > > > > > > > > > > > > > > > > _____ > > > > > > > > > > From: Satyanarayana Kakollu [mailto:kakollu@gmail.com] > > > > > Sent: Tuesday, December 16, 2008 5:31 PM > > > > > To: Rajeev Thakur > > > > > Subject: Trouble with MPI_BCAST > > > > > > > > > > > > > > > Rajeev, > > > > > > > > > > We are seeing that our code is getting stuck at MPI_BCAST on > a > > > > > customer > > > > > machine. The call simple, all ranks use same size buffer and > > > count, > > > > > we > > > > > verified that the root is same on all ranks. > > > > > > > > > > The code works on our clusters, but not on the user's > machine. > > > Here > > > > > are the > > > > > differences between our clusters and the user's machine. > > > > > > > > > > > > > > > Our clusters User's machine > > > > > > > > > > Multi-proc nodes Single SMP node with 8 > cores on > > > > > two > > > > > sockets. > > > > > CentOS 4, RHEL 4 RHEL 5 client version > > > > > mpich2 1.0.6p1 mpich2 1.0.6p1 (same) > > > > > > > > > > We were using gdb to localize the bug to MPI_BCAST two of the > 8 > > > > ranks > > > > > do not > > > > > get past the BCAST. If we replace the BCAST with PT2PT > > > > communication > > > > > it is > > > > > running well for 1000s of iterations. > > > > > > > > > > We linked our applications statically, on the RHEL 4 machine. > > > > > > > > > > Can you share your first thoughts about the issue. > > > > > > > > > > Thanks, > > > > > Satya > > }}} kakollu@gmail.com 2 None 366 RE: [mpich-discuss] can not find function MPI_Type_create_f90_real mpich2 None bug jayesh 1232033959 1232045689 {{{ Hi, The current fortran libraries in MPICH2 on windows don't support the *TYPE_CREATE_F90* functions. I have added this request to our bug tracking system and will update you on our progress (Should be available in the next release - end of this month.). Regards, Jayesh _____ From: mpich-discuss-bounces@mcs.anl.gov [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of trimtrim trimtrim Sent: Thursday, January 15, 2009 3:14 AM To: mpich-discuss@mcs.anl.gov Subject: [mpich-discuss] can not find function MPI_Type_create_f90_real Dear every one: I am try to use the function of "MPI_Type_create_f90_real" to select the MPI send data type. But when I link the program, it shows I can't find the library. "Error 1 error LNK2019: unresolved external symbol _MPI_TYPE_CREATE_F90_REAL referenced in function _MAIN__ TEST " Does anyone knows which library I need to add, the library "fmpe.lib" and "fmpich2.lib" are already add to the linker. Below is the attached program. Many thanks Regards Haihua. PROGRAM MAIN USE mpi INTEGER(KIND=4),PARAMETER::p = 12,r=37; INTEGER(KIND=4),PARAMETER:: realKind =selected_real_kind(p,r) INTEGER(KIND=4):: MPI_INT_KIND,info; CALL MPI_Type_create_f90_real(p,r,MPI_INT_KIND,info); write(*,*) realKind; Write(*,*) PRECISION(x),range(x); END }}} "Jayesh Krishna" 2 None 375 Need to define C datatypes in Fortran's mpif.h mpich2 None bug None 1233025048 1233025048 {{{ Creating ticket for this... -----Original Message----- From: William Gropp [mailto:wgropp@illinois.edu] Sent: Monday, January 26, 2009 8:43 PM To: Rajeev Thakur Cc: 'Dave Goodell' Subject: Re: Datatypes in multiple languages Yes, that's what it means - we need to add them (with a decimal version of the value) to mpif.h . 
Bill On Jan 26, 2009, at 8:25 PM, Rajeev Thakur wrote: > Bill, > In the Chapter on Language Interoperability, it says "All > predefined datatypes can be used in datatype constructors in any > language" (pg 483, ln 46). Does it mean that all C datatypes must also > be defined in Fortran's mpif.h? We don't have any of them defined > currently, but we do have the > opposite: Fortran datatypes defined in C mpi.h. > > Rajeev > }}} "Rajeev Thakur" 2 None 401 BARRIER instructions not generated in CH3 SHM device on X86_64 linux. mpich2 None bug goodell 1233615465 1236108915 {{{ Hello, While investigating MPICH2 1.0.8 SHM implementation with RHEL5 Linux on Intel X86_64, we noticed that the following macros: MPID_WRITE_BARRIER() & MPID_READ_BARRIER() are translated to empty code in src/mpid/ch3/util/shmbase/ch3_shm.c. The macros for generating these barriers are defined in src/mpid/ch3/channels/shm/include/mpidi_ch3_impl.h. The following line (63) in mpidi_ch3_impl.h: #ifdef HAVE_GCC_AND_PENTIUM_ASM should be replaced with this one: #if defined(HAVE_GCC_AND_PENTIUM_ASM) || defined(HAVE_GCC_AND_X86_64_ASM) After this change the barrier macros produce the proper "fence" instructions. Thank you, Tal Nevo Application Performance Engineering ScaleMP Inc. }}} Tal Nevo 2 None 439 f90 f77 configure test failure on solaris mpich2 None bug 1236616633 1236616633 I noticed this configure message on solaris, and I don't think it's right (g77 does not work with g77). Maybe it is, in which case ignore this. -d {{{ checking whether Fortran 90 works with Fortran 77... cat: cannot open conftest.f90 Output from the link step is ld: fatal: file conftest1.f90: unknown file type ld: fatal: File processing errors. No output written to conftest collect2: ld returned 1 exit status no configure: WARNING: The test program that was used and the output may be found in config.log configure: WARNING: The selected Fortran 90 compiler /opt/csw/gcc3/bin/g77 does not work with the selected Fortran 77 compiler /opt/csw/gcc3/bin/g77. Use the environment variables F90 and F77 respectively to select compatible Fortran compilers. The check here tests to see if a main program compiled with the Fortran 90 compiler can link with a subroutine compiled with the Fortran 77 compiler. }}} http://www.mcs.anl.gov/research/projects/mpich2/nightly/old/runs/SPARC-Solaris-GNU32-mpd-ch3:nemesis-2009-03-05-20-45.xml buntinas 2 None 441 mpiu_shm_wrappers warnings mpich2 None bug jayesh * 1236707313 1236895762 I get a lot of warning messages (mostly in mpiu_shm_wrappers) when building MPICH2. [...snip...] /home/balaji/projects/mpich2/trunk/trunk/src/util/wrappers/mpiu_shm_wrappers.h: In function 'MPIU_SHMW_Seg_open': /home/balaji/projects/mpich2/trunk/trunk/src/util/wrappers/mpiu_shm_wrappers.h:889: warning: format not a string literal and no format arguments [...snip...] Here's my configure line: {{{ ../trunk/configure --enable-g=dbg,log \ --with-pm=hydra:gforker:remshell:mpd \ --disable-cxx --disable-f77 --disable-f90 \ --disable-mpe --disable-romio --disable-fast \ --disable-spawn --enable-strict=posix \ --enable-dependencies }}} I didn't really dig into which option was causing these warnings. balaji 2 None 463 Re: MPICH2 installation problem mpich2 None bug None 1237149939 1237149939 {{{ Hi, We'll need some more information, and you should send this to mpich2-maint@mcs.anl.gov (which I've cc'ed here). Just based on what is here, my guess is that the wrong MPI library was found; can you also send the compile and link commands and output? 
Bill On Mar 12, 2009, at 10:20 PM, Yang Yang wrote: > Hi, Dr. Gropp, > > I recently installed MPICH2-1.0.8 on the cluster to replace the old > MPICH-1.2.7. Although the example pi code can be compiled and run > using MPICH2, the one software package that used to be working under > MPICH-1.2.7 is not working now with MPICH2-1.0.8. The compilation > was successful and the executable was built. But when I tried to > run the package by mpiexec -n 16 ./exe it generated the message as > follows: > > Initializing MPI > node01.cluster: Not running from mpirun?. > Initializing MPI > node10.cluster: Not running from mpirun?. > Initializing MPI > node13.cluster: Not running from mpirun?. > Initializing MPI > western-wind.cluster: Not running from mpirun?. > Initializing MPI > node05.cluster: Not running from mpirun?. > Initializing MPI > node04.cluster: Not running from mpirun?. > Initializing MPI > Initializing MPI > node14.cluster: Not running from mpirun?. > node15.cluster: Not running from mpirun?. > Initializing MPI > node08.cluster: Not running from mpirun?. > Initializing MPI > node03.cluster: Not running from mpirun?. > Initializing MPI > node09.cluster: Not running from mpirun?. > Initializing MPI > node07.cluster: Not running from mpirun?. > Initializing MPI > node02.cluster: Not running from mpirun?. > Initializing MPI > node11.cluster: Not running from mpirun?. > Initializing MPI > node12.cluster: Not running from mpirun?. > Initializing MPI > node06.cluster: Not running from mpirun?. > > What's wrong? I used the same compilers for the software package > as used for building MPICH2. > > I appreciate your help. > > Regards, > > Yang William Gropp Deputy Director for Research Institute for Advanced Computing Applications and Technologies Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign }}} William Gropp 2 None 464 Hydra: Multi-executable launches on the same node mpich2 None bug balaji 1237163769 1237163769 Hydra uses a separate proxy whenever there is a separate executable name. For the case where two different executables are launched on the same host, both proxies try to open the same port and one of them fails. balaji 2 None 465 PSM netmod for Nemesis mpich2 None bug balaji 1237173548 1237173548 This ticket is a reminder for us to clean up the PSM netmod in nemesis (will probably need to rewrite it based on the changes in the MX module). balaji 2 None 466 meet error, when run mpdboot mpich2 None bug None 1237187199 1237187199 {{{ Hello, I get an error when I execute "mpdboot -n 2 -f mpd.hosts". I installed MPICH2 following the instructions in "installguide.pdf" step by step. Everything including ssh goes smoothly. I have two computers, node01 and node02. If I run mpdboot manually, it goes well. root@node01:~# mpd & [1] 5753 root@node01:~# mpdtrace -l node01_48187 (159.226.10.27) root@node01:~# exit logout Connection to node01 closed. root@node02:~# mpd -h node01 -p 48187 & [1] 5875 root@node02:~# mpdtrace node02 node01 root@node02:~# But when I run "mpdboot -n 2 -f mpd.hosts" on node01 or node02, an error occurs. root@node01:~# mpdboot -n 2 -f mpd.hosts mpdboot_node01 (handle_mpd_output 406): from mpd on node02, invalid port info: no_port root@node01:~# I have checked the iptables, which I never changed; they are blank for INPUT, OUTPUT and FORWARD, with no surprise.
I try the "mpdcheck", which give some information: root@node01:~# mpdcheck -f mpd.hosts -ssh client on node02 failed to access the server here is the output: bash: /home/wsh/bin/mpdcheck.py: No such file or directory root@node01:~# there are some information else, which maybe useful: root@node01:~# mpdboot -n 2 -f mpd.hosts -d -v debug: starting running mpdallexit on node01 LAUNCHED mpd on node01 via debug: launch cmd= /home/wsh/bin/mpd.py --ncpus=1 -e -d debug: mpd on node01 on port 33709 RUNNING: mpd on node01 debug: info for running mpd: {'ncpus': 1, 'list_port': 33709, 'entry_port': '', 'host': 'node01', 'entry_host': '', 'ifhn': ''} LAUNCHED mpd on node02 via node01 debug: launch cmd= ssh -x -n -q node02 '/home/wsh/bin/mpd.py -h node01 -p 33709 --ncpus=1 -e -d' debug: mpd on node02 on port no_port mpdboot_node01 (handle_mpd_output 406): from mpd on node02, invalid port info: no_port root@node01:~# and run"mpdcheck -pc on node01: root@node01:~# mpdcheck -pc --- print results of: gethostbyname_ex(gethostname()) ('node01', [], ['159.226.10.27']) --- try to run /bin/hostname node01 --- try to run uname -a Linux node01 2.6.27-13-generic #1 SMP Thu Feb 26 07:26:43 UTC 2009 i686 GNU/Linux --- try to print /etc/hosts 127.0.0.1 localhost.localdomain localhost #127.0.1.1 ubuntu.ubuntu-domain ubuntu 159.226.10.27 scc-m 159.226.10.27 node01 159.226.10.41 node02 # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts --- try to print /etc/resolv.conf # Generated by NetworkManager nameserver 159.226.2.135 --- try to run /sbin/ifconfig -a eth0 Link encap:Ethernet HWaddr 00:0b:db:bb:d8:2f inet addr:159.226.10.27 Bcast:159.226.10.255 Mask:255.255.255.0 inet6 addr: 2001:cc0:2004:2:20b:dbff:febb:d82f/64 Scope:Global inet6 addr: fe80::20b:dbff:febb:d82f/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:24886 errors:0 dropped:0 overruns:0 frame:0 TX packets:10709 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:18262430 (18.2 MB) TX bytes:1162554 (1.1 MB) Interrupt:17 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:758 errors:0 dropped:0 overruns:0 frame:0 TX packets:758 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:129788 (129.7 KB) TX bytes:129788 (129.7 KB) pan0 Link encap:Ethernet HWaddr 8a:95:9a:1d:ab:c0 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) --- try to print /etc/nsswitch.conf # /etc/nsswitch.conf # # Example configuration of GNU Name Service Switch functionality. # If you have the `glibc-doc-reference' and `info' packages installed, try: # `info libc "Name Service Switch"' for information about this file. 
passwd: compat group: compat shadow: compat hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4 networks: files protocols: db files services: db files ethers: db files rpc: db files netgroup: nis root@node01:~# mpdcheck on node02: --- print results of: gethostbyname_ex(gethostname()) ('node02', [], ['159.226.10.41']) --- try to run /bin/hostname node02 --- try to run uname -a Linux node02 2.6.27-7-generic #1 SMP Fri Oct 24 06:42:44 UTC 2008 i686 GNU/Linux --- try to print /etc/hosts 127.0.0.1 localhost.localdomain localhost ##127.0.1.1 ubuntu.ubuntu-domain ubuntu 159.226.10.27 scc-m 159.226.10.27 node01 159.226.10.41 node02 # The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters ff02::3 ip6-allhosts --- try to print /etc/resolv.conf # Generated by NetworkManager nameserver 159.226.2.135 --- try to run /sbin/ifconfig -a eth0 Link encap:Ethernet HWaddr 00:0a:eb:ad:e3:c5 inet addr:159.226.10.41 Bcast:159.226.10.255 Mask:255.255.255.0 inet6 addr: 2001:cc0:2004:2:20a:ebff:fead:e3c5/64 Scope:Global inet6 addr: fe80::20a:ebff:fead:e3c5/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:9822 errors:0 dropped:0 overruns:0 frame:0 TX packets:7460 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:3538769 (3.5 MB) TX bytes:1208257 (1.2 MB) Interrupt:16 Base address:0xc000 eth1 Link encap:Ethernet HWaddr 50:78:4c:70:f9:df UP BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) Interrupt:17 Base address:0xc400 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:512 errors:0 dropped:0 overruns:0 frame:0 TX packets:512 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:34873 (34.8 KB) TX bytes:34873 (34.8 KB) pan0 Link encap:Ethernet HWaddr b2:ed:0d:3c:2b:c0 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) --- try to print /etc/nsswitch.conf # /etc/nsswitch.conf # # Example configuration of GNU Name Service Switch functionality. # If you have the `glibc-doc-reference' and `info' packages installed, try: # `info libc "Name Service Switch"' for information about this file. passwd: compat group: compat shadow: compat hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4 networks: files protocols: db files services: db files ethers: db files rpc: db files netgroup: nis root@node02:~# Any help will be highly appreciated. Thanks. }}} 王强 2 None 146 Shared memory capable collectives mpich2 None feature goodell 1221634594 1236712841 We need to make sure that shared-memory capable collectives are implemented in 1.1. This ticket is to keep track of what all is pending for 1.1a2. Most current work is in the shmemcoll branch. balaji 2 None 443 Non-ssh boot-strap servers for Hydra mpich2 None feature balaji 1236722791 1236722839 Currently, ssh is the only supported boot-strap server for Hydra. We will need to eventually add support for slurm, pbs, sge and (maybe) fork as well. 
balaji 2 None 444 Dynamic process support in Hydra mpich2 None feature balaji 1236722952 1236722952 This is a place holder for dynamic process support in Hydra. balaji 2 None 445 Hydra proxy enhancements mpich2 None feature balaji 1236723281 1236723461 The current proxy implementation in Hydra is fairly simple. This needs to be extended in the following ways: 1. The proxy should be able to use the boot-strap server. The interface right now is not clean enough to allow this and needs to be fixed. This will let us launch a multi-level hierarchy of proxies. 2. The proxy currently only handles process launch and stdout/stderr/stdin functionality. Code-wise, however, the proxy is parallel to the process manager and should be able to provide some PMI functionality as well. This will help on large-scale systems, but is currently not supported. 3. Manual proxy launching capability: for platforms that don't have boot-strap servers, it should be possible to launch them either manually or as persistent daemons (e.g., on windows). 4. Connected proxies: on systems which have high-speed and scalable network capabilities (IB, MX), the proxies do not have to be disconnected. This makes most sense only when the proxies are pre-launched and not spawned as a part of mpiexec. balaji 2 None 446 Windows support for Hydra mpich2 None feature jayesh 1236723343 1236723343 This is a place holder for windows support for Hydra. balaji 2 None 457 Hydra: Process-core mapping ability mpich2 None feature balaji 1236991392 1236991465 We need to be able to allow processes to be bound to a processor or core on the system. This is do-able using external applications such as numactl that are available on some platforms, but that might not be portable. There are two possible options for this: 1. Extend PMI (maybe part of PMI-1.1 or PMI-2), so the process manager can tell the MPI process what core it should bind to, and the process can internally do the binding. 2. Instead of forking off the application processes directly, the proxies can spawn our own binding processes; these binding processes bind themselves to the appropriate core based on information from the proxy and then execvp the actual application. My preference is option 2. Other things to consider here are a portable way for a process to internally bind itself to a core. -- Pavan balaji 3 None 40 Support for type_create_indexed_block in ROMIO romio None bug thakur 1217952911 1236712472 Need to add support for MPI_Type_create_indexed_block in ROMIO flatten code. Rajeev "Rajeev Thakur" 3 None 89 MPI_Waitany does not return correct errors for truncation mpich2 None bug thakur 1218575064 1236712434 Need to see why MPI_Waitany does not return error truncate when Irecv is posted with a smaller buffer than the send. "Rajeev Thakur" 3 None 90 issues around MPID_Dev_comm_create_hook(), etc. mpich2 None bug goodell 1218639335 1236712586 Need to look into this. _____ From: owner-mpich2-dev@mcs.anl.gov [mailto:owner-mpich2-dev@mcs.anl.gov] On Behalf Of David Gingold Sent: Thursday, August 07, 2008 8:48 PM To: mpich2-dev@mcs.anl.gov Subject: [mpich2-dev] issues around MPID_Dev_comm_create_hook(), etc. I'm scrambling to get a release out, but lest I forget about this later, I thought I should mention a few brief bits about MPID_Dev_comm_create_hook() and friends: 1. The callers of these don't check the return values. It would be nicer to allow the hooks to pass errors up, e.g. if the create hook does memory allocation. 2. 
MPIR_Setup_intercomm_localcomm() doesn't call MPID_Dev_comm_create_hook(). Should it? (This had me on a bit of a goose chase this evening, but I'm better now.) 3. I've ended up hanging device-specific things off the communicator that might instead be hung off the communicator's group. (The bits, in my case, are representations of what ranks are local versus off-node.) Should we have MPID_Dev_group_{create,destroy}_hook() functions, also? I note that there is already a MPID_DEV_GROUP_DECL facility. -dg -- David Gingold Principal Software Engineer SiCortex Three Clock Tower Place, Suite 210 Maynard MA 01754 (978) 897-0214 x224 "Rajeev Thakur" 3 None 100 deleting dead/unsupported code mpich2 None bug buntinas 1219062395 1237022079 Some cleanup items that we don't want to forget about... {{{ Begin forwarded message: > From: Pavan Balaji > Date: August 17, 2008 Aug 17 2:29:51 PM CDT > To: mpich2-core@mcs.anl.gov > Subject: Re: [mpich2-core] deleting dead/unsupported code > Reply-To: mpich2-core@mcs.anl.gov > > > src/mpid/ch3/channels/nemesis/nemesis/net_mode/elan_module should > also probably go out. And newtcp_module should be renamed to > tcp_module. > > If we are spending some time to clean up the directory structure, > it might be worth changing the net_mod directory to "nm" and > removing _module in each netmod's name. Function name lengths would > be cut down by half :-). > > Merging nemesis/nemesis to nemesis is also on the plate, but will > probably take more time. > > -- Pavan }}} goodell 3 None 182 unify communicator creation paths mpich2 None bug 1222980457 1222980457 Not all communicators are created through the MPIR_Comm_create routine. Some are created via MPIR_Setup_intercomm_localcomm while MPIR_Process.{comm_world,comm_self,icomm_world} are created by hand in a separate array. Each piece of duplicated communicator construction logic is a spot where we are likely to have a bug some time in the future. If all three (or more) code paths are not kept in sync correctly then we will likely experience a bug. As a bonus, this should make the code easier to read and understand. This ticket is here to keep us from forgetting to clean this up. goodell 3 None 197 rreq->dev.recv_pending_count uninitialized mpich2 None bug buntinas 1223505987 1224790554 Now that we have valgrind integration (r3255) valgrind is showing the use of some uninitialized data here: {{{ goodell-desktop% mpiexec -n 3 valgrind -q ./reduce No Errors ==16186== Conditional jump or move depends on uninitialised value(s) ==16186== at 0x45FF9D: MPID_Irecv (mpid_irecv.c:76) ==16186== by 0x40CA3B: MPIC_Sendrecv (helper_fns.c:115) ==16186== by 0x46D731: MPIR_Barrier (barrier.c:70) ==16186== by 0x46DFE0: PMPI_Barrier (barrier.c:387) ==16186== by 0x45E842: MPID_Finalize (mpid_finalize.c:92) ==16186== by 0x421DD8: PMPI_Finalize (finalize.c:152) ==16186== by 0x4023DD: main (reduce.c:53) }}} adding --db-attach=yes into the mix shows that rreq->dev.recv_pending_count is filled with 0xef, confirming the uninitialized data. We need to figure out what path is missing this value and if there is a sensible default to initialize it to in the constructor. -Dave goodell 3 None 204 mpid_abort warning mpich2 None bug goodell 1223667852 1225197494 {{{ I get this warning for mpid_abort. 
/homes/thakur/cvs/mpich2/src/mpid/ch3/src/mpid_abort.c: In function ‘MPID_Abort’: /homes/thakur/cvs/mpich2/src/mpid/ch3/src/mpid_abort.c:99: warning: function declared ‘noreturn’ has a ‘return’ statement /homes/thakur/cvs/mpich2/src/mpid/ch3/src/mpid_abort.c:100: warning: ‘noreturn’ function does return }}} "Rajeev Thakur" 3 None 216 Incorrect behavior of MPICH2 C++ when Error handler MPI::ERRORS_RETURN is set mpich2 None bug goodell * 1224162971 1236713188 {{{ Hi, Rajeev. During investigation of some problem with MPI C++ code we have found that error handling in MPICH2 does not conform to the MPI standard. The issue is that the program throws an exception when the MPI::ERRORS_RETURN error handler is set. It is demonstrated in the attached example. <> I think that this behavior is due to the definition of the MPIX_CALL macro in the mpicxx.h file. #define MPIX_CALL( fnc ) \ {int err; err = fnc ; if (err) throw Exception(err);} Victor. -------------------------------------------------------------------- Closed Joint Stock Company Intel A/O Registered legal address: Krylatsky Hills Business Park, 17 Krylatskaya Str., Bldg 4, Moscow 121614, Russian Federation This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. }}} "Shumilin, Victor" 3 None 231 won't compile with --with-thread-package=no mpich2 None bug chan 1224871468 1236713424 {{{ ./configure --with-thread-package=no make }}} fails to build cpi with linker errors, e.g.: info_getvallen.c:(.text+0x41): undefined reference to `MPE_Thread_tls_get' info_getvallen.c:(.text+0x14c): undefined reference to `MPE_Thread_tls_get' info_getvallen.c:(.text+0x200): undefined reference to `MPE_Thread_mutex_unlock' info_getvallen.c:(.text+0x220): undefined reference to `MPE_Thread_mutex_lock' info_getvallen.c:(.text+0x383): undefined reference to `MPE_Thread_tls_set' info_getvallen.c:(.text+0x4cb): undefined reference to `MPE_Thread_tls_set' There are also lots of warnings about the incompatible type of MPE_Thread_tls_get buntinas 3 None 289 File_set_view doesn't check committed status of datatypes mpich2 None bug 1226602142 1226602142 MPI_File_set_view (and perhaps others) does not check that the datatypes are committed (as required by the standard). The IBM MPI does require this; I found this out while debugging some of the I/O test cases. gropp 3 None 374 investigate shared memory segment size mpich2 None bug buntinas 1232996514 1236714575 Alexander claims that the current nemesis shared memory segment size is approximately 16MiB, which might be too much as memory/core ratios shrink. We need to investigate this to see if it's actually a problem and if there is anything sensible we can do to reduce the size of the segment. -Dave goodell 3 None 393 Cross compilation requirement for manual cross files mpich2 None bug chan 1233285855 1236714697 This ticket is a reminder based on some input from Rob Latham about the use of cross files for cross compilation. Here's the summary of the problem: Currently, the MPICH2 configuration files perform runtime checks to find type sizes and other parameters. During cross-compilation such runtime checks are not possible, so we expect the values to be provided to us in a cross file.
However, it looks like there are some tricks to do this at compile time: http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/configure.in#L207 We should follow a similar approach to remove or minimize the number of runtime checks we need, and thus not rely on having a user-provided cross file. balaji 3 None 411 intel tests don't free datatypes mpich2 None bug 1234567339 1234567339 From a quick skim of the code and when building with --enable-g=handle, it looks like the Intel tests don't free the datatypes that they create. It would be good to create an MPITEST_Finalize that does any necessary cleanup that corresponds to MPITEST_Init. -Dave goodell 3 None 442 Hydra stdin support mpich2 None bug balaji 1236721742 1236721742 Hydra does not currently support stdin when more than one process is launched. This needs to be eventually fixed. Filing this ticket so we don't forget. -- Pavan balaji 3 None 448 Fwd: MPIR_Grequest_waitall mpich2 None bug None 1236800632 1236800696 {{{ Begin forwarded message: > From: Matthew Koop > Date: March 11, 2009 Mar 11 2:42:50 PM CDT > To: Dave Goodell > Cc: Darius Buntinas , Pavan Balaji >, , > Subject: Re: MPIR_Grequest_waitall > > Hi Dave, > > Sure, that sounds great. > > Matt > > On Wed, 11 Mar 2009, Dave Goodell wrote: > >> Hi Matt, >> >> "greq_wait" passes for the most recent version of MPICH2 that I >> happen >> to have compiled on my laptop. Something must be different between >> MPICH2 and MVAPICH2 in this area. >> >> I don't have the time right now to track down a bug I can't >> reproduce, >> but I can turn this into a Trac ticket and we'll take a closer look >> at >> it in the next week or two if you'd like. >> >> -Dave >> >> On Mar 10, 2009, at 2:27 PM, Matthew Koop wrote: >> >>> >>> The 'greq_wait' test fails for MVAPICH2 -- although it never even >>> enters >>> the CH3 layer -- everything is at the upper layer. >>> >>> Matt >>> >>> On Tue, 10 Mar 2009, Darius Buntinas wrote: >>> >>>> >>>> Do you have a test program that demonstrates the bug? >>>> >>>> -d >>>> >>>> On 03/10/2009 12:44 PM, Matthew Koop wrote: >>>>> I was looking into an issue here we were seeing with >>>>> tests/mpi/threads/pt2pt/greq_wait and it doesn't seem like >>>>> 'wait_fn' will >>>>> always be populated. >>>>> >>>>> MPI_Grequest_start sets wait_fn to NULL, cc_ptr is not null, and >>>>> kind is >>>>> set to MPID_UREQUEST. Then in MPIR_Grequest_waitall: >>>>> >>>>> 625 for (i = 0; i < count; ++i) >>>>> 626 { >>>>> 627 /* skip over requests we're not interested in */ >>>>> 628 if (request_ptrs[i] == NULL || *request_ptrs[i]- >>>>>> cc_ptr == 0 >>>>> || request_ptrs[i]->kind != MPID_UREQUEST) >>>>> 629 continue; >>>>> 630 mpi_error = (request_ptrs[i]->wait_fn)(1, >>>>> &request_ptrs[i]->grequest_extra_state, 0, NULL); >>>>> 631 if (mpi_error) MPIU_ERR_POP(mpi_error); >>>>> 632 } >>>>> >>>>> Shows that wait_fn gets called in this case. >>>>> >>>>> Matt >>>>> >>>> >>> >> > }}} koop@cse.ohio-state.edu 3 None 265 SMPD machinefile format mpich2 None docs jayesh 1225479325 1236713545 SMPD machinefile format is different from that of mpd, either we should support that format in SMPD or clearly document it. This is a placeholder to remind me to go through the user guides and update stuff related to windows & smpd -Jayesh jayesh 3 None 183 Support for Sun Studio compiler mpich2 None feature chan 1222991512 1237058224 This is a place holder to add support for Sun Studio compilers on x86. 
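Referring back to the MPIR_Grequest_waitall snippet quoted in ticket 448 above: below is a minimal, self-contained sketch of the kind of program that exercises that code path -- a plain generalized request started with MPI_Grequest_start (which populates no wait_fn) and completed from a second thread while the main thread sits in MPI_Waitall. This is not the actual greq_wait test from the suite; the thread handling, sleep, and output are assumptions made for brevity.
{{{
#include <mpi.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>

/* Trivial generalized-request callbacks. */
static int query_fn(void *extra_state, MPI_Status *status)
{
    MPI_Status_set_elements(status, MPI_BYTE, 0);
    MPI_Status_set_cancelled(status, 0);
    status->MPI_SOURCE = MPI_UNDEFINED;
    status->MPI_TAG = MPI_UNDEFINED;
    return MPI_SUCCESS;
}
static int free_fn(void *extra_state) { return MPI_SUCCESS; }
static int cancel_fn(void *extra_state, int complete) { return MPI_SUCCESS; }

static void *completer(void *arg)
{
    MPI_Request *req = arg;
    sleep(1);                      /* let the main thread reach MPI_Waitall first */
    MPI_Grequest_complete(*req);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    pthread_t thr;
    MPI_Request req;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Grequest_start(query_fn, free_fn, cancel_fn, NULL, &req);
    pthread_create(&thr, NULL, completer, &req);

    /* A plain generalized request has no wait_fn, so the waitall path
       quoted in the ticket must handle that case without calling
       through a NULL function pointer. */
    MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);

    pthread_join(thr, NULL);
    MPI_Finalize();
    printf(" No Errors\n");
    return 0;
}
}}}
On the implementation side, the question the ticket raises is whether the quoted loop should simply skip requests whose wait_fn is NULL or instead fall back to the regular progress engine for them.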
buntinas 3 None 192 make intercomm bcast SMP-aware mpich2 None feature goodell 1223405655 1236713108 The way that MPI_Bcast is implemented right now is SMP-aware for intracomms but not for intercomms. This needs to be corrected, probably by introducing a small utility function in bcast.c that performs SMP broadcasts (called something like MPIR_SMPBcast). -Dave goodell 3 None 286 supporting -Werror mpich2 None feature None 1226508635 1236714354 {{{ Given the lack of response from others, I've interpreted this as consensus and created a tracking ticket for this feature. -Dave Begin forwarded message: > From: Dave Goodell > Date: November 11, 2008 Nov 11 2:03:32 PM CST > To: mpich2-core@mcs.anl.gov > Subject: Re: [mpich2-core] supporting -Werror > Reply-To: mpich2-core@mcs.anl.gov > > This seems like a sensible way to accomplish what I want. I'm not > sure what else we would want to put in MAKE_CFLAGS in the future, > but it gets the job done. > > -Dave > > On Nov 11, 2008, at 1:55 PM, Anthony Chan wrote: > >> Hi Dave, >> >> Are you saying that you want -Werror to be used in building >> the MPICH2 libraries but not used during any of the configure tests ? >> If so, it seems to me the easiest thing to do create a special >> makefile >> only CFLAGS, e.g. MAKE_CFLAGS, which is set during make step, >> i.e. "make MAKE_CFLAGS=..." like you have been doing, and MAKE_CFLAGS >> is defined in each Makefile as "CFLAGS = $(CFLAGS) $(MAKE_CFLAGS)". >> >> PS. I think fixing configure is the wrong approach, too massive >> and complicated, because we are altering the meaning of CFLAGS >> in configure tests... >> >> A.Chan >> >> ----- "Dave Goodell" wrote: >> >>> David Gingold from SiCortex is in the process of updating to >>> mpich2-1.1.0a1 and is looking for a way to build with "-Werror". I >>> would like for us to be able to support this for our own development >>> >>> as well. >>> >>> Unfortunately, configuring with CFLAGS="-Werror" breaks numerous >>> configure tests and causes configure to make the wrong determination >>> >>> about system characteristics. For example configure thinks that it >>> can't find any suitable timer implementation on my mac when >>> configured with -Werror. >>> >>> Running "make CFLAGS=-Werror" sort of works, except that it stomps >>> any CFLAGS that were set by configure and I don't know if our >>> recursive make reliably passes variables to sub-makes in all cases. >>> >>> Because fixing all of the autoconf tests is likely to be an >>> intractable problem, what I think we want is a configure switch that >>> >>> will cause -Werror to be included in CFLAGS after all configure >>> tests >>> >>> have been made but before AC_OUTPUT time. This obviously will cause >>> >>> some builds to fail if the system is not warnings-clean, but this >>> wouldn't be the default option. The main trick to this approach is >>> that we would have to basically do the same thing in each sub- >>> configure because of the preciousness of the CFLAGS. Maybe a >>> PAC_WERROR inserted just before all AC_OUTPUTs in the tree, I'm not >>> quite sure... >>> >>> Any thoughts? As usual with these build system issues there's >>> probably a problem that I'm not thinking of, but that's why it's >>> good >>> >>> to discuss this sort of thing. Alternative proposals are welcome. >>> >>> -Dave > }}} goodell 4 None 118 Simple MPICH2 Delegate Bug mpich2 None bug jayesh 1220052846 1236712633 Hi, I have setup 3 Win64 hosts using "smpd -register_spn" with no problem. 
On the domain controller, I have created a user that is authorized for delegation and set up all 3 hosts to allow delegation. Then, assume the following two scenarios: Scenario 1: submission host: vm-cce1 execution host: vm-cce2 command: mpiexec -delegate -hosts 1 vm-cce2 hostname result: vm-cce2 Scenario 2: submission host: vm-cce1 execution host: vm-cce1 command: mpiexec -delegate -hosts 1 vm-cce1 hostname result: op_read error on left context: socket connection closed unable to read the re-connect request, socket connection closed. Scenario 2 appears to be a bug. Why is it that I cannot use delegation when talking to the localhost vm-cce1? This I think is a bug. Try it for yourself. Regards, Larry Adams Senior Systems Engineer Platform Computing Tele: (586) 510-0007 Cell: (586) 899-1138 Skype: TheWitness "Larry Adams" 4 None 122 Dynamic process context IDs mpich2 None bug goodell 1220452373 1236712744 Hi Roberto, We have done several rounds of checks and do not see any difference between MPICH2 1.0.7 and the TCP/IP interface of MVAPICH2 1.2. Both of these should perform exactly the same. We are continuing our investigation. We are wondering whether you can send us a sample piece of code that reproduces the problem you are indicating across these two interfaces. This will help us debug this problem faster and help you solve your problem. I've added other CCs to this email; maybe other people are interested in having a look. Attached you will find the test program I'm working on to turn up the problem. I'm not completely sure it works perfectly, since I wasn't able to complete its execution, but please let me know if I made something wrong inside the code. The testmaster is quite easy: you must provide the number of jobs to simulate (say 50000) and the node file that the resource manager provides for its schedule. Actually, the node that matches the master will be excluded from the slave nodes. The testmaster creates a ring of threads from the assigned nodes. Walking the ring, a thread is started for each free node it finds, so you should have as many threads as assigned nodes working in multithreading. To simulate some work, each thread internally generates a random integer, sets some MPI_Info (host and pwd), spawns the testslave job, sends it the generated random number, and waits for the testslave to receive and send back that number; the sent and received numbers are compared in order to verify their coherency, the slave sends an empty MPI_Send() to signal its termination, the thread then calls MPI_Comm_disconnect() to close the slave connection, and finally all the MPI_Info are cleared. At this point the thread terminates. When the requested number of jobs has been correctly "worked out", the application should terminate ... but without cleaning up (too tired, sorry ;-), so it just waits a bit and finalizes MPI. So far, I have not been able to complete any execution. Currently the application is still crashing with the backtrace you find below. Only once was I able to reach 3500 jobs, but one thread was stuck in a mutex. Looking at the backtrace, you can find the same race I'm getting in my applications. Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1087666512 (LWP 18231)] 0x00000000006a3902 in MPIDI_PG_Dup_vcr () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 Missing separate debuginfos, use: debuginfo-install glibc.x86_64 (gdb) info threads 29 Thread 1121462608 (LWP 18232) 0x0000003465a0a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 * 28 Thread 1087666512 (LWP 18231) 0x00000000006a3902 in MPIDI_PG_Dup_vcr () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 27 Thread 1142442320 (LWP 18230) 0x0000003464ecbd66 in poll () from /lib64/libc.so.6 26 Thread 1098156368 (LWP 18229) 0x0000003464e9ac61 in nanosleep () from /lib64/libc.so.6 1 Thread 140135980537584 (LWP 18029) main (argc=3, argv=0x7ffffb5992d8) at testmaster.c:437 (gdb) bt #0 0x00000000006a3902 in MPIDI_PG_Dup_vcr () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #1 0x0000000000668012 in SetupNewIntercomm () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #2 0x00000000006682c8 in MPIDI_Comm_accept () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #3 0x00000000006a6617 in MPID_Comm_accept () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #4 0x000000000065ec5f in MPIDI_Comm_spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #5 0x00000000006a17e6 in MPID_Comm_spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #6 0x00000000006783fd in PMPI_Comm_spawn () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #7 0x00000000004017de in NodeThread_threadMain (arg=0x120a790) at testmaster.c:314 #8 0x0000003465a06407 in start_thread () from /lib64/libpthread.so.0 #9 0x0000003464ed4b0d in clone () from /lib64/libc.so.6 (gdb) thread 29 [Switching to thread 29 (Thread 1121462608 (LWP 18232))]#0 0x0000003465a0a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) bt #0 0x0000003465a0a8f9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x000000000065e2e7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #2 0x00000000006675ca in FreeNewVC () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #3 0x0000000000668302 in MPIDI_Comm_accept () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #4 0x00000000006a6617 in MPID_Comm_accept () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #5 0x000000000065ec5f in MPIDI_Comm_spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #6 0x00000000006a17e6 in MPID_Comm_spawn_multiple () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #7 0x00000000006783fd in PMPI_Comm_spawn () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #8 0x00000000004017de in NodeThread_threadMain (arg=0x120d590) at testmaster.c:314 #9 0x0000003465a06407 in 
start_thread () from /lib64/libpthread.so.0 #10 0x0000003464ed4b0d in clone () from /lib64/libc.so.6 (gdb) thread 27 [Switching to thread 27 (Thread 1142442320 (LWP 18230))]#0 0x0000003464ecbd66 in poll () from /lib64/libc.so.6 (gdb) bt #0 0x0000003464ecbd66 in poll () from /lib64/libc.so.6 #1 0x00000000006d63bf in MPIDU_Sock_wait () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #2 0x000000000065e1e7 in MPIDI_CH3I_Progress () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #3 0x00000000006cf87c in PMPI_Send () from /home/roberto/.HRI/Proxy/HRI/External/mpich2/1.0.7/lib/linux-x86_64-gcc-glib c2.3.4/libmpich.so.1.1 #4 0x0000000000401831 in NodeThread_threadMain (arg=0x120a6f0) at testmaster.c:480 #5 0x0000003465a06407 in start_thread () from /lib64/libpthread.so.0 #6 0x0000003464ed4b0d in clone () from /lib64/libc.so.6 (gdb) thread 26 [Switching to thread 26 (Thread 1098156368 (LWP 18229))]#0 0x0000003464e9ac61 in nanosleep () from /lib64/libc.so.6 (gdb) bt #0 0x0000003464e9ac61 in nanosleep () from /lib64/libc.so.6 #1 0x0000003464e9aa84 in sleep () from /lib64/libc.so.6 #2 0x000000000040197c in NodeThread_threadMain (arg=0x120d630) at testmaster.c:505 #3 0x0000003465a06407 in start_thread () from /lib64/libpthread.so.0 #4 0x0000003464ed4b0d in clone () from /lib64/libc.so.6 (gdb) "Rajeev Thakur" 4 None 123 MPIDU_Yield and MPID_Thread_yield mpich2 None bug balaji 1220475167 1236143376 MPIDU_Yield has been implemented as a mpid/common/locks utility function for a number of platforms. MPID_Thread_yield is implemented at the MPI top-level, but only for a subset of cases (e.g., sched_yield and yield; no windows version or select version is present). It's probably a better idea to move the MPIDU_Yield function as a top-level utility as MPID_Yield, and allow MPID_Thread_yield to use MPID_Yield internally. This means that the internal usage of the MPIDU_Yield function has to change as well, and probably the header file dependencies too. Sending this email as a place holder for this fix. -- Pavan -- Pavan Balaji http://www.mcs.anl.gov/~balaji Pavan Balaji 4 None 195 Internal error in packet size mpich2 None bug buntinas * 1223484726 1236619851 {{{ I'm getting this today: william-gropps-computer:examples gropp$ ./cpi Internal error - packet definition is too small. Generic is 32 bytes, MPIDI_CH3_Pkt_t is 36 This is with the ch3:sock device/channel. The full configure line is /Users/gropp/projects/software/mpich2-current/configure --with- pm=gforker:mpd --with-device=ch3:sock --enable-threads=runtime -- enable-thread-cs=global --enable-refcount=default --enable- g=log,mem,dbg,mutex,nesting --enable-strict=posix --enable- dependencies --without-mpe --prefix=/Users/gropp/tmp/thread-tests/ mpich2-current-inst --enable-debuginfo Bil William Gropp Deputy Director for Research Institute for Advanced Computing Applications and Technologies Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign }}} William Gropp 4 None 281 RE: [mpich-discuss] closesocket failed error when running an MinGWcompiled executable. mpich2 None bug jayesh 1226422229 1236713686 {{{ Hi, This is a bug in the current state machine of SMPD. This should not affect the execution of your MPI program (This error occurs when the process manager tries to cleanup connections after the MPI program finishes execution). 
(PS: You can go ahead with your program development and ignore the closesocket() errors for now. We will fix this bug soon.) Regards, Jayesh -----Original Message----- From: mpich-discuss-bounces@mcs.anl.gov [mailto:mpich-discuss-bounces@mcs.anl.gov] On Behalf Of Dmitri Chubarov Sent: Tuesday, November 11, 2008 1:23 AM To: someindianbloke@gmail.com Cc: mpich-discuss@mcs.anl.gov Subject: Re: [mpich-discuss] closesocket failed error when running an MinGWcompiled executable. Hi, Error 10093 is a winsock error code for "Successful WSAStartup not yet performed". Do check if you get the same error on other machines to rule out network misconfiguration in your Windows installation. On Tue, Nov 11, 2008 at 7:02 AM, Chiraj wrote: > Hi, > > I have compiled a C executable using the MPICH2 Windows libraries and > MinGW. I have tried running the executable using "mpiexec -localroot > -localonly 2 main.exe 0.xml", but I get the following error: > > closesocket failed, sock 1284, error 10093 > > Could some please tell me what this means I am doing wrong? I have > tried searching everywhere on what this error means. I have registered > my user credentials using wmpiregister and checked if I have started > smpd as a windows service. I am running Windows Xp Professional SP3 on > an Intel Pentium 4 base system. > > Chiraj > }}} "Jayesh Krishna" 4 None 305 Location of the console - hardcoded /tmp mpich2 None bug None 1227569934 1236714472 {{{ Hi, while developing a Tight Integration of the mpd startup method of MPICH2 into SGE, I found that the location of the console is hard- coded to be /tmp. Would it be an RFE to redirect it to $TMPDIR, if it's set? As I also set MPD_CON_EXT to get unique entries per jobnumber (on the master node of a parallel job, slave nodes have no consoles at all in this setup), I would also like to force the console to be in the SGE created $TMPDIR. -- Reuti }}} Reuti 4 None 395 etags configure checks mpich2 None bug balaji 1233288333 1236109235 MPICH2 binary packages currently have a dependency on emacs-common; its configuration relies on the availability of etags that is provided by the emacs-common package. This should be removed since "make etags" is completely broken currently and is rarely used through "make" in MPICH2. balaji 4 None 435 Code Duplication in Collectives mpich2 None bug 1236288136 1236288136 A lot of the code in the collectives is duplicated. These should be moved to helper functions. balaji 4 None 77 configure support for memory barriers mpich2 None feature goodell 1218220179 1236712347 mpidu_mem_barriers.h contains support for memory barriers on various platforms. Unfortunately, it doesn't yet have any non-intel, non-unix support and so it won't work for lots of platforms that MPICH2 runs on. Up until r1299 it had a "#warning" statement in there, but that isn't portable to several compilers, including the Visual Studio 64-bit compiler. I replaced it with an MPID_Abort statement that will trigger when the barrier is invoked in r1299. We should probably change it to a configure-time check so that users know up front that MPICH2 won't work on that platform. This will be more important once we begin using the memory barriers outside of nemesis, since ch3:sock and other code needs to be broadly portable. -Dave goodell 4 None 290 better valgrind integration mpich2 None feature 1226693985 1226694168 This is a catchall ticket for some of the valgrind integration features that I'd like to put into mpich2 and don't want to forget about. 1. 
Add an {{{MPIU_Assert_valid_and_not_null}}} - This would check for a !NULL value but also look {{{0xefefefef}}} if mem debugging is enabled and/or test validity via the {{{VALGRIND_CHECK_VALUE_IS_DEFINED(_lvalue)}}} valgrind client request macro if valgrind is available. {{{MPIU_Assert_zero_and_not_null}}} would be very similar. 2. Use proper memory pool management macros to track the handle allocation. {{{VALGRIND_CREATE_MEMPOOL}}} and friends is what is used for this. 3. Use {{{VALGRIND_CREATE_BLOCK}}} to add descriptions to regions of memory in order to make understanding valgrind messages clearer. 4. I suspect the knem LMT code will frequently cause valgrind to think that memory is undefined when it is actually defined. Look at ways to give valgrind a better view of things. 5. Add valgrind client requests as an alternative to the initializations performed by {{{--enable-g=meminit}}}. That way when Tom uses valgrind on him MPI program he doesn't get uninitialized writev warnings but doesn't have to pay a full initialization latency penalty. 6. Figure out a good way to integrate valgrind into the nightly tests. This would help catch bugs that our current {{{--enable-g}}} features can't. goodell 4 None 297 make -j N support mpich2 None feature goodell * 1227221670 1236714390 We should support parallel builds (via "make -j N"). Builds in general could go much much faster, and would especially speed up on slow clock speed platforms like the SiCortex. -Dave goodell 4 None 306 Add MPID_Segment_transpack() function mpich2 None feature buntinas 1227629920 1227629920 A transpack function would be useful when copying from a noncontig buffer to a noncontig buffer (like in MPIR_Localcopy). The idea is that you would pass in the source segment and destination segment and the function would copy directly from the source to the dest buffer. Currently this is done using two copies where the data is packed from the noncontig source buffer into the temp buffer and then unpacked to the noncontig dest buffer. I don't believe that a transpack function can be implemented above the ADI level because access is needed to segment manipulation functions. I'm creating this ticket as a placeholder to remind us to look into this. buntinas 4 None 458 Flow control in MPICH2 mpich2 None feature 1236992453 1236992453 This is a reminder that we need to add communication flow-control in MPICH2. balaji 4 None 459 MPICH2 on Vista mpich2 None feature 1236992880 1236992880 I was going over the older tickets in mpich2tkreq and moving relevant and non-duplicate ones here. One of the tickets was on MPICH2 for Vista. This has been on our plates for some time, but we never got to it because of not enough demand. This ticket is a reminder that it needs to be done at some point. Keeping this as long term for now, unless someone thinks this is critical. balaji
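Referring back to the valgrind integration items listed in ticket 290 above: the following is a minimal C sketch of what items 1-3 could look like using the standard client-request headers (valgrind/valgrind.h and valgrind/memcheck.h). The macro name MPIU_Assert_valid_and_not_null is taken from the ticket, but its body, the toy slab allocator, and all sizes and names here are illustrative assumptions rather than MPICH2's actual implementation.
{{{
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <valgrind/valgrind.h>   /* RUNNING_ON_VALGRIND, VALGRIND_CREATE_MEMPOOL, ... */
#include <valgrind/memcheck.h>   /* VALGRIND_CHECK_VALUE_IS_DEFINED, VALGRIND_CREATE_BLOCK, ... */

/* Item 1: hypothetical assertion macro (not the real MPICH2 one).  It rejects
 * NULL, rejects the 0xefefefef fill pattern written by --enable-g=meminit
 * builds, and, under valgrind, asks memcheck whether the pointer value itself
 * is defined. */
#define MPIU_Assert_valid_and_not_null(ptr_)                          \
    do {                                                              \
        VALGRIND_CHECK_VALUE_IS_DEFINED(ptr_);                        \
        assert((ptr_) != NULL);                                       \
        assert((uintptr_t)(ptr_) != (uintptr_t)0xefefefefUL);         \
    } while (0)

/* Items 2 and 3: a toy bump allocator over a single slab, registered with
 * valgrind as a mempool and with each object labeled so that error messages
 * can name the block they refer to. */
#define SLAB_SIZE (64 * 1024)
static char   slab[SLAB_SIZE];
static size_t slab_used;

void handle_pool_init(void)
{
    if (RUNNING_ON_VALGRIND) {
        VALGRIND_CREATE_MEMPOOL(slab, /*rzB=*/0, /*is_zeroed=*/0);
        VALGRIND_MAKE_MEM_NOACCESS(slab, SLAB_SIZE);  /* untouched bytes are off limits */
    }
}

void *handle_alloc(size_t size)
{
    void *obj;
    if (slab_used + size > SLAB_SIZE)
        return NULL;
    obj = slab + slab_used;
    slab_used += size;
    if (RUNNING_ON_VALGRIND) {
        VALGRIND_MEMPOOL_ALLOC(slab, obj, size);                       /* item 2 */
        VALGRIND_CREATE_BLOCK(obj, size, "handle object (example)");   /* item 3 */
    }
    return obj;
}
}}}
Item 5 would presumably use VALGRIND_MAKE_MEM_DEFINED on the regions that --enable-g=meminit currently initializes, silencing the writev warnings without paying the initialization cost, but that is an assumption about intent rather than an agreed design.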