Date: Tue, 02 Aug 2005 15:20:11 -0500 (CDT)
From: Brian Barnes
Subject: NWChem 4.7 + infiniband compile...help?
To: nwchem-users@emsl.pnl.gov

Hello,

I have been attempting to compile NWChem 4.7 for our new cluster, which is made of dual-processor Opteron 250 nodes running the Rocks 4.0.0 Linux distribution (which is based on a 2.6 kernel and RHEL4). I can successfully compile NWChem for gigabit ethernet, but am having trouble compiling for the InfiniBand interconnect.

I have read some old correspondence on nwchem-users, from

http://www.emsl.pnl.gov/docs/nwchem/nwchem-support/2004/08/0032.NWChem-4.6_Infiniband_from_Mellanox

...but there aren't many details. There is also no formal description in the INSTALL file of how to do this.

Essentially guessing, based on the Myrinet/MPICH instructions, I used these environment variables for my compile:

    setenv USE_MPI "y"
    setenv MPI_LOC /opt/mpich/infiniband/gnu
    setenv MPI_LIB $MPI_LOC/lib
    setenv LIBMPI "-lfmpich -lmpich -lpmpich"
    setenv MPI_INCLUDE /usr/include/iba/vapi
    setenv ARMCI_NETWORK MELLANOX
    setenv ARMCI_HOME /opt/SilverStormIB
    setenv ARMCI_INCLUDE $ARMCI_HOME/include

Doing this, the compile finishes cleanly, and I can run a two-processor test job just fine. That same test job, on 2 nodes and 4 processors, crashes immediately:

    running /net/apps/nwchem/bin/nwchem.inf on 4 LINUX vapi processors
    ARMCI configured for 2 cluster nodes. Network protocol is 'Mellanox VAPI'.
    1 in check FAILURE create qp
    2 in check FAILURE create qp
    2:ARMCI(vapi):failure:-245:create qp code -1 -1 : -245
    3 in check FAILURE create qp
    0:ARMCI(vapi):failure:-245:create qp code -1 -1 : -245
    0 in check FAILURE create qp
    0:ARMCI(vapi):failure:-245:create qp code -1 -1 : -245
    Last System Error Message from Task 0:: File name too long
    [0] [MPI Abort by user] Aborting Program!

I compiled using gcc/g77. It took me a few tries to find the right combination of environment variables to even get it to compile (with InfiniBand).

Would someone provide some pointers for an InfiniBand / Linux cluster compile? I would greatly appreciate it.

Thanks,

Brian Barnes
Gelb Group
Washington University in St. Louis

Test job (single-point energy on caffeine):

    scratch_dir /scratch
    permanent_dir /scratch
    Title "big-e-nw"
    Start big-e-nw
    echo
    memory 200 mb
    charge 0
    geometry autosym units angstrom
     C   -0.662000    1.04900     -0.00200000
     N   -1.74100     0.197000    -0.00400000
     C   -1.52600    -1.15400     -0.00100000
     N   -0.232000   -1.67400     -0.00100000
     C    0.796000   -0.781000    -0.00000
     C    0.592000    0.542000    -0.00100000
     N    1.82900     1.12200     -0.00100000
     C    2.71200     0.0780000    0.00000
     N    2.04900    -1.01300      0.00000
     O   -0.829000    2.24900     -0.00000
     O   -2.45400    -1.93500      0.00100000
     C   -3.09300     0.769000    -0.00300000
     C    0.00400000 -3.11700      0.00500000
     C    2.16200     2.53500     -0.00100000
     H    3.81300     0.131000     0.00100000
     H   -3.90000     0.00500000  -0.0300000
     H   -3.24600     1.37600      0.919000
     H   -3.23200     1.41700     -0.899000
     H    1.08800    -3.36900     -0.00600000
     H   -0.438000   -3.57100      0.921000
     H   -0.456000   -3.58100     -0.898000
     H    3.26700     2.67800     -0.00100000
     H    1.74700     3.02300     -0.911000
     H    1.74700     3.02300      0.911000
    end
    basis "ao basis" spherical print
     H library "cc-pVDZ"
     O library "cc-pVDZ"
     C library "cc-pVDZ"
     N library "cc-pVDZ"
    END
    scf
     rhf
     nopen 0
    end
    mp2
     freeze core 14
    end
    task mp2 energy