Date: Thu, 18 Aug 2005 17:26:57 -0500 (CDT)
From: Lev Gelb
Subject: Problem with parallel plane-wave unrestricted DFT
To: nwchem-users@emsl.pnl.gov

Dear NWChem-users:

We've recently put nwchem4.7 on our new cluster (CentOS 4 on dual-Opteron nodes), and I'm seeing some very strange behavior on a plane-wave test, as follows (sorry about the length of this message).

The two systems are periodic cells with 9 and 10 lithium atoms, respectively; the actual input files follow below.

On a single processor, the 9-atom system converges nicely:

 iter.   Energy             DeltaE        DeltaRho
 ------------------------------------------------------
  - 15 steepest descent iterations performed
   10   -0.1511971625E+01  -0.21168E-01   0.13121E-04
  - 10 steepest descent iterations performed
   20   -0.1663140082E+01  -0.94371E-02   0.22799E-05
   30   -0.1854134208E+01  -0.51254E-02   0.91331E-05
  ...
  110   -0.1889763716E+01  -0.31983E-03   0.36069E-06
  ...
  140   -0.1896166323E+01  -0.75225E-04   0.79443E-07
  ...
  180   -0.1897817775E+01  -0.92757E-05   0.95982E-08
  *** tolerance ok.
      iteration terminated
 >>> ITERATION ENDED AT Thu Aug 18 16:08:42 2005 <<<

However, on 4 processors the 9-atom problem doesn't complete, because the DeltaRho quantity never goes down (it stays between roughly 1E-04 and 1E-03), even though the energy is converging nicely:

  - 15 steepest descent iterations performed
   10   -0.1511971625E+01  -0.21168E-01   0.16911E-03
  - 10 steepest descent iterations performed
   20   -0.1663140082E+01  -0.94371E-02   0.23853E-03
   30   -0.1854134208E+01  -0.51254E-02   0.55198E-03
   40   -0.1867532746E+01  -0.12517E-02   0.65500E-03
  ....
  250   -0.1898153594E+01  -0.46424E-05   0.97836E-03
 (stopped)

Likewise, again on 4 processors, running the 9-atom problem without "set nwpw:lcao_skip .true." gives a nonsensical DeltaRho column:

  - 15 steepest descent iterations performed
   10   -0.1874058999E+01  -0.12883E-02   0.54193E+02
  - 10 steepest descent iterations performed
   20   -0.1880692950E+01  -0.70479E-03   0.54193E+02
   30   -0.1883978550E+01  -0.30178E-03   0.54193E+02
   40   -0.1886287615E+01  -0.21862E-03   0.54193E+02
   50   -0.1887768161E+01  -0.93751E-04   0.54194E+02
   60   -0.1888553232E+01  -0.55774E-04   0.54194E+02
   70   -0.1888974033E+01  -0.28792E-04   0.54194E+02
   80   -0.1889236991E+01  -0.20856E-04   0.54194E+02

On the other hand, the 10-atom problem seems fine on four processors:

  - 15 steepest descent iterations performed
   10   -0.1714439401E+01  -0.22127E-01   0.15488E-04
  - 10 steepest descent iterations performed
   20   -0.1977436970E+01  -0.12150E+00   0.28492E-02
  - 10 steepest descent iterations performed
   30   -0.2068622199E+01  -0.29013E-02   0.13256E-04
   40   -0.2087048544E+01  -0.42730E-03   0.20320E-05
   50   -0.2089307306E+01  -0.14378E-03   0.44339E-06
   60   -0.2089992857E+01  -0.39936E-04   0.64539E-07
   70   -0.2090267137E+01  -0.19432E-04   0.32624E-07
   80   -0.2090382489E+01  -0.98696E-05   0.16586E-07
  *** tolerance ok. iteration terminated
 >>> ITERATION ENDED AT Thu Aug 18 16:04:09 2005 <<<

On two other systems, NWChem 4.6 can run similar benchmarks without a problem.
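Note that the serial run and the 4-processor run with lcao_skip report identical Energy and DeltaE values at iterations 10, 20 and 30; only the DeltaRho column differs. The logs are consistent with a stopping rule that needs both |DeltaE| < 1e-05 and DeltaRho < 1e-06 (the two values on the `tolerances` line), so a wrong DeltaRho alone would keep the run from terminating. Here is a minimal Python sketch of that check, assuming that two-criterion rule (inferred from the logs, not taken from the NWChem source); `parse_row` and `converged` are hypothetical helper names.

```python
# Hypothetical sketch: apply an assumed two-criterion stopping rule to
# logged "iter Energy DeltaE DeltaRho" rows. The rule itself is an
# assumption inferred from the logs, not NWChem's actual code.

TOLE = 1e-05  # first value on the `tolerances` line, assumed to bound |DeltaE|
TOLC = 1e-06  # second value, assumed to bound DeltaRho


def parse_row(line: str) -> tuple[int, float, float, float]:
    """Split one logged iteration row into (iter, energy, DeltaE, DeltaRho)."""
    it, energy, de, drho = line.split()
    return int(it), float(energy), float(de), float(drho)


def converged(de: float, drho: float, tole: float = TOLE, tolc: float = TOLC) -> bool:
    """Both criteria must hold; a converged energy alone is not enough."""
    return abs(de) < tole and drho < tolc


# Last row of each log shown in this message.
rows = {
    "9 atoms, 1 proc": "180 -0.1897817775E+01 -0.92757E-05 0.95982E-08",
    "9 atoms, 4 procs": "250 -0.1898153594E+01 -0.46424E-05 0.97836E-03",
    "10 atoms, 4 procs": "80 -0.2090382489E+01 -0.98696E-05 0.16586E-07",
}

for label, line in rows.items():
    it, energy, de, drho = parse_row(line)
    print(f"{label:18s} iter {it:4d}: converged={converged(de, drho)}")
```

Under this assumption the 4-processor 9-atom row fails only on DeltaRho, since its |DeltaE| (4.6E-06) is already below the energy tolerance.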
So could something be going wrong between the unrestricted code and the parallel library? This is a GNU-compiled build; we get the same results with and without BLAS. Any suggestions would be very much appreciated.

BTW, we're also working on an InfiniBand build and having a lot of problems with that; I'll post about it later.

Best regards,
Lev

Input files:

9-atom problem:
-------------------------------------------------------
title "cell1_n100-T1500K_00451_nwchem.inp"
scratch_dir /net/scratch/gelb/2
permanent_dir /net/scratch/gelb/2
start pysim_nwchem
memory 850 mb
geometry units angstrom noautosym noautoz print
  system crystal
    lat_a 13.2718075850
    lat_b 13.2718075850
    lat_c 13.2718075850
    alpha 90.0
    beta 90.0
    gamma 90.0
  end
  Li  0.1065709548 -0.3162622803  0.1720360425
  Li -0.2039190642 -0.1328766114 -0.4999692920
  Li  0.4651652484 -0.3838787967 -0.4607781700
  Li -0.2164949274  0.1305385194 -0.1735956870
  Li  0.4366161347 -0.0171857869 -0.0146632790
  Li -0.3205859621  0.0482712325  0.3610856152
  Li  0.0188304075  0.2934363976 -0.4892234583
  Li  0.3554614744  0.1466658958 -0.3815328541
  Li -0.1694203274 -0.4703112459  0.3991860417
end
nwpw
  pseudopotentials
    Li library library1
  end
  simulation_cell
    ngrid 26 26 26
  end
  xc pbe96
  ewald_ncut 3
  tolerances 1e-05 1e-06
end
set nwpw:psi_nolattice .true.
set nwpw:lcao_skip .true.
set nwpw:mimimizer 2
task pspw energy
------------------------------------------------------

And here's the 10-atom problem:
---------------------------------------------------------
title "cell1_n100-T1500K_00451_nwchem.inp"
scratch_dir /net/scratch/gelb/2
permanent_dir /net/scratch/gelb/2
start pysim_nwchem
memory 850 mb
geometry units angstrom noautosym noautoz print
  system crystal
    lat_a 13.2718075850
    lat_b 13.2718075850
    lat_c 13.2718075850
    alpha 90.0
    beta 90.0
    gamma 90.0
  end
  Li  0.1065709548 -0.3162622803  0.1720360425
  Li -0.2039190642 -0.1328766114 -0.4999692920
  Li  0.4651652484 -0.3838787967 -0.4607781700
  Li -0.2164949274  0.1305385194 -0.1735956870
  Li  0.4366161347 -0.0171857869 -0.0146632790
  Li -0.3205859621  0.0482712325  0.3610856152
  Li  0.0188304075  0.2934363976 -0.4892234583
  Li  0.3554614744  0.1466658958 -0.3815328541
  Li -0.1694203274 -0.4703112459  0.3991860417
  Li -0.3776838877 -0.2204119578  0.0500355936
end
nwpw
  pseudopotentials
    Li library library1
  end
  simulation_cell
    ngrid 26 26 26
  end
  xc pbe96
  ewald_ncut 3
  tolerances 1e-05 1e-06
end
set nwpw:psi_nolattice .true.
set nwpw:lcao_skip .true.
set nwpw:mimimizer 2
task pspw energy
---------------------------------------------------
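To see what the geometry block of the 9-atom input describes in real space, here is a minimal Python sketch. It assumes that atom coordinates inside a `system crystal` geometry are read as fractional (reduced) coordinates of the cubic cell with lat_a = lat_b = lat_c = 13.2718075850 angstrom; `to_cartesian` and `min_image_distance` are hypothetical helper names, not NWChem functions.

```python
# Hypothetical sketch: convert the fractional Li coordinates of the 9-atom
# input to Cartesian angstroms and find the shortest Li-Li distance under
# periodic boundary conditions. Assumes the crystal coordinates are
# fractional and the cell is cubic (alpha = beta = gamma = 90 degrees).
import itertools
import math

A = 13.2718075850  # cubic lattice constant in angstrom (lat_a in the input)

# Fractional coordinates copied from the 9-atom geometry block.
FRAC = [
    (0.1065709548, -0.3162622803, 0.1720360425),
    (-0.2039190642, -0.1328766114, -0.4999692920),
    (0.4651652484, -0.3838787967, -0.4607781700),
    (-0.2164949274, 0.1305385194, -0.1735956870),
    (0.4366161347, -0.0171857869, -0.0146632790),
    (-0.3205859621, 0.0482712325, 0.3610856152),
    (0.0188304075, 0.2934363976, -0.4892234583),
    (0.3554614744, 0.1466658958, -0.3815328541),
    (-0.1694203274, -0.4703112459, 0.3991860417),
]


def to_cartesian(frac, a=A):
    """For a cubic cell, Cartesian = a * fractional, component-wise."""
    return tuple(a * f for f in frac)


def min_image_distance(f1, f2, a=A):
    """Distance between two atoms using the minimum-image convention."""
    d = [x - y for x, y in zip(f1, f2)]
    d = [x - round(x) for x in d]  # wrap each component into [-0.5, 0.5]
    return a * math.sqrt(sum(x * x for x in d))


nearest = min(min_image_distance(p, q) for p, q in itertools.combinations(FRAC, 2))
print(f"shortest Li-Li distance: {nearest:.3f} angstrom")
print("first atom (angstrom):", to_cartesian(FRAC[0]))
```

The rounding-based wrap is only valid for an orthogonal cell like this one; a non-cubic cell would need the full lattice-vector matrix.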