From: Francesco Pietra
Date: Sat, 2 Jun 2007 09:18:58 -0700 (PDT)
Subject: Fwd: RE: [NWCHEM] Failure in restart
To: users nwchem

Please disregard the original notes below. The failure was due to launching the "restart" in serial mode, while the available vectors had been obtained from a computation in parallel mode.

That said, and with apologies, two related questions:

1) Should a restarted DFT/Geom (like the present one) die, or be deliberately killed, could it be restarted again?
If so, does the second restarted procedure make use of the computation done during the first restart, or does it start again from the original vectors?

2) This restarted procedure makes heavier use of the swap than the original procedure (of the 97GB available, the original computation used no more than 70%, while the restarted procedure uses 90-100%, according to "df -h", with a few MB free). Therefore, what happens if the swap becomes insufficient? For the moment it goes on expeditiously, the bad situation of the swap notwithstanding (I have in mind to add a second RAID 1 array of HDs, just for the swap; unfortunately the nvidias do not like cheap HDs, which results in blurring the memory).

Thanks
francesco pietra

--- Francesco Pietra wrote:

> Date: Sat, 2 Jun 2007 00:03:47 -0700 (PDT)
> From: Francesco Pietra
> Subject: RE: [NWCHEM] Failure in restart
> To: "DeJong, Wibe A", users nwchem
>
> Bert:
> Unfortunately, it did not work.
>
> I have also tried changing the Title from
> "eqax77834.m052x-1" to "eqax77834m052x-1-restarted"
> (in any case starting from a fresh copy of the whole
> directory from the original calculation).
>
> The restart procedure began by using the old vectors,
> though top (i option) showed that only one processor
> was being used. That situation lasted for a long time,
> then I left it running overnight, finding in the morning
> on the screen:
>
> xlm_ao_mo: tmp1 806484
> Current input line
> 17: task dft optimize
>
> see manual for more information
>
> For further details see manual section
> 0:0:xlm_ao_mo: tmp1::806484
> Last System Error Message from Task 0:
> Inappropriate ioctl for device
> ARMCI aborting 886484 (0xc4c84)
>
> Nothing new was written (as far as I could see) in the
> directory for this calculation.
>
> Recently I carried out successfully the same
> calculation for a diastereomer of the present molecule.
>
> I have now tried a RHF/Geometry/6-31G* for
> cyclopentane, launched from ECCE to the remote machine
> where the above calculations have been carried out. It
> was performed OK, all nodes were involved (top, i option).
>
> The only solution I can see is to take the last
> Cartesian coordinates from the original nwch.nwout,
> stopped at 50 iterations, and start a new DFT to let nwchem
> make a new guess Hessian. I'll do nothing, however,
> until you have the opportunity to read these notes.
>
> Thanks
>
> francesco
>
> --- "DeJong, Wibe A" wrote:
>
> > Francesco,
> >
> > You specified 50 steps in the geometry optimization
> > and that's why it stopped in the first run. Note, not just the Gmax
> > and Grms need to be converged, the Xmax and Xrms need to be within the
> > threshold too.
> >
> > As to your restart. Try adding back in the following
> > block before the task line
> >
> > dft
> > mult 1
> > XC m05-2x
> > iterations 100
> > mulliken
> > end
> >
> > Thanks,
> >
> > Bert de Jong
> > NWChem developer
> >
> > -----Original Message-----
> > From: owner-nwchem-developers@emsl.pnl.gov
> > [mailto:owner-nwchem-developers@emsl.pnl.gov] On Behalf Of Francesco Pietra
> > Sent: Wednesday, May 30, 2007 11:57 PM
> > To: users nwchem
> > Subject: [NWCHEM] Failure in restart
> >
> > I wonder whether I made some silly error.
> >
> > 1) I submitted a dft/geometry according to attached
> > file dft_input (made shorter by removing most coordinates for the
> > 98-atom molecule).
> >
> > The calculation on 4 nodes 4GB/node went on regularly.
> > At iteration 50, Gmax and Grms were already OK
> > (default convergence criteria). However, it crashed at this stage. The
> > tail of nwch.nwout is attached (dft_tail).
> >
> > The calculation was restarted by editing nwch.nw
> > and, sitting in the dir for all output files for this calculation,
> > launching:
> >
> > $ /home/francesco/nwchem50/bin/nwchem nwch.nw
> >
> > where the edited input nwch.nw is attached
> > (dft_input_restart).
> >
> > It aborted soon with error:
> >
> > grid_nbfm: silly accgauss
> >
> > current input line:
> > 7: task optimize
> >
> > This error has not yet been assigned.
> >
> > 0:0: grid_nbfm: silly accgauss:: 0
> >
> > Last System error message from task 0: Inappropriate ioctl for device
> > ARMCI aborting 0.
> >
> > The existing nwch.nwout from original calculation has not been
> > rewritten. At any event, before setting up the
> > restart procedure, I had saved on external HD a copy of the whole directory
> > for this calculation.
> >
> > Hope to have made the issue clear.
> >
> > Thanks
> > francesco pietra
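
For reference, the restart advice quoted above could be put together roughly as follows. This is only a sketch and not the attached dft_input_restart: the file prefix "nwch" is assumed from the input name nwch.nw, the title is the one mentioned in the thread, and the driver block raising the step limit is an assumption added because the first run stopped at its 50-step limit. The geometry and basis are taken to come from the existing runtime database, so they are not repeated here.

```
# Hypothetical restart input; the prefix "nwch" is assumed from nwch.nw
restart nwch
title "eqax77834m052x-1-restarted"

# DFT block as suggested by Bert de Jong in the quoted reply
dft
  mult 1
  xc m05-2x
  iterations 100
  mulliken
end

# Assumption: allow more geometry steps than the 50 of the first run
driver
  maxiter 100
end

task dft optimize
```

As noted at the top of this message, a restart of this kind should be launched in the same (parallel) mode that produced the saved vectors.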