AMBER Version 5 Volume 1 Amber 5 is a collaborative effort of the research groups of Peter Kollman (UCSF), David Case (Scripps), Ken Merz(Penn State), Tom Darden (NIEHS), and Dave Fergunson (Minnesota). The authors are: David A. Case (The Scripps Research Institute) David A. Pearlman (Vertex Pharmaceuticals) James W. Caldwell (UCSF) Thomas E. Cheatham III (UCSF,NIH) Wilson S. Ross (UCSF) Carlos Simmerling (UCSF) Tom Darden (NIEHS) Kenneth M. Merz (Penn State) Robert V. Stanton (UCSF) Ailan Cheng (Penn State) James J. Vincent(Penn State) Mike Crowley (PSC) David M. Ferguson (University of Minnesota) Randall Radmer (UCSF) George L. Seibel (for contributions to Amber version 3A while at UCSF) U. Chandra Singh (for contributions to Amber versions 2 and 3 while at UCSF) Paul Weiner (for contributions to Amber version 1 while at UCSF) Peter A. Kollman (UCSF) All contents Copyright (c) 1986, 1991, 1995, 1997, University of California. All Rights Reserved. Acknowledgements We acknowledge the generous cooperation of Wilfred van Gun- steren, whose molecular dynamics code was used as the basis of the md modules in version 2.0. We are also pleased to acknowl- edge Rad Olson and Bill Swope at IBM Almaden Center, whose con- tributions were instrumental in developing the better vector optimized non-bonded routines first released in version 3, revision A. Research support from DARPA, the NIH, and the NSF for Peter Kollman is gratefully acknowledged, as is support from the NIH for David Case. Use of the facilities of the UCSF Computer Graphics Laboratory (Thomas Ferrin, PI) is appreci- ated. We thank Nelson H.F. Beebe of the University of Utah for permission to include his ``portable namelist'' code. Wendy Cornell contributed a discussion of charge derivation to the manual and added demos and documentation for the RESP program. We also thank Allison Howard and Valerie Daggett for various helpful discussions and suggestions. The pseudocontact shift code was provided by Ivano Bertini of the University of Flo- rence. Many people helped add features to various codes; these contributions are described in the documentation for the indi- vidual programs. Jed Pitera, in particular, made a large con- tribution to the documentaion for PROFEC in Version 5. Recommended Citations: When citing Amber Version 5 in the literature, the following citation should be used: D.A. Case, D.A. Pearlman, J.W. Caldwell, T.E. Cheatham III, W.S. Ross, C.L. Simmerling, T.A. Darden, K.M. Merz, R.V. Stanton, A.L. Cheng, J.J. Vincent, M. Crowley, D.M. Ferguson, R.J. Radmer, G.L. Seibel, U.C. Singh, P.K. Weiner and P.A. Kollman (1997), AMBER 5, University of California, San Francisco. The basic description of the methods incorporated in Amber is in: D.A. Pearlman, D.A. Case, J.W. Caldwell, W.S. Ross, T.E. Cheatham, III, S. DeBolt, D. Ferguson, G. Seibel, and P. Kollman. AMBER, a package of computer programs for apply- ing molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comp. Phys. Commun. 91, 1-41 (1995). 1. Introduction. Amber is the collective name for a suite of programs that allow users to carry out molecular dynamics simulations, par- ticularly on biomolecules. None of the individual programs carries this name, but the various parts work reasonably well together, and provide a powerful framework for many common cal- culations. The term amber is also sometimes used to refer to the empirical force field that is implemented here. It should be recognized however, that the code and force field are sepa- rate: several other computer packages have implemented the amber force field, and other force fields can be implemented with the amber programs. Further, the force field is in the public domain, whereas the codes are distributed under a license agreement. Amber 5 (1997) represents a significant change from the most recent previous version, 4.1, which was released in 1995. Briefly, the major differences include: (1) an updated and parallelized implementation of the parti- cle-mesh Ewald routine, and its incorporation into the free energy module; (2) "locally-enhanced sampling" (LES) code that allows parts of the system to be present as multiple copies; (3) an alternate version of Sander (ROAR) that includes the ability to define part of the system as a quantum- mechanical section (QM/MM), and includes alternate inte- grators; (4) PROFEC (Pictorial Representation of Free Energy Changes), a set of tools for carrying out and displaying extrapolative free energy changes; (5) new and parallelized methods for NMR refinement; incor- poration of penalites based on pseudocontact shifts. (6) updates to the functionality and stability of LEaP. 1.1. What to read next. If you are installing this package or want to redimension the code, see installation section of this manual. New users should continue with this introduction section, and will also find the tutorial section useful. The directories under amber5/demo contain a number of systems that may serve as exam- ples. You should familiarize yourself with the files in the database directory, amber5/dat, by looking over the database section of the manual. There is also lots of information, tips, and examples on the Amber Web pages at http://www.amber.ucsf.edu; a portion of this is included in the distribution tape: point your browser to amber5/Web/index.html. Although Amber may appear dauntingly complex at first, it has become easier to use over the past few years, and overall is reasonably straightforward once you understand the basic archi- tecture and option choices. Hundreds of people have learned to use Amber; don't be easily discouraged. 1.2. Information flow in Amber. Understanding where to begin in Amber is primarily a prob- lem of managing the flow of information in this package. You first need to understand what information is needed by the energy programs (gibbs, sander, spasms, roar and nmode). You need to know where it comes from, and how it gets into the form that the energy programs need. This section is meant to orient the new user, and is not a substitute for the individual pro- gram documentation. Information all the energy programs need: (1) Cartesian coordinates for each atom in the system. (2) "Topology": connectivity, atom names, atom types, residue names, and charges. (3) Force field: Parameters for all of the bonds, angles, dihedrals, and atom types in the system. (4) Commands: The user specifies the procedural options and state parameters desired. This information is provided to the energy programs in three files: One contains the coordinates; the second contains the topology and parameters, and is called the "topology file"; the command or "input" file is the third file. Additional files may needed for special options specified in the command file. Cartesian coordinates usually come from Xray crystallogra- phy, NMR spectroscopy, or model-building. They should be in Brookhaven Protein Databank (PDB) format. The program LEaP provides a platform for carrying out many of these modeling tasks, but users may wish to consider other programs as well. Topology comes from the database: The database is found in the amber5/dat directory. It is called db94.dat, and is described in Chapter 3. It contains topology for the standard amino acids as well as N and C-terminal charged amino acids, DNA, and RNA. The database contains default internal coordi- nates for these monomer units, but coordinate information is usually obtained from PDB files. Topology information for other molecules (not found in the standard database) is kept in user-generated "residue files". The basic force field parameters are also found in the amber5/dat directory; the database section of the manual con- tains some detailed descriptions of various force field options. This file may be used "as is" for proteins and nucleic acids, or users may prepare their own files that con- tain modifications to the "standard" force fields. data- seq pdb force base field | | | | | | | | +++++++++++++++++++++++++++++++++++++++++++++++++ + | | | | prmtop + prep db94.datlink lnkbin--edit edtbin--parm -+-----nmode + | prmcrd | +++++++++++++++++++++++++++++++++++++++++++++++++ | (also handled by LEaP) prmtop | prmcrd | | nmanal, con- sander, lmanal straints gibbs, spasms | | | carnal | anal mdanal Basic information flow in AMBER _______________________________________________________________ 1.3. Preparatory programs. LEaP is the primary program to create a new system in Amber, or to modify old systems. It combines the functionality of prep, link, edit, and parm outlined below. These latter programs are retained primarily for backward compatibility with older versions of Amber. PREP creates or adds to a residue database from the appropriate topology/parameter information. Required for residues not already defined in the standard AMBER database. (As sup- plied, the standard AMBER database contains definitions for the 20 standard amino acids, nucleic acids with the five standard bases, and a few other units). LINK deals only with topology. You tell LINK the residue sequence of your molecule (even if there is only one residue). LINK will extract the topology information for each residue from the standard AMBER database or, option- ally, from the residue database files created with PREP. The topology for each residue will be linked together to form the topology for the system. This is written to a binary file (default name = lnkbin) that is read by EDIT. EDIT deals mainly with coordinates. One of the primary pur- poses of EDIT is to read PDB coordinates and apply them to the system generated by LINK. Coordinates for atoms that are missing from the PDB file (usually hydrogens) will in most cases be generated automatically by EDIT, using the stored internal coordinates in the link binary file link.bin. EDIT can be used to solvate a molecule in water, to add counterions to nucleic acid systems, or to alter coordinates in specific ways. PARM will determine which bonds, angles, dihedrals, and atom types exist in the system, and extract the appropriate parameters for them from the force field file. PARM then writes the final coordinate and topology files needed by all other AMBER programs. used to add simple non-varying internal coordinate restraints to the system, and to cre- ate a two-state topology file for use in GIBBS free energy perturbation calculations. PARM outputs two files which are used subsequently: The topology file (default name = prmtop); and the coordinate file (default name = prmcrd). PROTONATE This program will add hydrogens in appropriate locations to peptides and proteins that lack them. It can also check the suitability of protons that are already present, and convert from one naming system to another (e.g. from IUPAC-IUB recommendations to Brookhaven format.) 1.4. Energy programs. SANDER is the basic energy minimizer and molecular dynamics pro- gram. This program relaxes the structure by iteratively moving the atoms down the energy gradient until a suffi- ciently low average gradient is obtained. Structures should usually be minimized before molecular dynamics sim- ulation. The molecular dynamics portion generates config- urations of the system by integrating Newtonian equations of motion. MD will sample more configurational space than MIN, and will allow the structure to cross over small potential energy barriers. For complicated systems MD is usually able to locate lower energy conformations than simple energy minimization. Configurations may be saved at regular intervals during the simulation for later anal- ysis. More elaborate conformational searching and modeling MD studies can also be carried out using the SANDER module. This allows a variety of constraints to be added to the basic force field, and has been designed especially for the types of calculations involved in NMR structure refinement. GIBBS is the free energy perturbation program. It is similar to SANDER, but uses the ensemble of generated configurations to calculate the free energy difference between two simi- lar states through either a perturbation or thermodynamic integration approach. The two states are defined by the user in LEaP or PARM. NMODE is both a quasi-Newton Raphson second derivative energy minimizer and vibrational analysis program. The NMODE minimizer is capable of obtaining extremely low energy gradients. NMODE can calculate the normal modes of the system as well as numerous thermochemical properties. Other features include the ability to compute "Langevin modes" (normal modes including viscous coupling to a con- tinuum solvent,) and techniques to find transitions states as well as minima. ROAR is a "Penn State" version of sander, that incoporates a variety of features not found in sander itself. The most notable change is the incorporation of the ability to define a part of the system quantum-mechanically, allowing it to interact with other parts of the system that are defined in a molecular mechanics sense. Other features of ROAR include implementation of a Nose-Hoover-chain MD integrator, Ewald summations, and multiple-time-scale integration routines. 1.5. Analysis programs. ANAL is for the analysis of structure and especially molecular mechanical energy of a single configuration of a system. It can be run on structures both before and after modifi- cation by the energy programs. Running ANAL on the ini- tial configuration of your system is a good way to locate errors in the structure that result in large energies. Anal can also be used for more sophisticated analyses of energy and structure. CARNAL is a molecular dynamics analysis program. It is used for geometrical measurements, root mean square coordinate fit- ting, trajectory averaging, and other structural analyses of MD trajectories. CARNAL executes a programming lan- guage for filtering, measuring and comparing multiple streams of coordinate files (the language contains 44 key- words and uses 10 punctuation/logical characters). As an example, one can use it to build a trajectory in which the solute is positioned for minimum root mean square fit of residues in the active site and only the first shell of waters is included. RDPARM is a general purpose utility that can examine and modify prmtop files created by LEaP or PARM. It can also process trajectory files created from MD simulations, carrying out superpositions, extractions of coordinates, etc. NMANAL/LMANAL computes atomic fluctuations and various correlation func- tions from normal modes. 2. Installation of Amber 5 (1) Compile the basic AMBER distribution: a. go to the src directory below this one. b. create a link to the appropriate Machine file, e.g. ln -s -f Machine/Machine.hp MACHINE for HP, or ln -s -f Machine/Machine.sgi MACHINE for SGI, etc. c. ./Makeall >& make.errs & look at the make.errs file for programs that didn't get made; loader warnings (especially on SGI -- see the Machine.sgi file) can generally be ignored; compiler warnings should be considered, but most are innocuous. (2) Compile LEaP: a. go to the leap directory below this one. b. xmkmf; make World; make install.leap; [Note: on some Digital Unix machines, the second step above may need to be: make "DNETLIB=-ldnet_stub" World ] c. Set your LEAPROOT environment variable using the leapSetup.sh or leapSetup.csh scripts in amber5/Leap (depending on whether you use the Boune or C shells). (3) Compile ROAR: a. go to the roar/source directory below this one. b. edit the Makefile, to uncomment the lines appropriate for your machine. c. make install (4) Make the database files: a. go to the dat directory below this one. b. make (5) Make Interface (optional): a. go to the interface directory below this one. b. source install_int; source install_ambint. (6) Test the basic AMBER distribution a. go to the test directory below this one. b. ./Run.tests >& tests.errs & Examine the tests.errs file: where "possible FAILURE" messages are found, go to the indicated directory under amber5/test, and look at the *.dif files. Differences should involve round-off in the final digit printed, or occasional messages that differ from machine to machine. The "standards" to which your output will be compared are contained in the demo dirctory. The "default" values were run at UCSF on an HP-735 workstation. You should expect to see slighly different detailed MD trajectories on other machines, especially in gibbs, although the "statistics" (which are the real answers) should be very close to our results. The transition-state theory test may give a large number of differences -- see the comments in the output of that test to help interpret it. c. go to the leap/test directory below this one d. ./Run >& tests.errs & Again, examine the tests.errs file for indications of more than round-off error differences. e. There is no automated test for gibbs with PME; if you are interested in this option, please look at the files in amber5/demo/sodium_gibbs_pme. f. There is no automated test for addles; if you are interested in this option, please look at the files in amber5/demo/addles. 2.1. Installing by hand. If your system is not included among the configuration files supplied, or if you want to alter the existing file or are curious how these files hide machine dependencies, see the file amber5/src/Machine/0README. If you are developing or changing the MACHINE file, you may want to go more slowly, com- piling and testing. You can type make or make install in any src/ directory to make the program(s) in that directory, e.g. when redimensioning arrays or otherwise modifying the code. make clean will cause all .o files to be removed; otherwise they stay around using significant space to conserve recompila- tion time. 2.2. Testing. We have installed and tested AMBER 5 on a number of machines, using Cray, IBM, Sun, Hewlett-Packard, DEC, Convex, and Silicon Graphics hardware. However, owing to time and access limitations, not all combinations of code, compilers, and operating systems have been tested. Therefore we recommend running the test suites. The distribution contains a validation suite that can be used to help verify correctness. The nature of molecular dynamics, and to a lesser extent molecular mechanics, is such that the course of the calculation is very dependent on the order of arithmetical operations and the machine arithmetic implementation, i.e. the method used for roundoff. Because each step of the calculation depends on the results of the pre- vious step, the slightest difference will eventually lead to a divergence in trajectories. As an initially identical dynamics run progresses on two different machines, the trajectories will eventually become completely uncorrelated. Neither of them are ``wrong;'' they are just exploring different regions of phase space. Hence, states at the end of long simulations are not very useful for verifying correctness. Averages are meaning- ful, provided that normal statistical fluctuations are taken into account. ``Different machines'' in this context means any difference in floating point hardware, word size, or rounding modes, as well as any differences in compilers or libraries. Differences in the order of arithmetic operations will affect roundoff behavior; (a + b) + c is not necessarily the same as a + (b + c). Different optimization levels will among other things affect operation order, and may therefore affect the course of the calculations. When comparing the output from two different machines for purposes of verification, it is very important that identical input files be used to generate both sets of output. The vali- dation suite uses matched inputs and outputs in the amber5/demo/ tree, which is set to read-only to help you avoid overwriting them with files created on your machine. Testing takes place in the amber5/test/ tree. All initial values reported as integers should be identi- cal. The energies and temperatures on the first cycle should be identical. The RMS and MAX gradients reported in sander are often more precision sensitive than the energies, and may vary by 1 in the last figure on some machines. As is the case with sander, the trajectory in a Gibbs simulation will diverge, but the resulting free energy should not if the simulation is run to convergence (this is not done because of the time involved). In minimization and dynamics calculations, it is not unusual to to see small divergences in behavior after as little as 1-200 cycles. In general, compiler and optimizer errors are fairly obvi- ous, and result in rather large changes in the output, if you get any output at all. See test/0README for examples of acceptable output differences and discussion of peculiarities of various machines. 2.3. Memory Requirements. The AMBER 5 programs as distributed are dimensioned for a fairly large system (about 10K atoms), and you may want to change their dimensions to be more appropriate for the machine you are using if you are running in a tight memory environment. See src/0README for information on resizing. Some programs use local scripts called resize.csh; this script uses the stream editor sed, and employs regular expression matching to cor- rectly redimension the code regardless of what its dimensions are currently set to, even if the same parameter has been inad- vertently set to different values in different modules. Here we describe dimensioning of sander as an example. Sander can also be compiled the MEM_ALLOC switch set in the MACHINE file. This causes arrays to be allocated dynamically, based on the size of the molecule. If this works on your machine, it can eliminate most of the resizing effort described below. In src/sander/sizes.h, you will find the following parame- ters that might need to be changed: parameter (MAXINT=1300000) parameter (MAXPR=2500000) parameter (MAXREA=340000) parameter (MAXHOL=200000) parameter (MAXDUP=5000) The actual memory requirements for a particular job can be determined from the output of SANDER or GIBBS. An annotated example follows: +-------------------------------------------------------------+ | | | 1. R E S O U R C E U S E: | | | | | | Memory Use Allocated Used | | Real 340000 19109 <-- minimum MAXREA | | Integer 1300000 29904 <-- minimum MAXINT | | | | Max Nonbonded Pairs: 1270096 packed 1 to a machine word | | ^ | | "NWDVAR" | | Duplicated 26 dihedrals | | | | Duplicated 94 dihedrals <-- minimum MAXDUP | | . | | . | | . | | | | NB-UPDATE: NPAIRS = 150395 HBPAIR = 2804 | | | +-------------------------------------------------------------+ As is shown above, MAXREA must be at least as large as the number of Real words used. It can be read directly off the output. MAXDUP must be at least as large as the larger of the two 'duplicated' values given, in this case 94. The actual amount of memory controlled by MAXDUP is 10 times its value. MAXINT is slightly more complicated, since it depends on the type of pairlist packing used. The number of NB pair pointers packed in a native integer word, NWDVAR, is printed in the out- put as shown. It will be 1 or 2 on byte-oriented machines, 1, 2, or 4 on 64 bit machines like Cray or FPS264. The minimum value of MAXINT is determined by the sum of the static integer requirement given in the output. For gibbs, MAXINT also includes the requirement for the pairlist, while in sander the pairlist size is determined by MAXPR. The pairlist requirement is the total number of nonbonded pairs, NPAIRS, divided by NWD- VAR. Because the number of pairs may grow or shrink during a run, you should include a safety factor of 5-10% extra for NPAIRS. The algorithm to determine the MAXPR (sander) or MAX- INT pairlist component is thus: (NPAIRS/NWDVAR) * 1.1 More detailed documentation on memory use and packing configu- ration is found in sander/sizes.h. For gibbs, the variables which define memory allocation are MAXREA, MAXINT, and MAXCHR. MAXREA and MAXINT can be set as described above. MAXCHR allo- cates character storage, is typically small, and scales linearly with the number of atoms. These parameters are scaled to a single parameter, memmax.