1. An Amber Tutorial. AMBER is a suite of programs for use in molecular modeling and molecular simulations. It consists of a substructure database, a force field parameter file, and a variety of useful programs. Here we give some commented sample runs to provide an overview of how things are carried out. The examples do not use the interface programs, and only a cover a fraction of the things that it is possible to do with AMBER. The formats of the example files shown are described in detail later in the man- ual, in the chapters pertaining to the programs. A separate introduction to LEaP is given in Chapter 5. Additional tutorial examples are available on the Amber web page, at http://www.amber.ucsf.edu; a portion of this page is included in the distribution tape: point your browser to amber5/Web/index.html. _______________________________________________________________ 1.1. Example 1. Minimization of BPTI in vacuum. _______________________________________________________________ Step 1. Generate some starting coordinates. The first step is to obtain starting coordinates. We begin with the file 6pti.pdb, exactly as distributed by the Protein Data Bank and Brookhaven. This file (as with most Brookhaven files) needs some editing before it can be used by Amber. First, alternate conformations are provided for residue 50, so we need to figure out which one we want. For this exam- ple, we choose the "A" conformation, and manually edit the file to remove the alternate conformers. Second, coordinates are provided for a phosphate group and a variety of water molecules. These are not needed for the calculation we are pursuing here, so we also edit the file to remove these. Third, the cysteine residues are involved in disulfide bonds, and need to have their residue names changed from CYS to CYX to reflect this. Let's call this modified file 6pti.mod.pdb. Finally, hydrogen positions are not included, so we run the Amber program protonate to provide these: protonate -d amber41/dat/PROTON_INFO < 6pti.mod.pdb > 6pti.H.pdb In other situations, many different programs could be used to generate starting coordinates, but the basic ideas are the same: somehow generate what you want in a "pdb" format, then run the result through protonate. We recommend doing the last step even if protons are present, since protonate performs a number of checks on the correctness and naming of hydrogen atoms. Step 2. Run LEaP to generate the parameter and topology file. This is a fairly straightforward exercise in loading in the pdb file, adding the disulfide cross links, and saving the resulting files. Type the following command in either tleap or xleap: bpti = loadPdb 6pti.H.pdb bond bpti.5.SG bpti.55.SG bond bpti.14.SG bpti.38.SG bond bpti 30.SG bpti.51.SG saveAmberParm bpti prmtop prmcrd quit Step 3. Perform some minimization. Use this script: +------------------------------------------------------------------------+ | Running minimization for BPTI | +------------------------------------------------------------------------+ |cat << eof > min.in | |# 200 steps of minimization, distance-dependent dielectric | | &cntrl | | maxcyc=200, imin=1, cut=12.0, nsnb=20, idiel=0, scee=2.0, ntpr=10, | | &end | |eof | |sander -i min.in -o 6pti.min1.out -c prmcrd -r 6pti.min1.xyz | |/bin/rm min.in | +------------------------------------------------------------------------+ This will perform minimization (imin) for 200 steps (max- cyc), using a nonbonded cutoff of 12 `angstroms' (cut) and a distance-dependent dielectric constant (idiel). The list of non-bonded pairs will be updated every 20 steps (nsnb), and intermediate results will be printed every 10 steps (ntpr). Text output will go to file 6pti.min1.out, and the final coor- dinates to file 6pti.min1.xyz. The rest of this section documents using older AMBER mod- ules to carry out the equilvalent of Step 2, above. These are included for completeness, but we encourage people not to con- tinue doing things this way. Step 2a. Run LINK to establish the topology. The following script will accomplish this by creating an input file and running LINK with a prep database: +-------------------------------------------------------------------------------+ | Running link for BPTI | +-------------------------------------------------------------------------------+ | | |cat <lnkin | | bpti | | | | | |DU | | 0 0 0 0 0 | | bpti | |P 1 0 1 3 1 | |ARG 2PRO ASP PHE CYX LEU GLU PRO PRO TYR THR GLY | |PRO CYX LYS ALA ARG ILE ILE ARG TYR PHE TYR ASN | |ALA LYS ALA GLY LEU CYX GLN THR PHE VAL TYR GLY | |GLY CYX ARG ALA LYS ARG ASN ASN PHE LYS SER ALA | |GLU ASP CYX MET ARG THR CYX GLY GLY ALA | | | | 5 55SG SG 0 | | 14 38SG SG 0 | | 30 51SG SG 0 | | | |QUIT | |eof | |# | |link -i lnkin -o lnkout -p $AMHOME/dat/db4.dat | |/bin/rm lnkin | +-------------------------------------------------------------------------------+ You should interpret the file given above using the input description for link. Basically, the first seven lines contain operation flags, many of which are almost always the same. The next four lines give the amino acid sequence, then come lines that establish cross-links (disulfide bonds) between residues 5-55, 14-38 and 30-51. The UNIX AMHOME variable should be set to the location of Amber on your system, and your PATH variable should allow the program link to be found. The above script will create a text output file lnkout (which you should read), and a binary file lnkbin, which will be used as input to the next step. This step informs AMBER of the "topology" of the system: what all the atoms are called, what their "types" are (needed to set up a force field), and where all the bonds are. All this information was assembled from the sequence and the infor- mation about amino acids that is contained in the db4.dat file. Step 2b. Run EDIT to insert the starting coordinates. The following script will accomplish this: +-----------------------------------------------------------+ | Running edit for BPTI | +-----------------------------------------------------------+ |cat < edtin | |bpti, 5pti structure | | 0 0 0 0 | |XYZ | |OMIT | |XRAY | | 0 0 0 0 0 | |QUIT | |eof | |# | |edit -i edtin -o edtout -pi 6pti.H.pdb | |/bin/rm edtin | +-----------------------------------------------------------+ The XRAY option reads in a Brookhaven format file and com- pares the atoms in that file to what Amber expects to see; when it finds matches it inserts the proper coordinates into the system, and it reports errors when it fails to find matches. In this case, all the atoms are present, and no warning mes- sages should be obtained. The output from the above script will be a text file called edtout and a binary file edtbin, which will be used in the next step. Step 2c. Run PARM to connect the force field to the protein. This is done by this simple script: +-----------------------------------------------------------+ | Running parm for BPTI | +-----------------------------------------------------------+ |cat <prmin | | name of system | |BIN FOR STDA | | | | 0 | | | |eof | |# | |parm -i prmin -o prmout -f $AMHOME/dat/parm91.dat | |/bin/rm prmin | +-----------------------------------------------------------+ In this step, the parm program looks through the molecular information in edtbin and determines all the types of "parame- ters" (force constants, bond lengths, non-bonded sizes, etc.) that are necessary to calculate the energy of BPTI. It then searches through the parm91.dat file to find the parameters. The program will complain if something is missing, but this is just a standard protein, and everything is in place. The out- put is a text file prmout, which you should read, and data files prmtop and prmcrd that will be used in the next step. The prmtop and prmcrd files are ascii files, so can be moved easily from one machine to another. (It is common to run link, edit, and parm on a workstation, then transfer the prmtop and prmcrd files to a more powerful computer for minimization and dynamics.) The prmtop file contains all the information needed to compute the energy of a molecule, and prmcrd contains the coordinates (in this case, the starting coordinates.) This division makes sense since minimization or dynamics will change the coordinates but not the make-up of the molecule. You are now ready to go back to Step 3, above. _______________________________________________________________ 1.2. Example 2. Peptide with a non-standard residue. _______________________________________________________________ As a second example, suppose you want to minimize an enzyme - substrate complex, and that you have a standard PDB file with coordinates for the enzyme and substrate, which we will call 'model.pdb'. Such a file might come from X-ray crys- tallography or model building. PDB files generally don't con- tain connectivity information, so this must be provided. In addition, each atom of the system must be assigned the appro- priate AMBER atom type so the correct force field parameters will be applied. For the amino acid residues of the protein, this connectivity and atom type information already exists, keyed by residue name in the AMBER database, and need not be specified. However, the substrate will require that you input connectivity and atom type information. This is done using the program PREP. The input for prep consists of only one file, in this case subprep.in. Running PREP will give you two output files. One of the files, sub.res, is a residue topology file for your substrate. It will have the same format as the amino acid residues in the standard data- base. The other file, prep.out, is a list of diagnostic information. Amber programs are usually run through the use of command files. In the following examples, one will see that the con- trol file for each program is named filename.in. The output file containing user information and diagnostics is called filename.out. The binary topology files that are passed from module to module are called filename.bin, where filename is the name of the module that created it. Residue files from prep have names ending in ".res", pdb files have names ending in ".pdb", and coordinate files created by AMBER are usually named name.crd. It is not essential that these naming conventions be adhered to but it will facilitate communication with other AMBER users. The following two files are the command and input files that create the substrate residue file using PREP. +------------------------------------------------------------+ | Running PREP | +------------------------------------------------------------+ |Unix: | |~amber41/exe/prep -i subprep.in -o prep.out | | | |VMS: | |$ set default [yourdir.tet] | |$ assign subprep.in for005 | |$ assign prep.out for006 | |$ run [amber41.exe]prep | | | |Here is the "subprep.in" file; strip comments before using. | | | |0 0 1 !control for database generation | | !blank card | |substrate !title | |sub.res !name of output file | |SUB INT 0 !control parameters - see PREP.DOC | |CORRECT OMIT DU BEG | | | | 1 DUMM DU M 0 0 0 0. 0. 0. 0. | | 2 DUMM DU M 1 0 0 1.449 0. 0. 0. | | 3 DUMM DU M 2 1 0 1.522 111.1 0. 0. | | 4 N N M 3 2 1 1.335 116.6 180. -0.5200 | | 5 HN H E 4 3 2 1.01 119.8 0. 0.2480 | | 6 CA CH M 4 3 2 1.449 121.9 180. 0.2270 | | 7 CB C2 S 6 4 3 1.525 111.1 60. 0.0390 | | 8 CG C2 S 7 6 4 1.525 109.47 180. 0.0530 | | 9 CD C2 S 8 7 6 1.525 109.47 180. 0.0480 | |10 CE C2 S 9 8 7 1.525 109.47 180. 0.2180 | |11 NZ N3 3 10 9 8 1.47 109.47 180. -0.2720 | |12 HNZ1 H3 E 11 10 9 1.01 109.47 60. 0.3110 | |13 HNZ2 H3 E 11 10 9 1.01 109.47 180. 0.3110 | |14 HNZ3 H3 E 11 10 9 1.01 109.47 300. 0.3110 | |15 C JJ M 6 4 3 1.522 111.1 180. 0.5260 | |16 O O2 E 15 6 4 1.229 120.5 0. -0.5000 | | | | | |IMPROPER | |-M CA N H | |CA +M C O | |CB CA N C | | | |DONE | |STOP | +------------------------------------------------------------+ The next step is to link the appropriate residues from the standard database, along with the residue file you created with PREP into a macromolecule. This is done using the program LINK. Note that if you were only interested in the enzyme and not the substrate, you would start at this point. The third line of link.in tells LINK that the topological information for residue "SUB" is in the file sub.res (the "standard" residues are retrieved from the prep database file specified with the `-p' argument). The residues of the enzyme are listed sequentially in the order that they are to be bonded. The substrate residue is put at the end, separated by the spacer residue "***" indi- cating that it is not covalently attached. Alternatively it could be specified as a separate molecule. After the residue sequence, disulfide crosslinks are input. Any desired covalent attachment can be input as a crosslink. See LINK.DOC for details. Running LINK again produces two files: link.bin, the molecular topology file, and link.out, which contains diagnos- tic information. +-----------------------------------------------------------------+ | Running LINK | +-----------------------------------------------------------------+ |Unix: | |~amber41/exe/link -i link.in -o link.out -l link.bin -p db4.dat | | | |VMS: | |$ SET DEFAULT [YOURDIR.TET] | |$ ASSIGN [AMBER.DAT]DB4.DAT FOR001 | |$ ASSIGN LINK.IN FOR005 | |$ ASSIGN LINK.OUT FOR006 | |$ ASSIGN LINK.BIN FOR010 | |$ RUN [AMBER41.EXE]LINK | | | |LINK.IN: | | | |TACK'S PROTEIN !title | | !blank line | |SUB 0sub.res !filename for residue SUB | | !blank card | |DU !symbol for dummy atoms | | 0 0 0 0 0 !print controls | |tacks protein !subtitle for first molecule | |P 1 0 1 3 1 !control parameters for first molecule | |ASP 1SER CYX GLU ALA ILE ILE HIP GLU LEU HID SER | |ARG HID PRO GLY ASP PHE GLY ALA ASP ALA GLN GLY | |ALA MET ASN LYS ALA CYX GLU SER *** SUB !residue list | | !blank card | | 3 30SG SG 0 !crosslink info | | !blank card | |QUIT !exit control | +-----------------------------------------------------------------+ The binary file link.bin contains your system, but at this point it lacks the correct atomic coordinates. It does contain the internal coordinates for each residue, but the residues are linked with arbitrary dihedral angles. The file also contains some pseudo atoms called "dummy" atoms. They are there to define the space axes for the internal coordinate system and must be removed. The addition of correct coordinates and removal of dummy atoms is accomplished with the program EDIT. Input for EDIT consists of a small control file, edit.in; the topology file from LINK, link.bin; and your PDB file. Two files are output: edit.bin, the molecular topology file (now with correct coordinates and dummy atoms removed), and edit.out containing user information and diagnostics. A look at edit.out generally reveals some frightening diagnostics stating that input for some atoms was not found. What this actually means is that the PDB file was missing some atoms present in the database residues, or had some extra atoms not present in the database (sometimes these are the same atoms, with different names). If atoms in the PDB file are missing, edit will add them using the stored internal coordinates of the residues. In the event that this can't be done (notably for the very first atoms in a molecule), the correct orientation of the atoms can be specified on edit.in using the "ABC" option of EDIT. Extra atoms in the PDB file are ignored. Some important notes: EDIT expects the residue sequence of the pdb file to match the link input file. If any residues are missing or extra ones are pre- sent, the program will stop with an error message. The ordering of atoms within a residue does not matter, nor do the atom sequence numbers, however, all atom records for a given residue should have the same residue sequence number. +-----------------------------------------------------------+ | Running EDIT | +-----------------------------------------------------------+ |Unix: | |~amber41/exe/edit -i edit.in -o edit.out -l link.bin | | -e edit.bin -pi model.pdb -po edit.pdb | | | |VMS: | |$ set default [yourdir.tet] | |$ assign edit.in for005 | |$ assign edit.out for006 | |$ assign link.bin for010 | |$ assign edit.bin for012 | |$ assign model.pdb for015 | |$ run [amber41.exe]edit | | | |EDIT.IN: | | | |tacks protein !title | | 0 0 0 0 !print controls | |XYZ !select xyz option | |OMIT !xyz input - omit dummy atoms | |XRAY !select xray option | | 0 0 0 0 !xray input | |ABC !select abc option | | 1 0 !abc input | | 1 1.01 109.5 60.0 2 5 6 !abc input | | 3 1.01 109.5 180.0 2 5 6 !abc input | | 4 1.01 109.5 300.0 2 5 6 !abc input | | !blank card | | QUI !terminate abc option | |QUIT !terminate edit | +-----------------------------------------------------------+ All that remains to be done is add force field parameters to the molecular topology file, and you will be ready to run either molecular mechanics or molecular dynamics. Force field parameters are added with the program PARM. Input for PARM consists of a control file; parm.in, a parameter file; parm91.dat, and the topology file from EDIT; edit.bin. parm91.dat is part of the AMBER 4 distribution. Output from PARM consists of the completed topology file, prmtop, a coordi- nate file, prmcrd, and the diagnostics file prmout. The fin- ished topology and coordinate files can be written either in binary form or formatted form. In general we now use only the formatted form for all files, so they can be used on various machines regardless of the underlying representation of data. It is important to look at parm.out to make sure that all the needed parameters were found in parm91.dat. If you are only working with amino acids, nucleic acids, or water, they should all be there. However, it is very easy to construct a molecule for which parameters do not exist in parm91.dat. In that event you will have to create some on your own. Often parameters for similar bonding situations can be found in parm91.dat, and sim- ply duplicated in that file with the appropriate atom types. An overview of parameter generation is contained in Appendices C and D. +-----------------------------------------------------------+ | Running PARM | +-----------------------------------------------------------+ |Unix: | |~amber41/exe/parm -i parm.in -o prmout | | -e edit.bin -f amber5/dat/parm91.dat | | -c prmcrd -p prmtop | | | |VMS: | |$ SET DEFAULT [YOURDIR.TET] | |$ ASSIGN PARM.IN FOR005 | |$ ASSIGN PARM.OUT FOR006 | |$ ASSIGN [AMBER41.DAT]PARM91.DAT FOR010 | |$ ASSIGN PARM.TOP FOR012 | |$ ASSIGN EDIT.BIN FOR015 | |$ ASSIGN PARM.CRD FOR018 | |$ RUN [AMBER41.EXE]PARM | | | | | |PARM.IN: | | | |tack helix !title | |BIN FOR STDA !format controls + nonbon param set name | | 0 0 0 !print flags | | 1 1 1 !print flags | +-----------------------------------------------------------+ Now you are finally ready to run SANDER, the molecular mechanics/dynamics module. Input for this program consists of a control file, sander.in, the topology file from PARM; prm- top, and the coordinate file from PARM; prmcrd. Output from SANDER consists of the final coordinates, minmd.crd, and a record of the molecular mechanical energies of the system as the minimization/dynamics proceeds, minmd.out. Output from a dynamics run may optionally include files containing the dynam- ics trajectory and velocities of all the atoms of the system over the course of the simulation. SANDER.COM: This file will typically be submitted to a batch queue, or run in the background at reduced priority. +-----------------------------------------------------------+ | Running SANDER | +-----------------------------------------------------------+ | | |SANDER.IN: | |This file uses the namelist style of input. | | | |# 500 steps min, constant dielectric | | &cntrl | | imin = 1, maxcyc = 500, nrun = 0, nsnb = 50, | | idiel = 1, cut = 8.0, scee = 2.0, | | &end | | | |Unix: | |sander -i sander.in -o minmd.out -p prmtop | | -c prmcrd -r minmd.crd -inf minmd.inf | | | |VMS: | |$ SET DEFAULT [YOURDIR.TET] | |$ ASS SANDER.IN FOR005 | |$ ASS MINMD.OUT FOR006 | |$ ASS PARM.BIN FOR020 | |$ ASS COORD.DAT FOR021 | |$ ASS COORD.OUT FOR033 | |$ RUN [AMBER41.EXE]SANDER.EXE | | | | | +-----------------------------------------------------------+ _______________________________________________________________ 1.3. Example 3. A more complicated protein example. _______________________________________________________________ This section works through in some detail setting up a protein simulation in AMBER. The example is for plastocyanin in water, and contains a number of things that experienced AMBER users know how to do, but which may be far from obvious for others. In particular, there are a number of items that go beyond a simple protein. The examples assume you have an envi- ronment variable $amber5 that points to the top of your AMBER distribution. (1) Plastocyanin contains a metal ion bound to four amino acids, and I also want to modify a methionine residue that is bound to the copper in such a way that it has a different type of sulfur than is found in the standard database. (2) The Brookhaven crystallographic file (1PLC) contains crystallographic waters, which I might want to keep. Only the oxygen positions are provided, so I will need to try to figure out where to put protons. (3) Somewhat unusually, this PDB file has proton positions for the protein, which I would like to keep. However, Brookhaven uses proton names that are different than what NMR spectroscopists use, and I would like to be able to use the latter to make easy contact with NMR results. (4) Using the most probable ionization states of the protein (at neutral pH) results in a protein with a net charge of -8, so I would like to include mobile counterions in the solution to create an overall neutral system. This will be a lot of work, but it's infinitely easier now in AMBER than it used to be. We will also use this example to show how to set up some constraints, such as might be found in a NMR refinement, and will illustrate how to carry out simu- lated annealing optimizations. 1.3.1. Make database file for the modified residues. For plastocyanin, I will define two new types of residues: HIC, which will be a histidine coupled to a copper ion, and which will take the place of HIS 37 in the "real" sequence, and MEM, which is a modified methionine where the sulfur atom is of type "SM" rather than type "S". "SM" is a type I made up, and will use to create special force field parameters for MET 94, which is bonded to the copper ion with a fairly long bond. Here are the input files for these two residues: +-------------------------------------------------------------------------+ | hicu.in | +-------------------------------------------------------------------------+ | 0 0 2 | | | |HISTIDINE PLUS | |hicu.db4 | | HIC INT 1 | | CORR OMIT DU BEG | | 0.00000 | | 1 DUMM DU M 0 -1 -2 0.000 0.000 0.000 0.00000 | | 2 DUMM DU M 1 0 -1 1.449 0.000 0.000 0.00000 | | 3 DUMM DU M 2 1 0 1.522 111.100 0.000 0.00000 | | 4 N N M 3 2 1 1.335 116.600 180.000 -0.34790 | | 5 H H E 4 3 2 1.010 119.800 0.000 0.27470 | | 6 CA CT M 4 3 2 1.449 121.900 180.000 -0.13540 | | 7 HA H1 E 6 4 3 1.090 109.500 300.000 0.12120 | | 8 CB CT 3 6 4 3 1.525 111.100 60.000 -0.04140 | | 9 HB2 HC E 8 6 4 1.090 109.500 300.000 0.08100 | | 10 HB3 HC E 8 6 4 1.090 109.500 60.000 0.08100 | | 11 CG CC S 8 6 4 1.510 115.000 180.000 -0.00120 | | 12 ND1 NB B 11 8 6 1.390 122.000 180.000 -0.15130 | | 13 CU CU E 12 11 8 2.050 126.000 0.000 0.38660 | | 14 CE1 CR B 12 11 8 1.320 108.000 180.000 -0.01700 | | 15 HE1 H5 E 14 12 11 1.090 120.000 180.000 0.26810 | | 16 NE2 NA B 14 12 11 1.310 109.000 0.000 -0.17180 | | 17 HE2 H E 16 14 12 1.010 125.000 180.000 0.39110 | | 18 CD2 CW S 16 14 12 1.360 110.000 0.000 -0.11410 | | 19 HD2 H4 E 18 16 14 1.090 120.000 180.000 0.23170 | | 20 C C M 6 4 3 1.522 111.100 180.000 0.73410 | | 21 O O E 20 6 4 1.229 120.500 0.000 -0.58940 | | | |LOOP | | CG CD2 | | | |IMPROPER | | -M CA N H | | CA +M C O | | CE1 CD2 NE2 HE2 | | CG NE2 CD2 HD2 | | ND1 NE2 CE1 HE1 | | ND1 CD2 CG CB | | | |DONE | |STOP | +-------------------------------------------------------------------------+ +-------------------------------------------------------------------------+ | mem.in | +-------------------------------------------------------------------------+ | 0 0 2 | | | |METHIONINE, with SM atom type for the sulfur | |mem.db4 | | MEM INT 1 | | CORR OMIT DU BEG | | 0.00000 | | 1 DUMM DU M 0 -1 -2 0.000 0.000 0.000 0.00000 | | 2 DUMM DU M 1 0 -1 1.449 0.000 0.000 0.00000 | | 3 DUMM DU M 2 1 0 1.522 111.100 0.000 0.00000 | | 4 N N M 3 2 1 1.335 116.600 180.000 -0.41570 | | 5 H H E 4 3 2 1.010 119.800 0.000 0.27190 | | 6 CA CT M 4 3 2 1.449 121.900 180.000 -0.02370 | | 7 HA H1 E 6 4 3 1.090 109.500 300.000 0.08800 | | 8 CB CT 3 6 4 3 1.525 111.100 60.000 0.03420 | | 9 HB2 HC E 8 6 4 1.090 109.500 300.000 0.02410 | | 10 HB3 HC E 8 6 4 1.090 109.500 60.000 0.02410 | | 11 CG CT 3 8 6 4 1.525 109.470 180.000 0.00180 | | 12 HG2 H1 E 11 8 6 1.090 109.500 300.000 0.04400 | | 13 HG3 H1 E 11 8 6 1.090 109.500 60.000 0.04400 | | 14 SD SM S 11 8 6 1.810 110.000 180.000 -0.27370 | | 15 CE CT 3 14 11 8 1.780 100.000 180.000 -0.05360 | | 16 HE1 H1 E 15 14 11 1.090 109.500 60.000 0.06840 | | 17 HE2 H1 E 15 14 11 1.090 109.500 180.000 0.06840 | | 18 HE3 H1 E 15 14 11 1.090 109.500 300.000 0.06840 | | 19 C C M 6 4 3 1.522 111.100 180.000 0.59730 | | 20 O O E 19 6 4 1.229 120.500 0.000 -0.56790 | | | |IMPROPER | | -M CA N H | | CA +M C O | | | |DONE | |STOP | +-------------------------------------------------------------------------+ I made mem.in just by copying the relevant portions of the methionine entry from all_amino94.in in the database directory, changing the atom type of the sulfur, and adding appropriate first and last lines. Similar things were done for the histi- dine residue (starting from the library's HIP residue), except that I added a copper atom bonded to ND1. It is a good idea to read these files alongside the PREP documentation. Then, these files were used as input to PREP: prep -i hicu.in -o hicu.prpout -p hicu.params prep -i mem.in -o mem.prpout -p mem.params This creates the files hicu.db4 and mem.db4, which describe these modified residues that will be incorporated into our pro- tein. 1.3.2. Do some editing of the PDB file. Several small changes need to be made to the input PDB file to make it work with Amber: (1) First, we need to split of the HOH water residues into a separate file, say watpdb, and remove them from the main PDB file (call this modified file 1plc.nowat.pdb). Fur- ther, the remarks in this pdb file indicate that waters #183 and #187 are a disordered pair, and should not both be present. So, I arbitrarily choose to delete #187 and to keep #183. For some reason, these two disordered waters were both put in the PDB file, and not assigned as alternate conformers. Note that AMBER by default will also choose only the principal position for disor- dered side chains, i.e. the "A" conformation if there is more than one. But this is done automatically, and does not require editing. If you want to start a simulation from the "B" conformation of a side chain, you need to manually edit the PDB file to remove the "A" conforma- tion and blank-out the alternate conformation flag for the atoms you want AMBER to use. Generally speaking, you have to look carefully at a Brookhaven file before really using it. (An alternative is to simply strip out the "crystallo- graphic" waters and not use them at all. This is most appropriate if you are planning an MD or free energy simulation that will go through an extensive equilibra- tion period before the "real" simulation begins. One goal of equilibration is to minimize dependence upon the starting conditions, and certainly the individual water molecules will move around a lot during any decent equi- libration. At that point, the fact that you went to some trouble to originally place the waters in some nice positions may be irrelevant. Or maybe not; opinions differ on this matter, which is why we try to provide flexible tools in Amber. For this tutorial, we will not use the "crystallographic" waters in our starting struc- ture.) One final change involves residue names. Brookhaven files do no distinguish between cysteine residue that are involved in bonds to other things (and hence which have no proton on the sulfur atom) and "free" residues that do have such a proton. Molecular mechanics studies need to make this sort of distinction. Since residue 84 in plastocyanin has the sulfur atom bonded to the copper ion, I changed its name from CYS to CYX. Similar com- ments apply to histidines: molecular mechanics studies need to know (or guess, or assume) whether the histidine has a proton bonded at the `delta` position (HID), at the `epsilon` position (HIE), or at both (HIP). This is pretty easy for plastocyanin, since the two histidine side chains are both bound to copper through the ND1 nitrogen. So we initially change both HIS residues to HIE, in order to tell AMBER to put the protons on the NE2 nitrogen. (Note that in many other proteins, it will often be reasonable to assign surface-accessible histidines to be protonated, residue name HIP.) (2) Next, we need to work on the proton names in the main protein file. Most Brookhaven crystallographic files do not have protons, so the protonate program is used to add them. Even here, we want to change the names Brookhaven uses to NMR conventions as described above, so we will still use protonate. This program also does sanity and chirality checking, so it is generally a good idea to use it prior to putting any pdb file into Amber. Now run: csh: ( protonate -k -d PROTON_INFO < 1plc.nowat.pdb > 1plc.nowat.H.pdb) >& protonate.out sh: protonate -k -d PROTON_INFO < 1plc.nowat.pdb > 1plc.nowat.H.pdb 2> protonate.out The -k option changes the names but "keeps" the posi- tions of the protons in the input pdb file. As in other examples, you need to use the location of the PRO- TON_INFO file on your machine in place the the location listed above. (3) Next, I moved the copper ATOM card from the end of the pdb file into residue 37, changing its residue name to "HIC" and its residue number to 37. I also changed the residue name for the rest of atoms of residue 37 from "HIE" to "HIC", and changed the residue name for residue 92 from "MET" to "MEM" as described above. I call this file 1plc.protein.pdb. 1.3.3. Set up the system without solvent. AMBER provides two ways to set up the files needed to carry out minimization or molecule dynamics. The original ("old") way runs the programs link, edit, and parm, each of which needs a separate input file. The "new" way involves run- ning the program LEaP (which stands for "link, edit and parm"). Most new users will run LEaP, but it won't hurt to skim over the next section, which describes the "old" way, since the ideas required are very similar in the two approaches. 1.3.3.1. (Alternative 1): Create link, edit and parm files. AMBER gets the sequence information, plus information about how the copper ion is bound to its ligands, from the input files to LINK: +--------------------------------------------------------------------------------+ | lnkin.nowat | +--------------------------------------------------------------------------------+ |PLASTOCYANIN | | | |HIC 2hicu_all.db4 | |MEM 2met.db4 | | | |DU | | 0 0 0 0 0 | |Plastocyanin (poplar) | |P 1 0 1 3 1 | |ILE 2ASP VAL LEU LEU GLY ALA ASP ASP GLY SER LEU ALA PHE VAL PRO | |SER GLU PHE SER ILE SER PRO GLY GLU LYS ILE VAL PHE LYS ASN ASN | |ALA GLY PHE PRO HIC ASN ILE VAL PHE ASP GLU ASP SER ILE PRO SER | |GLY VAL ASP ALA SER LYS ILE SER MET SER GLU GLU ASP LEU LEU ASN | |ALA LYS GLY GLU THR PHE GLU VAL ALA LEU SER ASN LYS GLY GLU TYR | |SER PHE TYR CYX SER PRO HIE GLN GLY ALA GLY MEM VAL GLY LYS VAL | |THR VAL ASN | | | | 37 87CU ND1 0 | | 37 84CU SG 0 | | 37 92CU SD 0 | | | |QUIT | +--------------------------------------------------------------------------------+ Again, it is a good idea to compare this input to the descrip- tions in the manual. Note the the copper atom is already bonded to the ND1 atom of HIS37, and that crosslink commands to used to add three other ligands to it. Then run: link -i lnkin.nowat -o lnkout.nowat -p /afs/psc/packages/amber/amber41/dat/db94.dat where you must substitute the location of db94.dat on your machine for the file listed above. Study the output file lnk- out.nowat to see if it looks like everything worked OK. Next create a standard input for for edit: +-----------------------------------------------------------+ | edtin.nowat | +-----------------------------------------------------------+ |poplar plastocyanin | | 0 0 0 0 | |XYZ | |OMIT | |XRAY | | 0 0 0 0 0 | |QUIT | +-----------------------------------------------------------+ and run: edit -i edtin.nowat -o edtout.nowat -pi 1plc.protein.pdb and look carefully at the output file. It is very common to find warning messages at this point, and they need to be cleared up, usually by minor re-editing of the input PDB file. Finally, create a standard input file for parm: +-----------------------------------------------------------+ | prmin | +-----------------------------------------------------------+ | standard parm using parm94.dat | |BIN FOR MOD4 | | 0 0 0 | | 1 | +-----------------------------------------------------------+ We also need to make modifications to the AMBER force field to recognize the copper atom and the modified methionine residue. The best way to do this is not to edit the main force field file, but to create a frcmod file with changes specific to each project. Here is the one I created for plastocyanin: +-----------------------------------------------------------+ | frcmod.pcy | +-----------------------------------------------------------+ |# modifications to force field for poplar plastocyanin | |MASS | |SM 32.06 | |CU 65.36 | | | |BOND | |NB-CU 70.000 2.05000 kludge by JRS | |CU-S 70.000 2.10000 kludge by JRS | |CU-SM 70.000 2.90000 for pcy | |CT-SM 222.000 1.81000 met(aa) | | | |ANGLE | |CU-NB-CV 50.000 126.700 JRS estimate | |CU-NB-CR 50.000 126.700 JRS estimate | |CU-NB-CP 50.000 126.700 JRS estimate | |CU-NB-CC 50.000 126.700 JRS estimate | |CU-SM-CT 50.000 120.000 JRS estimate | |CU-S -CT 50.000 120.000 JRS estimate | |CU-S -C2 50.000 120.000 JRS estimate | |CU-S -C3 50.000 120.000 JRS estimate | |NB-CU-NB 10.000 110.000 dac estimate | |NB-CU-SM 10.000 110.000 dac estimate | |NB-CU-S 10.000 110.000 dac estimate | |SM-CU-S 10.000 110.000 dac estimate | |CU-SM-CT 50.000 120.000 JRS estimate | |CT-CT-SM 50.000 114.700 met(aa) | |HC-CT-SM 35.000 109.500 | |H1-CT-SM 35.000 109.500 | |CT-SM-CT 62.000 98.900 MET(OL) | | | |DIHE | |X -NB-CU-X 1 0.000 180.000 3.000 | |X -CU-SM-X 1 0.000 180.000 3.000 | |X -CU-S -X 1 0.000 180.000 3.000 | |X -CT-SM-X 3 1.000 0.000 3.000 | | | |NONBON | | CU 2.20 0.200 | | SM 2.00 0.200 | | | +-----------------------------------------------------------+ Crating a frcmod file is a bit of an art, since one is often faced with unknown parameters (such as force constants from copper to its ligands in the above example.) In this tutorial, I am just going over the mechanics of running AMBER, and make no claims about the validity or appropriateness of the above parameters. There is a big literature on parameter estimation, and users are encouraged to consult this when faced with unusual chemical environments. Now, run parm with the above inputs: parm -i prmin -o prmout.nowat -f $amber5/dat/parm94.dat -m frcmod.pcy -p prmtop.nowat -c prmcrd.nowat Again, you need to put in the location of the parm94.dat file on your machine. Study the prmout.nowat file to see if it looks like things went OK. It might be typical to find "miss- ing parameters" at this point, which means that the frcmod file does not contain all of the parameters you need; the missing ones will be identified in parm output. You might also create a pdb file at this point with the new coordinates: ambpdb -p prmtop.nowat -wrap < prmcrd.nowat > 1plc.protein.amber.pdb The -wrap flag will make the output proton names more like those Brookhaven uses. Leave this flag off if you want the names in the output PDB file to be the internal AMBER proton names. 1.3.3.2. (Alternative 2): Use LEaP to set up the required files. It is simpler to carry out the above procedure in LEaP. Let's start by setting up an input file: +-----------------------------------------------------------+ | leap.in | +-----------------------------------------------------------+ |PARM94 = loadamberparams frcmod.pcy | |loadAmberPrep mem.lib | |loadAmberPrep hicu.lib | |plc = loadPdb 1plc.protein.pdb | |bond plc.87.ND1 plc.37.CU | |bond plc.84.SG plc.37.CU | |bond plc.92.SD plc.37.CU | |saveAmberParm plc prmtop.nowat prmcrd.nowat | +-----------------------------------------------------------+ By default, LEaP will read in the standard AMBER 94 force field libraries. The first line in the file above merges in the extra material from the frcmod.pcy file described above. The next two lines load in the files describing the modified residues HIC and MEM. The a molecule, named "plc" is created by reading in the appropriate pdb file. (LEaP will often complain at this point if it finds something it doesn't under- stand or doesn't like; a typical task would be to modify the pdb file and try again.) Next, the three "cross-links" that connect residues 84, 87 and 92 to the copper atom are added via the bond command. Finally, the saveAmberParm command is used to create the required output files. Details of all of these commands can be found in the LEaP manual. The file above is read into LEaP as follows; ">" is the prompt that LEaP provides: tleap > source leap.in > quit 1.3.4. Run a simple minimization. We start with a very simple minimization with restraints to keep the protein from moving too far. In the examples below, do not include the comments in parenthesis in your actual files. +----------------------------------------------------------------+ | min1.in | +----------------------------------------------------------------+ | | |Minimization with Cartesian restraints | | &cntrl | | imin=1, maxcyc=200, (invoke minimization) | | scee=1.2, idiel=0, cut=12.0, (force field options) | | nsnb=20, (update non-bonded list) | | ntpr=5, (print frequency) | | ntr=1, (turn on cartesian restraints) | | &end | |Group input for restrained atoms | | 1.0 (force constant for restraint) | |RES 1 99 (all atoms in residues 1-99) | |END | |END | | | +----------------------------------------------------------------+ To run this file, use the following command: sander -i min1.in -c prmcrd.nowat -p prmtop.nowat -ref prmcrd.nowat -o min1.out -r min1.xyz 1.3.5. Run simulated annealing optimization. At this point, one often would want to carry out a more robust optimization, using a dynamical simulated annealing pro- tocol. One also might add some sort of constraints at this point. In this example, we will illustrate how NMR-derived constraints might be incorporated; constraints from other sources of information would be handled in a similar fashion. +---------------------------------------------------------------------------+ | simul_anneal.in | +---------------------------------------------------------------------------+ | | | 15ps simulated annealing protocol | | &cntrl | | nstlim=15000, ntt=1, (time limit, temp. control) | | scee=1.2, (scee must be set - 1-4 scale factor) | | ntpr=500, pencut=0.1, (control of printout) | | ipnlty=1, nmropt=1, (NMR penalty function options) | | vlimit=10, (prevent bad temp. jumps) | | &end | |# | |# Simple simulated annealing algorithm: | |# | |# from steps 0 to 1000: raise target temperature 10->1200K | |# from steps 1000 to 3000: leave at 1200K | |# from steps 3000 to 15000: re-cool to low temperatures | |# | | &wt type='TEMP0', istep1=0,istep2=1000,value1=10., | | value2=1200., &end | | &wt type='TEMP0', istep1=1001, istep2=3000, value1=1200., | | value2=1200.0, &end | | &wt type='TEMP0', istep1=3001, istep2=15000, value1=0., | | value2=0.0, &end | |# | |# Strength of temperature coupling: | |# | |# steps 0 to 3000: tight coupling for heating and equilibration | |# steps 3000 to 11000: slow cooling phase | |# steps 11000 to 13000: somewhat faster cooling | |# steps 13000 to 15000: fast cooling, like a minimization | |# | | &wt type='TAUTP', istep1=0,istep2=3000,value1=0.2, | | value2=0.2, &end | | &wt type='TAUTP', istep1=3001,istep2=11000,value1=4.0, | | value2=2.0, &end | | &wt type='TAUTP', istep1=11001,istep2=13000,value1=1.0, | | value2=1.0, &end | | &wt type='TAUTP', istep1=13001,istep2=14000,value1=0.5, | | value2=0.5, &end | | &wt type='TAUTP', istep1=14001,istep2=15000,value1=0.05, | | value2=0.05, &end | | | | (continued on next page) | +---------------------------------------------------------------------------+ +---------------------------------------------------------------+ | simul_anneal.in (continued) | +---------------------------------------------------------------+ |# | |# "Ramp up" the restraints over the first 3000 steps: | |# | | &wt type='REST', istep1=0,istep2=3000,value1=0.1, | | value2=1.0, &end | | &wt type='REST', istep1=3001,istep2=15000,value1=1.0, | | value2=1.0, &end | | | | &wt type='END' &end | |LISTOUT=POUT (get restraint violation list) | |DISANG=RST.f (file containing NMR restraints) | | | +---------------------------------------------------------------+ The restraint file referenced above (RST.f) would ordinar- ily hold hundreds to thousands of constraints based on experi- mental information. The constraints would keep the protein near its native conformation even though we have heated to a very high temperature. Examples of such constraint files are given in the SANDER section of the Users' Manual. Since a refinement like this can take a long time to run, we have not actually included files for it in the tutorial directory: the example given above can serve as a guide for real calculations that you might want to carry out. 1.3.6. Setup the system with counterions in a box of water. Next, we turn from "vacuum" simulations to consider how to set up and carry out molecular dynamics simulations in a box of solvent water, using periodic boundary conditions. This exam- ple is typical of many molecular dynamics simulations begin carried out with AMBER. 1.3.6.1. Work on positioning counterions. The question of how or whether to include solvent counte- rions in protein and DNA simulations is a difficult one. Gen- erally speaking, DNA simulations have often used counterions and many existing protein simulations have not. In terms just of "mechanics" and not science, AMBER will suggest counterion positions for you, by using the cion program: csh: ( cion -elstat -p prmtop.nowat < 1plc.protein.amber.pdb > cion.pdb) >& cion.out sh: cion -elstat -p prmtop.nowat < 1plc.protein.amber.pdb > cion.pdb 2> cion.out This routine places counterions at the most favorable electro- static positions, until it achieves a neutral overall system. Note, however, that this may end up corresponding to a fairly high salt concentration, and may not be at all what you want. At this stage, the user's judgment is required, which is why a lot of this stuff is not yet automated. This tutorial can't go over all of the pros and cons of various choices, and in any event, different users will have different preferences. Let's suppose that you choose a few sodium counterions to add to the simulation. For this example, we will choose the first eight counterions, in order to neutralize the -8 charge on the pro- tein itself. Then, you need to run LINK again, using an input something like the following: +--------------------------------------------------------------------------------+ | lnkin.cion | +--------------------------------------------------------------------------------+ |PLASTOCYANIN | | | |HIC 2hicu.db4 | |MEM 2mem.db4 | | | |DU | | 0 0 0 0 0 | |Plastocyanin (poplar) | |P 1 0 1 3 1 | |ILE 2ASP VAL LEU LEU GLY ALA ASP ASP GLY SER LEU ALA PHE VAL PRO | |SER GLU PHE SER ILE SER PRO GLY GLU LYS ILE VAL PHE LYS ASN ASN | |ALA GLY PHE PRO HIC ASN ILE VAL PHE ASP GLU ASP SER ILE PRO SER | |GLY VAL ASP ALA SER LYS ILE SER MET SER GLU GLU ASP LEU LEU ASN | |ALA LYS GLY GLU THR PHE GLU VAL ALA LEU SER ASN LYS GLY GLU TYR | |SER PHE TYR CYX SER PRO HIE GLN GLY ALA GLY MEM VAL GLY LYS VAL | |THR VAL ASN | | | | 37 87CU ND1 0 | | 37 84CU SG 0 | | 37 92CU SD 0 | | | |Eight sodium counterions: | |P 0 0 1 3 0 | |CIP 2*** CIP *** CIP *** CIP *** CIP *** CIP *** CIP *** CIP | | | |QUIT | +--------------------------------------------------------------------------------+ Note the use of the "***" residue to indicate that the counterions are not chemically bonded to each other. Now run: rm lnkbin edtbin link -i lnkin.cion -o lnkout.cion -p $amber5/dat/db94.dat where you must again substitute the location of db94.dat on your machine for the file listed above. Study the output file lnkout.cion to see if it looks like everything worked OK. 1.3.6.2. Run EDIT and PARM to add a box of waters around the system. Actually, the hard part is mostly done. At this point we would manually add the top eight counterions from the file cion.pdb to the bottom of the file 1plc.protein.pdb, (call this new file 1plc.cion.pdb) and then would run edit to read in the PDB file that has counterions, and to add a box of water molecules around the solute. Here is a sample input file for that: +-----------------------------------------------------------+ | edtin.wat | +-----------------------------------------------------------+ |poplar plastocyanin | | 0 0 0 0 | |XYZ | |OMIT | |XRAY | | 0 0 0 0 0 | |BOX | |HW OW 4 | | 0.417 2.8 2.3 | | 12.0 12.0 12.0 | |QUIT | +-----------------------------------------------------------+ The edit command is: edit -i edtin.wat -o edtout.wat -pi 1plc.cion.pdb -b $amber5/dat/wat216.dat This creates a box with a minimum of 10 A between the pro- tein and the edge of the box. There end up being 3336 water molecules surrounding the protein. This is not a large value, since with counterions you need to be sure that there is enough room for them to move around if they need to. Running PARM with counterions and water is no different than without, so at this point you need to repeat the PARM step outlined above, except for changing the names of the output files: parm -i prmin -o prmout.wat -f $amber5/dat/parm94.dat -m frcmod.pcy -p prmtop.wat -c prmcrd.wat If everything went well, you will have a parameter file (prm- top.wat), and a coordinate file (prmcrd.wat). These would be used to start an equilibration procedure, followed by an MD or free energy simulation. 1.3.6.3. Run solvated molecular dynamics simulation. We will first run a simple minimization in water to remove initial bad contacts: +--------------------------------------------------------+ | min2.in | +--------------------------------------------------------+ | | | molecular dynamics run | | &cntrl | | imin=1, | | scee=1.2, idiel=1, cut=9.0, | | ntb=1, ntc=2, nsnb=25, | | maxcyc=500, ntpr=25, | | &end | | | +--------------------------------------------------------+ Here is the command to run: sander -O -i min2.in -c prmcrd.wat -p prmtop.wat -o min2.out -r min2.xyz & (This is run in background, since it will take a few min- utes to run; Next, try out a short molecular dynamics run; actual "production" computations would include many, many more MD steps than is given here. +-----------------------------------------------------------------+ | md1.in | +-----------------------------------------------------------------+ | | |molecular dynamics run | | &cntrl | | imin=0, irest=0, ntx=1, tempi=0., (no restart MD) | | scee=1.2, idiel=1, cut=9.0, (force field options) | | ntt=1, temp0=300.0, tautp=0.2, (temperature control) | | ntp=2, taup=0.2, (pressure control) | | ntb=2, ntc=4, ntf=2, nsnb=25, (SHAKE, periodic bc.) | | nstlim=500, (run for 0.0005 nsec) | | ntwe=100, ntwx=100, ntpr=25, (output frequency) | | &end | | | +-----------------------------------------------------------------+ Here we will start from the output of the minimization step to carry out the dynamics: sander -O -i md1.in -c min2.xyz -p prmtop.wat -o md1.out -r md1.xyz -x md1.crd -e md1.ene & The output will include the md1.out file giving information about the progress of the trajectory, along with md1.crd and md1.ene files that contain the coordinates and energy informa- tion at every 100th step, respectively. Many of the analysis programs in AMBER can use these sorts of files.