LINK module

1.  LINK

Usage: link [-O] -i input -o output -l lnkbin -p db4.dat

       -O   Overwrite output files if they exist.

_______________________________________________________________

NOTE: Leap replaces Prep, Link, Edit and Parm with a much simpler,
single program.

     The purpose of this module is to create the molecular topology
file read by EDIT.  It reads the topology of individual residues from
one of the standard databases and/or from individual files, and links
them together to create the topology of the final system.  It uses
the tree convention of the PREP module and always connects the first
main type atom of the current residue to the last main type atom of
the previous residue.  In addition to this standard linking process,
it can cross link specified atoms within a molecule or in different
molecules to form a covalent bond.

     The macromolecule can be specified as a single molecule or as a
set of molecules.  For example, double stranded DNA is normally
defined as two molecules, where each strand constitutes a molecule.
Separate molecules are not linked by a covalent bond unless
explicitly specified by the cross linking information.

     The standard database contains topological information for
nucleic acid and peptide residues.  If non-standard residues are
required, the PREP module must first be used to create them.  A
non-standard residue may be appended to the database or kept as an
individual file.  See PREP.DOC for details.

     It is important to note that this module will generate the
connectivity and the internal parameter lists (bond, angle, dihedral
and excluded atom pointers) correctly.  However, the coordinates are
generally not meaningful except for smaller molecules, because
residues are linked with the dihedral angles stored in the data base.
Side chain conformation will also be that found in the database
entry.  Hence the user will generally want to read in new coordinates
at the next stage using the EDIT module and a coordinate file.
     This module was originally written by P. K. Weiner at UCSF and
overhauled by U. C. Singh in Feb. 1984.  LINK 3.0 Rev A is a revision
for portability and reliability by George Seibel, 1989.

NOTE: the utility program NUKIT will generate link (and nucgen) input
files for nucleic acids interactively.

Files:

     file        unit   description
     db94.dat     15    Prep database
     input         5    Program control input
     output        6    User information and diagnostics
     lnkbin       10    Output topology file (binary)

NOTE: LINK has formatted input, so pay careful attention to the
fields specified.

_______________________________________________________________

                        Nucleic Acids
_______________________________________________________________

     The nucleic acid residues have different naming conventions in
the 1994 and 1991 databases (db94.dat and db4.dat, respectively).
Nucgen can generate PDB files using the 1994 residue names.

                      1994 Nucleic Acids

     strand position     termini             residue names
                         5'           3'
     ----------------------------------------------------------------
     beginning           OH           O      G5, C5, A5, T5, U5
     middle              phosphate    O      G,  C,  A,  T,  U
     end                 phosphate    OH     G3, C3, A3, T3, U3
     single residue      OH           OH     GN, CN, AN, TN, UN
     ----------------------------------------------------------------

                      1991 Nucleic Acids

     In the 1991 database, phosphates and terminal hydrogens are
treated as separate residues, so the bases have only one version:
GUA, CYT, ADE, THY and URA.  The phosphate group is represented by
the residue name POM.  The terminal hydrogen atoms are represented by
residues HB (at the 5' end) and HE (at the 3' end).

     At the 5' end of a nucleic acid chain, the atom H in HB is
connected to the O5' atom of the first nucleoside.  The nucleic acid
chain grows in the 5' to 3' direction.  At the 3' end the H atom in
HE is connected to the O3' atom of the last nucleoside residue.  Two
nucleoside residues are connected by a phosphate group (POM).
The 5' and 3' ends of POM are linked to the O3' and O5' atoms of the
preceding and following nucleoside residues, respectively.  In the
case of a double helix, the complementary chain is also represented
from the 5' end to the 3' end.  Each chain is described as a separate
molecule.  Thus, a double helix is represented by two molecules, a
triple helix by three molecules, and so on.

     For example, the sequence d(ATATAT).d(ATATAT) is represented by
two molecules as:

     A5 T A T A T3     (MOL 1)
     A5 T A T A T3     (MOL 2)

in the 1994 convention, and

     HB ADE POM THY POM ADE POM THY POM ADE POM THY HE     (MOL 1)
     HB ADE POM THY POM ADE POM THY POM ADE POM THY HE     (MOL 2)

in the 1991 convention.  The sequence d(CGATG).d(CATCG) is
represented by

     C5 G A T G3     (MOL 1)
     C5 A T C G3     (MOL 2)

in the 1994 convention and

     HB CYT POM GUA POM ADE POM THY POM GUA HE     (MOL 1)
     HB CYT POM ADE POM THY POM CYT POM GUA HE     (MOL 2)

in the 1991 convention.

_______________________________________________________________

                    Peptides and Proteins
_______________________________________________________________

     Proteins are assumed to begin from the N-terminus and end at the
C-terminus.  There are three ways to handle these termini:

     (1)  specify only the 'normal' residues in the chain with
          IFTPRO=0 for uncharged ends:
               (N--CA--...--CA--C)
          which will leave 'unbalanced' charges

     (2)  specify only 'normal' residues in the chain with IFTPRO=1
          for charged ends:
               (NH3+ --CA--...--CA-- COO-)

     (3)  specify the 'normal' residues with terminal residues ACE
          and NME and IFTPRO=0 for neutral ends:
               (ACE--CA--...--CA--NME)

IFTPRO is on card 6B.

     The program does not recognize the disulfide S-S bridge in
proteins, so each bridge must be input as a cross link (cards 6B and
6E).
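
The three terminus treatments listed above can be summarised in a few
lines of Python.  This is only an illustrative sketch: the function
name and the option strings ("uncharged", "charged", "capped") are
hypothetical and are not part of LINK; only the residue names ACE and
NME and the IFTPRO values come from the text.

```python
def peptide_residues(residues, termini):
    """Return (residue list for card 6C, IFTPRO value for card 6B).

    termini is a hypothetical label for the three options in the text:
      "uncharged" -> option (1): normal residues only, IFTPRO=0
      "charged"   -> option (2): normal residues only, IFTPRO=1
      "capped"    -> option (3): ACE ... NME added, IFTPRO=0
    """
    if termini == "uncharged":
        return list(residues), 0
    if termini == "charged":
        return list(residues), 1
    if termini == "capped":
        return ["ACE", *residues, "NME"], 0
    raise ValueError(f"unknown terminus treatment: {termini!r}")


if __name__ == "__main__":
    names, iftpro = peptide_residues(["ALA", "GLY", "SER"], "capped")
    print(" ".join(names), "IFTPRO =", iftpro)
```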
     -----------------------------------------------------
     name                       residue
     -----------------------------------------------------
     Alanine                    ALA
     Arginine                   ARG
     Asparagine                 ASN
     Aspartic acid              ASP
     Cysteine                   CYS
     Cystine (S-S bridge)       CYX
     Glutamine                  GLN
     Glutamic acid              GLU
     Glycine                    GLY
     Histidine delta H          HID
     Histidine epsilon H        HIE
     Histidine +                HIP
     Isoleucine                 ILE
     Leucine                    LEU
     Lysine                     LYS
     Methionine                 MET
     Phenylalanine              PHE
     Proline                    PRO
     Serine                     SER
     Threonine                  THR
     Tryptophan                 TRP
     Tyrosine                   TYR
     Valine                     VAL
     Acetyl group               ACE  (beginning residue)
     N-Methyl                   NME  (end residue)

_______________________________________________________________

                      Input description
_______________________________________________________________

     The following section describes the input data necessary for
this module, which is read from unit 5.  IMPORTANT: Character data
should be left-justified.

-----------------------------------------------------------------------
     - 1a -   TITLE FOR THE RUN

               FORMAT(20A4)

     TITLE     Title for identification
-----------------------------------------------------------------------
     - 1b -   blank card (read but ignored)
-----------------------------------------------------------------------
     - 2 -    This section is used to inform LINK of nonstandard
              residues and where their files can be located.  If no
              nonstandard residues are required, one blank card
              (card 3) is still necessary to terminate this section.

              If a nonstandard residue has the same name as a
              standard residue found in the database, set ITYPF = 9
              in card 6C for the nonstandard residue.  This will
              prevent the standard residue from being substituted for
              the nonstandard residue.  The actual value of ITYPF in
              the nonstandard residue file doesn't matter.

     IERES(I) , JERES(I) , KERES(I) , I = 1,NERES

               FORMAT(A4,1X,I5,A40)

     IERES(I)  name of the residue (PREP input card 5)

     JERES(I)  flag for the type of topology file
               (PREP input card 5 (KFORM))
               = 0   formatted file
               = 1   binary file

     KERES(I)  Name of the residue topology file (PREP input card 4)

     Note:     The external source for nonstandard residue(s) is read
               until a blank card is encountered.  As many as 200
               external residues can be read.
-----------------------------------------------------------------------
     - 3 -    one blank card
-----------------------------------------------------------------------
     - 4 -    ISYMDU

               FORMAT(A4)

     ISYMDU    Symbol for the dummy atoms.  It is advisable to use
               'DU' as the symbol for dummy atoms, as in the data
               base.  The symbol for the dummy has to be unique for a
               given system.  It is not permitted to have more than
               one dummy atom symbol.  Do NOT confuse these dummy
               atoms with the dummy atoms used in Perturbation.
-----------------------------------------------------------------------
     - 5 -    IWO , IWI , IWN , IWA

              Output information written to file 'output' (unit 6)

               FORMAT(10I5)

     IWO       Flag to output the coordinates for each atom
               = 0   coordinates will be output
               = 1   output will be suppressed

     IWI       Flag to output the residue information for all
               residues
               = 0   none
               = 1   output the information

     IWN       Flag to output non-bonded excluded lists for all atoms
               = 0   none
               = 1   output

     IWA       Flag to output bond, angle and dihedral pointers
               = 0   none
               = 1   output
-----------------------------------------------------------------------
     - 6 -    Individual molecule information.

              The program will continue to read groups of cards
              described in this section until a 'QUIT' card is
              encountered.  Each group of cards represents one
              molecule.  Repeat cards 6A - 6F for each molecule of
              the system.
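
Because cards 1a through 5 are read with fixed-width FORMAT
statements, it can help to generate them programmatically rather than
by hand.  The following Python sketch lays the fields out according
to the FORMATs quoted above.  The function names and the example file
name lig.prep are hypothetical, and the output has not been checked
against LINK itself.

```python
def external_residue_card(name, binary, filename):
    """Card 2, FORMAT(A4,1X,I5,A40): IERES, JERES, KERES."""
    if len(name) > 4 or len(filename) > 40:
        raise ValueError("name must fit A4 and filename must fit A40")
    # A4 left-justified, one blank (1X), I5 right-justified, then A40.
    return f"{name:<4} {int(binary):5d}{filename:<40}".rstrip()


def print_flags_card(iwo=0, iwi=0, iwn=0, iwa=0):
    """Card 5, FORMAT(10I5): IWO, IWI, IWN, IWA as right-justified I5."""
    return "".join(f"{flag:5d}" for flag in (iwo, iwi, iwn, iwa))


def header_cards(title, external=(), dummy="DU", **flags):
    """Cards 1a through 5, returned as a list of text lines."""
    cards = [title[:80], ""]                      # 1a title, 1b blank
    cards += [external_residue_card(*e) for e in external]   # card 2
    cards.append("")                              # card 3 blank
    cards.append(dummy[:4])                       # card 4 ISYMDU
    cards.append(print_flags_card(**flags))       # card 5
    return cards


if __name__ == "__main__":
    # "LIG" and "lig.prep" are made-up names for illustration only.
    deck = header_cards("test run", external=[("LIG", False, "lig.prep")])
    print("\n".join(deck))
```

Trailing blanks are stripped from card 2 for readability; this assumes
the Fortran runtime pads short input records with blanks, which is the
usual behaviour but is not stated in this document.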
-----------------------------------------------------------------------
     - 6A -   SUBTITLE FOR THE MOLECULE TO BE READ

               FORMAT(20A4)

     TITLE     Subtitle for the molecule
-----------------------------------------------------------------------
     - 6B -   LBMOL , ICROSL , ICONN , NM0 , NA0 , IFTPRO

               FORMAT(A1,I4,4I5)

     LBMOL     Label for the type of molecule.  This is necessary
               since some adjustments of the end residues of
               nucleotides and peptides are made so that the charge
               of the system is an integral value.  Consult the
               subroutine BLDIT in the LINK source code for details.
               'D'   the molecule is a deoxy nucleotide
               'R'   the molecule is a ribonucleotide
               'P'   the molecule is a peptide or protein
               'O'   the molecule is anything else ('O' = other)

     ICROSL    Flag for the presence of cross links within the
               molecule or between molecules.
               = 0   none
               = 1   cross link is present and its information will
                     be read after the residue sequence.

     ICONN     Read but not used.

     NM0       The molecule number to which the first main atom of
               the present molecule is attached, either by a covalent
               or non-covalent bond.  If there is no covalent
               attachment, NM0 should be set equal to 1, as the first
               molecule defines the space axes for all subsequent
               molecules.
               *** This is usually set equal to 1 ***

     NA0       The relative atom number in molecule NM0 to which the
               current molecule's first main type atom is connected.
               If there is no covalent connection, NA0 should be set
               equal to 3, as the space axes are defined by the first
               three atoms of the first molecule.  If this is not
               done there will be an error in converting from
               internal coordinates to cartesian coordinates in the
               EDIT module.
               *** This is usually set equal to 3 ***

     IFTPRO    Flag for the type of protein terminal residues
               = 0   Standard (uncharged) terminal residues
               = 1   Charged terminal residues (NH3+, COO-)
-----------------------------------------------------------------------
     - 6C -   Residue information for the current molecule.
              It is read in the following format until a blank card
              is encountered.

     LBRES(I) , ITYPF(I) , I = 1, NRESM

               FORMAT(16(A4,I1))

     LBRES(I)  Residue name

     ITYPF(I)  The type of force field for the current residue.  This
               option is included so that different types of residues
               may be kept in the data base.  Currently three types
               are available: the united atom type, the all atom
               type, and Jorgensen's OPLS model.  Additional models
               could be put into the data base and retrieved using
               this option.
               = 1   united atom model
               = 2   all atom model
               = 3   OPLS united atom model (requires use of OPLS
                     force field file)
               If this is zero, the previous non-zero value is
               carried over until a non-zero option is specified.

     NOTE:     The program assumes it is done reading the residue
               information when it encounters a blank card.

               It is always assumed that the first main type atom of
               the current residue is connected to the last main type
               atom of the preceding residue by a covalent bond.  If
               this covalent linkage is not desired, the two residues
               should be separated by the spacer residue, '***'.
               When two residues are separated by the spacer residue
               they are connected without a covalent linkage.  For
               example the sequence

                    ALA 1GLY

               will be connected by a covalent bond, while

                    ALA 1***  GLY

               will be topologically connected but no bonding
               parameters will be considered.  Both systems in this
               example use the united atom force field.
-----------------------------------------------------------------------
     - 6D -   Blank card to terminate residue input
-----------------------------------------------------------------------
     - 6E -   This section is used to create cross linkages within a
              molecule or between different molecules.  A normal use
              of this is to create disulfide bonds in proteins.

              ***** ONLY IF ICROSL.GT.0 *****
              *** ICROSL is set in card 6B ***

     ICROS , JCROS , IACROS , JACROS , MOLNM

               FORMAT(2I5,2A4,I5)

     ICROS , JCROS
               The residue numbers, as they are listed in 6C, of the
               two residues to be cross linked.  ICROS specifies the
               relative residue number in molecule MOLNM (MOLNM is
               assumed to be the current molecule number unless
               otherwise specified at the end of this card), and the
               residue JCROS is assumed to be in the current
               molecule.

     IACROS    The graph name (the name assigned to the atom in PREP)
               of the atom in residue ICROS which is involved in the
               cross link.

     JACROS    The graph name of the atom in residue JCROS which is
               involved in the cross link.

     MOLNM     The molecule number to which the residue ICROS
               belongs.  If MOLNM = 0 then it is assumed to be in the
               current molecule.
-----------------------------------------------------------------------
     - 6F -   Blank card if ICROSL .GT. 0
-----------------------------------------------------------------------
     - 7 -    KSTOP

               FORMAT(A4)

     KSTOP     Control to exit from the program ('QUIT'), placed
               immediately following the last blank card.  If
               additional molecules are to be processed, the cards in
               group 6 are repeated before the 'QUIT' card.  The
               program will never make a graceful exit if this card
               is missing, since it is working inside an infinite
               loop.
-----------------------------------------------------------------------

                    ++++++ END OF INPUT ++++++

     Rev A Revision:    George Seibel
     LINK 3.0 Authors:  U.C. Singh and P.K. Weiner
     Director:          P.A. Kollman

     Department of Pharmaceutical Chemistry
     School of Pharmacy
     University of California
     San Francisco CA 94143
     Phone (415) 476 4637
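
As an illustration of the molecule cards 6A through 7, the Python
sketch below formats one molecule group from the FORMAT statements
given in the input description.  It is a sketch under stated
assumptions, not a validated input generator: the function names are
hypothetical, the example peptide is made up, and the graph name SG
for the cystine sulfur is an assumption that should be checked against
the database entry actually in use.

```python
def molecule_line(lbmol, icrosl=0, iconn=0, nm0=1, na0=3, iftpro=0):
    """Card 6B, FORMAT(A1,I4,4I5)."""
    if lbmol not in ("D", "R", "P", "O"):
        raise ValueError("LBMOL must be one of D, R, P, O")
    return f"{lbmol}{icrosl:4d}{iconn:5d}{nm0:5d}{na0:5d}{iftpro:5d}"


def residue_lines(residues):
    """Card 6C, FORMAT(16(A4,I1)); residues is a list of (name, itypf).

    An ITYPF of 0 is written as a blank, so the previous non-zero
    value is carried over, as described for card 6C.
    """
    fields = []
    for name, itypf in residues:
        if len(name) > 4 or not 0 <= itypf <= 9:
            raise ValueError("name must fit A4 and ITYPF must fit I1")
        fields.append(f"{name:<4}{itypf if itypf else ' '}")
    # At most 16 residues per card.
    return ["".join(fields[i:i + 16]).rstrip()
            for i in range(0, len(fields), 16)]


def cross_link_line(icros, jcros, iacros, jacros, molnm=0):
    """Card 6E, FORMAT(2I5,2A4,I5)."""
    return f"{icros:5d}{jcros:5d}{iacros:<4}{jacros:<4}{molnm:5d}"


def molecule_cards(subtitle, lbmol, residues, cross_links=(), **opts):
    """Cards 6A through 6F for one molecule, as a list of text lines."""
    cards = [subtitle[:80]]                                    # 6A
    cards.append(molecule_line(lbmol, icrosl=1 if cross_links else 0,
                               **opts))                        # 6B
    cards += residue_lines(residues)                           # 6C
    cards.append("")                                           # 6D
    if cross_links:
        cards += [cross_link_line(*c) for c in cross_links]    # 6E
        cards.append("")                                       # 6F
    return cards


if __name__ == "__main__":
    # Made-up capped peptide with one disulfide between residues 2 and 5.
    seq = [("ACE", 1), ("CYX", 0), ("ALA", 0), ("GLY", 0),
           ("CYX", 0), ("NME", 0)]
    deck = molecule_cards("capped peptide", "P", seq,
                          cross_links=[(2, 5, "SG", "SG")])
    print("\n".join(deck + ["QUIT"]))                          # card 7
```

The cross link cards and their terminating blank (6E, 6F) are written
only when at least one cross link is supplied, matching the
ICROSL.GT.0 condition above.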