LINK module                                              Page 1


      1.  LINK


      Usage:

           link [-O] -i input -o output -l lnkbin -p db4.dat


      -O   Overwrite output files if they exist.
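
      As an illustration of the usage line above, the following Python
      sketch assembles the argument list for a LINK run.  The helper
      link_command and the file names link.in and link.out are
      hypothetical placeholders; only the flags themselves come from
      the usage line.

```python
import shlex

def link_command(input_file, output_file, topology_file, database, overwrite=False):
    """Build the argument list for a LINK run, following the usage line."""
    argv = ["link"]
    if overwrite:
        argv.append("-O")  # overwrite output files if they exist
    argv += ["-i", input_file, "-o", output_file,
             "-l", topology_file, "-p", database]
    return argv

# Placeholder file names; substitute the ones used at your site.
cmd = link_command("link.in", "link.out", "lnkbin", "db4.dat", overwrite=True)
print(shlex.join(cmd))
# prints: link -O -i link.in -o link.out -l lnkbin -p db4.dat
```

      The list form can be passed directly to subprocess.run if the
      link executable is on the search path.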

      _______________________________________________________________


           NOTE: Leap replaces Prep, Link, Edit and Parm with a  much
      simpler, single program.

           The  purpose  of  this  module  is to create the molecular
      topology file read by EDIT.  It reads the topology of  individ-
      ual  residues  from  one  of the standard databases and/or from
      individual files, and links them together to create the  topol-
      ogy  of  the  final system.  It uses the tree convention of the
      PREP module and always connects the first main type atom of the
      current  residue  to  the  last  main type atom of the previous
      residue.  In addition to this standard linking process, it can
      cross link specified atoms in a molecule or in different
      molecules to form a covalent bond.  The macromolecule can be
      specified as a single molecule or as a set of molecules.  For
      example, double stranded DNA is normally defined as two
      molecules, where each strand constitutes a molecule.  Separate
      molecules are not linked by a covalent bond unless explicitly
      specified by the cross linking information.

           The standard database contains topological information for
      nucleic acid and peptide residues.  If nonstandard residues
      are required, the PREP module must first be used to create
      them.  Each one may be appended to the database or kept as an
      individual file.  See PREP.DOC for details.

           It is important to note that this module will generate the
      connectivity and the internal  parameter  lists  such  as  bond
      angle, dihedral and excluded atom pointers correctly.  However,
      the coordinates are generally not meaningful except for smaller
      molecules  because residues are linked with the dihedral angles
      stored in the database.  Side chain conformation will also be
      that found in the database entry.  Hence the user will generally
      want to read in new coordinates at the next stage using the
      EDIT module and a coordinate file.


           This module was originally written by P. K. Weiner at UCSF
      and overhauled by U. C. Singh in Feb. 1984.   LINK 3.0 Rev A is
      a  revision  for  portability and reliability by George Seibel,
      1989.

           NOTE: the utility program NUKIT will  generate  link  (and
      nucgen) input files for nucleic acids interactively.


      Files:

            file       unit   description

            db94.dat   15     Prep database

            input      5      Program control input
            output     6      User information and diagnostics
            lnkbin     10     Output topology file (binary)


           NOTE:  LINK  has formatted input, so pay careful attention
      to the fields specified.

      _______________________________________________________________

                               Nucleic Acids
      _______________________________________________________________


           The nucleic acid residues have  different  naming  conven-
      tions  in  the  1994  and 1991 databases (db94.dat and db4.dat,
      respectively). Nucgen can generate PDB  files  using  the  1994
      residue names.

                            1994 Nucleic Acids

      strand position           termini       residue names
                              5'         3'
      ----------------------------------------------------------------

      beginning               OH         O    G5,  C5,  A5,  T5,  U5
      middle               phosphate     O    G,   C,   A,   T,   U
      end                  phosphate     OH   G3,  C3,  A3,  T3,  U3
      single residue          OH         OH   GN,  CN,  AN,  TN,  UN

      ----------------------------------------------------------------


                            1991 Nucleic Acids


           In  the  1991  database, phosphates and terminal hydrogens
      are treated as separate residues, so the bases  have  only  one
      version:  GUA,  CYT,  ADE, THY and URA.  The phosphate group is
      represented by the residue name  POM.   The  terminal  hydrogen
      atoms are represented by residues HB (at the 5' end) and HE (at
      the 3' end).

           At the 5' end of a nucleic acid chain, the atom H in HB is
      connected to the O5' atom of the first nucleoside.  The nucleic
      acid chain grows in the 5' to 3' direction.  At the 3' end  the
      H  atom  in HE is connected to the O3' atom of the last nucleo-
      side residue.  Two nucleoside residues are connected by a phos-
      phate  group  (POM).   The 5' and 3' ends of POM are  linked to
      O3' and O5' atoms of the  preceding  and  following  nucleoside
      residue.

           In  the  case  of  a double-helix, the complementary chain
      also is represented from the 5' end to the 3' end.  Each  chain
      is  described  as  a separate molecule.  Thus, the double helix
      will be represented by two molecules, the triple helix by three
      molecules, and so on.

           For  example,  the  sequence d(ATATAT).d(ATATAT) is repre-
      sented by two molecules as:

          A5  T   A   T   A   T3    (MOL 1)
          A5  T   A   T   A   T3    (MOL 2)

      in the 1994 convention, and

          HB  ADE POM THY POM ADE POM THY POM ADE POM THY HE   (MOL 1)
          HB  ADE POM THY POM ADE POM THY POM ADE POM THY HE   (MOL 2)

      in the 1991 convention.  The sequence d(CGATG).d(CATCG) is rep-
      resented by

          C5  G   A   T   G3    (MOL 1)
          C5  A   T   C   G3    (MOL 2)

      in the 1994 convention and

          HB  CYT POM GUA POM ADE POM THY POM GUA HE   (MOL 1)
          HB  CYT POM ADE POM THY POM CYT POM GUA HE   (MOL 2)

      in the 1991 convention.
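
      The two naming conventions can be sketched in Python as
      follows.  This is an illustration built only from the tables
      and examples above, not code taken from LINK or NUKIT; the
      function names are hypothetical, and each strand is assumed to
      be a non-empty string of one-letter base codes written 5' to 3'.

```python
# Base names used by the 1991 database (db4.dat), as listed above.
NAMES_1991 = {"G": "GUA", "C": "CYT", "A": "ADE", "T": "THY", "U": "URA"}

def residues_1994(seq):
    """Residue names for one strand in the 1994 convention."""
    if len(seq) == 1:
        return [seq + "N"]  # single residue: OH at both termini
    # First residue gets the 5' suffix, last residue the 3' suffix.
    return [seq[0] + "5"] + list(seq[1:-1]) + [seq[-1] + "3"]

def residues_1991(seq):
    """Residue names for one strand in the 1991 convention."""
    names = ["HB"]  # terminal hydrogen at the 5' end
    for i, base in enumerate(seq):
        if i > 0:
            names.append("POM")  # phosphate between nucleosides
        names.append(NAMES_1991[base])
    names.append("HE")  # terminal hydrogen at the 3' end
    return names

print(" ".join(residues_1994("CGATG")))
# prints: C5 G A T G3
print(" ".join(residues_1991("CGATG")))
# prints: HB CYT POM GUA POM ADE POM THY POM GUA HE
```

      Each strand of a duplex is converted separately and entered as
      its own molecule.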

      _______________________________________________________________

                           Peptides and Proteins
      _______________________________________________________________


           Proteins  are assumed to begin from the N-terminus and end
      at the C-terminus.  There are three ways to handle  these  ter-
      mini:

       (1)   specify  only  the  'normal'  residues in the chain with
             IFTPRO=0 for uncharged ends:
                               (N--CA--...--CA--C)
             which will leave 'unbalanced' charges

       (2)   specify only 'normal' residues in the chain with
             IFTPRO=1 for charged ends:
                           (NH3+ --CA--...--CA-- COO-)

       (3)   specify the 'normal' residues with terminal residues ACE
             and NME and IFTPRO=0 for neutral ends:
                             (ACE--CA--...--CA--NME).

      IFTPRO is on card 6B.

           The program does not recognize the disulfide S-S bridge in
      proteins  so  this must be input as cross links for each bridge
      (cards 6B and 6E).

                   -----------------------------------------------------
                         name                      residue
                   -----------------------------------------------------

                       Alanine                       ALA
                       Arginine                      ARG
                       Asparagine                    ASN
                       Aspartic acid                 ASP
                       Cysteine                      CYS
                       Cystine (S-S bridge)          CYX
                       Glutamine                     GLN
                       Glutamic acid                 GLU
                       Glycine                       GLY
                       Histidine delta H             HID
                       Histidine epsilon H           HIE
                       Histidine +                   HIP
                       Isoleucine                    ILE
                       Leucine                       LEU
                       Lysine                        LYS
                       Methionine                    MET
                       Phenylalanine                 PHE
                       Proline                       PRO
                       Serine                        SER
                       Threonine                     THR
                       Tryptophan                    TRP
                       Tyrosine                      TYR
                       Valine                        VAL
                       Acetyl group                  ACE  (beginning residue)
                       N-Methyl                      NME  (end residue)


      _______________________________________________________________

                             Input description
      _______________________________________________________________


           The following section describes the input  data  necessary
      for this module, which is read from unit 5.  IMPORTANT: Charac-
      ter data should be left-justified.

           -----------------------------------------------------------------------

              - 1a -       TITLE FOR THE RUN

                        FORMAT(20A4)

              TITLE       Title for identification

           -----------------------------------------------------------------------

              - 1b -   blank card (read but ignored)

           -----------------------------------------------------------------------

              - 2 -      This section is used to inform LINK of nonstandard
                   residues and where these files can be located.  If no
                   nonstandard residues are required one blank card (card 3) is
                   still necessary to terminate this section.  If a nonstandard
                   residue has the same name as a standard residue found in the
                   database, set ITYPF = 9 in card 6C for the nonstandard
                   residue.  This will prevent the standard residue from being
                   substituted for the nonstandard residue.  The corresponding
                   value stored in the nonstandard residue file doesn't matter.

                   IERES(I) , JERES(I) , KERES(I) , I = 1,NERES

                       FORMAT(A4,1X,I5,A40)

              IERES(I)   name of the residue  (PREP input card 5)

              JERES(I)   flag for the type of topology file
                   (PREP input card 5 (KFORM))
              = 0  formatted file
              = 1  binary file

              KERES(I)   Name of the residue topology file (PREP input card 4)

                   Note: The external source for nonstandard residue(s) is
                   read until a blank card is encountered.  As many as
                   200 external residues can be read.
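
                   Because the fields are fixed-width, it can help to
                   generate these cards rather than type them.  The
                   Python sketch below lays out one card according to
                   FORMAT(A4,1X,I5,A40).  The function name and the
                   example residue LIG with file lig.prep are
                   hypothetical.

```python
def nonstandard_residue_card(name, filename, binary=False):
    """Lay out IERES, JERES, KERES as FORMAT(A4,1X,I5,A40)."""
    if len(name) > 4:
        raise ValueError("residue name is limited to 4 characters")
    if len(filename) > 40:
        raise ValueError("file name is limited to 40 characters")
    jeres = 1 if binary else 0  # 0 = formatted file, 1 = binary file
    # A4 left-justified, one skipped column, I5 right-justified, A40.
    return f"{name:<4} {jeres:5d}{filename:<40}"

card = nonstandard_residue_card("LIG", "lig.prep")
print(repr(card[:18]))
# prints: 'LIG      0lig.prep'
```

                   After the last such card, the blank card 3 is still
                   required.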

           -----------------------------------------------------------------------
              - 3 -      one blank card


           -----------------------------------------------------------------------

              - 4 -      ISYMDU

                       FORMAT(A4)

              ISYMDU     Symbol for the dummy atoms.
                   It is advisable to use 'DU' as the symbol for dummy
                   atoms as in the data base.  The symbol for the dummy
                   has to be unique for a given system.  It is not permitted
                   to have more than one dummy atom symbol.  Do NOT confuse
                   these dummy atoms with the dummy atoms used in Perturbation.

           -----------------------------------------------------------------------

              - 5 -      IWO , IWI , IWN , IWA

                   Output information written to file 'output' (unit 6)

                       FORMAT(10I5)

              IWO        Flag to output the coordinates for each atom
              = 0  coordinates will be output
              = 1  output will be suppressed

              IWI        Flag to output the residue information for all residues
              = 0  none
              = 1  output the information

              IWN        Flag to output non-bonded excluded lists for all atoms
              = 0  none
              = 1  output

              IWA        Flag to output bond, angle and dihedral pointers
              = 0  none
              = 1  output
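
                   A minimal Python sketch of card 5 follows, assuming
                   only the four flags described above are written
                   (the format allows ten I5 fields, but only four
                   are documented).  Note that IWO has the opposite
                   sense from the other three: 0 prints coordinates
                   and 1 suppresses them.  The function name is
                   hypothetical.

```python
def output_flags_card(iwo=0, iwi=0, iwn=0, iwa=0):
    """Lay out IWO, IWI, IWN, IWA as right-justified I5 fields."""
    flags = (iwo, iwi, iwn, iwa)
    for value in flags:
        if value not in (0, 1):
            raise ValueError("each flag must be 0 or 1")
    return "".join(f"{value:5d}" for value in flags)

# Suppress coordinates (IWO=1) but print the residue information (IWI=1).
print(repr(output_flags_card(iwo=1, iwi=1)))
# prints: '    1    1    0    0'
```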

           -----------------------------------------------------------------------

              - 6 -      Individual molecule information.  The program will
                   continue to read groups of cards described in this
                   section until a 'QUIT' card is encountered.  Each group
                   of cards represents one molecule.  Repeat cards 6A - 6F
                   for each molecule of the system.

           -----------------------------------------------------------------------

              - 6A -     SUBTITLE FOR THE MOLECULE TO BE READ

                       FORMAT(20A4)

              TITLE      Subtitle for the molecule


           -----------------------------------------------------------------------

              - 6B -     LBMOL , ICROSL , ICONN , NM0 , NA0 , IFTPRO

                       FORMAT(A1,I4,4I5)

              LBMOL      Label for the type of molecule.
                   This is necessary since some adjustments of the end
                   residues of nucleotides and peptides are made so
                   that the charge of the system is an integral value.
                   Consult the subroutine BLDIT in the LINK source code
                   for details.

              'D'  the molecule is a deoxy nucleotide
              'R'  the molecule is a ribonucleotide
              'P'  the molecule is a peptide or protein
              'O'  the molecule is anything else ('O' = other)

              ICROSL     Flag for the presence of cross links within the
                   molecule or between molecules.

              = 0  none
              = 1  cross link is present and its information will be read
                   after the residue sequence.

              ICONN      Read but not used.

              NM0        The molecule number to which the first main atom of
                   the present molecule is attached either by a covalent or
                   non-covalent bond. If there is no covalent attachment,
                   NM0 should be set equal to 1, as the first molecule defines
                   the space axes for all subsequent molecules.
                  *** This is usually set equal to 1 ***

              NA0        The relative atom number in molecule NM0 to which the
                   current molecule's first main type atom is connected.
                   If there is no covalent connection NA0 should be set
                   equal to 3 as the space axes are defined by the first
                   three atoms of the first molecule.  If this is not done
                   there will be an error in converting from internal
                   coordinates to cartesian coordinates in the EDIT module.
                  *** This is usually set equal to 3 ***

              IFTPRO     Flag for the type of protein terminal residues

              = 0  Standard (uncharged) terminal residues
              = 1  Charged terminal residues ( NH3+, COO-)
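
                   The column layout of card 6B can be sketched in
                   Python as below.  The defaults NM0=1 and NA0=3
                   follow the "usually set" remarks above; ICONN is
                   written as 0 because it is read but not used.  The
                   function name is hypothetical.

```python
def molecule_card(lbmol, icrosl=0, nm0=1, na0=3, iftpro=0, iconn=0):
    """Lay out card 6B as FORMAT(A1,I4,4I5)."""
    if lbmol not in ("D", "R", "P", "O"):
        raise ValueError("LBMOL must be one of 'D', 'R', 'P', 'O'")
    if icrosl not in (0, 1) or iftpro not in (0, 1):
        raise ValueError("ICROSL and IFTPRO must be 0 or 1")
    # LBMOL in column 1, ICROSL in columns 2-5, then four I5 fields.
    return f"{lbmol}{icrosl:4d}{iconn:5d}{nm0:5d}{na0:5d}{iftpro:5d}"

# A protein with cross links (for example disulfides) and charged termini.
print(repr(molecule_card("P", icrosl=1, iftpro=1)))
# prints: 'P   1    0    1    3    1'
```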

           -----------------------------------------------------------------------

              - 6C -     Residue information for the current molecule.  It is read
                   in the following format until a blank card is encountered.


                   LBRES(I) , ITYPF(I) , I = 1, NRESM

                       FORMAT(16(A4,I1))

              LBRES(I)   Residue name

              ITYPF(I)   The type of force field for the current residue.
                   This option is included so that different types of residues
                   may be kept in the data base.  Currently three types are
                   available; the united atom type, the all atom type, and
                   Jorgensen's OPLS model.
                   Additional models could be put into the data base and
                   retrieved using this option.

              = 1  united atom model
              = 2  all atom model
              = 3  OPLS united atom model (requires use of OPLS force field file)

                   If this is zero then the previous non zero value is carried
                   over until a non zero option is specified.

                   NOTE: The program assumes it is done reading the residue
                         information when it encounters a blank card.  It is
                         always assumed that the first main type atom of the
                         current residue is connected to the last main type atom
                         of the preceding residue by a covalent bond.  If this
                         covalent linkage is not desired the two residues should
                         be separated by the spacer residue, '***'.  When two
                         residues are separated by the spacer residue they are
                         connected without a covalent linkage.

                         For example the sequence ALA 1GLY will be connected by
                         a covalent bond while ALA 1***  GLY will be topologically
                         connected but no bonding parameters will be considered
                         (the ITYPF column after *** is left blank).  Both systems
                         in this example use the united atom force field.
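
                   The (A4,I1) layout is easy to misalign by hand, so
                   a small Python sketch of card 6C is given here.  It
                   writes ITYPF on the first residue only and leaves
                   the later ITYPF columns blank, relying on the
                   carry-over rule above and on the assumption that a
                   blank I1 field is read as zero.  The function name
                   is hypothetical.

```python
def residue_cards(residues, itypf=1):
    """Lay out residue names as FORMAT(16(A4,I1)), 16 residues per card."""
    fields = []
    for i, name in enumerate(residues):
        if len(name) > 4:
            raise ValueError("residue name is limited to 4 characters")
        # ITYPF only on the first residue; blank afterwards (carried over).
        flag = str(itypf) if i == 0 else " "
        fields.append(f"{name:<4}{flag}")
    # Split into cards of at most 16 five-column fields (80 columns).
    return ["".join(fields[i:i + 16]) for i in range(0, len(fields), 16)]

print(repr(residue_cards(["ALA", "GLY"])[0].rstrip()))
# prints: 'ALA 1GLY'
print(repr(residue_cards(["ALA", "***", "GLY"])[0].rstrip()))
# prints: 'ALA 1***  GLY'
```

                   The blank card 6D must still follow the last
                   residue card.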

           -----------------------------------------------------------------------
              - 6D -     Blank card to terminate residue input
           -----------------------------------------------------------------------

              - 6E -     This section is used to create cross linkages within a
                   molecule or between different molecules. A normal use of
                   this is to create disulfide bonds in proteins.

                       ***** ONLY IF ICROSL.GT.0 *****
                       *** ICROSL is set in card 6B ***

                   ICROS , JCROS , IACROS , JACROS , MOLNM

                       FORMAT(2I5,2A4,I5)

              ICROS , JCROS


                   The residue numbers, as they are listed in 6C, of the two
                   residues to be cross linked.  ICROS is the relative residue
                   number in molecule MOLNM (the current molecule unless MOLNM
                   is given at the end of this card).  JCROS is always a
                   residue in the current molecule.

              IACROS     The graph name, the name assigned to the atom in PREP, of
                   the atom in residue ICROS which is involved in the cross link.

              JACROS     The graph name of the atom in residue JCROS which is
                   involved in the cross link.

              MOLNM      The molecule number to which the residue ICROS belongs.
                   If MOLNM = 0 then it is assumed to be in the current
                   molecule.
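
                   A cross link card can be laid out with the Python
                   sketch below.  The residue numbers 6 and 127 are
                   invented for illustration, and the graph name SG
                   for the cystine sulfur is an assumption that should
                   be checked against the database entry for CYX.  The
                   function name is hypothetical.

```python
def cross_link_card(icros, jcros, iacros, jacros, molnm=0):
    """Lay out card 6E as FORMAT(2I5,2A4,I5)."""
    if len(iacros) > 4 or len(jacros) > 4:
        raise ValueError("graph names are limited to 4 characters")
    # Two right-justified I5 residue numbers, two left-justified A4
    # graph names, then MOLNM (0 means the current molecule).
    return f"{icros:5d}{jcros:5d}{iacros:<4}{jacros:<4}{molnm:5d}"

# Disulfide between residues 6 and 127 of the current molecule.
print(repr(cross_link_card(6, 127, "SG", "SG")))
# prints: '    6  127SG  SG      0'
```

                   One such card is written per cross link, followed
                   by the blank card 6F.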

           -----------------------------------------------------------------------
              - 6F -     Blank card if ICROSL .GT. 0
           -----------------------------------------------------------------------

              - 7 -      KSTOP

                       FORMAT(A4)

              KSTOP      Control to exit from the program.  A 'QUIT' card must
                   appear immediately following the last blank card.  If additional
                   molecules are to be processed the cards in group 6 are
                   repeated before the 'QUIT' card.  The program will never
                   make a graceful exit if this card is missing since it
                   is working inside an infinite loop.
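
                   Putting the cards together, the following Python
                   sketch assembles a complete input deck for the
                   simplest case: no nonstandard residues and no cross
                   links, so section 2 is empty and cards 6E and 6F
                   are omitted.  The function name, the titles and the
                   choice of flags are illustrative assumptions, and a
                   blank ITYPF column is assumed to be read as zero.

```python
def build_link_input(title, molecules, itypf=1):
    """Assemble a LINK input deck for molecules without cross links.

    molecules is a list of (subtitle, lbmol, residue_names) tuples.
    """
    cards = [
        title[:80],              # 1a: title, FORMAT(20A4)
        "",                      # 1b: blank card, read but ignored
        "",                      # 3:  blank card ending the empty section 2
        "DU",                    # 4:  dummy atom symbol, FORMAT(A4)
        "    0    0    0    0",  # 5:  IWO, IWI, IWN, IWA
    ]
    for subtitle, lbmol, residues in molecules:
        cards.append(subtitle[:80])  # 6A: subtitle
        # 6B: LBMOL, ICROSL=0, ICONN=0, NM0=1, NA0=3, IFTPRO=0
        cards.append(f"{lbmol}{0:4d}{0:5d}{1:5d}{3:5d}{0:5d}")
        # 6C: (A4,I1) fields, ITYPF given on the first residue only
        fields = [f"{name:<4}{itypf if i == 0 else ' '}"
                  for i, name in enumerate(residues)]
        for start in range(0, len(fields), 16):
            cards.append("".join(fields[start:start + 16]).rstrip())
        cards.append("")             # 6D: blank card ends the residues
    cards.append("QUIT")             # 7: KSTOP
    return "\n".join(cards) + "\n"

deck = build_link_input(
    "Capped dipeptide",
    [("ACE-ALA-GLY-NME", "P", ["ACE", "ALA", "GLY", "NME"])],
)
print(deck, end="")
```

                   For a DNA duplex, two tuples with LBMOL 'D' would
                   be passed, one per strand.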

           -----------------------------------------------------------------------

                        ++++++ END OF INPUT ++++++


         Rev A Revision:    George Seibel
         LINK 3.0 Authors:  U.C. Singh and P.K. Weiner
         Director:          P.A. Kollman
                      Department of Pharmaceutical Chemistry
                      School of Pharmacy
                      University of California
                      San Francisco CA 94143
                      Phone  (415) 476 4637