1.  Carnal


                              Wilson S. Ross
                  Department of Pharmaceutical Chemistry
                  University of California, San Francisco
                             ross@cgl.ucsf.edu
                            ross@ucsfcgl.bitnet

      Usage:


           carnal [-O] < analin > analout
           carnal [-O] -i analin -o analout -p parm


      -O   Overwrite output files.

      CARNAL is a new coordinate analysis program that allows compar-
      isons between multiple streams of coordinates using a  flexible
      command  language.  It has many of the capabilities of ANAL and
      MDANAL, which it is intended to  eventually  replace.  It  also
      provides  set-theoretic  group  specification, cartesian vector
      oriented measurements, hbond analysis, output of  distributions
      (including  radial), selection of coordinate sets from streams,
      interpretations of md streams in terms of  windows  and  format
      conversion.

           ___________________________________________________________________
           |                                                                  |
           |                                Stage 1                Stage 2    |
           |                                (CARNAL)            (You supply)  |
           |                                                                  |
           |  ___________         stream(s)              tables,              |
           |  | Dynamics |----->|          ___________   coords               |
           |  -----------       |--------->| static & |                       |
           |                    |- - - - ->| dynamic  |---------> numerical   |
           |  ______________    |- - - - ->| analysis |            analysis   |
           |  | Minimization|-->|         /-----------            & display   |
           |  --------------             /   ^                                |
           |                            /    |                                |
           |               parm file(s)/     |                                |
           |                                 commands                         |
           |                                 (analin)                         |
           |__________________________________________________________________|


      1.1.  Contents

                   Introduction
                           Input
                           Output


                           Analin intro
                               Overview of analin sections
                               Simple analin example
                   Analin Syntax Specification
              More complex capabilities
                 HBOND
                 DISTRIBUTION
              Examples


      1.2.  Introduction

      CARNAL  input  is  essentially a programming language that lets
      one specify variables and perform operations on them.  The con-
      trol  input file for CARNAL is referred to as analin.  The com-
      mands in this file name the other input  and  output  files  as
      well  as  the measurements to be performed.  These commands are
      described in detail in the syntax  specification  and  examples
      below.  The -o file or standard output contains carnal's inter-
      pretation of the analin input and summary data for the run.

      Note that periodic boundary conditions are only applied to DIS-
      TRIBUTION  DIST  measurements. {CHANGE this eventually} This is
      because of difficulties in  measuring  across  2  streams,  but
      could  be  enabled  in  the  normal  case of measuring within a
      stream.


      1.2.1.  Input

      CARNAL takes as input one or more "streams"  of  input  coordi-
      nates,  which  are lists of restart, mdcrd, or Amber pdb format
      files.  The formats can be mixed in a  stream,  but  the  files
      must have the same number of atoms in the same order defined by
      the parm file for that stream. Each stream may have a different
      parm  file.   The -p argument or the first parm file defined in
      analin is used as the default parm file  if  no  parm  file  is
      specified for a stream.  At least one stream must be specified;
      the first one is used as the default when a stream reference is
      expected. CARNAL will also load static coordinate sets for com-
      parison with the individual sets in  the  streams.  It  detects
      formats  transparently.  NOTE: periodic boundary conditions are
      only applied to DISTRIBUTION DIST measurements; this is because
      of  difficulties  in measuring across 2 streams, but could (and
      at some point will) be enabled in the normal case of  measuring
      within  a  stream.   Compressed  input  files  ending in .Z are
      uncompressed transparently, allowing disk  space  to  be  saved
      (the disk version remains compressed).


      1.2.2.  Output

      CARNAL writes as output tables of measurements (scalar measure-
      ments or 0/1 hydrogen bond occupancies), distributions (includ-
      ing radial) and coordinate sets in mdcrd, restrt or pdb format.
      Summary data is also written to the  file  named  with  the  -o
      argument  on the command line or standard output if no -o argu-
      ment is given.


      1.3.  Analin introduction

      This introduction is intended to give the feel  of  the  analin
      language via an overview of the syntax and a simple example.  A
      complete syntax definition and more complex examples are  given
      below.


      1.3.1.  Summary of Analin Sections

      These  are  the required sections in the analin input file syn-
      tax.  Comments follow '!'s.  In actual analin files, a  '#'  at
      the  beginning  of  a line turns it into a comment. There are 4
      main sections, each begun by a keyword in this order:


           FILES_IN           ! name parm, coord/restrt/pdb files
           FILES_OUT          ! name output tables, coord/restrt/pdb dumps
           DECLARE            ! describe items to be measured
           OUTPUT             ! direct declared stuff to output files
           END                ! end input, start execution: STOP may be
                              ! substituted for debugging: program stops


           Things are declared in the first 3 sections and referenced
      in  the last 2 sections. When something is declared it is given
      an id for referencing it later.


      1.3.2.  A Simple Analin Example

      Select some coord sets from mdcrd files and output them in  pdb
      format:


           FILES_IN
             PARM   p1 ketop;           ! keyword, id, filename
             STREAM s1 kecrd kfcrd;     ! keyword, id, 2 filenames
           FILES_OUT
             COORD  c1 /tmp/ke.p PDB;   ! keyword, id, filename, output format
           DECLARE
                                        ! no declarations for this simple case
           OUTPUT


             COORD  c1 s1 SELECT (1 3 5 200);
                                        ! keyword, files_out id, files_in id:
                                        ! command to select sets 1, 3, 5, 200
           END


      In this case, coord sets 1, 3, 5, and 200 from the concatenated
      stream of mdcoord files kecrd and kfcrd will be output  to  pdb
      files /tmp/ke.p.1 /tmp/ke.p.3 /tmp/ke.p.5 /tmp/ke.p.200.


      1.3.3.  Analin Syntax Specification

      Notes

      Aspects  to be changed are indicated by 'CHANGE:'.  Definitions
      must precede references: you cannot refer to something  defined
      later  in  the  file. Characters reserved for explicit purposes
      are:


                 #  -  .  %  (  )  &  |  ,  !  ?


      A '#' as the first character of a line makes the  line  a  com-
      ment.  The  format  is  entirely  free, i.e.  statements can be
      spread across multiple lines with any indentation and with com-
      ment  lines  embedded. Lines may not exceed 80 characters - re-
      read the preceding sentence if you think that causes a problem.
      In  the  syntax  below,  items  in [] brackets are optional and
      items within {} braces are descriptions rather  than  token-by-
      token matchings.

      _______________________________________________________________

          ---FIRST SECTION = "FILES_IN"

          Input coordinates may be MD crd  dump,  inpcrd/restrt  or
          Amber output pdb format.

      FILES_IN

          PARM  id  filename;
              Amber  Parm  file. Multiple parms can be defined; the
              1st one (or one defined by the  [-p  parm]  argument)
              becomes  the default for STREAMs that don't specify a
              parm.

          STREAM  strid  [parmid]
                  [NOBOX] [ATOM n] [WIN x y] file1 file2 ... ;
              At least 1 STREAM must be specified. STREAM files are
              read  sequentially  at  each step. If > 1 STREAMs are
              named, they can be compared  at  each  step.   If  no


              parmid  is  given,  the  first one defined is used by
              default. The NOBOX and ATOM options allow  CARNAL  to
              handle  certain discrepancies between the parm topol-
              ogy and the input stream.  If these options are inac-
              curate,  synchronization  may  be  lost, resulting in
              garbage. ATOM is for reading in  a  stream  that  has
              fewer  atoms  that  the  prmtop - such a stream might
              have been created earlier using the  ATOM  option  in
              the  COORD  section.  All coordinate sets in a stream
              must have the same number of atoms.

              NOBOX
                Mdcrd files for periodic simulations have box coor-
                dinates after each coordinate set. Carnal automati-
                cally detects the presence of  periodic  conditions
                from  the  parm topology and allows for reading the
                box coordinates  in  mdcrd.  However,  minimization
                restrt  files  (as  well  as  constant volume mdcrd
                files previous to 4.1) do not include the box coor-
                dinates.   The  NOBOX  option allows carnal to read
                min.rst and old constant volume  mdcrd  files  cor-
                rectly.
              ATOM n
                Read  only n atoms (more may be defined in the parm
                file).  I.e. coordinates for only n  atoms  are  in
                the  stream. Implies NOBOX, i.e. if a box is speci-
                fied in prmtop, it is ignored.
              WIN
                Means, "skip x sets, use y sets"  repeatedly.  This
                is  for  analysis  of periodic equilibration / data
                collection runs such as gibbs.

          STATIC  statid  [parmid]
                  [NOBOX] [ATOM n] file1 file2 ... ;
              STATIC files are read at the beginning and remain  in
              memory  for comparison with STREAM coordinates.  Each
              static set in an id  can  be  referenced  by  'id%1',
              'id%2' etc. See STREAM above for "NOBOX" and "ATOM n"
              descriptions.

      _____________________________________________________________

          ---SECOND SECTION = "FILES_OUT"

      FILES_OUT

          TABLE  tabid  filename ;
              In the tables, there is one "logical row" per coordi-
              nate set measured, so a given measurement over a tra-
              jectory  occupies a column. For example, the Nth item
              directed to the table (in the OUTPUT section,  below)
              would  form the Nth numerical column and the Ith mea-
              surement of that value would be in  the  Ith  logical


              row.  The logical rows are wrapped so that a row con-
              tinues through a series of lines  in  a  single  file
              beginning  with  keys  of columns by grepping for the
              key letter. If there is demand, rows can  instead  be
              spread  across  multiple files (filename.0 filename.1
              ...)  or  just  tabbed  continuously  within  a  file
              (harder to read visually).  Thus, the format is:


                     L0   m1    m2    m3    m4    ...   |
                     L1   m11   m12   m13   m14   ...   |   1st coord set
                     L2   m21   m22                     |
                     L0   m1    m2    m3    m4    ...    |
                     L1   m11   m12   m13   m14   ...    |  2nd coord set
                     L2   m21   m22                      |


              where  the measurements are m1, m2, ... m22 extending
              over a logical line consisting of lines  'L0',  'L1',
              and 'L2'.

          COORD  crdid file [APPEND] [BLANK] [format] ;
              APPEND
                Add to end of named file if it exists.
              BLANK
                Write a blank line after each set.

              Format symbols are 'PDB' 'RST' and 'CRD'. The default
              format is CRD.

          HBOND  hbid  base_file [TABLE] [LIST];
              TABLE
                Output table of occupancies in base_file.tab.  This
                table  has  two  sections.  The first part is a key
                that lists the possible hbonds in order. The format
                is:


                       # 1  (ADE  2 O5' )--(HB   1 H   ) .. (ADE  2 O1' )
                       # 2  (ADE  2 O5' )--(HB   1 H   ) .. (ADE  2 N7  )


                The  second  part  consists  of a matrix of 0's and
                1's.  Each column is for a given hbond pair accord-
                ing  to  the numbering in the key section, and each
                row (line) is for a coordinate set.  The format  is
                '0'  if  no  hbond  is happening, "1" if it is. The
                Unix 'awk' utility can be used to extract column(s)
                of interest for further occupancy analysis or plot-
                ting, e.g. "egrep -v '^#' |  awk  '{print  $2,  $5,
                $8}'   base_file.tab"  where  the  'egrep'  command
                strips out the key section and  the  "awk"  command
                selects the columns of interest. Note that if there


                are too many columns for awk to handle, the  "perl"
                utility may be needed.
              LIST
                Output  list  of per-hbond-per-set to base_file.lis
                for extensive analysis. The format is:


                       1 ( ADE  2 N6   )( THY  5 O4   ) 2.930768 9.125721
                       2 ( ADE  2 N6   )( THY  5 O4   ) 2.957820 3.151730


                where the 1st number is the number of  the  coordi-
                nate  set  (starting  with  1),  followed by donor,
                acceptor, distance and angle (in  radians).   Atoms
                are  specified by residue name, residue number, and
                atom name.  (See  OUTPUT  HBOND  STAT  for  summary
                hbond info, including fraction of occupancy.)

              HBOND specifications are given in the OUTPUT section.
              Summary info is printed to standard output.
              CHANGE someday: at least one of {TABLE, LIST} must be
              given,  and the OUTPUT HBOND statement is req'd to do
              any hbond analysis.

          DISTRIBUTION  dbid  filename [DAP][MIN];
              DAP
                Put number of intervals on 1st line.
              MIN
                For DIST option, below. For  each  'solvent'  atom,
                write  out  min  distance  to  'solute' for the run
                (multiple lists separated by a '%'  are  output  if
                WIN  is  chosen  with  DIST). This is to figure out
                which waters to  keep  in  a  second  pass  dumping
                COORDs. List output goes to filename.min. See Exam-
                ple #.
              The definition of contents of the file  is  described
              in the OUTPUT section.

      _____________________________________________________________

          ---THIRD SECTION = "DECLARE"

          Each object is bound to a crd set; if not  bound  explic-
          itly,  the default is stream 0. References to that object
          inherit the binding of the object, except  for  within  a
          GROUP statement (GROUP (GROUP id)...). I.e. in general an
          optional [crdset] is not allowed after a declared  id  is
          used  (binding  at  reference  rather than creation), for
          now.

          "Points" can be atoms (specified by "number [crdset]"  or
          "atom_name residue_number [crdset]") or centers of geome-
          try or mass of groups  of  atoms  (see  GROUP  definition


          below).   For example, "PLANE id 12 34 58;" specifies the
          plane formed by atoms  12,  34  and  58  in  the  default
          stream,  and  "PLANE id OD1 2 ND1 4 OD1 6;" specifies the
          plane formed by atom name OD1 in residue 2, etc.   "PLANE
          id  gid1  gid2  gid3;"  specifies the plane formed by the
          centers of geometry of three groups.

      DECLARE

          ----Group is defined by set theoretic operations.   Group
          attributes  include center of geometry or mass, moment of
          inertia, and radius of gyration.

          GROUP  id  [crdset] (((set OP set) OP set) ...) ;
              The group is defined on  the  default  stream  unless
              [crdset] is given.  The nesting in parentheses deter-
              mines the order of evaluation.

              OP can be either "&" or "|"
                 where "&" = intersection, "|" = union
              and set can be any one of: {  (ATOM numlist),
                    (ATOM [NAME|TYPE] name_list),
                    (RES numlist),
                    (RES NAME name_list),
                    (SOLUTE)
                    (GROUP name_list_of_groupids),
                    !set  }
              In the (GROUP ) set, the groups  are  OR'd  together.
              The  NAME  and TYPE options allow the use of "?" as a
              wild card matching any single character.   "!set"  is
              the  negation  of  the  set, e.g. "!(ATOM TYPE H H?)"
              would be all  atoms  except  hydrogens  (unless  some
              hydrogen type did not begin with "H").

              Allowing expressions:

                groupid%center
                  center  of  geometry - default if groupid is used
                  as a point
                groupid%cmass
                  center of mass
                groupid%momin
                  moment of inertia
                groupid%radgyr
                  radius of gyration

          CUTRES id groupid cut;
              CUTRES  produces  a  list  in  AMBER   GROUP   format
              (Appendix B) of all residues with any atom within cut
              of the given groupid, including the residues  in  the
              group itself.  It is intended primarily for analyzing
              a single coordinate set to generate a group of  atoms
              within  an  area  of interest that will be allowed to


              move in a belly dynamics or perturbation  simulation.

          AXIS id {2 points} ;
              AXIS axid1 1 2 ;
                Atoms 1 -> 2 in stream 0 by default.
              AXIS axid1 1 st1id 2 st2id ;
                Atom   1   in   stream/static  st1  ->  atom  2  in
                stream/static st2.
              AXIS axid2 grp1id grp2id%cmass;
                grp1 center of geometry to  grp2  center  of  mass:
                note  that  the  groupids may be the same or groups
                may be defined on different streams.

          PLANE  plid { 3 points or 2 axes } ;
              A plane is treated as its normal vector where  appro-
              priate.

          ANGLE  angid { 3 points or 2 axes/planes };
              Planes are interpreted as normal vectors.

          TORSION  tid { 4 points or 3 axes/planes };
              Planes  are interpreted as normal vectors.  Note that
              in the averages printed to standard output, "circular
              statistics"  are  used  (discussion  at end of carnal
              section).

          TORSION  tid BACKBONE [residue1 [residue2] ] [crdset] ;
              Find all torsions involving backbone  bonds  (between
              Amber  main  chain  atoms),  starting  with  residue1
              (default:  1st  residue)  and  ending  with  residue2
              (default: last residue).  If first and last residues'
              terminal backbone atoms are  bonded  to  each  other,
              torsions involving them are included.

          DIST  dsid  { 2 of: points, axes, planes };
              Select  2 points, 2 axes, or point and axis or plane.
              [planes and axes are not supported yet]

          IMAGE imageid groupid ;
          IMAGE imageid groupid%cmass ;
              Only for periodic systems.  Place the system so  that
              groupid  (center of geometry or center of mass) is at
              the center of the box, and image all residues accord-
              ingly.  Cleans  up Ewald runs. Uses groupid's stream.
              Imageid can be used as a  streamid  in  measurements.
              NOTE:  this  is  not  guaranteed to give the desired,
              intact system on the first try - it  depends  on  the
              transformations  that  Ewald has made and the size of
              the box. For example, the center of geometry of a DNA
              duplex  could  be  in  the center of the box, but the
              strands could be on the edges. Successive transforma-
              tions  using  trial  and  error  may  be  required to
              restore an Ewald trajectory to normal appearance, and


              different transformations may be required for differ-
              ent frames. Also, intact molecules can be  broken  up
              by  this option if centering makes some residues pro-
              ject out of the box (possibly an indication that  the
              box size used was too small).

          RMS rmsid  [FIT]  groupid ;
          RMS rmsid  [FIT]  groupid  streamid ;
          RMS rmsid  [FIT]  groupid  streamid  refcrdid ;
          RMS rmsid  [FIT] groupid  staticid [ATOM] [RES];
          RMS rmsid  groupid2  prevrmsid ;
              Using atoms in groupid, measure rms of one coordinate
              set to another.  If  FIT  is  selected,  the  current
              coordinate  set is first positioned for minimum mass-
              weighted rms of the group on  the  reference  coordi-
              nates;  this  also  allows  the  rmsid  to be used to
              determine the resulting (non-fitted) RMS  measurement
              on other groups (as in the prevrmsid case above), and
              the FITted rmsid can be used like a stream for  other
              measurements,  as  well  as output via a COORD state-
              ment.

              The first and simplest case  above  uses  groupid  to
              compare the default stream to its first set. The sec-
              ond case compares a named  stream  (rather  than  the
              default)  to its first set.  The third case specifies
              both the stream and a reference set  for  comparison;
              this  reference set could be a static (single) set or
              another stream (comparing  successive  sets  in  each
              stream).

              The  fourth  example,  using  a STATIC id, produces a
              (triangular) matrix of RMS values, one for each  pair
              of coordinate sets in the STATIC id. The ATOM and RES
              options additionally cause per-atom  and  per-residue
              RMS  to  be  reported.  All  output is calculated and
              printed immediately, since  it  does  not  depend  on
              reading  a  serial  stream.   The matrix format for a
              3-coordinate-set STATIC would be:


                   ---RMS MATRIX
                   set0 set1 value0
                   set0 set2 value1
                   set1 set2 value2
                   ---


              NOTE: use of FIT with this option leaves all  coordi-
              nate  sets in the STATIC set with the group center of
              mass placed at the origin and each successive coordi-
              nate set rotated to fit its predecessor.


              The  final case measures the rms of a second group on
              a pair of sets that were positioned by a previous RMS
              FIT  statement.  See OUTPUT TABLE for instructions on
              obtaining per-residue and per-atom rms  for  streams.
              Any  number of any of these types of RMS measurements
              can be used.

          DME dmeid groupid ;
          DME dmeid groupid streamid ;
          DME dmeid groupid streamid refcrdid ;
          DME dmeid groupid staticid;
              Using atoms in groupid, measure Distance Matrix Error
              between  one coordinate set and another. DME compares
              intra-group distances in one conformation to those in
              another conformation.

            DME=sqrt(2/(N*(N-1))*__________(distance-refdistance))

              In  proteins, the convention is to use DME for groups
              defined on C-alpha atoms.   The  first  and  simplest
              case above uses groupid to compare the default stream
              to its first set. The second case  compares  a  named
              stream  (rather  than  the default) to its first set.
              The third case specifies both the stream and a refer-
              ence  set for comparison; this reference set could be
              a static set or another stream.   The  final  example
              produces a (triangular) matrix of DME values, one for
              each pair of coordinate sets in the  STATIC  id.  The
              format for a 3-coordinate-set STATIC would be:


                   ---DME MATRIX
                   set0 set1 value0
                   set0 set2 value1
                   set1 set2 value2
                   ---


          PUCKER          pukid          NUCLEIC         [streamid]
          [residue_names|residue_numbers] ;
          PUCKER pukid number_of_points points ;
              Measure pucker using algorithm of D. Cremer and J. A.
              Pople  (JACS  96:6  pp  1354-1358,  1975).   For  the
              NUCLEIC options, the Altona and  Sundaralingham  con-
              vention  (JACS  94  pp  8205-8212,  1972  or p. 20 of
              Saenger's "Principles  of  Nucleic  Acid  Structure",
              Springer-Verlag,  1983)  is approximated by adding 90
              degrees to the phase angle, the  puckers  are  always
              ordered  according to residue order in the parm file,
              and the standard atom names (O4'/O1', C1', C2',  C3',
              C4')  are  used to determine the points.  (For a com-
              parison of different nucleic acid pucker conventions,
              see  S. C. Harvey and M. Prabhakaran, JACS 108:20, pp


              6128-6136, 1986.)  In the  general  case  (specifying
              points explicitly), a ring of N points can be parame-
              terized  by  N-3  alternating  amplitudes  and  phase
              angles.  Note that the Cremer/Pople algorithm finds a
              mean plane based on the  assumption  that  successive
              points  in  the ring have the same angle between them
              with respect to the center of geometry, so for  kinky
              rings  this  may not work.  Note that in the averages
              for angles  printed  to  standard  output,  "circular
              statistics"  are  used  (discussion  at end of carnal
              section).
                  PUCKER pukid NUCLEIC;
                    Measure pucker of all  standard  residues  ('94
                    force  field:  G5,  G,  G3, GN, etc.; '91 force
                    field: GUA, etc.) in default stream using stan-
                    dard  atom names (O4'/O1', C1', C2', C3', C4').
                  PUCKER pukid NUCLEIC streamid;
                    Same as above, but uses specific stream  rather
                    than default.
                  PUCKER pukid NUCLEIC GUA;
                    Measure  pucker  of  all  residues  named "GUA"
                    using standard atom names.
                  PUCKER pukid NUCLEIC 2,4,6,8 ;
                    Measure pucker of residues 2,4,6, and  8  using
                    standard atom names.
                  PUCKER pukid 5 O1' 2 C1' 2 C2' 2 C3' 2 C4' 2 ;
                    Measure  pucker  of  5 points: O1' (residue 2),
                    C1' (residue 2), etc.

      _____________________________________________________________

          ---FOURTH SECTION = "OUTPUT"

          Define what declares go to things defined in FILES_OUT.

      OUTPUT

          TABLE  tbid { column_list } ;
              At  least  one  column must be specified. Columns are
              printed in their order in the list.  Column_list  may
              include ids, classes of measurement (e.g. DIST) which
              print in the order declared, MEAS  which  prints  all
              scalar  measurements, or ALL which prints everything.
              AXIS ids result in vectors, PLANE ids in normals, and
              GROUP  ids  default  to  center  of  geometry  unless
              attributes are specified, such  as  grpid%cmass.  RMS
              ids   default   to   the  rms  of  the  group,  while
              rmsid%residues and rmsid%atoms give  per-residue  and
              per-atom  rms  respectively. For per-residue rms, the
              group must not have any partially-included  residues.
              If  either  per-residue or per-atom options are used,
              the statistics are printed in the  summary  with  the
              residue and atom names.


          COORD crdid streamid
                [SELECT (-i j k,l m-p q-)] [MOD h]
                [BOX]  [AVERAGE]  [ATOM  n]  [GROUP  gid]  [EXH2O m
                [GROUP gid]] [INH2O gid] ;
              SELECT (-5 7 8,10 100-105 200-)
                Select certain coordinate sets from the  stream  by
                order.   The  order  in  a stream starts with 1 and
                continues across however many  files  comprise  the
                stream.  Numbers are separated by spaces or commas,
                and "-" is used to indicate ranges. In  this  exam-
                ple,  select  sets 1 through 5, then sets 7, 8, and
                10, then  sets  100  through  105,  then  sets  200
                through  the  end.   Note  that this option selects
                coordinate sets  for  output  only,  and  does  not
                affect  measurements  on  the stream, as opposed to
                the STREAM WIN option, which pre-selects  sets  for
                all the other commands.
              MOD h
                Select  every  h'th set from the stream.  Note that
                this option selects files for output only, and does
                not  affect  measurements on the stream, as opposed
                to the STREAM WIN option,  which  pre-selects  sets
                for all the other commands.
              BOX
                Print  box coordinates after atoms (not allowed for
                PDB format).
              AVERAGE
                Average the coordinates. Not compatible with EXH2O,
                but  ok with ATOM, GROUP and INH2O. Produces a sin-
                gle set, so PDB or RST format is  advised  for  the
                corresponding  FILES_OUT  COORD  declaration.  Sug-
                gested that this be applied to an RMS FIT  streamid
                so that the area of interest has minimal distortion
                from drift or pressure scaling of  the  box.  Note:
                The  averaged coordinates may be a chemically unre-
                alistic hybrid of different regions of phase space,
                so  visual inspection, energy analysis, and perhaps
                energy minimization may be in order,  depending  on
                the  purpose.  As a simple example, the methane C-H
                bond length shortens due to H rotation.  Note 2:  a
                measurement on averaged coordinates is not the same
                as the average of the  same  measurement  over  the
                same trajectory.
              ATOM n
                Output  only  the  first  n  atoms. ATOM, INH2O and
                EXH2O are mutually exclusive options.
              GROUP gid
                Output only the atoms in the GROUP. When PDB format
                is  chosen, residues retain their original numbers.
              EXH2O m [GROUP gid]
                Omit all but  m  waters  from  the  set,  retaining
                either  those  closest  to the non-waters, or those
                closest to the atoms specified by  GROUP.  Distance


                is  measured  from water oxygen. Waters are printed
                in order of closeness to the target, i.e. the order
                varies  from  set to set, so identity-based dynamic
                graphics smoothing schemes  will  fail.  Cannot  be
                used with INH2O.
              INH2O gid
                Omit all waters from the set except those with atom
                type OW in group gid, where gid contains only  OWs.
                This  group is intended to be built with the output
                of a previous pass  using  DISTRIBUTION  MIN  which
                informs  the  user how close each water came to the
                area of interest during the run.  When  PDB  output
                is  specified,  water  residues keep their original
                numbers.  See example section below.

          HBOND hbid [DONOR [EXACT] g1] [ACCEPTOR [EXACT] g2]
                [DISTANCE x] [ANGLE y] [STATS];
              "hbid" is an id from FILES_OUT, above.
              DONOR and ACCEPTOR indicate group ids  for  searching
              for  the  appropriate atoms (note: donor is the heavy
              atom to which the hydrogen is attached). The  default
              for  either  is  all  atoms. A single group id may be
              given in place of  separate  definitions.   If  DONOR
              and/or  ACCEPTOR  are  specified,  the  EXACT  option
              forces carnal to use all heavy atoms that may  apply,
              instead of just the "classic" ones such as oxygen and
              nitrogen.
              DISTANCE
                Cutoff distance  in  angstroms  between  the  heavy
                atoms: default is 4.0.
              ANGLE
                Cutoff H-donor-acceptor angle in degrees (0 is lin-
                ear): default 1 radian ~= 60 degrees.
              STATS
                Directs printing of per-hbond summaries to standard
                output. The format is:


                       HBOND h1 stats:
                         # 19 (ADE  2 N6  )_(ADE  2 HN6A)..(THY  5 O4  ) % 64.400000
                              distance    avg: 2.909575  max 2.961379   min 0.000000
                              angle(deg)  avg: 7.241544  max 15.219878  min 0.000000


                where  the  #  refers to the column of file.tab and
                the "64.400000" gives the percentage of  occupancy.
                The  other  statistics  are only for the "occupied"
                states. The distance is between donor and  acceptor
                atoms.

          DISTRIBUTION dsid
                { RAW | min max nboxes [WIN nsets] }
                { measid | DIST group1 [ group2 [ALL]]  [NORM]  } ;


              Distribution output can be either RAW (a long line of
              ascii  floating  point numbers per coordinate set) or
              binned and normalized.  If  the  latter,  the  WINdow
              option  causes  the  distribution for each nsets sup-
              plied by the STREAM to be written, with a "%" line to
              separate each window.

              RAW  is  the  recommended option for large data sets,
              since the proper range and number of bins are hard to
              guess  at:  the  rdis  program can be used on the raw
              output to quickly try various min/max/nboxes  numbers
              on  the  raw  file.  Note, however, that measurements
              including many terms can generate files  larger  than
              the original trajectory, so the RAW option may not be
              appropriate in such cases.  For example, when measur-
              ing  O-O distributions in a system of N waters, there
              are 9N numbers per coordinate set, but N(N-1)/2  dis-
              tances  to  write out if RAW is used. For N=1000 this
              amounts to a RAW file 55 times larger than  the  tra-
              jectory.

              Either  an  id  for  a scalar measurement or a radial
              distance distribution (DIST) may  be  specified.   In
              the  latter case, one or two groups can be specified.
              A single group may be given, in which case all intra-
              group  distances  are used (this would be appropriate
              for  e.g.  water  O-O);  otherwise,  two  groups  are
              required. When just two groups are given (without the
              ALL option), the groups are treated as  "solute"  and
              "solvent"  respectively: for each "solvent" atom, the
              distance to the closest "solute" atom is applied. The
              ALL   option  includes  all  group1-group2  distances
              rather than the "solvent" to closest  "solute"  atom.
              Note  that  when  two  groups overlap, distances of 0
              would be obtained for the  atoms  that  are  in  both
              groups, so groups should be disjunct.  The MIN option
              from the FILES_OUT section above is  only  valid  for
              the plain, two-group mode.

              Volumetric NORMalization is optional when radial DIS-
              Tribution is selected.  This  only  makes  sense  for
              measuring  the distribution around a single atom or a
              set of chemically identical atoms, since the  normal-
              ization is done by dividing the count for each inter-
              val by the volume of a spherical shell  having  radii
              equal  to  the  shell  boundaries.  (To normalize the
              distribution around a functional group, for  example,
              would  require  calculating  the  volumes of the non-
              spherical shells around the group.)

              The   output   format    is:    "bin_center     value
              smoothed_value   integral,"  where  bin_center is the
              center of the interval of the bin (the first is min +


              0.5  *  (max-min)/nboxes), the value is the distribu-
              tion for that bin, and the integral is the cumulative
              sum at the current bin.  The integral is not affected
              by the NORM option.

      1.4.  Examples

      ======Simple Coordinate Averaging


           #Simple Coordinate Averaging
           FILES_IN
             PARM   p1 ketop;           ! keyword, id, filename
             STREAM s1 kecrd kfcrd;     ! keyword, id, 2 filenames
           FILES_OUT
             COORD  c1 /tmp/ke.p PDB;   ! keyword, id, filename, output format
           DECLARE
                                        ! no declarations for this simple case
           OUTPUT
             COORD  c1 s1 AVERAGE;
                                        ! keyword, files_out id, files_in id:
                                        ! command to average sets
           END


      ======Simple Distance, Angle, Torsion Measurements

      Note the semicolons terminating each statement!


           # Plain measurements involving points (atoms, centers of mass)
           FILES_IN
             PARM p1 prm.top;
             STREAM s1 a1.trj a2.trj a3.trj;
           FILES_OUT
             TABLE tab1 meas.tab;
           DECLARE
           #
           #    First, some measurements using atoms only: format is:
           #       FUNC_NAME ID atom_specs ;
           #    each atom_spec can be either:
           #       ATNAME RESNUM
           #    or
           #       ATNUM
           #
             DIST dist1 O1' 2 O1' 7;
             ANGLE ang1 2 12 13;
             TORSION tor1 C1 4 C2 4 C3 4 C4 4;
           #
           #    A special case for torsions:
           #
             TORSION tor2 BACKBONE;
           #              ^ find all torsions consisting completely of


           #                main chain atoms
           #
           #    Now some more geometrical stuff: the angle between
           #    the normal vectors of two planes:
           #
             PLANE pla1  C1 4 C2 4 C3 4;
             PLANE pla2  C1 5 C2 5 C3 5;
             ANGLE ang2  pla1 pla2;
           #
           #    Now some measurements using composite points:
           #    format is the same, except for atom_specs.
           #    First we'll define a group consisting of the non-waters,
           #    and a group consisting of atoms in 3 numerically adjacent
           #    residues:
           #
             GROUP g1 (SOLUTE);
             GROUP g2 (RES 5,6,7);
           #
           #    And now to measure the distance between the centers of
           #    mass of each group to see how that pair of residues
           #    fluctuates from the center:
           #
             DIST dist2 g1%cmass g2%cmass ;
           #
           #    Now let's define 2 more residue-based groups:
           #
             GROUP g3 (RES 21,22,23);
             GROUP g4 (RES 44,45,46);
           #
           #    And we'll measure the angular fluctuations of the 3
           #    residue-based centers of mass:
             ANGLE ang3 g2%cmass g3%cmass g4%cmass ;

           OUTPUT
           #    Now direct all the measurements to the table defined above
           #    MEAS refers to all scalar measurements; each 1 will be a
           #    column in the order defined (dist1 ang1 tor1 tor2 dist2 ang2).
           #    Alternatively, the ids could be given explicitly in any order.
           #
             TABLE tab1 MEAS;
           END


      ======RMS deviation

      You want a measurement of the  minimum  RMS  deviation  of  a
      group of atoms as a measure of how disordered some structures
      are relative to another structure. Given the best fit on that
      group  of  atoms, you also want to know how much another sub-
      group differs and how much all  the  atoms  outside  the  fit
      group  differ.  You  also want to save the fit structures for
      viewing.


           #
           # RMS example: fit central 8 bases of a G4 DNA quadruplex
           #
           FILES_IN
             PARM p1 p524.top;
             STREAM s1
              sm4.pdb
              sm9.pdb
              sm17.pdb;
             STATIC ref_set  sm3.rst;
           #                 ^^^^^^^ this file will be used as one
           #                         reference set for the comparison;
           #                         no pdb file around, but that's ok
           FILES_OUT
             TABLE tbl  sm.rms;
           #            ^^^^^^ this is a table for the per-set rms values
             COORD crd  fit.p  PDB;
           #                   ^^^ save some structure(s) in PDB format
           #            ^^^^^ name of file
           DECLARE
           #
           #    Now let's get down to business.
           #    Declare a group of atoms to fit on - all the
           #    non-sugars in the central 8 GUAs:
           #
             GROUP grp1 (  (ATOM NAME N9 C8 H8 N7 C5 C6 O6 N1 H1
                                 C2 N2 HN2A HN2B N3 C4)
                         & (RES 4,6, 13,15, 22,24, 31,33)  );
           #             ^ boolean for "all of the atom names in
           #               the 1st part that occur in the following
           #               residue numbers"
           #
           #    RMS fit the structures in the stream to the reference
           #    set using the group of atoms just defined (fitting is
           #    mass-weighted). This creates a new thing that can be
           #    treated as a stream.
           #
             RMS fit1  FIT  grp1  s1 ref_set;
           #                         ^^^^^^^ reference structure id -
           #                                 if not given, the first
           #                                 structure in the stream
           #                                 would be used; or this id
           #                                 could be for a different
           #                                 stream instead of the static
           #                                 coord set used here
           #                      ^^ stream id - what to fit
           #                ^^^ group of atoms to use for fitting
           #           ^^^ fit (position) the stream set to minimize rms
           #     ^^^ 'fit1' is the new, streamlike thing
           #
           #    Specify the group of atoms to measure a secondary
           #    deviation - the terminal bases on each strand, w/out
           #    the sugars:


           #
             GROUP g2 ((RES 2,8, 11,17, 20,26, 29,35) &
                  (ATOM NAME N9 C8 H8 N7 C5 C6 O6 N1 H1
                        C2 N2 HN2A HN2B N3 C4));
           #
           #    Measure the RMS on that group resulting from the fit
           #    of the other bases, i.e. between the target structure
           #    and the new, fitted structure. Just measuring this
           #    time, not creating a new set.
           #
             RMS fit2  g2  fit1  ref_set;
           #
           #    Let's see what that fit does for _all_ the atoms outside
           #    of the fit, not just the end bases.
           #    Specify the group of all atoms not in the original group
           #    used for the fitting:
           #
             GROUP g3 (!grp1);
           #
           #    Measure the RMS of that group on the fitted structures
           #    as before.
           #
             RMS fit3  g3  fit1  ref_set;
           #
           #    Just for fun, create a new fitted set using the 1st group
           #    but using the 1st set in the stream as reference.
           #
             RMS fit4  FIT  grp1  s1;
           #                      ^^ just specifying the stream with no
           #                         reference defaults the reference to
           #                         the 1st set in the stream
           #                ^^^^ use our "central bases" group again
           #           ^^^ FIT the thing
           #     ^^^^ name of a new, stream-like thing starting with the
           #          second crd set in the stream
           #
           OUTPUT
           #
           #    Write the RMS values to the table:
           #
             TABLE tbl  fit1  fit2  fit3  fit4;
           #            ^^^^  ^^^^  ^^^^  ^^^^ output the measured rms
           #                                   values as columns in the
           #                                   table declared as a file
           #                                   above
           #       ^^^ to table 'tbl'
           #
           #    Average the the structures fitted using the 1st group
           #    on the reference set and print them to the coordinate
           #    file defined above. Perhaps we will then energy min
           #    this structure and claim it means something.
           #
             COORD c1  fit1  AVERAGE;


           #
           END


      ======Coordinate Selection: waters: use of INH2O with DISTRI-
      BUTION MIN

      You  want coordinate dump of solute with selected waters. Not
      the closest waters at each  step:  you  _know_exactly_  which
      waters  you  want:  the same ones in every set, maybe so that
      when you smooth the trajectory, the Nth  water  won't  change
      its identity at each step.  This is a 2-pass procedure: first
      you need to get a list of the waters: you need the atom  num-
      ber  of  the  OW (atom type) in each water. If you want those
      waters to be all those that came within a given  distance  of
      the  solute,  have I got an option for you. The thing to have
      is, for each water, the closest it came to whatever you  want
      to  call the solute. DISTRIBUTION MIN will give you a list of
      atom number, distance pairs that you can sort to generate the
      list  of  favored  waters you need. With this list of OW atom
      numbers, you can define a group which, in another  pass,  can
      be used to filter waters.


           # use of INH2O with DISTRIBUTION MIN - 1st pass
           FILES_IN
             PARM p1 prm.top;
             STREAM s1 a1.trj a2.trj a3.trj;
           FILES_OUT
             DISTRIBUTION d1 file MIN;
           #                       ^this is the key: creates "file.min"
           DECLARE
             GROUP g1 (SOLUTE);
             GROUP g2 (ATOM TYPE OW);
           #                     ^^ have to use OW
           OUTPUT
             DISTRIBUTION d1 0.0 10.0 10 DIST g1 g2;
           #                 ^^^^^^^^^^^             you'll also get the
           #                                           net curve; RAW ok
           #                             ^^^^        this is the 2nd key
           #                                   ^^^^  order is solute then
           #                                           solvent
           END


      --Now  the  critical  step: filtering the water. First we use
      'awk' to see what waters we want to keep, e.g.:

      % awk '$2 < 3.0 {print $1}' file.min | wc -l

      This tells you how many water oxygens came within the  cutoff
      (3.0  in  this  example).  Choose  a  cutoff  such  that  the


      resulting list of type OW atoms is the  right  size  for  you
      and:

      % awk '$2 < 3.0 {print $1}' file.min > temp

      Now  you  need  to  take the list of OWs in the temp file and
      include it in a GROUP ATOM  statement  as  in  the  following
      example in order to select the waters into a coordinate dump.


           # use of INH2O with DISTRIBUTION MIN - 2nd pass
           FILES_IN
             PARM p1 prm.top;
             STREAM s1 a1.trj a2.trj a3.trj;
           FILES_OUT
             COORD c1 filtered.trj;
           DECLARE
             GROUP g1 (ATOM 2,11,26,29);
           #                ^^^^^^^^^^ the type OW atom numbers of your choice
           OUTPUT
             COORD c1 INH2O g1;
           END


      ======DISTRIBUTION DIST (radial distance distributions) exam-
      ples


           # DISTRIBUTION EXAMPLE: ONE GROUP to itSELF (OW-OW)
           FILES_IN
             PARM p1 prmtop;
             STREAM s1 crd;
           FILES_OUT
             DISTRIBUTION d1 xdb ;
           DECLARE
             GROUP g1 (ATOM TYPE OW);
           OUTPUT
             DISTRIBUTION d1 RAW DIST g1 SELF;
           #                             ^ use all intra-group distances
           #                          ^ group id from DECLARE GROUP
           #                     ^ distance macro
           #                 ^ dump all distances to file for use w/ rdis
           #                   program (i.e. don't bin measurements or
           #                   output bins)
           #              ^ id
           #       For each 'solvent' group atom, the nearest 'solute' atom
           #       is found and binned if it satisfies the min, max criterion.
           END


           # DISTRIBUTION EXAMPLE: TWO GROUPS using closest atom in 1st to
           #                       each of 2nd
           FILES_IN


             PARM p1 prmtop;
             STREAM s1 crd;
           FILES_OUT
             DISTRIBUTION d1 xdb ;
           DECLARE
             GROUP g1 (RES NAME ADE);
             GROUP g2 (RES NAME THY);
           OUTPUT
             DISTRIBUTION d1 RAW DIST g1 g2 ;
           #                             ^ 2nd group is the 'solvent'
           #                          ^ 1st group is the 'solute'
           #                     ^ distance macro
           #                 ^ dump all distances to file (don't bin)
           #              ^ id
           #       For each 'solvent' group atom, the nearest 'solute' atom
           #       is found and binned if it satisfies the min, max criterion.
           END

           # DISTRIBUTION EXAMPLE: TWO GROUPS using all inter-group pairs
           FILES_IN
             PARM p1 prmtop;
             STREAM s1 crd;
           FILES_OUT
             DISTRIBUTION d1 xdb ;
           DECLARE
             GROUP g1 (RES NAME ADE);
             GROUP g2 (RES NAME THY);
           OUTPUT
             DISTRIBUTION d1 RAW DIST g1 g2 ALL;
           #                                ^ consider all group1-group2
           #                                  interactions
           #                             ^ 2nd group id from DECLARE GROUP
           #                          ^ 1st group id from DECLARE GROUP
           #                     ^ distance macro
           #                 ^ dump all distances to file (don't bin)
           #              ^ id
           END

           # DISTRIBUTION EXAMPLE: TWO GROUPS using all group1 intra pairs and
           #                       all inter group1-group2 pairs.
           #                       This case: all distances between water O and
           #                       other O (including water) in a water/octanol
           #                       solution.
           FILES_IN
             PARM p1 prmtop;
             STREAM s1 crd;
           FILES_OUT
             DISTRIBUTION d1 xdb ;
           DECLARE
             GROUP g1 (ATOM TYPE OW);
           #           ^^^^^^^^^^^^ all water oxygens
             GROUP g2 (ATOM TYPE OH);
           #           ^^^^^^^^^^^^ all octanol oxygens


           OUTPUT
             DISTRIBUTION d1 RAW DIST g1 SELF g2 ALL;
           #                                     ^ consider all octanol-water
           #                                       distances too
           #                                  ^ group id from DECLARE GROUP
           #                                    (octanol O)
           #                             ^ consider all water-water distances
           #                          ^ group id from DECLARE GROUP (water O)
           #                     ^ distance macro
           #                 ^ dump all distances to file (don't bin)
           #              ^ id
           END


      ======HBOND examples

      You  want a occupancies for all possible hbonds at each step.
      This file will consist of a line for each coordinate  set  in
      the  stream  with  a  '0' or '1' followed by a space for each
      possible hbond.  You also want the  percentage  occupancy  of
      each  hbond  over the run, and the average distance and angle
      when occupied.  And while you're at it, you want to print the
      distance and angle of each possible hbond.

      You  also want to specify the maximum distance and angle that
      qualify an hbond.

      The percentages and averages are written to the  main  output
      at the end of the run.


           FILES_IN
                   PARM p1 hbtop;
                   STREAM s1 hbmd;
           FILES_OUT
                   HBOND h1 xhb TABLE LIST;
           #                          ^ write a list of distances/angles
           #                            to "xhb.lis"
           #                    ^ write occupancies to "xhb.tab"
           #                ^ use "xhb" as the basis for filenames
           #
           #
           DECLARE
           OUTPUT
                   HBOND h1 DISTANCE 3.3 ANGLE 20.0 STATS;
           #                                        ^ print the averages
           #                                          of the occupied cases
           #                             ^ limit the angle;
           #                               default 1 radian ~= 60 degrees
           #                ^ limit the distance between heavy atoms;
           #                  default 4 Angstroms
           END


      Perhaps you want to specify the donor and acceptor groups, if
      only to limit the number of columns in the table. This  time,
      we'll also just use the default criteria for hbonds.


           # HBOND ANALYSIS EXAMPLE: USING GROUPS FOR DONOR/ACCEPTOR
           FILES_IN
             PARM p1 hbtop;
             STREAM s1 hbmd;
           FILES_OUT
             HBOND h1 xhb TABLE LIST;
           DECLARE
           DECLARE
             GROUP g1 (ATOM TYPE N2 NA);
             GROUP g2 (ATOM TYPE NC O );
           OUTPUT
             HBOND h1 DONOR g1 ACCEPTOR g2 STATS;
           END


      ====== Case history 1: a geometrical example

      A long, relatively stiff molecule was bending: how to charac-
      terize it?  One approach short  of  measuring  the  curvature
      (which  carnal  doesn't do yet) would be to define groups for
      different  segments,  then  measure  angles  between  vectors
      involving  centers  of  mass or geometry; e.g. if this is the
      molecule:


                   ======================================
                   ======================================
                   ======================================
                   ^^^^^^          ^^^^^^          ^^^^^^
                   grp1            grp2            grp3


             ANGLE a1 grp1%cmass grp2%cmass grp3%cmass;

           or

                   ======================================
                   ======================================
                   ======================================
                   ^^^^^^ ^^^^^^             ^^^^^^ ^^^^^
                   grp1   grp2               grp3   grp4

             AXIS ax1 grp1%cmass grp2%cmass;
             AXIS ax2 grp3%cmass grp4%cmass;
             ANGLE a1 ax1 ax2;


      1.5.  References

           Batschelet,  Edward.   Circular  statistics  in  Biology
      (1981) Academic Press Inc., New York, NY. This method is used
      for averaging angles that can encompass a full  360  degrees.
      For  motivation,  think  of  what  the  average  of 0 and 359
      degrees or 0 and 180 degrees would be.  The  'averages'  from
      this  method  are  in the range [-180..180], so e.g. a single
      value of 183 would result in a statistical 'average' of -177.
      Another  reference  for  circular statistics mentioned on the
      net as being easier to find: Fisher, N. I.  Statistical Anal-
      ysis  of Circular Data (1993) Cambridge University Press, New
      York.
      Kabsch, (1976) Acta  Cryst.  A32,  922-923  and  (1978)  Acta
      Cryst.  A34,  827-828.  A newer method using quaternions, not
      implemented here: Kearsley (1989) Acta Cryst. A45, 208-210.