-------------------------------------------------------------------------------
                                   SISC
                 Sequence Identity & Structure Comparison
                              version 12/1999

                                Adam Zemla
                          Email: adamz@llnl.gov

               THE WEB BASED FACILITY IS UNDER CONSTRUCTION!
-------------------------------------------------------------------------------

The following data files are updated every week:

 - PDB_list
       The list of current set of structures downloaded from
       Protein Data Bank (PDB files).

 - SEQRES.all_list
       The list of all chains of sequences of amino acids that are
       extracted from SEQRES records of PDB files.

 - SEQRES.unique
       The set of unique residue sequences.
       The set of basic (primitive) clusters of sequences.

 - SEQRES.unique_list
       The list of unique residue sequences (clusters).

 - PDB_seq_ss.all
       The set of all residue sequences extracted from ATOM records
       (coordinates) of PDB files. Each residue sequence (chain) is
       reported together with its secondary structure element
       assignment generated by DSSP program.

       The DSSP code
           H = alpha helix
           I = 5 helix (pi helix)
           G = 3-helix (3/10 helix)
           E = extended strand, participates in beta ladder
           B = residue in isolated beta-bridge
           T = hydrogen bonded turn
           S = bend
           C = others

--------------- FILES FORMAT DESCRIPTION --------------------------------------

1. Example of the SEQRES.unique file:

>Cluster_name Cluster_sequence_length number_of_chains
# si chain_name percent_of_coordinates pdb_length resolution
SequenceOfAminoAcidsExtractedFromSEQRESrecords

>1boy 219 6
# si 1boy 96.35 211 2.20
# si 1tfh_A 92.24 202 2.40
# si 1ahw_C 91.32 200 3.00
# si 1ahw_F 91.32 200 3.00
# si 1jps_T 91.32 200 1.85
# si 1tfh_B 83.11 182 2.40
SGTTNTVAAYNLTWKSTNFKTILEWEPKPVNQVYTVQISTKSGDWKSKCF
YTTDTECDLTDEIVKDVKQTYLARVFSYPAGNVESTGSAGEPLYENSPEF
TPYLETNLGQPTIQSFEQVGTKVNVTVEDERTLVRRNNTFLSLRDVFGKD
LIYTLYYWKSSSSGKKTAKTNTNEFLIDVDKGENYCFSVQAVIPSRTVNR
KSTDSPVECMGQEKGEFRE

2. Example of the PDB_seq_ss.all file:

>Chain_name cluster_name cluster_sequence_length
# xxxxx: xxxxx
#  ---- " ----
# xxxxx: xxxxx
SequenceOfAminoAcidsExtractedFromRecords-ATOM
TheSecondaryStructureElementAssignmentBy-DSSP

>1tfh_A 1boy 219
# Header: COAGULATION FACTOR
# Title: EXTRACELLULAR DOMAIN OF HUMAN TISSUE FACTOR
# Keywds: BLOOD COAGULATION, TISSUE FACTOR, COAGULATION FACTOR,
# Keywds: GLYCOPROTEIN
# Date: 10-APR-97
# Revdate: 19-AUG-98
# Resolution: 2.40
# Residues: 92.24 202 203
# Author: M.HUANG, R.SYED, E.A.STURA, M.J.STONE, R.S.STEFANKO, W.RUF,
# Author: T.S.EDGINGTON, I.A.WILSON
# Compnd: MOL_ID: 1;
# Compnd: MOLECULE: HUMAN TISSUE FACTOR;
# Compnd: CHAIN: A, B;
# Compnd: FRAGMENT: EXTRACELLULAR DOMAIN;
# Compnd: SYNONYM: TF, THROMBOPLASTIN, COAGULATION FACTOR III;
# Compnd: ENGINEERED: YES
# Cryst1: 64.393 85.828 112.901 90.00 90.00 90.00 P 21 21 21 8
NTVAAYNLTWKSTNFKTILEWEPKPVNQVYTVQISTKSGDWKSKCFYTTD
CCCCCEEEEEEEETTEEEEEEECCCSSEEEEEEEEETTSCCEEEEEEESC
TECDLTDEIVKDVKQTYLARVFSYPAGNV-AGEPLYENSPEFTPYLETNL
SEEECHHHHTTCTTSCEEEEEEEEECSCC-CCSCEEEECCCBCHHHHSBC
GQPTIQSFEQVGTKVNVTVEDERTLVRRNNTFLSLRDVFGKDLIYTLYYW
CCCCEEEEEEETTEEEEEECCCEEEEEETTEEEEHHHHHGGGCEEEEEEE
KSSSSGKKTAKTNTNEFLIDVDKGENYCFSVQAVIPSRTVNRKSTDSPVE
ETTCCCCEEEEESSSEEEEECCTTCCEEEEEEEECTTCSSSCBCCCCCCE
CMG
ECC
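
--------------- READING THE FILES (ILLUSTRATIVE SKETCH) ------------------------

The two formats described above can be read with a short script. The
Python code below is only a sketch and is not part of SISC. All function
and class names (parse_seqres_unique, parse_pdb_seq_ss, Cluster,
ChainRecord) are hypothetical. It assumes that every record starts with
a ">" line, that annotation lines start with "#", and that in
PDB_seq_ss.all the remaining lines alternate between a sequence line and
its DSSP assignment line, as in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One record of SEQRES.unique (hypothetical container)."""
    name: str
    length: int
    n_chains: int
    # (chain_name, percent_of_coordinates, pdb_length, resolution)
    members: list = field(default_factory=list)
    sequence: str = ""


@dataclass
class ChainRecord:
    """One record of PDB_seq_ss.all (hypothetical container)."""
    chain_name: str
    cluster_name: str
    cluster_length: int
    annotations: dict = field(default_factory=dict)  # key -> list of values
    sequence: str = ""
    secondary_structure: str = ""


def parse_seqres_unique(text):
    """Parse SEQRES.unique-style text into a list of Cluster objects."""
    clusters, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name, length, n_chains = line[1:].split()[:3]
            current = Cluster(name, int(length), int(n_chains))
            clusters.append(current)
        elif current is None:
            continue  # ignore anything before the first ">" line
        elif line.startswith("#"):
            fields = line[1:].split()
            if len(fields) == 5 and fields[0] == "si":
                # Resolution is kept as text; this sketch does not
                # assume that it is always numeric.
                current.members.append(
                    (fields[1], float(fields[2]), int(fields[3]), fields[4])
                )
        else:
            current.sequence += line
    return clusters


def parse_pdb_seq_ss(text):
    """Parse PDB_seq_ss.all-style text into a list of ChainRecord objects."""
    records, current = [], None
    expect_sequence = True
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            chain, cluster, length = line[1:].split()[:3]
            current = ChainRecord(chain, cluster, int(length))
            records.append(current)
            expect_sequence = True
        elif current is None:
            continue  # ignore anything before the first ">" line
        elif line.startswith("#"):
            # Split on the first ":" only, so "Compnd: MOL_ID: 1;"
            # gives key "Compnd" and value "MOL_ID: 1;".
            key, _, value = line[1:].partition(":")
            current.annotations.setdefault(key.strip(), []).append(value.strip())
        elif expect_sequence:
            current.sequence += line
            expect_sequence = False
        else:
            current.secondary_structure += line
            expect_sequence = True
    return records
```

Both functions take the whole file content as one string, for example
parse_pdb_seq_ss(open("PDB_seq_ss.all").read()). Repeated annotation
keys such as "Keywds" or "Compnd" are collected into a list in file
order. The "-" characters in the sequence and DSSP lines are kept as
they appear, so the two strings of a record stay position-aligned.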