Both the fold recognition and ab initio prediction algorithms used rely on the information contained in multiple sequence alignments, so no predictions were made in cases where the sequence family was very small or the alignments were ambiguous. Initial multiple sequence alignments were generally obtained by mailing the target sequence to the PHD secondary structure prediction server [1]. They were then improved by adding related sequences found by running BLAST against a number of databases, and by applying a number of other automatic and manual alignment methods.
For fold recognition, hidden Markov models (HMMs) [2] were constructed from the multiple sequence alignment. The models were then used to search a subset of PDB chain sequences (pdb90), none of which has greater than 90% sequence identity with any other. The sequence or alignment was also mailed to the PHD server to obtain a secondary structure prediction [1]. For each alignment, a value was calculated measuring the degree of similarity between the predicted secondary structure segments and those observed in the known structure (from DSSP [3]), using an algorithm similar to [4]. A prediction of fold type was made by considering the HMM score, the secondary structure overlap score and the ranking of similar folds in the list (using the fold classification of SCOP [5], which is incorporated into pdb90). The predictions for xyla, kau, synapto, bphc and l14 were based mainly on the results from this approach.
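The secondary structure comparison described above can be illustrated with a minimal sketch. It is not the algorithm actually used, nor the measure defined in [4]: it assumes the predicted and observed strings are already aligned to the same length, that DSSP states have been reduced to helix (H), strand (E) and other (C), and that an observed segment counts as matched when at least half of it is covered by a predicted segment of the same state. The function names and the `min_cover` threshold are hypothetical.

```python
from itertools import groupby


def segments(ss, states="HE"):
    """List (state, start, end) for each run of helix (H) or strand (E).

    `end` is exclusive. Any other character (e.g. C for coil) is ignored.
    """
    found = []
    pos = 0
    for state, run in groupby(ss):
        length = sum(1 for _ in run)
        if state in states:
            found.append((state, pos, pos + length))
        pos += length
    return found


def segment_overlap_score(predicted, observed, min_cover=0.5):
    """Fraction of observed H/E segments matched by the prediction.

    Assumption (not from the source): an observed segment is matched when
    a single predicted segment of the same state covers at least
    `min_cover` of its residues.
    """
    if len(predicted) != len(observed):
        raise ValueError("strings must be aligned to the same length")
    pred_segs = segments(predicted)
    obs_segs = segments(observed)
    if not obs_segs:
        return 0.0
    matched = 0
    for state, start, end in obs_segs:
        # Largest overlap (in residues) with any predicted segment of this state.
        best = max(
            (min(end, p_end) - max(start, p_start)
             for p_state, p_start, p_end in pred_segs
             if p_state == state),
            default=0,
        )
        if best >= min_cover * (end - start):
            matched += 1
    return matched / len(obs_segs)


if __name__ == "__main__":
    predicted = "CCHHHHHHCCEEEECC"  # e.g. a three-state prediction for the target
    observed = "CHHHHHHCCCCEEEEC"   # e.g. DSSP states of a candidate fold
    print(segment_overlap_score(predicted, observed))
```

A score of this kind rewards agreement on the order and approximate position of helices and strands rather than on exact segment boundaries, which is the property that makes it useful alongside the HMM score when ranking candidate folds.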
In cases where a high beta-sheet content was observed, an ab initio beta-strand pairing prediction was made [6]. The results of this prediction were combined with the PHD secondary structure prediction to identify the strands most likely to pair. Because the number of possible pairings is proportional to the square of the number of strands, whereas the number of observed pairs grows only linearly, prediction generally becomes less reliable as the number of strands increases. The predictions for the small proteins prosub and staufen3 were mainly made using this approach. In the case of large proteins, the method was used to guide the selection of the closest fold in the PDB, based on the similarities in predicted contact maps observed between the target and the folds proposed to be similar.
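The scaling argument can be made concrete with a short calculation. The sketch below assumes, purely for illustration, a single open sheet in which n strands form n - 1 neighbouring pairs; real sheets can differ (barrels, multiple sheets), and the function names are hypothetical.

```python
def possible_pairings(n_strands):
    """Number of distinct strand pairs: n(n-1)/2, quadratic in n."""
    return n_strands * (n_strands - 1) // 2


def open_sheet_pairs(n_strands):
    """Pairs actually formed in one open sheet (illustrative assumption): n-1."""
    return max(n_strands - 1, 0)


def true_pair_fraction(n_strands):
    """Fraction of candidate pairs that are real under the assumption above."""
    total = possible_pairings(n_strands)
    return open_sheet_pairs(n_strands) / total if total else 0.0


if __name__ == "__main__":
    for n in (2, 4, 6, 10):
        print(n, possible_pairings(n), open_sheet_pairs(n),
              round(true_pair_fraction(n), 2))
```

Under this assumption the fraction of correct pairs among all candidates is 2/n, so a pairing predictor has to discriminate against a growing majority of false candidates as the strand count rises.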
For rtp and chmut, no fold could be recognised and the sequences were predicted to be mainly helical. However, certain sequence patterns suggesting leucine zippers were identified, and speculative topology predictions based on these patterns were submitted.
Structures Predicted Main Method Used
==================== ================
xyla                 hmm
kau                  hmm
prosub               beta-strand topology
synapto              hmm
staufen3             beta-strand topology + hmm
bphc                 hmm + beta-strand topology
rtp                  sequence patterns
chmut                sequence patterns
l14                  hmm + beta-strand topology

Structures not predicted Reason not predicted
======================== ====================
bhted                    too few homologous sequences
pcna                     too short notice (2 days)
smanucecs                too few homologous sequences
ppdk                     not enough time
pbdg                     not enough time
mystery                  suspected to be some sort of joke!

We thank Burkhard Rost, Reinhard Schneider and Chris Sander for access to the PHD secondary structure prediction server [1], the DSSP program [3] and the HSSP database, Sean Eddy for use of the HMM program suite [2], Erik Sonnhammer for the use of SWIR5, Andrej Sali for use of Modeller, and TH's collaborators Alexey Murzin, Brenner and Cyrus Chothia for the development of SCOP [5]. TH is grateful to the MRC and ZENECA for financial support.
[1] B. Rost, C. Sander (1993). Prediction of protein secondary structure
at better than 70% accuracy. J. Mol. Biol., 232, 584-599.
[2] S. Eddy, "HMM*: Hidden Markov Modeling of Proteins and Nucleic
Acids", software freely available via anonymous ftp to
cele.mrc-lmb.cam.ac.uk, in pub/sre, documentation from
http://logi.mrc-lmb.cam.ac.uk/. Contact sre@mrc-lmb.cam.ac.uk for
information.
[3] W. Kabsch and C. Sander (1983). Dictionary of protein secondary
structure: pattern recognition of hydrogen-bonded and geometrical
features. Biopolymers, 22, 2577-637.
[4] B. Rost, C. Sander, R. Schneider (1994). Redefining the goals of
protein secondary structure prediction. J. Mol. Biol., 235, 13-26.
[5] A. Murzin, S.E. Brenner, T.J.P. Hubbard, C. Chothia (1994). The
SCOP (Structural Classification of Proteins) database: available
world-wide over the internet through the world wide web (WWW) at
http://scop.mrc-lmb.cam.ac.uk/scop/.
[6] T.J.P. Hubbard (1994). Use of beta-strand Interaction
Pseudo-Potentials in Protein Structure Prediction and Modelling. In
R.H. Lathrop (ed.), Proceedings of the Biotechnology Computing Track,
Protein Structure Prediction MiniTrack of the 27th HICSS. IEEE Computer
Society Press, pp. 336-354. (Macintosh PostScript file available by
anonymous ftp to ind2.mrc-lmb.cam.ac.uk in
/pub/th/beta/beta.hicss-27.ps.Z. Contact th@mrc-cpe.cam.ac.uk for
information.)