Tim Hubbard and Jong Park
Centre for Protein Engineering, MRC Centre, Cambridge, UK, th@mrc-lmb.cam.ac.uk

Our entries to the structure prediction competition cover those target sequences that have no known homology with any sequence of known structure. The objective was either to recognise a similar fold in the database of known structures (PDB) or to predict a rough 3-D topology ab initio.

Both the fold recognition and the ab initio prediction algorithms rely on the information contained in multiple sequence alignments, so no predictions were made where the sequence family was very small or the alignments were ambiguous. Initial multiple sequence alignments were generally obtained by mailing the target sequence to the PHD secondary structure prediction server [1]. They were then improved by adding related sequences found by running BLAST against a number of databases, and by applying a number of other automatic and manual alignment methods.

For fold recognition, hidden Markov models (HMMs) [2] were constructed from the multiple sequence alignment. The models were then used to search a subset of PDB chain sequences (pdb90), none of which has greater than 90% sequence identity with any other. The sequence or alignment was also mailed to the PHD server to obtain a secondary structure prediction [1]. For each alignment, a value was calculated measuring the degree of similarity between the predicted secondary structure segments and those observed in the known structure (from DSSP [3]), using an algorithm similar to [4]. A prediction of fold type was then made by considering the HMM score, the secondary structure overlap score and the ranking of similar folds in the list (using the fold classification of SCOP [5], which is incorporated into pdb90). The predictions for xyla, kau, synapto, bphc and l14 were based mainly on the results of this approach.
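As an illustration of the secondary structure overlap score described above, the following is a minimal Python sketch. It is not the algorithm actually used (which is only stated to be similar to [4]). It assumes that the predicted and observed secondary structure are given as equal-length, residue-aligned strings in a three-state alphabet (H for helix, E for strand, - for everything else), and that a predicted segment counts as matched when it overlaps an observed segment of the same state by at least half the length of the shorter of the two. The function names and the one-half threshold are hypothetical choices made for this example.

```python
def segments(ss, state):
    """Return half-open (start, end) ranges of consecutive `state` characters."""
    found, start = [], None
    for i, c in enumerate(ss):
        if c == state and start is None:
            start = i
        elif c != state and start is not None:
            found.append((start, i))
            start = None
    if start is not None:
        found.append((start, len(ss)))
    return found


def ss_overlap_score(predicted, observed, states="HE"):
    """Fraction of predicted H/E segments matched by an observed segment.

    Assumption (not from the original text): a predicted segment is
    'matched' if it overlaps an observed segment of the same state by at
    least half the length of the shorter of the two segments.
    """
    if len(predicted) != len(observed):
        raise ValueError("strings must be aligned to the same length")
    matched = total = 0
    for state in states:
        obs_segs = segments(observed, state)
        for p_start, p_end in segments(predicted, state):
            total += 1
            for o_start, o_end in obs_segs:
                overlap = min(p_end, o_end) - max(p_start, o_start)
                shorter = min(p_end - p_start, o_end - o_start)
                if overlap > 0 and 2 * overlap >= shorter:
                    matched += 1
                    break
    return matched / total if total else 0.0


# Hypothetical example: one predicted helix and one predicted strand,
# both of which overlap an observed segment of the same type.
score = ss_overlap_score("--HHHHHH---EEEE--", "-HHHHHHH----EEE--")
```

Scoring whole segments rather than individual residues means that small disagreements at segment ends, which are common in secondary structure prediction, do not dominate the result.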

In cases where a high beta-sheet content was observed, an ab initio beta-strand pairing prediction was made [6]. The results of this prediction were combined with the PHD secondary structure prediction to identify the strands most likely to pair. Because the number of possible pairings is proportional to the square of the number of strands, whereas the number of observed pairs grows only linearly, prediction generally becomes less reliable as the number of strands increases. The predictions for the small proteins prosub and staufen3 were made mainly using this approach. For large proteins, the method was used to guide the selection of the closest fold in PDB, based on similarities in predicted contact maps observed between the target and the folds proposed to be similar.
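The scaling argument above can be made concrete with a small Python sketch. It rests on simplifying assumptions that are not taken from the original text: all strands lie in a single open sheet (so n strands form n - 1 pairs), and strand orientation and register are ignored (so there are n(n - 1)/2 candidate pairs). The function names are hypothetical.

```python
def possible_pairings(n_strands):
    """Number of unordered strand pairs: n(n - 1)/2, quadratic in n."""
    return n_strands * (n_strands - 1) // 2


def observed_pairings(n_strands):
    """Pairs in a single open sheet (an assumption): n - 1, linear in n."""
    return max(n_strands - 1, 0)


def true_pair_fraction(n_strands):
    """Fraction of candidate pairs that are real under the assumptions above."""
    if n_strands < 2:
        raise ValueError("at least two strands are needed to form a pair")
    return observed_pairings(n_strands) / possible_pairings(n_strands)


# Print the counts for a few hypothetical sheet sizes.
for n in (3, 4, 6, 10):
    print(n, possible_pairings(n), observed_pairings(n), true_pair_fraction(n))
```

Under these assumptions the fraction of candidate pairs that are real is 2/n, so a four-stranded sheet has 3 true pairs among 6 candidates, whereas a ten-stranded sheet has only 9 among 45.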

For rtp and chmut, no fold could be recognised and the sequences were predicted to be mainly helical. However, certain sequence patterns suggesting leucine zippers were identified, and speculative topology predictions based on these were submitted.

Structures Predicted			Main Method Used
====================			================
xyla					hmm
kau					hmm
prosub					beta-strand topology
synapto					hmm
staufen3				beta-strand topology + hmm
bphc					hmm + beta-strand topology
rtp					sequence patterns
chmut					sequence patterns
l14					hmm + beta-strand topology

Structures not predicted		Reason not predicted
========================		====================
bhted					too few homologous sequences
pcna					too short notice (2 days)
smanucecs				too few homologous sequences
ppdk					not enough time
pbdg					not enough time
mystery					suspected to be some sort of joke!

We thank Burkhard Rost, Reinhard Schneider and Chris Sander for access to the PHD secondary structure prediction server [1], the DSSP program [3] and the HSSP database; Sean Eddy for use of the HMM program suite [2]; Erik Sonnhammer for the use of SWIR5; Andrej Sali for use of Modeller; and TH's collaborators Alexey Murzin, Steven Brenner and Cyrus Chothia for the development of SCOP [5]. TH is grateful to the MRC and ZENECA for financial support.

[1] B. Rost and C. Sander (1993). Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol., 232, 584-599.
[2] S. Eddy, "HMM*: Hidden Markov Modeling of Proteins and Nucleic Acids", software freely available via anonymous ftp to cele.mrc-lmb.cam.ac.uk, in pub/sre, documentation from http://logi.mrc-lmb.cam.ac.uk/. Contact sre@mrc-lmb.cam.ac.uk for information.
[3] W. Kabsch and C. Sander (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577-2637.
[4] B. Rost, C. Sander and R. Schneider (1994). Redefining the goals of protein secondary structure prediction. J. Mol. Biol., 235, 13-26.
[5] A. Murzin, S.E. Brenner, T.J.P. Hubbard, C. Chothia (1994). The SCOP (Structural Classification of Proteins) database: available world-wide over the internet through the world wide web (WWW) at http://scop.mrc-lmb.cam.ac.uk/scop/.
[6] T.J.P. Hubbard (1994). Use of beta-strand Interaction Pseudo-Potentials in Protein Structure Prediction and Modelling. In R.H. Lathrop (ed.), Proceedings of the Biotechnology Computing Track, Protein Structure Prediction MiniTrack of the 27th HICSS. IEEE Computer Society Press, pp. 336-354. (Macintosh PostScript file available by anonymous ftp to ind2.mrc-lmb.cam.ac.uk in /pub/th/beta/beta.hicss-27.ps.Z. Contact th@mrc-cpe.cam.ac.uk for information.)

CONF-941241
Last modified on 1-11-95