Designing a Novel Globular Protein Fold

A major challenge of computational structural biology has been to create, from scratch, new proteins with heretofore unobserved three-dimensional structures. A collaboration from the University of Washington, Seattle, the University of North Carolina, Chapel Hill, and the Fred Hutchinson Cancer Research Center has now developed and demonstrated a methodology for protein-structure prediction and design by creating the first artificial globular protein with a novel topology, a 93-residue protein called Top7. Significantly, the x-ray structure of Top7 agreed almost precisely with the structure specified by the computational model.

Protein function depends on the complex geometries that arise as amino acids (each comprising a carboxyl group, an amino group, and a side chain) link into chains of residues. These chains form local structural units such as α helices and β strands and finally fold into compact, three-dimensional, globular domains and multidomain structures. Previous attempts at computational protein design have focused primarily on redesigning the sequences of naturally occurring proteins to enhance their stability or to achieve new functionality.

Comparison of the x-ray crystal structure of Top7 (yellow) to the computationally designed model (green). The C-α overlay of the structure and model shows the overall high accuracy of the design (1.17-Å root-mean-square deviation).

Dial-A-Protein

Sequencing the human genome is only half the battle when it comes to converting genetic information into ways of treating disease. DNA provides the starting recipe for the manufacture of proteins, large molecules that carry out much of the work in the processes that make up the daily life of a cell. How proteins do this work depends on their three-dimensional structure, which is replete with folds, crevices, and other features that are just as important as the chemical composition specified by the recipe. Scientists know that strings of molecules known as amino acids form helices and other characteristic structures, that these in turn fold up into compact domains, and that the domains can combine into complex proteins. But given only the sequence of amino acids specified by DNA, scientists cannot predict the final folded and compacted protein structure and, hence, how it works. Being able to predict structures could also give biologists the ability to design artificial protein structures that might themselves be useful. Kuhlman et al. have made a major advance toward this goal with a mathematical technique that takes the reverse approach of determining which amino-acid sequence will result in a specified protein structure. The researchers have tested their technique by designing a protein that does not exist in nature, synthesizing the protein, and verifying its structure at the ALS.

Such redesign methods generally start with a known, high-resolution structure of the target protein and then try to optimize the packing of different amino-acid side chains while keeping the backbone template (carboxyl and amino groups) fixed, in order to arrive at new low-energy sequence solutions. The key features of these methods are an efficient search protocol for sampling the theoretically vast number of sequence permutations and an energy function designed to model the physical forces that hold natural proteins together.
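The two ingredients named above, a search protocol and an energy function, can be illustrated with a deliberately simplified sketch. Everything here is an assumption made for illustration: the two-class residue alphabet, the buried/exposed site pattern, the `toy_energy` scoring, and the Metropolis acceptance rule are hypothetical stand-ins, not the energy function or search used in the study.

```python
import math
import random

# Hypothetical alphabet and residue classes; not taken from the study.
HYDROPHOBIC = set("AVILMF")
ALPHABET = "AVILMFDEKRSTNQ"


def toy_energy(sequence, buried):
    """Toy score: -1 when a residue's class suits its site, +1 otherwise."""
    energy = 0.0
    for residue, is_buried in zip(sequence, buried):
        matches = (residue in HYDROPHOBIC) == is_buried
        energy += -1.0 if matches else 1.0
    return energy


def monte_carlo_design(buried, steps=2000, temperature=0.5, seed=0):
    """Sample sequences on a fixed 'backbone' (the buried/exposed pattern)."""
    rng = random.Random(seed)
    sequence = [rng.choice(ALPHABET) for _ in buried]
    energy = toy_energy(sequence, buried)
    best_seq, best_energy = list(sequence), energy
    for _ in range(steps):
        # Propose a single-site substitution.
        pos = rng.randrange(len(sequence))
        old = sequence[pos]
        sequence[pos] = rng.choice(ALPHABET)
        new_energy = toy_energy(sequence, buried)
        delta = new_energy - energy
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(sequence), energy
        else:
            sequence[pos] = old  # reject and restore the previous residue
    return "".join(best_seq), best_energy


buried_sites = [True, False, True, True, False, False, True, False]
designed, final_energy = monte_carlo_design(buried_sites)
print(designed, final_energy)
```

Random sampling matters because the number of possible sequences grows exponentially with chain length (14 residue types at 8 sites already gives about 1.5 billion combinations in this toy), so exhaustive enumeration quickly becomes impractical.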

The collaborators extended these concepts in their RosettaDesign method. However, they faced the additional challenge of sampling protein-backbone structural space as well as sequence space, since their goal was the creation of a novel fold for which no natural backbone template was available. To this end, they constructed RosettaDesign to iterate between full-scale optimization of the sequence for a fixed backbone conformation and gradient-based optimization of the backbone coordinates for a fixed sequence. Starting from a simple back-of-the-envelope sketch of the target, a novel α/β fold, and applying this protocol, they designed Top7, a 93-residue α/β protein with a topology not observed in the Protein Data Bank (PDB), making it a wholly artificial protein.
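The alternation between sequence optimization and backbone optimization can be sketched with a toy model. This is an illustration under stated assumptions, not RosettaDesign itself: the "backbone" is reduced to one hypothetical burial value per position, `PREFERRED_BURIAL` is an invented table, the energy is a simple sum of squares, and the sequence step is an exhaustive per-position choice that is only possible because this toy energy is separable.

```python
# Hypothetical preferred burial (0 = exposed, 1 = buried) per residue type.
PREFERRED_BURIAL = {"L": 0.9, "V": 0.8, "A": 0.5, "S": 0.3, "K": 0.1}


def energy(sequence, burial, target):
    """Toy energy: residue/site mismatch plus a restraint to the target sketch."""
    fit = sum((b - PREFERRED_BURIAL[aa]) ** 2 for aa, b in zip(sequence, burial))
    restraint = sum((b - t) ** 2 for b, t in zip(burial, target))
    return fit + restraint


def optimize_sequence(burial):
    """Fixed backbone: pick the lowest-energy residue at every position."""
    return [
        min(PREFERRED_BURIAL, key=lambda aa: (b - PREFERRED_BURIAL[aa]) ** 2)
        for b in burial
    ]


def optimize_backbone(sequence, burial, target, rate=0.1, steps=50):
    """Fixed sequence: gradient descent on the continuous coordinates."""
    burial = list(burial)
    for _ in range(steps):
        for i, aa in enumerate(sequence):
            grad = 2 * (burial[i] - PREFERRED_BURIAL[aa]) + 2 * (burial[i] - target[i])
            burial[i] -= rate * grad
    return burial


def design(target, cycles=5):
    """Alternate the two optimizations, recording the energy after each cycle."""
    burial = list(target)
    sequence = optimize_sequence(burial)
    history = [energy(sequence, burial, target)]
    for _ in range(cycles):
        burial = optimize_backbone(sequence, burial, target)
        sequence = optimize_sequence(burial)
        history.append(energy(sequence, burial, target))
    return "".join(sequence), burial, history


target_sketch = [0.88, 0.25, 0.6, 0.95, 0.05]
seq, coords, energies = design(target_sketch)
print(seq, [round(e, 4) for e in energies])
```

In this toy, neither step can raise the energy, so the recorded values never increase from one cycle to the next. The point of alternating is that a sequence chosen for a rough starting backbone can be improved once the backbone is allowed to relax around it, and vice versa.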

Using a variety of biophysical techniques, the researchers determined the synthesized Top7 protein to be monomeric, highly soluble, and extremely stable to chemical and thermal denaturation. Preliminary NMR analysis also showed that Top7 had a rigid structure consistent with the target topology. Finally, using ALS Howard Hughes Medical Institute Beamline 8.2.1, they solved an x-ray structure of a single selenomethionyl-substituted variant of Top7 to 2.5-Å resolution with single-wavelength anomalous diffraction (SAD) data.

This high-resolution crystal structure revealed that the Top7 protein adopted the designed topology and in fact was strikingly similar to the design model at atomic resolution (1.17-Å root-mean-square deviation, or RMSD, over all backbone atoms). The two structures differ most in the region surrounding the first N-terminal (amino-group end) hairpin, but even here the all-atom RMSD did not exceed 2.8 Å. In contrast, the C-terminal (carboxyl-group end) halves of the crystal structure and the designed model are very similar, and core side-chain atoms are virtually superimposable.
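For readers unfamiliar with the RMSD figure quoted above, the following minimal sketch shows how it is computed from two sets of matched atomic coordinates. It assumes the two structures have already been optimally superimposed (the superposition step itself, for example by the Kabsch algorithm, is omitted), and the coordinates are invented for illustration rather than taken from Top7.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Invented coordinates in angstroms: three atoms, each displaced by 1 Å in y.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(model, crystal))  # 1.0
```

An RMSD of about 1 Å therefore means that, on average, matched atoms in the model and the crystal structure sit roughly one angstrom apart, which is less than the length of a typical covalent bond.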

Comparison of the x-ray crystal structure of Top7 (yellow) to the computationally designed model (green). The side-chains in the core of the C-terminal portion of the Top7 structure are effectively superimposable with the model.

The successful design of Top7 has two major implications. First, it is a strong validation of the understanding and description of the energetics of proteins and other macromolecules, much of which, incidentally, has been a consequence of the determination of high-resolution structures of those macromolecules. Second, it suggests that the development of protein therapeutics and molecular machines need not be limited to the structures sampled by the biological evolutionary process.

Research conducted by G. Dantas, G. Varani, and D. Baker (University of Washington, Seattle); B. Kuhlman (University of North Carolina, Chapel Hill); and G.C. Ireton and B.L. Stoddard (Fred Hutchinson Cancer Research Center, Seattle).

Research funding: The National Institutes of Health and the Cancer Research Fund of the Damon Runyon–Walter Winchell Foundation. Operation of the ALS is supported by the U.S. Department of Energy, Office of Basic Energy Sciences.

Publication about this research: B. Kuhlman, G. Dantas, G.C. Ireton, G. Varani, B.L. Stoddard, and D. Baker, “Design of a novel globular protein fold with atomic-level accuracy,” Science 302, 1364 (2003).

ALSNews Vol. 239, March 31, 2004
