Proc Natl Acad Sci U S A. 1998 October 27; 95(22): 12930–12933.
PMCID: PMC23658
Biophysics
Folding and aggregation of designed proteins
R. A. Broglia,* G. Tiana,§ S. Pasquali,* H. E. Roman,* and E. Vigezzi*
*Dipartimento di Fisica Universita di Milano and Istituto Nazionale di Fisica Nucleare, I-20133 Milan, Italy; The Niels Bohr Institute, University of Copenhagen, 2100 Copenhagen, Denmark; and §Department of Physics, DTU Building 307, 2800 Lyngby, Denmark
To whom reprint requests should be addressed.
Edited by Peter G. Wolynes, University of Illinois, Urbana, IL, and approved September 1, 1998
Received May 15, 1998.
Abstract
Protein aggregation is studied by following the simultaneous folding of two designed identical 20-letter amino acid chains within the framework of a lattice model and using Monte Carlo simulations. It is found that protein aggregation is determined by elementary structures (partially folded intermediates) controlled by local contacts among some of the most strongly interacting amino acids and formed at an early stage in the folding process.
 
Studies of how proteins fold have shown that the way protein clumps form in the test tube is similar to how proteins form the so-called “amyloid” deposits that are the pathological signal of a variety of diseases, among them the memory disorder Alzheimer’s (1–6). Protein aggregation traditionally has been connected to either unfolded or native states. Inclusion body formation has been assumed to arise from hydrophobic aggregation of the unfolded or denatured states, whereas the amyloid fibrils have been assumed to arise from native-like conformations in a process analogous to the polymerization of hemoglobin S.
By using lattice-model simulations (7–18), we find that aggregation arises from elementary structures that are controlled by local contacts, which eventually build the folding nucleus (16) of the heteropolymers where nonlocal contacts play an important role. Aggregation takes place when some of the most strongly interacting amino acids establish their local contacts, leading to the formation of a specific subset of the native structure. These elementary structures, which provide local guidance in both the folding and the aggregation process, can be viewed as the partially folded intermediates suggested to be involved in the aggregation of a number of proteins (6, 19–23).
We studied the simultaneous folding of two identical 20-letter amino acid chains, each composed of 36 monomers and designed to fold into their native conformation (Fig. 1a), within the framework of a simple lattice model of protein folding (14, 16, 18) and by using Monte Carlo (MC) simulations. Although the model does not treat side chains explicitly, the amino acids are chemically different. Their differences are manifested in pairwise interaction energies of different magnitude and sign, depending on the identity of the interacting amino acids. The configurational energy is
\[ E(\{\vec{r}\}) = \sum_{i<j} U_{m(i),m'(j)} \, \Delta(|\vec{r}_i - \vec{r}_j|), \qquad (1) \]
where $\{\vec{r}\}$ is the set of coordinates of all of the monomers describing a chain conformation. The quantity $\Delta(|\vec{r}_i - \vec{r}_j|)$ is a contact function. It is equal to one if sites i and j are at unit distance (lattice neighbors) and are not connected by a covalent bond, and zero otherwise. In addition, it is assumed that on-site repulsive forces prevent two amino acids from occupying the same site simultaneously, so that Δ(0) = ∞. There are 20 types of amino acids in the model. The quantities $U_{m(i),m'(j)}$ are the contact energies between amino acids of type m and m′ and were taken from table 6 of ref. 24. The 36-mer chain, denoted S36 and designed by minimizing, for fixed amino acid concentration, the energy of the native conformation with respect to the amino acid sequence, is shown in Fig. 1b. At temperature T = 0.20 (in our temperature scale) it folds in 8 × 10⁶ MC steps, and at T = 0.28 the folding time is 8 × 10⁵ MC steps. The fractional population of the native state corresponding to these two temperatures is 91% and 10%, respectively, to be compared with a population of 10⁻⁵ for the heteropolymer folding temperature of T = 0.40. All of the calculations presented below were carried out at T = 0.28, a temperature that allows statistically representative samples of the different simulations to be accumulated while still leading to a consistent population of the native conformation. Pilot calculations carried out at T = 0.24 and T = 0.26, where the fractional population of the native state is 55% and 20%, respectively, lead to results that agree in detail with the results obtained at T = 0.28.
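As an illustration of how the configurational energy of Eq. 1 can be evaluated, the following minimal Python sketch sums contact energies over all pairs of monomers, belonging to one or to several chains. It assumes a simple cubic lattice with integer coordinates. The function name `contact_energy` and the table `U_TOY` are hypothetical; `U_TOY` holds made-up values for a two-letter alphabet and only stands in for the 20 × 20 contact energies of ref. 24.

```python
from itertools import combinations

# Toy contact energies (made-up values, NOT those of ref. 24).
# Keys are alphabetically sorted pairs of amino acid type labels.
U_TOY = {("A", "A"): -1.0, ("A", "B"): 0.3, ("B", "B"): -0.5}


def contact_energy(chains, types, U):
    """Evaluate the contact sum over all monomers of all chains.

    chains: list of chains; each chain is a list of (x, y, z) integer
            lattice sites in sequence order (cubic lattice assumed).
    types:  list of chains; each chain is a list of type labels.
    U:      dict mapping a sorted pair of type labels to a contact energy.
    """
    monomers = [
        (c, i, tuple(site), t)
        for c, (sites, labels) in enumerate(zip(chains, types))
        for i, (site, t) in enumerate(zip(sites, labels))
    ]
    occupied = [m[2] for m in monomers]
    if len(set(occupied)) != len(occupied):
        return float("inf")  # double occupancy: Delta(0) = infinity

    energy = 0.0
    for (c1, i1, r1, t1), (c2, i2, r2, t2) in combinations(monomers, 2):
        bonded = c1 == c2 and abs(i1 - i2) == 1
        dist2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
        if dist2 == 1 and not bonded:  # lattice neighbors, no covalent bond
            energy += U[tuple(sorted((t1, t2)))]
    return energy


# A 4-mer bent into a square: only monomers 0 and 3 form a contact.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(contact_energy([square], [["A", "B", "B", "A"]], U_TOY))
```

Passing two chains in `chains` adds the inter-chain contacts to the same sum, which is the situation relevant to the simultaneous folding of two sequences.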
Figure 1
(a) The conformation of the 36-mer chosen as the native state in the design procedure. Each amino acid residue is represented as a bead occupying a lattice site. The design tends to place the most strongly interacting amino acids in the interior of the ...
The role played by the different amino acids in the folding process of S36 was characterized in ref. 18 by using mutations (19 possible substitutions of monomers on each site). It was found that the 36 sites of the native conformation (see Fig. 1a) can be classified as “hot” (red beads, numbered 6, 27, and 30), “warm” (beads numbered 3, 5, 11, 14, 16, and 28), and “cold” (the rest of the beads) sites. On average, mutations on the 27 cold sites yield sequences that still fold to the native structure (neutral mutations), although the folding time is somewhat longer than for S36. Sequences obtained from mutations on the six warm sites fold, as a rule, to a unique conformation, sometimes different but in any case very similar to the native one. Mutations on the three hot sites lead, in general, to complete misfolding (denaturation) of the protein.
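The site classification just described can be written down as a small lookup, shown here as a Python sketch. The site numbers are those quoted in the text; the names `site_class`, `HOT_SITES`, and `WARM_SITES` are hypothetical and are introduced only for illustration.

```python
from collections import Counter

N_SITES = 36
HOT_SITES = {6, 27, 30}               # mutations generally cause misfolding
WARM_SITES = {3, 5, 11, 14, 16, 28}   # mutations give native-like folds


def site_class(site):
    """Return 'hot', 'warm', or 'cold' for a site numbered 1..36."""
    if not 1 <= site <= N_SITES:
        raise ValueError(f"site {site} is outside 1..{N_SITES}")
    if site in HOT_SITES:
        return "hot"
    if site in WARM_SITES:
        return "warm"
    return "cold"  # all remaining sites: neutral mutations on average


counts = Counter(site_class(s) for s in range(1, N_SITES + 1))
cold_fraction = counts["cold"] / N_SITES
print(dict(counts), cold_fraction)
```

The count reproduces the 3 hot, 6 warm, and 27 cold sites, so that cold sites make up 27/36 = 75% of the chain.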
Essentially two different outcomes of the simulation studies of the simultaneous folding of two S36 sequences have been observed: (i) both chains fold to their native conformation (Fig. 1a), and (ii) both chains get intertwined in conformations that are quite compact and display some amount of similarity to the native conformation (Fig. 2). In the first case, each chain reaches its minimum energy structure (native conformation), about which it fluctuates (25, 26). The second case is associated with an ensemble of compact low-energy conformations typical of those reached in the folding of a random chain, where the system spends little time in each conformation and displays conspicuous energy fluctuations.
Figure 2
Examples of aggregation. The hot sites of chain 1 and their nearest neighbors are shown as red and yellow beads, respectively, as in Fig. 1a. The hot sites of chain 2 are shown as blue beads, the corresponding nearest-neighbor amino acids in the native ...
At the basis of these phenomena are the elementary structures built out of the monomer sequences $S^4_1 \equiv (3,4,5,6)$, $S^4_2 \equiv (27,28,29,30)$, and $S^4_3 \equiv (11,12,13,14)$ (see Fig. 1a). They are controlled by the local contacts 3–6, 27–30, and 11–14, which involve some of the most strongly interacting amino acids. These structures, aside from containing, at the local level, essentially all of the amino acids found in the folding nucleus (16) of the protein, provide the local guidance for its formation and thus are, to a large extent, responsible for the fast folding of the designed sequence S36 (Fig. 1b). In fact, the structures $S^4_i$ (i = 1,2,3) can be viewed as the local “bricks” of a dynamical LEGO kit to model protein folding.
The pairs of strongly interacting monomers (3,6), (27,30), and (11,14) become nearest neighbors very early in the folding process, with the associated first passage time (FPT) being of the order of 10² MC steps in all three cases. The corresponding local contacts achieve 90–95% stability after 0.25 × 10⁶ MC steps, a time to be compared with the FPT for the folding of both interacting chains (see i above), which is equal to 2 × 10⁶ MC steps. The folding core is formed essentially when the three different bricks of the same chain assemble together, establishing the nonlocal contacts 6–27, 3–30, 6–11, and 27–14, at which time it becomes easy for nonlocal contacts 27–16 and 30–33 to fall in place. The speed of this process depends crucially on the local contacts between monomers 3–6, 27–30, and 11–14, which control the stability of the local structures $S^4_1$, $S^4_2$, and $S^4_3$.
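The two quantities used in this paragraph, the first passage time of a contact and its stability, can be extracted from a simulation record as in the following Python sketch. It assumes that a contact is recorded as a sequence of booleans, one per sampled MC step, and that "stability" means the fraction of sampled steps in which the contact is present; both the representation and the function names are assumptions made for illustration, not a description of the analysis code used here.

```python
def first_passage_time(contact_series):
    """Index of the first sampled step at which the contact is formed.

    Returns None if the contact never forms in the recorded series.
    """
    for step, formed in enumerate(contact_series):
        if formed:
            return step
    return None


def stability(contact_series, start=0):
    """Fraction of sampled steps, from `start` on, with the contact formed."""
    window = contact_series[start:]
    if not window:
        raise ValueError("empty window: cannot compute a stability")
    return sum(1 for formed in window if formed) / len(window)


# Toy record of the 3-6 contact (made-up data): it forms at step 2,
# breaks once, and then stays in place.
series = [False, False, True, False, True, True, True, True]
fpt = first_passage_time(series)
print(fpt, stability(series, start=fpt))
```

With such a definition, a contact can have a short FPT and still need a much longer time before its stability reaches the 90–95% level.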
Once the folding nucleus of both proteins is formed, it takes fewer than 3 × 10⁴ MC steps for them to reach the native configuration. All of the contacts that maintain the bricks in place involve at least one amino acid occupying a hot site in the native conformation of the isolated protein (18), that is, a strongly interacting amino acid (Fig. 1a). Once the hot site amino acids are in place, it takes 0.6 × 10⁶ MC steps for both proteins to fold (FPT), in keeping with the fact that while the FPT of the contact 6–27 is ≈0.4 × 10⁶ MC steps, it takes ≈1.4 × 10⁶ MC steps for it to become stable.
Aggregation results when one chain, in the process of establishing its nonlocal contacts, uses a hot site amino acid belonging to the other chain. The FPT associated with this phenomenon is typically 0.5 × 10⁶ MC steps. In other words, aggregation occurs when the local structures (bricks) belonging to different chains attach to each other (see Fig. 2). Such a “mistake” can take place in a number of different ways, and not only in the ones that mimic the disposition of the bricks in the native core configuration, in keeping with the LEGO analogy. Because of the strongly interacting character of the amino acids occupying sites 27, 30, and 6, aggregation is, for all practical purposes, an irreversible process under native-like conditions, as seen from simulations leading to aggregation that have been followed over 10⁸ MC steps. We have repeated the calculations by using the contact energies $U_{m(i),m'(j)}$ (see Eq. 1) reported in table 5 of ref. 24 and obtained results very similar to the ones discussed above.
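The aggregation event described above, in which a chain makes a contact with a hot site amino acid of the other chain, suggests a simple diagnostic, sketched below in Python. It assumes a cubic lattice with integer coordinates and lists every inter-chain nearest-neighbor pair that involves site 6, 27, or 30 of either chain. The function `interchain_hot_contacts` is hypothetical and is not the criterion used to classify the simulations reported here.

```python
HOT_SITES = {6, 27, 30}


def interchain_hot_contacts(chain1, chain2, hot=HOT_SITES):
    """List inter-chain contacts (i, j) that involve at least one hot site.

    chain1, chain2: lists of (x, y, z) integer lattice sites; list index 0
                    corresponds to monomer number 1.
    Returns pairs (i, j) with i a monomer of chain 1 and j of chain 2.
    """
    contacts = []
    for i, r1 in enumerate(chain1, start=1):
        for j, r2 in enumerate(chain2, start=1):
            dist2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
            if dist2 == 1 and (i in hot or j in hot):
                contacts.append((i, j))
    return contacts


# Two short straight segments lying side by side (toy geometry):
# every monomer i of chain 1 touches monomer i of chain 2.
seg1 = [(k, 0, 0) for k in range(7)]
seg2 = [(k, 1, 0) for k in range(7)]
print(interchain_hot_contacts(seg1, seg2))
```

In the toy geometry only the pair formed by the two monomers numbered 6 is reported, because it is the only inter-chain contact that involves a hot site.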
To see whether the local structures $S^4_i$ (i = 1,2,3) are an artifact of the conspicuous dispersion displayed by the contact energies used in the calculations (24), we have repeated the simulations by using the Go model (8, 17). We found that the elementary structures $S^4_i$ are, if anything, better defined in this case than in the case discussed previously, and that their role in the aggregation process is again essential. In particular, because all of these local structures now have equal energy content, in most of the events leading to aggregation, each of the three local structures of one chain finds at least one local structure of the other chain with which it interacts.
Using again the contact energies of ref. 24, we have found that introducing cold (neutral) mutations increases the rate of aggregation significantly. The chosen mutations are able to affect in a significant way the stability of one of the local structures, without much changing the ability of the resulting isolated sequence S′36 to fold rapidly to the native conformation. In particular, when the amino acid R at position 11 of the designed sequence is replaced by amino acid A, the rate with which aggregation takes place increases by about 70% (i.e., from 22% to 37% at a distance d = 4, where d represents the initial distance, in units of lattice spacing, between monomers number 18 of each of the chains). The reason for this increase is that it takes 0.6 × 10⁶ MC steps for the pair of monomers 11–14 of the mutated sequence S′36 to establish a stable contact (as compared to 0.25 × 10⁶ MC steps for S36). Consequently, the other two local structures (associated with the monomer groups $S^4_1$ and $S^4_2$) have more time and thus a better chance to interact with the homologous structures of the other chain than in the case of the simultaneous folding of two S36 sequences. Similar results have been obtained by performing single and multiple mutations in cold and warm sites of the native conformation. Because 75% of all sites are cold, and thus associated with neutral mutations (18), there is a large number of mutations that, while destabilizing the elementary structures and thus increasing the rate of aggregation, do not affect the stability of the protein in an important way. These results are consistent with a number of observations, in particular those carried out in the study of the amyloid-forming system transthyretin. When altered by any of 50 different mutations, this protein, which normally occurs in the blood plasma, deposits in the heart, lungs, and gut, causing a lethal disease called familial amyloidotic polyneuropathy (27, 28). These mutations do not alter normal folding of the protein but do destabilize the protein structure, facilitating the formation of partially folded intermediates that readily aggregate with one another (29, 30).
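The arithmetic behind the figures quoted for the mutation at position 11 can be checked with a few lines of Python. The numbers are those given in the text; the helper `relative_increase` is a hypothetical name used only for this check.

```python
def relative_increase(before, after):
    """Relative change (after - before) / before."""
    return (after - before) / before


# Aggregation rates at initial distance d = 4 (values from the text).
rate_s36, rate_mutant = 0.22, 0.37
rate_gain = relative_increase(rate_s36, rate_mutant)

# Time for contact 11-14 to become stable, in MC steps (values from the text).
t_s36, t_mutant = 0.25e6, 0.6e6
slowdown = t_mutant / t_s36

print(f"{rate_gain:.0%}", slowdown)
```

The relative increase evaluates to about 68%, which the text rounds to 70%, while the stabilization of the 11–14 contact is slower by a factor of 2.4 in the mutated sequence.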
We conclude that a given protein will have a (small) number of local partially folded intermediates that control both protein folding and aggregation. Within the model of designed proteins these are the elementary structures that build the folding nucleus. Consequently, most of the aggregates of this protein, as well as of the sequences homologous to it, will display similar native-like structures, independent of the nature of the effect triggering the aggregation.
Acknowledgments
Discussions with E. Shakhnovich are much appreciated. We thank the late Dr. N. D’Alessandro for help in modeling design. We gratefully acknowledge financial support by the North Atlantic Treaty Organization under Grant CRG 940231.
ABBREVIATIONS
MC, Monte Carlo
FPT, first passage time

Footnotes
This paper was submitted directly (Track II) to the Proceedings Office.
References
1. Fink, A L. Folding Design. 1998;3:R9–R23.
2. Silow, M; Oliveberg, M. Proc Natl Acad Sci USA. 1997;94:6084–6086.
3. Mitraki, A; King, J. Biotechnology. 1989;7:690–697.
4. Wetzel, R. Cell. 1998;86:699–702.
5. Jaenicke, R. Philos Trans R Soc London B. 1995;348:97–105.
6. Wetzel, R. Trends Biotechnol. 1994;12:193–198.
7. Ueda, Y; Taketomi, H; Go, N. Int J Pept Protein Res. 1975;7:445–449.
8. Go, N; Abe, H. Biopolymers. 1981;20:1013–1031.
9. Lau, K; Dill, K. Macromolecules. 1989;22:3986–3997.
10. Skolnick, J; Kolinski, A; Sikorski, R. Comm Mol Cell Biophys. 1990;6:223–247.
11. Covell, D; Jernigan, R. Biochemistry. 1990;29:3287–3294.
12. Godzik, A; Kolinski, A; Skolnick, J. J Comput Chem. 1994;14:1194–1202.
13. Socci, N; Bialek, W; Onuchic, J. Phys Rev E. 1994;49:3440–3443.
14. Shakhnovich, E I. Phys Rev Lett. 1994;72:3907–3910.
15. Klimov, D; Thirumalai, D. Phys Rev Lett. 1996;76:4070–4073.
16. Shakhnovich, E I; Abkevich, V; Ptitsyn, O. Nature (London). 1996;379:96–98.
17. Pande, V S; Grosberg, A Y; Tanaka, T; Rokhsar, D S. Curr Opin Struct Biol. 1998;8:68–79.
18. Tiana, G; Broglia, R A; Roman, H E; Vigezzi, E; Shakhnovich, E I. J Chem Phys. 1998;108:757–761.
19. King, J; Haase-Pettingell, C; Robinson, A S; Speed, M; Mitraki, A. FASEB J. 1996;10:57–66.
20. Speed, M A; Morshead, T; Wang, D I O; King, J. Protein Sci. 1997;6:99–108.
21. Hurle, M R; Helms, L R; Li, L; Chan, W; Wetzel, R. Proc Natl Acad Sci USA. 1994;91:5446–5450.
22. Kim, D; Yu, M H. Biochem Biophys Res Commun. 1996;226:378–384.
23. Fink, A L. Annu Rev Biophys Biomol Struct. 1995;24:495–522.
24. Miyazawa, S; Jernigan, R L. Macromolecules. 1985;18:534–552.
25. Ptitsyn, O B. Protein Folding. Creighton, T E, editor. New York: Freeman; 1992. pp. 243–300.
26. Shakhnovich, E I; Finkelstein, A V. Biopolymers. 1989;28:1667–1694.
27. McCutchen, S L; Colon, W; Kelly, J W. Biochemistry. 1993;32:12119–12127.
28. McCutchen, S L; Lai, Z; Miroy, G; Kelly, J W; Colon, W. Biochemistry. 1995;34:13527–13536.
29. Hamilton, J A; Steinrauf, L K; Braden, B C; Liepnieks, J; Benson, M D; Holmgren, G; Sandgren, O; Steen, L. J Biol Chem. 1993;268:2425–2430.
30. Terry, C J; Damas, A M; Oliveira, P; Saraiva, M J; Alves, I L; Costa, P P; Matias, P M; Sakaki, Y; Blake, C C F. EMBO J. 1993;12:735–741.
31. Humphrey, W; Dalke, A; Schulten, K. J Mol Graphics. 1996;14:33–38.