Chapter 8

GENES AS DETERMINANTS OF PROTEIN STRUCTURE

Studies on the biochemical effects of mutations have given strong support to the notion that individual genes are concerned with the biosynthesis of individual proteins. In a large number of instances it has been possible to attribute the absence or modification of an enzyme to a single gene mutation. The study of such biochemical lesions, not only in microorganisms like Neurospora and E. coli but in man and other higher organisms as well, has led to the concept of a "one gene-one enzyme" relationship, which we have already introduced in Chapter 2.

The term "gene" as used in this context has, until quite recently, been employed to convey the purely abstract concept of a unit of heredity. It represented a quantum of genetic information that in some way controlled the biosynthesis of a single protein or, in more cautious terms, of some "functional unit." The recent advances in the biochemistry of the chromosome and of DNA, and in the mapping of genetic "fine structure" of the sort we have discussed in relation to bacteriophage, now make it possible to speculate about gene action in chemical terms instead of formal abstraction. The investigations of S. Benzer, G. Streisinger, M. Demerec and his collaborators, G. Pontecorvo, and many others have indicated that the idea of a one-dimensional array of "genes," divisible by genetic recombination, may very likely be extended down to molecular dimensions. Their results suggest that the word "pseudoallelism" needed to be invented only because of the difficulties of demonstrating extremely rare recombinations in unfavorable biological material.
If we accept the generalities of the "one gene-one enzyme" concept, and if we are willing to go along with the present trend of opinion on the role of DNA as the basic determinant of heredity, we must seriously consider the conclusion that the information which governs details of protein structure is present in the chemical structure of the DNA molecule. It is an undeniable temptation to suggest further that a point mutation is really just a very localized change in the sequence or the three-dimensional relationships within a polynucleotide chain and that such a localized change might reflect itself in the sequence and folding of the protein concerned. In spite of the fact that many investigators properly accept the generality as a working hypothesis, such speculations are, at present, mostly fancy with little fact. The pathway from gene structure to phenotypic protein may be a long and tortuous one, and we cannot rule out such possible complications as the combined action of several genes in the synthesis of a single protein or the involvement of cytoplasmic hereditary factors which might modify, or even initiate, steps in a biosynthetic pathway.

If this hypothesis is an approximately correct one, however, we should, as N. Horowitz has pointed out, be able to demonstrate mutations that lead to qualitative as well as quantitative changes in enzymes and other proteins. It should be possible, for example, to show that various mutations within a given protein-determining region of the genetic material of an organism can lead to "mutant" forms of a biologically active protein which exhibit varying degrees of functional adequacy. Mutations affecting portions of protein structure that are essential for function should be lethal ones, whereas those affecting less essential regions might either be undetected or "leaky," to use the genetic patois.
In spite of the fact that hundreds of examples have been found of gene-protein relationships, it has been possible to demonstrate a correlation between the mutation of a single gene and the chemical and physical properties of a homogeneous protein molecule in only a few instances. Many of these positive correlations have emerged from studies on proteins of higher organisms for the simple reason that protein samples of sufficient purity are easier to come by with red cells, milk, and plasma than with microorganisms. However, the advantages offered by microorganisms in respect to genetic mapping have been a tremendous stimulus to gene-minded protein chemists, and it is likely that many of the major advances in this area will be made on material from such sources. If, for example, the protein whose biosynthesis is under the control of that region of genetic material in T4 bacteriophage so elegantly mapped by Benzer (see Chapter 4) could be identified and isolated in pure form, it is clear that a direct test, in enormous detail, could be made for the existence of a correspondence between "cistron" and protein. Such detail could never be achieved with human proteins because extensive gene mapping in man is limited by his lengthy generation time and his eugenic mores.

TABLE 14
Alterations in Proteins Attributable to Mutations

Protein                            Species          Demonstrated or Possible Effects of Mutation
Hemoglobin (1, 2)                  Man              Composition and charge
                                   Sheep            Charge
                                   Mouse            Charge
β-Lactoglobulin                    Cattle           Charge
Haptoglobin                        Man              Charge
Pantothenate-synthesizing enzyme   E. coli          Thermostability
Tyrosinase                         Neurospora cr.   Thermostability
Glutamic acid dehydrogenase        Neurospora cr.   Reversible heat activation

For more detailed reference see:
1. N. Horowitz, Federation Proc., 16, 818 (1956).
4. D. Steinberg and E. Mihalyi, Ann. Rev. Biochem., 26, 373 (1957).
A partial list of those proteins for which gene-linked modification has been demonstrated is presented in Table 14. With one exception, human hemoglobin, the difference between the normal protein and that obtained from the mutant has been in electrophoretic mobility, heat stability, and serological behavior. The net charge, stability, and serology of a protein are, of course, quite distinctive characteristics, and the proteins in Table 14 which have been studied in respect to these parameters can almost certainly be assumed to exist in forms whose differences are related to allelomorphic genes. Nevertheless, small organic molecules, tightly bound to proteins, can modify charge, and polysaccharides or other haptenic substances may influence antigenicity. For such reasons, the case of human hemoglobin is a particularly favorable one, since for this protein the electrophoretic and solubility differences between mutant forms are attributable to actual modifications in amino acid sequence.

In 1949, L. Pauling, H. A. Itano, S. J. Singer, and I. C. Wells¹ made the important observation that the hemoglobin of sickle-cell anemics is electrophoretically abnormal and that in individuals with sickle-cell trait (an asymptomatic condition) a mixture of the abnormal sickle-cell and the normal forms could be demonstrated. Extensive study of the familial relationships of sickle-cell anemia has indicated that this frequently fatal disease is inherited in a Mendelian fashion. By an analysis of the genetic relationships between sickle-cell anemia and sickle-cell trait, J. V. Neel established that the production of the abnormal hemoglobin was due to the presence of a single mutant gene. Genetically, the anemic may be characterized as homozygous for the sickling gene and the individual with the trait as heterozygous. The studies of Pauling and Itano and their collaborators, together with the discovery by H. Hörlein and G.
Weber² of a congenital methemoglobinemia involving an abnormal globin component, stimulated the search for other genetically linked aberrations in hemoglobin synthesis. At present writing a dozen or more types of abnormal hemoglobins which may be detected by their unusual physical properties are known. In addition, there are a number of clinical situations in which detection depends on hematologic examination but no changes in the physical properties of hemoglobin have been observed. Such abnormal individuals have microcytic red cells or cells showing some other deviation from the normal morphology of erythrocytes. These instances of inhibition of synthesis of normal hemoglobin are collectively named thalassemia, and Allison has proposed, on the basis of the observation that the locus controlling the thalassemia effect does not appear to be allelomorphic with the normal hemoglobin gene, HbA, that the locus for thalassemia be designated Th. The normal gene at this locus would then be termed ThN and the thalassemia allele, ThT.

Examples of the clinical nomenclature and genotypic designations for a number of abnormalities involving the hemoglobin molecule are given in Table 15. This compilation is taken from the excellent review by Itano to which the reader is referred for more detailed information.
TABLE 15
The Human Hemoglobins*

Hemoglobin   Method of Detection
A            Normal adult
F            Foetal
x            Electrophoresis
S            Electrophoresis; solubility; tactoid formation
C            Electrophoresis
E            Electrophoresis
G            Electrophoresis
H            Electrophoresis
I            Electrophoresis
J            Electrophoresis
M            Spectrophotometry
D            Electrophoresis and solubility

Nomenclature of Syndromes Associated with Abnormalities in Hemoglobin Metabolism

                                                    Genotype
Condition                                  Hb Locus       Th Locus
Homozygous
  Normal                                   HbA HbA        ThN ThN
  Sickle-cell anemia                       HbS HbS
  Hemoglobin C disease                     HbC HbC
  Thalassemia major                        Hbth Hbth
  Thalassemia major                                       ThT ThT
Heterozygous
  Sickle-cell trait                        HbA HbS
  Hemoglobin C trait                       HbA HbC
  Sickle-cell hemoglobin C disease         HbS HbC
  Thalassemia minor                        HbA Hbth
  Thalassemia minor                                       ThN ThT
  Sickle-cell thalassemia disease          HbS Hbth
  Hemoglobin C thalassemia disease         HbC Hbth
Doubly heterozygous
  Sickle-cell thalassemia disease          HbA HbS        ThN ThT
  Hemoglobin C thalassemia disease         HbA HbC        ThN ThT

* From a review by H. Itano, Advances in Protein Chemistry, volume 12 (C. B. Anfinsen, M. L. Anson, K. Bailey, and J. T. Edsall, editors), Academic Press, p. 215, 1957.

For our present purposes it is sufficient to recognize, first, that some of the various abnormal hemoglobins (HbS, HbC, etc.) are under the control of a series of genes which seem to be allelic (they are, perhaps, pseudoallelic) and, second, that certain other abnormalities, inclusively termed thalassemias (ThT1, ThT2, etc.), involve genetic abnormalities for which no physical or chemical reflection in the structure of the hemoglobin molecule has been observed and which appear to be associated with genetic loci different from the HbA locus. Let us now examine what chemical data we have.
The differences observed by Pauling, Itano, and their colleagues in the electrophoretic mobility of normal and sickle-cell hemoglobin might be ascribed to modifications in the amino acid sequence leading to the introduction or deletion of charged side-chain groups. On the other hand, such charge differences might be apparent only and could reflect the manner of folding of the polypeptide chains of the protein to expose or to mask charged groups in response to configurational change. A direct test of these hypotheses has been made by V. Ingram,³ who has examined the details of sequence in the molecule (Figure 78) by means of the sensitive "fingerprinting" technique described in the previous chapter. His investigations have made it extremely likely that both sickle-cell hemoglobin and hemoglobin C differ from normal hemoglobin in only a single amino acid residue. The affected portion of the protein is shown in Figure 79. A glutamic acid residue in HbA has been replaced with valine and lysine, respectively, in HbS and HbC. The corresponding changes in net charge per mole (plus 2 for HbS and plus 4 for HbC, with respect to HbA) agree with those to be expected from the electrophoretic measurements, and no evidence has been obtained for other changes in sequence in the rest of the molecular structure of the protein. We have here, then, a direct test of the proposition that a mutation in a specific genetic locus causes a specific change in the covalent structure of the phenotypic protein related to this locus. Indeed, Ingram's experiments are a test with a vengeance. Not only do the allelic Mendelian genes HbA, HbS, and HbC have to do with a very restricted aspect of structure, but they all appear to be related to the same aspect, namely the sequence at one unique point.
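The charge bookkeeping in this comparison can be checked with a few lines of Python. The sketch below is an illustration only and is not part of Ingram's analysis. It assumes idealized side-chain charges near neutral pH (glutamic acid -1, valine 0, lysine +1), and it assumes that the substituted residue occurs twice in each hemoglobin molecule, which is what the factor of two in the quoted figures implies. The names SIDE_CHAIN_CHARGE, COPIES_PER_MOLECULE, and net_charge_shift are hypothetical.

```python
# Idealized side-chain charges near neutral pH (an assumption for illustration).
SIDE_CHAIN_CHARGE = {"Glu": -1, "Val": 0, "Lys": +1}

# Assumed number of times the substituted residue occurs per hemoglobin molecule.
COPIES_PER_MOLECULE = 2


def net_charge_shift(normal, mutant, copies=COPIES_PER_MOLECULE):
    """Change in net charge per molecule when `normal` is replaced by `mutant`."""
    per_site = SIDE_CHAIN_CHARGE[mutant] - SIDE_CHAIN_CHARGE[normal]
    return copies * per_site


if __name__ == "__main__":
    # Glu -> Val corresponds to HbS, Glu -> Lys to HbC, both relative to HbA.
    for name, residue in (("HbS", "Val"), ("HbC", "Lys")):
        print(name, net_charge_shift("Glu", residue))
```

Under these assumptions the shift relative to HbA comes out as plus 2 for the valine replacement and plus 4 for the lysine replacement, the values quoted from the electrophoretic comparison.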
If the sequence of nucleotides in the polynucleotide chain of DNA determines polypeptide sequence, how can we explain the fact that these three genetically segregatable loci all influence the same position in the polypeptide?

Figure 78. "Fingerprints" of the peptides produced by digestion of normal hemoglobin (a) and sickle-cell hemoglobin (b) with trypsin. The "fingerprints" were obtained by a combination of electrophoresis and chromatography, more or less as described in Figures 71 and 72. The encircled areas in the figure show where the fingerprints differ significantly. From V. M. Ingram, Nature, 180, 326 (1957).

A particularly intriguing possibility for explaining Ingram's results comes from a consideration of the theoretical model of Watson and Crick for DNA structure. The obligatory pairing of heterocyclic bases in this structure has, as we have discussed earlier, been suggested as a basis for the accurate self-duplication of DNA strands. The specific sequences of the bases in the complementary strands of the double helix have also been viewed as a set of coded genetic information which might serve as the fundamental template for protein synthesis. The most popular code form has been one based on "triplets," in which various sets of three nucleotides correspond to a specific amino acid. Employing this idea, we may arbitrarily translate the sequence of amino acids in hemoglobin that differs in the three mutant forms into a corresponding nucleotide code as shown in Figure 80. The replacement of a single nucleotide with another within the critical trinucleotide sequence would give us the required change in code. (The reader will obviously not take all this too seriously. The most improbable hypotheses in science have turned out to be true, however, and this one certainly deserves some serious consideration for its novelty and coherence.)
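The single-nucleotide argument can be made concrete with a small sketch. The triplet assignments used here are an assumption brought in for illustration: they are taken from the standard genetic code as it was later worked out, not from the text, whose own assignments in Figure 80 are explicitly arbitrary. The table is restricted to the three amino acids at issue, and the names CODONS, single_substitutions, and reachable are hypothetical.

```python
# Subset of the standard genetic code, written as DNA coding-strand triplets.
# Added for illustration only; the source assigns its triplets arbitrarily.
CODONS = {
    "GAA": "Glu", "GAG": "Glu",
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
    "AAA": "Lys", "AAG": "Lys",
}


def single_substitutions(codon):
    """Yield every triplet that differs from `codon` at exactly one position."""
    for i, base in enumerate(codon):
        for alt in "ACGT":
            if alt != base:
                yield codon[:i] + alt + codon[i + 1:]


def reachable(codon, target):
    """Return the one-step neighbours of `codon` that are listed as `target`."""
    return sorted(c for c in single_substitutions(codon)
                  if CODONS.get(c) == target)


if __name__ == "__main__":
    for start in ("GAA", "GAG"):
        print(start, "-> Val:", reachable(start, "Val"),
              "-> Lys:", reachable(start, "Lys"))
```

With this table, each glutamic acid triplet has exactly one neighbour for valine and one for lysine that differ from it by a single nucleotide, which is the kind of change the triplet hypothesis requires for the three hemoglobins.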
One very interesting question is raised by the existence of three mutant forms of hemoglobin differing from one another in respect to a single "locus." Why, with some 300 amino acid residues in a hemoglobin monomer to choose from, has the accident of mutation occurred, and been perpetuated, in the same place three times? The phenomenon is qualitatively reminiscent of the results obtained by Benzer in his analysis of mutants in the rII region of bacteriophage T4 where he observed that, out of many hundreds of mutant colonies selected, a disproportionately great number involved mutation in the same genetic locus, whereas others were modified only rarely. The nonrandom distribution of affected loci, both in the bacteriophage case for which we have a good deal of genetic information, and for human hemoglobin for which we unfortunately have very little, might mean that only certain mutations are "permissible" and that the degree of permissibility is slight in most of the genetic material. We might equally well suggest, however, that some unsuspected peculiarities of DNA structure favor the modification of some lengths of nucleotide sequence more than others. Most probably, the mutant hemoglobin genes have been preserved because of the selective advantage they have conferred on the affected individuals. (Sickle-cell anemia, for example, is correlated with decreased susceptibility to clinical malaria.)

HbA  ...↑His.Val.Leu.Leu.Thr.Pro.Glu.Glu.Lys↑...
HbS  ...↑His.Val.Leu.Leu.Thr.Pro.Val.Glu.Lys↑...
HbC  ...↑His.Val.Leu.Leu.Thr.Pro.Lys↑Glu.Lys↑...

Figure 79. The differences in amino acid sequence between normal hemoglobin, sickle-cell hemoglobin and hemoglobin C. The arrows indicate the points of attack by trypsin which have led to the production of the peptide fragments shown in the figure.

[Figure 80. Three panels, each with columns headed DNA and Protein.]
and subsequent study of their chemical structure, may well lead to another situation like that of the hemoglobins for which the direct chemical consequences of mutation can be shown. Others of the protein systems under investigation, listed in Table 14, also promise to be extremely informative, particularly those involving easily isolated proteins like the β-lactoglobulins of milk. Because of their flexibility as regards genetic analysis, however, the bacteria and bacteriophages are, at present, receiving the most concerted attention. For example, no fewer than three laboratories are in the midst of the particular problem of determining the effects of mutation in the h region of bacteriophage T2 on the chemical nature of the phage particle. The host range (h) region of the genetic material of bacteriophage T2 determines whether or not a phage particle will adsorb to a specific bacterial