NO. 4809 December 30, 1961 NATURE 1227

GENERAL NATURE OF THE GENETIC CODE FOR PROTEINS

DR. R. J. WATTS-TOBIN, Medical Research Council Unit for Molecular Biology, Cavendish Laboratory, Cambridge

THERE is now a mass of indirect evidence which suggests that the amino-acid sequence along the polypeptide chain of a protein is determined by the sequence of the bases along some particular part of the nucleic acid of the genetic material. Since there are twenty common amino-acids found throughout Nature, but only four common bases, it has often been surmised that the sequence of the four bases is in some way a code for the sequence of the amino-acids. In this article we report genetic experiments which, together with the work of others, suggest that the genetic code is of the following general type:

(a) A group of three bases (or, less likely, a multiple of three bases) codes one amino-acid.
(b) The code is not of the overlapping type (see Fig. 1).
(c) The sequence of the bases is read from a fixed starting point. This determines how the long sequences of bases are to be correctly read off as triplets. There are no special 'commas' to show how to select the right triplets. If the starting point is displaced by one base, then the reading into triplets is displaced, and thus becomes incorrect.
(d) The code is probably 'degenerate'; that is, in general, one particular amino-acid can be coded by one of several triplets of bases.

The Reading of the Code

The evidence that the genetic code is not overlapping (see Fig. 1) does not come from our work, but from that of Wittmann¹ and of Tsugita and Fraenkel-Conrat² on the mutants of tobacco mosaic virus produced by nitrous acid. In an overlapping triplet code, an alteration to one base will in general change three adjacent amino-acids in the polypeptide chain.
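The difference between the two kinds of code can be made concrete with a small sketch. It assumes a made-up 12-base sequence and a single base substitution (neither comes from the experiments) and counts how many triplets change when the same sequence is read as overlapping windows or as non-overlapping groups of three from a fixed start. The helper names are illustrative only.

```python
def non_overlapping(seq: str) -> list[str]:
    """Successive, non-overlapping triplets read from a fixed start (position 0)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def overlapping(seq: str) -> list[str]:
    """Every window of three bases; each base takes part in up to three triplets."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def substitute(seq: str, pos: int, base: str) -> str:
    """Return seq with the base at `pos` replaced by `base`."""
    return seq[:pos] + base + seq[pos + 1:]


def changed(before: list[str], after: list[str]) -> list[int]:
    """Indices of triplets that differ between two readings of equal length."""
    return [i for i, (x, y) in enumerate(zip(before, after)) if x != y]


wild = "ATGGCATTCGGA"              # hypothetical sequence, not from the paper
mutant = substitute(wild, 5, "G")  # one base altered (A -> G)

print(changed(non_overlapping(wild), non_overlapping(mutant)))  # [1]
print(changed(overlapping(wild), overlapping(mutant)))          # [3, 4, 5]
```

Under the non-overlapping reading the single change touches one triplet, hence at most one amino-acid; under the overlapping reading it touches three adjacent triplets, which is the pattern the nitrous-acid mutants of tobacco mosaic virus fail to show.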
Their work on the alterations produced in the protein of the virus shows that usually only one amino-acid at a time is changed as a result of treating the ribonucleic acid (RNA) of the virus with nitrous acid. In the rarer cases where two amino-acids are altered (owing presumably to two separate deaminations by the nitrous acid on one piece of RNA), the altered amino-acids are not in adjacent positions in the polypeptide chain.

Brenner³ had previously shown that, if the code were universal (that is, the same throughout Nature), then all overlapping triplet codes were impossible. Moreover, all the abnormal human haemoglobins studied in detail⁴ show only single amino-acid changes. The newer experimental results essentially rule out all simple codes of the overlapping type.

If the code is not overlapping, then there must be some arrangement to show how to select the correct triplets (or quadruplets, or whatever it may be) along the continuous sequence of bases. One obvious suggestion is that, say, every fourth base is a 'comma'. Another idea is that certain triplets make 'sense', whereas others make 'nonsense', as in the comma-free codes of Crick, Griffith and Orgel⁵. Alternatively, the correct choice may be made by starting at a fixed point and working along the sequence of bases three (or four, or whatever) at a time. It is this possibility which we now favour.

Experimental Results

Our genetic experiments have been carried out on the B cistron of the rII region of the bacteriophage T4, which attacks strains of Escherichia coli. This is the system so brilliantly exploited by Benzer⁶,⁷. The rII region consists of two adjacent genes, or 'cistrons', called cistron A and cistron B. The wild-type phage will grow on both E. coli B (here called B) and on E. coli K12 (λ) (here called K), but a phage which has lost the function of either gene will not grow on K. Such a phage produces an r plaque on B. Many point mutations of the genes are known which behave in this way.
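The selection used throughout the experiments rests on this host-range rule. A minimal sketch, assuming a toy `RIIPhage` record with one flag per cistron (a hypothetical structure, not part of the original work) and ignoring 'leaky' mutants, which retain partial function:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RIIPhage:
    """Toy model of a T4 rII phage (hypothetical, for illustration only)."""
    a_functional: bool  # does cistron A work?
    b_functional: bool  # does cistron B work?


def plates_on_k(phage: RIIPhage) -> bool:
    # Growth on E. coli K12 (lambda), "K", needs the functions of both cistrons.
    return phage.a_functional and phage.b_functional


def plaque_on_b(phage: RIIPhage) -> str:
    # Every phage grows on E. coli B; one lacking either function gives an r plaque.
    return "wild" if plates_on_k(phage) else "r"


wild_type = RIIPhage(a_functional=True, b_functional=True)
b_mutant = RIIPhage(a_functional=True, b_functional=False)
print(plates_on_k(wild_type), plaque_on_b(wild_type))  # True wild
print(plates_on_k(b_mutant), plaque_on_b(b_mutant))    # False r
```

Plating on K therefore acts as a filter: only phages with both functions restored, such as the pseudo-wild double mutants described below, get through.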
Deletions of part of the region are also found. Other mutations, known as 'leaky', show partial function; that is, they will grow on K but their plaque-type on B is not truly wild. We report here our work on the mutant P 13 (now renamed FC 0) in the B1 segment of the B cistron. This mutant was originally produced by the action of proflavin.

We⁸ have previously argued that acridines such as proflavin act as mutagens because they add or delete a base or bases. The most striking evidence in favour of this is that mutants produced by acridines are seldom 'leaky'; they are almost always completely lacking in the function of the gene. Since our note was published, experimental data from two sources have been added to our previous evidence: (1) we have examined a set of 126 rII mutants made with acridine yellow; of these only 6 are leaky (typically about half the mutants made with base analogues are leaky); (2) Streisinger¹⁰ has found that whereas mutants of the lysozyme of phage T4 produced by base analogues are usually leaky, all lysozyme mutants produced by proflavin are negative, that is, the function is completely lacking.

If an acridine mutant is produced by, say, adding a base, it should revert to 'wild-type' by deleting a base. Our work on revertants of FC 0 shows that it usually reverts not by reversing the original mutation but by producing a second mutation at a nearby point on the genetic map, that is, by a 'suppressor' in the same gene. In one case (or possibly two cases) it may have reverted back to true wild, but in at least 18 other cases the 'wild type' produced was really a double mutant with a 'wild' phenotype. Other workers¹¹ have found a similar phenomenon with rII mutants, and Jinks¹² has made a detailed analysis of suppressors in the hm gene.

Fig. 1. To show the difference between an overlapping code and a non-overlapping code. The short vertical lines represent the bases of the nucleic acid. The case illustrated is for a triplet code. [Panels: overlapping code; non-overlapping code; starting point marked on the nucleic acid.]

The genetic map of these 18 suppressors of FC 0 is shown in Fig. 2, line a. It will be seen that they all fall in the B1 segment of the gene, though not all of them are very close to FC 0. They scatter over a region about, say, one-tenth the size of the B cistron. Not all are at different sites. We have found eight sites in all, but most of them fall into or near two close clusters of sites.

In all cases the suppressor was a non-leaky r. That is, it gave an r plaque on B and would not grow on K. This is the phenotype shown by a complete deletion of the gene, and shows that the function is lacking. The only possible exception was one case where the suppressor appeared to back-mutate so fast that we could not study it.

Each suppressor, as we have said, fails to grow on K. Reversion of each can therefore be studied by the same procedure used for FC 0. In a few cases these mutants apparently revert to the original wild-type, but usually they revert by forming a double mutant. Fig. 2, lines b-g, shows the mutants produced as suppressors of these suppressors. Again all these new suppressors are non-leaky r mutants, and all map within the B1 segment except for one site in the B2 segment.

Once again we have repeated the process on two of the new suppressors, with the same general results, as shown in Fig. 2, lines i and j.

All these mutants, except the original FC 0, occurred spontaneously. We have, however, produced one set (as suppressors of FC 7) using acridine yellow as a mutagen. The spectrum of suppressors we get (see Fig. 2, line h) is crudely similar to the spontaneous spectrum, and all the mutants are non-leaky r's. We have also tested a (small) selection of all our mutants and shown that their reversion-rates are increased by acridine yellow.
Thus in all we have about eighty independent r mutants, all suppressors of FC 0, or suppressors of suppressors, or suppressors of suppressors of suppressors. They all fall within a limited region of the gene and they are all non-leaky r mutants.

The double mutants (which contain a mutation plus its suppressor) which plate on K have a variety of plaque types on B. Some are indistinguishable from wild, some can be distinguished from wild with difficulty, while others are easily distinguishable and produce plaques rather like r.

We have checked in a few cases that the phenomenon is quite distinct from 'complementation', since the two mutants which separately are phenotypically r, and together are wild or pseudo-wild, must be put together in the same piece of genetic material. A simultaneous infection of K by the two mutants in separate viruses will not do.

Fig. 2. A tentative map (only very roughly to scale) of the left-hand end of the B cistron, showing the position of the FC family of mutants. The order of sites within the regions covered by brackets (at the top of the figure) is not known. Mutants in italics have only been located approximately. Each line represents the suppressors picked up from one mutant, namely, that marked on the line in bold figures. [Segments shown: B1a, B1b1, B1b2, B2.]

The Explanation in Outline

Our explanation of all these facts is based on the theory set out at the beginning of this article. Although we have no direct evidence that the B cistron produces a polypeptide chain (probably through an RNA intermediate), in what follows we shall assume this to be so. To fix ideas, we imagine that the string of nucleotide bases is read, triplet by triplet, from a starting point on the left of the B cistron.
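Reading 'triplet by triplet from a starting point on the left' can be sketched as follows. The sequence and the function name `read_frame` are hypothetical; the sketch only shows that displacing the start by one or two bases changes every triplet read, whereas a displacement by a whole triplet leaves the reading intact.

```python
def read_frame(seq: str, start: int = 0, n: int = 3) -> list[str]:
    """Split seq into consecutive groups of n bases beginning at `start`.

    Bases left over at the end that do not fill a whole group are ignored.
    """
    return [seq[i:i + n] for i in range(start, len(seq) - n + 1, n)]


wild = "ATGGCATTCGGA"  # hypothetical sequence, not the rII B cistron

print(read_frame(wild))     # ['ATG', 'GCA', 'TTC', 'GGA']  correct reading
print(read_frame(wild, 1))  # ['TGG', 'CAT', 'TCG']         first misreading
print(read_frame(wild, 2))  # ['GGC', 'ATT', 'CGG']         second misreading
print(read_frame(wild, 3))  # ['GCA', 'TTC', 'GGA']         same frame, one triplet later
```

Only the starting point modulo 3 matters, so any long sequence has exactly one correct reading and two incorrect ones.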
We now suppose that, for example, the mutant FC 0 was produced by the insertion of an additional base in the wild-type sequence. Then this addition of a base at the FC 0 site will mean that the reading of all the triplets to the right of FC 0 will be shifted along one base, and will therefore be incorrect. Thus the amino-acid sequence of the protein which the B cistron is presumed to produce will be completely altered from that point onwards. This explains why the function of the gene is lacking. To simplify the explanation, we now postulate that a suppressor of FC 0 (for example, FC 1) is formed by deleting a base. Thus when the FC 1 mutation is present by itself, all triplets to the right of FC 1 will be read incorrectly and thus the function will be absent. However, when both mutations are present in the same piece of DNA, as in the pseudo-wild double mutant FC (0 + 1), then although the reading of triplets between FC 0 and FC 1 will be altered, the original reading will be restored to the rest of the gene. This could explain why such double mutants do not always have a true wild phenotype but are often pseudo-wild, since on our theory a small length of their amino-acid sequence is different from that of the wild-type.

For convenience we have designated our original mutant FC 0 by the symbol + (this choice is a pure convention at this stage), which we have so far considered as the addition of a single base. The suppressors of FC 0 have therefore been designated −. The suppressors of these suppressors have in the same way been labelled +, and the suppressors of these last sets have again been labelled − (see Fig. 2).

Double Mutants

We can now ask: What is the character of any double mutant we like to form by putting together in the same gene any pair of mutants from our set of about eighty? Obviously, in some cases we already know the answer, since some combinations of a + with a − were formed in order to isolate the mutants. But, by definition, no pair consisting of one + with another + has been obtained in this way, and there are many combinations of + with − not so far tested.

Now our theory clearly predicts that all combinations of the type + with + (or − with −) should give an r phenotype and not plate on K. We have put together 14 such pairs of mutants in the cases listed in Table 1 and found this prediction confirmed.

Table 1. DOUBLE MUTANTS HAVING THE r PHENOTYPE
− with −: FC (1 + 23), FC (23 + 21)
+ with +: FC (0 + 40), FC (0 + 64), FC (0 + 65), FC (40 + 38), FC (40 + 64), FC (40 + 65), FC (40 + 67), FC (40 + 68)

At first sight one would expect that all combinations of the type (+ with −) would be wild or pseudo-wild, but the situation is a little more intricate than that, and must be considered more closely. This springs from the obvious fact that if the code is made of triplets, any long sequence of bases can be read correctly in one way, but incorrectly (by starting at the wrong point) in two different ways, depending whether the 'reading frame' is shifted one place to the right or one place to the left.

If we symbolize a shift, by one place, of the reading frame in one direction by → and in the opposite direction by ←, then we can establish the convention that our + is always at the head of the arrow, and our − at the tail. This is illustrated in Fig. 3.

Fig. 3. To show that our convention for arrows is consistent. The letters A, B and C each represent a different base of the nucleic acid. For simplicity a repeating sequence of bases, ABC, is shown. (This would code for a polypeptide for which every amino-acid was the same.) A triplet code is assumed. The dotted lines represent the imaginary 'reading frame' implying that the sequence is read in sets of three starting on the left. [Panels: the wild-type reading ABC ABC ...; a deletion followed by an addition, between which the triplets read BCA (→); an addition followed by a deletion, between which they read CAB (←).]

We must now ask: Why do our suppressors not extend over the whole of the gene? The simplest postulate to make is that the shift of the reading frame produces some triplets the reading of which is 'unacceptable'; for example, they may be 'nonsense', or stand for 'end the chain', or be unacceptable in some other way due to the complications of protein structure. This means that a suppressor of, say, FC 0 must be within a region such that no 'unacceptable' triplet is produced by the shift in the reading frame between FC 0 and its suppressor. But, clearly, since for any sequence there are two possible misreadings, we might expect that the 'unacceptable' triplets produced by a → shift would occur in different places on the map from those produced by a ← shift.

Examination of the spectra of suppressors (in each case putting in the arrows → or ←) suggests that while the → shift is acceptable anywhere within our region (though not outside it), the ← shift, starting from points near FC 0, is acceptable over only a more limited stretch. This is shown in Fig. 4. Somewhere in the left part of our region, between FC 0 or FC 9 and the FC 1 group, there must be one or more unacceptable triplets when a ← shift is made; similarly for the region to the right of the FC 21 cluster. Thus we predict that a combination of a + with a − will be wild or pseudo-wild if it involves a → shift, but that such pairs involving a ← shift will be phenotypically r if the arrow crosses one or more of the forbidden places, since then an unacceptable triplet will be produced.

Table 2.
DOUBLE MUTANTS OF THE TYPE (+ WITH −)
[Phenotype grid for the 28 pairings tested: + mutants, including FC 0, FC 38, FC 40, FC 41, FC 42 and FC 63, across the top; − mutants, including FC 21, FC 86, FC 87 and FC 88, down the side.]
W, wild or pseudo-wild phenotype; r, r phenotype. The wild or pseudo-wild combinations used to isolate the suppressors are marked separately.
* Double mutants formed with FC 58 (or with FC 34) give sharp plaques on K.

We have tested this prediction in the 28 cases shown in Table 2. We expected 19 of these to be wild, or pseudo-wild, and 9 of them to have the r phenotype. In all cases our prediction was correct. We regard this as a striking confirmation of our theory. It may be of interest that the theory was constructed before these particular experimental results were obtained.

Rigorous Statement of the Theory

So far we have spoken as if the evidence supported a triplet code, but this was simply for illustration. Exactly the same results would be obtained if the code operated with groups of, say, 5 bases. Moreover, our symbols + and − must not be taken to mean literally the addition or subtraction of a single base.

It is easy to see that our symbolism is more exactly as follows:

+ represents +m, modulo n
− represents −m, modulo n

where n (a positive integer) is the coding ratio (that is, the number of bases which code one amino-acid) and m is any integral number of bases, positive or negative.

It can also be seen that our choice of reading direction is arbitrary, and that the same results (to a first approximation) would be obtained in whichever direction the genetic material was read, that is, whether the starting point is on the right or the left of the gene, as conventionally drawn.

Triple Mutants and the Coding Ratio

The somewhat abstract description given above is necessary for generality, but fortunately we have convincing evidence that the coding ratio is in fact 3 or a multiple of 3.

This we have obtained by constructing triple mutants of the form (+ with + with +) or (− with − with −). One must be careful not to make shifts across the 'unacceptable' regions for the ← shifts, but these we can avoid by a proper choice of mutants.

Table 3. TRIPLE MUTANTS HAVING A WILD OR PSEUDO-WILD PHENOTYPE
FC (0 + 40 + 38), FC (0 + 40 + 68), FC (0 + 40 + 67), FC (0 + 40 + 64), FC (0 + 40 + 66), FC (1 + 21 + 23)

Fig. 4. A simplified version of the genetic map of Fig. 2. Each line corresponds to the suppressors from one mutant, here underlined. The arrows show the range over which suppressors have so far been found, the extreme mutants being named on the map. Arrows to the right are shown solid, arrows to the left dotted.

We have so far examined the six cases listed in Table 3 and in all cases the triples are wild or pseudo-wild.

The rather striking nature of this result can be seen by considering one of them, for example, the triple (FC 0 with FC 40 with FC 38). These three mutants are, by themselves, all of like type (+). We can say this not merely from the way in which they were obtained, but because each of them, when combined with our mutant FC 9 (−), gives the wild or pseudo-wild phenotype. However, either singly or together in pairs they have an r phenotype, and will not grow on K. That is, the function of the gene is absent. Nevertheless, the combination of all three in the same gene partly restores the function and produces a pseudo-wild phage which grows on K.

This is exactly what one would expect, in favourable cases, if the coding ratio were 3 or a multiple of 3.

Our ability to find the coding ratio thus depends on the fact that, in at least one of our composite mutants which are 'wild', at least one amino-acid must have been added to or deleted from the polypeptide chain without disturbing the function of the gene-product too greatly. This is a very fortunate situation.
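The frame-shift account of the double and triple mutants can be checked on a toy sequence. This is a sketch under stated assumptions: the 30-base wild-type sequence, the mutation positions and the inserted base 'X' are hypothetical, each + is modelled as the insertion of one base and each − as the deletion of one base (the simplest case, m = 1, of the modulo-n statement), and 'unacceptable' triplets are ignored. A mutant counts as restored when the last two triplets it reads are the wild-type ones.

```python
# Hypothetical model of frame shifts under a triplet code read from a fixed start.
# Sequence, positions and the extra base 'X' are illustrative, not real rII data.

def read_frame(seq: str, n: int = 3) -> list[str]:
    """Consecutive groups of n bases from the fixed start (position 0)."""
    return [seq[i:i + n] for i in range(0, len(seq) - n + 1, n)]


def apply_mutations(seq: str, mutations: list[tuple[int, str]]) -> str:
    """Apply '+' (insert one base) or '-' (delete one base) at wild-type positions.

    Mutations are applied from right to left so that each position still
    refers to the wild-type coordinate when it is used.
    """
    out = seq
    for pos, kind in sorted(mutations, reverse=True):
        if kind == "+":
            out = out[:pos] + "X" + out[pos:]  # 'X' stands for an arbitrary extra base
        elif kind == "-":
            out = out[:pos] + out[pos + 1:]
        else:
            raise ValueError(f"unknown mutation kind: {kind!r}")
    return out


def tail_restored(wild: str, mutant: str, k: int = 2) -> bool:
    """True if the last k triplets of the mutant are read as in the wild type."""
    return read_frame(wild)[-k:] == read_frame(mutant)[-k:]


def net_shift(kinds: list[str], n: int = 3) -> int:
    """Net displacement of the reading frame, modulo the coding ratio n."""
    return sum(1 if k == "+" else -1 for k in kinds) % n


WILD = "ATGGCATTCGGAAAGTGCCTTACGGATCCA"  # hypothetical 30-base stretch

cases = {
    "+": [(4, "+")],
    "+ with -": [(4, "+"), (13, "-")],
    "+ with +": [(4, "+"), (13, "+")],
    "+ with + with +": [(4, "+"), (13, "+"), (22, "+")],
    "- with - with -": [(4, "-"), (13, "-"), (22, "-")],
}

for name, muts in cases.items():
    mutant = apply_mutations(WILD, muts)
    shift = net_shift([kind for _, kind in muts])
    print(f"{name:18} net shift {shift}  tail restored: {tail_restored(WILD, mutant)}")
```

In this toy the downstream reading is restored exactly when the net shift is 0 modulo 3: the single mutant and the (+ with +) pair stay out of frame, while (+ with −) and both triples come back into frame, the triple carrying one extra or one missing triplet between the outermost mutations.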
The fact that we can make these changes and can study so large a region probably comes about because this part of the protein is not essential for its function. That this is so has already been suggested by Champe and Benzer¹³ in their work on complementation in the rII region. By a special test (combined infection on K, followed by plating on B) it is possible to examine the function of the A cistron and the B cistron separately. A particular deletion, 1589 (see Fig. 5), covers the right-hand end of the A cistron and part of the left-hand end of the B cistron. Although 1589 abolishes the A function, they showed that it allows the B function to be expressed to a considerable extent. The region of the B cistron deleted by 1589 is that into which all our FC mutants fall.

Joining two Genes Together

We have used this deletion to reinforce our idea that the sequence is read in groups from a fixed starting point. Normally, an alteration confined to the A cistron (be it a deletion, an acridine mutant, or any other mutant) does not prevent the expression of the B cistron. Conversely, no alteration within the B cistron prevents the function of the A cistron. This implies that there may be a region between the two cistrons which separates them and allows their functions to be expressed individually.