Zkneral Model for the Chromosomes of Higher Organisms FRANCIS CRICK MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH The model suggests that chromosomal DNA falls into two classes : globular DNA (containing unpaired regions for control) and a much smaller fraction consisting of fibrous DNA which alone codes for proteins. 1 WISH to propose a general model for the structure of the f:hromosomes of higher organisms*. This model is derived .iom ideas and data from many sourcest. Because I have found it impossible to set out my ideas and the supporting evidence in a short space, I merely s ummarize here my conclu- sions. A much fuller account is in preparation and will be submitted for publication in the near future. The model assumes that the DNA in a chromatid is a very long mononeme (see the review by Prescott' and the recent careful work by Laird"), which probably runs continuously from one end of the chromatid to the other. I * I have used the term higher organisms rather than eukaryotes b:cause I want to avoid having to discuss, at this stage, thechromo- ronnes of various lower eukaryotes such as the dinoflagellates and the fungi. + Proper acknowledgments will be given in the fuller paper, but I cannot refrain from mentioning here the very stimulating the+ reticsl paper by R. J. Britten and E. H. DavidsonI! which the reader is stronciy recommended to read in parallel with this one. It contains cxtenmTe references to the earlier literature. The model has three basic features. (1) The coding sequences of the DNA (that is, all those sequences which code.for poly- peptide chains) are postulated to be mainly, if not entirely, in the interbands (as visualized in the giant polytene chromo- somes of the Diptera4). The bands, which contain all but a few per cent of the DNA, are identtied as the control elements. This is illustrated di agrammatically in Fig. 1. A genetic com- plementation group is usually contained in a band plus an interband Thus on this view most of the DNA in higher organisms does not code for proteins but is used for control purposes, as al- ready suggested by F. Vogel6 in 1964. I have calculated that the averages amount of DNA (per mononeme) in an interband of Drosophila'*' is enough to code for an "average" protein of molecular weight 34000-40,000. I have, therefore, adopted tbis speculation as a good working hypothesis. (2) The central idea is that the recognition sites, needed for control purposes in higher organisms, are mainly unpaired single stranded stretches of. double stranded DNA. I call this the Unpairing Postulate and it. is set out diagramma- tically in Fig. 2. It has been &rived from a theoretical con- sideration of the general nature of protein molecules and the probable variety and length of the base sequences which need to be recognized in higher organisms. A particular type of example of this postulate has already been put forward by Gierer9. The argument I give is a general one. It springs from the 1 The average amount of DNA in the mononeme of an interband had been estimated previously by Beermann4. 26 NATURE VOL 234 NOVEMBER 5 11/l control elements) are so large compared with the interbands (the coding elements). In the tirst place, the type of structural motif outlined in Fig. 3 cannot be constructed on too small a scale. Moreover, as B&ten and Davidson' have pointed out in their Fig. IA, multiple control elements may well be needed adjacent to each particular coding sequence. In addi. tion, a set of similar elements may be required within certain bands to help provide a graded response. Finally, it appears to be a general rule that intricate three-dimensional biological structures are always bigger than one might naively expect. The examples of globular proteins, transfer RNA and ribosomes spring to mind. My fuller paper will discuss possible mechanisms for the formation during evolution of these large control regions. It would seem likely that both tandem repetition and transloca- tion will be involved. Whatever the origin of tandemly repli- cated sequences (satellite DNAr3 or otherwise), when they are tirst formed there will be an exact repeat of the sequences both in the paired region (such as the stem of Fig. 3) and in the regions which become unpaired (such as the loop of Fig. 3). However, during evolution, mutations will accumulate at different rates in these different regions. The paired regions will diverge rapidly, since the exact base sequences there do not matter appreciably, but changes in the unpaired regions will occur less rapidly, if at all, because they have to interact with other molecules during control operations. Thus, newly evolved bands (or parts of bands) would be expected to have a closer degree of tandem repetition of their base sequences than phylogenetically older ones. The model, in its simplest form, suggests that the number of different proteins normally produced by higher organisms may not be much more than a few thousand for Drosophila and FIBROUS CODING DNA ' GLOBULAR ' CONTROL DNA Fig. 1 An extremely schematic drawing of the proposed general structure of the DNA of the chromatid. The line represents part of the continuous DNA molecule in the mononemic chromatid. The straight portions correspond to the interband regions of the giant polytene chromosomes of the Diptera, which are postulated to be similar in their general character to the corresponding interphase chromatids, which are the active form. The mitotic chromosome is relatively inert". The DNA sequences coding for protein are postulated to be mainly, if not entirely, in these extended regions. For convenience this DNA is referred to as fibrous DNA. The intricately folded regions correspond to the bands seen in the polytene chromosome**`. No attempt has been made to represent their detailed structure. They are postulated to be the sites of the control regions. The model implies that a genetic complementation group is usually contained in either an interband plus a band or an interband plus part of the bands on either side. When a gene is active the bands are probably at least partly unfolded*. The globular DNA is certainly complexed with chromosomal proteinslo, the fibrous DNA probably so. Thus both should be more strictly referred to as nucleoprotein. fact that for all proteins whose tertiary structure is known, the active site is, to a first approximation, a shallow groove or a cavity and not a protruding piece of the protein structure. The former structures can rather easily be made to provide highly specific interactions, both with small molecules or with extended polymers. However, in double-helical nucleic acid the specific groups on the bases are not protruding but are themselves in one or other of the two grooves formed by the phosphate-sugar backbones. Thus, it might not be easy to design a protein of reasonable size to recognize more than a limited number of base pairs, especially when one remembers the twisted nature of the normal double helix. The postulate has even more force if single stranded RNA is also used to recognize the control sequences on the DNA, as favoured (with reservations) by Britten and Davidson'. My argument is that when very large numbers of different sequences need to be recognized (which implies that the se- quences must not be too short), it will pay to unwind the double helix before recognition. This may be expensive to arrange but in the long run it will provide a much greater abundance of versatility. (3) The forces and energy needed to unpair the recognition stretches of the DNA are provided by the combination of the DNA with chromosomal proteins-probably the histones'O. Although the three-dimensional configuration of a band may be very intricate, such globular structures may be based on structural motifs of various kinds. The most obvious one is a simple supercoiled DNA, that is, a helical double helix, as already suggested by various workers"J', but other more complicated structures are possible". One such example is illustrated in Fig. 3. As far as I know this suggestion is a novel one. The general property of this family of structures is that there are lengths (probably of hundreds of base pairs) of DNA whose exact base sequence matters very little, interspersed with shorter stretches (perhaps of a hundred bases or so) of specific sequences which are probably repeated at very many places along the DNA. On this view the role of the histones is not merely to cover up the DNA but to help the DNA to expose itself in the right places. Plausible general reasons can be given why the bands (the n 4 & PROTEIN 7 R \ \ \ \ `4 UNPAIRED PAIRED- NUCLEIC ACID G"."""""""`..~ NUCLEIC ACID .Jg Fig. 2 Bach line represents the possibility of a highly specific interaction between two macromolecules. One molecule is from the class at one end of the double arrow and the other is from the class at the other end. Notice that no distinction is made between RNA and DNA. Instead, a strong distinction is made between paired nucleic acid (meaning a stretch of double helix) and unpaired nucleic acid which can be either single stranded nucleic acid or unwound stretches of an originally double helical structure. The latter may or may not be refolded to some extent into three-dimensional structures having a complex mixture of paired and unpaired regions9 such as is found in transfer RNA. The dotted line represents the formation of a triple helix, such as poly A+2.poly U. The dashed line represents the recognition by a specific protein molecule of a particular base squence of base-paired double stranded nucleic acid. The solid lines represent interactions of abundant versatility. The very thick iii emphasizes that for any sequence of the usual monomers (base sequence) there is always a complementary base sequence. The d&am doea not deal with relatively unspecific interactions, such as the interaction between a particular protein and a DNA backbone independent of the latter's base sequence. The .Unpairlng Postulate states that the interactions used for control in higher organisms will mainly be chosen from those shown by solid lines in the figure. This implies that double helical DNA will usually have to be unwound at the recognition sites. :JATURE VOL. 234 NOVEMBER 5 1.971 SINGLE sTRDaNNAoED DOUBLE sTRDaNNAoED A-! \ , -A' -`:,y. 3 An example of a possible structural motif within the ,. ,,bular DNA: a twisted hairpin constructed from part of a s&e length of double stranded DNA. The loop itself has become unpaired due to the untwisting effect produced by the stem. The DNA in the stem remains double stranded and forms a double helical double helix, stabilized by chromosomal pro- teins, probably mainly by histones. The figure is highly schematic and the details should not be taken too literally. For example, one type of histone (either as a monomer or a dimer) may bridge two adjacent helices in the sort of way shown by the dashed line. Another type of histone may do the same by lying horizontally, thus bridging strands on opposite sides of the axis of the structure. The loops themselves may or may not be stabilized in special contigurations by chromosomal proteins or by folding `..`c:k on themseives. Since the histones interact mainly with the L.A:kbonelo of the DNA, with little respect for basesequence, the actual base sequence of the lengths of DNA in the stem is not important, at ieast to a first approximation. The loops are postulated to be the sites of the actual control elements. Any particular loop may have an unwinding sequence (to help localize the unwinding-possibly a sequence of A's on one strand), a promoter sequence for the RNA polymerase, or an operator sequence (whether for positive or negative control is left open) or, more likely, some combination of these sequences. The base sequences for the first two functions may be much the same in very many different loops. The operator sequences may or may not be repeated in various other loops, but in general such sequences are likely to be repeated rather less often than the !: her two types postulated. An occasional loop may contain an I::itiator sequence for DNA replication. Since the normal DNA double heI& is right handed, ihe superhelix is more likely to be left handed (as shown in the figure) for mechanical reasons. I thank Drs Graeme Mitchison and David Baillie for instructing me on this point. Strictly speaking, either hand is possible for the superhelix if the histones can distort the basic twist of the DNA double helix in the aoorooriate way. If the second tvne of histone postulated above inter&s witti its neighbours [above and below, near the axis of the structure), this interaction could itself impose one particular hand of twist, since protein molecules are intrinsically handed. The structure shown is only one of a family of similar structures. The simple helical double helix is the Erst member of the family. Another example could be con- :ructed by intertwining a pair of hairpins (of double helical DNA) to form a stem having a quadruply helical double helical structure, with four single stranded stretches in the loop region. Whether the sinele strands within a 100~ reeion can interact. by complementa;y base pairing,. with sir&lar &retches in other loops in the same "band" of a smgle chromatid (or also. in the second structure mentioned. within the same looa reeio& is an open question. I expect this to happen in the highly repetitive "satellite" sequences" found in the heterocbromatin near the centromeresl"pls. In the euchromatin the answer must. in uart. depend on the relative location of different hairpins within the "band" of one chromatid. This is not easy to decide. althounh s'arious attractive models are oossible. fF& examnle.`a double- %ded structure might be fbrmed. roighly de&&d by the geration of a dyad axis along the line AA' in the figure.) Be that ..s it may, the single stranded regions are postulated td mediate the highly specific lining up of the bands of the polytene chromosome' by forming complementary base-pairs between adjacent "chromatids" in the same band. A similar interaction may perhaps take place in meiosis. some tens of thousands for man. It is not, however, completely excluded that, in special cases, multiple coding sequences may be hidden within some of the globular bands, in which case the numbers could be higher. The amount of these special bands, if they exist, might differ markedly in different kinds of higher organ&& The model, which is logically coherent, appears to me to be compatible with a very large amount of experimental data obtained using very different techniques. These include rough estimates from genetic dataI of the number of "genes" in Drosophila and man, the correlation between the number of bands plus-interbands and gedetic complementation groups", the specific pairing between the bands of the giant polytene chromosomes' shown by the study of inversions and so on, the naturei3-ls and general effects of the heterochromatin'8, the large amount of data on nucleic acid hybridization'pz9 and the formation of circles by the technique of Thomas and his colIeagu&J. It can also be ac&mmodated to the data on the "p&g" of polytene chromosomes4, the general nature of the hetero- geneous rapidly turning over nu&ar RNAzl --04 and the appa- rent absence of polycistronic messengers in higher organisms. It is not incompatible with the very scanty X-ray studies of chr~matid~~~~ and the electron microscope pictnres and meas- urementP. Although the model is speculative and not fully detailed, and raises at least as many questions as it attempts to answer, I hope it may serve as a focus for discussion and for the design of experiments. In addition to thank@ many colleagues for their patience in explaining their results and for discussing these problems with me, especially Sydney Brenuer, Leslie Orgei. Peter Walker (see his remarks reported in ref. 26), Gordy Tomkins and David Baillie, I thank especiaIly the organizers of the specialists' meeting in May of this year at Port Crosz6 for inviting me to attend. Note aaI&d in proo/: On the amount of informational DNA in higher orga@sms see T. Ohta and M. Khnura, Nature, .233,118 (1971). . Received September 6, 1971. ' Britten; R. J., and Davidson E. H Science 165 349 (1969). 2 Prescott, D. M., in Aa'vanees B Ceil hology, l'(edit: by Prescott, D. M., Goldstein, L., and McConlcey, E.), 57 (North Holland, Amsterdam, 1970). ' Laird, C. D., Chromosoma, 32,378 (1971). * Beermann, W., in Cell Di@kntiation and Morphogenesis, 24 (North-Holland, Amsterdam. 1966). ' King, D. W., and Bamhisel, M. L., J. Cell BfoL, 33,265 (1967). 6 Vogel, F., Narure, 201, 847 (1964). ' Lindsley, D. L., and Grell, E. H., Genetic Vbrktion in Drosophila melanogaster (Carnegie Inst. Washington, Publ. 627, 1968). ' Maclnnes, J. W., and Uretz, R. B., &fence, 151, 689 (1966). 9 Gierer, A., Nuwe, 212,148O (1966). lo H&tones and Nacleohiprones (edit. by Philips, D. M. P.) (Plenum, New York and London, 1971). I1 DuPraw, E. J., 9NA and Chrnommec (Holt, Rineha&nd Win- ston, 1970). I2 Pardon, J. F., Wilkins, M. H. F., and Richards, B. M., Nature, I3 215,509 (1967). Southern, E. M., Narurc. 227,794 (1970). I4 Jones, K. W., Nature, 225.912 (1970). I9 Pardue, M. L., and Gall, H. G., Science, 168, 1356 (1970). i6 Muller, H. J., in Heritage from Men&l (edit. by Brink, R. A.), 419 (Umversity of Wiinsin Press, Madison, .1967). I7 Shannon, M. P., Kaufmsn, T. C., and Judd, B. H., Generics, 64, ~58 (1970). I6 Baker, W. K., Adv. Gmctics, 14,133 (1968). l9 Sutton, W. D., and McCullum, M., Nature, 232,83 (1971). po Thomas, jun., C. A., Hamkalo, J. MoLBiol., 51, 621 (1970). B.A., Misra, D. N., and Lee, C. S.. l1 Schenzr. K., Spohr, G.. Grsnboulan, N., Morel, C., Grosclaude, J.. and Chad, C., Cold Spring l&arbor Symp. Quant. Btol., 35,539 (1970) and other papers in the same volume. " Georgiev, G. P:, 3. l&wet. Bioi., 25,473 (1969). " Ryskov, A. P.. and Georgiev, G. P., Febs Ikrrers, 8, 186 (1970). `* Coutelle, C., Ryslcov, A. P.,. and Georgiev, G. P., Febs. Letters, as dk? %g7ohd Nkohieff k J` Mol. B&l., 7, 142 (1963). l* Nature `Ne$ Biology, 231,68'(19!1): \