X-RAY ANALYSIS AND PROTEIN STRUCTURE

By F. H. C. CRICK and J. C. KENDREW

The Medical Research Council Unit for the Study of the Molecular Structure of Biological Systems, Cavendish Laboratory, Cambridge, England

CONTENTS

I. Introduction
II. The Nature of X-Ray Diffraction
   1. Diffraction from a Crystal
      a. Crystal Lattice and Reciprocal Lattice
      b. Structure Determination
      c. Symmetry
   2. Diffraction from a Fiber
      a. The Interpretation of Fiber Diagrams
   3. Stereochemistry and Molecular Packing
   4. General Remarks
   5. Summary
III. Fibrous Proteins and Synthetic Polypeptides
   1. Synthetic Polypeptides (and Silk)
      a. α-Polypeptides
      b. β-Polypeptides and Silk
      c. Polyproline
      d. Polyglycine
   2. Fibrous Proteins
      a. The α-Keratin Pattern
      b. The β-Keratin Pattern
      c. Collagen
IV. Crystalline Proteins
   1. The Nature of Protein Crystals
   2. Direct Information
      a. Unit Cell and Space Group
      b. Molecular Weight
      c. Identification and Identity
      d. The Shape of Protein Molecules
   3. The Patterson Function
   4. Methods Involving Heavy Atoms
      a. The Heavy Atom Method
      b. The Method of Isomorphous Replacement
      c. Isomorphous Replacement and the Structure of Hemoglobin
      d. Isomorphous Replacement and the Structure of Myoglobin
      e. Isomorphous Replacement and the Structure of Ribonuclease
      f. Requirements for Isomorphous Replacement
   5. The Chain Configuration of Globular Proteins
V. Viruses
   1. Rod-shaped Viruses
      a. General Features of TMV
      b. X-Ray Results: Basic Features
      c. X-Ray Results: the Internal Structure
      d. Correlation between X-Ray and Other Results
   2. Spherical Viruses
      a. Tomato Bushy Stunt Virus
      b. Turnip Yellow Virus
   3. General Principles of Virus Structure
References

I. INTRODUCTION

The last account of X-ray studies of proteins in Advances in Protein Chemistry was written by Fankuchen in 1945 (apart from Corey's article on Amino Acids and Peptides, in 1948). Dramatic progress has been made since then, particularly in studies of synthetic polypeptides and fibrous proteins; and although the goal of the protein crystallographer (the complete determination of the structure of a globular protein) remains unachieved, there have been considerable advances in method which should shortly pay dividends.

In this review we shall not try to cover the whole field but only the more successful and the more pregnant parts of it. One major omission is a description of the large-scale structure found in fibrous proteins such as collagen, the muscle proteins, etc. We have not discussed in detail results from other techniques such as infrared, optical rotation, etc., although we refer briefly to the results of such studies. The comprehensive articles by Low (1953) and by Kendrew (1954b) in The Proteins should be consulted for background material and for historical details, and for a summary of the most recent advances, a recent review by Kendrew and Perutz (1957).

We shall assume that the reader is familiar with proteins, but unfamiliar with crystallography. We shall not, therefore, set out crystallographic arguments in detail, but shall try rather to present a bird's eye view of the subject. This will enable the biochemist at least to catch the drift of crystallographic discussions; and also to gain some impression of which parts of the subject are speculative and which parts certain. Besides, protein crystallographers need help from biochemists; it will be part of our purpose to indicate where help is most needed.

We have called our review "X-Ray Analysis and Protein Structure."
In last year's Advances in Protein Chemistry an excellent article appeared by Anfinsen and Redfield (1956) entitled "Protein Structure in Relation to Function and Biosynthesis." Here we shall use the term "protein structure" in an entirely different sense; indeed, there is very little common ground between the two articles. Whereas Anfinsen and Redfield have concerned themselves with the amino acid sequence and topological interconnections of the polypeptide chains, we shall consider mainly the geometrical aspects: the arrangement of the atoms in space. As the complexity of molecules increases, the geometrical aspects of structure become more and more important, and it is less and less possible to explain chemical behavior without taking into account dimensions and exact geometrical relationships. The strength of the X-ray approach to protein structure is that it alone among all the techniques available can hope to provide this precise quantitative information about molecules as complicated and as delicate as proteins. However, it is almost always an advantage in obtaining the geometrical structure of a molecule to know as much as possible about its chemistry; in particular a knowledge of the amino acid sequence of a protein should be a very considerable, if not indispensable, help to the crystallographer. It is likely to be a very long time before X-ray analysis can obtain by itself the amino acid sequence of a protein. Both methods, amino acid sequence determination and X-ray diffraction, will be necessary to obtain the complete structure, chemical and geometrical, of a protein.

II. THE NATURE OF X-RAY DIFFRACTION

X-ray diffraction is not a difficult branch of physics: on the contrary, it is easy to the point of tediousness. The widespread view that it is unintelligible has arisen because a certain intellectual effort is needed to grasp its mathematical foundations, and because it is supposed, incorrectly, that some special
type of "three-dimensional imagination" is a prerequisite for understanding its methods and results. In this section we shall not attempt to expound the basic theory, but rather to characterize some of the broad features of X-ray diffraction, so that the reader may become familiar with the important concepts and with some of the jargon. To those who wish to build on a firmer foundation we recommend the more orthodox approach in such accounts as those of Bragg (1939), James (1950), Bunn, and Robertson.

... distance behind the crystal. What does the picture on the plate look like? A rather nice example is given in Fig. 1. (To get this picture to look so pretty there had to be a good deal of "hokey-pokey": complicated movements of the crystal and of the photographic plate, as well as a special screen.) The reader will at once be struck by two features of the picture: its regularity and its symmetry. He should not be surprised by them, however, because regularity and symmetry are among the most important properties of crystals themselves. A crystal consists of a regular three-dimensional lattice of "unit cells," such that every unit cell has the same relation to its neighbors; and the contents of every unit cell are the same, namely a number of identical molecules related to one another by symmetry elements, such as rotation or screw axes or planes of symmetry.

In an analogous manner, the X-ray picture exhibits regularity, which lies in the positions of the spots. They form a regular two-dimensional array or lattice. That is to say, the distance between a spot and its neighbors is the same no matter where on the picture the spot may be.
The only exceptions are those cases where the spot is too weak to be shown on the photograph; and this draws our attention to two other features of the picture. First, while the positions of the spots are regular, their blacknesses (their intensities, that is) differ: some are strong, some weak. Second, the pattern exhibits symmetry; that is to say, it can be divided into four quarters which are identical. Other X-ray photographs of this sort might have shown different types of symmetry, but some there would always be.

What, then, is the significance of each particular spot at such and such a position on the photograph and of such and such a blackness? What, in fact, is it that scatters the X-rays? To the last question we can give a simple answer: it is electrons which scatter X-rays, in this case the electrons of the atoms in the crystal. Atoms have finite sizes and electrons distribute themselves over atoms and the bonds which join them. It is convenient to think of a crystal as a three-dimensional pattern of electron density which reaches high values near the centers of atoms and low or zero values in the spaces between. It can be shown that the X-ray picture represents a "wave analysis" (sometimes called Fourier analysis) of this electron density.

When we make a wave analysis of a crystal we think of it as made up of a very large number of waves of electron density, running in many different directions through it. If we have carried out the analysis correctly, we shall find that when we add together all these waves, each of the correct size (amplitude) and to the right extent in or out of step with its neighbors (phase), we get back to the actual electron density of the crystal. This is a three-dimensional wave analysis, often known as a Fourier analysis, and the reverse process is a wave (or Fourier) synthesis. We are familiar with analogous processes involving only one dimension.
In music, a harmonic analysis of the profile of sound produced by, say, a violin playing a steady note, gives us a fundamental and a series of harmonics, each separately being a simple or sinusoidal wave, and all of them adding up (synthesizing) to re-form the original profile.

The significance of a particular X-ray spot is that it corresponds to one of these (imaginary) sinusoidal waves of electron density. The position of the spot on the picture shows us both the direction of the wave and its wavelength. If the spot is near the center of the picture, the corresponding wave of electron density is one having a large wavelength. If it is far from the center it corresponds to a wave of short wavelength. Thus the outer parts of X-ray pictures are concerned with fine details of structure (high resolution), the inner parts with broad features (low resolution). The direction of the spot, relative to the center of the picture, shows the direction of the electron density wave. Thus a spot vertically above the center corresponds to a wave of electron density in the crystal whose direction is vertical, i.e., to horizontal layers of high electron density separated by regions of low electron density. The intensity of the spot is related to the amplitude of the wave (in fact the square of the amplitude is proportional to the intensity of blackening of the plate), and an intense spot implies that this particular electron density wave must be of large amplitude for the structure in question.

a. Crystal Lattice and Reciprocal Lattice. In this section we shall explain the concept of the "reciprocal lattice," which is nothing more than an abstract way of representing the diffraction pattern. A crystal is a three-dimensional structure, but the picture in Fig. 1 is clearly a two-dimensional affair; and we must now confess that it contains not the whole diffraction pattern of a crystal but only part of it.
The reader is to imagine that the complete pattern is a three-dimensional lattice of X-ray spots, of which Fig. 1 is just one particular plane, actually a plane going through the origin. This three-dimensional lattice is known as the reciprocal lattice of the crystal, and it is important to have a general picture of its properties. X-ray cameras are merely devices which allow a part of the three-dimensional reciprocal lattice to be recorded on a two-dimensional photographic plate in a systematic way so that spots can readily be identified ("indexed"). Different types of cameras may therefore present the same array of reflections arranged in different ways; but whichever way one takes the picture there is one characteristic of an X-ray spot which can always be obtained directly from it. This is its "spacing"; that is to say, the wavelength of the imaginary wave of electron density ("Fourier component") to which it corresponds.

As we have already pointed out, the larger the distance of the spot from the center of the picture (that is, from the origin of the reciprocal lattice), the smaller the "spacing"; hence the word reciprocal. And the stronger a given spot, the larger the amplitude of the corresponding Fourier component; in other words, the larger the number of electrons (and therefore of atoms) clustered near the anti-nodes of the wave.

FIG. 1. A typical X-ray photograph of a protein crystal. Notice that the spots form a regular two-dimensional lattice; note also the symmetry. The picture shows only a small part of the complete X-ray diffraction pattern of the crystal. (After Kendrew and Kraut: finback whale myoglobin, type F, c projection.)

For example, a strong spot above the origin (meridional) with a spacing of 1.5 A.
implies that in the crystal there are certain horizontal parallel planes 1.5 A. apart, near which many atoms cluster.

Finally, from the regular dimensions of the reciprocal lattice one can directly calculate the dimensions of the repeating unit, or unit cell, of the crystal; and because of the reciprocal relationship, the smaller the unit cell the greater the distance apart of the spots in the reciprocal lattice. To recapitulate our musical analogy, the unit cell dimensions are the three-dimensional analog of the wavelength of the "fundamental tone" in a musical sound.

b. Structure Determination. The relationships between the real lattice (that is, the real three-dimensional crystal or, rather, its three-dimensional repeating pattern of electron density) and the reciprocal lattice (or X-ray pattern) are very intimate ones. Given the position of all the atoms in the unit cell of a crystal it is a straightforward, if sometimes lengthy, matter to calculate the entire diffraction pattern of the crystal. This is interesting, but not often useful. It is the reverse process, given the diffraction pattern to discover the structure, which one more often has to contend with. Unfortunately it is by no means so simple to carry out. There is, in fact, a fundamental reason why one cannot calculate the unknown structure from the experimental data merely by the use of some mathematical sausage machine such as a high-speed computer. In order to combine correctly all the (imaginary) waves of electron density which build up the correct structure it is necessary to know not only the amplitude of each of them (which one obtains from the blackness of the corresponding spot in the picture) but also its phase, that is to say, how far each train of waves is out of step with its neighbors; and this information is not given by the experimental data. In other words, the experimental data contain just half the required information.
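The loss of the phases can be illustrated in one dimension. The following is only a sketch under stated assumptions: the 16-point "unit cell" and its two point atoms are invented for illustration, and the function names `wave_analysis` and `wave_synthesis` are hypothetical, chosen to echo the terms used above. It resolves a density into waves, then rebuilds it twice, once with the true phases and once with every phase set to zero.

```python
import cmath
import math


def wave_analysis(density):
    """Fourier coefficients F[h] of one period of a sampled 1-D density."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(-2j * math.pi * h * x / n) for x in range(n)) / n
        for h in range(n)
    ]


def wave_synthesis(amplitudes, phases):
    """Add up waves of the given amplitudes and phases to rebuild a density."""
    n = len(amplitudes)
    return [
        sum(
            amplitudes[h] * cmath.exp(1j * (phases[h] + 2 * math.pi * h * x / n))
            for h in range(n)
        ).real
        for x in range(n)
    ]


# Hypothetical one-dimensional unit cell: two point atoms of different weight.
density = [0.0] * 16
density[3] = 6.0   # heavier atom
density[10] = 1.0  # lighter atom

coeffs = wave_analysis(density)
amplitudes = [abs(c) for c in coeffs]      # recoverable: intensity is amplitude squared
phases = [cmath.phase(c) for c in coeffs]  # not recorded on the photograph

with_phases = wave_synthesis(amplitudes, phases)
without_phases = wave_synthesis(amplitudes, [0.0] * len(amplitudes))

print([round(v, 3) for v in with_phases])     # reproduces the two atoms
print([round(v, 3) for v in without_phases])  # a different pattern, peaked at the origin
```

With the true phases the two atoms reappear at their original positions; with the phases discarded the same amplitudes pile up at the origin instead, which is one way of seeing why the amplitudes alone do not determine the structure.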
One has, therefore, the curious situation that, if the structure can be correctly guessed one can check it against the X-ray data in a straightforward way; but the structure cannot be deduced from the data in a routine manner except in certain special, and very simple, cases. There are, however, various stratagems which allow one sometimes to make a rather good guess at the structure, especially if one knows something about it before one starts; but guesswork is always involved, and it is this which makes crystallography something of an art. The pursuit of a structure is rather like hunting: it requires some skill, a knowledge of the victim's habits, and a certain amount of low cunning.

A number of the stratagems useful for solving structures will be mentioned later. Often the most useful is sheer intuition, based on experience and on what the chemists have already discovered about the formula of the molecule. For structures of moderate complexity the most powerful is probably the Patterson synthesis, whose properties and applications in the biological field have been fully described by one of us (Kendrew and Perutz, 1949; J

... displaced up or down relative to one another in a random way. A fiber disordered in this way still shows the crystallographic repeat in the fiber direction, and the X-ray scattering produces layer-lines, or horizontal streaks, on the photograph. But there are no discrete spots on the layers (with one exception); instead there is a continuous variation of scattered intensity along each line. Surprisingly enough, in some circumstances this may be an advantage rather than the reverse, providing even more information than a photograph consisting of discrete spots. An example of such a photograph is shown in Fig. 1.

a. The Interpretation of Fiber Diagrams.
X-ray pictures of fibers are often too poor to allow one to use the methods of analysis customary for single crystals: in particular one usually cannot hope to "see" the atoms even when the correct structure has been discovered. The interpretation of fiber diagrams is a special art, therefore. The method of attack is to try to deduce the symmetry of the fiber molecule from the X-ray picture; then to build scale models having this symmetry; and finally to show that only one of these models will fit all the available data, X-ray or other. Thus unless one knows in advance the chemical formula, or at least its most important features, the problem is almost hopeless. In addition, information derived by other techniques such as measurements of infrared dichroism is often invaluable.

The symmetry of a fiber molecule is almost always a screw axis. The reasons for this are explained in the next section, where it is pointed out that there is no reason why a single fiber molecule should not have a nonintegral screw axis. For example, 3.6 monomer residues per turn is 18 residues in 5 turns. Such a structure will have a "true repeat" after 18 residues, but this is not a very fundamental characteristic of it, since a very small twist of the molecule would give a different "true repeat" or even no true repeat at all. (Thus in our example a twist from 3.60 to 3.61 residues per turn would lead to a repeat of 65 residues in 18 turns.)

It might be thought that the absence of a short repeat distance would make the problem impossibly difficult to solve, but fortunately helical symmetry often produces striking effects in the photograph which immediately reveal its existence even when the screw axis is nonintegral.
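The arithmetic of the "true repeat" quoted above can be reproduced with exact fractions. This is a minimal sketch, not part of the original argument: the function name `true_repeat` and the cap of 20 turns are assumptions made for illustration. It finds the whole-number ratio of residues to turns, with at most the stated number of turns, that lies closest to a given number of residues per turn.

```python
from fractions import Fraction


def true_repeat(residues_per_turn, max_turns):
    """Closest whole-number (residues, turns) pair using at most max_turns turns."""
    ratio = Fraction(residues_per_turn).limit_denominator(max_turns)
    return ratio.numerator, ratio.denominator


# Decimal strings are used so that the values are taken exactly as written.
print(true_repeat("3.6", 20))   # (18, 5): 18 residues in 5 turns
print(true_repeat("3.61", 20))  # (65, 18): 65 residues in 18 turns
```

The jump from 18 residues to 65 for a change of 0.01 in the residues per turn shows how sensitive the true repeat is to a small twist, which is why it is not a fundamental characteristic of the structure.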
Until a few years ago the theory of the effects produced by nonintegral helices in crystal diffraction patterns had not been worked out, simply because in ordinary crystallography there is no occasion for it. It was in fact only developed (by Cochran et al., 1952) in response to the proposal by Pauling and Corey (1951a) that the α-polypeptides were built up of nonintegral helices, namely the now famous α-helix with its 3.6 residues per turn and 1.5 A. per residue, to which we have already several times implicitly referred. Armed with the appropriate theory it is often possible to recognize the helical nature of a fiber structure at a glance, and sometimes to specify the main parameters of the helix and its subunits with very little trouble indeed.

There is a catch, however. Imagine that a sheet of paper has been folded around the structure in the form of a cylinder, and a mark put on the paper at corresponding points in each asymmetric unit. If this paper is now opened out we shall obtain a pattern of the type shown in Fig. 5, which we shall call a net-diagram. Now what our helix theory is giving us is in essence the net-diagram of the structure, or at least one of a small number of possible net-diagrams. The positions of the points of a particular net are unambiguously determined, but not how the atoms inside each of them are arranged, nor, what is more important from the present point of view, how each net-point is chemically attached to its neighbors. Thus the net-diagram of Fig. 5 might correspond to any of the three arrangements shown by the arrows, or indeed to an infinite number of others. Note that some of these possible arrangements have a single chain of subunits winding upward, some more than one chain.
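The ambiguity in joining up the net-points can be made concrete with a small sketch. It rests on assumptions that are not in the text: the net-points are numbered 0, 1, 2, ... in order of height, as in the caption of Fig. 5, and a "connection" is modeled simply as joining each point to the one a fixed number of steps further on. The function names are hypothetical, and the values 3.6 units per turn and 1.5 A. per unit are borrowed from the α-helix purely to produce an example net; they are not the collagen net of Fig. 5.

```python
def net_points(n_points, units_per_turn, rise):
    """(angle in degrees, height) of each mark on the opened-out cylinder."""
    return [
        ((i * 360.0 / units_per_turn) % 360.0, i * rise)
        for i in range(n_points)
    ]


def chains_for_connection(n_points, step):
    """Group net-points 0..n_points-1 into chains joining point i to point i+step."""
    return [list(range(start, n_points, step)) for start in range(step)]


# Example net: the first few marks for 3.6 units per turn, 1.5 A. per unit.
for angle, height in net_points(4, 3.6, 1.5):
    print(round(angle, 1), height)

# Joining every third point of an 11-point net gives three separate chains.
for chain in chains_for_connection(11, 3):
    print(chain)  # [0, 3, 6, 9] then [1, 4, 7, 10] then [2, 5, 8]

# Joining each point to the next gives a single chain.
print(chains_for_connection(11, 1))
```

The same set of points thus supports a one-chain or a three-chain reading, depending only on which neighbors are taken to be chemically linked; the net itself does not decide between them.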
To solve the structure completely, and thus resolve this ambiguity in the net-pattern, it is usually necessary to build models; the X-ray data alone are not sufficiently restrictive, and one's knowledge of chemistry must be invoked to fill the gap. Generally the chemical nature of the subunit is known (in polypeptides it is simply -NH-CO-CHR-), and, as indicated in the next section, many detailed stereochemical data are available from the literature. Armed with all this information, together with any derived from subsidiary techniques, it is possible, with experience, to devise a scheme of systematic model building which will enable one to eliminate nearly all the infinite number of theoretical ways of joining up the points on the net-diagram. If all but one of these ways can be eliminated, and if its theoretical diffraction pattern gives reasonable agreement with the observed X-ray picture, the structure is essentially solved.

FIG. 5. The helix-net derived from wide-angle diagrams of collagen. Black dots represent the relative locations of "centers" of equivalent groups of atoms on a cylindrical shell of radius R (with axis vertical). The vectors a to d show several possibilities for connecting the black dots by means of polypeptide chains to form helical structures. The recent models of collagen all use connection c. As can be seen by studying the figure, this connection corresponds to three separate chains winding round the same axis. (One chain joins 0, 3, 6, 9; another 2, 5, 8; and the third 1, 4, 7, 10.) (Bear, 1955.)

In any postulated structure the bond distances and bond angles must have acceptable values. The values to be regarded as acceptable are those derived from X-ray studies of small molecules, such as amino acids and small peptides, and the chance of any large deviations from the average values is negligible.
In the field of proteins much of the work of deriving a canonical set of dimensions has been done at the California Institute of Technology, and the standard values for the polypeptide chain are those given by Corey and Pauling (1953) and shown in Fig. 6. Their most important feature is that the six atoms of the peptide group (-Cα-CO-NH-Cα-) invariably lie in a plane, or very nearly so. This is attributed to resonance, which is also responsible for the relatively short HN-CO bond. The structure retains freedom of motion in spite of the planarity of the peptide bond, rotation being possible about the two single bonds attached to each Cα atom.

FIG. 6. A diagrammatic representation of a fully extended polypeptide chain with the bond lengths and bond angles derived from crystal structures and other experimental evidence. (Corey and Pauling, 1953.)

Apart from the covalent bonds (usually known in advance from chemical studies) the most important links in structures of the type we shall be considering are the hydrogen bonds, such as NH...OC or OH...OC. Experience has shown that in practice virtually all the NH, CO, OH or similar groups which the structure contains are somehow linked up in hydrogen bonds; but the particular groups paired to one another cannot be predicted. The stereochemical conditions which a hydrogen bond must satisfy are not so restrictive as those governing covalent bonds, but the bond distance and bond angle must nevertheless fall within certain limits, which have been discussed by Donohue (1952).

Finally there are the van der Waals' contacts between neighboring atoms. These are not directional bonds, nor are the permissible distances very precisely determined. However, a structure must not have van der Waals' contacts which are unacceptably short.
All in all the conditions imposed by stereochemistry are very severe, and the number of configurations allowed by them for a structure is often very small. This does not necessarily mean that all the allowed configurations can be simply discovered.

It is considered nowadays good practice, when proposing a structure for a fibrous molecule, to give coordinates for its atoms to the nearest 0.1 A., or preferably 0.01 A. This does not imply that the author thinks he knows the coordinates as accurately as this; it merely indicates that a configuration giving acceptable bond distances and angles is at least possible. Specification of the exact coordinates allows this to be checked by others. The fact that the coordinates may be slightly wrong is not a valid excuse for failing to present a consistent set of them.

Apart from these stereochemical considerations the most important general principle in structural work is symmetry. It is a good working rule that where possible, the same packing arrangements will be used over and over again in a structure. It follows that generally there will be symmetry elements of one sort or another, and since the presence of these can often be deduced rather directly from the X-ray data they are of considerable importance in tackling a structure.

Of course, as we have already indicated, true crystals almost always possess symmetry elements. But this is often true of polymer molecules too, especially if they have been encouraged to take up a regular configuration by drawing them out into fibers. Otherwise they are called amorphous, and are then not very suitable for study by X-rays (see the next section). If a fiber structure does repeat, the most likely symmetry element is a screw axis: a pure translation can be thought of as a special case of a screw axis with zero rotation, and is comparatively rare.
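A screw axis, as described above, combines a rotation about the fiber axis with a translation along it. The sketch below applies that operation repeatedly to one starting point. It is an illustration under stated assumptions only: the function names are hypothetical, the fiber axis is taken as z, and the starting point at 2.3 A. from the axis is an invented value with no structural claim attached. The rotation of 100 degrees with a rise of 1.5 A. corresponds to 3.6 units per turn.

```python
import math


def screw(point, rotation_deg, translation):
    """Rotate a point about the z (fiber) axis, then shift it along that axis."""
    x, y, z = point
    a = math.radians(rotation_deg)
    return (
        x * math.cos(a) - y * math.sin(a),
        x * math.sin(a) + y * math.cos(a),
        z + translation,
    )


def generate(start, rotation_deg, translation, count):
    """Positions of `count` equivalent units produced by repeating the screw operation."""
    points = [start]
    for _ in range(count - 1):
        points.append(screw(points[-1], rotation_deg, translation))
    return points


start = (2.3, 0.0, 0.0)  # hypothetical starting position, in A.

pure_translation = generate(start, 0.0, 1.5, 4)    # zero rotation: units stacked in a line
dyad_screw = generate(start, 180.0, 1.5, 4)        # twofold screw: units alternate sides
nonintegral = generate(start, 100.0, 1.5, 19)      # 3.6 units per turn

print(pure_translation[3])
print([round(v, 3) for v in dyad_screw[1]])
print([round(v, 3) for v in nonintegral[18]])      # back over the start after 18 units
```

Setting the rotation to zero reduces the operation to a plain translation, the special case mentioned in the text, while the 100-degree case returns to the starting azimuth only after 18 units and 5 turns.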
Other symmetry elements (mirror and glide planes) are theoretically possible but are most improbable in practice; indeed they are impossible if the polymer contains asymmetric carbon atoms of only one hand. There are only two exceptions: if there are several chains in the structural unit they may be related by a rotation axis parallel to the axis of the fiber; and there is a possibility of dyad axes perpendicular to the fiber axis (as in deoxyribonucleic acid (DNA)). But usually such symmetry elements occur, if at all, in addition to a screw axis.

Unless it be a simple dyad, a screw axis generally gives a structure a helical appearance. Helices are ubiquitous in biology precisely because biological structures are very often made of small units linked together, end to end, to make up a larger entity. In recent years it has been realized that a single isolated helix may have a screw axis which is noncrystallographic; that is to say, n (as defined on p. 140) is not restricted to 2, 3, 4, or 6, but may assume any value, integral or nonintegral. A nonintegral value means merely that a single turn of the helix contains a nonintegral number of subunits. Note that in an isolated helix the environment of each subunit is the same whether n is integral or nonintegral; and there is no reason why a nonintegral value, such as 3.6, should not be assumed if packing relationships between neighboring residues in the helix are best satisfied in this way.

If, however, an attempt is made to pack such helices into a regular lattice, the relationship between asymmetric units in neighboring helices will not be the same everywhere unless the screw axis is 2-, 3-, 4- or 6-fold; in other words a true crystal cannot be formed unless this condition is satisfied, because it can be shown that these are the only symmetry axes (rotation or screw) which allow a pattern to repeat indefinitely in two or three dimensions.
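The statement that only 2-, 3-, 4- and 6-fold axes are compatible with a lattice can be checked numerically. The sketch below uses the standard textbook argument, which is not spelled out in the text: a rotation that maps a lattice onto itself can be written with whole-number entries in the lattice's own basis, so the quantity 2 cos(360°/n) must be a whole number. The function name `lattice_compatible` and the tolerance are assumptions made for illustration.

```python
import math


def lattice_compatible(n, tol=1e-9):
    """True if an n-fold rotation could map a repeating lattice onto itself."""
    value = 2.0 * math.cos(2.0 * math.pi / n)
    return abs(value - round(value)) < tol


allowed = [n for n in range(1, 13) if lattice_compatible(n)]
print(allowed)                  # [1, 2, 3, 4, 6]  (1 is the trivial identity)
print(lattice_compatible(5))    # False
print(lattice_compatible(3.6))  # False: the 3.6-fold axis cannot be a lattice symmetry
```

A value such as 3.6 fails the test just as 5 does, which is consistent with the remark that a nonintegral screw axis can exist in an isolated helix but cannot form part of the symmetry of a true crystal.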
True, the chain molecule in a crystal may have a 3.6-fold axis of symmetry, but this symmetry cannot be apparent in the relations between it and its neighbors; it is accidental from the point of view of the crystal and cannot form part of the space group. Such a situation could only arise if the interactions between neighboring chains were relatively weak. A nonintegral screw axis is likely to appear when the interactions of a fiber molecule with itself are much stronger than its interactions with its neighbors.

In crystallographic jargon a nonintegral screw axis is a pseudo-axis. It need not even apply to the whole of the fiber molecule. Thus the backbone of a polypeptide chain might have nonintegral screw symmetry, but not the distal ends of the side chains, which are largely influenced by their neighbors.

If a small number of chains, each with a nonintegral screw axis, is placed side by side, they may try to interact in a regular manner, for example, by forming interchain hydrogen bonds. If they are to remain strictly parallel, regular interaction will not usually be possible since the nonintegral axis will cause the interchain bonds to get out of step. Sometimes, however, it may happen that if the individual (helical) chains coil slowly

... Several types of point group, or finite collection of symmetry elements (see p. 141), are possible: first, those consisting only of an n-fold rotation axis (n = any integer); second, those possessing in addition dyad axes perpendicular to the main axis; and third, the cubic point groups. It is the latter which are important in the present connection because they generate isodimensional arrangements, such as spherical viruses are known to be. There are three cubic point groups which will interest us. The first has four threefold axes, arranged tetrahedrally, and a number of dyad axes.
The second has fourfold, and the third fivefold axes as well as three- and twofold axes. The properties of these three point groups, known as 23, 432, and 532 respectively,⁴ are set out in Table I, together with the number of asymmetric units in each and the names of the regular (or Platonic) solids which possess these symmetry elements (among others).

TABLE I
The Three Non-Enantiomorphous Cubic Point-Groups

Crystallographic   Number and type               Number of          Regular solids possessing
description        of rotation axes              asymmetric units   the same symmetry elements
23                 3 dyad, 4 triad               12                 Tetrahedron
432                6 dyad, 4 triad, 3 tetrad     24                 Cube, Octahedron
532                15 dyad, 10 triad, 6 pentad   60                 Dodecahedron, Icosahedron

4. General Remarks

In this section we shall briefly consider which aspects of a structure are most clearly "seen" by the X-rays. This should help the reader to cultivate his "X-ray eye," lack of which has so often caused misunderstanding in the past.

We have spoken so far as if X-rays are scattered only by repeating structures such as crystals; but this was a simplification, merely for didactic purposes. The fact is that X-rays are scattered by every part of the specimen, but there will only be sharp spots on the X-ray photograph if the electron density is periodic in space. Otherwise the photograph will show smears, smudges, or merely diffuse blackening. If part of the structure is periodic, part aperiodic, then there will be sharp spots superposed on smudges or diffuse background. Since a given amount of blackening shows up much more clearly if it is collected into a spot than if it is spread over an area, and since spots are easier to interpret, our attention is usually concentrated on them rather than on the smudges. So when we "solve a structure" we are generally describing those parts of it which are regular, that is, which repeat periodically in space.
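The separation into sharp spots and diffuse background can be illustrated with a one-dimensional toy model. The sketch below is an added illustration under stated assumptions, not a calculation from the original: the "crystal" is a row of eight cells sampled at sixteen points each, every cell carries the same ordered motif, and one site has a density that varies at random from cell to cell. The names `dft_coefficient` and `make_cells`, and all numerical values, are hypothetical.

```python
import cmath
import random

def dft_coefficient(density, k):
    """k-th Fourier coefficient of a sampled one-dimensional density."""
    n = len(density)
    return sum(rho * cmath.exp(-2j * cmath.pi * k * j / n)
               for j, rho in enumerate(density))

def make_cells(n_cells=8, samples=16, disorder=1.0, seed=0):
    """Cells sharing an ordered motif, plus one site whose density varies
    at random from cell to cell (a stand-in for a disordered group)."""
    rng = random.Random(seed)
    motif = [1.0 if 4 <= i < 8 else 0.0 for i in range(samples)]
    cells = []
    for _ in range(n_cells):
        cell = list(motif)
        cell[12] += disorder * rng.random()
        cells.append(cell)
    return cells

cells = make_cells()
n_cells, samples = len(cells), len(cells[0])
crystal = [rho for cell in cells for rho in cell]
average_cell = [sum(cell[i] for cell in cells) / n_cells for i in range(samples)]

# At the positions of the sharp spots (k a multiple of the number of cells)
# the amplitude is exactly that of the average cell, scaled by n_cells.
for h in range(1, 4):
    spot = dft_coefficient(crystal, h * n_cells)
    from_average = n_cells * dft_coefficient(average_cell, h)
    print(h, round(abs(spot), 6), round(abs(from_average), 6))

# Between the spots a perfectly periodic row would give zero; the disorder
# contributes a weak, spread-out background instead.
diffuse = abs(dft_coefficient(crystal, n_cells + 1))
print("between spots:", round(diffuse, 6))
```

In this model the spots carry information about the cell averaged over the whole row, while the irregular part shows up only as scattering between the spots.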
⁴ Pronounced two-three, four-three-two, and five-three-two.

Suppose we have a structure which does for the most part repeat regularly, with the exception that one small part of the unit cell is irregular, varying in a random manner from cell to cell. What will the X-rays see? Strictly speaking, of course, they respond to the entire structure, but the X-ray spots "see" only the average unit cell. Such a situation may be met with in fibers, which may have a random arrangement of side chains; and also in crystalline proteins, where much of the solvent in the unit cell may have no regular structure. It is true in some measure of all X-ray photographs owing to the random thermal motions of the atoms in all crystals. All these effects to some extent smear out the average electron density and reduce the amount of fine detail which we can expect to see, which means that those parts of the reciprocal lattice corresponding to small spacings and fine details (the parts far from the center, that is to say) are reduced in intensity. In protein work the trouble is particularly acute, and the X-ray intensities from a protein crystal or fiber fall off with decreasing spacing much more rapidly than do those from an ordinary organic crystal; so the atoms in the structure of a protein, if we eventually succeed in "seeing" them, will certainly be smeared out somewhat. This will naturally make the interpretation of the results more difficult, even if they are known to be correct.

What would be the effect on the diffraction pattern of minor variations in the amino acid composition of the protein, a change in a single side chain, for example? Strictly speaking, almost every X-ray reflection is influenced to some extent by every electron in the unit cell.
But a change in the few atoms making up a single side chain represents a very small change in the electron density distribution in the unit cell as a whole (protein side chains all have about the same electron density, and we may assume that they are generally packed close together without leaving any gaps which cannot at once be occupied by water molecules); hence the average effect of such a change on any given reflection is slight, generally well within the error of measurement. Small changes in a few reflections may, however, be just observable. It follows that we cannot expect to show by X-rays in any simple manner whether or not two very similar proteins are in fact identical.

X-rays see electron density, not atoms and bonds. Therefore they see a structure in terms of electron density and not as a chemist would see it. For example, they cannot even distinguish in a simple way where one molecule ends and the next begins; one must deduce this indirectly from a knowledge of bond dimensions. On the other hand they are very sensitive to even slight changes in the position or orientation of a molecule within the unit cell, changes of a kind to which protein crystals are peculiarly susceptible. Difficulties of this kind are not important when the correct three-dimensional electron density map of a structure has been obtained, but they have to be borne in mind during the early stages.

5. Summary

A crystal is made up by the indefinite repetition of a small three-dimensional unit, the unit cell, consisting of a small number of chemical molecules. It generally possesses symmetry elements, and the particular set of them which is present is known as the space group of the crystal. (Similarly for finite, nonrepeating objects exhibiting symmetry the set of symmetry elements is called a point group.)

X-rays are scattered by the electron density of the crystal.
The diffraction pattern can be thought of as a regular three-dimensional array of spots, known as the reciprocal lattice. Each X-ray spot corresponds to one imaginary wave of a wave analysis, or Fourier analysis, of the electron density. Its position in the reciprocal lattice shows both the wavelength (or spacing) and the direction of the wave. Its intensity is related to the amplitude of the wave. Its phase is not given by the X-ray data. Therefore one cannot deduce the structure directly from the X-ray pattern, except in very simple cases; but, given the structure, one can always calculate the pattern.

What can we learn directly from the X-ray pattern of a crystal? The dimensions of the unit cell can be directly calculated from the dimensions of the reciprocal lattice. The symmetry of the crystal is closely related to the symmetry of the reciprocal lattice, and for protein crystals one can almost always deduce the space group from a study of the X-ray pictures. Hence, knowing the volume of the unit cell, the size of the asymmetric unit can be calculated.

Powder patterns are familiar from industrial practice, and are obtained by passing a beam of X-rays through a crystalline powder. They can be thought of as the superposition of a large number of single crystal pictures of crystals in every possible orientation relative to the X-ray beam. Their characteristic feature is a set of concentric and sharp but continuous rings of blackening; the radii of these rings correspond to the spacings of the principal lattice planes in the crystal. One can think of them as generated by rotating the reciprocal lattice about all possible axes through its origin, and taking a central section of the resulting set of concentric spheres.

A fiber is usually a collection of small crystallites, whose "fiber axis" is nearly parallel to the length of the fiber.
X-ray pictures of fibers are generally more confused and less perfect than those of single crystals, because the structure of most fibers is only partly ordered, so that some of the X-ray intensity is thrown into regions of diffuse scattering and not into discrete spots.

Fibers often possess nonintegral screw axes of symmetry, and the presence of these can often be deduced by inspection from the X-ray photograph. Where adequate stereochemical information has been made available by studies of the structures of small molecules, it is sometimes possible to guess the structure of a fiber by careful model building, using accurate scale models.

X-ray diffraction is only really useful for studying that part of a structure which repeats regularly in space. By X-ray techniques it is easy to show that two structures are similar, but very difficult to show that they are identical, at any rate when the molecules are large.

The importance of symmetry, whether in a crystal, in a fiber, or in a virus, is that it allows the same subunits to be used in identical environments, repeatedly in the same structure. Presumably for reasons of economy in manufacture, Nature is addicted to the mass production of identical small units for building up large constructions. These tend to aggregate in a symmetrical manner which can be "seen" by X-rays. This is why X-rays are useful in studying biological structures. And this is why symmetry is the most important of all crystallographic ideas for biochemists.

III. FIBROUS PROTEINS AND SYNTHETIC POLYPEPTIDES

In this section we shall give a brief account of recent work on the small-scale structure of fibrous proteins and synthetic polypeptides.
The latter serve as model structures, simpler than naturally occurring materials because they can if desired have uniform side chains; and they have provided some of the most important clues to the configurations of collagens and keratins. A more detailed account of X-ray studies of fibrous proteins up to 1954 has been published by Kendrew (1954b), while the most recent advances have been reported by Kendrew and Perutz (1957). As stated in the introduction we shall not discuss in this review the large-scale structure of fibrous proteins.

1. Synthetic Polypeptides (and Silk)

The polypeptides which have proved most useful from the present point of view are those in which all the side chains are identical, though random copolymers having two or more types of side chain have also been synthesized. The degree of polymerization is generally fairly high (several hundred residues would be a typical value) so that the molecules are genuinely "fibrous." Oriented films or fibers can be produced by various simple techniques and these have occasionally given astonishingly good X-ray fiber diagrams.

It was discovered early that synthetic polypeptides form two main types of structure, known as the α- and the β-forms because they are analogous to the α- and β-forms of keratin. By appropriate choice of solvent either one or the other can be precipitated from solution at will. Thus m-cresol usually gives the α-form, while the β-form is precipitated from formic acid. The two forms give quite different X-ray patterns and they can also be distinguished by means of their infrared absorption spectra (as shown by the extensive studies of Elliott and his coworkers, 1956; see the review by Doty and Geiduschek, 1953).

a. α-Polypeptides. The best X-ray photographs of polypeptides in the α-form have been obtained from poly-L-alanine (Bamford et al., 1954; Brown and Trotter, 1956) and from poly-γ-methyl-L-glutamate (Bamford et al., 1952, 1953).
These photographs have very characteristic features, and so far all polypeptides in the α-form have given similar X-ray patterns, though with varying degrees of perfection. The most detailed studies of them are those carried out by Bamford and his colleagues at Messrs. Courtaulds Ltd (Bamford et al., 1956) and described in their recent book. The main features are a strong meridional reflection of 1.5 A., discovered by Perutz (1951); a strong "layer line" of reflections with layer line spacing 5.4 A.; together with a strong reflection, spacing about 10 A. (depending on the side chain), on the equator.

It now seems certain that the configuration of the α-polypeptides is based on the α-helix of Pauling et al. (1951). This is a folded configuration of the main chain; the positions of the atoms in the side chains beyond Cβ are not specified by it. A diagram of the α-helix is given in Fig. 7. The polypeptide chain backbone follows an approximately helical path having a pitch of 5.4 A. and containing about 3.6 amino acid residues per turn. The translation per residue in the fiber axis direction is thus 5.4/3.6 = 1.5 A. The Cα carbon atoms, to which the side chains are attached, are all at a radius of 2.3 A. The whole structure is held together by hydrogen bonds running from the NH of one peptide group to the CO of another peptide group on the next turn of the helix.

The arguments in favor of the α-helix have already been rather fully set out elsewhere (Crick, 1954) and will be only very briefly recapitulated here. From the X-ray pattern it is possible to deduce unambiguously the parameters of the nonintegral screw axis: these are a rotation of about 100° and a translation of 1.5 A. From the density of the specimen and the dimensions of the unit cell it can be shown that the asymmetric unit consists of a single amino acid residue.
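The screw-axis parameters quoted above follow from the pitch and the number of residues per turn by simple arithmetic, which the following sketch makes explicit. It is an added illustration: the function names are hypothetical, the helix is treated as ideal, and the choice of a right-handed sense with the axis along z is an assumption made only for the purpose of generating coordinates.

```python
import math

PITCH = 5.4              # A., from the layer line spacing
RESIDUES_PER_TURN = 3.6  # nonintegral screw axis
CA_RADIUS = 2.3          # A., radius of the alpha-carbon atoms

def screw_parameters(pitch, residues_per_turn):
    """Translation (A.) and rotation (degrees) per residue of a helix."""
    rise = pitch / residues_per_turn
    rotation = 360.0 / residues_per_turn
    return rise, rotation

def alpha_carbon_positions(n_residues, pitch=PITCH,
                           residues_per_turn=RESIDUES_PER_TURN,
                           radius=CA_RADIUS):
    """Idealized alpha-carbon coordinates (x, y, z) on a right-handed helix
    whose axis lies along z; the handedness is an assumption of this sketch."""
    rise, rotation = screw_parameters(pitch, residues_per_turn)
    positions = []
    for i in range(n_residues):
        angle = math.radians(i * rotation)
        positions.append((radius * math.cos(angle),
                          radius * math.sin(angle),
                          i * rise))
    return positions

rise, rotation = screw_parameters(PITCH, RESIDUES_PER_TURN)
print(f"translation per residue: {rise:.2f} A.")
print(f"rotation per residue:    {rotation:.1f} degrees")

# With exactly 3.6 residues per turn, the 19th residue (index 18) lies
# directly above the first: 18 residues occupy 5 complete turns.
positions = alpha_carbon_positions(19)
print(positions[0], positions[18])
```

Because 3.6 = 18/5, an ideal helix with exactly these parameters repeats after 18 residues, or 5 turns, a distance of 27 A. along the axis; any small departure from 3.6 would change this repeat without altering the local environment of each residue.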
The positions of the strong reflections, together with the fact that α-polypeptides can be stretched into a β-form, show that there is only one polypeptide chain per lattice point, rather than two or more intertwined. Only two structures can be built to this specification: one, the α-helix, has the same parameters as those observed; the other (described by Bamford et al., 1952) is very much less satisfactory stereochemically.

FIG. 7. Drawings of the left-handed and right-handed α-helices. The R and H groups on the α-carbon atom are in the correct position corresponding to the known configuration of the L-amino acids in proteins. (L. Pauling and R. B. Corey, unpublished drawings.)

A quite different approach is to build models without assuming any particular screw axis. It can be shown that if a polypeptide chain is to be folded helically and stabilized by internal hydrogen bonds, as the infrared evidence suggests (Ambrose and Elliott, 1951), only a few simple arrangements are stereochemically possible. Moreover these can be enumerated systematically so that one can be certain that none have been overlooked. All of them have been built by Donohue (1953) and arranged in a stereochemical order of merit. By any criteria the α-helix is the best, though one or two of the others cannot be totally excluded. The strength of the case for the α-helix lies in the fact that both of these approaches give the same answer.

As originally described the α-helix was really two structures, since its backbone could follow either a right-handed or a left-handed helix. These are mirror images of each other; but there are two possible ways of adding side chains to the Cα carbon atoms of each, giving in all four structures, of which two are mirror images of the other two.
Thus if we confine ourselves to L-polypeptides there are two possible structures, one with a right-handed helix, the other with a left-handed helix, but not mirror images of one another. Until recently it was not known which of these two was more stable, but it now seems likely that the right-handed one is the more common for L-polypeptides. This had been suggested much earlier on structural grounds (Huggins, 1952); the newer evidence comes in part from studies on optical rotation, both experimental (Elliott et al., 1956; Yang and Doty, 1957) and theoretical (Moffitt, 1956a,b; Fitts and Kirkwood, 1956a,b). In addition, a critical reconsideration of the X-ray data for poly-L-alanine (Elliott and Malcolm, 1956a) has shown that the agreement between calculated and observed X-ray intensities is greatly improved if the assumption is made that the chains are polarized at random either upward or downward in the structure; and that if this is done the right-handed α-helix fits the data much better than the left-handed.

There is a need for still more detailed comparisons between observed and calculated data for α-polypeptides, and these should make it possible to refine the structure even further. The presence of "forbidden" reflections, albeit weak, on the meridian of the poly-L-alanine pattern indicates that the α-helix must be slightly distorted in the solid state, probably owing to the mutual interference of neighboring chains. There must also be distortion in mixed DL-copolymers, since these give a 1.5 A. reflection whose spacing is slightly less than usual, and abnormally broad infrared absorption bands (Bamford et al., personal communication). Model building suggests that in this case the distortion is due to occasional steric hindrance between Cβ atoms of side chains (of differing hands) belonging to residues on adjacent turns of the same helix (Crick, 19--).
There is now considerable evidence that the α-helix exists in solution in certain solvents, such as m-cresol, dimethylformamide, and chloroform-formamide, provided that the polymer be long enough (say 100 residues), since such solutions behave as if they contained rigid rods in which each residue occupies 1.5 A. of length (Doty et al., 19--). Mixed DL-polymers also form a helical configuration, but it is less stable than that of the pure D- or L-material. Thus it is clear that each enantiomorph of the monomer prefers its natural sense of helix (Doty and Lundberg, 1956; Elliott et al., 1956). The stability of the α-helix in solution, in various solvents, has been discussed theoretically by Schellman (1955).

b. β-Polypeptides and Silk. It has been known for many years that in the so-called β-configuration of keratin and synthetic polypeptides, and also in silk, whose X-ray pattern shows it to be a close relative, the polypeptide must be very nearly fully extended. A fully extended chain has a twofold screw axis, and repeats after two residues in a distance of 7.3 A. The observed repeat in all known β-structures is less than this, usually between 6.6 A. and 7.0 A., so the chains must be somewhat puckered.

In general plan the features of a β-structure follow from the disposition of the hydrogen bonds. The hydrogen bonding groups (CO and NH) of a single extended chain all lie in a plane, or nearly so, and project in a direction roughly perpendicular to the chain direction; it follows that the polypeptide chains can easily be hydrogen bonded into infinite plane sheets. The side chains project alternately on either side of the sheet. Neighboring sheets must be held together by bonds of various types between opposed side chains.

The main difficulty in making this general plan more precise is to know the relative directions of neighboring chains in the same sheet.
Pauling and Corey (1953b) have described two possible regular arrangements: in the "parallel pleated sheet" all the chains in one sheet have the same direction, while in the "anti-parallel pleated sheet" alternate chains have opposite directions (see Fig. 8). They claim that if the structures are built so as to conform to the best values of bond dimensions the former arrangement gives a repeat of 6.5 A. and the latter a repeat of 7.0 A. It has not been rigorously established, however, that the two models can be distinguished merely by observing the exact value of the repeat, and it seems quite as likely that in the synthetic polypeptides the directions of the chains are randomly in one direction or the other (see Brown and Trotter, 1956).

FIG. 8. Diagrammatic representations of the two pleated sheet structures proposed by Pauling and Corey (1951 and 1953b). (a) The anti-parallel pleated sheet, with chains running alternately up and down. (b) The parallel pleated sheet, with all chains running in the same direction.

We shall discuss silk only briefly: for a more extended account of recent work see Kendrew and Perutz (1957). For the silk of Bombyx mori, which contains 44% glycine residues, Marsh et al. (1955a,b) and also Warwicker (1954) have suggested a structure based on the antiparallel pleated sheet, in which it is supposed that every alternate residue along the chains is glycine; in consequence glycine residues all project (or rather, since their "side chains" consist merely of hydrogen atoms, fail to project) on one side of a given sheet. Two such sheets are packed back to back with their glycine sides together; and the whole structure is supposed to be built up of pairs of such sheets (see Fig. 9). Chemical evidence on the amino acid sequence supports these ideas, and it seems very likely that much of the structure does possess the double-sheet structure. But the details, for example the direction of run of the chains and the interpretation of the longer equatorial spacings (Marsh et al., 1955a), seem to us less certain.

FIG. 9. The basic structure proposed by Marsh et al. (1955a) for the silk fibroin of Bombyx mori. The figure shows the view looking down the fiber axis, so that the polypeptide chains are running toward the reader. The sheets of polypeptide chains cut the figure in vertical lines. Notice two sheets close together, back-to-back, in the center of the figure, with alanine side-chains on the outside of the pair of sheets.

FIG. 10. The basic structure proposed by Marsh et al. (1955b) for Tussah silk, and also for the β-form of poly-L-alanine. Again the view is down the fiber axis. Notice that in contrast to Fig. 9 sheets of polypeptide chains do not occur in pairs but are equally spaced.

In the more uncommon Tussah silk only 27% of the residues are glycine, so the arrangement must be somewhat different. Marsh et al. (1955c) have suggested a simple structure, also based on the antiparallel pleated sheet; here it is supposed that the glycines are arranged at random so that the two sides of any sheet are equivalent and all sheets pack at the same distance from one another, i.e., as singlets rather than doublets (see Fig. 10). They propose a similar structure for the β-form of poly-L-alanine, whose X-ray picture is remarkably similar (Bamford et al., 1954; Brown and Trotter, 1956).

Finally it should be noted that by using appropriate solvents "soluble silk" can be made to take up the α-configuration (Ambrose et al., 1951; Elliott and Malcolm, 1956b).

c. Polyproline.
We now come to two materials which fall outside the classification of α- and β-structures. The first of these is poly-L-proline, which is interesting because of its relationship to collagen and because, having no NH group, it is incapable of donating hydrogen atoms for hydrogen bond formation. Cowan and McGavin (1955a,b) have studied its X-ray diffraction pattern, using material prepared by Katchalski. The X-ray pattern can be indexed in terms of a relatively simple unit cell of space group P3₂ and dimensions a = 6.62 A., c = 9.36 A.; in other words with a threefold screw axis. The asymmetric unit contains one residue, making the distance per residue in the fiber axis direction 3.1 A., a value which indicates that the polypeptide chain must be somewhat folded. Model building shows that, if the peptide group is both trans and planar, only a very limited number of configurations is at all possible, owing to the severe restrictions imposed by the steric hindrance between neighboring residues and by the fact that there is only one bond per residue about which rotation can take place. Only one of these configurations (see Fig. 11) has a triad symmetry axis. These considerations establish the general nature of the structure, although at the time of writing neither the exact details of the configuration nor the position of the molecule in the unit cell have been deduced unambiguously from the X-ray data, probably because, once again, the chains are running up and down at random in the structure.

There is no reason to suppose that the integral threefold axis is an especially favored configuration for the polypeptide chain. It probably arises in this case because of strong van der Waals' interactions between neighboring chains, which discourage the formation of a nonintegral screw (see p. 149).

d. Polyglycine. Polyglycine can be precipitated from solvents in two different forms having different X-ray patterns.
That of polyglycine I is a typical β-pattern; but polyglycine II gives a new kind of pattern not hitherto obtained from any other material, although so far oriented specimens have not been obtained and the only photographs available are powder patterns (Meyer and Go, 1934; Bamford et al., 1955).

Polyglycine II is of interest because of its relationship to collagen (see p. 168). It is prepared by precipitation from aqueous solutions in the presence of salts such as lithium bromide or calcium chloride. The structure turns out to be based on an integral threefold screw axis; the powder diagram can be indexed in terms of a trigonal unit cell, space group P3₁ or P3₂, a = 4.8 A., c = 9.3 A., one residue per asymmetric unit. The configuration proposed for polyglycine II by Crick and Rich (1955) (see Fig. 12) has a backbone configuration very similar to that of poly-L-proline. But in this substance, unlike the latter, the peptide groups all contain hydrogen atoms suitable for hydrogen bond formation, and in the proposed structure neighboring chains are joined together to form an infinite three-dimensional network by three sets of hydrogen bonds running perpendicular to the fiber axis. The diffraction data are not sufficiently detailed to enable a decision to be made whether all the chains run in the same direction or whether they run randomly up and down; either arrangement would lead to a stereochemically plausible structure. Further confirmation of the threefold character of the structure comes from the observations of Meggy and Sikorski (1956), who have found hexagonal crystals of polyglycine II in electron micrographs.

FIG. 11. A single chain of the model proposed for poly-L-proline, seen in projection, below, along the threefold screw axis, and, above, perpendicular to this axis and along the direction indicated by the arrow. (Cowan and McGavin, 1955b.)

FIG. 12. The basic structure proposed for polyglycine II. A projection down the threefold screw axis, showing seven chains. Hydrogen bonds, drawn as dashed lines, run in a number of directions linking neighboring chains together. (Crick and Rich, 1955.)

2. Fibrous Proteins

a. The α-Keratin Pattern. Hair, epidermis, porcupine quill, myosin, tropomyosin, fibrinogen, and other naturally occurring materials give diffraction patterns resembling one another in broad features, and known as α-patterns (for details see Kendrew, 1954a). The most striking of these features are strong meridional reflections with spacings of 5.1 A. (Astbury and Woods, 19--) and 1.5 A. (Perutz, 1951). The latter strongly suggests that the structure is based on the α-helix. However, an array of parallel α-helices would not give a 5.1 A. meridional reflection, but, as pointed out by Crick (1952, 1953a) and by Pauling and Corey (1953a), a system of α-helices twisted together into coiled coils is capable of explaining both the meridional reflections.

FIG. 13. To illustrate the general idea of a "coiled-coil" or "compound helix." The figure on the left shows a single polypeptide chain. The small helix is supposed to be an α-helix whose axis has been distorted so that it follows a larger, more gradual helix. The figure on the right shows two possible ways of combining helices into ropes. (After Pauling and Corey, 1953a.)

It seems probable that this suggestion is correct in principle, but the details are still very uncertain. Pauling and Corey (1953a) proposed a complicated structure based on a 7-stranded rope, composed of a central straight α-helix with six others twisting slowly around it (see Fig.
13), together with additional interstitial α-helices; and they made the suggestion that the super coiling might be produced by a repeating sequence of residues. Crick (1953a) tentatively proposed two simple models, the double rope and the triple rope, which could be derived from simple packing considerations: for reasons of symmetry two right-handed α-helices might be expected to pack together, not parallel, but at an angle of 20° to one another, when the side chains of one fit into the spaces between the side chains of the other. By a slight deformation this would yield a structure resembling a piece of twin lighting cable; the triple rope is similar. Recently Lang (1956a,b) has shown that this kind of structure would probably give an X-ray pattern simpler than that observed, although his argument is not entirely rigorous, since he made no allowance for side chains.

Not only are the details of the configuration unknown, but it seems likely that they may be different in different materials giving the α-keratin pattern. Tropomyosin, for example, with no proline and little cystine, and a molecular width corresponding to only two polypeptide chains, is unlikely to have precisely the same structure as porcupine quill, which contains large amounts of both proline and cystine and gives an X-ray pattern of considerable complexity. In spite of these reservations it seems almost certain that a substantial part of these proteins is folded into the α-helix configuration, so we may be reasonably confident that the α-helix is not restricted to synthetic polypeptides but can also occur in genuine proteins.

b. The β-Keratin Pattern. It
was shown many years ago by Astbury and his colleagues (1930, 1931, and 1933) that when hair is stretched its X-ray diagram changes from what is now called the α-pattern to a radically different one called the β-pattern; he concluded that this change reflected a change in the configuration of the polypeptide chain from a folded form to one which is almost fully extended. This interpretation is still considered to be correct; but the details of the β-configuration have eluded discovery, although its general nature is not in doubt. What we have said above (p. 158) in connection with β-polypeptides and silk applies also to the other β-proteins, which include the stretched forms of many of the proteins we have listed above as α-proteins, as well as feather keratin, which exists only in what is presumably a β-configuration.

The most important features of the X-ray pattern of β-keratin are equatorial reflections of spacing 9.7 and 4.65 A., and a meridional reflection of spacing 3.33 A.; there is no reflection of spacing 1.5 A., but instead one of 1.1 A. Pauling and Corey (1953b) have suggested that the structure is essentially a parallel pleated sheet, with repeating unit 6.5 A., in contradistinction to β-polyalanine and silk to which, it will be remembered, they have attributed the antiparallel pleated sheet. In our view the experimental evidence does not yet permit the deduction of so precise a model; but in general terms it does seem likely that the structure consists of pleated sheets or something very like them. It is to be hoped that more definite conclusions can be reached by working on the few β-proteins which give diffraction patterns rich in detail: among these is feather keratin, on which a preliminary note has been published by Krimm and Schor (1956).
One of the problems still to be solved about the structure of keratin is the exact nature of the α-β transformation. This process is reversible and takes place under relatively mild conditions. Keratin contains a very large number of S-S bridges, and the chemical evidence suggests that these are not ruptured during extension. It is not easy to see how a process which, we must presume, involves the pulling out of (possibly intertwined) helices into pleated sheets, could leave so many interchain bridges intact; the suggestion that all the S-S bridges are intra-chain also raises formidable stereochemical difficulties. Nor is it clear how all the side chains can readjust themselves easily, since in some cases one would expect them to rotate through large angles during the extension.

c. Collagen. In this review we shall be concerned only with the structure of collagen at the atomic level, although it also exhibits features of great interest at a higher level which can be studied in the electron microscope. For a recent discussion of the latter see Schmitt et al. (1955).

Collagen has been studied by X-rays for many years. As Anfinsen and Redfield have said in the review to which we have already referred (1956), "perhaps for no other protein has such a multitude of structures been proposed, or, to use a term more common among X-ray crystallographers, `discovered'." It must be admitted that the jibe was not unjustified; the structure was in fact unknown until recently, when several groups of workers suggested essentially similar solutions. These suggestions have radically altered the situation, which now is that the structure is almost certainly "discovered" in a final sense of the term. It is unlikely that the recent models will require modification except in detail. There have been several reviews of the earlier efforts (Bear, 1952; Kendrew, 1954a), which need not be described here.
It is well known that collagen has an unusual amino acid composition (see Tristram, 1953). Its main peculiarities are the high glycine content (just over one-third of the residues are glycines); the presence of hydroxyproline and hydroxylysine, amino acids which occur in no proteins other than collagen and its near relations; and the large amounts of proline and hydroxyproline, which together make up about 22% of the residues of beef collagen. There have been two important studies of the amino acid sequence in collagen. The first, by Schroeder et al. (1954), showed that the sequence Pro-Gly was rare and Gly-Hypro absent, whereas Gly-Pro and Hypro-Gly were common. The provisional conclusion, that Gly-Pro-Hypro-Gly might be a common sequence in collagen, was confirmed by Kroner et al. (1955), who identified this tetrapeptide among their hydrolysis products, as well as the tripeptide Gly-Pro-Hypro. It seems very probable that the conclusion may be accepted, in spite of the rather low yields obtained in both studies.

The X-ray pattern of collagen is of a type given by no other protein. Its main features are a strong meridional arc of spacing 2.86 A. and near-meridional spots with spacings about 4 and 10 A. There are also equatorial reflections, the principal among which is humidity-sensitive, having a spacing of 10.4 A. in dry and up to 17 A. in wet collagen. Finally there is a diffuse patch on the equator in the 4½ A. region, especially strong in the dry material (see Fig. 4). Certain electron micrographs (Schmitt et al., 1942; Mustacchi, 1951) have suggested that collagen fibers may be able to stretch by large amounts (up to several hundred per cent); but no one has been able to reproduce this phenomenon except under electron bombardment, so it seems probable that it is an artifact.
There is no doubt, however, that collagen can be stretched reversibly by small amounts (up to about 10%) and that during stretching there is an increase in the spacing of the principal meridional reflection, normally 2.86 A. This effect was discovered by Cowan et al. (1953), who found that it was accompanied by a considerable improvement in the definition of the X-ray pattern, and thus made an important technical advance. They suggested (1953), as did Cohen and Bear (1953), that the structure was based on a nonintegral helix. There is now general agreement with this view and that the approximate parameters of the screw axis are a rotation of 108° and a translation of 2.86 A. To be more correct, the structure might have an n-fold rotation axis, parallel to the fiber axis, in addition to the screw, whose parameters would then be 108°/n and 2.86 A. Consideration of the distribution of strong intensities in the diffraction pattern, and of the probable mean radius of the helix, makes it very likely that in fact n = 1 (Cowan et al., 1955). The net-diagram implied by this screw symmetry is shown in Fig. 5, but it cannot tell us which way the polypeptide chains run, nor how many of them there are, even though there is independent evidence that the number of amino acid residues per asymmetric unit is three (this follows from a consideration of the density of the structure). Various possibilities are shown in Fig. 5. In fact there is evidence from studies of light scattering, etc., on collagen in solution (Boedtker and Doty, 1956), as well as from the model-building approach which we shall now discuss, that the number of chains in the helix is most probably three. The structures recently proposed, which are all closely related though not identical, spring from an earlier suggestion by Ramachandran and Kartha (1954).
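The screw parameters just quoted lend themselves to a short calculation. The following sketch, in Python, starts from the two figures given in the text (a rotation of 108° and a translation of 2.86 A. per asymmetric unit) and derives the number of units per turn, the pitch, and the exact axial repeat. The function name, its arguments, and the use of rational arithmetic are choices made here for illustration only; they are not taken from any of the papers cited.

```python
from fractions import Fraction

def screw_repeat(rotation_deg, rise, n=1):
    """Geometry of a nonintegral screw axis (illustrative helper).

    rotation_deg: rotation per asymmetric unit, in whole degrees
    rise:         translation per unit along the fiber axis, in A.
    n:            order of any rotation axis parallel to the fiber axis
                  (the text argues that n = 1 for collagen)
    """
    rotation = Fraction(rotation_deg, n)        # 108/n degrees per unit
    units_per_turn = Fraction(360) / rotation   # exact ratio in lowest terms
    pitch = float(units_per_turn) * rise        # axial length of one full turn
    units = units_per_turn.numerator            # units in one exact repeat
    turns = units_per_turn.denominator          # turns in one exact repeat
    return units_per_turn, pitch, units, turns, units * rise

upt, pitch, units, turns, repeat = screw_repeat(108, 2.86)
print(f"{upt} units per turn, pitch {pitch:.2f} A.")
print(f"exact repeat: {units} units in {turns} turns, {repeat:.1f} A.")
```

With these figures the ratio 360/108 reduces to 10/3, so that the arrangement repeats exactly after ten units and three turns, an axial distance of 28.6 A., while a single turn has a pitch of about 9.5 A. The calculation says nothing about the number or direction of the chains, which is the point made in the text about the net-diagram.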
The first model of Ramachandran and Kartha (whose symmetry is not the same as that discussed above) consisted of three parallel polypeptide chains, joined by hydrogen bonds, not twined around a common axis but running side by side in a compact group. Each chain had a threefold screw axis with a translation of 9.5 A. (containing three residues) in the fiber axial direction. The backbone configuration of these chains was, in fact, very similar to that subsequently established for polyproline and polyglycine II, whose axial repeats are almost the same. Ramachandran and Kartha later (1955) modified their structure by causing the three chains to twist slowly around each other, thus giving the model a nonintegral screw axis in conformity with the net-diagram discussed above. The other structures suggested recently have all been of this form, but have differed in the way the three chains are linked together. Since the asymmetric unit contains three residues one can conceive of three different types of interchain hydrogen bond. Rich and Crick (1955) have shown by exhaustive model building that only one of the three types can be made systematically, between atoms of the polypeptide backbone: all structures with more than one type are stereochemically unsatisfactory. Moreover, there are only two ways of making a single set of hydrogen bonds, and these they have described as Structure I and Structure II. The same conclusion has been reached by Bear (1956), also from systematic model building, but to a slightly different set of postulates.

It will be remembered that in polyglycine II an infinite network of hexagonally arranged chains is linked together by hydrogen bonds (Fig. 12). We might imagine the collagen structure as derived by isolating a group of three chains from this infinite network.
It can easily be shown by means of models that there are just two types of groups which can be isolated in this way, differing in the way their hydrogen bonds are arranged. One of these types corresponds to Structure I for collagen, the other to Structure II.

Structure II (Fig. 14) turned out to be much easier to build than Structure I, in that it gave more acceptable values of bond dimensions and angles and of interatomic distances; also its diffraction pattern is in better agreement with the observed pattern (Ramachandran, 1956; Bear, 1956; Cowan et al., 1955; Rich and Crick, unpublished). For stereochemical reasons Structure II will accommodate only the amino acid sequence -G-P1-P2-, repeated indefinitely; G must be glycine, while P1 and P2 could be any residues, including proline and hydroxyproline. The amino acid sequence data to which we have already referred indicate that in fact all the hydroxyproline must be at P2. This site is located far from the axis of the structure, and thus in Structure II the hydroxyl group of hydroxyproline cannot form hydrogen bonds with CO groups in the backbone of the same group of chains. It follows that if it is used for interchain linkages at all, it must serve to link together neighboring groups of chains, as suggested by Ramachandran and Kartha, rather than to link chains within one group, which was the case in the less satisfactory Structure I (Rich and Crick, 1955). The data collected by Gustavson (1955) suggest that the thermal stability of collagen is greater the greater its content of hydroxyproline; but as yet there is no chemical evidence whether the bonds it forms are within a group of three chains, or between groups, or both.

FIG. 14. To illustrate the basic idea of the proposed collagen structure (Collagen II of Rich and Crick, 1955). For clarity only the Cα carbon atoms are shown. The peptide groups connecting them are drawn simply as short straight lines. On the left the dotted lines show the general run of the three polypeptide chains about the fiber axis (full line). In the middle one of the three chains is shown, to illustrate how it coils round the dotted line. On the right all three chains are included. The small circles show the sites which must be glycine. The large circles show where proline and hydroxyproline (shaded) are mainly found. Note the repeating sequence of sites.

It should be noted that in these structures relatively few hydrogen bonds are made between backbone atoms. There seems to be no intrinsic objection to this, however; indeed it may be that the solution of the structure has been delayed by an overemphasis on backbone-backbone hydrogen bonds.

It cannot be said that there is yet general agreement that Rich and Crick's Structure II is correct. There is, however, general agreement that all other structures so far proposed are unsatisfactory, and that Structure II is the best suggestion yet. In our opinion it is likely that it will turn out to be correct. Nevertheless it is necessary to add a note of caution to the effect that different parts of the collagen molecule may have different configurations. It is well known that collagen fibers have a banded structure which can be seen in great detail in electron micrographs, and that the bands differ in certain respects from the interbands. Moreover collagen has to be stretched in order to give a good diffraction pattern; it may be that the effect of stretching is to alter the configuration of part of the fiber. It is not impossible, in fact, that in unstretched collagen part of the chain has a different configuration, Structure I for example.
If it were shown that the collagen molecule is inhomogeneous in some such sense as this, the force of some of the arguments used to deduce the structure would naturally be weakened.

IV. CRYSTALLINE PROTEINS

More work has been done to determine the structures of the globular proteins by means of X-rays than has been done in any other area of the field, and with fewer results. By and large, globular proteins are metabolically active, and fibrous proteins are not. From the biochemist's point of view, therefore, any results obtained with globular proteins should be the most interesting of all. This branch of protein X-ray studies has in fact just reached a critical point. For the first time there is a real prospect of getting definite and incontrovertible results. None to speak of have yet been published (the achievements so far are spectacular from the technical standpoint, but not from the point of view of the interested outside observer), but there is now for the first time a real promise for the immediate future. The transformation of the field is largely a consequence of the successful application, by Perutz and his colleagues, of the method of isomorphous replacement to a protein crystal. We shall speak of this in its place; in the meantime we must make a preliminary survey of some basic facts about protein crystals.

1. The Nature of Protein Crystals

The main difference between protein crystals and the crystals of much smaller organic molecules is that they contain a considerable quantity of solvent, actually within each unit cell. Typically half the volume of the crystal will be water (or, more often, the salt solution with which the crystal is in equilibrium). If such a crystal is removed from its mother liquor and exposed to the air, water is lost and the crystal can be seen to shrink somewhat; its optical properties usually deteriorate at the same time.
X-ray measurements would show that the visible shrinkage is a consequence of the shrinkage of the unit cell itself. In this condition a crystal is conventionally described as "dry," in contradistinction to the original "wet" crystal, though in fact it can be dried still further if placed in a desiccator.

All the evidence suggests that most of the water in the crystal is in a "liquid" state; that is to say, it has no regular structure like ice or like the hydrated layer around an ion, and it is permeable to small ions. Thus considerable amounts of salts can often be diffused into a crystal without changing the dimensions of the unit cell, and indeed, since many proteins are crystallized by "salting out," the salt concentration in the liquid inside the crystal may reach several moles per liter. Proteins such as ribonuclease, which are crystallized from strong solutions of organic solvents, exhibit similar behavior in that the cell dimensions hardly change when the organic solvent is changed, a typical alteration (for ribonuclease) being 0.1 A. in 30 A. In all these cases, the fact that ions or other small molecules have gone right into each unit cell can be demonstrated in several ways. For example, if sodium dithionite is diffused into a crystal of methemoglobin the spectrum of the protein can be directly observed to change from that of ferrihemoglobin to that of ferrohemoglobin, as the process of diffusion takes place. Again, changes in the low order X-ray reflections (that is, the reflections of long spacing near the center of the photograph) show clearly that these small molecules have penetrated the unit cell. This is demonstrated in Fig. 15, which shows the reflections of finback whale myoglobin in four different salt solutions. Note that while the inner reflections alter dramatically, the outer ones are unchanged. This shows that, whereas the fine structure of the contents of the unit cell is unaltered, the general distribution of electron density, as seen at low resolution, has altered greatly; and the effect is completely explained by supposing that the electron density of the structureless but continuous (and, in regard to their boundaries, somewhat ill-defined) regions containing mother liquor has been stepped up or down, while that of the protein molecules, with their precise and definite structure, has remained unchanged.

[...] favorable cases the extra molecules have no effect on the dimensions of the unit cell; but sometimes small changes in dimension do take place, and sometimes the crystals may become disordered or perhaps break up altogether [...] different space groups, suggesting that the process of oxygenation may involve an appreciable change in the shape of the molecule, possibly by altering the relative positions of the two subunits of which it is composed (see p. 175). It is noteworthy that in myoglobin, where the indications are that the molecule is not made up of subunits, no such phenomenon has been observed: oxy- and reduced myoglobins crystallize isomorphously.

These observations leave no room for doubt that some of the water (or other solvent) within the protein crystal is "liquid"; and the question arises whether it is all liquid. Only in horse hemoglobin has it been fully answered, by Perutz (1946), who measured the density of the crystals after they had been equilibrated in salt solutions of various concentrations. His results are in accordance with the assumption that part of the water is "bound" to the protein, that is to say, held in a rigid or pseudocrystalline arrangement, so that salt cannot diffuse into it; on the other hand, the rest of the water is continuous with the external medium and contains the same concentration of salt.
The amount of "bound water" was found to be 30% of the protein (by weight). Whether this simple picture has any physical reality remains to be seen; but at least it summarizes the facts in a very compact way. On the other hand, despite earlier suggestions, the X-ray data clearly show that it is an oversimplification to conclude that the bound water consists of a uniform unimolecular layer covering an ellipsoidal protein molecule (Crick, 1953a).

The shrinkage of protein crystals can also be studied by X-rays. It is often found that well-defined shrinkage stages exist between the wet and dry extremes. These stages have been carefully studied in horse hemoglobin (Huxley and Kendrew, 1953). The cell dimensions change quite sharply as the humidity is varied at a fixed temperature. In this protein it is even possible to obtain an "expanded" stage by altering the pH. Shrinkage stages have also been reported for various myoglobins (Kendrew, 1950; Kendrew and Pauling, 1956; Kendrew and Parrish, 1956), and also for ribonuclease (Magdoff and Crick, 1955b). For the latter protein it has been shown that the wet lattice can be "strained" by what appear to be small humidity variations; that is, the cell dimensions can be altered by about 0.3 A. in 30 A. in an apparently continuous manner (Magdoff and Crick, 1955b). It is not known whether this is true of any other protein.

All these phenomena can be understood if we regard a protein as a large molecule of relatively fixed size and shape. It would be surprising if such molecules (as opposed to smaller and relatively more flexible organic molecules) were able to pack together without leaving considerable space between them. This space is naturally filled with water, or other solvent, as in many crystals of smaller organic molecules; but it is bigger, and there is room for a larger number of solvent molecules, which therefore find it easier to retain their "liquid" state; in other words they distribute themselves over the rather large space in a random manner. The protein molecules presumably touch one another at a rather small number of specific points of contact; and at least some of these are changed abruptly when the crystal goes from one shrinkage stage to another, although minor humidity changes may strain the arrangement a little without causing large and discontinuous changes. As more and more water is removed the molecules pack down together as best they can, and the structure often becomes disordered. If the crystal is, finally, dried thoroughly most of the water comes out of the interstices and empty spaces are left between the closely packed protein molecules. In some proteins, β-lactoglobulin for example, it has been reported (McMeekin et al., 1954) that shrinkage is continuous; though of course the truth may be that even here shrinkage stages do exist, but that their dimensions are so similar that they elude detection.

The X-ray pattern of wet protein crystals usually extends to spacings of about 1½ or 2 A., the average diffracted intensity falling rapidly as the spacing decreases in this region. In this respect protein crystals differ from crystals of ordinary organic molecules, which produce diffracted beams of much smaller spacing. The absence of fine detail in protein diffraction patterns sets a limit to the resolution of the structure we can hope to obtain even when X-ray methods reach their ultimate power, although in some small proteins it may just be possible to resolve individual atoms. Some protein crystals are better than others from this point of view; thus ribonuclease is particularly good, with spots extending out to about 1¼ A. In general the smaller the protein the further out into reciprocal space its diffraction pattern extends.
Dry crystals are always more disordered than wet ones, and generally give few reflections with spacings less than 5 A., though there are exceptions (in both directions). Thus the diffraction patterns of wet crystals contain more information, and it is usual to study proteins wet rather than dry. To do so one must mount them in sealed capillaries, as thin as possible to minimize loss of X-rays, and containing a few drops of mother liquor to stabilize the humidity.

2. Direct Information

In this section we shall describe the sort of information which can be obtained from the preliminary examination of a protein crystal. Most of it (except that discussed under d) can be got in only a few days.

a. Unit Cell and Space Group. In most cases the dimensions of the unit cell and the nature of the space group (i.e. the symmetry elements) can be derived unambiguously from two or three suitably chosen X-ray photographs. Reference to the International Tables of Crystallography at once gives the number of asymmetric units in the unit cell; and from the volume of the latter it is simple to calculate the volume of the asymmetric unit. If the molecular weight of the protein is approximately known one can calculate the maximum number of molecules which the asymmetric unit can contain. To obtain the actual number one must estimate the relative proportions of protein and solvent in the crystal. This usually presents little difficulty since in general only an approximate estimate is required; in fact in almost all cases the proportion of solvent is 40-60%. The most usual number of molecules in the asymmetric unit is one, but two are found quite commonly, and larger numbers occasionally. It may even happen that the number is a fraction. Thus in the most common form of horse hemoglobin, whose space group is C2, the number is one-half, showing that the "molecule" found in solution, of molecular weight 67,000, must consist of two identical halves.
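The bookkeeping described in this paragraph can be set out in a few lines. The sketch below, in Python, is illustrative only. The cell volume and the protein fraction are hypothetical round figures, not measurements taken from any crystal discussed here; the partial specific volume of 0.74 cm³/g is a typical value for proteins that is assumed for the purpose; and the number of asymmetric units per cell (four in space group C2) is the kind of figure that would be read from the tables.

```python
AVOGADRO = 6.022e23  # molecules per mole

def molecules_per_asymmetric_unit(cell_volume, n_asym_units, mol_weight,
                                  protein_fraction, specific_volume=0.74):
    """Estimate the number of protein molecules in one asymmetric unit.

    cell_volume:      volume of the unit cell, in cubic A.
    n_asym_units:     asymmetric units per cell, from the space group
    mol_weight:       approximate molecular weight of the protein
    protein_fraction: fraction of the crystal volume occupied by protein
    specific_volume:  partial specific volume in cm^3/g (assumed value)
    """
    asym_volume = cell_volume / n_asym_units
    # Volume of one molecule in cubic A. (1 cm^3 = 1e24 cubic A.)
    molecule_volume = mol_weight * specific_volume / AVOGADRO * 1e24
    n_max = asym_volume / molecule_volume   # if the crystal were all protein
    n_est = protein_fraction * n_max        # allowing for the solvent
    return n_max, n_est

# Hypothetical figures for a C2 crystal of a protein of weight 67,000.
n_max, n_est = molecules_per_asymmetric_unit(
    cell_volume=350_000, n_asym_units=4,
    mol_weight=67_000, protein_fraction=0.5)
print(f"maximum {n_max:.2f}, estimate {n_est:.2f} molecules per asymmetric unit")
```

With figures of this order the maximum comes out a little above one molecule and the estimate close to one-half, which is the situation described for horse hemoglobin. Because the admissible answers are simple numbers (one-half, one, two), only a rough estimate of the solvent content is needed to decide between them.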
In the crystal the two halves of the hemoglobin molecule are related by the dyad or twofold rotation axis of symmetry which in this space group relates two neighboring asymmetric units. It is most likely that the same is true of a hemoglobin molecule in solution (it will be realized from what has so far been said that the environment of a protein in a crystal is rather like its environment in solution). In conditions of extreme dilution or in presence of high concentrations of urea the horse hemoglobin molecule dissociates into two halves in solution.

The contrary proposition, that a molecule possessing internal symmetry must exhibit it in the crystal, is not necessarily true. Sometimes a protein with internal symmetry may crystallize in two different forms, in one of which the internal symmetry forms part of the symmetry of the cell, while in the other it is not revealed. Thus X-ray evidence alone cannot tell us the minimum structural unit of the protein (at least from the preliminary examination). Insulin, for example, which has a chemical molecular weight of 6000, has a crystallographic molecular weight of 12,000 in both its known crystal forms, and this is also the lowest value so far found in aqueous solution. It will be interesting to see how the two halves of the 12,000 molecule are related, but this we shall not discover without a full-scale analysis of the crystals. Recent work has shown that dissociation of the 12,000 unit into "monomers" of molecular weight 6000 is promoted by urea and guanidine (Kupke and Linderstrøm-Lang, 1954; Trautman, 1956); this suggests that hydrogen bonds play an important role in holding the two parts together.

b. Molecular Weight. In favorable cases it is possible to obtain a rather good value of the molecular weight of the asymmetric unit of the protein by X-ray studies; as we have just indicated, this may be a multiple or a submultiple of the "molecular weight" found by other methods. As this subject has been reviewed elsewhere very recently (Crick, 1957) it will only be briefly alluded to here. In essence the method is to measure the volume of the asymmetric unit (by measuring the cell dimensions), and to calculate its weight by measuring the density of the crystal. To determine the molecular weight of the protein it is then only necessary to establish the composition of the asymmetric unit in terms of protein, solvent, and salt (if any). This is easiest when salt or organic solvent is absent; if salt is present there will be difficulties due to the fact that part of the water is "bound" and salt-free, so that the overall salt concentration in the internal medium is less than that in the external medium. Usually therefore one works with salt-free crystals if these are available. Under favorable circumstances the errors should not exceed 1-2%, and even a very rough estimate will usually be within 5-10%. Considering its accuracy, the method has been somewhat neglected in the past, partly perhaps because it requires collaboration between a protein chemist and a crystallographer. In Table II we have collected some of the more recent and more accurate results obtained by this method.

TABLE II
Molecular Weights of Some Proteins, as Determined by X-Rays

Protein                  Molecular weight    Determined by
Ribonuclease             13,466              Harker (1956)
Lysozyme                 13,906 ± 600        Palmer et al. (1948)
α-Chymotrypsinogen       25,966 ± 806        Bluhm and Kendrew (1956)
β-Lactoglobulin          35,006 ± 406        Green et al. (1956)
Human serum albumin      65,266 ± 1,360      Low (1952)
Human mercaptalbumin     65,600 ± 706        Low (1952)

c. Identification and Identity.
It might be thought that the X-ray diffraction pattern, being so intimately related to the structure of the protein producing it, could be used like a fingerprint for identification purposes. Unfortunately this is true only to a limited extent. The same protein, crystallized under slightly different conditions, may give various crystal forms with totally different space groups and diffraction patterns. Thus the fact that the X-ray pictures of two protein crystals are radically different does not mean that the proteins themselves are different. On the other hand, two proteins known to be different (though the differences are slight) may sometimes crystallize in the same unit cell, and give almost identical diffraction patterns. The reason why this is possible has already been discussed (see p. 152).

The various crystal forms of myoglobin provide some very good examples of this (Kendrew et al., 1954). Thus the form known as Type A has been obtained from sperm whale, finback whale, blue whale, sei whale, lesser rorqual, and common porpoise; the crystals are isomorphous and the diffracted intensities very similar though not identical. Again, crystals of the form called Type C have been obtained from the horse, common seal, and gray seal. Nevertheless the myoglobins of the different species differ immunologically and (wherever analyses have been made) chemically, albeit slightly. Changes in the diffracted intensities, of the same order of magnitude as those found in these examples, can also be produced by simple chemical modification of the protein, as for example by converting CO-myoglobin to metmyoglobin. It seems very probable, by analogy, that the changes produced by varying the species are a consequence of a few variations in the side chains (cf. the species variations in insulin investigated by Brown et al., 1955).
Thus while the X-ray pattern is not a safe guide to strict identity, it remains true that if two proteins from different sources give very similar unit cells and diffraction patterns, it is virtually certain that they have the same major structural features, and therefore that their amino acid sequences are closely related.

d. The Shape of Protein Molecules. In certain special cases it is possible to learn something about the shape of the protein from the dimensions and symmetry of the various unit cells in which it occurs. It is rare that straightforward deductions can be made, however, and the information obtained is not generally very precise, so we shall only touch on it briefly (for a more extended account see Kendrew, 1954a). The cell dimensions put upper limits to the diameter of the molecules in certain directions, but the restrictions are not often severe enough to be interesting. If the same protein crystallizes in many different forms it may be possible to deduce a unique shape for the "equivalent ellipsoid" such that good close-packing is achieved in all the forms. The most fully worked out example of this approach is hemoglobin, and those interested in it should consult the original papers (Bragg and Perutz, 1952b; Bragg et al., 1954).

A source of information which is more often profitable is the inmost region of reciprocal space (the reflections of very low order), especially when the electron density of the solvent is very different from that of the solute. These reflections, which correspond to a view of the structure at very low resolution, depend on the general contrast between the protein molecule and the solvent, and very little on the internal structure of the protein. Alternatively, when the salt concentration inside the structure is high, one can measure the changes in X-ray intensity produced by changes in salt concentration (see Fig. 15). By methods of this sort Bragg and Perutz (1952a) derived a shape for the molecule of horse hemoglobin which agreed well with that deduced from packing considerations: namely an ellipsoid with dimensions 71 × 53 × 53 A. consisting of hydrated protein. This shape, however, can only be regarded as a first rough approximation to the truth; the molecule is almost certainly more asymmetrical and more "knobbly" than an ellipsoid. In our view the method of isomorphous replacement offers a more general and more reliable method for discovering the shape of a protein; and its application for this purpose will be discussed on page 195.

3. The Patterson Function

The basic principle of the Patterson synthesis has already been mentioned. In the past it was, for lack of anything better, the main tool for the exploratory studies of protein crystallographers. The newer methods have reduced its importance and we shall refer to it only very briefly here. The subject has been dealt with rather fully by one of us (Kendrew, 1954a), and a simple explanation of the ideas involved in its application to protein crystals has been set out in an earlier article (Kendrew and Perutz, 1949).

It will be recalled that in this method the experimental data are manipulated mathematically without any assumptions about the structure being made. This treatment gives a map which shows not the structure itself but the relative positions of all possible pairs of atoms in the structure, all superposed. It can be shown that, if a structure, even so complicated a one as a protein, possesses certain strong features, such as parallel "rods" of high electron density (e.g. polypeptide chains in suitable configuration), the Patterson synthesis will possess analogous features. The actual interpretation is controversial in almost all cases.
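The statement that the Patterson map shows the relative positions of all pairs of atoms, superposed, can be made concrete with a toy calculation. The sketch below, in Python, uses an invented one-dimensional "structure" of three point atoms in a cell of 20 grid divisions; the positions and weights are hypothetical, the weights standing loosely for atomic numbers. It builds the vector map directly from its definition, as a sum of products of weights over all pairs of atoms. In practice the map is computed from the observed intensities, which is not attempted here.

```python
from collections import defaultdict

def patterson_1d(atoms, cell):
    """Vector map of a one-dimensional point-atom structure.

    atoms: list of (position, weight) pairs on an integer grid
    cell:  number of grid points in one repeat
    Returns a dict {u: height}; the height at u is the sum of w_i * w_j
    over every ordered pair of atoms with x_j - x_i = u (modulo the cell).
    """
    pmap = defaultdict(int)
    for xi, wi in atoms:
        for xj, wj in atoms:
            pmap[(xj - xi) % cell] += wi * wj
    return dict(pmap)

# Hypothetical structure: two light atoms and one heavy atom.
atoms = [(2, 6), (5, 6), (11, 53)]
pmap = patterson_1d(atoms, cell=20)
for u in sorted(pmap):
    print(u, pmap[u])
```

The peak at the origin collects every atom paired with itself, and each of the other peaks is accompanied by its mirror image at -u, so that the map is centrosymmetric whatever the structure. A structure of N atoms gives N(N - 1) vectors away from the origin, which is why the map of a protein is so heavily overlapped; the peaks involving the heavy atom nevertheless stand well above the rest.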
It suffices to say that the Patterson approach has clearly demonstrated that the structures of the few proteins so far examined are not of extreme simplicity in the sense of consisting essentially of bundles of parallel straight polypeptide chains; on the other hand they are certainly not completely isotropic. Some proteins, such as myoglobin, show more obvious signs of regularity than do others, such as ribonuclease.

Another application of the Patterson synthesis is to obtain relative orientations of the same protein in different unit cells, by considering the relative orientation of the strong features of their Patterson syntheses. This can be a powerful method in favorable cases, especially if three-dimensional data are available; but the computation of three-dimensional Patterson syntheses is at best a very tedious business, and it is doubtful if the effort is well spent, now that more powerful, though equally tedious, methods of analysis are available. Again, it may be possible to obtain some knowledge of the relative positions of the molecules in the unit cell by looking for "pseudo-origins", that is, for regions where the Patterson function appears to repeat within the unit cell. These methods have been used for ox hemoglobin (Crick, 1956) and for various types of myoglobin (Kendrew and Pauling, 1956; Kendrew and Parrish, 1956). But in all cases the results are suggestive rather than conclusive, and so far there has been no opportunity to check them by more certain methods.

The Patterson synthesis, then, will always be a powerful tool in the hands of the crystallographer, but for the present any results obtained by its use should be accepted with reserve. The use of the Patterson synthesis in the isomorphous replacement method (see p. 181) is in a different category, however.

4.
Methods Involving Heavy Atoms

These methods, which involve the addition of heavy atoms to the protein molecule and studying the consequent alteration in the diffraction pattern of the crystals, are the only ones so far discovered which give any secure hope of solving the structure of proteins. For this reason, and because they are intimately connected with the chemistry of proteins, we shall describe them at length. There are two distinct methods, both of which have been used for a number of years in the study of small molecules. The first has not yet been applied to proteins, while the second was so used for the first time in 1953.

a. The Heavy Atom Method. The first is the Heavy Atom Method proper. This was used by Carlisle and Crowfoot (1945) in their determination of the structure of cholesterol iodide; and also by Crowfoot-Hodgkin and her collaborators (Hodgkin et al., 1956) in the first stages of the study of Vitamin B₁₂. In this method a heavy atom is introduced into the molecule, sufficiently heavy for its contribution to dominate the X-ray intensities. It is then an easy matter to find its position in the unit cell by computing a Patterson synthesis, which will clearly show heavy atom-heavy atom vectors. One proceeds to calculate the diffraction pattern, both amplitude and phase, which such an atom would produce if it were the only atom in the unit cell. The result will resemble the observed pattern, but naturally will not be identical to it, since the contribution of the rest of the molecule has been omitted. Now the observed diffraction pattern gives us the correct amplitudes for the whole structure, but not the phases. One employs, therefore, as the next best thing, the calculated phases (based on the heavy atom alone) together with the observed amplitudes, to calculate a Fourier or electron density synthesis. This will show the heavy atom and in addition a "ghost" of the rest of the molecule [...]
[...] used to investigate the structure of the protein, we shall in the first instance restrict ourselves as follows. It has been explained (see p. 136) that any particular reflection has a certain amplitude and phase. It can therefore be regarded as a vector; moreover, the contributions to it of all the atoms in the unit cell are themselves vectors, which must, of course, be combined vectorially. However, in certain planes of the reciprocal lattice (depending on the symmetry of the crystal), corresponding to certain special projections of the structure, these vectors are all either parallel or antiparallel, and can therefore be added arithmetically. We may speak of such reflections as being real, i.e. having no imaginary vectorial component; as having phase angles of 0 or π; or as being positive or negative. The corresponding projection is known as a real projection. Our first discussion will confine itself to these reflections.

Consider, then, a certain (real) reflection, and suppose that its intensity (on some arbitrary scale) is 100 for the case where there is protein only, and 64 for the case where there is the same arrangement of protein and in addition one mercury atom per asymmetric unit. The amplitudes will be the square roots of these numbers, that is 10 and 8 respectively; but since we do not know whether the phases of the reflections are 0 or π we must write these as ±10 and ±8. The mercury contribution must be the difference of these two numbers, that is to say either ±18 or ±2. For most reflections the contribution of the mercury is smaller than that of the protein, so the correct value will be the smaller one of the pair, that is ±2 in our example. We are still left with an ambiguity of sign, however: that is to say we have, to correspond with protein + heavy atom = (protein plus heavy atom), either

(+10) + (-2) = (+8)

or

(-10) + (+2) = (-8)

Which of these is correct, we cannot yet tell. What we do know, however, is that the amplitude due to the heavy atom is ±2; and thus if we could have a unit cell with all the protein subtracted, empty except for the heavy atom alone, the intensity of the reflection we are considering in its diffraction pattern would be (±2)² = 4. We can thus calculate, without making assumptions, the intensities in the diffraction pattern due to the heavy atom alone, the so-called difference intensities. To find the position of the heavy atom we carry out a Patterson synthesis, known as a difference Patterson synthesis or $(\Delta F)^2$ synthesis. This shows us the vectors between the heavy atoms in each of the asymmetric units of the cell, all the vectors involving atoms other than the heavy atom having been canceled out; and from it one can simply obtain the position of the heavy atom relative to the symmetry elements of the cell. Examples are given in Figs. 16 and 18.

Having found the heavy atom we can calculate its contribution to any particular reflection. Let us suppose that it comes out to +1.7 for the reflection we have been considering. Allowing for experimental error this agrees with the second of our two alternatives; it follows that the protein reflection must be -10 and not +10. That is to say, we have determined the phase of this particular reflection. If we can do this successfully for all the reflections in the reciprocal lattice plane we have been studying, the way is open to calculate an electron density map of that particular projection of the structure.

So much for the real projections. For those regions of the reciprocal lattice where the reflections are complex (that is, of general phase) and which comprise its major part, we follow an analogous but more complicated procedure, whose results are less definite.
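The sign argument for a real reflection given above (intensities 100 and 64, calculated mercury contribution +1.7) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `resolve_real_sign` and its arguments are hypothetical, and it simply tries all four sign combinations and keeps the one whose implied heavy-atom contribution lies closest to the calculated value.

```python
import math


def resolve_real_sign(i_protein, i_derivative, f_heavy_calc):
    """Choose signs for a real reflection (hypothetical helper, for illustration).

    i_protein     -- measured intensity, protein alone
    i_derivative  -- measured intensity, protein plus heavy atom
    f_heavy_calc  -- signed heavy-atom contribution calculated from its position
    Returns (signed protein amplitude, signed derivative amplitude,
             implied heavy-atom contribution).
    """
    f_p = math.sqrt(i_protein)      # amplitude, sign unknown
    f_ph = math.sqrt(i_derivative)  # amplitude, sign unknown
    best = None
    for s_p in (+1, -1):
        for s_ph in (+1, -1):
            # protein + heavy atom = derivative, so heavy = derivative - protein
            f_h = s_ph * f_ph - s_p * f_p
            misfit = abs(f_h - f_heavy_calc)
            if best is None or misfit < best[0]:
                best = (misfit, s_p * f_p, s_ph * f_ph, f_h)
    return best[1], best[2], best[3]


# The worked example from the text: the four candidates for the heavy-atom
# term are -2, -18, +18 and +2; a calculated value of +1.7 selects +2.
protein, derivative, heavy = resolve_real_sign(100.0, 64.0, 1.7)
print(protein, derivative, heavy)  # -10.0 -8.0 2.0
```

With real data the misfit would also be compared with the experimental error, so that reflections with a very small heavy-atom contribution are left undecided rather than forced to a sign.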
We cannot do as we did before, that is subtract the two amplitudes, because the corresponding vectors are not parallel. It turns out that to find the heavy atom we have to calculate another variety of Difference Patterson, called a $(\Delta I)$ synthesis, in which the terms are simply the difference in intensity between (protein + heavy atom) and protein alone. In algebraic terms we use $\Delta I = I_{P+H} - I_P$, where P = protein and P + H = protein plus heavy atom, whereas before we used $(\Delta F)^2 = (\sqrt{I_{P+H}} - \sqrt{I_P})^2$. The $\Delta I$ type of Difference Patterson synthesis gives us as before the vectors between the heavy atoms, from which we can calculate their co-ordinates in the unit cell; but superposed on them are all the vectors between heavy atoms and every other atom in the unit cell, i.e. atoms of protein or liquid (although our procedure does remove those between protein and protein). The diagram thus has a confused background and it may be difficult to locate the heavy atom-heavy atom vectors.

Even when we have successfully done this there are still further difficulties. Without going into details we may say that:

1. With one single isomorphous replacement we cannot hope to find the phases of all the reflections directly. What we get is two values for the phase angle of each reflection, and to remove this ambiguity a second isomorphous replacement in a different place in the unit cell is necessary. To be on the safe side it would be better to have at least three separate isomorphous replacements.

2. The accuracy is less than in the case of real reflections, since we have to assign a quantitative value to the phase angle (a value which may be in error), whereas in the real case all we have to do is to decide between two alternatives, plus or minus.

In practice, therefore, working with reflections of complex phase is quite a different proposition from working with real ones.
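The twofold ambiguity in point 1 follows from simple vector geometry: if the heavy-atom vector is known and only the two amplitudes are measured, the triangle "protein + heavy atom = derivative" can be closed in two mirror-related ways. The Python sketch below illustrates this with invented numbers. The function names, the symbols, and the law-of-cosines formulation are assumptions made for illustration and are not taken from the original work; amplitudes are treated as error-free.

```python
import cmath
import math


def candidate_phases(fp_amp, fph_amp, f_heavy):
    """Two protein phase angles (radians) consistent with one derivative.

    fp_amp, fph_amp -- measured amplitudes of protein and of protein + heavy atom
    f_heavy         -- complex heavy-atom contribution, calculated from its position
    """
    fh_amp = abs(f_heavy)
    alpha_h = cmath.phase(f_heavy)
    # Law of cosines on the triangle F_PH = F_P + F_H.
    cos_delta = (fph_amp**2 - fp_amp**2 - fh_amp**2) / (2.0 * fp_amp * fh_amp)
    cos_delta = max(-1.0, min(1.0, cos_delta))  # guard against rounding or error
    delta = math.acos(cos_delta)
    return alpha_h + delta, alpha_h - delta


def resolve_with_second_derivative(cands_a, cands_b):
    """Keep the candidate from derivative A that derivative B agrees with best."""
    def gap(pair):
        # Distance on the unit circle, so angles differing by 2*pi coincide.
        return abs(cmath.exp(1j * pair[0]) - cmath.exp(1j * pair[1]))
    best = min(((a, b) for a in cands_a for b in cands_b), key=gap)
    return best[0]


# Synthetic test case: an invented "true" reflection and two heavy-atom vectors.
true_fp = cmath.rect(10.0, 1.0)   # amplitude 10, phase 1.0 rad
heavy_1 = cmath.rect(2.0, 0.3)
heavy_2 = cmath.rect(3.0, 2.0)
cands_1 = candidate_phases(abs(true_fp), abs(true_fp + heavy_1), heavy_1)
cands_2 = candidate_phases(abs(true_fp), abs(true_fp + heavy_2), heavy_2)
phase = resolve_with_second_derivative(cands_1, cands_2)
print(cands_1, cands_2, phase)
```

In this synthetic case each derivative alone offers two phases (about 1.0 and -0.4 rad for the first, about 3.0 and 1.0 rad for the second), and only the value near 1.0 rad is common to both. With measured data the two sets never agree exactly, which is one reason a third derivative is desirable.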
It is less accurate and more troublesome; besides, the actual number of reflections of general phase is far greater. On the other hand, experience with the few proteins where electron density maps have been produced indicates, as we shall see, that a two-dimensional projection of the unit cell is of very little use even when it is known to be correct: the thickness of protein and solution through which the projection must be made (never less than 30 Å, representing 15-20 atoms) is so great that all the features of interest are obscured and the result is an uninterpretable confusion. To make real progress the third dimension must be broken into, even though the labor involved is at best formidable.

It should be added that one of the inescapable difficulties of protein crystallography is that it is impossible for all the reflections from a protein crystal to be real. This could only happen if the mirror-image protein molecule (made up of dextro residues, and related to the real molecule as a right-hand glove is related to a left-hand glove) were also present. So three dimensions mean solving the general phase problem. On the other hand some space groups are more favorable than others for studying projections; thus monoclinic unit cells have one real projection, whereas orthorhombic ones have three, mutually perpendicular.

To illustrate the use of the method we shall now describe some of the results obtained by it so far.

c. Isomorphous Replacement and the Structure of Hemoglobin. This was the first application of the method. Perutz and his collaborators (Green et al., 1954) made use of the fact that hemoglobin contains free sulfhydryl groups by causing it to react with p-chloromercuribenzoate (PCMB), a standard reagent for SH groups. In this way they obtained a hemoglobin molecule with two mercury atoms attached to its surface at specific and definite sites.
After crystallization the dimensions of the unit cell were found to be quite unchanged, but the X-ray photographs showed unmistakable changes in the intensities of the reflections. The reflections corresponding to a projection along the b axis, which in this space group are all real, were measured carefully for both normal and mercury-substituted hemoglobin. The two sets of intensities were adjusted to the same scale by a statistical method and the difference between their square roots (i.e. their amplitudes) gives $|\Delta F|$, the change in amplitude produced by the mercury atom. A difference Patterson projection was computed using values of $(\Delta F)^2$; it is shown in Fig. 16. It will be noticed that apart from the peak at the origin (always present in Patterson syntheses, and merely representing the fact that every atom in the unit cell is at zero distance from itself), there is one other peak much larger than all the rest. It has the coordinates (14.8, 31.6). It follows that in the unit cell the x and z coordinates of the heavy atoms are (+7.4, +15.8) and (-7.4, -15.8) relative to an origin at the dyad axis of symmetry by which they are related.

As it happens Perutz was able to decide which solution was correct by making use of the extensive experimental results on the shrinkage and expansion of the crystal. Horse hemoglobin crystals are unusual in undergoing a particularly simple type of shrinkage. The molecules lie in sheets and during shrinkage each sheet remains quite unchanged in itself, but moves relative to its neighbors, in a direction always perpendicular to the b axis. Using this fact it can be shown (Bragg and Perutz, 1952c; [...]

FIG. 16. A Difference Patterson, to show how the position of a heavy atom is discovered using the isomorphous replacement method. The origin is shown twice, at the top and bottom left-hand corners. The peak representing the end of the vector between heavy atoms is near the middle of the map (labeled +24). The other (smaller) peaks and hollows are spurious background due to errors of measurement, etc. (Horse hemoglobin: differences due to PCMB. Green et al., 1954.)

[...] discussed: that is to say, if the X-ray picture showed that the mercury had increased the intensity of a reflection, then it was given the same sign as the calculated mercury contribution; if the intensity had been decreased, then the signs were made different. In some cases the mercury contribution happened to be so small that no decision could be made, but in the great majority of reflections the signs could be definitely allocated.

The earlier studies, predicting that certain reflections would be of like or unlike sign, were confirmed wherever they made definite predictions. Some of the weaker predictions were not confirmed, however, and it is because of this, and because this method of linking signs is only possible in very special cases, that we have not described it in detail. It was nevertheless a technical tour de force at the time.

Recent work, which will be mentioned shortly, has increased the number of definite sign determinations, and confirmed those allocated earlier, with the result that 88 reflections out of a total of 94 whose spacings exceed 6 Å can now be taken as certainly established. From these data an electron density map has been computed, showing the contents of the cell projected parallel to its b axis. By combining the results of the various shrinkage stages Perutz was able to calculate his electron density projection corresponding to an imaginary superexpanded stage in which the layers of molecules have been, as it were, floated apart, so that there is open water between them. (Note that this procedure is only possible in very special cases, as we have indicated above.
In general one cannot separate out the molecules from their overlapping neighbors.) The result, showing the projection of a single layer of hemoglobin molecules, is reproduced in Fig. 17. This has been drawn in such a way that the zero contour represents the electron density of water: protein is, on the average, more dense than this.

Looking at Fig. 17 one experiences two feelings: admiration for the very considerable technical achievement which it represents, and disappointment that the result appears to give us so little information. Its obscurity is due mainly to the very great thickness of the projection (the unit cell is 63 Å thick in this view) and partly to the rather low resolution. Nevertheless there are some interesting features. For example, the outline can be fitted roughly to the ellipsoidal shape deduced by earlier methods (see p. 178), but is markedly more irregular. Again, the molecule appears to have a waist, or rather a dimple, close to the dyad axis; presumably this is related to the fact that it consists of two identical halves. None of the other features suggest anything in particular, though the "hole" marked w is surprisingly, though not impossibly, deep. The features which a biochemist might first search for (the iron atoms and the heme groups) would not in any case be expected to show up in projection at this resolution.

The one feature which can be identified with certainty is the position (in projection) of the heavy atom, although even this could not be picked out if we did not have two views, one with and one without it (the figure shows the latter). Since we know that the mercury atom is attached to a sulfhydryl group, it is possible to draw some conclusions about the position of sulfhydryl groups in the hemoglobin molecule by a correlation of chemical and X-ray studies. We shall enlarge on this topic since, one hopes, it provides the pattern for much future work.

FIG. 17. A Fourier projection of a row of hemoglobin molecules suspended in salt-free water. The contours are contours of (projected) electron density, the zero contour corresponding to the density found where the whole depth of the unit cell is filled with water. The dyad axis between the two halves of the molecule is in the center of the figure. (Bragg and Perutz, 1954.)

It is convenient to summarize the chemical work first (Green et al., 1954; Ingram, 1955), fixing our attention for the moment on native horse hemoglobin. Ingram used the technique of amperometric titration with silver nitrate: thus the protein would first be made to react with a known quantity of PCMB, and the SH groups remaining unblocked would be titrated with AgNO₃. The results showed that all the available sulfhydryl groups of a single hemoglobin molecule (molecular weight 67,000) could be saturated either by 4 molecules of AgNO₃, or by 2 of PCMB, or by 2 of HgCl₂ (the experiments were carried out under conditions such that combination of the reagents was probably with SH groups, though this is not certain). If the molecule was first saturated with AgNO₃ or with PCMB, subsequent reaction with HgCl₂ displaced the first substituents. If on the other hand one mole of either PCMB or HgCl₂ were added first the protein would subsequently take up only two moles of AgNO₃, the reagent first added not being displaced in this case. The surprising thing is that stoichiometrically PCMB and HgCl₂ are equivalent, although one would expect the former to be univalent relative to SH, and the latter divalent.

The results suggest that native horse hemoglobin contains four available sulfhydryl groups, arranged in two close pairs.
One molecule of HgCl₂ or PCMB will saturate both the SH groups of one pair, the former by combining directly with both of them, the latter by combining with one and inactivating the other by steric hindrance. On the other hand a silver atom, being much smaller, saturates only one SH of a pair, and two are required to inactivate the pair altogether. One would expect, therefore, that there would be only two regions on the hemoglobin molecule where mercury or silver would go, each corresponding to one of the pairs of SH groups.

The X-ray results confirm this, at least as far as the x and z coordinates are concerned. Difference Fourier projections have been prepared, showing the positions of the heavy atoms, for the complexes of hemoglobin with (a) 2 moles of PCMB, (b) 2 moles of HgCl₂, (c) 2 moles of AgNO₃, and (d) 4 moles of AgNO₃. The resulting projections all show heavy atoms combined at approximately the same positions in the cell. It is especially significant that when four silver atoms are combined the unit cell contains only two peaks, showing that they are present in two pairs, each pair being so closely spaced that at 6 Å resolution the two silver atoms composing it cannot be seen as separate peaks.

This is not the whole story, however. Analytical data for horse hemoglobin (Tristram, 1953) show that it actually contains 6 sulfur atoms in the form of cystine or cysteine. Other experiments by Ingram have shown that, when denatured, hemoglobin can react with six moles of AgNO₃, in contrast to the native protein which, as we have said, reacts with four. This result, together with others on blocking by HgCl₂ and by PCMB, suggests that the SH groups actually occur in two groups of three, but that one member of each group is unavailable in the native protein. Ox, sheep, and human hemoglobin have also been studied with similar (but not identical) results; Ingram's paper (1955) should be consulted for details.
Careful study of the X-ray data shows that although in all the derivatives we have mentioned the heavy atom is attached to the same part of the molecule, there are in fact minor differences in position, amounting to a few Angstrom units. The reason for these small differences is unknown; they may be due perhaps to rotations about the bonds of the cysteine side chain. They were in fact of great assistance in sign determination; the signs of some of the reflections could not be decided from the PCMB derivative alone because the mercury happened to be in such a position [...]

[...] and isocyanides were all investigated, but for various reasons none of them was wholly successful; generally the reason was that myoglobin has so high an affinity for gaseous oxygen (much higher than that of hemoglobin), with the result that unless the most strictly anoxic conditions were maintained the heme group ligand was promptly replaced by an oxygen molecule.

In the end success was achieved by an entirely different approach, namely to crystallize myoglobin in the presence of various inorganic ions containing heavy elements. Naturally ions were chosen which on general chemical grounds might be expected to have some affinity for one or more of the types of side chain present in proteins. But of course a protein nearly always contains more than one of any given side chain; the hope was that in some cases steric or other factors might induce the ion to be attached preferentially or even specifically at a single site. In effect this hope was realized in a number of instances.
The criteria for success were crystallographic rather than chemical: that is to say, an X-ray picture showing changed intensities was the evidence that combination had taken place; and a difference Patterson, computed from the intensity changes in the same way as we have described above, indicated that combination had been specifically at a single site if it was found to contain only one peak per asymmetric unit.

We may take as an example the first ion that was successfully attached in this way, namely mercuriiodide, HgI₄²⁻. This was investigated because it was known to form complexes with thio ethers; myoglobin contains two residues of methionine whose side chain is -CH₂-CH₂-S-CH₃. Myoglobin crystals prepared in presence of potassium mercuriiodide gave an X-ray pattern substantially different from normal, and the Difference Patterson projection calculated from it is shown in Fig. 18. (We shall be dealing throughout with myoglobin derived from sperm whale and crystallized from ammonium sulfate. It is known as Type A and is monoclinic, with two molecules in the unit cell; the space group is P2₁, which means to say that the only symmetry elements present in the unit cell are screw dyad axes parallel to b.) Figure 18 contains only one peak per asymmetric unit (i.e. per half cell) apart from the origin peak, and it may therefore be taken that combination has occurred at one site per molecule. This is not an expected result since sperm whale myoglobin contains two methionine side chains, not one; it must be supposed that one of them only is sterically available for combination, if indeed the methionine side chain is the site of attachment. The difference Patterson is computed from reflections of spacing greater than 4 Å; the mercuriiodide group is therefore not resolved into its component atoms. A later projection with all terms out to 2 Å (not illustrated here) shows the group partially resolved.
From the known position of the mercuriiodide group many of the signs [...]

[...] that its contribution to their amplitudes was almost zero, while the silver atoms were displaced sufficiently to ensure that in such a case their contributions would be quite substantial.

At the time of writing nothing further has been published on hemoglobin. Sign determination has, however, been extended to a resolution of 3 Å and a Fourier projection with this resolution has been calculated; as might have been anticipated it does not reveal any additional features of the structure which can be interpreted, at present at least (Perutz, personal communication). Perutz (1956) has worked out the theory of a method for determining the difference in y coordinates between two heavy atoms used in two separate isomorphous replacements (the minimum requirement for three-dimensional work; see p. 182); in a monoclinic cell there is no simple way of establishing this difference, since there are no symmetry elements perpendicular to y which can act as reference points.

Work has also been in progress on ox hemoglobin; Green and North (personal communication) have obtained several isomorphous replacements, again using the sulfhydryl groups, and have determined a substantial proportion of the signs of the a and c projections of the (orthorhombic) unit cell. They have also derived a tentative Fourier projection along z, a view of the molecule already known to have interesting features (Crick, 1953a, 1956).

In both species of hemoglobin the most pressing problem is to obtain further isomorphous replacements at radically different places on the surface of the molecule; in this endeavor some success has been achieved by the use of two reagents developed for the work on myoglobin (see p. 191), namely mercuriiodide and aurichloride, but it cannot be said that the problem is yet entirely solved.
Its solution would open the way for determining the exact shape of the molecule in a fairly short time, and in the long run for a full three-dimensional analysis.

d. Isomorphous Replacement and the Structure of Myoglobin. From several points of view myoglobin is an attractive object for study by the protein crystallographer. It has a small molecular weight (17,000) and it can readily be crystallized in at least a dozen different space groups (Kendrew et al., 1954). It contains a prosthetic group of known chemical structure and defined physiological role, and it is analogous in function and so perhaps in structure to another protein which is being intensively studied by X-ray methods, namely hemoglobin. On the other hand it is more intractable from the point of view of isomorphous replacement, because no myoglobin is known to contain free sulfhydryl groups, whereas these groups are to be found in all hemoglobins so far examined. The other obvious unique site in the molecule is the heme group itself, and various attempts were made [...]

[...] subunits per turn, one complete turn occupying 23 Å. It is difficult to determine the value of n from the diffraction data, and early suggestions that it might be 10 or 12 were based on a very insecure argument. Recent work, described below, makes it likely that n is 16.

A number of other "strains" of TMV have been studied by X-rays (Franklin, 1956a). They all give extremely similar X-ray pictures, though minor differences can be detected. In particular the so-called cucumber virus 4 has a structure very close to that of TMV, though it has a slightly smaller mean diameter (146 Å instead of 152 Å). It is clear that all these "strains," including cucumber virus 4, are structurally related, but exactly how close this relationship may be biologically is another matter (Knight, 1955).
The X-ray pattern of reaggregated "A" protein (without RNA) is similar to that of TMV but less perfect, suggesting that the structure is basically the same but a little more irregular (Franklin, 1955b). The small differences between the patterns, probably due to the RNA, are discussed below. The change in the birefringence, from low positive for the intact virus to low negative for reaggregated protein, shows that the RNA makes a positive contribution to the birefringence of TMV. This result is compatible with earlier studies on the ultraviolet dichroism (Seeds and Wilkins, 1950), which established that the nitrogen bases of the RNA were arranged with their planes roughly parallel to the fiber axis, rather than perpendicular to it. It is interesting to note that when gels of reaggregated "A" protein are dried the layer line spacing shortens from 69 Å to 62 Å, whereas in the virus itself it does not change, presumably because the structure is constrained by the RNA (Franklin, 1955b).

We cannot do more than barely mention the fact that certain globular proteins, immunologically and otherwise related to TMV, are found in infected plants (Rich et al., 1955; Franklin and Commoner, 1955) and that these too are capable of aggregation into rodlike particles which give X-ray patterns somewhat resembling those from TMV.

c. X-Ray Results: the Internal Structure. A knowledge of the screw symmetry does not by itself tell us anything about the shape of the asymmetric unit, or the location of the RNA. Some information on these points has come from investigations using other methods of attack.

The first of these is the very careful work of Caspar (1955, 1956b) on the intensities of the equatorial reflections, leading to a direct deduction of their signs.
It can be shown that these reflections correspond to the cylindrical average of the electron density of the virus (at least if reflections of short spacing are excluded) and that the amplitudes of the reflections must be either positive or negative (i.e. the phase angle must be 0 or π). Caspar studied the first ten maxima of the intensity distribution, and showed that all sign combinations but two were very unlikely, in that they indicated a particle of too great a radius, and that of the two, one was distinctly better than the other.

His next approach was totally different, consisting in the application of the method of isomorphous replacement to a virus for the first time. Lead was bound to TMV by adding to the mother liquor an amount of lead acetate corresponding to about 2500 lead atoms per virus particle (greater lead concentrations produced a curdy agglomerate). It could be deduced that the lead atoms were bound at two distinct radial distances from the virus axis, namely 25.3 Å and 84 Å. Using this information it was possible to determine the signs of the reflections, and the result was identical with the preferred sign combination derived by the first method. Though the agreement between observed and calculated data was very good, some doubt might have been felt about Caspar's result because of the necessity of invoking two sites for the lead; however, recent work (see below) has confirmed his choice of signs.

FIG. 21. Radial electron-density distribution in the tobacco mosaic virus particle, plotted as a function of distance from the axis of the particle. The density is the mean density in excess of that of water. Note the hole (filled with water) near the axis (R less than [...] Å), and the large peak at the radius of 40 Å. (Caspar, 1956b.) Horizontal axis: radius, R (angstroms).
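The statement that the equatorial reflections correspond to the cylindrical average of the electron density can be written compactly as a zero-order Fourier-Bessel (Hankel) transform pair. The notation below is assumed here for illustration and is not taken from the papers cited: ρ(r) is the cylindrically averaged density in excess of water at radius r, F(R) is the equatorial amplitude at reciprocal radius R, and J₀ is the Bessel function of order zero.

```latex
F(R) = 2\pi \int_{0}^{\infty} \rho(r)\, J_{0}(2\pi R r)\, r \, dr ,
\qquad
\rho(r) = 2\pi \int_{0}^{\infty} F(R)\, J_{0}(2\pi R r)\, R \, dR .
```

Since ρ(r) is real, F(R) is real as well, which is why each equatorial maximum needs only a sign rather than a general phase angle; the radial density follows from the second integral once the signs have been chosen.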
The Fourier synthesis computed from the observed amplitudes, together with Caspar's chosen signs, shows the radial distribution of average density in the particle and is given in Fig. 21. Its most important features are the central minimum, representing a hole down the middle of the virus (occupied by water), the peak at a radius of 24 A. together with the even larger peak at 40 A., and finally the fact that the radius of the particle appears to exceed the mean value of 75 A. which had been deduced from earlier data.

FIG. 22. Radial electron-density distribution for repolymerized, RNA-free, "A" protein from TMV (abscissa: radius, R, in angstroms, from 0 to 100). Compare Fig. 21, which shows the corresponding function for intact TMV. The main difference is the absence here of a peak at 40 A. radius. This, and other evidence, suggests that the RNA of the virus is located at a distance of 40 A. from the axis of the virus particle. (Franklin, 1956b.)

Another important advance (Franklin, 1956b) has resulted from a second successful isomorphous replacement on the virus. Franklin studied a mercury-substituted TMV, prepared by Fraenkel-Conrat, and containing one mercury per 20,000 molecular weight of protein. This proved to be isomorphous with unsubstituted TMV and a study of its diffraction pattern has confirmed Caspar's allocation of signs for the equatorial reflections, besides showing that the number of asymmetric units per turn is probably 16.
It has also proved possible to allocate signs to the equatorial reflections of the aggregated "A" protein (free of RNA). The Fourier syntheses prepared using on the one hand the amplitudes of the equatorial reflections of the complete virus, and on the other those of the repolymerized "A" protein, show clearly that the RNA must be located at a radius of 40 A., since the only major difference between the two syntheses is that the large peak at 40 A. in the former is absent in the latter (see Figs. 21 and 22). Moreover the nature of the intensity differences in the first eight nonequatorial layer lines of the pattern all confirm that the RNA is located at a radius of about 40 A. Further work is in progress which may reveal the general disposition of the RNA chain (or chains).

Notice that the inner peak of Fig. 21, at a radius of 25 A., still appears when no RNA is present (Fig. 22) and must therefore be due to protein; but little so far is known about the disposition of the protein, in particular the arrangements of the polypeptide chains, though the diffraction data suggest that they may run perpendicular to the axis of the virus. The infrared dichroism of oriented TMV (Fraser, 1952) is compatible with this idea.

Franklin and Klug (1956) have reached some interesting conclusions about the external shape of the virus by making an entirely different approach to the data. When the virus particles are packed closely together (i.e. at a distance apart of 150 A.) the nature of the diffuse reflections in the pattern suggests that the virus surface is not smooth, but grooved or serrated; this produces helical disordering, as if one were packing together a set of screws. Moreover the intensity distribution on the third layer line suggested to them that there was some matter outside the average radius of 75 A. Their arguments, though not entirely compelling, are very suggestive and are moreover compatible with the conclusions reached by isomorphous replacement.
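The equatorial Fourier syntheses compared above (Figs. 21 and 22) are cylindrically averaged, so the radial density follows from the signed amplitudes F(R) by a Fourier-Bessel synthesis, ρ(r) = 2π ∫ F(R) J0(2πRr) R dR. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the Bessel function is evaluated from its integral representation to avoid any external library, and the "data" are synthetic amplitudes for a thin shell of matter placed at 40 A., not measured TMV intensities.

```python
import math


def bessel_j0(x, steps=200):
    """J0(x) from (1/pi) * integral_0^pi cos(x sin t) dt, by the midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(x * math.sin((k + 0.5) * h)) for k in range(steps))
    return total / steps


def radial_density(r, samples):
    """Riemann-sum estimate of rho(r) = 2*pi * integral F(R) J0(2*pi*R*r) R dR.

    samples is a list of (R, F, dR): R in reciprocal angstroms, F a signed amplitude.
    """
    return 2.0 * math.pi * sum(
        F * bessel_j0(2.0 * math.pi * R * r) * R * dR for R, F, dR in samples
    )


# Synthetic test object (an assumption, not measured data): a thin cylindrical
# shell of matter at 40 A., whose equatorial transform is proportional to
# J0(2*pi*R*r0). Amplitudes are sampled out to R = 0.1 reciprocal angstroms.
SHELL_RADIUS = 40.0
D_R = 0.001
samples = [
    ((k + 0.5) * D_R, bessel_j0(2.0 * math.pi * (k + 0.5) * D_R * SHELL_RADIUS), D_R)
    for k in range(100)
]

radii = [2.0 * k for k in range(51)]  # 0, 2, ..., 100 A.
density = [radial_density(r, samples) for r in radii]
peak_radius = radii[density.index(max(density))]
print(peak_radius)
```

With amplitudes limited to about 10 A. resolution the shell is recovered as a broadened peak flanked by ripples rather than as a sharp line. Subtracting two such syntheses, computed from two sets of signed amplitudes on a common scale, is the kind of comparison by which the peak at 40 A. is attributed to the RNA.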
In summary, then, the X-ray studies of TMV have shown: (a) the virus is made up of identical (or at least very similar) protein subunits, related by a noninteger screw axis; (b) the surface of the virus is probably not smooth, but is grooved or serrated; (c) there is a hole of radius 20 A. down the center of the virus. The RNA is located near a radius of about 40 A.; some of the protein is at a smaller radius than this.

It is indeed remarkable that two successful isomorphous replacements should have been achieved in a virus at this relatively early stage in the application of the method to large molecules. Part of this success must be attributed to the high technical quality of the work of Caspar and Franklin.

The only rod-shaped virus unrelated to TMV which has been studied by X-rays is potato virus X. This gives much poorer X-ray patterns; and although they suggest that the structure is helical, better pictures will be needed before this can be established with certainty (Watson, unpublished data).

d. Correlation between X-Ray and Other Results. We shall restrict ourselves to a comparison with electron microscopy and the chemical methods.

The electron microscope shows that the length of the virus is close to 3,000 A. This distance is too great to be resolved as a long-spacing X-ray reflection, but by using Bragg reflection of visible light Wilkins et al. (1950) have found that a spacing of this order does exist in intracellular virus crystals. The packing diameter of TMV, measured on electron micrographs, is 150 A., but it is now realized that the diameter of individual virus particles in electron micrographs is a little greater than this. This agrees with the X-ray results which also show that the maximum diameter is a little greater than the mean diameter, and that the virus particles pack closely together by intermeshing.
By studying partially degraded virus in the electron microscope it can be seen that the RNA is near the axis of the particle (Hart, 1955; Schramm et al., 1955). This does not agree with the X-ray results, according to which the RNA is at a radius of 40 A.: the discrepancy is probably due to the RNA collapsing toward the center when some of the supporting protein is removed preparatory to making electron micrographs. On other electron micrographs pieces of "A" protein, shaped like disks, can be seen to have a hole in the middle (Hart, 1955; Fraenkel-Conrat and Williams, 1955). More recently Huxley (1957) has "stained" intact TMV with salts such as KCl, and has been able to demonstrate the hole down the center of the virus. Although various claims have been made, no satisfactory demonstration of surface structure has yet been given by electron microscopy.

The chemical studies have shown that the protein of the virus is made up of small protein molecules whose molecular weight is about 18,000 (see the review by Anfinsen and Redfield, 1956). It has not yet been shown that all the subunits in a virus are identical, but they are certainly similar, for the amino acid sequences near the two ends of the single polypeptide chain show no signs of inhomogeneity. It has also been found that (with one possible exception) the various strains differ in their amino acid composition, sometimes quite strikingly; on the other hand, as might be expected, the X-ray results reveal only very minor differences between most strains, and these are in any case hard to interpret.

To summarize, the electron microscope has the advantage in studying large features. It is also a very valuable auxiliary tool since it needs such small amounts of material. The X-ray approach is unrivaled in studying structure at a somewhat high resolution, and can also pick up features inside the intact virus.
To detect differences at the amino acid level the chemical techniques are unequaled. The three methods, when properly used, give results which fit together into a coherent picture.

In particular we can combine data from all three to estimate the molecular weight. Accepting that there are 49 subunits per 69 A. of length, as indicated by the X-ray data, then in the total length of 3000 A. (measured in electron micrographs) there must be 2130 subunits (49 × 3000/69), assuming no shrinkage in length when the virus dries in the electron microscope. The chemical methods suggest that the molecular weight of the protein subunit is about 18,000. Allowing for 6% RNA these figures lead to a molecular weight of 40 × 10^6 for the whole virus. This figure should be compared with the value previously regarded as the best, namely 50 × 10^6, found by particle counts in electron micrographs (Williams et al., 1951); actually earlier determinations by other methods had given figures nearer to 40 × 10^6. The agreement is only fair, but subsequent work may improve it.

2. Spherical Viruses

X-ray studies of spherical viruses are less advanced than those of TMV which we have just described, mainly because up to now no isomorphous replacement has been achieved. Merely noting that single crystal X-ray photographs of a spherical virus (Rothamsted strain of tobacco necrosis virus) were taken as early as 1945, by Crowfoot and Schmidt, we shall at once proceed to describe recent work on tomato bushy stunt virus and turnip yellow mosaic virus. For electron microscope studies see Williams (1954) and Kaesberg (1956).

a. Tomato Bushy Stunt Virus. This virus contains 17% of RNA by weight, the remainder being protein. Early X-ray studies (see Carlisle and Dornberger, 1948) showed that the unit cell was cubic in shape (a = 386 A.), but did not establish definitely that its symmetry was cubic.
Caspar (1956a) has recently produced new evidence which suggests very strongly that the symmetry is indeed cubic, the space group being I23. There is only one molecule in the (primitive) unit cell; it follows that the virus particle itself must have cubic symmetry, its point group being 23, and therefore that it is made up of 12 identical subunits (see Table I). Following a careful study of the distribution of the strong reflections in reciprocal space Caspar has put forward very suggestive arguments that the point group may actually be of higher symmetry than the space group demands, namely 532 rather than merely 23: the 532 point group has 60 subunits, or a multiple thereof. Nothing has so far been discovered about the location of the RNA.

b. Turnip Yellow Virus. This material is of considerable interest because it was discovered (Markham, 1951) that the infective virus is accompanied in the plant by particles which, though otherwise similar to it, are noninfective and RNA-free. The infective virus contains about 40% RNA, the remainder being protein. The associated noninfective particle is 40% lighter, lacking as it does all RNA, yet its diameter is approximately the same (about 280 A.); its protein is similar immunologically to the protein component of the complete virus. The two particles form similar crystals, and will indeed form mixed crystals (Bernal and Carlisle, 1948). Furthermore, the low angle X-ray scattering of the RNA-free particle in solution, unlike that of the infective virus, is what one would expect from a spherical shell rather than from a solid sphere (Schmidt et al., 1954). Taking all this evidence together the conclusion is clear that by and large the protein is outside and the RNA inside the virus particle.
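The distinction drawn above between the low angle scattering of a spherical shell and that of a solid sphere can be made concrete with the standard form factors for uniform particles. The sketch below is illustrative only: the outer radius of 140 A. corresponds to the diameter of about 280 A. quoted in the text, but the inner radius of 105 A. is a purely hypothetical value chosen for the example, and the function names are not taken from any of the cited work. Here q denotes 4π sin θ/λ.

```python
import math


def sphere_amplitude(q, radius):
    """Normalized scattering amplitude of a uniform solid sphere."""
    x = q * radius
    if x < 1e-4:
        return 1.0 - x * x / 10.0  # leading terms of the series near x = 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x**3


def shell_amplitude(q, outer, inner):
    """Uniform hollow shell: large sphere minus small sphere, weighted by volume."""
    vo, vi = outer**3, inner**3
    return (vo * sphere_amplitude(q, outer) - vi * sphere_amplitude(q, inner)) / (vo - vi)


def first_zero(amplitude, q_max=0.1, steps=10000):
    """Scan upward in q for the first sign change, then refine by bisection."""
    dq = q_max / steps
    lo, f_lo = dq, amplitude(dq)
    for k in range(2, steps + 1):
        hi = k * dq
        f_hi = amplitude(hi)
        if f_lo * f_hi <= 0.0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if f_lo * amplitude(mid) <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, amplitude(mid)
            return 0.5 * (lo + hi)
        lo, f_lo = hi, f_hi
    return None


OUTER = 140.0  # angstroms, from the quoted diameter of about 280 A.
INNER = 105.0  # angstroms, hypothetical inner radius of the protein shell

q_sphere = first_zero(lambda q: sphere_amplitude(q, OUTER))
q_shell = first_zero(lambda q: shell_amplitude(q, OUTER, INNER))
print(q_sphere * OUTER, q_shell * OUTER)
```

For a solid sphere the first zero of the amplitude falls where tan(qa) = qa, near qa = 4.49; hollowing out the particle moves the first zero to a smaller angle, toward qa = π in the limit of a very thin shell. It is a shift of this kind in the positions of the scattering minima that allows the two models to be told apart.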
Some early electron micrographs of a few layers of virus (Cosslett and Markham, 1948) suggested that the lattice was of the diamond type, and the first X-ray studies were thought to confirm this. The problem has recently been taken up again by Klug, Finch, and Franklin, who kindly allowed us to see their manuscript prior to publication. Their new data show clearly that the lattice has cubic symmetry, but are difficult to reconcile with the idea of a diamond lattice: the alternative would be a body-centered cubic lattice. A diamond lattice is a very open one, containing exactly half as much virus as would a body-centered cubic lattice in which the distance between neighboring virus particles was the same. It is thus possible to decide between the two alternatives simply by finding how much virus there is per unit volume of crystal. This has been done (in collaboration with Dr. Peter Walker) using a combination of ultraviolet absorption and interference microscopy; the result shows clearly that the more dense lattice is correct, at least as far as large crystals are concerned. However a straightforward body-centered lattice cannot adequately explain the X-ray pattern except at low resolution. It is suggested that the centers of the virus particles fall on a simple body-centered cubic lattice, but that the particles may have either of two orientations which alternate in a regular manner; this means that the true unit cell is larger than the simple one.

The same authors find that most of the strong intensities are located in positions in reciprocal space which correspond to the virus particle having the point group symmetry 532. Nevertheless one or two reflections are present which are quite incompatible with the virus having such a high symmetry, and Klug and his colleagues are driven to conclude that although the virus as a whole can only have 23 symmetry, some part of it may have the higher 532 symmetry.
This ingenious interpretation is too intricate to be completely accepted without further work, but there seems to be little doubt that turnip yellow virus has cubic symmetry of some sort. Results from the RNA-free protein component are eagerly awaited.

3. General Principles of Virus Structure

The fact that certain small viruses form crystals, and that these crystals in some cases give X-ray diffraction patterns extending to relatively small spacings (say 5 A.), shows quite clearly that such viruses can be loosely considered as "molecules" in the sense used by protein crystallographers; namely as entities in which the majority of the atoms are arranged in fixed (relative) positions. As we have indicated earlier, this does not carry the implication that the positions of all the atoms are fixed, nor that they are exactly the same in each virus; but it does imply that there is a very considerable similarity between the atomic arrangements of any two sister virus particles. In the same way the existence of internal symmetry elements in the virus particle shows that one of its subunits must resemble any other subunit, though once again it is structural similarity rather than chemical identity which is proved by the X-rays.

The fact that small viruses are either rods or spheres (and not, for example, ellipsoids or plates) has suggested the hypothesis (Crick and Watson, 1956a) that they are all made of subunits, related by symmetry elements. This is a very natural idea to a crystallographer and had been proposed earlier in special cases (Hodgkin, 1949; Low, 1953), but it had not been sufficiently appreciated by virus workers themselves. Reasons can be given why small viruses are made of subunits, but the arguments are speculative (see Crick and Watson, 1956b); however given that subunits do exist it is natural that we should find them to be related by symmetry elements.
Such an arrangement means that every subunit has the same contact points with its neighbors, points at which it must be assumed that the same chemical groups are available in each of the identical subunits.

Apart from its approximate location in TMV and in turnip yellow virus, very little is known about the way the RNA is arranged in viruses or how it combines with the protein, except that the combination is unlikely to involve primary chemical bonds. It is nevertheless a very reasonable surmise that the RNA in the virus has the symmetry elements, or at least some of the symmetry elements, of the protein. In TMV this idea leads to the prediction (Crick and Watson, 1956a) that it is the backbone of the RNA which will follow this symmetry, not the sequence of the bases. This has been confirmed by the recent experiments of Hart and Smith (1956) who have shown that viruslike rods (of indefinite length) can be made by co-aggregating "A" protein from TMV with synthetic polyribotides having an RNA-like backbone, no matter what bases are attached to it. (That these polyribotides occupy the same sites in the "virus" particle as the native RNA does in the true virus is so far only an inference. It should be possible to prove it by X-ray methods.) It seems likely that the same prediction, that the backbone of the RNA possesses the same symmetry elements as the virus protein, will also be proved correct for the spherical viruses, but so far there is no evidence to support this.

Since symmetry elements can be discovered relatively easily by X-ray methods, and since they have been detected in all three plant viruses so far studied, it now becomes a worthwhile subject of inquiry whether any small virus under investigation has symmetry elements, and if so, what they are. However there is no law which says that a virus must have symmetry, and the hypothesis can only be evaluated by examining more and more types of virus; it remains to be seen whether the surmise of
Crick and Watson that most small "spherical" viruses have cubic symmetry will be confirmed or not. Meanwhile the search for symmetry and the hope of isomorphous replacements (which would hardly be practicable if the virus did not contain identical subunits) are likely to stimulate an increasing amount of work in this field. Moreover X-ray investigation shows that at least certain viruses are simpler than their molecular weight might lead one to expect, and this should encourage further chemical studies, especially on the proteins of the spherical viruses.

It is not improbable (Crick and Watson, 1956b) that microsomal particles (the small compact particles in the cytoplasm which are perhaps the sites of protein synthesis) may also have cubic symmetry. They contain about the same amount of RNA as do the small spherical viruses, and have a similar (or perhaps slightly smaller) diameter, and they appear to be approximately spherical. It would not be surprising if the arrangement of the RNA were very similar in small viruses and in microsomal particles.

REFERENCES

Ambrose, E. J., and Elliott, A. (1951). Proc. Roy. Soc. London A205, 47.
Ambrose, E. J., Bamford, C. H., Elliott, A., and Hanby, W. E. (1951). Nature 167, 264.
Anfinsen, C. B., and Redfield, R. R. (1956). Advances in Protein Chem. 11, 2.
Arndt, U. W., and Riley, D. P. (1955). Phil. Trans. Roy. Soc. London A247, 409.
Astbury, W. T., and Street, A. (1931). Phil. Trans. Roy. Soc. London A230, 75.
Astbury, W. T., and Woods, H. J. (1930). Nature 126, 913.
Astbury, W. T., and Woods, H. J. (1933). Phil. Trans. Roy. Soc. London A232, 333.
Bamford, C. H., Brown, L., Elliott, A., Hanby, W. E., and Trotter, I. F. (1952). Nature 169, 357.
Bamford, C. H., Brown, L., Elliott, A., Hanby, W. E., and Trotter, I. F. (1953). Proc. Roy. Soc. London B141, 49.
Bamford, C. H., Brown, L., Elliott, A., Hanby, W. E., and Trotter, I. F. (1954). Nature 173, 27.
Bamford, C. H., Brown, L., Cant, E. M., Elliott, A., Hanby, W. E., and Malcolm, E. R. (1955). Nature 176, 396.
Bamford, C. H., Elliott, A., and Hanby, W. E. (1956). "Synthetic Polypeptides." Academic Press, New York.
Bear, R. S. (1952). Advances in Protein Chem. 7, 69.
Bear, R. S. (1955). In "Fibrous Proteins and their Biological Significance." Symp. Soc. Exptl. Biol. IX, Cambridge Univ. Press.
Bear, R. S. (1956). J. Biophys. Biochem. Cytol. 2, 363.
Bernal, J. D., and Carlisle, C. H. (1948). Nature 162, 139.
Bernal, J. D., and Fankuchen, I. (1941). J. Gen. Physiol. 25, 111, 147.
Bluhm, M. M., and ...
... I., Tabroff, W., and McGarr, J. J. (1955). J. Am. Chem. Soc. 77, 3356.
Kupke, D. W., and Linderstrøm-Lang, K. (1954). Biochim. et Biophys. Acta 13, 163.
Lang, A. R. (1956a). Acta Cryst. 9, 436.
Lang, A. R. (1956b). Acta Cryst. 9, 446.
Linderstrøm-Lang, K., and Schellman, J. A. (1954). Biochim. et Biophys. Acta 16, 156.
Lindley, H. (1955). Biochim. et Biophys. Acta 18, 194.
Lindley, H., and Rollett, J. S. (1955). Biochim. et Biophys. Acta 18, 183.
Low, B. W. (1952). J. Am. Chem. Soc. 74, 4830.
Low, B. W. (1953). In "The Proteins" (H. Neurath and K. Bailey, eds.), Vol. I, Part A. Academic Press, New York.
Low, B. W. (1955). Proc. Intern. Congr. Biochem. 3rd Congr. Brussels, p. 114.
Low, B. W., and Baybutt, R. B. (1952). J. Am. Chem. Soc. 74, 5806.
Low, B. W., and Grenville-Wells, H. J. (1953). Proc. Natl. Acad. Sci. U. S. 39, 735.
Magdoff, B. S., and Crick, F. H. C. (1955a). Acta Cryst. 8, 461.
Magdoff, B. S., and Crick, F. H. C. (1955b). Acta Cryst. 8, 468.
Magdoff, B. S., Crick, F. H. C., and Luzzati, V. (1956). Acta Cryst. 9, 156.
Markham, R. (1951). Discussions Faraday Soc. No. 11, 221.
Marsh, R. E., Corey, R. B., and Pauling, L. (1955a). Biochim. et Biophys. Acta 16, 1.
Marsh, R. E., Corey, R. B., and Pauling, L. (1955b). Acta Cryst. 8, 62.
Marsh, R. E., Corey, R. B., and Pauling, L. (1955c). Acta Cryst. 8, 710.
McMeekin, T. L., Rose, M. L., and Hipp, N. J. (1954). J. Polymer Sci. 12, 309.
Meggy, A. B., and Sikorski, J. (1956). Nature 177, 326.
Meyer, K. H., and Go, Y. (1934). Helv. Chim. Acta 17, 1488.
Moffitt, W. (1956a). Proc. Natl. Acad. Sci. U. S. 42, 736.
Moffitt, W. (1956b). J. Chem. Phys. 26, 467.
Moffitt, W., and Yang, J. T. (1956). Proc. Natl. Acad. Sci. U. S. 42, 596.
Mustacchi, P. O. (1951). Science 113, 405.
Palmer, K. T., Ballantyne, M., and Galvin, J. A. (1948). J. Am. Chem. Soc. 70, 906.
Pauling, L., and Corey, R. B. (1951a). Proc. Natl. Acad. Sci. U. S. 37, 241.
Pauling, L., and Corey, R. B. (1951b). Proc. Natl. Acad. Sci. U. S. 37, 282.
Pauling, L., and Corey, R. B. (1951c). Proc. Natl. Acad. Sci. U. S. 37, 729.
Pauling, L., and Corey, R. B. (1953a). Nature 171, 59.
Pauling, L., and Corey, R. B. (1953b). Proc. Natl. Acad. Sci. U. S. 39, 253.
Pauling, L., Corey, R. B., and Branson, H. R. (1951). Proc. Natl. Acad. Sci. U. S. 37, 205.
Perutz, M. F. (1946). Trans. Faraday Soc. B42, 187.
Perutz, M. F. (1949). Proc. Roy. Soc. London A196, 474.
Perutz, M. F. (1951). Nature 167, 1053.
Perutz, M. F. (1954). Proc. Roy. Soc. London A226, 264.
Perutz, M. F. (1956). Acta Cryst. 9, 867.
Ramachandran, G. N. (1956). Nature 177, 710.
Ramachandran, G. N., and Kartha, G. (1954). Nature 174, 269.
Ramachandran, G. N., and Kartha, G. (1955). Nature 176, 593.
Rich, A., and Crick, F. H. C. (1955). Nature 176, 915.
Rich, A., Dunitz, J. D., and Newmark, P. (1955). Nature 176, 1074.
Robertson, J. M. (1953). "Organic Crystals and Molecules." Cornell Univ. Press, Ithaca, N. Y.
Schellman, J. A. (1955). Compt. rend. trav. lab. Carlsberg Sér. chim. 49, 230.
Schmitt, F. O., Hall, C. E., and Jakus, M. H. (1942). J. Cellular Comp. Physiol. 20, 11.
Schmidt, P., Kaesberg, P., and Beeman, W. W. (1954). Biochim. et Biophys. Acta 14, 1.
Schmitt, F. O., Gross, J., and Highberger, J. H. (1955). Symposia Soc. Exptl. Biol. No. 9, 148.
Schramm, G. (1947). Z. Naturforsch. 2b, 779.
Schramm, G., Schumacher, G., and Zillig, W. (1955). Nature 176, 549.
Schroeder, W. A., Kay, L. M., Le Gette, J., Honnen, L., and Green, F. C. (1954). J. Am. Chem. Soc. 76, 3556.
Seeds, W. E., and Wilkins, M. H. F. (1950). Discussions Faraday Soc. 9, 417.
Steere, R. L. (1957). J. Biophys. Biochem. Cytol. 3, 45.
Trautman, R. (1956). Abstr. Meeting, Am. Chem. Soc., Atlantic City.
Tristram, G. R. (1953). In "The Proteins" (H. Neurath and K. Bailey, eds.), Vol. I, Part A. Academic Press, New York.
Warwicker, J. O. (1954). Acta Cryst. 7, 565.
Watson, J. D. (1954). Biochim. et Biophys. Acta 13, 10.
Wilkins, M. H. F., Stokes, A. R., Seeds, W. E., and Oster, G. (1950). Nature 166, 127.
Williams, R. C. (1954). Advances in Virus Research 2, 183.
Williams, R. C., Backus, R. C., and Steere, R. L. (1951). J. Am. Chem. Soc. 73, 2962.
Yang, J. T., and Doty, P. (1957). J. Am. Chem. Soc. 79, 761.