Reprinted from Connecticut Medicine, October 1974, Vol. 38, No. 10

The Search for the Chemical Structure of DNA

Jack S. Cohen, Ph.D., and Franklin H. Portugal, Ph.D.

ABSTRACT: The history of the chemistry of DNA has to a large extent been neglected in favor of the physical and genetic aspects. Yet the chemical studies on the nucleic acids spanned a period of more than 80 years, and were crucial to our current understanding of the structure and function of DNA. We have examined the published scientific record and interviewed several participants who made major contributions to the elucidation of the chemical structure of DNA. A partial analysis covering the period 1900-1955 is presented.

This work is principally directed towards those who believe that deoxyribonucleic acid sprang upon the scene in the early 1950's and had no prior history. In fact, DNA was discovered in 1869 by Friedrich Miescher, and a great deal of painstaking work was done in the intervening years. Unfortunately, in the space available I will only be able to present a small portion of the material covering this topic. I will concentrate on the elucidation of the structure of the nucleotides, the monomer units from which nucleic acids are built, on the tetranucleotide hypothesis for the structure of DNA, and on the evidence against it.

Mononucleotides and the Tetranucleotide Hypothesis

From the turn of the century until the 1940's, DNA was generally considered to be a small molecule with a molecular weight of approximately 1,500, consisting of only four nucleotide units and having marginal biological importance. Thus, Walter Jones, in the preface of the first book on nucleic acids, said in 1914, "The nucleic acids constitute what is possibly the best understood field of Physiological Chemistry."¹ And P. A. Levene in the preface to "Nucleic Acids" in 1931 stated, "The chemistry of nucleic acids can be summed up very briefly.
Indeed, a few graphic formulas which need not fill even a single printed page might suffice to express the entire store of our present-day knowledge on the subject."¹,² Yet subsequent work showed DNA to be one of the largest molecules, containing many thousands of nucleotide components, and to be of the utmost genetic significance. These profound changes culminated in 1953 in the proposal by Watson and Crick of a double-helical model for the structure of DNA, which has been described by the eminent geneticist C. H. Waddington as "certainly the greatest discovery in biology in this century."³ It is our intention to describe the progress of this amazing reversal and to attempt to analyze its underlying causes.

Dr. Jack S. Cohen is a Senior Investigator in the Reproduction Research Branch of the National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20014. Dr. Franklin H. Portugal is a Senior Staff Fellow in the Viral Carcinogenesis Branch, National Cancer Institute, National Institutes of Health, Bethesda, MD 20014.

The credit for laying the foundation for the determination of the structure of the nucleic acids, by clarifying the structure of their hydrolysis products, must go chiefly to Phoebus Aaron Theodor Levene.⁴,⁵ He was born in 1869, the same year that DNA was discovered by Friedrich Miescher, and was one of the few Jewish students allowed to enter the Imperial Military Medical Academy in St. Petersburg. As a result of the growing persecution in Russia, his family emigrated to the United States in 1891, and he practiced medicine on the Lower East Side of New York City for four years. However, his interest lay in fundamental medical research, and he enrolled as a special student in the Chemistry Department of the School of Mines of Columbia University.
Although he never obtained a chemistry degree, the citation accompanying the Willard Gibbs Medal of the American Chemical Society, which he was awarded in 1931, described him as the "outstanding American worker in the application of organic chemistry to biological problems." He received his first appointment, to the New York Pathological Institute, in 1894, and in 1905 he joined the Rockefeller Institute. In 1909 he and his co-workers made their first important discoveries: the nature of the carbohydrate group in yeast nucleic acid, and the order of linkage of the three components (base, sugar and phosphate) in a nucleotide.

The ordering of the three chemical components of inosinic acid derived from meat extract had been speculated on by Haiser⁶ in 1895, when he had shown the presence of phosphorus. By comparing the products of mild alkaline and acid hydrolysis of inosinic acid, Levene and Jacobs were able to establish in 1909 the order phosphate-pentose-purine. Alkali gave phosphoric acid and inosine, while acid gave ribose phosphate and the base, hypoxanthine.⁷ Since each treatment left the pentose attached to one of the other two components, the pentose had to lie between the phosphate and the purine. The term nucleoside was introduced in the same year (1909) by Levene and Jacobs to describe the purine-carbohydrate compounds, such as inosine, derived from nucleic acid hydrolysis, and the term nucleotide to describe the phosphate ester of a nucleoside, such as inosinic acid.

The human mind likes order. A scientist analyzing data will always search for a relationship to clarify the problem before him. Quantitative relationships between the two purine and two pyrimidine bases which had been found to be present in nucleic acids were reported as far back as 1893 by Kossel and Neumann.⁸ Steudel in 1906⁹ and Levene and Mandel in 1908¹⁰ concluded that the four bases occurred in equimolecular proportions in thymus nucleic acid, and Levene came to the same conclusion in 1909¹¹ for yeast nucleic acid.
This was confirmed by later workers, such as Jones in 1914.¹ At that time such evaluations could be expected to provide no more than crude results. Yet from 1909 until the 1940's it was almost a dogma that the four bases were present in equal proportions in the nucleic acids. This led to the formulation of what has become known as the tetranucleotide hypothesis for the structure of nucleic acids. This term seems to have originated with Kossel and Neumann, who believed that each purine and pyrimidine was present in a separate chemical entity in the nucleus. Thus, they stated in 1893, "It is highly probable that four nucleic acids exist of which each contains only one of the nucleic acid bases."⁸ Levene appears to have adapted this idea to describe a single nucleic acid containing equal quantities of the four bases, although in his published works he never fully committed himself to it; it remained a hypothesis.

It was necessary to ascertain how the mononucleotide units were chemically linked together in the proposed tetranucleotide. In a paper published in 1912, Levene and Jacobs¹² reported products which they identified as thymidine and cytidine diphosphoric acids. Levene identified 2-deoxyribose in 1929 as the carbohydrate in thymus nucleic acid (DNA),¹³ 20 years after identifying that in yeast nucleic acid (RNA) as ribose. Then in 1935 he and Tipson proved that thymidine has a furanoside (5-membered) ring structure. In the furanoside form of 2-deoxyribose, carbon 1 carries the base, carbon 2 bears no hydroxyl group and carbon 4 is part of the ring, so only carbons 3 and 5 remain available for phosphate. They could then conclude, "Thus it is evident that in deoxyribose nucleic acid the positions of the phosphoric acid radicals are carbon atoms (3) and (5) of the deoxyribose."¹⁴ This was the first time this significant fact was specifically noted.

While Lord Todd and his co-workers are rightly credited with establishing the position of the internucleotide bond in the 1950's, it is not generally realized that Levene had proposed the correct answer in the early 1930's. Since Levene was friendly with O.
T. Avery, a colleague at the Rockefeller Institute and the discoverer, with Maclyn McCarty and Colin MacLeod, of the role of DNA in biological transformation,¹⁵ the question arises: could Levene have influenced this work? McCarty answered this question thus: "I do not believe that there was any direct relationship between Levene's work on nucleic acids and Avery's interest in the phenomenon of pneumococcal transformation. After Griffith's description of the phenomenon in 1928, his findings were confirmed quite early in Avery's laboratory by Martin Dawson. From the beginning Avery was convinced of the potential biological importance of the phenomenon, and his goal for many years was discovering the chemical nature of the substance responsible for transformation. Work in this direction, though intermittent, began in 1935 after cell-free extracts became available. I believe that there were no preconceived ideas concerning the involvement of nucleic acids. However, by about 1940 it was known that the crude cell-free extracts with transforming activity contained both RNA and DNA as well as other macromolecular constituents. I was told by the late Colin MacLeod that when he and Avery consulted Dr. Levene about the possibility that nucleic acids might be involved in the biological activity, he discouraged them by citing the essential invariability of nucleic acids on the basis of the tetranucleotide theory of their structure. This notion that 'nucleic acids are all alike' was repeated to us subsequently by others."¹⁶

The widely held belief that proteins were the fundamental substances of the life process contributed to this situation. However, the tetranucleotide hypothesis provided Levene with a vehicle for refining the structural knowledge of nucleotides. Other workers in the field provided no more than variations on the same theme.
One must conclude that what proved a barrier to further progress in elucidating the structure of DNA was not simply the tetranucleotide hypothesis itself, but rather a lack of insight by the workers in this difficult and unfashionable field, coupled with the lack of suitable techniques at that time to study carefully the structure of intact DNA rather than its degradation products.

Ultimately, to prove the structure of a natural product conclusively, it must be chemically synthesized. Emil Fischer first attempted the chemical synthesis of a nucleotide as early as 1914.¹⁷ He used phosphoryl chloride as the reagent to attach a phosphate group to a nucleoside, a chemical process termed phosphorylation. However, the yields of the desired product using this reagent were very low, as a result of many side reactions. The introduction of a mild and efficient phosphorylating agent by Alexander Todd and co-workers led to a significant breakthrough in the chemical synthesis of nucleotides, work for which Todd was later awarded the Nobel Prize.

Alexander Robertus Todd obtained his D.Phil. degree in the laboratory of Robert Robinson in Oxford in 1933. He then went to Edinburgh to work with George Barger on vitamin B1, which led him to work on the related coenzymes, many of which were found at that time to be pyrophosphates. In 1936 Todd went to the Lister Institute in London, where he replaced J. M. Gulland, at that time the most prominent British nucleic acid chemist, who had been appointed Head of the Chemistry Department at Nottingham. In 1938, at the age of 30, Todd became Head of the Chemistry Department of Manchester University, where he started several lines of research: on cannabis and related drugs, on purines and nucleosides, and on methods of phosphorylation.

The first full characterization of the reagent dibenzylphosphorochloridate and description of its use as an efficient phosphorylating agent appeared in two short papers submitted in February 1945 and published together in the Journal of the Chemical Society.¹⁸,¹⁹ The first of the two was by B. C. Saunders and co-workers, and the second by Todd and co-workers. During the war years, Bernard Saunders in Cambridge had actually been doing secret research on nerve gases.²⁰ These contain a phosphorus-fluorine bond in place of the phosphorus-chlorine bond found in the relatively innocuous phosphorylating agent. Saunders and his co-workers considered it safer to work with the unstable P-Cl compounds than with the highly toxic P-F analogs.

Todd had been appointed Professor of Chemistry and Chairman of the Department at Cambridge in 1944. World War II was still in progress, and in early February 1945 the Western Allies were advancing into Germany. With the end of the war apparently in sight, February was evidently judged to be a safe time to begin publishing these findings. Thus, the research on nerve gases had an important, positive, scientific by-product in the understanding of the nucleic acids. This was not an unparalleled event in the history of warfare.

Todd's work was concerned predominantly with the synthesis of nucleotides and coenzymes, and in his early years in this field he rarely speculated on the structure of the nucleic acids themselves. However, one of the most important later contributions to come from his work was the clarification of the hydrolytic differences between RNA and DNA, which had mystified chemists for so long. In 1949 C. E. Carter and W. Cohn described the separation of "yeast" adenylic acid into two forms by ion-exchange chromatography.²¹ It was naturally thought that these were the 2'- and 3'-phosphates of adenosine.
Levene and Tipson in 1935 appear to have been the first to relate specifically the difference in hydrolytic properties of the two kinds of nucleic acid to their different chemical structures.²² They suggested that the alkaline lability of RNA, compared with the stability of DNA, resulted from the presence of the 2'-hydroxyl group in the ribose of the former, which could assist in the hydrolysis of RNA. Todd and Daniel M. Brown brought all the evidence together in an influential paper in 1952,²³ in which they proposed that alkaline hydrolysis of RNA proceeds through a 2',3'-cyclic phosphate intermediate that opens to give a mixture of 2'- and 3'-nucleotides. The first successful chemical synthesis of a dinucleotide of the kind found in nucleic acid was finally accomplished in 1955 by Michelson and Todd,²⁴ confirming the chemical structure that had been proposed for DNA.

Todd and Levene, the two who contributed most to the understanding of nucleic acids in half a century of research, met only once, in the elevator at the Rockefeller Institute, a revealing fact about the degree of communication in science at that time. Todd said, "I went to see Herbert Gasser, who was then director of the Rockefeller Institute, in 1938. We were going out to lunch (my wife, myself and Gasser), and Gasser got into the elevator and this little old man was in the elevator and just said 'Hello,' and Gasser said 'This is Levene,' and he got out, and that was the only time I ever saw him, and I never said anything more to him than 'good morning'."²⁵

The Molecular Size of DNA

A major obstacle to the chemical characterization of DNA was the question of its true molecular size. In the 1920's and 1930's it was generally believed that substances which manifested high molecular weights were "colloids"; that is, they were thought to consist of aggregates of small molecules held together by partial or ionic bonds. The opposing point of view, argued most ably by H.
Staudinger, was that proteins were truly very large molecules, termed "macromolecules" or "polymers," in which the individual components were joined together by actual chemical linkages, or covalent bonds. There were naturally suggestions that, in their native state, the nucleic acids were colloidal aggregates of tetranucleotides.

In 1924 Einar Hammarsten at the Karolinska Institute in Stockholm, Sweden, set out to study the colloidal properties of thymus nucleic acid. In doing so he essentially rediscovered the careful biochemical preparation of DNA.²⁶ Some 50 years earlier Miescher had worked very carefully in primitive cold-rooms making preparations of nucleic acids. But with the advent of interest in the nucleic acids among classical organic chemists such as Kossel and Levene, much of the art in the biochemical techniques was ignored. Since these chemists were studying the degradation products, it was considered permissible to use harsh conditions during the preparation of DNA. It is not surprising that values obtained for the molecular weight of DNA in the years 1900-1938 were usually low and variable. It was an unfortunate coincidence that they averaged around 1,500, just the value expected for a single tetranucleotide.

In a study bearing on the size of the DNA molecule, reported in 1934, Torbjörn Caspersson gave the results of some filtration experiments.²⁷ Miescher had found in 1871 that nucleic acid was retained by a filter, and others had noted high viscosities for nucleic acid preparations; both were early indications of a high molecular weight. Caspersson was a student of Hammarsten and used DNA prepared by the method which the latter had described 10 years previously. Caspersson concluded "the astonishing fact that the complexes of nucleic acids must be larger than the protein molecules."
Three further studies reported in 1938 applied physical methods to the determination of the actual size of DNA by measuring its molecular weight. Caspersson and Hammarsten supplied DNA to R. Signer in Berne, Switzerland, who measured the molecular weight using flow birefringence.²⁸ They reported a value of 500,000 to 1 million for the molecular weight of DNA. A second paper in 1938 indicating a similar molecular weight for DNA was by W. T. Astbury and F. O. Bell, using X-ray fiber diffraction,²⁹ and a third paper, also published in 1938, showing that DNA was a large molecule, was by Levene and Gerhard Schmidt.³⁰ They used ultracentrifugation, a technique which had been developed by T. Svedberg in the 1920's and had been applied by him and others to show that proteins were true macromolecules with molecular weights of many thousands. In the late 1930's an ultracentrifuge was still a rare piece of equipment, even in the U.S.A.; E. G. Pickels built an improved model in 1937 at the Rockefeller Institute. Thus in 1938 Levene and Schmidt were able to measure the molecular weight of native DNA at between 200,000 and 1 million by ultracentrifugation, and showed that the results depended on the method of preparation used.

In their study, Levene and Schmidt also observed a non-sedimenting nucleic acid material of low molecular weight, of which they concluded, "It is not improbable that it represents a single tetranucleotide."³⁰ This implied that native DNA was a polymer of tetranucleotides, an idea which was propounded by several authors. Gulland discussed this idea before the Chemical Society of London in 1943, and made a revealing comment: " . . .
the conception of a molecule composed of polymerized tetranucleotides has grown from a mental superposition of the later demonstrations of high molecular weights on the older ideas of a simple molecule containing one each of the four appropriate nucleotides; had the true molecular sizes been realized earlier it is doubtful whether the conception would have gained such firm hold as is apparently the case." He then suggested that the ratios of the four bases might be "statistical," but then, surprisingly, recommended the concept of a polytetranucleotide as a "practical working hypothesis."³¹

Physicochemical Characterization of DNA

In an attempt to clarify several questions, Gulland and co-workers studied the titration characteristics of DNA. In an important contribution in 1947³² they showed that the ratio of primary to secondary phosphoryl groups in carefully prepared DNA samples had a minimum value of approximately 16:1. This is inconsistent with a simple tetranucleotide, although still much too low a ratio to account for the highly polymeric nature of DNA as then known. More important in its implications for DNA structure, however, were their findings on the amino and enolic hydroxyl groups of the purines and pyrimidines, which also exhibit characteristic ionization constants. In following the titration of DNA in both acid and alkaline directions, Gulland et al. noted the significant fact that these transitions were not completely reversible. Changes in other properties, such as viscosity, had previously been noted on titration of DNA, and had been explained in terms of a chemical depolymerization. Gulland et al. favored an explanation for this hysteresis in terms of the exposure of amino and hydroxyl groups of the bases which, they concluded, had been hidden in the original structure. They attributed this to hydrogen bonds: weak bonds between oxygen and nitrogen atoms mediated by hydrogen atoms.
They noted that their evidence did not enable them to distinguish whether the hydrogen bonds united different parts of the same chain or different chains of DNA.³³ The presence of hydrogen bonds had been suggested as being important for maintaining protein structures on the basis of similar phenomena in acid-base titrations. Unfortunately, Gulland did not live to follow up the questions posed by these results, nor to achieve the recognition this significant contribution warranted, for he was killed in a train derailment in 1947, the very year of its publication.

The quantitative values for the ratio of amino and hydroxyl groups present in DNA, which were determined by Gulland et al. from titrations, were also not in accord with the exactly equivalent stoichiometry of the bases required by the tetranucleotide hypothesis.

It had been accepted for several decades that the four bases were present in nucleic acids in equimolecular proportions. From 1948 to 1952 Erwin Chargaff, an Austrian émigré working at Columbia University, published a series of papers in which he and his co-workers proved this belief to be unfounded. They described in detail a sensitive and accurate technique for the determination of the purine and pyrimidine components in nucleic acid hydrolysates. Chargaff and Vischer,³⁴ and Rollin Hotchkiss³⁵ at the Rockefeller Institute, used paper chromatography to separate small amounts of a mixture of purines and pyrimidines into its components, and UV spectroscopy to determine their proportions. They were able to show that the proportions of the individual purines and pyrimidines in DNA from different species varied greatly.³⁶ Although departures from exact stoichiometry of the four bases had been reported in the literature ever since the hypothesis was first proposed, these had been conveniently ignored in favor of the imaginative simplicity of the tetranucleotide hypothesis.

During the course of his work Chargaff noted some other quantitative relationships between the base ratios, which, with the accumulation of reliable data and the work of others,³⁷ gradually became compelling. Thus, in 1948 he said, "A comparison of the molar proportions reveals certain striking, but perhaps meaningless, regularities."³⁸ And in 1951: "as the number of examples of such regularity increases, the question will become pertinent whether it is merely accidental or whether it is an expression of certain structural principles that are shared by many deoxypentose nucleic acids despite far-reaching differences in their individual composition and the absence of a recognizable periodicity in their nucleotide sequence."³⁹ The regularities referred to were the findings that the amount of adenine equaled the amount of thymine, and the amount of guanine that of cytosine. Later Chargaff stated, "It will surprise many readers . . . to learn that the first announcement of base-pairing was made in 1950."⁴⁰ There is, of course, a major distinction between unitary base ratios of unknown origin and specific base-pairing as advanced later by Watson and Crick.⁴¹ Further, according to James Watson's personal account, after he had 'discovered' specific hydrogen-bonded purine-pyrimidine base-pairing, "Chargaff's rules then suddenly stood out as a consequence of a double-helical structure for DNA."⁴² Nevertheless, the quantitative studies of Chargaff and his co-workers represented the last nails being driven into the coffin of the tetranucleotide hypothesis for the structure of the nucleic acids.

As a result of these studies over a period of more than 50 years, DNA was known to be a high-molecular-weight polymer with phosphate groups linking deoxyribonucleosides between the 3' and 5' positions of the sugar groups. The sequence of bases was unknown, although some quantitative regularities in base composition had been noted.
While the detailed chemical structure of DNA had been determined, its molecular geometry remained a mystery. In the elucidation of this mystery, molecular biology was, of course, to become the operative phrase.

Acids. II. Electrometric Titration of the Acidic and Basic Groups of the Deoxypentose Nucleic Acid of Calf Thymus." J. Chem. Soc. 1131-1141, 1947.

33. Creeth JM, Gulland JM, Jordan DO: "Deoxypentose Nucleic Acids. III. Viscosity and Streaming Birefringence of the Sodium Salt of the Deoxypentose Nucleic Acid of Calf Thymus." J. Chem. Soc. 1141-1145, 1947. In D. O. Jordan, The Chemistry of Nucleic Acids, Butterworths, London, 1960, p 169, it is stated, "That the bonds were intermolecular and not intramolecular was determined by the viscosity data of Creeth, Gulland and Jordan," but the original reference states the opposite conclusion.

34. Vischer E, Chargaff E: "The Separation and Characterization of Purines in Minute Amounts of Nucleic Acid Hydrolysates." J. Biol. Chem. 168, 781-782, 1947; ibid., "The Separation and Quantitative Estimation of Purines and Pyrimidines in Minute Amounts." J. Biol. Chem. 176, 703-714, 1948; ibid., "The Composition of the Pentose Nucleic Acids of Yeast and Pancreas." J. Biol. Chem. 176, 715-734, 1948.

35. Hotchkiss RD: "The Quantitative Separation of Purines, Pyrimidines and Nucleosides by Paper Chromatography." J. Biol. Chem. 175, 315-332, 1948. This paper included the first published report of a "rare" base termed epicytosine, later shown to be 5-methylcytosine.

36. Chargaff E: "Chemical Specificity of Nucleic Acids and Mechanism of their Enzymatic Degradation." Experientia 6, 201-209, 1950.

37. Daly M, Allfrey VG, Mirsky AE: "Purine and Pyrimidine Content of Some Desoxypentose Nucleic Acids." J. Gen. Physiol. 33, 497-510, 1950; Wyatt GR: "The Purine and Pyrimidine Composition of Deoxypentose Nucleic Acids." Biochem. J. 48, 584-590, 1951; Marshak A, Vogel HJ: "Microdetermination of Purines and Pyrimidines in Biological Materials." J. Biol. Chem. 180, 597-605, 1951.

38. Vischer E, Zamenhof S, Chargaff E: "Microbial Nucleic Acids: The Desoxypentose Nucleic Acid of Avian Tubercle Bacilli and Yeast." J. Biol. Chem. 177, 429-438, 1949.