THE JOURNAL OF BIOLOGICAL CHEMISTRY
Vol. 237, No. 6, June 1962
Printed in U.S.A.

Enzymatic Synthesis of Deoxyribonucleic Acid*

XI. FURTHER STUDIES ON NEAREST NEIGHBOR BASE SEQUENCES IN DEOXYRIBONUCLEIC ACIDS

M. N. SWARTZ,† T. A. TRAUTNER,‡ AND ARTHUR KORNBERG

From the Department of Biochemistry, Stanford University School of Medicine, Palo Alto, California

(Received for publication, January 26, 1962)

The examination of nearest neighbor base sequences in deoxyribonucleic acid (DNA) with the technique described by Josse, Kaiser, and Kornberg (1) established (a) that DNA from a given source directs the synthesis of a product in which the four bases occur next to one another in the 16 possible arrangements, not at random but in a pattern of frequencies unique for that DNA; and (b) that enzymatically synthesized DNA, and by inference the primer DNA, shows complementary pairing of adenine to thymine and of guanine to cytosine between two strands of opposite polarity, as proposed in the Watson and Crick model.

This paper describes experiments with the same technique undertaken to explore the following questions. (a) Does the replication of a single stranded DNA (with noncomplementary base composition) proceed by base pairing as in double stranded DNA? (b) Are the nearest neighbor sequence patterns in the DNA's from different tissues and tumors of a given species the same? (c) Do sequence patterns in the DNA's of various biological forms reveal any relationships among or within groups of organisms?

EXPERIMENTAL PROCEDURE

Materials

Substrates and Enzymes-Labeled and unlabeled deoxynucleoside triphosphates, micrococcal DNase, and calf spleen diesterase were prepared as described previously (2-5).
The DNA-synthesizing enzymes, Escherichia coli polymerase and the T2 phage-induced polymerase, used in most experiments were prepared from Fraction VII, refractionated on diethylaminoethyl cellulose (DEAE-cellulose) (2) or on phosphocellulose resin (6); the specific activities of these enzymes were approximately 1,500. E. coli DNA endonuclease (7) (carboxymethyl cellulose fraction, 2,500 units per ml) and E. coli DNA phosphodiesterase (8) (DEAE fraction, 20,000 units per ml) were kindly supplied by Dr. I. R. Lehman. Crystalline pancreatic DNase was purchased from the Worthington Biochemical Corporation.

DNA Preparations-The DNA's used in these experiments generally had eM values (based on deoxypentose) ranging between 6.3 and 7.5; their protein contents were in most cases less than 6% as determined by the method of Lowry et al. (9). Unless otherwise noted, DNA's were isolated by homogenizing tissues or cells in a Waring Blendor in 0.15 M NaCl + 0.01 M sodium citrate and centrifuging according to the procedure of Kay, Simmons, and Dounce (10). The crude tissue fractions thus obtained were again stirred in the Waring Blendor and then subjected either to repeated shaking with chloroform-octanol (11) and precipitation with 95% ethanol, or to treatment with sodium lauryl sulfate according to the method of Kay, Simmons, and Dounce (10); in some cases both procedures were used. When necessary, purified DNA's were treated with pancreatic RNase to remove RNA.

* This investigation was supported by research grants from the National Institutes of Health, United States Public Health Service.
† Present address, Massachusetts General Hospital, Boston, Massachusetts.
‡ Fellow of Deutsche Forschungsgemeinschaft; present address, Institut für Genetik, Universität Köln, Köln-Lindenthal, Germany.

DNA's of bacteriophages T1 (T1 phage was kindly provided by H.
Modersohn) and T5 were prepared by phenol treatment of virus stocks initially purified by differential centrifugation and freed from bacterial DNA by treatment with pancreatic DNase. Paracentrotus lividus (sea urchin) DNA was obtained by extraction with 3 M NaCl of a sperm homogenate, kindly supplied by Dr. R. Hinegardner. Release of DNA from bull sperm, kindly provided by Dr. S. W. Mead, required treatment with 1 N NaOH for 5 hours at 37°; after neutralization, the DNA was precipitated with 95% ethanol and further purified.

We are indebted to Dr. R. L. Sinsheimer for DNA of bacteriophage ΦX 174 (ΦX); to Dr. N. Sueoka for Tetrahymena pyriformis DNA, prepared essentially by the Marmur method (12), and for Cancer borealis (crab) testis DNA; to Dr. R. Sager for Chlamydomonas DNA prepared by a sodium lauryl sulfate procedure; and to Dr. K. Burton for Echinus esculenta (sea urchin) DNA.

Methods

Nearest Neighbor Frequency Analysis-This method (1) involves enzymatic replication of a given DNA primer and employs as substrates deoxyribonucleoside triphosphates in which the sugar-esterified 5'-phosphate is labeled with P32. Subsequent cleavage between the phosphate and carbon 5' of the synthesized polynucleotide chains yields P32-labeled 3'-mononucleotides; P32 introduced into the DNA by the substrate nucleotide consequently labels the adjacent nucleotide, its "nearest neighbor." By determining the P32 content of each of the four 3'-mononucleotides isolated from the digested product of a reaction with a given labeled substrate, one can calculate the frequency with which this nucleotide is linked to each of the four nucleotides found in the DNA.

The methods of enzymatic synthesis of DNA, digestion to 3'-mononucleotides, separation of 3'-mononucleotides, and calculation of nearest neighbor frequencies were similar to those reported earlier (1). As before, the extent of DNA synthesis generally represented a 20% increment over the amount of primer added.

Determination of Experimental Error-An estimate of error in determination of nearest neighbor frequencies was obtained by duplicate analyses of mouse thymus, mouse lymphoma, starfish testis, and salmon sperm DNA samples. Standard deviations, expressed as percentage of the mean ("coefficients of variation"), were calculated for each of the 16 dinucleotide sequences in each of the four duplicate runs; the average of the coefficients of variation obtained for each sequence was found to vary between 2.33 and 10.0%. The over-all average coefficient of variation was 5.8%.

Deviations in Total Isolated Ap and Tp Nucleotides-The total amount of Tp (TpA + TpT + TpG + TpC) has consistently exceeded the total Ap (ApA + ApT + ApG + ApC) by 2 to 20%. This deviation was not observed in our earlier studies (1), which, although largely confined to bacterial DNA's, also included calf thymus DNA. Since the enzymatic digestions and chromatographic procedures were complete and quantitative, it was possible that the character of the polymerase preparation used or the state of the DNA primer might be responsible for this deviation. These factors were explored by (a) the use of different polymerase preparations and (b) pretreatment of the DNA primer with heating or nuclease cleavage.

Mouse ascites tumor cell DNA was analyzed with three different E. coli polymerase preparations, and heated salmon sperm DNA, with the distinctive polymerases from normal and T2 phage-infected cells. The coefficients of variation in these analyses did not exceed the experimental error (see above). A comparison of heated with native calf thymus DNA showed no significant change in sequence frequencies in the earlier study (1) or in the present one. With heated salmon sperm DNA compared with native, the coefficients of variation for three of the sequences (GpA, TpC, and CpT) differed by more than 10%.

Pretreatment of calf thymus DNA with E. coli endonuclease, a variable contaminant of polymerase preparations, had a profound effect upon the sequence frequencies; after endonuclease treatment until 53% of the nucleotides were released, coefficients of variation between treated and untreated DNA were greater than 10% for six of the sequences. However, prior action by pancreatic DNase, E. coli phosphodiesterase, or a polymerase preparation on calf thymus DNA did not seem to alter the sequence frequencies significantly.

It must be concluded that, although distortions of sequence frequencies can be introduced by alteration of the primer, the basis for the systematic deviations between the Tp and Ap analyses encountered in these studies is not yet clear. These deviations, however, do not alter the principal conclusions which may be drawn from the experiments to be reported.

FIG. 1. Scheme for replication of single stranded DNA. An arbitrarily selected sequence of bases in a hypothetical single stranded DNA primer is designated in bold print. The base ratios for limited and extensive replication refer to the values for the newly synthesized DNA molecules, designated by standard print. [Figure not reproduced; it lists A/T, G/C, and (A + T)/(G + C) ratios for the hypothetical sequence.]

TABLE I
Composition of products after limited and extensive replication of ΦX DNA

        Primer:          Composition determined by nearest neighbor analysis
        composition      20% synthesis              600% synthesis
        determined by    Predicted                  Predicted†    Predicted†
        chemical         from chemical              from chemical from observed
Base    analysis*        analysis       Observed    analysis      20% synthesis   Observed
A       0.246            0.328          0.310       0.287         0.276           0.271
T       0.328            0.246          0.242       0.287         0.276           0.293
G       0.242            0.185          0.202       0.214         0.224           0.213
C       0.185            0.242          0.246       0.214         0.224           0.224

* See (13).
† Based on unlimited replication (see Fig. 1).
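The error estimate described under "Determination of Experimental Error" can be illustrated with a short calculation. The following is only a minimal sketch: it assumes the sample standard deviation (n - 1 in the denominator), which the text does not specify, and the function name and the duplicate values are hypothetical, chosen purely for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical duplicate determinations of one dinucleotide frequency
# (illustrative numbers, not taken from the paper).
duplicates = [0.080, 0.084]
cv_percent = coefficient_of_variation(duplicates)
print(f"coefficient of variation = {cv_percent:.1f}%")
```

Averaging such values over the duplicate runs for each of the 16 sequences would give per-sequence figures comparable in kind to the 2.33 to 10.0% range quoted above.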
RESULTS

Replication of Single Stranded Primer-The DNA of phage ΦX has been demonstrated by Sinsheimer (13) to be single stranded and to be further distinguished by the absence of equivalence between A and T and between G and C. The DNA is a primer for polymerase and can lead to 10-fold or greater net synthesis of a product which has the characteristics of double stranded DNA (14). Two nearest neighbor analyses were carried out, one under conditions of limited replication (20% increase over the amount of primer), and the other with extensive replication (600% increase).

In limited replication, DNA synthesis will be directed mainly by original primer molecules. In order to minimize participation of newly synthesized, double stranded molecules as primers, synthesis was restricted to a 20% increase over the initial primer added. If such replication follows the base composition of the primer, the nucleotide composition of the newly synthesized product, identified by its P32 label, should be the complement of the base composition of the primer; i.e. the A content of the product should be equal to the T content of the primer, and the T content of the product to the A content of the primer (Fig. 1). The A:T ratio of the product should therefore be the reciprocal of the A:T ratio of the primer. Similarly, the G:C ratio of the product should be the reciprocal of the G:C ratio of the primer. As a consequence, the ratio, (A + T)/(G + C), of the product should be identical with that of the primer. The results in Table I show that these predictions are fulfilled.

In extensive replication, the priming molecules after the initial period are double stranded, and accordingly their replication should yield a product with identical nucleotide composition.
Specifically, equivalence of purine to pyrimidine nucleotides in the product (A = T, G = C) should obtain, and the A (or T) value, for example, should be equal to one-half the sum of the A and T values of the primer (or limited replication product). Thus, the mole fraction of A in the original strand was 0.246, and that of T, 0.328; consequently, the mole fractions in the complementary strand will be 0.328 for A and 0.246 for T. The over-all mole fractions should be A = (0.246 + 0.328)/2 and T = (0.328 + 0.246)/2. These predictions are borne out by the results obtained (Table I).

Individually, the nearest neighbor frequencies obtained under both conditions of synthesis support the replication mechanism discussed here. Matching of complementary base pairs (in the P32-labeled product) is observed under conditions of extensive replication but not in limited replication (Table II). The frequencies of matching nearest neighbor pairs are close to values predicted from the frequencies obtained in limited replication by the same reasoning which had been applied to predict over-all base composition in 600% synthesis. Since each of the four sequences, TpA, ApT, CpG, and GpC, is its own match (1), the frequencies of each of these sequences should remain unaltered, whether replication is limited or extensive; the results fit this prediction closely.

Sequence Frequencies in T1 and T5 Bacteriophage DNA-The base composition and the nearest neighbor pattern of the DNA from temperate coliphage λ are similar to those of its host, E. coli (1). The DNA's of the virulent T-even phages, T2, T4, and T6, have identical base compositions; their nearest neighbor patterns differ from those of E. coli (1).
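The composition predictions for ΦX replication given above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions rather than the authors' procedure: the function names are hypothetical, and the primer composition is the chemical analysis reported in Table I (A 0.246, T 0.328, G 0.242, C 0.185).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def limited_product(primer):
    """Composition of a strand copied from a single stranded primer."""
    return {COMPLEMENT[base]: frac for base, frac in primer.items()}


def extensive_product(primer):
    """Composition expected once double stranded molecules act as primer:
    the average of the primer strand and its complement."""
    comp = limited_product(primer)
    return {base: (primer[base] + comp[base]) / 2 for base in primer}


def at_gc_ratio(comp):
    """(A + T)/(G + C) ratio of a base composition."""
    return (comp["A"] + comp["T"]) / (comp["G"] + comp["C"])


# Primer composition from chemical analysis (Table I).
primer = {"A": 0.246, "T": 0.328, "G": 0.242, "C": 0.185}
limited = limited_product(primer)
extensive = extensive_product(primer)

print("limited:  ", {b: round(limited[b], 3) for b in "ATGC"})
print("extensive:", {b: round(extensive[b], 3) for b in "ATGC"})
print("(A+T)/(G+C):", round(at_gc_ratio(primer), 2), round(at_gc_ratio(limited), 2))
```

The sketch makes the two claims explicit: the A:T and G:C ratios invert in the limited product while (A + T)/(G + C) is unchanged, and A = T, G = C in the extensive product.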
The sequence frequencies in the DNA of T1, a phage intermediate between the typically temperate and virulent phages, and that of T5, a virulent phage [...] in many respects to the T-even phages, can be distinguished from one another (Table III) but do not deviate strikingly from values for random association of the nucleotides. The most tenable conclusion to be drawn from these comparisons of sequence patterns is to regard differences as significant but identity as no more revealing than identity of the base composition.

Sequence Frequencies in DNA of Several Tissues of a Species-Values for DNA's of three different bovine organs and of four mouse tissues, including two tumors, are compared in Tables IV and V. The coefficients of variation for the bovine and for the mouse DNA's were not significantly different from those obtained for repeated analyses of identical DNA's. Therefore, any differences that may exist among the several bovine DNA's or among any of the mouse DNA samples are within [...]

Sequence Frequencies in Crab Testis DNA-Examination of crab testes has revealed discrete and distinctive [...] density gradient centrifugation of DNA isolated from the testes of five specimens of Cancer borealis, Sueoka [...] two components, the lighter one representing as much as [...] of the total DNA. The buoyant density of the main component [...] to a G-C content of 42%, which is characteristic [...]. However, the buoyant density of the minor component indicated a low G-C content (20% or less) and was [...] the density of dAT polymer, an alternating copolymer of A and T synthesized de novo by polymerase (16). In order to [...] the possibility that adventitious materials, such as protein, might be responsible for the low buoyant density of the [...] band, and with the thought that this band might even be "natural" dAT polymer, Dr. Sueoka asked us to examine the nearest neighbor frequency patterns of the crab DNA preparations. The light component of C. borealis primed DNA synthesis at a rate comparable to dAT, but unlike the latter, all four deoxynucleoside triphosphates were required.
TABLE II
Nearest neighbor frequencies* of ΦX DNA in limited and extensive replication

Nearest neighbor    Limited replication    Extensive replication (600%)
sequence            (20%): Observed        Predicted† from          Observed
                                           limited replication
ApA, TpT            0.101, 0.069           0.085, 0.085             0.085, 0.099
CpA, TpG            0.096, 0.048           0.072, 0.072             0.070, 0.070
GpA, TpC            0.054, 0.064           0.059, 0.059             0.058, 0.065
CpT, ApG            0.052, 0.069           0.061, 0.061             0.064, 0.058
GpT, ApC            0.047, 0.068           0.057, 0.057             0.053, 0.053
GpG, CpC            0.040, 0.053           0.046, 0.046             0.041, 0.045
TpA                 0.061                  0.061                    0.059
ApT                 0.072                  0.072                    0.075
CpG                 0.045                  0.045                    0.045
GpC                 0.061                  0.061                    0.061

* Expressed, in this and subsequent tables, as decimal proportions of 1.000.
† Based on unlimited replication (see Fig. 1).

TABLE III
Nearest neighbor frequencies in DNA's of T1 and T5 bacteriophages and E. coli
[Column headings illegible.]

Nearest neighbor
sequence
ApA, TpT            0.071, 0.076    0.072, 0.082    0.079, 0.093    0.105, 0.100
CpA, TpG            0.071, 0.071    0.072, 0.070    0.065, 0.069    0.058, 0.057
GpA, TpC            0.055, 0.056    0.056, 0.062    0.062, 0.065    0.054, 0.060
CpT, ApG            0.055, 0.055    0.051, 0.047    0.060, 0.053    0.064, 0.056
GpT, ApC            0.055, 0.054    0.054, 0.055    0.056, 0.054    0.048, 0.052
GpG, CpC            0.056, 0.056    0.051, 0.059    0.046, 0.044    0.035, 0.038
TpA                 0.051           0.050           0.057           0.098
ApT                 0.068           0.076           0.076           0.103
CpG                 0.067           0.068           0.058           0.032
GpC                 0.083           0.074           0.063           0.043

TABLE IV
[Title illegible; bovine DNA's. Column headings as found: (1.29)* (1.24) Liver (1.28) (1.29) Thymus; Sperm (1.39) (1.35). The assignment of the first two columns to thymus and liver is not recoverable.]

Nearest neighbor
sequence            [Column 1]      [Column 2]      Sperm
ApA, TpT            0.080, 0.085    0.079, 0.090    0.084, 0.094
CpA, TpG            0.078, 0.076    0.072, 0.077    0.077, 0.069
GpA, TpC            0.066, 0.072    0.063, 0.071    0.059, 0.069
CpT, ApG            0.074, 0.069    0.079, 0.071    0.073, 0.069
GpT, ApC            0.049, 0.053    0.052, 0.051    0.049, 0.052
GpG, CpC            0.051, 0.060    0.053, 0.057    0.050, 0.060
TpA                 0.052           0.055           0.062
ApT                 0.074           0.071           0.076
CpG                 0.016           0.015           0.014
GpC                 0.045           0.046           0.043

When dGTP and dCTP were omitted, the rate of synthesis was only 19% of that observed with the four triphosphates; when dTTP was also omitted from the incubation mixture, the rate was reduced to less than 0.1%.
These results suggested at once that at least a few G and C residues were interspersed in the chains of the light crab DNA.

TABLE V
Nearest neighbor frequencies in DNA's of mouse tissues and tumors
[(A + T)/(G + C) ratios in the column headings, as found: (1.38), (1.38), (1.38), (1.43), (1.42); their assignment to columns is not recoverable.]

Nearest neighbor
sequence            Thymus†         Liver           Lymphoma        Ascites tumor
ApA, TpT            0.091, 0.101    0.088, 0.093    0.091, 0.094    0.084, 0.100
CpA, TpG            0.076, 0.083    0.072, 0.078    0.077, 0.079    0.074, 0.074
GpA, TpC            0.060, 0.062    0.061, 0.063    0.059, 0.065    0.059, 0.068
CpT, ApG            0.076, 0.070    0.075, 0.070    0.075, 0.071    0.072, 0.072
GpT, ApC            0.057, 0.053    0.057, 0.054    0.051, 0.053    0.056, 0.050
GpG, CpC            0.051, 0.046    0.051, 0.050    0.052, 0.051    0.048, 0.052
TpA                 0.060           0.067           0.063           0.062
ApT                 0.072           0.075           0.075           0.078
CpG                 0.011           0.009           0.011           0.011
GpC                 0.038           0.039           0.037           0.039

* See Table III.
† Each value is the average of two analyses.

[Stray column headings belonging to Table VII: Chlamydomonas (0.56)* (0.87); Wheat germ (1.15) (1.21).]

TABLE VI
Nearest neighbor frequencies of two DNA components of Cancer borealis (crab) testis

Nearest neighbor    Main component      Light component†
sequence            (1.80)*             (36.6)*
ApA, TpT            0.085, 0.092        0.0127, 0.0126
CpA, TpG            0.067, 0.066        0.0100, 0.0089
GpA, TpC            0.053, 0.054        0.0042, 0.0015
CpT, ApG            0.061, 0.055        0.0004, 0.0018
GpT, ApC            0.057, 0.063        0.0081, 0.0069
GpG, CpC            0.032, 0.038        [illegible], 0.0009
TpA                 0.113               0.504
ApT                 0.116               0.429
CpG                 0.019               0.0007
GpC                 0.030               0.0015

* The number in parentheses is the (A + T)/(G + C) ratio determined by nearest neighbor analysis.
† Each value is the average of two analyses.

Nucleotide incorporation into DNA in the first stage of the nearest neighbor analysis was in the ratio, A:T:G:C = 0.84:1.07:0.030:0.028 mµmoles, indicating a G-C content of approximately 3%. A similar value was obtained by determining the actual nearest neighbor frequencies (Table VI).

The most remarkable result of the nearest neighbor analysis is that the light crab DNA appears very similar to the dAT copolymer, with alternating A and T residues comprising 93% of the sequences. However, all 16 possible sequences are observed, and in strikingly nonrandom distribution.
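The G-C content quoted above follows directly from the incorporation ratio. A minimal sketch of that arithmetic, using the four values given in the text (the variable names are illustrative only):

```python
# Nucleotide incorporation in the first stage of the analysis,
# in millimicromoles, as reported in the text for the light crab DNA.
incorporated = {"A": 0.84, "T": 1.07, "G": 0.030, "C": 0.028}

total = sum(incorporated.values())
gc_fraction = (incorporated["G"] + incorporated["C"]) / total
print(f"G-C content = {100 * gc_fraction:.1f}%")
```

The result rounds to the "approximately 3%" stated in the text.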
The matching of sequences follows the predictions of Watson-Crick base pairing, except in those instances in which the very low frequencies are difficult to measure accurately.

Also included in Table VI are the nearest neighbor frequencies of the main, heavy component of Cancer borealis DNA. The ratio, (A + T)/(G + C), was 1.8, compared with a value of 1.6 calculated by Sueoka from the buoyant density. Although the heavy crab DNA showed no gross contamination with the light component, trace amounts might have escaped detection. Since the light component is a better primer, the reaction product of a nearest neighbor analysis of the heavy component would appear to have a relatively higher A + T content (and higher ApT and TpA sequences) than that of the heavy primer.

The possibility might also be considered that the light component is pure dAT and that its contamination by the heavy component is responsible for the presence of G and C residues. This is unlikely, however, since replication of the light component, as mentioned, is markedly reduced when G and C are omitted from the reaction mixture. Furthermore, the nearest neighbor sequences involving G and C are distinctly different in the two DNA components.

Sequence Frequencies in DNA of Various Animal and Plant Species and Bacteria Compared-Analyses of DNA's from several animal and plant species are shown in Tables VII and VIII. In the previous study (1), the frequency patterns of DNA's from six bacteria, five phages, and calf thymus were analyzed for their fit to frequencies predicted for a random arrangement of bases in DNA molecules. In random ordering of the nucleotides, the frequency of any nearest neighbor pair should be predictable as the product of the frequencies of its constituent mononucleotides (e.g. fApT = fTpA = fAp × fTp). Although most of the observed sequence frequencies fell within these predictions, several differed sharply.
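The random expectation described above, in which each nearest neighbor frequency is the product of the frequencies of its two constituent nucleotides, can be sketched as follows. The base composition and the observed CpG value below are hypothetical, chosen only for illustration, and the function name is an assumption of this sketch.

```python
from itertools import product


def random_expectation(base_freq):
    """Expected frequency of each nearest neighbor pair XpY if the bases
    were arranged at random: f(XpY) = f(X) * f(Y)."""
    return {x + "p" + y: base_freq[x] * base_freq[y]
            for x, y in product("ATGC", repeat=2)}


# Hypothetical base composition (not taken from the tables).
base_freq = {"A": 0.30, "T": 0.30, "G": 0.20, "C": 0.20}
expected = random_expectation(base_freq)

# Hypothetical observed CpG frequency, compared with the random value.
observed_cpg = 0.013
ratio = observed_cpg / expected["CpG"]
print(f"expected CpG = {expected['CpG']:.3f}, observed/expected = {ratio:.2f}")
```

Under random ordering, isomeric pairs such as ApT and TpA necessarily have the same expected value, which is why differences between isomers are informative.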
TABLE VII
Nearest neighbor frequencies of animal and plant DNA's

Nearest neighbor    Chlamydomonas    Wheat germ       [heading         [heading         Tetrahymena
sequence            (0.56)* (0.87)   (1.15) (1.21)    illegible]       illegible]       pyriformis
                                                                                        (3.00) (3.25)
ApA, TpT            0.060, 0.059     0.072, 0.089     0.100, 0.116     0.110, 0.102     0.153, 0.176
CpA, TpG            0.077, 0.073     0.068, 0.070     0.076, 0.069     0.067, 0.067     0.045, 0.044
GpA, TpC            0.044, 0.046     0.062, 0.071     0.049, 0.064     0.057, 0.059     0.045, 0.048
CpT, ApG            0.057, 0.060     0.067, 0.060     0.064, 0.047     0.059, 0.053     0.052, 0.049
GpT, ApC            0.055, 0.060     0.056, 0.053     0.053, 0.062     0.053, 0.058     0.034, 0.036
GpG, CpC            0.071, 0.074     0.051, 0.059     0.031, 0.046     0.033, 0.038     0.016, 0.017
TpA                 0.053            0.058            0.075            0.090            0.127
ApT                 0.054            0.075            0.091            0.104            0.133
CpG                 0.063            0.039            0.022            0.020            0.007
GpC                 0.092            0.050            0.035            0.031            0.020

TABLE VIII
Nearest neighbor frequencies of animal tissue DNA's

Nearest neighbor    Human spleen     Chicken red      Salmon liver     Rabbit liver     Starfish testis
sequence            (1.42)* (1.47)   cell             (1.43) (1.33)    (1.35)           (1.45)
                                     (1.34) (1.36)
ApA, TpT            0.097, 0.097     0.091, 0.096     0.074, 0.083     0.087, 0.097     0.102, 0.104
CpA, TpG            0.074, 0.074     0.073, 0.077     0.077, 0.075     0.078, 0.077     0.072, 0.067
GpA, TpC            0.061, 0.057     0.060, 0.060     0.057, 0.067     0.053, 0.060     0.058, 0.056
CpT, ApG            0.071, 0.070     0.071, 0.069     0.075, 0.069     0.077, 0.068     0.058, 0.058
GpT, ApC            0.049, 0.054     0.051, 0.049     0.062, 0.062     0.050, 0.054     0.064, 0.060
GpG, CpC            0.050, 0.047     0.062, 0.047     0.049, 0.052     0.048, 0.054     0.043, 0.050
TpA                 0.067            0.059            0.068            0.062            0.066
ApT                 0.081            0.074            0.073            0.072            0.078
CpG                 0.010            0.013            0.017            0.011            0.025
GpC                 0.043            0.048            0.040            0.052            0.038

This is true also in the frequency patterns presented in this paper; variances from predicted (random) frequencies for the 16 nearest neighbor pairs were found to be considerably greater than experimental error in the cases shown in Table IX. A comparison of variances of isomeric nearest neighbor pairs such as ApT versus TpA, for example, reveals sharp differences. Fig.
2 is an attempt to determine whether variances from random distribution have phylogenetic significance. The observed nearest neighbor frequencies are plotted against the values predicted from random association, the product of frequencies of constituent nucleotides of a nearest neighbor pair.¹ If there were agreement between observed and predicted values, the points for the sequences whose constituent nucleotides occur with equal frequency would fall on a straight line through the origin with a slope of 1.

TABLE IX
Variances from random nearest neighbor frequencies in animal, plant, and bacterial DNA's
[Column groups: 12 animal and plant DNA's; 6 bacterial DNA's. Each group gives the deviation from random frequency* and the variance × 10^4 (observed† and calculated from error‡). Row labels are illegible; the numeric columns are given as found.]

Calculated from error:  0.04 0.04 0.07 0.07 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.04 0.04 0.07 0.07
Deviation counts:       - - - 2 2 6 6 4 4 3 6 3 6 6 - - - 1 2 10 1 10 8 4 6 3 12 9 12 6 - + - 11 10 11 12 12 1 11 2 3 8 3 7 1 5 - 0 - 1 1 1 3 2 2 1 -
Observed:               0.74 1.13 0.27 0.62 2.62 0.49 2.12 0.77 1.05 0.97 0.38 0.67 4.30 0.80 6.20 0.59
Deviation counts:       - t - 4 4 5 2 6 2 3 3 6 6 6
Observed:               1.35 2.04 1.87 1.83 0.52 1.03 0.56 1.01 0.77 0.58 0.55 0.61 3.69 0.13 1.88 3.26
Calculated from error:  0.20 0.21 0.14 0.16 0.01 0.09 0.25 0.09 0.01 0.16 0.09 0.09 0.21 0.19 0.32 0.14

FIG. 2. Sequence frequencies in animal and plant cell DNA's compared with values predicted from random association. The product of frequencies of constituent nucleotides of a nearest neighbor pair is described by the line passing through the origin. The values represented by the lower case letters are the observed nearest neighbor frequencies. a, Chlamydomonas; b, wheat germ; c, calf thymus; d, salmon liver; e, rabbit liver; f, chicken red cells; g, mouse lymphoma; h, starfish testis; i, human spleen; l, Echinus esculenta; m, Paracentrotus lividus; o, Tetrahymena pyriformis.

As seen in Fig.
2, the sequence, CpG, occurs in animal and plant cells with a frequency which is invariably less than random and in some cases only one-third the random value; by contrast, the frequencies of the isomeric sequence, GpC, come very close to the expectation for randomness. In bacteria, the reverse of this pattern appears to be the case (Fig. 3).

In the case of the TpA and ApT sequence frequencies, bacterial DNA's as well as animal and plant cell DNA's show a TpA frequency that is decidedly below the prediction for random association; the ApT frequencies, however, are close to the predicted values (Figs. 2 and 3).

¹ This method of plotting the data was suggested by the observation of A. D. Kaiser and R. L. Baldwin that the nearest neighbor frequencies are related to the product of the base frequencies (submitted for publication to the Journal of Molecular Biology).

* Key: +, observed frequency > random expectancy; 0, observed frequency = random expectancy (±0.005); -, observed frequency < random expectancy.
† $\sum (f_{\mathrm{obs}} - f_{\mathrm{pred}})^2 / (n - 1)$.
‡ $\sum (s f_{\mathrm{pred}})^2 / (n - 1)$; n = number of entries; s = coefficient of variation (for animal and plant DNA's, determined as in "Methods"; for bacteria, an average coefficient of variation of 3% was assumed for each nearest neighbor frequency).

FIG. 3. Sequence frequencies in bacterial DNA's compared with values predicted from random association, as in Fig. 2. a, Micrococcus lysodeikticus; b, Mycobacterium phlei; c, Aerobacter aerogenes; d, Escherichia coli; e, Bacillus subtilis; f, Haemophilus influenzae.

DISCUSSION

Mechanism of Enzymatic Replication of Single Stranded DNA-Replication of the single stranded DNA from phage ΦX was studied by analysis of a product after 20% increase in DNA over the primer added (limited replication) and after a 600% increase (extensive replication).
The base composition of the DNA synthesized under conditions of limited replication was complementary to that of the primer. A comparison of nearest neighbor frequencies after extensive replication with those from limited replication showed that the nearest neighbor sequences also conformed with complementary base pairing with the primer. The fact that a copy of only 20% of the primer chains is representative of the total can be interpreted in two ways: (a) only 1 out of 5 molecules primes and is completely replicated or (b) replication of an average of one-fifth of the 5000 nucleotide sequences of each molecule is representative of the entire molecule. Density gradient centrifugation experiments indicating that virtually all (>85%) of the ΦX DNA is combined with new DNA when a 20% increase is reached² make the first interpretation unlikely and the second preferred.

² I. R. Lehman, R. L. Sinsheimer, and A. Kornberg, unpublished observations.

Earlier viscometric and optical measurements had shown that a double helical molecule is produced from the single stranded primer (14). The chemical determinations presented here indicate that the enzymatically synthesized strand (of the size of the "20% synthesis product") already primes as effectively as the original strand of phage DNA and that both strands of a double helix may prime in the enzymatic replication.

Although these findings are restricted to the product of enzymatic action in vitro, they are consistent with Sinsheimer's (17) recent isolation from infected cells of a "replicative form" of ΦX DNA which has many features of a double helix. It would follow from these observations that at least the initial step in replication of ΦX DNA is like that of other DNA's in the formation of a complementary strand and that some unique process is responsible for including only one type of strand into mature phage.
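The comparison of limited with extensive replication rests on matching each nearest neighbor sequence with its counterpart on the strand of opposite polarity. The sketch below illustrates that reasoning with the limited replication values of Table II; the function names are hypothetical, and the assignment of the four self-matching values to TpA, ApT, CpG, and GpC follows the row order of the other tables, which is an assumption of this sketch.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def matching_sequence(pair):
    """Sequence paired with XpY on the strand of opposite polarity,
    e.g. CpA -> TpG; TpA, ApT, CpG, and GpC are their own matches."""
    x, y = pair.split("p")
    return COMPLEMENT[y] + "p" + COMPLEMENT[x]


def predict_extensive(limited):
    """Predict extensive replication frequencies as the mean of each
    sequence and its matching sequence in the limited product."""
    return {pair: (freq + limited[matching_sequence(pair)]) / 2
            for pair, freq in limited.items()}


# Limited replication (20%) frequencies for phage DNA, from Table II.
limited = {
    "ApA": 0.101, "TpT": 0.069, "CpA": 0.096, "TpG": 0.048,
    "GpA": 0.054, "TpC": 0.064, "CpT": 0.052, "ApG": 0.069,
    "GpT": 0.047, "ApC": 0.068, "GpG": 0.040, "CpC": 0.053,
    "TpA": 0.061, "ApT": 0.072, "CpG": 0.045, "GpC": 0.061,
}
predicted = predict_extensive(limited)
print(f"ApA, TpT predicted: {predicted['ApA']:.3f}, {predicted['TpT']:.3f}")
```

Matching pairs receive equal predicted values, and the self-matching sequences keep their limited replication values, which is the pattern the text describes for the 600% product.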
Nearest neighbor sequence analyses of DNA have now been performed with an RNA polymerase directed specifically by DNA (18-20). The composition and the sequence frequencies of enzymatically synthesized RNA's primed by various DNA's (dAT, dGdC, DNA's of phage ΦX, phage T2, calf thymus) are identical with those obtained with DNA polymerase. Synthesis of this type of RNA seems clearly to involve pairing of uracil to adenine and cytosine to guanine in the manner of DNA synthesis by E. coli polymerase.

A bothersome, frequent deviation in the current studies was the absence of matching between certain of the complementary sequences, resulting in a lack of correspondence between the over-all frequencies of Tp and Ap nucleotides. Although the basis for this deviation has not been pinpointed, it is clear that exposure of DNA to one of the E. coli nucleases known to be a variable contaminant of the polymerase preparation leads to some distortion in the sequence pattern when this DNA is subsequently used as primer.

Comparison of Sequence Patterns of DNA's Within a Species-Although DNA's isolated from different tissues of the same species have the same base composition, it is conceivable that differences might exist in the arrangement of the bases. A comparison by nearest neighbor analysis of bovine sperm, thymus, and liver revealed no variations beyond the error of the method; the DNA's of two normal tissues and two tumors of the mouse also failed to reveal significant differences.

A surprising development has provided two separable and distinguishable DNA entities within a species. Sueoka identified and isolated two DNA components from crab testis on the basis of distinctive buoyant densities (15). Sequence analysis of the "light" crab DNA uncovered the fact that it contains G and C residues as 3% of the total bases interspersed among A and T residues, which are almost invariably in alternating sequence.
This DNA is therefore similar to the simple and well ordered dAT, the copolymer synthesized de novo by polymerase. The main crab DNA component, by contrast, has a sequence pattern resembling other animal DNA's of similar base composition. The biological significance of the "light" crab DNA is an intriguing subject and one which immediately suggests studying the distribution of this DNA in other somatic cells and the sperm of this species.

Comparison of Sequence Patterns Among Bacterial, Bacteriophage, Animal, and Plant DNA's-In the previous report, confined largely to the sequence patterns of bacterial and bacteriophage DNA's, it was established in many cases that the nucleotide arrangements did not conform to predictions of random distributions. In the present work, concerned principally with animal DNA's, it is clear that deviations from random arrangements also obtain. The most striking example is the frequencies of the CpG sequence, which are only approximately one-third of the values calculated from the base composition, whereas those of the isomeric GpC sequence are close to the calculated values. The consistency with which certain nearest neighbor sequences, determined from a wide variety of DNA's, deviate from the random expectancy is a surprising finding if one considers that in each nearest neighbor experiment, on the order of 10^15 dinucleotides, representative of the dinucleotides of many different primer molecules of a given DNA, are analyzed. This result suggests that determinants other than random arrangement must have governed the development of the sequence pattern of a species.

When all of the sequence frequencies are surveyed, the data appear to distinguish the animal and plant cells from the bacteria and bacteriophages. Other distinctions between and within these groups are suggestive but less striking than the CpG sequence. Were it possible to refine the accuracy of the sequence frequency analyses, these examples might increase and permit more meaningful phylogenetic correlations. The connection between the relative abundance of certain sequences and their phenotypic expression as amino acid sequence remains an ultimate objective.

SUMMARY

1. Replication of the single stranded deoxyribonucleic acid (DNA) from phage ΦX 174 was studied under conditions of limited synthesis (20% increase over the primer) and extensive synthesis (600% increase). The base composition of the products and the nearest neighbor frequencies conform to the model based on pairing of adenine to thymine and of guanine to cytosine between strands of opposite polarity. This experiment further indicates that the enzymatically synthesized DNA strands prime as effectively as the natural strand of phage DNA and that both strands of a double helix may prime in the enzymatic replication.

2. A comparison of the nearest neighbor sequence analyses of the DNA's of several bovine tissues (sperm, thymus, liver) revealed no variations beyond the error of the method; similar results were obtained in a comparison of the DNA's of mouse tissues, including two tumors. However, the two DNA components isolated by Sueoka from crab testis were shown to have distinctive sequence patterns, one remarkably similar to the dAT copolymer synthesized de novo by polymerase.

3. A survey of the sequence patterns of a variety of animal and plant DNA's revealed that certain sequence frequencies were strikingly different from predictions based on random arrangement of the bases. The data appear to distinguish animal and plant cells from bacteria by the very low frequency of the cytidyl-(3'-5')-guanosine (CpG) sequence in the former group and the high frequency of the isomeric guanyl-(3'-5')-cytosine (GpC) sequence in the latter group.

REFERENCES

1. JOSSE, J., KAISER, A. D., AND KORNBERG, A., J. Biol. Chem., 236, 864 (1961).
2. LEHMAN, I. R., BESSMAN, M. J., SIMMS, E. S., AND KORNBERG, A., J. Biol. Chem., 233, 163 (1958).
3. SMITH, M., AND KHORANA, H. G., J. Am. Chem. Soc., 80, 1141 (1958).
4. CUNNINGHAM, L., CATLIN, B. W., AND DE GARILHE, M. P., J. Am. Chem. Soc., 78, 4642 (1956).
5. HILMOE, R. J., J. Biol. Chem., 235, 2117 (1960).
6. APOSHIAN, H. V., AND KORNBERG, A., J. Biol. Chem., 237, 519 (1962).
7. LEHMAN, I. R., ROUSSOS, G. G., AND PRATT, E. A., J. Biol. Chem., 237, 819 (1962).
8. LEHMAN, I. R., J. Biol. Chem., 235, 1479 (1960).
9. LOWRY, O. H., ROSEBROUGH, N. J., FARR, A. L., AND RANDALL, R. J., J. Biol. Chem., 193, 265 (1951).
10. KAY, E. R. M., SIMMONS, N. S., AND DOUNCE, A. L., J. Am. Chem. Soc., 74, 1724 (1952).
11. SEVAG, M. G., LACKMAN, D. B., AND SMOLENS, J., J. Biol. Chem., 124, 425 (1938).
12. MARMUR, J., J. Molecular Biol., 3, 208 (1961).
13. SINSHEIMER, R. L., J. Molecular Biol., 1, 43 (1959).
14. LEHMAN, I. R., Ann. N. Y. Acad. Sci., 81, 745 (1959).
15. SUEOKA, N., J. Molecular Biol., 3, 31 (1961).
16. SCHACHMAN, H. K., ADLER, J., RADDING, C. M., LEHMAN, I. R., AND KORNBERG, A., J. Biol. Chem., 235, 3242 (1960).
17. SINSHEIMER, R. L., J. Molecular Biol., in press.
18. WEISS, S. B., AND NAKAMOTO, T., Proc. Natl. Acad. Sci. U. S., 47, 1400 (1961).
19. FURTH, J. J., HURWITZ, J., AND GOLDMANN, M., Biochem. and Biophys. Research Communs., 4, 431 (1961).
20. CHAMBERLIN, M., AND BERG, P., Proc. Natl. Acad. Sci. U. S., 48, 81 (1962).