and Saskia A. Hogenhout1*
The phylogenetic tree of mollicutes is composed of two major clades that diverged early in evolution (51). One clade contains the orders Acholeplasmatales and Anaeroplasmatales (AAA clade mollicutes), and the other clade contains the orders Mycoplasmatales and Entomoplasmatales (SEM clade mollicutes) (9). Phytoplasmas, formerly known as mycoplasma-like organisms of plants, form a monophyletic group in the order Acholeplasmatales (51) and were recently assigned to a novel genus, “Candidatus Phytoplasma” (41). Approximately 20 phytoplasma phylogenetic groups have been proposed based on 16S rRNA gene sequences, and new branches are continuously being discovered (69, 85). Members of the order Acholeplasmatales are distinct from other mollicutes in several ways. For instance, whereas most mollicutes use UGA as a tryptophan codon instead of a stop codon, a feature they share with mitochondria, the acholeplasmas and phytoplasmas retained UGA as a stop codon (80).
Mollicutes have been extensively studied because of their economic importance. They are disease agents and obligate inhabitants of humans, mammals, reptiles, fish, arthropods, and plants. Phytoplasmas are generally associated with arthropods and plants, whereas mycoplasmas (Entomoplasmatales and Mycoplasmatales) and ureaplasmas (Mycoplasmatales) are pathogens that cause infections of the respiratory and urogenital tracts, eyes, alimentary canals, glands, and joints of humans and animals. Interestingly, three spiroplasmas, Spiroplasma kunkelii, Spiroplasma citri, and Spiroplasma phoeniceum, are also insect-transmitted plant pathogens but belong to the order Entomoplasmatales (34) and hence are distantly related to the phytoplasmas. Dual phytoplasma and spiroplasma infections of insects and plants occur frequently (40).
Several mycoplasmas, ureaplasmas, spiroplasmas, and acholeplasmas have been cultured outside their hosts in artificial culture media. Culture media are complex, because mollicutes suffered extensive gene losses and consequently lack genes of many basic metabolic pathways. However, to date, phytoplasmas have not been cultured in cell-free medium, indicating that phytoplasmas have a different metabolism and are likely to have more highly reduced genomes than other mollicutes.
The aster yellows phytoplasma (AYP) strain witches' broom (AY-WB) (“Ca. Phytoplasma asteris”; class Mollicutes) generally spreads systemically in lettuce (Lactuca sativa L.) and China aster (Callistephus chinensis Nees), inducing a variety of symptoms, including vein clearing, yellowing, stunting, witches' broom, pigment loss or sterility of flowers, and necrosis (99). The extreme malformations of plants suggest that phytoplasmas interfere with plant hormone metabolism (51). AY-WB also spreads systemically in Arabidopsis thaliana and Nicotiana benthamiana, inducing yellowing, stunting, and witches' broom in both (X. Bai, V. Correa, and S. A. Hogenhout, unpublished results). AY-WB was classified into the 16SrI-A subgroup of “Ca. Phytoplasma asteris” based on the restriction fragment length polymorphism banding pattern of a 1.2-kb 16S rRNA gene PCR fragment (99). In contrast, onion yellows (OY) phytoplasma strain M (OY-M), the only other phytoplasma for which a complete genome sequence is available (74), belongs to the 16SrI-B subgroup (51). “Ca. Phytoplasma asteris,” previously known as AYP or group I phytoplasma (52), is the largest of the phytoplasmas and associates with more than 100 economically important diseases worldwide (51, 62). Plant hosts include broad-leaf, herbaceous plants and several woody fruit crops (62).
AY-WB is transmitted by the polyphagous leafhopper Macrosteles quadrilineatus (Forbes). Phytoplasma interactions with insects are complex and involve intra- and extracellular replication in gut and salivary glands, epithelial and muscle tissues, and other organs and tissues. Whereas there is evidence that some phytoplasmas are vertically transmitted to the progeny of their insect vectors (37), the predominant means of survival of phytoplasmas is through transmission between insects and plants. They appear to manipulate their insect and plant hosts to enhance their own transmission efficiencies. For example, AYPs can increase fecundity and longevity of their insect vector, Macrosteles quadrilineatus (13).
Because of their small genomes and economic importance, mollicutes have been targeted for genome sequencing projects for some time. Mycoplasma genitalium was the second bacterium to be sequenced to completion because of its minimal gene complement for a cultivable organism (33). Thus far, genomes of at least nine SEM clade mollicutes and one AAA clade mollicute (OY-M phytoplasma) (76) have been fully sequenced. Here, we report the full sequence of the small genome of AY-WB. Comparative genome analysis revealed the presence of 14 to 23% repetitive DNA organized in potential mobile units (PMUs) in the phytoplasma genomes and differences in standard metabolic and nonmetabolic pathways between phytoplasmas and SEM clade mollicutes.
DNA isolation. The AY-WB strain was collected from diseased lettuce plants in Celeryville, Ohio (41.00°N, 82.45°W), in 1998 (99). AY-WB was isolated from lettuce plants about 2 weeks after the appearance of symptoms. The stems of lettuce plants were cut at several places with a sharp razor blade, and phloem sap oozing from the cut area was collected. On average, 1.6 ml sap was collected from each symptomatic lettuce plant. For preparation of gel plugs, 200 μl sap was immediately mixed with 800 μl precooled 30% glucose-1× Tris-EDTA (pH 8.0) buffer, followed by centrifugation at 16,000 × g for 20 min at 4°C. The pellet was mixed with 80 μl 1% premelted low-melting agarose (45°C) in 0.5× Tris-borate-EDTA (pH 8.0) and incubated at 4°C. Solidified plugs were subjected to proteinase K digestion at 50°C for 48 h and then rinsed with 1× Tris-EDTA buffer (pH 8.0) three times before subjection to pulsed-field gel electrophoresis (PFGE). PFGE was conducted in a 1% agarose gel with a running time of 18 h, a 60- to 120-s switch time ramp, a voltage of 6 V/cm, and an included angle of 120° (CHEF-DR III; Bio-Rad, Hercules, CA). The AY-WB chromosome produced a single band of ~700 kb in the PFGE gel. The identity of the band was confirmed by Southern blot hybridizations and PCR using phytoplasma-specific probes and primers, respectively. The 700-kb fragment was excised from the gel, and the gel blocks were placed directly into the Elutrap (Schleicher & Schuell) collection chamber for elution of DNA at 106 V at 4°C for 15 h. DNA was ethanol precipitated using standard procedures and resuspended in deionized distilled water. The concentration of the purified genomic DNA was assessed using a PicoGreen kit (Molecular Probes).
Sequencing strategy. The shotgun library was constructed at Integrated Genomics Inc. (IG). Five micrograms of DNA was sheared using a computer-controlled shearing device (GeneMachines, San Carlos, CA) to produce DNA fragments of 2 kb on average. Sheared DNA was loaded onto 0.7% agarose gels, and DNA fractions corresponding to 2 to 2.5 kb were extracted from the agarose gel. Single-stranded ends of the DNA were removed by T4 polymerase and then filled in with Klenow fragment. Size-selected 2- to 2.5-kb DNA fragments were cloned into the pGEM-3Z vector (Promega, Madison, WI), introduced into Escherichia coli DH10B, and sequenced with the DYEnamic ET Dye Terminator kit (Amersham Biosciences, Piscataway, NJ). Sequence quality assessment and subsequent assembly were performed with the Phred/Cross_match/Phrap package (29, 30) and Paracel Genome Assembler. Sequencing and physical gaps in the assembly were closed by multiplex PCR (92) and primer walking.
Annotation. The sequence data of AY-WB were submitted to the IG database and software suite, ERGO, for sequence annotation. CRITICA (8), Glimmer2 (25), and IG proprietary tools were used for open reading frame (ORF) identification. ORF function annotation was conducted by a number of IG proprietary algorithms that automatically predict the function of ORFs based on comparative analysis with orthologue clusters in ERGO. In addition, the predicted proteins were searched, using the BLAST algorithm (6), against a nonredundant database at the National Center for Biotechnology Information (NCBI). Protein functional domains were analyzed by searching against the NCBI conserved-domain database (60) and the Pfam database (12). The Kyoto encyclopedia of genes and genomes was used for the reconstruction of the metabolic pathways. The assignment of Enzyme Commission (EC) number was done according to the BRENDA database (86).
Nucleotide sequence accession numbers. Sequences of the AY-WB genome have been deposited in the GenBank database under accession numbers CP000061 (chromosome), CP000062 (plasmid AYWB-pI), CP000063 (plasmid AYWB-pII), CP000064 (plasmid AYWB-pIII), and CP000065 (plasmid AYWB-pIV). More detailed information on the AY-WB genome is available on our website (http://www.oardc.ohio-state.edu/phytoplasma).
General genomic features. The AY-WB genome is composed of one circular chromosome of 706,569 bp (Fig. 1A) and contains two rRNA gene operons, 31 tRNA genes, and 671 predicted ORFs (Table 1). UGA was used as a stop codon for the prediction of the ORFs. This is consistent with other reports showing that acholeplasmas and phytoplasmas retained UGA as a stop codon, unlike SEM branch mollicutes, which use UGA as a tryptophan codon instead of a stop codon (80). This is also in agreement with annotations conducted for OY-M (76). Our results were not in agreement with a previous report that stated that UGA should be considered as a tryptophan codon in phytoplasmas, as in mycoplasmas (64). The average guanine (G) and cytosine (C) content of the AY-WB chromosome is 27%. The genome has an irregular GC-skew pattern that differs from that of most prokaryotic genomes, in which the GC skew usually shows two major shifts, one near the origin of replication and one near the terminus of replication (35). Irregular GC-skew patterns were also found in the genomes of some other bacteria, such as Wolbachia pipientis (97) and Mycoplasma mycoides (95). Because the location of the origin of replication (oriC) was not clear, the first nucleotide of the dnaA gene was assigned as bp 1. However, oriC is most likely located upstream of dnaA, as predicted by Oriloc software (32) and by the opposite direction of ORFs surrounding the putative oriC (Fig. 1A) (35).
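As a rough illustration of the quantities discussed in this paragraph, the following sketch computes GC content and a windowed GC skew, (G − C)/(G + C), together with its running sum. The function names, the window size, and the toy sequence are assumptions made for illustration only; this is not the Oriloc method and not the procedure used to draw Fig. 1A.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq, window=1000, step=1000):
    """Return a list of (window_start, skew), with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    result = []
    # max(..., 1) ensures that a sequence shorter than one window still yields one value.
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        result.append((start, skew))
    return result


def cumulative_skew(skews):
    """Running sum of the per-window skew values."""
    total = 0.0
    running = []
    for _, skew in skews:
        total += skew
        running.append(total)
    return running


# Toy sequence (hypothetical): a G-rich half followed by a C-rich half.
toy = "G" * 10 + "C" * 10
windows = gc_skew(toy, window=10, step=10)
```

In a genome with a regular pattern, the running sum typically has one minimum and one maximum, which is why the extremes of cumulative skew are commonly used as hints for the origin and terminus; an irregular pattern such as the one described for AY-WB would not give such a clean curve.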
In addition to the chromosome, four small circular plasmids were identified (Fig. 1B and Table 2). This was surprising, because the DNA isolation procedure should not allow the isolation of small DNAs. One explanation for this discrepancy is that the plasmids are present at high copy numbers in the phytoplasma cell, so that some plasmid DNA was copurified from the PFGE gel along with the AY-WB chromosomal DNA. The plasmids contain a total of 22 putative ORFs, and their average GC contents ranged from 21.8% to 25.6%. Each plasmid has genes for a replication initiation protein (Rep) and a single-stranded DNA-binding protein (SSB) that are involved in rolling-circle amplification (45), whereas the functions of the other genes are not known. However, most of the plasmid genes were predicted to encode secreted or membrane proteins (Fig. 1B), and except for ORF pIII02 of AYWB-pIII and pIV06 of AYWB-pIV, all genes are similar to OY-M phytoplasma sequences (Table 2). It is striking that whereas the plasmids encode different Rep proteins, they contain paralogous genes in similar orders (Fig. 1B). Two AY-WB plasmids (AYWB-pI and AYWB-pIII) contain repA genes similar to geminivirus repA, whereas the rep genes of the other two plasmids (AYWB-pII and AYWB-pIV) were unique to AY-WB and OY phytoplasmas.
The AY-WB plasmids seem prone to mutation. First, ORFs pIII04 and pIII05 of AYWB-pIII are similar to the 5′ and 3′ portions, respectively, of paralogous genes on the other three plasmids, suggesting that a mutation to a stop codon produced two ORFs in AYWB-pIII. Furthermore, the sequence between pII03 and ssb of AYWB-pII is similar to genes pI04, pIII06, and pIV04 of the other three plasmids but was not annotated as an ORF because of the presence of a premature stop codon. In addition, plasmids apparently recombine with the chromosome, as the latter contains three truncated ORFs similar to the geminivirus-like repA plasmid genes and one truncated copy similar to the rep gene (Fig. 1C).
Repetitive and mobile DNA in the AY-WB genome. The AY-WB genome contains long repeating units of DNA. Of the 671 predicted ORFs of AY-WB, 191 (28%) ORFs, covering 97,374 bp (13.8%) of the AY-WB chromosome, are present as multiple copies (Fig. 2A). Of these 191 ORFs, 134 (20%), covering 71,979 bp (10.2%) of the chromosome, are organized as clusters, consisting of genes encoding transposases (tra5), DNA primases (dnaG), DNA helicases (dnaB), thymidylate kinases (tmk), Zn-dependent proteases (hflB), DNA-binding proteins HU (himA), single-stranded DNA-binding proteins (ssb), and specialized sigma factors (sigF) and a number of other genes with unknown function (Fig. 3). Many of these hypothetical proteins are predicted to target phytoplasma membranes (Fig. 1 and 3 and Table 3) and are therefore likely involved in AY-WB interactions with plant and insect hosts.
The phytoplasma tra5 insertion sequences (ISs) belong to the IS150 group of the IS3 family (53, 58). The presence of tra5 ISs and other genes involved in recombination and repair, such as himA, suggests that these clusters are mobile elements, and hence, they were named PMUs. PMU1 is flanked by a complete tra5 IS on one side and a truncated tra5 IS on the other side as well as by inverted repeats (IRs) of 327 bp (Fig. 3A). Sequences highly similar to the PMU1 inverted repeats were also found adjacent to the tra5 ISs of the other three PMUs (Fig. 3A). Another striking observation is that all PMUs contain copies of dnaG, dnaB, ssb, and tmk that are involved in DNA replication, suggesting that the PMUs may transpose in a replicative fashion.
The AY-WB genome also contained several clusters that look like derivatives of PMUs, as they contained truncated versions of PMU ORFs with gene orders similar to those of PMUs. It is likely that these PMU-like clusters are in the process of being eliminated. Based on the positions of the tra5 insertion sequences, the PMUs or PMU-like ORF clusters are present in at least seven locations in the AY-WB chromosome (Fig. 1A). At three locations in the AY-WB genome, PMUs are located adjacent to each other. The largest PMU-rich region of the AY-WB chromosome is ~75,000 bp (Fig. 1A), including PMU1 and PMU2 (Fig. 3A).
Not all dnaG, dnaB, tmk, hflB, himA, and ssb genes are part of PMUs or PMU-like clusters. As discussed above, several ssb genes are located on plasmids or in plasmid-derived sequences within the chromosome (Fig. 1B and C). The AY-WB chromosome also contains single copies of dnaG, dnaB, tmk, himA, and hflB homologs, which are clearly different in sequence from the PMU genes. Furthermore, AY-WB contains several multicopy sequences that are not part of PMUs, including one complete copy and several truncated copies of uvrD and dam.
Comparative genome analysis of phytoplasmas. The AY-WB chromosome is 154,062 bp smaller than that of OY-M, and AY-WB has 83 fewer ORFs than OY-M (Table 1). This difference in genome size is largely the result of a lower number of multicopy genes in AY-WB compared to OY-M (Fig. 2A). OY-M multicopy genes are also organized in PMUs. The AY-WB genome contains 97,374 bp (13.8%; 191 ORFs) of multicopy sequences compared to 195,035 bp (22.7%; 268 ORFs) for OY-M, and the majority are clustered in PMUs, with 71,979 bp (10.2%; 134 ORFs) for AY-WB and 121,226 bp (14.1%; 175 ORFs) for OY-M. Thus, of the 154,062-bp difference in genome size between AY-WB and OY-M, 97,661 bp is due to fewer multicopy genes in AY-WB. The percentages of noncoding DNA are similar between AY-WB and OY-M, but because the OY-M genome is larger, OY-M noncoding DNA accounts for an additional 55,728 bp of the genome size difference between AY-WB and OY-M (Fig. 2A). As expected based on these observations, the numbers of single-copy ORFs are similar between the phytoplasmas, with 432,553 bp (61.2%; 482 ORFs) for AY-WB and 433,226 bp (50.3%; 486 ORFs) for OY-M (Fig. 2A).
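The bookkeeping in this paragraph can be checked with a short calculation. The sketch below uses only the base-pair figures quoted in the text; the OY-M chromosome size is derived here as the AY-WB size plus 154,062 bp, and noncoding DNA is taken to be whatever is neither multicopy nor single-copy ORF sequence. Both are assumptions of this sketch, and the variable and function names are illustrative.

```python
# Values in bp from the text. The OY-M genome size is inferred (AY-WB + 154,062 bp).
AYWB = {"genome": 706_569, "multicopy": 97_374, "pmu": 71_979, "single_copy": 432_553}
OYM = {"genome": 706_569 + 154_062, "multicopy": 195_035, "pmu": 121_226, "single_copy": 433_226}


def noncoding(g):
    """Assumption: remainder after multicopy and single-copy ORF sequence."""
    return g["genome"] - g["multicopy"] - g["single_copy"]


def percent(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


def size_difference_breakdown(small, large):
    """Split the genome size difference into its three components."""
    return {
        "total": large["genome"] - small["genome"],
        "multicopy": large["multicopy"] - small["multicopy"],
        "single_copy": large["single_copy"] - small["single_copy"],
        "noncoding": noncoding(large) - noncoding(small),
    }


breakdown = size_difference_breakdown(AYWB, OYM)

if __name__ == "__main__":
    print(breakdown)
    print(percent(AYWB["multicopy"], AYWB["genome"]), percent(OYM["multicopy"], OYM["genome"]))
```

Under these assumptions the three components (97,661 bp multicopy, 673 bp single copy, and 55,728 bp noncoding) add up to the full 154,062-bp difference, which matches the figures given in the paragraph.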
The alignment of the AY-WB and OY-M genomes has an X-shaped pattern, illustrating synteny of the majority of AY-WB and OY-M sequences but an inverse orientation of large genome segments (Fig. 2C). In both AY-WB and OY-M, the largest aligned region is ~250 kb and starts with the lplA gene at 423,992 bp in AY-WB and 354,087 bp in OY-M and ends with glnQ at 660,824 bp in AY-WB and 103,752 bp in OY-M (Fig. 2C, arrowheads). This region is upstream of the putative oriC in AY-WB but downstream of the putative oriC in OY-M. In both AY-WB and OY-M, these ~250-kb regions contain the majority of the metabolic genes and do not contain tra5 insertion sequences (Fig. 1A).
The PMUs tend to congregate, as evidenced by the groups of ISs, and are frequently located on opposite strands, as can be noticed by the correlation of GC-skew inflection points and the boundaries of sense-antisense regions as well as tra5 insertion sequences in the AY-WB chromosome (Fig. 1A). The alignment of the AY-WB and OY-M chromosomes revealed that PMUs or PMU-like sequences at six locations in the AY-WB chromosome are also present at the same locations in the OY-M chromosome. However, at three locations, the sequences in AY-WB or OY-M have undergone excessive deletion and mutation events. PMU sequences at one location in the AY-WB chromosome and four locations in the OY-M chromosome are unique to each of the phytoplasmas. Like AY-WB, the OY-M genome contains several genes that are not part of PMUs, including two full-length and several truncated copies of dam and three full-length and several truncated copies of uvrD. Our observations are consistent with those of others, as Oshima et al. (76) previously reported that the OY-M genome contains multiple copies of uvrD, hflB, tmk, dam, and ssb, constituting 18% of the total genes.
Besides the PMUs and other multicopy sequences, other differences between AY-WB and OY-M were found. Strikingly, AY-WB lacks most sequences that are truncated in OY-M (Fig. 2B), including hsdR and hsdM of the type I restriction modification system, three adjacent fragments with similarities to recA, and two adjacent sequences of the sucP gene for sucrose phosphorylase (EC 2.4.1.7). AY-WB also lacks genes that are part of incomplete pathways in OY-M, including rfaG (EC 2.4.1.157) of the glycerolipid metabolism pathway and pdxK (EC 2.7.1.35) of the vitamin B6 pathway. Finally, whereas AY-WB lacks folC (EC 6.3.2.17) and has truncated versions of folK (EC 2.7.6.3) and folP (EC 2.5.1.15), OY-M has full-length copies of these genes that belong to the folate biosynthesis pathway. Only a few AY-WB ORFs with functional annotations were absent from OY-M (Fig. 2B). These include cbiQ and evbH of the cobalt and multidrug ATP-binding cassette (ABC) transporter systems, respectively (Table 4). However, OY-M has chromosome fragments with similarities to cbiQ and evbH, but ORFs were not assigned. Except for these sequences, a high degree of gene content conservation was observed between the genomes of AY-WB and OY-M, including major metabolic pathways and ABC and P-type ATPase transporters (76) (Tables 4 and 5).
Comparative genomics of phytoplasmas and other mollicutes. To determine to what extent phytoplasma genomes differ from the distantly related SEM clade mollicutes, ORF sequences of the AY-WB and OY-M phytoplasmas were compared to those of nine Mycoplasma and Ureaplasma spp. (blastp; E value, <1e−5). More than half of the phytoplasma ORFs had similarities to those of SEM clade mollicutes, and AY-WB and OY-M had an equal number of unique phytoplasma ORFs (318 ORFs) (Fig. 2D). Relative to OY-M, AY-WB contained fewer ORFs that were present in several but not all SEM branch mollicutes (146 ORFs for AY-WB versus 214 ORFs for OY-M) (Fig. 2D). The ~250-kb segment between the lplA and glnQ genes that is syntenic between the AY-WB and OY-M phytoplasmas (Fig. 2C) contained the majority of the ORFs conserved among mollicutes (Fig. 1, blue patches in ring 5), while the less syntenic region (the first 400 kb of the AY-WB genome) (Fig. 2C) is repeat rich (Fig. 1 [IS element ring 4] and 2C) and is more enriched with phytoplasma-specific ORFs (Fig. 1, red patches of ring 5).
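The E-value cutoff used in this comparison can be illustrated with a small filter over tabular BLAST results. This is a sketch under stated assumptions: it expects the 12-column tab-separated layout produced by BLAST+ with `-outfmt 6`, in which the E value is the 11th column, and the ORF identifiers and hit lines shown are hypothetical. It is not the pipeline used for Fig. 2D.

```python
def queries_with_hits(lines, max_evalue=1e-5):
    """Return the set of query IDs with at least one hit below max_evalue.

    Assumes BLAST+ tabular output (-outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    hits = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        if float(fields[10]) < max_evalue:
            hits.add(fields[0])
    return hits


# Hypothetical hit table: orf_A passes the cutoff, orf_B does not (5e-05 is not < 1e-5).
example = [
    "orf_A\tsubject_1\t45.0\t200\t100\t2\t1\t200\t1\t200\t5e-21\t100",
    "orf_B\tsubject_2\t30.0\t150\t95\t4\t1\t150\t3\t152\t5e-05\t40",
]
conserved = queries_with_hits(example)
```

ORFs that never appear in the returned set for any of the compared genomes would then be candidates for the phytoplasma-specific category.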
Of the 318 ORFs that are unique for phytoplasmas in the class Mollicutes, 40 had functional annotations and were closely examined (Table 6), since these may be part of metabolic pathways absent from SEM branch mollicutes. These 40 ORFs include sfcA for NAD-specific malic enzyme (EC 1.1.1.38) and two copies of the malate/citrate-sodium symporter gene citS. Phytoplasmas have a maltose ABC transporter system, including a maltose-binding protein (MalE) (Table 4) and several other transporters that are not present in the SEM clade mollicutes (Table 6). These include several components of the art and gln ABC transporter systems that might be important for the import of arginine and glutamine, respectively, and several solute-binding proteins, including ArtI, which is predicted to bind arginine (39); the dipeptide binding protein and d-aminopeptidase DppA (20); and NlpA lipoprotein (98), for which the gene is located between methionine ABC transporter genes and which hence may be a methionine binding protein (Table 4). Phytoplasmas also have mntB and znuA of the manganese (Mn) and zinc (Zn) ABC transporter system (15) (Table 6). All the solute-binding proteins were predicted to have signal peptides (SignalP v3.0) (14) and are likely extracellular lipoproteins (38). Two ABC transporters have adjacent genes for thermostable carboxypeptidase 1 (EC 3.4.17.19) and oligoendopeptidase F (EC 3.4.24.−) that can process imported peptides and that were not present in the genomes of SEM branch mollicutes (Table 6). Finally, three AY-WB genes were annotated as norM, which encodes a Na+-driven multidrug efflux pump. One norM gene had similarity to genes of SEM mollicutes, whereas the other two did not. These two are located adjacent to each other and are transcribed in opposite directions in both the AY-WB and OY-M genomes.
Other genes present in AY-WB and OY-M but absent from SEM branch mollicutes are pssA and psd (Table 6) of the phosphatidylethanolamine pathway (63). Furthermore, mycoplasmas lack pcnB encoding poly(A) polymerase (EC 2.7.7.19) and pnp encoding polyribonucleotide nucleotidyltransferase (EC 2.7.7.8). Both are involved in the regulation of mRNA stability. Interestingly, the pnp gene is present in the genome of S. kunkelii (9), which is also an insect-transmitted plant-pathogenic mollicute. Polyribonucleotide nucleotidyltransferase may be involved in the persistent infection of insects and/or adaptation to diverse hosts and habitats of phytoplasmas and spiroplasmas (9). The adjoining phytoplasma genes pmbA and tldD were not identified in SEM branch mollicutes either. PmbA and TldD regulate DNA gyrase function and are involved in protein maturation (3, 70, 83).
Compared to other mollicutes, phytoplasmas lack several essential transporters and pathways. AY-WB and OY-M lack phosphoenolpyruvate:sugar phosphotransferase (PTS) systems for the import of sugars essential for glycolysis. AY-WB and OY-M also lack F-type ATP synthases. This is in contrast to mycoplasmas and ureaplasmas that have ATP synthase complexes, including the A, B, and C subunits for the transmembrane channel and the five-subunit (alpha, beta, gamma, delta, and epsilon) catalytic core for ATP synthesis, and can use the transmembrane potential for ATP synthesis (80). However, phytoplasmas have five genes encoding P-type ATPases (Table 5) that may generate electrochemical gradients over the membrane.
Phytoplasmas have fewer genes in the standard recombination pathway and SOS response in comparison to SEM branch mollicutes. All mollicutes sequenced so far lack recB, recC, recD, recG, and ruvC of the recombination pathway and recN, recO, recQ, and recR of the SOS response, although some mycoplasmas carry recR and recO. Thus, SEM branch mollicutes have recA, recU, ssb, polA, gyrA, gyrB, ruvA, and ruvB, a rudimentary set of genes that permit homologous recombination. Of these, phytoplasmas do not have recA, ruvA, and ruvB. Hence, phytoplasmas have a deficient homologous recombination machinery.
AY-WB virulence. The AY-WB genome was analyzed for similarities to known bacterial virulence factors. Several putative hemolysins of AY-WB were identified based on annotation. These include a protein annotated as HlyC, a putative hemolysin III. This protein belongs to the integral membrane protein family (Pfam domain number PF03006), which includes a protein with hemolytic activity from Bacillus cereus. However, other proteins in this family play a role in lipid and phosphate metabolic pathways. Another putative hemolysin-related protein of AY-WB was annotated as TlyC, which carries resemblance to cluster of orthologous group 1253 of hemolysins and related proteins containing CBS domains. Indeed, AY-WB TlyC contains a CBS domain (Pfam domain number PF00571). However, the AY-WB TlyC protein has an N-terminal transmembrane region (Pfam domain number PF01595) not found in TlyC proteins and a C-terminal domain that is present in the C terminus of Na+/H+ antiporters, including CorC, which is involved in magnesium and cobalt efflux (Pfam domain number PF03471). Thus, it is not clear whether HlyIII and TlyC of AY-WB are hemolysins.
Two AY-WB proteins, AYWB_084 and AYWB_352, are similar to the Legionella pneumophila virulence factor IcmE (E values of 5e−21 and 5e−05, respectively), which is part of the type IVB secretion system apparatus that translocates bacterial proteins into host cells (87). Proteins with similarities to IcmE were also identified in the OY-M genome (76). IcmE has sequence similarity to plasmid genes involved in conjugation (87). In both AY-WB and OY-M, the majority of the icmE-like sequences were located upstream of the ATP-dependent helicase gene uvrD. UvrD belongs to the Rep family of helicases, catalyzes ATP-dependent unwinding of double-stranded DNA into single-stranded DNA, and has a role in the recF recombination pathway, methyl-directed mismatch repair, and UvrABC-mediated nucleotide excision repair and replication (36, 67). As with the other repeated sequences, the OY phytoplasma genome contains multiple copies of icmE-like sequences and full-length uvrD, whereas the AY-WB phytoplasma contains only one full-length icmE-like sequence and uvrD and multiple truncated copies of these sequences. Further research should reveal whether the icmE-like sequences of phytoplasmas mediate conjugation or are somehow involved in the recombination pathway. No other similarities of phytoplasma sequences to type III and type IV secretion systems were observed. This may not be surprising, as translocation of virulence factors via type III and type IV secretion systems is more specific for gram-negative bacteria.
AY-WB and OY-M share the genes of the protein export and targeting components of the sec-dependent pathway, including secA, secY, yidC, ffh, ftsY, dnaJ, dnaK, grpE, groES, and groEL and, like SEM branch mollicutes, lack several subunits and the signal peptidases of the protein maturation component, including secB, secG, secF, secE, secD, and signal peptidase I (80). Despite the absence of several components, OY-M phytoplasma has a functional sec-dependent protein translocation system (43). It is possible that some of the many hypothetical proteins have peptidase activities. This confirms previous findings (10, 44) that phytoplasmas have a functional sec-dependent protein translocation system and that the N-terminal signal peptides of proteins are cleaved. Since the closest walled relatives of phytoplasmas are Clostridium, Bacillus, and Streptococcus spp. (phylum Firmicutes), it is possible that, similarly to Streptococcus pyogenes (84), phytoplasmas secrete virulence-related proteins via the sec-dependent pathway.
Both phytoplasma genomes contain several ABC transporters (Table 4). ABC transporters import peptides, amino acids, and nutrients into the cell. They can be virulence factors, and they can deplete essential nutrients from the host and secrete toxins and antimicrobial compounds such as hemolysins (23). Furthermore, solute-binding proteins of ABC transporters are usually secreted lipoproteins that bind external substrate to the cell and deliver the substrate to the ABC transporters and may also be involved in adherence to cell surfaces (4). For instance, the ABC transporter-related solute-binding protein Sc76 of Spiroplasma citri was shown to be involved in the penetration of or multiplication in the salivary gland (17). The AY-WB genome contains genes for five solute-binding proteins with specific solute-binding activities (Table 4). All five solute-binding proteins have N-terminal cleavable signal peptide sequences, as predicted with SignalP v3 software (14), and therefore are secreted via the sec-dependent pathway. Hence, these five solute-binding proteins are putative virulence factors of phytoplasmas.
It is intriguing that phytoplasmas have small genomes that lack many standard metabolic functions but are repeat rich. The repeated DNAs are mostly multicopy genes organized in PMUs. Thus, phytoplasmas are different from other bacterial endosymbionts of insects, e.g., Buchnera and Blochmannia spp., which also have small genomes lacking many standard metabolic functions but have low levels of repeated DNAs (1, 91). On the other hand, the majority of the mollicutes have repeat-rich genomes. All mollicutes are under pressure for genome minimization, and the presence of numerous repeats is therefore highly significant (82). Indeed, it has been shown for several mycoplasmas that repeats engage in recombination events resulting in changes of mosaics of antigenic structures at cell surfaces, essential for evasion of the host immune system and for adaptation to new environments (82). Thus, similarly to mycoplasmas, the repeated DNAs of phytoplasmas probably allow adaptations to different environments. Adaptation is particularly important for phytoplasmas, as their host environments are extremely variable, including the intracellular environments of phloem tissues of plants and guts, salivary glands, and other organs and tissues of insect hosts. Also, phytoplasmas have a broad plant host range. AY-WB alone can infect China aster, lettuce, tomato, Nicotiana benthamiana, and Arabidopsis thaliana. Phytoplasma genomes are different from mycoplasma genomes in several aspects. First, phytoplasmas do not have recA, ruvA, and ruvB and hence appear to lack a functional recombination system. Second, thus far, the organization of repeated DNAs in PMUs (Fig. 3 and Table 3) is unique to phytoplasmas among the mollicutes.
PMUs. The PMUs contain tra5 ISs, which belong to the IS150 group and the IS3 family (53, 58). IS3-type mobile units are found in a number of other mollicutes, for example, IS1138 in Mycoplasma pulmonis, IS1221 in Mycoplasma hyorhinis and Mycoplasma hyopneumoniae, IS1297 in M. mycoides subsp. mycoides, ISMi1 in Mycoplasma fermentans, and one IS3 element in the spiroplasma virus DNA SPV1-C74 sequence of S. citri (58, 66). All of these elements belong to the IS150 subgroup, and it has been demonstrated that some of these elements undergo autonomous transposition (16).
PMU1 of AY-WB is the longest, appears to be the most complete, and has several striking features characteristic of composite transposons (Fig. 3). First, the right and left borders of PMU1 contain long (327-bp) IRs. Furthermore, whereas the ORF to the right is a truncated tra5 sequence, the tra5 sequence at the left can produce a full-length ORFAB fused-frame transposase (58). IS150 can generate circles by joining IRs upon production of the fused-frame transposase (90), and particularly, composite transposons that carry single inverted repeats at the left and right borders form stable circles (48). PMU1 also carries a gene for DNA-binding protein HU (himA), which is a nonspecific binder of DNA but prefers binding to bent, kinked, or altered DNA sequences (31) and has a role in recombination through the joining of distant recombination sites (5). Thus, with the help of transposase and DNA-binding protein HU, the IRs could join to form a circle and induce transposition of PMU1. It is striking that all the genes on PMUs are oriented in the same direction, with sigF, encoding a specialized transcription factor, as the first gene and located downstream of the inverted repeat. In IS3 family members, the adjoined IRs, which are formed on circularization, create a strong hybrid promoter that drives high levels of transposase expression (58). Hence, it is possible that the adjoined 327-bp repeats upon circularization of PMU1 create a strong promoter that drives the transcription of at least part of the PMU genes.
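To make the notion of terminal inverted repeats concrete, the following sketch measures how far the left end of an element matches the reverse complement of its right end. It requires exact matches, which is a simplification (the IRs of the different PMUs are described only as highly similar), and the function names and toy element are hypothetical rather than taken from the AY-WB sequence.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, and T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def terminal_ir_length(element):
    """Length of the exact terminal inverted repeat shared by both ends.

    The left end is compared base by base with the reverse complement of the
    right end; counting stops at the first mismatch or at the midpoint.
    """
    element = element.upper()
    rc = reverse_complement(element)
    limit = len(element) // 2
    n = 0
    while n < limit and element[n] == rc[n]:
        n += 1
    return n


# Toy element (hypothetical): a 6-bp left IR, a spacer, and the matching right IR.
left_ir = "ACGTTG"
toy_element = left_ir + "TTTTTT" + reverse_complement(left_ir)
ir_len = terminal_ir_length(toy_element)
```

For real elements, a tolerance for mismatches or a local alignment of the two ends would be more appropriate than this exact comparison.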
The AY-WB and OY-M genomes also contain evidence that at least some PMUs transpose in a replicative fashion. First, there are multiple copies of PMUs and PMU-like clusters. Second, the PMUs contain full-length dnaB, dnaG, and ssb genes that are involved in DNA replication. DnaB initiates DNA replication (19). It moves along the lagging strand and unwinds the DNA helix for the propagating fork and attracts DnaG for lagging-strand synthesis (93). SSB plays an essential role in DNA replication by stabilizing single-stranded DNA (56). Most PMUs also contain a tmk gene encoding thymidylate kinase that synthesizes dTDP from dTMP for DNA synthesis. Similarly to AY-WB, the OY-M phytoplasma genome contains at least two tmk homologs, tmk-a and tmk-b, with tmk-a being present as multiple copies (68). We found that the tmk-a genes are part of PMUs. However, TMK-b, but not TMK-a, was shown to have thymidylate kinase activity (68). Hence, the function of TMK-a is not yet clear.
Several sigma factor genes were identified in the AY-WB genome. These genes are rpoD, which encodes the standard 465-amino-acid σ70 protein and is present as a single copy on the AY-WB chromosome, and multiple copies of sigF that are located on PMUs or PMU-like gene clusters and have deduced proteins of ~200 amino acids in length. PMU3 contains a sequence with similarity to sigF immediately upstream of the ssb gene, but because of the presence of a premature stop codon, this sequence was not predicted to be an ORF. The OY-M genome also has multiple copies of sigF that are part of PMUs. The N-terminal 100 amino acids of the SigF proteins have region 2 domains (Pfam domain number PF04542) containing both the −10 promoter recognition helix and the primary core RNA polymerase binding determinant. However, the C-terminal 100 amino acids of the SigF proteins do not have similarities to other proteins or domains, including the region 4 domains (Pfam domain number PF04545) containing the −35 promoter-binding element. AY-WB SigF proteins showed the greatest similarity (E value, 10⁻⁶) to the stress response sigma factor [sigma(H)] of Streptomyces coelicolor (50) and the flagellar biosynthesis sigma factor FliA of Pseudomonas putida (46). Expression of SigF and other PMU genes might occur under specific environmental conditions.
Since PMUs contain several genes predicted to encode membrane-targeted sequences, one would expect that expression of PMU genes would result in a change of the phytoplasma membrane surface. In this regard, it is intriguing that the PMUs contain hflB (or ftsH) genes that encode membrane-associated ATP-dependent Zn proteases of ~700 amino acids. These proteins are conserved among bacteria and are involved in membrane-associated processes such as protein secretion (26) and membrane protein assembly (2) as well as adaptations to nutritional conditions and osmotic stress (26, 57).
Genomic plasticity. The irregular GC skews and the presence of large repeated sequences (PMUs) in the AY-WB and OY-M genomes are indicative of high genomic plasticity. The correlation between an irregular GC skew and the presence of ISs in mollicute genomes is quite striking. For instance, M. mycoides has an irregular GC skew, and 13% of the genome size consists of ISs (95), whereas Mycoplasma mobile has a regular GC skew and no ISs (42). It should be noted, however, that although AY-WB does not have a significant consistent GC skew, it may have another kind of significant skew or excess, including AT skew and purine excess or keto excess (89).
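For readers unfamiliar with the measure, GC skew is conventionally computed as (G − C)/(G + C) over windows along one strand; in a genome with a regular skew, the sign tends to switch near the origin and terminus of replication, so the cumulative skew shows a single clear extremum. The Python sketch below is a minimal illustration under stated assumptions: the function names, the non-overlapping window scheme, and the toy sequence are hypothetical and do not reproduce the analysis parameters used for AY-WB and OY-M.

```python
def gc_skew(seq):
    """GC skew (G - C) / (G + C) of one sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq, window):
    """GC skew of consecutive non-overlapping windows.

    A trailing partial window, if any, is dropped.
    """
    return [
        gc_skew(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


def cumulative_gc_skew(seq, window):
    """Running sum of the windowed skews along the sequence."""
    total = 0.0
    running = []
    for value in windowed_gc_skew(seq, window):
        total += value
        running.append(total)
    return running


# Hypothetical toy sequence: a G-rich half followed by a C-rich half,
# mimicking the single sign switch of a regular skew.
toy = "GGGC" * 5 + "GCCC" * 5
skews = windowed_gc_skew(toy, 4)
cumulative = cumulative_gc_skew(toy, 4)
peak_window = cumulative.index(max(cumulative))
```

In this toy case the cumulative skew rises to one maximum and then falls; an irregular skew of the kind described above would instead produce several local extrema or no clear trend.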
Phytoplasma genomic plasticity is also evidenced by the differences in genome sizes and compositions between members of “Ca. Phytoplasma asteris,” ranging from 660 to 1,130 kb and consisting of several fragments of 500 kb and larger (61; our personal observation). Since PMUs can form large clusters that may locate in different sections of the chromosome, it is likely that they are also capable of splitting a single chromosome into two smaller chromosomes. Furthermore, results reported herein show that AY-WB and OY-M differ by ~154 kb in genome size, mainly because of a difference in PMUs and other multicopy sequences (Fig. 2A).
Despite the phytoplasma genomic plasticity, the majority of the AY-WB and OY-M genomes are syntenic (Fig. 3C). Scatter plots of conserved sequences between the AY-WB and OY-M genomes show an X-shaped pattern with symmetry around the tentative oriC and two other locations at approximately opposite ends of oriC (Fig. 2C). This X-shaped pattern or X alignment is common in genome comparisons of closely related bacterial species and is most likely due to the occurrence of large inversions that rotate around oriC and the terminus of replication (28). The breakpoints of the inversions between the AY-WB and OY-M genomes are, as expected, at PMU-like regions and repeated uvrD sequences.
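The origin of the X-shaped pattern can be illustrated with a simplified model. If a segment centered on oriC is inverted in one of two otherwise collinear circular genomes, a conserved sequence at coordinate x inside the segment maps to coordinate L − x, where L is the genome length, whereas sequences outside the segment keep their coordinate. Plotted against each other, the unchanged positions fall on the diagonal and the inverted positions on the anti-diagonal, which together form an X. The Python sketch below encodes this model; the function names, the equal genome lengths, the placement of oriC at coordinate 0, and the numbers used are all assumptions for illustration and do not represent the actual AY-WB and OY-M comparison.

```python
def invert_around_origin(position, genome_length, half_width):
    """Coordinate of a site after inverting the segment that spans
    half_width on either side of the origin (coordinate 0) of a
    circular genome. Sites outside the segment are unchanged."""
    distance_to_origin = min(position, genome_length - position)
    if distance_to_origin <= half_width:
        return (genome_length - position) % genome_length
    return position


def classify_point(x, y, genome_length):
    """Say where a dot-plot point (x, y) lies relative to the two axes of the X."""
    if x == y:
        return "diagonal"
    if y == (genome_length - x) % genome_length:
        return "anti-diagonal"
    return "off-axis"


# Hypothetical example: a 1,000-unit genome with an inversion that
# extends 200 units to each side of the origin.
genome_length = 1000
points = [
    (x, invert_around_origin(x, genome_length, 200))
    for x in range(0, genome_length, 100)
]
labels = [classify_point(x, y, genome_length) for x, y in points]
```

In this model, nested inversions of different sizes around the same axis would place further points on both arms of the X, which is the pattern seen when closely related genomes are compared.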
There are probably two reasons for the good alignment of the AY-WB and OY-M genomes. First, we already observed that the PMUs tend to congregate. This is consistent with findings that IS150 frequently transposes into target regions resembling its IR (58, 74). Thus, transposition will predominantly affect certain areas of the phytoplasma genomes, and hence, the synteny in the rest of the genome can be maintained. Second, because of the absence of recA, ruvA, and ruvB, rearrangements between PMUs through homologous recombination are likely to occur at lower frequencies than in genomes with RecA-dependent homologous recombination machineries (78, 79).
Variations in the presence of recA are common among insect-associated mollicutes (65). Truncated recA genes were found in six Spiroplasma citri strains, which, like phytoplasmas, are insect-transmitted plant pathogens, and five Spiroplasma melliferum strains, which are pathogens of bees (59). In S. citri, only the first 390 nucleotides at the 5′ end of recA are present, whereas in S. melliferum, the full-length recA gene is interrupted by a TAA stop codon. Intriguingly, truncated and full-length RecA polypeptides were observed in a proteomic study of S. melliferum (21). These findings suggest that recA sequence variation among insect-associated mollicutes is of biological significance. RecA has an important function in mycoplasmas. Deletion of recA is lethal for M. pulmonis (80). RecA is probably essential for homologous recombination between repeated lipoprotein and adhesin genes, which results in a change of mosaics of antigenic structures at the bacterial surface, with subsequent evasion of the host immune response (80, 82). Thus, it seems that phytoplasmas and spiroplasmas can adapt to their hosts with less efficient homologous recombination systems, and the loss of RecA function might then be beneficial for increasing genome stability. This is supported by the observations that, like phytoplasmas, spiroplasmas have highly repeat-rich genomes mainly due to phage-derived sequences (80). On the other hand, M. mycoides, which also has a repeat-rich genome and is a ruminant pathogen, has a full-length recA (47).
Reductive evolution. In general, AY-WB seems further along in the reductive evolution process than OY-M. First, AY-WB phytoplasma contains fewer PMU insertions, and the ORFs in AY-WB PMUs are more frequently truncated or deleted. Second, AY-WB lacks genes that are truncated in OY-M, including asnB, hsdR, hsdM, recA, and sucP. Third, AY-WB lacks genes of incomplete pathways in OY-M, including rfaG of the glycerolipid metabolism pathway and pdxK of the vitamin B6 pathway. Furthermore, unlike OY-M, AY-WB does not have folC, and OY-M has full-length folK and folP genes that are truncated in AY-WB. The folK and folP genes were also identified as pseudogenes in clover phyllody (CPh) phytoplasma (“Ca. Phytoplasma asteris”) (24), suggesting that OY-M may be capable of de novo folate synthesis, whereas AY-WB and CPh have to import folate from host cells. Similarly to CPh (24), the folK and folP sequences of AY-WB and OY-M are flanked by gcp, which encodes a glycoprotease, and two ORFs encoding a DegV family protein and a 24-kDa lipoprotein (AYWB_245) (24). Hence, the gene organization of this part of the genome is conserved among “Ca. Phytoplasma asteris” members. Final evidence that AY-WB is further down the reductive evolutionary path is provided by the observation that relative to OY-M, AY-WB contains fewer ORFs that are shared by several but not all mollicutes (146 ORFs for AY-WB versus 214 ORFs for OY-M) (Fig. 2D).
Plasmids. We identified four plasmids in AY-WB. Plasmids have been detected in a number of other phytoplasmas (55, 73). Each AY-WB plasmid contains two genes involved in rolling-circle replication and two to six ORFs with unknown function, several of which were predicted to target the AY-WB membrane, suggesting that the plasmids are involved in AY-WB association with the plant and insect hosts. Indeed, the RepA proteins of OY-M phytoplasmas were detected in infected plants (71), indicating that the plasmid genes are expressed during infection of the plant. Furthermore, spontaneous OY-M mutants that lack ORFs on a plasmid and are not insect transmissible were isolated (72).
Interestingly, two AY-WB plasmids (AYWB-pI and AYWB-pIII) contain repA genes similar to geminivirus repA, whereas the rep genes of the other plasmids were unique to AY-WB and OY-M phytoplasmas. Geminivirus-like repA genes in OY-M (75) and more distantly related phytoplasmas (55, 81) were also identified. Like phytoplasmas, geminiviruses are insect-transmitted plant pathogens and have to pass through the gut epithelium, hemolymph, and salivary gland cells of the insect vectors before returning to the plant (22). Phytoplasmas and geminiviruses have overlapping plant and insect host ranges. Hence, it is possible that phytoplasmas acquired the repA genes from geminiviruses through horizontal exchange. On the other hand, it has been hypothesized that geminiviruses originated from bacterial plasmids (49). Plasmids with similar repA genes are generally incompatible, and therefore, it is likely that the four plasmids are not present in one AY-WB cell but represent the plasmid content of the AY-WB population present in plants from which the AY-WB DNA was isolated.
The variation among the AY-WB plasmids suggests that they are prone to frequent mutations. This is consistent with other findings. OY-M has plasmids ranging from ~3 to ~7 kb in size (Fig. 1B) (73), and the plasmids of beet leafhopper-transmitted virescence phytoplasma range from ~2.5 to ~11 kb (55). There is high variability of the occurrence of ORFs in the plasmids of 30 beet leafhopper-transmitted virescence phytoplasma strains (55). There is also evidence of intramolecular recombination among phytoplasma plasmids (55, 73). We show that they can also recombine with the chromosome (Fig. 1C).
Phytoplasma metabolism. With a few exceptions described above, the AY-WB metabolic pathways are similar to those of OY-M that have been described elsewhere (76) and will not be discussed in detail here, although a few findings need more emphasis. Phytoplasma metabolism differs in several ways from that of SEM branch mollicutes. This was expected, because phytoplasmas have not yet been grown in cell-free culture media, including mycoplasma culture media. Unlike SEM branch mollicutes, phytoplasmas do not have PTSs to import sugars and to generate glucose-6-phosphate to feed the glycolysis pathway. Thus, phytoplasmas are clearly different from the insect-transmitted plant-pathogenic S. citri and S. kunkelii, which have three PTSs for the import of fructose, glucose, and trehalose (7). In contrast, phytoplasmas possess ABC transporters for the import of maltose. The maltose-binding protein (MalE) (Table 4) may have affinity for maltose, trehalose, sucrose, and palatinose (88). Affinity of MalE for trehalose is likely, as trehalose is a major sugar in the insect hemolymph. The fate of these sugars after import is not clear, because enzymes required for the conversion of these sugars to glucose-6-phosphate for glycolysis were not found in the phytoplasma genomes, and the sucrose phosphorylase gene, which is important for sucrose degradation, is fragmented in the OY-M phytoplasma genome (76) and is completely absent from the AY-WB phytoplasma genome (Table 6). Generally, the genomes of AY-WB and OY-M phytoplasmas harbor significantly fewer carbohydrate transport and metabolism genes than their mycoplasma counterparts. Even in the 580-kb genome of M. genitalium, 26 carbohydrate transport and metabolism genes were identified (33). In contrast, only 19 genes are present in the 860-kb OY-M phytoplasma genome (76), and 16 genes are present in the 706-kb AY-WB phytoplasma genome.
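The comparison of carbohydrate transport and metabolism gene counts becomes clearer when the counts are normalized to genome size. The Python sketch below performs that arithmetic with the gene counts and genome sizes quoted in the preceding paragraph; the normalization to genes per 100 kb and the variable names are choices made here for illustration and are not part of the original analysis.

```python
# (carbohydrate transport and metabolism genes, genome size in kb),
# taken from the counts and sizes quoted in the text.
genomes = {
    "M. genitalium": (26, 580),
    "OY-M": (19, 860),
    "AY-WB": (16, 706),
}


def genes_per_100_kb(gene_count, genome_size_kb):
    """Gene count normalized to 100 kb of genome sequence."""
    return 100.0 * gene_count / genome_size_kb


densities = {
    name: round(genes_per_100_kb(count, size_kb), 2)
    for name, (count, size_kb) in genomes.items()
}

for name, density in densities.items():
    print(f"{name}: {density} genes per 100 kb")
```

On these figures, M. genitalium carries about 4.5 such genes per 100 kb, roughly twice the density of either phytoplasma genome (about 2.2 to 2.3 per 100 kb).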
Unlike SEM branch mollicutes, phytoplasmas have a NAD-specific malic enzyme (EC 1.1.1.38) and malate/citrate-sodium symporter genes. Thus, like symbiotic Rhizobium spp. (77) but unlike sequenced SEM branch mollicutes, phytoplasmas may use malate as a carbon source. The use of malate is advantageous, because it is readily available in the cytoplasm of host cells, and it can serve as the sole energy source for bacteria by conversion to oxaloacetate and pyruvate (27, 77). Furthermore, metabolism of malate saves energy (27), which is important, because phytoplasmas lack ATP synthases, and hence, the capacity to generate energy in phytoplasmas seems limited to glycolysis (starting with glucose-6-phosphate).
Unlike SEM clade mollicutes, phytoplasmas appear to be capable of biosynthesis of their own membrane phospholipids. The genomes of AY-WB, OY-M (76), and Western X-disease phytoplasma (54) contain the pssA and psd genes (Table 6) encoding CDP-diacylglycerol-serine-O-phosphatidyltransferase (EC 2.7.8.8) and phosphatidylserine decarboxylase (EC 4.1.1.65), respectively. Both are part of the phosphatidylethanolamine pathway (63). Furthermore, the AY-WB and OY-M genomes contain a candidate pmt gene for phospholipid N-methyltransferase (Table 6) that is involved in phosphatidylcholine synthesis in conjunction with PssA and Psd (63). This is consistent with phytoplasmas being phylogenetically more closely related to acholeplasmas (4), which do not require exogenous phospholipids, whereas SEM branch mollicutes are sterol and fatty acid auxotrophs (80). AY-WB and OY-M also have all enzymes that link the glycolysis pathway to the glycerolipid pathway (76) and an ABC transporter gene, phnL, involved in lipoprotein release (Table 4).
Summary. Phytoplasmas have intriguing genomes that are small and contain many multicopy sequences mainly organized as PMUs. The AY-WB genome is ~154 kb smaller than the OY-M genome, primarily as a result of fewer multicopy sequences. Thus, expansions or reductions of PMUs play a major role in phytoplasma genome evolution. At least one PMU, PMU1, has the characteristics of a replicative composite transposon. PMUs contain genes for specialized sigma factors and membrane proteins, providing evidence that PMUs are important for phytoplasma interactions with the environment. Since phytoplasmas lack recA and other standard homologous recombination functions, it is unlikely that phytoplasmas generate antigenic variation of membrane proteins through RecA-dependent homologous recombination. We propose that the regulation of expression of PMU genes is one of the strategies phytoplasmas use to adapt to different environments. Expression of PMU genes might occur through a process that involves circularization and replicative transposition. In addition, genome rearrangements through expansions and deletions of PMUs might increase the chance of phytoplasma adaptation to diverse hosts and can be a major evolutionary factor allowing phytoplasmas to occupy broad plant host ranges or to adapt to different insect vectors. Few genes have similarities to known bacterial virulence factors. Like the related gram-positive bacteria, phytoplasmas may secrete virulence-related proteins via the Sec-dependent pathway. Hence, all the proteins with signal peptides are potential virulence factors, including the five solute-binding proteins of the ABC transporters and proteins derived from plasmids and PMUs. Finally, phytoplasmas have ABC transporters for the import of maltose (or trehalose, sucrose, and palatinose), utilize malate, and can make phospholipids. In contrast, SEM branch mollicutes have PTSs for the import of fructose, glucose, and trehalose, utilize lactate, and are phospholipid auxotrophs.
This work was supported by the National Research Initiative of the USDA Cooperative State Research, Education, and Extension Service, grant 2002-35600-12752, and the Ohio Agricultural Research and Development Center competitive grants program.
We thank former members of the bioinformatics and genome analysis group at Integrated Genomics, including Svetlana Gerdes, Eugene Goltsman, Viktor Joukov, Vinayak Kapatral, Yakov Kogan, Nikos Kyrpides, Andrei Osterman, Olga Ostrovskaya, and Ross Overbeek. We also acknowledge Angela D. Strock, Melanie L. Lewis Ivey, and Jhony Mera for excellent technical assistance.