EvoPrinterHD
 
HGT exchanges between the Escherichia coli O157:H7 and Shigella dysenteriae Sd197 human pathogens



Acquisition of virulence factors and antibiotic resistance by many clinically important bacterial pathogens can be traced to horizontal gene transfer (HGT) events between related or evolutionarily distant microflora. Multi-genome comparative analysis has become an important tool for identifying HGT DNA. We have used the multi-genome alignment tool EvoPrinter to facilitate the discovery of HGT in enteric bacteria genomes; a published report in PLoS ONE describes our analysis of HGT in Staphylococcus aureus. EvoPrinter analysis of 22 enteric bacteria genomes reveals that E. coli O157:H7 shares a close sequence relationship with S. dysenteriae Sd197 that is not shared with other Escherichia coli or Shigella genomes, pointing to putative HGT exchanges between these two human pathogens. We document here two of thirteen examples of sequences shared between E. coli O157:H7 and S. dysenteriae Sd197. The exclusive relationship between these two species suggests that multiple HGT events are responsible. Given that humans are the sole host of Shigella, co-infection of these two organisms in humans is likely to be the condition favoring such multiple HGT events.

We have adapted the web-based comparative genomics tool, EvoPrinterHD (Odenwald et al., 2005; Yavatkar et al., 2008), to enable rapid screening of bacterial DNA to discover sequences that are unique to the input reference DNA or shared with only a small subset of other genomes included in the analysis. EvoPrinterHD is currently formatted for the automated comparative analysis of 17 Staphylococcus, 20 Streptococcus and 22 enteric bacteria genomes. Unlike other multi-genome comparative tools that display columns of aligning bases with gaps to optimize alignments, EvoPrinter provides an uninterrupted representation of conserved, unique or uniquely shared DNA sequences, as they exist in the genome of interest. Because EvoPrinter readouts show only the input reference sequence (up to 40 kb) and not the aligning regions in the other genomes included in the analysis, more bases can be displayed in a single view than is possible with conventional alignments. The following algorithms were developed to identify sequences potentially involved in HGT: (1) an EvoDifferences profile portrays in a single view those sequences that are detected in all but one of the genomes included in the analysis; (2) an EvoUnique profile identifies unique or uniquely shared sequences among two or three of the genomes but absent from others, and (3) input reference DNA exchange, allowing for re-initiation of the comparative analysis using the aligning region of another genome, thus facilitating the search for unique sequence differences among the genomes included in the analysis. EvoPrinterHD also includes algorithms that search the input DNA for duplications and the aligning regions for rearrangements.
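The EvoDifferences and EvoUnique readouts both come down to per-base bookkeeping over which compared genomes align to each reference base. The Python sketch below is a minimal illustration under stated assumptions, not EvoPrinterHD's implementation: it assumes a hypothetical `presence` matrix (one row per compared genome, one Boolean per reference base) has already been derived from the pairwise alignments, and it borrows the EvoUnique color thresholds given in the Figure 1 legend (red for no other genome, green for one, blue for two, gray for three or more).

```python
from typing import List, Optional, Sequence

# Hypothetical input: presence[g][i] is True when reference base i falls inside
# an alignment to compared genome g. Deriving this matrix from real alignments
# is outside the scope of this sketch.


def evo_differences(presence: Sequence[Sequence[bool]],
                    names: Sequence[str]) -> List[Optional[str]]:
    """For each reference base, return the one genome lacking it, else None."""
    labels: List[Optional[str]] = []
    for i in range(len(presence[0])):
        missing = [names[g] for g, row in enumerate(presence) if not row[i]]
        labels.append(missing[0] if len(missing) == 1 else None)
    return labels


def evo_unique(presence: Sequence[Sequence[bool]]) -> List[str]:
    """Color each reference base by how many compared genomes align to it."""
    colors = {0: "red", 1: "green", 2: "blue"}  # 3 or more -> "gray"
    return [colors.get(sum(row[i] for row in presence), "gray")
            for i in range(len(presence[0]))]


# Toy example: three compared genomes, four reference bases.
names = ["A", "B", "C"]
presence = [
    [True, True, False, True],   # genome A
    [True, False, False, True],  # genome B
    [True, True, False, False],  # genome C
]
print(evo_differences(presence, names))  # [None, 'B', None, 'C']
print(evo_unique(presence))              # ['gray', 'blue', 'red', 'blue']
```

In the toy example, base 1 is missing only from genome B, so the EvoDifferences sketch reports B, while base 2 aligns with no other genome and is therefore colored red by the EvoUnique sketch.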

EvoPrinterHD algorithms can be used to discover unique or uniquely shared DNA present in bacterial genomes or their mobile genetic elements (MGEs). The comparative analysis of the bacterial chromosomal and MGE DNAs described in this study was performed using the EvoPrinterHD alignment algorithms with the procedures described online. The analysis of clinically important enteric bacteria genomes demonstrates the utility of this web-accessed comparative tool for discovering putative HGT events. By color coding and presenting an uninterrupted view of the sequence of interest, EvoPrinter analysis allows the user to search up to 40 kb of DNA at base-pair resolution for sequences that are not uniformly present in related genomes.
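The uninterrupted, case-coded readout can be illustrated with a short sketch. It assumes the same kind of hypothetical per-base presence matrix (one row per compared genome) and reproduces only the EvoPrint convention described in the Figure 1 legend: uppercase for bases conserved in every compared genome, lowercase otherwise. The 40,000 bp cap is an assumed reading of the 40 kb input limit, and the function names are illustrative only.

```python
from typing import Sequence

MAX_QUERY_BP = 40_000  # assumed reading of the 40 kb input limit


def render_evoprint(reference: str, presence: Sequence[Sequence[bool]]) -> str:
    """Uppercase bases aligned in every compared genome; lowercase all others."""
    if len(reference) > MAX_QUERY_BP:
        raise ValueError("reference exceeds the assumed 40 kb query limit")
    if any(len(row) != len(reference) for row in presence):
        raise ValueError("each presence row must match the reference length")
    return "".join(
        base.upper() if all(row[i] for row in presence) else base.lower()
        for i, base in enumerate(reference)
    )


def wrap(readout: str, width: int = 60) -> str:
    """Split the uninterrupted readout into fixed-width lines for display."""
    return "\n".join(readout[i:i + width] for i in range(0, len(readout), width))


presence = [
    [True, True, True, False, True, True, True, True],
    [True, True, True, True, True, False, True, True],
]
print(wrap(render_evoprint("ATGCATGC", presence), width=4))
# ATGc
# AtGC
```

Because only the reference sequence is printed, one character per base, the readout stays gap-free regardless of how the other genomes align.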

E. coli O157:H7 belongs to the enterohaemorrhagic pathogenic group because of its expression of Shiga toxins (Unkmeir and Schmidt, 2000) and its ability to adhere to gut epithelial cells (Torres et al., 2005). The primary reservoir of this important food-borne pathogen is cattle (reviewed by Hancock et al., 2001). Previous comparative genomic analyses of the human isolates E. coli O157:H7 Sakai (referred to here as E. coli Sakai) and the closely related E. coli O157:H7 EDL933 (referred to here as E. coli EDL933) with other enteric bacteria have revealed their close evolutionary relationship to another enteric bacterium, the Shigella dysenteriae isolate Sd197 (Shigella Sd197) (Pupo et al., 2000; Yang et al., 2007). In addition, HGT between E. coli and Shigella has been documented: the wbbG gene, encoding a protein responsible for glucose metabolism, was transferred between E. coli O148 and the Shigella dysenteriae type 1 strain (Feng et al., 2007).

To search for additional unique sequence relationships among E. coli O157:H7, Shigella and other enteric bacteria, we generated a series of EvoUnique profiles using the E. coli Sakai and EDL933 isolates as reference genomes. Twenty other enteric bacteria genomes were included in the analysis (listed under Source of Bacteria below). Our EvoUnique profiles revealed a close relationship between E. coli Sakai/EDL933 and Shigella dysenteriae Sd197, as previously noted by Yang et al. (2007) (data not shown).

Our analysis revealed multiple instances of putative HGT events that were not shared with other E. coli or Shigella genomes. For example, an E. coli Sakai genomic EvoPrint of the DNA spanning base pairs 953,551 to 958,320 revealed that the central 3,044 bp region is absent from one or more of the compared genomes, while the accompanying EvoUnique profile indicated that only two of the 21 genomes included in the analysis share homology with E. coli Sakai in this region (Figure 1). Comparison of the eBLAT alignment scores for each genome, displayed on the EvoPrinter scorecard, revealed that the entire central region was present in E. coli EDL933 and that three of its sub-regions were also found in the Shigella Sd197 genome. Database searches revealed that the uniquely shared DNA contains an ORF (Figure 1A) that encodes a 670 amino acid protein of unknown function (data not shown). A nucleotide database homology search with the corresponding Shigella Sd197 DNA revealed that its loss of alignment with the E. coli EDL933/Sakai isolates (indicated by the two blocks of green-colored sequence that are unique to just EDL933 and Sakai in Figure 1B) was due to the insertion of an IS1 element into the Shigella Sd197 genome (data not shown). IS1 is a multi-copy mobile insertion element that is found in many enteric bacteria genomes (reviewed by Ohtsubo and Sekine, 1996).
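The scorecard reasoning above (the central region present in full in one genome but only as three sub-regions in another) amounts to measuring how much of a reference interval each genome's alignment blocks cover. The sketch below assumes hypothetical alignment blocks already expressed as 1-based, inclusive reference coordinates; it is an illustration of interval coverage, not how EvoPrinterHD computes its eBLAT scores.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive reference coordinates


def merge(blocks: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals."""
    merged: List[Interval] = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage(region: Interval, blocks: Iterable[Interval]) -> Tuple[float, int]:
    """Return (fraction of region covered, number of separate covered segments)."""
    lo, hi = region
    # Clip each block to the region and drop blocks that miss it entirely.
    clipped = [(max(s, lo), min(e, hi)) for s, e in blocks if e >= lo and s <= hi]
    segments = merge(clipped)
    covered = sum(e - s + 1 for s, e in segments)
    return covered / (hi - lo + 1), len(segments)


region = (1_001, 4_044)  # hypothetical coordinates for a 3,044 bp region
print(coverage(region, [(950, 4_100)]))  # (1.0, 1)
print(coverage(region, [(1_100, 1_300), (2_000, 2_200), (3_500, 3_600)]))
# about 0.165 of the region, in 3 segments
```

A single spanning block yields full coverage in one segment, whereas three short blocks yield partial coverage in three segments, which mirrors the EDL933 versus Sd197 contrast described above.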

Our EvoPrinter alignments uncovered additional examples of ORFs disrupted by inserted mobile DNA. An EvoDifferences profile of the E. coli EDL933 region spanning 87,419 to 91,588 bp identified Shigella Sd197 as the only one of the 22 genomes included in the analysis to have lost portions of an ORF that encodes a leucine transcriptional activator (Figure 2 and data not shown). Examination of the corresponding Shigella Sd197 sequence revealed that the ORF disruption was most likely due to two different insertions: the 5' portion of the ORF was replaced with DNA that corresponds to the Shigella plasmid pSD1-197, and the 3' region was replaced with sequences identical to the IS1 transposase-encoding gene (data not shown).
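Deciding whether a disruption removed the 5' or the 3' part of an ORF is an interval-difference problem that depends on the strand of the ORF. The following sketch assumes hypothetical ORF and alignment coordinates (1-based, inclusive, on the reference) and a known strand; it lists the unaligned stretches of the ORF and labels each by the end it touches. It illustrates the reasoning only and is not part of EvoPrinterHD.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive reference coordinates


def unaligned_segments(orf: Interval, aligned: List[Interval]) -> List[Interval]:
    """Return the parts of `orf` not covered by any aligned block."""
    lo, hi = orf
    gaps: List[Interval] = []
    cursor = lo  # first ORF position not yet accounted for
    for start, end in sorted(aligned):
        if end < cursor or start > hi:
            continue
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
        if cursor > hi:
            break
    if cursor <= hi:
        gaps.append((cursor, hi))
    return gaps


def label_gaps(orf: Interval, gaps: List[Interval], strand: str = "+") -> List[str]:
    """Label each gap by the ORF end it touches, accounting for strand."""
    lo, hi = orf
    labels = []
    for start, end in gaps:
        touches_lo, touches_hi = start == lo, end == hi
        # On the minus strand the 5' end of the ORF is at the higher coordinate.
        five, three = (touches_lo, touches_hi) if strand == "+" else (touches_hi, touches_lo)
        if five and three:
            labels.append("entire ORF")
        elif five:
            labels.append("5' end")
        elif three:
            labels.append("3' end")
        else:
            labels.append("internal")
    return labels


orf = (1_000, 2_000)  # hypothetical ORF coordinates
gaps = unaligned_segments(orf, [(1_300, 1_700)])
print(gaps)                   # [(1000, 1299), (1701, 2000)]
print(label_gaps(orf, gaps))  # ["5' end", "3' end"]
```

A central block that still aligns, flanked by unaligned stretches at both ends, is the pattern expected when two separate insertions replace the 5' and 3' portions of an ORF.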



Source of Bacteria: Enteric bacteria genomes were obtained from BacMap unless otherwise noted: E. coli 536, E. coli APEC O1, E. coli CFT073, E. coli O157:H7 EDL933, E. coli O157:H7 Sakai, E. coli K12 MG1655, E. coli UTI89, E. coli W3110, Klebsiella pneumoniae MGH 78578, Salmonella bongori, Salmonella enterica Choleraesuis SC-B67, Salmonella enterica CT18, Salmonella enterica Paratyphi A ATCC 9150, Salmonella enterica Ty2, Salmonella typhimurium LT2, Shigella boydii Sb227, Shigella dysenteriae Sd197, Shigella flexneri 2a 2457T, Shigella flexneri 301, Shigella flexneri str. 8401 and Shigella sonnei Ss046. The genome sequence file for Escherichia coli UTI89 was curated from the Enteropathogen Resource Integration Center, and genome sequence data for Salmonella bongori were downloaded from the Sanger Institute Sequencing Centre.

References

Hancock, D., Besser, T., Lejeune, J., Davis, M., and Rice, D. (2001) The control of VTEC in the animal reservoir. Int J Food Microbiol 66: 71-8.

Odenwald, W.F., Rasband, W., Kuzin, A., and Brody, T. (2005) EVOPRINTER, a multigenomic comparative tool for rapid identification of functionally important DNA. Proc Natl Acad Sci USA 102: 14700-5.

Ohtsubo, E., and Sekine, Y. (1996) Bacterial insertion sequences. Curr Top Microbiol Immunol 204: 1-26.

Pupo, G.M., Lan, R., and Reeves, P.R. (2000) Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. Proc Natl Acad Sci USA 97: 10567-72.

Torres, A.G., Zhou, X., and Kaper, J.B. (2005) Adherence of diarrheagenic Escherichia coli strains to epithelial cells. Infect Immun 73: 18-29.

Unkmeir, A., and Schmidt, H. (2000) Structural analysis of phage-borne stx genes and their flanking sequences in shiga toxin-producing Escherichia coli and Shigella dysenteriae type 1 strains. Infect Immun 68: 4856-64.

Yang, J., Nie, H., Chen, L., Zhang, X., Yang, F., Xu, X., Zhu, Y., Yu, J., and Jin, Q. (2007) Revisiting the molecular evolutionary history of Shigella spp. J Mol Evol 64: 71-9.

Yavatkar, A.S., Lin, Y., Ross, J., Fann, Y., Brody, T., and Odenwald, W.F. (2008) Rapid detection and curation of conserved DNA via enhanced-BLAT and EvoPrinterHD analysis. BMC Genomics 9: 106.

Figures

Figure 1. EvoPrinterHD identifies uniquely shared DNA among subsets of 22 different enteric bacteria.

Shown is an Escherichia coli O157:H7 EDL933 genomic DNA EvoPrint (A) and EvoUnique profile (B) of a 4,769 bp fragment (base pairs 955,204 to 959,973). The comparative analysis included pairwise alignments to the genomes of E. coli K12 MG1655, E. coli O157:H7 Sakai, E. coli K12 W3110, E. coli CFT073, E. coli 536, E. coli UTI89, E. coli APEC O1, Klebsiella pneumoniae MGH 78578, Shigella sonnei Ss046, Shigella flexneri 2457T, Shigella flexneri 301, Shigella flexneri 5 str. 8401, Shigella boydii Sb227, Salmonella Paratyphi A ATCC9150, Salmonella Paratyphi A, Shigella dysenteriae Sd197, Salmonella bongori and Salmonella Typhi CT18. A) Uppercase black letters highlight sequences that are conserved in all 22 genomes, while lowercase gray letters indicate sequences that are absent from one or more of the aligning regions. The boxed sequence corresponds to a 2,010 bp ORF that encodes a 670 amino acid protein of unknown function. The flanking sequences contain open reading frames for the ATP-dependent RNA helicase RhlE (upper) and the ATP-dependent DNA helicase DinG (lower). The reiterative pattern of two uppercase letters followed by a lowercase letter is indicative of conserved amino acid codons among the 22 enteric bacteria studied. B) Generated from the same alignments, the EvoUnique profile reveals that only 2 of the 21 compared genomes contain homologous sequences. Uppercase red nucleotides are unique to the E. coli O157:H7 EDL933 sequence, green bases align with only 1 of the 21 genomes, blue bases align with 2, and lowercase gray bases are common to 3 or more alignments. Examination of the ceBLAT alignments (not shown) for each of the pairwise comparisons revealed that the E. coli O157:H7 Sakai genome contained the entire novel gene locus, while the Shigella dysenteriae Sd197 genome contained only 3 short sub-fragments of the locus (blue sequences).
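The codon pattern noted in the Figure 1 legend follows from the degeneracy of the genetic code: when protein sequence is conserved, the third base of a codon is the position most free to change without altering the amino acid. A minimal sketch of how that periodicity could be scored on an EvoPrint readout, assuming case alone encodes conservation and considering only the three forward frames, is shown below; the function name is hypothetical.

```python
from typing import Dict


def codon_pattern_scores(evoprint: str) -> Dict[int, float]:
    """Fraction of complete triplets in each forward frame reading Upper, Upper, lower.

    Uppercase is assumed to mean "conserved in all compared genomes" and
    lowercase "not conserved", as in an EvoPrint readout.
    """
    scores: Dict[int, float] = {}
    for frame in range(3):
        # Only complete triplets starting at this frame offset are scored.
        triplets = [evoprint[i:i + 3] for i in range(frame, len(evoprint) - 2, 3)]
        hits = sum(
            1 for t in triplets
            if t[0].isupper() and t[1].isupper() and t[2].islower()
        )
        scores[frame] = hits / len(triplets) if triplets else 0.0
    return scores


print(codon_pattern_scores("ATgCAtGGc"))  # {0: 1.0, 1: 0.0, 2: 0.0}
```

A gene on the opposite strand would instead show a lowercase, uppercase, uppercase pattern when read along the displayed strand, so a fuller check would also score that pattern.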
Figure 2. An EvoDifferences profile of the Escherichia coli O157:H7 EDL933 genomic region spanning base pairs 87,419 to 91,558 reveals that Shigella dysenteriae Sd197 has lost a leucine transcriptional activator gene.

Uppercase black letters represent E. coli EDL933 sequences that are conserved in E. coli K12 MG1655, E. coli K12 W3110, E. coli CFT073, E. coli 536, E. coli UTI89, E. coli APEC O1, Shigella sonnei Ss046, Shigella flexneri 2457T, Shigella flexneri 301, Shigella flexneri 5 str. 8401, Shigella boydii Sb227, Salmonella Paratyphi A ATCC9150, Salmonella Paratyphi A, Shigella dysenteriae Sd197, Salmonella bongori and Salmonella Typhi CT18. Lowercase gray letters represent bases that are absent from two or more genomes. Color-coded bases are missing in just one of the genomes included in the analysis (Shigella boydii Sb227, Salmonella Paratyphi A, Shigella dysenteriae Sd197, Salmonella bongori or Salmonella Typhi CT18). The leucine transcriptional activator leuO ORF, present in all species analyzed except Shigella dysenteriae Sd197, is boxed.
