Coffee Break
Jo McEntyre
National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, Bethesda, MD 20892-6510
Bioinformatics

Microbial diversity: let's tell it how it is

Jo McEntyre, PhD
Created March 4, 2004.
Last update March 26, 2004.


An impressive number of bacteria—about 30,000 species—are represented in GenBank. However, our view of the microbial world is both scant and skewed. A recent estimate suggests that the sea may support as many as 2 million different bacteria, and a ton of soil might contain 4 million (1). Less than half of the bacteria represented in GenBank—about 13,000—have been formally described, and almost all of these (90%) lie within 4 of the 40 bacterial divisions (2). Similar or greater paucity of knowledge also exists for archaea and viruses (3).

Sampling "wild" microorganisms leads to the discovery of new species and novel metabolisms, which may be important from both a basic science and a practical perspective (see Refs 4,5). For example, if we characterized the community in the human gut, it would be easier to spot non-native organisms in food poisoning outbreaks. Pathogens that may underlie neurological syndromes that present with features of infection would stand out against the background flora (1). Engineered communities of microorganisms might also be able to assist in the cleanup of environmental disasters or create sustainable energy sources.

Exploring bacterial diversity is typically done by amplifying rRNA genes, in particular 16S rRNA genes, from DNA samples isolated from a habitat. The sequences are then compared to each other and to the 16S rRNA sequences from known species. If no close match to an existing 16S rRNA gene sequence is found, then the test sequence is thought to represent a new bacterium and is listed in GenBank as "uncultured bacterium". Even in well-studied, discrete places like the human mouth, new groups of uncultured bacteria continue to be discovered all the time. A newly identified organism has to be isolated and cultured in the lab to be described further; but many bugs are just not amenable to monoculture—they have adapted to living in a specific environment and may need to be part of a complex community to survive (1-3).
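The comparison step described above can be sketched in a few lines of Python. This is only an illustration: it assumes the sequences have already been aligned to the same length, and the 97% identity cutoff, the function names, and the toy sequences are assumptions for the example, not values taken from the studies cited here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def label_sequence(query: str, references: dict, threshold: float = 97.0) -> str:
    """Return the best-matching reference name, or 'uncultured bacterium'.

    `threshold` is an assumed cutoff chosen for illustration only.
    """
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = percent_identity(query, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name
    return "uncultured bacterium"


# Toy, hypothetical aligned fragments (real 16S rRNA genes are ~1,500 bases long).
references = {"ref_A": "ACGTACGTAC", "ref_B": "TTTTACGTAC"}
print(label_sequence("ACGTACGTAC", references))
print(label_sequence("ACGTTTTTTT", references))
```

In practice the alignment and database search would be done with dedicated tools such as BLAST; the sketch only shows the decision rule of "close match, or new uncultured bacterium".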

16S rRNA genes are considered the standard because they are conserved across vast taxonomic distances (they are critical for protein translation), yet show some sequence variation between closely related species. However, one problem with using rRNA genes is that they are often present in multiple copies per genome; therefore, other representative genes may be used for sampling specific populations.

Whole Genome Shotgun Sequencing of Environmental Samples

New approaches to environmental sampling are emerging (6-9). One of these used a microarray to discover and assist in the isolation of new viruses (6); another used a shotgun cloning and sequencing method to explore marine viral communities (9). Two others have used whole genome shotgun (WGS) sequencing on a population of bacteria, obviating the need to isolate each organism before sequencing can begin (7,8). These methods, used in combination with existing methods, may provide shortcuts to the discovery of new genes and give a holistic perspective on microbial populations.

One recent study used a WGS approach to explore a sample from an acid mine drainage biofilm (7; AADL00000000). These investigators report that near-complete genomes for Leptospirillum Group II and Ferroplasma Type II were assembled, along with more fragmentary assemblies for Leptospirillum Group III, Thermoplasmatales archaeon gpl, and Ferroplasma acidarmanus Type I. Analysis of the results provided some insight into how such organisms survive in an extreme environment.

In another test case of the WGS method, Venter et al. (8) sampled water from the Sargasso Sea, one of the best-characterized regions of ocean in the world. The major set of samples produced 1.66 million short sequences, some of which could be grouped together into larger genomic pieces. There remained about 400,000 paired-end reads and singleton reads.

Finding the Data

Using a WGS method to sequence an undefined population as opposed to a single organism adds significant complexity to the assembly process and to the identification of genes. About 25% of the assembled data from the Sargasso Sea had 3X coverage or greater; these well-sampled portions were used to cluster the sequence by “organism”.
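To make the idea of "3X coverage" concrete, here is a minimal Python sketch that computes depth of coverage per contig and the fraction of assembled sequence at or above a cutoff. The `Contig` class, its field names, and the example numbers are hypothetical; they are not the actual Sargasso Sea data or the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int      # assembled length in bases
    read_bases: int  # total bases of all reads placed on this contig

    @property
    def depth(self) -> float:
        """Average depth of coverage: read bases per assembled base."""
        return self.read_bases / self.length


def well_sampled(contigs, min_depth=3.0):
    """Contigs whose average depth meets the cutoff (3X by default)."""
    return [c for c in contigs if c.depth >= min_depth]


def fraction_well_sampled(contigs, min_depth=3.0):
    """Fraction of assembled bases that sit on well-sampled contigs."""
    total = sum(c.length for c in contigs)
    kept = sum(c.length for c in well_sampled(contigs, min_depth))
    return kept / total if total else 0.0


# Hypothetical example: one contig at 5X, one at 1X.
contigs = [Contig("a", 1000, 5000), Contig("b", 3000, 3000)]
print(fraction_well_sampled(contigs))
```

Averaging depth per contig is a simplification; a real analysis would usually look at depth position by position.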

Table 1
The organism bins assembled from the Sargasso Sea WGS environmental sample dataset (8)
| Organism Bin | Description | Data | Further Reading |
|---|---|---|---|
| Genome Assemblies | | | |
| cf. Alphaproteobacteria SAR-1 | Oligotrophic; typical of marine bacterioplankton | Genome, GenBank | PubMed, Books |
| cf. Archaea SAR-1 | One of the three major domains of life; often inhabit extreme environments | Genome, GenBank | PubMed, Books |
| cf. Bacteria SAR-1 | One of the three major domains of life | Genome, GenBank | PubMed, Books |
| cf. Burkholderia SAR-1 | Gram-negative bacilli; aerobic; found in a variety of aquatic environments | Genome, GenBank | PubMed, Books |
| cf. Gammaproteobacteria SAR-1 | Purple bacteria; some plant pathogens | Genome, GenBank | PubMed, Books |
| cf. Microbulbifer SAR-1 | Marine bacteria that degrade and recycle complex carbohydrates | Genome, GenBank | PubMed, Books |
| cf. Prochlorococcus SAR-1 | Smallest known photosynthetic organism; the most abundant in the ocean | Genome, GenBank | PubMed, Books |
| cf. Proteobacteria SAR-1 | Phylum includes nitrogen-fixing bacteria and enteric bacteria | Genome, GenBank | PubMed, Books |
| cf. Pseudomonadaceae SAR-1 | Gram-negative rods; often motile; includes many plant and a few animal pathogens | Genome, GenBank | PubMed, Books |
| cf. Shewanella SAR-1 | Versatile metabolism; potential biotech applications such as heavy metal or chlorinated solvent reduction | Genome, GenBank | PubMed, Books |
| cf. Shewanella SAR-2* | Versatile metabolism; potential biotech applications such as heavy metal or chlorinated solvent reduction | Genome, GenBank | PubMed, Books |
| cf. Streptomyces SAR-1 | Superficially similar to fungi (filaments and spores); common in many habitats | Genome, GenBank | PubMed, Books |
| Single Scaffolds | | | |
| cf. Actinobacteria SAR-1 | High G+C group of Gram-positive bacteria; most found in soil; some pathogens | GenBank | PubMed, Books |
| cf. Bordetella SAR-1 | Gram-negative coccobacilli; strict aerobes | GenBank | PubMed, Books |
| cf. Burkholderiaceae SAR-1 | Occupy diverse ecological niches; may have potential for biotech applications but also involved in human infections | GenBank | PubMed, Books |
| cf. Caulobacter SAR-1 | Found in oligotrophic environments; prosthecate (having appendages) | GenBank | PubMed, Books |
| cf. Crenarchaeota SAR-1 | Archaeal; most species are motile; tolerant of extreme acidity and temperature | GenBank | PubMed, Books |
| cf. Cyanobacteria SAR-1 | Aquatic and photosynthetic; often called "blue-green algae" | GenBank | PubMed, Books |
| cf. Enterobacteriaceae SAR-1 | Large Gram-negative rods; facultative anaerobes | GenBank | PubMed, Books |
| cf. Haemophilus SAR-1 | Gram-negative rods; like to grow on blood agar; some pathogens | GenBank | PubMed, Books |
| cf. Magnetococcus SAR-1 | Gram-negative coccus; magnetic bacteria; usually located at sediment-water interface | GenBank | PubMed, Books |
| cf. Magnetospirillum SAR-1 | Magnetic bacteria | GenBank | PubMed, Books |
| cf. Ralstonia SAR-1 | Includes medically and economically important plant and animal pathogens | GenBank | PubMed, Books |
| cf. Rhizobiales SAR-1 | Involved in nitrogen fixation, often in symbiotic relationships with plants | GenBank | PubMed, Books |
| cf. Sinorhizobium SAR-1 | Symbiotic nitrogen fixation in plant root nodules | GenBank | PubMed, Books |
| cf. Spirochaetales SAR-1 | Spiral rods; some pathogens (e.g. Borrelia burgdorferi, Lyme disease) | GenBank | PubMed, Books |
| cf. Streptomycetaceae SAR-1 | Typically aerobic and found in soil; some parasitic forms | GenBank | PubMed, Books |
| cf. Vibrionaceae SAR-1 | Gram-negative, non-sporing rods; generally motile; many strains of Vibrio genus cause infection | GenBank | PubMed, Books |

cf. is used to designate an unidentified species of the genus. Therefore, “cf. Burkholderia” means “something that is like the genus Burkholderia” (in this case, by sequence similarity).

As each organism bin could actually represent several different unidentified species, a strain name cannot be assigned, so instead, the suffix "SAR-#" identifies each bin as a “Sargasso Sea cyber-species”.

* cf. Shewanella SAR-2: two distinct Shewanella genomes were constructed from the dataset.

The assembled sequences have been deposited in the WGS division of GenBank under the project Accession number AACY01000000; thus, there are 811,372 WGS contigs in GenBank with the Accession numbers AACY01000001–AACY01811372. Of these, 498,641 WGS contigs are assembled into 232,442 scaffolds, and the rest remain “singleton” WGS contigs; all but 10,685 of the scaffolds are made up of two contigs only. For the organism genomes listed in Table 1, 301 of the total scaffolds plus 36 singleton WGS contigs were used; the remainder have not been associated with any particular organism.
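The numbers in this paragraph can be worked through with a short Python sketch. The accession format (the prefix AACY01 followed by a six-digit, zero-padded serial number) is inferred from the range quoted above; the function and variable names are illustrative assumptions, and the derived counts are simple arithmetic on the quoted figures rather than values reported by the authors.

```python
PREFIX = "AACY01"              # project prefix plus assembly version, as quoted above
N_CONTIGS = 811_372            # total WGS contigs
CONTIGS_IN_SCAFFOLDS = 498_641
N_SCAFFOLDS = 232_442
LARGER_SCAFFOLDS = 10_685      # scaffolds with more than two contigs


def contig_accession(i: int) -> str:
    """Accession for the i-th contig (1-based), zero-padded to six digits."""
    if not 1 <= i <= N_CONTIGS:
        raise ValueError("contig index out of range")
    return f"{PREFIX}{i:06d}"


# Contigs that were not placed in any scaffold.
singletons = N_CONTIGS - CONTIGS_IN_SCAFFOLDS

# Scaffolds made of exactly two contigs, and the contigs they account for.
two_contig_scaffolds = N_SCAFFOLDS - LARGER_SCAFFOLDS
contigs_in_pairs = 2 * two_contig_scaffolds

# Contigs left over for the larger scaffolds.
contigs_in_larger = CONTIGS_IN_SCAFFOLDS - contigs_in_pairs

print(contig_accession(1), contig_accession(N_CONTIGS))
print(singletons, two_contig_scaffolds, contigs_in_larger)
```

The arithmetic shows that most scaffolds are simple pairs of contigs, which is what one would expect when many organisms in the sample are only lightly covered.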

All of the short sequence reads, including those that were not included in the assembly, can be found in the Trace Archive.

Figure 1.

(a) Genome view of cf. Shewanella SAR-1, constructed from the whole genome shotgun sequence derived from Sargasso Sea environmental samples (8). Genes have been classified according to the COG functional categories of the protein products, and color-coded accordingly. Note that the actual order of the scaffolds is unknown, so in this representation they have been ordered by size. Clicking on the image reveals the gene sequences and approximate location. (b) Selecting one of the genes (in this case, the blue gene around position 2619000) shows the results of an automated BLAST search (BLink). This gene is similar to L-sorbosone dehydrogenase from a variety of bacteria, archaea, and fungi. L-sorbosone dehydrogenase is an enzyme required for the biosynthesis of L-ascorbic acid, a product widely used in the food industry as a vitamin and antioxidant.

Each of the 28 organism “genomes” can be viewed in a similar manner (see Table 1).

The assemblies were then further clustered into 28 tentative organism “bins” based on depth of coverage, oligonucleotide frequencies, and similarities to previously sequenced genomes. Of these, 12 are of sufficient size to be considered a genome assembly, while the remaining 16 are relatively small single scaffolds (Table 1). All organism bins have been assigned a taxonomy ID, and have been placed in the taxonomic tree. Figure 1 shows the graphical representation of the cf. Shewanella SAR-1 “genome” sequence.
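As an illustration of what an "oligonucleotide frequency" signal looks like, the Python sketch below computes a tetranucleotide (4-mer) frequency profile for a sequence and the Euclidean distance between two profiles. This is a simplified stand-in, not the binning procedure used in the study: the choice of k = 4, the distance measure, and the function names are assumptions, and reverse complements are not merged.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list:
    """Relative frequency of every A/C/G/T k-mer, in a fixed alphabetical order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (e.g. N) are simply ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def profile_distance(p: list, q: list) -> float:
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Hypothetical fragments: the first two share composition, the third does not.
a = kmer_profile("ACGTACGTACGTACGT")
b = kmer_profile("CGTACGTACGTACGTA")
c = kmer_profile("GGGGGGCCCCCCGGGG")
print(profile_distance(a, b) < profile_distance(a, c))
```

Scaffolds from the same organism tend to have similar profiles, so small distances, taken together with similar depth of coverage, are one kind of evidence for placing them in the same bin.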

A variety of approaches suggested that there are at least 1000 species represented in the Sargasso Sea samples (8). Species of Burkholderia, a genus that includes human and plant pathogens and some environmentally important bacteria, were represented in high proportion, as were two distinct species closely related to Shewanella oneidensis. Both of these genera require a more nutrient-rich environment than the open ocean can offer, suggesting that they originated from microhabitats such as marine snow. The cyanobacterium Prochlorococcus was also relatively abundant in some samples.

Although the primary focus of this study was on bacterial populations, WGS environmental sampling may be an equally valid approach for exploring plasmids (Table 2), phage, viruses, and eukaryotic microbes.

References

1. Curtis T P, Sloan W T, Scannell J W. Estimating prokaryotic diversity and its limits. Proc Natl Acad Sci USA. 2002; 99: 10494-10499.
2. DeLong E F. Microbial seascapes revisited. Curr Opin Microbiol. 2001; 4: 290-295.
3. Roossinck M J. Plant RNA virus evolution. Curr Opin Microbiol. 2003; 6: 406-409.
4. Kazor C E, Mitchell P M, Lee A M, Stokes L N, Loesche W J, Dewhirst F E, Paster B J. Diversity of bacterial populations on the tongue dorsa of patients with halitosis and healthy patients. J Clin Microbiol. 2003; 41: 558-563.
5. Béjà O, Aravind L, Koonin E V. et al. Bacterial rhodopsin: evidence for a new type of phototrophy in the sea. Science. 2000; 289: 1902-1906.
6. Wang D, Urisman A, Liu Y T. et al. Viral discovery and sequence recovery using DNA microarrays. PLoS Biol. 2003; 1.
7. Tyson G W, Chapman J, Hugenholtz P, Allen E E, Ram R J, Richardson P M, Solovyev V V, Rubin E M, Rokhsar D S, Banfield J F. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature. 2004; 428: 25-26.
8. Venter J C, Remington K, Heidelberg J. et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004 [Epub ahead of print].
9. Breitbart M, Salamon P, Andresen B. et al. Genomic analysis of uncultured marine viral communities. Proc Natl Acad Sci USA. 2002; 99: 14250-14255.
