Research Abstracts
DOE Microbial Genome Program Report
Section 1: Sequencing and
Analysis
Sequencing the Genome of
Nitrosomonas europaea, an Obligate
Lithoautotrophic, Ammonia-Oxidizing Bacterium
Daniel J. Arp, Alan B.
Hooper,1 Jane E.
Lamerdin,2 David Arciero,1
Andre Arellano,2 Karolyn
Burkhart-Schultz,2 Anne Marie
Erler,2 Norman Hommes, Martin G.
Klotz,3 Jenny M. Norton,4 Warren
Regala,2 Luis Sayavedra-Soto, and Stephanie
Stilwagen2
Botany and Plant Pathology; Oregon State University;
2082 Cordley; Corvallis, OR 97331-2902
541/737-1294, Fax: -3573,
rpd@bcc.orst.edu
1University of Minnesota
2Lawrence Livermore National
Laboratory
3University of Louisville
4Utah State University
As part of the DOE initiative to explore the role of
microorganisms in global carbon sequestration, the
Joint Genome Institute intends to obtain the complete
genomic sequence of the autotrophic nitrifying
bacterium Nitrosomonas europaea. This organism
is the most studied of the ammonia-oxidizing bacteria
that are participants in the biogeochemical N cycle.
Nitrifying bacteria play a central role in the
availability of nitrogen to plants and hence in
limiting CO2 fixation. The reaction
catalyzed by these bacteria is the first step in the
oxidation of ammonia to nitrate. These bacteria also
are important players in the treatment of industrial
and sewage waste, where they carry out this same first
step. Evidence suggests that ammonia-oxidizing
bacteria contribute significantly to the global
production of nitrous oxide (produced by the reduction
of nitrite). N. europaea also is capable of
degrading a variety of halogenated organic compounds,
including trichloroethylene, benzene, and vinyl
chloride. The ability of nitrifying organisms to
degrade some pollutants may make these organisms
attractive for controlled bioremediation in nitrifying
soils and waters.
N. europaea is a Gram-negative member of the
β-proteobacteria subdivision, possessing a
genome size of at least 2.2 Mb. The microbe can be
transformed and deletion mutants engineered, allowing
the study of genotype-phenotype relationships. To
complete the sequence of the N. europaea genome,
a whole-genome shotgun strategy is being used, similar
to that employed successfully for dozens of other
bacterial genomes. The 8X genome coverage generated
by the shotgun data is being supplemented with a
scaffold of paired end sequences from clones in the
low-copy-number fosmid vector. Shotgun data from this
organism were assembled with PHRAP (Phil Green,
University of Washington) and will progress through
"auto-finishing" using software written by Matt Nolan
(JGI-LLNL) and David Gordon (University of Washington)
prior to human intervention in the assembly.
Fingerprinting of a minimal spanning path of fosmids
will be used to aid verification of the final assembly.
A sequence-analysis pipeline, developed by Manesh Shah
and Frank Larimer of Oak Ridge National Laboratory, is
being used to define open reading frames (ORFs) and
query public databases for protein-nucleotide
similarities. Periodic lists of putative ORFs will
appear on the Web site
(http://genome.jgi-psf.org/mic_home.htmlindex.html) as
the genomic coverage continues to grow.
The raw sequence data also are directly queryable
through the accompanying BLAST server or can be
downloaded from the JGI ftp server.
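The 8X coverage figure quoted above can be related to expected gaps with the idealized Lander-Waterman model of random shotgun sequencing. The following is a minimal sketch and not part of the JGI pipeline: the 500-bp read length is an assumption chosen for illustration, the 2.2-Mb genome size is the lower bound given above, and the model ignores the minimum overlap needed to join reads, cloning bias, and repeats.

```python
import math

def shotgun_coverage_stats(genome_bp, coverage, read_bp):
    """Idealized Lander-Waterman estimates for a random shotgun library.

    Assumes reads start uniformly at random and that any overlap,
    however short, is enough to join two reads.
    """
    n_reads = math.ceil(coverage * genome_bp / read_bp)
    frac_uncovered = math.exp(-coverage)       # P(a given base is in no read)
    return {
        "reads": n_reads,
        "fraction_uncovered": frac_uncovered,
        "uncovered_bp": frac_uncovered * genome_bp,
        "expected_gaps": n_reads * frac_uncovered,
    }

# Hypothetical inputs: 2.2 Mb genome, 8X coverage, 500-bp reads (assumed).
stats = shotgun_coverage_stats(genome_bp=2_200_000, coverage=8, read_bp=500)
for key, value in stats.items():
    print(f"{key}: {value:.6g}")
```

Under these assumptions, 8X coverage leaves only several hundred bases unsampled, spread over roughly a dozen gaps. Real projects see more gaps than this model predicts, which is one reason a paired-end fosmid scaffold and a finishing phase are useful.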
This will be the second member of the β-subdivision to
have been sequenced. The most-studied gene products in
this organism are those involved in the oxidation of
ammonia, principally the hydroxylamine oxidoreductase
(HAO), ammonia monooxygenase (AMO), and the
accompanying cytochromes that make up the
electron-transport chain. We hope the genome sequence
will reveal strong candidates for as-yet-unidentified
proteins specific to the N-oxidation pathways unique to
this organism. The nature and regulation of enzymes in
the nitrite-to-nitrous oxide pathway also are of
interest. The operon encoding the subunits of AMO is
duplicated, and the nucleotide sequences of the two
copies differ by only a single nucleotide. The gene
that codes for HAO is present in three copies. The
extent to which other genes are duplicated in the
genome is not known but is one anticipated outcome of
generating the genomic sequence of N. europaea.
As one of the few strictly autotrophic bacteria
currently being sequenced, N. europaea's genome
sequence is expected to reveal the identity and number
of genes required for and suited to autotrophy and
possibly provide an indication of the basis for
obligate autotrophy. The sequence will allow direct
comparison to genes identified in another
lithoautotrophic organism, Thiobacillus
ferrooxidans, which derives its energy from the
oxidation of iron or sulfur compounds. Comparison of
the metabolic capabilities of this organism with those
of photoautotrophs and other lithoautotrophs may reveal
the range of capabilities that were lost or gained as
N. europaea descended from its evolutionary
ancestors.
Nostoc Genome
Sequencing
Ronald M. Atlas
Department of Biology; University of Louisville;
Louisville KY 40292
502/852-3957, Fax: -0725,
r.atlas@louisville.edu
An expert advisory panel met with Jane Lamerdin of the
Joint Genome Institute (JGI) to select a strain of the
heterocystous cyanobacterium Nostoc for genome
sequencing. Based upon its relevance to carbon
sequestration and the likelihood of providing
significant new scientific information, the panel
selected Nostoc punctiforme PCC 73102, ATCC
29133. This strain fixes nitrogen and carbon
dioxide, forms symbiotic relationships, exhibits cell
differentiation with the formation of motile hormogonia
(a diagnostic characteristic of the genus
Nostoc), has a complex life cycle, has
established genetic transfer systems, and is divergent
from other cyanobacteria being sequenced. DNA from
N. punctiforme is being prepared by Jack Meeks
for submission to JGI for sequencing. The advisory
panel will work with JGI during the annotation phase
and will participate in publication of the data. The
panel consists of Ronald M. Atlas (Department of
Biology, University of Louisville), Jack Meeks
(Division of Biological Sciences, University of
California, Davis), Malcolm Potts (Department of
Biochemistry and Nutrition, Virginia Polytechnic
Institute), Jeff Elhai (Department of Biology,
University of Richmond), and Theresa Thiel (Department
of Biology, University of Missouri, St. Louis).
The Complete Genome Sequence
of Prochlorococcus
Sallie W. Chisholm
Departments of Civil and Environmental Engineering and
Biology; Massachusetts Institute of Technology; 15
Vassar St. 48-425; Cambridge, MA 02139
617/253-1771, Fax: 617/258-7009,
chisholm@mit.edu
http://web.mit.edu/chisholm/www
Prochlorococcus is a unicellular cyanobacterium
that is very abundant in the temperate and tropical
oceans. It has been shown to contribute 32 to 80% of
the total photosynthesis in the world's oligotrophic
oceans, the higher values being found in the Pacific.
Thus, Prochlorococcus plays a significant role
in the global carbon cycle and the regulation of the
earth's climate.
Molecular phylogenies have shown that
Prochlorococcus is closely related to marine
Synechococcus, forming a single lineage within
the cyanobacteria. Unlike Synechococcus,
Prochlorococcus lacks phycobilisomes and
contains divinyl chlorophyll a (8-desethyl, 8-vinyl
chlorophyll a, or "chl a2") and divinyl chlorophyll b
(chl b2) as its major photosynthetic pigments. These
pigments enable it to absorb blue light more
efficiently than Synechococcus at the low-light
intensities and blue wavelengths characteristic of the
deep euphotic zone.
We recently demonstrated that there are at least two
ecotypes of Prochlorococcus, each of which is
distinguished by its photophysiology and molecular
phylogeny. One is capable of growth at irradiances at
which the other is not. We hypothesize that multiple ecotypes
of Prochlorococcus coexist in all oceanic
environments, alternating in dominance according to
light gradients and seasonal mixing dynamics. We would
expect to find, for example, that ecotypes adapted to
low light are dominant at the base of the euphotic zone
in stratified waters and those adapted to high light
dominate at the surface. The ecotypes differ in other
physiological properties besides light-harvesting
efficiencies, and these too will play a role in
regulating their distributions. Ultimately, a
comparison of the complete genomes of these two
ecotypes will provide valuable insights into the
regulation of microdiversity in marine microbial
systems.
Prochlorococcus is an ideal candidate for
complete genome sequencing for a variety of reasons:
(1) it is the smallest known phototroph and has a
relatively small genome size (1.8 Mb); (2) it is
widespread and abundant and is easily identified and
enumerated in its environment using flow cytometry;
(3) its unique photosynthetic pigment (divinyl
chlorophyll a) makes its contribution to total
photosynthetic biomass in natural communities easily
assessed; (4) different ecotypes have been identified
that are very closely related according to their 16S
rRNA sequences but are physiologically distinct; and
(5) we have an extensive culture collection of isolates
from different oceans and environments.
We plan to work with scientists at the DOE Joint Genome
Institute (JGI Prochlorococcus Web site,
http://genome.jgi-psf.org/mic_home.htmlindex.html) to obtain the
entire genomic sequence of Prochlorococcus
marinus (MED4), one of the ecotypes adapted
to high light. Our role in the project is to supply
Prochlorococcus DNA and to be a general source
of information on the ecology and biology of the
organism.
Sequencing the Large Linear
Chromosome of Borrelia burgdorferi and a Strain
of Clostridium
John J. Dunn and F. William Studier
Biology Department; Brookhaven National Laboratory;
Bldg. 463, 50 Bell;
P.O. Box 5000; Upton, NY 11973-5000
631/344-3012, Fax: -3407,
jdunn@bnl.gov
631/344-3390, Fax: -3407,
studier@bnl.gov
www.genome.bnl.gov
In a program to explore possible improvements in the
accuracy, speed, and efficiency of genome sequencing,
we sequenced the large linear chromosome of Borrelia
burgdorferi, the spirochete that causes Lyme
disease. This 909,275-bp sequence is available on our
Web site, along with a comparison of the same sequence
determined independently by The Institute for Genomic
Research (TIGR).
The Brookhaven National Laboratory (BNL) sequence was
determined by random first-end and directed second-end
sequencing of plasmid libraries of random chromosomal
fragments, followed by primer walking using 12-mer
primers generated by ligation of two hexamers on
hexamer templates. The sequence assembly was confirmed
and contigs were aligned by end sequencing a framework
of ~35-kb fosmid clones, which spanned the entire
sequence. The few remaining gaps were filled by
polymerase chain reaction amplification from fosmid
clones or genomic DNA. The sequence extends to the ends
of the clones we obtained (which did not include the
covalently closed ends of the chromosome) and lacks
404 bp at the left end and 249 bp at the right end of
TIGR's sequence, which extends to the ends. The entire
BNL sequence was determined at least once on each
complementary strand.
The BNL and TIGR sequences are very similar, but there
are some differences. The TIGR sequence contains seven
copies of a 162-bp imperfect tandem repeat that occurs
only twice in the BNL sequence. There are 86 other
discrepancies, only some of which are in a few
remaining areas of relatively low quality in the BNL
sequence. In addition, the BNL sequence contains 65
ambiguities (reflecting different base pairs at the
same position in different clones), and the TIGR
sequence contains 43 ambiguities. For each ambiguity in
either sequence, one of the ambiguous bases matches the
base at that position in the other sequence. It seems
likely that each DNA preparation used for cloning and
sequencing has polymorphisms at the 0.01% level, with a
similar level of polymorphism between the two DNA
preparations.
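The 0.01% estimate can be checked against the counts reported above. The following is a small arithmetic sketch using only the figures in this abstract (the 909,275-bp BNL sequence length and the ambiguity and discrepancy counts); the variable and function names are illustrative.

```python
SEQUENCE_BP = 909_275  # length of the BNL chromosome sequence

counts = {
    "BNL ambiguities": 65,
    "TIGR ambiguities": 43,
    "other BNL/TIGR discrepancies": 86,
}

def percent_of_sequence(count, length_bp=SEQUENCE_BP):
    """Express a count of positions as a percentage of the sequence length."""
    return 100.0 * count / length_bp

rates = {label: percent_of_sequence(n) for label, n in counts.items()}
for label, pct in rates.items():
    print(f"{label}: {pct:.4f}% of positions")
```

All three figures fall between roughly 0.005% and 0.01% of positions. The discrepancy count is an upper bound on true polymorphism between the two DNA preparations, because some discrepancies lie in lower-quality regions of the BNL sequence.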
We are currently sequencing the genome of a
Clostridium strain being studied at BNL as a
possible bioremediation agent. This anaerobic,
nitrogen-fixing spore former can convert water-soluble
uranyl ion U(VI) to less soluble U(IV). Its circular
genome is about 4 Mb, and no plasmids have been
detected. More than 500 kb of edited unique sequence has
been obtained so far. Clone libraries are being
constructed in vectors we developed that allow an
ordered set of nested deletions to be generated from
either end of cloned fragments at least 10 kb long.
These vectors were designed to allow sequencing and
ordered assembly of both DNA strands in highly repeated
regions such as those encountered in human DNA. In
Clostridium, the vectors allow directed
sequencing of particularly interesting areas by using
nested deletions to fill in the framework generated by
end sequencing. We expect to sequence the relevant U
and N2 reductases and identify most genes
involved in intermediary metabolism.
This is a completed project.
DOE-Funded Microbial Genome
Sequencing
at The Institute for Genomic Research
Claire Fraser
The Institute for Genomic Research; 9712 Medical Center
Dr.; Rockville, MD 20850
301/838-3500, Fax: -0209,
cfraser@tigr.org
www.tigr.org
The Institute for Genomic Research (TIGR) is a
not-for-profit research institute with interests in
structural, functional, and comparative analysis of
genomes and gene products in viruses, bacteria,
archaea, and both plant and animal eukaryotes,
including humans. Microbial genome-sequencing efforts
at TIGR supported by the Department of Energy since
1995 have produced complete genome sequences for five
organisms: Mycoplasma genitalium,
Methanococcus jannaschii, Archaeoglobus
fulgidus, Thermotoga maritima, and
Deinococcus radiodurans. In addition, nine other
DOE-funded microbial genome projects are in progress at
TIGR, with an estimated completion date of 2001 for all
work. In total, the DOE-funded microbial genome
sequencing projects at TIGR represent nearly 33 million
base pairs (Mb) of DNA and an estimated 30,000
microbial genes. The information generated in these
projects is available from the
TIGR Microbial
Database.
The strategy that we use for whole-genome sequencing is
called a "shotgun" method. In shotgun sequencing, the
genome is sheared randomly into small pieces that are
then cloned, sequenced, and reassembled to form a whole
genomic sequence. With the shotgun approach, there is
no need to develop a genetic or physical map of the
genome before sequencing it; the sequence itself serves
as the ultimate map. In large shotgun-sequencing
projects, DNA fragments are assembled into a consensus
sequence. Key to the success of the shotgun method is
the availability of a truly random genomic DNA clone
library and a powerful, accurate algorithm for
reassembling the fragments into a complete genome. The
basic approach for genome assembly is to compare all
individual sequences to find overlaps and use this
information to build a consensus sequence. Using new
software developed at TIGR for large-scale genome
sequencing projects, we have assembled the complete
genomes of 12 microbial species to date.
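The overlap-and-merge idea described above can be illustrated with a toy greedy assembler. This is a minimal sketch under strong simplifying assumptions and is not the TIGR assembly software: reads are error-free, all come from the same strand, no read spans a repeat, and the function names and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads wholly contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining sequences are contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads

# Three overlapping reads drawn from a made-up 15-base sequence.
contigs = greedy_assemble(["ACGTTAGC", "ATGGCGTA", "CGTACGTT"])
print(contigs)
```

Production assemblers must additionally handle sequencing errors, reverse-complement reads, and repeated regions, which is why a truly random library and an accurate algorithm are both stressed above.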
The next step in whole-genome analysis is to identify
all the predicted genes and search the translated
protein sequences against protein sequences available
in public databases. Because of the tremendous
conservation in protein sequence among organisms
throughout evolution, putative genes can be identified
by sequence similarities.
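As a rough illustration of the gene-prediction step, the sketch below scans all six reading frames for open reading frames that begin with ATG and end at a stop codon. It is a deliberately simple stand-in and not the gene finder used at TIGR; the function names, the `min_codons` threshold, and the example sequences are assumptions. The stop-codon set is a parameter because some organisms deviate from the standard genetic code (mycoplasmas, for example, read TGA as tryptophan).

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=3, stops=STANDARD_STOPS):
    """Return (strand, start, end, dna) for ATG...stop ORFs in six frames.

    start and end are 0-based, end-exclusive coordinates on the strand
    that was scanned; min_codons counts codons before the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in stops:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs

# Made-up example sequence containing one short ORF on the forward strand.
print(find_orfs("CCATGAAACCCGGGTAACC"))
```

In practice, each predicted ORF would then be translated and searched against public protein databases, as described above.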
The Minimal Gene Complement of M. genitalium
The Mycoplasma class consists of small wall-less
bacteria that parasitize a wide range of hosts,
including humans, animals, plants, insects, and cells
in culture; they are believed to represent a minimalist
life form, having yielded to selective pressure to
reduce genome size and eliminate unnecessary genes.
M. genitalium was selected as one of the first to
be sequenced because it has the smallest genome of any
known free-living organism. M. genitalium lives
in a parasitic relationship with its primate hosts in
ciliated epithelial cells of genitalia and respiratory
tracts. Examining the makeup of the M.
genitalium genome reveals much about the metabolic
and biochemical capacity of this organism.
All genes necessary for life in M. genitalium
are packaged in a 580,070-base-pair (bp) circular
chromosome. Genome analysis suggests that the M.
genitalium genome contains about 470 genes (average
size, 1040 bp), which make up 88% of the genome (on
average, a gene every 1235 bp). This value is similar to
that found in other microbial genome sequences. These
data indicate that Mycoplasma's reduction in
genome size has not resulted in increased gene density
or decreased gene size.
A complement of genes involved in DNA maintenance,
repair, transcription, translation, and cellular
transport is present; however, no complete pathways for
amino acid, fatty acid, purine, or pyrimidine
biosynthesis were identified in M. genitalium.
Comparison of the minimal M. genitalium genome
to that of more complex organisms suggests that
differences in genome content are reflected as profound
differences in physiology and metabolic capacity. The
reduction in M. genitalium's genome size is
associated with a marked reduction in the number and
components of biosynthetic pathways, thereby requiring
the organism to use metabolic products from its
hosts.
Perhaps one of the most surprising findings from
whole-genome sequencing and analysis of M.
genitalium is that about one-third of the predicted
proteins identified in this organism displayed no
sequence similarity to known genes from any other
organisms. This means that, even for this simplest of
free-living organisms, we still do not understand a
considerable amount of its biology. Determining whether
the unknown genes in M. genitalium are species
specific or exhibit a more widespread phylogenetic
distribution will be of interest.
Comparing the M. genitalium genome with those of
other microorganisms from diverse habitats will provide
insights into what constitutes a minimal set of genes
necessary for a self-replicating organism as well as
the mechanisms associated with changes in genome
organization and content in nature. This information,
in turn, will be useful for modifying and engineering
organisms to perform specific biochemical tasks in the
laboratory or the environment.
Genome Sequence of the Archaeon M. jannaschii
The archaea were discovered as a unique phylogenetic
domain of life by Carl Woese in the 1970s using
sequence data from the small subunit of ribosomal RNA
as a biosystematic marker. M. jannaschii was the
first representative of the archaeal domain to be
completely sequenced. Isolated in 1982 from a deep-sea
hydrothermal vent, M. jannaschii fixes carbon
dioxide to methane as its primary energy-producing
biochemical pathway. Because this organism thrives at
deep-sea pressures and temperatures of 85°C and
above, its genome should provide insights into how
genomes and gene products survive and function under
these extreme conditions. Understanding the genetic
basis of methanogenesis biochemistry in the
thermophilic, barophilic M. jannaschii will
bring us closer to harnessing the unique biochemistry
of methanogens as a source of renewable energy.
Analysis of the M. jannaschii genome sequence
reveals that between 50 and 60% of its genes or gene
products have no match to any other currently known
gene sequence. In addition, initial attempts to map
database-matched genes onto known biochemical pathways
suggest that M. jannaschii's biochemistry and
physiology are unique among cellular organisms.
For example, certain enzymes associated with
gluconeogenesis and the synthesis of pentose sugars for
nucleotide biosynthesis, such as fructose
1,6-bisphosphate aldolase and fructose
1,6-bisphosphatase, are not found among the predicted genes in
M. jannaschii. Whether other gene products have
been recruited to serve the function of these missing
genes or the genes cannot be detected by standard
sequence similarity methods is not yet known.
Most genes involved in M. jannaschii's
cellular-information processing (replication,
transcription, and translation) are more similar to
functionally equivalent counterparts in eukaryotes than
to those in bacteria. On the other hand, M. jannaschii genes
that are involved in energy production, cell division,
and basic cellular metabolism are more like genes in
bacteria. Further analysis of the M. jannaschii
genome sequence, together with sequence from other
members of the archaeal domain of life, will give
additional insights into the evolutionary relationship
among the prokaryotes.
Complete Sequence of the Thermophilic Archaeon A.
fulgidus
Biological sulfate reduction is part of the global
sulfur cycle, ubiquitous in the earth's anaerobic
environments and essential to the workings of the
biosphere. Growth by sulfate reduction is restricted to
relatively few groups of prokaryotes; all but one of
these are bacterial, the exception being the archaeal
sulfate reducers in the Archaeoglobales. These
organisms are unique in that they are unrelated to
other sulfate reducers and they grow at extremely high
temperatures, between 60 and 95°C. They can grow
either organoheterotrophically (using a variety of carbon
and energy sources) or lithoautotrophically on
hydrogen, thiosulfate, and carbon dioxide. The known
Archaeoglobales are strict anaerobes, most of which are
hyperthermophilic marine sulfate reducers found in
hydrothermal environments and in subsurface oil fields.
High-temperature sulfate reduction by
Archaeoglobus species contributes to deep
subsurface oil well "souring" by producing iron
sulfide, which causes corrosion of iron and steel in
oil- and gas-processing systems.
The genome of the type strain of the Archaeoglobales,
A. fulgidus, was sequenced to better understand
the biology of this group of organisms. Genome analysis
reveals a total of ~2400 genes; these include genes for
sulfate reduction, a great diversity of electron
transport systems, a large number of transporters with
specificity for both organic and inorganic molecules,
and β-oxidation of fatty acids. The
information-processing systems and the biosynthetic
pathways in A. fulgidus have counterparts in the
archaeon M. jannaschii. However, the genomes of
these two archaea indicate dramatic differences in the
way these organisms sense their environment, perform
regulatory and transport functions, and gain energy.
Another interesting feature revealed by genome analysis
is that A. fulgidus displays extensive gene
duplication in comparison with other fully sequenced
prokaryotes. This suggests that gene duplication has
been an important evolutionary mechanism for increasing
physiological diversity in the Archaeoglobales.
About 25% of the A. fulgidus genome encodes
conserved genes with unknown biological function,
two-thirds of which are shared with M.
jannaschii. Another 25% of the A. fulgidus
genome represents genes that are unique to this
organism, indicating that there is substantial
diversity among members of the archaea. As additional
archaeal and bacterial genome sequences are completed,
we may begin to define a core set of genes that are
shared among prokaryotes and those that are unique to
bacterial or archaeal species.
Thermotoga maritima
The Thermotogales are a group of nonsporeforming
rod-shaped bacteria that represent the most
thermophilic of the known organotrophic bacteria. The
type strain Thermotoga maritima MSB8, isolated
originally from geothermal-heated marine sediment at
Vulcano, Italy, has an 80°C optimum temperature for
growth. T. maritima metabolizes many simple and
complex carbohydrates including glucose, sucrose,
starch, xylan, and cellulose. Xylan is a complex plant
polymer that represents the most abundant noncellulosic
polysaccharide in angiosperms, where it accounts for 20
to 30% of the dry weight of wood tissues. Cellulose is
the most abundant biopolymer occurring in nature,
estimated to account for 75 × 10^9 tons of
dry plant biomass annually. Both cellulose and xylan,
through conversion to fuels (e.g., H2), have
major potential as renewable carbon and energy sources.
T. maritima is of evolutionary significance
because small subunit ribosomal RNA (SSU rRNA)
phylogeny has placed the bacterium as one of the
deepest and most slowly evolving bacteria. To further
elucidate its unique metabolic properties and
evolutionary relationship to other microbial species,
we sequenced the genome of T. maritima MSB8
using the whole-genome random sequencing method. The
1,860,725-bp T. maritima genome contains 1872
predicted coding regions, 54% (1005) of which have
functional assignments and 46% (867) of which are of
unknown function. Almost 7% of the predicted coding
sequences in the T. maritima genome are involved
in the metabolism of simple and complex sugars, a
percentage more than twice that seen in other bacterial
and archaeal species sequenced to date. Biosynthetic
pathways for nine amino acids were identified in T.
maritima, but the bacterium has an extensive system
for the uptake of peptides from the environment.
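The counts quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated above; the derived values (average spacing and the approximate number of sugar-metabolism genes) are illustrative calculations, and treating "almost 7%" as exactly 7% gives an upper estimate.

```python
GENOME_BP = 1_860_725
PREDICTED_CODING_REGIONS = 1872
WITH_FUNCTION = 1005
UNKNOWN_FUNCTION = 867

def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# The two categories should account for every predicted coding region.
assert WITH_FUNCTION + UNKNOWN_FUNCTION == PREDICTED_CODING_REGIONS

summary = {
    "assigned_pct": round(percent(WITH_FUNCTION, PREDICTED_CODING_REGIONS)),
    "unknown_pct": round(percent(UNKNOWN_FUNCTION, PREDICTED_CODING_REGIONS)),
    "bp_per_coding_region": GENOME_BP / PREDICTED_CODING_REGIONS,
    # "Almost 7%" taken as 7%, so this is an upper estimate.
    "sugar_metabolism_regions_max": 0.07 * PREDICTED_CODING_REGIONS,
}
for key, value in summary.items():
    print(f"{key}: {value:.6g}")
```

The rounded percentages reproduce the 54% and 46% quoted above, and the genome averages just under one predicted coding region per kilobase.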
Phylogenetic analysis of genes in the T.
maritima genome has demonstrated that gene
evolution may not give a true picture of organismal
evolution; gene duplication, gene loss, and horizontal
gene transfer probably account for many inconsistencies
in single-gene phylogenies. The complete genome of
T. maritima has, however, revealed a degree of
similarity with the thermophilic archaea in terms of
gene content and overall genome organization that was
not previously appreciated. Of the sequenced bacteria,
T. maritima has the highest percentage (24%) of
genes that are most similar to archaeal genes. Some 81
of these genes are clustered in regions of the genome
that range in size from 4 to 20 kb. Five of these regions
have a composition substantially different from the
rest of the genome, suggesting that lateral gene
transfer has occurred between the thermophilic archaea
and bacteria. In addition, repeat structures found in T.
maritima have otherwise been identified only in thermophiles,
and 108 genes on the T. maritima genome have
orthologues only in the genomes of other thermophilic
bacteria and archaea. One explanation for the
relatedness between thermophilic organisms seems to be
the occurrence of lateral gene transfer.
Deinococcus radiodurans
Deinococcus radiodurans, originally discovered
in food samples exposed to severe gamma irradiation, is
the most radioresistant organism ever isolated. An
important component of this resistance is the ability
to repair damage to its own chromosomal DNA. D.
radiodurans cultures exposed to 1.5 Mrad of
radiation display a reduction in size of genomic DNA
fragments corresponding to about 100 double-stranded
breaks per genome. Typically, most prokaryotic and
eukaryotic organisms cannot tolerate more than five
double-stranded breaks per genome without reduced
survival.
Within 8 to 10 hours after radiation exposure, the
D. radiodurans genome is fully restored with no
evidence of double-stranded breaks. During this repair
time, cellular replication of D. radiodurans is
arrested; after this 8- to 10-hour interval, the cells
display 100% survival with no detectable mutagenesis of
their completely restored genome. DOE's interest in
D. radiodurans includes understanding its
ability to withstand radiation, particularly as it
relates to this organism's potential
for bioremediation of toxic waste sites that contain
radioactive isotopes.
The genome sequence of D. radiodurans is
complete, and we have determined that the genome is
composed of three chromosomes and a small plasmid.
Inspection of the set of genes with similarity to
DNA-repair enzymes has so far been inconclusive
regarding radiation resistance; D. radiodurans
does not appear to contain repair genes that would make
it unique among other bacteria. However, a number of
unique sequence elements have been identified that are
being tested for their role in radiation resistance.
These experiments, coupled with the high-throughput
analysis of gene expression using microarray
technology, should lead to a more complete
understanding of this bacterium's gamma radiation
resistance in the near future.
Shewanella putrefaciens: A Model Organism for
Bioremediation
Shewanella putrefaciens is a bacterium involved
in microbiologically influenced corrosion, anaerobic
consumption of toxic organic pollutants, removal of
toxic metals by sulfide precipitation, and removal of
toxic metals and radionuclides by conversion to
insoluble reduced forms. Whole-genome sequencing of
S. putrefaciens will furnish the bioremediation
community with detailed knowledge of metabolic pathways
involved in all these processes, providing an excellent
model system for manipulating organisms for remediation
or control.
In addition, a complete genome sequence for S.
putrefaciens will furnish important information on
engineering specific regulatory mutants for
bioremediation. For example, mutants that continue to
metabolize anaerobically, even in the presence of
oxygen, could be used to remove uranium
(U6+) in dilute environments where oxygen is
still present. S. putrefaciens grows both
aerobically and anaerobically. In its anaerobic phase,
it acts as a metal reducer. The potential of
metal-reducing bacteria in pollutant removal is very
high for both the short and long terms, especially for
those iron reducers that are not inhibited by oxygen.
Two separate reports suggest that Shewanella
spp. can donate electrons to chlorinated hydrocarbons,
thus reductively dechlorinating toxic compounds by
converting tetrachloromethane to trichloromethane. In
addition, organisms such as S. putrefaciens,
which can produce Fe2+, have potential to
catalyze the reduction of toxic nitrates. Metals can be
removed from solution via direct reduction by
metal-reducing bacteria such as S. putrefaciens.
While iron and manganese are solubilized, other metals
are converted to insoluble forms upon reduction. Of
note are chromium (Cr6+) and uranium
(U6+), both of which are soluble in the
oxidized form but insoluble as the respective reduced
species, Cr3+ and U4+.
Reduction of U6+ has been demonstrated for
S. putrefaciens and has been proposed as a
mechanism for concentrating and thus removing
radionuclide waste. As with uranium, the removal of
toxic chromium should be possible using either intact
cells or cell-free systems of the metal-reducing
bacteria.
Complete genome sequences for all these metabolic
processes would accelerate bioremediation efforts in
metal and radionuclide reduction, chlorinated
hydrocarbon pollutants, and toxic nitrates. We are
midway through the closure process in the complete
genome sequencing of S. putrefaciens. Random
sequencing was completed in July 1998, and closure
began in August 1998. Analysis of the assemblies
suggests that the completed genome size will be about
5 Mb.
Preliminary observation of the gene content of this
organism has shown similarities between S.
putrefaciens and Vibrio cholerae in some role
categories (small molecule biosynthesis, central
intermediary metabolism) but differences in others
(sugar metabolism). It will be interesting to examine
these similarities and differences in light of the
different ecological niches occupied by these
organisms.
Chlorobium tepidum
The taxonomic group of green sulfur bacteria
(Chlorobiaceae) is formally classified as
Gram-negative. Members of this family are
photoautotrophs that can generate chemical energy
through an electron transport chain in the cytoplasmic
membrane that is associated with a light-harvesting
complex housed in a specialized organelle called the
chlorosome. The components of this light-harvesting
apparatus and some of its organizational structure are
reminiscent of photosystems found in plant chloroplasts
and, therefore, the evolutionary relationship of these
prokaryotes to eukaryotic organelles is of interest.
Chlorobium species also can fix CO2,
although the biochemical pathway used by these
prokaryotes is distinct from the Calvin cycle found in
higher plants.
C. tepidum initially was identified from a hot
spring in New Zealand. This species is thermophilic
with an optimum growth temperature of about 47°C.
It has a genome size of 2.1 Mb with a G+C content of
56.5 mol%. C. tepidum was nominated for
sequencing by DOE because of its photosynthetic
capacity and its interesting phylogenetic position in
the bacterial kingdom.
C. tepidum sequencing and closure have been
completed. Genome annotation is under way and soon will
be completed.
Caulobacter crescentus
Caulobacter crescentus is placed in the
alpha-purple bacteria that also include
Rickettsia, Rhizobium,
Agrobacterium, and Brucella species. It
is the most prevalent nonpathogenic bacterium in
nutrient-poor freshwater streams. It is also found in
marine environments. To facilitate location of nutrient
sources, C. crescentus is motile and
chemotactically competent during the swarmer phase of
its life cycle. In its nonswarmer phase
Caulobacter adheres to solid substrates such as
rocks. It is a component of the organisms responsible
for sewage treatment. Caulobacters are being
modified for use as bioremediation agents for removing
heavy metals from wastewater streams.
Caulobacter crescentus exhibits a well-studied
developmental pattern, independent of environmental
stress, with morphologically defined stages of the cell
cycle. It has easily observable physical structures
that define these specific cell cycle stages. Two major
events in the C. crescentus cell cycle are used by
researchers to elucidate fundamental processes required
for development. These are the tight regulation of
chromosomal replication and the temporally and
spatially regulated biogenesis of the flagellum. The
two processes are linked by a common transcriptional
regulator that orchestrates the response of multiple
cellular processes to the progression of the cell
cycle.
The genome was electronically annotated at the end of
the random sequencing phase; the data, along with the
assembly files, were sent to Dr. Lucy Shapiro (Stanford
University), Dr. Bert Ely (University of South
Carolina), and Dr. Janine Maddock (University of
Michigan), who are collaborating with us on final
assembly and annotation of the genome. The project is
now in the closure phase.
Pseudomonas putida
Sequencing of Pseudomonas putida KT2440 began in
January 1999 as a joint effort between TIGR and a
German consortium consisting of groups from MHH
(Medizinische Hochschule Hannover, Hannover, Germany);
GBF (Gesellschaft für Biotechnologische Forschung
mbH, Braunschweig, Germany); DKFZ (Deutsches
Krebsforschungszentrum, Heidelberg, Germany); and
QIAGEN (QIAGEN GmbH, Hilden, Germany). The study is
supported by grants from BMBF of Germany and the U.S.
Department of Energy.
The genome sequence will be used for in-depth
functional analyses including comparisons of genome
structure and function with the related organism P.
aeruginosa. Understanding structure and function of
the P. putida genome will allow for its
increased use in biotechnological areas, including the
production of natural compounds, remediation of
polluted habitats, and the use of strains to fight
plant diseases.
The P. putida genome sequence is expected to be
closed in the next few months. The number of libraries
for scaffolding the genome, access to the genome
sequence of P. aeruginosa, and the complementary
functional studies being conducted by the German
consortium should reduce chances of major assembly
problems in the genome.
Geobacter sulfurreducens
The complete genome sequence of Geobacter
sulfurreducens is being determined to better
understand its genetic potential. G.
sulfurreducens is an important member of a family
(Geobacteraceae) of delta proteobacteria capable of
oxidizing organic compounds including aromatic
hydrocarbons to carbon dioxide with Fe(III) or other
metals and metalloids including U(VI), Tc(VII),
Co(III), Cr(VI), Au(III), Hg(II), As(V) and Se(VI)
serving as the terminal electron acceptor. It is the
dominant group of iron-reducing microorganisms
recovered from a wide variety of aquifer and subsurface
environments when both molecular and traditional
culturing techniques are used. Geobacter plays a
critical role in the biogeochemical cycling of carbon,
iron, and other metals. Its genetics and physiology are
a subject of intense study in part due to the
importance that these processes can play in the
remediation of contaminated anaerobic subsurface
environments. The determination of the G.
sulfurreducens genome is being accomplished using a
random shotgun cloning approach to provide at least
sixfold coverage of a 1-Mb genome followed by closure
of remaining physical or sequence gaps. Searches of
sequences and contigs from the early random phase of
sequencing using the BLAST algorithm and database have
produced high scores with low expect values indicating
significant homologies to proteins contained in the
database. These include enzymes considered important to
basic housekeeping functions such as tRNA synthetases and
amino acid synthesis as well as those essential to
other metabolic processes known to occur in G.
sulfurreducens including nitrogen fixation. A
number of sequences have produced no significant
alignments indicating the likelihood of genes encoding
for novel functions. Of further significance has been
the extension of N-terminal sequences previously
obtained from cytochromes known to be important in
dissimilatory iron reduction. Thus, the genome will
provide information crucial to the further
understanding of this important metabolic process.
The Comprehensive Microbial Resource
One of the challenges presented by large-scale genome
sequencing efforts is the effective display of
information in a format that is accessible to the
laboratory scientist. Conventional databases offer the
scientist the means to search for a particular gene,
sequence, or organism but do little to display the vast
amounts of curated information that are becoming
available. TIGR has developed methods to effectively
"slice" the vast amounts of data in the sequencing
databases in a wide variety of ways, allowing the user
to formulate queries that search for specific genes as
well as to investigate broader topics such as genes
that might serve as vaccine and drug targets.
The Comprehensive Microbial Resource (CMR) is a
facility for annotation of TIGR genome sequencing
projects, a Web presentation of all fully sequenced
microbial genomes, curation from the original
sequencing centers, and further curation from TIGR (for
those genomes sequenced outside TIGR). The Web
presentation of CMR includes the comprehensive
collection of bacterial genome sequences, curated
information, and related informatics methodologies. The
scientist can view genes within a genome and also can
link to related genes in other genomes. This allows
construction of queries that include sequence searches,
isoelectric point, GC-content, GC-skew, functional role
assignments, growth conditions, environment, and other
questions and the isolation of genes of interest. The
database contains extensive curated data as well as
prerun homology searches to facilitate data mining. The
interface allows the display of the results in numerous
formats that will help the user ask more accurate
questions. This resource should be of value to the
scientific community to design experiments and spur
further research. Resources of this type are an
essential tool to make sense of bacterial genome
information as the number of completed genomes
continues to grow.
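As an illustration of two of the sequence-derived properties
mentioned above, GC-content and GC-skew can be computed directly
from a nucleotide string. The following is a minimal Python sketch
and not the CMR implementation; the function names and the window
size are assumptions made for this example, and GC-skew is taken
here in its common form, (G - C)/(G + C).

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """(G - C) / (G + C); returns 0.0 when the sequence holds no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq, window=1000):
    """GC-skew for consecutive, non-overlapping windows along seq.

    The window size is an arbitrary choice for this sketch.
    """
    return [gc_skew(seq[i:i + window]) for i in range(0, len(seq), window)]
```

Plotting the windowed values along a chromosome is one way such a
property could be displayed next to gene annotations.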
Rhodopseudomonas
palustris Genome Project
Caroline S. Harwood
Department of Microbiology; University of Iowa; 3-432
Bowen Science Bldg.; Iowa City, IA 52242
319/335-7783, Fax: -7679,
caroline-harwood@uiowa.edu
Rhodopseudomonas palustris is a common soil and
water bacterium that makes its living by converting
sunlight to cellular energy and by absorbing
atmospheric carbon dioxide and converting it to
biomass. This microbe can also degrade and recycle
components of the woody tissues of plants (wood is the
most abundant polymer on earth). Because of its
intimate involvement in carbon management and
recycling, R. palustris has been selected by the
DOE Carbon Management Program to have its genome
sequenced by the Human Genome Program's Joint Genome
Institute (JGI).
R. palustris is acknowledged by microbiologists
to be one of the most metabolically versatile bacteria
ever described. It can convert carbon dioxide gas into
cell material, fix nitrogen gas into ammonia, and
produce hydrogen gas. It grows both in the
absence and presence of oxygen. In the absence of
oxygen, it prefers to generate all its energy from
light by photosynthesis. It grows and increases its
biomass by absorbing carbon dioxide, but it also can
increase biomass by degrading organic
compounds (including such toxic compounds as
3-chlorobenzoate) to cellular building blocks. When oxygen
is present, R. palustris generates energy by
degrading a variety of carbon-containing compounds
(including sugars, lignin monomers, and methanol) and
by carrying out respiration.
R. palustris undergoes two major developmental
processes. The first is cell division by budding. This
process of asymmetric cell division results in two
different kinds of daughter cells: one a motile swarmer
cell and the other a stalked nonmotile cell. The second
is the differentiation of an elaborate system of
intracytoplasmic membrane vesicles when cells run out
of oxygen and are placed in light. The membranes are
used to house photosynthetic pigments and associated
proteins. Budding division and differentiation to
photosynthetically competent cells both require a
temporally regulated program of gene expression
followed by a pattern of precise localization of
protein products.
The diverse metabolism and the developmental cycles of
R. palustris are a large part of what makes this
bacterium such a seductive target for genome
sequencing. With the entire genome sequence in hand,
determining how R. palustris can coordinate and
appropriately express its many metabolic capabilities
in response to changing environmental conditions will
be possible, as will devising strategies to maximize
this bacterium's carbon-recycling capabilities.
R. palustris has a genetic system; genes can be
moved in and out of this bacterium easily, and specific
genes thus can be targeted for mutagenesis. This is of
great value because it will allow researchers to
rapidly apply information gained from genome sequencing
to the developing area of functional genomics.
This work will supply the JGI with sufficient R.
palustris genomic DNA for genome sequencing as well
as any information needed about the biology of R.
palustris.
Sequencing Microbial Genomes
of Environmental Relevance
Jane E. Lamerdin
Joint Genome Institute; Lawrence Livermore National
Laboratory; 7000 East Ave.; Livermore, CA 94550
925/423-3629, Fax: /422-2282,
lamerdinl@llnl.gov
http://genome.jgi-psf.org/mic_home.html
The DOE Joint Genome Institute (JGI) has established a
new program to obtain the complete genome sequence of
microorganisms that may significantly impact global
climate. This program supports the new DOE Global
Carbon Management and Sequestration initiative, which
funds basic research aimed at understanding factors
that contribute to global warming and effective ways to
manage carbon (particularly carbon dioxide) in soil and
ocean ecosystems. The goal of JGI's effort is to
explore the role of diverse microorganisms in carbon
cycling by elucidating their genetic content to
identify metabolic pathways that allow these organisms
to adapt to their respective niches. These specialized
processes include nutrient-uptake systems, pathways
that contribute to nitrogen fixation and carbon cycling
in soils, and pathways that regulate photosynthesis.
JGI's work is focused initially on five microorganisms:
Nitrosomonas europaea, Rhodopseudomonas
palustris, Nostoc punctiforme, and two
marine cyanobacteria, Prochlorococcus
marinus and Synechococcus. The common
trait shared by these microbes is that all are
autotrophic (i.e., they fix CO2 as their
sole carbon source), are fairly numerous within their
respective ecosystems, and contribute materially to
carbon cycling or biomass production (with the
exception of N. europaea).
N. europaea is a soil-dwelling
chemolithoautotroph that oxidizes ammonia to nitrite, a
process that often depletes nitrogen available to
plants, thereby limiting CO2 fixation.
Significantly, when oxygen concentrations in soils are
low, N. europaea reduces nitrite to
N2O, a greenhouse gas that also contributes to
ozone breakdown. We expect that the genome
sequence of N. europaea, one of the few
obligately autotrophic bacteria currently being
sequenced, will allow us to catalog the identity and
number of genes required for autotrophy. The genome
sequence also should uncover special redox enzymes that
allow N. europaea to adapt to the narrow niche
it occupies.
R. palustris is a purple nonsulfur phototrophic
bacterium commonly found in soils and fresh water. This
species is of particular interest to the Carbon
Management program because it is able to degrade and
recycle components of woody tissues of plants (wood is
the most abundant polymer on earth). It also possesses
a large repertoire of metabolic capabilities, including
the ability to fix CO2 into cellular
material, fix nitrogen gas into ammonia, and produce
and use hydrogen gas. In the absence of oxygen, it
grows phototrophically; in the presence of oxygen, it
can generate energy by degrading sugars, organic acids,
and methanol and can carry out respiration.
Nostoc punctiforme is a cyanobacterium that
enters into symbiotic associations with fungi and
lichens; these relationships are relevant to carbon
cycling and sequestration in tundra. Nostoc
species also have complex life cycles, fix nitrogen,
and are capable of chromatic adaptation.
Prochlorococcus and Synechococcus are
unicellular picoplankton, which are major biomass
producers in the world's temperate and tropical oceans.
Synechococcus species are abundant in surface
waters, while Prochlorococcus is found to exist
in the layer 100 to 200 m deep. Prochlorococcus
possesses an unorthodox pigment composition of divinyl
derivatives of chlorophyll a and b, alpha
carotene, zeaxanthin, and a type of phycoerythrin. The
last has not yet been shown to function in light
harvesting. By contrast, the highly related
Synechococcus contains chlorophyll a and
phycobilins that are more typical of cyanobacteria.
Prochlorococcus, the only photosynthetic
organism known to contain this particular combination
of pigments, could be a model for the ancestral
photosynthetic bacterium that gave rise to
cyanobacteria and chloroplasts. Sequence analysis of
the Prochlorococcus genome may shed more light
on this hypothesis, and a comparison of the two genomes
should provide additional insights into cyanobacterial
radiation in general.
In part due to the lack of physical maps and mapping
resources for these particular organisms, we have
employed a whole-genome shotgun strategy to determine
the complete sequence of each microbe. To aid our
assembly, we are supplementing our six- to eightfold
genome coverage in plasmid paired ends with a
large-insert scaffold of paired ends in the
low-copy-number fosmid vector. As the genome size
increases (e.g., in Nostoc), we will shift to
BAC clones for this scaffold. These scaffold clones are
being fingerprinted to aid in verification of the final
sequence assembly. We also will obtain optical maps of
several of the larger organisms, Nostoc in
particular, through a collaboration with David Schwartz
at the University of Wisconsin.
JGI has completed the initial data-generation phase for
N. europaea and P. marinus, which
produced >95% of the genomic sequence for each
microbe. (Progress towards completion can be monitored
through our Web site,
http://genome.jgi-psf.org/mic_home.html.) A similar level of coverage is
anticipated for R. palustris by mid-March.
Finishing is under way on the first two organisms, and
we expect closure of both by spring of 2000. With the
level of coverage achieved by the initial
data-generation phase, we can readily generate a rough
inventory of the types of genes present in each
organism. Preliminary or draft analyses have been
performed on N. europaea and P. marinus
by Frank Larimer and his team at Oak Ridge National
Laboratory. The resulting catalog format provides user
scientists with access to the contents of unfinished
sequence data in a consumable format, without the need
for protracted data manipulations on their part (see
http://compbio.ornl.gov/~fwl/neur_files.html).
This allows them to focus on identifying gene products
of particular interest to their research programs. The
raw sequence data also are directly queryable through
an accompanying BLAST server or can be downloaded from
JGI's ftp server.
In summary, JGI's new microbial sequencing program is
well under way, with at least three organisms on target
to be completed before the end of FY00. A scientific
advisory board has assigned additional organisms for
FY00 that continue the theme of relevance to the Global
Carbon Management and Sequestration effort. We
anticipate generating about 20 to 25 Mb of microbial
genomic sequence in FY00 (initially in ~eightfold
genome coverage) and ramping to a rate of 60 Mb in
FY01.
See also the related abstracts of Ronald Atlas, Daniel
Arp, David Schwartz, Caroline Harwood, Frank Larimer,
and Sallie Chisholm.
The Genome of Geobacter
sulfurreducens
B. A. Methe, Linda Banerjei,1 William C.
Nierman,1 O. Snoeyenbos-West, S. Sciufo, and
Derek R. Lovley
Department of Microbiology; University of Massachusetts;
Amherst, MA 01003
413/545-9651, Fax: -1578,
dlovley@microbio.umass.edu
1The Institute for Genomic Research;
Rockville, MD 20850
The complete genome sequence of Geobacter
sulfurreducens currently is being determined to
better understand its genetic potential. G.
sulfurreducens is an important member of a family
(Geobacteraceae) of delta proteobacteria. This family
is capable of oxidizing organic compounds including
aromatic hydrocarbons to carbon dioxide with Fe(III) or
other metals and metalloids including U(VI), Tc(VII),
Co(III), Cr(VI), Au(III), Hg(II), As(V) and Se(VI)
serving as the terminal electron acceptor. It is the
dominant group of iron-reducing microorganisms
recovered from a wide variety of aquifer and subsurface
environments when both molecular and traditional
culturing techniques are used. Geobacter plays a
critical role in the biogeochemical cycling of carbon,
iron, and other metals. Its genetics and physiology are
a subject of intense study in part due to the
importance that these processes can play in the
remediation of contaminated anaerobic subsurface
environments. The determination of the G.
sulfurreducens genome is being accomplished using a
random shotgun cloning approach to provide at least
sixfold coverage of a 1-Mb genome followed by closure
of remaining physical or sequence gaps. Assembler
software and other computer programs developed by The
Institute for Genomic Research are used to assemble the
genome and aid in gap closing, finishing, and
annotation. Searches of sequences and contigs from the
early random phase of sequencing using the BLAST
algorithm and database have produced high scores with
low expect values indicating significant homologies to
proteins contained in the database. These include
enzymes considered important to basic housekeeping
functions such as tRNA synthetases and amino acid
synthesis as well as those essential to other metabolic
processes known to occur in G. sulfurreducens,
including nitrogen fixation. A number of sequences have
produced no significant alignments, indicating the
likelihood of genes encoding for novel functions. Of
further significance has been the extension of
N-terminal sequences previously obtained from
cytochromes known to be important in dissimilatory iron
reduction. Thus, the genome will provide information
crucial to the further understanding of this important
metabolic process.
Optical Approaches for
Physical Mapping and Sequence Assembly of the
Deinococcus radiodurans Chromosome
David C. Schwartz
Biotechnology Center; University of Wisconsin-Madison;
425 Henry Mall; Madison, WI 53706
608/265-0546, Fax: /262-6748,
dcschwartz@facstaff.wisc.edu
www.chem.wisc.edu/~schwartz
Maps of genomic or cloned DNA frequently are
constructed by analyzing the cleavage patterns produced
by restriction enzymes. Restriction enzymes are
remarkable reagents that consistently cleave only at
specific four- to eight-nucleotide sequences, varying
according to the specific enzymes.
Restriction enzymes are reliable, numerous, and easily
obtainable, and there now are around 250 different
sequences represented among thousands of enzymes.
Restriction maps characterize gene structure and even
entire genomes. Furthermore, such maps provide a useful
scaffold for the alignment and verification of sequence
data. Restriction maps generated by computer and
predicted from the sequence are aligned with the actual
restriction map.
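The idea of predicting a restriction map from sequence can be
sketched in a few lines of Python. This is an illustrative example
only, not the software used in the project; the function names are
assumptions. It uses the Nhe I recognition sequence GCTAGC, cut
after the first G, and reports ordered fragment lengths that could
then be compared with a measured map. Because the site is
palindromic, scanning one strand is enough. A site that spans the
origin of a circular sequence is not detected by this simple
version.

```python
def cut_positions(seq, site="GCTAGC", cut_offset=1):
    """Return 0-based cut coordinates for every occurrence of site in seq.

    cut_offset is the number of bases into the site at which the
    enzyme cuts (1 for Nhe I, which cuts G^CTAGC).
    """
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        # Advance by one base so overlapping sites are also found.
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site="GCTAGC", cut_offset=1, circular=False):
    """Ordered fragment lengths from a complete digest of seq."""
    cuts = cut_positions(seq, site, cut_offset)
    if not cuts:
        return [len(seq)]
    bounds = [0] + cuts + [len(seq)]
    frags = [b - a for a, b in zip(bounds, bounds[1:])]
    if circular:
        # On a circle the first and last linear pieces are one fragment.
        frags = [frags[0] + frags[-1]] + frags[1:-1]
    return frags
```

For example, fragment_lengths(sequence, circular=True) gives the
predicted ordered map for a circular chromosome, which can be
aligned fragment by fragment against the experimental map.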
Restriction enzyme action traditionally has been
assayed by gel electrophoresis. This technique
separates cleaved molecules on the basis of their
mobilities under the influence of an applied electrical
field within a gel-separation matrix (small fragments
have a greater mobility than large ones). Although gel
electrophoresis distinguishes different-sized DNA
fragments (known as "fingerprinting"), the original
order of these fragments remains unknown. The
subsequent task of determining the order of such
fragments is labor intensive, especially when making
restriction maps of whole genomes, and, therefore, the
procedure is not widely employed despite its obvious
usefulness to genome analysis.
Our laboratory developed Optical Mapping, a system for
the construction of ordered restriction maps from
individual DNA molecules. The mapping substrate
consisted of very large, randomly sheared genomic DNA
fragments that were bound to derivatized glass surfaces
and cleaved with the restriction enzyme Nhe I.
The resulting fragments were imaged by fluorescence
microscopy. Cut sites were visualized as gaps between
cleaved DNA fragments that retained their original
order. A whole-genome restriction map of Deinococcus
radiodurans, a radiation-resistant bacterium able to
survive up to 15,000 grays of ionizing radiation, was
constructed without using DNA libraries, the polymerase
chain reaction, or electrophoresis. Very large,
randomly sheared, genomic DNA fragments were used to
construct maps from individual DNA molecules that were
assembled into two circular overlapping maps (2.6 and
0.415 Mb), without gaps. A third smaller chromosome
(176 kb) was identified and characterized. Aberrant
nonlinear DNA structures that may define chromosome
structure and organization, as well as intermediates in
DNA repair, were visualized directly by optical mapping
techniques after irradiation.
This high-resolution restriction map was used by
collaborators at The Institute for Genomic Research to
verify sequence-assembly data from D. radiodurans
by aligning the restriction map predicted from their
sequence. Optical mapping of D. radiodurans also
rendered insights into the organism's biology by
providing a picture of the entire genome's basic
organization. The genome was shown to be composed of
two rather than one chromosome, and the presence of
other extrachromosomal elements was demonstrated.
Whole-genome characterization by optical mapping may
facilitate further understanding of the
radiation-resistant nature of D. radiodurans,
which is being used as a vehicle for bioremediation of
toxic organic pollutants within radioactive waste
dumps.
Whole-Genome Sequence of
Pyrobaculum aerophilum
Melvin I. Simon and Sorel Fitz-Gibbon
Biology Division; California Institute of Technology;
1200 E. California Blvd.; Pasadena, CA 91125
626/395-3944, Fax: /796-7066,
simonm@starbase1.caltech.edu
www.tree.caltech.edu
Pyrobaculum aerophilum was chosen as a model
organism for the study of hyperthermophiles and
archaea. This rod-shaped microbe, isolated from a
boiling marine vent, has a maximum growth temperature
of 104°C, not far from the 113°C maximum known
for all life. Unlike most hyperthermophiles, however,
P. aerophilum is able to withstand exposure to
oxygen and thus is amenable to experimental
manipulations on the laboratory benchtop. In addition
to being an ideal model-organism candidate, P.
aerophilum warrants further studies because of its
phylogenetic position as a member of the
crenarchaea-eocytes, which may be the eukaryotes'
closest prokaryotic relatives.
The entire P. aerophilum genome has been
sequenced using a random shotgun approach (3.5X genomic
coverage) followed by oligonucleotide primer-directed
sequencing guided by our fosmid map. The genome was
assembled and edited using the Phred-Phrap-Consed
system. The 2.2-Mb genome codes for about 2500
proteins, 30% of which have been identified by sequence
similarities to proteins of known function. We have
made extensive use of the MAGPIE software for genome
annotation and GeneMark and Glimmer for prediction of
coding regions. In completing the "polishing" of the
genome, we are nearing our goal of no more than 1 error
in 10,000 bases. We also are continuing to annotate the
genome and attempting to improve our functional
predictions by using information on conserved residues,
potential 3-D structure alignments, and gene
phylogenies.
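The accuracy goal quoted above can be related to the Phred quality
scale, on which a score Q corresponds to an error probability P by
Q = -10 log10(P). The short Python sketch below is an illustration
only; the function names are assumptions and it is not part of the
Phred-Phrap-Consed software.

```python
import math


def phred_quality(error_probability):
    """Phred-scaled quality for a per-base error probability."""
    return -10.0 * math.log10(error_probability)


def error_probability(phred_q):
    """Per-base error probability for a Phred-scaled quality."""
    return 10.0 ** (-phred_q / 10.0)


def expected_errors(genome_length_bp, error_rate):
    """Expected number of erroneous bases at a uniform error rate."""
    return genome_length_bp * error_rate
```

Under these definitions, 1 error in 10,000 bases corresponds to a
quality of 40, and a 2.2-Mb genome finished to that standard would
be expected to carry about 220 residual errors.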
In our publications early in 1999, we discussed in
detail the results of the annotation process. One
interesting set of results pertains to genes involved
in DNA repair. Two major mechanisms for avoiding
mutations during DNA replication are the DNA
polymerase's immediate editing of the growing strand
and the mismatch-repair system's detection and
correction of mismatches soon after replication.
Homologs of the Escherichia coli proteins
involved in mismatch repair have been found in humans,
and damage to them has been implicated in hereditary
nonpolyposis colon cancer. However, homologs to
mismatch-repair proteins have not been detected in the
P. aerophilum genome nor in any of the other
three completed archaeal genomes. It remains to be seen
whether mismatch-repair activities can be detected in
these organisms, and, if so, whether different enzymes
have been recruited for these functions or the archaeal
homologs have diverged too much to be recognized by
simple sequence comparisons.
Having the entire genome sequence is an extraordinary
tool for research on this organism, and numerous
downstream projects already are in progress. The genome
sequence has been invaluable in guiding work to develop
a laboratory research system that would allow such
E. coli-like experiments as gene knockouts and
homologous overexpression of archaeal proteins. The
P. aerophilum genome-proteome also is being used
by several laboratories worldwide to develop methods
for high-throughput 3-D structure determination.
Proteins from thermophiles appear to be more stable
than their mesophilic homologs and may have higher
rates of successful crystallization, thus simplifying
the development of high-throughput "structural
proteomics."
Completion of microbial genome sequences provides not
only a wealth of information on individual species but
also allows implementation of new methods for
deciphering genomes. For example, it is now possible to
predict functionally linked proteins simply by looking
for proteins that share similar patterns of presence or
absence among completed genomes. With perhaps half the
proteins in microbial genomes having no clear
functional assignments, a good deal of exciting work
remains to be done.
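The presence-or-absence comparison described above can be sketched
as follows. This is a minimal Python illustration under stated
assumptions, not the method used by any particular group: the
function name, the input layout (a mapping from each protein to the
set of genomes that contain a homolog), and the mismatch threshold
are all hypothetical, and the homolog detection step is assumed to
have been done already.

```python
from itertools import combinations


def linked_pairs(presence, genomes, max_mismatches=0):
    """Return protein pairs whose presence/absence profiles agree.

    presence: dict mapping protein name -> set of genome names in
              which a homolog was found (hypothetical input format).
    genomes:  ordered list of all completed genomes considered.
    """
    profiles = {
        protein: tuple(g in found_in for g in genomes)
        for protein, found_in in presence.items()
    }
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        pa, pb = profiles[a], profiles[b]
        # Proteins present in every genome, or in none, carry no signal.
        if len(set(pa)) < 2 or len(set(pb)) < 2:
            continue
        mismatches = sum(x != y for x, y in zip(pa, pb))
        if mismatches <= max_mismatches:
            pairs.append((a, b))
    return pairs
```

With only a handful of genomes many profiles match by chance, so in
practice the value of such predictions grows with the number of
completed genomes.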
This is a completed project.
Whole-Genome Shotgun
Sequencing
Douglas Smith
Genome and Technology Development; Genome Therapeutics
Corp.; 100 Beaver St.; Waltham, MA 02154-8440
781/398-2378 or /893-5007 (ext. 219), Fax: /893-9535 or
/642-0310,
doug.smith@genomecorp.com
The information in the chromosome of a bacterium (or
any other organism) is encoded in the specific sequence
of four chemical building blocks called nucleotides.
Millions of these nucleotides are polymerized into long
strands that stick together in pairs to form the DNA
double helix. Genes are encoded in the DNA by specific
sequences of nucleotides, much as the words in this
paragraph are encoded by sequences of letters.
Bacterial chromosomes typically contain 1 to 7 million
nucleotide pairs (abbreviated Mb).
Current biochemical methods for determining DNA
sequences generate "reads" of about 500 to
700 nucleotides. To sequence an entire bacterial genome,
therefore, a method is needed for accurately piecing
together lots of individual reads. To accomplish this,
we use a "whole-genome shotgun" approach in which
thousands of sequence reads (enough to span a whole
genome 7 to 8 times) are generated from random locations
in the genome. Using powerful computer programs,
investigators then assemble these sequences into
overlapping sets that, together with additional
information, can be joined to reassemble the entire
chromosome.
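The relationship between read length, number of reads, and genome
coverage can be made concrete with a short calculation. The Python
sketch below is an illustration only. It uses the standard
Lander-Waterman approximation, which is not cited in this abstract,
and it ignores the minimum overlap needed to join two reads; the
function names and the example numbers are assumptions.

```python
import math


def reads_needed(genome_bp, read_bp, coverage):
    """Number of random reads required to reach a target fold coverage."""
    return math.ceil(coverage * genome_bp / read_bp)


def fold_coverage(genome_bp, read_bp, n_reads):
    """Average number of times each base is sequenced."""
    return n_reads * read_bp / genome_bp


def expected_unsequenced_fraction(coverage):
    """Approximate fraction of the genome hit by no read: exp(-c)."""
    return math.exp(-coverage)


def expected_gaps(n_reads, coverage):
    """Approximate number of sequence gaps left after the random phase."""
    return n_reads * math.exp(-coverage)
```

For a hypothetical 2-Mb genome sequenced with 600-nucleotide reads
to 7.5-fold coverage, reads_needed(2_000_000, 600, 7.5) gives
25,000 reads, and the model predicts on the order of a dozen gaps
that must then be closed by directed methods.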
Methanobacterium thermoautotrophicum
This organism is a member of the archaea, one of the
three major kingdoms into which all living things can
be classified (the other two are bacteria, which
include most of the familiar disease-causing organisms;
and eucarya, which include protozoa, fungi, plants,
animals, and humans). Archaea are interesting because
many of their cellular processes are similar to those
of eucarya, while others are more closely related to
bacteria.
M. thermoautotrophicum, originally isolated from
sewage sludge, also is found in the manure of farm
animals. In combination with other organisms, M.
thermoautotrophicum can be used to produce methane
from such materials. The organism prefers growth
temperatures of about 65°C and is capable of
growing and producing methane in the presence of only
hydrogen, carbon dioxide, and a few salts. The complete
genome sequence provides information that could be used
to reengineer the organism to grow more rapidly and to
produce larger amounts of methane with fewer
by-products. The thermostable proteins may be useful in
the chemical industry as reagents for bioconversion or
biocatalysis.
Using the whole-genome shotgun approach, we completed
the sequence of the entire 1.75-Mb genome of M.
thermoautotrophicum during 1997. In the shotgun
phase, we generated over 36,000 sequence reads (about
13 Mb, or 7.5-fold genome coverage). The reads were
assembled, and the resulting sets of overlapping
fragments were joined together by using a
"primer-walking" technique to generate new sequences
extending from the ends of the contigs. Additional
biochemical tools and computer programs were used to
identify and fix misassembled regions and to confirm
the links between the assembled sequences, allowing us
to reconstruct the entire circular chromosome.
The resulting sequence was analyzed to identify the
encoded genes. Many M. thermoautotrophicum genes
encode proteins that are more closely related to
eucaryal proteins (from higher plants and animals) than
to bacterial ones. This is especially true of
components involved in transcription and translation,
processes by which gene sequences are "expressed" to
produce protein products in the cell. Comparisons to
the genome of Methanococcus jannaschii (another
archaeon) revealed many similarities but also many
differences. Both organisms contain a significant
number of unique genes that are unrelated to any other
known genes. This finding underscores the high degree
of complexity and genetic diversity present in the
biological universe.
Clostridium acetobutylicum
Continued Microbial Genome Program work in our
laboratory focused on the Gram-positive, spore-forming
bacterium C. acetobutylicum ATCC 824. Its 4.1-Mb
genome, reflecting its more complex life processes and
metabolism, is more than twice the size of the M.
thermoautotrophicum genome. The organism is related to the
pathogenic species C. botulinum, C. tetani, and
C. perfringens, which cause the diseases
botulism, tetanus, and gangrene, respectively.
Isolates of C. acetobutylicum were identified
before the First World War when rubber shortages
stimulated a search for microbes that could produce
butanol for synthetic rubber production. Chaim Weizmann
(who later became the first president of Israel)
developed a process for ABE fermentation (to produce
acetone, butanol, and ethanol) using C.
acetobutylicum and plant starch that was later
pursued commercially. Demand for acetone during the
Second World War led to the establishment of a
molasses-based ABE process, but increases in the cost
of molasses, together with advances in the
petrochemical industry, led to its eventual
abandonment.
Since that time, scientific interest in the
solvent-producing Clostridia has continued. A
great deal of work has been done to elucidate the
metabolic pathways by which solvents are produced. Many
solvent-overproducing derivatives (strains) have been
identified, and it is now possible to pursue a rational
approach to develop modified strains with industrially
useful properties. Experimental research systems have
been developed that allow genes to be manipulated in
these organisms, and strains have been altered to grow
on cellulose constituents that will not support the
growth of natural strains. The complete genome sequence
will be immensely useful in further development of
these organisms as natural bioconversion factories for
the chemical and fuel industries.
C. acetobutylicum ATCC 824 was sequenced by the
whole-genome shotgun approach, essentially as described
above but including several technological advances. The
finishing phase involved exhaustive gap closure and
quality enhancement using a variety of biochemical
methods and computational tools. Only a few gaps
remain, and a publication describing the work is
expected during 1999.
The genome sequences of M. thermoautotrophicum
and C. acetobutylicum are freely available in
public databases, enabling research scientists
throughout the world to access the information to
expedite the development of useful derivatives of these
and other organisms.
This is a completed project.
The Complete Genome of the
Hyperthermophilic Bacterium Aquifex aeolicus
Ronald Swanson
Diversa Corporation; 10665 Sorrento Valley Road; San
Diego, CA 92121
619/623-5156, Fax: -5120,
rswanson@diversa.com
Diversa Corporation has completed the genome sequence
of the most heat-tolerant bacterium currently known.
This organism, Aquifex aeolicus, is capable of
growing at up to 95°C (203°F). Isolated and
described only recently, Aquifex is related to
filamentous bacteria first observed at the turn of the
century, growing at 89°C in the outflow of hot
springs in Yellowstone National Park. Observation of
these macroscopic assemblages would later be
instrumental in the drive to culture hyperthermophilic
organisms.
Aquifex is able to grow on hydrogen, oxygen,
carbon dioxide, and simple mineral salts. The complex
metabolic machinery necessary to function as a
hyperthermophilic chemolithoautotroph is encoded within
a 1,551,335-bp genome only one-third the size of
that of Escherichia coli; this small size appears to
limit metabolic flexibility. The use of oxygen as an
electron acceptor is enabled by the presence of a
complex respiratory apparatus. Despite the fact that
this organism grows at bacteria's extreme thermal
limit, only a few specific indications of thermophily
are apparent from the genome.
One of the most exciting results of sequence analysis
is the lack of coherence in the apparent phylogenies of
different genes. It was widely anticipated that,
because of the small subunit ribosomal RNA gene's
branching position near the bacterial lineage's root,
Aquifex gene analysis would shed light on the
phenotype of bacteria's last common ancestor, including
the bacterial domain's hypothesized thermophilic
origin. However, in many cases protein-based
phylogenies do not support the original rRNA-based
placement and show no consistent picture of the
organism's phylogeny.
This result has fundamental implications for our
understanding of the evolutionary mode.
The sequencing strategy used to assemble the complete
genome was based on the whole-genome shotgun approach.
Shotgun sequencing is characterized by two phases: an
initial, completely random phase in which most data are
collected, and a closure phase in which directed
techniques are used to close gaps and complete the
assembly. By pursuing a strategy in which only 97%
coverage was achieved initially, we were able to limit
the number of random-phase sequences to only 10,500.
Sequence fragments were assembled on an Apple Macintosh
computer using Sequencher, a commercially available
assembly and editing program. Sequences were obtained
from both ends of clones randomly chosen from a fosmid
library; using Sequencher, we assembled these sequences
with consensus sequences derived from the contigs of
random-phase sequences. Gaps between contigs were
closed by direct sequencing on fosmids not wholly
contained within a contig. The final assembly comprises
13,785 sequences with an average edited read length of
557 bp.
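The relationship between the number of random reads and
the fraction of the genome covered can be sketched with
the idealized Lander-Waterman (Poisson) model, in which
a redundancy of c leaves a fraction exp(-c) of the
genome unsequenced. The abstract does not say which
model, if any, was used for planning, so this is only
an illustration. It also assumes that the 557-bp
average read length reported for the final assembly
applies to the random-phase reads, which the text does
not state.

```python
import math

GENOME_BP = 1_551_335   # genome size from the abstract
RANDOM_READS = 10_500   # random-phase reads from the abstract
FINAL_READS = 13_785    # reads in the final assembly
READ_LEN_BP = 557       # average edited read length of the final
                        # assembly; ASSUMED to hold for random reads too


def expected_fraction_covered(redundancy: float) -> float:
    """Lander-Waterman estimate of the genome fraction covered."""
    return 1.0 - math.exp(-redundancy)


def redundancy_for_fraction(fraction: float) -> float:
    """Redundancy needed to cover the given fraction (0 <= f < 1)."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


random_cov = RANDOM_READS * READ_LEN_BP / GENOME_BP
final_cov = FINAL_READS * READ_LEN_BP / GENOME_BP

print(f"random-phase redundancy: {random_cov:.2f}")
print(f"predicted fraction covered: {expected_fraction_covered(random_cov):.3f}")
print(f"redundancy needed for 97%: {redundancy_for_fraction(0.97):.2f}")
print(f"final assembly redundancy: {final_cov:.2f}")
```

Under these assumptions the 10,500 random reads
correspond to a little under fourfold redundancy, for
which the model predicts between 97% and 98% of the
genome covered, in line with the figure quoted above.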
More than half of Aquifex's 1512 open reading
frames were assigned a putative function based on
similarity to known sequences. The extreme
thermostability of Aquifex proteins, coupled
with their bacterial origins, makes them ideal
candidates for overexpression, nuclear magnetic
resonance imaging, and X-ray crystallographic studies.
Consequently, large numbers of researchers are pursuing
structures of the thermostable Aquifex proteins,
and several heterologously expressed proteins are being
evaluated in commercial applications.
This is a completed project.
The Genome Sequence of the
Hyperthermophilic Archaeon Pyrococcus
furiosus
Robert B. Weiss, Frank Robb,1 and
James R. Brown2
Human Genetics Dept.; Eccles Institute of Human Genetics;
20 South 2030 East, Room 308 BPRB; University of Utah;
Salt Lake City, UT 84112-5330
801/585-3435 or -5606, Fax: -7177,
bob.weiss@genetics.utah.edu
or
bob@watneys.med.utah.edu
1Center of Marine Biotechnology; University
of Maryland
2Microbial Bioinformatics Group; SmithKline
Beecham Pharmaceuticals
http://www-genetics.med.utah.edu/
Pyrococcus furiosus is the best-studied member
of the unusual class of organisms known as extreme
hyperthermophiles because they live at extremes of
temperature and pressure. Isolated from geothermally
heated marine sediment in the shallow waters off
Vulcano Island, Italy, P. furiosus grows
optimally at 100°C and derives its energy by
fermentation of protein, peptide, and sugar mixtures
found in its geothermal environment. The organism is
fast growing and capable of dividing every 40 min.
Extreme hyperthermophiles play an important role in
advancing the fundamental understanding of protein
biochemistry, RNA and DNA metabolism, and protein
interactions. How has a cell's macromolecular machinery
adapted to function at 100°C? Proteins from
organisms living at moderate temperatures unfold or
denature when heated, but proteins from
hyperthermophiles maintain their three-dimensional
shapes. The genome sequence provides a resource for
beginning to understand why this happens.
Extremely stable proteins have potential
biotechnological uses as rugged industrial catalysts.
The diverse metabolism of P. furiosus provides a
wide variety of biocatalysts that are potentially
useful as environmentally safe reagents in transforming
biomass to derive energy and specialty chemicals and in
degrading organic compounds for environmental
detoxification.
The P. furiosus genome sequence was completed
recently. Its circular chromosome is 1,908,253 bp long
with a G-C content of 40.8%. The sequencing strategy
tested a variant of whole-genome shotgun sequencing
with a new sequencing vector that allows the genome to
be subcloned as larger pieces. The genome was pieced
together from fewer than 2500 subclones, compared to
the more typical number of 20,000. These medium-insert
sequencing vectors may help to assemble the larger
human and mouse genomes.
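The G-C content quoted above is the percentage of bases
in the sequence that are G or C. A minimal sketch of
that calculation follows. The function name and the toy
sequence are illustrative; the toy sequence is not
P. furiosus data.

```python
def gc_content(sequence: str) -> float:
    """Percentage of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


# Toy example: one G and one C among ten bases.
toy = "ATGCATTAAT"
print(f"G-C content: {gc_content(toy):.1f}%")
```

Ambiguous symbols such as N are left out of both the
numerator and the denominator, which is one common
convention; other tools may count them differently.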
Genome analysis and annotation are ongoing. Recently,
the complete sequence of the distantly related P.
horikoshii, which was isolated from a hydrothermal
vent at a depth of 1395 m in the Sea of Japan, was
determined by a group in Japan.
P. furiosus and P. horikoshii diverged
over 100 million years ago, and comparisons
between them are providing unique insight into
processes that result in changes to genes and genomes
by revealing complex gene rearrangements and changes in
gene content.
The sequence was completed in late November 1998, and
the annotation phase was completed early in 1999. The
sequence is available for searching and downloading
from the Web.
Library construction, sequencing, assembly, and the
production of finished sequence were done at the
University of Utah. Dr. Frank Robb's group provided the
organism and has assisted in the finishing and
annotation stages. Dr. Brown's group is assisting in the
gene-finding and annotation stages of the
project.
This is a completed project.