Research Abstracts
DOE Microbial Genome Program
Report
Section 1: Sequencing and Analysis
Sequencing the Genome of Nitrosomonas europaea,
an Obligate Lithoautotrophic, Ammonia-Oxidizing Bacterium
Daniel J. Arp, Alan B. Hooper,1 Jane E. Lamerdin,2
David Arciero,1 Andre Arellano,2 Karolyn Burkhart-Schultz,2
Anne Marie Erler,2 Norman Hommes, Martin G. Klotz,3
Jenny M. Norton,4 Warren Regala,2 Luis Sayavedra-Soto,
and Stephanie Stilwagen2
Botany and Plant Pathology; Oregon State University; 2082 Cordley; Corvallis,
OR 97331-2902
541/737-1294, Fax: -3573, rpd@bcc.orst.edu
1University of Minnesota
2Lawrence Livermore National Laboratory
3University of Louisville
4Utah State University
As part of the DOE initiative to explore the role of microorganisms
in global carbon sequestration, the Joint Genome Institute intends to obtain
the complete genomic sequence of the autotrophic nitrifying bacterium Nitrosomonas
europaea. This organism is the most studied of the ammonia-oxidizing
bacteria that are participants in the biogeochemical N cycle. Nitrifying
bacteria play a central role in the availability of nitrogen to plants
and hence in limiting CO2 fixation. The reaction catalyzed by
these bacteria is the first step in the oxidation of ammonia to nitrate.
These bacteria are also important in the treatment of industrial
and sewage waste, where they carry out this same first step. Evidence
suggests that ammonia-oxidizing bacteria contribute significantly to the
global production of nitrous oxide (produced by the reduction of nitrite).
N. europaea can also degrade a variety of organic pollutants, including benzene
and the halogenated compounds trichloroethylene and vinyl chloride. The ability of
nitrifying organisms to degrade some pollutants may make these organisms
attractive for controlled bioremediation in nitrifying soils and waters.
N. europaea is a Gram-negative member of the β-proteobacteria
subdivision, possessing a genome size of at least 2.2 Mb. The microbe
can be transformed and deletion mutants engineered, allowing the study
of genotype-phenotype relationships. To complete the sequence of the
N. europaea genome, a whole-genome shotgun strategy is being used, similar
to that employed successfully for dozens of bacterial genomes. The
8X genome coverage generated by the shotgun data is being supplemented
with a scaffold of paired end sequences from clones in the low-copy-number
fosmid vector. Shotgun data from this organism were assembled with PHRAP
(Phil Green, University of Washington) and will progress through "auto-finishing"
using software written by Matt Nolan (JGI-LLNL) and David Gordon (University
of Washington) prior to human intervention in the assembly. Fingerprinting
of a minimal spanning path of fosmids will be used to aid verification
of the final assembly. A sequence-analysis pipeline, developed by Manesh
Shah and Frank Larimer of Oak Ridge National Laboratory, is being used
to define open reading frames (ORFs) and query public databases for protein-nucleotide
similarities. Periodic lists of putative ORFs will appear on the Web
site as the genomic coverage continues to grow. The raw sequence data
also are directly queryable through the accompanying BLAST server or can
be downloaded from the JGI ftp server.
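The 8X shotgun coverage can be related to the expected amount of unsampled sequence using the Lander-Waterman model, which assumes that reads land uniformly at random on the genome. The sketch below is an idealized illustration under stated assumptions, not the JGI's actual planning method: the 2.2-Mb genome size and 8X coverage come from the text, while the 600-bp read length is a hypothetical value, and the minimum overlap needed to detect a join between reads is ignored.

```python
import math


def lander_waterman(genome_size: int, read_length: int, coverage: float) -> dict[str, float]:
    """Idealized Lander-Waterman estimates for random shotgun sequencing.

    Assumes uniformly random read placement and ignores the minimum
    overlap needed to detect a join between reads.
    """
    n_reads = coverage * genome_size / read_length  # from c = N * L / G
    uncovered_fraction = math.exp(-coverage)        # P(base never sequenced) ~ e^-c
    return {
        "reads": n_reads,
        "uncovered_fraction": uncovered_fraction,
        "uncovered_bases": uncovered_fraction * genome_size,
        "expected_contigs": n_reads * uncovered_fraction,  # ~ N * e^-c
    }


if __name__ == "__main__":
    # 2.2 Mb and 8X are from the text; the 600-bp read length is hypothetical.
    estimate = lander_waterman(genome_size=2_200_000, read_length=600, coverage=8.0)
    for key, value in estimate.items():
        print(f"{key}: {value:.6g}")
```

At 8X the model predicts that only about e^-8 (roughly 0.03%) of the genome goes unsampled; in practice, cloning bias and repeated sequences typically leave more gaps than this idealized estimate.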
N. europaea will be the second member of the β-subdivision to be sequenced.
The most-studied gene products in this organism are those involved in the
oxidation of ammonia, principally the hydroxylamine oxidoreductase (HAO),
ammonia monooxygenase (AMO), and the accompanying cytochromes that make
up the electron-transport chain. We hope the genome sequence will reveal
strong candidates for as-yet-unidentified proteins specific to the N-oxidation
pathways unique to this organism. The nature and regulation of enzymes
in the nitrite-to-nitrous oxide pathway also are of interest. The operon
encoding the subunits of AMO is duplicated, and the DNA sequences
of the two operons differ by only a single nucleotide. The gene that codes
for HAO is present in three copies. The extent to which other genes are
duplicated in the genome is not known but is one anticipated outcome of
generating the genomic sequence of N. europaea. Because N. europaea is one of the few
strictly autotrophic bacteria currently being sequenced, its
genome sequence is expected to reveal the identity and number of genes
required for and suited to autotrophy and possibly provide an indication
of the basis for obligate autotrophy. The sequence will allow direct comparison
to genes identified in another lithoautotrophic organism, Thiobacillus
ferrooxidans, which derives its energy from the oxidation of iron or
sulfur compounds. Comparison of the metabolic capabilities of this organism
with those of photoautotrophs and other lithoautotrophs may reveal the
range of capabilities that were lost or gained as N. europaea descended
from its evolutionary ancestors.
Nostoc Genome Sequencing
Ronald M. Atlas
Department of Biology; University of Louisville; Louisville KY 40292
502/852-3957, Fax: -0725, r.atlas@louisville.edu
An expert advisory panel met with Jane Lamerdin of the Joint Genome
Institute (JGI) to select a strain of the heterocystous cyanobacterium
Nostoc for genome sequencing. Based upon its relevance to carbon sequestration
and the likelihood of providing significant new scientific information,
the panel selected Nostoc punctiforme PCC 73102, ATCC 29133. This
strain fixes nitrogen and carbon dioxide, forms symbiotic relationships,
exhibits cell differentiation with the formation of motile hormogonia (a
diagnostic characteristic of the genus Nostoc), has a complex life
cycle, has established genetic transfer systems, and is divergent from
other cyanobacteria being sequenced. DNA from N. punctiforme is
being prepared by Jack Meeks for submission to JGI for sequencing. The
advisory panel will work with JGI during the annotation phase and will
participate in publication of the data. The panel consists of Ronald M.
Atlas (Department of Biology, University of Louisville), Jack Meeks (Division
of Biological Sciences, University of California, Davis), Malcolm Potts
(Department of Biochemistry and Nutrition, Virginia Polytechnic Institute),
Jeff Elhai (Department of Biology, University of Richmond), and Theresa
Thiel (Department of Biology, University of Missouri, St. Louis).
The Complete Genome Sequence of Prochlorococcus
Sallie W. Chisholm
Departments of Civil and Environmental Engineering and Biology; Massachusetts
Institute of Technology; 15 Vassar St. 48-425; Cambridge, MA 02139
617/253-1771, Fax: /258-7009, chisholm@mit.edu
http://web.mit.edu/chisholm/www
Prochlorococcus is a unicellular cyanobacterium that is very
abundant in the temperate and tropical oceans. It has been shown to contribute
32 to 80% of the total photosynthesis in the world's oligotrophic oceans,
the higher values being found in the Pacific. Thus, Prochlorococcus
plays a significant role in the global carbon cycle and the regulation
of the earth's climate.
Molecular phylogenies have shown that Prochlorococcus is closely
related to marine Synechococcus, forming a single lineage within
the cyanobacteria. Unlike Synechococcus, Prochlorococcus
lacks phycobilisomes and contains divinyl chlorophyll a (8-desethyl, 8-vinyl
chlorophyll a, or "chl a2") and divinyl chlorophyll b (chl b2) as its major
photosynthetic pigments. These pigments enable it to absorb blue light
more efficiently than Synechococcus at the low-light intensities
and blue wavelengths characteristic of the deep euphotic zone.
We recently demonstrated that there are at least two ecotypes of Prochlorococcus,
each of which is distinguished by its photophysiology and molecular phylogeny.
One is capable of growth at high irradiances, and the other is not. We hypothesize
that multiple ecotypes of Prochlorococcus coexist in all oceanic
environments, alternating in dominance according to light gradients and
seasonal mixing dynamics. We would expect to find, for example, that ecotypes
adapted to low light are dominant at the base of the euphotic zone in stratified
waters and those adapted to high light dominate at the surface. The ecotypes
differ in other physiological properties besides light-harvesting efficiencies,
and these too will play a role in regulating their distributions. Ultimately,
a comparison of the complete genomes of these two ecotypes will provide
valuable insights into the regulation of microdiversity in marine microbial
systems.
Prochlorococcus is an ideal candidate for complete genome sequencing
for a variety of reasons: (1) it is the smallest known phototroph and has
a relatively small genome size (1.8 Mb); (2) it is widespread and abundant
and is easily identified and enumerated in its environment using flow cytometry;
(3) its unique photosynthetic pigment (divinyl chlorophyll a) makes its
contribution to total photosynthetic biomass in natural communities easily
assessed; (4) different ecotypes have been identified that are very closely
related according to their 16S rRNA sequences but are physiologically distinct;
and (5) we have an extensive culture collection of isolates from different
oceans and environments.
We plan to work with scientists at the DOE Joint Genome Institute (JGI
Prochlorococcus Web site) to obtain the entire genomic sequence
of Prochlorococcus marinus (MED4), one of the ecotypes adapted
to high light. Our role in the project is to supply Prochlorococcus
DNA and to be a general source of information on the ecology and biology
of the organism.
Sequencing the Large Linear Chromosome of Borrelia
burgdorferi and a Strain of Clostridium
John J. Dunn and F. William Studier
Biology Department; Brookhaven National Laboratory; Bldg. 463, 50 Bell;
P.O. Box 5000; Upton, NY 11973-5000
631/344-3012, Fax: -3407, jdunn@bnl.gov
631/344-3390, Fax: -3407, studier@bnl.gov
www.genome.bnl.gov
In a program to explore possible improvements in the accuracy, speed,
and efficiency of genome sequencing, we sequenced the large linear chromosome
of Borrelia burgdorferi, the spirochete that causes Lyme disease.
This 909,275-bp sequence is available on our Web site, along with a comparison
of the same sequence determined independently by The Institute for Genomic
Research (TIGR).
The Brookhaven National Laboratory (BNL) sequence was determined by
random first-end and directed second-end sequencing of plasmid libraries
of random chromosomal fragments, followed by primer walking using 12-mer
primers generated by ligation of two hexamers on hexamer templates. The
sequence assembly was confirmed and contigs were aligned by end sequencing
a framework of ~35-kb fosmid clones, which spanned the entire sequence.
The few remaining gaps were filled by polymerase chain reaction amplification
from fosmid clones or genomic DNA. The sequence extends to the ends of
the clones we obtained (which did not include the covalently closed ends
of the chromosome) and lacks 404 bp at the left end and 249 bp at the right
end of TIGR's sequence, which extends to the ends. The entire BNL sequence
was determined at least once on each complementary strand.
The BNL and TIGR sequences are very similar, but there are some differences.
The TIGR sequence contains seven copies of a 162-bp imperfect tandem repeat
that occurs only twice in the BNL sequence. There are 86 other discrepancies,
only some of which are in a few remaining areas of relatively low quality
in the BNL sequence. In addition, the BNL sequence contains 65 ambiguities
(reflecting different base pairs at the same position in different clones),
and the TIGR sequence contains 43 ambiguities. For each ambiguity in either
sequence, one of the ambiguous bases matches the base at that position
in the other sequence. It seems likely that each DNA preparation used for
cloning and sequencing has polymorphisms at the 0.01% level, with a similar
level of polymorphism between the two DNA preparations.
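The "0.01% level" can be checked against the counts given above by dividing each by the length of the sequence. This quick sketch uses the 909,275-bp BNL length for all three counts (a simplification, since TIGR's sequence is slightly longer) and treats every site as a candidate polymorphism, even though some discrepancies fall in lower-quality regions.

```python
# Counts are taken from the text; the length is the 909,275-bp BNL sequence,
# used here for both sequences as a simplifying assumption.
GENOME_BP = 909_275

observations = {
    "BNL ambiguities": 65,
    "TIGR ambiguities": 43,
    "other BNL/TIGR discrepancies": 86,
}


def percent_of_genome(count: int, length: int = GENOME_BP) -> float:
    """Return count / length expressed as a percentage of base positions."""
    return 100.0 * count / length


if __name__ == "__main__":
    for label, count in observations.items():
        print(f"{label}: {count} sites = {percent_of_genome(count):.4f}%")
```

Each rate comes out below 0.01% (about 0.007% for the BNL ambiguities, 0.005% for TIGR's, and 0.009% for the other discrepancies), consistent with polymorphism on the order of 0.01%.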
We are currently sequencing the genome of a Clostridium strain
being studied at BNL as a possible bioremediation agent. This anaerobic,
nitrogen-fixing spore former can convert water-soluble uranyl ion U(VI)
to less soluble U(IV). Its circular genome is about 4 Mb, and no plasmids
have been detected. More than 500 kb of edited unique sequence has been
obtained so far. Clone libraries are being constructed in vectors we developed
that allow an ordered set of nested deletions to be generated from either
end of cloned fragments at least 10 kb long. These vectors were designed
to allow sequencing and ordered assembly of both DNA strands in highly
repeated regions such as those encountered in human DNA. In Clostridium,
the vectors allow directed sequencing of particularly interesting areas
by using nested deletions to fill in the framework generated by end sequencing.
We expect to sequence the relevant U and N2 reductases and identify
most genes involved in intermediary metabolism.
This is a completed project.
DOE-Funded Microbial Genome Sequencing
at The Institute for Genomic Research
Claire Fraser
The Institute for Genomic Research; 9712 Medical Center Dr.; Rockville,
MD 20850
301/838-3500, Fax: -0209, cfraser@tigr.org
www.tigr.org
The Institute for Genomic Research (TIGR) is a not-for-profit research
institute with interests in structural, functional, and comparative analysis
of genomes and gene products in viruses, bacteria, archaea, and both plant
and animal eukaryotes, including humans. Microbial genome-sequencing efforts
at TIGR supported by the Department of Energy since 1995 have produced
complete genome sequences for five organisms: Mycoplasma genitalium,
Methanococcus jannaschii, Archaeoglobus fulgidus, Thermotoga maritima,
and Deinococcus radiodurans. In addition, nine other DOE-funded
microbial genome projects are in progress at TIGR, with an estimated completion
date of 2001 for all work. In total, the DOE-funded microbial genome sequencing
projects at TIGR represent nearly 33 million base pairs (33 Mb) of DNA and
an estimated 30,000 microbial genes. The information generated in these
projects is available from the TIGR Microbial Database.
The strategy that we use for whole-genome sequencing is called a "shotgun"
method. In shotgun sequencing, the genome is sheared randomly into small
pieces that are then cloned, sequenced, and reassembled to form a whole
genomic sequence. With the shotgun approach, there is no need to develop
a genetic or physical map of the genome before sequencing it; the sequence
itself serves as the ultimate map. In large shotgun-sequencing projects,
DNA fragments are assembled into a consensus sequence. Key to the success
of the shotgun method is the availability of a truly random genomic DNA
clone library and a powerful, accurate algorithm for reassembling the fragments
into a complete genome. The basic approach for genome assembly is to compare
all individual sequences to find overlaps and use this information to build
a consensus sequence. Using new software developed at TIGR for large-scale
genome sequencing projects, we have assembled the complete genomes of 12
microbial species to date.
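The compare-all-pairs, find-overlaps, build-a-consensus idea can be illustrated with a toy greedy assembler. This is not TIGR's assembly software; it is a simplified sketch that assumes error-free reads drawn from a single strand and repeatedly merges the pair of fragments with the longest exact suffix-prefix overlap. Real assemblers must also handle reverse complements, sequencing errors, quality scores, and repeats, none of which are modeled here. The `min_len` threshold is a hypothetical parameter.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: repeatedly merge the pair of fragments with the
    longest exact suffix-prefix overlap and return the resulting contigs."""
    frags = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads contained in a longer read; they add no new sequence.
    frags = [r for r in frags if not any(r != s and r in s for s in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        # Compare every ordered pair of fragments to find the best overlap.
        for i, j in permutations(range(len(frags)), 2):
            k = overlap(frags[i], frags[j], min_len)
            if k > best_len:
                best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no further joins: remaining fragments are separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for n, f in enumerate(frags) if n not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    # Hypothetical error-free reads tiling the string ACGTTGCATGCCATAG.
    reads = ["ATGCCATAG", "ACGTTGCA", "TTGCATGCC"]
    print(greedy_assemble(reads))
```

The all-pairs comparison is quadratic in the number of fragments, which is why production assemblers index reads (for example by shared short words) rather than comparing every pair directly.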
The next step in whole-genome analysis is to identify all the predicted
genes and search the translated protein sequences against protein sequences
available in public databases. Because of the tremendous conservation in
protein sequence among organisms throughout evolution, putative genes can
be identified by sequence similarities.
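Gene identification in a prokaryotic genome often begins by scanning for open reading frames (ORFs) and translating them for database searches. The sketch below is a deliberately simple illustration, not TIGR's gene-finding software: it scans only the three forward reading frames, uses the standard genetic code with ATG as the sole start codon, and applies a hypothetical minimum-length cutoff (`min_codons`). Real gene finders also scan the reverse strand, allow alternative start codons, and use statistical models of coding sequence.

```python
from itertools import product

# Standard genetic code: codons enumerated with the first base varying slowest,
# in T, C, A, G order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate in-frame DNA codon by codon; unrecognized codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, str]]:
    """Return (start, end, protein) for ATG-to-stop ORFs on the forward strand.

    start is 0-based; end is exclusive and includes the stop codon.
    min_codons counts the start codon but not the stop (hypothetical cutoff).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if CODON_TABLE.get(seq[j:j + 3]) == "*":
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, translate(seq[i:j])))
                        i = j  # resume scanning just after this stop codon
                        break
                else:
                    break  # no in-frame stop before the end: open-ended, skip
            i += 3
    return orfs


if __name__ == "__main__":
    demo = "CC" + "ATG" + "AAA" * 3 + "TGG" + "TAA" + "GG"
    print(find_orfs(demo, min_codons=3))
```

The translated ORFs would then be the queries for a similarity search against public protein databases, the step described in the paragraph above.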
The Minimal Gene Complement of M. genitalium
The Mycoplasma class consists of small wall-less bacteria that
parasitize a wide range of hosts, including humans, animals, plants, insects,
and cells in culture; they are believed to represent a minimalist life
form, having yielded to selective pressure to reduce genome size and eliminate
unnecessary genes. M. genitalium was selected as one of the first
to be sequenced because it has the smallest genome of any known free-living
organism. M. genitalium lives in a parasitic relationship with its
primate hosts in ciliated epithelial cells of genitalia and respiratory
tracts. Examining the makeup of the M. genitalium genome reveals
much about the metabolic and biochemical capacity of this organism.
All genes necessary for life in M. genitalium are packaged in
a 580,070-base-pair (bp) circular chromosome. Genome analysis suggests that
the M. genitalium genome contains about 470 genes (average size,
1040 bp), which make up 88% of the genome (on average, a gene every 1235 bp).
This value is similar to that found in other microbial genome sequences.
These data indicate that Mycoplasma's reduction in genome size has
not resulted in increased gene density or decreased gene size.
A complement of genes involved in DNA maintenance, repair, transcription,
translation, and cellular transport is present; however, no complete pathways
for amino acid, fatty acid, purine, or pyrimidine biosynthesis were identified
in M. genitalium. Comparison of the minimal M. genitalium
genome to that of more complex organisms suggests that differences in genome
content are reflected as profound differences in physiology and metabolic
capacity. The reduction in M. genitalium's genome size is associated
with a marked reduction in the number and components of biosynthetic pathways,
so the organism must rely on metabolic products obtained from its host.
Perhaps one of the most surprising findings from whole-genome sequencing
and analysis of M. genitalium is that about one-third of the predicted
proteins identified in this organism displayed no sequence similarity to
known genes from any other organisms. This means that, even for this simplest
of free-living organisms, we still do not understand a considerable amount
of its biology. Determining whether the unknown genes in M. genitalium
are species specific or exhibit a more widespread phylogenetic distribution
will be of interest.
Comparing the M. genitalium genome with those of other microorganisms
from diverse habitats will provide insights into what constitutes a minimal
set of genes necessary for a self-replicating organism as well as the mechanisms
associated with changes in genome organization and content in nature. This
information, in turn, will be useful for modifying and engineering organisms
to perform specific biochemical tasks in the laboratory or the environment.
Genome Sequence of the Archaeon M. jannaschii
The archaea were discovered as a unique phylogenetic domain of life
by Carl Woese in the 1970s using sequence data from the small subunit of
ribosomal RNA as a biosystematic marker. M. jannaschii was the first
representative of the archaeal domain to be completely sequenced. Isolated
in 1982 from a deep-sea hydrothermal vent, M. jannaschii reduces carbon
dioxide to methane as its primary energy-producing biochemical pathway.
Because this organism thrives at deep-sea pressures and temperatures of
85°C and above, its genome should provide insights into how genomes
and gene products survive and function under these extreme conditions.
Understanding the genetic basis of methanogenesis biochemistry in the thermophilic,
barophilic M. jannaschii will bring us closer to harnessing the
unique biochemistry of methanogens as a source of renewable energy.
Analysis of the M. jannaschii genome sequence reveals that between
50 and 60% of its genes or gene products have no match to any other currently
known gene sequence. In addition, initial attempts to map database-matched
genes onto known biochemical pathways suggest that M. jannaschii's
biochemistry and physiology are highly distinctive among cellular organisms.
For example, certain enzymes associated with gluconeogenesis and the synthesis
of pentose sugars for nucleotide biosynthesis, such as fructose 1,6-bisphosphate
aldolase and fructose 1,6-bisphosphate phosphatase, are not found among
the predicted genes in M. jannaschii. Whether other gene products
have been recruited to serve the function of these missing genes or the
genes cannot be detected by standard sequence similarity methods is not
yet known.
Most genes involved in M. jannaschii's cellular-information processing
(replication, transcription, and translation) are more similar to functionally
equivalent counterparts in eukaryotes than to those in bacteria. On the other hand,
M. jannaschii genes that are involved in energy production, cell division,
and basic cellular metabolism are more like genes in bacteria. Further
analysis of the M. jannaschii genome sequence, together with sequence
from other members of the archaeal domain of life, will give additional
insights into the evolutionary relationship among the prokaryotes.
Complete Sequence of the Thermophilic Archaeon A. fulgidus
Biological sulfate reduction is part of the global sulfur cycle, ubiquitous
in the earth's anaerobic environments and essential to the workings of
the biosphere. Growth by sulfate reduction is restricted to relatively
few groups of prokaryotes; all but one of these groups are bacterial, the exception
being the archaeal sulfate reducers in the archaeoglobales. These organisms
are unique in that they are unrelated to other sulfate reducers and they
grow at extremely high temperatures, between 60 and 95°C. They can
grow either organoheterotrophically (using a variety of carbon and energy
sources) or lithoautotrophically on hydrogen, thiosulfate, and carbon dioxide.
The known archaeoglobales are strict anaerobes, most of which are hyperthermophilic
marine sulfate reducers found in hydrothermal environments and in subsurface
oil fields. High-temperature sulfate reduction by Archaeoglobus
species contributes to deep subsurface oil well "souring" by producing
iron sulfide, which causes corrosion of iron and steel in oil- and gas-processing
systems.
The genome of the type strain of the archaeoglobales A. fulgidus
was sequenced to better understand the biology of this group of organisms.
Genome analysis reveals a total of ~2400 genes; these include genes for
sulfate reduction, a great diversity of electron transport systems, a large
number of transporters with specificity for both organic and inorganic
molecules, and β-oxidation of fatty acids. The information-processing systems
and the biosynthetic pathways in A. fulgidus have counterparts in
the archaeon M. jannaschii. However, the genomes of these two archaea
indicate dramatic differences in the way these organisms sense their environment,
perform regulatory and transport functions, and gain energy. Another interesting
feature revealed by genome analysis is that A. fulgidus displays
extensive gene duplication in comparison with other fully sequenced prokaryotes.
This suggests that gene duplication has been an important evolutionary
mechanism for increasing physiological diversity in the archaeoglobales.
About 25% of the A. fulgidus genome encodes conserved genes with
unknown biological function, two-thirds of which are shared with M.
jannaschii. Another 25% of the A. fulgidus genome represents
genes that are unique to this organism, indicating that there is substantial
diversity among members of the archaea. As additional archaeal and bacterial
genome sequences are completed, we may begin to define a core set of genes
that are shared among prokaryotes and those that are unique to bacterial
or archaeal species.
Thermotoga maritima
The thermotogales are a group of nonsporeforming rod-shaped bacteria
that represent the most thermophilic of the known organotrophic bacteria.
The type strain Thermotoga maritima MSB8, isolated originally from
geothermally heated marine sediment at Vulcano, Italy, has an 80°C optimum
temperature for growth. T. maritima metabolizes many simple and
complex carbohydrates including glucose, sucrose, starch, xylan, and cellulose.
Xylan is a complex plant polymer that represents the most abundant noncellulosic
polysaccharide in angiosperms, where it accounts for 20 to 30% of the dry
weight of wood tissues. Cellulose is the most abundant biopolymer occurring
in nature, estimated to account for 75 × 10⁹ tons of dry plant
biomass annually. Both cellulose and xylan, through conversion to fuels
(e.g., H2), have major potential as renewable carbon and energy
sources.
T. maritima is of evolutionary significance because small subunit
ribosomal RNA (SSU rRNA) phylogeny has placed the bacterium as one of the
deepest-branching and most slowly evolving bacteria. To further elucidate its unique
metabolic properties and evolutionary relationship to other microbial species,
we sequenced the genome of T. maritima MSB8 using the whole-genome
random sequencing method. The 1,860,725-bp T. maritima genome contains
1872 predicted coding regions, 54% (1005) of which have functional assignments
and 46% (867) of which are of unknown function. Almost 7% of the predicted
coding sequences in the T. maritima genome are involved in the metabolism
of simple and complex sugars, a percentage more than twice that seen in
other bacterial and archaeal species sequenced to date. Biosynthetic pathways
for nine amino acids were identified in T. maritima, but the bacterium
has an extensive system for the uptake of peptides from the environment.
Phylogenetic analysis of genes in the T. maritima genome has
demonstrated that gene evolution may not give a true picture of organismal
evolution; gene duplication, gene loss, and horizontal gene transfer probably
account for many inconsistencies in single-gene phylogenies. The complete
genome of T. maritima has, however, revealed a degree of similarity
with the thermophilic archaea in terms of gene content and overall genome
organization that was not previously appreciated. Of the sequenced bacteria,
T. maritima has the highest percentage (24%) of genes that are most similar
to archaeal genes. Some 81 of these genes are clustered in regions of the
genome that range in size from 4 to 20 kb. Five of these regions have a composition
substantially different from the rest of the genome, suggesting that lateral
gene transfer has occurred between the thermophilic archaea and bacteria.
In addition, repeat structures of the type found in T. maritima have otherwise been identified
only in thermophiles, and 108 genes in the T. maritima genome have
orthologues only in the genomes of other thermophilic bacteria and archaea.
Lateral gene transfer is one plausible explanation for this relatedness
among thermophilic organisms.
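Regions with "a composition substantially different from the rest of the genome" are commonly detected by computing a statistic such as G+C content in windows along the chromosome and flagging outliers. The sketch below is a hypothetical illustration of that general idea only: the window size, step, and z-score cutoff are assumed values, not those used in the T. maritima analysis, and G+C content is just one of several possible compositional measures (codon usage and dinucleotide frequencies are others).

```python
from statistics import mean, pstdev


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_windows(genome: str, window: int = 5000, step: int = 1000,
                     z_cutoff: float = 2.0) -> list[tuple[int, float, float]]:
    """Return (start, gc, z) for windows whose G+C content deviates from the
    mean over all windows by more than z_cutoff standard deviations.
    window, step, and z_cutoff are hypothetical settings."""
    starts = list(range(0, max(len(genome) - window, 0) + 1, step))
    gcs = [gc_fraction(genome[s:s + window]) for s in starts]
    mu, sigma = mean(gcs), pstdev(gcs)
    if sigma == 0:
        return []  # uniform composition: no window stands out
    hits = []
    for s, gc in zip(starts, gcs):
        z = (gc - mu) / sigma
        if abs(z) > z_cutoff:
            hits.append((s, gc, z))
    return hits


if __name__ == "__main__":
    # Synthetic example: a 50% G+C background with an inserted AT-only block.
    background = "ATGC" * 5000
    genome = background[:10000] + "AT" * 1000 + background[10000:]
    for start, gc, z in atypical_windows(genome, window=1000, step=500):
        print(f"window at {start}: G+C = {gc:.2f}, z = {z:.1f}")
```

A compositional outlier is only circumstantial evidence of lateral transfer; it is usually weighed together with phylogenetic evidence such as the archaeal-like gene content described above.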
Deinococcus radiodurans
Deinococcus radiodurans, originally discovered in food samples
exposed to severe gamma irradiation, is the most radioresistant organism
ever isolated. An important component of this resistance is the ability
to repair damage to its own chromosomal DNA. D. radiodurans
cultures exposed to 1.5 Mrad of radiation display a reduction in size of
genomic DNA fragments corresponding to about 100 double-stranded breaks
per genome. Typically, most prokaryotic and eukaryotic organisms cannot
tolerate more than five double-stranded breaks per genome without reduced
survival.
Within 8 to 10 hours after radiation exposure, the D. radiodurans
genome is fully restored with no evidence of double-stranded breaks. During
this repair time, cellular replication of D. radiodurans is arrested;
after this 8- to 10-hour interval, the cells display 100% survival with
no detectable mutagenesis of their completely restored genome. DOE's interest
in D. radiodurans includes understanding its ability to withstand
radiation, particularly as it relates to this organism's
potential for bioremediation of toxic waste sites that contain radioactive
isotopes.
The genome sequence of D. radiodurans is complete, and we have
determined that the genome is composed of three chromosomes and a small
plasmid. Inspection of the set of genes with similarity to DNA-repair enzymes
has so far been inconclusive regarding radiation resistance; D. radiodurans
does not appear to contain repair genes that would make it unique among
other bacteria. However, a number of unique sequence elements have been
identified that are being tested for their role in radiation resistance.
These experiments, coupled with the high-throughput analysis of gene expression
using microarray technology, should lead to a more complete understanding
of this bacterium's gamma radiation resistance in the near future.
Shewanella putrefaciens: A Model Organism for Bioremediation
Shewanella putrefaciens is a bacterium involved in microbiologically
influenced corrosion, anaerobic consumption of toxic organic pollutants,
removal of toxic metals by sulfide precipitation, and removal of toxic
metals and radionuclides by conversion to insoluble reduced forms. Whole-genome
sequencing of S. putrefaciens will furnish the bioremediation community
with detailed knowledge of metabolic pathways involved in all these processes,
providing an excellent model system for manipulating organisms for remediation
or control.
In addition, a complete genome sequence for S. putrefaciens will
furnish important information on engineering specific regulatory mutants
for bioremediation. For example, mutants that continue to metabolize anaerobically,
even in the presence of oxygen, could be used to remove uranium (U6+)
in dilute environments where oxygen is still present. S. putrefaciens
grows both aerobically and anaerobically. In its anaerobic phase, it acts
as a metal reducer. The potential of metal-reducing bacteria in pollutant
removal is very high for both the short and long terms, especially for
those iron reducers that are not inhibited by oxygen.
Two separate reports suggest that Shewanella spp. can donate
electrons to chlorinated hydrocarbons, thus reductively dechlorinating
toxic compounds by converting tetrachloromethane to trichloromethane. In
addition, organisms such as S. putrefaciens, which can produce Fe2+,
have potential to catalyze the reduction of toxic nitrates. Metals can
be removed from solution via direct reduction by metal-reducing bacteria
such as S. putrefaciens.
While iron and manganese are solubilized, other metals are converted
to insoluble forms upon reduction. Of note are chromium (Cr6+)
and uranium (U6+), both of which are soluble in the oxidized
form but insoluble as the respective reduced species, Cr3+
and U4+. Reduction of U6+ has been demonstrated for
S. putrefaciens and has been proposed as a mechanism for concentrating
and thus removing radionuclide waste. As with uranium, the removal of toxic
chromium should be possible using either intact cells or cell-free systems
of the metal-reducing bacteria.
A complete genome sequence revealing the genes behind all these metabolic processes would accelerate
bioremediation efforts involving metal and radionuclide reduction, chlorinated
hydrocarbon pollutants, and toxic nitrates. We are midway through the closure
process in the complete genome sequencing of S. putrefaciens. Random
sequencing was completed in July 1998, and closure began in August 1998.
Analysis of the assemblies suggests that the completed genome size will
be about 5 Mb.
Preliminary observation of the gene content of this organism has shown
similarities between S. putrefaciens and Vibrio cholerae in
some role categories (small molecule biosynthesis, central intermediary
metabolism) but differences in others (sugar metabolism). It will be interesting
to examine these similarities and differences in light of the different
ecological niches occupied by these organisms.
Chlorobium tepidum
The green sulfur bacteria (Chlorobiaceae) are formally
classified as Gram-negative organisms. Members of this family are photoautotrophs
that can generate chemical energy through an electron transport chain in
the cytoplasmic membrane that is associated with a light-harvesting complex
housed in a specialized organelle called the chlorosome. The components
of this light-harvesting apparatus and some of its organizational structure
are reminiscent of photosystems found in plant chloroplasts and, therefore,
the evolutionary relationship of these prokaryotes to eukaryotic organelles
is of interest. Chlorobium species also can fix CO2,
although the biochemical pathway used by these prokaryotes is distinct
from the Calvin cycle found in higher plants.
C. tepidum initially was identified from a hot spring in New
Zealand. This species is thermophilic with an optimum growth temperature
of about 47°C. It has a genome size of 2.1 Mb with a G+C content of
56.5 mol%. C. tepidum was nominated for sequencing by DOE because
of its photosynthetic capacity and its interesting phylogenetic position
in the bacterial kingdom.
C. tepidum sequencing and closure have been completed. Genome
annotation is under way and should be completed soon.
Caulobacter crescentus
Caulobacter crescentus is classified among the alpha-purple bacteria,
a group that also includes Rickettsia, Rhizobium, Agrobacterium,
and Brucella species. It is the most prevalent nonpathogenic bacterium
in nutrient-poor freshwater streams. It is also found in marine environments.
To facilitate location of nutrient sources, C. crescentus is motile
and chemotactically competent during the swarmer phase of its life cycle.
In its nonswarmer phase Caulobacter adheres to solid substrates
such as rocks. It is one of the organisms responsible for sewage
treatment. Caulobacters are being modified for use as bioremediation
agents for removing heavy metals from wastewater streams.
Caulobacter crescentus exhibits a well-studied developmental
pattern, independent of environmental stress, with morphologically defined
stages of the cell cycle. It has easily observable physical structures
that define these specific cell cycle stages. Two major events in the
C. crescentus cell cycle are used by researchers to elucidate fundamental
processes required for development. These are the tight regulation of chromosomal
replication and the temporally and spatially regulated biogenesis of the
flagellum. The two processes are linked by a common transcriptional regulator
that orchestrates the response of multiple cellular processes to the progression
of the cell cycle.
The genome was electronically annotated at the end of the random sequencing
phase; the data, along with the assembly files, were sent to Dr. Lucy Shapiro
(Stanford University), Dr. Bert Ely (University of South Carolina), and
Dr. Janine Maddock (University of Michigan), who are collaborating with
us on final assembly and annotation of the genome. The project is now in
the closure phase.
Pseudomonas putida
Sequencing of Pseudomonas putida KT2440 began in January 1999
as a joint effort between TIGR and a German consortium consisting of groups
from MHH (Medizinische Hochschule Hannover, Hannover, Germany); GBF (Gesellschaft
für Biotechnologische Forschung mbH, Braunschweig, Germany); DKFZ
(Deutsches Krebsforschungszentrum, Heidelberg, Germany); and QIAGEN (QIAGEN
GmbH, Hilden, Germany). The study is supported by grants from BMBF of Germany
and the U.S. Department of Energy.
The genome sequence will be used for in-depth functional analyses including
comparisons of genome structure and function with the related organism
P. aeruginosa. Understanding the structure and function of the P. putida
genome will allow for its increased use in biotechnological areas, including
the production of natural compounds, remediation of polluted habitats,
and the use of strains to fight plant diseases.
The P. putida genome sequence is expected to be closed in the
next few months. The multiple libraries available for scaffolding the genome, access
to the genome sequence of P. aeruginosa, and the complementary functional
studies being conducted by the German consortium should reduce chances
of major assembly problems in the genome.
Geobacter sulfurreducens
The complete genome sequence of Geobacter sulfurreducens is being
determined to better understand its genetic potential. G. sulfurreducens
is an important member of a family (Geobacteraceae) of delta proteobacteria
capable of oxidizing organic compounds including aromatic hydrocarbons
to carbon dioxide with Fe(III) or other metals and metalloids including
U(VI), Tc(VII), Co(III), Cr(VI), Au(III), Hg(II), As(V), and Se(VI) serving
as the terminal electron acceptor. The Geobacteraceae are the dominant group of iron-reducing
microorganisms recovered from a wide variety of aquifer and subsurface
environments when both molecular and traditional culturing techniques are
used. Geobacter plays a critical role in the biogeochemical cycling
of carbon, iron, and other metals. Its genetics and physiology are a subject
of intense study in part due to the importance that these processes can
play in the remediation of contaminated anaerobic subsurface environments.
The determination of the G. sulfurreducens genome is being accomplished
using a random shotgun cloning approach to provide at least sixfold coverage
of a 1-Mb genome followed by closure of remaining physical or sequence
gaps. Searches of sequences and contigs from the early random phase of
sequencing using the BLAST algorithm and database have produced high scores
with low expect values indicating significant homologies to proteins contained
in the database. These include enzymes considered important to basic housekeeping
functions such as tRNA synthetases and amino acid synthesis as well as those
essential to other metabolic processes known to occur in G. sulfurreducens,
including nitrogen fixation. A number of sequences have produced no significant
alignments, indicating the likelihood of genes encoding novel functions.
Of further significance has been the extension of N-terminal sequences
previously obtained from cytochromes known to be important in dissimilatory
iron reduction. Thus, the genome will provide information crucial to the
further understanding of this important metabolic process.
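The "high scores with low expect values" above refer to BLAST's expect (E) value: the number of alignments scoring at least as well that would be expected by chance in a search of that size. As a minimal sketch of the underlying Karlin-Altschul relationship, the E value can be computed from either a raw score or a bit score. The function names are hypothetical, and the parameters lambda and K depend on the scoring matrix and gap penalties and are reported by the search program. Real BLAST also applies edge-effect corrections to the query and database lengths, so this sketch will not reproduce its output exactly.

```python
import math


def evalue_from_bits(bit_score: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2**(-S'), where S' is the normalized (bit) score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def evalue_from_raw(raw_score: float, query_len: int, db_len: int,
                    lam: float, k: float) -> float:
    """Karlin-Altschul E = K * m * n * exp(-lambda * S) for a raw score S."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bits_from_raw(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score to a bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)
```

Because E grows linearly with database size, the same alignment score becomes less significant as the database grows.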
The Comprehensive Microbial Resource
One of the challenges presented by large-scale genome sequencing efforts
is the effective display of information in a format that is accessible
to the laboratory scientist. Conventional databases offer the scientist
the means to search for a particular gene, sequence, or organism but do
little to display the vast amounts of curated information that are becoming
available. TIGR has developed methods to effectively "slice" the vast amounts
of data in the sequencing databases in a wide variety of ways, allowing
the user to formulate queries that search for specific genes as well as
to investigate broader topics such as genes that might serve as vaccine
and drug targets.
The Comprehensive Microbial Resource (CMR) is a facility for annotation
of TIGR genome sequencing projects, a Web presentation of all fully sequenced
microbial genomes, curation from the original sequencing centers, and further
curation by TIGR (for those genomes sequenced outside TIGR). The Web
presentation of CMR includes the comprehensive collection of bacterial
genome sequences, curated information, and related informatics methodologies.
The scientist can view genes within a genome and also can link to related
genes in other genomes. This allows construction of queries that combine
sequence searches, isoelectric point, GC content, GC skew, functional role
assignments, growth conditions, environment, and other criteria to
isolate genes of interest. The database contains extensive curated
data as well as prerun homology searches to facilitate data mining. The
interface allows the display of the results in numerous formats that will
help the user ask more accurate questions. This resource should help the
scientific community design experiments and spur further research.
Resources of this type are an essential tool to make sense of bacterial
genome information as the number of completed genomes continues to grow.
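GC content and GC skew, two of the query criteria mentioned above, are simple sequence statistics. The sketch below shows one common way to compute them in sliding windows; the function names, window size, step, and toy sequence are illustrative assumptions, not CMR's implementation.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    s = seq.upper()
    if not s:
        return 0.0
    return (s.count("G") + s.count("C")) / len(s)


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); returns 0.0 if seq has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def sliding_gc(seq: str, window: int, step: int):
    """Yield (start, GC content, GC skew) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


# Toy sequence for illustration only; not real genome data.
for start, gc, skew in sliding_gc("ATATGGGCCCGGGGAT", window=8, step=4):
    print(start, round(gc, 3), round(skew, 3))
```

On a real chromosome the window would be several kilobases; in many bacteria the sign of the GC skew tends to switch near the origin and terminus of replication.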
Rhodopseudomonas palustris Genome Project
Caroline S. Harwood
Department of Microbiology; University of Iowa; 3-432 Bowen Science
Bldg.; Iowa City, IA 52242
319/335-7783, Fax: -7679, caroline-harwood@uiowa.edu
Rhodopseudomonas palustris is a common soil and water bacterium
that makes its living by converting sunlight to cellular energy and by
absorbing atmospheric carbon dioxide and converting it to biomass. This
microbe can also degrade and recycle components of the woody tissues of
plants (wood is the most abundant polymer on earth). Because of its intimate
involvement in carbon management and recycling, R. palustris has
been selected by the DOE Carbon Management Program to have its genome sequenced
by the Human Genome Program's Joint Genome Institute (JGI).
R. palustris is acknowledged by microbiologists to be one of
the most metabolically versatile bacteria ever described. Not only can
it convert carbon dioxide gas into cell material, it can also convert nitrogen gas into
ammonia and produce hydrogen gas. It grows both in the absence
and presence of oxygen. In the absence of oxygen, it prefers to generate
all its energy from light by photosynthesis. It grows and increases its
biomass by absorbing carbon dioxide, but it also can increase biomass by
degrading organic compounds, including such toxic compounds as 3-chlorobenzoate, to
cellular building blocks. When oxygen is present, R. palustris generates
energy by degrading a variety of carbon-containing compounds (including
sugars, lignin monomers, and methanol) and by carrying out respiration.
R. palustris undergoes two major developmental processes. The
first is cell division by budding. This process of asymmetric cell division
results in two different kinds of daughter cells: one a motile swarmer cell
and the other a stalked nonmotile cell. The second is the differentiation
of an elaborate system of intracytoplasmic membrane vesicles when cells
run out of oxygen and are placed in light. The membranes are used to house
photosynthetic pigments and associated proteins. Budding division and differentiation
to photosynthetically competent cells both require a temporally regulated
program of gene expression followed by a pattern of precise localization
of protein products.
The diverse metabolism and the developmental cycles of R. palustris
are a large part of what makes this bacterium such a seductive target for
genome sequencing. With the entire genome sequence in hand, determining
how R. palustris can coordinate and appropriately express its many
metabolic capabilities in response to changing environmental conditions
will be possible, as will devising strategies to maximize this bacterium's
carbon-recycling capabilities.
R. palustris has a genetic system; genes can be moved in and
out of this bacterium easily, and specific genes thus can be targeted for
mutagenesis. This is of great value because it will allow researchers to
rapidly apply information gained from genome sequencing to the developing
area of functional genomics.
This work will supply the JGI with sufficient R. palustris genomic
DNA for genome sequencing as well as any information needed about the biology
of R. palustris.
Sequencing Microbial Genomes of Environmental Relevance
Jane E. Lamerdin
Joint Genome Institute; Lawrence Livermore National Laboratory; 7000
East Ave.; Livermore, CA 94550
925/423-3629, Fax: /422-2282, lamerdinl@llnl.gov
http://spider.jgi-psf.org/JGI_microbial/html/
The DOE Joint Genome Institute (JGI) has established a new program to
obtain the complete genome sequence of microorganisms that may significantly
impact global climate. This program supports the new DOE Global Carbon
Management and Sequestration initiative, which funds basic research aimed
at understanding factors that contribute to global warming and effective
ways to manage carbon (particularly carbon dioxide) in soil and ocean ecosystems.
The goal of JGI's effort is to explore the role of diverse microorganisms
in carbon cycling by elucidating their genetic content to identify metabolic
pathways that allow these organisms to adapt to their respective niches.
These specialized processes include nutrient-uptake systems, pathways that
contribute to nitrogen fixation and carbon cycling in soils, and pathways
that regulate photosynthesis. JGI's work is focused initially on five microorganisms:
Nitrosomonas europaea, Rhodopseudomonas palustris, Nostoc punctiforme,
and two marine cyanobacteria, Prochlorococcus marinus and Synechococcus.
These microbes share several traits: all are autotrophic (i.e.,
they fix CO2 as their sole carbon source), are fairly numerous
within their respective ecosystems, and contribute materially to carbon
cycling or biomass production (with the exception of N. europaea).
N. europaea is a soil-dwelling chemolithoautotroph that oxidizes
ammonia to nitrite, a process that often depletes nitrogen available to
plants, thereby limiting CO2 fixation. Significantly, when oxygen
concentrations in soils are low, N. europaea reduces nitrite to
N2O, a greenhouse gas that also contributes to ozone depletion.
We expect that the genome sequence of N. europaea, one of the few
obligately autotrophic bacteria currently being sequenced, will allow us
to catalog the identity and number of genes required for autotrophy. The
genome sequence also should uncover special redox enzymes that allow N.
europaea to adapt to the narrow niche it occupies.
R. palustris is a purple nonsulfur phototrophic bacterium commonly
found in soils and fresh water. This species is of particular interest
to the Carbon Management program because it is able to degrade and recycle
components of woody tissues of plants (wood is the most abundant polymer
on earth). It also possesses a large repertoire of metabolic capabilities,
including the ability to fix CO2 into cellular material, fix
nitrogen gas into ammonia, and produce and use hydrogen gas. In the absence
of oxygen, it grows phototrophically; in the presence of oxygen, it can
generate energy by degrading sugars, organic acids, and methanol and can
carry out respiration.
Nostoc punctiforme is a cyanobacterium that enters into symbiotic
associations with fungi and lichens; these relationships are relevant to
carbon cycling and sequestration in tundra. Nostoc species also
have complex life cycles, fix nitrogen, and are capable of chromatic adaptation.
Prochlorococcus and Synechococcus are unicellular picoplankton,
which are major biomass producers in the world's temperate and tropical
oceans. Synechococcus species are abundant in surface waters, while
Prochlorococcus is found in the layer 100 to 200 m deep.
Prochlorococcus possesses an unorthodox pigment composition of divinyl
derivatives of chlorophyll a and b, alpha-carotene, zeaxanthin,
and a type of phycoerythrin. The last has not yet been shown to function
in light harvesting. By contrast, the highly related Synechococcus
contains chlorophyll a and phycobilins that are more typical of
cyanobacteria. Prochlorococcus, the only photosynthetic organism
known to contain this particular combination of pigments, could be a model
for the ancestral photosynthetic bacterium that gave rise to cyanobacteria
and chloroplasts. Sequence analysis of the Prochlorococcus genome
may shed more light on this hypothesis, and a comparison of the two genomes
should provide additional insights into cyanobacterial radiation in general.
In part due to the lack of physical maps and mapping resources for these
particular organisms, we have employed a whole-genome shotgun strategy
to determine the complete sequence of each microbe. To aid our assembly,
we are supplementing our six- to eightfold genome coverage in plasmid paired
ends with a large-insert scaffold of paired ends in the low-copy-number
fosmid vector. As the genome size increases (e.g., in Nostoc), we
will shift to BAC clones for this scaffold. These scaffold clones are being
fingerprinted to aid in verification of the final sequence assembly. We
also will obtain optical maps of several of the larger organisms, Nostoc
in particular, through a collaboration with David Schwartz at the University
of Wisconsin.
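To illustrate how paired ends from a large-insert scaffold can link contigs, the sketch below groups contigs that share at least a minimum number of mate-pair links, using a union-find structure. It is a simplified, hypothetical illustration rather than the JGI assembly pipeline: the contig names and the min_links threshold are assumptions, and real scaffolding also uses read orientation and insert-size estimates to order and orient contigs and to estimate gap sizes.

```python
from collections import Counter


def build_scaffolds(contigs, mate_links, min_links=2):
    """Group contigs into scaffolds using paired-end (mate) links.

    contigs: iterable of contig names.
    mate_links: iterable of (contig_a, contig_b) pairs, one per clone whose
        two end reads landed on different contigs.
    min_links: number of independent clones required to join two contigs.
    Orientation and gap size are ignored in this simplified sketch.
    """
    parent = {c: c for c in contigs}

    def find(x):
        # Follow parent pointers to the group root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Count links per unordered contig pair, ignoring links within one contig.
    counts = Counter(tuple(sorted(pair)) for pair in mate_links if pair[0] != pair[1])
    for (a, b), n in counts.items():
        if n >= min_links:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two groups

    groups = {}
    for c in parent:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())
```

Requiring more than one supporting clone per join is a simple guard against a single chimeric clone wrongly linking two contigs.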
JGI has completed the initial data-generation phase for N. europaea and
P. marinus, which produced >95% of the genomic sequence for each
microbe. (Progress toward completion can be monitored through our Web
site.) A similar level of coverage is anticipated for R. palustris
by mid-March. Finishing is under way on the first two organisms, and
we expect closure of both by spring of 2000. With the level of coverage
achieved by the initial data-generation phase, we can readily generate
a rough inventory of the types of genes present in each organism. Preliminary
or draft analyses have been performed on N. europaea and P.
marinus by Frank Larimer and his team at Oak Ridge National Laboratory.
The resulting catalog format provides user scientists with access to the
contents of unfinished sequence data in a consumable format, without the
need for protracted data manipulations on their part (see example).
This allows them to focus on identifying gene products of particular interest
to their research programs. The raw sequence data also are directly queryable
through an accompanying BLAST server or can be downloaded from JGI's ftp
server.
In summary, JGI's new microbial sequencing program is well under way,
with at least three organisms on target to be completed before the end
of FY00. A scientific advisory board has assigned additional organisms
for FY00 that continue the theme of relevance to the Global Carbon Management
and Sequestration effort. We anticipate generating about 20 to 25 Mb of
microbial genomic sequence in FY00 (initially in ~eightfold genome coverage)
and ramping to a rate of 60 Mb in FY01.
See also the related abstracts of Ronald Atlas, Daniel Arp, David
Schwartz, Caroline Harwood, Frank Larimer, and Sallie Chisholm.
The Genome of Geobacter sulfurreducens
B. A. Methe, Linda Banerjei,1 William C. Nierman,1
O. Snoeyenbos-West, S. Sciufo, and Derek R. Lovley
Department of Microbiology; University of Massachusetts; Amherst, MA
01003
413/545-9651, Fax: -1578, dlovley@microbio.umass.edu
1The Institute for Genomic Research; Rockville, MD 20850
The complete genome sequence of Geobacter sulfurreducens currently
is being determined to better understand its genetic potential. G. sulfurreducens
is an important member of a family (Geobacteraceae) of delta proteobacteria.
This family is capable of oxidizing organic compounds including aromatic
hydrocarbons to carbon dioxide with Fe(III) or other metals and metalloids
including U(VI), Tc(VII), Co(III), Cr(VI), Au(III), Hg(II), As(V), and Se(VI)
serving as the terminal electron acceptor. It is the dominant group of
iron-reducing microorganisms recovered from a wide variety of aquifer and
subsurface environments when both molecular and traditional culturing techniques
are used. Geobacter plays a critical role in the biogeochemical cycling
of carbon, iron, and other metals. Its genetics and physiology are a subject
of intense study in part due to the importance that these processes can
play in the remediation of contaminated anaerobic subsurface environments.
The determination of the G. sulfurreducens genome is being accomplished
using a random shotgun cloning approach to provide at least sixfold coverage
of a 1-Mb genome followed by closure of remaining physical or sequence
gaps. Assembler software and other computer programs developed by The Institute
for Genomic Research are used to assemble the genome and aid in gap closing,
finishing, and annotation. Searches of sequences and contigs from the early
random phase of sequencing using the BLAST algorithm and database have
produced high scores with low expect values indicating significant homologies
to proteins contained in the database. These include enzymes considered
important to basic housekeeping functions such as tRNA synthetases and amino
acid synthesis as well as those essential to other metabolic processes
known to occur in G. sulfurreducens, including nitrogen fixation.
A number of sequences have produced no significant alignments, indicating
the likelihood of genes encoding novel functions. Of further significance
has been the extension of N-terminal sequences previously obtained from
cytochromes known to be important in dissimilatory iron reduction. Thus,
the genome will provide information crucial to the further understanding
of this important metabolic process.
Optical Approaches for Physical Mapping and Sequence Assembly of the Deinococcus radiodurans Chromosome
David C. Schwartz
Biotechnology Center; University of Wisconsin-Madison; 425 Henry Mall;
Madison, WI 53706
608/265-0546, Fax: /262-6748, dcschwartz@facstaff.wisc.edu
www.chem.wisc.edu/~schwartz
Maps of genomic or cloned DNA frequently are constructed by analyzing
the cleavage patterns produced by restriction enzymes. Restriction enzymes
are remarkable reagents that consistently cleave only at specific four-
to eight-nucleotide sequences, varying according to the specific enzymes.
Restriction enzymes are reliable, numerous, and easily obtainable, and
there now are around 250 different sequences represented among thousands
of enzymes. Restriction maps characterize gene structure and even entire
genomes. Furthermore, such maps provide a useful scaffold for the alignment
and verification of sequence data. Restriction maps predicted by computer
from the sequence are aligned with the experimentally determined restriction
map.
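To make the idea of a map "predicted from the sequence" concrete, the sketch below scans a sequence for a recognition site and reports the ordered fragment sizes a complete digest would produce. It assumes a palindromic site, so searching one strand finds every site (true of Nhe I, which recognizes GCTAGC and cuts between the first G and C), and it does not detect a site that spans the origin of a circular sequence. The function names are hypothetical.

```python
from typing import List


def cut_positions(seq: str, site: str, cut_offset: int) -> List[int]:
    """0-based top-strand cut positions for a palindromic recognition site.

    cut_offset is where the enzyme cuts within the site (1 for Nhe I, G^CTAGC).
    Overlapping occurrences are found because each search restarts at i + 1.
    """
    s, site = seq.upper(), site.upper()
    cuts = []
    i = s.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = s.find(site, i + 1)
    return cuts


def fragment_sizes(seq: str, site: str, cut_offset: int,
                   circular: bool = True) -> List[int]:
    """Ordered fragment lengths predicted from the sequence."""
    cuts = cut_positions(seq, site, cut_offset)
    n = len(seq)
    if not cuts:
        return [n]  # no site: the molecule stays in one piece
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        # The last fragment runs from the final cut around the origin to the first cut.
        sizes.append(n - cuts[-1] + cuts[0])
    else:
        sizes = [cuts[0]] + sizes + [n - cuts[-1]]
    return sizes
```

Unlike a gel, which reports only a set of sizes, this list preserves fragment order, which is the information an ordered restriction map adds.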
Restriction enzyme action traditionally has been assayed by gel electrophoresis.
This technique separates cleaved molecules on the basis of their mobilities
under the influence of an applied electrical field within a gel separation
matrix (small fragments have a greater mobility than large ones). Although
gel electrophoresis distinguishes different-sized DNA fragments (known as
"fingerprinting"), the original order of these fragments remains unknown.
The subsequent task of determining the order of such fragments is labor
intensive, especially when making restriction maps of whole genomes, and,
therefore, the procedure is not widely employed despite its obvious usefulness
to genome analysis.
Our laboratory developed Optical Mapping, a system for the construction
of ordered restriction maps from individual DNA molecules. The mapping
substrate consisted of very large, randomly sheared genomic DNA fragments
that were bound to derivatized glass surfaces and cleaved with the restriction
enzyme Nhe I. The resulting fragments were imaged by fluorescence
microscopy. Cut sites were visualized as gaps between cleaved DNA fragments
that retained their original order. A whole-genome restriction map of Deinococcus
radiodurans, a radiation-resistant bacterium able to survive up to 15,000
grays of ionizing radiation, was constructed without using DNA libraries,
the polymerase chain reaction, or electrophoresis. Very large, randomly
sheared, genomic DNA fragments were used to construct maps from individual
DNA molecules that were assembled into two circular overlapping maps (2.6
and 0.415 Mb), without gaps. A third smaller chromosome (176 kb) was identified
and characterized. Aberrant nonlinear DNA structures that may define chromosome
structure and organization, as well as intermediates in DNA repair, were
visualized directly by optical mapping techniques after irradiation.
This high-resolution restriction map was used by collaborators at The
Institute for Genomic Research to verify sequence-assembly data from
D. radiodurans by comparing it with the restriction map predicted from their sequence.
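A simplified sketch of that comparison step follows. Because a circular chromosome has no fixed starting point and a molecule can be read in either direction, the predicted fragment list is tried at every rotation and in both orientations, and fragments are accepted when their sizes agree within a relative tolerance. The tolerance and function name are assumptions; real optical-map alignment must also tolerate missing cuts, extra cuts, and sizing error, typically with dynamic programming, which this sketch omits.

```python
def maps_agree(optical, predicted, rel_tol=0.1):
    """True if two circular ordered restriction maps match within rel_tol.

    optical, predicted: ordered lists of fragment sizes (same units).
    Every rotation and both orientations of the predicted map are tried.
    Missing or extra cuts are not handled in this sketch.
    """
    if len(optical) != len(predicted):
        return False
    n = len(predicted)
    for candidate in (list(predicted), list(reversed(predicted))):
        for shift in range(n):
            rotated = candidate[shift:] + candidate[:shift]
            if all(abs(o - p) <= rel_tol * p for o, p in zip(optical, rotated)):
                return True
    return False
```

A mismatch flags a region where the sequence assembly and the physical map disagree, which is where a misassembly would be sought.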
Optical mapping of D. radiodurans also rendered insights into the
organism's biology by providing a picture of the entire genome's basic
organization. The genome was shown to be composed of two chromosomes rather than
one, and the presence of other extrachromosomal elements was demonstrated.
Whole-genome characterization by optical mapping may facilitate further
understanding of the radiation-resistant nature of D. radiodurans,
which is being used as a vehicle for bioremediation of toxic organic pollutants
within radioactive waste dumps.
Whole-Genome Sequence of Pyrobaculum aerophilum
Melvin I. Simon and Sorel Fitz-Gibbon
Biology Division; California Institute of Technology; 1200 E. California
Blvd.; Pasadena, CA 91125
626/395-3944, Fax: /796-7066, simonm@starbase1.caltech.edu
www.tree.caltech.edu
Pyrobaculum aerophilum was chosen as a model organism for the
study of hyperthermophiles and archaea. This rod-shaped microbe, isolated
from a boiling marine vent, has a maximum growth temperature of 104°C,
not far from the 113°C maximum known for all life. Unlike most hyperthermophiles,
however, P. aerophilum is able to withstand exposure to oxygen and
thus is amenable to experimental manipulations on the laboratory benchtop.
In addition to being an ideal model-organism candidate, P. aerophilum
warrants further studies because of its phylogenetic position as a member
of the crenarchaea-eocytes, which may be the eukaryotes' closest prokaryotic
relatives.
The entire P. aerophilum genome has been sequenced using a random
shotgun approach (3.5X genomic coverage) followed by oligonucleotide primer-directed
sequencing guided by our fosmid map. The genome was assembled and edited
using the Phred-Phrap-Consed system. The 2.2-Mb genome codes for about
2500 proteins, 30% of which have been identified by sequence similarities
to proteins of known function. We have made extensive use of the MAGPIE
software for genome annotation and GeneMark and Glimmer for prediction
of coding regions. In completing the "polishing" of the genome, we are
nearing our goal of no more than 1 error in 10,000 bases. We also are continuing
to annotate the genome and attempting to improve our functional predictions
by using information on conserved residues, potential 3-D structure alignments,
and gene phylogenies.
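On the Phred scale used by the Phred-Phrap-Consed system, a base's quality Q relates to its estimated error probability P by Q = -10 log10(P), so the goal of no more than 1 error in 10,000 bases corresponds to Q40. The sketch below shows the conversion in both directions and sums per-base error probabilities to estimate the expected number of errors in a region, a simple way to track polishing progress; the helper names are hypothetical and this is not necessarily the procedure used for P. aerophilum.

```python
import math


def phred_from_error(p: float) -> float:
    """Phred quality Q = -10 * log10(P) for error probability P."""
    return -10 * math.log10(p)


def error_from_phred(q: float) -> float:
    """Error probability P = 10 ** (-Q / 10) for Phred quality Q."""
    return 10 ** (-q / 10)


def expected_errors(quals) -> float:
    """Expected number of errors in a run of bases with the given qualities."""
    return sum(error_from_phred(q) for q in quals)
```

Each 10-point increase in Q corresponds to a tenfold drop in the estimated error rate.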
In our publications early in 1999, we discussed in detail the results
of the annotation process. One interesting set of results pertains to genes
involved in DNA repair. Two major mechanisms for avoiding mutations during
DNA replication are the DNA polymerase's immediate editing of the growing
strand and the mismatch-repair system's detection and correction of mismatches
soon after replication. Homologs of the Escherichia coli proteins
involved in mismatch repair have been found in humans, and defects in them
have been implicated in hereditary nonpolyposis colon cancer. However, homologs
to mismatch-repair proteins have not been detected in the P. aerophilum
genome or in any of the other three completed archaeal genomes. It remains
to be seen whether mismatch-repair activities can be detected in these
organisms, and, if so, whether different enzymes have been recruited for
these functions or the archaeal homologs have diverged too much to be recognized
by simple sequence comparisons.
Having the entire genome sequence is an extraordinary tool for research
on this organism, and numerous downstream projects already are in progress.
The genome sequence has been invaluable in guiding work to develop a laboratory
research system that would allow such E. coli-like experiments as
gene knockouts and homologous overexpression of archaeal proteins. The
P. aerophilum genome-proteome also is being used by several laboratories
worldwide to develop methods for high-throughput 3-D structure determination.
Proteins from thermophiles appear to be more stable than their mesophilic
homologs and may have higher rates of successful crystallization, thus
simplifying the development of high-throughput "structural proteomics."
Completed microbial genome sequences not only provide a wealth
of information on individual species but also allow implementation of
new methods for deciphering genomes. For example, it is now possible to
predict functionally linked proteins simply by looking for proteins whose
presence or absence follows similar distribution patterns among completed genomes. With
perhaps half the proteins in microbial genomes having no clear functional
assignments, a good deal of exciting work remains to be done.
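The approach described above is often called phylogenetic profiling: each protein is represented by a vector recording whether a homolog is present in each completed genome, and proteins with matching vectors are predicted to be functionally linked. The sketch below is a minimal version under stated assumptions: homolog presence is given as input (in practice it would come from similarity searches), protein and genome names are hypothetical, and max_mismatches sets how many genomes two profiles may disagree on.

```python
from itertools import combinations


def profile(found, genomes):
    """Presence/absence vector: 1 if the protein has a homolog in that genome."""
    return tuple(1 if g in found else 0 for g in genomes)


def linked_pairs(presence, genomes, max_mismatches=0):
    """Pairs of proteins whose profiles differ in at most max_mismatches genomes.

    presence: dict mapping protein name -> set of genomes containing a homolog.
    """
    profs = {p: profile(found, genomes) for p, found in presence.items()}
    pairs = []
    for a, b in combinations(sorted(profs), 2):
        mismatches = sum(x != y for x, y in zip(profs[a], profs[b]))
        if mismatches <= max_mismatches:
            pairs.append((a, b))
    return pairs
```

Profiles present in every genome, or in only one, carry little information, so practical analyses usually discount them.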
This is a completed project.
Whole-Genome Shotgun Sequencing
Douglas Smith
Genome and Technology Development; Genome Therapeutics Corp.; 100 Beaver
St.; Waltham, MA 02154-8440
781/398-2378 or /893-5007 (ext. 219), Fax: /893-9535 or /642-0310,
doug.smith@genomecorp.com
www.genomecorp.com
The information in the chromosome of a bacterium (or any other organism)
is encoded in the specific sequence of four chemical building blocks called
nucleotides. Millions of these nucleotides are polymerized into long strands
that stick together in pairs to form the DNA double helix. Genes are encoded
in the DNA by specific sequences of nucleotides, much as the words in this
paragraph are encoded by sequences of letters. Bacterial chromosomes typically
contain 1 to 7 million nucleotide pairs (1 to 7 megabases, abbreviated Mb).
Current biochemical methods for determining DNA sequences generate "reads"
of about 500 to 700 nucleotides. To sequence an entire bacterial genome,
therefore, a method is needed for accurately piecing together many individual
reads. To accomplish this, we use a "whole-genome shotgun" approach in
which thousands of sequence reads (enough to span a whole genome 7 to 8 times)
are generated from random locations in the genome. Using powerful computer
programs, investigators then assemble these sequences into overlapping
sets that, together with additional information, can be joined to reassemble
the entire chromosome.
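As a toy illustration of assembling reads into overlapping sets, the sketch below repeatedly merges the two sequences with the longest suffix-prefix overlap. It is a deliberately naive greedy method, not the program used by the authors or by Phrap or TIGR Assembler: it ignores sequencing errors, base-quality values, reverse-complement reads, and repeats, all of which production assemblers must handle. The read sequences and the min_len threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a equal to a prefix of b, or 0 if shorter than min_len (min_len >= 1)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads lying entirely inside another read; they add no new bases.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_pair = k, (i, j)
        if best_pair is None:
            return contigs  # no overlaps left: these are the contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)] + [merged]
```

Greedy merging can join reads incorrectly when a repeat is longer than the overlap threshold, which is one reason additional information, such as paired reads from the same clone, is needed to reassemble a whole chromosome.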
Methanobacterium thermoautotrophicum
This organism is a member of the archaea, one of the three major kingdoms
into which all living things can be classified (the other two are bacteria,
which include most of the familiar disease-causing organisms; and eucarya,
which include protozoa, fungi, plants, animals, and humans). Archaea are
interesting because many of their cellular processes are similar to those
of eucarya, while others are more closely related to bacteria.
M. thermoautotrophicum, originally isolated from sewage sludge,
also is found in the manure of farm animals. In combination with other
organisms, M. thermoautotrophicum can be used to produce methane
from such materials. The organism prefers growth temperatures of about
65°C and is capable of growing and producing methane in the presence
of only hydrogen, carbon dioxide, and a few salts. The complete genome
sequence provides information that could be used to reengineer the organism
to grow more rapidly and to produce larger amounts of methane with fewer
by-products. The thermostable proteins may be useful in the chemical industry
as reagents for bioconversion or biocatalysis.
Using the whole-genome shotgun approach, we completed the sequence of
the entire 1.75-Mb genome of M. thermoautotrophicum during 1997.
In the shotgun phase, we generated over 36,000 sequence reads (about 13 Mb,
or 7.5-fold genome coverage). The reads were assembled, and the resulting
sets of overlapping fragments were joined together by using a "primer-walking"
technique to generate new sequences extending from the ends of the contigs.
Additional biochemical tools and computer programs were used to identify
and fix misassembled regions and to confirm the links between the assembled
sequences, allowing us to reconstruct the entire circular chromosome.
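The coverage figure above is total sequenced bases divided by genome size. The sketch below reproduces that arithmetic and adds the idealized Lander-Waterman estimates: the expected fraction of the genome left unsequenced is about e^(-c), and the expected number of contigs is about N e^(-c sigma), where N is the number of reads, c the fold coverage, and sigma = 1 - T/L for a minimum detectable overlap of T bases on reads of length L. The function names are hypothetical, and the model assumes uniformly random, equal-length reads, so real assemblies with repeats and cloning bias will differ.

```python
import math


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each base is sequenced: c = N * L / G."""
    return total_bases / genome_size


def lander_waterman(num_reads: int, read_len: float, genome_size: float,
                    min_overlap: float = 0.0):
    """Idealized Lander-Waterman estimates for random shotgun sequencing.

    Returns (coverage c, expected fraction of genome not covered e**-c,
    expected number of contigs N * e**(-c * sigma)), with sigma = 1 - T / L.
    """
    c = num_reads * read_len / genome_size
    sigma = 1.0 - min_overlap / read_len
    return c, math.exp(-c), num_reads * math.exp(-c * sigma)


# Totals quoted in the text for M. thermoautotrophicum; the average read
# length is derived from them rather than measured.
reads, total, genome = 36_000, 13e6, 1.75e6
c, uncovered, contigs = lander_waterman(reads, total / reads, genome)
print(round(fold_coverage(total, genome), 2), uncovered, round(contigs))
```

With these totals the coverage works out to roughly 7.4-fold, consistent with the "7.5-fold" quoted for the approximate 13 Mb of sequence.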
The resulting sequence was analyzed to identify the encoded genes. Many
M. thermoautotrophicum genes encode proteins that are more closely related
to eucaryal proteins (from higher plants and animals) than to bacterial
ones. This is especially true of components involved in transcription and
translation, processes by which gene sequences are "expressed" to produce
protein products in the cell. Comparisons to the genome of Methanococcus
jannaschii (another archaeon) revealed many similarities but also many
differences. Both organisms contain a significant number of unique genes
that are unrelated to any other known genes. This finding underscores the
high degree of complexity and genetic diversity present in the biological
universe.
Clostridium acetobutylicum
Continued Microbial Genome Program work in our laboratory focused on
the Gram-positive, spore-forming bacterium C. acetobutylicum ATCC 824.
Its 4.1-Mb genome, reflecting its more complex life processes and metabolism,
is more than twice the size of the M. thermoautotrophicum genome. The organism is
related to the pathogenic species C. botulinum, C. tetani, and C.
perfringens, which cause the diseases botulism, tetanus, and gangrene,
respectively.
Isolates of C. acetobutylicum were identified before the First
World War, when rubber shortages stimulated a search for microbes that could
produce butanol for synthetic rubber production. Chaim Weizmann (who later
became the first president of Israel) developed a process for ABE fermentation
(to produce acetone, butanol, and ethanol) using C. acetobutylicum
and plant starch that was later pursued commercially. Demand for acetone
during the Second World War led to the establishment of a molasses-based
ABE process, but increases in the cost of molasses, together with advances
in the petrochemical industry, led to its eventual abandonment.
Since that time, scientific interest in the solvent-producing Clostridia
has continued. A great deal of work has been done to elucidate the metabolic
pathways by which solvents are produced. Many solvent-overproducing derivatives
(strains) have been identified, and it is now possible to pursue a rational
approach to develop modified strains with industrially useful properties.
Experimental research systems have been developed that allow genes to be
manipulated in these organisms, and strains have been altered to grow on
cellulose constituents that will not support the growth of natural strains.
The complete genome sequence will be immensely useful in further development
of these organisms as natural bioconversion factories for the chemical
and fuel industries.
C. acetobutylicum ATCC 824 was sequenced by the whole-genome
shotgun approach, essentially as described above but including several
technological advances. The finishing phase involved exhaustive gap closure
and quality enhancement using a variety of biochemical methods and computational
tools. Only a few gaps remain, and a publication describing the work is
expected during 1999.
The genome sequences of M. thermoautotrophicum and C. acetobutylicum
are freely available in public databases, enabling research scientists
throughout the world to access the information to expedite the development
of useful derivatives of these and other organisms.
This is a completed project.
The Complete Genome of the Hyperthermophilic Bacterium
Aquifex aeolicus
Ronald Swanson
Diversa Corporation; 10665 Sorrento Valley Road; San Diego, CA 92121
619/623-5156, Fax: -5120, rswanson@diversa.com
www.diversa.com
Diversa Corporation has completed the genome sequence of the most heat-tolerant
bacterium currently known. This organism, Aquifex aeolicus,
is capable of growing at up to 95°C (203°F). Isolated and described
only recently, Aquifex is related to filamentous bacteria first
observed at the turn of the century, growing at 89°C in the outflow
of hot springs in Yellowstone National Park. Observation of these macroscopic
assemblages would later be instrumental in the drive to culture hyperthermophilic
organisms.
Aquifex is able to grow on hydrogen, oxygen, carbon dioxide,
and simple mineral salts. The complex metabolic machinery necessary to
function as a hyperthermophilic chemolithoautotroph is encoded within a
1,551,335-bp genome only one-third the size of Escherichia coli;
this small size appears to limit metabolic flexibility. The use of oxygen
as an electron acceptor is enabled by the presence of a complex respiratory
apparatus. Although this organism grows at the upper thermal limit for
bacteria, only a few specific indications of thermophily are apparent
from the genome.
One of the most exciting results of sequence analysis is the lack of
coherence in the apparent phylogenies of different genes. It was widely
anticipated that, because of the small subunit ribosomal RNA gene's branching
position near the bacterial lineage's root, Aquifex gene analysis
would shed light on the phenotype of bacteria's last common ancestor, including
the bacterial domain's hypothesized thermophilic origin. However, protein-based
phylogenies in many cases do not support the original rRNA-based placement
and show no consistent picture of the organism's phylogeny. This result
has fundamental implications for our understanding of the mode of
bacterial evolution.
The sequencing strategy used to assemble the complete genome was based
on the whole-genome shotgun approach. Shotgun sequencing is characterized
by two phases: an initial, completely random phase in which most data are
collected, and a closure phase in which directed techniques are used to
close gaps and complete the assembly. By pursuing a strategy in which only
97% coverage was achieved initially, we were able to limit the number of
random-phase sequences to only 10,500. Sequence fragments were assembled
on an Apple Macintosh computer using Sequencher, a commercially available
assembly and editing program. Sequences were obtained from both ends of
clones randomly chosen from a fosmid library; using Sequencher, we assembled
these sequences with consensus sequences derived from the contigs of random-phase
sequences. Gaps between contigs were closed by direct sequencing on fosmids
not wholly contained within a contig. The final assembly comprises 13,785 sequences
with an average edited read length of 557 bp.
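One way to read the "97% coverage" target is through the Lander-Waterman relation f = 1 - e^(-c), which links the fraction f of the genome covered to fold coverage c. The sketch below applies it to the Aquifex figures. Both the use of this model and the assumption that random-phase reads averaged the same 557 bp as the final set are illustrative choices, not statements from the authors.

```python
import math


def coverage_for_fraction(fraction_covered: float) -> float:
    """Invert the Lander-Waterman relation f = 1 - e^(-c) to get fold coverage c."""
    if not 0.0 <= fraction_covered < 1.0:
        raise ValueError("fraction_covered must be in [0, 1)")
    return -math.log(1.0 - fraction_covered)


def fraction_for_coverage(coverage: float) -> float:
    """Expected fraction of the genome covered at fold coverage c."""
    return 1.0 - math.exp(-coverage)


GENOME_BP = 1_551_335   # A. aeolicus genome length
MEAN_READ_BP = 557      # average edited read length (final assembly)
RANDOM_READS = 10_500   # random-phase sequences
FINAL_READS = 13_785    # sequences in the final assembly

# Assumption: random-phase reads had the same mean length as the final set.
c_random = RANDOM_READS * MEAN_READ_BP / GENOME_BP
c_final = FINAL_READS * MEAN_READ_BP / GENOME_BP

print(f"coverage implied by 97% covered: {coverage_for_fraction(0.97):.2f}x")
print(f"random phase: {c_random:.2f}x -> {fraction_for_coverage(c_random):.1%} covered")
print(f"final assembly: {c_final:.2f}x")
```

Under these assumptions, about 3.5-fold random coverage would be expected to cover 97% of the genome, and the 10,500 random-phase reads correspond to roughly 3.8-fold. The final 13,785-read assembly works out to just under 5-fold.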
More than half of Aquifex's 1512 open reading frames were assigned
a putative function based on similarity to known sequences. The extreme
thermostability of Aquifex proteins, coupled with their bacterial
origins, makes them ideal candidates for overexpression, nuclear magnetic
resonance spectroscopy, and X-ray crystallographic studies. Consequently, large
numbers of researchers are pursuing structures of the thermostable Aquifex
proteins, and several heterologously expressed proteins are being evaluated
in commercial applications.
This is a completed project.
The Genome Sequence of the Hyperthermophilic Archaeon
Pyrococcus furiosus
Robert B. Weiss, Frank Robb,1 and James R. Brown2
Human Genetics Dept.; Eccles Institute of Human Genetics; 20 South
2030 East, Room 308 BPRB; University of Utah; Salt Lake City, UT 84112-5330
801/585-3435 or -5606, Fax: -7177, bob.weiss@genetics.utah.edu
or bob@watneys.med.utah.edu
1Center of Marine Biotechnology; University of Maryland
2Microbial Bioinformatics Group; SmithKline Beecham Pharmaceuticals
http://www-genetics.med.utah.edu/
Pyrococcus furiosus is the best-studied member of the unusual
class of organisms known as extreme hyperthermophiles, which live
at extremes of temperature and pressure. Isolated from geothermally heated
marine sediment in the shallow waters off Vulcano Island, Italy, P.
furiosus grows optimally at 100°C and derives its energy by fermentation
of protein, peptide, and sugar mixtures found in its geothermal environment.
The organism is fast growing and capable of dividing every 40 min.
Extreme hyperthermophiles play an important role in advancing the fundamental
understanding of protein biochemistry, RNA and DNA metabolism, and protein
interactions. How has a cell's macromolecular machinery adapted to function
at 100°C? Proteins from organisms living at moderate temperatures unfold
or denature when heated, but proteins from hyperthermophiles maintain their
three-dimensional shapes. The genome sequence provides a resource for beginning
to understand why this happens.
Extremely stable proteins have potential biotechnological uses as rugged
industrial catalysts. The diverse metabolism of P. furiosus provides
a wide variety of biocatalysts that are potentially useful as environmentally
safe reagents in transforming biomass to derive energy and specialty chemicals
and in degrading organic compounds for environmental detoxification.
The P. furiosus genome sequence was completed recently. Its circular
chromosome is 1,908,253 bp long with a G-C content of 40.8%. The sequencing
strategy tested a variant of whole-genome shotgun sequencing with a new
sequencing vector that allows the genome to be subcloned as larger pieces.
The genome was pieced together from fewer than 2500 subclones, compared
to the more typical number of 20,000. These medium-insert sequencing vectors
may help to assemble the larger human and mouse genomes.
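G-C content is simply the fraction of bases that are G or C. The minimal sketch below computes it over unambiguous bases only (ignoring N and other ambiguity codes, a choice made here for illustration). The toy fragment is hypothetical and is not P. furiosus sequence; the last line converts the reported 40.8% into an approximate count of G+C bases in the 1,908,253-bp chromosome.

```python
def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / acgt


# Hypothetical toy fragment, not real P. furiosus sequence.
toy = "ATGCGTAAATTCGA"
print(f"toy GC content: {gc_content(toy):.1%}")

# Approximate G+C base count implied by the reported figures.
GENOME_BP = 1_908_253
REPORTED_GC = 0.408
print(f"~{round(GENOME_BP * REPORTED_GC):,} G+C bases")
```

The reported figures imply roughly 778,600 G or C bases in the chromosome.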
Genome analysis and annotation are ongoing. Recently, the complete sequence
of the distantly related P. horikoshii, which was isolated from
a hydrothermal vent at a depth of 1,395 m in the Sea of Japan, was determined
by a group in Japan. P. furiosus
and P. horikoshii diverged over 100 million years ago, and
comparisons between them are providing unique insight into processes that
result in changes to genes and genomes by revealing complex gene rearrangements
and changes in gene content.
The sequence was completed in late November 1998, and the annotation
phase was completed early in 1999. The sequence is available for searching
and downloading from the Web.
Library construction, sequencing, assembly, and the production of finished
sequence were done at the University of Utah. Dr. Frank Robb's group provided
the organism and has assisted in the finishing and annotation stages. Dr. Brown's
group is assisting in the gene-finding and annotation stages of the project.
This is a completed project.