DOE Genomes

Milestone 1: Develop Techniques to Determine the Genome Structure and Functional Potential of Microbes, Plants, and Microbial Communities

Research Highlights for Milestone 1: Sequences, Proteins, Molecular Complexes

GTL Milestone 1 Has Two Distinct Components:

Component A. Microbial Sequences and Protein Characteristics

Background and Science Needs

Proteins are the chemically and physically active products of virtually all genes. Highly dynamic and shifting in amount, modification state, higher-order association, and subcellular localization, proteins carry out the primary functions of a cell in response to intracellular and extracellular signals.

For a systems understanding of microbes, we first must understand the panoply of proteins the genome is capable of producing. GTL’s first challenge in studying mission-relevant microbes and microbial communities is to determine the system’s genetic makeup and the extent and patterns of genetic diversity. This is especially true when the functions of many identified coding genes are unknown, when microbes are unculturable, or when only gene sequence is in hand (e.g., in metagenomic experiments, which determine the genetic sequence of a whole community of microbes).

Genes of unknown function are the first target. With a mature database of thousands of microbes available within a decade, comparative genomics, phylogenetic analysis, and sophisticated computational annotation will provide an increasingly complete set of gene functional assignments. In the interim, and to reach that end state, we must be able to perform functional annotations based on information from proteins that are produced from sequence and analyzed biophysically and biochemically in vitro. GTL’s ultimate goal, however, goes beyond simple assignment to achieving a mechanistic structural and functional understanding of proteins and molecular machines that can form the basis for comprehensive and predictive systems models.

The availability of gene sequence and proteins allows the generation of various affinity reagents. Development of affinity methods and reagents from produced proteins will open the door to identifying and tracking microbes and specific proteins in complex and dynamic microbial systems. Affinity reagents also can be used to manipulate (activate or inactivate) proteins, capture and track them, and determine their relative locations through a variety of sensitive analytical methods for understanding and visualizing protein structure, function, and behavior. Specific milestone objectives are set forth below.

Computation Needs

Computational challenges in characterizing the composition and functional capability of microorganisms range from “simple” data management to complex data analysis, integration, and use. New algorithms for DNA sequence assembly, as well as better use of current state-of-the-art methods and annotation, will be required to analyze multiorganism sequence data; new modeling methods will be needed to predict the behavior of microbial communities. Computational research must develop methods to

Component B. Molecular Complexes

Background and Science Needs

Most proteins do not act alone but instead are organized into molecular complexes (machines) that carry out activities needed for metabolism, communication, growth, and structure. GTL’s first milestone includes the creation of capabilities for comprehensively identifying, characterizing, and beginning to understand multiprotein complexes. These studies will help build the essential knowledgebase, and the stage will be set for linking proteome dynamics and architecture to cellular and community functions.

Identifying and characterizing multiprotein complexes on a genome-wide scale will require new tools and research strategies designed to increase throughput, reliability, accuracy, and sensitivity. While RNA measurements, such as microarrays, can give us a notion of which machines might form, the importance of understanding post-transcriptional and post-translational regulation requires direct knowledge of proteins and their interactions. Also, new tools for characterizing these complexes must bridge current size and resolution gaps between the high-resolution technologies for studying single proteins and those suitable for very large protein assemblies and cellular ultrastructures that are more amenable to just-emerging nanoscale structural techniques.

An initial target for GTL is to develop a suite of methods to isolate, identify, and characterize all essential protein complexes in a microbial system. Currently, only a few of the most stable and common protein complexes are well characterized, but data suggest that hundreds, if not thousands, of other complexes operate together to carry out cellular functions. Many important associations may be less stable, less abundant, and more dynamic. The near-term challenge is to develop methods to analyze these more difficult associations. The most demanding protocols can be supported in a comprehensive way only with a technically and scientifically robust infrastructure. Providing the necessary infrastructure and scaling up these capabilities in research centers will enable scientists to rapidly generate a draft protein-machinery map of a typical microbe of interest to DOE.

An important aspect of understanding the assembly, stability, and function of protein complexes is the high-throughput characterization of protein-protein proximity and interfaces within complexes and between interacting complexes. When coupled with other information about structure and interrelationships among proteins, this characterization will provide a comprehensive database for understanding spatial and temporal hierarchies in the assembly of protein complexes. Ultimately, this analysis will reveal the internal, transmembrane, and extracellular structure of cells and bring understanding of how assembly and disassembly of these complexes are organized and controlled. Data on coincident expression and cellular or subcellular localization can powerfully constrain possible functions for a given multiprotein complex. By coupling localization and colocalization information with genetic and biochemical data from diverse sources, scientists can postulate and then test the contributions of specific complexes to a cell’s survival and behavior. High-throughput implementation of new and existing technologies will be needed to achieve these goals.
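As a rough illustration of how coincident expression and localization data can constrain candidate complexes, the following minimal sketch keeps only protein pairs whose expression profiles are strongly correlated and that share a cellular compartment. The protein names, expression values, compartment labels, and the 0.9 correlation threshold are all hypothetical, chosen only for illustration; this is not a method described by the GTL program.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: expression levels measured across five conditions.
expression = {
    "protA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "protB": [1.1, 2.1, 2.9, 4.2, 5.1],
    "protC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "protD": [1.0, 2.2, 3.1, 3.9, 5.2],
}

# Hypothetical localization calls for the same proteins.
localization = {
    "protA": "membrane",
    "protB": "membrane",
    "protC": "membrane",
    "protD": "cytoplasm",
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def candidate_pairs(expression, localization, min_corr=0.9):
    """Return (protein, protein, correlation) for pairs that are both
    co-expressed (correlation >= min_corr) and colocalized."""
    pairs = []
    for a, b in combinations(sorted(expression), 2):
        # Colocalization filter: skip pairs in different compartments.
        if localization[a] != localization[b]:
            continue
        r = pearson(expression[a], expression[b])
        # Co-expression filter: keep only strongly correlated profiles.
        if r >= min_corr:
            pairs.append((a, b, r))
    return pairs


pairs = candidate_pairs(expression, localization)
for a, b, r in pairs:
    print(f"{a} - {b}: r = {r:.3f}")
```

In this toy data set only protA and protB survive both filters: protC shares their compartment but is anti-correlated, and protD is co-expressed but localized elsewhere. Surviving pairs are hypotheses to be tested with genetic and biochemical data, not confirmed interactions.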

Molecular Complexes.

Develop Capabilities for a Predictive Understanding of Protein Interactions and the Resulting Structure and Properties of Molecular Complexes

Computation Needs