DOE Genomes

Milestone 2: Develop Methods and Concepts Needed to Achieve a Systems-Level Understanding of Microbial Cell and Community Function, Regulation, and Dynamics

Research Highlights for Milestone 2: Cell and Community Function, Regulation, and Dynamics

GTL Milestone 2 has Two Distinct Components:

Component A. Systems Analytical Measurements (Omics) of Microbes and Microbial Communities

Background and Science Needs

Of all the molecular components, the proteome is the most critical to measure comprehensively. The end result of genome transcription and expression, the proteome comprises the cell’s working parts. Understanding its dynamic nature calls for methods to accurately, sensitively, and temporally monitor the conditional state of any organism’s entire proteome, correlated to other cellular molecular species. This task will require greater completeness, resolution, and sensitivity than has been possible with conventional imaging and gel-based technologies. Providing a comprehensive view of proteome organization and dynamics promises to be a singularly important watershed of whole-genome biology for the coming decade because it will enable, inform, and enhance virtually all other molecular and cellular investigations. As a starting point for studying regulatory networks, cell pathways, and metabolic interactions in microbial communities, such a comprehensive view would provide basic understanding of how an entire cell and community work.

This information would complement that derived by using capabilities developed under Milestone 1. Progress already has been achieved in the development of technologies with the resolving power, dynamic range, and sensitivity to rapidly measure a cell’s proteome.

To ultimately develop a predictive understanding of these systems, the proteome must be analyzed in conjunction with the intracellular mix of RNAs, metabolites, and signaling molecules. Also requiring analysis is the extracellular and intercellular mix of environmental physicochemical variables; signaling molecules; metabolites and their metabolic intermediates (e.g., in syntrophy); genetic materials; and other microbial species and their genetic, phenotypical, and physiological makeup.

Community Structures and Processes: Science Needs

The core of systems biology is the ability to measure, in a coordinated way, all the cell’s responses and functions as referenced to the genome sequence. Microbes have many mechanisms to position themselves relative to their environment’s physicochemical variables and to each other to optimize microbial-community function. The dynamic and intimate nature of interactions in microbial communities is such a dominant phenomenology that community behavior, not just microbes acting alone, must be deciphered to develop a predictive understanding of microbial systems, even at the cellular level. These structured communities live in ocean water columns, on particulates or plant roots in soils, and on minerals in the deep subsurface of the earth and ocean. While initial analytical attempts necessarily will be global measurements of ensemble samples, the nature of interactions and behaviors in local niches will require the ability to make measurements that can spatially resolve (image) such variables in a single cell, within a community, and in a well-defined environment. A citation by the American Academy of Microbiology reads:

There is a need to “develop technology and analysis capability to study microbial communities and symbioses holistically, measuring system-wide expression patterns (mRNA and protein) and activity measurements at the level of populations and single cells.” (Stahl and Tiedje, Microbial Ecology and Genomics: A Crossroads of Opportunity, American Society for Microbiology, 2002)

Microbial communities essentially act as a multicellular organism, utilizing the function of individual components for the benefit of the whole, including functional flexibility and diversity. Microniches, in which microbes exhibit unique phenotypes, are formed within communities. In these communities, microbes find protection from the environment and communicate within and between populations, exchanging nutrients, regulatory and sensing molecules, metabolites, and genetic materials. They exhibit a wide variety of ecosystem interactions including syntrophy, commensalism, amensalism, predation, parasitism, mutualism, competition, and warfare. These complex functions and relationships can be analyzed only in a community context.

Success in achieving this milestone will set the stage for causally linking gene regulation, proteome composition, architecture, and dynamics with cellular and community function. The ultimate test for an accurate and useful understanding of causality in any system is the capacity to predict how the system will change when perturbed by new external or internal stimuli, in this instance including genetic changes. A long-term aim of GTL is to develop the theoretical infrastructure and knowledgebase for understanding the microbe and community at the proteome level, in multiprotein complexes and the pathways and structures they comprise, and in intermicrobe interactions and processes.

This understanding will require the coupling of increasingly sophisticated models with experimental tests of predictions from models. The following are specific milestone objectives under Component A:

Component B. Networks and Regulatory Processes

Background and Science Needs

Understanding gene regulatory networks is a prerequisite for redesigning the biological control systems required to solve a wide range of problems we can barely fathom today. Gene regulatory networks explicitly represent the causality of life systems. They explain exactly how genomic sequence encodes the regulation of expression of the large sets of genes that create the biological processes we observe, measure, and utilize to practical ends. It is at the system level of gene regulatory networks that we can address biological causality and provide a complete answer to why biological systems function as they do.

Regulatory processes govern which genes are expressed in a cell at any given time, the level of that expression, the resultant biochemical activities, and the cell’s responses to diverse environmental cues and intracellular signals. This most fundamental domain of life—genomic control systems—is now within reach of the biosciences. Flexible and responsive, these genomic control systems consist essentially of hardwired regulatory codes that specify the sets of genes that must be expressed in specific spatial and temporal patterns in response to internal or external inputs. In physical terms, the control systems consist of thousands of modular DNA sequences, which receive and integrate multiple regulatory inputs in the form of proteins. These proteins recognize and bind to them, resulting in transfer of specific transcriptional instructions to the protein-coding genes they direct. The most important of all classes of such regulatory modules are those that control the activity of genes encoding the DNA-recognizing regulatory proteins themselves. These genes, and the control sequences of the genes to which their protein products bind, can be treated literally as networks of functional regulatory linkages. Each such linkage joins a regulatory gene to its target DNA regulatory sequence modules. For microbial systems, GTL will encompass comprehensive mapping of all these regulatory processes, including the cytoplasmic regulation that operates following gene expression of the functioning networks.
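The idea of treating regulatory genes and their target control sequences as a network of linkages can be sketched as a small directed graph. This is only an illustrative sketch: the gene names (regX, regY, enzyme1, enzyme2), the linkage list, and both helper functions are hypothetical and do not come from any GTL dataset or tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical linkages: (regulatory gene, gene whose control module its protein binds).
LINKAGES = [
    ("regX", "regY"),
    ("regX", "enzyme1"),
    ("regY", "enzyme2"),
    ("regY", "regX"),
]


def build_network(linkages: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each regulatory gene to the set of genes it regulates."""
    network: Dict[str, Set[str]] = defaultdict(set)
    for regulator, target in linkages:
        network[regulator].add(target)
    return dict(network)


def regulator_to_regulator(network: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only linkages whose target is itself a regulatory gene."""
    regulators = set(network)
    return {
        reg: targets & regulators
        for reg, targets in network.items()
        if targets & regulators
    }


network = build_network(LINKAGES)
core = regulator_to_regulator(network)
```

Filtering for regulators that control other regulators isolates the class of modules the text singles out as most important; a real mapping effort would of course derive the linkages from experimental and comparative-genomic evidence rather than a hand-written list.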

The regulatory genome is a logic-processing system. Every regulatory module encoded in the genome (that is, every node of every gene regulatory network) receives multiple disparate inputs and processes them in ways that can be represented mathematically as combinations of logic functions (e.g., “and” functions, “switch” functions, and “or” functions). At the system level, a gene regulatory network consists of assemblages of these information-processing units. Thus it is essentially a network of analogue computational devices, the functions of which are conditional on their inputs.
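As a rough illustration of the logic-function view described above, the sketch below models each regulatory node as a Boolean rule over its inputs. It rests on strong assumptions: real regulatory modules behave as analogue devices, so on/off states are a deliberate simplification, and the gene names, input names, and rules are hypothetical.

```python
from typing import Callable, Dict

State = Dict[str, bool]

# Hypothetical three-gene network; each rule combines a node's inputs.
RULES: Dict[str, Callable[[State], bool]] = {
    # "switch" function: geneA simply follows an external signal.
    "geneA": lambda s: s["signal"],
    # "and" function: geneB needs geneA on and the repressor absent.
    "geneB": lambda s: s["geneA"] and not s["repressor"],
    # "or" function: geneC turns on if either upstream gene is on.
    "geneC": lambda s: s["geneA"] or s["geneB"],
}


def step(state: State) -> State:
    """Update every regulated gene at once from the current state."""
    new_state = dict(state)
    for gene, rule in RULES.items():
        new_state[gene] = rule(state)
    return new_state


def run(state: State, n_steps: int) -> State:
    """Apply the synchronous update n_steps times."""
    for _ in range(n_steps):
        state = step(state)
    return state


start = {"signal": True, "repressor": False,
         "geneA": False, "geneB": False, "geneC": False}
final = run(start, 2)
```

Changing an input (for example, setting `repressor` to `True`) and rerunning shows how the output of each node is conditional on its inputs, which is the sense in which the network processes information.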

Major objectives for this milestone are to develop methods to discover the architecture, dynamics, and function of regulation; make useful computational models; and learn how to adapt and design them. To redesign these most potent of all biological control systems to produce desired functions, first we must be able to insert regulatory subcircuits—far beyond any simple gene insertions—into the target biology; second, we must understand the flow of causality in a genomically encoded gene regulatory network to design an effective means of altering it.

Gaining a comprehensive view of the architecture of microbial regulatory networks will not necessarily reveal how such networks really work, nor will it provide a solid basis for employing or modifying them in useful ways or designing new ones. Mastering the complexities of regulatory switches, oscillators, and more complex functions will require a predictive theoretical framework and computational horsepower, coupled with experimental resources to test and validate models. To meet this challenge, GTL will seek to nurture and accelerate emerging capabilities that include new concepts combined with relevant ideas from engineering, applied mathematics, and other disciplines.

Within this network-discovery portion of the milestone, one activity is to map related networks at multiple nodes across phylogeny based on comparison of genome sequences. Knowledge of comparative network structure and function is likely to produce insights into fundamental issues in biology, in addition to providing essential information for GTL’s later phases. Initial tasks will be to identify and map core regulatory network components (e.g., regulons, operons, and sRNAs). Integral to this effort is the task of relating the regulatory apparatus to the groups of target genes they regulate and to whatever is known about the function of those target genes.

To map regulatory networks, several core technologies and approaches will be needed. Pilot studies will further define the best approach to use in genomes of varying sizes and structures. One such promising strategy is to use comparative genomics to initiate large-scale component identification, focusing on candidate regulatory sequences and their interacting regulatory proteins. Results from comparative sequence analysis would then be integrated with data from such other key technologies as large-scale gene-expression analysis, comprehensive loss-of-function and gain-of-function genetic analyses, and measures of in vivo protein-DNA interactions and proteome status, among others.

Other critical elements in network mapping will come from, for example, proteomic and metabolomic activities encompassed by Component A of this milestone or by specific adaptation of those technologies to regulatory network components. These elements include learning the composition of multiprotein complexes that assemble on DNA to regulate gene expression; learning the composition and regulatory actions of protein machinery that govern post-transcriptional and post-translational regulation; and determining subcellular localization of regulatory proteins and how localization changes as a function of circuit dynamics.

Vigorous application of a comprehensive genome-wide approach to network mapping in selected microbes has the potential to yield the first complete dissection of the regulatory networks that run a living cell. Regulatory networks in microbes employ many mechanisms distinct from both transcription and translation. Examples include active control of protein turnover, dynamic localization of regulatory and structural proteins, cell membrane processes, and complex phosphotransfer pathways. Studying nontranscriptional systems, therefore, is critical for fully understanding regulatory mechanisms. The following are specific milestone objectives under Component B:

Computation Needs

Computational capabilities must be developed for the following: