Subsystem Navigation Tutorial
Part I. Explore a Subsystem page in NMPDR - the basics: identify the functional Subsystem(s) in NMPDR to which your protein of interest may belong, and explore a Subsystem page in light browsing mode. Learn the basic tools of subsystem visualization and analysis.
Part II. Explore a Subsystem page in SEED - advanced: navigate a Subsystem page in advanced browsing mode. Learn about additional tools available in SEED (not yet in NMPDR).
There are several ways to find the Subsystem(s) relevant to a query protein:
(i) If a query protein has been associated with one (or more) Subsystem(s), this will be indicated on the corresponding PEG page by:
-- a numerical entry (1, 2, etc) in the SS column of the Context table indicating the number of different Subsystems this PEG is connected to;
-- a link in the table "Subsystems in Which This Protein Plays a Role" under the Context table.
Note that activating this link opens the Subsystem page (i) in a simplified light browsing mode and (ii) with the Subsystem spreadsheet limited to a small number of genomes in the immediate phylogenetic neighborhood of the organism whose protein you started from. To display all genomes connected to this Subsystem, highlight Show all in the drop-down menu and click the show spreadsheet button (located under the SS spreadsheet).
(ii) If a query protein is not associated with any Subsystem, check whether any close homologs of your protein have been included in a Subsystem. Open the Bidirectional Best Hits table. Homologs associated with Subsystem(s) will have a numerical entry (1, 2, etc.) in the column In Sub. Go to the respective PEG page (by clicking on its ID) and then follow the Subsystem link as described in (i) above.
(iii) Finally, one can browse the list of Subsystems in NMPDR. Reach the list by clicking the Subsystems button on the main NMPDR search page.
(i) Browse the Subsystem (SS) page. It opens with the Subsystem spreadsheet, a table in which each column represents a functional role in the Subsystem, each row represents a specific genome, and cells are populated with the genes/proteins that implement specific functional roles in each organism. Protein IDs in the cells are linked to the corresponding PEG pages.
(ii) By default, cells within the same row (genome) that contain genes co-localized (“clustered”) on a chromosome are highlighted by a matching color. Note the conserved gene clusters present in a large number of diverse organisms.
(iii) Table of Functional Roles constituting this SS. The roles are defined by the most general descriptive names and include the corresponding Enzyme Commission (EC) numbers whenever available. Note that role names serve as connectors associating PEGs with Subsystems, and must exactly match gene annotations in the underlying database. Abbreviations of functional roles are used in the Subsystem spreadsheet and SS diagrams.
(iv) Subsets of Roles table. The concept of subsets plays an important role in subsystem encoding and interpretation. Subsets usually represent compact units, such as multi-subunit complexes or variants of pathways. A star (*) in front of a subset's abbreviated name causes all the functional roles grouped in it to collapse into a single column in the Subsystem spreadsheet, a useful feature for displaying synonymous functional roles or subunits of multi-subunit complexes.
(v) A small set of tools, "Spreadsheet Options", is located immediately under the Subsystem spreadsheet. It lets you limit or expand the spreadsheet display to a selected group of organisms. Try using the following:
-- limit/expand the number of organisms displayed in the SS spreadsheet. To display all genomes connected to this Subsystem, highlight All in the drop-down menu and click the show spreadsheet button. Use the same menu to limit the display to a specific phylogenetic group.
-- Show clusters - Check/uncheck the box near Show clusters and press show spreadsheet button. The cells in the same row (genome) highlighted by a matching color contain genes that are located in the immediate vicinity of each other (“clustered”) on a chromosome.
-- Show -1 variants - variant code -1 is assigned to all organisms lacking a functional variant of a pathway. By default such organisms are not displayed in the SS spreadsheet. To view these genomes, check the Show -1 variants box and press the show spreadsheet button under the spreadsheet. Variant codes capture meaningful differences in SS implementation across genomes. When defining a subsystem, annotators include a collection of functional roles broad enough to cover distinct variations in all relevant organisms. Each subset of functional roles that exists in at least one organism with an operational version of the Subsystem constitutes a functional variant. Variant codes allow formal cataloging of variations in pathway implementation across the microbial kingdom, as well as accurate semi-automatic propagation of gene annotations.
(vi) Notes section at the bottom of each SS page contains annotator’s comments, lists open problems identified during SS encoding and analysis, and - most importantly - explains Variant codes identified in the SS.
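The spreadsheet model described in the steps above (rows are genomes, columns are functional roles, starred subsets collapse into one column, and variant -1 rows are hidden by default) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NMPDR or SEED code: the genome names, role abbreviations, PEG IDs, and the helper functions collapse_subsets and visible_rows are all hypothetical.

```python
# Hypothetical toy data: genome names, role abbreviations and PEG IDs are
# invented for illustration and do not come from NMPDR or SEED.
spreadsheet = {
    "Genome A": {"variant": "1",
                 "cells": {"RoleX": ["peg.10"], "RoleY1": ["peg.11"], "RoleY2": []}},
    "Genome B": {"variant": "2",
                 "cells": {"RoleX": ["peg.7"], "RoleY1": [], "RoleY2": ["peg.42"]}},
    "Genome C": {"variant": "-1",
                 "cells": {"RoleX": [], "RoleY1": [], "RoleY2": []}},
}

# A starred subset groups alternative roles that collapse into one column.
starred_subsets = {"*RoleY": ["RoleY1", "RoleY2"]}


def collapse_subsets(cells, subsets):
    """Merge the roles of each starred subset into a single composite column."""
    grouped = {role for roles in subsets.values() for role in roles}
    # Keep roles that are not part of any starred subset unchanged.
    collapsed = {role: pegs for role, pegs in cells.items() if role not in grouped}
    # Pool the PEGs of all member roles under the subset name.
    for name, roles in subsets.items():
        collapsed[name] = [peg for role in roles for peg in cells.get(role, [])]
    return collapsed


def visible_rows(sheet, show_minus_one=False):
    """Return the genomes to display; variant -1 rows are hidden by default."""
    return [genome for genome, row in sheet.items()
            if show_minus_one or row["variant"] != "-1"]
```

With this toy data, visible_rows(spreadsheet) lists only Genome A and Genome B, while passing show_minus_one=True also includes Genome C, mirroring the Show -1 variants checkbox.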
You are on the SEED main (or Search) page now. Please familiarize yourself with it. You can return to this page from any place in SEED by clicking the FIG search link located at the top left corner of every page.
You don't need to authenticate yourself to simply browse the database. A username is required only to annotate genes or encode new Subsystems in SEED (not covered by this Tutorial).
To open the same Subsystem in the unabbreviated advanced browsing mode, enter it from the SEED main search page by clicking the Work on Subsystems button. From here you can browse the list of Subsystems or search for a relevant term (use your browser’s Find in this page functionality). Click on the SS name to enter the corresponding SS page.
(i) Browse the Subsystem page. Its organization is similar to that in NMPDR, but the relative positions of the Functional Roles and Subsystem spreadsheet tables are switched. In SEED the SS page opens with the table of Functional Roles.
(ii) The Subsets of Roles table is located immediately under it. Importantly, a menu further below this table allows one to limit the spreadsheet display to a selected subset of functional roles (columns).
(iii) A list of taxonomical groups further below can be used to limit spreadsheet display to a selected group of organisms (rows). Try using the following:
-- limit the number of columns (roles) to those encoding the 3 terminal steps of Heme biosynthesis: highlight subset Terminal_steps_heme
-- limit the number of rows (organisms) displayed to Firmicutes (or any other taxonomical group of your choice)
Click Show spreadsheet button under the Spreadsheet to activate your choices.
(iv) The Subsystem diagram (if available) can be accessed via a link above the SS spreadsheet. Try opening one. On these graphic maps functional roles are shown by abbreviations in boxes (hover the mouse over a role abbreviation to display the full role name). Key metabolites and intermediates are shown by Roman numerals in circles (linked to the KEGG Compound database). Diagrams can be highlighted to show the presence/absence of genes implementing each functional role in a specific organism. Select a genome from the drop-down menu and press the Color diagram button.
(v) The main Subsystem visualization/construction tools are located below the SS spreadsheet:
-- sorting -- Select the Alphabetical option and press the show spreadsheet button below. The organisms in the SS spreadsheet will be rearranged accordingly. Selecting Patterns arranges organisms according to the presence/absence of PEGs in the cells of the spreadsheet, a useful tool for analyzing variations in SS implementation in different organisms. Limiting the SS display to different subsets of roles further expands sorting capabilities.
-- ignore alternatives - if activated, causes composite columns (corresponding to subsets prefixed with a star (*)) to open, displaying each functional role in a separate column. After limiting the SS spreadsheet to the subset Terminal_steps_heme, try activating this option and pressing the show spreadsheet button below.
-- color rows by each organism’s attribute - choose an attribute from the drop-down menu (e.g. motile or oxygen) and click show spreadsheet. Make sure the Show clusters checkbox is NOT activated; otherwise the system will attempt to color cells by two parameters at once, producing a meaningless display of colors. The entire row corresponding to an organism will be colored to show the selected attribute. A legend explaining color usage will also appear.
-- color columns by each PEG’s attribute - choose an attribute from the drop-down menu, e.g. Essential_Gene_Sets_Bacterial, and click the show spreadsheet button. Gene essentiality as determined in genome-wide experimental screens in ten microorganisms will be displayed (for details follow the blue Essentiality Data link available at the top of every page in SEED).
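The idea behind the Patterns sorting option above can be illustrated with a small Python sketch. It is an assumption-laden toy, not SEED's implementation: the tutorial does not specify the exact ordering SEED uses, so this version simply places fuller rows first and keeps genomes with identical presence/absence patterns next to each other. All genome names, role names, and PEG IDs are hypothetical.

```python
def presence_pattern(cells, roles):
    """Tuple of 1/0 flags: does the genome have at least one PEG for each role?"""
    return tuple(1 if cells.get(role) else 0 for role in roles)


def sort_by_pattern(sheet, roles):
    """Order genomes so that identical presence/absence patterns are adjacent.

    Flags are negated so that rows with a PEG in the leading columns sort
    first; ties are broken alphabetically by genome name (an assumption).
    """
    def key(genome):
        pattern = presence_pattern(sheet[genome], roles)
        return tuple(-flag for flag in pattern), genome

    return sorted(sheet, key=key)


# Hypothetical toy spreadsheet: genome -> role -> list of PEG IDs.
roles = ["R1", "R2", "R3"]
sheet = {
    "Genome A": {"R1": [], "R2": ["peg.4"], "R3": []},
    "Genome B": {"R1": ["peg.2"], "R2": [], "R3": ["peg.9"]},
    "Genome C": {"R1": ["peg.5"], "R2": ["peg.6"], "R3": ["peg.7"]},
    "Genome D": {"R1": ["peg.1"], "R2": [], "R3": ["peg.3"]},
}
ordered = sort_by_pattern(sheet, roles)
```

Here Genome B and Genome D share the pattern (1, 0, 1) and end up adjacent, which is what makes variations in SS implementation easier to spot by eye.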
(vi) Finding candidate genes for "missing" functional roles:
-- find an empty cell in SS spreadsheet that you'd like to fill (= you have reasons to believe that a particular protein must be present in a particular organism, yet the corresponding cell is empty)
-- locate show missing with matches tool under SS spreadsheet
-- activate check-box near this tool
-- use To restrict to a single genome window to select the organism from a drop-down menu (required)
-- use To restrict to a single role window to select a role from a drop-down menu (optional)
-- click show spreadsheet button
This tool activates an automatic search for candidate genes in an attempt to fill the specified empty cell (or ALL empty cells in the spreadsheet row corresponding to the selected genome). The search is homology-based: it scans the selected genome using all proteins in the corresponding SS column as queries. A Missing Entries table with candidate genes will be generated at the bottom of your SS page. The tool is self-learning in the sense that only proteins already associated with the SS spreadsheet are used as queries.
-- Examine each candidate by opening the corresponding PEG page (follow the links in the PEG column of the Missing Entries table). It is convenient to open each PEG page in its own tab or window so as not to lose the SS page with the search results. Analyze close homologs of the candidate genes, their genome context, and annotations of the candidate genes in other databases. Consider functional context as well: using the SS diagram, rationalize the addition of each candidate functional role in the context of heme biosynthesis in your specific genome.
-- Candidates that you find acceptable could then be re-annotated so that the protein annotations exactly match the functional roles as they appear in the SS. Such candidate PEGs will be automatically connected to your SS spreadsheet. However, editing capabilities in SEED are a separate topic and are beyond the scope of this tutorial.
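The logic of the show missing with matches search described above (every protein already in a spreadsheet column is used as a query against the selected genome) can be sketched as follows. This is a hedged illustration only: the similarity function uses Python's difflib string matching as a toy stand-in for a real homology search, the min_score threshold is arbitrary, and the function names, sequences, and IDs are all hypothetical rather than part of SEED.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Toy stand-in for a homology score; a real search would use an alignment tool."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()


def find_candidates(column_queries, genome_proteins, min_score=0.8):
    """Scan one genome with every protein already in the spreadsheet column.

    Returns (peg_id, (score, best_query_id)) pairs, best score first.
    """
    best = {}
    for query_id, query_seq in column_queries.items():
        for peg_id, peg_seq in genome_proteins.items():
            score = similarity(query_seq, peg_seq)
            # Keep only the best-scoring query for each candidate PEG.
            if score >= min_score and score > best.get(peg_id, (0.0, None))[0]:
                best[peg_id] = (score, query_id)
    return sorted(best.items(), key=lambda item: -item[1][0])


# Hypothetical sequences: queries come from the SS column, targets from the genome.
column_queries = {"q1": "MKTAYIAKQR", "q2": "MKTAYLAKQR"}
genome_proteins = {"peg.1": "MKTAYIAKQR", "peg.2": "GGGGSSSSPP"}
candidates = find_candidates(column_queries, genome_proteins)
```

In this toy run only peg.1 passes the threshold, and it would be the single row of a Missing Entries table; each such candidate still needs the manual examination described in the steps above.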
NMPDR | National Microbial Pathogen Data Resource, one of the 8 National Bioinformatics Resource Centers (BRCs) |
SEED | Genome database and collection of tools for genome annotation and comparative analysis, underlying the NMPDR viewer |
FIG | Fellowship for Interpretation of Genomes - original developers of SEED |
fid | FIG ID |
PEG | Protein-encoding gene |
Subsystem (SS) | collection of functional roles (enzymes or structural subunits, regulatory proteins, transporters, etc) that together implement a biological process or structural complex across a large collection of genomes |
Functional Role | abstraction of a specific biological function as retained within a group of homologs. Similar to the notion of a protein family, but with the added requirement of common biological function |
SS spreadsheet, Populated SS | a spreadsheet in which each column represents a functional role in the SS, each row represents a genome, and each cell contains the gene(s) that implement this function in this organism. By default, genes that are clustered on the chromosome share the same background color in the spreadsheet |
Functional Variants | Different combinations of functional roles that represent distinct operational forms of a subsystem in different organisms |
Variant Codes | Numeric codes used to distinguish different functional variants |
Functional Coupling (FC) | a tendency of functionally related genes to be co-localized on prokaryotic chromosomes |
fc-sc | Functional coupling score, which takes into account the number of genomes in which the two genes are neighbors, and the phylogenetic distance between the genomes. The score is approximately equal to the number of different species (not strains) in which the two genes are co-localized |
Ev | Evidence codes |
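As a rough illustration of the fc-sc entry in the glossary above, the approximation "number of different species (not strains) in which the two genes are co-localized" can be written as a species count. This sketch deliberately ignores the phylogenetic distance weighting mentioned in the definition, and the function name, genome names, and species labels are hypothetical.

```python
def approx_fc_score(neighbor_genomes, species_of):
    """Count distinct species among the genomes where the two genes are neighbors.

    This only mirrors the glossary's approximation; it does not weight
    genomes by phylogenetic distance.
    """
    return len({species_of[genome] for genome in neighbor_genomes})


# Hypothetical mapping of genomes (strains) to species.
species_of = {
    "Genome A1": "Species A",
    "Genome A2": "Species A",
    "Genome B1": "Species B",
}

# Genomes in which the two genes of interest are chromosomal neighbors.
neighbor_genomes = ["Genome A1", "Genome A2", "Genome B1"]
score = approx_fc_score(neighbor_genomes, species_of)  # two strains of Species A count once
```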