MiST Information


Here you will find a brief description of the MiST database and how to use it to its full extent. The topics are covered in the sections below.





Starting with MiST

Generally, exploring MiST first involves selecting one or more bacteria to compare or navigating to an organism of interest. There are three major entry points into exploring MiST from the homepage: the taxonomy selector, organism list, and querying.


  1. Taxonomy selector

    This taxonomically organized approach to finding organisms of interest is best suited for performing queries across multiple organisms. The taxonomy-based selector organizes and displays a list of organisms hierarchically according to their taxonomy. The number of organisms associated with each taxonomic designation is given in parentheses beside its name. Clicking the 'Show/Hide' link beside a taxonomic designation reveals the organisms belonging to that group.


    The taxonomy level represents the depth to which organisms are grouped taxonomically. The current taxonomy level defaults to phyla and is displayed in a green font. You may view the taxonomic tree at a different level by clicking the desired taxonomic level (e.g. class or order).


    There are two ways to select organisms. The most intuitive method involves displaying the organisms beneath a taxonomic designation by clicking its 'Show/Hide' link and then clicking the organisms of interest. Once an organism is selected, a check appears in the checkbox beside its name.


    The second method selects or deselects all organisms belonging to a particular taxonomic designation at once. Clicking the small gray button beside a taxonomic designation selects (or deselects) all the organisms belonging to that group. For example, to select all Cyanobacteria, make sure the taxonomy level is set to phyla and then click the button beside Cyanobacteria. Clicking the 'Show/Hide' link beside Cyanobacteria should then reveal that every cyanobacterial species has a check in the checkbox beside its name.


    After selecting all the organisms of interest to compare, click the button labeled 'Select Organisms' to continue the analysis. For more information about comparing the selected organisms, see the querying multiple organisms section.
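The grouping and group-selection behavior described above can be sketched in a few lines of Python. This is only an illustration and not MiST's actual implementation: the organism records, the `group_by_level` and `toggle_group` helpers, and the exact toggle rule (deselect only when the whole group is already selected) are assumptions made for the example.

```python
from collections import defaultdict

# Illustrative records only: (organism name, lineage keyed by taxonomy level).
ORGANISMS = [
    ("Synechocystis sp. PCC 6803", {"phylum": "Cyanobacteria"}),
    ("Nostoc sp. PCC 7120", {"phylum": "Cyanobacteria"}),
    ("Escherichia coli K-12", {"phylum": "Proteobacteria", "class": "Gammaproteobacteria"}),
    ("Caulobacter crescentus CB15", {"phylum": "Proteobacteria", "class": "Alphaproteobacteria"}),
    ("Bacillus subtilis 168", {"phylum": "Firmicutes", "class": "Bacilli"}),
]


def group_by_level(organisms, level):
    """Group organism names under their designation at the given taxonomy level."""
    groups = defaultdict(list)
    for name, lineage in organisms:
        # Organisms lacking a designation at this level share one placeholder group.
        groups[lineage.get(level, "(unassigned)")].append(name)
    return dict(groups)


def toggle_group(selected, groups, designation):
    """Mimic the gray button (assumed rule): deselect the group if every
    member is already selected, otherwise select all of its members."""
    members = set(groups[designation])
    return selected - members if members <= selected else selected | members


groups = group_by_level(ORGANISMS, "phylum")
for designation in sorted(groups):
    # The count in parentheses mirrors the number shown beside each designation.
    print(f"{designation} ({len(groups[designation])})")

selected = toggle_group(set(), groups, "Cyanobacteria")
print(sorted(selected))
```

Calling `group_by_level(ORGANISMS, "class")` instead regroups the same organisms at a different taxonomy level, which corresponds to clicking another level in the selector.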


  2. Organism list

    The organism list displays the complete list of bacterial and archaeal genomes contained in the MiST database along with some basic information about each (e.g. GC content, number of genes/proteins, etc.). The list is sorted alphanumerically by organism name; however, clicking a column header label (displayed in reddish-brown) sorts the list by that column. Click the name of an organism to view details about its signal transduction capabilities. To learn more about the details shown for the selected organism, see the view organism section.


  3. Querying MiST from the homepage

    There are two ways to query MiST from the homepage: 1) by organism name, or 2) by GenBank GI (GenInfo Identifier) number.


    To query by organism name, make sure 'Organism' is selected, type part (or all) of the organism name in the adjacent text box, and push the search button. Any organisms that match the given text are displayed with links to their signal transduction capabilities (see the view organism section). Keywords separated by spaces are interpreted as separate queries, allowing the user to search for multiple organisms simultaneously.
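As a rough sketch of the matching rule just described, the following Python splits the query on spaces and runs each keyword as its own partial-name search. The function name `search_organisms`, the sample organism names, and the case-insensitive substring comparison are assumptions for illustration; the actual matching performed by MiST may differ.

```python
def search_organisms(query, organism_names):
    """Return, for each space-separated keyword, the organism names containing it."""
    results = {}
    for keyword in query.split():
        needle = keyword.lower()
        # Partial match: the keyword may appear anywhere in the organism name.
        results[keyword] = [name for name in organism_names if needle in name.lower()]
    return results


# Sample names for illustration only.
names = ["Escherichia coli K-12", "Bacillus subtilis 168", "Bacillus anthracis Ames"]
hits = search_organisms("bacillus coli", names)
for keyword, matches in hits.items():
    print(keyword, "->", matches)
```

One consequence of treating keywords independently is that a two-word name such as "Escherichia coli" behaves as two separate queries, one for each word.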


    To search for proteins in MiST that match a given GI number, make sure 'GI number' is selected, type one or more GI numbers separated by spaces, and push the search button. If a protein within MiST matches the given GI number, a link to this protein is displayed along with its associated organism. Frequently, the given GI number will not have a direct match in the MiST database, because many GI numbers can identify the same sequence. When this happens, MiST attempts to find proteins that have a sequence identical to the one represented by the query GI number. Clicking the link to a particular protein displays the View Protein page with various information about that protein. For more information about viewing proteins, see the view protein section.
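The two-step GI lookup (direct match first, then identical-sequence fallback) might look like the sketch below. Everything here is hypothetical: the GI numbers, sequences, and organism labels are made up, and the two dictionaries merely stand in for the MiST protein table and for whatever source supplies the sequence behind an unknown GI number.

```python
# Hypothetical stand-in for proteins stored in MiST, keyed by GI number.
MIST_PROTEINS = {
    1001: {"organism": "Organism A", "sequence": "MKTAYIAKQR"},
    1002: {"organism": "Organism B", "sequence": "MSDNELLKRV"},
}

# Hypothetical stand-in for sequences of GI numbers that are not in MiST.
EXTERNAL_SEQUENCES = {2001: "MKTAYIAKQR", 2002: "MAAAAAAAAA"}


def find_by_gi(gi):
    """Return (MiST GI, organism, match type) tuples for a query GI number."""
    if gi in MIST_PROTEINS:
        return [(gi, MIST_PROTEINS[gi]["organism"], "direct")]
    # Fallback: look for MiST proteins whose sequence is identical to the query's.
    sequence = EXTERNAL_SEQUENCES.get(gi)
    if sequence is None:
        return []
    return [
        (mist_gi, record["organism"], "identical sequence")
        for mist_gi, record in MIST_PROTEINS.items()
        if record["sequence"] == sequence
    ]


def search_gis(query):
    """Look up every space-separated GI number in the query."""
    return {int(token): find_by_gi(int(token)) for token in query.split()}


print(search_gis("1002 2001 2002"))
```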




Querying multiple organisms

MiST makes it possible to query multiple organisms for proteins matching one or more domains, a description, locus, GI number, or internal MiST identifier. First, select two or more organisms via the taxonomy selector. The next page, the Select Analysis page, displays the selected organisms in a tree-like list according to their taxonomy. Next, select the type of search to perform (e.g. domain or description), fill in the search terms of interest, and push the search button. The query will be executed against each selected organism.


Please note the following:


The GI number, description, and domain architecture of each matching protein are displayed beneath its associated organism. To view more detailed information about a particular protein, click its GI number, which takes you to the View Protein page for that protein. The domain name, start, stop, score, E-value, and significance of all predicted domains, including overlapping or insignificant domains, can be displayed by clicking the appropriate 'Details' link. Clicking the 'Details' link again hides this information.


To carry out another search against this same organism set, use your browser's back button to return to the Select Analysis page and simply input your new query text.




View organism page

The View Organism page provides a general overview of an organism's genome and its signal transduction network, along with the ability to search this organism for particular keywords. There are four primary sections: 1) the genome summary, 2) the signal transduction profile, 3) querying, and 4) the table of signal transduction proteins by replicon.


  1. Genome summary

    This section, displayed in the upper left, contains basic information about the organism's genome and taxonomic classification. Just beneath the taxonomic information is a table summarizing the predicted two-component and one-component systems. The number of two-component systems is based on the number of predicted response regulators, as these imply a particular output response. This number is an automatically determined estimate and should not be taken as exact. Rules for distinguishing phosphorelays or other more complex response regulator types (hybrid histidine kinases) have not been implemented in this calculation. Thus, for best results, determine the number of two-component systems by manually inspecting all the predicted two-component proteins.


  2. Signal transduction profile

    This graph provides a qualitative overview of the different types of input and output characteristics of this organism and its signaling machinery (e.g. response regulators). Note that the graph is based on domain counts rather than protein counts, so a number in the graph will not necessarily equal the number of proteins containing that domain. For example, the graph may show 60 receiver domains for an organism that contains only 55 response regulator proteins. Such discrepancies arise from, for example, hybrid sensor kinases, which contain both transmitter and receiver domains. Such a protein contributes to both the transmitter and receiver counts on the graph.


    Legend
    Green: histidine kinase domains
    Red: response regulator domains
    Orange: input domain type
    Blue: output domain type
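The difference between domain counts and protein counts can be made concrete with a small Python sketch. The protein records are hypothetical, and the rule used here to call a protein a response regulator (it has a receiver domain and no transmitter domain) is an assumption for the example rather than MiST's documented rule.

```python
from collections import Counter

# Hypothetical proteins: (identifier, types of the predicted domains it contains).
PROTEINS = [
    ("rr_1", ["receiver", "output"]),
    ("rr_2", ["receiver", "receiver", "output"]),
    ("hybrid_hk", ["input", "transmitter", "receiver"]),
    ("hk_1", ["input", "transmitter"]),
]

# What the graph counts: every domain occurrence, regardless of its protein.
domain_counts = Counter(
    domain_type for _, domains in PROTEINS for domain_type in domains
)

# Assumed rule: a response regulator has a receiver but no transmitter domain.
response_regulators = [
    identifier
    for identifier, domains in PROTEINS
    if "receiver" in domains and "transmitter" not in domains
]

print("receiver domains:", domain_counts["receiver"])
print("response regulator proteins:", len(response_regulators))
```

In this toy data the receiver domain count exceeds the number of response regulator proteins because one protein carries two receiver domains and the hybrid kinase contributes a receiver domain as well.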

  3. Querying

    Select the type of search to perform (e.g. Domain, description, etc.), fill in the search terms of interest, and push the search button. The query will be executed against each organism that was selected.


    Please note the following:

    • All search terms are case-insensitive. Thus, the text 'RESPONSE_REG' is treated the same as 'response_reg' and 'ResPOnSe_rEg'.

    • Only organisms that have checks in the checkboxes beside their name will be searched

    • Multiple keywords should be separated by spaces and are treated independently (except for domain searches, see below)

    • When performing domain searches:

      • The Boolean operators AND and OR may be used for more complex domain searches. For example, to search for proteins containing both the PAS domain and the conserved MCPsignal domain, type (without the quotes): 'PAS AND MCPsignal'

      • By default, domain searches cover both the Pfam and SMART libraries. To limit a search to either Pfam or SMART, prefix the query with the domain library name. For example, to search specifically for the Pfam Response_reg domain, type (without the quotes): 'Pfam:Response_reg'
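The domain search rules above (case-insensitive terms, AND/OR operators, and an optional library prefix) can be sketched as a tiny evaluator. This is an illustration under several assumptions: the function names are invented, domain names are compared for exact equality, and mixed AND/OR queries are evaluated strictly left to right because the page does not state an operator precedence.

```python
def term_matches(term, domains):
    """Check one term, optionally prefixed with 'Pfam:' or 'SMART:',
    against a protein's (library, domain name) pairs, ignoring case."""
    library, _, name = term.rpartition(":")
    return any(
        name.lower() == domain_name.lower()
        and (not library or library.lower() == domain_library.lower())
        for domain_library, domain_name in domains
    )


def matches(query, domains):
    """Evaluate queries such as 'PAS AND MCPsignal' left to right (assumed order)."""
    tokens = query.split()
    if not tokens:
        return False
    result = term_matches(tokens[0], domains)
    # Tokens alternate: term, operator, term, operator, term, ...
    for operator, term in zip(tokens[1::2], tokens[2::2]):
        hit = term_matches(term, domains)
        if operator.upper() == "AND":
            result = result and hit
        elif operator.upper() == "OR":
            result = result or hit
        else:
            raise ValueError(f"unsupported operator: {operator}")
    return result


# Hypothetical domain architectures for two proteins.
chemoreceptor = [("Pfam", "PAS"), ("Pfam", "HAMP"), ("Pfam", "MCPsignal")]
regulator = [("Pfam", "Response_reg"), ("Pfam", "Trans_reg_C")]

print(matches("PAS AND MCPsignal", chemoreceptor))
print(matches("Pfam:Response_reg", regulator))
print(matches("SMART:Response_reg", regulator))
```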


    The GI number, description, and domain architecture of each matching protein are displayed beneath its associated organism. To view more detailed information about a particular protein, click its GI number, which takes you to the View Protein page for that protein. The domain name, start, stop, score, E-value, and significance of all predicted domains, including overlapping or insignificant domains, can be displayed by clicking the appropriate 'Details' link. Clicking the 'Details' link again hides this information.


  4. Table of signal transduction proteins by replicon

    This table reveals the type and number of signal transduction proteins found on each replicon.


    • To view all the signal transduction proteins contained on a particular replicon, click on the replicon name

    • To view all proteins belonging to a particular class of signal transduction (e.g. One-component proteins), click on the desired class name

    • To view the proteins on a particular replicon that belong to a particular class of signal transduction, click on the desired number within the table


    Performing one of the above actions presents a list of proteins displayed similarly to the output from querying, except that beneath each protein description it also lists the input and output domains identified in that protein.
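The replicon table and its three click actions amount to a cross-tabulation plus filtering. The sketch below illustrates that idea in Python; the protein identifiers, replicon names, and the `proteins_for` helper are hypothetical and only the class names are taken from this page.

```python
from collections import Counter

# Hypothetical signal transduction proteins: (identifier, replicon, class).
PROTEINS = [
    ("p1", "chromosome", "One-component"),
    ("p2", "chromosome", "One-component"),
    ("p3", "chromosome", "Two-component"),
    ("p4", "plasmid pA", "One-component"),
]

# Each table cell holds the number of proteins for one (replicon, class) pair.
table = Counter((replicon, cls) for _, replicon, cls in PROTEINS)


def proteins_for(replicon=None, cls=None):
    """List protein identifiers, optionally restricted by replicon and/or class."""
    return [
        identifier
        for identifier, protein_replicon, protein_cls in PROTEINS
        if (replicon is None or protein_replicon == replicon)
        and (cls is None or protein_cls == cls)
    ]


print(table[("chromosome", "One-component")])    # count shown in one cell
print(proteins_for(replicon="chromosome"))       # click on a replicon name
print(proteins_for(cls="One-component"))         # click on a class name
print(proteins_for("plasmid pA", "One-component"))  # click on a number in the table
```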




View protein page

This page may conceptually be broken into three sections: 1) protein/gene sequence information, 2) domain architecture, and 3) a chromosome view.


  1. Protein/gene sequence information

    The full RefSeq annotation is shown in this section along with various identifiers. Protein information is shown on the left-hand side, and gene information on the right-hand side. Beside the italicized protein and gene labels is the MiST identifier. To obtain the protein or gene sequence, click the appropriate 'Sequence' link. Clicking the 'Sequence' link again hides the sequence.


  2. Domain architecture

    This section visualizes the Pfam and SMART domain architectures for this protein. Information about the predicted domains may be revealed by clicking the 'Details' link. Blue vertical boxes represent transmembrane regions, red boxes represent signal peptides, green boxes represent coiled-coil regions, and pink boxes represent low-complexity regions.


  3. Chromosome view

    A graphical representation (drawn to scale) of the genes surrounding the currently selected protein/gene. The current gene is centered on the image and drawn in blue. Neighboring genes are drawn in gray and may be viewed by clicking on the gray arrow. Hovering the mouse over a particular neighboring gene will display its location and MiST identifier.


The 'Return to query screen' link takes you back to the page you were on before entering the View Protein page(s).




Bioinformatic tools used in MiST

  1. Pfam version 19.0*
  2. SMART version 5.0*
  3. Phobius version 1.01 - signal peptide and transmembrane region prediction
  4. Coils - coiled-coil prediction
  5. Seg - low-complexity region prediction

  * Note: the HMMER software suite was used to search the Pfam and SMART domain libraries.



Pfam and SMART domains used to predict signal transduction proteins


No.  Domain name       Source  Type         Function
1.   ACT               Pfam    input        Small-molecule binding
2.   Ada_Zn_binding    Pfam    input        Small-molecule binding
3.   AlkA_N            Pfam    input        Small-molecule binding
4.   AraC_binding      Pfam    input        Small-molecule binding
5.   Autoind_bind      Pfam    input        Small-molecule binding
6.   Cache_1           Pfam    input        Small-molecule binding
7.   Cache_2           Pfam    input        Small-molecule binding
8.   CHASE             Pfam    input        Small-molecule binding
9.   cNMP_binding      Pfam    input        Small-molecule binding
10.  Diacid_rec        Pfam    input        Small-molecule binding
11.  Fe_dep_repr_C     Pfam    input        Small-molecule binding
12.  FeoA              Pfam    input        Small-molecule binding
13.  FHA               Pfam    input        Small-molecule binding
14.  GAF               Pfam    input        Small-molecule binding
15.  HMA               Pfam    input        Small-molecule binding
16.  LysR_substrate    Pfam    input        Small-molecule binding
17.  NIT               Pfam    input        Small-molecule binding
18.  PAS               Pfam    input        Small-molecule binding
19.  PAS_2             Pfam    input        Small-molecule binding
20.  PAS_3             Pfam    input        Small-molecule binding
21.  PAS_4             Pfam    input        Small-molecule binding
22.  PAS               SMART   input        Small-molecule binding
23.  PAC               SMART   input        Small-molecule binding
24.  Peripla_BP_1      Pfam    input        Small-molecule binding
25.  Peripla_BP_2      Pfam    input        Small-molecule binding
26.  SBP_bac_3         Pfam    input        Small-molecule binding
27.  SIS               Pfam    input        Small-molecule binding
28.  STAS              Pfam    input        Small-molecule binding
29.  TetR_C            Pfam    input        Small-molecule binding
30.  TOBE              Pfam    input        Small-molecule binding
31.  V4R               Pfam    input        Small-molecule binding
32.  Aminotran_1_2     Pfam    input        Enzymatic
33.  Arch_ATPase       Pfam    input        Enzymatic
34.  Citrate_synt      Pfam    input        Enzymatic
35.  Cyanate_lyase     Pfam    input        Enzymatic
36.  EPSP_synthase     Pfam    input        Enzymatic
37.  FmdA_AmdA         Pfam    input        Enzymatic
38.  GATase_2          Pfam    input        Enzymatic
39.  Glucokinase       Pfam    input        Enzymatic
40.  Glycos_trans_3N   Pfam    input        Enzymatic
41.  Glyoxalase        Pfam    input        Enzymatic
42.  HEAT_PBS          Pfam    input        Enzymatic
43.  HEM4              Pfam    input        Enzymatic
44.  Nitroreductase    Pfam    input        Enzymatic
45.  NTP_transf_2      Pfam    input        Enzymatic
46.  NUDIX             Pfam    input        Enzymatic
47.  PALP              Pfam    input        Enzymatic
48.  Peptidase_M23     Pfam    input        Enzymatic
49.  peroxidase        Pfam    input        Enzymatic
50.  PfkB              Pfam    input        Enzymatic
51.  Pribosyltran      Pfam    input        Enzymatic
52.  PTS_EIIC          Pfam    input        Enzymatic
53.  PTS-HPr           Pfam    input        Enzymatic
54.  Pyr_redox         Pfam    input        Enzymatic
55.  Rhodanese         Pfam    input        Enzymatic
56.  SKI               Pfam    input        Enzymatic
57.  CBS               Pfam    input        Protein-protein interaction
58.  HAMP              Pfam    input        Protein-protein interaction
59.  TPR_1             Pfam    input        Protein-protein interaction
60.  TPR_2             Pfam    input        Protein-protein interaction
61.  TPR_3             Pfam    input        Protein-protein interaction
62.  TPR_4             Pfam    input        Protein-protein interaction
63.  BLUF              Pfam    input        Cofactor binding
64.  Fer4              Pfam    input        Cofactor binding
65.  FeS               Pfam    input        Cofactor binding
66.  Hemerythrin       Pfam    input        Cofactor binding
67.  HhH-GPD           Pfam    input        Cofactor binding
68.  NIR_SIR           Pfam    input        Cofactor binding
69.  NIR_SIR_ferr      Pfam    input        Cofactor binding
70.  Nitro_FeMo-Co     Pfam    input        Cofactor binding
71.  Phytochrome       Pfam    input        Cofactor binding
72.  CHASE2            Pfam    input        Unknown function
73.  CHASE3            Pfam    input        Unknown function
74.  CHASE4            Pfam    input        Unknown function
75.  MASE1             Pfam    input        Unknown function
76.  MASE2             Pfam    input        Unknown function
77.  MHYT              Pfam    input        Unknown function
78.  TrkA_C            Pfam    input        Unknown function
79.  Arc               Pfam    output       DNA-binding
80.  Arg_repressor     Pfam    output       DNA-binding
81.  AsnC_trans_reg    Pfam    output       DNA-binding
82.  Crp               Pfam    output       DNA-binding
83.  CtsR              Pfam    output       DNA-binding
84.  DeoR              Pfam    output       DNA-binding
85.  Fe_dep_repress    Pfam    output       DNA-binding
86.  GerE              Pfam    output       DNA-binding
87.  GntR              Pfam    output       DNA-binding
88.  HTH_AraC          Pfam    output       DNA-binding
89.  HTH_1             Pfam    output       DNA-binding
90.  HTH_3             Pfam    output       DNA-binding
91.  HTH_5             Pfam    output       DNA-binding
92.  HTH_6             Pfam    output       DNA-binding
93.  HTH_7             Pfam    output       DNA-binding
94.  HTH_8             Pfam    output       DNA-binding
95.  HTH_10            Pfam    output       DNA-binding
96.  HTH_11            Pfam    output       DNA-binding
97.  HTH_12            Pfam    output       DNA-binding
98.  IclR              Pfam    output       DNA-binding
99.  LacI              Pfam    output       DNA-binding
100. LytTR             Pfam    output       DNA-binding
101. MarR              Pfam    output       DNA-binding
102. MerR              Pfam    output       DNA-binding
103. PadR              Pfam    output       DNA-binding
104. ROS_MUCR          Pfam    output       DNA-binding
105. TetR_N            Pfam    output       DNA-binding
106. Trans_reg_C       Pfam    output       DNA-binding
107. EAL               Pfam    output       Di-guanylate cyclase
108. GGDEF             Pfam    output       Di-guanylate cyclase
109. ANTAR             Pfam    output       RNA-binding
110. CsrA              Pfam    output       RNA-binding
111. PP2C_SIG          SMART   output       Phosphatase
112. Pkinase           Pfam    output       Protein kinase
113. HD                Pfam    output       Hydrolase
114. Guanylate_cyc     Pfam    output       Other
115. LytR_cpsA_psr     Pfam    output       Other
116. Rrf2              Pfam    output       Other
117. RseA_N            Pfam    output       Other
118. HATPase_c         Pfam    transmitter  Histidine kinase
119. HWE_HK            Pfam    transmitter  Histidine kinase
120. HisKA             Pfam    transmitter  Histidine kinase
121. HisKA_2           Pfam    transmitter  Histidine kinase
122. HisKA_3           Pfam    transmitter  Histidine kinase
123. Response_reg      Pfam    receiver     Response regulator
124. MCPsignal         Pfam    chemotaxis   MCP
125. CheB_methylest    Pfam    chemotaxis   Chemotaxis
126. CheC              Pfam    chemotaxis   Chemotaxis
127. CheD              Pfam    chemotaxis   Chemotaxis
128. CheW              Pfam    chemotaxis   Chemotaxis
129. CheR              Pfam    chemotaxis   Chemotaxis
130. CheR_N            Pfam    chemotaxis   Chemotaxis
131. CheZ              Pfam    chemotaxis   Chemotaxis
132. HATPase_c         SMART   transmitter  Histidine kinase
133. REC               SMART   receiver     Response regulator
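
As an illustration of how a table like this could be used to label the domains found in a protein, the Python sketch below looks up each predicted domain by source and name and groups the hits by type. Only a handful of rows are copied from the table; the `annotate` helper and the example protein are hypothetical, and the sketch does not attempt to reproduce how MiST itself classifies proteins.

```python
# A few rows copied from the table: (source, domain name) -> (type, function).
DOMAIN_TABLE = {
    ("Pfam", "PAS"): ("input", "Small-molecule binding"),
    ("Pfam", "HAMP"): ("input", "Protein-protein interaction"),
    ("Pfam", "GGDEF"): ("output", "Di-guanylate cyclase"),
    ("Pfam", "Trans_reg_C"): ("output", "DNA-binding"),
    ("Pfam", "HisKA"): ("transmitter", "Histidine kinase"),
    ("Pfam", "HATPase_c"): ("transmitter", "Histidine kinase"),
    ("Pfam", "Response_reg"): ("receiver", "Response regulator"),
    ("Pfam", "MCPsignal"): ("chemotaxis", "MCP"),
}


def annotate(domains):
    """Group a protein's domain names by table type; unlisted domains are skipped."""
    by_type = {}
    for key in domains:
        if key in DOMAIN_TABLE:
            domain_type = DOMAIN_TABLE[key][0]
            by_type.setdefault(domain_type, []).append(key[1])
    return by_type


# Hypothetical sensor kinase; 'Some_other_domain' is not a signal transduction domain.
sensor_kinase = [
    ("Pfam", "PAS"),
    ("Pfam", "HisKA"),
    ("Pfam", "HATPase_c"),
    ("Pfam", "Some_other_domain"),
]
print(annotate(sensor_kinase))
```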