MiST Information

Here you will find a brief description of the MiST database and how to use it to its full extent. Follow the links below to quickly navigate to a topic of interest.

Starting with MiST - how to work with selecting organisms and direct querying

Taxonomy selector - select organisms based on their taxonomy
Organism list
Search for organism name or GI

Querying multiple organisms
View organism page

Genome summary
Signal transduction profile (legend)
Querying
Table of signal transduction proteins by replicon

View protein page

Basic protein/gene information
Domain architecture
Chromosome view

Bioinformatic tools used in MiST
Pfam and SMART domains used to predict signal transduction proteins

Starting with MiST

Generally, exploring MiST first involves selecting one or more bacteria to compare or navigating to an organism of interest. There are three major entry points into exploring MiST from the homepage: the taxonomy selector, organism list, and querying.

Taxonomy selector

This taxonomically organized approach to finding organisms of interest is best suited for performing queries across multiple organisms. The taxonomy-based selector displays a hierarchical list of organisms arranged according to their taxonomy. The number of organisms associated with each taxonomic designation is given in parentheses beside its name. Clicking on the 'Show/Hide' link beside each taxonomic designation will reveal the organisms belonging to that particular group.

The taxonomy level represents the depth or extent to which organisms are grouped taxonomically. The current taxonomy level defaults to phyla and is displayed in a green font. You may view the taxonomic tree at a different level by clicking on the desired taxonomic level (e.g. class or order).

There are two ways to select organisms. The most intuitive method involves displaying the organisms beneath a taxonomic designation by clicking on the 'Show/Hide' link and then clicking the organisms of interest. Once an organism is selected, a check appears in the checkbox beside its name.

The second method selects or deselects whole groups of organisms belonging to a particular taxonomic designation. Clicking on the small gray button beside a taxonomic designation selects (or deselects) all the organisms belonging to this group. For example, to select all Cyanobacteria, make sure the taxonomy level is set to phyla and then push the button beside Cyanobacteria. Clicking the 'Show/Hide' link beside Cyanobacteria should reveal that all cyanobacterial species have a check in the checkbox beside their name.
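The grouping, per-group counts, and group selection described above can be sketched in a few lines of Python. This is only an illustration, not MiST's implementation: the record layout, the function names, and the toggle rule (deselect only when the whole group is already selected) are assumptions, and the three organisms are arbitrary examples.

```python
from collections import defaultdict

# Hypothetical records: each organism carries its lineage keyed by taxonomy level.
ORGANISMS = [
    {"name": "Escherichia coli K-12", "phylum": "Proteobacteria", "class": "Gammaproteobacteria"},
    {"name": "Caulobacter crescentus CB15", "phylum": "Proteobacteria", "class": "Alphaproteobacteria"},
    {"name": "Bacillus subtilis 168", "phylum": "Firmicutes", "class": "Bacilli"},
]

def group_by_level(organisms, level="phylum"):
    """Group organism names under their taxon at the chosen taxonomy level."""
    groups = defaultdict(list)
    for org in organisms:
        groups[org[level]].append(org["name"])
    return dict(groups)

def render(groups):
    """Format each taxon with its organism count in parentheses."""
    return [f"{taxon} ({len(names)})" for taxon, names in sorted(groups.items())]

def toggle_group(selected, groups, taxon):
    """Select all organisms in a taxon, or deselect them if all are already selected (assumed rule)."""
    members = set(groups[taxon])
    if members <= selected:
        return selected - members
    return selected | members

groups = group_by_level(ORGANISMS, "phylum")
print(render(groups))
print(sorted(toggle_group(set(), groups, "Proteobacteria")))
```

Switching `level` to `"class"` regroups the same organisms one step deeper, which mirrors changing the taxonomy level in the selector.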

After selecting all the organisms of interest to compare, click the button labeled 'Select Organisms' to continue the analysis. For more information about comparing these selected organisms, see the querying multiple organisms section.

Organism list

The organism list displays the complete list of bacterial and archaeal genomes contained in the MiST database along with some of their basic information (e.g. GC content, number of genes/proteins, etc.). The list is sorted alphanumerically by organism name; however, clicking on a column header label (displayed in reddish-brown) will sort by that column. Click on the name of a given organism to view details about its signal transduction capabilities. To learn more about the details of the selected organism, see the view organism section.

Querying MiST from the homepage

There are two means of querying MiST from the homepage: 1) by organism name, or 2) by GenInfo Identifier (GI) number.

To query by organism name, make sure 'Organism' is selected, type part (or all) of the organism name in the adjacent text box, and push the search button. Any organisms that match the given text will be displayed with links to viewing their signal transduction capabilities (see the view organism section). Keywords separated by spaces will be interpreted as separate queries, allowing the user to search for multiple organisms simultaneously.
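As a rough illustration of this behavior, the sketch below treats each space-separated keyword as its own partial-match query and returns every organism matched by at least one of them. The function name is hypothetical, and case-insensitive matching is an assumption for the homepage search.

```python
def search_organisms(query, organism_names):
    """Return the names that contain any of the space-separated keywords."""
    keywords = query.lower().split()
    return [
        name for name in organism_names
        if any(keyword in name.lower() for keyword in keywords)
    ]

names = ["Escherichia coli K-12", "Bacillus subtilis 168", "Caulobacter crescentus CB15"]
print(search_organisms("coli bacillus", names))
```

One consequence of splitting on spaces is that a two-word name such as "Escherichia coli" is searched as two independent keywords, so it can return organisms that match only one of the words.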

To search for proteins in MiST that match a given GI, make sure 'GI number' is selected, type in one or more GI numbers separated by spaces, and push the search button. If a protein within MiST matches the given GI number, a link to this protein will be displayed along with its associated organism. Frequently, the given GI number will not produce a direct match in the MiST database, because many GI numbers can identify the same sequence. If this happens, an attempt will be made to find proteins within MiST that have an identical sequence to the sequence represented by the query GI number. Clicking the link to a particular protein will display the View Protein page with various information about that protein. For more information about viewing proteins, see the view protein section.
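The two-step lookup described here (direct GI match first, identical-sequence match as a fallback) might look like the following sketch. All dictionaries, identifiers, GI numbers, and sequences are made up for illustration; how MiST actually resolves the sequence behind an unknown GI is not described on this page.

```python
def find_by_gi(gi, mist_by_gi, sequence_of_gi, mist_by_sequence):
    """Look up a GI directly; if absent, fall back to proteins with an identical sequence."""
    if gi in mist_by_gi:
        return [mist_by_gi[gi]]
    sequence = sequence_of_gi.get(gi)  # hypothetical external GI -> sequence lookup
    if sequence is None:
        return []
    return mist_by_sequence.get(sequence, [])

# Made-up data: two GI numbers that identify the same sequence.
mist_by_gi = {"1001": "mist_protein_A"}
sequence_of_gi = {"1001": "MKTAYIAK", "2002": "MKTAYIAK"}
mist_by_sequence = {"MKTAYIAK": ["mist_protein_A"]}

print(find_by_gi("1001", mist_by_gi, sequence_of_gi, mist_by_sequence))  # direct match
print(find_by_gi("2002", mist_by_gi, sequence_of_gi, mist_by_sequence))  # identical-sequence fallback
```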

Querying multiple organisms

MiST makes it possible to query multiple organisms for proteins containing a particular domain(s), description, locus, GI number, or internal MiST identifier. First, select two or more organisms via the taxonomy selector. The next page, the Select Analysis page, displays the selected organisms in a tree-like list according to their taxonomy. Next, select the type of search to perform (e.g. Domain, description, etc.), fill in the search terms of interest, and push the search button. The query will be executed against each organism that was selected.

Please note the following:

- All search terms are case-insensitive. Thus, the text 'RESPONSE_REG' is treated the same as 'response_reg' and 'ResPOnSe_rEg'
- Only organisms that have checks in the checkboxes beside their name will be searched
- Multiple keywords should be separated by spaces and are treated independently (except for domain searches, see below)
- When performing domain searches:
  - The boolean logic operators - AND, OR - may be used for more complex domain searches. For example, to search for the PAS domain and the conserved MCPsignal domain, type (without the quotes): 'PAS AND MCPsignal'
  - By default, domain searches search both the Pfam and SMART libraries. To limit this to either Pfam or SMART, prefix the query with the domain library name. For example, to specifically search for the Pfam response_reg domain, type (without the quotes): 'Pfam:Response_reg'
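To make the domain search rules concrete, here is a minimal sketch of how such a query could be evaluated against one protein's predicted domains. It is not MiST's parser. The function names are hypothetical, and several behaviors are assumptions: AND binds more tightly than OR, adjacent terms with no operator between them are treated as AND, and the operators themselves are matched case-insensitively.

```python
def parse_term(term):
    """Split 'Pfam:Response_reg' into ('pfam', 'response_reg'); no prefix means any library."""
    library, separator, name = term.partition(":")
    if separator:
        return library.lower(), name.lower()
    return None, term.lower()

def term_matches(term, domains):
    """True if any (library, name) pair satisfies the term, compared case-insensitively."""
    library, name = parse_term(term)
    return any(name == d_name and library in (None, d_library)
               for d_library, d_name in domains)

def matches(query, domains):
    """Evaluate a query; domains is a set of lowercased (library, name) pairs."""
    or_groups, current = [], []
    for token in query.split():
        if token.upper() == "OR":
            or_groups.append(current)  # close the current AND group
            current = []
        elif token.upper() != "AND":
            current.append(token)
    or_groups.append(current)
    # A protein matches if every term in at least one AND group is present.
    return any(group and all(term_matches(t, domains) for t in group)
               for group in or_groups)

chemoreceptor = {("pfam", "pas"), ("pfam", "mcpsignal")}
regulator = {("pfam", "response_reg"), ("pfam", "trans_reg_c")}
print(matches("PAS AND MCPsignal", chemoreceptor))
print(matches("Pfam:Response_reg", regulator))
print(matches("SMART:Response_reg", regulator))
```

Lowercasing both the query terms and the stored domain names is what makes 'RESPONSE_REG', 'response_reg', and 'ResPOnSe_rEg' equivalent.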

The GI number, description, and domain architecture of each protein match are displayed beneath its associated organism. To view more detailed information about a particular protein, click its GI number, which will take you to the view protein page for this protein. The domain name, start, stop, score, E-value, and significance for all predicted domains, including overlapping or insignificant domains, can be displayed by clicking the appropriate 'Details' link. Clicking the 'Details' link again will hide this information.

To carry out another search against this same organism set, use your browser's back button to return to the Select Analysis page and simply input your new query text.

View organism page

The View Organism page provides a general overview of an organism's genome and its signal transduction network, along with the ability to search this organism for particular keywords. Specifically, there are four primary sections to understand: 1) the Genome summary, 2) the Signal transduction profile, 3) Querying, and 4) the table of signal transduction proteins by replicon.

Genome summary

This section contains basic information about the organism's genome and taxonomic classification, and is displayed in the upper left. Just beneath the taxonomic information is a table summarizing the predicted two-component and one-component systems. The number of two-component systems is based on the number of predicted response regulators, as these imply a particular output response. This number is an automatically determined estimate and should not be taken as exact or necessarily accurate. Rules for distinguishing between phosphorelays or other more complex response regulator types (hybrid histidine kinases) have not been implemented in this calculation. Thus, for best results, it is advisable to ascertain the number of two-component systems from a manual inspection of all the predicted two-component proteins.

Signal transduction profile

This graph provides a qualitative overview of the organism's signaling machinery (e.g. response regulators) and of its different types of input and output domains. It is important to note that this graph is based on domain counts rather than protein counts, and thus a number in the graph will not necessarily equal the number of proteins containing that domain. For example, the graph may show 60 receiver domains for an organism that contains only 55 response regulator proteins. This discrepancy arises from proteins such as hybrid sensor kinases, which contain both transmitter and receiver domains. Consequently, such a protein would contribute to both the transmitter and receiver counts on the graph.
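The difference between domain counts and protein counts can be seen in a toy example. The three proteins, their identifiers, and their class labels below are invented for illustration only.

```python
from collections import Counter

# Hypothetical proteins with their class and predicted signaling domains.
PROTEINS = [
    {"id": "hk1", "class": "histidine kinase", "domains": ["transmitter"]},
    {"id": "hhk1", "class": "hybrid histidine kinase", "domains": ["transmitter", "receiver"]},
    {"id": "rr1", "class": "response regulator", "domains": ["receiver"]},
]

# What the graph shows: one count per domain occurrence.
domain_counts = Counter(d for protein in PROTEINS for d in protein["domains"])

# What a protein tally shows: one count per protein.
protein_counts = Counter(protein["class"] for protein in PROTEINS)

print(domain_counts["receiver"], protein_counts["response regulator"])
```

Here the hybrid kinase adds one to both the transmitter and the receiver domain counts, so the receiver count (2) exceeds the number of response regulator proteins (1).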

Legend
Green	histidine kinase domains
Red	response regulator domains
Orange	input domain type
Blue	output domain type

Querying

Select the type of search to perform (e.g. Domain, description, etc.), fill in the search terms of interest, and push the search button. The query will be executed against each organism that was selected.

Please note the following:
- All search terms are case-insensitive. Thus, the text 'RESPONSE_REG' is treated the same as 'response_reg' and 'ResPOnSe_rEg'
- Only organisms that have checks in the checkboxes beside their name will be searched
- Multiple keywords should be separated by spaces and are treated independently (except for domain searches, see below)
- When performing domain searches:
  - The boolean logic operators - AND, OR - may be used for more complex domain searches. For example, to search for the PAS domain and the conserved MCPsignal domain, type (without the quotes): 'PAS AND MCPsignal'
  - By default, domain searches search both the Pfam and SMART libraries. To limit this to either Pfam or SMART, prefix the query with the domain library name. For example, to specifically search for the Pfam response_reg domain, type (without the quotes): 'Pfam:Response_reg'

The GI number, description, and domain architecture of each protein match are displayed beneath its associated organism. To view more detailed information about a particular protein, click its GI number, which will take you to the view protein page for this protein. The domain name, start, stop, score, E-value, and significance for all predicted domains, including overlapping or insignificant domains, can be displayed by clicking the appropriate 'Details' link. Clicking the 'Details' link again will hide this information.

Table of signal transduction proteins by replicon

This table reveals the type and number of signal transduction proteins found on each replicon.
- To view all the signal transduction proteins contained on a particular replicon, click on the replicon name
- To view all proteins belonging to a particular class of signal transduction (e.g. One-component proteins), click on the desired class name
- To view the proteins on a particular replicon that belong to a particular class of signal transduction, click on the desired number within the table
Performing one of the above actions will present a list of proteins displayed similarly to the output from querying (details), except it also includes beneath the protein description a list of the input and output domains identified in this protein.
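The table and its three click actions amount to counting and filtering proteins by replicon and class. The sketch below shows one way to express that; the rows, class names, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical rows: (replicon, signal transduction class, protein id).
PROTEINS = [
    ("chromosome", "One-component", "p1"),
    ("chromosome", "One-component", "p2"),
    ("chromosome", "Two-component", "p3"),
    ("plasmid", "One-component", "p4"),
]

def count_table(rows):
    """Count proteins per (replicon, class) cell, as shown in the table."""
    return Counter((replicon, cls) for replicon, cls, _ in rows)

def select(rows, replicon=None, cls=None):
    """Filter by replicon, by class, or by both (None means no restriction)."""
    return [protein_id for rep, c, protein_id in rows
            if replicon in (None, rep) and cls in (None, c)]

print(count_table(PROTEINS)[("chromosome", "One-component")])
print(select(PROTEINS, replicon="plasmid"))                        # click a replicon name
print(select(PROTEINS, cls="One-component"))                       # click a class name
print(select(PROTEINS, replicon="chromosome", cls="Two-component"))  # click a number
```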

View protein page

This page may conceptually be broken into three sections: 1) protein/gene sequence information, 2) domain architecture, and 3) a chromosome view.

Protein/gene sequence information

The full RefSeq annotation is shown in this section along with various identification information. Protein information is shown on the left-hand side, and gene information is displayed on the right-hand side. Beside the italicized protein and gene labels is the MiST identifier. To obtain either the protein or the gene sequence, click on the appropriate 'Sequence' link. Clicking the 'Sequence' link again will hide the sequence.

Domain architecture

This section visualizes the Pfam and SMART domain architectures for this protein. Information about the predicted domains may be revealed by clicking the 'Details' link. Blue vertical boxes represent transmembrane regions, red boxes represent signal peptides, green boxes represent coiled-coil regions, and pink boxes represent low-complexity regions.

Chromosome view

This section shows a graphical representation (drawn to scale) of the genes surrounding the currently selected protein/gene. The current gene is centered on the image and drawn in blue. Neighboring genes are drawn in gray and may be viewed by clicking on the corresponding gray arrow. Hovering the mouse over a particular neighboring gene will display its location and MiST identifier.

The 'Return to query screen' link will take you back to the page you were on before entering the View Protein page(s).

Bioinformatic tools used in MiST

Pfam version 19.0^*
SMART version 5.0^*
Phobius version 1.01 - signal peptide and transmembrane region prediction
Coils - coiled-coil prediction
Seg - low-complexity region prediction

^* Note: the HMMER software suite was used to search the Pfam and SMART domain libraries

Pfam and SMART domains used to predict signal transduction proteins

	Domain name	Source	Type	Function
1.	ACT	Pfam	input	Small-molecule binding
2.	Ada_Zn_binding	Pfam	input	Small-molecule binding
3.	AlkA_N	Pfam	input	Small-molecule binding
4.	AraC_binding	Pfam	input	Small-molecule binding
5.	Autoind_bind	Pfam	input	Small-molecule binding
6.	Cache_1	Pfam	input	Small-molecule binding
7.	Cache_2	Pfam	input	Small-molecule binding
8.	CHASE	Pfam	input	Small-molecule binding
9.	cNMP_binding	Pfam	input	Small-molecule binding
10.	Diacid_rec	Pfam	input	Small-molecule binding
11.	Fe_dep_repr_C	Pfam	input	Small-molecule binding
12.	FeoA	Pfam	input	Small-molecule binding
13.	FHA	Pfam	input	Small-molecule binding
14.	GAF	Pfam	input	Small-molecule binding
15.	HMA	Pfam	input	Small-molecule binding
16.	LysR_substrate	Pfam	input	Small-molecule binding
17.	NIT	Pfam	input	Small-molecule binding
18.	PAS	Pfam	input	Small-molecule binding
19.	PAS_2	Pfam	input	Small-molecule binding
20.	PAS_3	Pfam	input	Small-molecule binding
21.	PAS_4	Pfam	input	Small-molecule binding
22.	PAS	SMART	input	Small-molecule binding
23.	PAC	SMART	input	Small-molecule binding
24.	Peripla_BP_1	Pfam	input	Small-molecule binding
25.	Peripla_BP_2	Pfam	input	Small-molecule binding
26.	SBP_bac_3	Pfam	input	Small-molecule binding
27.	SIS	Pfam	input	Small-molecule binding
28.	STAS	Pfam	input	Small-molecule binding
29.	TetR_C	Pfam	input	Small-molecule binding
30.	TOBE	Pfam	input	Small-molecule binding
31.	V4R	Pfam	input	Small-molecule binding
32.	Aminotran_1_2	Pfam	input	Enzymatic
33.	Arch_ATPase	Pfam	input	Enzymatic
34.	Citrate_synt	Pfam	input	Enzymatic
35.	Cyanate_lyase	Pfam	input	Enzymatic
36.	EPSP_synthase	Pfam	input	Enzymatic
37.	FmdA_AmdA	Pfam	input	Enzymatic
38.	GATase_2	Pfam	input	Enzymatic
39.	Glucokinase	Pfam	input	Enzymatic
40.	Glycos_trans_3N	Pfam	input	Enzymatic
41.	Glyoxalase	Pfam	input	Enzymatic
42.	HEAT_PBS	Pfam	input	Enzymatic
43.	HEM4	Pfam	input	Enzymatic
44.	Nitroreductase	Pfam	input	Enzymatic
45.	NTP_transf_2	Pfam	input	Enzymatic
46.	NUDIX	Pfam	input	Enzymatic
47.	PALP	Pfam	input	Enzymatic
48.	Peptidase_M23	Pfam	input	Enzymatic
49.	peroxidase	Pfam	input	Enzymatic
50.	PfkB	Pfam	input	Enzymatic
51.	Pribosyltran	Pfam	input	Enzymatic
52.	PTS_EIIC	Pfam	input	Enzymatic
53.	PTS-HPr	Pfam	input	Enzymatic
54.	Pyr_redox	Pfam	input	Enzymatic
55.	Rhodanese	Pfam	input	Enzymatic
56.	SKI	Pfam	input	Enzymatic
57.	CBS	Pfam	input	Protein-protein interaction
58.	HAMP	Pfam	input	Protein-protein interaction
59.	TPR_1	Pfam	input	Protein-protein interaction
60.	TPR_2	Pfam	input	Protein-protein interaction
61.	TPR_3	Pfam	input	Protein-protein interaction
62.	TPR_4	Pfam	input	Protein-protein interaction
63.	BLUF	Pfam	input	Cofactor binding
64.	Fer4	Pfam	input	Cofactor binding
65.	FeS	Pfam	input	Cofactor binding
66.	Hemerythrin	Pfam	input	Cofactor binding
67.	HhH-GPD	Pfam	input	Cofactor binding
68.	NIR_SIR	Pfam	input	Cofactor binding
69.	NIR_SIR_ferr	Pfam	input	Cofactor binding
70.	Nitro_FeMo-Co	Pfam	input	Cofactor binding
71.	Phytochrome	Pfam	input	Cofactor binding
72.	CHASE2	Pfam	input	Unknown function
73.	CHASE3	Pfam	input	Unknown function
74.	CHASE4	Pfam	input	Unknown function
75.	MASE1	Pfam	input	Unknown function
76.	MASE2	Pfam	input	Unknown function
77.	MHYT	Pfam	input	Unknown function
78.	TrkA_C	Pfam	input	Unknown function
79.	Arc	Pfam	output	DNA-binding
80.	Arg_repressor	Pfam	output	DNA-binding
81.	AsnC_trans_reg	Pfam	output	DNA-binding
82.	Crp	Pfam	output	DNA-binding
83.	CtsR	Pfam	output	DNA-binding
84.	DeoR	Pfam	output	DNA-binding
85.	Fe_dep_repress	Pfam	output	DNA-binding
86.	GerE	Pfam	output	DNA-binding
87.	GntR	Pfam	output	DNA-binding
88.	HTH_AraC	Pfam	output	DNA-binding
89.	HTH_1	Pfam	output	DNA-binding
90.	HTH_3	Pfam	output	DNA-binding
91.	HTH_5	Pfam	output	DNA-binding
92.	HTH_6	Pfam	output	DNA-binding
93.	HTH_7	Pfam	output	DNA-binding
94.	HTH_8	Pfam	output	DNA-binding
95.	HTH_10	Pfam	output	DNA-binding
96.	HTH_11	Pfam	output	DNA-binding
97.	HTH_12	Pfam	output	DNA-binding
98.	IclR	Pfam	output	DNA-binding
99.	LacI	Pfam	output	DNA-binding
100.	LytTR	Pfam	output	DNA-binding
101.	MarR	Pfam	output	DNA-binding
102.	MerR	Pfam	output	DNA-binding
103.	PadR	Pfam	output	DNA-binding
104.	ROS_MUCR	Pfam	output	DNA-binding
105.	TetR_N	Pfam	output	DNA-binding
106.	Trans_reg_C	Pfam	output	DNA-binding
107.	EAL	Pfam	output	Di-guanylate cyclase
108.	GGDEF	Pfam	output	Di-guanylate cyclase
109.	ANTAR	Pfam	output	RNA-binding
110.	CsrA	Pfam	output	RNA-binding
111.	PP2C_SIG	SMART	output	Phosphatase
112.	Pkinase	Pfam	output	Protein kinase
113.	HD	Pfam	output	Hydrolase
114.	Guanylate_cyc	Pfam	output	Other
115.	LytR_cpsA_psr	Pfam	output	Other
116.	Rrf2	Pfam	output	Other
117.	RseA_N	Pfam	output	Other
118.	HATPase_c	Pfam	transmitter	Histidine kinase
119.	HWE_HK	Pfam	transmitter	Histidine kinase
120.	HisKA	Pfam	transmitter	Histidine kinase
121.	HisKA_2	Pfam	transmitter	Histidine kinase
122.	HisKA_3	Pfam	transmitter	Histidine kinase
123.	Response_reg	Pfam	receiver	Response regulator
124.	MCPsignal	Pfam	chemotaxis	MCP
125.	CheB_methylest	Pfam	chemotaxis	Chemotaxis
126.	CheC	Pfam	chemotaxis	Chemotaxis
127.	CheD	Pfam	chemotaxis	Chemotaxis
128.	CheW	Pfam	chemotaxis	Chemotaxis
129.	CheR	Pfam	chemotaxis	Chemotaxis
130.	CheR_N	Pfam	chemotaxis	Chemotaxis
131.	CheZ	Pfam	chemotaxis	Chemotaxis
132.	HATPase_c	SMART	transmitter	Histidine kinase
133.	REC	SMART	receiver	Response regulator
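As an illustration of how this table can be used, the sketch below loads a small excerpt of it into a lookup keyed by source and domain name, then lists the input and output domains in a protein's architecture. Only the seven rows shown are included, and the function name and architecture format are hypothetical.

```python
# Excerpt of the table above: (source, domain name) -> (type, function).
DOMAIN_TABLE = {
    ("Pfam", "PAS"): ("input", "Small-molecule binding"),
    ("Pfam", "HAMP"): ("input", "Protein-protein interaction"),
    ("Pfam", "HisKA"): ("transmitter", "Histidine kinase"),
    ("Pfam", "HATPase_c"): ("transmitter", "Histidine kinase"),
    ("Pfam", "Response_reg"): ("receiver", "Response regulator"),
    ("Pfam", "Trans_reg_C"): ("output", "DNA-binding"),
    ("SMART", "REC"): ("receiver", "Response regulator"),
}

def domains_of_type(architecture, wanted_type):
    """Return the domain names in an architecture whose table type equals wanted_type."""
    return [
        name for source, name in architecture
        if DOMAIN_TABLE.get((source, name), (None, None))[0] == wanted_type
    ]

# A hypothetical response regulator architecture: receiver plus DNA-binding output.
architecture = [("Pfam", "Response_reg"), ("Pfam", "Trans_reg_C")]
print(domains_of_type(architecture, "receiver"))
print(domains_of_type(architecture, "output"))
```

Keying on both source and name matters because some names, such as PAS and HATPase_c, appear in the table once for Pfam and once for SMART.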