Electronic Notebook for Protein Sequence Analysis

The electronic notebook is a tutorial and analysis web form: a set of links to DNA and protein analysis tools combined with areas in which results and personal notes can be recorded. All the analysis tools open in a second "tools" window, from which the results can be transferred into the notebook. These results and notes can be saved to a local file using the "Save the Notebook" buttons found throughout the notebook. The "Cheat now!" links open a third window in which a complete set of results has already been recorded. The electronic notebook can also be used to analyze a new DNA sequence by substituting the new sequence for the original sequence found in the DNA sequence text area.







Start here with your DNA sequence

Initial DNA Sequence



DNA Sequence Notes



To identify any exons in the DNA sequence and generate a predicted protein sequence, click here:

GenScan


Paste your DNA sequence into the GenScan input window and press the "Run Genscan" button. Select the protein translation with the highest exon P-values and paste this FASTA-formatted output into your notebook.
 

Protein Sequence from Genscan



Protein Sequence Notes



To scan the protein sequence for the occurrence of motifs/patterns found in the PROSITE database, use:

ScanProsite

Paste the raw protein sequence from GenScan (leave off the FASTA definition line) into the ScanProsite input box, choose to exclude patterns with a high probability of occurrence, and press the "Start the Scan" button. Paste the ScanProsite hit into your notebook. To see the PROSITE summary for the hit, click on the PDOCxxxx number.
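Removing the definition line by hand is easy to get wrong when the sequence spans many lines. The following is a minimal Python sketch of that step, not part of the notebook itself; the function name raw_sequence and the example peptide are hypothetical, and it assumes the GenScan output is ordinary FASTA text.

```python
def raw_sequence(fasta_text: str) -> str:
    """Return the residues of the first FASTA record, without its defline."""
    residues = []
    started = False
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if started:
                break  # a second record begins here; keep only the first
            started = True
            continue  # drop the defline itself
        started = True
        residues.append(line)
    # Join the wrapped sequence lines into one uppercase string.
    return "".join(residues).upper()


# Hypothetical example record, for illustration only.
example = ">hypothetical_peptide_1\nMKTAYIAK\nqrqisfvk\n"
print(raw_sequence(example))
```

The printed string can be pasted directly into an input box that expects a raw sequence.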

Hit from ScanProsite




Prosite pattern




Prosite Summary



Prosite Notes



To search for proteins with similar sequences, use:


 

Run a BLASTp search against the SwissProt database by pasting the protein sequence from GenScan into the input box on the Advanced BLAST page. Choose the SwissProt database from the database listbox and the "blastp" program from the program listbox, then press the "Submit" button. Format your results as "Flat query anchored with identities" and paste this alignment into your notebook.
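To help read a query-anchored alignment, here is a small Python sketch of how such a view can be produced from two aligned rows. It assumes, as we understand the "with identities" display, that residues identical to the query are shown as dots and differences are shown as letters; the function name mark_identities and the example rows are hypothetical and are not BLAST output.

```python
def mark_identities(query_row: str, subject_row: str) -> str:
    """Rewrite an aligned subject row, replacing residues that match the
    query with '.' and leaving mismatches and gaps ('-') as they are."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    marked = []
    for q, s in zip(query_row, subject_row):
        if s != "-" and s.upper() == q.upper():
            marked.append(".")  # identical to the query at this column
        else:
            marked.append(s)    # mismatch or gap: keep the subject character
    return "".join(marked)


# Hypothetical aligned rows, for illustration only.
query = "MKT-AYIAK"
subject = "MRTLAY-AK"
print(query)
print(mark_identities(query, subject))
```

Reading the alignment this way makes the conserved columns stand out, which is useful when writing the alignment notes.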
 

BLASTP Alignment (against SwissProt)



BLASTP Alignment Notes



To search against the COGs database, click here:

COGs


 

Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in 21 complete genomes, representing 17 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. Use the COGnitor to compare the protein sequence to the COGs database.

Paste the FASTA-formatted protein sequence from GenScan into the COGnitor input box and press the "compare to COGs" button. Click on the link to the highest-scoring COG, download the sequence alignment by clicking on the "# proteins" link in the upper left-hand corner of the page, and paste the alignment into your notebook. Go back to the previous page and click on the disk icon to save the sequences in the COG to a local file on your desktop; this file will be used as input to Multalin below. Drag the file from your desktop onto your "tools" browser window to display the sequences, then copy and paste them into your notebook under "COGs FASTA Sequences".

 

COGs Alignment



COGs FASTA Sequences



COGs Alignment Notes



To generate a multiple sequence alignment, use:

Paste the sequences from your best-hit COG, saved in your "COGs FASTA Sequences" notebook area, into the input box of Multalin. Also paste in the protein sequence derived from GenScan so that your unknown sequence is included in the alignment, then press the "Start Multalin!" button. Display the results in text form by clicking on the "Results as a text page (msf)" link. Paste this Multalin display into your notebook.
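Combining the COG sequences with the GenScan protein can also be done before pasting. The sketch below is one possible way to do this in Python, assuming both inputs are plain FASTA text and that the alignment tool accepts FASTA-style input; the function names parse_fasta and build_alignment_input, and the 60-column line width, are illustrative choices rather than anything the notebook prescribes.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def build_alignment_input(cog_fasta: str, query_fasta: str, width: int = 60) -> str:
    """Append the query record(s) to the COG records and re-wrap the sequences."""
    lines = []
    for name, seq in parse_fasta(cog_fasta) + parse_fasta(query_fasta):
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical sequences, for illustration only.
cog = ">cog_member_1\nMKTA\nYIAK\n>cog_member_2\nMRTAYLAK\n"
query = ">genscan_peptide\nMKTAYIGK\n"
print(build_alignment_input(cog, query))
```

Placing the unknown sequence last keeps it easy to find at the bottom of the resulting alignment.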
 

Multalin Alignment



Multalin Alignment Notes



To search for protein domains and view a model structure for your protein, click here:


NCBI's Conserved Domain Search allows you to match your protein sequence to a library of conserved protein domains, generate a multiple sequence alignment based on this match, and explore 3D modeling templates for your sequence.
Paste your protein sequence from GenScan into the CD-Search query box and run the search. From the search results page, generate a multiple sequence alignment for the top 10 sequences representative of the conserved domain hit. Paste this alignment into your notebook. Before viewing a structure with Cn3D, use the listbox to specify "up to 5" sequences. Press the "Redisplay Alignment" button to invoke Cn3D with a display of a 3D modeling template and a multiple sequence alignment that includes your query sequence. In the Cn3D Sequence Window, use the "Alignment/Hide or Show Rows" menu item to hide all but your query and the structurally anchored sequence shown at the top of the alignment. Residues identical in your sequence and the structural template are shown in red. Locate the PROSITE motif you found earlier within the Cn3D alignment window. Save the Cn3D alignment by exporting it as text to a file on your desktop. Drag this file onto your "tools" window, then select and paste the alignment into your notebook.
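Locating the PROSITE motif by eye can be checked with a short script. The sketch below converts a pattern written in PROSITE syntax into a Python regular expression and reports where it matches an ungapped sequence. The function names prosite_to_regex and locate_motif and the example sequence are hypothetical; the pattern shown is the well-known ATP/GTP-binding site motif A (P-loop) and is used only for illustration, not as the hit you are expected to find. The conversion covers the common syntax elements (x, [..], {..}, repeat counts, and the < and > anchors) and makes no attempt at the rarer cases.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as '[AG]-x(4)-G-K-[ST]' into a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        if element.endswith(")") and "(" in element:
            # Repetition such as x(4) or x(2,4) becomes {4} or {2,4}.
            element, count = element[:-1].split("(", 1)
            repeat = "{" + count + "}"
        if element == "x":
            core = "."                   # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element               # a single residue or an [ABC] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def locate_motif(pattern: str, sequence: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for each hit, using 1-based positions."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group())
            for m in regex.finditer(sequence.upper())]


# Hypothetical sequence, for illustration only.
print(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))
print(locate_motif("[AG]-x(4)-G-K-[ST]", "MSDGAPGSGKSTLA"))
```

Because the alignment exported from Cn3D may contain gap characters, remove them from your query row before searching, and remember that the reported positions then refer to the ungapped sequence.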


CD-Search Domain Hits



CD-Search Alignment



Cn3D Alignment


Other Tools for DNA and Protein Sequence Analysis