BLASTX+BEAUTY Search Results

WU-BLAST 2.0 search of the National Center for Biotechnology Information's NR Protein Database.

BEAUTY post-processing provided by the Human Genome Sequencing Center, Baylor College of Medicine.

BEAUTY Reference:
Worley KC, Culpepper P, Wiese BA, Smith RF. BEAUTY-X: enhanced BLAST searches for DNA queries. Bioinformatics 1998;14(10):890-1.

Worley KC, Wiese BA, Smith RF. BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Res 1995 Sep;5(2):173-84.





Repeat sequence:

   SW  perc perc perc  query                position in query    matching repeat         position in  repeat
score  div. del. ins.  sequence             begin  end (left)    repeat   class/family   begin  end (left)  ID

   32   2.6  0.0  0.0  'E12E12_I12_10.ab1'    368  406   (76) +  AT_rich  Low_complexity     1   39    (0)      

Alignments:

32  2.56  0.00  0.00  'E12E12_I12_10.ab1'  368  406  (76)  AT_rich#Low_complexity  1  39  (261)  5

  'E12E12_I12_10.    368 TTTTTTTAATTTATTTTTTAAAACTATATAATAATTTAA 406    
                         vvvvv v  v v v vv vv   vvvv v vv   vvv 
  AT_rich#Low_com      1 AAAAATAAAATAAATAATATAAAAATAAAATAAATAATA 39

Transitions / transversions = 0.00 (0 / 23)
Gap_init rate = 0.00 (0 / 39), avg. gap size = 0.00 (0 / 0)  

Masked Sequence:

>'E12E12_I12_10.ab1'
TCTTACGGCCGGGGCGGTTTGAGAAGCTGGTGTACAAAGNATTCAAAGAA
AGGGATTAGAAGGGATCGAAAATATGCAGTGCAACCAAGGAGTTTGTATG
CAACTGGGGTTCACATGTCACAAAACCTTGCAGGAAAGACCACTCACAAA
ACTGAAGGGAAAATAATGTACTACCACTATCATGGAACCATAGCTGAAAA
GAGAGAATCATGTAAAATGCTTATAAATTCAACAGAAATCACATATGACA
AAACCCCCTATGTGTTGGACACCACCATGAGGGACATTGCTGGTGTGATC
AAGAAATTTGAGCTCAAGATGATTGGAGACAGGCTACAACACAAGACACG
GCAATGACACTGCCAACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNCTTGTATTTTAGAGTAGAAATTCATTGAGCGGTTTTTTTTTTTN
TTTCCTGTTNGATAGTTTNCTTTTTTTTTTCT

Summary:

==================================================
file name: /repeatmasker/tmp/RM2seq
sequences:            1
total length:       482 bp
GC level:         36.61 %
bases masked:        39 bp (  8.09 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:                0            0 bp     0.00 %
      ALUs            0            0 bp     0.00 %
      MIRs            0            0 bp     0.00 %

LINEs:                0            0 bp     0.00 %
      LINE1           0            0 bp     0.00 %
      LINE2           0            0 bp     0.00 %
      L3/CR1          0            0 bp     0.00 %

LTR elements:         0            0 bp     0.00 %
      MaLRs           0            0 bp     0.00 %
      ERVL            0            0 bp     0.00 %
      ERV_classI      0            0 bp     0.00 %
      ERV_classII     0            0 bp     0.00 %

DNA elements:         0            0 bp     0.00 %
      MER1_type       0            0 bp     0.00 %
      MER2_type       0            0 bp     0.00 %

Unclassified:         0            0 bp     0.00 %

Total interspersed repeats:        0 bp     0.00 %


Small RNA:            0            0 bp     0.00 %

Satellites:           0            0 bp     0.00 %
Simple repeats:       0            0 bp     0.00 %
Low complexity:       1           39 bp     8.09 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element

The sequence(s) were assumed to be of primate origin.
RepeatMasker version 07/16/2000               default
ProcessRepeats version 07/16/2000
Repbase version 03/31/2000
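
The masking figures above can be cross-checked directly from the masked sequence. The sketch below is an illustration only and is not part of RepeatMasker: the function names and the min_len threshold are assumptions, chosen so that isolated N base calls (such as the one at position 40) are not counted as masked regions.

```python
import re

def masked_runs(seq, min_len=10):
    """Return 1-based inclusive (begin, end) pairs for runs of N.

    min_len is a hypothetical threshold that skips isolated N base calls.
    """
    pattern = re.compile(r"N{%d,}" % min_len)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq.upper())]

def percent(part, total):
    """Percentage of total, rounded to two decimals as in the summary."""
    return round(100.0 * part / total, 2)

# Synthetic stand-in with the same layout as the report:
# 367 unmasked bases, a 39-base masked run, then 76 bases left.
example = "A" * 367 + "N" * 39 + "C" * 76
runs = masked_runs(example)
begin, end = runs[0]
masked_bp = end - begin + 1
```

With this layout the run spans 368 to 406, leaves 76 bases after it, and 39 of 482 bases works out to 8.09 %, which agrees with the repeat table and the summary.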


Reference:  Gish, Warren (1994-1997).  unpublished.
Gish, Warren and David J. States (1993).  Identification of protein coding
regions by database similarity search.  Nat. Genet. 3:266-72.

Notice: statistical significance is estimated under the assumption that the equivalent of one entire reading frame in the query sequence codes for protein and that significant alignments will involve only coding reading frames.

Query= 'E12E12_I12_10.ab1' (482 letters)

  Translating both strands of query sequence in all 6 reading frames
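
As an illustration of the six-frame translation step, the following sketch translates a DNA string in frames +1 to +3 and -1 to -3 using the standard genetic code. It is not the WU-BLAST implementation; the function names are assumptions, any codon containing an ambiguous base is written as X, and stop codons are written as *.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate(dna):
    """Translate complete codons only; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna[offset:])
        frames["-%d" % (offset + 1)] = translate(rev[offset:])
    return frames

# First 48 bases of the masked query sequence shown earlier.
start = "TCTTACGGCCGGGGCGGTTTGAGAAGCTGGTGTACAAAGNATTCAAAG"
frame1 = six_frames(start)["+1"]
```

Frame +1 of these 48 bases gives SYGRGGLRSWCTKXSK, the start of the query line in the frame +1 alignments, and a 482-letter query yields 160 complete codons in every frame, matching the Length column in the parameters section.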

Database: nr 625,274 sequences; 197,782,623 total letters.



     Observed Numbers of Database Sequences Satisfying
    Various EXPECTation Thresholds (E parameter values)

        Histogram units:      = 3 Sequences     : less than 3 sequences

 EXPECTation Threshold
 (E parameter)
    |
    V   Observed Counts-->
  10000 746 162 |======================================================
   6310 584  92 |==============================
   3980 492 128 |==========================================
   2510 364 137 |=============================================
   1580 227  82 |===========================
   1000 145  64 |=====================
    631  81  21 |=======
    398  60  24 |========
    251  36   7 |==
    158  29   3 |=
    100  26   4 |=
   63.1  22   1 |:
   39.8  21   4 |=
   25.1  17   8 |==
   15.8   9   1 |:
 >>>>>>>>>>>>>>>>>>>>>  Expect = 10.0, Observed = 8  <<<<<<<<<<<<<<<<<
   10.0   8   0 |
   6.31   8   3 |=
   3.98   5   0 |
   2.51   5   1 |:


                                                                     Smallest
                                                                       Sum
                                                     Reading  High  Probability
Sequences producing High-scoring Segment Pairs:        Frame Score  P(N)      N
gi|8953765|dbj|BAA98120.1|(AB024024) gene_id:K15C23.1... +1   357  1.2e-31   1
gi|7452443|pir||T05325hypothetical protein F1C12.90 -... +1   346  1.7e-30   1
gi|6041838|gb|AAF02147.1|AC009853_7(AC009853) unknown... +1   131  9.8e-08   1
gi|11358255|pir||T50520hypothetical protein T27I15_80... +1   131  9.8e-08   1
gi|4165838|gb|AAD08769.1|(U86203) envelope glycoprote... +3    64  0.87      1
gi|4165840|gb|AAD08770.1|(U86204) envelope glycoprote... +3    61  0.990     1
gi|4165844|gb|AAD08772.1|(U86206) envelope glycoprote... +3    61  0.990     1
gi|4165848|gb|AAD08774.1|(U86208) envelope glycoprote... +3    61  0.990     1
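
The rows of the summary table above can be read programmatically by matching the four trailing columns (reading frame, high score, P(N), N) from the right. This is a minimal sketch that assumes the column layout shown in this report; the regular expression and the parse_hit name are illustrative and do not belong to any BLAST parsing library.

```python
import re

ROW = re.compile(
    r"^(?P<desc>\S.*?)\s+(?P<frame>[+-][123])\s+(?P<score>\d+)"
    r"\s+(?P<p>\S+)\s+(?P<n>\d+)\s*$"
)

def parse_hit(line):
    """Parse one summary row; return None for headers and other lines."""
    m = ROW.match(line)
    if m is None:
        return None
    desc = m.group("desc")
    # The gi number is the second pipe-separated field of the identifier.
    gi = desc.split("|")[1] if desc.startswith("gi|") else None
    return {
        "gi": gi,
        "frame": m.group("frame"),
        "score": int(m.group("score")),
        "p": float(m.group("p")),
        "n": int(m.group("n")),
    }

top = parse_hit("gi|8953765|dbj|BAA98120.1|(AB024024) gene_id:K15C23.1... +1   357  1.2e-31   1")
weak = parse_hit("gi|4165840|gb|AAD08770.1|(U86204) envelope glycoprote... +3    61  0.990     1")
header = parse_hit("Sequences producing High-scoring Segment Pairs:        Frame Score  P(N)      N")
```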



>gi|8953765|dbj|BAA98120.1|  (AB024024) gene_id:K15C23.12~pir||T05325~strong
            similarity to unknown protein [Arabidopsis thaliana]
            Length = 519

Frame  1 hits (HSPs):                                         ____________
                        __________________________________________________
Database sequence:     |              |             |              |      | 519
                       0            150           300            450

  Plus Strand HSPs:

 Score = 357 (125.7 bits), Expect = 1.2e-31, P = 1.2e-31
 Identities = 67/118 (56%), Positives = 87/118 (73%), Frame = +1

Query:     1 SYGRGGLRSWCTKXSKKGIRRDRKYAVQPRSLYATGVHMSQNLAGKTTHKTEGKIMYYHY 180
             +Y + G      +  KK  RRDRKYAVQPR+++ATGVHMSQ+L GKT H+ EGKI Y+HY
Sbjct:   403 TYRKWGFEKLAYRDVKKVPRRDRKYAVQPRNVFATGVHMSQHLQGKTYHRAEGKIRYFHY 462

Query:   181 HGTIAEKRESCKMLINSTEITYDKTPYVLDTTMRDIAGVIKKFELKMIGDRLQHKTRQ 354
             HG+I+++RE C+ L N T I ++  PYVLDTTMRDI   +K FE++ IGDRL  +TRQ
Sbjct:   463 HGSISQRREPCRHLYNGTRIVHENNPYVLDTTMRDIGLAVKTFEIRTIGDRLL-RTRQ 519
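
The two header lines of each HSP can be pulled apart with regular expressions. The sketch below assumes the exact wording used in this report; the pattern and function names are illustrative. It also shows that the printed percentages in this report agree with integer truncation rather than rounding: 67/118 is about 56.8 % but is shown as 56 %.

```python
import re

SCORE = re.compile(
    r"Score = (\d+) \(([\d.]+) bits\), Expect = ([^,\s]+), P = ([^,\s]+)"
)
IDENT = re.compile(
    r"Identities = (\d+)/(\d+) \((\d+)%\), "
    r"Positives = (\d+)/(\d+) \((\d+)%\), Frame = ([+-]\d)"
)

def parse_hsp_header(score_line, ident_line):
    """Return the numeric fields of an HSP header, or None if not matched."""
    s = SCORE.search(score_line)
    i = IDENT.search(ident_line)
    if s is None or i is None:
        return None
    return {
        "score": int(s.group(1)),
        "bits": float(s.group(2)),
        "expect": float(s.group(3)),
        "p": float(s.group(4)),
        "identities": (int(i.group(1)), int(i.group(2))),
        "positives": (int(i.group(4)), int(i.group(5))),
        "frame": i.group(7),
    }

def truncated_percent(matches, length):
    """Integer percentage without rounding up."""
    return matches * 100 // length

hsp = parse_hsp_header(
    " Score = 357 (125.7 bits), Expect = 1.2e-31, P = 1.2e-31",
    " Identities = 67/118 (56%), Positives = 87/118 (73%), Frame = +1",
)
```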


>gi|7452443|pir||T05325  hypothetical protein F1C12.90 - Arabidopsis thaliana
            >gi|2982434|emb|CAA18242.1| (AL022224) putative protein
            [Arabidopsis thaliana] >gi|7268812|emb|CAB79017.1| (AL161552)
            putative protein [Arabidopsis thaliana]
            Length = 504

Frame  1 hits (HSPs):                                         ____________
                        __________________________________________________
Database sequence:     |              |              |              |     | 504
                       0            150            300            450

  Plus Strand HSPs:

 Score = 346 (121.8 bits), Expect = 1.7e-30, P = 1.7e-30
 Identities = 63/118 (53%), Positives = 86/118 (72%), Frame = +1

Query:     1 SYGRGGLRSWCTKXSKKGIRRDRKYAVQPRSLYATGVHMSQNLAGKTTHKTEGKIMYYHY 180
             +Y + G+     +  KK  RRDRKYAVQP +++ATGVHMSQNL GKT HK E KI Y+HY
Sbjct:   388 TYRKWGIEKLAYRDVKKVPRRDRKYAVQPENVFATGVHMSQNLQGKTYHKAESKIRYFHY 447

Query:   181 HGTIAEKRESCKMLINSTEITYDKTPYVLDTTMRDIAGVIKKFELKMIGDRLQHKTRQ 354
             HG+I+++RE C+ L N + + ++ TPYVLDTT+ D+   ++ FEL+ IGDRL  +TRQ
Sbjct:   448 HGSISQRREPCRQLFNDSRVVFENTPYVLDTTICDVGLAVRTFELRTIGDRLL-RTRQ 504


>gi|6041838|gb|AAF02147.1|AC009853_7  (AC009853) unknown protein [Arabidopsis
            thaliana]
            Length = 102

Frame  1 hits (HSPs):                             ________________________
                        __________________________________________________
Database sequence:     |         |         |        |         |         | | 102
                       0        20        40       60        80       100

  Plus Strand HSPs:

 Score = 131 (46.1 bits), Expect = 9.8e-08, P = 9.8e-08
 Identities = 26/47 (55%), Positives = 33/47 (70%), Frame = +1

Query:     1 SYGRGGLRSWCTKXSKKGIRRDRKYAVQPRSLYATGVHMSQNLAGKT 141
             +Y + G+     +  KK  RRDRKYAVQP +++A GVHMSQNL GKT
Sbjct:    56 TYRKWGIEKLAYRDVKKVPRRDRKYAVQPENVFAIGVHMSQNLQGKT 102


>gi|11358255|pir||T50520  hypothetical protein T27I15_80 - Arabidopsis thaliana
            >gi|8388615|emb|CAB94135.1| (AL358732) putative protein
            [Arabidopsis thaliana] >gi|8777409|dbj|BAA96999.1| (AB023039)
            gb|AAF02147.1~gene_id:MIF21.8~similar to unknown protein
            [Arabidopsis thaliana]
            Length = 102

Frame  1 hits (HSPs):                             ________________________
                        __________________________________________________
Database sequence:     |         |         |        |         |         | | 102
                       0        20        40       60        80       100

  Plus Strand HSPs:

 Score = 131 (46.1 bits), Expect = 9.8e-08, P = 9.8e-08
 Identities = 26/47 (55%), Positives = 33/47 (70%), Frame = +1

Query:     1 SYGRGGLRSWCTKXSKKGIRRDRKYAVQPRSLYATGVHMSQNLAGKT 141
             +Y + G+     +  KK  RRDRKYAVQP +++A GVHMSQNL GKT
Sbjct:    56 TYRKWGIEKLAYRDVKKVPRRDRKYAVQPENVFAIGVHMSQNLQGKT 102


>gi|4165838|gb|AAD08769.1|  (U86203) envelope glycoprotein [Human
            immunodeficiency virus type 1]
            Length = 73

Frame  3 hits (HSPs):       _____________________________                 
                        __________________________________________________
Database sequence:     |             |            |             |         | 73
                       0            20           40            60

  Plus Strand HSPs:

 Score = 64 (22.5 bits), Expect = 2.1, P = 0.87
 Identities = 11/41 (26%), Positives = 22/41 (53%), Frame = +3

Query:    63 GSKICSATKEFVCNWGSHVTKPCRKDHSQN*RENNVLPLSW 185
             G  I ++ K  +    S+++  CR+   Q  +E N+ P++W
Sbjct:     8 GKNISNSGKNIIVTLNSNISMTCRRPWDQEVQELNIGPMAW 48


>gi|4165840|gb|AAD08770.1|  (U86204) envelope glycoprotein [Human
            immunodeficiency virus type 1]
            Length = 73

Frame  3 hits (HSPs):       _____________________________                 
                        __________________________________________________
Database sequence:     |             |            |             |         | 73
                       0            20           40            60

  Plus Strand HSPs:

 Score = 61 (21.5 bits), Expect = 4.6, P = 0.99
 Identities = 10/41 (24%), Positives = 22/41 (53%), Frame = +3

Query:    63 GSKICSATKEFVCNWGSHVTKPCRKDHSQN*RENNVLPLSW 185
             G  I ++ +  +    S+++  CR+   Q  +E N+ P++W
Sbjct:     8 GKNISNSGRNIIVTLNSNISMTCRRPWDQEVQELNIGPMAW 48


>gi|4165844|gb|AAD08772.1|  (U86206) envelope glycoprotein [Human
            immunodeficiency virus type 1]
            Length = 73

Frame  3 hits (HSPs):       _____________________________                 
                        __________________________________________________
Database sequence:     |             |            |             |         | 73
                       0            20           40            60

  Plus Strand HSPs:

 Score = 61 (21.5 bits), Expect = 4.6, P = 0.99
 Identities = 10/41 (24%), Positives = 22/41 (53%), Frame = +3

Query:    63 GSKICSATKEFVCNWGSHVTKPCRKDHSQN*RENNVLPLSW 185
             G  I ++ +  +    S+++  CR+   Q  +E N+ P++W
Sbjct:     8 GKNISNSGRNIIVTLNSNISMTCRRPWDQEVQELNIGPMAW 48


>gi|4165848|gb|AAD08774.1|  (U86208) envelope glycoprotein [Human
            immunodeficiency virus type 1]
            Length = 73

Frame  3 hits (HSPs):       _____________________________                 
                        __________________________________________________
Database sequence:     |             |            |             |         | 73
                       0            20           40            60

  Plus Strand HSPs:

 Score = 61 (21.5 bits), Expect = 4.6, P = 0.99
 Identities = 10/41 (24%), Positives = 22/41 (53%), Frame = +3

Query:    63 GSKICSATKEFVCNWGSHVTKPCRKDHSQN*RENNVLPLSW 185
             G  I ++ +  +    S+++  CR+   Q  +E N+ P++W
Sbjct:     8 GKNISNSGRNIIVTLNSNISMTCRRPWDQKVQELNIGPMAW 48
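
The Expect and P values reported for each HSP above are related by the usual Poisson formula P = 1 - exp(-E). For very small E the two are practically equal, as in the Arabidopsis hits, while for the HIV-1 hits they diverge (Expect = 2.1 with P = 0.87, Expect = 4.6 with P = 0.99). The sketch below is an illustration under that assumption; the function name is hypothetical, and small differences are expected because the printed Expect values are rounded.

```python
import math

def p_from_expect(expect):
    """P = 1 - exp(-E), computed with expm1 to stay accurate for tiny E."""
    return -math.expm1(-expect)

p_strong = p_from_expect(1.2e-31)  # top Arabidopsis hit
p_weak_a = p_from_expect(2.1)      # first HIV-1 hit
p_weak_b = p_from_expect(4.6)      # remaining HIV-1 hits
```

Using math.expm1 rather than 1 - math.exp(-E) matters here: with ordinary floating point, 1 - exp(-1.2e-31) evaluates to 0.0.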


Parameters:
  filter=none
  matrix=BLOSUM62
  V=50
  B=50
  E=10
  gi
  H=1
  sort_by_pvalue
  echofilter

  ctxfactor=5.99

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   Std.    0   BLOSUM62                                 0.318   0.135   0.401  
   +3      0   BLOSUM62        0.318   0.135   0.401    0.350   0.152   0.579  
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   +2      0   BLOSUM62        0.318   0.135   0.401    0.353   0.156   0.545  
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   +1      0   BLOSUM62        0.318   0.135   0.401    0.338   0.148   0.478  
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -1      0   BLOSUM62        0.318   0.135   0.401    0.356   0.158   0.567  
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -2      0   BLOSUM62        0.318   0.135   0.401    0.348   0.148   0.537  
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -3      0   BLOSUM62        0.318   0.135   0.401    0.349   0.156   0.555  
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
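
The bit scores printed next to the raw scores in this report are consistent with bits = Score * Lambda / ln 2 when Lambda is the "As Used" value 0.244 from the Q=9,R=2 rows above. This is an observation from the numbers in this report, not a statement of the documented WU-BLAST formula; the sketch and its names are illustrative.

```python
import math

GAPPED_LAMBDA = 0.244  # "As Used" Lambda on the Q=9,R=2 rows

def bits(raw_score, lam=GAPPED_LAMBDA):
    """Convert a raw score to bits under the assumption stated above."""
    return raw_score * lam / math.log(2)

# (raw score, bits printed in the report) for each distinct HSP score.
reported = [(357, 125.7), (346, 121.8), (131, 46.1), (64, 22.5), (61, 21.5)]
computed = [round(bits(raw), 1) for raw, _ in reported]
```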

  Query
  Frame  MatID  Length  Eff.Length     E    S W   T  X   E2     S2
   +3      0      160       143       10.  73 3  12 22  0.12    33
                                                    30  0.11    36
   +2      0      160       143       10.  73 3  12 22  0.12    33
                                                    30  0.11    36
   +1      0      160       142       10.  73 3  12 22  0.12    33
                                                    30  0.11    36
   -1      0      160       142       10.  73 3  12 22  0.12    33
                                                    30  0.11    36
   -2      0      160       143       10.  73 3  12 22  0.12    33
                                                    30  0.11    36
   -3      0      160       143       10.  73 3  12 22  0.12    33
                                                    30  0.11    36


Statistics:

  Database:  /usr/local/dot5/sl_home/beauty/seqdb/blast/nr
    Title:  nr
    Release date:  unknown
    Posted date:  4:06 PM CST Feb 28, 2001
    Format:  BLAST
  # of letters in database:  197,782,623
  # of sequences in database:  625,274
  # of database sequences satisfying E:  8
  No. of states in DFA:  593 (58 KB)
  Total size of DFA:  186 KB (192 KB)
  Time to generate neighborhood:  0.01u 0.00s 0.01t  Elapsed: 00:00:00
  No. of threads or processors used:  6
  Search cpu time:  138.04u 1.35s 139.39t  Elapsed: 00:00:24
  Total cpu time:  138.07u 1.37s 139.44t  Elapsed: 00:00:24
  Start:  Wed Jan 23 17:19:29 2002   End:  Wed Jan 23 17:19:53 2002

Annotated Domains Database: March 14, 2000
Release Date: March 14, 2000