WU-BLAST 2.0 search of the National Center for Biotechnology Information's NR Protein Database.
BEAUTY post-processing provided by the Human Genome Sequencing Center, Baylor College of Medicine.
BEAUTY Reference:
Worley KC, Culpepper P, Wiese BA, Smith RF. BEAUTY-X: enhanced BLAST searches for DNA queries. Bioinformatics 1998;14(10):890-1.
Worley KC, Wiese BA, Smith RF. BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Res 1995 Sep;5(2):173-84.
RepeatMasker repeats found in sequence: No Repeats Found.
Reference: Gish, Warren (1994-1997). unpublished.
Gish, Warren and David J. States (1993). Identification of protein coding regions by database similarity search. Nat. Genet. 3:266-72.
Notice: statistical significance is estimated under the assumption that the equivalent of one entire reading frame in the query sequence codes for protein and that significant alignments will involve only coding reading frames.
Query= 'E02H07_B12_04.ab1' (829 letters)
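The query is 829 nucleotides long and is translated in all six reading frames. As a minimal sketch (the function name `frame_lengths` is hypothetical, and it assumes each frame simply drops 0, 1, or 2 leading bases and keeps only complete codons), the translated length of each frame can be worked out as follows. The result is consistent with the Length column reported for the six frames in the Parameters section of this report (276, 276, 275 on each strand).

```python
def frame_lengths(nucleotides: int) -> dict[str, int]:
    """Return the number of complete codons in each of the six reading frames.

    Assumption: frame +k / -k skips k-1 bases on its strand and discards
    any trailing partial codon.
    """
    lengths = {}
    for offset in range(3):
        codons = (nucleotides - offset) // 3
        lengths[f"+{offset + 1}"] = codons  # forward strand
        lengths[f"-{offset + 1}"] = codons  # reverse strand, same arithmetic
    return lengths


if __name__ == "__main__":
    for frame, length in frame_lengths(829).items():
        print(frame, length)
```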
Translating both strands of query sequence in all 6 reading frames
Database: nr
          625,274 sequences; 197,782,623 total letters.

Observed Numbers of Database Sequences Satisfying
Various EXPECTation Thresholds (E parameter values)

Histogram units:  = 6 Sequences  : less than 6 sequences

 EXPECTation Threshold
 (E parameter)
    |
    V   Observed Counts-->
  10000    1242   353 |==========================================================
   6310     889   186 |===============================
   3980     703   195 |================================
   2510     508   147 |========================
   1580     361   123 |====================
   1000     238    79 |=============
    631     159    41 |======
    398     118    37 |======
    251      81    26 |====
    158      55    17 |==
    100      38    12 |==
   63.1      26     7 |=
   39.8      19     2 |:
   25.1      17     5 |:
   15.8      12     3 |:
 >>>>>>>>>>>>>>>>>>>>>  Expect = 10.0, Observed = 9  <<<<<<<<<<<<<<<<<
   10.0       9     2 |:
   6.31       7     1 |:
   3.98       6     1 |:
   2.51       5     0 |
   1.58       5     1 |:
   1.00       4     2 |:

                                                                      Smallest
                                                                         Sum
                                                       Reading  High Probability
Sequences producing High-scoring Segment Pairs:          Frame Score  P(N)     N

gi|11357983|pir||T48025 hypothetical protein T12C14.30...   +1   745  8.5e-73  1
gi|6498462|dbj|BAA87851.1| (AP000816) hypothetical pro...   +1   713  2.1e-69  1
gi|7517337|pir||B72581 hypothetical protein APES063 - ...   +3    50  0.55     2
gi|6691188|gb|AAF24526.1|AC007534_7 (AC007534) F7F22.1...   +1    88  0.56     1
gi|9294244|dbj|BAB02146.1| (AP000411) copia retroeleme...   +1    95  0.68     1
gi|1030731|emb|CAA32198.1| (X14037) polyprotein [Droso...   +1    94  0.97     1
gi|85056|pir||S02021 micropia polyprotein - fruit fly ...   +1    94  0.99     1
gi|11358885|pir||T48160 transcription factor GT-3a - A...   +1    85  0.9993   1
gi|6683623|dbj|BAA89271.1| (AB025309) Gag [Alternaria ...   +1    86  0.9997   1
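The P(N) values in the hit list track the Expect values reported for each alignment. A minimal sketch of that relationship, assuming the usual Poisson conversion P = 1 - exp(-E) (the function name `p_from_expect` is hypothetical, and the report itself does not state the formula):

```python
import math


def p_from_expect(expect: float) -> float:
    """Convert an Expect value E to a probability P = 1 - exp(-E).

    expm1 keeps precision for very small E, where P is essentially E.
    """
    return -math.expm1(-expect)


if __name__ == "__main__":
    # Expect values taken from the alignments in this report.
    for e in (8.5e-73, 0.80, 3.4, 8.0):
        print(f"E = {e:g}  ->  P = {p_from_expect(e):.4g}")
```

Under this assumption, E = 0.80 gives P of about 0.55 and E = 8.0 gives about 0.9997, in line with the values listed. Small differences for other hits are expected because the report prints Expect to only two significant digits.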
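Each alignment in this report gives Identities and Positives as a count over the alignment length followed by a percentage. Judging from the numbers in the report, the percentage appears to be truncated rather than rounded (for example 141/196 is 71.9% but is shown as 71%). A small sketch of that arithmetic, with the hypothetical helper name `truncated_percent`:

```python
def truncated_percent(matches: int, aligned: int) -> int:
    """Return matches/aligned as a whole percentage, discarding the fraction.

    Assumption: truncation (not rounding), inferred from the reported values.
    """
    return (100 * matches) // aligned


if __name__ == "__main__":
    # (matches, alignment length) pairs taken from the first alignment.
    print(truncated_percent(141, 196))  # Identities
    print(truncated_percent(163, 196))  # Positives
```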
>gi|11357983|pir||T48025 hypothetical protein T12C14.30 - Arabidopsis thaliana
 >gi|7340704|emb|CAB82947.1| (AL162507) putative protein [Arabidopsis thaliana]
        Length = 479

  Plus Strand HSPs:

 Score = 745 (262.3 bits), Expect = 8.5e-73, P = 8.5e-73
 Identities = 141/196 (71%), Positives = 163/196 (83%), Frame = +1

Query:    94 ANRPDPDIDDDFRELYKEYTGPLGTATTN-MQERAKSNK-RSNAGSDEEEEAR-DPNAVP 264
             + R DP++DDDF E+YKEYTGP T N +Q++ K K RS DEEEE DPN+VP
Sbjct:     3 STRSDPELDDDFSEIYKEYTGPASAVTNNNIQDKDKPVKQRSEERCDEEEEQLPDPNSVP 62

Query:   265 TDFTSREAKVWEAKSKATERNWKKRKEEEMICKLCGESGHFTQGCPSTLGANRKSQDFFE 444
             TDFTSREAKVWEAKSKATERNWKKRKEEEMICK+CGESGHFTQGCPSTLGANRKSQ+FFE
Sbjct:    63 TDFTSREAKVWEAKSKATERNWKKRKEEEMICKICGESGHFTQGCPSTLGANRKSQEFFE 122

Query:   445 RIPARDKNVRALFTEKVLSKIEKDVGCKIKMDEKFIIVSGKDRLILAKGVDAGHKIREEG 624
             R+PARD NVR LFTEKV+ IE++ CKIK+DEKFIIVSGKDRLIL KGVDA HK++E+G
Sbjct:   123 RVPARDNNVRVLFTEKVMESIERETSCKIKLDEKFIIVSGKDRLILRKGVDAVHKVKEDG 182

Query:   625 DQRGSSSSQMTQSRSP 672
             + + SS S ++SRSP
Sbjct:   183 EMKSSSVSHRSRSRSP 198

>gi|6498462|dbj|BAA87851.1| (AP000816) hypothetical protein [Oryza sativa]
 >gi|7106535|dbj|BAA92220.1| (AP001278) hypothetical protein [Oryza sativa]
        Length = 460

  Plus Strand HSPs:

 Score = 713 (251.0 bits), Expect = 2.1e-69, P = 2.1e-69
 Identities = 139/199 (69%), Positives = 160/199 (80%), Frame = +1

Query:    91 MANRPDPDIDDD-FRELY-KEYTGPLGTATTNMQERAKSNKR--SNAGSDEEEEARDPNA 258
             MA P P+IDD+ F E+Y K Y+GP+ T T N+ R KR SDEE+ DPNA
Sbjct:     1 MAREPSPEIDDELFNEVYGKAYSGPVATTTNNVTPRVNDEKRPLEREKSDEEDGPPDPNA 60

Query:   259 VPTDFTSREAKVWEAKSKATERNWKKRKEEEMICKLCGESGHFTQGCPSTLGANRKSQDF 438
             VPTDFTSREAKVWEAK+KATERNWKKRKEEEMICK+CGESGHFTQGCPSTLGANR++ DF
Sbjct:    61 VPTDFTSREAKVWEAKAKATERNWKKRKEEEMICKICGESGHFTQGCPSTLGANRRNADF 120

Query:   439 FERIPARDKNVRALFTEKVLSKIEKDVGCKIKMDEKFIIVSGKDRLILAKGVDAGHKIRE 618
             FER+PARDK VR LFTE+ +S+IEKDVGCKIKMDEKF+ VSGKDRLILAKGVDA HKI +
Sbjct:   121 FERVPARDKQVRDLFTERTISQIEKDVGCKIKMDEKFLFVSGKDRLILAKGVDAVHKIIQ 180

Query:   619 EGDQRGSSSS-QMTQSRSP 672
             EG + +SSS + + RSP
Sbjct:   181 EGKGKNTSSSPKRDRLRSP 199

>gi|7517337|pir||B72581 hypothetical protein APES063 - Aeropyrum pernix (strain K1)
 >gi|5105622|dbj|BAA80935.1| (AP000062) 64aa long hypothetical protein [Aeropyrum pernix]
        Length = 64

  Plus Strand HSPs:

 Score = 50 (17.6 bits), Expect = 0.80, Sum P(2) = 0.55
 Identities = 12/29 (41%), Positives = 16/29 (55%), Frame = +3

Query:   678 SPVSARFHRSEPKGLILTRNTSRFTKVGR 764
             S V+AR HR+ L+L R T+ GR
Sbjct:    35 SLVTARHHRAVNTSLLLAHTARRSTRGGR 63

 Score = 40 (14.1 bits), Expect = 0.80, Sum P(2) = 0.55
 Identities = 8/24 (33%), Positives = 15/24 (62%), Frame = +2

Query:   503 RLKRMLAAKLRWMRSLLLSVVRID 574
             R ++ L +L W R L L++V ++
Sbjct:    11 RGRQSLKPRLSWDRGLQLALVNVE 34

>gi|6691188|gb|AAF24526.1|AC007534_7 (AC007534) F7F22.12 [Arabidopsis thaliana]
        Length = 169

  Plus Strand HSPs:

 Score = 88 (31.0 bits), Expect = 0.81, P = 0.56
 Identities = 24/93 (25%), Positives = 42/93 (45%), Frame = +1

Query:   136 LYKEYTGPLGTAT-TNMQERAKSNKRSNAGSDEEEEARDPNAVPTD-FTSREAKVWEAKS 309
             L Y G + T +N +E+ + N A D+E E N + + +R V + +
Sbjct:    41 LPSRYDGLVETMKYSNSREKLRLNDVMVAARDKEREMSQNNRLIAEGHYARRRPVGKNNN 100

Query:   310 ---KATERNWKKRKEEEMICKLCGESGHFTQGC 399
             K R+W K + + +C +CG+ HF + C
Sbjct:   101 QGNKGKNRSWSKSADGKRVCWICGKEKHFNEQC 133

>gi|9294244|dbj|BAB02146.1| (AP000411) copia retroelement pol polyprotein-like [Arabidopsis thaliana]
        Length = 526

  Plus Strand HSPs:

 Score = 95 (33.4 bits), Expect = 1.1, P = 0.68
 Identities = 52/217 (23%), Positives = 86/217 (39%), Frame = +1

Query:    43 SSHPHRVQFS*DIY-FLMANRPDPDID-DDFRELYKEYTGPLGTATTN--MQERAKSNKR 210
             +S P+R+ Y + M + D + DDF +L + +G T ++E K +K
Sbjct:   122 TSLPNRIYLHLKFYTYKMTDSKSIDGNVDDFLKLVTDLNN-IGVNVTKERIKESGKLSKT 180

Query:   211 SNAGSDEEEEARDPNAVPTDFTSREAKVWEAKSKATERNWKKR---KEEEMICKLCGESG 381
             + G E R + F + K W +SK+ R+ K R + C +C G
Sbjct:   181 QSEGLYVETRGR----LEKRFDKGKGKPWRGRSKSKGRS-KSRPNYNKNNNGCFICRREG 235

Query:   382 HFTQGCPSTLGANRKSQDFFERIPARDKNVRALFTEKVLSKIEK--DVGCKIKMDEKFII 555
             H+ + CP +N+ S I K L T +K E D GC F I
Sbjct:   236 HWKRECPEK-SSNKPSSS--ANIAVEPKQPLVLTTSPQYTKEESVVDSGCS------FHI 286

Query:   556 VSGKDRLILAKGVDAGHKIREEGDQRGSSSSQMTQSRSPEEVLLVL 693
             KD + D G + R + +P++ +++L
Sbjct:   287 TPNKDSPFGLQEFDGGKVLMGNMTHREVKGIGKIKILNPDDYVVIL 332

>gi|1030731|emb|CAA32198.1| (X14037) polyprotein [Drosophila melanogaster]
        Length = 1053

  Plus Strand HSPs:

 Score = 94 (33.1 bits), Expect = 3.4, P = 0.97
 Identities = 38/146 (26%), Positives = 66/146 (45%), Frame = +1

Query:   187 ERAKSNKRSNAGSDEEEEARDPNAVPTDFTSREAK-VWEAKSKATE-RNWKKRKEEEMI- 357
             ++ + + N G D++ P V F S+ + + E +SK + R K ++E+ +
Sbjct:   187 DKKRHARDDNLGPDQKNRKASP--VVCHFCSKPGRRIAECRSKMRQDRRAKPQREKSNVT 244

Query:   358 CKLCGESGHFTQGCPSTLGANRKSQDFFERIPARDKNVR----ALFTEKVLSKIEKDVG- 522
             C CG+ GHF+ CP G K QD ++ V +L + I D G
Sbjct:   245 CYRCGQPGHFSNQCPKN-GTAAK-QDVTQQKTVNQCCVTEPKGSLHQRGEIYPICFDSGA 302

Query:   523 -CKIKMDEKFIIVSGK--DRLILAKGVDAG 603
             C + D+ +SGK + ++ KG+ G
Sbjct:   303 ECSLIKDDISSKLSGKRINNTVMIKGIGGG 332

>gi|85056|pir||S02021 micropia polyprotein - fruit fly (Drosophila melanogaster) (fragment)
        Length = 1291

  Plus Strand HSPs:

 Score = 94 (33.1 bits), Expect = 4.2, P = 0.99
 Identities = 38/146 (26%), Positives = 66/146 (45%), Frame = +1

Query:   187 ERAKSNKRSNAGSDEEEEARDPNAVPTDFTSREAK-VWEAKSKATE-RNWKKRKEEEMI- 357
             ++ + + N G D++ P V F S+ + + E +SK + R K ++E+ +
Sbjct:   187 DKKRHARDDNLGPDQKNRKASP--VVCHFCSKPGRRIAECRSKMRQDRRAKPQREKSNVT 244

Query:   358 CKLCGESGHFTQGCPSTLGANRKSQDFFERIPARDKNVR----ALFTEKVLSKIEKDVG- 522
             C CG+ GHF+ CP G K QD ++ V +L + I D G
Sbjct:   245 CYRCGQPGHFSNQCPKN-GTAAK-QDVTQQKTVNQCCVTEPKGSLHQRGEIYPICFDSGA 302

Query:   523 -CKIKMDEKFIIVSGK--DRLILAKGVDAG 603
             C + D+ +SGK + ++ KG+ G
Sbjct:   303 ECSLIKDDISSKLSGKRINNTVMIKGIGGG 332

>gi|11358885|pir||T48160 transcription factor GT-3a - Arabidopsis thaliana
 >gi|6573264|gb|AAF17610.1|AF206715_1 (AF206715) transcription factor GT-3a [Arabidopsis thaliana]
 >gi|7320716|emb|CAB81921.1| (AL161746) transcription factor GT-3a [Arabidopsis thaliana]
        Length = 323

  Plus Strand HSPs:

 Score = 85 (29.9 bits), Expect = 7.3, P = 1.0
 Identities = 33/118 (27%), Positives = 53/118 (44%), Frame = +1

Query:    34 SSSSSHPHRVQFS*DIYFLMANRPDPDIDDDFRELY---KEYTGPLGTAT-TNMQERAKS 201
             S+SS H QFS D + P+ DI+++ L K T + T+T TN ++RAK
Sbjct:   152 STSSKRKHH-QFSSDDEEEEVDEPNQDINEELLSLVETQKRETEVITTSTSTNPRKRAKK 210

Query:   202 NKRSNAGSDEEEEARDPNAVPTDFTSREAKV-------WEAKS---KATERNWKKRKEE 348
             K +G+ E + +F + K+ WE K + E+ W++R E
Sbjct:   211 GKGVASGTKAETAGNTLKDILEEFMRQTVKMEKEWRDAWEMKEIEREKREKEWRRRMAE 269

>gi|6683623|dbj|BAA89271.1| (AB025309) Gag [Alternaria alternata]
        Length = 406

  Plus Strand HSPs:

 Score = 86 (30.3 bits), Expect = 8.0, P = 1.0
 Identities = 24/74 (32%), Positives = 32/74 (43%), Frame = +1

Query:   202 NKRSNAGSDEEEEARDPNAVPTDFTSREAKVWEAKSKATERNWK-KRKEEEMICKLCGES 378
             N+RS A + P AV + S EA WE + R + K K + C CG+
Sbjct:   307 NQRSTAHDGAQNH---PRAVQRE-ASPEAMDWEPSKVSQARESRVKTKRAPLTCYSCGKP 362

Query:   379 GHFTQGCPSTLGANR 423
             GH + C ST R
Sbjct:   363 GHIARDCQSTTRVRR 377

Parameters:
  filter=none
  matrix=BLOSUM62
  V=50
  B=50
  E=10
  gi
  H=1
  sort_by_pvalue
  echofilter

  ctxfactor=5.99

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   Std.    0   BLOSUM62                                 0.318   0.135   0.401
   +3      0   BLOSUM62        0.318   0.135   0.401    0.367   0.169   0.676
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   +2      0   BLOSUM62        0.318   0.135   0.401    0.348   0.153   0.496
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   +1      0   BLOSUM62        0.318   0.135   0.401    0.324   0.139   0.415
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -1      0   BLOSUM62        0.318   0.135   0.401    0.359   0.164   0.630
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -2      0   BLOSUM62        0.318   0.135   0.401    0.357   0.161   0.552
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -3      0   BLOSUM62        0.318   0.135   0.401    0.340   0.147   0.469
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a

  Query
  Frame  MatID  Length  Eff.Length    E     S   W   T   X    E2     S2
   +3      0      275      275       10.    79  3  12  22  0.097    36
                                                       33  0.12     39
   +2      0      276      276       10.    79  3  12  22  0.098    36
                                                       33  0.12     39
   +1      0      276      276       10.    79  3  12  22  0.098    36
                                                       33  0.12     39
   -1      0      276      276       10.    79  3  12  22  0.098    36
                                                       33  0.12     39
   -2      0      276      276       10.    79  3  12  22  0.098    36
                                                       33  0.12     39
   -3      0      275      275       10.    79  3  12  22  0.097    36
                                                       33  0.12     39

Statistics:

  Database:  /usr/local/dot5/sl_home/beauty/seqdb/blast/nr
    Title:  nr
    Release date:  unknown
    Posted date:  4:06 PM CST Feb 28, 2001
    Format:  BLAST
  # of letters in database:  197,782,623
  # of sequences in database:  625,274
  # of database sequences satisfying E:  9
  No. of states in DFA:  595 (59 KB)
  Total size of DFA:  257 KB (320 KB)
  Time to generate neighborhood:  0.02u 0.00s 0.02t  Elapsed: 00:00:00
  No. of threads or processors used:  6
  Search cpu time:  307.93u 0.98s 308.91t  Elapsed: 00:02:21
  Total cpu time:  307.98u 0.98s 308.96t  Elapsed: 00:02:21
  Start:  Wed Jan 16 19:57:04 2002   End:  Wed Jan 16 19:59:25 2002
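The bit scores shown next to each raw score in the alignments can be related to the Lambda values in the Parameters section. The sketch below assumes bits = Lambda * S / ln 2 with the gapped Lambda of 0.244 (the Q=9,R=2 rows). That formula is inferred from the numbers in this report rather than stated by it, and the names `GAPPED_LAMBDA` and `bit_score` are hypothetical.

```python
import math

# Lambda from the Q=9,R=2 rows of the Parameters table (assumed to be
# the value used for the bit scores printed in the alignments).
GAPPED_LAMBDA = 0.244


def bit_score(raw_score: int, lam: float = GAPPED_LAMBDA) -> float:
    """Convert a raw alignment score to bits as lam * S / ln(2)."""
    return lam * raw_score / math.log(2)


if __name__ == "__main__":
    # Raw scores taken from the alignments in this report.
    for raw in (745, 713, 50):
        print(f"Score = {raw} ({bit_score(raw):.1f} bits)")
```

With these assumptions, raw scores of 745, 713 and 50 come out at about 262.3, 251.0 and 17.6 bits, which agrees with the values printed in the alignments.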
Annotated Domains Database: March 14, 2000
Release Date: March 14, 2000