WU-BLAST 2.0 search of the National Center for Biotechnology Information's NR Protein Database.
BEAUTY post-processing provided by the Human Genome Sequencing Center, Baylor College of Medicine.
BEAUTY Reference:
Worley KC, Culpepper P, Wiese BA, Smith RF. BEAUTY-X: enhanced BLAST searches for DNA queries. Bioinformatics 1998;14(10):890-1.
Worley KC, Wiese BA, Smith RF. BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Res 1995 Sep;5(2):173-84.
RepeatMasker repeats found in sequence: No Repeats Found.

Reference: Gish, Warren (1994-1997). unpublished.
Gish, Warren and David J. States (1993). Identification of protein coding regions by database similarity search. Nat. Genet. 3:266-72.

Notice: statistical significance is estimated under the assumption that the equivalent of one entire reading frame in the query sequence codes for protein and that significant alignments will involve only coding reading frames.
Query= SSH8A10.SEQ(1>636) (601 letters)
Translating both strands of query sequence in all 6 reading frames

Database:  nr
           505,245 sequences; 158,518,215 total letters.

     Observed Numbers of Database Sequences Satisfying
    Various EXPECTation Thresholds (E parameter values)

        Histogram units:      = 3 Sequences     : less than 3 sequences

 EXPECTation Threshold
 (E parameter)
    |
    V   Observed Counts-->
  10000  866  184 |=============================================================
   6310  682  167 |=======================================================
   3980  515   95 |===============================
   2510  420  128 |==========================================
   1580  292  121 |========================================
   1000  171   81 |===========================
    631   90   30 |==========
    398   60   23 |=======
    251   37   14 |====
    158   23    8 |==
    100   15    2 |:
   63.1   13    4 |=
   39.8    9    2 |:
   25.1    7    0 |
   15.8    7    0 |
 >>>>>>>>>>>>>>>>>>>>>  Expect = 10.0, Observed = 7  <<<<<<<<<<<<<<<<<
   10.0    7    0 |
   6.31    7    1 |:
   3.98    6    0 |
   2.51    6    0 |
   1.58    6    0 |
   1.00    6    0 |
   0.63    6    0 |
   0.40    6    0 |
   0.25    6    0 |
   0.16    6    0 |
   0.10    6    0 |
  0.063    6    1 |:
  0.040    5    1 |:

                                                                      Smallest
                                                                        Sum
                                                         Reading High Probability
Sequences producing High-scoring Segment Pairs:            Frame Score  P(N)     N

gi|7484652|pir||T14580 SIEP1L protein precursor - beet...   +2   284  4.9e-24   1
gi|3834308|gb|AAC83024.1| (AC005679) Strong similarity...   +2   179  1.4e-16   2
gi|480297|pir||S36638 glycoprotein EP1 - carrot >gi|34...   +2   213  4.9e-16   1
gi|3834309|gb|AAC83025.1| (AC005679) Strong similarity...   +2   177  6.4e-12   1
gi|3834328|gb|AAC83044.1| (AC005679) Strong similarity...   +2   100  0.028     2
gi|3834312|gb|AAC83028.1| (AC005679) Strong similarity...   +2   102  0.052     1
gi|3757746|emb|CAA72808.1| (Y12091) BoNT protein [Clos...   +2    63  0.99      1

Locally-aligned regions (HSPs) with respect to query sequence:

[Graphic: HSP positions along the query sequence, axis 0 to 201]

Frame 3 hits:  gi|3834308, gi|3834328

Frame 2 hits:  gi|7484652, gi|3834308, gi|480297, gi|3834309, gi|3834328,
               gi|3834312, gi|3757746
  Prosite hits:  TYR_PHOSPHO_SITE    Tyrosine kinase phosphorylation site.     160..167

Frame 1 hits:  gi|3834308
  Prosite hits:  PROKAR_LIPOPROTEIN  Prokaryotic membrane lipoprotein lipid a  141..151
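The frame labels in this report can be checked from query coordinates: a plus-strand HSP whose first query letter sits at 1-based position p falls in frame ((p - 1) mod 3) + 1. The following is a minimal Python sketch of that arithmetic. It is not part of the WU-BLAST or BEAUTY output, and the function names are hypothetical. The inputs are taken from this report: the 601-letter query and the HSP start coordinates 5, 343 and 402 that appear in the alignments.

```python
def frame_of(start_nt: int) -> int:
    """Forward reading frame (1, 2 or 3) for a 1-based query start coordinate."""
    return (start_nt - 1) % 3 + 1


def frame_length(query_len: int, frame: int) -> int:
    """Number of complete codons available in forward frame 1, 2 or 3."""
    return (query_len - (frame - 1)) // 3


if __name__ == "__main__":
    # HSP start coordinates as reported in the alignments of this report.
    for start in (5, 343, 402):
        print(f"query start {start}: frame +{frame_of(start)}")
    # Translated lengths for a 601-letter query.
    for frame in (1, 2, 3):
        print(f"frame +{frame}: {frame_length(601, frame)} codons")
```

Under these assumptions the starts 5, 343 and 402 map to frames +2, +1 and +3, and the translated lengths come out as 200, 200 and 199 codons, which agrees with the Length column of the per-frame parameter table in this report.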
>gi|7484652|pir||T14580 SIEP1L protein precursor - beet
 >gi|1107526|emb|CAA61158.1| (X87931) SIEP1L protein [Beta vulgaris]
            Length = 391

[Graphic: Frame 2 HSP positions along the database sequence, axis 0 to 391]

  Plus Strand HSPs:

 Score = 284 (100.0 bits), Expect = 4.9e-24, P = 4.9e-24
 Identities = 71/174 (40%), Positives = 102/174 (58%), Frame = +2

Query:     5 YKSKNSPKPILYWFSSDWFTIQRGSLENVTFTSDPET-----FELGFDYHVANSSSGGNR 169
             YKS NSPKP+LY+   D   + + SL+ VTF+  PE      +++ F Y   + S GGN
Sbjct:   213 YKSPNSPKPLLYFSMLD---LSKSSLKEVTFSCSPENDDNYAYDITFAYQSIDGSIGGNA 269

Query:   170 ILGRPVNNSTITYLRLGIDGNIRFYTYFLDVRDGVWQVTYTLFDRDS----DESECQLPE 337
              + RP  NST++ LRLGIDGN+R +TY   V    W+ T+TLF R+S     ++ECQLPE
Sbjct:   270 EIARPKYNSTLSILRLGIDGNLRVFTYSDKVDWAAWEATFTLFARNSPYGLSDTECQLPE 329

Query:   338 RCWEIWVVLKITNVLACPLEKWTT-WLEQQLHCQAC*HPAKLIISTTT--KLKDFEHYMS 508
             RC +  +  + +  +ACP  K    W  +      C  P     S T+  KL+  +HY+S
Sbjct:   330 RCGKFGLC-EDSQCVACPTPKGLLGWSNK------CEQPKPSCGSKTSYYKLEGVDHYLS 382

Query:   509 QYLQ 520
               L+
Sbjct:   383 SILK 386

>gi|3834308|gb|AAC83024.1| (AC005679) Strong similarity to glycoprotein EP1 gb|L16983 Daucus carota and a member of S locus glycoprotein family PF|00954. EST gb|AA720110 comes from this gene. [Arabidopsis thaliana]
            Length = 443

[Graphic: Frame 3, Frame 2 and Frame 1 HSP positions along the database sequence, axis 0 to 443]

  Plus Strand HSPs:

 Score = 179 (63.0 bits), Expect = 1.4e-16, Sum P(2) = 1.4e-16
 Identities = 45/131 (34%), Positives = 71/131 (54%), Frame = +2

Query:     5 YKSKNSPKPILYWFSSDWFTIQRGSLENVTFTSDPETFELGFDYHVANSSSGG----NRI 172
             Y +  +PKPI Y +  ++FT +   L+++TF +  E  +  +  H+    SG     +
Sbjct:   206 YTTNKTPKPIGY-YEYEFFT-KIAQLQSMTFQA-VEDADTTWGLHMEGVDSGSQFNVSTF 262

Query:   173 LGRPVNNSTITYLRLGIDGNIRFYTYFLDVRDGVWQVTYTLFDRDSDES--ECQLPERCW 346
             L RP +N+T+++LRL  DGNIR ++Y        W VTYT F  D+ +   EC++PE C
Sbjct:   263 LSRPKHNATLSFLRLESDGNIRVWSYSTLATSTAWDVTYTAFTNDNTDGNDECRIPEHCL 322

Query:   347 EIWVVLKITNVLACP 391
                +  K     ACP
Sbjct:   323 GFGLCKK-GQCNACP 336

 Score = 63 (22.2 bits), Expect = 1.4e-16, Sum P(2) = 1.4e-16
 Identities = 19/49 (38%), Positives = 25/49 (51%), Frame = +1

Query:   343 LGNLGCVEDNQCFGL--SIGKMDYLVGATIALPSLLTSCQANYFHYYEIEG 489
             LG  G  +  QC      IG + +    T  +PSL  SC    FHY++IEG
Sbjct:   322 LG-FGLCKKGQCNACPSDIGLLGW--DETCKIPSL-ASCDPKTFHYFKIEG 368

 Score = 40 (14.1 bits), Expect = 3.4e-14, Sum P(2) = 3.4e-14
 Identities = 6/9 (66%), Positives = 6/9 (66%), Frame = +3

Query:   402 GLLGWSNNC 428
             GLLGW   C
Sbjct:   340 GLLGWDETC 348

>gi|480297|pir||S36638 glycoprotein EP1 - carrot
 >gi|349437|gb|AAA33136.1| (L16983) N-glycosylation sites: (130..138), (244..252), (352..360), (734..742), (748..756), (865..873) [Daucus carota]
            Length = 389

[Graphic: Frame 2 HSP and annotated domain positions along the database sequence, axis 0 to 389]

  Annotated Domains:
    DOMO DM00234:  82..117
    DOMO DM00234:  119..155

  Plus Strand HSPs:

 Score = 213 (75.0 bits), Expect = 4.9e-16, P = 4.9e-16
 Identities = 67/174 (38%), Positives = 93/174 (53%), Frame = +2

Query:     5 YKSKNSPKPILYWFSSDWFTIQRG-SLENVTFTSDPET-----FELGFDYHVANSSSGGN 166
             YK   SPKPI Y+  S +  + +  SL+NVTF  + E      F L   Y  +NS  GG
Sbjct:   207 YKPTTSPKPIRYYSFSLFTKLNKNESLQNVTFEFENENDQGFAFLLSLKYGTSNSL-GGA 265

Query:   167 RILGRPVNNSTITYLRLGIDGNIRFYTYFLDVRDGVWQVTYTLFDR-------------D 307
              IL R   N+T+++LRL IDGN++ YTY   V  G W+VTYTLF +             +
Sbjct:   266 SILNRIKYNTTLSFLRLEIDGNVKIYTYNDKVDYGAWEVTYTLFLKAPPPLFQVSLAATE 325

Query:   308 SDESECQLPERCWEIWVVLKITNVLACPLEKWTTWLEQQLHCQAC*HPAKLIISTTTKLK 487
             S+ SECQLP++C    +  + +  + CP       L     C+    P KL   ++   K
Sbjct:   326 SESSECQLPKKCGNFGLCEE-SQCVGCPTSSGPV-LAWSKTCE----PPKL---SSCGPK 376

Query:   488 DFEHY 502
             DF HY
Sbjct:   377 DF-HY 380

>gi|3834309|gb|AAC83025.1| (AC005679) Strong similarity to glycoprotein EP1 gb|L16983 Daucus carota and a member of S locus glycoprotein family PF|00954. ESTs gb|F13813, gb|T21052, gb|R30218 and gb|W43262 come from this gene. [Arabidopsis thaliana]
            Length = 441

[Graphic: Frame 2 HSP positions along the database sequence, axis 0 to 441]

  Plus Strand HSPs:

 Score = 177 (62.3 bits), Expect = 6.4e-12, P = 6.4e-12
 Identities = 48/134 (35%), Positives = 73/134 (54%), Frame = +2

Query:     5 YKSKNSPKPILYWFSSDWFTIQRGSLENVTFTS--DPETFELGFDYHVANSSSGGN--RI 172
             Y +  +PKPI Y F  ++FT +    +++TF +  D +T   G      +S S  N
Sbjct:   206 YTTNKTPKPIAY-FEYEFFT-KITQFQSMTFQAVEDSDT-TWGLVMEGVDSGSKFNVSTF 262

Query:   173 LGRPVNNSTITYLRLGIDGNIRFYTYFLDVRDGVWQVTYTLF-DRDSD-ESECQLPERCW 346
             L RP +N+T++++RL  DGNIR ++Y        W VTYT F + D+D   EC++PE C
Sbjct:   263 LSRPKHNATLSFIRLESDGNIRVWSYSTLATSTAWDVTYTAFTNADTDGNDECRIPEHCL 322

Query:   347 EIWVVLKITNVLACPLEK 400
                +  K     ACP +K
Sbjct:   323 GFGLCKK-GQCNACPSDK 339

>gi|3834328|gb|AAC83044.1| (AC005679) Strong similarity to glycoprotein EP1 gb|L16983 Daucus carota and a member of S locus glycoprotein family PF|00954. [Arabidopsis thaliana]
            Length = 455

[Graphic: Frame 3 and Frame 2 HSP positions along the database sequence, axis 0 to 455]

  Plus Strand HSPs:

 Score = 100 (35.2 bits), Expect = 0.028, Sum P(2) = 0.028
 Identities = 23/63 (36%), Positives = 35/63 (55%), Frame = +2

Query:   155 SGGNRILGRPVN-NSTITYLRLGIDGNIRFYTYFLDVRDGVWQVTYTLFDRDSDESECQL 331
             SGG  +    +N N TI+YLRLG DG+++ ++YF       W+ T+  F  +    +C L
Sbjct:   276 SGGGTLNLNKINYNGTISYLRLGSDGSLKAFSYFPAATYLEWEETFAFFS-NYFVRQCGL 334

Query:   332 PERC 343
             P  C
Sbjct:   335 PTFC 338

 Score = 41 (14.4 bits), Expect = 0.028, Sum P(2) = 0.028
 Identities = 11/28 (39%), Positives = 13/28 (46%), Frame = +3

Query:   345 GKFGLC*R*PMFWLVHWKNGLLGWSNNC 428
             G +G C R  M        GLL WS+ C
Sbjct:   339 GDYGYCDR-GMCVGCPTPKGLLAWSDKC 365

>gi|3834312|gb|AAC83028.1| (AC005679) Strong similarity to glycoprotein EP1 gb|L16983 Daucus carota and a member of S locus glycoprotein family PF|00954. ESTs gb|AA067487, gb|Z35737, gb|Z30815, gb|Z35350, gb|AA713171, gb|AI100553, gb|Z34248, gb|AA728536, gb|Z30816 and gb|Z35351>
            Length = 455

[Graphic: Frame 2 HSP positions along the database sequence, axis 0 to 455]

  Plus Strand HSPs:

 Score = 102 (35.9 bits), Expect = 0.053, P = 0.052
 Identities = 28/82 (34%), Positives = 43/82 (52%), Frame = +2

Query:   155 SGGNRILGRPVN-NSTITYLRLGIDGNIRFYTYFLDVRDGVWQVTYTLFDRDSDESECQL 331
             SGG  +    +N N TI+YLRLG DG+++ Y+YF       W+ +++ F       +C L
Sbjct:   276 SGGGTLNLNKINYNGTISYLRLGSDGSLKAYSYFPAATYLKWEESFSFFSTYFVR-QCGL 334

Query:   332 PERCWEIWVVLK-ITNVLACPLEK 400
             P  C +     + + N  ACP  K
Sbjct:   335 PSFCGDYGYCDRGMCN--ACPTPK 356

>gi|3757746|emb|CAA72808.1| (Y12091) BoNT protein [Clostridium barati]
            Length = 49

[Graphic: Frame 2 HSP positions along the database sequence, axis 0 to 49]

  Plus Strand HSPs:

 Score = 63 (22.2 bits), Expect = 4.3, P = 0.99
 Identities = 12/32 (37%), Positives = 22/32 (68%), Frame = +2

Query:   182 PVNNSTITYLRLGI--DGNIRFYTYFLDVRDGVW 277
             P+NN+TI Y+++    D N ++Y  F ++ D +W
Sbjct:    13 PINNTTILYMKMPYYEDSN-KYYKAF-EIMDNIW 44

Parameters:
  filter=none
  matrix=BLOSUM62
  V=50
  B=50
  E=10
  gi
  H=1
  sort_by_pvalue
  echofilter

  ctxfactor=5.98

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   Std.    0   BLOSUM62                                 0.318   0.135   0.401
   +3      0   BLOSUM62        0.318   0.135   0.401    0.346   0.154   0.535
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   +2      0   BLOSUM62        0.318   0.135   0.401    0.329   0.143   0.487
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   +1      0   BLOSUM62        0.318   0.135   0.401    0.370   0.167   0.710
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -1      0   BLOSUM62        0.318   0.135   0.401    0.339   0.150   0.488
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -2      0   BLOSUM62        0.318   0.135   0.401    0.349   0.152   0.529
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a
   -3      0   BLOSUM62        0.318   0.135   0.401    0.352   0.153   0.607
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a

  Query
  Frame  MatID  Length  Eff.Length     E     S  W   T   X   E2      S2
   +3      0      199       199       10.   76  3  12  22  0.093    35
                                                        31  0.10     38
   +2      0      200       199       10.   76  3  12  22  0.093    35
                                                        31  0.10     38
   +1      0      200       199       10.   76  3  12  22  0.093    35
                                                        31  0.10     38
   -1      0      200       200       10.   76  3  12  22  0.094    35
                                                        31  0.10     38
   -2      0      200       199       10.   76  3  12  22  0.093    35
                                                        31  0.10     38
   -3      0      199       199       10.   76  3  12  22  0.093    35
                                                        31  0.10     38

Statistics:

  Database:  /usr/local/dot5/sl_home/beauty/seqdb/blast/nr
    Title:  nr
    Release date:  unknown
    Posted date:  8:50 PM CDT May 27, 2000
    Format:  BLAST
  # of letters in database:  158,518,215
  # of sequences in database:  505,245
  # of database sequences satisfying E:  7
  No. of states in DFA:  598 (59 KB)
  Total size of DFA:  249 KB (256 KB)
  Time to generate neighborhood:  0.02u 0.01s 0.03t  Elapsed: 00:00:00
  No. of threads or processors used:  4
  Search cpu time:  271.55u 1.30s 272.85t  Elapsed: 00:01:10
  Total cpu time:  271.61u 1.33s 272.94t  Elapsed: 00:01:10
  Start:  Thu Feb 15 01:30:18 2001   End:  Thu Feb 15 01:31:28 2001
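
The bit scores and P-values printed with each HSP can be reproduced from other numbers in this report. The sketch below is a hedged Python illustration, not part of the WU-BLAST output, and its function names are hypothetical. It assumes two relationships inferred from the reported values rather than taken from program documentation: bits = Lambda * S / ln 2 with the "As Used" Lambda of 0.244 from the Q=9,R=2 rows, and P = 1 - exp(-E) for a single HSP.

```python
import math

# Assumption: the "As Used" Lambda printed on the Q=9,R=2 rows of the
# parameter table is the one behind the reported bit scores.
GAPPED_LAMBDA = 0.244


def bits(raw_score: float, lam: float = GAPPED_LAMBDA) -> float:
    """Convert a raw alignment score to bits as Lambda * S / ln 2."""
    return lam * raw_score / math.log(2)


def p_from_expect(expect: float) -> float:
    """Poisson probability of at least one chance hit given an expectation E."""
    return 1.0 - math.exp(-expect)


if __name__ == "__main__":
    # Raw scores reported for HSPs in this search.
    for score in (284, 213, 179, 177, 102, 100, 63):
        print(f"Score = {score} ({bits(score):.1f} bits)")
    # Expect values reported for the two weakest single-HSP hits.
    for expect in (0.053, 4.3):
        print(f"Expect = {expect}, P = {p_from_expect(expect):.2g}")
```

With these assumptions a raw score of 284 converts to 100.0 bits and a score of 63 to 22.2 bits, while Expect values of 0.053 and 4.3 give P of about 0.052 and 0.99, all of which agree with the figures printed for the corresponding hits.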
Annotated Domains Database: March 14, 2000
Release Date: March 14, 2000