src/algo/blast/core/pattern.c File Reference


Detailed Description

Functions for finding pattern matches in sequence.

The following functions are defined here.

See also:
phi_lookup.h
 SPHIQueryInfoNew, SPHIQueryInfoFree, SPHIQueryInfoCopy - life cycle functions
 for the SPHIQueryInfo structure for saving pattern occurrences in query.

 Main API function to find and save pattern occurrences in query, and functions 
 called from it:

 PHIGetPatternOccurrences
     FindPatternHits
         if ( pattern fits into a single word)
             s_FindHitsShortHead
         else if ( pattern fits into several words )
             s_FindHitsLong
         else if ( pattern contains parts longer than a word )
             s_FindHitsVeryLong
                 calls s_FindHitsShortHead for every word and extends them

 For pattern occurrences in subject (database), 
 FindPatternHits is called from PHIBlastScanSubject.
 

Definition in file pattern.c.

#include <algo/blast/core/pattern.h>
#include "pattern_priv.h"

Functions

void _PHIGetRightOneBits (Int4 s, Int4 mask, Int4 *rightOne, Int4 *rightMaskOnly)
 Looks for 1 bits in the same position of s and mask. Let R be the rightmost position where s and mask both have a 1.
static Int4 s_LenOf (Int4 s, Int4 mask)
 Looks for 1 bits in the same position of s and mask. Let R be the rightmost position where s and mask both have a 1.
Int4 _PHIBlastFindHitsShort (Int4 *hitArray, const Uint1 *seq, Int4 len1, const SPHIPatternSearchBlk *pattern_blk)
 Routine to find hits of pattern to sequence when sequence is proteins.
static Int4 s_FindHitsShortDNA (Int4 *hitArray, const Uint1 *seq, Int4 pos, Int4 len, const SPHIPatternSearchBlk *pattern_blk)
 Find hits when sequence is DNA and pattern is short; returns twice the number of hits.
static Int4 s_FindHitsShortHead (Int4 *hitArray, const Uint1 *seq, Int4 start, Int4 len, Uint1 is_dna, const SPHIPatternSearchBlk *pattern_blk)
 Top level routine to find hits when pattern has a short description.
void _PHIPatternWordsLeftShift (Int4 *a, Uint1 b, Int4 numWords)
 Shift each word in the array left by 1 bit and add bit b.
void _PHIPatternWordsBitwiseOr (Int4 *a, Int4 *b, Int4 numWords)
 Do a word-by-word bit-wise or of two integer arrays and put the result back in the first array.
Int4 _PHIPatternWordsBitwiseAnd (Int4 *result, Int4 *a, Int4 *b, Int4 numWords)
 Do a word-by-word bit-wise and of two integer arrays and put the result in a new array.
static Int4 s_LenOfL (Int4 *s, Int4 *mask, Int4 numWords)
 Returns the difference between the offset F of a first 1-bit in a word sequence and the first offset G < F of a 1-bit in the pattern mask.
static Int4 s_FindHitsLong (Int4 *hitArray, const Uint1 *seq, Int4 len1, const SPHIPatternSearchBlk *pattern_blk)
 Finds places where pattern matches seq and returns them as pairs of positions in consecutive entries of hitArray; similar to _PHIBlastFindHitsShort.
static Int4 s_FindHitsVeryLong (Int4 *hitArray, const Uint1 *seq, Int4 len, Boolean is_dna, const SPHIPatternSearchBlk *pattern_blk)
 Find matches when pattern is very long.
Int4 FindPatternHits (Int4 *hitArray, const Uint1 *seq, Int4 len, Boolean is_dna, const SPHIPatternSearchBlk *pattern_blk)
 Find the places where the pattern matches seq; 3 different methods are used depending on the length of the pattern.
SPHIQueryInfo * SPHIQueryInfoNew ()
 Allocates the pattern occurrences structure.
SPHIQueryInfo * SPHIQueryInfoFree (SPHIQueryInfo *pat_info)
 Frees the pattern information structure.
SPHIQueryInfo * SPHIQueryInfoCopy (const SPHIQueryInfo *pat_info)
 Copies the SPHIQueryInfo structure.
static Int2 s_PHIBlastAddPatternHit (SPHIQueryInfo *pattern_info, Int4 offset, Int4 length)
 Adds a new pattern hit to the PHI BLAST pseudo lookup table.
Int4 PHIGetPatternOccurrences (const SPHIPatternSearchBlk *pattern_blk, const BLAST_SequenceBlk *query, const BlastSeqLoc *location, Boolean is_dna, BlastQueryInfo *query_info)
 Finds all pattern hits in a given query and saves them in the previously allocated SPHIQueryInfo structure.

Variables

static char const rcsid []


Function Documentation

Int4 _PHIBlastFindHitsShort (Int4 *hitArray, const Uint1 *seq, Int4 len1, const SPHIPatternSearchBlk *pattern_blk)
 

Routine to find hits of pattern to sequence when sequence is proteins.

Definition at line 109 of file pattern.c.

References SShortPatternItems::match_mask, SPHIPatternSearchBlk::one_word_items, PHI_MAX_HIT, s_LenOf(), and SShortPatternItems::whichPositionPtr.

Referenced by s_FindHitsShortHead(), and s_PHIGetExtraLongPattern().

void _PHIGetRightOneBits (Int4 s, Int4 mask, Int4 *rightOne, Int4 *rightMaskOnly)
 

Looks for 1 bits in the same position of s and mask. Let R be the rightmost position where s and mask both have a 1.

Let L < R be the rightmost position where mask has a 1, if any, or -1 otherwise.

Parameters:
s Number to check bits in [in]
mask Mask to apply [in]
rightOne The rightmost position where s and mask both have a 1 [out]
rightMaskOnly The rightmost position < rightOne, where mask has a 1, if any, or -1 otherwise [out]

Definition at line 66 of file pattern.c.

References PHI_BITS_PACKED_PER_WORD.

Referenced by s_LenOf().
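
The bit scan described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in pattern.c: the names sketch_right_one_bits, sketch_len_of and SKETCH_BITS_PER_WORD are hypothetical, the word width of 30 is assumed (this page names PHI_BITS_PACKED_PER_WORD but does not give its value), bit 0 is taken to be the least significant bit, and L is taken to be the last mask bit seen before reaching R.

```c
#include <stdint.h>

/* Assumed word width for this sketch only. */
#define SKETCH_BITS_PER_WORD 30

/* Hypothetical stand-in for _PHIGetRightOneBits.
   Scans upward from bit 0 (least significant bit). */
void sketch_right_one_bits(int32_t s, int32_t mask,
                           int32_t *right_one, int32_t *right_mask_only)
{
    int32_t both = s & mask;     /* bits set in both s and mask */
    int32_t last_mask_bit = -1;  /* last mask bit seen before R, or -1 */
    int32_t i;

    for (i = 0; i < SKETCH_BITS_PER_WORD; i++) {
        if ((both >> i) & 1)
            break;               /* i is R */
        if ((mask >> i) & 1)
            last_mask_bit = i;   /* candidate for L */
    }
    /* Assumption: report R = 0 when s and mask share no 1 bit. */
    if (i == SKETCH_BITS_PER_WORD)
        i = 0;

    *right_one = i;
    *right_mask_only = last_mask_bit;
}

/* Hypothetical stand-in for s_LenOf: returns R - L. */
int32_t sketch_len_of(int32_t s, int32_t mask)
{
    int32_t r, l;
    sketch_right_one_bits(s, mask, &r, &l);
    return r - l;
}
```

With s = 40 (bits 3 and 5) and mask = 42 (bits 1, 3 and 5), the scan stops at R = 3 after recording L = 1, so the length is 2.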

Int4 _PHIPatternWordsBitwiseAnd (Int4 *result, Int4 *a, Int4 *b, Int4 numWords)
 

Do a word-by-word bit-wise and of two integer arrays and put the result in a new array.

Parameters:
result Result of the operation [out]
a First array [in]
b Second array [in]
numWords Size of the two input arrays [in]
Returns:
1 if there are any non-zero words, otherwise 0.

Definition at line 270 of file pattern.c.
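
A minimal sketch of the word-by-word AND with its return flag, assuming plain int32_t arrays; sketch_words_and is a hypothetical name, not the function defined in pattern.c.

```c
#include <stdint.h>

/* Hypothetical stand-in for _PHIPatternWordsBitwiseAnd.
   Writes a[i] & b[i] into result[i] for every word and returns 1 if
   any resulting word is non-zero, otherwise 0. */
int32_t sketch_words_and(int32_t *result, const int32_t *a,
                         const int32_t *b, int32_t num_words)
{
    int32_t any_nonzero = 0;
    int32_t i;

    for (i = 0; i < num_words; i++) {
        result[i] = a[i] & b[i];
        if (result[i] != 0)
            any_nonzero = 1;
    }
    return any_nonzero;
}
```

The return flag lets a caller test whether two multi-word bit sets intersect without rescanning the result array.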

void _PHIPatternWordsBitwiseOr (Int4 *a, Int4 *b, Int4 numWords)
 

Do a word-by-word bit-wise or of two integer arrays and put the result back in the first array.

Parameters:
a First array [in] [out]
b Second array [in]
numWords Number of words in a and b [in]

Definition at line 262 of file pattern.c.

void _PHIPatternWordsLeftShift (Int4 *a, Uint1 b, Int4 numWords)
 

Shift each word in the array left by 1 bit and add bit b.

If the new value is bigger than an overflow threshold, then subtract the overflow threshold.

Parameters:
a Array of integers, representing words in a pattern [in] [out]
b bit to add [in]
numWords Number of words to process [in]

Definition at line 241 of file pattern.c.

References PHI_BITS_PACKED_PER_WORD.
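
The shift-and-wrap step can be sketched as below. Several details are assumptions rather than facts from this page: the word width of 30, an overflow threshold of 1 << 30, a "greater than or equal" comparison against that threshold, and the carry of the overflowed bit into the next word. The names sketch_words_left_shift and SKETCH_BITS_PER_WORD are hypothetical.

```c
#include <stdint.h>

/* Assumed word width for this sketch only. */
#define SKETCH_BITS_PER_WORD 30

/* Hypothetical stand-in for _PHIPatternWordsLeftShift.
   Each word is expected to hold a value in [0, 2^30). */
void sketch_words_left_shift(int32_t *a, uint8_t b, int32_t num_words)
{
    const int32_t overflow = (int32_t)1 << SKETCH_BITS_PER_WORD;
    int32_t i;

    for (i = 0; i < num_words; i++) {
        int32_t x = (a[i] << 1) + b;  /* shift left, add incoming bit */
        if (x >= overflow) {
            a[i] = x - overflow;      /* wrap back below the threshold */
            b = 1;                    /* assumption: carry into next word */
        } else {
            a[i] = x;
            b = 0;
        }
    }
}
```

Under these assumptions the array behaves like one long bit string stored in 30-bit chunks: shifting {2^29, 0} with b = 0 yields {0, 1}.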

Int4 FindPatternHits (Int4 *hitArray, const Uint1 *seq, Int4 len, Boolean is_dna, const SPHIPatternSearchBlk *patternSearch)
 

Find the places where the pattern matches seq; 3 different methods are used depending on the length of the pattern.

Parameters:
hitArray Stores the results as pairs of positions in consecutive entries [out]
seq Sequence [in]
len Length of the sequence [in]
is_dna Indicates whether seq is made of DNA or protein letters [in]
patternSearch Pattern information [in]
Returns:
Twice the number of hits (length of hitArray filled in)

Definition at line 473 of file pattern.c.

References eMultiWord, eOneWord, SPHIPatternSearchBlk::flagPatternLength, s_FindHitsLong(), s_FindHitsShortHead(), and s_FindHitsVeryLong().

Referenced by PHIBlastScanSubject(), and PHIGetPatternOccurrences().
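
Because the return value is twice the number of hits, a caller walks hitArray two entries at a time. The sketch below shows one way to unpack such a flat array. The type SketchHit and the function sketch_unpack_hits are hypothetical, and since this page does not say which entry of a pair comes first, the sketch orders each pair with a comparison instead of assuming a layout.

```c
#include <stdint.h>

/* Hypothetical container for one unpacked hit. */
typedef struct {
    int32_t first;  /* smaller position of the pair */
    int32_t last;   /* larger position of the pair */
} SketchHit;

/* Unpacks a flat array holding twice_num_hits entries (two per hit)
   into out[], which must have room for twice_num_hits / 2 items.
   Returns the number of hits written. */
int32_t sketch_unpack_hits(const int32_t *hit_array, int32_t twice_num_hits,
                           SketchHit *out)
{
    int32_t n = 0;
    int32_t i;

    for (i = 0; i + 1 < twice_num_hits; i += 2) {
        int32_t p = hit_array[i];
        int32_t q = hit_array[i + 1];
        out[n].first = (p < q) ? p : q;
        out[n].last  = (p < q) ? q : p;
        n++;
    }
    return n;
}
```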

Int4 PHIGetPatternOccurrences (const SPHIPatternSearchBlk *pattern_blk, const BLAST_SequenceBlk *query, const BlastSeqLoc *location, Boolean is_dna, BlastQueryInfo *query_info)
 

Finds all pattern hits in a given query and saves them in the previously allocated SPHIQueryInfo structure.

Parameters:
pattern_blk Structure containing pattern structure. [in]
query Query sequence(s) [in]
location Segments in the query sequence where to look for pattern [in]
is_dna Is this a nucleotide sequence? [in]
query_info Used to store pattern occurrences and get length of query (for error checking) [out]
Returns:
A negative number indicates an unknown error; INT4_MAX indicates that the pattern (illegally) covered the entire query; any other non-negative number is the number of pattern occurrences found.

Definition at line 558 of file pattern.c.

References ASSERT, BlastQueryInfoGetQueryLength(), eBlastTypePhiBlastn, eBlastTypePhiBlastp, FindPatternHits(), INT4_MAX, SSeqRange::left, BlastSeqLoc::next, BlastQueryInfo::pattern_info, query, SSeqRange::right, and BlastSeqLoc::ssr.

Referenced by Blast_SetPHIPatternInfo().
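
The three-way return convention can be checked as in the sketch below. The enum SketchStatus and the function sketch_classify_occurrences are hypothetical, and INT4_MAX is assumed to equal INT32_MAX from <stdint.h>; splitting the count into "no hits" and "has hits" is an illustrative choice, not something this page prescribes.

```c
#include <stdint.h>

/* Hypothetical classification of the documented return value. */
typedef enum {
    SKETCH_ERROR,         /* negative: unknown error */
    SKETCH_COVERS_QUERY,  /* INT4_MAX: pattern covered the entire query */
    SKETCH_NO_HITS,       /* zero occurrences */
    SKETCH_HAS_HITS       /* positive count of occurrences */
} SketchStatus;

/* Assumption: INT4_MAX has the same value as INT32_MAX. */
SketchStatus sketch_classify_occurrences(int32_t rc)
{
    if (rc < 0)
        return SKETCH_ERROR;
    if (rc == INT32_MAX)
        return SKETCH_COVERS_QUERY;
    return (rc == 0) ? SKETCH_NO_HITS : SKETCH_HAS_HITS;
}
```

The INT32_MAX test must come before the count is used, since that sentinel is itself a non-negative number.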

static Int4 s_FindHitsLong (Int4 *hitArray, const Uint1 *seq, Int4 len1, const SPHIPatternSearchBlk *pattern_blk) [static]
 

Finds places where pattern matches seq and returns them as pairs of positions in consecutive entries of hitArray; similar to _PHIBlastFindHitsShort.

Parameters:
hitArray Array of hits to return [out]
seq Input sequence [in]
len1 Length of seq [in]
pattern_blk Carries all the pattern variables [in]
Returns:
twice the number of hits.

Definition at line 320 of file pattern.c.

References SLongPatternItems::match_maskL, SPHIPatternSearchBlk::multi_word_items, and SLongPatternItems::numWords.

Referenced by FindPatternHits().

static Int4 s_FindHitsShortDNA (Int4 *hitArray, const Uint1 *seq, Int4 pos, Int4 len, const SPHIPatternSearchBlk *pattern_blk) [static]
 

Find hits when sequence is DNA and pattern is short; returns twice the number of hits.

Parameters:
hitArray Array of hits to pass back [out]
seq The input sequence [in]
pos Starting position [in]
len Length of sequence seq [in]
pattern_blk Carries variables that keep track of search parameters. [in]
Returns:
Number of hits found.

Definition at line 159 of file pattern.c.

References SShortPatternItems::dna_items, SDNAShortPatternItems::DNAwhichPrefixPosPtr, SDNAShortPatternItems::DNAwhichSuffixPosPtr, SShortPatternItems::match_mask, SPHIPatternSearchBlk::one_word_items, PHI_BITS_PACKED_PER_WORD, and s_LenOf().

Referenced by s_FindHitsShortHead().

static Int4 s_FindHitsShortHead (Int4 *hitArray, const Uint1 *seq, Int4 start, Int4 len, Uint1 is_dna, const SPHIPatternSearchBlk *pattern_blk) [static]
 

Top level routine to find hits when pattern has a short description.

Parameters:
hitArray Array of hits to pass back [out]
seq Input sequence [in]
start Position to start at in seq [in]
len Length of seq [in]
is_dna 1 if and only if seq is a DNA sequence [in]
pattern_blk Carries variables that keep track of search parameters. [in]
Returns:
Number of matches found.

Definition at line 232 of file pattern.c.

References _PHIBlastFindHitsShort(), and s_FindHitsShortDNA().

Referenced by FindPatternHits(), and s_FindHitsVeryLong().

static Int4 s_FindHitsVeryLong (Int4 *hitArray, const Uint1 *seq, Int4 len, Boolean is_dna, const SPHIPatternSearchBlk *pattern_blk) [static]
 

Find matches when pattern is very long.

Parameters:
hitArray Array to pass back pairs of start and end positions for hits [out]
seq Input sequence [in]
len Length of seq [in]
is_dna Is sequence DNA or protein? [in]
pattern_blk carries all the pattern variables [in]
Returns:
Twice the number of hits found.

Definition at line 373 of file pattern.c.

References SLongPatternItems::dna_items, SShortPatternItems::dna_items, SDNALongPatternItems::DNAprefixSLL, SDNALongPatternItems::DNAsuffixSLL, SDNAShortPatternItems::DNAwhichPrefixPosPtr, SDNAShortPatternItems::DNAwhichSuffixPosPtr, SLongPatternItems::extra_long_items, SShortPatternItems::match_mask, SLongPatternItems::match_maskL, SPHIPatternSearchBlk::multi_word_items, SLongPatternItems::numWords, SPHIPatternSearchBlk::one_word_items, PHI_MAX_HIT, s_FindHitsShortHead(), SLongPatternItems::SLL, SExtraLongPatternItems::whichMostSpecific, and SShortPatternItems::whichPositionPtr.

Referenced by FindPatternHits().

static Int4 s_LenOf (Int4 s, Int4 mask) [static]
 

Looks for 1 bits in the same position of s and mask. Let R be the rightmost position where s and mask both have a 1.

Let L < R be the rightmost position where mask has a 1, if any, or -1 otherwise.

Parameters:
s Some number [in]
mask Mask [in]
Returns:
(R - L).

Definition at line 98 of file pattern.c.

References _PHIGetRightOneBits().

Referenced by _PHIBlastFindHitsShort(), and s_FindHitsShortDNA().

static Int4 s_LenOfL (Int4 *s, Int4 *mask, Int4 numWords) [static]
 

Returns the difference between the offset F of a first 1-bit in a word sequence and the first offset G < F of a 1-bit in the pattern mask.

If such G does not exist, it is set to -1.

Parameters:
s Input sequence [in]
mask Array of word masks [in]
numWords Number of words in s. [in]
Returns:
F - G, see explanation above.

Definition at line 291 of file pattern.c.

References PHI_BITS_PACKED_PER_WORD.
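
The multi-word offset computation can be sketched as below, assuming 30 usable bits per word, bit 0 as the least significant bit, and offsets that run across words as word index times word width plus bit index. The names sketch_len_of_multi and SKETCH_BITS_PER_WORD are hypothetical, and the -1 returned when s contains no 1 bit is an assumption.

```c
#include <stdint.h>

/* Assumed word width for this sketch only. */
#define SKETCH_BITS_PER_WORD 30

/* Hypothetical stand-in for s_LenOfL.
   F is the offset of the first 1 bit in s; G is the offset of the
   last mask bit seen before F, or -1 if there is none. */
int32_t sketch_len_of_multi(const int32_t *s, const int32_t *mask,
                            int32_t num_words)
{
    int32_t g = -1;
    int32_t w, bit;

    for (w = 0; w < num_words; w++) {
        for (bit = 0; bit < SKETCH_BITS_PER_WORD; bit++) {
            int32_t offset = w * SKETCH_BITS_PER_WORD + bit;
            if ((s[w] >> bit) & 1)
                return offset - g;   /* F - G */
            if ((mask[w] >> bit) & 1)
                g = offset;          /* update G while scanning toward F */
        }
    }
    return -1;  /* assumption: no 1 bit found in s */
}
```

Testing s before mask at each offset keeps G strictly below F, which matches the condition G < F in the description.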

static Int2 s_PHIBlastAddPatternHit (SPHIQueryInfo *pattern_info, Int4 offset, Int4 length) [static]
 

Adds a new pattern hit to the PHI BLAST pseudo lookup table.

Parameters:
pattern_info The query pattern information structure. [in] [out]
offset Offset in query at which pattern was found. [in]
length Length of the pattern at this offset. [in]

Definition at line 535 of file pattern.c.

References SPHIQueryInfo::allocated_size, SPHIPatternInfo::length, SPHIQueryInfo::num_patterns, SPHIQueryInfo::occurrences, and SPHIPatternInfo::offset.
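
The referenced fields (allocated_size, num_patterns, occurrences, offset, length) suggest a growable array of hits. The sketch below shows that general technique only; the types SketchPatternInfo and SketchQueryInfo, the function sketch_add_pattern_hit, the doubling growth policy, the initial capacity of 4 and the 0 / -1 return codes are all assumptions, not details taken from pattern.c.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirrors of the fields named on this page. */
typedef struct {
    int32_t offset;  /* offset in query where the pattern was found */
    int32_t length;  /* length of the pattern at this offset */
} SketchPatternInfo;

typedef struct {
    int32_t num_patterns;            /* entries in use */
    int32_t allocated_size;          /* entries allocated */
    SketchPatternInfo *occurrences;  /* growable array of hits */
} SketchQueryInfo;

/* Appends one hit, growing the array when it is full.
   Returns 0 on success, -1 on failure (assumed convention). */
int sketch_add_pattern_hit(SketchQueryInfo *info, int32_t offset,
                           int32_t length)
{
    if (info == NULL)
        return -1;

    if (info->num_patterns >= info->allocated_size) {
        /* Assumed policy: double the capacity, starting from 4. */
        int32_t new_size =
            (info->allocated_size > 0) ? info->allocated_size * 2 : 4;
        SketchPatternInfo *grown =
            realloc(info->occurrences, (size_t)new_size * sizeof *grown);
        if (grown == NULL)
            return -1;  /* the old array is still valid */
        info->occurrences = grown;
        info->allocated_size = new_size;
    }

    info->occurrences[info->num_patterns].offset = offset;
    info->occurrences[info->num_patterns].length = length;
    info->num_patterns++;
    return 0;
}
```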

SPHIQueryInfo* SPHIQueryInfoCopy (const SPHIQueryInfo *pat_info)
 

Copies the SPHIQueryInfo structure.

Parameters:
pat_info Structure to copy [in]
Returns:
New structure.

Definition at line 512 of file pattern.c.

References BlastMemDup(), SPHIQueryInfo::num_patterns, SPHIQueryInfo::occurrences, and SPHIQueryInfo::pattern.

Referenced by BlastQueryInfoDup(), and CSearchResults::CSearchResults().

SPHIQueryInfo* SPHIQueryInfoFree (SPHIQueryInfo *pat_info)
 

Frees the pattern information structure.

Parameters:
pat_info Structure to free. [in]
Returns:
NULL.

Definition at line 501 of file pattern.c.

References SPHIQueryInfo::occurrences, SPHIQueryInfo::pattern, and sfree.

Referenced by BlastQueryInfoFree().

SPHIQueryInfo* SPHIQueryInfoNew (void)
 

Allocates the pattern occurrences structure.

Definition at line 483 of file pattern.c.

References SPHIQueryInfo::allocated_size, and SPHIQueryInfo::occurrences.

Referenced by Blast_SetPHIPatternInfo().
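
The three life cycle functions follow a common C pattern: allocate, deep copy, and a free routine that returns NULL so the caller can clear its pointer in one statement. The sketch below illustrates that pattern with hypothetical types and names (SketchQueryInfo, sketch_info_new, sketch_info_copy, sketch_info_free). The initial capacity of 8 is assumed, and the pattern field referenced by the real structure is omitted for brevity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int32_t offset; int32_t length; } SketchPatternInfo;

typedef struct {
    int32_t num_patterns;
    int32_t allocated_size;
    SketchPatternInfo *occurrences;
} SketchQueryInfo;

/* Allocates an empty structure with an assumed initial capacity. */
SketchQueryInfo *sketch_info_new(void)
{
    const int32_t initial = 8;  /* assumed capacity */
    SketchQueryInfo *info = calloc(1, sizeof *info);
    if (info == NULL)
        return NULL;
    info->occurrences = calloc((size_t)initial, sizeof *info->occurrences);
    if (info->occurrences == NULL) {
        free(info);
        return NULL;
    }
    info->allocated_size = initial;
    return info;
}

/* Frees the structure and returns NULL for assignment by the caller. */
SketchQueryInfo *sketch_info_free(SketchQueryInfo *info)
{
    if (info != NULL) {
        free(info->occurrences);
        free(info);
    }
    return NULL;
}

/* Deep copy: the new structure owns its own occurrences array. */
SketchQueryInfo *sketch_info_copy(const SketchQueryInfo *src)
{
    SketchQueryInfo *dst;
    int32_t n;

    if (src == NULL)
        return NULL;
    dst = calloc(1, sizeof *dst);
    if (dst == NULL)
        return NULL;
    n = (src->num_patterns > 0) ? src->num_patterns : 1;
    dst->occurrences = calloc((size_t)n, sizeof *dst->occurrences);
    if (dst->occurrences == NULL) {
        free(dst);
        return NULL;
    }
    if (src->num_patterns > 0)
        memcpy(dst->occurrences, src->occurrences,
               (size_t)src->num_patterns * sizeof *dst->occurrences);
    dst->num_patterns = src->num_patterns;
    dst->allocated_size = n;
    return dst;
}
```

A caller would typically write info = sketch_info_free(info); so that the pointer cannot be reused after the memory is released.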


Variable Documentation

char const rcsid[] [static]
 

Initial value:

 
    "$Id: pattern.c 134303 2008-07-17 17:42:49Z camacho $"

Definition at line 58 of file pattern.c.


Generated on Wed Mar 11 22:44:43 2009 for NCBI C++ ToolKit by  doxygen 1.4.6
Modified on Wed Mar 11 23:16:10 2009 by modify_doxy.py rev. 117643