[BKMA] info about OKAPI

Wang, Alex (NIH/CIT) [E] wangal at mail.nih.gov
Wed Jan 17 10:45:01 EST 2007


OKAPI BM25 Ranking of FREETEXT

 

Rank = SUM[Terms in Query] w * ( ( ( k1 + 1 ) * tf ) / ( K + tf ) )
* ( ( ( k3 + 1 ) * qtf ) / ( k3 + qtf ) )

Where: 

w is the Robertson-Sparck Jones weight. 

In simplified form, w is defined as: 

w = log10 ( ( ( r + 0.5 ) * ( N - R + r + 0.5 ) ) / ( ( R - r + 0.5 ) *
( n - r + 0.5 ) ) )

N is the number of indexed rows for the property being queried. 

n is the number of rows containing the word. 

K is ( k1 * ( ( 1 - b ) + ( b * dl / avdl ) ) ). 

dl is the property length, in word occurrences. 

avdl is the average length of the property being queried, in word
occurrences. 

k1, b, and k3 are the constants 1.2, 0.75, and 8.0, respectively. 

tf is the frequency of the word in the queried property in a specific
row. 

qtf is the frequency of the term in the query. 
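
As a rough illustration of how the pieces above fit together, here is a minimal Python sketch of the rank computation for a single row. It is not the actual FREETEXT implementation. The function names, the dictionary inputs, and the choice of r = R = 0 (r and R are not defined in the text above) are all assumptions made for the example.

```python
import math

# Constants k1, b, and k3 as stated in the text.
K1, B, K3 = 1.2, 0.75, 8.0


def rsj_weight(N, n, r=0.0, R=0.0):
    """Simplified Robertson-Sparck Jones weight, as written above.

    r and R are not defined in the text; defaulting both to 0 is an
    assumption made for this sketch.
    """
    numerator = (r + 0.5) * (N - R + r + 0.5)
    denominator = (R - r + 0.5) * (n - r + 0.5)
    return math.log10(numerator / denominator)


def bm25_rank(query_tf, row_tf, dl, avdl, N, row_counts):
    """Sum the per-term contributions for one row.

    query_tf:   {term: qtf}  frequency of each term in the query
    row_tf:     {term: tf}   frequency of each term in this row's property
    row_counts: {term: n}    number of rows containing each term
    (These input shapes are hypothetical, chosen for the example.)
    """
    K = K1 * ((1 - B) + B * dl / avdl)
    rank = 0.0
    for term, qtf in query_tf.items():
        tf = row_tf.get(term, 0)
        if tf == 0:
            continue  # a term absent from the row contributes nothing
        w = rsj_weight(N, row_counts[term])
        rank += w * ((K1 + 1) * tf / (K + tf)) * ((K3 + 1) * qtf / (K3 + qtf))
    return rank
```

One sanity check on the formula: when dl equals avdl, K reduces to k1, so a term with tf = 1 and qtf = 1 makes both fractions equal to 1 and contributes exactly w to the rank.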

 

FREETEXT ranking is based on the OKAPI BM25 ranking formula. FREETEXT
queries add words to the query via inflectional generation (inflected
forms of the original query words); these words are treated as separate
words with no special relationship to the words from which they were
generated. Synonyms generated from the Thesaurus feature are treated as
separate, equally weighted terms. Each word in the query contributes to
the rank.
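
To make the expansion behaviour concrete, here is a small Python sketch of how a query might be turned into a flat set of terms. The lookup tables, the function name, and the choice to count each generated word once are hypothetical; the real inflectional forms and thesaurus entries come from the full-text engine, not from tables like these.

```python
from collections import Counter

# Hypothetical lookup tables, for illustration only.
INFLECTIONS = {"run": ["runs", "running", "ran"]}
SYNONYMS = {"car": ["automobile"]}


def expand_query(words, inflections=INFLECTIONS, synonyms=SYNONYMS):
    """Return {term: qtf} with generated words added as independent terms."""
    terms = Counter()
    for word in words:
        terms[word] += 1
        # Generated words keep no link back to the word that produced them;
        # each one is simply another term in the query.
        for generated in inflections.get(word, []) + synonyms.get(word, []):
            terms[generated] += 1
    return terms
```

Under these assumptions, the query words "run" and "car" expand to six separate terms, and each of them would contribute its own summand to the rank.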


