[BKMA] info about OKAPI

Johnson, Calvin (NIH/CIT) [E] johnson at mail.nih.gov
Thu Jan 18 19:46:03 EST 2007


Alex,

Thanks.

Is there some way that you could "condense" the equations to make them
more readable, perhaps using MS Word?  Then resend to the group.

Jigar, after you receive the MS Word file from Alex, could you convert it
to PDF and post it to the portal?

Calvin 


 

-----Original Message-----
From: Wang, Alex (NIH/CIT) [E] 
Sent: Wednesday, January 17, 2007 10:45 AM
To: bkma at dcb.cit.nih.gov
Subject: [BKMA] info about OKAPI

OKAPI BM25 Ranking of FREETEXT

 

$$
\mathrm{Rank} = \sum_{\text{terms in query}} w \cdot \frac{(k_1 + 1)\, tf}{K + tf} \cdot \frac{(k_3 + 1)\, qtf}{k_3 + qtf}
$$

Where: 

w is the Robertson-Sparck Jones weight. 

In simplified form, w is defined as: 

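$$
w = \log_{10} \frac{(r + 0.5)(N - R + r + 0.5)}{(R - r + 0.5)(n - r + 0.5)}
$$

R and r follow the usual Robertson-Sparck Jones meanings: R is the number
of rows known to be relevant to the query, and r is the number of those
rows that contain the word. With no relevance information, R = r = 0 and
w reduces to log10( (N + 0.5) / (n + 0.5) ). (The standard
Robertson-Sparck Jones weight also subtracts n in the second numerator
factor, giving N - n - R + r + 0.5.)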

N is the number of indexed rows for the property being queried. 

n is the number of rows containing the word. 

K is defined as:

$$
K = k_1 \left( (1 - b) + b \cdot \frac{dl}{avdl} \right)
$$

dl is the property length, in word occurrences. 

avdl is the average length of the property being queried, in word
occurrences. 

k1, b, and k3 are the constants 1.2, 0.75, and 8.0, respectively. 

tf is the frequency of the word in the queried property in a specific
row. 

qtf is the frequency of the term in the query. 
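The definitions above can be put together into a small scoring routine. This is a minimal sketch, not SQL Server's implementation: it follows the formula exactly as quoted (including the N - R + r + 0.5 factor), assumes R = r = 0 by default (no relevance information), and the function names `rsj_weight`, `term_score` and `rank` are hypothetical.

```python
import math

# Constants as quoted in the post: k1, b, k3.
K1, B, K3 = 1.2, 0.75, 8.0


def rsj_weight(N, n, R=0, r=0):
    """Simplified Robertson-Sparck Jones weight as quoted.

    Assumption: R = r = 0 by default, i.e. no relevance information.
    """
    return math.log10(((r + 0.5) * (N - R + r + 0.5)) /
                      ((R - r + 0.5) * (n - r + 0.5)))


def term_score(tf, qtf, N, n, dl, avdl, R=0, r=0):
    """Contribution of one query word to one row's rank."""
    # K scales tf by the row's property length relative to the average.
    K = K1 * ((1 - B) + B * dl / avdl)
    w = rsj_weight(N, n, R, r)
    return w * ((K1 + 1) * tf / (K + tf)) * ((K3 + 1) * qtf / (K3 + qtf))


def rank(query_terms, row_tf, doc_freq, N, dl, avdl):
    """Sum term contributions over the distinct words of the query.

    query_terms: list of query words (a repeated word raises its qtf)
    row_tf:      dict word -> tf in this row's queried property
    doc_freq:    dict word -> n, the number of rows containing the word
    """
    total = 0.0
    for term in set(query_terms):
        tf = row_tf.get(term, 0)
        if tf == 0:
            continue  # tf = 0 makes the term's contribution 0
        qtf = query_terms.count(term)
        total += term_score(tf, qtf, N, doc_freq[term], dl, avdl)
    return total
```

For example, with N = 1000, n = 10, tf = 2, qtf = 1 and a row of average length (dl = avdl), K = 1.2, the tf factor is 2.2 × 2 / 3.2 = 1.375, the qtf factor is 1, and w = log10(1000.5 / 10.5) ≈ 1.979, so the word contributes about 2.72 to the rank.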

 

FREETEXT ranking is based on the OKAPI BM25 ranking formula. FREETEXT
queries add words to the query through inflectional generation
(inflected forms of the original query words); these words are treated
as separate words with no special relationship to the words from which
they were generated. Synonyms generated by the Thesaurus feature are
likewise treated as separate, equally weighted terms. Every word in the
expanded query, original or generated, contributes its own term to the
rank.
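The effect of query expansion on the sum can be sketched as follows. This is an illustration under stated assumptions, not how SQL Server generates forms: the `INFLECTIONS` and `SYNONYMS` tables and the `expand` function are hypothetical stand-ins for the inflectional generator and the Thesaurus, and w is computed with R = r = 0. Each generated word simply becomes another independent term with its own n and tf.

```python
import math

K1, B, K3 = 1.2, 0.75, 8.0

# Hypothetical expansion tables; real FREETEXT expansion depends on the
# language's word breaker/stemmer and on the Thesaurus configuration.
INFLECTIONS = {"run": ["runs", "running", "ran"]}
SYNONYMS = {"run": ["jog"]}


def expand(query_words):
    """Return the original words plus generated words as independent terms."""
    terms = []
    for word in query_words:
        terms.append(word)
        terms.extend(INFLECTIONS.get(word, []))
        terms.extend(SYNONYMS.get(word, []))
    return terms


def term_score(tf, qtf, N, n, dl, avdl):
    K = K1 * ((1 - B) + B * dl / avdl)
    # With R = r = 0 the quoted weight reduces to log10((N + 0.5) / (n + 0.5)).
    w = math.log10((N + 0.5) / (n + 0.5))
    return w * ((K1 + 1) * tf / (K + tf)) * ((K3 + 1) * qtf / (K3 + qtf))


def rank(query_words, row_tf, doc_freq, N, dl, avdl):
    terms = expand(query_words)
    total = 0.0
    for term in set(terms):
        tf = row_tf.get(term, 0)
        if tf:
            # A generated word is scored on its own statistics; it is not
            # merged with, or discounted relative to, the word it came from.
            total += term_score(tf, terms.count(term), N, doc_freq[term], dl, avdl)
    return total
```

A row containing "running" and "jog" but not "run" still scores for the query "run", receiving one full, undiscounted term contribution for each of the two generated words.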

_______________________________________________
BKMA mailing list
BKMA at dcb.cit.nih.gov
http://dcb.cit.nih.gov/mailman/listinfo/bkma
