From: ncbi-admin@ncbi.nlm.nih.gov on behalf of Boris Kiryutin [kiryutin@ncbi.nlm.nih.gov]
Sent: Monday, March 26, 2001 2:16 PM
To: ncbi-seminar@ncbi.nlm.nih.gov
Subject: NCBI/CBB seminar, March 27, Tuesday, 11 a.m., 8-th floor, Bldg 38a

NCBI/CBB seminar, March 27, Tuesday, 11 a.m., 8th floor, Bldg. 38a

IMPROVING SPECIFICITY AND SPEED IN PSSM SEARCHES

Boris Kiryutin
National Center for Biotechnology Information, National Institutes of Health

Specificity in database searches is crucial for obtaining biologically valid results. A position-specific score matrix (PSSM) is constructed from an alignment of related sequences. PSSMs are employed by many methods for sequence database search, most notably PSI-BLAST. Each column in a PSSM contains a score for each letter of the amino acid alphabet, reflecting how strongly that residue is expected at that position.

We propose an algorithm that identifies informative columns in PSSMs. Only columns with high information content are used for the search, whereas the remaining columns, which contain mostly noise, are ignored. Disregarding the noisy columns improves specificity and allows a modification of the traditional Needleman-Wunsch algorithm that achieves greater speed. A further modification merges contiguous blocks of informative columns into "super-positions". Examples of applying the new approach will be described.
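
The announcement does not say how information content is measured, what cutoff is used, or how super-positions are built, so the following Python fragment is only a rough sketch under stated assumptions, not the speaker's method. It assumes each column is given as residue frequencies rather than log-odds scores, measures information as relative entropy against a background distribution, uses an arbitrary threshold, and treats a super-position as nothing more than a run of adjacent informative columns. The function names and the toy four-letter alphabet are hypothetical and chosen for brevity.

```python
import math

def column_information(probs, background):
    """Relative entropy (in bits) of one column versus the background.

    Assumption: `probs` maps residue -> frequency in the column, and
    `background` maps residue -> expected frequency. Zero-frequency
    residues contribute nothing to the sum.
    """
    return sum(p * math.log2(p / background[res])
               for res, p in probs.items() if p > 0)

def informative_columns(columns, background, threshold):
    """Indices of columns whose information meets a (hypothetical) cutoff."""
    return [i for i, col in enumerate(columns)
            if column_information(col, background) >= threshold]

def merge_contiguous(indices):
    """Group sorted column indices into (start, end) runs of adjacent columns."""
    blocks = []
    for i in indices:
        if blocks and i == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], i)   # extend the current run
        else:
            blocks.append((i, i))             # start a new run
    return blocks

# Toy data: a 4-letter alphabet with a uniform background (illustrative only).
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
noisy = {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2}
columns = [conserved, conserved, noisy, conserved, conserved]

kept = informative_columns(columns, background, threshold=0.5)
blocks = merge_contiguous(kept)
```

In this toy case the noisy middle column falls below the cutoff, so the four remaining columns form two runs. A search restricted to such runs would have fewer positions to align, which is the source of the speed gain described in the abstract; the modified Needleman-Wunsch step itself is not sketched here.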