US 7,363,166 B2
Computational method for the identification of candidate proteins useful as anti-infectives
Samir Kumar Brahmachari, Delhi (India); Srinivasan Ramachandran, Delhi (India); Tannistha Nandi, Delhi (India); and Chandrika Bhimarao, Delhi (India)
Assigned to Council of Scientific & Industrial Research, New Delhi (India)
Filed on Mar. 30, 2001, as Appl. No. 9/820,843.
Prior Publication US 2003/0039963 A1, Feb. 27, 2003
Int. Cl. G06F 19/00 (2006.01)

U.S. Cl. 702—19 [702/30; 703/2; 707/6; 707/100]

8 Claims

1. A method for identifying a candidate protein useful as an anti-infective, comprising:

(a) calculating computationally protein sequence-based attributes from protein sequences of a pathogenic organism, wherein said protein sequences are predicted either from whole or partial genomic sequences, and wherein said protein sequence-based attributes comprise: percentage of charged amino acids, percentage hydrophobicity, distance of protein sequence from a fixed reference frame, measure of dipeptide complexity, and measure of hydrophobicity from a fixed reference frame, and wherein said pathogenic organism is selected from the group consisting of B.burgdorfei, C.jejuni, C.pneumoniae, C.trachomatis, H.influenzae, H.pylori, L.major, M.genitalium, M.pneumoniae, M.tuberculosis, N.meningitidis, P.aeruginosa, P.falciparum, R.prowazekii, T.pallidum, and V.cholerae;

(b) clustering computationally said protein sequences based on said protein sequence-based attributes using Principal Component Analysis;

(c) identifying computationally outlier protein sequences, wherein said outlier protein sequences appear outside a main cluster;

(d) comparing said outlier protein sequences to protein sequences listed in public sequence databases of organisms including B.burgdorfei, C.jejuni, C.pneumoniae, C.trachomatis, H.influenzae, H.pylori, L.major, M.genitalium, M.pneumoniae, M.tuberculosis, N.meningitidis, P.aeruginosa, P.falciparum, R.prowazekii, T.pallidum, and V.cholerae to (1) identify outlier proteins that are unique to said pathogenic organism based on the sequences in the databases accessed for the comparing, and (2) identify outlier proteins that are identical to proteins known to be involved in virulence; and

(e) displaying the results of said step (d).
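
Steps (a) through (c) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that the claim text does not settle. Only two of the five recited attributes are computed (percentage of charged amino acids and percentage hydrophobicity), because the claim does not define the reference-frame and dipeptide-complexity measures. The residue sets CHARGED and HYDROPHOBIC are illustrative choices. Only the first principal component is used, approximated by power iteration. The rule for lying "outside a main cluster" is replaced by a hypothetical cutoff of two standard deviations along that component. The function names are likewise hypothetical.

```python
import math
import statistics

# Assumed residue sets; the claim does not define them.
CHARGED = set("DEKR")
HYDROPHOBIC = set("AVLIMFWC")


def attributes(seq):
    """Return [percent charged, percent hydrophobic] for one sequence."""
    n = len(seq)
    charged = sum(aa in CHARGED for aa in seq)
    hydrophobic = sum(aa in HYDROPHOBIC for aa in seq)
    return [100.0 * charged / n, 100.0 * hydrophobic / n]


def first_component_scores(rows, iterations=200):
    """Project mean-centered rows onto the first principal component.

    The component is the leading eigenvector of the sample covariance
    matrix, approximated here by power iteration.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Starting guess; power iteration fails if it is exactly orthogonal
    # to the leading eigenvector.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance along the current guess; stop
            break
        vec = [x / norm for x in nxt]
    return [sum(c[j] * vec[j] for j in range(d)) for c in centered]


def find_outliers(sequences, threshold=2.0):
    """Return indices of sequences far from the bulk along the first component.

    Scores are already centered on zero, so a sequence is flagged when its
    absolute score exceeds `threshold` sample standard deviations
    (an assumed cutoff, not one taken from the claim).
    """
    scores = first_component_scores([attributes(s) for s in sequences])
    spread = statistics.stdev(scores)
    if spread == 0.0:
        return []
    return [i for i, s in enumerate(scores) if abs(s) / spread > threshold]
```

Calling find_outliers on a list of predicted protein sequences, given as one-letter amino acid strings, returns the positions of the flagged sequences. Those sequences would then be the input to the database comparison of step (d), which the sketch does not attempt.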