From: Yonil Park [park@ncbi.nlm.nih.gov]
Sent: Friday, June 20, 2003 12:00 PM
To: ncbi-seminar@ncbi.nlm.nih.gov
Subject: CBB Seminar: June 24, 11 AM, 5th floor of 38A

Room: Building 38A, 5th Floor Conference Room
Date: 11 am, Tuesday, June 24, 2003
Title: Matrix analytical approach to exact distributions of word occurrences and local maximum score in Markov dependent sequences

In this talk, I present solutions for two exact distributions in Markov dependent sequences: one for word occurrences and one for the local maximum score. The talk illustrates the methods with DNA or protein models based on Markov chains of order 1, but the methods generalize to higher order Markov models.

The expected frequency of patterns (words) and the distribution of the distance between occurrences of patterns in biological sequences have been studied extensively because they have many applications in genome analysis. The exact distribution, along with normal and (compound) Poisson approximations, is now known. Previous theory gives the exact distribution as a very complicated formula; here I give it as a simple one. The formula is based on an intuitive decomposition of the transition matrix into two matrices according to whether or not the pattern occurred. Both the simple and the complicated formulas require computation time proportional to the sequence length. I present empirical results on both a promoter set and a random human genome set, along with Monte Carlo simulation results.

In addition, I present an algorithm to calculate the exact p-value of the local maximum score in Markov sequences. This algorithm depends on the local maximum score, the number of Markov states, and the sequence length. Monte Carlo simulation results are also presented.

----------
Yonil Park
NIH/NLM/NCBI
Bldg 38A, 6th Floor, Room N611N
8600 Rockville Pike
Bethesda, MD 20894
Voice: 301-402-1438
Fax: 301-480-2288
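
The decomposition mentioned in the abstract (splitting the transition matrix according to whether or not the pattern occurred) can be illustrated with a small sketch. This is not the speaker's formula, which the announcement does not state; it is one standard way to realize the idea, under these assumptions: the sequence is an order-1 Markov chain, occurrences may overlap, and the chain is tracked jointly with a prefix-matching automaton for the word. The names `kmp_automaton` and `word_count_distribution`, and the dictionaries `init` and `trans`, are hypothetical and chosen only for this example.

```python
from itertools import product


def kmp_automaton(word, alphabet):
    """delta[s][c] = length of the longest prefix of `word` that is a
    suffix of word[:s] + c. State len(word) means the word just ended."""
    m = len(word)
    delta = [dict() for _ in range(m + 1)]
    for s in range(m + 1):
        for c in alphabet:
            text = word[:s] + c
            k = min(m, len(text))
            while k > 0 and text[-k:] != word[:k]:
                k -= 1
            delta[s][c] = k
    return delta


def word_count_distribution(word, n, alphabet, init, trans):
    """Return a list d where d[k] = P(word occurs exactly k times) in a
    length-n sequence (n >= 1) from an order-1 Markov chain.

    init[a]     : P(X_1 = a)                 (assumed input format)
    trans[a][b] : P(X_{t+1} = b | X_t = a)   (assumed input format)
    """
    m = len(word)
    delta = kmp_automaton(word, alphabet)

    # States are (last letter, automaton state). The transition matrix is
    # split into Q (no occurrence completed) and R (occurrence completed).
    Q, R = {}, {}
    for a, s in product(alphabet, range(m + 1)):
        for b in alphabet:
            s2 = delta[s][b]
            part = R if s2 == m else Q
            part.setdefault((a, s), []).append(((b, s2), trans[a][b]))

    # dist[k] maps state -> probability of being there with k occurrences.
    dist = [dict() for _ in range(n + 1)]
    for a in alphabet:
        s = delta[0][a]
        k = 1 if s == m else 0
        dist[k][(a, s)] = dist[k].get((a, s), 0.0) + init[a]

    # One pass per additional letter: Q keeps the count, R increments it.
    for _ in range(n - 1):
        new = [dict() for _ in range(n + 1)]
        for k, layer in enumerate(dist):
            for state, p in layer.items():
                for nxt, w in Q.get(state, ()):
                    new[k][nxt] = new[k].get(nxt, 0.0) + p * w
                for nxt, w in R.get(state, ()):
                    new[k + 1][nxt] = new[k + 1].get(nxt, 0.0) + p * w
        dist = new

    return [sum(layer.values(), 0.0) for layer in dist]


# Usage: occurrences of "AA" in 3 letters over {A, B}, all transitions 0.5.
letters = "AB"
uniform = {a: {b: 0.5 for b in letters} for a in letters}
print(word_count_distribution("AA", 3, letters, {"A": 0.5, "B": 0.5}, uniform))
# → [0.625, 0.25, 0.125, 0.0]
```

The loop makes one pass over the sequence positions, so for a fixed word and a bounded number of occurrences the work grows linearly with the sequence length. Dictionaries are used instead of dense matrices only to keep the sketch free of external libraries.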