From: Comeau, Don (NIH/NLM/NCBI)
Sent: Thursday, March 03, 2005 1:28 PM
To: ncbi-seminar@ncbi.nlm.nih.gov
Subject: NCBI CBB Seminar, Tuesday, March 8 2005, 11 am

NCBI CBB Seminar
Tuesday, March 8 2005, 11 am
Building 38A, Room B2N14 (NCBI Library)

Significant Phrases: Indexing the Bookshelf

Don Comeau
NCBI/NLM/NIH

The Bookshelf is a collection of more than 30 biomedical texts that receives nearly 600,000 hits a week. Starting from the full text in XML, candidate significant phrases are identified either by their appearance in titles or by repeated occurrence. Part-of-speech (POS) tagging identifies well-formed phrases. A database preserves these results along with the results of additional human review. Porter stemming allows recognition of morphologically different versions of a phrase, and UMLS allows recognition of synonymous phrases. Synonymy poses counting and ambiguity challenges, which are addressed. Abbreviation handling and improved phrase determination and extraction lead efforts toward full-text understanding.
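
As a rough illustration of the "repeated occurrence" and stemming steps mentioned in the abstract, the sketch below counts word pairs that recur in a text and groups surface variants that share the same stems. It is not the Bookshelf implementation: the names crude_stem and repeated_phrases are hypothetical, the suffix stripper is a much simpler stand-in for the Porter stemmer, and title detection, POS tagging, and UMLS synonym lookup are left out entirely.

```python
import re
from collections import Counter, defaultdict


def crude_stem(word):
    """Lowercase a word and strip one common suffix.

    This is only a stand-in for Porter stemming (an assumption made to keep
    the sketch dependency-free); the real algorithm has many more rules.
    """
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters and leave words like "class" alone.
        if (word.endswith(suffix)
                and not word.endswith("ss")
                and len(word) - len(suffix) >= 3):
            return word[: -len(suffix)]
    return word


def repeated_phrases(text, n=2, min_count=2):
    """Return n-word phrases whose stemmed form occurs at least min_count times.

    The result maps each stemmed phrase (a tuple of stems) to a dict of the
    surface variants seen in the text and how often each one occurred.
    """
    groups = defaultdict(Counter)
    # Split on sentence-like punctuation so phrases do not span sentences.
    for sentence in re.split(r"[.!?;:]+", text):
        words = re.findall(r"[A-Za-z]+", sentence)
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            key = tuple(crude_stem(w) for w in gram)
            groups[key][" ".join(gram).lower()] += 1
    return {
        key: dict(variants)
        for key, variants in groups.items()
        if sum(variants.values()) >= min_count
    }


if __name__ == "__main__":
    sample = (
        "Binding proteins regulate genes. The binding protein is small. "
        "A binding protein regulates the gene."
    )
    for stems, variants in repeated_phrases(sample).items():
        print(stems, variants)
```

On the sample text, "binding proteins" and "binding protein" fall under a single stemmed key, which is the kind of morphological grouping the abstract attributes to Porter stemming. A fuller pipeline would presumably filter these candidates with a POS tagger before storing them for human review.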