From: Wilbur, John (NIH/NLM/NCBI)
Sent: Thursday, June 17, 2004 4:38 PM
To: 'ncbi-seminar@ncbi.nlm.nih.gov'
Subject: Special Seminar on 6/24/04

There will be a seminar presented by Hong Yu of Columbia University at 2 pm on 6/24/04 in Bldg. 38A, 2B library.

Title: Answering Opinion Questions: Opinion Classifications and Semantic Analysis of Propositional Opinion

Abstract: Opinion question answering is concerned with questions of the form "What is X's opinion about Y?" At Columbia, we have developed and fully implemented several statistical and machine-learning models to separate subjective information from factual statements and to further assign the semantic orientation of subjective information as either positive or negative. Specifically, we have implemented a Bayesian document-level classifier, trained on Wall Street Journal articles, that achieves high performance (97% F-measure) in distinguishing between "news" (corresponding to facts) and "editorial" (corresponding to opinions) articles. For the more difficult problem of deciding opinion status at the sentence level, we explored different strategies and found that the highest performance (80% accuracy) came from Bayesian classification with approximate training labels inherited from the document level. We further classified the semantic orientation of opinion sentences with 90% accuracy by calculating the log-likelihood ratio of the loaded words in the sentences. Our fully implemented question answering system aggregates sentences that express the same semantic orientation and generates coherent paragraph-length answers through summarization techniques.

We now move our tasks to a more analytic interpretation by identifying components of opinion sentences with specific roles relative to the opinion. Such components include the opinion holder X and the topic of the opinion Y in the question "What is X's opinion about Y?"
Often a sentence that contains subjective clauses expresses an opinion only in the main part or in one of the clauses. A very common case of such component opinions is propositional opinions. We developed advanced techniques for identifying opinion holders and propositional opinions. Specifically, we first manually assigned new semantic roles (i.e., opinion holders and fact or opinion propositions) to the semantically annotated databases FrameNet and PropBank. We then extended algorithms for 'semantic parsing' of those opinion components. Our state-of-the-art performance is 0.58/0.51 precision/recall for fact/opinion propositions and 0.53/0.43 precision/recall when opinion holders are included.

W. John Wilbur
Senior Investigator
Computational Biology Branch
National Center for Biotechnology Information
National Library of Medicine
8600 Rockville Pike
Bethesda, MD 20894
Phone 301-435-5926
Fax 301-480-2290
Email wilbur@ncbi.nlm.nih.gov