Text similarity searching and data mining of Medline: from studies in ethics to drug discovery

Harold “Skip” Garner, the members of the laboratory and collaborators

There is an immense amount of information within databases, specifically text databases, and new computational approaches may help us unlock and exploit some of the hidden knowledge that can be derived from their contents. We have developed two applications that help in this process. The first, a text similarity code called eTBLAST, is freely available on the web (http://invention.swmed.edu/etblast/etblast.shtml) and identifies similar literature without the user having to manipulate keywords. eTBLAST compares a query (a paragraph of text) to each record in Medline or other databases, computes a similarity score and then presents the results to the user in rank order. This list, and the post-processors that operate on it, help users (researchers, clinicians, editors, reviewers, lawyers, etc.) find references, scan the literature without substantial prior knowledge of an area, find experts who frequently publish in the areas defined by the query, find journals that frequently publish manuscripts in areas similar to the query, etc. By randomly selecting Medline records and using them as queries, we have also been able to measure the percentage of duplicate publications in Medline and other characteristics of the publication process to address issues in ethics. eTBLAST and its duplicate publication database, déjà vu (http://spore.swmed.edu/dejavu/), work together not only to study the problem of duplicate publication but also to act as a deterrent, because submitted abstracts can be compared to the literature corpus for novelty. The bottom line: we have found tens of thousands of duplicate publications in Medline.
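The query-versus-record comparison described above can be illustrated with a minimal sketch. It is not eTBLAST's actual scoring method, which this abstract does not specify; TF-IDF weighted cosine similarity is swapped in as a common stand-in. The function names `tokenize` and `rank_by_similarity` and the three toy records are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_similarity(query, records):
    """Return (score, record_index) pairs, most similar record first.

    Scoring is TF-IDF weighted cosine similarity, a stand-in chosen for
    illustration; it is an assumption, not eTBLAST's own algorithm.
    """
    docs = [Counter(tokenize(r)) for r in records]
    n = len(docs)
    # Document frequency: how many records contain each term.
    df = Counter(term for d in docs for term in d)

    def idf(term):
        # Smoothed inverse document frequency; unseen terms get df = 0.
        return math.log((1 + n) / (1 + df[term])) + 1.0

    def vectorize(counts):
        return {t: c * idf(t) for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b[t] for t, w in a.items() if t in b)
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    q = vectorize(Counter(tokenize(query)))
    scored = [(cosine(q, vectorize(d)), i) for i, d in enumerate(docs)]
    # Present results in rank order, highest similarity first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Toy records standing in for database abstracts (hypothetical data).
    records = [
        "Duplicate publication in the biomedical literature",
        "Gene expression profiling in cardiac hypertrophy",
        "Microarray analysis of gene expression in mouse heart",
    ]
    query = "gene expression changes in cardiac hypertrophy"
    for score, idx in rank_by_similarity(query, records):
        print(f"{score:.3f}  {records[idx]}")
```

With these toy records, the cardiac hypertrophy record ranks first because it shares the most terms, and the rarest terms, with the query. Under the same assumptions, using a record from the collection as the query and looking for other records with unusually high scores would be one way to surface candidate duplicates for manual review.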
Our second code, IRIDESCENT, identifies direct and implicit connections among a set of 2.5 million biomedical objects (diseases, genes, drugs, chemicals, phenotypes, etc.) found to be co-mentioned in Medline (and other databases). This code can be used to find hidden connections in, for example, lists of responsive genes found in microarray experiments, but most importantly, we are using it as a hypothesis generation engine. Specifically, we use it to identify potential new uses for existing drugs and then, after further consideration and prioritization, test the suggested drugs in mouse models of the new indication. The bottom line: we have 6 new drugs for the treatment of cardiac hypertrophy and myocardial infarction, with tests ongoing in models of atrial fibrillation, arthritis, epilepsy, basal cell carcinoma and ALS. For both of these projects, descriptions of the approach, characterization of the functionality and demonstrations of utility will be presented. Visit us at http://innovation.swmed.edu.

References

1. Lewis J, Ossowski S, Hicks J, Errami M, Garner HR (2006). Text similarity: an alternative way to search MEDLINE. Bioinformatics 22: 2298-2304. PMID: 16926219
2. Errami M, Wren JD, Hicks JM, Garner HR (2007). eTBLAST: a web server to identify expert reviewers, appropriate journals and similar publications. Nucleic Acids Res. 35: W12-W15. PMID: 17452348