Archives




Brian Luke

SAIC-Frederick

Office: 301-846-5553
FAX: 301-846-5762
E-mail: lukeb@ncifcrf.gov

Job Title: Senior Scientist
Ph.D. in Theoretical Chemistry
from the University of Southern California

Speaker: Brian Luke, Advanced Biomedical Computing Center, SAIC-Frederick, Frederick, MD 21701

Topic: Extracting Information from Large Datasets - Video (running time 01:05:34) *

Place: Building 549, Auditorium, NCI at Frederick, Frederick, MD

Time: Tuesday, May 13, 2003, at 2:00 PM

Abstract: Current "-omic" techniques can produce large amounts of data for a relatively small number of samples in different disease states. The goal of ongoing investigations at the Advanced Biomedical Computing Center (ABCC) is to identify specific features from these datasets that classify the state of an unknown sample with high sensitivity and selectivity. Unfortunately, the amount of data available for each sample is so large that random noise can be used to separate one class of samples from another with virtually 100% accuracy. Such a numerical model has very good statistics, but very little information content. Our efforts are designed to bridge the gap between purely numerical models and classification models that use key features that may suggest an underlying biological mechanism. For this to occur, the mathematician must realize that the datasets are more than just a collection of numbers, and the experimentalist should become familiar with some of the common numerical techniques that are employed.
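The point about random noise can be illustrated with a small simulation. The sketch below is not taken from the seminar; the sample counts, feature count, and the function names `best_threshold` and `run_demo` are all assumptions chosen for illustration. It fills a dataset with pure Gaussian noise, searches for the single feature and cutoff that best separate two arbitrary classes, and then applies that same rule to fresh noise.

```python
import random


def best_threshold(values, labels):
    """Return (accuracy, cutoff, high_class) for the best single cut on one feature."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    best = (0.0, 0.0, 1)
    for k in range(1, n):
        # Place the cutoff midway between two neighboring sorted values.
        cut = (values[order[k - 1]] + values[order[k]]) / 2.0
        # Samples called correctly if everything above the cutoff is class 1.
        correct = sum(1 for i in range(n) if (values[i] > cut) == (labels[i] == 1))
        # The reversed rule (above the cutoff is class 0) gets the remainder right.
        for high_class, hits in ((1, correct), (0, n - correct)):
            if hits / n > best[0]:
                best = (hits / n, cut, high_class)
    return best


def run_demo(n_per_class=6, n_features=10000, n_holdout=2000, seed=0):
    """Hypothetical demo: many noise features, few samples (all sizes are assumptions)."""
    rng = random.Random(seed)
    labels = [0] * n_per_class + [1] * n_per_class
    # Every feature is pure noise, generated independently of the class labels.
    data = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(n_features)]
    train_acc, cut, high_class = max(best_threshold(col, labels) for col in data)

    # Apply the selected rule to fresh noise samples with random labels.
    hits = 0
    for _ in range(n_holdout):
        label = rng.randint(0, 1)
        value = rng.gauss(0.0, 1.0)
        predicted = high_class if value > cut else 1 - high_class
        hits += predicted == label
    return train_acc, hits / n_holdout


if __name__ == "__main__":
    train_acc, holdout_acc = run_demo()
    print(f"best noise feature, training accuracy: {train_acc:.2f}")
    print(f"same rule on fresh noise:              {holdout_acc:.2f}")
```

With thousands of candidate features and only a dozen samples, the search is expected to find at least one noise feature that separates the training samples perfectly, while the same rule should perform at roughly chance level on new data. This is the sense in which a model can have very good statistics but very little information content.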

This presentation discusses many of the numerical techniques that are available without delving into the details of the mathematics. These include data normalization and the search for outliers, as well as techniques that can be used, or should not be used, to develop classification models that employ a small number of relevant features. Such fun topics as Principal Component Analysis, Sammon Maps, Branch and Bound, Tabu Search, Simulated Annealing, Genetic Algorithms, Particle Swarm Optimization, Fuzzy Clustering, K-Nearest Neighbors, K-Means, and Receiver Operating Characteristics will be described.

The slides for this seminar are in a 0.4 Megabyte PDF file, which can be opened and read with the free Adobe Acrobat Reader®.

____________________

* Video viewing minimally requires the latest free version of RealPlayer® and a 56 Kbps dial-up bandwidth.




Updated 13-May-2003

Copyright © 1999-2006 The National Cancer Institute at Frederick (Frederick, MD 21701 USA)