W. Philip Kegelmeyer

Distinguished Member of Technical Staff
Address:	Sandia National Laboratories, P.O. Box 969, Mail Stop 9159, Livermore, CA 94551
Department:	Informatics and Decision Sciences
Phone:	(925) 294-3016
Cell:	(925) 413-2457
Fax:	(925) 294-2234
E-mail:	wpk (at) sandia.gov

Philip Kegelmeyer (Ph.D., Stanford, Information Systems Lab, 1985) is a Distinguished Member of the Technical Staff at Sandia National Laboratories in Livermore, CA. At Sandia, he led the Advanced Simulation Computing Data Discovery Program, devoted to search in, and characterization of, petascale scientific simulation data. He currently serves as Principal Investigator for the Networks Grand Challenge LDRD. He has twenty years' experience inventing, tinkering with, and quantitatively improving supervised machine learning algorithms, including a recent digression into publishing comprehensive guidelines on how to compare such algorithms accurately and with statistical significance. His work has

What's New ...

Expanded slides for a longer version of "Situational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods", one which includes some review of the nature and properties of ensembles. Presented at a Computer Science and Engineering Symposium at the University of South Florida. (September 24, 2008)

There are openings in my department for Informatics/Complex Systems Researchers at both the junior and senior levels. If you want to ensure that your application gets attention, please drop me a note to let me know you've applied.

There are also openings in a sister department at Sandia/NM for Informatics-oriented Statistics Researchers.

Selected Recent Papers: [Click here for a publication list] (incomplete; only general machine learning papers, as of January 15, 2009)

"Using classifier ensembles to label spatially disjoint data" Larry Shoemaker, Robert E. Banfield, Lawrence O. Hall, Kevin W. Bowyer and W. Philip Kegelmeyer. Information Fusion Journal, Special Issue on Applications of Ensemble Methods, 9:1, pp. 120-133, January 2008.

"Boosting Lite - Handling Larger Datasets and Slower Base Classifiers", Lawrence O. Hall, W. Philip Kegelmeyer, Robert E. Banfield, and Kevin W. Bowyer, Proceedings of the 7th International Workshop on Multiple Classifier Systems, May 2007, Lecture Notes in Computer Science #4472, edited by Michal Haindl, Josef Kittler, Fabio Roli, Springer.

"Learning to Predict Salient Regions from Disjoint and Skewed Training Sets", Larry Shoemaker, Robert E. Banfield, Lawrence O. Hall, Kevin W. Bowyer, W. Philip Kegelmeyer, in Proceedings of the 18th IEEE Conference on Tools with Artificial Intelligence (ICTAI 2006), Arlington, Virginia, USA, pp. 116-123, 2006.

"A Comparison of Decision Tree Ensemble Creation Techniques", Robert E. Banfield, Lawrence O. Hall, Kevin W. Bowyer, W. Philip Kegelmeyer, IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 29, no. 1, pp. 173-180, January 2007. Appendix.

Selected Recent Presentations

Slides and abstract from "Situational Awareness at Internet Scale: Detection of Extremely Rare Crisis Periods", presented at the 2008 Sandia Workshop on Data Mining and Data Analysis (July 22, 2008)

Slides from a panel presentation at the "Collection/Analysis Challenge" Workshop (April 24, 2008)

Updated slides for "The Counter-Intuitive Properties of Ensembles for Machine Learning, or, Democracy Defeats Meritocracy" (April 11, 2008). The first version was a Tech Talk at Google (June 28, 2007). Here's the video. (AVI format; warning, 660 mbytes. Right click to download; might stutter if streamed.)

Slides from "Why and How to exploit OOB Validation for Ensemble Size", presented at the LLNL CASIS workshop (November 16, 2007).

Slides from "Pattern Recognition for Massive, Messy Data", presented at the LLNL CASIS workshop (November, 2006).

Software:

Avatar Tools - Ensembles for Decision Trees, implementing a decade's worth of research into best practices for machine learning in huge, messy data sets. (For Sandians only, but an open source version is expected in March of 2009)

Recent Professional Service:

Organizing Chair, DOE Workshop on Mathematics for the Analysis of Petascale Data (MAPD), June 3-5, 2008 (final report, citation information, summary slides)

Miscellaneous Links:

I have a long history of sponsorship and collaboration with the Avatar Project at the University of South Florida.

As much of my machine learning work has been intended to aid human decision making, I have an amateur's interest in the psychology of decision making, and how it can go awry. An excellent book on one aspect of that topic is Robert Cialdini's Influence: The Psychology of Persuasion. I'm such a fan I worked up a talk to summarize the book, and gave a version of it as a Tech Talk at Google, June 28, 2007. Here's the video. (AVI format; warning, 418 mbytes. Right click to download; might stutter if streamed.)

(Many thanks to Tammy Kolda for the use of her home page template.)

Maintained by: Philip Kegelmeyer (wpk@sandia.gov).