Information Visualization

Papers

Illuminating the Path
Illuminating the Path: The Research and Development Agenda for Visual Analytics.

Links to papers lead to non-PNNL sites, some of which require a subscription or charge a fee to access the full text.

2008

A Dynamic Multiscale Magnifying Tool for Exploring Large Sparse Graphs

Wong PC, HP Foote, PS Mackey, G Chin, Jr, HJ Sofia, and JJ Thomas. 2008. "A Dynamic Multiscale Magnifying Tool for Exploring Large Sparse Graphs." Information Visualization 7:105-117.

BioGraphE: High-performance bionetwork analysis using the Biological Graph Environment

Chin G, Jr, D Chavarría-Miranda, GC Nakamura, and HJ Sofia. 2008. "BioGraphE: High-performance bionetwork analysis using the Biological Graph Environment." BMC Bioinformatics.

Bringing A Vector/Image Conflation Tool To The Commercial Market

Martucci LM, and B Kovalerchuk. 2008. "Bringing A Vector/Image Conflation Tool To The Commercial Market." In American Society of Photogrammetry and Remote Sensing (ASPRS) 2008 Annual Conference. American Society of Photogrammetry and Remote Sensing (ASPRS), Washington, DC.

Progress and Challenges in Evaluating Tools for Sensemaking

Scholtz JC. 2008. "Progress and Challenges in Evaluating Tools for Sensemaking." Presented at the ACM Computer-Human Interaction (CHI) Conference Workshop on Sensemaking in Florence, Italy, April 6, 2008.

2007

Fast Point-Feature Label Placement for Dynamic Visualizations

Mote KD. 2007. "Fast Point-Feature Label Placement for Dynamic Visualizations." Information Visualization 6(4):249-260.

Putting Security in Context: Visual Correlation of Network Activity with Real-World Information

Pike WA, SJ Zabriskie, and C Scherrer. 2007. "Putting Security in Context: Visual Correlation of Network Activity with Real-World Information." In Workshop on Visualization for Computer Security 2007 (VizSEC 07). PNNL-SA-57153, Pacific Northwest National Laboratory, Richland, WA.

Scalable Visual Analytics of Massive Textual Datasets

Krishnan M, SJ Bohn, WE Cowley, VL Crow, and J Nieplocha. 2007. "Scalable Visual Analytics of Massive Textual Datasets." In IEEE International Parallel & Distributed Processing Symposium, Long Beach, CA, March 26-30, 2007.

Abstract:

This paper describes the first scalable implementation of a text processing engine used in visual analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing a parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive datasets. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as PubMed. This approach enables interactive analysis of large datasets beyond the capabilities of existing state-of-the-art visual analytics tools.
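As a rough illustration of why such an engine parallelizes well (the paper's actual approach is not detailed here), document-level term processing is embarrassingly parallel: each document can be vectorized independently and the partial results merged in a reduce step. A minimal sketch with Python's multiprocessing, using hypothetical toy documents:

```python
from multiprocessing import Pool
from collections import Counter

def term_counts(doc):
    """Per-document term vector; a stand-in for the real processing engine."""
    return Counter(doc.lower().split())

docs = ["gene expression in yeast", "yeast cell cycle", "expression atlas"]

if __name__ == "__main__":
    # Documents are independent, so the map step parallelizes across worker
    # processes (or cluster nodes); the reduce step merges partial vectors.
    with Pool(processes=4) as pool:
        partials = pool.map(term_counts, docs)
    total = sum(partials, Counter())
    print(total.most_common(3))
```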

Visual Analysis of Weblog Content

Gregory ML, DA Payne, D McColgin, NO Cramer, and DV Love. 2007. "Visual Analysis of Weblog Content." In International Conference on Weblogs and Social Media '07, pp. 227-230. Boulder, CO, March 26-28, 2007.

Abstract:

In recent years, one of the major advances of the World Wide Web has been social media, and one of the fastest growing aspects of social media is the blogosphere. Blogs make content creation easy and are highly accessible through web pages and syndication. With their growing influence, a need has arisen to be able to monitor the opinions and insight revealed within their content. This paper describes a technical approach for analyzing the content of blog data using a visual analytic tool, IN-SPIRE, developed by Pacific Northwest National Laboratory. We describe both how an analyst can explore blog data with IN-SPIRE and how the tool could be modified in the future to handle the specific nuances of analyzing blog data.

Visual Analytics Science and Technology

Wong PC. 2007. "Visual Analytics Science and Technology." Information Visualization 6(1):1-2.

2006

Diverse Information Integration and Visualization

Havre SL, A Shah, C Posse, and BM Webb-Robertson. 2006. "Diverse Information Integration and Visualization." In Visualization and Data Analysis 2006 (EI10). SPIE, The International Society for Optical Engineering, San Jose, CA.

Abstract:

This paper presents and explores a technique for visually integrating and exploring diverse information. Society produces, collects, and processes ever larger and more diverse data, including semi-structured and unstructured text as well as transaction, communication, and scientific data. It is no longer sufficient to analyze one type of data or information in isolation. Users need to explore their data and information in the context of related information to discover often hidden, but meaningful, complex relationships. Our approach visualizes multiple like entities across multiple dimensions, where each dimension is a partitioning of the entities. The partitioning may be based on inherent or assigned attributes of the entities (or entity data), such as metadata or prior knowledge captured in annotations. The partitioning may also be derived from entity data. For example, clustering, or unsupervised classification, can be applied to arrays of multidimensional entity data to partition the entities into groups of similar entities, or clusters. The same entities may be clustered on data from different experiment types or processing approaches. This reduction of diverse data and information on an entity to a series of partitions, or discrete (and unit-less) categories, allows the user to view the entities across a variety of data without concern for data types and units. Parallel coordinates visualize entity data across multiple dimensions of typically continuous attributes. We adapt parallel coordinates for dimensions with discrete attributes (partitions) to allow the comparison of entity partition patterns for identifying trends and outlier entities. We illustrate this approach through a prototype, Juxter (short for Juxtaposer).
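The central device, parallel coordinates over discrete partitions, can be sketched in a few lines: each axis is one partitioning of the same entities, and each entity is a polyline through its category on every axis. A toy illustration with hypothetical data, not the Juxter code:

```python
import matplotlib.pyplot as plt

# Hypothetical data: 6 entities, each assigned a category under three
# independent partitionings (e.g., two clusterings and an annotation).
partitions = {
    "cluster A": [0, 0, 1, 1, 2, 2],
    "cluster B": [0, 1, 1, 2, 2, 0],
    "annotation": [1, 1, 0, 0, 2, 2],
}

axes_names = list(partitions)
for entity in range(6):
    # One polyline per entity: x = partition axis, y = category index.
    ys = [partitions[name][entity] for name in axes_names]
    plt.plot(range(len(axes_names)), ys, marker="o", alpha=0.6)

plt.xticks(range(len(axes_names)), axes_names)
plt.ylabel("category index (unit-less)")
plt.show()
```

Entities whose polylines travel together across axes share a partition pattern; a lone diverging polyline is an outlier candidate.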

From Question Answering to Visual Exploration

McColgin DW, ML Gregory, EG Hetzler, and AE Turner. 2006. "From Question Answering to Visual Exploration." In Proceedings of the ACM SIGIR workshop on Evaluating Exploratory Search Systems, pp. 47-50. Seattle, August 10, 2006.

Abstract:

Research in Question Answering has focused on the quality of information retrieval or extraction, using the metrics of precision and recall to judge success; these metrics drive toward finding the specific best answer(s) and best support a lookup style of search. They do not address the opportunity that users' natural language questions present for exploratory interactions. In this paper, we present an integrated Question Answering environment that combines a visual analytics tool for unstructured text with a state-of-the-art query expansion tool designed to complement the cognitive processes associated with an information analyst's workflow. Analysts are seldom looking for factoid answers to simple questions; their information needs are much more complex: they may be interested in patterns of answers over time or in conflicting information, and even related non-answer data may be critical to learning about a problem or reaching prudent conclusions. In our visual analytics tool, questions result in a comprehensive answer space that allows users to explore the variety within the answers and spot related information in the rest of the data. The exploratory nature of the dialog between the user and this system requires tailored evaluation methods that better address the evolving user goals and counter cognitive biases inherent to exploratory search tasks.

Generating Graphs for Visual Analytics through Interactive Sketching

Wong PC, HP Foote, PS Mackey, KA Perrine, and G Chin, Jr. 2006. "Generating Graphs for Visual Analytics through Interactive Sketching." IEEE Transactions on Visualization and Computer Graphics 12(6). doi:10.1109/TVCG.2006.91

Abstract:

We introduce an interactive graph generator, GreenSketch, designed to facilitate the creation of descriptive graphs required for different visual analytics tasks. The human-centric design approach of GreenSketch enables users to master the creation process without specific training or prior knowledge of graph model theory. The customized user interface encourages users to gain insight into the connection between the compact matrix representation and the topology of a graph layout when they sketch their graphs. Both the human-enforced and machine-generated randomnesses supported by GreenSketch provide the flexibility needed to address the uncertainty factor in many analytical tasks. This paper describes over two dozen examples that cover a wide variety of graph creations from a single line of nodes to a real-life small-world network that describes a snapshot of telephone connections. While the discussion focuses mainly on the design of GreenSketch, we include a case study that applies the technology in a visual analytics environment and a usability study that evaluates the strengths and weaknesses of our design approach.

Graph Signatures for Visual Analytics

Wong PC, HP Foote, G Chin, Jr, PS Mackey, and KA Perrine. 2006. "Graph Signatures for Visual Analytics." IEEE Transactions on Visualization and Computer Graphics 12(6). doi:10.1109/TVCG.2006.92

Abstract:

We present a visual analytics technique to explore graphs using the concept of a data signature. A data signature, in our context, is a multidimensional vector that captures the local topology information surrounding each graph node. Signature vectors extracted from a graph are projected onto a low-dimensional scatterplot through the use of scaling. The resultant scatterplot, which reflects the similarities of the vectors, allows analysts to examine the graph structures and their corresponding real-life interpretations through repeated use of brushing and linking between the two visualizations. The interpretation of the graph structures is based on the outcomes of multiple participatory analysis sessions with intelligence analysts conducted by the authors at the Pacific Northwest National Laboratory. The paper first uses three public domain datasets with either well-known or obvious features to explain the rationale of our design and illustrate its results. More advanced examples are then used in a customized usability study to evaluate the effectiveness and efficiency of our approach. The study results reveal not only the limitations and weaknesses of the traditional approach based solely on graph visualization but also the advantages and strengths of our signature-guided approach presented in the paper.
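A rough sketch of the signature idea, not the authors' implementation: collect a few local-topology statistics per node and project the vectors to a 2D scatterplot with multidimensional scaling. The particular features below are assumptions for illustration:

```python
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

G = nx.les_miserables_graph()            # any sparse graph will do

# Assumed per-node signature: degree, clustering coefficient, and the
# mean degree of the node's neighbors (local topology only).
cc = nx.clustering(G)
signatures = np.array([
    [G.degree(n), cc[n], np.mean([G.degree(m) for m in G[n]])]
    for n in G.nodes()
])

# Project signature vectors onto 2D via multidimensional scaling; nodes
# with similar local structure land near each other in the scatterplot.
xy = MDS(n_components=2, random_state=0).fit_transform(signatures)
plt.scatter(xy[:, 0], xy[:, 1], s=15)
plt.show()
```

Brushing a cluster in this scatterplot and highlighting the same nodes in the graph view is the linking step the abstract describes.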

Have Green - A Visual Analytics Framework for Large Semantic Graphs

Wong PC, G Chin, Jr, HP Foote, PS Mackey, and JJ Thomas. 2006. "Have Green - A Visual Analytics Framework for Large Semantic Graphs." In IEEE Symposium on Visual Analytics Science and Technology, pp 67-74. Baltimore, Maryland, October 31-November 2, 2006.

Abstract:

A semantic graph is a network of heterogeneous nodes and links annotated with a domain ontology. In intelligence analysis, investigators use semantic graphs to organize concepts and relationships as graph nodes and links in hopes of discovering key trends, patterns, and insights. However, as new information continues to arrive from a multitude of sources, the size and complexity of the semantic graphs will soon overwhelm an investigator's cognitive capacity to carry out significant analyses. We introduce a powerful visual analytics framework designed to enhance investigators' natural analytical capabilities to comprehend and analyze large semantic graphs. The paper describes the overall framework design, presents major development accomplishments to date, and discusses future directions of a new visual analytics system known as Have Green.

Walking the Path: A New Journey to Explore and Discover through Visual Analytics

Wong PC, SJ Rose, G Chin, Jr, D Frincke, RA May, II, C Posse, AP Sanfilippo, and JJ Thomas. 2006. "Walking the Path: A New Journey to Explore and Discover through Visual Analytics." Information Visualization 5(4):237-249. doi:10.1057/palgrave.ivs.9500133

Abstract:

Visual representations are essential aids to human cognitive tasks and are valued to the extent that they provide stable and external reference points upon which dynamic activities and thought processes may be calibrated and upon which models and theories can be tested and confirmed. The active use and manipulation of visual representations makes many complex and intensive cognitive tasks feasible. As described in the recently published "Illuminating the Path", visual analytics is "the science of analytical reasoning facilitated by interactive visual interfaces." We describe research and development at PNNL focused on improving the value that interactive visual representations provide to persons engaged in complex cognitive tasks. This work carries forward research from multiple disciplines with the goal of improving the capability of visual representations, and we present examples whose aim is to improve the extraction of, and reasoning about, information, knowledge, and data.

2005

A Typology for Visualizing Uncertainty

Thomson JR, EG Hetzler, A MacEachren, MN Gahegan, and M Pavel. 2005. "A Typology for Visualizing Uncertainty." In Visualization and Data Analysis 2005, Proceedings of the SPIE, vol. 5669, pp. 146-157. SPIE/IS&T, San Jose, CA.

Abstract:

Information analysts must rapidly assess information to determine its usefulness in supporting and informing decision makers. In addition to assessing the content, the analyst must also be confident about the quality and veracity of the information. Visualizations can concisely represent vast quantities of information, thus aiding the analyst in examining larger quantities of material; however, visualization programs are challenged to incorporate a notion of confidence or certainty because the factors that influence the certainty or uncertainty of information vary with the type of information and the type of decisions being made. For example, the assessment of potentially subjective human-reported data leads to a large set of uncertainty concerns in fields such as national security, law enforcement (witness reports), and even scientific analysis where data is collected from a variety of individual observers. What is needed is a formal model or framework for describing uncertainty as it relates to information analysis, to provide a consistent basis for constructing visualizations of uncertainty. This paper proposes an expanded typology for uncertainty, drawing from past frameworks targeted at scientific computing. The typology provides general categories for analytic uncertainty, a framework for creating task-specific refinements to those categories, and examples drawn from the national security field.

Bioinformatic Insights from Metagenomics through Visualization

Havre SL, BM Webb-Robertson, A Shah, C Posse, B Gopalan, and FJ Brockman. 2005. "Bioinformatic Insights from Metagenomics through Visualization." In Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB 2005). August 8-11, 2005, pp. 341-350. IEEE Computer Society, Los Alamitos, CA.

Abstract:

Cutting-edge biological and bioinformatics research seeks a systems perspective through the analysis of multiple types of high-throughput and other experimental data for the same sample. Systems-level analysis requires the integration and fusion of such data, typically through advanced statistics and mathematics. Visualization is a complementary computational approach that supports integration and analysis of complex data or its derivatives. We present a bioinformatics visualization prototype, Juxter, which depicts categorical information derived from or assigned to these diverse data for the purpose of comparing patterns across categorizations. The visualization allows users to easily discern correlated and anomalous patterns in the data. These patterns, which might not be detected automatically by algorithms, may reveal valuable information leading to insight and discovery. We describe the visualization and interaction capabilities and demonstrate its utility in a new field, metagenomics, which combines molecular biology and genetics to identify and characterize genetic material from multi-species microbial samples.

Building a Human Information Discourse Interface to Uncover Scenario Content

Sanfilippo AP, BL Baddeley, AJ Cowell, ML Gregory, RE Hohimer, and SC Tratz. 2005. "Building a Human Information Discourse Interface to Uncover Scenario Content." In 2005 International Conference on Intelligence Analysis. MITRE, McLean, VA.

Dynamic Visualization of Graphs with Extended Labels

Wong PC, PS Mackey, KA Perrine, JR Eagan, HP Foote, and J Thomas. 2005. "Dynamic Visualization of Graphs with Extended Labels." In 2005 IEEE Symposium on Information Visualization, Los Alamitos, CA, October 2005, pp. 73-80. IEEE, Piscataway, NJ.

Abstract:

The paper describes a novel technique to visualize graphs with extended node and link labels. The lengths of these labels range from a short phrase to a full sentence to an entire paragraph and beyond. Our solution is different from all the existing approaches that almost always rely on intensive computational effort to optimize the label placement problem. Instead, we share the visualization resources with the graph and present the label information in static, interactive, and dynamic modes without the requirement for tackling the intractability issues. This allows us to reallocate the computational resources for dynamic presentation of real-time information. The paper includes a user study to evaluate the effectiveness and efficiency of the visualization technique.

Extending the Reach of Augmented Cognition To Real-World Decision Making Tasks

Greitzer FL. 2005. "Extending the Reach of Augmented Cognition To Real-World Decision Making Tasks." In Augmented Cognition International Conference. HCI-International, Las Vegas.

Abstract:

The focus of this paper is on the critical challenge of bridging the gap between psychophysiological sensor data and the inferred cognitive states of users. It is argued that a more robust behavioral data collection foundation will facilitate accurate inferences about the state of the user so that an appropriate mitigation strategy, if needed, can be applied. The argument for such a foundation is based on two premises: (1) To realize the envisioned impact of augmented cognition systems, the technology should be applied to a broad, and more cognitively complex, range of real-world problems. (2) To support identifying cognitive states for more complex, real-world tasks, more sophisticated instrumentation will be needed for behavioral data collection. It is argued that such instrumentation would enable inferences to be made about higher-level semantic aspects of performance. The paper describes how instrumentation software developed to support information analysis R&D may serve as an integration environment that can provide additional behavioral data, in context, to facilitate inferences of cognitive state that will enable the successful augmenting of cognitive performance.

InfoStar: An Adaptive Visual Analytics Platform for Mobile Devices

Sanfilippo AP, RA May, II, GR Danielson, RM Riensche, and BL Baddeley. 2005. "InfoStar: An Adaptive Visual Analytics Platform for Mobile Devices." In First International Workshop on Managing Context Information in Mobile and Pervasive Environments. CEUR-WS.org, Ayia Napa, Cyprus.

Abstract:

We present the design and implementation of InfoStar, an adaptive Visual Analytics platform for mobile devices such as PDAs, laptops, Tablet PCs and mobile phones. InfoStar extends the reach of visual analytics technology beyond the traditional desktop paradigm to provide ubiquitous access to interactive visualizations of information spaces. These visualizations are critical in addressing the knowledge needs of human agents operating in the field, in areas as diverse as business, homeland security, law enforcement, protective services, emergency medical services and scientific discovery. We describe an initial real-world deployment of this technology, in which the InfoStar platform has been used to offer mobile access to scheduling and venue information to conference attendees at Supercomputing 2004.

Metrics and Measures for Intelligence Analysis Task Difficulty

Greitzer FL, and KM Allwein. 2005. "Metrics and Measures for Intelligence Analysis Task Difficulty." In First International Conference on Intelligence Analysis Methods and Tools. MITRE Corp, McLean, VA.

Abstract:

Recent workshops and conferences supporting the intelligence community (IC) have highlighted the need to characterize the difficulty or complexity of intelligence analysis (IA) tasks in order to facilitate assessments of the impact or effectiveness of IA tools that are being considered for introduction into the IC. Some fundamental issues are: (a) how to employ rigorous methodologies in evaluating tools, given a host of problems such as controlling for task difficulty, effects of time or learning, small-sample size limitations; (b) how to measure the difficulty/complexity of IA tasks in order to establish valid experimental/quasi-experimental designs aimed to support evaluation of tools; and (c) development of more rigorous (summative), performance-based measures of human performance during the conduct of IA tasks, beyond the more traditional reliance on formative assessments (e.g., subjective ratings). Invited discussants will be asked to comment on one or more of these issues, with the aim of bringing the most salient issues and research needs into focus.

New Challenges Facing Integrative Biological Science in the Post-Genomic Era

Oehmen CS, T Straatsma, GA Anderson, G Orr, BM Webb-Robertson, RC Taylor, RW Mooney, DJ Baxter, DR Jones, and DA Dixon. 2005. "New Challenges Facing Integrative Biological Science in the Post-Genomic Era." Journal of Biological Systems.

Abstract:

The future of biology will be increasingly driven by the fundamental paradigm shift from hypothesis-driven research to data-driven discovery research employing the massive amounts of available biological data. We identify key technological developments needed to enable this paradigm shift involving (1) the ability to store and manage extremely large datasets which are dispersed over a wide geographical area, (2) development of novel analysis and visualization tools which are capable of operating on enormous data resources without overwhelming researchers with unusable information, and (3) formalisms for integrating mathematical models of biosystems from the molecular level to the organism population level. This will require the development of tools which efficiently utilize high-performance compute power, large storage infrastructures and large aggregate memory architectures. The end result will be the ability of a researcher to integrate complex data from many different sources with simulations to analyze a given system at a wide range of temporal and spatial scales in a single conceptual model.

Turning the Bucket of Text into a Pipe

Hetzler EG, VL Crow, DA Payne, and AE Turner. 2005. "Turning the Bucket of Text into a Pipe." In Proceedings of the IEEE Symposium on Information Visualization. INFOVIS 2005. 23-25 Oct. 2005, pp. 89-94. IEEE, Los Alamitos, CA.

Abstract:

Many visual analysis tools operate on a fixed set of data. However, professional information analysts follow issues over a period of time, and need to be able to easily add the new documents to an ongoing exploration. Some analysts handle documents in a moving window of time, with new documents constantly added and old ones aging out. This paper describes both the user interaction and the technical implementation approach for a visual analysis system designed to support constantly evolving text collections.

Scientist-Centered Graph-Based Models of Scientific Knowledge

Chin G, Jr, EG Stephan, DK Gracio, OA Kuchar, PD Whitney, and KL Schuchardt. 2005. "Scientist-Centered Graph-Based Models of Scientific Knowledge." In HCI International 2005, 11th International Conference on Human-Computer Interaction, July 22-27, 2005, Las Vegas, Nevada, 10 pp. Lawrence Erlbaum Associates, Mahwah, NJ.

Abstract:

At the Pacific Northwest National Laboratory, we are researching and developing visual models and paradigms that will allow scientists to capture and represent conceptual models in a computational form that may be linked to and integrated with scientific data sets and applications. Captured conceptual models may be logical in conveying how individual concepts tie together to form a higher theory, analytical in conveying intermediate or final analysis results, or temporal in describing the experimental process in which concepts are physically and computationally explored. In this paper, we describe and contrast three different research and development systems that allow scientists to capture and interact with computational graph-based models of scientific knowledge. Through these examples, we explore and examine ways in which researchers may graphically encode and apply scientific theory and practice on computer systems.

Top Ten Needs for Intelligence Analysis Tool Development

Badalamente RV, and FL Greitzer. 2005. "Top Ten Needs for Intelligence Analysis Tool Development." In First International Conference on Intelligence Analysis Methods and Tools. MITRE Corp, McLean, VA.

Abstract:

The purpose of this paper is to report on the results of R&D to generate ideas about future enhancements to software systems designed to aid the process of intelligence analysis (IA). Use of IA tools in actual settings has revealed significant problems: the user's thought process has not been adequately modeled and is therefore not reflected in the design of analysis tools; users find the tools difficult to learn and use; the tools are not tailored to specific intelligence domains; the tools do not offer an integrated approach (data preprocessing/ingest is a particular problem); the tools do not address the longitudinal nature (continuing over extended periods of time) of the general analysis problem. The aim of this work was to establish an enduring, well-integrated, robust technical foundation for the development and deployment of information-technology (IT)-based IA tools recognized by users and clients as uniquely well designed to meet their varied analysis needs. An overarching strategy or "roadmap" is needed to guide technology development, and a more accurate understanding is needed about how real intelligence analysts do their job. To address these needs, we conducted a facilitated workshop with nine working analysts. An intelligence analysis process model was developed and discussed with the analysts as a point of departure for the discussion. Participants worked in break-out groups to discuss concepts for tools and enhanced products to aid in the IA process. The top ten enhancements identified during the workshop were: seamless data access and ingest; diverse data ingest and fusion; shared electronic folders for collaborative analysis; hypothesis generation and tracking; template for analysis strategy; electronic skills inventory; dynamic data processing and visualization; intelligent tutor for intelligence product development; imagery data resources; intelligence analysis knowledge base. This paper and presentation will discuss the conduct of the workshop and the results obtained.

Toward the Development of Cognitive Task Difficulty Metrics to Support Intelligence Analysis Research

Greitzer FL. 2005. "Toward the Development of Cognitive Task Difficulty Metrics to Support Intelligence Analysis Research." In The Fourth IEEE Conference on Cognitive Informatics, Aug. 8-10, 2005. ICCI 2005, pp. 315-320. Institute of Electrical and Electronics Engineers, Piscataway, NJ.

Abstract:

Intelligence analysis is a cognitively complex task that is the subject of considerable research aimed at developing methods and tools to aid the analysis process. To support such research, it is necessary to characterize the difficulty or complexity of intelligence analysis tasks in order to facilitate assessments of the impact or effectiveness of tools that are being considered for deployment. A number of informal accounts of "What makes intelligence analysis hard" are available, but there has been no attempt to establish a more rigorous characterization with well-defined difficulty factors or dimensions. This paper takes an initial step in this direction by describing a set of proposed difficulty metrics based on cognitive principles.

Visual Sample Plan (VSP) Software: Designs and Data Analyses for Sampling Contaminated Buildings

Pulsipher BA, JE Wilson, RO Gilbert, LL Nuffer, and NL Hassig. 2005. "Visual Sample Plan (VSP) Software: Designs and Data Analyses for Sampling Contaminated Buildings." In Proceedings of 24th Annual National Conference on Managing Environmental Quality Systems, vol. 24-2-2, pp. 24-34. US EPA, Washington, DC.

Abstract:

A new module of the Visual Sample Plan (VSP) software has been developed to provide sampling designs and data analyses for potentially contaminated buildings. An important application is assessing levels of contamination in buildings after a terrorist attack. This new module, funded by DHS through the Combating Terrorism Technology Support Office, Technical Support Working Group, was developed to provide a tailored, user-friendly and visually oriented buildings module within the existing VSP software toolkit, the latest version of which can be downloaded from http://dqo.pnl.gov/vsp. In case of, or when planning against, a chemical, biological, or radionuclide release within a building, the VSP module can be used to quickly and easily develop and visualize technically defensible sampling schemes for walls, floors, ceilings, and other surfaces to statistically determine whether contamination is present, its magnitude and extent throughout the building, and whether decontamination has been effective. This paper demonstrates the features of this new VSP buildings module, which include: the ability to import building floor plans or to easily draw, manipulate, and view rooms in several ways; the ability to insert doors, windows and annotations into a room; and 3-D graphic room views with surfaces labeled and floor plans that show building zones that have separate air handling units. The paper also discusses the statistical design and data analysis options available in the buildings module. Design objectives supported include comparing an average to a threshold when the data distribution is normal or unknown, and comparing measurements to a threshold to detect hotspots or to ensure most of the area is uncontaminated when the data distribution is normal or unknown.
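The "compare an average to a threshold" objective corresponds to a standard one-sample test. A minimal sketch with hypothetical measurements, not VSP itself (VSP also supports nonparametric designs when the distribution is unknown):

```python
import numpy as np
from scipy import stats

threshold = 50.0   # hypothetical action level for a surface (units assumed)
measurements = np.array([42.1, 47.5, 39.8, 51.2, 44.6, 40.3])

# One-sided one-sample t-test of H0: mean >= threshold vs H1: mean < threshold,
# appropriate when the data distribution is approximately normal (scipy >= 1.6).
result = stats.ttest_1samp(measurements, popmean=threshold, alternative="less")
print(f"t = {result.statistic:.2f}, one-sided p = {result.pvalue:.4f}")
```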

2004

Analysis Experiences Using Information Visualization

Hetzler EG, and AE Turner. 2004. "Analysis Experiences Using Information Visualization." IEEE Computer Graphics and Applications 24(5):22-26.

Abstract:

To deliver truly useful tools, researchers must learn how to map between the knowledge domains inherent in information collections and the knowledge domains in users' minds. The true measure of this work is not what the software shows, but what the user is able to understand by using it. This article summarizes lessons learned from an observational study of the application of the IN-SPIRE visually oriented text exploitation system in an operational analysis environment.

Supporting Mutual Understanding in a Visual Dialogue Between Analyst and Computer

Chappell AR, AJ Cowell, DA Thurman, and JR Thomson. 2004. "Supporting Mutual Understanding in a Visual Dialogue Between Analyst and Computer." In HFES 2004: Proceedings of the Human Factors and Ergonomics Society 48th Annual Meeting, September 20-24, 2004, New Orleans, Louisiana. Human Factors and Ergonomics Society, Santa Monica, CA.

Abstract:

The Knowledge Associates for Novel Intelligence (KANI) project is developing a system of automated "associates" to actively support and participate in the information analysis task. The primary goal of KANI is to use automatically extracted information in a reasoning system that draws on the strengths of both a human analyst and automated reasoning. The interface between the two agents is a key element in achieving this goal. The KANI interface seeks to support a visual dialogue with mixed-initiative manipulation of information and reasoning components. To be successful, the interface must achieve mutual understanding between the analyst and KANI of the other's actions. Toward this mutual understanding, KANI allows the analyst to work at multiple levels of abstraction over the reasoning process, links the information presented across these levels to make use of interaction context, and provides querying facilities to allow exploration and explanation.

Visual Analytics

Wong PC, and J Thomas. 2004. "Visual Analytics." IEEE Computer Graphics and Applications 24(5):20-21.

Excerpt:

The information revolution is upon us, and it's guaranteed to change our lives and the way we conduct our daily business. The fact that we have to deal with not just the size but also the variety and complexity of this information makes it a real challenge to survive the revolution. Enter visual analytics, a contemporary and proven approach to combine the art of human intuition and the science of mathematical deduction to directly perceive patterns and derive knowledge and insight from them.

Visual analytics is the formation of abstract visual metaphors in combination with a human information discourse (interaction) that enables detection of the expected and discovery of the unexpected within massive, dynamically changing information spaces. These suites of technologies apply to almost all fields but are being driven by critical needs in biology and national security...

Visualizing Data Streams

Wong PC, HP Foote, DR Adams, WE Cowley, LR Leung, and JJ Thomas. 2004. "Visualizing Data Streams." Chapter 11 in Visual and Spatial Analysis: Advances in Data Mining, Reasoning, and Problem Solving, ed. Boris Kovalerchuk and James Schwing, pp. 265-291, 568-571. Springer, Dordrecht, Netherlands.

Abstract:

We introduce two dynamic visualization techniques using multi-dimensional scaling to analyze transient data streams such as newswires and remote sensing imagery. While the time-sensitive nature of these data streams requires immediate attention in many applications, the unpredictable and unbounded characteristics of this information can potentially overwhelm many scaling algorithms that require a full re-computation for every update. We present an adaptive visualization technique based on data stratification to ingest stream information adaptively when influx rate exceeds processing rate. We also describe an incremental visualization technique based on data fusion to project new information directly onto a visualization subspace spanned by the singular vectors of the previously processed neighboring data. The ultimate goal is to leverage the value of legacy and new information and minimize re-processing of the entire dataset in full resolution. We demonstrate these dynamic visualization results using a newswire corpus and a remote sensing imagery sequence.
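The incremental technique can be approximated as follows: decompose the already-processed data once, then place new stream items by projecting them onto the retained singular vectors rather than recomputing the scaling. A numpy sketch under that assumption, with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
legacy = rng.normal(size=(500, 50))      # previously processed data (rows = items)

# One-time decomposition of the legacy data; keep a low-dimensional basis.
center = legacy.mean(axis=0)
_, _, Vt = np.linalg.svd(legacy - center, full_matrices=False)
basis = Vt[:2]                           # top singular vectors span the subspace

# New stream items are fused into the existing layout by projection alone,
# avoiding a full re-computation of the scaling for every update.
new_items = rng.normal(size=(10, 50))
coords = (new_items - center) @ basis.T  # (10, 2) positions in the plot
```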

2003

Dynamic Visualization of Transient Data Streams

Wong PC, HP Foote, DR Adams, WE Cowley, and JJ Thomas. 2003. "Dynamic Visualization of Transient Data Streams." In Proceedings of the IEEE Symposium on Information Visualization 2003 (InfoVis 2003), Seattle, WA.

Abstract:

We introduce two dynamic visualization techniques using multi-dimensional scaling to analyze transient data streams such as newswires and remote sensing imagery. While the time-sensitive nature of these data streams requires immediate attention in many applications, the unpredictable and unbounded characteristics of this information can potentially overwhelm many scaling algorithms that require a full re-computation for every update. We present an adaptive visualization technique based on data stratification to ingest stream information adaptively when influx rate exceeds processing rate. We also describe an incremental visualization technique based on data fusion to project new information directly onto a visualization subspace spanned by the singular vectors of the previously processed neighboring data. The ultimate goal is to leverage the value of legacy and new information and minimize re-processing of the entire dataset in full resolution. We demonstrate these dynamic visualization results using a newswire corpus and a remote sensing imagery sequence.

Global Visualization and Alignments of Whole Bacterial Genomes

Wong PC, K Wong, HP Foote, and JJ Thomas. 2003. "Global Visualization and Alignments of Whole Bacterial Genomes." IEEE Transactions on Visualization and Computer Graphics 9(3):361-377.

Abstract:

We present a novel visualization technique to align whole bacterial genomes with millions of nucleotides. Our basic design combines the descriptive power of pixel-based visualizations with the interpretative strength of digital image-processing filters. The innovative use of pixel enhancement techniques on pixel-based visualizations brings out the best of the recursive data patterns and further enhances the effectiveness of the visualization techniques. The result is a fast, versatile, and cost-effective analysis tool to reveal the functional identifications and the phenotypic changes of whole bacterial genomes. Our experiments show that our visualization-based genome alignment technique outperforms other computational-based tools in processing time. They also show that our pictorial results are far superior to the hardcopy printouts generated by computation-based programs in studying the overall genomic structures. Six different bacterial genomes obtained from public genome banks are used to demonstrate our designs and measure their performances.
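The pixel-based side of the design can be sketched by mapping each nucleotide to a gray level and wrapping the sequence into a 2D raster, which standard image filters can then enhance. A toy version with a random sequence and an assumed color mapping:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
sequence = rng.choice(list("ACGT"), size=256 * 256)   # stand-in for a genome

# Assumed mapping of bases to gray levels; one pixel per nucleotide, with
# the sequence wrapped row by row into a square raster.
level = {"A": 0.0, "C": 0.33, "G": 0.66, "T": 1.0}
pixels = np.array([level[b] for b in sequence]).reshape(256, 256)

plt.imshow(pixels, cmap="viridis", interpolation="nearest")
plt.title("Pixel-based sequence view (toy data)")
plt.show()
```

On real genomes, repeated regions show up as visible texture in such a raster, which is what the image-processing filters amplify.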

2002

Multivariate Visualization with Data Fusion

Wong PC, HP Foote, DL Kao, LR Leung, and JJ Thomas. 2002. "Multivariate Visualization with Data Fusion." In Information Visualization, vol. 1, no. 3/4, ed. Chaomei Chen, pp. 182-193. Macmillan, Hampshire, United Kingdom.

Abstract:

We discuss a fusion-based visualization method to analyze a 2D flow field together with its related scalars. The primary difference between a conventional visualization and a fusion-based visualization is that the former draws on a single image whereas the latter draws on multiple see-through layers, which are then overlaid on each other to form the final visualization. We propose uniquely designed colormaps to highlight flow features that would not be shown with conventional colormaps. We present fusion techniques that integrate multiple single-purpose flow visualization techniques into the same viewing space. Our highly flexible fusion approach allows scientists to explore multiple parameters concurrently by mixing and matching images without frequently reconstructing new visualizations from the data for every possible combination. Sample datasets collected from a climate modeling study are used to demonstrate our approach.
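The layered-fusion idea amounts to alpha-compositing independently rendered views of the same domain. A minimal matplotlib sketch with synthetic scalar and flow fields, illustrative only:

```python
import numpy as np
import matplotlib.pyplot as plt

y, x = np.mgrid[0:2 * np.pi:40j, 0:2 * np.pi:40j]
scalar = np.sin(x) * np.cos(y)           # synthetic scalar field
u, v = np.cos(x), -np.sin(y)             # synthetic 2D flow field

# Layer 1: the scalar field as a semi-transparent colormap image.
plt.imshow(scalar, extent=(0, 2 * np.pi, 0, 2 * np.pi), origin="lower",
           cmap="coolwarm", alpha=0.6)
# Layer 2: the flow field drawn on top; see-through layers can be mixed
# and matched without re-rendering either visualization.
plt.quiver(x[::3, ::3], y[::3, ::3], u[::3, ::3], v[::3, ::3], alpha=0.8)
plt.show()
```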

ThemeRiver: Visualizing Thematic Changes in Large Document Collections

Havre S, E Hetzler, P Whitney, and L Nowell. 2002. "ThemeRiver: Visualizing Thematic Changes in Large Document Collections." IEEE Transactions on Visualization and Computer Graphics 8(1), January-March 2002.

Abstract:

The ThemeRiver visualization depicts thematic variations over time within a large collection of documents. The thematic changes are shown in the context of a timeline and corresponding external events. The focus on temporal thematic change within a context framework allows a user to discern patterns that suggest relationships or trends. For example, a sudden change of thematic strength following an external event may indicate a causal relationship. Such patterns are not readily accessible in other visualizations of the data. We use a river metaphor to convey several key notions. The document collection's timeline, selected thematic content, and thematic strength are indicated by the river's directed flow, composition, and changing width, respectively. The directed flow from left to right is interpreted as movement through time, and the horizontal distance between two points on the river defines a time interval. At any point in time, the vertical distance, or width, of the river indicates the collective strength of the selected themes. Colored "currents" flowing within the river represent individual themes. A current's vertical width narrows or broadens to indicate decreases or increases in the strength of the individual theme.
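The river itself is essentially a stacked stream of per-theme strengths over time, centered on a horizontal axis. matplotlib's stackplot with a symmetric baseline gives a rough approximation, using synthetic theme strengths:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(48)                        # time bins across the collection
rng = np.random.default_rng(2)
# Hypothetical per-theme strengths over time (e.g., term frequencies).
themes = [np.abs(rng.normal(3, 1, t.size) + np.sin(t / 5 + i))
          for i in range(3)]

# baseline="sym" stacks the bands symmetrically around zero, a layout that
# matplotlib's documentation itself calls "ThemeRiver"; the total width at
# any time is the collective strength of the selected themes.
plt.stackplot(t, themes, labels=["theme A", "theme B", "theme C"],
              baseline="sym", alpha=0.8)
plt.legend(loc="upper right")
plt.xlabel("time")
plt.show()
```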

2001

Change Blindness in Information Visualization: A Case Study

Nowell LT, EG Hetzler, and TE Tanasse. 2001. "Change Blindness in Information Visualization: A Case Study." In Proceedings of the IEEE Information Visualization Symposium 2001 (InfoVis 2001), October 22-23, 2001, San Diego, CA.

Interactive Visualization of Multiple Query Results

Havre S, E Hetzler, K Perrine, E Jurrus, and N Miller. 2001. "Interactive Visualization of Multiple Query Results." In Proceedings of the IEEE Information Visualization Symposium 2001 (InfoVis 2001), October 22-23, 2001, San Diego, CA.

Abstract:

This paper introduces a graphical method for visually presenting and exploring the results of multiple queries simultaneously. This method allows a user to visually compare multiple query result sets, explore various combinations among the query result sets, and identify the “best” matches for combinations of multiple independent queries. This approach might also help users explore methods for progressively improving queries by visually comparing the improvement in result sets.

Radical SAM, A Novel Protein Superfamily Linking Unresolved Steps in Familiar Biosynthetic Pathways with Radical Mechanisms: Functional Characterization Using New Analysis and Information Visualization Methods

Sofia HJ, G Chen, EG Hetzler, JF Reyes Spindola, and NE Miller. 2001. "Radical SAM, A Novel Protein Superfamily Linking Unresolved Steps in Familiar Biosynthetic Pathways with Radical Mechanisms: Functional Characterization Using New Analysis and Information Visualization Methods." Nucleic Acids Research 29(5):1097-1106.

Abstract:

A large protein superfamily with over 500 members has been discovered and analyzed using powerful new bioinformatics and information visualization methods. Evidence exists that these proteins generate a 5′-deoxyadenosyl radical by reductive cleavage of S-adenosylmethionine (SAM) through an unusual Fe-S center. Radical SAM superfamily proteins function in DNA precursor, vitamin, cofactor, antibiotic, and herbicide biosynthesis in a collection of basic and familiar pathways. One of the members is interferon-inducible and is considered a candidate drug target for osteoporosis. The identification of this superfamily suggests that radical-based catalysis is important in a number of previously well-studied but unresolved biochemical pathways.

2000

Data Signatures and Visualization of Very Large Datasets

Wong PC, H Foote, R Leung, D Adams, and J Thomas. 2000. Data Signatures and Visualization of Very Large Datasets. IEEE Computer Graphics and Applications, Vol 20, No 2, March 2000.

Abstract:

Today, as data sets used in computations grow in size and complexity, the technologies developed over the years to deal with scientific data sets have become less efficient and effective. Many frequently used operations, such as eigenvector computation, could quickly exhaust our desktop workstations once the data size reaches certain limits.

On the other hand, the high-dimensional data sets we collect every day do nothing to relieve the problem. Many conventional metric designs that build on quantitative or categorical data sets cannot be applied directly to heterogeneous data sets with multiple data types. While building new machines with more resources might conquer the data size problems, the complexity of today's computations requires a new breed of projection techniques to support analysis of the data and verification of the results.

We introduce the concept of a data signature, which captures the essence of a scientific data set in a compact format, and use it to conduct analysis as if using the original. We demonstrate our approach and present the results using a time-dependent climate simulation data set.
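A data signature can be as simple as a fixed-length vector of summary statistics computed per block of the original data, with later analysis operating on the signatures instead of the full set. A toy sketch; the choice of statistics here is an assumption:

```python
import numpy as np

def signature(block):
    """Assumed signature: mean, std, min, max, and the three quartiles."""
    q1, q2, q3 = np.percentile(block, [25, 50, 75])
    return np.array([block.mean(), block.std(), block.min(), block.max(),
                     q1, q2, q3])

rng = np.random.default_rng(3)
data = rng.normal(size=1_000_000)        # stand-in for a very large data set
blocks = data.reshape(1000, 1000)        # partition the data into blocks

# 1000 signatures of length 7 stand in for a million raw values downstream.
signatures = np.apply_along_axis(signature, 1, blocks)
print(signatures.shape)                  # (1000, 7)
```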

DriftWeed - A Visual Metaphor for Interactive Analysis of Multivariate Data

Rose S and PC Wong. 2000. DriftWeed - A Visual Metaphor for Interactive Analysis of Multivariate Data. Proceedings IS&T/SPIE Conference on Visual Data Exploration and Analysis, San Jose, CA, Jan 2000.

Abstract:

We present a visualization technique that allows a user to identify and detect patterns and structures within a multivariate data set. Our research builds on previous efforts to represent multivariate data in a two-dimensional information display through the use of icon plots. Although the icon plot work done by Pickett and Grinstein is similar to our approach, we improve on their efforts in several ways.

Our technique allows analysis of a time series without using animation; promotes visual differentiation of information clusters based on measures of variance; and facilitates exploration through direct manipulation of geometry based on scales of variance.

Our goal is to provide a visualization that implicitly conveys the degree to which an element's ordered collection (pattern) of attributes varies from the prevailing pattern of attributes for other elements in the collection. We apply this technique to multivariate abstract data and use it to locate exceptional elements in a data set and divisions among clusters.

ThemeRiver: Visualizing Theme Changes over Time

Havre S, B Hetzler, and L Nowell. 2000. "ThemeRiver: Visualizing Theme Changes over Time", Proceedings of IEEE Symposium on Information Visualization, InfoVis 2000, pp. 115-123.

Abstract:

ThemeRiver™ is a prototype system that visualizes thematic variations over time within a large collection of documents. The "river" flows from left to right through time, changing width to depict changes in thematic strength of temporally associated documents. Colored "currents" flowing within the river narrow or widen to indicate decreases or increases in the strength of an individual topic or a group of topics in the associated documents. The river is shown within the context of a timeline and a corresponding textual presentation of external events.

Vector Fields Simplification - A Case Study of Visualizing Climate Modeling and Simulation Data Sets

Wong PC, H Foote, R Leung, E Jurrus, D Adams, and J Thomas. 2000. Vector Fields Simplification - A Case Study of Visualizing Climate Modeling and Simulation Data Sets. Proceedings IEEE Visualization 2000. Salt Lake City, Utah, Oct 8 - Oct 13, 2000.

Abstract:

In our study of regional climate modeling and simulation, we frequently encounter vector fields that are crowded with large numbers of critical points. A critical point in a flow is where the vector field vanishes. While these critical points accurately reflect the topology of the vector fields, in our study only a subset of them is worth further investigation. We present a filtering technique based on the vorticity of the vector fields to eliminate the less interesting and sometimes sporadic critical points in a multi-resolution fashion. The neighboring regions of the preserved features, which are characterized by strong shear and circulation, are potential locations of weather instability. We apply our feature-filtering technique to a regional climate modeling data set covering East Asia in the summer of 1991.
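Vorticity-based filtering can be sketched directly: estimate the curl of the 2D field with finite differences, find grid cells where the vector magnitude nearly vanishes, and keep only those embedded in strong circulation. Illustrative only; the field and thresholds below are assumptions:

```python
import numpy as np

y, x = np.mgrid[-3:3:200j, -3:3:200j]
u, v = -y + 0.1 * x, x + 0.1 * y         # synthetic swirling flow

# Vorticity of a 2D field is dv/dx - du/dy, estimated by finite differences.
spacing = 6 / 199
vorticity = np.gradient(v, spacing, axis=1) - np.gradient(u, spacing, axis=0)

# Candidate critical points: grid cells where the vector magnitude vanishes.
speed = np.hypot(u, v)
candidates = np.argwhere(speed < 0.05)

# Keep only candidates embedded in strong circulation; the rest are filtered.
kept = [tuple(p) for p in candidates if abs(vorticity[tuple(p)]) > 1.0]
print(f"{len(candidates)} candidates, {len(kept)} kept")
```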

Visualizing Sequential Patterns for Text Mining

Wong PC, W Cowley, H Foote, E Jurrus, and J Thomas. 2000. Visualizing Sequential Patterns for Text Mining. Proceedings IEEE Information Visualization 2000, Salt Lake City, Utah, Oct 8 - Oct 13, 2000.

Abstract:

A sequential pattern in data mining is a finite series of elements such as A→B→C→D where A, B, C, and D are elements of the same domain. The mining of sequential patterns is designed to find patterns of discrete events that frequently happen in the same arrangement along a timeline. Like association and clustering, the mining of sequential patterns is among the most popular knowledge discovery techniques that apply statistical measures to extract useful information from large datasets. As our computers become more powerful, we are able to mine bigger datasets and obtain hundreds of thousands of sequential patterns in full detail. With this vast amount of data, we argue that neither data mining nor visualization by itself can manage the information and reflect the knowledge effectively. Subsequently, we apply visualization to augment data mining in a study of sequential patterns in large text corpora. The result shows that we can learn more and more quickly in an integrated visual data-mining environment.
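At its core, sequential pattern mining counts how often an ordered series of events recurs across timestamped sequences. A minimal counting sketch with toy sequences and a fixed pattern length (real miners prune the candidate space rather than enumerating it):

```python
from itertools import combinations
from collections import Counter

# Hypothetical event sequences, already ordered along a timeline.
sequences = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]

# Count every ordered length-3 subsequence once per sequence; those with
# high support are the sequential patterns (A -> C -> D appears in all 3).
counts = Counter()
for seq in sequences:
    counts.update(set(combinations(seq, 3)))

print(counts.most_common(3))
```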

1999

Visual Data Mining - Guest Editor's Introduction

Wong PC. 1999. Visual Data Mining - Guest Editor's Introduction. IEEE Computer Graphics and Applications, Vol 19, No 5, Sep 1999.

Abstract:

Seeing is knowing, though merely seeing is not enough. When you understand what you see, seeing becomes believing. A while ago scientists discovered that seeing and understanding together enable humans to glean knowledge and deeper insight from large amounts of data. The approach integrates the human mind's exploration abilities with the enormous processing power of computers to form a powerful knowledge discovery environment that capitalizes on the best of both worlds. The technology builds on visual and analytical processes developed in various disciplines including scientific visualization, data mining, statistics, and machine learning with custom extensions that handle very large, multidimensional, multivariate data sets. The methodology is based on both functionality that characterizes structures and displays data and human capabilities that perceive patterns, exceptions, trends, and relationships. Here I'll define the vision, present the state of the art, and discuss the future of a young discipline called visual data mining.

Visualizing Association Rules for Text Mining

Wong PC, P Whitney, and J Thomas. 1999. Visualizing Association Rules for Text Mining. Proceedings IEEE Information Visualization 99, San Francisco, CA, Oct 24 - Oct 29, 1999.

Abstract:

An association rule in data mining is an implication of the form X → Y, where X is a set of antecedent items and Y is the consequent item. For years researchers have developed many tools to visualize association rules. However, few of these tools can handle more than dozens of rules, and none of them can effectively manage rules with multiple antecedents. Thus, it is extremely difficult to visualize and understand the association information of a large data set even when all the rules are available. This paper presents a novel visualization technique to tackle many of these problems. We apply the technology to a text mining study on large corpora. The results indicate that our design can easily handle hundreds of multiple antecedent association rules in a three-dimensional display with minimum human interaction, low occlusion percentage, and no screen swapping.
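For reference, a multi-antecedent rule such as {a, b} → c holds with confidence support({a, b, c}) / support({a, b}); the visualization problem is displaying thousands of such rules at once. A minimal sketch of the underlying counting over toy transactions:

```python
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Confidence of the multi-antecedent rule {bread, milk} -> butter:
antecedent = {"bread", "milk"}
confidence = support(antecedent | {"butter"}) / support(antecedent)
print(f"confidence = {confidence:.2f}")   # 0.50: one of two matching baskets
```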

ThemeRiver™: In Search of Trends, Patterns, and Relationships

Havre S, B Hetzler, and L Nowell. 1999. ThemeRiver™: In Search of Trends, Patterns, and Relationships. In Proceedings of IEEE Symposium on Information Visualization, InfoVis '99, October 25-26, San Francisco CA.

Abstract:

ThemeRiver™ is a prototype system that visualizes thematic variations over time across a collection of documents. The "river" flows through time, changing width to depict changes in the thematic strength of documents temporally collocated. Themes or topics are represented as colored "currents" flowing within the river that narrow or widen to indicate decreases or increases in the strength of a topic in associated documents at a specific point in time. The river is shown within the context of a timeline and a corresponding textual presentation of external events.

Human Computer Interaction with Global Information Spaces - Beyond Data Mining

Thomas J, K Cook, V Crow, B Hetzler, R May, D McQuerry, R McVeety, N Miller, G Nakamura, L Nowell, P Whitney, and PC Wong. 1999. Human Computer Interaction with Global Information Spaces - Beyond Data Mining. Pacific Northwest National Laboratory, Richland, WA 99352

Abstract:

This invited paper describes a vision of, and progress towards, a fundamentally new approach for dealing with the massive information overload of the emerging global information age. Today we use techniques such as data mining, through a WIMP interface, for searching or for analysis. Yet the human mind can deal with and interact simultaneously with millions of information items, e.g. documents. The challenge is to find visual paradigms, interaction techniques, and physical devices that encourage a new human information discourse between humans and their massive global and corporate information resources. After presenting the vision and current progress towards core technology development, we present the grand challenges that must be met to bring this vision to reality.

1998

TOPIC ISLANDS™ - A Wavelet-Based Text Visualization System

Miller NE, PC Wong, M Brewster, and H Foote. 1998. TOPIC ISLANDS™ - A Wavelet-Based Text Visualization System. In Proceedings of the conference on Visualization '98, pp. 189-196.

Abstract

We present a novel approach to visualize and explore unstructured text. The underlying technology, called TOPIC-O-GRAPHY™, applies wavelet transforms to a custom digital signal constructed from words within a document. The resultant multiresolution wavelet energy is used to analyze the characteristics of the narrative flow in the frequency domain, such as theme changes, which is then related to the overall thematic content of the text document using statistical methods. The thematic characteristics of a document can be analyzed at varying degrees of detail, ranging from section-sized text partitions to partitions consisting of a few words. Using this technology, we are developing a visualization system prototype known as TOPIC ISLANDS™ to browse a document, generate fuzzy document outlines, summarize text by levels of detail and according to user interests, define meaningful subdocuments, query text content, and provide summaries of topic evolution.
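The signal-processing core can be approximated with the PyWavelets package: build a numeric signal from word positions (here a toy indicator for a single tracked theme word, an assumption for illustration), decompose it, and inspect wavelet energy per scale:

```python
import numpy as np
import pywt  # PyWavelets

# Toy digital signal: 1 where a tracked theme word occurs, else 0.
words = ("alpha beta theme gamma theme delta " * 32).split()
signal = np.array([1.0 if w == "theme" else 0.0 for w in words])

# Multiresolution decomposition; energy per level localizes theme activity
# at coarse (section-sized) through fine (few-word) partitions.
coeffs = pywt.wavedec(signal, "db2", level=4)
energies = [float(np.sum(c ** 2)) for c in coeffs]
print(energies)
```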

Four Critical Elements for Designing Information Exploration Systems

Hetzler B and N Miller. 1998. Four Critical Elements for Designing Information Exploration Systems. Presented at Information Exploration workshop for ACM SIGCHI '98. Los Angeles, CA. April 1998. PNNL-SA-29745

Abstract

Designing an information exploration system requires attention to four critical components. Since information exploration is a highly interactive process, the user is a key element. The second and third critical elements are the presentation methods that are used to communicate information and the interaction techniques that enable the user to actively explore that information. Finally, powerful mathematics are needed to identify and manipulate features of the information. This paper describes how these four critical components can work together to flexibly meet varied user goals.

Visualizing the Full Spectrum of Document Relationships

Hetzler B, WM Harris, S Havre, and P Whitney. 1998. Visualizing the Full Spectrum of Document Relationships. In Structures and Relations in Knowledge Organization, Proc. 5th Int. ISKO Conf. Wurzburg: ERGON Verlag, pp. 168-175.

Abstract

Documents embody a rich and potentially very useful set of complex interrelationships, both among the documents themselves and among the terms they contain. However, the very richness of these relationships and the variety of potential applications make it difficult to present them in a usable form. This paper describes an approach that enables the user to visualize a multitude of document or entity relationships. Two visual metaphors are presented that allow the user to gain new insights and understandings by interactively exploring these relationship patterns at multiple levels of detail.

Multi-faceted Insight Through Interoperable Visual Information Analysis Paradigms.

Hetzler B, P Whitney, L Martucci, and J Thomas. 1998. Multi-faceted Insight Through Interoperable Visual Information Analysis Paradigms. In Proceedings of IEEE Symposium on Information Visualization, InfoVis '98, October 19-20, 1998, Research Triangle Park, North Carolina, pp. 137-144.

Abstract

To gain insight and understanding of complex information collections, users must be able to visualize and explore many facets of the information. This paper presents several novel visual methods from an information analyst's perspective. We present a sample scenario, using the various methods to gain a variety of insights from a large information collection. We conclude that no single paradigm or visual method is sufficient for many analytical tasks. Often a suite of integrated methods offers a better analytic environment in today's emerging culture of information overload and rapidly changing issues. We also conclude that the interactions among these visual paradigms are equally as important as, if not more important than, the paradigms themselves.

1997

Beyond Word Relations - SIGIR '97

Hetzler, E. 1997. Beyond Word Relations. SIGIR Forum, Fall 1997, Vol 31, No. 2. ACM Press, pp. 28-32.

Abstract

Many information retrieval systems identify documents or provide a document visualization based on analysis of a particular relationship among documents — that of similar topical content. But there may be layers of other less apparent and less traditional relationships that are useful to the user. Exploring this other information was the subject of this workshop, with a focus on identifying new non-traditional relationships. An initial taxonomy was introduced and fleshed out during the workshop.

The Need For Metrics In Visual Information Analysis

Miller NE, G Nakamoto, B Hetzler, and P Whitney. 1997. The Need For Metrics In Visual Information Analysis. In Workshop on New Paradigms in Information Visualization and Manipulation, in conjunction with the Sixth ACM International Conference on Information and Knowledge Management (CIKM '97), November 13-14, 1997, Las Vegas, Nevada. ACM Press.

Abstract

This paper explores several methods for visualizing the thematic content of large document collections. As opposed to traditional query-driven document retrieval, these methods are used for exploring and gaining insight into document collections. For our experiments, we used 12,000 medical abstracts. The SPIRE [now IN-SPIRE] system was used to create the mathematical signal from text and to project the documents into a universe of "docustars" and as a thematic contour map based on thematic proximity. A self-organizing map is used to project the documents onto a "Tree" fractal. A topic-based approach is used to align documents between concepts in the "Cosmic Tumbleweed" projection. In the 32-D Hypercube, documents are organized by cascading theme strengths. An argument is made for a new type of metric that would facilitate comparisons among the many methods for visualizing or browsing document collections. An initial organization is proposed for some of the relevant research that metrics for information visualization can draw upon.
