U.S. Department of Energy
PNNL Community Outreach

Volume 2, Issue 7

August 30, 2007

Universal parsing agent—more time analyzing, less time sorting


UPA earned an R&D 100 Award earlier this year.

Thanks to a new Pacific Northwest National Laboratory-developed technology, countless businesses and organizations may spend less time sorting through mounds of data for “just the right” information.  The Laboratory has developed a unique approach to information extraction, transformation and delivery. Today, information workers using available tools often spend more time cleaning, sorting and reformatting data in preparation for analysis than analyzing the data itself.

The Universal Parsing Agent (UPA) is a document analysis and transformation software program that helps mitigate this problem by supporting massive-scale conversion of information into forms suitable for the semantic web. UPA provides reusable tools to analyze text documents, identify and extract important information elements, enhance text with semantically descriptive tags and output the information in the format and structure that is needed.

This innovative technology accepts multiple datasets or streams of data, discovers and extracts information needed by users and delivers results in their most useful form. The UPA is implemented as a Java-based client-server application, which allows a heterogeneous mix of users to access the program through a web-based interface. The system has four basic components:

  • Identify the input: Users point UPA at one or more directories where information is being stored for processing.
  • Specify a delivery and transformation approach: UPA users create a knowledge library of templates that describe how data are to be identified.
  • Enhance the output: Once information has been identified and extracted, it can be semantically enhanced with output tags.
  • Manage the processing: UPA allows users to work in test mode or operational mode.
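
To make the identify, extract and enhance steps above more concrete, here is a minimal Python sketch of template-driven tagging. It is an illustration only and not UPA's actual code or interface: the `Template` class, the `extract` and `enhance` functions, the regular-expression patterns and the tag names are all hypothetical, and the real product is a Java-based client-server application whose template format is not described in this article.

```python
import re
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Template:
    """Hypothetical template: a regex that identifies data, plus an output tag."""
    tag: str
    pattern: str


def extract(text, templates):
    """Return (tag, matched text) pairs, grouped in the order the templates are listed."""
    found = []
    for template in templates:
        for match in re.finditer(template.pattern, text):
            found.append((template.tag, match.group(0)))
    return found


def enhance(items):
    """Wrap each extracted value in a descriptive XML-style tag, escaping special characters."""
    return [f"<{tag}>{escape(value)}</{tag}>" for tag, value in items]


# A tiny "knowledge library" of templates (assumed patterns, for illustration only).
TEMPLATES = [
    Template("date", r"\b[A-Z][a-z]+ \d{1,2}, \d{4}\b"),
    Template("award", r"R&D 100"),
]

sample = "UPA earned an R&D 100 award; the article is dated August 30, 2007."
tagged = enhance(extract(sample, TEMPLATES))
for line in tagged:
    print(line)
```

In this sketch the templates play the role of the knowledge library, and the tagged strings stand in for the semantically enhanced output. A test mode could be approximated by running the same templates against a small sample before processing a full directory.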

UPA’s flexibility and adaptability are fundamental reasons why this software is a valuable assistant in an environment of information overload. The tool is a front-end solution for any application requiring sophisticated data ingest, preparation and transformation. The key contribution of UPA is the added value of semantic enhancement. Marking data with semantic descriptions can enable tools to function at a much higher level of efficiency than is currently possible.

Pacific Northwest Technology Today
