Background
The Unified Medical Language System® (UMLS) approach
involves the development of a set of widely distributed
Knowledge Sources (Metathesaurus®, Semantic Network, and
SPECIALIST Lexicon) that can be used by a variety of
applications to compensate for differences in the way
concepts are expressed in a variety of computerized
biomedical sources.
The UMLS Knowledge Source Server (UMLSKS) is a computer
application that provides Internet access to the Knowledge
Sources and other related resources made available by
developers using the UMLS. Its purpose is to make UMLS
data more accessible to users, and in particular to system
developers. The system architecture allows remote site users
(individuals as well as computer programs) to send requests
to a server at the National Library of Medicine (NLM). Access
to the system is provided in three ways: through the World Wide
Web, through an Extensible Markup Language (XML)-based socket
programming interface, and through an Application Programmer
Interface (API).
The centrally managed UMLSKS provides developers with UMLS
information remotely and on demand. The advantage of such
an approach is that it makes the Knowledge Sources readily
available, and perhaps more importantly, developers do not
need to invest time and effort in understanding the structure
of the data files and other details to use the UMLS data in
their applications.
System Architecture
Version 2.0 of the UMLSKS, made available in March 2002,
was a redesign of the original "C" programming language system
with new features added, including access to the UMLS Knowledge
Sources through a public Web interface, incorporation of XML
support for programmers both in requesting and returning data,
and inclusion of a Java-based Object Model of the UMLS
Metathesaurus data. Subsequent releases of the software have
augmented the available API functions and refined system operations.
The system was designed with the following
design tenets in mind:
- Extensibility for ease of new feature incorporation
- Scalability in handling ever increasing user loads and
increasing numbers of UMLS vocabularies
- Performance considerations permitting faster access to UMLS data
- Flexibility in access modes including a rich API set with
access to all of the UMLS data
- Ease of administration by NLM staff and contractors
- Limited system interruptions during system software
upgrades
Metathesaurus®
The UMLSKS allows the user to request information about
particular Metathesaurus concepts, including attributes
such as the concept's definition, its semantic types,
concepts that are related to it, hierarchical context
details, co-occurrence information from MEDLINE® and
AI-RHEUM, etc., all of which can be restricted to
source-specific details. The UMLSKS also allows the user to
request information about the attributes themselves, for
example, by asking for all the concepts that have been
assigned to a particular semantic type, or by asking for
all the terms that have a particular lexical tag.
Semantic Network
The Semantic Network contains information about semantic
types and their relationships. The implementation of the
network module computes the relationships between semantic
types using the inheritance property of the network type
hierarchy. Information in the Semantic Network can be
queried in terms of two semantic types and the relationship
between them. Individual queries are specified by providing
the known types or relations and leaving out the unknowns.
The system then retrieves the corresponding values for the
unknowns. For example, to find which types are related by a
particular relationship, the user supplies only the
relationship name, and all semantic type pairs linked by that
relationship are retrieved. To check whether a particular
relationship holds between a pair of types, the user supplies
both types and the relationship.
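The query pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the UMLSKS implementation: the miniature type hierarchy, the single asserted relationship, and the names `PARENT`, `ASSERTED`, `ancestors`, `holds`, and `query` are all invented for the example, and the sketch ignores any exceptions to inheritance that a real network might define.

```python
from itertools import product

# Hypothetical miniature type hierarchy (child -> parent); not real UMLS data.
PARENT = {
    "Virus": "Organism",
    "Bacterium": "Organism",
    "Disease or Syndrome": "Pathologic Function",
}

# Relationships asserted directly between types; descendants inherit them.
ASSERTED = {("Organism", "causes", "Pathologic Function")}


def ancestors(sem_type):
    """Return the type itself followed by all of its ancestors."""
    chain = [sem_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def all_types():
    """Every type that appears anywhere in the hierarchy, sorted by name."""
    return sorted(set(PARENT) | set(PARENT.values()))


def holds(source, relation, target):
    """True if the relationship is asserted for the pair or inherited from ancestors."""
    return any(
        (s, relation, t) in ASSERTED
        for s, t in product(ancestors(source), ancestors(target))
    )


def query(source=None, relation=None, target=None):
    """Return all matching (source, relation, target) triples; None marks an unknown."""
    sources = [source] if source else all_types()
    relations = [relation] if relation else sorted({r for _, r, _ in ASSERTED})
    targets = [target] if target else all_types()
    return [
        (s, r, t)
        for s, r, t in product(sources, relations, targets)
        if holds(s, r, t)
    ]
```

Calling `query(relation="causes")` leaves both types unknown and returns every type pair linked by that relationship, including pairs that inherit it from their parents. Calling `query("Virus", "causes", "Organism")` supplies all three values and returns an empty list, because the relationship does not hold for that pair in this toy data.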
SPECIALIST Lexicon
The UMLSKS also provides access to lexical records in the
SPECIALIST Lexicon. The lexicon entry for each word, or term,
records syntactic, morphological, and orthographic
information. Lexical entries may be single or multi-word
terms. Lexical information includes syntactic category,
inflectional variation (e.g. singular and plural for nouns,
the conjugation of verbs, the positive, comparative, and
superlative for adjectives and adverbs), and allowable
complementation patterns (i.e., the objects and other
arguments that verbs, nouns, and adjectives can take).
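As a rough illustration of the kinds of information listed above, a lexical record could be modeled in a client program as follows. The class name `LexicalRecord`, its field names, and the sample entry are assumptions made for this sketch; they do not reproduce the actual SPECIALIST Lexicon record format.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalRecord:
    """Hypothetical in-memory shape for one lexicon entry (field names are illustrative)."""
    base_form: str                                    # single or multi-word term
    category: str                                     # syntactic category, e.g. "noun" or "verb"
    inflections: dict = field(default_factory=dict)   # feature name -> inflected form
    complements: list = field(default_factory=list)   # allowable complementation patterns

    def forms(self):
        """Base form plus each recorded inflection, in order, without duplicates."""
        seen = [self.base_form]
        for form in self.inflections.values():
            if form not in seen:
                seen.append(form)
        return seen


# Illustrative entry only; not copied from the lexicon.
record = LexicalRecord(
    base_form="treat",
    category="verb",
    inflections={
        "third_person_singular": "treats",
        "past": "treated",
        "past_participle": "treated",
        "present_participle": "treating",
    },
    complements=["direct object"],
)
```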
UMLS Resources
A number of additional resources for use by developers
and researchers are available through the UMLSKS. Source
files for the three Knowledge Sources may be downloaded
from the UMLSKS. Source code and documentation for the
SPECIALIST Lexicon programs are also made available to
developers.
System Design
Version 2.0 of the UMLSKS migrated from a "C" implementation
to a pure Java implementation and included a number of new
features and functions not found in the previous version.
The Web interface was completely redesigned, enabling the
user to access more information about concepts and terms
quickly and easily. The new look-and-feel puts more
capabilities at users' fingertips. The Web interface also
incorporates additional documentation and resources to aid
users in accessing the full power of the system and assist
developers in creating powerful applications utilizing the
UMLSKS. Version 3.0 and later releases incorporated additional
features into the web interface including a browser for the
Semantic Network, new API functions for accessing Semantic
Network details, and new API functions to access SPECIALIST
Lexicon descriptions.
The architecture of the UMLSKS changed significantly between
Version 1.0 and Version 2.0, and later releases retain the new
architecture. The Web server is implemented as a collection of
Java servlets. These servlets, along with XML and XSLT
stylesheets, enable quick and easy access to UMLS data. Open
source software
from Apache was used for the development of all aspects of
the Web server. The Web server software, as well as the
socket interface and Java API, connects through the Internet
to a backend Remote Method Invocation (RMI) server. This
server processes all requests for Knowledge Source data,
accessing an Oracle® database to obtain relevant
information, and forwards those data sets through the Internet
to the requestor. The UMLSKS is designed to simultaneously
support any number of releases of UMLS Knowledge Sources.
The Web interface continues to provide access to all of the
UMLS Knowledge Sources and incorporates new features into
its design:
- On-line account and user login requests
- User profile editing
- Metathesaurus concept/term searching
- Source-specific searching
- On-line User's Guide
In addition to the programming language change and
enhancements to the user interface, the Extensible Markup
Language (XML) has been incorporated into the design and
used extensively throughout the implementation to provide
flexibility in delivering data to users. An Object Model
for Metathesaurus, Semantic Network and SPECIALIST
Lexicon data was created that
allows users to ingest XML documents produced by the UMLSKS
and to manipulate those data in an object-oriented fashion
within their own programs. The Object Model provides a
mechanism for representing concepts, semantic network nodes
and relationships, and lexical elements and related data
consistently among developers.
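The idea of ingesting an XML document and working with the result as objects can be sketched as below. The UMLSKS Object Model itself is Java-based; this Python fragment only illustrates the pattern. The element names (`conceptList`, `concept`, `name`, `semanticType`), the `id` attribute, and the `Concept` class are hypothetical and do not reflect the actual UMLSKS XML schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical response document; element and attribute names are assumptions.
SAMPLE_XML = """
<conceptList>
  <concept id="EXAMPLE-0001">
    <name>Example concept</name>
    <semanticType>Example Type A</semanticType>
    <semanticType>Example Type B</semanticType>
  </concept>
</conceptList>
"""


@dataclass
class Concept:
    """Illustrative object for one concept parsed from the XML."""
    concept_id: str
    name: str
    semantic_types: list = field(default_factory=list)


def parse_concepts(xml_text):
    """Turn each <concept> element into a Concept object."""
    root = ET.fromstring(xml_text.strip())
    return [
        Concept(
            concept_id=node.get("id"),
            name=node.findtext("name", default=""),
            semantic_types=[st.text for st in node.findall("semanticType")],
        )
        for node in root.findall("concept")
    ]
```

Once parsed, client code can work with attributes such as `concept.name` and `concept.semantic_types` instead of walking the XML tree directly, which is the kind of consistency an object model is meant to provide.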
UMLSKS Application Programmer Interfaces (APIs)
The new Application Programmer Interface (API) is written
entirely in Java, with the goal of providing a
platform-independent interface. In addition, an XML-based API
resembling the Java API methods is included, allowing both Java
and non-Java programs to interface to the UMLS through a
standard TCP socket. The flexibility provided by both APIs
enables the UMLSKS to support developers whose platforms
range from PCs to high-end Unix machines. In all,
approximately 40 API methods have been defined, allowing
access to all details of the Metathesaurus. The API download
includes 'javadoc' documentation for all interface and
object model classes, a set of example Java programs for
issuing API calls, some sample XML documents that may be used
as input to the UMLSKS socket interface, and sample XML output
files for each of the API methods.
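A non-Java client of an XML-over-socket interface of this kind might look roughly like the following Python sketch. Everything specific in it is an assumption: the `request` and `param` element names, the `method` and `release` attributes, the example method name, and the end-of-request framing are invented for illustration, and the host and port must come from the actual UMLSKS documentation. The sample XML documents shipped with the API download are the authoritative reference for the real request format.

```python
import socket
import xml.etree.ElementTree as ET


def build_request(method, release, **params):
    """Serialize a hypothetical request document naming an API method and its arguments."""
    root = ET.Element("request", {"method": method, "release": release})
    for key, value in params.items():
        ET.SubElement(root, "param", {"name": key}).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def send_request(host, port, payload, timeout=10.0):
    """Send the payload over a plain TCP socket and return the raw reply bytes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        conn.shutdown(socket.SHUT_WR)  # assumed framing: closing the write side ends the request
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Build (but do not send) a request; the method and release names are placeholders.
payload = build_request("findConcept", "EXAMPLE_RELEASE", term="example term")
```

Keeping request construction separate from transport makes it easy to inspect or log the XML before it is sent, and to compare it against the sample input documents.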
Future Enhancements
Development of Version 5.0 is currently underway, with
deployment expected in September 2004. Users can expect
some of the following additions and enhancements to the
UMLSKS suite of functions:
- API methods and an associated object model for
accessing/representing the new UMLS atomic model.
- On-line access to the features/functions of the UMLS
MetamorphoSys utility.
- Tailorable user interface for viewing concepts and
terms.