This page lists the documentation resources for the MMTx project: introductory material, documentation on the main MMTx program, documentation on the component programs, documentation on ancillary programs, and documentation on MMTx configured for specific applications.
- Overview Documentation
- Installation Documentation
- Introductory Documentation
- System Overview (a PowerPoint presentation converted to HTML)
- Java-Doc MMTx Class and API Documentation
- MetaMap References from the SKR (Semantic Knowledge Representation Project) web site
- MMTx: Examples
- Customization Documentation
- MMTx: The Main Program
MMTx Program: MMTx maps (matches) text from documents or queries to concepts in the UMLS Metathesaurus. The best of the candidate concepts are then organized so as to best cover the text.
- Component Programs
The MMTx project includes components (Java packages) that are useful in their own right, so they were also made into individual programs. These component programs include a tokenizer, a noun phrase parser, and a variant generation program, to mention a few. The documentation that follows consists of the man pages for these component programs.
Components and descriptions:
Tokenize Program: The Tokenizer tokenizes collections into documents, documents into sections, and sections into sets of sentences and tokens.
Lexical Lookup Program: LexicalLookUp retrieves lexical elements from a given text. These lexical elements may be terms found in a lexicon, items identified by pattern (such as numbers and dates), or items identified by some other mechanism.
Variants Program: The Variants program retrieves useful variants for a given input phrase. These variants can be, and are, used to retrieve candidate concepts from the UMLS Metathesaurus.
This program looks up pre-computed variants produced by the MetaMap Variant Generation Program.
MetaMap Variant Generation Program: MetaMap variants are variants of words and terms computed by a variety of variant generation methods. These methods include recursively defined derivational variants, recursively defined synonyms, acronyms and abbreviations, acronym and abbreviation expansions, spelling variants, and useful combinations of all of the above. All inflectional variants of the resulting variants are also computed.
This program pre-computes the variants for a given customized source dataset.
Parse Program: This parser breaks sentences into phrases, using a minimal commitment barrier category parser.
Tagger Program: The MedPost/SKR tagger is a modified version of Larry Smith's MedPost part-of-speech tagger. MedPost/SKR is a Java implementation of the trained component, with changes made to address tokenization differences between MetaMap and MedPost.
NEW in MMTx V2.4.A
Candidates Program: The Candidates Generation program generates the closest-matching UMLS Metathesaurus concept candidates for input noun phrases that have been identified by some prior processing. These candidates are evaluated against the input noun phrases on several criteria:
  - coverage: how many of the noun phrase tokens are covered by the candidate concept.
  - cohesiveness: a measure of how many contiguous tokens from the noun phrase match a candidate concept.
  - centrality: a measure of whether the important part of the noun phrase, its head, is covered by a candidate concept.
  - involvement: a measure that combines coverage and cohesiveness but takes word order variation into account.
FinalMapping Program: The FinalMapping program generates the best covering set of concept candidates for a noun phrase.
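The evaluation criteria listed for the Candidates Program can be made concrete with a small sketch. The Python below is illustrative only: it is not MMTx code, the function names are hypothetical, and the formulas are simplified assumptions (plain token-overlap ratios) rather than the measures MMTx actually computes. Involvement is left out because its treatment of word order is not specified here.

```python
def coverage(phrase, candidate):
    """Fraction of noun phrase tokens that also appear in the candidate."""
    cand = set(candidate)
    return sum(tok in cand for tok in phrase) / len(phrase)


def cohesiveness(phrase, candidate):
    """Longest contiguous run of matched phrase tokens, relative to phrase length."""
    cand = set(candidate)
    longest = run = 0
    for tok in phrase:
        # Extend the current run on a match, reset it on a miss.
        run = run + 1 if tok in cand else 0
        longest = max(longest, run)
    return longest / len(phrase)


def centrality(head, candidate):
    """1.0 if the head of the noun phrase is covered by the candidate, else 0.0."""
    return 1.0 if head in candidate else 0.0


phrase = ["obstructive", "sleep", "apnea"]  # illustrative noun phrase
head = "apnea"                              # assumed head of the phrase
for candidate in (["sleep", "apnea"], ["obstructive", "apnea"]):
    print(candidate,
          round(coverage(phrase, candidate), 2),
          round(cohesiveness(phrase, candidate), 2),
          centrality(head, candidate))
```

Under these simplified definitions both candidates cover two of the three tokens and both include the head, so coverage and centrality tie. Only cohesiveness separates them, because "sleep apnea" matches a contiguous run of the phrase while "obstructive apnea" does not.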
- Ancillary Programs
Data File Builder: The Data File Builder is a collection of scripts and programs for creating a custom data set. When a custom data set is used, MMTx maps text to the concepts of a custom metathesaurus instead of the full UMLS Metathesaurus. Given a user-supplied custom metathesaurus, the Data File Builder generates the data files used by MMTx and loads them into the database.
Use of a custom metathesaurus allows users to apply MMTx outside of the medical domain, or to avoid copyright issues associated with the full UMLS Metathesaurus.
- MMTx configured for specific applications
Approximate Matching Program: This is MetaMap's term browse mode (the -zogm command-line options). This application maps input to Metathesaurus concepts in a configuration that assumes the input is a term and that the concepts returned are useful for term-to-term mapping or for browsing. It returns concepts that may contain gaps and/or over-matches in the mapping. No final mapping is performed.
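To illustrate what gaps and over-matches in a mapping might look like, here is a minimal Python sketch. It is not MMTx code. The function name is hypothetical, and the definitions are assumptions made for this example: a "gap" is an unmatched concept word lying between matched words, and an "over-match" is an unmatched concept word at either end of the concept name. This page does not define either term precisely.

```python
def match_shape(term, concept):
    """Describe how a concept name matches an input term.

    Returns None when no concept token occurs in the term; otherwise a dict
    with two flags (hypothetical definitions, see the text above):
      gap       - an unmatched concept token lies between matched tokens
      overmatch - an unmatched concept token lies at either end
    """
    term_tokens = set(term)
    matched = [tok in term_tokens for tok in concept]
    if not any(matched):
        return None
    first = matched.index(True)                          # first matched position
    last = len(matched) - 1 - matched[::-1].index(True)  # last matched position
    return {
        "gap": not all(matched[first:last + 1]),
        "overmatch": first > 0 or last < len(matched) - 1,
    }


print(match_shape(["obstructive", "apnea"], ["obstructive", "sleep", "apnea"]))
print(match_shape(["medicine"], ["alternative", "medicine"]))
```

With these assumed definitions, the first call reports a gap (the concept word "sleep" sits between the two matched words) and the second reports an over-match (the concept name extends beyond the input term).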