TY - JOUR
T1 - A Systematic Approach to Configuring MetaMap for Optimal Performance
AU - Jing, Xia
AU - Indani, Akash
AU - Hubig, Nina
AU - Min, Hua
AU - Gong, Yang
AU - Cimino, James J.
AU - Sittig, Dean F.
AU - Rennert, Lior
AU - Robinson, David
AU - Biondich, Paul
AU - Wright, Adam
AU - Nøhr, Christian Gradhandt
AU - Law, Timothy
AU - Faxvaag, Arild
AU - Gimbel, Ronald
N1 - The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).
PY - 2022/12
Y1 - 2022/12
N2 - Background: MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurative, configuration decisions are not straightforward.Objective: To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance.Methods: MetaMap, the word2vec model, and the phrase model were used to build a pipeline. For unsupervised training, the phrase and word2vec models used abstracts related to clinical decision support as input. During testing, MetaMap was configured with the default option, one behavior option, and two behavior options. For each configuration, cosine and soft cosine similarity scores between identified entities and gold-standard terms were computed for 40 annotated abstracts (422 sentences). The similarity scores were used to calculate and compare the overall percentages of exact matches, similar matches, and missing gold-standard terms among the abstracts for each configuration. The results were manually spot-checked. The precision, recall, and F-measure (β =1) were calculated.Results: The percentages of exact matches and missing gold-standard terms were 0.6–0.79 and 0.09–0.3 for one behavior option, and 0.56–0.8 and 0.09–0.3 for two behavior options, respectively. The percentages of exact matches and missing terms for soft cosine similarity scores exceeded those for cosine similarity scores. The average precision, recall, and F-measure were 0.59, 0.82, and 0.68 for exact matches, and 1.00, 0.53, and 0.69 for missing terms, respectively.Conclusion: We demonstrated a systematic approach that provides objective and accurate evidence guiding MetaMap configurations for optimizing performance. Combining objective evidence and the current practice of using principles, experience, and intuitions outperforms a single strategy in MetaMap configurations. Our methodology, reference codes, measurements, results, and workflow are valuable references for optimizing and configuring MetaMap.
AB - Background: MetaMap is a valuable tool for processing biomedical texts to identify concepts. Although MetaMap is highly configurative, configuration decisions are not straightforward.Objective: To develop a systematic, data-driven methodology for configuring MetaMap for optimal performance.Methods: MetaMap, the word2vec model, and the phrase model were used to build a pipeline. For unsupervised training, the phrase and word2vec models used abstracts related to clinical decision support as input. During testing, MetaMap was configured with the default option, one behavior option, and two behavior options. For each configuration, cosine and soft cosine similarity scores between identified entities and gold-standard terms were computed for 40 annotated abstracts (422 sentences). The similarity scores were used to calculate and compare the overall percentages of exact matches, similar matches, and missing gold-standard terms among the abstracts for each configuration. The results were manually spot-checked. The precision, recall, and F-measure (β =1) were calculated.Results: The percentages of exact matches and missing gold-standard terms were 0.6–0.79 and 0.09–0.3 for one behavior option, and 0.56–0.8 and 0.09–0.3 for two behavior options, respectively. The percentages of exact matches and missing terms for soft cosine similarity scores exceeded those for cosine similarity scores. The average precision, recall, and F-measure were 0.59, 0.82, and 0.68 for exact matches, and 1.00, 0.53, and 0.69 for missing terms, respectively.Conclusion: We demonstrated a systematic approach that provides objective and accurate evidence guiding MetaMap configurations for optimizing performance. Combining objective evidence and the current practice of using principles, experience, and intuitions outperforms a single strategy in MetaMap configurations. Our methodology, reference codes, measurements, results, and workflow are valuable references for optimizing and configuring MetaMap.
KW - clinical decision support system
KW - configuration and optimization
KW - MetaMap
KW - natural language processing
KW - performance
UR - http://www.scopus.com/inward/record.url?scp=85138668239&partnerID=8YFLogxK
U2 - 10.1055/a-1862-0421
DO - 10.1055/a-1862-0421
M3 - Journal article
C2 - 35613942
AN - SCOPUS:85138668239
SN - 0026-1270
VL - 61
SP - E51-E63
JO - Methods of Information in Medicine
JF - Methods of Information in Medicine
IS - S 02
ER -