TY - JOUR
T1 - Evaluating gradient-based explanation methods for neural network ECG analysis using heatmaps
AU - Storås, Andrea Marheim
AU - Mæland, Steffen
AU - Isaksen, Jonas L
AU - Hicks, Steven Alexander
AU - Thambawita, Vajira
AU - Graff, Claus
AU - Hammer, Hugo Lewi
AU - Halvorsen, Pål
AU - Riegler, Michael Alexander
AU - Kanters, Jørgen K
N1 - © The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: [email protected].
PY - 2025/1/1
Y1 - 2025/1/1
N2 - OBJECTIVE: Evaluate popular explanation methods using heatmap visualizations to explain the predictions of deep neural networks for electrocardiogram (ECG) analysis and provide recommendations for the selection of explanation methods. MATERIALS AND METHODS: A residual deep neural network was trained on ECGs to predict intervals and amplitudes. Nine commonly used explanation methods (Saliency, Deconvolution, Guided backpropagation, Gradient SHAP, SmoothGrad, Input × gradient, DeepLIFT, Integrated gradients, GradCAM) were qualitatively evaluated by medical experts and objectively evaluated using a perturbation-based method. RESULTS: No single explanation method consistently outperformed the others, but some methods were clearly inferior. We found considerable disagreement between the human expert evaluation and the objective evaluation by perturbation. DISCUSSION: The best explanation method depended on the ECG measure. To ensure that future explanations of deep neural networks for medical data analyses are useful to medical experts, data scientists developing new explanation methods should collaborate closely with domain experts. Because no explanation method performs best in all use cases, several methods should be applied. CONCLUSION: Several explanation methods should be used to determine the most suitable approach.
AB - OBJECTIVE: Evaluate popular explanation methods using heatmap visualizations to explain the predictions of deep neural networks for electrocardiogram (ECG) analysis and provide recommendations for the selection of explanation methods. MATERIALS AND METHODS: A residual deep neural network was trained on ECGs to predict intervals and amplitudes. Nine commonly used explanation methods (Saliency, Deconvolution, Guided backpropagation, Gradient SHAP, SmoothGrad, Input × gradient, DeepLIFT, Integrated gradients, GradCAM) were qualitatively evaluated by medical experts and objectively evaluated using a perturbation-based method. RESULTS: No single explanation method consistently outperformed the others, but some methods were clearly inferior. We found considerable disagreement between the human expert evaluation and the objective evaluation by perturbation. DISCUSSION: The best explanation method depended on the ECG measure. To ensure that future explanations of deep neural networks for medical data analyses are useful to medical experts, data scientists developing new explanation methods should collaborate closely with domain experts. Because no explanation method performs best in all use cases, several methods should be applied. CONCLUSION: Several explanation methods should be used to determine the most suitable approach.
KW - Deep Learning
KW - Electrocardiography
KW - Humans
KW - Neural Networks, Computer
KW - Signal Processing, Computer-Assisted
KW - explainable artificial intelligence
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85212991104&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocae280
DO - 10.1093/jamia/ocae280
M3 - Journal article
C2 - 39504476
SN - 1527-974X
VL - 32
SP - 79
EP - 88
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 1
M1 - ocae280
ER -