TY - JOUR
T1 - Performance analysis of machine learning algorithms and screening formulae for β–thalassemia trait screening of Indian antenatal women
AU - Das, Reena
AU - Saleh, Sarkaft
AU - Nielsen, Izabela
AU - Kaviraj, Anilava
AU - Sharma, Prashant
AU - Dey, Kartick
AU - Saha, Subrata
N1 - Copyright © 2022 The Author(s). Published by Elsevier B.V. All rights reserved.
PY - 2022/11
Y1 - 2022/11
N2 - Background: Currently, more than forty discrimination formulae based on red blood cell (RBC) parameters and some supervised machine learning algorithms (MLAs) have been recommended for β-thalassemia trait (BTT) screening. The present study aimed to evaluate and compare the performance of 26 such formulae and 13 MLAs on data from antenatal women with that of a recently developed formula, SCSBTT, which is available for evaluation in over seventy countries as an Android app called SUSOKA [16]. Methods: A diagnostic database of 2942 antenatal females was collected from PGIMER, Chandigarh, India, and used for this analysis. The data set consists of subjects with hypochromic microcytic anemia, BTT, or Hemoglobin E trait, double heterozygotes for Hemoglobin S and BTT, heterozygotes for Hemoglobin D Punjab, and normal subjects. Performance of the formulae and the MLAs was assessed by sensitivity, specificity, Youden's index, and AUC-ROC. A final recommendation was made from the ranking obtained through two Multiple Criteria Decision-Making (MCDM) techniques, namely Simultaneous Evaluation of Criteria and Alternatives (SECA) and TOPSIS. Results: Extreme Learning Machine (ELM) and Gradient Boosting Classifier (GBC) showed the maximum Youden's index and AUC-ROC values, higher than those of all discriminating formulae. Sensitivity was highest for SCSBTT. K-means clustering and the ranking from the MCDM methods show that the SCSBTT, Shine & Lal, and Ravanbakhsh-F4 formulae ensure higher performance than the other formulae. The discriminant power of some MLAs and formulae was found to be considerably lower than that reported in the original studies. Conclusion: Comparative information on MLAs can aid researchers in developing new discriminating formulae that simultaneously ensure higher sensitivity and specificity. More multi-centric verification of the formulae on heterogeneous data is indispensable. Based on MCDM, the SCSBTT and Shine & Lal formulae and the ELM and GBC algorithms are recommended for BTT screening. SCSBTT can be used with certainty as a tangible cost-saving tool for mass screening of antenatal women in India and other countries.
AB - Background: Currently, more than forty discrimination formulae based on red blood cell (RBC) parameters and some supervised machine learning algorithms (MLAs) have been recommended for β-thalassemia trait (BTT) screening. The present study aimed to evaluate and compare the performance of 26 such formulae and 13 MLAs on data from antenatal women with that of a recently developed formula, SCSBTT, which is available for evaluation in over seventy countries as an Android app called SUSOKA [16]. Methods: A diagnostic database of 2942 antenatal females was collected from PGIMER, Chandigarh, India, and used for this analysis. The data set consists of subjects with hypochromic microcytic anemia, BTT, or Hemoglobin E trait, double heterozygotes for Hemoglobin S and BTT, heterozygotes for Hemoglobin D Punjab, and normal subjects. Performance of the formulae and the MLAs was assessed by sensitivity, specificity, Youden's index, and AUC-ROC. A final recommendation was made from the ranking obtained through two Multiple Criteria Decision-Making (MCDM) techniques, namely Simultaneous Evaluation of Criteria and Alternatives (SECA) and TOPSIS. Results: Extreme Learning Machine (ELM) and Gradient Boosting Classifier (GBC) showed the maximum Youden's index and AUC-ROC values, higher than those of all discriminating formulae. Sensitivity was highest for SCSBTT. K-means clustering and the ranking from the MCDM methods show that the SCSBTT, Shine & Lal, and Ravanbakhsh-F4 formulae ensure higher performance than the other formulae. The discriminant power of some MLAs and formulae was found to be considerably lower than that reported in the original studies. Conclusion: Comparative information on MLAs can aid researchers in developing new discriminating formulae that simultaneously ensure higher sensitivity and specificity. More multi-centric verification of the formulae on heterogeneous data is indispensable. Based on MCDM, the SCSBTT and Shine & Lal formulae and the ELM and GBC algorithms are recommended for BTT screening. SCSBTT can be used with certainty as a tangible cost-saving tool for mass screening of antenatal women in India and other countries.
KW - Antenatal Women
KW - Diagnostic performance
KW - Multi-criteria decision-making
KW - Supervised machine learning algorithm
KW - β-Thalassemia carrier screening
KW - Diagnosis, Differential
KW - beta-Thalassemia/diagnosis
KW - Humans
KW - Machine Learning
KW - Pregnancy
KW - Algorithms
KW - Mass Screening
KW - Anemia, Iron-Deficiency/diagnosis
KW - Female
KW - Hemoglobin, Sickle
KW - Hemoglobin E
UR - http://www.scopus.com/inward/record.url?scp=85138480416&partnerID=8YFLogxK
U2 - 10.1016/j.ijmedinf.2022.104866
DO - 10.1016/j.ijmedinf.2022.104866
M3 - Journal article
C2 - 36174416
AN - SCOPUS:85138480416
VL - 167
JO - International Journal of Medical Informatics
JF - International Journal of Medical Informatics
SN - 1386-5056
M1 - 104866
ER -