Performance analysis of machine learning algorithms and screening formulae for β–thalassemia trait screening of Indian antenatal women

Reena Das, Sarkaft Saleh, Izabela Nielsen, Anilava Kaviraj, Prashant Sharma, Kartick Dey, Subrata Saha

Research output: Contribution to journal › Journal article › Research › peer-review

Background: Currently, more than forty discrimination formulae based on red blood cell (RBC) parameters, along with some supervised machine learning algorithms (MLAs), have been recommended for β-thalassemia trait (BTT) screening. The present study aimed to evaluate the performance of 26 such formulae and 13 MLAs on data from antenatal women and to compare them with a recently developed formula, SCSBTT, which is available for evaluation in over seventy countries as an Android app called SUSOKA [16].

Methods: A diagnostic database of 2942 antenatal women was collected from PGIMER, Chandigarh, India, and used for this analysis. The data set comprises subjects with hypochromic microcytic anemia, BTT, Hemoglobin E trait, double heterozygosity for Hemoglobin S and BTT, and heterozygosity for Hemoglobin D Punjab, as well as normal subjects. The performance of the formulae and the MLAs was assessed by sensitivity, specificity, Youden's index, and AUC-ROC. A final recommendation was made from the rankings obtained through two Multiple Criteria Decision-Making (MCDM) techniques, namely Simultaneous Evaluation of Criteria and Alternatives (SECA) and TOPSIS.

Results: Extreme Learning Machine (ELM) and Gradient Boosting Classifier (GBC) showed the maximum Youden's index and AUC-ROC values, higher than those of any discriminating formula. Sensitivity was highest for SCSBTT. K-means clustering and the rankings from the MCDM methods show that the SCSBTT, Shine & Lal and Ravanbakhsh-F4 formulae performed best among all formulae. The discriminant power of some MLAs and formulae was found to be considerably lower than that reported in the original studies.

Conclusion: Comparative information on MLAs can aid researchers in developing new discriminating formulae that simultaneously ensure higher sensitivity and specificity. Further multi-centric verification of the formulae on heterogeneous data is indispensable. Based on MCDM, the SCSBTT and Shine & Lal formulae and the ELM and GBC algorithms are recommended for BTT screening. SCSBTT can be used with certainty as a tangible cost-saving tool for mass screening of antenatal women in India and other countries.
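As an illustration of the evaluation measures named in the Methods, the following minimal Python sketch computes sensitivity, specificity and Youden's index from confusion-matrix counts and then ranks the candidates with a basic TOPSIS. It is not the study's implementation: the candidate names and counts are hypothetical, equal criterion weights are an assumption, and AUC-ROC is omitted because it requires continuous scores rather than counts.

```python
import math

def screening_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and Youden's index from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1.0

def topsis(matrix, weights):
    """Closeness score per row; every column is treated as a benefit criterion.

    Assumes the rows are not all identical, otherwise both distances are zero.
    """
    n_cols = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
                for row in matrix]
    best = [max(col) for col in zip(*weighted)]   # ideal solution
    worst = [min(col) for col in zip(*weighted)]  # anti-ideal solution
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical counts (tp, fn, tn, fp); NOT taken from the study.
candidates = {
    "formula_A": (90, 10, 160, 40),
    "formula_B": (80, 20, 190, 10),
    "classifier_C": (88, 12, 188, 12),
}

names = list(candidates)
metrics = [screening_metrics(*candidates[name]) for name in names]
weights = [1 / 3, 1 / 3, 1 / 3]  # assumption: equal weights for all criteria
scores = topsis(metrics, weights)
ranking = [name for _, name in sorted(zip(scores, names), reverse=True)]

for name in ranking:
    print(name, round(scores[names.index(name)], 3))
```

A candidate with the highest sensitivity alone does not necessarily rank first here, because TOPSIS scores each candidate by its distance from the best and worst values across all criteria at once.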

Journal: International Journal of Medical Informatics
Status: Published - Nov 2022

Bibliographical note

Copyright © 2022 The Author(s). Published by Elsevier B.V. All rights reserved.

