Performance analysis of machine learning algorithms and screening formulae for β–thalassemia trait screening of Indian antenatal women

Reena Das; Sarkaft Saleh; Izabela Nielsen; Anilava Kaviraj; Prashant Sharma; Kartick Dey; Subrata Saha

doi:10.1016/j.ijmedinf.2022.104866

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Background: Currently, more than forty discrimination formulae based on red blood cell (RBC) parameters and some supervised machine learning algorithms (MLAs) have been recommended for β-thalassemia trait (BTT) screening. The present study aimed to evaluate and compare the performance of 26 such formulae and 13 MLAs on antenatal women's data with that of a recently developed formula, SCS_BTT, which is available for evaluation in over seventy countries as an Android app called SUSOKA [16]. Methods: A diagnostic database of 2942 antenatal females was collected from PGIMER, Chandigarh, India, and used for this analysis. The data set comprises cases of hypochromic microcytic anemia, BTT, and Hemoglobin E trait, double heterozygotes for Hemoglobin S and BTT, heterozygotes for Hemoglobin D Punjab, and normal subjects. Performance of the formulae and the MLAs was assessed by sensitivity, specificity, Youden's index, and AUC-ROC. A final recommendation was made from the rankings obtained through two Multiple Criteria Decision-Making (MCDM) techniques, namely Simultaneous Evaluation of Criteria and Alternatives (SECA) and TOPSIS. Results: Extreme Learning Machine (ELM) and Gradient Boosting Classifier (GBC) showed higher Youden's index and AUC-ROC values than all discriminating formulae. Sensitivity was highest for SCS_BTT. K-means clustering and the rankings from the MCDM methods show that the SCS_BTT, Shine & Lal, and Ravanbakhsh-F4 formulae perform better than the other formulae. The discriminant power of some MLAs and formulae was found to be considerably lower than that reported in the original studies. Conclusion: Comparative information on MLAs can aid researchers in developing new discriminating formulae that simultaneously ensure higher sensitivity and specificity. More multi-centric verification of the formulae on heterogeneous data is indispensable. Based on MCDM, the SCS_BTT and Shine & Lal formulae and the ELM and GBC algorithms are recommended for BTT screening. SCS_BTT can be used with certainty as a tangible cost-saving tool for mass screening of antenatal women in India and other countries.
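
The evaluation measures and the TOPSIS ranking named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy labels, the three hypothetical methods A, B and C, and the equal weights are all assumptions made for illustration, and the TOPSIS variant shown uses plain vector normalisation with benefit-type criteria only.

```python
from math import sqrt


def screening_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, Youden's index) for binary labels, 1 = BTT."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true carriers that are flagged
    specificity = tn / (tn + fp)  # share of non-carriers that are cleared
    return sensitivity, specificity, sensitivity + specificity - 1.0


def topsis(matrix, weights):
    """Closeness score per alternative (row); every criterion (column) is a benefit.

    Assumes no column is all zeros and that the rows are not all identical.
    """
    n = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(r[j] for r in v) for j in range(n)]
    worst = [min(r[j] for r in v) for j in range(n)]
    scores = []
    for r in v:
        d_best = sqrt(sum((r[j] - best[j]) ** 2 for j in range(n)))
        d_worst = sqrt(sum((r[j] - worst[j]) ** 2 for j in range(n)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Toy data only: ten subjects, four of them true carriers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec, youden = screening_metrics(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} Youden={youden:.3f}")

# Hypothetical (sensitivity, specificity) pairs for methods A, B and C.
scores = topsis([[0.9, 0.6], [0.8, 0.8], [0.6, 0.9]], weights=[0.5, 0.5])
print([round(s, 3) for s in scores])
```

With equal weights, the balanced method B gets the highest closeness score, while A and C tie. This shows how an MCDM ranking can prefer a method that is strong on both criteria over one that maximises only one of them.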

Original language	English
Article number	104866
Journal	International Journal of Medical Informatics
Volume	167
ISSN	1386-5056
DOIs	https://doi.org/10.1016/j.ijmedinf.2022.104866
Publication status	Published - Nov 2022

Keywords

Antenatal Women
Diagnostic performance
Multi-criteria decision-making
Supervised machine learning algorithm
β-Thalassemia carrier screening
Diagnosis, Differential
beta-Thalassemia/diagnosis
Humans
Machine Learning
Pregnancy
Algorithms
Mass Screening
Anemia, Iron-Deficiency/diagnosis
Female
Hemoglobin, Sickle
Hemoglobin E

Access to Document

Open Access article, final published version (1.28 MB). Licence: CC BY 4.0

Cite this

@article{b8167b932d6c4ea68326af516f7f9f78,

title = "Performance analysis of machine learning algorithms and screening formulae for β–thalassemia trait screening of Indian antenatal women",

abstract = "Background: Currently, more than forty discrimination formulae based on red blood cell (RBC) parameters and some supervised machine learning algorithms (MLAs) have been recommended for β-thalassemia trait (BTT) screening. The present study aimed to evaluate and compare the performance of 26 such formulae and 13 MLAs on antenatal women's data with that of a recently developed formula, SCS_BTT, which is available for evaluation in over seventy countries as an Android app called SUSOKA [16]. Methods: A diagnostic database of 2942 antenatal females was collected from PGIMER, Chandigarh, India, and used for this analysis. The data set comprises cases of hypochromic microcytic anemia, BTT, and Hemoglobin E trait, double heterozygotes for Hemoglobin S and BTT, heterozygotes for Hemoglobin D Punjab, and normal subjects. Performance of the formulae and the MLAs was assessed by sensitivity, specificity, Youden's index, and AUC-ROC. A final recommendation was made from the rankings obtained through two Multiple Criteria Decision-Making (MCDM) techniques, namely Simultaneous Evaluation of Criteria and Alternatives (SECA) and TOPSIS. Results: Extreme Learning Machine (ELM) and Gradient Boosting Classifier (GBC) showed higher Youden's index and AUC-ROC values than all discriminating formulae. Sensitivity was highest for SCS_BTT. K-means clustering and the rankings from the MCDM methods show that the SCS_BTT, Shine & Lal, and Ravanbakhsh-F4 formulae perform better than the other formulae. The discriminant power of some MLAs and formulae was found to be considerably lower than that reported in the original studies. Conclusion: Comparative information on MLAs can aid researchers in developing new discriminating formulae that simultaneously ensure higher sensitivity and specificity. More multi-centric verification of the formulae on heterogeneous data is indispensable. Based on MCDM, the SCS_BTT and Shine & Lal formulae and the ELM and GBC algorithms are recommended for BTT screening. SCS_BTT can be used with certainty as a tangible cost-saving tool for mass screening of antenatal women in India and other countries.",

keywords = "Antenatal Women, Diagnostic performance, Multi-criteria decision-making, Supervised machine learning algorithm, β-Thalassemia carrier screening, Diagnosis, Differential, beta-Thalassemia/diagnosis, Humans, Machine Learning, Pregnancy, Algorithms, Mass Screening, Anemia, Iron-Deficiency/diagnosis, Female, Hemoglobin, Sickle, Hemoglobin E",

author = "Reena Das and Sarkaft Saleh and Izabela Nielsen and Anilava Kaviraj and Prashant Sharma and Kartick Dey and Subrata Saha",

year = "2022",

month = nov,

doi = "10.1016/j.ijmedinf.2022.104866",

language = "English",

volume = "167",

journal = "International Journal of Medical Informatics",

issn = "1386-5056",

publisher = "Elsevier",

}

TY - JOUR

T1 - Performance analysis of machine learning algorithms and screening formulae for β–thalassemia trait screening of Indian antenatal women

AU - Das, Reena

AU - Saleh, Sarkaft

AU - Nielsen, Izabela

AU - Kaviraj, Anilava

AU - Sharma, Prashant

AU - Dey, Kartick

AU - Saha, Subrata

PY - 2022/11

Y1 - 2022/11

N2 - Background: Currently, more than forty discrimination formulae based on red blood cell (RBC) parameters and some supervised machine learning algorithms (MLAs) have been recommended for β-thalassemia trait (BTT) screening. The present study aimed to evaluate and compare the performance of 26 such formulae and 13 MLAs on antenatal women's data with that of a recently developed formula, SCS_BTT, which is available for evaluation in over seventy countries as an Android app called SUSOKA [16]. Methods: A diagnostic database of 2942 antenatal females was collected from PGIMER, Chandigarh, India, and used for this analysis. The data set comprises cases of hypochromic microcytic anemia, BTT, and Hemoglobin E trait, double heterozygotes for Hemoglobin S and BTT, heterozygotes for Hemoglobin D Punjab, and normal subjects. Performance of the formulae and the MLAs was assessed by sensitivity, specificity, Youden's index, and AUC-ROC. A final recommendation was made from the rankings obtained through two Multiple Criteria Decision-Making (MCDM) techniques, namely Simultaneous Evaluation of Criteria and Alternatives (SECA) and TOPSIS. Results: Extreme Learning Machine (ELM) and Gradient Boosting Classifier (GBC) showed higher Youden's index and AUC-ROC values than all discriminating formulae. Sensitivity was highest for SCS_BTT. K-means clustering and the rankings from the MCDM methods show that the SCS_BTT, Shine & Lal, and Ravanbakhsh-F4 formulae perform better than the other formulae. The discriminant power of some MLAs and formulae was found to be considerably lower than that reported in the original studies. Conclusion: Comparative information on MLAs can aid researchers in developing new discriminating formulae that simultaneously ensure higher sensitivity and specificity. More multi-centric verification of the formulae on heterogeneous data is indispensable. Based on MCDM, the SCS_BTT and Shine & Lal formulae and the ELM and GBC algorithms are recommended for BTT screening. SCS_BTT can be used with certainty as a tangible cost-saving tool for mass screening of antenatal women in India and other countries.

AB - Background: Currently, more than forty discrimination formulae based on red blood cell (RBC) parameters and some supervised machine learning algorithms (MLAs) have been recommended for β-thalassemia trait (BTT) screening. The present study aimed to evaluate and compare the performance of 26 such formulae and 13 MLAs on antenatal women's data with that of a recently developed formula, SCS_BTT, which is available for evaluation in over seventy countries as an Android app called SUSOKA [16]. Methods: A diagnostic database of 2942 antenatal females was collected from PGIMER, Chandigarh, India, and used for this analysis. The data set comprises cases of hypochromic microcytic anemia, BTT, and Hemoglobin E trait, double heterozygotes for Hemoglobin S and BTT, heterozygotes for Hemoglobin D Punjab, and normal subjects. Performance of the formulae and the MLAs was assessed by sensitivity, specificity, Youden's index, and AUC-ROC. A final recommendation was made from the rankings obtained through two Multiple Criteria Decision-Making (MCDM) techniques, namely Simultaneous Evaluation of Criteria and Alternatives (SECA) and TOPSIS. Results: Extreme Learning Machine (ELM) and Gradient Boosting Classifier (GBC) showed higher Youden's index and AUC-ROC values than all discriminating formulae. Sensitivity was highest for SCS_BTT. K-means clustering and the rankings from the MCDM methods show that the SCS_BTT, Shine & Lal, and Ravanbakhsh-F4 formulae perform better than the other formulae. The discriminant power of some MLAs and formulae was found to be considerably lower than that reported in the original studies. Conclusion: Comparative information on MLAs can aid researchers in developing new discriminating formulae that simultaneously ensure higher sensitivity and specificity. More multi-centric verification of the formulae on heterogeneous data is indispensable. Based on MCDM, the SCS_BTT and Shine & Lal formulae and the ELM and GBC algorithms are recommended for BTT screening. SCS_BTT can be used with certainty as a tangible cost-saving tool for mass screening of antenatal women in India and other countries.

KW - Antenatal Women

KW - Diagnostic performance

KW - Multi-criteria decision-making

KW - Supervised machine learning algorithm

KW - β-Thalassemia carrier screening

KW - Diagnosis, Differential

KW - beta-Thalassemia/diagnosis

KW - Humans

KW - Machine Learning

KW - Pregnancy

KW - Algorithms

KW - Mass Screening

KW - Anemia, Iron-Deficiency/diagnosis

KW - Female

KW - Hemoglobin, Sickle

KW - Hemoglobin E

UR - http://www.scopus.com/inward/record.url?scp=85138480416&partnerID=8YFLogxK

U2 - 10.1016/j.ijmedinf.2022.104866

DO - 10.1016/j.ijmedinf.2022.104866

M3 - Journal article

C2 - 36174416

AN - SCOPUS:85138480416

SN - 1386-5056

VL - 167

JO - International Journal of Medical Informatics

JF - International Journal of Medical Informatics

M1 - 104866

ER -
