Classification of α-thalassemia data using machine learning models

Frederik Christensen*, Deniz Kenan Kılıç, Izabela Ewa Nielsen, Tarec Christoffer El-Galaly, Andreas Glenthøj, Jens Helby, Henrik Frederiksen, Sören Möller, Alexander Djupnes Fuglkjær

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

BACKGROUND: Around 7% of the global population has congenital hemoglobin disorders, with over 300,000 new cases of α-thalassemia annually. Diagnosis is costly and inaccurate in low-income regions, often relying on complete blood count (CBC) tests. This study employs machine learning (ML) to classify α-thalassemia traits based on gender and CBC, exploring the effects of grouping silent- and non-carriers.

METHODS: The dataset includes 288 individuals with suspected α-thalassemia from Sri Lanka. The data were classified using eleven discriminant formulae and nine ML models. Outliers were removed using Mahalanobis distance, and resampling was conducted with the synthetic minority oversampling technique (SMOTE) and SMOTE-nominal continuous (NC). The Mann-Whitney U test was used for feature extraction and class grouping. ML performance was evaluated with eight criteria.
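
As a rough illustration of two of the preprocessing steps named above, the following Python sketch removes outliers by Mahalanobis distance and then filters features with the Mann-Whitney U test. It is not the authors' implementation: the chi-square cutoff (`alpha=0.975`), the p-value threshold, the use of the test as a per-feature filter, the function names, and the synthetic data are all assumptions made for the example. Resampling with SMOTE is not shown.

```python
import numpy as np
from scipy.stats import chi2, mannwhitneyu


def mahalanobis_mask(X, alpha=0.975):
    """Boolean mask of rows to keep: squared Mahalanobis distance to the
    sample mean is at most the chi-square quantile `alpha` (assumed cutoff)."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return d2 <= chi2.ppf(alpha, df=X.shape[1])


def significant_features(X, y, p_threshold=0.05):
    """Indices of features whose distributions differ between classes 0 and 1
    according to a two-sided Mann-Whitney U test (assumed threshold)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = []
    for j in range(X.shape[1]):
        _, p = mannwhitneyu(X[y == 0, j], X[y == 1, j], alternative="two-sided")
        if p < p_threshold:
            keep.append(j)
    return keep


# Synthetic stand-in data (NOT the study's dataset): 3 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.repeat([0, 1], 100)
X[y == 1, 0] += 2.0          # only feature 0 differs between the classes
X[0] = [25.0, -25.0, 25.0]   # one gross outlier

mask = mahalanobis_mask(X)
X_clean, y_clean = X[mask], y[mask]
selected = significant_features(X_clean, y_clean)
print("rows kept:", int(mask.sum()), "of", len(X))
print("selected feature indices:", selected)
```

In this sketch the outlier inflates the covariance estimate it is judged against; a robust covariance estimator would be a reasonable alternative when many outliers are expected.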

RESULTS: The Ehsani formula achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.66 when silent- and non-carriers were grouped. The convolutional neural network (CNN) without feature extraction performed better, with an accuracy of 0.85, sensitivity of 0.8, specificity of 0.86, and ROC-AUC of 0.95/0.93 (micro/macro). Performance was maintained even without preprocessing.
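
For readers less familiar with the reported criteria, the sketch below shows how accuracy, sensitivity, and specificity follow from the counts of a binary confusion matrix. The labels are made-up toy values, not results from the study, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), and specificity
    (true negative rate) for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels for illustration only: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels there are 3 true positives, 1 false negative, 5 true negatives, and 1 false positive, so accuracy is 8/10, sensitivity is 3/4, and specificity is 5/6.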

CONCLUSION: ML models outperformed classical discriminant formulae in classifying α-thalassemia using sex and CBC features. A larger dataset could enhance ML model generalization and the impact of feature extraction. Grouping silent- and non-carriers improved ML results, especially with resampling. Silent carriers were not separable from non-carriers using the available features.

Original language: English
Article number: 108581
Journal: Computer Methods and Programs in Biomedicine
Volume: 260
Number of pages: 20
ISSN: 0169-2607
Publication status: Published - Mar 2025

Bibliographical note

Copyright © 2025 The Author(s). Published by Elsevier B.V. All rights reserved.

Keywords

  • Algorithms
  • Female
  • Humans
  • Machine Learning
  • Male
  • Neural Networks, Computer
  • ROC Curve
  • Sri Lanka
  • alpha-Thalassemia/classification
  • Hemoglobinopathies
  • Classification
  • Machine learning
  • Artificial intelligence
  • Alpha thalassemia
