Abstract
BACKGROUND: Around 7% of the global population has congenital hemoglobin disorders, with over 300,000 new cases of α-thalassemia annually. Diagnosis in low-income regions is costly and inaccurate, often relying on complete blood count (CBC) tests. This study employs machine learning (ML) to classify α-thalassemia traits based on sex and CBC, and explores the effect of grouping silent carriers with non-carriers.
METHODS: The dataset includes 288 individuals with suspected α-thalassemia from Sri Lanka. Individuals were classified using eleven discriminant formulae and nine ML models. Outliers were removed using Mahalanobis distance, and resampling was conducted with the synthetic minority oversampling technique (SMOTE) and SMOTE-nominal continuous (SMOTE-NC). The Mann-Whitney U test was used for feature extraction and class grouping. ML performance was evaluated with eight criteria.
RESULTS: The Ehsani formula achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.66 when silent carriers and non-carriers were grouped. The convolutional neural network (CNN) without feature extraction performed better, with an accuracy of 0.85, sensitivity of 0.8, specificity of 0.86, and ROC-AUC of 0.95/0.93 (micro/macro). Performance was maintained even without preprocessing.
CONCLUSION: ML models outperformed classical discriminant formulae in classifying α-thalassemia using sex and CBC features. A larger dataset could enhance ML model generalization and the impact of feature extraction. Grouping silent carriers with non-carriers improved ML results, especially with resampling. Silent carriers were not separable from non-carriers using the available features.
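The abstract does not say how the Mann-Whitney U test was applied, so the following is only a minimal sketch of the statistic itself, assuming it is computed per CBC feature between two groups. The function names `midranks` and `mann_whitney_u` and the sample values are hypothetical illustrations, not the authors' code or data. The p-value uses a plain normal approximation with no tie or continuity correction, which is a simplification.

```python
import math


def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y, two-sided p) for two independent samples.

    Assumption: p comes from the normal approximation without tie or
    continuity correction, so it is rough for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u_x, u_y) - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, u_y, p_two_sided


# Made-up values for one hypothetical feature in two groups (illustration only).
group_a = [68.0, 70.5, 71.0, 73.2]
group_b = [80.1, 82.4, 85.0, 79.3, 88.6]
u_a, u_b, p = mann_whitney_u(group_a, group_b)
```

A small p-value for a feature would suggest that its distribution differs between the two groups, which is one plausible basis for keeping that feature or for deciding whether two classes can reasonably be merged.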
Original language | English |
---|---|
Article number | 108581 |
Journal | Computer Methods and Programs in Biomedicine |
Volume | 260 |
Number of pages | 20 |
ISSN | 0169-2607 |
Publication status | Published - Mar 2025 |
Bibliographical note
Copyright © 2025 The Author(s). Published by Elsevier B.V. All rights reserved.
Keywords
- Algorithms
- Female
- Humans
- Machine Learning
- Male
- Neural Networks, Computer
- ROC Curve
- Sri Lanka
- alpha-Thalassemia/classification
- Hemoglobinopathies
- Classification
- Machine learning
- Artificial intelligence
- Alpha thalassemia