TY - JOUR
T1 - Explainable Machine Learning-Based Approach to Identify People at Risk of Diabetes using Physical Activity Monitoring
AU - Cichosz, Simon Lebech
AU - Bender, Clara
AU - Hejlesen, Ole
PY - 2025/3
Y1 - 2025/3
N2 - Objective: This study aimed to investigate the utilization of patterns derived from physical activity monitoring (PAM) for the identification of individuals at risk of type 2 diabetes mellitus (T2DM) through an at-home screening approach employing machine learning techniques. Methods: Data from the 2011–2014 National Health and Nutrition Examination Survey (NHANES) were scrutinized, focusing on the PAM component. The primary objective involved the identification of diabetes, characterized by an HbA1c ≥ 6.5% (48 mmol/mol), while the secondary objective included individuals with prediabetes, defined by an HbA1c ≥ 5.7% (39 mmol/mol). Features derived from PAM, along with age, were utilized as inputs for an XGBoost classification model. SHapley Additive exPlanations (SHAP) was employed to enhance the interpretability of the models. Results: The study included 7532 subjects with both PAM and HbA1c data. The model, which solely included PAM features, had a test dataset ROC-AUC of 0.74 (95% CI = 0.72–0.76). When integrating the PAM features with age, the model’s ROC-AUC increased to 0.79 (95% CI = 0.78–0.80) in the test dataset. When addressing the secondary target of prediabetes, the XGBoost model exhibited a test dataset ROC-AUC of 0.80 (95% CI = 0.79–0.81). Conclusions: The objective quantification of physical activity through PAM yields valuable information that can be employed in the identification of individuals with undiagnosed diabetes and prediabetes.
KW - XGBoost
KW - physical activity monitoring
KW - prediabetes
KW - prediction
KW - screening
KW - type 2 diabetes mellitus
UR - http://www.scopus.com/inward/record.url?scp=105000941778&partnerID=8YFLogxK
U2 - 10.3390/biomedinformatics5010001
DO - 10.3390/biomedinformatics5010001
M3 - Journal article
SN - 2673-7426
VL - 5
JO - BioMedInformatics
JF - BioMedInformatics
IS - 1
M1 - 1
ER -