A Comparative Analysis of Machine Learning Models for the Detection of Undiagnosed Diabetes Patients

Simon Lebech Cichosz*, Clara Bender, Ole Hejlesen

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review



Introduction: Early detection of type 2 diabetes is essential for preventing long-term complications. However, screening the entire population for diabetes is not cost-effective, so identifying individuals at high risk of the disease is crucial. The aim of this study was to compare the performance of five diverse machine learning (ML) models in classifying undiagnosed diabetes using large, heterogeneous datasets.

Methods: We used data from the National Health and Nutrition Examination Survey (NHANES) cycles from 2005 to 2018 to identify people with undiagnosed diabetes. The dataset included 45,431 participants, and biochemical confirmation of glucose control (HbA1c) was used to identify undiagnosed diabetes. The predictors were based on simple, clinically obtainable variables, which could make them feasible for prescreening for diabetes. We compared five ML models: random forest, AdaBoost, RUSBoost, LogitBoost, and a neural network.

Results: The prevalence of undiagnosed diabetes was 4%. For the classification of undiagnosed diabetes, the area under the receiver operating characteristic (ROC) curve (AUC) ranged from 0.776 to 0.806. The positive predictive values (PPVs) ranged from 0.083 to 0.091, the negative predictive values (NPVs) from 0.984 to 0.99, and the sensitivities from 0.742 to 0.871.

Conclusion: We have demonstrated that several types of classification models can accurately classify undiagnosed diabetes from simple, clinically obtainable variables. These results suggest that machine learning could be a useful prescreening tool for undiagnosed diabetes in clinical practice.
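The Results report AUC, PPV, NPV, and sensitivity at a prevalence of 4%. As a minimal sketch of how these screening metrics are defined, the Python code below assumes binary labels (1 = undiagnosed diabetes, 0 = no diabetes), thresholded predictions, and continuous model scores. The function names and toy inputs are hypothetical; this is not the authors' implementation. The last two functions apply Bayes' rule to show why PPV stays low at 4% prevalence even for a fairly sensitive model.

```python
"""Hypothetical sketch of the screening metrics named in the abstract.
Not the study's code; names and example numbers are illustrative only."""

from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)
    npv: float          # TN / (TN + FN)


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from 0/1 labels and 0/1 predictions.

    Assumes every confusion-matrix denominator is non-zero (both classes
    present and both predicted at least once)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


def auc_rank(y_true, scores):
    """ROC AUC as the probability that a random positive is scored higher
    than a random negative (Mann-Whitney statistic); ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Bayes' rule: PPV = Se*p / (Se*p + (1 - Sp)*(1 - p))."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv_from_prevalence(sensitivity, specificity, prevalence):
    """Bayes' rule: NPV = Sp*(1 - p) / (Sp*(1 - p) + (1 - Se)*p)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical sensitivity/specificity, at the 4% prevalence reported in the abstract.
    print(ppv_from_prevalence(0.8, 0.7, 0.04))  # about 0.10
    print(npv_from_prevalence(0.8, 0.7, 0.04))  # about 0.988
```

With a hypothetical sensitivity of 0.8 and specificity of 0.7 at 4% prevalence, PPV is about 0.10 and NPV about 0.988. Roughly nine in ten positive screens are false positives, while a negative screen is rarely wrong. These inputs are illustrative and are not values from the study. The pairwise AUC loop costs O(positives × negatives) and is written for clarity; a sort-based rank computation would scale better to a cohort of 45,431 participants.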
Original language: English
Issue number: 1
Pages (from-to): 1-11
Number of pages: 11
Publication status: Published - 3 Jan 2024


  • undiagnosed diabetes
  • diabetes mellitus
  • machine learning
  • prescreening
  • clinically obtainable variables


