ChatGPT fails challenging the recent ESCMID brain abscess guideline

Susanne Dyckhoff-Shen; Uwe Koedel; Matthijs C. Brouwer; Jacob Bodilsen; Matthias Klein

doi:10.1007/s00415-023-12168-1

Corresponding author: Susanne Dyckhoff-Shen

Research output: Contribution to journal (journal article, research, peer-reviewed)

Abstract

BACKGROUND: With artificial intelligence (AI) on the rise, it remains unclear if AI is able to professionally evaluate medical research and give scientifically valid recommendations.

AIM: This study aimed to assess the accuracy of ChatGPT's responses to ten key questions on brain abscess diagnostics and treatment in comparison to the guideline recently published by the European Society for Clinical Microbiology and Infectious Diseases (ESCMID).

METHODS: All ten PECO (Population, Exposure, Comparator, Outcome) questions which had been developed during the guideline process were presented directly to ChatGPT. Next, ChatGPT was additionally fed with data from studies selected for each PECO question by the ESCMID committee. AI's responses were subsequently compared with the recommendations of the ESCMID guideline.

RESULTS: For 17 out of 20 challenges, ChatGPT was able to give recommendations on the management of patients with brain abscess, including grade of evidence and strength of recommendation. Without data prompting, 70% of questions were answered very similarly to the guideline recommendations. Where these answers differed from the guideline recommendations, no patient hazard was present. Data input slightly improved the clarity of ChatGPT's recommendations but led to fewer correct answers, including two recommendations that directly contradicted the guideline and were associated with a possible hazard to the patient.

CONCLUSION: ChatGPT seems to be able to rapidly gather information on brain abscesses and, in most cases, give recommendations on key questions about their management. Nevertheless, individual responses could potentially harm patients. Thus, the expertise of an expert committee remains indispensable.
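The headline figures in RESULTS follow from simple arithmetic. The sketch below assumes, since the abstract does not say so explicitly, that the 20 "challenges" are the ten PECO questions each posed twice: once on their own and once together with the study data selected by the ESCMID committee. The variable names are illustrative only, and the sketch reproduces just the counts stated in the abstract, not the authors' per-question analysis.

```python
# Illustrative arithmetic for the RESULTS figures; not the authors' analysis.
# ASSUMPTION: 20 challenges = 10 PECO questions x 2 prompting conditions
# (question alone vs. question plus study data selected by ESCMID).
N_QUESTIONS = 10
N_CONDITIONS = 2  # assumed: without and with data prompting

challenges = N_QUESTIONS * N_CONDITIONS  # total prompts posed
with_recommendation = 17                 # stated in the abstract
without_recommendation = challenges - with_recommendation
share_with_recommendation = with_recommendation / challenges

# "70% of questions" in the no-data condition, converted to a count
# with integer arithmetic to avoid floating-point rounding.
similar_without_data = 70 * N_QUESTIONS // 100

print(f"Challenges posed: {challenges}")
print(f"Recommendation given: {with_recommendation}/{challenges} "
      f"({share_with_recommendation:.0%})")
print(f"No recommendation: {without_recommendation}")
print(f"Similar to guideline without data: {similar_without_data}/{N_QUESTIONS}")
```

Under this assumption, ChatGPT gave a recommendation in 85% of challenges, and "70% of questions" corresponds to 7 of the 10 questions asked without data.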

Original language	English
Journal	Journal of Neurology
Volume	271
Issue number	4
Pages (from-to)	2086-2101
Number of pages	16
ISSN	0340-5354
DOIs	https://doi.org/10.1007/s00415-023-12168-1
Publication status	Published - Apr 2024

Bibliographical note

© 2024. The Author(s).

Keywords

AI
Brain abscess
ChatGPT
Guideline

Access to Document

10.1007/s00415-023-12168-1 (Licence: CC BY 4.0)


Cite this

@article{06edcbf0ca5f42c1bd0f1c6c6d61ed60,
  title = "{ChatGPT} fails challenging the recent {ESCMID} brain abscess guideline",
  abstract = "BACKGROUND: With artificial intelligence (AI) on the rise, it remains unclear if AI is able to professionally evaluate medical research and give scientifically valid recommendations. AIM: This study aimed to assess the accuracy of ChatGPT's responses to ten key questions on brain abscess diagnostics and treatment in comparison to the guideline recently published by the European Society for Clinical Microbiology and Infectious Diseases (ESCMID). METHODS: All ten PECO (Population, Exposure, Comparator, Outcome) questions which had been developed during the guideline process were presented directly to ChatGPT. Next, ChatGPT was additionally fed with data from studies selected for each PECO question by the ESCMID committee. AI's responses were subsequently compared with the recommendations of the ESCMID guideline. RESULTS: For 17 out of 20 challenges, ChatGPT was able to give recommendations on the management of patients with brain abscess, including grade of evidence and strength of recommendation. Without data prompting, 70\% of questions were answered very similarly to the guideline recommendations. Where these answers differed from the guideline recommendations, no patient hazard was present. Data input slightly improved the clarity of ChatGPT's recommendations but led to fewer correct answers, including two recommendations that directly contradicted the guideline and were associated with a possible hazard to the patient. CONCLUSION: ChatGPT seems to be able to rapidly gather information on brain abscesses and, in most cases, give recommendations on key questions about their management. Nevertheless, individual responses could potentially harm patients. Thus, the expertise of an expert committee remains indispensable.",
  keywords = "AI, Brain abscess, ChatGPT, Guideline",
  author = "Susanne Dyckhoff-Shen and Uwe Koedel and Brouwer, {Matthijs C.} and Jacob Bodilsen and Matthias Klein",
  note = "{\textcopyright} 2024. The Author(s).",
  year = "2024",
  month = apr,
  doi = "10.1007/s00415-023-12168-1",
  language = "English",
  volume = "271",
  pages = "2086--2101",
  journal = "Journal of Neurology",
  issn = "0340-5354",
  publisher = "Springer",
  number = "4",
}

TY  - JOUR
T1  - ChatGPT fails challenging the recent ESCMID brain abscess guideline
AU  - Dyckhoff-Shen, Susanne
AU  - Koedel, Uwe
AU  - Brouwer, Matthijs C.
AU  - Bodilsen, Jacob
AU  - Klein, Matthias
PY  - 2024/4
Y1  - 2024/4
N2  - BACKGROUND: With artificial intelligence (AI) on the rise, it remains unclear if AI is able to professionally evaluate medical research and give scientifically valid recommendations. AIM: This study aimed to assess the accuracy of ChatGPT's responses to ten key questions on brain abscess diagnostics and treatment in comparison to the guideline recently published by the European Society for Clinical Microbiology and Infectious Diseases (ESCMID). METHODS: All ten PECO (Population, Exposure, Comparator, Outcome) questions which had been developed during the guideline process were presented directly to ChatGPT. Next, ChatGPT was additionally fed with data from studies selected for each PECO question by the ESCMID committee. AI's responses were subsequently compared with the recommendations of the ESCMID guideline. RESULTS: For 17 out of 20 challenges, ChatGPT was able to give recommendations on the management of patients with brain abscess, including grade of evidence and strength of recommendation. Without data prompting, 70% of questions were answered very similarly to the guideline recommendations. Where these answers differed from the guideline recommendations, no patient hazard was present. Data input slightly improved the clarity of ChatGPT's recommendations but led to fewer correct answers, including two recommendations that directly contradicted the guideline and were associated with a possible hazard to the patient. CONCLUSION: ChatGPT seems to be able to rapidly gather information on brain abscesses and, in most cases, give recommendations on key questions about their management. Nevertheless, individual responses could potentially harm patients. Thus, the expertise of an expert committee remains indispensable.
AB  - BACKGROUND: With artificial intelligence (AI) on the rise, it remains unclear if AI is able to professionally evaluate medical research and give scientifically valid recommendations. AIM: This study aimed to assess the accuracy of ChatGPT's responses to ten key questions on brain abscess diagnostics and treatment in comparison to the guideline recently published by the European Society for Clinical Microbiology and Infectious Diseases (ESCMID). METHODS: All ten PECO (Population, Exposure, Comparator, Outcome) questions which had been developed during the guideline process were presented directly to ChatGPT. Next, ChatGPT was additionally fed with data from studies selected for each PECO question by the ESCMID committee. AI's responses were subsequently compared with the recommendations of the ESCMID guideline. RESULTS: For 17 out of 20 challenges, ChatGPT was able to give recommendations on the management of patients with brain abscess, including grade of evidence and strength of recommendation. Without data prompting, 70% of questions were answered very similarly to the guideline recommendations. Where these answers differed from the guideline recommendations, no patient hazard was present. Data input slightly improved the clarity of ChatGPT's recommendations but led to fewer correct answers, including two recommendations that directly contradicted the guideline and were associated with a possible hazard to the patient. CONCLUSION: ChatGPT seems to be able to rapidly gather information on brain abscesses and, in most cases, give recommendations on key questions about their management. Nevertheless, individual responses could potentially harm patients. Thus, the expertise of an expert committee remains indispensable.
KW  - AI
KW  - Brain abscess
KW  - ChatGPT
KW  - Guideline
UR  - http://www.scopus.com/inward/record.url?scp=85183340027&partnerID=8YFLogxK
U2  - 10.1007/s00415-023-12168-1
DO  - 10.1007/s00415-023-12168-1
M3  - Journal article
C2  - 38279999
SN  - 0340-5354
VL  - 271
SP  - 2086
EP  - 2101
JO  - Journal of Neurology
JF  - Journal of Neurology
IS  - 4
ER  - 
