TY - JOUR
T1 - A chat about actinic keratosis
T2 - Examining capabilities and user experience of ChatGPT as a digital health technology in dermato-oncology
AU - Lent, Heather C.
AU - Ortner, Vinzent K.
AU - Karmisholt, Katrine E.
AU - Wiegell, Stine R.
AU - Nissen, Christoffer V.
AU - Omland, Silje H.
AU - Kamstrup, Maria R.
AU - Togsverd-Bo, Katrine
AU - Haedersdal, Merete
N1 - Publisher Copyright: © 2023 The Authors. JEADV Clinical Practice published by John Wiley & Sons Ltd on behalf of European Academy of Dermatology and Venereology.
PY - 2024/3
Y1 - 2024/3
N2 - Background: The potential applications of artificial intelligence (AI) in dermatology are evolving rapidly. Chatbots are an emerging trend in healthcare that rely on large language models (LLMs) to generate answers to prompts from users. However, the factuality and user experience (UX) of such chatbots remain to be evaluated in the context of dermato-oncology. Objectives: To examine the potential of Chat Generative Pretrained Transformer (ChatGPT) as a reliable source of information in the context of actinic keratosis (AK) and to evaluate clinicians' attitudes and UX with regard to the chatbot. Methods: A set of 38 clinical questions was compiled and entered as natural language queries in separate, individual conversation threads in ChatGPT (OpenAI, default GPT 3.5). Questions pertained to patient education, diagnosis, and treatment. ChatGPT's responses were presented to a panel of 7 dermatologists for rating of factual accuracy, currency of information, and completeness of the response. Attitudes towards ChatGPT were explored qualitatively and quantitatively using a validated user experience questionnaire (UEQ). Results: ChatGPT answered 12 questions (31.6%) with an accurate, current, and complete response. ChatGPT performed best for questions on patient education, including pathogenesis of AK and potential risk factors, but struggled with diagnosis and treatment. Major deficits were seen in grading AK and providing up-to-date treatment guidance, and ChatGPT asserted incorrect information with unwarranted confidence. Further, responses were considered verbose, with an average word count of 198 (SD 55), and overly alarming about the risk of malignant transformation. Based on UEQ responses, the expert panel considered ChatGPT an attractive and efficient tool, scoring highest for speed of information retrieval, but deemed the chatbot inaccurate and verbose, scoring lowest for clarity. Conclusions: While dermatologists rated ChatGPT high in UX, the underlying LLMs that enable such chatbots require further development to guarantee the accuracy and concision required in a clinical setting.
KW - actinic keratosis
KW - ChatGPT
KW - large language models
KW - natural language processing
KW - skin cancer
KW - user experience
UR - http://www.scopus.com/inward/record.url?scp=85186396655&partnerID=8YFLogxK
U2 - 10.1002/jvc2.263
DO - 10.1002/jvc2.263
M3 - Journal article
AN - SCOPUS:85186396655
SN - 2768-6566
VL - 3
SP - 258
EP - 265
JO - JEADV Clinical Practice
JF - JEADV Clinical Practice
IS - 1
ER -