Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine
Article · 2024 · Peer-reviewed
License: CC BY
Data sources: Crossref

Evaluating the accuracy and reliability of AI chatbots in disseminating the content of current resuscitation guidelines: a comparative analysis between the ERC 2021 guidelines and both ChatGPTs 3.5 and 4

Authors: Stefanie Beck; Manuel Kuhner; Markus Haar; Anne Daubmann; Martin Semmann; Stefan Kluge

Abstract

Aim of the study: Artificial intelligence (AI) chatbots are established as tools for answering medical questions worldwide. Healthcare trainees are increasingly using this cutting-edge technology, although its reliability and accuracy in the context of healthcare remain uncertain. This study evaluated the suitability of ChatGPT versions 3.5 and 4 for healthcare professionals seeking up-to-date evidence and recommendations for resuscitation by comparing the key messages of the resuscitation guidelines, which methodically set the gold standard of current evidence and recommendations, with the statements of the AI chatbots on this topic.

Methods: This prospective comparative content analysis was conducted between the 2021 European Resuscitation Council (ERC) guidelines and the responses of two freely available ChatGPT versions (ChatGPT-3.5 and the Bing version of ChatGPT-4) to questions about the key messages of clinically relevant ERC guideline chapters for adults. The content analysis was performed bidirectionally by independent raters: (1) the completeness and currency of the AI output were assessed by comparing the key messages with the AI-generated statements, and (2) the conformity of the AI output was evaluated by comparing the statements of the two ChatGPT versions with the content of the ERC guidelines.

Results: In response to inquiries about the five chapters, ChatGPT-3.5 generated a total of 60 statements, whereas ChatGPT-4 produced 32 statements. Of the 172 key messages in the ERC guideline chapters, ChatGPT-3.5 did not address 123 and ChatGPT-4 did not address 132. A total of 77% of the ChatGPT-3.5 statements and 84% of the ChatGPT-4 statements were fully in line with the ERC guidelines. The main reasons for nonconformity were superficial and incorrect AI statements. Interrater reliability between the two raters, measured by Cohen's kappa, was higher for ChatGPT-4 (0.56 for the completeness analysis and 0.76 for the conformity analysis) than for ChatGPT-3.5 (0.48 and 0.36, respectively).

Conclusion: We advise healthcare professionals not to rely solely on the tested AI-based chatbots to keep up to date with the latest evidence, as the relevant texts were not part of the training data of the underlying LLMs, and the chatbots' lack of conceptual understanding carries a high risk of spreading misconceptions. Original publications should always be consulted for a comprehensive understanding.
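
The study reports interrater reliability as Cohen's kappa (e.g., 0.76 for the ChatGPT-4 conformity analysis). As a rough illustration of how that statistic is computed, the following is a minimal Python sketch; the rating labels and example data are hypothetical and are not taken from the paper.

    # Minimal sketch of Cohen's kappa for two raters; example data is hypothetical.
    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Cohen's kappa for two raters assigning one categorical label per item."""
        assert len(rater_a) == len(rater_b) and rater_a, "need paired, non-empty ratings"
        n = len(rater_a)

        # Observed agreement: fraction of items on which both raters agree.
        p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

        # Expected chance agreement, from each rater's marginal label frequencies.
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

        if p_expected == 1.0:  # both raters used a single identical label throughout
            return 1.0
        return (p_observed - p_expected) / (1 - p_expected)

    # Hypothetical example: two raters judging whether each chatbot statement
    # conforms to the guideline ("conform") or not ("nonconform").
    rater_1 = ["conform", "conform", "nonconform", "conform", "nonconform"]
    rater_2 = ["conform", "nonconform", "nonconform", "conform", "nonconform"]
    print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # ~0.62 for this toy data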

Keywords

Artificial Intelligence [MeSH]; Information Dissemination/methods [MeSH]; Resuscitation/standards [MeSH]; Practice Guidelines as Topic [MeSH]; Humans [MeSH]; Reproducibility of Results [MeSH]; Prospective Studies [MeSH]; Original Research; RC86-88.9 (Medical emergencies. Critical care. Intensive care. First aid)

  • Impact indicators by BIP!
    selected citations (derived from selected sources; an alternative to "Influence", which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically): 6
    popularity (the "current" impact/attention of an article in the research community at large, based on the underlying citation network): Top 10%
    influence (the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically): Average
    impulse (the initial momentum of an article directly after its publication, based on the underlying citation network): Top 10%
Open Access status: Green; published in a Diamond OA journal