A Comparative Analysis of GPT-3.5, GPT-4 and GPT-4.o in Heart Failure

Şeyda Günay-polatkan; Deniz Sığırlı

Uludağ Üniversitesi Tıp Fakültesi Dergisi

Article, 2025, peer-reviewed

Article, 12 Jan 2025. Publisher: Uludag Universitesi Tip Fakultesi Dergisi. Journal: Uludağ Üniversitesi Tıp Fakültesi Dergisi, volume 50, pages 443-447 (ISSN: 1300-414X).

doi: 10.32708/uutfd.1543370

Abstract

Digitalization has increasingly penetrated healthcare. Generative artificial intelligence (AI) is a type of AI technology that can generate new content, and patients can use AI-powered chatbots to obtain medical information. Heart failure (HF) is a syndrome with high morbidity and mortality, and patients commonly search for information about it on many websites. This study aimed to assess the accuracy of large language models (LLMs), namely GPT-3.5, GPT-4 and GPT-4.o, in answering questions about HF. Thirteen questions on the definition, causes, signs and symptoms, complications, treatment and lifestyle recommendations of HF were evaluated. These questions, originally designed to assess medical students' knowledge and awareness of heart failure, were taken from a previous study in the literature. Of the students who participated in that study, 158 (58.7%) were first-year students and 111 (41.3%) were sixth-year students who had taken their cardiology internship in their fourth year. The questions were entered in Turkish, and two cardiologists with over ten years of experience evaluated the responses generated by GPT-3.5, GPT-4 and GPT-4.o. GPT-3.5 yielded “correct” responses to 8/13 (61.5%) of the questions, whereas GPT-4 yielded “correct” responses to 11/13 (84.6%). All of the responses of GPT-4.o were accurate and complete. By contrast, no question was answered correctly by 100% of the medical students. This study revealed that the performance of GPT-4.o was superior to that of GPT-3.5 but similar to that of GPT-4.

Related Organizations

Uludağ University
Turkey

Impact (BIP!)

Selected citations: 0
Popularity: Average
Influence: Average
Impulse: Average

Open access: gold

Fields of Science

medical and health sciences

clinical medicine
