Powered by OpenAIRE graph
Annals of the Rheumatic Diseases
Article . 2025 . Peer-reviewed
License: Elsevier TDM
Data sources: Crossref
https://dx.doi.org/10.82161/cr...
Conference object . 2025
Data sources: Datacite

This Research product is the result of merged Research products in OpenAIRE.

Assessing the performance of AI chatbots in answering patients' common questions about low back pain

Authors: Scaff, Simone P S; Reis, Felipe J J; Ferreira, Giovanni E; Jacob, Maria Fernanda; Saragiotto, Bruno T

Abstract

This study assessed the accuracy and readability of answers generated by large language model (LLM) chatbots to common patient questions about low back pain (LBP). This cross-sectional study analysed responses to 30 LBP-related questions covering self-management, risk factors, and treatment. The questions were developed by experienced clinicians and researchers and piloted with a group of consumer representatives with lived experience of LBP. Each question was entered as a prompt into four chatbots: ChatGPT 3.5, Bing, Bard (Gemini), and ChatGPT 4.0. Responses were evaluated for accuracy, readability, and the presence of disclaimers about health advice. Accuracy was assessed by comparing the generated recommendations with the main guidelines for LBP. Two independent reviewers analysed the responses and classified them as accurate, inaccurate, or unclear. Readability was measured with the Flesch Reading Ease Score (FRES). Across 120 responses yielding 1069 recommendations, 55.8% were accurate, 42.1% inaccurate, and 1.9% unclear. Overall, the chatbots demonstrated moderate accuracy in their recommendations on LBP. They gave relatively accurate responses in the 'self-management' and 'treatment' domains but showed notable inaccuracies for 'risk factors'. Overall, the answers were "reasonably difficult" to read, with a mean (SD) FRES of 50.94 (3.06). Readability was generally poor and could negatively affect patient understanding and behaviour. The chatbots included a disclaimer about health advice in 70% to 100% of their responses, helping users recognise that the information provided is not a substitute for professional medical advice. This study highlights the potential and limitations of using LLM chatbots as a patient resource for LBP.
The findings suggest that, while LLM chatbots can provide moderately accurate information, inconsistent accuracy across domains (especially risk factors) and poor readability could impair patient understanding and behaviour. These findings can guide future research on improving LLM-chatbot algorithms, inform clinical practice, and shape policy decisions on integrating AI into patient education and support systems worldwide. The use of LLM chatbots as tools for patient education and counselling in LBP shows promising but variable results. The chatbots generally provide moderately accurate recommendations, although accuracy varies with the topic of the question. The readability of the answers was inadequate, potentially affecting patients' ability to comprehend the information.
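
For readers unfamiliar with the readability measure named in the abstract, the Flesch Reading Ease Score is conventionally computed as 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), with higher scores indicating easier text. The sketch below is illustrative only: the abstract does not say which tool the authors used, and the function names and the vowel-group syllable counter are assumptions made here. Dedicated readability tools count syllables more carefully, so their scores may differ.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption for illustration, not the
    study's method): count vowel groups and drop a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease Score using the standard published constants."""
    # Sentences are split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    sample = "Stay active. Most back pain improves within a few weeks."
    print(round(flesch_reading_ease(sample), 2))
```

Applied to each chatbot response, a function like this would give one score per response, from which a mean and standard deviation such as the reported 50.94 (3.06) could be derived.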

Keywords

Male, Adult, Patient Education as Topic/methods, Self-Management, Innovative technology: information management, big data and artificial intelligence, Musculoskeletal: spine, Middle Aged, artificial intelligence, patient education, Health Literacy, Cross-Sectional Studies, Patient Education as Topic, Surveys and Questionnaires, Low Back Pain/therapy, Pain and pain management, Self-Management/methods, Humans, Female, Comprehension, Low Back Pain

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    24
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Top 10%
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%