Can a Conversational Agent Pass Theory-of-Mind Tasks? A Case Study of ChatGPT with the Hinting, False Beliefs, and Strange Stories Paradigms

Publication: Part of book or chapter of book, Preprint, Conference object | 01 Jan 2024 | France | English | Publisher: Springer Nature Switzerland

Authors: Brunet-Gouet, Eric; Vidal, Nathan; Roux, Paul

doi: 10.1007/978-3-031-55245-8_7, 10.5281/zenodo.8009748, 10.5281/zenodo.7637475

Abstract

We investigate whether OpenAI’s recently released ChatGPT conversational agent can be examined with classical theory-of-mind paradigms. We used an indirect speech understanding task (the Hinting task), a new text version of a False Belief/False Photographs paradigm, and the Strange Stories paradigm. The Hinting task is usually used to assess individuals with autism or schizophrenia by asking them to infer hidden intentions from short conversations involving two characters. In a first experiment, ChatGPT 3.5 exhibited quite limited performance on the Hinting task, whether the original scoring or the revised rating scales were used. We introduced slightly modified versions of the Hinting task in which either cues about the presence of a communicative intention were added or a specific question about the character’s intentions was asked. Only the latter improved performance. No dissociation between the conditions was found. The Strange Stories were associated with correct performance, but we could not be sure that the algorithm had no prior knowledge of the test. In the second experiment, the most recent version of ChatGPT (4-0314) performed better on the Hinting task, although it did not match the average scores of healthy subjects. In addition, the model could solve first- and second-order False Belief tests but failed on items referring to a physical property such as object visibility or requiring more complex inferences. This work illustrates the possible application of psychological constructs and paradigms to a conversational agent of a radically new nature.

Updated version of the study (first version: Feb 13, 2023, DOI: 10.5281/zenodo.7637476) with a second experiment. Peer-reviewed and published in: Brunet-Gouet, E., Vidal, N., Roux, P. (2024). Can a Conversational Agent Pass Theory-of-Mind Tasks? A Case Study of ChatGPT with the Hinting, False Beliefs, and Strange Stories Paradigms. In: Baratgin, J., Jacquet, B., Yama, H. (eds) Human and Artificial Rationalities. HAR 2023. Lecture Notes in Computer Science, vol 14522. Springer, Cham. https://doi.org/10.1007/978-3-031-55245-8_7

Country

France

Related Organizations

Inserm, France
Assistance Publique - Hôpitaux de Paris, France
Centre Hospitalier de Versailles, France
Institut National de la Santé et de la Recherche Médicale, France
University of Paris-Saclay, France

Keywords

ChatGPT, False Beliefs, theory-of-mind, [SCCO] Cognitive science, [INFO] Computer Science [cs], indirect speech

Impact by BIP!

- Selected citations: 11. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.