
Conversational learning systems offer new opportunities to examine learning processes through chat log data. Constructs such as persistence, self-efficacy, interest, perceived challenge, and prior knowledge are known predictors of student performance but are challenging to detect at scale using traditional methods. This study explores the use of Large Language Models (LLMs) to automatically code indicators of these constructs from student chat logs collected through a conversation-based assessment (CBA) for middle school mathematics. Indicators included observable behaviors such as students' expressions of challenge, help-seeking, goal-setting, and self-regulatory strategies evident in their conversational interactions within the CBA. We evaluated multiple configurations of GPT-4o, varying temperature settings (0, 0.3, 0.7, 1) and model types (mini vs. regular), against human expert coders. The dataset comprised over 10,000 student turns collected from 107 middle school students classified as English learners as they interacted with the CBA. Reliability was assessed within and between LLM configurations and humans. Results reveal systematic patterns: constructs with moderate theoretical coherence benefited from higher temperatures, while well-defined constructs required deterministic settings. Self-efficacy showed the highest human-LLM alignment. These findings illustrate the challenges of measuring complex psychological constructs and highlight the promise of human-LLM collaboration to enhance qualitative coding efficiency and validity in educational research. Supplemental materials are available online here: https://doi.org/10.17605/osf.io/s85ck.
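The human-LLM reliability comparison described above can be sketched in code. The snippet below is a minimal, illustrative implementation of Cohen's kappa, a standard chance-corrected agreement statistic for two coders; the example codes are hypothetical and do not reflect the study's data.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders on the same items."""
    assert len(codes_a) == len(codes_b) and codes_a
    n = len(codes_a)
    # Observed proportion of items on which the two coders agree
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement if each coder assigned labels independently
    # according to their own marginal label frequencies
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    labels = set(codes_a) | set(codes_b)
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical binary codes (1 = construct indicator present)
# for ten student turns, one human coder vs. one LLM configuration
human = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
llm   = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]
print(round(cohens_kappa(human, llm), 2))  # prints 0.6
```

In practice the same computation would be repeated per construct and per model configuration (temperature and model type) to surface patterns like those reported for self-efficacy.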
educational data mining, conversation-based assessment (CBA), human-LLM collaboration, construct validity, model configuration, qualitative analysis, persistence, temperature settings, construct extraction, large language models (LLMs)
