Automated Scoring of Creative Achievement

Publication: Article, Conference object. Date: 21 Oct 2025. Language: English. Publisher: Wiley. Journal: The Journal of Creative Behavior, volume 60 (ISSN: 0022-0175, eISSN: 2162-6057)

Authors: Noah Meinzer; Janika Saretzki; Simon M. Ceh; Mathias Benedek

doi: 10.1002/jocb.70104, 10.31234/osf.io/rk6yp_v1, 10.13140/rg.2.2.20538.25281


Abstract

The assessment of creative achievement (CA) can be cumbersome as participants are typically asked to respond to long lists of possible accomplishments that may still miss their very specific achievements. A bottom‐up alternative is to let participants openly report their most significant CAs, which, however, involves more complex scoring such as via human ratings. In this study, we investigated whether language models (LMs) can provide an efficient and valid scoring of such open‐ended responses. Across two data sets, participants described their three most significant CAs. These responses were rated by human judges and by three LMs (Llama 3.1–8B, Llama 3.3–70B, GPT‐4o) using zero‐shot prompting. Correlations between human and LM ratings were consistently high (r = 0.53–0.80), and criterion validity evidence of LM‐based scores was largely on par with rater‐based scores. In addition, we examined zero‐shot domain classification of CAs into nine creative domains (e.g., music, visual arts). Classification accuracy was 62.3% overall; closer inspection suggested that automated classification has the potential to unveil conceptual overlaps between domains and to identify CAs involving multiple domains. Taken together, automated scoring of CA via LMs represents a promising and efficient alternative to traditional CA measures by approximating human ratings and providing useful domain classifications.
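
The two agreement statistics named in the abstract, a correlation between human and LM ratings and an overall classification accuracy, can be illustrated with a minimal sketch. It assumes a Pearson correlation and a simple proportion-correct accuracy; the abstract does not specify the authors' exact computation. All ratings and domain labels below are invented for illustration and are not data from the study, and the function names `pearson_r` and `accuracy` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def accuracy(predicted, reference):
    """Share of items whose predicted label equals the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty label lists")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# Hypothetical example data (NOT from the study).
human_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
lm_ratings = [1.5, 1.5, 3.5, 3.5, 5.0]
human_domains = ["music", "visual arts", "music", "visual arts"]
lm_domains = ["music", "visual arts", "visual arts", "visual arts"]

r = pearson_r(human_ratings, lm_ratings)
acc = accuracy(lm_domains, human_domains)
print(f"r = {r:.2f}, accuracy = {acc:.1%}")
```

In practice, the LM ratings and domain labels would come from a zero-shot prompt applied to each open-ended response; that step is omitted here because the prompts and model settings are not given in the abstract.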

Related Organizations

University of Graz, Austria
Ludwig-Maximilians-Universität München, Germany
Munich University of Applied Sciences, Germany

Impact by BIP!

	selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
