
ABSTRACT The assessment of creative achievement (CA) can be cumbersome as participants are typically asked to respond to long lists of possible accomplishments that may still miss their very specific achievements. A bottom‐up alternative is to let participants openly report their most significant CAs, which, however, requires more complex scoring, for example by human raters. In this study, we investigated whether language models (LMs) can provide an efficient and valid scoring of such open‐ended responses. Across two data sets, participants described their three most significant CAs. These responses were rated by human judges and by three LMs (Llama 3.1–8B, Llama 3.3–70B, GPT‐4o) using zero‐shot prompting. Correlations between human and LM ratings were consistently high (r = 0.53–0.80), and criterion validity evidence of LM‐based scores was largely on par with rater‐based scores. In addition, we examined zero‐shot domain classification of CAs into nine creative domains (e.g., music, visual arts). Classification accuracy was 62.3% overall; closer inspection suggested that automated classification has the potential to unveil conceptual overlaps between domains and to identify CAs involving multiple domains. Taken together, automated scoring of CA via LMs represents a promising and efficient alternative to traditional CA measures by approximating human ratings and providing useful domain classifications.
