Talk to your data: Introducing text embedding similarity analysis (TESA) in psychological research

Keywords: Psychology/methods, Humans, semantics, qualitative research

Juul Vossen; Evy Kuijpers; Joeri Hofmans


Behavior Research Methods

Article . 2025 . Peer-reviewed

License: Springer Nature TDM

Data sources: Crossref

Vrije Universiteit Brussel Research Portal

Article . 2025

Data sources: Vrije Universiteit Brussel Research Portal


Publication: Article, 28 May 2025, English
Publisher: Springer Science and Business Media LLC
Journal: Behavior Research Methods, volume 57 (eISSN: 1554-3528)


doi: 10.3758/s13428-025-02698-z


Abstract

While qualitative research plays a vital role in understanding complex phenomena, it lends itself poorly to testing formal hypotheses because statistical models cannot be fitted directly to text data. Approaches that are traditionally used to quantify text data (e.g., content analysis) are generally time-consuming, prone to researcher bias, and neglect a substantial amount of potentially important semantic context. Although novel approaches have been proposed, these typically require large amounts of text data and tend to be inductive in nature. To enable researchers to ask both hypothesis-based and open-ended questions of their text data, the current study proposes a novel retrieval-augmented generation (RAG)-based approach, called text embedding similarity analysis (TESA), that transforms a hypothesis into two specific search terms: a population (or sample) and a variable of interest. Using pretrained large language models (LLMs), we extract the semantic embeddings of the search terms and the text data and use cosine similarity to match the search terms against the text. This allows hypothesis testing by assessing how well the distribution of similarity scores for a variable of interest aligns with the expectation for the population.
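The similarity step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cosine_similarity`, `similarity_scores`) are hypothetical, and the three-dimensional vectors are made-up stand-ins for the embeddings that a pretrained LLM would produce. The population matching and the hypothesis test itself are not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_scores(term_embedding, text_embeddings):
    """Score every text embedding against one search-term embedding."""
    return [cosine_similarity(term_embedding, e) for e in text_embeddings]


# Assumption: toy vectors invented for illustration, not real LLM embeddings.
text_embeddings = [
    [1.0, 0.0, 0.0],  # text pointing the same way as the search term
    [1.0, 1.0, 0.0],  # text partly aligned with the search term
    [0.0, 0.0, 1.0],  # text orthogonal to the search term
]
variable_embedding = [1.0, 0.0, 0.0]  # hypothetical "variable of interest" term

scores = similarity_scores(variable_embedding, text_embeddings)
print(scores)
```

In a real analysis the vectors would come from an embedding model, and the resulting list of scores is the kind of distribution the abstract describes comparing against an expectation for the population.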

Related Organizations

KU Leuven
Belgium
Vrije Universiteit Brussel
Belgium


Impact by BIP!

Selected citations: 0 (citations derived from selected sources; an alternative to the "Influence" indicator)
Popularity: Average (the current impact or attention of the article in the research community at large, based on the underlying citation network)
Influence: Average (the overall impact of the article in the research community at large, based on the underlying citation network, diachronically)
Impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)


Related to Research communities

EUTOPIA Open Research Portal
