Out of Context! Managing the Limitations of Context Windows in ChatGPT-4o Text Analyses

Keywords: ChatGPT, Green transition, Parliamentary speeches, Languages, Text analysis, Large language models

Mervaala, Erkki; Kousa, Ilona

Journal of Data Mining & Digital Humanities

Article . 2025 . Peer-reviewed

Data sources: Crossref

HELDA - Digital Repository of the University of Helsinki

Article . 2025 . Peer-reviewed

Data sources: HELDA - Digital Repository of the University of Helsinki

Research.fi

Article . 2025 . Peer-reviewed

Data sources: Research.fi

ZENODO

Preprint . 2025

License: CC BY

Data sources: Datacite

ZENODO

Article . 2025

License: CC BY

Data sources: Datacite

Publication: Article, Preprint, 07 Mar 2025, Finland, English
Publisher: Centre pour la Communication Scientifique Directe (CCSD)
Journal: Journal of Data Mining & Digital Humanities, volume NLP4DH (eISSN: 2416-5999)
Funded by: EC | SI-PALEO

Authors: Mervaala, Erkki; Kousa, Ilona

doi: 10.46298/jdmdh.15090, 10.5281/zenodo.14670362, 10.5281/zenodo.14670361

handle: 10138/598074

Abstract

In recent years, large language model (LLM) applications have surged in popularity, and academia has followed suit. Researchers frequently seek to automate text annotation, often a tedious task, and, to some extent, text analysis. Notably, popular LLMs such as ChatGPT have been studied as both research assistants and analysis tools, revealing several concerns regarding transparency and the nature of AI-generated content. This study assesses ChatGPT’s usability and reliability for text analysis (specifically keyword extraction and topic classification) within an “out-of-the-box” zero-shot or few-shot context, emphasizing how the size of the context window and varied text types influence the resulting analyses. Our findings indicate that text type and the order in which texts are presented both significantly affect ChatGPT’s analysis. At the same time, context-building tends to be less problematic when analyzing similar texts. However, lengthy texts and documents pose serious challenges: once the context window is exceeded, “hallucinated” results often emerge. While some of these issues stem from the core functioning of LLMs, some can be mitigated through transparent research planning.
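
As a rough illustration of the kind of planning the abstract points to, the following sketch groups texts into batches that stay under a token budget before they are sent to a model. It is not the authors' method. The function names `estimate_tokens` and `batch_by_budget` are hypothetical, and the figure of about four characters per token is only a common rule of thumb for English text; an actual tokenizer may give quite different counts, especially for other languages.

```python
from typing import Iterable, List


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumption: ~4 characters per token)."""
    return max(1, round(len(text) / chars_per_token))


def batch_by_budget(texts: Iterable[str], budget: int) -> List[List[str]]:
    """Greedily group texts, in their original order, so that each
    batch stays within `budget` estimated tokens."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for text in texts:
        cost = estimate_tokens(text)
        if cost > budget:
            # A single text that cannot fit must be split by the caller.
            raise ValueError("a single text exceeds the budget; split it first")
        if current and used + cost > budget:
            # Close the current batch before it would overflow.
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    speeches = ["a" * 40, "b" * 40, "c" * 40]  # placeholder texts
    for i, batch in enumerate(batch_by_budget(speeches, budget=25), start=1):
        print(i, [len(t) for t in batch])
```

Because the abstract reports that presentation order affects the results, the sketch preserves input order rather than reordering texts to pack batches more tightly. In practice the budget would also need to leave room for the prompt and the model's reply.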

Country

Finland

Related Organizations

Finnish Environment Institute, Finland
University of Helsinki, Finland

Related research

Order Up! Micromanaging Inconsistencies in ChatGPT-4o Text Analyses (2024), relation: IsNewVersionOf
Order Up! Micromanaging Inconsistencies in ChatGPT-4o Text Analyses (2024), relation: Continues

Impact by BIP!

selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.