Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling

Publication: Article, Conference object, Preprint
01 May 2024
Embargo end date: 01 Jan 2024
United Kingdom
Publisher: ELRA
Journal: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Funded by: EC | vera.ai

Authors: Mu, Y.; Dong, C.; Bontcheva, K.; Song, X.

doi: 10.63317/2x489fw7wi5m, 10.48550/arxiv.2403.16248, 10.5281/zenodo.11952273, 10.5281/zenodo.11952270

arXiv: 2403.16248


Abstract

Topic modelling, as a well-established unsupervised technique, has found extensive use in automatically detecting significant topics within a corpus of documents. However, classic topic modelling approaches (e.g., LDA) have certain drawbacks, such as the lack of semantic understanding and the presence of overlapping topics. In this work, we investigate the untapped potential of large language models (LLMs) as an alternative for uncovering the underlying topics within extensive text corpora. To this end, we introduce a framework that prompts LLMs to generate topics from a given set of documents and establish evaluation protocols to assess the clustering efficacy of LLMs. Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics. Through in-depth experiments and evaluation, we summarise the advantages and constraints of employing LLMs in topic extraction.

Accepted at LREC-COLING 2024
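
The abstract describes prompting an LLM to generate topics from a set of documents and then refining and merging those topics. The following is only a minimal sketch of that general idea, not the authors' implementation: the prompt wording, the function names, the `aliases` merge map, and the `placeholder_llm` stand-in (which returns a canned response instead of calling a real model) are all assumptions made for illustration.

```python
import re
from collections import Counter


def build_topic_prompt(documents, max_topics=5):
    """Assemble a plain-text prompt asking a model to name topics for one batch."""
    numbered = "\n".join(f"{i}. {doc}" for i, doc in enumerate(documents, start=1))
    return (
        f"Read the documents below and list at most {max_topics} topic titles, "
        "one per line, with no extra commentary.\n\n"
        f"Documents:\n{numbered}\n\nTopics:"
    )


def parse_topics(response):
    """Turn a model response into a list of topic titles, one per non-empty line."""
    topics = []
    for line in response.splitlines():
        # Drop leading bullets or numbering such as "- ", "* ", "1. ", "2) ".
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if cleaned:
            topics.append(cleaned)
    return topics


def merge_topics(topic_lists, aliases):
    """Count topics across batches, folding near-duplicates via a hand-written alias map."""
    counts = Counter()
    for topics in topic_lists:
        for topic in topics:
            key = topic.lower()
            counts[aliases.get(key, key)] += 1
    return counts


def placeholder_llm(prompt):
    """Hypothetical stand-in for a real model call; always returns the same text."""
    return "1. Vaccine safety\n- Vaccine side effects\n* Public health policy"


# Two small batches of made-up documents, processed independently and then merged.
batches = [["doc a", "doc b"], ["doc c"]]
topic_lists = [parse_topics(placeholder_llm(build_topic_prompt(b))) for b in batches]
aliases = {"vaccine side effects": "vaccine safety"}  # assumed merge rule
topic_counts = merge_topics(topic_lists, aliases)
print(topic_counts.most_common())
```

In this sketch the merge step is a fixed lookup table; the abstract instead describes the LLM itself following human guidelines to refine and merge topics, which would replace `aliases` with a second prompt.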

Country

United Kingdom

Related Organizations

Department of Computer Science
Spain
White Rose Consortium: University of Leeds; University of Sheffield; University of York
United Kingdom
University of Sheffield
United Kingdom
THE UNIVERSITY OF SHEFFIELD
The University of Sheffield
Greece

Keywords

FOS: Computer and information sciences, Large Language Models, Computer Science - Computation and Language, Topic Modelling, Evaluation Protocol, LLM-driven Topic Extraction, Computation and Language (cs.CL)

Impact by BIP!

	selected citations	0
	popularity	Average
	influence	Average
	impulse	Average


Green

Funded by

EC | vera.ai