Powered by OpenAIRE graph
Padua Thesis and Dis...

An Experimental Assessment of the Efficacy of BERTopic

Authors: ZAHIR, FARIN BINTA

Abstract

Topic modelling is an unsupervised machine-learning technique for discovering abstract topics in a large collection of documents. It helps to organize, understand, and summarize large volumes of text by uncovering the latent topics that vary across the documents of a corpus. Recently developed topic modelling algorithms, such as BERTopic, have attracted significant and growing interest from researchers. This research examines the efficacy of these algorithms and emphasizes the technical skills needed to investigate them meaningfully. Efficient, fast, and scalable implementations are essential for handling very large text corpora, and proficiency in data analysis and data visualization is needed to compare topic modelling approaches meaningfully. Using Python as the programming language provides the flexibility and robustness required for implementing the algorithms, while a solid foundation in statistical modelling and mathematics is indispensable for accurate calculation and prediction. The main contribution of the study is to introduce NMI (Normalized Mutual Information) and modularity, two evaluation metrics for assessing the quality of the clusters or topics generated by clustering algorithms, including those used in BERTopic. In short, this research both explores the effectiveness of state-of-the-art topic modelling algorithms and underscores the technical expertise in data analysis, data visualization, Python programming, and statistical modelling that comprehensive comparisons in this field require.
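The abstract names the two metrics but does not define them, so the following is only a minimal sketch of how they are commonly computed, not the thesis's own implementation. It assumes NMI is normalized by the arithmetic mean of the two entropies, NMI = I(A;B) / ((H(A) + H(B)) / 2), and that modularity is Newman's Q on an undirected, unweighted graph without self-loops. The function names (`entropy`, `nmi`, `modularity`) and the toy inputs are hypothetical. How documents or topics would be turned into a graph for modularity is not stated in the abstract and is left open here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized Mutual Information between two labelings of the same items.

    Assumption: arithmetic-mean normalization; other normalizations exist.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information: sum over co-occurring label pairs.
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    denom = (entropy(labels_a) + entropy(labels_b)) / 2
    # Both labelings put everything in one cluster: treat as identical.
    return mi / denom if denom > 0 else 1.0


def modularity(edges, community):
    """Newman modularity Q of a partition.

    Assumption: `edges` is a list of (u, v) pairs of an undirected,
    unweighted graph without self-loops; `community` maps node -> label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = Counter()
    intra = Counter()  # number of edges inside each community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    deg_sum = Counter()  # total degree of each community
    for node, k in degree.items():
        deg_sum[community[node]] += k
    # Q = sum_c [ L_c / m - (d_c / (2m))^2 ]
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


# Toy data (hypothetical): reference labels vs. predicted topic labels.
reference = [0, 0, 1, 1]
predicted = [1, 1, 0, 0]  # same grouping, different label names
score = nmi(reference, predicted)

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, community)
```

NMI compares a clustering against a reference labeling and ignores how the clusters are named, whereas modularity needs no reference labels and scores a partition only by how densely its groups are connected internally.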

Country
Italy
Related Organizations
  • BIP!
    Impact by BIP!
    selected citations: 0
    These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    popularity: Average
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    influence: Average
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    impulse: Average
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
Green