
Abstract Purpose: The work presented in this article focusses on improving the interpretability of probabilistic topic models created from a large collection of scientific documents that evolve over time. Methods: Several time-dependent approaches based on topic models were compared to analyse the annual evolution of latent concepts in the CORD-19 corpus: Dynamic Topic Model, Dynamic Embedded Topic Model, and BERTopic. The COVID-19 period (December 2019 - present) was then analysed in greater depth, month by month, to explore how what is written about the disease has evolved. Results: The evaluations suggest that the Dynamic Topic Model is the best choice for analysing the CORD-19 corpus. A novel topic labelling strategy is proposed for dynamic topic models to analyse the evolution of latent concepts. It incorporates content changes in both the annual evolution of the corpus and the monthly evolution of the COVID-19 disease. The generated labels are manually validated using two approaches: through the most relevant documents on the topic, and through the documents that share the most semantically similar label topics. Conclusions: The labelling enables the interpretation of topics. The novel method for dynamic topic labelling fits the content of each topic and supports the semantics of the topics.
CORD-1, Computer science, Coronaviruses, Medicine, Labeling techniques, topic modeling, COVID-19, Labelings, Dynamic topic models, Topic labelling, CORD-19, Tim, Stem-Cell Transplantation, Scientific Literature, Coronavirus, Topic labeling, Topic interpretability, Dynamic topic model, Interpretability
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 2 |
| Popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| Influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| Impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
