Tracking News Stories in Short Messages in the Era of Infodemic

Publication: Part of book or chapter of book, Conference object. Date: 01 Jan 2022. Country: France. Language: English. Publisher: Springer International Publishing. Funded by: EC | NewsEye.

Authors: Bernard, Guillaume; Suire, Cyrille; Faucher, Cyril; Doucet, Antoine; Rosso, Paolo

doi: 10.1007/978-3-031-13643-6_2

Abstract

Tracking news stories in documents is a way to deal with the large amount of information that surrounds us every day, to reduce noise and to detect emerging topics in the news. Since the Covid-19 outbreak, the world has faced a new problem: the infodemic. News article titles are massively shared on social networks, which makes analysing trends and growing topics complex. Grouping documents into news stories reduces the number of topics to analyse and the amount of information to ingest and/or evaluate. Our study analyses news tracking when the only information available is the little provided by titles shared on social networks. In this paper, we take advantage of public datasets of news article titles to experiment with news tracking algorithms on short messages. We evaluate clustering performance when each document carries only a small amount of data. We address the document representation (sparse with TF-IDF and dense using Transformers [26]), its impact on the results, and why it is key to this type of work. We used a supervised algorithm proposed by Miranda et al. [22] and K-Means to provide evaluations for different use cases. We found that TF-IDF vectors are not always the best representation for grouping documents, and that the algorithms are sensitive to the type of representation. We therefore recommend taking both aspects into account when tracking news stories in short messages. With this paper, we share all the source code and resources we used.
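The abstract contrasts sparse TF-IDF representations and clusters documents with K-Means. The following is a minimal, self-contained Python sketch of that kind of pipeline on short titles; it is not the authors' released code. The whitespace tokenisation, the smoothed IDF formula (the one scikit-learn uses by default), the farthest-point initialisation and the toy titles are all assumptions made for illustration. It also hints at why short messages are hard: two titles that share no word have orthogonal, L2-normalised TF-IDF vectors, so their squared distance is 2 whatever their meaning.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document as a {term: weight} dict.

    Assumptions: lower-cased whitespace tokenisation, raw term counts, the
    smoothed IDF ln((1 + n) / (1 + df)) + 1, and L2 normalisation.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: in how many titles each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in a.keys() | b.keys())


def centroid(members):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in members:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(members) for t, w in total.items()}


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-Means with a deterministic farthest-point initialisation (an assumption)."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Seed the next centroid with the vector farthest from all current seeds.
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


# Hypothetical toy titles (not taken from the paper's datasets).
titles = [
    "covid vaccine trial results published",
    "vaccine trial shows covid protection",
    "stock markets fall as oil prices drop",
    "oil prices drop and markets fall",
]
labels = kmeans(tfidf_vectors(titles), k=2)
print(labels)  # → [0, 0, 1, 1]
```

Dense Transformer embeddings [26] could be fed to the same `kmeans` function by storing each embedding as an `{index: value}` dict; which encoder to use is left open in this sketch.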

Country

France

Related Organizations

Universidade Politecnica, Mozambique
Universitat Politècnica de València, Spain
Sciences Po, France
University of La Rochelle, France

Keywords

Social data, [INFO.INFO-SI] Computer Science [cs]/Social and Information Networks [cs.SI], Text Classification and Clustering, [INFO.INFO-TT] Computer Science [cs]/Document and Text Processing, [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR], News

Impact by BIP!

- Selected citations: 2 (citations derived from selected sources; an alternative to the "Influence" indicator)
- Popularity: Average (the "current" impact/attention of the article in the research community, based on the underlying citation network)
- Influence: Average (the overall/total impact of the article in the research community, based on the underlying citation network, diachronically)
- Impulse: Average (the initial momentum of the article directly after its publication, based on the underlying citation network)