
This dataset contains processed data extracted from Telegram channels using pytopicgram, covering the period from 2019-12-01 to 2024-08-31. It includes anonymized channel information, sampled messages, and topics identified using BERTopic. The data has been anonymized and structured for ease of analysis.

The dataset comprises two main CSV files:

1. Topics (topics.csv)

This file contains topics extracted from the full dataset using BERTopic. Each topic is described by a concise text generated by OpenAI o1.

| Column Name | Description |
|---|---|
| Topic | Numeric identifier for each topic. -1 is the generic topic for messages that could not be assigned to any topic. |
| Name | Human-readable name summarizing the topic. |
| Representation | List of representative keywords for the topic. |
| Description | Concise description of the topic generated by OpenAI o1. |

2. Messages (messages.csv)

This file contains a 25% sample of messages from the Telegram channels, stratified by the topics column.

| Column Name | Description |
|---|---|
| channel_id | Anonymized identifier for the Telegram channel. |
| week_year | Week and year when the message was posted (format: week_year). |
| media_type | Type of media included in the message (txt, img, video, audio, doc, web). |
| reach | Number of users reached by the message. |
| virality | Virality score of the message. |
| is_viral | Boolean indicating whether the message is considered viral. |
| topics | Topic identifier associated with the message. |
| probs | Probability scores for the topic assignment. |
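As an illustration of how the two files relate, the sketch below reads both with Python's standard `csv` module, maps each message's `topics` value to a topic `Name`, and computes the share of messages flagged `is_viral` per topic. It is a minimal sketch under stated assumptions: the inline rows are invented stand-ins for the real files (only the columns the example needs are included), it assumes `topics` holds the same integer identifiers as `Topic` in topics.csv, and it assumes `is_viral` is serialized as `True`/`False`. Check these against the actual files before relying on it.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL excerpts standing in for topics.csv and messages.csv.
# Column names follow the dataset description; all values are invented,
# and unused message columns (week_year, media_type, ...) are omitted.
TOPICS_CSV = """Topic,Name,Representation,Description
-1,-1_generic,"['news', 'today']",Messages not assigned to any topic
0,0_vaccine_covid,"['vaccine', 'covid']",Posts about COVID-19 vaccines
"""

MESSAGES_CSV = """channel_id,reach,virality,is_viral,topics,probs
c1,1200,0.8,True,0,0.91
c2,300,0.1,False,0,0.75
c1,150,0.0,False,-1,0.0
"""


def load_topic_names(text):
    """Map each numeric topic id (Topic column) to its Name."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row["Topic"]): row["Name"] for row in reader}


def viral_share_by_topic(messages_text, topic_names):
    """Return {topic name: fraction of messages whose is_viral is true}."""
    totals = defaultdict(int)
    virals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(messages_text)):
        # Assumption: the topics column holds a single integer topic id.
        name = topic_names.get(int(row["topics"]), "unknown")
        totals[name] += 1
        # Assumption: booleans are written as "True"/"False".
        if row["is_viral"].strip().lower() == "true":
            virals[name] += 1
    return {name: virals[name] / totals[name] for name in totals}


topic_names = load_topic_names(TOPICS_CSV)
shares = viral_share_by_topic(MESSAGES_CSV, topic_names)
print(shares)  # {'0_vaccine_covid': 0.5, '-1_generic': 0.0}
```

For the real files, replace the `io.StringIO(...)` sources with `open("topics.csv", newline="", encoding="utf-8")` and the equivalent for messages.csv. Keep in mind that any counts computed this way describe the 25% sample, not the full message corpus.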
Data Science, Disinformation, Social Media
