
This repository contains the YouTube data used in the project CLINT (https://clint-project.github.io/) during the task that developed prediction models for content classification, link generation, and analysis of polarization evolution.

Summary

This repository contains a labelled dataset derived from the Recfluence project (https://recfluence.net/), where YouTube channels have been annotated according to their political bias (Left / Center / Right) and content type tags, such as Mainstream News or Conspiracy.

We focused on two subnetworks with opposing tags:

- Partisan Network: Videos from channels tagged as either partisan left or partisan right.
- Content-Type Network: Videos from channels tagged as mainstream news or conspiracy.

Data for both networks was collected using the YouTube Data API, following the official documentation and adhering to quota limitations. We queried videos from the available channels related to climate change and labelled each video according to the channel’s evaluation.

Data description

The dataset (data/video_data.csv) contains the following columns:

- network: Indicates which subnetwork the video belongs to. There are two subnetworks in this dataset:
  - Partisan Network: videos from channels tagged as either partisan left or partisan right.
  - Content-Type Network: videos from channels tagged as mainstream news or conspiracy.
- video_id: The unique identifier for the YouTube video, as provided by the YouTube platform. This can be used to access the video directly on YouTube.
- node_label: The label assigned to the video based on the channel it belongs to. For the partisan network, this is either left or right. For the content-type network, this is either mainstream or conspiracy.
- publishedAt: The publication date and time of the video, as reported by YouTube. The format follows the standard ISO 8601 timestamp.

Statistics overview

| network | mean_viewCount | mean_likeCount | mean_commentCount | videoCount |
| --- | --- | --- | --- | --- |
| ideology | 164695 | 4174 | 972 | 484 |
| news | 112120 | 2013 | 367 | 373 |
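
As a rough illustration of how the columns and the statistics overview relate, the sketch below groups rows by network and computes per-network means with the Python standard library. It rests on several assumptions that are not confirmed by the description above: that the CSV also carries viewCount, likeCount, and commentCount columns (inferred only from the statistics headers), that those values are present and integer-valued in every row, and that the network column uses the values ideology and news. The sample rows and video IDs are hypothetical, and the helper names summarize, video_url, and parse_published are illustrative only.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows, NOT taken from the dataset. The three count
# columns are an assumption inferred from the statistics table headers.
SAMPLE_CSV = """network,video_id,node_label,publishedAt,viewCount,likeCount,commentCount
ideology,vid_a,left,2021-03-01T12:00:00Z,100,10,1
ideology,vid_b,right,2021-04-01T08:30:00Z,300,30,3
news,vid_c,mainstream,2020-01-15T00:00:00Z,50,5,0
"""

COUNT_COLUMNS = ("viewCount", "likeCount", "commentCount")


def summarize(rows):
    """Return per-network mean counts plus the number of videos."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["network"]].append(row)

    summary = {}
    for network, items in groups.items():
        stats = {
            f"mean_{col}": mean(int(r[col]) for r in items)
            for col in COUNT_COLUMNS
        }
        stats["videoCount"] = len(items)
        summary[network] = stats
    return summary


def video_url(video_id):
    """Build the standard YouTube watch URL for a video_id."""
    return f"https://www.youtube.com/watch?v={video_id}"


def parse_published(timestamp):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is mapped to UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


# For the real file, replace io.StringIO(SAMPLE_CSV) with
# open("data/video_data.csv", newline="", encoding="utf-8").
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = summarize(rows)
```

If some rows in the real file lack a count (for example, videos with hidden likes), the int() conversion would need a guard, so treat this as a starting point rather than a reproduction of the published figures.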
