ABSTRACT
---------------
Rumble has emerged as a prominent platform hosting controversial figures facing restrictions on YouTube. Despite this, the academic community’s engagement with Rumble has been minimal. To help researchers address this gap, we introduce a comprehensive dataset of about 6.7K podcast videos from August 2020 to December 2022, amounting to over 5.6K hours of content. Besides the metadata of these podcast videos, we provide speech-to-text transcriptions for future analysis. We also provide speaker diarization information, a collection of ~250K unique representative images from podcast videos, and face embeddings of ~400K extracted faces. With the rising influence of podcasts and populist figures, this dataset provides a rich resource for identifying challenges in cyber social threats in a relatively underexplored space.

Rumble platform: http://rumble.com/
Link to paper: https://workshop-proceedings.icwsm.org/abstract.php?id=2024_07
License: CC BY-NC-SA 4.0

Dataset Summary

iDRAMA-rumble-2024 is a large-scale dataset of 6,735 podcast videos from Rumble, an alternative YouTube-like platform. Using state-of-the-art models, we extract information across three modalities: 1) text, 2) audio, and 3) video. We detail the methodology for extracting information from podcast videos in the paper and release a first-of-its-kind dataset including data from different modalities:

- Metadata: Details about podcast videos, e.g., channel name, video name, video description, and more.
- Text: Transcription (i.e., speech-to-text) of podcast videos.
- Audio: Speaker diarization information providing speaker detection over time for each video.
- Video: Sampled representative video frames from each video, totaling ~250K images. We also detect ~400K non-unique faces from these images and release face embeddings.

Repository links

- Zenodo: On Zenodo, we provide the dataset in JSON format for all modalities, along with the representative images in compressed files.
- GitHub: The main repository of this dataset, where we provide code snippets to get started with this dataset. Link here: https://github.com/idramalab/iDRAMA-rumble-2024
- Hugging Face: On Hugging Face, we provide a dataset that can be accessed through the Hugging Face APIs in `parquet` format. Link here: https://hf.co/datasets/iDRAMALab/iDRAMA-rumble-2024

Dataset Info

The dataset is organized by modality: transcripts, representative images, speaker diarization, and face embeddings.

| Config | Data-points |
| --- | --- |
| Podcast videos | 6,735 |
| Representative images | 252,387 |
| Face embeddings | 399,333 |
| Transcripts & Speaker diarization | 6,735 |

Zenodo Dataset Files

| Info | #Files | File names |
| --- | --- | --- |
| Metadata | 1 | iDRAMA-rumble-2024-metadata.ndjson |
| Speaker diarization | 1 | iDRAMA-rumble-2024-speaker-dirization.zip |
| Face embeddings | 1 | iDRAMA-rumble-2024-face-embeddings.ndjson |
| Representative images | 5 | iDRAMA-rumble-2024-repr-images-set1.tar.gz, iDRAMA-rumble-2024-repr-images-set2.tar.gz, iDRAMA-rumble-2024-repr-images-set3.tar.gz, iDRAMA-rumble-2024-repr-images-set4.tar.gz, iDRAMA-rumble-2024-repr-images-set5.tar.gz |
| Transcription Lite (minimal information) | 3 | iDRAMA-rumble-2024-transcription-lite_part_1.ndjson, iDRAMA-rumble-2024-transcription-lite_part_2.ndjson, iDRAMA-rumble-2024-transcription-lite_part_3.ndjson |
| Transcription | 3 | iDRAMA-rumble-2024-transcription_part_1.ndjson, iDRAMA-rumble-2024-transcription_part_2.ndjson, iDRAMA-rumble-2024-transcription_part_3.ndjson |

Authorship

This dataset is published in the "Workshop Proceedings of the 18th International AAAI Conference on Web and Social Media", hosted in Buffalo, NY, USA.

Academic Organization: iDRAMA Lab
Authors: Utkucan Balci, Jay Patel, Berkan Balci, Jeremy Blackburn
Affiliation: Binghamton University, Middle East Technical University

Licensing

This dataset is free to use under the terms of the non-commercial license CC BY-NC-SA 4.0.
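As a starting point for the `.ndjson` files listed among the Zenodo dataset files, the following is a minimal sketch of reading newline-delimited JSON with the Python standard library. It is not the authors' official loader (see the GitHub repository for their snippets). The helper names `iter_ndjson` and `summarize_ndjson` are illustrative, and the sketch makes no assumption about field names: it only counts records and lists the top-level keys it finds.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_ndjson(path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of an NDJSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def summarize_ndjson(path):
    """Return (record count, sorted list of top-level keys seen in any record)."""
    count = 0
    keys = set()
    for record in iter_ndjson(path):
        count += 1
        keys.update(record)  # adds the record's top-level keys
    return count, sorted(keys)


if __name__ == "__main__":
    # File name taken from the Zenodo file list; assumed to be downloaded
    # into the current working directory.
    metadata_path = Path("iDRAMA-rumble-2024-metadata.ndjson")
    if metadata_path.exists():
        count, keys = summarize_ndjson(metadata_path)
        print(f"{count} records; top-level fields: {keys}")
```

Streaming line by line keeps memory use low, which may matter for the multi-part transcription files; the same helper can be pointed at each part in turn.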
Citation

@article{balci2024idrama,
  title = {iDRAMA-rumble-2024: A Dataset of Podcasts from Rumble Spanning 2020 to 2022},
  author = {Balci, Utkucan and Patel, Jay and Balci, Berkan and Blackburn, Jeremy},
  year = {2024},
  journal = {Workshop Proceedings of the 18th International AAAI Conference on Web and Social Media}
}