
Abstract

Scholarly communication through YouTube videos has been increasing. Although Altmetric (https://altmetric.com/) provides a dataset of such references, its coverage is unclear, and it does not contain the original external links in each video. Against this background, we built and published a dataset of scholarly bibliographic references in YouTube videos using the YouTube Data API v3, targeting six domain names: "doi.org," "ncbi.nlm.nih.gov," "ieeexplore.ieee.org," "link.springer.com," "onlinelibrary.wiley.com," and "sciencedirect.com." As a result, we identified approximately 480,000 references associated with Crossref DOIs among 230,000 videos published by December 31, 2023, posted on 55,000 channels. Notably, over half of these references were not covered by the Altmetric dataset; combining the dataset constructed by the proposed method with the Altmetric dataset yields a 150% increase in the number of references compared to the Altmetric dataset alone. Regarding external links, PubMed and DOI links were prominent; however, a substantial number of direct links to publisher platforms were also observed. Most channels and videos contained external links to a single platform only, and these channels and videos were spread across the platforms. This dataset is helpful for identifying and analyzing scholarly references on YouTube. For the original paper related to this dataset, please refer to the References section.

Data Records

The dataset is in JSON Lines format, where each line is a single record. The data are split into files by DOI Registration Agency. A sample record is as follows:

    {
      "channel_id": "UCEfEi-IMiB87UsxY3765P6w",
      "video_id": "e7YmyVd4uOE",
      "is_covered_by_altmetric_com": false,
      "youtube_data_api_search": [
        {
          "query": "doi.org",
          "uri": "http://dx.doi.org/10.1145/2807442.2814654"
        }
      ],
      "doi": "10.1145/2807442.2814654",
      "doiRA": "Crossref"
    }

channel_id (String) -- Channel ID of the YouTube channel that uploaded the video.
video_id (String) -- Video ID.
is_covered_by_altmetric_com (Boolean) -- Whether this reference is covered by altmetric.com.
youtube_data_api_search (Array)
    query (String) -- The query used in search:list of the YouTube Data API v3 (https://developers.google.com/youtube/v3/docs/search/list?hl=en).
    uri (String) -- The original external link written in the description text or video title of the video.
doi (String) -- DOI corresponding to the bibliographic reference in the video.
doiRA (String) -- DOI registration agency for the DOI. We obtained this data using the WhichRA? API (https://www.doi.org/the-identifier/resources/factsheets/doi-resolution-documentation#4-which-ra).

We note that the Altmetric dataset obtained from Altmetric Explorer in this study is not included in this dataset.

References

Kikkawa, Jiro; Takaku, Masao; Yoshikane, Fuyuki: "Enhancing Identification of Scholarly Reference on YouTube: Method Development and Analysis of External Link Characteristics", Proceedings of the 28th International Conference on Theory and Practice of Digital Libraries (TPDL 2024), Ljubljana, Slovenia, Lecture Notes in Computer Science (LNCS), Vol.15178, 2024.09. (in press).

Funding

JSPS KAKENHI Grant Numbers JP22K18147, JP23K11761, and JP24K15652.
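
Usage sketch

As an illustration of the record format described under Data Records, the following is a minimal Python sketch for reading the JSON Lines records and tallying them. It is not part of the dataset or of the authors' method. It assumes that each line represents one video-DOI reference, as the sample record suggests. The function names iter_records and summarize, and the file name in the closing comment, are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON Lines row."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(records: Iterable[dict]) -> dict:
    """Count references, distinct videos and channels, Altmetric coverage, and queries."""
    videos, channels = set(), set()
    queries = Counter()
    total = 0
    not_covered = 0
    for rec in records:
        total += 1
        videos.add(rec["video_id"])
        channels.add(rec["channel_id"])
        if not rec["is_covered_by_altmetric_com"]:
            not_covered += 1
        # Each search hit records the query (domain name) and the original link.
        for hit in rec.get("youtube_data_api_search", []):
            queries[hit["query"]] += 1
    return {
        "references": total,
        "videos": len(videos),
        "channels": len(channels),
        "not_covered_by_altmetric": not_covered,
        "queries": dict(queries),
    }


# The sample record from the description above, standing in for one line of a file.
SAMPLE = (
    '{"channel_id": "UCEfEi-IMiB87UsxY3765P6w", "video_id": "e7YmyVd4uOE", '
    '"is_covered_by_altmetric_com": false, "youtube_data_api_search": '
    '[{"query": "doi.org", "uri": "http://dx.doi.org/10.1145/2807442.2814654"}], '
    '"doi": "10.1145/2807442.2814654", "doiRA": "Crossref"}'
)

summary = summarize(iter_records([SAMPLE]))
print(summary)

# For a real file (the name "Crossref.jsonl" is a placeholder, not the actual file name):
# with open("Crossref.jsonl", encoding="utf-8") as f:
#     summary = summarize(iter_records(f))
```

Reading the file line by line keeps memory use low, which matters for the larger per-agency files; only the sets of video and channel IDs are held in memory.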
