ZENODO
Dataset . 2025
License: CC BY
Data sources: Datacite

YouTube Tagging Dataset (2006-2007): 1+ Million Videos from Early YouTube

Authors: Burns, Samuel; Geisler, Gary

Abstract

This dataset contains metadata and user-generated tags from 1,092,310 YouTube videos collected between November 2, 2006 and January 28, 2007, representing one of the earliest systematic collections of YouTube user behavior data. The data was collected during YouTube's first full year of operation, before the Google acquisition was finalized and before algorithmic recommendations became dominant. It captures organic folksonomy and tagging practices of YouTube's early community.

Dataset Statistics:
- 1,092,310 unique videos
- 517,008 unique tags
- 7,530,904 video-tag pairs
- 537,246 unique users
- 87-day collection period

The dataset is provided in multiple formats for accessibility:
- SQLite database (1.1 GB)
- CSV files (603 MB total)
- JSON Lines format (603 MB total)
- Sample JSON files (1,000 records each)

Historical Significance:
This dataset captures a unique moment in social media history when users created tags organically without algorithmic suggestion. Analysis showed that 66% of tags had zero relevance to video titles, descriptions, or authors, demonstrating purely user-driven categorization behavior.

Data Collection:
Collected via YouTube's Data API v1 (now deprecated) through systematic sampling. The collection methodology and findings were published in peer-reviewed research (see Related Identifiers).

This dataset is valuable for research in:
- Information Science (folksonomy, user-generated metadata)
- Social Computing (early social media practices)
- Digital History (internet culture, YouTube's formative period)
- Computational Linguistics (natural language use in tags)
- Information Retrieval (tag-based search and discovery)

For complete documentation, schema details, and example queries, see DATA_DICTIONARY.md and README.md included in the archive.
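
As a rough illustration of working with the figures and formats described above, the following Python sketch derives two summary values from the stated statistics (the length of the collection period and the mean number of tags per video) and counts tag frequencies in a JSON Lines stream. The field names `video_id` and `tags`, the lowercasing of tags, and the inline sample records are assumptions made for illustration only; the actual schema is documented in DATA_DICTIONARY.md and may differ.

```python
import json
from collections import Counter
from datetime import date
from io import StringIO

# Figures quoted in the abstract.
UNIQUE_VIDEOS = 1_092_310
VIDEO_TAG_PAIRS = 7_530_904
START, END = date(2006, 11, 2), date(2007, 1, 28)


def collection_days(start: date, end: date) -> int:
    """Days elapsed between the first and last collection dates."""
    return (end - start).days


def mean_tags_per_video(pairs: int, videos: int) -> float:
    """Average number of video-tag pairs per unique video."""
    return pairs / videos


def count_tags(lines, tags_field="tags"):
    """Count tag frequencies in a JSON Lines stream (one JSON object per line).

    `tags_field` is a hypothetical field name assumed to hold a list of tag
    strings; check DATA_DICTIONARY.md for the real schema. Tags are
    lowercased here as an illustrative normalization choice.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts.update(tag.lower() for tag in record.get(tags_field, []))
    return counts


if __name__ == "__main__":
    print(collection_days(START, END))
    print(round(mean_tags_per_video(VIDEO_TAG_PAIRS, UNIQUE_VIDEOS), 2))
    # Two made-up records standing in for the real JSON Lines file.
    sample = StringIO(
        '{"video_id": "a1", "tags": ["funny", "cat"]}\n'
        '{"video_id": "b2", "tags": ["Cat", "music"]}\n'
    )
    print(count_tags(sample).most_common(1))
```

With the stated figures, the date difference comes to 87 days, matching the listed collection period, and the ratio of video-tag pairs to videos is roughly 6.89 tags per video. To run the tag counter on the real data, pass an open file handle for one of the JSON Lines files instead of the in-memory sample, after adjusting the field name to the documented schema.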

Keywords

video metadata, early YouTube, 2006, YouTube, social media, collaborative tagging, 2007, social computing, tagging, folksonomy, Digital Libraries, Computer Science, information retrieval, video sharing, Social Media, Information Science, user-generated content
