
This dataset contains metadata and user-generated tags from 1,092,310 YouTube videos collected between November 2, 2006 and January 28, 2007, making it one of the earliest systematic collections of YouTube user behavior data. The collection took place during YouTube's first full year of operation, beginning before the Google acquisition was finalized and before algorithmic recommendations became dominant. It captures the organic folksonomy and tagging practices of YouTube's early community.

Dataset Statistics:
- 1,092,310 unique videos
- 517,008 unique tags
- 7,530,904 video-tag pairs
- 537,246 unique users
- 87-day collection period

The dataset is provided in multiple formats for accessibility:
- SQLite database (1.1 GB)
- CSV files (603 MB total)
- JSON Lines format (603 MB total)
- Sample JSON files (1,000 records each)

Historical Significance:
This dataset captures a unique moment in social media history, when users created tags organically without algorithmic suggestions. Analysis showed that 66% of tags had no relevance to video titles, descriptions, or authors, demonstrating purely user-driven categorization behavior.

Data Collection:
The data was collected via YouTube's Data API v1 (now deprecated) through systematic sampling. The collection methodology and findings were published in peer-reviewed research (see Related Identifiers).

This dataset is valuable for research in:
- Information Science (folksonomy, user-generated metadata)
- Social Computing (early social media practices)
- Digital History (internet culture, YouTube's formative period)
- Computational Linguistics (natural language use in tags)
- Information Retrieval (tag-based search and discovery)

For complete documentation, schema details, and example queries, see DATA_DICTIONARY.md and README.md included in the archive.
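As a rough illustration of working with the JSON Lines format listed above, the following Python sketch counts tag frequencies across records. The field names `video_id` and `tags`, the assumption that tags are stored as a list of strings, and the sample records are all hypothetical; the actual schema is the one documented in DATA_DICTIONARY.md and may differ.

```python
import io
import json
from collections import Counter


def count_tags(lines, tag_field="tags"):
    """Count how often each tag occurs across JSON Lines records.

    `tag_field` is a hypothetical field name; check DATA_DICTIONARY.md
    for the real one before using this on the archive.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Lowercase tags so "Music" and "music" are counted together.
        counts.update(tag.lower() for tag in record.get(tag_field, []))
    return counts


# Made-up records for illustration only; not taken from the dataset.
sample = io.StringIO(
    '{"video_id": "a1", "tags": ["music", "live"]}\n'
    '{"video_id": "b2", "tags": ["Music", "funny"]}\n'
)
tag_counts = count_tags(sample)
print(tag_counts.most_common(1))  # [('music', 2)]
```

Because `count_tags` accepts any iterable of lines, an open file handle for one of the JSON Lines files can be passed in place of the in-memory sample, so the file is read one record at a time rather than loaded whole.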
video metadata, early YouTube, 2006, YouTube, social media, collaborative tagging, 2007, social computing, tagging, folksonomy, Digital Libraries, Computer Science, information retrieval, video sharing, Social Media, Information Science, user-generated content
