
The Brazilian YouTube Regional Music (BYRM) dataset was created to support research on the automatic classification of Brazilian regional music genres using machine learning models and other computational approaches. It includes tracks from ten culturally diverse Brazilian genres: axé, rock brasileiro, toada, carimbó, samba, pagode, xote gaúcho, vaneira, sertanejo, and forró.

Due to copyright restrictions, the original audio tracks are not included in this release. Instead, this dataset provides:

- BYRM_specs_v1.zip: Mel-spectrogram images (PNG) extracted from 3s, 5s, and 10s segments within different temporal excerpts of each song. Data is organized by genre, excerpt (e.g., 0–30s, 90–120s), and partition (train/val/test).
- BYRM_features_v1.zip: Acoustic feature vectors extracted using the Librosa library, including MFCC, chroma, spectral centroid, rolloff, zero-crossing rate, bandwidth, and tempo. Each CSV file corresponds to a segment configuration.
- metadata_csv: A set of 10 CSV files containing metadata for the original YouTube tracks used to construct the dataset. Each file provides information such as video title, YouTube ID, and channel name.

This dataset was developed as part of the Master's dissertation titled "Aprendizagem Profunda com Redes de Transformadores de Visão Computacional para Reconhecimento de Gêneros Musicais" by Victória de Souza Guimarães, supervised by Prof. Dr. Rosiane de Freitas, Universidade Federal do Amazonas (UFAM), 2025.
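As a usage illustration, the genre / excerpt / partition organization described for BYRM_specs_v1.zip could be indexed with a few lines of Python. This is only a sketch under stated assumptions: the description does not give the exact folder nesting inside the archive, so the order genre/excerpt/partition used here is hypothetical, as are the names `SpecItem`, `index_spectrograms`, and `count_by_partition`. The archive may also contain an extra level for segment length (3s, 5s, 10s); adjust the path parsing to the layout actually found after unzipping.

```python
from collections import Counter
from pathlib import Path
from typing import NamedTuple


class SpecItem(NamedTuple):
    """One spectrogram image plus the labels read from its path."""
    genre: str
    excerpt: str
    partition: str
    path: Path


def index_spectrograms(root):
    """Collect PNG files under an ASSUMED <genre>/<excerpt>/<partition>/ layout.

    The nesting order is an assumption, not taken from the dataset itself.
    Files with fewer than three parent folders below `root` are skipped.
    """
    root = Path(root)
    items = []
    for png in sorted(root.rglob("*.png")):
        parts = png.relative_to(root).parts
        if len(parts) < 4:
            continue  # path does not match the assumed layout
        genre, excerpt, partition = parts[0], parts[1], parts[2]
        items.append(SpecItem(genre, excerpt, partition, png))
    return items


def count_by_partition(items):
    """Count images per (genre, partition) pair, e.g. to check class balance."""
    return Counter((item.genre, item.partition) for item in items)
```

Calling `count_by_partition(index_spectrograms("BYRM_specs_v1"))` on the unzipped folder (the folder name is likewise an assumption) would give a quick view of how many images each genre contributes to the train, val, and test partitions.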
Research based on this dataset has resulted in the following academic contributions:

- Accepted: “Segment-based evaluation of music genre classification models with the BYRM Dataset” - KDMiLe 2025
- Accepted: “Understanding genre similarity in Brazilian music through Vision Transformer embeddings” - SBCM 2025

Acknowledgments

This research was supported by FAPEAM (Fundação de Amparo à Pesquisa do Estado do Amazonas), CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), and CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior). We gratefully acknowledge the support of these institutions for making this research possible.

Citation

If you use this dataset in your work, please cite both the dataset and the associated peer-reviewed publication:

Dataset: Guimarães, Victória de Souza; de Freitas, Rosiane. BYRM: Brazilian YouTube Regional Music Dataset. Zenodo, 2025. https://doi.org/10.5281/zenodo.16617888

Associated publication: Guimarães, Victória; Kienen, João Gustavo; de Freitas, Rosiane. Segment-based evaluation of music genre classification models with the BYRM Dataset. In: Proceedings of the 13th Symposium on Knowledge Discovery, Mining and Learning (KDMiLe), 2025, Fortaleza, Brazil. Porto Alegre: Sociedade Brasileira de Computação, 2025. p. 97–104. https://doi.org/10.5753/kdmile.2025.247582

BibTeX

@dataset{guimaraes2025byrm,
  author    = {Guimar{\~a}es, Vict{\'o}ria de Souza and de Freitas, Rosiane},
  title     = {BYRM: Brazilian YouTube Regional Music Dataset},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.16617888},
  url       = {https://doi.org/10.5281/zenodo.16617888}
}

@inproceedings{guimaraes2025segment,
  author    = {Guimar{\~a}es, Vict{\'o}ria and Kienen, Jo{\~a}o Gustavo and de Freitas, Rosiane},
  title     = {Segment-based evaluation of music genre classification models with the BYRM Dataset},
  booktitle = {Proceedings of the 13th Symposium on Knowledge Discovery, Mining and Learning (KDMiLe)},
  pages     = {97--104},
  year      = {2025},
  publisher = {Sociedade Brasileira de Computa{\c{c}}{\~a}o},
  address   = {Porto Alegre, Brazil},
  doi       = {10.5753/kdmile.2025.247582},
  url       = {https://doi.org/10.5753/kdmile.2025.247582}
}

Mel-spectrogram, Acoustic features, Deep Learning, Music genre classification, Music information retrieval, BYRM, Regional Music
