
doi: 10.37943/17nzkg3418
This article analyzes a Kazakh music dataset consisting of 800 audio tracks equally distributed across 5 genres. The purpose of the research is to classify music genres using two machine learning algorithms: a Decision Tree Classifier and Logistic Regression. Before classification, the data were pre-processed, and missing or irrelevant data were removed. The dataset was then analyzed using a correlation matrix and data visualization to identify patterns. To reduce the dimensionality of the original dataset, principal component analysis (PCA) was applied while preserving variance. Several key studies on the analysis and development of machine learning models for music genre classification are reviewed. Cumulative explained variance was also plotted, which showed the maximum proportion (90%) of discrete values generated from multiple individual samples taken along the Gaussian curve. A comparison of the two models by F1 score showed that the best result was obtained for classical music: 82% for Logistic Regression and 75% for the Decision Tree Classifier. For the other genres, the F1 score (the harmonic mean of precision and recall) of the Logistic Regression model is zero, which means that this model completely fails to classify Jazz, Kazakh rock, Kazakh hip hop, and Kazakh pop music. The Decision Tree Classifier did not recognize the Jazz and Kazakh pop genres, but recognized Kazakh rock with precision and recall of 33%. Overall, the proposed models achieve an accuracy of 60% for the Decision Tree Classifier and 70% for Logistic Regression on the training and validation sets. For uniform classification, the data were balanced and assessed using cross-validation. The approach used in this study may be useful for classifying music genres from audio data without relying on human listening.
music genre, decision tree classifier, machine learning algorithms, logistic regression, Information technology, T58.5-58.64, cross-validation
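
The abstract reports a per-genre F1 score of zero alongside a nonzero overall accuracy. The sketch below illustrates how that combination can arise, using the standard definitions of precision, recall, and F1. The labels and predictions are hypothetical toy values chosen for illustration, not data or results from the study, and the helper `per_class_scores` is an assumed name rather than anything the article defines.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # how often each label is the ground truth
    pred_count = Counter(y_pred)   # how often each label is predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for label in labels:
        # Precision and recall are set to 0.0 when their denominator is zero.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        total = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical toy example: a degenerate model that always predicts "classical".
y_true = ["classical"] * 4 + ["rock"] * 2 + ["hip hop"] * 2 + ["pop"] * 2
y_pred = ["classical"] * len(y_true)

scores = per_class_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

for label, (precision, recall, f1) in scores.items():
    print(f"{label:10s} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

In this toy setting the model is never right about any genre other than "classical", so their F1 scores are zero, yet overall accuracy is still above zero because the one predicted class is sometimes correct. This is why per-class scores are worth reading next to a single accuracy figure.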
