AudioMood: Classificação de emoções em bandas sonoras de filmes usando Redes Neuronais [AudioMood: Emotion classification in film soundtracks using Neural Networks]

Mendonça, Francisco de Andrade Bravo

Universidade de Lisboa: Repositório.UL

Master thesis, 2021

Full-Text: https://repositorio.ulisboa.pt/bitstream/10451/49347/1/TM_Francisco_Mendon%c3%a7a.pdf

Data sources: Universidade de Lisboa: Repositório.UL

Publication: Master thesis, 01 Jan 2021, Portugal, Portuguese

Authors: Mendonça, Francisco de Andrade Bravo;

handle: 10451/49347

Abstract

Artificial Intelligence is increasingly used to assist with or carry out everyday tasks. From personal and medical assistants to video games and autonomous cars, its applications are broad and keep reaching new areas. As AI systems grow more complex, they call for new methods that improve training on complex tasks. This dissertation contributes to the study of training methods for Neural Networks, using audio so that the network can identify the sounds present in a movie. The first step was to analyse several datasets in order to select one suited to the methodology. The chosen dataset was Google's AudioSet, which contains more than two million annotated videos. Tools were then developed to build smaller datasets from AudioSet. These tools handled downloading the videos, converting them to audio, manipulating and processing the resulting audio, and constructing the new datasets. During this process, two data augmentation methods were applied to create new data: data rotation and volume control. Models were then trained on the new datasets. Every training run used the same model architecture, with small differences in the training method. For the chosen task, enlarging the dataset and applying data rotation improved the test results, whereas volume control did not alter the data enough for the model to improve.
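The abstract names two augmentation methods, data rotation and volume control, but does not define them. The sketch below is one possible reading, not the thesis implementation: it assumes "data rotation" means a circular shift of the audio samples and "volume control" means multiplying the samples by a gain factor. The function names `rotate`, `scale_volume` and `augment`, and the use of float samples in the range [-1.0, 1.0], are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def rotate(samples: Sequence[float], shift: int) -> List[float]:
    """Circularly shift a clip by `shift` samples.

    Assumption: this is what "data rotation" refers to in the abstract.
    """
    n = len(samples)
    if n == 0:
        return []
    k = shift % n
    if k == 0:
        return list(samples)
    # The last k samples wrap around to the front of the clip.
    return list(samples[-k:]) + list(samples[:-k])


def scale_volume(samples: Sequence[float], gain: float) -> List[float]:
    """Multiply every sample by `gain`, clipping to [-1.0, 1.0].

    Assumption: this is what "volume control" refers to in the abstract.
    """
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def augment(
    samples: Sequence[float],
    shifts: Sequence[int],
    gains: Sequence[float],
) -> List[List[float]]:
    """Return the original clip plus one variant per shift and per gain."""
    variants = [list(samples)]
    variants += [rotate(samples, s) for s in shifts]
    variants += [scale_volume(samples, g) for g in gains]
    return variants


# Toy four-sample "clip"; real audio would hold thousands of samples per second.
clip = [0.0, 0.25, 0.5, -0.5]
variants = augment(clip, shifts=[1, 2], gains=[0.5, 2.0])
```

Under these assumptions, each source clip yields one extra training example per shift and per gain, and every variant keeps the labels of the clip it came from.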

Master's thesis, Informatics, Universidade de Lisboa, Faculdade de Ciências, 2021

Country

Portugal

Related Organizations

Universidade de Lisboa
Portugal

Keywords

Departamento de Informática, AudioSet, Neural Networks, Artificial Intelligence, Open Datasets, Master's theses - 2021, Data Augmentation

Impact by BIP!

- selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact or attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green