
This work consists of several stages: a review of the subject area, a study of the theoretical foundations of the system, the technical implementation of the model, and the collection of results. It examines the field of speech emotion recognition (SER) and identifies the main components needed to build a modern SER system. It also describes the model in detail, together with a scheme of how the system operates, both for its individual components and as a whole. Functions and methods were written to extract four features of a speech signal: the spectrogram, the cochleagram, a set of mel-frequency cepstral coefficients (MFCC), and fractal dimensions; a 3D CNN architecture with an attention module was also implemented. As a result, four models were obtained: three trained separately on the SAVEE, RAVDESS, and TESS datasets and one trained on a mixed sample. Their accuracy is in many respects comparable to that of current research. A comparison is also given that demonstrates the value of fractal dimensions as features for deep-learning-based emotion classification.
cochleagram, deep learning model, neural network, CBAM, fractal dimensions, deep learning, SER, MFCC, spectrogram, CNN
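Two of the four features named in the abstract can be illustrated with a minimal sketch. The abstract does not say which fractal dimension estimator or which framing parameters were used, so everything specific here is an assumption made for illustration: the Katz estimator, the function names `log_spectrogram` and `katz_fd`, and the defaults `frame_len=400` and `hop=160` (25 ms and 10 ms at a 16 kHz sampling rate). It is not the author's implementation.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Log-power spectrogram, shape (frame_len // 2 + 1, n_frames).

    frame_len and hop are illustrative defaults, not values from the source.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop
    # Index matrix: one row per frame, one column per sample within the frame.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps).T


def katz_fd(signal):
    """Katz fractal dimension of a 1-D signal (amplitude-only variant).

    Katz is an assumed choice; the source does not name an estimator.
    """
    x = np.asarray(signal, dtype=float)
    steps = np.abs(np.diff(x))
    total_len = steps.sum()                # L: total curve length
    if total_len == 0:
        raise ValueError("constant signal: Katz dimension is undefined")
    extent = np.max(np.abs(x - x[0]))      # d: farthest excursion from the start
    n_steps = total_len / steps.mean()     # equals len(x) - 1
    return np.log10(n_steps) / (np.log10(n_steps) + np.log10(extent / total_len))


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)    # 1 s synthetic test tone
    print(log_spectrogram(tone).shape, katz_fd(tone))
```

A straight ramp yields a Katz dimension of 1, and more irregular waveforms yield larger values, which is what makes this kind of measure a candidate complement to spectral features.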
