Learning Robust Speech Representation with an Articulatory-Regularized Variational Autoencoder

Publication: Article, Preprint; 30 Aug 2021; Embargo end date: 01 Jan 2021; Publisher: ISCA; Journal: Interspeech 2021; Funded by: ANR | MIAI

Authors: Georges, Marc-Antoine; Girin, Laurent; Schwartz, Jean-Luc; Hueber, Thomas

doi: 10.21437/interspeech.2021-1604, 10.48550/arxiv.2104.03204

arXiv: 2104.03204

Abstract

It is increasingly considered that human speech perception and production both rely on articulatory representations. In this paper, we investigate whether this type of representation could improve the performance of a deep generative model (here a variational autoencoder) trained to encode and decode acoustic speech features. First, we develop an articulatory model able to associate articulatory parameters describing the jaw, tongue, lips and velum configurations with vocal tract shapes and spectral features. Then we incorporate these articulatory parameters into a variational autoencoder applied to spectral features by using a regularization technique that constrains part of the latent space to follow articulatory trajectories. We show that this articulatory constraint improves model training by decreasing time to convergence and reconstruction loss at convergence, and yields better performance in a speech denoising task.
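The regularization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss, so the squared-error penalty, the choice of the first `n_artic` latent dimensions, the weights `beta` and `lam`, and the function name `articulatory_regularized_loss` are all assumptions made for illustration.

```python
import numpy as np

def articulatory_regularized_loss(x, x_hat, mu, log_var, artic,
                                  n_artic, beta=1.0, lam=1.0):
    """Hypothetical VAE loss with an articulatory penalty (illustrative only).

    x, x_hat : (batch, n_features) spectral features and their reconstruction
    mu, log_var : (batch, n_latent) parameters of the approximate posterior
    artic : (batch, n_artic) articulatory parameters for the same frames
    """
    # Reconstruction term: squared error summed over features, averaged over the batch.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # KL divergence between N(mu, diag(exp(log_var))) and the standard normal prior.
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    # Assumed penalty: pull the first n_artic latent means toward the articulatory parameters.
    artic_penalty = np.mean(np.sum((mu[:, :n_artic] - artic) ** 2, axis=1))
    return recon + beta * kl + lam * artic_penalty
```

Setting `lam=0` recovers an ordinary (beta-weighted) VAE objective, which makes it easy to compare training with and without the constraint under the same code path.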

Related Organizations

Grenoble Alpes University
France

Keywords

FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Impact by BIP!

selected citations: 2. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access route: Green

Fields of Science (4)

engineering and technology

electrical engineering, electronic engineering, information engineering

Funded by

ANR | MIAI