
arXiv: 2008.12595
Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data vectors are processed independently. Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models. In this monograph, we review these models. We introduce and discuss a general class of models, called dynamical variational autoencoders (DVAEs), which encompasses a large subset of these temporal VAE extensions. We then present seven recently proposed DVAE models in detail, aiming to harmonize their notation and presentation and to relate them to existing classical temporal models. We have reimplemented these seven DVAE models and present the results of an experimental benchmark conducted on a speech analysis-resynthesis task (the PyTorch code is made publicly available). The monograph concludes with a discussion of important issues concerning the DVAE class of models and guidelines for future research.
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML). Keywords: deep learning, variational inference, latent variable models, graphical models, time-series analysis, dynamics, nonlinear signal processing, dimensionality reduction, speech/audio/image/video compression, artificial neural networks and deep learning, learning and statistical methods, learning and adaptive systems in artificial intelligence.
| Indicator | Description | Value |
| --- | --- | --- |
| Selected citations | Citations derived from selected sources; an alternative to the "influence" indicator, which reflects the overall/total impact of the article across the citation network. | 127 |
| Popularity | Reflects the "current" impact/attention (the "hype") of the article in the research community at large, based on the underlying citation network. | Top 1% |
| Influence | Reflects the overall/total impact of the article in the research community at large, based on the underlying citation network (diachronically). | Top 1% |
| Impulse | Reflects the initial momentum of the article directly after its publication, based on the underlying citation network. | Top 0.1% |
