Lipreading with long short-term memory

Publication: Article, Preprint, Conference object
Date: 01 Mar 2016
Embargo end date: 01 Jan 2016
Publisher: IEEE
Journal: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Funded by: EC | PROTOTOUCH, SNSF | Advanced Reinforcement Le...

Authors: Michael Wand; Jan Koutník; Jürgen Schmidhuber

doi: 10.1109/icassp.2016.7472852, 10.48550/arxiv.1601.08188

arXiv: 1601.08188

Abstract

Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be achieved with a processing pipeline based solely on neural networks, yielding significantly better accuracy than conventional methods. Feed-forward and recurrent neural network layers (namely Long Short-Term Memory; LSTM) are stacked to form a single structure which is trained by back-propagating error gradients through all the layers. The performance of such a stacked network was experimentally evaluated and compared to a standard Support Vector Machine classifier using conventional computer vision features (Eigenlips and Histograms of Oriented Gradients). The evaluation was performed on data from 19 speakers of the publicly available GRID corpus. With 51 different words to classify, we report a best word accuracy on held-out evaluation speakers of 79.6% using the end-to-end neural network-based solution (11.6% improvement over the best feature-based solution evaluated).
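
The stacked structure described in the abstract can be illustrated with a minimal forward-pass sketch: each video frame passes through a feed-forward layer, the resulting features drive an LSTM across time, and the final hidden state is mapped to scores for the 51 word classes. Everything beyond that outline is an assumption made for illustration and is not taken from the paper: the layer sizes, the tanh activation, the omission of bias terms, classifying from the last hidden state, the random weights, and all function and variable names. Training by back-propagating error gradients through all layers is not shown.

```python
import math
import random

NUM_WORDS = 51  # number of word classes reported in the abstract


def make_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows (illustrative initialisation)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def feed_forward(weights, frame):
    """One feed-forward layer with tanh activation, applied to a single frame."""
    return [math.tanh(a) for a in matvec(weights, frame)]


def lstm_step(weights, x, h, c):
    """One LSTM time step; weights holds one matrix per gate acting on [x; h]."""
    z = x + h  # concatenate the input with the previous hidden state
    i = [sigmoid(a) for a in matvec(weights["input"], z)]
    f = [sigmoid(a) for a in matvec(weights["forget"], z)]
    o = [sigmoid(a) for a in matvec(weights["output"], z)]
    g = [math.tanh(a) for a in matvec(weights["cell"], z)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_sequence(frames, params):
    """Run frames through feed-forward -> LSTM -> softmax; return word probabilities."""
    hidden = len(params["lstm"]["input"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in frames:
        features = feed_forward(params["ff"], frame)
        h, c = lstm_step(params["lstm"], features, h, c)
    return softmax(matvec(params["out"], h))


def init_params(frame_dim, ff_dim, hidden, rng):
    """Hypothetical parameter layout: one feed-forward layer, one LSTM layer, one output layer."""
    gates = ("input", "forget", "output", "cell")
    return {
        "ff": make_matrix(ff_dim, frame_dim, rng),
        "lstm": {g: make_matrix(hidden, ff_dim + hidden, rng) for g in gates},
        "out": make_matrix(NUM_WORDS, hidden, rng),
    }


rng = random.Random(0)
params = init_params(frame_dim=16, ff_dim=8, hidden=8, rng=rng)
# Random vectors stand in for flattened mouth-region frames of one word utterance.
frames = [[rng.random() for _ in range(16)] for _ in range(5)]
probs = classify_sequence(frames, params)
predicted_word = max(range(NUM_WORDS), key=probs.__getitem__)
```

With untrained random weights the predicted class is meaningless; the sketch only shows how the layers connect, so that a single error signal at the output could in principle be propagated back through the LSTM and the feed-forward layer together.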

Accepted for publication at ICASSP 2016

Related Organizations

University of Applied Sciences and Arts of Southern Switzerland, Switzerland
Università della Svizzera italiana, Switzerland
Dalle Molle Institute for Artificial Intelligence Research, Switzerland

Keywords

FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computation and Language (cs.CL)

Impact by BIP!

Selected citations: 132. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open access route: Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Funded by

EC | PROTOTOUCH, SNSF | Advanced Reinforcement Learning