Encoding of phonology in a recurrent neural model of grounded speech

Publication: Article, Preprint, Conference object, Other literature type
Published: 01 Jan 2017 (embargo end date: 01 Jan 2017)
Country: Netherlands
Publisher: Association for Computational Linguistics (ACL)
Journal: Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

Authors: Afra Alishahi; Marie Barking; Grzegorz Chrupala

doi: 10.18653/v1/k17-1037 , 10.48550/arxiv.1706.03815

arXiv: 1706.03815

Abstract

We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model that processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses of how information about individual phonemes is encoded in the MFCC features extracted from the speech signal and in the activations of the layers of the model. Via experiments with phoneme decoding and phoneme discrimination, we show that phoneme representations are most salient in the lower layers of the model, where low-level signals are processed at a fine-grained level, although a large amount of phonological information is retained at the top recurrent layer. We further find that the attention mechanism following the top recurrent layer significantly attenuates encoding of phonology and makes the utterance embeddings much more invariant to synonymy. Moreover, a hierarchical clustering of phoneme representations learned by the network shows an organizational structure of phonemes similar to those proposed in linguistics.
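A decoding analysis of the kind mentioned in the abstract can be illustrated with a small sketch. The abstract does not specify the classifier or the pooling, so everything here is an assumption: the data are synthetic stand-ins for layer activations, the frames of each phoneme segment are mean-pooled, and a nearest-centroid classifier is used in place of whatever classifier the authors trained. All function names are hypothetical.

```python
import random


def mean_pool(frames):
    """Average the frame vectors of one phoneme segment into a single vector."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def fit_centroids(vectors, labels):
    """Compute one centroid per phoneme label (assumed stand-in classifier)."""
    sums, counts = {}, {}
    for v, y in zip(vectors, labels):
        if y not in sums:
            sums[y] = [0.0] * len(v)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], v)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, v):
    """Return the label whose centroid is closest to v (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda y: sqdist(centroids[y], v))


def error_rate(centroids, vectors, labels):
    """Fraction of segments whose phoneme label is decoded incorrectly."""
    wrong = sum(predict(centroids, v) != y for v, y in zip(vectors, labels))
    return wrong / len(labels)


def toy_segments(noise, n_per_class, rng):
    """Synthetic data: each phoneme has a prototype; frames are noisy copies.

    The noise level is a hypothetical proxy for how weakly a layer
    encodes phoneme identity; it is not taken from the paper.
    """
    prototypes = {"p": [1.0, 0.0, 0.0], "a": [0.0, 1.0, 0.0], "s": [0.0, 0.0, 1.0]}
    vectors, labels = [], []
    for label, proto in prototypes.items():
        for _ in range(n_per_class):
            frames = [[x + rng.gauss(0.0, noise) for x in proto] for _ in range(5)]
            vectors.append(mean_pool(frames))
            labels.append(label)
    return vectors, labels


def decode(noise, seed=0):
    """Train on one synthetic sample, report the error rate on another."""
    rng = random.Random(seed)
    train_x, train_y = toy_segments(noise, 50, rng)
    test_x, test_y = toy_segments(noise, 50, rng)
    return error_rate(fit_centroids(train_x, train_y), test_x, test_y)


if __name__ == "__main__":
    for name, noise in [("layer with salient phonemes", 0.1),
                        ("layer with attenuated phonemes", 3.0)]:
        print(name, decode(noise))
```

In this toy setting, a lower decoding error for one representation than for another would be read as phoneme identity being more salient in the former, which is the kind of layer-by-layer comparison the abstract reports.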

Accepted at CoNLL 2017

Country

Netherlands

Related Organizations

Tilburg University
Netherlands
STICHTING KATHOLIEKE UNIVERSITEIT BRABANT UNIVERSITEIT VAN TILBURG
Netherlands

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Computer Science - Computation and Language, Computation and Language (cs.CL), Computer Science - Sound, Machine Learning (cs.LG)

3 Research products

visually-grounded-speech software on GitHub (IsRelatedTo)
gentle software on GitHub (IsRelatedTo)
gTTS software on GitHub (IsRelatedTo)

Impact by BIP!

selected citations: 14. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.