Deep Latent Fusion Layers for Binaural Speech Enhancement

Article · 05 Oct 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing, volume 31, pages 3127-3138 (ISSN: 2329-9290; eISSN: 2329-9304)
Funded by: DFG | unidentified

Authors: Tom Gajecki; Waldo Nogueira

DOI: 10.36227/techrxiv.21215378.v2, 10.36227/techrxiv.21215378, 10.1109/taslp.2023.3301223, 10.36227/techrxiv.21215378.v1


Abstract

<p>This work addresses the problem of enhancing speech in binaural hearing scenarios. Specifically, we present a method to improve binaural noise reduction by integrating latent features produced by monaural speech enhancement algorithms through "fusion layers." These layers compute Hadamard (element-wise) products between tensors that hold latent representations at the same processing stage, mimicking the physiological excitatory and inhibitory mechanisms of the binaural hearing system. Fusion layers draw inspiration from multi-task learning, in which model weights are shared across models that handle interrelated tasks. This study first presents a general fusion model and shows that it fits synthetic data better than independent linear models, equalizes activation variance between learning modules, and exploits redundancy in the input data to reduce the training error. We then apply fusion layers to enhance speech in binaural listening conditions. The proposed method shows promise for improved noise reduction compared with other feature-sharing approaches. The results also suggest that fusion can improve predicted speech intelligibility and quality, although fusing too many features may degrade predicted speech intelligibility. Furthermore, the results suggest that fusion layers should share parameterized latent representations, rather than deterministic ones, to effectively exploit the information from each listening side. Overall, this study highlights the potential of sharing information between speech enhancement modules through deep fusion layers to improve binaural speech enhancement while keeping the number of trainable parameters constant and improving generalization.</p>
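The abstract describes fusion layers only at a high level, so the following is a minimal sketch of the core idea under our own assumptions, not the authors' architecture. We assume two small NumPy MLP branches (one per ear) with tanh activations and hypothetical layer sizes; the helper names `init_branch`, `count_params`, and `binaural_forward` are illustrative only. At each fusion stage listed in `fuse_at`, both branches' latents are replaced by their Hadamard product before the next layer.

```python
import numpy as np


def init_branch(layer_sizes, rng):
    """Create (weights, bias) pairs for one monaural branch (hypothetical MLP)."""
    return [
        (rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def count_params(branch):
    """Total number of trainable scalars in one branch."""
    return sum(w.size + b.size for w, b in branch)


def binaural_forward(x_left, x_right, left, right, fuse_at):
    """Run the left and right branches side by side.

    Assumption: after each layer index in `fuse_at`, both latent tensors are
    replaced by their Hadamard (element-wise) product, so the next layer of
    each branch sees information from the other ear. The product has no
    weights of its own, so fusion does not change the parameter count.
    """
    h_l, h_r = x_left, x_right
    for k, ((w_l, b_l), (w_r, b_r)) in enumerate(zip(left, right)):
        h_l = np.tanh(h_l @ w_l + b_l)
        h_r = np.tanh(h_r @ w_r + b_r)
        if k in fuse_at:
            fused = h_l * h_r  # Hadamard product of same-stage latents
            h_l, h_r = fused, fused
    return h_l, h_r


rng = np.random.default_rng(0)
sizes = [64, 32, 32, 64]  # hypothetical: 64 features per ear and frame
left = init_branch(sizes, rng)
right = init_branch(sizes, rng)

frames = 10
x_l = rng.standard_normal((frames, 64))
x_r = rng.standard_normal((frames, 64))

# Fuse after the first two hidden layers; keep the output layer unfused.
y_l, y_r = binaural_forward(x_l, x_r, left, right, fuse_at={0, 1})
# Baseline: two independent monaural branches.
y_l0, y_r0 = binaural_forward(x_l, x_r, left, right, fuse_at=set())
```

Without fusion, the left output does not depend on the right-ear input; with fusion at stages 0 and 1 it does, while `count_params` reports the same total either way because the element-wise product adds no weights. How many stages to fuse is a free choice in this sketch; the abstract only notes that fusing too many features may hurt predicted intelligibility.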

Related Organizations

University of Hannover
Germany

Impact by BIP!

- Selected citations: 3. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.


