Hold Me Tight: Stable Encoder-Decoder Design for Speech Enhancement

Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG)

Daniel Haider; Felix Perfler; Vincent Lostanlen; Martin Ehler; Peter Balazs

arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

https://doi.org/10.21437/inter...

Article . 2024 . Peer-reviewed

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2024

License: CC BY NC SA

Data sources: Datacite

DBLP

Article

Data sources: DBLP

DBLP

Conference object

Data sources: DBLP

Publication: Article, Preprint, Conference object, 01 Sep 2024
Embargo end date: 01 Jan 2024
Publisher: ISCA
Journal: Interspeech 2024
Funded by: ANR | MuReNN, FWF | Localized, Fusion and Ten..., FWF | Nonsmooth Nonconvex Optim...

doi: 10.21437/interspeech.2024-1622 , 10.48550/arxiv.2408.17358

arXiv: 2408.17358

Abstract

Convolutional layers with 1-D filters are often used as a frontend to encode audio signals. Unlike fixed time-frequency representations, they can adapt to the local characteristics of input data. However, 1-D filters on raw audio are hard to train and often suffer from instabilities. In this paper, we address these problems with hybrid solutions, i.e., combining theory-driven and data-driven approaches. First, we preprocess the audio signals via an auditory filterbank, guaranteeing good frequency localization for the learned encoder. Second, we use results from frame theory to define an unsupervised learning objective that encourages energy conservation and perfect reconstruction. Third, we adapt mixed compressed spectral norms as learning objectives to the encoder coefficients. Using these solutions in a low-complexity encoder-mask-decoder model significantly improves the perceptual evaluation of speech quality (PESQ) in speech enhancement.

Accepted at INTERSPEECH 2024
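
The frame-theoretic idea mentioned in the abstract (energy conservation and perfect reconstruction) can be illustrated with a small numerical sketch. This is not the authors' implementation or their learning objective; it only assumes an undecimated (stride 1) FIR filterbank applied by circular convolution, for which the frame bounds A and B are the minimum and maximum over frequency of the summed squared magnitude responses. The function names `frame_bounds`, `condition_number`, and `analysis_energy` are hypothetical, chosen for this example. A ratio B/A close to 1 means the filterbank is close to tight, i.e., it approximately preserves signal energy up to a constant factor.

```python
import numpy as np


def frame_bounds(filters, n_fft=1024):
    """Frame bounds (A, B) of an undecimated FIR filterbank.

    filters: array of shape (J, T), one real FIR filter per row.
    Assumes circular convolution on signals of length n_fft (n_fft >= T).
    """
    responses = np.fft.fft(filters, n=n_fft, axis=-1)
    # Sum of squared magnitude responses over all filters, per frequency bin.
    power_sum = np.sum(np.abs(responses) ** 2, axis=0)
    return power_sum.min(), power_sum.max()


def condition_number(filters, n_fft=1024):
    """Ratio B/A; equals 1 exactly when the filterbank is a tight frame."""
    lower, upper = frame_bounds(filters, n_fft)
    return upper / lower


def analysis_energy(filters, x):
    """Total energy of all filterbank outputs for the signal x."""
    n = x.shape[-1]
    spectrum = np.fft.fft(x)
    responses = np.fft.fft(filters, n=n, axis=-1)
    # Circular convolution of x with every filter, computed in the Fourier domain.
    coeffs = np.fft.ifft(responses * spectrum, axis=-1)
    return np.sum(np.abs(coeffs) ** 2)


rng = np.random.default_rng(0)

# A randomly initialized bank is generally far from tight.
random_bank = rng.standard_normal((8, 32))
kappa_random = condition_number(random_bank)

# The two-tap Haar pair is tight without decimation: A = B = 2.
haar = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
kappa_haar = condition_number(haar)

# Energy conservation for the tight bank: output energy = 2 * input energy.
x = rng.standard_normal(1024)
energy_ratio = analysis_energy(haar, x) / np.sum(x ** 2)
```

In a training setup, a quantity such as B/A could serve as a label-free penalty on the encoder filters, since it depends only on the filters and not on any target signal; whether and how the paper does this is not specified in the abstract.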

Related Organizations

Universität Wien
Austria
Acoustics Research Institute, Austrian Academy of Sciences
Austria
Austrian Academy of Sciences
Austria
Nantes University

Stable-Hybrid-Auditory-Filterbanks software on GitHub
IsRelatedTo

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Funded by

ANR | MuReNN, FWF | Localized, Fusion and Tensors of Frames, FWF | Nonsmooth Nonconvex Optimization Methods in Acoustics
