Audio Decoding by Inverse Problem Solving

Keywords: FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG)

T., Pedro J. Villasana; Villemoes, Lars; Klejsa, Janusz; Hedelin, Per

arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/icassp...

Article . 2025 . Peer-reviewed

License: STM Policy #29

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2024

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

Publication: Article, Preprint. 06 Apr 2025. Embargo end date: 01 Jan 2024. Publisher: IEEE. Journal: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

doi: 10.1109/icassp49660.2025.10888255 , 10.48550/arxiv.2409.07858

arXiv: 2409.07858

Abstract

We consider audio decoding as an inverse problem and solve it through diffusion posterior sampling. Explicit conditioning functions are developed for input signal measurements provided by an example of a transform domain perceptual audio codec. Viability is demonstrated by evaluating arbitrary pairings of a set of bitrates and task-agnostic prior models. For instance, we observe significant improvements on piano while maintaining speech performance when a speech model is replaced by a joint model trained on both speech and piano. With a more general music model, improved decoding compared to legacy methods is obtained for a broad range of content types and bitrates. The noisy mean model, underlying the proposed derivation of conditioning, enables a significant reduction of gradient evaluations for diffusion posterior sampling, compared to methods based on Tweedie's mean. Combining Tweedie's mean with our conditioning functions improves the objective performance. An audio demo is available at https://dpscodec-demo.github.io/.

5 pages, 4 figures, audio demo available at https://dpscodec-demo.github.io/, pre-review version submitted to ICASSP 2025
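
The abstract refers to Tweedie's mean, the standard identity that recovers the posterior mean of a clean signal from the score of its noisy distribution. The following is a minimal sketch of that identity only, assuming a one-dimensional Gaussian prior so that the score has a closed form. The prior, the function names, and the parameter values are illustrative assumptions; the sketch does not reproduce the paper's noisy mean model, its conditioning functions, or its codec measurements.

```python
import math


def gaussian_score(x_t, mu, s2, sigma2):
    """Score d/dx log p_t(x) of the noisy marginal p_t = N(mu, s2 + sigma2).

    Assumes a toy prior x0 ~ N(mu, s2) and x_t = x0 + sqrt(sigma2) * eps.
    """
    return -(x_t - mu) / (s2 + sigma2)


def tweedie_mean(x_t, sigma2, score):
    """Tweedie's identity: E[x0 | x_t] = x_t + sigma2 * score(x_t)."""
    return x_t + sigma2 * score


def posterior_mean(x_t, mu, s2, sigma2):
    """Closed-form posterior mean for the Gaussian toy prior (reference value)."""
    return (s2 * x_t + sigma2 * mu) / (s2 + sigma2)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    mu, s2, sigma2, x_t = 0.5, 2.0, 0.3, 1.7
    estimate = tweedie_mean(x_t, sigma2, gaussian_score(x_t, mu, s2, sigma2))
    reference = posterior_mean(x_t, mu, s2, sigma2)
    print(estimate, reference, math.isclose(estimate, reference))
```

In a diffusion model the analytic score would be replaced by a learned network, which is why each use of Tweedie's mean inside a posterior sampling gradient costs a pass through that network.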

1 Research products, page 1 of 1

visqol software on GitHub
IsRelatedTo
