ZENODO · PhysicalObject · 2022 · License: CC BY · Data sources: Datacite, ZENODO

Sudo rm -rf pre-trained audio source separation models

Authors: Efthymios Tzinis

Abstract

Efficient pre-trained models for 8 kHz 2-speaker source separation (anechoic, and noisy with reverberation). You can see the full code, alongside a basic description of the models' performance and computational requirements, here: github-codebase. You can git-clone the repo and download the pre-trained models under: sudo_rm_rf/pretrained_models. We have also prepared an easy-to-use example for the pre-trained Sudo rm -rf models here: python-notebook, so you can take all models for a spin.

Simply normalize the input audio and infer:

```python
# Load a pre-trained model
separation_model = torch.load(anechoic_model_p)

# Normalize the input mixture waveform
input_mix_std = input_mix.std(-1, keepdim=True)
input_mix_mean = input_mix.mean(-1, keepdim=True)
input_mix = (input_mix - input_mix_mean) / (input_mix_std + 1e-9)

# Apply the model
rec_sources_wavs = separation_model(input_mix.unsqueeze(1))

# Rescale the estimated sources with the mixture mean and standard deviation
rec_sources_wavs = (rec_sources_wavs * input_mix_std) + input_mix_mean
```

One of the main points that the Sudo rm -rf models have brought forward is that focusing only on reconstruction fidelity, while ignoring all other computational metrics such as execution time and actual memory consumption, is an ideal way of wasting resources for an almost negligible performance improvement. To that end, we show that the Sudo rm -rf models provide a very effective alternative for a range of separation tasks while also being respectful to users who do not have access to immense computational power, or to researchers who prefer not to train their models for weeks on a multitude of GPUs.

[Table: Results on WSJ0-2mix]
[Table: Results on WHAMR!]

Thus, Sudo rm -rf models are able to perform on par with the state of the art, and even surpass it in certain cases, with minimal computational overhead in terms of both time and memory. The importance of reporting all of the above metrics when proposing a new model also becomes apparent. All experiments assume an 8 kHz sampling rate and 4 seconds of input audio on a server with an NVIDIA GeForce RTX 2080 Ti (11 GB) and a 12-core Intel(R) Core(TM) i7-5930K CPU @ 3.50 GHz. OOM means out of memory for the corresponding configuration. A value of Z ex/sec is the throughput of each model: for each second that passes, the model can process (in either a forward or a backward pass) Z audio files of 32,000 samples each. The attention models, which undoubtedly provide the best performance in most cases, are extremely heavy in terms of actual time and memory consumption (even though their number of parameters appears rather small), and they become prohibitively expensive for longer sequences.

Please cite as:

```bibtex
@inproceedings{tzinis2020sudo,
  title={Sudo rm-rf: Efficient networks for universal audio source separation},
  author={Tzinis, Efthymios and Wang, Zhepei and Smaragdis, Paris},
  booktitle={2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP)},
  pages={1--6},
  year={2020},
  organization={IEEE}
}

@article{tzinis2022compute,
  title={Compute and Memory Efficient Universal Sound Source Separation},
  author={Tzinis, Efthymios and Wang, Zhepei and Jiang, Xilin and Smaragdis, Paris},
  journal={Journal of Signal Processing Systems},
  year={2022},
  volume={94},
  number={2},
  pages={245--259},
  publisher={Springer}
}
```
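For completeness, below is a minimal end-to-end sketch of how the inference snippet above could be wired up for a single file. It is an illustration rather than part of the released notebook: the file name mixture.wav, the checkpoint path, and the use of torchaudio for loading and resampling are all assumptions; the python-notebook linked above remains the reference usage example.

```python
# Minimal end-to-end sketch (assumptions: torchaudio for I/O, a local
# "mixture.wav", and a hypothetical checkpoint path). The repo code must be
# importable so that torch.load can unpickle the saved model class.
import torch
import torchaudio

anechoic_model_p = "sudo_rm_rf/pretrained_models/<downloaded_checkpoint>.pt"  # hypothetical path
target_sr = 8000  # the pre-trained models expect 8 kHz input

# Load the mixture and collapse it to mono, shape: (1, num_samples)
waveform, sr = torchaudio.load("mixture.wav")
waveform = waveform.mean(dim=0, keepdim=True)
if sr != target_sr:
    waveform = torchaudio.functional.resample(waveform, sr, target_sr)

# Load the pre-trained separation model
separation_model = torch.load(anechoic_model_p)
separation_model.eval()

with torch.no_grad():
    # Normalize the mixture, exactly as in the snippet above
    std = waveform.std(-1, keepdim=True)
    mean = waveform.mean(-1, keepdim=True)
    input_mix = (waveform - mean) / (std + 1e-9)

    # Separate: (batch, 1, samples) in -> (batch, num_sources, samples) out
    est_sources = separation_model(input_mix.unsqueeze(1))

    # Undo the normalization using the mixture statistics
    est_sources = est_sources * std + mean
```

The sketch assumes the model returns the estimated sources as a tensor of shape (batch, num_sources, num_samples), matching the snippet above; if a particular checkpoint exposes a different interface, adapt the call accordingly.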

{"references": ["Tzinis, E., Wang, Z. and Smaragdis, P., 2020, September. Sudo rm-rf: Efficient networks for universal audio source separation. In 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP) (pp. 1-6). IEEE.", "E. Tzinis, Z. Wang, X. Jiang, and P. Smaragdis, \"Compute and memory efficient universal sound source separation,\" Journal of Signal Processing Systems, vol. 94, no. 2, pp. 245\u2013259, 2022."]}

Keywords

sudo rm rf, audio source separation, speech separation, pre-trained models

  • BIP! impact indicators: 0 citations · Average popularity · Average influence · Average impulse
  • OpenAIRE UsageCounts: 29 views · 326 downloads