ZENODO · PhysicalObject · 2022 · License: CC BY · Data sources: Datacite, ZENODO

Sudo rm -rf pre-trained audio source separation models

Authors: Efthymios Tzinis

Abstract

Efficient pre-trained models for 8 kHz 2-speaker source separation (anechoic, and noisy with reverberation). You can see the full code, alongside a basic description of the models' performance and computational requirements, here: github-codebase. You can git-clone the repo and download the pre-trained models under: sudo_rm_rf/pretrained_models. We have also prepared an easy-to-use example for the pre-trained Sudo rm -rf models here: python-notebook, so you can take all models for a spin.

Simply normalize the input audio and infer:

```python
# Load a pre-trained model
separation_model = torch.load(anechoic_model_p)

# Normalize the input mixture waveform
input_mix_std = input_mix.std(-1, keepdim=True)
input_mix_mean = input_mix.mean(-1, keepdim=True)
input_mix = (input_mix - input_mix_mean) / (input_mix_std + 1e-9)

# Apply the model
rec_sources_wavs = separation_model(input_mix.unsqueeze(1))

# Rescale the estimated sources with the mixture mean and standard deviation
rec_sources_wavs = (rec_sources_wavs * input_mix_std) + input_mix_mean
```

One of the main points that the Sudo rm -rf models have brought forward is that focusing only on reconstruction fidelity, while ignoring all other computational metrics such as execution time and actual memory consumption, is an ideal way of wasting resources for an almost negligible performance improvement. To that end, we show that the Sudo rm -rf models provide a very effective alternative for a range of separation tasks while also being respectful to users who do not have access to immense computational power, or to researchers who prefer not to train their models for weeks on a multitude of GPUs.

[Table: Results on WSJ0-2mix]
[Table: Results on WHAMR!]

Thus, Sudo rm -rf models are able to perform on par with the state of the art, and even surpass it in certain cases, with minimal computational overhead in terms of both time and memory. The importance of reporting all of the above metrics when proposing a new model also becomes apparent. All experiments assume an 8 kHz sampling rate and 4 seconds of input audio on a server with an NVIDIA GeForce RTX 2080 Ti (11 GB) and a 12-core Intel(R) Core(TM) i7-5930K CPU @ 3.50 GHz. OOM means out of memory for the corresponding configuration. A value of Z ex/sec is the throughput of each model: for each second that passes, the model can process (in either a forward or a backward pass) Z audio files of 32,000 samples each. The attention models, which undoubtedly provide the best performance in most cases, are extremely heavy in terms of actual time and memory consumption (even though their number of parameters appears rather small), and they become prohibitively expensive for longer sequences.

Please cite as:

```bibtex
@inproceedings{tzinis2020sudo,
  title={Sudo rm-rf: Efficient networks for universal audio source separation},
  author={Tzinis, Efthymios and Wang, Zhepei and Smaragdis, Paris},
  booktitle={2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP)},
  pages={1--6},
  year={2020},
  organization={IEEE}
}

@article{tzinis2022compute,
  title={Compute and Memory Efficient Universal Sound Source Separation},
  author={Tzinis, Efthymios and Wang, Zhepei and Jiang, Xilin and Smaragdis, Paris},
  journal={Journal of Signal Processing Systems},
  year={2022},
  volume={94},
  number={2},
  pages={245--259},
  publisher={Springer}
}
```
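For completeness, below is a minimal end-to-end sketch of how the inference snippet above could be wired up for a single file. It is an illustration rather than part of the released notebook: the file name mixture.wav, the checkpoint path, and the use of torchaudio for loading and resampling are all assumptions; the python-notebook linked above remains the reference usage example.

```python
# Minimal end-to-end sketch (assumptions: torchaudio for I/O, a local
# "mixture.wav", and a hypothetical checkpoint path). The repo code must be
# importable so that torch.load can unpickle the saved model class.
import torch
import torchaudio

anechoic_model_p = "sudo_rm_rf/pretrained_models/<downloaded_checkpoint>.pt"  # hypothetical path
target_sr = 8000  # the pre-trained models expect 8 kHz input

# Load the mixture and collapse it to mono, shape: (1, num_samples)
waveform, sr = torchaudio.load("mixture.wav")
waveform = waveform.mean(dim=0, keepdim=True)
if sr != target_sr:
    waveform = torchaudio.functional.resample(waveform, sr, target_sr)

# Load the pre-trained separation model
separation_model = torch.load(anechoic_model_p)
separation_model.eval()

with torch.no_grad():
    # Normalize the mixture, exactly as in the snippet above
    std = waveform.std(-1, keepdim=True)
    mean = waveform.mean(-1, keepdim=True)
    input_mix = (waveform - mean) / (std + 1e-9)

    # Separate: (batch, 1, samples) in -> (batch, num_sources, samples) out
    est_sources = separation_model(input_mix.unsqueeze(1))

    # Undo the normalization using the mixture statistics
    est_sources = est_sources * std + mean
```

The sketch assumes the model returns the estimated sources as a tensor of shape (batch, num_sources, num_samples), matching the snippet above; if a particular checkpoint exposes a different interface, adapt the call accordingly.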

{"references": ["Tzinis, E., Wang, Z. and Smaragdis, P., 2020, September. Sudo rm-rf: Efficient networks for universal audio source separation. In 2020 IEEE 30th International Workshop on Machine Learning for Signal Processing (MLSP) (pp. 1-6). IEEE.", "E. Tzinis, Z. Wang, X. Jiang, and P. Smaragdis, \"Compute and memory efficient universal sound source separation,\" Journal of Signal Processing Systems, vol. 94, no. 2, pp. 245\u2013259, 2022."]}

Keywords

sudo rm rf, audio source separation, speech separation, pre-trained models

  • BIP! impact indicators: 0 citations · Average popularity · Average influence · Average impulse
  • OpenAIRE UsageCounts: 29 views · 326 downloads