
Remote human interaction and human-machine interaction require reliable speech-processing technologies that work in unconstrained real-world acoustic conditions. Speech recordings are inevitably contaminated by interfering sound sources and by reverberation. Whether for human or artificial listening, speech enhancement algorithms are necessary to improve speech quality and intelligibility. The vast majority of current algorithms rely on deep neural networks trained in a supervised manner on a dataset of noisy speech signals labeled with the corresponding clean-speech reference signals. Since such data cannot be acquired in real conditions, datasets are generated artificially by mixing isolated speech and noise signals. However, the performance of supervised algorithms drops drastically when these synthetic data differ from the real conditions of use. The current trend is to create ever-larger synthetic datasets, in the unrealistic hope of covering all possible acoustic conditions. In contrast, the DEGREASE project proposes a weakly-supervised learning framework with the aim of developing more flexible, robust, and ecologically valid algorithms that can be trained on real unlabeled data and that can adapt to new acoustic conditions. At the crossroads of audio signal processing, probabilistic graphical modeling, and deep learning, we propose a deep generative methodological framework for multi-microphone speech signals which, combined with amortized variational inference techniques, will allow models to be trained efficiently in a weakly-supervised manner.
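
To make the data-generation step concrete, the following is a minimal sketch of how a supervised training pair is typically built: an isolated speech signal is mixed with a noise signal scaled to a chosen signal-to-noise ratio. This illustrates the general practice described above, not the project's actual pipeline; the function name, signal lengths, and SNR value are assumptions for illustration.

import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture speech + noise has the requested SNR in dB."""
    noise = noise[: len(speech)]                # align lengths
    p_speech = np.mean(speech ** 2)             # average speech power
    p_noise = np.mean(noise ** 2) + 1e-12       # average noise power (avoid /0)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + gain * noise                # training pair: (mixture, speech)

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)             # placeholder for a real utterance
noise = rng.standard_normal(16000)              # placeholder for recorded noise
noisy = mix_at_snr(speech, noise, snr_db=5.0)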

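The closing sentence refers to amortized variational inference: an inference network maps each observation to the parameters of an approximate posterior over latent variables, so inference is learned once rather than optimized per signal, as in a variational autoencoder. The sketch below shows this idea on short-time speech power spectra, a common setting for deep generative speech priors; the architecture, dimensions, and Itakura-Saito-style reconstruction term are illustrative assumptions, not the DEGREASE model.

import torch
import torch.nn as nn

class SpectrumVAE(nn.Module):
    """Toy VAE over short-time speech power spectra (dimensions are assumptions)."""
    def __init__(self, n_freq: int = 257, n_latent: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_freq, 128), nn.Tanh())
        self.enc_mean = nn.Linear(128, n_latent)
        self.enc_logvar = nn.Linear(128, n_latent)
        self.decoder = nn.Sequential(
            nn.Linear(n_latent, 128), nn.Tanh(), nn.Linear(128, n_freq)
        )

    def forward(self, x):
        h = self.encoder(x)                     # amortized inference network
        mean, logvar = self.enc_mean(h), self.enc_logvar(h)
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)  # reparameterization
        log_power = self.decoder(z)             # log of modeled speech power spectrum
        return log_power, mean, logvar

def neg_elbo(x_power, log_power, mean, logvar):
    # Itakura-Saito-style reconstruction term for power spectra, plus KL to N(0, I).
    recon = torch.sum(x_power / torch.exp(log_power) + log_power)
    kl = -0.5 * torch.sum(1 + logvar - mean ** 2 - torch.exp(logvar))
    return recon + kl

vae = SpectrumVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
x = torch.rand(8, 257)                          # placeholder batch of power spectra
log_power, mean, logvar = vae(x)
loss = neg_elbo(x, log_power, mean, logvar)
opt.zero_grad(); loss.backward(); opt.step()

Because training such a generative prior only requires speech (or, in a weakly-supervised setting, unlabeled mixtures), it avoids the paired noisy/clean labels that purely supervised enhancement depends on, which is the flexibility the project aims for.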