publication . Conference object

Phase Reconstruction with Learned Time-Frequency Representations for Single-Channel Speech Separation

Gordon Wichern; Jonathan Le Roux
Restricted
  • Publisher: IEEE
Abstract
Progress in solving the cocktail party problem, i.e., separating the speech from multiple overlapping speakers, has recently accelerated with the invention of techniques such as deep clustering and permutation-free mask inference. These approaches typically focus on estimating target STFT magnitudes and ignore problems of phase inconsistency. In this paper, we explicitly integrate phase reconstruction into our separation algorithm using a loss function defined on time-domain signals. A deep neural network structure is defined by unfolding a phase reconstruction algorithm and treating each iteration as a layer in our network. Furthermore, instead of using fixed S...
Subjects
arXiv: Computer Science::Sound
Free-text keywords: Cluster analysis, Spectrogram, Source separation, Artificial neural network, Iterative reconstruction, Algorithm, Cocktail party effect, Permutation, Time–frequency analysis, Computer science