Low-Latency Deep Clustering For Speech Separation

Preprint · English · Open Access
Wang, Shanshan; Naithani, Gaurav; Virtanen, Tuomas
(2019)
  • Subject: Computer Science - Sound | Electrical Engineering and Systems Science - Audio and Speech Processing

This paper proposes a low algorithmic latency adaptation of the deep clustering approach to speaker-independent speech separation. It consists of three parts: a) the usage of long short-term memory (LSTM) networks instead of the bidirectional variant used in the original work …
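At inference time, deep clustering maps each time-frequency bin of the mixture spectrogram to an embedding vector and clusters those embeddings (typically with k-means) to obtain one binary mask per speaker. The following is a minimal NumPy sketch of that clustering step only; the shapes, the toy k-means, and all names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Toy k-means on embedding vectors X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its assigned embeddings.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_masks(embeddings, n_speakers):
    """embeddings: (T, F, D) array, one D-dim vector per time-frequency bin.
    Returns binary masks of shape (n_speakers, T, F)."""
    T, F, D = embeddings.shape
    labels = kmeans(embeddings.reshape(-1, D), n_speakers).reshape(T, F)
    return np.stack([(labels == s).astype(float) for s in range(n_speakers)])
```

Because the labels partition the time-frequency bins, the resulting masks are binary and sum to one across speakers at every bin; applying each mask to the mixture spectrogram gives the per-speaker estimate.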