
This paper addresses single-channel multi-talker speech recognition in the presence of background music, a condition under which recognition accuracy deteriorates significantly. To improve recognition accuracy under music interference, we propose two approaches: 1) a music-separation method that extracts human speech from the music background, and 2) permutation invariant training (PIT) for single-channel multi-talker, specifically two-talker, speech separation. Experimental results show that both proposed methods improve speech recognition accuracy. In particular, we use the music-separation method rather than a de-noising feature-mapping method to extract the human speech; compared with de-noising feature mapping, music separation achieves consistent improvements on the music-corrupted speech separation and recognition tasks.
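The PIT criterion mentioned above evaluates the training loss under every possible assignment of network outputs to reference speakers and back-propagates only the smallest one, so the network is not penalized for producing the two speakers in an arbitrary order. The following is a minimal sketch of utterance-level PIT for the two-talker case, assuming an MSE objective on spectral features; the function name, tensor shapes, and loss choice are illustrative assumptions rather than the paper's exact configuration.

```python
import torch

def pit_mse_loss(est1, est2, ref1, ref2):
    """Utterance-level PIT sketch (assumed MSE objective): for each utterance,
    pick the speaker assignment (output-to-reference permutation) with the
    smaller error and average that minimum over the batch."""
    # Per-utterance error for permutation A: est1 -> ref1, est2 -> ref2.
    err_a = ((est1 - ref1) ** 2).mean(dim=(1, 2)) + ((est2 - ref2) ** 2).mean(dim=(1, 2))
    # Per-utterance error for permutation B: est1 -> ref2, est2 -> ref1.
    err_b = ((est1 - ref2) ** 2).mean(dim=(1, 2)) + ((est2 - ref1) ** 2).mean(dim=(1, 2))
    # Train against whichever assignment gives the smaller error per utterance.
    return torch.minimum(err_a, err_b).mean()

# Example with assumed shapes: batch of 4 utterances, 100 frames, 257 frequency bins.
est1, est2 = torch.randn(4, 100, 257), torch.randn(4, 100, 257)
ref1, ref2 = torch.randn(4, 100, 257), torch.randn(4, 100, 257)
print(pit_mse_loss(est1, est2, ref1, ref2))
```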
