U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

descriptionPublicationkeyboard_double_arrow_right Article , Preprint 01 Jan 2021Embargo end date: 01 Jan 2021Publisher:arXivJournal:CoRR, volume abs/2106.05642

Authors: Di Wu 0061; Binbin Zhang; Chao Yang 0031; Zhendong Peng; Wenjing Xia; Xiaoyu Chen; Xin Lei;

doi: 10.48550/arxiv.2106.05642

arXiv: 2106.05642

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

- Summary
- Subjects
- Related research
  (1)
- Metrics

Abstract

The unified streaming and non-streaming two-pass (U2) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy, real-time factor (RTF), and latency. In this paper, we present U2++, an enhanced version of U2 to further improve the accuracy. The core idea of U2++ is to use the forward and the backward information of the labeling sequences at the same time at training to learn richer information, and combine the forward and backward prediction at decoding to give more accurate recognition results. We also proposed a new data augmentation method called SpecSub to help the U2++ model to be more accurate and robust. Our experiments show that, compared with U2, U2++ shows faster convergence at training, better robustness to the decoding method, as well as consistent 5\% - 8\% word error rate reduction gain over U2. On the experiment of AISHELL-1, we achieve a 4.63\% character error rate (CER) with a non-streaming setup and 5.05\% with a streaming setup with 320ms latency by U2++. To the best of our knowledge, 5.05\% is the best-published streaming result on the AISHELL-1 test set.

Keywords

FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

1 Research products, page 1 of 1

wenet software on GitHub
IsRelatedTo

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average

Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

U2++: Unified Two-pass Bidirectional End-to-end Model for Speech Recognition

1 Research products, page 1 of 1

wenet software on GitHub