WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit

Publication: Article, Preprint, Conference object
Date: 18 Sep 2022
Embargo end date: 01 Jan 2022
Publisher: ISCA
Journal: Interspeech 2022

Authors: Binbin Zhang; Di Wu; Zhendong Peng; Xingchen Song; Zhuoyuan Yao; Hang Lv; Lei Xie; +3 Authors

doi: 10.21437/interspeech.2022-483 , 10.48550/arxiv.2203.15455

arXiv: 2203.15455

Abstract

Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To further improve ASR performance and facilitate various production requirements, in this paper, we present WeNet 2.0 with four important updates. (1) We propose U2++, a unified two-pass framework with bidirectional attention decoders, which includes the future contextual information by a right-to-left attention decoder to improve the representative ability of the shared encoder and the performance during the rescoring stage. (2) We introduce an n-gram based language model and a WFST-based decoder into WeNet 2.0, promoting the use of rich text data in production scenarios. (3) We design a unified contextual biasing framework, which leverages user-specific context (e.g., contact lists) to provide rapid adaptation ability for production and improves ASR accuracy in both with-LM and without-LM scenarios. (4) We design a unified IO to support large-scale data for effective model training. In summary, the brand-new WeNet 2.0 achieves up to 10% relative recognition performance improvement over the original WeNet on various corpora and makes available several important production-oriented features.
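
The rescoring idea in item (1) can be illustrated with a minimal sketch. It is not taken from the paper or from the WeNet code base: the `Hypothesis` type, the `rescore` function, the weights `ctc_weight` and `reverse_weight`, and all scores are hypothetical, and the weighted sum is only one plausible way to combine a first-pass CTC score with left-to-right and right-to-left attention decoder scores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One n-best candidate with hypothetical log-probability scores."""
    text: str
    ctc_score: float  # first-pass CTC score
    l2r_score: float  # left-to-right attention decoder score
    r2l_score: float  # right-to-left attention decoder score


def rescore(hyps: List[Hypothesis],
            ctc_weight: float = 0.5,
            reverse_weight: float = 0.3) -> Hypothesis:
    """Return the candidate with the highest combined score.

    Assumed combination (illustrative only): the two decoder scores are
    interpolated with reverse_weight, then the weighted CTC score is added.
    """
    def combined(h: Hypothesis) -> float:
        attention = ((1.0 - reverse_weight) * h.l2r_score
                     + reverse_weight * h.r2l_score)
        return ctc_weight * h.ctc_score + attention

    return max(hyps, key=combined)


# Made-up candidates: the right-to-left decoder prefers the second one.
hyps = [
    Hypothesis("call john smith", ctc_score=-4.0, l2r_score=-3.0, r2l_score=-6.0),
    Hypothesis("call jon smith", ctc_score=-4.5, l2r_score=-3.5, r2l_score=-2.0),
]
best_bidirectional = rescore(hyps)
best_l2r_only = rescore(hyps, reverse_weight=0.0)
```

In this toy setup, setting `reverse_weight` to zero falls back to left-to-right rescoring only, which makes it easy to see how the right-to-left score can change the selected candidate.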

Related Organizations

Horizon Robotics (China)
China (People's Republic of)

Keywords

FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Related research (10 research products)

Similar documents:

- Analysis of the Big-Five personality traits in the Chatbot "UC - Paraguay" (2022)
- Book Reviews: The Inner Work of Leaders: Leadership as a Habit of Mind. Barbara Mackoff and Gary Wenet. New York, NY: Amacom, 2000. 226 pp., $24.95 hardcover (2001)
- Sur les peuples de nom «vénète» ou assimilé dans l’Occident européen (2003)
- WeNet: Weighted Networks for Recurrent Network Architecture Search (2019)
- A Near-Optimal Condition for Exact Sparse Recovery with Orthogonal Least Squares (2019)
- WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit (2021)
- TALCS: An open-source Mandarin-English code-switching corpus and a speech recognition baseline (2022)
- The Corn Inhibitor of Activated Hageman Factor: Purification and Properties of Two Recombinant Forms of the Protein (1998)
- Model evaluation of CO₂ reduction technologies in the Asia‐pacific region (1995)

Related to:

- wenet software on GitHub

Impact by BIP!

- selected citations: 37. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green

Fields of Science (4)

engineering and technology

electrical engineering, electronic engineering, information engineering
