Publication: Article, 01 Mar 2002, English. Publisher: Elsevier BV. Journal: Speech Communication, volume 36, pages 247-265 (ISSN: 0167-6393)

Authors: Wang, Wern-Jun; Liao, Yuan-Fu; Chen, Sin-Horng;

doi: 10.1016/s0167-6393(01)00006-1

RNN-based prosodic modeling for mandarin speech and its application to speech-to-text conversion

Abstract

A recurrent neural network (RNN) based prosodic modeling method for Mandarin speech-to-text conversion is proposed. The prosodic modeling is performed in the post-processing stage of acoustic decoding and aims at detecting word-boundary cues to assist linguistic decoding. It employs a simple three-layer RNN to learn the relationship between input prosodic features, extracted from the input utterance with syllable boundaries pre-determined by the preceding acoustic decoder, and output word-boundary information of the associated text. Once the RNN prosodic model is properly trained, it can be used to generate word-boundary cues that help the linguistic decoder resolve word-boundary ambiguity. Two schemes for using these word-boundary cues are proposed. Scheme 1 modifies the baseline scheme of the conventional linguistic decoding search by directly taking the RNN outputs as additional scores and adding them to all word-sequence hypotheses to assist in selecting the best recognized word sequence. Scheme 2 extends Scheme 1 by further using the RNN outputs to drive a finite state machine (FSM) that sets path constraints to restrict the linguistic decoding search. Character accuracy rates of 73.6%, 74.6% and 74.7% were obtained for the systems using the baseline scheme, Scheme 1 and Scheme 2, respectively. In addition, Scheme 2 reduced the computational complexity of the linguistic decoding search by 17%. The proposed prosodic modeling method is therefore promising for Mandarin speech recognition.
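The two schemes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the scoring formula, the score weight, or the FSM design, so everything below is an assumption made for illustration. In particular, the RNN output is assumed to be a per-syllable probability that a word boundary follows that syllable, the additional score is assumed to be a weighted sum of log-probabilities, and the FSM of Scheme 2 is replaced by a plain threshold check. The names `Hypothesis`, `prosodic_score`, `rescore` and `satisfies_cues` are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence, Set


class Hypothesis(NamedTuple):
    name: str                # label for display only
    baseline_score: float    # assumed log-score from the baseline linguistic decoder
    word_lengths: List[int]  # number of syllables in each word, in order


def boundary_positions(word_lengths: Sequence[int]) -> Set[int]:
    """Indices of the syllables that end a word in this segmentation."""
    ends, pos = set(), 0
    for n in word_lengths:
        pos += n
        ends.add(pos - 1)
    return ends


def prosodic_score(word_lengths: Sequence[int],
                   p_boundary: Sequence[float],
                   eps: float = 1e-6) -> float:
    """Log-probability that the boundary cues agree with a segmentation.

    p_boundary[i] is the assumed RNN output for syllable i: the
    probability that a word boundary follows it.
    """
    ends = boundary_positions(word_lengths)
    total = 0.0
    for i, p in enumerate(p_boundary):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += math.log(p if i in ends else 1.0 - p)
    return total


def rescore(hyps: Sequence[Hypothesis],
            p_boundary: Sequence[float],
            weight: float = 1.0) -> List[Hypothesis]:
    """Scheme 1 style: add the prosodic score to every hypothesis, best first."""
    return sorted(
        hyps,
        key=lambda h: h.baseline_score
        + weight * prosodic_score(h.word_lengths, p_boundary),
        reverse=True,
    )


def satisfies_cues(word_lengths: Sequence[int],
                   p_boundary: Sequence[float],
                   lo: float = 0.15,
                   hi: float = 0.85) -> bool:
    """Threshold stand-in for Scheme 2's FSM path constraints.

    Confident cues (p >= hi) require a boundary, confident non-cues
    (p <= lo) forbid one, and anything in between is unconstrained.
    """
    ends = boundary_positions(word_lengths)
    for i, p in enumerate(p_boundary):
        if p >= hi and i not in ends:
            return False
        if p <= lo and i in ends:
            return False
    return True


# Made-up example: a four-syllable utterance with two competing segmentations.
p_boundary = [0.1, 0.9, 0.2, 0.95]
hyps = [Hypothesis("1+3", -9.8, [1, 3]), Hypothesis("2+2", -10.0, [2, 2])]
ranked = rescore(hyps, p_boundary)
kept = [h for h in hyps if satisfies_cues(h.word_lengths, p_boundary)]
```

With these made-up numbers, the "1+3" segmentation has the better baseline score, but the "2+2" segmentation ranks first after the prosodic score is added, and it is the only hypothesis that survives the constraint check. Pruning hypotheses before they are scored is one plausible way a constraint mechanism could reduce search cost, though the abstract does not say how the reported 17% reduction was obtained.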

Related Organizations

National Chiao Tung University
Taiwan

Keywords

Computing methodologies and applications, Natural language processing, Pattern recognition, speech recognition, Learning and adaptive systems in artificial intelligence, recurrent neural network

Related research

Learning Multiscale Transformer Models for Sequence Generation (2022)
Word segmentation by alternating colors facilitates eye guidance in Chinese reading (2018)
Alternating-color words facilitate reading and eye movements among second-language learners of Chinese (2020)

Impact by BIP!

Selected citations: 15. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Fields of Science

medical and health sciences

other medical science

Related to Research communities

Digital Humanities and Cultural Heritage
