A Hidden Semi-Markov Model-Based Speech Synthesis System

Publication: Article, 01 May 2007, English
Publisher: Institute of Electronics, Information and Communications Engineers (IEICE)
Journal: IEICE Transactions on Information and Systems, volume E90-D, pages 825-834 (ISSN: 0916-8532, eISSN: 1745-1361)

Authors: Heiga Zen; Keiichi Tokuda; Takashi Masuko; Takao Kobayashi; Tadashi Kitamura

doi: 10.1093/ietisy/e90-d.5.825

Abstract

A statistical speech synthesis system based on the hidden Markov model (HMM) was recently proposed. In this system, spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech parameter vector sequences are generated from the HMMs themselves. This system defines a speech synthesis problem in a generative model framework and solves it based on the maximum likelihood (ML) criterion. However, there is an inconsistency: although state duration probability density functions (PDFs) are explicitly used in the synthesis part of the system, they have not been incorporated into its training part. This inconsistency can make the synthesized speech sound less natural. In this paper, we propose a statistical speech synthesis system based on a hidden semi-Markov model (HSMM), which can be viewed as an HMM with explicit state duration PDFs. The use of HSMMs can solve the above inconsistency because we can incorporate the state duration PDFs explicitly into both the synthesis and the training parts of the system. Subjective listening test results show that use of HSMMs improves the reported naturalness of synthesized speech.
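The difference between implicit and explicit state durations can be illustrated with a minimal sketch. In a standard HMM, the time spent in a state follows from its self-transition probability and is therefore geometric, so the most likely duration is always a single frame. An HSMM replaces this with an explicit duration PDF. The code below assumes a Gaussian duration PDF, discretized and renormalized over a finite range, purely for illustration. The function names, the 10-frame mean, the variance of 4, and the 200-frame truncation are all hypothetical values and are not taken from the paper.

```python
import math

MAX_FRAMES = 200  # truncation point for the illustration (assumption)

def hmm_duration_pmf(d, self_loop):
    """Duration probability implied by an HMM self-transition: geometric in d."""
    return self_loop ** (d - 1) * (1.0 - self_loop)

def hsmm_duration_pmf(d, mean, var, max_d=MAX_FRAMES):
    """Explicit duration PDF: a Gaussian discretized and renormalized over 1..max_d."""
    weights = [math.exp(-0.5 * (k - mean) ** 2 / var) for k in range(1, max_d + 1)]
    return weights[d - 1] / sum(weights)

# Give both models the same mean duration of 10 frames (hypothetical value).
mean_frames = 10.0
self_loop = 1.0 - 1.0 / mean_frames  # a geometric distribution with mean 10

durations = range(1, MAX_FRAMES + 1)
hmm_mode = max(durations, key=lambda d: hmm_duration_pmf(d, self_loop))
hsmm_mode = max(durations, key=lambda d: hsmm_duration_pmf(d, mean_frames, 4.0))

print(hmm_mode, hsmm_mode)
```

Although both distributions share the same mean, the geometric distribution peaks at one frame while the explicit distribution peaks at the mean. This shows why a model trained with implicit durations and then synthesized with explicit duration PDFs uses two different duration models, which is the inconsistency the abstract describes.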

Related Organizations

Institute of Science Tokyo
Japan

Related research

Modeling of various speaking styles and emotions for synthetic speech
2003, IsPartOf

Impact by BIP!

- citations: 156. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Top 10%. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Top 1%. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Top 10%. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.