Soft context clustering for F0 modeling in HMM-based speech synthesis

Publication: Article, 09 Jan 2015, United Kingdom, English

Publisher: Springer Science and Business Media LLC

Journal: EURASIP Journal on Advances in Signal Processing, volume 2015 (eISSN: 1687-6180)

Funded by: EC | SIMPLE4ALL

Authors: Soheil Khorram; Hossein Sameti; Simon King

doi: 10.1186/1687-6180-2015-2

handle: 20.500.11820/bbb2cf00-dc35-41d4-81ba-8e673108ce3f

Abstract

This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional ‘hard’ decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this ‘divide-and-conquer’ approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. 
In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
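
The soft clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid gating function, the numeric context vector, and all names (`Leaf`, `SoftNode`, `leaf_memberships`, `predict_mean`) are assumptions made for illustration, and the leaf values stand in for hypothetical log-F0 means. Each internal node passes a context to both children with complementary membership degrees, so a context belongs to every leaf with a degree equal to the product of the gate values along the path, and the predicted mean is the membership-weighted sum of the leaf means.

```python
import math


def sigmoid(x):
    """Logistic function, used here as an assumed membership function."""
    return 1.0 / (1.0 + math.exp(-x))


class Leaf:
    """Leaf holding one parameter (a hypothetical log-F0 mean)."""

    def __init__(self, mean):
        self.mean = mean


class SoftNode:
    """Internal node that sends a context to BOTH children.

    The left child receives degree g(context) and the right child
    receives 1 - g(context), where g is a sigmoid of a linear score.
    """

    def __init__(self, weights, bias, left, right):
        self.weights = weights
        self.bias = bias
        self.left = left
        self.right = right


def leaf_memberships(node, context, degree=1.0):
    """Return (leaf, membership) pairs; memberships sum to `degree`."""
    if isinstance(node, Leaf):
        return [(node, degree)]
    score = sum(w * c for w, c in zip(node.weights, context)) + node.bias
    g = sigmoid(score)
    return (leaf_memberships(node.left, context, degree * g)
            + leaf_memberships(node.right, context, degree * (1.0 - g)))


def predict_mean(root, context):
    """Membership-weighted combination of the leaf means."""
    return sum(m * leaf.mean for leaf, m in leaf_memberships(root, context))


# A tiny tree over a one-dimensional context feature (hypothetical values).
soft_tree = SoftNode([0.0], 0.0, Leaf(4.8), Leaf(5.2))    # undecided gate
sharp_tree = SoftNode([50.0], 0.0, Leaf(4.8), Leaf(5.2))  # nearly hard gate

print(predict_mean(soft_tree, [1.0]))    # both leaves contribute equally
print(predict_mean(sharp_tree, [1.0]))   # dominated by the left leaf
print(predict_mean(sharp_tree, [-1.0]))  # dominated by the right leaf
```

With a very steep gate the sketch behaves like a conventional hard tree, in which each context falls into a single leaf; with a shallow gate several overlapping leaves contribute to the prediction, which is the property the abstract credits for better generalization to unseen contexts.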

Country

United Kingdom

Related Organizations

University of Edinburgh
United Kingdom
Sharif University of Technology (Sharif Policy Research Institute)
Iran (Islamic Republic of)

Keywords

Hidden Markov model, HMM-based speech synthesis, F0 modeling, soft decision tree, maximum entropy model, statistical parametric speech synthesis, context clustering, decision tree-based clustering, HMM, soft context clustering

Impact by BIP!

selected citations: 4
popularity: Average
influence: Average
impulse: Average