Powered by OpenAIRE graph
https://dr.ntu.edu.s...
https://doi.org/10.32657/10356...
Doctoral thesis . 2020 . Peer-reviewed
Data sources: Crossref
DBLP
Doctoral thesis
Data sources: DBLP

Feature-based robust techniques for speech recognition

Authors: Nguyen, Duc Hoang Ha;

Abstract

Automatic speech recognition (ASR) decodes speech signals into text. While ASR can recognise words accurately in clean environments, its accuracy degrades considerably under noisy conditions; the robustness of ASR systems in real-world applications therefore remains a challenge. This thesis studies speech feature enhancement and model adaptation for robust speech recognition and introduces three novel methods to improve performance.

The first work proposes a modification of the spectral subtraction method to reduce the non-stationary characteristics of additive noise in speech. The main idea is to first normalise the noise's characteristics towards a Gaussian noise model, and then tackle the remaining noise with a model compensation method. The strategy is to reduce the noise-handling problem to the back-end process. In this work, the back-end compensation is performed with the vector Taylor series (VTS) model compensation approach, and we call the method noise normalization VTS (NN-VTS).

The second work proposes an extension of particle filter compensation (PFC) to the large vocabulary continuous speech recognition (LVCSR) task. PFC is a method for tracking clean speech features that uses side information from hidden Markov models (HMMs) within the particle filter framework. However, under noisy conditions in sub-word based LVCSR, obtaining an accurately aligned HMM state sequence that describes the underlying clean speech features is challenging, because the total number of triphone models involved can be very large. To improve the identification of the correct phone sequence, this work proposes to use a noisy-model HMM, trained on noisy data, to estimate the state sequence, and a parallel clean-model HMM, trained on clean data, to generate the clean speech features. The two HMMs are trained jointly, and the alignment of states between the clean and noisy HMMs is obtained by the single pass retraining (SPR) technique. With this approach, the noisy-model HMM improves the accuracy of the state sequence estimate, and SPR provides the accurately aligned states. When the missing side information for PFC is available, a word error reduction of 28.46% compared with multi-condition training is observed for the Aurora-4 task.

The third work proposes a novel spectro-temporal (ST) transform framework to improve the word error rate in noisy and reverberant environments, motivated by the finding that human speech comprehension relies on both the spectral content and the temporal envelope of the speech signal. The framework adapts the features to minimize the mismatch between the input features and the training data using a Kullback-Leibler divergence based cost function. We examined two implementations to overcome the limited adaptation data issue. The first is a cross transform, which is a sparse spectro-temporal transform. The second is a cascade of a temporal transform and a spectral transform. Experiments are conducted on the REVERB Challenge 2014 task, where clean and multi-condition trained acoustic models are tested with real reverberant and noisy speech. The experimental results confirm that temporal information is important for reverberant speech recognition and that the simultaneous use of spectral and temporal information for feature adaptation is effective.

Doctor of Philosophy (SCE)
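
The abstract states that the spectro-temporal framework minimizes a Kullback-Leibler divergence based cost between the input features and the training data, but it does not give the form of that cost. As a minimal sketch only, assuming both sets of features are summarised by diagonal-covariance Gaussians (an assumption made here for illustration, not something the abstract confirms), the mismatch could be measured as below. The function name `kl_diag_gaussians` and all numbers are hypothetical.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """Return KL(p || q) for two diagonal-covariance Gaussians.

    Uses the closed form per dimension:
        0.5 * (log(var_q / var_p) + (var_p + (mean_p - mean_q)^2) / var_q - 1)
    and sums over dimensions.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        if vp <= 0.0 or vq <= 0.0:
            raise ValueError("variances must be positive")
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


# Hypothetical per-dimension statistics (illustrative, not from the thesis):
# p summarises the adapted input features, q summarises the training data.
adapted_mean, adapted_var = [0.0, 1.0], [1.0, 2.0]
train_mean, train_var = [1.0, 1.0], [1.0, 2.0]

mismatch = kl_diag_gaussians(adapted_mean, adapted_var, train_mean, train_var)
print(f"KL mismatch: {mismatch:.3f}")
```

In this sketch a feature transform would be judged by how far it lowers `mismatch`; the divergence is zero only when the two summaries coincide. The actual cost function and optimisation used in the thesis may differ.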

Country
Singapore
Keywords

DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition, 004, 620

  • BIP! impact indicators
    selected citations: 0
    popularity: Average
    influence: Average
    impulse: Average