UNSWorks
Doctoral thesis . 2012
License: CC BY NC ND
https://dx.doi.org/10.26190/un...
Data sources: Datacite
DBLP
Doctoral thesis . 2024
Data sources: DBLP

Improving automatic speaker verification using front-end and back-end diversity

Authors: Kua, Jia Min Karen;

Abstract

Technologies that exploit biometrics can potentially be applied to the identification and verification of individuals for controlling access to secured areas or materials. Among these technologies, automatic speaker verification systems are of growing interest, as they are the least invasive and allow recognition via any type of communication network over long distances. The overall goal of this thesis is to improve the performance of automatic speaker verification systems by investigating novel features and classification methods that complement current state-of-the-art systems. At the feature level, novel log-compressed least squares group delay features and spectral centroid features are proposed. The log-compression and least squares regularisation are shown to reduce the dynamic range of modified group delay features and to outperform other existing group delay extraction methods. The proposed spectral centroid features provide a better characterisation of the spectral energy distribution, and experimental results show that this detailed spectral characterisation significantly improves performance. A diverse front-end involving multiple features would improve both phonetic (acoustic) and speaker modelling. In this regard, the relative contributions of the acoustic and speaker modelling ‘stages’ to speaker recognition performance across different features are investigated. The investigation, conducted using clustering comparison measures, suggests that front-end diversity, and hence improved performance from fused systems, can be achieved purely through different ‘partitioning’ of the acoustic space. Building on this finding, a novel universal background model (UBM) data/utterance selection algorithm that increases the stability of the acoustic modelling is proposed.
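As a rough illustration of the spectral centroid idea mentioned above, the sketch below computes one centroid frequency per subband for a single speech frame. It is not the thesis's extraction method: the function name `subband_spectral_centroids`, the uniform rectangular subbands, the Hamming window and the default of four bands are all assumptions chosen for brevity.

```python
import numpy as np


def subband_spectral_centroids(frame, sample_rate, n_bands=4):
    """Return one magnitude-weighted centroid frequency (Hz) per subband.

    Assumptions (not from the thesis): uniform rectangular subbands and a
    Hamming analysis window.
    """
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Split the FFT bins into n_bands contiguous groups.
    edges = np.linspace(0, len(freqs), n_bands + 1, dtype=int)
    centroids = np.zeros(n_bands)
    for k in range(n_bands):
        lo, hi = edges[k], edges[k + 1]
        band_mag = magnitude[lo:hi]
        band_freq = freqs[lo:hi]
        total = band_mag.sum()
        if total > 0:
            # Centroid = magnitude-weighted mean frequency in the band.
            centroids[k] = (band_freq * band_mag).sum() / total
        else:
            # Empty band: fall back to the band's mid frequency.
            centroids[k] = band_freq.mean()
    return centroids


# Usage: a 1500 Hz tone sampled at 8 kHz, one 256-sample frame.
t = np.arange(256) / 8000.0
tone = np.sin(2 * np.pi * 1500.0 * t)
centroids = subband_spectral_centroids(tone, sample_rate=8000)
```

For the tone in the usage lines, the centroid of the subband containing 1500 Hz sits close to the tone frequency, which is the sense in which centroids track where energy is concentrated within each band.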
Finally, at the classification level, the use of sparse representation classification (SRC) with Gaussian mixture model supervectors (GMM-SRC) is proposed and is found to perform comparably to Gaussian mixture model-support vector machines (GMM-SVM). However, GMM-SRC results in a slower verification process. To increase computational efficiency, the high-dimensional supervectors are replaced with speaker factors, resulting in joint factor analysis-sparse representation classification (JFA-SRC). In addition, a novel dictionary composition technique is developed to further improve computational efficiency. Results demonstrate that the refined dictionary provides performance comparable to that of the complete dataset and generalises well to evaluation on other databases. Notably, a detailed comparison of the proposed JFA-SRC against various state-of-the-art classifiers on the NIST 2010 databases showed that JFA-SRC achieved the best minimum detection cost function (minDCF), highlighting the usefulness of SRC-based systems.
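To make the SRC idea concrete, here is a minimal sketch of sparse representation scoring for verification. It assumes a dictionary whose columns are unit-norm speaker vectors (for example speaker factors), a plain ISTA solver for the l1-regularised least-squares problem, and an l1-norm ratio as the score. The solver choice, the scoring rule, the value of `lam` and the names `ista_lasso` and `src_score` are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np


def ista_lasso(D, y, lam=0.05, n_iter=500):
    """Minimise 0.5 * ||y - D x||^2 + lam * ||x||_1 with ISTA."""
    # Step size 1/L, where L is the squared spectral norm of D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))
        # Soft-thresholding keeps the coefficient vector sparse.
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return x


def src_score(D, target_cols, y, lam=0.05):
    """Share of l1 coefficient mass on the claimed speaker's columns.

    Hypothetical scoring rule: values near 1 support the claim,
    values near 0 reject it.
    """
    x = ista_lasso(D, y, lam)
    total = np.abs(x).sum()
    if total == 0.0:
        return 0.0
    return float(np.abs(x[target_cols]).sum() / total)


# Usage with synthetic data: 10 unit-norm columns of dimension 20,
# the first two belonging to the claimed (target) speaker.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 10))
D /= np.linalg.norm(D, axis=0)
target_cols = [0, 1]

score_genuine = src_score(D, target_cols, D[:, 0])   # test vector from target
score_impostor = src_score(D, target_cols, D[:, 5])  # test vector from background
```

Because the cost of the solver grows with the dimension of the vectors and the number of dictionary columns, this toy setup also hints at why shorter speaker factors and a smaller, refined dictionary would speed up verification.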

Country
Australia
Keywords

Spectral centroid features, Normalised information distance, Automatic Speaker Verification, Group delay features, Sparse representation classification, 004, 620
