ZENODO · Article · 2024 · License: CC BY
Data sources: ZENODO, Datacite
Versions: 2

Integrating Lip Dynamics into Visual Speech Framework

Authors: Soham Akhade; J. C. Musale; S. J. Nawale; Omkar Jadhav; Atharva Bhadale; Prathmesh Gaikwad

Abstract

Visual Speech Recognition (VSR) is a rapidly evolving field with diverse applications in human-computer interaction, accessibility, and security. This paper presents an innovative approach to VSR that focuses on the extraction and analysis of lip movements for speech recognition. Traditional speech recognition systems rely primarily on acoustic information, making them vulnerable to noisy environments and audio disturbances. In contrast, the proposed method leverages the visual modality by harnessing the rich information encoded in lip movements during speech production. The study begins by collecting a comprehensive dataset of visual and audio recordings of speech across various languages and contexts. A deep learning architecture is then designed to process the visual data, with an emphasis on lip movements, together with the corresponding audio data. The proposed model integrates convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to extract and fuse information from both modalities; this fusion mitigates the limitations of traditional audio-only speech recognition and improves robustness. We evaluate the visual-based speech recognition system on a range of benchmark datasets and real-world scenarios. The results demonstrate the efficacy of the approach, particularly its ability to improve recognition accuracy in noisy environments or when audio data is incomplete or unavailable. In conclusion, this research advances Visual Speech Recognition by introducing a novel approach centered on lip movement analysis. By leveraging both the audio and visual modalities, the proposed system provides a more robust and versatile solution for speech recognition, with the potential to enhance applications in human-computer interaction, accessibility, and security.
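The abstract describes the architecture only at a high level: a CNN extracts spatial features from lip regions, RNNs model each modality over time, and the two streams are fused for classification. The sketch below illustrates one plausible reading of that design in PyTorch; every class name, layer size, and input shape (LipReadingFusionNet, 64x64 grayscale mouth crops, 40 mel bands) is a hypothetical choice, not the authors' actual model.

```python
# A minimal sketch of a CNN + RNN audio-visual fusion model in PyTorch.
# All names, layer sizes, and hyperparameters are illustrative assumptions;
# the paper does not specify its exact architecture.
import torch
import torch.nn as nn


class LipReadingFusionNet(nn.Module):
    """Hypothetical CNN-RNN model fusing lip-region video and audio features."""

    def __init__(self, num_classes: int = 40, hidden: int = 256):
        super().__init__()
        # CNN: per-frame spatial features from cropped lip regions
        # (input: B x T x 1 x 64 x 64 grayscale mouth crops).
        self.visual_cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.AdaptiveAvgPool2d(1),              # (B*T, 64, 1, 1)
            nn.Flatten(),                         # (B*T, 64)
        )
        # RNNs: temporal modelling of each modality.
        self.visual_rnn = nn.GRU(64, hidden, batch_first=True)
        self.audio_rnn = nn.GRU(40, hidden, batch_first=True)  # e.g. 40 mel bands
        # Fusion: concatenate the final hidden states, then classify.
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, frames: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        b, t = frames.shape[:2]
        feats = self.visual_cnn(frames.flatten(0, 1))   # (B*T, 64)
        feats = feats.view(b, t, -1)                    # (B, T, 64)
        _, h_vis = self.visual_rnn(feats)               # (1, B, H)
        _, h_aud = self.audio_rnn(audio)                # (1, B, H)
        fused = torch.cat([h_vis[-1], h_aud[-1]], dim=-1)
        return self.classifier(fused)                   # (B, num_classes)


# Example: a batch of 2 clips, 25 video frames and 100 audio frames each.
model = LipReadingFusionNet()
video = torch.randn(2, 25, 1, 64, 64)    # grayscale mouth crops
audio = torch.randn(2, 100, 40)          # log-mel spectrogram frames
logits = model(video, audio)
print(logits.shape)  # torch.Size([2, 40])
```

Late fusion by concatenating hidden states is only one option; the paper's claim that fusion "mitigates the limitations of traditional audio-only speech recognition" would hold equally under attention-based or mid-level feature fusion.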

Keywords

Visual Speech Recognition, Lip Movement Analysis, Multimodal Speech Recognition, Deep Learning, Convolutional Neural Networks (CNN)

BIP! impact indicators: 0 selected citations; popularity, influence, and impulse all rated Average.