ZENODO
Article . 2023
License: CC BY NC
Data sources: ZENODO, Datacite

Python-Powered Speech-to-Text: A Comprehensive Survey and Performance Analysis

Authors: Aschin Dhakad; Shruti Singh


Abstract

Speech recognition, the technology that enables machines to convert spoken language into text, has been widely adopted in domains ranging from virtual assistants to transcription services. Python, with its versatile libraries and extensive community support, has become a common choice for developing speech recognition systems. This paper surveys the field, focusing on Python's role in automatic speech recognition (ASR). The survey begins with an overview of the growing importance of speech recognition technology and highlights Python's role in the development of ASR systems, citing its accessibility and integration capabilities as key strengths. The paper then covers the fundamental concepts of audio data preprocessing, feature extraction techniques such as Mel Frequency Cepstral Coefficients (MFCC), and diverse model architectures. In addition to the survey, the paper presents a performance analysis of Python-based speech recognition systems, evaluating their accuracy and efficiency, and explores practical considerations for performance evaluation, including evaluation metrics. References to sources including IBM Cloud, Google Cloud, and academic resources support the discussion and provide real-world insights. The paper concludes by underscoring Python's significance and adaptability in the field and its potential to shape the future of speech recognition. It is intended as a resource for researchers, developers, and enthusiasts who want to use Python for speech-to-text conversion.
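
The abstract refers to evaluation metrics for accuracy without naming them. As an illustration only, the sketch below assumes word error rate (WER), a metric commonly used for ASR systems: the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate of `hypothesis` against `reference`.

    Assumption for this sketch: words are separated by whitespace and
    compared case-sensitively, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"),
    # measured against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. A real evaluation would usually normalize case and punctuation before scoring.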

Keywords

Mel Frequency Cepstral Coefficients (MFCC), Data preprocessing, Fourier Transform, Feature extraction, Automatic speech recognition (ASR), Speech recognition, Speech-to-text conversion, Natural language processing (NLP)

  • Impact indicators (provided by BIP!)
    selected citations: 0 (derived from selected sources; an alternative to the "influence" indicator)
    popularity: Average (current attention in the research community, based on the underlying citation network)
    influence: Average (overall impact in the research community over time, based on the underlying citation network)
    impulse: Average (initial momentum directly after publication, based on the underlying citation network)