Advanced search in Research products: Field of Science [Beta] is speech-language pathology & audiology

31,030 Research products, page 1 of 621

  • Authors: 
    Su Zhu; Ouyu Lan; Kai Yu;
    Publisher: IEEE

    Robustness to errors produced by automatic speech recognition (ASR) is essential for Spoken Language Understanding (SLU). Traditional robust SLU typically needs ASR hypotheses with semantic annotations for training. However, semantic annotation is very expensive, and th...

  • Authors: 
    Babafemi O. Odelowo; David V. Anderson;
    Publisher: IEEE

    In this paper, we propose a method for post-processing of deep neural network (DNN) enhanced speech. The method, which is simple and does not require additional training or expansion of the feature or target vectors, can be viewed as a mask-based approach in which a noi...

  • Publication . Part of book or chapter of book . 2019
    Open Access
    Authors: 
    Duque, Andréa B.; Santos, Luã Lázaro J.; Macêdo, David; Zanchettin, Cleber;
    Publisher: Springer International Publishing

    Embedding artificial intelligence on constrained platforms has become a trend since the growth of embedded systems and mobile devices experienced in recent years. Although constrained platforms do not have enough processing capabilities to train a sophisticated deep l...

  • Open Access
    Authors: 
    Kai Sun; Su Zhu; Lu Chen; Siqiu Yao; Xueyang Wu; Kai Yu;
    Publisher: ISCA
  • Publication . Article . 2018
    Open Access
    Authors: 
    Fatemeh Pishdadian; Bryan Pardo;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    The multi-resolution common fate transform (MCFT) is an audio signal representation useful for representing mixtures of multiple audio signals that overlap in both time and frequency. The MCFT combines the invertibility of a state-of-the-art representation, the common f...

  • Publication . Article . 2019
    Open Access
    Authors: 
    Wong, Jeremy Heng Meng; Gales, Mark John Francis; Wang, Yu;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    In automatic speech recognition, performance gains can often be obtained by combining an ensemble of multiple models. However, this can be computationally expensive when performing recognition. Teacher–student learning alleviates this cost by training a single student m...

  • Authors: 
    Weiqing Wang; Jin Pan; Hua Yi; Zhanmei Song; Ming Li;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    In this paper, we propose two different audio-based piano performance evaluation systems for beginners. The first is a sequential and modularized system, including three steps: Convolutional Neural Network (CNN)-based acoustic feature extraction, matching via dynamic ti...

  • Authors: 
    Jianyu Zheng; Kun Ma; Xuemei Tang; Shichen Liang;
    Publisher: IEEE

    The translation activity involves both the source language and the target language. Compared to the standard texts in the two languages, translated texts show unique language characteristics. In order to explore them from the perspective of integrality and complexity, we...

  • Authors: 
    Anamika Baishya; Priyatam Kumar;
    Publisher: IEEE

    This paper presents an improved speech enhancement technique based on wavelet transform along with excitation-based classification of speech to eliminate noise from speech signals. The method initially classifies the speech into voiced, unvoiced and silence regions on t...

  • Authors: 
    Masahito Togami; Ryoichi Takashima; Yusuke Fujita;
    Publisher: IEEE

    In this paper, we propose a novel method to solve the permutation problem for multi-channel frequency-domain blind source separation problems. To address the problem of low spectral correlation between lower frequencies and higher frequencies, the proposed method utilizes phase diffe...

  • Open Access
    Authors: 
    Helen Chilton; Connie Mayer; Wendy McCracken;
    Publisher: Oxford University Press (OUP)
    Country: United Kingdom

    The link between Theory of Mind (ToM) and literacy is increasingly being recognized in the literature. However, the focus to date has concentrated on the connections between reading and ToM, with an emphasis on the ways in which ToM is implicated in making inferences fr...

  • Publication . Conference object . 2018
    Authors: 
    Peter Kubinec; Oldrich Ondracek; Miroslav Hagara; Adam Fibich; Tomas Bagala;
    Publisher: IEEE

    In this paper, a straightforward calculation of the comb filter based digital reverberator structure is given. It is based on geometrical properties of the closed room and also source and listener positions. Three possible groups of the sound wave propagation directions...

  • Closed Access
    Authors: 
    Nacereddine Hammami; Isah A. Lawal; Mouldi Bedda; Nadir Farah;
    Publisher: Springer Science and Business Media LLC

    The accurate and automatic recognition of speech sound errors in children is crucial to facilitate the early detection and correction of any faulty phonological process in their early life. This paper addresses the problem of speech sound error classification in native ...

  • Publication . Part of book or chapter of book . 2020
    Closed Access
    Authors: 
    Zihan Xu; Zhi Chen; Lu Chen; Su Zhu; Kai Yu;
    Publisher: Springer International Publishing

    In a task-oriented dialogue system, the dialogue state tracker aims to generate a structured summary (domain-slot-value triples) over the whole dialogue utterance. However, existing approaches generally fail to make good use of pre-defined ontologies. In this paper, we ...

  • Closed Access
    Authors: 
    Wang, Zhongqing; Sun, Qingying; Li, Shoushan; Zhu, Qiaoming; Zhou, Guodong;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    Stance detection aims to assign a stance label (i.e., favor or against) to a post towards a specific target. Recently, there has been growing interest in adopting neural models to detect the stance of a document. However, most of these works focus on modeling the sequence of w...

  • Closed Access
    Authors: 
    Zhou, Mantong; Huang, Minlie; Zhu, Xiaoyan;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    Story comprehension ability is a strong indicator of natural language understanding. Recently, the Story Cloze Test has been introduced as a new task of machine reading comprehension, i.e., selecting a correct ending from two candidate endings given a four-sentence s...

  • Closed Access
    Authors: 
    Wang, Rui; Chen, Zhe; Yin, Fuliang;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    Acoustic sensor networks (ASNs) are widely applied in scenarios like teleconference, teaching, and theatre. ASNs can be used in tracking speakers, enhancing the speaker's speech and human–machine interactions, etc., but the geometric structure of the ASN has to be calib...

  • Open Access
    Authors: 
    M. Płonkowski; P. Urbanovich;
    Publisher: Wydawnictwo SIGMA-NOT, sp. z.o.o.
    Country: Belarus

    In this article, the authors propose a hybrid system in which the full covariance matrix is used only at the initial stage of learning. At the further stage of learning, the amount of covariance matrix increases significantly, which, combined with rounding errors, causes pr...

  • Open Access
    Authors: 
    Sudarsana Reddy Kadiri; B. Yegnanarayana;
    Publisher: ISCA
  • Authors: 
    Yingke Zhao; Jacob Benesty; Jingdong Chen;
    Publisher: IEEE

    This paper develops a single-channel noise reduction algorithm in the short-time Fourier transform (STFT) domain, which attempts to optimize the fullband output signal-to-noise ratio (SNR). We show that the conventional Wiener filter, the maximum SNR filter, and the ide...

  • Authors: 
    Hongyu Liu; Shumin Shi; Heyan Huang;
    Publisher: IEEE

    Multi-document machine reading comprehension (MRC) has two characteristics compared with traditional MRC: 1) many documents are irrelevant to the question; 2) the answer is relatively longer. However, in existing models, not only key ranking metrics at dif...

  • Closed Access
    Authors: 
    V. Srinivasarao; Umesh Ghanekar;
    Publisher: Springer Science and Business Media LLC

    Speech enhancement primarily focuses on improving the intelligibility and quality of the speech signal by using various algorithms and techniques. Processing of a speech signal refers to applying efficient mechanisms to reduce noise in the way of extracting the intended...

  • Publication . Conference object . 2020
    Authors: 
    Yumeto Inaoka; Kazuhide Yamamoto;
    Publisher: IEEE

    We construct a Japanese grammatical simplification corpus and establish automatic simplification methods. We compare the conventional machine translation approach, our proposed method, and a hybrid method by automatic and manual evaluation. The results of the automati...

  • Open Access English
    Authors: 
    Yu, Yi; He, Hongsen; Chen, Badong; Li, Jianghui; Zhang, Youwen; Lu, Lu;
    Project: EC | STEMM-CCS (654462)

    This article studies the mean and mean-square behaviors of the M-estimate based normalized subband adaptive filter algorithm (M-NSAF) with robustness against impulsive noise. Based on the contaminated-Gaussian noise model, the stability condition, transient and steady-s...

  • Authors: 
    Peiqi Liu; Sheng-hua Zhong; Zhong Ming; Yan Liu;
    Publisher: IEEE

    Dialogue response generation is one of the hot topics in natural language processing, but there is still a long way to go before systems can generate human-like dialogues. A good evaluation method will help narrow the gap between the machine and human in dialogue generat...

  • Authors: 
    Suliang Bu; Yunxin Zhao; Mei-Yuh Hwang; Sining Sun;
    Publisher: IEEE

    We propose a robust nonlinear microphone array postfilter for noise reduction. This postfilter is formulated as a function of noise power ratio before and after beamforming and a local speech-to-observation power ratio. The two ratios are readily obtained during beamfor...

  • Authors: 
    Qing Wang; Xiumei Wang; Weiping Liu; Guannan Chen;
    Publisher: IEEE

    Prosody integrates the formal beauty and the content beauty of Chinese classical poetry. In this study, we propose a prosodic structure prediction method based on a pretrained language representation model. Based on the pretrained language represen...

  • Closed Access
    Authors: 
    Hamidi, Mohamed; Satori, Hassan; Zealouk, Ouissam; Satori, Khalid;
    Publisher: Springer Science and Business Media LLC

    This paper describes the performance of Amazigh speech recognition via an interactive voice response in noisy conditions. The experiments were first conducted for the uncoded speech and then repeated for decoded speech in a noisy environment for different signal noise r...

  • Closed Access
    Authors: 
    Chen, Kehai; Wang, Rui; Utiyama, Masao; Sumita, Eiichiro; Zhao, Tiejun;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    Traditional neural machine translation (NMT) methods use the word-level context to predict target language translation while neglecting the sentence-level context, which has been shown to be beneficial for translation prediction in statistical machine translation. This pa...

  • Open Access
    Authors: 
    Jian Hao; Chunsha Wu;
    Publisher: Oxford University Press (OUP)

    The present study examined deaf children's moral development with experimental tasks. Experiment 1 investigated lying and sharing behavior in 37 six- to 11-year-old deaf children, 39 age-matched hearing children and 33 twelve- to 16-year-old deaf adolescents who were ma...

  • Authors: 
    Ali Sarafnia; M. Omair Ahmad; Mallappa Kumara Swamy;
    Publisher: IEEE

    For differential microphone arrays, most of the performance evaluation measures that are used in the context of noise reduction are based on the energy of the signal. In this paper, we propose a spectral entropy-based measure, which quantifies the ratio of the spectral ...

  • Authors: 
    Riheng Wu; Jaya Mongh; Rigen Mo;
    Publisher: IEEE

    Syntactic parsing has played an important role in Natural Language Processing (NLP). A characteristic of Traditional Mongolian is that “the predicate is generally at the end of the sentence, the other constituent can change location but the meaning of the sentence is not c...

  • Authors: 
    Rui Cai; Wei Wei; Jinsong Zhang;
    Publisher: IEEE

    The scoring feedback of Computer Assisted Pronunciation Training (CAPT) systems facilitates learners’ instant awareness of their problems and readily leads to more practice. But whether it is enough to teach learners how to correct their errors is still ...

  • Open Access
    Authors: 
    Ojima, Yuta; Nakamura, Eita; Itoyama, Katsutoshi; Yoshii, Kazuyoshi;
    Publisher: Cambridge University Press (CUP)

    This paper describes automatic music transcription with chord estimation for music audio signals. We focus on the fact that concurrent structures of musical notes such as chords form the basis of harmony and are considered for music composition. Since chords and musical...

  • Authors: 
    Todd K. Moon; Jacob H. Gunther;
    Publisher: IEEE

    This paper describes a new algorithm for acoustic echo cancellation during doubletalk or, more precisely, acoustic echo separation, based on blind source separation (BSS) of convolutively mixed signals. The signal model assumes independence between sources, but temporal...

  • Publication . Part of book or chapter of book . 2019
    Closed Access
    Authors: 
    Yihe Pang; Jie Liu; Jianshe Zhou; Kai Zhang;
    Publisher: Springer International Publishing

    Paragraph coherence detection evaluates the semantic correlation and structural coherence between paragraphs in a text. Most previous studies are based on English text, and only a few studies are based on Chinese composition. The Chinese compositions emphasize ...

  • Open Access English
    Authors: 
    Xiaona Xu; Li Yang; Yue Zhao; Hui Wang;
    Publisher: Hindawi

    Research on Tibetan speech synthesis technology has mainly focused on a single dialect, and thus there is a lack of research on Tibetan multidialect speech synthesis technology. This paper presents an end-to-end Tibetan multidialect speech synthesis model to rea...

  • Closed Access
    Authors: 
    Tanja Schultz; Thomas Hueber; Dean J. Krusienski; Jonathan S. Brumberg;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Country: France

    The papers in this special section focus on biosignal-based spoken communication. Speech production is a complex process resulting from human activities initiated in the brain, eventually leading to muscle activities that produce respiratory, laryngeal, and articulatory...

  • Open Access
    Authors: 
    Cheng Yu; Ryandhimas E. Zezario; Syu-Siang Wang; Jonathan Sherman; Yi-Yen Hsieh; Xugang Lu; Hsin-Min Wang; Yu Tsao;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    Deep learning-based models have greatly advanced the performance of speech enhancement (SE) systems. However, two problems remain unsolved, which are closely related to model generalizability to noisy conditions: (1) mismatched noisy condition during testing, i.e., the ...

  • Publication . Conference object . Article . Preprint . 2019
    Open Access English
    Authors: 
    Francesc Lluís; Jordi Pons; Xavier Serra;
    Country: Spain

    Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase. To avoid omitting potentially useful information, we study the viability of using end-to-end models...

  • Authors: 
    Sanaz Saki Norouzi; Ahmad Akbari; Babak Nasersharif;
    Publisher: IEEE

    In recent years, neural networks have been widely used for language modeling in different tasks of natural language processing. Results show that long short-term memory (LSTM) neural networks are appropriate for language modeling due to their ability to process long seq...

  • Authors: 
    Sufeng Duan; Hai Zhao; Junru Zhou; Rui Wang;
    Publisher: IEEE

    Syntax has been shown to be a helpful clue in various natural language processing tasks including previous statistical machine translation and recurrent neural network based machine translation. However, since the state-of-the-art neural machine translation (NMT) has to be bu...

  • Closed Access
    Authors: 
    Samba Raju Chiluveru; Manoj Tripathy;
    Publisher: Springer Science and Business Media LLC

    In low Signal-to-Noise Ratio environments, phase information is an important factor, and therefore this article considers the importance of clean phase in single-channel speech enhancement. The proposed method uses Deep Neural Network based regression mode...

  • Authors: 
    Kang Hyun Lee; Woo Hyun Kang; Tae Gyoon Kang; Nam Soo Kim;
    Publisher: IEEE

    Since the introduction of deep neural network (DNN)-based acoustic models, robust automatic speech recognition using DNNs has been an active research topic. Especially in model adaptation, utilizing auxiliary context features is known to be a promising technique. Recen...

  • Open Access English
    Authors: 
    Haryo Akbarianto Wibowo; Tatag Aziz Prawiro; Muhammad Ihsan; Alham Fikri Aji; Radityo Eko Prasojo; Rahmad Mahendra; Suci Fitriany;

    In its daily use, the Indonesian language is riddled with informality, that is, deviations from the standard in terms of vocabulary, spelling, and word order. On the other hand, currently available Indonesian NLP models are typically developed with the standard Indonesian...

  • Closed Access
    Authors: 
    Li, Wei; Chen, Nancy F.; Siniscalchi, Sabato Marco; Lee, Chin-Hui;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    We investigate the effectiveness of soft-target tone labels and sequential context information for mispronunciation detection of Mandarin lexical tones pronounced by second language (L2) learners whose first language (L1) is of European origin. In conventional approache...

  • Publication . Conference object . Other literature type . 2017
    Open Access
    Authors: 
    Luka Kraljević; Mladen Russo; Mia Mlikota; Matko Šarić;
    Country: Croatia

    Listening to music often evokes strong emotions. With the rapid growth of easily accessible digital music libraries, there is an increasing need for reliable music emotion recognition systems. Common musical features like tempo, mode, pitch, clarity, etc. which can be eas...

  • Closed Access
    Authors: 
    Wang, Yijun; Xia, Yingce; Zhao, Li; Bian, Jiang; Qin, Tao; Chen, Enhong; Liu, Tie-Yan;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    Neural machine translation (NMT) heavily relies on parallel bilingual corpora for training. Since large-scale, high-quality parallel corpora are usually costly to collect, it is appealing to exploit monolingual corpora to improve NMT. Inspired by the law of total probab...

  • Open Access
    Authors: 
    Kukanov, Ivan; Trong, Trung Ngo; Hautamaki, Ville; Siniscalchi, Sabato Marco; Salerno, Valerio Mario; Lee, Kong Aik;
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Project: AKA | Deep reinforcement learni... (313970)

    Bottleneck features (BNFs) generated with a deep neural network (DNN) have proven to boost spoken language recognition accuracy over basic spectral features significantly. However, BNFs are commonly extracted using language-dependent tied-context phone states as learnin...

  • Authors: 
    Chen Guoqiang; Zhang Yuying; Tang De-you;
    Publisher: IEEE

    Noise classification is a global extreme value solution problem for complex nonlinear functions. Based on SAMME and BP Neural Network, this paper proposes a noise classification algorithm, SAMME-NN. In SAMME-NN, multiple BP-NN weak classifiers are combined into a strong ...