Continuous speech recognition using articulatory data.

Conference object / Article, English, Open Access
Wrench, Alan A.; Richmond, Korin
  • Publisher: International Speech Communication Association

In this paper we show that there is measurable information in the articulatory system which can help to disambiguate the acoustic signal. We measure directly the movement of the lips, tongue, jaw, velum and larynx, and parameterise this articulatory feature space using principal component analysis. The parameterisation is developed and evaluated on a speaker-dependent phone recognition task using a specially recorded TIMIT corpus of 460 sentences. The results show that there is useful supplementary information contained in the articulatory data, which yields a small but significant improvement in phone recognition accuracy of 2%. However, preliminary attempts to estimate the articulatory data from the acoustic signal and use this to supplement the acoustic input have not yielded any significant improvement in phone accuracy.
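The core feature-processing step the abstract describes, reducing directly measured articulator trajectories to a compact set of principal components and appending them to the acoustic feature vector, can be sketched as follows. This is a minimal illustration only: the channel counts, component count, and random placeholder data are assumptions, not details from the paper.

```python
import numpy as np

def pca_parameterise(X, n_components):
    """Project articulatory channels onto their leading principal components.

    X: (frames, channels) matrix of articulator coordinates.
    Returns a (frames, n_components) parameterisation.
    """
    Xc = X - X.mean(axis=0)               # centre each channel
    # SVD of the centred data; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Illustrative frame-synchronous streams (random placeholders, not real data).
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(100, 13))      # e.g. 13 cepstral coefficients per frame
articulatory = rng.normal(size=(100, 14))  # e.g. 7 articulator points x 2 coordinates

art_pcs = pca_parameterise(articulatory, n_components=6)
combined = np.hstack([acoustic, art_pcs])  # supplementary feature vector
print(combined.shape)                      # (100, 19)
```

In a recogniser, the combined vectors would replace the acoustic-only observations presented to the phone models; the component count trades compactness against how much articulatory variance is retained.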
