
doi: 10.1121/1.4781430
Speech production generates not only acoustic signals but also visible movements of the face, especially of the jaw, lips, teeth, and tongue. To reproduce realistic facial images during Japanese speech, a three-dimensional computer graphics model of Japanese visemes was built. The visemes were extracted from a video database of the lip region during speech, captured by two high-speed cameras (up to 300 frames per second) [M. J. Hirayama, Proc. ICPhS 2003 (2003), pp. 3157–3161]. Most visemes, namely those for the five vowels, the semivowels, and some consonants, are modeled as static shapes. For stops articulated with the lips (/p/, /b/, /m/) or the tongue (/t/, /d/, /n/), dynamic information, that is, multiple shapes together with their timing, was assigned to each viseme. These visemes were placed on a time axis at the phoneme timings of a sentence, the shapes in between were interpolated with a spline technique applied to the motion graphs of the speech articulators, and the resulting computer graphics animation was rendered by ray tracing. [Part of this work was supported by the Japan MEXT Academic Frontier Project.]
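The keyframing step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the viseme table, its three shape parameters (jaw opening, lip opening, lip width), the time offsets of the dynamic /p/ viseme, and all function names are hypothetical. The abstract does not specify which spline was used, so a Catmull-Rom-style cubic Hermite spline is substituted here. Rendering (ray tracing) is not covered.

```python
from bisect import bisect_right

# Hypothetical viseme table: each viseme is a list of (time offset in seconds
# relative to the phoneme onset, shape vector). Shape vectors are illustrative
# (jaw opening, lip opening, lip width), NOT measured values from the study.
VISEMES = {
    "a": [(0.0, (0.8, 0.9, 0.5))],     # static vowel viseme: one shape
    "i": [(0.0, (0.3, 0.3, 0.9))],     # static vowel viseme: one shape
    "p": [(-0.03, (0.2, 0.0, 0.5)),    # dynamic viseme: lip closure ...
          (0.0, (0.25, 0.0, 0.5)),     # ... held at the phoneme onset ...
          (0.02, (0.35, 0.3, 0.5))],   # ... then the release
}

def build_keyframes(phonemes):
    """Place each viseme's shape(s) on the time axis at its phoneme onset."""
    keys = []
    for label, onset in phonemes:
        for offset, shape in VISEMES[label]:
            keys.append((onset + offset, shape))
    keys.sort(key=lambda k: k[0])
    return keys

def tangents(keys):
    """Catmull-Rom-style tangents for non-uniform key times (assumption).
    Requires at least two keys with strictly increasing times."""
    n = len(keys)
    ms = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)  # one-sided at the ends
        dt = keys[hi][0] - keys[lo][0]
        ms.append(tuple((b - a) / dt for a, b in zip(keys[lo][1], keys[hi][1])))
    return ms

def evaluate(keys, ms, t):
    """Cubic Hermite interpolation of the shape vector at time t."""
    ts = [k[0] for k in keys]
    if t <= ts[0]:
        return keys[0][1]
    if t >= ts[-1]:
        return keys[-1][1]
    k = bisect_right(ts, t) - 1          # interval [ts[k], ts[k + 1]) holds t
    (t0, p0), (t1, p1) = keys[k], keys[k + 1]
    h = t1 - t0
    s = (t - t0) / h
    h00 = 2 * s**3 - 3 * s**2 + 1        # Hermite basis functions
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return tuple(h00 * a + h10 * h * ma + h01 * b + h11 * h * mb
                 for a, b, ma, mb in zip(p0, p1, ms[k], ms[k + 1]))

def animate(phonemes, fps=30):
    """Sample the interpolated shapes at a fixed frame rate."""
    keys = build_keyframes(phonemes)
    ms = tangents(keys)
    start, end = keys[0][0], keys[-1][0]
    n_frames = round((end - start) * fps) + 1
    return [evaluate(keys, ms, start + i / fps) for i in range(n_frames)]

frames = animate([("a", 0.0), ("p", 0.2), ("i", 0.4)])
```

Each output frame is one shape vector that a 3D face model would then be deformed to match. A Hermite spline passes exactly through every keyframe, so the static and dynamic viseme shapes are reproduced at their assigned times, while the tangents give smooth transitions between them.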
