publication . Preprint . 2018

Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks

Jalalifar, Seyed Ali; Hasani, Hosein; Aghajan, Hamid
Open Access . English
Published: 20 Mar 2018
We present a novel approach to generating photo-realistic images of a face with accurate lip sync from an audio input. Using a recurrent neural network, we obtain mouth landmarks from audio features. We then use conditional generative adversarial networks to produce highly realistic faces conditioned on a set of landmarks. Together, these two networks can produce a sequence of natural-looking faces in sync with an input audio track.
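The two-stage pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the audio features, the recurrent cell, or the number of landmarks, so the 13-dimensional (MFCC-like) feature frames, the plain Elman RNN with untrained random weights, the 20 mouth landmarks, the 64 x 64 mask size, and the helper names `AudioToLandmarkRNN` and `landmarks_to_mask` are all hypothetical. The sketch only shows the data flow: one landmark vector per audio frame, each rasterized into a binary image that a conditional GAN generator could take as its conditioning input.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    """Small random weights: an untrained stand-in for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for plain nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class AudioToLandmarkRNN:
    """Hypothetical Elman RNN mapping audio feature frames to mouth landmarks.

    Assumptions (not from the paper): tanh hidden state, sigmoid output so each
    (x, y) coordinate is normalized to the range (0, 1).
    """

    def __init__(self, feature_dim: int, hidden_dim: int, num_landmarks: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_xh = random_matrix(hidden_dim, feature_dim, rng)
        self.w_hh = random_matrix(hidden_dim, hidden_dim, rng)
        self.w_hy = random_matrix(2 * num_landmarks, hidden_dim, rng)

    def forward(self, frames: List[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in frames:
            # h_t = tanh(W_xh x_t + W_hh h_{t-1}): the state carries audio context over time
            pre = [a + b for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, h))]
            h = [math.tanh(p) for p in pre]
            # y_t = sigmoid(W_hy h_t): flattened (x, y) pairs for this frame
            outputs.append([1.0 / (1.0 + math.exp(-z)) for z in matvec(self.w_hy, h)])
        return outputs


def landmarks_to_mask(coords: Vector, size: int) -> List[List[int]]:
    """Rasterize normalized (x, y) pairs into a binary size x size conditioning image."""
    mask = [[0] * size for _ in range(size)]
    for i in range(0, len(coords), 2):
        col = min(int(coords[i] * size), size - 1)
        row = min(int(coords[i + 1] * size), size - 1)
        mask[row][col] = 1
    return mask


# Usage: 5 frames of random 13-dim "audio features" -> 5 landmark sets -> 5 masks.
rng = random.Random(42)
audio = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(5)]
model = AudioToLandmarkRNN(feature_dim=13, hidden_dim=16, num_landmarks=20)
landmarks = model.forward(audio)
masks = [landmarks_to_mask(frame, size=64) for frame in landmarks]
```

The generator itself is omitted. In a conditional GAN both the generator and the discriminator receive the conditioning input (here, the landmark mask), so the discriminator learns to judge whether a face is realistic and also consistent with the given landmarks.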
ACM Computing Classification System: Computing Methodologies: Image Processing and Computer Vision; General Literature: Miscellaneous
Free-text keywords: Computer Science - Computer Vision and Pattern Recognition