Image-Based Synthesis for Deep 3D Human Pose Estimation

Publication: Article, Preprint

Date: 19 Mar 2018 (embargo end date: 01 Jan 2018)

Country: France

Language: English

Publisher: Springer Science and Business Media LLC

Journal: International Journal of Computer Vision, volume 126, pages 993-1008 (ISSN: 0920-5691, eISSN: 1573-1405)

Funded by: EC | EGOVISION4HEALTH, EC | ALLEGRO

Authors: Grégory Rogez; Cordelia Schmid

doi: 10.1007/s11263-018-1071-9, 10.48550/arxiv.1802.04216

arXiv: 1802.04216

Abstract

This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D motion capture data. Given a candidate 3D pose, our algorithm selects for each joint an image whose 2D pose locally matches the projected 3D pose. The selected images are then combined to generate a new synthetic image by stitching local image patches in a kinematically constrained manner. The resulting images are used to train an end-to-end CNN for full-body 3D pose estimation. We cluster the training data into a large number of pose classes and tackle pose estimation as a $K$-way classification problem. Such an approach is viable only with large training sets such as ours. Our method outperforms most of the published works in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for real-world images (LSP). This demonstrates that CNNs trained on artificial images generalize well to real images. Compared to data generated from more classical rendering engines, our synthetic images do not require any domain adaptation or fine-tuning stage.
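The per-joint selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-joint skeleton `NEIGHBOURS`, the pinhole `project` function, the neighbour-offset descriptor, and the Euclidean distance are all assumptions chosen for illustration, and the kinematically constrained patch stitching is not shown.

```python
import numpy as np

# Hypothetical toy skeleton (assumption): joint index -> kinematic neighbours.
NEIGHBOURS = {0: [1], 1: [0, 2], 2: [1]}


def project(pose_3d, focal=1.0):
    """Pinhole projection of (J, 3) joints to (J, 2); assumes all z > 0."""
    return focal * pose_3d[:, :2] / pose_3d[:, 2:3]


def local_descriptor(pose_2d, joint):
    """Offsets from `joint` to its neighbours (translation-invariant local pose)."""
    return pose_2d[NEIGHBOURS[joint]] - pose_2d[joint]


def select_images(pose_3d, dataset_2d):
    """For each joint, pick the dataset image whose local 2D pose is closest
    to the projected 3D pose around that joint."""
    target = project(pose_3d)
    picks = []
    for joint in range(target.shape[0]):
        wanted = local_descriptor(target, joint)
        dists = [
            np.linalg.norm(local_descriptor(pose_2d, joint) - wanted)
            for pose_2d in dataset_2d
        ]
        picks.append(int(np.argmin(dists)))
    return picks


# Toy usage: two annotated "images", each given only by its 2D pose.
dataset_2d = [
    np.array([[5.0, 5.0], [6.0, 5.0], [7.0, 5.0]]),  # image 0: horizontal chain
    np.array([[2.0, 1.0], [2.0, 2.0], [2.0, 3.0]]),  # image 1: vertical chain
]
candidate_3d = np.array([[0.0, 0.0, 2.0], [2.0, 0.0, 2.0], [2.0, 2.0, 2.0]])
picks = select_images(candidate_3d, dataset_2d)
```

In this toy case the first joint is matched to image 0 and the last joint to image 1, so different parts of the synthetic image would be sourced from different real images. A real engine would additionally need to warp and blend the selected patches.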

Accepted to appear in IJCV (with minor revisions). Follow-up to the NIPS 2016 paper, arXiv:1607.02046.

Country

France

Related Organizations

Grenoble Alpes University (France)
French National Centre for Scientific Research (France)
Inria Grenoble - Rhône-Alpes research centre (France)
University of Grenoble (France)
French Institute for Research in Computer Science and Automation (France)

Keywords

FOS: Computer and information sciences, [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], ACM: I.: Computing Methodologies/I.4: IMAGE PROCESSING AND COMPUTER VISION, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, 004

Impact by BIP!

Selected citations: 25. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).

Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.

Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).

Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.