HUMAN4D: A Human-Centric Multimodal Dataset for Motions and Immersive Media

Publication: Article, Preprint

Date: 01 Jan 2020 (embargo end date: 01 Oct 2021)

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Journal: IEEE Access, volume 8, pages 176241-176262 (eissn: 2169-3536)

Funded by: EC | VRTogether, unidentified

Authors: Anargyros Chatzitofis; Leonidas Saroglou; Prodromos Boutis; Petros Drakoulis; Nikolaos Zioulis; Shishir Subramanyam; Bart Kevelham; +5 Authors

APC: 1,444.25 EUR

doi: 10.1109/access.2020.3026276 , 10.48550/arxiv.2110.07235

arXiv: 2110.07235


Abstract

We introduce HUMAN4D, a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data. Despite the existence of multi-view color datasets captured with the use of hardware (HW) synchronization, to the best of our knowledge, HUMAN4D is the first and only public resource that provides volumetric depth maps with high synchronization precision due to the use of intra- and inter-sensor HW-SYNC. Moreover, a spatio-temporally aligned scanned and rigged 3D character complements HUMAN4D to enable joint research on time-varying and high-quality dynamic meshes. We provide evaluation baselines by benchmarking HUMAN4D with state-of-the-art human pose estimation and 3D compression methods. For the former, we apply 2D and 3D pose estimation algorithms on both single- and multi-view data cues. For the latter, we benchmark open-source 3D codecs on volumetric data with respect to online volumetric video encoding and steady bit-rates. Furthermore, a qualitative and quantitative visual comparison between mesh-based volumetric data reconstructed at different qualities showcases the available options with respect to 4D representations. HUMAN4D is introduced to the computer vision and graphics research communities to enable joint research on spatio-temporally aligned pose, volumetric, mRGBD and audio data cues. The dataset and its code are available at https://tofis.github.io/myurls/human4d.

Country

Netherlands


Keywords

social activities, Dataset, 4D, multi-view, motion capture, RGBD, volumetric video, pose estimation, 3D compression, 4D capture, visual evaluation, benchmarking, depth sensing, audio, FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition, Computer Vision and Pattern Recognition (cs.CV), Machine Learning (cs.LG), Artificial Intelligence (cs.AI), TK1-9971, Electrical engineering. Electronics. Nuclear engineering

Impact by BIP!

	selected citations: 37. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.