
ITOP Dataset

Name: ITOP Dataset
Keywords: depth sensor, human pose estimation, 3D vision, computer vision

Research data > Dataset · 08 Oct 2016 · English · Publisher: Zenodo

Authors: Haque, Albert; Peng, Boya; Luo, Zelun; Alahi, Alexandre; Yeung, Serena; Fei-Fei, Li

doi: 10.5281/zenodo.3932973, 10.5281/zenodo.3932972


Abstract

Summary

The ITOP (Invariant Top View) dataset contains 100K depth images of a person in a scene, captured from side and top views. For each image, the locations of 15 human body parts are labeled with 3-dimensional (x,y,z) coordinates, relative to the sensor's position. Read the full paper for more context [pdf].

Getting Started

Download the h5.gz file, then decompress it:

    gunzip ITOP_side_test_depth_map.h5.gz

Using Python and h5py (pip install h5py or conda install h5py), we can load the contents:

    import h5py
    import numpy as np

    # Open the decompressed HDF5 file read-only
    f = h5py.File('ITOP_side_test_depth_map.h5', 'r')
    # 'data' holds the depth maps, 'id' the frame identifiers
    data, ids = f.get('data'), f.get('id')
    # Copy both datasets into in-memory NumPy arrays
    data, ids = np.asarray(data), np.asarray(ids)
    print(data.shape, ids.shape)
    # (10501, 240, 320) (10501,)

Note: For any of the *_images.h5.gz files, the underlying file is a tar file and not an h5 file. Rename the file extension from h5.gz to tar.gz before opening. The following commands will work:

    mv ITOP_side_test_images.h5.gz ITOP_side_test_images.tar.gz
    tar xf ITOP_side_test_images.tar.gz

Metadata

File sizes for images, depth maps, point clouds, and labels refer to the uncompressed size.

    +------+-------+--------+--------+---------+-----------+-------------+---------+
    | View | Split | Frames | People | Images  | Depth Map | Point Cloud | Labels  |
    +------+-------+--------+--------+---------+-----------+-------------+---------+
    | Side | Train | 39,795 | 16     | 1.1 GiB | 5.7 GiB   | 18 GiB      | 2.9 GiB |
    | Side | Test  | 10,501 | 4      | 276 MiB | 1.6 GiB   | 4.6 GiB     | 771 MiB |
    | Top  | Train | 39,795 | 16     | 974 MiB | 5.7 GiB   | 18 GiB      | 2.9 GiB |
    | Top  | Test  | 10,501 | 4      | 261 MiB | 1.6 GiB   | 4.6 GiB     | 771 MiB |
    +------+-------+--------+--------+---------+-----------+-------------+---------+

Data Schema

Each file contains several HDF5 datasets at the root level. Dimensions, attributes, and data types are listed below. The key refers to the (HDF5) dataset name. Let \(n\) denote the number of images.
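As a quick way to compare a downloaded file against the schema described in this section, the root-level datasets can be listed with their shapes and data types. The following is a minimal sketch, not part of the dataset's tooling: the helper names describe_datasets and describe_file are hypothetical, and the sketch assumes every root-level entry exposes the .shape and .dtype attributes that h5py datasets provide.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


def describe_datasets(root: Mapping[str, Any]) -> Dict[str, Tuple[Optional[tuple], Optional[str]]]:
    """Map each root-level key to a (shape, dtype name) pair.

    `root` is any mapping-like object, such as an open h5py.File.
    Hypothetical helper, written for illustration only.
    """
    summary = {}
    for key, dataset in root.items():
        # h5py datasets expose .shape and .dtype; entries without
        # these attributes (for example groups) are reported as None.
        shape = getattr(dataset, "shape", None)
        dtype = getattr(dataset, "dtype", None)
        summary[key] = (shape, None if dtype is None else str(dtype))
    return summary


def describe_file(path: str):
    """Open an HDF5 file read-only and summarize its root-level datasets."""
    import h5py  # imported lazily so describe_datasets works without h5py

    with h5py.File(path, "r") as f:
        return describe_datasets(f)
```

Calling describe_file('ITOP_side_test_depth_map.h5') on the decompressed file from Getting Started would be expected to report the id and data keys; any difference from the documented dimensions or data types is worth checking before further processing.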
Transformation

To convert from point clouds to a \(240 \times 320\) image, the following transformations were used. Let \(x_{\textrm{img}}\) and \(y_{\textrm{img}}\) denote the \((x,y)\) coordinate in the image plane. Using the raw point cloud \((x,y,z)\) real world coordinates, we compute the depth map as follows:

    \(x_{\textrm{img}} = \frac{x}{Cz} + 160\) and \(y_{\textrm{img}} = -\frac{y}{Cz} + 120\)

where \(C \approx 3.50 \times 10^{-3} = 0.0035\) is the intrinsic camera calibration parameter. This results in the depth map: \((x_{\textrm{img}}, y_{\textrm{img}}, z)\).

Joint ID (Index) Mapping

    joint_id_to_name = {
        0: 'Head',        8: 'Torso',
        1: 'Neck',        9: 'R Hip',
        2: 'R Shoulder',  10: 'L Hip',
        3: 'L Shoulder',  11: 'R Knee',
        4: 'R Elbow',     12: 'L Knee',
        5: 'L Elbow',     13: 'R Foot',
        6: 'R Hand',      14: 'L Foot',
        7: 'L Hand',
    }

Depth Maps

Key: id
Dimensions: \((n,)\)
Data Type: uint8
Description: Frame identifier in the form XX_YYYYY where XX is the person's ID number and YYYYY is the frame number.

Key: data
Dimensions: \((n,240,320)\)
Data Type: float16
Description: Depth map (i.e. mesh) corresponding to a single frame. Depth values are in real world meters (m).

Point Clouds

Key: id
Dimensions: \((n,)\)
Data Type: uint8
Description: Frame identifier in the form XX_YYYYY where XX is the person's ID number and YYYYY is the frame number.

Key: data
Dimensions: \((n,76800,3)\)
Data Type: float16
Description: Point cloud containing 76,800 points (240x320). Each point is represented by a 3D tuple measured in real world meters (m).

Labels

Key: id
Dimensions: \((n,)\)
Data Type: uint8
Description: Frame identifier in the form XX_YYYYY where XX is the person's ID number and YYYYY is the frame number.

Key: is_valid
Dimensions: \((n,)\)
Data Type: uint8
Description: Flag corresponding to the result of the human labeling effort. This is a boolean value (represented by an integer) where a one (1) denotes clean, human-approved data. A zero (0) denotes noisy human body part labels. If is_valid is equal to zero, you should not use any of the provided human joint locations for the particular frame.

Key: visible_joints
Dimensions: \((n,15)\)
Data Type: int16
Description: Binary mask indicating if each human joint is visible or occluded. This is denoted by \(\alpha\) in the paper. If \(\alpha_j = 1\) then the \(j^{\textrm{th}}\) joint is visible (i.e. not occluded). Otherwise, if \(\alpha_j = 0\) then the \(j^{\textrm{th}}\) joint is occluded.

Key: image_coordinates
Dimensions: \((n,15,2)\)
Data Type: int16
Description: Two-dimensional \((x,y)\) points corresponding to the location of each joint in the depth image or depth map.

Key: real_world_coordinates
Dimensions: \((n,15,3)\)
Data Type: float16
Description: Three-dimensional \((x,y,z)\) points corresponding to the location of each joint in real world meters (m).

Key: segmentation
Dimensions: \((n,240,320)\)
Data Type: int8
Description: Pixel-wise assignment of body part labels. The background class (i.e. no body part) is denoted by -1.

Citation

If you would like to cite our work, please use the following.

Haque A, Peng B, Luo Z, Alahi A, Yeung S, Fei-Fei L. (2016). Towards Viewpoint Invariant 3D Human Pose Estimation. European Conference on Computer Vision. Amsterdam, Netherlands. Springer.

    @inproceedings{haque2016viewpoint,
        title={Towards Viewpoint Invariant 3D Human Pose Estimation},
        author={Haque, Albert and Peng, Boya and Luo, Zelun and Alahi, Alexandre and Yeung, Serena and Fei-Fei, Li},
        booktitle = {European Conference on Computer Vision},
        month = {October},
        year = {2016}
    }
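Returning to the Transformation formulas and the label flags described above, the sketch below shows one way they might be applied to a single point and a single frame. It is an illustration under stated assumptions rather than the authors' code: the function names project_to_image and usable_joint_names are hypothetical, C is fixed at the approximate value 0.0035 given above, the projection is left unrounded, and restricting the result to visible joints is a choice made for this example (the dataset description only requires skipping frames where is_valid is zero).

```python
from typing import List, Sequence, Tuple

# Approximate intrinsic calibration parameter from the Transformation section.
C = 0.0035

# Joint names ordered by joint ID (0..14), per the Joint ID (Index) Mapping.
JOINT_NAMES = [
    'Head', 'Neck', 'R Shoulder', 'L Shoulder', 'R Elbow', 'L Elbow',
    'R Hand', 'L Hand', 'Torso', 'R Hip', 'L Hip', 'R Knee', 'L Knee',
    'R Foot', 'L Foot',
]


def project_to_image(x: float, y: float, z: float, c: float = C) -> Tuple[float, float]:
    """Project one real-world point (meters) onto the 240 x 320 image plane.

    Implements x_img = x / (C z) + 160 and y_img = -y / (C z) + 120.
    Hypothetical helper; assumes z > 0 (a point in front of the sensor).
    """
    if z <= 0:
        raise ValueError("z must be positive to project a point")
    x_img = x / (c * z) + 160.0
    y_img = -y / (c * z) + 120.0
    return x_img, y_img


def usable_joint_names(is_valid: int, visible_joints: Sequence[int]) -> List[str]:
    """Return the names of joints kept for one frame.

    Frames with is_valid == 0 yield no joints. Keeping only visible joints
    (alpha_j == 1) is an extra filter assumed for this example.
    """
    if not is_valid:
        return []
    return [JOINT_NAMES[j] for j, alpha in enumerate(visible_joints) if alpha == 1]
```

A point on the optical axis, such as (0, 0, 2), maps to the image center (160, 120). For whole arrays loaded with h5py, the same arithmetic can be applied with NumPy instead of looping over points.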

Related Organizations

Stanford University
United States


Impact by BIP!

citations: 1. An alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. Reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Usage by UsageCounts

views: 1K
downloads: 1K
