Powered by OpenAIRE graph
Preprint
Data sources: ZENODO

XRFv2 Plus: A Multimodal Sensor-Vision-Language Dataset for Action Understanding

Author: Fei, Wang

Abstract

We present XRFv2 Plus, a synchronized multimodal dataset for sensor-vision-language action understanding. Built from the XRFv2 recording corpus, XRFv2 Plus reorganizes 853 valid continuous action sequences around a common cropped-video timeline and releases aligned WiFi CSI, five-position IMU, AirPods IMU, RGB video embeddings, Kinect depth videos, Kinect infrared videos, 2D pose, depth-assisted 3D pose, SMPL mesh, and DensePose-style human-surface information. The dataset also provides three annotation sets: temporal action localization annotations expressed in relative time, action captioning annotations, and action question answering annotations. This paper does not introduce a new recording campaign; instead, it defines a new public benchmark that differs from XRFv2 in its release contract, modality set, annotation set, and task scope. XRFv2 Plus defines a unified video-aligned benchmark contract: standardized tensor shapes, a fixed device order, per-second sensor resampling, privacy-aware public packaging without raw RGB video, and explicit handling of sequences with shortened Kinect videos. This paper describes the dataset construction, alignment protocol, modality formats, annotation schemas, and organization of the public release.
Project page: https://github.com/airslab2020/XRFV2
Dataset: https://www.kaggle.com/datasets/airslab2020/xrfv2-multimodal-tal-caption-qa-no-rgb
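
The abstract names per-second sensor resampling onto the cropped-video timeline, a fixed device order, and relative-time action segments, but does not give their parameters. The Python sketch below shows one plausible reading under stated assumptions: each sensor stream is linearly interpolated onto a grid with a fixed number of samples per second of cropped video, devices are stacked in a fixed order into a (devices, time, channels) tensor, and absolute segment times are shifted so they are relative to the start of the cropped video. The names DEVICE_ORDER, resample_to_video_timeline, stack_devices, to_relative_segments, and the samples_per_second parameter are hypothetical and are not part of the XRFv2 Plus release.

```python
import numpy as np

# Hypothetical device order; the actual fixed order used by XRFv2 Plus is not stated here.
DEVICE_ORDER = ["device_a", "device_b", "device_c"]


def resample_to_video_timeline(timestamps, values, video_start, video_end, samples_per_second):
    """Linearly interpolate one sensor stream onto a uniform grid covering the
    cropped video window, with a fixed number of samples per second.
    Assumption: a trailing partial second is dropped."""
    n_seconds = int(np.floor(video_end - video_start))
    n_out = n_seconds * samples_per_second
    # Grid times are relative to the start of the cropped video.
    rel_grid = np.arange(n_out) / samples_per_second
    abs_grid = video_start + rel_grid
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, None]  # treat a 1-D stream as a single channel
    out = np.empty((n_out, values.shape[1]))
    for c in range(values.shape[1]):
        # np.interp holds the first/last sample for grid points outside the recording.
        out[:, c] = np.interp(abs_grid, timestamps, values[:, c])
    return rel_grid, out


def stack_devices(streams, video_start, video_end, samples_per_second):
    """Resample every device and stack them in DEVICE_ORDER.
    Assumption: all devices share the same channel count."""
    resampled = []
    for name in DEVICE_ORDER:
        ts, vals = streams[name]
        _, out = resample_to_video_timeline(ts, vals, video_start, video_end, samples_per_second)
        resampled.append(out)
    return np.stack(resampled, axis=0)  # shape: (devices, time, channels)


def to_relative_segments(segments, video_start, video_end):
    """Shift absolute (start, end, label) segments to seconds relative to the
    cropped video start, clipping them to the cropped window."""
    rel = []
    for start, end, label in segments:
        s = max(start, video_start) - video_start
        e = min(end, video_end) - video_start
        if e > s:  # drop segments that fall entirely outside the window
            rel.append((s, e, label))
    return rel
```

Linear interpolation and dropping a trailing partial second are choices made for this sketch only; the released dataset may use a different resampling method or padding rule, particularly for sequences with shortened Kinect videos.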
