
## Introduction

Current detection datasets usually contain rich object annotations, but few also contain attribute annotations, which are equally important for the detection task. To address this gap, we propose a novel attribute dataset, OVAD, to support comprehensive training and testing of attribute detection. OVAD is built on the nuScenes dataset (license: CC BY-NC-SA 4.0), supplementing it with detailed attribute annotations that capture spatial relationships, motion states, and interactions between objects. It is useful for developing and evaluating systems that need to reason about complex scene dynamics.

To encourage follow-up work on open-vocabulary attribute detection, we are publicly releasing the dataset splits used in our paper ("Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes" by Xinhao Xiang, Kuan-Chuan Peng, Suhas Lohit, Michael J. Jones, Jiawei Zhang, BMVC 2025).

Files in the unzipped folder:

```
OVAD
|---./README.md: This Markdown file
|---OVAD_full: The OVAD dataset
|---|---OVAD_infos_test.pkl
|---|---OVAD_infos_train.pkl
|---|---OVAD_infos_val.pkl
|---OVAD_mini: The mini OVAD dataset
|---|---OVAD_mini_infos_val.pkl
|---|---OVAD_mini_infos_train.pkl
|---|---example_val_0.json
```

## At a Glance

- The size of the unzipped dataset is ~2.5 GB.
- OVAD is built on the nuScenes dataset (license: CC BY-NC-SA 4.0). Please download nuScenes from its original repository.
- The `.pkl` files contain the meta information and the data list (see the loading sketch below). Each file is organized as follows:

1. **Metadata**
   - Type: `dict`
   - Content: the dataset `version` string (e.g., `"v1.0"`, as in the example below).
2. **Infos**
   - Type: `list` of `dict`
   - Each entry stores the information of one sample and contains the following keys:

| Key | Type | Description |
|---|---|---|
| `lidar_path` | `str` | Path to the LiDAR data. |
| `num_features` | `int` | Number of features in the data. |
| `token` | `str` | Unique identifier for the sample. |
| `sweeps` | `list` | List of previous LiDAR frames. |
| `cams` | `dict` | Camera-related information. |
| `lidar2ego_translation` | `list` | Translation from LiDAR to ego-frame. |
| `lidar2ego_rotation` | `list` | Rotation from LiDAR to ego-frame. |
| `ego2global_translation` | `list` | Translation from ego-frame to global. |
| `ego2global_rotation` | `list` | Rotation from ego-frame to global. |
| `timestamp` | `int` | Timestamp of the sample. |
| `gt_spatial_boxes` | `np.ndarray` (num_spat, 7) | Spatial box information. |
| `gt_spatial_names` | `np.ndarray` (num_spat,) | Spatial relationship names. |
| `gt_boxes` | `np.ndarray` (num_obj, 7) | Ground truth 3D boxes. |
| `gt_names` | `np.ndarray` (num_obj,) | Object category names. |
| `gt_attribute_names` | list of `list[str]` | Attribute names for each object. |
| `gt_velocity` | `np.ndarray` (num_obj, 2) | Object velocities on the x and y axes. |
| `num_lidar_pts` | `np.ndarray` (num_obj,) | Number of LiDAR points per object. |
| `num_radar_pts` | `np.ndarray` (num_obj,) | Number of radar points per object. |
| `valid_flag` | `np.ndarray` (num_obj,) | Validity flag for objects. |

More information about the keys that are not specific to open-vocabulary attribute detection can be found in MMDetection3D (license: Apache 2.0).
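As a quick sanity check, the snippet below loads one split and pairs each object with its attribute labels. This is a minimal sketch, assuming the `.pkl` files are plain Python pickles (as is typical for MMDetection3D-style info files); the path is illustrative, and NumPy must be installed to unpickle the array fields.

```python
import pickle

# Illustrative path; adjust to wherever you unzipped OVAD.
info_path = "OVAD/OVAD_full/OVAD_infos_train.pkl"

with open(info_path, "rb") as f:
    data = pickle.load(f)  # dict with "metadata" and "infos" (see above)

print(data["metadata"])    # e.g. {'version': 'v1.0'}
print(len(data["infos"]))  # number of samples in this split

# Inspect the first sample: pair each object with its attributes.
sample = data["infos"][0]
for name, attrs in zip(sample["gt_names"], sample["gt_attribute_names"]):
    print(f"{name}: {list(attrs)}")

# Spatial relationships are stored as natural-language sentences.
for rel in sample["gt_spatial_names"][:3]:
    print(rel)
```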
## Example Representation

```
{
  "metadata": {
    "version": "v1.0"
  },
  "infos": [
    {
      "lidar_path": "path/to/lidar/file.bin",
      "num_features": 5,
      "token": "37091c75b9704e0daa829ba56dfa0906",
      "sweeps": [...],
      "cams": {...},
      "lidar2ego_translation": [...],
      "lidar2ego_rotation": [...],
      "ego2global_translation": [...],
      "ego2global_rotation": [...],
      "timestamp": 1533201470427893,
      "gt_spatial_boxes": [[...], [...], ...],
      "gt_spatial_names": ["From the perspective of pedestrian, car is behind pedestrian", "...", "..."],
      "gt_boxes": [[...], [...], ...],
      "gt_names": ["car", "pedestrian", ...],
      "gt_attribute_names": [["cycle.with_rider"], ["pedestrian.standing"], ...],
      "gt_velocity": [[0.0, 1.2], [1.1, -0.5], ...],
      "num_lidar_pts": [12, 8, ...],
      "num_radar_pts": [5, 3, ...],
      "valid_flag": [True, False, ...]
    },
    {...}
  ]
}
```

The `example_val_0.json` file shows a complete example of one entry under the `"infos"` key. It is the first entry of the validation set.
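To preview the schema without loading a full `.pkl` split, you can read the bundled JSON entry directly. A minimal sketch, assuming `example_val_0.json` stores a single `infos` entry as one JSON object (as described above); the path is illustrative.

```python
import json

# Illustrative path; the file holds the first validation-set entry.
with open("OVAD/OVAD_mini/example_val_0.json") as f:
    entry = json.load(f)

print(entry["token"])                   # unique sample identifier
print(entry["gt_names"][:5])            # first few object categories
print(entry["gt_attribute_names"][:5])  # their attribute labels
```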
## Citation

If you use the OVAD dataset in your research, please cite our paper:

```bibtex
@inproceedings{xiang2025ovad,
  author    = {Xiang, Xinhao and Peng, Kuan-Chuan and Lohit, Suhas and Jones, Michael J. and Zhang, Jiawei},
  title     = {Towards Open-Vocabulary Multimodal 3D Object Detection with Attributes},
  booktitle = {The British Machine Vision Conference (BMVC)},
  year      = {2025}
}
```

## License

The OVAD dataset is released under the CC-BY-NC-SA-4.0 license. For the images in the nuScenes dataset, please refer to their website for copyright and license terms.

Created by Mitsubishi Electric Research Laboratories (MERL), 2024-2025

SPDX-License-Identifier: CC-BY-NC-SA-4.0

Keywords: 3D object detection, Open vocabulary detection, Attribute detection
