BEVFusion With Dual Hard Instance Probing for Multimodal 3D Object Detection

Keywords: 3D object detection, transformer, deep learning, multi-modal, Electrical engineering. Electronics. Nuclear engineering, deformable attention, TK1-9971

Taeho Kim; Joohee Kim



IEEE Access

Article . 2025 . Peer-reviewed

License: CC BY



Publication: Article, 01 Jan 2025
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Journal: IEEE Access, volume 13, pages 25546-25556 (eISSN: 2169-3536)


doi: 10.1109/access.2025.3538866



Abstract

False negatives (FN) in 3D object detection, which occur when small, distant, or hidden objects are missed, pose significant safety risks in autonomous driving systems. Recent multi-modal fusion methods enhance 3D object detection by combining the geometric accuracy of LiDAR point clouds with the rich semantic features of camera images. However, few methods explicitly address false negatives, and many fail to align multimodal features effectively or to let them interact during fusion. To address these challenges, we propose BEVFusion with Dual Hard Instance Probing (BEVFusion-DHIP), a novel 3D object detection framework designed to systematically reduce false negatives. BEVFusion-DHIP incorporates Hard Instance Probing (HIP) into both LiDAR BEV features and 3D position-aware image features, progressively refining the detection of challenging objects across multiple stages. Furthermore, we introduce a Deformable Attention Fusion Network (DAFusionNet) that dynamically aligns and fuses LiDAR and camera BEV features, mitigating spatial misalignment and enhancing inter-modal feature interaction. Experimental results on the nuScenes dataset show that the proposed BEVFusion-DHIP outperforms state-of-the-art LiDAR-only and camera+LiDAR 3D object detection models. For example, BEVFusion-DHIP achieves improvements of 3.0 and 3.2 in mAP and NDS, respectively, compared to the baseline model BEVFusion.
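The abstract does not spell out how HIP proceeds from stage to stage. As a rough illustration of the general idea of probing for hard instances across multiple stages, the toy sketch below assumes that each stage picks the highest-scoring cells of a BEV heatmap and then masks them, so that later stages are pushed toward the weaker responses that might otherwise become false negatives. The function names, the masking rule, the number of stages, and the top-k size are all hypothetical choices made for this example and are not taken from the paper.

```python
# Illustrative sketch only: multi-stage probing over a toy BEV heatmap.
# The masking rule, stage count, and top-k size are assumptions made for
# this example, not details taken from the paper.

def probe_stage(heatmap, mask, k):
    """Return the k highest-scoring unmasked cells as (row, col) tuples."""
    candidates = [
        (score, r, c)
        for r, row in enumerate(heatmap)
        for c, score in enumerate(row)
        if not mask[r][c]
    ]
    # Highest score first; ties keep their row-major order.
    candidates.sort(key=lambda t: t[0], reverse=True)
    return [(r, c) for _, r, c in candidates[:k]]


def multi_stage_probe(heatmap, num_stages=2, k=2):
    """Run several probing stages, masking cells picked by earlier stages."""
    rows, cols = len(heatmap), len(heatmap[0])
    mask = [[False] * cols for _ in range(rows)]
    stages = []
    for _ in range(num_stages):
        picked = probe_stage(heatmap, mask, k)
        for r, c in picked:
            mask[r][c] = True  # later stages must look at other cells
        stages.append(picked)
    return stages


# Toy 3x3 heatmap: two strong responses and two weaker ("hard") ones.
toy_heatmap = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.3],
    [0.0, 0.4, 0.1],
]
stages = multi_stage_probe(toy_heatmap, num_stages=2, k=2)
```

In this toy run the first stage selects the two strongest cells and the second stage, with those cells masked, selects the two next-strongest ones. A real detector would operate on learned feature maps and decode boxes from the selected locations, which this sketch leaves out.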

Related Organizations

Illinois Institute of Technology
United States

