Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models

Keywords: FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Yasiru Ranasinghe; Vibashan VS; James Uplinger; Celso de Melo; Vishal M. Patel

arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/avss65...

Article . 2025 . Peer-reviewed

License: STM Policy #29

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2025

License: CC BY

Data sources: Datacite

DBLP

Article

Data sources: DBLP

DBLP

Conference object

Data sources: DBLP

Publication: Article, Preprint, Conference object; 11 Aug 2025
Embargo end date: 01 Jan 2025
Publisher: IEEE
Journal: 2025 IEEE International Conference on Advanced Visual and Signal-Based Systems (AVSS)

doi: 10.1109/avss65446.2025.11149918, 10.48550/arxiv.2501.07396

arXiv: 2501.07396

Abstract

Automatic target recognition (ATR) plays a critical role in tasks such as navigation and surveillance, where safety and accuracy are paramount. In extreme use cases, such as military applications, these factors are often challenged due to the presence of unknown terrains, environmental conditions, and novel object categories. Current object detectors, including open-world detectors, lack the ability to confidently recognize novel objects or operate in unknown environments, as they have not been exposed to these new conditions. However, Large Vision-Language Models (LVLMs) exhibit emergent properties that enable them to recognize objects in varying conditions in a zero-shot manner. Despite this, LVLMs struggle to localize objects effectively within a scene. To address these limitations, we propose a novel pipeline that combines the detection capabilities of open-world detectors with the recognition confidence of LVLMs, creating a robust system for zero-shot ATR of novel classes and unknown domains. In this study, we compare the performance of various LVLMs for recognizing military vehicles, which are often underrepresented in training datasets. Additionally, we examine the impact of factors such as distance range, modality, and prompting methods on the recognition performance, providing insights into the development of more reliable ATR systems for novel conditions and classes.
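
The two-stage idea described in the abstract (an open-world detector localizes candidate regions, then an LVLM names each region) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not specify the detector, the LVLMs, or the prompts, so `propose_boxes`, `classify_crop`, `fake_detector`, and `fake_lvlm` are hypothetical stand-ins, and the image is a toy nested list rather than real sensor data.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
Image = List[List[int]]          # toy grayscale image: rows of pixel values


@dataclass
class Recognition:
    box: Box
    label: str


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def recognize_targets(
    image: Image,
    propose_boxes: Callable[[Image], Sequence[Box]],
    classify_crop: Callable[[Image, str], str],
    prompt: str,
) -> List[Recognition]:
    """Stage 1 localizes class-agnostic regions; stage 2 names each crop."""
    results = []
    for box in propose_boxes(image):
        # The recognizer only sees the cropped region, so it never has to
        # localize anything itself.
        label = classify_crop(crop(image, box), prompt)
        results.append(Recognition(box=box, label=label))
    return results


def fake_detector(image: Image) -> Sequence[Box]:
    """Hypothetical stand-in for an open-world detector: two fixed boxes."""
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def fake_lvlm(region: Image, prompt: str) -> str:
    """Hypothetical stand-in for an LVLM query: thresholds mean brightness."""
    pixels = [value for row in region for value in row]
    mean = sum(pixels) / len(pixels)
    return "vehicle" if mean > 100 else "background"
```

In a real system the two callables would wrap an actual detector and an actual LVLM; keeping them as injected functions makes it straightforward to swap models or prompting methods when comparing recognition performance.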

Related Organizations

Johns Hopkins University, United States
United States Army, United States
United States Army Research Laboratory, United States

Impact by BIP!

selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Green