
doi: 10.2139/ssrn.6467245
Automatic Target Recognition (ATR) from Unmanned Aerial Vehicle (UAV) imagery is a critical challenge in modern defense intelligence, surveillance, and reconnaissance (ISR) operations. Existing approaches struggle with small target detection at altitude, real-time inference on constrained hardware, multi-modal data fusion, and robustness to adversarial concealment. This paper proposes ATR-HybridNet, a dual-branch CNN/Vision Transformer architecture that fuses RGB, infrared, and thermal modalities through a cross-modal attention mechanism. To address the scarcity of labeled military datasets, a semi-supervised mean-teacher framework augmented with adversarial domain randomization reduces the labeled data requirement by 65% relative to fully supervised baselines. Model compression via structured pruning and post-training quantization yields a 4× reduction in parameter count while maintaining 98.2% of full-precision performance, enabling real-time inference at 47 FPS on an NVIDIA Jetson AGX Xavier. Evaluated across five diverse UAV datasets covering urban, desert, and forested terrains under day/night and multi-weather conditions, ATR-HybridNet achieves mAP@0.5 of 0.847, precision of 0.913, recall of 0.891, and F1-score of 0.902. An integrated GradCAM and Transformer attention visualization module supports operator interpretability. All code, model weights, and dataset configurations are released for full reproducibility.
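The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function and variable names are illustrative and are not taken from the released code, and it assumes the three figures were computed at the same operating point.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.913
reported_recall = 0.891
reported_f1 = 0.902

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.902
```

The recomputed value agrees with the reported F1-score to three decimal places.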
