
Unmanned Surface Vehicles (USVs) play a critical role in maritime monitoring, environmental protection, and emergency response, necessitating accurate scene understanding in complex aquatic environments. Conventional semantic segmentation methods often fail to capture global context and lack physical boundary consistency, limiting real-world performance. This paper proposes USV-Seg, a unified segmentation framework integrating a vision-language model, the Segment Anything Model (SAM), DINOv2-based visual features, and a physically constrained refinement module. We design a task-specific <Describe> Token to enable fine-grained semantic reasoning of navigation scenes, considering USV-to-shore distance, landform complexity, and water surface texture. A mask selection algorithm based on multi-layer Intersection-over-Prediction (IoP) heads improves segmentation precision across sky, water, and obstacle regions. A boundary-aware correction module refines outputs using estimated sky-water and land-water boundaries, enhancing robustness and realism. Unlike prior works that simply apply vision-language or geometric post-processing in isolation, USV-Seg integrates structured scene reasoning and scene-aware boundary constraints into a unified and physically consistent framework. Experiments on a real-world USV dataset demonstrate that USV-Seg outperforms state-of-the-art methods, achieving 96.30% mIoU in challenging near-shore scenarios.
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
