Grounding Vision-Language Models for Multimodal Remote Sensing Transfer Performance

Assignee Research

Publication type: Report | Status: Under curation | Language: English | Publisher: Zenodo

Authors: Assignee Research

DOI: 10.5281/zenodo.20681429

Abstract

Deep learning models benefit from greater data diversity and volume, which motivates using synthetic data augmentation to improve existing datasets. However, existing evaluation metrics for synthetic data typically compute similarity between latent feature representations. This similarity is difficult to interpret and does not always correlate with how much the synthetic data contributes to downstream tasks. We propose a vision-language grounded framework for interpretable synthetic data augmentation and evaluation in remote sensing. Our approach combines generative models, semantic segmentation, and image captioning with vision and language models.

Base research goal: Does grounding synthetic data generation in vision-language models improve cross-domain transfer performance on multimodal remote sensing tasks relative to ungrounded augmentation?

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 8.7/10.
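The abstract contrasts opaque latent-feature similarity with interpretable, grounded evaluation, but it does not specify either metric. The sketch below illustrates that contrast under stated assumptions and is not the authors' method. It assumes that grounding is checked by re-segmenting each synthetic image and comparing the result, class by class, with the label map the image was conditioned on. It ignores the image-captioning branch entirely. The function names `latent_cosine_similarity` and `per_class_iou`, the land-cover class names, and the toy label maps are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical land-cover classes, indexed by integer label (illustration only).
CLASS_NAMES = ["background", "building", "water", "vegetation"]

# Toy 3x4 label maps (hypothetical). REFERENCE stands in for the map a synthetic
# image was conditioned on; PREDICTED for a segmentation model's output on it.
REFERENCE = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 3, 2, 2],
]
PREDICTED = [
    [0, 0, 1, 0],
    [0, 2, 2, 1],
    [3, 0, 2, 2],
]


def latent_cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Ungrounded baseline: cosine similarity of two feature vectors.

    Yields one number with no indication of *what* differs."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def per_class_iou(
    reference: List[List[int]],
    predicted: List[List[int]],
    class_names: Sequence[str] = CLASS_NAMES,
) -> Dict[str, float]:
    """Assumed grounded check: per-class intersection-over-union between the
    conditioning label map and the segmentation of the generated image.

    Classes absent from both maps are reported as NaN."""
    if len(reference) != len(predicted) or any(
        len(r) != len(p) for r, p in zip(reference, predicted)
    ):
        raise ValueError("label maps must have the same shape")
    scores: Dict[str, float] = {}
    for class_id, name in enumerate(class_names):
        inter = union = 0
        for ref_row, pred_row in zip(reference, predicted):
            for r, p in zip(ref_row, pred_row):
                in_ref, in_pred = r == class_id, p == class_id
                inter += int(in_ref and in_pred)
                union += int(in_ref or in_pred)
        scores[name] = inter / union if union else float("nan")
    return scores


if __name__ == "__main__":
    print(latent_cosine_similarity([3.0, 4.0], [4.0, 3.0]))  # ~0.96
    for name, iou in per_class_iou(REFERENCE, PREDICTED).items():
        print(f"{name:>10}: {iou:.2f}")
```

On the toy maps, the per-class scores are 0.60 for background, 0.67 for building, 1.00 for water, and 0.50 for vegetation. A breakdown like this shows which classes the generator reproduced faithfully and which it partly lost. A single latent similarity score cannot show that.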