
Remote sensing image–audio retrieval (RSIAR) has emerged as an active research topic in recent years, and many methods have been proposed for it. These methods achieve good retrieval results, but two problems remain: the limited discriminability of the audio modality and the heterogeneous gap between audio and images. Both problems make the common cross-modal embedding space for audio and images suboptimal, which limits retrieval performance. This article proposes a novel RSIAR method, multimodal fusion remote sensing image–audio retrieval (MMFR), to address these two problems. MMFR first converts the original audio input to text. It then uses a feature fusion module to obtain a fused representation that incorporates the text information, rather than relying on the audio representation alone. Fusing in text information makes the pronunciation-based audio feature more semantically discriminative and lifts it to a higher-level fused feature that crosses the heterogeneous gap. Seven different fusion methods are evaluated in the feature fusion module. In addition, a triplet loss, a semantic loss, and a consistency loss are used to optimize the common retrieval space. Extensive experiments on the UCM_IV, RSICD_IV, and SYDNE_IV datasets demonstrate that MMFR outperforms state-of-the-art methods.
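The pipeline described above can be sketched in a few lines: fuse the audio embedding with the embedding of its transcribed text, then train the common retrieval space with a triplet objective. This is only an illustrative sketch with toy NumPy vectors; the fusion operator (plain concatenation plus a linear projection here), the embedding dimension, and the random "learned" projection matrix are all assumptions, not the paper's actual module or loss weighting.

```python
import numpy as np

def fuse_concat(audio_feat, text_feat, w):
    """Concatenation-based fusion (one of several possible fusion
    strategies; the paper's exact seven variants are not reproduced here).
    Projects the concatenated vector back to the embedding size."""
    fused = np.concatenate([audio_feat, text_feat])
    return w @ fused

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on L2 distances: pull the matching image
    embedding toward the fused audio-text anchor, push a non-matching
    one away by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
dim = 4
audio = rng.standard_normal(dim)          # pronunciation-based audio feature
text = rng.standard_normal(dim)           # feature of the transcribed text
w = rng.standard_normal((dim, 2 * dim))   # projection weights (random stand-in
                                          # for a learned layer)

anchor = fuse_concat(audio, text, w)
positive = anchor + 0.01 * rng.standard_normal(dim)  # matching image embedding
negative = rng.standard_normal(dim)                  # non-matching image

loss = triplet_loss(anchor, positive, negative)
print(loss)
```

In a real system the semantic and consistency losses mentioned in the abstract would be added to this triplet term, and all three would be minimized jointly over the fusion module and the image encoder.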
multimodal learning, feature fusion, remote sensing audio–image retrieval
