Source: IEEE Journal of Sele... (Open Access)

Multimodal Fusion Remote Sensing Image–Audio Retrieval

Authors: Rui Yang; Shuang Wang; Yingzhi Sun; Huan Zhang; Yu Liao; Yu Gu; Biao Hou; et al.

Abstract

Remote sensing image–audio retrieval (RSIAR) has emerged as a research topic in recent years, and many methods have been proposed for it. These methods achieve good retrieval results, but two problems remain: the audio modality lacks discriminability, and a heterogeneous gap separates audio from images. Together, these problems make the cross-modal common embedding space for audio and images suboptimal, so retrieval performance often falls short. This article proposes a novel RSIAR method, multimodal fusion remote sensing image–audio retrieval (MMFR), to address both problems. MMFR first converts the original audio input to text. It then uses a feature fusion module to obtain a representation fused with the text information instead of the original audio-only representation. The fused text information makes the pronunciation-based audio features more semantically discriminable and lifts them into higher-level fusion features that cross the heterogeneous gap. Seven different fusion methods are evaluated in the feature fusion module. In addition, a triplet loss, a semantic loss, and a consistency loss are used to optimize the common retrieval space. Extensive experiments on the UCM_IV, RSICD_IV, and Sydney_IV datasets demonstrate that MMFR outperforms state-of-the-art methods.
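
To make the feature fusion idea concrete, the PyTorch sketch below shows one plausible fusion variant: concatenating the pronunciation-based audio embedding with the embedding of its ASR transcript and projecting the result into the shared retrieval space. The class name, dimensions, and concatenation + MLP design are illustrative assumptions; the paper evaluates seven fusion methods, and this sketch is not necessarily any of them.

import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Hypothetical fusion variant (assumption, not the paper's exact
    module): concatenate audio and transcript-text embeddings, then
    project into the common retrieval embedding space."""
    def __init__(self, audio_dim=512, text_dim=512, embed_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(audio_dim + text_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, audio_feat, text_feat):
        # audio_feat: (B, audio_dim) pronunciation-based audio features
        # text_feat:  (B, text_dim) features of the ASR transcript
        fused = torch.cat([audio_feat, text_feat], dim=-1)
        return self.proj(fused)  # (B, embed_dim) fusion representation

Other fusion variants in the same spirit would swap the concatenation for element-wise sum, product, gating, or attention; the projection head stays the same.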

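The abstract also names three training objectives. The sketch below combines a hinge-based triplet loss over in-batch negatives with an assumed cross-entropy "semantic" loss and an assumed MSE "consistency" loss; the exact formulations and weights used by MMFR are not given here, so every choice in this function is a labeled assumption.

import torch
import torch.nn.functional as F

def mmfr_style_loss(img_emb, fus_emb, logits, labels,
                    margin=0.2, w_sem=1.0, w_con=1.0):
    """Hedged sketch of a triplet + semantic + consistency objective.
    img_emb, fus_emb: L2-normalized image and fused audio-text embeddings
    of matched pairs, shape (B, D); logits: assumed per-sample class
    predictions; labels: assumed scene-class labels. All weights and
    formulations are illustrative assumptions."""
    B = img_emb.size(0)
    sim = img_emb @ fus_emb.t()           # (B, B) cosine similarities
    pos = sim.diag().unsqueeze(1)         # (B, 1) matched-pair scores
    mask = torch.eye(B, dtype=torch.bool, device=sim.device)
    # Hinge-based triplet loss over in-batch negatives, both directions.
    cost_i2a = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    cost_a2i = (margin + sim - pos.t()).clamp(min=0).masked_fill(mask, 0)
    triplet = cost_i2a.mean() + cost_a2i.mean()
    # Semantic loss: classify embeddings into scene categories (assumed CE).
    semantic = F.cross_entropy(logits, labels)
    # Consistency loss: pull matched image and fusion embeddings together.
    consistency = F.mse_loss(img_emb, fus_emb)
    return triplet + w_sem * semantic + w_con * consistency
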
Keywords

multimodal learning, feature fusion, remote sensing audio–image retrieval; subject classifications: Ocean engineering (TC1501-1800), Geophysics. Cosmic physics (QC801-809)

  • Impact indicators (provided by BIP!):
    selected citations (derived from selected sources; an alternative to the "Influence" indicator): 0
    popularity ("current" impact/attention of the article in the research community, based on the underlying citation network): Average
    influence (overall/total impact of the article in the research community, based on the underlying citation network, diachronically): Average
    impulse (initial momentum of the article directly after its publication, based on the underlying citation network): Average
  • Open Access status: gold