Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao https://doi.org/10.1...arrow_drop_down
image/svg+xml Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao Closed Access logo, derived from PLoS Open Access logo. This version with transparent background. http://commons.wikimedia.org/wiki/File:Closed_Access_logo_transparent.svg Jakob Voss, based on art designer at PLoS, modified by Wikipedia users Nina and Beao
versions View all 1 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Question-Aware Tube-Switch Network for Video Question Answering

Authors: Tianhao Yang; Zheng-Jun Zha; Hongtao Xie; Meng Wang; Hanwang Zhang;

Question-Aware Tube-Switch Network for Video Question Answering

Abstract

Video Question & Answering (VideoQA), a task to answer questions in videos, involves rich spatio-temporal content (e.g., appearance and motion) and requires multi-hop reasoning process. However, existing methods usually deal with appearance and motion separately and fail to synchronize the attentions on appearance and motion features, neglecting two key properties of video QA: (1) appearance and motion features are usually concomitant and complementary to each other at time slice level. Some questions rely on joint representations of both kinds of features at some point in the video; (2) appearance and motion have different importance in multi-step reasoning. In this paper, we propose a novel Question- Aware Tube-Switch Network (TSN) for video question answering which contains (1) a Mix module to synchronously combine the appearance and motion representation at time slice level, achieving fine-grained temporal alignment and correspondence between appearance and motion at every time slice and (2) a Switch mod- ule to adaptively choose appearance or motion tube as primary at each reasoning step, guiding the multi-hop reasoning process. To end-to-end train TSN, we utilize the Gumbel-Softmax strategy to account for the discrete tube-switch process. Extensive experimental results on two benchmarks: MSVD-QA and MSRVTT-QA, have demonstrated that the proposed TSN consistently outperforms state-of-the-art on all metrics.

Related Organizations
  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    19
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Top 10%
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Top 10%
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Top 10%
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
19
Top 10%
Top 10%
Top 10%
Upload OA version
Are you the author of this publication? Upload your Open Access version to Zenodo!
It’s fast and easy, just two clicks!