Hierarchical Relational Attention for Video Question Answering

Article / Part of book or chapter of book, 01 Oct 2018
Publisher: IEEE
Journal: 2018 25th IEEE International Conference on Image Processing (ICIP)

Authors: Chowdhury, Muhammad Iqbal Hasan; Sridharan, Sridha; Fookes, Clinton; Nguyen Thanh, Kien

doi: 10.1109/icip.2018.8451103

Abstract

Video Question Answering (VideoQA) requires understanding the connections among context-specific video segments that are distributed over time. Humans can focus on temporally distributed video scenes and find correspondences or relationships among these segments. To achieve a similar capability, this paper proposes a hierarchical relational attention mechanism. The proposed VideoQA model derives attention over temporal segments (i.e., video features) conditioned on each question word. It also captures the contextual relevance of these temporal segments to derive the final video representation, which leads to better reasoning. We evaluate the proposed approach on the MSRVTT-QA and MSVD-QA datasets and show that it outperforms state-of-the-art methods.
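The two-level idea in the abstract (attention over temporal segments conditioned on each question word, followed by a step that relates the attended results to one another) can be illustrated with a minimal pure-Python sketch. This is not the authors' model. It assumes that question-word embeddings and segment features share one dimension d, uses plain dot-product attention as a generic stand-in at both levels, and mean-pools the relational outputs into the final video vector. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def word_level_attention(words: List[Vector], segments: List[Vector]) -> List[Vector]:
    """Level 1 (assumed form): one attended video summary per question word."""
    summaries = []
    for w in words:
        # Dot-product relevance of every temporal segment to this word.
        alpha = softmax([dot(w, s) for s in segments])
        summaries.append(weighted_sum(alpha, segments))
    return summaries


def relational_attention(summaries: List[Vector]) -> List[Vector]:
    """Level 2 (assumed form): re-express each summary via its relevance to all others."""
    out = []
    for q in summaries:
        beta = softmax([dot(q, k) for k in summaries])
        out.append(weighted_sum(beta, summaries))
    return out


def video_representation(words: List[Vector], segments: List[Vector]) -> Vector:
    """Mean-pool the relational outputs (an assumption) into one video vector."""
    related = relational_attention(word_level_attention(words, segments))
    n = len(related)
    return [sum(v[k] for v in related) / n for k in range(len(related[0]))]


if __name__ == "__main__":
    segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy segment features
    words = [[2.0, 0.0], [0.0, 2.0]]                 # toy question-word embeddings
    print(video_representation(words, segments))
```

Each word-level summary is a convex combination of segment features, so the outputs stay within the range spanned by the inputs. A trained model would add learned projections and an answer decoder, both of which are omitted here.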

Related Organizations

Queensland University of Technology
Australia

Keywords

scene understanding, Hierarchical relational attention, Visual Question Answering (VQA), 004

Related research

- Video Question Answering with Iterative Video-Text Co-tokenization (2022)
- Pairwise VLAD Interaction Network for Video Question Answering (2021)
- Long-Term Video Question Answering via Multimodal Hierarchical Memory Attentive Networks (2021)
- DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering (2022)
- Question-Aware Tube-Switch Network for Video Question Answering (2019)
- End-to-End Video Question-Answer Generation With Generator-Pretester Network (2021)
- Multi-Scale Progressive Attention Network for Video Question Answering (2021)
- Compositional Attention Networks With Two-Stream Fusion for Video Question Answering (2020)

Impact (by BIP!)

- Selected citations: 20 (citations derived from selected sources; an alternative to the "Influence" indicator)
- Popularity: Top 10% (the current impact or attention of the article in the research community, based on the underlying citation network)
- Influence: Top 10% (the overall impact of the article in the research community over time, based on the underlying citation network)
- Impulse: Top 10% (the initial momentum of the article directly after its publication, based on the underlying citation network)