Performance of Attention-Informed Mixed-Language Training in Multilingual VQA Benchmarks

Assignee Research

Data sources: ZENODO

Publication: Report. Under curation. English. Publisher: Zenodo

Authors: Assignee Research

doi: 10.5281/zenodo.20987262

- Summary

Abstract

Although multilingual vision-language pre-trained models have brought several benefits, recent benchmarks across various tasks and languages show poor cross-lingual generalisation when these models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual-question data and evaluated on 7 typologically diverse languages [...]

Research goal: How does the performance of Attention-Informed Mixed-Language Training (MLT) compare to other zero-shot adaptation methods, such as cross-lingual transfer learning or multitask learning, on the Multilingual Visual Question Answering (ML-VQA) benchmark when evaluated on languages with varying levels of linguistic and structural similarity to the training language?

Autonomous synthesis report generated by Assignee Research. Tribunal consensus score: 8.7/10.
