Does incorporating multi-turn reinforcement learning during training improve the nDTW score of vision-language

SOVEREIGN Research Kernel


Data sources: ZENODO

Publication: Report | Under curation | English | Publisher: Zenodo

Authors: SOVEREIGN Research Kernel

doi: 10.5281/zenodo.20439563


Abstract

The speaker-follower models have proven to be effective in vision-and-language navigation, where a speaker model is used to synthesize new instructions to augment the training data for a follower navigation model. However, in many of the previous methods, the generated instructions are not directly trained to optimize the performance of the follower. In this paper, we present FOAM, a FOllower-Aware speaker Model that is constantly updated given the follower feedback, so that the generated instructions can be more suitable to the current learning state of the follower. Specifically, we optimize ...

Research goal: Does incorporating multi-turn reinforcement learning during training improve the nDTW score of vision-language navigation models on RxR-CE compared to single-turn policy gradient methods?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.8/10.
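Since the research goal is stated in terms of nDTW, a minimal sketch of how that score is commonly computed may help. It assumes the usual definition, nDTW = exp(-DTW(R, Q) / (|R| * d_th)), with Euclidean distance between path points. The function names and the 3.0 m default threshold are illustrative assumptions, not values taken from this report.

```python
import math


def dtw_distance(reference, query):
    """Dynamic time warping cost between two paths of coordinate tuples."""
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = minimal warping cost aligning reference[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(reference[i - 1], query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ndtw(reference, query, success_threshold=3.0):
    """Normalized DTW in (0, 1]; 1.0 means the query matches the reference path.

    success_threshold is an assumed default (in metres), not a value from the report.
    """
    if not reference or not query:
        raise ValueError("both paths must contain at least one point")
    dtw = dtw_distance(reference, query)
    return math.exp(-dtw / (len(reference) * success_threshold))


reference_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
shifted_path = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
score_same = ndtw(reference_path, reference_path)
score_shifted = ndtw(reference_path, shifted_path)
```

Higher is better: an agent path identical to the reference scores 1.0, and the score decays exponentially as the aligned paths drift apart, so a comparison between training regimes would report the mean of this value over evaluation episodes.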
