Grounding spatial relations in text-only language models

Name: Grounding spatial relations in text-only language models
Keywords: FOS: Computer and information sciences, Computer Science - Computation and Language, language models, deep learning, Learning, spatial grounding, Computation and Language (cs.CL), Problem Solving, Language

Publication: Article, Preprint
Date: 01 Feb 2024
Embargo end date: 01 Jan 2024
Country: Spain
Publisher: Elsevier BV
Journal: Neural Networks, volume 170, pages 215-226 (issn: 0893-6080)

Authors: Gorka Azkune; Ander Salaberria; Eneko Agirre

doi: 10.1016/j.neunet.2023.11.031 , 10.48550/arxiv.2403.13666

pmid: 37992509

arXiv: http://arxiv.org/abs/2403.13666

handle: 10810/67552

Abstract

This paper shows that text-only Language Models (LMs) can learn to ground spatial relations like "left of" or "below" if they are provided with explicit location information of objects and are properly trained to leverage those locations. We perform experiments on a verbalized version of the Visual Spatial Reasoning (VSR) dataset, where images are coupled with textual statements which contain real or fake spatial relations between two objects of the image. We verbalize the images using an off-the-shelf object detector, adding location tokens to every object label to represent their bounding boxes in textual form. Given the small size of VSR, we do not observe any improvement when using locations, but pretraining the LM over a synthetic dataset automatically derived by us improves results significantly when using location tokens. We thus show that locations allow LMs to ground spatial relations, with our text-only LMs outperforming Vision-and-Language Models and setting a new state of the art for the VSR dataset. Our analysis shows that our text-only LMs can generalize beyond the relations seen in the synthetic dataset to some extent, also learning more useful information than that encoded in the spatial rules we used to create the synthetic dataset itself.
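As a rough illustration of the verbalization step described in the abstract, the sketch below turns detected objects into text by appending location tokens to each object label, and labels one relation with a simple geometric rule. It is a minimal sketch under stated assumptions, not the authors' implementation: the token format (`<loc_N>`), the number of bins, the `Detection` class, and the center-based `is_left_of` rule are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence

NUM_BINS = 32  # assumption: how finely normalized coordinates are quantized


@dataclass
class Detection:
    """One detected object with a bounding box normalized to [0, 1]."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def quantize(value: float, num_bins: int = NUM_BINS) -> int:
    """Map a normalized coordinate in [0, 1] to an integer bin index."""
    clipped = min(max(value, 0.0), 1.0)
    return min(int(clipped * num_bins), num_bins - 1)


def verbalize(detections: Sequence[Detection]) -> str:
    """Render detections as text: each label followed by four location tokens."""
    parts = []
    for det in detections:
        coords = (det.x0, det.y0, det.x1, det.y1)
        tokens = " ".join(f"<loc_{quantize(v)}>" for v in coords)
        parts.append(f"{det.label} {tokens}")
    return " ; ".join(parts)


def is_left_of(a: Detection, b: Detection) -> bool:
    """Hypothetical rule: a is left of b if a's box center is left of b's."""
    return (a.x0 + a.x1) / 2 < (b.x0 + b.x1) / 2


cat = Detection("cat", 0.10, 0.50, 0.40, 0.90)
dog = Detection("dog", 0.60, 0.45, 0.95, 0.95)
scene = verbalize([cat, dog])
print(scene)
print("cat left of dog:", is_left_of(cat, dog))
```

A text-only LM would then receive the verbalized scene together with a statement such as "The cat is left of the dog" and be trained to judge it true or false. A rule like `is_left_of` is one way such labels could be derived automatically for a synthetic pretraining set.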

Accepted in Neural Networks

Country

Spain

1 Research products, page 1 of 1

SpatialM software on GitHub
IsRelatedTo

Impact by BIP!

- citations: 0. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Access routes: Green, hybrid
