Powered by OpenAIRE graph
Found an issue? Give us feedback
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/ https://doi.org/10.1...arrow_drop_down
image/svg+xml art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos Open Access logo, converted into svg, designed by PLoS. This version with transparent background. http://commons.wikimedia.org/wiki/File:Open_Access_logo_PLoS_white.svg art designer at PLoS, modified by Wikipedia users Nina, Beao, JakobVoss, and AnonMoos http://www.plos.org/
https://doi.org/10.1145/368712...
Article . 2024 . Peer-reviewed
License: CC BY
Data sources: Crossref
u:cris
Conference object . 2024
Data sources: u:cris
versions View all 2 versions
addClaim

This Research product is the result of merged Research products in OpenAIRE.

You have already added 0 works in your ORCID record related to the merged Research product.

Spatial Task-Explicity Matters in Prompting Large Multimodal Models for Spatial Planning

Authors: Ivan Majic; Zhangyu Wang; Krzysztof Janowicz; Mina Karimi;

Spatial Task-Explicity Matters in Prompting Large Multimodal Models for Spatial Planning

Abstract

The advance in large multimodal models (LMMs) gives rise to autonomous bots that perform complex tasks using human-like reasoning on their own. The ability of large models to understand spatial relations and perform spatial operations, however, is known to be limited. This gap hinders the development of autonomous GIS analysts, travel planning assistants, and other possibilities of spatial bots. In this paper, we explore the impact of modality on the performance of LMMs in spatial planning tasks-specifically, retrieving a target brick by first removing all other bricks on top of it. Experiments demonstrate that what matters is not only the modality of the prompts (text or image), but also how informative the spatial descriptions are for the LMMs to complete the task. We propose novel concepts of task-implicit and task-explicit spatial descriptions to qualitatively quantify the task-specific informativity of prompts. Furthermore, we develop simple techniques to increase the spatial task-explicity of image prompts, and the accuracy of spatial planning increases from 26% to 100% accordingly.

Keywords

102035 Data science, 102001 Artificial intelligence, 507003 Geoinformatik, GeoAI, 102001 Artificial Intelligence, 102035 Data Science, 507003 Geoinformatics, task-explicity, multi-modal prompts, spatial reasoning, large multimodal models (LMM)

  • BIP!
    Impact byBIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    2
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average
Powered by OpenAIRE graph
Found an issue? Give us feedback
selected citations
These citations are derived from selected sources.
This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Citations provided by BIP!
popularity
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
BIP!Popularity provided by BIP!
influence
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
BIP!Influence provided by BIP!
impulse
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
BIP!Impulse provided by BIP!
2
Average
Average
Average
hybrid