Powered by OpenAIRE graph
ZENODO
Dataset . 2025
License: CC BY
Data sources: ZENODO

IMAGEO-Bench: A Systematic Benchmark Dataset for Evaluating Image Geolocalization Ability in Large Language Models

Authors: LI, Lingyao; Runlong, Yu; Qikai, Hu; Bowei, Li; Min, Deng; Yang, Zhou; Xiaowei, Jia;


Abstract

Dataset Introduction

We introduce three benchmark datasets, Dataset-GSS, Dataset-UPC, and Dataset-PCW, designed to support comprehensive and contamination-aware evaluation of large language models (LLMs) on image geolocation tasks. Each dataset is constructed from a distinct source, covers unique geospatial contexts, and is carefully filtered to ensure high-quality, geo-informative visual content. The datasets include geographic coordinates, region labels, and image-level metadata to support reproducible geolocation benchmarking.

Dataset-GSS (Dataset1): Global Streetscape Set

Derived from the NUS Global Streetscapes dataset (Hou et al., 2024), this set focuses on high-quality street-level images from around the world. Starting from over 10,000 manually annotated images, we applied multi-stage filtering to retain only those with:

  • Complete annotations across eight visual attributes (e.g., lighting, glare, weather, platform)
  • High image quality (quality: good, reflection: no, glare: no)
  • Sufficient geolocation cues (e.g., visible signs or context-revealing elements)

The final dataset includes 6,152 images from 396 cities in 123 countries, offering wide-ranging cultural, architectural, and environmental diversity at the global scale. All images are precisely geo-tagged and manually verified.

Dataset-UPC (Dataset2): U.S. POIs Crowdsourced Set

This dataset is compiled from the Google Maps POI dataset released by UC San Diego, which contains nearly 5 million U.S. POIs collected up to 2021. We apply a stratified sampling approach to ensure balanced representation across:

  • All 50 U.S. states and the District of Columbia
  • 17 POI categories, such as “restaurant,” “hotel,” “museum,” “scenic spot,” “park,” and others

For each state-category pair, we randomly selected POIs, downloaded associated images from Google Maps, and manually filtered the pool to remove:

  • Broken or inaccessible image URLs
  • Advertisements or promotional content
  • Images with identifiable human faces

The final dataset contains 2,929 geo-tagged images, each associated with a POI name, type, address (including ZIP code), latitude and longitude, and category label.

Dataset-PCW (Dataset3): Privately Collected Wild Set

To support evaluation on out-of-distribution data and avoid leakage from public web sources or LLM training corpora, we constructed a private dataset of 272 original image–address pairs collected by the authors. The images were captured across various locations in the U.S. and globally and span:

  • Scene types: urban, suburban, rural, and natural settings
  • Weather conditions: sunny, cloudy, snowy, etc.
  • Time of day: daytime, nighttime, etc.

Each image is matched to a verified physical address and latitude-longitude pair. This dataset is reserved exclusively for held-out evaluation and does not overlap with any existing benchmarks.

Ground Truth Description

The ground truth for all datasets in this collection consists of verified geographic coordinates (latitude and longitude) for each sample. These coordinates serve as the authoritative reference for evaluating the accuracy of geolocation, mapping, or spatial analysis methods. For each sample, the ground truth is defined by the values in the latitude and longitude columns. Coordinates were determined from authoritative sources and/or validated through manual review to ensure high accuracy.
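
As an illustration of how the latitude and longitude ground truth might be used, the following is a minimal sketch of scoring predicted coordinates by great-circle (haversine) distance. It is not the authors' evaluation code. The function names, the `pred_latitude` and `pred_longitude` fields, the sample coordinates, and the 25 km threshold are all assumptions made for this example; only the `latitude` and `longitude` column names come from the description above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_within(errors_km, threshold_km):
    """Fraction of samples whose distance error is at most threshold_km."""
    if not errors_km:
        return 0.0
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)


# Hypothetical rows: ground truth in `latitude`/`longitude`,
# model output in the assumed fields `pred_latitude`/`pred_longitude`.
rows = [
    {"latitude": 40.7128, "longitude": -74.0060,
     "pred_latitude": 40.7306, "pred_longitude": -73.9866},
    {"latitude": 34.0522, "longitude": -118.2437,
     "pred_latitude": 37.7749, "pred_longitude": -122.4194},
]

errors = [
    haversine_km(r["latitude"], r["longitude"], r["pred_latitude"], r["pred_longitude"])
    for r in rows
]
print([round(e, 1) for e in errors])
print(accuracy_within(errors, threshold_km=25.0))  # 25 km is an illustrative threshold
```

The haversine formula treats the Earth as a sphere, which is usually adequate for city-level or coarser error thresholds; an ellipsoidal geodesic would be a reasonable substitute where sub-kilometre precision matters.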

Keywords

image geolocalization, large language model

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average