IMAGEO-Bench: A Systematic Benchmark Dataset for Evaluating Image Geolocalization Ability in Large Language Models

LI, Lingyao; Runlong, Yu; Qikai, Hu; Bowei, Li; Min, Deng; Yang, Zhou; Xiaowei, Jia

ZENODO

Dataset . 2025

License: CC BY

Data sources: ZENODO

ZENODO

Dataset . 2025

License: CC BY

Data sources: Datacite

Research data: Dataset. 01 Aug 2025. Embargo end date: 01 Aug 2025. English. Publisher: Zenodo

doi: 10.5281/zenodo.16670471, 10.5281/zenodo.16670470

Abstract

Dataset Introduction

We introduce three benchmark datasets (Dataset-GSS, Dataset-UPC, and Dataset-PCW) designed to support comprehensive and contamination-aware evaluation of large language models (LLMs) on image geolocation tasks. Each dataset is constructed from a distinct source, covers unique geospatial contexts, and is carefully filtered to ensure high-quality, geo-informative visual content. The datasets include geographic coordinates, region labels, and image-level metadata to support reproducible geolocation benchmarking.

Dataset-GSS (Dataset1): Global Streetscape Set

Derived from the NUS Global Streetscapes dataset (Hou et al., 2024), this set focuses on high-quality street-level images from around the world. Starting from over 10,000 manually annotated images, we applied multi-stage filtering to retain only those with:

- Complete annotations across eight visual attributes (e.g., lighting, glare, weather, platform)
- High image quality (quality: good, reflection: no, glare: no)
- Sufficient geolocation cues (e.g., visible signs or context-revealing elements)

The final dataset includes 6,152 images from 396 cities in 123 countries, offering wide-ranging cultural, architectural, and environmental diversity at the global scale. All images are precisely geo-tagged and manually verified.

Dataset-UPC (Dataset2): U.S. POIs Crowdsourced Set

This dataset is compiled from the Google Maps POI dataset released by UC San Diego, which contains nearly 5 million U.S. POIs collected up to 2021. We apply a stratified sampling approach to ensure balanced representation across:

- All 50 U.S. states and the District of Columbia
- 17 POI categories, such as “restaurant,” “hotel,” “museum,” “scenic spot,” “park,” and others

For each state-category pair, we randomly selected POIs, downloaded associated images from Google Maps, and manually filtered the pool to remove:

- Broken or inaccessible image URLs
- Advertisements or promotional content
- Images with identifiable human faces
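The per-pair random selection described for Dataset-UPC can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names `state` and `category`, the function `stratified_sample`, and the `per_stratum` count are hypothetical, since the record does not state how many POIs were drawn per state-category pair.

```python
import random
from collections import defaultdict


def stratified_sample(pois, per_stratum, seed=0):
    """Pick up to `per_stratum` POIs at random from each (state, category) pair.

    `pois` is a list of dicts with hypothetical keys "state" and "category".
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for poi in pois:
        strata[(poi["state"], poi["category"])].append(poi)

    sample = []
    for key in sorted(strata):  # sorted keys give a stable iteration order
        group = strata[key]
        k = min(per_stratum, len(group))  # small strata contribute all they have
        sample.extend(rng.sample(group, k))
    return sample


# Toy records for illustration only; these are not rows from the dataset.
pois = [
    {"name": "A", "state": "FL", "category": "restaurant"},
    {"name": "B", "state": "FL", "category": "restaurant"},
    {"name": "C", "state": "FL", "category": "park"},
    {"name": "D", "state": "NY", "category": "restaurant"},
]
picked = stratified_sample(pois, per_stratum=1)
```

Drawing a fixed number per pair, rather than sampling the pool uniformly, keeps sparsely populated states and categories from being crowded out by the largest ones.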
The final Dataset-UPC set contains 2,929 geo-tagged images, each associated with a POI name, type, address (including ZIP code), latitude and longitude, and category label.

Dataset-PCW (Dataset3): Privately Collected Wild Set

To support evaluation on out-of-distribution data and avoid leakage from public web sources or LLM training corpora, we constructed a private dataset of 272 original image–address pairs collected by the authors. The images were captured across various locations in the U.S. and globally and span:

- Scene types: urban, suburban, rural, and natural settings
- Weather conditions: sunny, cloudy, snowy, etc.
- Time of day: daytime, nighttime, etc.

Each image is matched to a verified physical address and latitude-longitude pair. This dataset is reserved exclusively for held-out evaluation and does not overlap with any existing benchmarks.

Ground Truth Description

The ground truth for all datasets in this collection consists of verified geographic coordinates (latitude and longitude) for each sample. These coordinates serve as the authoritative reference for evaluating the accuracy of geolocation, mapping, or spatial analysis methods. For each sample, the ground truth is defined by the values in the latitude and longitude columns. Coordinates were determined from authoritative sources and/or validated through manual review to ensure high accuracy.
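One way to score a prediction against the latitude and longitude ground truth is the great-circle distance between the two points. The sketch below assumes the haversine formula on a spherical Earth; the record does not say which distance metric the benchmark uses, and the names `haversine_km` and `geolocation_error_km` and the prediction format are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def geolocation_error_km(sample, predicted):
    """Error for one sample.

    `sample` is a dict with "latitude" and "longitude" keys (the ground-truth
    columns); `predicted` is a hypothetical (lat, lon) tuple from a model.
    """
    return haversine_km(sample["latitude"], sample["longitude"], predicted[0], predicted[1])


# Toy values for illustration only; these are not rows from the dataset.
truth = {"latitude": 0.0, "longitude": 0.0}
error = geolocation_error_km(truth, (0.0, 180.0))  # antipodal point on the equator
```

Per-sample errors computed this way can then be aggregated however an evaluation requires, for example as a median or as the share of predictions within a chosen distance threshold.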

Related Organizations

Florida Southern College
United States

Keywords

image geolocalization, large language model

Impact by BIP!

- Selected citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average
