
Dataset Introduction

We introduce three benchmark datasets (Dataset-GSS, Dataset-UPC, and Dataset-PCW) designed to support comprehensive and contamination-aware evaluation of large language models (LLMs) on image geolocation tasks. Each dataset is constructed from a distinct source, covers unique geospatial contexts, and is carefully filtered to ensure high-quality, geo-informative visual content. The datasets include geographic coordinates, region labels, and image-level metadata to support reproducible geolocation benchmarking.

Dataset-GSS (Dataset1): Global Streetscape Set

Derived from the NUS Global Streetscapes dataset (Hou et al., 2024), this set focuses on high-quality street-level images from around the world. Starting from over 10,000 manually annotated images, we applied multi-stage filtering to retain only those with:

- Complete annotations across eight visual attributes (e.g., lighting, glare, weather, platform)
- High image quality (quality: good, reflection: no, glare: no)
- Sufficient geolocation cues (e.g., visible signs or context-revealing elements)

The final dataset includes 6,152 images from 396 cities in 123 countries, offering wide-ranging cultural, architectural, and environmental diversity at the global scale. All images are precisely geo-tagged and manually verified.

Dataset-UPC (Dataset2): U.S. POIs Crowdsourced Set

This dataset is compiled from the Google Maps POI dataset released by UC San Diego, which contains nearly 5 million U.S. points of interest (POIs) collected up to 2021. We applied a stratified sampling approach to ensure balanced representation across:

- All 50 U.S. states and the District of Columbia
- 17 POI categories, such as “restaurant,” “hotel,” “museum,” “scenic spot,” “park,” and others

For each state-category pair, we randomly selected POIs, downloaded the associated images from Google Maps, and manually filtered the pool to remove:

- Broken or inaccessible image URLs
- Advertisements or promotional content
- Images with identifiable human faces
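The per-pair random selection used for Dataset-UPC can be illustrated with a minimal sketch. The record fields `state` and `category`, the function name `stratified_sample`, and the `per_stratum` quota are hypothetical; the description does not state how many POIs were drawn per state-category pair, so this is not the authors' actual pipeline.

```python
import random
from collections import defaultdict


def stratified_sample(pois, per_stratum, seed=0):
    """Randomly pick up to `per_stratum` POIs from each (state, category) pair.

    `pois` is a list of dicts with hypothetical keys "state" and "category".
    `per_stratum` is an assumed quota; the source does not specify one.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group the POIs by their (state, category) stratum.
    strata = defaultdict(list)
    for poi in pois:
        strata[(poi["state"], poi["category"])].append(poi)

    # Draw without replacement from each stratum, in a stable key order.
    sample = []
    for key in sorted(strata):
        group = strata[key]
        k = min(per_stratum, len(group))  # small strata contribute all they have
        sample.extend(rng.sample(group, k))
    return sample
```

Drawing the same quota from every stratum keeps sparsely covered states and rare categories from being swamped by dense ones, which is the balancing effect that stratified sampling is meant to provide.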
The final Dataset-UPC set contains 2,929 geo-tagged images, each associated with a POI name, type, address (including ZIP code), latitude and longitude, and category label.

Dataset-PCW (Dataset3): Privately Collected Wild Set

To support evaluation on out-of-distribution data and avoid leakage from public web sources or LLM training corpora, we constructed a private dataset of 272 original image-address pairs collected by the authors. The images were captured at various locations in the U.S. and other countries and span:

- Scene types: urban, suburban, rural, and natural settings
- Weather conditions: sunny, cloudy, snowy, etc.
- Time of day: daytime, nighttime, etc.

Each image is matched to a verified physical address and latitude-longitude pair. This dataset is reserved exclusively for held-out evaluation and does not overlap with any existing benchmarks.

Ground Truth Description

The ground truth for all datasets in this collection consists of verified geographic coordinates (latitude and longitude) for each sample. These coordinates serve as the authoritative reference for evaluating the accuracy of geolocation, mapping, or spatial analysis methods. For each sample, the ground truth is defined by the values in the latitude and longitude columns. Coordinates were determined from authoritative sources and/or validated through manual review to ensure high accuracy.
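As an illustration of how the latitude and longitude columns might be used for scoring, the sketch below computes the great-circle (haversine) distance between a predicted location and the ground truth, and the share of predictions that fall within a distance threshold. The description does not prescribe an evaluation metric, so the haversine choice, the function names `haversine_km` and `accuracy_within`, and any threshold values are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def accuracy_within(errors_km, threshold_km):
    """Fraction of samples whose distance error is at most `threshold_km`."""
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)
```

For example, calling `haversine_km` on each (prediction, ground truth) pair and passing the resulting list to `accuracy_within` with a hypothetical threshold such as 25 km gives the share of images localized to within that radius.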
image geolocalization, large language model
