ZENODO
Dataset . 2025
License: CC BY SA
Data sources: Datacite

HyBEAR 🐻

A collection of high-resolution hyperspectral images with handcrafted annotations for bare soil detection
Authors: Wijata, Agata; Ruszczak, Bogdan; Niepala, Adriana; Gumiela, Michał; Smykała, Krzysztof; Longepe, Nicolas; Nalepa, Jakub
Abstract

Task

The primary task is the detection of bare soil areas in Earth observation data. This is an important step in Precision Agriculture (PA) applications related to quantifying soil parameters and quality. Accurately identifying bare soil allows researchers to isolate the spectral response originating directly from the soil surface, which enhances the reliability of subsequent analyses aimed at estimating crucial soil properties, such as moisture content, nutrient levels, organic matter, and texture. Bare soil identification is also essential for monitoring agricultural practices like tillage and assessing soil erosion risks.

While bare soil detection is commonly addressed at the pixel level (classifying pixels as soil or background), HyBEAR 🐻 aims to support the development of methods that identify entire fields with no vegetation (entire agricultural parcels).

Dataset

HyBEAR 🐻 is introduced as a novel large-scale collection of high-resolution hyperspectral aerial images. It is the largest and most heterogeneous dataset for bare soil detection released to date.

Size and Scale: The dataset contains 1,954 hyperspectral image patches, totaling 108,064,591 pixels, corresponding to 35,588 hectares. The compressed dataset has a total size of 87.6 GB.

Resolution: The Ground Sampling Distance (GSD) is 2 m.

Acquisition: Data was acquired by QZ Solutions in Southern Poland on March 3, 2021. The imaging system used was the HySpex VS-725 (Norsk Elektro Optikk AS), flown on a Piper PA-31 Navajo aircraft.

Spectral Information: 430 spectral bands are captured for each pixel, covering the range 414.1–2357.4 nm. This includes data from two sensors: SWIR-384 (288 bands, 930–2500 nm) and VNIR-1800 (186 bands, 400–1000 nm).

Location and Heterogeneity: Data was collected for two areas: P1 (Lower Silesian Voivodeship, near Przeworno) and P2 (Opolskie Voivodeship, south of Głubczyce). These areas are geographically separated by more than 60 km, and images were acquired within an hour of each other, introducing variability in acquisition conditions and contributing to the dataset’s heterogeneity.

Annotations (Ground Truth - GT): GT was meticulously prepared using a combination of automated and manual interpretation methods, verified by domain experts. Manual labeling leveraged RGB, NDVI, and especially CIR (Color Infrared) compositions to accurately delineate bare soil. The annotations are binary: the SOIL class is encoded as 1 and the NON-SOIL class as 0. Background/No Data pixels are encoded as -9999.

Data Structure: The data consists of square patches with fixed dimensions of 250x250 pixels.

Versions: The dataset is available in two versions: FULL (the complete collection of 1,954 patches) and MINI (a random, stratified subset of 250 images, 50 from each fold).

Validation Procedure and Baseline Results

HyBEAR defines a standardized validation procedure, protocols, and quality metrics to ensure reproducibility and an unbiased comparison of emerging algorithms.

Cross-Validation: A five-fold cross-validation protocol is defined using 5 spatially disjoint folds (F0 to F4). Fold F0 represents map P1, and F1–F4 represent map P2. This spatial splitting is designed to evaluate the algorithms' ability to generalize to new, unknown areas and to verify their robustness to variable acquisition conditions.

Evaluation Metrics: Performance is assessed using standard classification and segmentation metrics: Accuracy (ACC), Sensitivity (SEN), Specificity (SPE), F-score (F1), Intersection over Union (IoU), Matthews Correlation Coefficient (MCC), and the Area Under the ROC Curve (AUC).

Baseline Results: Baseline results were established using classic Machine Learning (ML) models operating on 430-element feature vectors (all spectral bands per pixel).
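The per-pixel setup described above (one feature vector per pixel, labels 1, 0, and -9999) can be illustrated with a minimal sketch. It assumes a patch is already loaded into nested lists indexed as row, column, band; that layout, the function name `patch_to_samples`, and the toy values are assumptions for illustration and are not taken from the HyBEAR code. For real 250x250 patches with 430 bands, an array library would be the more practical choice.

```python
NO_DATA = -9999  # background / no-data label from the dataset description
SOIL = 1
NON_SOIL = 0


def patch_to_samples(cube, mask):
    """Flatten one patch into per-pixel feature vectors and binary labels.

    cube: nested list, cube[row][col] -> list of band values (assumed layout)
    mask: nested list, mask[row][col] -> 1, 0 or -9999
    Pixels labelled NO_DATA are skipped.
    """
    features, labels = [], []
    for cube_row, mask_row in zip(cube, mask):
        for spectrum, label in zip(cube_row, mask_row):
            if label == NO_DATA:
                continue
            if label not in (SOIL, NON_SOIL):
                raise ValueError(f"unexpected label value: {label}")
            features.append(list(spectrum))
            labels.append(label)
    return features, labels


# Toy 2x2 patch with 3 bands per pixel (made-up values, not dataset content).
toy_cube = [
    [[0.10, 0.20, 0.30], [0.40, 0.50, 0.60]],
    [[0.70, 0.80, 0.90], [0.15, 0.25, 0.35]],
]
toy_mask = [
    [1, 0],
    [-9999, 1],
]

X, y = patch_to_samples(toy_cube, toy_mask)
print(len(X), y)  # 3 [1, 0, 1]
```

Dropping the -9999 pixels before training keeps the problem strictly binary, which matches the SOIL / NON-SOIL encoding of the annotations.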
For the FULL dataset, the Logistic Regression (LR) and Support Vector Machines (SVM) models achieved the highest performance. The average accuracy (ACC) was 0.927 ± 0.016 for LR and 0.926 ± 0.016 for SVM.

Instructions and Availability

The HyBEAR dataset, along with code and trained baseline models, is released to ensure full reproducibility of bare soil detection research.

Availability: HyBEAR is published on Zenodo. DOI: https://doi.org/10.5281/zenodo.17607897.

Code: The accompanying package includes Python code (Jupyter Notebooks) for displaying data and reproducing benchmark results, as well as the configuration files necessary to process the dataset.

Models: Trained models for Logistic Regression and Support Vector Machines (10 files in total) are delivered under the suggested 5-fold cross-validation regime.

Citation

@article{2026HyBEAR,
  title = {{HyBEAR} : A Large-Scale Hyperspectral Benchmark for Bare Soil Detection},
  author = {Wijata, Agata M. and Ruszczak, Bogdan and Niepala, Adriana and Gumiela, Micha\l{} and Smykala, Krzysztof and Long\'ep\'e, Nicolas and Nalepa, Jakub},
  journal = {Earth System Science Data (ESSD)},
  year = {2026}, % Inferred from the source file name
  volume = {TBD}, % To Be Determined
  pages = {TBD},
  doi = {TBD}
}

The dataset files

HyBEAR_MINI.zip - 250 images (50 images for each of the 5 folds) plus all the metadata, Python code examples, baseline ML models, and configuration files.
HyBEAR_F0_FULL.zip - 310 images from fold 0
HyBEAR_F1_FULL.zip - 339 images from fold 1
HyBEAR_F2_FULL.zip - 344 images from fold 2
HyBEAR_F3_FULL.zip - 350 images from fold 3
HyBEAR_F4_FULL.zip - 361 images from fold 4
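As a rough illustration of the validation procedure described in this abstract, the sketch below enumerates the five folds under the usual reading of five-fold cross-validation (train on four folds, test on the held-out one) and computes the threshold-based metrics from confusion counts. The helper names, the treatment of SOIL as the positive class, and the convention of returning 0.0 for an undefined ratio are assumptions made here, not details confirmed by the HyBEAR package. AUC is left out because it needs continuous scores rather than hard labels.

```python
import math

NO_DATA = -9999  # background / no-data label, excluded from scoring
FOLDS = ["F0", "F1", "F2", "F3", "F4"]


def fold_splits(folds):
    """Yield (train_folds, test_fold) pairs, holding out one fold at a time."""
    for held_out in folds:
        yield [f for f in folds if f != held_out], held_out


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with SOIL (1) as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == NO_DATA:
            continue  # no-data pixels do not contribute to any metric
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def _ratio(num, den):
    """Safe division; an undefined ratio is reported as 0.0 (a convention chosen here)."""
    return num / den if den else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, SEN, SPE, F1, IoU and MCC from confusion counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "SEN": _ratio(tp, tp + fn),
        "SPE": _ratio(tn, tn + fp),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),
        "IoU": _ratio(tp, tp + fp + fn),
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),
    }


# Made-up labels and predictions, only to exercise the functions.
y_true = [1, 1, 0, 0, -9999, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(*counts)
splits = list(fold_splits(FOLDS))
print(splits[0])  # (['F1', 'F2', 'F3', 'F4'], 'F0')
print(counts)     # (2, 3, 1, 1)
```

Because F0 is the only fold drawn from map P1, the split that holds out F0 tests a model trained entirely on P2, which is the cross-area generalization case the spatial splitting is meant to probe.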

Keywords

Earth science, Earth observation, Deep learning, Hyperspectral Imaging, Remote sensing, Machine Learning, Image processing, bare soil, Remote Sensing Technology, soil analysis, Remote sensing centre
