Powered by OpenAIRE graph
ZENODO
Other literature type . 2025
License: CC BY
Data sources: ZENODO
https://lps25.esa.int/programm...
Article . 2025
License: CC BY
Data sources: Sygma
ZENODO
Presentation . 2025
License: CC BY
Data sources: Datacite

Boosting Air Quality Downscaling with Extreme-Value-Sensitive Strategies

Authors: Maria Dekavalla; Chrysovalantis Tsiakos; Angelos Amditis; Georgios Tsimiklis;

Abstract

Presentation given at the Living Planet Symposium 2025. Mining industries must monitor air pollutant emissions to comply with EU directives, protect public health and the environment, and support broader sustainability goals. Such monitoring typically relies on direct on-site measurements, drones, and mobile units. With their broad coverage and pollutant-specific detection capabilities, spaceborne sensors offer a complementary tool for tracking emissions over large areas, helping to ensure regulatory compliance and improve environmental management. However, spaceborne retrievals describe the atmospheric column rather than the air at ground level, so they do not directly provide surface concentrations. Data fusion based on machine learning (ML), which combines ground stations, satellites, and chemical transport models (CTMs), has therefore advanced significantly as a means of estimating ground-level air pollutant concentrations. ML methods such as Random Forests (RF), support vector regression, feed-forward neural networks, and eXtreme Gradient Boosting (XGBoost) are widely used to predict high-resolution ground-level concentrations from ground-based measurements, assimilated records (e.g., meteorological variables), and satellite observations (e.g., Sentinel-5P TROPOMI). To enable end-to-end prediction, these downscaling models learn a mapping from inputs (e.g., NO₂ and CO column densities) to outputs (e.g., ground-based measurements of pollutant concentrations). They usually outperform classical spatial interpolation and statistical regression methods, but they struggle to estimate extreme events, because the corresponding values lie in the tails of the training distribution or even outside it.
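As a minimal sketch of the station-to-predictor pairing on which these input-output relationships are learned, the Python below matches each station observation to the 1 km grid cell (and day) that contains it and attaches that cell's gridded predictors. It assumes station coordinates are already projected to a metric CRS (e.g. ETRS89-LAEA) and that predictors sit in a plain dictionary; `StationObs`, `cell_index`, `build_training_pairs` and the predictor keys are hypothetical names, and the values in the usage lines are toy numbers, not data from this work.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed: 1 km cells on a grid in a metric projection (e.g. ETRS89-LAEA).
CELL_SIZE_M = 1000.0


@dataclass(frozen=True)
class StationObs:
    """One ground-station measurement (hypothetical record layout)."""
    station_id: str
    x_m: float          # projected easting in metres
    y_m: float          # projected northing in metres
    day: str            # ISO date, e.g. "2023-06-01"
    no2_surface: float  # measured near-surface NO2 (the regression target)


def cell_index(x_m: float, y_m: float, cell: float = CELL_SIZE_M) -> tuple[int, int]:
    """Return (column, row) of the grid cell containing a projected point."""
    return int(x_m // cell), int(y_m // cell)


def build_training_pairs(stations, gridded):
    """Pair each station observation with the predictors of its cell and day.

    `gridded` maps (day, column, row) -> {predictor name: value}, e.g. a
    TROPOMI NO2 column plus ERA5-Land variables. Observations with no
    matching cell (for instance a cloud-masked satellite pixel) are skipped.
    """
    features, targets = [], []
    for obs in stations:
        key = (obs.day, *cell_index(obs.x_m, obs.y_m))
        predictors = gridded.get(key)
        if predictors is None:
            continue
        features.append(predictors)
        targets.append(obs.no2_surface)
    return features, targets


# Toy usage: only the first station has a matching gridded cell.
stations = [
    StationObs("S1", 4_321_500.0, 3_210_250.0, "2023-06-01", 42.0),
    StationObs("S2", 4_400_000.0, 3_300_000.0, "2023-06-01", 18.0),
]
gridded = {("2023-06-01", 4321, 3210): {"no2_column": 1.1e-4, "t2m": 293.2}}
X, y = build_training_pairs(stations, gridded)
```

The resulting feature dictionaries and targets could then be fed to any regressor (RF, XGBoost, and so on). The grid definition and the matching rule (same day, containing cell) are the main design choices here and may well differ from the project's actual pipeline.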
It is therefore important to identify ML models and training strategies that can handle this highly imbalanced distribution and extend their prediction range to capture hotspots of extreme air pollutant emissions. The EU-funded TERRAVISION project addresses this gap with a novel framework that uses ensemble techniques combining the strengths of multiple ML models, together with training strategies for imbalanced datasets. Techniques such as oversampling high-concentration samples, or loss functions that penalise prediction errors on extreme values more heavily, can help mitigate the effects of the imbalanced distribution. Beyond training strategies, evaluation metrics such as the Geometric Mean (GM) and the Squared Error-Relevance Area (SERA) are needed to measure how well the models capture extreme values and to give a fuller picture of their downscaling capabilities. The framework includes an ablation study that varies the ML models, oversampling techniques, cost-sensitive learning (in which data points are weighted according to the rarity of their target values), loss functions that optimise asymmetrically to emphasise extreme values, and evaluation metrics, in order to assess the individual contribution of each component. To estimate near-surface air pollutant concentrations at 1 km spatial resolution, the models were trained on benchmark datasets comprising ground-based measurements from the European Environment Agency (EEA) air quality monitoring station network, Sentinel-5P tropospheric vertical column densities, and modelled meteorological data from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5-Land (ERA5-Land). The trained models were benchmarked against state-of-the-art ML models for air pollutant downscaling.
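The training strategies described above can be illustrated with a small, dependency-free Python sketch. It is not the TERRAVISION implementation: the piecewise-linear relevance function with boxplot-derived thresholds, the histogram-based inverse-frequency weights, the replicate-based oversampling and the asymmetric weighted squared error are all assumed, simplified stand-ins (established alternatives include PCHIP-based relevance functions and SMOGN oversampling), and every function name and default value is hypothetical.

```python
import statistics
from collections import Counter


def boxplot_thresholds(targets):
    """Assumed thresholds: median (lo) and upper whisker Q3 + 1.5*IQR (hi)."""
    q1, q2, q3 = statistics.quantiles(targets, n=4)
    return q2, q3 + 1.5 * (q3 - q1)


def relevance(y, lo, hi):
    """Piecewise-linear relevance phi(y) in [0, 1].

    0 at or below `lo`, 1 at or above `hi`, linear in between, so only
    high (potentially extreme) concentrations count as relevant.
    """
    if y <= lo:
        return 0.0
    if y >= hi:
        return 1.0
    return (y - lo) / (hi - lo)


def rarity_weights(targets, bin_width):
    """Cost-sensitive weights: inverse frequency of each target's histogram
    bin, rescaled so that the weights average to 1."""
    bins = [int(y // bin_width) for y in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def oversample_extremes(samples, targets, phi, threshold=0.5, copies=3):
    """Deterministic oversampling: each sample whose relevance reaches
    `threshold` is repeated `copies` extra times."""
    out_x, out_y = [], []
    for x, y, r in zip(samples, targets, phi):
        n = 1 + (copies if r >= threshold else 0)
        out_x.extend([x] * n)
        out_y.extend([y] * n)
    return out_x, out_y


def asymmetric_weighted_mse(y_true, y_pred, phi, under_penalty=2.0):
    """Squared error weighted by (1 + phi), multiplied by `under_penalty`
    whenever the prediction underestimates the target."""
    total = 0.0
    for y, p, r in zip(y_true, y_pred, phi):
        w = (1.0 + r) * (under_penalty if p < y else 1.0)
        total += w * (p - y) ** 2
    return total / len(y_true)
```

In practice the rarity weights would be handed to a learner's per-sample weight argument (for example `sample_weight` in scikit-learn's `RandomForestRegressor.fit`), and a loss like the last one would be written in the training framework's own tensor operations; the plain-Python loops only make the arithmetic explicit.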
Evaluation was performed on the entire test set and on three disjoint subsets, defined by how many samples fall within given ranges of near-surface concentration. Extensive experiments show the superior performance of the proposed strategies in reducing the underestimation of extreme ground-level concentrations in air quality downscaling. Incorporating these improvements into our modelling framework helps overcome existing limitations and improves the accuracy and reliability of air quality predictions, ultimately benefiting environmental monitoring and decision-making.
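The metrics named in the abstract (GM and SERA) and the three-subset evaluation can be sketched as follows, under stated assumptions. SERA is taken in its usual imbalanced-regression form, SERA = ∫₀¹ SER_t dt with SER_t = Σ over samples with φ(y_i) ≥ t of (ŷ_i - y_i)², where φ is a relevance function in [0, 1]; because SER_t is a step function of t, the integral equals Σ φ(y_i)(ŷ_i - y_i)², which the sketch checks against a trapezoidal approximation. GM is read here as the geometric mean of absolute errors, and the "many", "medium" and "few" subsets use thresholds of 100 and 20 training samples per concentration bin; both readings, and all function names, are hypothetical.

```python
import math
from collections import Counter


def sera_closed_form(y_true, y_pred, phi):
    """Exact SERA: each sample's squared error counts for t in [0, phi_i]."""
    return sum(r * (p - y) ** 2 for y, p, r in zip(y_true, y_pred, phi))


def sera_trapezoid(y_true, y_pred, phi, steps=1000):
    """SERA via the trapezoidal rule on a uniform grid of relevance cut-offs t."""
    ts = [k / steps for k in range(steps + 1)]
    ser = [
        sum((p - y) ** 2 for y, p, r in zip(y_true, y_pred, phi) if r >= t)
        for t in ts
    ]
    return (sum(ser) - 0.5 * (ser[0] + ser[-1])) / steps


def geometric_mean_abs_error(y_true, y_pred):
    """GM read as the geometric mean of absolute errors (assumption)."""
    errs = [abs(p - y) for y, p in zip(y_true, y_pred)]
    if any(e == 0.0 for e in errs):
        return 0.0  # a single exact hit drives the product to zero
    return math.exp(sum(math.log(e) for e in errs) / len(errs))


def shot_grouper(train_targets, bin_width, many=100, few=20):
    """Map a target value to 'many', 'medium' or 'few' according to how many
    training samples fall in its concentration bin (thresholds assumed)."""
    counts = Counter(int(y // bin_width) for y in train_targets)

    def group(y):
        n = counts.get(int(y // bin_width), 0)
        if n > many:
            return "many"
        if n >= few:
            return "medium"
        return "few"

    return group


def evaluate_by_group(y_true, y_pred, group_of):
    """MAE and GM on each of the three disjoint subsets of the test set."""
    buckets = {"many": ([], []), "medium": ([], []), "few": ([], [])}
    for y, p in zip(y_true, y_pred):
        ys, ps = buckets[group_of(y)]
        ys.append(y)
        ps.append(p)
    report = {}
    for name, (ys, ps) in buckets.items():
        if ys:
            mae = sum(abs(p - y) for y, p in zip(ys, ps)) / len(ys)
            report[name] = {"n": len(ys), "MAE": mae,
                            "GM": geometric_mean_abs_error(ys, ps)}
    return report
```

`sera_trapezoid` costs O(steps × n) and is included only to show that a gridded evaluation converges to the closed form; for full test sets the closed form is the practical choice.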

Keywords

Earth observation, Mining
