
This is the older version. Please see version 2.

This record provides open-source daily fine particulate matter (PM2.5) datasets at a 10 km resolution for India from 2005 to 2023, produced using a region-specific two-stage machine learning model validated on held-out monitor data that it was not trained on. Our model demonstrates robust out-of-sample performance, substantially outperforming existing publicly available monthly PM2.5 datasets.

To take advantage of both the longer available time series of Aerosol Optical Depth (AOD) data and information from newer sensors such as the TROPOspheric Monitoring Instrument (TROPOMI), we developed two separate machine learning models: the "Full model" and the "AOD model".

Full model:
- Predictive performance (spatial cross-validation): R² of 0.67, RMSE of 27.79 μg/m³
- Input features: Moderate Resolution Imaging Spectroradiometer (MODIS) AOD and TROPOMI satellite inputs, along with other remote sensing data
- Daily PM2.5 predictions for: July 10, 2018 - September 30, 2023

AOD model:
- Predictive performance (spatial cross-validation): R² of 0.64, RMSE of 32.08 μg/m³
- Input features: all inputs used for the Full model except TROPOMI
- Daily PM2.5 predictions for: January 1, 2005 - September 30, 2023

Please note that we employed spatial cross-validation (CV) rather than the more conventional random CV, in order to responsibly assess how well the model predicts daily PM2.5 concentrations at locations without air quality monitors across India. When the Full model above was evaluated using 10-fold random CV, it showed notably higher performance (R² of 0.85 and RMSE of 18.48 μg/m³). This highlights the potential of random CV to overstate model performance in critical real-world applications.

The paper has been submitted for publication in a peer-reviewed journal but has not yet been formally accepted. A preprint is available on EarthArXiv: https://doi.org/10.31223/X5H40F
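The difference between spatial and random CV described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say how the spatial folds were built, so grouping monitors into latitude/longitude grid blocks is an assumption made here for illustration, and the names `block_id`, `spatial_folds`, `random_folds` and the `block_deg` parameter are hypothetical. The `rmse` and `r2` helpers use the standard definitions of the two reported metrics.

```python
import math
import random


def block_id(lat, lon, block_deg=1.0):
    """Map a coordinate to the grid block containing it (assumed grouping rule)."""
    return (math.floor(lat / block_deg), math.floor(lon / block_deg))


def spatial_folds(coords, n_folds=10, block_deg=1.0, seed=0):
    """Assign a fold to each (lat, lon) so that a whole block shares one fold.

    Nearby monitors are held out together, so test locations are never
    surrounded by training monitors from the same block.
    """
    blocks = sorted({block_id(lat, lon, block_deg) for lat, lon in coords})
    random.Random(seed).shuffle(blocks)
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[block_id(lat, lon, block_deg)] for lat, lon in coords]


def random_folds(n_rows, n_folds=10, seed=0):
    """Assign folds to rows at random, ignoring location.

    With daily data, rows from the same monitor can land in both the
    training and the test split.
    """
    order = list(range(n_rows))
    random.Random(seed).shuffle(order)
    folds = [0] * n_rows
    for position, row in enumerate(order):
        folds[row] = position % n_folds
    return folds


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In use, a model would be fitted once per fold on all rows outside that fold and scored with `rmse` and `r2` on the held-out rows. Under `spatial_folds`, two monitors in the same block always share a fold, which is the property that makes the score reflect prediction at unmonitored locations.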
