
If you use this dataset in your research/work, please cite the following paper:

Kawano, Ayako, et al. "Improved daily PM2.5 estimates in India reveal inequalities in recent enhancement of air quality." Science Advances 11.4 (2025): eadq1071. DOI: 10.1126/sciadv.adq1071

Thank you for acknowledging our work!

----------------------------------------------

This record provides open-source daily fine particulate matter (PM2.5) datasets for India at a 10 km resolution, covering 2005 to 2023. The estimates come from a region-specific two-stage machine learning model that was validated on held-out monitor data it was not trained on. The model demonstrates robust out-of-sample performance and substantially outperforms existing publicly available monthly PM2.5 datasets.

To take advantage of both the longer time series of Aerosol Optical Depth (AOD) data and the information from newer sensors such as the TROPOspheric Monitoring Instrument (TROPOMI), we developed two separate machine learning models: the "Full model" and the "AOD model".

Full model:
Predictive performance (spatial cross-validation): R² of 0.67, RMSE of 27.79 μg/m³
Input features: Moderate Resolution Imaging Spectroradiometer (MODIS) AOD and TROPOMI satellite inputs, along with other remote sensing data
Daily PM2.5 predictions for: July 10, 2018 - September 30, 2023

AOD model:
Predictive performance (spatial cross-validation): R² of 0.64, RMSE of 32.08 μg/m³
Input features: all inputs used for the Full model except TROPOMI
Daily PM2.5 predictions for: January 1, 2005 - September 30, 2023

Please note that we employed spatial cross-validation (CV) rather than the more conventional random CV, because the model is intended to predict daily PM2.5 concentrations at locations across India that have no air quality monitors. When the Full model above was evaluated using 10-fold random CV, it showed notably higher performance (R² of 0.85 and RMSE of 18.48 μg/m³). This highlights the potential of random CV to overstate model performance in critical real-world applications.

Code and source data needed to replicate the results have also been deposited.
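The difference between random and spatial CV can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy monitor IDs, and the choice to build spatial folds by holding out whole monitors are all assumptions made for illustration, and the deposited code may construct its spatial folds differently (for example, from geographic blocks). The sketch only shows why random folds let the same monitor appear in both training and test data, while monitor-level folds do not.

```python
import random


def random_folds(n_obs, k, seed=0):
    """Conventional CV: assign each observation (monitor-day) to a fold at random."""
    rng = random.Random(seed)
    order = list(range(n_obs))
    rng.shuffle(order)
    folds = [0] * n_obs
    for position, obs_index in enumerate(order):
        folds[obs_index] = position % k
    return folds


def spatial_folds(monitor_ids, k, seed=0):
    """Assumed spatial CV: assign whole monitors to folds, so every
    observation from a given monitor lands in the same fold."""
    rng = random.Random(seed)
    monitors = sorted(set(monitor_ids))
    rng.shuffle(monitors)
    fold_of_monitor = {m: position % k for position, m in enumerate(monitors)}
    return [fold_of_monitor[m] for m in monitor_ids]


def leaked_monitors(monitor_ids, folds, test_fold):
    """Monitors that contribute observations to both the training and test sets."""
    train = {m for m, f in zip(monitor_ids, folds) if f != test_fold}
    test = {m for m, f in zip(monitor_ids, folds) if f == test_fold}
    return train & test


# Toy example (hypothetical data): 6 monitors with 10 daily observations each.
monitor_ids = [m for m in "ABCDEF" for _ in range(10)]
k = 5

for name, folds in [
    ("random", random_folds(len(monitor_ids), k)),
    ("spatial", spatial_folds(monitor_ids, k)),
]:
    for test_fold in range(k):
        leaked = sorted(leaked_monitors(monitor_ids, folds, test_fold))
        print(name, "fold", test_fold, "monitors seen in both train and test:", leaked)
```

Under random folds, the model is tested on days from monitors it has already seen on other days, which tends to flatter its scores. Under monitor-level folds, every test monitor is unseen, which is closer to the intended use of predicting at unmonitored locations.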
Machine Learning, Asia, aerosol, air pollution, public health, Machine learning, India, health, PM2.5, air quality, Atmospheric model
