
Abstract: Health and environmental hazards related to high pollution concentrations have become a serious issue for both public policy and human health. Using a machine learning (ML) approach, this research aims to improve the estimation of grid‐wise PM2.5, a criteria pollutant, by reducing systematic bias in the aerosol speciation provided by the Modern‐Era Retrospective analysis for Research and Applications version 2 (MERRA‐2). The ML model was trained using various meteorological parameters and aerosol species simulated by MERRA‐2, together with ground measurements from Environmental Protection Agency (EPA) Air Quality System stations. The ML approach significantly improved performance and reduced mean bias in the 0–10 μg m−3 range. We also applied the Random Forest ML model to each EPA region separately, using 1 year of collocated data sets. The resulting ML models for each EPA region were validated, and the aggregate data set has a Spearman Rank correlation (SR) of 0.73 (RMSE = 4.8 μg m−3) for training and 0.69 (RMSE = 5.8 μg m−3) for testing. With temporal averaging, the SR increased and the RMSE decreased, to 0.81 (3.9 μg m−3), 0.89 (1.6 μg m−3), and 0.90 (1.1 μg m−3) for daily, monthly, and yearly averages, respectively. The results from the initial implementation of the ML model for the global region are encouraging, but more research and development is required to overcome challenges associated with data gaps in many parts of the world.
QE1-996.5, Astronomy, QB1-991, Geology, PM2.5, ML, gridded estimation, random forest, MERRA‐2
