MP CBM-Z V1.0: design for a new CBM-Z gas-phase chemical mechanism architecture for next generation processors

Wang, Hui ; Lin, Junmin ; Wu, Qizhong ; Chen, Huansheng ; Tang, Xiao ; Wang, Zifa ; Chen, Xueshun ; Cheng, Huaqiong ; Wang, Lanning (2018)

Precise and rapid air quality simulation and forecasting are limited by the computational performance of the air quality model, and the gas-phase chemistry module is the most time-consuming function in the model. In this study, we designed a new framework for the widely used Carbon Bond Mechanism Z (CBM-Z) gas-phase chemical kinetics kernel that adapts it to the Single Instruction Multiple Data (SIMD) technology in next-generation processors in order to improve its computational performance. The optimization implements fine-grained parallelization of CBM-Z by improving its vectorization ability. By constructing loops and integrating the main branches (e.g. the diverse chemistry sub-schemes), multiple spatial points in the model can be operated on simultaneously by the vector processing units (VPUs). The Intel Xeon E5-2697 V4 CPU and the Intel Xeon Phi 7250 Knights Landing (KNL) are used as the benchmark processors. Validation of the model outputs indicates that the relative errors are in an acceptable range (< 0.05 %). The optimization resulted in a 4.24x speedup on a single CPU core and a 17.33x speedup on a single KNL core. At the node level, the speedup on the CPU reaches 113.42x using the Message Passing Interface (MPI) and 118.13x using OpenMP, and the speedup on the KNL node reaches 170.31x using MPI and 179.95x using OpenMP. The speedup of the optimized CBM-Z is approximately 50 to 52 % higher on a 1-socket KNL platform than on a 2-socket CPU platform. This work improves the performance of the CBM-Z chemical kinetics kernel as well as the computational efficiency of the air quality model, which directly improves the practical value of the model in scientific simulation and routine forecasting. Furthermore, since this optimization seeks to improve the utilization of the VPU, the model is well suited to new-generation processors that adopt more advanced SIMD technology.
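The restructuring described in the abstract (constructing loops over spatial points and integrating branches) can be illustrated with a minimal C sketch. This is not the CBM-Z code: the function names, the two-way `scheme` flag, and the toy rate expression are all assumptions made for illustration. It contrasts a per-point routine that branches on the chemistry sub-scheme with a blocked routine in which the branch is folded into a 0/1 mask, so that every loop iteration executes the same instructions and a compiler is free to vectorize the loop.

```c
#include <stddef.h>

/* Hypothetical toy model: scheme 0 uses only a base rate constant,
   scheme 1 adds an extra rate constant. Not the actual CBM-Z chemistry. */

/* Original style: one grid point per call, with a branch on the sub-scheme. */
double loss_rate_scalar(int scheme, double k_base, double k_extra, double conc)
{
    if (scheme == 1) {
        return (k_base + k_extra) * conc;
    }
    return k_base * conc;
}

/* Restructured style: a block of n grid points per call. The branch is
   replaced by a 0/1 mask, so all iterations follow the same instruction
   sequence and the loop body has no control-flow divergence. */
void loss_rate_block(size_t n, const int *scheme, const double *k_base,
                     const double *k_extra, const double *conc, double *rate)
{
    for (size_t i = 0; i < n; ++i) {
        double mask = (scheme[i] == 1) ? 1.0 : 0.0;
        rate[i] = (k_base[i] + mask * k_extra[i]) * conc[i];
    }
}
```

Both routines return the same values for the same inputs; the blocked form trades a small amount of redundant arithmetic (the masked term) for a uniform loop body. Whether a given compiler actually vectorizes such a loop depends on its flags and on the target instruction set.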