
Important Notice: The V2.0.x versions of this dataset have been superseded. A new version, CHM_PRE V2.1, is now available and is recommended for all users (https://doi.org/10.5281/zenodo.14632156). Version 2.1 extends the data coverage to 2024 and incorporates adjusted precipitation values for the southern foothills of the Himalayas.

1. Description

Dataset name: CHM_PRE V2

Summary: The CHM_PRE V2 dataset is a new high-precision, long-term, daily gridded precipitation dataset for the Chinese mainland. Long-term daily observations from 3,476 gauges, together with 11 precipitation-related variables, were used to characterize the correlations of precipitation. The dataset was then developed by combining an improved inverse distance weighting method with the machine learning-based light gradient boosting machine (LGBM) algorithm. CHM_PRE V2 demonstrates strong spatiotemporal consistency with existing gridded precipitation datasets, including CHM_PRE V1, GSMaP, IMERG, PERSIANN-CDR, and GLDAS. Validation against 63,397 high-density gauges confirms its high accuracy in both precipitation values and events. The dataset achieves a mean absolute error of 1.48 mm/day and a Kling-Gupta efficiency coefficient of 0.88. In terms of event detection capability, CHM_PRE V2 achieves a Heidke skill score of 0.68 and a false alarm ratio of 0.24. Overall, CHM_PRE V2 significantly enhances precipitation measurement accuracy and reduces the overestimation of precipitation events, providing a reliable foundation for hydrological modeling and climate assessments.

The CHM_PRE V2 dataset provides daily precipitation data at a resolution of 0.1°, covering the entire Chinese mainland (18°N–54°N, 72°E–136°E). This dataset covers the period 1960–2023 and will be updated annually.
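The validation metrics quoted above (mean absolute error, Kling-Gupta efficiency, Heidke skill score, and false alarm ratio) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the sample values, and the 0.1 mm/day wet-day threshold are assumptions made for illustration, and the Kling-Gupta efficiency is written in its original 2009 form, which may differ from the variant used for this dataset.

```python
import math


def mean_absolute_error(obs, sim):
    """Average absolute difference between gauge and gridded values (mm/day)."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)


def kling_gupta_efficiency(obs, sim):
    """KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2) (2009 form, assumed).

    Undefined when the observations have zero mean or zero variance.
    """
    n = len(obs)
    mu_o, mu_s = sum(obs) / n, sum(sim) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def contingency(obs, sim, threshold=0.1):
    """Count hits, false alarms, misses, and correct negatives.

    A day counts as a precipitation event when the value is >= threshold
    (0.1 mm/day here is an assumed threshold, not taken from the dataset).
    """
    hits = false_alarms = misses = correct_negatives = 0
    for o, s in zip(obs, sim):
        obs_wet, sim_wet = o >= threshold, s >= threshold
        if obs_wet and sim_wet:
            hits += 1
        elif sim_wet:
            false_alarms += 1
        elif obs_wet:
            misses += 1
        else:
            correct_negatives += 1
    return hits, false_alarms, misses, correct_negatives


def false_alarm_ratio(hits, false_alarms, misses, correct_negatives):
    """Fraction of gridded events that the gauge did not observe."""
    return false_alarms / (hits + false_alarms)


def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """Event-detection skill relative to random chance (1 = perfect)."""
    numerator = 2 * (hits * correct_negatives - false_alarms * misses)
    denominator = ((hits + misses) * (misses + correct_negatives)
                   + (hits + false_alarms) * (false_alarms + correct_negatives))
    return numerator / denominator


# Made-up daily values (mm/day) for one gauge and its co-located grid cell
gauge = [0.0, 5.0, 0.0, 12.0, 0.3, 0.0]
grid = [0.2, 4.0, 0.0, 10.0, 0.0, 0.0]

table = contingency(gauge, grid)
print("MAE:", mean_absolute_error(gauge, grid))
print("KGE:", kling_gupta_efficiency(gauge, grid))
print("HSS:", heidke_skill_score(*table))
print("FAR:", false_alarm_ratio(*table))
```

A lower false alarm ratio means fewer gridded wet days that the gauges did not record, which is the sense in which the summary refers to reduced overestimation of precipitation events.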
The daily precipitation data are provided in NetCDF format. For the convenience of users, annual and monthly total precipitation data are also provided in both NetCDF and GeoTIFF formats.

Latest version: Version 2.0 (January 2025)

2. Content of the dataset

This dataset comprises the following four types of data:
(1) Metadata for CHM_PRE V2: This document provides detailed information about the dataset.
(2) CHM_PRE_V2_daily_{YEAR}.nc: Daily precipitation data in NetCDF format, organized into one file per year.
(3) CHM_PRE_V2_annual.nc and CHM_PRE_V2_annual.tif: Annual total precipitation data in NetCDF and GeoTIFF formats.
(4) CHM_PRE_V2_monthly.nc and CHM_PRE_V2_monthly.tif: Monthly total precipitation data in NetCDF and GeoTIFF formats.

3. Details of the variables in the file

Each NetCDF file contains the following four variables:
(1) lat: Latitude dimension, measured in degrees (°).
(2) lon: Longitude dimension, measured in degrees (°).
(3) time: Time dimension, measured in days since January 1, 1960.
(4) prec: Precipitation variable with dimensions (time, lat, lon). The unit is mm/day for daily values, mm/month for monthly values, and mm/year for annual values. Missing values are represented as NaN.

All GeoTIFF files use the WGS84 geographic coordinate system, with missing values set to -9999. In the annual precipitation GeoTIFF file, each band holds the total precipitation for one year. The bands are ordered sequentially: the first band corresponds to 1960, the second to 1961, and so on. The monthly precipitation file is structured in a similar way.

4. Resolution and Data Range

Resolution: 0.1°.
Time frame: January 1, 1960, to December 31, 2023 (with annual updates to follow).
Spatial extent: 18°N–54°N, 72°E–136°E, as listed below.
North: 54°N
West: 72°E
East: 136°E
South: 18°N

5. Examples of utilization

This dataset can be read with any tool that supports the NetCDF and GeoTIFF formats. Below is an example of how to read this dataset using the Python programming language:

```python
from pathlib import Path

import rioxarray  # noqa: F401  (provides the "rasterio" backend used for GeoTIFF files)
import xarray as xr

# Open a single NetCDF file
path_nc_monthly = 'CHM_PRE_V2_monthly.nc'
ds_monthly = xr.open_dataset(path_nc_monthly)

# Open a single GeoTIFF file (requires rioxarray to be installed)
path_tiff_annual = 'CHM_PRE_V2_annual.tif'
ds_annual = xr.open_dataset(path_tiff_annual, engine='rasterio')

# Open multiple NetCDF files as one dataset (requires dask to be installed)
dir_nc_daily = Path('daily')
# Get the paths of all daily value files as a sorted list
list_path_nc_daily = sorted(dir_nc_daily.glob('CHM_PRE_V2_daily_*.nc'))
ds_daily = xr.open_mfdataset(
    list_path_nc_daily,  # Pass the list of file paths
    # Enable dask chunking, which reduces memory usage and allows
    # processing of very large NetCDF files on personal computers
    chunks='auto',
)
```

6. References

1. Hu, J., Miao, C., Su, J., Zhang, Q., Gou, J., & Sun, Q. (2025). An upgraded high-precision gridded precipitation dataset for the Chinese mainland considering spatial autocorrelation and covariates. Earth System Science Data Discussions [preprint, in review]. https://doi.org/10.5194/essd-2025-20
2. Zhang, Q., Miao, C., Su, J., Gou, J., Hu, J., Zhao, X., & Xu, Y. (2025). A new high-resolution multi-drought-index dataset for mainland China. Earth System Science Data, 17(3), 837–853. https://doi.org/10.5194/essd-17-837-2025
3. Han, J., Miao, C., Gou, J., Zheng, H., Zhang, Q., & Guo, X. (2023). A new daily gridded precipitation dataset for the Chinese mainland based on gauge observations. Earth System Science Data, 15(7), 3147–3161. https://doi.org/10.5194/essd-15-3147-2023

7. Authors and contacts

Jinlong Hu (hujl98@mail.bnu.edu.cn)
Chiyuan Miao (miaocy@bnu.edu.cn)
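As a complement to the reading example in Section 5, the conventions described in Section 3 (a time axis in days since January 1, 1960, one GeoTIFF band per year starting at 1960, and -9999 as the GeoTIFF missing value) can be sketched as a few standard-library helpers. The function names are hypothetical. The 1-based band numbering follows the usual GDAL convention, and the monthly band order (band 1 = January 1960) is an assumption inferred from the statement that the monthly file is structured like the annual one.

```python
import math
from datetime import date, timedelta

FIRST_YEAR = 1960        # first year covered by the dataset
EPOCH = date(1960, 1, 1)  # origin of the NetCDF time axis
GEOTIFF_NODATA = -9999    # missing-value marker in the GeoTIFF files


def time_to_date(days_since_epoch):
    """Convert a raw NetCDF time value (whole days since 1960-01-01) to a date."""
    return EPOCH + timedelta(days=days_since_epoch)


def annual_band_to_year(band):
    """Map a 1-based band number of the annual GeoTIFF to its year."""
    return FIRST_YEAR + band - 1


def monthly_band_to_year_month(band):
    """Map a 1-based band number of the monthly GeoTIFF to (year, month).

    Assumes band 1 is January 1960 and months follow in calendar order.
    """
    index = band - 1
    return FIRST_YEAR + index // 12, index % 12 + 1


def mask_nodata(values):
    """Replace the GeoTIFF missing-value marker with NaN."""
    return [math.nan if v == GEOTIFF_NODATA else v for v in values]


print(time_to_date(366))               # 1960 is a leap year
print(annual_band_to_year(64))         # last year of the 1960-2023 record
print(monthly_band_to_year_month(13))  # first band of the second year
print(mask_nodata([812.5, -9999, 0.0]))
```

Libraries such as xarray normally decode the time axis and mask missing values automatically when a file is opened, so helpers like these are mainly useful when working with raw arrays or band indices.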
China, Meteorology, Geophysics, Daily gridded precipitation, Atmospheric science, CHM_PRE, Precipitation, FOS: Earth and related environmental sciences, CHM, Hydrology, Climate Science
