
This repository provides the dataset accompanying the GEMS-GER (Groundwater Levels, Environment, Meteorology, Site Properties – Germany) benchmark for machine learning-based groundwater modeling. The dataset includes long-term groundwater level time series, meteorological and hydrological forcing data, site-specific environmental properties, and benchmark model evaluation results. All data originate from official public sources and have been harmonized across the 16 German federal states (Bundesländer).

Contents of this repository:

- Groundwater level time series (GEMS-GER_data/dynamic/*.csv): weekly aggregated groundwater levels (GWL) from 3,207 monitoring wells (1991–2022), including:
  - Daily temperature (mean, min, max)
  - Precipitation and humidity (HYRAS/DWD)
  - Real, potential, and reference evapotranspiration
  - Soil moisture and soil temperature (5 m)
  - Snow water equivalent, snowmelt, and runoff (ERA5-Land)
  - GWL_flag indicating observed vs. imputed values
- Site-specific static descriptors (GEMS-GER_data/static/static_features.csv):
  - Hydrogeology and soil type
  - Land use and climate classification
  - Elevation and derived topographic parameters (e.g. slope, TWI)
- Benchmark model evaluation results (GEMS-GER_data/static/model_performance.csv):
  - NSE, RMSE, R², and Bias scores for ML models applied to each well
- Pre-generated time series plots (GEMS-GER_data_figures/*.pdf):
  - Visualizations of groundwater levels and selected forcing variables for all wells
  - Provided separately to reduce the size of the main dataset download

Directory structure:

GEMS-GER_data/
├── dynamic/                    # 3,207 individual CSV files, one per well
│   ├── MW_1.csv
│   ├── MW_2.csv
│   └── ...
├── static/
│   ├── static_features.csv     # Site-specific static descriptors (e.g. geology, land use, climate)
│   └── model_performance.csv   # ML model evaluation metrics (NSE, RMSE, R², Bias)
├── license_information.txt     # Licensing details for groundwater level data from federal state sources
└── README.md                   # Dataset description and usage notes

GEMS-GER_data_figures/
├── DYN_Feat_MW_1.pdf
├── DYN_Feat_MW_2.pdf
└── ...

The dataset is intended for research and benchmarking in hydrogeology, data-driven groundwater modeling, and environmental machine learning. It forms the basis of the GEMS-GER benchmark, as described in the associated preprint. All data originate from public sources and have been harmonized across administrative and institutional boundaries to enable consistent large-scale analysis.
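
As a starting point for working with the per-well files, the following is a minimal sketch of how the dynamic CSVs might be enumerated and read using only the Python standard library. It is not part of the dataset. The function names (`list_well_files`, `read_rows`, `flag_counts`) are hypothetical, and the sketch assumes comma-delimited, UTF-8 encoded files with a header row that contains a column named `GWL_flag`; check the README.md shipped with the data for the actual column names and flag values.

```python
import csv
import re
from collections import Counter
from pathlib import Path


def list_well_files(data_root):
    """Return the MW_<n>.csv files under <data_root>/dynamic, sorted by well number."""
    dynamic_dir = Path(data_root) / "dynamic"
    pattern = re.compile(r"MW_(\d+)\.csv")
    numbered = []
    for path in dynamic_dir.glob("MW_*.csv"):
        match = pattern.fullmatch(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    # Sort numerically so that MW_10.csv follows MW_2.csv.
    numbered.sort(key=lambda item: item[0])
    return [path for _, path in numbered]


def read_rows(csv_path, delimiter=","):
    """Read one CSV file into a list of dicts keyed by the header row.

    The comma delimiter and UTF-8 encoding are assumptions.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def flag_counts(rows, flag_column="GWL_flag"):
    """Count how often each value of the flag column occurs.

    The column name is taken from the dataset description; no assumption
    is made here about which values mean observed or imputed.
    """
    return Counter(row[flag_column] for row in rows)
```

For example, `flag_counts(read_rows(list_well_files("GEMS-GER_data")[0]))` would tally the flag values for the lowest-numbered well, which gives a quick impression of how much of a series is observed rather than imputed before it is used for training.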
Keywords: groundwater levels, hydrogeology, environmental data, Germany, machine learning benchmark, meteorological forcing, groundwater monitoring, geospatial dataset, hydrology, time series
