Publication: Article, Other literature type

Date: 17 Jun 2024

Language: English

Publisher: Springer Science and Business Media LLC

Journal: Environmental Monitoring and Assessment, volume 196 (issn: 0167-6369, eissn: 1573-2959,

Authors: Alvin Lal; Ashneel Sharan; Krishneel K. Sharma; Anant Ram; Dilip Kumar Roy; Bithin Datta

doi: 10.1007/s10661-024-12794-w, 10.60692/w3h9w-wm668, 10.60692/e9npx-wec93

pmid: 38880864

pmc: PMC11519171

handle: 1959.13/1507127

Scrutinizing different predictive modeling validation methodologies and data-partitioning strategies: new insights using groundwater modeling case study


Abstract

Groundwater salinity is a critical factor affecting water quality and ecosystem health, with implications for sectors including agriculture, industry, and public health. Hence, the reliability and accuracy of groundwater salinity predictive models are paramount for effective decision-making in groundwater resource management. This study validates a model for forecasting groundwater salinity levels using three validation methodologies combined with various data-partitioning strategies. The model, based on the group method of data handling (GMDH), predicts groundwater salinity concentrations in a coastal aquifer system. The three validation methods are the hold-out strategy (with last and random selection), k-fold cross-validation, and the leave-one-out method, each applied with various combinations of data-partitioning strategies. The model's validation results are assessed using statistical indices such as root mean square error (RMSE), mean squared error (MSE), and the coefficient of determination (R²). For monitoring wells 1, 2, and 3, the hold-out (random) strategy with a 40% data partition gave the most accurate predictive model in terms of RMSE. The results also suggest that the GMDH-based models behave differently, and achieve different salinity predictive capabilities, under different validation methodologies and data-partitioning strategies. In general, the results indicate that model validation methodologies and data-partitioning strategies yield different results because they differ in how they partition the data, assess model performance, and handle sources of bias and variance. Therefore, it is important to use them in conjunction to obtain a comprehensive understanding of the groundwater salinity prediction model's behavior and performance.
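The validation schemes named in the abstract differ mainly in how they split the records into training and test indices. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses a hypothetical one-feature least-squares line (`fit_line`) as a stand-in for the GMDH model, treats "40%" as the share of records held out for testing (the abstract does not say which share it refers to), uses contiguous unshuffled folds for k-fold cross-validation, and computes R² as 1 − SS_res/SS_tot. All function names are hypothetical, and the data in the usage block are synthetic.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical stand-in for a GMDH model: 1-D least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(mse(obs, pred))


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """R2 as 1 - SS_res / SS_tot (one common definition; assumed here)."""
    mean = sum(obs) / len(obs)
    ss_tot = sum((o - mean) ** 2 for o in obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - ss_res / ss_tot if ss_tot else float("nan")


def holdout_last(n: int, test_frac: float) -> List[Split]:
    """Hold-out (last): the final test_frac of records form the test set."""
    n_test = max(1, round(n * test_frac))
    return [(list(range(n - n_test)), list(range(n - n_test, n)))]


def holdout_random(n: int, test_frac: float, seed: int = 0) -> List[Split]:
    """Hold-out (random): a seeded random subset forms the test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_frac))
    return [(sorted(idx[n_test:]), sorted(idx[:n_test]))]


def k_fold(n: int, k: int) -> List[Split]:
    """k contiguous folds whose sizes differ by at most one record."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    splits, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def leave_one_out(n: int) -> List[Split]:
    """Leave-one-out is k-fold with k = n (one record per test fold)."""
    return k_fold(n, n)


def evaluate(xs, ys, splits: List[Split]) -> Dict[str, float]:
    """Fit on each train set, pool out-of-sample predictions, then score."""
    obs, pred = [], []
    for train, test in splits:
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        obs += [ys[i] for i in test]
        pred += [model(xs[i]) for i in test]
    return {"RMSE": rmse(obs, pred), "MSE": mse(obs, pred), "R2": r2(obs, pred)}


if __name__ == "__main__":
    # Synthetic illustration only; not data from the study.
    xs = [float(i) for i in range(20)]
    ys = [3.0 * x + 2.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
    schemes = {
        "hold-out (last, 40%)": holdout_last(len(xs), 0.4),
        "hold-out (random, 40%)": holdout_random(len(xs), 0.4),
        "5-fold CV": k_fold(len(xs), 5),
        "leave-one-out": leave_one_out(len(xs)),
    }
    for name, splits in schemes.items():
        print(name, evaluate(xs, ys, splits))
```

This sketch pools the out-of-sample predictions from all folds before scoring, because a single leave-one-out fold contains one record and R² is undefined for it. Averaging per-fold scores is an equally common alternative; which convention the study used is not stated in the abstract.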

Related Organizations

- University of Newcastle Australia (Australia)
- University of the South Pacific (Fiji)
- James Cook University (Australia)
- Bangladesh Agricultural Research Institute (Bangladesh)

Keywords

Salinity, Sustainable Development Goals, 310, FEMWATER, Engineering, Hydrological Modeling using Machine Learning Methods, SDG 6, Groundwater, SDG 3, Physics, Statistics, Groundwater Level Forecasting, Power (physics), group method of data handling (GMDH), machine learning, Reliability (semiconductor), Physical Sciences, Environmental Monitoring, data partitioning strategies, Environmental Engineering, Rainfall-Runoff Modeling, groundwater salinity, Quantum mechanics, Environmental science, Artificial Intelligence, FOS: Mathematics, Inductive Modeling in Scientific Research, Data mining, Research, FOS: Environmental engineering, Predictive modelling, Reproducibility of Results, Coefficient of determination, Models, Theoretical, Computer science, Geotechnical engineering, Groundwater Flow and Transport Modeling, Environmental Science, Computer Science, Mean squared error, Aquifer, Water Pollutants, Chemical, Mathematics

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.