
doi: 10.3390/math13050849
In the context of neural network optimization, this study explores the performance and computational efficiency of learning rate adjustment strategies applied with the Adam and SGD optimizers. The methods evaluated include exponential annealing, step decay, and SHAP-informed adjustments across three datasets: Breast Cancer, Diabetes, and California Housing. The SHAP-informed adjustments integrate feature importance metrics derived from cooperative game theory, either scaling the global learning rate or directly modifying the gradients of first-layer parameters. A comprehensive grid search was conducted to optimize the hyperparameters, and performance was assessed using metrics such as test loss, RMSE, R² score, accuracy, and training time. Results revealed that while step decay consistently delivered strong performance across datasets, the SHAP-informed methods often demonstrated even higher accuracy and generalization; for example, SHAP achieved the lowest test loss and RMSE on the California Housing dataset. However, the computational overhead of the SHAP-based approaches was significant, particularly for the targeted gradient adjustments. This study highlights the potential of SHAP-informed methods to guide optimization through feature-level insights, offering advantages on data with complex feature interactions. Despite the computational challenges, these methods provide a foundation for exploring how feature importance can inform neural network training, and they point to promising directions for future research on scalable and efficient optimization techniques.
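The two SHAP-informed mechanisms described in the abstract — scaling the global learning rate and re-weighting first-layer gradients by feature importance — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the aggregation of SHAP values into per-feature importances, the `alpha` sensitivity parameter, and the mean-normalized re-weighting scheme are all assumptions made for the sketch, and the SHAP values themselves are taken as a precomputed array.

```python
import numpy as np

def scale_learning_rate(base_lr, shap_values, alpha=0.5):
    """Scale a global learning rate by aggregate SHAP importance.

    shap_values: (n_samples, n_features) array of SHAP values.
    alpha: hypothetical sensitivity knob (not from the paper); alpha=0
           recovers the unmodified learning rate.
    """
    importance = np.abs(shap_values).mean(axis=0)  # mean |SHAP| per feature
    # Concentration of importance, normalized into (0, 1].
    concentration = importance.sum() / (importance.max() * importance.size)
    return base_lr * (1.0 + alpha * concentration)

def scale_first_layer_grads(grad_w1, shap_values):
    """Re-weight first-layer weight gradients by per-feature importance.

    grad_w1: (n_features, n_hidden) gradient of the first layer's weights,
             so each row corresponds to one input feature.
    """
    importance = np.abs(shap_values).mean(axis=0)
    # Mean-1 scaling: important features get larger effective steps,
    # unimportant ones smaller, leaving the average step size unchanged.
    weights = importance / importance.mean()
    return grad_w1 * weights[:, None]
```

In a training loop, `scale_learning_rate` would replace a fixed schedule for all parameters, while `scale_first_layer_grads` would be applied only to the input layer's gradient before the optimizer step — which matches the abstract's note that the targeted variant carries the larger overhead, since SHAP values must be kept current during training.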
Adam optimizer, SHAP, learning rate adjustments, neural networks, grid search, performance evaluation
| indicator | description | value |
| --- | --- | --- |
| selected citations | Citations derived from selected sources; an alternative to the "influence" indicator below. | 3 |
| popularity | Reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | Reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | Reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
