
The widespread use of neural networks and their increasing complexity necessitate effective training algorithms to optimize their performance. While second-order methods like Scaled Conjugate Gradient (SCG) offer potential benefits by utilizing curvature information, standard SCG scales poorly with large datasets typical in modern deep learning. This thesis tackles the problem of adapting SCG for training deep neural networks on large datasets. We investigate SCG's behavior on benchmark tasks, identifying its strengths and limitations. Based on this analysis, we propose Mini-Batch SCG (MBSCG) and two training techniques, Reused-Batch and Batch-Overlap, designed to enhance scalability and convergence. Comparative experiments against Adam on MNIST, CIFAR-100, and SST-2 demonstrate the viability of our approach. Furthermore, interpretability studies reveal that SCG-based methods can induce distinct learned representations compared to Adam.
explainable artificial intelligence|XAI|conjugate gradients|scaled conjugate gradients|knowledge representation
