publication . Preprint . 2017

Theoretical properties of the global optimizer of two layer neural network

Boob, Digvijay; Lan, Guanghui;
Open Access English
  • Published: 30 Oct 2017
In this paper, we study the problem of optimizing a two-layer artificial neural network that best fits a training dataset. We consider the setting where the number of parameters is greater than the number of sampled points. We show that, for a wide class of differentiable activation functions (this class includes "almost" all functions which are not piecewise linear), first-order optimal solutions are globally optimal provided the hidden layer is non-singular. Our results extend easily from hidden layers given by a square matrix to those given by a flat matrix. The results are applicable even if the network has more than one hidden layer p...
free text keywords: Computer Science - Learning
29 references, page 1 of 2

