publication . Conference object . Preprint . 2015

Gradual DropIn of Layers to Train Very Deep Neural Networks

Smith, Leslie N.; Hand, Emily M.; Doster, Timothy
Open Access
  • Published: 21 Nov 2015
  • Publisher: IEEE
We introduce the concept of dynamically growing a neural network during training. In particular, an otherwise untrainable deep network starts as a trainable shallow network, and new layers are slowly, organically added during training, thereby increasing the network's depth. This is accomplished by a new layer, which we call DropIn. The DropIn layer starts by passing the output from a previous layer (effectively skipping over the newly added layers), then increasingly includes units from the new layers for both feedforward and backpropagation. We show that deep networks, which are untrainable with conventional methods, will converge with DropIn layers interspersed...
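The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear schedule, the per-unit random mask, and all function names (`dropin_probability`, `dropin_mask`, `dropin_forward`, `dropin_backward`) are assumptions made for illustration, and the sketch further assumes the skipped-over input and the new layers' output have the same number of units.

```python
import random


def dropin_probability(step, ramp_steps):
    """Hypothetical linear schedule: 0.0 at step 0, 1.0 from ramp_steps onward."""
    if ramp_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / ramp_steps))


def dropin_mask(n_units, p_new, rng):
    """Per-unit mask: True selects the new layers' unit, False the skipped-over input."""
    return [rng.random() < p_new for _ in range(n_units)]


def dropin_forward(prev_out, new_out, mask):
    """Mix the two inputs unit by unit according to the mask."""
    if not (len(prev_out) == len(new_out) == len(mask)):
        raise ValueError("prev_out, new_out and mask must have the same length")
    return [n if m else p for p, n, m in zip(prev_out, new_out, mask)]


def dropin_backward(grad_out, mask):
    """Route each unit's gradient to the branch that produced it in the forward pass."""
    grad_prev = [0.0 if m else g for g, m in zip(grad_out, mask)]
    grad_new = [g if m else 0.0 for g, m in zip(grad_out, mask)]
    return grad_prev, grad_new


if __name__ == "__main__":
    rng = random.Random(0)
    prev_out = [1.0, 2.0, 3.0, 4.0]      # output of the earlier (shallow) layer
    new_out = [10.0, 20.0, 30.0, 40.0]   # output of the newly added layers
    for step in (0, 50, 100):
        p_new = dropin_probability(step, ramp_steps=100)
        mask = dropin_mask(len(prev_out), p_new, rng)
        print(step, p_new, dropin_forward(prev_out, new_out, mask))
```

With this schedule, the layer passes only the earlier layer's output at step 0 (so the network behaves as the shallow one) and only the new layers' output once the ramp is complete; in between, each unit is drawn from one branch or the other, and the backward pass sends gradients along the same per-unit routes.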
Free text keywords: Architecture, Feed forward, Regularization (mathematics), Computer science, MNIST database, Artificial intelligence, Backpropagation, Deep neural networks, Artificial neural network, Pattern recognition, Computer Science - Neural and Evolutionary Computing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Learning