Publication · Preprint · 2018

Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning

Franceschi, Luca; Grazzi, Riccardo; Pontil, Massimiliano; Salzo, Saverio; Frasconi, Paolo
Open Access · English
  • Published: 13 Jun 2018
Abstract
In Franceschi et al. (2018) we proposed a unified mathematical framework, grounded in bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem in which the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exact problem. In this work we show how to optimize learning rates, automatically weight the losses of individual examples, and learn hyper-representations with Far-HO, a software package based on the popular deep learning framework TensorFlow that makes it possible to seamlessly tackle both hyperparameter optimization (HO) and meta-learning (ML) problems.
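For reference, the bilevel program underlying the framework can be stated compactly. The rendering below is a standard transcription of the formulation in Franceschi et al. (2018), with λ the hyperparameters (outer variables), w the model parameters (inner variables), E the outer objective (e.g. a validation error) and L the inner (training) objective:

```latex
\min_{\lambda} \; f(\lambda) = E(w_\lambda, \lambda)
\quad \text{subject to} \quad
w_\lambda \in \operatorname*{arg\,min}_{w} L(w, \lambda),
```

which is approximated by replacing the exact inner minimizer with T steps of an optimization dynamics Φ (for instance, gradient descent, Φ(w, λ) = w − η ∇_w L(w, λ)):

```latex
\min_{\lambda} \; f_T(\lambda) = E(w_T, \lambda),
\qquad
w_t = \Phi_t(w_{t-1}, \lambda), \quad t = 1, \dots, T.
```

The hypergradient ∇f_T(λ) is then obtained by differentiating through the unrolled dynamics, in forward or reverse mode (Franceschi et al., 2017; Maclaurin et al., 2015). As a concrete illustration of this unrolling idea — not of the Far-HO API itself — the following minimal sketch in plain TensorFlow 2 computes the hypergradient of a validation loss with respect to an L2 regularization weight; the toy data, variable names, and step sizes are all hypothetical:

```python
import tensorflow as tf

# Hypothetical toy regression data (training and validation splits).
x_tr, y_tr = tf.random.normal([32, 5]), tf.random.normal([32, 1])
x_val, y_val = tf.random.normal([16, 5]), tf.random.normal([16, 1])

lam = tf.Variable(0.1)  # outer variable: L2 regularization weight
eta, T = 0.05, 10       # inner step size and number of inner iterations

def inner_loss(w, lam):  # inner objective L(w, lambda)
    return tf.reduce_mean((x_tr @ w - y_tr) ** 2) + lam * tf.reduce_sum(w ** 2)

def outer_loss(w):       # outer objective E(w)
    return tf.reduce_mean((x_val @ w - y_val) ** 2)

with tf.GradientTape() as outer_tape:  # records how w_T depends on lam
    w = tf.zeros([5, 1])
    for _ in range(T):                 # unroll T inner gradient steps
        with tf.GradientTape() as inner_tape:
            inner_tape.watch(w)
            L = inner_loss(w, lam)
        # Functional (non in-place) update keeps the graph differentiable.
        w = w - eta * inner_tape.gradient(L, w)
    E = outer_loss(w)

hypergrad = outer_tape.gradient(E, lam)  # dE/dlam through the unrolled dynamics
lam.assign_sub(0.01 * hypergrad)         # one hypergradient descent step on lam
print(float(E), float(hypergrad))
```

Far-HO wraps this pattern behind higher-level abstractions, so that quantities such as learning rates, per-example loss weights, or shared representation layers can be treated as outer variables without hand-writing the unrolling loop.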
Subjects
free text keywords: Computer Science - Mathematical Software, Computer Science - Learning, Statistics - Machine Learning
References

Jonathan F. Bard. Practical bilevel optimization: algorithms and applications, volume 30. Springer Science & Business Media, 2013.

Jonathan Baxter. Learning internal representations. In Proceedings of the 8th Annual Conference on Computational Learning Theory (COLT), pages 311–320. ACM, 1995.

Rich Caruana. Multitask learning. In Learning to Learn, pages 95–133. Springer, 1998.

Benoît Colson, Patrice Marcotte, and Gilles Savard. An overview of bilevel optimization. Annals of Operations Research, 153(1):235–256, 2007.

Nando de Freitas. Learning to Learn and Compositionality with Deep Recurrent Neural Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

Justin Domke. Generic Methods for Optimization-Based Modeling. In AISTATS, volume 22, pages 318–326, 2012. URL http://www.jmlr.org/proceedings/papers/v22/domke12/domke12.pdf.

Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 1126–1135, 2017. URL http://proceedings.mlr.press/v70/finn17a.html.

Rémi Flamary, Alain Rakotomamonjy, and Gilles Gasso. Learning constrained task similarities in graph-regularized multi-task learning. Regularization, Optimization, Kernels, and Support Vector Machines, page 103, 2014.

Luca Franceschi, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. Forward and reverse gradient-based hyperparameter optimization. In Proceedings of the 34th International Conference on Machine Learning (ICML), pages 1165–1173, 2017. URL http://proceedings.mlr.press/v70/franceschi17a.html.

Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, and Massimiliano Pontil. Bilevel programming for hyperparameter optimization and meta-learning. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.

Andreas Griewank and Andrea Walther. Evaluating derivatives: principles and techniques of algorithmic differentiation. SIAM, 2008.

Frank Hutter, Jörg Lücke, and Lars Schmidt-Thieme. Beyond Manual Tuning of Hyperparameters. KI – Künstliche Intelligenz, 29(4):329–337, November 2015. ISSN 0933-1875, 1610-1987. doi: 10.1007/s13218-015-0381-0. URL http://link.springer.com/10.1007/s13218-015-0381-0.

S. Sathiya Keerthi, Vikas Sindhwani, and Olivier Chapelle. An efficient method for gradient-based adaptation of hyperparameters in SVM models. In Advances in Neural Information Processing Systems (NIPS), pages 673–680, 2007.

G. Kunapuli, K. P. Bennett, Jing Hu, and Jong-Shi Pang. Classification model selection via bilevel programming. Optimization Methods and Software, 23(4):475–489, August 2008. ISSN 1055-6788, 1029-4937. doi: 10.1080/10556780802102586. URL http://www.tandfonline.com/doi/abs/10.1080/10556780802102586.

Dougal Maclaurin, David K. Duvenaud, and Ryan P. Adams. Gradient-based hyperparameter optimization through reversible learning. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 2113–2122, 2015.
