publication . Preprint . Conference object . 2017

Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space

Hernández-Lobato, José Miguel; Requeima, James; Pyzer-Knapp, Edward O.; Aspuru-Guzik, Alan
Open Access English
  • Published: 06 Jun 2017
  • Country: United States
Abstract
Chemical space is so large that brute-force searches for new, interesting molecules are infeasible. High-throughput virtual screening via computer cluster simulations can speed up the discovery process by collecting very large amounts of data in parallel, e.g., up to hundreds or thousands of parallel measurements. Bayesian optimization (BO) can accelerate discovery further by sequentially identifying the most useful simulations or experiments to perform next. However, current BO methods cannot scale to the large numbers of parallel measurements and the massive libraries of molecules currently used in high-throughput screening. Here, we propose a scalable ...
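The abstract is truncated in this record, so the proposed method itself is not shown here. As a rough illustration of the general idea named in the title (Thompson sampling used to pick a batch of parallel measurements from a fixed molecule library), the following is a minimal sketch only. Everything in it is an assumption made for illustration: the Bayesian linear model is a simple stand-in and not the authors' model, the random feature vectors stand in for molecular descriptors, and the names `sample_weights`, `thompson_batch`, and `run_screening` are hypothetical.

```python
import numpy as np


def sample_weights(X, y, rng, noise_var=0.1, prior_var=1.0):
    """Draw one weight vector from the posterior of a Bayesian linear model.

    Assumed stand-in model: y = X @ w + noise, with w ~ N(0, prior_var * I).
    """
    d = X.shape[1]
    precision = np.eye(d) / prior_var + X.T @ X / noise_var
    mean = np.linalg.solve(precision, X.T @ y / noise_var)
    # precision = L @ L.T, so mean + solve(L.T, z) has covariance inv(precision).
    L = np.linalg.cholesky(precision)
    z = rng.standard_normal(d)
    return mean + np.linalg.solve(L.T, z)


def thompson_batch(features, taken, y_taken, batch_size, rng):
    """Choose batch_size not-yet-evaluated library indices.

    Each batch slot uses its own posterior sample, which is what lets the
    slots be filled independently (and hence in parallel) in principle.
    """
    blocked = taken.copy()
    X = features[taken]
    chosen = []
    for _ in range(batch_size):
        w = sample_weights(X, y_taken, rng)
        scores = features @ w
        scores[blocked] = -np.inf  # never pick the same molecule twice
        idx = int(np.argmax(scores))
        blocked[idx] = True
        chosen.append(idx)
    return chosen


def run_screening(n_molecules=500, dim=5, batch_size=10, n_rounds=5, seed=0):
    """Toy screening loop; assumes batch_size * n_rounds <= n_molecules."""
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n_molecules, dim))  # hypothetical descriptors
    target = features @ rng.normal(size=dim)        # stand-in for a simulation
    taken = np.zeros(n_molecules, dtype=bool)
    picked = []
    for _ in range(n_rounds):
        batch = thompson_batch(features, taken, target[taken], batch_size, rng)
        taken[batch] = True  # "run" the whole batch before updating the model
        picked.extend(batch)
    return picked, float(target[taken].max())


if __name__ == "__main__":
    picked, best = run_screening()
    print(len(picked), "molecules evaluated; best value found:", best)
```

The design choice illustrated is that a batch is built from independent posterior samples rather than from a joint acquisition function over the batch, so the cost of selecting a batch grows linearly with the batch size. How the paper itself models the data and distributes the sampling is not recoverable from this record.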
Subjects
free text keywords: Statistics - Machine Learning