Learning 2-Opt Heuristics for Routing Problems via Deep Reinforcement Learning

Publication: Article, Other literature type
Date: 23 Jul 2021
Country: Netherlands
Language: English
Publisher: Springer Science and Business Media LLC
Journal: SN Computer Science, volume 2 (ISSN: 2662-995X, eISSN: 2661-8907)
Funded by: NWO | Real-time data-driven mai...

Authors: Paulo da Costa; Jason Rhuggenaath; Yingqian Zhang; Alp Akcay; Uzay Kaymak

doi: 10.1007/s42979-021-00779-2

Abstract

Recent works using deep learning to solve routing problems such as the traveling salesman problem (TSP) have focused on learning construction heuristics. Such approaches find good quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions faster than previous state-of-the-art deep learning methods for the TSP. We also show we can adapt the proposed method to two extensions of the TSP: the multiple TSP and the Vehicle Routing Problem, achieving results on par with classical heuristics and learned methods.
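
For readers unfamiliar with the operator named in the abstract, the following is a minimal sketch of a 2-opt move applied inside an improvement loop. It is not the authors' implementation: the function names (`tour_length`, `two_opt_move`, `random_policy`, `improve`) are hypothetical, the uniform random index choice is only a stand-in for the learned stochastic policy, and keeping the best tour seen so far is an assumption of this sketch.

```python
import math
import random


def tour_length(coords, tour):
    """Total length of the closed tour visiting coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % n]]) for k in range(n))


def two_opt_move(tour, i, j):
    """Return a new tour with the segment tour[i..j] (inclusive) reversed.

    Reversing the segment removes the two edges entering and leaving it
    and reconnects the tour with two different edges.
    """
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def random_policy(tour, rng):
    """Placeholder for a learned policy: picks two positions uniformly at random."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return i, j


def improve(coords, tour, steps, rng):
    """Apply `steps` 2-opt moves chosen by the policy; return the best tour seen."""
    current = list(tour)
    best, best_len = list(current), tour_length(coords, current)
    for _ in range(steps):
        i, j = random_policy(current, rng)
        current = two_opt_move(current, i, j)  # moves are accepted even if worse
        current_len = tour_length(coords, current)
        if current_len < best_len:
            best, best_len = list(current), current_len
    return best, best_len


if __name__ == "__main__":
    # Four corners of a unit square, visited in an order that crosses itself.
    coords = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    start = [0, 1, 2, 3]
    best, best_len = improve(coords, start, steps=50, rng=random.Random(0))
    print(tour_length(coords, start), best, best_len)
```

In this sketch every proposed move is applied, including ones that lengthen the tour, and only the best tour encountered is returned. A learned policy would replace `random_policy` with a model that scores candidate index pairs given the current tour.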

Country

Netherlands

Keywords

Deep reinforcement learning, Combinatorial optimization, Travelling salesman problem, Vehicle routing problem

Impact by BIP!

Selected citations: 67. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.