Show HN: Per-instance TSP Solver with No Pre-training (1.66% gap on d1291)
OP here. Most Deep Learning approaches for TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn "on the fly" for a specific instance without any priors from other problems. I built a solver using PPO that learns from scratch per instance. It achieved a 1.66% gap on TSPLIB d1291 in about 5.6 hours on a single A100. The Core Idea: My hypothesis was that while optimal solutions are mostly composed of 'minimum edges' (nearest neighbors), the actual difficulty comes from a small number of 'exception edges' outside of that local scope. Instead of pre-training, I designed an inductive bias based on the topological/geometric structure of these exception edges. The agent receives guides on which edges are likely promising based on micro/macro structures, and PPO fills in the gaps through trial and error. It is interesting to see RL reach this level without a dataset. I have open-sourced the code and a Colab notebook for anyone who wants to verify the results or tinker with the 'exception edge' hypothesis. Code & Colab: https://github.com/jivaprime/TSP_exception-edge Happy to answer any questions about the geometric priors or the PPO implementation! Comments URL: https://news.ycombinator.com/item?id=46420670 Points: 12 # Comments: 2
OP here.
Most Deep Learning approaches for TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn "on the fly" for a specific instance without any priors from other problems.
I built a solver using PPO that learns from scratch per instance. It achieved a 1.66% gap on TSPLIB d1291 in about 5.6 hours on a single A100.
The Core Idea: My hypothesis was that while optimal solutions are mostly composed of 'minimum edges' (nearest neighbors), the actual difficulty comes from a small number of 'exception edges' outside of that local scope.
Instead of pre-training, I designed an inductive bias based on the topological/geometric structure of these exception edges. The agent receives guides on which edges are likely promising based on micro/macro structures, and PPO fills in the gaps through trial and error.
It is interesting to see RL reach this level without a dataset. I have open-sourced the code and a Colab notebook for anyone who wants to verify the results or tinker with the 'exception edge' hypothesis.
Code & Colab: