Machine Learning Methods for Solving Combinatorial Optimization Problems
This article reviews recent advances in applying machine learning, in particular attention mechanisms, graph neural networks, and reinforcement learning, to combinatorial optimization. It outlines fundamental problem definitions, classic algorithms, modern ML-based approaches, experimental results, and future research directions.
Overview: Since 2015, machine learning methods for solving combinatorial optimization (ML+CO) have made great progress, with two main research lines: supervised learning and reinforcement learning. The goal is to design ML algorithmic frameworks that can solve multiple combinatorial optimization problems.
Today's presentation covers four points:
Combinatorial optimization problems
Machine learning methods
ML algorithms for solving combinatorial optimization
Future research directions
Guest speaker: Prof. Cai Qingqiong, Associate Professor, School of Computer Science, Nankai University.
01 Combinatorial Optimization Problems
Combinatorial Optimization Problems (COP) are discrete‑state optimization problems with many real‑world applications such as communication networks, chip design, flight routing, and data‑center management.
The Traveling Salesman Problem (TSP) is a classic example: given a set of cities, find a tour that visits each city exactly once and returns to the start with minimal total distance or cost.
1. Mathematical Model of COP
Combinatorial optimization can be expressed as a discrete optimization model. For example, the TSP on a complete graph G = (V, E) with vertex set V = {0,1,…,n‑1} and edge weights ω: E → ℚ seeks a permutation that minimizes the sum of adjacent edge weights.
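The TSP objective above can be sketched numerically. Assuming a symmetric distance matrix and a permutation of the vertices (the function name tour_cost and the unit-square example are illustrative, not from the talk):

```python
import numpy as np

def tour_cost(dist, perm):
    """Total weight of the closed tour that visits vertices in the
    order given by `perm` and returns to the start (the TSP objective)."""
    n = len(perm)
    return sum(dist[perm[i], perm[(i + 1) % n]] for i in range(n))

# Four cities on the corners of a unit square.
pts = np.array([[0, 0], [0, 1], [1, 1], [1, 0]], dtype=float)
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

print(tour_cost(dist, [0, 1, 2, 3]))  # perimeter tour: 4.0
print(tour_cost(dist, [0, 2, 1, 3]))  # crossing tour is longer
```

Solving the TSP exactly means minimizing this quantity over all (n-1)!/2 distinct tours, which is what makes the problem hard.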
2. Classification of COP
According to computational complexity theory, problems are classified as P, NP, NP‑complete (NPC), and NP‑hard:
P: solvable in polynomial time by deterministic algorithms.
NP: solutions can be verified in polynomial time.
NPC: an NP problem to which all other NP problems can be reduced in polynomial time.
NP‑hard: at least as hard as NP problems, but not necessarily in NP.
Many COPs are NP‑hard, such as TSP, Minimum Vertex Cover (MVC), etc.
3. Basic COP Examples
Knapsack Problem (KP): select items with given weights and values to maximize total value without exceeding the capacity.
Traveling Salesman Problem (TSP): find the shortest Hamiltonian circuit visiting each city exactly once.
Vehicle Routing Problem (VRP): design routes for a fleet to satisfy customer demands while minimizing distance or cost.
Minimum Vertex Cover (MVC): find the smallest set of vertices covering all edges.
Maximum Independent Set (MIS): find the largest set of mutually non‑adjacent vertices.
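Of these, the knapsack problem has a well-known exact dynamic program when capacities are integral; a minimal sketch (the function name knapsack is illustrative):

```python
def knapsack(weights, values, capacity):
    """Classic 0/1 knapsack DP: dp[c] = best total value achievable
    with remaining capacity c."""
    dp = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp[capacity]

# Capacity 6: taking the items of weight 2 and 4 gives value 3 + 6 = 9.
print(knapsack([2, 3, 4], [3, 4, 6], 6))  # 9
```

This runs in O(n * capacity) time, which is pseudo-polynomial: it is efficient for small integer capacities but does not contradict the NP-hardness of the problem.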
4. Traditional Algorithms for COP
Exact, approximation, and heuristic algorithms each have strengths and share three common challenges: (1) algorithms designed for one COP rarely transfer to others; (2) each instance requires re‑running the algorithm from scratch; (3) problem scales keep growing, making manual algorithm design increasingly difficult.
5. Machine Learning for Combinatorial Optimization (ML+CO)
Recent successes of machine learning and deep learning motivate designing ML frameworks to address the three challenges of traditional COP algorithms. Advantages of ML for COP include:
Automatic learning of patterns from large data, providing approximate solutions without extensive hand‑crafted heuristics.
Potential to discover useful properties that are hard for human designers to find.
Ability to generalize across multiple COPs (e.g., S2V‑DQN solves TSP, MVC, MaxCut), whereas classic algorithms are usually problem‑specific.
ML methods for COP date back to the 1980s (Hopfield networks). In the past five or six years, two main streams have emerged: supervised learning and reinforcement learning.
02 Machine Learning Methods
To apply ML to COP we need basic techniques such as attention mechanisms, graph neural networks (GNN), and reinforcement learning.
1. Attention Mechanism
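The slides for this part are not reproduced here. The core computation, scaled dot-product attention, can be sketched as follows (all variable names are illustrative):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # query/key compatibility
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # rows sum to 1
    return w @ V, w                                # weighted mix of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries of dimension 4
K = rng.normal(size=(5, 4))   # 5 keys
V = rng.normal(size=(5, 3))   # 5 values of dimension 3
out, w = attention(Q, K, V)
print(out.shape)  # (2, 3)
```

In pointer-network-style models for COP, the attention weights themselves are used as a probability distribution over the next city or item to select.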
2. Graph Neural Networks
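A generic message-passing layer (a simplification, not structure2vec specifically; all names are illustrative) aggregates each vertex's neighbourhood via the adjacency matrix:

```python
import numpy as np

def gnn_layer(A, H, W_self, W_nbr):
    """One message-passing layer: each vertex sums its neighbours'
    features, mixes them with its own, then applies ReLU."""
    msgs = A @ H  # neighbour aggregation via the adjacency matrix
    return np.maximum(0.0, H @ W_self + msgs @ W_nbr)

# Triangle graph with one extra pendant vertex.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(1)
H = rng.normal(size=(4, 8))                  # initial vertex features
W_self = rng.normal(size=(8, 8))
W_nbr = rng.normal(size=(8, 8))
H2 = gnn_layer(A, H, W_self, W_nbr)
print(H2.shape)  # (4, 8)
```

Stacking k such layers lets each vertex's embedding depend on its k-hop neighbourhood, which is what allows GNN-based COP models to score vertices by graph structure.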
3. (Deep) Reinforcement Learning
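The policy-gradient idea used throughout the RL-based COP literature (REINFORCE) can be shown on a toy two-action problem; this is a sketch under simplified assumptions, not any specific algorithm from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                  # policy logits over two actions
true_reward = np.array([0.2, 0.8])   # action 1 is better on average

for _ in range(2000):
    p = np.exp(theta - theta.max()); p /= p.sum()   # softmax policy
    a = rng.choice(2, p=p)                          # sample an action
    r = rng.normal(true_reward[a], 0.1)             # noisy reward
    grad = -p; grad[a] += 1.0                       # grad of log pi(a | theta)
    theta += 0.05 * r * grad                        # REINFORCE update

p = np.exp(theta - theta.max()); p /= p.sum()
print(p)  # probability mass shifts toward the better action
```

In ML+CO, the "action" is choosing the next city or vertex, the "reward" is the negative tour length or cover size, and a baseline (critic or rollout) is subtracted from r to reduce the variance of this update.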
03 ML Algorithms for Solving Combinatorial Optimization
End‑to‑End vs. Hybrid Approaches
End‑to‑end models directly map a COP instance to a solution, offering fast inference and strong generalization but no guarantee of optimality. Hybrid methods embed ML components (e.g., branch‑and‑bound node selection) within traditional solvers, achieving better solution quality at the cost of slower inference.
Algorithms Based on Attention
2015: Vinyals et al. introduced the Pointer Network (Algorithm A1) using supervised learning. It generalizes well and runs fast, but solution quality cannot exceed that of the training samples.
2017: Bello et al. proposed Algorithm A2, training the Pointer Network with reinforcement learning (REINFORCE) and a critic baseline to reduce variance.
2019: Kool et al. presented Algorithm A3, adding a rollout baseline that selects the best historical policy as a reference, yielding superior performance close to exact solvers.
Experimental results (reporting inference time only) show that Algorithm A3 outperforms A1 and A2 and approaches the solution quality of traditional solvers.
Algorithms Based on Graph Neural Networks
2017: Dai et al. introduced Algorithm B1 (S2V‑DQN, "Learning Combinatorial Optimization Algorithms over Graphs"), combining structure2vec graph embeddings with deep Q‑learning.
2018: Li et al. proposed Algorithm B2, combining graph convolutional networks with guided tree search.
Visualization of S2V‑DQN solving an MVC instance shows the model progressively selects high‑degree vertices while preserving graph connectivity, achieving better coverage than naive heuristics.
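The degree-driven behaviour described above can be mimicked by the classic greedy baseline that S2V-DQN is typically compared against; a sketch (the function name is illustrative):

```python
def greedy_vertex_cover(edges):
    """Greedy MVC baseline: repeatedly pick the vertex incident to the
    most uncovered edges. S2V-DQN learns a similar but instance-adaptive
    selection rule from its graph embeddings."""
    uncovered = set(map(frozenset, edges))
    cover = set()
    while uncovered:
        # Count uncovered edges incident to each vertex.
        deg = {}
        for e in uncovered:
            for v in e:
                deg[v] = deg.get(v, 0) + 1
        best = max(deg, key=deg.get)   # highest-degree vertex first
        cover.add(best)
        uncovered = {e for e in uncovered if best not in e}
    return cover

# Star centred at 0 plus the edge (1, 2): the greedy rule picks 0 first.
print(greedy_vertex_cover([(0, 1), (0, 2), (0, 3), (1, 2)]))
```

The learned policy differs from this fixed rule in that its vertex scores also reflect the evolving global state of the instance, which is where its quality advantage comes from.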
04 Future Research Directions
Improve model performance: solution quality, solving speed, and generalization; better integration of attention and GNN; tighter coupling of ML with traditional OR algorithms; develop more effective reinforcement‑learning strategies.
Solve more complex COPs: multi‑objective, multi‑constraint, and dynamically changing problems.
Enhance model interpretability.
Q&A
Q: How to choose a reinforcement‑learning strategy?
A: There is no single answer; it depends on the specific problem. Practitioners should try different strategies and adopt the one that works best for their case.
Thank you for attending.
About DataFun – A media platform focused on big data and AI applications, founded in 2017, with over 100 offline and online events, 2000+ expert speakers, and 800+ original articles.