Reinforcement Learning Based Neural Architecture Search: Methods and Advances
This article reviews reinforcement‑learning‑driven neural architecture search, covering layer‑based, block‑based, and connection‑based strategies, as well as advanced techniques such as inverse reinforcement learning, graph hyper‑networks, Monte‑Carlo tree search, and knowledge‑distillation‑based model compression.
Neural architecture search (NAS) using reinforcement learning treats architecture generation as a sequential decision‑making problem where an agent selects actions (e.g., layer types, connections) and receives rewards based on validation performance. Recent breakthroughs have applied this paradigm to various search spaces, including layer‑level, block‑level, and connection‑level designs.
1. Basic search methods
1.1 Layer‑based search : Early work on RL‑based NAS uses a recurrent neural network (RNN) controller, trained with policy gradients, to predict each layer's hyperparameters in sequence, while MetaQNN instead frames layer‑by‑layer selection as a Q‑learning problem. NASNet narrows the search space to two reusable cells (Normal and Reduction) that are stacked to form scalable networks. In each case the agent iteratively updates its policy by rewarding architectures that achieve higher accuracy on a validation set.
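The controller‑plus‑policy‑gradient loop above can be sketched in a few lines. This is a toy illustration, not any paper's actual implementation: the "controller" is reduced to independent per‑layer logits (standing in for an RNN), the op vocabulary and the `proxy_reward` function (which fakes validation accuracy by favoring one op) are invented for the example, and training uses plain REINFORCE with a moving‑average baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical search space: choose one of 3 ops for each of 4 layers.
OPS = ["conv3x3", "conv5x5", "maxpool"]
N_LAYERS = 4

# Controller: per-layer logits (a stand-in for an RNN controller's output).
logits = np.zeros((N_LAYERS, len(OPS)))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_architecture():
    """Sample an op index per layer from the current policy."""
    return [rng.choice(len(OPS), p=softmax(logits[l])) for l in range(N_LAYERS)]

def proxy_reward(arch):
    """Toy stand-in for validation accuracy: favors conv3x3 (index 0)."""
    return arch.count(0) / N_LAYERS

# REINFORCE with a moving-average baseline to reduce variance.
baseline, lr = 0.0, 0.5
for step in range(300):
    arch = sample_architecture()
    r = proxy_reward(arch)
    baseline = 0.9 * baseline + 0.1 * r
    adv = r - baseline
    for l, a in enumerate(arch):
        p = softmax(logits[l])
        grad = -p
        grad[a] += 1.0              # gradient of log p(a) w.r.t. the logits
        logits[l] += lr * adv * grad

best = [OPS[int(np.argmax(logits[l]))] for l in range(N_LAYERS)]
print(best)
```

After a few hundred samples the policy concentrates on the op the proxy reward favors; in real NAS each `proxy_reward` call is a full (and expensive) child-network training run, which is exactly the cost the advanced methods in Section 2 try to reduce.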
1.2 Block‑based search : Block‑QNN and Faster Block‑QNN encode network blocks with a Network Structure Code (NSC) consisting of layer index, operation type, kernel size, and predecessor indices. A Q‑learning agent samples NSC vectors, assembles blocks, and evaluates them, while a distributed asynchronous framework accelerates training across multiple GPUs.
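A minimal sketch of the Q‑learning side of this scheme follows. It is a simplification made up for illustration, not Block‑QNN's actual code: the NSC is reduced to an (op type, kernel size) pair with predecessors fixed to the previous layer so the agent stays tabular, the reward is a toy proxy for validation accuracy, and the delayed terminal reward is propagated with a Monte‑Carlo‑style update rather than the paper's exact Bellman backup.

```python
import random

random.seed(0)

# Hypothetical NSC vocabulary: (op_type, kernel_size) per layer slot.
ACTIONS = [("conv", 1), ("conv", 3), ("conv", 5), ("pool", 2)]
DEPTH = 3

Q = {(s, a): 0.0 for s in range(DEPTH) for a in range(len(ACTIONS))}
alpha, eps = 0.2, 0.3

def proxy_reward(block):
    """Toy stand-in for validation accuracy of the assembled block."""
    return sum(1.0 for op, k in block if op == "conv" and k == 3) / DEPTH

for episode in range(500):
    block, trajectory = [], []
    for s in range(DEPTH):                      # sample one NSC per slot
        if random.random() < eps:               # epsilon-greedy exploration
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda a: Q[(s, a)])
        trajectory.append((s, a))
        block.append(ACTIONS[a])
    r = proxy_reward(block)     # reward arrives only after the block is built
    for s, a in trajectory:     # Monte-Carlo update with the delayed reward
        Q[(s, a)] += alpha * (r - Q[(s, a)])

greedy = [ACTIONS[max(range(len(ACTIONS)), key=lambda a: Q[(s, a)])]
          for s in range(DEPTH)]
print(greedy)
```

In the distributed asynchronous setting described above, many such episodes run in parallel: workers evaluate sampled blocks on separate GPUs and send rewards back to a shared Q‑table or replay buffer.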
1.3 Connection‑based search : Methods such as MaskConnect learn binary masks over candidate connections, while IRLAS shapes the reward with a mirror‑stimuli term derived from an expert network, discovering connectivity patterns beyond traditional sequential stacking and encouraging topologies similar to expert‑designed networks such as ResNet.
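The binary‑mask idea can be sketched with a Bernoulli policy over every candidate connection. This toy example is invented for illustration (MaskConnect learns masks jointly with network weights by gradient descent; here the mask distribution is trained with REINFORCE against a made‑up `proxy_reward` that prefers short, ResNet‑like skip spans):

```python
import numpy as np

rng = np.random.default_rng(1)

N = 5  # layers; candidate connections are all pairs j -> i with j < i
pairs = [(j, i) for i in range(N) for j in range(i)]
theta = np.zeros(len(pairs))    # logits of Bernoulli keep-probabilities

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def proxy_reward(mask):
    """Toy stand-in: rewards ResNet-like masks (connection span <= 2)."""
    good = sum(m for m, (j, i) in zip(mask, pairs) if i - j <= 2)
    bad = sum(m for m, (j, i) in zip(mask, pairs) if i - j > 2)
    return (good - bad) / len(pairs)

baseline, lr = 0.0, 1.0
for step in range(600):
    p = sigmoid(theta)
    mask = (rng.random(len(pairs)) < p).astype(float)   # sample a topology
    r = proxy_reward(mask)
    baseline = 0.9 * baseline + 0.1 * r
    theta += lr * (r - baseline) * (mask - p)  # REINFORCE for Bernoulli masks

learned = sigmoid(theta)
keep = [pairs[k] for k in range(len(pairs)) if learned[k] > 0.5]
print(keep)
```

The learned keep‑probabilities separate short‑range from long‑range connections, mirroring how connection‑based search discovers structured topologies rather than dense all‑to‑all wiring.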
2. Advanced search methods
2.1 Inverse reinforcement learning (IRL) : IRLAS treats architecture design as an imitation‑learning problem, extracting reward functions from expert networks and guiding the agent to generate topologically similar architectures.
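The shaped reward can be sketched as accuracy plus a similarity term toward the expert topology. Everything concrete here is an assumption for illustration: the three topology features, the ResNet‑like expert statistics, the cosine similarity, and the weight `lam` are invented, and IRLAS's actual mirror‑stimuli function is learned rather than hand‑written.

```python
import numpy as np

# Hypothetical topology features: [depth, #skip-connections, avg kernel size].
expert = np.array([50.0, 16.0, 3.0])   # made-up ResNet-50-like statistics

def mirror_reward(arch_feats, acc, lam=0.2):
    """IRL-style shaped reward (sketch): validation accuracy plus a
    similarity term extracted from the expert network's topology."""
    sim = expert @ arch_feats / (
        np.linalg.norm(expert) * np.linalg.norm(arch_feats))
    return acc + lam * sim

# Two candidates with equal accuracy: the expert-like one scores higher.
print(mirror_reward(np.array([48.0, 14.0, 3.0]), acc=0.90))
print(mirror_reward(np.array([3.0, 0.0, 50.0]), acc=0.90))
```

At equal accuracy the agent is nudged toward expert‑like structure, which is the imitation‑learning intuition behind extracting a reward from expert networks instead of hand‑designing it.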
2.2 Graph Hyper‑Network (GHN) : GHN combines a graph neural network with a hyper‑network to directly predict weights for a given architecture, dramatically reducing the inner‑loop training cost of NAS.
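The key idea, predicting a candidate's weights directly from its graph encoding so no inner‑loop training is needed, can be sketched as follows. The encoding (one op‑embedding table, one neighbor‑averaging step, a mean graph readout) and all sizes are toy assumptions; a real GHN uses a trained graph neural network and predicts weights for every layer of the candidate.

```python
import numpy as np

rng = np.random.default_rng(0)

OPS, EMB, W_OUT = 4, 8, (3, 3)   # op vocab size, embed dim, predicted shape

node_embed = rng.normal(size=(OPS, EMB))               # op embedding table
hyper_W = rng.normal(size=(EMB, W_OUT[0] * W_OUT[1]))  # hyper-network weights

def predict_weights(ops, adj):
    """Predict a weight tensor for an architecture (ops: list of op ids,
    adj: adjacency matrix) without any inner-loop training."""
    h = node_embed[ops]                        # (n, EMB) node features
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    h = (adj @ h) / deg + h                    # one graph message-passing step
    g = h.mean(axis=0)                         # graph-level embedding
    return (g @ hyper_W).reshape(W_OUT)        # predicted weight tensor

adj = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], float)  # chain of 3 nodes
W = predict_weights([0, 2, 1], adj)
print(W.shape)
```

During search, thousands of candidates can be scored with predicted weights like these in place of hours of gradient descent per candidate, which is where the dramatic reduction in inner‑loop cost comes from.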
2.3 Monte‑Carlo Tree Search (MCTS) : AlphaX integrates MCTS with a meta‑DNN that predicts an architecture's accuracy, exploring large search spaces efficiently via the selection, expansion, simulation, and backpropagation steps to balance exploration and exploitation.
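The four MCTS steps map onto architecture search directly: a tree node is a partial architecture, and a rollout completes it and scores it. The sketch below is a generic UCT loop on a made‑up 3‑op, 3‑slot space with a toy `proxy_reward`; AlphaX additionally replaces expensive rollouts with meta‑DNN accuracy predictions, which this sketch does not model.

```python
import math, random

random.seed(0)

OPS, DEPTH = 3, 3   # choose one of 3 ops at each of 3 positions

def proxy_reward(path):
    """Toy stand-in for validation accuracy (favors op 1 everywhere)."""
    return sum(1.0 for a in path if a == 1) / DEPTH

# Tree statistics keyed by the partial architecture (a tuple of op ids).
N = {(): 0}     # visit counts
W = {(): 0.0}   # total reward

def uct_select(node):
    return max(range(OPS), key=lambda a: W[node + (a,)] / N[node + (a,)]
               + 1.4 * math.sqrt(math.log(N[node]) / N[node + (a,)]))

for it in range(800):
    node = ()
    # Selection: descend with UCT while all children already exist.
    while len(node) < DEPTH and all(node + (a,) in N for a in range(OPS)):
        node = node + (uct_select(node),)
    # Expansion: add one unvisited child.
    if len(node) < DEPTH:
        a = random.choice([a for a in range(OPS) if node + (a,) not in N])
        node = node + (a,)
        N[node], W[node] = 0, 0.0
    # Simulation: random rollout to a complete architecture, then score it.
    rollout = list(node) + [random.randrange(OPS)
                            for _ in range(DEPTH - len(node))]
    r = proxy_reward(rollout)
    # Backpropagation: update statistics along the selected path.
    for d in range(len(node) + 1):
        prefix = tuple(node[:d])
        N[prefix] += 1
        W[prefix] += r

best_first = max(range(OPS), key=lambda a: N[(a,)])
print(best_first)
```

The exploration constant (1.4 here) controls the exploration/exploitation balance: visits concentrate on the high‑reward branch while the bonus term guarantees every branch is still sampled occasionally.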
2.4 Knowledge distillation (teacher‑student learning) : N2N Learning employs reinforcement learning to compress a large teacher model into a smaller student model by learning optimal layer‑removal and layer‑shrink actions, optimizing a reward that balances compression ratio and accuracy.
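The reward that balances compression against accuracy can be sketched as a product of a compression term and an accuracy term; the `c * (2 - c)` shaping below follows the general form reported for N2N Learning, but treat the exact constants and the toy layer/action data as assumptions for illustration.

```python
def n2n_reward(params_student, params_teacher, acc_student, acc_teacher):
    """Reward trading off compression against accuracy (sketch of an
    N2N-style objective; the paper's exact shaping may differ)."""
    c = 1.0 - params_student / params_teacher    # compression ratio in [0, 1]
    r_compress = c * (2.0 - c)                   # rewards high c, concavely
    r_acc = acc_student / acc_teacher            # penalizes accuracy loss
    return r_compress * r_acc

# Layer-removal actions: keep/drop each teacher layer (toy example).
teacher_layers = [1.0, 1.0, 0.5, 0.5, 0.25]      # parameter counts (millions)
keep = [1, 0, 1, 0, 1]                           # one sampled action sequence
params_s = sum(p for p, k in zip(teacher_layers, keep) if k)
print(n2n_reward(params_s, sum(teacher_layers), 0.90, 0.93))
```

Note the shaping: keeping every layer yields zero reward regardless of accuracy, and the concave compression term discourages degenerate students that shrink the network but destroy accuracy, since the two factors multiply.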
Overall, the surveyed techniques illustrate how reinforcement learning and related optimization frameworks enable automated, scalable, and high‑performing neural network design, while also addressing challenges such as computational cost, generalization across datasets, and model compression.
DataFunTalk