Deep Reinforcement Learning for Online Resource Allocation in Network Slicing

This article presents a dynamic RAN slicing model and an online PW‑DRL approach that combines deep learning, reinforcement learning, and Lyapunov optimization to allocate resources adaptively, detailing a four‑step decision process, LSTM/CNN predictions, and experimental results showing improved transmission rates and acceptance ratios across DTT, DS, and TO slices.

Network Intelligence Research Center (NIRC)

Background

Network slicing is a versatile technology for wireless communication systems, involving RAN, transport, and core network slices. RAN slicing connects mobile devices, maps virtual resource blocks to slices, and manages physical link status, requiring efficient, adaptive resource allocation to meet diverse QoS demands.

Main Contributions

Proposes a dynamic RAN slicing model that incorporates multiple distributions to handle different user request types and traffic priorities, with total available resources varying over time.

Formulates resource allocation as a time‑series dynamic optimization problem considering system stability, resource constraints, multi‑time‑scale dynamics, long‑term performance, and user priorities.

Introduces an online PW‑DRL method that leverages deep learning, deep reinforcement learning, and optimization; uses TRPO to capture environmental dynamics, Lyapunov equations for long‑term stability, and a predictive network to capture correlations between current and future states.

Problem Modeling and Solution

The study considers a downlink OFDMA network with base stations. Three slice types are defined: throughput‑oriented (TO), delay‑sensitive (DS), and delay‑throughput‑tolerant (DTT), each serving U<sub>TO</sub>, U<sub>DS</sub>, and U<sub>DTT</sub> users respectively.
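As an illustration, the three slice types and their user counts can be represented as a small configuration structure. The class and field names below are hypothetical, and the user counts are placeholder values; the paper leaves U<sub>TO</sub>, U<sub>DS</sub>, and U<sub>DTT</sub> as model parameters.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SliceConfig:
    """One RAN slice type and the number of users it serves."""
    name: str        # slice identifier
    num_users: int   # U_TO, U_DS, or U_DTT in the paper's notation

# Hypothetical user counts, for illustration only.
slices = [
    SliceConfig("TO", num_users=10),   # throughput-oriented
    SliceConfig("DS", num_users=8),    # delay-sensitive
    SliceConfig("DTT", num_users=12),  # delay-throughput-tolerant
]

total_users = sum(s.num_users for s in slices)
```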

The proposed solution consists of four main steps plus an auxiliary replay buffer:

1. When a DTT user generates a required transmission rate r(t) at TTI t, predict the rate r(t+1) for the next TTI.

2. The DRL network reads all users' rate requests r*(t<sub>m</sub>) together with the prediction from step 1, then outputs a resource allocation decision α(t<sub>m</sub>).

3. Using α(t<sub>m</sub>), Lyapunov optimization computes a transmission power p(t<sub>m</sub>) that satisfies the constraints.

4. Based on the system model, compute N(t<sub>m</sub>); with p(t<sub>m</sub>) and α(t<sub>m</sub>) known, update the Lyapunov optimizer's virtual queue, then calculate the reward R(t<sub>m</sub>) that guides DRL training. The replay buffer stores state-transition and reward tuples for iterative DRL updates.
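The virtual-queue bookkeeping in step 4 can be sketched minimally, assuming the standard Lyapunov update Q(t+1) = max(Q(t) + arrival(t) − service(t), 0) and a reward that trades achieved rate against queue backlog. The function names, the penalty weight, and the service/arrival values below are illustrative, not taken from the paper.

```python
def update_virtual_queue(q, arrival, service):
    """Standard Lyapunov virtual-queue update:
    Q(t+1) = max(Q(t) + arrival(t) - service(t), 0)."""
    return max(q + arrival - service, 0.0)

def reward(q, rate, penalty_weight=0.1):
    """Illustrative reward: achieved rate minus a backlog penalty,
    steering the DRL agent toward long-term queue stability."""
    return rate - penalty_weight * q

# Example trajectory: demand briefly exceeds service, then drains.
q = 0.0
history = []
for arrival, service in [(5.0, 3.0), (5.0, 6.0), (2.0, 6.0)]:
    q = update_virtual_queue(q, arrival, service)
    history.append(q)
# history == [2.0, 1.0, 0.0]
```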

For the prediction in step 1, an LSTM infers the next‑TTI transmission‑rate request and the next DS arrival rate λ, while a CNN estimates the current‑TTI λ.
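Putting the four steps together, one TTI of the control loop can be sketched as below. The predictor, DRL policy, and Lyapunov power solver are stubbed with simple placeholders, since the paper's actual networks (TRPO policy, LSTM/CNN predictors) are not reproduced here; all names and heuristics are illustrative assumptions.

```python
def predict_next_rate(rate_history):
    """Stand-in for the LSTM predictor of step 1:
    a naive last-value forecast of the next-TTI rate request."""
    return rate_history[-1]

def drl_allocate(requests, predicted_rate, n_blocks=10):
    """Stand-in for the TRPO policy of step 2: split resource
    blocks proportionally to current plus predicted demand."""
    weights = list(requests) + [predicted_rate]
    total = sum(weights) or 1.0
    return [n_blocks * w / total for w in weights]

def lyapunov_power(alloc, p_max=1.0):
    """Stand-in for step 3: share a total power budget p_max
    across the allocated blocks."""
    total = sum(alloc) or 1.0
    return [p_max * a / total for a in alloc]

def step(rate_history, requests, q, replay_buffer):
    """One TTI of the PW-DRL loop (steps 1-4)."""
    r_next = predict_next_rate(rate_history)      # step 1: predict
    alloc = drl_allocate(requests, r_next)        # step 2: DRL decision
    power = lyapunov_power(alloc)                 # step 3: power under budget
    served = sum(alloc)                           # step 4: coarse service proxy
    q = max(q + sum(requests) - served, 0.0)      # virtual-queue update
    r = served - 0.1 * q                          # reward guiding training
    replay_buffer.append((tuple(requests), tuple(alloc), r))
    return q, r

buffer = []
q, r = step([2.0, 3.0], [1.0, 2.0, 1.5], q=0.0, replay_buffer=buffer)
```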

Experimental Results

Figures compare transmission rates for DTT, DS, and TO slices, show the acceptance rate of the PW‑DRL method, and illustrate training reward evolution over 500 resource units.

The results demonstrate that the PW‑DRL framework achieves higher transmission rates and acceptance ratios compared with baseline methods across all three slice categories, confirming its effectiveness for online resource allocation in dynamic RAN environments.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: deep reinforcement learning, resource allocation, network slicing, RAN, OFDMA, Lyapunov optimization
Written by

Network Intelligence Research Center (NIRC)

NIRC is based on the National Key Laboratory of Network and Switching Technology at Beijing University of Posts and Telecommunications. It has built a technology matrix across four AI domains—intelligent cloud networking, natural language processing, computer vision, and machine learning systems—dedicated to solving real‑world problems, creating top‑tier systems, publishing high‑impact papers, and contributing significantly to the rapid advancement of China's network technology.
