
Extending APEX for Real Distributed Reinforcement Learning with tf2rl

The article examines the limitations of the single‑machine APEX framework in the tf2rl reinforcement‑learning library, proposes a cross‑machine distributed architecture using middleware such as Redis, compares alternative frameworks like EasyRL, and outlines expected performance gains and future development plans.

360 Quality & Efficiency

The open‑source reinforcement‑learning library tf2rl (https://github.com/keiohta/tf2rl) implements most mainstream algorithms, with its off‑policy methods relying on DeepMind's APEX (Ape‑X) architecture for distributed training. However, the existing APEX implementation is limited to a single‑machine multi‑explorer mode and cannot scale across multiple machines, which is needed to break the resource bottleneck of a single host and increase the number of Actors.

In reinforcement learning, convergence speed is largely constrained by the rate at which Actors generate samples. Simulated environments (e.g., OpenAI gym) can produce hundreds of samples per second per Actor, but production projects often cannot achieve this, making the addition of more Actors a crucial way to accelerate training.

The APEX framework is simple and can be applied to any off‑policy algorithm, covering discrete‑action value‑based methods such as DQN and its variants (DDQN, Prioritized DQN, Dueling DQN, Distributional DQN, Noisy DQN) as well as continuous‑action policy‑gradient methods like DDPG (including TD3 and BiResDDPG) and SAC. Its code structure launches all Actor nodes with multiprocessing.Process, stores samples in a shared global_rb object protected by a Lock, and synchronizes parameters through a dedicated Queue per Actor.
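The single‑machine layout described above can be sketched roughly as follows. This is a minimal stand‑in, not tf2rl's actual code: the `actor` function, the transition tuple, and the weight payload are all simplified placeholders, but the three coordination pieces match the description (a Process per Actor, a Lock‑guarded shared buffer, and one Queue per Actor for parameter sync).

```python
# Minimal sketch of the single-machine APEX process layout:
# Actor processes append transitions to one Lock-protected shared
# buffer, and each Actor receives weights through its own Queue.
import multiprocessing as mp

def actor(actor_id, global_rb, lock, weight_queue, n_steps):
    weights = None
    for step in range(n_steps):
        if not weight_queue.empty():
            weights = weight_queue.get()   # parameter sync from the Learner
        transition = (actor_id, step)      # stand-in for (s, a, r, s')
        with lock:                         # Lock serializes buffer writes
            global_rb.append(transition)

def run_apex(n_actors=2, n_steps=5):
    manager = mp.Manager()
    global_rb = manager.list()             # shared replay-buffer stand-in
    lock = manager.Lock()
    queues = [mp.Queue() for _ in range(n_actors)]  # one Queue per Actor
    for q in queues:
        q.put({"version": 0})              # initial weight broadcast
    procs = [mp.Process(target=actor, args=(i, global_rb, lock, q, n_steps))
             for i, q in enumerate(queues)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return len(global_rb)

if __name__ == "__main__":
    print(run_apex())  # 2 actors x 5 steps = 10 transitions
```

Because every channel here (Manager, Lock, Queue) assumes a shared operating system, none of it crosses machine boundaries, which is exactly the limitation the next section addresses.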

To transform APEX into a truly distributed system, this inter‑process communication can be replaced with standard distributed middleware such as message queues or databases. In the author's organization, Redis Pub/Sub is used for this purpose; although Pub/Sub drops messages published before a subscriber connects, the impact on long‑running training jobs is minimal.
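As a hedged illustration of swapping the per‑Actor Queues for Redis Pub/Sub, the sketch below broadcasts Learner weights on a channel that all Actors subscribe to. The channel name, pickle serialization, and the injected `client`/`pubsub` objects (in practice a `redis.Redis` connection and its `.pubsub()` handle) are assumptions for illustration, not tf2rl or 360‑internal API.

```python
# Sketch: weight broadcast via Redis Pub/Sub instead of per-Actor Queues.
# The Redis client is passed in, so this module itself needs no redis import.
import pickle

WEIGHTS_CHANNEL = "apex:weights"  # hypothetical channel name

def serialize_weights(weights):
    """Pack weight arrays into bytes for publishing."""
    return pickle.dumps(weights)

def deserialize_weights(payload):
    return pickle.loads(payload)

def publish_weights(client, weights):
    # Learner side: one publish reaches every subscribed Actor,
    # replacing N separate Queue.put calls.
    client.publish(WEIGHTS_CHANNEL, serialize_weights(weights))

def poll_weights(pubsub):
    # Actor side: non-blocking check for a fresh weight message.
    # Messages published before this Actor subscribed are simply lost,
    # which is tolerable for a long-running job.
    msg = pubsub.get_message(ignore_subscribe_messages=True)
    if msg and msg["type"] == "message":
        return deserialize_weights(msg["data"])
    return None
```

Sample transport in the opposite direction (Actors to the global replay buffer) can go through a Redis list or another queue with the same serialization helpers.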

Other distributed training frameworks were also considered. Alibaba's EasyRL, built on TensorFlow's Parameter Server (PS) architecture, offers multiple Learner nodes without external middleware, but it was not adopted because of concerns about its parameter‑update frequency. A real distributed APEX can also leverage TensorFlow's ParameterServerStrategy to achieve similar multi‑Learner concurrency.
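For reference, a ParameterServerStrategy cluster is typically described through the `TF_CONFIG` environment variable, as in the hypothetical fragment below; the hostnames, ports, and node counts are placeholders, and the commented lines show only the standard TensorFlow entry point, not a tested multi‑Learner APEX setup.

```python
# Hypothetical TF_CONFIG for a multi-Learner cluster using TensorFlow's
# ParameterServerStrategy; all addresses are placeholders.
import json
import os

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "chief":  ["learner-0.example:2222"],
        "worker": ["learner-1.example:2222", "learner-2.example:2222"],
        "ps":     ["ps-0.example:2222"],
    },
    "task": {"type": "worker", "index": 0},  # this node's role
})

# On each node, TensorFlow would then pick up the cluster spec via:
#   resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
#   strategy = tf.distribute.ParameterServerStrategy(resolver)
```

In this scheme the PS nodes hold the shared variables, so several Learner processes can apply gradients concurrently, which is the multi‑Learner property EasyRL provides out of the box.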

Experimental results show that increasing the number of APEX Actors significantly improves training convergence speed. The article concludes with a plan to publish the concrete code implementation later and to continue exploring reinforcement‑learning‑based intelligent testing, inviting interested collaborators to join.

TensorFlow · reinforcement learning · distributed training · APEX · off-policy · tf2rl
Written by

360 Quality & Efficiency

360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
