Unleashing Game AI: Inside NetEase’s Bray Distributed RL Framework

NetEase’s AI team reveals how their self‑developed distributed reinforcement‑learning platform, Bray, enables high‑level AI agents for the MOBA game Dream of Kingdom 2, covering GameCore integration, weighted random initialization, modular APIs, difficulty scaling, and cost‑effective training for realistic player experiences.


NetEase’s AI division previously published a brief overview of the AI capabilities behind the game Dream of Kingdom 2. In a new three-article series, the team now explains in depth which cutting-edge NetEase AI technology powers this Asian Games-level title.

Games are an ideal testbed for AI, with notable examples such as AlphaStar and OpenAI Five. Game-AI applications in the domestic Chinese market, however, have lagged behind. NetEase aims to change this by delivering a commercial-grade AI solution for a large-scale MOBA.

GameCore is the term NetEase uses for the game environment that connects the client to the AI server. It adds communication interfaces so that game state is sent to the AI, the AI returns a decision, and the game executes it. Because training requires many parallel GameCore instances, a large‑scale distributed training framework is essential.
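As a rough illustration, the loop below sketches this state-in, decision-out protocol in Python. The names here (AIClient, decide, and the game object's reset/step methods) are assumptions made for the example; the article does not describe GameCore's or Bray's actual interfaces.

```python
import json
import socket

# Illustrative sketch only: AIClient, decide, and the game object's
# reset/step methods are assumptions, not GameCore's or Bray's real API.

class AIClient:
    """Thin wrapper around the connection between GameCore and the AI server."""

    def __init__(self, host: str, port: int):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("r")

    def decide(self, state: dict) -> dict:
        """Send the current game state; block until the AI returns a decision."""
        self.sock.sendall(json.dumps(state).encode() + b"\n")
        return json.loads(self.reader.readline())

def run_episode(game, ai: AIClient) -> None:
    """One episode of the loop: observe state, query the AI, execute the action."""
    state = game.reset()
    while not state["done"]:
        action = ai.decide(state)  # game state out, AI decision in
        state = game.step(action)  # the game executes the decision
```

Training then amounts to running many such episodes in parallel, one per GameCore instance, which is exactly the scale problem the distributed framework has to solve.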

To meet this need NetEase built Bray, a self-developed distributed reinforcement-learning framework that combines training and inference. Bray simplifies the transition from training to deployment, standardizes the AI integration workflow, and offers a modular API with clearly defined Actor, Model, Buffer, and Trainer components.
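A minimal sketch of what such a four-component split might look like follows. The component names come from the article, but every method signature below is an assumption for illustration, not Bray's real API.

```python
from abc import ABC, abstractmethod

# The four component names come from the article; every signature below
# is an assumption made for illustration, not Bray's actual interface.

class Model(ABC):
    @abstractmethod
    def forward(self, observation):
        """Map an observation to an action; shared by training and inference."""

class Buffer(ABC):
    @abstractmethod
    def push(self, transition) -> None:
        """Store one transition produced by an Actor."""

    @abstractmethod
    def sample(self, batch_size: int):
        """Return a batch of transitions for the Trainer."""

class Actor(ABC):
    @abstractmethod
    def run_episode(self, model: Model, buffer: Buffer) -> None:
        """Drive one GameCore instance and push its transitions to the buffer."""

class Trainer(ABC):
    @abstractmethod
    def train_step(self, model: Model, buffer: Buffer) -> None:
        """Consume a sampled batch and update the model's weights."""
```

Keeping the serving path (the Model component) identical across training and deployment is what makes the training-to-deployment transition cheap.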

Training challenges include the sparse-reward nature of Dream of Kingdom 2. NetEase first applied completely random state initialization to raise the proportion of informative samples early in training. To further improve efficiency they introduced weighted random initialization: a scoring function evaluates each initial state, stores poorly converged states in a buffer, and samples from that buffer in proportion to the score, focusing the agent on under-explored situations.
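The sketch below shows one way such a weighted buffer could work, assuming a user-supplied scoring function where higher scores mark initial states the agent still handles poorly; the keep threshold and sampling details are placeholders, since the article does not specify them.

```python
import random

# Sketch under stated assumptions: score_fn is a user-supplied function
# where higher scores mark initial states the agent still handles poorly;
# the keep threshold is a placeholder the article does not specify.

class WeightedInitBuffer:
    def __init__(self, score_fn, keep_threshold: float = 0.5):
        self.score_fn = score_fn
        self.keep_threshold = keep_threshold
        self.states: list = []
        self.scores: list = []

    def observe(self, init_state) -> None:
        """Score an initial state; keep it if convergence there is still poor."""
        score = self.score_fn(init_state)
        if score > self.keep_threshold:
            self.states.append(init_state)
            self.scores.append(score)

    def sample(self):
        """Draw the next episode's start state with probability proportional to its score."""
        if not self.states:
            return None  # caller falls back to fully random initialization
        return random.choices(self.states, weights=self.scores, k=1)[0]
```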

For diverse strategic styles, NetEase adds a coefficient to every reward component that influences style. At the start of each episode a new set of style coefficients is sampled and fed into the neural network, allowing the model to learn mappings between coefficients and resulting strategies, thus generating multiple distinct play styles without redesigning reward functions.
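A toy sketch of this per-episode conditioning follows; the reward-component names are invented for illustration, as the article does not list the game's actual reward terms.

```python
import random

# Toy sketch: the reward-component names below are invented for
# illustration; the article does not list the game's actual reward terms.

STYLE_KEYS = ["aggression", "farming", "support"]

def sample_style() -> dict:
    """Draw a fresh set of style coefficients at the start of each episode."""
    return {k: random.uniform(0.0, 2.0) for k in STYLE_KEYS}

def shaped_reward(components: dict, style: dict) -> float:
    """Scale each style-relevant reward component by its coefficient."""
    return sum(style[k] * components[k] for k in STYLE_KEYS)

def augment_observation(obs: list, style: dict) -> list:
    """Append the coefficients to the observation so the network can
    learn the mapping from coefficients to behavior."""
    return obs + [style[k] for k in STYLE_KEYS]
```

Because the coefficients are both part of the reward and part of the input, a single trained model can be switched between styles at inference time simply by changing the coefficient vector.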

Difficulty scaling is achieved by injecting hierarchical noise into the network inputs (affecting macro‑level judgment) and adding delayed, perturbed outputs (affecting micro‑operations). Models of varying difficulty are evaluated online against real players, and the version that stabilizes at a target ladder rank is selected as the difficulty‑specific agent.
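The wrapper below sketches these two knobs under stated assumptions: Gaussian noise for both the input and output perturbations, and a fixed-length queue for the action delay; none of these specifics come from the article.

```python
import collections
import random

# Sketch under assumptions: Gaussian noise for both input and output
# perturbation and a fixed-length queue for action delay; the article
# gives none of these specifics.

class DifficultyWrapper:
    def __init__(self, model, input_noise: float,
                 action_delay: int, output_noise: float):
        self.model = model
        self.input_noise = input_noise      # degrades macro-level judgment
        self.output_noise = output_noise    # degrades micro-operations
        self.queue = collections.deque(maxlen=max(action_delay, 1))

    def act(self, observation: list) -> list:
        # Macro level: corrupt the observation before the network sees it.
        noisy = [x + random.gauss(0.0, self.input_noise) for x in observation]
        action = self.model.forward(noisy)
        # Micro level: perturb the action, then execute a stale one from
        # the front of the delay queue.
        self.queue.append([a + random.gauss(0.0, self.output_noise)
                           for a in action])
        return self.queue[0]
```

Larger noise scales and longer delays yield weaker agents; each candidate setting is then validated against real ladder play as described above.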

Future work will expand hero composition to cover all archetypes (single‑target control, area control, healer, assassin, tank, fighter, marksman) and ensure tactical diversity throughout the match, avoiding early‑ or late‑game weaknesses. Planned match modes range from 1v1 lane fights to full 5v5 team battles with objectives such as tower destruction and map control.

Key benefits of this approach include AI agents that can “lose gracefully” to maintain fair competition, skill‑matched bots that provide realistic challenges, and a dramatically reduced hardware cost for training thanks to Bray’s efficient distributed design.

Tags: reinforcement learning, distributed training, AI framework, game AI, MOBA
Written by

NetEase Smart Enterprise Tech+

Get cutting-edge insights from NetEase's CTO, access the most valuable tech knowledge, and learn NetEase's latest best practices. NetEase Smart Enterprise Tech+ helps you grow from a thinker into a tech expert.
