Inside Ant Group’s Ragent: Building Scalable AI Agents on Ray

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and details the four essential modules—Profile, Memory, Planning, and Action—that power large‑language‑model agents in large‑scale AI serving.

DataFunTalk

Sep 5, 2025

Introduction

This article shares Ant Group’s latest Ray‑based distributed Agent framework, Ragent.

Main Content

Ray is the underlying distributed framework used by OpenAI for large‑model training. Ant Group joined the Ray project early and has contributed over 26% of its core code, making it the second‑largest contributor globally. It now runs Ray on more than 1.5 million CPU cores and operates the Ray community in China.

Ant’s Ray team was founded in 2017, and in 2018 it released Geaflow, its first streaming graph engine built for a business scenario. Between 2018 and 2022, Ant built several more Ray‑based engines, including Realtime, the open‑source Mobius engine, and Mars, an engine for inference and scientific computing. Ant also contributed a Multi‑Tenant architecture, which the Ray community only began to consider recently.

During the 2023‑2024 large‑model era, Ant completed Unified AI Serving in the US, integrating offline, online, inference, and deployment into a single framework that is now one of the core workloads on its 1.5 million‑core deployment. The latest work, the AI Agent framework built on Ray, is presented in three parts: background, motivation, and design & implementation.

Agent Architecture

LLM‑based agents typically require four core modules:

Profile: defines the agent’s persona, e.g., a gentle travel assistant that handles travel management, data analysis, and related tasks.

Memory: holds Knowledge (domain and prior knowledge) and Experience (records of past dialogues, user queries, reasoning steps, and action outcomes), which the agent uses to improve future behavior.

Planning: breaks complex tasks into manageable sub‑tasks using approaches such as Chain‑of‑Thought or Tree‑of‑Thought.

Action: executes real‑world tasks based on experience and plans, relying on function calling to invoke external services or interact with physical devices such as robotic arms.

These four modules constitute the core components of an LLM‑based agent, as implemented in Ant Group’s Ragent framework.
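
To make the division of labor concrete, here is a minimal sketch of how the four modules might fit together in plain Python. It is not Ragent's API: every class, field, and tool name below is hypothetical, and the planner is a trivial string split standing in for an LLM‑driven approach such as Chain‑of‑Thought.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Profile:
    """Persona the agent adopts (hypothetical fields)."""
    name: str
    persona: str


@dataclass
class Memory:
    """Knowledge holds static domain facts; experience grows at runtime."""
    knowledge: List[str] = field(default_factory=list)
    experience: List[str] = field(default_factory=list)

    def record(self, event: str) -> None:
        self.experience.append(event)


class Planner:
    """Stand-in for an LLM planner: splits a task on ' then ' into sub-tasks."""

    def plan(self, task: str) -> List[str]:
        return [step.strip() for step in task.split(" then ") if step.strip()]


class Action:
    """Registry of callable tools, mimicking function calling."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, argument: str) -> str:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](argument)


class Agent:
    """Wires the four modules together: plan, act, then remember."""

    def __init__(self, profile: Profile, memory: Memory,
                 planner: Planner, action: Action) -> None:
        self.profile = profile
        self.memory = memory
        self.planner = planner
        self.action = action

    def run(self, task: str) -> List[str]:
        results = []
        for step in self.planner.plan(task):
            # Assumed convention: the first word of a step names the tool.
            tool, _, argument = step.partition(" ")
            result = self.action.call(tool, argument)
            self.memory.record(f"{step} -> {result}")
            results.append(result)
        return results


# Usage with two hypothetical tools.
action = Action()
action.register("search", lambda query: f"results for {query}")
action.register("book", lambda item: f"booked {item}")

agent = Agent(
    Profile(name="travel-assistant", persona="a gentle travel assistant"),
    Memory(knowledge=["Hangzhou is a city in China"]),
    Planner(),
    action,
)
outputs = agent.run("search flights to Hangzhou then book hotel")
```

Each executed step is appended to `memory.experience`, which mirrors the idea that action outcomes feed back into the agent's Experience. In a Ray deployment, a class like `Agent` could plausibly be run as an actor via `@ray.remote`, but the source does not describe how Ragent maps these modules onto Ray.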
