How Ant Group’s Ray‑Powered Ragent Redefines LLM‑Based AI Agents

This article introduces Ant Group’s Ray‑based distributed agent framework Ragent, outlines its background, motivation, and design, and breaks down the four essential modules—Profile, Memory, Planning, and Action—that enable large‑language‑model agents to operate in real‑world scenarios.

DataFunSummit

Background

Ray, the open‑source distributed computing framework that originated at UC Berkeley's RISELab and has been used by OpenAI for large‑model training, has been part of Ant Group's stack since the project's early days. Ant has contributed over 26% of Ray's core code, making it the second‑largest contributor worldwide. It now runs more than 1.5 million CPU cores in production and maintains the Ray community in China.

Ant's Ray team has worked with Ray since 2017 and launched its first business scenario, the streaming graph engine Geaflow, in 2018. Between 2018 and 2022 the team built several more engines on Ray, including Realtime, the open‑source Mobius engine, and the scientific‑computing engine Mars. It also pioneered a multi‑tenant architecture, which the Ray community only began to consider later.

In the 2023‑2024 era of large models, Ant delivered Unified AI Serving, a framework that unifies offline and online workloads, AI inference, and AI deployment. It is now one of the core scenarios running on Ant's 1.5 million cores.

Motivation

To support the growing demand for large‑language‑model (LLM) agents in finance and other domains, Ant needed a scalable, Ray‑based framework that could orchestrate complex agent workflows, manage state, and integrate with external functions.

Design & Implementation

The Ragent framework is organized around four core modules that together constitute a functional LLM‑based agent:

Profile: Defines the agent's persona and role, e.g., a gentle travel assistant that can manage itineraries and perform data analysis.

Memory: Split into Knowledge (domain and prior knowledge) and Experience (recorded dialogues, user queries, reasoning steps, and action outcomes), so that the agent can keep learning and avoid repeating errors.

Planning: Decomposes complex tasks into manageable subtasks using reasoning strategies such as Chain‑of‑Thought or Tree‑of‑Thought, much as a flowchart breaks down a process in software design.

Action: Executes tasks based on experience and plans. A key feature is Function Calling, which lets the model invoke external services or even interact with physical devices such as robotic arms.
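
To make Function Calling concrete, here is a minimal Python sketch of the general pattern: the model emits a structured request naming a function and its arguments, and the agent looks that function up and runs it. This is not Ragent's API. The TOOLS registry, the tool decorator, the get_weather function, and the JSON shape are all assumptions made for illustration, and the model output is hard-coded rather than produced by an LLM.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry: maps a tool name to a plain Python callable.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the agent can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> str:
    # Stand-in for a call to an external service (hypothetical).
    return f"Sunny in {city}"


def dispatch(model_output: str) -> Any:
    """Parse a JSON function call emitted by the model and execute it."""
    call = json.loads(model_output)
    name = call["name"]
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**args)


# A hard-coded "model response" requesting a tool call; no LLM is involved.
result = dispatch('{"name": "get_weather", "arguments": {"city": "Hangzhou"}}')
print(result)
```

Keeping the registry separate from the dispatcher means a new capability, whether a web service or a device controller, can be added by registering one more function, without touching the code that interprets the model's output.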

These four modules form the essential building blocks of an LLM‑based agent in Ant’s Ragent framework.
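
One way to picture how Profile, Memory, Planning, and Action fit together is the toy Python sketch below. It does not reflect Ragent's actual classes or interfaces: every name here (Profile, Memory, plan, Agent, and the lambda actions) is hypothetical, and the planner simply splits the task on the word "and" where a real agent would prompt an LLM with a strategy such as Chain‑of‑Thought.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Profile:
    role: str  # what the agent is, e.g. "travel assistant"
    tone: str  # how it should speak, e.g. "gentle"


@dataclass
class Memory:
    knowledge: List[str] = field(default_factory=list)   # domain and prior knowledge
    experience: List[str] = field(default_factory=list)  # past steps and their outcomes


def plan(task: str) -> List[str]:
    """Toy planner: split a task into subtasks on the word 'and'."""
    return [step.strip() for step in task.split(" and ") if step.strip()]


@dataclass
class Agent:
    profile: Profile
    memory: Memory
    actions: Dict[str, Callable[[str], str]]  # first word of a step -> handler

    def run(self, task: str) -> List[str]:
        results = []
        for step in plan(task):
            handler = self.actions.get(step.split()[0])
            outcome = handler(step) if handler else f"no action for '{step}'"
            # Record what happened so later runs can draw on it.
            self.memory.experience.append(f"{step} -> {outcome}")
            results.append(outcome)
        return results


agent = Agent(
    profile=Profile(role="travel assistant", tone="gentle"),
    memory=Memory(knowledge=["Hangzhou is in Zhejiang province"]),
    actions={
        "book": lambda step: f"done: {step}",
        "analyze": lambda step: f"report: {step}",
    },
)
outcomes = agent.run("book a flight to Hangzhou and analyze last month's spending")
print(outcomes)
print(agent.memory.experience)
```

The point of the sketch is the data flow: the plan produces steps, each step is routed to an action, and each outcome is written back into Experience.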
