How Agentic RAG‑R1 Turns Retrieval‑Augmented Generation into an Autonomous AI Agent

Agentic RAG‑R1, an open‑source project from Peking University, combines Retrieval‑Augmented Generation with an agentic AI loop, trains with the GRPO reinforcement‑learning algorithm, supports LoRA‑based fine‑tuning, quantization, and multimodal tool calls, and demonstrates significant accuracy gains on the MedQA benchmark across both Chinese and English test sets.


Background

Agentic RAG‑R1 is an open‑source research project from Peking University that extends Retrieval‑Augmented Generation (RAG) with an agentic decision layer. The model can decide when to retrieve, what to retrieve, and how to incorporate retrieved evidence into its reasoning chain.

Core Highlights

Agentic RAG architecture merges RAG with an agentic AI mechanism, allowing the model itself to decide when and how to retrieve while generating answers.

Group Relative Policy Optimization (GRPO) uses reinforcement learning to reward trajectories that are highly relevant, accurate, and well‑formatted.

Multi‑turn reasoning with backtracking enables iterative "search‑think‑search‑think" workflows.

LoRA fine‑tuning combined with NF4 quantization reduces trainable parameters to ~10 % and stores the frozen base weights in 4‑bit NormalFloat (NF4), lowering GPU memory requirements.

Rich reward signals (format, correctness, RAG performance) steer the model toward behavior that matches downstream business requirements.

GitHub repository: https://github.com/jiangxinke/Agentic-RAG-R1

Architecture diagram

Why Agentic RAG?

Static retrieval policy: Traditional RAG mitigates hallucination through external retrieval but still relies on handcrafted prompts to decide when to retrieve.

Context explosion: More retrieved passages increase prompt length, diluting key information.

Multi‑hop reasoning: Complex tasks require iterative "search‑think" cycles that a single retrieval cannot cover.

Decision Process

Whether to retrieve? The agent can skip irrelevant retrieval calls, improving efficiency.

What to retrieve? The model selects relevant documents without manual prompt engineering.

How to cite? Retrieved evidence is automatically woven into the reasoning chain.
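
Concretely, agentic‑RAG systems in this family typically let the policy express these decisions as special tags in its own output. The following minimal Python sketch shows the parsing side of such a protocol; the `<search>` tag convention and function names are assumptions for illustration, not the repository's actual API.

```python
import re

SEARCH_TAG = re.compile(r"<search>(.*?)</search>", re.DOTALL)

def parse_action(model_output: str):
    """Map raw policy output to one of the three decisions above."""
    m = SEARCH_TAG.search(model_output)
    if m:
        # The model chose to retrieve, and the tag body says what to retrieve.
        return "retrieve", m.group(1).strip()
    # No tag: the model chose to answer directly; citations are already
    # woven into the text it produced.
    return "answer", model_output
```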

Architecture Overview

The system follows a TC‑RAG (Turing‑Complete RAG) loop in which the agent chooses actions such as reasoning, backtracking, summarization, and tool observation over a managed, stack‑like memory. Each step can be validated and logged.
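
One way to picture this loop is as a small stack machine: reasoning steps and tool observations are pushed, backtracking pops, and summarization terminates. The sketch below is a hypothetical rendering under that assumption; `policy.decide`, `policy.summarize`, and `tools.run` are invented stand‑ins, not the project's real interfaces.

```python
from enum import Enum

class Action(Enum):
    REASON = "reason"        # push an intermediate thought
    TOOL = "tool"            # run a tool and push the observation
    BACKTRACK = "backtrack"  # pop a step judged unhelpful
    SUMMARIZE = "summarize"  # compress the memory into a final answer

def tc_rag_loop(policy, tools, question, max_steps=10):
    """Hypothetical stack-machine view of the TC-RAG loop."""
    stack = [question]
    for _ in range(max_steps):
        action, payload = policy.decide(stack)   # assumed interface
        if action is Action.REASON:
            stack.append(payload)
        elif action is Action.TOOL:
            stack.append(tools.run(payload))     # each call can be validated and logged
        elif action is Action.BACKTRACK and len(stack) > 1:
            stack.pop()
        else:                                    # Action.SUMMARIZE
            return policy.summarize(stack)
    return policy.summarize(stack)               # step budget exhausted
```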

Agentic loop diagram

Key Technical Components

GRPO: Samples a group of reasoning‑retrieval trajectories for each query and scores them relative to one another, so paths that are highly relevant, accurate, and well‑formatted earn positive advantages without a separate critic model, leading to stable and fast convergence.
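
The group‑relative advantage at the heart of GRPO (as formulated in DeepSeekMath) normalizes each sampled trajectory's reward against the other rollouts of the same query, which is what removes the need for a value model. A minimal, runnable sketch:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_queries, samples_per_query). Each trajectory is
    normalized against its own sampling group, so no critic is needed."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Two queries, four sampled trajectories each: the best rollout in each
# group receives the largest positive advantage.
rewards = torch.tensor([[1.0, 0.2, 0.9, 0.1],
                        [0.5, 0.5, 0.8, 0.2]])
print(grpo_advantages(rewards))
```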

LoRA + NF4 quantization: Only ~10 % of parameters are trainable; storing the frozen base weights in 4‑bit NF4 reduces the memory footprint.
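
This combination is essentially the standard QLoRA recipe from the Hugging Face `transformers`/`peft`/`bitsandbytes` stack; the sketch below shows what it typically looks like. The model name follows the FAQ's default, and the LoRA hyperparameters are illustrative rather than the repository's exact settings.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# NF4 4-bit quantization for the frozen base weights (QLoRA-style)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",        # default model per the FAQ below
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters: only these low-rank matrices receive gradients
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a small fraction of the total
```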

DeepSpeed ZeRO‑3: Partitions parameters, gradients, and optimizer states across devices and can offload them to CPU/NVMe, enabling efficient training of a 32B model on a 3×A100 setup.
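
ZeRO‑3 behavior is driven by a JSON configuration; a representative fragment with CPU offload enabled is shown below, expressed as a Python dict. The values are illustrative, not the repository's exact file.

```python
# Representative ZeRO-3 settings; the project's actual config may differ.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                           # partition params, grads, optimizer states
        "offload_param": {"device": "cpu"},   # or "nvme" with an nvme_path set
        "offload_optimizer": {"device": "cpu"},
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
```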

Multimodal tool interface: Supports text, code, database, and REST‑API calls, allowing the model to act in real‑world workflows.
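
A tool layer like this is often implemented as a registry mapping tool names to callables; the decorator‑based sketch below is one plausible shape for it, with stubs standing in for real database and REST‑API backends. None of these names come from the repository.

```python
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Register a callable so the agent can invoke it by name."""
    def wrap(fn: Callable[[str], str]):
        TOOLS[name] = fn
        return fn
    return wrap

@tool("sql")
def run_sql(query: str) -> str:
    return "stub: rows for " + query    # database call placeholder

@tool("http")
def call_api(url: str) -> str:
    return "stub: GET " + url           # REST-API call placeholder

def dispatch(name: str, payload: str) -> str:
    return TOOLS[name](payload)         # observation handed back to the agent
```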

Reward formula (simplified): r_total = r_format + r_answer + r_rag, where r_rag is scored automatically by RAGAS to check that retrieved fragments are cited effectively.
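
The article does not spell out how these signals are weighted, so the combiner below is hypothetical: `score_rag` is a stub standing in for the RAGAS‑based evaluation, and the weights are illustrative.

```python
def score_rag(response: str, docs: list[str]) -> float:
    """Placeholder for the RAGAS-based score; in the project this measures
    how faithfully the answer uses the retrieved fragments."""
    if not docs:
        return 0.0
    cited = sum(1 for d in docs if d[:40] in response)
    return cited / len(docs)

def total_reward(response: str, reference: str, docs: list[str],
                 format_ok: bool, w=(0.2, 0.5, 0.3)) -> float:
    """Hypothetical weighted combination of the three signals above."""
    r_format = 1.0 if format_ok else 0.0
    r_answer = 1.0 if response.strip() == reference.strip() else 0.0
    r_rag = score_rag(response, docs)
    return w[0] * r_format + w[1] * r_answer + w[2] * r_rag
```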

Experimental Results

Evaluation on the MedQA bilingual benchmark (judge model: Qwen‑2.5‑72B) shows substantial improvements:

Format accuracy increased from 39 % (baseline) to 92 % after fine‑tuning + retrieval (+53 points).

Answer accuracy rose from 84 % to 87 % (+3 points).

Both Chinese and English subsets improved markedly.

Complex multi‑hop reasoning accuracy improved by >8 %.

Tool‑call success rate exceeded 95 % with full traceability.

FAQ

Q1: Must I use a 32B model? No. The default configuration uses Qwen‑2.5‑7B‑Instruct; you can switch to Llama‑3‑8B or Baichuan‑13B by editing the config file.
Q2: Is RL training complicated? No. The training script uses standard LoRA hyperparameters; you only need to add a reward configuration, and DeepSpeed ZeRO‑3 with offloading handles limited GPU memory.

Code example
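
A hedged end‑to‑end sketch of inference with the default 7B model: load it in 4‑bit NF4, then run a bounded search‑think loop over the `<search>`/`<observation>` tag protocol sketched earlier. The retriever is stubbed, and none of the names below are the repository's actual entry points.

```python
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

class StubRetriever:
    """Placeholder for the project's retrieval tool."""
    def search(self, query: str) -> list[str]:
        return [f"[doc] background passage for: {query}"]

name = "Qwen/Qwen2.5-7B-Instruct"   # default model per the FAQ
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

def step(prompt: str, max_new_tokens: int = 256) -> str:
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=max_new_tokens)
    return tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)

retriever = StubRetriever()
prompt = "Question: What is the first-line treatment for anaphylaxis?\n"
for _ in range(4):                                 # bounded search-think loop
    text = step(prompt)
    m = re.search(r"<search>(.*?)</search>", text, re.DOTALL)
    if m is None:                                  # model chose to answer directly
        print(text)
        break
    docs = retriever.search(m.group(1).strip())    # model chose what to retrieve
    prompt += text + "\n<observation>\n" + "\n".join(docs) + "\n</observation>\n"
```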
