Building a Dify‑Powered Multi‑Agent RAG AI Service with Chinese Large Models
After the New Year, the author landed several AI contracts: a six-week knowledge-base Q&A system and a two-month AI customer-service platform, both built on Dify with multi-Agent workflows, RAG, and domestic large language models. The customer-service project cut the client's support staff from fifteen to two, and the toolchain roughly doubled development efficiency.
Project 1 – Knowledge‑base Q&A
A six-week build using Spring Boot, Vue 3, Python FastAPI, MySQL, Elasticsearch, and MinIO. Cost: ¥50,000. Delivers a personal AI knowledge base that answers user queries.
Project 2 – AI Customer Service
A two-month build for an e-commerce client. Cost: ¥80,000. Architecture: privately deployed Dify, a multi-Agent workflow, Retrieval-Augmented Generation (RAG), and domestic large language models. Reduced the required human agents from 15 to 2.
Dify Private Deployment
Chosen for visual workflow editor, native multi‑Agent support, built‑in RAG engine, ability to integrate Chinese models (Qwen, DeepSeek, GLM), on‑premises data privacy, and RESTful APIs.
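Once deployed, a Dify app is driven entirely through its REST API. Below is a minimal sketch of building a request against Dify's `chat-messages` endpoint; the base URL and app key are placeholders for a private deployment, and the payload fields follow Dify's published chat API shape.

```python
import json
import urllib.request

DIFY_BASE_URL = "http://localhost/v1"  # hypothetical private-deployment address
DIFY_APP_KEY = "app-xxxxxxxx"          # placeholder app API key

def build_chat_request(query: str, user: str,
                       conversation_id: str = "") -> urllib.request.Request:
    """Build a blocking chat-messages request for a Dify app."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",      # or "streaming" for SSE responses
        "conversation_id": conversation_id,
        "user": user,                     # stable end-user identifier
    }
    return urllib.request.Request(
        f"{DIFY_BASE_URL}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {DIFY_APP_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("What is the return policy?", user="customer-001")
print(req.full_url)  # http://localhost/v1/chat-messages
```

Sending the request (e.g. with `urllib.request.urlopen`) returns the agent's answer plus a `conversation_id` that can be passed back to keep multi-turn context.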
Large Model Selection and Scheduling
Three certified domestic models are used:
Qwen‑Plus – primary model for pre‑sale Q&A, standard after‑sale support, knowledge‑base retrieval, speech polishing, promotion strategies. Covers >80% of daily scenarios. IFBench instruction compliance 76.5. Cost: input ¥0.8 / M tokens, output ¥4.8 / M tokens.
DeepSeek‑V3 – fallback for complex refund decisions, amount confirmation, transaction dispute arbitration, and multi‑rule conflict handling. Hallucination rate 3.9 % (reportedly the industry's lowest). Cost: input ¥2 / M tokens, output ¥8 / M tokens.
GLM‑4‑Flash – free model for simple FAQs, greetings, auto‑reply, intent classification. Handles 40‑60 % of simple queries at zero cost.
Model scheduling logic (configured in Dify workflow nodes):
All requests first pass through GLM‑4‑Flash for intent classification and simple‑question routing (zero cost).
Standard customer‑service scenarios are handled by Qwen‑Plus (high instruction compliance, very low cost).
High‑accuracy scenarios such as amount confirmation and dispute resolution are escalated to DeepSeek‑V3.
All models are accessed through a single Alibaba Cloud Bailian API key.
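The three-tier scheduling above reduces to a cheap routing function once GLM‑4‑Flash has classified the intent. The sketch below is illustrative: the intent labels and model identifiers are hypothetical stand-ins, not Dify node names.

```python
# Hypothetical three-tier model router mirroring the scheduling logic above.
SIMPLE_INTENTS = {"greeting", "faq", "auto_reply"}
HIGH_STAKES_INTENTS = {"refund_decision", "amount_confirmation", "dispute"}

def pick_model(intent: str) -> str:
    """Route an already-classified intent to the cheapest adequate model."""
    if intent in SIMPLE_INTENTS:
        return "glm-4-flash"   # free tier: simple queries at zero cost
    if intent in HIGH_STAKES_INTENTS:
        return "deepseek-v3"   # low-hallucination model for money decisions
    return "qwen-plus"         # default workhorse for standard service flows

print(pick_model("greeting"))         # glm-4-flash
print(pick_model("refund_decision"))  # deepseek-v3
print(pick_model("order_status"))     # qwen-plus
```

In Dify itself this logic lives in workflow condition nodes rather than application code, but the cost structure is the same: the free model absorbs the 40-60 % of simple traffic before a paid model is ever invoked.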
Knowledge‑Base Technical Solution
Dify’s built‑in RAG engine is the core. Documents (PDF, Word, Markdown, CSV) are parsed, automatically segmented, and optionally manually tuned. A vector database (Weaviate or Qdrant) stores embeddings generated by the text‑embedding‑v4 model (Qwen). Retrieval uses hybrid search (vector + keyword) followed by a reranker model to improve hit precision.
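The hybrid-search step can be pictured as blending two scores per chunk, a vector-similarity score and a keyword-overlap score, before a reranker refines the top hits. The toy sketch below uses term overlap for the keyword side and a fixed weight `alpha`; the numbers are illustrative, not Dify's actual internals.

```python
# Toy hybrid-retrieval sketch: blend vector similarity with keyword overlap,
# then keep the top-k candidates (the slice a reranker model would refine).
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_rank(query, docs, vector_scores, alpha=0.7, top_k=2):
    """alpha weights the vector score; (1 - alpha) weights the keyword score."""
    blended = [
        (alpha * vs + (1 - alpha) * keyword_score(query, doc), doc)
        for doc, vs in zip(docs, vector_scores)
    ]
    return [doc for _, doc in sorted(blended, reverse=True)[:top_k]]

docs = ["refund within 7 days", "shipping takes 3 days", "refund needs receipt"]
print(hybrid_rank("refund policy days", docs, vector_scores=[0.9, 0.2, 0.8]))
# → ['refund within 7 days', 'refund needs receipt']
```

The design rationale: pure vector search can miss exact-term matches (order numbers, SKU codes), while pure keyword search misses paraphrases, so blending both before reranking raises hit precision.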
Multi‑Agent Design
Router Agent (master scheduler) – receives user messages, classifies intent, dispatches to appropriate agent. Uses GLM‑4‑Flash.
Pre‑sale Consultation Agent – product Q&A, promotion strategies, high‑intent recognition. Uses Qwen‑Plus and product knowledge base.
Post‑sale Service Agent – order lookup, progress inquiry, standard issue handling. Uses Qwen‑Plus with service rule base, Order API, Ticket API.
Refund Handling Agent – refund rule judgment, solution suggestion, refusal script generation. Uses DeepSeek‑V3 with refund rule base, Order API, Approval API.
Exception Handling Agent – abnormal order detection, ticket creation, responsible‑person notification. Uses Qwen‑Plus with Order API, Ticket API, Webhook.
Speech Polishing Agent – reply refinement, tone adjustment, sensitive‑word filtering. Uses Qwen‑Plus with speech template library and sensitive‑word library.
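The Router Agent's job amounts to a dispatch table from intent labels to the downstream agents and their assigned models. A minimal sketch, with hypothetical label and agent names standing in for the actual Dify workflow nodes:

```python
# Hypothetical dispatch table for the Router Agent described above.
AGENTS = {
    "pre_sale":  {"agent": "PreSaleAgent",   "model": "qwen-plus"},
    "post_sale": {"agent": "PostSaleAgent",  "model": "qwen-plus"},
    "refund":    {"agent": "RefundAgent",    "model": "deepseek-v3"},
    "exception": {"agent": "ExceptionAgent", "model": "qwen-plus"},
}

def dispatch(intent: str) -> dict:
    """Fall back to the post-sale agent when the intent is unrecognized."""
    return AGENTS.get(intent, AGENTS["post_sale"])

print(dispatch("refund")["model"])   # deepseek-v3
print(dispatch("unknown")["agent"])  # PostSaleAgent
```

In the deployed system every agent's draft reply additionally passes through the Speech Polishing Agent, so tone control and sensitive-word filtering are centralized rather than duplicated per agent.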
SpringMeng
Focused on software development, sharing source code and tutorials for various systems.