Building a Dify‑Powered Multi‑Agent RAG AI Service with Chinese Large Models
After the New Year, the author landed several AI contracts: a six-week knowledge-base Q&A system and a two-month AI customer-service platform, both built on Dify with multi-Agent workflows, RAG, and domestic large language models. The customer-service project cut the client's support staff from fifteen to two, and the toolchain roughly doubled development efficiency.
Project 1 – Knowledge‑base Q&A
A six-week build using Spring Boot, Vue 3, Python FastAPI, MySQL, Elasticsearch, and MinIO. Cost: ¥50,000. Delivers a personal AI knowledge base that answers user queries.
Project 2 – AI Customer Service
A two-month build for an e-commerce client. Cost: ¥80,000. Architecture: privately deployed Dify, a multi-Agent workflow, Retrieval-Augmented Generation (RAG), and domestic large language models. Reduced the required human agents from 15 to 2.
Dify Private Deployment
Chosen for visual workflow editor, native multi‑Agent support, built‑in RAG engine, ability to integrate Chinese models (Qwen, DeepSeek, GLM), on‑premises data privacy, and RESTful APIs.
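Once deployed, a Dify app is driven entirely through its REST API. Below is a minimal sketch of building a request against Dify's `chat-messages` endpoint; the base URL and app key are placeholders for a private deployment, and the payload fields follow Dify's published chat API shape.

```python
import json
import urllib.request

DIFY_BASE_URL = "http://localhost/v1"  # hypothetical private-deployment address
DIFY_APP_KEY = "app-xxxxxxxx"          # placeholder app API key

def build_chat_request(query: str, user: str,
                       conversation_id: str = "") -> urllib.request.Request:
    """Build a blocking chat-messages request for a Dify app."""
    payload = {
        "inputs": {},
        "query": query,
        "response_mode": "blocking",      # or "streaming" for SSE responses
        "conversation_id": conversation_id,
        "user": user,                     # stable end-user identifier
    }
    return urllib.request.Request(
        f"{DIFY_BASE_URL}/chat-messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {DIFY_APP_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("What is the return policy?", user="customer-001")
print(req.full_url)  # http://localhost/v1/chat-messages
```

Sending the request (e.g. with `urllib.request.urlopen`) returns the agent's answer plus a `conversation_id` that can be passed back to keep multi-turn context.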
Large Model Selection and Scheduling
Three certified domestic models are used:
Qwen‑Plus – primary model for pre‑sale Q&A, standard after‑sale support, knowledge‑base retrieval, speech polishing, promotion strategies. Covers >80% of daily scenarios. IFBench instruction compliance 76.5. Cost: input ¥0.8 / M tokens, output ¥4.8 / M tokens.
DeepSeek‑V3 – fallback for complex refund decisions, amount confirmation, transaction dispute arbitration, and multi‑rule conflict handling. Hallucination rate 3.9 % (reportedly the industry's lowest). Cost: input ¥2 / M tokens, output ¥8 / M tokens.
GLM‑4‑Flash – free model for simple FAQs, greetings, auto‑reply, intent classification. Handles 40‑60 % of simple queries at zero cost.
Model scheduling logic (configured in Dify workflow nodes):
All requests first pass through GLM‑4‑Flash for intent classification and simple‑question routing (zero cost).
Standard customer‑service scenarios are handled by Qwen‑Plus (high instruction compliance, very low cost).
High‑accuracy scenarios such as amount confirmation and dispute resolution are escalated to DeepSeek‑V3.
All models are accessed through a single Alibaba Cloud Bailian API key.
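The three-tier scheduling above reduces to a cheap routing function once GLM‑4‑Flash has classified the intent. The sketch below is illustrative: the intent labels and model identifiers are hypothetical stand-ins, not Dify node names.

```python
# Hypothetical three-tier model router mirroring the scheduling logic above.
SIMPLE_INTENTS = {"greeting", "faq", "auto_reply"}
HIGH_STAKES_INTENTS = {"refund_decision", "amount_confirmation", "dispute"}

def pick_model(intent: str) -> str:
    """Route an already-classified intent to the cheapest adequate model."""
    if intent in SIMPLE_INTENTS:
        return "glm-4-flash"   # free tier: simple queries at zero cost
    if intent in HIGH_STAKES_INTENTS:
        return "deepseek-v3"   # low-hallucination model for money decisions
    return "qwen-plus"         # default workhorse for standard service flows

print(pick_model("greeting"))         # glm-4-flash
print(pick_model("refund_decision"))  # deepseek-v3
print(pick_model("order_status"))     # qwen-plus
```

In Dify itself this logic lives in workflow condition nodes rather than application code, but the cost structure is the same: the free model absorbs the 40-60 % of simple traffic before a paid model is ever invoked.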
Knowledge‑Base Technical Solution
Dify’s built‑in RAG engine is the core. Documents (PDF, Word, Markdown, CSV) are parsed, automatically segmented, and optionally manually tuned. A vector database (Weaviate or Qdrant) stores embeddings generated by the text‑embedding‑v4 model (Qwen). Retrieval uses hybrid search (vector + keyword) followed by a reranker model to improve hit precision.
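The hybrid-search step can be pictured as blending two scores per chunk, a vector-similarity score and a keyword-overlap score, before a reranker refines the top hits. The toy sketch below uses term overlap for the keyword side and a fixed weight `alpha`; the numbers are illustrative, not Dify's actual internals.

```python
# Toy hybrid-retrieval sketch: blend vector similarity with keyword overlap,
# then keep the top-k candidates (the slice a reranker model would refine).
def keyword_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_rank(query, docs, vector_scores, alpha=0.7, top_k=2):
    """alpha weights the vector score; (1 - alpha) weights the keyword score."""
    blended = [
        (alpha * vs + (1 - alpha) * keyword_score(query, doc), doc)
        for doc, vs in zip(docs, vector_scores)
    ]
    return [doc for _, doc in sorted(blended, reverse=True)[:top_k]]

docs = ["refund within 7 days", "shipping takes 3 days", "refund needs receipt"]
print(hybrid_rank("refund policy days", docs, vector_scores=[0.9, 0.2, 0.8]))
# → ['refund within 7 days', 'refund needs receipt']
```

The design rationale: pure vector search can miss exact-term matches (order numbers, SKU codes), while pure keyword search misses paraphrases, so blending both before reranking raises hit precision.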
Multi‑Agent Design
Router Agent (master scheduler) – receives user messages, classifies intent, dispatches to appropriate agent. Uses GLM‑4‑Flash.
Pre‑sale Consultation Agent – product Q&A, promotion strategies, high‑intent recognition. Uses Qwen‑Plus and product knowledge base.
Post‑sale Service Agent – order lookup, progress inquiry, standard issue handling. Uses Qwen‑Plus with service rule base, Order API, Ticket API.
Refund Handling Agent – refund rule judgment, solution suggestion, refusal script generation. Uses DeepSeek‑V3 with refund rule base, Order API, Approval API.
Exception Handling Agent – abnormal order detection, ticket creation, responsible‑person notification. Uses Qwen‑Plus with Order API, Ticket API, Webhook.
Speech Polishing Agent – reply refinement, tone adjustment, sensitive‑word filtering. Uses Qwen‑Plus with speech template library and sensitive‑word library.
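The Router Agent's job amounts to a dispatch table from intent labels to the downstream agents and their assigned models. A minimal sketch, with hypothetical label and agent names standing in for the actual Dify workflow nodes:

```python
# Hypothetical dispatch table for the Router Agent described above.
AGENTS = {
    "pre_sale":  {"agent": "PreSaleAgent",   "model": "qwen-plus"},
    "post_sale": {"agent": "PostSaleAgent",  "model": "qwen-plus"},
    "refund":    {"agent": "RefundAgent",    "model": "deepseek-v3"},
    "exception": {"agent": "ExceptionAgent", "model": "qwen-plus"},
}

def dispatch(intent: str) -> dict:
    """Fall back to the post-sale agent when the intent is unrecognized."""
    return AGENTS.get(intent, AGENTS["post_sale"])

print(dispatch("refund")["model"])   # deepseek-v3
print(dispatch("unknown")["agent"])  # PostSaleAgent
```

In the deployed system every agent's draft reply additionally passes through the Speech Polishing Agent, so tone control and sensitive-word filtering are centralized rather than duplicated per agent.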
SpringMeng
Focused on software development, sharing source code and tutorials for various systems.