How Tencent Boosts LLM Power with RAG, GraphRAG, and Agent Technologies
This article examines Tencent's large language model deployments across content generation, intelligent customer service, and role‑playing scenarios, detailing the principles and practical implementations of Retrieval‑Augmented Generation (RAG), GraphRAG, and Agent techniques, and discussing challenges, optimization strategies, and real‑world use cases.
Application Scenarios
Tencent deploys large language models (LLMs) across a variety of business domains, including:
Content generation: automatic creation of ad copy, article drafts, and comment assistance.
Content understanding: text moderation, fraud detection, and semantic classification.
Intelligent customer service: knowledge‑base Q&A, user guidance, and ticket triage.
Code assistant (Copilot): automated code review, test‑case generation, and snippet suggestion.
Role‑playing NPC interaction: dynamic dialogue for games and virtual characters.
Core Techniques
1. Supervised Fine‑Tuning (SFT)
SFT adapts a pretrained base LLM to a specific business domain by training on curated instruction‑response pairs. Typical workflow (a minimal training sketch follows the list):
Collect domain‑specific data (e.g., support tickets, code snippets, product manuals).
Format data as {"instruction": ..., "input": ..., "output": ...} and split into train/validation sets.
Fine‑tune with a low learning rate (e.g., 5e‑5), batch size 32, 3–5 epochs, using mixed‑precision to reduce GPU memory.
Evaluate on task‑specific metrics (accuracy, BLEU, code‑review precision) before deployment.
2. Retrieval‑Augmented Generation (RAG)
RAG augments generation with external knowledge to improve factuality and reduce hallucinations.
Data preparation: Build a high‑quality knowledge base (documents, FAQs, manuals). Chunk documents into 1,024‑token segments using semantic or markdown‑based splitting. Encode each chunk with an embedding model (e.g., sentence‑transformers/all‑mpnet‑base‑v2) and store the vectors in a vector database (FAISS, Milvus) alongside BM25 inverted indexes for keyword fallback.
Retrieval: Convert the user query to an embedding, perform approximate nearest‑neighbor search, and optionally re‑rank with a cross‑encoder (e.g., cross‑encoder/ms‑marco‑MiniLM‑L‑6‑v2).
Generation: Pass the retrieved passages to the LLM as context (e.g., [Context] ... [User query]) and let the model generate a grounded answer; a sketch of this pipeline follows the list.
3. GraphRAG for Role‑Playing
GraphRAG extends RAG by constructing a knowledge graph from long narrative texts (novels, scripts) to enable global and local reasoning.
Indexing: Apply named‑entity recognition and relation extraction to each chunk, then store entities, relations, and community structures in a graph database (Neo4j, JanusGraph).
Retrieval: Perform local queries that fetch facts about a specific entity, and global queries that summarize community‑level information (e.g., story arcs).
Generation: Provide the LLM with the retrieved graph sub‑structures, allowing it to cite sources and produce coherent multi‑turn dialogue; a local‑query sketch follows the list.
4. Agent Technology
Agents combine reasoning and action, enabling goal‑driven workflows.
Roles: User (issues a request), Planner (interprets the request and decides which tools to call), Tool (an external API such as weather, budgeting, or search).
Process: The Planner parses the request, generates a plan, invokes the selected tool, receives the result, and iteratively refines the answer until a final response is produced; a minimal loop sketch follows.
Challenges and Optimizations
Hallucination: Pure generation can produce inaccurate statements; grounding with RAG or GraphRAG mitigates this risk.
Knowledge freshness: Business knowledge evolves rapidly; external retrieval allows dynamic updates without retraining the base model.
Explainability & safety: GraphRAG provides traceable reasoning paths, improving transparency and auditability.
Document processing: Heterogeneous formats (PDF, Office, images) require robust parsing, multi‑modal extraction, and flexible chunking (fixed length, semantic, markdown‑based, recursive); a structure‑first chunking sketch follows the list.
Practical Recommendations
Curate high‑quality, well‑structured knowledge bases; “garbage in, garbage out” directly impacts RAG performance.
Combine dense vector retrieval with BM25 keyword retrieval to maximize recall and relevance (see the fusion sketch after this list).
Apply multi‑stage chunking: first split by document structure (headings), then refine with a BERT‑style segmentation model that predicts optimal split points for Chinese text.
Design prompts that explicitly define the model’s role, input/output schema, and include few‑shot examples for style control.
Technical Q&A Highlights
Q1: Should QA pairs be embedded as whole documents or cached separately? Both approaches are viable. Embedding the full document enables broader semantic search, while indexing questions separately allows fast exact‑match retrieval for specific queries. A hybrid index (question vectors + document vectors) often yields the best trade‑off.
Q2: How is Chinese semantic chunking performed? A BERT‑based classifier is trained on labeled split points. The model scores each token boundary; thresholds are chosen to keep semantic completeness and smooth context flow. This method outperforms naïve fixed‑length splitting on Chinese prose.
Q3: How to evaluate answer effectiveness? Use a combination of correctness (exact match or tolerance‑based scoring), relevance (nDCG against retrieved passages), and factuality (human‑in‑the‑loop verification or automated fact‑checking). Maintain a transparent evaluation pipeline that logs inputs, retrieved contexts, and generated outputs.
Q4: When to prefer SFT over RAG? Choose SFT when the task requires deep domain expertise that can be encoded directly into model parameters (e.g., code review, specialized legal reasoning). Use RAG when the knowledge is large, frequently updated, or when factual grounding is critical (e.g., customer support, product documentation).
Q5: Can SFT adapt model style (e.g., WhatsApp chat tone)? Yes. By providing style‑specific examples in the fine‑tuning dataset and using instruction‑following prompts, the model learns to generate responses that match the desired tone.
Q6: Does a fine‑tuned SFT model remain suitable as a general‑purpose Agent? It can still serve as the LLM backbone of an Agent, but performance may degrade on out‑of‑domain tasks if the fine‑tuning data is narrowly focused. Periodic multi‑task fine‑tuning or parameter‑efficient adapters (LoRA, prefix‑tuning) can preserve generality.