Why Every RAG System Needs Smart Query Understanding and Routing
The article explains how diverse user queries in a RAG‑based insurance system require intent classification, entity extraction, and multi‑path routing to choose between vector search, calculation, database lookup, or chit‑chat, and outlines practical rule‑ML‑LLM hybrid solutions with safety safeguards.
1. Why Shouldn't All Queries Follow the Same Path?
In our insurance RAG project, queries vary: factual Q&A, calculation, database lookup, time‑constrained queries, and chit‑chat. Sending all queries to vector search causes two problems: calculation queries return policy text instead of numbers, and time‑sensitive queries may retrieve outdated documents.
2. Intent Recognition: Three Approaches
Intent recognition decides which processing chain a query should follow.
Rule‑based
Maintain a keyword‑to‑intent map; fast and no model cost but brittle.
def classify_intent_rule(query: str) -> str:
    """Rule-based tier: match keywords to intents; fast, but brittle."""
    intent_keywords = {
        "计算求解": ["计算", "算一下", "多少钱", "怎么算"],      # calculation: "calculate", "how much"
        "报销流程查询": ["报销", "流程", "怎么办", "步骤"],      # claim-process lookup: "reimburse", "steps"
        "数据统计": ["统计", "平均", "占比", "趋势"],            # data statistics: "average", "trend"
    }
    for intent, keywords in intent_keywords.items():
        if any(kw in query for kw in keywords):
            return intent
    return "通用问答"  # fallback: general Q&A
ML‑based
Fine‑tune a lightweight BERT classifier on labeled intent data; more robust but requires training and incurs inference latency.
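To illustrate the ML tier without pulling in model weights, here is a minimal, dependency-free stand-in: a character-bigram nearest-centroid classifier. A production system would fine-tune a small BERT model instead, but the interface is the same idea — learn intent boundaries from labeled examples and expose a confidence score for the cascade. All training examples below are invented for illustration.

```python
from collections import Counter
import math

def bigrams(text: str) -> Counter:
    """Character bigram counts — a crude stand-in for learned embeddings."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidIntentClassifier:
    def fit(self, examples):
        """examples: list of (query, intent) pairs."""
        self.centroids = {}
        for query, intent in examples:
            self.centroids.setdefault(intent, Counter()).update(bigrams(query))
        return self

    def predict(self, query: str):
        """Return (intent, confidence); the confidence feeds the cascade."""
        vec = bigrams(query)
        scores = {i: cosine(vec, c) for i, c in self.centroids.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]
```

A fine-tuned BERT model would replace `bigrams`/`cosine` with learned representations, but the `predict`-returns-`(intent, confidence)` contract is what the rest of the pipeline depends on.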
LLM‑prompt
Use a zero‑shot LLM with a system prompt that returns the intent number.
system_prompt = """你是一个意图分类助手。请判断用户问题属于以下哪个类别:
1. 知识问答(需要从知识库检索)
2. 计算求解(需要数值计算)
3. 数据查询(需要查数据库)
4. 闲聊(与业务无关)
只回复数字编号。"""

(The prompt reads: "You are an intent classification assistant. Decide which of the following categories the user's question belongs to: 1. Knowledge Q&A (retrieve from the knowledge base); 2. Calculation (numeric computation needed); 3. Data query (database lookup needed); 4. Chit-chat (unrelated to the business). Reply with the number only.")

This approach is cheap to deploy, but it adds latency and occasionally misclassifies.
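Wiring the prompt into a classifier is mostly about parsing the model's reply defensively, since models sometimes add extra text around the number. A minimal sketch — `call_llm` is a hypothetical stand-in for your chat-completion client, stubbed here so the parsing logic runs standalone:

```python
INTENT_BY_NUMBER = {
    "1": "知识问答",   # knowledge Q&A
    "2": "计算求解",   # calculation
    "3": "数据查询",   # data query
    "4": "闲聊",       # chit-chat
}

def call_llm(system_prompt: str, query: str) -> str:
    # Stub: a real client would send both messages to the model.
    return "2"  # pretend the model judged this a calculation query

def classify_intent_llm(query: str, system_prompt: str) -> str:
    reply = call_llm(system_prompt, query).strip()
    # Keep only the first digit in case the model adds surrounding text.
    digit = next((ch for ch in reply if ch.isdigit()), "")
    # Unparseable reply -> safe default: the retrieval path.
    return INTENT_BY_NUMBER.get(digit, "知识问答")
```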
3. Choosing a Practical Strategy
We combine the three: rule‑first, ML fallback, LLM for hard cases. High‑confidence rule matches are instant; low‑confidence go to the ML model; only when the model’s confidence is below a threshold do we invoke the LLM.
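The cascade can be sketched as follows. The three tier functions here are illustrative stubs standing in for the rule map, the fine-tuned classifier, and the LLM call described above, and the threshold value is an assumption:

```python
ML_CONFIDENCE_THRESHOLD = 0.7  # illustrative; tune on validation data

def classify_intent_rule(query):
    # Tier 1: keyword rules; None means "no confident match".
    if "算" in query or "多少钱" in query:
        return "计算求解"
    return None

def classify_intent_ml(query):
    # Tier 2 stub: a real system would call the fine-tuned classifier
    # and return (intent, confidence).
    return "知识问答", 0.55

def classify_intent_llm(query):
    # Tier 3 stub: zero-shot LLM classification (slowest, used last).
    return "知识问答"

def classify_intent(query: str) -> str:
    rule_hit = classify_intent_rule(query)
    if rule_hit is not None:            # high-confidence rule match: instant
        return rule_hit
    intent, confidence = classify_intent_ml(query)
    if confidence >= ML_CONFIDENCE_THRESHOLD:
        return intent
    return classify_intent_llm(query)   # hard case: pay the LLM latency
```

The ordering is deliberate: each tier is slower and more expensive than the last, so most traffic never reaches the LLM.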
4. Entity Extraction: Pulling Hidden Information
Beyond intent, queries often contain entities such as dates, sources, industries, or ranking targets. Extracting them enables metadata filtering during retrieval.
Implementation can mix regex for structured entities and NER models for open‑ended ones.
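The regex half can be sketched as below: pull years and known source names out of a query so they become metadata filters at retrieval time. The patterns and the source vocabulary are illustrative; open-ended entities (industries, ranking targets) would go to an NER model instead.

```python
import re

KNOWN_SOURCES = ["银保监会", "年度报告"]  # illustrative source vocabulary

def extract_entities(query: str) -> dict:
    """Extract structured entities (year, source) for metadata filtering."""
    entities = {}
    # Match a four-digit year followed by 年, e.g. "2023年".
    year = re.search(r"(20\d{2})\s*年", query)
    if year:
        entities["year"] = int(year.group(1))
    sources = [s for s in KNOWN_SOURCES if s in query]
    if sources:
        entities["sources"] = sources
    return entities
```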
5. Retrieval Routing
After intent and entities are identified, route the query:
Knowledge Q&A: hybrid vector + BM25 search; apply time or source filters if present.
Calculation: skip retrieval; feed the extracted parameters to a calculation function or the LLM's math reasoning.
Data Query: convert the question to SQL (NL2SQL) and fetch from the database.
Chit-chat: bypass RAG and let the LLM respond directly.
Multi‑index routing can further improve precision by selecting a topic‑specific index.
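The routing step above reduces to a dispatch table from intent to handler. The handler bodies here are stubs named after the four paths; a real system would call into the retrieval, calculation, NL2SQL, and chat components:

```python
def handle_knowledge_qa(query, entities):
    # Hybrid vector + BM25 retrieval, applying metadata filters if present.
    return f"retrieval(filters={entities})"

def handle_calculation(query, entities):
    return "calculation"   # skip retrieval, compute directly

def handle_data_query(query, entities):
    return "nl2sql"        # translate to SQL, hit the database

def handle_chitchat(query, entities):
    return "direct_llm"    # bypass RAG entirely

ROUTES = {
    "知识问答": handle_knowledge_qa,
    "计算求解": handle_calculation,
    "数据查询": handle_data_query,
    "闲聊": handle_chitchat,
}

def route(query: str, intent: str, entities: dict) -> str:
    # Unknown intents fall through to the safe default: retrieval.
    handler = ROUTES.get(intent, handle_knowledge_qa)
    return handler(query, entities)
```

Making retrieval the `.get` default is the conservative choice discussed in the safeguards section: a misrouted query still gets a plausible answer.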
6. Pitfalls and Safeguards
Over‑parsing can be harmful: misrouting a retrieval query to the calculator yields no answer, which is worse than a fallback search.
Rule: if classifier confidence is low, prefer the safe default retrieval path.
Parallel paths (running more than one candidate path at once, e.g. retrieval alongside the rewritten query) and a fallback mechanism (detecting anomalous calculation results or SQL errors and reverting to retrieval) protect against wrong routing.
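The fallback mechanism can be sketched as a wrapper around any specialized path: if the path raises or returns an anomalous result, degrade to the default retrieval path instead of surfacing a failure. `run_retrieval` and the anomaly check are illustrative stand-ins.

```python
def run_retrieval(query: str) -> str:
    # Stand-in for the default vector + BM25 retrieval path.
    return f"retrieval answer for: {query}"

def answer_with_fallback(query: str, specialized_handler) -> str:
    """Run a specialized path, falling back to retrieval on failure."""
    try:
        result = specialized_handler(query)
        # Treat empty or error-flagged results as anomalies
        # (e.g. a failed calculation or a broken generated SQL query).
        if not result or result.startswith("ERROR"):
            raise ValueError("anomalous result from specialized path")
        return result
    except Exception:
        # Wrong routing should not mean no answer: degrade gracefully.
        return run_retrieval(query)
```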
7. How to Talk About Query Understanding in Interviews
Explain the need for the module, describe the three‑tier intent solution, discuss entity extraction and routing, and emphasize safety strategies such as conservative fallback and parallel paths.
Wu Shixiong's Large Model Academy
We continuously share large‑model know‑how, helping you master core skills—LLM, RAG, fine‑tuning, deployment—from zero to job offer, tailored for career‑switchers, autumn recruiters, and those seeking stable large‑model positions.
