Why Every RAG System Needs Smart Query Understanding and Routing
The article explains how diverse user queries in a RAG‑based insurance system require intent classification, entity extraction, and multi‑path routing to choose between vector search, calculation, database lookup, or chit‑chat, and outlines practical rule‑ML‑LLM hybrid solutions with safety safeguards.
1. Why Shouldn't All Queries Follow the Same Path?
In our insurance RAG project, queries vary: factual Q&A, calculation, database lookup, time‑constrained queries, and chit‑chat. Sending all queries to vector search causes two problems: calculation queries return policy text instead of numbers, and time‑sensitive queries may retrieve outdated documents.
2. Intent Recognition: Three Approaches
Intent recognition decides which processing chain a query should follow.
Rule‑based
Maintain a keyword‑to‑intent map; fast and no model cost but brittle.
def classify_intent_rule(query: str) -> str:
    """Rule-based tier: match keywords to intents; fast, but brittle."""
    intent_keywords = {
        "计算求解": ["计算", "算一下", "多少钱", "怎么算"],      # calculation: "calculate", "how much"
        "报销流程查询": ["报销", "流程", "怎么办", "步骤"],      # claim-process lookup: "reimburse", "steps"
        "数据统计": ["统计", "平均", "占比", "趋势"],            # data statistics: "average", "trend"
    }
    for intent, keywords in intent_keywords.items():
        if any(kw in query for kw in keywords):
            return intent
    return "通用问答"  # fallback: general Q&A
ML‑based
Fine‑tune a lightweight BERT classifier on labeled intent data; more robust but requires training and incurs inference latency.
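To illustrate the ML tier without pulling in model weights, here is a minimal, dependency-free stand-in: a character-bigram nearest-centroid classifier. A production system would fine-tune a small BERT model instead, but the interface is the same idea — learn intent boundaries from labeled examples and expose a confidence score for the cascade. All training examples below are invented for illustration.

```python
from collections import Counter
import math

def bigrams(text: str) -> Counter:
    """Character bigram counts — a crude stand-in for learned embeddings."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidIntentClassifier:
    def fit(self, examples):
        """examples: list of (query, intent) pairs."""
        self.centroids = {}
        for query, intent in examples:
            self.centroids.setdefault(intent, Counter()).update(bigrams(query))
        return self

    def predict(self, query: str):
        """Return (intent, confidence); the confidence feeds the cascade."""
        vec = bigrams(query)
        scores = {i: cosine(vec, c) for i, c in self.centroids.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]
```

A fine-tuned BERT model would replace `bigrams`/`cosine` with learned representations, but the `predict`-returns-`(intent, confidence)` contract is what the rest of the pipeline depends on.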
LLM‑prompt
Use a zero‑shot LLM with a system prompt that returns the intent number.
system_prompt = """你是一个意图分类助手。请判断用户问题属于以下哪个类别:
1. 知识问答(需要从知识库检索)
2. 计算求解(需要数值计算)
3. 数据查询(需要查数据库)
4. 闲聊(与业务无关)
只回复数字编号。"""

(The prompt reads: "You are an intent classification assistant. Decide which of the following categories the user's question belongs to: 1. Knowledge Q&A (retrieve from the knowledge base); 2. Calculation (numeric computation needed); 3. Data query (database lookup needed); 4. Chit-chat (unrelated to the business). Reply with the number only.")

This approach is cheap to deploy, but it adds latency and occasionally misclassifies.
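Wiring the prompt into a classifier is mostly about parsing the model's reply defensively, since models sometimes add extra text around the number. A minimal sketch — `call_llm` is a hypothetical stand-in for your chat-completion client, stubbed here so the parsing logic runs standalone:

```python
INTENT_BY_NUMBER = {
    "1": "知识问答",   # knowledge Q&A
    "2": "计算求解",   # calculation
    "3": "数据查询",   # data query
    "4": "闲聊",       # chit-chat
}

def call_llm(system_prompt: str, query: str) -> str:
    # Stub: a real client would send both messages to the model.
    return "2"  # pretend the model judged this a calculation query

def classify_intent_llm(query: str, system_prompt: str) -> str:
    reply = call_llm(system_prompt, query).strip()
    # Keep only the first digit in case the model adds surrounding text.
    digit = next((ch for ch in reply if ch.isdigit()), "")
    # Unparseable reply -> safe default: the retrieval path.
    return INTENT_BY_NUMBER.get(digit, "知识问答")
```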
3. Choosing a Practical Strategy
We combine the three: rule‑first, ML fallback, LLM for hard cases. High‑confidence rule matches are instant; low‑confidence go to the ML model; only when the model’s confidence is below a threshold do we invoke the LLM.
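The cascade can be sketched as follows. The three tier functions here are illustrative stubs standing in for the rule map, the fine-tuned classifier, and the LLM call described above, and the threshold value is an assumption:

```python
ML_CONFIDENCE_THRESHOLD = 0.7  # illustrative; tune on validation data

def classify_intent_rule(query):
    # Tier 1: keyword rules; None means "no confident match".
    if "算" in query or "多少钱" in query:
        return "计算求解"
    return None

def classify_intent_ml(query):
    # Tier 2 stub: a real system would call the fine-tuned classifier
    # and return (intent, confidence).
    return "知识问答", 0.55

def classify_intent_llm(query):
    # Tier 3 stub: zero-shot LLM classification (slowest, used last).
    return "知识问答"

def classify_intent(query: str) -> str:
    rule_hit = classify_intent_rule(query)
    if rule_hit is not None:            # high-confidence rule match: instant
        return rule_hit
    intent, confidence = classify_intent_ml(query)
    if confidence >= ML_CONFIDENCE_THRESHOLD:
        return intent
    return classify_intent_llm(query)   # hard case: pay the LLM latency
```

The ordering is deliberate: each tier is slower and more expensive than the last, so most traffic never reaches the LLM.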
4. Entity Extraction: Pulling Hidden Information
Beyond intent, queries often contain entities such as dates, sources, industries, or ranking targets. Extracting them enables metadata filtering during retrieval.
Implementation can mix regex for structured entities and NER models for open‑ended ones.
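The regex half can be sketched as below: pull years and known source names out of a query so they become metadata filters at retrieval time. The patterns and the source vocabulary are illustrative; open-ended entities (industries, ranking targets) would go to an NER model instead.

```python
import re

KNOWN_SOURCES = ["银保监会", "年度报告"]  # illustrative source vocabulary

def extract_entities(query: str) -> dict:
    """Extract structured entities (year, source) for metadata filtering."""
    entities = {}
    # Match a four-digit year followed by 年, e.g. "2023年".
    year = re.search(r"(20\d{2})\s*年", query)
    if year:
        entities["year"] = int(year.group(1))
    sources = [s for s in KNOWN_SOURCES if s in query]
    if sources:
        entities["sources"] = sources
    return entities
```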
5. Retrieval Routing
After intent and entities are identified, route the query:
Knowledge Q&A: hybrid vector + BM25 search; apply time or source filters if present.
Calculation: skip retrieval; feed the extracted parameters to a calculation function or the LLM's math reasoning.
Data Query: convert the question to SQL (NL2SQL) and fetch from the database.
Chit-chat: bypass RAG and let the LLM respond directly.
Multi‑index routing can further improve precision by selecting a topic‑specific index.
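The routing step above reduces to a dispatch table from intent to handler. The handler bodies here are stubs named after the four paths; a real system would call into the retrieval, calculation, NL2SQL, and chat components:

```python
def handle_knowledge_qa(query, entities):
    # Hybrid vector + BM25 retrieval, applying metadata filters if present.
    return f"retrieval(filters={entities})"

def handle_calculation(query, entities):
    return "calculation"   # skip retrieval, compute directly

def handle_data_query(query, entities):
    return "nl2sql"        # translate to SQL, hit the database

def handle_chitchat(query, entities):
    return "direct_llm"    # bypass RAG entirely

ROUTES = {
    "知识问答": handle_knowledge_qa,
    "计算求解": handle_calculation,
    "数据查询": handle_data_query,
    "闲聊": handle_chitchat,
}

def route(query: str, intent: str, entities: dict) -> str:
    # Unknown intents fall through to the safe default: retrieval.
    handler = ROUTES.get(intent, handle_knowledge_qa)
    return handler(query, entities)
```

Making retrieval the `.get` default is the conservative choice discussed in the safeguards section: a misrouted query still gets a plausible answer.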
6. Pitfalls and Safeguards
Over‑parsing can be harmful: misrouting a retrieval query to the calculator yields no answer, which is worse than a fallback search.
Rule: if classifier confidence is low, prefer the safe default retrieval path.
Parallel paths (running more than one candidate path at once, e.g. retrieval alongside the rewritten query) and a fallback mechanism (detecting anomalous calculation results or SQL errors and reverting to retrieval) protect against wrong routing.
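The fallback mechanism can be sketched as a wrapper around any specialized path: if the path raises or returns an anomalous result, degrade to the default retrieval path instead of surfacing a failure. `run_retrieval` and the anomaly check are illustrative stand-ins.

```python
def run_retrieval(query: str) -> str:
    # Stand-in for the default vector + BM25 retrieval path.
    return f"retrieval answer for: {query}"

def answer_with_fallback(query: str, specialized_handler) -> str:
    """Run a specialized path, falling back to retrieval on failure."""
    try:
        result = specialized_handler(query)
        # Treat empty or error-flagged results as anomalies
        # (e.g. a failed calculation or a broken generated SQL query).
        if not result or result.startswith("ERROR"):
            raise ValueError("anomalous result from specialized path")
        return result
    except Exception:
        # Wrong routing should not mean no answer: degrade gracefully.
        return run_retrieval(query)
```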
7. How to Talk About Query Understanding in Interviews
Explain the need for the module, describe the three‑tier intent solution, discuss entity extraction and routing, and emphasize safety strategies such as conservative fallback and parallel paths.
Wu Shixiong's Large Model Academy
We continuously share large‑model know‑how, helping you master core skills—LLM, RAG, fine‑tuning, deployment—from zero to job offer, tailored for career‑switchers, autumn recruiters, and those seeking stable large‑model positions.
