Five Intent Recognition Designs: From Keyword Matching to Classifier to LLM Self‑Routing – A Decision Tree to Choose the Right One

The article breaks down five production-grade intent-recognition designs (keyword matching, a regex rule engine, an embedding classifier, a fine-tuned small model, and zero-shot LLM routing). For each it provides code snippets, latency and cost benchmarks, and decision rules, and it shows how a layered architecture can cut monthly API costs from ¥80,000 to ¥3,000 while keeping accuracy above 90%.

James' Growth Diary

01 Keyword Matching: The Fastest Knife, but Only Cuts Straight Lines

Keyword matching is the oldest intent‑recognition technique, essentially a "trigger‑word → intent" map. The article provides a TypeScript example:

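// Trigger words per intent (Chinese customer-service vocabulary).
const intentKeywords: Record<string, string[]> = {
  order_query: ["订单", "物流", "快递", "发货", "到货", "运单"], // order, logistics, courier, shipped, delivered, waybill
  refund_request: ["退款", "退货", "退钱", "申请退", "要退"], // refund, return goods, money back, apply to return, want to return
  complaint: ["投诉", "不满意", "太差了", "骗人", "垃圾"], // complain, dissatisfied, terrible, scam, rubbish
  product_inquiry: ["价格", "多少钱", "有没有", "规格", "参数"] // price, how much, do you have, specs, parameters
};
// Returns the first intent (in object order) that has any keyword contained in the text.
function matchKeywordIntent(text: string): string | null {
  for (const [intent, kws] of Object.entries(intentKeywords)) {
    if (kws.some(kw => text.includes(kw))) return intent;
  }
  return null; // no hit: hand over to the next layer
}
console.log(matchKeywordIntent("我的订单还没发货啊")); // order_query ✅ ("my order still hasn't shipped")
console.log(matchKeywordIntent("这个多少钱")); // product_inquiry ✅ ("how much is this?")
console.log(matchKeywordIntent("我想买个东西")); // null ("I want to buy something"): the ceiling of the keyword approach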

Latency is under 1 ms and accuracy exceeds 95% on the high-frequency intents it covers, but it fails on any utterance that lacks the exact keywords.
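One limitation worth noting: `matchKeywordIntent` returns the first intent in object order, so "我的订单要退款" ("I want a refund on my order") is labelled order_query even though it also contains a refund keyword. A minimal sketch of one alternative, assuming you would rather hand ambiguous inputs downstream than guess: collect every matching intent and answer only when exactly one matched. `matchKeywordStrict` and `KeywordResult` are hypothetical names.

```typescript
// Same trigger-word table as above (order, refund, complaint, product questions).
const KEYWORDS: Record<string, string[]> = {
  order_query: ["订单", "物流", "快递", "发货", "到货", "运单"],
  refund_request: ["退款", "退货", "退钱", "申请退", "要退"],
  complaint: ["投诉", "不满意", "太差了", "骗人", "垃圾"],
  product_inquiry: ["价格", "多少钱", "有没有", "规格", "参数"],
};

// Hypothetical result type: a single hit, an ambiguous multi-hit, or a miss.
type KeywordResult =
  | { status: "hit"; intent: string }
  | { status: "ambiguous"; candidates: string[] }
  | { status: "miss" };

function matchKeywordStrict(text: string): KeywordResult {
  // Every intent with at least one keyword contained in the text.
  const candidates = Object.entries(KEYWORDS)
    .filter(([, kws]) => kws.some((kw) => text.includes(kw)))
    .map(([intent]) => intent);
  if (candidates.length === 1) return { status: "hit", intent: candidates[0] };
  // Several intents fired: let a later layer decide instead of guessing.
  if (candidates.length > 1) return { status: "ambiguous", candidates };
  return { status: "miss" };
}

console.log(matchKeywordStrict("我的订单还没发货啊")); // → hit: order_query
console.log(matchKeywordStrict("我的订单要退款")); // → ambiguous: order_query, refund_request
```

The trade-off is recall for precision: ambiguous inputs cost a trip to a slower layer, but the fast layer no longer silently picks whichever intent happens to be listed first.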

Rule of thumb: Clear lexical mapping + extreme speed sensitivity → Keyword first

02 Regex + Rule Engine: Adding Context to Keywords

A rule engine combines regex patterns with logical combinators ("any" / "all") and negative patterns, capturing more natural phrasing while keeping latency low (1-5 ms). Example:

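type Rule = {
  intent: string;
  patterns: RegExp[];
  combinator: "any" | "all"; // "any": one pattern must match; "all": every pattern must match
  negativePatterns?: RegExp[]; // if any of these match, the rule is skipped
};
const rules: Rule[] = [
  {
    intent: "order_status_query",
    // "my order/parcel/courier ... arrived / got to / where / how long", or "check/look at ... logistics/courier"
    patterns: [/我的(订单|包裹|快递).*(到了|到哪|在哪|多久)/, /(查|看看).*(物流|快递)/],
    combinator: "any",
  },
  {
    intent: "complaint_hardware",
    // "doesn't work / can't use / broken" AND "product / device / phone"
    patterns: [/(不能用|用不了|坏了)/, /(产品|设备|手机)/],
    combinator: "all",
    negativePatterns: [/怎么用/], // filter out "how do I use it" tutorial questions
  },
];
function matchRule(text: string): string | null {
  for (const rule of rules) {
    // Check exclusions first so a vetoed rule costs no further regex work.
    if (rule.negativePatterns?.some(p => p.test(text))) continue;
    const matched = rule.combinator === "any"
      ? rule.patterns.some(p => p.test(text))
      : rule.patterns.every(p => p.test(text));
    if (matched) return rule.intent;
  }
  return null;
}
console.log(matchRule("我的快递到哪了")); // order_status_query ✅ ("where has my parcel got to?")
console.log(matchRule("手机用不了了")); // complaint_hardware ✅ ("my phone stopped working")
console.log(matchRule("手机怎么用")); // null (negative filter) ✅ ("how do I use the phone?")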

Latency stays under 5 ms, but rule files can balloon to thousands of lines as intent count grows, raising maintenance cost.
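Because the rule file is the main maintenance burden, one common mitigation is to keep rules as data rather than code. A sketch, assuming a hypothetical JSON format in which patterns are stored as strings and compiled once at load time, so a malformed regex fails at startup instead of at request time. `RuleSpec`, `compileRules`, and `matchCompiled` are illustrative names, not part of any library.

```typescript
// Hypothetical on-disk format: patterns are plain strings, not RegExp literals.
interface RuleSpec {
  intent: string;
  patterns: string[];
  combinator: "any" | "all";
  negativePatterns?: string[];
}

interface CompiledRule {
  intent: string;
  patterns: RegExp[];
  combinator: "any" | "all";
  negativePatterns: RegExp[];
}

// Parse and compile once at startup; new RegExp throws SyntaxError on a bad pattern.
function compileRules(json: string): CompiledRule[] {
  const specs = JSON.parse(json) as RuleSpec[]; // no schema validation in this sketch
  return specs.map((s) => {
    try {
      return {
        intent: s.intent,
        patterns: s.patterns.map((p) => new RegExp(p)),
        combinator: s.combinator,
        negativePatterns: (s.negativePatterns ?? []).map((p) => new RegExp(p)),
      };
    } catch (err) {
      throw new Error(`Invalid pattern in rule "${s.intent}": ${(err as Error).message}`);
    }
  });
}

// Same semantics as matchRule: negatives veto the rule, then "any"/"all" decides.
function matchCompiled(rules: CompiledRule[], text: string): string | null {
  for (const r of rules) {
    if (r.negativePatterns.some((p) => p.test(text))) continue;
    const ok =
      r.combinator === "any" ? r.patterns.some((p) => p.test(text)) : r.patterns.every((p) => p.test(text));
    if (ok) return r.intent;
  }
  return null;
}

const hardwareRules = compileRules(
  JSON.stringify([
    {
      intent: "complaint_hardware",
      patterns: ["(不能用|用不了|坏了)", "(产品|设备|手机)"],
      combinator: "all",
      negativePatterns: ["怎么用"],
    },
  ]),
);
console.log(matchCompiled(hardwareRules, "手机用不了了")); // → complaint_hardware
console.log(matchCompiled(hardwareRules, "手机怎么用")); // → null
```

Keeping rules as data also makes it straightforward to run a table of expected utterance/intent pairs against every rule change before deploying it.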

Rule of thumb: Fewer than 20 intents + phrasing that regexes can capture → Rule engine

03 Embedding Classifier: From Literal to Semantic

When there are more than ~20 intents and utterances vary widely, vector embeddings enable semantic nearest-neighbor search. The article shows a LangChain-based classifier that stores 5-10 example sentences per intent, embeds them with text-embedding-3-small, and classifies a query by cosine similarity to its closest stored example:

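import { OpenAIEmbeddings } from "@langchain/openai";

// Plain cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

class EmbeddingIntentClassifier {
  private embedder = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  private intents: Array<{ intent: string; embeddings: number[][] }> = [];
  // Embed every example sentence once, up front.
  async build(examples: Array<{ intent: string; texts: string[] }>) {
    for (const { intent, texts } of examples) {
      const embeddings = await this.embedder.embedDocuments(texts);
      this.intents.push({ intent, embeddings });
    }
  }
  // Return the intent of the single most similar example, or null below the threshold.
  async classify(text: string, threshold = 0.75) {
    const q = await this.embedder.embedQuery(text);
    let best = { intent: null as string | null, score: 0 };
    for (const { intent, embeddings } of this.intents) {
      for (const e of embeddings) {
        const score = cosineSimilarity(q, e);
        if (score > best.score) best = { intent, score };
      }
    }
    return best.score >= threshold ? best : { intent: null, score: best.score };
  }
}
// Build example (top-level await requires an ES module)
const clf = new EmbeddingIntentClassifier();
await clf.build([
  // "where is my parcel", "when will my order ship", "check the logistics for me"
  { intent: "order_query", texts: ["我的快递到哪了", "订单什么时候发货", "帮我查一下物流"] },
  // "I want a refund", "I don't want this any more, how do I return it", "apply for a 7-day no-reason return"
  { intent: "refund", texts: ["我要退款", "这个不想要了怎么退", "申请七天无理由"] },
]);
console.log(await clf.classify("包裹现在在哪个城市")); // e.g. { intent: "order_query", score: 0.87 } ("which city is the parcel in now?")
console.log(await clf.classify("我想取消这笔购买")); // e.g. { intent: "refund", score: 0.81 } ("I'd like to cancel this purchase")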

Latency is 30-80 ms (one embedding call per query), the approach generalises well across paraphrases, and it scales to roughly 200 intents with only 5-10 examples each.
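The scoring step can be separated from the embedding API so that it can be tested offline. The sketch below keeps the article's idea (score each intent by its closest example, accept above a threshold) and adds one assumption of ours: a minimum margin between the best and second-best intent, so near-ties go to the next layer instead of being forced. `scoreByNearestExample` is a hypothetical name and the `minMargin` default of 0.05 is an arbitrary placeholder; in production, `query` and the pool vectors would come from `embedQuery` / `embedDocuments`.

```typescript
// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

interface VectorMatch {
  intent: string | null;
  score: number;
  margin: number;
}

// pool: intent -> embeddings of its example sentences.
function scoreByNearestExample(
  query: number[],
  pool: Map<string, number[][]>,
  threshold = 0.75,
  minMargin = 0.05, // hypothetical default; tune on held-out data
): VectorMatch {
  // Each intent's score is the similarity to its closest example (1-nearest-neighbour).
  const ranked = [...pool.entries()]
    .filter(([, vecs]) => vecs.length > 0)
    .map(([intent, vecs]) => ({ intent, score: Math.max(...vecs.map((v) => cosine(query, v))) }))
    .sort((x, y) => y.score - x.score);
  if (ranked.length === 0) return { intent: null, score: 0, margin: 0 };
  const [best, second] = ranked;
  const margin = second ? best.score - second.score : best.score;
  // Accept only a confident, clearly separated winner; otherwise defer.
  const accepted = best.score >= threshold && margin >= minMargin;
  return { intent: accepted ? best.intent : null, score: best.score, margin };
}

// Toy 3-d vectors standing in for real embeddings.
const pool = new Map<string, number[][]>([
  ["order_query", [[1, 0, 0]]],
  ["refund", [[0.8, 0.6, 0]]],
]);
console.log(scoreByNearestExample([1, 0, 0], pool).intent); // → order_query
console.log(scoreByNearestExample([1, 0.3, 0], pool).intent); // → null (near-tie)
```

In the second query both intents score above 0.75 (about 0.958 and 0.939), so a pure threshold would accept it; the margin check is what sends it downstream.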

Rule of thumb: 20-200 intents + wide variation in phrasing + few examples → Embedding classifier

04 Small‑Model Fine‑Tune: Production Mainstay When Data Is Sufficient

With more than 200 labelled examples per intent, fine-tuning a small model (e.g., BERT) becomes cost-effective. The article reports a real-world cost comparison at 100k daily requests: ¥80k with pure GPT-4o-mini (200-600 ms latency) versus ¥4k with a self-hosted BERT (10-30 ms latency), a 95% cost reduction and a speed-up of more than 10×.

// small-model-in-langgraph.ts (simplified)
import { Runnable } from "@langchain/core/runnables";
import { HfInference } from "@huggingface/inference";

// Exposes a fine-tuned text-classification model (e.g. BERT) hosted on Hugging Face
// as a LangChain Runnable, so it can be dropped into a LangGraph routing graph.
class SmallModelClassifier extends Runnable<string, { intent: string; confidence: number }> {
  lc_namespace = ["custom", "intent"];
  private hf = new HfInference(process.env.HF_TOKEN!);
  constructor(private modelId: string) { super(); }
  async invoke(text: string) {
    const preds = await this.hf.textClassification({ model: this.modelId, inputs: text });
    // Pick the highest-scoring label rather than relying on response order.
    const top = preds.reduce((a, b) => (b.score > a.score ? b : a));
    return { intent: top.label, confidence: top.score };
  }
}
// Integration into the LangGraph routing graph (StateGraph / Annotation) omitted for brevity

A confidence threshold of 0.8 is used; below that the request falls back to the LLM layer.
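The threshold-then-fallback behaviour can be written as a small wrapper. A sketch, assuming both layers expose an async `classify(text)` that resolves to an intent and a confidence; the `IntentClassifier` interface and `withFallback` helper are hypothetical, and the stubs at the bottom merely stand in for the BERT service and the LLM router.

```typescript
interface Prediction {
  intent: string;
  confidence: number;
  source: string;
}

// Hypothetical common shape for every layer.
interface IntentClassifier {
  classify(text: string): Promise<Prediction>;
}

// Use the primary model when it is confident enough, otherwise ask the fallback.
function withFallback(primary: IntentClassifier, fallback: IntentClassifier, threshold = 0.8): IntentClassifier {
  return {
    async classify(text: string): Promise<Prediction> {
      const p = await primary.classify(text);
      return p.confidence >= threshold ? p : fallback.classify(text);
    },
  };
}

// Stub "small model": confident only when the text mentions 快递 (courier).
const smallModel: IntentClassifier = {
  classify: async (text) =>
    text.includes("快递")
      ? { intent: "order_query", confidence: 0.93, source: "bert" }
      : { intent: "order_query", confidence: 0.41, source: "bert" },
};
// Stub LLM layer.
const llmLayer: IntentClassifier = {
  classify: async () => ({ intent: "other", confidence: 0.7, source: "llm" }),
};

const router = withFallback(smallModel, llmLayer);
```

Because the wrapper returns another `IntentClassifier`, wrappers can be nested to build a longer chain of layers.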

Rule of thumb: More than 200 labelled samples per intent + high concurrency + cost sensitivity → Small-model fine-tune

05 LLM Self‑Routing: Zero‑Shot, No Annotation, Handles Fuzzy Intents

Zero-shot routing uses a system prompt that lists the intent definitions; the LLM returns the matching intent together with a confidence and its reasoning. Example (using gpt-4o-mini):

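import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// Intent definitions go straight into the (Chinese) prompt. 【不包括】 means "does NOT include".
const intentDefs = {
  order_query: "用户询问订单状态、物流进度;【不包括】修改地址", // order status, shipping progress; NOT address changes
  refund_request: "用户要求退款、退货、取消订单", // refund, return, cancel order
  complaint: "用户明确表达不满;【不包括】产品使用疑问", // explicit dissatisfaction; NOT product-usage questions
  product_faq: "用户询问产品规格、功能、使用方法", // specs, features, how to use
  other: "以上均不符合" // none of the above
};

// Structured output: one of the keys above, a confidence in [0, 1], and a short rationale.
const IntentSchema = z.object({
  intent: z.enum(["order_query", "refund_request", "complaint", "product_faq", "other"]),
  confidence: z.number().min(0).max(1),
  reasoning: z.string(),
});

// Create the model once and reuse it across requests.
const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 }).withStructuredOutput(IntentSchema);

async function llmRoute(input: string) {
  const list = Object.entries(intentDefs).map(([k, v]) => `- ${k}: ${v}`).join("\n");
  return llm.invoke([
    // "You are an intent classifier; decide which intent the user input belongs to:"
    { role: "system", content: `你是意图分类器,判断用户输入属于哪个意图:\n${list}` },
    { role: "user", content: input }
  ]);
}
// Example ("that thing of mine hasn't had any update for ages")
const r = await llmRoute("我那个东西好久没消息了");
// e.g. { intent: "order_query", confidence: 0.85, reasoning: "暗指购买物品的追踪" } (reasoning: "implicitly refers to tracking a purchased item")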

Although it delivers the highest accuracy without any labelled data, its latency grows with traffic: 280 ms at 50 QPS, 820 ms at 200 QPS, and a timeout rate above 15% at 500 QPS. That makes it unsuitable as the only layer of a high-throughput service.
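Given those timeout rates, the LLM layer is usually called with a latency budget. A sketch of one way to do that, assuming a degraded answer (here a hypothetical `out_of_scope` label, which could be routed to a human or a clarifying question) is acceptable when the budget is exceeded; `routeWithBudget` and the 1500 ms default are illustrative, not from the article. Note that `Promise.race` only stops waiting: the underlying request keeps running unless you also cancel it (for example via an `AbortSignal`, if your client accepts one).

```typescript
interface RouteResult {
  intent: string;
  timedOut: boolean;
}

// Race the router against a timer; whichever settles first wins.
async function routeWithBudget(
  route: (text: string) => Promise<string>,
  text: string,
  budgetMs = 1500,
): Promise<RouteResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<RouteResult>((resolve) => {
    timer = setTimeout(() => resolve({ intent: "out_of_scope", timedOut: true }), budgetMs);
  });
  try {
    return await Promise.race([
      route(text).then((intent) => ({ intent, timedOut: false })),
      timeout,
    ]);
  } finally {
    clearTimeout(timer); // do not keep the timer alive after a fast answer
  }
}

// Stub router that always answers order_query after `delayMs` milliseconds.
const slowRouter = (delayMs: number) => (_text: string) =>
  new Promise<string>((resolve) => setTimeout(() => resolve("order_query"), delayMs));
```

In production, `route` would be something like `(t) => llmRoute(t).then((r) => r.intent)`.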

Rule of thumb: New intents must go live quickly + fuzzy semantics + traffic under 1k QPS → LLM self-routing

06 Layered Architecture: The Production‑Ready Combo

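Four of the five designs (keyword, rule engine, embedding, LLM) are stacked into a four-layer funnel; each layer answers what it can and passes the rest down. Rough traffic distribution: Layer 1 (keyword) ≈ 35% of requests, Layer 2 (rule engine) ≈ 20%, Layer 3 (embedding) ≈ 35%, Layer 4 (LLM) ≈ 10%.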

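// Tries the cheapest layer first and only falls through on a miss.
// Reuses matchKeywordIntent, matchRule, EmbeddingIntentClassifier and llmRoute from the sections above.
class LayeredIntentRecognizer {
  constructor(private embCLF: EmbeddingIntentClassifier) {}

  async recognize(text: string): Promise<{ intent: string; layer: number }> {
    const kw = matchKeywordIntent(text); // Layer 1: <1 ms
    if (kw) return { intent: kw, layer: 1 };
    const rule = matchRule(text); // Layer 2: 1-5 ms
    if (rule) return { intent: rule, layer: 2 };
    const emb = await this.embCLF.classify(text); // Layer 3: 30-80 ms
    if (emb.intent && emb.score >= 0.75) return { intent: emb.intent, layer: 3 };
    const llm = await llmRoute(text); // Layer 4: LLM fallback
    return { intent: llm.intent, layer: 4 };
  }
}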

A real-world A/B test on a customer-service bot handling 100k requests per day showed:

Pure LLM: ~¥80k/month cost, 800 ms average latency, 94% accuracy.

Layered solution: ~¥3k/month cost, 60 ms average latency, 93% accuracy (essentially unchanged).
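The headline percentages can be checked directly from the figures quoted in this article. A trivial sketch (the helper name `percentReduction` is ours):

```typescript
// Percentage reduction from a baseline value to a new value.
function percentReduction(before: number, after: number): number {
  return ((before - after) * 100) / before;
}

// Cost figures quoted in sections 04 and 06 (yuan, same 100k-requests-per-day workload).
const pureLlm = 80_000;
const selfHostedBert = 4_000;
const layered = 3_000;

console.log(percentReduction(pureLlm, selfHostedBert)); // 95
console.log(percentReduction(pureLlm, layered)); // 96.25

// Latency ratios: GPT-4o-mini vs BERT at both ends of the quoted ranges,
// then pure LLM vs layered averages.
console.log(200 / 10, 600 / 30); // 20 20
console.log((800 / 60).toFixed(1)); // 13.3
```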

Monitoring recommendation: when the share of requests reaching Layer 4 exceeds 20%, migrate the most frequent intents found there into Layer 3's example pool.
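That monitoring rule can be tracked with a per-layer counter. A minimal in-memory sketch, assuming a hypothetical `LayerStats` class fed with the `layer` field that `LayeredIntentRecognizer.recognize` returns; a real deployment would more likely export these counts to its metrics system and evaluate them over a sliding window.

```typescript
class LayerStats {
  private counts = new Map<number, number>();
  private total = 0;

  // Record which layer (1-4) produced the final answer for one request.
  record(layer: number): void {
    this.counts.set(layer, (this.counts.get(layer) ?? 0) + 1);
    this.total += 1;
  }

  // Fraction of requests answered by the given layer (0 when nothing recorded yet).
  share(layer: number): number {
    return this.total === 0 ? 0 : (this.counts.get(layer) ?? 0) / this.total;
  }

  // The article's rule: above 20% LLM traffic, move hot intents into Layer 3's examples.
  shouldMigrateToEmbeddings(limit = 0.2): boolean {
    return this.share(4) > limit;
  }
}

const stats = new LayerStats();
[1, 1, 3, 2, 4, 3, 1, 4, 3, 2].forEach((l) => stats.record(l));
console.log(stats.share(4)); // 0.2
console.log(stats.shouldMigrateToEmbeddings()); // false (exactly at the limit)
```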

07 Common Pitfalls that Crash Systems

All-LLM stack: cost grows in direct proportion to traffic while latency and timeout rates degrade sharply under load, so a traffic spike hits both at once.

Embedding example quality: a handful of well‑chosen sentences beats dozens of noisy ones.

Out‑of‑Scope handling: always emit an out_of_scope label when confidence falls below a threshold.

LLM prompt breadth: explicitly list what is *not* included in each intent to avoid over‑matching.

Mixing coarse and fine intents in the same layer: reserve fast layers for high‑level intents, deeper layers for detailed sub‑intents.
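The out-of-scope rule above is easy to enforce at the boundary of whichever layer answers last. A sketch, assuming each layer reports a confidence in [0, 1] and that the LLM's `other` label should also be treated as out of scope; `finalizeIntent` and the 0.6 floor are hypothetical choices, not values from the article.

```typescript
interface LayerAnswer {
  intent: string | null;
  confidence: number;
}

const OUT_OF_SCOPE = "out_of_scope";

// Map weak or catch-all answers to an explicit out_of_scope label
// so downstream handlers never act on a guess.
function finalizeIntent(answer: LayerAnswer, minConfidence = 0.6): string {
  if (answer.intent === null || answer.intent === "other") return OUT_OF_SCOPE;
  return answer.confidence >= minConfidence ? answer.intent : OUT_OF_SCOPE;
}

console.log(finalizeIntent({ intent: "order_query", confidence: 0.85 })); // → order_query
console.log(finalizeIntent({ intent: "refund_request", confidence: 0.42 })); // → out_of_scope
console.log(finalizeIntent({ intent: "other", confidence: 0.9 })); // → out_of_scope
```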

Conclusion

The article decomposes intent recognition into five practical designs, each with code, latency, cost, and scalability characteristics, and demonstrates that a layered funnel (keyword → rule engine → embedding → LLM) cuts cost by roughly 96% (about ¥80k to ¥3k per month) while preserving near-identical accuracy.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: rule engine, layered architecture, intent recognition, keyword matching, LLM routing, embedding classifier, fine-tune model
Written by James' Growth Diary

I am James, focusing on AI Agent learning and growth. I continuously update two series: “AI Agent Mastery Path,” which systematically outlines core theories and practices of agents, and “Claude Code Design Philosophy,” which deeply analyzes the design thinking behind top AI tools. The goal is to help you build a solid foundation in the AI era.
