
Embedding Technology for FAQ Retrieval: Cases, Evaluation Metrics, and Model Comparison

This article introduces the evolution of embedding techniques, presents real‑world case studies of embedding‑based FAQ retrieval, explains evaluation metrics such as Recall and MRR, and compares the performance of a proprietary ZhongAn embedding model with OpenAI and Sentence‑BERT models on Chinese FAQ datasets.

ZhongAn Tech Team

1. Introduction

With the rise of AI-generated content (AIGC), embedding technology has become a key enabler, evolving from the 2013 Word2Vec paper to modern transformer-based deep learning models that embed words, sentences, paragraphs, structured data, images, speech, and multimodal inputs.

The Data Science Application Center has long used embeddings across many projects. This article first showcases embedding use cases that improve ranking, then details model evaluation methods, and finally compares ZhongAn's proprietary embedding model with OpenAI and the open‑source S‑Bert model, showing superior metrics on both ZhongAn FAQ and a generic Chinese FAQ dataset.

2. Case Introduction

Embeddings can compute similarity for text, images, speech, etc., making them widely used in search, recommendation, clustering, and classification tasks.

2.1 Overview

Embeddings are applied in intelligent customer service, financial risk control, and enterprise WeChat empowerment, delivering significant business impact.

In intelligent customer service, FAQ retrieval uses vector‑based retrieval (Embedding‑Based Retrieval, EBR) to recall relevant standard questions; Top‑1 recall reaches 97.6% and Top‑5 reaches 99.7%.

In financial risk control, vector retrieval identifies near‑duplicate images from massive historical data with almost 100% similarity detection and sub‑second latency on CPU‑only resources.

In enterprise WeChat empowerment, combining embeddings with attention fuses user attributes and multi‑turn conversation texts to predict insurance intent, boosting 7‑day conversion by 80% and per‑capita premium by 10%.

2.2 Detailed Cases

We detail the FAQ retrieval case, illustrating how embeddings are combined with other algorithms to improve ranking. Before that, we clarify the speed‑accuracy trade‑off and vector‑based retrieval fundamentals.

Figure 1. FAQ Retrieval Architecture

2.2.1 Speed and Accuracy Trade‑off

Higher‑accuracy models usually have higher computational complexity. In production environments lacking GPUs or running on edge devices, a slightly less accurate but faster model is often chosen. For example, real‑time voice intent detection must convert speech to text via ASR, then run an intent model within milliseconds; a CPU‑only model achieving 93.6% accuracy in tens of milliseconds is preferred over a larger model that adds seconds of latency.

Large language models like ChatGPT are not used directly for intent recognition because their in‑context learning approach yields lower accuracy and slower inference compared to supervised models.

2.2.2 Vector‑Based Retrieval (EBR)

EBR encodes a query into a vector, then computes its distance (cosine similarity, inner product, or Euclidean distance) to pre-computed vectors of knowledge-base items. For large corpora, Approximate Nearest Neighbor (ANN) methods such as HNSW are preferred over exact K-Nearest Neighbor (KNN) search to balance speed and accuracy.
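To make the distance computation concrete, here is a minimal sketch of exact (KNN-style) retrieval with cosine similarity using NumPy. This brute-force scan is what an ANN index such as HNSW approximates at scale; the function names are illustrative, not from any production system.

```python
import numpy as np

def build_index(doc_vectors: np.ndarray) -> np.ndarray:
    # L2-normalize once so cosine similarity reduces to an inner product.
    return doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)

def knn_search(index: np.ndarray, query: np.ndarray, k: int = 5):
    q = query / np.linalg.norm(query)
    scores = index @ q                 # cosine similarity to every document
    top = np.argsort(-scores)[:k]      # exact top-k by brute-force scan
    return [(int(i), float(scores[i])) for i in top]
```

With a FAISS deployment, `build_index` would be replaced by an `IndexFlatIP` (exact) or `IndexHNSWFlat` (approximate) over the same normalized vectors.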

2.2.3 FAQ Retrieval Process

The architecture (Figure 1) uses EBR as one of the recall lanes to improve overall recall (recall@k). The workflow includes:

1. User Question Understanding

Embedding, intent recognition, error correction, and keyword extraction are performed. The proprietary ZhongAn FAQ embedding model provides strong recall and ranking.

2. Knowledge‑Base Question Recall

Two recall strategies are employed:

Vector retrieval (EBR) using FAISS; large knowledge bases use HNSW, smaller ones use KNN.

Keyword‑weighted recall: DeepCT‑enhanced BM25 improves term weighting over vanilla BM25.
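As a reference point for the keyword-weighted lane, here is a sketch of vanilla BM25 scoring in plain Python. DeepCT replaces the raw term frequencies below with context-aware learned term weights; this sketch shows only the baseline formula it improves on.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query with vanilla BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores
```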

3. Ranking

A coarse‑rank step selects the top 20 candidates with Poly‑Bert, followed by a fine‑rank step using Keywords‑Bert, which performs interactive token‑level comparison and achieves higher accuracy.
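The coarse-then-fine pattern can be sketched generically as below. The Poly-Bert and Keywords-Bert models themselves are proprietary, so the two scoring functions here are hypothetical placeholders standing in for a cheap bi-encoder score and an expensive interactive (cross-encoder style) score.

```python
from typing import Callable, List, Tuple

def two_stage_rank(query: str,
                   candidates: List[str],
                   coarse_score: Callable[[str, str], float],
                   fine_score: Callable[[str, str], float],
                   coarse_k: int = 20) -> List[Tuple[str, float]]:
    # Coarse stage: a cheap scorer keeps only the top-k candidates.
    coarse = sorted(candidates,
                    key=lambda c: coarse_score(query, c),
                    reverse=True)[:coarse_k]
    # Fine stage: an expensive token-level scorer re-orders the survivors.
    return sorted(((c, fine_score(query, c)) for c in coarse),
                  key=lambda pair: pair[1], reverse=True)
```

The design choice is the usual one: the coarse scorer must be fast enough to run over every recalled candidate, while the fine scorer's cost is bounded by `coarse_k`.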

4. Strategy Layer

Rule‑based and language‑model logic determines the final answer, balancing recall quality and ranking precision.

3. Embedding Model Retrieval Effect

3.1 Evaluation Metrics

Key retrieval metrics include Recall, Precision, MAP, MRR, and nDCG. This article focuses on Recall and MRR.

Recall measures the proportion of relevant items retrieved within the top‑k results.

Recall@k does not consider ordering; therefore, in scenarios where the rank of the first relevant result matters, MRR is used.

MRR (Mean Reciprocal Rank) averages the reciprocal of the rank of the first relevant result across queries.
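Both metrics are straightforward to compute from ranked result lists. The sketch below assumes, as in the FAQ setting described here, that each query has a single relevant item (so Recall@k is the 1-Recall@k reported later).

```python
def recall_at_k(results, relevant, k):
    """Fraction of queries whose relevant item appears in the top-k results."""
    hits = sum(1 for ranked, rel in zip(results, relevant) if rel in ranked[:k])
    return hits / len(results)

def mrr_at_k(results, relevant, k):
    """Mean reciprocal rank of the first relevant result (0 if not in top-k)."""
    total = 0.0
    for ranked, rel in zip(results, relevant):
        for rank, item in enumerate(ranked[:k], start=1):
            if item == rel:
                total += 1.0 / rank
                break
    return total / len(results)
```

For example, two queries whose relevant answers land at rank 1 and rank 2 give Recall@1 = 0.5, Recall@2 = 1.0, and MRR = (1 + 1/2) / 2 = 0.75.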

3.2 Model Performance

The following three Chinese embedding models are compared on ZhongAn FAQ and a generic Chinese FAQ dataset:

ZhongAn Embedding – 0.2 B parameters, output vector length 128/256.

OpenAI Embedding – 6 B parameters, output vector length 1536.

Sentence‑Bert (S‑Bert) – 4.8 B parameters, output vector length 768.

Because the ZhongAn model is smallest, it converts queries to vectors fastest and, with shorter vectors, yields the lowest latency for ANN/KNN retrieval.

Test Set              Request Count    Knowledge‑Base Size
ZhongAn FAQ           2k+              21k+
Generic Chinese FAQ   27k+             110k+

Recall and MRR results (Figures 4 & 5) show that the ZhongAn model outperforms the other two on the ZhongAn FAQ set (e.g., 1‑Recall@1 = 0.976 vs 0.828 for OpenAI, MRR@10 = 0.983 vs 0.887). On the generic FAQ set the gap narrows because OpenAI and S‑Bert achieve high scores (1‑Recall@1 > 0.92), but the lightweight ZhongAn model still retains an edge.

4. Conclusion

This article demonstrated embedding applications, evaluation methods, and comparative results of the ZhongAn embedding model. Embeddings can be used directly for retrieval or combined with other models for higher precision. Deploying high‑quality embeddings involves data preparation, deep‑learning modeling, production system engineering, and end‑to‑end optimization, all of which require continuous improvement to drive business value.

Tags: Artificial Intelligence, Vector Search, Evaluation Metrics, Embedding, FAQ Retrieval
Written by ZhongAn Tech Team

China's first online-only insurer. Through technology innovation, ZhongAn makes insurance simpler, warmer, and more valuable; its platform supports 50 billion RMB of policies and serves 600 million users with smart, personalized solutions. This is where the ZhongAn tech team shares its engineering work.