How Hologres Powers Fast Vector & Full‑Text Search for AI‑Driven Customer Service
The Taobao‑Tmall customer operations team built an integrated vector‑plus‑full‑text retrieval solution on Hologres, achieving millisecond‑level recall over massive unstructured knowledge bases and boosting intelligent customer service, rule comparison, and sentiment analysis across multiple business scenarios.
In the era of large language models, Taobao‑Tmall (淘天) needs to retrieve relevant knowledge from hundreds of thousands to millions of unstructured text entries quickly and accurately for intelligent customer service, rule matching, and sentiment analysis. Traditional keyword matching with SQL LIKE or regex is slow (seconds) and imprecise at this scale.
Why Combine Vector and Full‑Text Search?
Full‑text search uses keyword matching and inverted indexes to return results in milliseconds, but it lacks semantic understanding (e.g., a query "What fruits are available?" cannot retrieve "apple" or "banana" without explicit keywords). Vector search embeds texts into high‑dimensional vectors (e.g., 128‑dim) and retrieves by semantic similarity, bridging the gap between concepts like "fruit" and "apple".
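The semantic gap described above can be made concrete with a toy example. The sketch below uses made‑up 4‑dimensional vectors (a real embedding model would produce something like the 128‑dimensional vectors mentioned here); the words and values are illustrative only.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Made-up low-dimensional embeddings standing in for a real model's output.
embeddings = {
    "fruit":   [0.9, 0.8, 0.1, 0.0],
    "apple":   [0.8, 0.9, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9, 0.8],
}

query = embeddings["fruit"]
# Keyword matching sees zero string overlap between "fruit" and "apple",
# but in embedding space the two are close while "invoice" is far away:
print(cosine(query, embeddings["apple"]))    # high similarity
print(cosine(query, embeddings["invoice"]))  # low similarity
```

This is why a vector index can answer "What fruits are available?" with entries about apples and bananas even though no keyword overlaps.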
The team therefore runs a two‑stage pipeline: first vector search to get semantically similar candidates, then full‑text search to supplement keyword matches, finally feeding the fused results into a Retrieval‑Augmented Generation (RAG) workflow.
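The fusion step of this two‑stage pipeline can be sketched as a weighted merge of the two candidate lists. The weights, document IDs, and scores below are hypothetical, not taken from the production system.

```python
# Toy fusion of the two retrieval paths: vector recall supplies
# semantically similar candidates with a similarity score, full-text
# recall supplies keyword hits with a match score; a weighted sum
# merges them and the top-K survivors go on to the RAG stage.

def fuse(vector_hits, fulltext_hits, w_vec=0.7, w_txt=0.3, k=3):
    scores = {}
    for doc_id, sim in vector_hits.items():
        scores[doc_id] = scores.get(doc_id, 0.0) + w_vec * sim
    for doc_id, match in fulltext_hits.items():
        scores[doc_id] = scores.get(doc_id, 0.0) + w_txt * match
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

vector_hits   = {"doc_a": 0.95, "doc_b": 0.80, "doc_c": 0.60}
fulltext_hits = {"doc_b": 1.00, "doc_d": 0.90}

# doc_b ranks first because it scores on both paths.
print(fuse(vector_hits, fulltext_hits))
```

As the article notes, the two result sets can also be used independently; the weighting is a per‑scenario tuning knob.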
Why Hologres?
Hologres offers real‑time data warehousing and OLAP capabilities, supporting both vector and full‑text indexes on the same table, along with scalar filters, multi‑field sorting, and complex JOINs. Since version 4.0 it provides a self‑developed HGraph vector index that reduces average latency from 4 seconds (Proxima) to ~30 ms on tens of millions of vectors, a roughly two‑order‑of‑magnitude improvement. It also includes built‑in Chinese tokenization, AND/OR query logic, and write‑and‑search support: data is queryable immediately after it is written.
Performance Comparison: HGraph vs Proxima
On 40 k 128‑dim vectors, both engines perform similarly (~30‑40 ms). On 9.5 M vectors, Proxima’s latency spikes to 4 s, while HGraph stays around 30 ms for pure vector recall; combined with downstream OLAP operations the total latency remains under a few hundred milliseconds.
Using Hologres for Vector Search
Define a vector column and index during table creation:
Define a float‑array vector column (e.g., knowledge_vectors) with an hgraph index during table creation, and specify a similarity metric such as cosine similarity. Verify that the index is actually used by running EXPLAIN ANALYZE and checking that a Vector Filter node appears in the query plan.
Using Hologres for Full‑Text Search
Create a column‑store table, then build a full‑text index on the text field. If the index exists before data insertion, writes automatically trigger compaction, enabling "write‑and‑search". Queries use the TEXT function with logical operators (OR by default, or AND) and can be verified with EXPLAIN ANALYZE by checking for a Fulltext Filter node in the plan. The built‑in jieba tokenizer provides Chinese segmentation, achieving ~30 ms responses on small datasets and ~200 ms on 700 M records with complex AND queries and sorting.
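The matching semantics behind such a full‑text index can be illustrated with a minimal inverted index. This is a conceptual sketch, not how Hologres is implemented internally, and whitespace splitting stands in for the jieba Chinese tokenizer.

```python
from collections import defaultdict

# Toy inverted index: each term maps to the set of documents containing
# it; a query is evaluated with OR (union, the default) or AND
# (intersection), mirroring the logical operators described above.

docs = {
    1: "7 day no reason return policy",
    2: "return shipping fee rules",
    3: "invoice reissue process",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(terms, mode="OR"):
    hits = [index.get(t, set()) for t in terms]
    if not hits:
        return set()
    result = set(hits[0])
    for h in hits[1:]:
        result = result & h if mode == "AND" else result | h
    return result

print(search(["return", "invoice"]))        # OR: union of postings
print(search(["return", "policy"], "AND"))  # AND: intersection
```

Because lookups touch only the postings for the query terms rather than scanning every row, this structure is what makes millisecond keyword recall possible at the scales quoted above.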
Overall Technical Solution
The workflow is divided into three stages:
Preparation : Clean raw texts (knowledge base, platform rules), optionally generate similar questions, embed them into vectors, and keep the original text for full‑text indexing.
Retrieval : User queries are sent simultaneously to an embedding model and a tokenizer, producing a vector and keywords. Both vector and full‑text searches run in parallel, their results can be weighted or used independently, and the top‑K documents are returned.
Application : Retrieved documents are fed as context to a large language model, combined with prompt engineering and tool calls (e.g., rule comparison, order lookup) to produce final answers or decision suggestions for intelligent agents.
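The application stage above can be sketched as simple prompt assembly. The template wording and document texts are hypothetical; any chat‑completion API would consume the resulting prompt.

```python
# Sketch of the RAG application stage: the top-K retrieved knowledge
# entries are stitched into the LLM prompt as numbered context.

def build_prompt(question, retrieved_docs):
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the customer question using only the knowledge below.\n"
        f"Knowledge:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

docs = [
    "Returns are accepted within 7 days without reason.",
    "Return shipping is free if the item is defective.",
]
prompt = build_prompt("Can I return an item after 5 days?", docs)
print(prompt)
```

Tool calls such as rule comparison or order lookup would feed additional context into the same prompt before the model is invoked.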
Scenario 1 – Merchant Help Knowledge Recall
In the customer‑service front end, user queries are embedded and tokenized, then vector and full‑text searches retrieve 20‑40 relevant knowledge entries. These are passed to an LLM with carefully designed prompts for ranking, yielding accurate answers without a human hand‑off. Metrics such as recall, click‑through, and accuracy improved significantly over the previous LIKE/regex approach.
Scenario 2 – Competitor Rule Full‑Text Search
The team built a rule‑analysis system that crawls competitor policies, cleans them, stores them in Hologres, and creates a full‑text index. A user query like "show all platforms' 7‑day no‑reason return policies" returns relevant clauses within 500 ms, with a much higher recall than regex‑based matching.
Future Outlook
Planned extensions include applying the retrieval capability to sentiment analysis and similar‑case clustering, integrating image‑based chat screenshot extraction, and simplifying the stack by adding built‑in embedding functions, variable full‑text query parameters, and improved incremental compaction for serverless workloads.
Overall, Hologres’s unified vector and full‑text search, high performance, and stable operations have become the backbone of Taobao‑Tmall’s AI‑driven knowledge retrieval, and its role is expected to grow as more scenarios adopt RAG techniques.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.