How Milvus Powered a Scalable AI Assistant for Car Queries with Vector Search

This article details how an automotive AI assistant migrated from keyword matching to a Milvus‑based vector retrieval system, overcoming semantic gaps, scaling to millions of daily queries, optimizing indexing, introducing multi‑vector and sparse‑vector search, and building a real‑time RAG pipeline with Flink.

Yiche Technology

Dec 3, 2025

Background

In automotive Q&A and intelligent‑assistant scenarios, users increasingly ask natural‑language questions such as “a family‑friendly new energy vehicle under 150,000 yuan” or “a spacious SUV”. Traditional keyword search cannot capture semantics, context, or fuzzy intent, so its results often fall well short of what users expect.

Why Vector Search?

Vector retrieval uses embedding models to convert text, images, and other unstructured data into high‑dimensional vectors and then ranks results by vector similarity, moving retrieval from literal character matching to semantic understanding. As a result, a vector database came to be seen as a must‑have component across the industry.
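To make the idea concrete, here is a minimal sketch of similarity‑based retrieval, assuming cosine similarity as the metric. The three‑dimensional “embeddings” and their axis meanings are hypothetical hand‑written values; a real embedding model produces hundreds of dimensions, and a vector database replaces this brute‑force scan with an index.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Return the k corpus keys most similar to the query, best first."""
    scored = sorted(((cosine(query, vec), key) for key, vec in corpus.items()), reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical 3-d embeddings; axes loosely mean [roominess, sportiness, price].
corpus = {
    "roomy family car": [0.9, 0.1, 0.3],
    "spacious 7-seat MPV": [0.95, 0.05, 0.4],
    "two-door sports coupe": [0.1, 0.95, 0.7],
}
query = [0.85, 0.1, 0.35]  # hypothetical embedding of "large-space SUV"
print(top_k(query, corpus))
```

Because ranking depends only on vector geometry, the query can match “roomy family car” without sharing a single keyword with it.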

Early Experiment (Early 2023)

The team first evaluated Faiss in a pilot with 100,000 users. Faiss could recognize semantic links such as “大空间 SUV” (large‑space SUV) ↔ “宽敞的车” (roomy car), a qualitative leap over keyword matching.

However, as AI‑assistant traffic grew, the Faiss‑based solution proved cumbersome: index construction and storage management were complex, and scaling the service out carried high deployment and resource costs.

Adopting Milvus (Mid 2023)

After a multi‑dimensional technology comparison, Milvus was selected for its distributed architecture, horizontal scalability, and out‑of‑the‑box features, fitting the large‑scale AI‑assistant needs.

In June 2023 the team deployed Milvus clusters and migrated core retrieval scenarios (vehicle configuration, reviews, FAQs) to Milvus. The system achieved:

Higher semantic retrieval accuracy, for example correctly matching synonymous phrasings such as “large‑space SUV” and “roomy car”.

Throughput of millions of queries per day at millisecond‑level latency.

The breakthrough stemmed from Milvus’s cloud‑native design: a Proxy serves as the unified entry point, worker nodes handle queries, data ingestion, and index building, and object storage such as MinIO or S3 holds the data. Adding query nodes increased query throughput, while adding data nodes relieved write and storage pressure, removing the earlier scaling bottlenecks.
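The scaling behaviour can be pictured as a scatter‑gather search. The sketch below is a simplified model, not Milvus’s actual code: the `QueryNode` and `Proxy` classes are hypothetical, each node holds one shard of vectors and returns a local top‑k by inner product, and the proxy merges the partial lists into a global top‑k.

```python
import heapq


class QueryNode:
    """Hypothetical query node holding one shard of (primary key, vector) rows."""

    def __init__(self, rows):
        self.rows = rows

    def search(self, query, k):
        # Local top-k by inner product over this node's shard only.
        scored = ((sum(q * v for q, v in zip(query, vec)), pk) for pk, vec in self.rows)
        return heapq.nlargest(k, scored)


class Proxy:
    """Hypothetical proxy: fans a query out to every node and merges the results."""

    def __init__(self, nodes):
        self.nodes = nodes

    def search(self, query, k):
        partials = []
        for node in self.nodes:
            partials.extend(node.search(query, k))  # scatter
        # Gather: the global top-k is the top-k of all local top-k lists.
        return [pk for _, pk in heapq.nlargest(k, partials)]


nodes = [
    QueryNode([(1, [1.0, 0.0]), (2, [0.5, 0.5])]),
    QueryNode([(3, [0.9, 0.1]), (4, [0.0, 1.0])]),
]
proxy = Proxy(nodes)
print(proxy.search([1.0, 0.0], k=2))  # → [1, 3]
```

The merge is correct because every member of the global top‑k must also be in the local top‑k of the node that holds it; spreading shards over more nodes shrinks the scan each node performs.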

Performance Tuning (Nov 2023)

When the data grew to tens of millions of records, inserts and deletes slowed down and batch updates overloaded the cluster. A Milvus minor version was then released, focusing on:

Delete optimization – response time dropped from seconds to milliseconds.

Compaction improvements – higher write speed.

These changes let Milvus handle “massive data + high‑frequency updates” workloads efficiently.
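The source does not describe how the delete path was optimized. As a general illustration only, the sketch below shows the common tombstone‑plus‑compaction pattern that makes deletes cheap: a delete just records primary keys, reads filter them out, and a background compaction later rewrites the data. The `Segment` class and the 0.2 compaction threshold are hypothetical.

```python
class Segment:
    """Hypothetical segment: deletes are recorded as tombstones, not applied in place."""

    def __init__(self, rows):
        self.rows = dict(rows)   # primary key -> vector
        self.tombstones = set()  # primary keys marked as deleted

    def delete(self, pks):
        # Cheap: only records the keys; no stored data is rewritten here.
        self.tombstones.update(pk for pk in pks if pk in self.rows)

    def live_ids(self):
        # Readers skip tombstoned rows at query time.
        return sorted(pk for pk in self.rows if pk not in self.tombstones)

    def deleted_ratio(self):
        return len(self.tombstones) / len(self.rows) if self.rows else 0.0

    def compact(self):
        # Background job: physically drop deleted rows and clear tombstones.
        self.rows = {pk: v for pk, v in self.rows.items() if pk not in self.tombstones}
        self.tombstones.clear()

    def maybe_compact(self, threshold=0.2):
        """Compact once the deleted fraction exceeds a (hypothetical) threshold."""
        if self.deleted_ratio() > threshold:
            self.compact()
            return True
        return False


seg = Segment({1: [0.1], 2: [0.2], 3: [0.3]})
seg.delete([2, 99])         # 99 does not exist and is ignored
print(seg.live_ids())       # → [1, 3]
print(seg.maybe_compact())  # → True (1/3 of rows deleted)
```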

Data Pipeline & RAG Integration

An enterprise‑grade vector data pipeline was built to ingest documents, web pages, and images, clean and vectorize them, and load them into Milvus. This kept processing logic consistent across data sources and used the big‑data clusters for high‑throughput processing.
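A single‑process sketch of the clean → chunk → vectorize steps follows, assuming HTML input and fixed‑size character chunks with overlap. `embed` is a hypothetical stand‑in (hashed character bigrams) for a real embedding model, and the row layout (`id`, `text`, `vector`) is illustrative; the production pipeline ran these steps on big‑data clusters.

```python
import html
import math
import re
import zlib


def clean(raw):
    """Strip HTML tags, unescape entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def chunk(text, size=200, overlap=50):
    """Split text into overlapping character windows of at most `size` chars."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text, dim=64):
    """Hypothetical embedding stub: hashed character bigrams, L2-normalized."""
    vec = [0.0] * dim
    for i in range(len(text) - 1):
        vec[zlib.crc32(text[i:i + 2].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def to_rows(doc_id, raw):
    """Clean -> chunk -> embed, producing rows ready for a vector-database insert."""
    return [
        {"id": f"{doc_id}#{n}", "text": piece, "vector": embed(piece)}
        for n, piece in enumerate(chunk(clean(raw)))
    ]


rows = to_rows("faq-1", "<p>Spacious&nbsp;SUV   for families</p>")
print(rows[0]["id"], rows[0]["text"])  # → faq-1#0 Spacious SUV for families
```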

For real‑time retrieval‑augmented generation (RAG), the team used Flink with CDC connectors (MySQL, SQL Server, MongoDB) and Kafka to capture data changes, embed them, and push the results to Milvus within seconds, cutting the delay before new knowledge becomes searchable from days to seconds.
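The per‑event logic of such a job can be sketched in plain Python, assuming Debezium‑style change events (an `op` code of `c`, `r`, `u`, or `d` with `before`/`after` row images), the envelope format Flink CDC connectors can emit. `VectorSink`, `toy_embed`, and the `id`/`content` column names are hypothetical; in the real pipeline Flink operators would do this work and write to Milvus.

```python
import json


def toy_embed(text):
    """Hypothetical embedding stub; a real job would call an embedding model."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


class VectorSink:
    """Hypothetical stand-in for a Milvus collection keyed by primary key."""

    def __init__(self):
        self.rows = {}

    def upsert(self, pk, text):
        self.rows[pk] = {"text": text, "vector": toy_embed(text)}

    def delete(self, pk):
        self.rows.pop(pk, None)


def apply_change(sink, raw_event):
    """Apply one Debezium-style change event to the sink."""
    event = json.loads(raw_event)
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        sink.upsert(row["id"], row["content"])  # re-embed the new row image
    elif op == "d":  # delete: only the "before" image is meaningful
        sink.delete(event["before"]["id"])
    else:
        raise ValueError(f"unknown op: {op!r}")


events = [
    '{"op": "c", "after": {"id": 1, "content": "A6 fuel use"}}',
    '{"op": "u", "before": {"id": 1, "content": "A6 fuel use"},'
    ' "after": {"id": 1, "content": "A6 fuel use, revised"}}',
    '{"op": "c", "after": {"id": 2, "content": "X5 price"}}',
    '{"op": "d", "before": {"id": 2, "content": "X5 price"}}',
]
sink = VectorSink()
for e in events:
    apply_change(sink, e)
print(sink.rows[1]["text"])  # → A6 fuel use, revised
```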

Multi‑Vector & Sparse‑Vector Retrieval (2024)

As queries grew more complex (e.g., “Audi A6 entry‑level trim + sound system + fuel consumption”), the previous architecture had to split each request into several sub‑queries and stitch the results together, which produced inconsistent answers.

In June 2024 Milvus added multi‑vector search, allowing several vectors to be evaluated in a single query, dramatically simplifying business logic.
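One way to picture multi‑vector search is to give each record several vector fields and fuse the per‑field similarities in a single pass. The sketch below uses a plain weighted sum of inner products; the field names (`trim`, `sound`, `fuel`), weights, and 2‑d vectors are hypothetical, and Milvus’s built‑in rankers may normalize scores differently.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def multi_vector_search(query_vecs, weights, docs, k=3):
    """Rank docs by a weighted sum of per-field inner products.

    query_vecs: {field: vector}; weights: {field: weight};
    docs: {doc_id: {field: vector}}.
    """
    scores = {
        doc_id: sum(weights[f] * dot(q, fields[f]) for f, q in query_vecs.items())
        for doc_id, fields in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical per-field vectors for two A6 trims.
docs = {
    "a6-base": {"trim": [1.0, 0.0], "sound": [0.6, 0.8], "fuel": [0.8, 0.6]},
    "a6-top": {"trim": [0.0, 1.0], "sound": [1.0, 0.0], "fuel": [0.6, 0.8]},
}
query = {"trim": [1.0, 0.0], "sound": [1.0, 0.0], "fuel": [1.0, 0.0]}
weights = {"trim": 0.5, "sound": 0.25, "fuel": 0.25}
print(multi_vector_search(query, weights, docs))  # → ['a6-base', 'a6-top']
```

Because all fields are scored in one call, a request like “entry‑level trim + sound system + fuel consumption” no longer needs separate sub‑queries whose results must be stitched together.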

Later in 2024 the team explored sparse‑vector support to improve keyword‑sensitive queries (e.g., “BMW 5 Series 2025 price”). Combining sparse and dense vectors in Milvus improved accuracy without the overhead of maintaining a dual‑system (Elasticsearch + Milvus) architecture.
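A common way to combine a keyword‑sensitive sparse ranking with a semantic dense ranking is Reciprocal Rank Fusion (RRF), which scores each document by the sum of 1/(k + rank) across the ranked lists. The sketch assumes RRF with k = 60, a widely used default; the documents, token weights, and vectors are hypothetical, and the source does not state which fusion method the team used.

```python
def sparse_dot(q, d):
    """Inner product of sparse vectors stored as {token: weight}."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())


def dense_dot(q, d):
    return sum(x * y for x, y in zip(q, d))


def ranked(scores):
    """Doc ids ordered best-first by score."""
    return sorted(scores, key=scores.get, reverse=True)


def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return ranked(fused)


# Hypothetical documents with a sparse (keyword) and a dense (semantic) vector.
docs = {
    "bmw-5-2025-price": ({"bmw": 1.0, "5": 0.8, "2025": 0.9, "price": 0.7}, [0.7, 0.7]),
    "bmw-5-2024-review": ({"bmw": 1.0, "5": 0.8, "2024": 0.9, "review": 0.6}, [0.9, 0.3]),
    "luxury-sedan-prices": ({"luxury": 0.9, "sedan": 0.8, "price": 0.7}, [0.72, 0.69]),
}
q_sparse = {"bmw": 1.0, "5": 1.0, "2025": 1.0, "price": 1.0}
q_dense = [0.7, 0.7]

sparse_rank = ranked({d: sparse_dot(q_sparse, s) for d, (s, _) in docs.items()})
dense_rank = ranked({d: dense_dot(q_dense, v) for d, (_, v) in docs.items()})
print(dense_rank[0])                      # → luxury-sedan-prices
print(rrf([sparse_rank, dense_rank])[0])  # → bmw-5-2025-price
```

Here dense similarity alone would rank the generic “luxury‑sedan‑prices” entry first, while the sparse match on “2025” lifts the exact BMW document to the top of the fused list.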

Index Strategy Optimization

Different data volumes received tailored index types:

Millions of car configuration records – IVF_FLAT with tuned nprobe and nlist for a 5‑10% performance gain.

Hundreds of thousands of news and review entries – HNSW graph index for balanced speed and accuracy.

FAQ short texts – FLAT index to save resources.
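The nlist/nprobe trade‑off behind IVF_FLAT can be shown with a toy implementation: vectors are bucketed by their nearest coarse centroid (nlist buckets), and a search computes exact distances only inside the nprobe nearest buckets. The centroids here are given rather than trained with k‑means, and the `INDEX_PLAN` values are hypothetical; only the parameter names (`index_type`, `metric_type`, `nlist`, `M`, `efConstruction`) follow Milvus’s index settings.

```python
import math

# Milvus-style index settings per data tier. Parameter names follow Milvus's
# index options; the values are hypothetical, not the team's tuned numbers.
INDEX_PLAN = {
    "car_config": {"index_type": "IVF_FLAT", "metric_type": "L2",
                   "params": {"nlist": 1024}},
    "news_reviews": {"index_type": "HNSW", "metric_type": "COSINE",
                     "params": {"M": 16, "efConstruction": 200}},
    "faq": {"index_type": "FLAT", "metric_type": "COSINE", "params": {}},
}


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyIVFFlat:
    """Toy IVF_FLAT: nlist coarse centroids partition the data into inverted lists."""

    def __init__(self, centroids):
        # Real IVF trains the nlist centroids with k-means; here they are given.
        self.centroids = centroids
        self.lists = [[] for _ in centroids]  # one list of (pk, vector) per centroid

    def _nearest_lists(self, vec, n):
        order = sorted(range(len(self.centroids)), key=lambda i: l2(vec, self.centroids[i]))
        return order[:n]

    def add(self, pk, vec):
        self.lists[self._nearest_lists(vec, 1)[0]].append((pk, vec))

    def search(self, query, k, nprobe):
        # Exact ("flat") distances, but only inside the nprobe closest lists.
        candidates = [row for li in self._nearest_lists(query, nprobe) for row in self.lists[li]]
        candidates.sort(key=lambda row: l2(query, row[1]))
        return [pk for pk, _ in candidates[:k]]


ivf = ToyIVFFlat(centroids=[[0.0, 0.0], [10.0, 10.0]])  # nlist = 2
for pk, vec in [(1, [1.0, 1.0]), (2, [9.0, 9.0]), (3, [0.0, 2.0]), (4, [6.0, 5.0])]:
    ivf.add(pk, vec)
print(ivf.search([4.0, 4.0], k=1, nprobe=1))  # → [1]
print(ivf.search([4.0, 4.0], k=1, nprobe=2))  # → [4]
```

With nprobe = 1 the query [4, 4] scans only the bucket around [0, 0] and misses the true nearest neighbour (id 4); raising nprobe to 2 finds it at the cost of scanning more vectors. Tuning nlist and nprobe trades recall against latency in this way.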

Results and Outlook

Milvus now underpins Yiche’s AI services across web, app, mini‑program, and enterprise WeChat, supporting more than 10 AI products and serving consumer‑facing (C‑end), business‑facing (B‑end), and data‑layer queries, with billion‑scale daily writes and stable millisecond‑level latency.

The system’s distributed architecture, flexible indexing, continuous version upgrades, and open ecosystem make it a solid enterprise‑grade choice for future scaling and feature expansion.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Yiche Technology

Official account of Yiche Technology, regularly sharing the team's technical practices and insights.
