How HuoLala Overcomes Large‑Model Limits with Retrieval‑Augmented Vector Databases
This article details HuoLala's journey from confronting large‑model knowledge gaps and latency issues to adopting Retrieval‑Augmented Generation (RAG) with vector databases, outlining their architecture challenges, selection criteria, deployment scenarios, and future plans for AI‑driven logistics.
HuoLala, founded in 2013 in the Greater Bay Area, operates an internet logistics marketplace covering intra‑city and inter‑city freight, corporate logistics, moving, LTL, errands, cold chain, vehicle rental, and after‑market services, with 16.7 million monthly active users and 1.68 million active drivers across 11 markets and 400+ cities, supported by six data centers.
Challenges of Large Model Application
Leveraging its deep AI experience in logistics, HuoLala has explored large-model use in more than 14 business units and 50 real scenarios, but has run up against the models' gaps in vertical (domain) knowledge, their response latency, and data-security concerns. To address these, the team adopted Retrieval-Augmented Generation (RAG), which injects external data so the model answers in an "open-book" fashion, improving accuracy, reducing uncertainty, and enhancing data security.
At its core, RAG pairs a powerful language model with a vector database: the database stores and retrieves the external knowledge, and the model grounds its answers in the retrieved context. Vector databases suit this role because they handle multimodal data and semantic search natively:
Storage of unstructured data: vector databases efficiently store and manage multimodal data such as audio, video, images, and text.
Vectorization: neural networks transform data into high‑dimensional vectors, enabling semantic similarity search.
Retrieval of unstructured data: similarity is computed via vector distances, requiring extensive floating‑point operations for fast matching.
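To make these three capabilities concrete, here is a minimal, self-contained sketch of vectorization, distance-based retrieval, and "open-book" prompt assembly. It is not HuoLala's implementation; embed() stands in for a real embedding model, and the corpus is toy data.

```python
# Minimal sketch of vectorization, similarity retrieval, and "open-book" prompting.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: a real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(768)
    return v / np.linalg.norm(v)          # unit-length vector, so dot product = cosine similarity

corpus = [
    "Intra-city freight pricing rules for vans",
    "Driver onboarding checklist and document requirements",
    "Cold-chain temperature monitoring SOP",
]
corpus_vecs = np.stack([embed(doc) for doc in corpus])   # what a vector database would store

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = corpus_vecs @ embed(query)    # similarity via vector distances
    top = np.argsort(-scores)[:k]
    return [corpus[i] for i in top]

# "Open-book" prompting: retrieved passages are injected as context for the large model.
question = "How is same-city van freight priced?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)
```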
Vector Database Selection Considerations
(1) Existing Architecture and Pain Points
The current stack includes a compute layer (CPU & GPU), storage layer (vector DB, Elasticsearch, etc.), retrieval layer (graph‑based indexes), and access/entry layers, spread across five clusters with up to 380 GB memory per cluster and tables holding up to 20 million records.
Pain Point 1: Dynamic schema
Frequent schema changes require table recreation and index rebuilding, a slow process that burns CPU and memory and destabilizes the service.
Pain Point 2: Hybrid retrieval
Vector search excels at semantic similarity, while full‑text search excels at exact matching. Relying on a single method cannot meet precision requirements, leading HuoLala to integrate Elasticsearch for full‑text search, which adds architectural complexity and higher maintenance overhead.
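A common way to get the best of both retrieval modes is rank fusion. The sketch below uses reciprocal-rank fusion, a generic technique shown only for illustration; it is not HuoLala's or OceanBase's hybrid-search implementation, and the document IDs and the constant k=60 are arbitrary example values.

```python
# Illustrative reciprocal-rank fusion (RRF) of two ranked result lists:
# one from vector (semantic) search, one from full-text (keyword) search.
def rrf_fuse(vector_hits: list[str], fulltext_hits: list[str], k: int = 60) -> list[str]:
    """Merge two rankings; documents ranked highly by either retriever rise to the top."""
    scores: dict[str, float] = {}
    for hits in (vector_hits, fulltext_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# "d3" is an exact keyword match, "d1" the closest semantic match; both end up near the top.
print(rrf_fuse(vector_hits=["d1", "d2", "d3"], fulltext_hits=["d3", "d1", "d4"]))
```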
Pain Point 3: Operational difficulty
Weak stability: frequent bugs, limited monitoring, and scarce expert knowledge make troubleshooting hard.
Poor scalability: limited horizontal scaling, manual data migration, and complex sharding.
Weak access control: inadequate permission mechanisms risk data leakage.
Low community activity: infrequent updates and limited ecosystem hinder future growth.
(2) Selection Criteria and Process
Based on these pain points, HuoLala ran a fresh vector-database evaluation at the end of 2024, judging candidates against both business requirements and operational requirements.
Ten vector‑database candidates were shortlisted. First‑round filtering removed cloud‑provider databases (due to multi‑cloud deployment needs), PostgreSQL (insufficient vector dimensions), and Weaviate (stability and permission concerns). The remaining candidates—Milvus, Elasticsearch, and OceanBase—underwent a second round focusing on stability and operational cost.
Milvus: real-time risk control demands high stability, but Milvus's complex architecture and the cross-region deployment it would require introduced stability risks, so it was excluded.
OceanBase: Community edition testing showed it meets functional and performance needs, with proven stability, active community, and vendor support.
Elasticsearch: strong full-text and hybrid search capabilities, but internal team considerations tipped the decision toward OceanBase.
After selection, the key decision was between self‑hosting and cloud deployment. Considering the company’s move toward cloud‑based databases, elastic scaling, SLA guarantees, and reduced operational effort, the team chose to build the vector‑database foundation on the cloud.
Vector Database Deployment Scenarios
(1) Asset‑Loss Code Identification
OceanBase vector search is used to automatically detect risky code that could cause financial loss. Historically, manual reviews were slow and incomplete. By vectorizing historical loss‑code cases and leveraging a large model for risk assessment, the system retrieves similar code snippets and blocks risky builds before deployment.
The workflow: a large model labels historical loss-code cases, engineers confirm the labels, the confirmed cases are embedded and stored in the vector database, and at code-submission time a similarity search plus a model-based risk judgment decides whether to abort an unsafe build.
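A hypothetical sketch of that submission-time check follows. The toy embed() helper, the in-memory case store, and judge_risk() are stand-ins for the real embedding model, the OceanBase vector table, and the large-model call; the similarity threshold is an assumed value.

```python
# Hypothetical sketch of the submission-time asset-loss check; not HuoLala's code.
import numpy as np

def embed(code: str) -> np.ndarray:
    """Placeholder for a code-embedding model."""
    rng = np.random.default_rng(abs(hash(code)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

# Historical loss-code cases, labeled by a large model and then confirmed by engineers.
loss_cases = ["refund = order_total * 10  # misplaced multiplier", "coupon.usage_limit = None"]
case_vecs = np.stack([embed(c) for c in loss_cases])   # stands in for the vector table

def judge_risk(snippet: str, similar: list[str]) -> str:
    """Placeholder for the large-model risk judgment over the snippet and similar cases."""
    return "high_risk" if similar else "low_risk"

def check_submission(snippet: str, threshold: float = 0.8) -> bool:
    """Return True if the build may proceed, False if it should be aborted."""
    sims = case_vecs @ embed(snippet)                   # similarity to known loss cases
    similar = [loss_cases[i] for i in np.where(sims >= threshold)[0]]
    return judge_risk(snippet, similar) != "high_risk"

print(check_submission("price = base_fare + distance_fee"))
```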
(2) Data Warehouse AI Q&A Assistant
HuoLala’s massive data warehouse contains hundreds of thousands of Hive tables, making data discovery difficult for users lacking business knowledge. By ingesting schema information, chat logs, and documentation into the OceanBase vector database, the AI assistant can understand intent, decompose complex queries, retrieve relevant knowledge via vector, scalar, and full‑text search, re‑rank results, and generate concise answers with a large model.
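A hypothetical outline of that pipeline is sketched below. Every helper is a trivial stand-in so the sketch runs end to end; none of them is a real HuoLala or OceanBase API, and the sample tables and scores are invented.

```python
# Hypothetical outline of the Q&A assistant's pipeline; all helpers are stand-ins.
def decompose(question: str) -> list[str]:
    return [question]                             # a real system would use an LLM to split complex queries

def vector_search(q: str) -> list[tuple[str, float]]:
    return [("dwd_order_detail: one row per completed freight order", 0.91)]   # semantic match

def scalar_search(q: str) -> list[tuple[str, float]]:
    return [("dwd_order_detail owner: data platform team", 0.50)]              # structured filter

def fulltext_search(q: str) -> list[tuple[str, float]]:
    return [("dws_driver_daily: daily driver activity aggregates", 0.40)]      # keyword match

def rerank(question: str, hits: list[tuple[str, float]]) -> list[tuple[str, float]]:
    return sorted(hits, key=lambda h: h[1], reverse=True)   # stand-in for a cross-encoder re-ranker

def generate_answer(question: str, context: list[str]) -> str:
    return f"Q: {question}\nGrounded in: {context[:3]}"      # stand-in for the large-model call

def answer_question(question: str) -> str:
    hits: list[tuple[str, float]] = []
    for sq in decompose(question):                                           # 1. intent understanding
        hits += vector_search(sq) + scalar_search(sq) + fulltext_search(sq)  # 2. multi-channel retrieval
    top = rerank(question, hits)[:10]                                        # 3. re-rank merged candidates
    return generate_answer(question, [text for text, _ in top])              # 4. concise answer generation

print(answer_question("Which table holds completed order details?"))
```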
This solution lowers the barrier to data access, reduces developer workload, improves query efficiency, cuts labor costs, and enhances user experience.
Future Plans
With OceanBase operating stably in production, HuoLala plans deeper and richer applications.
Business migration: support unified hybrid search, adapt business logic, and handle data migration.
Performance and cost: explore quantized HNSW_SQ or IVF disk indexes, table-level TTL, and hot-cold data tiering (the idea behind scalar quantization is sketched after this list).
Internal system integration: embed OceanBase into big‑data platforms, monitoring/alert systems, and DMS for smoother user experience.
Explore more scenarios: OLAP, OBKV, and other online storage use cases.
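As background for the HNSW_SQ item above, here is a toy illustration of scalar quantization: storing uint8 codes instead of float32 values cuts vector memory roughly fourfold at the cost of a small reconstruction error. It is a generic sketch under assumed sample data, not OceanBase's index implementation.

```python
# Toy scalar quantization: float32 vectors -> uint8 codes (about 4x less memory).
import numpy as np

vecs = np.random.default_rng(0).standard_normal((2_000, 768)).astype(np.float32)  # random sample vectors

lo, hi = float(vecs.min()), float(vecs.max())
scale = (hi - lo) / 255.0
codes = np.round((vecs - lo) / scale).astype(np.uint8)      # 1 byte per dimension instead of 4

def dequantize(c: np.ndarray) -> np.ndarray:
    """Approximate reconstruction used at search time."""
    return c.astype(np.float32) * scale + lo

print(f"float32: {vecs.nbytes / 1e6:.1f} MB, uint8 codes: {codes.nbytes / 1e6:.1f} MB")
print(f"max reconstruction error: {np.abs(dequantize(codes) - vecs).max():.4f}")
```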