Innovations in Vector, Cloud‑Native, and KV Databases from Baidu Cloud at DTCC 2024
This article summarizes Baidu Intelligent Cloud's three technical sessions at DTCC 2024: the native vector database VDB, the cloud‑native database GaiaDB with its cost reduction and query acceleration work, and the large‑capacity KV store PegaDB. It highlights their architectures, performance gains, and practical deployment lessons.
Recently, the 15th China Database Technology Conference (DTCC 2024) was held in Beijing, organized by IT168, ITPUB, and ChinaUnix. The event, themed “Self‑Research Innovation, Intelligent Future,” gathered domestic and international database vendors and experts to discuss vector databases, data governance, cloud‑native databases, and large‑scale data platform construction.
1. Baidu Cloud Native Vector Database VDB: Innovation and Application
With the rise of large models and AIGC, demand for vector databases has surged. Baidu Cloud’s database architect Zhu Jie presented Baidu's practical experience with VDB, emphasizing the deep integration of databases and AI, and the role of Retrieval‑Augmented Generation (RAG) in addressing stale knowledge and hallucination in large models. VDB is a native vector database built from scratch, which the speaker positioned as offering better scalability, performance, and enterprise capabilities than vector plugins added to existing databases. He also pointed to the growing importance of data engineering platforms for governing unstructured data.
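The retrieval step that RAG relies on can be illustrated with a minimal sketch. It is not VDB's API or index structure: it assumes tiny hand-written embeddings, a brute-force cosine scan, and hypothetical names (`DOCUMENTS`, `cosine_similarity`, `top_k`). A production vector database would replace the scan with an approximate nearest-neighbor index.

```python
import math

# Hypothetical 3-dimensional embeddings; real embeddings come from a model
# and typically have hundreds or thousands of dimensions.
DOCUMENTS = {
    "doc_pricing": [0.9, 0.1, 0.0],
    "doc_backup": [0.1, 0.9, 0.1],
    "doc_scaling": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    # The retrieved ids would be used to fetch fresh text for the LLM prompt.
    print(top_k([0.0, 0.1, 1.0], DOCUMENTS, k=2))
```

In a RAG pipeline, the text behind the returned ids is appended to the prompt, so the model answers from current data rather than from what it memorized during training.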
2. Cloud‑Native Database GaiaDB: Extreme Cost Reduction and Complex Query Acceleration
As cloud adoption deepens, databases are moving toward cloud‑native architectures. Baidu Cloud’s cloud‑native database lead Qiu Xueda introduced GaiaDB, which uses an adaptive replay technique across compute and storage to mitigate slow‑node issues, and adopts a peer‑to‑peer design for its log and storage nodes, eliminating single points of failure and simplifying synchronization.
GaiaDB decouples compute, log, and storage, unifying log streams to reduce format‑conversion risks. It uses dual‑level chain verification for data integrity, enabling rapid detection and retry of inconsistencies. Additionally, GaiaDB pushes certain operators to columnar indexes, achieving over a hundred‑fold SQL execution speedup in HTAP scenarios, thereby expanding use cases and lowering development costs.
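The talk summary does not say what GaiaDB's two verification levels are, so the following is only a sketch of the general idea behind chain verification, under assumed details: each log record stores a digest that covers its payload and the previous record's digest, so a reader can locate the first record that breaks the chain and retry from there. The names `GENESIS`, `build_chain`, and `first_inconsistency` are hypothetical.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting digest for an empty chain


def chain_digest(prev_digest, payload):
    """Digest covering the previous digest and this record's payload."""
    return hashlib.sha256(prev_digest.encode("ascii") + payload).hexdigest()


def build_chain(payloads):
    """Attach a chained digest to every payload, in order."""
    records = []
    prev = GENESIS
    for payload in payloads:
        digest = chain_digest(prev, payload)
        records.append({"payload": payload, "digest": digest})
        prev = digest
    return records


def first_inconsistency(records):
    """Return the index of the first record that fails verification, or None."""
    prev = GENESIS
    for index, record in enumerate(records):
        if chain_digest(prev, record["payload"]) != record["digest"]:
            return index  # caller can re-fetch or replay from this point
        prev = record["digest"]
    return None


if __name__ == "__main__":
    chain = build_chain([b"insert 1", b"update 1", b"delete 1"])
    chain[1]["payload"] = b"corrupted"
    print(first_inconsistency(chain))
```

Because every digest depends on its predecessor, a corrupted or missing record is detected at the point where it occurs rather than at the end of the stream, which is what makes a targeted retry possible.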
3. Baidu Cloud Large‑Capacity KV Database PegaDB: Design and Practice
In the NoSQL session, Baidu Cloud’s Redis expert Shang Xiong presented PegaDB, focusing on horizontal scaling, bulk data import (Bulkload), and multi‑region active‑active architecture. For scaling, PegaDB replaces logical data migration with physical migration, which removes unnecessary encoding and decoding and routes all writes through a unified interface. According to the talk, this improves migration efficiency several‑fold while keeping the impact on online traffic under precise control.
PegaDB’s Bulkload tags data as it is generated, ensuring that the key ranges of incoming files do not overlap with each other or with existing data. It uses RocksDB’s DeleteFilesInRange to remove obsolete files quickly, minimizing business impact and supporting high‑frequency bulk imports. The system also supports rolling back an import within seconds after the data has been loaded.
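One way to picture the non-overlap guarantee is the sketch below. It is an assumption for illustration, not PegaDB's actual encoding: each import batch gets a zero-padded batch id as a key prefix, so every batch occupies its own contiguous key range. The helpers `tag_key`, `batch_range`, `ranges_overlap`, and `can_ingest` are hypothetical.

```python
def tag_key(batch_id, user_key):
    """Prefix a user key with a fixed-width batch tag (assumed layout)."""
    return f"{batch_id:08d}:".encode("ascii") + user_key


def batch_range(batch_id, user_keys):
    """Smallest and largest tagged key of a batch, as a closed range."""
    tagged = sorted(tag_key(batch_id, key) for key in user_keys)
    return tagged[0], tagged[-1]


def ranges_overlap(a, b):
    """True if two closed (smallest, largest) key ranges intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def can_ingest(incoming, existing):
    """A file set is safe to ingest if no incoming range touches an existing one."""
    return not any(ranges_overlap(new, old)
                   for new in incoming for old in existing)


if __name__ == "__main__":
    old_batch = batch_range(1, [b"user:1", b"user:9"])
    new_batch = batch_range(2, [b"user:1", b"user:5"])
    # Same user keys, different tags: the ranges stay disjoint.
    print(can_ingest([new_batch], [old_batch]))
```

With disjoint ranges, an obsolete batch maps to a contiguous key range, which is the kind of range a file-level delete such as DeleteFilesInRange can drop without rewriting live data.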
The active‑active design deploys instances across multiple data centers, each handling reads and writes independently, which improves fault tolerance and reduces latency. It addresses challenges such as resuming synchronization from a checkpoint, replication loops, and conflict resolution by embedding auxiliary information in the WAL LogData.
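The summary does not list which auxiliary fields PegaDB embeds, so the sketch below assumes three that are commonly used for these problems: an origin region (to break replication loops), a per-origin sequence number (to resume from a checkpoint without reapplying entries), and a timestamp (for last-writer-wins conflict resolution, which is a swapped-in policy and not confirmed by the source). `LogEntry`, `Region`, and `replicate` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogEntry:
    origin: str     # region that first accepted the write
    seq: int        # per-origin sequence number, used as a checkpoint
    timestamp: int  # write time, used for last-writer-wins
    key: str
    value: str


@dataclass
class Region:
    name: str
    data: dict = field(default_factory=dict)     # key -> (timestamp, origin, value)
    log: list = field(default_factory=list)      # local and replicated entries
    applied: dict = field(default_factory=dict)  # origin -> highest seq applied
    next_seq: int = 1

    def _apply(self, entry):
        current = self.data.get(entry.key)
        candidate = (entry.timestamp, entry.origin, entry.value)
        # Newer timestamp wins; the origin name breaks ties deterministically.
        if current is None or candidate[:2] > current[:2]:
            self.data[entry.key] = candidate

    def local_write(self, key, value, timestamp):
        entry = LogEntry(self.name, self.next_seq, timestamp, key, value)
        self.next_seq += 1
        self._apply(entry)
        self.log.append(entry)
        return entry

    def receive(self, entry):
        if entry.origin == self.name:
            return False  # our own write came back: drop it to break the loop
        if entry.seq <= self.applied.get(entry.origin, 0):
            return False  # already applied before the checkpoint
        self._apply(entry)
        self.applied[entry.origin] = entry.seq
        self.log.append(entry)  # keep the original origin tag when re-logging
        return True


def replicate(src, dst):
    """Ship src's whole log to dst; return how many entries dst applied."""
    return sum(1 for entry in list(src.log) if dst.receive(entry))
```

This sketch assumes entries from one origin arrive in sequence order. Re-shipping the full log is deliberately wasteful here: it shows that the origin tag and the per-origin checkpoint make repeated or looped delivery harmless.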
Baidu Intelligent Cloud Tech Hub
