
Innovations in Vector, Cloud‑Native, and KV Databases from Baidu Cloud at DTCC 2024

This article summarizes Baidu Intelligent Cloud's three technical sessions at DTCC 2024: the native vector database VDB, the cloud‑native database GaiaDB with its cost reduction and query acceleration work, and the high‑capacity KV store PegaDB. It highlights their architectures, reported performance gains, and practical deployment insights.

Baidu Intelligent Cloud Tech Hub

Aug 28, 2024


Recently, the 15th China Database Technology Conference (DTCC 2024) was held in Beijing, organized by IT168, ITPUB, and ChinaUnix. The event, themed “Self‑Research Innovation, Intelligent Future,” gathered domestic and international database vendors and experts to discuss vector databases, data governance, cloud‑native databases, and large‑scale data platform construction.

1. Baidu Cloud Native Vector Database VDB: Innovation and Application

With the rise of large models and AIGC, demand for vector databases has surged. Baidu Cloud’s database architect Zhu Jie presented Baidu's practical experience with VDB, emphasizing the deep integration of databases and AI and the role of Retrieval‑Augmented Generation (RAG) in addressing stale data and hallucination in model output. VDB is a native vector database built from scratch, which the speaker positioned as offering higher scalability, better performance, and stronger enterprise capabilities than adding vector plugins to an existing database. The speaker also highlighted the growing importance of data engineering platforms for governing unstructured data.
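To make the retrieval step of RAG concrete, here is a minimal sketch of what a vector lookup does conceptually: rank stored embeddings by cosine similarity to a query embedding and return the closest documents. It is not VDB's API or index structure; the function names, document IDs, and three‑dimensional vectors are all made up for illustration, and a production vector database would use an approximate nearest‑neighbor index rather than this brute‑force scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=2):
    """Brute-force scan: score every stored vector, keep the k best."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; real ones come from an embedding model.
corpus = {
    "doc_pricing": [0.9, 0.1, 0.0],
    "doc_release_notes": [0.1, 0.9, 0.1],
    "doc_faq": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

# The retrieved documents would be placed in the model's prompt as context.
retrieved = top_k(query, corpus, k=2)
print(retrieved)
```

Because the retrieved text is fetched at query time, updating the stored documents changes the answers without retraining the model, which is how RAG addresses the freshness problem described above.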

2. Cloud‑Native Database GaiaDB: Extreme Cost Reduction and Complex Query Acceleration

As cloud adoption deepens, databases are moving toward cloud‑native architectures. Baidu Cloud’s cloud‑native database lead Qiu Xueda introduced GaiaDB, which uses a compute‑storage adaptive replay technique to mitigate the impact of slow nodes and adopts a peer‑to‑peer design for its log and storage nodes, eliminating single points of failure and simplifying synchronization.

GaiaDB decouples compute, log, and storage, unifying log streams to reduce format‑conversion risks. It uses dual‑level chain verification for data integrity, enabling rapid detection and retry of inconsistencies. Additionally, GaiaDB pushes certain operators to columnar indexes, achieving over a hundred‑fold SQL execution speedup in HTAP scenarios, thereby expanding use cases and lowering development costs.
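The source does not spell out what the two levels of GaiaDB's chain verification are, so the sketch below shows only the general idea of chained verification on a log, under that stated assumption: each record stores a hash that covers both its own payload and the previous record's hash, so a corrupted or missing record can be located by walking the chain. The record layout and function names are hypothetical.

```python
import hashlib

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def chain_hash(prev_hash, payload):
    """Hash that binds a payload to everything that came before it."""
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def build_chain(payloads):
    """Produce log records, each carrying its chained hash."""
    records = []
    prev = GENESIS
    for payload in payloads:
        digest = chain_hash(prev, payload)
        records.append({"payload": payload, "hash": digest})
        prev = digest
    return records


def first_broken_index(records):
    """Return the index of the first record that fails verification,
    or None if the whole chain is consistent."""
    prev = GENESIS
    for i, rec in enumerate(records):
        if chain_hash(prev, rec["payload"]) != rec["hash"]:
            return i
        prev = rec["hash"]
    return None


log = build_chain([b"insert k1", b"update k1", b"delete k2"])
ok_result = first_broken_index(log)

# Simulate silent corruption of the second record's payload.
log[1]["payload"] = b"update k9"
bad_result = first_broken_index(log)
print(ok_result, bad_result)
```

Knowing the index of the first inconsistent record is what makes a targeted retry possible: a receiver can re‑request the log from that position instead of resynchronizing everything.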

3. Baidu Cloud Large‑Capacity KV Database PegaDB: Design and Practice

In the NoSQL session, Baidu Cloud’s Redis expert Shang Xiong presented PegaDB, focusing on horizontal scaling, bulk data import (Bulkload), and a multi‑region active‑active architecture. For scaling, PegaDB replaces logical data migration with physical migration, which removes unnecessary encoding and decoding and routes all writes through a single unified interface. According to the talk, this improves migration efficiency several times over while keeping the impact on online traffic under precise control.

PegaDB’s Bulkload tags data as it is generated, which guarantees that the key ranges of incoming files do not overlap with existing data. It relies on RocksDB’s DeleteFilesInRange to remove obsolete files quickly, minimizing business impact and supporting high‑frequency bulk imports. The system also supports rolling back within seconds after data has been ingested.
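One way to picture why tagging yields non‑overlapping key ranges is to prefix every key with a fixed‑width batch identifier. This is an assumption for illustration, not PegaDB's actual encoding: the 8‑byte prefix, the function names, and the in‑memory dictionary standing in for the storage engine are all hypothetical. With such a prefix, each import batch occupies its own contiguous key range, and retiring an old batch becomes a single range delete.

```python
def tagged_key(batch_id, user_key):
    """Prefix the user key with a fixed-width, big-endian batch id."""
    return batch_id.to_bytes(8, "big") + user_key


def batch_range(batch_id):
    """Half-open key range [start, end) covering one batch."""
    return batch_id.to_bytes(8, "big"), (batch_id + 1).to_bytes(8, "big")


def ranges_overlap(a, b):
    """True if two half-open ranges share at least one key."""
    return a[0] < b[1] and b[0] < a[1]


def delete_range(store, key_range):
    """Stand-in for a storage-engine range delete."""
    start, end = key_range
    for key in [k for k in store if start <= k < end]:
        del store[key]


store = {}
for k in (b"user:1", b"user:2"):      # previously imported batch
    store[tagged_key(7, k)] = b"old"
for k in (b"user:1", b"user:3"):      # newly imported batch
    store[tagged_key(8, k)] = b"new"

old_range, new_range = batch_range(7), batch_range(8)
overlap = ranges_overlap(old_range, new_range)

# Retire the old batch without touching any key from the new one.
delete_range(store, old_range)
remaining = sorted(store)
print(overlap, len(remaining))
```

Under the same assumption, a fast rollback could amount to pointing reads back at the previous batch before its range is deleted, although the source does not describe how PegaDB implements rollback.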

The active‑active design deploys instances across multiple data centers, each handling reads and writes independently, which improves fault tolerance and reduces latency. It addresses challenges such as resuming replication from a checkpoint, replication loops, and conflict resolution by embedding auxiliary information in the WAL LogData.
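The source does not say which auxiliary fields PegaDB embeds, so the following is a sketch under assumed fields: an origin region (to stop a write from being replicated back to where it started), a per‑origin sequence number (to resume after an interruption and skip duplicates), and a timestamp (for last‑writer‑wins conflict resolution, a common technique substituted here for whatever PegaDB actually uses). All class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int        # position in the origin's log (resume point)
    origin: str     # region that first accepted the write
    timestamp: int  # assumed write time, used to resolve conflicts
    key: str
    value: str


class Region:
    def __init__(self, name):
        self.name = name
        self.data = {}     # key -> (timestamp, origin, value)
        self.applied = {}  # origin -> highest seq already applied
        self.outbox = []   # locally accepted writes to ship to peers
        self._seq = 0

    def local_write(self, key, value, timestamp):
        self._seq += 1
        entry = LogEntry(self._seq, self.name, timestamp, key, value)
        self._apply(entry)
        self.outbox.append(entry)
        return entry

    def receive(self, entry):
        """Apply a replicated entry; return False if it was skipped."""
        if entry.origin == self.name:
            return False  # loop prevention: this write started here
        if entry.seq <= self.applied.get(entry.origin, 0):
            return False  # already seen: safe to replay after a restart
        self.applied[entry.origin] = entry.seq
        self._apply(entry)  # not re-queued in outbox, so it cannot loop
        return True

    def _apply(self, entry):
        # Last writer wins; the origin name breaks timestamp ties.
        current = self.data.get(entry.key)
        if current is None or (entry.timestamp, entry.origin) > current[:2]:
            self.data[entry.key] = (entry.timestamp, entry.origin, entry.value)


bj, sh = Region("bj"), Region("sh")
e_bj = bj.local_write("k", "from-bj", timestamp=100)
e_sh = sh.local_write("k", "from-sh", timestamp=101)

sh.receive(e_bj)               # older write loses to sh's local value
bj.receive(e_sh)               # newer write replaces bj's local value
looped = bj.receive(e_bj)      # own write coming back: skipped
replayed = bj.receive(e_sh)    # duplicate delivery: skipped
print(bj.data["k"][2], sh.data["k"][2])
```

Both regions converge on the same value regardless of delivery order, and the `applied` map gives each region a per‑origin position from which replication can resume.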
