
How Baidu’s New Cloud‑Native Databases Power Enterprise AI in 2024

At the 2024 Baidu Cloud Summit, the speaker detailed recent updates across Baidu's cloud‑native database suite, including the PegaDB KV store, the GaiaDB relational database, the VDB vector database, and the DBSC, EDAP, and DBStack platforms. The talk highlighted performance, cost, scalability, and AI‑ready features that address enterprise data challenges.

Baidu Intelligent Cloud Tech Hub

This article is compiled from a talk given at the Cloud‑Native Forum of Baidu Cloud Summit 2024.

Large models are a hot technology trend, but they are trained mainly on public internet data and lack enterprise‑specific knowledge, limiting their ability to solve real business problems.

Two main approaches address this gap: fine‑tuning and Retrieval‑Augmented Generation (RAG). Both combine enterprise data with a general model to create “enterprise intelligence.”
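To make the RAG idea concrete, here is a minimal sketch of its two steps: retrieve the enterprise documents closest to a question, then place them in the prompt sent to a general model. It is an illustration only and not Baidu's implementation. The function names (`cosine`, `retrieve`, `build_prompt`), the three sample documents, and the hand-written three-dimensional vectors are all assumptions; a real system would obtain vectors from an embedding model and store them in a vector database.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, corpus, k=2):
    """Return the k documents whose vectors are most similar to query_vec."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(question, docs):
    """Assemble a prompt that grounds the model in the retrieved documents."""
    context = "\n".join(f"- {d['text']}" for d in docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical enterprise knowledge base. The vectors are hand-written
# placeholders standing in for real embeddings.
corpus = [
    {"text": "Refunds are processed within 7 days.", "vec": [0.9, 0.1, 0.0]},
    {"text": "The office is closed on public holidays.", "vec": [0.0, 0.2, 0.9]},
    {"text": "Refund requests need an order number.", "vec": [0.8, 0.3, 0.1]},
]

query_vec = [1.0, 0.2, 0.0]  # placeholder embedding of the question
top = retrieve(query_vec, corpus, k=2)
prompt = build_prompt("How long do refunds take?", top)
print(prompt)
```

Only the two refund-related documents reach the prompt, so the general model answers from enterprise data it was never trained on. Fine‑tuning, by contrast, changes the model weights instead of the prompt.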

Key challenges for building enterprise‑grade AI include:

Collecting, cleaning, transforming, and annotating structured and unstructured data from diverse storage systems before it can be ingested by large models or vector databases.

Scaling data platforms as business data and model‑generated data grow, demanding higher performance‑to‑cost ratios.

Providing highly agile, easy‑to‑use platforms that enable rapid development of AI‑driven applications.

To meet these challenges, Baidu Intelligent Cloud released a series of updates across its database and big‑data portfolio.

PegaDB is Baidu’s self‑developed KV store, positioned against open‑source Redis. Recent optimizations improve batch‑loading performance and multi‑region active‑active replication, and make hot‑cold data separation more cost‑effective by automatically migrating colder data to SSDs, which cuts storage cost.
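The general idea behind hot‑cold separation can be sketched with a toy two‑tier store. The talk summary does not describe PegaDB's actual migration policy, so the least‑recently‑used rule below is an assumption chosen for simplicity. The `TieredStore` class is hypothetical, and its `cold` dictionary merely stands in for an SSD tier.

```python
from collections import OrderedDict


class TieredStore:
    """Toy KV store with a bounded hot tier and an unbounded cold tier.

    Assumption: coldness is decided by least-recently-used order.
    The cold dict is a stand-in for SSD storage.
    """

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # least recently used key comes first
        self.cold = {}

    def put(self, key, value):
        self.cold.pop(key, None)  # a rewritten key becomes hot again
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._demote()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        if key in self.cold:
            value = self.cold.pop(key)
            self.hot[key] = value  # promote on access
            self._demote()
            return value
        return None

    def _demote(self):
        """Move the coldest keys out of the hot tier until it fits."""
        while len(self.hot) > self.hot_capacity:
            key, value = self.hot.popitem(last=False)
            self.cold[key] = value


store = TieredStore(hot_capacity=2)
store.put("a", 1)
store.put("b", 2)
store.put("c", 3)             # "a" is now the coldest key and is demoted
after_puts = (list(store.hot), dict(store.cold))
value = store.get("a")        # "a" is promoted, "b" is demoted in its place
after_get = (list(store.hot), dict(store.cold))
```

The saving comes from sizing the expensive tier for the working set rather than for the full dataset. The trade‑off is higher latency on the first access to a cold key.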

GaiaDB, the cloud‑native relational offering, now ships as version 5.0 with HTAP support, columnar indexes and engines, scalable compute‑storage separation, and a Serverless mode that can reduce compute resources by over 50 % and storage by 80 % for variable workloads.
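A small sketch shows why an HTAP engine pairs row storage with columnar indexes. It is not GaiaDB code: the sample `rows`, the `to_columns` helper, and the `find_order` lookup are hypothetical and only contrast the two layouts.

```python
# Row layout: each record is stored together, which suits transactional
# point lookups and updates.
rows = [
    {"order_id": 1, "region": "north", "amount": 120.0},
    {"order_id": 2, "region": "south", "amount": 80.5},
    {"order_id": 3, "region": "north", "amount": 42.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def find_order(records, order_id):
    """Transactional-style access: fetch one whole record by key."""
    for record in records:
        if record["order_id"] == order_id:
            return record
    return None


columns = to_columns(rows)

# Analytical-style access: the aggregate reads only the "amount" column
# and never touches "order_id" or "region".
total_amount = sum(columns["amount"])
order = find_order(rows, 2)
```

An analytical query over a wide table reads far less data from the column layout, while single‑record reads and writes stay cheaper in the row layout. Keeping both lets one system serve both workload types.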

The vector database VDB 2.0, built in‑house, offers a 2.35× increase in memory efficiency, a 7× performance boost over open‑source alternatives, and an AI Search SDK for rapid knowledge‑base construction in RAG scenarios.

In the data‑warehouse space, Palo 2.0 (based on the open‑source Doris engine) improves stability, fixes over 500 bugs, and delivers more than a 10× TPC‑DS performance gain with hot‑cold storage that lowers storage cost by over 80 %.

Three platform pillars support these engines:

DBSC: a one‑stop DevOps platform for database development, management, security auditing, and intelligent diagnostics, now supporting over ten database types including GaiaDB, Redis, and openGauss.

EDAP: an integrated lake‑warehouse development and governance platform that adds full lifecycle support for unstructured data, AI‑platform integration, and serverless compute across Spark, Flink, and JDBC.

DBStack: a unified database management layer for both public‑cloud and private‑cloud scenarios, delivering multi‑cloud, multi‑engine, and hybrid‑cloud capabilities.

Overall, Baidu Intelligent Cloud’s cloud‑native data foundation now spans KV, relational, vector, and analytical databases, providing faster, stronger, smarter, and easier‑to‑use capabilities that empower enterprise AI workloads across industries.
