
From Apache Doris to SelectDB: Evolution Towards the Next‑Generation Cloud‑Native Data Warehouse

This presentation introduces Apache Doris, examines changing data analysis demands in the cloud era, explains why SelectDB was created, and details SelectDB’s cloud‑native architecture, performance, unified capabilities, ease of use, cost efficiency, open‑source nature, and its application scenarios for modern data warehousing and log analytics.

DataFunSummit

The talk begins with an overview of Apache Doris, a high‑performance, real‑time analytical database built on an MPP architecture. Its key characteristics are strong performance from a vectorized execution engine, standard SQL over the MySQL protocol, easy operations, and solid support for high‑concurrency point queries.

Doris has evolved from its origins in Baidu’s advertising reporting system to become an Apache top‑level project with a vibrant community of over 500 contributors and thousands of production deployments worldwide, serving companies such as Baidu, Meituan, Xiaomi, JD, ByteDance, Tencent, and many others.

The presentation then discusses how data‑analysis requirements have shifted in the cloud era, highlighting the need for real‑time, unified, and cloud‑native solutions. Traditional big‑data stacks are increasingly complex, prompting a move toward integrated, cloud‑native data warehouses.

SelectDB is introduced as the next‑generation cloud‑native data warehouse, built on Apache Doris and designed for multi‑cloud neutrality. Its architecture separates storage and compute, using object storage for scalable, cost‑effective storage and shared caches for fast query acceleration.
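The storage‑compute separation described above can be illustrated with a toy model. This is a minimal sketch, not SelectDB's implementation: `ObjectStore`, `SharedCache`, and `ComputeNode` are hypothetical names, the object store is an in‑memory dictionary, and the cache uses a simple LRU policy chosen only for illustration.

```python
from collections import OrderedDict


class ObjectStore:
    """Hypothetical stand-in for remote object storage (in-memory here)."""

    def __init__(self):
        self._blobs = {}
        self.reads = 0  # counts round trips to "remote" storage

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        self.reads += 1
        return self._blobs[key]


class SharedCache:
    """Hypothetical LRU cache that several compute nodes can share."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, data):
        self._entries[key] = data
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


class ComputeNode:
    """Stateless compute: holds no data of its own, only references."""

    def __init__(self, store, cache):
        self.store = store
        self.cache = cache

    def read(self, key):
        data = self.cache.get(key)
        if data is None:  # cache miss: fall back to object storage
            data = self.store.get(key)
            self.cache.put(key, data)
        return data


store = ObjectStore()
store.put("segment-1", b"column data")
cache = SharedCache(capacity=2)

# Adding a second node needs no data copy; both read the same storage.
node_a = ComputeNode(store, cache)
node_b = ComputeNode(store, cache)
node_a.read("segment-1")  # miss: fetched from object storage
node_b.read("segment-1")  # hit: served from the shared cache
```

Because the nodes keep no state of their own, compute can be scaled without touching the stored data, which is the property the architecture relies on.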

Six core advantages of SelectDB are detailed:

Extreme performance through columnar storage, MPP, vectorized queries, materialized views, and advanced indexing.

Unified experience that supports mixed workloads, hybrid data types (structured, semi‑structured, unstructured), and dynamic schema evolution.

Simplicity, with a MySQL‑compatible protocol, a web‑based UI, one‑click cluster creation, and integration with Flink, Spark, Kafka, and dbt.

High cost‑effectiveness via pay‑as‑you‑go storage and elastic compute, eliminating data duplication across clusters.

Openness: SelectDB is fully compatible with open‑source Apache Doris and provides migration tools.

Enterprise‑grade features such as fine‑grained access control, data encryption, backup/restore, observability, and professional support.
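To make the first advantage more concrete, the sketch below contrasts row‑at‑a‑time processing with batch processing over a columnar layout. It is a toy illustration in plain Python under stated assumptions: the sample rows, the column names, and the functions `sum_row_at_a_time` and `sum_batched` are invented for this example and say nothing about how Doris or SelectDB implement their engine.

```python
from array import array

# Hypothetical sample data: (id, event, cost)
rows = [(1, "click", 0.5), (2, "view", 1.5), (3, "click", 2.0)]

# Columnar layout: each column is stored contiguously on its own.
columns = {
    "id": array("q", (r[0] for r in rows)),
    "event": [r[1] for r in rows],
    "cost": array("d", (r[2] for r in rows)),
}


def sum_row_at_a_time(rows, col_index):
    """Row-oriented scan: every row is visited to read one field."""
    total = 0.0
    for row in rows:
        total += row[col_index]
    return total


def sum_batched(column, batch_size=1024):
    """Column-oriented scan: only one column is read, in batches."""
    total = 0.0
    for start in range(0, len(column), batch_size):
        batch = column[start:start + batch_size]
        total += sum(batch)  # one call processes the whole batch
    return total


row_total = sum_row_at_a_time(rows, 2)
col_total = sum_batched(columns["cost"], batch_size=2)
```

Both functions return the same total; the difference is that the columnar version never touches the `id` or `event` data and works on whole batches, which is the idea behind columnar storage and vectorized execution.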

Application scenarios include a modern lake‑house platform that unifies data lake and warehouse capabilities, and a log‑storage and analysis solution that leverages Doris’s vectorized engine, storage‑compute separation, and lightweight inverted indexes to achieve superior write throughput, storage efficiency, and query performance compared with Elasticsearch.
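The role of an inverted index in log search can be sketched in a few lines. This is a simplified, assumed model rather than the index format Doris uses: the `InvertedIndex` class, the `tokenize` function, and the sample log lines are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of log lines that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.lines = []

    def add(self, line):
        line_id = len(self.lines)
        self.lines.append(line)
        for term in tokenize(line):
            self.postings[term].add(line_id)
        return line_id

    def search_all(self, query):
        """Return lines containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return []
        # Intersect posting sets instead of scanning every line.
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.lines[i] for i in sorted(ids)]


index = InvertedIndex()
index.add("ERROR disk full on node-1")
index.add("INFO compaction finished")
index.add("ERROR timeout on node-2")

matches = index.search_all("error disk")
```

A keyword query is answered by intersecting small posting sets rather than scanning every log line, which is why such an index helps query performance on large log volumes.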

The speaker also notes recent contributions back to the Apache Doris community, including improvements to inverted indexes, top‑N optimization, and time‑series compaction, as well as the release of Apache Doris 2.0 Alpha with advanced features such as high‑QPS point queries, object‑storage tiering, and a cost‑based optimizer.

Overall, the session outlines the technical evolution from Apache Doris to SelectDB, emphasizing how cloud‑native design, performance, simplicity, and openness address modern data‑analysis challenges.

Cloud Native, analytics, big data, data warehouse, Apache Doris, SelectDB
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
