
Evolution and Architecture of Beike's OLAP Platform: From Hive/MySQL to Multi‑Engine Flexibility

Beike’s OLAP platform has progressed from a basic Hive + MySQL batch pipeline, to a single-engine solution built on Kylin, to a flexible multi-engine architecture. A query-engine layer now routes metric queries across Kylin, Druid, ClickHouse and Doris, which has sharply cut cube-build times and added support for real-time ingestion. Planned work includes consolidating engines further and routing queries automatically based on performance.

Tencent Cloud Developer

Beike (贝壳找房) is a technology‑driven housing platform serving billions of users. To meet increasingly complex multi‑dimensional data analysis needs, its OLAP platform has evolved through three major stages.

Stage 0 (2015-2016): A simple Hive + MySQL pipeline. Data ingested through Sqoop or Kafka was batch-processed and the results were written to MySQL for reporting. This stage had no platform layer and could not handle large data volumes or fast queries.

Stage 1 (2016‑2019): Adoption of Apache Kylin as a single OLAP engine. A three‑layer architecture (OLAP layer, metric platform layer, API layer) was introduced, providing a unified metric definition, management, and query interface. Kylin’s pre‑computed cubes enabled sub‑second responses but suffered from dimension explosion, long cube‑build times, and limited flexibility.
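To make "dimension explosion" concrete: a fully built cube holds one cuboid for every subset of its dimensions, so the count doubles with each dimension added. The sketch below only illustrates that arithmetic. The function name is made up for this example, and it models just one pruning option (mandatory dimensions, which appear in every cuboid); it is not a description of how Beike configured its cubes.

```python
def cuboid_count(num_dimensions: int, mandatory: int = 0) -> int:
    """Number of cuboids in a fully built cube (illustrative helper).

    Each subset of the non-mandatory dimensions is one cuboid. Mandatory
    dimensions are present in every cuboid, so they do not multiply the count.
    """
    if not 0 <= mandatory <= num_dimensions:
        raise ValueError("mandatory must be between 0 and num_dimensions")
    return 2 ** (num_dimensions - mandatory)


if __name__ == "__main__":
    for n in (5, 10, 20):
        # Prints 32, 1024 and 1048576 cuboids respectively.
        print(f"{n} dimensions -> {cuboid_count(n)} cuboids")
```

Going from 10 to 20 dimensions multiplies the number of cuboids by 1,024, which is why build time and storage grow so quickly as dimensions are added.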

Stage 2 (2019‑present): A flexible architecture supporting multiple OLAP engines (Kylin, Druid, ClickHouse, Apache Doris, etc.). A query engine (QE) abstracts engine‑specific SQL differences, while a metric platform layer manages cubes that can be bound to any underlying engine. This enables dynamic routing, better performance, and easier integration with real‑time data sources.
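The binding idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Beike's implementation: the class names, the cube and table names, and the choice of an approximate distinct count as the example of a dialect difference are all hypothetical. The point is that callers ask for a metric on a cube, and the query engine decides which engine runs it and how the SQL is spelled.

```python
from dataclasses import dataclass

# Illustrative dialect table: how each engine's SQL spells a distinct count.
# (In Kylin the precision of COUNT(DISTINCT ...) depends on how the measure
# was defined in the cube.)
DISTINCT_COUNT = {
    "kylin": "COUNT(DISTINCT {col})",
    "druid": "APPROX_COUNT_DISTINCT({col})",
    "clickhouse": "uniq({col})",
    "doris": "NDV({col})",
}


@dataclass(frozen=True)
class Cube:
    """A logical cube bound to one underlying engine (hypothetical model)."""
    name: str
    table: str
    engine: str


class QueryEngine:
    """Hides engine choice and SQL dialect behind one interface."""

    def __init__(self) -> None:
        self._cubes: dict[str, Cube] = {}

    def bind(self, cube: Cube) -> None:
        """Register a cube, or rebind an existing cube to another engine."""
        if cube.engine not in DISTINCT_COUNT:
            raise ValueError(f"unsupported engine: {cube.engine}")
        self._cubes[cube.name] = cube

    def distinct_count_sql(self, cube_name: str, column: str, group_by: str) -> tuple[str, str]:
        """Return (engine, sql) for a distinct-count metric on the cube."""
        cube = self._cubes[cube_name]
        agg = DISTINCT_COUNT[cube.engine].format(col=column)
        sql = f"SELECT {group_by}, {agg} AS value FROM {cube.table} GROUP BY {group_by}"
        return cube.engine, sql


if __name__ == "__main__":
    qe = QueryEngine()
    qe.bind(Cube("house_views", "dw.house_views", "kylin"))
    print(qe.distinct_count_sql("house_views", "user_id", "city"))
    # Rebinding the cube changes the generated SQL; the caller's request does not change.
    qe.bind(Cube("house_views", "dw.house_views", "druid"))
    print(qe.distinct_count_sql("house_views", "user_id", "city"))
```

Because the metric definition refers only to the cube, moving a cube from one engine to another is a change of binding rather than a change to every report that uses it.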

The presentation also covered OLAP engine selection criteria (data volume, performance, concurrency, flexibility) and compared common open-source engines. Apache Druid was highlighted for its columnar storage, sub-second query latency, real-time ingestion, and schema-less design. Moving from Kylin to Druid cut cube-build time from 220 minutes to 22 minutes, a tenfold reduction, and also lowered data inflation relative to Kylin.

Future work includes consolidating to one or two engines for deep customization, implementing automatic engine routing based on performance, integrating with the Adhoc platform, and further optimizing real‑time metrics.
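Automatic routing by performance is listed as future work, so the source gives no design for it. One simple way it could look, purely as an assumption, is to keep a sliding window of observed latencies per metric and engine and send each query to the engine with the lowest median. The class name, the window size, and the fallback to the first candidate are all choices made for this sketch.

```python
from collections import deque
from statistics import median


class LatencyRouter:
    """Picks the engine with the lowest recent median latency (illustrative)."""

    def __init__(self, window: int = 50) -> None:
        self._window = window
        self._samples: dict[tuple[str, str], deque] = {}

    def record(self, metric: str, engine: str, latency_ms: float) -> None:
        """Store one observed query latency, keeping only the latest samples."""
        key = (metric, engine)
        if key not in self._samples:
            self._samples[key] = deque(maxlen=self._window)
        self._samples[key].append(latency_ms)

    def choose(self, metric: str, candidates: list[str]) -> str:
        """Return the best-measured candidate, or the first one if none is measured."""
        if not candidates:
            raise ValueError("candidates must not be empty")
        scored = []
        for engine in candidates:
            samples = self._samples.get((metric, engine))
            if samples:
                scored.append((median(samples), engine))
        if not scored:
            return candidates[0]
        return min(scored)[1]


if __name__ == "__main__":
    router = LatencyRouter()
    for ms in (800, 900, 850):
        router.record("daily_views", "kylin", ms)
    for ms in (120, 150, 130):
        router.record("daily_views", "druid", ms)
    print(router.choose("daily_views", ["kylin", "druid"]))  # prints "druid"
```

The median is used instead of the mean so that a single slow query does not flip the routing decision. A production router would also need to account for data freshness and engine load, which this sketch ignores.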

The Q&A addressed how dimension changes are handled (with slowly changing dimension tables), how the query engine abstracts the underlying engines, real-time ingestion via Kafka, and concurrency levels (up to 2,000 to 3,000 QPS).
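The summary does not say which kind of slowly changing dimension table is used. As one common possibility, a Type 2 table keeps every version of a dimension row with a validity interval, so a fact can be joined to the value that was in effect on its own date. The sketch below assumes that layout; the record fields and the sample rows are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimVersion:
    """One version of a dimension row in a Type 2 table (hypothetical layout)."""
    key: str
    value: str
    valid_from: date           # inclusive
    valid_to: Optional[date]   # exclusive; None marks the current version


def lookup(versions: list[DimVersion], key: str, as_of: date) -> Optional[str]:
    """Return the dimension value that was in effect for `key` on `as_of`."""
    for v in versions:
        if v.key != key or as_of < v.valid_from:
            continue
        if v.valid_to is None or as_of < v.valid_to:
            return v.value
    return None


if __name__ == "__main__":
    # Made-up example: an agent is reassigned from one region to another.
    agent_region = [
        DimVersion("agent-1", "North", date(2019, 1, 1), date(2019, 7, 1)),
        DimVersion("agent-1", "South", date(2019, 7, 1), None),
    ]
    print(lookup(agent_region, "agent-1", date(2019, 3, 15)))  # prints "North"
    print(lookup(agent_region, "agent-1", date(2019, 8, 1)))   # prints "South"
```

With this layout, historical reports keep attributing old facts to the old dimension value, while new facts pick up the current one.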

Tags: Data Warehouse, Big Data, OLAP, Apache Druid, Apache Kylin, Beike, Multi-Engine Architecture