
Baidu’s Secret to Faster Search Data: Wide‑Table Modeling & Fusion Engine

This article outlines Baidu’s innovative approach to building its search data platform, detailing the design of wide‑table models, the upgrade to a Spark‑based fusion computation engine, and the new Turing 3.0 service delivery framework, which together deliver higher efficiency, lower cost, and faster, more reliable analytics.

Baidu Geek Talk

Overview

The article presents Baidu’s innovative construction of its search data platform, focusing on three main directions: wide‑table model design, computation engine optimization, and the next‑generation service delivery model (Turing 3.0). These improvements address the challenges of traditional data warehouses in search scenarios, achieving high efficiency, stability, and low cost.

Key Innovations

Wide‑Table Model

A theme‑based wide‑table model is built by keeping the ODS and DWD layer granularity consistent, integrating all downstream fields, dimensions, and metrics into a single table. This eliminates redundancy across layers, unifies metric definitions, and supports multi‑dimensional analysis for various business needs.
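As a rough illustration of the idea, consider merging per-event dimension and metric records (which would otherwise live in separate warehouse tables) into a single wide row at a shared granularity. All field names and the `event_id` join key below are hypothetical; this is a minimal sketch, not Baidu's actual schema:

```python
# Build one wide-table row per event by merging base fields, dimension
# fields, and metric fields that share the same granularity.
# event_id is a hypothetical join key; all field names are illustrative.
def build_wide_row(event, dims, metrics):
    row = dict(event)                               # ODS-level base fields
    row.update(dims.get(event["event_id"], {}))     # dimension fields
    row.update(metrics.get(event["event_id"], {}))  # metric fields
    return row

events = [{"event_id": 1, "query": "weather"}]
dims = {1: {"device": "mobile", "region": "beijing"}}
metrics = {1: {"clicks": 3, "dwell_ms": 1200}}

wide = [build_wide_row(e, dims, metrics) for e in events]
```

Because every downstream field lands in one row, multi-dimensional analysis becomes a filter-and-aggregate over a single table rather than a chain of cross-layer joins.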

[Figure: Wide‑table model diagram]

Fusion Computation Engine

The legacy C++ MapReduce (UPI) framework was replaced with a Spark‑based fusion engine that reuses resources via a long‑lived application context, writes directly to Parquet without extra ETL scripts, and reduces job startup time. This upgrade cuts ETL processing from 40 minutes to 10 minutes and improves resource utilization by about 20%.
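The core resource-reuse idea can be sketched in plain Python: pay the expensive engine-startup cost once, then route every ETL job through the same long-lived context instead of cold-starting per job. The function names and context contents here are hypothetical stand-ins (a real implementation would cache a SparkSession or similar):

```python
import functools
import time

# Hypothetical sketch of the fusion engine's resource reuse: the
# expensive startup (stand-in for building a Spark application
# context) happens once and is cached for all subsequent jobs.
@functools.lru_cache(maxsize=1)
def get_engine_context():
    time.sleep(0.01)  # simulate expensive startup
    return {"started_at": time.time(), "jobs_run": 0}

def run_etl_job(job_name):
    ctx = get_engine_context()  # reused, not recreated, per job
    ctx["jobs_run"] += 1
    return (job_name, ctx["jobs_run"])

first = run_etl_job("flatten_logs")
second = run_etl_job("write_parquet")
```

Amortizing startup across jobs is what turns many small ETL steps from minutes of overhead into near-zero marginal cost, which is consistent with the 40-minute-to-10-minute improvement the article reports.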

[Figure: Fusion engine architecture]

New Service Delivery (Turing 3.0)

Turing 3.0 integrates three products—Turing Data Engine (TDE), Turing Data Studio (TDS), and Turing Data Analysis (TDA)—to form a unified development paradigm. Data sets become the core artifact, enabling a closed loop of data set ↔ visual analysis ↔ dashboard, reducing delivery cycles from weeks to days and empowering self‑service analytics.

[Figure: Turing 3.0 ecosystem]

Performance Gains

Ad‑hoc query latency reduced from tens of seconds to a few seconds (≈5× speedup).

Complex field flattening improves query performance by 2.1×.

Parquet columnar storage with ZSTD compression and bucket sorting reduces storage by ~30% and improves I/O efficiency.

Merge‑Into on Iceberg cuts back‑fill time by ~54% compared with INSERT OVERWRITE.

Overall data‑warehouse table count decreased from hundreds to ~20, with a 30% reduction in storage and a 30% drop in operational cost.
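The "complex field flattening" gain above comes from promoting nested structures to top-level columns, so queries read a flat column directly instead of deserializing a nested object each time. A minimal sketch of that transformation (the record layout is hypothetical):

```python
def flatten(record, parent_key="", sep="_"):
    # Recursively promote nested dict fields to top-level columns,
    # e.g. {"ui": {"click": 3}} -> {"ui_click": 3}.
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

nested = {"query": "maps", "ui": {"click": 3, "pos": {"x": 10}}}
flat = flatten(nested)
```

In a columnar format like Parquet, each flattened field becomes its own column, which is what lets the engine skip untouched fields entirely during scans.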

Future Outlook

The team plans to extend the platform with generic data‑flow solutions, automated logging (including no‑code tracing), abstracted wide‑table model layers, and AI‑assisted development to further accelerate data‑driven product iteration.

Tags: Big Data, Data Warehouse, Fusion Engine, Search Analytics, Turing 3.0, Wide‑Table Modeling
Written by Baidu Geek Talk