Baidu’s Secret to Faster Search Data: Wide‑Table Modeling & Fusion Engine
This article outlines Baidu’s innovative approach to building its search data platform, detailing the design of wide‑table models, the upgrade to a Spark‑based fusion computation engine, and the new Turing 3.0 service delivery framework, which together deliver higher efficiency, lower cost, and faster, more reliable analytics.
Overview
Baidu rebuilt its search data platform along three main directions: wide‑table model design, computation‑engine optimization, and a next‑generation service delivery model (Turing 3.0). Together these address the shortcomings of a traditional layered data warehouse in search scenarios, delivering high efficiency, stability, and low cost.
Key Innovations
Wide‑Table Model
A subject‑oriented wide‑table model keeps the ODS and DWD layers at the same granularity and integrates all downstream fields, dimensions, and metrics into a single table. This eliminates cross‑layer redundancy, unifies metric definitions, and supports multi‑dimensional analysis across business needs.
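The core idea can be sketched in a few lines: records from different sources that share the same granularity key are merged into one wide row, so every consumer reads a single table. This is an illustrative sketch, not Baidu's actual pipeline; the field names (device, clicks, and so on) are hypothetical.

```python
# Illustrative sketch of wide-table assembly: join dimension and metric
# sources at the same granularity (one row per query event) into one wide row.
# All field names are hypothetical.

def build_wide_rows(dims_by_key, metrics_by_key):
    """Merge dimension and metric records that share the same granularity key."""
    wide = {}
    for key, dims in dims_by_key.items():
        row = dict(dims)                          # dimensions (device, region, ...)
        row.update(metrics_by_key.get(key, {}))   # metrics (clicks, latency, ...)
        wide[key] = row
    return wide

dims = {"q1": {"device": "mobile", "region": "beijing"}}
metrics = {"q1": {"clicks": 3, "latency_ms": 120}}
rows = build_wide_rows(dims, metrics)
```

Because the join happens once, upstream, every downstream analysis sees the same unified field definitions instead of re-deriving them per report.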
Fusion Computation Engine
The legacy C++ MapReduce (UPI) framework was replaced with a Spark‑based fusion engine that reuses resources through a long‑lived application context, writes Parquet directly without extra ETL scripts, and shortens job startup. The upgrade cut ETL processing from 40 minutes to 10 and improved resource utilization by roughly 20%.
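The resource-reuse argument reduces to simple arithmetic: a process-per-job framework pays the runtime startup cost on every job, while a long-lived context pays it once. The toy model below makes that explicit; the cost numbers are purely illustrative, not measurements from the article.

```python
# Toy cost model for the long-lived-context idea. A per-job framework pays
# startup on every submission; a shared context amortizes it across jobs.
# STARTUP_COST and JOB_COST are illustrative numbers, not real measurements.

STARTUP_COST = 30   # e.g., seconds to allocate executors / warm the runtime
JOB_COST = 5        # per-job compute cost

def per_job_total(n_jobs):
    """Each job starts its own runtime: startup is paid n times."""
    return n_jobs * (STARTUP_COST + JOB_COST)

def long_lived_total(n_jobs):
    """One shared context: startup is paid once, jobs reuse executors."""
    return STARTUP_COST + n_jobs * JOB_COST

print(per_job_total(10))     # 350
print(long_lived_total(10))  # 80
```

The gap widens with job count, which is why the win is largest for pipelines made of many short stages.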
New Service Delivery (Turing 3.0)
Turing 3.0 integrates three products—Turing Data Engine (TDE), Turing Data Studio (TDS), and Turing Data Analysis (TDA)—to form a unified development paradigm. Data sets become the core artifact, enabling a closed loop of data set ↔ visual analysis ↔ dashboard, reducing delivery cycles from weeks to days and empowering self‑service analytics.
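The "data set as core artifact" pattern can be sketched as a single registered definition with many consumers: redefine the data set once and every attached analysis and dashboard sees the change. The class and field names below are hypothetical, not the actual TDE/TDS/TDA APIs.

```python
# Sketch of a dataset-centric closed loop: one data set definition feeds both
# ad-hoc analyses and dashboards, so a schema change propagates to all
# consumers. Names are hypothetical, not real Turing 3.0 interfaces.

class DataSet:
    def __init__(self, name, fields):
        self.name = name
        self.fields = fields      # unified dimension/metric definitions
        self.consumers = []       # analyses and dashboards built on this set

    def attach(self, consumer):
        self.consumers.append(consumer)

    def redefine(self, fields):
        """Change the definition once; every consumer reads the new schema."""
        self.fields = fields

ds = DataSet("search_wide", ["query", "clicks"])
ds.attach("retention_analysis")
ds.attach("exec_dashboard")
ds.redefine(["query", "clicks", "latency_ms"])
```

Centralizing the definition is what shortens delivery: adding a metric is one edit to the data set rather than one edit per report.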
Performance Gains
Ad‑hoc query latency fell from tens of seconds to a few seconds (≈5× speedup).
Flattening complex (nested) fields improved query performance by about 2.1×.
Parquet columnar storage with ZSTD compression and bucket sorting cut storage by ~30% and improved I/O efficiency.
MERGE INTO on Iceberg reduced back‑fill time by ~54% compared with INSERT OVERWRITE.
The warehouse shrank from hundreds of tables to ~20, with a 30% reduction in storage and a 30% drop in operational cost.
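The Merge‑Into gain comes from updating only the rows that changed rather than rewriting entire partitions, as INSERT OVERWRITE does. A generic Iceberg‑style statement looks like the sketch below; the table and column names are made up for illustration, and in Spark the string would be passed to spark.sql(...).

```python
# Hypothetical Iceberg-style back-fill via MERGE INTO: only matched rows are
# updated, instead of rewriting whole partitions as INSERT OVERWRITE would.
# Table and column names are illustrative only.

merge_sql = """
MERGE INTO dwd.search_wide t
USING staging.search_fix s
  ON t.query_id = s.query_id AND t.dt = s.dt
WHEN MATCHED THEN UPDATE SET t.clicks = s.clicks
WHEN NOT MATCHED THEN INSERT *
"""

print(merge_sql.strip())
```

Because a back-fill typically touches a small fraction of rows, row-level merge avoids most of the write amplification of partition overwrites, which is consistent with the ~54% time reduction reported above.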
Future Outlook
The team plans to extend the platform with generic data‑flow solutions, automated logging (including no‑code tracing), abstracted wide‑table model layers, and AI‑assisted development to further accelerate data‑driven product iteration.
