TDA: A One‑Stop Self‑Service BI Platform – Architecture, Challenges, and Solutions
This article presents Turing Data Analysis (TDA), a self-service BI platform that replaces fragile traditional pipelines with a unified data model built on DWD wide tables. It combines drag-and-drop analytics, multi-engine query optimization, and caching to deliver sub-10-second queries on billions of rows, together with fine-grained permissions and rapid dashboard creation. The article also reports significant usage growth and outlines AI-driven future enhancements.
The article introduces Business Intelligence (BI) and the concept of Turing Data Analysis (TDA), a self‑service analytics platform built to overcome the limitations of traditional BI pipelines.
Background and Goals – In traditional BI, every change in business requirements means developing ADS tables again, and the process relies on downstream aggregation, so efficiency is low. TDA instead provides a unified data-set model built on DWD wide tables: users drag and drop dimensions and metrics, save results to personal or public dashboards, and share their analyses. The platform targets four objectives: full coverage of dimensions and metrics, consistent metric definitions, timely data delivery (T+10h), and sub-10-second query performance on billions of rows.
Technical Design
Backend – Implements a unified query context; a query builder that turns one request into one or more query objects (for example, a paginated query plus a count query); SQL connectors for MySQL, ClickHouse, and Palo; and a caching layer that writes results on the first request and pre-warms them offline. It also includes data-set management, system-level services (subscription, alerting, permissions), and multi-process/multi-coroutine request handling.
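The query-builder step can be sketched as follows. This is a minimal illustration, not TDA's actual code: `QueryContext`, `Query`, and `build_queries` are hypothetical names, and the `%s` placeholder style is assumed from common Python DB-API drivers. It shows one context producing both a paginated query and a count query, with the aggregation expressed in SQL so the engine does the work.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueryContext:
    """Hypothetical unified description of one chart request."""
    table: str
    dimensions: tuple[str, ...]
    metrics: dict[str, str]  # alias -> aggregate expression, e.g. {"gmv": "sum(amount)"}
    filters: dict[str, object] = field(default_factory=dict)
    page: int = 1
    page_size: int = 20


@dataclass(frozen=True)
class Query:
    sql: str
    params: tuple


def build_queries(ctx: QueryContext) -> dict[str, Query]:
    """Build a paginated query and a matching count query from one context.

    Table, column, and metric names are interpolated directly, so they must
    come from a vetted data-set definition, never from raw user input.
    Filter values are always passed as parameters.
    """
    dims = ", ".join(ctx.dimensions)
    aggs = ", ".join(f"{expr} AS {alias}" for alias, expr in ctx.metrics.items())
    where = " AND ".join(f"{col} = %s" for col in ctx.filters) or "1 = 1"
    params = tuple(ctx.filters.values())

    # The GROUP BY runs inside the engine; only aggregated rows come back.
    grouped = f"SELECT {dims}, {aggs} FROM {ctx.table} WHERE {where} GROUP BY {dims}"

    offset = (ctx.page - 1) * ctx.page_size
    page = Query(
        f"{grouped} ORDER BY {dims} LIMIT %s OFFSET %s",
        params + (ctx.page_size, offset),
    )
    count = Query(f"SELECT count(*) FROM ({grouped}) AS t", params)
    return {"page": page, "count": count}
```

A caller would hand each `Query` to whichever connector matches the data set's engine; the two queries are independent and could be issued concurrently.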
Frontend – Provides a component library for chart rendering, filters, and custom components, an interaction layer for drag‑and‑drop editing, drill‑down, and canvas operations, and application modules for dashboards, large‑screen displays, and embedded analytics.
Challenges and Solutions
Full dimension coverage – build comprehensive public data sets.
Data accuracy – enforce unified metric definitions.
Performance on tens of millions of rows – adopt MPP engines, query optimization, and caching.
Query Optimization
Caching plus automatic roll-up, which together cover ~70% of dashboard requests.
SQL construction that pushes aggregation to the MPP engine (ClickHouse/Palo).
Concurrent requests spread across multiple domains, which works around the browser's limit of six connections per host and improves throughput.
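The caching behaviour described for the backend (results written on the first request, plus offline pre-warming) can be sketched like this. It is an illustration under stated assumptions: `QueryCache`, `cache_key`, and the in-memory dictionary are hypothetical stand-ins, and a real deployment would likely use a shared store with expiry tied to data refresh.

```python
import hashlib
import json
from typing import Any, Callable, Iterable


def cache_key(sql: str, params: tuple = ()) -> str:
    """Derive a stable key from whitespace-normalized SQL and its parameters.

    Assumes literal values travel in `params`, so collapsing whitespace in
    the SQL text cannot change the meaning of the query.
    """
    payload = json.dumps([" ".join(sql.split()), list(params)], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class QueryCache:
    """In-memory stand-in for a query result cache (hypothetical)."""

    def __init__(self, run_query: Callable[[str, tuple], Any]):
        self._run_query = run_query
        self._store: dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, sql: str, params: tuple = ()) -> Any:
        key = cache_key(sql, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # First request for this query: run it on the engine, then store it.
        self.misses += 1
        result = self._run_query(sql, params)
        self._store[key] = result
        return result

    def prewarm(self, queries: Iterable[tuple[str, tuple]]) -> None:
        """Offline job: run known dashboard queries ahead of user traffic."""
        for sql, params in queries:
            self._store[cache_key(sql, params)] = self._run_query(sql, params)

    def invalidate_all(self) -> None:
        """Drop everything, e.g. after the daily data load completes."""
        self._store.clear()
```

Tracking `hits` and `misses` makes it possible to measure what share of requests the cache actually serves, which is the kind of figure quoted above.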
System Guarantees
Subscription and alert mechanisms with configurable content, format, and trigger conditions.
Fine‑grained data permission control via dual‑layer (data‑set & dashboard) authorization, integrated with a unified permission service (MPS).
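One plausible reading of the dual-layer model is that a user sees a chart only when both the dashboard and its underlying data set are authorized. The sketch below assumes exactly that; the source does not spell out the rule, and `PermissionStore` is a hypothetical in-memory stand-in rather than the MPS interface.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionStore:
    """In-memory stand-in for a permission service (hypothetical).

    Each grant is a (user, kind, resource_id) triple, where kind is
    "dashboard" or "dataset".
    """
    grants: set[tuple[str, str, str]] = field(default_factory=set)

    def grant(self, user: str, kind: str, resource_id: str) -> None:
        self.grants.add((user, kind, resource_id))

    def allowed(self, user: str, kind: str, resource_id: str) -> bool:
        return (user, kind, resource_id) in self.grants


def can_view_chart(store: PermissionStore, user: str,
                   dashboard_id: str, dataset_id: str) -> bool:
    """Assumed rule: both layers must grant access.

    Sharing a dashboard alone does not expose data from a data set the
    viewer has not been authorized to read.
    """
    return (store.allowed(user, "dashboard", dashboard_id)
            and store.allowed(user, "dataset", dataset_id))
```

Under this assumed rule, a dashboard grant without a matching data-set grant yields no access, so data-set owners keep control even when dashboards are shared widely.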
Summary and Future Planning
The platform has seen significant growth (PV > 20k, UV > 1k, 300+ new charts per day), better performance (first-screen latency reduced from >10 s to ~5 s), and gains in business efficiency (self-service rate >80%, analysis 20× faster). Future directions include AI-enhanced analytics, integration of more data sources, AI-driven attribution analysis, and an AI-powered management cockpit.
Baidu Geek Talk