TDA: A One‑Stop Self‑Service BI Platform – Architecture, Challenges, and Solutions

This article presents Turing Data Analysis (TDA), a self‑service BI platform that replaces fragile traditional pipelines with a unified data model built on DWD wide tables, drag‑and‑drop analytics, and multi‑engine query optimization with caching. It delivers sub‑10‑second queries on billions of rows, fine‑grained permissions, and rapid dashboard creation. The article also reports significant usage growth and outlines AI‑driven future enhancements.

Baidu Geek Talk

Apr 10, 2024

The article introduces Business Intelligence (BI) and the concept of Turing Data Analysis (TDA), a self‑service analytics platform built to overcome the limitations of traditional BI pipelines.

Background and Goals – In traditional BI, every change in business requirements means developing new ADS (application data service) tables, and analysis relies on downstream aggregation, so efficiency is low. TDA aims to provide a unified data‑set model based on DWD (data warehouse detail) wide tables, allowing users to drag and drop dimensions and metrics, save results to personal or public dashboards, and share analyses. The platform targets four objectives: full coverage of dimensions and metrics, consistent metric definitions so that numbers are accurate, timely data delivery (T+10h), and sub‑10‑second query performance on billions of rows.

Technical Design

Backend – Implements a unified query context; a query builder that turns one request into one or more query objects (for example, a paginated data query plus a count query); SQL connectors for MySQL, ClickHouse, and Palo (Baidu's MPP database, open‑sourced as Apache Doris); and a caching layer that is populated when a query first runs and pre‑warmed by offline jobs. It also includes data‑set management, system‑level services (subscription, alert, permission), and multi‑process/multi‑coroutine handling.
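The query builder described above can be sketched as follows. This is a minimal illustration, not TDA's actual code: the names `QueryContext`, `Query`, and `build_queries` are hypothetical, and the sketch assumes that identifiers and filter predicates have already been validated, since it builds SQL by string concatenation. It shows how a single context could yield both a paginated data query, with the aggregation expressed in SQL so that the engine performs it, and a matching count query.

```python
from dataclasses import dataclass, field


@dataclass
class QueryContext:
    """Hypothetical unified query context: what the user dragged onto a chart."""
    table: str
    dimensions: list[str]
    metrics: dict[str, str]  # alias -> aggregate expression, e.g. {"pv": "sum(pv)"}
    filters: list[str] = field(default_factory=list)  # assumed pre-validated
    page: int = 1
    page_size: int = 50


@dataclass
class Query:
    purpose: str  # "page" or "count"
    sql: str


def build_queries(ctx: QueryContext) -> list[Query]:
    """Build a paginated data query and a count query from one context."""
    dims = ", ".join(ctx.dimensions)
    metrics = ", ".join(f"{expr} AS {alias}" for alias, expr in ctx.metrics.items())
    select_list = ", ".join(part for part in (dims, metrics) if part)
    where = (" WHERE " + " AND ".join(ctx.filters)) if ctx.filters else ""
    group_by = f" GROUP BY {dims}" if dims else ""
    # A stable ORDER BY keeps pages consistent between requests.
    order_by = f" ORDER BY {dims}" if dims else ""
    offset = (ctx.page - 1) * ctx.page_size

    # GROUP BY runs inside the engine, so only aggregated rows come back.
    page_sql = (
        f"SELECT {select_list} FROM {ctx.table}{where}{group_by}{order_by}"
        f" LIMIT {ctx.page_size} OFFSET {offset}"
    )
    if dims:
        # Total number of groups, needed to render the pager.
        count_sql = (
            f"SELECT count(*) AS total FROM "
            f"(SELECT {dims} FROM {ctx.table}{where}{group_by}) AS t"
        )
    else:
        # A pure aggregate without dimensions always returns one row.
        count_sql = "SELECT 1 AS total"
    return [Query("page", page_sql), Query("count", count_sql)]


if __name__ == "__main__":
    ctx = QueryContext(
        table="dwd_events",
        dimensions=["dt", "city"],
        metrics={"pv": "sum(pv)"},
        filters=["dt >= '2024-01-01'"],
        page=2,
        page_size=10,
    )
    for q in build_queries(ctx):
        print(q.purpose, "->", q.sql)
```

A production builder would use parameterized queries and per‑engine SQL dialects rather than raw string formatting; the sketch only illustrates the one‑context, several‑queries idea.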

Frontend – Provides a component library for chart rendering, filters, and custom components, an interaction layer for drag‑and‑drop editing, drill‑down, and canvas operations, and application modules for dashboards, large‑screen displays, and embedded analytics.

Challenges and Solutions

Full dimension coverage – build comprehensive public data sets.

Data accuracy – enforce unified metric definitions.

Performance on tens of millions of rows – adopt MPP engines, query optimization, and caching.

Query Optimization

Cache + auto‑roll‑up covering ~70% of dashboard requests.

SQL construction that pushes aggregation to the MPP engine (ClickHouse/Palo).

Concurrent requests spread across multiple domains, which works around the browser's limit of roughly six simultaneous connections per host under HTTP/1.1 and improves throughput.
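The article does not spell out how the cache and auto‑roll‑up work together, so the following is only one plausible reading: a result cached at a fine granularity (for example by date and city) can answer a coarser request (by date only) by summing cached rows, without touching the engine. The class `RollupCache` and its methods are hypothetical, filters are left out for brevity, and the approach is valid only for additive metrics such as sums and counts, not for distinct counts or averages.

```python
from collections import defaultdict


class RollupCache:
    """Hypothetical in-memory cache of aggregated results for additive metrics."""

    def __init__(self):
        # (dataset, metric) -> {dimension tuple: {dimension values: metric value}}
        self._store = {}

    def put(self, dataset, dims, metric, rows):
        """Store an aggregated result; rows maps value tuples to metric values."""
        self._store.setdefault((dataset, metric), {})[tuple(dims)] = dict(rows)

    def get(self, dataset, dims, metric):
        """Return cached rows for dims, rolling up a finer result if possible."""
        cached = self._store.get((dataset, metric), {})
        dims = tuple(dims)
        if dims in cached:
            return dict(cached[dims])  # exact hit
        # Any cached result whose dimensions include the requested ones will do;
        # pick the one with the fewest rows to minimize the roll-up work.
        candidates = [d for d in cached if set(dims) <= set(d)]
        if not candidates:
            return None  # cache miss: the caller falls back to the engine
        source = min(candidates, key=lambda d: len(cached[d]))
        positions = [source.index(d) for d in dims]
        rolled = defaultdict(int)
        for key, value in cached[source].items():
            rolled[tuple(key[i] for i in positions)] += value
        return dict(rolled)


if __name__ == "__main__":
    cache = RollupCache()
    cache.put(
        "sales",
        ["dt", "city"],
        "sum_amount",
        {
            ("2024-04-01", "Beijing"): 10,
            ("2024-04-01", "Shanghai"): 5,
            ("2024-04-02", "Beijing"): 7,
        },
    )
    print(cache.get("sales", ["dt"], "sum_amount"))
    print(cache.get("sales", ["region"], "sum_amount"))
```

The trade‑off in this design is that caching at a finer granularity serves more distinct dashboard requests per entry, at the cost of larger cached results.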

System Guarantees

Subscription and alert mechanisms with configurable content, format, and trigger conditions.

Fine‑grained data permission control via dual‑layer (data‑set & dashboard) authorization, integrated with a unified permission service (MPS).
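As a rough sketch of the dual‑layer idea, the check below assumes that a chart is visible only when the user holds both a dashboard grant and a grant on the data set behind the chart. The source does not say how the two layers combine, so this rule, the `Grants` shape, and the function names are all assumptions; in TDA the grants would come from the unified permission service rather than from local sets.

```python
from dataclasses import dataclass, field


@dataclass
class Grants:
    """Hypothetical stand-in for what a permission service returns for one user."""
    dashboards: set[str] = field(default_factory=set)
    datasets: set[str] = field(default_factory=set)


def can_view_chart(grants: Grants, dashboard_id: str, dataset_id: str) -> bool:
    """Both layers must pass: access to the dashboard and to the data set."""
    return dashboard_id in grants.dashboards and dataset_id in grants.datasets


def visible_charts(grants: Grants, dashboard_id: str, charts: dict[str, str]) -> list[str]:
    """Filter a dashboard's charts (chart name -> data set id) for one user."""
    return [
        name
        for name, dataset_id in charts.items()
        if can_view_chart(grants, dashboard_id, dataset_id)
    ]


if __name__ == "__main__":
    grants = Grants(dashboards={"growth"}, datasets={"ds_traffic"})
    charts = {"daily_pv": "ds_traffic", "revenue": "ds_finance"}
    print(visible_charts(grants, "growth", charts))
```

Checking the data‑set layer per chart means that sharing a dashboard does not implicitly expose data sets the viewer was never granted.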

Summary and Future Planning

The platform reports significant growth (PV > 20k, UV > 1k, 300+ new charts daily), performance gains (first‑screen latency reduced from more than 10 s to about 5 s), and business efficiency improvements (self‑service rate above 80%, analysis 20× faster). Future directions include AI‑enhanced analytics, expanded data source integration, AI‑driven attribution analysis, and an AI‑powered management cockpit.
