Understanding TiDB Architecture and Real‑Time Application Scenarios
This article explains TiDB's HTAP architecture, covering industry challenges, the row‑store TiKV and column‑store TiFlash design, MPP integration in TiDB 5.0, and a range of real‑time use cases such as dashboards, reporting, and data‑warehouse pipelines.
Guest: Huang Junshen (PingCAP) • Editor: Lin Donglian (Yicang Technology) • Platform: DataFunTalk
Overview: The article introduces TiDB's architecture and demonstrates how to build real‑time applications on top of TiDB, focusing on three parts: current industry database status, HTAP design thinking, and TiDB use cases.
01 Industry Status
Recent demand for real‑time data analysis exposes the limitations of traditional databases: MySQL struggles with scalability, NoSQL offers fast point queries but lacks complex analytics, and Hadoop/Spark pipelines introduce latency and operational overhead. Stitching multiple products together leads to data‑sync challenges and loss of freshness.
Typical solutions involve fragmented data pipelines, sharding on MySQL, OLAP databases like ClickHouse, or Elasticsearch for detail storage, each with drawbacks in elasticity, real‑time analysis, or query flexibility.
02 HTAP Architecture Design
TiDB's architecture pairs row‑store TiKV nodes with column‑store TiFlash nodes, while PD (Placement Driver) manages metadata and range‑based sharding. TiKV handles OLTP workloads and TiFlash supports BI queries, so the two workload classes are isolated on separate resources. Data is replicated from TiKV to TiFlash through the Raft protocol (TiFlash joins as a learner), preserving both consistency and freshness.
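Provisioning a TiFlash replica for a table is ordinary DDL. A minimal Python sketch that builds the statements (the `ALTER TABLE ... SET TIFLASH REPLICA n` syntax and the `information_schema.tiflash_replica` view are real TiDB features; the helper functions and the `orders` table name are illustrative):

```python
def tiflash_replica_ddl(table: str, replicas: int) -> str:
    """Build the TiDB DDL that asks PD to create `replicas`
    TiFlash learner copies of `table`."""
    return f"ALTER TABLE {table} SET TIFLASH REPLICA {replicas}"

def replica_progress_query(table: str) -> str:
    """Build the query used to watch replication progress."""
    return ("SELECT AVAILABLE, PROGRESS FROM information_schema.tiflash_replica "
            f"WHERE TABLE_NAME = '{table}'")

print(tiflash_replica_ddl("orders", 2))
# ALTER TABLE orders SET TIFLASH REPLICA 2
```

Once `AVAILABLE` reports 1, analytical queries on that table can be served from the column store with no change to application SQL.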
To improve write performance for column stores, the Delta‑Main design writes incoming data to a Delta area and later merges it into the main column files, enabling fast updates without degrading read performance.
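The Delta‑Main idea can be illustrated with a toy in‑memory column (a sketch of the concept only, not TiFlash's actual storage code): appends land in a small unsorted Delta area so writes stay cheap, while a background merge folds them into the sorted Main area that analytic scans read.

```python
import bisect

class DeltaMainColumn:
    """Toy model of a Delta-Main column store."""
    def __init__(self):
        self.main = []   # sorted; immutable between merges
        self.delta = []  # recent writes; append-only

    def write(self, value):
        self.delta.append(value)  # O(1) append keeps writes fast

    def merge(self):
        # periodic compaction: fold the delta into the sorted main area
        for v in self.delta:
            bisect.insort(self.main, v)
        self.delta.clear()

    def scan(self):
        # readers see the main area plus the not-yet-merged delta
        return sorted(self.main + self.delta)
```

Because `scan` always unions Main with the pending Delta, readers see fresh data even before a merge runs, which is the property the design trades layout work for.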
TiDB 5.0 introduces an MPP engine built on TiFlash, delivering 2‑3× performance over traditional databases. The MPP cluster performs data shuffle across TiFlash nodes, enabling parallel analysis for large multi‑table joins.
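The shuffle step can be sketched as a toy hash‑partitioned join in Python (illustrative only, not TiFlash's implementation): rows from both tables are partitioned on the join key, so each node joins just its own partition pair in parallel.

```python
def shuffle_hash_join(left, right, key, nodes):
    """Toy MPP shuffle join: hash-partition both inputs on the join
    key, then join each partition pair as one 'node' would locally."""
    parts_l = [[] for _ in range(nodes)]
    parts_r = [[] for _ in range(nodes)]
    for row in left:
        parts_l[hash(row[key]) % nodes].append(row)
    for row in right:
        parts_r[hash(row[key]) % nodes].append(row)

    out = []
    for l_part, r_part in zip(parts_l, parts_r):  # each pair runs on one node
        index = {}
        for row in r_part:                        # build side
            index.setdefault(row[key], []).append(row)
        for row in l_part:                        # probe side
            for match in index.get(row[key], []):
                out.append({**row, **match})
    return out
```

Partitioning guarantees that matching keys always land on the same node, which is what lets the per‑node joins run independently without cross‑node lookups.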
In MPP mode, the TiDB Server acts as coordinator: it parses SQL, generates the execution plan, and transparently routes each query to TiKV or to the TiFlash MPP cluster based on the optimizer's cost estimate, so applications need no engine‑specific SQL.
03 TiDB Use Cases
HTAP scenario: row‑store TiKV for high‑concurrency TP workloads; column‑store TiFlash for low‑to‑medium concurrency BI analytics, ensuring strong consistency and fresh data.
Streaming & CDC: online schema changes, incremental data capture, and support for both index and batch queries.
Real‑time dashboards: TiDB serves OLTP traffic while TiFlash + MPP accelerates analytical queries, ideal for SaaS ERP applications.
Real‑time reporting: Apache Flink streams 150 M rows (≈20 k TPS) into TiDB; TiFlash enables multi‑dimensional, low‑latency analysis without offline data‑warehouse delays.
Real‑time data warehouse: AWS Kinesis + Flink writes to TiDB; row‑store handles hot data, column‑store handles large‑scale reporting.
Big‑data platform integration: TiSpark bridges TiDB with Hadoop ecosystems, providing a unified real‑time + offline data platform.
04 Q&A Highlights
Q: Does TiDB store data twice (TiKV + TiFlash), and are there extra replica mechanisms? A: TiKV maintains three Raft replicas for high availability; TiFlash adds one or two learner replicas for AP availability. The column store therefore does hold an additional copy of the data, but it is kept in sync automatically through the Raft learner protocol, so no external replication pipeline is needed.
Q: How to obtain TiDB 5.0? A: TiDB is open‑source; use the TiUP tool (https://tiup.io) to deploy a cluster. Community support is available on asktug.com.
Q: Can we upgrade from 4.0 to 5.0 transparently? A: Yes, TiUP can perform in‑place upgrades, after which TiFlash replicas and MPP can be enabled without data migration.
Q: How does TiDB scaling work? A: Data is sharded into Ranges (Regions). Adding a TiKV or TiFlash node triggers PD to migrate Regions to the new node, achieving smooth scaling.
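The Region‑based scaling described above can be sketched as a toy rebalancer (illustrative only; PD's real scheduler weighs Region size, load, and placement rules, not just counts):

```python
def rebalance(regions, stores):
    """Spread Regions (contiguous key ranges) evenly across stores,
    mimicking what PD does when a new TiKV/TiFlash node joins."""
    placement = {s: [] for s in stores}
    for i, region in enumerate(regions):
        placement[stores[i % len(stores)]].append(region)
    return placement

regions = [f"r{i}" for i in range(9)]
before = rebalance(regions, ["tikv-1", "tikv-2", "tikv-3"])
after = rebalance(regions, ["tikv-1", "tikv-2", "tikv-3", "tikv-4"])
```

After `tikv-4` joins, no store holds more than three of the nine Regions, which is the "smooth scaling" effect: capacity grows by moving key ranges, not by resharding the application's data model.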
Q: How many replicas for TiKV and TiFlash? A: TiKV uses three Raft replicas; TiFlash typically runs with one or two learner replicas depending on AP availability requirements.
In summary, TiDB provides a unified OLTP + OLAP + HTAP solution suitable for high‑availability, strong‑consistency, and large‑scale enterprise workloads.
Thank you for listening.