
How TiDB Transformed Real‑Time Data Warehousing at Yiguo Group

This article details how Yiguo Group migrated from a single‑server SQL Server setup to a distributed TiDB + TiSpark architecture, highlighting performance gains, HTAP capabilities with TiFlash, ETL differences, and future data‑platform considerations.

dbaplus Community

TiDB Real‑Time Data Warehouse at Yiguo Group

Initially Yiguo Group used a single SQL Server instance for its real‑time data warehouse. Stored procedures on modest data volumes completed in 1–2 minutes, but during peak events execution time could extend to 30–40 minutes, causing unacceptable latency.

To support growth, the offline layer was migrated from SQL Server to Hadoop, and the real‑time layer was rebuilt with a TiDB + TiSpark architecture. The key technical benefits are:

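TiDB does not support stored procedures, but because the existing procedures were written in SQL, their logic could be rewritten as standard SQL statements with modest effort rather than reimplemented in another language or framework.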

TiDB is a transactional database with strong consistency, so the heavy use of UPDATE and DELETE in the original procedures carries over directly, and the real‑time (TP) and analytical (AP) views of the data stay consistent without complex reconciliation logic.

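Moving from a single SQL Server node to a distributed TiDB cluster makes the system horizontally scalable: capacity grows by adding nodes instead of being capped by one server, which removes the long script runtimes seen during peaks.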
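The first two points can be illustrated with a small sketch: the body of one former stored procedure, including its UPDATE and DELETE steps, expressed as an ordered list of plain SQL statements and run in a single transaction. The table names and refresh logic are hypothetical, and Python's built‑in sqlite3 is used only as a local stand‑in so the example runs without a cluster; against TiDB, which speaks the MySQL protocol, the same statements would be sent through a MySQL driver.

```python
import sqlite3

# Hypothetical schema. sqlite3 is only a local stand-in: against TiDB the same
# statements would go through a MySQL-protocol driver.
SCHEMA = """
CREATE TABLE orders_rt (
    order_id INTEGER PRIMARY KEY,
    city     TEXT,
    amount   REAL,
    status   TEXT
);
CREATE TABLE city_sales (
    city  TEXT PRIMARY KEY,
    total REAL
);
"""

# The body of one former stored procedure, flattened into ordered plain SQL.
REFRESH_STEPS = [
    # Mark zero-amount orders as cancelled (the procedures used UPDATE heavily).
    "UPDATE orders_rt SET status = 'cancelled' WHERE amount <= 0",
    # Purge cancelled orders (and DELETE as well).
    "DELETE FROM orders_rt WHERE status = 'cancelled'",
    # Rebuild the aggregate table from the cleaned detail rows.
    "DELETE FROM city_sales",
    "INSERT INTO city_sales (city, total) "
    "SELECT city, SUM(amount) FROM orders_rt GROUP BY city",
]


def run_refresh(conn):
    """Run every step in one transaction: all steps commit, or none do."""
    with conn:  # sqlite3: commit on success, roll back if a statement raises
        for stmt in REFRESH_STEPS:
            conn.execute(stmt)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO orders_rt VALUES (?, ?, ?, ?)",
        [
            (1, "Shanghai", 30.0, "paid"),
            (2, "Shanghai", 20.0, "paid"),
            (3, "Beijing", 0.0, "paid"),
            (4, "Beijing", 15.0, "cancelled"),
        ],
    )
    conn.commit()
    run_refresh(conn)
    print(conn.execute("SELECT city, total FROM city_sales ORDER BY city").fetchall())
    # [('Shanghai', 50.0)]
```

Keeping the steps as data rather than as procedural code makes them easy to send from any scheduler or client, and the single transaction preserves the all‑or‑nothing behavior the original procedure had.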

TiSpark, a connector that lets Spark read TiDB data directly from its storage layer (TiKV), was a decisive factor. Early TiDB versions lacked many analytical features; with TiSpark, analysts can run complex Spark SQL queries directly on TiDB data, and once write‑back is supported, a single script will be able to operate on both the Hadoop and TiDB clusters.
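TiSpark is attached to an ordinary Spark session through configuration. The sketch below assumes the TiSpark 2.x‑style keys `spark.sql.extensions=org.apache.spark.sql.TiExtensions` and `spark.tispark.pd.addresses` (TiSpark locates data through PD, TiDB's Placement Driver); newer TiSpark releases add catalog settings, so the documentation for the version in use should be checked. The helper names and PD host names are hypothetical.

```python
def tispark_conf(pd_addresses):
    """Spark settings that enable TiSpark (assumes TiSpark 2.x-style keys)."""
    if not pd_addresses:
        raise ValueError("at least one PD address is required")
    return {
        # Registers TiSpark's planner rules with Spark SQL.
        "spark.sql.extensions": "org.apache.spark.sql.TiExtensions",
        # TiSpark finds TiKV data through the Placement Driver (PD) cluster.
        "spark.tispark.pd.addresses": ",".join(pd_addresses),
    }


def to_submit_args(conf):
    """Render settings as spark-submit --conf arguments, in a stable order."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    conf = tispark_conf(["pd1.example:2379", "pd2.example:2379"])
    print(" ".join(to_submit_args(conf)))
```

With PySpark, each pair can instead be applied through `SparkSession.builder.config(key, value)` before `getOrCreate()`; the TiSpark jar also has to be on Spark's classpath. After that, analysts write ordinary Spark SQL against TiDB tables.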

TiDB real‑time data warehouse architecture

TiFlash for AP/TP Isolation

In the initial TiDB deployment, transactional (TP) and analytical (AP) workloads interfered with each other, a common HTAP challenge. TiFlash, introduced in 2018, keeps a columnar replica of the data on separate nodes, physically separating AP storage from TP storage so that analytical queries no longer compete with transactional traffic for the same storage resources. This moves TiDB closer to true HTAP.
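In later TiDB releases, a TiFlash replica is added per table with `ALTER TABLE ... SET TIFLASH REPLICA n`, replication progress is visible in `information_schema.tiflash_replica`, and a session can be restricted to TiFlash with the `tidb_isolation_read_engines` system variable. The helper below only builds those statements as strings; the table names are hypothetical, and this reflects TiFlash as documented in later TiDB versions, which may differ from the early version Yiguo tested.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _quote(name):
    """Quote `db.table` or `table`; reject anything that is not a plain identifier."""
    parts = name.split(".")
    if not all(_IDENT.match(p) for p in parts):
        raise ValueError(f"unexpected table name: {name!r}")
    return ".".join(f"`{p}`" for p in parts)


def tiflash_setup(tables, replicas=1):
    """Statements that add columnar TiFlash replicas for the AP tables."""
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return [f"ALTER TABLE {_quote(t)} SET TIFLASH REPLICA {replicas}" for t in tables]


def ap_session_setup():
    """Statement an analytics session runs so its reads go to TiFlash only."""
    return ["SET SESSION tidb_isolation_read_engines = 'tiflash'"]


# Poll until AVAILABLE = 1 before routing AP sessions to the new replicas.
REPLICA_STATUS_SQL = (
    "SELECT TABLE_SCHEMA, TABLE_NAME, AVAILABLE, PROGRESS "
    "FROM information_schema.tiflash_replica"
)

if __name__ == "__main__":
    for stmt in tiflash_setup(["dw.orders", "dw.order_items"]) + ap_session_setup():
        print(stmt + ";")
```

TP sessions keep their default engines and continue to read from TiKV, so only the analytics connections are redirected to the columnar copies.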

Performance and functional tests on representative workloads, conducted in 2018, showed satisfactory latency and throughput. Part of the workload is already running on TiFlash in production, and a broader rollout is planned for the upcoming release.

TiFlash architecture diagram

ETL Comparison: Hadoop vs. TiDB

Hadoop ETL operates at the table level. Jobs are scheduled per table, which limits the impact on cluster resources and allows fine‑grained resource allocation. Unused tables have negligible effect on overall utilization.

TiDB ETL is driven by DM (TiDB Data Migration) or Syncer, which replicate entire MySQL instances or databases into TiDB. This provides rapid data ingestion, but because replication runs at instance or database granularity, it can consume more resources than necessary when many of the replicated tables are never used downstream or their data quality is poor.
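To make the granularity difference concrete, the sketch below models ingestion cost as rows changed per day, a deliberately crude proxy. Table‑level scheduling (the Hadoop approach) pays only for tables someone uses, while instance‑level replication (the DM/Syncer approach) pays for every table. The table names and figures are hypothetical, not measurements from Yiguo.

```python
from dataclasses import dataclass


@dataclass
class SourceTable:
    name: str
    daily_rows: int        # rows changed per day (hypothetical figures)
    used_downstream: bool  # does any report, model, or API read this table?


def table_level_load(tables):
    """Hadoop-style: one job per table, scheduled only for tables in use."""
    return sum(t.daily_rows for t in tables if t.used_downstream)


def instance_level_load(tables):
    """DM/Syncer-style: the whole instance or database is replicated."""
    return sum(t.daily_rows for t in tables)


def idle_share(tables):
    """Fraction of replicated rows that belong to tables nobody uses."""
    total = instance_level_load(tables)
    return 0.0 if total == 0 else 1 - table_level_load(tables) / total


EXAMPLE = [
    SourceTable("orders", 800_000, True),
    SourceTable("order_items", 1_200_000, True),
    SourceTable("audit_log", 2_000_000, False),
    SourceTable("tmp_import", 500_000, False),
]

if __name__ == "__main__":
    print(f"{idle_share(EXAMPLE):.0%} of replicated rows belong to unused tables")
    # 56% of replicated rows belong to unused tables
```

In this made‑up instance, more than half of the replication work goes to tables that no downstream consumer reads, which is the overhead the paragraph above describes.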

Both approaches benefit from a data‑cataloging component that assesses data usability, records business attributes, and integrates with data‑ingestion pipelines (e.g., OneData) to enforce governance and resource control.
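The article does not specify how the cataloging component scores usability. The sketch below assumes a simple rule purely for illustration: a table is usable if it has an owner, passes basic quality checks, and has been read recently, and only usable tables are kept in the ingestion allowlist. The field names and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    table: str
    owner: str                 # accountable person or team ("" if unknown)
    business_domain: str       # business attribute recorded by the catalog
    bad_row_ratio: float       # share of rows failing basic quality checks
    days_since_last_read: int  # days since any downstream job read the table


def is_usable(entry, max_bad_ratio=0.05, max_idle_days=30):
    """Hypothetical usability rule: owned, reasonably clean, and still read."""
    return (
        bool(entry.owner)
        and entry.bad_row_ratio <= max_bad_ratio
        and entry.days_since_last_read <= max_idle_days
    )


def ingestion_allowlist(entries, **limits):
    """Tables an ingestion pipeline (per-table Hadoop job or DM/Syncer task) keeps."""
    return sorted(e.table for e in entries if is_usable(e, **limits))
```

The same allowlist could drive either pipeline: it selects which per‑table Hadoop jobs to schedule, or which tables a replication task should carry into TiDB.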

Hadoop vs TiDB ETL comparison

OneService Integration

TiDB serves as a primary data source for OneService, Yiguo’s unified external API platform. OneService exposes TiDB data through RESTful endpoints, managing business attributes, owners, and versioning. Future enhancements include duplicate‑API detection to prevent redundant services.
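The article does not describe how OneService detects duplicate APIs. One plausible heuristic, sketched here purely as an assumption, is to fingerprint each endpoint's TiDB query after normalizing case, whitespace, and literals, and to flag a new endpoint whose fingerprint already belongs to another path. `ApiSpec` and `ApiRegistry` are hypothetical names, and regex normalization is far cruder than a real SQL parser (for example, it ignores escaped quotes).

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiSpec:
    path: str     # REST path exposed to callers (hypothetical)
    owner: str    # accountable owner recorded by the platform
    version: int  # must increase when an endpoint is republished
    sql: str      # query the endpoint runs against TiDB


def fingerprint(sql):
    """Hash of the query with literals, case, and whitespace normalized."""
    s = re.sub(r"'[^']*'", "?", sql)          # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)  # numeric literals -> ?
    s = re.sub(r"\s+", " ", s).strip().lower()
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


class ApiRegistry:
    def __init__(self):
        self._by_path = {}  # path -> current ApiSpec
        self._by_fp = {}    # query fingerprint -> ApiSpec that owns it

    def register(self, spec):
        """Publish spec; return the path of an existing duplicate instead, or None."""
        fp = fingerprint(spec.sql)
        other = self._by_fp.get(fp)
        if other is not None and other.path != spec.path:
            return other.path  # same query already served under another path
        current = self._by_path.get(spec.path)
        if current is not None:
            if spec.version <= current.version:
                raise ValueError(f"{spec.path}: version must exceed v{current.version}")
            # The old query no longer belongs to this path.
            self._by_fp.pop(fingerprint(current.sql), None)
        self._by_path[spec.path] = spec
        self._by_fp[fp] = spec
        return None
```

Returning the existing path rather than silently rejecting lets the platform point the requester at the endpoint that already serves the same data, alongside its recorded owner and version.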

OneService API management

Future Outlook

HTAP and NewSQL systems such as TiDB are converging with big‑data technologies, moving toward unified database platforms. Different stakeholder groups have distinct priorities:

Traditional DBAs focus on stability and performance.

Big‑data engineers additionally monitor task efficiency and resource occupancy.

Modeling engineers adjust data models based on analyst usage patterns.

Analysts require ease of use and accessibility.

As data‑middle‑platform concepts mature, finer‑grained data management, automated resource control, and enhanced security will evolve alongside TiDB’s ongoing feature improvements.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
