How TiDB Transformed Real‑Time Data Warehousing at Yiguo Group
This article details how Yiguo Group migrated from a single‑server SQL Server setup to a distributed TiDB + TiSpark architecture, highlighting performance gains, HTAP capabilities with TiFlash, ETL differences, and future data‑platform considerations.
TiDB Real‑Time Data Warehouse at Yiguo Group
Initially, Yiguo Group ran its real‑time data warehouse on a single SQL Server instance. On modest data volumes the stored procedures finished in 1–2 minutes, but during peak events the same procedures could take 30–40 minutes, which was unacceptable latency for a real‑time layer.
To support growth, the offline layer was migrated from SQL Server to Hadoop, and the real‑time layer was rebuilt with a TiDB + TiSpark architecture. The key technical benefits are:
The logic in the existing SQL Server stored procedures can be ported to standard SQL statements with little effort, avoiding a costly rewrite in another language or framework.
TiDB keeps real‑time (TP) and offline (AP) data strongly consistent, so the heavy use of UPDATE and DELETE in the original procedures does not require complex reconciliation logic.
Moving from a single‑node SQL Server to a distributed TiDB cluster lets capacity grow by adding nodes, which shortens the scripts that previously ran for 30–40 minutes at peak.
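The first two points can be illustrated with a minimal sketch: the body of a former stored procedure expressed as an ordered list of plain SQL statements and run inside one transaction. The table names, columns, and sample rows are hypothetical, and the standard‑library sqlite3 module stands in for the database so the sketch runs anywhere. Against TiDB the same statements would be sent through a MySQL‑compatible driver, since TiDB speaks the MySQL protocol.

```python
import sqlite3

# Hypothetical steps standing in for the body of a former stored procedure.
# Each step is plain SQL, so no procedural dialect is needed.
REFRESH_STEPS = [
    # 1. Remove orders that were cancelled.
    "DELETE FROM order_fact WHERE status = 'cancelled'",
    # 2. Promote pending orders that have a payment timestamp.
    "UPDATE order_fact SET status = 'paid' "
    "WHERE status = 'pending' AND paid_at IS NOT NULL",
    # 3. Rebuild the daily aggregate from scratch.
    "DELETE FROM daily_sales",
    "INSERT INTO daily_sales (order_date, total) "
    "SELECT order_date, SUM(amount) FROM order_fact "
    "WHERE status = 'paid' GROUP BY order_date",
]


def run_refresh(conn):
    """Run every step in one transaction; roll back if any step fails."""
    cur = conn.cursor()
    try:
        for stmt in REFRESH_STEPS:
            cur.execute(stmt)
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def demo():
    """Build a tiny in-memory data set, run the refresh, return the aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE order_fact (
            order_id INTEGER PRIMARY KEY,
            order_date TEXT, status TEXT, amount REAL, paid_at TEXT
        );
        CREATE TABLE daily_sales (order_date TEXT PRIMARY KEY, total REAL);
        INSERT INTO order_fact VALUES
            (1, '2020-01-01', 'paid',      100.0, '2020-01-01 09:00'),
            (2, '2020-01-01', 'pending',    50.0, '2020-01-01 10:00'),
            (3, '2020-01-01', 'cancelled',  30.0, NULL),
            (4, '2020-01-02', 'paid',       20.0, '2020-01-02 08:00'),
            (5, '2020-01-02', 'pending',    70.0, NULL);
        """
    )
    run_refresh(conn)
    rows = conn.execute(
        "SELECT order_date, total FROM daily_sales ORDER BY order_date"
    ).fetchall()
    conn.close()
    return rows
```

Keeping the steps in one transaction means readers of daily_sales never see the table half rebuilt, which is the property the reconciliation logic would otherwise have to provide.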
TiSpark, an extension that integrates Spark with TiDB, was a decisive factor. Early TiDB versions lacked many analytical features; TiSpark lets analysts run complex Spark SQL queries directly on TiDB data, and planned write‑back support is expected to let a single script operate on both the Hadoop and TiDB clusters.
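As a rough sketch of how a Spark job is pointed at a TiDB cluster, the helper below builds the two Spark settings that TiSpark documents for this purpose: the SQL extension class and the list of PD (Placement Driver) endpoints. The function names and PD host names are hypothetical, the TiSpark jar is assumed to be on the Spark classpath already, and newer TiSpark releases need additional catalog settings, so the user guide for the version in use takes precedence over this sketch.

```python
def tispark_conf(pd_addresses):
    """Return the Spark settings that enable TiSpark for the given PD endpoints."""
    if not pd_addresses:
        raise ValueError("at least one PD address is required")
    return {
        # Registers TiSpark's extensions with Spark SQL.
        "spark.sql.extensions": "org.apache.spark.sql.TiExtensions",
        # Comma-separated host:port list of the cluster's PD nodes.
        "spark.tispark.pd.addresses": ",".join(pd_addresses),
    }


def build_session(app_name, pd_addresses):
    """Create a SparkSession with the TiSpark settings applied.

    pyspark is imported lazily so tispark_conf() stays usable without it.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in tispark_conf(pd_addresses).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session created this way, an analyst would issue ordinary `spark.sql("SELECT ...")` calls against tables that live in TiDB.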
TiFlash for AP/TP Isolation
In the initial TiDB deployment, transactional (TP) and analytical (AP) workloads interfered with each other, a common HTAP challenge. TiFlash, introduced in 2018, keeps a columnar replica of the data on nodes separate from the row‑based TP storage, so analytical scans no longer compete with transactional traffic for the same resources. This moves TiDB closer to true HTAP.
Performance and functional tests conducted in 2018 on representative workloads showed satisfactory latency and throughput. A subset of traffic has been moved to production, with broader rollout planned for the upcoming release.
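In current TiDB releases, this isolation is switched on per table and per session with two SQL statements. The sketch below only generates those statements; the helper names and the `dw.orders` table are hypothetical, and the syntax shown is that of recent TiDB versions rather than necessarily the build Yiguo tested.

```python
import re

# Conservative identifier check so generated SQL cannot be broken by odd names.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _quote(name):
    """Backtick-quote a schema or table name after validating it."""
    if not _IDENT.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def tiflash_replica_ddl(schema, table, replicas=1):
    """DDL that asks TiDB to maintain columnar TiFlash replicas of a table."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return (
        f"ALTER TABLE {_quote(schema)}.{_quote(table)} "
        f"SET TIFLASH REPLICA {int(replicas)}"
    )


def analytical_session_sql():
    """Session setting that keeps this session's table reads off TiKV.

    'tidb' stays in the list so internal memory tables remain readable.
    """
    return "SET SESSION tidb_isolation_read_engines = 'tiflash,tidb'"
```

An AP connection would run the session statement once after connecting, while TP connections keep the default engines and continue to read from row storage.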
ETL Comparison: Hadoop vs. TiDB
Hadoop ETL operates at the table level. Jobs are scheduled per table, which limits the impact on cluster resources and allows fine‑grained resource allocation. Unused tables have negligible effect on overall utilization.
TiDB ETL is driven by DM or Syncer, which replicate MySQL instances or entire databases into TiDB. This approach provides rapid data ingestion but can consume more resources when many tables are idle or when data quality varies, because replication runs at the instance/database granularity.
Both approaches benefit from a data‑cataloging component that assesses data usability, records business attributes, and integrates with data‑ingestion pipelines (e.g., OneData) to enforce governance and resource control.
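A minimal sketch of how such a catalog could gate ingestion follows. The entry fields, the 30‑day read counter, and the threshold are all assumptions made for illustration; the article does not describe the actual schema of Yiguo's catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One hypothetical catalog record for a source table."""
    table: str
    owner: str
    business_domain: str
    reads_last_30d: int   # how often downstream jobs or analysts used it
    quality_ok: bool      # result of whatever usability check the catalog runs


def tables_to_ingest(entries, min_reads=1):
    """Return the tables worth ingesting: passing quality checks and in use."""
    return sorted(
        e.table
        for e in entries
        if e.quality_ok and e.reads_last_30d >= min_reads
    )


EXAMPLE_CATALOG = [
    CatalogEntry("orders", "trade-team", "sales", 420, True),
    CatalogEntry("orders_bak_old", "trade-team", "sales", 0, True),
    CatalogEntry("clickstream_raw", "web-team", "traffic", 15, False),
]
```

For the example catalog, `tables_to_ingest(EXAMPLE_CATALOG)` keeps only `orders`. The resulting list could drive per‑table job scheduling on the Hadoop side, or be translated into whatever table filter the replication tool offers on the TiDB side, which narrows the granularity gap described above.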
OneService Integration
TiDB serves as a primary data source for OneService, Yiguo’s unified external API platform. OneService exposes TiDB data through RESTful endpoints, managing business attributes, owners, and versioning. Future enhancements include duplicate‑API detection to prevent redundant services.
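Duplicate‑API detection could be approached in several ways. One simple sketch, assuming each endpoint is backed by a single SQL query (an assumption, since the article does not say how OneService defines an API), is to normalize the query text and group endpoints by its hash. All names below are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def normalize_sql(sql):
    """Collapse whitespace, drop a trailing semicolon, and lowercase the text.

    Lowercasing also affects string literals, so this is a coarse heuristic:
    it flags candidates for review rather than proving two APIs identical.
    """
    trimmed = sql.strip().rstrip(";").strip()
    return re.sub(r"\s+", " ", trimmed).lower()


def fingerprint(sql):
    """Stable hash of the normalized query text."""
    return hashlib.sha256(normalize_sql(sql).encode("utf-8")).hexdigest()


def find_duplicates(apis):
    """Group API names whose backing queries share a fingerprint.

    `apis` maps an API name to its SQL text. Returns a sorted list of
    sorted name groups, containing only groups with more than one member.
    """
    groups = defaultdict(list)
    for name, sql in apis.items():
        groups[fingerprint(sql)].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

A check like this would run when a new API is registered, prompting the owner to reuse the existing endpoint instead of publishing a redundant one.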
Future Outlook
HTAP and NewSQL systems such as TiDB are converging with big‑data technologies, moving toward unified database platforms. Different stakeholder groups have distinct priorities:
Traditional DBAs focus on stability and performance.
Big‑data engineers additionally monitor task efficiency and resource occupancy.
Modeling engineers adjust data models based on analyst usage patterns.
Analysts require ease of use and accessibility.
As data‑middle‑platform concepts mature, finer‑grained data management, automated resource control, and enhanced security will evolve alongside TiDB’s ongoing feature improvements.
dbaplus Community
