How T‑TDSQL Enables Full‑Temporal Data Management in Distributed Databases
T‑TDSQL, a temporal extension of Tencent's TDSQL, introduces a full‑temporal data model, a scalable distributed architecture, an MVCC‑based history‑visibility algorithm, and rich temporal features that support financial auditing, data lineage, and high‑performance analytics over massive historical datasets.
Abstract
T‑TDSQL is a distributed transactional database built on MySQL/InnoDB that adds full‑temporal capabilities. It stores both the effective‑time (valid‑time) and transaction‑time dimensions, supports multi‑version transaction management with strong consistency, and provides efficient point‑in‑time and incremental historical queries.
Introduction
Motivation
Financial billing systems at large internet companies process billions of transactions daily. Auditing, reconciliation, and regulatory reporting require access to the complete change history of account records, which is impossible when only current values are retained.
Overall Architecture
TDSQL uses sharding to map logical tables to physical shards. Each shard (a SET) is replicated across multiple nodes with strong synchronization, coordinated via ZooKeeper. Transactions that span shards use distributed XA/two‑phase commit (2PC) for atomicity. The Temporal Storage Interface (TSI) extends the storage layer to manage massive historical data and exposes unified APIs for both current and historical queries.
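As a rough illustration of the two mechanisms named above, the following Python sketch routes a shard key to one of a fixed number of SETs. It then runs a simplified two‑phase commit across the participating SETs. Everything here is hypothetical: the CRC32 routing, NUM_SETS, Participant, and two_phase_commit are stand‑ins and not TDSQL's API. Real XA also needs durable prepare logs, timeouts, and coordinator recovery, all of which the sketch omits.

```python
import zlib
from dataclasses import dataclass, field

NUM_SETS = 4  # hypothetical number of shards (SETs)


def shard_for(key: str, num_sets: int = NUM_SETS) -> int:
    """Map a shard key to a SET index with a stable hash (assumed routing, not TDSQL's)."""
    return zlib.crc32(key.encode("utf-8")) % num_sets


@dataclass
class Participant:
    """A toy SET that can vote in two-phase commit."""
    name: str
    can_commit: bool = True
    staged: list = field(default_factory=list)
    committed: list = field(default_factory=list)

    def prepare(self, writes):
        # Phase 1: stage the writes and vote yes (True) or no (False).
        if not self.can_commit:
            return False
        self.staged = list(writes)
        return True

    def commit(self):
        # Phase 2 (success): make the staged writes visible.
        self.committed.extend(self.staged)
        self.staged = []

    def rollback(self):
        # Phase 2 (abort): discard the staged writes.
        self.staged = []


def two_phase_commit(txn):
    """txn is a list of (participant, writes); commit only if every participant votes yes."""
    votes = [p.prepare(writes) for p, writes in txn]
    if all(votes):
        for p, _ in txn:
            p.commit()
        return True
    for p, _ in txn:
        p.rollback()
    return False
```

The all‑or‑nothing decision after the prepare phase is what gives a cross‑SET transaction its atomicity. A single "no" vote rolls back every participant.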
Temporal Requirements
Regulators and auditors often need to retrieve all changes to a customer's account over several years. Storing only current rows forces expensive log‑scanning and filtering.
Key Temporal Features
Full‑temporal data model that records the entire lifecycle of each row (creation, updates, deletions).
Row‑based and column‑based storage formats for historical data.
Cluster‑wide backup, recovery, and tiered storage for massive historical datasets.
MVCC‑based history‑visibility algorithm that enables non‑blocking point‑in‑time queries and incremental extraction.
Support for both effective‑time and transaction‑time queries, enabling audit trails, data lineage, and HTAP workloads.
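The two time dimensions can be illustrated with a small bitemporal model. In this hypothetical Python sketch, each version carries a half‑open effective‑time (valid‑time) interval and a half‑open transaction‑time interval, with date.max marking an open end. These conventions are assumptions for illustration, not T‑TDSQL's storage format. The sample history records a balance of 100, followed by a correction entered on 2022-03-01 stating that the balance has been 150 since 2022-02-01.

```python
from dataclasses import dataclass
from datetime import date

OPEN = date.max  # assumed sentinel for "no end yet"


@dataclass(frozen=True)
class Version:
    key: str
    value: int
    valid_from: date  # effective time: when the fact holds in the real world
    valid_to: date
    tx_from: date     # transaction time: when the database recorded the fact
    tx_to: date


def as_of(versions, key, valid_at, tx_at):
    """Value of `key` effective at `valid_at`, as the database knew it at `tx_at`."""
    for v in versions:
        if (v.key == key
                and v.valid_from <= valid_at < v.valid_to
                and v.tx_from <= tx_at < v.tx_to):
            return v.value
    return None


history = [
    # Original belief, superseded in transaction time by the correction on 2022-03-01.
    Version("acct", 100, date(2022, 1, 1), OPEN, date(2022, 1, 1), date(2022, 3, 1)),
    # After the correction: 100 until 2022-02-01, then 150.
    Version("acct", 100, date(2022, 1, 1), date(2022, 2, 1), date(2022, 3, 1), OPEN),
    Version("acct", 150, date(2022, 2, 1), OPEN, date(2022, 3, 1), OPEN),
]
```

Querying 2022-02-15 "as known on 2022-02-20" yields 100, while the same effective date "as known on 2022-04-01" yields 150. Distinguishing these two answers is the kind of audit question that transaction time exists to answer.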
Core Technology
The system retains old row versions via MVCC. Before garbage collection, versions slated for removal are captured and stored, forming a "history data visibility" layer. This layer applies set‑difference operations to compute snapshot differences efficiently: the symmetric difference of two snapshots contains exactly the rows that changed between them. This allows:
Retrieval of any historical version without scanning the entire table.
Identification of inserted, updated, and deleted rows between two timestamps.
Incremental data extraction for downstream ETL or analytical pipelines.
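To make the set‑difference idea concrete, here is a minimal Python sketch that compares two snapshots. Each snapshot is modeled as a hypothetical dict from primary key to an immutable row tuple. The symmetric difference of the (key, row) pairs yields every row that changed. Checking which snapshot contains each changed key then classifies it as inserted, updated, or deleted. The sketch illustrates the principle only and is not T‑TDSQL's actual implementation.

```python
def snapshot_diff(old, new):
    """Classify changes between two snapshots (dicts: primary key -> row tuple)."""
    old_items = set(old.items())
    new_items = set(new.items())
    changed = old_items ^ new_items  # symmetric difference: (key, row) pairs in only one snapshot
    changed_keys = {k for k, _ in changed}
    inserted = sorted(k for k in changed_keys if k not in old)
    deleted = sorted(k for k in changed_keys if k not in new)
    updated = sorted(k for k in changed_keys if k in old and k in new)
    return inserted, updated, deleted
```

For example, if key 2 disappears, key 4 appears, and key 1 changes its balance, the function returns ([4], [1], [2]). Unchanged rows cancel out of the symmetric difference and are never examined further.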
Two storage formats for historical data, row‑oriented and column‑oriented, support OLAP on temporal data. Integration with Spark, including push‑down of query operations, together with columnar processing further accelerates analytical queries.
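This toy Python sketch shows why a column‑oriented layout helps analytics. Once rows are pivoted into one list per column, an aggregate reads only the columns it needs and applies its filter during the scan, loosely mirroring predicate push‑down. The helper names and sample rows are hypothetical; this is not the Spark or T‑TDSQL API.

```python
# Hypothetical historical rows: (account, ts, amount)
rows = [
    ("acct-1", "2022-01-05", 100),
    ("acct-2", "2022-02-10", 40),
    ("acct-1", "2022-03-01", -30),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def sum_where(columns, value_col, filter_col, predicate):
    """Sum value_col where filter_col satisfies predicate, reading only those two columns."""
    return sum(v for v, f in zip(columns[value_col], columns[filter_col]) if predicate(f))


columns = to_columns(rows, ("account", "ts", "amount"))
total = sum_where(columns, "amount", "ts", lambda ts: ts >= "2022-02-01")  # 40 + (-30)
```

In a real column store, each column is stored and compressed separately, so skipping the "account" column saves I/O rather than just a Python list lookup.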
Problems Addressed by T‑TDSQL
Eliminates the need for external log‑parsing pipelines by exposing historical data through standard SQL.
Provides a unified temporal model, removing logical gaps caused by time‑partitioned tables.
Supports real‑time analytics on historical data without costly data export/import.
Offers scalable management of petabyte‑scale historical datasets via distributed storage and hot‑cold tiering.
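Hot‑cold tiering is usually driven by data age. The Python sketch below assumes a hypothetical policy: current rows and versions superseded within the last hot_days days stay on hot storage, and older versions move to a cold tier. The threshold and the (key, tx_to) record shape are illustrative assumptions, not T‑TDSQL's actual policy.

```python
from datetime import date, timedelta


def tier_versions(versions, today, hot_days=90):
    """Split (key, tx_to) records into hot and cold tiers; tx_to is None for current rows."""
    cutoff = today - timedelta(days=hot_days)
    hot, cold = [], []
    for key, tx_to in versions:
        if tx_to is None or tx_to >= cutoff:
            hot.append((key, tx_to))   # current or recently superseded: keep on fast storage
        else:
            cold.append((key, tx_to))  # superseded long ago: candidate for cheaper storage
    return hot, cold


hot, cold = tier_versions(
    [("a", None), ("b", date(2022, 12, 1)), ("c", date(2022, 1, 1))],
    today=date(2023, 1, 1),
)
```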
Feature Comparison (TDSQL vs. T‑TDSQL)
All native transactional and distributed features of TDSQL are retained.
Effective‑time applications (e.g., contract management, archival) are supported only in T‑TDSQL.
Transaction‑time tracking, incremental extraction, incremental computation, trajectory data management, and historical‑value queries are enabled in T‑TDSQL.
Row‑store and column‑store historical storage, hot‑cold tiering, HTAP, data correction, and replay are provided by T‑TDSQL.
Implementation Highlights
Historical versions are captured before MVCC garbage collection and stored in dedicated temporal tables.
Visibility checks use index‑based range scans combined with the set‑difference algorithm to return only rows visible at a given effective or transaction time.
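The capture‑before‑GC step and the point‑in‑time visibility check can be modeled in a few lines of Python. This is a minimal sketch with several assumptions:

- a single in‑memory store;
- integer timestamps that only increase;
- half‑open [begin_ts, end_ts) validity intervals;
- a sorted list searched with bisect standing in for an index range scan.

The TemporalStore class and its methods are hypothetical and do not describe T‑TDSQL's internals.

```python
import bisect
from collections import defaultdict


class TemporalStore:
    """Toy model: versions removed by MVCC garbage collection are archived instead of discarded."""

    def __init__(self):
        # key -> [(begin_ts, end_ts, value)]; end_ts is None for the current version
        self.live = defaultdict(list)
        # key -> archived versions kept sorted by begin_ts (stands in for a history-table index)
        self.history = defaultdict(list)

    def write(self, key, value, ts):
        """Close the current version at ts and append a new one (ts assumed increasing)."""
        chain = self.live[key]
        if chain and chain[-1][1] is None:
            begin, _, old_value = chain[-1]
            chain[-1] = (begin, ts, old_value)
        chain.append((ts, None, value))

    def purge(self, oldest_read_ts):
        """GC step: versions that ended at or before the oldest active read move to history."""
        for key, chain in self.live.items():
            keep = []
            for version in chain:
                end_ts = version[1]
                if end_ts is not None and end_ts <= oldest_read_ts:
                    bisect.insort(self.history[key], version)  # capture before removal
                else:
                    keep.append(version)
            self.live[key] = keep

    def value_at(self, key, ts):
        """Visibility check: return the value whose [begin_ts, end_ts) interval contains ts."""
        # Archived versions all precede live ones, so the concatenation stays sorted by begin_ts.
        versions = self.history.get(key, []) + self.live.get(key, [])
        begins = [v[0] for v in versions]
        i = bisect.bisect_right(begins, ts) - 1  # last version that began at or before ts
        if i >= 0:
            _, end_ts, value = versions[i]
            if end_ts is None or ts < end_ts:
                return value
        return None
```

Because purged versions are archived rather than dropped, value_at still answers queries for timestamps older than the oldest active read view.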
A query that filters versions by a timestamp range and by effective time can be expressed as:
SELECT * FROM account_history
WHERE ts BETWEEN '2022-01-01' AND '2022-12-31'
  AND VALID_TIME CONTAINS '2022-06-15';
Incremental joins and aggregations are performed directly on temporal tables, avoiding full‑table scans.
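Readers without T‑TDSQL at hand can approximate the query above in standard SQL by storing the effective‑time period as two explicit columns. The sketch below uses Python's built‑in sqlite3 module. The valid_from/valid_to columns, the half‑open period convention, the '9999-12-31' open‑end sentinel, and the sample rows are all assumptions. The ts column holds date‑only strings so that BETWEEN compares correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account_history (
        account_id TEXT,
        balance    INTEGER,
        ts         TEXT,  -- timestamp of the recorded version
        valid_from TEXT,  -- effective-time period start (inclusive)
        valid_to   TEXT   -- effective-time period end (exclusive)
    )""")
conn.executemany(
    "INSERT INTO account_history VALUES (?, ?, ?, ?, ?)",
    [
        ("A1", 100, "2022-01-10", "2022-01-01", "2022-07-01"),
        ("A1", 130, "2022-06-30", "2022-07-01", "9999-12-31"),
        ("A2", 500, "2021-12-20", "2021-12-01", "9999-12-31"),
        ("A3", 70, "2022-05-02", "2022-05-01", "2022-06-01"),
    ],
)
# "VALID_TIME CONTAINS d" becomes valid_from <= d < valid_to under the half-open convention.
rows = conn.execute("""
    SELECT account_id, balance FROM account_history
    WHERE ts BETWEEN '2022-01-01' AND '2022-12-31'
      AND valid_from <= '2022-06-15' AND '2022-06-15' < valid_to
    ORDER BY account_id""").fetchall()
print(rows)  # -> [('A1', 100)]
```

A2 is excluded by the timestamp range. The second A1 version and A3 are excluded because their effective periods do not contain 2022-06-15.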
Conclusion
T‑TDSQL demonstrates that a distributed transactional database can natively manage, query, and compute over massive historical datasets while preserving strong consistency and high performance. By leveraging MVCC and a set‑difference‑based visibility algorithm, it provides low‑latency, non‑blocking access to any point in the data's lifetime, enabling efficient auditing, compliance, and analytical workloads.
This article has been distilled and summarized from source material, then republished for learning and reference.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
