Databases 11 min read

Hot and Cold Data Architecture: From MySQL to NewSQL and Data Governance

This article examines the evolution of database architectures, introduces the hot‑data and cold‑data concepts for a weather‑data service, and details a practical implementation using MySQL, TiDB, and PolarDB with synchronization, expiration procedures, and suggestions for future improvements.

Architecture Digest
Architecture Digest
Architecture Digest
Hot and Cold Data Architecture: From MySQL to NewSQL and Data Governance

As a startup providing commercial weather data, we recognized the need for systematic data planning and governance, especially as simple single‑instance databases become insufficient with growing data volumes.

Database Evolution Overview : The growth of internet data has driven databases through stages from single‑table designs to master‑slave replication, sharding, NoSQL, and finally NewSQL, which combines traditional protocols with high performance and horizontal scalability.

Hot Data vs. Cold Data : We define “hot data” as immediately distributable data stored in fast media (e.g., Redis) for real‑time API consumption, while “cold data” refers to long‑term stored data used for internal analysis, modeling, and archival, without affecting hot‑data service quality.

In practice, hot data is written to Redis and also persisted to MySQL; however, as row counts reach billions, MySQL struggles with performance and stability.

Offline Data Center Implementation : To offload historical data, we evaluated NewSQL solutions and selected Alibaba Cloud PolarDB as a cost‑effective offline store, while keeping compatibility with existing MySQL client protocols.

Data Synchronization and Expiration : We adopted a binlog‑based sync that replicates inserts and updates from the online MySQL cluster to the offline store but filters out DELETE statements. TiDB’s Syncer can be configured for this purpose:

[[skip-dmls]]
 db-name = "weather_data"
 tbl-name = "weather_now_history"
 type = "delete"

Data expiration is handled via MySQL EVENT and PROCEDURE mechanisms. The following procedure deletes rows older than a configurable number of days:

CREATE DEFINER=`weather`@`%` PROCEDURE `weather_data`.`del_old_data`(IN `date_inter` int)
BEGIN
  delete from weather_data.weather_now_history where datetime < date_sub(curdate(), interval date_inter day);
END

An event triggers this procedure daily:

CREATE EVENT del_old_data
ON SCHEDULE EVERY 1 DAY
STARTS '2018-12-25 10:08:35.000'
ON COMPLETION PRESERVE
ENABLE
DO call del_old_data(30);

Continuous Improvement : Recognizing that hot/cold boundaries can shift, we propose a publish/subscribe layer where the data acquisition service publishes to a message queue, and storage services (hot or cold) subscribe as needed, enabling flexible data routing without duplicate fetching.

We also note that PolarDB may eventually be insufficient for long‑term archival, suggesting column‑store NoSQL solutions for low‑cost, infrequently accessed data.

References : TiDB usage in Ele.me archiving environment.

MySQLTiDBNewSQLdatabasesdata governanceCold DataPolarDBhot data
Architecture Digest
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.