
Hot and Cold Data Separation: Concepts, Scenarios, and Implementation Methods

The article explains the principle of hot‑cold data separation, when it should be applied, how to distinguish hot versus cold data, and three practical implementation approaches—code modification, binlog listening, and scheduled scanning—to improve database performance and maintain consistency.

Code Ape Tech Column

Sep 7, 2023


Hello everyone, I am Chen.

Regardless of how complex a business scenario is, the lifecycle of a piece of data comes down to its CRUD operations: Create, Read, Update, Delete. And like most things, a data record's value diminishes over time.

The value of data lies in how often it is used; different systems have different requirements for data of different ages.

For example, on platforms such as 12306 and Ctrip, users usually only care about orders from the last 30 days. Ctrip displays only 30 days of order information by default; older orders require a lookup by phone number.

Why does Ctrip do this?

If billions of orders per year all stayed in one store that had to support full CRUD, that store would keep growing and every operation on current orders would get slower. Yet once an order reaches its final state, it no longer needs modification or deletion, only queries.

Ctrip's architecture uses hot‑cold separation.

What Is Hot‑Cold Separation?

Hot‑cold separation divides the database into a hot store and a cold store. The hot store holds data that may still be modified; the cold store holds data that has reached its final state.

For instance, within 30 days of a ticket order, users may need to request refunds or invoices (operations requiring updates). Orders older than 30 days typically only need to be queried, so recent orders go to the hot store while older ones go to the cold store.

This introduces two concepts:

Hot data: frequently updated; requires low latency.

Cold data: rarely or never updated; occasional queries; latency not critical.

When Should Hot‑Cold Separation Be Used?

In large‑scale internet systems, consider hot‑cold separation when:

Primary business response latency is too high (e.g., slow order placement on 12306).

Data has reached a final state with no update needs, only read requirements.

Users can tolerate separate queries for new and old data (e.g., Ctrip’s phone‑number lookup for orders older than 30 days).

Note: Some systems perform hot‑cold separation internally without exposing it to users.

How to Determine Whether Data Is Hot or Cold?

Typically you classify data based on business fields such as order time (time dimension) or order status (status dimension). For example, data older than three months can be marked as cold, while recent data remains hot. You can also combine dimensions—for instance, orders older than three months and already completed are cold.

In short: Analyze according to your specific business needs.
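As a rough illustration of combining the two dimensions, the sketch below marks an order as cold only when it is both in a final status and older than a cutoff. The `Order` type, the status names, and the 90-day window are assumptions made for the example, not values from any real system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical rule values: adjust both to your own business needs.
FINAL_STATUSES = {"COMPLETED", "CANCELLED"}
COLD_AFTER = timedelta(days=90)  # roughly "three months"


@dataclass
class Order:
    order_id: int
    status: str
    created_at: datetime


def is_cold(order: Order, now: datetime) -> bool:
    """Combine the status dimension and the time dimension."""
    reached_final_state = order.status in FINAL_STATUSES
    old_enough = now - order.created_at > COLD_AFTER
    return reached_final_state and old_enough
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test and lets a batch job apply one consistent cutoff to every row.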

Two important notes:

If data is marked as cold, the application should no longer perform write operations on it.

Cold and hot data should not be required simultaneously for the same query.
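One way to make these two notes hard to violate is to enforce them in the data-access layer. The following is a minimal sketch that uses in-memory dictionaries in place of real databases; `OrderRouter`, `Store`, and `ColdDataWriteError` are hypothetical names introduced only for this example.

```python
from enum import Enum


class Store(Enum):
    HOT = "hot"
    COLD = "cold"


class ColdDataWriteError(Exception):
    """Raised when the application tries to modify cold data."""


class OrderRouter:
    def __init__(self):
        # Each dict stands in for one database: order_id -> order row.
        self.stores = {Store.HOT: {}, Store.COLD: {}}

    def query(self, store: Store, user_id: int) -> list:
        # A query targets exactly one store; results are never merged.
        rows = self.stores[store].values()
        return [row for row in rows if row["user_id"] == user_id]

    def update_status(self, order_id: int, status: str) -> None:
        # Writes are only allowed against the hot store.
        if order_id in self.stores[Store.COLD]:
            raise ColdDataWriteError(f"order {order_id} is cold and read-only")
        self.stores[Store.HOT][order_id]["status"] = status
```

The caller has to name the store it wants, which mirrors the "separate lookup for older orders" behavior described earlier.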

How to Implement Hot‑Cold Data Separation?

With the theory covered, here are three common implementation methods.

1. Modify Business Code

This approach changes the business code directly, so it is highly invasive. Because separation is triggered at the moment data is modified, it can classify by status but not by time.

When an order's status changes to the final state, the code marks it as cold data, writes it to the cold store, and then deletes it from the hot store.
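The flow might look roughly like the sketch below, which uses two SQLite connections to stand in for the hot and cold databases. The `orders` schema, the status names, and the `update_order_status` function are assumptions for illustration. The cold write is an idempotent upsert and happens before the hot delete, so a failure between the two steps leaves a duplicate that a retry can clean up rather than a lost row.

```python
import sqlite3

FINAL_STATUSES = {"COMPLETED", "CANCELLED"}  # assumed status names
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS orders "
    "(id INTEGER PRIMARY KEY, status TEXT NOT NULL)"
)


def update_order_status(hot, cold, order_id, new_status):
    """Update an order; move it to the cold store if the status is final."""
    with hot:  # commits on success, rolls back on error
        hot.execute(
            "UPDATE orders SET status = ? WHERE id = ?", (new_status, order_id)
        )
    if new_status not in FINAL_STATUSES:
        return
    # Step 1: write to the cold store first (safe to repeat).
    with cold:
        cold.execute(
            "INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)",
            (order_id, new_status),
        )
    # Step 2: only then remove the row from the hot store.
    with hot:
        hot.execute("DELETE FROM orders WHERE id = ?", (order_id,))
```

Note that the two stores are not updated in a single transaction here, which is exactly where the consistency problem comes from.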

2. Listen to Database Binlog

This method monitors the binlog to trigger separation, e.g., when an order status changes.

Because it is also triggered by data changes, it cannot classify by time, but it requires no changes to the business code.

Tools such as Alibaba's Canal and other open‑source middleware can be used. For MySQL, Canal is recommended. The original post links to a tutorial on integrating Canal with Spring Boot.
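The core of a binlog listener is a handler that inspects each row change and decides whether it should trigger a migration. The sketch below shows only that decision logic. The event dictionary shape (`table`, `type`, `before`, `after`) is a simplified, hypothetical format and not Canal's actual message structure, and plain dictionaries stand in for the hot and cold stores.

```python
FINAL_STATUSES = {"COMPLETED", "CANCELLED"}  # assumed status names


def handle_row_change(event, hot, cold):
    """Process one row-change event; return True if a row was archived."""
    # Only status updates on the orders table are of interest. The DELETE
    # that this handler itself causes in the hot store is ignored here.
    if event["table"] != "orders" or event["type"] != "UPDATE":
        return False
    before, after = event["before"], event["after"]
    became_final = (
        after["status"] in FINAL_STATUSES
        and before["status"] not in FINAL_STATUSES
    )
    if not became_final:
        return False
    # Upsert into cold, then delete from hot. Both steps are idempotent,
    # so a redelivered event does no harm.
    cold[after["id"]] = dict(after)
    hot.pop(after["id"], None)
    return True
```

Making the handler idempotent matters because a change-stream consumer may see the same event more than once after a restart or retry.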

[Figure: full process diagram of the binlog‑listening approach; image not available.]

3. Scheduled Scanning Tasks

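A scheduled job periodically scans the hot store for data that meets the cold criteria and migrates it. This approach can classify by time, is decoupled from business code, and is a good choice when cold data is defined by age.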
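A scanning job could be sketched as follows, again with SQLite connections standing in for the two stores. The schema, the `COMPLETED` status, the ISO-8601 text timestamps, and the `migrate_cold_orders` function are assumptions for the example. The job works in small batches so that no single transaction holds the hot store for long.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE IF NOT EXISTS orders ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, created_at TEXT NOT NULL)"
)


def migrate_cold_orders(hot, cold, cutoff, batch_size=500):
    """Move completed orders created before `cutoff` to the cold store.

    `cutoff` is an ISO-8601 string such as "2023-06-01". Returns the
    number of rows moved.
    """
    moved = 0
    while True:
        rows = hot.execute(
            "SELECT id, status, created_at FROM orders "
            "WHERE status = 'COMPLETED' AND created_at < ? "
            "ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        ).fetchall()
        if not rows:
            return moved
        # Copy to cold first (idempotent upsert), then delete from hot.
        with cold:
            cold.executemany(
                "INSERT OR REPLACE INTO orders (id, status, created_at) "
                "VALUES (?, ?, ?)",
                rows,
            )
        with hot:
            hot.executemany(
                "DELETE FROM orders WHERE id = ?", [(row[0],) for row in rows]
            )
        moved += len(rows)
```

In practice a function like this would be invoked by whatever scheduler the system already uses, for example a nightly cron job during off-peak hours.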

[Figure: process diagram of the scheduled‑scanning approach; image not available.]

Summary

Hot‑cold separation is an effective solution for read/write performance issues. While all three methods can achieve separation, data consistency between hot and cold stores remains the most challenging problem.

Ensuring consistency requires careful handling; many techniques exist but are beyond the scope of this article.



