Hot and Cold Data Separation: Concepts, Scenarios, and Implementation Methods
The article explains the principle of hot‑cold data separation, when it should be applied, how to distinguish hot versus cold data, and three practical implementation approaches—code modification, binlog listening, and scheduled scanning—to improve database performance and maintain consistency.
Hello everyone, I am Chen.
Regardless of how complex a business scenario is, the lifecycle of a piece of data is reflected in its CRUD operations—Create, Read, Update, Delete. Like human life, a data record’s value diminishes over time.
The value of data lies in how often it is used; different systems have different requirements for data of different ages.
For example, on platforms such as 12306 and Ctrip, users usually only care about orders within the last 30 days, and Ctrip keeps only 30 days of order information by default, requiring a phone‑number lookup for older orders.
Why does Ctrip do this?
If billions of orders per year all stayed in the same store that handles creates, updates, and deletes, that store would keep growing and the system would eventually buckle under the load. Once an order reaches its final state, it no longer needs modification or deletion, only queries.
Ctrip’s architecture uses hot‑cold separation.
What Is Hot‑Cold Separation?
Hot‑cold separation divides the database into a hot store and a cold store. The hot store holds data that may still be modified; the cold store holds data that has reached its final state.
For instance, within 30 days of a ticket order, users may need to request refunds or invoices (operations requiring updates). Orders older than 30 days typically only need to be queried, so recent orders go to the hot store while older ones go to the cold store.
This introduces two concepts:
Hot data: frequently updated; requires low latency.
Cold data: rarely or never updated; queried only occasionally; latency is not critical.
When Should Hot‑Cold Separation Be Used?
In large‑scale internet systems, consider hot‑cold separation when:
Primary business response latency is too high (e.g., slow order placement on 12306).
Data has reached a final state with no update needs, only read requirements.
Users can tolerate separate queries for new and old data (e.g., Ctrip’s phone‑number lookup for orders older than 30 days).
Supplement: Some systems perform hot‑cold separation internally without exposing it to users.
How to Determine Whether Data Is Hot or Cold?
Typically you classify data based on business fields such as order time (time dimension) or order status (status dimension). For example, data older than three months can be marked as cold, while recent data remains hot. You can also combine dimensions—for instance, orders older than three months and already completed are cold.
In short: Analyze according to your specific business needs.
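As a minimal sketch of the combined rule above (older than three months and already completed), the check might look like the following. The 90-day threshold, the status names, and the `is_cold` function are assumptions made for illustration; the real rule depends on your business.

```python
from datetime import datetime, timedelta

# Assumed values for illustration only; pick the real rule from business needs.
COLD_AGE = timedelta(days=90)                 # "three months", approximated as 90 days
FINAL_STATUSES = {"COMPLETED", "CANCELLED"}   # hypothetical final states


def is_cold(order_time: datetime, status: str, now: datetime) -> bool:
    """Return True only when the order is old enough AND in a final state."""
    old_enough = now - order_time > COLD_AGE   # time dimension
    finished = status in FINAL_STATUSES        # status dimension
    return old_enough and finished


now = datetime(2024, 6, 1)
print(is_cold(datetime(2024, 1, 1), "COMPLETED", now))   # True: old and finished
print(is_cold(datetime(2024, 1, 1), "PAID", now))        # False: old but still modifiable
print(is_cold(datetime(2024, 5, 20), "COMPLETED", now))  # False: finished but recent
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test.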
Two important notes:
If data is marked as cold, the application should no longer perform write operations on it.
Cold and hot data should not be required simultaneously for the same query.
How to Implement Hot‑Cold Data Separation?
With the theory covered, here are three common implementation methods.
1. Modify Business Code
This approach changes the business code directly, so it is highly invasive. Separation is triggered at the moment data is modified, which means it can react to a status change but cannot classify data by age alone.
When an order’s status changes to the final state, the code marks it as cold data, writes it to the cold store, and deletes it from the hot store.
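The flow above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not a production design: the hot and cold stores are two tables in one in-memory SQLite database so that a single transaction covers both steps, and the table names, the `COMPLETED` status, and `update_order_status` are hypothetical.

```python
import sqlite3

FINAL_STATUS = "COMPLETED"  # hypothetical final state


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one hot table and one cold table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hot_orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE cold_orders (id INTEGER PRIMARY KEY, status TEXT)")
    return conn


def update_order_status(conn: sqlite3.Connection, order_id: int, new_status: str) -> None:
    """Business-code hook: update the order, and archive it if it became final."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "UPDATE hot_orders SET status = ? WHERE id = ?", (new_status, order_id)
        )
        if new_status == FINAL_STATUS:
            # Copy first, delete second; OR REPLACE makes a retried copy harmless.
            conn.execute(
                "INSERT OR REPLACE INTO cold_orders (id, status) "
                "SELECT id, status FROM hot_orders WHERE id = ?",
                (order_id,),
            )
            conn.execute("DELETE FROM hot_orders WHERE id = ?", (order_id,))


conn = open_demo_db()
conn.execute("INSERT INTO hot_orders VALUES (1, 'PAID')")
update_order_status(conn, 1, "COMPLETED")
print(conn.execute("SELECT * FROM hot_orders").fetchall())   # hot store after archiving
print(conn.execute("SELECT * FROM cold_orders").fetchall())  # cold store after archiving
```

When the hot and cold stores are separate databases, one local transaction no longer covers both writes. Copying before deleting and making the copy idempotent is one common way to keep a failed run retryable.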
2. Listen to Database Binlog
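This method monitors the database binlog and triggers separation when a relevant change appears, for example when an order’s status changes.
Because it only reacts to changes, it cannot classify data by age, but it requires no changes to the business code.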
Tools such as Alibaba’s Canal and other open‑source middleware can be used. For MySQL, Canal is recommended. See the linked tutorial for integration with Spring Boot.
[Figure: full process diagram for the binlog-listening approach]
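To make the idea concrete, here is a rough sketch of the consumer side of such a listener. It does not use Canal's actual client API. The event dictionaries (`table`, `type`, `before`, `after`) are a made-up, simplified shape for a row-change event, and `should_archive`, `handle_events`, and the `COMPLETED` status are hypothetical names.

```python
from typing import Callable, Iterable, List

FINAL_STATUS = "COMPLETED"  # hypothetical final state


def should_archive(event: dict) -> bool:
    """True when an UPDATE on the orders table moved a row into the final state."""
    if event.get("table") != "orders" or event.get("type") != "UPDATE":
        return False
    before = event.get("before") or {}
    after = event.get("after") or {}
    return before.get("status") != FINAL_STATUS and after.get("status") == FINAL_STATUS


def handle_events(events: Iterable[dict], archive: Callable[[int], None]) -> List[int]:
    """Call archive(order_id) for each qualifying event; return the archived ids."""
    archived = []
    for event in events:
        if should_archive(event):
            order_id = event["after"]["id"]
            archive(order_id)  # e.g. copy to the cold store, then delete from hot
            archived.append(order_id)
    return archived


events = [
    {"table": "orders", "type": "UPDATE",
     "before": {"id": 1, "status": "PAID"}, "after": {"id": 1, "status": "COMPLETED"}},
    {"table": "orders", "type": "UPDATE",
     "before": {"id": 2, "status": "PAID"}, "after": {"id": 2, "status": "REFUNDING"}},
    {"table": "users", "type": "UPDATE",
     "before": {"id": 3, "status": "PAID"}, "after": {"id": 3, "status": "COMPLETED"}},
]
print(handle_events(events, archive=lambda order_id: None))  # [1]
```

The business code never calls `handle_events`; only the listener process does, which is what keeps this approach non-intrusive.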
3. Scheduled Scanning Tasks
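This approach runs a scheduled job that periodically scans the hot store and moves records that meet the cold criteria. It supports time‑based criteria and is decoupled from the business code, which makes it a good choice when data is classified by age.
[Figure: process diagram for the scheduled-scanning approach]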
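One pass of such a scan might look like the sketch below. As before, the hot and cold stores are assumed to be two tables in one in-memory SQLite database, and the 90-day cutoff, the batch size, and the names `migrate_batch` and `run_scan` are illustrative assumptions. In practice a scheduler (cron, a job framework, and so on) would call `run_scan` periodically.

```python
import sqlite3
from datetime import datetime, timedelta

COLD_AGE_DAYS = 90  # assumed threshold
BATCH_SIZE = 2      # kept tiny for the demo; real jobs would use larger batches


def open_scan_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hot table and a cold table."""
    conn = sqlite3.connect(":memory:")
    for table in ("hot_orders", "cold_orders"):
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, created_at TEXT)")
    return conn


def migrate_batch(conn: sqlite3.Connection, now: datetime) -> int:
    """Move up to BATCH_SIZE rows older than the cutoff; return how many moved."""
    cutoff = (now - timedelta(days=COLD_AGE_DAYS)).isoformat()
    rows = conn.execute(
        "SELECT id, created_at FROM hot_orders WHERE created_at < ? ORDER BY id LIMIT ?",
        (cutoff, BATCH_SIZE),
    ).fetchall()
    if not rows:
        return 0
    with conn:  # one transaction per batch
        # Copy first (idempotent thanks to OR REPLACE), then delete from hot.
        conn.executemany("INSERT OR REPLACE INTO cold_orders VALUES (?, ?)", rows)
        conn.executemany("DELETE FROM hot_orders WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)


def run_scan(conn: sqlite3.Connection, now: datetime) -> int:
    """Run batches until nothing is left to move; return the total moved."""
    total = 0
    while (moved := migrate_batch(conn, now)) > 0:
        total += moved
    return total


conn = open_scan_demo_db()
now = datetime(2024, 6, 1)
ages_in_days = [200, 150, 100, 30, 1]  # three old orders, two recent ones
for i, age in enumerate(ages_in_days, start=1):
    created = (now - timedelta(days=age)).isoformat()
    conn.execute("INSERT INTO hot_orders VALUES (?, ?)", (i, created))
print(run_scan(conn, now))  # 3
```

Working in small batches keeps each transaction short, so the scan competes less with live traffic on the hot store. A status condition could be added to the `WHERE` clause if age alone is not enough to call a record cold.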
Summary
Hot‑cold separation is an effective solution for read/write performance issues. While all three methods can achieve separation, data consistency between hot and cold stores remains the most challenging problem.
Ensuring consistency requires careful handling; many techniques exist but are beyond the scope of this article.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn