Preventing Cache Penetration and Syncing Redis with MySQL at Massive Scale
This article explains why cache penetration can cripple ultra‑large systems, how storing full data in Redis eliminates the risk, and presents two reliable cache‑update strategies—message‑queue‑driven updates and real‑time MySQL Binlog subscription via Canal—complete with configuration steps and Java code examples.
Cache Penetration in Ultra‑Large Systems
When request volume is massive, even a tiny fraction of reads that bypass the cache can overload the database and cause a cascading outage. Scaling Redis clusters alone does not solve this problem if cache misses still reach the DB.
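To make the failure mode concrete, here is a minimal sketch of a conventional cache-aside read path. It is an illustration only: the class name CacheAsideReader is hypothetical, and plain in-memory maps stand in for Redis and MySQL. The point is that every miss, including repeated reads of a key that exists nowhere, becomes a database read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration: maps stand in for Redis and MySQL.
class CacheAsideReader {
    final Map<String, String> cache = new HashMap<>();    // stand-in for Redis
    final Map<String, String> database = new HashMap<>(); // stand-in for MySQL
    final AtomicLong dbReads = new AtomicLong();           // reads that reached the DB

    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;             // cache hit: the DB is not touched
        }
        dbReads.incrementAndGet();     // cache miss: the request falls through to the DB
        String value = database.get(key);
        if (value != null) {
            cache.put(key, value);     // populate the cache for later reads
        }
        return value;                  // null when the key exists nowhere
    }
}
```

In this sketch a key that is absent from the database is never cached, so each read of it reaches the database again. At very high request rates that small stream of misses is what the article warns about.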
Solution 1 – Cache the Full Dataset in Redis
If memory cost is acceptable, load the entire data set into a Redis cluster. All read operations hit Redis, eliminating cache‑penetration risk. The remaining challenge is keeping Redis consistent with the source database.
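A minimal sketch of the full-cache read path, under the same assumptions as before (the class name FullCacheReader is hypothetical and a map stands in for the Redis cluster). The difference from cache-aside is that a miss is treated as "no such record" and is never forwarded to the database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a map stands in for the Redis cluster.
class FullCacheReader {
    private final Map<String, String> cache = new HashMap<>();

    // One-off warm-up: copy every row from the source database into the cache.
    void preload(Map<String, String> allRows) {
        cache.putAll(allRows);
    }

    // Reads consult the cache only; there is no fallback query to the database.
    String read(String key) {
        return cache.get(key); // null means the record does not exist
    }

    int size() {
        return cache.size();
    }
}
```

This only works if the cache really does hold every record, which is why the synchronization mechanism matters so much in this design.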
Solution 2 – Update Cache via Message Queue
For services that already emit change events (e.g., an order service), run a dedicated cache‑update service that subscribes to the MQ, consumes order‑change messages, and refreshes the corresponding Redis entries. This adds negligible development overhead and does not require changes to the core service.
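The shape of such a cache-update service can be sketched as follows. Everything here is an assumption made for illustration: OrderChangeEvent and CacheUpdateService are hypothetical names, a BlockingQueue stands in for the MQ topic, a map stands in for Redis, and the "order:" key prefix is invented.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical change event emitted by the order service.
class OrderChangeEvent {
    final String orderId;
    final String payload; // serialized order; null means the order was deleted

    OrderChangeEvent(String orderId, String payload) {
        this.orderId = orderId;
        this.payload = payload;
    }
}

// Hypothetical cache-update service: queue stands in for the MQ, map for Redis.
class CacheUpdateService {
    final BlockingQueue<OrderChangeEvent> queue = new LinkedBlockingQueue<>();
    final Map<String, String> cache = new ConcurrentHashMap<>();

    // Apply one change event to the cache.
    void apply(OrderChangeEvent event) {
        String key = "order:" + event.orderId;
        if (event.payload == null) {
            cache.remove(key);
        } else {
            cache.put(key, event.payload);
        }
    }

    // Apply every event currently queued; returns how many were applied.
    int drain() {
        int applied = 0;
        OrderChangeEvent event;
        while ((event = queue.poll()) != null) {
            apply(event);
            applied++;
        }
        return applied;
    }
}
```

A real consumer would block on the broker's client API rather than poll a local queue, and would acknowledge a message only after the cache write succeeds.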
Solution 3 – Real‑Time Cache Updates via MySQL Binlog (Canal)
If no change‑event stream exists, a cache‑update service can act as a MySQL slave, read the binary log (Binlog), parse entries, and update Redis asynchronously. This approach is more generic but requires Binlog parsing.
Setting Up Canal for Binlog Subscription
Download and extract Canal 1.1.4
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
tar zvfx canal.deployer-1.1.4.tar.gz
Enable Binlog in MySQL
[mysqld]
log-bin=mysql-bin
binlog-format=ROW
server_id=1
Create a Canal user with replication privileges
CREATE USER canal IDENTIFIED BY 'canal';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
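FLUSH PRIVILEGES;
Record the current Binlog file and position (for example, from SHOW MASTER STATUS), then configure instance.properties. The journal name and position below are example values; use the ones your server reports.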
canal.instance.gtidon=false
canal.instance.master.address=127.0.0.1:3306
canal.instance.master.journal.name=binlog.000009
canal.instance.master.position=155
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset=UTF-8
canal.instance.defaultDatabaseName=test
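canal.instance.filter.regex=.*\\..*
Start Canal
canal/bin/startup.sh
Canal opens port 11111 for clients. A client pulls a batch of Binlog entries, applies it to Redis, and acknowledges the batch. If processing fails, the client rolls back and Canal redelivers the same batch, so changes are applied in order and none are skipped. Because a batch can be delivered more than once, the Redis updates must be idempotent, which plain SET and DEL of a row's latest state are.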
Java Example: Updating an Account Balance Cache
The program continuously pulls messages from Canal, processes each entry, and updates Redis accordingly.
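while (true) {
    // Pull up to batchSize entries without acknowledging them yet.
    Message message = connector.getWithoutAck(batchSize);
    long batchId = message.getId();
    try {
        int size = message.getEntries().size();
        if (batchId == -1 || size == 0) {
            // Nothing new in the Binlog: wait before polling again.
            Thread.sleep(1000);
        } else {
            processEntries(message.getEntries(), jedis);
        }
        // Confirm the batch only after Redis has been updated.
        connector.ack(batchId);
    } catch (Throwable t) {
        // Processing failed: roll back so Canal redelivers this batch.
        connector.rollback(batchId);
    }
}
Processing logic for INSERT, UPDATE, DELETE events: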
for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
    if (eventType == CanalEntry.EventType.DELETE) {
        // Row removed: delete the key built from the pre-delete column values.
        jedis.del(row2Key("user_id", rowData.getBeforeColumnsList()));
    } else if (eventType == CanalEntry.EventType.INSERT) {
        // New row: cache the post-insert column values.
        jedis.set(row2Key("user_id", rowData.getAfterColumnsList()),
                  row2Value(rowData.getAfterColumnsList()));
    } else { // UPDATE
        // Changed row: overwrite the entry with the post-update column values.
        jedis.set(row2Key("user_id", rowData.getAfterColumnsList()),
                  row2Value(rowData.getAfterColumnsList()));
    }
}
Test the flow by inserting a record into account_balance and verifying the Redis entry.
INSERT INTO account_balance VALUES (888, 100, NOW(), 999);
127.0.0.1:6379> GET 888
{"log_id":"999","balance":"100","user_id":"888","timestamp":"2020-03-08 16:18:10"}
Full example code is available at https://github.com/liyue2008/canal-to-redis-example
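The processing loop calls row2Key and row2Value without showing them. The following is a guess at their shape, not the code from the linked repository: Column is a hypothetical stand-in for CanalEntry.Column (which exposes getName() and getValue()), and the JSON is built by hand with only quote and backslash escaping, so a real implementation would more likely use a JSON library.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for CanalEntry.Column, reduced to name and value.
class Column {
    private final String name;
    private final String value;

    Column(String name, String value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }
    String getValue() { return value; }
}

class RowMapper {
    // Use the value of the key column (e.g. user_id) as the Redis key.
    static String row2Key(String keyColumn, List<Column> columns) {
        return columns.stream()
                .filter(c -> keyColumn.equals(c.getName()))
                .map(Column::getValue)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException(
                        "missing key column: " + keyColumn));
    }

    // Serialize all columns of the row as a flat JSON object of strings.
    static String row2Value(List<Column> columns) {
        return columns.stream()
                .map(c -> quote(c.getName()) + ":" + quote(c.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    // Minimal escaping (backslash and double quote only); an assumption for brevity.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

Fields appear in column order in this sketch, which may differ from the field order in the Redis output shown above.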
Conclusion
For massive concurrent workloads, caching the entire dataset in Redis eliminates cache‑penetration‑induced DB overload. Cache consistency can be maintained either by consuming business‑level change messages or by masquerading as a MySQL slave and subscribing to Binlog via Canal. Both approaches require careful handling of reliability and latency, and production deployments should include fallback or compensation mechanisms for potential data loss or lag.
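One possible compensation mechanism is a periodic reconciliation pass that compares the database with the cache and repairs any differences. The article does not prescribe this; the sketch below is an assumption, with the hypothetical class CacheReconciler and in-memory maps standing in for MySQL and Redis.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration: maps stand in for MySQL and Redis.
class CacheReconciler {
    // Make the cache match the database; returns the number of keys repaired.
    static int reconcile(Map<String, String> database, Map<String, String> cache) {
        Set<String> keys = new HashSet<>(database.keySet());
        keys.addAll(cache.keySet());

        int repaired = 0;
        for (String key : keys) {
            String expected = database.get(key);
            if (Objects.equals(expected, cache.get(key))) {
                continue;               // already consistent
            }
            if (expected == null) {
                cache.remove(key);      // stale entry for a deleted row
            } else {
                cache.put(key, expected); // missing or outdated entry
            }
            repaired++;
        }
        return repaired;
    }
}
```

On a dataset of the size the article has in mind, a full comparison would be expensive, so a real job would more likely scan in key ranges or check only recently modified rows.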
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
