Why Did MySQL Return Stale Data? A SkyWalking Tracing Case Study
This article walks through a real‑world incident where an update to a MySQL row was not visible to a subsequent read, explains how SkyWalking tracing revealed the exact timing of binlog writing, MQ delivery, and query execution, and offers practical mitigation strategies.
Background
Application A updates the y field of order x in MySQL, Canal captures the binlog change and pushes it to a message queue, and Application B consumes the message, calls Application A via Dubbo, and queries the same order. The MySQL deployment uses asynchronous master‑slave replication.
Observed Issue
When Application B queries the order after the update, it reads the old value (ccc) instead of the new value (ddd), even though the update has already been committed on the master.
Data Flow Overview
The data flow consists of three possible read paths:
Read from cache
Read from slave‑DB
Read from master‑DB
Initial speculation pointed to cache or slave reads, but logs showed neither path was taken.
Trace Investigation with SkyWalking
After integrating SkyWalking into both applications, the following trace details were captured for the update transaction:
Service: serviceA
Instance IP: 111.111.111.222
Endpoint: Mysql/JDBI/Connection/commit
Span type: Exit (external MySQL call)
Component: mysql-connector-java
Peer: serviceA‑xxxx:3311
Success: false
DB type: sql
DB instance: serviceA
SQL statement: xxx
The trace shows that the commit for the update transaction started at 09:58:54.072 and finished at 09:58:54.123 on the master instance, so the commit call took about 51 ms.
Another trace for the subsequent read confirmed that the query was executed directly against the master‑DB, with no cache or slave involvement.
Timeline Reconstruction
All offsets below are milliseconds after 09:58:54.
72 ms – Application A begins the commit.
73 ms – MySQL writes the binlog.
77 ms – Canal syncs the binlog and sends a message to MQ.
78 ms – Application B consumes the MQ message and calls Application A via Dubbo.
88 ms – Application A performs the read query.
123 ms – Commit completes and returns.
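The key observation in this timeline is that the read falls inside the commit window. A minimal sketch of that arithmetic follows; the event names in the dictionary are labels invented here for illustration, and only the millisecond values come from the timeline above.

```python
# Offsets in milliseconds after 09:58:54, copied from the timeline above.
# The dictionary keys are illustrative labels, not SkyWalking field names.
TIMELINE_MS = {
    "commit_begin": 72,
    "binlog_written": 73,
    "canal_sent_to_mq": 77,
    "consumer_called_a": 78,
    "read_query": 88,
    "commit_returned": 123,
}


def read_overlaps_commit(timeline):
    """Return True if the read ran while the commit call was still in flight."""
    return timeline["commit_begin"] < timeline["read_query"] < timeline["commit_returned"]


def gap_ms(timeline, earlier, later):
    """Elapsed milliseconds between two named events."""
    return timeline[later] - timeline[earlier]


if __name__ == "__main__":
    print("read overlaps commit:", read_overlaps_commit(TIMELINE_MS))
    print("binlog -> read:", gap_ms(TIMELINE_MS, "binlog_written", "read_query"), "ms")
    print("read -> commit return:", gap_ms(TIMELINE_MS, "read_query", "commit_returned"), "ms")
```

With the values above, the read starts 15 ms after the binlog write and 35 ms before the commit returns to Application A.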
MySQL Transaction Phases (8.x)
MySQL commits a transaction in four key steps, in this order:
Write binlog – after this point the transaction is durable.
Synchronize binlog – the exact mechanism (asynchronous, semi‑sync, or strong sync) varies by configuration.
Write engine – releases locks and makes changes visible to subsequent reads. Only after this step can the new data be read.
Return commit response to the client.
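The ordering of these steps can be illustrated with a toy model. This is a single-threaded sketch, not MySQL's actual implementation: the class ToyMaster, its listener list, and the key "order_x.y" are all hypothetical, and the listener stands in for the whole Canal, MQ, and Application B chain reacting as soon as the binlog event exists.

```python
class ToyMaster:
    """Toy stand-in for a MySQL master; it models only the step ordering."""

    def __init__(self, rows):
        self.engine = dict(rows)      # committed state that reads can see
        self.binlog = []              # ordered change events
        self.binlog_listeners = []    # stand-ins for binlog subscribers such as Canal

    def read(self, key):
        # Reads only see what the engine has committed so far.
        return self.engine[key]

    def commit(self, key, value):
        # Step 1: write the binlog.
        event = (key, value)
        self.binlog.append(event)
        # Step 2: the binlog is shipped; subscribers may react immediately.
        for listener in self.binlog_listeners:
            listener(event)
        # Step 3: the engine write makes the change visible to reads.
        self.engine[key] = value
        # Step 4: return the commit response to the client.
        return "OK"


def run_demo():
    master = ToyMaster({"order_x.y": "ccc"})
    seen_by_consumer = []
    # The listener reads the row back as soon as it receives the event.
    master.binlog_listeners.append(
        lambda event: seen_by_consumer.append(master.read(event[0]))
    )
    master.commit("order_x.y", "ddd")
    return seen_by_consumer[0], master.read("order_x.y")


if __name__ == "__main__":
    print(run_demo())
```

In this model the consumer's read returns the old value, while a read issued after commit returns sees the new one. In a real deployment the steps run concurrently with the consumer, so the stale read only happens when the consumer is fast enough to land between steps 1 and 3.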
Version‑Specific Behavior
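In MySQL 5.6 the engine write occurs before binlog synchronization, which narrows the window in which a read can see stale data. Starting with MySQL 5.7, the default order changed to synchronize first and write the engine afterwards, which widens the gap between the binlog event becoming visible downstream and the new data becoming readable, and so increases the probability of the observed anomaly. If semi-synchronous replication is enabled, this ordering is controlled by the rpl_semi_sync_master_wait_point variable: AFTER_SYNC is the default from 5.7, and AFTER_COMMIT corresponds to the 5.6 behavior.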
Root Cause
The read query executed before the engine write completed, so it fetched the old value from the master even though the transaction's binlog event had already been written and delivered downstream.
Mitigation Recommendations
Implement retry logic using MQ delay‑retry to handle transient stale‑read cases.
Redesign the architecture to avoid direct DB reads after updates, e.g., by using event‑driven consistency checks or read‑after‑write guarantees.
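The retry recommendation can be sketched as a consumer-side decision function. This is a minimal sketch under stated assumptions: the message is assumed to carry the new value of y (as a row-change event usually does), and the field names order_id, expected_y, and attempt, the fetch_order callable, and the three return constants are all hypothetical. How RETRY_LATER maps onto a delayed redelivery depends on the MQ product in use.

```python
ACK = "ACK"                  # row matches the message; processing can continue
RETRY_LATER = "RETRY_LATER"  # row is stale; ask the MQ for a delayed redelivery
GIVE_UP = "GIVE_UP"          # still stale after the last attempt; alert or dead-letter


def handle_change_message(message, fetch_order, max_attempts=3):
    """Decide what to do with one change message.

    message: dict with hypothetical keys 'order_id', 'expected_y',
             and 'attempt' (1-based delivery count).
    fetch_order: callable taking an order id and returning the current row as a dict.
    """
    row = fetch_order(message["order_id"])
    if row["y"] == message["expected_y"]:
        return ACK
    if message["attempt"] >= max_attempts:
        return GIVE_UP
    return RETRY_LATER


if __name__ == "__main__":
    stale_then_fresh = iter([{"y": "ccc"}, {"y": "ddd"}])
    fetch = lambda order_id: next(stale_then_fresh)
    msg = {"order_id": "x", "expected_y": "ddd", "attempt": 1}
    print(handle_change_message(msg, fetch))                    # first delivery
    print(handle_change_message({**msg, "attempt": 2}, fetch))  # delayed redelivery
```

Comparing against the expected value is the simplest check, but it can misfire if a later update changes y again before the retry; comparing a monotonically increasing version or update timestamp, if the table has one, is more robust.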
Conclusion
SkyWalking tracing provided precise timestamps that linked the update, binlog propagation, and read operations, proving the issue stemmed from MySQL’s write‑engine timing rather than caching or replication bugs. Understanding the transaction phases and version‑specific behavior enables teams to anticipate and mitigate such rare stale‑read scenarios.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
