Why MySQL Binlog Can Cause Order Fulfillment Delays and How to Fix It
This article explains MySQL Binlog’s role in event‑driven order processing. It analyzes a hidden pitfall in which a Binlog event reaches consumers before its transaction is visible on the primary, traces the cause to MySQL’s two‑phase commit, and offers practical fixes such as retry strategies and direct Binlog consumption to keep data consistent.
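Background : MySQL’s Binlog records every data change, either as SQL statements or as row images depending on binlog_format, and is primarily used for replication. iQIYI’s membership order system also uses it to drive event‑based workflows, which improves availability and consistency.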
Problems with Direct MQ‑Based Design :
The business system depends tightly on the message broker, so a broker failure directly disrupts order processing.
Reliable retry mechanisms are needed so that notifications still achieve best‑effort delivery when sending to the broker fails.
Exponential backoff for retries can introduce significant latency.
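The latency cost of exponential backoff is easy to underestimate. The arithmetic sketch below assumes a hypothetical 1‑second base delay that doubles after each failed attempt. Neither the base delay nor the attempt count comes from the article. Under those assumptions, a notification that keeps failing has already waited 255 seconds before its eighth retry is even sent.

```python
from typing import List, Optional


def backoff_schedule(base_delay_s: float, attempts: int, cap_s: Optional[float] = None) -> List[float]:
    """Wait before each retry: base, 2*base, 4*base, ... optionally capped at cap_s."""
    delays = []
    for attempt in range(attempts):
        delay = base_delay_s * (2 ** attempt)
        if cap_s is not None:
            delay = min(delay, cap_s)
        delays.append(delay)
    return delays


def total_wait(delays: List[float]) -> float:
    """Cumulative time spent waiting if every earlier attempt fails."""
    return sum(delays)


# Hypothetical parameters: 1 s base delay, 8 retries, no cap.
schedule = backoff_schedule(base_delay_s=1.0, attempts=8)
print(schedule)              # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
print(total_wait(schedule))  # 255.0
```

Capping the delay (cap_s) bounds each individual wait. Total latency still grows with every failed attempt.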
Binlog‑Based Event‑Driven Redesign : The order table itself becomes an event table. A separate service subscribes to the table’s Binlog, converts row changes into business events, and decouples the core system from the message broker.
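The conversion step in this redesign can be sketched as a pure function that maps a row change on the order table to a business event. The row-event shape, the status codes, and the event names below are all hypothetical, because the article does not give the real schema. Real binlog decoders each use their own structures.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical order status codes; the real schema is not described in the article.
UNPAID, PAID, CANCELLED = 0, 1, 2


@dataclass(frozen=True)
class RowChange:
    """Simplified, assumed shape of a decoded binlog row event."""
    table: str
    op: str                 # "INSERT", "UPDATE" or "DELETE"
    before: Optional[dict]  # before-image (None for INSERT)
    after: Optional[dict]   # after-image (None for DELETE)


@dataclass(frozen=True)
class BusinessEvent:
    name: str
    order_id: int


def to_business_event(change: RowChange) -> Optional[BusinessEvent]:
    """Map a row change on the order table to a business event, or None if irrelevant."""
    if change.table != "orders":
        return None
    if change.op == "INSERT" and change.after is not None:
        return BusinessEvent("OrderCreated", change.after["id"])
    if change.op == "UPDATE" and change.before is not None and change.after is not None:
        old, new = change.before["status"], change.after["status"]
        if old == UNPAID and new == PAID:
            return BusinessEvent("OrderPaid", change.after["id"])
        if old != CANCELLED and new == CANCELLED:
            return BusinessEvent("OrderCancelled", change.after["id"])
    return None  # other column updates are not treated as business events
```

For example, an UPDATE whose before-image has status UNPAID and whose after-image has status PAID for order 42 becomes BusinessEvent("OrderPaid", 42). Keeping the mapping free of I/O makes it easy to test separately from the binlog subscription itself.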
Hidden Issue Discovered : Occasionally, an order fulfillment service receives a payment event from the Binlog, queries the primary database, and still sees the order as unpaid. Replication lag is ruled out because the read goes to the primary, and no concurrent update to the order is involved.
MySQL Internals Review :
Redo Log – physical, InnoDB‑level log for crash‑safe recovery.
Binlog – logical, server‑level log for replication.
Both logs participate in a two‑phase commit: Redo Log Prepare → Binlog write → Redo Log Commit.
Two‑Phase Commit Scenarios :
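Crash after Redo Log Prepare but before Binlog write – on recovery the transaction is rolled back, so the Redo Log and Binlog stay consistent.
Crash after Binlog write but before Redo Log Commit – recovery finds the transaction in the Prepare state and checks the Binlog. If the Binlog contains the complete transaction (ending with an XID event in row format, or COMMIT in statement format), the transaction is committed; otherwise it is rolled back.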
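The recovery rule above can be condensed into a small decision function. This is a simplified model of the behavior described, not InnoDB's actual recovery code. The names RedoState and recover are hypothetical.

```python
from enum import Enum


class RedoState(Enum):
    NONE = "none"            # transaction never reached Prepare
    PREPARED = "prepared"    # Redo Log Prepare record is durable
    COMMITTED = "committed"  # Redo Log Commit record is durable


def recover(redo_state: RedoState, binlog_complete: bool) -> str:
    """Decide what crash recovery does with one transaction (simplified model).

    binlog_complete: True if the Binlog holds the transaction's full event group,
    i.e. it ends with an XID event (row format) or COMMIT (statement format).
    """
    if redo_state is RedoState.COMMITTED:
        return "commit"  # already committed in the engine
    if redo_state is RedoState.PREPARED:
        # Prepared in the Redo Log: the Binlog decides, so primary and replicas agree.
        return "commit" if binlog_complete else "rollback"
    return "rollback"    # never prepared: nothing durable to keep
```

The key design point is the PREPARED branch. The Binlog is the tiebreaker because it is what replicas and other subscribers have already seen.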
Update Execution Flow (example: UPDATE t SET n=n+1 WHERE id=2):
Executor fetches row by primary key.
Executor computes new value and asks engine to write it.
Engine updates the in‑memory page and records changes in Redo Log Buffer.
Transaction commit marks Redo Log as Prepare and flushes to disk.
Executor generates Binlog entry and writes it.
Engine finalizes Redo Log as Commit.
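The six steps can be modeled in a few lines to make their ordering concrete. This toy class is purely illustrative and is not MySQL's internal API. It also simplifies visibility: the new value is copied into the "visible" table only at the final Commit step, standing in for InnoDB's MVCC.

```python
from typing import Dict, List


class ToyUpdate:
    """Illustrative model of UPDATE t SET n=n+1 WHERE id=2 (not MySQL's internal API)."""

    def __init__(self, table: Dict[int, int]):
        self.table = table              # committed rows visible to other sessions: id -> n
        self.page: Dict[int, int] = {}  # uncommitted change held in the in-memory page
        self.redo: List[str] = []       # redo records (buffer vs. disk not distinguished)
        self.binlog: List[str] = []     # binlog events
        self.trace: List[str] = []      # order in which the steps ran

    def execute(self, row_id: int) -> None:
        n = self.table[row_id]                      # 1. executor fetches the row by primary key
        self.trace.append("fetch")
        new_n = n + 1                               # 2. executor computes the new value
        self.page[row_id] = new_n                   # 3. engine updates the in-memory page
        self.redo.append(f"id={row_id} n={new_n}")  #    and records the change in the redo buffer
        self.trace.append("engine_write")

    def commit(self) -> None:
        self.redo.append("PREPARE")                 # 4. redo marked Prepare and flushed
        self.trace.append("redo_prepare")
        for row_id, n in self.page.items():         # 5. executor generates and writes the binlog
            self.binlog.append(f"id={row_id} n={n}")
        self.trace.append("binlog_write")
        self.redo.append("COMMIT")                  # 6. engine finalizes the redo record as Commit
        self.table.update(self.page)                #    only now does the new value become visible
        self.page.clear()
        self.trace.append("redo_commit")


txn = ToyUpdate({2: 5})
txn.execute(2)
txn.commit()
print(txn.trace)  # ['fetch', 'engine_write', 'redo_prepare', 'binlog_write', 'redo_commit']
print(txn.table)  # {2: 6}
```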
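Root Cause : The Binlog event is written, and can be shipped to subscribers, before the Redo Log reaches its final Commit state. A downstream consumer can therefore receive the event before the transaction’s changes are visible to other sessions on the primary. If that consumer immediately queries the primary, it may still read the old row.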
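The visibility gap can be reproduced with a deterministic toy. In production this is a race between asynchronous components. The model below collapses it into its worst-case ordering by notifying subscribers synchronously at the Binlog write, before the Commit step exposes the new status. ToyPrimary and fulfillment_consumer are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple


class ToyPrimary:
    """Toy primary: readers see a change only after the (modeled) Redo Log Commit.

    Assumption: subscribers are notified at Binlog write time, modeling a consumer
    that receives events as soon as they are in the Binlog.
    """

    def __init__(self, rows: Dict[int, str]):
        self.visible = dict(rows)  # state other sessions can read
        self.subscribers: List[Callable[[int, str], None]] = []

    def query_status(self, order_id: int) -> str:
        return self.visible[order_id]

    def commit_status_change(self, order_id: int, new_status: str) -> None:
        # Redo Log Prepare (omitted), then Binlog write: the event is shipped.
        for notify in self.subscribers:
            notify(order_id, new_status)
        # Redo Log Commit: only now does the change become visible.
        self.visible[order_id] = new_status


primary = ToyPrimary({1001: "UNPAID"})
observed: List[Tuple[str, str]] = []


def fulfillment_consumer(order_id: int, status_in_event: str) -> None:
    # Record what the event says versus what the primary returns right now.
    observed.append((status_in_event, primary.query_status(order_id)))


primary.subscribers.append(fulfillment_consumer)
primary.commit_status_change(1001, "PAID")
print(observed)  # [('PAID', 'UNPAID')]
```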
Solution Approaches :
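Retry : Either a simple delay‑based retry (e.g., sleeping the consumer thread before querying again) or message redelivery (e.g., returning RocketMQ’s ConsumeConcurrentlyStatus.RECONSUME_LATER). Retries must guard against ABA problems. By the time a retry runs, the order may have passed through other states and returned to the one being checked, so comparing the status alone can be misleading.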
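One way to combine redelivery with an ABA guard is to compare a version number carried in the event against the row's current version, rather than comparing only the status. The sketch below assumes a hypothetical version column that is incremented on every update. The returned strings mirror RocketMQ's ConsumeConcurrentlyStatus names, but this is plain Python, not the RocketMQ client. The skip policy for superseded events is one possible choice, not the article's method.

```python
from dataclasses import dataclass
from typing import Callable

CONSUME_SUCCESS = "CONSUME_SUCCESS"
RECONSUME_LATER = "RECONSUME_LATER"  # ask the broker to redeliver later


@dataclass
class OrderRow:
    status: str
    version: int  # hypothetical column, incremented on every update


@dataclass
class PaidEvent:
    order_id: int
    version: int  # version taken from the binlog after-image


def handle_paid_event(event: PaidEvent,
                      load_order: Callable[[int], OrderRow],
                      fulfill: Callable[[int], None]) -> str:
    row = load_order(event.order_id)  # read from the primary
    if row.version < event.version:
        # The commit behind this event is not visible yet: retry later.
        return RECONSUME_LATER
    if row.version > event.version:
        # The row has moved on; the status may look the same (ABA), but this event is stale.
        return CONSUME_SUCCESS
    if row.status == "PAID":
        fulfill(event.order_id)  # fulfill should itself be idempotent
    return CONSUME_SUCCESS
```

A thread-sleep variant would loop on the same version check in place instead of returning RECONSUME_LATER. That ties up a consumer thread while it waits.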
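Direct Binlog Consumption : Use row‑format Binlog (binlog_format=ROW). With the default binlog_row_image=FULL, it contains the full after‑image of each changed row, so business logic can run on the event itself instead of querying the database again. This also reduces database QPS. The approach suits new systems; existing systems may require significant refactoring.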
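With full row images, a consumer can act on the event alone. The sketch below assumes a hypothetical decoded-event shape made of plain dicts with "table", "before", and "after" keys, plus a hypothetical product_id column. It relies on binlog_row_image=FULL. With MINIMAL, unchanged columns such as product_id would be missing from the after-image.

```python
from typing import Callable, Dict, Optional


def handle_row_update(event: Dict, fulfill: Callable[[int, str], None]) -> bool:
    """Trigger fulfillment straight from the binlog after-image, with no read-back
    from the primary. Returns True if fulfillment was triggered.

    The event shape is an assumption; real binlog decoders use their own structures.
    """
    if event.get("table") != "orders":
        return False
    before: Optional[Dict] = event.get("before")
    after: Dict = event["after"]
    if before is None or before.get("status") == "PAID" or after.get("status") != "PAID":
        return False  # only the transition into PAID triggers fulfillment
    fulfill(after["id"], after["product_id"])  # all data comes from the event itself
    return True
```

Because the handler never queries the primary, it avoids the visibility gap described above. It does depend on the after-image containing every column the business logic needs.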
Takeaway : Understanding MySQL’s Redo Log, Binlog, and two‑phase commit is essential for building reliable event‑driven architectures. Proper handling of the timing gap between Binlog emission and transaction commit prevents data‑visibility anomalies.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
