Master‑Slave Replication & Read/Write Splitting: Scaling MySQL While Avoiding Latency
This article explains the principles of master‑slave read/write separation, the MySQL binlog‑based replication process, its side effects, practical ways to mitigate replication lag, and how middleware or proxy layers can simplify database access in high‑concurrency environments.
1. Master‑Slave Read/Write Separation
Most internet services are read‑heavy, so separating read and write traffic allows independent scaling of read replicas. When front‑end traffic spikes, DBAs can add read replicas, distributing queries and reducing load on each replica.
Caching introduces its own complexity (consistency, cache penetration, cache avalanche), whereas read/write separation is simpler; add a cache only after replication reaches its limits.
The master DB handles writes.
One or more replica DBs handle reads.
Key points of read/write separation:
Copying data from the master to the replicas (master‑slave replication).
Hiding the resulting change in how the database is accessed, so developers still work with what looks like a single logical DB.
2. Master‑Slave Replication
MySQL replication relies on the binary log (binlog) that records all changes. The master writes to the binlog; replicas asynchronously pull the binlog and replay it.
2.1 Replication Process
The replica creates an I/O thread to request the master’s binlog and writes it to a relay log.
The master creates a log‑dump thread to send the binlog.
A SQL thread on the replica reads the relay log and replays the statements, bringing the replica's data in line with the master's.
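Because these steps run on separate threads, replication is asynchronous and stays off the master’s write path. The cost is that a transaction can commit on the master before its binlog events have reached any replica; if the master crashes in that window, the replicas never receive those writes, causing data loss and inconsistency, though the probability is low.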
If the master fails and the binlog is lost, manual recovery is required.
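The three-thread flow above can be pictured with a small in-memory model. This is only a sketch: the `Master` and `Replica` classes and their method names are invented for illustration, plain Python lists stand in for the binlog and relay log, and the "threads" are driven by hand so the lag window is easy to see.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Holds the authoritative data plus an append-only binlog."""
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)

    def write(self, key, value):
        # Apply the change locally, then record it in the binlog.
        self.data[key] = value
        self.binlog.append((key, value))

    def dump(self, from_pos):
        # Stand-in for the log-dump thread: return events after from_pos.
        return self.binlog[from_pos:]


@dataclass
class Replica:
    """Copies binlog events into a relay log, then replays them."""
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    applied_pos: int = 0

    def io_step(self, master):
        # Stand-in for the I/O thread: fetch new events into the relay log.
        self.relay_log.extend(master.dump(len(self.relay_log)))

    def sql_step(self, max_events=None):
        # Stand-in for the SQL thread: replay relay-log events in order.
        end = len(self.relay_log)
        if max_events is not None:
            end = min(end, self.applied_pos + max_events)
        for key, value in self.relay_log[self.applied_pos:end]:
            self.data[key] = value
        self.applied_pos = end


master = Master()
replica = Replica()
master.write("post:1", "hello")
master.write("post:2", "world")

replica.io_step(master)                      # both events reach the relay log
replica.sql_step(max_events=1)               # only the first one is replayed
lagging_read = replica.data.get("post:2")    # replica is still behind here

replica.sql_step()                           # replay the remaining events
caught_up_read = replica.data.get("post:2")
```

Between the two `sql_step` calls the replica holds the event in its relay log but cannot serve it yet, which is the window in which a read from the replica misses freshly written data.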
After replication is set up:
Writes go only to the master.
Reads go only to replicas, allowing multiple replicas to share read load.
Replicas also serve as backups.
Adding too many replicas (more than 3‑5) can overload the master: it runs one log‑dump thread per connected replica and sends each of them the full binlog stream, which can exceed its network bandwidth.
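A back-of-the-envelope calculation shows how quickly the bandwidth side of this limit is reached. The binlog rate (20 MB/s) and the 1 Gbit/s network link below are hypothetical numbers chosen for illustration, and the helper name is made up; the only assumption carried over from the text is that every replica pulls the full binlog from the master.

```python
def master_replication_egress_mbit(binlog_mb_per_sec, replica_count):
    """Outbound replication traffic in Mbit/s when every replica
    pulls the complete binlog stream directly from the master."""
    return binlog_mb_per_sec * 8 * replica_count


LINK_MBIT = 1000  # hypothetical 1 Gbit/s NIC on the master

# Share of the link consumed by replication alone, per replica count.
link_share = {
    n: master_replication_egress_mbit(20, n) / LINK_MBIT
    for n in (1, 3, 5, 8)
}
```

Under these assumed numbers, five replicas already use 80% of the link before any client traffic is counted, and eight replicas would need more than the link can carry.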
2.2 Side Effects of Replication
Operations that depend on immediate consistency may fail if replicas lag. For example, a workflow that writes a post ID to a message queue and later reads the post from a replica can encounter “not found” errors when the replica is behind.
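Monitor Seconds_Behind_Master from SHOW SLAVE STATUS\G to detect lag (MySQL 8.0.22 and later also expose it as Seconds_Behind_Source in SHOW REPLICA STATUS). The value measures how far the SQL thread trails the events the I/O thread has already fetched, so it can read zero while the I/O thread itself is still behind the master, for example on a slow network. For that reason, also compare binlog positions on the master and the replica.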
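Comparing binlog positions might look like the sketch below. It assumes the monitoring job has already fetched one row of SHOW MASTER STATUS and one row of SHOW SLAVE STATUS as dictionaries keyed by the column names MySQL reports; the function names are invented, and the file-name parsing assumes the usual `basename.NNNNNN` binlog naming.

```python
def binlog_index(file_name):
    # "mysql-bin.000012" -> 12
    return int(file_name.rsplit(".", 1)[1])


def coordinates(file_name, position):
    # (file index, byte offset) tuples compare in replication order.
    return (binlog_index(file_name), int(position))


def check_replica(master_status, replica_status):
    """Report which replica thread, if any, trails the master."""
    master_at = coordinates(master_status["File"], master_status["Position"])
    fetched = coordinates(replica_status["Master_Log_File"],
                          replica_status["Read_Master_Log_Pos"])
    applied = coordinates(replica_status["Relay_Master_Log_File"],
                          replica_status["Exec_Master_Log_Pos"])
    return {
        "io_thread_behind": fetched < master_at,   # not yet downloaded
        "sql_thread_behind": applied < fetched,    # downloaded, not replayed
        "seconds_behind_master": replica_status["Seconds_Behind_Master"],
    }


# Example snapshot: the I/O thread has fetched less than the master wrote,
# yet Seconds_Behind_Master reads 0 because the SQL thread has replayed
# everything that was fetched.
report = check_replica(
    {"File": "mysql-bin.000012", "Position": 9000},
    {
        "Master_Log_File": "mysql-bin.000012",
        "Read_Master_Log_Pos": 4000,
        "Relay_Master_Log_File": "mysql-bin.000012",
        "Exec_Master_Log_Pos": 4000,
        "Seconds_Behind_Master": 0,
    },
)
```

The two status rows are read at slightly different moments, so a small gap on a busy master is expected; a check like this is more useful when it alerts on a gap that persists across several samples.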
2.3 Reducing Replication Lag
The common strategies do not shrink the lag itself; they keep lag‑sensitive reads off the replicas:
Data redundancy : Include all necessary data in the message sent to the consumer, eliminating the need to query the replica.
Cache usage : Write data to a cache alongside the DB write; consumers read from the cache first. This works well for insert‑only scenarios but can cause inconsistency on concurrent updates.
Query the master : Direct consumers to read from the master when the read volume is low and the master can handle the load. This should be used sparingly and with strict access controls.
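The three strategies can be combined in a consumer roughly as follows. This is a sketch under stated assumptions: plain dictionaries stand in for the cache, the replica, and the master, a list stands in for the message queue, and the function names and the cache, replica, master fallback order are illustrative choices rather than anything prescribed above.

```python
def publish_post_created(post, queue):
    # Data redundancy: ship the full row, not just the ID, so the
    # consumer never has to look the post up on a replica.
    queue.append({"type": "post_created", "post": dict(post)})


def save_post(post, master, cache):
    # Cache usage: write to the cache alongside the DB write.
    master[post["id"]] = post
    cache[post["id"]] = post


def load_post(post_id, cache, replica, master, allow_master_read=False):
    """Return (post, source). Falls back to the master only when the
    caller explicitly opts in."""
    for source_name, store in (("cache", cache), ("replica", replica)):
        post = store.get(post_id)
        if post is not None:
            return post, source_name
    if allow_master_read and post_id in master:
        return master[post_id], "master"
    return None, "miss"


master, replica, cache, queue = {}, {}, {}, []
post = {"id": 1, "title": "hello"}

master[1] = post                       # written, but replica still lagging
miss = load_post(1, cache, replica, master)
from_master = load_post(1, cache, replica, master, allow_master_read=True)

save_post(post, master, cache)         # same write, now with the cache
from_cache = load_post(1, cache, replica, master)

publish_post_created(post, queue)      # consumer can use queue[0]["post"]
```

Keeping `allow_master_read` off by default mirrors the advice to read from the master sparingly: a caller has to opt in per call site.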
Replication delay is a frequent source of hard‑to‑detect bugs, so monitoring and alerting on lag is essential: millisecond‑level lag is normal, while lag measured in seconds should raise a warning.
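An alerting rule along these lines could be as small as the function below. The one-second default threshold is an assumed reading of "second-level", and treating a missing measurement as its own state is an added assumption (MySQL reports NULL for Seconds_Behind_Master when replication is not running); how the lag value is measured is left to the caller.

```python
def classify_lag(lag_seconds, warn_after=1.0):
    """Map a measured replication lag to an alert level."""
    if lag_seconds is None:
        # No measurement available, e.g. replication is stopped.
        return "unknown"
    if lag_seconds >= warn_after:
        return "warning"
    return "normal"


levels = [classify_lag(x) for x in (0.02, 0.8, 3, None)]
```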
3. Accessing the Database
With replication, applications must distinguish between a master address for writes and multiple replica addresses for reads, increasing complexity. Middleware solutions simplify this:
3.1 In‑process Middleware (e.g., TDDL)
Embedded in the application, it acts as a data‑source proxy, routing SQL to the appropriate master or replica based on configuration.
Advantages : Easy to use, low deployment cost, suitable for small teams.
Disadvantages : Limited language support (mostly Java), and upgrading the middleware means rebuilding and redeploying every application that embeds it.
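The routing decision such a component makes can be sketched in a few lines. This is not TDDL's API: the `ReadWriteRouter` class, its method, and the host names are hypothetical, and the rule for what counts as a read (a SELECT outside a transaction and without FOR UPDATE) is a simplifying assumption.

```python
import itertools


class ReadWriteRouter:
    """Picks a target address for each SQL statement."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replicas; None means "no replicas configured".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql, in_transaction=False):
        statement = sql.strip().rstrip("; \t\r\n").upper()
        is_plain_read = (statement.startswith("SELECT")
                         and not statement.endswith("FOR UPDATE"))
        # Writes, locking reads, and anything inside a transaction
        # must see the latest data, so they go to the master.
        if in_transaction or not is_plain_read or self._replicas is None:
            return self.master
        return next(self._replicas)


router = ReadWriteRouter("master:3306", ["replica1:3306", "replica2:3306"])
targets = [
    router.route("INSERT INTO posts (title) VALUES ('hi')"),
    router.route("select * from posts where id = 1"),
    router.route("SELECT * FROM posts WHERE id = 2;"),
    router.route("SELECT * FROM posts WHERE id = 1 FOR UPDATE"),
    router.route("SELECT * FROM posts", in_transaction=True),
]
```

Application code keeps calling one logical data source and the router hides the master and replica addresses, which is the "single logical DB" goal from section 1.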
3.2 Stand‑alone Proxy Layer (e.g., Mycat, Atlas, DBProxy)
Deployed as an independent server, it forwards SQL using the standard MySQL protocol, supporting many languages.
Advantages : Language‑agnostic, easier to maintain and upgrade, suitable for larger teams.
Disadvantages : Adds an extra network hop, introducing some performance overhead.
4. Summary
Master‑slave replication provides data redundancy and horizontal read scalability, but it forces a trade‑off between consistency and write performance and exposes applications to replication lag. Monitoring lag, choosing the right middleware, and applying strategies such as data redundancy, caching, or selective master reads help mitigate the drawbacks.
The same replication idea appears in other systems, for example Redis read/write splitting via replication, Elasticsearch index shard replication, and HDFS block replication.
FAQ
When orders are sharded by user ID, front‑end queries that filter on a single user are fast, but back‑office reports that need to sort across all shards become slow. The usual solution is to sync the data to a dedicated reporting database or to Elasticsearch.
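The cost of the cross-shard sort is easier to see in code. The sketch below assumes each shard is a list of order dictionaries with a `created_at` field; the function name and the sample data are invented. To return the newest `limit` orders overall, every shard has to be queried and every shard has to return its own top `limit` rows before they can be merged.

```python
import heapq
from itertools import islice


def newest_orders_across_shards(shards, limit):
    """Scatter-gather: take the top rows from every shard, then merge."""
    by_time = lambda order: order["created_at"]
    # Each shard sorts and truncates locally (one query per shard).
    per_shard = [
        sorted(orders, key=by_time, reverse=True)[:limit]
        for orders in shards
    ]
    # Merge the already-sorted partial results and keep the global top.
    merged = heapq.merge(*per_shard, key=by_time, reverse=True)
    return list(islice(merged, limit))


shards = [
    [{"id": 1, "created_at": 5}, {"id": 2, "created_at": 1}],
    [{"id": 3, "created_at": 4}, {"id": 4, "created_at": 9}],
]
newest_ids = [o["id"] for o in newest_orders_across_shards(shards, 3)]
```

The work grows with the number of shards and with how deep the report pages, which is why a single reporting store that holds all orders in one place is the more practical home for such queries.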
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
