Advanced SpringBoot Read‑Write Splitting: Master‑Slave Switching and Automatic Failover

In high-concurrency internet architectures, a MySQL master-slave setup with read-write splitting is the baseline for high availability. Static routing, however, cannot cope with node failures or replication lag. This article explains how ShardingSphere's health checks, automatic failover, load balancing, and degradation fallback make read-write splitting resilient.

Java Tech Workshop

Read‑Write Splitting Architecture

In high-traffic internet systems, a MySQL master-slave topology with read-write splitting provides basic high availability. The master processes all write statements, while one or more slaves handle read traffic, which eases read-write contention, connection-limit pressure, and CPU saturation on any single node. Static routing, however, leaves several failure modes uncovered:

If a slave crashes, requests continue to be routed to the failed node, causing query timeouts, connection errors, circuit breaking, and widespread 500 responses.

If the master fails, the entire system cannot perform writes, resulting in service outage.

Replication lag can return stale data, breaking consistency.

Static routing lacks node health detection, automatic exclusion of faulty nodes, and automatic reintegration of recovered nodes.

Uneven load across multiple slaves can overload a single node and trigger a cascading failure.

Basic Static Read‑Write Splitting Defects

Write requests always go to the master.

Read requests are distributed between slave1 and slave2 by fixed round-robin or random selection.

No node liveness detection.

No automatic removal of faulty nodes.

No automatic reintegration of recovered nodes.
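
To make these defects concrete, here is a minimal sketch of a static router. The class name `StaticRouter` and its methods are hypothetical and exist only for illustration; this is not how any particular framework implements routing.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical static read/write router: a fixed node list and no liveness checks. */
final class StaticRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger counter = new AtomicInteger();

    StaticRouter(String master, List<String> slaves) {
        if (slaves.isEmpty()) {
            throw new IllegalArgumentException("at least one slave is required");
        }
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    /** Writes always go to the master. */
    String routeWrite() {
        return master;
    }

    /** Reads rotate over the fixed slave list, even when a slave is down. */
    String routeRead() {
        int index = Math.floorMod(counter.getAndIncrement(), slaves.size());
        return slaves.get(index);
    }
}
```

Because `routeRead()` never consults node state, a crashed slave1 keeps receiving every second read request until someone edits the configuration by hand.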

Core Requirements for High‑Availability Read‑Write Splitting

Node health probing: periodic heartbeat checks for connectivity and availability.

Automatic slave failover: faulty slaves are excluded and traffic is smoothly shifted to healthy nodes.

Node auto‑recovery: after a failed node restarts and passes health checks, it is automatically re‑added to the cluster.

Global degradation fallback: if all slaves are down, reads are automatically routed to the master.

Master failover: supports manual emergency switch and can be combined with clustering for automatic master election.
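
The slave failover and degradation requirements above can be sketched as a router that skips offline slaves and falls back to the master when none are left. `HealthAwareRouter`, `markOffline`, and `markOnline` are hypothetical names chosen for this example, and something else (such as a periodic health probe) is assumed to call them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router that excludes offline slaves and degrades reads to the master. */
final class HealthAwareRouter {
    private final String master;
    private final List<String> slaves;
    private final Set<String> offline = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    HealthAwareRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    /** Called by a health probe when a slave is considered down. */
    void markOffline(String slave) {
        offline.add(slave);
    }

    /** Called by a health probe when a slave has recovered. */
    void markOnline(String slave) {
        offline.remove(slave);
    }

    String routeWrite() {
        return master;
    }

    /** Round-robin over healthy slaves; the master serves reads if no slave is healthy. */
    String routeRead() {
        List<String> healthy = new ArrayList<>();
        for (String slave : slaves) {
            if (!offline.contains(slave)) {
                healthy.add(slave);
            }
        }
        if (healthy.isEmpty()) {
            return master; // global degradation fallback
        }
        int index = Math.floorMod(counter.getAndIncrement(), healthy.size());
        return healthy.get(index);
    }
}
```

Master failover is deliberately left out of this sketch, since it depends on cluster tooling rather than on the routing layer.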

Why Choose ShardingSphere?

Native support for health detection, failover, load balancing, and degradation fallback.

Zero code intrusion; all features are enabled via configuration.

Dynamic node management, multiple load‑balancing strategies, and compatibility with any SpringBoot project.

Replaces traditional dynamic data-source frameworks with stronger stability and a richer ecosystem.

Environment Setup (1 Master + 2 Slaves)

Topology

Master (3306): writes (INSERT, UPDATE, DELETE) and transactions.

Slave1 (3307): read traffic.

Slave2 (3308): read traffic.

Ensure MySQL master-slave replication is configured and binlog_format is set to ROW.

Maven Dependencies

<!-- SpringBoot Web -->
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<!-- ShardingSphere read‑write splitting core -->
<dependency>
  <groupId>org.apache.shardingsphere</groupId>
  <artifactId>shardingsphere-jdbc-core-spring-boot-starter</artifactId>
  <version>5.2.1</version>
</dependency>

<!-- Druid connection pool -->
<dependency>
  <groupId>com.alibaba</groupId>
  <artifactId>druid-spring-boot-starter</artifactId>
  <version>1.2.16</version>
</dependency>

<!-- MySQL driver -->
<dependency>
  <groupId>com.mysql</groupId>
  <artifactId>mysql-connector-j</artifactId>
  <scope>runtime</scope>
</dependency>

Health Check and Automatic Slave Failover

ShardingSphere 5.x provides a built‑in database heartbeat mechanism that performs node probing, fault exclusion, traffic shifting, and node recovery without custom detection threads.

Read‑Write Splitting Configuration

spring:
  shardingsphere:
    datasource:
      names: master,slave1,slave2
      master:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3306/test_db?useUnicode=true&serverTimezone=Asia/Shanghai&allowMultiQueries=true
        username: root
        password: root
      slave1:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3307/test_db?useUnicode=true&serverTimezone=Asia/Shanghai
        username: root
        password: root
      slave2:
        type: com.alibaba.druid.pool.DruidDataSource
        driver-class-name: com.mysql.cj.jdbc.Driver
        url: jdbc:mysql://127.0.0.1:3308/test_db?useUnicode=true&serverTimezone=Asia/Shanghai
        username: root
        password: root
    rules:
      readwrite-splitting:
        data-sources:
          master-slave-group:
            static-strategy:
              write-data-source-name: master
              read-data-source-names: [slave1, slave2]
            load-balancer-name: round_robin
        read-data-source-query-enabled: true
        load-balancers:
          round_robin:
            type: ROUND_ROBIN
          random:
            type: RANDOM
          weight:
            type: WEIGHT
            props:
              slave1: 30
              slave2: 70
        health-check:
          enabled: true
          health-check-sql: SELECT 1
          interval: 10000   # 10 seconds
          timeout: 3000     # 3 seconds
          failure-threshold: 3   # mark node down after 3 consecutive failures
          recovery-threshold: 2  # bring node back after 2 consecutive successes
    props:
      sql-show: true

Core Parameters

Health-check interval: SELECT 1 is executed every 10 seconds to verify connectivity.

Failure exclusion: after 3 consecutive failures, the node is marked unavailable and removed from the load‑balancing pool.

Automatic recovery: after 2 consecutive successful checks, the node is automatically re‑added.

Global degradation fallback: read-data-source-query-enabled: true routes reads to the master when all slaves are down.

Complete Failover Process

Slave1 crashes or loses network connectivity.

Heartbeat checks fail consecutively; ShardingSphere marks Slave1 as offline.

Subsequent read traffic is routed only to Slave2, keeping the service uninterrupted.

Slave1 restarts and passes health checks.

After two successful probes, the node is automatically re‑added and read traffic is re‑balanced.
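
The threshold logic behind this sequence can be sketched as a small state machine. This is an illustrative model only, not ShardingSphere's internal code; the class `NodeHealth` and its method names are assumptions made for the example, and the thresholds mirror the values 3 and 2 used above.

```java
/** Hypothetical per-node health state driven by consecutive probe results. */
final class NodeHealth {
    private final int failureThreshold;
    private final int recoveryThreshold;
    private boolean online = true;
    private int consecutiveFailures;
    private int consecutiveSuccesses;

    NodeHealth(int failureThreshold, int recoveryThreshold) {
        this.failureThreshold = failureThreshold;
        this.recoveryThreshold = recoveryThreshold;
    }

    /** Records one probe result and returns whether the node is online afterwards. */
    synchronized boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveFailures = 0; // a success breaks any failure streak
            if (!online && ++consecutiveSuccesses >= recoveryThreshold) {
                online = true;
                consecutiveSuccesses = 0;
            }
        } else {
            consecutiveSuccesses = 0; // a failure breaks any recovery streak
            if (online && ++consecutiveFailures >= failureThreshold) {
                online = false;
                consecutiveFailures = 0;
            }
        }
        return online;
    }

    synchronized boolean isOnline() {
        return online;
    }
}
```

With `new NodeHealth(3, 2)`, three failed probes in a row take the node offline, and two successful probes in a row bring it back. Requiring consecutive results in both directions keeps a single dropped packet or a single lucky probe from flipping the node's state.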

Multi‑Slave Load‑Balancing Strategies

The configuration above uses three of ShardingSphere's built-in policies:

ROUND_ROBIN (default)

Evenly distributes traffic across all slaves; suitable when hardware and performance are uniform.

RANDOM

Selects a slave at random; provides relatively even distribution for small clusters.

WEIGHT

Assigns weights based on server capacity; higher‑spec machines handle more traffic. Example: slave1: 30, slave2: 70 results in 70% of read requests going to slave2.
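
As a rough illustration of how weighted selection works, the sketch below maps a number in the range [0, total weight) onto the node whose weight range contains it. `WeightedPicker` is a hypothetical class written for this example and is not ShardingSphere's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical weighted picker: a node's share of traffic is weight / totalWeight. */
final class WeightedPicker {
    private final Map<String, Integer> weights;
    private final int totalWeight;

    WeightedPicker(Map<String, Integer> weights) {
        int sum = 0;
        for (int weight : weights.values()) {
            if (weight <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            sum += weight;
        }
        if (sum == 0) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.weights = new LinkedHashMap<>(weights); // keep a stable iteration order
        this.totalWeight = sum;
    }

    /** Maps a point in [0, totalWeight) to the node whose weight range contains it. */
    String pick(int point) {
        if (point < 0 || point >= totalWeight) {
            throw new IllegalArgumentException("point out of range: " + point);
        }
        int remaining = point;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            if (remaining < entry.getValue()) {
                return entry.getKey();
            }
            remaining -= entry.getValue();
        }
        throw new IllegalStateException("unreachable: weights sum to totalWeight");
    }

    /** Picks a node at random in proportion to its weight. */
    String pickRandom() {
        return pick(ThreadLocalRandom.current().nextInt(totalWeight));
    }
}
```

With weights slave1: 30 and slave2: 70, points 0 to 29 map to slave1 and points 30 to 99 map to slave2, so over many calls `pickRandom()` sends roughly 70% of reads to slave2.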

Master‑Slave Replication Lag Solutions

The primary consistency risk is that the master commits a write while a slave has not yet synchronized, causing stale reads.

Solution Options

Force master for real‑time queries: define a custom annotation and AOP aspect to route critical reads to the master.

Transactional master routing: ShardingSphere defaults to using the master for all reads and writes within the same transaction, eliminating intra‑transaction lag.

Delay‑threshold routing: monitor slave lag; if it exceeds a threshold, automatically exclude the slave and route reads to the master.

Business‑level eventual consistency: tolerate short delays for non‑critical queries while forcing critical queries to the master, achieving a cost‑effective production solution.
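
Delay-threshold routing can be sketched as a filter over measured lag values. The example assumes that some monitor already collects lag in seconds per slave, for instance from `Seconds_Behind_Source` in `SHOW REPLICA STATUS` (`Seconds_Behind_Master` in `SHOW SLAVE STATUS` on older MySQL versions). The class `LagAwareSelector` and the convention that a `null` lag means "replication not running" are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical selector that drops slaves whose replication lag exceeds a threshold. */
final class LagAwareSelector {
    private final String master;
    private final long maxLagSeconds;

    LagAwareSelector(String master, long maxLagSeconds) {
        this.master = master;
        this.maxLagSeconds = maxLagSeconds;
    }

    /**
     * Returns the nodes that may serve reads. A null lag value is treated as
     * "unknown or replication stopped" and excludes the slave.
     */
    List<String> readCandidates(Map<String, Long> lagSecondsBySlave) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lagSecondsBySlave.entrySet()) {
            Long lag = entry.getValue();
            if (lag != null && lag <= maxLagSeconds) {
                candidates.add(entry.getKey());
            }
        }
        if (candidates.isEmpty()) {
            candidates.add(master); // every slave is too far behind: read from the master
        }
        return candidates;
    }
}
```

The returned list would then be handed to whichever load-balancing policy is in use, so lag filtering and traffic distribution stay independent of each other.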

Annotation Example for Master‑Only Queries

import java.lang.annotation.*;

// Marks a method whose reads must be routed to the master
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface MasterQuery {}

Combine this annotation with an AOP aspect and ShardingSphere's dynamic routing to enforce master reads.
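
The aspect needs somewhere to record "this thread must read from the master" for the duration of the annotated call. The sketch below shows one way to do that with a thread-local scope. `MasterRouteContext` is a hypothetical helper, not a ShardingSphere class; in a ShardingSphere project the aspect would more likely delegate to the library's own `HintManager`, whose exact method names should be checked against the version in use.

```java
/** Hypothetical thread-local scope that marks the current thread as master-only. */
final class MasterRouteContext implements AutoCloseable {
    // Nesting depth, so an annotated method calling another annotated method works.
    private static final ThreadLocal<Integer> DEPTH = ThreadLocal.withInitial(() -> 0);

    private MasterRouteContext() {
    }

    /** Opens a master-only scope; close it with try-with-resources. */
    static MasterRouteContext forceMaster() {
        DEPTH.set(DEPTH.get() + 1);
        return new MasterRouteContext();
    }

    /** A router would consult this flag before choosing a read node. */
    static boolean isMasterForced() {
        return DEPTH.get() > 0;
    }

    @Override
    public void close() {
        int depth = DEPTH.get() - 1;
        if (depth <= 0) {
            DEPTH.remove(); // avoid leaking state into pooled threads
        } else {
            DEPTH.set(depth);
        }
    }
}
```

An `@Around` advice for `@MasterQuery` would wrap the target call in `try (MasterRouteContext ignored = MasterRouteContext.forceMaster()) { ... }`, which guarantees the flag is cleared even when the method throws.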

Master Failover Strategies

While slaves can fail over automatically, the master remains a single point of failure.

Manual Master‑Slave Switch

When the master crashes, operators promote a slave to master and restart services with updated configuration. This approach is simple and reliable but requires manual intervention.

Automatic Master Failover

Integrate MySQL Group Replication (MGR), Keepalived, or ZooKeeper to achieve:

Automatic master election upon failure.

Virtual IP (VIP) auto‑migration.

A seamless switch that requires no application code or configuration changes.

Suitable for payment, transaction, and order systems that demand high availability.

Conclusion

Effective read‑write splitting goes beyond static routing; it requires fault‑tolerant self‑healing mechanisms such as health checks, automatic failover, node recovery, load balancing, and degradation fallback. ShardingSphere implements all of these mechanisms out‑of‑the‑box, providing a robust solution for high‑concurrency production environments.
