
Comprehensive Guide to Database Horizontal Scaling, Sharding, and High Availability with MariaDB and Keepalived

This article presents a detailed analysis and step‑by‑step implementation of horizontal database scaling, including sharding strategies, shutdown and stop‑write plans, log‑based migration, dual‑write approaches, and a smooth 2N expansion method, while also covering MariaDB master‑master configuration, dynamic data source addition, and Keepalived high‑availability setup.


1. Expansion Scheme Analysis

In the early stage of the project, three databases A, B, C were deployed and data was evenly distributed using uid % 3 sharding in the service layer.

As user volume grows, the databases reach their storage limits, and further writes cause CRUD operations to slow down, affecting service response time.

1.1 Expansion Issues

Adding a new node D expands the cluster from 3 to 4 databases, changing the sharding rule to uid % 4. Most existing data no longer matches its original shard, requiring massive data migration.

For example, uid=3 was stored in database A (3%3=0). After adding D, uid=3 maps to D (3%4=3).

A uid keeps its shard only when uid % 3 == uid % 4, which holds for just 3 of every 12 consecutive uids, so roughly 75% of the data must be migrated, creating heavy load and potential service instability.
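The scale of the remapping is easy to check with a short simulation (a standalone sketch, not from the original service code):

```java
public class ReshardCost {
    // Fraction of uids whose shard changes when the modulus goes from 3 to 4
    static double movedFraction(int sample) {
        int moved = 0;
        for (int uid = 0; uid < sample; uid++) {
            if (uid % 3 != uid % 4) moved++;   // row lands on a different shard
        }
        return (double) moved / sample;
    }

    public static void main(String[] args) {
        // 1,200,000 is divisible by 12, so the fraction is exact
        System.out.printf("moved: %.0f%%%n", 100 * movedFraction(1_200_000));  // prints "moved: 75%"
    }
}
```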

1.2 Shutdown Scheme

Publish announcement to users about the upgrade date.

Stop the service.

Perform offline data migration according to the new sharding rule.

Validate data consistency.

Change the sharding configuration (uid%3 → uid%4).

Restart the service.

Prepare rollback plans for each step.

Service downtime guarantees migration correctness but harms user experience.

1.3 Stop‑Write Scheme

During migration, set each database to read‑only, intercept write requests, and return a maintenance notice.

Enable read‑write separation.

Publish upgrade announcement.

Intercept write operations in the service layer.

Perform data synchronization (copy).

Validate data.

Change sharding configuration (uid%3 → uid%4) via configuration center without restarting services.

Restore write capability.

Delete redundant data.

Rollback plans for each step.

Drawback: long stop‑write period and heavy data copy.
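The write interception above can be sketched as a service-layer guard flag that the configuration center flips when the stop-write window opens; class and method names here are illustrative, not from the original system:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WriteGuard {
    // Flipped from the configuration center when the stop-write window opens
    private final AtomicBoolean readOnly = new AtomicBoolean(false);

    public void enterMaintenance() { readOnly.set(true); }
    public void leaveMaintenance() { readOnly.set(false); }

    // Called in front of every write operation; reads are never blocked
    public void checkWritable() {
        if (readOnly.get()) {
            throw new IllegalStateException("System under maintenance, writes are temporarily disabled");
        }
    }
}
```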

1.4 Log Scheme

Use database binary logs to record data changes (table, primary key, operation type) without storing full row data.

Before upgrade, record modification logs.

During migration, replay logs to the new database.

Validate data after replay.

Advantages: low impact on the running service, ability to throttle replay, and easy rollback.
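Under this scheme a log entry stores only the table, the primary key, and the operation type; replay re-reads the current row from the source and applies it to the target. A minimal in-memory sketch (types and names are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangeLogReplay {
    enum Op { INSERT, UPDATE, DELETE }

    // What the log stores: no row data, just enough to re-fetch the row
    record Change(String table, long pk, Op op) {}

    // Replay against the target; the current source row wins, deletes remove
    static Map<Long, String> replay(List<Change> log, Map<Long, String> source, Map<Long, String> target) {
        for (Change c : log) {
            if (c.op() == Op.DELETE) {
                target.remove(c.pk());
            } else {
                target.put(c.pk(), source.get(c.pk()));  // re-read current row from source
            }
        }
        return target;
    }

    public static void main(String[] args) {
        Map<Long, String> source = Map.of(1L, "alice-v2", 3L, "carol");
        List<Change> log = List.of(
                new Change("user", 1L, Op.UPDATE),
                new Change("user", 2L, Op.DELETE),
                new Change("user", 3L, Op.INSERT));
        System.out.println(replay(log, source, new HashMap<>(Map.of(2L, "bob"))));
    }
}
```

Because replay is driven by the log rather than by the source tables, it can be throttled batch by batch, which is what keeps the impact on the running service low.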

1.5 Dual‑Write Scheme (small‑to‑medium data)

Implement dual writes via Canal or MQ. Steps include adding a new node, initial data migration, enabling Canal for incremental sync, switching traffic with Nginx, and handling exception data.

1.6 Smooth 2N Scheme (large data)

Use master‑master replication with two primary nodes (A and B) each having a slave (A0, B0). After adding slaves as new masters, adjust sharding rules so that uid%4=0/2 map to A/A0 and uid%4=1/3 map to B/B0, eliminating the need for full data migration.
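The reason no bulk copy is needed can be checked directly: assuming the original pair was sharded by uid % 2, every uid that the new uid % 4 rule routes to a promoted slave was previously written to the master that slave replicates. A small sketch (host labels are illustrative):

```java
public class Smooth2N {
    // Old rule: two masters A and B
    static String oldRoute(long uid) { return uid % 2 == 0 ? "A" : "B"; }

    // New rule: slaves promoted to masters, shard count doubled
    static String newRoute(long uid) {
        switch ((int) (uid % 4)) {
            case 0: return "A";
            case 2: return "A0";  // was A's slave, already holds A's data
            case 1: return "B";
            case 3: return "B0";  // was B's slave, already holds B's data
            default: throw new AssertionError();
        }
    }

    // The master whose data each node already holds via replication
    static String replicatedFrom(String node) {
        return node.startsWith("A") ? "A" : "B";
    }

    public static void main(String[] args) {
        for (long uid = 0; uid < 10_000; uid++) {
            // Every row is already present on its new home: no copy required
            if (!replicatedFrom(newRoute(uid)).equals(oldRoute(uid))) {
                throw new AssertionError("uid " + uid + " would need migration");
            }
        }
        System.out.println("all uids already on their new shard");
    }
}
```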

2. Smooth 2N Expansion Practice

2.1 Implement Application‑Level Dynamic Scaling

2.1.1 MariaDB Service Installation

yum -y install wget
# Backup CentOS-Base.repo
mv /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak

wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
wget -P /etc/yum.repos.d/ http://mirrors.aliyun.com/repo/epel-7.repo

yum clean all

yum makecache

Configure MariaDB YUM repository, install MariaDB server, and run mysql_secure_installation to set root password and remote access.

2.1.2 MariaDB Master‑Master Sync

On Server1 (ID=1) add:

[mysqld]
server-id = 1
log-bin = mysql-bin
relay-log = mysql-relay-bin
replicate-wild-ignore-table = mysql.%
replicate-wild-ignore-table = information_schema.%
log-slave-updates = on
slave-skip-errors = all
auto-increment-offset = 1
auto-increment-increment = 2
binlog_format = mixed
expire_logs_days = 10

On Server2 (ID=2) use the same configuration with server-id = 2 and auto-increment-offset = 2. Create a replication user replica on each server and grant it the REPLICATION SLAVE privilege.
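The two auto-increment settings interleave the key sequences so both masters can accept writes without ever generating the same primary key. The effect can be sketched as (a standalone illustration, not part of the deployment):

```java
public class AutoIncrement {
    // Keys a server generates with the given auto-increment offset and increment
    static long[] keys(int offset, int increment, int n) {
        long[] out = new long[n];
        for (int i = 0; i < n; i++) out[i] = offset + (long) i * increment;
        return out;
    }

    public static void main(String[] args) {
        // Server1 (offset 1): 1, 3, 5, 7, 9   Server2 (offset 2): 2, 4, 6, 8, 10
        System.out.println(java.util.Arrays.toString(keys(1, 2, 5)));
        System.out.println(java.util.Arrays.toString(keys(2, 2, 5)));
    }
}
```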

Set master info on each server:

CHANGE MASTER TO master_host='192.168.116.141', master_user='replica', master_password='replica', master_port=3306, master_log_file='mysql-bin.000007', master_log_pos=374, master_connect_retry=30;
START SLAVE;

2.1.3 Keepalived Installation and HA Configuration

Install Keepalived on both nodes:

yum -y install keepalived

Configure /etc/keepalived/keepalived.conf (example for Server1):

global_defs {
    router_id vip1
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 111
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 6666
    }
    virtual_ipaddress {
        192.168.116.150
    }
}

virtual_server 192.168.116.150 3306 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 192.168.116.140 3306 {
        notify_down "/usr/local/shell/mariadb.sh"
        weight 1
        TCP_CHECK {
            connect_timeout 10
            retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
}

Create /usr/local/shell/mariadb.sh so that a MariaDB failure takes Keepalived down with it, letting the VIP fail over to the standby node:

#!/bin/bash
# MariaDB health check failed: stop keepalived so the VIP moves to the peer
pkill keepalived

2.1.4 Application Service Engineering

Introduce ShardingJDBC (a lightweight Java sharding framework) to handle database routing. Configure data sources, sharding rules, and dynamically add new data sources at runtime.

// Dynamic data source addition example
Properties props = loadPropertiesFile("datasource1.properties");
DruidDataSource ds = (DruidDataSource) DruidDataSourceFactory.createDataSource(props);
OrchestrationShardingDataSource ods = SpringContextUtil.getBean("tradeSystemDataSource", OrchestrationShardingDataSource.class);
Map<String, DataSource> dsMap = ods.getDataSource().getDataSourceMap();
dsMap.put("ds2", ds);
// Refresh the sharding rule so the new data source takes effect
ods.renew(new DataSourceChangedEvent("/config/schema/logic_db/datasource", dsConfigMap));

Note limitations of ShardingJDBC: limited sub‑query support, no HAVING, no UNION/UNION ALL, limited INSERT syntax, etc.

2.2 Implement Database Second‑Level Smooth 2N Expansion

2.2.1 Add Database VIP

On Server2, modify /etc/keepalived/keepalived.conf to add a new virtual IP (e.g., 192.168.116.151) with a unique virtual_router_id and priority.

2.2.2 Application Service Adds Dynamic Data Source

Update the application configuration to point a new data source to the newly created VIP (192.168.116.151) and trigger a runtime reload.

2.2.3 Remove Original Master‑Master Sync

STOP SLAVE;   -- on both original servers

Verify that writes are no longer replicated between the original pair.

2.2.4 Install New MariaDB Expansion Servers (Server3 & Server4)

Install MariaDB on the new VMs following the steps in 2.1.1. Configure server-id as 3 and 4 respectively, enable binlog, and set auto‑increment offsets to avoid primary‑key collisions.

[mysqld]
server-id = 3
log-bin = mysql-bin
relay-log = mysql-relay-bin
auto-increment-offset = 2
auto-increment-increment = 2
log-slave-updates = on

Perform a full dump from the original master, import it to the new server, and set up replication:

CHANGE MASTER TO master_host='192.168.116.140', master_user='replica', master_password='replica', master_log_file='mysql-bin.000016', master_log_pos=1754;
START SLAVE;

2.2.5 Add Keepalived for High Availability on New Nodes

Configure Keepalived on Server3 and Server4 similarly to the previous nodes, using distinct router_id and virtual_router_id values, and point the virtual IP to the new servers.

2.2.6 Clean Data and Verify

Use the application to perform a dynamic data‑source switch, then clean up old data (e.g., delete rows where accountNo % 2 != 0 on Server1 and accountNo % 2 != 1 on Server2) and verify consistency across all nodes.
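The cleanup step amounts to each node keeping only the rows its shard owns. A minimal sketch of that predicate (in production this would be a batched DELETE with the same modulus condition; names are illustrative):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ShardCleanup {
    // Rows this node must KEEP after the resharding switch
    static boolean belongsHere(long accountNo, int nodeShard, int shardCount) {
        return accountNo % shardCount == nodeShard;
    }

    // Simulate cleanup on one node: everything outside its shard is deleted
    static List<Long> cleanup(List<Long> rows, int nodeShard, int shardCount) {
        return rows.stream()
                .filter(acc -> belongsHere(acc, nodeShard, shardCount))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> rows = List.of(10L, 11L, 12L, 13L, 14L);
        System.out.println(cleanup(rows, 0, 2));  // Server1 keeps even accountNo
        System.out.println(cleanup(rows, 1, 2));  // Server2 keeps odd accountNo
    }
}
```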

3. Keepalived High‑Availability Configuration Overview

Below are sample keepalived.conf files for the four servers involved.

Server1 (192.168.116.140)

global_defs {
    router_id vip1
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 111
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 6666
    }
    virtual_ipaddress {
        192.168.116.150
    }
}

virtual_server 192.168.116.150 3306 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 192.168.116.140 3306 {
        notify_down "/usr/local/shell/mariadb.sh"
        weight 1
        TCP_CHECK {
            connect_timeout 10
            retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
}

Server2 (192.168.116.141)

global_defs {
    router_id vip2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 111
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 6666
    }
    virtual_ipaddress {
        192.168.116.150
    }
}

virtual_server 192.168.116.150 3306 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 192.168.116.141 3306 {
        notify_down "/usr/local/shell/mariadb.sh"
        weight 1
        TCP_CHECK {
            connect_timeout 10
            retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
}

Server3 (192.168.116.142) – New Node

global_defs {
    router_id vip3
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 111
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 6666
    }
    virtual_ipaddress {
        192.168.116.150
    }
}

virtual_server 192.168.116.150 3306 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 192.168.116.142 3306 {
        notify_down "/usr/local/shell/mariadb.sh"
        weight 1
        TCP_CHECK {
            connect_timeout 10
            retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
}

Server4 (192.168.116.143) – New Node

global_defs {
    router_id vip4
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 112
    priority 98
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 6666
    }
    virtual_ipaddress {
        192.168.116.151
    }
}

virtual_server 192.168.116.151 3306 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 192.168.116.143 3306 {
        notify_down "/usr/local/shell/mariadb.sh"
        weight 1
        TCP_CHECK {
            connect_timeout 10
            retry 3
            delay_before_retry 3
            connect_port 3306
        }
    }
}

All servers are configured as BACKUP; the node with the highest priority becomes MASTER and holds the virtual IP, providing seamless failover.

Tags: Database, Sharding, High Availability, Scaling, MariaDB, Keepalived
Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.