Databases 12 min read

How Alibaba Cloud Scaled SQL Server with AlwaysOn for Read/Write Separation

This article details Alibaba Cloud's evolution of SQL Server RDS, covering product growth, the challenges of read/write separation, technical evaluations of AlwaysOn versus Transactional Replication, cloud architecture iterations, and the final productized solution for high‑availability database services.

dbaplus Community
dbaplus Community
dbaplus Community
How Alibaba Cloud Scaled SQL Server with AlwaysOn for Read/Write Separation

Author: Wang Fangming, Alibaba Cloud technical expert, former DBA and product developer now leading RDS SQL Server development.

1. Product Rapid Development (2015‑2017)

Facing large‑scale customers whose peak workloads exceeded the limits of single‑node RDS specifications, Alibaba Cloud needed a scale‑out solution for SQL Server. Because SQL Server lacks mature middleware, traditional sharding or read‑write splitting required extensive application changes, which customers could not quickly adopt.

Key product improvements during this period included:

Replacing the underlying architecture to enhance stability and user experience.

Offering multiple SQL Server versions (2008R2, 2012, 2014, 2016) and editions (Web, Standard, Enterprise).

Providing cloud migration tools and full/incremental backup‑to‑cloud solutions for various versions.

Adding numerous stored procedures to expose advanced features while handling security and mirroring constraints.

Developing the SQL Server CloudDBA service to automate performance and space‑related issue resolution.

2. Read/Write Separation Concept

Unlike MySQL, which often uses middleware for transparent routing, SQL Server does not have a mature middleware layer because its Tabular Data Stream (TDS) protocol is not fully open. Any read/write separation therefore requires application‑level adaptation, even when using AlwaysOn, due to differences in driver configuration.

3. Technical Evaluation

The team compared all relevant SQL Server technologies, focusing on data security, high availability (HA), disaster recovery (DR), and the ability for replicas to serve reads.

Two candidates emerged:

Transactional Replication – logical replication with fine‑grained control but high complexity and limited controllability at table/column level.

AlwaysOn – native HA/DR solution offering both Failover Cluster Instances (FCI) and Availability Groups (AG).

After weighing performance, latency, reliability, and operational complexity, the team selected AlwaysOn for productization.

4. AlwaysOn Architecture

AlwaysOn consists of two models:

Failover Cluster Instances (FCI) – share‑storage, instance‑level synchronization.

Availability Groups (AG) – share‑nothing, database‑level synchronization.

Both rely on Windows Server Failover Clustering (WSFC). The synchronization process involves Primary node log writes, log capture, transmission to replicas, log receive, and redo on secondary nodes. Synchronous mode waits for ACK from secondary; asynchronous mode does not.

If a secondary becomes unreachable, the primary degrades to an asynchronous‑like mode, marking the replica state as NOT SYNCHRONIZING or DISCONNECT via system views (e.g., sys.dm_hadr_database_replica_states, sys.database_mirroring). DBA tools must detect this and, once the secondary recovers, resume synchronization until the Last Hardened LSN matches the primary.

SQL Server 2017 CU1 introduced the REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT parameter, forcing the primary to wait for all synchronized secondaries before committing.

5. Cloud Evolution – Architecture Iterations

Version 1 used ECS, SSD disks, OSS, VPC, SLB, and a classic WSFC+AD setup supporting SQL Server 12‑17.

Drawbacks: high cost (multiple AD nodes) and sensitivity to network jitter causing WSFC failures.

Version 2 introduced HAVIP for listener handling and replaced AD with certificate‑based authentication, reducing resource usage.

However, it still suffered from network instability across zones and regions.

Version 3 eliminated WSFC and AD entirely, relying only on cloud primitives and SQL Server itself. This reduced network‑jitter impact and allowed monitoring via SQL Server performance counters. Because the architecture lacks native clustering, a custom HA solution was built to provide high availability without WSFC.

6. Productization

The final architecture guarantees two synchronous replicas (placed in different availability zones) for HA, with up to seven asynchronous read‑only replicas. Three access patterns are offered:

Read/Write Path: directs traffic to the two synchronous nodes.

Unified Read Path: aggregates selected replicas and distributes reads based on weighted routing.

Single Read Path: provides a dedicated endpoint per read‑only replica, allowing clients to target the nearest zone.

This solution brings SQL Server AlwaysOn capabilities to Alibaba Cloud PaaS, currently supporting core features with room for future enhancements.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

high availabilityRead-Write SeparationDatabase ReplicationAlibaba CloudAlwaysOnSQL Server
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.