Mastering Alibaba Cloud SLB: Build High‑Availability Load Balancing with Terraform
This guide walks through Alibaba Cloud SLB's architecture, product variants, and environment prerequisites, then provisions CLB, ALB, and NLB step by step with Terraform, covering health checks, HTTPS setup, traffic routing, performance testing, best practices, security hardening, monitoring, and disaster-recovery procedures.
Overview
When traffic exceeds what a single server can handle, or when high availability is required, load balancing becomes essential. Alibaba Cloud Server Load Balancer (SLB) is one of the most widely used cloud load balancers in China and is designed to handle very large volumes of internet traffic.
For example, an e-commerce platform behind SLB handled 500,000 requests per second during Double-11 2024 by scaling from 20 to 200 ECS instances, while maintaining 99.99% availability.
SLB offers Layer 4 (TCP/UDP) and Layer 7 (HTTP/HTTPS) modes. Layer 4 suits performance‑critical scenarios; Layer 7 provides richer traffic management such as URL‑based routing, cookie session persistence, and HTTPS offloading.
Product Variants
CLB (Classic Load Balancer): the classic offering; supports both Layer 4 and Layer 7; mature and stable.
ALB (Application Load Balancer): application-focused, Layer 7 only, with richer routing rules.
NLB (Network Load Balancer): network-focused, Layer 4 only, ultra-high performance.
Recommendation (2025): use ALB or NLB for new services and keep CLB for legacy workloads.
Key Features
Elastic scaling: ALB and NLB scale automatically without manual capacity changes and are billed by actual usage; CLB is billed by specification, and its capacity is bounded by the specification you choose.
Multi-AZ disaster recovery: cross-AZ deployment with automatic failover (primary/backup or active-active modes).
Health checks: Layer 4 (TCP/UDP) and Layer 7 (HTTP/HTTPS) checks, with automatic isolation of unhealthy servers.
Applicable Scenarios
Web applications – ALB with HTTPS listener + URL routing.
API gateways – ALB with multi‑domain, forwarding rules, rate limiting.
Game services – NLB with UDP listener and session persistence.
Database proxy – NLB with TCP listener and backend server group.
Hybrid cloud – CLB with VPN gateway integration.
Micro‑services – ALB with gRPC support and service discovery.
Environment Requirements
A VPC has already been created.
At least two availability zones are available for a high-availability deployment.
ECS instances are running and ready to serve as backend servers.
Security groups allow SLB health-check traffic from 100.64.0.0/10.
The Alibaba Cloud account has completed real-name verification, and the SLB service is activated.
A RAM user or role has permissions for SLB operations.
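Several of these requirements come down to CIDR arithmetic. Two common checks are whether an address seen in a backend's access log belongs to the SLB health-check range 100.64.0.0/10, and whether a planned vSwitch CIDR fits inside the VPC. The following is a minimal POSIX shell sketch. The names ip_to_int and cidr_contains are hypothetical helpers, not part of any Alibaba Cloud tool, and the sketch assumes well-formed IPv4 input.

```shell
#!/bin/sh
# Sketch: test whether an IPv4 address or CIDR lies inside another CIDR,
# using only POSIX shell arithmetic (no external tools).

# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the dotted quad into $1..$4
  IFS=$old_ifs
  printf '%s\n' $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# cidr_contains OUTER INNER
#   OUTER: a CIDR such as 100.64.0.0/10
#   INNER: a single address (treated as /32) or a CIDR such as 10.0.1.0/24
# Prints "yes" if INNER lies entirely inside OUTER, otherwise "no".
cidr_contains() {
  outer_net=${1%/*}; outer_len=${1#*/}
  inner_net=${2%/*}; inner_len=32
  case $2 in */*) inner_len=${2#*/} ;; esac
  # A shorter prefix (larger block) can never fit inside a longer one.
  if [ "$inner_len" -lt "$outer_len" ]; then
    printf '%s\n' no
    return
  fi
  # Network mask for OUTER, truncated to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  o=$(ip_to_int "$outer_net")
  i=$(ip_to_int "$inner_net")
  if [ $(( (o & mask) == (i & mask) )) -eq 1 ]; then
    printf '%s\n' yes
  else
    printf '%s\n' no
  fi
}
```

For example, cidr_contains 100.64.0.0/10 100.100.5.6 prints yes, so such a request came from the health-check range. Similarly, cidr_contains 10.0.0.0/8 10.0.30.0/24 prints yes, which confirms that the Zone C subnet in the plan below fits inside the VPC.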
Detailed Steps
1. Preparation
VPC network planning
VPC: 10.0.0.0/8
├── Zone A
│ ├── Public subnet: 10.0.1.0/24 (SLB, NAT)
│ └── Private subnet: 10.0.10.0/24 (ECS)
├── Zone B
│ ├── Public subnet: 10.0.2.0/24 (SLB backup)
│ └── Private subnet: 10.0.20.0/24 (ECS)
└── Zone C
    └── Private subnet: 10.0.30.0/24 (ECS expansion)
Backend server preparation
# Check web service status
systemctl status nginx
# Verify port listening
ss -tlnp | grep ':80\|:443'
# Test local service
curl -I http://localhost/health
# Ensure the security group allows SLB health-check traffic (100.64.0.0/10)
Terraform infrastructure
# main.tf (excerpt)
provider "alicloud" {
  region = "cn-hangzhou"
}
resource "alicloud_vpc" "main" {
  vpc_name   = "prod-vpc"
  cidr_block = "10.0.0.0/8"
}
resource "alicloud_vswitch" "zone_a" {
  vpc_id       = alicloud_vpc.main.id
  cidr_block   = "10.0.10.0/24"
  zone_id      = "cn-hangzhou-h"
  vswitch_name = "prod-vsw-a"
}
/* similar definitions for zone_b, security groups, and ECS instances */
2. Core Configuration
Create CLB instance (console)
Log in to the SLB console → Instance Management → Create Load Balancer.
Select configuration:
Instance type: Classic Load Balancer (CLB)
Specification: performance‑guaranteed (choose per workload)
Network: public or private
Primary AZ: cn‑hangzhou‑h
Backup AZ: cn‑hangzhou‑i
Create CLB with Terraform
resource "alicloud_slb_load_balancer" "main" {
  load_balancer_name = "prod-clb"
  address_type       = "internet"
  load_balancer_spec = "slb.s3.medium"
  vswitch_id         = alicloud_vswitch.zone_a.id
  # Primary/backup zones: traffic fails over to the slave zone if the master zone fails
  master_zone_id = "cn-hangzhou-h"
  slave_zone_id  = "cn-hangzhou-i"
  tags           = { Environment = "prod" }
}
resource "alicloud_slb_listener" "http" {
  load_balancer_id = alicloud_slb_load_balancer.main.id
  backend_port     = 80
  frontend_port    = 80
  protocol         = "http"
  bandwidth        = -1 # -1 = no bandwidth cap (pay-by-traffic instances)
  # Session persistence: SLB inserts a cookie valid for 86400 s (1 day)
  sticky_session      = "on"
  sticky_session_type = "insert"
  cookie_timeout      = 86400
  # HTTP health check against /health on port 80
  health_check              = "on"
  health_check_type         = "http"
  health_check_uri          = "/health"
  health_check_connect_port = 80
  healthy_threshold         = 3
  unhealthy_threshold       = 3
  health_check_timeout      = 5 # seconds
  health_check_interval     = 2 # seconds
  health_check_http_code    = "http_2xx,http_3xx"
  gzip                      = true
  request_timeout           = 60 # seconds
  idle_timeout              = 15 # seconds
}
Create ALB instance (Terraform)
resource "alicloud_alb_load_balancer" "main" {
  vpc_id                 = alicloud_vpc.main.id
  address_type           = "Internet"
  address_allocated_mode = "Dynamic"
  load_balancer_name     = "prod-alb"
  load_balancer_edition  = "Standard"
  load_balancer_billing_config {
    pay_type = "PayAsYouGo"
  }
  # ALB is deployed across at least two zones, one vSwitch per zone
  zone_mappings {
    vswitch_id = alicloud_vswitch.zone_a.id
    zone_id    = "cn-hangzhou-h"
  }
  zone_mappings {
    vswitch_id = alicloud_vswitch.zone_b.id
    zone_id    = "cn-hangzhou-i"
  }
}
Server groups, health checks, sticky sessions, listeners, HTTPS certificates, and redirect rules are defined similarly (full snippets omitted for brevity).
3. Validation
Check SLB status via CLI
# Describe load balancers
aliyun slb DescribeLoadBalancers --RegionId cn-hangzhou --LoadBalancerId lb-xxx
# Describe listeners
aliyun slb DescribeLoadBalancerListeners --RegionId cn-hangzhou --LoadBalancerId lb-xxx
# Health status
aliyun slb DescribeHealthStatus --RegionId cn-hangzhou --LoadBalancerId lb-xxx --ListenerPort 80
Functional testing
# Get public IP
SLB_IP=$(aliyun slb DescribeLoadBalancers --RegionId cn-hangzhou --LoadBalancerId lb-xxx | jq -r '.LoadBalancers.LoadBalancer[0].Address')
# HTTP request
curl -I http://$SLB_IP/
# Session persistence test
curl -c cookie.txt http://$SLB_IP/
for i in {1..5}; do curl -b cookie.txt -s http://$SLB_IP/server-info | jq '.hostname'; done
# HTTPS test
curl -I https://www.example.com/
Stress testing
# wrk
wrk -t12 -c400 -d30s http://$SLB_IP/
# ApacheBench
ab -n 10000 -c 100 http://$SLB_IP/
Example Configurations
Production‑grade ALB (Terraform)
# variables.tf, main.tf, server groups, listeners, forwarding rules, ACL, DDoS protection, alarms, etc.
Full snippets are available in the original article.
Best Practices & Caveats
Performance Optimisation
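Choose an appropriate CLB specification. For example, slb.s3.medium is rated for up to 500,000 concurrent connections, 50,000 new connections per second, and 30,000 QPS.
Set the health-check interval to 2 s, the timeout to 5 s, and the healthy/unhealthy thresholds to 3/3 for fast failure detection.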
Enable HTTP Keep‑Alive on backend servers.
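To see what these health-check settings mean in seconds, the sketch below applies a time-window formula taken from Alibaba Cloud's CLB health-check documentation. The formula is window = per-check time × threshold + interval × (threshold - 1). It is assumed here; check it against the current docs. The per-check time is the response timeout when computing the failure window, and the backend's response time when computing the success window. hc_window is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: estimate CLB health-check time windows, assuming the formula
#   window = per_check_seconds * threshold + interval * (threshold - 1)
# per_check_seconds = response timeout (failure window)
#                   or backend response time (success window).
# Shell arithmetic is integer-only, so all values are whole seconds.

# hc_window PER_CHECK_SECONDS INTERVAL_SECONDS THRESHOLD
hc_window() {
  printf '%s\n' $(( $1 * $3 + $2 * ($3 - 1) ))
}

# Settings from the CLB listener configured earlier: timeout 5 s, interval 2 s, thresholds 3/3.
echo "failure window: $(hc_window 5 2 3) s"   # prints "failure window: 19 s"
echo "success window: $(hc_window 1 2 3) s"   # prints "success window: 7 s" (response time rounded up to 1 s)
```

Under this assumption, a backend that stops answering is taken out of rotation after about 19 seconds. Lowering the timeout shortens the window more than lowering the interval does, because the timeout is multiplied by the full threshold.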
Security Hardening
Use a TLS 1.2+ cipher policy (e.g., tls_cipher_policy_1_2).
Configure an ACL whitelist for office IP ranges.
Enable DDoS protection via alicloud_ddoscoo_instance.
High‑Availability Design
Deploy across at least two AZs.
Distribute backend ECS instances evenly.
Test failover by shutting down one AZ and verifying automatic switch.
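One way to make the failover test measurable is to probe the SLB address at a fixed interval while one zone is taken down, then count the failed requests. Below is a minimal POSIX shell sketch. probe_loop is a hypothetical helper, and the caller supplies the probe command, for example curl against the /health endpoint used by the health check.

```shell
#!/bin/sh
# Sketch: run a probe command COUNT times, INTERVAL seconds apart, and
# report how many probes succeeded or failed. Passing the probe command in
# keeps the loop independent of curl and easy to test with stubs.

# probe_loop COUNT INTERVAL_SECONDS CMD [ARGS...]
# Prints "ok=N failed=M".
probe_loop() {
  count=$1; interval=$2; shift 2
  ok=0; failed=0; n=0
  while [ "$n" -lt "$count" ]; do
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
    fi
    n=$((n + 1))
    # Sleep between probes, but not after the last one.
    if [ "$n" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
  printf 'ok=%s failed=%s\n' "$ok" "$failed"
}
```

During the drill, run something like probe_loop 120 1 curl -fsS --max-time 2 "http://$SLB_IP/health". curl's -f flag turns HTTP errors into a non-zero exit status. The failed count then approximates how many requests were lost while SLB moved traffic away from the stopped zone.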
Troubleshooting & Monitoring
Common Issues
Health check failure: most often the security group does not allow 100.64.0.0/10.
502 Bad Gateway: backend error or timeout; check the backend service and, if needed, increase request_timeout.
504 Gateway Timeout: backend processing is too slow; optimise it or increase the timeout.
Session not sticky: cookie misconfiguration; verify the sticky_session settings.
HTTPS certificate error: certificate mismatch or expiry; renew or replace the certificate.
Connection exhaustion: CLB specification too small; upgrade it.
High latency: cross-region traffic; use GTM for nearest-region routing.
Uneven traffic: misconfigured weights or session persistence; adjust them.
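When diagnosing uneven traffic, it helps to know what distribution the weights should produce. Assume the listener uses weighted round robin (wrr) scheduling and session persistence is off. Each backend's expected share of requests is then its weight divided by the sum of all weights. Compare that share with per-backend request counts from the access logs: if the two agree, the weights are behaving as configured and the weights themselves need adjusting. If the counts deviate from the expected shares, suspect session persistence. The sketch below is a hypothetical helper, not an SLB feature.

```shell
#!/bin/sh
# Sketch: expected traffic share per backend under weighted round robin,
# i.e. weight / sum(weights), printed as an integer percentage (rounded down).

# expected_shares WEIGHT...
# Prints one line per backend: "<index> <percent>%".
expected_shares() {
  total=0
  for w in "$@"; do total=$((total + w)); done
  if [ "$total" -le 0 ]; then
    echo "error: weights must sum to more than 0" >&2
    return 1
  fi
  idx=1
  for w in "$@"; do
    printf '%s %s%%\n' "$idx" $(( w * 100 / total ))
    idx=$((idx + 1))
  done
}
```

For example, expected_shares 100 100 50 prints 40%, 40%, and 20% for backends 1 to 3.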
Health‑Check Debugging Steps
# Verify security group
aliyun ecs DescribeSecurityGroupAttribute --RegionId cn-hangzhou --SecurityGroupId sg-xxx --Direction ingress | grep 100.64
# Simulate health check from another ECS
curl -I http://backend-ip:80/health
# Check backend service status
ssh backend-server "systemctl status nginx"
ssh backend-server "curl -I localhost/health"
Performance Monitoring
QPS, ActiveConnection, NewConnection, TrafficRX/TX: alert when above 80% of the instance specification.
StatusCode5xx: alert when above 1% of total requests.
Rt (average response time): alert when above 500 ms.
UnhealthyServerCount: alert when 1 or more.
Terraform examples create CMS alarms for these metrics.
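The alarms encode simple threshold rules, and the same rules can be checked by hand against one sample of metric values, for example when testing alarm settings. This minimal sketch assumes you have already fetched the numbers, for instance from CloudMonitor. It calls no API, and check_thresholds is a hypothetical helper name. Integer arithmetic avoids floating point: "above 80%" becomes qps × 100 > spec × 80.

```shell
#!/bin/sh
# Sketch: apply the alert thresholds above to one sample of metric values.
# Inputs are plain integers; no Alibaba Cloud API is called.

# check_thresholds QPS SPEC_QPS REQ_TOTAL REQ_5XX RT_MS UNHEALTHY
# Prints one "ALERT ..." line per breached threshold, or "OK" if none.
check_thresholds() {
  qps=$1; spec_qps=$2; total=$3; err=$4; rt=$5; unhealthy=$6
  alerts=0
  # QPS above 80% of the instance specification
  if [ $((qps * 100)) -gt $((spec_qps * 80)) ]; then
    echo "ALERT qps=$qps exceeds 80% of spec ($spec_qps)"; alerts=1
  fi
  # 5xx responses above 1% of all requests
  if [ "$total" -gt 0 ] && [ $((err * 100)) -gt "$total" ]; then
    echo "ALERT 5xx=$err/$total exceeds 1%"; alerts=1
  fi
  # Average response time above 500 ms
  if [ "$rt" -gt 500 ]; then
    echo "ALERT rt=${rt}ms exceeds 500ms"; alerts=1
  fi
  # Any unhealthy backend server
  if [ "$unhealthy" -ge 1 ]; then
    echo "ALERT unhealthy_servers=$unhealthy"; alerts=1
  fi
  if [ "$alerts" -eq 0 ]; then echo OK; fi
}
```

For example, check_thresholds 1000 30000 10000 5 120 0 prints OK, while check_thresholds 25000 30000 10000 200 600 1 breaches all four rules.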
Backup & Disaster Recovery
Export the configuration with CLI scripts, bring existing resources under Terraform management with terraform import, and use terraform plan/apply for recovery. Switching DNS and updating the CDN origin are also part of the DR workflow.
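The export step can be as small as saving the output of DescribeLoadBalancerAttribute, which lists an instance's listeners and backend servers, to a timestamped file. The sketch below assumes the aliyun CLI is installed and configured. backup_clb is a hypothetical helper name, and per-listener detail calls can be added in the same way.

```shell
#!/bin/sh
# Sketch: snapshot a CLB instance's configuration to a timestamped JSON file.
# Assumes a configured aliyun CLI on PATH.

# backup_clb LOAD_BALANCER_ID REGION OUTDIR
# Prints the path of the written file; returns non-zero on failure.
backup_clb() {
  lb_id=$1; region=$2; outdir=$3
  ts=$(date +%Y%m%d-%H%M%S)
  mkdir -p "$outdir" || return 1
  out="$outdir/${lb_id}-${ts}.json"
  if aliyun slb DescribeLoadBalancerAttribute \
       --RegionId "$region" --LoadBalancerId "$lb_id" > "$out"; then
    printf '%s\n' "$out"
  else
    rm -f "$out"    # don't leave a partial or empty snapshot behind
    return 1
  fi
}
```

Running backup_clb lb-xxx cn-hangzhou ./slb-backup before each change provides a baseline to compare against terraform plan output. The instance itself can then be adopted with terraform import alicloud_slb_load_balancer.main lb-xxx.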
Conclusion
Key Takeaways
Product selection: CLB for legacy, ALB for Layer 7, NLB for high‑performance Layer 4.
High‑availability: multi‑AZ, health checks, balanced backend distribution.
HTTPS: certificate management, TLS policy, HTTP‑to‑HTTPS redirect.
Traffic management: path/header routing, session persistence, canary releases.
Security: ACL, DDoS protection, strict TLS.
Monitoring: critical metrics, alarms, performance analysis.
Further Learning
GTM global traffic management.
DCDN acceleration.
WAF web application firewall.
Service mesh (ALB + ASM).
Kubernetes Ingress with ALB.
MaGe Linux Operations