Case Study: Full Cloud Migration of the "Smart Citizen Pass" Mini‑Program Using Kubernetes and Cloud Services
This article details the background, initial ECS‑based deployment, performance bottlenecks during rapid user growth, and the comprehensive cloud‑native migration—including DNS load balancing, managed Kubernetes, CDN acceleration, RDS/Redis replacement, and NAS/OSS storage—resulting in significant scalability and operational improvements for the Smart Citizen Pass mini‑program.
1. Project Background
During the COVID‑19 pandemic, the company quickly adapted its WeChat mini‑program "Smart Citizen Pass" in mid‑February to support epidemic control modules such as self‑check, isolation check, work‑return declaration, electronic pass, and knowledge base, providing technical support for local governments.
2. Initial Deployment Architecture
At the early stage, a traditional ECS setup with a self‑managed MySQL database, Redis cache, and SpringBoot microservices was used. When user growth was modest, performance and operational costs were acceptable.
Initial architecture diagram (image omitted).
3. Performance Issues During User Surge
From late February, user registrations surged from ~5,000 per day to over 300,000, reaching nearly 3 million total users. The following problems emerged:
Static resource overload on a single Nginx node, causing 499 errors.
Large bundled Vue JS file (≈345 KB) leading to bandwidth consumption of ~268 Mbps per request during peak.
Elastic IP bandwidth limit of 200 Mbps causing long‑running requests.
Redis memory pressure and occasional OOM kills.
JVM GC pauses and high old‑generation memory usage, causing backend latency.
Scaling ECS instances required restarts, causing service interruptions.
4. Full Cloud Migration Architecture
Given the projected QPS growth (current >1,000, expected >2,000), the team decided to migrate all services to a Kubernetes cluster within one day.
4.1 DNS‑Based Smart Load Balancing with Dual Nginx
DNS smart resolution distributes traffic across two Nginx servers, achieving horizontal scaling without a dedicated SLB. An initial SLB trial (1 Gbps, QPS limit 1,000) was dropped in favor of DNS routing, saving costs and improving performance.
4.2 Managed Kubernetes (ACK) Cluster
The Alibaba Cloud Container Service for Kubernetes (ACK) managed version was chosen. Workers are created by the user, while the control plane is fully managed, offering high availability and low operational overhead.
Cluster topology diagram (image omitted).
Allocate 4C16G or 4C32G ECS for Java containers due to high memory consumption.
Use dedicated VPC switches for POD/SVC CIDR isolation.
Prefer cloud enterprise network for cross‑VPC communication.
Avoid SLB unless QPS exceeds 10,000 to reduce hourly costs.
Consider self‑hosted EFK for log storage if cheaper than managed log service.
4.3 CDN for Static Content Acceleration
CDN "Full Site Acceleration" caches static assets at edge nodes, reducing origin load. Peak bandwidth reached ~400 Mbps, static QPS >800, and overall response times improved significantly.
CDN deployment steps (list omitted).
# Deployment environment variables
- env:
- name: CDN_REPLACE_FILES
value: mobile-vue/*/index.html
- name: CDN_SERVER
value: https://****.egova.com.cn
- name: CDN_PREFIX
value: EGOVA_CDN
# Script to replace CDN placeholders
function replace_cdn() {
if [ "${CDN_REPLACE_FILES}" == "" ];then
return 0
fi
[ "$(echo ${CDN_SERVER}|grep -E '^http*|^https*)'|wc -l)" -eq 0 ] && CDN_SERVER="."
echo CDN_SERVER=${CDN_SERVER}
[ "${CDN_PREFIX}" == "" ] && CDN_PREFIX="EGOVA_CDN"
echo CDN_PREFIX=${CDN_PREFIX}
for files in ${CDN_REPLACE_FILES}
do
sed -i "s#${CDN_PREFIX}#${CDN_SERVER}#g" /usr/share/Nginx/html/${files}
done
}4.4 RDS and Cloud Redis Replacement
Local MySQL and Redis were migrated to Alibaba Cloud RDS and cloud Redis using mydumper and myloader , reducing operational overhead. An RDS MySQL 8.0 bug (TempTable engine) caused a temporary error, later fixed by changing the engine to Memory.
# Export local DB
mydumper -u $u -p $password -h $host -G -E -R -B $db -o $dbname/ -v 3 -t 16
# Import to RDS
myloader -h ${pub_host} -u ${pub_user} -p ${pub_password} -B $dbname -q 25000 -d ./$dbname/ -v 3 -t 164.5 NAS (and OSS) for Media Storage
FastDFS was replaced with Alibaba Cloud NAS, providing high‑performance PV for Kubernetes. After CDN integration, media read pressure shifted to CDN. OSS offers lower cost (≈30% of NAS) and can also be mounted as a PV.
5. Post‑Migration Performance
≈3 million total users.
Peak DAU ~800 k.
Login page PV >2.5 million.
SMS verification requests up to 1.9 million per day.
Concurrent online users >40 k.
Peak QPS >3,500; stress‑test QPS reached 13,000.
6. Lessons Learned
Plan for sudden user spikes (e.g., pandemic) with technical reserves for rapid feature rollout.
Select cloud products that match actual traffic; SLB is unnecessary until multi‑million user scale.
Choose billing models wisely (bandwidth vs. traffic) for IP, CDN, and other services.
Kubernetes provides efficient horizontal scaling and cost savings for million‑user applications.
End
Zhengtong Technical Team
How do 700+ nationwide projects deliver quality service? What inspiring stories lie behind dozens of product lines? Where is the efficient solution for tens of thousands of customer needs each year? This is Zhengtong Digital's technical practice sharing—a bridge connecting engineers and customers!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.