
Design and Implementation of an Elastic Scaling Service on Alibaba ECS

This article explains why elastic scaling is needed for variable web traffic, describes how to build a cost‑effective, automatically adjustable service on Alibaba ECS using message queues, service refactoring, Docker deployment, logging, and a real‑time allocation algorithm, and shares practical lessons learned.

Architecture Digest

Throughout this year we have been focusing on cost optimization, and this article shares how we built an elastic, scalable service.

Web traffic fluctuates between peaks and troughs; maintaining enough machines to handle peak loads at all times is expensive, so we need a system that automatically adjusts the number of machines based on traffic.

Our company uses Alibaba ECS, which offers elastic scaling groups billed per second, providing the basic capability for auto‑scaling. The following sections detail the implementation.

Work mode changed from "push" to "pull"

Our scaling strategy relies on real‑time QPS, which we obtain from a message queue. Services that previously used RPC push calls were refactored to pull messages from the queue, as illustrated in the diagram.
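A minimal sketch of the pull model, assuming an in-process `queue.Queue` as a stand-in for the real message queue (the article does not name the broker). The names `worker`, `handle`, and `stop` are hypothetical and only illustrate the shape of a consumer that pulls work instead of receiving RPC pushes.

```python
import queue
import threading


def worker(q, handle, stop):
    """Pull messages from q and process them; return how many were handled.

    q      -- queue.Queue standing in for the real message queue (assumption)
    handle -- callable applied to each message (hypothetical business logic)
    stop   -- threading.Event; once set, the worker exits when the queue is empty
    """
    processed = 0
    while True:
        try:
            # The worker asks for work when it is ready, so it is never
            # handed more than it can process.
            msg = q.get(timeout=0.1)
        except queue.Empty:
            if stop.is_set():
                return processed
            continue
        handle(msg)
        q.task_done()
        processed += 1


def run_demo():
    """Feed five messages to one worker and return (count, results)."""
    q = queue.Queue()
    for i in range(5):
        q.put(i)
    stop = threading.Event()
    stop.set()  # let the worker exit as soon as the queue is drained
    results = []
    count = worker(q, results.append, stop)
    return count, results
```

Because every message passes through the queue, the rate at which messages arrive there can serve as the QPS signal for scaling, and adding or removing a worker needs no change on the producer side.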

Service splitting and smaller machine specs

Our compute services originally ran on high‑spec machines (16 CPU, 16 GB RAM), while the ECS scaling group provides low‑spec instances (up to 4 CPU, adjustable memory). We split compute‑intensive services so they can run on these smaller machines.

Fast service deployment

Because machines need to be added and removed on demand, services must support rapid deployment and teardown; Docker is used for this purpose.
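As a rough illustration, deployment and teardown on a fresh machine might reduce to a handful of Docker CLI calls. The sketch below only builds the command lines; the image name, the container name, and the helpers `deploy_commands`, `teardown_commands`, and `run_all` are hypothetical, and the article does not describe its actual deployment scripts.

```python
import subprocess


def deploy_commands(image, name):
    """Commands to fetch an image and start it as a detached container."""
    return [
        ["docker", "pull", image],
        ["docker", "run", "-d", "--name", name, image],
    ]


def teardown_commands(name):
    """Commands to stop a container and then remove it."""
    return [
        ["docker", "stop", name],
        ["docker", "rm", name],
    ]


def run_all(commands):
    """Run each command in order; raises CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(deploy_commands("registry.example/compute:1.0", "compute-1"))` on a new instance would start the service, and `run_all(teardown_commands("compute-1"))` would take it offline before the machine is released.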

Log collection

Because elastic instances are constantly created and released, any logs left on a released machine would be lost. A log collection service, also packaged with Docker, is therefore deployed alongside the business services to collect logs continuously.

Real‑time allocation algorithm

With the foundational work above in place, we designed an allocation algorithm that decides the number of machines based on the message rate in the queue.
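The message rate itself can be measured with a sliding time window. The following is a sketch under assumptions: the class name `RateWindow` and its methods are hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
from collections import deque


class RateWindow:
    """Average message rate over the most recent window_seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, message_count) pairs, oldest first

    def record(self, timestamp, count=1):
        """Note that count messages arrived at timestamp (seconds)."""
        self.events.append((timestamp, count))

    def rate(self, now):
        """Messages per second over the window ending at now."""
        cutoff = now - self.window
        # Drop samples that have fallen out of the window.
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()
        return sum(count for _, count in self.events) / self.window
```

A longer window smooths out short bursts at the cost of reacting more slowly, so the window length is a tuning choice rather than a fixed rule.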

Each service has a configuration file, for example:

[strategy]
# maximum QPS a single service instance can handle
speed_ratio = 2.5
# approximate time (seconds) to deploy a service
start_time = 300
# number of instances to add per scaling step (minimum 1)
incr_num = 2
# number of machines that must always remain (persistent)
persistence = 16
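A file in this format can be read with Python's standard `configparser`. This is a sketch only: the `Strategy` dataclass and `load_strategy` helper are hypothetical names, and the check on `incr_num` simply mirrors the "minimum 1" note in the comment above.

```python
import configparser
from dataclasses import dataclass

SAMPLE = """
[strategy]
# maximum QPS a single service instance can handle
speed_ratio = 2.5
# approximate time (seconds) to deploy a service
start_time = 300
# number of instances to add per scaling step (minimum 1)
incr_num = 2
# number of machines that must always remain (persistent)
persistence = 16
"""


@dataclass(frozen=True)
class Strategy:
    speed_ratio: float
    start_time: int
    incr_num: int
    persistence: int


def load_strategy(text):
    """Parse the [strategy] section of a service configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["strategy"]
    incr_num = section.getint("incr_num")
    if incr_num < 1:
        raise ValueError("incr_num must be at least 1")
    return Strategy(
        speed_ratio=section.getfloat("speed_ratio"),
        start_time=section.getint("start_time"),
        incr_num=incr_num,
        persistence=section.getint("persistence"),
    )
```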

Algorithm flow:

Measure the message rate A over a recent time window.

Obtain the current number of running service instances S.

Calculate the total supported rate B = S * speed_ratio.

Define a traffic fluctuation margin C (a fixed value based on observation).

If A > B + C, request incr_num new machines and deploy services on them.

If A < B - C, take services offline on incr_num machines and release those machines.

Always keep at least the number of machines specified by persistence.
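The flow above can be sketched as a single decision function. The function name `decide` and its parameter names are hypothetical, the value used for the margin C in the usage note is made up for illustration, and capping the scale-down step so it stops at `persistence` is one possible reading of the last rule.

```python
def decide(rate_a, running_s, speed_ratio, margin_c, incr_num, persistence):
    """Return the change in instance count: positive to add, negative to release.

    rate_a      -- measured message rate A over the recent window
    running_s   -- current number of running service instances S
    speed_ratio -- maximum QPS one instance can handle
    margin_c    -- traffic fluctuation margin C (fixed, chosen by observation)
    incr_num    -- instances to add or release per scaling step
    persistence -- number of machines that must always remain
    """
    capacity_b = running_s * speed_ratio  # total supported rate B
    if rate_a > capacity_b + margin_c:
        return incr_num
    if rate_a < capacity_b - margin_c:
        # Never release below the persistent floor.
        removable = max(running_s - persistence, 0)
        return -min(incr_num, removable)
    return 0  # within the margin: do nothing
```

With the sample configuration and an assumed margin of 5, `decide(50, 16, 2.5, 5, 2, 16)` returns 2 (scale out), `decide(30, 18, 2.5, 5, 2, 16)` returns -2 (scale in), and `decide(42, 16, 2.5, 5, 2, 16)` returns 0. The margin keeps the system from flapping when traffic hovers near the current capacity.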

Issues and considerations

Scaling on the number of unacknowledged messages proved ineffective for us; QPS was the only metric we found reliable.

Handling traffic spikes: for predictable spikes (e.g., major sales events) pre‑allocate many fixed machines and increase incr_num; for abnormal spikes, focus on attack mitigation, traffic filtering, and graceful degradation.

Frequent calls to Alibaba’s instance allocation API can cause failures; add a delay (e.g., 1 s) between calls and implement retry logic.

ECS instance start failures may occur; if an instance does not start within start_time, abandon it and request a new one.
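The last two points can be sketched as two small helpers: one that retries a call with a fixed delay between attempts, and one that gives up on an instance that has not started within `start_time`. The functions `call_with_retry` and `wait_until_started` are hypothetical, `request` and `is_running` stand in for real cloud API calls that the article does not show, and the attempt count and polling interval are assumptions.

```python
import time


def call_with_retry(request, attempts=3, delay=1.0, sleep=time.sleep):
    """Call request(), waiting delay seconds between failed attempts.

    request -- zero-argument callable wrapping the allocation API call
               (hypothetical; catching Exception is for illustration only)
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return request()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay)
    raise RuntimeError(f"request failed after {attempts} attempts") from last_error


def wait_until_started(is_running, start_time, poll=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll is_running() until it returns True or start_time seconds elapse.

    Returns True if the instance started in time, False if it should be
    abandoned so that a replacement can be requested.
    """
    deadline = clock() + start_time
    while clock() < deadline:
        if is_running():
            return True
        sleep(poll)
    return False
```

The `sleep` and `clock` parameters exist only so the helpers can be exercised without real waiting; in normal use the defaults apply.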

Conclusion

After more than six months of stable operation, the elastic scaling service has reduced machine costs by roughly 60‑70%. Combined with earlier algorithm optimizations, overall service costs have dropped over 80%, demonstrating that even on Alibaba ECS, significant cost savings are achievable through thoughtful auto‑scaling design.

Source: http://www.cnblogs.com/haolujun/p/8075226.html
