How Alibaba Cloud’s ECS‑Based FaaS Achieves High‑Density, Low‑Latency Serverless Scaling
This article explains the design of an ECS‑based Function‑as‑a‑Service platform, covering multi‑tenant deployment, rapid horizontal scaling, resource‑utilization optimization, avalanche‑prevention strategies, and high‑density deployment techniques that together enable fast, cost‑effective cloud‑native serverless workloads.
Architecture Overview
In the Alibaba Cloud ECS‑based Function‑as‑a‑Service (FaaS) design, external traffic first reaches an internal SLB (Server Load Balancer), which provides DDoS protection and distributes requests. The SLB forwards each request to an API server, which asks the Scheduler for a container. The Scheduler places containers on worker nodes (ECS compute instances). When a function needs to access a user VPC, the worker node attaches an ENI (Elastic Network Interface) that belongs to the target VPC.
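The request path above can be modelled in a few lines. This is a minimal sketch, not Alibaba's implementation: the class names (Slb, ApiServer, Scheduler, Worker), the round-robin SLB policy, the "first worker with a free slot" placement rule, and the per-worker slot count are all hypothetical, and adding a VPC ID to a set merely stands in for the real ENI attachment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Invocation:
    function: str
    vpc_id: Optional[str] = None  # set only when the function must reach a user VPC


@dataclass
class Worker:
    name: str
    slots: int  # hypothetical capacity of one ECS worker, in containers
    containers: List[str] = field(default_factory=list)
    enis: Set[str] = field(default_factory=set)  # VPCs this worker holds an ENI in


class Scheduler:
    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def place(self, inv: Invocation) -> Worker:
        # Placeholder policy: first worker with a free slot.
        for w in self.workers:
            if len(w.containers) < w.slots:
                w.containers.append(inv.function)
                if inv.vpc_id is not None and inv.vpc_id not in w.enis:
                    w.enis.add(inv.vpc_id)  # stands in for attaching an ENI in the user VPC
                return w
        raise RuntimeError("no worker capacity")


class ApiServer:
    def __init__(self, scheduler: Scheduler):
        self.scheduler = scheduler

    def handle(self, inv: Invocation) -> str:
        # The SLB has already picked this API server; it asks the Scheduler for a container.
        return self.scheduler.place(inv).name


class Slb:
    def __init__(self, api_servers: List[ApiServer]):
        self.api_servers = api_servers
        self.next = 0

    def dispatch(self, inv: Invocation) -> str:
        # Round-robin across API servers (DDoS filtering is not modelled here).
        server = self.api_servers[self.next % len(self.api_servers)]
        self.next += 1
        return server.handle(inv)
```

For example, with workers `[Worker("ecs-1", 1), Worker("ecs-2", 2)]`, a first plain invocation lands on `ecs-1`; a second invocation carrying `vpc_id="vpc-123"` lands on `ecs-2`, which then records an ENI for `vpc-123`.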
Multi‑Tenant Isolation
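Containers rely on Linux namespaces and cgroups for kernel‑level isolation: namespaces give each container its own view of processes, mounts, and the network, while cgroups limit CPU, memory, and device access. Docker packages an application together with its userspace root filesystem into an image, but every container on a host still shares the host kernel. To avoid the security risks of that shared kernel, the ECS‑based FaaS runs a single tenant per ECS instance. This reduces the cross‑tenant attack surface but can lead to low utilization when a tenant's functions are invoked infrequently.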
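The utilization cost of one tenant per instance is easy to quantify. The sketch below compares the number of ECS instances needed with dedicated hosts versus shared hosts; the tenant demands and the 16-container slot count are made-up illustrative numbers, not figures from the source.

```python
import math
from typing import Dict


def dedicated_instances(demand: Dict[str, int], slots: int) -> int:
    """ECS instances needed when every instance serves exactly one tenant."""
    return sum(math.ceil(n / slots) for n in demand.values() if n > 0)


def shared_instances(demand: Dict[str, int], slots: int) -> int:
    """Instances needed if tenants could share a host (ignoring isolation)."""
    return math.ceil(sum(demand.values()) / slots)


def utilization(demand: Dict[str, int], slots: int, instances: int) -> float:
    """Fraction of container slots actually in use."""
    return sum(demand.values()) / (instances * slots)
```

With demand `{"a": 30, "b": 1, "c": 1, "d": 2}` and 16 slots per instance, dedicated hosting needs 5 instances (42.5% utilization), while shared hosting would need 3 (about 71%). The three lightly used tenants each pin an almost-empty instance, which is exactly the low-utilization effect described above.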
Rapid Horizontal Elastic Scaling
Deploy custom runtime containers that embed required languages, SDKs, and libraries, eliminating per‑invocation downloads and reducing startup latency.
Maintain a shared container image repository; write images to ECS snapshots and launch new instances from these snapshots to expand the machine pool quickly.
Pool machines and containers, pre‑start runtimes, delay code mount, and perform early health checks to shrink cold‑start time.
Enforce application size limits and encourage modular business logic; provide built‑in SDKs and libraries to keep functions lightweight.
Use P2P image distribution and on‑demand loading to reduce download latency and avoid overloading central registries.
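The pooling, pre-start, delayed-mount, and early-health-check ideas in the list above can be combined in a small warm-pool sketch. Everything here is hypothetical: `start_fn` stands in for booting a container from an ECS snapshot, and "mounting" code is reduced to setting a path. Real pools refill asynchronously; this one refills only when `refill()` is called.

```python
import collections
from dataclasses import dataclass
from typing import Callable, Deque, Optional, Tuple


@dataclass
class RuntimeContainer:
    runtime: str                     # language runtime baked into the custom image
    healthy: bool = True
    code_path: Optional[str] = None  # None until an invocation claims the container


class WarmPool:
    def __init__(self, runtime: str, target_size: int,
                 start_fn: Callable[[str], RuntimeContainer]):
        self.runtime = runtime
        self.target_size = target_size
        self.start_fn = start_fn  # hypothetical hook that boots a container from a snapshot
        self.idle: Deque[RuntimeContainer] = collections.deque()
        self.refill()

    def refill(self) -> None:
        # Pre-start runtimes ahead of demand; give up after a bounded number of tries.
        attempts = 0
        while len(self.idle) < self.target_size and attempts < 2 * self.target_size:
            attempts += 1
            c = self.start_fn(self.runtime)
            if c.healthy:  # early health check: never hand out a broken container
                self.idle.append(c)

    def acquire(self, code_path: str) -> Tuple[RuntimeContainer, bool]:
        """Return (container, was_cold_start)."""
        if self.idle:
            c, cold = self.idle.popleft(), False
        else:
            c, cold = self.start_fn(self.runtime), True  # pool exhausted: cold start
        c.code_path = code_path  # delayed code mount: attach user code only now
        return c, cold
```

Because the runtime is generic until `acquire` attaches the code path, one pool can serve any function that uses the same runtime image; only an exhausted pool falls back to a cold start.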
Improving Resource Utilization
Fine‑grained analysis (millisecond level) shows that bursty container launches and long cold‑starts cause uneven CPU/memory usage. Optimizations include:
Uniform scheduling to avoid simultaneous container bursts.
Reduce cold‑start latency (target <300 ms) so fewer containers are created in a short window.
Increase deployment density (more containers per host) to raise per‑machine utilization.
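One simple way to approximate "uniform scheduling" is to cap the launch rate on each host so a burst is spread over time instead of hitting one machine at once. The token bucket below is a standard rate-limiting technique swapped in for illustration; the source does not say which mechanism Alibaba uses, and the rate and burst values are assumptions. The clock is passed in explicitly so the behavior is deterministic.

```python
class LaunchSmoother:
    """Token bucket that spreads container launches on one host over time."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s      # sustained launches per second (assumed value)
        self.capacity = burst       # launches allowed back-to-back
        self.tokens = float(burst)
        self.last = 0.0

    def try_launch(self, now: float) -> bool:
        # Refill tokens for the time elapsed since the last call, up to the cap.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should defer, or let the scheduler pick another host
```

With `LaunchSmoother(rate_per_s=10, burst=5)`, eight simultaneous requests at `t=0` yield five launches and three deferrals; one more launch becomes possible at `t=0.1`.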
Disaster Recovery and Avalanche Prevention
Retry storms can amplify load and trigger cascading failures. Mitigation techniques:
Accelerate container startup (e.g., pre‑warmed VM templates).
Deploy across multiple partitions and Availability Zones.
Apply exponential back‑off and circuit‑breaker patterns for retries.
Leverage SLB DDoS protection and multi‑AZ redundancy.
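The back-off and circuit-breaker items can be sketched as follows. "Full jitter" (sleeping a random time up to the exponential bound) is one common variant chosen here as an assumption; the failure threshold and cool-down values are also illustrative. Time is passed in explicitly rather than read from a clock.

```python
import random
from typing import List, Optional


def backoff_delays(base: float, cap: float, attempts: int,
                   rng: random.Random) -> List[float]:
    """Full-jitter exponential back-off: delay i is uniform in [0, min(cap, base * 2**i)]."""
    return [rng.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]


class CircuitBreaker:
    def __init__(self, failure_threshold: int, reset_after: float):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to stay open before trying again
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: reject until the cool-down has passed, then allow trial requests (half-open).
        return now - self.opened_at >= self.reset_after

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures = 0
            self.opened_at = None  # a success closes the breaker
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # (re)open and restart the cool-down
```

Jitter keeps thousands of clients from retrying in lockstep, and the breaker stops retries entirely while a dependency is down, which breaks the retry-storm feedback loop.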
High‑Density Deployment Goals and Challenges
Target metrics for the “ShenLong” high‑density engine:
Launch up to 10,000 containers per second.
Cold‑start latency ≤ 300 ms.
Container lifetime on the order of minutes.
Memory allocation granularity of 128 MB.
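The launch-rate and cold-start targets are linked by Little's law (L = λW): the average number of sandboxes that are mid-startup equals the launch rate times the startup time. The helpers below are illustrative arithmetic, not part of the source system.

```python
import math


def in_flight_starts(launch_rate_per_s: float, cold_start_s: float) -> float:
    """Little's law (L = lambda * W): average number of sandboxes mid-startup."""
    return launch_rate_per_s * cold_start_s


def round_to_granule(memory_mb: int, granule_mb: int = 128) -> int:
    """Round a memory request up to whole allocation units."""
    return math.ceil(memory_mb / granule_mb) * granule_mb
```

At 10,000 launches per second and 0.3 s per cold start, about 3,000 sandboxes are booting at any instant; at the 128 MB minimum each, that is 384,000 MB (375 GiB) held by startups alone. A 1 s cold start would raise this to 10,000 sandboxes, which is why the startup target matters as much as the launch rate. A 200 MB request rounds up to 256 MB under the 128 MB granularity.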
Key challenges:
Secure multi‑tenant isolation on a single host.
Maintain sub‑300 ms startup under extreme concurrency.
Fast VPC network provisioning for containers.
Robust fault tolerance; a single node failure must not affect many tenants.
Secure Container Template Optimization (ShenLong)
Each function runs inside an isolated lightweight VM sandbox with its own Linux kernel, providing strong security boundaries. The system pre‑creates VM templates and uses virtio‑fs to mount function code late, only when the sandbox is assigned to a function. Typical cold‑start time is about 250 ms, and a single host supports about 2,000 containers, with each guest micro‑kernel consuming about 20 MB of memory.
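The density figures imply a concrete memory budget. The sketch below assumes, for illustration only, that every sandbox runs a minimum-size 128 MB function on top of the roughly 20 MB kernel overhead quoted above.

```python
def host_memory_mb(sandboxes: int, kernel_overhead_mb: int, function_mb: int) -> int:
    """Total memory for `sandboxes` microVMs, each paying its own guest-kernel overhead."""
    return sandboxes * (kernel_overhead_mb + function_mb)


def overhead_share(kernel_overhead_mb: int, function_mb: int) -> float:
    """Fraction of each sandbox's footprint spent on its own kernel."""
    return kernel_overhead_mb / (kernel_overhead_mb + function_mb)
```

Under that assumption, 2,000 sandboxes need 296,000 MB (about 289 GiB), of which 40,000 MB (about 39 GiB, roughly 13.5%) goes to guest kernels. Larger functions dilute the kernel share, which is why a small per-kernel footprint matters most at the 128 MB end of the range.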
On‑Demand Code Loading
Code is loaded on demand using a single shared copy per ShenLong node:
FUSE provides a user‑space file system layer for code access.
NAS supplies low‑latency reads for small files; OSS delivers high‑bandwidth bulk downloads.
The code directory is split into an index file and a content file, enabling range‑based reads (e.g., GetRange) for rapid partial loading.
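The index-plus-content layout can be sketched as follows: the index maps each path to an offset and length inside one content blob, so a single file can be fetched with one range read. The JSON index format, the `get_range` callable, and the in-memory cache are assumptions for illustration; they are not the actual ShenLong or OSS interfaces. Note that an HTTP `Range` header uses an inclusive end byte, so a real ranged GET for `[start, end)` would request `bytes=start-(end-1)`.

```python
import json
from typing import Callable, Dict, Tuple


def pack(files: Dict[str, bytes]) -> Tuple[bytes, bytes]:
    """Pack a code directory into (index, content); index maps path -> [offset, length]."""
    index, chunks, offset = {}, [], 0
    for path in sorted(files):
        data = files[path]
        index[path] = [offset, len(data)]
        chunks.append(data)
        offset += len(data)
    return json.dumps(index).encode(), b"".join(chunks)


class LazyCodeReader:
    def __init__(self, index_blob: bytes, get_range: Callable[[int, int], bytes]):
        self.index = json.loads(index_blob)  # small, loaded eagerly
        self.get_range = get_range           # hypothetical: bytes [start, end) of the content object
        self.cache: Dict[str, bytes] = {}
        self.fetches = 0

    def read(self, path: str) -> bytes:
        # Fetch a file's bytes only on first access, then serve from the shared cache.
        if path not in self.cache:
            offset, length = self.index[path]
            self.cache[path] = self.get_range(offset, offset + length)
            self.fetches += 1
        return self.cache[path]
```

Only the small index must be present at startup; a function that touches two of a thousand files pays for two range reads, and the cache plays the role of the single shared copy per node.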
VPC Network Optimization
A service‑mesh VPC gateway proxy keeps the ENIs attached to a dedicated gateway cluster instead of hot‑plugging an ENI into each container; function traffic bound for the user VPC is proxied through that cluster. This takes the 2–8 s ENI attachment observed in the original ECS design off the container startup path, substantially reducing network provisioning time and resource overhead.
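The effect of moving ENIs to a shared gateway can be shown by counting attachments. In this hypothetical model the gateway attaches at most one ENI per user VPC and reuses it for every container; the ENI naming and the route string are placeholders, not a real API.

```python
from typing import Dict


class VpcGateway:
    """Hypothetical gateway cluster that holds at most one attached ENI per user VPC."""

    def __init__(self) -> None:
        self.enis: Dict[str, str] = {}
        self.attach_calls = 0

    def eni_for(self, vpc_id: str) -> str:
        if vpc_id not in self.enis:
            # Slow path (the multi-second attach), paid once per VPC, not per container.
            self.attach_calls += 1
            self.enis[vpc_id] = f"eni-{len(self.enis)}"
        return self.enis[vpc_id]

    def route(self, container_id: str, vpc_id: str, dst: str) -> str:
        # Containers never get their own ENI; traffic is proxied via the shared one.
        return f"{container_id} -> {self.eni_for(vpc_id)} -> {dst}"
```

Routing 100 containers of one tenant to `vpc-a` triggers a single attachment here, versus 100 attachments if each container needed its own ENI.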
Resource Allocation Efficiency
By co‑locating heterogeneous workloads from multiple tenants on a single ShenLong host, the system can pack containers of different sizes more tightly against each host's capacity, raising deployment density and improving overall allocation efficiency.
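Packing mixed request sizes onto hosts is a bin-packing problem. First-fit decreasing is a classic heuristic used here purely for illustration; the source does not say which algorithm ShenLong uses. Requests are rounded up to the 128 MB granularity, and the 4,096 MB host size is an assumed value.

```python
from typing import List


def first_fit_decreasing(requests_mb: List[int], host_mb: int,
                         unit_mb: int = 128) -> List[List[int]]:
    """Pack memory requests (rounded up to unit_mb) onto as few hosts as possible."""
    sizes = sorted((-(-r // unit_mb) * unit_mb for r in requests_mb), reverse=True)
    hosts: List[List[int]] = []
    free: List[int] = []
    for s in sizes:
        for i, f in enumerate(free):
            if s <= f:            # first host with enough room wins
                hosts[i].append(s)
                free[i] -= s
                break
        else:                     # no existing host fits: open a new one
            if s > host_mb:
                raise ValueError("request larger than a host")
            hosts.append([s])
            free.append(host_mb - s)
    return hosts
```

For requests of 1000, 200, 3000, 128, 512, 2000, and 100 MB on 4,096 MB hosts, the rounded sizes pack onto two hosts: `[3072, 1024]` and `[2048, 512, 256, 128, 128]`. Small functions fill the gaps left by large ones, which is the density gain that mixing workloads provides.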
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
