How Ant Group’s FaaS Architecture Boosts Performance and Security
Ant Group’s FaaS platform redefines serverless computing with rapid function deployment, high‑throughput low‑latency scheduling, strong security isolation, and cost‑effective scaling. This article walks through its architectural components, performance optimizations, and planned AI‑driven enhancements.
What Is FaaS?
FaaS (Function as a Service) is a cloud‑computing model that lets developers write and deploy functions without managing underlying infrastructure.
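As a minimal illustration of this model (the names below are hypothetical, not Ant’s actual SDK), a FaaS function is typically just a handler that the platform invokes per request:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical payload the platform hands to the function.
type Event struct {
	Name string `json:"name"`
}

// Handle is the function body: pure business logic, no server setup,
// no middleware configuration -- the platform invokes it per request.
func Handle(raw []byte) (string, error) {
	var e Event
	if err := json.Unmarshal(raw, &e); err != nil {
		return "", err
	}
	return fmt.Sprintf("hello, %s", e.Name), nil
}

func main() {
	out, _ := Handle([]byte(`{"name":"FaaS"}`))
	fmt.Println(out) // prints "hello, FaaS"
}
```

Everything outside `Handle` — servers, scaling, routing — is the platform’s responsibility.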
The Rise of FaaS
Traditional application models suffer from heavy code coupling, complex deployment, and cumbersome operations. Developers must constantly upgrade middleware, tune performance, and handle multi‑cloud differences, while operations face capacity estimation, resource provisioning, and scaling challenges. FaaS addresses these pain points by abstracting away infrastructure, enabling faster development and one‑stop serverless operation.
High code coupling and complexity in traditional development.
Lengthy and inefficient operational processes.
Poor resource utilization and wasteful capacity planning.
FaaS adopts a function‑centric programming model, allowing users to focus solely on business logic.
Typical FaaS Use Cases
BFF (Backend‑for‑Frontend) scenarios: lightweight glue code that assembles data from multiple APIs, often short‑lived and tied to marketing events.
Event‑driven workloads: video transcoding, file‑upload triggers, or traffic spikes that require on‑demand execution.
Middle‑platform services: independent algorithm operators that need strong isolation because they have many contributors.
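The BFF case is the most common: a small function that fans out to several backends and shapes one response for the frontend. A minimal sketch (the upstream calls are placeholders for real RPC/HTTP clients):

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical upstream calls; in a real BFF these would be RPC/HTTP clients.
func fetchUser(id string) string    { return "user:" + id }
func fetchCoupons(id string) string { return "coupons:" + id }

// Assemble fans out to multiple backends concurrently and
// returns one response shaped for the frontend.
func Assemble(id string) string {
	var user, coupons string
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); user = fetchUser(id) }()
	go func() { defer wg.Done(); coupons = fetchCoupons(id) }()
	wg.Wait()
	return fmt.Sprintf("{%q:%q,%q:%q}", "user", user, "coupons", coupons)
}

func main() {
	fmt.Println(Assemble("42")) // prints {"user":"user:42","coupons":"coupons:42"}
}
```

Such glue code is short‑lived and deploy‑heavy, which is exactly where per‑function deployment pays off.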
Technical Challenges When Deploying FaaS
Performance issues
Function‑call latency must match that of RPC calls in traditional micro‑services, which are already heavily optimized.
Metric‑based elastic scaling reacts slowly, since metrics must be collected and aggregated before any action is taken.
Cold‑start latency is critical because function containers are created on demand.
Security issues
Resource isolation is required to keep costs low while ensuring safety.
Containers must prevent escape attacks.
The programming model must effectively shield middleware and cloud‑service details from developers.
Ant FaaS Architecture
Ant Group’s implementation follows three core principles:
Traffic‑driven model: containers are created and destroyed directly by request traffic, not by metric‑based scaling.
Cold‑start target: aim for sub‑100 ms cold starts, avoiding warm‑pool or cache tricks.
Strong security isolation: each function runs in a highly isolated sandbox.
The architecture consists of four main components:
Function Gateway: forwards requests and initiates a container‑scheduling task for each one.
Container Scheduling Engine: manages the lifecycle of function containers, controls concurrency and reuse, and maintains a pool of function Pods.
Function Runtime: an OCI‑compatible runtime that starts containers quickly and controls the container runtime.
Function Container: the execution environment combining the runtime, container, and user code.
All requests flow through the Function Gateway, which schedules a Pod via the Scheduler. The Pod’s node gateway caches the request while the container is launched. Once the container is ready, it pulls the request from the node gateway and executes the business logic.
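The traffic‑driven flow above can be sketched with channels standing in for the scheduler and node gateway (all names here are illustrative, not Ant’s real interfaces):

```go
package main

import (
	"fmt"
	"time"
)

// launchContainer simulates the cold start; the node gateway holds the
// request until the container signals readiness.
func launchContainer(ready chan<- struct{}) {
	time.Sleep(10 * time.Millisecond) // stand-in for container startup
	ready <- struct{}{}
}

// handle models the traffic-driven path: each request triggers a
// scheduling task instead of waiting on metric-based autoscaling.
func handle(req string) string {
	ready := make(chan struct{})
	go launchContainer(ready) // scheduler creates a container for this request
	<-ready                   // node gateway caches the request until ready
	return "handled:" + req   // container pulls the request and runs the logic
}

func main() {
	fmt.Println(handle("order-123")) // prints "handled:order-123"
}
```

The key property is that the request itself drives container creation, so capacity always tracks real traffic.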
Performance Optimization Practices
Function Gateway
The original Go‑based gateway suffered from high P99 tail latency under heavy concurrency. The redesign runs the Go gateway on top of Envoy (a high‑performance C++ proxy), enabling layer‑7 filtering and routing while cutting CPU usage by over 50 % and request latency by about 30 %.
Container Scheduling Engine (HUSE)
HUSE is a next‑generation scheduler built for high‑throughput, low‑latency serverless workloads. It provides multi‑level adaptive caching, a fast protocol stack, and intelligent package loading. In a 10 000 QPS test, HUSE achieves ~21 ms P50 and <50 ms P99 scheduling latency, an order‑of‑magnitude improvement over traditional schedulers.
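Part of that latency win comes from pod reuse: a request only pays scheduling cost when no warm pod is free. A toy sketch of that reuse policy (the real HUSE scheduler is far more elaborate; this only illustrates the acquire/release idea):

```go
package main

import "fmt"

// Pod is a hypothetical warm function pod serving one request at a time.
type Pod struct{ id int }

// Pool reuses idle pods and only creates a new one when none are free,
// mimicking traffic-driven concurrency and reuse control.
type Pool struct {
	idle []*Pod
	next int
}

func (p *Pool) Acquire() *Pod {
	if n := len(p.idle); n > 0 {
		pod := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return pod // reuse: no scheduling cost
	}
	p.next++
	return &Pod{id: p.next} // cold path: schedule a fresh pod
}

func (p *Pool) Release(pod *Pod) { p.idle = append(p.idle, pod) }

func main() {
	pool := &Pool{}
	a := pool.Acquire() // cold path creates pod 1
	pool.Release(a)
	b := pool.Acquire() // warm path reuses pod 1
	fmt.Println(a.id, b.id) // prints "1 1"
}
```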
Cold‑Start Optimizations
Eliminate warm‑pool caching; pure cold start avoids CPU waste.
Cache non‑CPU resources (IP, image download, volume mount, cgroup, namespace) to achieve near‑0 ms allocation.
Adopt a read‑only file system (ROFS) to halve CPU usage and turn file operations into memory‑mapped accesses.
Replace the standard OCI create + start sequence with a checkpoint + restore flow, reducing container start time to under 90 ms with <1 MiB memory overhead.
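Conceptually, the optimizations above split startup into a cacheable, non‑CPU half and a restore half. A rough sketch under that assumption (resource names and the `restore` function are invented for illustration; the real path uses runtime‑level checkpoint/restore):

```go
package main

import "fmt"

// prepared bundles non-CPU resources (IP, image, cgroup, etc.) that can
// be allocated ahead of time without keeping a warm process on CPU.
type prepared struct {
	ip, image, cgroup string
}

// cache of pre-allocated resources, refilled in the background.
var cache = []prepared{{ip: "10.0.0.7", image: "fn:v1", cgroup: "/faas/fn"}}

// restore stands in for the checkpoint+restore path: instead of the OCI
// create+start sequence, the runtime maps a checkpoint image into memory.
func restore(p prepared) string {
	return "running " + p.image + " at " + p.ip
}

func main() {
	p := cache[0] // near-0 ms: resources already allocated
	fmt.Println(restore(p))
}
```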
Security Capability Construction
Function containers run in a runsc sandbox with ACL rules and a veth pair. Network traffic is filtered and controlled via eBPF, providing complete isolation. NanoVisor, a lightweight hypervisor derived from gVisor, handles system‑call interception, seccomp filtering, and process‑level isolation, protecting both the guest and host kernels.
Horizontal security includes network‑level ACLs for all ports, DNS, and five‑tuple flow control. Authentication and authorization are performed transparently by the runtime and proxy services, removing the need for developers to manage credentials.
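Five‑tuple flow control means every flow is matched on source/destination address, ports, and protocol. A toy user‑space version of such an ACL check (in the real system these rules are enforced in the kernel by eBPF programs on the pod’s veth pair; all values here are made up):

```go
package main

import "fmt"

// FiveTuple identifies a flow: src/dst address, src/dst port, protocol.
type FiveTuple struct {
	SrcIP, DstIP     string
	SrcPort, DstPort int
	Proto            string
}

// allowed is a toy ACL keyed on the five-tuple with source port wildcarded.
var allowed = map[FiveTuple]bool{
	{SrcIP: "10.0.0.7", DstIP: "10.0.1.2", SrcPort: 0, DstPort: 443, Proto: "tcp"}: true,
}

// permit checks a flow against the ACL (ephemeral source port zeroed out).
func permit(t FiveTuple) bool {
	t.SrcPort = 0
	return allowed[t]
}

func main() {
	fmt.Println(permit(FiveTuple{"10.0.0.7", "10.0.1.2", 51000, 443, "tcp"})) // prints "true"
	fmt.Println(permit(FiveTuple{"10.0.0.7", "10.0.1.2", 51000, 80, "tcp"}))  // prints "false"
}
```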
Developer and Operations Experience
Development: creating, coding, and deploying a function takes only a few seconds, eliminating repository setup, compilation, and packaging. In one demo, a sixth‑grader built an Alipay mini‑program plus FaaS function in five minutes.
Operations: the platform is fully serverless, with built‑in monitoring, alerting, and observability and no infrastructure details exposed.
Conclusion
Cost‑effective: lower memory footprint, sub‑millisecond billing granularity.
Strong security isolation with zero‑trust networking.
Accelerated development and simplified operations.
One‑stop serverless platform with integrated monitoring and alerts.
Future Outlook: FaaS + AI
Ant Group aims for “extreme performance” (sub‑10 ms cold start) using fork‑based techniques and “extreme efficiency” by coupling AI‑generated code (AIGC) with FaaS. The vision is a low‑code/no‑code workflow where natural‑language prompts produce PRDs, which are then turned into code by AI and executed on a serverless platform, dramatically boosting developer productivity.
FaaS Extends Alipay Cloud Development
Building on the mature Ant FaaS stack, a new Alipay mini‑program cloud development product has been launched, offering the same serverless capabilities to a broader audience.
Ant R&D Efficiency
We are the Ant R&D Efficiency team, focused on fast development, experience-driven success, and practical technology.
