How Alibaba Scales Millions of Apps with Serverless: Inside CSE’s Architecture

This article explores how Alibaba's Cloud Service Engine (CSE) addresses the limitations of AWS Lambda by enabling millisecond‑level startup for legacy online services, delivering elastic scaling, pay‑per‑request billing, and smooth migration to a serverless architecture.

Alibaba Cloud Developer

Jun 4, 2019

Serverless is a broad topic that touches many lifecycle stages, including code management, testing, deployment, operations, and scaling. Alibaba’s Cloud Service Engine (CSE) is a middleware product designed to deliver the advantages of AWS Lambda while solving the problems that Lambda users run into.

What is Serverless?

AWS defines Serverless as a set of services that allow developers to run code without provisioning or managing servers, offering features like on‑demand scaling and pay‑per‑request billing.

While AWS Lambda provides a complete Serverless solution, it does not address how existing applications can migrate to a Serverless architecture. It is primarily suited to new applications that are written as Functions from the start.

Value of Serverless to Cloud Computing

Serverless frees users from managing servers, eliminates capacity planning, enables virtually unlimited scaling by sharing a resource pool, and allows billing based on request count, execution time, and memory size.
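The billing dimensions listed above (request count, execution time, memory size) can be combined into a simple cost formula. The following is a minimal sketch only: the class name, method name, and both unit prices are hypothetical placeholders chosen for easy arithmetic, not actual CSE or AWS Lambda prices.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical pay-per-use cost model. All prices are made-up placeholders.
final class BillingSketch {
    // Assumed unit prices, for illustration only.
    static final BigDecimal PRICE_PER_MILLION_REQUESTS = new BigDecimal("0.20");
    static final BigDecimal PRICE_PER_GB_SECOND = new BigDecimal("0.00001");

    // cost = request charge + (GB-seconds consumed * price per GB-second)
    static BigDecimal cost(long requests, long avgDurationMs, long memoryMb) {
        BigDecimal requestCost = PRICE_PER_MILLION_REQUESTS
                .multiply(BigDecimal.valueOf(requests))
                .divide(BigDecimal.valueOf(1_000_000L), 10, RoundingMode.HALF_UP);

        // GB-seconds = requests * (durationMs / 1000) * (memoryMb / 1024)
        BigDecimal gbSeconds = BigDecimal.valueOf(requests)
                .multiply(BigDecimal.valueOf(avgDurationMs))
                .multiply(BigDecimal.valueOf(memoryMb))
                .divide(BigDecimal.valueOf(1000L * 1024L), 10, RoundingMode.HALF_UP);

        BigDecimal computeCost = gbSeconds.multiply(PRICE_PER_GB_SECOND);
        return requestCost.add(computeCost).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // 1M requests, 1 s each, 1 GB of memory: 0.20 + 1,000,000 * 0.00001
        System.out.println(cost(1_000_000L, 1000L, 1024L)); // prints 10.20
    }
}
```

With this kind of model an idle service costs nothing, which is the contrast with paying for machines that are provisioned for peak load.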

CSE aims to provide a scalable solution similar to AWS Lambda but capable of smoothly migrating existing applications.

Challenges for Existing Online Services

Typical characteristics of legacy online applications include resource allocation measured in minutes and startup times exceeding ten minutes. Reducing startup time to the millisecond level would enable a new distributed architecture.

How to Achieve Millisecond‑Level Startup?

Applications typically initialize many components during startup. Alibaba’s internal SOA and micro‑service environment often leads to startup times of ten minutes or more due to loading shared SDKs.

Solution 1: Cold‑Start Resource Compression

L1 achieves high‑density deployment by pre‑starting multiple instances on a single physical machine and then freezing their containers, so that CPU usage drops to zero and RAM usage falls to 1/20 of the original. Waking a frozen instance provides millisecond‑level elasticity.

L2 dumps the in‑memory state of an already started application to disk. The snapshot can then be copied to other machines for rapid horizontal scaling, which typically completes in about five seconds.

Combining L1 and L2 can start any application within 10‑50 ms under burst traffic.
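One way to picture how the two levels combine is as a fallback chain: serve from a frozen instance when one is available, otherwise restore from a snapshot, and only then fall back to a full cold start. The sketch below is a toy model of that decision under these assumptions; the class, enum, and method names are hypothetical and do not reflect CSE's actual scheduler or API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of tiered startup. Names and structure are illustrative assumptions.
final class TieredStartSketch {
    enum StartPath { THAWED_FROM_FROZEN_POOL, RESTORED_FROM_SNAPSHOT, COLD_START }

    // L1: instances that were pre-started on this machine and then frozen.
    private final Deque<String> frozenInstances = new ArrayDeque<>();
    // L2: whether a memory snapshot of the started application exists on disk.
    private final boolean snapshotAvailable;

    TieredStartSketch(int frozenCount, boolean snapshotAvailable) {
        for (int i = 0; i < frozenCount; i++) {
            frozenInstances.push("frozen-" + i);
        }
        this.snapshotAvailable = snapshotAvailable;
    }

    // Picks the fastest available path for one new instance.
    StartPath acquire() {
        if (frozenInstances.poll() != null) {
            return StartPath.THAWED_FROM_FROZEN_POOL; // fastest path
        }
        if (snapshotAvailable) {
            return StartPath.RESTORED_FROM_SNAPSHOT;  // slower than thawing
        }
        return StartPath.COLD_START;                  // full application startup
    }

    public static void main(String[] args) {
        TieredStartSketch starter = new TieredStartSketch(1, true);
        System.out.println(starter.acquire()); // uses the single frozen instance
        System.out.println(starter.acquire()); // pool is empty, falls back to the snapshot
    }
}
```

In this model the frozen pool absorbs the first wave of a burst, while snapshot restores refill capacity behind it.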

Solution 2: Hot‑Copy Startup Acceleration

L1 uses a fork‑based seed process. A fork variant, fork2, can specify a PID, enabling fast instance creation:

pid_t fork2(pid_t pid);

L2 allows a single snapshot to spawn multiple processes, establishing a one‑to‑many relationship.

Comparison of the Two Approaches

Solution 1: No UUID issue, but requires language‑specific VM customization; slightly higher cost.

Solution 2: Language‑agnostic and lower cost, but may inherit a UUID problem where forked instances share the same static UUID.

Overall, Solution 1 is broader in applicability, while Solution 2 is better suited for FaaS and NBF scenarios.
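The UUID problem noted for Solution 2 arises because an identifier generated in the seed process is part of the memory image that every copy inherits. The sketch below simulates this in plain Java and shows one possible mitigation, regenerating the identifier in a post-restore hook. The class, the `simulateFork` helper, and the `onRestored` hook are all hypothetical; the source does not describe how CSE handles this.

```java
import java.util.UUID;

// Simulation of the shared-UUID problem. All names are illustrative assumptions.
final class UuidAfterForkSketch {
    // Generated once, when the seed instance starts.
    private volatile UUID instanceId = UUID.randomUUID();

    UUID instanceId() {
        return instanceId;
    }

    // Stands in for a fork or snapshot restore: the child inherits the
    // parent's in-memory state, including the already generated UUID.
    UuidAfterForkSketch simulateFork() {
        UuidAfterForkSketch child = new UuidAfterForkSketch();
        child.instanceId = this.instanceId;
        return child;
    }

    // Hypothetical hook that the runtime would call in each new copy.
    void onRestored() {
        instanceId = UUID.randomUUID();
    }

    public static void main(String[] args) {
        UuidAfterForkSketch seed = new UuidAfterForkSketch();
        UuidAfterForkSketch child = seed.simulateFork();
        System.out.println(seed.instanceId().equals(child.instanceId())); // true: IDs collide

        child.onRestored();
        System.out.println(seed.instanceId().equals(child.instanceId())); // false after regeneration
    }
}
```

The same concern applies to any state that must be unique per instance, such as random seeds or connection identifiers captured before the copy was made.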

Comparison with AWS Lambda

Lambda requires applications to be developed as Functions and loads each Function dynamically. CSE, by contrast, shares the same instruction set across multiple instances and only loads the differences, making it compatible with mainstream stacks such as Spring Boot, PHP, Java, Python, and Node.js.

Theoretical Model

Serverless applications dynamically adjust the number of instances, allowing multiple services to share the same machine during off‑peak periods.
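The saving from sharing machines comes from sizing a pool for the busiest combined hour rather than for the sum of every service's individual peak. The sketch below works through that arithmetic with invented load figures; the class name, method names, and numbers are assumptions for illustration, not measurements from CSE.

```java
// Compares per-service peak provisioning with a shared elastic pool.
// Load figures and capacity are invented for illustration.
final class SharedPoolSketch {
    // Instances needed to serve a load, rounding up.
    static int instancesNeeded(int requestsPerSecond, int capacityPerInstance) {
        return (requestsPerSecond + capacityPerInstance - 1) / capacityPerInstance;
    }

    // Static provisioning: every service keeps enough instances for its own peak.
    static int staticallyProvisioned(int[][] loadByServiceAndHour, int capacity) {
        int total = 0;
        for (int[] serviceLoad : loadByServiceAndHour) {
            int peak = 0;
            for (int load : serviceLoad) {
                peak = Math.max(peak, load);
            }
            total += instancesNeeded(peak, capacity);
        }
        return total;
    }

    // Elastic pool: sized for the hour in which the combined demand is highest.
    static int elasticPoolSize(int[][] loadByServiceAndHour, int capacity) {
        int hours = loadByServiceAndHour[0].length;
        int worstHour = 0;
        for (int h = 0; h < hours; h++) {
            int instancesThisHour = 0;
            for (int[] serviceLoad : loadByServiceAndHour) {
                instancesThisHour += instancesNeeded(serviceLoad[h], capacity);
            }
            worstHour = Math.max(worstHour, instancesThisHour);
        }
        return worstHour;
    }

    public static void main(String[] args) {
        // Two services whose peaks fall in different hours (requests per second).
        int[][] load = {
            {100, 900, 100},
            {800, 100, 100},
        };
        System.out.println(staticallyProvisioned(load, 100)); // prints 17
        System.out.println(elasticPoolSize(load, 100));       // prints 10
    }
}
```

The gap between the two numbers grows as the services' peaks overlap less, which is why off‑peak sharing matters for the cost argument.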

Quantitative Analysis

Serverless cost advantages can be combined with CPU‑share and mixed‑workload scheduling techniques to deliver superior overall cost efficiency.

CSE Code Samples

HSF Demo

package com.test.pandora.hsf;

import com.alibaba.boot.hsf.annotation.HSFProvider;

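// Publishes this class as an HSF service under the HelloWorldService interface.
// The interface itself is not shown in this sample.
@HSFProvider(serviceInterface = HelloWorldService.class)
public class HelloWorldServiceImpl implements HelloWorldService {
    @Override
    public String sayHello(String name) {
        return "hello : " + name;
    }
}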

Spring Boot Demo

package com.example.java.gettingstarted;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
@RestController
public class HelloworldApplication {
  @RequestMapping("/")
  public String home() {
    return "Hello World!";
  }

  @RequestMapping("/health")
  public String healthy() {
    // Message body required though ignored
    return "Still surviving.";
  }

  public static void main(String[] args) {
    SpringApplication.run(HelloworldApplication.class, args);
  }
}

CSE Production Practices

In one e‑commerce scenario, migrating to Serverless reduced the number of machines from 11 to 2. Elastic scaling absorbed traffic spikes from thousands to over a hundred thousand requests, then scaled back down after the peak.

Other e‑commerce services saw similar reductions in machine count and dynamic scaling between 1 and 4 instances.
