How Serverless Billing Evolved: From Per-Request to Resource-Based Pricing
Serverless function compute has transformed its billing model in three stages, from the early days of per-hour VM rentals to today's millisecond-granular, resource-aware pricing. Each stage was driven by advances in request boundary detection, concurrency handling, and AI-centric resource accounting, progressively aligning cost with actual usage.
In cloud computing, billing is often developers' most direct point of contact with the platform, and its design reflects key choices in resource abstraction, scheduling, security isolation, and development experience.
Stage One: From Resource Rental to Per‑Request Billing
The initial breakthrough was shifting from hourly VM rentals to charging only for the exact time a function is invoked, eliminating idle costs and lowering the entry barrier for developers.
Key technologies that enable this billing model include:
Accurate request boundary identification: the platform must detect the start and end of each request at microsecond-to-millisecond precision to ensure fair billing.
Exclusive resource allocation per request: each request receives dedicated CPU and memory, avoiding performance jitter caused by resource contention.
Cold-start latency optimization: instances are not resident; they are launched on demand and reclaimed immediately after execution.
1 ms active/idle state transition: when no request is present, the function instance's CPU is frozen and consumes no time slices; it is instantly re-activated when a request arrives.
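The per-request model can be summarized with a small calculation: each invocation's duration is rounded up to the billing granularity, and idle time between invocations costs nothing. The following is a minimal sketch; the function names and price constants are illustrative, not actual platform values.

```python
from dataclasses import dataclass

@dataclass
class Invocation:
    start_ms: float  # request start timestamp (ms)
    end_ms: float    # request end timestamp (ms)

def billed_ms(inv: Invocation, granularity_ms: int = 1) -> int:
    """Round the invocation's duration up to the billing granularity."""
    duration = inv.end_ms - inv.start_ms
    units = -(-duration // granularity_ms)  # ceiling division
    return int(units * granularity_ms)

def per_request_cost(invocations, memory_gb: float, price_per_gb_s: float) -> float:
    """Sum GB-seconds across invocations; gaps between requests are free."""
    total_ms = sum(billed_ms(inv) for inv in invocations)
    return (total_ms / 1000.0) * memory_gb * price_per_gb_s
```

For example, two 100 ms invocations separated by a 400 ms gap are billed as 200 ms of compute, however long the instance would otherwise have sat idle.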
Stage Two: Multi‑Concurrency + Millisecond‑Level Billing – Optimized for Web
As serverless became popular, web and API workloads began to adopt the model. Billing per single request became costly for high‑concurrency scenarios, prompting a shift to active‑interval billing.
Core changes:
Break the single-request limitation: billing is based on the entire active interval of a function instance, regardless of how many concurrent requests are processed.
Active interval granularity of 1 ms: this fine-grained measurement suits mainstream web and API services.
Supporting technologies include:
Identify active intervals as billing boundaries: any executing request marks the instance as active.
Custom Runtime / Container Runtime: enables smooth migration of popular web frameworks (Express, Flask, Spring Boot) that naturally support concurrency, reducing costs and mitigating database connection spikes.
Billing granularity from 100 ms to 1 ms: most web requests complete in under 100 ms, so finer granularity makes billing fairer.
Full-stack latency optimization: the platform optimizes authentication, routing, scheduling, and forwarding to keep end-to-end latency low.
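The key difference from stage one is that concurrent requests share one billed interval: the instance's active time is the union of request intervals, not their sum. A minimal sketch of that interval-merging logic (names and the tuple representation are illustrative, not a platform API):

```python
def merge_active_intervals(requests):
    """Merge overlapping or touching (start_ms, end_ms) request intervals
    into the instance's active intervals; concurrent requests share one."""
    if not requests:
        return []
    intervals = sorted(requests)
    merged = [list(intervals[0])]
    for start, end in intervals[1:]:
        if start <= merged[-1][1]:          # overlaps or touches current interval
            merged[-1][1] = max(merged[-1][1], end)
        else:                               # gap: instance was idle in between
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

def active_billed_ms(requests):
    """Bill the union of request time: ten concurrent requests over the
    same 100 ms window cost 100 ms of active time, not 1000 ms."""
    return sum(end - start for start, end in merge_active_intervals(requests))
```

This is why the model is dramatically cheaper for high-concurrency web workloads than per-request billing.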
Stage Three: Billing by Actual Resource Consumption – AI Era
AI applications feature long sessions, heavy interaction, strict latency requirements, and often sparse load, making the previous "active = request in flight" model inefficient.
The key transformation is to differentiate active and idle states based on real resource usage rather than merely request presence.
Essential techniques:
Session affinity: routes all requests of the same session to the same instance, preserving context.
Configurable IdleTimeout: lets developers actively control session retention time (upcoming feature).
Resource-based active/idle detection: if CPU usage exceeds a threshold, the period is billed as active; otherwise only memory, disk, or network costs are charged.
Low-load discount mechanism: the platform samples CPU usage each second, and periods below the threshold receive a CPU cost waiver, applied by default in MCP, WebSocket, and similar low-load scenarios.
Non-freeze mode for background tasks: allows functions to continue running after request completion for tasks such as cache warm-up, indexing, or callbacks, while still charging based on actual consumption.
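The resource-based detection described above can be sketched as a per-second sampling loop: memory is charged for the whole instance lifetime, while CPU is charged only for seconds at or above the activity threshold. This is a minimal illustration under assumed numbers; the threshold, prices, and function name are hypothetical, not actual platform pricing.

```python
def bill_resource_based(cpu_samples, cpu_threshold=0.05,
                        cpu_price_per_s=0.0001, mem_price_per_s=0.00001):
    """cpu_samples: one CPU-utilization sample (0.0-1.0) per second of
    instance lifetime. Seconds at/above the threshold are billed as active
    (CPU + memory); below it, the CPU charge is waived and only baseline
    resources such as memory are billed."""
    cost = 0.0
    for usage in cpu_samples:
        cost += mem_price_per_s          # memory is charged while resident
        if usage >= cpu_threshold:
            cost += cpu_price_per_s      # CPU is charged only when active
    return cost
```

Under this scheme, a WebSocket connection that is mostly idle pays the CPU price only for the few seconds it actually computes, rather than for its entire connection lifetime.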
The result is a billing model that aligns cost with true resource consumption, avoiding extra charges for long‑lived connections or low‑load keep‑alive, making serverless truly suitable for AI‑driven long‑session workloads (GPU resources are currently excluded).
Overall, the evolution of function compute billing reflects a continuous effort to align product form with user value, moving from per‑request to active‑interval to fine‑grained resource‑based pricing, enabling developers to focus on business logic while the cloud handles resource management.
Alibaba Cloud Native