
Why Serverless Is the Future of AI Apps: From Stateless to Session‑Aware Cloud‑Native Architecture

This article examines how AI application infrastructure is shifting from traditional always‑on, stateless deployments to serverless models that balance cost, scalability, and stateful conversational needs. It outlines the architectural challenges, the stages of evolution, and emerging solutions such as external state storage, session‑affinity scheduling, durable functions, and session‑oriented runtimes.

Alibaba Cloud Native

AI Application Infrastructure Paradigm Shift

AI workloads that rely on large language models are moving away from permanently provisioned, always‑online compute because the cost of idle resources is unsustainable for bursty or unpredictable demand. Industry analysts report that serverless computing can reduce infrastructure costs by 40‑70% while providing elastic scaling.

1. Request‑Response (Stateless) AI Model

This early‑stage model treats AI tasks as independent, transactional operations similar to traditional web services.

One‑off interaction: the user sends a single request, the model returns a final result, and the conversation ends.

Stateless: each request is processed without any memory of previous interactions, enabling any compute instance to handle it.

Clear task focus: suitable for idempotent operations where repeated calls produce the same output.

Fixed I/O: inputs must be formatted for the model, and outputs are returned in a structured form.

Typical examples include image classification, sentiment analysis, and machine translation. The stateless design aligns perfectly with serverless’s event‑driven, short‑lived function model, allowing massive parallelism and fault tolerance.
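The request‑response pattern can be sketched as a self‑contained function handler. This is an illustrative sketch, not any specific platform's API: the handler name, event shape, and the toy sentiment logic standing in for a real model call are all assumptions.

```python
import json

def handler(event, context=None):
    """Stateless inference handler: every request is self-contained.

    The event carries the full model input, and nothing survives between
    invocations, so any compute instance can serve any request.
    """
    payload = json.loads(event["body"])
    words = payload["text"].lower().split()
    # Stand-in for a real model call (e.g. a sentiment-analysis model).
    score = sum(1 for w in words if w in {"good", "great"}) \
          - sum(1 for w in words if w in {"bad", "awful"})
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {"statusCode": 200, "body": json.dumps({"label": label})}
```

Because the function holds no state, repeated calls with the same input are idempotent, and the platform can run any number of copies in parallel.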

2. Conversational (Stateful) AI Model

More advanced AI agents require multi‑turn interactions, memory of prior context, and the ability to explore tasks iteratively.

Continuous interaction: users can ask follow‑up questions, clarify, or refine requests across multiple rounds.

Stateful: the system must retain conversation context to resolve references like “cancel it”.

Task exploration: the agent helps the user discover requirements and co‑create solutions.

Natural language interface: no special command syntax is needed.

Examples include chatbots that rewrite poems, code assistants that generate, explain, and refactor code, and personalized travel planners that adapt itineraries over several exchanges.
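Why statefulness matters can be seen in a minimal sketch of prompt assembly: a reference like "cancel it" is only resolvable if earlier turns are replayed to the model. The function name and message format here are illustrative assumptions.

```python
def build_prompt(history, user_turn):
    """Prepend prior turns so the model can resolve references like
    "cancel it" that only make sense in the context of the session."""
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"user: {user_turn}")
    return "\n".join(lines)

history = [
    ("user", "Book a flight to Paris on Friday"),
    ("assistant", "Done. Booking reference ABC123."),
]
prompt = build_prompt(history, "Actually, cancel it")
```

Without the history, the final turn is meaningless; with it, the model can connect "it" to booking ABC123. Where that history lives, and at what cost, is exactly the architectural question the rest of the article addresses.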

Serverless Architecture for AI

Serverless offers event‑driven execution, automatic scaling, pay‑per‑use billing, and isolation, which matches the bursty nature of AI inference. However, classic serverless assumes short‑lived, stateless functions, creating a mismatch for stateful conversational workloads.

Challenges of Running Stateful AI on Classic Serverless

1) State persistence and data locality

Function instances are short‑lived; any in‑memory data disappears when the instance is reclaimed. To preserve session state, developers must externalize it to storage services such as object storage, table storage, or Redis, and reload it on each invocation.

Programming model complexity: Refactoring applications to externalize state adds significant engineering effort and cost, especially for legacy services.

Performance and cost overhead: Each round of dialogue requires at least two network trips (read and write), increasing latency and introducing reliability risks.

2) Simulating session lifecycle with functions

Some platforms allocate a dedicated function instance per session (session‑affinity) so that state can be kept in memory. This requires manually keeping the instance alive for the duration of the session and scaling it down to zero when the session ends, which is fragile and difficult to manage.

3) Traditional web session‑affinity limitations

Session‑affinity (e.g., cookie‑ or header‑based routing) can keep a user’s requests on the same instance, reducing storage calls, but it offers only “best‑effort” guarantees. Instances may be recycled, causing state loss, and the approach harms horizontal scalability because traffic is no longer evenly distributed.

Reliability: If the sticky instance fails, the session is lost.

Isolation risk: A single instance may serve multiple users, risking data leakage.

Scalability breakage: Load is uneven, creating hotspots.

Cost model conflict: Maintaining hot instances undermines serverless’s pay‑per‑use economics.

Evolutionary Solutions

Stage 1: External State Storage (Best‑Practice Serverless)

A function receives a user request.

It reads the current session context from a high‑performance key‑value store (e.g., Redis) or a table service using the session ID.

The business logic runs and may update the session state.

Before responding, the function writes the updated state back to external storage.

The function instance is terminated, leaving no residual state.

This approach preserves the core benefits of serverless—elasticity and fault tolerance—while incurring the latency of external storage reads/writes.
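The five steps above can be sketched as a read‑modify‑write cycle around an external key‑value store. This is a minimal sketch: a plain dict stands in for Redis or a table service, and the handler signature and state schema are assumptions for illustration.

```python
import json

kv = {}  # stand-in for Redis / a table store; in production, use a real client

def handler(session_id, user_message):
    # 1) Read the current session context by session ID.
    state = json.loads(kv.get(session_id, '{"turns": []}'))
    # 2) Run the business logic against the loaded context.
    state["turns"].append(user_message)
    reply = f"Turn {len(state['turns'])}: acknowledged '{user_message}'"
    # 3) Write the updated state back before responding.
    kv[session_id] = json.dumps(state)
    # 4) The instance can now be reclaimed; no residual in-memory state.
    return reply
```

Every round of dialogue pays for the read in step 1 and the write in step 3, which is the latency and reliability overhead the article describes.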

Stage 2: Traditional Session‑Affinity Scheduling

To reduce storage latency, platforms route all requests of a session to the same instance using either cookie‑based or custom header‑based affinity.

Cookie affinity: The load balancer sets a cookie with the instance identifier on the first response; subsequent requests carry the cookie to the same instance.

Header affinity: Clients send a custom header (e.g., x-fc-session-id) that the router hashes to a stable instance.

While performance improves, the model remains “best‑effort” and can still suffer from instance recycling and scalability limits.
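Header‑based affinity can be sketched as a stable hash from session ID to instance. This is an illustrative sketch of the routing idea, not any platform's actual load balancer; the instance list and function name are assumptions.

```python
import hashlib

def route(session_id, instances):
    """Map a session ID to a stable instance (header-based affinity).

    Best-effort only: if the chosen instance is recycled its in-memory
    state is lost, and popular sessions create uneven load (hotspots).
    """
    h = int(hashlib.sha256(session_id.encode()).hexdigest(), 16)
    return instances[h % len(instances)]
```

The same session always lands on the same instance while the fleet is stable, but scaling the fleet up or down changes the mapping, which is one reason the guarantee remains best‑effort.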

Stage 3: Native State Abstraction (Durable Functions)

Platforms such as Azure Durable Functions introduce persistent entities (actors) that encapsulate state and behavior. The runtime automatically persists entity state, guarantees ordered execution, and abstracts away external storage concerns, simplifying development of stateful workflows.
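The durable‑entity pattern can be sketched generically: state is loaded before each operation, operations apply one at a time, and state is persisted afterwards by the runtime rather than by application code. This sketch is not the Azure Durable Functions API; the class, store, and operation names are illustrative assumptions.

```python
import json

class CounterEntity:
    """Illustrative durable-entity (actor) sketch: the runtime loads
    persisted state before each operation, applies operations in order,
    and persists the state afterwards on the application's behalf."""

    def __init__(self, store, entity_id):
        self.store, self.entity_id = store, entity_id

    def signal(self, op, amount=1):
        # Runtime responsibility: load persisted state for this entity.
        state = json.loads(self.store.get(self.entity_id, '{"value": 0}'))
        if op == "add":
            state["value"] += amount
        elif op == "reset":
            state["value"] = 0
        # Runtime responsibility: persist state after the operation.
        self.store[self.entity_id] = json.dumps(state)
        return state["value"]
```

The point of the abstraction is that the developer writes only the operation bodies; persistence and ordered execution are guarantees of the platform.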

Stage 4: Session‑Oriented Serverless Runtime (AgentRun, Bedrock AgentCore)

Leading cloud providers now offer runtimes that allocate a dedicated MicroVM per session, keeping the entire session context, intermediate results, and temporary files in memory for the lifetime of the session (up to 8‑24 hours, depending on the platform).

Native state retention: No external storage calls; latency is limited to intra‑VM memory access.

Cold‑start elimination: Only the first invocation incurs a cold start; subsequent calls run on the same hot MicroVM.

Strong isolation: Each MicroVM provides kernel‑level isolation, protecting sensitive data and untrusted code.

This “session‑aware” serverless model combines the operational simplicity of serverless with the performance and security of traditional long‑running services.
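The session‑oriented model can be sketched as a router that keeps one hot, isolated execution context per session and retires it on expiry. This is a simplified in‑process model of the idea, not a real MicroVM runtime; the class, TTL, and dispatch function are illustrative assumptions.

```python
import time

class SessionVM:
    """Models a dedicated per-session MicroVM: all context stays in process
    memory for the session's lifetime, so turns after the first avoid both
    cold starts and external storage round trips."""

    def __init__(self, session_id, ttl_seconds=8 * 3600):
        self.session_id = session_id
        self.context = []                       # conversation kept in memory
        self.expires = time.time() + ttl_seconds

    def invoke(self, user_message):
        self.context.append(user_message)
        return f"[{self.session_id}] turn {len(self.context)}"

sessions = {}  # session router: at most one hot VM per active session

def dispatch(session_id, message):
    vm = sessions.get(session_id)
    if vm is None or time.time() > vm.expires:
        vm = sessions[session_id] = SessionVM(session_id)  # cold start once
    return vm.invoke(message)
```

Note the cost implication the article raises: the VM is billed for the session's duration, not per request, which is the trade against the latency and isolation gains.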

Key Takeaways and Recommendations

Stage 1 – External state: Ideal for simple workloads that can tolerate tens to hundreds of milliseconds of extra latency.

Stage 2 – Session affinity: A transitional compromise for legacy applications that cannot be refactored to stateless patterns; not recommended for new projects.

Stage 3 – Durable entities: Suitable for complex, reliable stateful workflows such as order processing or IoT aggregation.

Stage 4 – Session‑oriented runtimes: The optimal choice for AI agents requiring multi‑turn interaction, low latency, and strong security, though the cost model shifts from pure request‑based billing to session‑duration pricing.

Architects should evaluate latency tolerance, cost sensitivity, development complexity, and reliability requirements to select the most appropriate evolution path for their AI workloads.

Tags: serverless, architecture, AI, runtime, session, stateful
Written by Alibaba Cloud Native

We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies.