Why Upgrade to MCP Streamable HTTP? Boost Performance and Session Management in Serverless
This article explains the limitations of the traditional HTTP+SSE transport for MCP, introduces the new MCP Streamable HTTP protocol with its unified endpoint, on‑demand streaming and session recovery features, and shows how it improves connection count, success rate, latency, and overall reliability for serverless function compute workloads.
Background
Traditional serverless platforms target stateless applications, routing requests to automatically scaling function instances. When an application requires session affinity, these platforms cannot reliably forward session‑specific requests to the correct instance without external state storage, which adds latency, limits scalability, and incurs extra costs.
MCP Protocol Overview
The MCP (Model Context Protocol) standardizes connections between LLMs and external data sources, acting like a USB‑C interface for AI models. It originally supported an SSE‑based transport (MCP SSE) and has now been upgraded to the more efficient MCP Streamable HTTP transport.
Transport Types
Stdio : Client and server run on the same machine, communicating via standard input/output.
MCP SSE : Uses HTTP Server‑Sent Events; requires separate POST and SSE endpoints.
MCP Streamable HTTP : Uses standard HTTP POST/GET, consolidates endpoints, and can upgrade to SSE when needed, reducing latency and connection overhead.
Limitations of HTTP+SSE
No reconnection/recovery – a dropped SSE connection loses all session state.
Servers must maintain long‑lived SSE connections, consuming resources under high concurrency.
All responses must travel through SSE, even for simple request‑response interactions.
Many network components (CDN, load balancers, firewalls) struggle with long‑lived SSE connections.
Key Improvements with Streamable HTTP
Unified endpoint (e.g., /mcp) replaces separate /sse and /message endpoints.
On‑demand streaming – servers can return a normal HTTP response or upgrade to SSE for large or incremental results.
Clients initialize a session with a single POST carrying the initialize request; an optional GET can then open a server‑to‑client SSE stream.
Session recovery – disconnected sessions can resume without losing state, provided the session has not been explicitly deleted.
Performance Comparison
Streamable HTTP uses significantly fewer TCP connections than HTTP+SSE.
Success rates under varying concurrency are markedly higher for Streamable HTTP.
Average response times are lower and more stable, especially under high load.
These results justify upgrading MCP services to the Streamable HTTP transport.
Protocol Details
Client‑to‑Server Messaging (MCP SSE)
Clients send requests to /messages with a JSON‑RPC payload. The server acknowledges with 202 Accepted, then streams results via an SSE connection.
# client request
POST /messages/?session_id=706c5bb094fe43c89a6cb33fb96f470d HTTP/1.1
Host: 127.0.0.1:8000
Connection: keep-alive
Accept: text/event-stream
Content-Type: application/json
{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{"_meta":{"progressToken":1}}}
# server async response
HTTP/1.1 202 Accepted
Date: Thu, 31 Jul 2025 07:50:51 GMT
Server: uvicorn
Content-Length: 8
# SSE result
event: message
data: {"jsonrpc":"2.0","id":1,"result":{...}}

Streamable HTTP Workflow
Two possible flows exist:
Synchronous response – the server returns the result directly in the HTTP response.
Streaming response – the server upgrades the connection to SSE and streams partial results.
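A server chooses between the two flows per request. The toy handler below (illustrative, not Function Compute's actual code) returns a plain JSON body for a simple result and an SSE‑formatted generator when incremental delivery is wanted:

```python
import json
from typing import Iterator

def handle_request(req_id: int, chunks: list, stream: bool):
    """Return (content_type, payload) for one JSON-RPC request.

    stream=False -> synchronous flow: a single JSON response body.
    stream=True  -> streaming flow: one SSE event per partial result.
    """
    if not stream:
        body = json.dumps({"jsonrpc": "2.0", "id": req_id,
                           "result": {"items": chunks}})
        return "application/json", body

    def sse() -> Iterator[str]:
        for i, chunk in enumerate(chunks):
            msg = json.dumps({"jsonrpc": "2.0", "id": req_id, "result": chunk})
            yield f"id: {i}\nevent: message\ndata: {msg}\n\n"
    return "text/event-stream", sse()
```

Because both flows answer the same POST, the client needs no second endpoint; it just inspects the Content-Type of the response.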
Session Management
Each session receives a unique ID (cryptographically secure, ASCII‑only). The server returns this ID in the Mcp-Session-Id response header, and clients must include it on all subsequent requests; a missing header yields 400 Bad Request. A session can be terminated by the server (subsequent requests then receive 404 Not Found) or explicitly by the client via an HTTP DELETE request.
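These rules fit in a few lines. A sketch, assuming an in-memory session table (a real server would persist state elsewhere):

```python
import secrets

SESSIONS: dict[str, dict] = {}  # session id -> session state

def create_session() -> str:
    """Issue a cryptographically secure, ASCII-only session id."""
    sid = secrets.token_hex(16)          # 32 hex chars, all visible ASCII
    SESSIONS[sid] = {"state": "active"}
    return sid

def check_session(headers: dict) -> int:
    """Return the status code mandated for the Mcp-Session-Id header."""
    sid = headers.get("Mcp-Session-Id")
    if sid is None:
        return 400                       # header missing -> Bad Request
    if sid not in SESSIONS:
        return 404                       # terminated or unknown -> Not Found
    return 200

def delete_session(sid: str) -> None:
    """Client-initiated termination, modeling an HTTP DELETE."""
    SESSIONS.pop(sid, None)
```

After `delete_session`, any request reusing the old id gets 404, prompting the client to re-initialize.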
Compatibility
Servers that need backward compatibility must support both the old POST+SSE endpoints (/sse and /message) and the new Streamable HTTP endpoint (/mcp). Clients first attempt a POST to /mcp; on failure they fall back to a GET that opens an SSE stream and receives an endpoint event.
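Client-side, that negotiation is a simple probe-and-fallback. In this sketch, `post` is an injected stub returning an HTTP status code so the example runs offline; a real client would issue the actual requests:

```python
def connect(post) -> dict:
    """Negotiate a transport against a server of unknown vintage.

    `post(path)` stands in for an HTTP POST and returns a status code.
    """
    if post("/mcp") < 400:
        # Modern server: the unified endpoint answered.
        return {"transport": "streamable-http", "endpoint": "/mcp"}
    # Old server: fall back to the two-endpoint SSE scheme. The client
    # opens GET /sse and waits for an `endpoint` event naming the POST URL.
    return {"transport": "http+sse", "endpoints": ("/sse", "/message")}
```

This keeps one client binary compatible with both server generations during a migration window.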
Function Compute Architecture
Function Compute consists of three layers:
Gateway : Entry point, handling authentication, rate limiting, and request routing.
Scheduler : Dispatches requests to appropriate function instances based on session affinity.
VMs : The execution environment that runs the function code.
Session lifecycle is divided into initialization, active handling, and termination, with the gateway persisting session‑instance mappings in a database.
Session Initialization
Client sends an Initialize request.
Gateway authenticates and forwards to the scheduler.
Scheduler selects an instance and starts the user code.
The instance returns Mcp-Session-Id in the response header.
Gateway stores the mapping for future requests.
Data Flow (Active Session)
Client sends subsequent requests with the session ID.
Gateway looks up the session‑instance mapping (cache or DB) and forwards the request.
Scheduler routes the request to the bound instance.
Instance returns either a synchronous HTTP response or an SSE stream.
Session Termination
When the client issues a DELETE request with the session ID, the server removes the mapping and releases resources. Additionally, the platform enforces SessionTTL (maximum lifetime) and SessionIdleTimeout (maximum idle period) to clean up stale sessions.
Graceful Upgrade / Rolling Deployment
During function updates, existing sessions continue to be served by the old instance, while new sessions are routed to the updated instance, ensuring uninterrupted service for stateful MCP workloads.
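The routing rule behind this is small: a session binds to the code version current at its first request and keeps that binding across deployments. A toy router illustrating the idea (names are illustrative, not the platform's API):

```python
class Router:
    """Toy rolling-upgrade router: old sessions stick, new sessions move on."""

    def __init__(self, version: str):
        self.current_version = version
        self.bindings: dict[str, str] = {}  # session id -> code version

    def deploy(self, new_version: str) -> None:
        """Publish a new version; existing bindings are left untouched."""
        self.current_version = new_version

    def route(self, session_id: str) -> str:
        # First request of a session binds it to whatever is current;
        # later requests always reuse the recorded binding.
        if session_id not in self.bindings:
            self.bindings[session_id] = self.current_version
        return self.bindings[session_id]
```

Old instances drain naturally as their sessions terminate, after which they can be reclaimed.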
Demo Steps
Create a web function with MCP Streamable HTTP session affinity enabled.
Configure an HTTP trigger with Bearer authentication.
Start the MCP Inspector locally, supplying the Bearer token and function endpoint.
Click “Connect” to establish a session and begin using the MCP Streamable HTTP service.
Conclusion
By adopting the MCP Streamable HTTP transport, developers gain lower latency, higher success rates, reduced connection overhead, and robust session management for serverless AI services, while maintaining security through Bearer authentication.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
