Why Streamable HTTP Beats HTTP+SSE in MCP: Stability, Performance, and Simplicity
The article analyzes the new Streamable HTTP transport introduced in MCP (Model Context Protocol) PR #206 and compares it with the legacy HTTP+SSE approach across stability, TCP connection usage, request success rate, latency, and client‑side code complexity, showing why Streamable HTTP is the better choice for high‑concurrency cloud‑native deployments.
Background
Model Context Protocol (MCP) is a standard for communication between AI models and tools. The original transport combined HTTP with Server‑Sent Events (SSE), an approach that exhibits stability, performance, and client‑side complexity problems in high‑concurrency scenarios.
Issues with HTTP + SSE
Long‑lived connections – servers must hold a persistent connection per client, consuming resources under load.
SSE‑only message delivery – every server‑to‑client message must travel over the event stream, adding unnecessary overhead.
Infrastructure compatibility – firewalls and load balancers often terminate long SSE streams.
Streamable HTTP introduced in PR #206
The new transport replaces the dual‑channel design with a single, on‑demand HTTP endpoint that can return a normal response or stream data when needed. Key improvements:
Unified endpoint: the dedicated /sse path is removed; all traffic goes through one endpoint.
On‑demand streaming: the server chooses per request between a plain HTTP response and a stream.
Session management: a session mechanism supports stateful interactions.
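A minimal sketch of how these three pieces fit together. The dict-based session store, the `handle_post` function, and the `stream` flag are illustrative assumptions standing in for a real server's logic; this is not the PR #206 reference implementation:

```python
import uuid

# Illustrative in-memory session store; a production server would
# scope and expire sessions properly.
SESSIONS = {}


def handle_post(body: dict, session_id=None):
    """One endpoint serves everything: it opens a session on
    `initialize`, validates it afterwards, and decides per request
    whether to answer with plain JSON or upgrade to a stream."""
    if body.get("method") == "initialize":
        # Session management: the server issues an id that the client
        # echoes back on every subsequent request.
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {"initialized": True}
    elif session_id not in SESSIONS:
        return {"status": 404, "error": "unknown or expired session"}

    # On-demand streaming: the server, not the transport, picks the
    # response shape. The `stream` flag stands in for whatever
    # heuristic a real server uses (e.g. long-running tool calls).
    content_type = ("text/event-stream" if body.get("stream")
                    else "application/json")
    return {"status": 200, "session_id": session_id,
            "content_type": content_type}
```

Because both response shapes come out of the same endpoint, no dedicated /sse channel needs to be kept open when a plain JSON reply suffices.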
Stability comparison
In a simulated test with 1,000 concurrent users, the SSE server required a separate persistent TCP connection per client, causing the number of open connections to explode, while Streamable HTTP served the same load by reusing a few dozen connections.
TCP connection count
Results show that HTTP + SSE continuously increases TCP connections, whereas Streamable HTTP keeps the count low by establishing connections only when needed.
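The reuse effect can be reproduced with the standard library alone. This is a generic HTTP keep-alive demonstration, not the MCP wire format; the `/mcp` path and JSON payloads are placeholders:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class JSONHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 enables keep-alive, so one TCP connection can carry
    # many request/response cycles.
    protocol_version = "HTTP/1.1"

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        body = b'{"jsonrpc": "2.0", "result": {}}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


server = HTTPServer(("127.0.0.1", 0), JSONHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Send five requests over one keep-alive connection; the client's
# local port identifies the underlying TCP connection.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
local_ports = set()
for _ in range(5):
    conn.request("POST", "/mcp", body=b'{"id": 1}',
                 headers={"Content-Type": "application/json"})
    resp = conn.getresponse()
    resp.read()
    local_ports.add(conn.sock.getsockname()[1])
conn.close()
server.shutdown()

print(f"5 requests used {len(local_ports)} TCP connection(s)")
```

All five requests share a single local port, i.e. one TCP connection, which is why a Streamable HTTP deployment keeps its connection count low while an SSE deployment pins one connection per client for the whole session.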
Request success rate
When the number of concurrent users approaches the OS limit (≈1024 connections), the SSE server’s success rate drops sharply, while Streamable HTTP maintains a high success ratio.
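The ≈1024 figure corresponds to the default per-process open-file limit on many Linux systems: each TCP connection consumes one file descriptor. As an illustrative check (unrelated to MCP itself, and Unix-only), the limit can be inspected with Python's standard `resource` module:

```python
import resource

# Each TCP connection consumes one file descriptor, so the soft
# RLIMIT_NOFILE value caps how many concurrent SSE connections a
# single server process can hold open.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limits: soft={soft}, hard={hard}")

# Once concurrent SSE clients approach the soft limit, new socket
# creation starts failing, which is what drives the success-rate
# collapse observed in the test.
```

Raising the soft limit postpones the problem but does not remove it; Streamable HTTP avoids it by not holding a connection per idle client in the first place.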
Performance
Response‑time measurements (log scale) reveal that the SSE server’s latency grows from 0.0018 s to 1.5112 s as concurrency rises, whereas the Streamable HTTP server stays around 0.0075 s, benefiting from the high‑performance Higress gateway.
Client‑side complexity
Sample client code demonstrates that the SSE implementation must handle connection setup, reconnection logic and separate POST calls, whereas the Streamable HTTP client only sends a single POST request and processes the response.
Code examples
```python
import json

import aiohttp


class SSEClient:
    def __init__(self, url: str, headers: dict = None):
        self.url = url
        self.headers = headers or {}
        self.event_source = None
        self.endpoint = None

    async def connect(self):
        # 1. Establish the SSE connection
        async with aiohttp.ClientSession(headers=self.headers) as session:
            self.event_source = await session.get(self.url)
            # 2. Handle the connection event
            print('SSE connection established')
            # 3. Process incoming messages
            async for line in self.event_source.content:
                if line:
                    message = json.loads(line)
                    await self.handle_message(message)
            # 4. Error handling and reconnection
            if self.event_source.status != 200:
                print(f'SSE error: {self.event_source.status}')
                await self.reconnect()

    async def send(self, message: dict):
        # Sending requires an extra POST request to a separate endpoint
        async with aiohttp.ClientSession(headers=self.headers) as session:
            async with session.post(self.endpoint, json=message) as response:
                return await response.json()

    async def handle_message(self, message: dict):
        print(f'Received message: {message}')

    async def reconnect(self):
        print('Attempting to reconnect...')
        await self.connect()
```

```python
class StreamableHTTPClient:
    def __init__(self, url: str, headers: dict = None):
        self.url = url
        self.headers = headers or {}

    async def send(self, message: dict):
        # 1. Send a single POST request
        async with aiohttp.ClientSession(headers=self.headers) as session:
            async with session.post(
                self.url,
                json=message,
                headers={'Content-Type': 'application/json'},
            ) as response:
                # 2. Handle the response
                if response.status == 200:
                    return await response.json()
                raise Exception(f'HTTP error: {response.status}')
```

Conclusions
Streamable HTTP offers better stability, lower TCP connection usage, higher success rates under load, shorter and more predictable response times, and a simpler client implementation compared with the legacy HTTP + SSE transport.
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
