
Why SSE and WebSocket Are the Perfect Fit for Large Language Model Apps

This article explains how Server-Sent Events (SSE) and WebSocket provide the low-latency streaming communication that large language model applications need (with WebSocket additionally supporting bidirectional exchange). It compares them with traditional HTTP/HTTPS request-response, outlines technical challenges such as gateway stability, bandwidth, and security, and offers practical implementation steps and mitigation strategies.

Alibaba Cloud Native

Jan 26, 2025


Real‑time communication as a prerequisite for LLM applications

Large language model (LLM) services generate output incrementally, token by token, and need to push partial results to the client as they are produced rather than waiting for the entire response. This makes real-time communication essential.

What are SSE and WebSocket?

SSE (Server-Sent Events) is an HTTP-based, unidirectional protocol that lets the server stream UTF-8 text to the client using the text/event-stream MIME type. Its main advantages are efficient one-way streaming, low latency, and simplicity: it runs over an ordinary HTTP request with no protocol upgrade, and browsers support it natively through the EventSource API.

WebSocket starts with an HTTP Upgrade handshake and then keeps the underlying TCP connection open as a persistent, full-duplex channel, enabling real-time bidirectional data exchange suitable for chat, online games, collaborative editing, and multimodal LLM interactions.

Traditional web protocols before LLMs

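Web applications have historically relied on HTTP/HTTPS (including HTTP/2 and HTTP/3) for request-response communication. HTTPS provides encryption, wide browser support, and a stateless model, but in the classic request-response pattern the server returns one complete response per request and cannot push incremental results, and without connection reuse each request also pays for a new connection.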

Why HTTP/HTTPS falls short for LLMs

The request-response model returns a single complete response per request, so partial output from streaming or long-running generation cannot reach the user until generation finishes.

Repeatedly establishing connections (TCP and TLS handshakes) adds latency that is unsuitable for real-time dialogue.

The stateless model forces the client to resend the conversation context with every request, increasing network overhead.

Even with HTTP/2 multiplexing, plain request-response was not designed for the continuous server-initiated streams that LLMs produce.

Why SSE and WebSocket suit LLMs

SSE matches the pattern “client sends one request, server continuously returns tokens”.

WebSocket adds true bidirectional communication, allowing the client to interrupt generation or send additional inputs.
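The interruption case can be sketched on the server side as follows. This is a minimal illustration that assumes a hypothetical JSON message protocol (prompt and cancel from the client, token and done from the server) and a stand-in fakeModel generator in place of a real inference backend; createSession and the injected send function are illustrative names, not any library's API. An AbortController per session lets a cancel message stop generation midway.

```javascript
// Hypothetical message protocol (an assumption, not a standard):
//   client -> server: {"type":"prompt","text":"..."} or {"type":"cancel"}
//   server -> client: {"type":"token","text":"..."} or {"type":"done","reason":"..."}

// Stand-in for a real model: yields one word at a time, stopping if aborted.
async function* fakeModel(prompt, signal) {
  for (const word of `Echo: ${prompt}`.split(" ")) {
    if (signal.aborted) return;
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulate inference time
    yield word + " ";
  }
}

// One session per WebSocket connection. `send` writes a text frame (for example
// a WebSocket library's send method); it is injected so the logic is testable.
function createSession(send) {
  let controller = null; // AbortController of the generation in progress, if any

  async function generate(text) {
    controller?.abort(); // a new prompt interrupts any generation in progress
    const current = new AbortController();
    controller = current;
    for await (const token of fakeModel(text, current.signal)) {
      if (current.signal.aborted) break; // cancel arrived while waiting for a token
      send(JSON.stringify({ type: "token", text: token }));
    }
    const reason = current.signal.aborted ? "cancelled" : "complete";
    send(JSON.stringify({ type: "done", reason }));
    if (controller === current) controller = null;
  }

  // Handle one incoming text message from the client.
  function onMessage(raw) {
    const msg = JSON.parse(raw);
    if (msg.type === "prompt") return generate(msg.text);
    if (msg.type === "cancel") controller?.abort();
    return Promise.resolve();
  }

  return { onMessage };
}
```

With SSE alone, the same cancellation would need a separate HTTP request to a second endpoint, because the SSE stream itself carries data in one direction only.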

Technical challenges and mitigation strategies

When user volume grows, gateways that manage SSE/WebSocket connections face stability, bandwidth, and security issues.

Gateway stability during software changes and scaling

Challenge: Service restarts or instance scaling can break long-lived connections.

Solutions: Implement lossless (graceful) online/offline handling for gateway instances so that existing connections are drained rather than cut, add client-side auto-reconnect with heartbeats, and fall back to long-polling when necessary.
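The client-side half of these solutions can be sketched as two small helpers. This is an illustration under stated assumptions: backoffDelay and createHeartbeatMonitor are hypothetical names, the "full jitter" backoff is one common strategy rather than something the article prescribes, and the default delays are arbitrary.

```javascript
// Exponential backoff with "full jitter": wait a random time in
// [0, min(maxMs, baseMs * 2^attempt)). Jitter spreads reconnects out so that
// many clients dropped by the same gateway restart do not reconnect at once.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * cap);
}

// Heartbeat watchdog: call beat() whenever any message (or application-level
// ping) arrives. If nothing arrives within timeoutMs, onTimeout fires so the
// caller can close the connection and schedule a reconnect.
function createHeartbeatMonitor(timeoutMs, onTimeout) {
  let timer = null;
  const monitor = {
    beat() {
      clearTimeout(timer);
      timer = setTimeout(onTimeout, timeoutMs);
    },
    stop() {
      clearTimeout(timer);
      timer = null;
    },
  };
  monitor.beat(); // start the countdown immediately
  return monitor;
}
```

In a browser client, beat() would be called from the connection's message handler. Browsers do not expose WebSocket ping/pong frames to scripts, so the server would send application-level ping messages. The close handler would schedule setTimeout(connect, backoffDelay(attempt++)) and reset attempt to 0 after a successful open, where connect is a hypothetical function that opens a new WebSocket or EventSource.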

Bandwidth and memory pressure

Challenge: LLM applications often stream large amounts of text, images, or video, which consumes significant bandwidth and can rapidly increase gateway memory usage when responses are buffered.

Solutions: Use gateways that forward responses in streaming chunks instead of buffering them in full (e.g., Higress), enable compression (Gzip), and apply rate limiting.
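Gateways such as Higress provide their own rate-limiting configuration; the sketch below only illustrates the token-bucket idea commonly behind such limits, not any gateway's implementation. TokenBucket and allow are hypothetical names, and keying buckets by client IP or API key is an assumption.

```javascript
// Token-bucket rate limiter: each client gets a bucket holding up to
// `capacity` tokens that refills continuously at `ratePerSec`.
// A request (or new stream) is allowed only if a token is available.
class TokenBucket {
  constructor(capacity, ratePerSec, now = () => Date.now()) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.now = now; // injectable clock, which makes the logic testable
    this.tokens = capacity; // start full so short bursts are allowed
    this.last = now();
  }

  tryRemove(cost = 1) {
    const t = this.now();
    // Refill in proportion to elapsed time, never above capacity.
    this.tokens = Math.min(this.capacity, this.tokens + ((t - this.last) / 1000) * this.ratePerSec);
    this.last = t;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}

// One bucket per client key (hypothetical keying by IP or API key).
const buckets = new Map();
function allow(clientKey, capacity = 10, ratePerSec = 5) {
  if (!buckets.has(clientKey)) buckets.set(clientKey, new TokenBucket(capacity, ratePerSec));
  return buckets.get(clientKey).tryRemove();
}
```

The capacity bounds how large a burst a single client can send, while the refill rate bounds its sustained request rate.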

Security and DDoS resilience

Challenge: LLM inference consumes far more backend resources than typical web requests, amplifying the impact of attacks.

Solutions: Deploy authentication (OAuth2, JWT), IP‑based access control, request‑level throttling, and WSS encryption.
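As an illustration of the JWT option, the sketch below checks an HS256 token before a stream is accepted. It is a minimal example only (a maintained JWT library is the safer production choice), and signJwt and verifyJwt are hypothetical helpers built on Node's crypto module. Browsers cannot set custom headers on EventSource or WebSocket connections, so the token often arrives in a cookie or query parameter; that transport choice is left to the caller here.

```javascript
const crypto = require("node:crypto");

const b64url = (str) => Buffer.from(str, "utf8").toString("base64url");

// Issue an HS256 token (for testing; normally done by the auth service).
function signJwt(payload, secret) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const sig = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${sig}`;
}

// Returns the payload if signature and expiry are valid, otherwise null.
function verifyJwt(token, secret, nowSec = Math.floor(Date.now() / 1000)) {
  try {
    const parts = token.split(".");
    if (parts.length !== 3) return null;
    const [header, body, sig] = parts;
    const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
    if (alg !== "HS256") return null; // reject "none" and unexpected algorithms
    const expected = crypto.createHmac("sha256", secret).update(`${header}.${body}`).digest();
    const actual = Buffer.from(sig, "base64url");
    // Constant-time comparison; timingSafeEqual throws if lengths differ.
    if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) return null;
    const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
    if (typeof payload.exp === "number" && payload.exp <= nowSec) return null; // expired
    return payload;
  } catch {
    return null; // malformed base64 or JSON
  }
}
```

Rejecting an invalid token before the SSE response or WebSocket upgrade begins keeps unauthenticated clients from ever reaching the expensive inference backend.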

SSE workflow

Client creates an EventSource pointing to the SSE endpoint.

const eventSource = new EventSource('https://example.com/sse-endpoint');

Server responds with headers:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive
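A minimal Node.js sketch of a server that sends these headers and then streams events might look like the following. handleSse and startServer are hypothetical names, the token array stands in for model output, and the ": ping" comment heartbeat is an assumption about how to keep idle-timeout proxies from closing the stream (SSE comment lines are ignored by clients).

```javascript
const http = require("node:http");

// Streams `tokens` as SSE events every intervalMs and sends an SSE comment
// every heartbeatMs so intermediaries see traffic on idle connections.
function handleSse(req, res, tokens, { intervalMs = 50, heartbeatMs = 15000 } = {}) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  let i = 0;
  const heartbeat = setInterval(() => res.write(": ping\n\n"), heartbeatMs);
  const ticker = setInterval(() => {
    if (i < tokens.length) {
      res.write(`data: ${JSON.stringify({ message: tokens[i++] })}\n\n`);
    } else {
      cleanup();
      res.end(); // generation finished: close the stream
    }
  }, intervalMs);

  function cleanup() {
    clearInterval(ticker);
    clearInterval(heartbeat);
  }
  // "close" on the response fires when the connection ends, including
  // client disconnects mid-stream, so timers never write to a dead socket.
  res.on("close", cleanup);
}

// Not started automatically; call startServer() to try it locally.
function startServer(port = 8080) {
  return http
    .createServer((req, res) => {
      if (req.url === "/sse-endpoint") handleSse(req, res, ["Hello", ", ", "world"]);
      else res.writeHead(404).end();
    })
    .listen(port);
}
```

A browser EventSource treats the end of the response as a disconnect and reconnects, so a real service would typically send a final event that tells the client to call eventSource.close().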

Server streams events as lines prefixed with data: and terminates each event with a blank line (\n\n).

data: {"message": "Hello"}
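Payloads that contain newlines need one data: line per line of text, otherwise the first newline ends the field early. The hypothetical formatSseEvent helper below serializes an event with the optional event, id, and retry fields defined by the SSE specification; the event name used in the test ("token") is only an example.

```javascript
// Serialize one SSE event. Each line of `data` becomes its own "data:" line;
// the client joins them back together with "\n". Values for event and id are
// assumed not to contain newlines.
function formatSseEvent({ data, event, id, retry } = {}) {
  let out = "";
  if (event !== undefined) out += `event: ${event}\n`;
  if (id !== undefined) out += `id: ${id}\n`;
  if (retry !== undefined) out += `retry: ${retry}\n`; // reconnection delay in ms
  if (data !== undefined) {
    for (const line of String(data).split(/\r\n|\r|\n/)) out += `data: ${line}\n`;
  }
  return out + "\n"; // the blank line dispatches the event
}
```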

Client handles messages via onmessage.

eventSource.onmessage = (event) => {
    console.log('Received data:', event.data);
};

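On error or disconnect, the browser's EventSource reconnects automatically; the server may send retry: 5000 to set the reconnection delay in milliseconds. If events carry an id: field, the browser sends the last received ID in a Last-Event-ID header when it reconnects, so the server can resume the stream.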
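EventSource only issues GET requests and cannot set custom headers, so LLM clients that POST a prompt often read the text/event-stream response with fetch and parse it themselves. The sketch below is a simplified incremental parser modeled on the specification's line rules (data, event, id, retry, and comment lines; chunks may split anywhere). createSseParser is a hypothetical name, and the endpoint and request body in the usage comment are assumptions.

```javascript
// Incremental SSE parser: feed decoded text chunks to push(); onEvent fires
// for each complete event. state tracks lastEventId and the retry delay.
function createSseParser(onEvent) {
  let buffer = "";
  let data = "";
  let eventType = "";
  const state = { lastEventId: "", retry: null };

  function processLine(line) {
    if (line === "") {
      // Blank line: dispatch the accumulated event, if it has data.
      if (data !== "") {
        onEvent({ type: eventType || "message", data: data.slice(0, -1), lastEventId: state.lastEventId });
      }
      data = "";
      eventType = "";
      return;
    }
    if (line.startsWith(":")) return; // comment, e.g. a heartbeat
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "data") data += value + "\n";
    else if (field === "event") eventType = value;
    else if (field === "id" && !value.includes("\0")) state.lastEventId = value;
    else if (field === "retry" && /^\d+$/.test(value)) state.retry = Number(value);
  }

  function push(chunk) {
    buffer += chunk;
    // Hold back a trailing "\r": it may be the first half of a "\r\n".
    const end = buffer.endsWith("\r") ? buffer.length - 1 : buffer.length;
    const lines = buffer.slice(0, end).split(/\r\n|\r|\n/);
    buffer = lines.pop() + buffer.slice(end); // keep the incomplete last line
    for (const line of lines) processLine(line);
  }

  return { push, state };
}

// Usage with fetch (hypothetical endpoint and body):
// const res = await fetch("https://example.com/sse-endpoint", { method: "POST", body: JSON.stringify({ prompt }) });
// const parser = createSseParser((e) => console.log("Received data:", e.data));
// const reader = res.body.getReader();
// const decoder = new TextDecoder();
// for (;;) {
//   const { value, done } = await reader.read();
//   if (done) break;
//   parser.push(decoder.decode(value, { stream: true }));
// }
```

Reconnection is then the application's job, as with WebSocket: the parser records lastEventId and retry so that a reconnect can send Last-Event-ID and wait the requested delay.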

WebSocket workflow

Client initiates a handshake with an HTTP GET request containing Upgrade: websocket, Connection: Upgrade, and Sec-WebSocket-Key.

GET /ws-endpoint HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

Server replies with 101 Switching Protocols and includes Sec-WebSocket-Accept.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
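The Sec-WebSocket-Accept value is derived from the client's key as defined in RFC 6455: append the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, take the SHA-1 digest, and base64-encode it. A short Node.js sketch (computeAcceptKey and generateClientKey are illustrative names) reproduces the exact pair shown in the handshake above.

```javascript
const crypto = require("node:crypto");

// Fixed GUID from RFC 6455, concatenated with the client's key before hashing.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Server side: derive Sec-WebSocket-Accept from Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey) {
  return crypto.createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

// Client side: a fresh random 16-byte nonce, base64-encoded, per handshake.
function generateClientKey() {
  return crypto.randomBytes(16).toString("base64");
}

console.log(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ==")); // s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client recomputes the same value and rejects the upgrade if it does not match, which guards against responses from servers (or caches) that do not actually speak WebSocket.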

Connection upgrades to a full‑duplex TCP channel; both sides can send frames (text or binary).
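Every frame starts with a small header carrying the FIN bit, the opcode, a mask bit, and the payload length; frames sent from client to server must be masked with a 4-byte key. The hypothetical decodeFrame below is a simplified reader that assumes the buffer holds exactly one complete frame and ignores fragmentation and extensions.

```javascript
// Decode one complete WebSocket frame from a Buffer (RFC 6455 framing).
// Returns { fin, opcode, payload }.
function decodeFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0; // final fragment of the message
  const opcode = buf[0] & 0x0f; // 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
  const masked = (buf[1] & 0x80) !== 0;
  let len = buf[1] & 0x7f;
  let offset = 2;
  if (len === 126) {
    len = buf.readUInt16BE(2); // 16-bit extended length
    offset = 4;
  } else if (len === 127) {
    len = Number(buf.readBigUInt64BE(2)); // 64-bit extended length
    offset = 10;
  }
  let payload;
  if (masked) {
    // Client-to-server frames: XOR each payload byte with the 4-byte mask key.
    const mask = buf.subarray(offset, offset + 4);
    offset += 4;
    payload = Buffer.alloc(len);
    for (let i = 0; i < len; i++) payload[i] = buf[offset + i] ^ mask[i % 4];
  } else {
    payload = buf.subarray(offset, offset + len);
  }
  return { fin, opcode, payload };
}
```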

Typical text frame payload: {"message": "Hello"}

Binary frame payload example: [0x01, 0x02, 0x03]

Either side can close the connection by sending a close frame with a status code.
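On the sending side, the payloads above map onto frames as follows. This sketch covers only server-to-client frames, which are sent unmasked; encodeFrame, textFrame, binaryFrame, and closeFrame are hypothetical helper names. A close frame's payload is a 2-byte big-endian status code (1000 means normal closure) followed by an optional UTF-8 reason, and control-frame payloads are limited to 125 bytes.

```javascript
// Encode one unmasked, single-frame (FIN set) server-to-client message.
function encodeFrame(opcode, payload) {
  const data = Buffer.isBuffer(payload) ? payload : Buffer.from(payload);
  let header;
  if (data.length < 126) {
    header = Buffer.from([0x80 | opcode, data.length]);
  } else if (data.length < 65536) {
    header = Buffer.alloc(4);
    header[0] = 0x80 | opcode;
    header[1] = 126; // 16-bit extended length follows
    header.writeUInt16BE(data.length, 2);
  } else {
    header = Buffer.alloc(10);
    header[0] = 0x80 | opcode;
    header[1] = 127; // 64-bit extended length follows
    header.writeBigUInt64BE(BigInt(data.length), 2);
  }
  return Buffer.concat([header, data]);
}

const textFrame = (str) => encodeFrame(0x1, Buffer.from(str, "utf8"));
const binaryFrame = (bytes) => encodeFrame(0x2, Buffer.from(bytes));

// Close frame: status code plus optional reason (keep it under 124 bytes).
function closeFrame(code = 1000, reason = "") {
  const body = Buffer.alloc(2 + Buffer.byteLength(reason));
  body.writeUInt16BE(code, 0);
  body.write(reason, 2, "utf8");
  return encodeFrame(0x8, body);
}
```

For example, textFrame('{"message": "Hello"}') produces the text frame described above, and binaryFrame([0x01, 0x02, 0x03]) yields the five bytes 0x82 0x03 0x01 0x02 0x03.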

Feature comparison

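Protocol basis: SSE runs over ordinary HTTP/1.1 or HTTP/2; WebSocket starts with an HTTP/1.1 Upgrade handshake and then uses its own framing protocol over TCP.

Communication mode: Unidirectional (SSE) vs. bidirectional (WebSocket).

Connection usage: Each SSE stream is one long-lived HTTP response; over HTTP/2 many streams can share a single connection, while over HTTP/1.1 each stream occupies its own connection. WebSocket keeps one dedicated persistent connection per session.

Header compression: Available for SSE over HTTP/2 (HPACK); WebSocket has no header compression, but its frame headers are only 2 to 14 bytes and payloads can be compressed with the permessage-deflate extension.

Latency: Both are low once the connection is open; WebSocket has the edge when the client also sends frequent messages, since it needs no new HTTP request per message.

Reconnection: Automatic in SSE (built into EventSource); WebSocket reconnection must be implemented by the application.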

Typical use cases: SSE for real‑time notifications and LLM streaming; WebSocket for chat, online gaming, collaborative editing.

Future trends

LLM services are driving an “API‑First” approach, where capabilities are exposed via REST or Realtime APIs. For example, Perplexity recently launched AI‑search APIs (Sonar and Sonar Pro) that can be integrated into platforms like Zoom, illustrating the shift toward API‑driven LLM integration.

Upcoming articles will explore why API management is gaining attention in the LLM era.



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
