Why SSE and WebSocket Are the Perfect Fit for Large Language Model Apps
This article explains how Server‑Sent Events (SSE) and WebSocket provide the low‑latency streaming that large language model applications need: SSE for one‑way, server‑to‑client delivery and WebSocket for bidirectional exchange. It compares both with plain HTTP/HTTPS request‑response, outlines technical challenges such as gateway stability, bandwidth, and security, and offers practical implementation steps and mitigation strategies.
Real‑time communication as a prerequisite for LLM applications
Large language model (LLM) services generate content continuously and need to push partial results to clients without waiting for the entire response, making real‑time communication essential.
What are SSE and WebSocket?
SSE (Server‑Sent Events) is an HTTP‑based, unidirectional protocol that lets the server stream UTF‑8 text to the client using the text/event-stream MIME type. Its main advantages are efficient one‑way streaming, low latency, and simplicity: the stream is an ordinary HTTP response, so no separate handshake is required.
WebSocket establishes a full‑duplex, persistent TCP connection after an HTTP upgrade handshake, enabling real‑time bidirectional data exchange suitable for chat, online games, collaborative editing, and multimodal LLM interactions.
Traditional web protocols before LLMs
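Web applications historically relied on HTTP/HTTPS (including HTTP/2 and HTTP/3) for request‑response communication. HTTPS provides encryption, wide browser support, and a stateless model, but in the classic request‑response pattern the client must initiate every exchange and receives one complete response, so the server cannot push incremental results on its own. Without keep‑alive, each request also pays for a new connection.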
Why HTTP/HTTPS falls short for LLMs
Plain request‑response is client‑initiated only: the server cannot push data unprompted, which makes token streaming and long‑running tasks awkward.
Repeated connection establishment adds latency, which is unsuitable for real‑time dialogue.
Stateless nature forces the client to resend context on every request, increasing network overhead.
Even with HTTP/2 multiplexing, the protocol was not designed for continuous server‑initiated streams required by LLMs.
Why SSE and WebSocket suit LLMs
SSE matches the pattern “client sends one request, server continuously returns tokens”.
WebSocket adds true bidirectional communication, allowing the client to interrupt generation or send additional inputs.
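The interrupt pattern can be sketched without a real socket. The example below is a minimal illustration, not a reference implementation: the JSON message shapes (`prompt`, `cancel`, `token`, `done`, `cancelled`), the `GenerationSession` class, and the word‑by‑word "model" are all assumptions made for the sketch, and `send` stands in for a WebSocket's `send()` method.

```javascript
// Hypothetical message protocol: {type: "prompt", text} starts generation,
// {type: "cancel"} stops it. `send` stands in for socket.send().
class GenerationSession {
  constructor(send) {
    this.send = send;
    this.pending = []; // tokens not yet sent to the client
    this.active = false;
  }

  // Called for every message the client sends over the socket.
  handleClientMessage(raw) {
    const msg = JSON.parse(raw);
    if (msg.type === 'prompt') {
      // Stand-in for a model call: echo the prompt back word by word.
      this.pending = msg.text.split(' ');
      this.active = true;
    } else if (msg.type === 'cancel' && this.active) {
      this.pending = [];
      this.active = false;
      this.send(JSON.stringify({ type: 'cancelled' }));
    }
  }

  // Emits at most one token; a real server would call this as tokens arrive.
  step() {
    if (!this.active) return false;
    if (this.pending.length === 0) {
      this.active = false;
      this.send(JSON.stringify({ type: 'done' }));
      return false;
    }
    this.send(JSON.stringify({ type: 'token', text: this.pending.shift() }));
    return true;
  }
}

// Usage: two tokens are streamed, then the client interrupts.
const sent = [];
const session = new GenerationSession((frame) => sent.push(JSON.parse(frame)));
session.handleClientMessage(JSON.stringify({ type: 'prompt', text: 'a b c d' }));
session.step();
session.step();
session.handleClientMessage(JSON.stringify({ type: 'cancel' }));
session.step(); // no effect: generation was cancelled
console.log(sent.map((m) => m.type).join(','));
```

With SSE alone, the same cancel would need a second HTTP request or closing the stream, because the event stream itself carries no client‑to‑server messages.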
Technical challenges and mitigation strategies
When user volume grows, gateways that manage SSE/WebSocket connections face stability, bandwidth, and security issues.
Gateway stability during software changes and scaling
Challenge: Service restarts or instance scaling can break long‑lived connections.
Solutions: Implement lossless upgrade and downgrade mechanisms on the gateway, client‑side auto‑reconnect with heartbeat detection, and a fallback to long‑polling when necessary.
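Client‑side auto‑reconnect with heartbeat detection could look roughly like the sketch below. The function and class names (`reconnectDelayMs`, `HeartbeatMonitor`, `connectWithRetry`) and the default timings are assumptions chosen for illustration; only the `WebSocket` constructor and its `onopen`, `onmessage`, and `onclose` handlers are standard browser API.

```javascript
// Exponential backoff with a cap. `jitter` in [0, 1) spreads reconnects out
// so clients do not all return at the same instant after a gateway restart.
function reconnectDelayMs(attempt, { baseMs = 500, maxMs = 30000, jitter = Math.random() } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(ceiling / 2 + (ceiling / 2) * jitter);
}

// Heartbeat watchdog: the connection is treated as dead when nothing
// (data or ping) has arrived within `timeoutMs`.
class HeartbeatMonitor {
  constructor(timeoutMs, now = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastSeen = now();
  }
  touch() {
    this.lastSeen = this.now();
  }
  isStale() {
    return this.now() - this.lastSeen > this.timeoutMs;
  }
}

// Wiring sketch (not executed here): reconnect whenever the socket closes.
function connectWithRetry(url, onMessage, attempt = 0) {
  const socket = new WebSocket(url);
  socket.onopen = () => { attempt = 0; }; // reset backoff after a success
  socket.onmessage = (event) => onMessage(event.data);
  socket.onclose = () => {
    setTimeout(() => connectWithRetry(url, onMessage, attempt + 1), reconnectDelayMs(attempt));
  };
  return socket;
}
```

The jitter term matters most during gateway restarts: without it, every disconnected client would retry on the same schedule and hit the new instance at once.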
Bandwidth and memory pressure
Challenge: LLMs often stream large text, images, or video, causing high bandwidth consumption and rapid memory growth.
Solutions: Use gateways that support chunked streaming (e.g., Higress), enable compression (gzip), and apply rate limiting.
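To get a feel for what compression buys on a token stream, the sketch below gzips a batch of SSE‑style events using Node.js's built‑in zlib module. The event shape (`{"token": ...}`) and the helper names are assumptions for illustration; in practice compression is usually configured on the gateway rather than hand‑written.

```javascript
const zlib = require('node:zlib');

// Builds a batch of SSE-style token events like an LLM stream might emit.
function buildTokenEvents(tokens) {
  return tokens.map((t) => `data: ${JSON.stringify({ token: t })}\n\n`).join('');
}

// For a live stream, each event must be flushed through the compressor,
// otherwise tokens sit in the gzip buffer and the client sees them late.
function writeCompressedEvent(gzipStream, eventText) {
  gzipStream.write(eventText);
  gzipStream.flush();
}

// Offline comparison: repetitive JSON framing compresses well.
const events = buildTokenEvents(Array.from({ length: 200 }, (_, i) => `word${i % 10}`));
const compressed = zlib.gzipSync(events);
const restored = zlib.gunzipSync(compressed).toString('utf8');
console.log(`raw=${Buffer.byteLength(events)} bytes, gzip=${compressed.length} bytes`);
```

`writeCompressedEvent` would be used with a stream from `zlib.createGzip()` piped into the HTTP response. Flushing after every event trades some compression ratio for latency, which is usually the right trade for token streaming.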
Security and DDoS resilience
Challenge: LLM inference consumes far more backend resources than typical web requests, amplifying the impact of attacks.
Solutions: Deploy authentication (OAuth2, JWT), IP‑based access control, request‑level throttling, and WSS encryption.
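Request‑level throttling is often implemented as a token bucket. The sketch below is one possible in‑memory version; the class name, option names, and the per‑client map are assumptions, and a production gateway would typically keep this state in a shared store rather than in process memory.

```javascript
// Token bucket: each request spends tokens; tokens refill at a fixed rate
// up to `capacity`. `now` is injectable so the logic can be tested.
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = Date.now }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true and spends `cost` tokens if the request is allowed.
  tryConsume(cost = 1) {
    const t = this.now();
    const elapsedSec = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = t;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}

// One bucket per client key (for example an API key or an IP address).
const buckets = new Map();
function allowRequest(clientKey, options) {
  if (!buckets.has(clientKey)) buckets.set(clientKey, new TokenBucket(options));
  return buckets.get(clientKey).tryConsume();
}
```

Because inference cost varies widely between prompts, the `cost` argument could be scaled by an estimate of the request size instead of charging every request one token.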
SSE workflow
Client creates an EventSource pointing to the SSE endpoint.
    const eventSource = new EventSource('https://example.com/sse-endpoint');
Server responds with headers:
    HTTP/1.1 200 OK
    Content-Type: text/event-stream
    Cache-Control: no-cache
    Connection: keep-alive
Server streams data lines prefixed with data:, with each event terminated by a blank line (\n\n).
    data: {"message": "Hello"}
Client handles messages via onmessage.
    eventSource.onmessage = (event) => {
      console.log('Received data:', event.data);
    };
On error or disconnect, the client automatically retries; the server may send retry: 5000 to set the reconnection delay to 5000 milliseconds.
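EventSource only issues GET requests, so LLM clients that must POST a prompt often read the event stream themselves and parse the wire format by hand. The sketch below shows a simplified formatter and incremental parser for that format. It is an illustration under stated assumptions: `formatSseEvent` and `SseParser` are hypothetical names, the parser handles only \n and \r\n line endings, and it ignores the retry field.

```javascript
// Server side: serialize one event. Multi-line data becomes several
// data: lines; the blank line at the end terminates the event.
function formatSseEvent({ data, event, id }) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  if (id) lines.push(`id: ${id}`);
  for (const part of String(data).split('\n')) lines.push(`data: ${part}`);
  return lines.join('\n') + '\n\n';
}

// Client side: incremental parser that tolerates chunks ending mid-line.
class SseParser {
  constructor(onEvent) {
    this.onEvent = onEvent;
    this.buffer = '';
    this.dataLines = [];
    this.eventType = 'message';
    this.lastEventId = '';
  }

  feed(chunk) {
    this.buffer += chunk;
    let newline;
    while ((newline = this.buffer.indexOf('\n')) !== -1) {
      const line = this.buffer.slice(0, newline).replace(/\r$/, '');
      this.buffer = this.buffer.slice(newline + 1);
      this.handleLine(line);
    }
  }

  handleLine(line) {
    if (line === '') { // blank line: dispatch the accumulated event
      if (this.dataLines.length > 0) {
        this.onEvent({ type: this.eventType, data: this.dataLines.join('\n'), id: this.lastEventId });
      }
      this.dataLines = [];
      this.eventType = 'message';
      return;
    }
    if (line.startsWith(':')) return; // comment line, often used as keep-alive
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1);
    if (field === 'data') this.dataLines.push(value);
    else if (field === 'event') this.eventType = value;
    else if (field === 'id') this.lastEventId = value;
  }
}

// Usage: the first event arrives split across two network chunks.
const received = [];
const parser = new SseParser((event) => received.push(event));
parser.feed('data: {"mess');
parser.feed('age": "Hello"}\n\n: keep-alive\n\n');
parser.feed(formatSseEvent({ event: 'done', data: 'a\nb' }));
console.log(received);
```

In a browser, the chunks would typically come from reading `response.body` of a `fetch()` call through a `TextDecoder`.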
WebSocket workflow
Client initiates a handshake with an HTTP GET request containing Upgrade: websocket, Connection: Upgrade, and Sec-WebSocket-Key.
    GET /ws-endpoint HTTP/1.1
    Host: example.com
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
    Sec-WebSocket-Version: 13
Server replies with 101 Switching Protocols and includes Sec-WebSocket-Accept.
    HTTP/1.1 101 Switching Protocols
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
The connection then becomes a full‑duplex channel over the same TCP connection; both sides can send frames (text or binary).
    Typical text frame payload: {"message": "Hello"}
    Binary frame example: [0x01, 0x02, 0x03]
Either side can close the connection by sending a close frame with a status code.
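The Sec-WebSocket-Accept value in the handshake above is derived from the client's key: the server appends a fixed GUID defined in RFC 6455, hashes the result with SHA‑1, and Base64‑encodes the digest. The sketch below reproduces that computation with Node.js's crypto module and also builds a minimal server‑to‑client text frame. The function names are assumptions, and the frame encoder deliberately handles only unmasked payloads of up to 125 bytes.

```javascript
const nodeCrypto = require('node:crypto');

// Fixed GUID from RFC 6455, appended to the client's key before hashing.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derives the Sec-WebSocket-Accept header from Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey) {
  return nodeCrypto.createHash('sha1').update(secWebSocketKey + WS_GUID).digest('base64');
}

// Encodes a single unmasked text frame as a server would send it.
// Byte 0: FIN bit set + opcode 0x1 (text) = 0x81. Byte 1: payload length.
function encodeServerTextFrame(text) {
  const payload = Buffer.from(text, 'utf8');
  if (payload.length > 125) {
    throw new RangeError('this sketch only handles payloads up to 125 bytes');
  }
  return Buffer.concat([Buffer.from([0x81, payload.length]), payload]);
}

console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ=='));
console.log(encodeServerTextFrame('Hello').toString('hex'));
```

Frames sent from client to server must additionally be masked with a 4‑byte key, and longer payloads use extended length fields; established WebSocket libraries handle both.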
Feature comparison (summary of the omitted table)
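Protocol basis: SSE runs over HTTP/1.1 or HTTP/2; WebSocket uses its own framing protocol over TCP after an HTTP upgrade.
Communication mode: Unidirectional (SSE) vs. bidirectional (WebSocket).
Connection reuse: SSE holds one HTTP response open per stream, which HTTP/2 can multiplex with other requests; WebSocket keeps a dedicated persistent connection.
Header compression: Available to SSE over HTTP/2; not applicable to WebSocket frames.
Latency: Low for both once the connection is established; WebSocket has slightly less per‑message overhead.
Reconnection: Automatic in SSE (built into EventSource); manual in WebSocket (application code must reconnect).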
Typical use cases: SSE for real‑time notifications and LLM streaming; WebSocket for chat, online gaming, collaborative editing.
Future trends
LLM services are driving an “API‑First” approach, where capabilities are exposed via REST or Realtime APIs. For example, Perplexity recently launched AI‑search APIs (Sonar and Sonar Pro) that can be integrated into platforms like Zoom, illustrating the shift toward API‑driven LLM integration.
Upcoming articles will explore why API management is gaining attention in the LLM era.
References
Server‑Sent Events specification: https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events
OpenAI Realtime WebSocket guide: https://platform.openai.com/docs/guides/realtime-websocket
OpenAI Realtime overview: https://platform.openai.com/docs/guides/realtime
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
