Understanding the WebSocket Protocol: Frame Structure, Extensions, and Handshake
This article explains the WebSocket protocol (RFC 6455), covering its HTTP upgrade handshake, binary frame format, message fragmentation, multiplexing and compression extensions, and the required header fields for establishing a secure, bidirectional communication channel between client and server.
WebSocket Protocol Overview
WebSocket, developed by the HyBi working group (RFC 6455), consists of two high‑level components: an opening HTTP handshake that negotiates connection parameters and a binary framing mechanism that enables low‑overhead, message‑oriented text and binary data transfer.
The WebSocket protocol attempts to solve the shortcomings of existing bidirectional HTTP technologies within the existing HTTP infrastructure, allowing it to operate on standard HTTP ports (80 and 443) while not being limited to them. (WebSocket Protocol, RFC 6455)
Although WebSocket is a fully independent protocol that can be used outside browsers, its primary use case is bidirectional communication for browser‑based applications.
Binary Framing Layer
Clients and servers communicate via a message‑oriented API: the sender provides an arbitrary UTF‑8 or binary payload, and the receiver is notified when the complete message is available. To achieve this, WebSocket defines a custom binary frame format that splits each application message into one or more frames, transports them, reassembles them, and notifies the receiver once the whole message has arrived.
Figure 17‑1. WebSocket frame: 2–14 bytes + payload
Frame
The smallest unit of communication; each unit contains a variable‑length header and a payload that may carry a full or partial application message.
Message
A complete sequence of frames that maps to a logical application message.
Whether an application message is split into multiple frames is decided by the underlying implementation of the client and server framing code, so the application itself does not need to be aware of individual frames. Nevertheless, understanding how each frame is represented on the wire is useful:
The first bit (FIN) of each frame indicates whether this frame is the final fragment of a message; a message may consist of a single frame.
The 4-bit opcode specifies the frame type: continuation (0), text (1), binary (2), or control frames such as close (8), ping (9), and pong (10).
The mask bit indicates whether the payload is masked (required for client‑to‑server messages).
Payload length is expressed with a variable‑length field: values 0‑125 are the length directly; 126 means the next two bytes hold a 16‑bit unsigned length; 127 means the next eight bytes hold a 64‑bit unsigned length.
The masking key is a 32-bit value used to mask the payload; it is present only when the mask bit is set.
The payload contains the application data and, if the client and server negotiated an extension, any extension data.
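The field layout above can be sketched as a small header parser. This is a minimal illustration rather than a complete implementation: the names `FrameHeader` and `parse_frame_header` are hypothetical, and the sketch ignores the reserved bits and does not validate opcodes.

```python
import struct
from typing import NamedTuple, Optional


class FrameHeader(NamedTuple):
    fin: bool                      # True if this is the final fragment
    opcode: int                    # 0 = continuation, 1 = text, 2 = binary, 8/9/10 = control
    masked: bool                   # True if a masking key follows the length
    payload_length: int            # decoded payload length in bytes
    masking_key: Optional[bytes]   # 4 bytes when masked, else None
    header_size: int               # total header bytes consumed (2-14)


def parse_frame_header(data: bytes) -> FrameHeader:
    """Decode the variable-length WebSocket frame header at the start of data."""
    if len(data) < 2:
        raise ValueError("a frame header needs at least 2 bytes")
    first, second = data[0], data[1]
    fin = bool(first & 0x80)
    opcode = first & 0x0F
    masked = bool(second & 0x80)
    length = second & 0x7F
    offset = 2

    if length == 126:              # next 2 bytes hold a 16-bit length
        if len(data) < offset + 2:
            raise ValueError("truncated 16-bit length")
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:            # next 8 bytes hold a 64-bit length
        if len(data) < offset + 8:
            raise ValueError("truncated 64-bit length")
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8

    masking_key = None
    if masked:
        if len(data) < offset + 4:
            raise ValueError("truncated masking key")
        masking_key = data[offset:offset + 4]
        offset += 4

    return FrameHeader(fin, opcode, masked, length, masking_key, offset)
```

For example, the two bytes `0x81 0x05` decode to a final (FIN set), unmasked text frame with a 5-byte payload.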
All frames sent by the client have their payload masked with the masking key carried in the frame header; this prevents malicious scripts running on the client from performing cache-poisoning attacks against intermediaries that do not understand the WebSocket protocol.
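Masking in RFC 6455 is a byte-wise XOR of the payload with the 4-byte key, repeated cyclically, so the same operation both masks and unmasks. A minimal sketch, where the function name `apply_mask` is an assumption for illustration:

```python
def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR each payload byte with the masking key byte at position i mod 4.

    Applying the function twice with the same key returns the original bytes.
    """
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))
```

A production implementation would typically process the payload in larger words for speed; the per-byte loop here is chosen only for clarity.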
Each server‑sent WebSocket frame adds 2–10 bytes of overhead. Clients also send a 4‑byte masking key, increasing overhead by an additional 4 bytes (total 6–14 bytes). No other metadata (such as header fields) is transmitted; the payload is treated as an opaque blob.
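The overhead figures follow directly from the length encoding. The sketch below computes the header size for a given payload length; `header_overhead` is a hypothetical helper, not part of any library.

```python
def header_overhead(payload_length: int, masked: bool) -> int:
    """Return the number of header bytes a frame with this payload needs."""
    if payload_length < 0 or payload_length >= 2 ** 63:
        raise ValueError("payload length out of range")
    if payload_length <= 125:
        size = 2             # length fits in the 7-bit field
    elif payload_length <= 0xFFFF:
        size = 2 + 2         # 16-bit extended length
    else:
        size = 2 + 8         # 64-bit extended length
    return size + (4 if masked else 0)   # client frames add the masking key
```

Under these rules a 100-byte server message costs 2 bytes of framing, while a 1 MB client message costs 14.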
WebSocket multiplexing and head-of-line (HOL) blocking
WebSocket is susceptible to HOL blocking: a message can be split into one or more frames, but frames from different messages cannot be interleaved, because WebSocket framing has no equivalent of the HTTP/2 “stream ID”. Consequently, a large message split into many frames blocks the delivery of frames belonging to other messages. For latency-sensitive data, consider limiting payload size or splitting large messages into multiple logical application messages.
Because each WebSocket connection requires a dedicated TCP connection, and browsers limit the number of concurrent connections per origin, the connection count can become a resource-exhaustion issue. The HyBi-defined “WebSocket Multiplexing Extension” addresses this limitation by allowing multiple virtual WebSocket channels, each identified by a channel ID, to share a single TCP connection. Even with multiplexing, each individual channel is still subject to HOL blocking, so sending messages in parallel may require separate channels or dedicated TCP connections.
The extension is needed only for HTTP/1.x connections. HTTP/2 already provides stream multiplexing, which would simplify transporting multiple WebSocket connections over a single session.
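One way to split a large payload into several logical messages is to chunk it at the application level, so that other messages can be sent between the chunks. The sketch below is only an illustration: the `(index, total, chunk)` envelope and the names `split_message` and `join_chunks` are assumptions, not part of the WebSocket protocol, and a real application would also need a message identifier and a wire encoding for the envelope.

```python
from typing import List, Tuple

Chunk = Tuple[int, int, bytes]   # (index, total number of chunks, data)


def split_message(payload: bytes, max_chunk: int) -> List[Chunk]:
    """Split payload into chunks of at most max_chunk bytes each."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    pieces = [payload[i:i + max_chunk]
              for i in range(0, len(payload), max_chunk)] or [b""]
    total = len(pieces)
    return [(index, total, piece) for index, piece in enumerate(pieces)]


def join_chunks(chunks: List[Chunk]) -> bytes:
    """Reassemble chunks (in any arrival order) into the original payload."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    if not ordered or len(ordered) != ordered[0][1]:
        raise ValueError("missing chunks")
    return b"".join(piece for _, _, piece in ordered)
```

Each chunk would be sent as its own WebSocket message, which keeps any single message small enough that it cannot block the connection for long.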
Protocol Extensions
The WebSocket specification permits extensions that add new opcodes and data fields, enabling additional functionality without changes to application code.
Two official extensions under development are:
WebSocket Multiplexing Extension
Provides a way for multiple logical WebSocket connections to share a single underlying transport connection.
WebSocket Compression Extension
Defines a framework for adding compression to the WebSocket protocol.
Enabling extensions requires the client to advertise them during the initial upgrade handshake; the server must select and confirm the extensions for the lifetime of the connection.
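The selection step can be sketched as follows. This is a simplified illustration: `parse_extension_names` and `select_extensions` are hypothetical helpers, and the parser only extracts extension names, ignoring parameters and not handling quoted parameter values.

```python
from typing import Iterable, List


def parse_extension_names(header_value: str) -> List[str]:
    """Extract extension names from a Sec-WebSocket-Extensions header value."""
    names = []
    for item in header_value.split(","):
        # Parameters, if any, follow the name after a semicolon.
        name = item.split(";", 1)[0].strip()
        if name:
            names.append(name)
    return names


def select_extensions(offered_header: str, supported: Iterable[str]) -> List[str]:
    """Return the offered extensions the server supports, in the client's order."""
    supported_set = set(supported)
    return [name for name in parse_extension_names(offered_header)
            if name in supported_set]
```

The server would then echo the selected names back in its own Sec-WebSocket-Extensions response header.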
WebSocket multiplexing and compression in the wild
As of mid-2013, multiplexing was not supported by any popular browser, and compression support was limited to the deprecated “x-webkit-deflate-frame” extension. Browsers at the time were moving toward per-message compression, but it remained experimental. Applications should therefore pay attention to the content type of the data they transmit and apply their own compression where appropriate, especially for mobile clients where every byte matters.
HTTP Upgrade Negotiation
Before any messages are exchanged, the client and server must negotiate the connection parameters via an HTTP upgrade handshake.
Key request headers include:
Sec-WebSocket-Version : indicates the WebSocket protocol version the client wishes to use (e.g., 13).
Sec-WebSocket-Key : a randomly generated, base64-encoded 16-byte nonce that the server must hash and return to prove that it supports the protocol.
Sec-WebSocket-Protocol : optional list of application sub‑protocols the client supports.
Sec-WebSocket-Extensions : optional list of extensions the client wishes to use.
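A client could assemble these headers roughly as shown below. This is a sketch under stated assumptions: `build_upgrade_headers` is a hypothetical helper, and a real client library would also write the request line and handle the server's response.

```python
import base64
import os
from typing import Dict, Sequence


def build_upgrade_headers(host: str, origin: str,
                          subprotocols: Sequence[str] = (),
                          extensions: Sequence[str] = ()) -> Dict[str, str]:
    """Build the request headers for a WebSocket upgrade (protocol version 13)."""
    # The key is 16 random bytes, base64-encoded (24 ASCII characters).
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    headers = {
        "Host": host,
        "Origin": origin,
        "Connection": "Upgrade",
        "Upgrade": "websocket",
        "Sec-WebSocket-Version": "13",
        "Sec-WebSocket-Key": key,
    }
    # The optional headers are sent only when the client has something to offer.
    if subprotocols:
        headers["Sec-WebSocket-Protocol"] = ", ".join(subprotocols)
    if extensions:
        headers["Sec-WebSocket-Extensions"] = ", ".join(extensions)
    return headers
```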
Example upgrade request:
GET /socket HTTP/1.1
Host: thirdparty.com
Origin: http://example.com
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Version: 13
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Protocol: appProtocol, appProtocol-v2
Sec-WebSocket-Extensions: x-webkit-deflate-message, x-custom-extension
The server responds with a 101 Switching Protocols status and confirms the upgrade, the selected sub-protocol, and the accepted extensions:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Access-Control-Allow-Origin: http://example.com
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: appProtocol-v2
Sec-WebSocket-Extensions: x-custom-extension
All RFC 6455-compatible servers compute the Sec-WebSocket-Accept value the same way: concatenate the client’s Sec-WebSocket-Key with the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 defined in the RFC, compute the SHA-1 hash of the result, and base64-encode the hash.
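The computation is short enough to sketch in full. The function name `compute_accept` is an assumption for illustration; the GUID is the fixed constant from RFC 6455.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Applied to the key from the example request, `compute_accept("dGhlIHNhbXBsZSBub25jZQ==")` yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`, matching the example response. The client performs the same computation and fails the connection if the values differ.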
Once the handshake succeeds, the connection becomes a full‑duplex channel for WebSocket messages, and no further HTTP traffic occurs.
Proxies, intermediaries, and WebSockets
Because many networks only allow traffic on ports 80 and 443, the WebSocket upgrade is performed over HTTP to remain compatible with existing infrastructure. However, some intermediaries do not understand WebSocket frames, which can lead to failed upgrades or unintended modification of the traffic. Using TLS (WSS) establishes an encrypted tunnel before the upgrade takes place, which mitigates many proxy-related issues, especially for mobile clients whose traffic passes through various proxy services.
Tags: WebSocket, Protocol, Binary Frames, Handshake, Extensions, Networking
Architects Research Society