Design and Performance Optimization of a Million‑Scale WebSocket Gateway
This article details the evolution from a Node.js Socket.IO gateway to a Go‑based, gRPC‑enabled WebSocket gateway that supports millions of concurrent connections, covering architecture redesign, TLS handling, socket ID generation, session management, heartbeat optimization, custom Kafka headers, code‑level refinements, and extensive performance testing results.
The article begins by explaining why WebSocket, introduced alongside HTML5 and standardized as RFC 6455, has become the mainstream solution for server‑push and notification scenarios in Shimo Docs. With daily peak connections reaching the million level, the gateway needed a redesign.
Gateway 1.0 was built with Node.js and Socket.IO. It satisfied early traffic but suffered from high CPU and memory consumption, tight coupling with business services, and limited observability.
Gateway 2.0 separates responsibilities into two services: WS‑Gateway (handling authentication, TLS, and WebSocket connection management) and WS‑API (business logic accessed via gRPC). This decoupling enables independent scaling, removal of Nginx, and integration with Shimo’s monitoring platform.
The new handshake flow supports graceful degradation to HTTP long‑polling when network conditions are poor, and TLS termination is moved from Nginx to the service itself, reducing overall memory usage by about 30%.
To guarantee unique connection identifiers, a SnowFlake‑based Socket ID scheme is adopted, with different strategies for physical machines and Kubernetes deployments.
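A minimal sketch of a Snowflake-style generator may make the scheme more concrete. The bit layout (41-bit millisecond timestamp, 10-bit node ID, 12-bit sequence), the custom epoch, and the names `Generator`, `NewGenerator`, and `Next` are assumptions for illustration; the source does not give the gateway's actual layout or how the node ID is assigned on physical machines versus Kubernetes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Assumed classic Snowflake layout: timestamp | node ID | sequence.
const (
	nodeBits = 10
	seqBits  = 12
	maxNode  = 1<<nodeBits - 1
	maxSeq   = 1<<seqBits - 1
)

// customEpochMs is a hypothetical epoch (2020-01-01 00:00:00 UTC) in milliseconds.
const customEpochMs int64 = 1577836800000

// Generator hands out IDs that are unique for a given node ID.
type Generator struct {
	mu     sync.Mutex
	node   int64
	lastMs int64
	seq    int64
}

// NewGenerator validates the node ID; how that ID is obtained
// (static config, pod ordinal, a registry) is left to the caller.
func NewGenerator(node int64) (*Generator, error) {
	if node < 0 || node > maxNode {
		return nil, errors.New("node ID out of range")
	}
	return &Generator{node: node}, nil
}

func nowMs() int64 { return time.Now().UnixNano() / int64(time.Millisecond) }

// Next returns the next ID, or an error if the wall clock moved backwards.
func (g *Generator) Next() (int64, error) {
	g.mu.Lock()
	defer g.mu.Unlock()

	ms := nowMs()
	if ms < g.lastMs {
		return 0, errors.New("clock moved backwards")
	}
	if ms == g.lastMs {
		g.seq = (g.seq + 1) & maxSeq
		if g.seq == 0 {
			// Sequence exhausted in this millisecond: wait for the next one.
			for ms <= g.lastMs {
				ms = nowMs()
			}
		}
	} else {
		g.seq = 0
	}
	g.lastMs = ms
	return (ms-customEpochMs)<<(nodeBits+seqBits) | g.node<<seqBits | g.seq, nil
}

func main() {
	g, err := NewGenerator(3)
	if err != nil {
		panic(err)
	}
	id, _ := g.Next()
	fmt.Println("socket ID:", id)
}
```

Uniqueness across the cluster then depends on no two running instances sharing a node ID, which is presumably why the deployment environment affects the strategy.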
Session data is stored partially in memory and partially in Redis using keys such as ws:user:clients:${uid}, ws:guid:clients:${guid}, and ws:client:${socket.id}. Two cluster‑wide session propagation methods are compared: event broadcast and a registration center. Event broadcast is chosen for its simplicity, and Redis is selected over Kafka and RocketMQ as the broadcast channel after benchmarking 1 million enqueue/dequeue operations.
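The key scheme above can be sketched as a few helper functions. The `memStore` type below is a hypothetical in-memory stand-in for Redis, and treating the two index keys as sets of socket IDs is an assumption; the source only lists the key patterns.

```go
package main

import "fmt"

// Key builders matching the patterns listed in the article.
func userClientsKey(uid string) string  { return "ws:user:clients:" + uid }
func guidClientsKey(guid string) string { return "ws:guid:clients:" + guid }
func clientKey(socketID string) string  { return "ws:client:" + socketID }

// Session is a hypothetical record stored per connection.
type Session struct {
	SocketID, UID, GUID string
}

// memStore stands in for Redis: sets for the index keys, a value per client key.
type memStore struct {
	sets map[string]map[string]struct{}
	vals map[string]Session
}

func newMemStore() *memStore {
	return &memStore{sets: map[string]map[string]struct{}{}, vals: map[string]Session{}}
}

func (m *memStore) add(key, member string) {
	if m.sets[key] == nil {
		m.sets[key] = map[string]struct{}{}
	}
	m.sets[key][member] = struct{}{}
}

// Register indexes a connection under its user, its GUID, and its own key.
func (m *memStore) Register(s Session) {
	m.add(userClientsKey(s.UID), s.SocketID)
	m.add(guidClientsKey(s.GUID), s.SocketID)
	m.vals[clientKey(s.SocketID)] = s
}

// Unregister removes a connection from every index it was added to.
func (m *memStore) Unregister(socketID string) {
	s, ok := m.vals[clientKey(socketID)]
	if !ok {
		return
	}
	delete(m.sets[userClientsKey(s.UID)], socketID)
	delete(m.sets[guidClientsKey(s.GUID)], socketID)
	delete(m.vals, clientKey(socketID))
}

// SocketsForUser reports how many live connections a user has.
func (m *memStore) SocketsForUser(uid string) int {
	return len(m.sets[userClientsKey(uid)])
}

func main() {
	st := newMemStore()
	st.Register(Session{SocketID: "1001", UID: "u42", GUID: "g7"})
	fmt.Println(userClientsKey("u42"), st.SocketsForUser("u42"))
}
```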
Heartbeat handling is optimized by reporting timestamps to memory first and syncing to Redis at a lower frequency, with a dynamic interval that can reduce QPS by a factor of y when connections are stable.
Custom Kafka headers (e.g., X-ID, X-Uid, X-Guid, X-Event) are used to avoid payload decoding overhead and to provide full traceability of message flow.
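As an illustration of header-based routing, the sketch below uses local `Header` and `Message` types that only mirror the general key/value shape of Kafka record headers; they are not any particular client library's API. The header names come from the article, while `buildHeaders` and `headerValue` are hypothetical helpers.

```go
package main

import "fmt"

// Header mirrors the key/value shape of a Kafka record header.
type Header struct {
	Key   string
	Value []byte
}

// Message is a simplified record: headers plus an opaque payload.
type Message struct {
	Headers []Header
	Value   []byte
}

// buildHeaders attaches routing and tracing metadata next to the payload.
func buildHeaders(id, uid, guid, event string) []Header {
	return []Header{
		{Key: "X-ID", Value: []byte(id)},
		{Key: "X-Uid", Value: []byte(uid)},
		{Key: "X-Guid", Value: []byte(guid)},
		{Key: "X-Event", Value: []byte(event)},
	}
}

// headerValue returns the first header with the given key.
func headerValue(hs []Header, key string) (string, bool) {
	for _, h := range hs {
		if h.Key == key {
			return string(h.Value), true
		}
	}
	return "", false
}

func main() {
	msg := Message{
		Headers: buildHeaders("1001", "u42", "g7", "message"),
		Value:   []byte("opaque serialized payload"),
	}
	// The consumer can route on X-Event without decoding msg.Value.
	if ev, ok := headerValue(msg.Headers, "X-Event"); ok {
		fmt.Println("event:", ev, "payload bytes:", len(msg.Value))
	}
}
```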
Message send/receive code is refactored to use only two goroutines per connection instead of three, employing a mutex‑protected write path and a sync.Pool for Connection objects, as shown below:
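type Packet struct { ... }

// Connect wraps one WebSocket connection.
type Connect struct {
	*websocket.Conn
	send chan Packet  // outbound queue; buffer size N is not specified in the source
	mux  sync.RWMutex // serializes writers on the connection
}

func NewConnect(conn net.Conn) *Connect {
	c := &Connect{send: make(chan Packet, N)}
	go c.reader() // read loop, not shown in the source
	return c
}

// Write holds the lock for the whole write so concurrent callers cannot interleave frames.
func (c *Connect) Write(data []byte) (err error) {
	c.mux.Lock()
	defer c.mux.Unlock()
	// write logic
	return nil
}

Object pooling further reduces GC pressure: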
var ConnectionPool = sync.Pool{
	New: func() interface{} { return &Connection{} },
}

func GetConn() *Connection { return ConnectionPool.Get().(*Connection) }

func PutConn(cli *Connection) {
	cli.Reset()
	ConnectionPool.Put(cli)
}

Message payloads are serialized with MessagePack and MTU is tuned (e.g., probing with ping -s {a} {ip}) to avoid fragmentation, achieving efficient network usage.
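For the MTU probe, the arithmetic behind the ping payload size can be sketched as below. The header sizes are the standard IPv4, ICMP, and TCP minimums without options; the function names are hypothetical, and the article does not state which MTU the gateway targets.

```go
package main

import "fmt"

// Minimum header sizes in bytes, assuming IPv4 and no IP or TCP options.
const (
	ipv4HeaderLen = 20
	icmpHeaderLen = 8
	tcpHeaderLen  = 20
)

// maxPingPayload is the largest ICMP data size (the value passed to ping -s)
// that still fits in a single packet for the given MTU.
func maxPingPayload(mtu int) int { return mtu - ipv4HeaderLen - icmpHeaderLen }

// maxTCPPayload is the largest TCP segment payload that fits in one packet
// for the given MTU.
func maxTCPPayload(mtu int) int { return mtu - ipv4HeaderLen - tcpHeaderLen }

func main() {
	for _, mtu := range []int{1500, 1400} {
		fmt.Println("mtu", mtu, "ping payload", maxPingPayload(mtu), "tcp payload", maxTCPPayload(mtu))
	}
}
```

On Linux, a probe is usually only meaningful with fragmentation prohibited, for example with the iputils option `-M do`; otherwise an oversized ping is silently fragmented and still succeeds.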
The service is built on the EGO framework, providing structured logging, asynchronous output, dynamic log levels, and comprehensive monitoring of CPU, memory, goroutine count, and latency.
Four performance‑testing scenarios (varying connection counts, push frequencies, and acknowledgment requirements) are executed on a 16‑core, 32 GB machine. Across them, peak CPU usage ranges from 22% to 47% and memory usage reaches up to 94%, while the gateway handles up to 11,000 new connections per second and 400,000 messages per second.
In summary, the redesign decouples gateway and business logic and introduces a degradable handshake, unique Socket IDs, an optimized heartbeat, custom Kafka headers, streamlined message handling, connection pooling, compact MessagePack serialization, and full observability. The result is lower per‑user memory consumption, higher scalability, and stable operation at the hundred‑thousand‑connection scale.
High Availability Architecture
Official account for High Availability Architecture.
