Design and Performance Optimization of a High‑Concurrency WebSocket Gateway (Version 2.0)
This article details the evolution from a Node.js‑based WebSocket gateway to a Go‑implemented, gRPC‑driven architecture, describing the redesign of connection handling, TLS off‑loading, socket ID generation, session management, custom Kafka headers, code‑level optimizations, and extensive performance testing that validates the new gateway’s scalability and resource efficiency.
1 Introduction
In several StoneDoc business scenarios—document sharing, comments, slide presentations, and spreadsheet collaboration—real‑time data synchronization and server‑initiated push are required, which HTTP cannot satisfy, so a WebSocket solution was adopted.
As daily peak connections grew to the order of a million, memory and CPU usage surged, prompting a gateway redesign.
2 Gateway 1.0
Gateway 1.0 was built with Node.js and Socket.IO, meeting early traffic needs.
2.1 Architecture
Architecture diagram (original image omitted).
2.2 Pain Points
Resource waste: Nginx only performed TLS termination and passed through requests, consuming CPU and memory.
Maintenance & monitoring: No integration with StoneDoc’s monitoring system.
Business coupling: Gateway and business services were tightly coupled, preventing independent scaling.
3 Gateway 2.0
Gateway 2.0 separates the gateway function (WS‑Gateway) from business processing (WS‑API). WS‑Gateway handles authentication, TLS, and WebSocket management; WS‑API communicates with component services via gRPC, enabling targeted scaling and removing Nginx.
3.1 Overall Architecture
Architecture diagram (original image omitted).
3.2 Handshake Process
The handshake takes several steps and falls back to HTTP long-polling under poor network conditions. The source article illustrated the initial Socket.IO response with a JSON example, which is not reproduced in this summary.
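For orientation, the first response in a Socket.IO handshake is an Engine.IO "open" packet: the character 0 followed by a JSON object carrying the session ID, the available transport upgrades, and the ping timings. The sketch below parses such a packet, assuming any transport-level framing has already been stripped. The sample values and the names OpenPacket, ParseOpen, and CanUpgrade are illustrative and do not come from the gateway's code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// OpenPacket mirrors the JSON body of an Engine.IO "open" packet.
type OpenPacket struct {
	SID          string   `json:"sid"`
	Upgrades     []string `json:"upgrades"`
	PingInterval int      `json:"pingInterval"` // milliseconds
	PingTimeout  int      `json:"pingTimeout"`  // milliseconds
}

// ParseOpen decodes a raw packet such as `0{"sid":"..."}`.
// The leading '0' is the Engine.IO packet type for "open".
func ParseOpen(raw string) (OpenPacket, error) {
	var p OpenPacket
	if len(raw) < 2 || raw[0] != '0' {
		return p, errors.New("not an open packet")
	}
	err := json.Unmarshal([]byte(raw[1:]), &p)
	return p, err
}

// CanUpgrade reports whether the server offers a WebSocket upgrade.
// When it does not, the client stays on HTTP long-polling.
func (p OpenPacket) CanUpgrade() bool {
	for _, u := range p.Upgrades {
		if u == "websocket" {
			return true
		}
	}
	return false
}

func main() {
	// Sample values are made up for illustration.
	raw := `0{"sid":"example-sid","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":5000}`
	p, err := ParseOpen(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.SID, p.CanUpgrade(), p.PingInterval)
}
```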
3.3 TLS Memory Optimization
TLS termination was moved from Nginx into the service itself; analysis shows that TLS handshakes account for roughly 30 % of total memory.
3.4 Socket ID Design
Socket IDs are generated with the SnowFlake algorithm. In K8s environments, IDs are allocated via a registration service and stored in a database so that they stay consistent across restarts.
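A minimal sketch of a Snowflake-style generator follows, using the commonly cited layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits. The bit widths, the custom epoch, and the names Generator, NewGenerator, Next, and WorkerOf are assumptions for illustration; the article does not specify the gateway's actual layout. The worker ID is passed in as a plain parameter here, standing in for whatever number the registration service hands out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

const (
	workerBits   = 10
	sequenceBits = 12
	maxWorker    = 1<<workerBits - 1
	maxSequence  = 1<<sequenceBits - 1
	// customEpochMs is an arbitrary illustrative epoch (2020-01-01 UTC).
	customEpochMs = 1577836800000
)

// Generator produces unique, time-ordered 64-bit IDs for one worker.
type Generator struct {
	mu       sync.Mutex
	workerID int64
	lastMs   int64
	sequence int64
}

// NewGenerator validates the worker ID against the 10-bit range.
func NewGenerator(workerID int64) (*Generator, error) {
	if workerID < 0 || workerID > maxWorker {
		return nil, errors.New("worker ID out of range")
	}
	return &Generator{workerID: workerID}, nil
}

// Next returns the next ID. IDs from one generator are strictly increasing.
func (g *Generator) Next() int64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	now := time.Now().UnixMilli()
	if now < g.lastMs {
		now = g.lastMs // clock moved backwards: keep using the last timestamp
	}
	if now == g.lastMs {
		g.sequence = (g.sequence + 1) & maxSequence
		if g.sequence == 0 {
			// Sequence exhausted in this millisecond: wait for the next one.
			for now <= g.lastMs {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		g.sequence = 0
	}
	g.lastMs = now
	return (now-customEpochMs)<<(workerBits+sequenceBits) | g.workerID<<sequenceBits | g.sequence
}

// WorkerOf extracts the worker ID embedded in an ID.
func WorkerOf(id int64) int64 { return (id >> sequenceBits) & maxWorker }

func main() {
	gen, err := NewGenerator(7)
	if err != nil {
		panic(err)
	}
	id := gen.Next()
	fmt.Println(id, WorkerOf(id))
}
```

Because the worker ID is baked into every ID, two replicas that share a worker number could collide, which is presumably why the allocation is persisted across restarts.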
3.5 Cluster Session Management – Event Broadcast
Session data is stored in Redis with keys such as ws:user:clients:${uid}, ws:guid:clients:${guid}, and ws:client:${socket.id}. Two broadcast strategies are compared: simple event broadcast (easy but scales with node count) and a registry center (clear mapping but adds operational cost). After benchmarking, Redis was chosen for message broadcasting due to its superior performance for small payloads (~1 KB).
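The three key patterns can be captured as small helper functions so that every caller builds keys the same way. This is only a sketch: the function names are hypothetical, and treating uid, guid, and the socket ID as strings is an assumption.

```go
package main

import "fmt"

// Key prefixes are taken from the article; the helper names are illustrative.

// UserClientsKey indexes the connections that belong to one user.
func UserClientsKey(uid string) string { return "ws:user:clients:" + uid }

// GuidClientsKey indexes the connections attached to one file.
func GuidClientsKey(guid string) string { return "ws:guid:clients:" + guid }

// ClientKey holds the session data of a single connection.
func ClientKey(socketID string) string { return "ws:client:" + socketID }

// SessionKeys lists every key that one connection touches, which is
// useful when cleaning up after a disconnect.
func SessionKeys(uid, guid, socketID string) []string {
	return []string{UserClientsKey(uid), GuidClientsKey(guid), ClientKey(socketID)}
}

func main() {
	for _, k := range SessionKeys("42", "doc-7", "1001") {
		fmt.Println(k)
	}
}
```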
3.6 Heartbeat Mechanism
Clients report heartbeats at a server‑defined interval; timestamps are first updated in memory, then periodically synced to Redis to avoid spikes. The QPS calculation shows how dynamic interval adjustment can reduce load.
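// t is assumed to be a *time.Ticker that fires once per sweep interval.
for {
	select {
	case <-t.C:
		now := time.Now().Unix()
		// Collect expired connections first and close them after Range returns.
		expired := make([]*Connection, 0)
		dispatcher.clients.Range(func(_, v interface{}) bool {
			client := v.(*Connection)
			// LastMessageTS is the in-memory timestamp of the last heartbeat.
			lastTs := atomic.LoadInt64(&client.LastMessageTS)
			if now-lastTs > int64(expireTime) {
				expired = append(expired, client)
			} else {
				// Still alive: pass the latest timestamp on to the Redis mapping upkeep.
				dispatcher.clearRedisMapping(client.Id, client.Uid, lastTs, clearTimeout)
			}
			return true
		})
		for _, cli := range expired {
			cli.WsClose()
		}
	}
}
With 500 k connections, heartbeats arriving at 500 k per second can be lowered to 500 k / y per second by stretching the heartbeat interval dynamically, where y is the maximum interval multiplier.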
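The QPS relationship above is simple division, worked through below. The 1-second base interval and the multiplier of 5 are assumed values chosen only to make the arithmetic concrete, and HeartbeatQPS is a hypothetical helper.

```go
package main

import "fmt"

// HeartbeatQPS returns the heartbeat requests per second produced by
// `connections` clients that each report once every
// baseIntervalSec * multiplier seconds.
func HeartbeatQPS(connections int, baseIntervalSec, multiplier float64) float64 {
	return float64(connections) / (baseIntervalSec * multiplier)
}

func main() {
	// Assumed values: 500 k connections, 1 s base interval.
	fmt.Println(HeartbeatQPS(500_000, 1, 1)) // baseline
	fmt.Println(HeartbeatQPS(500_000, 1, 5)) // interval stretched 5x
}
```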
3.7 Custom Kafka Headers
Headers such as X-ID, X-Uid, X-Guid, X-Inner, X-Event, X-Operator, etc., are used to avoid payload decoding and to provide full traceability.
| Field | Description | Detail |
| --- | --- | --- |
| X-ID | WebSocket ID | Connection ID |
| X-Uid | User ID | User ID |
| X-Guid | File ID | File ID |
| X-Inner | Gateway internal command | User join/leave |
| X-Event | Gateway event | Connect/Message/Disconnect |
| X-Operator | API command | Unicast, broadcast, internal ops |
These headers also carry trace IDs and timestamps for end‑to‑end monitoring.
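Kafka record headers are key/value pairs with byte-slice values, so a consumer can route a message by reading a header without touching the payload. The sketch below models that with a local Header type instead of a specific Kafka client, so it carries no library assumptions. The Meta struct and the BuildHeaders and Lookup helpers are hypothetical; only the header names come from the table above.

```go
package main

import "fmt"

// Header is a stand-in for a Kafka record header (key plus raw bytes).
type Header struct {
	Key   string
	Value []byte
}

// Meta gathers the routing fields the gateway attaches to a message.
type Meta struct {
	SocketID string
	UID      string
	GUID     string
	Event    string // Connect, Message, or Disconnect
}

// BuildHeaders converts Meta into record headers.
func BuildHeaders(m Meta) []Header {
	return []Header{
		{Key: "X-ID", Value: []byte(m.SocketID)},
		{Key: "X-Uid", Value: []byte(m.UID)},
		{Key: "X-Guid", Value: []byte(m.GUID)},
		{Key: "X-Event", Value: []byte(m.Event)},
	}
}

// Lookup returns the value of the first header with the given key.
func Lookup(headers []Header, key string) (string, bool) {
	for _, h := range headers {
		if h.Key == key {
			return string(h.Value), true
		}
	}
	return "", false
}

func main() {
	hs := BuildHeaders(Meta{SocketID: "1001", UID: "42", GUID: "doc-7", Event: "Connect"})
	// A consumer can branch on the event type without decoding the payload.
	if ev, ok := Lookup(hs, "X-Event"); ok {
		fmt.Println("event:", ev)
	}
}
```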
3.8 Message Receive & Send
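type Packet struct { ... }

type Connect struct {
	*websocket.Conn
	send chan Packet // outbound queue consumed by writer()
}

func NewConnect(conn net.Conn) *Connect {
	// N is the capacity of the send queue. Upgrading conn into the
	// embedded *websocket.Conn is elided in the source.
	c := &Connect{send: make(chan Packet, N)}
	go c.reader()
	go c.writer()
	return c
}
The initial implementation used three goroutines per connection. This was later reduced to two by removing the mostly idle writer goroutine and sending through a mutex-protected write method instead.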
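type Connect struct {
	*websocket.Conn
	mux sync.RWMutex // serializes writes to the connection
}

// Write holds the lock for the whole write so that concurrent callers
// never interleave their output.
func (c *Connect) Write(data []byte) error {
	c.mux.Lock()
	defer c.mux.Unlock()
	// write logic
	return nil
}
Event-driven libraries (gev, gnet) were also evaluated but not adopted for production.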
3.9 Core Object Pooling
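var ConnectionPool = sync.Pool{New: func() interface{} { return &Connection{} }}

// GetConn takes a Connection from the pool, allocating one if the pool is empty.
func GetConn() *Connection { return ConnectionPool.Get().(*Connection) }

// PutConn resets a Connection and returns it to the pool for reuse.
func PutConn(cli *Connection) { cli.Reset(); ConnectionPool.Put(cli) }
Using sync.Pool reduces GC pressure.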
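A self-contained version of the pooling pattern is sketched below. The Connection fields and the Acquire and Release names are hypothetical; the point is that Reset must clear every field before the object returns to the pool, otherwise state from a previous connection can leak into the next one.

```go
package main

import (
	"fmt"
	"sync"
)

// Connection is a simplified stand-in for the gateway's connection object.
type Connection struct {
	ID  string
	UID string
	buf []byte
}

// Reset clears per-connection state but keeps the buffer's capacity,
// which is where the allocation savings come from.
func (c *Connection) Reset() {
	c.ID, c.UID = "", ""
	c.buf = c.buf[:0]
}

var pool = sync.Pool{
	New: func() interface{} { return &Connection{buf: make([]byte, 0, 1024)} },
}

// Acquire fetches a Connection from the pool and initializes it.
func Acquire(id, uid string) *Connection {
	c := pool.Get().(*Connection)
	c.ID, c.UID = id, uid
	return c
}

// Release resets the Connection and hands it back to the pool.
func Release(c *Connection) {
	c.Reset()
	pool.Put(c)
}

func main() {
	c := Acquire("1001", "42")
	c.buf = append(c.buf, "hello"...)
	Release(c)

	c2 := Acquire("1002", "43")
	// The pool may or may not return the same object; either way it is clean.
	fmt.Println(c2.ID, c2.UID, len(c2.buf))
	Release(c2)
}
```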
3.10 Data Transfer Optimization
MessagePack is used for payload serialization. The usable MTU is probed with ping -s {a} {ip}, and packet sizes are tuned to stay under it to avoid fragmentation.
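The value passed to ping -s is the ICMP payload size, not the full packet size. Over IPv4 the packet also carries a 20-byte IP header (without options) and an 8-byte ICMP header, so the largest unfragmented payload is the MTU minus 28. The sketch below works this out, assuming the typical Ethernet MTU of 1500; MaxPingPayload is a hypothetical helper.

```go
package main

import "fmt"

const (
	ipv4HeaderBytes = 20 // IPv4 header without options
	icmpHeaderBytes = 8  // ICMP echo header
)

// MaxPingPayload returns the largest ping -s value that fits in one
// IPv4 packet for the given MTU.
func MaxPingPayload(mtu int) int {
	return mtu - ipv4HeaderBytes - icmpHeaderBytes
}

func main() {
	fmt.Println(MaxPingPayload(1500)) // 1472
}
```

On Linux, adding -M do to the ping command forbids fragmentation, so a payload that is too large fails outright instead of being split silently.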
3.11 Infrastructure Support
The service is built on the EGO framework, with integrated logging, monitoring, and client SDKs for Redis, Kafka, and MySQL.
4 Performance Benchmark
4.1 Test Setup
One 16‑core / 32 GB VM as server targeting 480 k concurrent connections.
Eight 4‑core / 8 GB VMs as clients, each exposing 60 k ports.
4.2 Scenario 1
500 k online users; WS-Gateway consumes 22 % CPU and 70 % memory, with 16 k connections established per second and about 47 KB of memory per user.
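As a sanity check on the per-user figure: 70 % of 32 GB spread over the 500 k connections cited in the summary comes to roughly 47 KB each. The helper below, PerConnKiB (a hypothetical name), reproduces that division, assuming binary units.

```go
package main

import "fmt"

// PerConnKiB returns the average memory per connection in KiB, given
// total memory in GiB, the fraction in use, and the connection count.
func PerConnKiB(totalGiB, usedFraction float64, conns int) float64 {
	usedKiB := totalGiB * 1024 * 1024 * usedFraction
	return usedKiB / float64(conns)
}

func main() {
	fmt.Printf("%.1f KiB per connection\n", PerConnKiB(32, 0.70, 500_000))
}
```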
4.3 Scenario 2
15-minute test with 500 k users, a push every 5 s, and acknowledgments required. Memory exceeded its limit, causing a restart. The broadcast code accounted for 9.32 % of memory and receipt handling for 10.38 %.
4.4 Scenario 3
Same load without acknowledgments; memory usage reached 93 %, but the service did not crash.
4.5 Scenario 4
500 k users, a push every 5 s with acknowledgments, plus 40 k users going online or offline per second. CPU peaked at 47 % and memory at 66 %, with stable operation.
4.6 Summary
On a 16-core / 32 GB server, the gateway handles 500 k connections in the scenarios above with acceptable CPU and memory usage, confirming the redesign's scalability.
5 Conclusion
Decoupled gateway from business services and removed Nginx dependency.
Optimized handshake, socket ID generation, heartbeat handling, custom headers, message code structure, object pooling, payload compression, and monitoring integration.
Unified component calls via gRPC for traceability and future extensibility.
6 Technical Links
Microservice framework: https://github.com/gotomicro/ego
Kafka, Redis, MySQL client monitoring SDK: https://github.com/gotomicro/ego-component
