Design and Performance Optimization of a Million‑Scale WebSocket Gateway at Shimo Docs
This article details the redesign of Shimo Docs' WebSocket gateway from the Node.js/Socket.IO 1.0 version to a Go-based 2.0 architecture. It covers handshake degradation, TLS memory savings, SnowFlake Socket ID generation, Redis-based session broadcasting, heartbeat tuning, custom Kafka headers, object pooling, MessagePack compression, and performance testing, along with the resulting stability and scalability improvements for handling half a million concurrent connections.
1 Introduction
Shimo Docs requires real‑time data synchronization for document sharing, comments, slides, etc., and long/short polling cannot meet the demand, so the team adopted the HTML5 WebSocket standard.
As daily peak connections approached the million level, memory and CPU usage grew sharply, prompting a rebuild of the gateway.
2 Gateway 1.0
2.1 Architecture
Gateway 1.0 was built with Node.js and Socket.IO; diagram omitted.
2.2 Pain points
Resource consumption: Nginx did little more than pass traffic through, while the Node gateway consumed large amounts of CPU and memory.
Maintenance & observability: the gateway was not integrated with Shimo's monitoring system.
Business coupling: gateway and business logic tightly coupled, hindering horizontal scaling.
3 Gateway 2.0
Gateway 2.0 splits the system into two services: the gateway (WS-Gateway), which handles authentication, TLS, and connection management, and the business layer (WS-API). The two communicate via gRPC. This decoupling enables targeted scaling, removes Nginx, and integrates the gateway with monitoring.
3.1 Overall architecture
Diagram omitted.
3.2 Handshake process
The client first attempts the normal WebSocket handshake; under poor network conditions the connection falls back to HTTP long polling.
3.3 TLS memory optimization
Moving TLS termination from Nginx to the service reduced memory consumption; TLS handshake accounts for ~30% of total memory.
3.4 Socket ID design
Uses SnowFlake algorithm to generate unique IDs; in K8s the replica number is allocated via a registration service and stored in a database.
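The ID scheme described above can be sketched as follows. This is a minimal illustration that assumes the classic SnowFlake layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence); the article does not state the bit widths Shimo uses. The names Generator, NewGenerator, and epochMs are hypothetical, and workerID stands in for the replica number that the registration service allocates.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

const (
	workerBits   = 10 // assumed width of the replica/worker field
	sequenceBits = 12 // assumed width of the per-millisecond counter
	maxWorkerID  = 1<<workerBits - 1
	maxSequence  = 1<<sequenceBits - 1
)

// epochMs is a hypothetical custom epoch (2020-01-01 UTC) in milliseconds.
const epochMs int64 = 1577836800000

// Generator produces unique, increasing 64-bit IDs for one replica.
type Generator struct {
	mu       sync.Mutex
	workerID int64
	lastMs   int64
	seq      int64
}

// NewGenerator validates the replica number before it is baked into IDs.
func NewGenerator(workerID int64) (*Generator, error) {
	if workerID < 0 || workerID > maxWorkerID {
		return nil, errors.New("worker ID out of range")
	}
	return &Generator{workerID: workerID}, nil
}

// Next returns the next ID; it is safe for concurrent use.
func (g *Generator) Next() int64 {
	g.mu.Lock()
	defer g.mu.Unlock()

	now := time.Now().UnixMilli()
	if now < g.lastMs {
		now = g.lastMs // clock moved backwards: keep using the last timestamp
	}
	if now == g.lastMs {
		g.seq = (g.seq + 1) & maxSequence
		if g.seq == 0 {
			// Sequence exhausted within this millisecond: wait for the next one.
			for now <= g.lastMs {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		g.seq = 0
	}
	g.lastMs = now
	return (now-epochMs)<<(workerBits+sequenceBits) | g.workerID<<sequenceBits | g.seq
}
```

Each replica would construct one generator with its allocated number, for example NewGenerator(7), and call Next once per accepted connection. Because the worker field differs per replica, IDs stay unique across the cluster without coordination on the hot path.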
3.5 Cluster session management – event broadcast
Session data stored in Redis (sets and hashes). Two broadcast strategies compared: simple event broadcast vs. registration center; event broadcast chosen for simplicity.
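The event broadcast strategy can be illustrated with an in-memory sketch. It assumes that every gateway replica receives every event and delivers it only to the connections it holds locally; the article does not show the actual implementation. Event, Node, and Bus are hypothetical names, and Bus stands in for whatever pub/sub channel carries the events between replicas.

```go
package main

import "sync"

// Event is a hypothetical broadcast message addressed to one user.
type Event struct {
	UID     string
	Payload string
}

// Node models one gateway replica that knows only its own connections.
type Node struct {
	mu    sync.Mutex
	local map[string][]string // uid -> socket IDs connected to this node
	Sent  map[string][]string // socket ID -> payloads delivered (stands in for a real write)
}

func NewNode() *Node {
	return &Node{local: make(map[string][]string), Sent: make(map[string][]string)}
}

// Attach records that a socket belonging to uid is connected to this node.
func (n *Node) Attach(uid, socketID string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.local[uid] = append(n.local[uid], socketID)
}

// Handle is invoked for every broadcast event. The node filters locally and
// returns how many of its own sockets received the payload.
func (n *Node) Handle(ev Event) int {
	n.mu.Lock()
	defer n.mu.Unlock()
	ids := n.local[ev.UID]
	for _, id := range ids {
		n.Sent[id] = append(n.Sent[id], ev.Payload)
	}
	return len(ids)
}

// Bus fans each event out to all nodes, with no lookup of which node owns
// the target connection.
type Bus struct {
	Nodes []*Node
}

// Publish broadcasts ev and returns the total number of deliveries.
func (b *Bus) Publish(ev Event) int {
	total := 0
	for _, n := range b.Nodes {
		total += n.Handle(ev)
	}
	return total
}
```

The trade-off this sketch shows is that the publisher needs no routing table, at the cost of every node inspecting every event. A registration center would invert that: one lookup per event, but an extra component to keep consistent.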
3.6 Heartbeat mechanism
Clients receive heartbeat parameters from the server and report timestamps. The timestamps are cached in memory and periodically synced to Redis to avoid burst load.
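    // t is a time.Ticker; each tick scans all connections held by this node.
    for {
        select {
        case <-t.C:
            var now = time.Now().Unix()
            // Collect expired clients first and close them after the scan,
            // so that no connection is closed while Range is iterating.
            var clients = make([]*Connection, 0)
            dispatcher.clients.Range(func(_, v interface{}) bool {
                client := v.(*Connection)
                lastTs := atomic.LoadInt64(&client.LastMessageTS)
                if now-lastTs > int64(expireTime) {
                    // No message within expireTime: mark for closing.
                    clients = append(clients, client)
                } else {
                    // Still alive: pass the last-seen timestamp to the Redis mapping.
                    dispatcher.clearRedisMapping(client.Id, client.Uid, lastTs, clearTimeout)
                }
                return true
            })
            for _, cli := range clients {
                cli.WsClose()
            }
        }
    }

With 500,000 connections, a fixed heartbeat interval of 1 s produces QPS1 = 500000/1 heartbeat requests per second. A dynamic interval of y seconds lowers this to QPS2 = 500000/y, reducing server load.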
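The "cache in memory, sync periodically" idea from this section can be sketched as follows. HeartbeatCache and its sink callback are hypothetical; the sink stands in for a batched Redis write, and the article does not show how Shimo performs the actual sync.

```go
package main

import "sync"

// HeartbeatCache keeps the newest reported timestamp per socket in memory
// and hands the accumulated batch to sink on each Flush.
type HeartbeatCache struct {
	mu    sync.Mutex
	dirty map[string]int64
	sink  func(batch map[string]int64) // assumption: a batched store write
}

func NewHeartbeatCache(sink func(map[string]int64)) *HeartbeatCache {
	return &HeartbeatCache{dirty: make(map[string]int64), sink: sink}
}

// Touch records a heartbeat; only the latest timestamp per socket is kept,
// so repeated heartbeats between flushes cost a single write.
func (c *HeartbeatCache) Touch(socketID string, ts int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ts > c.dirty[socketID] {
		c.dirty[socketID] = ts
	}
}

// Flush swaps out the pending batch under the lock, then writes it outside
// the lock so heartbeats are not blocked by the store. It returns the number
// of sockets synced.
func (c *HeartbeatCache) Flush() int {
	c.mu.Lock()
	batch := c.dirty
	c.dirty = make(map[string]int64)
	c.mu.Unlock()

	if len(batch) == 0 {
		return 0
	}
	c.sink(batch)
	return len(batch)
}
```

In a running service, Flush would typically be driven by a time.Ticker, so the store sees one batched write per tick instead of one write per heartbeat.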
3.7 Custom Headers
Kafka headers carry routing and tracing information (X‑ID, X‑Uid, X‑Guid, X‑Inner, X‑Event, X‑Locale, X‑Operator, X‑Auth‑Type, X‑Client‑Version, X‑Server‑Version, X‑Push‑Client‑ID, X‑Trace‑ID) to avoid payload decoding.
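A small sketch of why headers avoid payload decoding: the routing decision below reads only header values and never touches the message body. Header mirrors the key/bytes shape of a Kafka record header, but Message, Target, and Route are hypothetical names, and the header keys are written with ASCII hyphens. No Kafka client library is used here.

```go
package main

// Header mirrors the shape of a Kafka record header: a string key and raw bytes.
type Header struct {
	Key   string
	Value []byte
}

// Message is a simplified record; Value is the opaque (for example
// MessagePack-encoded) payload that routing never decodes.
type Message struct {
	Headers []Header
	Value   []byte
}

// Target holds everything the gateway needs to deliver and trace a message.
type Target struct {
	UID     string
	Event   string
	TraceID string
}

// headerValue returns the first header with the given key, if present.
func headerValue(m Message, key string) (string, bool) {
	for _, h := range m.Headers {
		if h.Key == key {
			return string(h.Value), true
		}
	}
	return "", false
}

// Route extracts the delivery target from headers only. It reports false when
// the user ID header is missing, since the message cannot be routed then.
func Route(m Message) (Target, bool) {
	uid, ok := headerValue(m, "X-Uid")
	if !ok {
		return Target{}, false
	}
	event, _ := headerValue(m, "X-Event")
	trace, _ := headerValue(m, "X-Trace-ID")
	return Target{UID: uid, Event: event, TraceID: trace}, true
}
```

Because Route never reads Value, the gateway can forward the payload bytes unchanged to the client connection, whatever their encoding.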
3.8 Message receive & send
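    // Packet is one outbound message; its fields are elided in the original.
    type Packet struct {
        ...
    }

    // Connect wraps a WebSocket connection with a buffered outbound queue.
    type Connect struct {
        *websocket.Conn
        send chan Packet
    }

    // NewConnect builds the wrapper and starts its read goroutine.
    // The original omits how conn is wrapped into the embedded *websocket.Conn.
    func NewConnect(conn net.Conn) *Connect {
        c := &Connect{
            send: make(chan Packet, N), // N is the queue capacity
        }
        go c.reader() // dedicated read goroutine
        return c
    }

This design was optimized to use two goroutines per connection instead of three, reducing memory per connection.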
3.9 Core object pooling
Connection objects are cached in a sync.Pool to lower GC pressure.
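    import "sync"

    // ConnectionPool recycles Connection objects so that connection churn
    // does not allocate a new object for every client.
    var ConnectionPool = sync.Pool{
        New: func() interface{} {
            return &Connection{}
        },
    }

    // GetConn takes a Connection from the pool, allocating one if it is empty.
    func GetConn() *Connection {
        cli := ConnectionPool.Get().(*Connection)
        return cli
    }

    // PutConn clears the connection's state and returns it to the pool.
    func PutConn(cli *Connection) {
        cli.Reset()
        ConnectionPool.Put(cli) // return to the connection pool
    }

3.10 Data transmission optimization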
Payloads are serialized with MessagePack, and the MTU is tuned to avoid fragmentation (the original shows an example ping -s command).
3.11 Infrastructure support
Uses the EGO microservice framework for logging, asynchronous log output, dynamic log level, and integrates Redis/Kafka/MySQL monitoring SDKs.
4 Performance testing
4.1 Test setup
One 4‑core 8 GB VM as server, targeting 480 k connections.
Eight 4‑core 8 GB VMs as clients, each opening 60 k ports.
4.2 Scenario 1 – 500 k online users
WS‑Gateway consumes 22 % CPU and 71 % memory; each connection uses ~47 KB.
4.3 Scenario 2 – periodic broadcast with acknowledgments
Memory spikes caused the service to restart; the broadcast code accounted for an additional 9.32 % of memory and ack handling for 10.38 %.
4.4 Scenario 3 – broadcast without ack
Memory usage ~93 % but no crashes; most consumption from 5 s broadcast timer.
4.5 Scenario 4 – high churn (40 k up/down per second)
CPU 47 %, memory 66 %; peak connection rate 18 570/s, receive 330 k/s, send 394 k/s.
4.6 Summary
Under 16 CPU / 32 GB RAM a single machine handles 500 k connections across all scenarios, meeting resource and stability expectations.
5 Conclusion
Gateway reconstruction decouples services, introduces degradable handshake, unique Socket IDs, optimized heartbeat, custom headers, streamlined message handling, object pooling, payload compression, and integrated monitoring, laying a solid foundation for future scaling.
6 Q&A
Addresses why Kafka is retained, the role of Redis broadcast, and other architectural decisions.
7 References
Microservice framework: https://github.com/gotomicro/ego
Monitoring SDKs: https://github.com/gotomicro/ego-component
Architecture Digest
