Design and Performance Optimization of a Million‑Scale WebSocket Gateway at Shimo Docs

This article covers the redesign of Shimo Docs' WebSocket gateway, from the Node.js/Socket.IO 1.0 version to a Go-based 2.0 architecture. Topics include handshake degradation, TLS memory savings, SnowFlake SocketID generation, Redis-based session broadcasting, heartbeat tuning, custom Kafka headers, object pooling, MessagePack payload encoding, performance testing, and the resulting stability and scalability gains when handling half a million concurrent connections.

Architecture Digest

1 Introduction

Shimo Docs requires real‑time data synchronization for document sharing, comments, slides, etc., and long/short polling cannot meet the demand, so the team adopted the HTML5 WebSocket standard.

With daily peak connections reaching the million level, memory and CPU usage grew sharply, which prompted a rebuild of the gateway.

2 Gateway 1.0

2.1 Architecture

Gateway 1.0 was built with Node.js and Socket.IO; diagram omitted.

2.2 Pain points

Resource consumption: Nginx mostly just passed traffic through, while the Node.js gateway consumed a lot of CPU and memory.

Maintenance & observability: not integrated with Shimo monitoring.

Business coupling: gateway and business logic tightly coupled, hindering horizontal scaling.

3 Gateway 2.0

Gateway 2.0 splits the system into two services: WS-Gateway, which handles authentication, TLS, and connection management, and WS-API, the business layer, with the two communicating via gRPC. This decoupling lets each layer be scaled independently, removes Nginx from the path, and integrates the gateway with monitoring.

3.1 Overall architecture

Diagram omitted.

3.2 Handshake process

The client first attempts the normal WebSocket handshake; under poor network conditions the connection falls back to HTTP long-polling.

3.3 TLS memory optimization

Moving TLS termination from Nginx to the service reduced memory consumption; TLS handshake accounts for ~30% of total memory.

3.4 Socket ID design

Uses SnowFlake algorithm to generate unique IDs; in K8s the replica number is allocated via a registration service and stored in a database.
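For readers unfamiliar with SnowFlake, the following is a minimal sketch of the commonly used layout (41-bit millisecond timestamp, 10-bit node ID, 12-bit sequence). The bit widths, the custom epoch, and the Generator type are assumptions for illustration; the source does not give Shimo's actual layout. The node ID here stands in for the replica number obtained from the registration service.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

const (
	nodeBits = 10                   // assumed width of the replica/node field
	seqBits  = 12                   // assumed width of the per-millisecond sequence
	maxNode  = 1<<nodeBits - 1      // 1023
	maxSeq   = 1<<seqBits - 1       // 4095
	epochMs  = int64(1577836800000) // 2020-01-01T00:00:00Z, an arbitrary custom epoch
)

// Generator produces IDs that are unique per node and increasing over time.
type Generator struct {
	mu     sync.Mutex
	node   int64
	lastMs int64
	seq    int64
}

// NewGenerator validates the node ID (for example, a replica number).
func NewGenerator(node int64) (*Generator, error) {
	if node < 0 || node > maxNode {
		return nil, errors.New("node id out of range")
	}
	return &Generator{node: node}, nil
}

// Next returns the next ID: timestamp | node | sequence.
func (g *Generator) Next() int64 {
	g.mu.Lock()
	defer g.mu.Unlock()

	now := time.Now().UnixMilli()
	if now < g.lastMs {
		now = g.lastMs // clock moved backwards: keep using the last timestamp
	}
	if now == g.lastMs {
		g.seq = (g.seq + 1) & maxSeq
		if g.seq == 0 {
			// Sequence exhausted for this millisecond: wait for the next one.
			for now <= g.lastMs {
				now = time.Now().UnixMilli()
			}
		}
	} else {
		g.seq = 0
	}
	g.lastMs = now
	return (now-epochMs)<<(nodeBits+seqBits) | g.node<<seqBits | g.seq
}

func main() {
	g, err := NewGenerator(7)
	if err != nil {
		panic(err)
	}
	fmt.Println(g.Next(), g.Next())
}
```

Because the node field is part of every ID, two replicas never collide as long as each holds a distinct replica number, which is why that number has to be allocated centrally.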

3.5 Cluster session management – event broadcast

Session data is stored in Redis (sets and hashes). Two broadcast strategies were compared, simple event broadcast and a registration center; event broadcast was chosen for its simplicity.
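The event-broadcast strategy can be illustrated with an in-memory model: every gateway replica receives every event and delivers it only to the sockets it holds locally, so no replica needs to know where a user is connected. This is a sketch only. Node, Event, and Broadcast are hypothetical names, and the loop in Broadcast stands in for whatever channel actually carries the event between replicas.

```go
package main

import "fmt"

// Event is a hypothetical broadcast message addressed to one user.
type Event struct {
	UID     string
	Payload string
}

// Node models one gateway replica and the connections it holds locally.
type Node struct {
	Name  string
	local map[string][]string // uid -> socket IDs connected to this replica
	Sent  []string            // record of deliveries, for demonstration
}

func NewNode(name string) *Node {
	return &Node{Name: name, local: map[string][]string{}}
}

// Attach registers a socket for a user on this replica.
func (n *Node) Attach(uid, socketID string) {
	n.local[uid] = append(n.local[uid], socketID)
}

// Handle delivers the event to locally held sockets; replicas without a
// matching socket simply ignore it.
func (n *Node) Handle(ev Event) {
	for _, sid := range n.local[ev.UID] {
		n.Sent = append(n.Sent, sid+":"+ev.Payload)
	}
}

// Broadcast fans the event out to every replica.
func Broadcast(nodes []*Node, ev Event) {
	for _, n := range nodes {
		n.Handle(ev)
	}
}

func main() {
	a, b := NewNode("gw-a"), NewNode("gw-b")
	a.Attach("u1", "s1")
	b.Attach("u1", "s2")
	b.Attach("u2", "s3")
	Broadcast([]*Node{a, b}, Event{UID: "u1", Payload: "doc-updated"})
	fmt.Println(a.Sent, b.Sent)
}
```

The trade-off is that every replica processes every event, which is wasted work for most of them; a registration center avoids that at the cost of maintaining a routing table.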

3.6 Heartbeat mechanism

Clients receive heartbeat parameters and report timestamps; the gateway caches the timestamps in memory and syncs them to Redis periodically to avoid burst load.

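for {
   select {
   case <-t.C: // t.C fires on each periodic scan
     var now = time.Now().Unix()
     // Expired connections are collected here and closed after the scan.
     var clients = make([]*Connection, 0)
     dispatcher.clients.Range(func(_, v interface{}) bool {
         client := v.(*Connection)
         lastTs := atomic.LoadInt64(&client.LastMessageTS)
         if now-lastTs > int64(expireTime) {
            // No message within expireTime: mark for closing.
            clients = append(clients, client)
         } else {
            // Still alive: pass the cached timestamp on to the Redis mapping.
            dispatcher.clearRedisMapping(client.Id, client.Uid, lastTs, clearTimeout)
         }
         return true // keep iterating
     })
     for _, cli := range clients {
         cli.WsClose()
     }
   }
}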

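Making the heartbeat interval dynamic lowers server load: with 500,000 connections, a 1 s interval produces QPS1 = 500000/1 heartbeat requests per second, while an interval of y seconds produces QPS2 = 500000/y.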

3.7 Custom Headers

Kafka headers carry routing and tracing information (X-ID, X-Uid, X-Guid, X-Inner, X-Event, X-Locale, X-Operator, X-Auth-Type, X-Client-Version, X-Server-Version, X-Push-Client-ID, X-Trace-ID), so consumers can route and trace a message without decoding its payload.
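A minimal sketch of header-only routing is shown below. The Header and Message types are stand-ins that mirror the key/value shape of Kafka record headers rather than any particular client library, and the routing rule (including the meaning given to X-Inner) is an assumption for illustration.

```go
package main

import "fmt"

// Header mirrors the key/value shape of a Kafka record header.
// It is a stand-in type, not a specific client library's.
type Header struct {
	Key   string
	Value []byte
}

// Message keeps the payload opaque; routing never looks inside it.
type Message struct {
	Headers []Header
	Payload []byte
}

// headerValue returns the first header value stored under key.
func headerValue(hs []Header, key string) (string, bool) {
	for _, h := range hs {
		if h.Key == key {
			return string(h.Value), true
		}
	}
	return "", false
}

// route picks a destination from headers only (hypothetical rule).
func route(m Message) string {
	event, ok := headerValue(m.Headers, "X-Event")
	if !ok {
		return "drop"
	}
	if inner, _ := headerValue(m.Headers, "X-Inner"); inner == "true" {
		return "internal:" + event
	}
	return "client:" + event
}

func main() {
	m := Message{
		Headers: []Header{
			{Key: "X-Event", Value: []byte("doc.update")},
			{Key: "X-Trace-ID", Value: []byte("trace-123")},
		},
		Payload: []byte{0x82, 0xa1, 0x61, 0x01, 0xa1, 0x62, 0x02}, // never decoded here
	}
	fmt.Println(route(m))
}
```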

3.8 Message receive & send

type Packet struct { 
  ... 
}

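// Connect wraps a WebSocket connection together with its outbound queue.
type Connect struct {
  *websocket.Conn
  send chan Packet // packets waiting to be written to the client
}

// NewConnect creates the connection wrapper and starts its read loop.
func NewConnect(conn net.Conn) *Connect {
  c := &Connect{
    send: make(chan Packet, N), // N is the send-buffer capacity
  }
  go c.reader()
  return c
}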

The per-connection design was optimized to use two goroutines instead of three, which reduces memory per connection.

3.9 Core object pooling

Connection objects are cached in a sync.Pool to lower GC pressure.

var ConnectionPool = sync.Pool{ 
   New: func() interface{} { 
     return &Connection{} 
   }, 
}

func GetConn() *Connection { 
   cli := ConnectionPool.Get().(*Connection) 
   return cli 
}

func PutConn(cli *Connection) { 
   cli.Reset() 
   ConnectionPool.Put(cli)  // return the object to the pool
}
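The pool only helps if Reset really clears per-session state, since a recycled object may otherwise leak the previous user's data. A minimal sketch of that contract follows; the field names come from the heartbeat snippet above, but their types and the acquire/release helpers are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Connection holds per-session state (field types are assumed).
type Connection struct {
	Id            string
	Uid           string
	LastMessageTS int64
}

// Reset clears all per-session state so a recycled object starts clean.
func (c *Connection) Reset() { *c = Connection{} }

var pool = sync.Pool{
	New: func() interface{} { return &Connection{} },
}

// acquire takes an object from the pool and binds it to a new session.
func acquire(id, uid string) *Connection {
	c := pool.Get().(*Connection)
	c.Id, c.Uid = id, uid
	return c
}

// release resets the object before handing it back to the pool.
func release(c *Connection) {
	c.Reset()
	pool.Put(c)
}

func main() {
	c := acquire("s1", "u1")
	c.LastMessageTS = 42
	release(c)

	d := acquire("s2", "u2")
	// Whether d is the recycled object or a fresh one, no state carries over.
	fmt.Println(d.Id, d.Uid, d.LastMessageTS)
}
```

Note that sync.Pool may drop objects at any garbage collection, so it reduces allocations but cannot be relied on as a cache.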

3.10 Data transmission optimization

Payloads are serialized with MessagePack, and the MTU is tuned to avoid fragmentation (the original article shows an example ping -s command).

3.11 Infrastructure support

The gateway uses the EGO microservice framework for logging (asynchronous log output, dynamic log level) and integrates the Redis/Kafka/MySQL monitoring SDKs.

4 Performance testing

4.1 Test setup

One 4‑core 8 GB VM as server, targeting 480 k connections.

Eight 4‑core 8 GB VMs as clients, each opening 60 k ports.

4.2 Scenario 1 – 500 k online users

WS‑Gateway consumes 22 % CPU and 71 % memory; each connection uses ~47 KB.
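The per-connection figure can be cross-checked with simple arithmetic. The sketch assumes the 71 % refers to the 32 GB machine mentioned in the summary (section 4.6); that pairing is an inference from the numbers, not something the source states, and perConnKiB is a hypothetical helper.

```go
package main

import "fmt"

// perConnKiB divides the used share of total memory by the connection count.
func perConnKiB(totalGiB, usedFraction float64, conns int) float64 {
	usedKiB := totalGiB * 1024 * 1024 * usedFraction
	return usedKiB / float64(conns)
}

func main() {
	// Assumed inputs: 32 GiB host, 71 % used, 500,000 connections.
	fmt.Printf("%.1f KiB per connection\n", perConnKiB(32, 0.71, 500000))
}
```

Under those assumptions the result lands between 47 and 48 KiB, in line with the ~47 KB quoted above.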

4.3 Scenario 2 – periodic broadcast with acknowledgments

Memory spikes caused the service to restart; the broadcast code accounted for an additional 9.32 % of memory and ack handling for 10.38 %.

4.4 Scenario 3 – broadcast without ack

Memory usage reached ~93 % without crashes; most of the consumption came from the 5 s broadcast timer.

4.5 Scenario 4 – high churn (40 k up/down per second)

CPU 47 %, memory 66 %; peak connection rate 18 570/s, receive 330 k/s, send 394 k/s.

4.6 Summary

With 16 CPU cores and 32 GB of RAM, a single machine handles 500 k connections across all scenarios, meeting resource and stability expectations.

5 Conclusion

Gateway reconstruction decouples services, introduces degradable handshake, unique Socket IDs, optimized heartbeat, custom headers, streamlined message handling, object pooling, payload compression, and integrated monitoring, laying a solid foundation for future scaling.

6 Q&A

The original Q&A addresses why Kafka is retained, the role of Redis broadcast, and other architectural decisions.

7 References

Microservice framework: https://github.com/gotomicro/ego

Monitoring SDKs: https://github.com/gotomicro/ego-component

