Design and Performance Optimization of a High‑Concurrency WebSocket Gateway (Version 2.0)

This article details the evolution from a Node.js‑based WebSocket gateway to a Go‑implemented, gRPC‑driven architecture, describing the redesign of connection handling, TLS off‑loading, socket ID generation, session management, custom Kafka headers, code‑level optimizations, and extensive performance testing that validates the new gateway’s scalability and resource efficiency.

IT Architects Alliance

1 Introduction

In several StoneDoc business scenarios—document sharing, comments, slide presentations, and spreadsheet collaboration—real‑time data synchronization and server‑initiated push are required, which HTTP cannot satisfy, so a WebSocket solution was adopted.

As daily peak connections grew to the million level, memory and CPU usage surged, which prompted a redesign of the gateway.

2 Gateway 1.0

Gateway 1.0 was built with Node.js and Socket.IO, meeting early traffic needs.

2.1 Architecture

Architecture diagram (original image omitted).

2.2 Pain Points

Resource waste: Nginx only performed TLS termination and passed through requests, consuming CPU and memory.

Maintenance & monitoring: No integration with StoneDoc’s monitoring system.

Business coupling: Gateway and business services were tightly coupled, preventing independent scaling.

3 Gateway 2.0

Gateway 2.0 separates the gateway function (WS‑Gateway) from business processing (WS‑API). WS‑Gateway handles authentication, TLS, and WebSocket management; WS‑API communicates with component services via gRPC, enabling targeted scaling and removing Nginx.

3.1 Overall Architecture

Architecture diagram (original image omitted).

3.2 Handshake Process

The handshake is a multi-step process that falls back to HTTP long-polling under poor network conditions. JSON example of the initial Socket.IO response (original example omitted).

3.3 TLS Memory Optimization

TLS termination was moved from Nginx into the service itself; analysis shows that the TLS handshake accounts for roughly 30% of total memory.

3.4 Socket ID Design

Uses SnowFlake algorithm to generate unique IDs; in K8s environments IDs are allocated via a registration service and stored in a database for consistency across restarts.
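The article does not give the bit layout it uses, so the following is only a minimal sketch of a Snowflake-style generator under the classic layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence). The names Generator, NewGenerator, and epochMs are hypothetical, and the worker ID here is passed in directly where the gateway would obtain it from its registration service.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

const (
	workerBits   = 10 // assumed layout: 10 bits of worker ID
	sequenceBits = 12 // assumed layout: 12 bits of per-millisecond sequence
	maxWorkerID  = 1<<workerBits - 1
	maxSequence  = 1<<sequenceBits - 1
	// epochMs is a hypothetical custom epoch (2020-01-01 00:00:00 UTC) in milliseconds.
	epochMs = int64(1577836800000)
)

// Generator produces increasing 64-bit IDs for a single worker.
type Generator struct {
	mu       sync.Mutex
	workerID int64
	lastMs   int64
	sequence int64
}

// NewGenerator validates the worker ID, which would come from the registry.
func NewGenerator(workerID int64) (*Generator, error) {
	if workerID < 0 || workerID > maxWorkerID {
		return nil, errors.New("snowflake: worker ID out of range")
	}
	return &Generator{workerID: workerID}, nil
}

func nowMs() int64 { return time.Now().UnixNano() / int64(time.Millisecond) }

// Next returns an ID that is strictly greater than the previous one.
func (g *Generator) Next() int64 {
	g.mu.Lock()
	defer g.mu.Unlock()

	ms := nowMs()
	if ms < g.lastMs {
		ms = g.lastMs // clock moved backwards: keep using the last timestamp
	}
	if ms == g.lastMs {
		g.sequence = (g.sequence + 1) & maxSequence
		if g.sequence == 0 {
			// Sequence exhausted for this millisecond: spin until the next one.
			for ms <= g.lastMs {
				ms = nowMs()
			}
		}
	} else {
		g.sequence = 0
	}
	g.lastMs = ms
	return (ms-epochMs)<<(workerBits+sequenceBits) | g.workerID<<sequenceBits | g.sequence
}

func main() {
	gen, err := NewGenerator(7)
	if err != nil {
		panic(err)
	}
	a, b := gen.Next(), gen.Next()
	fmt.Println(a, b, b > a)
}
```

Because the worker ID occupies fixed bits of every ID, two pods can only collide if they are handed the same worker ID, which is why allocating it centrally and persisting it matters in a K8s deployment.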

3.5 Cluster Session Management – Event Broadcast

Session data is stored in Redis with keys such as ws:user:clients:${uid}, ws:guid:clients:${guid}, and ws:client:${socket.id}. Two broadcast strategies are compared: simple event broadcast (easy but scales with node count) and a registry center (clear mapping but adds operational cost). After benchmarking, Redis was chosen for message broadcasting due to its superior performance for small payloads (~1 KB).
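To make the key scheme concrete, here is a small in-memory sketch of the three mappings. It assumes the two clients keys behave like sets of socket IDs and that ws:client:${socket.id} holds the session's user and file IDs; the article does not state the Redis data types, and Store, Session, and the helper names are hypothetical stand-ins for a real Redis client.

```go
package main

import (
	"fmt"
	"sort"
)

// Key builders matching the key patterns quoted in the text.
func userClientsKey(uid string) string  { return "ws:user:clients:" + uid }
func guidClientsKey(guid string) string { return "ws:guid:clients:" + guid }
func clientKey(socketID string) string  { return "ws:client:" + socketID }

// Session is the assumed content stored per socket.
type Session struct {
	UID  string
	GUID string
}

// Store is an in-memory stand-in for Redis (not safe for concurrent use).
type Store struct {
	sets    map[string]map[string]struct{}
	clients map[string]Session
}

func NewStore() *Store {
	return &Store{sets: map[string]map[string]struct{}{}, clients: map[string]Session{}}
}

// Register writes all three mappings when a connection is established.
func (s *Store) Register(socketID string, sess Session) {
	s.clients[clientKey(socketID)] = sess
	for _, key := range []string{userClientsKey(sess.UID), guidClientsKey(sess.GUID)} {
		if s.sets[key] == nil {
			s.sets[key] = map[string]struct{}{}
		}
		s.sets[key][socketID] = struct{}{}
	}
}

// Unregister removes the mappings when a connection closes.
func (s *Store) Unregister(socketID string) {
	sess, ok := s.clients[clientKey(socketID)]
	if !ok {
		return
	}
	delete(s.clients, clientKey(socketID))
	for _, key := range []string{userClientsKey(sess.UID), guidClientsKey(sess.GUID)} {
		delete(s.sets[key], socketID)
		if len(s.sets[key]) == 0 {
			delete(s.sets, key)
		}
	}
}

// Members returns the socket IDs stored under a set key, sorted.
func (s *Store) Members(key string) []string {
	ids := make([]string, 0, len(s.sets[key]))
	for id := range s.sets[key] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	store := NewStore()
	store.Register("1001", Session{UID: "u1", GUID: "doc-a"})
	store.Register("1002", Session{UID: "u1", GUID: "doc-b"})
	fmt.Println(store.Members(userClientsKey("u1")))
	store.Unregister("1001")
	fmt.Println(store.Members(userClientsKey("u1")))
}
```

With this layout, a push addressed to a user or to a file resolves to a list of socket IDs with a single lookup, and the per-socket key lets a disconnect clean up both sets without scanning.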

3.6 Heartbeat Mechanism

Clients report heartbeats at a server‑defined interval; timestamps are first updated in memory, then periodically synced to Redis to avoid spikes. The QPS calculation shows how dynamic interval adjustment can reduce load.

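// Periodic sweep driven by ticker t: close connections whose last
// heartbeat is older than expireTime.
for {
    select {
    case <-t.C:
        var now = time.Now().Unix()
        var clients = make([]*Connection, 0)
        dispatcher.clients.Range(func(_, v interface{}) bool {
            client := v.(*Connection)
            // LastMessageTS is updated in memory on every heartbeat.
            lastTs := atomic.LoadInt64(&client.LastMessageTS)
            if now-lastTs > int64(expireTime) {
                // Expired: collect now, close after the iteration finishes.
                clients = append(clients, client)
            } else {
                // Still alive: pass the latest timestamp to the Redis mapping helper.
                dispatcher.clearRedisMapping(client.Id, client.Uid, lastTs, clearTimeout)
            }
            return true
        })
        for _, cli := range clients {
            cli.WsClose()
        }
    }
}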

With dynamic heartbeat intervals, heartbeat QPS can drop from 500 k/s to 500 k/y, where y is the maximum interval multiplier.
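The arithmetic behind this reduction can be sketched as follows, assuming heartbeat QPS is simply the number of connections divided by the current interval, and that the interval is a base interval multiplied by a factor capped at y. The one-second base interval, the cap of 5, and the "grow by one per healthy heartbeat" policy are illustrative assumptions, not values from the article.

```go
package main

import "fmt"

// heartbeatQPS estimates requests per second when every connection
// reports one heartbeat per interval.
func heartbeatQPS(connections int, intervalSeconds float64) float64 {
	return float64(connections) / intervalSeconds
}

// nextMultiplier is a hypothetical growth policy: lengthen the interval
// by one step after each healthy heartbeat until the cap is reached.
func nextMultiplier(current, max int) int {
	if current < max {
		return current + 1
	}
	return max
}

func main() {
	const connections = 500000
	const baseInterval = 1.0 // seconds (assumed)
	const maxMultiplier = 5  // "y" in the text (assumed value)

	m := 1
	for i := 0; i < 6; i++ {
		qps := heartbeatQPS(connections, baseInterval*float64(m))
		fmt.Printf("multiplier=%d qps=%.0f\n", m, qps)
		m = nextMultiplier(m, maxMultiplier)
	}
}
```

Under these assumptions the load falls in proportion to the multiplier, so long-lived healthy connections contribute far fewer heartbeat requests than newly established ones.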

3.7 Custom Kafka Headers

Custom Kafka headers such as X-ID, X-Uid, X-Guid, X-Inner, X-Event, and X-Operator let consumers route a message without decoding its payload and provide full traceability.

Field        Description                Detail
X-ID         WebSocket ID               Connection ID
X-Uid        User ID                    User ID
X-Guid       File ID                    File ID
X-Inner      Gateway internal command   User join/leave
X-Event      Gateway event              Connect/Message/Disconnect
X-Operator   API command                Unicast, broadcast, internal ops

These headers also carry trace IDs and timestamps for end‑to‑end monitoring.
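As an illustration of routing on headers alone, the sketch below uses local Header and Message types as stand-ins for whatever the Kafka client library provides. The lowercase event values and the route function are hypothetical; the article only names the header fields and the event kinds.

```go
package main

import "fmt"

// Header and Message are local stand-ins for a Kafka client's types.
type Header struct {
	Key   string
	Value []byte
}

type Message struct {
	Headers []Header
	Payload []byte // opaque to the router
}

// headerValue returns the first header with the given key.
func headerValue(headers []Header, key string) (string, bool) {
	for _, h := range headers {
		if h.Key == key {
			return string(h.Value), true
		}
	}
	return "", false
}

// newMessage attaches the routing metadata as headers.
func newMessage(socketID, uid, guid, event string, payload []byte) Message {
	return Message{
		Headers: []Header{
			{Key: "X-ID", Value: []byte(socketID)},
			{Key: "X-Uid", Value: []byte(uid)},
			{Key: "X-Guid", Value: []byte(guid)},
			{Key: "X-Event", Value: []byte(event)},
		},
		Payload: payload,
	}
}

// route picks a destination from headers only; the payload is never decoded.
func route(m Message) string {
	event, ok := headerValue(m.Headers, "X-Event")
	if !ok {
		return "reject"
	}
	id, _ := headerValue(m.Headers, "X-ID")
	switch event {
	case "connect", "disconnect":
		return "session:" + id
	case "message":
		return "business:" + id
	default:
		return "reject"
	}
}

func main() {
	m := newMessage("1001", "u1", "doc-a", "message", []byte{0x93, 0x01, 0x02, 0x03})
	fmt.Println(route(m))
}
```

Keeping the routing fields out of the payload means a consumer can drop, forward, or log a message without paying the cost of deserializing it.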

3.8 Message Receive & Send

type Packet struct { ... }

type Connect struct {
    *websocket.Conn
    send chan Packet
}

func NewConnect(conn net.Conn) *Connect {
    c := &Connect{send: make(chan Packet, N)}
    go c.reader()
    go c.writer()
    return c
}

The initial implementation used three goroutines per connection. This was later reduced to two by removing the mostly idle writer goroutine and replacing it with a mutex-protected write method.

type Connect struct {
    *websocket.Conn
    mux sync.RWMutex
}

func (c *Connect) Write(data []byte) error {
    c.mux.Lock()
    defer c.mux.Unlock()
    // write logic
    return nil
}

The team also explored event-driven libraries (gev, gnet) but did not adopt them for production.

3.9 Core Object Pooling

var ConnectionPool = sync.Pool{New: func() interface{} { return &Connection{} }}

func GetConn() *Connection { return ConnectionPool.Get().(*Connection) }

func PutConn(cli *Connection) { cli.Reset(); ConnectionPool.Put(cli) }

Reusing Connection objects through sync.Pool avoids repeated allocation and so reduces GC pressure.

3.10 Data Transfer Optimization

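MessagePack is used to serialize payloads compactly. The MTU is tuned by probing with ping -s {a} {ip}, where {a} is the ICMP payload size in bytes, so that packets stay small enough to avoid fragmentation.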
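The relationship between the ping payload size and the MTU can be sketched as below. An IPv4 header without options is 20 bytes and an ICMP echo header is 8 bytes, so a payload of {a} bytes produces a packet of {a} + 28 bytes. The probe function is a hypothetical stand-in for running ping and checking whether a payload of that size got through unfragmented; here it is simulated with an assumed path MTU of 1400.

```go
package main

import "fmt"

const (
	ipv4HeaderBytes = 20 // IPv4 header without options
	icmpHeaderBytes = 8  // ICMP echo header
)

// maxPingPayload returns the largest ping -s value that fits in one packet.
func maxPingPayload(mtu int) int {
	return mtu - ipv4HeaderBytes - icmpHeaderBytes
}

// findMaxPayload binary-searches [lo, hi] for the largest payload size the
// probe accepts, assuming every size below an accepted size is also accepted.
// It returns lo-1 when no size in the range is accepted.
func findMaxPayload(lo, hi int, probe func(size int) bool) int {
	best := lo - 1
	for lo <= hi {
		mid := lo + (hi-lo)/2
		if probe(mid) {
			best = mid
			lo = mid + 1
		} else {
			hi = mid - 1
		}
	}
	return best
}

func main() {
	fmt.Println(maxPingPayload(1500))

	// Simulated path with an assumed MTU of 1400 bytes.
	const pathMTU = 1400
	probe := func(size int) bool {
		return size+ipv4HeaderBytes+icmpHeaderBytes <= pathMTU
	}
	fmt.Println(findMaxPayload(0, maxPingPayload(1500), probe))
}
```

Once the largest unfragmented payload is known, message sizes (after MessagePack encoding and protocol overhead) can be kept under that bound.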

3.11 Infrastructure Support

The service is built on the EGO framework, with integrated logging, monitoring, and client SDKs for Redis, Kafka, and MySQL.

4 Performance Benchmark

4.1 Test Setup

One 16‑core / 32 GB VM as server targeting 480 k concurrent connections.

Eight 4‑core / 8 GB VMs as clients, each exposing 60 k ports.

4.2 Scenario 1

500 k users come online; WS-Gateway peaks at 22 % CPU and 70 % memory, accepting 16 k connections per second at roughly 47 KB of memory per user.

4.3 Scenario 2

15-minute test with 500 k online users, a push every 5 s, and an acknowledgment required from each user. Memory exceeded the limit and the service restarted. The broadcast code accounted for an additional 9.32 % of memory and acknowledgment (receipt) handling for a further 10.38 %.

4.4 Scenario 3

Same load without acknowledgments; memory usage reached 93 %, but the service did not crash.

4.5 Scenario 4

500 k users, a push every 5 s with acknowledgments, plus 40 k user online/offline events per second. CPU peaked at 47 % and memory at 66 %, and the service remained stable.

4.6 Summary

On a 16-core / 32 GB machine, the gateway handles 500 k connections across all scenarios with acceptable CPU and memory usage, confirming the redesign's scalability.

5 Conclusion

Decoupled gateway from business services and removed Nginx dependency.

Optimized handshake, socket ID generation, heartbeat handling, custom headers, message code structure, object pooling, payload compression, and monitoring integration.

Unified component calls via gRPC for traceability and future extensibility.

6 Technical Links

Microservice framework: https://github.com/gotomicro/ego

Kafka, Redis, MySQL client monitoring SDK: https://github.com/gotomicro/ego-component
