Inside 360’s High‑Performance Go Push System: Architecture, Metrics, and Lessons Learned

An in‑depth look at 360’s long‑connection push platform built with Go, covering system architecture, key performance indicators, scalability challenges, optimization strategies, operational practices, and practical Q&A for building reliable, high‑throughput messaging services.

21CTO

Aug 26, 2015

Zhou Yang, technical manager and architect of 360 Mobile Assistant, introduces the 360 long‑connection push system, which serves thousands of internal apps and supports various push scenarios.

System Overview

The system is a long‑connection push platform deployed across multiple IDC clusters, handling billions of online users. It supports upstream data and user‑status callbacks, and serves PC, mobile, and IoT devices.

Key Performance Indicators

1. Connection count per instance – In stable conditions a single instance can maintain up to 3 million long connections, but real‑world network variability limits practical usage.

2. Memory usage – Go’s goroutine model adds overhead; full‑duplex designs consume more memory than half‑duplex.

3. Messages per second – Throughput depends on QoS, push‑pull strategy, logging, and buffering; typical peaks reach 20‑50 k QPS with 256 B–1 kB payloads on a 24‑core, 64 GB server.

Architecture Overview

The architecture consists of several services, all written in Go:

dispatcher service – Returns a set of IPs for clients to establish long connections.

room service – Holds user connections, registers them, and enforces security policies.

coordinator service – Forwards upstream data and coordinates asynchronous operations.

saver service – Access layer for Redis and MySQL, provides caching and dead‑message handling.

center service – Exposes internal APIs for unicast, broadcast, and status queries.

deployd/agent service – Manages process deployment and collects component status; configuration is handled by ZooKeeper and keeper.
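
As a rough illustration of the dispatcher's role, the sketch below returns a set of room IPs for a client to try. The source only says that the dispatcher returns a set of IPs; the least-loaded-first policy, the RoomInstance type, and the PickRooms function are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// RoomInstance is a hypothetical record the dispatcher keeps per room instance.
type RoomInstance struct {
	IP    string
	Conns int // current long-connection count
}

// PickRooms returns up to n IPs, least-loaded first.
// The selection policy is an assumption; the talk does not describe it.
func PickRooms(rooms []RoomInstance, n int) []string {
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]RoomInstance, len(rooms))
	copy(sorted, rooms)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Conns < sorted[j].Conns
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	if n < 0 {
		n = 0
	}
	ips := make([]string, 0, n)
	for _, r := range sorted[:n] {
		ips = append(ips, r.IP)
	}
	return ips
}

func main() {
	rooms := []RoomInstance{
		{IP: "10.0.0.1", Conns: 900000},
		{IP: "10.0.0.2", Conns: 300000},
		{IP: "10.0.0.3", Conns: 600000},
	}
	fmt.Println(PickRooms(rooms, 2)) // [10.0.0.2 10.0.0.3]
}
```

Returning several IPs rather than one lets the client fall back to the next address when a connection attempt fails.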

Push models include pure server‑push, pull‑only, and hybrid push‑pull. Pure push offers low latency and low overhead, while hybrid models improve reliability for messages requiring strict ordering or persistence.
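
One way to picture the hybrid push‑pull model: the server stores each message, pushes only a small notification carrying the latest sequence number, and the client pulls everything newer than what it has already seen. The sketch below is a minimal in-memory illustration under that assumption; Store, Save, and Pull are hypothetical names, not the system's API.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a stored message with a per-user increasing sequence number.
type Message struct {
	Seq  uint64
	Body string
}

// Store is a hypothetical in-memory stand-in for the persistent message store.
type Store struct {
	mu   sync.Mutex
	msgs map[string][]Message // user -> messages ordered by Seq
}

func NewStore() *Store { return &Store{msgs: make(map[string][]Message)} }

// Save stores a message and returns its sequence number.
// In a hybrid model only this number would be pushed to the client.
func (s *Store) Save(user, body string) uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	seq := uint64(len(s.msgs[user]) + 1)
	s.msgs[user] = append(s.msgs[user], Message{Seq: seq, Body: body})
	return seq
}

// Pull returns every message with Seq greater than after, in order.
func (s *Store) Pull(user string, after uint64) []Message {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []Message
	for _, m := range s.msgs[user] {
		if m.Seq > after {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	store := NewStore()
	store.Save("u1", "hello")
	latest := store.Save("u1", "world") // value carried by the push notification

	var lastSeen uint64 // the client has not received anything yet
	if latest > lastSeen {
		for _, m := range store.Pull("u1", lastSeen) {
			fmt.Println(m.Seq, m.Body)
			lastSeen = m.Seq
		}
	}
}
```

Because the client tracks the last sequence number it has seen, a lost notification is repaired by the next pull, which is where the reliability and ordering benefit comes from.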

Factors Affecting Push Effectiveness

SDK completeness, adaptive heartbeat, read/write timeout tuning, and server‑side routing strategies (e.g., mapping a user to a specific room instance) are critical for performance in weak network environments.

Go Development Challenges and Solutions

The major issues observed were too many goroutines spawned for I/O, uncontrolled buffer allocation, and long GC pauses (up to 6 s). Solutions included:

Limiting goroutine creation and using task pools with resident workers.

Implementing connection pooling and long‑lived RPC channels.

Introducing pipeline RPC to reduce connection count.

Adopting memory/object pools where beneficial, while weighing lock contention.

Profiling showed memory peaks of 69 GB and GC times of 3‑6 s, prompting instance splitting and careful resource isolation.
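
Buffer reuse is one common way to reduce the allocation and GC pressure described here. The sketch below uses the standard library's sync.Pool for scratch buffers; the Encode function and the "header|payload" frame layout are hypothetical and exist only to exercise the pool.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable scratch buffers so that each message
// does not allocate and grow a fresh one.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// Encode builds a frame in a pooled buffer and returns a copy of the bytes.
// The "header|payload" layout is a made-up format for illustration.
func Encode(header, payload string) []byte {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)

	buf.WriteString(header)
	buf.WriteByte('|')
	buf.WriteString(payload)

	// Copy out before the buffer goes back to the pool; the pooled
	// memory must not be referenced after Put.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out
}

func main() {
	fmt.Println(string(Encode("v1", "hello"))) // v1|hello
}
```

sync.Pool may drop idle objects at any garbage collection, so it suits short-lived scratch space rather than long-lived per-connection state. Whether it helps in a given hot path is worth confirming with a profile, in line with the caution about weighing pools against lock contention.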

Operations and Testing

The architecture evolves by splitting clusters, deploying multi‑instance setups, and isolating business‑specific workloads. Load testing relies on long‑connection pressure tools and visual monitoring, and Go’s built‑in profiling supports performance analysis.
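
As a small illustration of Go's built-in profiling, the sketch below reads GC statistics with runtime.ReadMemStats and captures a heap profile with runtime/pprof. It shows the standard library facilities only; how 360 wired profiling into its services is not described in the source, and HeapProfile is a helper name invented here.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// HeapProfile returns a heap profile snapshot in pprof format.
func HeapProfile() ([]byte, error) {
	runtime.GC() // run a collection first so the statistics are current
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Cumulative GC pause time and heap usage, the figures that
	// matter when chasing multi-second pauses.
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("heap in use: %d bytes, GC cycles: %d, total GC pause: %d ns\n",
		ms.HeapInuse, ms.NumGC, ms.PauseTotalNs)

	data, err := HeapProfile()
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}
	fmt.Printf("heap profile size: %d bytes\n", len(data))
}
```

Long-running services more often expose the same data over HTTP by importing net/http/pprof, which registers handlers under /debug/pprof/ that go tool pprof can read from a live process.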

Q&A Highlights

The answers cover timeout settings for mobile networks, message persistence (Redis + MySQL), handling message storms, debugging with the Go toolchain, the TCP‑based protocol stack, upstream data routing, SDK reuse across multiple apps, profiling strategy, consumer grouping, the choice of Go over Erlang, flow control, and coordination via ZooKeeper versus the Raft‑based keeper.

Overall, the 360 push system demonstrates a scalable, high‑throughput backend architecture built with Go, emphasizing careful resource management, robust SDK design, and continuous operational tuning.
