High‑Availability Architecture and Scaling Experience of Snowball During Stock Market Turbulence

This article summarizes Snowball's (Xueqiu) high‑availability architecture, performance optimizations, and scaling strategies, including hybrid cloud migration, service decomposition, in‑memory caching, and metric‑driven monitoring. These were put in place to handle massive traffic spikes and operational challenges during a volatile stock market period.

High Availability Architecture

Snowball (Xueqiu) is a financial social platform with fewer than 100 employees, about half of them engineers, serving millions of users and billions of API calls. The talk, based on Tang Fulin’s experience, covers the company overview, the overall architecture, and the evolution of its high‑availability design.

The current stack includes Java, Scala, Akka, Finagle, Node.js, Docker, and Hadoop. It runs in a private IDC under a hybrid public‑private cloud strategy. Services are organized into web, Android, and iOS front ends; an API layer behind Nginx; a legacy “snowball” monolith; and numerous RPC and HTTP services built on Finagle.

The key challenges during the 2015 market volatility were massive traffic spikes (more than 30 times normal load), bandwidth surges, and registration bursts driven by advertising. Optimizations included a dedicated quote server with an in‑memory cache, scaling IM push to 50,000 messages per second, moving critical services to the public cloud, and extensive use of Redis, Hazelcast, and asynchronous processing.
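To make the in‑memory quote cache idea concrete, here is a minimal sketch of a read‑through cache with a short time‑to‑live. It is an illustration only, not Snowball's implementation: the class name `QuoteCache`, the `loader` callback, the one‑second TTL, and the symbol used in the usage note are all assumptions, and the sketch is single‑threaded.

```python
import time
from typing import Callable, Dict, Tuple


class QuoteCache:
    """Read-through in-memory cache with a short TTL per symbol (illustrative)."""

    def __init__(self, loader: Callable[[str], float], ttl_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # hypothetical backend lookup (DB, upstream feed, ...)
        self._ttl = ttl_seconds    # assumed freshness window, not a value from the talk
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, float]] = {}  # symbol -> (expires_at, price)
        self.backend_calls = 0     # counts how often the backend was actually hit

    def get(self, symbol: str) -> float:
        now = self._clock()
        entry = self._entries.get(symbol)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: served from memory
        price = self._loader(symbol)   # miss or expired: reload from the backend
        self.backend_calls += 1
        self._entries[symbol] = (now + self._ttl, price)
        return price
```

With a cache like this in front of the quote backend, repeated reads of the same symbol within the TTL cost one backend call, for example `QuoteCache(lambda s: fetch_price(s)).get("SH000001")`, where `fetch_price` is a placeholder for whatever lookup the service uses. A production version would also need locking or per‑key request coalescing under concurrent load.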

Performance and availability work covered several areas: high availability for front‑end modules through Hazelcast replication; disabling the JVM attach mechanism; redesigning database access (caching nickname and phone‑number checks, and making inserts asynchronous); and refining the IM system (Netty with a custom protocol, one Akka actor per client, and a push‑first design). Monitoring migrated from Zabbix to Open‑Falcon, with emphasis on metrics such as QPS, p99 latency, and error rate.
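The three metrics named above can be computed from raw request records in a few lines. The sketch below is a hedged illustration: the `Request` record, the `summarize` function, and the nearest‑rank percentile method are assumptions chosen for clarity, not how Snowball or its monitoring system computes them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    latency_ms: float   # time taken to serve the request
    ok: bool            # False if the request ended in an error


def summarize(requests: List[Request], window_seconds: float) -> Dict[str, float]:
    """Compute QPS, p99 latency (nearest-rank), and error rate for one time window."""
    if not requests or window_seconds <= 0:
        return {"qps": 0.0, "p99_ms": 0.0, "error_rate": 0.0}
    latencies = sorted(r.latency_ms for r in requests)
    n = len(latencies)
    rank = (99 * n + 99) // 100          # ceil(0.99 * n) using integer arithmetic
    errors = sum(1 for r in requests if not r.ok)
    return {
        "qps": n / window_seconds,       # requests per second over the window
        "p99_ms": latencies[rank - 1],   # 99% of requests were at least this fast
        "error_rate": errors / n,        # fraction of failed requests
    }
```

Tracking p99 rather than the average matters during traffic spikes, because a small share of very slow requests can hide behind a healthy‑looking mean.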

The talk concludes with lessons learned: prioritize solving bottlenecks over perfect design, keep the tech stack simple and consistent, adopt changes incrementally, make operations idempotent, and give each service metric‑driven ownership.
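As one way to picture the idempotency lesson, the sketch below deduplicates requests by a caller‑supplied request ID so that a retry returns the earlier result instead of repeating the side effect. The names `IdempotentHandler`, `request_id`, and `action` are hypothetical, and the in‑process dictionary stands in for whatever shared store a real service would use.

```python
from typing import Callable, Dict


class IdempotentHandler:
    """Run an action at most once per request ID and replay the stored result."""

    def __init__(self, action: Callable[[dict], dict]) -> None:
        self._action = action                 # the side-effecting operation
        self._results: Dict[str, dict] = {}   # request_id -> result already produced

    def handle(self, request_id: str, payload: dict) -> dict:
        if request_id in self._results:
            return self._results[request_id]  # retry: no second side effect
        result = self._action(payload)
        self._results[request_id] = result
        return result
```

This matters most when clients or queues retry after timeouts: the second delivery of the same message becomes a harmless lookup. The sketch is not safe under concurrency and never expires entries, both of which a real deployment would need to address.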

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

High Availability Architecture

Official account for High Availability Architecture.
