Cloud Native · 11 min read

How Snowball Used Apache APISIX to Build a Dual‑Active Architecture and Streamline Authentication

This article details Snowball's transition from a single‑datacenter setup to a dual‑active, cloud‑native architecture using Apache APISIX, covering background challenges, problem analysis, gateway selection, architectural adjustments, authentication unification, observability enhancements, ZooKeeper integration, and future plans.

Snowball Engineer Team

Background

Snowball provides real‑time market data, trading tools, and wealth‑management services. Real‑time data processing is resource‑intensive and became a bottleneck during market spikes, prompting a stability‑focused redesign.

Problem Description

The original single‑datacenter architecture routed traffic through a cloud SLB to a lightweight gateway and on to backend services. Authentication was split between an SDK call to Snowball’s user centre and JWT verification inside individual services. This produced tens of billions of RPC calls per day (peaking at roughly 50,000 QPS) and occasional latency spikes. Additional pain points were:

The OpenResty‑based gateway lacked flexible plugin integration; every extension required custom Lua scripts.

Service discovery relied on a self‑built registry that needed manual updates.

Gateway Selection

After comparing market options, the team selected Apache APISIX for its cloud‑native design, dynamic configuration, rich plugin ecosystem, and active community support.

Adjusted Architecture

A dual‑active, multi‑region design was introduced. The left side of the diagram retains the legacy datacenter layout; the right side shows the cloud‑native APISIX‑based deployment. Key changes:

Authentication moved to the gateway layer.

JWT verification performed locally with the jwt-auth plugin.

OAuth 2.0 flows delegated to Snowball’s user centre via the grpc-transcode plugin.

When JWT verification fails, the gateway falls back to the internal RPC service registry for backend authentication.
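Moving JWT verification to the gateway comes down to attaching the jwt-auth plugin to a route and registering a consumer that holds the signing key. A minimal sketch in APISIX's standalone `apisix.yaml` style follows; the URI, upstream address, and key material are illustrative, not Snowball's actual configuration:

```yaml
consumers:
  - username: web_client
    plugins:
      jwt-auth:
        key: web-client-key        # must match the "key" claim in issued tokens
        secret: change-me          # HS256 signing secret (illustrative)

routes:
  - uri: /api/*
    plugins:
      jwt-auth: {}                 # verify the JWT locally, no RPC round trip
    upstream:
      type: roundrobin
      nodes:
        "backend.internal:8080": 1
#END
```

Note that in Snowball's design a failed local verification falls back to backend authentication via the internal RPC registry, rather than rejecting the request outright at the gateway.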

Authentication Implementation

Three implementation options were evaluated for invoking the user‑centre gRPC service:

Direct Lua gRPC calls – required custom load‑balancing and dynamic upstream handling.

Lua coroutine callbacks to a Golang helper – added complexity.

Lua HTTP calls combined with the grpc-transcode plugin – chosen for community support and simplicity.

Configuration of jwt-auth is performed via APISIX routes and upstream definitions. The grpc-transcode plugin translates incoming HTTP requests to the gRPC method defined in the user‑centre protobuf. Because protobuf definitions evolve, the files must be manually synchronized with the gateway to avoid mismatches.
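A route combining HTTP ingress with the user‑centre gRPC service can be sketched as follows: the protobuf content is registered with the gateway first (here as a `protos` entry), then referenced by the grpc-transcode plugin. The package, service, and method names are hypothetical placeholders for the real user‑centre definitions:

```yaml
protos:
  - id: "1"
    content: |
      syntax = "proto3";
      package usercentre;                    // hypothetical package name
      service Auth {
        rpc VerifyToken (TokenRequest) returns (TokenReply) {}
      }
      message TokenRequest { string token = 1; }
      message TokenReply   { bool valid = 1; }

routes:
  - uri: /auth/verify
    plugins:
      grpc-transcode:
        proto_id: "1"
        service: usercentre.Auth
        method: VerifyToken
    upstream:
      scheme: grpc
      type: roundrobin
      nodes:
        "user-centre-grpc.internal:50051": 1
#END
```

Because the proto content lives in the gateway's configuration, any change to the user‑centre protobuf must be re‑registered here; this is the manual synchronization step noted above.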

Observability

Monitoring focuses on three metric groups:

NGINX connection status and traffic volume.

HTTP error‑code rates for upstream/service diagnosis.

APISIX request latency, calculated as NGINX total time minus upstream latency. Plugin‑level latency metrics were added to isolate the overhead introduced by specific plugins.

Access logs are formatted uniformly and aggregated into a traffic dashboard, enabling multi‑dimensional analysis of request paths, error patterns, and latency trends.
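The latency decomposition above can be sketched with a small helper that derives gateway‑added latency from two standard NGINX variables in an access‑log entry. The JSON log shape is an assumption for illustration; the field names follow NGINX's `$request_time` and `$upstream_response_time`:

```python
import json

def apisix_latency(log_line: str) -> float:
    """Derive gateway-added latency (seconds) from a JSON access-log entry.

    request_time is the total time a request spent in the gateway;
    upstream_response_time is the time spent waiting on the upstream
    service. The difference approximates APISIX/plugin overhead.
    """
    entry = json.loads(log_line)
    total = float(entry["request_time"])
    upstream = float(entry["upstream_response_time"])
    return round(total - upstream, 6)

line = '{"request_time": "0.105", "upstream_response_time": "0.083"}'
print(apisix_latency(line))  # → 0.022
```

Aggregating this value per route and per plugin is what lets the dashboard attribute latency to the gateway itself rather than to backend services.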

ZooKeeper Service Registry Extension

Snowball’s gRPC services use ZooKeeper for discovery. The built‑in apisix-seed plugin was extended to poll a ZK‑Rest endpoint, cache the service list in each worker process, and refresh it periodically. This provides high availability but introduces a potential delay between registry updates and cache refreshes.
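The extension's behaviour can be approximated by the following sketch: a per‑worker cache whose fetch function (in the real extension, an HTTP GET against the ZK‑Rest endpoint) is polled on a fixed interval. The interval, payload shape, and class are illustrative, not the actual apisix-seed code:

```python
import threading
from typing import Callable, List

class ServiceCache:
    """Per-worker cache of a service node list from a ZK-Rest endpoint.

    fetch() is any callable returning the current node list; the cache is
    refreshed on a fixed interval, so callers may briefly see stale nodes
    between a registry update and the next refresh.
    """

    def __init__(self, fetch: Callable[[], List[str]], interval: float = 30.0):
        self._fetch = fetch
        self._interval = interval
        self._nodes = fetch()          # initial fill at worker start
        self._lock = threading.Lock()

    def refresh(self) -> None:
        nodes = self._fetch()          # may raise; old cache kept on failure
        with self._lock:
            self._nodes = nodes

    def nodes(self) -> List[str]:
        with self._lock:
            return list(self._nodes)

    def run(self, stop: threading.Event) -> None:
        """Background refresh loop; serves the stale cache on fetch errors."""
        while not stop.wait(self._interval):
            try:
                self.refresh()
            except Exception:
                pass
```

Serving from the cache keeps the gateway available even if ZooKeeper or the ZK‑Rest layer is briefly unreachable, at the cost of the update delay described above.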

Results

Unified authentication, circuit breaking, and rate limiting at the gateway layer.

Reduced coupling between front‑end traffic and backend services, improving reliability in dual‑datacenter scenarios.

Enhanced end‑to‑end observability through APISIX latency and plugin metrics.

Seamless gRPC‑HTTP conversion and traffic splitting via plugins.

Future Work

Planned next steps include:

Deploying the APISIX Ingress Controller in Kubernetes clusters.

Using grpc-transcode for protocol conversion across all services.

Applying the traffic-split plugin for gray‑release traffic management and integration with Nacos.

Gradually replacing the legacy OpenResty gateway with APISIX to achieve full north‑south traffic governance.
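Of these steps, gray‑release routing can already be expressed in plugin configuration. A sketch that sends 10% of a route's traffic to a canary upstream via the traffic-split plugin; the URI, upstream IDs, and weights are illustrative:

```yaml
routes:
  - uri: /quote/*
    upstream_id: "stable"                 # default upstream for the route
    plugins:
      traffic-split:
        rules:
          - weighted_upstreams:
              - upstream_id: "gray"       # canary version
                weight: 1
              - weight: 9                 # no upstream_id: stay on the default
#END
```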

Tags: cloud native, observability, api-gateway, authentication, dual-active architecture, Apache APISIX
Written by

Snowball Engineer Team

Proactivity, efficiency, professionalism, and empathy are the core values of the Snowball Engineer Team; curiosity, passion, and sharing of technology drive their continuous progress.
