Building a High‑Concurrency, Scalable Proxy for Weibo Recommendation Engine

This article details the design and implementation of a high‑concurrency, easily extensible proxy built in Go for Weibo's recommendation system, covering background, challenges with twemproxy, technical research, architecture, configuration, logging, monitoring, module breakdown, business logic, performance testing, and future improvements.

21CTO

1 Background

Data is the foundation of the Weibo recommendation engine; whether raw data from platform departments or derived keywords and candidate sets, read/write operations are pervasive throughout the system.

The recommendation engine gradually split its data layer into three modules: import, storage, and external access. Data import (Rin) uses message middleware that speaks the memcache protocol for publish/subscribe; storage relies mainly on Redis and Lushan; external access goes through twemproxy.

Across the industry, layer‑7 (application‑layer) proxies route requests to backend app‑server or data‑server clusters, and cloud or distributed database services are typically accessed through such proxies.

2 Problem

As Weibo's business grew, running twemproxy exposed several issues:

The Lushan storage service uses a key format that includes a DB number (e.g., "$db-$uid"). Upstream callers must hash the uid and concatenate the key themselves, which couples routing logic to business code and requires code changes whenever the hash rule changes.

twemproxy only supports Redis and memcache protocols; extending it to MySQL, MongoDB, HTTP, etc., incurs high cost.

The high‑concurrency async callback model tightly couples with business code, scattering logic across multiple places.

Routing is single‑layer; for critical data stored in MySQL, a cache‑aside pattern is needed, but the current design forces synchronous calls and complicates the architecture.

From a technical perspective, twemproxy also has drawbacks:

It runs as a single process with async I/O, which cannot fully utilize multi‑core servers; multi‑process deployment adds operational complexity.

It is implemented in C with its own data structures (arrays, strings, red‑black trees, queues), which makes maintenance and extension difficult.

PS: twemproxy is no longer used internally by Twitter.

3 Technical Research

Given the business needs, we require a proxy that is high‑concurrency, easy to extend, and easy to maintain.

We evaluated two open‑source proxies besides twemproxy:

McRouter – Facebook’s C++0x proxy, memcache‑only, uses lightweight fibers, supports cross‑datacenter fault tolerance, but has a large codebase.

Codis – Wandoujia’s Go proxy, Redis‑only, uses Zookeeper for configuration and provides transparent sharding and resharding.

Both have strengths but are not ideal for our scenario; we decided to develop a custom proxy in Go for the following reasons:

Rapid development cycle – a demo can be built in about a week.

Go’s built‑in concurrency allows synchronous code to achieve async I/O performance.

Static compilation offers no clear advantage, but Go’s goroutine model simplifies maintenance and extension.

High concurrency matters more than raw performance here: the proxy is I/O‑bound, so the difference between Go and C/C++ has a negligible effect on QPS.

4 Proxy Design

Main functions:

Accept the Redis, memcache, and HTTP protocols from clients; connect to Redis, memcache, and MySQL backends; route requests by configurable hash rules (modulo, consistent hashing, etc.).

Two‑level data service: cache layer for hot data, persistent layer for full data.

Automatic failover between two data‑center clusters.

Write‑back: on cache miss, fetch from the second layer and write back to cache, handling master‑slave permissions.

Configuration, logging, and monitoring are essential for a 24×7 service.

4.1 Configuration

Inspired by twemproxy’s YAML, we use TOML for configuration. Example configuration is shown below.

The connection‑pool list uses four hyphens to separate services from different data centers; removing them disables cross‑datacenter failover.
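As a rough sketch of how the four‑hyphen separator could be interpreted once the list has been loaded, the function below groups a flat server list into one slice per data center. The function name and the sample addresses are hypothetical, and the actual TOML field layout is not shown in this article.

```go
package main

import "fmt"

// dcSeparator is the four-hyphen marker described above.
const dcSeparator = "----"

// splitByDataCenter groups a flat server list into one slice per data center.
// Without any separator the result has a single group, which means there is
// no second cluster to fail over to.
func splitByDataCenter(servers []string) [][]string {
	groups := [][]string{{}}
	for _, s := range servers {
		if s == dcSeparator {
			groups = append(groups, []string{})
			continue
		}
		last := len(groups) - 1
		groups[last] = append(groups[last], s)
	}
	return groups
}

func main() {
	// Hypothetical addresses, for illustration only.
	servers := []string{"10.0.1.1:6379", "10.0.1.2:6379", "----", "10.0.2.1:6379"}
	groups := splitByDataCenter(servers)
	fmt.Println(len(groups), groups)
}
```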

4.2 Logging

IO‑intensive logging requires buffering and async writes; we adopt the seelog library, which supports size‑ or time‑based rotation and automatic log count management.

4.3 Monitoring

To ensure 24×7 reliability, we expose a simple HTTP endpoint that reports metrics such as current connections, QPS, and request counts, which can be scraped by the existing monitoring platform.

4.4 Proxy Modules

The proxy is divided into four layers, each implemented as a Go package. Overview diagram:

Key modules include:

protocol : parses Redis and memcache protocols using bufio.Reader.

hash : supports modulo and consistent hashing, and defines the routing function type:

type HashFunc func(key string) (dbIndexNum int, serverPosition int, err error)

tunnel : manages one physical connection, parsing each request, processing it, and writing the reply; uses a goroutine pool.

entry : entry services for TCP and HTTP; TCP uses a task queue + goroutine pool, HTTP adds gzip/deflate, graceful shutdown, and context propagation.

conn‑pool : maintains fixed‑size pools for backend services, handles failures, heartbeats, and cross‑datacenter switching.

common : utilities such as logging, configuration, MySQL client, monitoring, error handling, and other helpers.

business logic : loads configuration, starts monitoring, initializes pools, launches instances, and performs graceful shutdown.

4.5 Business Logic

The core scenario is a multi‑key read: split keys by hash, launch a goroutine per backend, collect results via channels, and reassemble preserving the original order. The same pattern can be recursively applied for multi‑level storage.
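The multi‑key read described above can be sketched as a split, scatter, and gather over channels. This is an illustrative version, not the proxy's code: the `Backend` interface, `mapBackend`, and `MultiGet` are hypothetical names, routing uses a simple CRC32 modulo instead of the configured hash rule, and the backends are in-memory maps.

```go
package main

import (
	"errors"
	"fmt"
	"hash/crc32"
)

// Backend is a hypothetical stand-in for one backend's connection pool.
type Backend interface {
	MGet(keys []string) ([]string, error)
}

// mapBackend answers from an in-memory map; missing keys yield "".
type mapBackend map[string]string

func (m mapBackend) MGet(keys []string) ([]string, error) {
	out := make([]string, len(keys))
	for i, k := range keys {
		out[i] = m[k]
	}
	return out, nil
}

// batch is one backend's share of a request plus its results.
type batch struct {
	keys      []string
	positions []int // index of each key in the caller's original slice
	values    []string
	err       error
}

// MultiGet splits keys by backend, queries the backends concurrently, and
// returns the values in the same order as keys.
func MultiGet(backends []Backend, keys []string) ([]string, error) {
	if len(backends) == 0 {
		return nil, errors.New("no backends configured")
	}
	// Split: group keys by backend and remember where each key came from.
	batches := make(map[int]*batch)
	for i, k := range keys {
		b := int(crc32.ChecksumIEEE([]byte(k)) % uint32(len(backends)))
		if batches[b] == nil {
			batches[b] = &batch{}
		}
		batches[b].keys = append(batches[b].keys, k)
		batches[b].positions = append(batches[b].positions, i)
	}
	// Scatter: one goroutine per backend that has work.
	done := make(chan *batch, len(batches)) // buffered so senders never block
	for b, bt := range batches {
		go func(be Backend, bt *batch) {
			bt.values, bt.err = be.MGet(bt.keys)
			done <- bt
		}(backends[b], bt)
	}
	// Gather: place each value back at its original position.
	out := make([]string, len(keys))
	var firstErr error
	for i := 0; i < len(batches); i++ {
		bt := <-done
		if bt.err != nil {
			if firstErr == nil {
				firstErr = bt.err
			}
			continue
		}
		for j, pos := range bt.positions {
			out[pos] = bt.values[j]
		}
	}
	return out, firstErr
}

func main() {
	// Every backend holds the full data set so the demo works for any routing.
	data := mapBackend{"a": "1", "b": "2", "c": "3"}
	vals, err := MultiGet([]Backend{data, data, data}, []string{"c", "a", "b"})
	fmt.Println(vals, err) // [3 1 2] <nil>
}
```

Recording each key's original position at split time is what lets the gather step restore order without sorting, regardless of which backend answers first.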

Workers are goroutines; channels provide thread‑safe FIFO queues.

5 Performance

Throughout development we performed benchmark tests (redis‑benchmark, memtier_benchmark, custom Python scripts). The bottleneck is the backend storage; the proxy aims to keep request latency comparable to direct storage access.

6 Summary

The new proxy is now deployed across many services; most business changes can be applied without code modifications.

Remaining issues:

Framework abstraction is still limited; a Hadoop‑style MapReduce model would let developers focus only on serialize, split, hash, map, and reduce logic.

The proxy should evolve beyond simple routing to include sharding, distributed transactions, and lock management.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
