Designing High‑Performance Read Services: Principles, Caching, and Concurrency

This article shares practical design principles for building scalable read services, covering stateless architecture, data closed‑loop processing, multi‑layer caching strategies, concurrency optimization, degradation switches, rate limiting, traffic switching, and other operational best practices.

21CTO

The author has been developing read services at JD for over a year, handling workloads from hundreds of millions to tens of billions of requests, and continuously experimenting with architecture and code to achieve a satisfactory read‑service design.

Some Design Principles

Stateless

Data closed loop

Cache silver bullet

Concurrency

Degrade switch

Rate limiting

Traffic switching

Others

Stateless

Stateless application instances can scale horizontally. In practice, the configuration may still be stateful: for example, each data center may be configured with a different data source.

Data Closed Loop

When a service depends on many data sources, build a data closed loop. First, replicate the source data through a mechanism such as MQ into a storage engine suited to the access pattern (Redis or a persistent KV store). Optionally, aggregate the data so that the frontend can retrieve everything in one call or a few calls. The service then keeps running even if a system it depends on fails: updates may back up, but frontend display is unaffected.
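The flow above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-process `queue.Queue` stands in for the MQ, a plain dict stands in for Redis or a persistent KV store, and the message fields (`product_id`, `source`, `data`) are hypothetical.

```python
import json
import queue

# Stand-ins (assumptions): a Queue plays the MQ, a dict plays Redis or a persistent KV.
change_queue = queue.Queue()
kv_store = {}


def apply_change(message: dict) -> None:
    """Merge one change message into the aggregated record for its product."""
    key = f"product:{message['product_id']}"
    record = json.loads(kv_store.get(key, "{}"))
    record[message["source"]] = message["data"]
    kv_store[key] = json.dumps(record)


def drain_queue() -> int:
    """Apply every pending message and return how many were applied."""
    applied = 0
    while True:
        try:
            message = change_queue.get_nowait()
        except queue.Empty:
            return applied
        apply_change(message)
        applied += 1


def read_product(product_id: str) -> dict:
    """Frontend read: one lookup in our own store, no call to any source system."""
    return json.loads(kv_store.get(f"product:{product_id}", "{}"))
```

Because `read_product` touches only the local store, a failing source system delays updates but does not break reads.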

For requests needing multiple pieces of data, a hash‑tag mechanism can co‑locate related keys in the same Redis instance (e.g., product ID as the hash tag).
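As an illustration, the sketch below assumes the Redis Cluster hash-tag rule, where only the text between the first `{` and the following `}` is hashed (CRC16 modulo 16384 slots). The article does not say which sharding layer is used, and the key naming in `product_key` is hypothetical.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def hash_tag(key: str) -> str:
    """Return the part of the key that gets hashed under the hash-tag rule."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # an empty tag such as "{}" is ignored
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """CRC16 (XMODEM variant) of the hashed part, modulo the slot count."""
    return binascii.crc_hqx(hash_tag(key).encode("utf-8"), 0) % SLOT_COUNT


def product_key(product_id: str, field: str) -> str:
    """Hypothetical naming scheme: the product ID is the hash tag."""
    return f"{{{product_id}}}:{field}"
```

With this scheme, `product_key("1001", "price")` and `product_key("1001", "stock")` map to the same slot, so one node can serve both keys in a single round trip.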

Cache Silver Bullet

Cache is essential for read services.

Browser Cache

Control expiration through response headers (Expires, Cache‑Control). This suits data that is not time‑critical, such as the product detail page frame, merchant scores, and ads. It does not suit price or inventory, which must be fresh in real time.
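A minimal sketch of how a handler might build these headers. The function name `cache_headers` and the choice of `no-store` for real-time data are assumptions, not something the article prescribes.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds: int, now=None) -> dict:
    """Build Cache-Control and Expires headers for a given freshness lifetime."""
    now = time.time() if now is None else now
    if max_age_seconds <= 0:
        # Real-time data such as price or inventory: ask the browser not to store it.
        return {"Cache-Control": "no-store"}
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

For example, a merchant score could be served with `cache_headers(300)`, while a price response would use `cache_headers(0)`.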

CDN Cache

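Pages, activity pages, and images can be served from the CDN node nearest to the user. There are two mechanisms: push (the origin actively updates the CDN) and pull (the CDN fetches from the origin on demand). Design URLs without random numbers, because a random parameter makes every request a cache miss. Crawlers can be served stale data to reduce origin hits.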
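The effect of random URL parameters on the cache key can be shown with a small normalization sketch. The parameter names in `RANDOM_PARAMS` are hypothetical and would need to match what the real frontend appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of cache-busting parameters; adjust to the real frontend.
RANDOM_PARAMS = {"_", "r", "rand", "t"}


def normalize_url(url: str) -> str:
    """Drop random parameters and sort the rest so equal requests share one cache key."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in RANDOM_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Two requests that differ only in `_=...` then normalize to the same URL and can be answered from one cached copy.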

Access Layer Cache

For services that are not behind a CDN, use Nginx as the access layer with the following measures:

Rewrite URLs to remove random elements.

Apply consistent hashing on a request parameter (e.g., category or product ID) so that requests for the same data land on the same server.

Use proxy_cache to cache responses in memory or on SSD.

Use proxy_cache_lock so that concurrent cache misses for the same resource reach the origin as a single request.

Use lua_shared_dict (when running Nginx + Lua) so that cached data survives a reload.

Do not cache fallback or erroneous responses.
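The idea behind merging origin requests is often called request coalescing. The following is a generic Python sketch of that idea, not Nginx's implementation; the class name `SingleFlight` and its `do` method are illustrative.

```python
import threading


class SingleFlight:
    """Collapse concurrent loads of the same key into one call to the origin."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done event, result holder)

    def do(self, key, load):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, holder = entry
        if leader:
            try:
                holder["value"] = load()  # only the first caller hits the origin
            except Exception as exc:
                holder["error"] = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()
        else:
            done.wait()  # followers wait for the leader's result
        if "error" in holder:
            raise holder["error"]
        return holder["value"]
```

Callers that arrive while a load is in flight share its result or its error. A call that arrives after the load has finished triggers a new load, so this complements a cache rather than replacing it.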

Application Layer Cache

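In Tomcat, use an in‑heap or off‑heap cache. Because these caches are lost when the application restarts, also consider a local Redis cache: it survives application restarts and mitigates the traffic storm that a cold cache would otherwise send to the backend.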
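The behavior of an in-process cache can be sketched as follows. This is a language-neutral illustration written in Python, not a Tomcat or Java API; `TTLCache` and its parameters are hypothetical names, and a real service would likely use an established cache library.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry and a size cap (a sketch only)."""

    def __init__(self, ttl_seconds: float, max_entries: int = 1024, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._data = {}  # key -> (expires_at, value); dicts keep insertion order

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it so the caller reloads
            return default
        return value

    def put(self, key, value) -> None:
        self._data.pop(key, None)
        if len(self._data) >= self._max:
            # Evict the oldest inserted entry to bound memory use.
            self._data.pop(next(iter(self._data)))
        self._data[key] = (self._clock() + self._ttl, value)
```

Everything held in `_data` disappears when the process restarts, which is the gap a cache outside the process is meant to cover.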

Distributed Cache

For moderate data volumes, prefer a local Redis cluster with master‑slave replication. If the data is too large for that, shard it with consistent hashing or adopt a fully distributed cache solution.
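Sharding by consistent hashing can be sketched with a hash ring. This is a minimal illustration, assuming SHA-256 as the hash function and 100 virtual nodes per server; the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, replicas: int = 100):
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next virtual node."""
        index = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[index][1]
```

When a node is removed, only the keys that lived on it move to other nodes. The rest keep their placement, which is why this scheme limits cache misses during resharding.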

Concurrency

Fetching independent data concurrently can roughly halve latency. For example, if fetching five data items one after another takes 60 ms, fetching the independent items concurrently can reduce the total to 30 ms, and prefetching data that a later step depends on can lower it further to 25 ms.
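A small asyncio sketch of the same idea. The item names, the dependency of `c` on `a` and `b`, and the delays are made up for illustration and are not the figures from the article.

```python
import asyncio


async def fetch(name: str, delay_ms: int) -> str:
    """Stand-in for a remote call; the delay is a made-up latency."""
    await asyncio.sleep(delay_ms / 1000)
    return f"{name}-data"


async def load_sequential() -> dict:
    # One call after another: roughly 10 + 15 + 5 + 20 = 50 ms.
    a = await fetch("a", 10)
    b = await fetch("b", 15)
    d = await fetch("d", 5)
    c = await fetch(f"c({a},{b})", 20)
    return {"a": a, "b": b, "c": c, "d": d}


async def load_concurrent() -> dict:
    # a, b and d are independent, so they run together: roughly 15 ms.
    a, b, d = await asyncio.gather(fetch("a", 10), fetch("b", 15), fetch("d", 5))
    # c needs a and b, so it starts only after they arrive: roughly 20 ms more.
    c = await fetch(f"c({a},{b})", 20)
    return {"a": a, "b": b, "c": c, "d": d}
```

Both functions return the same data. The concurrent version pays only for the slowest call in each dependency stage instead of the sum of all calls.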

Degrade Switch

Key ideas for degradation switches:

Centralized management: manage switches in one place and push changes to all applications.

Multi‑level read fallback: local cache → distributed cache → default degraded data.

Place switches at the access layer (e.g., Nginx) so that degraded requests never reach backend services.
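The multi-level fallback can be sketched as a reader that consults switches before each level. The class, the switch names, and the default payload are hypothetical, and in a real system the switches would be pushed from the central management service rather than set in code.

```python
DEFAULT_DATA = {"status": "degraded"}  # hypothetical default payload


class DegradableReader:
    """Read path with degrade switches: local cache, distributed cache, backend, default."""

    def __init__(self, local_cache: dict, distributed_cache: dict, backend):
        self.local_cache = local_cache
        self.distributed_cache = distributed_cache
        self.backend = backend  # callable(key) -> value, may raise
        # Assumption: these values are updated by a central switch service.
        self.switches = {"use_distributed_cache": True, "use_backend": True}

    def read(self, key):
        if key in self.local_cache:
            return self.local_cache[key]
        if self.switches["use_distributed_cache"] and key in self.distributed_cache:
            return self.distributed_cache[key]
        if self.switches["use_backend"]:
            try:
                return self.backend(key)
            except Exception:
                pass  # fall through to the default instead of failing the page
        return DEFAULT_DATA
```

Turning off `use_backend` during an incident makes every miss return the default data immediately, without touching the failing dependency.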

Rate Limiting

Prevent malicious traffic and attacks:

Let cache handle most traffic.

Use Nginx limit module for traffic that reaches the backend.

Block malicious IPs with Nginx deny rules.

The principle is to stop excess traffic before it reaches the weaker backend application layer, so rate limiting is typically applied at the access layer rather than inside the application.
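The mechanics of limiting a request rate can be sketched independently of where the limit is enforced. The following is a generic token-bucket sketch. It is not the algorithm of the Nginx limit_req module, which uses a leaky bucket, and the class and parameter names are illustrative.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` and a sustained rate of `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: float, clock=time.monotonic):
        self._rate = rate_per_second
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to the elapsed time, capped at the bucket size.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

This sketch is not thread-safe. A shared limiter would need a lock around `allow`, or one bucket per client key such as the IP address.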

Traffic Switching

In large applications, when a data center, rack, or server fails, switch traffic away from it at one of these levels:

DNS: change the entry point.

LVS/HAProxy: switch away from faulty Nginx instances.

Nginx: switch away from faulty application instances.

Some applications skip LVS/HAProxy and switch traffic directly at the Nginx access layer.

Other Practices

Use stateless domains without cookies (e.g., 3.cn).

Filter request headers at the access layer, forwarding only useful ones.

Pre‑validate request parameters at the access layer.

Set reasonable internal connection, read, and write timeouts.

Enable gzip compression to reduce traffic.

Use Unix domain sockets to reduce the number of local TCP connections.

Use HTTP keep‑alive for internal traffic.

Add server IP information in response headers for debugging.
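Two of the practices above, header filtering and parameter pre-validation, can be sketched as plain functions. The header whitelist and the product ID format are assumptions. In the stack the article describes, this logic would run at the access layer rather than in application code.

```python
import re

# Hypothetical whitelist; the real set depends on what the backend actually reads.
FORWARDED_HEADERS = {"host", "user-agent", "accept-encoding", "x-forwarded-for"}
PRODUCT_ID_PATTERN = re.compile(r"[0-9]{1,12}")  # assumed ID format


def filter_headers(headers: dict) -> dict:
    """Keep only the headers the backend needs; drop cookies and everything else."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in FORWARDED_HEADERS
    }


def is_valid_product_id(raw: str) -> bool:
    """Reject malformed IDs before they cost a backend call."""
    return PRODUCT_ID_PATTERN.fullmatch(raw) is not None
```

Rejecting a request such as `id=abc` at this point means it never consumes cache lookups or backend capacity.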

Read services mainly serve KV data, so extensive caching is the core strategy. Moving the cache closer to users improves speed, and a well‑designed degradation plan keeps the system resilient under abnormal conditions. The stack relies heavily on Nginx + Lua + Redis to solve many read‑service challenges.
