Designing High‑Performance Read Services: Principles, Caching, and Concurrency
This article shares practical design principles for building scalable read services, covering stateless architecture, data closed‑loop processing, multi‑layer caching strategies, concurrency optimization, degradation switches, rate limiting, traffic switching, and other operational best practices.
The author has been developing read services at JD for over a year, handling workloads from hundreds of millions to tens of billions of requests, and continuously experimenting with architecture and code to achieve a satisfactory read‑service design.
Some Design Principles
Stateless
Data closed loop
Cache silver bullet
Concurrency
Degrade switch
Rate limiting
Traffic cutting
Others
Stateless
Stateless applications can scale horizontally. Configuration, however, may still be stateful in practice, for example specifying a different data source for each data center.
Data Closed Loop
When many data sources are involved, close the data loop: first replicate the upstream data (for example via MQ) into a storage engine suited to the read path (Redis or a persistent KV store), then optionally aggregate it so the frontend can retrieve everything in one or a few calls. The read service then keeps working even if a dependent system fails; updates may back up, but frontend display is not affected.
For requests needing multiple pieces of data, a hash‑tag mechanism can co‑locate related keys in the same Redis instance (e.g., product ID as the hash tag).
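As a rough illustration of why a hash tag co-locates keys, the sketch below assumes Redis Cluster's slot rule (CRC16 of the key, or of the text inside the first non-empty `{...}`, modulo 16384). The key names are hypothetical, and a proxy-based sharding setup may apply a different rule.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def hash_tag(key: str) -> str:
    """Return the part of the key that is hashed under the hash-tag rule."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to a slot: CRC16 (XMODEM variant) modulo the slot count."""
    crc = binascii.crc_hqx(hash_tag(key).encode("utf-8"), 0)
    return crc % SLOT_COUNT


# Hypothetical keys for one product: basic info, description, specification.
keys = ["{1001}:basic", "{1001}:desc", "{1001}:spec"]
slots = {key_slot(k) for k in keys}  # a single slot, so a single instance
```

Because all three keys hash only the tag `1001`, they fall into one slot and can be read together from the same node.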
Cache Silver Bullet
Cache is essential for read services.
Browser Cache
Control expiration via response headers (Expires, Cache‑Control). Suitable for data not time‑critical such as product detail frames, merchant scores, ads; not suitable for price or inventory which require real‑time freshness.
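A minimal sketch of how such headers might be produced is shown below. The function name and the choice of `no-store` for real-time data are assumptions made for illustration, not the author's implementation.

```python
from email.utils import formatdate


def cache_headers(max_age_seconds: int, now: float) -> dict:
    """Build Cache-Control and Expires headers for a response.

    `now` is a Unix timestamp; passing it in keeps the function testable.
    """
    if max_age_seconds <= 0:
        # Real-time data such as price or inventory: ask browsers not to store it.
        return {"Cache-Control": "no-store"}
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # Expires is an absolute HTTP date expressed in GMT.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

For example, `cache_headers(300, time.time())` would let a browser reuse a merchant score for five minutes, while `cache_headers(0, time.time())` suits a price response.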
CDN Cache
Push pages, activity pages, images to the nearest CDN node. Two mechanisms: push (active update) and pull (on‑demand fetch). Design URLs without random numbers to avoid cache bypass; crawlers can receive stale data to reduce origin hits.
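To illustrate the point about random numbers in URLs, the following sketch normalizes a URL into a stable cache key. The set of cache-busting parameter names is hypothetical and would need to match whatever the real frontend appends.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of cache-busting parameters; adjust to the real ones.
RANDOM_PARAMS = {"_", "r", "rand", "t"}


def canonical_url(url: str) -> str:
    """Drop random parameters and sort the rest so equal pages share one URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in RANDOM_PARAMS
    ]
    kept.sort()  # a stable parameter order avoids duplicate cache entries
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

With this rule, `http://example.com/item?skuId=1001&_=1690000000` and `http://example.com/item?skuId=1001` map to the same cached object.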
Access Layer Cache
For services without CDN, use Nginx as an access layer with:
URL rewriting to remove random elements.
Consistent hashing based on parameters (e.g., category or product ID) to ensure the same data lands on the same server.
proxy_cache for memory/SSD caching.
proxy_cache_lock to merge multiple origin requests into one.
lua_shared_dict (when using Nginx + Lua) as a cache whose contents survive an Nginx reload.
Avoid caching fallback or erroneous data.
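The idea of merging concurrent origin requests can also be shown at the application level. The sketch below is an analogue of that behaviour, not how Nginx implements proxy_cache_lock; the class and method names are hypothetical.

```python
import threading


class SingleFlight:
    """Let one caller per key load from the origin; concurrent callers share its result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done event, result holder)

    def do(self, key, load):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, holder = entry
        if leader:
            try:
                holder["value"] = load()  # the only origin call for this key
            except Exception as exc:
                holder["error"] = exc  # not stored anywhere beyond this flight
            finally:
                with self._lock:
                    del self._inflight[key]
                done.set()
        else:
            done.wait()  # followers block until the leader has finished
        if "error" in holder:
            raise holder["error"]
        return holder["value"]
```

A failed load is re-raised to every waiting caller and nothing is retained afterwards, which is consistent with not caching erroneous data.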
Application Layer Cache
In Tomcat, use in‑heap or off‑heap cache; consider local Redis cache to survive restarts and mitigate traffic storms.
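As a language-neutral illustration of an in-process cache, here is a small sketch in Python rather than Java. It combines a time-to-live with least-recently-used eviction; the class name and parameters are hypothetical, and it is not thread-safe as written.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    """In-process cache with a per-entry TTL and LRU eviction."""

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data = OrderedDict()  # key -> (expires_at, value), least recent first

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used entry
```

Bounding both the entry count and the entry age keeps heap usage predictable, which matters when the cache shares a process with the application.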
Distributed Cache
Prefer a local Redis cluster with master‑slave sync for moderate data volumes; if data is too large, shard using consistent hashing or adopt a full distributed cache solution.
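Sharding by consistent hashing can be sketched as a hash ring with virtual nodes. The node names, the number of virtual nodes, and the use of SHA-256 are assumptions chosen for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: each node owns many points on a circle of hash values."""

    def __init__(self, nodes, replicas=100):
        ring = []
        for node in nodes:
            for i in range(replicas):
                # Virtual nodes spread each server around the ring.
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(text):
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

When a node is removed, only the keys it owned move to other nodes; keys on the remaining nodes keep their placement, which limits cache misses during resharding.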
Concurrency
Fetching independent data in parallel can cut latency substantially. For example, if fetching five data items sequentially takes 60 ms, fetching them concurrently may reduce that to 30 ms, and pre-fetching data that a later call depends on can lower it further to 25 ms.
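The arithmetic behind these figures can be reproduced with a small model. The per-item latencies and the dependency graph below are hypothetical values chosen only so that the totals match 60 ms, 30 ms, and 25 ms; the source does not give the breakdown.

```python
from functools import lru_cache


def sequential_ms(latency):
    """Total time when every fetch waits for the previous one."""
    return sum(latency.values())


def concurrent_ms(latency, depends_on):
    """Total time when each fetch starts as soon as its dependencies finish."""

    @lru_cache(maxsize=None)
    def finish(name):
        start = max((finish(dep) for dep in depends_on.get(name, ())), default=0)
        return start + latency[name]

    return max(finish(name) for name in latency)


# Hypothetical latencies in ms; E needs the result of C.
latency = {"A": 10, "B": 15, "C": 10, "D": 5, "E": 20}
depends_on = {"E": ["C"]}

# Assume 5 ms of E's 20 ms is an internal lookup of F. Pre-fetching F
# alongside A to D shortens E to 15 ms.
prefetch_latency = {"A": 10, "B": 15, "C": 10, "D": 5, "E": 15, "F": 5}
prefetch_depends_on = {"E": ["C", "F"]}

totals = (
    sequential_ms(latency),                                # 60
    concurrent_ms(latency, depends_on),                    # 30
    concurrent_ms(prefetch_latency, prefetch_depends_on),  # 25
)
```

In this model the concurrent total is the length of the longest dependency chain (C then E), so shortening that chain is what pre-fetching achieves.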
Degrade Switch
Key ideas for degradation switches:
Centralized management: push switches to all applications.
Multi‑level read fallback: local cache → distributed cache → default degraded data.
Place switches at the access layer (e.g., Nginx) to prevent requests from reaching backend services.
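The multi-level read fallback might look like the sketch below. The switch name `degrade_distributed_cache` and the function signature are hypothetical; in a real system the switch values would be pushed from the central management service.

```python
def read_with_fallback(key, local_cache, distributed_get, default_value, switches):
    """Read local cache first, then the distributed cache, then degraded default data.

    Returns (value, source) so callers and tests can see which level answered.
    """
    if key in local_cache:
        return local_cache[key], "local"

    # The switch lets operators skip the distributed cache entirely.
    if not switches.get("degrade_distributed_cache", False):
        try:
            value = distributed_get(key)
            if value is not None:
                local_cache[key] = value  # back-fill the faster level
                return value, "distributed"
        except Exception:
            pass  # treat any failure as a miss and keep degrading

    # Degraded default data is returned but deliberately not cached.
    return default_value, "default"
```

Leaving the default value out of the local cache means normal data is served again as soon as the distributed cache recovers or the switch is turned off.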
Rate Limiting
Prevent malicious traffic and attacks:
Let cache handle most traffic.
Use Nginx limit module for traffic that reaches the backend.
Block malicious IPs with Nginx deny rules.
In most cases the access layer itself is not throttled; the aim is to limit the traffic that penetrates to the more fragile backend application layer.
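Throttling inside an application is often done with a token bucket. The sketch below is one such limiter under assumed parameters; it is not the algorithm of the Nginx limit module, and the class name is hypothetical.

```python
import time


class TokenBucket:
    """Allow `rate_per_second` requests on average, with bursts up to `burst`."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self._rate = rate_per_second
        self._capacity = burst
        self._clock = clock  # injectable so behaviour can be tested without sleeping
        self._tokens = float(burst)
        self._last = clock()

    def allow(self):
        now = self._clock()
        # Refill according to elapsed time, never beyond the bucket capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # caller should reject or degrade the request
```

A rejected request can be answered with degraded data instead of an error, which ties rate limiting back to the degradation plan.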
Traffic Cutting
For large applications, switch traffic when a data center, rack, or server fails using:
DNS to change entry points.
LVS/HAProxy to switch faulty Nginx instances.
Nginx to switch faulty application instances.
Sometimes traffic can be switched directly at the Nginx access layer without LVS/HAProxy.
Other Practices
Use stateless domains without cookies (e.g., 3.cn).
Filter request headers at the access layer, forwarding only useful ones.
Pre‑validate request parameters at the access layer.
Set reasonable internal connection, read, and write timeouts.
Enable gzip compression to reduce traffic.
Use Unix domain sockets to reduce the number of local TCP connections.
Use HTTP keep‑alive for internal traffic.
Add server IP information in response headers for debugging.
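Two of these practices, header filtering and parameter pre-validation, are easy to sketch. The allow-list and the product ID pattern below are hypothetical; in the stack described here this logic would more likely live in Nginx + Lua than in Python.

```python
import re

# Hypothetical allow-list: only these headers are forwarded to the backend.
ALLOWED_HEADERS = {"host", "user-agent", "accept-encoding", "x-forwarded-for"}

# Hypothetical rule: a product ID is 1 to 18 ASCII digits.
SKU_ID = re.compile(r"[0-9]{1,18}")


def filter_headers(headers):
    """Keep only headers the backend actually uses (names are case-insensitive)."""
    return {name: value for name, value in headers.items() if name.lower() in ALLOWED_HEADERS}


def valid_sku_id(raw):
    """Reject malformed IDs before they reach the cache or the backend."""
    return isinstance(raw, str) and SKU_ID.fullmatch(raw) is not None
```

Rejecting a malformed ID early avoids wasting a cache lookup and an origin request on input that can never match a product.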
The read services mainly handle KV data, so massive caching is the core strategy. Bringing cache closer to users improves speed, and a well‑designed degradation plan ensures the system remains resilient under abnormal conditions. The stack heavily relies on Nginx + Lua + Redis to solve many read‑service challenges.
21CTO
