
Cache Design and Optimization Practices for High‑Concurrency Music Library Service

The article details NetEase Cloud Music’s high‑concurrency cache architecture—using lazy‑load, hole‑wrapped objects for penetration protection, placeholder values for missing data, horizontal and vertical scaling with consistent hashing, and asynchronous binlog‑driven invalidation—to achieve sub‑millisecond reads for a read‑heavy, write‑light music library.

NetEase Cloud Music Tech Team

Cache is a technique used in system design to improve the access capability of underlying services. In NetEase Cloud Music, a typical cache call chain is employed, using lazy loading: first try the cache, return if found, otherwise fetch from the next layer and write back to the current cache.
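The lazy-load call chain can be sketched as follows. This is a minimal illustration, not NetEase's actual implementation: a `ConcurrentHashMap` stands in for Memcache, and `nextLayer` stands in for whatever sits beneath the current cache (a central cache or the DB).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal sketch of the lazy-load pattern: try the cache first,
// fall back to the next layer on a miss, then write the result back.
public class LazyLoadCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> nextLayer; // e.g. central cache or DB

    public LazyLoadCache(Function<String, String> nextLayer) {
        this.nextLayer = nextLayer;
    }

    public String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value; // cache hit
        }
        value = nextLayer.apply(key); // miss: fetch from the next layer
        if (value != null) {
            cache.put(key, value); // write back to the current cache
        }
        return value;
    }
}
```

The same shape repeats at every tier: local cache falls back to the central cache, which falls back to the DB.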

Key performance numbers: a simple DB operation costs 0.5–0.6 ms, a non‑local cache operation costs 0.5–0.6 ms, and a local cache operation costs 0.2–0.3 ms.

The music library read service handles the highest RPC QPS (over 500 k combined across two data centers). Various cache strategies and optimizations are applied to achieve better performance.

Characteristics of the music library data

Read‑heavy, write‑light

Read‑write separation possible

Data changes are not user-sensitive at the seconds level (a delay of a few seconds is acceptable)

Hot data is concentrated

Many List‑based queries, resulting in many MultiGet operations

These characteristics guide the cache design.

Practical Scenario 1: High‑Concurrency Protection

Two main problems arise: during hot song releases, expiring hot keys trigger massive cache misses and a sudden surge of DB queries; and pre-sale songs that are not yet in the database cause every request to miss both the cache and the DB.

Scenario 1 – Protecting Hot Data Retrieval

The cache service is deployed in two tiers: a distributed Memcache near the DB as a central cache, and a local Memcache on the read-service host for the hottest songs. To prevent a cache stampede during releases, a controlled "penetration" (穿刺) technique is used.

Each cached value is wrapped in an object that also stores its expiration time:

public static class HoleWrapper<T> implements Serializable {
    private long expire; // logical expiration time of the object
    private T target;    // the actual cached object
}

When retrieving a key, the system checks the expire field. If the object is near expiration, the current thread first extends the expiration and writes the wrapper back, so that concurrent readers keep hitting the cache, and only then penetrates to the next layer to fetch fresh data.

This "penetration" greatly reduces the probability of a cache stampede: while one thread refreshes from the DB, the other threads are still served the slightly stale value from the cache, and cache reads are far faster than DB reads.
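The expire check and refresh described above can be sketched like this. Class names and the single-threaded refresh are simplifications of the pattern, not the production code; in a real deployment the expiry extension would need to be atomic (e.g. a Memcache CAS) so that only one thread refreshes.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of the early-refresh ("penetration") check: a thread that sees an
// expired wrapper first pushes the expiry forward, so concurrent readers
// keep being served from cache while it fetches fresh data.
public class EarlyRefreshCache<T extends Serializable> {
    public static class HoleWrapper<T> implements Serializable {
        volatile long expire; // logical expiration time (millis)
        T target;             // the actual cached object
        HoleWrapper(long expire, T target) { this.expire = expire; this.target = target; }
    }

    private final Map<String, HoleWrapper<T>> cache = new ConcurrentHashMap<>();
    private final Function<String, T> nextLayer;
    private final long ttlMillis;

    public EarlyRefreshCache(Function<String, T> nextLayer, long ttlMillis) {
        this.nextLayer = nextLayer;
        this.ttlMillis = ttlMillis;
    }

    public T get(String key) {
        long now = System.currentTimeMillis();
        HoleWrapper<T> w = cache.get(key);
        if (w != null && w.expire > now) {
            return w.target; // still logically fresh: serve from cache
        }
        if (w != null) {
            w.expire = now + ttlMillis; // extend first, then penetrate
        }
        T fresh = nextLayer.apply(key); // fetch from the next layer
        cache.put(key, new HoleWrapper<>(now + ttlMillis, fresh));
        return fresh;
    }
}
```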

Scenario 2 – Protecting Requests for Non‑existent DB Data

To guard against cache penetration (防穿透), where a key is missing from both the cache and the DB, a special placeholder value is written to the cache with an appropriate TTL.

When data cannot be found in the cache and is also absent from the database, a special marker value can be written to the cache; its TTL can be set case by case (longer if placeholders are actively cleaned up, shorter otherwise).

This placeholder logic is encapsulated in the library’s cache code and applied at the fourth step of the relevant sequence diagram.
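The placeholder idea can be sketched as follows. The sentinel value, class names, and map-backed cache are hypothetical stand-ins; a real cache would also attach the TTL discussed above.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of placeholder ("negative") caching: when a key is absent from
// both cache and DB, store a sentinel so repeated misses stop reaching the DB.
public class NegativeCache {
    private static final String NOT_FOUND = "__NOT_FOUND__"; // hypothetical sentinel
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> db; // returns null when the row is missing

    public NegativeCache(Function<String, String> db) { this.db = db; }

    public Optional<String> get(String key) {
        String v = cache.get(key);
        if (v != null) {
            // A cached sentinel means "known missing": answer without touching the DB.
            return NOT_FOUND.equals(v) ? Optional.empty() : Optional.of(v);
        }
        v = db.apply(key);
        // In a real cache the sentinel would carry a TTL; here we just store it.
        cache.put(key, v == null ? NOT_FOUND : v);
        return Optional.ofNullable(v);
    }
}
```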

Practical Scenario 2: Cache Scaling

Scenario 1 – Horizontal Scaling (Scale‑Out) when capacity is sufficient but performance is insufficient

During peak periods, multiple cache clusters store the same data to increase QPS. Reads are random, writes are sequential, and a proxy ensures consistency across clusters.

Key concerns:

Ensuring data consistency across clusters.

Preventing a cold‑start surge on new clusters that would hit the DB.

The proxy performs random reads and ordered writes, allowing the new cluster to warm up with a small fraction of reads before it takes full traffic.
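The random-read, ordered-write policy can be sketched as below. The proxy class and map-backed clusters are hypothetical simplifications; warming up a new cluster would additionally weight its read probability up gradually rather than including it immediately.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the proxy policy across replicated cache clusters:
// writes go to every cluster in a fixed order (keeping them consistent),
// reads pick one cluster at random to spread the QPS.
public class ReplicatedCacheProxy {
    private final List<Map<String, String>> clusters;

    public ReplicatedCacheProxy(List<Map<String, String>> clusters) {
        this.clusters = clusters;
    }

    public void put(String key, String value) {
        for (Map<String, String> c : clusters) { // sequential, fixed order
            c.put(key, value);
        }
    }

    public String get(String key) {
        int i = ThreadLocalRandom.current().nextInt(clusters.size());
        return clusters.get(i).get(key); // a random cluster serves this read
    }
}
```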

Scenario 2 – Vertical Scaling (Scale‑Up) when capacity is insufficient

Adding more resources to a single cache cluster (Scale-Up) increases capacity. However, naive modulo hashing remaps almost every key when the node count changes, effectively invalidating the whole cache after scaling, which is high risk. Consistent hashing is used so that only a small fraction of keys move.
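A minimal consistent-hash ring with virtual nodes might look like this. The class, the node naming, and the use of `String.hashCode` are illustrative assumptions; production rings use a stronger hash function.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of a consistent-hash ring with virtual nodes: adding a node only
// remaps the keys that fall between it and its predecessor on the ring,
// instead of remapping almost everything as naive modulo hashing would.
public class ConsistentHashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    public ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    private int hash(String s) {
        // A stable non-negative hash; a real ring would use e.g. MurmurHash.
        return s.hashCode() & 0x7fffffff;
    }

    public void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node); // place virtual nodes on the ring
        }
    }

    public String nodeFor(String key) {
        int h = hash(key);
        SortedMap<Integer, String> tail = ring.tailMap(h);
        // Wrap around to the first entry when we pass the top of the ring.
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }
}
```

After adding a node, each key is either still owned by its old node or has moved to the new one, so only the new node starts cold.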

Practical Scenario 3: Cache Invalidation

Given the read‑heavy, write‑light nature and tolerance for second‑level delays, an asynchronous cache cleaning strategy is adopted. Cache invalidation is triggered by database binlog messages.

All cleaning can be asynchronous because of the tolerated delay.

Binlog messages drive the invalidation.

Each related key can be generated from a single binlog entry.

For multi‑level caches, the central cache listens to binlog events first, then forwards a message to another queue for local cache cleaning. A provided SDK plugin allows each read service instance to run a dedicated consumer for local cache cleanup.
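The binlog-driven cleanup can be sketched as follows. `BinlogEvent` and the one-key-per-row derivation rule are hypothetical simplifications of the idea that every related key can be generated from a single binlog entry.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of binlog-driven invalidation: each binlog row event yields the
// cache keys it affects, and the consumer deletes them from the cache.
public class BinlogInvalidator {
    public record BinlogEvent(String table, long primaryKey) {}

    private final Map<String, String> cache;

    public BinlogInvalidator(Map<String, String> cache) { this.cache = cache; }

    // Derive every related key from a single binlog entry (simplified rule).
    List<String> keysFor(BinlogEvent e) {
        return List.of(e.table() + ":" + e.primaryKey());
    }

    public void onEvent(BinlogEvent e) {
        for (String key : keysFor(e)) {
            cache.remove(key); // asynchronous cleanup; readers lazily reload
        }
    }
}
```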

When the data structure of a cached object changes, instead of actively deleting all old entries, a cache version is added to the key generator. Incrementing the version effectively makes old entries invisible, forcing a refresh from the DB. This trade‑off requires careful rollout to avoid a cache avalanche.
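A minimal sketch of such a versioned key generator (the class name and key format are hypothetical):

```java
// Sketch of version-stamped cache keys: bumping CACHE_VERSION makes every
// old entry invisible without deleting it, forcing a lazy refresh from the DB.
public class VersionedKeyGenerator {
    static final int CACHE_VERSION = 2; // bump when the cached object's schema changes

    public static String key(String entity, long id) {
        return "music:" + entity + ":v" + CACHE_VERSION + ":" + id;
    }
}
```

Old `v1` entries simply age out of the cache, which is why the version bump must be rolled out gradually to avoid a cold-cache avalanche.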

Summary

The article outlines the end‑to‑end cache practice for the music library service, covering lazy loading, cache penetration, horizontal and vertical scaling, and asynchronous invalidation. Different business scenarios can adopt the relevant techniques.

Future directions include separating metadata from state data, staticizing metadata, and further optimizing cache usage based on business change patterns.

Tags: distributed systems, caching, high concurrency, memcache, cache invalidation, scale-out, scale-up