
Adoption of Curve Block Storage for NetEase Cloud Music: Background, Challenges, and Benefits

Facing Ceph’s high latency, jitter, and upgrade constraints, NetEase Cloud Music adopted Curve block storage, which delivers over twice the IOPS, sub‑2‑second upgrade pauses, and stable, low‑latency I/O across 40 GB to 4 TB volumes, enabling the platform to meet its 99.99% availability SLA for billions of users while supporting future cloud‑native services.

NetEase Cloud Music Tech Team

NetEase Cloud Music is one of China’s leading online music platforms, offering a community‑centered service that includes the main music app as well as social entertainment products such as LOOK live, Sound Wave, and Music Street.

The cloud‑disk business of NetEase Cloud Music consists of several Java applications (main site, UGC, music library). The main site is the core service and must meet a strict SLA (annual availability ≥99.99%) for a user base in the billions.

Before 2019 the service relied on Ceph cloud disks. In large‑scale scenarios Ceph exhibited serious performance defects: high I/O latency, I/O jitter, and an inability to keep latency stable under failure conditions such as bad disks, OSD crashes, or network congestion. Extensive optimization efforts provided only marginal relief, prompting the team to evaluate the Curve distributed block‑storage system.

Curve block storage is well‑suited to mainstream cloud platforms. It offers high performance, easy operation, and stable, non‑jittery I/O. In production NetEase Cloud Music integrates Curve with OpenStack components: Cinder uses Curve as the backend for cloud‑disk storage, Nova uses Curve (via its Python SDK) to clone volumes for system disks, and Glance stores images on Curve. Libvirt/QEMU drives virtual machines directly from Curve volumes through a provided driver library, eliminating the need to mount the volume locally.
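Concretely, registering a storage system as a Cinder backend is done in `cinder.conf`. The fragment below is only a sketch of that wiring: the driver path and option names are illustrative assumptions, not the actual Curve driver’s settings, which should be taken from the Curve OpenStack integration documentation.

```ini
# Hypothetical cinder.conf fragment -- driver path and option names
# below are assumptions for illustration, not the real Curve driver's.
[DEFAULT]
enabled_backends = curve

[curve]
volume_backend_name = curve
volume_driver = cinder.volume.drivers.curve.CurveDriver
# Assumed option: addresses of the Curve MDS cluster the driver contacts
curve_mds_address = 10.0.0.1:6666,10.0.0.2:6666,10.0.0.3:6666
```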

Why Curve was chosen – business side: Ceph’s poor per‑volume performance (low IOPS, high latency) limited its use to system disks or log storage, and frequent I/O jitter triggered util‑100% alerts and service avalanches. On identical hardware, Curve eliminates the jitter, delivers more than twice Ceph’s performance, and maintains lower latency in the team’s comparative benchmarks.
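Per‑volume IOPS and latency comparisons of this kind are commonly measured with `fio` against the attached virtual disk. The job file below is a sketch of such a measurement; the device name and parameters are assumptions, not the team’s published methodology.

```ini
# fio job: 4K random-write IOPS/latency on a cloud disk.
# /dev/vdb is an assumed device name for the attached volume.
[global]
ioengine=libaio
direct=1
bs=4k
iodepth=128
runtime=300
time_based
group_reporting

[randwrite-clouddisk]
rw=randwrite
filename=/dev/vdb
numjobs=4
```

Running the same job against a Curve‑backed and a Ceph‑backed volume on identical hardware yields the kind of apples‑to‑apples comparison described above.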

Why Curve was chosen – operations side: Ceph upgrades required client restarts or live migrations, which are impractical for thousands of VMs. Curve supports hot upgrades without restarting QEMU processes, and its Raft‑based quorum protocol ensures that upgrades affect I/O for only a few seconds (latency <2 s). Curve also avoids the data‑distribution imbalance caused by Ceph’s CRUSH algorithm, reducing the need for costly data‑rebalancing operations.
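The reason a rolling upgrade pauses I/O only briefly falls out of Raft’s majority rule: a replica group keeps committing writes as long as a strict majority of its members is up. A minimal sketch (not Curve’s implementation; the 3‑way replication count is an assumption):

```python
# Minimal sketch of Raft quorum math -- not Curve's actual code.

def has_quorum(total_replicas: int, healthy: int) -> bool:
    """A Raft group can commit writes iff a strict majority
    of its replicas is reachable."""
    return healthy > total_replicas // 2

REPLICAS = 3  # assumed replicas per Curve copyset

# Rolling upgrade restarts one replica at a time: quorum holds,
# so client I/O continues (only a brief latency blip).
print(has_quorum(REPLICAS, REPLICAS - 1))  # True

# Taking two of three down at once would lose quorum and stall writes,
# which is why upgrades proceed one replica at a time.
print(has_quorum(REPLICAS, REPLICAS - 2))  # False
```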

After nearly three years of production use, Curve block storage has proven stable and performant. It supports system‑disk sizes of 40 GB or 60 GB, cloud‑disk capacities from 50 GB up to 4 TB (soft limit; PB‑scale volumes are possible), and meets all core‑business requirements without noticeable I/O jitter during failures or upgrades.

Future plans include: exploring cloud‑native middleware (Redis, Kafka, message queues) on Curve volumes; deploying a CurveBS + PolarFS + MySQL cloud‑native database stack; migrating remaining Ceph or local‑disk VMs to Curve; and delivering a shared file‑system service (ReadWriteMany PVCs) built on CurveFS, which can store data on Curve block storage or S3‑compatible object stores.
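For the planned shared file‑system service, a consumer would request a ReadWriteMany volume through an ordinary Kubernetes PVC. The manifest below is a hypothetical sketch: the `storageClassName` is an assumed name that a CurveFS CSI driver would provide, not a documented value.

```yaml
# Hypothetical PVC for a CurveFS-backed shared file system.
# storageClassName is an assumption (would come from a CurveFS CSI driver).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany      # multiple pods mount the same CurveFS volume
  resources:
    requests:
      storage: 100Gi
  storageClassName: curvefs-sc
```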

References: Curve project on GitHub (https://github.com/opencurve/curve) and the OpenCurve WeChat community (search for OpenCurve_bot).

Tags: performance optimization, operations, cloud storage, Ceph, OpenStack, block storage, Curve