Design and Performance Evaluation of the Ursa Distributed Block Storage System

Ursa, a master‑based distributed block storage system created by Meituan Cloud, overcomes Ceph, Sheepdog, and MooseFS limitations by delivering SSD‑level IOPS and near‑theoretical 10 GbE bandwidth with low CPU usage, supporting large virtual disks, snapshots, high availability, and efficient replica‑striped writes.

Meituan Technology Team

Mar 18, 2016

Cloud disks are an essential component of IaaS platforms; examples include Amazon EBS, Alibaba Cloud Pangu, and OpenStack Cinder. They provide high reliability, snapshot capability, VM migration support, and fast fault recovery. With the widespread adoption of 10 GbE, these advantages become even more pronounced.

The underlying storage is usually a distributed block storage system. Open‑source projects such as Ceph RBD, Sheepdog, MooseFS, and GlusterFS were evaluated, but each showed limitations: Ceph has high CPU usage, Sheepdog lost data under stress testing, MooseFS carries the overhead of POSIX semantics and is not fully open source, and GlusterFS is not commonly used for cloud disks. Moreover, none of these systems fully exploits 10 GbE and SSD performance.

To address these issues, Meituan Cloud developed a new distributed block storage system called Ursa. The name, inspired by the Dota hero Ursa, symbolizes high IOPS, throughput, and stability.

Related Projects

2.1 Ceph – Originated from Sage Weil’s PhD work (2004) and presented at OSDI 2006. Provides object, block, and file storage; block layer is relevant for IaaS. Performance test (Ceph 0.81, CentOS 6.x, fio) on a 4‑node cluster showed 16 407 read IOPS (CPU > 500 %), 941 write IOPS, read throughput 218 KB/s, write throughput 67 KB/s, read latency 1.6 ms, write latency 4.4 ms, and high CPU consumption.

2.2 Sheepdog – Distributed block storage from NTT Labs (2009). Uses a symmetric, no‑master architecture with consistent hashing. Service process acts as both data server and QEMU gateway. Reliability tests revealed inconsistent replicas after a week of stress testing.

2.3 MooseFS – Fault‑tolerant distributed file system with FUSE POSIX interface. Consists of Master, Metalogger, Chunk Server, and Client. Drawbacks include high Master load, FUSE overhead, and coarse‑grained snapshotting.

2.4 GFS/HDFS – GFS‑style master‑worker architecture; HDFS simplifies GFS but lacks features like concurrent append and native HA.

2.5 HLFS – Combines LFS and HDFS; suffers from random I/O, write latency, and garbage‑collection overhead.

2.6 iSCSI, FCoE, AoE, NBD – Network block device protocols that follow a client‑server model and cannot directly support distributed block storage.

Ursa Design Goals

Support large virtual disks (typically > 1 GB).

Enable random read/write, resize, and snapshot/clone.

High reliability and availability (tolerate two simultaneous server failures).

Leverage 10 GbE and SSD performance.

Optimize both throughput and IOPS.

Efficient resource usage to reduce cost.

3.1 System Architecture

Ursa adopts a master‑based architecture (Master, Chunk Server, Client). Persistent volume metadata is stored in MySQL; transient chunk and server metadata are kept in Redis. The Master also includes a Manager component.

3.2 CAP Trade‑off

Since a cloud disk is typically attached to a single VM, consistency requirements are low. Ursa therefore favors Availability and Partition‑tolerance (AP) over strong Consistency.

3.3 Concurrency Model

A hybrid model combines multi‑process, multi‑coroutine, and event‑driven mechanisms. Multi‑process provides isolation, coroutines handle I/O concurrency, and event‑driven networking maximizes throughput.
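The coroutine and event‑driven part of this model can be illustrated with a minimal sketch. It assumes Python's asyncio as a stand‑in for Ursa's own runtime, which the article does not describe; the names handle_request, serve, and run_worker are hypothetical. In a multi‑process deployment, each worker process would run its own event loop in the same way.

```python
import asyncio


async def handle_request(request_id: int, io_delay: float) -> str:
    # Stand-in for a network or disk operation. Awaiting hands control
    # back to the event loop so other coroutines can make progress.
    await asyncio.sleep(io_delay)
    return f"request {request_id} done"


async def serve(requests):
    # Start one coroutine per request and wait for all of them.
    # gather() returns results in the order the requests were given.
    tasks = [handle_request(rid, delay) for rid, delay in requests]
    return await asyncio.gather(*tasks)


def run_worker(requests):
    # One event loop per worker process (process isolation itself is
    # not shown here).
    return asyncio.run(serve(requests))


if __name__ == "__main__":
    print(run_worker([(1, 0.02), (2, 0.01), (3, 0.03)]))
```

Because the three simulated I/O waits overlap on one event loop, the worker finishes in roughly the time of the longest wait instead of the sum of all three.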

3.4 Storage Structure

Data is stored in chunks of configurable size (64 MB in this setup). Three chunks form a replica group, and two replica groups form a stripe so that reads and writes are interleaved across them. A client‑side cache accelerates access to hot data.
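The mapping from a virtual‑disk offset to a chunk can be sketched as follows. The 64 MB chunk size and the two replica groups per stripe come from the text above; the 64 KiB stripe unit, the round‑robin interleaving, and the names Location and locate are assumptions made for illustration and may not match Ursa's actual layout.

```python
from typing import NamedTuple

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunk, as stated in the article
GROUPS_PER_STRIPE = 2           # two replica groups per stripe, as stated
STRIPE_UNIT = 64 * 1024         # ASSUMPTION: interleave granularity


class Location(NamedTuple):
    stripe_index: int   # which stripe of the volume
    group_index: int    # which replica group inside that stripe
    chunk_offset: int   # byte offset inside that group's chunk


def locate(offset: int) -> Location:
    """Map a byte offset in the virtual disk to a chunk location."""
    # One stripe covers the combined capacity of its replica groups.
    stripe_bytes = CHUNK_SIZE * GROUPS_PER_STRIPE
    stripe_index, within_stripe = divmod(offset, stripe_bytes)
    # Stripe units alternate between the groups (round-robin).
    unit_index, unit_offset = divmod(within_stripe, STRIPE_UNIT)
    group_index = unit_index % GROUPS_PER_STRIPE
    chunk_offset = (unit_index // GROUPS_PER_STRIPE) * STRIPE_UNIT + unit_offset
    return Location(stripe_index, group_index, chunk_offset)


if __name__ == "__main__":
    for off in (0, STRIPE_UNIT, 2 * STRIPE_UNIT + 5, 2 * CHUNK_SIZE):
        print(off, locate(off))
```

Under these assumptions, consecutive stripe units land on different replica groups, so a large sequential request is served by both groups in parallel.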

3.5 Write Strategy

Ursa uses a forked write approach: the client writes to a primary replica (WRITE_REPLICATE), which then propagates the data to secondary replicas, balancing latency and throughput.
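A toy, in‑memory model of this write path is sketched below. Only the WRITE_REPLICATE name and the primary‑then‑secondaries flow come from the text; the Replica class, its plain write method, and the acknowledgement logic are hypothetical simplifications, with no networking or failure handling.

```python
class Replica:
    """In-memory stand-in for one chunk replica (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}        # offset -> data written at that offset
        self.secondaries = []   # filled in only on the primary

    def write(self, offset, data):
        # Plain local write, as a secondary would perform it.
        self.blocks[offset] = data
        return True

    def write_replicate(self, offset, data):
        # Models WRITE_REPLICATE: the primary applies the write locally,
        # then forwards it to every secondary and collects their acks.
        local_ok = self.write(offset, data)
        acks = [s.write(offset, data) for s in self.secondaries]
        # The client is acknowledged only if every replica succeeded.
        return local_ok and all(acks)


if __name__ == "__main__":
    primary = Replica("primary")
    primary.secondaries = [Replica("secondary-1"), Replica("secondary-2")]
    print(primary.write_replicate(4096, b"hello"))
    print([r.blocks for r in [primary] + primary.secondaries])
```

In this arrangement the client sends each write over the network once, and the fan‑out to the other replicas happens between servers.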

3.6 Stateless Service

Chunk Servers are stateless; each request is independent, allowing easy scaling and robustness.

3.7 Modules

The I/O stack follows the Decorator pattern: all modules implement the IStore interface, with the concrete component handling direct Chunk Server communication and decorators adding caching, compression, etc.
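The Decorator arrangement can be sketched as below. Only the IStore name comes from the text; its read and write methods, the InMemoryStore class (standing in for the concrete component that talks to Chunk Servers), and the CachingStore decorator are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class IStore(ABC):
    """Common interface shared by every module in the I/O stack."""

    @abstractmethod
    def read(self, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...


class InMemoryStore(IStore):
    """Concrete component; a real one would contact Chunk Servers."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.reads = 0  # counts reads that reach this layer

    def read(self, offset, length):
        self.reads += 1
        return bytes(self.buf[offset:offset + length])

    def write(self, offset, data):
        self.buf[offset:offset + len(data)] = data


class CachingStore(IStore):
    """Decorator: wraps any IStore and adds a simple read cache."""

    def __init__(self, inner: IStore):
        self.inner = inner
        self.cache = {}

    def read(self, offset, length):
        key = (offset, length)
        if key not in self.cache:
            self.cache[key] = self.inner.read(offset, length)
        return self.cache[key]

    def write(self, offset, data):
        self.inner.write(offset, data)
        self.cache.clear()  # crude invalidation, kept simple on purpose


if __name__ == "__main__":
    backend = InMemoryStore(16)
    store = CachingStore(backend)
    store.write(0, b"abcd")
    print(store.read(0, 4), store.read(0, 4), backend.reads)
```

Because the decorator implements the same interface as the component it wraps, further layers such as compression could be stacked in any order without changing the calling code.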

3.8 Product UI

Performance Evaluation

Testbed: 10 GbE network, 1 client, 3 Chunk Servers, with data on tmpfs (memory only). Ursa approaches the theoretical bandwidth of the NIC, achieves SSD‑level IOPS, and shows read latency close to the network ping time; write latency is about 68 % higher because each write goes to three replicas.

Compared with Ceph, Ursa’s server processes consume 43 % CPU for 61 340 read IOPS (vs. Ceph’s 123.7 % CPU for 4 101 IOPS). On the client side, Ursa uses 96 % CPU for 61 340 IOPS, while Ceph uses > 500 % CPU for 16 407 IOPS, indicating a 21‑fold efficiency gain.
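As a back‑of‑the‑envelope check, the efficiency comparison can be recomputed as IOPS per percentage point of CPU, using only the figures quoted above. The metric and the helper name iops_per_cpu_percent are choices made for this sketch. Ceph's client CPU is given only as "> 500 %", so the client‑side ratio computed from 500 % is a lower bound; the 21‑fold figure presumably reflects the actual measured value.

```python
def iops_per_cpu_percent(iops: float, cpu_percent: float) -> float:
    """IOPS delivered per percentage point of CPU consumed."""
    return iops / cpu_percent


# Client side (figures from the text; 500 is a lower bound for Ceph).
ursa_client = iops_per_cpu_percent(61340, 96)
ceph_client = iops_per_cpu_percent(16407, 500)
client_ratio = ursa_client / ceph_client   # roughly 19.5, a lower bound

# Server side (figures from the text).
ursa_server = iops_per_cpu_percent(61340, 43)
ceph_server = iops_per_cpu_percent(4101, 123.7)
server_ratio = ursa_server / ceph_server   # roughly 43

print(f"client-side ratio: at least {client_ratio:.1f}x")
print(f"server-side ratio: about {server_ratio:.1f}x")
```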

Conclusion and Outlook

Ursa was developed and deployed in nine months. Future work includes native SSD optimization and handling massive VM boot storms via overlay multicast distribution.
