
How USnap Extends CDP for Low‑Cost Snapshots on SSD and RSSD Cloud Disks

This article details how UCloud's USnap service extends the UDataArk CDP technology to provide low-cost, second-level snapshots for ordinary, SSD, and RSSD cloud disks, covering the client-side I/O capture, the Stream-based redesign of the front real-time I/O layer, the Arker capacity storage architecture, and the rollback mechanism.


Background

UCloud launched the UDataArk product in 2015, offering continuous data protection (CDP) with second‑level recovery for cloud host disks. It has been applied in many data‑security cases and earned wide customer recognition.

With the growing demand for high‑performance storage, SSD and RSSD cloud disks became mainstream, yet UDataArk only supported local and ordinary disks. To address the lack of efficient backup for SSD/RSSD disks, UCloud introduced the USnap snapshot service, which builds on UDataArk’s CDP technology and provides low‑cost backup for all disk types.

Key Challenges

USnap needed to (1) integrate high‑performance SSD/RSSD devices and (2) reduce the implementation cost of continuous data protection. This required architectural improvements in UDataArk and a redesign of all I/O path modules.

Client‑Side I/O Capture

The Ark plug‑in is integrated into the UDisk client. It asynchronously captures write I/O from the client and pushes it to the backup storage cluster.

For SSD UDisk, the Bdev thread creates an ArkIORequest containing a smart pointer to the data, enqueues it to a lock‑free queue, and the ArkHandle thread forwards it to an ArkIO thread for transmission. The UDisk I/O returns success without waiting for the backup I/O; data is released only after both complete.
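
The following minimal sketch illustrates this handoff under stated assumptions: the names `ArkIORequest`, `ArkQueue`, and `on_udisk_write` are illustrative, not UCloud's actual code, and a mutex-guarded queue stands in for the real lock-free queue. The key idea is shared ownership of the payload, so the buffer is freed only after both the UDisk write and the backup transmission release it.

```cpp
// Sketch of the SSD-path handoff: the Bdev thread wraps the write payload in an
// ArkIORequest that shares ownership of the data, queues it for the ArkHandle/ArkIO
// threads, and completes the guest I/O without waiting for the backup.
#include <condition_variable>
#include <cstdint>
#include <memory>
#include <mutex>
#include <queue>
#include <vector>

struct ArkIORequest {
    uint64_t disk_id;
    uint64_t offset;
    std::shared_ptr<const std::vector<uint8_t>> data;  // shared with the UDisk write path
};

class ArkQueue {                       // stand-in for the lock-free queue
public:
    void push(ArkIORequest req) {
        { std::lock_guard<std::mutex> lk(mu_); q_.push(std::move(req)); }
        cv_.notify_one();
    }
    ArkIORequest pop() {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        ArkIORequest req = std::move(q_.front());
        q_.pop();
        return req;
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<ArkIORequest> q_;
};

// Bdev thread: capture the write and return to the guest immediately.
void on_udisk_write(ArkQueue& backup_queue, uint64_t disk_id, uint64_t offset,
                    std::shared_ptr<const std::vector<uint8_t>> payload) {
    backup_queue.push(ArkIORequest{disk_id, offset, payload});
    // ...issue the normal UDisk write with `payload` and complete the guest I/O;
    // the buffer lives until both this write and the backup I/O drop their references.
}
```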

For RSSD UDisk, which uses the SPDK Vhost scheme, a copy thread retrieves the bdev_io from the lock‑free queue, copies the data, then builds an ArkIORequest for the ArkIO thread. The data is released by the ArkHandle after the backup I/O finishes.
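
A brief sketch of that copy step, again under assumptions: because the guest-owned bdev_io buffers cannot outlive the vhost I/O, the copy thread duplicates the scatter-gather payload into a fresh buffer before handing it to the ArkIO thread. The function name is hypothetical and only mirrors SPDK's iovec layout.

```cpp
// Copy the bdev_io payload into a buffer the backup path can own independently.
#include <cstdint>
#include <cstring>
#include <memory>
#include <sys/uio.h>   // struct iovec
#include <vector>

std::shared_ptr<const std::vector<uint8_t>> copy_payload(const iovec* iov, int iovcnt) {
    size_t total = 0;
    for (int i = 0; i < iovcnt; ++i) total += iov[i].iov_len;
    auto buf = std::make_shared<std::vector<uint8_t>>(total);
    size_t off = 0;
    for (int i = 0; i < iovcnt; ++i) {
        std::memcpy(buf->data() + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    return buf;  // wrapped in an ArkIORequest, as in the SSD path above
}
```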

Performance tests show that in low I/O depth scenarios the backup functionality adds at most a 5% overhead, while in high I/O depth scenarios the impact is negligible.

Front Real‑Time I/O Layer

The front layer uses a small amount of NVMe storage to ingest massive real‑time I/O, which is later drained to a capacity layer built on HDDs. The original log‑structured design suffered from single‑node bottlenecks and hotspot issues.

USnap replaces it with a Stream‑based design. Logical disk writes are abstracted as a continuous data stream, split into fixed‑size shards. Each shard is mapped via consistent hashing to a placement group (a replica set). This distributes a single logical disk’s I/O across the entire front cluster, eliminating bottlenecks and hotspots.
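
A minimal sketch of that placement, assuming a shard size of 1 MiB and a virtual-node ring (both are assumptions; the article only states fixed-size shards and consistent hashing): each shard key, derived from the disk and shard index, lands on the nearest placement group clockwise on the ring.

```cpp
// Map a (disk, offset) write to a placement group via a consistent-hash ring.
#include <cstdint>
#include <functional>
#include <map>
#include <string>

constexpr uint64_t kShardSize = 1 << 20;  // assumed shard size

class PlacementRing {
public:
    void add_pg(uint32_t pg_id, int vnodes = 64) {
        for (int v = 0; v < vnodes; ++v)
            ring_[std::hash<std::string>{}(std::to_string(pg_id) + "#" + std::to_string(v))] = pg_id;
    }
    uint32_t locate(uint64_t disk_id, uint64_t offset) const {
        uint64_t shard_index = offset / kShardSize;
        uint64_t key = std::hash<std::string>{}(std::to_string(disk_id) + ":" + std::to_string(shard_index));
        auto it = ring_.lower_bound(key);            // first vnode clockwise from the key
        if (it == ring_.end()) it = ring_.begin();   // wrap around the ring
        return it->second;                           // placement group (replica set) id
    }
private:
    std::map<uint64_t, uint32_t> ring_;  // vnode hash -> placement group id
};
```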

The Shuffle module periodically pulls data from the front, shards it in memory, forms journal entries, and pushes them to the Arker capacity storage layer.
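
A rough sketch of what a Shuffle output record might look like; the field names are assumptions, but they capture the second-level timestamp and per-shard addressing that Arker needs to merge journal data in time order.

```cpp
// Illustrative journal entry emitted by the Shuffle stage toward Arker.
#include <cstdint>
#include <vector>

struct JournalEntry {
    uint64_t disk_id;
    uint64_t shard_index;    // which shard of the logical disk this write falls in
    uint64_t offset_in_shard;
    uint64_t timestamp_sec;  // second-level granularity for CDP recovery points
    std::vector<uint8_t> data;
};
```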

Arker Capacity Storage Layer

Data is organized into five granularity types (Granu): journal (second‑level), hour, day, base, and snapshot. Each type is stored using one of three blob formats: BASE Blob, CUT Blob, or JOURNAL Blob. Base and snapshot use BASE Blob; hour and day use CUT Blob; journal uses JOURNAL Blob.

For journal, hour, and day granules, each data shard maps to a unique inode that references a JOURNAL or CUT Blob. For base and snapshot granules, data is further deduplicated into TinyShards; identical TinyShards share the same inode and BASE Blob, achieving storage savings.
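
The deduplication step can be sketched as a content-addressed index: TinyShards with the same fingerprint share a single inode, and therefore a single BASE Blob. The fingerprint and inode types below are illustrative; a production system would use a strong content hash rather than `std::hash`.

```cpp
// Intern a TinyShard fingerprint: identical content maps to one inode (one BASE Blob).
#include <cstdint>
#include <string>
#include <unordered_map>

using Fingerprint = std::string;   // e.g. a SHA digest of the TinyShard content
using InodeId = uint64_t;

class TinyShardIndex {
public:
    // Returns the inode for this TinyShard, allocating one only if the content is new.
    InodeId intern(const Fingerprint& fp) {
        auto it = by_content_.find(fp);
        if (it != by_content_.end()) return it->second;   // dedup hit: reuse the inode
        InodeId inode = next_inode_++;
        by_content_.emplace(fp, inode);
        return inode;                                      // caller writes one BASE Blob
    }
private:
    std::unordered_map<Fingerprint, InodeId> by_content_;
    InodeId next_inode_ = 1;
};
```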

Metadata (Granu, Shard/TinyShard, Inode) is stored as key‑value pairs in a KVDevice, while compressed blob data resides in an FSDevice using the ZSTD algorithm, which reduces storage cost by at least 30% compared with the previous Snappy compression.
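
For reference, compressing a blob with ZSTD before it lands in the FSDevice looks roughly like the snippet below; the compression level is an assumption (the article does not state one), and the code links against libzstd.

```cpp
// Compress a raw blob with ZSTD (the algorithm that replaced Snappy per the article).
#include <cstdint>
#include <stdexcept>
#include <vector>
#include <zstd.h>

std::vector<uint8_t> compress_blob(const std::vector<uint8_t>& raw) {
    std::vector<uint8_t> out(ZSTD_compressBound(raw.size()));
    size_t n = ZSTD_compress(out.data(), out.size(), raw.data(), raw.size(), /*level=*/3);
    if (ZSTD_isError(n)) throw std::runtime_error(ZSTD_getErrorName(n));
    out.resize(n);
    return out;
}
```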

Rollback Process

The Chrono scheduler controls rollback. When a user specifies a rollback timestamp, Chrono checks Granu metadata to locate the target data, which may reside either in the front layer or already in Arker.

If the data is still in the front layer, Chrono instructs Shuffle to pull it into Arker. Once the data is in Arker, Chrono gathers all relevant Granus, assigns sequence numbers, and dispatches merge tasks to all Arker nodes. Each node first merges indexes, then merges the corresponding data blobs, producing a new BASE version that represents the restored full dataset. Finally, the restored data is written back to the UDisk cluster.
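
A sketch of the per-node index merge, with illustrative structures: the Granus selected by Chrono are applied in ascending sequence number, so later writes to the same extent override earlier ones, and the merged index then drives a single pass over the data blobs to produce the new BASE version.

```cpp
// Merge Granu indexes in sequence order; the newest write to each shard offset wins.
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

struct Extent { uint64_t inode; uint64_t blob_offset; };   // where the bytes live
struct GranuIndex {
    uint64_t sequence;                                      // assigned by Chrono
    std::map<uint64_t, Extent> extents;                     // shard offset -> extent
};

std::map<uint64_t, Extent> merge_indexes(std::vector<GranuIndex> granus) {
    std::sort(granus.begin(), granus.end(),
              [](const GranuIndex& a, const GranuIndex& b) { return a.sequence < b.sequence; });
    std::map<uint64_t, Extent> merged;
    for (const auto& g : granus)
        for (const auto& [off, ext] : g.extents)
            merged[off] = ext;                              // newer sequence wins
    return merged;                                          // then read blobs and emit the BASE
}
```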

This merge operates at the shard level and runs concurrently across all capacity‑layer disks, while the write‑back to UDisk also uses parallel shard copying, enabling typical 1 TB restores within 30 minutes.

Conclusion

The article explains how a public‑cloud CDP backup system is built, how it integrates high‑performance I/O devices, and how it reduces implementation cost. It outlines USnap’s architectural choices, storage engine optimizations, and future enhancements such as customizable backup windows, erasure coding, and Copy‑On‑Read acceleration.

Tags: CDP, Cloud Backup, UCloud, High-performance storage, Disk snapshot
Written by UCloud Tech

UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
