
How UFS’s Performance‑Optimized Architecture Cuts I/O Latency to Sub‑10ms

This article explains how UCloud's performance‑oriented UFS leverages NVMe SSDs, upgrades to NFSv4, redesigns business indexing, and introduces a novel append‑only storage engine with stream/extent architecture to achieve sub‑10 ms latency and high IOPS for demanding AI and analytics workloads.

UCloud Tech

Sep 2, 2019


Protocol Improvements

Earlier capacity‑type UFS used NFSv3, a stateless protocol that is widely supported but incurs high latency in I/O‑intensive scenarios. The performance‑type UFS switches to NFSv4, a stateful protocol with built‑in lock semantics and a COMPOUND procedure that bundles multiple NFS operations into a single round trip, halving the number of client‑server interactions and reducing latency.
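
To make the round-trip argument concrete, the sketch below counts simulated network round trips for opening a path and reading from it, once with one RPC per operation (NFSv3 style) and once with all operations batched into a single COMPOUND request (NFSv4 style). It is a simplified, uncached model: `Transport`, `read_v3_style` and `read_v4_style` are hypothetical names, and a real NFSv4 client would also issue OPEN and carry a stateid, which is omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transport:
    """Hypothetical transport that only counts simulated round trips."""
    round_trips: int = 0

    def send(self, ops: List[str]) -> List[str]:
        # Every call models one request/response exchange with the server.
        self.round_trips += 1
        return [f"ok:{op}" for op in ops]


def read_v3_style(t: Transport, path: List[str]) -> List[str]:
    # NFSv3 style: each operation is its own RPC, so each costs a round trip.
    results: List[str] = []
    for name in path:
        results += t.send([f"LOOKUP {name}"])
    results += t.send(["ACCESS"])
    results += t.send(["READ"])
    return results


def read_v4_style(t: Transport, path: List[str]) -> List[str]:
    # NFSv4 style: the same steps packed into one COMPOUND request.
    ops = ["PUTROOTFH"] + [f"LOOKUP {name}" for name in path] + ["ACCESS", "READ"]
    return t.send(ops)
```

For a two-component path such as `["data", "train.bin"]`, the per-operation version needs four round trips while the COMPOUND version needs one. Real savings are smaller because clients cache lookups and attributes, which is consistent with the article's more modest "halving" figure.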

Business Index

The index service is a core component of the distributed file system. It consists of two parts: the directory index (the tree‑structured namespace hierarchy) and the file index (metadata, block locations, and permissions). The design must ensure external consistency under concurrent access and provide high availability and elasticity.
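
The two-part split can be pictured as two maps: a directory index keyed by (parent inode, name) that encodes the tree, and a file index keyed by inode that holds per-file metadata. The sketch below is a single-process illustration under that assumption; `IndexService`, `FileMeta` and the `(stream_id, offset)` block-location format are hypothetical, and it does none of the concurrency control, replication or elasticity the real service needs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FileMeta:
    size: int
    mode: int  # POSIX-style permission bits
    # Hypothetical block locations as (stream_id, offset) pairs.
    blocks: List[Tuple[int, int]] = field(default_factory=list)


class IndexService:
    ROOT = 1

    def __init__(self) -> None:
        # Directory index: (parent inode, entry name) -> child inode.
        self.dir_index: Dict[Tuple[int, str], int] = {}
        # File index: inode -> metadata, block locations, permissions.
        self.file_index: Dict[int, FileMeta] = {self.ROOT: FileMeta(size=0, mode=0o755)}
        self.next_ino = 2

    def create(self, parent: int, name: str, mode: int) -> int:
        if parent not in self.file_index:
            raise FileNotFoundError(parent)
        key = (parent, name)
        if key in self.dir_index:
            raise FileExistsError(name)
        ino = self.next_ino
        self.next_ino += 1
        self.dir_index[key] = ino
        self.file_index[ino] = FileMeta(size=0, mode=mode)
        return ino

    def resolve(self, path: str) -> Optional[int]:
        # Walk the directory index one path component at a time.
        ino = self.ROOT
        for part in path.strip("/").split("/"):
            if not part:
                continue
            child = self.dir_index.get((ino, part))
            if child is None:
                return None
            ino = child
        return ino
```

Path resolution only touches the directory index, while `stat`-like queries only touch the file index, which is one reason the two halves are often scaled and partitioned separately.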

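FileIdx adopts a stateless design, with a lease mechanism between index nodes and the master for node management and fault tolerance. Because the index nodes keep no durable state, the master can reassign the work of a node whose lease expires to another node.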
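
A lease is a time-bounded grant: the master records an expiry time for each node, the node renews it with heartbeats, and a node stops serving as soon as its own lease lapses. The sketch below shows that generic pattern with an injectable clock; `LeaseMaster`, `IndexNode` and `FakeClock` are hypothetical names, and real deployments also subtract a safety margin on the node side to tolerate clock drift.

```python
from typing import Callable, Dict, List


class FakeClock:
    """Manually advanced clock so lease expiry is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class LeaseMaster:
    def __init__(self, ttl: float, clock: Callable[[], float]) -> None:
        self.ttl = ttl
        self.clock = clock
        self.leases: Dict[str, float] = {}  # node id -> lease expiry time

    def grant_or_renew(self, node: str) -> float:
        expiry = self.clock() + self.ttl
        self.leases[node] = expiry
        return expiry

    def live_nodes(self) -> List[str]:
        now = self.clock()
        return sorted(n for n, exp in self.leases.items() if exp > now)

    def reap_expired(self) -> List[str]:
        # Nodes returned here can have their work reassigned to live nodes.
        now = self.clock()
        dead = sorted(n for n, exp in self.leases.items() if exp <= now)
        for n in dead:
            del self.leases[n]
        return dead


class IndexNode:
    def __init__(self, name: str, master: LeaseMaster, clock: Callable[[], float]) -> None:
        self.name = name
        self.master = master
        self.clock = clock
        self.expiry = float("-inf")

    def heartbeat(self) -> None:
        self.expiry = self.master.grant_or_renew(self.name)

    def can_serve(self) -> bool:
        # A node must refuse requests once its lease has lapsed.
        return self.clock() < self.expiry
```

With a 10-second TTL, a node that last renewed at t=0 is considered dead at t=12, while one that renewed at t=5 is still live; since index nodes hold no durable state, reassigning the dead node's work requires no data recovery.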

Underlying Storage

The storage layer, named Nebula, follows an append‑only, immutable design. It uses an extent‑based model where each stream consists of one or more extents, and each extent is composed of fixed‑size segments (default 128 MB). A centralized index stores segment metadata, enabling pure‑computational block addressing without lookup queries.

Data blocks are addressed by calculating their offset within a stream; the extentsvr can directly map this offset to a physical location on disk, eliminating indexing latency.
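
Address computation of this kind reduces to integer division when every unit has a fixed size. The sketch below maps a stream offset to (extent, segment, byte offset) and then to a disk offset. It assumes, hypothetically, that every extent holds the same number of segments (`SEGMENTS_PER_EXTENT`) and that extentsvr stores an extent's segments contiguously from a base disk offset; only the 128 MB default segment size comes from the text, taken here as 128 MiB.

```python
from typing import NamedTuple

SEGMENT_SIZE = 128 * 1024 * 1024  # 128 MB default segment size (taken as MiB)
SEGMENTS_PER_EXTENT = 8           # hypothetical: fixed segments per extent
EXTENT_SIZE = SEGMENT_SIZE * SEGMENTS_PER_EXTENT


class Location(NamedTuple):
    extent_index: int    # which extent of the stream
    segment_index: int   # which segment inside that extent
    segment_offset: int  # byte offset inside that segment


def locate(stream_offset: int) -> Location:
    """Pure arithmetic: no index lookup is needed for a given stream offset."""
    if stream_offset < 0:
        raise ValueError("stream offset must be non-negative")
    extent_index, in_extent = divmod(stream_offset, EXTENT_SIZE)
    segment_index, segment_offset = divmod(in_extent, SEGMENT_SIZE)
    return Location(extent_index, segment_index, segment_offset)


def physical_offset(extent_base: int, loc: Location) -> int:
    # Hypothetical layout: segments of one extent are contiguous on disk,
    # starting at extent_base.
    return extent_base + loc.segment_index * SEGMENT_SIZE + loc.segment_offset
```

For example, `locate(EXTENT_SIZE + SEGMENT_SIZE + 4096)` yields extent 1, segment 1, byte 4096, computed with two `divmod` calls instead of a metadata query.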

Storage Engine Optimizations

To exploit NVMe SSD multi‑queue capabilities, the service model moves from a single‑threaded loop to a one‑loop‑per‑thread, lock‑free architecture, fully utilizing CPU and network bandwidth.
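
The core idea of one loop per thread is partitioning: each loop exclusively owns a shard of state, and requests are routed to the owning loop by key, so the state itself never needs a lock. The sketch below shows only that structure; `PerThreadLoops` is a hypothetical name, Python's `queue.Queue` still locks internally (a native engine would typically use lock-free ring buffers, plausibly one per NVMe submission queue), and the GIL means it does not deliver real parallelism.

```python
import queue
import threading
import zlib
from typing import Dict, List


class PerThreadLoops:
    """One event loop per thread; loop i is the only writer of shards[i]."""

    def __init__(self, n_loops: int) -> None:
        self.inboxes: List[queue.Queue] = [queue.Queue() for _ in range(n_loops)]
        self.shards: List[Dict[str, bytes]] = [{} for _ in range(n_loops)]
        self.threads = [
            threading.Thread(target=self._loop, args=(i,), daemon=True)
            for i in range(n_loops)
        ]
        for t in self.threads:
            t.start()

    def _loop(self, i: int) -> None:
        inbox, shard = self.inboxes[i], self.shards[i]
        while True:
            item = inbox.get()
            if item is None:  # shutdown sentinel
                return
            key, value = item
            shard[key] = value  # no lock: only this thread touches shard i

    def shard_of(self, key: str) -> int:
        # Deterministic routing so a key always lands on the same loop.
        return zlib.crc32(key.encode()) % len(self.inboxes)

    def submit(self, key: str, value: bytes) -> None:
        self.inboxes[self.shard_of(key)].put((key, value))

    def close(self) -> None:
        # Sentinels are queued after pending work, so each loop drains first.
        for inbox in self.inboxes:
            inbox.put(None)
        for t in self.threads:
            t.join()
```

Because routing is deterministic, adding CPU cores means adding loops and shards rather than contending on a shared lock, which is what lets this model scale with CPU and device queues.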

Conceptually, block addressing uses a two‑level index: the first level maps block IDs to extents via streamsvr, and the second level locates the block within the extent on extentsvr. In practice, the design reduces both levels to pure computation.

Random I/O is supported through a FileLayer that implements a log‑structured file system (LFS)‑style overlay: an overwrite is written as a new append and the file's mapping is updated to point at it, so overwrites are possible without breaking the append‑only model.
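
The overlay can be sketched as an append-only log plus a map from logical block number to the log position of that block's newest version. An overwrite appends the new data and repoints the map; the old version stays in the log until garbage collection, which is not shown. `AppendOnlyLog`, `FileLayer` and the 4 KB `BLOCK` size are hypothetical choices for illustration, not the FileLayer API.

```python
from typing import Dict

BLOCK = 4096  # hypothetical logical block size


class AppendOnlyLog:
    """Stand-in for an append-only stream: data is only ever appended."""

    def __init__(self) -> None:
        self.data = bytearray()

    def append(self, payload: bytes) -> int:
        pos = len(self.data)
        self.data += payload
        return pos

    def read(self, pos: int, n: int) -> bytes:
        return bytes(self.data[pos:pos + n])


class FileLayer:
    def __init__(self, log: AppendOnlyLog) -> None:
        self.log = log
        # Logical block number -> log position of its latest version.
        self.block_map: Dict[int, int] = {}

    def write_block(self, block_no: int, payload: bytes) -> None:
        if len(payload) != BLOCK:
            raise ValueError("payload must be exactly one block")
        # Overwrite = append new version, then repoint the mapping.
        self.block_map[block_no] = self.log.append(payload)

    def read_block(self, block_no: int) -> bytes:
        pos = self.block_map.get(block_no)
        if pos is None:
            return bytes(BLOCK)  # never-written blocks read as zeros
        return self.log.read(pos, BLOCK)
```

Overwriting block 3 twice leaves two versions in the log, but reads always return the newest one through `block_map`; reclaiming the stale version is the job of a separate compaction pass.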

Performance Results

Benchmarks on the performance‑type UFS show 4 KB random write latency under 10 ms, random read latency under 5 ms, and IOPS of 17 K (write) and 23 K (read) at a queue depth of 128.
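
As a rough consistency check, Little's law relates queue depth, throughput, and mean latency in a closed-loop benchmark. Assuming the IOPS figures and a mean latency come from the same run at queue depth 128 (the article does not say so), the implied mean latencies are:

```latex
\bar{T} \approx \frac{\mathrm{QD}}{\mathrm{IOPS}}, \qquad
\bar{T}_{\text{write}} \approx \frac{128}{17\,000\ \mathrm{s}^{-1}} \approx 7.5\ \mathrm{ms}, \qquad
\bar{T}_{\text{read}} \approx \frac{128}{23\,000\ \mathrm{s}^{-1}} \approx 5.6\ \mathrm{ms}
```

The write figure is consistent with the sub-10 ms bound. The read figure lands slightly above 5 ms, which suggests the quoted read-latency bound was measured at a lower queue depth than the IOPS figures.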

Conclusion

The article details the motivations, protocol upgrade, index redesign, storage architecture, and engine optimizations that together enable the performance‑type UFS to meet the stringent latency and throughput demands of AI training, big‑data analytics, and high‑performance web services.


Performance Optimization storage architecture distributed file system NVMe SSD NFSv4 append‑only storage

Written by

UCloud Tech

UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
