How UCloud’s IO Acceleration Boosts Mechanical Disk Performance 150× in the Cloud
This article explains how UCloud’s self‑developed IO acceleration technology dramatically narrows the CPU‑disk performance gap by raising 4K random write throughput from 300 IOPS to 45,000 IOPS, detailing its architecture, first‑generation limitations, and the second‑generation enhancements that enable scalable, fault‑tolerant cloud storage.
Modern CPUs far outpace mechanical disks in access latency, making disk I/O a severe bottleneck for cloud hosts. UCloud’s self‑developed IO acceleration raises 4K random write performance from 300 IOPS to 45,000 IOPS (a 150× improvement) and is now deployed on 93% of standard cloud hosts, covering 12,700 instances and 26 PB of storage.
Why IO acceleration is needed
Mechanical disks suffer from seek latency, limiting SATA 4K random IOPS to around 300, which is insufficient for multi‑tenant cloud environments; while SSDs offer speed, their cost drives the need for a technology that boosts mechanical disk performance.
First‑generation IO acceleration principle
The solution uses a smaller cache disk for sequential writes and a larger target disk for final storage; a background thread flushes data from cache to target in order, making the cache appear as a regular block device via the kernel’s device‑mapper (dm) layer.
Each 4 KB data block is paired with a 512‑byte index entry kept in memory and persisted on disk; reads consult the index to locate data on either the cache or the target disk, and larger or misaligned writes are split into 4 KB chunks.
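The first‑generation design above can be modeled in a few lines. This is a hypothetical sketch, not UCloud’s actual implementation: the class and field names are assumptions, and the in‑memory dict stands in for the 512‑byte per‑block index entries.

```python
BLOCK = 4096  # all I/O is split into 4 KB blocks

class IoAccelerator:
    def __init__(self):
        self.cache = []   # cache disk: append-only sequential log
        self.target = {}  # target disk: block number -> data
        self.index = {}   # in-memory index: block number -> cache slot

    def write(self, lba, data):
        """Append the block to the cache sequentially; record its slot in the index."""
        assert len(data) == BLOCK
        self.index[lba] = len(self.cache)
        self.cache.append(data)

    def read(self, lba):
        """Consult the index: serve from the cache if present, else from the target."""
        if lba in self.index:
            return self.cache[self.index[lba]]
        return self.target.get(lba, b"\x00" * BLOCK)

    def flush(self):
        """The background thread's job: migrate cached blocks to the target in order."""
        for lba, slot in sorted(self.index.items(), key=lambda kv: kv[1]):
            self.target[lba] = self.cache[slot]
        self.cache.clear()
        self.index.clear()
```

The key property is that every write lands on the cache disk as a sequential append, which is exactly the access pattern a mechanical disk is fast at; randomness is absorbed by the index and resolved later by the ordered flush.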
Fast index recovery and backup
A periodic dump writes the in‑memory index to the system disk every hour; on reboot, the dump is loaded first and only the last hour’s index entries are replayed from the cache disk, greatly speeding up recovery.
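The recovery path might look like the following sketch. The JSON file layout and the `recent_entries` replay helper are assumptions for illustration, not UCloud’s on‑disk format.

```python
import json

def dump_index(index, path):
    """Hourly dump: persist the whole in-memory index to the system disk."""
    with open(path, "w") as f:
        json.dump(index, f)

def recover_index(path, recent_entries):
    """On reboot, load the last dump, then replay only the index entries
    written to the cache disk since that dump (at most one hour's worth)."""
    with open(path) as f:
        index = {int(k): v for k, v in json.load(f).items()}
    index.update(recent_entries)  # replaying an hour is far cheaper than scanning the whole cache
    return index
```

Loading a bulk snapshot plus a short tail of recent entries is the same trade‑off databases make with checkpoints and write‑ahead logs: recovery time is bounded by the dump interval, not by the total cache size.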
Issues with the first generation
The in‑memory index consumes a large amount of memory.
The cache disk becomes overloaded under high load.
Hot upgrades are difficult.
The design is incompatible with newer 512e disks.
Performance cannot scale beyond a single cache disk.
Second‑generation improvements
The new design stores the index on the system disk, reduces its size to 64 B, and keeps data on the cache disk.
Every eight indexes are merged into a 4 KB backup block written to the cache disk, providing fault tolerance if the system disk fails.
A merge‑write buffer combines multiple indexes into a single 4 KB write, ensuring 4 KB alignment and reducing write overhead.
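The merge‑write buffer can be sketched as follows. This is a minimal illustration assuming 64 B entries batched into one 4 KB write (so 64 entries per flush); the class and callback names are invented for the example.

```python
INDEX_SIZE = 64
WRITE_SIZE = 4096
PER_WRITE = WRITE_SIZE // INDEX_SIZE  # 64 entries per flushed block

class MergeWriteBuffer:
    def __init__(self, backing):
        self.backing = backing       # callable that receives one 4 KB write
        self.pending = bytearray()

    def put(self, entry):
        """Accumulate 64 B index entries; emit a single write when 4 KB is full."""
        assert len(entry) == INDEX_SIZE
        self.pending += entry
        if len(self.pending) == WRITE_SIZE:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        block = bytes(self.pending).ljust(WRITE_SIZE, b"\x00")  # pad to keep 4 KB alignment
        self.backing(block)
        self.pending.clear()
```

Batching turns what would be many sub‑sector index updates into one aligned write, which both cuts write amplification and keeps the I/O compatible with 4 KB‑aligned (512e) disk geometry.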
Sequential I/O detection uses a trigger mechanism: when consecutive writes form a continuous block for a threshold count, the I/O is classified as sequential and bypasses the cache, writing directly to the target disk.
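A minimal sketch of that trigger mechanism follows. The threshold value and names are assumptions; the idea is only that a run of contiguous writes past a tunable count flips the classification to sequential.

```python
THRESHOLD = 4  # assumed tunable: contiguous writes needed to trigger bypass

class SequentialDetector:
    def __init__(self):
        self.next_lba = None  # LBA where a continuing sequential stream would resume
        self.run = 0

    def classify(self, lba, nblocks):
        """Return 'sequential' (bypass cache, write to target) or 'random' (via cache)."""
        if lba == self.next_lba:
            self.run += 1     # extends the contiguous region
        else:
            self.run = 1      # gap detected: start counting a new run
        self.next_lba = lba + nblocks
        return "sequential" if self.run >= THRESHOLD else "random"
```

Bypassing the cache for sequential streams matters because a mechanical target disk already handles sequential writes well; routing them through the cache would only burn cache bandwidth without a latency win.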
Hot‑upgrade is achieved with parent‑child modules: the parent forwards I/O while the child contains complex logic; the child can be detached and upgraded without disrupting the parent’s simple forwarding function.
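The parent‑child split can be illustrated with a toy sketch, assuming a stable forwarding layer and a swappable logic module; all class and method names here are illustrative, not UCloud’s module interface.

```python
class Parent:
    """Thin, stable forwarding layer that is never replaced during an upgrade."""
    def __init__(self, child=None):
        self.child = child

    def submit(self, op):
        if self.child is not None:
            return self.child.handle(op)   # normal path: delegate to the child
        return f"passthrough:{op}"         # while detached, fall back to plain forwarding

    def attach(self, child):
        """Hot-upgrade step: detach the old child (attach None) or attach a new one."""
        self.child = child

class ChildV1:
    def handle(self, op):
        return f"v1:{op}"

class ChildV2:
    def handle(self, op):
        return f"v2:{op}"
```

Because in‑flight I/O only ever traverses the simple parent, the complex child can be torn down and replaced without interrupting service, at the cost of degraded (pass‑through) behavior during the swap window.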
Support for multiple cache disks (local and network‑attached) lets random write performance scale with the number of caches:
One local cache disk: 47,000 IOPS.
One local + one network cache: 90,000 IOPS.
One local + two network caches: 136,000 IOPS.
Conclusion
The IO acceleration technology dramatically improves mechanical‑disk random write capability, allowing cost‑effective cloud services while decoupling performance from the storage medium, and addresses hot‑upgrade, fault tolerance, and scalability challenges.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.