Introducing Tencent Cloud CBS Enhanced and Ultra‑Fast SSD Cloud Disks: Architecture and Performance Optimizations
Tencent Cloud’s new CBS 3.0‑based Enhanced and Ultra‑Fast SSD cloud disks cut latency by more than 50%, to under 100 µs, and raise single‑volume performance to as much as 1.1 million IOPS and 4 GB/s of throughput. The gains come from SPDK‑driven virtualization, RDMA‑based data paths, a user‑space ZTCP stack, zero‑copy memory handling, and dedicated hardware acceleration. The disks target latency‑sensitive and IO‑intensive workloads such as large databases, video processing, and AI inference.
As more users migrate critical workloads to the cloud, scenarios such as large‑scale SQL databases, NoSQL stores, video transcoding, and AI inference demand storage with stable low latency and high performance. The existing CBS product line could not fully satisfy these requirements.
After more than six months of stress testing in production, Tencent Cloud has launched a new generation of CBS 3.0‑based SSD cloud disks: the Enhanced SSD Cloud Disk and the Ultra‑Fast SSD Cloud Disk.
1. New SSD Product Series
Enhanced SSD Cloud Disk – built on the latest CBS 3.0 storage engine, it delivers over 50% latency reduction and a 92% increase in IOPS compared with previous SSD disks.
Typical scenarios include:
Latency‑sensitive workloads with high reliability requirements (e.g., database services, Docker cluster logs).
IO‑intensive workloads that exceed the IOPS or throughput limits of regular SSD disks (e.g., ClickHouse, live streaming).
Big‑data processing, video encoding/decoding, and stateless online services such as game logic.
Ultra‑Fast SSD Cloud Disk – further optimizes latency and performance by replacing the TCP protocol stack with RDMA, dramatically reducing resource overhead and access latency.
2. Achieving Sub‑100 µs Latency and Over 1 Million IOPS
Performance of block storage is measured by IOPS, throughput, and latency. In IO‑intensive scenarios, low and stable latency is essential to keep read/write operations fast and to support high concurrency.
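The link between latency, concurrency, and IOPS can be made concrete with Little's law: the number of requests in flight equals the completion rate multiplied by the average time each request takes. The sketch below is only an illustration of that arithmetic. The function names are hypothetical, and the queue depths are assumptions; only the 100 µs and 1.1 million IOPS figures come from this article.

```python
def max_iops(queue_depth: int, latency_us: float) -> float:
    """Upper bound on IOPS when `queue_depth` requests are kept in flight
    and each takes `latency_us` microseconds on average (Little's law)."""
    if latency_us <= 0:
        raise ValueError("latency must be positive")
    return queue_depth * 1_000_000 / latency_us


def required_queue_depth(target_iops: float, latency_us: float) -> float:
    """Average number of in-flight requests needed to sustain `target_iops`."""
    if latency_us <= 0:
        raise ValueError("latency must be positive")
    return target_iops * latency_us / 1_000_000


if __name__ == "__main__":
    # One outstanding request: halving latency doubles the achievable IOPS.
    print(max_iops(queue_depth=1, latency_us=200.0))
    print(max_iops(queue_depth=1, latency_us=100.0))
    # In-flight requests needed for 1.1 million IOPS at 100 µs per request.
    print(required_queue_depth(target_iops=1_100_000, latency_us=100.0))
```

For a workload that issues one request at a time, latency alone sets the ceiling, which is why a database commit path benefits directly from lower latency. Reaching the headline IOPS figure additionally requires the application to keep many requests in flight.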
The data path in a virtualized environment traditionally traverses multiple layers (guest kernel → QEMU → host kernel → CBS component), causing frequent VM‑exits and context switches, which become bottlenecks.
Virtualization Layer Optimizations
Adopt SPDK in place of the traditional kernel‑based stack, eliminating VM‑exits and shortening the IO path.
Implement zero‑copy memory handling in the access layer to reduce CPU load.
Give each thread its own dedicated resource pool, removing lock contention and making latency more stable.
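The per‑thread resource pool idea can be sketched as follows. This is a minimal illustration of the ownership pattern, not SPDK's API or Tencent Cloud's implementation: `BufferPool`, `thread_pool`, `worker`, and `run` are hypothetical names, and the pool size and buffer size are assumptions. Because each pool is reachable only from the thread that created it, acquiring and releasing a buffer needs no lock.

```python
import threading


class BufferPool:
    """Fixed-size free list owned by exactly one thread, so no lock is taken."""

    def __init__(self, count: int, size: int) -> None:
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self) -> bytearray:
        if not self._free:
            raise RuntimeError("pool exhausted")
        return self._free.pop()

    def release(self, buf: bytearray) -> None:
        self._free.append(buf)

    def available(self) -> int:
        return len(self._free)


_local = threading.local()  # each thread sees its own attributes on this object


def thread_pool(count: int = 8, size: int = 4096) -> BufferPool:
    """Return the calling thread's private pool, creating it on first use."""
    pool = getattr(_local, "pool", None)
    if pool is None:
        pool = BufferPool(count, size)
        _local.pool = pool
    return pool


def worker(results: dict, name: str, ios: int) -> None:
    pool = thread_pool()
    for _ in range(ios):
        buf = pool.acquire()   # no lock: only this thread can reach this pool
        buf[0] = 1             # stand-in for filling in an IO request
        pool.release(buf)
    # The shared dict is used only to report results, one key per thread.
    results[name] = (pool, pool.available())


def run(num_threads: int = 4, ios: int = 1000) -> dict:
    results: dict = {}
    threads = [
        threading.Thread(target=worker, args=(results, f"t{i}", ios))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run()` returns one pool per thread, each with all of its buffers back on the free list. Python's global interpreter lock means this sketch shows the design rather than the performance benefit; in a polled, multi‑core IO stack the same ownership rule keeps threads from contending on a shared allocator.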
Storage Cluster Optimizations
RDMA lets one server read or write another server's memory directly through the network adapter, keeping the CPU out of the data transfer and bypassing the kernel TCP/IP stack, which further cuts latency.
User‑Space TCP/IP Stack (ZTCP) Upgrade
The in‑house ZTCP stack runs entirely in user space, using the zero‑copy buffer (zbuf) for DMA transfers. Each thread has an exclusive resource pool, eliminating lock overhead and improving scalability.
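The article does not describe zbuf's internals, so the following is only a general illustration of what zero‑copy means: handing later stages a reference to the same memory instead of a duplicate. It uses Python's built‑in `memoryview`; the function names and the header/body split are assumptions made for the example.

```python
def split_copy(payload: bytes, header_len: int):
    """Copying approach: each slice allocates new memory and copies into it."""
    return payload[:header_len], payload[header_len:]


def split_zero_copy(buf: bytearray, header_len: int):
    """Zero-copy approach: both views alias the original buffer."""
    view = memoryview(buf)
    return view[:header_len], view[header_len:]


if __name__ == "__main__":
    buf = bytearray(b"HDR:" + b"x" * 8)
    header, body = split_zero_copy(buf, 4)

    # The views share storage with `buf`, so a write to the buffer is
    # visible through the view without any data being moved.
    buf[4] = ord("y")
    print(bytes(body[:1]))
    print(body.obj is buf)
```

The trade‑off is lifetime management: a buffer shared by reference cannot be reused until every holder has finished with it, which is one reason zero‑copy designs are often paired with per‑thread buffer pools.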
Hardware Acceleration
Successive hardware generations allow the software optimizations to be moved onto custom servers, so that software and hardware are tuned together for further performance gains.
After these optimizations, CBS achieves near‑local‑disk latency (≈100 µs) and single‑volume performance up to 1.1 million IOPS and 4 GB/s bandwidth.
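The IOPS and bandwidth ceilings are tied together by the IO size, since throughput equals IOPS multiplied by bytes per request. The article does not state which block sizes were used to measure each figure, so the 4 KiB and 128 KiB sizes below are assumptions, as are the function names and the choice of 1 GB = 10^9 bytes.

```python
def throughput_gb_per_s(iops: float, io_size_bytes: int) -> float:
    """Throughput in GB/s (1 GB = 1e9 bytes) for a given IOPS and IO size."""
    return iops * io_size_bytes / 1e9


def iops_at_throughput(gb_per_s: float, io_size_bytes: int) -> float:
    """IOPS needed to fill a given throughput with IOs of a fixed size."""
    return gb_per_s * 1e9 / io_size_bytes


if __name__ == "__main__":
    # 1.1 million IOPS of 4 KiB requests works out to about 4.5 GB/s,
    # slightly above the 4 GB/s bandwidth ceiling.
    print(throughput_gb_per_s(1_100_000, 4 * 1024))
    # With 128 KiB requests, 4 GB/s is reached at roughly 30,500 IOPS.
    print(iops_at_throughput(4.0, 128 * 1024))
```

Under these assumptions the two ceilings are not reached at the same time: small random IOs run into the IOPS limit first, while large sequential IOs run into the bandwidth limit first.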
Reference: https://cloud.tencent.com/act/pro/HSSD_TSSD_newarrival_activity
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
