Emerging Disaggregated Compute‑Storage Architecture for Cloud and Internet Scenarios
The article examines challenges of traditional server‑based distributed storage in cloud and internet workloads and proposes a new disaggregated compute‑storage architecture leveraging emerging hardware such as EBOF, DPU, CXL, and NVMe to improve resource utilization, performance, reliability, and efficiency.
From the perspective of cloud and internet business scenarios, the storage domain mainly adopts a server‑deployed distributed storage service model, which faces several challenges:
1. Data retention cycles do not match server refresh cycles: servers are typically replaced every few years, while massive data from AI and other emerging services must be retained according to lifecycle policies (e.g., 8-10 years).
2. Performance/reliability and resource utilization are hard to achieve simultaneously. Performance-oriented storage (using three-replica or mirrored two-copy RAID schemes) offers high reliability but only about 30% space utilization, wasting resources. Capacity-oriented systems use erasure coding (EC) to improve space efficiency, but EC reads, writes, and reconstruction consume substantial network bandwidth, leading to low reconstruction efficiency and reliability risks.
3. New serverless‑style distributed applications demand lightweight, high‑bandwidth, low‑latency shared storage without complex enterprise features.
4. The "data-center tax" penalizes data-intensive applications: in CPU-centric architectures, up to 30% of compute power is spent on I/O handling and other infrastructure work, reducing energy efficiency.
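The utilization gap in challenge 2 can be made concrete with simple arithmetic. The EC layout below (8+2) and chunk size are illustrative choices, not figures from the article:

```python
# Space utilization of replication vs. erasure coding (illustrative figures).

def replication_utilization(copies: int) -> float:
    """Fraction of raw capacity that stores unique data."""
    return 1.0 / copies

def ec_utilization(k: int, m: int) -> float:
    """EC(k+m): k data chunks protected by m parity chunks."""
    return k / (k + m)

def ec_reconstruction_traffic(k: int, chunk_gib: float) -> float:
    """Network GiB read to rebuild ONE lost chunk: k surviving chunks."""
    return k * chunk_gib

if __name__ == "__main__":
    print(f"3-replica utilization: {replication_utilization(3):.0%}")  # ~33%
    print(f"EC 8+2 utilization:    {ec_utilization(8, 2):.0%}")        # 80%
    # The price of EC: rebuilding one 1 GiB chunk pulls 8 GiB over the
    # network, which is why the article flags reconstruction as a
    # reliability risk in capacity-oriented systems.
    print(f"EC 8+2 rebuild traffic per 1 GiB chunk: "
          f"{ec_reconstruction_traffic(8, 1.0):.0f} GiB")
```

This is the trade-off in miniature: EC more than doubles usable space versus three-replica, but every rebuild multiplies network traffic by the stripe width k.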
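The data-center tax in challenge 4 can also be sketched numerically. The 30% figure comes from the article; the 64-core server is a hypothetical example:

```python
# Effective application compute before and after offloading I/O to a DPU.

def effective_cores(total_cores: int, io_tax: float) -> float:
    """Cores left for application work when io_tax of CPU time goes to I/O."""
    return total_cores * (1.0 - io_tax)

if __name__ == "__main__":
    cores = 64  # hypothetical server size
    before = effective_cores(cores, 0.30)  # article's 30% data-center tax
    after = effective_cores(cores, 0.0)    # I/O path moved to the DPU
    print(f"Application cores: {before:.1f} -> {after:.1f}")
```

Under these assumptions, moving the I/O path off the CPU recovers roughly 40% more application compute from the same server, which is the motivation behind the DPU/IPU trend discussed below.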
Traditional compute‑storage separation architectures split compute and storage into independent domains connected by Ethernet or Fibre Channel, which suits complex enterprise needs but cannot meet the above challenges. New hardware trends provide a basis for rebuilding data‑center infrastructure.
Hardware trends include:
Ethernet Bunch of Flash (EBOF) and similar high-performance disk enclosures (e.g., Western Digital OpenFlex, VAST Data Ceres) that adopt standards such as NVMe over Fabrics (NVMe-oF) for high-performance storage.
Data Processing Units (DPUs) and Infrastructure Processing Units (IPUs) that replace general‑purpose CPUs in data‑flow paths, improving compute‑to‑data efficiency.
Advanced interconnect standards such as CXL, Gen-Z, OpenCAPI, and NVMe 2.0, which enhance memory pooling and high-speed interconnects.
These new storage, compute, and network hardware enable a new disaggregated compute‑storage architecture that:
Provides thorough compute‑storage decoupling, forming independent resource pools (CPU, memory, HDD/SSD) for flexible scaling.
Offers finer‑grained task allocation, letting DPUs and other accelerators handle workloads unsuitable for CPUs.
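A minimal sketch of the finer-grained allocation idea. The task names and the dispatch table are hypothetical illustrations, not an API from the article; a real system would key placement off hardware capabilities discovered at runtime:

```python
# Toy dispatcher: route data-path work to a DPU pool, control and
# business logic to general-purpose CPUs.

# Hypothetical set of tasks a DPU-class accelerator handles well:
# bulk, streaming, per-byte work that is a poor fit for CPUs.
DPU_OFFLOADABLE = {
    "compression",
    "crc32c",
    "ec_encode",
    "encryption",
    "nvme_of_target",
}

def place_task(task: str) -> str:
    """Return the resource pool a task should run on: 'dpu' or 'cpu'."""
    return "dpu" if task in DPU_OFFLOADABLE else "cpu"

if __name__ == "__main__":
    for t in ["ec_encode", "sql_query", "compression", "business_logic"]:
        print(f"{t:16s} -> {place_task(t)}")
```

The design point is that placement becomes a per-task decision rather than a per-server one: the same storage node can run EC encoding on a DPU while its CPUs stay free for application logic.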
Key features of the new architecture:
Diskless servers with remote storage pools and memory pooling, improving resource utilization and fault tolerance.
Diverse network protocols (CXL, NVMe-oF, IP) that lower latency, down to sub-microsecond levels over CXL, and support various media (HDD, SSD).
Specialized data processors that offload storage and access tasks, boosting overall system energy efficiency.
High‑density disaggregated storage (e.g., EBOD) that combines RAID/EC and compression to reduce redundancy and increase usable space.
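The capacity gain from combining EC with inline compression can be estimated as follows. The enclosure size, EC layout, and 2:1 compression ratio are illustrative assumptions:

```python
# Logical capacity of a disk enclosure under three schemes
# (illustrative numbers, not vendor figures).

def usable_capacity(raw_tb: float, ec_k: int, ec_m: int,
                    compression: float = 1.0) -> float:
    """Logical TB visible to applications.

    compression is the data-reduction ratio (2.0 means 2:1).
    """
    return raw_tb * ec_k / (ec_k + ec_m) * compression

if __name__ == "__main__":
    raw = 1000.0  # hypothetical 1 PB raw enclosure
    print(f"3-replica:             {raw / 3:7.0f} TB")
    print(f"EC 8+2:                {usable_capacity(raw, 8, 2):7.0f} TB")
    print(f"EC 8+2 + 2:1 compress: {usable_capacity(raw, 8, 2, 2.0):7.0f} TB")
```

Under these assumptions, the same raw hardware presents roughly five times the logical capacity of a three-replica layout, which is the "minimal-overhead" effect the EBOD approach aims for.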
The architecture reorganizes resources into three simplified layers: storage modules (EBOF, Ethernet Bunch of Memory/Disk), compute modules (DPUs, GPUs, FPGAs), and high‑throughput data buses. This enables server‑local storage remote pooling, flexible network assembly, data‑centric processing, and high‑capacity, minimal‑overhead storage solutions.
Network technology is crucial: high-bandwidth Ethernet, NVMe over RoCE, and emerging CXL fabrics drive high-speed data-bus performance and memory pooling, supporting hot, warm, and cold data workloads across cloud and internet data centers.