Designing and Building Massive Storage Systems: Architecture, Key Technologies, and Practical Experience

Zhu Jianping’s talk outlined how to architect massive storage systems—from understanding memory, SSD, HDD, and tape characteristics, through selecting cloud object or block services and data distribution algorithms, to scaling from terabytes to exabytes while avoiding in‑memory middle‑layer pitfalls and optimizing cost, performance, and reliability.

Tencent Cloud Developer

Jun 11, 2019

On May 25, the Internet Architecture Technology Salon concluded with a session featuring a Tencent technical expert who shared insights on technology architecture, practical case studies, serverless cloud function architecture, and massive storage system architecture. The following is a summary of Zhu Jianping’s talk on how to architect massive storage systems.

The talk is divided into four parts: (1) an overview of storage concepts; (2) how to build a massive storage system from scratch; (3) key storage technologies to help select public‑cloud storage products and reduce operational costs; (4) lessons learned from more than a decade of storage R&D and operations, highlighting two major pitfalls.

Understanding Storage

In the storage domain we often refer to “flour” (raw materials) and “bread” (finished products). Common “flour” includes memory, non‑volatile memory (NVM), SSDs, HDDs, magnetic tape, and Blu‑ray discs. Tape and Blu‑ray, once used for music and movies, are now employed for cold backup data in enterprises.

Memory is well known. NVM is a newer technology that sits between memory and SSDs, offering low‑cost capacity expansion. SSDs are now mature and widely used for high‑performance data storage. HDDs (mechanical disks) have seen little performance improvement over the past 20 years, typically delivering ~100 MB/s of bandwidth and ~5 ms of latency, but they remain the cheapest of the three media per GB.

When evaluating storage media, consider three dimensions: performance, cost, and I/O constraints. On cost, DDR4 memory is about ¥87 per GB, SSDs about ¥1.5 per GB, and HDDs about ¥0.3 per GB. On performance, access latency ranges from nanoseconds (memory) to milliseconds (HDD). I/O constraints also differ: SSDs cannot overwrite data in place, because flash must be erased a whole block at a time before it is rewritten, so they favor block‑wise sequential writes, while tape and Blu‑ray are write‑once‑read‑many.
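To make the cost dimension concrete, the per‑GB prices quoted above can be turned into a rough per‑terabyte comparison. This is only an arithmetic sketch: the function names are hypothetical, 1 TB is taken as 1024 GB, and real prices vary over time and by vendor.

```python
# Prices per GB in CNY, taken from the figures quoted in the text.
PRICE_PER_GB = {"DDR4": 87.0, "SSD": 1.5, "HDD": 0.3}


def media_cost(capacity_gb: float, medium: str) -> float:
    """Raw media cost in CNY for the given capacity (no replication, no servers)."""
    return capacity_gb * PRICE_PER_GB[medium]


def cost_table(capacity_gb: float) -> dict:
    """Cost of the same capacity on every medium, for side-by-side comparison."""
    return {medium: media_cost(capacity_gb, medium) for medium in PRICE_PER_GB}


if __name__ == "__main__":
    one_tb_in_gb = 1024  # assumption: binary terabyte
    for medium, cost in cost_table(one_tb_in_gb).items():
        print(f"{medium}: about {cost:,.0f} CNY per TB")
```

At these prices memory costs roughly 290 times as much as HDD for the same capacity, which is why media selection dominates the cost of a large system.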

From Bread to Cloud Storage Services

Public‑cloud storage products fall into two main categories: object storage (e.g., AWS S3‑compatible interfaces) and block device storage (virtual disks backed by distributed storage). A third type is POSIX‑compliant file systems that can be mounted locally, also backed by distributed storage.

Database interfaces commonly used include SQL, Redis/Memcached, MongoDB, graph stores, time‑series databases, Elasticsearch (inverted index), and column‑oriented stores for big‑data workloads.

Building a Massive Distributed Storage System from Zero

The construction starts with data organization on storage media. Traditional file systems (Ext3, Ext4) split large files across multiple disk locations. Inodes provide hierarchical indexing. Data structures such as B+ trees (widely used in databases) and LSM trees (used in LevelDB, RocksDB) are fundamental for indexing and write‑optimized storage.
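The write‑optimized idea behind LSM trees can be illustrated with a toy model: writes go to an in‑memory table, which is flushed as an immutable sorted run once it fills up, and reads check the newest data first. The class below is a hypothetical teaching sketch, not how LevelDB or RocksDB are implemented; it keeps everything in memory and omits the write‑ahead log, deletes, and leveled compaction.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.runs = []  # each run is a sorted list of (key, value); newest last

    def put(self, key, value):
        # Writes only touch memory; sorted data is produced in bulk at flush time.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # A real engine would write this run sequentially to disk.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search runs from newest to oldest so the latest value wins.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping only the newest value per key.
        merged = {}
        for run in self.runs:  # oldest first, so newer runs overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

The trade‑off this shows is that writes become sequential and cheap, while a read may have to consult several runs until compaction merges them.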

Scaling from 1 TB to 10 TB requires careful media selection. When a single server can no longer hold the data, data distribution mechanisms become essential. Common approaches include:

DHT – a distributed hash table with a fully decentralized design; historically popular, but replica placement is hard to control.

CRUSH – Ceph’s placement algorithm, which hashes objects into placement groups and then maps each placement group to OSDs across a hierarchy (OSD‑Shelf‑Cabinet‑Room).

Data distribution tables – widely used in industry to maintain dynamic mappings between storage units and their replicas, offering strong operational controllability.
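The third approach in the list above, a data distribution table, can be sketched as follows. Keys are hashed into a fixed number of shards, and an explicit table records which nodes hold the replicas of each shard, so operators can change placement by editing the table. The class, node names, and round‑robin initial placement are illustrative assumptions, not the design described in the talk.

```python
import hashlib


class DistributionTable:
    """Explicit shard -> replica-nodes mapping that operators can edit."""

    def __init__(self, num_shards, nodes, replicas=3):
        if replicas > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.num_shards = num_shards
        # Assumed initial placement: consecutive nodes, round-robin.
        self.table = {
            shard: [nodes[(shard + r) % len(nodes)] for r in range(replicas)]
            for shard in range(num_shards)
        }

    def shard_of(self, key):
        # A stable hash, so every client computes the same shard for a key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def locate(self, key):
        """Return the nodes that hold replicas of this key."""
        return list(self.table[self.shard_of(key)])

    def replace_node(self, old, new):
        """Reassign every shard on `old` to `new`; returns the shards to migrate."""
        affected = [s for s, reps in self.table.items() if old in reps]
        if any(new in self.table[s] for s in affected):
            raise ValueError("replacement node already holds one of these shards")
        for shard in affected:
            reps = self.table[shard]
            reps[reps.index(old)] = new
        return affected
```

Because the key‑to‑shard hash never changes, replacing a failed machine only rewrites table entries and triggers migration for the affected shards, which is the operational controllability the text refers to.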

At the petabyte (PB) scale, operational support systems are critical. Key areas include capacity management (real‑time usage monitoring and growth forecasting), fault management (detecting, isolating, and replacing failed disks or nodes), and data migration (moving data safely between machines or data centers).
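As one small illustration of capacity management, growth forecasting can start from a straight‑line fit over recent usage samples. The sketch below assumes evenly spaced daily samples and linear growth, both of which are simplifications; the function name is hypothetical.

```python
def days_until_full(used_tb, capacity_tb):
    """Estimate days until capacity is exhausted from daily usage samples.

    Fits a least-squares line through the samples (one per day, oldest first)
    and extrapolates from the latest sample. Returns None if usage is not growing.
    """
    n = len(used_tb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var  # average growth in TB per day
    if slope <= 0:
        return None
    return (capacity_tb - used_tb[-1]) / slope
```

For example, a cluster that grew 100, 110, 120, 130 TB over four days and has 200 TB of capacity would be forecast to fill in about 7 days, which gives operations a lead time for ordering and racking hardware.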

At around 10 PB, the balance between hot and cold data starts to dominate cost. Erasure coding replaces full replicas with data and parity fragments, cutting storage overhead while maintaining or even improving reliability. Automated tiering moves cold data to cheaper storage tiers.
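The cost argument for erasure coding comes down to simple arithmetic: with k data fragments and m parity fragments, the raw space used is (k + m) / k times the logical data, and any m fragments can be lost. The sketch below compares this with plain replication; the 6+3 layout in the usage note is an illustrative assumption, not a figure from the talk.

```python
def replication_profile(copies):
    """Raw-to-logical space ratio and tolerated losses for n-way replication."""
    return {"overhead": float(copies), "tolerated_losses": copies - 1}


def erasure_profile(k, m):
    """Same metrics for an erasure code with k data and m parity fragments."""
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    return {"overhead": (k + m) / k, "tolerated_losses": m}


def raw_capacity_pb(logical_pb, profile):
    """Raw capacity needed to store the given amount of logical data."""
    return logical_pb * profile["overhead"]
```

Under these assumptions, storing 10 PB with three replicas needs 30 PB of raw capacity and survives two lost copies, while a 6+3 code needs 15 PB and survives three lost fragments. The number of tolerated losses is only a rough proxy for reliability, since repair time and failure correlation also matter.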

At 100 PB, latency concerns arise. Caching hot objects and pre‑distribution via CDNs (e.g., placing popular data closer to users) can mitigate slow downloads.
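Caching hot objects can be illustrated with a least‑recently‑used (LRU) cache placed in front of the storage system. LRU is one common eviction policy chosen here for illustration; the talk summary does not say which policy was used, and the class and `fetch_from_origin` callback are hypothetical names.

```python
from collections import OrderedDict


class HotObjectCache:
    """LRU cache for hot objects; misses fall through to the origin store."""

    def __init__(self, capacity, fetch_from_origin):
        self.capacity = capacity
        self._fetch = fetch_from_origin  # callable: key -> object bytes
        self._items = OrderedDict()      # least recently used entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._fetch(key)          # slow path: read from the storage system
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value
```

Tracking hits and misses makes it possible to check whether the cache is large enough for the hot set before adding more edge capacity.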

At the exabyte (EB) level, the idle CPU cycles on storage nodes become a significant resource, which can be put to use by running virtualized or containerized compute tasks on those nodes.

Key Technologies

The essential technologies for massive storage systems are:

Data distribution algorithms

Storage engines

Data consistency protocols

Data compression

Disk management

Disaster recovery and data restoration

Additional considerations include cross‑region distribution techniques.

Practical Experience

Example: In a social‑game data storage project, a front‑end access layer fed into a storage layer backed by HDDs. The request rate (tens of thousands of reads/writes per second) far exceeded what an HDD can serve (100‑200 IOPS per disk). An in‑memory caching layer was added, but this introduced complexity: the cache had to ensure data reliability, synchronize with multiple replica nodes, persist to local disks, and eventually flush to HDDs. Handling these four responsibilities in parallel led to many pitfalls.

The lesson is to avoid temporary “in‑memory middle layers” that add complexity; instead, design the system so that performance depends directly on the chosen storage media for better operational controllability.
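The mismatch in the example above can be quantified with a back‑of‑the‑envelope calculation of how many disks are needed to serve a request rate directly from the media. The 30,000 requests per second used in the usage note is an assumed figure standing in for "tens of thousands", and the calculation ignores replication and uneven load.

```python
import math


def disks_needed(requests_per_sec, iops_per_disk):
    """Minimum number of disks whose combined IOPS covers the request rate."""
    if iops_per_disk <= 0:
        raise ValueError("iops_per_disk must be positive")
    return math.ceil(requests_per_sec / iops_per_disk)


if __name__ == "__main__":
    rate = 30_000  # assumption: illustrative request rate
    for iops in (100, 200):
        print(f"{iops} IOPS per disk -> {disks_needed(rate, iops)} disks")
```

At 100 to 200 IOPS per disk, such a load would need 150 to 300 HDDs for IOPS alone, which shows why choosing a medium that matches the access pattern is simpler than compensating with an extra layer.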

Speaker Introduction: Zhu Jianping, graduate of Wuhan University (Computational Mathematics), currently Technical Director of Tencent Cloud Architecture Platform, responsible for object storage, NoSQL storage, and related platforms, with extensive experience in distributed storage, video processing, heterogeneous computing, and data transmission.
