How Distributed Unified Storage Solves Modern Big Data Challenges

This article explores the evolution of storage technology, the rise of software‑defined distributed unified storage such as UMStor, and the Hadapter solution, which enables high‑performance compute‑storage separation for big‑data and cloud environments. It also highlights real‑world deployments and performance insights.

UCloud Tech

Jul 9, 2018

Evolution of Unified Storage Technology

The digital economy’s rapid integration of cloud computing, big data, and IoT is reshaping enterprise IT, making traditional heterogeneous storage insufficient for unified management, data sharing, and emerging workloads such as virtualization, big data, IoT, hybrid cloud, and AI.

Distributed storage, with software at its core and a unified pool of underlying resources, offers elastic capacity and performance scaling, decoupling hardware from software and enabling flexible resource allocation to applications.

Yoyun Digital Intelligence (优云数智) developed UMStor, a software‑defined distributed unified storage solution. Its Hadapter component calls librados directly, which bypasses the object‑storage gateway bottleneck and maintains high performance in big‑data environments.

Historical Stages of Storage Technology

Information Age: Transition from direct‑attached storage to network storage (SAN, NAS). There was no explicit unified storage concept, but SAN integrated structured data.

Internet Age: Rise of e‑commerce and explosive growth of unstructured data; unified storage emerged to consolidate structured and unstructured data.

Social Age: Massive unstructured data growth; distributed storage became prevalent.

We are now in a fourth stage of data explosion, driven by cloud computing, big data, AI, and IoT, all centered around data.

Impact of New Digital Technologies on Storage

These technologies increase data volumes dramatically, pushing storage requirements from tens of terabytes to petabyte scale and straining traditional storage in areas such as data protection, performance, and scalability.

Compute‑Storage Separation with Hadapter

Traditional big‑data deployments rely on HDFS, which offers high performance and low cost but creates management challenges when multiple Hadoop clusters coexist.

Public‑cloud services like AWS EMR separate compute and storage by using S3 object storage, improving elasticity but introducing gateway overhead and potential bottlenecks.

Hadapter draws inspiration from NFS‑Ganesha and Ceph’s librgw library to allow direct librados calls from Hadoop clients, eliminating the object‑storage gateway and reducing I/O latency.

The Hadapter plugin is deployed on Hadoop clients; requests prefixed with uds:// are intercepted, translated to librados calls, and sent directly to OSDs, achieving compute‑storage separation while preserving performance.
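The interception step described above can be illustrated with a minimal Python sketch. It is not Hadapter's implementation (the article describes Hadapter as a jar on the Hadoop client), and several details are assumptions made only for illustration: the URI layout uds://<pool>/<object-key>, the names parse_uds_uri, SchemeRouter, and InMemoryObjectStore, and the use of an in‑memory dictionary in place of real librados calls to OSDs.

```python
from urllib.parse import urlparse


class InMemoryObjectStore:
    """Stand-in for the object store. A real adapter would issue librados
    calls here; this class only keeps bytes in a dict (assumption)."""

    def __init__(self):
        self._pools = {}

    def write_full(self, pool, key, data):
        # Replace the whole object, creating the pool entry on first use.
        self._pools.setdefault(pool, {})[key] = bytes(data)

    def read(self, pool, key):
        return self._pools[pool][key]


def parse_uds_uri(uri):
    """Split a uds:// URI into (pool, object_key).

    The mapping authority -> pool and path -> object key is hypothetical;
    the source does not specify how Hadapter lays out its URIs.
    """
    parts = urlparse(uri)
    if parts.scheme != "uds":
        raise ValueError(f"not a uds:// URI: {uri!r}")
    pool = parts.netloc
    key = parts.path.lstrip("/")
    if not pool or not key:
        raise ValueError(f"URI must name a pool and an object: {uri!r}")
    return pool, key


class SchemeRouter:
    """Sends uds:// requests straight to the object store and leaves every
    other scheme to a fallback handler (here, a plain dict)."""

    def __init__(self, object_store, fallback):
        self.object_store = object_store
        self.fallback = fallback

    def write(self, uri, data):
        if urlparse(uri).scheme == "uds":
            pool, key = parse_uds_uri(uri)
            self.object_store.write_full(pool, key, data)
            return "direct"
        self.fallback[uri] = bytes(data)
        return "fallback"

    def read(self, uri):
        if urlparse(uri).scheme == "uds":
            pool, key = parse_uds_uri(uri)
            return self.object_store.read(pool, key)
        return self.fallback[uri]


if __name__ == "__main__":
    router = SchemeRouter(InMemoryObjectStore(), fallback={})
    path = "uds://warehouse/logs/part-0000"  # hypothetical path
    print(router.write(path, b"hello"), router.read(path))
```

The point of the sketch is the dispatch on the URI scheme: only uds:// paths take the direct route, so existing paths on other schemes keep working unchanged on the same client.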

Performance Comparison

Benchmarks show that HDFS still delivers the best performance. Access through the object‑storage gateway incurs roughly double the latency, while Hadapter falls slightly behind HDFS but significantly outperforms gateway‑based access.

Real‑World Deployment

A large‑scale project built a multi‑petabyte distributed storage system for a private cloud, supporting VM images, relational databases, and unstructured data. The solution combined UMStor with Hadapter, enabling seamless “big data on cloud” capabilities.

Deployment evolved from physical nodes to a hybrid physical‑virtual model, with plans for full virtualization. The jar‑based Hadapter simplifies installation and maintenance.

Re‑architecting Big‑Data Processing

Using UMStor’s multi‑protocol distributed storage, a data‑lake architecture was created, reducing data movement and streamlining processing across diverse sources.

Future Outlook

As cloud, big data, and AI continue to grow, “dark data” stored across disparate systems becomes increasingly valuable. Multi‑protocol distributed unified storage can consolidate this data, unlocking new insights and supporting the next wave of storage innovation.


Written by

UCloud Tech

UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
