How UMStor and Hadapter Power Big-Data Cloud Migration with Superior Performance
This article reports on a presentation by UCloud's subsidiary at ArchSummit 2018 in Shenzhen. The talk covered the shift to the digital era, the challenges of storing data at PB scale, and a solution inspired by NFS-Ganesha and built on Hadapter and UMStor that runs big-data workloads efficiently on the cloud and supports a data-lake model.
On July 7, the two‑day 2018 ArchSummit global architects summit concluded its Shenzhen stop, gathering over 2,000 technology managers, CTOs, and architects to discuss the latest industry achievements. UCloud and its subsidiary Youyun Shuzi were invited to participate.
About Youyun Shuzi
Youyun Shuzi (Shanghai Youming Cloud Computing Co., Ltd.), a wholly‑owned subsidiary of UCloud, focuses on enterprise‑grade private‑cloud products and solutions, offering a one‑stop PaaS + IaaS platform. Its core team comes from Google, Huawei, Mirantis, and other leading cloud companies, with headquarters in Shanghai and branches in Beijing and Shenzhen.
The conference featured a pyramid stage in the style of the “101” talent show, with partners competing for the center spot: a fitting image for today's rapidly expanding technology ecosystem, in which demand connects everyone and technology keeps evolving.
During the opening session on July 6, senior solutions director Fang Yong divided the growth of data into three stages: the information era, the Internet era (Web 1.0), and the social era (Web 2.0). The information era began in finance and telecom and introduced networked storage such as SAN and NAS. The Internet era brought explosive growth in unstructured data and the birth of unified storage. In the social era, data volumes reached the petabyte scale, which helped drive the emergence of digital technologies such as AI, cloud computing, IoT, and VR.
Is “Big Data on Cloud” Inevitable?
Traditional commercial storage for enterprise (B2B) customers is mature, but the massive data generated by digital technologies raises new questions: how much performance is required, and how many storage controllers would a PB-scale workload need?
In private‑cloud scenarios, storage holds VM images, RDS relational data, and unstructured data. Youyun Shuzi explored whether cloud elasticity could enable big‑data‑on‑cloud while meeting performance needs.
Defining the Storage Model
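Comparing HDFS and S3 exposed a trade-off: HDFS co-locates compute and storage and performs well, while S3 separates them but is slower. The team therefore needed an approach that kept the benefits of compute-storage separation without giving up performance. The inspiration came from NFS-Ganesha, the open-source user-space NFS server backed by Red Hat, which supports many back-ends, including Ceph object storage.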
NFS-Ganesha accesses Ceph objects through the librgw library. Building on the same approach, Youyun Shuzi developed a Hadoop plugin called Hadapter. Deployed on Hadoop clients, Hadapter intercepts requests for paths prefixed with uds:// and hands them to librgw, which uses librados to talk to the OSDs directly instead of going through the S3 HTTP interface. This separates compute from storage and lets each scale independently.
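To make the routing step concrete, here is a minimal Python sketch of scheme-based dispatch, the general mechanism by which a filesystem plugin can claim uds:// paths while other schemes go elsewhere. It is not Hadapter's code (Hadapter is a Java JAR; Python is used only for brevity): the `Backend` class, the `REGISTRY` table, and `route` are hypothetical, and treating the URI authority as a bucket name is our assumption. In Hadoop itself, the comparable registration is the `fs.<scheme>.impl` configuration property, after which the real plugin would hand requests to the Ceph client libraries.

```python
from urllib.parse import urlparse


class Backend:
    """Hypothetical storage backend that only reports where a request would go."""

    def __init__(self, name: str) -> None:
        self.name = name

    def resolve(self, authority: str, path: str) -> tuple[str, str, str]:
        # A real backend would open a stream here; the sketch just returns the target.
        return (self.name, authority, path)


# Scheme -> backend table, similar in spirit to Hadoop's fs.<scheme>.impl setting.
# "uds" stands in for Hadapter; the HDFS entry shows other schemes are left alone.
REGISTRY = {
    "hdfs": Backend("hdfs"),
    "uds": Backend("uds-hadapter"),
}


def route(uri: str) -> tuple[str, str, str]:
    """Choose a backend by URI scheme and split off the authority and path."""
    parsed = urlparse(uri)
    backend = REGISTRY.get(parsed.scheme)
    if backend is None:
        raise ValueError(f"no filesystem registered for scheme {parsed.scheme!r}")
    # Assumption: for uds:// the authority is a bucket and the path is the object key.
    return backend.resolve(parsed.netloc, parsed.path.lstrip("/"))


print(route("uds://logs/2018/07/06/part-00000"))
# ('uds-hadapter', 'logs', '2018/07/06/part-00000')
```

Because dispatch keys only on the scheme, existing hdfs:// jobs keep working unchanged, and a job opts into the separated object store simply by switching its paths to uds://.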
Performance Comparison
Benchmarks show that HDFS still delivers the best performance and S3-based storage the slowest. Hadapter's performance is slightly below that of HDFS but far exceeds that of direct S3 access.
Production Deployment of UMStor + Hadapter
After extensive evaluation, Hadapter was turned into a product and deployed in a cloud environment where more than 100 nodes, a mix of physical and virtual machines, act as Hadoop clients.
The Hadapter + UMStor solution implements compute-storage separation, addressing the “store-and-retrieve” challenge of big-data environments. Because Hadapter ships as a Java JAR, it is easy to install, deploy, and maintain. In production benchmarks, it compared favorably with a previous HDFS deployment.
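The benefit of scaling compute and storage independently is easiest to see with some arithmetic. The sketch below compares node counts for a co-located cluster, where every node supplies both cores and disk, against a separated design that sizes the two tiers on their own. All figures are hypothetical and chosen only for illustration: 1,600 cores of compute demand, 3x replication, 1 PB (treated as 1,000 TB) and then 2 PB of usable data, 32 cores and 40 TB raw per co-located node, 32 cores per compute node, and 120 TB raw per dedicated storage node (assumed denser because it carries no compute).

```python
import math


def raw_tb(usable_tb: float, replicas: int = 3) -> float:
    """Raw disk needed for a given usable capacity under N-way replication."""
    return usable_tb * replicas


def colocated_nodes(cores: int, raw: float, cores_per_node: int, tb_per_node: float) -> int:
    # One node type supplies both resources, so the scarcer one sets the count.
    return max(math.ceil(cores / cores_per_node), math.ceil(raw / tb_per_node))


def separated_nodes(
    cores: int, raw: float, cores_per_compute: int, tb_per_storage: float
) -> tuple[int, int]:
    # Compute and storage tiers are sized independently: (compute nodes, storage nodes).
    return (math.ceil(cores / cores_per_compute), math.ceil(raw / tb_per_storage))


# Hypothetical workload: compute demand stays fixed while usable data doubles.
for usable in (1000, 2000):
    raw = raw_tb(usable)
    print(usable, colocated_nodes(1600, raw, 32, 40), separated_nodes(1600, raw, 32, 120))
# 1000 75 (50, 25)
# 2000 150 (50, 50)
```

Under these assumptions, doubling the data forces the co-located cluster from 75 to 150 nodes (4,800 cores for a 1,600-core workload), while the separated design grows only its storage tier and leaves the 50 compute nodes untouched. Real ratios depend on hardware and pricing.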
Data‑Lake Storage Model
Beyond migration, Youyun Shuzi developed a “data‑lake” storage model, extending HDFS concepts to unify unstructured data storage. This model makes data ingestion, processing, usage, and deletion visible, enabling full data‑lifecycle management.
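One way to picture “visible” lifecycle stages is a per-object state machine backed by an append-only event log. The sketch below is an illustration under our own assumptions: the four stages come from the text, but the allowed transitions, the `LifecycleLog` class, and its methods are hypothetical and do not describe UMStor's actual API.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Stage(Enum):
    INGESTED = "ingested"
    PROCESSED = "processed"
    IN_USE = "in_use"
    DELETED = "deleted"


# Hypothetical transition rules; the None key means "object not seen yet".
ALLOWED: dict[Optional[Stage], set[Stage]] = {
    None: {Stage.INGESTED},
    Stage.INGESTED: {Stage.PROCESSED, Stage.DELETED},
    Stage.PROCESSED: {Stage.PROCESSED, Stage.IN_USE, Stage.DELETED},
    Stage.IN_USE: {Stage.IN_USE, Stage.PROCESSED, Stage.DELETED},
    Stage.DELETED: set(),
}


class LifecycleLog:
    """Tracks each object's current stage and keeps an append-only event history."""

    def __init__(self) -> None:
        self.current: dict[str, Stage] = {}
        self.events: list[tuple[datetime, str, Stage]] = []

    def record(self, key: str, stage: Stage) -> None:
        prev = self.current.get(key)
        if stage not in ALLOWED[prev]:
            raise ValueError(f"{key}: cannot move from {prev} to {stage}")
        self.current[key] = stage
        # Every change is timestamped, so the full lifecycle can be audited later.
        self.events.append((datetime.now(timezone.utc), key, stage))

    def history(self, key: str) -> list[Stage]:
        return [stage for _, k, stage in self.events if k == key]


log = LifecycleLog()
for s in (Stage.INGESTED, Stage.PROCESSED, Stage.IN_USE, Stage.DELETED):
    log.record("raw/clicks.csv", s)
print([s.value for s in log.history("raw/clicks.csv")])
# ['ingested', 'processed', 'in_use', 'deleted']
```

Keeping the log append-only means a deleted object still leaves a record of when it arrived, how it was processed, and when it was removed.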
UMStor is a multi-protocol distributed storage system built on object storage, offering NFS, iSCSI, and other interfaces. In big-data scenarios, UMStor acts as a data-exchange platform that supports write-once, read-many access and forms the backbone of a data lake, greatly improving processing efficiency.
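The write-once, read-many pattern can be sketched as a single object namespace exposed through several front ends. In this hypothetical model, an object is written once and then read through an object-style bucket/key call, an NFS-style file path, and a uds:// URI like the ones Hadapter handles. The class and method names, and the mapping of buckets to top-level directories, are assumptions for illustration; the source does not describe UMStor's real interfaces at this level.

```python
from urllib.parse import urlparse


class WriteOnceStore:
    """Hypothetical object namespace: write each object once, read it via several front ends."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        # Write once: refuse to overwrite an existing object.
        if (bucket, key) in self._objects:
            raise FileExistsError(f"{bucket}/{key} already exists")
        self._objects[(bucket, key)] = data

    def s3_get(self, bucket: str, key: str) -> bytes:
        # Object-style read: bucket plus key.
        return self._objects[(bucket, key)]

    def nfs_read(self, path: str) -> bytes:
        # File-style read; assumes buckets appear as top-level directories.
        bucket, _, key = path.lstrip("/").partition("/")
        return self._objects[(bucket, key)]

    def uds_read(self, uri: str) -> bytes:
        # Hadoop-style read through a uds://bucket/key URI.
        parsed = urlparse(uri)
        return self._objects[(parsed.netloc, parsed.path.lstrip("/"))]


store = WriteOnceStore()
store.put("lake", "2018/07/events.json", b'{"n": 1}')
print(store.nfs_read("/lake/2018/07/events.json"))
# b'{"n": 1}'
```

All three readers resolve to the same stored bytes, which is the point of a shared data lake: a producer writes once, and each consumer uses whichever protocol it already speaks instead of copying the data.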
What Storage Technology Does the Digital Era Need?
Fang Yong emphasized that we are in a “digital era” in which the value of data extends beyond the raw data to the new data derived from it. Future storage innovation must therefore center on multi-protocol distributed storage and integrate with emerging digital technologies to unlock deeper value. Integrating UMStor with Hadoop through Hadapter is a small but significant step toward that future.
The “UCan Technology Night” on July 6 highlighted how gaming architecture, edge computing, service granularity, and storage demands are all evolving, signaling the start of a new era.
As the physical and digital worlds converge, participants were urged to continuously think ahead to survive in the fast‑changing tech ecosystem.
Thank you to all partners who joined UCloud at ArchSummit 2018 Shenzhen. More technical content and expert exchanges will be shared through the “UCan Club”.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
