
Apache Ozone: Architecture, Advantages, and New Features Overcoming HDFS Limitations

This article explains where HDFS falls short at large scale and describes the two community responses, Federation and Scaling HDFS. It then details how Apache Ozone redesigns metadata storage and introduces a container abstraction and object semantics. Finally, it covers new features that improve scalability and performance: an optimized OM namespace, streaming writes, erasure coding, and RocksDB consolidation.

DataFunTalk

Jul 4, 2022


Over the past decade, HDFS has become the core of distributed big-data storage thanks to its high fault tolerance and throughput. As clusters grow, however, its original architecture hits limits. The NameNode holds all metadata in memory and suffers garbage-collection (GC) pressure, large numbers of DataNodes cause heartbeat storms, and massive numbers of small files are handled poorly.

The community has pursued two directions to address these issues. Federation increases the number of NameNodes, while Scaling HDFS increases the metadata capacity of a single NameNode. Federation has evolved from ViewFs, where the mount table lives in client-side configuration, to Router-Based Federation (RBF), where a router layer resolves paths against a centrally managed mount table.
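To make the mount-table idea concrete, here is a minimal Python sketch of resolving a client path to a namespace. Each path prefix maps to a target in one namespace. The mount points and NameNode URIs (`nn1`, `nn2`, `nn3`) are hypothetical, and longest-prefix matching for nested entries is an assumption of this sketch, not the actual ViewFs or RBF code. Under ViewFs each client carries such a table in its configuration. Under RBF the routers consult a shared one.

```python
# Simplified mount-table resolution; all entries are hypothetical.
MOUNT_TABLE = {
    "/user": "hdfs://nn1/user",
    "/data": "hdfs://nn2/data",
    "/data/logs": "hdfs://nn3/logs",  # nested entry, resolved by longest prefix (assumption)
}


def resolve(path: str, mount_table: dict[str, str]) -> str:
    """Map a client path to a namespace URI using the longest matching prefix."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/data" does not match "/database".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise FileNotFoundError(f"no mount point covers {path}")
    # Append whatever follows the matched prefix to that mount's target.
    return mount_table[best] + path[len(best):]


print(resolve("/data/logs/app.log", MOUNT_TABLE))  # hdfs://nn3/logs/app.log
print(resolve("/data/raw/part-0", MOUNT_TABLE))    # hdfs://nn2/data/raw/part-0
```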

The Scaling HDFS effort led to Apache Ozone. Ozone splits metadata management between the Ozone Manager (OM) and the Storage Container Manager (SCM), and stores metadata in RocksDB instead of keeping it entirely in memory. It also introduces a container abstraction to reduce heartbeat volume.

Ozone Architecture Advantages

Metadata Storage: OM handles volume/bucket/key metadata, SCM manages container metadata; both use RocksDB instead of all‑in‑memory storage.
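A minimal sketch of how this split divides the work when a key is read. OM maps the key to block IDs, and SCM maps each block's container to the DataNodes holding it. The dictionaries stand in for the RocksDB tables. All keys, IDs, and DataNode names are hypothetical, and modeling a block ID as a (container ID, local ID) pair is a simplification. This shows only how the metadata is divided, not Ozone's actual RPC flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockID:
    container_id: int  # container that holds the block
    local_id: int      # block number within that container


# Hypothetical stand-ins for the RocksDB-backed tables.
OM_KEY_TABLE = {  # OM: /volume/bucket/key -> blocks that make up the key
    "/vol1/bucket1/logs/a.log": [BlockID(7, 1), BlockID(7, 2), BlockID(9, 4)],
}
SCM_CONTAINER_TABLE = {  # SCM: container ID -> DataNodes holding replicas
    7: ["dn-1", "dn-2", "dn-3"],
    9: ["dn-2", "dn-4", "dn-5"],
}


def locate(key: str) -> list[tuple[BlockID, list[str]]]:
    """Resolve a key to (block, DataNodes) pairs: OM lookup, then SCM lookup."""
    blocks = OM_KEY_TABLE[key]  # OM knows which blocks make up the key
    # SCM knows where each container lives, not where each block lives.
    return [(b, SCM_CONTAINER_TABLE[b.container_id]) for b in blocks]


for block, nodes in locate("/vol1/bucket1/logs/a.log"):
    print(block, nodes)
```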

Container Abstraction: Blocks are grouped into containers. Each DataNode reports the status of its containers rather than of every individual block, which reduces reporting overhead.
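A back-of-the-envelope calculation shows why reporting per container is cheaper. The sizes are assumptions chosen for illustration, not measured values: 256 MiB blocks, 5 GiB containers, and 100 TiB of data on one DataNode.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Illustrative assumptions, not measured values.
DATA_ON_NODE = 100 * TIB
BLOCK_SIZE = 256 * MIB
CONTAINER_SIZE = 5 * GIB


def report_entries(data_bytes: int, unit_bytes: int) -> int:
    """Entries in a full report when one entry is sent per fully packed unit."""
    return -(-data_bytes // unit_bytes)  # ceiling division


blocks = report_entries(DATA_ON_NODE, BLOCK_SIZE)
containers = report_entries(DATA_ON_NODE, CONTAINER_SIZE)
print(f"per-block report:     {blocks} entries")
print(f"per-container report: {containers} entries ({blocks // containers}x fewer)")
```

With these assumptions the DataNode reports 20,480 containers instead of 409,600 blocks, a 20x reduction. When many small files leave blocks far below full size, the block count grows while the container count does not, so the gap widens further.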

Object Semantics: Ozone supports both file-system and S3 protocols over a three-level namespace (Volume → Bucket → Object).
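For file-system access, the three levels appear directly in the path. The sketch below splits a URI of the form `ofs://<om-host>/<volume>/<bucket>/<key>` into its parts. `parse_ofs` and `OzonePath` are hypothetical helpers written for this example, not part of an Ozone client library.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class OzonePath(NamedTuple):
    om_host: str  # OM service address taken from the URI authority
    volume: str   # level 1
    bucket: str   # level 2
    key: str      # level 3; may itself contain "/" separators


def parse_ofs(uri: str) -> OzonePath:
    """Split an ofs:// URI into OM address, volume, bucket, and key."""
    parsed = urlparse(uri)
    if parsed.scheme != "ofs":
        raise ValueError(f"not an ofs URI: {uri}")
    # The path looks like /volume/bucket/key..., so split off the first two levels only.
    parts = parsed.path.lstrip("/").split("/", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected /volume/bucket/key, got {parsed.path!r}")
    volume, bucket, key = parts
    return OzonePath(parsed.netloc, volume, bucket, key)


print(parse_ofs("ofs://om1/vol1/bucket1/logs/2022/app.log"))
```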

An Ozone DataNode stores blocks inside containers and keeps block metadata in RocksDB. Each container is either OPEN or CLOSED. The state governs which write and delete operations are allowed.
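The effect of the two states can be modeled with a tiny state machine. A container accepts block writes only while OPEN and becomes immutable once CLOSED. This is a simplified sketch under stated assumptions. Ozone tracks additional intermediate states (such as CLOSING). The block-count capacity is a hypothetical stand-in for the real size limit. The rule that only CLOSED containers may be deleted is an assumption of this model.

```python
from enum import Enum


class ContainerState(Enum):
    OPEN = "OPEN"      # accepting new blocks
    CLOSED = "CLOSED"  # sealed: no further writes


class Container:
    """Simplified model: writes require OPEN; deletion is allowed only when CLOSED (assumption)."""

    def __init__(self, container_id: int, capacity_blocks: int) -> None:
        self.container_id = container_id
        self.capacity_blocks = capacity_blocks  # hypothetical size limit, counted in blocks
        self.state = ContainerState.OPEN
        self.blocks: dict[int, bytes] = {}

    def write_block(self, local_id: int, data: bytes) -> None:
        if self.state is not ContainerState.OPEN:
            raise PermissionError(f"container {self.container_id} is {self.state.value}")
        self.blocks[local_id] = data
        if len(self.blocks) >= self.capacity_blocks:
            self.state = ContainerState.CLOSED  # a full container is sealed

    def can_delete(self) -> bool:
        # Assumption for this sketch: only sealed containers may be removed.
        return self.state is ContainerState.CLOSED


c = Container(container_id=1, capacity_blocks=2)
c.write_block(1, b"a")
c.write_block(2, b"b")  # reaches capacity, so the container closes
print(c.state, c.can_delete())
```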

SCM is the counterpart of the block-management layer in the HDFS NameNode. It handles DataNode registration, container lifecycle, and replication. It relies on Apache Ratis, a Java implementation of the Raft consensus protocol, for consistency.

New Ozone Features

Ozone FS Optimization (FSO): Restructures OM's namespace into a NameNode-style directory tree so that Ozone provides native file-system semantics, improving delete and rename performance.

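OM Table Redesign: Splits the single RocksDB key table into separate file and directory tables. This enables O(1) rename and delete: renaming or deleting a directory updates only that directory's own entry instead of rewriting every key under its path.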
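The sketch below contrasts the two layouts for a directory rename. In the flat layout every key repeats its full path, so renaming a directory rewrites one entry per descendant. In the split layout entries are keyed by (parent ID, name), so only the directory's own entry changes. The table shapes, IDs, and function names are hypothetical simplifications, not OM's actual schema.

```python
def rename_flat(key_table: dict[str, str], src: str, dst: str) -> int:
    """Flat layout (full path -> value): rewrite every key under src. Returns entries touched."""
    moved = {k: v for k, v in key_table.items() if k == src or k.startswith(src + "/")}
    for old in moved:
        del key_table[old]
    for old, value in moved.items():
        key_table[dst + old[len(src):]] = value
    return len(moved)


def rename_dir(dir_table: dict[tuple[int, str], int], parent_id: int,
               src_name: str, dst_name: str) -> int:
    """Split layout ((parent ID, name) -> object ID): move one entry. Returns entries touched."""
    dir_id = dir_table.pop((parent_id, src_name))
    dir_table[(parent_id, dst_name)] = dir_id  # children still reference dir_id, so they stay put
    return 1


# Hypothetical bucket with three files under /b/dir.
flat = {"/b/dir/f1": "blk1", "/b/dir/f2": "blk2", "/b/dir/sub/f3": "blk3", "/b/other": "blk4"}
print(rename_flat(flat, "/b/dir", "/b/new"))  # 3

dirs = {(0, "dir"): 1, (1, "sub"): 2}                              # directory table
files = {(1, "f1"): "blk1", (1, "f2"): "blk2", (2, "f3"): "blk3"}  # file table, untouched by rename
print(rename_dir(dirs, 0, "dir", "new"))  # 1
```

The flat rename grows with the number of descendants, while the split-table rename costs one entry regardless of how large the subtree is.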

Ozone Streaming: Reduces RaftLog write amplification by batching chunks and flushing asynchronously. In Ozone 1.3 it reaches about 90% of HDFS write throughput.

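Erasure Coding (EC): Implements Reed-Solomon (RS) and XOR codecs, which provide fault tolerance at lower storage cost than replication. Online recovery is supported. Offline recovery and conversion between replicated and EC data were still pending at the time of writing.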
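XOR is the simpler of the two codecs, and a short sketch shows how it tolerates the loss of one cell. The parity cell is the byte-wise XOR of the data cells, so any single missing cell equals the XOR of the survivors. Reed-Solomon generalizes this to several parity cells using Galois-field arithmetic, which is not shown here. The function names and sample data are hypothetical.

```python
def xor_cells(cells: list[bytes]) -> bytes:
    """XOR equal-length cells together, byte by byte."""
    size = len(cells[0])
    if any(len(c) != size for c in cells):
        raise ValueError("all cells must have the same length")
    out = bytearray(size)
    for cell in cells:
        for i, byte in enumerate(cell):
            out[i] ^= byte
    return bytes(out)


def encode_parity(data_cells: list[bytes]) -> bytes:
    """Single parity cell: p = d0 ^ d1 ^ ... ^ d(k-1)."""
    return xor_cells(data_cells)


def recover_missing(surviving_cells: list[bytes]) -> bytes:
    """Any one lost cell (data or parity) is the XOR of all the others."""
    return xor_cells(surviving_cells)


def storage_overhead(data_cells: int, parity_cells: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_cells + parity_cells) / data_cells


d0, d1 = b"ozone", b"hdfs!"  # hypothetical 5-byte cells
p = encode_parity([d0, d1])
assert recover_missing([d1, p]) == d0  # d0 lost, rebuilt from d1 and parity
print(storage_overhead(2, 1), storage_overhead(6, 3))  # 1.5 1.5
```

A 2+1 XOR layout survives one lost cell at 1.5x overhead. A 6+3 Reed-Solomon layout has the same 1.5x overhead but survives any three lost cells, compared with 3x for three-way replication.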

RocksDB Consolidation: Merges the per-container RocksDB instances into a single instance per disk. This drastically reduces the number of RocksDB instances on each DataNode and improves its performance.
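A rough count shows the scale of the change. The sketch assumes a hypothetical DataNode with 12 disks of 8 TiB each, 5 GiB containers, and fully packed disks. None of these figures come from the article.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def containers_per_disk(disk_bytes: int, container_bytes: int) -> int:
    """Full containers that fit on one disk (partial containers ignored)."""
    return disk_bytes // container_bytes


def rocksdb_instances(disks: int, per_disk: int, consolidated: bool) -> int:
    # Old layout: one RocksDB instance per container; new layout: one per disk.
    return disks if consolidated else disks * per_disk


# Hypothetical node: 12 disks x 8 TiB, 5 GiB containers.
n = containers_per_disk(8 * TIB, 5 * GIB)
print(n, rocksdb_instances(12, n, consolidated=False), rocksdb_instances(12, n, consolidated=True))
# 1638 19656 12
```

Under these assumptions, the per-container layout would run 19,656 embedded RocksDB instances, each with its own memtables, open files, and compaction work. After consolidation there are 12.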

These enhancements collectively raise Ozone's metadata limits, improve scalability, and bring its performance close to native HDFS while adding object‑storage capabilities.



Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
