Key New Features and Improvements in Hadoop 3.x

Hadoop 3.x raises the minimum required JDK version to 1.8 and introduces enhancements across the common components, HDFS, YARN, and MapReduce. These include erasure coding, NameNode high availability with multiple standbys, cgroup-based resource isolation, a native map-output collector, and split client libraries. It also adds support for the Azure and Aliyun distributed file systems.

Big Data Technology & Architecture

Jan 22, 2021

Hadoop 3.x is a major release that raises the minimum required JDK version from 1.7 to 1.8 and introduces numerous enhancements across its core components.

Common improvements include the removal of deprecated APIs, upgraded component defaults (e.g., FileOutputCommitter v2), classpath isolation to avoid JAR conflicts, and a refactored suite of management scripts.

HDFS enhancements add erasure coding, which can cut raw storage roughly in half compared with 3x replication; NameNode high availability with more than one standby NameNode; and a "recent block" caching strategy that moves hot blocks into memory for faster processing.
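The "roughly half" figure can be checked with simple arithmetic. The sketch below assumes a Reed-Solomon layout with 6 data units and 3 parity units (the commonly cited RS-6-3 policy) and compares it with 3-way replication. The function names are illustrative and not part of any Hadoop API.

```python
def replication_overhead(replicas: int = 3) -> float:
    """Raw bytes stored per logical byte under N-way replication."""
    return float(replicas)


def erasure_coding_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per logical byte under a (data, parity) Reed-Solomon layout."""
    return (data_units + parity_units) / data_units


def savings_vs_replication(data_units: int, parity_units: int, replicas: int = 3) -> float:
    """Fraction of raw storage saved by erasure coding relative to replication."""
    ec = erasure_coding_overhead(data_units, parity_units)
    return 1 - ec / replication_overhead(replicas)


if __name__ == "__main__":
    # Assumed layout: 6 data units + 3 parity units versus 3 replicas.
    print("EC overhead:", erasure_coding_overhead(6, 3))
    print("Replication overhead:", replication_overhead(3))
    print("Savings:", savings_vs_replication(6, 3))
```

Under these assumptions, erasure coding stores 1.5 raw bytes per logical byte instead of 3, while still tolerating the loss of any three units in a stripe.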

YARN upgrades bring cgroup-based memory and I/O isolation, Curator-based ResourceManager leader election, container resizing, and the next-generation Timeline Service (v2).

MapReduce adds a native C/C++ map-output collector (NativeTask) that can speed up shuffle-intensive jobs by about 30%. It can also infer memory parameters automatically, which reduces the risk of misconfiguration.
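To make the memory inference concrete, here is a simplified sketch of the idea: when a task's JVM options carry no explicit `-Xmx`, the heap size is derived from the container size multiplied by a ratio. The ratio of 0.8 is assumed here to match the default of `mapreduce.job.heap.memory-mb.ratio`; the function name and the rounding are illustrative, not Hadoop's actual implementation.

```python
import re
from typing import Optional

# Assumed default for mapreduce.job.heap.memory-mb.ratio.
HEAP_RATIO = 0.8


def infer_heap_mb(container_mb: int, java_opts: str = "",
                  ratio: float = HEAP_RATIO) -> Optional[int]:
    """Return the heap size (MB) to infer for a task container.

    Returns None when java_opts already sets -Xmx, because an explicit
    setting takes precedence and nothing needs to be inferred.
    """
    if re.search(r"-Xmx\d+[kKmMgG]?", java_opts):
        return None
    # Rounding down is an assumption made for this sketch.
    return int(container_mb * ratio)


if __name__ == "__main__":
    print(infer_heap_mb(2048))            # heap derived from a 2048 MB container
    print(infer_heap_mb(2048, "-Xmx1g"))  # explicit -Xmx, so no inference
```

The point of the design is that operators set one number (the container size) instead of keeping the container size and `-Xmx` in sync by hand.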

Additional changes include splitting the Hadoop client into the hadoop-client-api and hadoop-client-runtime JARs, support for the Azure and Aliyun distributed file systems, and various bug fixes and default-port changes.
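The port changes matter in practice because scripts and bookmarks written for Hadoop 2.x often hard-code the old defaults. The sketch below rewrites such addresses using a partial mapping of HDFS default ports as I understand them from the Hadoop 3.0 release notes. Treat the table as an assumption to verify against `hdfs-default.xml` for your exact release; the helper name is hypothetical.

```python
# Partial mapping of Hadoop 2.x default HDFS ports to Hadoop 3.x defaults.
# Assumed values; verify against hdfs-default.xml for your release.
HDFS_PORT_CHANGES = {
    50070: 9870,  # NameNode HTTP
    50470: 9871,  # NameNode HTTPS
    50090: 9868,  # Secondary NameNode HTTP
    50091: 9869,  # Secondary NameNode HTTPS
    50010: 9866,  # DataNode data transfer
    50020: 9867,  # DataNode IPC
    50075: 9864,  # DataNode HTTP
    50475: 9865,  # DataNode HTTPS
}


def migrate_address(address: str) -> str:
    """Rewrite 'host:port' when the port is a known Hadoop 2.x default."""
    host, sep, port = address.rpartition(":")
    if not sep or not port.isdigit():
        return address  # no port present, leave untouched
    new_port = HDFS_PORT_CHANGES.get(int(port))
    return f"{host}:{new_port}" if new_port is not None else address


if __name__ == "__main__":
    print(migrate_address("nn1.example.com:50070"))
    print(migrate_address("nn1.example.com:8020"))
```

Addresses with ports outside the table, or with no port at all, pass through unchanged, so the helper is safe to run over a mixed list of endpoints.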

For a concise overview, the ten most notable changes are listed below.

1. The minimum required JDK version is raised from 1.7 to 1.8
2. HDFS supports erasure coding
3. Timeline Server v2
4. hadoop-client is split into two dependencies, hadoop-client-api and hadoop-client-runtime
5. Support for opportunistic containers and distributed scheduling
6. MapReduce adds task-level native optimization, improving performance by about 30%
7. Support for multiple NameNodes in standby state
8. Several default ports are changed
9. Support for Microsoft's Azure distributed file system and Alibaba's Aliyun distributed file system
10. Load balancing is added inside the DataNode

Official documentation is available at http://hadoop.apache.org/docs/r3.0.0/.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
