How Hadoop YARN on Kubernetes Pods Supercharges Resource Utilization and Cuts Costs
This article explains how Tencent Cloud EMR integrated Hadoop YARN with Kubernetes Pods to create a hybrid online‑offline deployment, implement elastic autoscaling and multi‑label resource allocation, and achieve several‑hundred‑percent improvements in CPU utilization while preserving cluster stability.
Background and Motivation
Traditional Hadoop ecosystems rely on YARN for resource management, which leads to separate online and offline clusters with low overall utilization and high cost. To provide dynamic offline capacity without reserving idle resources, Tencent Cloud EMR and the container team introduced Hadoop YARN on Kubernetes Pod, which enables elastic scaling and mixed deployment of online and offline workloads.
Hybrid Deployment Mode
The solution provides two key capabilities: elastic scaling, which quickly adds container resources when needed, and online‑offline mixed deployment, which uses idle capacity in the online TKE cluster for offline jobs and so reduces the need to pre‑reserve offline resources.
Elastic Autoscaling Module (yarn‑autoscaler)
The autoscaler supports two scaling strategies:
Load‑based scaling – triggers on YARN metrics such as availablevcore, pendingvcore, availablemem, pendingmem.
Time‑based scaling – follows daily, weekly or monthly schedules.
When a rule fires, the offline module queries the online TKE cluster for available compute specifications, calls the Kubernetes API to create the required Pods, and the ex‑scheduler places them on nodes with the most free resources.
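The load‑based rule can be pictured as a small sizing function. The following is a minimal sketch rather than the yarn‑autoscaler's actual logic: the `YarnMetrics` and `PodSpec` types, the `pods_to_add` function, and the `max_pods` cap are all hypothetical names, and the gap formula (pending minus available) is an assumption about how the four metrics might be combined.

```python
import math
from dataclasses import dataclass


@dataclass
class YarnMetrics:
    """Cluster-level metrics, named after those listed in the article."""
    available_vcore: int
    pending_vcore: int
    available_mem_mb: int
    pending_mem_mb: int


@dataclass
class PodSpec:
    """Hypothetical Pod size offered by the online cluster."""
    vcores: int
    mem_mb: int


def pods_to_add(metrics: YarnMetrics, spec: PodSpec, max_pods: int) -> int:
    """Return how many Pods would cover the unmet demand, capped at max_pods."""
    # Assumption: demand not covered by free capacity is what triggers scale-out.
    vcore_gap = max(0, metrics.pending_vcore - metrics.available_vcore)
    mem_gap = max(0, metrics.pending_mem_mb - metrics.available_mem_mb)
    # Size for whichever resource needs more Pods, rounding up.
    by_vcore = math.ceil(vcore_gap / spec.vcores)
    by_mem = math.ceil(mem_gap / spec.mem_mb)
    return min(max_pods, max(by_vcore, by_mem))
```

A real rule would likely also require the condition to hold for some period before firing, so that a brief spike in pending resources does not create Pods that are immediately released.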
Challenges Introduced by the New Features
1. AM Pod Eviction – Under node‑resource pressure, kubelet may evict Pods, causing the Application Master (AM) to fail and the whole application to restart, which is unacceptable for large jobs.
2. YARN’s Non‑Exclusive Partition Limits – YARN supports exclusive and non‑exclusive node‑label partitions, but using them in a mixed environment raises two problems:
Resource isolation – resources in a non‑exclusive partition can be consumed by jobs from other partitions, so they cannot be reclaimed promptly when the partition's own jobs need them.
Dynamic partition selection – without visibility of remaining resources, jobs cannot choose the most suitable partition.
YARN Modifications to Address the Challenges
1. AM‑Side Storage Medium Selection
The YARN community version does not account for heterogeneous cloud resources. The new design lets the AM decide whether to run on a Pod‑type resource via a configurable flag that each NodeManager reports to the ResourceManager through RPC. Benefits include:
Decentralization: Reduces logic in ResourceManager.
Cluster stability: Only ResourceManager restarts are needed; NodeManager remains unchanged.
Simplicity: Users can control AM placement without code changes.
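The effect of the reported flag on AM placement can be illustrated with a short filter. This is a sketch under stated assumptions, not the modified scheduler code, which lives inside YARN and is written in Java. The `NodeReport` type and both functions are hypothetical; the `am_alloc_disabled` field stands in for the per‑NodeManager flag, and choosing the node with the most free vcores is an arbitrary tie‑breaking choice made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeReport:
    """Hypothetical view of what a NodeManager reports to the ResourceManager."""
    host: str
    am_alloc_disabled: bool  # True means: do not place an AM on this node
    free_vcores: int


def am_candidates(nodes: List[NodeReport]) -> List[NodeReport]:
    """Keep only the nodes that still accept Application Master containers."""
    return [n for n in nodes if not n.am_alloc_disabled]


def pick_am_node(nodes: List[NodeReport]) -> Optional[NodeReport]:
    """Pick the eligible node with the most free vcores, or None if none is eligible."""
    candidates = am_candidates(nodes)
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.free_vcores)
```

Marking Pod‑backed NodeManagers as ineligible keeps the AM on nodes that kubelet will not evict, while ordinary task containers can still use the Pod capacity.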
2. Multi‑Label Dynamic Resource Allocation
YARN originally allowed a single label in job submissions. To enable simultaneous use of multiple partitions, the team extended the label expression syntax with logical operators (e.g., x||) and built a resource‑statistics module that dynamically reports each partition’s available capacity, allowing the scheduler to pick the best partition.
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>a,b</value>
  </property>
  ...
</configuration>
Practical Steps
Example configuration for a test environment:
NodeManager on 172.17.48.28/172.17.48.17 belongs to the default partition.
NodeManager on 172.17.48.29/172.17.48.26 belongs to the x partition.
To force the AM to run only on a specific node, set:
yarn.nodemanager.am-alloc-disabled = true
Submit a MapReduce job with combined labels:
hadoop jar /usr/local/service/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar pi \
-D mapreduce.job.queuename="a" \
-D mapreduce.job.node-label-expression="x||" 10 10
After submission, containers are allocated across the x and default partitions.
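How an expression such as x|| might be resolved against per‑partition capacity can be sketched as follows. This is an illustration only, not the team's implementation: both function names are hypothetical, the default partition is assumed to be represented by an empty string (which is how the trailing `||` in `x||` reads), and "best partition" is assumed to mean the one with the most available vcores that can fit the request.

```python
from typing import Dict, List, Optional

DEFAULT_PARTITION = ""  # assumption: the default partition is the empty label


def parse_label_expression(expr: str) -> List[str]:
    """Split an expression like 'x||' into partition names, e.g. ['x', '']."""
    partitions: List[str] = []
    for part in expr.split("||"):
        name = part.strip()
        if name not in partitions:  # drop duplicates, keep first-seen order
            partitions.append(name)
    return partitions


def pick_partition(expr: str,
                   available_vcores: Dict[str, int],
                   needed_vcores: int) -> Optional[str]:
    """Return the listed partition with the most free vcores that fits the request."""
    fitting = [p for p in parse_label_expression(expr)
               if available_vcores.get(p, 0) >= needed_vcores]
    if not fitting:
        return None  # no listed partition can serve the request right now
    return max(fitting, key=lambda p: available_vcores[p])
```

In this sketch, the `available_vcores` mapping plays the role of the resource‑statistics module: it would be refreshed as capacity changes, so successive container requests from one job can land in different partitions.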
Best Practices and Results
In production, the mixed deployment raised CPU utilization on the online TKE cluster by up to 500% by putting otherwise idle capacity to work, which improved offline job timeliness and overall cluster cost efficiency. The approach also maintained high stability because only the ResourceManager needed a restart during upgrades.
Conclusion
The Hadoop YARN on Kubernetes Pod solution demonstrates how cloud‑native containerization can optimize big‑data workloads, boost resource utilization, and lower hardware expenses. Future work will explore additional cloud‑native scenarios for enterprise big‑data applications.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
