
Boost Cluster Efficiency with Koordinator’s K8s‑YARN Co‑Location Solution

Koordinator, an open‑source container scheduler, now supports co‑locating Kubernetes Pods and Hadoop YARN tasks on the same nodes. Over‑provisioned batch resources are shared with YARN without modifying YARN itself. In Xiaohongshu’s production clusters, the approach has raised average CPU utilization by 8‑10 % while keeping the offline task eviction rate below 1 %.

Alibaba Cloud Native

Background

Koordinator is an open‑source project from Alibaba that originally focused on container scheduling within the Kubernetes ecosystem. While many workloads have moved to K8s, a large number of big‑data jobs still run on Apache Hadoop YARN, and cloud providers continue to offer YARN‑based services such as E‑MapReduce.

Motivation and Community Effort

To extend Koordinator’s offline co‑location capabilities, developers from Alibaba Cloud, Xiaohongshu, and Ant Financial launched a joint Hadoop YARN‑K8s co‑location project. The solution enables over‑provisioned batch resources to be shared with YARN, and it is already deployed in Xiaohongshu’s production environment.

Design Principles

YARN remains the submission entry for offline jobs.

The solution builds on the open‑source Hadoop YARN without invasive modifications.

Co‑located resources can be consumed by both K8s Pods and YARN tasks on the same node.

QoS policies are managed by Koordlet and are compatible with YARN task runtime.

Architecture

ResourceManager (RM) and NodeManager (NM) stay as core YARN components; NM runs as a container in the mixed environment. Koordinator adds a koord‑yarn‑operator to synchronize batch resource quotas to the YARN RM. Resource isolation is enforced via cgroup paths under the besteffort QoS class.
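To make the quota synchronization concrete, here is a minimal sketch of the kind of conversion an operator would need before reporting a node's capacity to the YARN RM. It is an illustration under stated assumptions, not the koord‑yarn‑operator implementation: it assumes batch resources are published as the node extended resources kubernetes.io/batch-cpu (in milli‑cores) and kubernetes.io/batch-memory (in bytes), and that the share already requested by K8s batch Pods is subtracted before the remainder is offered to YARN. The names YarnNodeResource and to_yarn_resource are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

# Extended resource names assumed for Koordinator batch resources.
BATCH_CPU = "kubernetes.io/batch-cpu"        # assumed unit: milli-cores
BATCH_MEMORY = "kubernetes.io/batch-memory"  # assumed unit: bytes

MIB = 1024 * 1024


@dataclass(frozen=True)
class YarnNodeResource:
    """Hypothetical container for the capacity reported to the YARN RM."""
    vcores: int
    memory_mb: int


def to_yarn_resource(
    batch_allocatable: Mapping[str, int],
    batch_requested_by_pods: Mapping[str, int],
) -> YarnNodeResource:
    """Convert the batch resources left over on a node into YARN units.

    Whatever K8s batch Pods have already requested is subtracted first,
    so both frameworks draw from the same pool without double counting.
    """
    cpu_milli = max(
        0,
        batch_allocatable.get(BATCH_CPU, 0)
        - batch_requested_by_pods.get(BATCH_CPU, 0),
    )
    mem_bytes = max(
        0,
        batch_allocatable.get(BATCH_MEMORY, 0)
        - batch_requested_by_pods.get(BATCH_MEMORY, 0),
    )
    # Round down: never advertise more to YARN than is actually free.
    return YarnNodeResource(vcores=cpu_milli // 1000, memory_mb=mem_bytes // MIB)
```

Rounding down is a deliberate choice in this sketch: a fractional core that cannot be expressed as a whole vcore is simply not advertised, which errs on the side of protecting online workloads.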

A sidecar module, koord‑yarn‑copilot, collects task metadata and resource metrics and carries out eviction actions. The QoS strategies themselves remain in Koordlet and reach the copilot through a plugin interface, which keeps the design extensible to other resource frameworks in the future.
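The split between strategy and execution can be illustrated with a small sketch. Everything here is hypothetical: the article does not describe the plugin interface's actual shape, so the names YarnContainer, EvictionStrategy, and LowestPriorityFirst, as well as the rule "lower priority value means less important", are assumptions made for illustration only. The idea shown is that the strategy decides which YARN containers to evict, while a separate component would carry out the eviction.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass(frozen=True)
class YarnContainer:
    """Hypothetical view of a running YARN container."""
    container_id: str
    priority: int   # assumption: lower value = less important
    memory_mb: int


class EvictionStrategy(Protocol):
    """Hypothetical plugin interface: the strategy only picks victims."""

    def select_victims(
        self, containers: List[YarnContainer], memory_to_release_mb: int
    ) -> List[YarnContainer]:
        ...


class LowestPriorityFirst:
    """Evict the least important containers first, largest first within
    a priority level, until enough memory would be released."""

    def select_victims(
        self, containers: List[YarnContainer], memory_to_release_mb: int
    ) -> List[YarnContainer]:
        victims: List[YarnContainer] = []
        released = 0
        ordered = sorted(containers, key=lambda c: (c.priority, -c.memory_mb))
        for container in ordered:
            if released >= memory_to_release_mb:
                break
            victims.append(container)
            released += container.memory_mb
        return victims
```

Because the strategy is just an object with a select_victims method, a different policy (for example, one that protects long‑running jobs) could be swapped in without touching the code that performs the eviction.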

Production Experience at Xiaohongshu

Facing heavy Spark workloads that congested offline clusters, Xiaohongshu leveraged the co‑location solution to keep the YARN submission interface unchanged while moving tasks onto idle online resources. Key techniques included RemoteShuffleService to mitigate local‑disk bottlenecks and fine‑grained priority and QoS policies for different job types.

Results: the deployment covers tens of thousands of online nodes and supplies hundreds of thousands of CPU cores to offline jobs. The offline task eviction rate stays below 1 %, and average CPU utilization has risen by 8‑10 % (with some nodes exceeding 45 % utilization). The benefits continue to grow as more workloads are added.

Getting Started

The K8s‑YARN co‑location features are near completion; the Koordinator team is preparing the final release. Interested contributors can join the discussion at the community forum and follow the design documentation for detailed implementation steps.
