Big Data 23 min read

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

This article details Meituan's experience optimizing the Hadoop YARN fair scheduler, covering background challenges, architectural components, resource abstractions, scheduling flow, performance metrics, a series of code‑level optimizations, stability strategies for production rollout, and future directions for large‑scale cluster scheduling.

Qunar Tech Salon

Aug 22, 2019

Performance Optimization Practices for Meituan's Hadoop YARN Fair Scheduler

Background

YARN is Hadoop's resource manager responsible for managing compute resources and job scheduling across the cluster. Meituan builds on the community 2.7.1 branch to support offline, real‑time, and machine‑learning workloads. As the cluster grows, scheduler performance becomes a bottleneck, limiting resource utilization.

Overall Architecture

YARN Architecture

YARN finds suitable resources for jobs, launches tasks, and manages job lifecycles.

Resource Abstraction

YARN abstracts resources in two dimensions: CPU and memory.

class Resource{<br>  int cpu;   // number of CPU cores<br>  int memory‑mb; // memory in MB<br>}

Job requests are expressed as List[ResourceRequest]:

class ResourceRequest{<br>  int numContainers; // number of containers needed<br>  Resource capability; // resources per container<br>}

The scheduler returns List[Container]:

class Container{<br>  ContainerId containerId; // globally unique ID<br>  Resource capability; // container resources<br>  String nodeHttpAddress; // NodeManager hostname<br>}

YARN Scheduling Architecture

Key components include ResourceScheduler, AsyncDispatcher, ResourceTrackerService, ApplicationMasterService, and AppMaster.

Fair Scheduler

Job Organization

Jobs (Apps) are leaf nodes in a tree‑structured queue hierarchy.

Core Scheduling Process

The scheduler locks the FairScheduler object to avoid data‑structure conflicts.

It selects a node, traverses the queue tree from ROOT, chooses a child queue at each level by the fair policy, and finally selects an App to allocate resources on the node.

For each queue the scheduler performs a pre‑check, sorts child queues/apps, and recurses.

Fair Scheduler Architecture

The scheduler is single‑threaded for the core flow, making container allocation a serialization point.

Scheduler Lock: FairScheduler object lock.

AllocationFileLoaderService: hot‑loads fair‑policy config.

Continuous Scheduling Thread: repeatedly executes the core flow.

Update Thread: updates queue demands and performs container preemption.

Scheduler Event Dispatcher Thread: handles events such as App addition/removal and Node changes.

Performance Evaluation

Traditional online metrics (QPS, TP99) are unsuitable for schedulers because RPC latency is decoupled from scheduling latency. Instead, Meituan defines validSchedule (whether resource demand is met) and measures it per minute and per day.

validPending = min(queuePending, QueueMaxQuota)<br>if (usage/total > 90% || validPending == 0) validSchedulePerMin = 1;<br>if (validPending > 0 && usage/total < 90%) validSchedulePerMin = 0;

Another key metric is CPS (Containers per Second), which reflects the scheduler’s throughput under varying pressure.

Key Optimization Points

Optimizing Sorting Comparison Function

The original quick‑sort comparator repeatedly computed resourceUsage by traversing the entire subtree, causing high CPU cost. The new approach pre‑computes and updates resourceUsage incrementally when containers are allocated or released, reducing the complexity to O(1).

class FairScheduler{<br>  synchronized Resource attemptScheduling(NodeId node){<br>    root.assignContainer(node);<br>  }<br>}<br>class Queue{<br>  Resource assignContainer(NodeId node){<br>    if (!preCheck(node)) return;<br>    sort(this.children);<br>    if (this.isParent){<br>      for (Queue q : this.children) q.assignContainer(node);<br>    } else {<br>      for (App app : this.runnableApps) app.assignContainer(node);<br>    }<br>  }<br>}

Optimizing Job Skip Time

Many queues/apps have zero demand. By filtering them out before sorting, the time spent skipping such jobs dropped from ~20 seconds per minute to negligible.

Parallel Queue Sorting

Sorting each queue for every container allocation is linear in the number of queues/apps. Meituan decoupled sorting from allocation, using a thread‑pool where each thread sorts a cloned snapshot of a queue, achieving sub‑5 ms sorting for 2 000 queues and supporting >5 × 10⁴ CPS under heavy load.

Stability Roll‑out Strategies

Online Rollback

All optimization knobs are configurable at runtime. Changing a parameter triggers a copy‑on‑write of the current configuration so that the scheduler sees a consistent view during a dispatch cycle, avoiding crashes caused by mid‑cycle config changes.

Automatic Data Verification

The system periodically compares the old (on‑the‑fly) resourceUsage with the new pre‑computed value. Mismatches raise alerts and automatically correct the erroneous data.

Conclusion and Future Outlook

The article presents a systematic approach: define macro metrics, instrument fine‑grained counters, use an efficient load‑simulation tool, apply algorithmic improvements (complexity reduction, caching, parallelism), and ensure safe production roll‑out. Future work includes adopting Hadoop 3.0 Global Scheduling and YARN Federation to scale beyond 10 k nodes.

Author

Shi Long and Ting Wen , R&D Engineers, Meituan User Platform Big Data & Algorithm Department.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization Big Data resource scheduling YARN Fair Scheduler Load Simulation

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.