
LAR: Load Auto-Regulator System for Resource Utilization and Service Quality

The article analyzes Meituan’s self‑designed Load Auto‑Regulator (LAR), detailing its tiered resource‑pool architecture, dynamic load‑to‑static‑resource mapping, and QoS mechanisms that together raise data‑center CPU utilization by 5‑10% while keeping online service quality stable, and discusses its deployment in online and mixed‑workload scenarios.

Meituan Technology Team

Background

Cloud-era data centers have grown to massive scale, yet overall resource utilization remains low: global averages hover around 10%-20%, which drives up cost and wastes energy. National policies and industry reports identify higher utilization as a key to green, cost-effective operations. Google’s Borg paper shows a rise from ~30% to ~50% CPU utilization over eight years, saving billions in costs, and many cloud providers now pursue similar gains.

What Is LAR?

LAR (Load Auto‑Regulator) is Meituan’s cluster‑load auto‑balancing management system built on top of Kubernetes. It introduces a tiered resource‑pool model and a complete QoS guarantee mechanism to reconcile the traditionally conflicting goals of higher resource utilization and strict service‑quality assurance.

Goals and Challenges

The system aims to increase resource reuse while maintaining service stability. Key challenges include inter‑service interference on shared hardware, workload peaks and valleys, and the unacceptable degradation of critical online services under contention.

System Architecture

LAR adds two core innovations to Kubernetes:

Tiered Resource Pooling: Nodes are divided into multiple resource pools with different priorities and isolation levels; resources flow between pools based on load, ensuring high-priority pools receive sufficient capacity.

Dynamic Load ↔ Static Resource Mapping: Static resource requests are mapped to a dynamic load space; a Resource Configuration Factor (RCF), a value in the interval (0, 1], adjusts the actual allocatable resources of each pool according to real-time load.

These extensions are realized by three new components (QoSAdaptor, Recommender, and an enhanced Scheduler) that integrate with the native Kubernetes Scheduler and Kubelet without invasive changes.
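To make the dynamic load to static resource mapping more concrete, here is a minimal sketch. It assumes one possible reading of the RCF: each static CPU request is multiplied by the pool's RCF before being counted against the pool's physical capacity, so a lower factor lets the pool admit more requests. The article does not give the actual formula, and the `Pool` class and all of its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Pool:
    """A resource pool on a node (illustrative model, not LAR's real data structure)."""
    name: str
    physical_cpu: float  # CPU cores physically backing the pool
    rcf: float           # Resource Configuration Factor, must lie in (0, 1]

    def __post_init__(self) -> None:
        if not 0.0 < self.rcf <= 1.0:
            raise ValueError(f"RCF must be in (0, 1], got {self.rcf}")

    def effective_demand(self, requested_cpu: float) -> float:
        """Assumed mapping: scale the static request by the pool's RCF."""
        return requested_cpu * self.rcf

    def fits(self, existing_requests: Sequence[float], new_request: float) -> bool:
        """Check whether a new request fits once every request is scaled by RCF."""
        used = sum(self.effective_demand(r) for r in existing_requests)
        return used + self.effective_demand(new_request) <= self.physical_cpu


if __name__ == "__main__":
    strict = Pool("high-priority", physical_cpu=16.0, rcf=1.0)
    relaxed = Pool("low-priority", physical_cpu=16.0, rcf=0.5)
    print(strict.fits([8.0, 8.0], 8.0))        # no overcommit in the strict pool
    print(relaxed.fits([8.0, 8.0, 8.0], 8.0))  # the relaxed pool admits more requests
```

Under this assumption, an RCF of 1.0 reproduces ordinary request-based accounting, while smaller values trade isolation for density, which matches the idea that lower-priority pools can be packed more tightly.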

Key Capability Implementations

Tiered Pooling Model: Dynamic pool management and priority-based isolation; high-priority pools receive stricter CPU pinning, I/O, network, and cache isolation.

Dynamic Resource View: Real-time mapping of load to static resources, enabling seamless resource flow among pools while respecting user-requested limits.

QoS Service-Quality Guarantees: Multi-dimensional isolation (CPU, memory, disk I/O, cache, network, etc.) combined with tiered mitigation actions (container eviction, CPU throttling, forced resource pre-emption) that are triggered at per-second granularity according to load level.

Intelligent Operation: The Recommender predicts peak usage from historical load, updates pool configurations, and guides node-level scaling decisions.
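The tiered mitigation actions named under QoS Service-Quality Guarantees can be pictured as a threshold ladder that is checked on every sampling tick. The sketch below is an assumption-heavy illustration: the thresholds, the severity ordering of the three actions, and the rule of targeting the lowest-priority container first are all hypothetical, since the article lists the actions but not how LAR selects among them.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Action(Enum):
    NONE = "none"
    THROTTLE_CPU = "throttle_cpu"
    EVICT_CONTAINER = "evict_container"
    PREEMPT_RESOURCES = "preempt_resources"


# Hypothetical ladder: (minimum node load as a fraction of capacity, action),
# ordered from the most severe response to the mildest.
LADDER: List[Tuple[float, Action]] = [
    (0.95, Action.PREEMPT_RESOURCES),
    (0.85, Action.EVICT_CONTAINER),
    (0.75, Action.THROTTLE_CPU),
]


def choose_action(load: float) -> Action:
    """Return the most severe action whose threshold the load has reached."""
    for threshold, action in LADDER:
        if load >= threshold:
            return action
    return Action.NONE


def mitigate(load: float, containers: List[Tuple[str, int]]) -> Tuple[Action, Optional[str]]:
    """Pick an action and a target container.

    `containers` holds (name, priority) pairs, where a larger number means
    a more important workload. The least important container is targeted first.
    """
    action = choose_action(load)
    if action is Action.NONE or not containers:
        return Action.NONE, None
    name, _priority = min(containers, key=lambda c: c[1])
    return action, name


if __name__ == "__main__":
    running = [("checkout-api", 10), ("report-batch", 1)]
    for load in (0.50, 0.80, 0.90, 0.97):
        print(load, mitigate(load, running))
```

Calling `mitigate` once per second with a fresh load sample would correspond to the per-second triggering described above; a real controller would also need hysteresis so that actions do not flap around a threshold.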

Application Scenarios

Online Services: Critical, latency-sensitive services are placed in the highest-priority pool with strong isolation (CPU pinning, process-level scheduling). Less critical online services occupy lower-priority pools. Deployments in Meituan’s production clusters show average CPU utilization 5-10% higher than native Kubernetes while maintaining or improving overall service quality.

Mixed (Online + Offline) Workloads: Offline batch jobs, which tolerate higher latency, share lower-priority pools. During online low-load periods, idle resources from high-priority pools are reclaimed for offline jobs; during peaks, resources are pre-empted back to online services within seconds, ensuring no degradation of critical services.
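The lend-and-reclaim behaviour in the mixed-workload scenario can be sketched as a function that recomputes the split of a node's CPU between the online and offline pools from the current online usage. This is a simplified model under stated assumptions: the reserved floor, the headroom margin, and the `NodePools` class are hypothetical and are not taken from LAR itself.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodePools:
    """Splits one node's CPU between an online and an offline pool (illustrative)."""
    total_cpu: float
    online_floor: float     # hypothetical: cores always kept for online services
    headroom: float = 0.25  # hypothetical: safety margin above current online usage

    def split(self, online_usage: float) -> Tuple[float, float]:
        """Return (online_cpu, offline_cpu) for the given online usage in cores."""
        wanted = online_usage * (1.0 + self.headroom)
        # Online capacity never drops below the floor and never exceeds the node.
        online = min(self.total_cpu, max(self.online_floor, wanted))
        # Whatever the online pool does not need right now is lent to offline jobs.
        return online, self.total_cpu - online


if __name__ == "__main__":
    node = NodePools(total_cpu=32.0, online_floor=8.0)
    print(node.split(4.0))   # low online load: most cores are lent to offline jobs
    print(node.split(20.0))  # rising online load: offline share shrinks
    print(node.split(30.0))  # peak: the online pool takes the whole node back
```

Re-running `split` on every load sample gives the qualitative behaviour described above: offline jobs absorb idle capacity in the valleys and lose it again as soon as online demand returns.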

Evolution Roadmap

Since 2021, LAR has progressed through versions 1.0 (tiered pooling and dynamic view) and 2.0 (full QoS guarantee). Ongoing work focuses on deeper automation, smarter prediction, and broader mixed‑workload adoption.

Performance Highlights

In production, LAR’s online clusters achieve a 5‑10% increase in average CPU utilization compared with native Kubernetes, while service‑quality metrics (e.g., latency, TPS) remain more stable. Figures illustrate the CPU utilization curves and quality metrics across both systems.

Figure: Meituan online service dual-peak characteristic
Figure: Hulk resource utilization operation system
Figure: LAR system architecture
Figure: Online cluster CPU utilization
Figure: Cluster service quality