
How ByteDance’s Katalyst Memory Advisor Boosts Kubernetes Memory Efficiency

This article explains the challenges of memory management in mixed workloads, outlines the limitations of native Linux and Kubernetes mechanisms, and details how ByteDance’s open‑source Katalyst Memory Advisor improves memory utilization, QoS, and eviction handling through user‑space policies, multi‑dimensional interference detection, and adaptive mitigation actions.

Volcano Engine Developer Services

Oct 12, 2023


Background

In mixed‑workload environments, memory management is critical: when node or container memory runs tight, services can suffer latency jitter or OOM kills, and the risk grows when memory is oversold. Memory that is not released promptly also shrinks the amount that can be allocated to offline jobs, which limits how effective overselling can be.

Limitations of Native Solutions

The kernel's native memory allocation and reclamation are greedy: memory is handed out aggressively, and reclamation starts only once free memory drops to the watermarks. The fast allocation path checks the low watermark and may invoke a quick reclaim, while the slow path wakes Kswapd, attempts compaction and global reclamation, and may ultimately trigger OOM.
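
The fast and slow paths described above can be sketched as a small decision function. This is a deliberately simplified model, not kernel code: the function name, the page counts, and the three return labels are assumptions made for illustration.

```python
def classify_allocation(free_pages: int, request_pages: int,
                        wm_min: int, wm_low: int) -> str:
    """Simplified model of which path a page allocation would take.

    All thresholds are plain page counts chosen by the caller; the real
    kernel logic has many more inputs (zones, allocation flags, order).
    """
    remaining = free_pages - request_pages
    if remaining >= wm_low:
        # Enough free memory above the low watermark: fast path succeeds.
        return "fast"
    if remaining >= wm_min:
        # Below low but above min: slow path begins by waking kswapd,
        # which reclaims in the background while the allocation proceeds.
        return "wake_kswapd"
    # Below min: the caller must reclaim synchronously (compaction,
    # reclamation) and may end in OOM if nothing can be freed.
    return "direct_reclaim"


if __name__ == "__main__":
    for free in (1000, 205, 105):
        print(free, classify_allocation(free, 10, wm_min=100, wm_low=200))
```

The point of the model is that nothing is reclaimed until a watermark is crossed, which is why latency-sensitive workloads can be stalled by reclamation that could have happened earlier.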

Kubernetes' native memory management includes memory limits (set through memory.limit_in_bytes), eviction and node taints (node.kubernetes.io/memory-pressure), and OOM scoring (/proc/<pid>/oom_score_adj) based on QoS, priority, and usage. Memory QoS (available since v1.22) uses the cgroup v2 settings memory.min, memory.high, and memory.max, but it suffers from fairness, priority, and throttling issues.
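
To make the three cgroup v2 knobs concrete, here is a minimal sketch of how a container's request and limit might map onto them. The memory.high formula and the 0.9 default throttling factor follow the description published for the Kubernetes v1.27 Memory QoS feature as best recalled here; treat both as assumptions and check the Kubernetes documentation for the version you run. The function name and return shape are hypothetical.

```python
from typing import Dict, Optional, Union

PAGE_SIZE = 4096  # assumed page size in bytes


def memory_qos_settings(request_bytes: int,
                        limit_bytes: Optional[int],
                        node_allocatable_bytes: int,
                        throttling_factor: float = 0.9,
                        page_size: int = PAGE_SIZE) -> Dict[str, Union[int, str]]:
    """Sketch of cgroup v2 values for one container (illustrative only)."""
    # Without a limit, the node's allocatable memory is used as the ceiling.
    ceiling = limit_bytes if limit_bytes is not None else node_allocatable_bytes
    raw_high = request_bytes + throttling_factor * (ceiling - request_bytes)
    # Round down to a whole number of pages.
    high = int(raw_high // page_size) * page_size
    return {
        "memory.min": request_bytes,   # memory protected from reclaim
        "memory.high": high,           # throttling starts above this
        "memory.max": limit_bytes if limit_bytes is not None else "max",
    }


if __name__ == "__main__":
    gib = 1 << 30
    print(memory_qos_settings(1 * gib, 2 * gib, 8 * gib))
```

Because memory.high sits between the request and the limit, a container is throttled before it reaches the hard limit, which is where the throttling issues mentioned above come from.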

Katalyst Memory Advisor Overview

Katalyst implements a user‑space memory management framework called Memory Advisor, open‑sourced in the Katalyst resource manager. Its architecture is plug‑in based, consisting of:

Katalyst Agent with Eviction Manager and various eviction plugins (System Memory Pressure, NUMA Memory Pressure, RSS Overuse, Reclaimed Resource Pressure).

Memory QRM Plugin for Memcg configuration and Drop Cache.

SysAdvisor with plugins such as Cache Reaper, Memory Guard, and Memset Binder.

Reporter for taint reporting and MetaServer for metadata.

Malachite for metrics collection.

Katalyst Scheduler with native and QoS‑aware taint‑toleration plugins.

Key Features of Memory Advisor

Multi‑dimensional interference detection monitors whole‑node and NUMA watermarks, Kswapd reclaim rate, pod‑level RSS overuse, and QoS memory satisfaction.
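
A rough sketch of what checking these dimensions together could look like follows. It is not Katalyst's implementation: every field name, threshold, and signal label is hypothetical, and the way QoS memory satisfaction is measured here (available versus requested reclaimed memory) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemorySample:
    node_free_bytes: int
    numa_free_bytes: Dict[int, int]      # NUMA node id -> free bytes
    kswapd_pages_per_sec: float          # pages reclaimed by kswapd per second
    pod_rss_bytes: Dict[str, int]        # pod name -> resident set size
    pod_rss_limit_bytes: Dict[str, int]  # pod name -> allowed RSS (hypothetical)
    reclaimed_available_bytes: int       # memory offline pods can actually use
    reclaimed_requested_bytes: int       # memory offline pods asked for


@dataclass
class Thresholds:
    node_free_min_bytes: int
    numa_free_min_bytes: int
    kswapd_max_pages_per_sec: float


def detect_interference(s: MemorySample, t: Thresholds) -> List[str]:
    """Return one label per dimension that currently signals pressure."""
    signals: List[str] = []
    if s.node_free_bytes < t.node_free_min_bytes:
        signals.append("node_watermark")
    for node_id, free in sorted(s.numa_free_bytes.items()):
        if free < t.numa_free_min_bytes:
            signals.append(f"numa_watermark:{node_id}")
    if s.kswapd_pages_per_sec > t.kswapd_max_pages_per_sec:
        signals.append("kswapd_reclaim_rate")
    for pod, rss in sorted(s.pod_rss_bytes.items()):
        limit = s.pod_rss_limit_bytes.get(pod)
        if limit is not None and rss > limit:
            signals.append(f"rss_overuse:{pod}")
    if s.reclaimed_available_bytes < s.reclaimed_requested_bytes:
        signals.append("qos_memory_unsatisfied")
    return signals
```

Keeping each dimension as a separate signal, rather than collapsing them into one score, lets a later stage pick a mitigation that matches the kind of pressure observed.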

Multi‑level mitigation actions include:

Disabling scheduling via node taints.

"Tune Memcg" – adjusting Memcg thresholds for victim pods.

"Drop Cache" – forcing cache release via memory.force_empty (cgroup v1) or memory.reclaim (cgroup v2).

Eviction – ranking pods by QoS, priority, and memory usage before removal.
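
The eviction ranking in the last item can be illustrated with a plain sort. The ordering used here (least protected QoS first, then lowest priority, then highest memory usage) is an assumption about how the three criteria combine; the PodInfo type and the numeric qos_rank field are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodInfo:
    name: str
    qos_rank: int      # lower value = less protected QoS level (assumption)
    priority: int      # lower value = less important
    memory_bytes: int  # current memory usage


def rank_for_eviction(pods: List[PodInfo]) -> List[PodInfo]:
    """Order pods so that the first element is the first eviction candidate."""
    return sorted(pods, key=lambda p: (p.qos_rank, p.priority, -p.memory_bytes))


if __name__ == "__main__":
    pods = [
        PodInfo("online-api", qos_rank=2, priority=100, memory_bytes=8 << 30),
        PodInfo("batch-small", qos_rank=0, priority=10, memory_bytes=1 << 30),
        PodInfo("batch-large", qos_rank=0, priority=10, memory_bytes=4 << 30),
        PodInfo("batch-vip", qos_rank=0, priority=50, memory_bytes=6 << 30),
    ]
    print([p.name for p in rank_for_eviction(pods)])
```

With this ordering, memory usage only breaks ties between pods of the same QoS level and priority, so a large online pod is never evicted ahead of a small offline one.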

The Eviction Manager delegates strategies to plugins and converges actions, supporting dry‑run validation.
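
One way to picture the delegate-and-converge pattern with dry-run support is the sketch below. It is only an illustration of the idea: the plugin signature, the function name, and the log format are assumptions, not Katalyst's actual plugin interface.

```python
from typing import Callable, Dict, List, Set, Tuple

# A plugin is modeled as a function returning the pod names it wants evicted.
EvictionPlugin = Callable[[], List[str]]


def converge_evictions(plugins: Dict[str, EvictionPlugin],
                       dry_run: Set[str]) -> Tuple[List[str], List[str]]:
    """Merge plugin proposals into one eviction list.

    Plugins named in dry_run are still evaluated, but their proposals are
    only logged, which allows a new strategy to be validated safely.
    """
    to_evict: List[str] = []
    dry_run_log: List[str] = []
    seen: Set[str] = set()
    for name, plugin in plugins.items():
        for pod in plugin():
            if name in dry_run:
                dry_run_log.append(f"{name} would evict {pod}")
            elif pod not in seen:
                seen.add(pod)
                to_evict.append(pod)
    return to_evict, dry_run_log


if __name__ == "__main__":
    plugins = {
        "system-memory-pressure": lambda: ["pod-a", "pod-b"],
        "rss-overuse": lambda: ["pod-b", "pod-c"],
        "numa-memory-pressure": lambda: ["pod-d"],
    }
    print(converge_evictions(plugins, dry_run={"numa-memory-pressure"}))
```

Deduplicating in the manager rather than in each plugin means a pod flagged by several strategies is acted on once.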

Overall Memory Limit for Offline Workloads

Memory Guard computes the total memory quota for offline (reclaimed_cores) pods and writes it to memory.limit_in_bytes of the BestEffort cgroup.
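
The article does not give the quota formula, so the following is only a guess at its general shape: whatever is left after reserving memory for the system, for online workloads, and for a safety buffer. All parameter names are hypothetical.

```python
def offline_memory_limit(node_total_bytes: int,
                         system_reserved_bytes: int,
                         online_usage_bytes: int,
                         safety_buffer_bytes: int) -> int:
    """Assumed quota: leftover memory after online needs, never negative.

    The returned value is what would be written to memory.limit_in_bytes
    of the BestEffort cgroup; the real calculation may differ.
    """
    headroom = (node_total_bytes - system_reserved_bytes
                - online_usage_bytes - safety_buffer_bytes)
    return max(0, headroom)


if __name__ == "__main__":
    gib = 1 << 30
    print(offline_memory_limit(64 * gib, 4 * gib, 40 * gib, 8 * gib) // gib, "GiB")
```

Capping all offline pods through a single parent cgroup means that a burst of offline memory usage hits this limit before it can squeeze online workloads.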

Memory Migration

For NUMA‑sensitive workloads (e.g., Flink), Memory Advisor detects hotspot NUMA nodes and dynamically rebinds containers to less‑loaded NUMA nodes.
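
A minimal sketch of the target-selection step, assuming that "hotspot" means free memory below a threshold and that a move is only worthwhile when the target is clearly better. Both rules, and all names, are assumptions made for illustration.

```python
from typing import Dict, Optional


def pick_rebind_target(numa_free_bytes: Dict[int, int],
                       current_node: int,
                       hotspot_free_threshold: int,
                       min_gain_bytes: int) -> Optional[int]:
    """Return the NUMA node to rebind to, or None to stay put."""
    if numa_free_bytes[current_node] >= hotspot_free_threshold:
        return None  # current node is not a hotspot
    # Candidate is the node with the most free memory.
    target = max(numa_free_bytes, key=lambda n: numa_free_bytes[n])
    gain = numa_free_bytes[target] - numa_free_bytes[current_node]
    if target == current_node or gain < min_gain_bytes:
        return None  # moving would not help enough to justify migration
    return target


if __name__ == "__main__":
    print(pick_rebind_target({0: 10, 1: 100, 2: 60}, current_node=0,
                             hotspot_free_threshold=20, min_gain_bytes=30))
```

The minimum-gain check is there to avoid containers bouncing between nodes with similar load, since each migration has its own cost.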

Memcg Differentiated Reclamation

By leveraging veLinux’s asynchronous Memcg reclamation, ByteDance provides per‑pod annotations to set conservative or aggressive reclamation thresholds, reducing direct reclamation impact on latency‑sensitive services.

Cold Memory Offloading

Inspired by Meta’s TMO, future work will use PSI and DAMON to detect cold pages and either offload them to cheap storage or compress them via zRAM, increasing the memory available to oversold workloads.
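
Since this is described as future work, the sketch below is not Katalyst code. It shows a PSI-driven reclaim step loosely modeled on the controller formula published for TMO, where the amount reclaimed shrinks as memory pressure approaches a threshold. The parser assumes the standard "some avg10=... avg60=... avg300=... total=..." layout of a PSI memory pressure file; the function names and parameter values are hypothetical.

```python
def parse_psi_some_avg10(text: str) -> float:
    """Extract the 10-second 'some' average from PSI-formatted text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "some":
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    raise ValueError("no 'some avg10' entry found")


def reclaim_step(current_bytes: int, psi_some: float,
                 psi_threshold: float, reclaim_ratio: float) -> int:
    """Bytes to offload this round; zero once pressure reaches the threshold."""
    scale = max(0.0, 1.0 - psi_some / psi_threshold)
    return int(current_bytes * reclaim_ratio * scale)


if __name__ == "__main__":
    sample = ("some avg10=0.50 avg60=0.80 avg300=0.20 total=12345\n"
              "full avg10=0.10 avg60=0.05 avg300=0.01 total=678\n")
    psi = parse_psi_some_avg10(sample)
    print(reclaim_step(1000, psi, psi_threshold=1.0, reclaim_ratio=0.5))
```

Using pressure as feedback means offloading backs off on its own when the workload starts to stall on memory, instead of relying on a fixed cold-page target.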

Future Plans

Upcoming enhancements aim to decouple memory‑advisor capabilities from QoS, broaden applicability beyond mixed workloads, refine OOM priority handling, and integrate BPF‑based programmable OOM policies.

Conclusion

Deployed on over 900,000 nodes managing tens of millions of cores, Katalyst has raised cluster‑level memory utilization from ~20% to ~60% while maintaining stability for microservices, search, storage, big data, and AI jobs.
