
How Katalyst v0.4.0 Brings Tidal Colocation and Resource Overcommit to Native Kubernetes

Katalyst v0.4.0 introduces tidal colocation for mixed workloads, online resource overcommit, fine‑grained NUMA memory management, OOM priority enhancements, and topology‑aware scheduling, providing a comprehensive cost‑optimization solution for cloud‑native Kubernetes clusters.

ByteDance Cloud Native

Jan 24, 2024

Katalyst, ByteDance's open‑source cost‑optimization system, has released version 0.4.0, which adds new capabilities for cloud‑native environments.

Background

Katalyst targets inefficient resource usage in cloud‑native scenarios, offering standardized, open‑source tooling for resource management and cost reduction.

Tidal Colocation

Katalyst now supports tidal colocation, which mixes online and offline workloads on the same nodes by classifying each node as "online" or "offline" and by managing instance counts and node pools. This improves resource efficiency while reducing infrastructure complexity.

Instance Management: Uses HPA, CronHPA, etc., to scale online workloads and free resources for offline jobs.

Node Pool Management: The Tidal Controller performs bin‑packing on tidal node pools, allocating freed resources to offline workloads.
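
The node pool step can be pictured as a consolidation problem: once online instances scale down, the remaining online pods are packed onto as few nodes as possible so that whole nodes become available to offline jobs. The sketch below uses a first‑fit decreasing heuristic on CPU requests only. The function name, the single‑resource model, and the heuristic itself are illustrative assumptions, not the Tidal Controller's actual algorithm.

```python
from typing import Dict, List


def pack_online_pods(
    pod_cpu: Dict[str, float], node_capacity: float, node_count: int
) -> List[List[str]]:
    """Place each pod on the first node with room (first-fit decreasing).

    Hypothetical helper for illustration; not part of Katalyst.
    """
    nodes: List[List[str]] = [[] for _ in range(node_count)]
    free = [node_capacity] * node_count
    # Largest requests first, with the name as a tie-breaker for determinism.
    for name, cpu in sorted(pod_cpu.items(), key=lambda kv: (-kv[1], kv[0])):
        for i in range(node_count):
            if cpu <= free[i]:
                nodes[i].append(name)
                free[i] -= cpu
                break
        else:
            raise ValueError(f"pod {name} does not fit on any node")
    return nodes


pods = {"web-a": 6, "web-b": 4, "web-c": 3, "web-d": 2}
placement = pack_online_pods(pods, node_capacity=8, node_count=4)
# Nodes left empty after packing are candidates for offline workloads.
freed = [i for i, assigned in enumerate(placement) if not assigned]
print(placement)
print(freed)
```

In this example the four online pods fit on two nodes, so the other two nodes can be handed to offline jobs.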

Example configuration for a tidal node pool:

apiVersion: tide.katalyst.kubewharf.io/v1alpha1
kind: TideNodePool
metadata:
  name: tidenodepool-example
spec:
  nodeConfigs:
    nodeSelector:
      tidenodes: "true"
    reserve:
      offline: 25%
      online: 10%
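
As a rough illustration of the reserve fields, the sketch below splits a pool into nodes kept for online use, nodes kept for offline use, and nodes that can move between the two. It assumes the percentages are fractions of the pool's node count and that fractional results round up; both are assumptions made for this example, and the function name is hypothetical.

```python
def split_tidal_pool(total_nodes: int, online_reserve: str, offline_reserve: str) -> dict:
    """Split a pool into reserved online, reserved offline, and tidal nodes."""

    def reserved(percent: str) -> int:
        value = int(percent.rstrip("%"))
        # Integer ceiling division; rounding up is an assumption of this sketch.
        return (total_nodes * value + 99) // 100

    online = reserved(online_reserve)
    offline = reserved(offline_reserve)
    if online + offline > total_nodes:
        raise ValueError("reserves exceed the size of the pool")
    return {
        "online": online,
        "offline": offline,
        "tidal": total_nodes - online - offline,
    }


pool = split_tidal_pool(20, online_reserve="10%", offline_reserve="25%")
print(pool)  # {'online': 2, 'offline': 5, 'tidal': 13}
```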

Online Overcommit

Katalyst introduces a transparent overcommit mechanism for online workloads, allowing higher resource utilization without requiring changes from application owners.

Steps to enable overcommit:

Label nodes for overcommit.

Create a TideNodePool configuration.

The controller tags nodes and performs dynamic bin‑packing.

Deploy workloads with appropriate labels and tolerations.

Sample node label:

apiVersion: v1
kind: Node
metadata:
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    tidenodes: "true"
  name: 192.168.0.11

Sample overcommit rule:

apiVersion: overcommit.katalyst.kubewharf.io/v1alpha1
kind: NodeOvercommitConfig
metadata:
  name: node-overcommit-config-1
spec:
  nodeOvercommitSelectorVal: "node-pool-1"
  resourceOvercommitRatio:
    cpu: 2
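
The effect of a CPU ratio of 2 can be shown with simple arithmetic: the capacity available for pod requests doubles while the physical capacity stays the same. The sketch below is only that arithmetic. The function name is hypothetical, the check that the ratio is at least 1 is an assumption, and how Katalyst applies the ratio internally is not covered here.

```python
def schedulable_cpu(allocatable_millicpu: int, ratio: float) -> int:
    """Return the CPU (in millicores) that pod requests may add up to."""
    if ratio < 1:
        # Assumption of this sketch: a ratio below 1 would shrink the node.
        raise ValueError("overcommit ratio must be at least 1")
    return int(allocatable_millicpu * ratio)


physical = 32_000  # a node with 32 allocatable cores
visible = schedulable_cpu(physical, ratio=2)
pod_request = 4_000  # each pod requests 4 cores

# Pods that fit by request without and with overcommit.
print(physical // pod_request, visible // pod_request)  # 8 16
```

Overcommit pays off only when pods use less than they request, which is why the ratio is set per node pool rather than globally.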

NUMA‑Granular Memory Management

Katalyst adds a NUMA‑aware memory provisioning framework that interacts with the sysadvisor memory plugin to calculate per‑NUMA memory provisions, enabling finer‑grained control of memory distribution across NUMA nodes.

Key components:

MemoryProvisioner plugin implementing MemoryAdvisorPlugin.

ProvisionPolicy interface with Update and GetProvision methods.

Policy calculates memory headroom per NUMA node, considering reclaimed cores, system scale factors, and reserved memory.
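
To make the shape of such a policy concrete, here is a minimal sketch with update and get_provision methods that computes a per‑NUMA headroom. The formula (free memory minus reserved memory minus a buffer scaled from total memory) is an assumption chosen for illustration, reclaimed cores are left out, and none of the class or field names come from the Katalyst code base.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class NumaMemory:
    total_mib: int
    free_mib: int
    reserved_mib: int


class HeadroomPolicy:
    """Hypothetical policy: per-NUMA headroom with a scaled safety buffer."""

    def __init__(self, scale_percent: int) -> None:
        self.scale_percent = scale_percent
        self._provision: Dict[int, int] = {}

    def update(self, numa_stats: Dict[int, NumaMemory]) -> None:
        """Recompute the headroom for every NUMA node from fresh stats."""
        self._provision = {}
        for numa_id, mem in numa_stats.items():
            buffer_mib = mem.total_mib * self.scale_percent // 100
            headroom = mem.free_mib - mem.reserved_mib - buffer_mib
            # Never report negative headroom for a saturated NUMA node.
            self._provision[numa_id] = max(0, headroom)

    def get_provision(self) -> Dict[int, int]:
        """Return a copy of the last computed headroom, keyed by NUMA id."""
        return dict(self._provision)


policy = HeadroomPolicy(scale_percent=5)
policy.update({
    0: NumaMemory(total_mib=64000, free_mib=40000, reserved_mib=4000),
    1: NumaMemory(total_mib=64000, free_mib=2000, reserved_mib=4000),
})
provision = policy.get_provision()
print(provision)  # {0: 32800, 1: 0}
```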

OOM Priority Enhancement

Katalyst injects a custom OOM priority strategy via eBPF, allowing dynamic adjustment of oom_score_adj based on pod QoS and resource usage. Users can specify OOM priority in pod annotations:

annotations:
  "katalyst.kubewharf.io/memory_enhancement": '{
    "numa_binding": "true",
    "numa_exclusive": "true",
    "oom_priority": 200
  }'
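
The annotation value is a JSON document stored as a string, so a consumer has to decode it before reading oom_priority. The sketch below shows one way to do that with Python's standard library. The dataclass, the function name, and the default priority of 0 when the field is missing are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional

ANNOTATION_KEY = "katalyst.kubewharf.io/memory_enhancement"


@dataclass
class MemoryEnhancement:
    numa_binding: bool
    numa_exclusive: bool
    oom_priority: int


def parse_memory_enhancement(annotations: Dict[str, str]) -> Optional[MemoryEnhancement]:
    """Decode the memory_enhancement annotation, or return None if absent."""
    raw = annotations.get(ANNOTATION_KEY)
    if raw is None:
        return None
    data = json.loads(raw)
    return MemoryEnhancement(
        # The boolean flags are carried as the strings "true" / "false".
        numa_binding=data.get("numa_binding") == "true",
        numa_exclusive=data.get("numa_exclusive") == "true",
        # Defaulting to 0 is an assumption of this sketch.
        oom_priority=int(data.get("oom_priority", 0)),
    )


pod_annotations = {
    ANNOTATION_KEY: '{ "numa_binding": "true", "numa_exclusive": "true", "oom_priority": 200 }'
}
enhancement = parse_memory_enhancement(pod_annotations)
print(enhancement)
```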

Topology‑Aware Scheduling

Version 0.4.0 adds topology‑aware scheduling with two strategies:

Native: Compatible with Kubernetes native NUMA affinity and CPU binding.

Dynamic: Enhanced binding for mixed workloads, supporting numa_binding and numa_exclusive semantics for dedicated cores QoS.
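
One way to read the numa_binding and numa_exclusive semantics is as a per‑node fit check: a bound pod must fit inside a single NUMA node, and an exclusive pod additionally needs a NUMA node that no other pod is using. The sketch below encodes that reading with a tightest‑fit choice. The interpretation, the function name, and the tie‑breaking rule are assumptions, not the scheduler plugin's actual logic.

```python
from typing import Dict, Optional


def pick_numa_node(
    free_cpus: Dict[int, int],
    capacity: Dict[int, int],
    request: int,
    exclusive: bool,
) -> Optional[int]:
    """Return the NUMA node id for a numa_binding pod, or None if none fits."""
    candidates = []
    for numa_id in sorted(free_cpus):
        free = free_cpus[numa_id]
        if exclusive and free != capacity[numa_id]:
            continue  # an exclusive pod needs a completely unused NUMA node
        if request <= free:
            candidates.append((free, numa_id))
    if not candidates:
        return None
    # Tightest fit: the node with the least free CPU that still fits.
    return min(candidates)[1]


capacity = {0: 16, 1: 16}
free_cpus = {0: 6, 1: 16}
shared_choice = pick_numa_node(free_cpus, capacity, request=4, exclusive=False)
exclusive_choice = pick_numa_node(free_cpus, capacity, request=4, exclusive=True)
too_large = pick_numa_node(free_cpus, capacity, request=20, exclusive=False)
print(shared_choice, exclusive_choice, too_large)  # 0 1 None
```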

Additional Features

SysAdvisor framework supports custom business models.

QRM can set TCP memory limits at node and container levels.

Eviction integrates RootFS eviction with custom sorting and QoS thresholds.

KCMAS optimizes storage data structures and indexing.

ServiceProfilingDescriptor enables service‑level colocation baselines and per‑pod gray‑release.

Getting Started

Refer to the official documentation for detailed usage of tidal colocation and resource overcommit:

Tidal colocation: https://gokatalyst.io/docs/user-guide/tidal-colocation/

Resource overcommit: https://gokatalyst.io/docs/user-guide/resource-overcommitment/

Acknowledgments

The Katalyst team thanks the new contributors to this release and invites the community to join the open‑source project.
