How Xiaohongshu Cut Kafka Storage Costs by 60% with a Cloud‑Native Tiered Architecture
Facing exploding Kafka scale, Xiaohongshu’s data‑storage team adopted a cloud‑native design built on tiered hot‑cold storage, containerization, and a custom load‑balancing service. The redesign delivered dramatic storage‑cost reductions, minute‑level cluster migrations, high‑performance data access, and automated resource scheduling.
Background
Xiaohongshu’s Kafka clusters grew to TB‑level peak throughput, driven by AI models and large‑scale data pipelines. The classic Apache Kafka deployment, on VMs with replicated disks, incurred high storage costs, tightly coupled compute and storage, and scaling operations that moved massive amounts of data over days.
Key Technical Challenges
Cost: Multi‑replica storage on cloud SSDs was prohibitively expensive.
Efficiency: Scaling required full data migration, taking days to weeks.
Stability: Slow scaling degraded latency under load spikes.
Solution Exploration
Three architectural directions were evaluated:
Self‑built compute‑storage separation on ObjectStore: Store data in an object store with erasure coding and add a separate read/write acceleration layer.
Hot‑cold tiered storage on ObjectStore: Keep recent (hot) data on high‑performance cloud disks and offload older (cold) data to low‑cost object storage.
Apache Pulsar: Cloud‑native messaging engine with native compute‑storage separation, but it required a large migration effort and offered limited storage‑cost benefit.
Cost reduction was the primary driver, leading to the adoption of the tiered‑storage approach.
Cloud‑Native Kafka Architecture
The system is organized into four layers:
Access Layer: Custom SDK providing authentication, rate‑limiting, and service discovery.
Compute Layer: Kafka brokers configured with tiered storage, treating data as “weakly stateful” to enable elasticity.
Scheduling Layer: A load‑balancing scheduler (Balance Control) that distributes traffic across brokers and works with containers for automatic scaling.
Infrastructure Layer: Container runtime (Kubernetes) plus an object‑store foundation offering PaaS‑style management and virtually unlimited low‑cost storage.
Tiered Storage Benefits
Cold segments are migrated to object storage, reducing storage cost by up to 60% and extending topic retention from 1 day to ≥7 days. Hot data remains on high‑performance disks, cutting required disk capacity by ~75%.
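As a rough illustration of where a ~75% disk reduction can come from (the ingest rate, hot window, and replica count below are assumptions for the sketch, not Xiaohongshu’s published figures): once only the hot window stays on disk, required capacity scales with that window rather than with full retention.

```python
def disk_capacity_gb(ingest_gb_per_s: float, window_s: float, replicas: int) -> float:
    """Disk needed to hold one retention window of a topic, all replicas included."""
    return ingest_gb_per_s * window_s * replicas

# Before tiering: the full 1-day retention lives on replicated disks.
before = disk_capacity_gb(1.0, 24 * 3600, 3)
# After tiering: only an assumed 6-hour hot window stays on disk;
# older segments are offloaded to object storage.
after = disk_capacity_gb(1.0, 6 * 3600, 3)
print(f"disk reduction: {1 - after / before:.0%}")  # prints "disk reduction: 75%"
```

A 6‑hour hot window out of a 1‑day on‑disk retention is exactly the 75% reduction; retention beyond that then costs object‑storage prices rather than SSD prices.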
Minute‑Level Elastic Migration
Because all replicas of a topic share the same cold data, scaling a broker only requires copying the missing hot segments (typically the latest segment). In practice a 10 GB/s topic can be re‑balanced in ~5 minutes instead of days.
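The arithmetic behind the speed‑up can be sketched as follows; the function name and per‑partition sizes are illustrative assumptions, not the actual system’s numbers.

```python
def migration_bytes(partitions: int, log_bytes: int, hot_bytes: int,
                    tiered: bool) -> int:
    """Bytes a new broker must pull over the network when it takes over
    `partitions` replicas.

    log_bytes / hot_bytes are per-partition sizes of the full log and of the
    not-yet-offloaded hot segments (often just the active segment).
    """
    return partitions * (hot_bytes if tiered else log_bytes)

# Assumed shape: 100 partitions, 864 GB of log per partition, 1 GB hot tail.
full_copy = migration_bytes(100, 864 * 10**9, 10**9, tiered=False)  # ~86.4 TB
hot_copy  = migration_bytes(100, 864 * 10**9, 10**9, tiered=True)   # ~100 GB
```

With cold data shared in the object store, the copy volume shrinks by the ratio of hot tail to full log (here ~860×), which is why a rebalance that used to take days completes in minutes.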
High‑Performance Cache Strategy
Object storage introduces higher latency and lower single‑thread throughput. The following optimizations were applied:
Batch reads aligned to 8 MB segment boundaries to reduce I/O calls.
Cache pre‑loading based on locality and sequential read patterns, with memory isolation to avoid page‑cache pollution.
The resulting cache hit rate reached **99%**, and cold‑data read throughput improved by ~30%.
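A minimal Python sketch of the two ideas combined — 8 MB‑aligned block reads plus a bounded LRU block cache kept apart from the page cache (class and parameter names are assumptions, not the production code):

```python
from collections import OrderedDict

BLOCK = 8 * 1024 * 1024  # reads are aligned to 8 MB block boundaries

class AlignedLRUCache:
    """Align every fetch to an 8 MB boundary so one object-store call can
    serve many consecutive consumer fetches; keep recent blocks in a
    bounded LRU so broker memory use stays isolated and predictable."""

    def __init__(self, fetch_block, capacity_blocks: int = 64):
        self.fetch_block = fetch_block      # fn(block_start) -> bytes
        self.capacity = capacity_blocks
        self.blocks: OrderedDict[int, bytes] = OrderedDict()

    def read(self, offset: int, length: int) -> bytes:
        out, pos = bytearray(), offset
        while pos < offset + length:
            start = (pos // BLOCK) * BLOCK  # align down to block boundary
            block = self._get(start)
            lo = pos - start
            hi = min(start + BLOCK, offset + length) - start
            out += block[lo:hi]
            pos = start + hi
        return bytes(out)

    def _get(self, start: int) -> bytes:
        if start in self.blocks:
            self.blocks.move_to_end(start)      # refresh LRU position
            return self.blocks[start]
        block = self.fetch_block(start)         # one large object-store read
        self.blocks[start] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return block
```

A sequential consumer reading small batches then triggers one object‑store call per 8 MB instead of one per fetch; a pre‑loader can call `_get` on the next block ahead of the read position.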
State Management in Kubernetes
The container platform keeps Kafka topology stable (fixed Pod identity and IP). Persistent data, logs, and configuration live in:
PersistentVolumes (PV) / PersistentVolumeClaims (PVC) attached to each StatefulSet pod.
ConfigMaps for static configuration.
Supervisord inside the container monitors the Kafka process and restarts it on failure.
A Webhook‑based delete‑protection mechanism rejects destructive operations when a safety threshold is exceeded.
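The delete‑protection idea can be sketched as a validating admission webhook that rejects a broker‑pod DELETE when too many brokers are already unavailable; the threshold constant and helper below are hypothetical, but the AdmissionReview response shape is the standard `admission.k8s.io/v1` one:

```python
MAX_UNAVAILABLE = 1  # assumed safety threshold for one cluster

def review_delete(request: dict, unavailable_now: int) -> dict:
    """Build an AdmissionReview response for a DELETE on a broker pod.

    Rejects the operation when allowing it would push broker availability
    below the safety threshold.
    """
    allowed = unavailable_now < MAX_UNAVAILABLE
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["request"]["uid"],  # must echo the request uid
            "allowed": allowed,
            **({} if allowed else {"status": {
                "message": "delete rejected: broker availability below safety threshold"}}),
        },
    }
```

In a real deployment this handler sits behind a `ValidatingWebhookConfiguration` scoped to the broker pods; `unavailable_now` would come from live broker health, not a function argument.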
Autoscaling & Elastic Scheduling
Two complementary mechanisms drive elasticity:
Cluster Autoscaler (CA) automatically adds or removes Kubernetes nodes based on aggregate broker metrics.
Balance Control continuously collects broker metrics via a custom KafkaMetricsReporter, builds a ClusterModel (broker capacity, traffic, replica distribution), and runs a multi‑objective optimizer to move or swap partitions.
Horizontal Pod Autoscaler (HPA) triggers broker pod scaling when CPU or network usage exceeds a threshold (e.g., 70 %). The combined workflow achieves “elastic scaling, pay‑as‑you‑go”.
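A greatly simplified sketch of what a Balance Control‑style optimizer does with its ClusterModel — greedily move partitions from the hottest broker to the coldest until the traffic spread is within a goal (function and variable names are assumptions; the real optimizer is multi‑objective):

```python
def plan_moves(broker_load: dict, partition_load: dict, assignment: dict,
               max_spread: float) -> list:
    """Return (partition, src_broker, dst_broker) moves that shrink the
    traffic spread between the most- and least-loaded brokers."""
    moves, load = [], dict(broker_load)
    while max(load.values()) - min(load.values()) > max_spread:
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        candidates = [p for p, b in assignment.items() if b == src]
        if not candidates:
            break
        # pick the partition whose move leaves src and dst closest together
        part = min(candidates,
                   key=lambda p: abs(load[src] - load[dst] - 2 * partition_load[p]))
        w = partition_load[part]
        if w == 0 or w > load[src] - load[dst]:
            break  # any remaining move would overshoot; stop
        assignment[part] = dst
        load[src] -= w
        load[dst] += w
        moves.append((part, src, dst))
    return moves
```

With minute‑level replica copies (above), executing such a plan is cheap, which is what makes continuous background rebalancing practical.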
Implementation Highlights
Cold data migration to object storage using Kafka’s tiered‑storage configuration (`storage.type=object`).
Hot data retained on SSD/EBS with aggressive retention policies.
Batch‑read logic added to the broker’s fetch service to align reads to 8 MB.
Cache pre‑loader implemented as a sidecar that pre‑fetches sequential segments into an in‑memory LRU cache.
Kubernetes manifests (StatefulSet, Service, PVC) version‑controlled in a Git repository (e.g., [email protected]:xiaohongshu/kafka‑tiered‑storage.git).
Results
After six months of production rollout (≈80% of clusters upgraded):
Storage cost reduced by up to **60%**.
Topic retention extended from 1 day to ≥7 days.
Cluster scaling time decreased from days to minutes (a 10 GB/s topic re‑balanced in ~5 min).
Cache hit rate of **99%** and cold‑data read performance improvement of **≈30%**.
Operational efficiency improved ten‑fold, enabling a commercial “elastic‑scaling, pay‑per‑use” offering.
Future Directions
Ongoing work focuses on deeper compute‑storage separation (e.g., native object‑store read/write acceleration) and multi‑active disaster‑recovery designs to further improve scalability and reliability.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
