How Xiaohongshu Cut Kafka Storage Costs by 60% with a Cloud‑Native Tiered Architecture
Facing exploding Kafka scale, Xiaohongshu’s data‑storage team adopted a cloud‑native design built on tiered hot‑cold storage, containerization, and a custom load‑balancing service. The redesign delivered dramatic storage‑cost reductions, minute‑level cluster migrations, high‑performance data access, and automated resource scheduling.
Background
Xiaohongshu’s Kafka clusters grew to TB‑level peak throughput, driven by AI models and large‑scale data pipelines. The classic Apache Kafka deployment, on VMs with replicated disks, incurred high storage costs, tightly coupled compute and storage, and scaling operations that moved massive amounts of data over days.
Key Technical Challenges
Cost: Multi‑replica storage on cloud SSDs was prohibitively expensive.
Efficiency: Scaling required full data migration, taking days to weeks.
Stability: Slow scaling degraded latency under load spikes.
Solution Exploration
Three architectural directions were evaluated:
Self‑built compute‑storage separation on ObjectStore: Store data in an object store with erasure coding and add a separate read/write acceleration layer.
Hot‑cold tiered storage on ObjectStore: Keep recent (hot) data on high‑performance cloud disks and offload older (cold) data to low‑cost object storage.
Apache Pulsar: Cloud‑native messaging engine with native compute‑storage separation, but it required a large migration effort and offered limited storage‑cost benefit.
Cost reduction was the primary driver, leading to the adoption of the tiered‑storage approach.
Cloud‑Native Kafka Architecture
The system is organized into four layers:
Access Layer: Custom SDK providing authentication, rate‑limiting, and service discovery.
Compute Layer: Kafka brokers configured with tiered storage, treating data as “weakly stateful” to enable elasticity.
Scheduling Layer: A load‑balancing scheduler (Balance Control) that distributes traffic across brokers and works with containers for automatic scaling.
Infrastructure Layer: Container runtime (Kubernetes) plus an object‑store foundation offering PaaS‑style management and virtually unlimited low‑cost storage.
Tiered Storage Benefits
Cold segments are migrated to object storage, reducing storage cost by up to 60% and extending topic retention from 1 day to ≥7 days. Hot data remains on high‑performance disks, cutting required disk capacity by ~75%.
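As a rough illustration of where a ~75% disk reduction can come from (the ingest rate, hot window, and replica count below are assumptions for the sketch, not Xiaohongshu’s published figures): once only the hot window stays on disk, required capacity scales with that window rather than with full retention.

```python
def disk_capacity_gb(ingest_gb_per_s: float, window_s: float, replicas: int) -> float:
    """Disk needed to hold one retention window of a topic, all replicas included."""
    return ingest_gb_per_s * window_s * replicas

# Before tiering: the full 1-day retention lives on replicated disks.
before = disk_capacity_gb(1.0, 24 * 3600, 3)
# After tiering: only an assumed 6-hour hot window stays on disk;
# older segments are offloaded to object storage.
after = disk_capacity_gb(1.0, 6 * 3600, 3)
print(f"disk reduction: {1 - after / before:.0%}")  # prints "disk reduction: 75%"
```

A 6‑hour hot window out of a 1‑day on‑disk retention is exactly the 75% reduction; retention beyond that then costs object‑storage prices rather than SSD prices.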
Minute‑Level Elastic Migration
Because all replicas of a topic share the same cold data, scaling a broker only requires copying the missing hot segments (typically the latest segment). In practice a 10 GB/s topic can be re‑balanced in ~5 minutes instead of days.
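The arithmetic behind the speed‑up can be sketched as follows; the function name and per‑partition sizes are illustrative assumptions, not the actual system’s numbers.

```python
def migration_bytes(partitions: int, log_bytes: int, hot_bytes: int,
                    tiered: bool) -> int:
    """Bytes a new broker must pull over the network when it takes over
    `partitions` replicas.

    log_bytes / hot_bytes are per-partition sizes of the full log and of the
    not-yet-offloaded hot segments (often just the active segment).
    """
    return partitions * (hot_bytes if tiered else log_bytes)

# Assumed shape: 100 partitions, 864 GB of log per partition, 1 GB hot tail.
full_copy = migration_bytes(100, 864 * 10**9, 10**9, tiered=False)  # ~86.4 TB
hot_copy  = migration_bytes(100, 864 * 10**9, 10**9, tiered=True)   # ~100 GB
```

With cold data shared in the object store, the copy volume shrinks by the ratio of hot tail to full log (here ~860×), which is why a rebalance that used to take days completes in minutes.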
High‑Performance Cache Strategy
Object storage introduces higher latency and lower single‑thread throughput. The following optimizations were applied:
Batch reads aligned to 8 MB segment boundaries to reduce I/O calls.
Cache pre‑loading based on locality and sequential read patterns, with memory isolation to avoid page‑cache pollution.
The resulting cache hit rate reached **99%**, and cold‑data read throughput improved by ~30%.
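A minimal Python sketch of the two ideas combined — 8 MB‑aligned block reads plus a bounded LRU block cache kept apart from the page cache (class and parameter names are assumptions, not the production code):

```python
from collections import OrderedDict

BLOCK = 8 * 1024 * 1024  # reads are aligned to 8 MB block boundaries

class AlignedLRUCache:
    """Align every fetch to an 8 MB boundary so one object-store call can
    serve many consecutive consumer fetches; keep recent blocks in a
    bounded LRU so broker memory use stays isolated and predictable."""

    def __init__(self, fetch_block, capacity_blocks: int = 64):
        self.fetch_block = fetch_block      # fn(block_start) -> bytes
        self.capacity = capacity_blocks
        self.blocks: OrderedDict[int, bytes] = OrderedDict()

    def read(self, offset: int, length: int) -> bytes:
        out, pos = bytearray(), offset
        while pos < offset + length:
            start = (pos // BLOCK) * BLOCK  # align down to block boundary
            block = self._get(start)
            lo = pos - start
            hi = min(start + BLOCK, offset + length) - start
            out += block[lo:hi]
            pos = start + hi
        return bytes(out)

    def _get(self, start: int) -> bytes:
        if start in self.blocks:
            self.blocks.move_to_end(start)      # refresh LRU position
            return self.blocks[start]
        block = self.fetch_block(start)         # one large object-store read
        self.blocks[start] = block
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return block
```

A sequential consumer reading small batches then triggers one object‑store call per 8 MB instead of one per fetch; a pre‑loader can call `_get` on the next block ahead of the read position.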
State Management in Kubernetes
The container platform keeps Kafka topology stable (fixed Pod identity and IP). Persistent data, logs, and configuration live in:
PersistentVolumes (PV) / PersistentVolumeClaims (PVC) attached to each StatefulSet pod.
ConfigMaps for static configuration.
Supervisord inside the container monitors the Kafka process and restarts it on failure.
A Webhook‑based delete‑protection mechanism rejects destructive operations when a safety threshold is exceeded.
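The delete‑protection idea can be sketched as a validating admission webhook that rejects a broker‑pod DELETE when too many brokers are already unavailable; the threshold constant and helper below are hypothetical, but the AdmissionReview response shape is the standard `admission.k8s.io/v1` one:

```python
MAX_UNAVAILABLE = 1  # assumed safety threshold for one cluster

def review_delete(request: dict, unavailable_now: int) -> dict:
    """Build an AdmissionReview response for a DELETE on a broker pod.

    Rejects the operation when allowing it would push broker availability
    below the safety threshold.
    """
    allowed = unavailable_now < MAX_UNAVAILABLE
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["request"]["uid"],  # must echo the request uid
            "allowed": allowed,
            **({} if allowed else {"status": {
                "message": "delete rejected: broker availability below safety threshold"}}),
        },
    }
```

In a real deployment this handler sits behind a `ValidatingWebhookConfiguration` scoped to the broker pods; `unavailable_now` would come from live broker health, not a function argument.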
Autoscaling & Elastic Scheduling
Two complementary mechanisms drive elasticity:
Cluster Autoscaler (CA) automatically adds or removes Kubernetes nodes based on aggregate broker metrics.
Balance Control continuously collects broker metrics via a custom KafkaMetricsReporter, builds a ClusterModel (broker capacity, traffic, replica distribution), and runs a multi‑objective optimizer to move or swap partitions.
Horizontal Pod Autoscaler (HPA) triggers broker pod scaling when CPU or network usage exceeds a threshold (e.g., 70 %). The combined workflow achieves “elastic scaling, pay‑as‑you‑go”.
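A greatly simplified sketch of what a Balance Control‑style optimizer does with its ClusterModel — greedily move partitions from the hottest broker to the coldest until the traffic spread is within a goal (function and variable names are assumptions; the real optimizer is multi‑objective):

```python
def plan_moves(broker_load: dict, partition_load: dict, assignment: dict,
               max_spread: float) -> list:
    """Return (partition, src_broker, dst_broker) moves that shrink the
    traffic spread between the most- and least-loaded brokers."""
    moves, load = [], dict(broker_load)
    while max(load.values()) - min(load.values()) > max_spread:
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        candidates = [p for p, b in assignment.items() if b == src]
        if not candidates:
            break
        # pick the partition whose move leaves src and dst closest together
        part = min(candidates,
                   key=lambda p: abs(load[src] - load[dst] - 2 * partition_load[p]))
        w = partition_load[part]
        if w == 0 or w > load[src] - load[dst]:
            break  # any remaining move would overshoot; stop
        assignment[part] = dst
        load[src] -= w
        load[dst] += w
        moves.append((part, src, dst))
    return moves
```

With minute‑level replica copies (above), executing such a plan is cheap, which is what makes continuous background rebalancing practical.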
Implementation Highlights
Cold data migration to object storage using Kafka’s tiered‑storage configuration (`storage.type=object`).
Hot data retained on SSD/EBS with aggressive retention policies.
Batch‑read logic added to the broker’s fetch service to align reads to 8 MB.
Cache pre‑loader implemented as a sidecar that pre‑fetches sequential segments into an in‑memory LRU cache.
Kubernetes manifests (StatefulSet, Service, PVC) version‑controlled in a Git repository (e.g., [email protected]:xiaohongshu/kafka‑tiered‑storage.git).
Results
After six months of production rollout (≈80% of clusters upgraded):
Storage cost reduced by up to **60%**.
Topic retention extended from 1 day to ≥7 days.
Cluster scaling time decreased from days to minutes (a 10 GB/s topic re‑balanced in ~5 min).
Cache hit rate of **99%** and cold‑data read performance improvement of **≈30%**.
Operational efficiency improved ten‑fold, enabling a commercial “elastic‑scaling, pay‑per‑use” offering.
Future Directions
Ongoing work focuses on deeper compute‑storage separation (e.g., native object‑store read/write acceleration) and multi‑active disaster‑recovery designs to further improve scalability and reliability.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
