Xiaohongshu Large-Scale Cloud-Native Mixed Deployment and Elasticity Practices

With more than 90% of its services containerized, Xiaohongshu’s cloud‑native team introduced resource‑pooled mixed deployment, fine‑grained unified scheduling, and an elastic container pool with global HPA and cluster autoscaling. Mixed deployment now covers more than 35% of resources and delivers tens of millions of core‑hours of offline compute daily, with roughly 30% cost savings, as the team prepares for hybrid‑cloud expansion and FinOps.

Xiaohongshu Tech REDtech
The China Academy of Information and Communications Technology (CAICT) organized the "2024 Cloud-Native and Application Modernization Benchmark Cases" selection to showcase innovative, practical deployments. Xiaohongshu’s "Large-Scale Cloud-Native Mixed Deployment and Elasticity Practice" was selected as a benchmark case for cloud-native applications.

Since its founding in 2013, Xiaohongshu has pursued a cloud-native strategy for development efficiency and iteration speed. Today, more than 90% of its core services—including community, search, recommendation, and advertising—are containerized.

Rapid business growth increased demand for compute resources, yet CPU utilization in the online clusters remained low. Analysis identified three main issues: (1) early resource management prioritized business needs over efficient allocation, leaving resource isolation and planning coarse‑grained; (2) traffic was strongly tidal, and the existing elasticity was not sufficient to follow its peaks and troughs; (3) cost‑optimization pressure called for a solution that balances stability and resource efficiency.

To address these challenges, the cloud‑native team introduced several technical innovations:

Resource pooling and mixed deployment eliminated high fragmentation and low utilization of dedicated resource pools, simplifying scaling and improving overall efficiency.

A unified scheduling system incorporating resource isolation, reuse, interference detection, and conflict handling enabled fine‑grained scheduling across heterogeneous resources, supporting differentiated QoS for online and offline services.

A company‑wide elastic container pool with on‑demand scaling, combined with global HPA (Horizontal Pod Autoscaling) and CA (Cluster Autoscaling) capabilities, reduced idle buffer costs and introduced an elasticity‑as‑a‑service model.

System‑software enhancements, including the development of RedOS (a custom operating system) and the "big‑node small‑pod" strategy, mitigated unpredictable interference and improved stability for mixed‑deployment workloads.
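The mixed‑deployment idea described above can be illustrated with a toy model: offline (batch) work borrows cores that online services are not currently using, and gives them back when interference is detected. This is only a sketch under stated assumptions, not Xiaohongshu’s scheduler. The names `Node`, `place_offline`, and `handle_interference`, the 10% safety margin, and the use of p99 latency against an SLO as the interference signal are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy node model (hypothetical; not the article's actual data structures)."""
    capacity_cores: float           # total cores on the node
    online_used: float              # cores online pods are actually using right now
    offline_allocated: float = 0.0  # cores currently lent to offline pods

    def reclaimable(self, safety_margin: float = 0.1) -> float:
        """Cores that can still be lent to offline work.

        A fraction of capacity (safety_margin, an assumed value) is held back
        so that online services keep headroom for sudden bursts.
        """
        usable = self.capacity_cores * (1.0 - safety_margin)
        return max(0.0, usable - self.online_used - self.offline_allocated)


def place_offline(node: Node, cores: float) -> bool:
    """Admit an offline pod only if it fits into currently reclaimable cores."""
    if cores <= node.reclaimable():
        node.offline_allocated += cores
        return True
    return False


def handle_interference(node: Node, online_p99_ms: float, slo_ms: float) -> float:
    """Assumed conflict-handling policy: if online latency breaches its SLO,
    take back every core lent to offline work. Returns the cores released."""
    if online_p99_ms > slo_ms:
        released = node.offline_allocated
        node.offline_allocated = 0.0
        return released
    return 0.0


if __name__ == "__main__":
    node = Node(capacity_cores=64, online_used=20)
    print(place_offline(node, 30))   # fits into idle capacity
    print(place_offline(node, 10))   # rejected: would eat into the safety margin
    print(handle_interference(node, online_p99_ms=120, slo_ms=100))
```

A production system would throttle or evict offline pods gradually rather than reclaiming everything at once; the all‑or‑nothing policy here just keeps the sketch short.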

These efforts resulted in mixed deployment covering over 35% of Xiaohongshu’s total resource pool and delivering tens of millions of core‑hours of offline compute daily. The elasticity features (HPA, CA) have been integrated into core services such as search, recommendation, advertising, and e‑commerce, achieving approximately a 30% cost reduction.
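For readers unfamiliar with HPA, the scaling rule documented for the standard Kubernetes Horizontal Pod Autoscaler is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with a default tolerance of 0.1 around the target. The sketch below implements that documented rule in Python. Whether Xiaohongshu’s global HPA uses the same formula is not stated in the article, and the function name `desired_replicas` and its `min_replicas`/`max_replicas` defaults are illustrative assumptions.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,   # Kubernetes' documented default tolerance
    min_replicas: int = 1,    # illustrative bound, set per workload
    max_replicas: int = 100,  # illustrative bound, set per workload
) -> int:
    """Replica count suggested by the standard Kubernetes HPA formula."""
    ratio = current_metric / target_metric
    # Within the tolerance band, HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Peak: average CPU at 80% against a 50% target, 10 replicas running.
    print(desired_replicas(10, 80, 50))
    # Trough: average CPU at 20% against the same target.
    print(desired_replicas(10, 20, 50))
```

With tidal traffic, the same rule scales a service out at the peak and back in at the trough, which is what lets the freed capacity be returned to a shared pool instead of sitting idle as buffer.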

Looking ahead, the team plans to extend the architecture to hybrid‑cloud environments, further productize performance‑enhancement platforms, and advance FinOps practices.

The Xiaohongshu cloud‑native platform team is actively recruiting talent interested in operating systems, system software, cloud‑native technologies, and scheduling. Interested candidates can send resumes to [email protected] (cc: [email protected]).

Performance Optimization, cloud-native, Containerization, Operating System, Resource Scheduling, mixed deployment, elasticity
Written by

Xiaohongshu Tech REDtech

Official account of the Xiaohongshu tech team, sharing tech innovations and problem insights, advancing together.