
How Xiaohongshu Cut Data Architecture Costs by One‑Third with Incremental Computing

This article explains how Xiaohongshu, a lifestyle community with over 350 million monthly active users, moved its data platform from a traditional Lambda architecture to a next‑generation incremental computing model, cutting architectural complexity, resource consumption, and development effort each to roughly one‑third of previous levels while continuing to support massive real‑time and offline data demands.

DataFunSummit

Overview

Xiaohongshu is a lifestyle community app with more than 350 million monthly active users. Its business revolves around "community + e‑commerce + commercialization" and generates daily logs of several hundred billion records, creating huge real‑time and offline data needs.

Xiaohongshu data architecture overview

Data Platform Overview

The platform follows industry‑standard data‑warehouse modeling and includes self‑built scheduling, operations, asset‑management, governance, and reporting tools. Value output is divided into four categories:

Data analysis – reports for executives and self‑service analytics for operations and sales.

Data products – platforms for advertisers, merchants, creators and internal stakeholders.

Data services – user profiles and feature tags for recommendation, search and algorithm teams.

AI‑related services – AI‑driven insights, report generation and business recommendations.

2024 Infrastructure Migration

In 2024 the underlying infrastructure moved from AWS to Alibaba Cloud, migrating roughly 500 PB of data. The effort involved 110,000 tasks and 1,500 participants from more than 40 departments, and was described as an industry record for migration complexity. A hybrid‑cloud architecture is planned for the future.

Evolution of Data Architecture

To increase data efficiency and lower the barrier for executives and frontline staff, Xiaohongshu iterated its data architecture four times, aiming to reduce data acquisition cost, improve usage efficiency, and simplify data access for business teams.

Version 1.0 – ClickHouse‑Based Ad‑hoc Analysis

The initial architecture used Spark SQL to build offline wide tables and ClickHouse to serve online queries. Moving query serving to ClickHouse cut response times from minutes to seconds, but three problems remained:

High cost – ClickHouse clusters require substantial CPU and memory resources.

Difficult scaling – As a compute‑storage integrated system, expanding the cluster entails costly data migration.

Poor data freshness – Data processed with Spark T+1 incurs latency, making real‑time insights hard to achieve.
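The freshness problem in particular can be made concrete: a T+1 pipeline recomputes the full history once a day and never sees today's records until the next run. A minimal sketch of that pattern (all table, field, and function names here are illustrative, not Xiaohongshu's actual pipeline):

```python
from datetime import date, timedelta

def t_plus_1_report(events: list[dict], as_of: date) -> dict:
    """Classic T+1 batch: rescan ALL events up to yesterday and
    recompute every aggregate from scratch. Records from the
    current day stay invisible until tomorrow's run."""
    cutoff = as_of - timedelta(days=1)
    totals: dict = {}
    for e in events:  # full scan of history on every run
        if e["day"] <= cutoff:
            totals[e["user"]] = totals.get(e["user"], 0) + e["clicks"]
    return totals

events = [
    {"user": "a", "day": date(2024, 5, 1), "clicks": 3},
    {"user": "a", "day": date(2024, 5, 2), "clicks": 5},  # "today": excluded
]
print(t_plus_1_report(events, as_of=date(2024, 5, 2)))  # {'a': 3}
```

Note the double cost: every run pays for a full historical scan, and the freshest data is still up to a day old when served.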

Version 2.0 – Incremental Computing (Overview)

In the era of big data and AI, Xiaohongshu replaced the Lambda architecture with a new generic incremental computing framework, cutting architectural complexity, resource consumption, and development costs each to one‑third of previous levels. This approach defines clear standards for incremental data processing and enables more efficient, low‑latency data delivery.
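Conceptually, incremental computing replaces "rescan everything" with "keep state, fold in the delta." A minimal Python sketch of the idea (the class, state layout, and watermark handling are hypothetical, not Xiaohongshu's framework API):

```python
class IncrementalAggregator:
    """Sketch of incremental computation: persist the aggregate state
    and apply only newly arrived records (the delta), instead of
    rescanning the full history like a Lambda-style batch layer."""

    def __init__(self):
        self.totals: dict[str, int] = {}   # persisted aggregate state
        self.processed = 0                 # watermark: offset into the log

    def apply_delta(self, log: list[dict]) -> dict[str, int]:
        # Fold in only records appended since the last run.
        for e in log[self.processed:]:
            self.totals[e["user"]] = self.totals.get(e["user"], 0) + e["clicks"]
        self.processed = len(log)
        return self.totals

log = [{"user": "a", "clicks": 3}]
agg = IncrementalAggregator()
agg.apply_delta(log)                    # first run scans 1 record
log.append({"user": "a", "clicks": 5})  # new data arrives
print(agg.apply_delta(log))             # second run scans only the delta -> {'a': 8}
```

Because each run touches only the delta, work scales with new data rather than total history, and results can be refreshed as often as new records arrive, which is where the freshness and cost gains over T+1 batch come from.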

Data platform components
Data architecture evolution diagram

By adopting incremental computation, Xiaohongshu achieved faster data freshness, lower operational costs, and a more scalable architecture that better supports AI‑driven services.

Tags: big data, cloud migration, AI, Xiaohongshu, data architecture, incremental computing
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
