
How Xiaohongshu Delivered Billion‑User Voice & Fireworks Effects with Adaptive Rendering

During the 2026 Chinese New Year, Xiaohongshu built a real‑time dynamic interaction system that combined adaptive scheduling, high‑performance particle rendering, and industrial‑grade ASR to deliver synchronized voice greetings and emoji fireworks to a peak of over 100 million daily active users across heterogeneous mobile devices.

Xiaohongshu Tech REDtech

Background

During the 2026 Chinese New Year, the social platform needed to support two large‑scale interactive features, voice greetings and emoji‑triggered fireworks, for a peak of over 100 million daily active users. The goal was to deliver a synchronized audio‑visual experience across a heterogeneous set of mobile devices.

Key Interaction Scenarios

Emoji fireworks: sending a specific New Year emoji launches a full‑screen fireworks animation.

Voice greetings: when a voice message plays, subtitles appear word‑by‑word in sync with fireworks.

Technical Challenges

Millisecond‑level synchronization of real‑time ASR, subtitle rendering and particle effects.

High‑performance particle rendering capable of displaying hundreds of particles simultaneously.

Multiple fireworks styles and background themes that must be switchable on demand.

Unified multimodal asset management with on‑demand loading.

Adaptive rendering and graceful degradation across a wide range of device capabilities.
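The first challenge, keeping subtitles and effects in lockstep with audio, is typically solved by driving rendering from the audio playback clock rather than wall‑clock timers: each ASR word carries a start timestamp, and the renderer shows every word whose timestamp the playhead has passed. A minimal sketch of that idea follows; the word list and function name are invented for illustration, not the platform's actual API:

```python
# Hypothetical sketch: word-by-word subtitles keyed to the audio clock.
# Timestamps and names are illustrative, not Xiaohongshu's real API.

def words_visible(word_times, audio_pos_ms):
    """Return the words whose start timestamp the playhead has reached,
    so subtitles (and effects keyed to the same clock) advance together."""
    return [word for word, start_ms in word_times if start_ms <= audio_pos_ms]

# Example: ASR output for a short greeting, with per-word start times (ms).
word_times = [("Happy", 0), ("New", 420), ("Year", 800)]

print(words_visible(word_times, 500))   # ['Happy', 'New']
```

Because every consumer polls the same audio position, subtitle and particle triggers cannot drift apart even if either renderer stalls briefly.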

System Architecture – Five‑Layer Real‑Time Dynamic Interaction System

Application layer: defines user‑visible interactions (voice greetings, emoji fireworks) and invokes standardized interfaces.

Resource & Global Configuration layer: preloads animation assets, distributes routing rules, and adapts to device/network conditions.

Render‑Engine Scheduling & Decision layer: maps business parameters, fuses runtime features, and dynamically selects the optimal rendering engine.

Multi‑Modal Rendering Execution layer: encapsulates Predy, Lottie, PAG, GIF, video, etc., providing a unified lifecycle and callback interface.

Observability layer: builds a four‑dimensional monitoring system (core experience, smart scheduling, resource efficiency, stability) with real‑time alerts.
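The unified lifecycle and callback contract of the execution layer can be sketched as a common interface that every wrapped engine implements, so upper layers can swap engines without code changes. The class and method names below are assumptions for illustration, not the actual Predy/Lottie/PAG bindings:

```python
# Illustrative sketch of a unified engine lifecycle; names are hypothetical.
from abc import ABC, abstractmethod
from typing import Callable

class RenderEngine(ABC):
    """Common lifecycle every wrapped engine (Predy, Lottie, PAG, GIF,
    video) would implement behind the execution layer."""

    def __init__(self) -> None:
        # Callbacks the application layer can attach uniformly.
        self.on_complete: Callable[[], None] = lambda: None
        self.on_error: Callable[[str], None] = lambda msg: None

    @abstractmethod
    def load(self, asset_path: str) -> bool: ...

    @abstractmethod
    def play(self) -> None: ...

    @abstractmethod
    def release(self) -> None: ...

class LottieEngine(RenderEngine):
    """Toy wrapper: loads an asset path and reports completion immediately."""
    def load(self, asset_path: str) -> bool:
        self.asset = asset_path
        return True
    def play(self) -> None:
        self.on_complete()   # simplified: a real engine fires this at end of playback
    def release(self) -> None:
        self.asset = None
```

With this shape, the scheduling layer can pick any concrete engine and the application layer still only sees `load`/`play`/`release` plus the two callbacks.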

Intelligent Scheduling

The scheduling layer builds a decision matrix from business demands, device resource utilization, a compatibility whitelist, and asset readiness, and outputs the engine choice, render level, and parameter indices. A dual‑threshold model (down‑threshold T_down, up‑threshold T_up, hysteresis interval H = T_up − T_down) guarantees stable degradation and recovery without oscillation.
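A minimal sketch of that dual‑threshold model, under the assumption that the monitored quantity is a normalized resource load in [0, 1] (the class name and threshold values are illustrative, not the production ones):

```python
# Hypothetical sketch of the dual-threshold (hysteresis) degradation model.
from enum import IntEnum

class RenderLevel(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2

class LevelGovernor:
    """Degrades one level when load rises above T_up, recovers one level
    only after it falls below T_down; the gap H = T_up - T_down is the
    hysteresis band that prevents oscillation."""

    def __init__(self, t_down: float = 0.6, t_up: float = 0.8):
        assert t_down < t_up
        self.t_down = t_down
        self.t_up = t_up
        self.level = RenderLevel.HIGH

    def update(self, load: float) -> RenderLevel:
        if load > self.t_up and self.level > RenderLevel.LOW:
            self.level = RenderLevel(self.level - 1)   # degrade one step
        elif load < self.t_down and self.level < RenderLevel.HIGH:
            self.level = RenderLevel(self.level + 1)   # recover one step
        return self.level
```

Because recovery requires the load to fall below T_down rather than merely back under T_up, a load hovering inside the band never toggles the render level back and forth.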

Engine Evaluation

Several rendering solutions were benchmarked. The Predy engine, a hybrid of JavaScript logic and native GPU rendering, achieved cross‑platform consistency, high performance, and graded rendering support. The alternatives compared:

Lottie/PAG: stable but limited particle performance.

Native rendering: highest raw performance but poor cross‑platform reuse.

Flutter/Unity: powerful particle/physics engines but heavy package size.

H5/WebView: flexible but long startup time and lower playback success.

ASR Solution – FireRedASR2S

The voice‑recognition pipeline uses the internally developed FireRedASR2S system, which integrates silence detection, language detection, speech‑to‑text, and punctuation restoration. It supports 20+ Chinese dialects and multiple acoustic scenarios. Evaluation on 24 test sets shows:

Average character error rate (CER): 9.67% (vs. 12.98% for Doubao ASR).

Language‑detection accuracy: 97.18%.

Silence‑detection F1 score: 97.57%.
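The CER quoted above is conventionally computed as the Levenshtein edit distance (substitutions + insertions + deletions) between reference and hypothesis transcripts, divided by the reference length. A generic sketch of that metric, not FireRedASR2S code:

```python
# Generic character error rate via Levenshtein distance (standard metric,
# not the internal FireRedASR2S implementation).

def cer(ref: str, hyp: str) -> float:
    """Edit distance between reference and hypothesis, normalized by
    reference length."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of ref prefix
    for j in range(n + 1):
        d[0][j] = j                      # insert all of hyp prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[m][n] / m

print(cer("abcd", "abed"))   # 0.25 (one substitution over four characters)
```

For Chinese ASR the units are characters rather than words, which is why CER rather than WER is the headline number.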

Observability & Metrics

A four‑dimensional core indicator system monitors:

Core experience (resource preload, engine creation, render interaction).

Smart scheduling (real‑time decision quality).

Resource efficiency (memory downgrade, cache hit rate).

Stability (crash‑free rate).

Key performance numbers during the CNY event:

Overall success rate: 99.9%.

Predy engine load success: 99.8%.

Animation playback success: 99.6%.

P95 first‑frame render time: under 220 ms.

Audio‑visual sync deviation (P95): under 50 ms.

81% of devices operated at the High rendering level; 18% gracefully degraded to Medium or Low.

Results

The system handled over 100 million DAU without stability incidents. Voice greetings exceeded participation expectations, activating many previously silent or new social connections. The real‑time interaction system achieved a 99.9% reach success rate, confirming the feasibility of the architecture for future large‑scale, intelligent rendering scenarios.

Tags: Cross‑platform · Mobile performance · Real‑time rendering · ASR · Adaptive scheduling
Written by

Xiaohongshu Tech REDtech

Official account of the Xiaohongshu tech team, sharing technical innovations and engineering insights.
