
Performance Optimization of Cloud Editing Playback: Preloading and Latency Reduction

By analyzing latency sources and introducing a pre‑loading "prepare" step backed by new player APIs, the cloud‑editing team reduced audio start‑up delays by roughly 200 ms on average, cutting half‑second waits to under 300 ms and markedly improving streamer workflow.

Ximalaya Technology Team

Background

The author previously wrote an article about cloud‑editing performance optimization that claimed a >200% efficiency gain. This follow‑up focuses on reducing the start‑up ("启播") latency of audio playback in the cloud‑editing product.

Cloud editing playback is a critical step for streamers: each edit must be auditioned to confirm the audio meets expectations. Daily click volume reaches hundreds of thousands, and many users experience noticeable waiting times before playback starts.

Pre‑analysis

Data analysis shows that long audio files and feature‑rich clips tend to have longer start‑up times. Users with weaker CPUs or low memory also suffer longer delays. The distribution of start‑up times roughly follows a normal distribution, suggesting a mix of audio‑length and hardware factors.

At least 50% of users wait >0.5 s, and 25% wait >1 s before playback begins.

Code analysis

The existing implementation already follows a solid design; there is little room for further code‑level optimization. The playback flow is:

Audio data is loaded into memory.

When the user clicks, the latest edit configuration (including the audio URL) is sent to the algorithm package.

The algorithm package reads the audio file and streams the required segment.

The player receives the segment and starts playback, triggering UI events.

The UI renders the pointer movement and playback visualisation.
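The five steps above can be sketched as a single pipeline. Everything in this sketch (the `EditConfig` shape, the `AlgorithmPackage` stub, the stage names) is an illustrative stand‑in, not the team's actual code; in the real product the algorithm step runs inside a WASM package behind workers.

```typescript
// Illustrative sketch of the click-to-playback flow described above.
// All names here are hypothetical stand-ins for the real pipeline.

interface EditConfig {
  audioUrl: string;
  position: number; // desired playback start position, in seconds
}

// Stand-in for the WASM algorithm package that reads and slices audio.
class AlgorithmPackage {
  process(config: EditConfig): Uint8Array {
    // The real package reads the file and streams the required segment;
    // here we fabricate a buffer so the sketch is runnable.
    return new Uint8Array(1024);
  }
}

// Runs the five stages in order and records each one for inspection.
function runPlaybackFlow(config: EditConfig): string[] {
  const log: string[] = [];
  log.push("load-into-memory");   // 1. audio data loaded into memory
  log.push("send-config");        // 2. click sends the latest edit config
  const segment = new AlgorithmPackage().process(config); // 3. read + slice
  log.push(`stream-segment:${segment.length}B`);
  log.push("player-start");       // 4. player starts, fires UI events
  log.push("ui-render");          // 5. pointer movement + visualisation
  return log;
}
```

Seen this way, the pre‑click stages (1) and the post‑click stages (2–5) are the natural seam where a prepare step can be inserted.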

The author provides a representative snippet that pre‑loads the player before the user clicks:

import player from '@/lib/controller/player'

// Run the expensive load ahead of time so the later play call is cheap.
await player.prepare()

Time consumption analysis

Algorithm processing (CPU scheduling and audio read) consumes ~60‑80% of the latency.

Data flow through workers, WASM, iframes, and player loading accounts for ~20‑40%.
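A breakdown like the 60–80% vs. 20–40% split above could be obtained by timing each stage and computing its share of the total. The sketch below is a hedged illustration of that measurement idea; the stage names and helper functions are assumptions, not the team's instrumentation.

```typescript
// Hypothetical latency instrumentation: time each stage with a stopwatch,
// then express every stage as a fraction of total start-up latency.

type Timings = Record<string, number>;

// Runs each named stage and records its wall-clock duration in ms.
function timeStages(stages: Record<string, () => void>): Timings {
  const timings: Timings = {};
  for (const [name, run] of Object.entries(stages)) {
    const t0 = performance.now();
    run();
    timings[name] = performance.now() - t0;
  }
  return timings;
}

// Converts raw durations into each stage's share of the total latency.
function latencyShares(timings: Timings): Record<string, number> {
  const total = Object.values(timings).reduce((a, b) => a + b, 0);
  const shares: Record<string, number> = {};
  for (const [name, ms] of Object.entries(timings)) {
    shares[name] = total > 0 ? ms / total : 0;
  }
  return shares;
}
```

With marks like these around the algorithm step and the worker/WASM/iframe transport, one could verify whether algorithm processing really dominates before deciding where to optimise.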

Optimization idea: Pre‑loading

Pre‑loading is common in static media (e.g., music or video) where the resource is known in advance. In the editing scenario the content changes with each user action, making pre‑loading risky because the pre‑loaded data may become stale.

Two main challenges:

The content is dynamic; pre‑loading may expire before the user clicks.

Integrating pre‑loading into the existing complex pipeline (worker → WASM → iframe → player) is non‑trivial.

Expected effects

If the interval between a configuration change and the click is longer than the original wait time, playback should start almost instantaneously.

If the interval is short or zero, the wait time is reduced by the length of the interval.

When the configuration has not changed, pause and resume should require virtually no waiting.

Changing only the playback position should trigger UI updates alone, without re‑loading the audio.

Additional minor optimisations are expected.

These expectations rest on three guarantees:

User actions are not continuous; there is always an interval that can be leveraged.

The new player supports true resume without a full reload.

Pointer positioning can be adjusted without re‑reading the audio.

Coverage of scenarios

Ideal: the configuration changes, then the user waits briefly before clicking.

Immediate click after a configuration change.

Frequent configuration changes without clicking (potentially throttled).

Frequent changes interleaved with rapid clicks.

Interruptions at any stage.

The author enumerates all possible configuration changes (effects, tracks, operations, ASR, AI‑generated music, etc.) and notes that each may require separate pre‑loading logic.

Design and implementation

The solution introduces a prepare step that separates audition into pre‑load and actual playback. New player APIs include prepare, playAudio, pauseAudio, and seekTo. The player is refactored to support pause/resume, and throttling mechanisms decide when to pre‑load.

Result

Data collected after deployment shows a clear reduction in latency:

300–500 ms range: average latency reduced by ~200 ms.

500 ms–1 s range: ~30% reduction (e.g., 800 ms → 500 ms).

1–3 s range: slight improvement (e.g., 1.2 s → 1 s).

Overall, each click now saves roughly 200 ms, making the experience noticeably smoother for streamers.
Conclusion

The series of optimisations, driven by data analysis and a careful pre‑loading strategy, cut playback latency and improved the user experience in a high‑traffic consumer product. The author reflects on the technical difficulty and the pressure of refactoring production code, expressing gratitude for the learning experience.

Tags: performance optimization, frontend, preloading, cloud editing, latency reduction, web audio
Written by

Ximalaya Technology Team

Official account of Ximalaya's technology team, sharing distilled technical experience and insights to grow together.
