Performance Optimization of Cloud Editing Playback: Preloading and Latency Reduction
By analyzing latency sources and introducing a pre‑loading "prepare" step with new player APIs, the cloud‑editing team reduced audio start‑up delays by roughly 200 ms on average, cutting half‑second waits to under 300 ms and markedly improving the streamer workflow.
Background
The author previously wrote an article about cloud‑editing performance optimization that claimed a >200% efficiency gain. This follow‑up focuses on reducing the start‑up ("启播") latency of audio playback in the cloud‑editing product.
Cloud editing playback is a critical step for streamers: each edit must be auditioned to confirm the audio meets expectations. Daily click volume reaches hundreds of thousands, and many users experience noticeable waiting times before playback starts.
Pre‑analysis
Data analysis shows that long audio files and feature‑rich clips tend to have longer start‑up times. Users with weaker CPUs or low memory also suffer longer delays. The distribution of start‑up times roughly follows a normal distribution, suggesting a mix of audio‑length and hardware factors.
At least 50% of users wait >0.5 s, and 25% wait >1 s before playback begins.
Code analysis
The existing implementation already follows a solid design; there is little room for further code‑level optimization. The playback flow is:
Audio data is loaded into memory.
When the user clicks, the latest edit configuration (including the audio URL) is sent to the algorithm package.
The algorithm package reads the audio file and streams the required segment.
The player receives the segment and starts playback, triggering UI events.
The UI renders the pointer movement and playback visualisation.
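The five steps above can be sketched as a single async pipeline. This is an illustrative reconstruction only: the function and type names (`loadAudio`, `processSegment`, `onUserClick`, `EditConfig`) are hypothetical stand‑ins, not the product's real API.

```typescript
// Hypothetical sketch of the playback flow: load → algorithm → play.
// All names here are illustrative, not the real cloud-editing code.

type EditConfig = { audioUrl: string; effects: string[] };

const steps: string[] = [];

// Step 1: audio data is loaded into memory (simulated).
async function loadAudio(url: string): Promise<ArrayBuffer> {
  steps.push("load");
  return new ArrayBuffer(8);
}

// Steps 2-3: the latest edit configuration is sent to the algorithm
// package, which reads the audio and streams back the required segment.
async function processSegment(
  config: EditConfig,
  data: ArrayBuffer
): Promise<ArrayBuffer> {
  steps.push("process");
  return data;
}

// Steps 4-5: the player receives the segment, starts playback,
// and UI events (pointer movement, visualisation) follow.
async function play(segment: ArrayBuffer): Promise<void> {
  steps.push("play");
}

async function onUserClick(config: EditConfig): Promise<string[]> {
  const data = await loadAudio(config.audioUrl);
  const segment = await processSegment(config, data);
  await play(segment);
  return steps;
}
```

Every stage here runs only after the click, which is why the full latency lands on the user each time.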
The author provides a representative code snippet that prepares the player:
import player from '@/lib/controller/player'
await player.prepare()

Time consumption analysis
Algorithm processing (CPU scheduling and audio read) consumes ~60‑80% of the latency.
Data flow through workers, WASM, iframes, and player loading accounts for ~20‑40%.
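A breakdown like the 60‑80% / 20‑40% split above can be obtained by timestamping each stage boundary. The sketch below is illustrative; the stage names and the `mark`/`span`/`share` helpers are assumptions, not the article's instrumentation.

```typescript
// Illustrative latency bookkeeping: record a timestamp at each stage
// boundary, then compute each stage's share of total start-up time.
const marks = new Map<string, number>();

function mark(name: string): void {
  marks.set(name, Date.now());
}

// Duration between two recorded marks, in milliseconds.
function span(from: string, to: string): number {
  return (marks.get(to) ?? 0) - (marks.get(from) ?? 0);
}

// Fraction of the total latency spent in one stage.
function share(stage: number, total: number): number {
  return total > 0 ? stage / total : 0;
}
```

For example, calling `mark("click")` on the user click, `mark("algoDone")` when the algorithm package returns the segment, and `mark("playing")` when playback starts lets `share(span("click", "algoDone"), span("click", "playing"))` estimate the algorithm stage's share.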
Optimization idea: Pre‑loading
Pre‑loading is common in static media (e.g., music or video) where the resource is known in advance. In the editing scenario the content changes with each user action, making pre‑loading risky because the pre‑loaded data may become stale.
Two main challenges:
The content is dynamic; pre‑loaded data may go stale before the user clicks.
Integrating pre‑loading into the existing complex pipeline (worker → WASM → iframe → player) is non‑trivial.

Expected effects

If the interval between a configuration change and the click is longer than the original wait time, playback should be almost instantaneous.
If the interval is short or zero, the wait time is reduced by the interval length.
When the configuration does not change, pause‑resume should require virtually no waiting.
Changing only the playback position should trigger only UI updates, without re‑loading the audio.
Additional minor optimisations are expected.

These expectations rely on three guarantees:

User actions are not continuous; there is always an interval that can be leveraged.
The new player supports true resume without a full reload.
Pointer positioning can be adjusted without re‑reading the audio.

Coverage of scenarios

Ideal: the configuration changes, then the user waits briefly before clicking.
Immediate click after a configuration change.
Frequent configuration changes without clicking (potentially throttled).
Frequent changes interleaved with rapid clicks.
Interruptions at any stage.

The author enumerates all possible configuration changes (effects, tracks, operations, ASR, AI‑generated music, etc.) and notes that each may require separate pre‑loading logic.

Design & implementation

The solution introduces a prepare step that separates audition into pre‑load and actual playback. New player APIs include prepare, playAudio, pauseAudio, and seekTo. The player is refactored to support pause/resume, and throttling mechanisms decide when to pre‑load.

Result

Data collected after deployment shows a clear reduction in latency:

300‑500 ms range: average latency reduced by ~200 ms.
500 ms‑1 s range: ~30% reduction (e.g., 800 ms → 500 ms).
1‑3 s range: slight improvement (e.g., 1.2 s → 1 s).

Overall, each click now saves roughly 200 ms, making the experience noticeably smoother for streamers.
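The design above can be sketched as a small controller that throttles prepare calls on configuration changes and short‑circuits the click path when a prepare has already completed. The `prepare`, `playAudio`, `pauseAudio`, and `seekTo` names come from the article; the `PreloadController`, its throttle logic, and the `Player` interface shape are illustrative assumptions.

```typescript
// Minimal sketch of prepare-based preloading with throttling.
// Player method names are from the article; everything else is assumed.
interface Player {
  prepare(config: object): Promise<void>;
  playAudio(): Promise<void>;
  pauseAudio(): void;
  seekTo(positionMs: number): void;
}

class PreloadController {
  private prepared = false;
  private pendingConfig: object | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(private player: Player, private throttleMs = 300) {}

  // Called on every configuration change. Throttled so that rapid
  // successive edits do not trigger a prepare for each one.
  onConfigChange(config: object): void {
    this.prepared = false; // any change invalidates the previous prepare
    this.pendingConfig = config;
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => void this.preload(), this.throttleMs);
  }

  private async preload(): Promise<void> {
    if (this.pendingConfig === null) return;
    await this.player.prepare(this.pendingConfig);
    this.prepared = true;
  }

  // Called when the user clicks play. If the interval since the last
  // change exceeded the throttle window, prepare has already run and
  // playback starts almost instantly; otherwise prepare runs now.
  async onPlayClick(): Promise<void> {
    if (!this.prepared) await this.preload();
    await this.player.playAudio();
  }
}
```

This captures the key expected effect: when the user pauses between editing and clicking, the wait collapses to the prepared `playAudio` call; when they click immediately, the cost falls back to a full prepare.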
Conclusion

The series of optimisations, driven by data analysis and a careful pre‑loading strategy, cut playback latency and improved the user experience in a high‑traffic consumer‑facing (C‑end) product. The author reflects on the technical difficulty and the pressure of refactoring production code, expressing gratitude for the learning experience.
Ximalaya Technology Team
Official account of Ximalaya's technology team, sharing distilled technical experience and insights to grow together.