Ximalaya Live Streaming Instant-Start Optimization Practices
Ximalaya improves live-stream start-up by measuring first-frame latency, then optimizing the push side (dynamic bitrate, GOP caching) and the pull side (RTMP/HTTP-FLV, HttpDNS, pre-fetched stream URLs, progressive component loading, SurfaceView pre-rendering, a low start-up buffer water level) across instrumented pipeline stages. The result is an instant-open rate above 90% for audio and 85% for video, with H.265 adoption planned.
With the rapid development of live streaming, Ximalaya offers various live formats such as show rooms, courses, and voice chat rooms. To improve user experience, this article systematically analyzes the entire Ximalaya live‑streaming workflow and details a series of start‑up optimization measures.
Background: Quality of Experience (QoE) is crucial in live streaming, and first-frame latency is the primary start-up metric. Experience levels are defined as Excellent (≤500 ms), Good (500 ms–1 s), Fair (1–1.5 s), Poor (1.5–2 s), and Very Poor (>2 s).
Measurement: Start-up experience is converted into a quantifiable indicator. The start time depends on the entry point: either page creation (Android Fragment onCreate) or the ViewPager onPageSelected callback. The end time is the player's first-frame callback. First-frame latency = end time − start time.
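The measurement above can be sketched as a small timer object; the class and method names here are illustrative, not Ximalaya's actual code.

```java
// Minimal sketch of first-frame latency accounting. markEntry() would be
// called from the entry point (Fragment onCreate or ViewPager
// onPageSelected), markFirstFrame() from the player's first-frame callback.
public class FirstFrameTimer {
    private long startMs = -1;
    private long endMs = -1;

    public void markEntry(long nowMs) { startMs = nowMs; }

    public void markFirstFrame(long nowMs) { endMs = nowMs; }

    // First-frame latency = end time - start time; -1 if either mark is missing.
    public long latencyMs() {
        return (startMs >= 0 && endMs >= startMs) ? endMs - startMs : -1;
    }
}
```

In practice both timestamps would come from a monotonic clock (e.g. SystemClock.elapsedRealtime on Android) so the difference is immune to wall-clock changes.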
Full-chain analysis: The live-streaming pipeline runs push → media server → CDN → client, with the client pull path covering network request, data read, demux, decode, and render. Although most optimizations target the pull side, the push and server stages also affect start-up speed.
Current status – protocol selection: Ximalaya pushes streams over RTMP and pulls over HTTP-FLV. RTMP is widely supported by CDNs; HTTP-FLV offers similar real-time performance with lower protocol overhead; HLS is also supported on Android.
Dynamic bitrate on the push side: Bandwidth-aware adaptation adjusts video bitrate, frame rate, resolution, and audio bitrate to current network conditions, keeping publishing smooth.
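One way to implement such adaptation is a settings ladder keyed to estimated uplink bandwidth; the rungs and the 80% headroom factor below are assumptions for illustration, not Ximalaya's production values.

```java
// Illustrative bandwidth-aware encoder ladder for the push side.
public class BitrateAdapter {
    // {video kbps, fps, height, audio kbps} from lowest to highest rung.
    private static final int[][] LADDER = {
        {400, 15, 360, 32},
        {800, 24, 480, 48},
        {1500, 30, 720, 64},
    };

    // Pick the highest rung whose combined bitrate fits ~80% of the
    // estimated uplink, leaving headroom against jitter. Falls back to
    // the lowest rung when even that does not fit.
    public static int[] pick(int uplinkKbps) {
        int budget = uplinkKbps * 8 / 10;
        int[] choice = LADDER[0];
        for (int[] rung : LADDER) {
            if (rung[0] + rung[3] <= budget) choice = rung;
        }
        return choice;
    }
}
```

A real encoder would also smooth the bandwidth estimate and rate-limit rung switches to avoid oscillating between profiles.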
Media server GOP caching: The server caches the 1–2 most recent GOPs so that the first packet delivered to a new client is an I-frame, eliminating the wait for a keyframe and shortening first-frame latency.
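The server-side cache can be modeled as a bounded deque of GOPs; this is a simplified sketch in which frames are just strings and every "I" frame opens a new GOP.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified GOP cache: keep the last N GOPs so a newly joined viewer
// always receives frames starting from a keyframe.
public class GopCache {
    private final int maxGops;
    private final Deque<List<String>> gops = new ArrayDeque<>();

    public GopCache(int maxGops) { this.maxGops = maxGops; }

    public void onFrame(String frame) {
        if (frame.equals("I")) {
            gops.addLast(new ArrayList<>());
            while (gops.size() > maxGops) gops.removeFirst();
        }
        // Frames arriving before the first keyframe are dropped.
        if (!gops.isEmpty()) gops.getLast().add(frame);
    }

    // Frames handed to a newly joined client: always begins at an I-frame.
    public List<String> framesForNewClient() {
        List<String> out = new ArrayList<>();
        for (List<String> g : gops) out.addAll(g);
        return out;
    }
}
```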
Start-up optimization scheme – stage breakdown: Pull latency is divided into detailed stages such as businessCost, prepareMediaSourceCost, cdnRequestCost (comprising dnsCost, connectCost, and firstPackageCost), flvHeaderCost, scriptTagCost, audioHeaderTagCost, videoHeaderTagCost, firstVideoTagCost, audioDecoderCreateCost, videoDecoderCreateCost, and firstFrameDecodeCost. Each stage is instrumented for data reporting and targeted for optimization.
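The instrumentation can be sketched as a small per-stage recorder whose entries are flattened into one report line for upload; the report format here is an assumption, though the stage names follow the metrics listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of per-stage start-up instrumentation: each pipeline stage
// reports its elapsed time, and the sum can be cross-checked against
// the total first-frame latency.
public class StartupMetrics {
    // LinkedHashMap keeps stages in the order they were recorded.
    private final Map<String, Long> stageMs = new LinkedHashMap<>();

    public void record(String stage, long ms) { stageMs.put(stage, ms); }

    public long total() {
        long sum = 0;
        for (long v : stageMs.values()) sum += v;
        return sum;
    }

    // Flatten into a report line, e.g. "dnsCost=35,connectCost=40".
    public String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : stageMs.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```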
Pre-fetch stream URL: The stream URL is obtained while still on the list page, so the player can start playback immediately on room entry instead of waiting on a serial URL request.
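A minimal sketch of the idea, assuming a hypothetical room-ID-to-URL cache (the names and the example URL are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of pre-fetching pull URLs on the list page so that entering a
// room skips the blocking "fetch URL, then open stream" round trip.
public class StreamUrlPrefetcher {
    private final ConcurrentHashMap<Long, String> cache = new ConcurrentHashMap<>();

    // Called while the list page renders visible rooms.
    public void prefetch(long roomId, String pullUrl) { cache.put(roomId, pullUrl); }

    // Called on room entry; null means fall back to a blocking fetch.
    public String cachedUrl(long roomId) { return cache.get(roomId); }
}
```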
Component progressive loading: High-priority components load first, with on-demand and asynchronous loading for less critical parts, following a priority-based serial loading strategy.
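The priority-based serial order can be sketched as follows; the component names and priority values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of priority-based serial loading: components register with a
// priority, and the room loads them highest-priority first so the
// player and core UI come up before chat, gifts, and other extras.
public class ComponentLoader {
    static final class Component {
        final String name;
        final int priority;
        Component(String name, int priority) { this.name = name; this.priority = priority; }
    }

    private final List<Component> registered = new ArrayList<>();

    public void register(String name, int priority) {
        registered.add(new Component(name, priority));
    }

    // Serial load order: higher priority first (stable for ties).
    public List<String> loadOrder() {
        List<Component> copy = new ArrayList<>(registered);
        copy.sort(Comparator.comparingInt((Component c) -> -c.priority));
        List<String> names = new ArrayList<>();
        for (Component c : copy) names.add(c.name);
        return names;
    }
}
```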
Pre-render: A SurfaceView is created in the container page so decoded frames can be rendered before the room's TextureView becomes available; once the TextureView's surface is ready, rendering switches to it seamlessly.
HttpDNS: Domain resolution is performed over HTTP to bypass ISP DNS, cutting tens of milliseconds of DNS latency and caching resolved IPs for subsequent requests.
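The caching half of this can be sketched as a TTL-bounded host-to-IP map; the actual resolve-over-HTTP call is stubbed out, and the host/IP/TTL values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an HttpDNS-style resolver cache: an IP resolved over HTTP is
// cached with a TTL, so later connections skip DNS entirely.
public class HttpDnsCache {
    static final class Entry {
        final String ip;
        final long expiresAtMs;
        Entry(String ip, long expiresAtMs) { this.ip = ip; this.expiresAtMs = expiresAtMs; }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    public void put(String host, String ip, long nowMs, long ttlMs) {
        cache.put(host, new Entry(ip, nowMs + ttlMs));
    }

    // Returns the cached IP, or null if missing/expired (the caller then
    // resolves over HTTP and re-populates the cache).
    public String lookup(String host, long nowMs) {
        Entry e = cache.get(host);
        return (e != null && nowMs < e.expiresAtMs) ? e.ip : null;
    }
}
```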
Player buffer water-level management: The start-up water level is set to 100 ms, so the player begins playback as soon as this minimal buffer is filled.
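The start condition reduces to a simple gate on buffered duration; this sketch omits the stall/rebuffer logic a real player would layer on top.

```java
// Sketch of a start-up water-level check: playback begins once the
// demuxed buffer holds at least the start threshold (100 ms here),
// rather than waiting for a large buffer before the first frame.
public class BufferGate {
    private final long startWaterLevelMs;
    private long bufferedMs = 0;
    private boolean started = false;

    public BufferGate(long startWaterLevelMs) { this.startWaterLevelMs = startWaterLevelMs; }

    // Called as demuxed data arrives; returns true once playback may start.
    public boolean onBuffered(long deltaMs) {
        bufferedMs += deltaMs;
        if (!started && bufferedMs >= startWaterLevelMs) started = true;
        return started;
    }
}
```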
Results: After a quarter of optimization, the audio live-stream instant-open rate exceeds 90% and the video instant-open rate exceeds 85%, as shown in the performance charts.
Experience-driven development philosophy: Each optimization is evaluated for overall benefit, then designed, implemented, validated through A/B testing, and iteratively refined.
Future outlook: Plans include adopting H.265 to halve bandwidth usage, using multiple player instances for smoother swipe transitions, and reusing decoders to avoid repeated initialization overhead.
Ximalaya Technology Team
Official account of Ximalaya's technology team, sharing distilled technical experience and insights to grow together.