How We Slimmed Down Youku’s Playback SDK: Cutting Threads, Memory, and Power
This article details the systematic refactoring of Youku’s cross‑platform playback core, describing how redundant threads were removed, memory usage was cut by two‑thirds, and CPU‑driven power consumption was reduced, resulting in a leaner, faster, and more energy‑efficient SDK.
The Youku playback core is a proprietary SDK built on a pipeline architecture that abstracts platform differences while exposing rich business logic. Over time, extensive cross‑team collaboration and continuous iteration made the core bloated, leading to high memory consumption, excessive thread count, and elevated power usage, especially problematic for short‑video multi‑instance scenarios.
Overview of the Original Architecture
The original SDK consists of an interface layer, an engine for command handling and message reporting, a filter layer for message forwarding, a module layer for core processing, a data download module, and rendering/post‑processing modules. The thread count approached 30, far exceeding comparable open‑source players such as ijkplayer.
Refactor Goals
Fewer threads
Smaller memory footprint
Lower power consumption
Thread Reduction
The analysis identified three categories of threads: essential, reusable, and redundant. By defining a minimal thread set required for playback, the team reduced the count from nearly 30 to 12 (including quality‑monitoring threads) and to 10 when subtitles are disabled.
Key retained threads include:
engine – receives interface commands and reports kernel messages
source – reads data and drives the pipeline
decoder (audio & video) – decodes media streams
consumer (audio & video) – synchronizes and renders output
hal buffer – demuxing and cache state monitoring
ykstream – controls the download module and interacts with segment parsing
render – manages rendering
Redundant threads removed:
Extra filter threads – merged filter logic into engine’s prepare phase.
Message dispatcher and clock manager – unified all reporting through engine and eliminated the dedicated timer thread.
Interface‑command and message‑reporting threads – reduced after force‑stop handling was improved, with ANR detection kept as a fallback.
Demux and secondary cache threads – kept only three essential threads for data handling.
Pre‑load manager and subtitle decoding module – made pre‑loading optional and dropped the subtitle decoding thread, since subtitle text can be parsed directly after it is read.
Memory Trimming
Memory hotspots were identified in download buffers, pipeline buffers, message structures, and class objects. Optimizations included:
Sharing one codec context per stream instead of duplicating it in every packet, cutting memory use by ~33%.
Reducing cache buffer sizes to align with competitor settings and avoid excessive buffering.
Eliminating secondary cache in the pipeline, shrinking pipeline memory from 3.5 MiB to 0.5 MiB.
Replacing the heavyweight AMessage structure (≈4 KB each) with a lightweight custom equivalent, reducing total message memory from >6 MiB to a fraction of that.
After these changes, peak memory consumption dropped to roughly one‑third of the original value.
Power Consumption Optimization
Power usage is driven mainly by CPU load and network request duration. The following measures were applied:
Further thread cuts (already covered in the thread‑reduction step).
Batching network reads to avoid frequent Wi‑Fi/4G wake‑ups; the kernel now restarts downloads only after the buffer falls below two‑thirds of its capacity.
Replacing vector push‑front operations with a list that appends to the tail, eliminating costly CPU spikes during large‑scale data insertion.
Switching Android OMX decoding from synchronous to asynchronous mode on API 28+, reducing CPU cycles spent in queue/dequeue loops.
Removing unnecessary calculations in the speed‑adjustment algorithm, cutting audio consumer CPU usage.
Moving barrage (danmaku) rendering into the kernel layer, decreasing UI‑level processing and cutting barrage‑related power draw by two‑thirds.
Post‑optimization measurements show average CPU usage below 7% on mid‑range Android devices, with a 1080p 90‑minute video consuming 12% less power—a 30% improvement over the original implementation.
Conclusion
The refactor dramatically “slimmed” the playback core: code logic became clearer, data flow more efficient, memory usage fell to one‑third, and power consumption dropped substantially, enabling many more concurrent instances and a noticeably better user experience. Ongoing monitoring of memory and power metrics, coupled with regular small‑scale refactors, is recommended to prevent future bloat as the product evolves.