Re‑engineering a Live‑Streaming Frontend to Cut Black‑Screen Errors & Boost Performance
Facing fragmented devices, high memory usage, and unreliable signaling, the Zuoyebang live‑streaming frontend was redesigned with container‑based isolation, a modular SDK, and state‑synchronization mechanisms, resulting in timely memory reclamation, reduced CPU load, and dramatically lower black‑screen incidents.
Background
Zuoyebang's live‑streaming room is driven by signaling, which keeps rendering synchronized between the teacher and student clients. It includes push/pull streams, media rendering, chat, courseware, mic‑link, and generic interaction. The courseware area hosts H5, Cocos, whiteboard, interactive questions, and mini‑games, all displayed inside a WebView that aggregates diverse content types.
Technical Challenges
Device fragmentation makes it hard to render multiple content types stably.
Overall performance must hold while each sub‑module's memory and CPU usage stays under control.
Signaling must be delivered reliably and executed in order.
A failure in one piece of content must not affect subsequent operations.
Different technology stacks must be integrated continuously.
Current Situation
The legacy architecture aggregates all business code into one massive module named zb, which runs in a persistent browser instance. Growth in user scale, device diversity, and content richness has strained stability and performance, and incident tickets have increased for black/white screens, freezes, lag, interactive question failures, and missing teacher notes.
Problem Analysis
WebView memory continuously rises; garbage collection cannot be triggered, causing out‑of‑memory black/white screens.
CPU spikes and memory bursts from animation cause lag.
The full signaling chain (service → NA (native) → WebView) has no robust delivery guarantee, so signals can be lost, which in turn leaves the business logic coupled.
Scheduling logic for different content is tightly coupled with deep business logic, making debugging hard.
Errors in any content can corrupt subsequent signaling and scheduling.
Running all content in a single WebView makes resource quantification difficult.
Courseware layers are unclear and heavily coupled.
Unclear boundaries between roles increase communication cost and reduce debugging efficiency.
Design Goals
Ensure timely memory reclamation at the lower layer.
Make the rendering runtime swappable.
Guarantee signaling reliability (NA/FE).
Isolate content.
Decouple scheduling logic from specific content.
Quantify and reduce memory/CPU per content.
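One common way to approach the signaling reliability goal is to tag each signal with a sequence number and hold back out‑of‑order arrivals until the gap is filled. The sketch below assumes that approach. The `Signal` shape, its `seq` field, and the `OrderedSignalQueue` class are hypothetical names for illustration, not the article's actual protocol.

```typescript
// Hypothetical signal shape; `seq` is an assumed, monotonically increasing number.
interface Signal {
  seq: number;
  type: string;
  payload?: unknown;
}

class OrderedSignalQueue {
  private nextSeq: number;
  private pending = new Map<number, Signal>();
  private failed: number[] = [];
  private handle: (signal: Signal) => void;

  constructor(handle: (signal: Signal) => void, startSeq = 1) {
    this.handle = handle;
    this.nextSeq = startSeq;
  }

  // Accept signals in any arrival order; deliver them strictly in seq order.
  receive(signal: Signal): void {
    if (signal.seq < this.nextSeq || this.pending.has(signal.seq)) {
      return; // already delivered, or a duplicate of a buffered signal
    }
    this.pending.set(signal.seq, signal);
    this.flush();
  }

  // Sequence numbers missing below the highest buffered one.
  // A caller could use this list to re-request lost signals.
  missing(): number[] {
    let max = this.nextSeq - 1;
    this.pending.forEach((_signal, seq) => {
      if (seq > max) max = seq;
    });
    const gaps: number[] = [];
    for (let seq = this.nextSeq; seq < max; seq++) {
      if (!this.pending.has(seq)) gaps.push(seq);
    }
    return gaps;
  }

  // Sequence numbers whose handler threw.
  failedSeqs(): number[] {
    return this.failed.slice();
  }

  private flush(): void {
    let next = this.pending.get(this.nextSeq);
    while (next !== undefined) {
      this.pending.delete(this.nextSeq);
      this.nextSeq++;
      try {
        this.handle(next);
      } catch (err) {
        // A failing handler is recorded but does not block later signals.
        this.failed.push(next.seq);
      }
      next = this.pending.get(this.nextSeq);
    }
  }
}
```

Catching handler errors inside `flush` also touches the fault‑tolerance goal: one bad signal is recorded and skipped rather than stalling everything queued behind it.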
Practice & Exploration
1. Container & SDK Design
Container Definition
A container is the carrier for any live‑room content. It can be a native container, a WebView container, or a Cocos container. Characteristics:
Acts as a carrier for content
Independent and dependency‑free
Has its own lifecycle
Supports independent development and deployment
With containers, scheduling logic can be extracted to the edge, focusing on task dispatch and signaling handling, decoupled from content.
Container State
Container SDK
The SDK abstracts container details, offering lifecycle hooks, custom debugging, extensibility, and a basic utility library.
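As a rough illustration of what lifecycle hooks in such an SDK might look like, the sketch below models a container as a small state machine that notifies subscribers on every transition. The state names, the transition table, and the `ContainerLifecycle` class are assumptions made for this example; the article does not list the real states or the SDK's API.

```typescript
// Hypothetical container states; the real SDK's states are not given in the article.
type ContainerState = "created" | "loaded" | "active" | "paused" | "destroyed";

// Assumed set of legal transitions between states.
const TRANSITIONS: Record<ContainerState, ContainerState[]> = {
  created: ["loaded", "destroyed"],
  loaded: ["active", "destroyed"],
  active: ["paused", "destroyed"],
  paused: ["active", "destroyed"],
  destroyed: [],
};

type LifecycleHook = (from: ContainerState, to: ContainerState) => void;

class ContainerLifecycle {
  private current: ContainerState = "created";
  private hooks: LifecycleHook[] = [];

  get state(): ContainerState {
    return this.current;
  }

  // Register a hook; the returned function unregisters it.
  onTransition(hook: LifecycleHook): () => void {
    this.hooks.push(hook);
    return () => {
      this.hooks = this.hooks.filter((h) => h !== hook);
    };
  }

  // Move to `to` if the transition is legal; return whether it happened.
  transition(to: ContainerState): boolean {
    if (TRANSITIONS[this.current].indexOf(to) === -1) {
      return false;
    }
    const from = this.current;
    this.current = to;
    for (const hook of this.hooks) {
      hook(from, to);
    }
    return true;
  }
}
```

Because `destroyed` has no outgoing transitions, a container that has been torn down cannot be revived by a late signal, which is one way to make resource release predictable.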
2. Content Player
To standardize integration, a content player abstraction is introduced. Main players include H5 PPT, H5 interactive questions, Cocos courseware & interaction, and whiteboard.
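A content player abstraction of this kind can be pictured as a shared interface plus a scheduler that routes tasks by content type. The following is a minimal sketch under that assumption. `ContentTask`, `ContentPlayer`, `Scheduler`, and content type strings such as "h5-ppt" are hypothetical names, not the project's real API.

```typescript
// Hypothetical task shape dispatched by the scheduler.
interface ContentTask {
  contentType: string;
  action: string;
  data?: unknown;
}

// Hypothetical interface every content player would implement.
interface ContentPlayer {
  readonly contentType: string;
  handle(task: ContentTask): void;
}

class Scheduler {
  private players = new Map<string, ContentPlayer>();
  private failures: string[] = [];

  register(player: ContentPlayer): void {
    this.players.set(player.contentType, player);
  }

  // Route a task to its player. An error in one player is recorded
  // and contained so that later tasks can still be dispatched.
  dispatch(task: ContentTask): boolean {
    const player = this.players.get(task.contentType);
    if (player === undefined) {
      this.failures.push(`no player for ${task.contentType}`);
      return false;
    }
    try {
      player.handle(task);
      return true;
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      this.failures.push(`${task.contentType}: ${reason}`);
      return false;
    }
  }

  failureLog(): string[] {
    return this.failures.slice();
  }
}
```

The scheduler knows only the `contentType` key and the `handle` method, so adding a new kind of content means registering another player rather than editing dispatch logic.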
3. H5 PPT Player
The H5 PPT player is layered from bottom to top:
Container layer – runtime environment.
Frontend foundation library.
Adaptation layer – bridges container lifecycle, registers signaling, and handles rendering.
Rendering layer.
4. Cocos Content Player Design & Optimization
Current issues:
Unstable signaling recovery.
Poor loading and recovery performance.
Difficult online debugging.
Analysis reveals lost or out‑of‑order signaling, heavy loading of the Cocos engine, and touch‑event‑driven synchronization that hides business logic.
Solutions & Technical Challenges
Replace signaling‑based recovery with state synchronization, which separates business logic from recovery and reduces data volume.
Improve load performance and memory usage via early runtime initialization and production‑time content standardization.
State Synchronization Design
Capture input events and rendering data at the low level, decouple from business logic, and provide both incremental and full‑state updates.
Synchronize full data only on key frames; all other frames transmit incremental changes, which keeps bandwidth low while still ensuring eventual consistency.
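The key‑frame plus incremental scheme can be sketched as a sender that emits a full snapshot every N frames and a delta otherwise, with a receiver that applies either kind of message. This is an illustrative sketch only: the flat `State` record, the `SyncMessage` shapes, and the key‑frame interval are assumptions, and the real system captures input events and rendering data rather than a simple key/value map.

```typescript
// Assumed flat state; the real captured state is richer.
type State = Record<string, number | string | boolean>;

type SyncMessage =
  | { kind: "full"; frame: number; state: State }
  | { kind: "delta"; frame: number; changed: State; removed: string[] };

class StateSyncSender {
  private last: State = {};
  private frame = 0;
  private keyFrameInterval: number;

  constructor(keyFrameInterval = 30) {
    this.keyFrameInterval = keyFrameInterval;
  }

  // Produce the message for the current frame's state.
  next(current: State): SyncMessage {
    const frame = this.frame++;
    let message: SyncMessage;
    if (frame % this.keyFrameInterval === 0) {
      // Key frame: send everything.
      message = { kind: "full", frame, state: { ...current } };
    } else {
      // Other frames: send only what changed since the previous frame.
      const changed: State = {};
      for (const key of Object.keys(current)) {
        const value = current[key];
        if (value !== undefined && value !== this.last[key]) {
          changed[key] = value;
        }
      }
      const removed = Object.keys(this.last).filter((key) => !(key in current));
      message = { kind: "delta", frame, changed, removed };
    }
    this.last = { ...current };
    return message;
  }
}

// Receiver side: a full message replaces the state, a delta patches it.
function applySync(state: State, message: SyncMessage): State {
  if (message.kind === "full") {
    return { ...message.state };
  }
  const next: State = { ...state, ...message.changed };
  for (const key of message.removed) {
    delete next[key];
  }
  return next;
}
```

If a receiver misses a delta, its state may be stale for a while, but the next full message overwrites it completely. That is the sense in which the scheme is eventually consistent.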
5. Memory and CPU Optimization
Optimizations target WebView‑based content, including process isolation and Cocos‑specific tweaks.
1. Browser Rendering Process
Key CPU/memory consumers:
Complex JS code and DOM operations.
JS memory leaks.
Deep DOM trees causing layout/composite costs.
Complex CSS parsing.
Image decoding and rendering.
Reflows and repaints.
Excessive compositing layers.
2. Image Rendering Process
I/O
CPU decoding into a bitmap, which is then uploaded to the GPU
Render pipeline (vertex processing, rasterization, fragment shading, framebuffer output)
3. Image Memory
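Memory (MB) = width × height × (colorDepth / 8) / 1024², where width and height are in pixels and colorDepth is in bits per pixel. Because the cost scales with the pixel count, larger resolutions dramatically increase memory usage.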
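To make the formula concrete, the helper below evaluates it directly. The function name and the 32‑bit default color depth (RGBA at 8 bits per channel) are assumptions for this example. A 1920 × 1080 image at 32 bits per pixel decodes to about 7.9 MB, and doubling both dimensions to 3840 × 2160 quadruples that to about 31.6 MB.

```typescript
// Decoded bitmap size in MB: width × height × (colorDepth / 8) / 1024².
// `colorDepthBits` defaults to 32 (assumed RGBA, 8 bits per channel).
function decodedImageMegabytes(
  widthPx: number,
  heightPx: number,
  colorDepthBits = 32,
): number {
  const bytes = widthPx * heightPx * (colorDepthBits / 8);
  return bytes / (1024 * 1024);
}
```

Note that this is the size after decoding, which is independent of how well the file compresses on disk. Cropping or downscaling reduces it; a better compression ratio alone does not.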
4. Optimization Techniques
Image compression and cropping.
GIF frame extraction (e.g., 12 fps).
Animation frame rate reduction (e.g., Lottie to 12 fps).
On‑demand loading for interactive question panels.
Render GIFs as compositing layers to lower GPU memory.
Reduce signaling frequency and payload size.
Various whiteboard optimizations (not detailed).
Cocos framework optimizations as described earlier.
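Frame rate reduction, as in the 12 fps examples above, can be approximated with a small limiter that decides per tick whether enough time has passed to draw again. This is a generic sketch, not the project's implementation. `FrameLimiter` is a hypothetical name, and in a browser the timestamps would typically come from the `requestAnimationFrame` callback argument.

```typescript
// Skips ticks so that rendering happens at most `targetFps` times per second.
class FrameLimiter {
  private intervalMs: number;
  private lastRenderedMs = -Infinity;

  constructor(targetFps: number) {
    this.intervalMs = 1000 / targetFps;
  }

  // Call once per tick with the current time in milliseconds.
  // Returns true when this tick should actually render.
  shouldRender(nowMs: number): boolean {
    if (nowMs - this.lastRenderedMs >= this.intervalMs) {
      this.lastRenderedMs = nowMs;
      return true;
    }
    return false;
  }
}
```

Since the limiter can only render on a tick, the effective rate is at or slightly below the target: with ticks every 16 ms and a 12 fps target, a frame is drawn every 96 ms rather than every 83.3 ms.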
6. Summary
The black/white screen rate dropped to the per‑thousand level.
Problem tracing became more efficient.
Complexity and iteration cost went down.
A single content failure no longer affects subsequent playback.
