How to Build a High‑Concurrency, Low‑Latency Live Streaming System for Online Education

This article details the design and implementation of a self‑developed interactive live‑streaming platform that supports massive concurrent users and ultra‑low latency for online education, covering business scenarios, technical abstractions, key low‑latency and high‑concurrency techniques, and real‑world performance results.

Zuoyebang Tech Team

The rapid growth of online education has introduced new live‑streaming requirements, especially for immersive classrooms that demand real‑time interaction and low latency, which traditional RTMP‑based solutions can no longer satisfy.

Zuoyebang built a self‑developed interactive live‑streaming system that handles high concurrency and low latency. This article explains the business scenarios, abstracts the technical categories, and dives into three key areas: low latency, high concurrency, and overall architecture.

Online‑Education Business Scenarios

1v1 tutoring – requires bidirectional interaction.

Large class – one teacher, many students, primarily one‑way broadcast with optional 1v1 breakout.

Interactive large class – low‑latency broadcast where the teacher can see all students' video feeds.

Super small class – group‑based real‑time interaction similar to video conferencing.

Small class – true small‑group video conference with audio/video interaction.

Technical Abstraction

1v1 real‑time audio/video (including 1vN or 6v6 variations).

1vN live streaming – both high‑latency and low‑latency modes.

NvN real‑time audio/video for multi‑user interaction.

Choosing the appropriate technical model for each scenario is the core task for developers.
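As an illustration (the mapping below is inferred from the scenario descriptions above, not prescribed by the article), the scenario‑to‑model choice can be captured as a simple lookup:

```python
# Hypothetical mapping from business scenarios to the three technical
# models described above; names are illustrative, not from the system.
SCENARIO_MODEL = {
    "1v1 tutoring": "1v1 real-time A/V",
    "large class": "1vN live streaming (high latency acceptable)",
    "interactive large class": "1vN live streaming (low latency)",
    "super small class": "NvN real-time A/V",
    "small class": "NvN real-time A/V",
}

def pick_model(scenario: str) -> str:
    """Return the technical model chosen for a business scenario."""
    return SCENARIO_MODEL[scenario]
```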

Key Technical Points

Low Latency

Use UDP at the transport layer; the application layer can employ standard or custom protocol stacks.

Compress processing time at every link (pre‑processing, network transmission, etc.).

Dynamic buffer control (e.g., WebRTC's NetEQ) to adapt to network conditions.
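As a toy illustration of dynamic buffer control (WebRTC's NetEQ is far more sophisticated, adding time‑stretching and packet‑loss concealment), one can track recent inter‑arrival jitter and clamp the buffer target to a percentile of it:

```python
from collections import deque

class AdaptiveJitterBuffer:
    """Toy dynamic buffer: the target delay tracks recent network jitter.
    Window size, percentile, and bounds are invented for illustration."""

    def __init__(self, window=100, min_ms=20, max_ms=200):
        self.deltas = deque(maxlen=window)  # recent inter-arrival gaps (ms)
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.last_arrival = None

    def on_packet(self, arrival_ms):
        """Record one packet arrival timestamp (in milliseconds)."""
        if self.last_arrival is not None:
            self.deltas.append(arrival_ms - self.last_arrival)
        self.last_arrival = arrival_ms

    def target_delay_ms(self):
        """Hold the 95th percentile of recent gaps, clamped to [min, max]."""
        if not self.deltas:
            return self.min_ms
        ranked = sorted(self.deltas)
        p95 = ranked[int(0.95 * (len(ranked) - 1))]
        return max(self.min_ms, min(self.max_ms, p95))
```

On a steady network the buffer shrinks toward the minimum; when jitter grows, the target grows with it, trading a little latency for fewer underruns.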

High Concurrency

Maximize single‑machine performance through extreme optimization.

Deploy a clustered, distributed IDC architecture.

Adopt a hierarchical tree structure for scaling.
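The hierarchical‑tree intuition can be made concrete with a back‑of‑the‑envelope capacity formula (the fan‑out and level counts below are invented; the per‑edge pull figure echoes the performance test reported later in the article):

```python
def tree_capacity(fanout: int, levels: int, per_edge_pulls: int) -> int:
    """Illustrative capacity of a hierarchical relay tree.

    fanout:         downstream nodes each relay can feed
    levels:         relay levels between origin and edge
    per_edge_pulls: concurrent pulls one edge node sustains
    """
    return (fanout ** levels) * per_edge_pulls

# e.g. fan-out 10 over 2 relay levels, 4,500 pulls per edge node:
# 10**2 * 4500 = 450,000 concurrent viewers from one source stream.
```

The point of the tree is that capacity grows multiplicatively with depth, while the source server only ever serves its direct children.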

Overall Architecture

The system consists of two major parts:

Global intelligent scheduling – nearest‑edge access, optimal routing, and global media‑stream management with CPU and network load control.

Distributed IDC delivery – zrelay for inter‑IDC RTC stream relay, zrtcpush for edge push, and zrtcpull for edge pull. Push and pull are separated to protect the source server from massive pull traffic.

When a teacher in Beijing starts streaming, the scheduling system determines the best edge IP based on geographic and ISP information and returns it to the teacher. The stream is pushed to a nearby zrtcpush node, relayed via zrelay, and registered in the scheduling system. Students in Guangzhou then request the best edge IP and pull from the nearest zrtcpull node, which fetches the stream from the appropriate zrelay node.
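A minimal sketch of the "best edge" pick described above, assuming the scheduler scores candidates by ISP match, then geography, then load (the fields and weighting are invented for illustration; the real scheduler also tracks CPU and bandwidth in real time):

```python
def pick_edge(edges, user_region, user_isp):
    """Toy nearest-edge selection: prefer same ISP, then same region,
    then the least-loaded node. Tuple comparison orders the criteria."""
    def score(edge):
        return (
            0 if edge["isp"] == user_isp else 1,        # ISP match first
            0 if edge["region"] == user_region else 1,  # then geography
            edge["load"],                               # then current load
        )
    return min(edges, key=score)

edges = [
    {"ip": "1.1.1.1", "region": "beijing",   "isp": "unicom",  "load": 0.7},
    {"ip": "2.2.2.2", "region": "guangzhou", "isp": "telecom", "load": 0.3},
    {"ip": "3.3.3.3", "region": "guangzhou", "isp": "unicom",  "load": 0.5},
]
```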

Protocol Simplification

The system treats streams as independent entities without a room concept, reducing complexity and improving concurrency.

Signaling is performed over simple HTTP instead of long‑lived connections. The pull‑stream workflow is:

The client sends a /signaling/pull request to the signaling service, which forwards it to the zrtc service; zrtc generates an SDP offer and returns it to the signaling service.

The signaling service returns the offer as an HTTP response to the client.

The client creates an SDP answer and sends it via /signaling/sendanswer to the signaling service, which forwards it to zrtc.

Both sides now have each other's SDP and can proceed with ICE negotiation.
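The steps above can be sketched end to end; the endpoint paths come from the article, while `http_post` and the client object are hypothetical stand‑ins for real components:

```python
# Minimal sketch of the HTTP-only pull signaling flow: no long-lived
# connection, just two plain HTTP requests before ICE negotiation.

def pull_stream(http_post, client, stream_id):
    # 1. Client asks the signaling service to pull; signaling forwards
    #    to zrtc, which generates an SDP offer.
    offer = http_post("/signaling/pull", {"stream_id": stream_id})
    # 2. The offer comes back in the HTTP response; the client answers.
    answer = client.create_answer(offer)
    # 3. The answer travels the same way, relayed on to zrtc.
    http_post("/signaling/sendanswer", {"stream_id": stream_id,
                                        "answer": answer})
    # 4. Both sides now hold each other's SDP and proceed to ICE.
    return answer
```

Because each exchange is a stateless request/response, the signaling tier needs no per‑client connection state, which is part of what makes the design lightweight.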

For WebRTC security, DTLS key exchange is optional; disabling encryption can improve performance when security requirements are low.

Inter‑IDC Relay with KCP

KCP is chosen for inter‑IDC media transport because it trades extra bandwidth for latency: compared with TCP, average latency drops by roughly 30–40% and worst‑case latency by up to a factor of three, while reliable delivery is preserved, which simplifies loss handling.
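Much of KCP's latency win comes from a more aggressive retransmission policy. A minimal sketch of one such difference, assuming KCP's nodelay‑style 1.5x RTO growth versus TCP's doubling (the initial RTO and retry counts are invented; KCP's exact behavior is configurable):

```python
def tcp_rto_schedule(rto0, retries):
    """TCP-style backoff: the retransmission timeout doubles each time."""
    out, rto = [], rto0
    for _ in range(retries):
        out.append(int(rto))
        rto *= 2
    return out

def kcp_rto_schedule(rto0, retries):
    """KCP nodelay-style backoff: the RTO grows by only 1.5x, so a lost
    packet is retried much sooner, at the cost of extra bandwidth."""
    out, rto = [], rto0
    for _ in range(retries):
        out.append(int(rto))
        rto *= 1.5
    return out

# With a 100 ms initial RTO and 4 tries:
# TCP waits 100+200+400+800 = 1500 ms before the 5th attempt,
# KCP nodelay waits 100+150+225+337 = 812 ms.
```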

Multi‑Core Distribution Model

To support high concurrency, a single server’s performance is pushed to the limit, then a multi‑core distribution model is applied:

Users are assigned to CPU cores using uid % coreCount.

Each core maintains a queue for sharing media streams with other cores, using lock‑free queues and message notifications to avoid mutex contention.

Shared pointers reduce memory copying; multi‑queue NICs are used for optimal network I/O.
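The distribution model above can be sketched as a hash‑to‑core assignment plus a single‑producer/single‑consumer ring as the inter‑core queue (a shape sketch only: a real implementation would use atomics and shared memory in C/C++, and the capacity here is arbitrary):

```python
class SpscRing:
    """Single-producer/single-consumer ring buffer: the writer owns `head`,
    the reader owns `tail`, so no mutex is needed when each index is
    touched by exactly one core."""

    def __init__(self, capacity=1024):
        self.buf = [None] * capacity
        self.cap = capacity
        self.head = 0  # advanced only by the producer core
        self.tail = 0  # advanced only by the consumer core

    def push(self, item):
        if self.head - self.tail >= self.cap:
            return False               # full: drop or apply back-pressure
        self.buf[self.head % self.cap] = item
        self.head += 1
        return True

    def pop(self):
        if self.tail == self.head:
            return None                # empty
        item = self.buf[self.tail % self.cap]
        self.tail += 1
        return item

def core_for(uid: int, core_count: int) -> int:
    """Assign a user to a CPU core, as in uid % coreCount."""
    return uid % core_count
```

Pairing one such ring per (producer, consumer) core pair is what lets media frames flow between cores with message notifications instead of mutex contention.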

Performance test: on a server with Intel® Xeon® Silver 4110 CPUs (32 cores, 128 GB RAM, 2 Gbps external bandwidth), the system handled 76 concurrent source streams at 300–400 kbps each and supported up to 4,500 concurrent pull streams.

Uplink Push Optimization

For 1‑to‑many scenarios, unstable uplink streams cause poor downstream experience. The system introduces a forwarding control module:

Video: if packet loss or reordering is detected, pause forwarding until packets are reordered or an I‑frame arrives; then discard pre‑I‑frame data and resume.

Audio: pause forwarding on loss/reordering until packets are ordered or a timeout (min = 10 × RTT, max = 300 ms) expires, then resume.
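The video rule above can be sketched as a small forwarding gate (the packet fields and the resume‑immediately‑on‑keyframe simplification are illustrative; the real module also waits out a reordering window):

```python
class VideoForwardGate:
    """Sketch of the video forwarding control: on loss or reordering,
    stop forwarding, discard pre-I-frame data, and resume at the next
    I-frame so downstream decoders restart from a clean reference."""

    def __init__(self):
        self.next_seq = 0
        self.paused = False

    def on_packet(self, seq, is_keyframe):
        """Return True if this packet should be forwarded downstream."""
        if self.paused:
            if is_keyframe:
                self.paused = False        # resume from the I-frame
                self.next_seq = seq + 1
                return True
            return False                   # discard pre-I-frame data
        if seq != self.next_seq:           # gap: loss or reordering
            self.paused = True
            return self.on_packet(seq, is_keyframe)
        self.next_seq = seq + 1
        return True
```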

A 24‑hour soak test across four IDC sites showed no stream interruptions, stable latency, and consistent audio‑video sync.

Summary

High Concurrency Recommendations

Design lightweight architectures; heavy designs create bottlenecks.

Use lightweight protocols and processes.

Maximize single‑machine performance.

Optimize socket send/receive to the extreme.

Employ multi‑core distribution with lock‑free queues and message notifications.

Leverage shared pointers to avoid frequent memory copies.

Consider multi‑queue NICs.

Low Latency Recommendations

Prefer UDP transport.

Use KCP for inter‑IDC relay to cut latency.

Optimize the full link; control buffers dynamically (e.g., WebRTC's NetEQ) to reduce jitter.

Building an excellent system requires patience, craftsmanship, and continuous refinement.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: backend architecture, live streaming, high concurrency, low latency, WebRTC, KCP