Understanding AI Live Streaming: Architecture, Protocols, and Practical Implementation
This article explains the AI live‑streaming architecture used by Liulishuo, covering the basic push/server/pull workflow, video and audio codecs and container formats, streaming protocols such as RTMP, HLS, and HTTP‑FLV, and SEI integration, and walks through a step‑by‑step demo using nginx‑rtmp and FFmpeg.
Liulishuo's AI‑driven English teaching platform uses AI live streaming to let virtual teachers interact with students in real time, providing instant feedback based on student responses. The article introduces the overall live‑streaming workflow, the media protocols involved, and the supplementary information techniques used.
Basic Live‑Streaming Concepts
The live‑streaming system consists of three main components: the push side (captures and encodes audio/video, then sends it to the server), the server side (receives streams and forwards them), and the pull side (receives, demultiplexes, decodes, and renders the media).
Encoding and Decoding
To reduce bandwidth, video and audio are compressed using codecs such as H.264/HEVC/VP9 for video and AAC/MP3 for audio. Decoding restores the compressed streams for playback.
Container Formats
| Container | Supported Video Codecs | Supported Audio Codecs |
| --- | --- | --- |
| FLV | VP6 / H.264 | MP3 / AAC |
| MP4 | MPEG‑2 / H.264 | AAC / AC‑3 |
| TS | MPEG‑2 / H.264 | MPEG‑1 / AAC |
Streaming Protocols
Common protocols include RTMP (TCP‑based, not playable natively in HTML5 browsers), HTTP‑FLV (HTTP‑based, HTML5‑compatible), and HLS (HTTP‑based, HTML5‑compatible; it segments the stream into short TS files, which adds latency but works well with CDNs). Each protocol defines how the push and pull ends communicate with the server and which container formats they support.
RTMP Details
RTMP, developed by Adobe, uses a handshake process where the client and server exchange three chunks each (C0‑C2 and S0‑S2). After the handshake, various message types (audio, video, data, command) are exchanged. Command messages, encoded with AMF, enable actions such as creating a stream and playing it.
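The handshake layout described above can be sketched in a few lines. This is a minimal, illustrative Python sketch of the chunk framing only (function names are my own, and no network I/O or error handling is included): C0 is a single version byte, C1 is 1536 bytes of timestamp, zeros, and random data, and C2 echoes the server's S1.

```python
import os
import struct

RTMP_VERSION = 3
HANDSHAKE_SIZE = 1536  # size of C1/C2 and S1/S2 in bytes

def build_c0_c1(epoch: int = 0) -> bytes:
    """Build the client's first two handshake chunks.

    C0 is one version byte; C1 is 1536 bytes laid out as a
    4-byte timestamp, 4 zero bytes, then 1528 random bytes.
    """
    c0 = bytes([RTMP_VERSION])
    c1 = struct.pack(">II", epoch, 0) + os.urandom(HANDSHAKE_SIZE - 8)
    return c0 + c1

def build_c2(s1: bytes, read_time: int = 0) -> bytes:
    """C2 echoes S1, with the second 4-byte field set to the
    time at which S1 was read by the client."""
    return s1[:4] + struct.pack(">I", read_time) + s1[8:]
```

After C2/S2 are exchanged, both sides switch to chunked message traffic (audio, video, data, and AMF‑encoded command messages).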
SEI (Supplemental Enhancement Information)
SEI allows embedding time‑critical auxiliary data (e.g., slide changes, teacher feedback) directly into H.264 video streams. In the NAL unit, type 6 indicates an SEI payload. Its payload type and payload size use a variable‑length encoding: each 0xFF byte adds 255 to the running total, and the first non‑0xFF byte ends the field and is added to that total.
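The variable‑length encoding just described is easy to implement. Below is a minimal Python sketch (the function names are illustrative) that reads one SEI payload header out of an RBSP and returns the payload type, payload size, and the offset where the payload body begins:

```python
def parse_sei_header(rbsp: bytes, offset: int = 0):
    """Parse one SEI payload header from an RBSP byte string.

    Both payload type and payload size use the same scheme:
    every 0xFF byte adds 255, and the first non-0xFF byte
    terminates the field and is added to the running total.
    """
    def read_varlen(pos: int):
        value = 0
        while rbsp[pos] == 0xFF:
            value += 255
            pos += 1
        return value + rbsp[pos], pos + 1

    payload_type, offset = read_varlen(offset)
    payload_size, offset = read_varlen(offset)
    return payload_type, payload_size, offset
```

For example, the bytes `FF 01` decode to payload type 256 (255 + 1), which is how types above 255 are represented.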
H.264/AVC Structure
H.264 separates processing into the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The NAL unit consists of a header and a payload (RBSP). The header’s bits indicate forbidden zero, reference level, and unit type (6 for SEI).
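The one‑byte header layout can be unpacked with a few bit operations. A minimal Python sketch (the dictionary keys mirror the spec's field names; the helper itself is illustrative):

```python
def parse_nal_header(first_byte: int) -> dict:
    """Unpack the one-byte H.264 NAL unit header.

    Bit layout, MSB first: 1-bit forbidden_zero_bit,
    2-bit nal_ref_idc, 5-bit nal_unit_type (6 = SEI).
    """
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }
```

A header byte of `0x06` therefore decodes to unit type 6, which marks the payload that follows as SEI data.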
The H.264 specification sketches NAL unit parsing as:

nal_unit(NumBytesInNALunit) {
    forbidden_zero_bit    // f(1), must be 0
    nal_ref_idc           // u(2), reference importance
    nal_unit_type         // u(5), 6 = SEI
    NumBytesInRBSP = 0
    nalUnitHeaderBytes = 1
    // ... parsing logic ...
}

Demo Implementation
To set up a simple RTMP server, use the nginx‑rtmp‑module with the following configuration:
rtmp {
    server {
        listen 1935;
        application mytv {
            live on;
        }
    }
}

Push a local FLV file to the server using FFmpeg:
ffmpeg -re -i demo.flv -c copy -f flv rtmp://127.0.0.1:1935/mytv/room

Pull the stream with FFplay:
ffplay rtmp://127.0.0.1:1935/mytv/room

The article concludes with references to the nginx‑rtmp‑module repository, FFmpeg, and several standards documents.
Liulishuo Tech Team
Help everyone become a global citizen!