
Understanding AI Live Streaming: Architecture, Protocols, and Practical Implementation

This article explains the AI live‑streaming architecture used by Liulishuo, covering the basic workflow of push, server, and pull, video/audio encoding and container formats, streaming protocols such as RTMP, HLS and HTTP‑FLV, SEI integration, and provides a step‑by‑step demo using nginx‑rtmp and FFmpeg.

Liulishuo Tech Team

Liulishuo's AI‑driven English teaching platform uses AI live streaming to let virtual teachers interact with students in real time, providing instant feedback based on student responses. The article introduces the overall live‑streaming workflow, the media protocols involved, and the supplementary information techniques used.

Basic Live‑Streaming Concepts

The live‑streaming system consists of three main components: the push side (captures and encodes audio/video, then sends it to the server), the server side (receives streams and forwards them), and the pull side (receives, demultiplexes, decodes, and renders the media).

Encoding and Decoding

To reduce bandwidth, video and audio are compressed using codecs such as H.264/HEVC/VP9 for video and AAC/MP3 for audio. Decoding restores the compressed streams for playback.

Container Formats

| Container | Supported Video Codecs | Supported Audio Codecs |
|-----------|------------------------|------------------------|
| FLV       | VP6 / H.264            | MP3 / AAC              |
| MP4       | MPEG‑2 / H.264         | AAC / AC‑3             |
| TS        | MPEG‑2 / H.264         | MPEG‑1 / AAC           |

Streaming Protocols

Common protocols include RTMP (TCP‑based, not HTML5‑compatible), HTTP‑FLV (HTTP‑based, HTML5‑compatible), and HLS (HTTP‑based, HTML5‑compatible). Each protocol defines how the push and pull ends communicate with the server and which container formats they support.

RTMP Details

RTMP, developed by Adobe, uses a handshake process where the client and server exchange three chunks each (C0‑C2 and S0‑S2). After the handshake, various message types (audio, video, data, command) are exchanged. Command messages, encoded with AMF, enable actions such as creating a stream and playing it.
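The simple (non-digest) handshake described above can be sketched from the client's side. This is a minimal illustration, assuming the plain handshake defined in the RTMP specification: C0 is a single version byte (0x03), C1 is 1536 bytes (timestamp, four zero bytes, random filler), and C2 echoes the server's S1.

```python
import os
import struct

RTMP_VERSION = 3
HANDSHAKE_SIZE = 1536  # C1/C2 and S1/S2 are each 1536 bytes

def build_c0_c1():
    """Build the client's opening handshake bytes: C0 (version byte)
    followed by C1 (4-byte timestamp, 4 zero bytes, 1528 random bytes)."""
    c0 = bytes([RTMP_VERSION])
    c1 = struct.pack(">I", 0) + b"\x00" * 4 + os.urandom(HANDSHAKE_SIZE - 8)
    return c0 + c1

def build_c2(s1):
    """C2 simply echoes the server's S1 packet back to the server."""
    assert len(s1) == HANDSHAKE_SIZE
    return s1
```

After C2 and S2 are exchanged, both sides switch to chunked message exchange, and the client issues AMF-encoded commands such as `connect`, `createStream`, and `play`.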

SEI (Supplemental Enhancement Information)

SEI allows embedding time‑critical auxiliary data (e.g., slide changes, teacher feedback) directly into H.264 video streams. In the NAL unit, type 6 indicates an SEI payload; its payload type and payload size each use a variable‑length encoding in which every 0xFF byte adds 255 to the value and the first non‑0xFF byte terminates it.
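The variable-length encoding of the SEI payload type and size can be sketched as follows (a minimal parser, assuming emulation-prevention bytes have already been removed from the RBSP):

```python
def parse_sei_header(rbsp, offset=0):
    """Parse one SEI message header from an RBSP buffer.

    Payload type and payload size share the same encoding: each 0xFF
    byte adds 255, and the first non-0xFF byte ends the value and is
    added to the running total.
    """
    payload_type = 0
    while rbsp[offset] == 0xFF:
        payload_type += 255
        offset += 1
    payload_type += rbsp[offset]
    offset += 1

    payload_size = 0
    while rbsp[offset] == 0xFF:
        payload_size += 255
        offset += 1
    payload_size += rbsp[offset]
    offset += 1

    # offset now points at the first byte of the SEI payload itself
    return payload_type, payload_size, offset
```

For example, the bytes `05 10` decode as payload type 5 (user data unregistered) with a 16-byte payload, while `FF 01 08` decode as payload type 256 with an 8-byte payload.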

H.264/AVC Structure

H.264 separates processing into the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). Each NAL unit consists of a one‑byte header and a payload (RBSP). The header packs three fields: forbidden_zero_bit (1 bit), nal_ref_idc (2 bits, the reference importance), and nal_unit_type (5 bits; 6 denotes SEI).

nal_unit(NumBytesInNALunit) {
    forbidden_zero_bit    // f(1), must be 0 in a valid stream
    nal_ref_idc           // u(2), reference importance
    nal_unit_type         // u(5), 6 = SEI
    NumBytesInRBSP = 0
    nalUnitHeaderBytes = 1
    // ... emulation-prevention removal and RBSP extraction ...
}
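The header fields in the spec pseudocode above are plain bit fields of a single byte, so extracting them is straightforward. A minimal sketch:

```python
SEI_NAL_TYPE = 6

def parse_nal_header(header_byte):
    """Split the one-byte H.264 NAL unit header into its three fields."""
    forbidden_zero_bit = (header_byte >> 7) & 0x01  # must be 0
    nal_ref_idc = (header_byte >> 5) & 0x03         # 0 = disposable, 3 = highest
    nal_unit_type = header_byte & 0x1F              # 5 = IDR, 6 = SEI, 7 = SPS, 8 = PPS
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type
```

A demultiplexer looking for SEI messages scans NAL units and checks `nal_unit_type == SEI_NAL_TYPE` before handing the RBSP to the SEI parser.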

Demo Implementation

To set up a simple RTMP server, use the nginx‑rtmp‑module with the following configuration:

rtmp {
    server {
        listen 1935;
        application mytv {
            live on;
        }
    }
}

Push a local FLV file to the server using FFmpeg:

ffmpeg -re -i demo.flv -c copy -f flv rtmp://127.0.0.1:1935/mytv/room

Pull the stream with FFplay:

ffplay rtmp://127.0.0.1:1935/mytv/room

The article concludes with references to the nginx‑rtmp‑module repository, FFmpeg, and several standards documents.

Tags: Live Streaming, Nginx, FFmpeg, Video Encoding, RTMP, SEI