Understanding AI Live Streaming: Architecture, Protocols, and Practical Implementation
This article explains the AI live‑streaming architecture used by Liulishuo, covering the basic push/server/pull workflow, video and audio codecs and container formats, streaming protocols such as RTMP, HLS, and HTTP‑FLV, and SEI integration, and walks through a step‑by‑step demo using nginx‑rtmp and FFmpeg.
Liulishuo's AI‑driven English teaching platform uses AI live streaming to let virtual teachers interact with students in real time, providing instant feedback based on student responses. The article introduces the overall live‑streaming workflow, the media protocols involved, and the supplementary information techniques used.
Basic Live‑Streaming Concepts
The live‑streaming system consists of three main components: the push side (captures and encodes audio/video, then sends it to the server), the server side (receives streams and forwards them), and the pull side (receives, demultiplexes, decodes, and renders the media).
Encoding and Decoding
To reduce bandwidth, video and audio are compressed using codecs such as H.264/HEVC/VP9 for video and AAC/MP3 for audio. Decoding restores the compressed streams for playback.
Container Formats
| Container | Supported Video Codecs | Supported Audio Codecs |
| --- | --- | --- |
| FLV | VP6 / H.264 | MP3 / AAC |
| MP4 | MPEG‑2 / H.264 | AAC / AC‑3 |
| TS | MPEG‑2 / H.264 | MPEG‑1 / AAC |
Streaming Protocols
Common protocols include RTMP (TCP‑based, not playable natively in HTML5 browsers), HTTP‑FLV (HTTP‑based, HTML5‑compatible), and HLS (HTTP‑based, HTML5‑compatible; it segments the stream into short TS files, which adds latency but works well with CDNs). Each protocol defines how the push and pull ends communicate with the server and which container formats they support.
RTMP Details
RTMP, developed by Adobe, uses a handshake process where the client and server exchange three chunks each (C0‑C2 and S0‑S2). After the handshake, various message types (audio, video, data, command) are exchanged. Command messages, encoded with AMF, enable actions such as creating a stream and playing it.
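The handshake layout described above can be sketched in a few lines. This is a minimal, illustrative Python sketch of the chunk framing only (function names are my own, and no network I/O or error handling is included): C0 is a single version byte, C1 is 1536 bytes of timestamp, zeros, and random data, and C2 echoes the server's S1.

```python
import os
import struct

RTMP_VERSION = 3
HANDSHAKE_SIZE = 1536  # size of C1/C2 and S1/S2 in bytes

def build_c0_c1(epoch: int = 0) -> bytes:
    """Build the client's first two handshake chunks.

    C0 is one version byte; C1 is 1536 bytes laid out as a
    4-byte timestamp, 4 zero bytes, then 1528 random bytes.
    """
    c0 = bytes([RTMP_VERSION])
    c1 = struct.pack(">II", epoch, 0) + os.urandom(HANDSHAKE_SIZE - 8)
    return c0 + c1

def build_c2(s1: bytes, read_time: int = 0) -> bytes:
    """C2 echoes S1, with the second 4-byte field set to the
    time at which S1 was read by the client."""
    return s1[:4] + struct.pack(">I", read_time) + s1[8:]
```

After C2/S2 are exchanged, both sides switch to chunked message traffic (audio, video, data, and AMF‑encoded command messages).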
SEI (Supplemental Enhancement Information)
SEI allows embedding time‑critical auxiliary data (e.g., slide changes, teacher feedback) directly into H.264 video streams. In the NAL unit, type 6 indicates an SEI payload. Its payload type and payload size use a variable‑length encoding: each 0xFF byte adds 255 to the running total, and the first non‑0xFF byte ends the field and is added to that total.
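The variable‑length encoding just described is easy to implement. Below is a minimal Python sketch (the function names are illustrative) that reads one SEI payload header out of an RBSP and returns the payload type, payload size, and the offset where the payload body begins:

```python
def parse_sei_header(rbsp: bytes, offset: int = 0):
    """Parse one SEI payload header from an RBSP byte string.

    Both payload type and payload size use the same scheme:
    every 0xFF byte adds 255, and the first non-0xFF byte
    terminates the field and is added to the running total.
    """
    def read_varlen(pos: int):
        value = 0
        while rbsp[pos] == 0xFF:
            value += 255
            pos += 1
        return value + rbsp[pos], pos + 1

    payload_type, offset = read_varlen(offset)
    payload_size, offset = read_varlen(offset)
    return payload_type, payload_size, offset
```

For example, the bytes `FF 01` decode to payload type 256 (255 + 1), which is how types above 255 are represented.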
H.264/AVC Structure
H.264 separates processing into the Video Coding Layer (VCL) and the Network Abstraction Layer (NAL). The NAL unit consists of a header and a payload (RBSP). The header’s bits indicate forbidden zero, reference level, and unit type (6 for SEI).
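The one‑byte header layout can be unpacked with a few bit operations. A minimal Python sketch (the dictionary keys mirror the spec's field names; the helper itself is illustrative):

```python
def parse_nal_header(first_byte: int) -> dict:
    """Unpack the one-byte H.264 NAL unit header.

    Bit layout, MSB first: 1-bit forbidden_zero_bit,
    2-bit nal_ref_idc, 5-bit nal_unit_type (6 = SEI).
    """
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc": (first_byte >> 5) & 0x03,
        "nal_unit_type": first_byte & 0x1F,
    }
```

A header byte of `0x06` therefore decodes to unit type 6, which marks the payload that follows as SEI data.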
The H.264 specification sketches NAL unit parsing as:

nal_unit(NumBytesInNALunit) {
    forbidden_zero_bit    // f(1), must be 0
    nal_ref_idc           // u(2), reference importance
    nal_unit_type         // u(5), 6 = SEI
    NumBytesInRBSP = 0
    nalUnitHeaderBytes = 1
    // ... parsing logic ...
}

Demo Implementation
To set up a simple RTMP server, use the nginx‑rtmp‑module with the following configuration:
rtmp {
    server {
        listen 1935;
        application mytv {
            live on;
        }
    }
}

Push a local FLV file to the server using FFmpeg:
ffmpeg -re -i demo.flv -c copy -f flv rtmp://127.0.0.1:1935/mytv/room

Pull the stream with FFplay:
ffplay rtmp://127.0.0.1:1935/mytv/room

The article concludes with references to the nginx‑rtmp‑module repository, FFmpeg, and several standards documents.
Liulishuo Tech Team
Help everyone become a global citizen!