History and Technical Overview of Web Audio/Video: From Early HTML to HTML5, Flash, Codecs, Canvas Playback and FFmpeg
This article traces the evolution of web audio and video from the static early HTML era through Flash's rise and fall, explains HTML5 video/audio support, discusses video and audio encoding, container formats, bitrate, playback pipelines, canvas‑based rendering, and provides practical FFmpeg command examples for developers.
Web Audio/Video Development History
In the early days of HTML, limited bandwidth and technology meant web pages were static, supporting only text and images without any ability to stream audio or video.
(Figure: Yahoo! in 1994)
Flash's Rise and Decline
At the turn of the 21st century, the demand for richer web content led to the emergence of Flash, which offered lightweight, cross‑platform vector animations and small‑size video playback even on dial‑up connections.
(Flash created many classic mini‑games, e.g., the stick‑man)
Flash thrived because HTML lacked native media support; it acted as a plugin handling the heavy media work and, after Adobe acquired Macromedia and continued promoting it, gained interoperability with JavaScript, HTML, and XML along with stronger audio/video capabilities.
However, the 2007 iPhone abandoned Flash for better battery life and security, and Android followed suit in 2012, removing Flash from mobile platforms. Desktop browsers also phased out Flash: Chrome sandboxed it from version 42 and removed it entirely by version 88, Edge and Firefox similarly discontinued support.
The final blow came from superior web media solutions—HTML5.
HTML5's Arrival
HTML5, whose specification was first drafted in 2008 and finalized by the W3C in 2014, introduced native <video> and <audio> tags, allowing browsers that support the standard to play media without any plugins.
<!-- A simple video tag -->
<video src="movie.mp4" poster="movie.jpg" controls></video>
Major video portals now default to HTML5 playback.
Chrome 88 (Jan 2021) completely disabled Flash, marking its exit from the web.
What Is a Video?
A video is a sequence of images displayed rapidly; at around 24 fps the human eye perceives continuous motion. Each image is a frame; the frame rate (fps) determines smoothness. Film typically uses 24 fps, while TV standards use 25 fps (PAL) or roughly 30 fps (NTSC).
Film vs. Game Frame Rates
Films capture frames with a shutter speed of 1/24 s, producing natural motion blur that our eyes interpret as smooth. Games render each frame independently; limited GPU power may drop frames, causing visible stutter.
Game frames often show larger jumps between frames, making inconsistencies noticeable.
Video Encoding
Raw video (e.g., 1920×1080, 24 fps, 24‑bit RGB) would require roughly 8.9 GB per minute, so compression is essential. Video compression removes spatial and temporal redundancy using intra‑frame, inter‑frame, and entropy coding techniques.
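That figure is easy to verify with a back‑of‑the‑envelope calculation, using the frame size, color depth, and frame rate quoted above:

```javascript
// Uncompressed size of one minute of 1920×1080, 24 fps, 24-bit RGB video.
const width = 1920;
const height = 1080;
const bytesPerPixel = 3; // 24-bit RGB = 3 bytes per pixel
const fps = 24;
const seconds = 60;

const bytesPerFrame = width * height * bytesPerPixel; // 6,220,800 bytes
const bytesPerMinute = bytesPerFrame * fps * seconds;

console.log((bytesPerMinute / 1e9).toFixed(2) + " GB"); // → "8.96 GB"
```

At roughly 150 MB per second of footage, it is clear why no raw video is ever shipped over the network.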
Common codecs include H.264, MPEG‑4, VP8; browsers primarily support H.264.
Audio Encoding
Uncompressed audio (e.g., CD‑quality PCM) also consumes considerable space; common formats on the web are WAV (which typically holds uncompressed PCM), MP3, and AAC, the latter two being lossy codecs that are widely supported across browsers.
Container Formats
Containers (MP4, AVI, RMVB, etc.) package video, audio, and metadata together. The container is independent of the codec; an MP4 file may contain H.264 video or MPEG‑4 video, affecting playback compatibility.
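This container/codec split is visible in the browser's canPlayType API, which takes a MIME type naming the container plus a codecs parameter naming the codec(s) inside it. A small sketch (the codec strings are illustrative examples):

```javascript
// Build a MIME type that names both the container (MP4) and the codecs.
function mp4Type(codecs) {
  return `video/mp4; codecs="${codecs}"`;
}

// In a browser, probe whether this container + codec combination is playable;
// canPlayType returns "", "maybe", or "probably".
if (typeof document !== "undefined") {
  const v = document.createElement("video");
  // "avc1.42E01E" = H.264 Baseline profile, "mp4a.40.2" = AAC-LC audio
  console.log(v.canPlayType(mp4Type("avc1.42E01E, mp4a.40.2")));
}
```

The same MP4 container with a different codecs value (say, MPEG‑4 Part 2 video) can yield a different answer, which is exactly the compatibility effect described above.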
Bitrate
Bitrate (bits per second) largely determines file size and visual quality. A typical 1080p H.264 movie runs at ~10 Mbps and Blu‑ray at ~20 Mbps, while online streams are often encoded at around 5 Mbps or lower.
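File size follows directly from bitrate: size ≈ bitrate × duration ÷ 8. A quick sketch for a hypothetical two‑hour movie at the bitrates quoted above:

```javascript
// Estimate file size in gigabytes from bitrate (Mbps) and duration (seconds).
function sizeGB(mbps, seconds) {
  const bits = mbps * 1e6 * seconds;
  return bits / 8 / 1e9; // bits → bytes → gigabytes
}

const twoHours = 2 * 60 * 60; // 7200 seconds
console.log(sizeGB(10, twoHours)); // 1080p @ 10 Mbps → 9 GB
console.log(sizeGB(20, twoHours)); // Blu-ray @ 20 Mbps → 18 GB
```

This is why a streaming service and a Blu‑ray disc of the same film can differ in size by a factor of two or more at the same resolution.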
Video Player Pipeline
Playback involves four stages: protocol handling → demuxing → decoding → audio/video sync. Network streams use protocols (HTTP, RTMP, MMS) that carry both media data and control signals; protocol handling and demuxing strip these away to extract the pure audio and video streams for the decoders.
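The four stages can be sketched as functions over toy data (all data shapes here are invented for illustration; a real player relies on demuxers and decoders such as FFmpeg's):

```javascript
// Protocol handling: discard control messages, keep media payloads.
function handleProtocol(packets) {
  return packets.filter((p) => p.kind === "media").map((p) => p.payload);
}

// Demuxing: split the container stream into audio and video packets.
function demux(payloads) {
  return {
    video: payloads.filter((p) => p.track === "video"),
    audio: payloads.filter((p) => p.track === "audio"),
  };
}

// Decoding: turn a compressed packet into a raw frame (stubbed here).
function decode(packet) {
  return { pts: packet.pts, data: `decoded(${packet.track})` };
}

// A/V sync: pair frames whose presentation timestamps (pts) match.
function sync(videoFrames, audioFrames) {
  return videoFrames.map((v) => ({
    video: v,
    audio: audioFrames.find((a) => a.pts === v.pts) || null,
  }));
}

const network = [
  { kind: "control", payload: null },
  { kind: "media", payload: { track: "video", pts: 0 } },
  { kind: "media", payload: { track: "audio", pts: 0 } },
];
const streams = demux(handleProtocol(network));
const paired = sync(streams.video.map(decode), streams.audio.map(decode));
console.log(paired.length); // → 1 synced video frame, with matching audio
```

Real pipelines also buffer, reorder, and clock frames against the audio device, but the data flow is the same.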
Canvas‑Based Video Playback
When HTML5 <video> is insufficient, developers can draw video frames onto a canvas using ctx.drawImage(video, x, y, width, height). By repeatedly capturing frames, the video can be rendered on the canvas.
<video id="video" controls style="display:none;">
<source src="https://xxx.com/vid_159411468092581" />
</video>
<canvas id="myCanvas" width="460" height="270" style="border:1px solid blue;"></canvas>
<div>
<button id="playBtn">Play</button>
<button id="pauseBtn">Pause</button>
</div>
const video = document.querySelector("#video");
const canvas = document.querySelector("#myCanvas");
const playBtn = document.querySelector("#playBtn");
const pauseBtn = document.querySelector("#pauseBtn");
const ctx = canvas.getContext("2d");

// Copy the current video frame onto the canvas, then schedule the next copy.
function draw() {
  if (video.paused || video.ended) return;
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  requestAnimationFrame(draw);
}

playBtn.addEventListener("click", () => {
  if (!video.paused) return;
  video.play();
  draw();
});

pauseBtn.addEventListener("click", () => {
  // draw() stops itself once video.paused becomes true.
  video.pause();
});

Libraries such as JSMpeg provide ready‑made canvas or WebGL renderers for MPEG‑TS streams.
Transport Stream (TS)
TS (MPEG‑2 Transport Stream) is a container designed for reliable streaming; each segment can be decoded independently, making it suitable for low‑latency playback.
FFmpeg – The Ultimate Media Toolbox
FFmpeg is an open‑source command‑line tool for converting, trimming, cropping, and processing audio/video.
Install it on macOS via Homebrew: brew install ffmpeg

To convert an MP4 to a JSMpeg‑compatible TS file:
$ ffmpeg -i input.mp4 -f mpegts \
    -codec:v mpeg1video -s 640x360 -b:v 1500k -r 25 -bf 0 \
    -codec:a mp2 -ar 44100 -ac 1 -b:a 64k \
    output.ts

-i : input file (e.g., input.mp4)
-f : output container format (mpegts)
-codec:v : video codec (mpeg1video)
-s : resolution (e.g., 640x360)
-b:v : video bitrate (e.g., 1500k)
-r : frame rate (e.g., 25)
-bf : number of B‑frames (0 here, as JSMpeg expects none)
-codec:a : audio codec (mp2)
-ar : audio sample rate (44100 Hz)
-ac : number of audio channels (1 = mono)
-b:a : audio bitrate (64k)
B‑frames are bidirectional predicted frames that improve compression by storing differences relative to surrounding frames.
Further Resources
For deeper learning, see the works of Leixiaohua (雷霄骅) and tutorials on WebRTC, FFmpeg, and video processing.
References
[1] JSMpeg: https://github.com/phoboslab/jsmpeg
[2] FFmpeg: https://www.ffmpeg.org/
[3] Leixiaohua's blog: https://blog.csdn.net/leixiaohua1020
[4] "[Summary] Audio/Video Codec Basics" – Leixiaohua's blog
[5] Real‑time communication with WebRTC: https://codelabs.developers.google.com/codelabs/webrtc-web
[6] Li Chao – imooc tutorial: https://www.imooc.com/t/4873493
ByteFE
Cutting‑edge tech, article sharing, and practical insights from the ByteDance frontend team.