Design and Implementation of a Live Streaming Highlight System with AI Optimization
The paper details a live‑streaming highlight system that integrates heterogeneous data sources through a three‑stage pipeline backed by MySQL and Redis. It applies sliding‑window interval optimization and AI‑driven title generation, scoring, and segment selection, manages lifecycle stages with a shared state machine, and outlines future stability and observability improvements.
The document presents a comprehensive technical overview of a live‑streaming highlight ("high‑light") system, describing its business value, architecture, data generation mechanisms, storage design, interval optimization algorithms, AI‑driven enhancements, state‑machine management, and future roadmap.
Background : Live streaming replay is essential for user engagement and data analysis. Highlight segments capture memorable moments, improve fan interaction, and provide valuable data for content creators.
System Overview : The high‑light system integrates multiple heterogeneous data sources (danmaku, interaction logs, revenue, PK games, voice chat, AI‑generated content) and must support high concurrency for fan‑generated clips.
Architecture : The system follows a three‑stage pipeline: data generation (active and passive triggers), unified data ingestion, and downstream processing. Active triggers are initiated by streamers (anchors) or fans after a broadcast ends, while passive triggers arrive via RPC or a message queue (MQ) and create highlights in real time.
Data Generation :
Active trigger creates a task only when no highlight already exists for that session.
Passive trigger processes real‑time requests (e.g., AI‑detected events) via RPC/MQ.
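The difference between the two trigger paths can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the class `TriggerHandler`, its methods, and the in-memory set and list standing in for the task table and downstream queue are all hypothetical. The deduplication identity `(uid, live_key, highlight_type)` is borrowed from the storage design described in this section.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical identity of a highlight task: (uid, live_key, highlight_type).
TaskKey = Tuple[int, str, int]


@dataclass
class TriggerHandler:
    """In-memory stand-in for the task table and downstream queue (assumption)."""
    produced: Set[TaskKey] = field(default_factory=set)
    queue: List[tuple] = field(default_factory=list)

    def on_active_trigger(self, uid: int, live_key: str, highlight_type: int) -> bool:
        """Post-stream request from a streamer or fan.

        Creates a task only if no highlight exists yet for this session and type;
        returns False when the request is a duplicate."""
        key = (uid, live_key, highlight_type)
        if key in self.produced:
            return False
        self.produced.add(key)
        self.queue.append(("active", key))
        return True

    def on_passive_trigger(self, uid: int, live_key: str, highlight_type: int,
                           start_time: int, end_time: int) -> None:
        """Real-time event (e.g. AI-detected) received over RPC/MQ.

        Enqueued without deduplication, on the assumption that one session
        may yield many real-time highlights."""
        self.queue.append(("passive", (uid, live_key, highlight_type), start_time, end_time))
```

Under concurrent fan requests, the check-then-insert in `on_active_trigger` would have to be atomic in production, for example through a unique index on the key columns or an atomic set-if-absent in the cache. The in-memory version only shows the control flow.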
Data Storage : The solution combines MySQL for persistent storage and Redis for caching status flags.
MySQL schema (highlight_get_record):
```sql
CREATE TABLE `highlight_get_record` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `uid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `roomid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'room ID',
  `live_key` varchar(100) NOT NULL DEFAULT '' COMMENT 'live session ID',
  `highlight_type` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight type',
  `status` tinyint(3) unsigned NOT NULL DEFAULT '0' COMMENT 'query status: 0 = not successful, 1 = successful',
  -- ......
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='highlight query record table';
```
MySQL schema (highlight_data):
```sql
CREATE TABLE `highlight_data` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'id',
  `uid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `roomid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'room ID',
  `live_key` varchar(100) NOT NULL DEFAULT '' COMMENT 'live session ID',
  `highlight_type` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight type',
  `title` varchar(256) NOT NULL DEFAULT '' COMMENT 'title',
  `start_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight clip start time',
  `end_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight clip end time',
  `score` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight score',
  `status` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'highlight status',
  -- ......
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='highlight data table';
```
Redis caches the status of each highlight under the composite key uid:live_key:highlight_type, reducing database load.
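The status cache follows a cache-aside read path, which can be sketched as below. The key format `uid:live_key:highlight_type` comes from the text. Everything else is an assumption for illustration: a plain dict stands in for Redis, and `load_from_db` is a hypothetical callback representing a MySQL lookup.

```python
from typing import Callable, Dict, Optional


def status_cache_key(uid: int, live_key: str, highlight_type: int) -> str:
    # Composite key format taken from the text: uid:live_key:highlight_type
    return f"{uid}:{live_key}:{highlight_type}"


def get_highlight_status(
    cache: Dict[str, int],
    load_from_db: Callable[[int, str, int], Optional[int]],
    uid: int,
    live_key: str,
    highlight_type: int,
) -> Optional[int]:
    """Cache-aside read: try the cache first and fall back to the database on a miss."""
    key = status_cache_key(uid, live_key, highlight_type)
    if key in cache:
        return cache[key]
    status = load_from_db(uid, live_key, highlight_type)
    if status is not None:
        # A real Redis client would typically also set an expiry here.
        cache[key] = status
    return status
```

Repeated status polls for the same highlight then reach the database only once, which is where the load reduction comes from.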
Highlight Interval Optimization : To locate high‑value segments in long streams (often longer than six hours), the system extracts minute‑level metrics such as peak concurrent users (PCU), danmaku volume, and revenue, and applies sliding‑window scoring. The algorithm compares average values over windows of different sizes (e.g., 3‑point vs. 4‑point averages) and uses weighted coefficients (k₁, k₂, …) to decide whether the longer window should be selected. Formulas for area approximation and coefficient selection are illustrated in the original figures.
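Because the exact formulas live in figures not reproduced here, the following is only a sketch of the general idea under stated assumptions. The per-metric normalization and equal weights in `combined_series`, the greedy growth rule, and the example coefficients `ks=(0.9, 0.85)` are all hypothetical choices, not the system's parameters. The window sum acts as a rectangle-rule approximation of the area under the metric curve, with a one-minute width. A longer window replaces the current one only if its mean stays at least kᵢ times the current mean.

```python
from typing import List, Sequence, Tuple


def combined_series(pcu: Sequence[float], danmaku: Sequence[float],
                    revenue: Sequence[float],
                    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> List[float]:
    """Hypothetical per-minute score: each metric is scaled by its max, then weighted-summed."""
    def norm(xs: Sequence[float]) -> List[float]:
        m = max(xs) or 1.0  # avoid dividing by zero for an all-zero metric
        return [x / m for x in xs]
    series = [norm(pcu), norm(danmaku), norm(revenue)]
    return [sum(w * s[i] for w, s in zip(weights, series)) for i in range(len(pcu))]


def best_window(values: Sequence[float], size: int) -> Tuple[int, float]:
    """Start index and mean of the size-point window with the largest sum.

    The running sum approximates the area under the curve (one-minute rectangles)
    and is updated in O(1) per step."""
    if size > len(values):
        raise ValueError("window larger than series")
    window = sum(values[:size])
    best_start, best_sum = 0, window
    for start in range(1, len(values) - size + 1):
        window += values[start + size - 1] - values[start - 1]
        if window > best_sum:
            best_start, best_sum = start, window
    return best_start, best_sum / size


def select_interval(values: Sequence[float], base_size: int = 3,
                    ks: Sequence[float] = (0.9, 0.85)) -> Tuple[int, int]:
    """Grow the window one point per coefficient while the longer window's mean
    is at least k_i times the current mean; return a half-open [start, end) in minutes."""
    start, avg = best_window(values, base_size)
    size = base_size
    for k in ks:
        if size + 1 > len(values):
            break
        cand_start, cand_avg = best_window(values, size + 1)
        if cand_avg >= k * avg:
            start, avg, size = cand_start, cand_avg, size + 1
        else:
            break
    return start, start + size
```

For the series `[1, 1, 5, 6, 5, 4, 1, 1]`, the best 3-point window starts at minute 2 with mean 16/3. The 4-point window (mean 5.0) passes the 0.9 test. The 5-point window (mean 4.2) fails the 0.85 test, so the result is minutes 2 to 6.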
AI‑Driven Enhancements :
Title generation: ASR transcribes audio, and a large language model creates engaging titles.
Quality scoring: AI evaluates each clip’s audio‑derived transcript and assigns a score for ranking.
Precise segment selection: Sentence‑level scoring of subtitles determines the most coherent highlight interval.
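Sentence-level segment selection can be illustrated with a brute-force search over contiguous runs of subtitle sentences. This is an assumed formulation, not the one the system necessarily uses: maximize the total model score subject to a duration window. The `Sentence` tuple, the per-sentence scores, and the `min_dur`/`max_dur` bounds are hypothetical. Because the clip boundaries always land on sentence edges, a clip never starts or ends mid-sentence.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sentence(NamedTuple):
    start: float  # seconds from stream start
    end: float
    score: float  # hypothetical per-sentence score from the model


def select_segment(sentences: List[Sentence], min_dur: float,
                   max_dur: float) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the contiguous run of sentences with the highest
    total score whose duration lies in [min_dur, max_dur], or None if no run fits."""
    best: Optional[Tuple[float, float]] = None
    best_score = float("-inf")
    for i in range(len(sentences)):
        total = 0.0
        for j in range(i, len(sentences)):
            total += sentences[j].score
            duration = sentences[j].end - sentences[i].start
            if duration > max_dur:
                break  # extending further only makes the run longer
            if duration >= min_dur and total > best_score:
                best_score = total
                best = (sentences[i].start, sentences[j].end)
    return best
```

The O(n²) scan is fine for the few hundred sentences in a candidate region. A longer transcript would call for a two-pointer variant.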
Challenges include QPS pressure on the models, caching of generated results to relieve that pressure, and model tuning through online A/B testing.
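One way to reduce model QPS is to cache generated results by input. The sketch below is a hypothetical memo layer: the class name, the `generate` callback standing in for the LLM call, and the `prompt_version` field are illustrative assumptions. Including the prompt or model version in the key keeps A/B variants from sharing cached titles.

```python
import hashlib
from typing import Callable, Dict


class GeneratedTitleCache:
    """Hypothetical memo layer in front of an LLM title endpoint.

    Identical transcripts map to one key, so the model is called once per
    distinct input and prompt version."""

    def __init__(self, generate: Callable[[str], str], prompt_version: str = "v1"):
        self._generate = generate              # stand-in for the model call
        self._prompt_version = prompt_version  # separate entries per A/B variant
        self._store: Dict[str, str] = {}
        self.model_calls = 0

    def _key(self, transcript: str) -> str:
        digest = hashlib.sha256(transcript.encode("utf-8")).hexdigest()
        return f"title:{self._prompt_version}:{digest}"

    def title_for(self, transcript: str) -> str:
        key = self._key(transcript)
        if key not in self._store:
            self.model_calls += 1
            self._store[key] = self._generate(transcript)
        return self._store[key]
```

In a deployed service the dict would be a shared cache with an expiry, but the keying scheme is the part that matters.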
State‑Machine Optimization : A shared state machine abstracts the lifecycle stages common to all modules, removing duplicated status‑handling logic and simplifying maintenance.
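A shared state machine usually amounts to one transition table that every module consults. The states and transitions below are hypothetical; the article does not list them. The integer values are assumed to map onto the `status` column of `highlight_data`.

```python
from enum import Enum


class HighlightState(Enum):
    # Hypothetical lifecycle; values assumed to be stored in highlight_data.status
    PENDING = 0
    GENERATING = 1
    AI_PROCESSING = 2
    READY = 3
    FAILED = 4


# Single transition table shared by ingestion, AI processing, and the API layer.
TRANSITIONS = {
    HighlightState.PENDING: {HighlightState.GENERATING, HighlightState.FAILED},
    HighlightState.GENERATING: {HighlightState.AI_PROCESSING, HighlightState.FAILED},
    HighlightState.AI_PROCESSING: {HighlightState.READY, HighlightState.FAILED},
    HighlightState.FAILED: {HighlightState.PENDING},  # allow a retry
    HighlightState.READY: set(),
}


class InvalidTransition(Exception):
    pass


def can_transition(current: HighlightState, target: HighlightState) -> bool:
    return target in TRANSITIONS[current]


def transition(current: HighlightState, target: HighlightState) -> HighlightState:
    """Validate a status change against the shared table before it is persisted."""
    if not can_transition(current, target):
        raise InvalidTransition(f"{current.name} -> {target.name}")
    return target
```

With the rules in one place, a module that wants to mark a clip READY cannot skip the AI stage by accident, and adding a state means editing one table rather than every module.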
Data Presentation : The highlight API returns sorted highlights based on type, AI score, and duration, while filtering overlapping segments using a configurable overlap threshold.
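The sort-then-filter step can be sketched as follows. Several details are assumptions, because the article gives only the sort criteria and the existence of an overlap threshold: the `type_priority` mapping, the tie-break order (type, then higher score, then longer duration), and the overlap measure (intersection divided by the shorter clip's length).

```python
from typing import Dict, List, NamedTuple


class Highlight(NamedTuple):
    highlight_type: int
    score: int       # AI score, as in highlight_data.score
    start_time: int  # seconds
    end_time: int


def overlap_ratio(a: Highlight, b: Highlight) -> float:
    """Intersection length divided by the shorter clip's length (one possible definition)."""
    inter = min(a.end_time, b.end_time) - max(a.start_time, b.start_time)
    if inter <= 0:
        return 0.0
    # inter > 0 implies both clips have positive length, so no division by zero
    shorter = min(a.end_time - a.start_time, b.end_time - b.start_time)
    return inter / shorter


def present(highlights: List[Highlight], type_priority: Dict[int, int],
            overlap_threshold: float = 0.5) -> List[Highlight]:
    """Sort by type priority, then score (desc), then duration (desc), and
    greedily drop any clip overlapping an already-kept clip by >= threshold."""
    ranked = sorted(
        highlights,
        key=lambda h: (type_priority.get(h.highlight_type, 99),
                       -h.score,
                       -(h.end_time - h.start_time)),
    )
    kept: List[Highlight] = []
    for h in ranked:
        if all(overlap_ratio(h, k) < overlap_threshold for k in kept):
            kept.append(h)
    return kept
```

Because filtering happens after sorting, a higher-ranked clip always wins an overlap conflict. This keeps the filter a single greedy pass.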
Future Plans : Improve service stability, conduct failure drills, build unified observability tools, and continue AI‑based feature enhancements to improve the user experience.
Bilibili Tech