Multimodal Video High-Energy Segment Extraction for Dynamic Video Covers
The authors present a multimodal system that automatically extracts high-energy video segments for use as dynamic video covers. It analyzes four signals: subtitles, audio, visual frames, and danmu (time-synchronized bullet-screen comments). It combines LLM prompt-tuning, scene-cut detection, and aesthetic scoring, with the goal of reducing manual editing effort and increasing the click-through rate of the resulting covers.
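A minimal Python sketch can make the fusion step concrete. It is not the authors' implementation, and it rests on several assumptions. Each modality is assumed to be reduced already to a per-second score in [0, 1], for example a subtitle score from the LLM stage, audio loudness, danmu density, and a visual aesthetic score. The modality weights are assumed to be hand-set, and scene cuts are assumed to arrive as a list of second offsets. The names `fuse_scores`, `best_window`, `snap_to_cuts`, and `extract_highlight` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Segment:
    start: int    # first second of the clip (inclusive)
    end: int      # end second of the clip (exclusive)
    score: float  # mean fused score over [start, end)


def fuse_scores(modal_scores: Dict[str, Sequence[float]],
                weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-second scores across modalities.

    Assumption: every modality's scores are already normalized to [0, 1].
    """
    lengths = {len(s) for s in modal_scores.values()}
    if len(lengths) != 1:
        raise ValueError("every modality must cover the same number of seconds")
    n = lengths.pop()
    total_w = sum(weights[m] for m in modal_scores)
    return [sum(weights[m] * modal_scores[m][t] for m in modal_scores) / total_w
            for t in range(n)]


def best_window(fused: Sequence[float], length: int) -> int:
    """Start index of the fixed-length window with the highest total score."""
    if not 0 < length <= len(fused):
        raise ValueError("window length must be between 1 and the video length")
    prefix = [0.0]
    for v in fused:
        prefix.append(prefix[-1] + v)  # prefix[i] == sum(fused[:i])
    # max() keeps the earliest start when two windows tie.
    return max(range(len(fused) - length + 1),
               key=lambda s: prefix[s + length] - prefix[s])


def snap_to_cuts(start: int, end: int, cuts: Sequence[int],
                 n: int) -> Tuple[int, int]:
    """Widen [start, end) outward to the enclosing scene cuts.

    This keeps the clip from starting or ending mid-shot; the video
    boundaries 0 and n always count as cuts.
    """
    boundaries = sorted({0, n, *cuts})
    new_start = max(b for b in boundaries if b <= start)
    new_end = min(b for b in boundaries if b >= end)
    return new_start, new_end


def extract_highlight(modal_scores: Dict[str, Sequence[float]],
                      weights: Dict[str, float],
                      cuts: Sequence[int],
                      length: int) -> Segment:
    """Fuse modalities, pick the best window, then align it to scene cuts."""
    fused = fuse_scores(modal_scores, weights)
    start = best_window(fused, length)
    s, e = snap_to_cuts(start, start + length, cuts, len(fused))
    return Segment(s, e, sum(fused[s:e]) / (e - s))


if __name__ == "__main__":
    # Hypothetical 10-second clip: audio and danmu both spike around seconds 3-5.
    scores = {
        "audio":  [0.1, 0.1, 0.2, 0.8, 0.9, 0.9, 0.3, 0.1, 0.1, 0.1],
        "danmu":  [0.0, 0.0, 0.1, 0.7, 1.0, 0.8, 0.2, 0.0, 0.0, 0.0],
        "visual": [0.5] * 10,
    }
    weights = {"audio": 0.4, "danmu": 0.4, "visual": 0.2}
    print(extract_highlight(scores, weights, cuts=[2, 7], length=3))
```

With these inputs, the best 3-second window starts at second 3. Snapping then widens it to the enclosing shot, seconds 2 through 7. Widening to the enclosing cuts is only one simple policy. A real system might instead trim to the nearest cut or cap the cover length.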