AutoClip: One‑Click AI Video Highlight Extraction and Editing
AutoClip is an open‑source, locally run tool that uses Alibaba's Qwen large language model and OpenAI Whisper to download, transcribe, analyze, and cut highlight segments from YouTube or Bilibili videos. It offers real‑time task monitoring, smart collections, in‑browser preview, and Docker deployment, and its roadmap lists further AI‑driven features.
What it does
Video creators often need to extract the most valuable segments from long recordings. AutoClip lets an AI watch the video, understand its content, score each segment, and automatically cut highlight clips. The user supplies only a video link or file.
Input methods
YouTube link – paste the URL; the tool downloads the video and extracts subtitles.
Bilibili link – supports BV numbers or full URLs.
Local upload – upload an existing file directly.
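The three input methods above imply some routing step that decides how a given input should be handled. The sketch below is one hypothetical way to do that, not AutoClip's actual code: the function name classify_source, the host list, and the BV pattern (the prefix "BV" followed by ten alphanumeric characters) are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed shape of a Bilibili BV number: "BV" plus ten alphanumeric characters.
BV_PATTERN = re.compile(r"^BV[0-9A-Za-z]{10}$")


def _host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def classify_source(source: str) -> str:
    """Return 'youtube', 'bilibili', 'local', or 'unknown' for a user input."""
    text = source.strip()
    if BV_PATTERN.match(text):
        return "bilibili"  # bare BV number
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host == "youtu.be" or _host_matches(host, "youtube.com"):
            return "youtube"
        if host == "b23.tv" or _host_matches(host, "bilibili.com"):
            return "bilibili"
        return "unknown"  # a web URL from an unsupported site
    return "local"  # anything else is treated as a file path
```

For example, classify_source("BV1xx411c7mD") and a full bilibili.com URL both map to "bilibili", while a plain file name such as talk.mp4 falls through to "local".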
AI video‑understanding pipeline
The core pipeline consists of seven steps powered by Alibaba's Qwen large language model and OpenAI Whisper:
Download + subtitle extraction – yt-dlp downloads the video; Whisper transcribes the audio.
AI‑generated outline – the LLM reads the transcript and produces a structured summary.
Topic timeline segmentation – identifies where key topics appear on the timeline.
Highlight scoring – evaluates each segment for information density and viewing value.
Automatic title generation – creates a catchy title for each clip.
Smart collection recommendation – clusters clips by thematic similarity and suggests groupings.
Video export – FFmpeg trims and assembles the final clips.
The entire process runs locally, so video content never leaves the user’s machine.
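To make the scoring and export steps more concrete, here is a minimal sketch of how scored segments might be filtered and turned into FFmpeg trim commands. It is an illustration under stated assumptions rather than the project's implementation: the Segment class, the 0 to 1 score scale, the 0.7 threshold, and both function names are hypothetical. The FFmpeg options used (-ss, -t, -c copy, -y) are standard ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    score: float  # assumed highlight score in the range 0..1
    title: str


def pick_highlights(segments: List[Segment], min_score: float = 0.7,
                    max_clips: int = 5) -> List[Segment]:
    """Keep the best-scoring segments, then return them in timeline order."""
    kept = [s for s in segments if s.score >= min_score]
    kept.sort(key=lambda s: s.score, reverse=True)
    return sorted(kept[:max_clips], key=lambda s: s.start)


def ffmpeg_trim_command(src: str, seg: Segment, dst: str) -> List[str]:
    """Build an argument list that copies one segment out of src without re-encoding."""
    duration = seg.end - seg.start
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-ss", f"{seg.start:.3f}",   # seek to the segment start (input seeking)
        "-i", src,
        "-t", f"{duration:.3f}",     # keep this many seconds
        "-c", "copy",                # stream copy: fast, but cuts land on keyframes
        dst,
    ]
```

The returned list can be passed to subprocess.run. Stream copy is chosen here for speed; re-encoding instead of using -c copy would give frame-accurate cuts at the cost of processing time.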
Real‑time task management
AutoClip uses a Celery asynchronous task queue together with WebSocket push notifications. Users can monitor processing progress, view status, and see output for each task without manual page refresh. Multiple projects are handled in parallel.
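As a rough illustration of what a pushed progress update could contain, the sketch below builds a JSON message for one task. It deliberately uses only the standard library and does not call Celery or any WebSocket API. The step names, the message fields, and the function progress_event are assumptions modeled on the seven pipeline steps, not the project's real message format.

```python
import json

# Hypothetical short names for the seven pipeline steps, in order.
PIPELINE_STEPS = [
    "download", "outline", "timeline", "scoring",
    "titles", "collections", "export",
]


def progress_event(task_id: str, step: str, status: str = "running") -> str:
    """Serialize a progress update that a worker could push to the browser."""
    index = PIPELINE_STEPS.index(step)  # raises ValueError for an unknown step
    # Count the steps finished so far; the current step counts only once done.
    finished = index + 1 if status == "done" else index
    percent = round(100 * finished / len(PIPELINE_STEPS))
    return json.dumps({
        "task_id": task_id,
        "step": step,
        "status": status,
        "percent": percent,
    })
```

In a setup like the one described, a background worker would emit such a message after each step and the frontend would update its progress bar on receipt, which is what removes the need for a manual refresh.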
Smart collections
Beyond single‑video highlights, the system can automatically group clips from different videos that share the same theme. For example, after processing ten interview videos, all segments discussing “entrepreneurial experience” can be aggregated into one collection. Users can then manually reorder or filter the clips in each collection.
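The grouping idea can be sketched in a few lines. Note that this example swaps in a much simpler technique than thematic similarity: it groups clips by exact topic labels, assuming each clip already carries a list of labels from the analysis step. The dictionary keys clip_id and topics, the min_size parameter, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_collections(clips: List[dict], min_size: int = 2) -> Dict[str, List[str]]:
    """Group clip ids by normalized topic label, keeping topics with enough clips."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for clip in clips:
        for topic in clip["topics"]:
            # Normalize so "Fundraising" and " fundraising " land in one group.
            groups[topic.strip().lower()].append(clip["clip_id"])
    return {topic: ids for topic, ids in groups.items() if len(ids) >= min_size}


clips = [
    {"clip_id": "v1-c1", "topics": ["Entrepreneurial experience"]},
    {"clip_id": "v2-c3", "topics": ["entrepreneurial experience", "fundraising"]},
    {"clip_id": "v3-c2", "topics": ["hiring"]},
]
collections = build_collections(clips)
```

Here the two clips from different videos that share the "entrepreneurial experience" label end up in one collection, while single-clip topics are dropped. A similarity-based approach would also merge labels that are worded differently but mean the same thing.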
Preview and export
Before exporting, every cut can be previewed directly in the browser. When satisfied, a single click renders the final video, eliminating the need for external editing software.
Technology stack
Frontend: React 18, TypeScript, Ant Design, Vite, Zustand.
Backend: FastAPI, Celery, Redis, SQLite, yt-dlp, FFmpeg.
AI engine: Alibaba Qwen LLM for video understanding + OpenAI Whisper for speech‑to‑text.
One‑click deployment (≈3 minutes)
git clone https://github.com/zhouxiaoka/autoclip.git
cd autoclip
./docker-start.sh
After launch, the services are reachable at:
Frontend UI: localhost:3000
API docs: localhost:8000/docs
Task monitor: localhost:5555
System requirements: 4 GB+ RAM, 10 GB+ disk space, and macOS, Linux, or Windows (WSL).
Planned features
Bilibili one‑click upload – cut and publish directly.
AI‑generated cover images – automatically select the best frame.
Multilingual subtitle translation – generate Chinese subtitles for English videos.
Visual subtitle editor – edit subtitles on the timeline.
Desktop client (beta recruitment).
Core value
AutoClip extracts multiple highlights from a single video through AI analysis and automatic editing. It runs locally for lightweight, efficient, privacy‑preserving processing, so creators can focus on creative work while the AI handles the repetitive parts.
Project repository
GitHub: https://github.com/zhouxiaoka/autoclip (MIT license)
