Translate Foreign Videos into Chinese with Whisper, Ollama & FFmpeg

This guide shows how to automatically extract subtitles from English videos using OpenAI's Whisper, translate them into Chinese with a locally‑deployed Ollama large language model, and finally merge the bilingual subtitles back into the video using FFmpeg, all with GPU acceleration.

360 Zhihui Cloud Developer

Dec 17, 2024

Many valuable learning resources are in non‑Chinese languages, such as Andrej Karpathy's talks or MIT's 6.824 distributed systems lectures. Translating these videos sentence‑by‑sentence is time‑consuming, but large‑model tools can automate the process.

1. Use Whisper to extract subtitles

Whisper is an open‑source speech‑recognition system from OpenAI that supports roughly a hundred languages. Install it with:

pip install -U openai-whisper

Then extract an SRT subtitle file from a video:

whisper video.mp4 --model turbo --language en --output_format srt

Key parameters:

--model turbo: an optimized version of the large-v3 model that runs roughly eight times faster than the large model with only a small loss in accuracy.

--language en: specifies the source language (Whisper can also auto‑detect it).

--output_format srt: writes the subtitles in SRT format, which the later steps need.
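The same extraction can also be driven from Python instead of the command line. The sketch below assumes the openai-whisper package's `load_model` and `transcribe` calls and the `start`, `end`, and `text` fields of each returned segment; the helper names (`format_timestamp`, `segments_to_srt`, `transcribe_to_srt`) are hypothetical and not part of Whisper.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments (dicts with start, end, text) as SRT."""
    blocks = []
    for index, seg in enumerate(segments, 1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


def transcribe_to_srt(video_path: str, srt_path: str, model_name: str = "turbo") -> None:
    """Transcribe a video with Whisper and write the result as an SRT file."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(video_path, language="en")
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
```

Calling `transcribe_to_srt("video.mp4", "en.srt")` would then play the role of the command above, with the output path chosen explicitly.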

2. Translate subtitles to Chinese with Ollama

Whisper's built‑in translation task only produces English, so for Chinese we run a large language model locally with Ollama. Deploy a model that handles Chinese well, such as qwen2.5:32b.
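Before starting a long translation run, it can help to confirm that the model has actually been pulled (for example with `ollama pull qwen2.5:32b`). The following is a minimal sketch that queries Ollama's `/api/tags` endpoint; the function names are hypothetical, and the base URL assumes Ollama's default address of `localhost:11434`, which may differ in your deployment.

```python
import json
from urllib.request import urlopen


def installed_models(tags_response: dict) -> set:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return {m["name"] for m in tags_response.get("models", [])}


def model_is_installed(model: str, base_url: str = "http://localhost:11434") -> bool:
    """Ask a running Ollama server whether the given model is available locally."""
    with urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return model in installed_models(json.load(resp))
```

A script could call `model_is_installed("qwen2.5:32b")` once at startup and stop early with a clear message if it returns `False`.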

When translating, feed each subtitle line individually to avoid context‑length limits and to keep timestamps intact. Use a prompt that forces the model to output only the translation.
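One possible variation on this idea is to keep the instruction separate from the subtitle text by placing it in the `system` field of Ollama's `/api/generate` request and lowering the temperature so the output is more repeatable. This is a sketch under those assumptions, not the article's script that follows; `SYSTEM_PROMPT` and `build_request` are hypothetical names.

```python
SYSTEM_PROMPT = (
    "You are a professional translation assistant. "
    "Translate the user's English text into Chinese. "
    "Only output the translated Chinese text, without any explanation."
)


def build_request(text: str, model: str = "qwen2.5:32b") -> dict:
    """Build the JSON payload for one subtitle line sent to /api/generate."""
    return {
        "model": model,
        "system": SYSTEM_PROMPT,          # instruction kept apart from the text
        "prompt": text.strip(),           # only the subtitle line itself
        "stream": False,                  # return one complete JSON response
        "options": {"temperature": 0},    # favor deterministic output
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post` in place of a hand‑built payload.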

import requests, re

def parse_srt(content):
    """Parse SRT content and return a list of subtitle blocks."""
    # Index line, timestamp line, then the text lines up to a blank line or end of input.
    pattern = r'(\d+)\n(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\n((?:.*?\n)*?)(?:\n|$)'
    return re.findall(pattern, content, re.MULTILINE)

def translate_text(text):
    """Call Ollama API to translate a single English sentence to Chinese."""
    prompt = """You are a professional translation assistant. Translate the following English text into Chinese.
Only output the translated Chinese text, without any explanation or extra characters."""
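    # Separate the instruction from the subtitle text with a blank line.
    data = {"model": "qwen2.5:32b", "prompt": f"{prompt}\n\n{text.strip()}", "stream": False}
    try:
        # Replace {ollamaapi} with your Ollama host and port (Ollama listens on localhost:11434 by default).
        resp = requests.post('http://{ollamaapi}/api/generate', json=data, timeout=60)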
        resp.raise_for_status()
        return resp.json()['response'].strip()
    except Exception as e:
        print(f"Translation error: {e}")
        return text

def translate_srt(input_file, output_file):
    with open(input_file, 'r', encoding='utf-8') as f:
        content = f.read()
    subtitle_blocks = parse_srt(content)
    output = ""
    total = len(subtitle_blocks)
    for i, block in enumerate(subtitle_blocks, 1):
        number, timestamp, text = block[0], block[1], block[2].strip()
        print(f"Translating {i}/{total}...")
        zh = translate_text(text)
        output += f"{number}\n{timestamp}\n{zh}\n\n"
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(output)
    print(f"Translation completed, saved to {output_file}")

if __name__ == "__main__":
    dir_path = "[1hr Talk] Intro to Large Language Models"
    translate_srt(f"{dir_path}/en.srt", f"{dir_path}/zh.srt")
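Because the goal of translating line by line is to keep timestamps intact, a quick sanity check on the two files can catch blocks that were dropped or shifted. The sketch below is one hypothetical way to do that; `timestamps` and `find_misaligned` are illustrative names and not part of the script above.

```python
import re

TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def timestamps(srt_text: str) -> list:
    """Return every timestamp line found in an SRT document, in order."""
    return TIMESTAMP.findall(srt_text)


def find_misaligned(en_srt: str, zh_srt: str) -> list:
    """Return 1-based block positions where the two files disagree on timing."""
    en, zh = timestamps(en_srt), timestamps(zh_srt)
    mismatches = [i for i, (a, b) in enumerate(zip(en, zh), 1) if a != b]
    # Blocks present in only one of the two files also count as mismatches.
    shorter, longer = sorted((len(en), len(zh)))
    mismatches.extend(range(shorter + 1, longer + 1))
    return mismatches
```

An empty list means both files contain the same timestamps in the same order.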

3. Merge subtitles with the video using FFmpeg

With both English and Chinese SRT files ready, combine them with the original video so that viewers can toggle subtitles:

ffmpeg -i "[1hr Talk] Intro to Large Language Models.mkv" -i zh.srt -i en.srt -c:v copy -c:a copy -c:s srt -map 0 -map 1 -map 2 output.mkv

Explanation of key options:

-c:v copy: copies the video stream without re‑encoding.

-c:a copy: copies the audio stream without re‑encoding.

-c:s srt: stores the subtitle streams in SRT (SubRip) format.

-map N: includes all streams from input N in the output, so -map 0 -map 1 -map 2 keeps the video, then the Chinese subtitles, then the English subtitles.
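When the merge has to run for many videos, the same command can be assembled in Python. The sketch below mirrors the options explained above and, as an addition that is not in the original command, tags each subtitle track with a language code through FFmpeg's `-metadata:s:s:N` option so players can label the tracks. The function names are hypothetical, and the track indexes assume the source video has no subtitle streams of its own.

```python
import subprocess


def build_mux_command(video: str, zh_srt: str, en_srt: str, output: str) -> list:
    """Build the ffmpeg argument list for muxing two subtitle tracks."""
    return [
        "ffmpeg",
        "-i", video, "-i", zh_srt, "-i", en_srt,
        "-map", "0", "-map", "1", "-map", "2",
        "-c:v", "copy", "-c:a", "copy", "-c:s", "srt",
        # Assumes the input video carries no subtitle streams, so the
        # Chinese track is subtitle stream 0 and the English track is 1.
        "-metadata:s:s:0", "language=chi",
        "-metadata:s:s:1", "language=eng",
        output,
    ]


def mux(video: str, zh_srt: str, en_srt: str, output: str) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_mux_command(video, zh_srt, en_srt, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or brackets, such as the talk title used above.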

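After merging, viewers can switch between the English and Chinese subtitle tracks in their player. Showing both at the same time depends on the player: some, such as mpv, can display a secondary subtitle track alongside the primary one.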
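For players that show only one track at a time, one way to get both languages on screen is to build a third, bilingual SRT file in which every block carries the Chinese line followed by the English line. This is a minimal sketch under the assumption that both files contain the same blocks with identical timestamps; `parse` and `merge_bilingual` are hypothetical helpers.

```python
import re

BLOCK = re.compile(
    r"(\d+)\n(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\n(.*?)(?:\n\n|\n*\Z)",
    re.DOTALL,
)


def parse(srt_text: str) -> list:
    """Split SRT text into (number, timestamp, text) tuples."""
    return [(n, ts, body.strip()) for n, ts, body in BLOCK.findall(srt_text)]


def merge_bilingual(zh_srt: str, en_srt: str) -> str:
    """Combine two aligned SRT documents into one with both languages per block."""
    zh_blocks, en_blocks = parse(zh_srt), parse(en_srt)
    if len(zh_blocks) != len(en_blocks):
        raise ValueError("The two subtitle files have different block counts")
    merged = []
    for (n, ts, zh), (_, en_ts, en) in zip(zh_blocks, en_blocks):
        if ts != en_ts:
            raise ValueError(f"Timestamp mismatch in block {n}")
        merged.append(f"{n}\n{ts}\n{zh}\n{en}\n")
    return "\n".join(merged)
```

The merged text can be saved as an extra `.srt` file and added to the video as one more subtitle input.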

GPU acceleration (optional)

Both Whisper and Ollama can leverage GPU resources for dramatically faster processing. Platforms such as TAI provide various GPU instances (P4, T4, L20, A100, H800) and support interactive modeling, distributed training, and service deployment.
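On the Whisper side, a script can check whether a CUDA GPU is visible before loading the model. The sketch below assumes PyTorch (which Whisper depends on) and the `device` argument of `whisper.load_model`; `pick_device` and `load_whisper` are hypothetical names. Whisper already prefers CUDA when it is available, so this mainly makes the choice explicit and easy to log.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_whisper(model_name: str = "turbo"):
    """Load a Whisper model on the detected device."""
    import whisper  # imported lazily so pick_device works without it

    device = pick_device()
    print(f"Loading Whisper model {model_name!r} on {device}")
    return whisper.load_model(model_name, device=device)
```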

References

https://openai.com/index/whisper/

https://github.com/ollama/ollama

https://github.com/openai/whisper
