Translate Foreign Videos into Chinese with Whisper, Ollama & FFmpeg
This guide shows how to automatically extract subtitles from English videos using OpenAI's Whisper, translate them into Chinese with a locally‑deployed Ollama large language model, and finally merge the bilingual subtitles back into the video using FFmpeg, all with GPU acceleration.
Many valuable learning resources are in non‑Chinese languages, such as Andrej Karpathy's talks or MIT's 6.824 distributed systems lectures. Translating these videos sentence‑by‑sentence is time‑consuming, but large‑model tools can automate the process.
1. Use Whisper to extract subtitles
Whisper is an open‑source speech‑recognition system from OpenAI that supports over a hundred languages. Install it with:
<code>pip install -U openai-whisper</code>
Then extract an SRT subtitle file from a video:
<code>whisper video.mp4 --model turbo --language en --output_format srt</code>
Key parameters:
--model turbo – an optimized version of the large-v3 model that transcribes roughly eight times faster than the large model, with only a small loss of accuracy.
--language en – specifies the source language (Whisper can also auto‑detect).
--output_format srt – outputs subtitles in SRT format, which we need for further processing.
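The same transcription can also be driven from Python, which is convenient when the subtitles are handed straight to the translation step. The sketch below assumes the openai-whisper package installed above. The helper names format_timestamp, segments_to_srt, and transcribe_to_srt are illustrative and not part of Whisper.

```python
def format_timestamp(seconds):
    """Convert seconds (float) to the SRT form HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render Whisper-style segments (dicts with start, end, text) as SRT."""
    blocks = []
    for index, seg in enumerate(segments, 1):
        timing = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(video_path, model_name="turbo", language="en"):
    """Transcribe a media file with Whisper and return the SRT text."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(video_path, language=language)
    return segments_to_srt(result["segments"])
```

Calling transcribe_to_srt("video.mp4") should give text comparable to the SRT file written by the command-line tool, although line splitting may differ slightly from the CLI writer.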
2. Translate subtitles to Chinese with Ollama
Whisper's built-in translation only targets English, so for Chinese we run a large language model locally with Ollama. Deploy a model such as qwen2.5:32b, which handles Chinese well.
When translating, feed each subtitle line individually to avoid context‑length limits and to keep timestamps intact. Use a prompt that forces the model to output only the translation.
<code>import re

import requests


def parse_srt(content):
    """Parse SRT content and return a list of (number, timestamp, text) tuples."""
    # Normalize line endings and make sure the last block ends with a newline.
    content = content.replace('\r\n', '\n').strip() + '\n'
    pattern = r'(\d+)\n(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\n((?:.*?\n)*?)(?:\n|$)'
    return re.findall(pattern, content, re.MULTILINE)


def translate_text(text):
    """Call the Ollama API to translate one English subtitle into Chinese."""
    # The trailing blank line separates the instructions from the text to translate.
    prompt = (
        "You are a professional translation assistant. "
        "Translate the following English text into Chinese.\n"
        "Only output the translated Chinese text, without any explanation or extra characters.\n\n"
    )
    data = {"model": "qwen2.5:32b", "prompt": prompt + text.strip(), "stream": False}
    try:
        # {ollamaapi} is a placeholder: replace it with the host:port of your Ollama server.
        resp = requests.post('http://{ollamaapi}/api/generate', json=data, timeout=60)
        resp.raise_for_status()
        return resp.json()['response'].strip()
    except Exception as e:
        # Fall back to the untranslated text so the subtitle block is not lost.
        print(f"Translation error: {e}")
        return text


def translate_srt(input_file, output_file):
    with open(input_file, 'r', encoding='utf-8') as f:
        content = f.read()
    subtitle_blocks = parse_srt(content)
    output = ""
    total = len(subtitle_blocks)
    for i, block in enumerate(subtitle_blocks, 1):
        number, timestamp, text = block[0], block[1], block[2].strip()
        print(f"Translating {i}/{total}...")
        zh = translate_text(text)
        # Keep the original number and timestamp; only the text changes.
        output += f"{number}\n{timestamp}\n{zh}\n\n"
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(output)
    print(f"Translation completed, saved to {output_file}")


if __name__ == "__main__":
    dir_path = "[1hr Talk] Intro to Large Language Models"
    translate_srt(f"{dir_path}/en.srt", f"{dir_path}/zh.srt")
</code>
The result is a Chinese subtitle file, zh.srt, that keeps the numbering and timestamps of the English original.
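Because the translation step is supposed to leave timestamps untouched, a quick comparison of the two files can catch blocks that were dropped or shifted. The following is a minimal sketch of such a check. The function names extract_timing and find_timing_mismatches are hypothetical helpers, not part of any library.

```python
import re

# Matches a full SRT timing line such as "00:00:01,500 --> 00:00:03,000".
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def extract_timing(srt_text):
    """Return the timing lines of an SRT document, in order of appearance."""
    lines = (line.strip() for line in srt_text.splitlines())
    return [line for line in lines if TIMING_LINE.match(line)]


def find_timing_mismatches(source_srt, translated_srt):
    """Return (position, source_timing, translated_timing) for every difference."""
    src = extract_timing(source_srt)
    dst = extract_timing(translated_srt)
    mismatches = []
    for i in range(max(len(src), len(dst))):
        a = src[i] if i < len(src) else None
        b = dst[i] if i < len(dst) else None
        if a != b:
            mismatches.append((i + 1, a, b))
    return mismatches
```

An empty list means every timing line in en.srt has an identical counterpart at the same position in zh.srt. A None in a tuple marks a block that exists in only one of the files.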
3. Merge subtitles with the video using FFmpeg
With both English and Chinese SRT files ready, combine them with the original video so that viewers can toggle subtitles:
<code>ffmpeg -i "[1hr Talk] Intro to Large Language Models.mkv" -i zh.srt -i en.srt -c:v copy -c:a copy -c:s srt -map 0 -map 1 -map 2 output.mkv</code>
Explanation of key options:
-c:v copy – copies the video stream without re‑encoding.
-c:a copy – copies the audio stream.
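-c:s srt – encodes the subtitle streams as SRT (SubRip) inside the MKV container.
-map – selects which input streams go into the output; -map 0, -map 1, and -map 2 include every stream from the video, zh.srt, and en.srt, in that order.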
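When the merge is run for many videos, the command above can be assembled in Python. The sketch below builds the same argument list and, as an addition that is not in the original command, tags each subtitle track with a language code via FFmpeg's -metadata:s:s:N option so players can label the tracks. It assumes the source video has no subtitle streams of its own, since otherwise the subtitle indexes would be offset. build_mux_command and mux are illustrative names.

```python
import subprocess


def build_mux_command(video, subtitles, output):
    """Build an ffmpeg argument list that muxes SRT files into a video.

    subtitles is a list of (path, language_code) pairs, e.g. ("zh.srt", "chi").
    """
    cmd = ["ffmpeg", "-i", video]
    for path, _ in subtitles:
        cmd += ["-i", path]
    # Input 0 is the video; inputs 1..N are the subtitle files.
    cmd += ["-map", "0"]
    for input_index in range(1, len(subtitles) + 1):
        cmd += ["-map", str(input_index)]
    cmd += ["-c:v", "copy", "-c:a", "copy", "-c:s", "srt"]
    # Assumption: the video carries no subtitle streams, so output
    # subtitle stream N corresponds to the N-th SRT file.
    for stream_index, (_, language) in enumerate(subtitles):
        cmd += [f"-metadata:s:s:{stream_index}", f"language={language}"]
    cmd.append(output)
    return cmd


def mux(video, subtitles, output):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_mux_command(video, subtitles, output), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces or brackets, such as the talk title used in this guide.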
After merging, viewers can switch between the Chinese and English subtitle tracks in their player, and players that support a secondary subtitle track can show both at once.
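For players that render only one subtitle track at a time, one option is to combine the two files into a single bilingual SRT, with the Chinese line above the English one. The sketch below pairs blocks by their timestamp line, which works here because the translation script keeps timestamps unchanged. read_blocks and merge_bilingual are hypothetical helper names.

```python
import re

# One SRT block: index line, timing line, then text up to a blank line or the end.
BLOCK = re.compile(
    r"(\d+)\n(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\n(.*?)(?=\n\n|\Z)",
    re.DOTALL,
)


def read_blocks(srt_text):
    """Map each timing line to its subtitle text, preserving file order."""
    text = srt_text.replace("\r\n", "\n").strip()
    return {timing: body.strip() for _, timing, body in BLOCK.findall(text)}


def merge_bilingual(zh_srt, en_srt):
    """Return one SRT document with the Chinese text above the English text."""
    zh = read_blocks(zh_srt)
    en = read_blocks(en_srt)
    blocks = []
    for index, (timing, zh_text) in enumerate(zh.items(), 1):
        lines = [zh_text]
        if timing in en:  # skip the English line if no block shares this timing
            lines.append(en[timing])
        blocks.append(f"{index}\n{timing}\n" + "\n".join(lines) + "\n")
    return "\n".join(blocks)
```

The merged text can be written to a file and passed to FFmpeg as one more subtitle input, alongside or instead of the separate language tracks.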
GPU acceleration (optional)
Both Whisper and Ollama can leverage GPU resources for dramatically faster processing. Platforms such as TAI provide various GPU instances (P4, T4, L20, A100, H800) and support interactive modeling, distributed training, and service deployment.
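On the Whisper side, the device can be chosen explicitly when loading the model from Python. This is a small sketch, assuming PyTorch is installed alongside openai-whisper. Whisper already falls back to the CPU when no GPU is found, so the explicit check mainly makes the choice visible in logs. pick_device and load_whisper are illustrative names.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_whisper(model_name="turbo"):
    """Load a Whisper model on the device chosen by pick_device()."""
    import whisper  # imported lazily so pick_device() works without it

    device = pick_device()
    print(f"Loading Whisper model {model_name!r} on {device}")
    return whisper.load_model(model_name, device=device)
```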
References
https://openai.com/index/whisper/
https://github.com/ollama/ollama
https://github.com/openai/whisper
360 Zhihui Cloud Developer