
IQDubbing: AI-Powered Multi-Language, Multi-Voice Dubbing System for Film and TV

iQIYI’s IQDubbing system uses AI‑driven voice conversion to automatically generate expressive, high‑quality dubbing in multiple languages and more than 50 character voice styles. It streamlines multilingual film and TV localization, reduces reliance on scarce voice actors, and has earned positive audience feedback, patents and industry awards.

iQIYI Technical Product Team

iQIYI has developed an intelligent dubbing system for film and TV called IQDubbing (奇声). The solution is built on multiple self‑developed AI technologies, with Voice Conversion (VC) as the core technique, and provides AI dubbing in multiple languages and voice styles with high expressiveness and naturalness.

System Highlights

Multi‑language support: Mandarin, Thai, Vietnamese and other languages.

Multi‑voice support: Over 50 voice models covering gender, age and character styles such as "strong female leader", "intellectual lady", "magnetic CEO" and "sunny boy".

High expressiveness: Fine‑grained restoration of emotions, intonation and prosody.

High naturalness: Near‑human audio quality that preserves the input emotion and content.

Project Background

1. Massive dubbing demand – more than 2,000 foreign movies have been imported in the past five years, but only a few have Mandarin dubbing. Domestic platforms also need to localize Chinese productions for overseas markets (Thai, Vietnamese, etc.).

2. Voice‑style matching difficulty – a single film can have 20+ characters, each requiring a suitable voice style; the problem is amplified for low‑resource languages.

3. Scarcity of professional voice actors – especially for niche languages and specific character personas.

4. Lack of international voice tracks – many productions do not have an "international voice" track, which is a prerequisite for dubbing.

To address these challenges, iQIYI built IQDubbing, integrating several AI models into a dubbing management platform.

System Workflow

Dialogue‑track creation: Single‑speaker recordings are processed with IQDubbing’s track‑checking and splitting tools, then the core multi‑language, multi‑voice VC model converts the single‑voice track into multiple character tracks.

International‑voice creation: AI models repair or generate missing international voice tracks to meet broadcast standards.

Mixing and production: Dialogue tracks and international voice tracks are mixed, with audio‑video synchronization checks.
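The three workflow stages above can be sketched as a small orchestration pipeline. This is an illustrative stand‑in only: all class and function names (`Track`, `split_dialogue`, `convert_voice`, `mix`) are hypothetical, and the splitting and conversion steps are placeholders for IQDubbing’s actual track tools and VC model.

```python
from dataclasses import dataclass

@dataclass
class Track:
    name: str
    segments: list  # list of (start_sec, end_sec, text) tuples

def split_dialogue(single_voice: Track, characters: list) -> dict:
    """Split a single-speaker recording into one track per character.

    Stand-in for IQDubbing's track-checking and splitting tools; here
    segments are simply assigned round-robin for illustration.
    """
    tracks = {c: Track(name=c, segments=[]) for c in characters}
    for i, seg in enumerate(single_voice.segments):
        tracks[characters[i % len(characters)]].segments.append(seg)
    return tracks

def convert_voice(track: Track, voice_style: str) -> Track:
    """Placeholder for the multi-language, multi-voice VC model."""
    return Track(name=f"{track.name}:{voice_style}", segments=track.segments)

def mix(dialogue_tracks: dict, international_track: Track) -> Track:
    """Placeholder for mixing dialogue with the international voice track."""
    segments = sorted(
        [s for t in dialogue_tracks.values() for s in t.segments]
        + list(international_track.segments)
    )
    return Track(name="final_mix", segments=segments)

# Minimal end-to-end run on dummy data.
raw = Track("raw", [(0.0, 1.5, "line one"), (2.0, 3.0, "line two")])
per_char = {c: convert_voice(t, "sunny boy")
            for c, t in split_dialogue(raw, ["hero", "villain"]).items()}
final = mix(per_char, Track("music_effects", [(0.0, 3.0, "[M&E]")]))
print(len(final.segments))  # → 3
```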

Technical Framework

Hardware layer: Supports both GPU and CPU depending on model requirements.

Framework layer: Uses TensorFlow, PyTorch and traditional DSP algorithms.

Application‑algorithm layer: Core deep‑learning VC model, plus speaker verification, face recognition, NLP, denoising, EQ and other DSP modules.

System layer: Business logic is modularized to enable one‑click quality inspection, voice selection and production.

Voice Conversion (VC) Technology

The VC model converts one voice into another while preserving linguistic content and prosody, enabling a single actor to generate multiple character voices. Unlike conventional DSP‑based voice changers, AI‑based VC retains the original emotion and rhythm.
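As a toy illustration of this idea (not the production model), an utterance’s acoustic features can be viewed as speaker‑independent content plus a speaker‑dependent component; conversion swaps the speaker component while leaving the content untouched. The vectors and "speaker offsets" below are invented for demonstration — real VC models learn these components with neural networks.

```python
content = [0.2, -0.5, 0.9]     # speaker-independent linguistic/prosodic info
speaker_a = [1.0, 1.0, 1.0]    # source speaker's characteristic offset
speaker_b = [-0.3, 0.4, 0.1]   # target character voice's offset

# "Recorded" utterance by speaker A: content coloured by A's voice.
utterance_a = [c + s for c, s in zip(content, speaker_a)]

# Conversion: strip A's component, apply B's; the content is preserved.
converted = [u - a + b for u, a, b in zip(utterance_a, speaker_a, speaker_b)]

print(converted)  # numerically equal to content + speaker_b
```

The same decomposition explains why AI‑based VC, unlike DSP pitch shifters, can keep emotion and rhythm: those live in the content term, which the conversion never touches.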

Corpus and Model Optimization

Corpus: Large amounts of neutral TTS data and emotional speech are used for vocoder training, and dedicated voice libraries are designed for film‑style characters (e.g., "strong female leader", "sunny boy").

Model: Iterative improvements from a first‑generation recognition‑synthesis framework (Tacotron‑based) to a second‑generation framework that enhances prosody modeling and temporal resolution, using GAN‑based vocoders for higher fidelity.
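The recognition‑synthesis framing can be sketched as three stages: a recognizer that extracts speaker‑independent phonetic features (e.g., frame‑level posteriorgrams), an acoustic model conditioned on the target voice, and a vocoder that renders the waveform. The stage functions and dimensions below are illustrative placeholders for data flow and shapes only, not iQIYI’s actual models.

```python
import random

# Illustrative dimensions only; the real stages are neural networks.
N_FRAMES, N_PHONES, N_MELS, HOP = 200, 64, 80, 256

def recognize(wav: list) -> list:
    """ASR front end: frame-level phonetic posteriorgrams (speaker-independent)."""
    return [[random.random() for _ in range(N_PHONES)] for _ in range(N_FRAMES)]

def acoustic_model(ppg: list, speaker_id: int) -> list:
    """Predict mel-spectrogram frames in the target character's voice."""
    return [[0.0] * N_MELS for _ in ppg]

def gan_vocoder(mel: list) -> list:
    """GAN-based vocoder: mel frames -> waveform samples (HOP samples per frame)."""
    return [0.0] * (len(mel) * HOP)

wav_in = [0.0] * (N_FRAMES * HOP)
mel = acoustic_model(recognize(wav_in), speaker_id=7)
wav_out = gan_vocoder(mel)
print(len(mel), len(wav_out))  # → 200 51200
```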

Evaluation

Two dimensions are evaluated:

Technical: MOS scoring by native speakers of Mandarin, Thai and Vietnamese, across gender, age, voice style and emotion.

Business: User‑level listening tests focusing on perceptible errors such as pronunciation mistakes or insufficient emotion.

Results: Over 60 movies and more than 200 TV series have been released on iQIYI’s domestic and overseas channels with positive audience feedback.

Awards and Publications

The research has produced 3 top‑conference papers (ICASSP, Interspeech), 10+ patents, 5 software copyrights, and won the ChinaMM 2022 "Innovation Product" award.

