
Turn Photos and Videos into Cartoons with the Open‑Source Cartoonize AI

Cartoonize is an open‑source web application that leverages a white‑box GAN model to convert images and short videos into high‑quality cartoon style, offering easy Docker or virtualenv installation, detailed usage instructions, and insights into the underlying research paper.

Programmer DD

Aug 12, 2020


Cartoonize is a recently released web application, built on an AI paper co‑authored by ByteDance researchers, that cartoonizes images and videos with a single click.

Features

Cartoonize is now open source; users simply upload an image or a video (up to 10 seconds, max 30 MB, formats mp4/webm/avi/mkv for video and jpeg/png for images) and obtain cartoonized results that preserve fine details. The tool supports four scenarios: rapid animation prototyping, artistic creation, game asset generation without motion capture, and modeling assistance for designers.
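The upload limits above can be expressed as a small pre-flight check. This is only a minimal sketch, not the application's own validation code: the function name, the constants, the ".jpg" alias, and the reading of "30 MB" as 30 MiB applied to every upload are all assumptions made for illustration.

```python
from pathlib import Path

# Limits quoted in the article; all names here are hypothetical.
MAX_BYTES = 30 * 1024 * 1024            # "max 30 MB", assumed to mean 30 MiB
MAX_VIDEO_SECONDS = 10.0
IMAGE_EXTS = {".jpeg", ".jpg", ".png"}  # ".jpg" is an assumed alias of jpeg
VIDEO_EXTS = {".mp4", ".webm", ".avi", ".mkv"}


def check_upload(filename, size_bytes, duration_seconds=None):
    """Return a list of problems; an empty list means the upload looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in IMAGE_EXTS | VIDEO_EXTS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 30 MB")
    # The duration limit only applies to videos, and only when a duration is known.
    if ext in VIDEO_EXTS and duration_seconds is not None:
        if duration_seconds > MAX_VIDEO_SECONDS:
            problems.append("video is longer than 10 seconds")
    return problems
```

Calling check_upload("clip.mp4", 1024, duration_seconds=12.5) would report the duration problem, while a small png passes with an empty list. The duration is passed in rather than measured, since reading it requires a video library.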

The conversion also works for short video clips, as demonstrated with a sample from "Avengers".

Algorithm

Cartoonize uses Algorithmia’s Serverless AI Layer to perform inference on the uploaded media.

Test Environment

python 3.7

tensorflow 2.1.0

tf_slim 1.1.0

CUDA 10.1

Linux (Ubuntu 18.04)

Installation

Using Docker

Before building, edit config.yaml with appropriate values.

Build the image:

docker build -t cartoonize .

Run the container:

docker run -p 8080:8080 cartoonize

Using virtualenv

Create and activate a virtual environment:

virtualenv -p python3 cartoonize
source cartoonize/bin/activate

Install Python dependencies:

pip install -r requirements.txt

Run the web app (ensure config.yaml is configured):

python app.py

Project homepage: https://cartoonize-lkqov62dia-de.a.run.app/cartoonize

White‑box Cartoonization Paper

The underlying technology comes from the CVPR 2020 paper “Learning to Cartoonize Using White‑box Cartoon Representations”, authored by researchers from ByteDance, the University of Tokyo, and Style2Paints. The paper proposes a GAN‑based, controllable, white‑box model that extracts three cartoon representations from images: a surface representation that captures the smooth, low‑frequency surfaces of cartoon images; a structure representation that captures sparse color blocks and flattened global content; and a texture representation that captures high‑frequency textures, contours, and details.

Because the cartoon representations are extracted separately, the cartoonization problem can be optimized end‑to‑end within a Generative Adversarial Network (GAN) framework, which makes the approach scalable, controllable, and easy to fine‑tune for diverse artistic demands.
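To make one of these representations more concrete, here is a minimal sketch of how a texture-oriented, color-independent view of an image might be computed. It assumes a "random color shift" of the form (1 − α)(β1·R + β2·G + β3·B) + α·Y, with α = 0.8, each β drawn uniformly from [−1, 1], and Y the standard grayscale value. Those specifics are not stated in this article and should be checked against the paper; the function name and the pure-Python pixel list are illustrative choices, not the released implementation.

```python
import random


def random_color_shift(pixels, rng, alpha=0.8):
    """Collapse RGB pixels (floats in [0, 1]) to one channel with randomized weights.

    Assumed form: (1 - alpha) * (b1*R + b2*G + b3*B) + alpha * Y.
    """
    # One set of random channel weights is shared by every pixel of the image.
    b1, b2, b3 = (rng.uniform(-1.0, 1.0) for _ in range(3))
    shifted_image = []
    for r, g, b in pixels:
        luma = 0.299 * r + 0.587 * g + 0.114 * b  # standard grayscale value Y
        shifted = b1 * r + b2 * g + b3 * b        # randomly weighted channels
        shifted_image.append((1.0 - alpha) * shifted + alpha * luma)
    return shifted_image
```

The idea behind such a transform is that randomizing the channel weights removes most of the color and luminance cues, so whatever compares real and generated cartoons on this view has to rely on texture and edges instead.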

A partial open‑source implementation of the white‑box model is available.

Prerequisites

Training code: Linux or Windows

NVIDIA GPU with CUDA/cuDNN

Inference code: Linux, Windows, macOS

Installation for White‑box Model

Assume NVIDIA GPU and CUDA/cuDNN are installed.

Install tensorflow‑gpu (tested with 1.12.0 and 1.13.0rc0).

Install scikit‑image==0.14.5 (other versions may cause issues).

Pre‑trained Model Inference

Place test images in /test_code/test_images.

Run /test_code/cartoonize.py.

Results are saved in /test_code/cartoonized_images.

Training

Put training data in the appropriate folders under /dataset.

Run pretrain.py (outputs to /pretrain).

Run train.py (outputs to /train_cartoon).

Note: The code was extracted from a production environment, so some minor issues may arise, but they should be easy to fix.

The pre‑trained VGG‑19 model can be downloaded from Google Drive.

Dataset

The cartoon training images are not provided for copyright reasons, but a comparable dataset can be assembled as follows.

Landscape images are sourced from films by Makoto Shinkai, Hayao Miyazaki, and Mamoru Hosoda.

Frames are extracted, randomly cropped, and resized to 256×256.

Portraits come from Kyoto Animation and PA Works; face regions are detected using lbpcascade_animeface.

Manual cleaning greatly improves dataset quality.
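The random crop applied to extracted frames can be sketched as a function that picks a crop box. This is an illustrative assumption rather than the authors' preprocessing code: the article does not say whether crops are square or how large they are, so the sketch picks a random square at least 256 pixels wide, and the function name is hypothetical.

```python
import random

TARGET_SIZE = 256  # output edge length quoted in the article


def random_square_crop_box(width, height, rng, min_side=TARGET_SIZE):
    """Pick a random square region inside a width x height frame.

    Returns (left, top, right, bottom), the box order used by Pillow's Image.crop.
    """
    short = min(width, height)
    if short < min_side:
        raise ValueError("frame is smaller than the target crop size")
    side = rng.randint(min_side, short)  # randint is inclusive on both ends
    left = rng.randint(0, width - side)
    top = rng.randint(0, height - side)
    return (left, top, left + side, top + side)
```

With Pillow installed, a frame could then be prepared with frame.crop(box).resize((TARGET_SIZE, TARGET_SIZE)). Keeping the crop square means the resize to 256×256 does not distort the aspect ratio.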