
Cartoonizing GIFs and Videos with PyTorch: Code and Usage Guide

This article is a Python tutorial on cartoonizing GIFs and videos with PyTorch models. It details the environment setup, the core network architecture, and the processing functions, and presents practical results with code snippets, usage instructions, and visual examples.

Python Programming Learning Circle

This tutorial demonstrates how to transform GIF animations and video clips into a cartoon style using deep‑learning models implemented in PyTorch. It covers the required environment, the network architecture, processing pipelines, and shows sample results.

Environment dependencies – In addition to the libraries from the previous article, a few extra packages are required. The original lists them in a requirements.txt screenshot that is not reproduced here; judging from the imports below, they include torch, torchvision, Pillow, imageio, opencv-python, and numpy.

Core network code

<code>from PIL import Image, ImageEnhance, ImageSequence</code>
<code>import torch</code>
<code>from torchvision.transforms.functional import to_tensor, to_pil_image</code>
<code>from torch import nn</code>
<code>import os</code>
<code>import torch.nn.functional as F</code>
<code>import uuid</code>
<code>import imageio</code>
<code>import cv2  # used by the video pipeline below</code>
<code>import numpy as np  # used by the video pipeline below</code>

<code># -------------------------- hy add 01 --------------------------</code>
<code>class ConvNormLReLU(nn.Sequential):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        pad_layer = {
            "zero": nn.ZeroPad2d,
            "same": nn.ReplicationPad2d,
            "reflect": nn.ReflectionPad2d,
        }
        if pad_mode not in pad_layer:
            raise NotImplementedError
        super(ConvNormLReLU, self).__init__(
            pad_layer[pad_mode](padding),
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=bias),
            nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True),
            nn.LeakyReLU(0.2, inplace=True)
        )
</code>
<code>class InvertedResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()
        self.use_res_connect = in_ch == out_ch
        bottleneck = int(round(in_ch * expansion_ratio))
        layers = []
        if expansion_ratio != 1:
            layers.append(ConvNormLReLU(in_ch, bottleneck, kernel_size=1, padding=0))
        layers.append(ConvNormLReLU(bottleneck, bottleneck, groups=bottleneck, bias=True))  # dw
        layers.append(ConvNormLReLU(bottleneck, out_ch, kernel_size=1, padding=0, bias=False))  # pw
        layers.append(nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True))
        self.layers = nn.Sequential(*layers)
    def forward(self, input):
        out = self.layers(input)
        if self.use_res_connect:
            out = input + out
        return out
</code>
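To make the inverted-residual wiring concrete, the small helper below (an illustration, not part of the original code) traces the channel widths that the constructor above produces and reports whether the residual shortcut is active:

```python
def inverted_res_channels(in_ch, out_ch, expansion_ratio=2):
    # Mirrors the layer construction in InvertedResBlock.__init__
    bottleneck = int(round(in_ch * expansion_ratio))
    stages = []
    if expansion_ratio != 1:
        stages.append(('expand 1x1', in_ch, bottleneck))
    stages.append(('depthwise 3x3', bottleneck, bottleneck))
    stages.append(('pointwise 1x1', bottleneck, out_ch))
    # The shortcut is only added when input and output widths match
    return stages, in_ch == out_ch

# First block of block_c widens 128 -> 256, so no shortcut;
# the middle 256 -> 256 blocks do use it.
stages_wide, res_wide = inverted_res_channels(128, 256)
stages_same, res_same = inverted_res_channels(256, 256)
```

This is why only the three middle blocks of `block_c` behave as true residual blocks; the first one is a pure expansion stage.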
<code>class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.block_a = nn.Sequential(
            ConvNormLReLU(3, 32, kernel_size=7, padding=3),
            ConvNormLReLU(32, 64, stride=2, padding=(0,1,0,1)),
            ConvNormLReLU(64, 64)
        )
        self.block_b = nn.Sequential(
            ConvNormLReLU(64, 128, stride=2, padding=(0,1,0,1)),
            ConvNormLReLU(128, 128)
        )
        self.block_c = nn.Sequential(
            ConvNormLReLU(128, 128),
            InvertedResBlock(128, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            ConvNormLReLU(256, 128)
        )
        self.block_d = nn.Sequential(
            ConvNormLReLU(128, 128),
            ConvNormLReLU(128, 128)
        )
        self.block_e = nn.Sequential(
            ConvNormLReLU(128, 64),
            ConvNormLReLU(64, 64),
            ConvNormLReLU(64, 32, kernel_size=7, padding=3)
        )
        self.out_layer = nn.Sequential(
            nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Tanh()
        )
    def forward(self, input, align_corners=True):
        out = self.block_a(input)
        half_size = out.size()[-2:]
        out = self.block_b(out)
        out = self.block_c(out)
        if align_corners:
            out = F.interpolate(out, half_size, mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_d(out)
        if align_corners:
            out = F.interpolate(out, input.size()[-2:], mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_e(out)
        out = self.out_layer(out)
        return out
</code>
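The two interpolation branches in forward exist because a plain scale_factor=2 upsample cannot always return to the original resolution. Each stride-2 stage pads with ReflectionPad2d((0, 1, 0, 1)) and then applies a 3x3 convolution with no padding, so the output height is (h + 1 - 3) // 2 + 1. A pure-Python sketch of that size bookkeeping:

```python
def downsampled(h):
    # ReflectionPad2d((0, 1, 0, 1)) adds one row/column, then a
    # 3x3 stride-2 conv with padding=0: (h + 1 - 3) // 2 + 1
    return (h - 2) // 2 + 1

h = 135                          # an odd input height
h_half = downsampled(h)          # after block_a's stride-2 stage
h_quarter = downsampled(h_half)  # after block_b's stride-2 stage
restored = h_quarter * 2 * 2     # two scale_factor=2 upsamples
# restored != h for odd sizes, which is why the align_corners=True
# branch interpolates to the stored half_size / input size instead.
```

For even, nicely divisible frame sizes both branches agree; for odd sizes only the stored-size branch reproduces the input resolution exactly.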
<code># -------------------------- hy add 02 --------------------------</code>
<code>def handle(gif_path: str, output_dir: str, type: int, device='cpu'):
    _ext = os.path.basename(gif_path).strip().split('.')[-1]
    if type == 1:
        _checkpoint = './weights/paprika.pt'
    elif type == 2:
        _checkpoint = './weights/face_paint_512_v1.pt'
    elif type == 3:
        _checkpoint = './weights/face_paint_512_v2.pt'
    elif type == 4:
        _checkpoint = './weights/celeba_distill.pt'
    else:
        raise Exception('type not support')
    os.makedirs(output_dir, exist_ok=True)
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    result = os.path.join(output_dir, f"{uuid.uuid1().hex}.{_ext}")
    img = Image.open(gif_path)
    out_images = []
    for frame in ImageSequence.Iterator(img):
        frame = frame.convert("RGB")
        with torch.no_grad():
            image = to_tensor(frame).unsqueeze(0) * 2 - 1
            out = net(image.to(device), False).cpu()
            out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
            out = to_pil_image(out)
            out_images.append(out)
    imageio.mimsave(result, out_images, fps=15)
    return result
</code>
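Inside the frame loop, to_tensor yields values in [0, 1]; multiplying by 2 and subtracting 1 maps them to the [-1, 1] range of the Tanh-headed generator, and the clip followed by * 0.5 + 0.5 maps the output back. A scalar sketch of that round trip:

```python
def to_model_range(x):
    # to_tensor gives [0, 1]; the generator expects [-1, 1]
    return x * 2 - 1

def to_image_range(y):
    # clamp the raw output, then map [-1, 1] back to [0, 1]
    y = max(-1.0, min(1.0, y))
    return y * 0.5 + 0.5

round_trip = to_image_range(to_model_range(0.25))  # in-range values survive
clamped = to_image_range(1.7)                      # out-of-range values saturate
```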

GIF cartoonization – The handle function receives a GIF path, an output directory, a model type (1‑4), and a device (CPU or CUDA). It loads the selected checkpoint, runs each frame through the Generator, and saves the transformed frames as a new GIF.
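The if/elif chain that picks a checkpoint can also be written as a lookup table; a small equivalent sketch (the weight paths are the ones used in the code above):

```python
# `type` argument -> pretrained weight file, as in handle()
CHECKPOINTS = {
    1: './weights/paprika.pt',
    2: './weights/face_paint_512_v1.pt',
    3: './weights/face_paint_512_v2.pt',
    4: './weights/celeba_distill.pt',
}

def checkpoint_for(style_type):
    try:
        return CHECKPOINTS[style_type]
    except KeyError:
        raise ValueError(f'unsupported type: {style_type}')
```

A dictionary keeps the valid styles in one place, which makes it easier to add a new weight file later.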

Video cartoonization – A similar pipeline handles video files. The extract function pulls out the audio track; handle reads frames with OpenCV, runs them through the network, enhances contrast, writes them to a new video, and finally merges the original audio back in with video_add_audio. The main entry point demonstrates usage with a sample video.

<code>def handle(video_path: str, output_dir: str, type: int, fps: int, device='cpu'):
    # ... (same extension parsing and checkpoint selection as the GIF
    # version; `_ext` and `_checkpoint` are set there)
    _audio = extract(video_path, output_dir, 'wav')
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    result = os.path.join(output_dir, f"{uuid.uuid1().hex}.{_ext}")
    capture = cv2.VideoCapture(video_path)
    size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    videoWriter = cv2.VideoWriter(result, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
    with torch.no_grad():
        while True:
            ret, frame = capture.read()
            if not ret:
                break
            # OpenCV decodes frames as BGR; convert to RGB for the network
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            image = to_tensor(frame).unsqueeze(0) * 2 - 1
            out = net(image.to(device), False).cpu()
            out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
            out = to_pil_image(out)
            contrast_enhancer = ImageEnhance.Contrast(out)
            enhanced_image = np.asarray(contrast_enhancer.enhance(2))
            # convert back to BGR before writing with OpenCV
            videoWriter.write(cv2.cvtColor(enhanced_image, cv2.COLOR_RGB2BGR))
    videoWriter.release()
    _final_video = video_add_audio(result, _audio, output_dir)
    return _final_video
</code>
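The extract and video_add_audio helpers come from the previous article and are not reproduced here. As a rough illustration only, both can be built on the ffmpeg command line; the function names and flags below are this sketch's assumptions, not the original helpers:

```python
import os
import uuid

def extract_cmd(video_path, output_dir, ext='wav'):
    # Sketch of `extract`: build an ffmpeg command that drops the
    # video stream (-vn) and writes the audio track to a temp file.
    audio = os.path.join(output_dir, f'{uuid.uuid1().hex}.{ext}')
    return ['ffmpeg', '-y', '-i', video_path, '-vn', audio], audio

def add_audio_cmd(video_path, audio_path, output_dir):
    # Sketch of `video_add_audio`: copy the styled video stream
    # unchanged and mux the extracted audio back in as AAC.
    merged = os.path.join(output_dir, f'{uuid.uuid1().hex}.mp4')
    cmd = ['ffmpeg', '-y', '-i', video_path, '-i', audio_path,
           '-c:v', 'copy', '-c:a', 'aac', merged]
    return cmd, merged

# subprocess.run(cmd, check=True) would execute either command
# (requires ffmpeg on PATH).
```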

The article includes screenshots of the original and cartoonized GIFs, as well as video frames before and after processing, demonstrating that the model works well on facial images (especially non‑Asian faces) and that subtitles in videos are also stylized.

Summary – The provided code offers a complete end‑to‑end solution for cartoonizing both GIFs and videos using pretrained PyTorch models. Users should note that the models perform better on Western facial features, that on‑screen subtitles are stylized along with the scene, and that intermediate files (such as the extracted audio track) are not automatically cleaned up.

Tags: Deep Learning, image processing, video, PyTorch, GIF, cartoonization
Written by Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
