How NetEase Cloud Accelerates Video Transcoding with Slice‑Based Parallelism

NetEase Cloud’s video transcoding service speeds up processing by combining hardware acceleration, self‑developed codecs, AMD EPYC servers, and a slice‑based parallel transcoding pipeline, and by optimizing cluster task scheduling and mitigating stragglers, achieving significant performance gains across large‑scale media workloads.

NetEase Smart Enterprise Tech+

Introduction

Video transcoding is a core media‑processing function that often requires long processing time for large files. NetEase Cloud (网易云信) aims to improve transcoding speed to enhance service quality.

Factors Affecting Transcoding Performance

Source video length: longer videos need more encoding time.

Container and codec: simple remuxing (copying streams into a new container) takes only 1–2 s, while full re‑encoding time varies with the source video, target bitrate, resolution, frame rate, and codec complexity (e.g., AV1 encodes far more slowly than H.264).

Compute resources: stronger single‑core CPUs, GPUs, and higher concurrency (multithreading or multi‑process slicing) reduce latency.

Cluster task scheduling: efficient multi‑tenant scheduling, priority handling, and resource‑aware dispatch improve throughput.

The article focuses on four optimization directions: hardware capability, codec optimization, slice‑based transcoding, and cluster scheduling efficiency.

Hardware Acceleration

Offloading video and image computation to dedicated hardware (Intel VA‑API, NVIDIA VDPAU, Intel Quick Sync, NVIDIA NVENC/NVDEC) significantly speeds up high‑bitrate, high‑resolution encoding. NetEase Cloud uses Intel integrated graphics with FFmpeg’s QSV and VA‑API plugins to accelerate the AVDecoder, AVFilter, and AVEncoder stages.
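As an illustration of this kind of setup, the sketch below builds an FFmpeg command line that uses Intel Quick Sync (QSV) for both decode and encode. The file names and bitrate are illustrative, not taken from the article; the flags themselves (`-hwaccel qsv`, `h264_qsv`) are standard FFmpeg options.

```python
def build_qsv_cmd(src: str, dst: str, bitrate: str = "4M") -> list[str]:
    """Assemble an FFmpeg invocation with QSV-accelerated decode and encode."""
    return [
        "ffmpeg",
        "-hwaccel", "qsv",       # hardware-accelerated decode path
        "-i", src,
        "-c:v", "h264_qsv",      # QSV H.264 encoder
        "-b:v", bitrate,         # illustrative target bitrate
        dst,
    ]

cmd = build_qsv_cmd("input.mp4", "output.mp4")
```

The command would then be executed with `subprocess.run(cmd, check=True)` on a machine whose Intel GPU and drivers support QSV.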

AMD EPYC Servers

Servers equipped with AMD EPYC CPUs combine strong single‑core performance with high core counts, so more slices can be transcoded in parallel on a single machine. This benefits both single‑process transcoding and multi‑process slice transcoding while reducing cross‑machine I/O.

Self‑Developed Codec (NE264/NE265)

NetEase’s proprietary encoders deliver 20‑30 % bitrate savings at comparable visual quality, especially for high‑bitrate live streams such as gaming or concerts.

Slice‑Based Transcoding

Video streams consist of GOPs delimited by IDR frames. By cutting a video into independent slices (similar to MapReduce), each slice can be transcoded in parallel and later merged, turning a sequential bottleneck into a highly parallel workload.
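The split/transcode/merge pipeline can be sketched with FFmpeg's segment muxer and concat demuxer. The file names and segment length below are illustrative; a production system would align cuts with IDR boundaries (which `-c copy` segmentation does, since it can only cut on keyframes) and fan the per‑slice encodes out to parallel workers.

```python
def build_split_cmd(src: str, seg_seconds: int = 10) -> list[str]:
    # Stream-copy split: cuts land on keyframes, so each slice
    # begins at an IDR frame and is independently decodable.
    return ["ffmpeg", "-i", src, "-c", "copy", "-f", "segment",
            "-segment_time", str(seg_seconds), "slice_%03d.mp4"]

def build_encode_cmd(slice_path: str, out_path: str) -> list[str]:
    # Each slice is re-encoded independently; these commands run
    # in parallel across workers (the "map" step).
    return ["ffmpeg", "-i", slice_path, "-c:v", "libx264", out_path]

def build_merge_cmd(list_file: str, dst: str) -> list[str]:
    # Concat demuxer stitches transcoded slices back together
    # without another re-encode (the "reduce" step).
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", dst]

split_cmd = build_split_cmd("source.mp4")
merge_cmd = build_merge_cmd("slices.txt", "final.mp4")
```

Here `slices.txt` would list the transcoded slice files in order, one `file '...'` line per slice, as the concat demuxer expects.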

Task Scheduling Architecture

Jobs are split into a parent task (Job) and multiple child tasks (Tasks). Workers execute parent tasks, which dispatch child tasks to other workers. Two dispatch mechanisms are used:

Master pushes tasks to workers (low latency, but snapshot‑based resource view may cause overload).

Workers pull tasks from Master (better load balancing, but less real‑time control).
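The pull model can be sketched with a shared queue: idle workers take the next task whenever they have free capacity, which load-balances naturally without the Master needing an up-to-date resource view. This is a minimal single-process simulation, not NetEase's actual scheduler.

```python
import queue
import threading

task_queue: "queue.Queue[int]" = queue.Queue()
results = []
lock = threading.Lock()

def worker() -> None:
    # Each worker pulls tasks until the queue is drained.
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return
        with lock:
            results.append(task * 2)  # stand-in for transcoding work
        task_queue.task_done()

for t in range(8):
    task_queue.put(t)
threads = [threading.Thread(target=worker) for _ in range(3)]
for th in threads:
    th.start()
for th in threads:
    th.join()
```

A push-based dispatcher would instead invert this: the Master selects a worker per task, trading load-balance accuracy (its resource snapshot may be stale) for lower dispatch latency.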

Child‑Task Scheduling Details

Child tasks are placed in a high‑priority global queue to avoid waiting behind regular tasks. Scheduling considers machine type, codec version, and data locality to minimize network I/O.
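One plausible way to encode "child tasks first, then prefer data locality" is a heap ordered by a (priority, locality) tuple. The field names and scoring are illustrative assumptions, not NetEase's implementation.

```python
import heapq

# Lower numbers win: child tasks (0) beat regular tasks (1), and among
# children a lower locality penalty (data already on the worker) wins.
CHILD, REGULAR = 0, 1

def push(q: list, priority: int, locality_penalty: int,
         seq: int, name: str) -> None:
    # seq breaks ties FIFO-style so equal tasks keep arrival order.
    heapq.heappush(q, (priority, locality_penalty, seq, name))

q: list = []
push(q, REGULAR, 0, 0, "regular-job")
push(q, CHILD, 1, 1, "child-remote")   # slice data on another machine
push(q, CHILD, 0, 2, "child-local")    # slice data already local
order = [heapq.heappop(q)[3] for _ in range(len(q))]
```

With this ordering, both child tasks dispatch before the regular job, and the locality-friendly child dispatches first, minimizing network I/O.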

Straggler Problem and Mitigation

When a few child tasks lag, the parent worker can become blocked. Two mitigation strategies are employed:

Redundant scheduling: duplicate a slow task after a timeout; whichever copy finishes first cancels the other.

Parent‑worker takeover: the parent worker re‑encodes the slowest slice itself, avoiding the extra resource consumption of a duplicate task.
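The redundant-scheduling strategy can be sketched with `concurrent.futures`: wait on the primary attempt with a timeout, launch a backup copy if it fires, and take whichever finishes first. The delays and timeout below are illustrative simulation values.

```python
import concurrent.futures as cf
import time

def transcode_slice(delay: float) -> str:
    # Stand-in for slice transcoding; delay simulates a straggler.
    time.sleep(delay)
    return f"done after {delay}s"

def with_backup(pool: cf.ThreadPoolExecutor, timeout: float) -> str:
    primary = pool.submit(transcode_slice, 0.5)      # simulated straggler
    try:
        return primary.result(timeout=timeout)
    except cf.TimeoutError:
        backup = pool.submit(transcode_slice, 0.05)  # duplicate copy
        done, _ = cf.wait({primary, backup},
                          return_when=cf.FIRST_COMPLETED)
        winner = done.pop()
        for f in (primary, backup):
            if f is not winner:
                f.cancel()  # best-effort cancel of the losing copy
        return winner.result()

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    result = with_backup(pool, timeout=0.1)
```

Note that `Future.cancel()` cannot stop a task that is already running, so a real system would also need a cooperative kill signal for the losing transcode process.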

Progress Tracking

Transcoding progress is divided into stages, each owning a band of the 0–100 % range: scheduling (0–20 %), download/preparation (20–30 %), compute (30–90 %), and upload/cleanup (90–100 %). Within the compute stage, metrics are logged and real‑time progress is computed as processed time divided by total required time.
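Under one plausible reading of those stage percentages, overall progress is the stage's lower bound plus the within-stage fraction scaled to the stage's width:

```python
# Assumed stage bands: each stage owns a slice of the 0-100% range.
STAGES = {
    "scheduling": (0, 20),
    "download":   (20, 30),
    "compute":    (30, 90),
    "upload":     (90, 100),
}

def progress(stage: str, fraction: float) -> float:
    """Map a within-stage completion fraction to overall percent."""
    lo, hi = STAGES[stage]
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to [0, 1]
    return lo + (hi - lo) * fraction

# e.g. half of the video's duration transcoded:
p = progress("compute", 0.5)
```

For the compute stage, `fraction` would be the processed time divided by the total required time, taken from the logged metrics.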

HLS/DASH Packaging

Because HLS output consists of multiple .ts segments plus an .m3u8 manifest, transcoded slices cannot simply be concatenated into it. NetEase therefore first transcodes the slices to MP4, merges them, and performs HLS/DASH packaging as a final step.
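That final packaging step can be sketched as a single stream-copy FFmpeg pass over the merged MP4. Segment length and output names below are illustrative; `-f hls`, `-hls_time`, and `-hls_list_size` are standard FFmpeg HLS muxer options.

```python
def build_hls_cmd(merged_mp4: str, playlist: str = "index.m3u8") -> list[str]:
    """Package an already-encoded MP4 into HLS segments plus a manifest."""
    return [
        "ffmpeg", "-i", merged_mp4,
        "-c", "copy",            # already encoded; repackage only
        "-f", "hls",
        "-hls_time", "6",        # target segment duration in seconds
        "-hls_list_size", "0",   # keep every segment in the playlist
        playlist,
    ]

hls_cmd = build_hls_cmd("merged.mp4")
```

Since the streams are copied rather than re-encoded, this step costs only I/O and muxing time, keeping it cheap relative to the parallel compute stage.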

Test Results

Benchmarking on two test videos shows that each optimization (hardware acceleration, custom codec, AMD servers, slice transcoding) contributes measurable speed improvements, and combined they achieve significant overall acceleration.

Conclusion

NetEase Cloud’s transcoding team improves video processing speed through hardware upgrades, custom codecs, slice‑based parallelism, and refined cluster scheduling, with future articles planned to dive deeper into scheduling algorithms and hardware acceleration techniques.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: task scheduling, distributed processing, hardware acceleration, video transcoding, slice transcoding