Boost Cloud Rendering with NVIDIA GPU: Hardware Encoding & Decoding Using FFmpeg

This article explains how to leverage server‑side GPUs for hardware‑accelerated H.264 encoding and decoding with FFmpeg, covering installation, key API calls, format conversion to OpenGL textures, multi‑process considerations, and performance optimizations for cloud‑rendered visual effects.

Kuaishou Large Model

Aug 26, 2022

Abstract

In cloud‑rendering scenarios, server‑side high‑performance hardware can not only process effects quickly but also accelerate video encoding and decoding. This article describes how to use server GPUs for hardware encoding/decoding and presents optimization techniques for special cases.

Installation

To enable NVIDIA‑accelerated codecs in FFmpeg, first install the nv‑codec‑headers package, which provides the NVIDIA codec API headers that FFmpeg builds against, and then configure FFmpeg with the required options.

git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git
cd nv-codec-headers
make
sudo make install

Configure FFmpeg:

./configure ... --enable-cuda --enable-cuvid --enable-nvenc ...

Encoding and Decoding

FFmpeg provides example programs that illustrate the encoding/decoding workflow:

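hw_decode.c – decode a video on a hardware device, copy each frame back to system memory, and write the raw frames to a file.

decode_video.c – decode a video in software and save the decoded frames.

encode_video.c – encode generated frames with a codec chosen by name.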

Key FFmpeg APIs used for hardware acceleration include:

av_hwdevice_ctx_create() – creates a hardware device context.

av_hwdevice_find_type_by_name() – finds the hardware device type by name (e.g., "cuda").

avcodec_find_encoder_by_name() and avcodec_find_decoder_by_name() – locate specific encoders/decoders such as h264_nvenc and h264_cuvid.

Encoder options are set with av_opt_set() on the codec context's private data before the encoder is opened, for example:

if (codec->id == AV_CODEC_ID_H264)
  av_opt_set(c->priv_data, "preset", "slow", 0);

Format Conversion

After decoding, frames must be transferred to OpenGL textures for further processing. With NVIDIA hardware, decoded frames reside in CUDA device memory, so each plane can be copied into an OpenGL texture on the GPU through CUDA–OpenGL interop, without a round trip through system memory.

// Obtain the CUDA context from VideoCodecContext->hw_device_ctx
cuCtxPushCurrent(cudaContext);
// Register the OpenGL texture with CUDA (in practice once per texture, not once per frame)
cuGraphicsGLRegisterImage(&(cuTexRes[channel]), texYuv[channel], GL_TEXTURE_2D, CU_GRAPHICS_REGISTER_FLAGS_WRITE_DISCARD);
cuGraphicsMapResources(1, &(cuTexRes[channel]), 0);
CUarray mapped_array;
cuGraphicsSubResourceGetMappedArray(&mapped_array, cuTexRes[channel], 0, 0);
// Copy one plane from CUDA device memory into the mapped texture;
// width is in bytes and height is in rows for this plane
CUDA_MEMCPY2D cpy = {
  .srcMemoryType = CU_MEMORYTYPE_DEVICE,
  .srcDevice = (CUdeviceptr)frame->data[channel],
  .srcPitch = frame->linesize[channel], // rows may be padded, so the pitch is not the width
  .dstMemoryType = CU_MEMORYTYPE_ARRAY,
  .dstArray = mapped_array,
  .WidthInBytes = width,
  .Height = height,
};
cuMemcpy2D(&cpy);
cuGraphicsUnmapResources(1, &(cuTexRes[channel]), 0);
// Call cuGraphicsUnregisterResource() when the texture is destroyed
cuCtxPopCurrent(&cudaContext);
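
The copy above needs a width in bytes, a row count, and a source pitch for each plane. The following is a minimal sketch of how those values could be derived, assuming the decoder outputs NV12 (a full‑resolution Y plane followed by a half‑resolution plane of interleaved U and V bytes). PlaneCopy and nv12_plane_copy() are hypothetical names introduced for illustration; they are not part of FFmpeg or CUDA, and the linesize argument stands in for frame->linesize[plane].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of FFmpeg or CUDA):
 * the geometry of one NV12 plane copy. */
typedef struct PlaneCopy {
    int    tex_width;       /* texture width in texels */
    int    tex_height;      /* texture height in texels */
    int    bytes_per_texel; /* 1 for Y, 2 for interleaved UV */
    size_t width_in_bytes;  /* value for CUDA_MEMCPY2D.WidthInBytes */
    size_t rows;            /* value for CUDA_MEMCPY2D.Height */
    size_t src_pitch;       /* value for CUDA_MEMCPY2D.srcPitch */
} PlaneCopy;

/* Compute the copy geometry for plane 0 (Y) or plane 1 (UV) of an
 * NV12 frame. linesize is the byte distance between row starts in the
 * source plane. Returns 0 on success and -1 on invalid input; *out is
 * only written on success. */
int nv12_plane_copy(int width, int height, int plane,
                    size_t linesize, PlaneCopy *out)
{
    int tex_width, tex_height, bytes_per_texel;
    size_t width_in_bytes;

    if (out == NULL || width <= 0 || height <= 0)
        return -1;
    if (plane != 0 && plane != 1)
        return -1;

    if (plane == 0) {
        /* Luma: one byte per pixel at full resolution. */
        tex_width = width;
        tex_height = height;
        bytes_per_texel = 1;
    } else {
        /* Chroma: one U,V byte pair per 2x2 block of pixels. */
        tex_width = (width + 1) / 2;
        tex_height = (height + 1) / 2;
        bytes_per_texel = 2;
    }

    width_in_bytes = (size_t)tex_width * (size_t)bytes_per_texel;
    if (linesize < width_in_bytes)
        return -1; /* the pitch can never be smaller than one row */

    out->tex_width = tex_width;
    out->tex_height = tex_height;
    out->bytes_per_texel = bytes_per_texel;
    out->width_in_bytes = width_in_bytes;
    out->rows = (size_t)tex_height;
    out->src_pitch = linesize;
    assert(out->width_in_bytes <= out->src_pitch);
    return 0;
}
```

For a 1920x1080 frame with a 2048‑byte linesize, plane 0 maps to a 1920x1080 one‑channel texture and plane 1 to a 960x540 two‑channel texture; both copy 1920 bytes per row with a 2048‑byte pitch. Under the same assumption, the matching OpenGL textures would be allocated with a one‑channel format such as GL_R8 for Y and a two‑channel format such as GL_RG8 for UV.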

Multi‑Task Scenario

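When several encoding tasks run at once, desktop‑grade NVIDIA GPUs cap the number of concurrent encoding sessions in the driver (three at the time of writing; the cap depends on the driver version), while server‑grade GPUs have no such limit. When the tasks run as separate processes, NVIDIA's Multi‑Process Service (MPS) lets them share the GPU concurrently, which can reduce context‑switch overhead and improve GPU utilization.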

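# Determine the GPU index (e.g., 0); with several GPUs this picks the last one listed
DEVICE_NUM=$(nvidia-smi --query-gpu=index --format=csv,noheader | tail -n 1)
export CUDA_VISIBLE_DEVICES=$DEVICE_NUM
# Start the MPS control daemon
nvidia-cuda-mps-control -d
# To stop it later: echo quit | nvidia-cuda-mps-control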

Summary

Understanding FFmpeg’s interaction with hardware accelerators enables fine‑grained optimizations for cloud‑based visual effects, and the same principles can be extended to machine‑learning workloads where CUDA data exchange becomes a performance bottleneck.

References

FFmpeg repository: https://github.com/FFmpeg/FFmpeg/

Hardware Acceleration Introduction: https://trac.ffmpeg.org/wiki/HWAccelIntro

NVIDIA Video Encode and Decode GPU Support Matrix: https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new

NVIDIA MPS documentation: https://docs.nvidia.com/deploy/mps/index.html

NVENC vs libx264 performance: https://developer.nvidia.com/zh-cn/blog/turing-h264-video-encoding-speed-and-quality/
