Boost MXNet Video Training Speed by Up to 18× with Rec‑Format I/O

This article analyzes MXNet's lack of native video I/O, compares existing image iterators, introduces a Rec‑format based video iterator, and demonstrates through single‑GPU and multi‑GPU experiments that the new approach can accelerate training by up to eighteen times.

Meitu Technology

Background

When training models on large‑scale video datasets, the time spent loading video frames can dominate the overall training time. MXNet provides image iterators (image.ImageIter, io.ImageRecordIter, io.MNISTIter) but does not include a native video iterator, so developers have to read frames with OpenCV or skimage, which is comparatively slow.

Image I/O Interface Performance Comparison

Three MXNet image iterators were evaluated:

io.MNISTIter: Designed for the MNIST dataset; limited augmentation support.

io.ImageRecordIter (and io.ImageRecordUInt8Iter): Reads data stored in Rec format, offers many augmentation options, and is implemented in C++ for high throughput, but requires pre‑packing all images into Rec files, which increases disk usage.

image.ImageIter: Supports both raw images and Rec files; a flexible Python implementation, but slower than the C++ backend of io.ImageRecordIter.

Benchmark configuration:

MXNet version: 0.11.0

Network: Inception‑v3 (3 classes)

GPU: Titan X

Batch size: 128 (single‑GPU) / 128 × 3 (3‑GPU)
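For orientation, a Rec-based iterator for a setup like this one might be configured roughly as follows. This is a minimal sketch, not the benchmark's actual script: the file name `train.rec`, the 299 × 299 input size (the usual Inception-v3 resolution), the augmentation flags, and the thread count are assumptions, and the helper names `record_iter_kwargs` and `make_record_iter` are hypothetical.

```python
# Sketch of an io.ImageRecordIter configuration; paths and values are assumptions.
BATCH_SIZE = 128  # per-GPU batch size from the benchmark configuration


def record_iter_kwargs(rec_path, num_gpus=1):
    """Build keyword arguments for mx.io.ImageRecordIter (hypothetical helper)."""
    return {
        "path_imgrec": rec_path,              # packed Rec file (hypothetical path)
        "data_shape": (3, 299, 299),          # assumed Inception-v3 input size
        "batch_size": BATCH_SIZE * num_gpus,  # 128 per GPU
        "shuffle": True,
        "rand_crop": True,                    # example augmentations
        "rand_mirror": True,
        "preprocess_threads": 4,              # decoding threads; tune per machine
    }


def make_record_iter(rec_path, num_gpus=1):
    """Create the iterator; requires MXNet to be installed."""
    import mxnet as mx  # imported lazily so the config can be inspected without MXNet
    return mx.io.ImageRecordIter(**record_iter_kwargs(rec_path, num_gpus))


kwargs = record_iter_kwargs("train.rec", num_gpus=3)
print(kwargs["batch_size"])  # 384
```

Calling `make_record_iter("train.rec", num_gpus=3)` would then return an iterator that yields batches of 128 × 3 images, provided the Rec file exists.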

On a single GPU, io.ImageRecordIter was roughly 1.4× faster than image.ImageIter. In the 3‑GPU setup the gap grew to about 4.4×: the I/O speed of the other iterators stayed almost constant, while the Rec‑based iterator scaled with the number of GPUs.
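The two figures are roughly consistent with each other. As a back-of-the-envelope check, assume the Rec-based iterator's throughput grows linearly with the number of GPUs while the other iterator's stays flat; the function name and the `baseline_scaling` parameter are illustrative, not part of the benchmark.

```python
def expected_speedup(single_gpu_speedup, num_gpus, baseline_scaling=1.0):
    """Speedup if Rec throughput scales linearly with GPUs.

    baseline_scaling is the factor by which the slower iterator's
    throughput grows over the same GPU count (1.0 means no change).
    """
    return single_gpu_speedup * num_gpus / baseline_scaling


print(round(expected_speedup(1.4, 3), 1))  # 4.2
```

The estimate of about 4.2× is close to the reported 4.4×; the small difference may come from rounding in the single-GPU figure.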

Video I/O Optimization Strategy

Two implementation paths were explored for a video iterator:

OpenCV‑based iterator: Reads each video with OpenCV and packs the frames into a tensor of shape (batch, frames, C, H, W). This approach is easy to implement in Python but becomes a bottleneck for large datasets.

Wrapped ImageRecordIter: For each video, sample a fixed number of frames (e.g., 3) and store them as separate images in Rec files, so that the underlying iterator yields batches of shape (3 × batch, C, H, W). During iteration, reshape each batch to (batch, 3, C, H, W) and reshape the label vector accordingly. This leverages the fast C++ backend of io.ImageRecordIter and avoids decoding video files at training time.
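The reshape step of the wrapped iterator can be sketched with NumPy arrays standing in for the batches that io.ImageRecordIter would produce. This is an illustration, not the code from the referenced repository: `frames_to_clips` and `ClipIter` are hypothetical names, and the sketch assumes that the frames of one video are stored contiguously in the Rec file and read back in that order, so individual records must not be shuffled.

```python
import numpy as np

FRAMES_PER_VIDEO = 3  # frames sampled per video, as in the example above


def frames_to_clips(data, label, frames=FRAMES_PER_VIDEO):
    """Regroup a flat batch of frames into per-video clips.

    data:  (frames * batch, C, H, W), frames of one video stored contiguously
    label: (frames * batch,), each frame carrying its video's label
    """
    n, c, h, w = data.shape
    if n % frames != 0:
        raise ValueError("batch size must be a multiple of frames per video")
    clips = data.reshape(n // frames, frames, c, h, w)
    per_frame = label.reshape(n // frames, frames)
    # All frames of a clip should share one label; keep the first.
    if not (per_frame == per_frame[:, :1]).all():
        raise ValueError("frames within a clip have different labels")
    return clips, per_frame[:, 0]


class ClipIter:
    """Wrap any iterable of (data, label) frame batches (hypothetical helper)."""

    def __init__(self, frame_batches, frames=FRAMES_PER_VIDEO):
        self.frame_batches = frame_batches
        self.frames = frames

    def __iter__(self):
        for data, label in self.frame_batches:
            yield frames_to_clips(data, label, self.frames)


# Toy stand-in for one batch from the record iterator: 2 videos x 3 frames.
batch = 2
data = np.arange(FRAMES_PER_VIDEO * batch * 3 * 4 * 4, dtype=np.float32)
data = data.reshape(FRAMES_PER_VIDEO * batch, 3, 4, 4)
label = np.array([0, 0, 0, 1, 1, 1], dtype=np.float32)

clips, clip_labels = frames_to_clips(data, label)
print(clips.shape)           # (2, 3, 3, 4, 4)
print(clip_labels.tolist())  # [0.0, 1.0]
```

With MXNet, the same reshape would be applied to the data and label arrays of each batch returned by the underlying iterator. The label consistency check is cheap and catches the case where record order was disturbed, for example by shuffling at the frame level rather than the video level.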

Reference implementation and source code are available at:

https://github.com/MTCloudVision/mxnet-videoio

Experimental Results

Three groups of experiments compared OpenCV, the Rec‑based video iterator, and the baseline image iterators on both single‑GPU and multi‑GPU (up to 4 GPUs) configurations.

Figure: Single‑GPU performance chart

The Rec‑format video iterator achieved roughly 18× faster data loading than the OpenCV baseline and exhibited near‑linear scaling with the number of GPUs (e.g., 4‑GPU speed ≈ 4× single‑GPU speed).

Figure: Multi‑GPU performance chart

In a realistic workload of 100 k videos (10 frames per video) using ResNet‑200 on a Titan X for 20 epochs, the OpenCV pipeline with four threads required approximately 228 hours, whereas the Rec‑based iterator completed the same training in about 22 hours.
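The figures in this workload translate into an end-to-end speedup and an average frame rate as follows. The calculation assumes that all 10 frames of every video are read once per epoch, which the description suggests but does not state explicitly.

```python
videos = 100_000
frames_per_video = 10
epochs = 20
opencv_hours = 228
rec_hours = 22

# Total frames read over the whole run (assumes every frame is read each epoch).
total_frames = videos * frames_per_video * epochs


def frames_per_second(hours):
    """Average end-to-end frame rate for a run of the given duration."""
    return total_frames / (hours * 3600)


speedup = opencv_hours / rec_hours
print(round(speedup, 1))                          # 10.4
print(round(frames_per_second(opencv_hours), 1))  # 24.4
print(round(frames_per_second(rec_hours), 1))     # 252.5
```

The end-to-end gain of about 10× is smaller than the 18× data-loading speedup, presumably because total training time also includes GPU computation that the iterator does not affect.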

Conclusion

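For fixed video datasets, storing sampled video frames in MXNet Rec format and reading them through a wrapped io.ImageRecordIter dramatically reduces I/O overhead. In the image benchmarks, io.ImageRecordIter was about 1.4× faster than image.ImageIter on a single GPU and about 4.4× faster on three GPUs, and the Rec‑based video iterator loaded data roughly 18× faster than the OpenCV baseline. This makes it the preferred solution despite the extra disk space the Rec files require.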
