Boost MXNet Video Training Speed by Up to 18× with Rec‑Format I/O
This article analyzes MXNet's lack of native video I/O, compares existing image iterators, introduces a Rec‑format based video iterator, and demonstrates through single‑GPU and multi‑GPU experiments that the new approach can accelerate training by up to eighteen times.
Background
When training models on large‑scale video datasets, the time spent loading video frames can dominate the overall training time. MXNet provides image iterators (image.ImageIter, io.ImageRecordIter, io.MNISTIter) but does not include a native video iterator, forcing developers to rely on per‑frame reads via OpenCV or skimage, which are comparatively slow.
Image I/O Interface Performance Comparison
Three MXNet image iterators were evaluated:
- io.MNISTIter: Designed for the MNIST dataset; limited augmentation support.
- io.ImageRecordIter (and io.ImageRecordUInt8Iter): Reads data stored in Rec format, offers many augmentation options, and is implemented in C++ for high throughput, but requires pre‑packing all images into Rec files, increasing disk usage.
- image.ImageIter: Supports both raw images and Rec files; a flexible Python implementation, but slower than the C++ backend of io.ImageRecordIter.
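Pre‑packing into Rec format starts from an im2rec‑style .lst file, one tab‑separated line per image: integer index, label, path. A minimal sketch of generating such a listing; `make_lst_lines` and the sample paths are our own illustrative names, not part of MXNet:

```python
def make_lst_lines(samples):
    """samples: list of (image_path, class_label) tuples.

    Returns im2rec-style .lst lines of the form
    "index<TAB>label<TAB>path" (label written as a float).
    """
    return ["{}\t{:.6f}\t{}".format(i, float(label), path)
            for i, (path, label) in enumerate(samples)]

# Hypothetical two-class example listing
lines = make_lst_lines([("cat/001.jpg", 0), ("dog/002.jpg", 1)])
```

The resulting .lst file is then fed to MXNet's im2rec tool to produce the .rec file that io.ImageRecordIter consumes.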
Benchmark configuration:
MXNet version: 0.11.0
Network: Inception‑v3 (3 classes)
GPU: Titan X
Batch size: 128 (single‑GPU) / 128 × 3 (3‑GPU)
On a single GPU, io.ImageRecordIter was roughly 1.4× faster at I/O than image.ImageIter. In a 3‑GPU setup the gap grew to about 4.4×, because the Rec‑based iterator's throughput scaled with the number of GPUs while the other iterators' I/O time stayed nearly flat.
Video I/O Optimization Strategy
Two implementation paths were explored to create a video iterator on top of io.ImageRecordIter:
OpenCV‑based iterator: Reads each video with OpenCV and packs the frames into a tensor of shape (batch, frames, C, H, W). This approach is easy to implement in Python but becomes a bottleneck for large datasets.
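A sketch of the frame sampling and tensor packing this path performs. Actual decoding would use cv2.VideoCapture; here dummy arrays stand in, and both helper names are ours, not from the referenced implementation:

```python
import numpy as np

def sample_frame_indices(total_frames, num_samples):
    # Evenly spaced frame indices to decode from a video
    return np.linspace(0, total_frames - 1, num_samples).astype(int)

def pack_clip(frames):
    # frames: list of HxWxC uint8 arrays (e.g. decoded via cv2.VideoCapture)
    # -> (num_frames, C, H, W), the per-video layout used above
    return np.stack(frames).transpose(0, 3, 1, 2)

# Toy batch: 2 videos, 3 sampled frames each, 8x8 RGB dummy frames
batch = np.stack([pack_clip([np.zeros((8, 8, 3), np.uint8)] * 3)
                  for _ in range(2)])
# batch now has shape (batch, frames, C, H, W) = (2, 3, 3, 8, 8)
```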
Wrapped ImageRecordIter: For each video, sample a fixed number of frames (e.g., 3). Store the sampled frames as separate images in Rec files, resulting in an intermediate shape (3 × batch, C, H, W). During iteration, reshape the batch to (batch, 3, C, H, W) and similarly reshape the label vector. This leverages the fast C++ backend of io.ImageRecordIter and avoids per‑frame decoding overhead.
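The reshape trick can be illustrated with plain NumPy. This is a sketch under the assumption that the frames of each video are stored consecutively in the Rec file, which is exactly what lets a plain reshape regroup frames by video:

```python
import numpy as np

frames_per_video, batch_size, C, H, W = 3, 2, 3, 8, 8

# What the wrapped io.ImageRecordIter yields per batch:
# shape (frames_per_video * batch, C, H, W),
# with each video's frames stored consecutively.
flat = np.arange(frames_per_video * batch_size * C * H * W,
                 dtype=np.float32).reshape(
                     frames_per_video * batch_size, C, H, W)
# Every frame of a video carries that video's label
flat_labels = np.repeat(np.arange(batch_size), frames_per_video)

# Regroup into per-video clips: (batch, frames, C, H, W)
clips = flat.reshape(batch_size, frames_per_video, C, H, W)
# Collapse the repeated per-frame labels to one label per clip
labels = flat_labels.reshape(batch_size, frames_per_video)[:, 0]
```

In the real iterator the same reshape is applied to the NDArray batch returned by io.ImageRecordIter before it is handed to the network.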
Reference implementation and source code are available at:
https://github.com/MTCloudVision/mxnet-videoio
Experimental Results
Three groups of experiments compared OpenCV, the Rec‑based video iterator, and the baseline image iterators on both single‑GPU and multi‑GPU (up to 4 GPUs) configurations.
The Rec‑format video iterator achieved roughly 18× faster data loading than the OpenCV baseline and exhibited near‑linear scaling with the number of GPUs (e.g., 4‑GPU speed ≈ 4× single‑GPU speed).
In a realistic workload of 100k videos (10 frames per video) using ResNet‑200 on a Titan X for 20 epochs, the OpenCV pipeline with four threads required approximately 228 hours, whereas the Rec‑based iterator completed the same training in about 22 hours.
Conclusion
For fixed video datasets, storing sampled video frames in MXNet Rec format and accessing them through a wrapped io.ImageRecordIter dramatically reduces I/O overhead: data loading is up to roughly 18× faster than the OpenCV pipeline, and throughput scales near‑linearly with the number of GPUs. This makes it the preferred solution despite the increased disk consumption required for the Rec files.
Meitu Technology
Curating Meitu's technical expertise, valuable case studies, and innovation insights. We deliver quality technical content to foster knowledge sharing between Meitu's tech team and outstanding developers worldwide.