
ShaderNN: A GPU Shader‑Based Lightweight Inference Engine for Mobile AI Applications

ShaderNN is an open‑source, sub‑2 MB GPU‑shader inference engine that runs TensorFlow, PyTorch, and ONNX models directly on mobile graphics textures via OpenGL fragment and compute shaders, delivering real‑time, low‑power AI for image‑heavy tasks while eliminating third‑party dependencies and cutting inference time by up to 90%.

OPPO Kernel Craftsman

Background: With the rapid development of deep learning and the increasing computational power of mobile devices, many inference tasks that were previously executed in the cloud are now being moved to mobile platforms. Mobile deep‑learning inference involves hardware, drivers, compilation optimizations, model compression, operator optimization, and deployment, creating a strong demand for an efficient inference framework that can be integrated into system‑level business development.

Existing mobile inference frameworks such as Xiaomi MACE, Tencent NCNN/TNN, Alibaba MNN, and Google TensorFlow Lite rely on various third‑party libraries and hardware drivers, making adaptation, verification, optimization, deployment, maintenance, and upgrades cumbersome. Moreover, real‑time graphics‑oriented AI applications (e.g., video super‑resolution, AI rendering, ray‑tracing denoising, game AI post‑processing) require tight coupling with the graphics pipeline and low‑latency I/O, which most current frameworks cannot satisfy.

To address these challenges, we developed ShaderNN, a GPU‑Shader‑based inference engine. ShaderNN operates directly on GPU textures, eliminating the need for third‑party libraries and enabling seamless cross‑hardware deployment. It supports models trained with TensorFlow, PyTorch, and ONNX, and can be customized and optimized for integration, deployment, and upgrades.

What is ShaderNN? ShaderNN (Shader Neural Network) is a shader‑based inference framework that leverages OpenGL fragment shaders and compute shaders to execute deep‑learning models. In the first phase, OpenGL‑based inference has been completed and open‑sourced, supporting mainstream CNN models and operators. The second phase will extend support to Vulkan compute shaders.

The inference workflow consists of model conversion and layer‑fusion optimization, model and weight loading, computation‑graph generation, operator execution, and result return. Models from TensorFlow or PyTorch are converted to ONNX and then to a ShaderNN‑specific JSON format. During conversion, the model structure and weights are decoupled, operators are parsed, and inter‑layer fusion is performed. The engine then topologically sorts the graph, generates a compute graph, and dispatches operators to shaders. Optimizations occur both at compile time (shader compilation, caching, operator replacement/fusion) and runtime (convolution optimization, texture reuse, CPU/GPU memory reuse, data layout, caching, vectorization).
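The inter‑layer fusion mentioned above can be made concrete with a classic example: folding a BatchNorm layer into the preceding convolution, so the fused operator needs one fewer GPU pass at runtime. The sketch below is illustrative only (the function name and the per‑channel list layout are assumptions, not ShaderNN's actual converter API), using pure Python scalars instead of tensors:

```python
import math

def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding convolution.

    weights: per-output-channel lists of filter taps
    bias:    per-output-channel biases
    gamma, beta, mean, var: BatchNorm parameters, one scalar per channel

    Returns fused (weights, bias) computing conv+BN in a single pass.
    """
    fused_w, fused_b = [], []
    for c in range(len(bias)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        fused_w.append([w * scale for w in weights[c]])
        fused_b.append((bias[c] - mean[c]) * scale + beta[c])
    return fused_w, fused_b

# Toy example: one output channel, BN with scale 2 centered on the bias.
w, b = fold_batchnorm([[1.0, 2.0]], [0.5],
                      gamma=[2.0], beta=[0.1], mean=[0.5], var=[1.0])
```

Performing this fold once at conversion time is why the converter decouples structure from weights: the fused weights are baked into the exported JSON, and the runtime never sees a separate BatchNorm operator.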

Advantages of ShaderNN

1. Real‑time graphics‑oriented AI: By using native GPU textures as input/output, ShaderNN avoids costly format conversions and data copies, dramatically reducing I/O overhead for large‑scale image/video pipelines.

2. High performance: It is the first engine to combine fragment and compute shaders in a static computation graph, enabling pre‑execution graph optimizations and faster inference.

3. Lightweight and portable: The core library is <2 MB and has no third‑party dependencies, making deployment on mobile devices straightforward.

4. Extensibility: Custom operators can be added to support new models and optimizations.

5. Generality: Supports TensorFlow, PyTorch, and ONNX models, covering typical CNN tasks such as classification, object detection, segmentation, and image enhancement, with a ModelZoo and Android demo app.
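The static computation graph behind advantage 2, and the topological sort mentioned in the workflow, boil down to producing a fixed execution order for the layers before any frame is processed. A minimal sketch using Kahn's algorithm (the graph encoding and layer names here are hypothetical, not ShaderNN's internal representation):

```python
from collections import deque

def topo_schedule(nodes, edges):
    """Kahn's algorithm: produce an execution order for a static compute
    graph so every layer runs only after all of its inputs are ready."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle detected; not a valid compute graph")
    return order

# A branch-and-merge CNN fragment: input -> conv1 -> {conv2, conv3} -> add
order = topo_schedule(
    ["input", "conv1", "conv2", "conv3", "add"],
    [("input", "conv1"), ("conv1", "conv2"),
     ("conv1", "conv3"), ("conv2", "add"), ("conv3", "add")])
```

Because the order is fixed ahead of time, the engine can also plan texture reuse and operator fusion across the whole graph rather than deciding per dispatch.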

Performance and Power Consumption

Four representative CNN models (Spatial Denoise, ESPCN, ResNet‑18, YOLO‑V3 Tiny) were benchmarked on two MediaTek and two Qualcomm platforms against TensorFlow Lite's OpenGL backend. Results show that ShaderNN reduces inference time by 75–90% for Spatial Denoise and ESPCN, and by up to 50% for ResNet‑18 and YOLO‑V3 Tiny on certain chips. Power measurements show that a single inference consumes significantly less energy than with TensorFlow Lite, with average power savings of 51–80% across the tested models.
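As a quick sanity check on what those percentages mean in practice, a reduction in inference *time* translates into a multiplicative *speedup* as follows (the numbers below come from the ranges reported above, not from new measurements):

```python
def reduction_to_speedup(reduction_pct):
    """Convert 'X% less inference time' into a speedup factor:
    a 75% reduction means the job runs in 25% of the time, i.e. 4x."""
    return 1.0 / (1.0 - reduction_pct / 100.0)

# The reported 75-90% time reductions correspond to 4x-10x speedups.
low = reduction_to_speedup(75)
high = reduction_to_speedup(90)
```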

Typical Application Scenarios

ShaderNN excels in graphics‑intensive AI tasks such as ray‑tracing denoising, deep‑learning super‑sampling, high‑dynamic‑range imaging (HDR), super‑resolution, and style transfer.

Roadmap and Outlook

Future work includes adding Vulkan‑based inference, expanding supported operators, and fostering community contributions via the open‑source GitHub repository (Apache 2.0). The goal is to enable more desktop‑class AI features (e.g., DLSS, ray‑tracing denoising) on mobile GPUs and explore complementary use of NPU/DSP.

Conclusion

ShaderNN is an open‑source, lightweight inference engine that leverages GPU shaders to deliver high‑performance, low‑power AI inference for real‑time graphics and image processing on mobile devices.
