Tagged articles

mobile inference

11 articles · Page 1 of 1

May 13, 2026 · Artificial Intelligence

How MiniCPM-V 4.6 Achieves Lightning‑Fast Multimodal AI on Smartphones (Open‑Source)

MiniCPM-V 4.6 combines a SigLIP2 visual encoder with a Qwen3.5 LLM, cuts FLOPs by over 50%, lowers token cost by up to 43×, scores 13 on the Artificial Analysis Intelligence Index, and runs with 75 ms first‑token latency on 3136×3136 images across iOS, Android and HarmonyOS, all with fully open‑source code and extensive quantization support.

MiniCPM-V · Multimodal AI · Quantization

0 likes · 6 min read

Old Zhang's AI Learning

Mar 23, 2026 · Artificial Intelligence

How Large‑Model Research Is Shifting: Insights from 120 Top Papers

The article reveals that large‑model research has moved from sheer scale to deeper capabilities and multimodal integration, highlighting ten hot directions and summarizing 120 recent top‑conference papers—including Spec‑VLA, Mobile‑O, OccTENS, and latent‑CoT studies—while offering free access to the full collection.

3D occupancy modeling · Multimodal AI · causal reasoning

0 likes · 7 min read

AIWalker

Mar 3, 2026 · Artificial Intelligence

How NanoSD Cuts 90% Parameters to Enable Real‑Time Photo Editing on Mobile

NanoSD distills Stable Diffusion 1.5 into a 130 M‑parameter model that runs inference in 20 ms on a Qualcomm SM8750 NPU, using hardware‑aware module pruning, module‑level knowledge distillation, and Bayesian optimization to achieve Pareto‑optimal quality‑efficiency trade‑offs for on‑device image restoration.

Bayesian Optimization · Stable Diffusion · knowledge distillation

0 likes · 14 min read

JD Retail Technology

Feb 28, 2024 · Artificial Intelligence

Edge AI at JD Retail: Architecture, Challenges, and Business Practices

This article details JD Retail's edge AI (on‑device intelligence) platform, covering its definition; its performance and security challenges; its three‑layer cloud‑edge‑device architecture; key components such as a high‑performance inference engine, a data pipeline, and a Python VM container; and real‑world applications in traffic distribution and image recognition.

AI Architecture · JD Retail · edge AI

0 likes · 15 min read

Kuaishou Tech

Oct 21, 2022 · Artificial Intelligence

Real-time Short Video Recommendation on Mobile Devices: System Design, Model Architecture, and Experimental Evaluation

The paper presents a lightweight on‑device re‑ranking system for short‑video recommendation that leverages real‑time user feedback and context‑aware generative ranking, detailing its architecture, feature engineering, beam‑search optimization, and both offline and online experimental results showing significant performance gains.

Beam Search · Context-Aware · feature engineering

0 likes · 12 min read

DaTaobao Tech

Jul 15, 2022 · Artificial Intelligence

Edge AI Model Evaluation and Optimization with TensorFlow, JAX, and TVM

The article demonstrates how to evaluate, compress, and convert deep‑learning models for edge devices using TensorFlow, JAX, and TVM—showing a faster iPhone‑based MNIST training benchmark, FLOPs measurement scripts, TFLite/ONNX/CoreML conversion, TVM compilation with auto‑tuning, and up to 50 % speed improvements on mobile NPU hardware.

JAX · TVM · TensorFlow

0 likes · 29 min read

Alibaba Terminal Technology

Jun 22, 2022 · Artificial Intelligence

How Fast Can Your Smartphone Run ML Models? Exploring Edge AI Optimization

This article examines the computational capabilities of modern mobile devices for machine learning, compares training times on a MacBook and iPhone, explains model evaluation metrics like FLOPs, and provides step‑by‑step guides for converting and optimizing models using TensorFlow, PyTorch, ONNX, JAX, and TVM for edge deployment.

JAX · Model Optimization · TVM

0 likes · 29 min read

Alibaba Terminal Technology

Apr 28, 2022 · Artificial Intelligence

How MNN’s Sparse Computing Boosts Mobile AI Inference Performance

This article details the design and implementation of sparse computation in Alibaba’s MNN inference engine, covering weight sparsity techniques, block‑sparse layouts, performance benchmarks on MobileNet models versus XNNPack, and real‑world deployment cases that demonstrate significant speedups and memory savings on mobile CPUs.

AI acceleration · MNN · block sparsity

0 likes · 16 min read

Alibaba Cloud Developer

Jul 15, 2021 · Artificial Intelligence

How Alibaba Sports Built AI‑Powered Home Exercise with Real‑Time Pose Detection

This article explains how Alibaba Sports created an AI‑driven home‑exercise solution that uses on‑device pose estimation, describes the underlying MNN inference engine, outlines challenges such as accuracy, performance and testing, and shares the business impact of supporting dozens of workout motions.

AI · MNN engine · automated testing

0 likes · 11 min read

Alibaba Cloud Developer

May 7, 2019 · Artificial Intelligence

What Makes Alibaba’s MNN Engine a Game-Changer for Mobile AI Inference?

Alibaba’s open‑source MNN is a lightweight, high‑performance deep‑learning inference engine optimized for edge devices, supporting multiple model formats and backends, offering portability across iOS, Android, and IoT, with detailed architecture, performance benchmarks, roadmap, and real‑world application examples.

MNN · Performance Optimization · deep learning

0 likes · 12 min read

Liulishuo Tech Team

Sep 3, 2016 · Artificial Intelligence

Optimizing Deep Neural Network Inference for Offline Speech Evaluation on Mobile Devices

This article describes how the English fluency app leverages deep neural network (DNN) models for real‑time speech scoring on smartphones, detailing offline inference challenges, BLAS‑based matrix‑vector optimizations, sparsity exploitation, cache‑friendly implementations, fixed‑point and NEON acceleration, as well as model compression techniques to improve accuracy and latency.

BLAS · DNN optimization · Matrix multiplication

0 likes · 11 min read
