Tagged articles
11 articles
SuanNi
May 13, 2026 · Artificial Intelligence

How MiniCPM-V 4.6 Achieves Lightning‑Fast Multimodal AI on Smartphones (Open‑Source)

MiniCPM-V 4.6 combines a SigLIP2 visual encoder with a Qwen3.5 LLM, cuts FLOPs by over 50%, lowers token cost by up to 43×, scores 13 on the Artificial Analysis Intelligence Index, and runs with 75 ms first‑token latency on 3136×3136 images across iOS, Android and HarmonyOS, all with fully open‑source code and extensive quantization support.

Benchmark · MiniCPM-V · Multimodal AI
0 likes · 6 min read
Old Zhang's AI Learning
Mar 23, 2026 · Artificial Intelligence

How Large‑Model Research Is Shifting: Insights from 120 Top Papers

The article reveals that large‑model research has moved from sheer scale to deeper capabilities and multimodal integration, highlighting ten hot directions and summarizing 120 recent top‑conference papers—including Spec‑VLA, Mobile‑O, OccTENS, and latent‑CoT studies—while offering free access to the full collection.

3D occupancy modeling · Multimodal AI · causal reasoning
0 likes · 7 min read
AIWalker
Mar 3, 2026 · Artificial Intelligence

How NanoSD Cuts 90% Parameters to Enable Real‑Time Photo Editing on Mobile

NanoSD distills Stable Diffusion 1.5 into a 130 M‑parameter model that runs inference in 20 ms on a Qualcomm SM8750 NPU, using hardware‑aware module pruning, module‑level knowledge distillation, and Bayesian optimization to achieve Pareto‑optimal quality‑efficiency trade‑offs for on‑device image restoration.

Bayesian Optimization · Stable Diffusion · knowledge distillation
0 likes · 14 min read
JD Retail Technology
Feb 28, 2024 · Artificial Intelligence

Edge AI at JD Retail: Architecture, Challenges, and Business Practices

This article details JD Retail's edge AI (on‑device intelligence) platform, covering its definition, its performance and security challenges, its three‑layer cloud‑edge‑device architecture, key components such as a high‑performance inference engine, a data pipeline, and a Python VM container, and real‑world applications in traffic distribution and image recognition.

AI Architecture · JD Retail · edge AI
0 likes · 15 min read
Kuaishou Tech
Oct 21, 2022 · Artificial Intelligence

Real-time Short Video Recommendation on Mobile Devices: System Design, Model Architecture, and Experimental Evaluation

The paper presents a lightweight on‑device re‑ranking system for short‑video recommendation that leverages real‑time user feedback and context‑aware generative ranking, detailing its architecture, feature engineering, beam‑search optimization, and both offline and online experimental results showing significant performance gains.

Beam Search · Context-Aware · feature engineering
0 likes · 12 min read
DaTaobao Tech
Jul 15, 2022 · Artificial Intelligence

Edge AI Model Evaluation and Optimization with TensorFlow, JAX, and TVM

The article demonstrates how to evaluate, compress, and convert deep‑learning models for edge devices using TensorFlow, JAX, and TVM—showing a faster iPhone‑based MNIST training benchmark, FLOPs measurement scripts, TFLite/ONNX/CoreML conversion, TVM compilation with auto‑tuning, and up to 50% speed improvements on mobile NPU hardware.

JAX · TVM · TensorFlow
0 likes · 29 min read
Alibaba Terminal Technology
Jun 22, 2022 · Artificial Intelligence

How Fast Can Your Smartphone Run ML Models? Exploring Edge AI Optimization

This article examines the computational capabilities of modern mobile devices for machine learning, compares training times on a MacBook and iPhone, explains model evaluation metrics like FLOPs, and provides step‑by‑step guides for converting and optimizing models using TensorFlow, PyTorch, ONNX, JAX, and TVM for edge deployment.

JAX · Model Optimization · TVM
0 likes · 29 min read
Alibaba Terminal Technology
Apr 28, 2022 · Artificial Intelligence

How MNN’s Sparse Computing Boosts Mobile AI Inference Performance

This article details the design and implementation of sparse computation in Alibaba’s MNN inference engine, covering weight sparsity techniques, block‑sparse layouts, performance benchmarks on MobileNet models versus XNNPack, and real‑world deployment cases that demonstrate significant speedups and memory savings on mobile CPUs.

AI acceleration · MNN · block sparsity
0 likes · 16 min read
Alibaba Cloud Developer
Jul 15, 2021 · Artificial Intelligence

How Alibaba Sports Built AI‑Powered Home Exercise with Real‑Time Pose Detection

This article explains how Alibaba Sports created an AI‑driven home‑exercise solution that uses on‑device pose estimation, describes the underlying MNN inference engine, outlines challenges such as accuracy, performance and testing, and shares the business impact of supporting dozens of workout motions.

AI · Automated Testing · MNN engine
0 likes · 11 min read
Alibaba Cloud Developer
May 7, 2019 · Artificial Intelligence

What Makes Alibaba’s MNN Engine a Game-Changer for Mobile AI Inference?

Alibaba’s open‑source MNN is a lightweight, high‑performance deep‑learning inference engine optimized for edge devices, supporting multiple model formats and backends, offering portability across iOS, Android, and IoT, with detailed architecture, performance benchmarks, roadmap, and real‑world application examples.

Deep Learning · MNN · Performance Optimization
0 likes · 12 min read
Liulishuo Tech Team
Sep 3, 2016 · Artificial Intelligence

Optimizing Deep Neural Network Inference for Offline Speech Evaluation on Mobile Devices

This article describes how the English fluency app leverages deep neural network (DNN) models for real‑time speech scoring on smartphones, detailing offline inference challenges, BLAS‑based matrix‑vector optimizations, sparsity exploitation, cache‑friendly implementations, fixed‑point and NEON acceleration, as well as model compression techniques to improve accuracy and latency.

BLAS · DNN optimization · Matrix Multiplication
0 likes · 11 min read