Tagged articles
20 articles
DaTaobao Tech
Aug 15, 2025 · Mobile Development

How to Eliminate Text Lag in iOS LLM Chat Apps with Smart Buffering and Typewriter Animation

This article explains how to eliminate stuttered text output in iOS chat applications powered by local LLMs using the MNN framework, by introducing a three‑layer optimization—smart stream buffering, UI update throttling with batch processing, and a typewriter‑style animation—to achieve smooth, near‑online responsiveness.

LLM · MNN · Swift
0 likes · 16 min read
DaTaobao Tech
Apr 21, 2025 · Artificial Intelligence

How MNN LLM Delivers Fast, Stable On‑Device LLM Inference for Android, iOS, and Desktop

Facing DeepSeek R1 server instability, the open‑source MNN LLM framework offers local, mobile‑friendly deployment with model quantization and hardware‑specific optimizations, dramatically improving inference speed, stability, and download reliability across Android, iOS, and desktop platforms while supporting multimodal inputs.

Android · LLM · MNN
0 likes · 11 min read
DaTaobao Tech
Nov 20, 2024 · Mobile Development

MNN-Transformer: Efficient On‑Device Large Language and Diffusion Model Deployment

MNN‑Transformer provides an end‑to‑end framework that enables large language and diffusion models to run efficiently on modern smartphones by exporting, quantizing (including dynamic int4/int8 and KV cache compression) and executing via a plugin‑engine runtime, achieving up to 35 tokens/s decoding and 2‑3× faster image generation compared with existing on‑device solutions.

LLM · MNN · Mobile AI
0 likes · 15 min read
DaTaobao Tech
Oct 16, 2024 · Artificial Intelligence

Dynamic Quantization and Matrix Multiplication Optimization in MNN CPU Backend

The article details MNN’s CPU backend dynamic quantization for Transformer‑type models, describing runtime int8 conversion, block‑wise matrix‑multiply optimizations using ARM SMMLA/SDOT and AVX‑512 VNNI, weight‑group and batch‑wise quantization techniques, and reports up to three‑fold speed‑ups on Snapdragon 8 Gen 3.

CPU optimization · Dynamic Quantization · INT8
0 likes · 19 min read
DaTaobao Tech
Oct 14, 2024 · Artificial Intelligence

MNN Stable Diffusion: On‑Device Deployment and Performance Optimizations

The article presents Alibaba’s open‑source MNN inference engine, demonstrating how quantization, operator fusion (including fused multi‑head attention, GroupNorm/SplitGeLU, and Winograd convolutions), optimized GEMM, and memory paging enable on‑device Stable Diffusion at about one second per step on Snapdragon 8 Gen 3 and Apple M3 GPUs, and outlines future speed‑up directions.

AI · MNN · Stable Diffusion
0 likes · 11 min read
DaTaobao Tech
Jan 5, 2024 · Mobile Development

Edge Deployment and Performance Optimization of Large Language Models with MNN

The upgraded mnn‑llm framework adds a unified llm‑export pipeline, cross‑platform inference with tokenizers and disk‑embedding, and ARM‑focused linear‑layer optimizations—including SIMD, hand‑written assembly and 4‑bit quantization—that dramatically speed up prefilling and achieve real‑time LLM conversation on mobile devices within a 2 GB memory budget, outperforming llama.cpp, fastllm and mlc‑llm.

ARM CPU · LLM · MNN
0 likes · 17 min read
DataFunSummit
Sep 11, 2023 · Artificial Intelligence

Challenges and Insights for Deploying Large Models on Edge with MNN

The talk presents an overview of the MNN inference engine, outlines the end‑to‑end workflow for deploying large language models on mobile devices, discusses technical challenges and practical solutions, and concludes with future directions for edge AI deployment.

AI · Inference Engine · MNN
0 likes · 2 min read
DaTaobao Tech
Jul 12, 2023 · Artificial Intelligence

Optimizing ChatGLM-6B Deployment with MNN: Model Conversion, Quantization, and Edge Inference

The article details a workflow that converts the PyTorch ChatGLM‑6B model to MNN, splits and compresses embeddings, applies int4/int8 quantization, supports dynamic shapes, and uses hybrid GPU/CPU or CPU‑only loading to enable low‑memory edge inference on PCs and mobile devices with competitive tokens‑per‑second performance.

ChatGLM · LLM · MNN
0 likes · 16 min read
DaTaobao Tech
Jul 18, 2022 · Artificial Intelligence

Walle: An End-to-End, General-Purpose, Large-Scale Device-Cloud Collaborative Machine Learning System

Walle is Alibaba’s first end‑to‑end, general‑purpose, large‑scale device‑cloud collaborative machine‑learning platform that manages billions of mobile devices, provides a full‑stack data and compute pipeline, cuts cloud load by 87%, reduces latency to ~100 ms, and already powers over a trillion daily ML invocations across dozens of Alibaba apps.

MNN · OSDI · device-cloud collaboration
0 likes · 11 min read
DaTaobao Tech
Jul 13, 2022 · Artificial Intelligence

MNN 2.0: A Unified Edge‑Cloud Deep Learning Framework Overview

MNN 2.0 transforms Alibaba’s lightweight deep‑learning engine into a unified edge‑cloud framework, delivering ultra‑small binaries, broad model‑format support, and aggressive CPU/GPU/DSP/NPU optimizations—including SIMD, Winograd, quantization, and sparse computation—while providing Python‑style APIs for preprocessing, inference, and on‑device training.

Deep Learning · Edge Computing · MNN
0 likes · 18 min read
Alibaba Terminal Technology
Apr 28, 2022 · Artificial Intelligence

How MNN’s Sparse Computing Boosts Mobile AI Inference Performance

This article details the design and implementation of sparse computation in Alibaba’s MNN inference engine, covering weight sparsity techniques, block‑sparse layouts, performance benchmarks on MobileNet models versus XNNPack, and real‑world deployment cases that demonstrate significant speedups and memory savings on mobile CPUs.

AI acceleration · MNN · block sparsity
0 likes · 16 min read
DaTaobao Tech
Mar 11, 2022 · Artificial Intelligence

How Alibaba’s MNN Engine Achieves 350% CPU Speedup and Sparse Acceleration

Alibaba’s MNN, a lightweight high‑performance deep‑learning inference engine, earned top honors in China’s 2022 “Science & Innovation China” awards, and delivers gains such as a 350% speedup on x86 CPUs and 2.1‑2.3× acceleration on ARM with sparse models, plus integrated OpenCV/NumPy functionality for edge AI deployment.

AI deployment · Alibaba · Deep Learning
0 likes · 4 min read
DaTaobao Tech
Feb 17, 2022 · Artificial Intelligence

Unifying Edge AI Training and Deployment: Inside MNN Workbench’s New Workflow

The article outlines how MNN Workbench, Alibaba’s open‑source edge‑AI platform, integrates professional training capabilities, cloud‑based PAI‑DLC resources, multi‑window debugging, and visual Git Flow to streamline end‑to‑end model development, deployment, and iteration for developers of varying expertise.

Deployment · Git Flow · MNN
0 likes · 10 min read
Amap Tech
Jun 4, 2021 · Artificial Intelligence

Deploying Multiple CNN Models on Low‑End Devices with MNN: Memory Tricks and Error Debugging

This article explains how a high‑traffic map service captures road features using client‑side computer‑vision models, details the deployment of many CNNs with the lightweight MNN engine on memory‑constrained devices, and shares practical memory‑saving techniques, inference scheduling, and error‑analysis methods.

Android · Computer Vision · MNN
0 likes · 12 min read
DataFunTalk
Mar 25, 2021 · Artificial Intelligence

Optimizing MNN Mobile Neural Network Inference on GPU with OpenCL: Memory Objects, Work‑Group Tuning, and Auto‑Tuning

This article explains how the MNN deep‑learning framework leverages OpenCL to achieve high‑performance inference on mobile, PC and embedded GPUs by diversifying memory objects, aligning data, using local‑memory reductions, selecting optimal work‑group sizes, applying pre‑inference auto‑tuning, caching compiled programs, and providing practical GPU‑friendly model design guidelines.

GPU Optimization · MNN · OpenCL
0 likes · 20 min read
Alibaba Terminal Technology
Jun 28, 2020 · Frontend Development

Accelerating Frontend AI: From WebGL to MNN.js and Beyond

This article explores the rise of AI in front‑end development during the pandemic, compares frameworks like TensorFlow.js, ONNX.js and WebNN, presents a performance‑focused case study of MNN.js, and outlines practical acceleration tools for cross‑platform web and mini‑program AI applications.

MNN · Wasm · frontend
0 likes · 10 min read
Alibaba Cloud Developer
Jul 2, 2019 · Artificial Intelligence

How MNN Powers Mobile AI: Inside Alibaba’s Open‑Source Inference Engine

Alibaba’s MNN (Mobile Neural Network) engine, now open‑sourced on GitHub, showcases how a lightweight, end‑side deep‑learning inference framework tackles fragmentation, optimizes model conversion, scheduling, and execution across diverse devices, delivering significant performance gains for mobile and IoT AI applications.

Inference Engine · MNN · Mobile AI
0 likes · 15 min read
Alibaba Cloud Developer
May 7, 2019 · Artificial Intelligence

What Makes Alibaba’s MNN Engine a Game-Changer for Mobile AI Inference?

Alibaba’s open‑source MNN is a lightweight, high‑performance deep‑learning inference engine optimized for edge devices, supporting multiple model formats and backends, offering portability across iOS, Android, and IoT, with detailed architecture, performance benchmarks, roadmap, and real‑world application examples.

Deep Learning · MNN · edge AI
0 likes · 12 min read