How MNN LLM Delivers Fast, Stable On‑Device LLM Inference for Android, iOS, and Desktop
When DeepSeek R1's servers proved unstable, the open‑source MNN LLM framework offered an alternative: local, mobile‑friendly deployment with model quantization and hardware‑specific optimizations. The result is dramatically better inference speed, stability, and download reliability across Android, iOS, and desktop platforms, with support for multimodal inputs.
Overview
MNN LLM is an open‑source framework that enables fully local deployment of large language models on Android, iOS, and desktop platforms. It removes the reliance on unstable cloud services and provides a unified solution for text, image, and audio multimodal inference.
Key Features
Local deployment: Models run entirely on the device without remote servers.
Mobile compatibility: A single smartphone can run the distilled DeepSeek R1 Qwen 7B model.
Multimodal support: Text‑to‑image generation, voice input, and image input are all supported.
Performance Optimizations
MNN achieves 20–50% faster CPU decoding and up to 2× faster prefill than competing runtimes. On small models its GPU inference is more than 30% faster; on larger models it matches MLC‑LLM while producing more stable GPU output.
Reliable Model Download
Traditional Hugging Face downloads often fail for users in China. MNN LLM integrates ModelScope and a built‑in resumable download mechanism, eliminating “server busy” errors and dramatically speeding up model acquisition.
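At its core, a resumable download works by sending HTTP Range requests: if a partial file already exists on disk, the client asks the server to continue from the current byte offset instead of restarting. A minimal Python sketch of the idea (the function names and file layout here are illustrative assumptions, not MNN's actual implementation):

```python
import os
import urllib.request

def build_resume_request(url: str, dest_path: str) -> urllib.request.Request:
    """Create a download request that resumes from an existing partial file.

    If dest_path already holds N bytes, ask the server for bytes N onward
    via an HTTP Range header; otherwise request the whole file.
    """
    req = urllib.request.Request(url)
    if os.path.exists(dest_path):
        offset = os.path.getsize(dest_path)
        if offset > 0:
            # Open-ended range: "resume from byte `offset`".
            req.add_header("Range", f"bytes={offset}-")
    return req

def download(url: str, dest_path: str, chunk_size: int = 1 << 20) -> None:
    """Append the remaining bytes to dest_path, surviving interruptions."""
    req = build_resume_request(url, dest_path)
    # Appending ("ab") means a rerun after a dropped connection picks up
    # where the previous attempt stopped.
    with urllib.request.urlopen(req) as resp, open(dest_path, "ab") as out:
        while chunk := resp.read(chunk_size):
            out.write(chunk)
```

Servers that support ranges reply with `206 Partial Content`; a production downloader would also verify that status and a checksum of the finished file.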
Supported Models
DeepSeek‑R1‑7B‑Qwen‑MNN
DeepSeek‑R1‑1.5B‑Qwen‑MNN
Qwen‑2.5‑0.5B‑Instruct‑MNN
Qwen‑2.5‑1.5B‑Instruct‑MNN
Qwen‑2.5‑3B‑Instruct‑MNN
Qwen‑2.5‑7B‑Instruct‑MNN
Gemma‑2‑2B‑IT‑MNN
Llama‑2‑7B‑Chat‑MS‑MNN
Baichuan2‑7B‑Chat‑MNN
InternLM‑Chat‑7B‑MNN
GLM‑4‑9B‑Chat‑MNN (iOS not supported)
Yi‑6B‑Chat‑MNN
ChatGLM3‑6B‑MNN
TinyLlama‑1.1B‑Chat‑MNN
MobileLLM‑125M‑MNN, 350M‑MNN, 600M‑MNN, 1B‑MNN
Stable‑Diffusion‑v1‑5‑MNN‑OpenCL (iOS not supported)
Installation
Android
git clone https://github.com/alibaba/MNN.git
cd project/android
mkdir build_64
cd build_64
../build_64.sh "-DMNN_LOW_MEMORY=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_USE_LOGCAT=true -DMNN_OPENCL=true -DLLM_SUPPORT_VISION=true -DMNN_BUILD_OPENCV=true -DMNN_IMGCODECS=true -DLLM_SUPPORT_AUDIO=true -DMNN_BUILD_AUDIO=true -DMNN_BUILD_DIFFUSION=ON -DMNN_SEP_BUILD=ON"
find . -name "*.so" -exec cp {} ../apps/MnnLlmApp/app/src/main/jniLibs/arm64-v8a/ \;
cd ../apps/MnnLlmApp/
./gradlew installDebug
iOS
git clone https://github.com/alibaba/MNN.git
cd MNN
sh package_scripts/ios/buildiOS.sh "-DMNN_ARM82=true -DMNN_LOW_MEMORY=true -DMNN_SUPPORT_TRANSFORMER_FUSE=ON -DMNN_BUILD_LLM=true -DMNN_METAL=ON -DMNN_BUILD_DIFFUSION=ON -DMNN_BUILD_OPENCV=ON -DMNN_IMGCODECS=ON -DMNN_OPENCL=OFF -DMNN_SEP_BUILD=OFF"
mv MNN-iOS-CPU-GPU/Static/MNN.framework ./apps/iOS/MNNLLMChat/MNN.framework
# Add MNN.framework to the Xcode project and configure signing
Desktop (Windows/macOS/Linux)
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build
cd build
cmake .. -DMNN_LOW_MEMORY=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DBUILD_MLS=true
make -j16
# Optional flags: -DMNN_AVX512=ON, -DMNN_METAL=ON
Command‑Line Interface (mls)
mls list – list downloaded models.
mls search – search Hugging Face for supported models.
mls download <model> – download a model (supports ModelScope and HF mirrors).
mls run -c <config.json> – start a chat session from the terminal.
mls serve -c <config.json> – launch a local OpenAI‑compatible API server.
mls benchmark – run simple llama.cpp‑compatible performance benchmarks.
Integration with Third‑Party Clients
Clients such as Chatbox or LobeChat can be pointed at the local mls serve endpoint using /chat/completions as the API path.
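Because the server is OpenAI‑compatible, any client that can issue a standard chat‑completions POST will work. A minimal Python sketch of such a request (the host, port, and model name below are assumptions; check the output of `mls serve` for the actual endpoint):

```python
import json
import urllib.request

# Assumed local endpoint; substitute whatever `mls serve` reports on startup.
BASE_URL = "http://127.0.0.1:8000"

def build_chat_request(messages, model="local-model"):
    """Build an OpenAI-style request against the /chat/completions path."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires a running `mls serve`:
# req = build_chat_request([{"role": "user", "content": "Hello"}])
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
```

GUI clients like Chatbox or LobeChat do the equivalent internally once you set their API base URL to the local server.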
Resources
GitHub repository: https://github.com/alibaba/MNN
Android app README (release links): https://github.com/alibaba/MNN/blob/master/project/android/apps/MnnLlmApp/README.md#releases
Desktop binary (macOS) download: https://meta.alicdn.com/data/mnn/mls.zip