Tagged articles
5 articles
AI Engineering
Jan 21, 2026 · Artificial Intelligence

Running Large Language Models on Phones: Liquid AI’s LFM2.5‑1.2B‑Thinking Fits in 900 MB

Liquid AI’s LFM2.5‑1.2B‑Thinking model runs entirely on a smartphone with only 900 MB of memory. It scores 88 on MATH‑500, 69 on Multi‑IF, and 57 on BFCLv3, outperforms larger rivals, and achieves real‑time speeds on the Snapdragon 8 Elite and AMD Ryzen 9 3950X, signaling a shift toward edge AI.

Benchmark · LFM2.5 · Mobile AI
4 min read
DaTaobao Tech
Jul 12, 2023 · Artificial Intelligence

Optimizing ChatGLM-6B Deployment with MNN: Model Conversion, Quantization, and Edge Inference

The article details a workflow that converts the PyTorch ChatGLM‑6B model to MNN, splits and compresses the embeddings, applies int4/int8 quantization, and supports dynamic shapes. Hybrid GPU/CPU or CPU‑only loading then enables low‑memory edge inference on PCs and mobile devices with competitive tokens‑per‑second performance.

ChatGLM · LLM · MNN
16 min read
Code DAO
May 21, 2022 · Artificial Intelligence

How Quantization and Fusion Accelerate CNN Inference on Edge Devices

The article explains CNN inference optimization by applying PyTorch quantization and module‑fusion techniques, compares model size and latency before and after quantization, shows code for building, quantizing, and fusing a simple CNN, and presents benchmark results on CPU, highlighting a four‑fold size reduction and up to 1.7× speed‑up.

CNN · PyTorch · edge inference
11 min read
Tencent Music Tech Team
Apr 30, 2020 · Mobile Development

Edge Deep Learning Inference on Mobile Devices: Challenges, Hardware Diversity, and Optimization Strategies

Edge deep learning inference on mobile devices faces hardware and software fragmentation across diverse CPUs, GPUs, DSPs, and NPUs, along with limited programmability. Optimization techniques such as model selection, quantization, and architecture‑specific tuning enable real‑time performance. Most inference runs on CPUs, GPUs offer 5–10× speedups, and co‑processor support varies across Android and iOS.

DSP · GPU programming · NPU
17 min read
Alibaba Cloud Developer
May 21, 2019 · Artificial Intelligence

How Alibaba’s Offline AI Advances Model Compression and Edge Inference

Alibaba’s Machine Intelligence Lab shares two years of breakthroughs in offline AI, detailing low‑bit quantization, unified sparsity frameworks, hardware‑software co‑design, lightweight networks, and on‑device detection, alongside standardized training tools, multi‑platform inference engines, and productized edge solutions such as smart boxes and integrated cameras.

AI · edge inference · hardware-software co-design
16 min read