Tagged articles

edge inference

6 articles · Page 1 of 1

Jul 18, 2026 · Artificial Intelligence

Six Key AI Trends Unveiled at WAIC 2026: From Usable to Handy Models

The 2026 World AI Conference in Shanghai highlighted six major trends—including native multimodal models, engineering‑grade AI agents, powerful edge NPU inference, world‑model‑driven embodied intelligence, practical AI safety frameworks, and vertically‑focused medium‑scale models—each illustrating a shift from experimental prototypes to production‑ready, finely engineered solutions.

AI SafetyAI agentsedge inference

0 likes · 15 min read

Six Key AI Trends Unveiled at WAIC 2026: From Usable to Handy Models

AI Engineering

Jan 21, 2026 · Artificial Intelligence

Running Large Language Models on Phones: Liquid AI’s LFM2.5‑1.2B‑Thinking Fits in 900 MB

Liquid AI’s LFM2.5‑1.2B‑Thinking model runs entirely on a smartphone with only 900 MB of memory, scores 88 on MATH‑500, 69 on Multi‑IF, and 57 on BFCLv3 benchmarks, outperforms larger rivals, and achieves real‑time speeds on Snapdragon 8 Elite and AMD Ryzen 9 3950X, signaling a shift toward edge AI.

BenchmarkLFM2.5Ryzen

0 likes · 4 min read

Running Large Language Models on Phones: Liquid AI’s LFM2.5‑1.2B‑Thinking Fits in 900 MB

DaTaobao Tech

Jul 12, 2023 · Artificial Intelligence

Optimizing ChatGLM-6B Deployment with MNN: Model Conversion, Quantization, and Edge Inference

The article details a workflow that converts the PyTorch ChatGLM‑6B model to MNN, splits and compresses embeddings, applies int4/int8 quantization, supports dynamic shapes, and uses hybrid GPU/CPU or CPU‑only loading to enable low‑memory edge inference on PCs and mobile devices with competitive token‑per‑second performance.

ChatGLMLLMMNN

0 likes · 16 min read

Optimizing ChatGLM-6B Deployment with MNN: Model Conversion, Quantization, and Edge Inference

Code DAO

May 21, 2022 · Artificial Intelligence

How Quantization and Fusion Accelerate CNN Inference on Edge Devices

The article explains CNN inference optimization by applying PyTorch quantization and module‑fusion techniques, compares model size and latency before and after quantization, shows code for building, quantizing, and fusing a simple CNN, and presents benchmark results on CPU, highlighting a four‑fold size reduction and up to 1.7× speed‑up.

CNNModel CompressionPerformance benchmarking

0 likes · 11 min read

How Quantization and Fusion Accelerate CNN Inference on Edge Devices

Tencent Music Tech Team

Apr 30, 2020 · Mobile Development

Edge Deep Learning Inference on Mobile Devices: Challenges, Hardware Diversity, and Optimization Strategies

Edge deep learning inference on mobile devices faces hardware and software fragmentation, diverse CPUs, GPUs, DSPs, and NPUs, and limited programmability; optimization techniques such as model selection, quantization, and architecture‑specific tuning enable real‑time performance, with most inference on CPUs, GPUs offering 5–10× speedups, and co‑processor support varying across Android and iOS.

DSPGPU programmingNPU

0 likes · 17 min read

Edge Deep Learning Inference on Mobile Devices: Challenges, Hardware Diversity, and Optimization Strategies

Alibaba Cloud Developer

May 21, 2019 · Artificial Intelligence

How Alibaba’s Offline AI Advances Model Compression and Edge Inference

Alibaba’s Machine Intelligence Lab shares two years of breakthroughs in offline AI, detailing low‑bit quantization, unified sparsity frameworks, hardware‑software co‑design, lightweight networks, and on‑device detection, alongside standardized training tools, multi‑platform inference engines, and productized edge solutions such as smart boxes and integrated cameras.

AIModel Compressionedge inference