Running Large Language Models on Phones: Liquid AI’s LFM2.5‑1.2B‑Thinking Fits in 900 MB

Liquid AI’s LFM2.5‑1.2B‑Thinking model runs entirely on a smartphone with only 900 MB of memory, scores 88 on MATH‑500, 69 on Multi‑IF, and 57 on BFCLv3 benchmarks, outperforms larger rivals, and achieves real‑time speeds on Snapdragon 8 Elite and AMD Ryzen 9 3950X, signaling a shift toward edge AI.

AI Engineering
AI Engineering
AI Engineering
Running Large Language Models on Phones: Liquid AI’s LFM2.5‑1.2B‑Thinking Fits in 900 MB

Liquid AI announced LFM2.5‑1.2B‑Thinking, a 1.2‑billion‑parameter language model designed for fully on‑device inference. The model requires just 900 MB of RAM, enabling it to run on any modern smartphone.

The architecture emphasizes concise inference training and generates an internal “thinking” trajectory before producing answers, which improves systematic problem solving. On the MATH‑500 benchmark, math reasoning rose from 63 to 88 points; instruction following improved from 61 to 69 on Multi‑IF; and tool‑use scores increased from 49 to 57 on BFCLv3.

Compared with the earlier Qwen3‑1.7B model, LFM2.5‑1.2B‑Thinking has 40 % fewer parameters yet matches or exceeds most benchmark results, while requiring fewer output tokens and less compute during testing.

Inference speed tests show the model decoding at 237 tokens per second on an AMD Ryzen 9 3950X, surpassing Granite‑4.0‑H‑1B (147 tok/s) and Qwen3‑17B (122 tok/s). Memory usage during inference is 853 MB, lower than the competing models.

On the Qualcomm Snapdragon 8 Elite mobile platform, the model reaches 70 tok/s, sufficient for real‑time interactive applications, demonstrating the feasibility of deploying sophisticated AI services directly on phones.

The model is publicly available via Hugging Face, LEAP, and Liquid Playground, and works out‑of‑the‑box with major inference frameworks such as llama.cpp, MLX, vLLM, and ONNX Runtime.

Community comments note that this level of on‑device inference shifts intelligence from centralized services to edge hardware, changing latency, privacy, and ownership dynamics. The release also makes it reasonable for major smartphone manufacturers to design AI‑focused devices around this model family, which includes base, instruction, thinking, Japanese‑optimized, vision‑language, and audio variants.

Developers can fine‑tune the model using TRL and Unsloth. Its hybrid architecture comprises ten dual‑gate LIV convolution blocks and six GQA blocks, delivering strong reasoning capability while keeping the footprint small.

The advancement mirrors the historical trajectory of mobile computing—from mainframes to PCs to smartphones—and suggests a similar decentralization trend for AI.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Mobile AIlarge language modelBenchmarkSnapdragonedge inferenceRyzenLFM2.5
AI Engineering
Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.