What Is UE8M0 FP8 and Why It’s Boosting China’s Next‑Gen AI Chips
The article explains the UE8M0 FP8 precision format, its MXFP8 origins, how it reduces bandwidth and power consumption, and why Chinese AI chip makers like Cambricon, HaiGuang and Moore Threads are rapidly adopting it, signaling a shift toward domestic AI hardware independence.
What Is UE8M0 FP8?
The name UE8M0 FP8 splits into two parts: FP8 is the 8‑bit element format, while UE8M0 describes how the per‑block scaling factor is encoded in the MXFP8 path.
MXFP8 is an 8‑bit microscaling block format defined in the Open Compute Project’s 2023 "Microscaling (MX) Formats Specification v1.0".
The Open Compute Project, launched in 2011 by Meta (formerly Facebook) together with Intel, Rackspace and others, aims to improve data‑center efficiency through open‑source hardware designs.
MXFP8 builds on FP8, which compresses a conventional floating‑point format into 8 bits.
The core idea of MXFP8 is to split a tensor into fixed‑length blocks, assign each block a power‑of‑two scaling factor, divide all numbers in the block by that factor, and then store them as FP8.
This block‑level scaling (instead of whole‑tensor scaling) retains the 8‑bit width while expanding the usable dynamic range by dozens of times.
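The block‑scaling recipe above can be sketched in a few lines of NumPy. This is a minimal illustration, not DeepSeek's or any vendor's implementation: the block size of 32 follows the OCP MX spec, but the scale‑selection rule (rounding the block maximum up to a power of two that fits E4M3's range) and all function names are illustrative assumptions, and the final cast to an actual FP8 bit pattern is omitted.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3

def quantize_block_scaled(x, block=32):
    """Illustrative MXFP8-style quantization: split x into fixed-length
    blocks, give each block a power-of-two scale, and scale the elements
    into FP8 range (the cast to real FP8 bits is omitted here)."""
    x = np.asarray(x, dtype=np.float32).reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Pick the smallest power-of-two scale so the block max fits in E4M3.
    exp = np.ceil(np.log2(np.maximum(amax, 2.0**-127) / E4M3_MAX))
    scale = 2.0**exp                             # UE8M0-style: pure power of two
    q = np.clip(x / scale, -E4M3_MAX, E4M3_MAX)  # would be cast to FP8 here
    return q, scale

def dequantize(q, scale):
    # Restoration is just a power-of-two multiply per block.
    return q * scale

vals = np.linspace(-1000, 1000, 64)   # two blocks of 32
q, scale = quantize_block_scaled(vals)
recon = dequantize(q, scale).ravel()
# Without the FP8 cast, the power-of-two round trip is lossless:
print(np.allclose(recon, vals))
```

In a real pipeline the `q` values would be rounded to FP8, which is where the (bounded) quantization error enters; the per‑block scale itself introduces none, because dividing and multiplying by a power of two is exact.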
The scaling factor itself occupies 8 bits, which an FP8‑style format can allocate among a sign bit, exponent bits and mantissa bits as the designer sees fit. In UE8M0 the sign bit is omitted (U = unsigned), all 8 bits go to the exponent and none to the mantissa (E8M0); common element formats such as E4M3 and E5M2 instead keep a sign bit and split the remaining 7 bits between exponent and mantissa.
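These bit allocations can be made concrete with toy decoders. The sketch below is illustrative only: the UE8M0 decoding (bias 127, code 0xFF reserved for NaN) follows the OCP MX spec, while the E5M2 decoder handles only normal numbers for brevity, omitting subnormals, infinities and NaNs.

```python
def decode_ue8m0(byte):
    """UE8M0: no sign bit, 8 exponent bits, 0 mantissa bits.
    Every code is a pure power of two; 0xFF is reserved for NaN."""
    if byte == 0xFF:
        return float("nan")
    return 2.0 ** (byte - 127)

def decode_e5m2_normal(byte):
    """E5M2 (normal numbers only): 1 sign bit, 5 exponent bits
    (bias 15), 2 mantissa bits."""
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> 2) & 0x1F
    mant = byte & 0x3
    return sign * (1 + mant / 4) * 2.0 ** (exp - 15)

print(decode_ue8m0(127))   # 2^0 = 1.0
print(decode_ue8m0(0))     # 2^-127, the smallest scale
```

The contrast is the point: UE8M0 spends every bit on range, so a scale can only ever be a power of two, whereas E4M3/E5M2 trade range for mantissa precision in the elements themselves.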
DeepSeek’s earlier open‑source FP8 GEMM kernel project DeepGEMM already supports UE8M0, but it targets NVIDIA GPUs and the CUDA ecosystem.
Using a full‑exponent scaling factor brings two main benefits: restoration requires only a power‑of‑two multiplication (no floating‑point multiply, normalization or rounding), shortening the critical path; and the dynamic range (2⁻¹²⁷ to 2¹²⁷, with one code reserved for NaN) easily covers the span block scaling needs.
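The "no floating‑point multiply" claim can be demonstrated directly: applying a power‑of‑two scale to a binary float amounts to an integer add on its exponent field. The sketch below does this on an IEEE 754 binary32 value via bit manipulation; it is a hardware intuition, not production code, and overflow/underflow of the exponent field is deliberately unhandled.

```python
import struct

def scale_by_pow2(x, k):
    """Multiply a positive normal float32 by 2**k by adding k to the
    8-bit exponent field (bits 23..30) -- no mantissa multiply,
    normalization, or rounding involved. Overflow handling omitted."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += k << 23                   # integer add to the exponent field
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(scale_by_pow2(3.5, 4))   # 3.5 * 2^4  = 56.0
print(scale_by_pow2(3.5, -2))  # 3.5 * 2^-2 = 0.875
```

A fractional scale like E4M3's, by contrast, would force a full floating‑point multiply with rounding on every restored value, which is exactly the cost UE8M0 is designed to avoid.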
UE8M0 also avoids the overflow or underflow issues of single‑scale FP8, dramatically reducing information loss while keeping 8‑bit precision.
Which Domestic Chips Are Optimized for DeepSeek?
Many Chinese AI accelerators still use FP16/BF16 + INT8 pipelines and lack native E4M3/E5M2 FP8 units, but new chips such as Moore Threads' MUSA 3.1 GPU and Chip‑Origin's VIP9000 NPU list "native FP8" or "Block FP8" support and have jointly validated the UE8M0 format with DeepSeek and other vendors.
Cambricon's MLU370‑S4, Siyuan 590 and the latest 690 series already support FP8; the news drove a stock surge of nearly 14% and lifted Cambricon's market cap to the top of Shanghai's STAR Market.
HaiGuang: its Deep‑Compute DCU accelerators support FP8.
MuXi: the Xiyun C600 (released in July) supports FP8.
Zhonghao Xinying: its "Shana" TPU AI chip supports FP8.
Moore Threads: the flagship MTT S5000 GPU supports FP8.
Other manufacturers such as Huawei’s Ascend line are expected to add native FP8 support by late 2025, with the upcoming 910D likely to include it.
This indicates that domestic AI is moving toward a hardware‑software co‑design stage, substantially reducing reliance on foreign compute power from NVIDIA, AMD and others.
Because UE8M0 FP8 reduces bandwidth and power consumption while improving throughput, the same hardware can run larger models, sharply raising the cost‑performance ratio of Chinese chips.
The collaboration between DeepSeek and domestic chip makers mirrors the historic Wintel alliance, creating a unified ecosystem that strengthens the competitive position of Chinese AI hardware.
One More Thing
The official announcement only mentioned UE8M0 FP8 in a single sentence: "DeepSeek‑V3.1 uses UE8M0 FP8 scale parameters," a note that was hidden deep within a long list of feature updates.
IT Services Circle
