What Is UE8M0? Unpacking FP8 and Fixed‑Point Numbers Behind DeepSeek V3.1
This article explains the meaning of UE8M0 by introducing fixed‑point (INT8) and floating‑point representations, showing how integers and decimals are stored in binary, describing the limitations of fixed‑point, the advantages of floating‑point scientific notation, and detailing the emerging FP8 formats such as E4M3 and E5M2 used in modern AI hardware.
What Is UE8M0?
DeepSeek V3.1 sparked a comment about the term UE8M0, which leads to a discussion of numeric representations in computers.
1. Fixed‑Point Numbers
To store the integer 66 in a computer, we convert it to binary: 0100 0010. This 8‑bit binary integer is called INT8 .
For a decimal like 13.25 , we convert the integer part (13) to binary ( 1101) and the fractional part (0.25) to binary ( 0100), yielding 1101.0100. Storing this as an 8‑bit fixed‑point number means the first four bits represent the integer part and the last four bits represent the fractional part.
Fixed‑point representation is simple but inflexible: the position of the decimal point is fixed, limiting either the range (if the point is near the integer side) or the precision (if the point is near the fractional side).
2. Floating‑Point Numbers
Floating‑point numbers solve the fixed‑point limitation by allowing the decimal point to “float.” Using scientific notation, a number is expressed as a mantissa multiplied by a power of two. For example, the decimal 235 with a three‑digit mantissa and exponent –3 is represented as 235 × 2⁻³.
Adjusting the exponent changes the range (larger exponent) or the precision (smaller exponent). This flexible format is called a floating‑point number .
3. FP8 (Eight‑Bit Floating‑Point)
Designing an 8‑bit floating‑point format, we allocate bits as follows: 1 sign bit, 4 exponent bits, and 3 mantissa bits. This format is called Float Point 8 (FP8) and abbreviated as FP8 . The specific allocation E4M3 (4 exponent bits, 3 mantissa bits) is one of the two FP8 formats supported by NVIDIA GPUs; the other is E5M2.
Different allocations of exponent and mantissa bits create various FP8 formats, each balancing range and precision. For example, E5M2 has more exponent bits (larger range) but fewer mantissa bits (lower precision).
4. Historical Background
The most common floating‑point formats are FP32 (single precision) and FP64 (double precision), used in languages like Java as float and double. As deep learning grew, 32‑bit became too large, leading to 16‑bit formats such as FP16 and BF16 . With the rise of large models, storage was further compressed, giving rise to the two FP8 formats and even more extreme variants.
These formats serve different purposes—some store weights, others gradients or scaling factors—but none is universally superior; they are engineering trade‑offs to squeeze performance from modern AI hardware.
5. Final Thoughts
The author reflects that current AI engineering optimizations, such as precise bit‑level tweaks, may one day be viewed as a historical curiosity, much like early programmers optimized for limited memory and storage.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
