Fundamentals 10 min read

Unmasking the Algorithm Myth: Russ Cox’s 15‑Year Quest to Simplify Float‑to‑Decimal Conversion

Russ Cox revisits the half‑century‑old challenge of converting binary floating‑point numbers to shortest, correct decimal strings, presenting a new unrounded‑scaling algorithm that outperforms Dragonbox and Ryū while remaining simple, and explains its integration into Go 1.27 with detailed benchmarks and proofs.

TonyBai
TonyBai
TonyBai
Unmasking the Algorithm Myth: Russ Cox’s 15‑Year Quest to Simplify Float‑to‑Decimal Conversion

Russ Cox, a core Go contributor, revisits the long‑standing problem of converting a binary float64 value to a human‑readable decimal string that is both correct and the shortest possible representation. He frames the difficulty as an “impossible triangle” of three requirements.

The Impossible Triangle

Correctness : Parse(Print(f)) == f must always hold; no precision may be lost.

Shortest : The output string must be the shortest among all strings that round‑trip to the original value (e.g., 0.3 instead of 0.2999999999999999889).

Speed : Conversion speed directly impacts throughput in large‑scale data processing such as JSON serialization.

Historical Evolution

Dragon4 (1990): Guarantees correctness and shortest output but relies on slow big‑integer arithmetic.

Grisu3 (2010): Used in Google V8, extremely fast but does not guarantee shortest output; about 0.5% of cases fall back to a slower path.

Ryū (2018) & Dragonbox (2020): Achieve correctness and shortest output without big‑integer operations by using complex table‑lookup tricks, resulting in peak performance but very intricate code.

Cox’s goal is to combine Ryū‑level speed and correctness with the simplicity of the 2011 approach.

Core Technique – “Unrounded Scaling”

The new algorithm is built around a mathematical primitive called Fast Unrounded Scaling, which works with an “unrounded number” ⟨x⟩ consisting of three parts:

Integer part : floor(x) ½ bit : a flag indicating whether the fractional part is ≥ 0.5 ( x - floor(x) >= 0.5)

Sticky bit : a flag indicating whether any non‑zero fractional remainder remains ( x has a non‑zero tail)

This representation retains all information needed for correct round‑to‑even rounding while allowing cheap bitwise operations ( | and &) to manipulate the value until the final truncation step.

Scaling Magic

Floating‑point printing requires converting f = m * 2^e to a decimal form d * 10^p. Cox pre‑computes 128‑bit approximations of 10^p and multiplies m * 2^e by these tables. Remarkably, for 64‑bit floats he shows that a full 128‑bit product is unnecessary: computing only the high 64 bits of the product and using the low‑bit “sticky” information yields a completely correct result.

This insight, dubbed “Omit Needless Multiplications”, reduces the conversion to a handful of native 64‑bit multiplications.

From Theory to Go 1.27

Based on the primitive, Cox constructs an algorithm family:

FixedWidth : Fixed‑point printing (e.g., %.2f).

Shortest : Shortest‑representation printing (e.g., %g).

Parse : String‑to‑float parsing.

Performance Crush

Benchmarks on Apple M4 and AMD Ryzen 9 show:

Fixed‑width printing : The new uscale implementation outperforms glibc, double‑conversion, and even Ryū.

Shortest printing : Matches or exceeds the industry‑leading Dragonbox while having far simpler code.

Parsing : Beats the current fastest fast_float (Eisel‑Lemire) implementation.

Consequently, Go 1.27 will integrate this algorithm (or parts of it) directly into the standard library, giving fmt.Sprintf, json.Marshal and strconv.ParseFloat a noticeable speed boost without any code changes.

Proof by Computation

Cox also provides a full mathematical proof written in Ivy, an APL‑style language. Instead of using a formal verifier like Coq, he validates correctness by executing code that checks every possible float64 input, demonstrating “proof by computation”.

Conclusion

The 15‑year journey from the 2011 prototype to the 2026 breakthrough illustrates that extreme performance does not require extreme complexity. By returning to the mathematical roots of the problem, Cox delivers an algorithm that is both simple and fast, ready to become the new “heart” of Go’s standard library.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

PerformancealgorithmGofloat64floating-point conversionproof by computationunrounded scaling
TonyBai
Written by

TonyBai

Tony Bai's tech world (tonybai.com). Not satisfied with just "knowing how", we strive for mastery. Focused on Go language internals, high-quality engineering practices, and cloud‑native architecture, exploring cutting‑edge intersections of Go and AI. Gophers who pursue technology are welcome—follow me and evolve with Go.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.