Why Large Language Models Miss Simple Addition: Iso‑Raw‑Sum Trajectories Reveal the Geometry of Errors

Despite excelling at complex reasoning, LLMs often err on multi‑digit addition. Probing shows that the correct answer is often still present in the hidden states, and the authors identify a structured geometric manifold (digit basins, carry fibers, and Iso‑Raw‑Sum trajectories) that explains how errors arise through noisy quantization at decision boundaries.

Machine Heart

Background

Large language models (LLMs) excel at complex reasoning but frequently make errors on basic multi‑digit addition. Probing studies reveal that hidden states often still contain the correct answer, suggesting the mistake occurs during conversion from internal representation to output.

Probe Versatility

Lightweight probes were trained on the residual stream of Qwen3‑4B while it performed 10,000 three‑operand, 10‑digit addition problems. For each generation step the probes decoded six arithmetic variables: ground‑truth digit, model output digit, correctness flag, raw sum of the current column, input carry, and carry potential. All six signals could be extracted from the same hidden state, demonstrating that a single representation simultaneously encodes multiple arithmetic facts.
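The idea that several probes can read different variables out of one vector can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the hidden states are synthetic (not Qwen3-4B activations), the layout gives the digit and the carry their own axes so the toy is separable, and the nearest-class-mean probe is just one simple kind of lightweight probe, since the article does not describe the probe architecture. Only two of the six variables are decoded.

```python
import random

DIM_DIGIT, DIM_CARRY = 10, 3  # toy layout: 10 digit axes followed by 3 carry axes


def toy_hidden_state(digit, carry, rng, noise=0.1):
    """Synthetic stand-in for a residual-stream vector (not real model data)."""
    h = [rng.gauss(0.0, noise) for _ in range(DIM_DIGIT + DIM_CARRY)]
    h[digit] += 1.0                # the digit is written on its own axis
    h[DIM_DIGIT + carry] += 1.0    # the input carry is written on a separate axis
    return h


def fit_class_means(states, labels):
    """Nearest-class-mean probe: store the mean hidden state of each label."""
    sums, counts = {}, {}
    for h, y in zip(states, labels):
        acc = sums.setdefault(y, [0.0] * len(h))
        for i, x in enumerate(h):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def probe(means, h):
    """Decode a label by picking the closest class mean."""
    def sq_dist(m):
        return sum((a - b) ** 2 for a, b in zip(h, m))
    return min(means, key=lambda y: sq_dist(means[y]))


def accuracy(means, states, labels):
    hits = sum(probe(means, h) == y for h, y in zip(states, labels))
    return hits / len(labels)


rng = random.Random(0)
samples = [(rng.randrange(10), rng.randrange(3)) for _ in range(600)]
states = [toy_hidden_state(d, c, rng) for d, c in samples]
digits = [d for d, _ in samples]
carries = [c for _, c in samples]

train, test = slice(0, 400), slice(400, 600)
# Two independent probes are fitted on the SAME hidden states.
digit_probe = fit_class_means(states[train], digits[train])
carry_probe = fit_class_means(states[train], carries[train])

digit_acc = accuracy(digit_probe, states[test], digits[test])
carry_acc = accuracy(carry_probe, states[test], carries[test])
print(f"digit probe accuracy: {digit_acc:.2f}, carry probe accuracy: {carry_acc:.2f}")
```

With real activations the variables would not sit on separate axes, so probe accuracy would have to be measured rather than assumed.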

Iso‑Raw‑Sum Trajectory (IRST)

UMAP was applied to the final‑layer hidden states, with the digit unembedding vectors used as anchors for digits 0–9. The visualization revealed a hierarchical geometric manifold:

Digit basins: hidden states cluster around ten basins corresponding to digits 0–9; proximity to a basin increases the likelihood of outputting that digit.

Carry fibers: within each basin, states further split according to the input carry (e.g., “no carry → 1”, “carry 1 → 2”, “carry 2 → 3”).
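Why proximity to a basin raises the probability of a digit can be sketched with the usual readout, in which each digit's logit is the dot product of the hidden state with that digit's unembedding vector. The 2-D anchors on a circle, the `state_at` helper, and the temperature are hypothetical choices for illustration; real unembedding vectors are high-dimensional and not arranged this way.

```python
import math


def circle_point(position):
    """Point on the unit circle; `position` is measured in digit units (0..10)."""
    angle = 2 * math.pi * position / 10
    return (math.cos(angle), math.sin(angle))


# Hypothetical 2-D "unembedding" anchors, one per digit.
ANCHORS = [circle_point(d) for d in range(10)]


def state_at(position):
    """Toy hidden state located between digit anchors (an assumption)."""
    return circle_point(position)


def digit_probabilities(h, temperature=0.2):
    """Softmax over dot products between the state and each digit anchor."""
    logits = [(h[0] * a[0] + h[1] * a[1]) / temperature for a in ANCHORS]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = digit_probabilities(state_at(2.8))   # inside basin 3, leaning toward 2
best = max(range(10), key=lambda d: probs[d])
print(best, round(probs[3], 3), round(probs[2], 3))

boundary = digit_probabilities(state_at(2.5))  # midway between basins 2 and 3
print(round(boundary[2], 3), round(boundary[3], 3))
```

At the midpoint the two neighbouring digits are equally likely, which is the kind of basin boundary the later sections are concerned with.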

Some samples lie on continuous lines that cross adjacent digit basins. These lines constitute an Iso‑Raw‑Sum Trajectory (IRST): a set of internal states that share the same raw sum (the sum of the current column’s digits) but differ in carry state. With three operands the input carry can be 0, 1, or 2, so for a raw sum of 1 the three possible outcomes are:

Input carry 0 → output 1

Input carry 1 → output 2

Input carry 2 → output 3

Geometrically the three points lie on a single continuous trajectory that passes through the basins for 1, 2, and 3. The overall representation can be visualized as a terrain map: digit basins are valleys, IRSTs are ridgelines, and carry potential pushes the representation along these ridgelines toward a basin.
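The arithmetic behind an IRST is ordinary column addition: the output digit is the raw sum plus the input carry, taken modulo 10. A minimal sketch follows; the function names are illustrative and not taken from the paper's code.

```python
def output_digit(raw_sum, input_carry):
    """Digit written in a column, given its raw sum and the incoming carry."""
    return (raw_sum + input_carry) % 10


def iso_raw_sum_trajectory(raw_sum, max_carry=2):
    """(input carry, output digit) pairs that share one raw sum.

    max_carry=2 reflects three-operand addition, where a column can
    receive a carry of 0, 1, or 2.
    """
    return [(c, output_digit(raw_sum, c)) for c in range(max_carry + 1)]


print(iso_raw_sum_trajectory(1))  # [(0, 1), (1, 2), (2, 3)]
print(iso_raw_sum_trajectory(9))  # [(0, 9), (1, 0), (2, 1)]
```

The second call shows a trajectory that wraps around from digit 9 to digits 0 and 1, so adjacent points on a trajectory are always neighbouring digits modulo 10.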

Figure: UMAP macro structure

Noisy Quantization Model

The paper introduces a Noisy Quantization Model to explain why errors still occur. It defines Carry Potential (CP) as a continuous real‑valued signal that aggregates the “carry pressure” from all lower‑order digits to the right of the current position. Unlike the discrete input carry, CP is not an integer. The formal definition is shown in the following image:

Figure: Carry Potential formula
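Since the formula itself is only available as an image, the sketch below uses one formalization that matches the verbal description; it is an assumption and may differ from the paper's definition. It takes the Carry Potential at a position to be the sum of all lower-order raw column sums, each scaled by the appropriate power of ten, so that its floor equals the exact input carry. The function names are hypothetical.

```python
import math
from fractions import Fraction


def raw_column_sums(operands, width):
    """Per-column digit sums; index 0 is the least significant column."""
    return [sum((n // 10**j) % 10 for n in operands) for j in range(width)]


def carry_potential(operands, position):
    """ASSUMED definition: CP = sum over lower columns j of s_j / 10**(position - j)."""
    sums = raw_column_sums(operands, position)
    return sum(
        (Fraction(s, 10 ** (position - j)) for j, s in enumerate(sums)),
        Fraction(0),
    )


def true_input_carry(operands, position):
    """Exact integer carry entering `position`, computed from the lower digits."""
    return sum(n % 10**position for n in operands) // 10**position


operands = [58, 67, 74]
for position in (1, 2):
    cp = carry_potential(operands, position)
    print(position, float(cp), math.floor(cp), true_input_carry(operands, position))
```

For 58 + 67 + 74 the value at the hundreds position is 1.99 under this definition, which sits just below an integer boundary.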

When CP is far from an integer boundary (e.g., 1.50), small internal noise does not change the quantized carry. Near a boundary (e.g., 0.99 or 1.01), tiny perturbations can flip the quantized result, leading to the typical ±1 addition errors. This phenomenon is called geometric slippage: the hidden state drifts slightly along an IRST and crosses a basin boundary, causing the final token to fall into the wrong digit region.

Figure: Error rate vs. Carry Potential
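The boundary effect can be reproduced with a small simulation. The Gaussian noise, its standard deviation, and the use of `floor` as the quantizer are assumptions chosen for illustration; the article does not state the noise model used in the paper.

```python
import math
import random


def flip_rate(cp, noise_std=0.05, trials=20000, seed=0):
    """Fraction of noisy samples whose quantized carry differs from the clean one."""
    rng = random.Random(seed)
    clean = math.floor(cp)
    flips = sum(
        math.floor(cp + rng.gauss(0.0, noise_std)) != clean
        for _ in range(trials)
    )
    return flips / trials


for cp in (1.50, 1.10, 1.01, 0.99):
    print(f"CP={cp:.2f}  flip rate={flip_rate(cp):.3f}")
```

Under these assumptions the flip rate is essentially zero at 1.50 and rises sharply as CP approaches 1.00 from either side, and every flip changes the carry by exactly one.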

Double‑Stream Consistency Check

Leveraging the internal signals, a runtime correction method called Double‑Stream Consistency Check was designed. From the same final‑layer hidden state two signals are decoded:

Local signal: the raw sum of the current column.

Global signal: the aggregated Carry Potential from the right‑hand context.

If the model’s predicted digit is consistent with both signals, the output is kept; otherwise the raw sum and the quantized Carry Potential are recombined to produce a corrected candidate. In the reported experiments, this method reaches higher token‑level accuracy than both the model’s original output and the baselines it was compared against.

Figure: Performance comparison
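The decision rule described above can be sketched as a small function. This is a hedged illustration, not the authors' implementation: it assumes the two decoded signals arrive as plain numbers, that the Carry Potential is quantized with `floor`, and that the corrected candidate is the raw sum plus that carry, modulo 10. The name `consistency_check` is hypothetical.

```python
import math


def consistency_check(predicted_digit, raw_sum, carry_potential):
    """Return (digit, was_corrected) for one generated digit.

    raw_sum         -- local signal: decoded sum of the current column
    carry_potential -- global signal: decoded continuous carry pressure
    """
    carry = math.floor(carry_potential)   # assumed quantization rule
    candidate = (raw_sum + carry) % 10    # recombine the two streams
    if predicted_digit == candidate:
        return predicted_digit, False     # consistent: keep the model's output
    return candidate, True                # inconsistent: use the recombined digit


print(consistency_check(9, 18, 1.9))  # (9, False)
print(consistency_check(8, 18, 1.9))  # (9, True)
```

In practice both signals would come from probes and would carry their own decoding error, so the check can only be as reliable as those probes.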

Conclusion

The study reframes LLM arithmetic as a geometric problem. Hidden states form a hierarchical manifold composed of digit basins, carry fibers, and IRSTs. Probes reveal not only that arithmetic information is present but also that it is geometrically separable. Errors arise because the continuous representation is quantized near decision boundaries, leading to geometric slippage.

Paper: https://arxiv.org/abs/2606.03645

Code: https://github.com/RL-MIND/Shape-of-Addition
