Are Transformers Truly Invertible? Uncovering Injectivity and the SIPIT Algorithm

A recent study demonstrates that mainstream Transformer language models are mathematically injective and practically invertible, with large‑scale experiments confirming no hidden‑state collisions and a new SIPIT algorithm achieving 100% input reconstruction across text and code.


Injectivity of Transformer Language Models

The paper Language Models are Injective and Hence Invertible demonstrates that mainstream Transformer‑based language models map each distinct input sequence to a unique hidden‑state representation. Mathematically, the forward pass of these models is an injective function, which implies that the transformation can be reversed in principle. Consequently, hidden states should be viewed as lossless re‑encodings of the original tokens rather than compressed semantic abstractions.
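
To make the claim concrete, here is a minimal sketch (assuming the HuggingFace `transformers` package and the public `gpt2` checkpoint; `last_hidden` is an illustrative helper, not from the paper) that feeds two prompts differing in a single token through the model and compares their final hidden states:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load a small public checkpoint; any causal Transformer behaves the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def last_hidden(prompt: str) -> torch.Tensor:
    """Return the final-layer hidden state of the last token."""
    ids = tokenizer(prompt, return_tensors="pt")
    return model(**ids).last_hidden_state[0, -1]

h1 = last_hidden("The cat sat on the mat.")
h2 = last_hidden("The cat sat on the rug.")

# Injectivity predicts that distinct inputs map to distinct states,
# so this distance should be strictly positive.
print(torch.dist(h1, h2).item())
```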


Empirical Verification of Injectivity

Six representative models were evaluated: GPT‑2, Gemma‑3, LLaMA‑3.1, Mistral, Phi‑4‑mini, and TinyStories. For each model the authors sampled 100,000 inputs from diverse corpora (Wikipedia, C4, The Pile, and a large Python code collection) and extracted the final‑token hidden state from every layer. Pairwise Euclidean distances were computed for more than 5 × 10⁹ input pairs, with a collision defined as a distance below 10⁻⁶. No collisions were observed in any layer of any model, even after generating all possible permutations of the 10 most semantically similar samples (over 3 × 10¹² comparisons). These extensive tests support practical injectivity across scales and architectures.
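
A small-scale version of this collision scan can be sketched as follows. The prompt list is a stand-in for the paper's corpora, `min_pairwise_distance` is an illustrative helper, and `last_hidden` is reused from the previous sketch:

```python
import torch

def min_pairwise_distance(states: torch.Tensor) -> float:
    """states: (n, d) matrix of hidden states; returns the smallest
    Euclidean distance over all n*(n-1)/2 distinct pairs."""
    d = torch.cdist(states, states)   # (n, n) distance matrix
    d.fill_diagonal_(float("inf"))    # ignore zero self-distances
    return d.min().item()

# Stand-in corpus; the paper samples 100,000 inputs per model.
prompts = ["first example", "second example", "third example"]
states = torch.stack([last_hidden(p) for p in prompts])

COLLISION_THRESHOLD = 1e-6  # the paper's collision criterion
assert min_pairwise_distance(states) > COLLISION_THRESHOLD
```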


SIPIT: Sequential Inverse Prompt via Iterative Updates

The authors introduce SIPIT, an algorithm that reconstructs the original input solely from hidden states. SIPIT exploits the causal structure of Transformers: the hidden state at position t depends only on tokens 1…t. By iteratively updating a candidate sequence to minimise the discrepancy between its forward pass and the observed hidden states, SIPIT recovers the exact input in time linear in sequence length. Experiments on both natural‑language sentences and Python code snippets achieve 100% reconstruction accuracy, and runtime is orders of magnitude faster than brute‑force enumeration.
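
The sequential idea can be illustrated with a deliberately naive sketch. This is not the paper's optimised update rule: it simply scans the vocabulary at each position and keeps the one token whose forward pass reproduces the observed hidden state, relying on causality to fix the prefix first (`sipit_like_inversion` and the 10⁻⁶ tolerance are illustrative choices):

```python
import torch

@torch.no_grad()
def sipit_like_inversion(model, target_states: torch.Tensor,
                         vocab_size: int, tol: float = 1e-6) -> list[int]:
    """Recover a token sequence from its per-position final-layer hidden
    states. Causality means the state at position t depends only on
    tokens 1..t, so tokens can be resolved strictly left to right."""
    recovered: list[int] = []
    for t in range(target_states.shape[0]):
        for candidate in range(vocab_size):
            ids = torch.tensor([recovered + [candidate]])
            state = model(ids).last_hidden_state[0, t]
            # Injectivity guarantees at most one candidate reproduces
            # the observed state at this position.
            if torch.dist(state, target_states[t]) < tol:
                recovered.append(candidate)
                break
        else:
            raise RuntimeError(f"no matching token at position {t}")
    return recovered
```

Even this brute-force variant collapses the search space from exponentially many candidate sequences to at most one vocabulary scan per position, which is where the linear-in-length behaviour comes from; the paper's actual procedure is reported to be far faster still.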


Limitations, Numerical Considerations, and Privacy Implications

While the theoretical results assume exact arithmetic, real‑world deployments involve floating‑point rounding, quantisation, and stochastic training dynamics. These factors can introduce tiny deviations that may prevent perfect reconstruction in practice, even though no collisions were observed in the extensive empirical suite. The authors caution that hidden activations effectively contain the raw input data; therefore, exposing intermediate states can leak user information. Secure handling of activations and careful assessment of model‑compression or distillation pipelines are recommended to mitigate privacy risks.

Tags: Transformer · Injectivity · Invertibility · SIPIT
Written by Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.