Are Transformers Truly Invertible? Uncovering Injectivity and the SIPIT Algorithm
A recent study demonstrates that mainstream Transformer language models are mathematically injective and practically invertible, with large‑scale experiments confirming no hidden‑state collisions and a new SIPIT algorithm achieving 100% input reconstruction across text and code.
Injectivity of Transformer Language Models
The paper Language Models are Injective and Hence Invertible demonstrates that mainstream Transformer‑based language models map each distinct input sequence to a unique hidden‑state representation. Mathematically, the forward pass of these models is an injective function, which implies that the transformation can be reversed in principle. Consequently, hidden states should be viewed as lossless re‑encodings of the original tokens rather than compressed semantic abstractions.
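The injectivity claim can be made concrete with a toy sketch: a map is injective on a set of inputs if no two distinct inputs produce the same output. The `toy_encoder` below is a hypothetical stand-in for a Transformer forward pass, not the paper's model; it only illustrates the property being claimed.

```python
# Minimal sketch of injectivity for a sequence-to-vector map.
# `toy_encoder` is a hypothetical stand-in for a real forward pass.
from itertools import combinations

def toy_encoder(tokens):
    # Positional weighting makes distinct sequences land on distinct
    # values, so this toy map is injective on small token tuples.
    return sum(t * (31 ** i) for i, t in enumerate(tokens))

def is_injective_on(inputs, f):
    # Injective on `inputs` means: no pair of distinct inputs collides.
    outputs = [f(x) for x in inputs]
    return all(outputs[i] != outputs[j]
               for i, j in combinations(range(len(inputs)), 2))

sequences = [(1, 2, 3), (3, 2, 1), (1, 2), (2, 1, 3)]
print(is_injective_on(sequences, toy_encoder))  # → True
```

If the forward pass is injective in this sense, reconstruction is possible in principle: each hidden state corresponds to exactly one input sequence.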
Empirical Verification of Injectivity
Six representative models were evaluated: GPT‑2, Gemma‑3, LLaMA‑3.1, Mistral, Phi‑4‑mini, and TinyStories. For each model the authors sampled 100 000 inputs from diverse corpora (Wikipedia, C4, The Pile, and a large Python code collection) and extracted the final‑token hidden state from every layer. Pairwise Euclidean distances were computed for more than 5 × 10⁹ input pairs. A collision was defined as a distance below 10⁻⁶. No collisions were observed in any layer of any model, even after generating all possible permutations of the 10 most semantically similar samples (over 3 × 10¹² comparisons). This exhaustive test confirms practical injectivity across scales and architectures.
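The collision test described above can be sketched as follows. Here `hidden` is random data standing in for real final-token hidden states, and the 10⁻⁶ threshold matches the collision definition in the text; the pairwise-distance trick via the squared-norm expansion is a standard vectorisation, not necessarily the authors' implementation.

```python
# Sketch of the collision test: compute all pairwise Euclidean
# distances and count pairs closer than the 1e-6 threshold.
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.standard_normal((1000, 64))  # 1000 inputs, 64-dim states

# Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b.
sq = np.sum(hidden ** 2, axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * hidden @ hidden.T
np.fill_diagonal(d2, np.inf)               # ignore self-distances
collisions = np.sum(d2 < (1e-6) ** 2) // 2  # each pair counted twice

print(collisions)  # → 0
```

At the paper's scale (100 000 inputs, several billion pairs) the same computation would be done in blocks rather than as one dense matrix.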
SIPIT: Sequential Inverse Prompt via Iterative Updates
The authors introduce SIPIT, an algorithm that reconstructs the original input solely from hidden states. SIPIT exploits the causal structure of Transformers: the hidden state at position t depends only on tokens 1…t. By iteratively updating a candidate sequence to minimise the discrepancy between its forward pass and the observed hidden states, SIPIT recovers the exact input in linear time with respect to sequence length. Experiments on both natural‑language sentences and Python code snippets achieve 100% reconstruction accuracy, and runtime is orders of magnitude faster than brute‑force enumeration.
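The core idea, recovering tokens one position at a time by matching candidate forward passes against observed hidden states, can be sketched on a toy causal model. Everything here (`encode`, the cumulative-sum "model", the vocabulary size) is a simplified assumption for illustration, not the paper's implementation.

```python
# Simplified SIPIT-style sketch: since the hidden state at position t
# depends only on tokens 1..t, tokens can be recovered sequentially.
import numpy as np

VOCAB = 50
rng = np.random.default_rng(1)
E = rng.standard_normal((VOCAB, 8))  # toy embedding table

def encode(tokens):
    # Toy causal "model": the hidden state at position t is the
    # cumulative sum of embeddings up to t (depends only on 1..t).
    return np.cumsum(E[np.array(tokens)], axis=0)

def sipit_sketch(observed):
    recovered = []
    for t in range(observed.shape[0]):
        # Try each vocabulary item at position t; keep the one whose
        # forward pass reproduces the observed hidden state there.
        for tok in range(VOCAB):
            h = encode(recovered + [tok])
            if np.allclose(h[t], observed[t], atol=1e-8):
                recovered.append(tok)
                break
    return recovered

true_tokens = [3, 17, 42, 8]
print(sipit_sketch(encode(true_tokens)))  # → [3, 17, 42, 8]
```

Because each position requires only a vocabulary-sized sweep rather than search over all sequences, the work grows linearly with sequence length instead of exponentially, which is the source of the speedup over brute-force enumeration.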
Limitations, Numerical Considerations, and Privacy Implications
While the theoretical results assume exact arithmetic, real‑world deployments involve floating‑point rounding, quantisation, and stochastic training dynamics. These factors can introduce tiny deviations that may prevent perfect reconstruction in practice, even though no collisions were observed in the extensive empirical suite. The authors caution that hidden activations effectively contain the raw input data; therefore, exposing intermediate states can leak user information. Secure handling of activations and careful assessment of model‑compression or distillation pipelines are recommended to mitigate privacy risks.
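The precision caveat is easy to demonstrate in isolation: the same matrix product computed in float64 and float32 differs by small but nonzero amounts, which is why exact-arithmetic injectivity translates into a distance-threshold test in practice. This is a generic numerical illustration, not an experiment from the paper.

```python
# Sketch of floating-point deviation: identical inputs, identical
# computation, different precision -> slightly different outputs.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((512, 512))
w = rng.standard_normal((512, 512))

h64 = x @ w                                                  # float64
h32 = (x.astype(np.float32) @ w.astype(np.float32)).astype(np.float64)

max_dev = np.max(np.abs(h64 - h32))
print(max_dev)  # small but nonzero
```

Quantised or distilled deployments introduce much larger perturbations of the same kind, which is why the authors recommend auditing compression pipelines before assuming activations are safe to expose.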
Data Party THU
Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
