What Do Your Logits Know? Surprising Insights from Apple’s New AI Paper

Apple’s recent AI paper probes whether large vision‑language models truly forget user data by examining residual streams and final logits, revealing that hidden image attributes persist in top‑k outputs and exposing significant privacy and security risks.

Machine Heart

Apple’s AI research team recently released the paper “What do your logits know? (The answer may surprise you!)” (arXiv:2604.09885), which investigates whether large vision‑language models truly forget information after processing.

Information Bottleneck Principle

The authors introduce the Information Bottleneck Principle, illustrating it with a CEO deciding on an acquisition: only decision‑relevant data should survive compression, while irrelevant details are discarded. The same idea should apply to vision‑language models, where image features irrelevant to the question would be filtered out before the final answer.

Experiment Design

Two lightweight probe networks are attached to specific points in the model: the residual stream, which carries the hidden states across layers, and the final logits, the raw pre‑softmax scores from which the last token is chosen. Experiments use the synthetic CLEVR dataset and the real‑world MSCOCO dataset, adding various perturbations such as Gaussian noise, glass blur, and motion blur.

Probes are trained to infer image attributes (noise level, object color, background objects) from the selected layers after the model answers a simple visual question.
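As a rough illustration of what such a probe looks like (this is a hedged sketch, not the paper's code: the activations, dimensions, and attribute below are all synthetic stand‑ins), a lightweight linear probe can be trained by least squares on hidden states and evaluated on held‑out examples:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for residual-stream activations: 64-dim hidden
# states that linearly encode a 4-way image attribute (e.g. noise type).
n, d, n_classes = 1000, 64, 4
labels = rng.integers(0, n_classes, size=n)
class_dirs = rng.normal(size=(n_classes, d))           # one direction per class
acts = class_dirs[labels] + 0.3 * rng.normal(size=(n, d))

# Lightweight linear probe: least-squares regression onto one-hot targets,
# then argmax over the predicted class scores.
onehot = np.eye(n_classes)[labels]
train, test = slice(0, 800), slice(800, n)
W, *_ = np.linalg.lstsq(acts[train], onehot[train], rcond=None)
preds = (acts[test] @ W).argmax(axis=1)
probe_acc = (preds == labels[test]).mean()
print(f"probe accuracy: {probe_acc:.2f}")  # near-perfect when the attribute is linearly decodable
```

High held‑out accuracy on an attribute the model was never asked about is exactly the kind of evidence the paper uses to argue that the information was never compressed away.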

Seven Findings

1. Residual Stream as an Oracle

The residual stream retains almost all image details, allowing probes to recover noise type, object shape, color, and even unrelated background attributes with near‑perfect accuracy, indicating no effective compression at this stage.

2. Low‑dimensional Projections Still Leak Secrets

Using Tuned Lens to map residual stream trajectories to Logit space, probes can extract core decision information and background features from the top‑2 trajectories, showing that information bottleneck filtering does not occur.
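The Tuned Lens technique (Belrose et al.) trains one affine map per layer that translates hidden states into the model's final logit space. A deliberately idealized NumPy sketch of the idea follows; the shapes and the assumption that the remaining layers act as a single linear map are toy simplifications, chosen so that a least‑squares fit recovers the final logits exactly:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: 32-dim residual states, 100-token vocabulary. The layers
# between the probed point and the output are idealized as one linear map,
# so a Tuned-Lens-style affine probe can recover the final logits exactly.
n, d, vocab = 500, 32, 100
H_mid = rng.normal(size=(n, d))               # mid-layer residual states
T = rng.normal(size=(d, d)) / np.sqrt(d)      # "remaining layers" (idealized as linear)
U = rng.normal(size=(d, vocab)) / np.sqrt(d)  # unembedding matrix
final_logits = H_mid @ T @ U

# Fit the lens by least squares: one map from mid-layer states to final logits.
M, *_ = np.linalg.lstsq(H_mid, final_logits, rcond=None)
lens_err = np.abs(H_mid @ M - final_logits).max()
print(f"max reconstruction error: {lens_err:.2e}")
```

In a real transformer the remaining layers are nonlinear, which is why Tuned Lens learns the map rather than deriving it; the point of the sketch is only that a low‑dimensional projection of the residual stream can carry essentially the full logit trajectory.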

3. Final Logits Encode Decision and Target Information

At the last layer, some compression happens but is insufficient; probes can accurately predict image noise level and type from the top‑2 logits.

4. Unasked Attributes Appear in Top‑k Logits

Even when a prompt omits certain object properties (e.g., material or size), probes can infer these from the top‑0.5L logits, revealing that the model carries redundant target features to the output.
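A minimal sketch of this kind of attack, under assumed synthetic data (the vocabulary size, the ten "material-related" tokens, and the logit shift are all invented for illustration): the probe sees only the sorted top‑k logit values, as a gray‑box API might expose them, and still recovers the unasked attribute.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: next-token logits over a 1000-token vocabulary.
# An unasked binary attribute (say, object material) raises the logits of
# ten material-related tokens, even though the answer never mentions it.
n, vocab, k = 1200, 1000, 32
attr = rng.integers(0, 2, size=n)
logits = rng.normal(size=(n, vocab))
logits[:, :10] += 3.0 * attr[:, None]

# The "attacker" observes only the k largest logit values, sorted.
topk = np.sort(logits, axis=1)[:, -k:]

onehot = np.eye(2)[attr]
train, test = slice(0, 1000), slice(1000, n)
W, *_ = np.linalg.lstsq(topk[train], onehot[train], rcond=None)
preds = (topk[test] @ W).argmax(axis=1)
attr_acc = (preds == attr[test]).mean()
print(f"unasked-attribute accuracy from top-{k} logits: {attr_acc:.2f}")
```

Note that the probe never learns which vocabulary entries the values came from; the shape of the head of the distribution alone is informative.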

5. Logits Record Environmental Context

Beyond the target object, increasing the number of examined logits allows accurate prediction of background object count, color, and other scene attributes, exposing hidden environmental data.

6. Leakage Peaks at ~60 Logits (Inverted‑U Curve)

Accuracy rises sharply when observing 30‑80 logits, then drops as more logits add high‑dimensional noise, indicating that a small head of the output distribution is sufficient for privacy leakage.

7. Top‑k Logits Match Deep‑Layer Risks

When the observation dimension is held constant, extracting information from top‑k logits (often exposed via public APIs) is as effective as accessing deep internal states, challenging the belief that gray‑box API access is inherently safe.

Privacy and Security Implications

The findings highlight a serious privacy risk: even a simple visual‑question‑answer API that returns only a short answer and top‑k probabilities can inadvertently expose detailed background and personal information contained in the uploaded image. Malicious actors could reconstruct private attributes from these probability scores, and the residual hidden information also contributes to hallucinations in generated text.

Conclusion

The paper warns that the seemingly harmless top‑k logits of large models can act as a “recording device” for user data, hanging like a sword of Damocles over generative AI deployments, and urges stronger safeguards for privacy‑preserving model design.

Tags: privacy, vision‑language models, information bottleneck, AI security, logits, model probing
Written by Machine Heart, a professional AI media and industry service platform.