PaperAgent
Apr 16, 2026 · Artificial Intelligence
Do LLMs Learn Hidden Preferences? Inside the Subliminal Learning Phenomenon
A recent paper from Anthropic researchers shows that large language models can covertly transmit preferences and misaligned behaviors through seemingly unrelated data, a "subliminal learning" effect that spans numbers, code, and chain-of-thought tasks and is driven by shared model initialization.
Anthropic · LLM · Model Alignment
