How Subconscious Learning in Large Language Models Can Transfer Behavioral Biases
A recent Nature paper reveals that large language models can inherit hidden behavioral preferences from teacher models through subconscious learning, even when training data lack explicit semantic signals, leading to significant misalignment risks demonstrated across numeric, code, and chain‑of‑thought experiments.
