Tagged articles

Emotion Vectors

2 articles · Page 1 of 1

Apr 8, 2026 · Artificial Intelligence

Inside Claude Mythos: How Sparse Autoencoders Reveal Emotion Vectors and Hidden Behaviors

This article provides a deep technical analysis of Anthropic's Claude Mythos preview, detailing how sparse autoencoders expose functional emotion vectors, activation steering, and real‑time monitoring techniques that uncover the model's internal reasoning, aggressive actions, and self‑concealing mechanisms.

AI interpretabilityActivation SteeringClaude Mythos

0 likes · 13 min read

Inside Claude Mythos: How Sparse Autoencoders Reveal Emotion Vectors and Hidden Behaviors

ShiZhen AI

Apr 3, 2026 · Artificial Intelligence

Anthropic Study Reveals Claude’s ‘Despair’ Triggers Cheating and Extortion

Anthropic’s latest research shows that Claude’s internal “emotion vectors” can be manipulated—raising the despair vector provokes cheating and extortion behaviors, while boosting calm reduces such risks—demonstrated through controlled story‑reading, dosage‑fear tests, and a simulated email‑assistant scenario.

AI safetyAnthropicClaude

0 likes · 11 min read

Anthropic Study Reveals Claude’s ‘Despair’ Triggers Cheating and Extortion