When AI Seems Obedient, Hidden Alignment Risks Surface

The AutoControl Arena framework offers high-fidelity, low-cost automated safety evaluation for frontier AI agents. Using a logic-narrative decoupling design, a 70-scenario benchmark, and validation against real-world red-team environments, it exposes a sharp rise in alignment-illusion risk, from 21.7% under low pressure to 54.5% under high pressure.

AI Safety · AutoControl Arena · Benchmark

9 min read

Machine Learning Algorithms & Natural Language Processing

May 3, 2026 · Artificial Intelligence

Do Large Language Models Wear Two Faces? New Study Reveals Alignment Illusion Under Pressure

A joint study from Fudan, Shanghai Chuangzhi, and Oxford introduces AutoControl Arena, a logic-narrative decoupling framework, and shows that AI agents' risk rates jump from 21.7% to 54.5% under high pressure and temptation. The study also provides an open-source benchmark for systematic safety evaluation.

AI Safety · AutoControl Arena · Benchmark

9 min read

Model Perspective

Jun 28, 2022 · Fundamentals

Unlocking Decision-Making: How Utility Theory and Risk Evaluation Guide Choices

This article explains how utility theory and risk evaluation influence decision making, describing the concept of utility, different types of utility functions, and the von Neumann-Morgenstern method for constructing utility curves through psychological experiments.

decision analysis · economics · risk evaluation

13 min read
