PaperAgent
Dec 19, 2025 · Artificial Intelligence

Can We Trust AI? Inside GPT‑5.2‑Codex’s Monitorability Breakthrough

OpenAI’s new GPT‑5.2‑Codex model achieves state‑of‑the‑art performance on SWE‑Bench Pro and Terminal‑Bench 2.0. A 90‑page technical report introduces the concept of monitorability, defines metrics and benchmark suites for it, and presents key findings about chain‑of‑thought length, RL training, and model size.

AI safety · GPT-5.2 · benchmark
10 min read