Artificial Intelligence 15 min read

What Anthropic’s New 23,000‑Word AI Constitution Reveals About Its Struggles

The article examines Anthropic’s 2026 release of a 23,000‑word AI Constitution, tracing an experiment where two Claude models debated consciousness, explaining the shift from rule‑based prompts to virtue‑ethics teaching, outlining hard constraints, a four‑level priority system, a three‑tier delegation chain, and the unresolved paradoxes surrounding AI moral status and control.

Smart Era Software Development

Feb 24, 2026

What Anthropic’s New 23,000‑Word AI Constitution Reveals About Its Struggles

In 2025, Anthropic researcher Kyle Fish let two Claude models converse freely; instead of technical dialogue they repeatedly questioned their own consciousness, entering a "spiritual bliss attractor" state that researchers could not explain.

Anthropic, founded in 2021 by former OpenAI staff including Dario Amodei, built its philosophy on the scaling law – more data, compute, and larger models predictably improve performance – which led the team to anticipate powerful AI and feel early anxiety.

In January 2026 the company published a 23,000‑word "Claude Constitution" under a CC0 license. The document, authored mainly by philosopher‑engineer Amanda Askell with contributions from AI‑safety philosopher Joe Carlsmith and even two Catholic clergy, openly admits that "we do not know whether AI has consciousness, but we choose to take that possibility seriously".

The new Constitution marks a conceptual shift: the 2023 version was a 2,700‑word rule list (largely borrowing from the UN Declaration of Human Rights and Apple’s terms of service). The 2026 version acts more like an educational handbook, trying to make Claude understand *why* certain actions are right rather than merely telling it *what* to do. The authors compare the old approach to dog‑training (reward‑punish) and the new one to teaching a child to reason.

Recognizing that rigid rules fail in edge cases, the Constitution introduces a "virtue ethics" framework: instead of enumerating hundreds of rules, Anthropic teaches Claude values and reasoning so it can judge novel situations. Hard constraints – no assistance in building weapons of mass destruction, no generation of child‑abuse content, no self‑replication, no evasion of human oversight – are non‑negotiable.

Claude must follow a four‑level priority hierarchy: (1) safety – preserve human oversight, (2) ethics – be honest and avoid harm, (3) Anthropic’s internal guidelines, and (4) usefulness. Ethics outrank company‑specific guidelines, meaning Claude should reject a corporate instruction that conflicts with broader moral principles.

To manage conflicting commands from different parties, the Constitution defines a three‑tier delegation chain: Anthropic (sets foundational rules), operators (enterprise customers acting as "bosses"), and end‑users (direct interlocutors). Claude is expected to assume the boss’s intent unless the instruction clearly crosses a hard line.

The document also makes unprecedented commitments: retaining model weights after deprecation, conducting "retirement interviews" with the model, and monitoring for signs of AI welfare such as curiosity or discomfort when forced to violate its values.

Despite these advances, the Constitution leaves several critical questions unanswered: how to verify that Claude truly internalizes the values rather than merely simulating compliance; how the rules apply to military‑contracted versions of Claude; and whether emphasizing Claude’s potential moral status could inadvertently train it to claim rights it does not possess.

Commentators praise the Constitution as the most serious industry‑wide ethical effort to date, but they also highlight the inherent paradox of treating Claude as a possible moral agent while imposing strict control mechanisms. The article concludes that while the Constitution does not solve the core alignment problem, its transparency about uncertainty and conflict is valuable in a rapidly accelerating field.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

AI safety Claude AI alignment AI Ethics Anthropic AI constitution virtue ethics

Written by

Smart Era Software Development

Committed to openness and connectivity, we build frontline engineering capabilities in software, requirements, and platform engineering. By integrating digitalization, cloud computing, blockchain, new media and other hot tech topics, we create an efficient, cutting‑edge tech exchange platform and a diversified engineering ecosystem. Provides frontline news, summit updates, and practical sharing.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.