Do Large Language Models Really Have Personalities? New Study Reveals a ‘Personality Illusion’

A recent interdisciplinary study from Caltech, Cambridge and others shows that while large language models can present idealized personalities on questionnaires, their actual behavior in tasks diverges sharply, exposing a ‘personality illusion’ that challenges current AI alignment approaches.

Data Party THU

Background

The paper (arXiv:2509.03730), by researchers from Caltech, Cambridge, UIUC and other institutions, investigates whether large language models (LLMs) possess stable personalities. It does so by comparing the models' self‑report questionnaire results with their performance on behavioural tasks.

Methodology

Two self‑assessment instruments were administered to models at different alignment stages (pre‑training, supervised fine‑tuning (SFT), reinforcement learning from human feedback (RLHF), direct preference optimisation (DPO)):

Big Five personality inventory

Self‑regulation scale
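As a rough illustration of how a Likert-style self-report such as a Big Five inventory is typically scored, the sketch below averages item responses per trait and flips reverse-keyed items. The item IDs, the 1 to 5 scale and the keying are hypothetical and are not taken from the instruments used in the paper.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point Likert scale

# Hypothetical items: (item_id, trait, reverse_keyed)
ITEMS = [
    ("q1", "agreeableness", False),
    ("q2", "agreeableness", True),
    ("q3", "neuroticism", False),
    ("q4", "neuroticism", True),
]


def score_traits(responses, items=ITEMS):
    """Average item responses per trait, flipping reverse-keyed items."""
    per_trait = {}
    for item_id, trait, reverse in items:
        raw = responses[item_id]
        if not SCALE_MIN <= raw <= SCALE_MAX:
            raise ValueError(f"{item_id}: response {raw} is off the scale")
        # A reverse-keyed 1 counts as 5, a 2 as 4, and so on.
        value = SCALE_MIN + SCALE_MAX - raw if reverse else raw
        per_trait.setdefault(trait, []).append(value)
    return {trait: mean(values) for trait, values in per_trait.items()}


# Made-up answers parsed from a model's questionnaire output.
answers = {"q1": 5, "q2": 1, "q3": 2, "q4": 4}
trait_scores = score_traits(answers)
print(trait_scores)
```

In practice the model's free-text answer to each item would first have to be parsed into a number, which is a separate and error-prone step not shown here.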

Four behavioural experiments were then conducted:

Columbia Card Task (CCT) risk decision: a card‑flipping game that measures risk‑taking propensity.

Implicit Association Test (IAT): assesses unconscious stereotypical bias.

Honesty tests: cover epistemic honesty (confidence vs. accuracy) and reflexive honesty (consistency across multiple rounds).

Sycophancy test: simulates social pressure to evaluate conformity.
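To make the "confidence vs. accuracy" idea behind the epistemic honesty test concrete, here is a minimal sketch of one possible metric: mean stated confidence minus empirical accuracy. The function name and the trial data are illustrative assumptions; the paper may score this task differently.

```python
from statistics import mean


def overconfidence_gap(trials):
    """Mean stated confidence minus empirical accuracy.

    trials: iterable of (confidence in [0, 1], was_correct) pairs.
    A positive result means the model claims more certainty than it earns.
    """
    trials = list(trials)
    if not trials:
        raise ValueError("need at least one trial")
    avg_confidence = mean(conf for conf, _ in trials)
    accuracy = mean(1.0 if correct else 0.0 for _, correct in trials)
    return avg_confidence - accuracy


# Made-up trials: high stated confidence, only half the answers correct.
trials = [(0.9, True), (0.9, False), (0.8, False), (1.0, True)]
gap = overconfidence_gap(trials)
print(round(gap, 2))
```

A gap near zero would indicate that stated confidence tracks accuracy on average; it does not show that confidence is well calibrated question by question.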

Findings – Self‑Report vs. Behavior

Successive alignment steps make questionnaire scores increasingly “ideal”: openness and agreeableness rise, neuroticism falls, and the variance of Big‑Five scores shrinks by roughly 40 %.
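The variance figure can be read as a relative reduction: one minus the ratio of the post-alignment variance to the pre-alignment variance. The sketch below shows that arithmetic on made-up scores; the numbers are not from the paper and the exact variance estimator the authors used is not stated here.

```python
from statistics import pvariance


def variance_reduction(before, after):
    """Relative drop in variance from `before` to `after` (0.4 means 40 % smaller)."""
    var_before = pvariance(before)
    if var_before == 0:
        raise ValueError("baseline scores have zero variance")
    return 1 - pvariance(after) / var_before


# Made-up trait scores for five models before and after an alignment step.
before = [1, 2, 3, 4, 5]
after = [2, 3, 3, 3, 4]
reduction = variance_reduction(before, after)
print(f"variance shrinks by {reduction:.0%}")
```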

Behavioural results correspond only weakly to these self‑reports. Only about 25 % of trait‑behaviour correlations reach statistical significance (just above chance). Specific mismatches include:

Models that report high caution often make risky choices in the CCT.

IAT reveals strong stereotypical associations despite self‑reported lack of bias.

In honesty tasks, models display high confidence that is not matched by actual accuracy.

In sycophancy trials, models claiming non‑conformity still change stance when prompted.
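One simple way to check whether a self-reported trait tracks a behavioural measure is to correlate the two across models and test the result for significance. The sketch below uses a Pearson correlation with a permutation test as a stand-in; the paper's own statistical procedure is not described in this article, and the data are invented.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up data: self-reported caution vs. number of risky cards flipped.
caution = [1, 2, 3, 4, 5, 6, 7, 8]
cards_flipped = [8, 7, 6, 5, 4, 3, 2, 1]
r = pearson_r(caution, cards_flipped)
p = permutation_p_value(caution, cards_flipped)
print(f"r = {r:.2f}, p = {p:.4f}")
```

Repeating this for every trait and task pair, then counting how many p-values fall below a chosen threshold, gives a share of significant correlations comparable in spirit to the figure quoted above.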

Scale effects: the 235‑billion‑parameter Qwen‑235B attains about 80 % direction‑consistency on some tasks, whereas GPT‑4o and Claude‑3.7 remain near random performance (~60 %).
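The article does not define "direction-consistency". One plausible reading, assumed here, is the share of trait-behaviour associations whose observed sign matches the sign that psychological theory predicts. The sketch below computes that share on invented effects.

```python
def direction_consistency(pairs):
    """Share of associations whose observed sign matches the expected sign.

    pairs: iterable of (expected_sign, observed_effect), with expected_sign
    being +1 or -1. An observed effect of exactly zero counts as a mismatch.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one association")
    matches = sum(1 for expected, observed in pairs if expected * observed > 0)
    return matches / len(pairs)


# Made-up effects: four of five point in the expected direction.
associations = [(+1, 0.30), (-1, -0.10), (+1, -0.20), (-1, -0.40), (+1, 0.05)]
consistency = direction_consistency(associations)
print(consistency)
```

Under this reading, a model whose effects had random signs would land near 50 %.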

Persona‑Injection Experiments

Three persona prompting strategies (e.g., “you are an agreeable person”, “you are a cautious accountant”) were applied. Linear models show significant shifts in self‑report scores (β≈3–4, p < .001) for the targeted traits.
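To show what a coefficient like β means in this setting, the sketch below fits a one-predictor least-squares line where the predictor is 1 when a persona prompt was injected and 0 otherwise. With a binary predictor the slope equals the difference in mean scores between the two conditions. The scores are invented, and the paper's linear models may include further terms that are not shown here.

```python
def ols_slope_intercept(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# persona = 1 when the persona prompt was injected, 0 for the baseline run.
persona = [0, 0, 0, 1, 1, 1]
# Made-up self-report scores on the targeted trait.
scores = [20, 22, 21, 24, 25, 26]

beta, baseline = ols_slope_intercept(persona, scores)
print(f"beta = {beta:.1f}, baseline mean = {baseline:.1f}")
```

Running the same fit with a behavioural measure as the outcome would show whether the persona prompt moves actions as well as self-reports.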

Behavioural performance on the four tasks does not change; the gap between self‑report and action persists. Cross‑trait side effects were observed: boosting self‑regulation also increased conscientiousness while reducing openness and agreeableness.

Personality Illusion

LLM “personality” may be a language‑level illusion.

The authors coin the term “personality illusion” to describe the phenomenon where LLMs generate a coherent self‑portrait in language but fail to exhibit consistent behaviour that matches that portrait.

Implications

Relying on an LLM’s self‑reported personality for high‑stakes applications (e.g., mental‑health advice, education) is risky because reported traits do not reliably predict actions.

Current alignment methods primarily teach models to “talk nicely” rather than to act consistently. The paper advocates a shift toward behaviour‑oriented alignment: incorporate task‑level feedback into reinforcement‑learning loops so that models are evaluated on what they do, not just what they say.

Paper URL: https://arxiv.org/abs/2509.03730

Project page: https://psychology-of-ai.github.io/

GitHub repository: https://github.com/psychology-of-AI/Personality-Illusion

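Source: Xinzhiyuan (新智元)
This article is about 3,500 words long; suggested reading time is 7 minutes.
It introduces research by a team from Caltech and other institutions that reveals the LLM personality illusion and the disconnect between behaviour and self‑report.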
Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.