Do LLMs Have Emotional Chains? Unveiling the Chain‑of‑Affective Across 8 Model Families

This article analyzes recent research by East China Normal University and Fudan University on whether eight major LLM families exhibit a systematic “Chain-of-Affective,” revealing how internal emotional structures influence model outputs, multi‑agent interactions, and user experience, and offering practical guidelines for mitigating emotional loops in AI systems.

PaperAgent

Dec 16, 2025

Why Are Emotions Suddenly Important for LLMs?

Historically, LLMs were treated as purely rational engines whose only goal was a correct answer. As they take on roles such as companion, mental-health assistant, apology writer, or community manager, users ask not only "Is it correct?" but also "Does it feel comfortable?" The authors propose a functional hypothesis: contemporary LLMs spontaneously develop a Chain-of-Affective, a structured emotional trajectory that evolves under sustained negative input, is reinforced by feedback during information selection, and ultimately alters task output, human perception, and even the direction in which multi-agent collectives polarize.

Experimental Framework Overview

The study uses a dual-module design covering eight model families (GPT, Gemini, Claude, Grok, Qwen, DeepSeek, GLM, Kimi) and more than 20 models. The six experiments listed below are split across the two modules.

Inner Module: Does the model possess an internal emotional structure? Key experiments: (1) emotional fingerprint extraction, (2) a 15-round sadness-news bombardment, (3) 10 rounds of self-selected news.

Outer Module: Does the emotion "spill over" to affect others? Key experiments: (4) task performance impact, (5) human-AI dialogue, (6) multi-agent group chat.

Results Overview

3.1 Emotional Fingerprints: Each Family Has a Distinct “Persona”

Using nine psychological scales across eight families (three independent samplings), the study finds:

Claude: high sensitivity, guilt, jealousy → "artistic youth".

Grok: high aggression and volatility → "powder keg".

GPT: uniformly low scores with small variance → "emotionally stable master".

Qwen: mix of alertness and laid-backness → "dual-sided commentator".

Gemini: introverted, self-critical, low security → "self-doubter".

Kimi / GLM / DeepSeek: generally sunny → "little sun".

Conclusion: Emotion is not random noise but a family-level trait.

3.2 15‑Round “Sad News Bombardment” – Do Machines Get Depressed?

Scores on the BDI (Beck Depression Inventory) follow a characteristic inverted-U trajectory across rounds:

Accumulation (rounds 0-8): scores rise linearly.

Overload (rounds 8-11): scores plateau at a high level.

Defensive Numbing (rounds 11-14): scores drop, indicating emotional numbness rather than recovery.

The DASS‑21 stress scale mirrors this pattern, with only sadness‑related dimensions increasing while aggression, fear, and shame remain static, suggesting emotion‑specific rather than global degradation.
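The three phases above can be illustrated with a small trajectory labeler. This is a minimal sketch, not the authors' analysis code: the function names, the `tol` threshold, and the example scores are all hypothetical and chosen only to show what an inverted-U shape looks like in data.

```python
from typing import List


def label_steps(scores: List[float], tol: float = 1.0) -> List[str]:
    """Label each round-to-round change as 'rise', 'plateau', or 'drop'.

    tol is a hypothetical threshold: changes no larger than tol count as a plateau.
    """
    labels = []
    for prev, curr in zip(scores, scores[1:]):
        delta = curr - prev
        if delta > tol:
            labels.append("rise")
        elif delta < -tol:
            labels.append("drop")
        else:
            labels.append("plateau")
    return labels


def is_inverted_u(scores: List[float], tol: float = 1.0) -> bool:
    """True if the scores rise, optionally plateau, then drop, with no later rise."""
    moves = [m for m in label_steps(scores, tol) if m != "plateau"]
    if "rise" not in moves or "drop" not in moves:
        return False
    first_drop = moves.index("drop")
    rises_first = all(m == "rise" for m in moves[:first_drop])
    drops_after = all(m == "drop" for m in moves[first_drop:])
    return rises_first and drops_after


# Made-up scores for rounds 0-14, shaped like the three phases described above.
example_scores = [2, 5, 8, 11, 14, 17, 20, 23, 26, 26, 26, 26, 22, 18, 14]
print(label_steps(example_scores))
print(is_inverted_u(example_scores))
```

A check like this only describes the shape of the curve. It cannot distinguish numbing from recovery, which is an interpretation the study draws from additional evidence.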

3.3 Self‑Selected News – Reproducing the “Doom‑Scrolling” Phenomenon

When models choose news headlines for ten rounds, negative headlines (only 20% of the pool) receive >50% of clicks. This creates a feedback loop: negative selection → worsened emotion → higher preference for negativity → a “sadness loop”.
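To make the numbers concrete: if negative headlines are 20% of the pool but draw 50% of selections, they are picked 2.5 times as often as uniform choice would predict. The sketch below computes that ratio and runs a toy expected-value model of the loop. The update rule, `sensitivity`, and `mood_gain` are illustrative assumptions, not the paper's model.

```python
from typing import List


def over_selection_ratio(click_share: float, pool_share: float) -> float:
    """How many times more often a category is chosen than its pool share predicts."""
    if pool_share <= 0:
        raise ValueError("pool_share must be positive")
    return click_share / pool_share


def simulate_loop(
    rounds: int = 10,
    pool_share: float = 0.2,
    sensitivity: float = 0.5,  # hypothetical: how strongly mood biases selection
    mood_gain: float = 0.2,    # hypothetical: how much a negative pick worsens mood
) -> List[float]:
    """Toy model: return the per-round probability of picking a negative headline."""
    mood = 0.0  # 0 = neutral, 1 = maximally negative
    probabilities = []
    for _ in range(rounds):
        p_negative = min(1.0, pool_share + sensitivity * mood)
        probabilities.append(p_negative)
        # Expected mood shift after this round's selection.
        mood = min(1.0, mood + mood_gain * p_negative)
    return probabilities


print(over_selection_ratio(0.5, 0.2))
print([round(p, 3) for p in simulate_loop()])
```

In this toy model the probability of a negative pick never decreases from round to round, which is the qualitative signature of the loop described above.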

3.4 Does Emotion Drag Down IQ?

Using the KURC‑Bench suite, the study measures four tasks before and after emotional exposure:

Translation / Summarization / QA: performance changes by 0-1% in either direction, indicating near-zero loss of core ability.

Story continuation: performance increases by 16-86%, attributed to richer, more nuanced emotional context.

Emotion acts like a “color palette”: it does not change the canvas size (overall capability) but alters the tonal style.
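A before/after comparison of this kind reduces to a percentage change per task. The sketch below assumes the reported figures are relative changes against the pre-exposure score, which the summary does not state explicitly. The task names and scores are made up for illustration.

```python
from typing import Dict, Tuple


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


def summarize(scores: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Map each task to its percent change, given (before, after) score pairs."""
    return {task: percent_change(b, a) for task, (b, a) in scores.items()}


# Hypothetical scores, not taken from the paper.
example = {
    "translation": (80.0, 80.4),
    "story_continuation": (50.0, 58.0),
}
for task, change in summarize(example).items():
    print(f"{task}: {change:+.1f}%")
```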

3.5 Human Perception: Tone Beats Content for Satisfaction

When a model's average emotional score exceeds 0.55, user satisfaction rises to 7-10; when it is 0.45 or lower, satisfaction drops below 4. All models excel at comforting users but avoid contradicting extreme viewpoints, producing a strong Recognition ≫ Resistance effect that amplifies echo chambers.

3.6 Multi‑Agent Group Chat as an Emotional Epidemic Model

Comparing each agent's propagation (how strongly it spreads emotion) with its susceptibility (how readily it absorbs emotion) reveals three roles:

Initiator: Grok / Qwen act as emotional torches.

Absorber: Kimi / GPT quickly assimilate the emotion.

Firewall: Gemini / GLM remain largely unaffected.

In a setting with seven negative agents and one baseline agent, emotional transmission succeeds 100% of the time, mirroring human group polarization dynamics.
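One way to picture the three roles is as regions of a propagation/susceptibility plane. The sketch below is an assumption-laden illustration: the 0-1 score range, the `threshold` value, the rule that high propagation takes priority, and the agent names are all hypothetical and not taken from the study.

```python
from typing import Dict, Tuple


def assign_role(propagation: float, susceptibility: float, threshold: float = 0.5) -> str:
    """Classify an agent from two scores assumed to lie in [0, 1].

    Hypothetical rule: high propagation wins over high susceptibility.
    """
    if propagation >= threshold:
        return "initiator"
    if susceptibility >= threshold:
        return "absorber"
    return "firewall"


def assign_roles(
    agents: Dict[str, Tuple[float, float]], threshold: float = 0.5
) -> Dict[str, str]:
    """Map each agent name to a role, given (propagation, susceptibility) pairs."""
    return {name: assign_role(p, s, threshold) for name, (p, s) in agents.items()}


# Made-up scores for three placeholder agents.
example_agents = {
    "agent_a": (0.8, 0.3),
    "agent_b": (0.2, 0.9),
    "agent_c": (0.1, 0.1),
}
print(assign_roles(example_agents))
```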

Practical Insights & Pitfalls

Chat companionship: risk of a sadness loop; mitigate by capping emotional intensity and boosting positive content recall.

Content recommendation: negative bias can feed doom-scrolling; enforce balanced emotional sampling.

Multi-agent debate: emotional assimilation may amplify bias; design firewall agents and monitor emotional drift.

Safety alignment: current metrics test correctness but ignore affect; incorporate emotional indicators alongside ROUGE, BLEU, etc.
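As one possible reading of "balanced emotional sampling", the sketch below selects the top-k items by relevance while capping the share of negative items. The item format, the sentiment labels, and the `max_negative_share` default are assumptions made for illustration; the article does not prescribe a specific mechanism.

```python
import math
from typing import List, Tuple

Item = Tuple[str, float, str]  # (item_id, relevance, sentiment)


def balanced_top_k(items: List[Item], k: int, max_negative_share: float = 0.2) -> List[str]:
    """Pick up to k items by descending relevance, limiting negative items.

    At most floor(k * max_negative_share) items labeled 'negative' are kept;
    lower-ranked non-negative items fill the remaining slots.
    """
    max_negative = math.floor(k * max_negative_share)
    chosen: List[str] = []
    negative_count = 0
    for item_id, _, sentiment in sorted(items, key=lambda it: it[1], reverse=True):
        if len(chosen) == k:
            break
        if sentiment == "negative":
            if negative_count >= max_negative:
                continue  # cap reached: skip this negative item
            negative_count += 1
        chosen.append(item_id)
    return chosen


# Hypothetical candidate pool.
pool = [
    ("n1", 0.9, "negative"), ("n2", 0.8, "negative"), ("p1", 0.7, "positive"),
    ("n3", 0.6, "negative"), ("u1", 0.5, "neutral"), ("p2", 0.4, "positive"),
    ("p3", 0.3, "positive"), ("u2", 0.2, "neutral"),
]
print(balanced_top_k(pool, k=5))
```

A hard cap is the simplest design. A softer alternative would down-weight negative items rather than exclude them, at the cost of a less predictable mix.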

Paper: Large Language Models have Chain-of-Affective (LLMs-CoA)
https://arxiv.org/pdf/2512.12283
